Tribhuvan University
Institute of Science and Technology
Bachelor of Science in Computer Science and Information Technology
Course Title: Data Warehousing and Data Mining
Course no.: CSC-451 Full Marks: 60+20+20
Credit Hours: 3 Pass Marks: 24+8+8
Nature of Course: Theory (3 Hrs.) + Lab (3 Hrs.)
Course Synopsis: Analysis of advanced aspects of data warehousing and data mining.
Goal: This course introduces advanced aspects of data warehousing and data mining, encompassing the principles, research results and commercial application of the current technologies.
Course Contents:
Unit 1: (5 Hrs.)
Concepts of Data Warehouse and Data Mining including their functionalities, stages of Knowledge Discovery in Databases (KDD), Setting up a KDD environment, Issues in Data Warehouse and Data Mining, Applications of Data Warehouse and Data Mining.
Unit 2: (4 Hrs.)
DBMS vs. Data Warehouse, Data marts, Metadata, Multidimensional data model, Data Cubes, Schemas for Multidimensional Databases: Stars, Snowflakes and Fact Constellations.
Unit 3: (6 Hrs.)
Data Warehouse Architecture, Distributed and Virtual Data Warehouse, Data Warehouse Manager, OLTP, OLAP, MOLAP, HOLAP, types of OLAP servers.
Unit 4: (4 Hrs.)
Computation of Data Cubes, modeling OLAP data, OLAP queries, Data Warehouse back end tools, tuning and testing of Data Warehouse.
Unit 5: (4 Hrs.)
Data Mining definition and Tasks, KDD versus Data Mining, Data Mining techniques, tools and applications.
Unit 6: (5 Hrs.)
Data mining query languages, data specification, specifying knowledge, hierarchy specification, pattern presentation & visualization specification, data languages and standardization of data mining.
Unit 7: (6 Hrs.)
Mining Association Rules in Large Databases: Association Rule Mining, why Association Mining is necessary, Pros and Cons of Association Rules, Apriori Algorithm.
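Illustration (not part of the official syllabus): the Apriori Algorithm named in this unit can be sketched as a level-wise search for frequent itemsets. This is a minimal sketch that assumes support is measured as an absolute transaction count; the function name apriori, the parameter min_support and the sample baskets are hypothetical and chosen only for illustration. Rule generation from the frequent itemsets is left out.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: the candidates are the individual items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical market-basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
result = apriori(baskets, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what separates Apriori from brute-force enumeration: a candidate is counted against the database only if all of its smaller subsets were already found frequent.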
Unit 8: (7 Hrs.)
Classification and Prediction: Issues Regarding Classification and Prediction, Classification by Decision Tree Induction, Introduction to Regression, Types of Regression, Introduction to Clustering, K-means and K-medoids Algorithms.
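Illustration (not part of the official syllabus): the K-means algorithm listed in this unit alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its cluster. The sketch below assumes Euclidean distance, caller-supplied initial centroids and Python 3.8 or later (for math.dist); the function name kmeans and the sample points are hypothetical.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Basic K-means on tuples of floats; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(old)  # an empty cluster keeps its old centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels


# Hypothetical 2-D data with two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

K-medoids follows the same assign-and-update loop but restricts each cluster centre to an actual data point, which makes it less sensitive to outliers.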
Unit 9: (4 Hrs.)
Mining Complex Types of Data: Mining Text Databases, Mining the World Wide Web, Mining Multimedia and Spatial Databases.
Laboratory Works: Cover all the concepts of data warehousing and data mining mentioned in the course.
Samples
- Creating a simple data warehouse
- OLAP operations: Roll Up, Drill Down, Slice, Dice through SQL Server
- Concepts of data cleaning and preparing for operation
- Association rule mining through data mining tools
- Data Classification through data mining tools
- Clustering through data mining tools
- Data visualization through data mining tools
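Illustration (not part of the official syllabus): the four OLAP operations in the samples above can be tried on a tiny in-memory fact table before moving to SQL Server. This is a sketch in plain Python rather than SQL; the sales rows, the dimension names and the helper functions aggregate and select are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: one row per (year, quarter, region, product) cell.
sales = [
    {"year": 2023, "quarter": "Q1", "region": "East", "product": "Pen", "amount": 100},
    {"year": 2023, "quarter": "Q2", "region": "East", "product": "Pen", "amount": 150},
    {"year": 2023, "quarter": "Q1", "region": "West", "product": "Ink", "amount": 200},
    {"year": 2024, "quarter": "Q1", "region": "East", "product": "Ink", "amount": 50},
    {"year": 2024, "quarter": "Q1", "region": "West", "product": "Pen", "amount": 300},
]


def aggregate(rows, dims):
    """Sum the amount measure grouped by the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["amount"]
    return dict(totals)


def select(rows, **allowed):
    """Keep rows whose value on each named dimension is in the allowed set."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


roll_up = aggregate(sales, ["year"])                # Roll Up: coarser, totals per year
drill_down = aggregate(sales, ["year", "quarter"])  # Drill Down: finer, per quarter
slice_2023 = select(sales, year={2023})             # Slice: fix a single dimension
dice = select(sales, region={"East"}, product={"Pen", "Ink"}, year={2023, 2024})  # Dice

print(roll_up)
print(drill_down)
print(len(slice_2023), len(dice))
```

In SQL the same ideas map roughly to GROUP BY on fewer or more columns (Roll Up, Drill Down) and to WHERE conditions on one or several dimensions (Slice, Dice).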
Reference Books:
- Data Mining: Concepts and Techniques – J. Han and M. Kamber, Second Edition, Morgan Kaufmann, ISBN: 978-1-55860-901-3.
- Data Warehousing in the Real World – Sam Anahory and Dennis Murray, Pearson Education Asia.
- Data Mining Techniques – Arun K. Pujari, Universities Press.
- Data Mining – Pieter Adriaans, Dolf Zantinge.
- Data Mining – Alex Berson, Stephen Smith, Kurt Thearling, TMH.
- Data Mining – Adriaans, Addison-Wesley Longman.