Data Warehouse and OLAP
Data Mining: Concepts and Techniques by J. Han and M. Kamber
8/7/2019 CSE591: Data Mining by H. Liu
What is a data warehouse?
  - A repository of information collected from multiple sources, stored under a unified schema at a single site
  - Characteristics:
    - Subject-oriented
    - Integrated
    - Time-variant
    - Nonvolatile
  - A semantically consistent data store for decision support at the enterprise level
Data warehousing and a multidimensional data model
  - Data warehousing (DWing): the process of constructing and using a data warehouse (DW)
  - OLTP and OLAP differ in user and system orientation, data contents, database design, view, and access patterns (Table 2.1)
  - A DW is usually modeled by a multidimensional database structure: a data cube
  - An example: a data cube of Student Information
    - Dimensions: nationality, level, status, …
Schemas for multidimensional databases
  - Should OLTP and OLAP run on the same databases(?) The concern is achieving high performance of both systems
  - Dimensions are the perspectives or entities with respect to which an organization keeps records
  - Facts are numeric measures
  - Dimension table and fact table
  - DW schemas:
    - Star
    - Snowflake
    - Fact constellation
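The relationship between a fact table and its dimension tables in a star schema can be sketched in a few lines of Python. All table names, keys, and values below (`item_dim`, `location_dim`, `sales_fact`, `total_by`) are hypothetical and chosen only for illustration. The fact table holds foreign keys plus numeric measures, and each dimension table maps a key to descriptive attributes.

```python
from collections import defaultdict

# Dimension tables: surrogate key -> descriptive attributes (hypothetical data).
item_dim = {
    1: {"name": "laptop", "type": "computer"},
    2: {"name": "phone", "type": "mobile"},
}
location_dim = {
    10: {"city": "Tempe", "country": "USA"},
    11: {"city": "Vancouver", "country": "Canada"},
}

# Fact table: one foreign key per dimension plus the numeric measures.
sales_fact = [
    {"item_key": 1, "location_key": 10, "units_sold": 3, "dollars_sold": 3000.0},
    {"item_key": 2, "location_key": 10, "units_sold": 5, "dollars_sold": 2500.0},
    {"item_key": 1, "location_key": 11, "units_sold": 2, "dollars_sold": 2000.0},
    {"item_key": 2, "location_key": 11, "units_sold": 1, "dollars_sold": 500.0},
]


def total_by(fact_rows, dim_table, key_field, attribute, measure):
    """Join the fact table to one dimension and sum a measure per attribute value."""
    totals = defaultdict(float)
    for row in fact_rows:
        group = dim_table[row[key_field]][attribute]
        totals[group] += row[measure]
    return dict(totals)


by_country = total_by(sales_fact, location_dim, "location_key", "country", "dollars_sold")
by_type = total_by(sales_fact, item_dim, "item_key", "type", "units_sold")
```

In a snowflake schema, some dimension tables would be normalized further (for example, `location_dim` split into city and country tables); in a fact constellation, several fact tables would share dimension tables such as `item_dim`.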
Examples for defining schemas
  - define cube, define dimension …as…in
  - Star: Fig. 2.4, Example 2.4
  - Snowflake: Fig. 2.5, Example 2.5
  - Fact constellation: Fig. 2.6, Example 2.6
OLAP operations
  - Concept hierarchy: one for each dimension
  - Operations (Fig. 2.10):
    - Dice
    - Slice
    - Pivot
    - Roll-up
    - Drill-down
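The OLAP operations listed above can be sketched on a tiny cube held in a Python dictionary. The cube contents, the city-to-country concept hierarchy, and the function names are assumptions made for this example; they do not reproduce Fig. 2.10. Keys are (city, quarter, item) and values are sales.

```python
from collections import defaultdict

# Hypothetical base cuboid: (city, quarter, item) -> sales.
cells = {
    ("Tempe", "Q1", "laptop"): 10,
    ("Tempe", "Q2", "laptop"): 20,
    ("Tempe", "Q1", "phone"): 5,
    ("Phoenix", "Q1", "laptop"): 4,
    ("Toronto", "Q1", "laptop"): 7,
    ("Toronto", "Q2", "phone"): 3,
}

# Concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {"Tempe": "USA", "Phoenix": "USA", "Toronto": "Canada"}


def roll_up(cube, hierarchy):
    """Climb the location hierarchy: aggregate cities into countries."""
    out = defaultdict(int)
    for (city, quarter, item), value in cube.items():
        out[(hierarchy[city], quarter, item)] += value
    return dict(out)


def slice_(cube, axis, value):
    """Fix one dimension to a single value and drop it from the key."""
    return {key[:axis] + key[axis + 1:]: v
            for key, v in cube.items() if key[axis] == value}


def dice(cube, allowed):
    """Keep cells whose value on every listed axis is in the allowed set."""
    return {key: v for key, v in cube.items()
            if all(key[axis] in values for axis, values in allowed.items())}


def pivot(cube, order):
    """Reorder the dimensions (rotate the axes) without changing any value."""
    return {tuple(key[i] for i in order): v for key, v in cube.items()}


by_country = roll_up(cells, CITY_TO_COUNTRY)
q1_only = slice_(cells, 1, "Q1")
sub_cube = dice(cells, {0: {"Tempe", "Toronto"}, 1: {"Q1"}})
item_first = pivot(cells, (2, 0, 1))
```

Drill-down is the reverse of roll-up: it moves from `by_country` back to the finer `cells`, which is only possible because the detailed data is still available.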
Data Warehouse Architecture
  - A 3-tier data warehouse architecture:
    - Front-end tools
    - OLAP server
    - Data warehouse server
  - Three models:
    - Enterprise warehouse
    - Data mart
    - Virtual warehouse
Metadata Repository
  - When used in a DW, metadata are the data that define warehouse objects:
    - a directory to help the decision support system analyst locate the contents of the DW
    - a guide to the mapping of data from the source to the DW
    - a guide to the algorithms used for summarization
  - A metadata repository contains the DW structure, operational metadata, the algorithms used, mappings, data related to system performance, and business metadata
From DW to DM
  - DW back-end tools and utilities:
    - data extraction, cleaning, transformation, load, refresh
  - The uses of DW:
    - generate reports and answer predefined queries
    - analyze summarized and detailed data
    - perform multidimensional analysis
    - knowledge discovery and strategic decision making using data mining
  - Information processing, analytical processing, and data mining (DM)
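The back-end steps named above (extraction, cleaning, transformation, load, refresh) can be illustrated with a toy Python pipeline. The source records, field names, and cleaning rules are hypothetical; real back-end tools handle far more cases. The sketch only shows how the five steps relate to each other.

```python
def extract(sources):
    """Gather raw rows from several (hypothetical) sources."""
    return [row for source in sources for row in source]


def clean(rows):
    """Drop rows with a missing amount and normalize customer names."""
    return [
        {**row, "customer": row["customer"].strip().title()}
        for row in rows
        if row.get("amount") is not None
    ]


def transform(rows):
    """Convert to the warehouse format: amounts stored as integer cents."""
    return [
        {"id": r["id"], "customer": r["customer"],
         "amount_cents": round(r["amount"] * 100)}
        for r in rows
    ]


def load(warehouse, rows):
    """Insert or overwrite rows in the warehouse, keyed by id."""
    for row in rows:
        warehouse[row["id"]] = row


def refresh(warehouse, sources):
    """Propagate source updates by re-running the whole pipeline."""
    load(warehouse, transform(clean(extract(sources))))


# Two hypothetical operational sources with inconsistent formatting.
crm = [
    {"id": 1, "customer": "  alice smith ", "amount": 12.5},
    {"id": 2, "customer": "BOB JONES", "amount": None},
]
web = [{"id": 3, "customer": "carol lee", "amount": 8.0}]

warehouse = {}
refresh(warehouse, [crm, web])
```

After this first run, the row with the missing amount is absent from `warehouse`. If the source later fills in that amount, calling `refresh` again brings the row in.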