Data Mining and Data Warehousing: Concepts and Techniques Conceptual Modeling of Data Warehouses Defining a Snowflake Schema in Data Mining Query Language.

Slides:



Advertisements
Similar presentations
Outline What is a data warehouse? A multi-dimensional data model Data warehouse architecture Data warehouse implementation Further development of data.
Advertisements

April 30, Data Warehousing and OLAP Technology: An Overview  What is a data warehouse?  Data warehouse architecture  From data warehousing to.
1 Copyright Jiawei Han. Modified by Charles Ling for CS411a/538a, UWO, Data Mining and Data Warehousing v Introduction v Data warehousing and OLAP.
Data Warehousing.
Data Warehousing Willem Visser RW334. Somebody is watching! Everybody seems to be recording your every move Loyalty cards Cookies – Facebook, Twitter,…
Introduction to Data Warehousing CPS Notes 6.
The Role of Data Warehousing and OLAP Technologies CS 536 – Data Mining These slides are adapted from J. Han and M. Kamber’s book slides (
Data Warehouses and OLAP
Data Warehousing Xintao Wu. Evolution of Database Technology (See Fig. 1.1) 1960s: Data collection, database creation, IMS and network DBMS 1970s: Relational.
Data Warehousing.
Dr. M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2010 COMP207: Data Mining Data Warehousing COMP207: Data Mining.
© Tan,Steinbach, Kumar Introduction to Data Mining 8/05/ Data Warehouse and Data Cube Lecture Notes for Chapter 3 Introduction to Data Mining By.
Business Systems Intelligence: 3. Data Warehousing Dr. Brian Mac Namee (
Ch3 Data Warehouse part2 Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009.
1 Data Warehousing and OLAP. 2 Data Warehousing & OLAP Defined in many different ways, but not rigorously.  A decision support database that is maintained.
Ch3 Data Warehouse Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2010.
1 Data Warehouses C hapter 2. 2 Chapter 2 Outline Chapter 2 Outline – Introduction –Data Warehouses –Data Warehouse in Organisation – OLTP vs. OLAP –Why.
August 14, 2015Data Mining: Concepts and Techniques 1 Chapter 3: Data Warehousing and OLAP Technology: An Overview What is a data warehouse? Data warehouse.
Dr. Bernard Chen Ph.D. University of Central Arkansas
8/20/ Data Warehousing and OLAP. 2 Data Warehousing & OLAP Defined in many different ways, but not rigorously. Defined in many different ways, but.
CIS664-Knowledge Discovery and Data Mining
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Decision Support Chapter 23.
8/25/2015Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques — Chapter 3 — Jiawei Han Department of Computer Science University.
August 25, 2015Data Mining: Concepts and Techniques 1 Data Warehousing What is a data warehouse? A multi-dimensional data model Data warehouse architecture.
Data Warehousing and Decision Support courtesy of Jiawei Han, Larry Kerschberg, and etc. for some slides. Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY.
School of Management, HUST
Data warehouses and Online Analytical Processing.
Introduction to Data Mining and Data Warehousing Muhammad Ali Yousuf DSC – ITM Friday, 9 th May 2003.
Data Warehousing Xintao Wu. Can You Easily Answer These Questions? What are Personnel Services costs across all departments for all funding sources? What.
1 Fall 2004, CIS, Temple University CIS527: Data Warehousing, Filtering, and Mining Lecture 2 Data Warehousing and OLAP Technology for Data Mining Lecture.
Roadmap 1.What is the data warehouse, data mart 2.Multi-dimensional data modeling 3.Data warehouse design – schemas, indices 4.The Data Cube operator –
October 28, Data Warehouse Architecture Data Sources Operational DBs other sources Analysis Query Reports Data mining Front-End Tools OLAP Engine.
Knowledge Discovery and Data Mining Vasileios Megalooikonomou Dept. of Computer and Information Sciences Temple University Data Warehousing and OLAP Technology.
Dr. N. MamoulisAdvanced Database Technologies1 Topic 6: Data Warehousing & OLAP Defined in many different ways, but not rigorously. A decision support.
1 CSE 592 Data Mining Instructor: Pedro Domingos.
Data Mining Data Warehouses.
M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2009 This is the full course notes, but not quite complete. You.
2016年1月21日星期四 2016年1月21日星期四 2016年1月21日星期四 Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques — Chapter 3 — Jiawei Han Department.
January 21, 2016Data Mining: Concepts and Techniques 1 Chapter 3: Data Warehousing and OLAP Technology: An Overview What is a data warehouse? A multi-dimensional.
Datawarehousing and OLAP C.Eng 714 Spring
Data Warehousing and OLAP Outline u Models & operations u Implementing a warehouse u Future directions.
June 12, 2016Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques — Chapter 3 — Jiawei Han Department of Computer Science University.
Data Warehouses and OLAP. Data Warehousing and OLAP Technology for Data Mining  What is a data warehouse?  A multi-dimensional data model  Data warehouse.
Data Mining and Data Warehousing: Concepts and Techniques Motivation Evolution of Database Technology - overview Why Data Mining? — Potential Applications.
1 Chapter 4: Data Warehousing and On-line Analytical Processing Data Warehouse: Basic Concepts Data Warehouse Modeling: Data Cube and OLAP Data Warehouse.
Data Mining and Data Warehousing: Concepts and Techniques What is a Data Warehouse? Data Warehouse vs. other systems, OLTP vs. OLAP Conceptual Modeling.
1 Meng Jiang  Postdoctoral Researcher in UIUC since Aug 2015 (Office 2130)  Ph. D. from Tsinghua University, Beijing ( )  Research topic: Data-driven.
Chapter 4. Data Warehousing and On-line Analytical Processing (OLAP)
Data Mining: Data Warehousing
Introduction to Data Warehousing
Data Mining: Concepts and Techniques — Chapter 3 —
Data Mining: Concepts and Techniques
Information Management course
A multi-dimensional data model
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 4 —
Information Management course
OLAP Concepts and Techniques
Data Warehouse.
Data Mining Data Warehousing
Data Warehousing and OLAP Technology for Data Mining
Chapter 2: Data Warehousing and OLAP Technology for Data Mining
Data Warehouse and OLAP
Lecture 4: From Data Cubes to ML
Overview of Data Warehousing and OLAP
Data Warehousing and Decision Support Chapter 25
Data Mining: Concepts and Techniques
Data Warehouse.
Data Warehouse and OLAP
Presented by: Tek Narayan Adhikari
CIS671-Knowledge Discovery and Data Mining
Presentation transcript:

Data Mining and Data Warehousing: Concepts and Techniques Conceptual Modeling of Data Warehouses Defining a Snowflake Schema in Data Mining Query Language DMQL Course outlines

2 Conceptual Modeling of Data Warehouses Modeling data warehouses: dimensions & measurements Star schema: A single object (fact table) in the middle connected to a number of objects (dimension tables) Snowflake schema: A refinement of star schema where the dimensional hierarchy is represented explicitly by normalizing the dimension tables. Fact constellations: Multiple fact tables share dimension tables. A multi-dimensional data model

3 Example of Star Schema Product Day Month Year Date CustId CustName CustCity CustCountry Cust Sales Fact Table Date Store Customer unit_sales dollar_sales Yen_sales ProductNo ProdName ProdDesc Category QOH Product StoreID City State Country Region Store Conceptual Modeling of Data Warehouses Data Cleanin g Data Integration Databases Selection Data Mining Pattern Evaluation Data Warehouse Task- relevant Data Knowledge Measurements

4 Example of Snowflake Schema Day Month Date CustId CustName CustCity CustCountry Cust Measurements ProductNo ProdName ProdDesc Category QOH Product Month Year Month Year City State City Country Region Country State Country State StoreID City Store Data Cleanin g Data Integration Databases Selection Data Mining Pattern Evaluation Data Warehouse Task- relevant Data Knowledge Product Sales Fact Table Date Store Customer unit_sales dollar_sales Yen_sales Conceptual Modeling of Data Warehouses

5 Example of Fact Constellation time_key day day_of_the_week month quarter year time location_key street city province_or_street country location Sales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_type item branch_key branch_name branch_type branch Shipping Fact Table time_key item_key shipper_key from_location to_location dollars_cost units_shipped shipper_key shipper_name location_key shipper_type shipper

6 A Data Mining Query Language: DMQL Language Primitives Cube Definition ( Fact Table ) define cube [ ]: Dimension Definition ( Dimension Table ) define dimension as ( ) Special Case (Shared Dimension Tables)  First time as “cube definition”  define dimension as in cube Data Cleanin g Data Integration Databases Selection Data Mining Pattern Evaluation Data Warehouse Task- relevant Data Knowledge Conceptual Modeling of Data Warehouses

7 Defining a Snowflake Schema in DMQL define cube sales [date, product, store, customer]: dollar_sales = sum(sales_in_dollars), yen_sales = sum(sales_in_yens), unit_sales = count(*) define dimension product as (product_no, prod_name,prod_desc, category, QOH) define dimension cust as (custID, cust_name, cust_city, cust_country) define dimension date as (day, month (month_key, year (year_key) ) ) define dimension store as ( storeID, city ( city_key, state( state_key, country(country_key, region) ) ) ) Conceptual Modeling of Data Warehouses

8 A concept hierarchy: dimension (location) all EuropeNorth_America MexicoCanadaSpainGermany Vancouver M. YoungL. Chan... all region office country TorontoFrankfurtcity

9 View of Warehouses and Hierarchies  Importing data  Table Browsing  Dimension creation  Dimension browsing  Cube building  Cube browsing

10 Multidimensional Data Sales volume as a function of product, month, and region Product Region Month Dimensions: Product, Location, Time Hierarchical summarization paths Industry Region Year Category Country Quarter Product City Month Week Office Day

11 From Tables and Spreadsheets to Data Cubes  A data warehouse is based on a multidimensional data model which views data in the form of a data cube  A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions  Dimension tables, such as item (item_name, brand, type), or time(day, week, month, quarter, year)  Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables  In data warehousing literature, an n-D base cube is called a base cuboid. The top most 0-D cuboid, which holds the highest-level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.

12 Cube: A Lattice of Cuboids all timeitemlocationsupplier time,itemtime,location time,supplier item,location item,supplier location,supplier time,item,location time,item,supplier time,location,supplier item,location,supplier time, item, location, supplier 0-D(apex) cuboid 1-D cuboids 2-D cuboids 3-D cuboids 4-D(base) cuboid

13 A Sample Data Cube Total annual sales of TV in U.S.A. Date Product Country sum TV VCR PC 1Qtr 2Qtr 3Qtr 4Qtr U.S.A Canada Mexico sum A multi-dimensional data model

14 Cuboids Corresponding to the Cube all product date country product,dateproduct,countrydate, country product, date, country 0-D(apex) cuboid 1-D cuboids 2-D cuboids 3-D(base) cuboid

15 Browsing a Data Cube Visualization OLAP capabilities Interactive manipulation

16 Typical OLAP Operations  Roll up (drill-up): summarize data; by climbing up hierarchy or by dimension reduction  Drill down (roll down): reverse of roll-up; from higher level summary to lower level summary or detailed data, or introducing new dimensions  Slice and dice: project and select  Pivot (rotate): reorient the cube, visualization, 3D to series of 2D planes.  Other operations  drill across: involving (across) more than one fact table.  drill through: through the bottom level to its back-end relational tables. Data warehousing

17 OLAP Operations Single CellMultiple CellsSliceDice Roll Up Drill Down

Fruits623 Viande648 1S052S051S062S061S07 Fruits Viande Fruits Viande Pomme ………… Boeuf Alim Roll up Drill down Dimension Produit Dimension Temps Drill down Roll up Drill-up, drill-down Roll down

19 A Star-Net Query Model Shipping Method AIR-EXPRESS TRUCK ORDER Customer Orders CONTRACTS Customer Product PRODUCT GROUP PRODUCT LINE PRODUCT ITEM SALES PERSON DISTRICT DIVISION OrganizationPromotion CITY COUNTRY REGION Location DAILYQTRLYANNUALY Time

Design of a Data Warehouse

21 Design of a Data Warehouse: A Business Analysis Framework Four different views regarding the design of a data warehouse must be considered 1.Top-down view: Allows selection of the relevant information necessary for the data warehouse. 2.Data source view: exposes the information being captured, stored, and managed by operational systems 3.Data warehouse view: fact tables and dimension tables 4.Business query view: perspectives of data in the warehouse from the view of end-user.

22 The Process of Data Warehouse Design  Top-down, bottom-up approaches or a combination of both  Top-down: Starts with overall design and planning (mature)  Bottom-up: Starts with experiments and prototypes (rapid)  From software engineering point of view  Waterfall: structured and systematic analysis at each step before proceeding to the next.  Spiral: rapid generation of increasingly functional systems, short turn around time, quick turn around.  Typical data warehouse design process:  Choose a business process to model, e.g., orders, invoices, etc.  Choose the grain ( atomic level of data ) of the business process  Choose the dimensions that will apply to each fact table record  Choose the measure that will populate each fact table record.

23 OLAP Engine Data Sources Front-End Tools Data Storage Extract Transform Load ETL Refresh Data Warehouse Analysis Query Reports Data mining Monitor & Integrator Metadata Serve Data Marts Operational DBs Other sources OLAP Server Data warehouse software architecture Multi-Tiered Architecture

24 Three-Tier Data Warehouse Architecture  Enterprise warehouse: collects all of the information about subjects spanning the entire organization.  Data Mart: a subset of corporate-wide data that is of value to a specific groups of users.  Its scope is confined to specific, selected groups, such as marketing data mart.  Independent vs. dependent (directly from warehouse) data mart  Virtual warehouse:  A set of views over operational databases.  Only some of the possible summary views may be materialized. Extract Transform Load ETL Refresh Data Warehouse Analysis Query Reports Data mining Monitor & Integrator Metadata Serv e Data Marts Operational DBs Other sourc es OLAP Server Data warehouse software architecture

25 Data Warehouse Development: A Recommended Approach Define a high-level corporate data model Data Mart Distributed Data Marts Multi-Tier Data Warehouse Enterprise Data Warehouse Model refinement

26 Approaches to Building Warehouses … OLAP Server Architectures Relational OLAP (ROLAP):  relational database system tuned for star schemas, e.g., using special index structures such as:  “Bitmap indexes” (for each key of a dimension table, e.g., bar name, a bit-vector telling which tuples of the fact table have that value).  Materialized views = answers to general queries from which more specific queries can be answered with less work than if we had to work from the raw data.  Use relational or extended-relational DBMS to store and manage warehouse data and OLAP middle ware to support missing pieces.  Include optimization of DBMS backend, implementation of aggregation navigation logic, and additional tools and services Multidimensional OLAP (MOLAP) - A specialized model based on a “cube” view of data.  Array-based multidimensional storage engine (sparse matrix techniques)  fast indexing to pre-computed summarized data Hybrid OLAP (HOLAP) - User flexibility, e.g., low level: relational, high-level: array. Specialized SQL servers - specialized support for SQL queries over star, snowflake schemas Building Data Warehouses

27 Metadata Repository  Description the structure of the warehouse - schema, view, dimensions, hierarchies, derived data defn, data mart locations and contents  Operational meta-data - data lineage (history of migrated data and transformation path), currency of data (active, archived, or purged), monitoring information (warehouse usage statistics, error reports, audit trails)  The algorithms used for summarization  The mapping from operational environment to the data warehouse  Data related to system performance - warehouse schema, view and derived data definitions  Business data - business terms and definitions, ownership of data, charging policies Meta data are the data defining warehouse objects Building Data Warehouses

28 Data Warehouse Back-End Tools and Utilities Data Extraction: get data from multiple, heterogeneous, and external sources Data cleaning: detect errors in the data and rectify them when possible Data Transformation: convert data from legacy or host format to warehouse format Load: sort, summarize, consolidate, compute views, check integrity, and build indices and partitions Refresh: propagate the updates from the data sources to the warehouse Extract Transform Load ETL Refresh Data Warehouse Analysis Query Reports Data mining Monitor & Integrator Metadata Serv e Data Marts Operational DBs Other sourc es OLAP Server Building Data Warehouses

29 Data Warehouse Usage Three kinds of data warehouse applications Information processing - supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs Analytical processing  multidimensional analysis of data warehouse data  supports basic OLAP operations, slice-dice, drilling, pivoting Data mining  knowledge discovery from hidden patterns  supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools. From data warehousing to data mining

30 From On-Line Analytical Processing OLAP to On Line Analytical Mining OLAM  Why online analytical mining? High quality of data in data warehouses  DW contains integrated, consistent, cleaned data  Available information processing structure surrounding data warehouses  ODBC, OLEDB, Web accessing, service facilities, reporting and OLAP tools  OLAP-based exploratory data analysis  mining with drilling, dicing, pivoting, etc.  On-line selection of data mining functions  integration and swapping of multiple mining functions, algorithms, and tasks.  Architecture of OLAM From data warehousing to data mining

31 An OLAM Architecture Layer4 User Interface Meta Data MDDB OLAM Engine OLAP Engine User GUI API Data Cube API Database API Data cleaning Data integration Layer3 OLAP/OLAM Layer2 MDDB Layer1 Data Repository Filtering & Integration Filtering Mining query Mining result On Line Analytical Mining From data warehousing to data mining Databases Data Warehouse