© Tan,Steinbach, Kumar Introduction to Data Mining 8/05/2005 1 Data Warehouse and Data Cube Lecture Notes for Chapter 3 Introduction to Data Mining By.


Data Warehouse and Data Cube
Lecture Notes for Chapter 3, Introduction to Data Mining, by Tan, Steinbach, Kumar, and Data Mining, by Han and Kamber, 2nd Edition
Revised by QY

OLAP
• On-Line Analytical Processing (OLAP) was proposed by E. F. Codd, the father of the relational database.
• Relational databases put data into tables, while OLAP uses a multidimensional array representation.
  – Such representations of data previously existed in statistics and other fields.
• There are a number of data analysis and data exploration operations that are easier with such a data representation.

Creating a Multidimensional Array
• There are two key steps in converting tabular data into a multidimensional array.
  – First, identify which attributes are to be the dimensions and which attribute is to be the target attribute whose values appear as entries in the multidimensional array.
    ▪ The attributes used as dimensions must have discrete values.
    ▪ The target value is typically a count or continuous value, e.g., the cost of an item.
    ▪ There can be no target variable at all except the count of objects that have the same set of attribute values.
  – Second, find the value of each entry in the multidimensional array by summing the values (of the target attribute) or the count of all objects that have the attribute values corresponding to that entry.
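The two steps above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `build_array`, the record layout, and the sample records are all assumptions made for the example, and the array is stored sparsely as a dictionary keyed by tuples of dimension values.

```python
from collections import defaultdict

def build_array(records, dims, target=None):
    """Map each combination of dimension values to a summed target (or a count)."""
    cells = defaultdict(float if target else int)
    for rec in records:
        # Step 1: the chosen dimension attributes form the cell coordinates.
        key = tuple(rec[d] for d in dims)
        # Step 2: sum the target attribute, or count objects if there is no target.
        cells[key] += rec[target] if target else 1
    return dict(cells)

# Hypothetical sales records, invented for illustration only.
records = [
    {"product": "pen", "store": "north", "cost": 2.0},
    {"product": "pen", "store": "north", "cost": 3.0},
    {"product": "ink", "store": "south", "cost": 5.0},
]

sums = build_array(records, dims=("product", "store"), target="cost")
counts = build_array(records, dims=("product", "store"))
```

Calling the function without a target corresponds to the case where the only entry is the count of objects sharing the same attribute values.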

Example: Iris data
• We show how the attributes petal length, petal width, and species type from the Iris data can be converted to a multidimensional array.
  – First, we discretize the petal width and length to have categorical values: low, medium, and high.
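A discretization step of this kind might look like the sketch below. The slide does not state the cut points, so the thresholds used here are illustrative assumptions, as are the function and constant names.

```python
def discretize(value, low_upper, medium_upper):
    """Return 'low', 'medium', or 'high' for a continuous value."""
    if value < low_upper:
        return "low"
    if value < medium_upper:
        return "medium"
    return "high"

# Cut points are illustrative assumptions (in cm), not values given on the slide.
PETAL_WIDTH_CUTS = (0.75, 1.75)
PETAL_LENGTH_CUTS = (2.5, 5.0)

width_label = discretize(0.2, *PETAL_WIDTH_CUTS)
length_label = discretize(4.7, *PETAL_LENGTH_CUTS)
```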

Example: Iris data (continued)
• Each unique tuple of petal width, petal length, and species type identifies one element of the array.
• This element is assigned the corresponding count value.
• The figure illustrates the result.
• All non-specified tuples are 0.
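One way to picture the rule that non-specified tuples are 0 is a counter keyed by the (width, length, species) tuple. The flowers listed below are hypothetical, chosen only to keep the example small.

```python
from collections import Counter

# Hypothetical discretized flowers: (petal width, petal length, species).
flowers = [
    ("low", "low", "Setosa"),
    ("low", "low", "Setosa"),
    ("medium", "medium", "Versicolour"),
    ("high", "high", "Virginica"),
]

# Each unique tuple becomes one array element holding its count.
cube = Counter(flowers)

present = cube[("low", "low", "Setosa")]
absent = cube[("high", "low", "Setosa")]  # a Counter returns 0 for unseen tuples
```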

OLAP Operations: Data Cube
• The key operation of OLAP is the formation of a data cube.
  – A data cube is a multidimensional representation of data, together with all possible aggregates.
  – Aggregates (similar to a class attribute):
    ▪ An aggregate results from selecting a proper subset of the dimensions and summing over all remaining dimensions.
    ▪ Aggregates are cached to improve speed and support online computation.
  – For example, if we choose the species type dimension of the Iris data and sum over all other dimensions, the result will be a one-dimensional array with three entries, each of which gives the number of flowers of each type.
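The idea of "all possible aggregates" can be sketched by looping over every proper subset of the dimensions and summing out the rest. The function name `all_aggregates` and the cell counts are assumptions for illustration; a real OLAP engine would compute and cache these far more efficiently.

```python
from collections import Counter
from itertools import combinations

def all_aggregates(cells, dims):
    """Sum the cell values over every proper subset of the dimensions."""
    aggregates = {}
    for k in range(len(dims)):  # k < len(dims), so only proper subsets are kept
        for kept in combinations(range(len(dims)), k):
            totals = Counter()
            for key, value in cells.items():
                totals[tuple(key[i] for i in kept)] += value
            aggregates[tuple(dims[i] for i in kept)] = dict(totals)
    return aggregates

dims = ("width", "length", "species")
cells = {  # hypothetical counts
    ("low", "low", "Setosa"): 46,
    ("medium", "medium", "Versicolour"): 43,
    ("high", "high", "Virginica"): 44,
    ("medium", "high", "Virginica"): 3,
}

aggs = all_aggregates(cells, dims)
by_species = aggs[("species",)]   # one-dimensional array with three entries
grand_total = aggs[()][()]        # summing over every dimension
```

With three dimensions there are seven proper subsets, so `aggs` holds seven aggregates, from the grand total up to the three two-dimensional views.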

From Tables and Spreadsheets to Data Cubes
• A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
  – Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year).
  – The fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables.
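The relationship between a fact table and its dimension tables can be illustrated with plain dictionaries. The attribute names item_name, brand, type, day, month, quarter, year, and dollars_sold come from the slide; the key names, the rows, and the helper `total_by` are hypothetical.

```python
# Hypothetical dimension tables, keyed by surrogate keys.
item_dim = {
    1: {"item_name": "TV-100", "brand": "Acme", "type": "TV"},
    2: {"item_name": "PC-200", "brand": "Acme", "type": "PC"},
}
time_dim = {
    10: {"day": 1, "month": 1, "quarter": "Q1", "year": 2005},
    11: {"day": 2, "month": 4, "quarter": "Q2", "year": 2005},
}

# Fact table: keys into each dimension table plus the measure dollars_sold.
sales_fact = [
    {"item_key": 1, "time_key": 10, "dollars_sold": 400.0},
    {"item_key": 2, "time_key": 10, "dollars_sold": 900.0},
    {"item_key": 1, "time_key": 11, "dollars_sold": 450.0},
]

def total_by(fact_rows, item_attr, time_attr):
    """Look up one attribute per dimension and sum dollars_sold per pair."""
    totals = {}
    for row in fact_rows:
        key = (item_dim[row["item_key"]][item_attr],
               time_dim[row["time_key"]][time_attr])
        totals[key] = totals.get(key, 0.0) + row["dollars_sold"]
    return totals

by_type_quarter = total_by(sales_fact, "type", "quarter")
```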

Cube: A Lattice of Cuboids
[Figure: lattice of cuboids for the dimensions time, item, location, and supplier]
• 0-D (apex) cuboid: all
• 1-D cuboids: time; item; location; supplier
• 2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
• 3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
• 4-D (base) cuboid: (time, item, location, supplier)
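The lattice can be enumerated mechanically, since each cuboid is one subset of the dimensions. The sketch below (the function name `cuboids` is an assumption) groups the subsets by their number of dimensions; the empty tuple stands for the apex cuboid "all".

```python
from itertools import combinations

def cuboids(dims):
    """Group every subset of dims by its number of dimensions."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

lattice = cuboids(("time", "item", "location", "supplier"))

# Four dimensions (ignoring concept hierarchies) give 2**4 cuboids in total.
total = sum(len(level) for level in lattice.values())
```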

A Concept Hierarchy: Dimension (location)
[Figure: tree for the location dimension, with levels all, region, country, city, office]
• all: all
• region: Europe, North_America
• country: Germany, Spain, Canada, Mexico
• city: Frankfurt, Vancouver, Toronto
• office: L. Chan, M. Wind, ...
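A concept hierarchy of this kind can be represented as child-to-parent links. The sketch below covers only the region, country, and city levels named on the slide; the office level is left out because the transcript does not say which city each office belongs to. The names `PARENT` and `ancestors` are hypothetical.

```python
# Child -> parent links for part of the location hierarchy.
PARENT = {
    "Vancouver": "Canada", "Toronto": "Canada", "Frankfurt": "Germany",
    "Canada": "North_America", "Mexico": "North_America",
    "Germany": "Europe", "Spain": "Europe",
    "North_America": "all", "Europe": "all",
}

def ancestors(value):
    """Return the path from a value up to the root 'all'."""
    path = [value]
    while path[-1] != "all":
        path.append(PARENT[path[-1]])
    return path

toronto_path = ancestors("Toronto")
```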

A Sample Data Cube
[Figure: a three-dimensional data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A., Canada, Mexico, sum). One annotated cell: total annual sales of TV in U.S.A.]

Cuboids Corresponding to the Cube
• 0-D (apex) cuboid: all
• 1-D cuboids: product; date; country
• 2-D cuboids: (product, date); (product, country); (date, country)
• 3-D (base) cuboid: (product, date, country)

Data Cube Example (continued)
• The following table shows one of the two-dimensional aggregates, along with two of the one-dimensional aggregates, and the overall total.

OLAP Operations: Slicing and Dicing
• Slicing is selecting a group of cells from the entire multidimensional array by specifying a specific value for one or more dimensions.
• Dicing involves selecting a subset of cells by specifying a range of attribute values.
  – This is equivalent to defining a subarray from the complete array.
• In practice, both operations can also be accompanied by aggregation over some dimensions.
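Both operations can be sketched with one filtering helper over a sparse array stored as a dictionary. The helper name `select`, the dimension names, and the sales figures are assumptions for illustration. A slice passes a single value per chosen dimension, while a dice passes a set of values per dimension.

```python
def select(cells, dims, **allowed):
    """Keep cells whose value on each named dimension is in the allowed set."""
    index = {d: i for i, d in enumerate(dims)}
    return {
        key: value
        for key, value in cells.items()
        if all(key[index[d]] in values for d, values in allowed.items())
    }

dims = ("product", "quarter", "country")
cells = {  # hypothetical sales figures
    ("TV", "1Qtr", "U.S.A"): 10, ("TV", "2Qtr", "Canada"): 20,
    ("PC", "1Qtr", "U.S.A"): 30, ("VCR", "3Qtr", "Mexico"): 40,
}

# Slice: fix one dimension to a single value.
slice_tv = select(cells, dims, product={"TV"})

# Dice: restrict several dimensions to ranges (sets) of values.
dice = select(cells, dims, product={"TV", "PC"}, quarter={"1Qtr", "2Qtr"})
```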

OLAP Operations: Roll-up and Drill-down
• Attribute values often have a hierarchical structure (for example, each date belongs to a month, and each city to a country). This hierarchical structure gives rise to the roll-up and drill-down operations.
  – For sales data, we can aggregate (roll up) the sales across all the dates in a month.
  – Conversely, given a view of the data where the time dimension is broken into months, we could split the monthly sales totals (drill down) into daily sales totals.
  – Likewise, we can drill down or roll up on the location or product ID attributes.
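A minimal sketch of rolling up and drilling down along the time dimension follows. The daily figures and the function names are hypothetical. Note that drilling down is only possible here because the daily detail is still available; a monthly total alone cannot be split back into days.

```python
from collections import defaultdict
from datetime import date

daily_sales = {  # hypothetical daily totals
    date(2005, 1, 3): 100.0,
    date(2005, 1, 17): 50.0,
    date(2005, 2, 1): 70.0,
}

def roll_up_to_month(daily):
    """Aggregate daily totals into (year, month) totals."""
    monthly = defaultdict(float)
    for day, amount in daily.items():
        monthly[(day.year, day.month)] += amount
    return dict(monthly)

def drill_down(daily, year, month):
    """Return the daily totals that make up one month's total."""
    return {d: v for d, v in daily.items() if (d.year, d.month) == (year, month)}

monthly = roll_up_to_month(daily_sales)
january_days = drill_down(daily_sales, 2005, 1)
```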