
Roadmap
1. What is the data warehouse, data mart
2. Multi-dimensional data modeling
3. Data warehouse design – schemas, indices
4. The Data Cube operator – semantics and computation
5. Aggregate View Selection

Why Not Use an Existing DB?
A DBMS is for On-Line Transaction Processing (OLTP)
– automates day-to-day operations (purchasing, banking, etc.)
A data warehouse is for On-Line Analytical Processing (OLAP)
– needs historical data for trend analysis

OLTP vs. OLAP

Examples of OLAP
Comparisons (this period vs. last period)
– Show me the sales per store for this year and compare them to those of the previous year to identify discrepancies
Ranking and statistical profiles (top N/bottom N)
– Show me sales, profit and average call volume per day for my 10 most profitable salespeople
Custom consolidation (market segments, ad hoc groups)
– Show me an abbreviated income statement by quarter for the last four quarters for my northeast region operations

Multidimensional Modeling
Example: compute total sales volume per product and store
[Table: Store, Product, Total Sales. Figure: Product x Store grid of totals.]
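The example above (total sales volume per product and store) can be sketched in a few lines of Python. This is only an illustration: the rows in `sales` and the helper name `total_sales` are assumptions, not data or code from the slides.

```python
from collections import defaultdict

# Hypothetical fact rows: (store, product, amount). Values are illustrative only.
sales = [
    ("S1", "DVD", 300),
    ("S1", "PC", 500),
    ("S2", "DVD", 200),
    ("S1", "DVD", 100),
]

def total_sales(rows):
    """Sum the amount for every (store, product) pair."""
    totals = defaultdict(int)
    for store, product, amount in rows:
        totals[(store, product)] += amount
    return dict(totals)

totals = total_sales(sales)
```

Each key of `totals` corresponds to one cell of the Product x Store grid.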

From Tables and Spreadsheets to Data Cubes
In general, a multidimensional data model views data in the form of a data cube
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
– Dimension tables, such as item (item_name, brand, type), or time (day, week, month, quarter, year)
– Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
In data warehousing literature, an n-D base cube is called a base cuboid
The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid
The lattice of cuboids forms a data cube
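The relationship between a fact table and its dimension tables can be illustrated with a small sketch. The attribute names (item_name, brand, type, day, month, quarter, year, dollars_sold) follow the slide; the surrogate keys, the sample rows and the helper `measure_by` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical dimension tables, indexed by an assumed surrogate key.
item_dim = {
    1: {"item_name": "DVD player", "brand": "BrandA", "type": "video"},
    2: {"item_name": "Desktop PC", "brand": "BrandB", "type": "computer"},
}
time_dim = {
    10: {"day": 5, "month": 8, "quarter": 3, "year": 2015},
    11: {"day": 9, "month": 11, "quarter": 4, "year": 2015},
}

# Fact table: one measure plus a key into each dimension table.
sales_fact = [
    {"item_key": 1, "time_key": 10, "dollars_sold": 250.0},
    {"item_key": 2, "time_key": 10, "dollars_sold": 900.0},
    {"item_key": 1, "time_key": 11, "dollars_sold": 150.0},
]

def measure_by(fact_rows, dim_table, key_column, attribute, measure="dollars_sold"):
    """Follow the fact table's key into one dimension and sum the measure per attribute value."""
    totals = defaultdict(float)
    for row in fact_rows:
        dim_row = dim_table[row[key_column]]
        totals[dim_row[attribute]] += row[measure]
    return dict(totals)

by_type = measure_by(sales_fact, item_dim, "item_key", "type")
by_quarter = measure_by(sales_fact, time_dim, "time_key", "quarter")
```

Keeping descriptive attributes in the dimension tables means the fact table stores only keys and measures, and any attribute can serve as a grouping level.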

Cube: A Lattice of Cuboids
0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)
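The lattice can be enumerated mechanically: with n dimensions and no hierarchies, every subset of the dimensions is one cuboid, giving 2^n cuboids (16 for the four dimensions here). A minimal sketch, where the function name `cuboids` is an assumption:

```python
from itertools import combinations

def cuboids(dimensions):
    """Group every subset of the dimensions by its size (0-D apex up to n-D base)."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }

lattice = cuboids(["time", "item", "location", "supplier"])
total = sum(len(level) for level in lattice.values())
```

Here `lattice[0]` holds the single apex cuboid (the empty grouping, i.e. "all") and `lattice[4]` holds the single base cuboid.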

Dimensions and Hierarchies
DIMENSIONS: product, city, month
PRODUCT hierarchy: category > product
LOCATION hierarchy: region > country > state > city > store
TIME hierarchy: year > quarter > month, week > day
A cell in the cube may store values (measurements) relative to the combination of the labeled dimensions
Example cell (Hyd, DVD, August): sales of DVDs in Hyd in August

Common OLAP Operations
Roll-up: move up the hierarchy
– e.g., given total sales per city, we can roll up to get sales per state
Drill-down: move down the hierarchy
– more fine-grained aggregation
[Figure: PRODUCT, LOCATION and TIME dimension hierarchies]
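Roll-up from city to state can be sketched as re-aggregating per-city totals through a child-to-parent mapping. The city names, state names, amounts and the helper `roll_up` below are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical hierarchy level mapping (city -> state) and per-city totals.
city_to_state = {
    "Hyderabad": "Telangana",
    "Warangal": "Telangana",
    "Mumbai": "Maharashtra",
}
sales_per_city = {"Hyderabad": 500, "Warangal": 120, "Mumbai": 700}

def roll_up(totals, parent_of):
    """Move one level up the hierarchy by summing child totals per parent."""
    rolled = defaultdict(int)
    for child, amount in totals.items():
        rolled[parent_of[child]] += amount
    return dict(rolled)

sales_per_state = roll_up(sales_per_city, city_to_state)
```

This works because SUM can be computed from partial sums. Drill-down goes the other way and cannot be derived from the coarser totals alone; it needs the finer-grained data.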

Pivoting
Pivoting: aggregate on selected dimensions
– usually 2 dimensions (cross-tabulation)
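A cross-tabulation on two selected dimensions can be sketched as a nested dictionary, with one dimension on the rows and the other on the columns. The sample rows and the helper `pivot` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (store, product, quarter, amount).
rows = [
    ("S1", "DVD", "Q1", 100),
    ("S1", "PC", "Q1", 400),
    ("S2", "DVD", "Q2", 250),
    ("S1", "DVD", "Q2", 50),
]

def pivot(rows, row_index, col_index, value_index):
    """Aggregate the value over all other dimensions, keeping only the two selected ones."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_index]][r[col_index]] += r[value_index]
    return {row_key: dict(cols) for row_key, cols in table.items()}

# Stores on the rows, products on the columns; quarter is aggregated away.
crosstab = pivot(rows, 0, 1, 3)
```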

Slice and Dice Queries
Slice and Dice: select and project on one or more dimensions
[Figure: cube with dimensions product, customers, store; slice at customer = “Kalam”]
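Selection and projection on dimensions can be sketched as two small filters. The customer value "Kalam" comes from the slide; the other rows and the helper names `slice_cube` and `dice` are assumptions.

```python
# Hypothetical cube cells stored as rows.
sales = [
    {"product": "DVD", "store": "S1", "customer": "Kalam", "amount": 100},
    {"product": "PC", "store": "S2", "customer": "Kalam", "amount": 900},
    {"product": "DVD", "store": "S1", "customer": "Rao", "amount": 60},
]

def slice_cube(rows, dimension, value):
    """Slice: fix one dimension to a single value and project that dimension away."""
    return [
        {k: v for k, v in row.items() if k != dimension}
        for row in rows
        if row[dimension] == value
    ]

def dice(rows, conditions):
    """Dice: keep rows whose values fall inside the allowed set for each listed dimension."""
    return [
        row for row in rows
        if all(row[dim] in allowed for dim, allowed in conditions.items())
    ]

kalam_slice = slice_cube(sales, "customer", "Kalam")
dvd_at_s1 = dice(sales, {"product": {"DVD"}, "store": {"S1"}})
```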

The Data Cube Operator (Gray et al.)
All previous aggregates in a single query:
    SELECT LOCATION.store, SALES.product_key, SUM(amount)
    FROM SALES, LOCATION
    WHERE SALES.location_key = LOCATION.location_key
    CUBE BY SALES.product_key, LOCATION.store
or
    CUBE product_key, store BY SUM(SALES.amount)
Challenge: Optimize Aggregate Computation

Relational View of Data Cube
    SELECT LOCATION.store, SALES.product_key, SUM(amount)
    FROM SALES, LOCATION
    WHERE SALES.location_key = LOCATION.location_key
    CUBE BY SALES.product_key, LOCATION.store

Store   Product_key   sum(amount)
1       ALL           1379
2       ALL           1268
3       ALL           536
4       ALL           1937
ALL     1             1870
ALL     2             800
ALL     3             780
ALL     4             1670
ALL     ALL           5120
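The relational view can be reproduced by treating the cube as the union of one GROUP BY per subset of the grouping attributes, with ALL filling every attribute that was aggregated away. The sketch below is a naive illustration of those semantics, not the optimized computation the slides call for; the sample rows and the function name `cube` are assumptions.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder value for an aggregated-away attribute

def cube(rows, dims, measure):
    """Naive CUBE: run one group-by per subset of dims and merge the results."""
    totals = defaultdict(int)
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            for row in rows:
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)

# Hypothetical joined SALES/LOCATION rows.
sales = [
    {"store": 1, "product_key": 1, "amount": 10},
    {"store": 1, "product_key": 2, "amount": 20},
    {"store": 2, "product_key": 1, "amount": 5},
]

result = cube(sales, ["store", "product_key"], "amount")
```

Every group-by here rescans the base rows. For SUM, a coarser cuboid could instead be computed from an already-computed finer one, which is one direction the optimization challenge points to.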

Data Cube: Multidimensional View
[Figure: 3-D cube with dimensions Quarter (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (DVD, VCR, PC, sum) and Region (America, Europe, Asia, sum). Highlighted cell: total annual sales of DVDs in America.]

Other Extensions to SQL
Complex aggregation at multiple granularities (Ross et al., 1998)
– Compute multiple dependent aggregates
Other proposals: the MD-join operator (Chatziantoniou et al., 1999)
    SELECT LOCATION.store, SALES.product_key, SUM(amount)
    FROM SALES, LOCATION
    WHERE SALES.location_key = LOCATION.location_key
    CUBE BY SALES.product_key, LOCATION.store: R
    SUCH THAT R.amount = max(amount)