Introduction to Data Warehousing CPS 196.03 Notes 6.

Slides:



Advertisements
Similar presentations
Data Warehousing.
Advertisements

Data Warehousing CPS216 Notes 13 Shivnath Babu. 2 Warehousing l Growing industry: $8 billion way back in 1998 l Range from desktop to huge: u Walmart:
Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #15.
ICS 421 Spring 2010 Data Warehousing 2 Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/30/20101Lipyeow.
The Role of Data Warehousing and OLAP Technologies CS 536 – Data Mining These slides are adapted from J. Han and M. Kamber’s book slides (
1 CIS *717.2 Data Warehouse Design Week 2 Dimensional Modeling Primer Data Warehouse Models OLAP Operations Instructor Carmela R. Balassiano Feb 5, Spring.
Data Warehousing Overview
Data Warehousing & OLAP
Data Warehouses and OLAP
Data Warehousing Xintao Wu. Evolution of Database Technology (See Fig. 1.1) 1960s: Data collection, database creation, IMS and network DBMS 1970s: Relational.
Data Warehousing.
1 1 Data Warehousing Decision-Support Systems  Data Analysis  OLAP  Extended aggregation features in SQL –Windowing and ranking  Implementation Techniques.
Data Warehousing and OLAP
Dr. M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2010 COMP207: Data Mining Data Warehousing COMP207: Data Mining.
1 Lecture 10: More OLAP - Dimensional modeling
Business Systems Intelligence: 3. Data Warehousing Dr. Brian Mac Namee (
Lab3 CPIT 440 Data Mining and Warehouse.
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
Ch3 Data Warehouse part2 Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009.
1 Data Warehousing and OLAP. 2 Data Warehousing & OLAP Defined in many different ways, but not rigorously.  A decision support database that is maintained.
Chapter 4 Tutorial.
Tanvi Madgavkar CSE 7330 FALL Ralph Kimball states that : A data warehouse is a copy of transaction data specifically structured for query and analysis.
CS346: Advanced Databases
1 Data Warehouses C hapter 2. 2 Chapter 2 Outline Chapter 2 Outline – Introduction –Data Warehouses –Data Warehouse in Organisation – OLTP vs. OLAP –Why.
Dr. Bernard Chen Ph.D. University of Central Arkansas
8/20/ Data Warehousing and OLAP. 2 Data Warehousing & OLAP Defined in many different ways, but not rigorously. Defined in many different ways, but.
8/25/2015Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques — Chapter 3 — Jiawei Han Department of Computer Science University.
August 25, 2015Data Mining: Concepts and Techniques 1 Data Warehousing What is a data warehouse? A multi-dimensional data model Data warehouse architecture.
Data Warehousing and Decision Support courtesy of Jiawei Han, Larry Kerschberg, and etc. for some slides. Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY.
An overview of Data Warehousing and OLAP Technology
School of Management, HUST
Introduction to Data Mining and Data Warehousing Muhammad Ali Yousuf DSC – ITM Friday, 9 th May 2003.
Data Warehouses and OLAP — Slides for Textbook — — Chapter 2 —
1 Data Warehousing and Decision Support. 2 Data Warehousing and OLAP Technology What is a data warehouse? A multi-dimensional data model Data warehouse.
Data Warehousing Xintao Wu. Can You Easily Answer These Questions? What are Personnel Services costs across all departments for all funding sources? What.
1 Fall 2004, CIS, Temple University CIS527: Data Warehousing, Filtering, and Mining Lecture 2 Data Warehousing and OLAP Technology for Data Mining Lecture.
Roadmap 1.What is the data warehouse, data mart 2.Multi-dimensional data modeling 3.Data warehouse design – schemas, indices 4.The Data Cube operator –
October 28, Data Warehouse Architecture Data Sources Operational DBs other sources Analysis Query Reports Data mining Front-End Tools OLAP Engine.
Data Warehousing and OLAP. Warehousing ► Growing industry: $8 billion in 1998 ► Range from desktop to huge:  Walmart: 900-CPU, 2,700 disk, 23TB Teradata.
Dr. N. MamoulisAdvanced Database Technologies1 Topic 6: Data Warehousing & OLAP Defined in many different ways, but not rigorously. A decision support.
Data Warehousing Overview CS245 Notes 11 Hector Garcia-Molina Stanford University CS Notes11.
Data Mining Data Warehouses.
Data warehousing, data analysis and OLAP Sunita Sarawagi
Managing Data for DSS II. Managing Data for DS Data Warehouse Common characteristics : –Database designed to meet analytical tasks comprising of data.
January 21, 2016Data Mining: Concepts and Techniques 1 Chapter 3: Data Warehousing and OLAP Technology: An Overview What is a data warehouse? A multi-dimensional.
مثال ‌ هایی از شماهای پایگاه داده تحلیلی سید حسن فیروزآبادی.
Datawarehousing and OLAP C.Eng 714 Spring
Introduction to OLAP and Data Warehouse Assoc. Professor Bela Stantic September 2014 Database Systems.
Data Warehousing and OLAP Outline u Models & operations u Implementing a warehouse u Future directions.
Data Warehouses and OLAP. Data Warehousing and OLAP Technology for Data Mining  What is a data warehouse?  A multi-dimensional data model  Data warehouse.
1 Advanced Database Systems: DBS CB, 2 nd Edition Data Warehouse, OLAP, Data Mining Ch , Ch. 22.
CSE6011 Implementing a Warehouse  Monitoring: Sending data from sources  Integrating: Loading, cleansing,...  Processing: Query processing, indexing,...
Data Mining and Data Warehousing: Concepts and Techniques What is a Data Warehouse? Data Warehouse vs. other systems, OLTP vs. OLAP Conceptual Modeling.
Advanced Database Systems: DBS CB, 2nd Edition
Introduction to Data Warehousing
Information Management course
Data Warehousing CIS 4301 Lecture Notes 4/20/2006.
Data warehouse and OLAP
A multi-dimensional data model
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 4 —
Information Management course
OLAP Concepts and Techniques
Data Warehousing and OLAP Technology for Data Mining
Chapter 2: Data Warehousing and OLAP Technology for Data Mining
Overview of Data Warehousing and OLAP
Data Warehousing and OLAP
Data Warehousing and Decision Support Chapter 25
Data Mining: Concepts and Techniques
Presentation transcript:

Introduction to Data Warehousing CPS Notes 6

2 Warehousing l Growing industry: $30+ billion industry l Range from desktop to huge: u Walmart: 900-CPU, 2,700 disk, 23TB Teradata system (numbers from earlier part of this decade) l Lots of buzzwords, hype u slice & dice, rollup, MOLAP, pivot,...

3 Outline l What is a data warehouse? l Why a warehouse? l Models & operations l Implementing a warehouse

4 What is a Warehouse? l Collection of diverse data u subject oriented u aimed at executive, decision maker u often a copy of operational data u with value-added data (e.g., summaries, history) u integrated u time-varying u non-volatile more

5 What is a Warehouse? l Collection of tools u gathering data u cleansing, integrating,... u querying, reporting, analysis u data mining u monitoring, administering warehouse

6 Warehouse Architecture Client Warehouse Source Query & Analysis Integration Metadata

7 Motivating Examples l Forecasting l Comparing performance of units l Monitoring, detecting fraud l Visualization

8 Why a Warehouse? l Two Approaches: u Query-Driven (Lazy) u Warehouse (Eager) Source ?

9 Query-Driven Approach Client Wrapper Mediator Source

10 Advantages of Warehousing l High query performance l Queries not visible outside warehouse l Local processing at sources unaffected l Can operate when sources unavailable l Can query data not stored in a DBMS l Extra information at warehouse u Modify, summarize (store aggregates) u Add historical information

11 Advantages of Query-Driven l No need to copy data u less storage u no need to purchase data l More up-to-date data l Query needs can be unknown l Only query interface needed at sources l May be less draining on sources

12 OLTP vs. OLAP l OLTP: On Line Transaction Processing u Describes processing at operational sites l OLAP: On Line Analytical Processing u Describes processing at warehouse

13 OLTP vs. OLAP l Mostly updates l Many small transactions l Mb-Gb of data l Raw data l Clerical users l Up-to-date data l Consistency, recoverability critical l Mostly reads l Queries long, complex l Tb-Pb of data l Summarized, consolidated data l Decision-makers, analysts as users OLTP OLAP

14 Data Marts l Smaller warehouses l Spans part of organization u e.g., marketing (customers, products, sales) l Do not require enterprise-wide consensus u but long term integration problems?

15 Warehouse Models & Operators l Data Models u relations u stars & snowflakes u cubes l Operators u slice & dice u roll-up, drill down u pivoting u other

16 Warehouse Models l Modeling data warehouses: dimensions, measures u Star schema: A fact table in the middle connected to a set of dimension tables u Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake u Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation

17 Star Measures

18 Star Schema

19 Another Example of Star Schema time_key day day_of_the_week month quarter year time location_key street city state_or_province country location Sales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_type item branch_key branch_name branch_type branch

20 Terms l Fact table l Dimension tables l Measures

21 Dimension Hierarchies store sType cityregion  snowflake schema  constellations

22 Example of Snowflake Schema time_key day day_of_the_week month quarter year time location_key street city_key location Sales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_key item branch_key branch_name branch_type branch supplier_key supplier_type supplier city_key city state_or_province country city

23 Example of Fact Constellation time_key day day_of_the_week month quarter year time location_key street city province_or_state country location Sales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_type item branch_key branch_name branch_type branch Shipping Fact Table time_key item_key shipper_key from_location to_location dollars_cost units_shipped shipper_key shipper_name location_key shipper_type shipper

24 Cube Fact table view: Multi-dimensional cube: dimensions = 2 Recall counters in Apriori

25 3-D Cube day 2 day 1 dimensions = 3 Multi-dimensional cube:Fact table view:

26 Aggregates Add up amounts for day 1 In SQL: SELECT sum(amt) FROM SALE WHERE date = 1 81

27 Aggregates Add up amounts by day In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date

28 Another Example Add up amounts by day, product In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date, prodId drill-down rollup

29 Aggregates l Operators: sum, count, max, min, median, ave l “Having” clause l Using dimension hierarchy u average by region (within store) u maximum by month (within date)

30 Types of Measures in Data Cubes l Distributive: if the result derived by applying the function to n aggregate values is the same as that derived by applying the function on all the data without partitioning è E.g., count(), sum(), min(), max() l Algebraic: if it can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function è E.g., avg(), min_N(), standard_deviation() l Holistic: if there is no constant bound on the storage size needed to describe a subaggregate. è E.g., median(), mode(), rank()

31 Cube Aggregation day 2 day drill-down rollup Example: computing sums

32 Cube Operators day 2 day sale(c1,*,*) sale(*,*,*) sale(c2,p2,*)

33 Extended Cube day 2 day 1 * sale(*,p2,*)

34 Cube Aggregates Lattice city, product, date city, productcity, dateproduct, date cityproductdate all day 2 day 1 129

35 Dimension Hierarchies all state city

36 Dimension Hierarchies city, product city, product, date city, date product, date city product date all state, product, date state, date state, product state not all arcs shown...

37 Interesting Hierarchy all years quarters months days weeks conceptual dimension table

38 Aggregation Using Hierarchies day 2 day 1 customer region country (customer c1 in Region A; customers c2, c3 in Region B)

39 Multidimensional Data l Sales volume as a function of product, month, and region Product Region Month Dimensions: Product, Location, Time Hierarchical summarization paths Industry Region Year Category Country Quarter Product City Month Week Office Day

40 Typical OLAP Operations Total annual sales of TV in U.S.A. Date Product Country sum TV VCR PC 1Qtr 2Qtr 3Qtr 4Qtr U.S.A Canada Mexico sum

41 Typical OLAP Operations l Roll up (drill-up): summarize data u by climbing up hierarchy or by dimension reduction l Drill down (roll down): reverse of roll-up u from higher level summary to lower level summary or detailed data, or introducing new dimensions l Slice and dice: project and select l Pivot (rotate): u reorient the cube, visualization, 3D to series of 2D planes l Other operations u drill across: involving (across) more than one fact table u drill through: through the bottom level of the cube to its back-end relational tables (using SQL)

42 Fig Typical OLAP Operations

43 Pivoting day 2 day 1 Multi-dimensional cube: Fact table view: Pivot turns unique values from one column into unique columns in the output