Roadmap 1.What is the data warehouse, data mart 2.Multi-dimensional data modeling 3.Data warehouse design – schemas, indices 4.The Data Cube operator –

1 Roadmap
1. What is a data warehouse, a data mart
2. Multi-dimensional data modeling
3. Data warehouse design – schemas, indices
4. The Data Cube operator – semantics and computation
5. Aggregate view selection

2 Why Not Use the Existing DB?
An operational DBMS is for Online Transaction Processing (OLTP)
– automates day-to-day operations (purchasing, banking, etc.)
A data warehouse is for Online Analytical Processing (OLAP)
– needs historical data for trend analysis

3 OLTP vs. OLAP

4 Examples of OLAP
Comparisons (this period vs. last period)
– Show me the sales per store for this year and compare them to those of the previous year to identify discrepancies
Ranking and statistical profiles (top N/bottom N)
– Show me sales, profit and average call volume per day for my 10 most profitable salespeople
Custom consolidation (market segments, ad hoc groups)
– Show me an abbreviated income statement by quarter for the last four quarters for my northeast region operations
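The following minimal Python sketch illustrates the first two kinds of query using the standard-library sqlite3 module. The table sales(store, year, amount), its rows and the function names are hypothetical. To keep to a single table, the ranking query ranks stores rather than salespeople.

```python
import sqlite3

# Hypothetical sales history: (store, year, amount).
ROWS = [
    ("S1", 2022, 100.0), ("S1", 2023, 130.0),
    ("S2", 2022, 200.0), ("S2", 2023, 150.0),
    ("S3", 2022, 50.0), ("S3", 2023, 55.0),
]

def year_over_year(conn, this_year):
    """Comparison query: per-store sales for this_year next to the previous year."""
    return conn.execute(
        """
        SELECT store,
               SUM(CASE WHEN year = ? THEN amount ELSE 0 END) AS this_year,
               SUM(CASE WHEN year = ? THEN amount ELSE 0 END) AS last_year
        FROM sales
        GROUP BY store
        ORDER BY store
        """,
        (this_year, this_year - 1),
    ).fetchall()

def top_n_stores(conn, year, n):
    """Ranking query: the n stores with the highest total sales in the given year."""
    return conn.execute(
        "SELECT store, SUM(amount) AS total FROM sales WHERE year = ? "
        "GROUP BY store ORDER BY total DESC LIMIT ?",
        (year, n),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)

# Flag discrepancies: stores whose sales dropped compared with last year.
declining = [store for store, cur, prev in year_over_year(conn, 2023) if cur < prev]
print(year_over_year(conn, 2023))
print(top_n_stores(conn, 2023, 2))
print(declining)
```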

5 Multidimensional Modeling
Example: compute total sales volume per product and store.
Store  Product  Total Sales
1      1        454
1      4        925
2      1        468
2      2        800
Etc.
(Figure: the same totals drawn as a Product × Store grid.)
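Here is a minimal sketch of the aggregation behind this table. It assumes hypothetical raw transactions (store, product, amount) whose totals match the four rows shown. The split of the 454 into two sales is made up for illustration, and the function name is ours.

```python
from collections import defaultdict

# Hypothetical raw sales transactions: (store, product, amount).
TRANSACTIONS = [
    (1, 1, 200), (1, 1, 254),
    (1, 4, 925),
    (2, 1, 468),
    (2, 2, 800),
]

def total_sales_per_product_and_store(rows):
    """Aggregate the sales measure for every (store, product) combination."""
    totals = defaultdict(int)
    for store, product, amount in rows:
        totals[(store, product)] += amount
    return dict(totals)

for (store, product), total in sorted(total_sales_per_product_and_store(TRANSACTIONS).items()):
    print(store, product, total)
```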

6 From Tables and Spreadsheets to Data Cubes
In general, a multidimensional data model views data in the form of a data cube.
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions:
– Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year)
– A fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
In the data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
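To make the fact/dimension split concrete, here is a small star-schema sketch using Python's built-in sqlite3. Table and column names follow the slide where possible (item, time, dollars_sold). The keys, brands and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes (hypothetical rows).
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE time (time_key INTEGER PRIMARY KEY, day TEXT, month INTEGER, quarter INTEGER, year INTEGER);
-- Fact table: one key per related dimension plus the measure dollars_sold.
CREATE TABLE sales (item_key INTEGER REFERENCES item, time_key INTEGER REFERENCES time, dollars_sold REAL);

INSERT INTO item VALUES (1, 'DVD player X', 'Acme', 'DVD'), (2, 'DVD player Y', 'Acme', 'DVD'), (3, 'Laptop Z', 'Zeta', 'PC');
INSERT INTO time VALUES (1, '2023-01-15', 1, 1, 2023), (2, '2023-04-02', 4, 2, 2023);
INSERT INTO sales VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 1, 900.0), (1, 2, 120.0), (3, 2, 800.0);
""")

# Aggregate the measure by attributes that live in the dimension tables:
# join the fact table to each dimension on its key, then group.
rows = conn.execute("""
SELECT i.brand, t.quarter, SUM(s.dollars_sold)
FROM sales AS s
JOIN item AS i ON s.item_key = i.item_key
JOIN time AS t ON s.time_key = t.time_key
GROUP BY i.brand, t.quarter
ORDER BY i.brand, t.quarter
""").fetchall()
print(rows)
```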

7 Cube: A Lattice of Cuboids
0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
4-D (base) cuboid: time,item,location,supplier
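The lattice is every subset of the dimension set, one cuboid per subset, so n dimensions give 2^n cuboids. The short sketch below enumerates it for the four dimensions above. The helper names are ours, not the lecture's.

```python
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "supplier")

def cuboid_lattice(dims):
    """Map k -> list of k-D cuboids (tuples of dimension names).
    k = 0 is the apex cuboid; k = len(dims) is the base cuboid."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

def computable_from(child, parent):
    """A cuboid can be aggregated from any cuboid that contains all of its dimensions."""
    return set(child) <= set(parent)

lattice = cuboid_lattice(DIMENSIONS)
for k, cuboids in lattice.items():
    # The empty tuple (apex) is printed as "all", as on the slide.
    print(f"{k}-D:", [",".join(c) or "all" for c in cuboids])
```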

8 Dimensions and Hierarchies
Dimensions: product, city, month
Hierarchies (coarsest level first):
– PRODUCT: category > product
– LOCATION: region > country > state > city > store
– TIME: year > quarter > month > day, with week > day
(Figure: a cube over product, city and month; the cell at (DVD, Hyd, August) holds the sales of DVDs in Hyd in August.)
A cell in the cube may store values (measurements) relative to the combination of the labeled dimensions.
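One way to hold such a cube in memory is a sparse map from coordinate tuples to measures, with each hierarchy kept as an ordered list of levels. The sketch below uses hypothetical sales figures and simplifies TIME to a single path. The names are illustrative only.

```python
# Each dimension's hierarchy, from coarsest to finest level.
# TIME is simplified to one path here (week is left out).
HIERARCHIES = {
    "PRODUCT": ["category", "product"],
    "LOCATION": ["region", "country", "state", "city", "store"],
    "TIME": ["year", "quarter", "month", "day"],
}

# A sparse cube at the (product, city, month) level: only non-empty cells are stored.
# The sales figures are hypothetical.
sales_cube = {
    ("DVD", "Hyd", "August"): 1200,
    ("DVD", "Hyd", "September"): 900,
    ("PC", "Hyd", "August"): 5000,
}

def cell(cube, product, city, month):
    """Return the measure stored at one cell, or 0 for an empty cell."""
    return cube.get((product, city, month), 0)

def is_coarser(dimension, level_a, level_b):
    """True if level_a sits above level_b in the dimension's hierarchy."""
    levels = HIERARCHIES[dimension]
    return levels.index(level_a) < levels.index(level_b)

print(cell(sales_cube, "DVD", "Hyd", "August"))  # sales of DVDs in Hyd in August
print(is_coarser("LOCATION", "state", "city"))
```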

9 Common OLAP Operations
Roll-up: move up the hierarchy
– e.g. given total sales per city, we can roll up to get sales per state
Drill-down: move down the hierarchy
– finer-grained aggregation
(Figure: the PRODUCT, LOCATION and TIME hierarchies from the previous slide.)
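Here is a minimal sketch of both operations on one step of the LOCATION hierarchy. The city-level sales figures are hypothetical, and the city-to-state map uses real Indian cities only as labels.

```python
from collections import defaultdict

# Hypothetical base data at the city level.
CITY_SALES = {"Hyderabad": 500, "Warangal": 120, "Chennai": 430, "Madurai": 90}

# One step of the LOCATION hierarchy: city -> state.
CITY_TO_STATE = {
    "Hyderabad": "Telangana", "Warangal": "Telangana",
    "Chennai": "Tamil Nadu", "Madurai": "Tamil Nadu",
}

def roll_up(measures, child_to_parent):
    """Roll-up: re-aggregate a measure one level up the hierarchy."""
    totals = defaultdict(int)
    for member, value in measures.items():
        totals[child_to_parent[member]] += value
    return dict(totals)

def drill_down(measures, child_to_parent, parent):
    """Drill-down: return the finer-grained members behind one aggregated value."""
    return {m: v for m, v in measures.items() if child_to_parent[m] == parent}

state_sales = roll_up(CITY_SALES, CITY_TO_STATE)
print(state_sales)
print(drill_down(CITY_SALES, CITY_TO_STATE, "Telangana"))
```

Drill-down can only go as fine as the data that is kept. Here it works because the city-level totals are still available after the roll-up.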

10 Pivoting
Pivoting: aggregate on selected dimensions
– usually 2 dimensions (cross-tabulation)
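The sketch below pivots (store, product) cells into a two-dimensional cross-tab with row and column totals. It reuses the four (store, product) totals from the Multidimensional Modeling slide. The function name is hypothetical.

```python
# (store, product) -> total sales, from the four rows on the modeling slide.
CELLS = {(1, 1): 454, (1, 4): 925, (2, 1): 468, (2, 2): 800}

def cross_tab(cells, row_keys, col_keys):
    """Pivot (row, col) -> amount cells into a 2-D table with a "sum" row and column."""
    table = {r: {c: cells.get((r, c), 0) for c in col_keys} for r in row_keys}
    for r in row_keys:
        table[r]["sum"] = sum(table[r][c] for c in col_keys)
    # Column totals, including the grand total in the "sum" column.
    table["sum"] = {c: sum(table[r][c] for r in row_keys) for c in list(col_keys) + ["sum"]}
    return table

STORES, PRODUCTS = [1, 2], [1, 2, 4]
table = cross_tab(CELLS, STORES, PRODUCTS)
print("store/product", *PRODUCTS, "sum")
for r in STORES + ["sum"]:
    print(r, *(table[r][c] for c in PRODUCTS + ["sum"]))
```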

11 Slice and Dice Queries
Slice and dice: select and project on one or more dimensions
(Figure: a product × customers × store cube sliced at customer = “Kalam”.)
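Here is a minimal sketch of slice and dice over a sparse product × customer × store cube. Apart from the customer “Kalam”, the cells and values are hypothetical.

```python
# Hypothetical cube cells keyed by (product, customer, store) -> sales.
CUBE = {
    ("DVD", "Kalam", "S1"): 3, ("PC", "Kalam", "S2"): 1,
    ("DVD", "Ravi", "S1"): 2, ("VCR", "Ravi", "S3"): 4,
    ("PC", "Meena", "S2"): 2,
}
DIMS = ("product", "customer", "store")

def slice_cube(cube, dim, value):
    """Slice: select one value on a dimension, then project that dimension away."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, **allowed):
    """Dice: keep cells whose coordinates fall in chosen value sets on one or more dimensions."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

print(slice_cube(CUBE, "customer", "Kalam"))
print(dice_cube(CUBE, product={"DVD", "PC"}, store={"S1"}))
```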

12 Roadmap
1. What is a data warehouse, a data mart
2. Multi-dimensional data modeling
3. Data warehouse design – schemas, indices
4. The Data Cube operator – semantics and computation
5. Aggregate view selection

13 The Data Cube Operator (Gray et al.)
All previous aggregates in a single query:
SELECT LOCATION.store, SALES.product_key, SUM(amount)
FROM SALES, LOCATION
WHERE SALES.location_key = LOCATION.location_key
CUBE BY SALES.product_key, LOCATION.store
or, equivalently: CUBE product_key, store BY SUM(SALES.amount)
In SQL:1999 syntax, the grouping clause is written GROUP BY CUBE (SALES.product_key, LOCATION.store).
Challenge: optimize the aggregate computation.
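The slide names the challenge without giving a method. One widely used idea is to scan the base data once for the finest group-by, then derive each coarser group-by from a smaller, already-aggregated parent. The sketch below follows that idea under our own assumptions, not as the lecture's algorithm. The technique is valid for distributive aggregates such as SUM, COUNT, MIN and MAX, but not for holistic ones such as MEDIAN. The rows and helper name are hypothetical.

```python
from collections import defaultdict

# Hypothetical base rows (product_key, store, amount); in practice, the large fact table.
BASE = [(1, 1, 300), (1, 1, 154), (4, 1, 925), (1, 2, 468), (2, 2, 800)]

def group_sum(rows, key_positions, measure_pos):
    """SUM(measure) GROUP BY the columns at key_positions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in key_positions)] += row[measure_pos]
    return dict(totals)

# 1. Scan the base data once, for the finest group-by (product_key, store).
by_product_store = group_sum(BASE, (0, 1), 2)

# 2. Compute the coarser group-bys from that already-aggregated result
#    instead of rescanning BASE; valid because SUM is distributive.
parent_rows = [(p, s, total) for (p, s), total in by_product_store.items()]
by_product = group_sum(parent_rows, (0,), 2)
by_store = group_sum(parent_rows, (1,), 2)

# 3. The apex (grand total) comes from whichever 1-D result is smaller.
smallest = min(by_product, by_store, key=len)
grand_total = sum(smallest.values())

print(by_product, by_store, grand_total)
```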

14 Relational View of Data Cube
Store  Product_key  sum(amount)
1      1            454
1      4            925
2      1            468
2      2            800
3      1            296
3      3            240
4      1            652
4      3            540
4      4            745
1      ALL          1379
2      ALL          1268
3      ALL          536
4      ALL          1937
ALL    1            1870
ALL    2            800
ALL    3            780
ALL    4            1670
ALL    ALL          5120
SELECT LOCATION.store, SALES.product_key, SUM(amount)
FROM SALES, LOCATION
WHERE SALES.location_key = LOCATION.location_key
CUBE BY SALES.product_key, LOCATION.store
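The naive sketch below reproduces this relational view. It aggregates the base rows over every subset of the grouping columns and writes ALL in each aggregated-away column. It uses the slide's nine base rows, with store 4's product 1 and product 3 amounts set to 652 and 540, the values the ALL rows imply. The function name is hypothetical.

```python
from itertools import combinations
from collections import defaultdict

# Base rows (store, product_key, amount).
BASE = [(1, 1, 454), (1, 4, 925), (2, 1, 468), (2, 2, 800), (3, 1, 296),
        (3, 3, 240), (4, 1, 652), (4, 3, 540), (4, 4, 745)]

def cube_with_all(rows, n_dims=2):
    """One output row per group of every subset of the grouping columns,
    with "ALL" in each column that was aggregated away."""
    result = {}
    for k in range(n_dims, -1, -1):
        for kept in combinations(range(n_dims), k):
            groups = defaultdict(int)
            for row in rows:
                key = tuple(row[i] if i in kept else "ALL" for i in range(n_dims))
                groups[key] += row[n_dims]
            result.update(groups)
    return result

view = cube_with_all(BASE)
for key in [(1, "ALL"), (4, "ALL"), ("ALL", 1), ("ALL", 3), ("ALL", "ALL")]:
    print(key, view[key])
```

Standard SQL's GROUP BY CUBE marks the aggregated-away columns with NULL rather than ALL. The GROUPING() function tells those NULLs apart from genuine NULL values.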

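15 Data Cube: Multidimensional View
(Figure: a Product × Quarter × Region cube with products DVD, VCR, PC; quarters 1Qtr, 2Qtr, 3Qtr, 4Qtr; regions America, Europe, Asia; and a sum position along each dimension. The cell at (DVD, sum over quarters, America) is the total annual sales of DVDs in America.)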

16 Other Extensions to SQL
Complex aggregation at multiple granularities (Ross et al. 1998)
– Compute multiple dependent aggregates
Other proposals: the MD-join operator (Chatziantoniou et al. 1999)
SELECT LOCATION.store, SALES.product_key, SUM(amount)
FROM SALES, LOCATION
WHERE SALES.location_key = LOCATION.location_key
CUBE BY SALES.product_key, LOCATION.store: R
SUCH THAT R.amount = max(amount)
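The query on this slide is only sketched. As we read it, R ranges over the sales tuples of each cube group, and the SUCH THAT clause keeps those whose amount equals the group's maximum. The naive Python rendering below illustrates that reading on hypothetical tuples. It is not the MD-join evaluation algorithm itself.

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical sales tuples: (product_key, store, amount).
SALES = [(1, 1, 300), (1, 1, 154), (4, 1, 925), (1, 2, 468), (2, 2, 800)]

def cube_argmax(rows, n_dims=2):
    """For every cube group, return (max amount, tuples R whose amount equals it)."""
    best = {}
    for k in range(n_dims + 1):
        for kept in combinations(range(n_dims), k):
            groups = defaultdict(list)
            for row in rows:
                key = tuple(row[i] if i in kept else "ALL" for i in range(n_dims))
                groups[key].append(row)
            for key, members in groups.items():
                top = max(r[n_dims] for r in members)
                best[key] = (top, [r for r in members if r[n_dims] == top])
    return best

best = cube_argmax(SALES)
print(best[("ALL", "ALL")])
print(best[(1, "ALL")])
```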

