Data Warehousing and OLAP
Outline
  - Models & operations
  - Implementing a warehouse
  - Future directions

Hector Garcia Molina: Data Warehousing and OLAP
Warehouse Models & Operators
  Data Models
    - relations
    - stars & snowflakes
    - cubes
  Operators
    - slice & dice
    - roll-up, drill-down
    - pivoting
    - other

Star
  [Figure: star schema]

Star Schema
  sale: orderId, date, custId, prodId, storeId, qty, amt

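The star layout can be sketched as one fact table whose foreign keys point at small dimension tables. The sale columns below are the ones listed on the slide; the product, customer, and store tables and their non-key columns are assumptions made only for illustration, and SQLite stands in for the warehouse DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: every column except the key is an assumption.
    CREATE TABLE product  (prodId  TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE customer (custId  TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE store    (storeId TEXT PRIMARY KEY, city TEXT);

    -- Fact table: columns as listed on the slide.
    CREATE TABLE sale (
        orderId TEXT PRIMARY KEY,
        date    INTEGER,
        custId  TEXT REFERENCES customer(custId),
        prodId  TEXT REFERENCES product(prodId),
        storeId TEXT REFERENCES store(storeId),
        qty     INTEGER,
        amt     INTEGER
    );
""")

# Read the fact table's column names back from the catalog.
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(sale)")]
print(fact_columns)
```

The fact table sits at the center of the star; each dimension key (custId, prodId, storeId) is one point of it, while qty and amt are the measures.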
Terms
  - Fact table
  - Dimension tables
  - Measures

Dimension Hierarchies
  [Figure: hierarchy over the store dimension with levels store, sType, city, region]
  - snowflake schema
  - constellations

Cube
  [Figure: the same data shown as a fact table view and as a multi-dimensional cube; dimensions = 2]

3-D Cube
  [Figure: fact table view and multi-dimensional cube with one slice each for day 1 and day 2; dimensions = 3]

ROLAP vs. MOLAP
  - ROLAP: Relational On-Line Analytical Processing
  - MOLAP: Multi-Dimensional On-Line Analytical Processing

Aggregates
  - Add up amounts for day 1
  - In SQL:
      SELECT sum(amt)
      FROM SALE
      WHERE date = 1
  - Result: 81

Aggregates
  - Add up amounts by day
  - In SQL:
      SELECT date, sum(amt)
      FROM SALE
      GROUP BY date

Another Example
  - Add up amounts by day, product
  - In SQL:
      SELECT date, prodId, sum(amt)
      FROM SALE
      GROUP BY date, prodId
  - Going from the by-day totals to the by-day, by-product totals is a drill-down; the reverse is a rollup

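The two GROUP BY queries can be run side by side to see rollup and drill-down as the same data at two granularities. This is a minimal sketch using SQLite; the six sample rows and the custId column are made up for illustration, chosen so that day 1 totals 81 as on the earlier slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, custId TEXT, date INTEGER, amt INTEGER)")
# Hypothetical sample rows (not taken from the slides' figures).
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

# Coarse level: one total per day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()

# Drill-down: one total per (day, product).
by_day_product = conn.execute(
    "SELECT date, prodId, sum(amt) FROM sale "
    "GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()

# Rollup: summing the finer result over prodId recovers the coarser one.
rolled_up = {}
for day, _prod, total in by_day_product:
    rolled_up[day] = rolled_up.get(day, 0) + total

print(by_day)          # totals per day
print(by_day_product)  # totals per day and product
```

Because sum is distributive, the rollup can be computed from the finer aggregate without touching the fact table again.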
Aggregates
  - Operators: sum, count, max, min, median, avg
  - “Having” clause
  - Using dimension hierarchy
      - average by region (within store)
      - maximum by month (within date)

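Aggregating along a dimension hierarchy usually means joining the fact table to the dimension table and grouping by the coarser attribute; HAVING then filters whole groups. The sketch below assumes a store table that carries its region directly (a denormalized, hypothetical layout) and uses invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data, for illustration only.
    CREATE TABLE store (storeId TEXT PRIMARY KEY, region TEXT);
    CREATE TABLE sale  (storeId TEXT, amt INTEGER);
    INSERT INTO store VALUES ('s1', 'east'), ('s2', 'east'), ('s3', 'west');
    INSERT INTO sale  VALUES ('s1', 10), ('s2', 20), ('s3', 60);
""")

# Average by region (within store): group by the coarser hierarchy level.
avg_by_region = conn.execute("""
    SELECT region, avg(amt)
    FROM sale JOIN store USING (storeId)
    GROUP BY region
    ORDER BY region
""").fetchall()

# HAVING keeps only the groups that satisfy a condition on the group itself.
regions_with_two_sales = conn.execute("""
    SELECT region, avg(amt)
    FROM sale JOIN store USING (storeId)
    GROUP BY region
    HAVING count(*) >= 2
    ORDER BY region
""").fetchall()

print(avg_by_region)
print(regions_with_two_sales)
```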
Cube Aggregation
  - Example: computing sums
  [Figure: cube with day 1 and day 2 slices, with rollup and drill-down arrows between the cube and its aggregates]

Cube Operators
  [Figure: cube with day 1 and day 2 slices and its aggregates]
  - sale(c1,*,*)
  - sale(c2,p2,*)
  - sale(*,*,*)

Extended Cube
  [Figure: cube with day 1 and day 2 slices, extended with a * value on each dimension]
  - sale(*,p2,*)

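One way to read the sale(c1,*,*) notation is as a lookup into a cube whose every dimension has an extra value "*" meaning "summed over this dimension". The sketch below builds such an extended cube as a Python dictionary; the fact rows are invented, and the (customer, product, day) coordinate order is an assumption.

```python
from collections import defaultdict
from itertools import product as cartesian

# Hypothetical facts: (customer, product, day, amount).
facts = [
    ("c1", "p1", 1, 12), ("c1", "p2", 1, 11), ("c3", "p1", 1, 50),
    ("c2", "p2", 1, 8), ("c1", "p1", 2, 44), ("c2", "p1", 2, 4),
]

# Each fact contributes to 2^3 cells: every coordinate is either kept or
# replaced by "*" (aggregated away).
cube = defaultdict(int)
for cust, prod, day, amt in facts:
    for cell in cartesian((cust, "*"), (prod, "*"), (day, "*")):
        cube[cell] += amt

print(cube[("c1", "*", "*")])   # sale(c1,*,*)
print(cube[("c2", "p2", "*")])  # sale(c2,p2,*)
print(cube[("*", "p2", "*")])   # sale(*,p2,*)
print(cube[("*", "*", "*")])    # sale(*,*,*)
```

Materializing every cell like this is only practical for small cubes; it is meant to show the semantics of the operators, not a storage layout.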
Aggregation Using Hierarchies
  [Figure: cube with day 1 and day 2 slices, aggregated along the customer dimension]
  - Hierarchy levels: customer, region, country
  - customer c1 in Region A; customers c2, c3 in Region B

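Rolling up along a hierarchy can be sketched as mapping each customer to its region before summing. The customer-to-region assignment below is the one given on the slide; the fact rows are invented for illustration.

```python
from collections import defaultdict

# From the slide: c1 is in Region A; c2 and c3 are in Region B.
region_of = {"c1": "Region A", "c2": "Region B", "c3": "Region B"}

# Hypothetical facts: (customer, product, day, amount).
facts = [
    ("c1", "p1", 1, 12), ("c1", "p2", 1, 11), ("c3", "p1", 1, 50),
    ("c2", "p2", 1, 8), ("c1", "p1", 2, 44), ("c2", "p1", 2, 4),
]

by_region = defaultdict(int)      # customer rolled up to region, other dims summed out
by_region_day = defaultdict(int)  # customer rolled up to region, day kept
for cust, _prod, day, amt in facts:
    region = region_of[cust]
    by_region[region] += amt
    by_region_day[(region, day)] += amt

print(dict(by_region))
print(dict(by_region_day))
```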
Pivoting
  [Figure: fact table view and the corresponding multi-dimensional cube with day 1 and day 2 slices]

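Pivoting can be sketched as choosing which two dimensions of the fact table become the rows and columns of a cross-tab, with the remaining dimensions summed out. The pivot helper and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows.
facts = [
    {"cust": "c1", "prod": "p1", "day": 1, "amt": 12},
    {"cust": "c1", "prod": "p2", "day": 1, "amt": 11},
    {"cust": "c3", "prod": "p1", "day": 1, "amt": 50},
    {"cust": "c2", "prod": "p2", "day": 1, "amt": 8},
    {"cust": "c1", "prod": "p1", "day": 2, "amt": 44},
    {"cust": "c2", "prod": "p1", "day": 2, "amt": 4},
]

def pivot(rows, row_dim, col_dim):
    """Cross-tabulate amt with row_dim down the side and col_dim across the top."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_dim]][r[col_dim]] += r["amt"]
    return {key: dict(cols) for key, cols in table.items()}

by_prod_cust = pivot(facts, "prod", "cust")  # products as rows, customers as columns
by_day_prod = pivot(facts, "day", "prod")    # rotate: days as rows, products as columns

print(by_prod_cust)
print(by_day_prod)
```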
Integration
  - Data Cleaning
  - Data Loading
  - Derived Data
  [Figure: warehouse architecture with Client, Query & Analysis, Warehouse, Metadata, Integration, and Source components]

Data Cleaning
  - Migration (e.g., yen → dollars)
  - Scrubbing: use domain-specific knowledge (e.g., social security numbers)
  - Fusion (e.g., mail list, customer merging)
  - Auditing: discover rules & relationships (like data mining)
  [Figure: customer1(Joe) in the billing DB and customer2(Joe) in the service DB are fused into merged_customer(Joe)]

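The fusion step in the figure can be sketched as grouping records from several sources under a shared key and merging their fields. Everything below is an assumption for illustration: the record layouts, the field names, and especially the matching rule (a normalized name), which is far too naive for real customer data.

```python
# Hypothetical records from two sources that describe the same person.
billing_db = [{"id": "customer1", "name": "Joe", "balance": 40}]
service_db = [{"id": "customer2", "name": " joe ", "open_tickets": 2}]

def normalize(name):
    """Naive matching key: trim whitespace and ignore case."""
    return name.strip().lower()

def fuse(*sources):
    """Merge records whose normalized names match into one record each."""
    merged = {}
    for source in sources:
        for record in source:
            key = normalize(record["name"])
            entry = merged.setdefault(key, {"name": key, "source_ids": []})
            entry["source_ids"].append(record["id"])
            for field, value in record.items():
                if field not in ("id", "name"):
                    entry[field] = value  # later sources overwrite on conflict
    return list(merged.values())

merged_customers = fuse(billing_db, service_db)
print(merged_customers)
```

Keeping the source ids on the merged record preserves lineage, which helps when a wrong merge has to be undone.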
Loading Data
  - Incremental vs. refresh
  - Off-line vs. on-line
  - Frequency of loading
      - At night, 1x a week/month, continuously
  - Parallel/Partitioned load

Derived Data
  - Derived Warehouse Data
      - indexes
      - aggregates
      - materialized views (next slide)
  - When to update derived data?
  - Incremental vs. refresh

Materialized Views
  - Define new warehouse relations using SQL expressions
  - The derived relation does not exist at any source

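The "incremental vs. refresh" choice for derived data can be sketched with a stored aggregate. SQLite has no materialized views, so a plain table stands in for one here; the daily_total name, the sample rows, and the insert-only workload are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?)",
                 [("p1", 1, 12), ("p2", 1, 11), ("p1", 2, 44)])

# "Materialize" the view: store the query result as a table.
conn.execute("CREATE TABLE daily_total AS "
             "SELECT date, sum(amt) AS total FROM sale GROUP BY date")

def refresh(conn):
    """Refresh: throw the derived table away and recompute it from scratch."""
    conn.execute("DELETE FROM daily_total")
    conn.execute("INSERT INTO daily_total "
                 "SELECT date, sum(amt) FROM sale GROUP BY date")

def insert_sale(conn, prod, date, amt):
    """Incremental: apply only the change caused by one new fact row."""
    conn.execute("INSERT INTO sale VALUES (?, ?, ?)", (prod, date, amt))
    cur = conn.execute("UPDATE daily_total SET total = total + ? WHERE date = ?",
                       (amt, date))
    if cur.rowcount == 0:  # first sale seen for this day
        conn.execute("INSERT INTO daily_total VALUES (?, ?)", (date, amt))

def read_view(conn):
    return conn.execute("SELECT date, total FROM daily_total ORDER BY date").fetchall()

insert_sale(conn, "p2", 2, 4)
insert_sale(conn, "p1", 3, 7)
incremental_result = read_view(conn)

refresh(conn)
refreshed_result = read_view(conn)
print(incremental_result, refreshed_result)
```

Both paths end in the same state; the incremental path touches one row per new fact, while refresh rescans the whole fact table.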
Processing
  - ROLAP servers vs. MOLAP servers
  - Index Structures
  - What to Materialize?
  - Algorithms
  [Figure: warehouse architecture with Client, Query & Analysis, Warehouse, Metadata, Integration, and Source components]

ROLAP Server
  - Relational OLAP Server
  - Special indices, tuning; schema is “denormalized”
  [Figure: tools and utilities on top of a ROLAP server, which sits on a relational DBMS]

MOLAP Server
  - Multi-Dimensional OLAP Server
  - could also sit on relational DBMS
  [Figure: M.D. tools and utilities on top of a multi-dimensional server; a Sales cube with axes Product, City, and Date and labels milk, soda, eggs, soap and A, B]

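Join
  - “Combine” SALE, PRODUCT relations
  - In SQL (the join condition assumes PRODUCT's key column is named id; without a condition the query returns the full cross product):
      SELECT *
      FROM SALE, PRODUCT
      WHERE SALE.prodId = PRODUCT.id
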
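The difference between listing two relations in FROM and actually joining them shows up in the row counts. This sketch assumes a PRODUCT table with columns id, name, and price and uses invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and rows.
    CREATE TABLE product (id TEXT PRIMARY KEY, name TEXT, price INTEGER);
    CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER);
    INSERT INTO product VALUES ('p1', 'bolt', 10), ('p2', 'nut', 5);
    INSERT INTO sale VALUES ('p1', 'c1', 1, 12), ('p2', 'c1', 1, 11), ('p1', 'c3', 1, 50);
""")

# No join condition: every sale row is paired with every product row.
cross = conn.execute("SELECT * FROM sale, product").fetchall()

# With a join condition: each sale row is paired only with its own product.
joined = conn.execute(
    "SELECT * FROM sale, product WHERE sale.prodId = product.id"
).fetchall()

print(len(cross), len(joined))
```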
Join Indexes
  [Figure: join index linking rows of the joined relations]

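A join index can be sketched as a precomputed map from each dimension key to the row ids of the fact rows that join with it, so the join does not require scanning the fact table. The in-memory lists, the use of list positions as row ids, and the sample data are all assumptions.

```python
from collections import defaultdict

# Hypothetical dimension and fact data; a fact row's id is its list position.
product = {"p1": {"name": "bolt"}, "p2": {"name": "nut"}}
sale = [
    {"prodId": "p1", "amt": 12},
    {"prodId": "p2", "amt": 11},
    {"prodId": "p1", "amt": 50},
    {"prodId": "p2", "amt": 8},
    {"prodId": "p1", "amt": 44},
    {"prodId": "p1", "amt": 4},
]

# Build the join index once: product id -> ids of the matching sale rows.
join_index = defaultdict(list)
for rid, row in enumerate(sale):
    join_index[row["prodId"]].append(rid)

def sales_for(prod_id):
    """Fetch the sale rows for one product through the index, without a scan."""
    return [sale[rid] for rid in join_index.get(prod_id, [])]

p1_total = sum(row["amt"] for row in sales_for("p1"))
print(dict(join_index), p1_total)
```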
What to Materialize?
  - Store in warehouse results useful for common queries
  - Example: materialize total sales
  [Figure: cube with day 1 and day 2 slices and its total-sales aggregates]

Cube Aggregates Lattice
  Lattice nodes, finest to coarsest:
    city, product, date
    city, product | city, date | product, date
    city | product | date
    all
  - use greedy algorithm to decide what to materialize

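The slide does not say which greedy algorithm is meant. A minimal sketch, assuming the benefit-based greedy selection commonly used for this lattice: the finest view is always stored, and each round adds the view that most reduces the total cost of answering every view from its cheapest stored ancestor. The row counts are invented.

```python
DIMS = ("city", "product", "date")

# Hypothetical row counts for each view in the lattice.
sizes = {
    frozenset(DIMS): 6000,
    frozenset({"city", "product"}): 2000,
    frozenset({"city", "date"}): 5000,
    frozenset({"product", "date"}): 4000,
    frozenset({"city"}): 50,
    frozenset({"product"}): 100,
    frozenset({"date"}): 365,
    frozenset(): 1,  # "all"
}

def query_cost(view, materialized):
    """Cost of a view = size of the smallest stored view it can be computed from."""
    return min(sizes[m] for m in materialized if view <= m)

def benefit(candidate, materialized):
    """Total cost saved over all views the candidate can answer, if it is stored."""
    return sum(
        max(0, query_cost(w, materialized) - sizes[candidate])
        for w in sizes if w <= candidate
    )

def greedy_select(k):
    """Store the finest view, then add k more views by largest benefit."""
    materialized = [frozenset(DIMS)]
    for _ in range(k):
        candidates = [v for v in sizes if v not in materialized]
        best = max(candidates, key=lambda v: benefit(v, materialized))
        materialized.append(best)
    return materialized

chosen = greedy_select(2)
print([sorted(v) for v in chosen])
```

With these sizes the first pick is (city, product) because it cheaply answers four views; the second is (date), which the first pick cannot answer.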
Dimension Hierarchies
  Levels, coarsest to finest:
    all
    state
    city

Dimension Hierarchies
  Lattice nodes (not all arcs shown):
    city, product, date
    city, product | city, date | product, date
    city | product | date
    state, product, date
    state, product | state, date
    state
    all

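With a hierarchy on one dimension, the lattice nodes can be generated as every combination of one level per dimension. The sketch below assumes the city dimension has levels city, state, all and that product and date have no intermediate levels; the "location" key and the helper names are hypothetical.

```python
from itertools import product as cartesian

# Levels per dimension, finest first. Only the city/state/all chain is from the slides.
hierarchies = {
    "location": ["city", "state", "all"],
    "product": ["product", "all"],
    "date": ["date", "all"],
}

# One lattice node per choice of level in each dimension.
nodes = list(cartesian(*hierarchies.values()))

def label(node):
    """Name a node the way the slide does: drop the dimensions rolled up to 'all'."""
    kept = [level for level in node if level != "all"]
    return ", ".join(kept) if kept else "all"

def derivable(coarse, fine):
    """True if view `coarse` can be computed from view `fine`:
    in every dimension, coarse is at the same level as fine or a coarser one."""
    return all(
        levels.index(c) >= levels.index(f)
        for levels, c, f in zip(hierarchies.values(), coarse, fine)
    )

labels = sorted(label(n) for n in nodes)
print(len(nodes), labels)
```

The 12 generated labels match the 12 nodes listed on the slide, and derivable gives the arcs, including the ones the slide leaves out.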
Interesting Hierarchy
  [Figure: time hierarchy with levels all, years, quarters, months, weeks, days]
  - conceptual dimension table
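What makes the time hierarchy interesting is that it is not a single chain: days roll up to both weeks and months, but a week does not nest inside a month. A small sketch, assuming ISO 8601 week numbering (the slide does not say which week definition is meant):

```python
from datetime import date, timedelta

def months_in_iso_week(year, week):
    """Return the distinct (year, month) pairs covered by one ISO week."""
    monday = date.fromisocalendar(year, week, 1)
    days = [monday + timedelta(days=i) for i in range(7)]
    return sorted({(d.year, d.month) for d in days})

# ISO week 5 of 2024 runs from Monday 29 January to Sunday 4 February.
spanning_week = months_in_iso_week(2024, 5)
# ISO week 2 of 2024 (8 to 14 January) lies inside a single month.
contained_week = months_in_iso_week(2024, 2)

print(spanning_week, contained_week)
```

Because some weeks belong to two months, a weekly total cannot be derived from monthly totals or the other way round; both have to be computed from days.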