Published by Derek Powell. Modified over 9 years ago.
1
Data Warehousing
CPS216 Notes 13
Shivnath Babu
2
Warehousing
- Growing industry: $8 billion as far back as 1998
- Systems range from desktop to huge:
  - Walmart: 900-CPU, 2,700-disk, 23 TB Teradata system
- Lots of buzzwords, hype:
  - slice & dice, rollup, MOLAP, pivot, ...
3
Outline
- What is a data warehouse?
- Why a warehouse?
- Models & operations
- Implementing a warehouse
- Future directions
4
What is a Warehouse?
- Collection of diverse data
  - subject oriented
  - aimed at executives, decision makers
  - often a copy of operational data
  - with value-added data (e.g., summaries, history)
  - integrated
  - time-varying
  - non-volatile
5
What is a Warehouse? (continued)
- Collection of tools
  - gathering data
  - cleansing, integrating, ...
  - querying, reporting, analysis
  - data mining
  - monitoring, administering the warehouse
6
Warehouse Architecture
[Figure labels: Client, Query & Analysis, Warehouse, Metadata, Integration, Source]
7
Motivating Examples
- Forecasting
- Comparing performance of units
- Monitoring, detecting fraud
- Visualization
8
Why a Warehouse?
- Two approaches:
  - Query-driven (lazy)
  - Warehouse (eager)
[Figure labels: Source, ?]
9
Query-Driven Approach
[Figure labels: Client, Mediator, Wrapper, Source]
10
Advantages of Warehousing
- High query performance
- Queries not visible outside the warehouse
- Local processing at sources unaffected
- Can operate when sources are unavailable
- Can query data not stored in a DBMS
- Extra information at the warehouse
  - Modify, summarize (store aggregates)
  - Add historical information
11
Advantages of Query-Driven
- No need to copy data
  - less storage
  - no need to purchase data
- More up-to-date data
- Query needs can be unknown
- Only a query interface is needed at sources
- May be less draining on sources
12
OLTP vs. OLAP
- OLTP: On-Line Transaction Processing
  - Describes processing at operational sites
- OLAP: On-Line Analytical Processing
  - Describes processing at the warehouse
13
OLTP vs. OLAP (continued)
OLTP:
- Mostly updates
- Many small transactions
- MB-GB of data
- Raw data
- Clerical users
- Up-to-date data
- Consistency, recoverability critical
OLAP:
- Mostly reads
- Queries long, complex
- GB-TB of data
- Summarized, consolidated data
- Decision-makers, analysts as users
14
Data Marts
- Smaller warehouses
- Each spans part of an organization
  - e.g., marketing (customers, products, sales)
- Do not require enterprise-wide consensus
  - but long-term integration problems?
15
Warehouse Models & Operators
- Data models
  - relations
  - stars & snowflakes
  - cubes
- Operators
  - slice & dice
  - roll-up, drill-down
  - pivoting
  - other
16
Star
17
Star Schema
18
Terms
- Fact table
- Dimension tables
- Measures
19
Dimension Hierarchies
[Figure labels: store, sType, city, region]
- snowflake schema
- constellations
20
Cube
[Figure: fact table view and the corresponding multi-dimensional cube; dimensions = 2]
21
3-D Cube
[Figure: fact table view and the corresponding multi-dimensional cube, with one slice for day 1 and one for day 2; dimensions = 3]
22
ROLAP vs. MOLAP
- ROLAP: Relational On-Line Analytical Processing
- MOLAP: Multi-Dimensional On-Line Analytical Processing
23
Aggregates
Add up amounts for day 1.
In SQL:
    SELECT sum(amt) FROM SALE WHERE date = 1
Result: 81
24
Aggregates
Add up amounts by day.
In SQL:
    SELECT date, sum(amt) FROM SALE GROUP BY date
25
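Another Example
Add up amounts by day, product.
In SQL:
    SELECT date, prodId, sum(amt) FROM SALE GROUP BY date, prodId
[Figure: going from the by-day totals to the by-day, by-product totals is a drill-down; going back is a rollup]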
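The two GROUP BY queries above can be tried end to end with a small sketch. It assumes Python's built-in sqlite3 module and a handful of hypothetical SALE rows (they are not taken from the slides, but were chosen so that the day 1 total comes out to the 81 quoted earlier). The by-day, by-product query here also selects prodId so that each output row is labeled.

```python
import sqlite3

# Hypothetical SALE rows (prodId, storeId, date, amt); illustrative only.
ROWS = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALE (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
conn.executemany("INSERT INTO SALE VALUES (?, ?, ?, ?)", ROWS)

# Coarse level: one total per day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM SALE GROUP BY date ORDER BY date"
).fetchall()

# Drill-down: one total per (day, product).
by_day_product = conn.execute(
    "SELECT date, prodId, sum(amt) FROM SALE "
    "GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()

# Rollup: re-aggregate the finer result back to the day level.
rolled_up = {}
for day, _prod, total in by_day_product:
    rolled_up[day] = rolled_up.get(day, 0) + total

print(by_day)
print(by_day_product)
print(rolled_up)
```

Because sum is distributive, rolling up the finer result gives the same totals as the direct by-day query.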
26
Aggregates
- Operators: sum, count, max, min, median, avg
- “Having” clause
- Using dimension hierarchy
  - average by region (within store)
  - maximum by month (within date)
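As a rough illustration of the "Having" clause combined with a dimension hierarchy, the sketch below averages sale amounts by region and keeps only regions whose total exceeds a threshold. The STORE table, its region column, the store-to-region assignments, the SALE rows, and the threshold of 65 are all assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALE (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
# Hypothetical dimension table: each store belongs to one region.
conn.execute("CREATE TABLE STORE (storeId TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO SALE VALUES (?, ?, ?, ?)",
    [
        ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
        ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
    ],
)
conn.executemany(
    "INSERT INTO STORE VALUES (?, ?)",
    [("s1", "A"), ("s2", "B"), ("s3", "B")],
)

# Average by region: join the fact table to the dimension, then group.
averages = dict(conn.execute(
    "SELECT region, avg(amt) FROM SALE JOIN STORE USING (storeId) "
    "GROUP BY region"
).fetchall())

# HAVING filters whole groups after aggregation (WHERE filters rows before).
big_regions = conn.execute(
    "SELECT region, avg(amt) FROM SALE JOIN STORE USING (storeId) "
    "GROUP BY region HAVING sum(amt) > 65 ORDER BY region"
).fetchall()

print(averages)
print(big_regions)
```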
27
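Cube Aggregation
Example: computing sums
[Figure: the cube for day 1 and day 2 is summed along its dimensions, down to a total of 129; rollup moves toward coarser totals, drill-down toward finer ones]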
28
Cube Operators
[Figure: the cube for day 1 and day 2 with its aggregates (total 129), annotated with the operators sale(c1,*,*), sale(c2,p2,*), and sale(*,*,*)]
29
Extended Cube
[Figure: the cube for day 1 and day 2 extended with a * value on each dimension to hold the aggregates, e.g., sale(*,p2,*)]
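The sale(...) notation, where * stands for "all values of this dimension", can be sketched as a small lookup function. The cell values below are hypothetical (chosen so that the grand total equals the 129 shown in the figures), and the argument order (customer, product, day) is an assumption read off the operator examples.

```python
# Hypothetical cube cells keyed by (customer, product, day).
CELLS = {
    ("c1", "p1", 1): 12, ("c1", "p2", 1): 11, ("c3", "p1", 1): 50,
    ("c2", "p2", 1): 8, ("c1", "p1", 2): 44, ("c2", "p1", 2): 4,
}


def sale(customer="*", product="*", day="*"):
    """Sum all cells matching the pattern; "*" matches any value."""
    pattern = (customer, product, day)
    return sum(
        amt
        for key, amt in CELLS.items()
        if all(p == "*" or p == k for p, k in zip(pattern, key))
    )


print(sale("c1", "*", "*"))   # all sales to customer c1
print(sale("c2", "p2", "*"))  # customer c2, product p2, all days
print(sale("*", "p2", "*"))   # product p2 across customers and days
print(sale("*", "*", "*"))    # grand total
```

An extended cube would store these * aggregates as extra cells instead of recomputing them on each call.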
30
Aggregation Using Hierarchies
[Figure: the cube for day 1 and day 2, aggregated along the hierarchy customer, region, country]
(customer c1 in Region A; customers c2, c3 in Region B)
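Aggregating along a hierarchy amounts to mapping each member to its parent and summing. The sketch below uses the customer-to-region assignment given on the slide; the per-customer totals and the single country "X" are hypothetical.

```python
from collections import defaultdict

# From the slide: c1 is in Region A; c2 and c3 are in Region B.
REGION_OF = {"c1": "A", "c2": "B", "c3": "B"}
# Hypothetical: both regions belong to one country, called "X" here.
COUNTRY_OF = {"A": "X", "B": "X"}
# Hypothetical per-customer sales totals.
SALES_BY_CUSTOMER = {"c1": 67, "c2": 12, "c3": 50}


def roll_up(totals, parent_of):
    """Sum child totals into their parent level of the hierarchy."""
    rolled = defaultdict(int)
    for child, amount in totals.items():
        rolled[parent_of[child]] += amount
    return dict(rolled)


by_region = roll_up(SALES_BY_CUSTOMER, REGION_OF)
by_country = roll_up(by_region, COUNTRY_OF)
print(by_region)
print(by_country)
```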
31
Pivoting
[Figure: fact table view and the corresponding multi-dimensional cube for day 1 and day 2]
Pivot turns the unique values from one column into unique columns in the output.
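A minimal sketch of pivoting in plain Python, assuming hypothetical (day, product, amount) fact rows: the distinct product values become output columns, with one output row per day. Filling missing combinations with 0 is a choice made for this example, not something the slides prescribe.

```python
# Hypothetical fact rows: (day, product, amount).
FACTS = [(1, "p1", 62), (1, "p2", 19), (2, "p1", 48)]


def pivot(rows):
    """Turn the distinct values of the second field into columns."""
    columns = sorted({col for _, col, _ in rows})
    table = {}
    for row_key, col, value in rows:
        # Start each output row with a 0 in every column.
        table.setdefault(row_key, dict.fromkeys(columns, 0))[col] += value
    return columns, table


columns, table = pivot(FACTS)
print(columns)
for day, cells in sorted(table.items()):
    print(day, [cells[c] for c in columns])
```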
32
Derived Data
- Derived warehouse data
  - indexes
  - aggregates
  - materialized views (next slide)
- When to update derived data?
- Incremental vs. refresh
33
Materialized Views
- Define new warehouse relations using SQL expressions
  - the resulting relation does not exist at any source
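The "incremental vs. refresh" choice for derived data can be illustrated with a sketch. SQLite has no built-in materialized views, so this example emulates one by storing a query result in an ordinary table; the table name SALES_BY_PRODUCT, the helper functions, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALE (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
conn.executemany(
    "INSERT INTO SALE VALUES (?, ?, ?, ?)",
    [
        ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
        ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
    ],
)

# Emulated materialized view: the query result is stored as a table.
conn.execute(
    "CREATE TABLE SALES_BY_PRODUCT AS "
    "SELECT prodId, sum(amt) AS total FROM SALE GROUP BY prodId"
)


def full_refresh(conn):
    """Refresh: throw the stored result away and recompute it."""
    conn.execute("DELETE FROM SALES_BY_PRODUCT")
    conn.execute(
        "INSERT INTO SALES_BY_PRODUCT "
        "SELECT prodId, sum(amt) FROM SALE GROUP BY prodId"
    )


def incremental_insert(conn, row):
    """Incremental: apply only the change caused by one new SALE row."""
    prod_id, amt = row[0], row[3]
    conn.execute("INSERT INTO SALE VALUES (?, ?, ?, ?)", row)
    cur = conn.execute(
        "UPDATE SALES_BY_PRODUCT SET total = total + ? WHERE prodId = ?",
        (amt, prod_id),
    )
    if cur.rowcount == 0:  # first sale of this product
        conn.execute(
            "INSERT INTO SALES_BY_PRODUCT VALUES (?, ?)", (prod_id, amt)
        )


def view_contents(conn):
    return dict(conn.execute("SELECT prodId, total FROM SALES_BY_PRODUCT"))


incremental_insert(conn, ("p2", "s3", 3, 5))
print(view_contents(conn))
```

Incremental maintenance touches one row of the stored result per change, while a full refresh rescans SALE; both should leave the stored relation in the same state.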
34
Processing
- ROLAP servers vs. MOLAP servers
- Index structures
- What to materialize?
- Algorithms
[Figure labels: Client, Query & Analysis, Warehouse, Metadata, Integration, Source]
35
ROLAP Server
- Relational OLAP server
[Figure labels: tools, utilities, ROLAP server, relational DBMS]
- Special indices, tuning
- Schema is “denormalized”
36
MOLAP Server
- Multi-dimensional OLAP server
[Figure labels: M.D. tools, utilities, multi-dimensional server]
- could also sit on a relational DBMS
[Figure: Sales cube with dimensions Product (milk, soda, eggs, soap), City (A, B), and Date (1, 2, 3, 4)]
37
Index Structures
- Traditional access methods
  - B-trees, hash tables, R-trees, grids, …
- Popular in warehouses
  - inverted lists
  - bit map indexes
  - join indexes
  - text indexes
38
Inverted Lists
[Figure labels: age index, inverted lists, data records]
39
Using Inverted Lists
- Query:
  - Get people with age = 20 and name = “fred”
- List for age = 20: r4, r18, r34, r35
- List for name = “fred”: r18, r52
- Answer is the intersection: r18
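The lookup above can be sketched with Python sets. The record ids and the two lists match the slide; the other names and the age of r52 are hypothetical filler needed to make the records complete.

```python
from collections import defaultdict

# Record ids follow the slide; names other than "fred" and r52's age
# are made up for illustration.
RECORDS = {
    "r4": {"name": "joe", "age": 20},
    "r18": {"name": "fred", "age": 20},
    "r34": {"name": "sally", "age": 20},
    "r35": {"name": "ann", "age": 20},
    "r52": {"name": "fred", "age": 21},
}


def build_inverted_index(records, attribute):
    """Map each attribute value to the set of record ids holding it."""
    index = defaultdict(set)
    for rid, record in records.items():
        index[record[attribute]].add(rid)
    return index


age_index = build_inverted_index(RECORDS, "age")
name_index = build_inverted_index(RECORDS, "name")

# age = 20 AND name = "fred": intersect the two inverted lists.
answer = age_index[20] & name_index["fred"]
print(sorted(answer))
```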
40
Bit Maps
[Figure labels: age index, bit maps, data records]
41
Using Bit Maps
- Query:
  - Get people with age = 20 and name = “fred”
- Bit map for age = 20: 1101100000
- Bit map for name = “fred”: 0100000001
- Answer is the intersection (bitwise AND): 0100000000
- Good if domain cardinality is small
- Bit vectors can be compressed
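A small sketch of the bit map intersection, using the two 10-bit vectors from the slide. Representing each bit map as a string of 0s and 1s (leftmost bit = first record) is an assumption made for readability; a real index would use packed, possibly compressed, words.

```python
def bitmap_and(a: str, b: str) -> str:
    """Bitwise AND of two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("bit maps must cover the same number of records")
    # Convert to integers, AND them, and pad back to the original width.
    return format(int(a, 2) & int(b, 2), f"0{len(a)}b")


def matching_positions(bits: str) -> list:
    """0-based positions of the records whose bit is set."""
    return [i for i, bit in enumerate(bits) if bit == "1"]


age_20 = "1101100000"
name_fred = "0100000001"
answer = bitmap_and(age_20, name_fred)
print(answer)
print(matching_positions(answer))
```

The result has the same width as its inputs, with a single bit set for the one record that satisfies both conditions.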
42
Join
“Combine” SALE, PRODUCT relations.
In SQL:
    SELECT * FROM SALE, PRODUCT WHERE...
43
Join Indexes
[Figure label: join index]
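One common reading of a join index is a precomputed map from each dimension key to the fact rows that join with it. The sketch below assumes that reading; the PRODUCT and SALE contents, the row ids, and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical dimension and fact tables, keyed by id.
PRODUCT = {
    "p1": {"name": "bolt", "price": 10},
    "p2": {"name": "nut", "price": 5},
}
SALE = {
    "r1": {"prodId": "p1", "amt": 12},
    "r2": {"prodId": "p2", "amt": 11},
    "r3": {"prodId": "p1", "amt": 50},
    "r4": {"prodId": "p2", "amt": 8},
    "r5": {"prodId": "p1", "amt": 44},
    "r6": {"prodId": "p1", "amt": 4},
}


def build_join_index(fact_rows, key):
    """For each dimension key, list the ids of the matching fact rows."""
    index = defaultdict(list)
    for rid, row in fact_rows.items():
        index[row[key]].append(rid)
    return dict(index)


JOIN_INDEX = build_join_index(SALE, "prodId")


def join_for_product(prod_id):
    """Join one PRODUCT row with its SALE rows without scanning SALE."""
    product = PRODUCT[prod_id]
    return [
        {**product, **SALE[rid], "rid": rid}
        for rid in JOIN_INDEX.get(prod_id, [])
    ]


print(JOIN_INDEX)
print(join_for_product("p2"))
```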
44
What to Materialize?
- Store in the warehouse results useful for common queries
- Example: materialize total sales
[Figure: the cube for day 1 and day 2 with its aggregates (total 129)]
45
Materialization Factors
- Type/frequency of queries
- Query response time
- Storage cost
- Update cost
46
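Cube Aggregates Lattice
[Figure: lattice of group-bys, from finest to coarsest, next to the cube for day 1 and day 2 (total 129)]
- city, product, date
- city, product; city, date; product, date
- city; product; date
- all
Use a greedy algorithm to decide what to materialize.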
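The slide does not say which greedy algorithm is meant. One well-known candidate is benefit-driven view selection in the style of Harinarayan, Rajaraman, and Ullman: always keep the finest group-by, then repeatedly add the view that most reduces the total cost of answering every group-by in the lattice. The sketch below assumes that approach; the row counts in SIZE are invented for illustration.

```python
DIMS = ("city", "product", "date")

# Hypothetical row counts for each group-by in the lattice.
SIZE = {
    frozenset(DIMS): 6000,
    frozenset({"city", "product"}): 3000,
    frozenset({"city", "date"}): 5000,
    frozenset({"product", "date"}): 4000,
    frozenset({"city"}): 50,
    frozenset({"product"}): 100,
    frozenset({"date"}): 365,
    frozenset(): 1,  # "all": a single grand-total row
}


def greedy_select(size, k):
    """Pick k views to materialize in addition to the finest group-by."""
    top = max(size, key=len)  # the group-by over every dimension
    chosen = [top]
    # cost[w]: rows scanned to answer w from its cheapest chosen ancestor.
    cost = {view: size[top] for view in size}
    for _ in range(k):
        best, best_benefit = None, 0
        for view in size:
            if view in chosen:
                continue
            # A view w can be answered from `view` when w is a subset of it.
            benefit = sum(
                max(0, cost[w] - size[view]) for w in size if w <= view
            )
            if benefit > best_benefit:
                best, best_benefit = view, benefit
        if best is None:  # nothing left that helps
            break
        chosen.append(best)
        for w in size:
            if w <= best:
                cost[w] = min(cost[w], size[best])
    return chosen


selected = greedy_select(SIZE, 2)
print([sorted(view) for view in selected])
```

With these made-up sizes the first pick is (city, product), because it halves the cost of four group-bys at once, and the second is (date), which the first pick cannot answer.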
47
Dimension Hierarchies
[Figure: hierarchy levels from coarsest to finest: all, state, city]
48
Dimension Hierarchies (continued)
[Figure: the group-by lattice extended with the state level; not all arcs shown]
- city, product, date
- city, product; city, date; product, date
- city; product; date
- state, product, date
- state, product; state, date
- state
- all
49
Interesting Hierarchy
[Figure labels: all, years, quarters, months, weeks, days; conceptual dimension table]
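One plausible reason this hierarchy is called interesting is that weeks do not nest inside months, so days roll up along two separate paths. The sketch below builds rows of a conceptual time dimension table to show this; the column names and the use of ISO week numbering are assumptions.

```python
from datetime import date, timedelta


def dimension_row(d):
    """One row of a conceptual time dimension table for day d."""
    iso_year, iso_week, _weekday = d.isocalendar()
    return {
        "day": d.isoformat(),
        "week": (iso_year, iso_week),
        "month": (d.year, d.month),
        "quarter": (d.year, (d.month - 1) // 3 + 1),
        "year": d.year,
    }


# Seven consecutive days starting Monday, 29 January 2024.
start = date(2024, 1, 29)
rows = [dimension_row(start + timedelta(days=i)) for i in range(7)]

weeks = {row["week"] for row in rows}
months = {row["month"] for row in rows}
print(weeks)
print(sorted(months))
```

All seven days fall in a single ISO week but in two different months, so week totals cannot be derived from month totals or the other way around.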