Published by Vivien Martin. Modified over 6 years ago.
1
Data Warehousing CIS 4301 Lecture Notes 4/20/2006
2
Emerging Database Technologies
- Management of complex data
  - Multimedia databases
  - GIS databases
  - Genome data management
- Management of semistructured data
  - XML databases
- Sensor networks: management of data streams
  - Approximate query processing
- Data warehousing and OLAP
- Data mining
© COP Spring 2006
3
Warehousing
- Growing industry: $8 billion in 1998
- Range from desktop to huge
  - Walmart: 900-CPU, 2,700-disk, 23 TB Teradata system
- Lots of buzzwords, hype
  - slice & dice, rollup, MOLAP, pivot, ...
4
Outline
- What is a data warehouse?
- Why a warehouse?
- Models & operations
- Implementing a warehouse
- Future directions
5
What is a Warehouse?
- Collection of diverse data
  - subject oriented
  - aimed at executive, decision maker
  - often a copy of operational data
  - with value-added data (e.g., summaries, history)
  - integrated
  - time-varying
  - non-volatile
  - more
6
What is a Warehouse?
- Collection of tools
  - gathering data
  - cleansing, integrating, ...
  - querying, reporting, analysis
  - data mining
  - monitoring, administering warehouse
7
Warehouse Architecture
[Figure labels, top to bottom: Client, Client; Query & Analysis; Warehouse, Metadata; Integration; Source, Source, Source]
8
Motivating Examples
- Forecasting
- Comparing performance of units
- Monitoring, detecting fraud
- Visualization
9
Why a Warehouse?
- Two approaches:
  - Query-driven (lazy)
  - Warehouse (eager)
[Figure labels: "?", "?", Source, Source]
10
Query-Driven Approach
[Figure labels: Client, Wrapper, Mediator, Source]
11
Advantages of Warehousing
- High query performance
- Queries not visible outside warehouse
- Local processing at sources unaffected
- Can operate when sources unavailable
- Can query data not stored in a DBMS
- Extra information at warehouse
  - Modify, summarize (store aggregates)
  - Add historical information
12
Advantages of Query-Driven
- No need to copy data
  - less storage
  - no need to purchase data
- More up-to-date data
- Query needs can be unknown
- Only query interface needed at sources
- May be less draining on sources
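The trade-off between the two approaches can be sketched in a few lines. This is only an illustration: the `Source`, `Mediator`, and `Warehouse` classes and their rows are hypothetical, not part of the lecture. The mediator asks every source at query time, so it sees fresh data but fails when a source is down; the warehouse answers from its own copy, so it keeps working but is stale until refreshed.

```python
class Source:
    """A hypothetical operational source holding (item, amount) rows."""
    def __init__(self, rows):
        self.rows = list(rows)
        self.available = True

    def query(self):
        if not self.available:
            raise ConnectionError("source unavailable")
        return list(self.rows)


class Mediator:
    """Query-driven (lazy): fetch from every source at query time."""
    def __init__(self, sources):
        self.sources = sources

    def total(self):
        return sum(amt for s in self.sources for _, amt in s.query())


class Warehouse:
    """Warehouse (eager): copy source data ahead of time, answer from the copy."""
    def __init__(self, sources):
        self.sources = sources
        self.copy = []
        self.refresh()

    def refresh(self):
        self.copy = [row for s in self.sources for row in s.query()]

    def total(self):
        return sum(amt for _, amt in self.copy)


s1 = Source([("milk", 10)])
s2 = Source([("bread", 56)])
mediator = Mediator([s1, s2])
warehouse = Warehouse([s1, s2])

s1.rows.append(("juice", 34))        # a new sale arrives at a source
lazy_total = mediator.total()        # 100: the mediator sees the new row
eager_total = warehouse.total()      # 66: the warehouse copy is stale

s2.available = False                 # a source goes offline
offline_total = warehouse.total()    # 66: the warehouse still answers
try:
    mediator.total()
    mediator_failed = False
except ConnectionError:
    mediator_failed = True           # the mediator cannot answer
```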
13
OLTP vs. OLAP
- OLTP: On-Line Transaction Processing
  - Describes processing at operational sites
- OLAP: On-Line Analytical Processing
  - Describes processing at warehouse
14
OLTP vs. OLAP
OLTP                                    OLAP
Mostly updates                          Mostly reads
Many small transactions                 Queries long, complex
MB-TB of data                           GB-TB of data
Raw data                                Summarized, consolidated data
Clerical users                          Decision-makers, analysts as users
Up-to-date data
Consistency, recoverability critical
15
Data Marts
- Smaller warehouses
- Spans part of organization
  - e.g., marketing (customers, products, sales)
- Do not require enterprise-wide consensus
  - but long-term integration problems?
16
Data Warehouse Reference Architecture
[Figure labels, top to bottom: Query Manager / DSS Client Interfaces; Data Mart, Central Warehouse, Operational Data Store, ..., Metadata, Data Mart (derived data); WH Manager; Extractor/Monitor (x3); Source, Source, Source, ...]
17
Warehouse Models & Operators
- Data models
  - relations
  - stars & snowflakes
  - cubes
- Operators
  - slice & dice
  - roll-up, drill down
  - pivoting
  - other
18
Multidimensional Data Model - Data Cube
[Figure: data cube of sales. Axes: Geography (cities: NY, SF, LA), Product (item: Juice, Milk, Coke, Cream, Soap, Bread), Time (day: M, T, W, Th, F, S, S). Cell values visible: 10, 34, 56, 32, 12.]
- Measure: Sales
- Three dimensions: Time, Product, Geography
- Attributes (not shown in picture): Product (upc, price, …), Geography (lat, long, …)
- Example cell: 56 units of bread sold in LA on M
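One simple way to picture the cube is as a mapping from a (product, city, day) coordinate to the Sales measure. The sketch below is only illustrative: the bread/LA/M cell is the one quoted on the slide, while the other cells and the `total_by` helper are made up for the example.

```python
from collections import defaultdict

# Each cell is keyed by (product, city, day); the value is the Sales measure.
# Only the ("Bread", "LA", "M") cell comes from the slide; the rest are hypothetical.
cube = {
    ("Bread", "LA", "M"): 56,
    ("Bread", "LA", "T"): 20,
    ("Milk", "LA", "M"): 34,
    ("Milk", "NY", "M"): 10,
}

def total_by(cube, axis):
    """Sum the measure over the other two axes (0 = product, 1 = city, 2 = day)."""
    totals = defaultdict(int)
    for key, sales in cube.items():
        totals[key[axis]] += sales
    return dict(totals)

print(cube[("Bread", "LA", "M")])  # 56 units of bread sold in LA on M
print(total_by(cube, 1))           # sales per city
```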
19
Star
20
Star Schema
21
Terms
- Fact table
- Dimension tables
- Measures
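To make the three terms concrete, here is a small star schema built in an in-memory SQLite database. The table and column names (`product`, `customer`, `sale`, `custId`, `city`, and so on) and all rows are assumptions chosen for illustration; the slide's own schema figure did not survive. `sale` plays the role of the fact table, `amt` is the measure, and `product` and `customer` are dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per product / customer, with descriptive attributes.
CREATE TABLE product  (prodId TEXT PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE customer (custId TEXT PRIMARY KEY, name TEXT, city TEXT);

-- Fact table: one row per sale; amt is the measure, the ids point at dimensions.
CREATE TABLE sale (
    custId TEXT REFERENCES customer(custId),
    prodId TEXT REFERENCES product(prodId),
    date   INTEGER,
    amt    INTEGER
);

INSERT INTO product  VALUES ('p1', 'bolt', 10.0), ('p2', 'nut', 5.0);
INSERT INTO customer VALUES ('c1', 'Fred', 'LA'), ('c2', 'Sally', 'SF');
INSERT INTO sale VALUES ('c1', 'p1', 1, 12), ('c1', 'p2', 1, 11),
                        ('c2', 'p2', 1, 8),  ('c2', 'p1', 2, 4);
""")

# A typical star join: aggregate the measure, grouped by dimension attributes.
rows = conn.execute("""
    SELECT c.city, p.name, SUM(s.amt)
    FROM sale s
    JOIN customer c ON c.custId = s.custId
    JOIN product  p ON p.prodId = s.prodId
    GROUP BY c.city, p.name
    ORDER BY c.city, p.name
""").fetchall()
print(rows)
```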
22
Dimension Hierarchies
[Figure: hierarchy over sType, store, city, region]
→ snowflake schema
→ constellations
23
Cube
- Fact table view
- Multi-dimensional cube (dimensions = 2)
24
3-D Cube
- Fact table view
- Multi-dimensional cube (dimensions = 3)
[Figure label: day 2]
25
ROLAP vs. MOLAP
- ROLAP: Relational On-Line Analytical Processing
- MOLAP: Multi-Dimensional On-Line Analytical Processing
26
Aggregates
- Add up amounts for day 1
- In SQL:
    SELECT sum(amt) FROM SALE
    WHERE date = 1
- Result: 81
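The query can be tried against an in-memory SQLite table. The rows below are an assumption: the slide's fact table did not survive, so they are hypothetical values picked so that day 1 sums to the 81 shown on the slide. The `custId` column name is also assumed; `SALE`, `prodId`, `date`, and `amt` come from the slides.

```python
import sqlite3

# Hypothetical fact rows (custId, prodId, date, amt), chosen to total 81 on day 1.
SALE_ROWS = [
    ("c1", "p1", 1, 12),
    ("c1", "p2", 1, 11),
    ("c3", "p1", 1, 50),
    ("c2", "p2", 1, 8),
    ("c1", "p1", 2, 44),
    ("c2", "p1", 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALE (custId TEXT, prodId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO SALE VALUES (?, ?, ?, ?)", SALE_ROWS)

# The query from the slide: one number for the whole of day 1.
(day1_total,) = conn.execute("SELECT sum(amt) FROM SALE WHERE date = 1").fetchone()
print(day1_total)  # 81 with the rows above
```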
27
Aggregates
- Add up amounts by day
- In SQL:
    SELECT date, sum(amt) FROM SALE
    GROUP BY date
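A runnable version of the per-day query, again as a sketch over hypothetical rows (the slide's table is not preserved; `custId` is an assumed column name). An `ORDER BY` is added here only to make the output order predictable.

```python
import sqlite3

# Hypothetical fact rows (custId, prodId, date, amt).
SALE_ROWS = [
    ("c1", "p1", 1, 12),
    ("c1", "p2", 1, 11),
    ("c3", "p1", 1, 50),
    ("c2", "p2", 1, 8),
    ("c1", "p1", 2, 44),
    ("c2", "p1", 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALE (custId TEXT, prodId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO SALE VALUES (?, ?, ?, ?)", SALE_ROWS)

# One output row per distinct date, each carrying that day's total.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM SALE GROUP BY date ORDER BY date"
).fetchall()
print(by_day)  # [(1, 81), (2, 48)] with the rows above
```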
28
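Another Example
- Add up amounts by day, product
- In SQL:
    SELECT date, prodId, sum(amt) FROM SALE
    GROUP BY date, prodId
- Rollup: go from this result to the coarser one (by day only)
- Drill-down: go from the by-day result to this finer one (by day, product)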
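The relationship between the two groupings can be checked directly: summing the (day, product) totals over products gives back the per-day totals, which is what a rollup does. The rows are hypothetical, and `custId` is an assumed column name.

```python
import sqlite3

# Hypothetical fact rows (custId, prodId, date, amt).
SALE_ROWS = [
    ("c1", "p1", 1, 12),
    ("c1", "p2", 1, 11),
    ("c3", "p1", 1, 50),
    ("c2", "p2", 1, 8),
    ("c1", "p1", 2, 44),
    ("c2", "p1", 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALE (custId TEXT, prodId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO SALE VALUES (?, ?, ?, ?)", SALE_ROWS)

# Finer grouping (the drill-down view): one row per (day, product).
fine = conn.execute(
    "SELECT date, prodId, sum(amt) FROM SALE "
    "GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()

# Coarser grouping: one row per day.
coarse = conn.execute(
    "SELECT date, sum(amt) FROM SALE GROUP BY date ORDER BY date"
).fetchall()

# Rollup: collapse the product column of the fine result by summing.
rolled_up = {}
for date, _prod_id, total in fine:
    rolled_up[date] = rolled_up.get(date, 0) + total

print(fine)       # [(1, 'p1', 62), (1, 'p2', 19), (2, 'p1', 48)]
print(rolled_up)  # {1: 81, 2: 48}, the same totals as the coarse query
```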
29
Aggregates
- Operators: sum, count, max, min, median, avg
- “Having” clause
- Using dimension hierarchy
  - average by region (within store)
  - maximum by month (within date)
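A sketch of the last two ideas, under stated assumptions: the fact rows are hypothetical, and `CAL` is a made-up lookup table that maps each date to a month so that "maximum by month" has a hierarchy to climb. `HAVING` filters groups after aggregation, whereas `WHERE` filters rows before it.

```python
import sqlite3

# Hypothetical fact rows (custId, prodId, date, amt).
SALE_ROWS = [
    ("c1", "p1", 1, 12),
    ("c1", "p2", 1, 11),
    ("c3", "p1", 1, 50),
    ("c2", "p2", 1, 8),
    ("c1", "p1", 2, 44),
    ("c2", "p1", 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALE (custId TEXT, prodId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO SALE VALUES (?, ?, ?, ?)", SALE_ROWS)

# Hypothetical date -> month dimension table (date 1 in Jan, date 2 in Feb).
conn.execute("CREATE TABLE CAL (date INTEGER PRIMARY KEY, month TEXT)")
conn.executemany("INSERT INTO CAL VALUES (?, ?)", [(1, "Jan"), (2, "Feb")])

# HAVING: keep only the days whose total exceeds 50.
busy_days = conn.execute(
    "SELECT date, sum(amt) FROM SALE GROUP BY date HAVING sum(amt) > 50"
).fetchall()

# Dimension hierarchy: largest single sale per month (date rolls up to month).
max_by_month = conn.execute(
    "SELECT CAL.month, max(SALE.amt) FROM SALE "
    "JOIN CAL ON CAL.date = SALE.date "
    "GROUP BY CAL.month ORDER BY CAL.month"
).fetchall()

print(busy_days)     # [(1, 81)]
print(max_by_month)  # [('Feb', 44), ('Jan', 50)]
```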
30
Cube Aggregation
- Example: computing sums
[Figure: the day 1 and day 2 planes of the cube are summed in stages; the value 129 appears as a total. Arrows labeled rollup and drill-down connect the stages.]
31
Cube Operators
[Figure: cube with aggregate cells. Labels shown: sale(c1,*,*), sale(c2,p2,*), sale(*,*,*); value shown: 129; plane label: day 2.]
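The sale(c1,*,*) notation reads as "fix the coordinates that are given, sum over every coordinate marked *". A minimal sketch of that reading follows; the `sale` function and the fact rows are hypothetical, with the rows chosen so that the grand total matches the 129 shown on the slides.

```python
# Hypothetical fact rows (customer, product, day, amount).
SALE_ROWS = [
    ("c1", "p1", 1, 12),
    ("c1", "p2", 1, 11),
    ("c3", "p1", 1, 50),
    ("c2", "p2", 1, 8),
    ("c1", "p1", 2, 44),
    ("c2", "p1", 2, 4),
]

def sale(cust, prod, day, rows=SALE_ROWS):
    """Sum amounts over all rows matching the pattern; "*" matches any value."""
    pattern = (cust, prod, day)
    total = 0
    for row in rows:
        coords, amt = row[:3], row[3]
        if all(p == "*" or p == c for p, c in zip(pattern, coords)):
            total += amt
    return total

print(sale("c1", "*", "*"))   # 67: everything bought by c1
print(sale("c2", "p2", "*"))  # 8: c2's purchases of p2 over all days
print(sale("*", "*", "*"))    # 129: the grand total
```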
32
Extended Cube
[Figure: cube extended with a "*" value on each dimension. Labels shown: sale(*,p2,*); planes: day 1, day 2.]
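One way to read the extended cube is that every dimension gains an extra "*" value, and the cells at those positions hold precomputed aggregates. The sketch below materializes all such cells from hypothetical fact rows; `extended_cube` is an illustrative helper, not something defined in the lecture.

```python
from itertools import product

# Hypothetical fact rows (customer, product, day, amount).
SALE_ROWS = [
    ("c1", "p1", 1, 12),
    ("c1", "p2", 1, 11),
    ("c3", "p1", 1, 50),
    ("c2", "p2", 1, 8),
    ("c1", "p1", 2, 44),
    ("c2", "p1", 2, 4),
]

def extended_cube(rows):
    """Return a dict holding every base cell plus every "*" aggregate cell."""
    cube = {}
    for cust, prod, day, amt in rows:
        # Each coordinate is either kept or replaced by "*",
        # so every row contributes to 2**3 = 8 cells.
        for key in product((cust, "*"), (prod, "*"), (day, "*")):
            cube[key] = cube.get(key, 0) + amt
    return cube

ext = extended_cube(SALE_ROWS)
print(ext[("*", "p2", "*")])  # 19: all sales of p2
print(ext[("*", "*", "*")])   # 129: the grand total
```

Precomputing the cells trades storage for query speed: a lookup replaces a scan of the fact rows.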
33
Aggregation Using Hierarchies
[Figure: cube with day 1 and day 2 planes]
- Hierarchy: customer → region → country
- (customer c1 in Region A; customers c2, c3 in Region B)
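Aggregating along a hierarchy means replacing each customer by its parent in the hierarchy before summing. The customer-to-region mapping below is the one given on the slide; the fact rows and the `rollup_to_region` helper are hypothetical.

```python
# Mapping from the slide: c1 in Region A; c2 and c3 in Region B.
REGION_OF = {"c1": "A", "c2": "B", "c3": "B"}

# Hypothetical fact rows (customer, product, day, amount).
SALE_ROWS = [
    ("c1", "p1", 1, 12),
    ("c1", "p2", 1, 11),
    ("c3", "p1", 1, 50),
    ("c2", "p2", 1, 8),
    ("c1", "p1", 2, 44),
    ("c2", "p1", 2, 4),
]

def rollup_to_region(rows):
    """Total amount per (region, day), climbing one level from customer to region."""
    totals = {}
    for cust, _prod, day, amt in rows:
        key = (REGION_OF[cust], day)
        totals[key] = totals.get(key, 0) + amt
    return totals

by_region_day = rollup_to_region(SALE_ROWS)
print(by_region_day)
```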
34
Query & Analysis Tools
- Query building
- Report writers (comparisons, growth, graphs, …)
- Spreadsheet systems
- Web interfaces
- Data mining
35
Other Operations
- Time functions
  - e.g., time average
- Computed attributes
  - e.g., commission = sales * rate
- Text queries
  - e.g., find documents with words X AND B
  - e.g., rank documents by frequency of words X, Y, Z
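The computed-attribute and text-query examples can be sketched as follows. Everything here is illustrative: the documents, the helper names, and the whitespace-based word matching are assumptions, and real text search engines are considerably more involved.

```python
def commission(sales, rate):
    """Computed attribute from the slide: commission = sales * rate."""
    return sales * rate

# Hypothetical document collection: id -> text.
DOCS = {
    "d1": "data warehouse stores summarized data",
    "d2": "the warehouse loads data nightly",
    "d3": "sensor streams arrive continuously",
}

def docs_with_all(docs, words):
    """AND query: ids of documents that contain every given word."""
    return sorted(
        doc_id for doc_id, text in docs.items()
        if all(w in text.split() for w in words)
    )

def rank_by_frequency(docs, words):
    """Rank document ids by how often the given words occur, most frequent first."""
    def score(text):
        tokens = text.split()
        return sum(tokens.count(w) for w in words)
    return sorted(docs, key=lambda doc_id: (-score(docs[doc_id]), doc_id))

print(commission(1000, 0.05))
print(docs_with_all(DOCS, ["data", "warehouse"]))      # ['d1', 'd2']
print(rank_by_frequency(DOCS, ["data", "warehouse"]))  # ['d1', 'd2', 'd3']
```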