Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Warehouse.

Similar presentations


Presentation on theme: "Data Warehouse."— Presentation transcript:

1 Data Warehouse

2 Data Warehouse Most common form of data integration.
Copy data from one or more sources into a single DB (warehouse) Update: periodic reconstruction of the warehouse, perhaps overnight. Frequently essential for analytic queries.

3 OLTP Most database operations involve On-Line Transaction Processing (OTLP). Short, simple, frequent queries and/or modifications, each involving a small number of tuples. Examples: Answering queries from a Web interface, sales at cash registers, selling airline tickets.

4 OLAP On-Line Application Processing (OLAP, or “analytic”) queries :
Few, but complex queries --- may run for hours. Queries not depend on having an absolutely latest database (trend analysis).

5 OLAP Examples Ads: Amazon analyzes purchases by its customers to come up with an individual screen with products of likely interest to the customer. Optimization: Analysts at Wal-Mart look for items with increasing sales in some region. Use empty trucks to move merchandise between stores.

6 Databases at store branches handle OLTP.
Common Architecture Databases at store branches handle OLTP. Local store databases copied to a central warehouse (overnight). Analysts use the warehouse for OLAP / analytics.

7 Star Schemas A star schema is a common organization for data at a warehouse. Fact table : large collection of facts such as sales. ( Often “insert-only.” ) Dimension tables : smaller, generally static information about the entities involved in the facts.

8 Example: Star Schema Suppose we record in a warehouse information about every beer sale: the bar, the brand of beer, the drinker who bought the beer, the day, the time, and the price charged.

9 Example: Star Schema Suppose we want to record in a warehouse information about every beer sale: the bar, the brand of beer, the drinker who bought the beer, the day, the time, and the price charged. Fact table : Sales(bar, beer, drinker, day, time, price)

10 Drinkers(drinker, addr, phone)
Example -- Continued Dimension tables ~ info about bar, beer, and drinker “dimensions”: Bars(bar, addr, license) Beers(beer, manf) Drinkers(drinker, addr, phone)

11 Star Schema Dimension Table (Bars) Dimension Table (Drinkers)
Dimension Attrs. Dependent Attrs. Fact Table - Sales Dimension Table (Beers) Dimension Table (etc.)

12 Dimensions and Dependent Attributes
Two classes of fact-table attributes: Dimension attributes : the key of a dimension table. Dependent attributes : a value determined by the dimension attributes of the tuple.

13 Example: Dependent Attribute
Dependent attribute of Sales relation: price Determined by combination of dimension attributes: bar, beer, drinker, and time

14 Data Cube beer price bar drinker

15 Marginals / Aggregates
The data cube includes aggregation (e.g, SUM) along margins of cube. Marginals include aggregations over one dimension, two dimensions,…

16 Visualization--Data Cube w/Aggregation
beer SUM over all Drinkers price bar drinker

17 Example: Marginals Our 4-dimensional Sales cube includes sum of price over each bar, each beer, each drinker, and each time unit (perhaps days). It would also have the sum of price over all bar-beer pairs, all bar-drinker-day triples,…

18 Structure of Cube Think of each dimension as having an additional value * (group-by + aggregate ) Group-by: A point with one or more *s in its coordinates aggregates over the dimensions with the *’s. Example: Sales(”Joe’s Bar”, ”Bud”, *, *) holds the sum, over all drinkers and all time, of the Bud beer consumed at Joe’s.

19 Drill-Down Drill-down = “de-aggregate” = break an aggregate into its constituents. Example: if Joe’s Bar sells very few Anheuser-Busch beers, break down his sales by particular A.-B. beer.

20 Roll-Up Roll-up = aggregate further along one or more dimensions. Example: Given a table of how much Bud each drinker consumes at each bar, roll it up into a table giving total amount of Bud consumed by each drinker.

21 Example: Roll Up and Drill Down
$ of Anheuser-Busch by drinker/bar 112 100 133 Mary Bob Jim $ of A-B / drinker Roll up by Bar Jim Bob Mary Joe’s Bar 45 33 30 Nut- House 50 36 42 Blue Chalk 38 31 40 35 40 48 Bud Light 37 31 45 M’lob 29 Bud Mary Bob Jim $ of A-B Beers / drinker Drill down by Beer

22 Query Execution (Row-oriented storage)
SELECT drinker, SUM(price) FROM fact_table GROUP BY drinker;

23 Query Execution (Column-oriented storage)
SELECT drinker, SUM(price) FROM fact_table GROUP BY drinker; Drinker

24 Data Warehousing: Column vs Row Store

25 Column Stores are special purpose (relational) databases
Logical storing details of RDBMS vs. Column Store. Potential for compression of columnar data Source: VLDB Tutorial 2009 – Column Oriented Database Systems - Stavros Harizopoulos, Daniel Abadi, Peter Boncz

26 Using Column Stores Column Stores well suited for analytics problems.
Typical usage: Create OLAP cube (or column store) periodically Use it for decision support for business Potential issues with a column store: Hard to insert/update/remove a row But the above use case does not require it

27 Column-Oriented Storage
Google BigQuery is an example of a system that stores its data column-by-column, not row-by-row. Potential for compression of columnar data Source: An Inside Look at Google BigQuery (


Download ppt "Data Warehouse."

Similar presentations


Ads by Google