Data Warehouse.

Slides:



Advertisements
Similar presentations
SQL Group Members: Shijun Shen Xia Tang Sixin Qiang.
Advertisements

Data Analysis. Overview Traditional database systems are tuned to many, small, simple queries. Some applications use fewer, more time-consuming, analytic.
Data Warehousing CPS216 Notes 13 Shivnath Babu. 2 Warehousing l Growing industry: $8 billion way back in 1998 l Range from desktop to huge: u Walmart:
Data Warehouse Architecture Sakthi Angappamudali Data Architect, The Oregon State University, Corvallis 16 th May, 2005.
2/10/05Salman Azhar: Database Systems1 On-Line Analytical Processing Salman Azhar Warehousing Data Cubes Data Mining These slides use some figures, definitions,
OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries.
Lecture 1: Data Warehousing Based on the slides by Jeffrey D. Ullman and Hector Garcia-Molina at Stanford University 1.
SLIDE 1IS 257 – Fall 2011 Data Mining and the Weka Toolkit University of California, Berkeley School of Information IS 257: Database Management.
Columnar Database Systems
1 On-Line Application Processing Warehousing Data Cubes Data Mining.
COMP 578 Data Warehousing And OLAP Technology Keith C.C. Chan Department of Computing The Hong Kong Polytechnic University.
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
Chapter 13 The Data Warehouse
SLIDE 1IS 257 – Fall 2010 Data Mining and the Weka Toolkit University of California, Berkeley School of Information IS 257: Database Management.
On-Line Application Processing Warehousing Data Cubes Data Mining 1.
On-Line Analytic Processing Chetan Meshram Class Id:221.
IMS 6217: Data Warehousing / Business Intelligence Part 3 1 Dr. Lawrence West, Management Dept., University of Central Florida Analysis.
OnLine Analytical Processing (OLAP)
1 On-Line Application Processing Warehousing Data Cubes Data Mining.
1 Data Warehouses BUAD/American University Data Warehouses.
Decision Support and Date Warehouse Jingyi Lu. Outline Decision Support System OLAP vs. OLTP What is Date Warehouse? Dimensional Modeling Extract, Transform,
Ch3 Data Warehouse Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009.
Winter 2006Winter 2002 Keller, Ullman, CushingJudy Cushing 19–1 Warehousing The most common form of information integration: copy sources into a single.
Fox MIS Spring 2011 Data Warehouse Week 8 Introduction of Data Warehouse Multidimensional Analysis: OLAP.
UNIT-II Principles of dimensional modeling
1 On-Line Analytic Processing Warehousing Data Cubes.
On-Line Application Processing Warehousing Data Cubes (Data Mining) (slides borrowed from Stanford)
CMPE 226 Database Systems October 21 Class Meeting Department of Computer Engineering San Jose State University Fall 2015 Instructor: Ron Mak
ADVANCED TOPICS IN RELATIONAL DATABASES Spring 2011 Instructor: Hassan Khosravi.
Business Intelligence Transparencies 1. ©Pearson Education 2009 Objectives What business intelligence (BI) represents. The technologies associated with.
Copyright© 2014, Sira Yongchareon Department of Computing, Faculty of Creative Industries and Business Lecturer : Dr. Sira Yongchareon ISCG 6425 Data Warehousing.
1 Introduction to Database Systems, CS420 SQL Views and Indexes.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25.
Introduction to OLAP and Data Warehouse Assoc. Professor Bela Stantic September 2014 Database Systems.
Databases 2 On-Line Application Processing: Warehousing, Data Cubes, Data Mining.
Data Mining and Data Warehousing: Concepts and Techniques What is a Data Warehouse? Data Warehouse vs. other systems, OLTP vs. OLAP Conceptual Modeling.
On-Line Application Processing
Virtual and Materialized Views Speeding Accesses to Data
Operation Data Analysis Hints and Guidelines
Data warehouse.
Data Warehousing CIS 4301 Lecture Notes 4/20/2006.
CPSC-310 Database Systems
On-Line Analytic Processing
Data warehouse and OLAP
On-Line Analytic Processing
Chapter 13 The Data Warehouse
Data storage is growing Future Prediction through historical data
Data Warehouse.
CS 440 Database Management Systems
CPSC-608 Database Systems
Inventory is used to illustrate:
Database Models Relational Model
On-Line Analytical Processing (OLAP)
CMPE 226 Database Systems April 11 Class Meeting
Data Warehouse and OLAP
An Introduction to Data Warehousing
Data Warehousing: Data Models and OLAP operations
Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009
On-Line Application Processing
Virtual and Materialized Views Speeding Accesses to Data
Data Warehousing Concepts
CPSC-608 Database Systems
CPSC-608 Database Systems
Analysis Services Analysis Services vs. the Data Warehouse vs. OLTP DB
Data Warehouse and OLAP
Instructor: Zhe He Department of Computer Science
Data Warehouse and OLAP Technology
Data Warehousing.
Presentation transcript:

Data Warehouse

Data Warehouse Most common form of data integration. Copy data from one or more sources into a single DB (warehouse) Update: periodic reconstruction of the warehouse, perhaps overnight. Frequently essential for analytic queries.

OLTP Most database operations involve On-Line Transaction Processing (OTLP). Short, simple, frequent queries and/or modifications, each involving a small number of tuples. Examples: Answering queries from a Web interface, sales at cash registers, selling airline tickets.

OLAP On-Line Application Processing (OLAP, or “analytic”) queries : Few, but complex queries --- may run for hours. Queries not depend on having an absolutely latest database (trend analysis).

OLAP Examples Ads: Amazon analyzes purchases by its customers to come up with an individual screen with products of likely interest to the customer. Optimization: Analysts at Wal-Mart look for items with increasing sales in some region. Use empty trucks to move merchandise between stores.

Databases at store branches handle OLTP. Common Architecture Databases at store branches handle OLTP. Local store databases copied to a central warehouse (overnight). Analysts use the warehouse for OLAP / analytics.

Star Schemas A star schema is a common organization for data at a warehouse. Fact table : large collection of facts such as sales. ( Often “insert-only.” ) Dimension tables : smaller, generally static information about the entities involved in the facts.

Example: Star Schema Suppose we record in a warehouse information about every beer sale: the bar, the brand of beer, the drinker who bought the beer, the day, the time, and the price charged.

Example: Star Schema Suppose we want to record in a warehouse information about every beer sale: the bar, the brand of beer, the drinker who bought the beer, the day, the time, and the price charged. Fact table : Sales(bar, beer, drinker, day, time, price)

Drinkers(drinker, addr, phone) Example -- Continued Dimension tables ~ info about bar, beer, and drinker “dimensions”: Bars(bar, addr, license) Beers(beer, manf) Drinkers(drinker, addr, phone)

Star Schema Dimension Table (Bars) Dimension Table (Drinkers) Dimension Attrs. Dependent Attrs. Fact Table - Sales Dimension Table (Beers) Dimension Table (etc.)

Dimensions and Dependent Attributes Two classes of fact-table attributes: Dimension attributes : the key of a dimension table. Dependent attributes : a value determined by the dimension attributes of the tuple.

Example: Dependent Attribute Dependent attribute of Sales relation: price Determined by combination of dimension attributes: bar, beer, drinker, and time

Data Cube beer price bar drinker

Marginals / Aggregates The data cube includes aggregation (e.g, SUM) along margins of cube. Marginals include aggregations over one dimension, two dimensions,…

Visualization--Data Cube w/Aggregation beer SUM over all Drinkers price bar drinker

Example: Marginals Our 4-dimensional Sales cube includes sum of price over each bar, each beer, each drinker, and each time unit (perhaps days). It would also have the sum of price over all bar-beer pairs, all bar-drinker-day triples,…

Structure of Cube Think of each dimension as having an additional value * (group-by + aggregate ) Group-by: A point with one or more *s in its coordinates aggregates over the dimensions with the *’s. Example: Sales(”Joe’s Bar”, ”Bud”, *, *) holds the sum, over all drinkers and all time, of the Bud beer consumed at Joe’s.

Drill-Down Drill-down = “de-aggregate” = break an aggregate into its constituents. Example: if Joe’s Bar sells very few Anheuser-Busch beers, break down his sales by particular A.-B. beer.

Roll-Up Roll-up = aggregate further along one or more dimensions. Example: Given a table of how much Bud each drinker consumes at each bar, roll it up into a table giving total amount of Bud consumed by each drinker.

Example: Roll Up and Drill Down $ of Anheuser-Busch by drinker/bar 112 100 133 Mary Bob Jim $ of A-B / drinker Roll up by Bar Jim Bob Mary Joe’s Bar 45 33 30 Nut- House 50 36 42 Blue Chalk 38 31 40 35 40 48 Bud Light 37 31 45 M’lob 29 Bud Mary Bob Jim $ of A-B Beers / drinker Drill down by Beer

Query Execution (Row-oriented storage) SELECT drinker, SUM(price) FROM fact_table GROUP BY drinker;

Query Execution (Column-oriented storage) SELECT drinker, SUM(price) FROM fact_table GROUP BY drinker; Drinker

Data Warehousing: Column vs Row Store

Column Stores are special purpose (relational) databases Logical storing details of RDBMS vs. Column Store. Potential for compression of columnar data Source: VLDB Tutorial 2009 – Column Oriented Database Systems - Stavros Harizopoulos, Daniel Abadi, Peter Boncz

Using Column Stores Column Stores well suited for analytics problems. Typical usage: Create OLAP cube (or column store) periodically Use it for decision support for business Potential issues with a column store: Hard to insert/update/remove a row But the above use case does not require it

Column-Oriented Storage Google BigQuery is an example of a system that stores its data column-by-column, not row-by-row. Potential for compression of columnar data Source: An Inside Look at Google BigQuery (https://goo.gl/G0lbJh)