1 Cube Computation and Indexes for Data Warehouses CPS 196.03 Notes 7.

Presentation transcript:

1 Cube Computation and Indexes for Data Warehouses CPS 196.03 Notes 7

2 Processing
• ROLAP servers vs. MOLAP servers
• Index Structures
• Cube computation
• What to Materialize?
• Algorithms
[Figure: warehouse architecture; labels: Client, Query & Analysis, Warehouse, Metadata, Integration, Source]

3 ROLAP Server
• Relational OLAP Server
• Special indices, tuning; schema is “denormalized”
[Figure labels: tools, utilities, ROLAP server, relational DBMS]

4 MOLAP Server
• Multi-Dimensional OLAP Server
• Could also sit on a relational DBMS
[Figure labels: M.D. tools, utilities, multi-dimensional server; Sales cube with dimensions Product (milk, soda, eggs, soap), City (A, B), and Date]

5 MOLAP
[Figure: data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A., Canada, Mexico, sum); the highlighted cell is the total annual sales of TV in U.S.A.]

6 MOLAP
[Figure: three-dimensional array with dimensions A (a0 to a3), B (b0 to b3), and C (c0 to c3)]

7 Challenges in MOLAP
• Storing large arrays for efficient access
    ◦ Row-major, column-major
    ◦ Chunking
    ◦ Compressing sparse arrays
• Creating array data from data in tables
• Efficient techniques for cube computation
Topics are discussed in the paper for reading
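The storage options above can be made concrete with a small sketch. It is only an illustration: the function names are hypothetical, and the (offset, value) pair encoding is one simple way to compress a sparse array, not necessarily the scheme used in the assigned paper.

```python
def row_major_offset(index, shape):
    """Linear offset of a cell when the LAST dimension varies fastest."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def column_major_offset(index, shape):
    """Linear offset of a cell when the FIRST dimension varies fastest."""
    offset = 0
    for i, n in zip(reversed(index), reversed(shape)):
        offset = offset * n + i
    return offset


def chunk_of(index, chunk_shape):
    """Coordinates of the chunk holding a cell, and the cell's position inside it."""
    chunk = tuple(i // c for i, c in zip(index, chunk_shape))
    within = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk, within


def compress_sparse(flat_cells):
    """Keep only the non-empty cells of a flattened array as (offset, value) pairs."""
    return [(offset, value) for offset, value in enumerate(flat_cells) if value != 0]


# A 4 x 4 x 4 array (dimensions A, B, C) split into 2 x 2 x 2 chunks.
shape = (4, 4, 4)
cell = (1, 2, 3)
print(row_major_offset(cell, shape))
print(column_major_offset(cell, shape))
print(chunk_of(cell, (2, 2, 2)))
```

Chunking keeps cells that are close in every dimension close on disk, so a scan along any one dimension touches fewer pages than it would in a pure row-major layout.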

8 Index Structures
• Traditional Access Methods
    ◦ B-trees, hash tables, R-trees, grids, …
• Popular in Warehouses
    ◦ inverted lists
    ◦ bit map indexes
    ◦ join indexes
    ◦ text indexes

9 Inverted Lists
[Figure: age index pointing to inverted lists, which point to data records]

10 Using Inverted Lists
• Query:
    ◦ Get people with age = 20 and name = “fred”
• List for age = 20: r4, r18, r34, r35
• List for name = “fred”: r18, r52
• Answer is intersection: r18
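A minimal sketch of this lookup, assuming record IDs are integers (4 stands for r4, and so on) and each inverted list is kept sorted. The record contents other than age = 20 and name = "fred" are made up for illustration.

```python
from collections import defaultdict


def build_inverted_lists(records, column):
    """Map each value of `column` to the sorted list of record IDs holding it."""
    lists = defaultdict(list)
    for rid in sorted(records):
        lists[records[rid][column]].append(rid)
    return lists


def intersect(a, b):
    """Merge-style intersection of two sorted ID lists."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical records chosen to reproduce the lists on the slide.
records = {
    4: {"age": 20, "name": "joe"},
    18: {"age": 20, "name": "fred"},
    34: {"age": 20, "name": "sally"},
    35: {"age": 20, "name": "sally"},
    52: {"age": 21, "name": "fred"},
}
age_index = build_inverted_lists(records, "age")
name_index = build_inverted_lists(records, "name")
print(intersect(age_index[20], name_index["fred"]))  # [18]
```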

11 Bit Maps
[Figure: age index pointing to bit maps, one bit per data record]

12 Bitmap Index
• Index on a particular column
• Each value in the column has a bit vector; bit operations are fast
• The length of the bit vector: # of records in the base table
• The i-th bit is set if the i-th row of the base table has the value for the indexed column
• Not suitable for high-cardinality domains
[Figure: base table, index on Region, index on Type]
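The definition above can be sketched as follows. The base-table rows are hypothetical (the slide's own table did not survive in the transcript); only the column names Region and Type come from the slide.

```python
def build_bitmap_index(rows, column):
    """One bit vector per distinct value; bit i is 1 if row i has that value."""
    index = {}
    for i, row in enumerate(rows):
        bits = index.setdefault(row[column], [0] * len(rows))
        bits[i] = 1
    return index


# Hypothetical base table.
base_table = [
    {"region": "Asia", "type": "Retail"},
    {"region": "Europe", "type": "Dealer"},
    {"region": "Asia", "type": "Dealer"},
    {"region": "America", "type": "Retail"},
    {"region": "Europe", "type": "Dealer"},
]
region_index = build_bitmap_index(base_table, "region")
type_index = build_bitmap_index(base_table, "type")
for value, bits in region_index.items():
    print(value, bits)
```

The index holds one vector per distinct value, each as long as the table, which is why it becomes expensive for high-cardinality columns unless the vectors are compressed.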

13 Using Bit Maps
• Query:
    ◦ Get people with age = 20 and name = “fred”
• List for age = 20: (bit vector shown on slide)
• List for name = “fred”: (bit vector shown on slide)
• Answer is intersection: (bit vector shown on slide)
• Good if domain cardinality small
• Bit vectors can be compressed
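Since the slide's bit vectors are not preserved, here is a sketch of the same query on made-up data. It assumes each bit vector is stored as a Python integer, so the intersection is a single bitwise AND.

```python
def bitmap(rows, column, value):
    """Integer whose bit i is set when row i has `value` in `column`."""
    bits = 0
    for i, row in enumerate(rows):
        if row[column] == value:
            bits |= 1 << i
    return bits


def matching_rows(bits):
    """Row numbers whose bit is set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Hypothetical data records.
people = [
    {"age": 20, "name": "joe"},
    {"age": 20, "name": "fred"},
    {"age": 21, "name": "fred"},
    {"age": 20, "name": "sally"},
]
age_20 = bitmap(people, "age", 20)
name_fred = bitmap(people, "name", "fred")
answer = age_20 & name_fred  # intersection of the two conditions
print(matching_rows(answer))  # [1]
```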

14 Join
“Combine” the SALE and PRODUCT relations
In SQL: SELECT * FROM SALE, PRODUCT WHERE ...

15 Join Indexes
[Figure: join index]
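The figure is not preserved, so the following is only a sketch of the general idea: a join index precomputes, for each PRODUCT key, the positions of the matching SALE rows, so the join need not be recomputed per query. The column name prod_id and the sample rows are assumptions.

```python
from collections import defaultdict


def build_join_index(sale_rows, product_rows, key):
    """Map each product key to the row numbers of the SALE rows that join with it."""
    product_keys = {p[key] for p in product_rows}
    index = defaultdict(list)
    for rid, sale in enumerate(sale_rows):
        if sale[key] in product_keys:
            index[sale[key]].append(rid)
    return dict(index)


# Hypothetical relations.
product = [
    {"prod_id": "p1", "name": "bolt"},
    {"prod_id": "p2", "name": "nut"},
]
sale = [
    {"prod_id": "p1", "amount": 12},
    {"prod_id": "p2", "amount": 11},
    {"prod_id": "p1", "amount": 50},
]
join_index = build_join_index(sale, product, "prod_id")

# Total sales of product p1, found without scanning SALE.
print(sum(sale[rid]["amount"] for rid in join_index["p1"]))  # 62
```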

16 Cube Computation for Data Warehouses

17 Counting Exercise
• How many cuboids are there in a cube?
    ◦ The full or nothing case
    ◦ When dimension hierarchies are present
• What is the size of each cuboid?
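One way to work the exercise, using the standard counting argument: each dimension contributes either one of its hierarchy levels or "all", so a cube whose dimension i has L_i levels (not counting "all") has the product of (L_i + 1) cuboids, which is 2^n when every dimension has a single level. The size of a cuboid is bounded above by the product of the cardinalities of its group-by attributes. The function names and the cardinalities below are hypothetical.

```python
from math import prod


def num_cuboids(levels_per_dimension):
    """Number of cuboids; levels_per_dimension[i] excludes the virtual level 'all'."""
    return prod(levels + 1 for levels in levels_per_dimension)


def max_cells(cardinalities):
    """Upper bound on the cells in a cuboid, given its group-by attribute cardinalities."""
    return prod(cardinalities)


# Three flat dimensions (city, product, date): 2^3 cuboids.
print(num_cuboids([1, 1, 1]))  # 8

# Adding a state level above city: (2 + 1) * (1 + 1) * (1 + 1) cuboids.
print(num_cuboids([2, 1, 1]))  # 12

# Bound for a (city, product) cuboid with, say, 50 cities and 1000 products.
print(max_cells([50, 1000]))  # 50000
```

The bound is rarely reached in practice, because many combinations of values never occur together in the base data.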

18 Lattice of Cuboids
city, product, date
city, product | city, date | product, date
city | product | date
all
[Figure labels: day 1, day 2, 129]
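The lattice for these three dimensions can be enumerated mechanically. This is a small sketch with hypothetical function names; the empty tuple stands for the "all" cuboid.

```python
from itertools import combinations


def lattice(dimensions):
    """All cuboids, grouped by number of group-by dimensions, finest level first."""
    return [
        list(combinations(dimensions, k))
        for k in range(len(dimensions), -1, -1)
    ]


def parents(cuboid, dimensions):
    """Cuboids with one extra dimension, from which `cuboid` can be aggregated."""
    return [
        tuple(d for d in dimensions if d in cuboid or d == extra)
        for extra in dimensions
        if extra not in cuboid
    ]


dims = ("city", "product", "date")
for level in lattice(dims):
    print(level)

# The (city) cuboid can be computed from either of these:
print(parents(("city",), dims))
```

Any cuboid can be computed from any of its ancestors in the lattice, which is what makes the choice of which cuboids to materialize matter.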

19 Dimension Hierarchies
[Figure: hierarchy for the city dimension, from coarsest to finest: all, state, city]

20 Dimension Hierarchies
[Figure: lattice of cuboids extended with the state level; not all arcs shown]
city, product, date | state, product, date
city, product | city, date | product, date | state, product | state, date
city | product | date | state
all

21 Efficient Data Cube Computation
• Data cube can be viewed as a lattice of cuboids
    ◦ The bottom-most cuboid is the base cuboid
    ◦ The top-most cuboid (apex) contains only one cell
    ◦ How many cuboids in an n-dimensional cube with L levels?
• Materialization of data cube
    ◦ Materialize every cuboid (full materialization), none (no materialization), or some (partial materialization)
    ◦ Selection of which cuboids to materialize: based on size, sharing, access frequency, etc.

22 Derived Data
• Derived Warehouse Data
    ◦ indexes
    ◦ aggregates
    ◦ materialized views (next slide)
• When to update derived data?
• Incremental vs. refresh
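The last bullet can be illustrated with a SUM aggregate. This is a sketch under simple assumptions: only insertions arrive, and the aggregate is total sales per product; the function names and data are hypothetical.

```python
from collections import defaultdict


def refresh(sales):
    """Refresh: recompute total sales per product from all base rows."""
    totals = defaultdict(int)
    for product, amount in sales:
        totals[product] += amount
    return dict(totals)


def apply_increment(totals, new_sales):
    """Incremental: fold only the newly arrived rows into the existing aggregate."""
    updated = dict(totals)
    for product, amount in new_sales:
        updated[product] = updated.get(product, 0) + amount
    return updated


old_rows = [("p1", 12), ("p2", 11)]
new_rows = [("p1", 44), ("p3", 4)]

aggregate = refresh(old_rows)
aggregate = apply_increment(aggregate, new_rows)
print(aggregate == refresh(old_rows + new_rows))  # True
```

Incremental maintenance touches only the new rows, but it is straightforward only for aggregates such as SUM and COUNT under insertions; an aggregate like MAX may need the base data again when rows are deleted.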

23 Idea of Materialized Views
• Define new warehouse tables/arrays that do not exist at any source

24 Efficient OLAP Processing
• Determine which operations should be performed on the available cuboids
    ◦ Transform drill, roll, etc. into corresponding SQL and/or OLAP operations, e.g., dice = selection + projection
• Determine which materialized cuboid(s) should be selected for the OLAP operation:
    ◦ Let the query to be processed be on {brand, province_or_state} with the condition “year = 2004”, and let there be 4 materialized cuboids available:
        1) {year, item_name, city}
        2) {year, brand, country}
        3) {year, brand, province_or_state}
        4) {item_name, province_or_state} where year = 2004
    ◦ Which should be selected to process the query?
• Explore indexing structures & compressed vs. dense arrays in MOLAP
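A sketch of the first half of that decision: which cuboids are able to answer the query at all. It assumes the hierarchies item_name < brand and city < province_or_state < country, which the slide implies but does not state, and the function and variable names are hypothetical. Choosing the cheapest among the usable cuboids would additionally need their sizes and available indexes, which the slide does not give.

```python
# Assumed hierarchies, listed from finest to coarsest level.
HIERARCHIES = {
    "item": ["item_name", "brand"],
    "location": ["city", "province_or_state", "country"],
    "time": ["year"],
}
LEVEL = {
    level: (dim, rank)
    for dim, levels in HIERARCHIES.items()
    for rank, level in enumerate(levels)
}


def can_answer(cuboid_levels, cuboid_filter, query_levels, query_filter):
    """True if the query result can be derived from the materialized cuboid."""
    # A pre-filtered cuboid is usable only if the query applies the same filter.
    for level, value in cuboid_filter.items():
        if query_filter.get(level) != value:
            return False
    # Levels the cuboid must still expose: group-by levels plus unapplied filters.
    needed = set(query_levels) | {l for l in query_filter if l not in cuboid_filter}
    finest = {}
    for level in cuboid_levels:
        dim, rank = LEVEL[level]
        finest[dim] = rank
    for level in needed:
        dim, rank = LEVEL[level]
        # The cuboid must hold this dimension at the same or a finer level.
        if dim not in finest or finest[dim] > rank:
            return False
    return True


query = (["brand", "province_or_state"], {"year": 2004})
cuboids = {
    1: (["year", "item_name", "city"], {}),
    2: (["year", "brand", "country"], {}),
    3: (["year", "brand", "province_or_state"], {}),
    4: (["item_name", "province_or_state"], {"year": 2004}),
}
usable = [n for n, (levels, flt) in cuboids.items() if can_answer(levels, flt, *query)]
print(usable)  # [1, 3, 4]
```

Cuboid 2 is ruled out because country is coarser than province_or_state, and a coarser aggregate cannot be broken back down.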

25 What to Materialize?
• Store in warehouse results useful for common queries
• Example: materialize total sales
[Figure: sales for day 1 and day 2 aggregated into total sales]

26 Materialization Factors
• Type/frequency of queries
• Query response time
• Storage cost
• Update cost
Will study a concrete algorithm later

27 Iceberg Cube
• Compute only the cuboid cells whose count or other aggregate satisfies a condition such as HAVING COUNT(*) >= minsup
• Motivation
    ◦ Only a small portion of cube cells may be “above the water” in a sparse cube
    ◦ Only calculate “interesting” cells: data above a certain threshold
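A deliberately naive sketch of the definition: count every cell of every cuboid, then keep only cells with COUNT(*) >= minsup. It shows what an iceberg cube contains, not how to compute one efficiently, since it does no pruning. The function name and sample rows are hypothetical.

```python
from collections import Counter
from itertools import combinations


def iceberg_cube(rows, dimensions, minsup):
    """For each cuboid, keep only the cells whose row count is at least minsup."""
    result = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            counts = Counter(tuple(row[d] for d in dims) for row in rows)
            kept = {cell: n for cell, n in counts.items() if n >= minsup}
            if kept:
                result[dims] = kept
    return result


# Hypothetical base table.
sales = [
    {"city": "A", "product": "milk"},
    {"city": "A", "product": "milk"},
    {"city": "A", "product": "soda"},
    {"city": "B", "product": "milk"},
]
for dims, cells in iceberg_cube(sales, ("city", "product"), minsup=2).items():
    print(dims, cells)
```

Because COUNT can only shrink as more dimensions are added, a cell that fails the threshold cannot have any qualifying descendants, which is the property efficient iceberg algorithms exploit.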

28 Challenges in MOLAP
• Storing large arrays for efficient access
    ◦ Row-major, column-major
    ◦ Chunking
    ◦ Compressing sparse arrays
• Creating array data from data in tables
• Efficient techniques for cube computation
Topics are discussed in the paper for reading