DATA CUBE Advanced Databases 584.



DATA CUBE
Outline: Motivation, Relational Data Extraction, Few Extensions, Group-by Fails, CUBE, ROLLUP

Motivation - Data Analysis
Visualization tools should display data trends and clusters in N-dimensional space.
They should allow dimensionality reduction and data summarization.
Goal: look for unusual patterns in the data.


N-dimensional Data in 2D SQL tables

Time     Latitude  Longitude  Altitude  Temp  Pressure
Date:T1  La1       Lo1        A1        T1    P1
Date:T2  La2       Lo2        A2        T2    P2

N dimensions map onto an N-attribute domain.
For visualization, dimensionality is reduced via aggregation, which summarizes the data along the neglected dimensions.
November 9, 2018

N-dimensional Data in 2D SQL tables

Time     Latitude  Longitude  Altitude  Temp  Pressure
Date:T1  La1       Lo1        A1        T1    P1
Date:T2  La2       Lo2        A2        T2    P2

Steps: Query + Extract, then Visualize.
Each row in the Weather table above represents a weather measurement.
SQL standard aggregators: COUNT(), SUM(), MIN(), MAX(), AVG()
E.g.: SELECT AVG(Temp) FROM Weather;
  -> Returns a single aggregate value over all tuples.
GROUP BY:
  -> Returns one aggregate value for each group in the result set, where groups are formed on one or more columns.
E.g.: SELECT Time, Altitude, AVG(Temp) FROM Weather GROUP BY Time, Altitude;
Complexity?
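The two queries on this slide can be tried out with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient SQL engine; the sample rows and their values are made up for illustration and are not part of the deck.

```python
import sqlite3

# Hypothetical sample rows: (Time, Latitude, Longitude, Altitude, Temp, Pressure).
ROWS = [
    ("2018-11-09", 51.0, -114.1, 1000, 10.0, 900.0),
    ("2018-11-09", 51.0, -114.1, 1000, 14.0, 902.0),
    ("2018-11-09", 49.3, -123.1, 0, 20.0, 1010.0),
    ("2018-11-10", 51.0, -114.1, 1000, 6.0, 899.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Weather (Time TEXT, Latitude REAL, Longitude REAL, "
    "Altitude REAL, Temp REAL, Pressure REAL)"
)
conn.executemany("INSERT INTO Weather VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Plain aggregate: a single value computed over all tuples.
overall_avg = conn.execute("SELECT AVG(Temp) FROM Weather").fetchone()[0]

# GROUP BY: one aggregate value per (Time, Altitude) group.
grouped = conn.execute(
    "SELECT Time, Altitude, AVG(Temp) FROM Weather "
    "GROUP BY Time, Altitude ORDER BY Time, Altitude"
).fetchall()

print(overall_avg)
for row in grouped:
    print(row)
```

The first query collapses the whole table into one number, while the second keeps one row per distinct (Time, Altitude) pair.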


Few Extensions
Informix Illustra - handles for user-defined callbacks:
  Init(&handle): allocates the handle and initializes the aggregate computation.
  Iter(&handle, value): aggregates the next value into the current aggregate.
  value = Final(&handle): computes and returns the resulting aggregate using data saved in the handle.
Red Brick Systems:
  Rank(expression)
  N_tile(expression, n)
  Ratio_To_Total(expression)
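The Init / Iter / Final pattern can be illustrated with a rough analogue. This is not the Illustra API: it is a sketch using Python's sqlite3 module, whose user-defined aggregates follow the same three-step shape (constructor, step, finalize). The aggregate name spread and the sample values are hypothetical.

```python
import sqlite3


class Spread:
    """Hypothetical aggregate: max minus min, built from three callbacks."""

    def __init__(self):
        # Plays the role of Init: allocate and initialise the state.
        self.low = None
        self.high = None

    def step(self, value):
        # Plays the role of Iter: fold the next value into the state.
        if value is None:
            return
        self.low = value if self.low is None else min(self.low, value)
        self.high = value if self.high is None else max(self.high, value)

    def finalize(self):
        # Plays the role of Final: compute the result from the saved state.
        if self.low is None:
            return None
        return self.high - self.low


conn = sqlite3.connect(":memory:")
conn.create_aggregate("spread", 1, Spread)
conn.execute("CREATE TABLE Weather (Temp REAL)")
conn.executemany("INSERT INTO Weather VALUES (?)", [(10.0,), (14.0,), (6.0,)])

temp_spread = conn.execute("SELECT spread(Temp) FROM Weather").fetchone()[0]
print(temp_spread)
```

Keeping all state in the handle (here, the object) is what lets the engine call the aggregate once per group without the function knowing about grouping at all.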


Issues with GROUP BY - Histogram
Data analysis is difficult with these SQL aggregation constructs.
Issue: standard SQL does not allow aggregation directly over computed categories.

Time     Latitude  Longitude  Altitude  Temp  Pressure
Date:T1  La1       Lo1        A1        T1    P1
Date:T2  La2       Lo2        A2        T2    P2

Issues with GROUP BY - Histogram
Suppose the user wants to group by day and area and report Max(Temp) for each day:
  SELECT day, nation, MAX(Temp)
  FROM Weather
  GROUP BY Day(Time) AS day, Nation(Latitude, Longitude) AS nation;
Not possible in standard SQL.

Day  Nation  Max(Temp)
1    US      67
     Canada  57

Issues with GROUP BY - Histogram
Solution: compute the categories indirectly in a table-valued expression, which is then aggregated:
  SELECT day, nation, MAX(Temp)
  FROM (SELECT Day(Time) AS day,
               Nation(Latitude, Longitude) AS nation,
               Temp
        FROM Weather) AS foo
  GROUP BY day, nation;
Issue: complex nested queries!
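A runnable sketch of the nested-query workaround follows. It uses Python's sqlite3 module, with Day and Nation registered as user-defined scalar functions. Both function bodies are assumptions made for illustration (the Nation rule in particular is a toy latitude cut-off, not real geography), as are the sample rows.

```python
import sqlite3


def day_of(timestamp):
    # "YYYY-MM-DD HH:MM" -> "YYYY-MM-DD"
    return timestamp[:10]


def nation_of(latitude, longitude):
    # Toy rule for illustration only: everything at or above 49N is "Canada".
    return "Canada" if latitude >= 49.0 else "US"


conn = sqlite3.connect(":memory:")
conn.create_function("Day", 1, day_of)
conn.create_function("Nation", 2, nation_of)
conn.execute(
    "CREATE TABLE Weather (Time TEXT, Latitude REAL, Longitude REAL, Temp REAL)"
)
conn.executemany(
    "INSERT INTO Weather VALUES (?, ?, ?, ?)",
    [
        ("2018-11-09 06:00", 40.7, -74.0, 61.0),
        ("2018-11-09 15:00", 40.7, -74.0, 67.0),
        ("2018-11-09 06:00", 51.0, -114.1, 50.0),
        ("2018-11-09 15:00", 51.0, -114.1, 57.0),
    ],
)

# The inner SELECT materialises the computed categories as ordinary columns;
# the outer SELECT can then group on them.
rows = conn.execute(
    "SELECT day, nation, MAX(Temp) "
    "FROM (SELECT Day(Time) AS day, "
    "             Nation(Latitude, Longitude) AS nation, "
    "             Temp "
    "      FROM Weather) AS foo "
    "GROUP BY day, nation ORDER BY nation"
).fetchall()

for row in rows:
    print(row)
```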

Issues with GROUP BY - Rollups

Model  Year  Color  Sales by Model by Year by Color  Sales by Model by Year  Sales by Model
Chevy  1994  Black  50
             White  40
                                                     90
       1995  Black  85
             White  115
                                                     200
                                                                             290

Not relational!
Rolling up N attributes gives rise to 2^N possible aggregation columns.
Moving from the detail toward the totals is a roll-up; moving back toward the detail is a drill-down.

Issues with GROUP BY - Rollups
Solution 1: carry every sub-total as an extra column on each row.

Model  Year  Color  Sales  Sales by Model by Year  Sales by Model
Chevy  1994  Black  50     90                      290
Chevy  1994  White  40     90                      290
Chevy  1995  Black  85     200                     290
Chevy  1995  White  115    200                     290

Enormous number of domains: the number of columns grows as the power set of the number of aggregated attributes.

Issues with GROUP BY - Rollups
Solution 2: pivot table in Excel, pivoting on the Color column.

Sum Sales  Year/Color
Model      1994 Black  1994 White  1994 total  1995 Black  1995 White  1995 total  Grand Total
Chevy      50          40          90          85          115         200         290

Explosion in the number of columns!
-> Transposes the spreadsheet.
-> Aggregates cells based on the values in the pivot column.
-> The number of columns increases with the number of pivot values.

Issues with GROUP BY - Rollups
Another solution: avoid the exponential growth in columns by using ALL.
The dummy value "ALL" is added to fill in the super-aggregation rows.

Model  Year  Color  Units
Chevy  1994  Black  50
Chevy  1994  White  40
Chevy  1994  ALL    90
Chevy  1995  Black  85
Chevy  1995  White  115
Chevy  1995  ALL    200
Chevy  ALL   ALL    290

Issues with GROUP BY - Rollups
SQL statement to build this table from the Sales data:
  SELECT 'ALL', 'ALL', 'ALL', SUM(Sales)
  FROM Sales WHERE Model = 'Chevy'
  UNION
  SELECT Model, 'ALL', 'ALL', SUM(Sales)
  FROM Sales WHERE Model = 'Chevy'
  GROUP BY Model
  UNION
  SELECT Model, Year, 'ALL', SUM(Sales)
  FROM Sales WHERE Model = 'Chevy'
  GROUP BY Model, Year
  UNION
  SELECT Model, Year, Color, SUM(Sales)
  FROM Sales WHERE Model = 'Chevy'
  GROUP BY Model, Year, Color;

Model  Year  Color  Units
Chevy  1994  Black  50
Chevy  1994  White  40
Chevy  1994  ALL    90
Chevy  1995  Black  85
Chevy  1995  White  115
Chevy  1995  ALL    200
Chevy  ALL   ALL    290
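The union of GROUP BYs can be run end to end as a sketch. It uses Python's sqlite3 module with the Chevy figures from the slide; the single Ford row and the choice to store Year as text (so it shares a column with the string 'ALL') are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Model TEXT, Year TEXT, Color TEXT, Sales INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("Chevy", "1994", "Black", 50),
        ("Chevy", "1994", "White", 40),
        ("Chevy", "1995", "Black", 85),
        ("Chevy", "1995", "White", 115),
        ("Ford", "1994", "Black", 50),  # filtered out by the WHERE clauses
    ],
)

# One GROUP BY per aggregation level, glued together with UNION.
ROLLUP_BY_UNION = """
SELECT 'ALL', 'ALL', 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy'
UNION
SELECT Model, 'ALL', 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy'
GROUP BY Model
UNION
SELECT Model, Year, 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy'
GROUP BY Model, Year
UNION
SELECT Model, Year, Color, SUM(Sales) FROM Sales WHERE Model = 'Chevy'
GROUP BY Model, Year, Color
"""

rows = conn.execute(ROLLUP_BY_UNION).fetchall()
for row in rows:
    print(row)
```

Note that the first SELECT contributes an (ALL, ALL, ALL) grand-total row in addition to the seven Chevy rows listed in the table.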

Issues with GROUP BY - Cross Tabulation
What is cross-tabulation?
A symmetric aggregation result table.
A cross-tab is a two-dimensional aggregation. If further attributes are added to the aggregation, the result grows in dimension.

Issues with GROUP BY - Cross Tabulation
Each cross-tab has an equivalent relational representation in terms of 'ALL' values.
Explosion in the number of cross-tabs!

Chevy Sales Cross Tab:
Chevy       1994  1995  Total(ALL)
Black       50    85    135
White       40    115   155
Total(ALL)  90    200   290

Ford Sales Cross Tab:
Ford        1994  1995  Total(ALL)
Black       50    85    135
White       10    75    85
Total(ALL)  60    160   220
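The equivalence between the 'ALL' rows and the cross-tab can be shown with a short sketch that lays the relational rows back out as a grid. The function name cross_tab and the row layout (Year, Color, Units) are choices made for this example.

```python
ALL = "ALL"

# Relational form of the Chevy cross-tab: (Year, Color, Units), including ALL rows.
chevy_rows = [
    ("1994", "Black", 50), ("1994", "White", 40), ("1994", ALL, 90),
    ("1995", "Black", 85), ("1995", "White", 115), ("1995", ALL, 200),
    (ALL, "Black", 135), (ALL, "White", 155), (ALL, ALL, 290),
]


def cross_tab(rows, years, colors, corner):
    """Arrange (year, color, units) rows as a grid: one row per color."""
    cell = {(year, color): units for year, color, units in rows}

    def label(value):
        return "Total(ALL)" if value == ALL else value

    grid = [[corner] + [label(year) for year in years]]
    for color in colors:
        grid.append([label(color)] + [cell[(year, color)] for year in years])
    return grid


table = cross_tab(chevy_rows, ["1994", "1995", ALL], ["Black", "White", ALL], "Chevy")
for line in table:
    print(line)
```

The ALL rows supply exactly the margin cells of the grid, which is why no extra columns are needed in the relational form.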

Why CUBE instead of GROUP BYs
-> GROUP BYs solve aggregation.
-> But expressing roll-ups and cross-tabs with them is daunting!
E.g., a 6-dimensional cross-tab requires a 64-way union of 64 different GROUP BY operators.
Solution: let's look at the CUBE!
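The figure of 64 comes from counting the subsets of six grouping columns, one GROUP BY per subset. A small sketch, with made-up dimension names d1 to d6:

```python
from itertools import combinations


def all_groupings(columns):
    """Every subset of the columns, from all of them down to the empty set."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]


dims = ["d1", "d2", "d3", "d4", "d5", "d6"]  # hypothetical dimension names
groupings_6d = all_groupings(dims)
print(len(groupings_6d))
```

Each subset corresponds to one GROUP BY in the union, with the remaining columns filled with ALL.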


CUBE Operators
A generalization of group-by, roll-up and cross-tab.
The lower-dimensional aggregates (dimension N-1 and below) appear as points, lines, planes, cubes, or hyper-cubes.

Data Cubes : What, Why, Where, How
What? An operator in OLAP for fast analysis of data.
Why? Data must be manipulated and analyzed from multiple perspectives. Do traditional relational databases fit this kind of multidimensional data analysis?
Where? Used in OLAP.

Data Cubes : What, When, Why, How

OLAP
On-Line Analytical Processing: a type of data processing that allows decision makers to examine data according to the dimensions of the business.
OLAP is typically run against the data held in a data warehouse.
OLAP systems are built upon multidimensional databases, i.e., they are designed from the ground up to support the definition and querying of multidimensional data cubes/structures.
The core of any OLAP system is an OLAP cube, a.k.a. a data cube.

OLAP CUBE
Example: company sales of a part to a customer at a store location.

Combination              Count  Combination                Count
{P1, Calgary, Vance}     2      {P2, Vancouver, Richard}   11
{P2, Calgary, Vance}     4      {P3, Vancouver, Richard}   9
{P3, Calgary, Vance}     1      {P4, Vancouver, Richard}
{P1, Toronto, Vance}     5      {P5, Vancouver, Richard}
{P3, Toronto, Vance}     8      {P1, Calgary, Richard}
{P5, Toronto, Vance}            {P2, Calgary, Richard}
{P5, Montreal, Vance}           {P3, Calgary, Richard}
{P1, Vancouver, Bob}     3      {P2, Calgary, Allison}
{P3, Vancouver, Bob}            {P3, Calgary, Allison}
{P5, Vancouver, Bob}            {P1, Toronto, Allison}
{P1, Montreal, Bob}             {P2, Toronto, Allison}
{P3, Montreal, Bob}             {P3, Toronto, Allison}     6
{P4, Montreal, Bob}      7      {P4, Toronto, Allison}
{P5, Montreal, Bob}

Create a Data CUBE
Generate the power set (the set of all subsets) of the aggregation columns by overloading the GROUP BY operator.
The data cube operator builds a table containing all of these aggregate values.
The total aggregate using function f() is represented as the tuple: ALL, ALL, ALL, ..., ALL, f(*)
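The power-set construction can be sketched in a few lines of plain Python. This is a naive illustration of the semantics (one pass over the data per subset of columns), not an efficient cube algorithm. The function name cube and the row layout (dimensions first, measure last) are assumptions for this example; the data are the Chevy sales used earlier in the deck.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"


def cube(rows, n_dims, agg=sum):
    """Aggregate rows of (d1, ..., dn, measure) over every subset of the dims.

    Returns a dict mapping each key tuple (with ALL substituted for the
    aggregated-away columns) to the aggregate of the matching measures.
    """
    result = {}
    for size in range(n_dims + 1):
        for kept in combinations(range(n_dims), size):
            groups = defaultdict(list)
            for row in rows:
                key = tuple(row[i] if i in kept else ALL for i in range(n_dims))
                groups[key].append(row[n_dims])
            for key, values in groups.items():
                result[key] = agg(values)
    return result


sales = [
    ("Chevy", "1994", "Black", 50),
    ("Chevy", "1994", "White", 40),
    ("Chevy", "1995", "Black", 85),
    ("Chevy", "1995", "White", 115),
]
chevy_cube = cube(sales, 3)
for key in sorted(chevy_cube):
    print(key, chevy_cube[key])
```

With one model, two years and two colors, the result has (1+1)(2+1)(2+1) = 18 rows, including the total aggregate keyed by (ALL, ALL, ALL).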


Create a Data CUBE
CUBE is a relational operator.
If C1, C2, ..., CN are the cardinalities of the N attributes, then the cardinality of the resulting cube relation is ∏(Ci + 1), because each domain gains the extra value ALL.
E.g., if the initial SALES table has 2 * 3 * 3 = 18 rows, the derived data cube has 3 * 4 * 4 = 48 rows.
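Written out, the row count on this slide works as follows. The worked numbers assume, as the slide's example does, that every combination of attribute values occurs in the base table; for sparser data the product is an upper bound.

```latex
\[
  |\mathrm{CUBE}| \;=\; \prod_{i=1}^{N} (C_i + 1)
\]
\[
  (C_1, C_2, C_3) = (2, 3, 3):\qquad
  \prod_{i=1}^{3} C_i = 2 \cdot 3 \cdot 3 = 18,
  \qquad
  \prod_{i=1}^{3} (C_i + 1) = 3 \cdot 4 \cdot 4 = 48
\]
\[
  \text{super-aggregate rows} \;=\; 48 - 18 \;=\; 30
\]
```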

CUBE - Syntax and Semantics
  SELECT day, nation, MAX(Temp)
  FROM Weather
  GROUP BY CUBE Day(Time) AS day, Country(Latitude, Longitude) AS nation;
Semantics: first aggregates over all the <select list> attributes, as in a standard GROUP BY. It then unions in each super-aggregate of the global cube, substituting ALL for the aggregation columns.
  -> If there are N attributes in the <select list>, there will be 2^N - 1 super-aggregate values.


ROLL-UP Operator
To compute only a limited set of aggregates such as a roll-up report, CUBE would be overkill.
SOLUTION: ROLLUP!

ROLL-UP Operator
Difference:
  -> CUBE generates a result set that shows aggregates for all combinations of values in the selected columns.
  -> ROLLUP generates a result set that shows aggregates for a hierarchy of values in the selected columns.
ROLLUP produces just the super-aggregates:
  (v1, v2, ..., vn, f()),
  (v1, v2, ..., ALL, f()),
  ...
  (v1, ALL, ..., ALL, f()),
  (ALL, ALL, ..., ALL, f()).
Advantage:
  -> Unlike CUBE, whose answer set is multi-dimensional, the ROLLUP answer set is naturally sequential.
  -> Cumulative aggregates such as a running SUM() or AVG() work well with ROLLUP.
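The ROLLUP tuple pattern above can be sketched in plain Python: instead of every subset of the columns, only the prefixes of the column list are kept, and the trailing columns are replaced by ALL. The function name rollup and the row layout (dimensions first, measure last) are assumptions for this example.

```python
from collections import defaultdict

ALL = "ALL"


def rollup(rows, n_dims, agg=sum):
    """Aggregate rows of (d1, ..., dn, measure) over each prefix of the dims."""
    result = {}
    for kept in range(n_dims, -1, -1):  # keep the first `kept` columns
        groups = defaultdict(list)
        for row in rows:
            key = tuple(row[:kept]) + (ALL,) * (n_dims - kept)
            groups[key].append(row[n_dims])
        for key, values in groups.items():
            result[key] = agg(values)
    return result


sales = [
    ("Chevy", "1994", "Black", 50),
    ("Chevy", "1994", "White", 40),
    ("Chevy", "1995", "Black", 85),
    ("Chevy", "1995", "White", 115),
]
chevy_rollup = rollup(sales, 3)
for key in sorted(chevy_rollup):
    print(key, chevy_rollup[key])
```

For the Chevy data this yields 8 rows, against 18 for the full cube: combinations such as (ALL, ALL, Black) are never produced because they are not prefixes of the (Model, Year, Color) hierarchy.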

CUBE and ROLLUP Algebra
  SELECT Manufacturer, Year, Month, Day, Color, Model, SUM(price) AS Revenue
  FROM Sales
  GROUP BY Manufacturer,
    ROLLUP Year(Time) AS Year, Month(Time) AS Month, Day(Time) AS Day,
    CUBE Color, Model;
The clause has three parts: GROUP BY (Manufacturer), ROLLUP (Year, Month, Day) and CUBE (Color, Model).
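One way to read the combined clause is as a cross product of grouping sets: the plain GROUP BY columns are always kept, the ROLLUP list contributes each of its prefixes, and the CUBE list contributes each of its subsets. The sketch below enumerates them for the query on this slide. The helper names are made up for this example, and it only lists the grouping sets; it does not run the query.

```python
from itertools import combinations


def prefixes(columns):
    """ROLLUP part: every prefix of the list, longest first."""
    return [tuple(columns[:k]) for k in range(len(columns), -1, -1)]


def subsets(columns):
    """CUBE part: every subset of the list, largest first."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]


def combined_grouping_sets(group_by, rollup_cols, cube_cols):
    """GROUP BY columns are always present; the other two parts vary."""
    return [
        tuple(group_by) + r + c
        for r in prefixes(rollup_cols)
        for c in subsets(cube_cols)
    ]


sets = combined_grouping_sets(
    ["Manufacturer"], ["Year", "Month", "Day"], ["Color", "Model"]
)
for grouping in sets:
    print(grouping)
```

Four ROLLUP prefixes times four CUBE subsets gives 16 grouping sets, every one of which still includes Manufacturer.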

CUBE and ROLLUP Algebra
Syntax order: GROUP BY <select list> ROLLUP <select list> CUBE <select list>
The operators are listed in increasing order of power, with CUBE the most powerful.

CUBE and ROLLUP Algebra
A nice link illustrating the difference between ROLLUP and CUBE:
http://msdn.microsoft.com/en-US/library/ms189305(v=SQL.90).aspx