DATA CUBE Advanced Databases 584.

1 DATA CUBE Advanced Databases 584

2 DATA CUBE
Outline: Motivation, Relational Data Extraction, Few Extensions, Group-by Fails, CUBE, ROLLUP

3 Motivation - Data Analysis
Visualization tools should display data trends and clusters.
They represent the data as an N-dimensional space and allow dimensionality reduction.
Data summarization
Look for unusual patterns in data

5 N-dimensional Data in 2D SQL tables
Time     Latitude  Longitude  Altitude  Temp  Pressure
Date:T1  La1       Lo1        A1        T1    P1
Date:T2  La2       Lo2        A2        T2    P2
The N dimensions map onto an N-attribute domain.
For visualization, dimensionality is reduced by aggregating (summarizing) the data along the neglected dimensions.
November 9, 2018

6 N-dimensional Data in 2D SQL tables
Time     Latitude  Longitude  Altitude  Temp  Pressure
Date:T1  La1       Lo1        A1        T1    P1
Date:T2  La2       Lo2        A2        T2    P2
Each row in the Weather table above represents a weather measurement.
SQL standard aggregate functions: COUNT(), SUM(), MIN(), MAX(), AVG()
E.g.: SELECT AVG(Temp) FROM Weather;
-> Returns a single aggregate value over all tuples.
GROUP BY:
-> Returns the aggregate value for each group in the result set, grouped by one or more columns.
E.g.: SELECT Time, Altitude, AVG(Temp) FROM Weather GROUP BY Time, Altitude;
Complexity?
Diagram labels: Query + Extract, Visualize
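The two queries on this slide can be tried out with a small sketch like the one below. It uses Python's built-in sqlite3 module purely for convenience; the sample rows are invented for illustration, since the slide gives only symbolic values such as La1 and Lo1.

```python
import sqlite3

# Hypothetical sample measurements (the slides give only symbolic values).
rows = [
    ("2018-11-09", 37.0, -122.0, 10, 60.0, 1010.0),
    ("2018-11-09", 37.0, -122.0, 10, 64.0, 1012.0),
    ("2018-11-09", 49.0, -123.0, 500, 50.0, 990.0),
    ("2018-11-10", 49.0, -123.0, 500, 54.0, 992.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Weather (Time TEXT, Latitude REAL, Longitude REAL, "
    "Altitude INTEGER, Temp REAL, Pressure REAL)"
)
conn.executemany("INSERT INTO Weather VALUES (?, ?, ?, ?, ?, ?)", rows)

# Plain aggregate: one value for the whole table.
overall_avg = conn.execute("SELECT AVG(Temp) FROM Weather").fetchone()[0]

# GROUP BY: one aggregate value per (Time, Altitude) group.
grouped_avg = conn.execute(
    "SELECT Time, Altitude, AVG(Temp) FROM Weather "
    "GROUP BY Time, Altitude ORDER BY Time, Altitude"
).fetchall()

print(overall_avg)
for group in grouped_avg:
    print(group)
```

The first query collapses the table to a single number, while the second returns one row per distinct (Time, Altitude) pair.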

8 Few Extensions
Informix Illustra: handles for user-defined callbacks
Init(&handle): Allocates the handle and initializes the aggregate computation.
Iter(&handle, value): Aggregates the next value into the current aggregate.
value = Final(&handle): Computes and returns the resulting aggregate by using data saved in the handle.
Red Brick Systems:
Rank(expression)
N_tile(expression, n)
Ratio_To_Total(expression)
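The Init / Iter / Final protocol can be illustrated with a short sketch. This is not Illustra's actual API: the class and function names below are hypothetical stand-ins that only mirror the three callbacks named on the slide, here used to define an average.

```python
class AvgHandle:
    """Scratch state for a running average (hypothetical, for illustration)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0


def init():
    # Init: allocate the handle and initialize the aggregate computation.
    return AvgHandle()


def iter_(handle, value):
    # Iter: fold the next value into the current aggregate.
    handle.total += value
    handle.count += 1


def final(handle):
    # Final: compute the result from the data saved in the handle.
    return handle.total / handle.count if handle.count else None


def aggregate(values):
    # What the database engine would do for one group of rows.
    handle = init()
    for value in values:
        iter_(handle, value)
    return final(handle)


avg_temp = aggregate([67, 57, 62])
print(avg_temp)
```

Because all state lives in the handle, the engine can run the same three callbacks once per group without knowing what the aggregate computes.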

10 Issues with GROUP BY - Histogram
Data analysis is difficult with these SQL aggregation constructs.
Issue: SQL does not allow direct construction of aggregates over computed categories (histograms).
Time     Latitude  Longitude  Altitude  Temp  Pressure
Date:T1  La1       Lo1        A1        T1    P1
Date:T2  La2       Lo2        A2        T2    P2

11 Issues with GROUP BY - Histogram
Suppose the user wants to group by day and nation and report the maximum temperature for each group:
SELECT day, nation, MAX(Temp)
FROM Weather
GROUP BY Day(Time) AS day, Nation(Latitude, Longitude) AS nation;
Not possible in standard SQL.
Day  Nation  Max(Temp)
1    US      67
     Canada  57

12 Issues with GROUP BY - Histogram
Solution: compute the categories indirectly in a table-valued expression, which is then aggregated.
SELECT day, nation, MAX(Temp)
FROM (SELECT Day(Time) AS day, Nation(Latitude, Longitude) AS nation, Temp
      FROM Weather) AS foo
GROUP BY day, nation;
Issue: complex nested queries!
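To see the nested-query workaround run end to end, here is a minimal sketch using sqlite3. Day() and Nation() are not built-in SQL functions, so the versions registered below are hypothetical stand-ins (Nation simply splits on the 49th parallel), and the sample rows are invented.

```python
import sqlite3


def day(time_text):
    # Hypothetical Day(): keep the date part of an ISO timestamp.
    return time_text[:10]


def nation(latitude, longitude):
    # Hypothetical Nation(): a crude stand-in that splits on the 49th parallel.
    return "Canada" if latitude >= 49.0 else "US"


conn = sqlite3.connect(":memory:")
conn.create_function("Day", 1, day)
conn.create_function("Nation", 2, nation)
conn.execute(
    "CREATE TABLE Weather (Time TEXT, Latitude REAL, Longitude REAL, Temp REAL)"
)
conn.executemany(
    "INSERT INTO Weather VALUES (?, ?, ?, ?)",
    [
        ("2018-11-09T06:00", 37.8, -122.4, 61.0),
        ("2018-11-09T15:00", 37.8, -122.4, 67.0),
        ("2018-11-09T15:00", 51.0, -114.1, 57.0),
        ("2018-11-09T21:00", 51.0, -114.1, 49.0),
    ],
)

# The inner query computes the categories; the outer query aggregates them.
histogram = conn.execute(
    """
    SELECT day, nation, MAX(Temp)
    FROM (SELECT Day(Time) AS day,
                 Nation(Latitude, Longitude) AS nation,
                 Temp
          FROM Weather) AS foo
    GROUP BY day, nation
    ORDER BY nation
    """
).fetchall()

for row in histogram:
    print(row)
```

The inner SELECT is what makes the computed columns visible to GROUP BY, which is exactly the extra nesting the slide complains about.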

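13 Issues with GROUP BY - Rollups
Model  Year  Color  Sales by Model by Year by Color  Sales by Model by Year  Sales by Model
Chevy  1994  Black  50
             White  40
                                                     90
       1995  Black  85
             White  115
                                                     200
                                                                             290
Not relational! The empty cells are null values, so the rows cannot form a key.
Rolling up over N attributes gives rise to up to 2^N possible aggregation columns.
Moving toward the totals is a roll-up; moving toward the detail rows is a drill-down.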

14 Issues with GROUP BY - Rollups
Solution 1: repeat the aggregates on every row.
Model  Year  Color  Sales  Sales by Model by Year  Sales by Model
Chevy  1994  Black  50     90                      290
Chevy  1994  White  40     90                      290
Chevy  1995  Black  85     200                     290
Chevy  1995  White  115    200                     290
Enormous number of domains.
The number of columns grows as the power set of the number of aggregated attributes.

15 Issues with GROUP BY - Rollups
Solution 2: Pivot table in Excel, pivoting on the Color column.
Sum of Sales by Model (rows) and Year/Color (columns):
Model  1994 Black  1994 White  1994 Total  1995 Black  1995 White  1995 Total  Grand Total
Chevy  50          40          90          85          115         200         290
-> Pivoting transposes the spreadsheet.
-> Values are aggregated according to the values found in the pivot column.
-> The number of columns grows with the number of distinct values in the pivoted columns.
Explosion in the number of columns!

16 Issues with GROUP BY - Rollups
Another solution: prevent the exponential growth in columns by using ALL.
The dummy value "ALL" is added to fill in the super-aggregation terms.
Model  Year  Color  Units
Chevy  1994  Black  50
Chevy  1994  White  40
Chevy  1994  ALL    90
Chevy  1995  Black  85
Chevy  1995  White  115
Chevy  1995  ALL    200
Chevy  ALL   ALL    290

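17 Issues with GROUP BY - Rollups
SQL statement to build this table from the Sales data:
SELECT 'ALL', 'ALL', 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy'
UNION
SELECT Model, 'ALL', 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy' GROUP BY Model
UNION
SELECT Model, Year, 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy' GROUP BY Model, Year
UNION
SELECT Model, Year, Color, SUM(Sales) FROM Sales WHERE Model = 'Chevy' GROUP BY Model, Year, Color;
Model  Year  Color  Units
Chevy  1994  Black  50
Chevy  1994  White  40
Chevy  1994  ALL    90
Chevy  1995  Black  85
Chevy  1995  White  115
Chevy  1995  ALL    200
Chevy  ALL   ALL    290
The first SELECT additionally yields an (ALL, ALL, ALL, 290) grand-total row, which the table does not show.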
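The four-way UNION can be checked against the Chevy figures with a sketch like the following. It assumes a Sales table with columns Model, Year, Color and Sales; Year is stored as text here only so that the string 'ALL' can share a column with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Model TEXT, Year TEXT, Color TEXT, Sales INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("Chevy", "1994", "Black", 50),
        ("Chevy", "1994", "White", 40),
        ("Chevy", "1995", "Black", 85),
        ("Chevy", "1995", "White", 115),
    ],
)

# One SELECT per level of the roll-up, glued together with UNION.
rollup_sql = """
SELECT 'ALL' AS Model, 'ALL' AS Year, 'ALL' AS Color, SUM(Sales) AS Units
  FROM Sales WHERE Model = 'Chevy'
UNION
SELECT Model, 'ALL', 'ALL', SUM(Sales)
  FROM Sales WHERE Model = 'Chevy' GROUP BY Model
UNION
SELECT Model, Year, 'ALL', SUM(Sales)
  FROM Sales WHERE Model = 'Chevy' GROUP BY Model, Year
UNION
SELECT Model, Year, Color, SUM(Sales)
  FROM Sales WHERE Model = 'Chevy' GROUP BY Model, Year, Color
"""

summary = set(conn.execute(rollup_sql).fetchall())
for row in sorted(summary):
    print(row)
```

Each added dimension costs another SELECT in the roll-up, and a full cross-tab needs one per subset of the columns, which is where the approach stops scaling.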

18 Issues with GROUP BY - Cross Tabulation
What is cross-tabulation?
-> A symmetric aggregation result table.
-> A cross-tab is a two-dimensional aggregation.
-> If more aggregation attributes are added, it grows in dimension.

19 Issues with GROUP BY - Cross Tabulation
A cross-tab has an equivalent relational representation in terms of 'ALL' values.
Explosion in the number of cross-tabs!
Chevy Sales Cross Tab:
Chevy        1994  1995  Total (ALL)
Black        50    85    135
White        40    115   155
Total (ALL)  90    200   290
Ford Sales Cross Tab:
Ford         1994  1995  Total (ALL)
Black        50    85    135
White        10    75    85
Total (ALL)  60    160   220

20 Why CUBE instead of GROUP BYs
-> GROUP BY handles plain aggregation.
-> But expressing roll-ups and cross-tabs with it is daunting!
E.g., a 6-dimensional cross-tab requires a 64-way union of 64 different GROUP BY operators (2^6 = 64)!
Solution: Let's look at the CUBE!
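The 64-way figure can be reproduced by enumerating every subset of the grouping columns. The sketch below is only an illustration: the helper names and the six column names d1..d6 are hypothetical, and the generated SQL follows the 'ALL' convention used on the earlier slides.

```python
from itertools import combinations


def grouping_sets(columns):
    """Return every subset of `columns` (the power set), largest first."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]


def union_of_group_bys(table, columns, aggregate):
    """Build the UNION of GROUP BY queries that emulates a full cross-tab."""
    selects = []
    for kept in grouping_sets(columns):
        # Columns that are not grouped on are replaced by the literal 'ALL'.
        select_list = ", ".join(c if c in kept else "'ALL'" for c in columns)
        query = f"SELECT {select_list}, {aggregate} FROM {table}"
        if kept:
            query += " GROUP BY " + ", ".join(kept)
        selects.append(query)
    return "\nUNION\n".join(selects)


dims = ["d1", "d2", "d3", "d4", "d5", "d6"]  # hypothetical column names
sets_for_six = grouping_sets(dims)
sql_for_two = union_of_group_bys("Sales", ["Model", "Year"], "SUM(Sales)")

print(len(sets_for_six))
print(sql_for_two)
```

Two columns already need four SELECTs; six need 64, which is the motivation for a single CUBE operator.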

22 CUBE Operators
Generalization of group-by, roll-up and cross-tab.
The traditional GROUP BY generates the N-dimensional core of the data cube.
The N-1 lower-dimensional aggregates appear as points, lines, planes, cubes, or hyper-cubes hanging off that core.

23 Data Cubes : What, Why, Where, How
What? An operator used in OLAP for fast analysis of data.
Why? Data must be manipulated and analyzed from multiple perspectives. Do traditional relational databases fit this kind of multi-dimensional data analysis?
Where? Used in OLAP.

24 Data Cubes : What, When, Why, How

25 OLAP
On-Line Analytical Processing: a type of data processing that allows decision makers to examine data according to the dimensions of the business.
OLAP is typically performed on data held in a data warehouse.
OLAP systems are built upon multidimensional databases, i.e., they are designed from the ground up to support the definition and querying of multidimensional data cubes/structures.
The core of any OLAP system is an OLAP cube, a.k.a. a data cube.

26 OLAP CUBE
Company sales of a part to a customer at a store location

27
Combination                 Count
{P1, Calgary, Vance}        2
{P2, Calgary, Vance}        4
{P3, Calgary, Vance}        1
{P1, Toronto, Vance}        5
{P3, Toronto, Vance}        8
{P5, Toronto, Vance}
{P5, Montreal, Vance}
{P1, Vancouver, Bob}        3
{P3, Vancouver, Bob}
{P5, Vancouver, Bob}
{P1, Montreal, Bob}
{P3, Montreal, Bob}
{P4, Montreal, Bob}         7
{P5, Montreal, Bob}
{P2, Vancouver, Richard}    11
{P3, Vancouver, Richard}    9
{P4, Vancouver, Richard}
{P5, Vancouver, Richard}
{P1, Calgary, Richard}
{P2, Calgary, Richard}
{P3, Calgary, Richard}
{P2, Calgary, Allison}
{P3, Calgary, Allison}
{P1, Toronto, Allison}
{P2, Toronto, Allison}
{P3, Toronto, Allison}      6
{P4, Toronto, Allison}

28 Create a Data CUBE
Generate the power set (the set of all subsets) of the aggregation columns by overloading the GROUP BY operator.
The data cube operator builds a table containing all these aggregate values.
The total aggregate using function f() is represented as the tuple: ALL, ALL, ALL, ..., ALL, f(*)
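The power-set construction can be sketched in a few lines of plain Python. This is a naive illustration rather than how a database would implement the operator: it rescans the rows once per subset, uses SUM as the aggregate f(), and the function name cube is a hypothetical helper. The data are the Chevy figures from the earlier slides.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`, writing ALL for rolled-up columns."""
    totals = defaultdict(int)
    for size in range(len(dims) + 1):
        for kept in combinations(dims, size):  # one GROUP BY per subset
            for row in rows:
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)


sales = [
    {"Model": "Chevy", "Year": 1994, "Color": "Black", "Units": 50},
    {"Model": "Chevy", "Year": 1994, "Color": "White", "Units": 40},
    {"Model": "Chevy", "Year": 1995, "Color": "Black", "Units": 85},
    {"Model": "Chevy", "Year": 1995, "Color": "White", "Units": 115},
]

chevy_cube = cube(sales, ["Model", "Year", "Color"], "Units")
print(chevy_cube[(ALL, ALL, ALL)])        # the total aggregate tuple
print(chevy_cube[("Chevy", ALL, "Black")])  # a cross-tab margin
```

Unlike the roll-up table, the cube also contains rows such as (Chevy, ALL, Black), i.e. the margins of the cross-tab.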

29 Create a Data CUBE

30 Create a Data CUBE
CUBE is a relational operator.
If C1, C2, ..., CN are the cardinalities of the N attributes, then the resulting cube relation has ∏(Ci + 1) rows; the extra value in each domain is ALL.
E.g., the initial SALES table has 2 * 3 * 3 = 18 rows.
The derived data cube has 3 * 4 * 4 = 48 rows.
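The 18-row and 48-row figures can be confirmed by brute force. The sketch below assumes a dense SALES table with 2 models, 3 years and 3 colors; the concrete values are made up, since the slide gives only the cardinalities, and cube_keys is a hypothetical helper.

```python
from itertools import combinations, product
from math import prod

# Assumed attribute values; only the cardinalities 2, 3, 3 come from the slide.
models = ["Chevy", "Ford"]
years = [1990, 1991, 1992]
colors = ["red", "white", "blue"]

# Dense base table: one row per (model, year, color) combination.
base_rows = list(product(models, years, colors))


def cube_keys(rows):
    """Collect the distinct grouping keys of the full cube over `rows`."""
    width = len(rows[0])
    keys = set()
    for size in range(width + 1):
        for kept in combinations(range(width), size):
            for row in rows:
                keys.add(tuple(v if i in kept else "ALL" for i, v in enumerate(row)))
    return keys


cardinalities = [len(models), len(years), len(colors)]
predicted = prod(c + 1 for c in cardinalities)  # the product of (Ci + 1)
actual = len(cube_keys(base_rows))

print(len(base_rows), predicted, actual)
```

The formula is exact here because the base table is dense; a sparse table would produce fewer cube rows than the product suggests.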

31 CUBE - Syntax and Semantics
SELECT day, nation, MAX(Temp)
FROM Weather
GROUP BY CUBE Day(Time) AS day, Country(Latitude, Longitude) AS nation;
Semantics: CUBE first aggregates over all the <select list> attributes in the GROUP BY clause, as a standard GROUP BY does.
It then unions in each super-aggregate of the global cube, substituting ALL for the aggregation columns.
-> If there are N attributes in the <select list>, there will be 2^N - 1 super-aggregate values.

33 ROLL-UP Operator
To compute only a hierarchy of aggregates (e.g. by Model, Year, Color; by Model, Year; by Model), the full CUBE would be overkill!
SOLUTION: ROLLUP!

34 ROLL-UP Operator
Difference:
CUBE generates a result set that shows aggregates for all combinations of values in the selected columns.
ROLLUP generates a result set that shows aggregates for a hierarchy of values in the selected columns.
ROLLUP produces just the super-aggregates:
(v1, v2, ..., vn, f()),
(v1, v2, ..., ALL, f()),
...
(v1, ALL, ..., ALL, f()),
(ALL, ALL, ..., ALL, f()).
Advantage:
-> Unlike CUBE, whose answer set is multi-dimensional, the ROLLUP answer set is naturally sequential.
-> Cumulative aggregates, such as a running sum or running average, work well with ROLLUP.
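The tuple pattern above amounts to aggregating over successively shorter prefixes of the column list. A minimal sketch, using SUM as f() and the Chevy figures from the earlier slides (the function name rollup is a hypothetical helper):

```python
from collections import defaultdict

ALL = "ALL"


def rollup(rows, dims, measure):
    """Sum `measure` for each prefix of `dims`, replacing the dropped columns by ALL."""
    totals = defaultdict(int)
    for depth in range(len(dims), -1, -1):  # (v1..vn), (v1..ALL), ..., (ALL..ALL)
        for row in rows:
            key = tuple(row[d] if i < depth else ALL for i, d in enumerate(dims))
            totals[key] += row[measure]
    return dict(totals)


sales = [
    {"Model": "Chevy", "Year": 1994, "Color": "Black", "Units": 50},
    {"Model": "Chevy", "Year": 1994, "Color": "White", "Units": 40},
    {"Model": "Chevy", "Year": 1995, "Color": "Black", "Units": 85},
    {"Model": "Chevy", "Year": 1995, "Color": "White", "Units": 115},
]

chevy_rollup = rollup(sales, ["Model", "Year", "Color"], "Units")
for key, units in chevy_rollup.items():
    print(key, units)
```

For N columns ROLLUP needs only N + 1 groupings instead of the 2^N a CUBE would compute, so rows like (Chevy, ALL, Black) never appear.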

35 CUBE and ROLLUP Algebra
SELECT Manufacturer, Year, Month, Day, Color, Model, SUM(price) AS Revenue
FROM Sales
GROUP BY Manufacturer,
ROLLUP Year(Time) AS Year, Month(Time) AS Month, Day(Time) AS Day,
CUBE Color, Model;
The clause has three parts: a plain GROUP BY (Manufacturer), a ROLLUP (Year, Month, Day), and a CUBE (Color, Model).
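One way to read the compound clause is as a cross product of grouping sets: Manufacturer is always grouped on, the ROLLUP part contributes each prefix of (Year, Month, Day), and the CUBE part contributes each subset of (Color, Model). The sketch below enumerates those sets under that assumption, which matches how grouping sets combine in standard SQL; the helper names are hypothetical.

```python
from itertools import combinations, product


def rollup_sets(columns):
    """Prefixes of `columns`, longest first: the grouping sets of a ROLLUP."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """All subsets of `columns`, largest first: the grouping sets of a CUBE."""
    return [
        subset
        for n in range(len(columns), -1, -1)
        for subset in combinations(columns, n)
    ]


def compound_sets(group_by, rollup, cube):
    """Combine the three parts of the clause as a cross product."""
    return [
        tuple(group_by) + r + c
        for r, c in product(rollup_sets(rollup), cube_sets(cube))
    ]


sets = compound_sets(["Manufacturer"], ["Year", "Month", "Day"], ["Color", "Model"])
print(len(sets))
for grouping in sets:
    print(grouping)
```

Under this reading the query computes 1 x 4 x 4 = 16 groupings, far fewer than the 64 a CUBE over all six columns would need.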

36 CUBE and ROLLUP Algebra
General form:
GROUP BY <select list>
ROLLUP <select list>
CUBE <select list>
Ordered from least to most "powerful": GROUP BY, ROLLUP, CUBE.

37 CUBE and ROLLUP Algebra
A nice link illustrating the difference between ROLLUP and CUBE

