DATA CUBE Advanced Databases 584
Outline: Motivation, Relational Data Extraction, Few Extensions, Group-by Fails, CUBE, ROLLUP
Motivation - Data Analysis
Visualization tools should display data trends and clusters in N-dimensional space, allow dimensionality reduction and data summarization, and help look for unusual patterns in the data.
N-dimensional Data in 2D SQL tables
Time     Latitude  Longitude  Altitude  Temp  Pressure
Date:T1  La1       Lo1        A1        T1    P1
Date:T2  La2       Lo2        A2        T2    P2
N dimensions are mapped into an N-attribute domain. For visualization, dimensionality is reduced via aggregation, so that data along the neglected dimensions is still included.
November 9, 2018
N-dimensional Data in 2D SQL tables: Query + Extract, then Visualize
Each row in the Weather table above represents one weather measurement.
SQL standard aggregators: COUNT(), SUM(), MIN(), MAX(), AVG().
E.g. SELECT AVG(Temp) FROM Weather; -> returns a single aggregate value over all tuples.
GROUP BY -> returns the aggregate value for each group in the result set, grouped by one or more columns.
E.g. SELECT Time, Altitude, AVG(Temp) FROM Weather GROUP BY Time, Altitude;
Complexity?
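A minimal sketch of the two queries above, run with Python's built-in sqlite3 module against an in-memory Weather table. Only the column names come from the slide; the rows and their values are made up for illustration.

```python
import sqlite3

# In-memory Weather table; column names follow the slide, the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Weather (Time TEXT, Latitude REAL, Longitude REAL, "
    "Altitude REAL, Temp REAL, Pressure REAL)"
)
conn.executemany(
    "INSERT INTO Weather VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2018-11-09 06:00", 47.6, -122.3, 100.0, 10.0, 1012.0),
        ("2018-11-09 06:00", 47.6, -122.3, 500.0, 6.0, 960.0),
        ("2018-11-09 12:00", 47.6, -122.3, 100.0, 14.0, 1010.0),
        ("2018-11-09 12:00", 45.5, -73.6, 100.0, 4.0, 1015.0),
    ],
)

# A single aggregate value over all tuples.
(avg_all,) = conn.execute("SELECT AVG(Temp) FROM Weather").fetchone()

# One aggregate value per (Time, Altitude) group.
per_group = conn.execute(
    "SELECT Time, Altitude, AVG(Temp) FROM Weather "
    "GROUP BY Time, Altitude ORDER BY Time, Altitude"
).fetchall()

print(avg_all)  # 8.5
for row in per_group:
    print(row)
```

The first query collapses all four rows into one value; the second returns one row per distinct (Time, Altitude) pair, here three.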
Few Extensions
Informix Illustra: handles for user-defined callbacks
-> Init(&handle): allocates the handle and initializes the aggregate computation.
-> Iter(&handle, value): aggregates the next value into the current aggregate.
-> value = Final(&handle): computes and returns the resulting aggregate using the data saved in the handle.
Red Brick systems: Rank(expression), N_tile(expression, n), Ratio_To_Total(expression)
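A sketch of the Init/Iter/Final pattern using Python's sqlite3, whose create_aggregate() registers a class with step() and finalize() methods. This is SQLite's mechanism, not Illustra's API; the aggregate name my_avg is hypothetical. The Red Brick-style functions below are plain Python under assumed semantics (Rank: lowest value gets rank 1, highest gets N; N_tile: n ranges of roughly equal population; Ratio_To_Total: each value divided by the sum), and the names rank, n_tile and ratio_to_total are hypothetical.

```python
import sqlite3


class AvgHandle:
    """User-defined AVG. __init__ plays Init(&handle), step plays
    Iter(&handle, value), finalize plays value = Final(&handle)."""

    def __init__(self):
        # Init: allocate the state and initialize the aggregate computation.
        self.total = 0.0
        self.count = 0

    def step(self, value):
        # Iter: fold the next (non-NULL) value into the running aggregate.
        if value is not None:
            self.total += value
            self.count += 1

    def finalize(self):
        # Final: compute the result from the state saved in the handle.
        return self.total / self.count if self.count else None


conn = sqlite3.connect(":memory:")
conn.create_aggregate("my_avg", 1, AvgHandle)  # hypothetical aggregate name
conn.execute("CREATE TABLE t (x REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(1.0,), (2.0,), (6.0,)])
(uda_result,) = conn.execute("SELECT my_avg(x) FROM t").fetchone()


# Red Brick-style functions over a list of values (assumed semantics).
def rank(values):
    # Lowest value gets rank 1, highest gets rank N; ties share the lowest rank.
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def n_tile(values, n):
    # Split the sorted values into n ranges of roughly equal population and
    # return, for each value, the number (1..n) of the range holding it.
    # Equal values may be split across neighbouring ranges.
    order = sorted(range(len(values)), key=lambda i: values[i])
    tiles = [0] * len(values)
    for pos, i in enumerate(order):
        tiles[i] = pos * n // len(values) + 1
    return tiles


def ratio_to_total(values):
    # Divide each value by the sum of all values.
    total = sum(values)
    return [v / total for v in values]


data = [10, 40, 20, 30]
print(uda_result)            # 3.0
print(rank(data))            # [1, 4, 2, 3]
print(n_tile(data, 2))       # [1, 2, 1, 2]
print(ratio_to_total(data))  # [0.1, 0.4, 0.2, 0.3]
```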
Issues with GROUP BY - Histogram
Data analysis is difficult with these SQL aggregation constructs.
Issue: SQL does not allow aggregation directly over computed categories.
Issues with GROUP BY - Histogram
Suppose the user wants to group by day and area (nation) and get Max(Temp) for each group:
SELECT day, nation, MAX(Temp)
FROM Weather
GROUP BY Day(Time) AS day, Nation(Latitude, Longitude) AS nation;
Not possible in standard SQL.
Day  Nation  Max(Temp)
1    US      67
     Canada  57
Issues with GROUP BY - Histogram
Solution: compute the categories in a table-valued expression (a subquery), then aggregate its result:
SELECT day, nation, MAX(Temp)
FROM (SELECT Day(Time) AS day, Nation(Latitude, Longitude) AS nation, Temp
      FROM Weather) AS foo
GROUP BY day, nation;
Issue: complex nested queries!
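A sketch of the nested-query workaround in SQLite via Python's sqlite3. Day() and Nation() are not built-in SQL functions: here they are hypothetical helpers registered with create_function(), and Nation() uses a deliberately crude latitude cut-off purely for illustration. The table is reduced to the columns the query needs.

```python
import sqlite3


def day(time_str):
    # Hypothetical Day(): the date part of an ISO "YYYY-MM-DD HH:MM" string.
    return time_str[:10]


def nation(latitude, longitude):
    # Hypothetical, deliberately crude Nation(): north of 49 degrees is "Canada".
    # Real code would use a geographic lookup; longitude is ignored here.
    return "Canada" if latitude >= 49.0 else "US"


conn = sqlite3.connect(":memory:")
conn.create_function("Day", 1, day)
conn.create_function("Nation", 2, nation)
conn.execute("CREATE TABLE Weather (Time TEXT, Latitude REAL, Longitude REAL, Temp REAL)")
conn.executemany(
    "INSERT INTO Weather VALUES (?, ?, ?, ?)",
    [
        ("2018-11-09 06:00", 47.6, -122.3, 10.0),
        ("2018-11-09 15:00", 47.6, -122.3, 19.0),
        ("2018-11-09 14:00", 49.3, -123.1, 14.0),
        ("2018-11-10 14:00", 49.3, -123.1, 12.0),
    ],
)

# The nested form from the slide: compute the categories in a subquery,
# then group and aggregate over its result.
rows = conn.execute(
    """
    SELECT day, nation, MAX(Temp)
    FROM (SELECT Day(Time) AS day, Nation(Latitude, Longitude) AS nation, Temp
          FROM Weather) AS foo
    GROUP BY day, nation
    ORDER BY day, nation
    """
).fetchall()
print(rows)
```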
Issues with GROUP BY - Rollups
Model  Year  Color  Sales by Model by Year by Color  Sales by Model by Year  Sales by Model
Chevy  1994  Black  50
             White  40                               90
       1995  Black  85
             White  115                              200                     290
Roll up / Drill down
Not relational! Rolling up N attributes gives rise to 2^N possible aggregation columns.
Issues with GROUP BY - Rollups
Solution 1:
Model  Year  Color  Sales  Sales by Model by Year  Sales by Model
Chevy  1994  Black  50     90                      290
Chevy  1994  White  40     90                      290
Chevy  1995  Black  85     200                     290
Chevy  1995  White  115    200                     290
Enormous number of domains: the number of columns grows with the power set of the aggregated attributes.
Issues with GROUP BY - Rollups
Solution 2: Pivot table in Excel (pivot on the Color column, Sum of Sales)
Model  1994          1994 total  1995          1995 total  Grand Total
       Black  White              Black  White
Chevy  50     40     90          85     115    200         290
Explosion in the number of columns!
-> Transposes the spreadsheet.
-> Aggregates values based on their column values.
-> The number of columns increases as the pivot grows.
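A small Python sketch of what the pivot computes: one output column per (Year, Color) pair, plus a total per year and a grand total. The function name pivot_on_color is hypothetical; the data is the Chevy example above.

```python
from collections import defaultdict

# The Chevy sales from the slide as flat (Model, Year, Color, Sales) rows.
sales = [
    ("Chevy", 1994, "Black", 50),
    ("Chevy", 1994, "White", 40),
    ("Chevy", 1995, "Black", 85),
    ("Chevy", 1995, "White", 115),
]


def pivot_on_color(rows):
    """One output row per Model; one column per (Year, Color) pair,
    plus a (Year, "total") column per year and a "Grand Total" column."""
    table = defaultdict(lambda: defaultdict(int))
    for model, year, color, amount in rows:
        table[model][(year, color)] += amount
        table[model][(year, "total")] += amount
        table[model]["Grand Total"] += amount
    return {model: dict(cols) for model, cols in table.items()}


pivot = pivot_on_color(sales)
chevy = pivot["Chevy"]
print(len(chevy))              # 7 value columns
print(chevy[(1994, "total")])  # 90
print(chevy[(1995, "total")])  # 200
print(chevy["Grand Total"])    # 290
```

Each new year adds three more columns (Black, White, year total), and each new color adds one column per year, which is the column explosion noted above.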
Issues with GROUP BY - Rollups
Another solution: prevent exponential growth with ALL. The dummy value "ALL" is added to fill in the super-aggregation terms.
Model  Year  Color  Units
Chevy  1994  Black  50
Chevy  1994  White  40
Chevy  1994  ALL    90
Chevy  1995  Black  85
Chevy  1995  White  115
Chevy  1995  ALL    200
Chevy  ALL   ALL    290
Issues with GROUP BY - Rollups
SQL statement to build the ALL table above from the Sales data:
SELECT 'ALL', 'ALL', 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy'
UNION
SELECT Model, 'ALL', 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy' GROUP BY Model
UNION
SELECT Model, Year, 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy' GROUP BY Model, Year
UNION
SELECT Model, Year, Color, SUM(Sales) FROM Sales WHERE Model = 'Chevy' GROUP BY Model, Year, Color;
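The corrected query can be checked with Python's sqlite3 against a Sales table holding the four Chevy rows from the slide; the column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Model TEXT, Year INTEGER, Color TEXT, Sales INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("Chevy", 1994, "Black", 50),
        ("Chevy", 1994, "White", 40),
        ("Chevy", 1995, "Black", 85),
        ("Chevy", 1995, "White", 115),
    ],
)

# One SELECT per aggregation level; 'ALL' fills the aggregated-away columns.
query = """
SELECT 'ALL', 'ALL', 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy'
UNION
SELECT Model, 'ALL', 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy' GROUP BY Model
UNION
SELECT Model, Year, 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy' GROUP BY Model, Year
UNION
SELECT Model, Year, Color, SUM(Sales) FROM Sales WHERE Model = 'Chevy'
GROUP BY Model, Year, Color
"""
rows = set(conn.execute(query).fetchall())
for row in sorted(rows, key=str):
    print(row)
```

The result has 8 rows: the 4 base rows, 2 per-year subtotals, the Chevy total and the grand (ALL, ALL, ALL) total.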
Issues with GROUP BY - Cross Tabulation
What is cross-tabulation? A symmetric aggregation result table.
A cross-tab is a two-dimensional aggregation; if more aggregated attributes are added, it grows in dimension.
Issues with GROUP BY - Cross Tabulation
Equivalent relational representation in terms of 'ALL' values. Explosion in the number of cross-tabs!
Chevy sales cross tab:
Chevy       1994  1995  Total(ALL)
Black       50    85    135
White       40    115   155
Total(ALL)  90    200   290
Ford sales cross tab:
Ford        1994  1995  Total(ALL)
Black       50    85    135
White       10    75    85
Total(ALL)  60    160   220
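A Python sketch that derives both cross tabs from flat (Model, Year, Color, Sales) rows; cross_tab is a hypothetical helper and the rows are the cell values of the two tables above.

```python
# Flat (Model, Year, Color, Sales) rows holding the cells of both cross tabs.
rows = [
    ("Chevy", 1994, "Black", 50), ("Chevy", 1995, "Black", 85),
    ("Chevy", 1994, "White", 40), ("Chevy", 1995, "White", 115),
    ("Ford", 1994, "Black", 50), ("Ford", 1995, "Black", 85),
    ("Ford", 1994, "White", 10), ("Ford", 1995, "White", 75),
]


def cross_tab(rows, model):
    """Color x Year cross tab for one model, with a Total(ALL) row and column."""
    years = sorted({y for m, y, _, _ in rows if m == model})
    colors = sorted({c for m, _, c, _ in rows if m == model})
    cell = {}
    for m, y, c, s in rows:
        if m == model:
            cell[(c, y)] = cell.get((c, y), 0) + s
    table = {}
    for c in colors:
        row = [cell.get((c, y), 0) for y in years]
        table[c] = row + [sum(row)]  # append the Total(ALL) column
    table["Total(ALL)"] = [          # append the Total(ALL) row
        sum(table[c][i] for c in colors) for i in range(len(years) + 1)
    ]
    return years, table


for model in ("Chevy", "Ford"):
    years, table = cross_tab(rows, model)
    print(model, years + ["Total(ALL)"])
    for label, values in table.items():
        print(label, values)
```

One such table is needed per model, so adding models (or another dimension) multiplies the number of cross tabs.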
Why CUBE instead of GROUP BYs?
-> GROUP BYs solve aggregation.
-> But roll-ups and cross-tabs are daunting! E.g. a 6-dimensional cross-tab requires a 64-way union of 64 different GROUP BY operators (2^6 = 64).
Solution: let's look at the CUBE!
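To see where the 64 comes from, a sketch that generates the UNION-of-GROUP-BYs query for N grouping columns: one SELECT per subset of the columns (the power set), with 'ALL' in place of every column left out. The column names D1..D6, the table name Facts and the aggregate SUM(Measure) are placeholders.

```python
from itertools import combinations


def grouping_sets(columns):
    """Every subset of the grouping columns (the power set), largest first."""
    return [
        list(subset)
        for k in range(len(columns), -1, -1)
        for subset in combinations(columns, k)
    ]


def union_of_group_bys(columns, table, aggregate):
    """One SELECT per grouping set, with 'ALL' for every column left out."""
    selects = []
    for subset in grouping_sets(columns):
        select_list = [c if c in subset else "'ALL'" for c in columns]
        sql = f"SELECT {', '.join(select_list)}, {aggregate} FROM {table}"
        if subset:
            sql += f" GROUP BY {', '.join(subset)}"
        selects.append(sql)
    return "\nUNION\n".join(selects)


dims = ["D1", "D2", "D3", "D4", "D5", "D6"]  # placeholder dimension names
query = union_of_group_bys(dims, "Facts", "SUM(Measure)")
print(len(grouping_sets(dims)))  # 64
print(query.count("SELECT"))     # 64
```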
CUBE Operator
Generalization of group-by, roll-up and cross-tab.
The N-1 lower-dimensional aggregates appear as points, lines, planes, cubes and hyper-cubes.
Data Cubes: What, Why, Where, How
What? An operator in OLAP for fast analysis of data.
Why? The requirement to manipulate and analyze data from multiple perspectives. Do traditional relational databases fit this multi-dimensional data analysis?
Where? Used in OLAP.
OLAP
On-Line Analytical Processing: a type of data processing that allows decision makers to examine data according to the dimensions of the business.
Data warehousing is closely tied to OLAP: warehouses typically hold the data that OLAP tools analyze.
OLAP systems are built upon multidimensional databases, i.e. they are designed from the ground up to support the definition and querying of multidimensional data cubes/structures.
The core of any OLAP system is an OLAP cube, a.k.a. a data cube.
OLAP CUBE (figure): company sales of a part to a customer at a store location
Combination Count {P1, Calgary, Vance} 2 {P2, Vancouver, Richard} 11 {P2, Calgary, Vance} 4 {P3, Vancouver, Richard} 9 {P3, Calgary, Vance} 1 {P4, Vancouver, Richard} {P1, Toronto, Vance} 5 {P5, Vancouver, Richard} {P3, Toronto, Vance} 8 {P1, Calgary, Richard} {P5, Toronto, Vance} {P2, Calgary, Richard} {P5, Montreal, Vance} {P3, Calgary, Richard} {P1, Vancouver, Bob} 3 {P2, Calgary, Allison} {P3, Vancouver, Bob} {P3, Calgary, Allison} {P5, Vancouver, Bob} {P1, Toronto, Allison} {P1, Montreal, Bob} {P2, Toronto, Allison} {P3, Montreal, Bob} {P3, Toronto, Allison} 6 {P4, Montreal, Bob} 7 {P4, Toronto, Allison} {P5, Montreal, Bob}
Create a Data CUBE
Generate the power set (set of all subsets) of the aggregation columns by overloading the GROUP BY operator.
The data cube operator builds a table containing all these aggregate values.
The total aggregate using function f() is represented as the tuple: (ALL, ALL, ALL, ..., ALL, f(*))
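A minimal in-memory sketch of the cube computation, assuming rows of the form (d1, ..., dn, measure) and the string "ALL" as the dummy value. data_cube is a hypothetical name, and this brute-force loop over the power set only illustrates the semantics; it is not how a real engine evaluates CUBE.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # the dummy value that marks an aggregated-away column


def data_cube(rows, n_dims, agg=sum):
    """rows are tuples (d1, ..., dn, measure). For every subset of the n
    dimensions (the power set), group on that subset, put ALL in the other
    columns, and aggregate the measure with agg."""
    cube = {}
    for k in range(n_dims + 1):
        for kept in combinations(range(n_dims), k):
            groups = defaultdict(list)
            for row in rows:
                key = tuple(row[i] if i in kept else ALL for i in range(n_dims))
                groups[key].append(row[n_dims])
            for key, values in groups.items():
                cube[key] = agg(values)
    return cube


sales = [
    ("Chevy", 1994, "Black", 50),
    ("Chevy", 1994, "White", 40),
    ("Chevy", 1995, "Black", 85),
    ("Chevy", 1995, "White", 115),
]
cube = data_cube(sales, 3)
print(cube[(ALL, ALL, ALL)])       # 290, the (ALL, ..., ALL, f(*)) tuple
print(cube[("Chevy", 1994, ALL)])  # 90
print(cube[(ALL, ALL, "Black")])   # 135
print(len(cube))                   # 18
```

With cardinalities 1, 2 and 2 for Model, Year and Color, the 18 rows agree with the (1+1)(2+1)(2+1) count for a base table that contains every combination.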
CUBE is a relational operator
If C1, C2, ..., CN are the cardinalities of the N attributes, then the resulting cube relation has ∏(Ci + 1) rows: each attribute's domain gains the extra value ALL (this count assumes every combination of attribute values is present in the base table).
E.g. the initial SALES table has 2*3*3 = 18 rows; the derived data cube has 3*4*4 = 48 rows.
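Worked out for the SALES example, with cardinalities C_1 = 2, C_2 = 3, C_3 = 3 (which attribute has which cardinality is not needed for the count):

```latex
\[
\prod_{i=1}^{3} C_i = 2 \cdot 3 \cdot 3 = 18,
\qquad
|\mathrm{CUBE}| = \prod_{i=1}^{3} (C_i + 1) = (2+1)(3+1)(3+1) = 3 \cdot 4 \cdot 4 = 48 .
\]
```

The 30 extra rows are the super-aggregates, i.e. the rows that contain at least one ALL.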
CUBE: Syntax and Semantics
SELECT day, nation, MAX(Temp)
FROM Weather
GROUP BY CUBE Day(Time) AS day, Country(Latitude, Longitude) AS nation;
Semantics: CUBE first aggregates over all the attributes in the <select list> of the GROUP BY, as a standard GROUP BY would. It then unions in each super-aggregate of the global cube, substituting ALL for the aggregated-away columns.
-> If there are N attributes in the <select list>, there will be 2^N - 1 super-aggregates.
ROLL-UP Operator
To compute only a set of aggregates like this, CUBE would be overkill!
SOLUTION: ROLLUP!
ROLL-UP Operator
Difference: CUBE generates a result set that shows aggregates for all combinations of values in the selected columns. ROLLUP generates a result set that shows aggregates for a hierarchy of values in the selected columns; it produces just these super-aggregates:
(v1, v2, ..., vn, f()), (v1, v2, ..., ALL, f()), ..., (v1, ALL, ..., ALL, f()), (ALL, ALL, ..., ALL, f()).
Advantage:
-> Unlike CUBE, whose answer set is multi-dimensional, ROLLUP is naturally sequential.
-> Operators like SUM() and AVG() work well with ROLLUP.
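A sketch of ROLLUP in the same style, assuming rows (d1, ..., dn, measure) ordered from the coarsest attribute d1 to the finest: only the n + 1 prefixes of the attribute list are aggregated, instead of all 2^n subsets. rollup is a hypothetical name.

```python
from collections import defaultdict

ALL = "ALL"  # the dummy value that marks an aggregated-away column


def rollup(rows, n_dims, agg=sum):
    """rows are tuples (d1, ..., dn, measure), with d1 the coarsest level.
    Only the n + 1 prefixes are aggregated:
    (d1..dn), (d1..dn-1, ALL), ..., (ALL, ..., ALL)."""
    result = {}
    for keep in range(n_dims, -1, -1):
        groups = defaultdict(list)
        for row in rows:
            key = tuple(row[:keep]) + (ALL,) * (n_dims - keep)
            groups[key].append(row[n_dims])
        for key, values in groups.items():
            result[key] = agg(values)
    return result


sales = [
    ("Chevy", 1994, "Black", 50),
    ("Chevy", 1994, "White", 40),
    ("Chevy", 1995, "Black", 85),
    ("Chevy", 1995, "White", 115),
]
totals = rollup(sales, 3)
for key, value in totals.items():
    print(key, value)

# The same hierarchy with AVG instead of SUM.
averages = rollup(sales, 3, agg=lambda v: sum(v) / len(v))
print(averages[(ALL, ALL, ALL)])  # 72.5
```

On this data ROLLUP yields 8 rows, while the full cube of the same table yields 18; a grouping such as (ALL, 1994, ALL) never appears.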
CUBE and ROLLUP Algebra
SELECT Manufacturer, Year, Month, Day, Color, Model, SUM(price) AS Revenue
FROM Sales
GROUP BY Manufacturer,
  ROLLUP Year(Time) AS Year, Month(Time) AS Month, Day(Time) AS Day,
  CUBE Color, Model;
(GROUP BY part: Manufacturer; ROLLUP part: Year, Month, Day; CUBE part: Color, Model)
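A sketch of the grouping-set algebra behind this query, under the assumption that the combined clause means the cross product of its parts: the one plain GROUP BY set, the 4 prefixes of the ROLLUP, and the 4 subsets of the CUBE, giving 1 × 4 × 4 = 16 groupings. The helper names rollup_sets, cube_sets and combine are hypothetical.

```python
from itertools import product


def rollup_sets(columns):
    """ROLLUP(c1..cn): the n + 1 prefixes, longest first."""
    return [tuple(columns[:k]) for k in range(len(columns), -1, -1)]


def cube_sets(columns):
    """CUBE(c1..cn): all 2^n subsets."""
    return [
        tuple(c for c, keep in zip(columns, mask) if keep)
        for mask in product([True, False], repeat=len(columns))
    ]


def combine(*parts):
    """Cross product of grouping-set lists, concatenating one choice from each."""
    return [sum(choice, ()) for choice in product(*parts)]


sets = combine(
    [("Manufacturer",)],                     # plain GROUP BY part
    rollup_sets(["Year", "Month", "Day"]),  # ROLLUP part
    cube_sets(["Color", "Model"]),           # CUBE part
)
print(len(sets))  # 16
for s in sets[:3]:
    print(s)
```

Manufacturer appears in every grouping, the ROLLUP part never drops Year while keeping Month, and the CUBE part appears in all four combinations.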
CUBE and ROLLUP Algebra
General form: GROUP BY <select list> ROLLUP <select list> CUBE <select list>
Combining the three in one clause is "powerful".
CUBE and ROLLUP Algebra
A nice link illustrating the difference between ROLLUP and CUBE: http://msdn.microsoft.com/en-US/library/ms189305(v=SQL.90).aspx