1 CUBE: A Relational Aggregate Operator Generalizing Group By By Ata İsmet Özçelik.

Presentation transcript:

1 CUBE: A Relational Aggregate Operator Generalizing Group By
By Ata İsmet Özçelik

2 The Data Analysis Cycle
The user extracts data from the database with a query, then visualizes and analyzes the data with desktop tools.
[Figure: the extract, visualize, analyze cycle between a table and a spreadsheet. Two charts, "Size vs Speed" (size in bytes) and "Price vs Speed" ($/MB), each plotted against access time in seconds for cache, main memory, secondary disc, online tape, nearline tape, and offline tape.]

3 N-Dimensional Data
What exactly is N-dimensional data?
– A relation with N attribute domains.
– Each dimension in the main table may have its own domain table.
Why is this alone not enough?
– We need aggregation of various kinds to make the data readable by humans.

4 Relational Representation of a 3-D Data Model
[Figure: a Sales fact table with columns model_key, year_key, color_key, and sales (the measure), linked to dimension tables such as Year and Color.]

5 Aggregate Functions
Aggregation functions:
– The SQL standard defines SUM(), COUNT(), MIN(), MAX(), and AVG().
– Many systems provide their own additional aggregate functions, and some let users define custom ones.
The basic idea: combine all the values in a column into a single scalar value.
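To make the "column in, scalar out" idea concrete, here is a minimal Python sketch. The `sales` values are made-up sample data, not taken from the slides.

```python
# Hypothetical sample column of sales values (illustrative only).
sales = [5, 87, 62, 54, 95]

# Each aggregate function collapses the whole column into one scalar.
aggregates = {
    "SUM": sum(sales),
    "COUNT": len(sales),
    "MIN": min(sales),
    "MAX": max(sales),
    "AVG": sum(sales) / len(sales),
}
print(aggregates)
```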

6 Relational Group By Operator
Group By allows aggregates over table sub-groups.
The result is a new table.
Syntax:
select location, sum(units)
from inventory
where nation = 'USA'
group by location;
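A grouped aggregate of this kind can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the `inventory` rows are invented for illustration, and the row filter on `nation` is written as a WHERE clause applied before grouping.

```python
import sqlite3

# In-memory database with a small, made-up inventory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (location TEXT, nation TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [
        ("Seattle", "USA", 10),
        ("Seattle", "USA", 5),
        ("Boston", "USA", 7),
        ("Paris", "France", 3),
    ],
)

# Filter rows first, then aggregate units within each location group.
rows = conn.execute(
    "SELECT location, SUM(units) FROM inventory "
    "WHERE nation = 'USA' GROUP BY location ORDER BY location"
).fetchall()
print(rows)
```

The result is itself a table, with one row per location that survives the filter.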

7 Problems with GROUP BY
Histograms.
– In standard SQL, a histogram is computed indirectly: a table-valued expression builds the buckets, and its result is then aggregated.
Roll-up totals and sub-totals for drill-downs.
– Reports commonly aggregate data at a coarse level, and then at successively finer levels. Roll-up: going up levels. Drill-down: going down levels.
Cross-tabulation (cross-tab for short).
– A symmetric aggregation table.
The problem, then, is that GROUP BY alone needs a union of many queries for every roll-up or cross-tab: up to a 2^N-way union for an N-dimensional cross-tab.
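The 2^N figure comes from counting the subsets of the N grouping columns, since each subset needs its own GROUP BY query. A small Python sketch of that enumeration follows; the helper name `grouping_sets` is an assumption made for this example.

```python
from itertools import combinations

def grouping_sets(dims):
    """Return every subset of dims, finest grouping first."""
    return [c for r in range(len(dims), -1, -1) for c in combinations(dims, r)]

sets = grouping_sets(["Model", "Year", "Color"])
for s in sets:
    print("GROUP BY", ", ".join(s) if s else "(grand total)")
print(len(sets), "queries to union")
```

With three dimensions this lists 8 queries; with six it would list 64.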

8 An example approach Not relational Not convenient

9 ‘ALL’
A dummy value that fills in the super-aggregation items.
It is actually a set: all the values present for the corresponding dimension.
There are two ways of dealing with it.
– Define a new keyword ALL in SQL.
  The ALL() function is defined to enumerate the set that ALL represents.
  ALL [NOT] ALLOWED is added to the column definition syntax.
  The set interpretation guides the relational operators {=, IN} for ALL.
– Avoid the ALL keyword.
  NULL is used instead of ALL.
  A GROUPING() function discriminates between ALL and NULL.
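The second option (NULL plus a GROUPING() flag) can be illustrated with a rough Python sketch. Here `None` plays the role of NULL, and the function name and sample rows are assumptions made for the example. The flag is what separates a NULL that came from the data from a NULL that stands for ALL.

```python
from collections import defaultdict

def sum_by_color_with_total(rows):
    """Sum units per color and append a super-aggregate row.

    Each output row is (color, grouping, total). grouping is 1 only on the
    super-aggregate row, where the None in the color column stands for ALL.
    """
    totals = defaultdict(int)
    for color, units in rows:
        totals[color] += units
    out = [(color, 0, total) for color, total in totals.items()]
    out.append((None, 1, sum(totals.values())))
    return out

# One input row has a genuinely unknown color (None as data).
rows = [("red", 5), (None, 2), ("red", 1), ("blue", 4)]
result = sum_by_color_with_total(rows)
for row in result:
    print(row)
```

Two output rows carry `None` in the color column, and only the grouping flag tells them apart.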

10 3D Roll-Up
This is a simple 3-dimensional roll-up. Aggregating over N dimensions requires N such unions.

11 Cross Tabs The symmetric aggregation result is a table called cross-tabulation.

12 Data Cube Relational Operator

13 N-dimensional Cube
Each attribute is a dimension.
N-dimensional aggregate (sum(), max(), ...)
– fits the relational model exactly: a_1, a_2, ..., a_N, f()
Super-aggregate over (N-1)-dimensional sub-cubes:
ALL, a_2, ..., a_N, f()
a_1, ALL, a_3, ..., a_N, f()
...
a_1, a_2, ..., ALL, f()
– this is the (N-1)-dimensional cross-tab.
Super-aggregate over (N-2)-dimensional sub-cubes:
ALL, ALL, a_3, ..., a_N, f()
...
a_1, a_2, ..., ALL, ALL, f()

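14 CUBE Operator
Syntax:
SELECT Model, Year, Color, SUM(sales) AS Sales
FROM Sales
WHERE Model IN ('Ford', 'Chevy')
AND Year BETWEEN 1990 AND 1992
GROUP BY CUBE (Model, Year, Color)
Semantics: the result is the union of the GROUP BY aggregates over every subset of (Model, Year, Color), with ALL in each column that has been aggregated away.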
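What a CUBE over three columns returns can be emulated in a few lines of Python. This is a sketch of the idea rather than how a database would implement it: the string `"ALL"` stands in for the ALL value, the function name `cube` is hypothetical, and the sales rows are invented.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # stand-in for the ALL value

def cube(rows, n_dims):
    """Sum the last field of each row over every subset of the first n_dims fields."""
    result = defaultdict(int)
    for row in rows:
        dims, measure = row[:n_dims], row[n_dims]
        for r in range(n_dims + 1):
            for kept in combinations(range(n_dims), r):
                # Keep the chosen dimensions, replace the others with ALL.
                key = tuple(dims[i] if i in kept else ALL for i in range(n_dims))
                result[key] += measure
    return dict(result)

# Made-up (model, year, color, sales) rows.
sales = [
    ("Chevy", 1990, "red", 5),
    ("Chevy", 1990, "blue", 87),
    ("Ford", 1990, "red", 64),
    ("Ford", 1991, "red", 52),
]
result = cube(sales, 3)
print(result[("Chevy", ALL, ALL)], result[(ALL, ALL, ALL)])
```

Each input row contributes to 2^3 = 8 result cells, one per subset of the three dimensions.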

15 CUBE Result of a Cube Operator

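16 ROLLUP Operator
Syntax:
SELECT Manufacturer, Year, Month, Day, Color, Model, SUM(price) AS Revenue
FROM Sales
GROUP BY Manufacturer,
ROLLUP Year(Time) AS Year, Month(Time) AS Month, Day(Time) AS Day,
CUBE Color, Model
Semantics: ROLLUP yields only the super-aggregates obtained by dropping its columns from right to left: (Year, Month, Day), (Year, Month), (Year), and the total over all years.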
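The difference from CUBE is that ROLLUP aggregates only along prefixes of its column list. A minimal Python sketch, using a hypothetical `rollup` helper and invented (year, month, day, price) rows:

```python
from collections import defaultdict

ALL = "ALL"  # stand-in for the ALL value

def rollup(rows, n_dims):
    """Sum the last field of each row over every prefix of the first n_dims fields."""
    result = defaultdict(int)
    for row in rows:
        dims, measure = row[:n_dims], row[n_dims]
        for keep in range(n_dims, -1, -1):
            # Keep the first `keep` dimensions, replace the rest with ALL.
            key = tuple(dims[:keep]) + (ALL,) * (n_dims - keep)
            result[key] += measure
    return dict(result)

rows = [
    (1994, 1, 15, 10),
    (1994, 1, 16, 20),
    (1994, 2, 1, 5),
    (1995, 1, 1, 7),
]
result = rollup(rows, 3)
print(result[(1994, 1, ALL)], result[(1994, ALL, ALL)], result[(ALL, ALL, ALL)])
```

A key such as `(ALL, 1, ALL)` (all years, month 1) never appears, which is exactly what distinguishes a roll-up from a full cube.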

17 Snowflake Schema A snowflake schema showing the core fact table and some of the many aggregation granularities of the core dimensions.

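18 Addressing the Data Cube
SQL3 defines a Turing-complete procedural programming language.
Cube cells can be addressed directly in a query; here total(ALL, ALL, ALL) refers to the grand total, so each aggregate is also reported as a fraction of it:
SELECT Year, Color, Model, SUM(sales) AS total,
SUM(sales) / total(ALL, ALL, ALL)
FROM Sales
WHERE Model IN ('Ford', 'Chevy')
AND Year BETWEEN 1990 AND 1992
GROUP BY CUBE (Model, Year, Color)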

19 Computing Data Cubes
If attribute i has N_i values, the CUBE has Π_i (N_i + 1) cells.
Compute the N-D cube with hashing if it fits in RAM.
Compute the N-D cube with sorting if it overflows RAM.
The same comments apply to subcubes:
– Compute the (N-1)-D subcube from the N-D cube.
– Aggregate on the “biggest” domain first when more than one level deep.
– Aggregate functions need hidden variables: e.g. average needs sum and count.
Use standard techniques from query processing:
– arrays, hashing, hybrid hashing
– fall back on sorting.
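The size formula is easy to check numerically: every dimension contributes its N_i values plus one extra ALL value. The sketch below assumes a dense cube (every combination of values occurs); the function name is made up for the example.

```python
from math import prod

def cube_cells(cardinalities):
    """Number of cells in a dense cube: each dimension has N_i values plus ALL."""
    return prod(n + 1 for n in cardinalities)

# For example: 2 models, 3 years, 3 colors.
print(cube_cells([2, 3, 3]))
```

For 2 models, 3 years, and 3 colors this gives 3 x 4 x 4 = 48 cells, compared with 2 x 3 x 3 = 18 cells in the core alone.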

20 Computing Data Cubes
The 2^N algorithm for cube computation.
– The simplest algorithm to compute the cube is to allocate a handle for each cube cell; each incoming row then updates every cell it belongs to, 2^N of them.
Categorization of aggregation functions.
– Distributive: the function can be calculated in the following distributed manner:
  – Partition the data into n sets.
  – Compute the aggregation function on each partition to get n aggregate values.
  – Apply a function g() to the n aggregates to get the final aggregate.
  – The result is the same as if the whole data set had been aggregated at once.
  Examples: COUNT(), SUM(), MIN(), MAX().
  Can be calculated more efficiently than by the 2^N algorithm.
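The partition-then-combine definition of a distributive function can be sketched directly in Python. The helper name and the sample values are assumptions for illustration. Note that for SUM, MIN, and MAX the combining function g() is the function itself, while for COUNT it is SUM.

```python
def aggregate_in_partitions(values, n, f, g):
    """Apply f to each of n partitions of values, then combine the results with g."""
    partitions = [values[i::n] for i in range(n)]
    return g([f(p) for p in partitions if p])

values = [4, 8, 15, 16, 23, 42]

total = aggregate_in_partitions(values, 3, sum, sum)      # g = SUM
count = aggregate_in_partitions(values, 3, len, sum)      # g = SUM, not COUNT
smallest = aggregate_in_partitions(values, 3, min, min)   # g = MIN
largest = aggregate_in_partitions(values, 3, max, max)    # g = MAX
print(total, count, smallest, largest)
```

Each result matches what the function returns on the unpartitioned list, which is the property that lets super-aggregates be built from sub-aggregates.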

21 Computing Data Cubes (continued)
– Algebraic: the function can be calculated by an algebraic function of M arguments, where M is a bounded positive integer and each argument is the result of a distributive function.
  Examples: Min_N(), Max_N(), standard_deviation(), avg().
  Can also be calculated more efficiently than by the 2^N algorithm.
– Holistic: there is no constant bound on the storage needed to describe a sub-aggregate.
  Examples: rank(), median(), mode(). (These need the base data.)
  No algorithm faster than the 2^N algorithm is known for exact results, but better algorithms exist for approximate results.
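A short Python sketch contrasts the two categories. AVG is algebraic because a fixed-size pair (sum, count) per partition is enough to recover it; median is holistic because combining per-partition medians does not, in general, give the true median. The data and function name are invented for the example.

```python
from statistics import median

def avg_from_partitions(partitions):
    """AVG is algebraic: each partition contributes a fixed-size (sum, count) pair."""
    pairs = [(sum(p), len(p)) for p in partitions]
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return total / count

partitions = [[1, 2, 100], [3, 4]]
flat = [x for p in partitions for x in p]

avg = avg_from_partitions(partitions)
median_of_medians = median([median(p) for p in partitions])
true_median = median(flat)
print(avg, median_of_medians, true_median)
```

The average computed from the pairs equals the average of the flattened data, while the median of the partition medians differs from the true median, so the base data is needed.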

22 Example
Compute the 2D core of a 2 x 3 cube.
Then compute the 1D edges.
Then compute the 0D point.
Works for algebraic and distributive functions.
Saves “lots” of calls.
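The core-then-edges-then-point order can be sketched for SUM as follows. The 2 x 3 core values are made up; the point of the example is that the edges are derived from the six core cells and the grand total from an edge, so the base rows are never rescanned.

```python
# Hypothetical 2 x 3 core: sales already summed by (model, year).
core = {
    ("Chevy", 1990): 92, ("Chevy", 1991): 54, ("Chevy", 1992): 31,
    ("Ford", 1990): 64, ("Ford", 1991): 52, ("Ford", 1992): 27,
}

# 1D edges are computed from the 6 core cells, not from the base rows.
by_model, by_year = {}, {}
for (model, year), total in core.items():
    by_model[model] = by_model.get(model, 0) + total
    by_year[year] = by_year.get(year, 0) + total

# The 0D point is computed from the smaller edge (2 cells instead of 3).
grand_total = sum(by_model.values())
print(by_model, by_year, grand_total)
```

This shortcut relies on SUM being distributive; a holistic function such as median could not be derived from the core cells this way.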

23 Maintaining a Data Cube
– Up to now we have discussed only SELECT statements.
– Now we have to accommodate INSERT, DELETE, and UPDATE.
– Example: the max() function is distributive for SELECT and INSERT, but holistic for DELETE.
– If a function is algebraic for INSERT, UPDATE, and DELETE, it is easy to maintain the cube.
– If it is distributive, maintenance is fairly inexpensive (using scratchpads).
– If it is holistic, it is expensive to maintain the cube.
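The max() example can be made concrete with a small Python sketch of a single cube cell. The class name and its `rescans` counter are hypothetical, added only to show when the base data has to be revisited; the sketch assumes the cell never becomes empty.

```python
class MaxCell:
    """Scratchpad for one cube cell that maintains MAX over its base values."""

    def __init__(self, values):
        self.values = list(values)      # base data, needed only for deletes
        self.current = max(self.values)
        self.rescans = 0                # how often the base data was rescanned

    def insert(self, x):
        self.values.append(x)
        self.current = max(self.current, x)   # constant work per insert

    def delete(self, x):
        self.values.remove(x)
        if x == self.current:                 # the old maximum is gone
            self.current = max(self.values)   # full rescan of the base data
            self.rescans += 1

cell = MaxCell([3, 9, 4])
cell.insert(7)    # cheap: compare with the current maximum
cell.delete(4)    # cheap: the maximum is unaffected
cell.delete(9)    # expensive: the maximum itself was deleted
print(cell.current, cell.rescans)
```

Inserts only ever compare against the stored maximum, whereas deleting the maximum forces a pass over everything that remains, which is why max() behaves as holistic for DELETE.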

24 Summary
The CUBE operator generalizes relational aggregates.
It needs the ALL value to denote sub-cubes.
– ALL values represent aggregation sets.
It needs a generalization of user-defined aggregates.
Decorations and abstractions are interesting.
Computation has interesting optimizations.
The relationship to the “rest of SQL” is not fully worked out.