Based on notes by Jim Gray

Presentation transcript:

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Jim Gray (Microsoft), Adam Bosworth (Microsoft), Andrew Layman (Microsoft), Hamid Pirahesh (IBM)
Presented by: Changwu Li
Based on notes by Jim Gray
9/17/2018

The Data Analysis Cycle
The user extracts data from the database with a query, then visualizes and analyzes the data with desktop tools.
[Figure: the extract, visualize, analyze cycle, shown with a spreadsheet table and two charts: "Size vs Speed" (Size (B) against Access Time (seconds)) and "Price vs Speed" ($/MB).]

Relational Aggregate Operators
SQL has several aggregate operators: sum(), min(), max(), count(), avg().
Other systems extend this set with many others: statistical functions, financial functions, ...
The basic idea: combine all values in a column into a single scalar value.
Syntax: select sum(units) from inventory;
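To make the "single scalar value" idea concrete, here is a minimal sketch that runs the slide's query through Python's built-in sqlite3 module. The inventory rows and the item column are made up for illustration; the slide names only the table and the units column.

```python
import sqlite3

# In-memory database; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("create table inventory (item text, units integer)")
conn.executemany("insert into inventory values (?, ?)",
                 [("bolt", 40), ("nut", 25), ("washer", 35)])

# The aggregate collapses the whole units column into one scalar.
total_units = conn.execute("select sum(units) from inventory").fetchone()[0]
print(total_units)
```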

Relational Group By Operator
Group By allows aggregates over table sub-groups.
The result is a new table.
Syntax: select deptno, sum(salary) from emp group by deptno;
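The same kind of sketch for Group By, again using sqlite3 with hypothetical emp rows (the ename column and all values are assumptions). It shows that the result is itself a table with one row per sub-group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table emp (ename text, deptno integer, salary integer)")
# Hypothetical employees in two departments.
conn.executemany("insert into emp values (?, ?, ?)",
                 [("a", 10, 1300), ("b", 10, 2450),
                  ("c", 20, 1900), ("d", 20, 2975)])

# One output row per deptno value; order by is added only to make the output stable.
dept_totals = conn.execute(
    "select deptno, sum(salary) from emp group by deptno order by deptno"
).fetchall()
print(dept_totals)
```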

Problems With This Design
Users want histograms.
Users want sub-totals and totals: drill-down and roll-up reports.
Users want cross tabs.
Conventional wisdom: these are not relational operators; they are found in many report writers and query engines.
[Figure: histogram functions F(), G(), H(); a cross tab with a sum over the days M T W T F S S and the rows AIR, HOTEL, FOOD, MISC.]

A cross tab example
Table 5a: Ford Sales Cross Tab

Ford         1994  1995  total (ALL)
black          50    85   135
white          10    75    85
total (ALL)    60   160   220

How to solve this problem?
Answer: cube

The Idea: Think of the N-dimensional Cube
Each attribute is a dimension.
The N-dimensional aggregate (sum(), max(), ...) fits the relational model exactly:
    a1, a2, ..., aN, f()
Super-aggregates over the N-1 dimensional sub-cubes:
    ALL, a2, ..., aN, f()
    a1, ALL, a3, ..., aN, f()
    ...
    a1, a2, ..., ALL, f()
This is the N-1 dimensional cross-tab.
Super-aggregates over the N-2 dimensional sub-cubes:
    ALL, ALL, a3, ..., aN, f()
    ...
    a1, a2, ..., ALL, ALL, f()
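The row patterns listed above can be enumerated mechanically: each attribute is either kept or replaced by ALL, giving 2^N patterns. The following is a small sketch of that enumeration; the function name cube_patterns and the string token "ALL" are illustrative choices, not part of the original proposal.

```python
from itertools import product

ALL = "ALL"  # stand-in token for the slide's ALL value

def cube_patterns(attrs):
    """Return every way of replacing a subset of attrs by ALL,
    ordered from the N-dimensional core down to the grand total."""
    patterns = [tuple(ALL if masked else a for a, masked in zip(attrs, mask))
                for mask in product([False, True], repeat=len(attrs))]
    return sorted(patterns, key=lambda p: p.count(ALL))

patterns = cube_patterns(["a1", "a2", "a3"])
for p in patterns:
    print(p)
```

For three attributes this yields one core pattern, three patterns with a single ALL (the N-1 dimensional sub-cubes), three with two ALLs, and one grand total.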

An Example CUBE

Why the ALL Value?
A new "null"-like value is needed (it overloads the null indicator).
The value must not already be in the aggregated domain.
NULL cannot be used, since NULL may itself be a value being aggregated on.
Think of ALL as a token representing a set:
    All(color) = {red, white, blue}
    All(year) = {1990, 1991, 1992}
    All(model) = {Chevy, Ford}
ALL follows "set of values" semantics.

CUBE operator: Syntax
Proposed syntax:
    select model, make, year, sum(sales)
    from car_sales
    where model in {“chevy”, “ford”}
      and year between 1990 and 1994
    group by model, make, year with cube
    having sum(sales) > 0;
Note: the Group By operator repeats the aggregate list, once in the select list and once in the group by list.
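One way to see what the cube clause asks for is to spell it out on an engine without that clause. The sketch below assumes SQLite (through Python's sqlite3 module) and writes a two-attribute cube as a union of four group-bys. The car_sales rows are hypothetical, the make column is dropped to keep the example short, and the string 'ALL' stands in for the ALL token.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table car_sales (model text, year integer, sales integer)")
# Hypothetical sales figures.
conn.executemany("insert into car_sales values (?, ?, ?)",
                 [("chevy", 1994, 50), ("chevy", 1995, 85),
                  ("ford", 1994, 10), ("ford", 1995, 75)])

# A cube over (model, year) written as the union of its 2^2 group-bys.
cube_sql = """
select model, cast(year as text), sum(sales) from car_sales group by model, year
union all
select model, 'ALL', sum(sales) from car_sales group by model
union all
select 'ALL', cast(year as text), sum(sales) from car_sales group by year
union all
select 'ALL', 'ALL', sum(sales) from car_sales
"""
cube_rows = set(conn.execute(cube_sql).fetchall())
for row in sorted(cube_rows):
    print(row)
```

With N grouping attributes the hand-written form needs 2^N sub-queries, which is the verbosity the single cube clause is meant to remove.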

How To Compute the Cube?
If each attribute i has Ni values, the CUBE has Π (Ni+1) values (the product over all attributes; the +1 accounts for ALL).
Compute the N-D cube with hashing if it fits in RAM.
Compute the N-D cube with sorting if it overflows RAM.
The same comments apply to sub-cubes: compute each (N-1)-D sub-cube from the N-D cube.
Aggregate on the “biggest” domain first when more than one level deep.
Aggregate functions need hidden variables: e.g. average needs sum and count.
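Two of the points above lend themselves to a quick check: the size formula and the hidden variables behind average. The sketch below uses hypothetical helper names (cube_size, merge_avg) and treats the product as an upper bound that is reached only when every combination of values occurs in the data.

```python
from math import prod

def cube_size(cardinalities):
    """Upper bound on cube cells: each attribute contributes Ni values plus ALL."""
    return prod(n + 1 for n in cardinalities)

def merge_avg(partials):
    """Combine (sum, count) pairs carried by sub-aggregates into one average.
    Averaging the partial averages directly would give the wrong answer."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

# 2 models, 3 years, 3 colors, as in the ALL() example sets.
size = cube_size([2, 3, 3])
# Two cells carrying (sum, count); their plain averages are 5.0 and about 16.67.
avg = merge_avg([(10.0, 2), (50.0, 3)])
print(size, avg)
```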

Example:
Compute the 2D core of a 2 x 3 cube.
Then compute the 1D edges.
Then compute the 0D point.
This works for algebraic and distributive functions.
It saves “lots” of calls.
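The core, edges, point order can be sketched for sum(), a distributive function, as follows. The fact rows are hypothetical, the dictionaries stand in for the in-RAM hash tables, and the string "ALL" stands in for the ALL token. Only the core reads the base rows; every super-aggregate is derived from already aggregated cells.

```python
from collections import defaultdict

ALL = "ALL"
# Hypothetical fact rows (model, year, sales) for a 2 x 3 cube.
rows = [("chevy", 1990, 5), ("chevy", 1991, 7), ("chevy", 1992, 9),
        ("ford", 1990, 4), ("ford", 1991, 6), ("ford", 1992, 8),
        ("ford", 1992, 2)]

# 2D core: one hashing pass over the base data.
core = defaultdict(int)
for model, year, sales in rows:
    core[(model, year)] += sales

# 1D edges: built from the core cells, not from the base rows.
edges = defaultdict(int)
for (model, year), s in core.items():
    edges[(model, ALL)] += s
    edges[(ALL, year)] += s

# 0D point: built from one family of edges (the per-model totals).
point = sum(s for (model, year), s in edges.items() if year == ALL)

cube = {**core, **edges, (ALL, ALL): point}
print(len(cube), cube[(ALL, ALL)])
```

The saving is that the edges touch 6 core cells instead of 7 base rows and the point touches 2 edge cells; with a realistic fact table the gap between base rows and cells is far larger.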

Real world implementation
Both Oracle 9i and SQL Server 2000 implement it.
An example in Oracle 9i:
    select deptno, job, sum(sal) as salary
    from emp
    group by cube(deptno, job);

DEPTNO   JOB         SALARY
-------- ----------- --------
10       CLERK           1300
10       MANAGER         2450
10       PRESIDENT       5000
10                       8750
20       ANALYST         6000
20       CLERK           1900
20       MANAGER         2975
20                      10875
30       CLERK            950
30       MANAGER         2850
30       SALESMAN        5600
30                       9400
         ANALYST         6000
         CLERK           4150
         MANAGER         8275
         PRESIDENT       5000
         SALESMAN        5600
                        29025

Jim Gray, recipient of the 1998 ACM Turing Award