V 1.0 DBMAN 3 Group By, Having Cube, Rollup OLTP vs OLAP Data analysis 1.



SELECT: displayed order of suffixes (clauses)
1. INTO
2. FROM
3. WHERE
4. GROUP BY
5. HAVING
6. UNION/MINUS
7. INTERSECT
8. ORDER BY


Grouping/Aggregate functions
SUM - Sum
AVG - Average
MIN - Minimum
MAX - Maximum
COUNT - Number of non-null values (records)
GROUP_CONCAT - Concatenated list of elements
STDDEV - Standard deviation
VARIANCE - Variance

Non-grouping usage
select avg(sal) as Average from emp;
select min(sal) from emp;
select min(sal) from emp where sal>2000;
select avg(distinct sal) as Average from emp;
select count(sal) from emp;
select count(comm) from emp where sal>2000;
select comm from emp where sal>2000;
select count(*) from emp where sal>2000;
select avg(comm) from emp;
→ NULL values are not included!
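To see the "NULL values are not included" rule in action, here is a minimal sketch. It assumes a hypothetical four-row sample instead of the full emp table, and it uses Python's built-in sqlite3 module only so that the queries can be run anywhere; the slides themselves target Oracle and MySQL.

```python
import sqlite3

# Hypothetical four-row sample, not the full emp table from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER, comm INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("ALLEN", 1600, 300),
        ("WARD", 1250, 500),
        ("JONES", 2975, None),  # NULL commission
        ("KING", 5000, None),   # NULL commission
    ],
)

# COUNT(*) counts rows; COUNT(comm) and AVG(comm) skip the NULL commissions.
count_all, count_comm, avg_comm = conn.execute(
    "SELECT COUNT(*), COUNT(comm), AVG(comm) FROM emp"
).fetchone()

print(count_all, count_comm, avg_comm)  # prints: 4 2 400.0
```

The average is 800 / 2, not 800 / 4: rows with a NULL commission do not take part in the aggregate at all.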

Grouping
select distinct deptno from emp;
select avg(sal) from emp where deptno=10;
select avg(sal) from emp where deptno=20;
select avg(sal) from emp where deptno=30;
→ select deptno, avg(sal) from emp group by deptno;

Grouping
In the selection list (field list), only the grouped field(s) and the grouping function(s) are allowed! (Yes, in MySQL as well, when the ONLY_FULL_GROUP_BY SQL mode is enabled.)
select deptno, avg(sal) as Average, min(sal) as Minimum, count(*) as Num from emp group by deptno;

Grouping and suffixes
select mgr, avg(sal) from emp group by mgr;
select ifnull(mgr, "none") as boss, lpad(avg(sal), 15, '#') as "Averagesal" from emp group by mgr;
HAVING vs. WHERE
select mgr, avg(sal) from emp where ename like '%E%' group by mgr;
select mgr, avg(sal) from emp where ename like '%E%' group by mgr having avg(sal)>1300;
select mgr, avg(sal) as average from emp where ename like '%E%' group by mgr having avg(sal)>1300 order by average desc;
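The difference between WHERE and HAVING is easiest to see on a tiny table: WHERE filters records before grouping, HAVING filters the groups afterwards. The following is a sketch with hypothetical sample rows, run through Python's sqlite3 module purely for convenience.

```python
import sqlite3

# Hypothetical sample rows; mgr is the employee number of the boss.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, mgr INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("ALLEN", 7698, 1600),
        ("WARD", 7698, 1250),
        ("JAMES", 7698, 950),
        ("JONES", 7839, 2975),
        ("CLARK", 7839, 2450),
        ("FORD", 7566, 3000),
    ],
)

# WHERE drops WARD, CLARK and FORD (no 'E' in the name) before grouping.
where_only = conn.execute(
    "SELECT mgr, AVG(sal) FROM emp WHERE ename LIKE '%E%' "
    "GROUP BY mgr ORDER BY mgr"
).fetchall()

# HAVING then drops whole groups whose average is not above 1300.
where_and_having = conn.execute(
    "SELECT mgr, AVG(sal) AS average FROM emp WHERE ename LIKE '%E%' "
    "GROUP BY mgr HAVING AVG(sal) > 1300 ORDER BY average DESC"
).fetchall()

print(where_only)        # [(7698, 1275.0), (7839, 2975.0)]
print(where_and_having)  # [(7839, 2975.0)]
```

A condition on an aggregate such as avg(sal) cannot go into WHERE, because the groups do not exist yet when WHERE is evaluated.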

More complex grouping queries
select min(max(sal)), max(max(sal)), round(avg(max(sal))) from emp group by deptno;
-- In Oracle this works; in MySQL: "Invalid use of group function"
select min(sal+nvl(comm,0)), mod(empno,3) from emp group by mod(empno,3) having min(sal+nvl(comm,0)) > 800;

More complex grouping queries
select distinct job, substr(job, 2, 1) from emp;
select avg(sal) as average, substr(job, 2, 1) from emp group by substr(job, 2, 1);
select ename, sal, round(sal/1000) from emp;
select round(sal/1000) as SalCat, count(sal) as Num from emp group by round(sal/1000);

More complex grouping queries (MySQL)
select ename, round(datediff(curdate(), hiredate)/365.25) as diff from emp;
select count(*), round(datediff(curdate(), hiredate)/365.25) as diff from emp group by round(datediff(curdate(), hiredate)/365.25);

More complex grouping queries (Oracle)
select ename, hiredate, (to_char(sysdate, 'YYYY')-to_char(hiredate, 'YYYY')) as diff from emp;
select count(*), (to_char(sysdate, 'YYYY')-to_char(hiredate, 'YYYY')) as diff from emp group by (to_char(sysdate, 'YYYY')-to_char(hiredate, 'YYYY'));
OR: we could use months_between()

More complex grouping queries
select distinct deptno, job from emp;
select deptno, job, avg(sal), min(sal), max(sal) from emp group by deptno, job order by deptno, job;
→ Oracle "extras" (of these, MySQL only offers ROLLUP, written as GROUP BY ... WITH ROLLUP):
– GROUP BY GROUPING SETS
– GROUP BY CUBE
– GROUP BY ROLLUP


GROUP BY
Group by, Having: one-field use is "trivial", e.g. average salary for job or department
Multiple fields: complex grouping, e.g. average salary for job AND department
Still: only the grouped fields and the grouping functions are allowed in the selection list!

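SELECT job, deptno, avg(sal) FROM emp GROUP BY job, deptno;
Result columns: JOB, DEPTNO, AVG(SAL)
JOB values of the result rows: CLERK, MANAGER, PRESIDENT, ANALYST, CLERK, MANAGER, CLERK, MANAGER, SALESMAN (the DEPTNO and AVG(SAL) values did not survive in the transcript)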

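SELECT mgr, job, deptno, avg(sal) FROM emp GROUP BY job, deptno, mgr;
Result columns: MGR, JOB, DEPTNO, AVG(SAL)
JOB values of the result rows: MANAGER, MANAGER, CLERK, SALESMAN, MANAGER, CLERK, CLERK, PRESIDENT, ANALYST, CLERK (the MGR, DEPTNO and AVG(SAL) values did not survive in the transcript)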

DISADVANTAGES OF A SINGLE GROUP BY
Not flexible enough
One grouping per query, thus multiple queries are needed even if the groupings are similar → Slower
Aim: one query, multiple groupings → GROUPING SETS
SELECT job, deptno, avg(sal) FROM emp GROUP BY GROUPING SETS ( (job, deptno) );

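NVL – Type matching!
-- Type mismatch: 'Nope' cannot be converted to the numeric type of mgr
SELECT nvl(mgr, 'Nope'), deptno, avg(sal) FROM emp GROUP BY GROUPING SETS ( (mgr, deptno) );
-- OK: mgr is converted to a string first
SELECT nvl(to_char(mgr), 'Nope'), deptno, avg(sal) FROM emp GROUP BY GROUPING SETS ( (mgr, deptno) );
-- OK: the replacement value is a number, like mgr
SELECT nvl(mgr, 0), deptno, avg(sal) FROM emp GROUP BY GROUPING SETS ( (mgr, deptno) );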

GROUP BY GROUPING SETS
We can define multiple groupings inside one query; sub-results can be cached
E.g. performing an MGR, DEPTNO and a JOB, DEPTNO grouping in ONE query:
SELECT nvl(mgr, 0), deptno, nvl(job, 'Nope'), avg(sal) FROM emp GROUP BY GROUPING SETS ( (mgr, deptno), (deptno, job) );
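What a GROUPING SETS query returns can be pictured as the UNION ALL of one plain GROUP BY per grouping set, with NULL in the columns that a given set does not use. The sketch below shows this for GROUPING SETS ( (deptno), (job) ). It is an emulation under assumptions: the rows are a hypothetical sample, and SQLite (used here through Python's sqlite3 module only to make the example runnable) has no GROUPING SETS of its own.

```python
import sqlite3

# Hypothetical five-row sample, not the full emp table from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (job TEXT, deptno INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("CLERK", 10, 1300),
        ("MANAGER", 10, 2450),
        ("CLERK", 20, 800),
        ("CLERK", 20, 1100),
        ("ANALYST", 20, 3000),
    ],
)

# Emulates: GROUP BY GROUPING SETS ( (deptno), (job) )
rows = conn.execute(
    "SELECT deptno, NULL AS job, AVG(sal) FROM emp GROUP BY deptno "
    "UNION ALL "
    "SELECT NULL, job, AVG(sal) FROM emp GROUP BY job"
).fetchall()

for row in rows:
    print(row)
```

The result has two department rows (job is NULL) and three job rows (deptno is NULL). The NULLs in the unused columns are the reason the slides wrap the fields in nvl().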

GROUP BY GROUPING SETS
SELECT nvl(mgr, 0), nvl(deptno,0), nvl(job, 'NO'), avg(sal) FROM emp GROUP BY GROUPING SETS ( (mgr, deptno), (deptno, job), (mgr) );
SELECT nvl(mgr, 0), nvl(deptno,0), nvl(job, 'NO'), avg(sal) FROM emp GROUP BY GROUPING SETS ( (mgr, deptno), (deptno, job), (mgr), () );
→ Why do we have 0 for the mgr value?


GROUPING
Using the special GROUPING "grouping function", we can determine whether the given field is used for the grouping in a record
Grouping function: allowed in the selection list
Special: it can only work with a grouped field!

GROUPING
0 = TRUE?
With a simple single-field or multi-field GROUP BY, it always returns 0:
SELECT job, avg(sal), grouping(job) FROM emp GROUP BY job;
SELECT deptno, job, avg(sal), grouping(job) FROM emp GROUP BY job, deptno;
With grouping sets: grouping = 0 means that the field is one of the grouping fields for that record; grouping = 1 means that the field is not part of that record's grouping (it is shown as NULL)

GROUPING
SELECT mgr, deptno, job, avg(sal),
  GROUPING(mgr) as GMGR,
  GROUPING(deptno) as GDEPTNO,
  GROUPING(job) as GJOB
FROM emp
GROUP BY GROUPING SETS ( (mgr, deptno), (deptno, job), (mgr), () );


GROUPING
SELECT
  CASE WHEN GROUPING(mgr)=0 THEN mgr ELSE 0 END as MGR,
  CASE WHEN GROUPING(deptno)=0 THEN deptno ELSE 0 END as DEPTNO,
  CASE WHEN GROUPING(job)=0 THEN job ELSE 'NO' END as JOB,
  avg(sal)
FROM emp
GROUP BY GROUPING SETS ( (mgr, deptno), (deptno, job), (mgr), () );


GROUPING_ID
Unique identifier for each possible grouping column configuration
SELECT mgr, deptno, job, avg(sal), GROUPING_ID(mgr, deptno, job) as GID FROM emp GROUP BY GROUPING SETS ( (mgr, deptno), (deptno, job), (mgr), () );
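GROUPING_ID reads the GROUPING() values of its arguments as one binary number, with the first argument as the most significant bit. The following pure-Python sketch computes the identifier that each grouping set of the query above would get; the helper name grouping_id is a hypothetical one chosen for this illustration.

```python
def grouping_id(columns, grouping_set):
    """Bit vector of GROUPING() values: 0 if the column is grouped, 1 if not."""
    gid = 0
    for col in columns:
        bit = 0 if col in grouping_set else 1
        gid = gid * 2 + bit  # first column ends up as the most significant bit
    return gid


columns = ("mgr", "deptno", "job")
grouping_sets = [("mgr", "deptno"), ("deptno", "job"), ("mgr",), ()]

gids = {gs: grouping_id(columns, gs) for gs in grouping_sets}
for gs, gid in gids.items():
    print(gs, "->", gid)
# ('mgr', 'deptno') -> 1
# ('deptno', 'job') -> 4
# ('mgr',) -> 3
# () -> 7
```

Each grouping set gets a different number, so filtering or ordering by GID is enough to tell the sub-results apart.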


GROUP BY GROUPING SETS: DRAWBACKS
Too complicated, too long
When do we need a query with three totally different grouping sets? What kind of caching can we do here?
Usually, there are hierarchical relations between the grouping fields → more meaning, more caching → ROLLUP and CUBE
→ GROUPING and GROUPING_ID can be used the same way

CUBE
GROUP BY CUBE (a, b, c) = GROUP BY GROUPING SETS ( (a, b, c), (a, b), (b, c), (a, c), (a), (b), (c), ( ) )
CUBE(field1, field2) → the two fields have the same rank; all combinations (subsets) of the fields are shown
CUBE(job, deptno): in addition to the simple two-field grouping, we get the job averages, the department averages, and the total average

SELECT job, deptno, avg(sal) FROM emp GROUP BY CUBE(job, deptno);

ROLLUP
GROUP BY ROLLUP (a, b, c) = GROUP BY GROUPING SETS ( (a, b, c), (a, b), (a), ( ) )
ROLLUP(field1, field2) → the first field is hierarchically more important; we only take the combinations in which it is used, plus the grand total
ROLLUP(job, deptno): in addition to the simple two-field grouping, we get the job averages and the total average

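SELECT job, deptno, avg(sal) FROM emp GROUP BY ROLLUP(job, deptno);
Result columns: JOB, DEPTNO, AVG(SAL)
JOB values of the two-field rows: CLERK, MANAGER, PRESIDENT, ANALYST, CLERK, MANAGER, CLERK, MANAGER, SALESMAN (the DEPTNO and AVG(SAL) values did not survive in the transcript)
Job averages that survived: ANALYST 3000; CLERK 1037,5; MANAGER 2758,33333; PRESIDENT 5000; SALESMAN (value lost)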

MIXTURE OF GROUPINGS
GROUP BY a, CUBE (b, c) = GROUP BY GROUPING SETS ( (a, b, c), (a, b), (a, c), (a) )
GROUP BY a, ROLLUP (b, c) = GROUP BY GROUPING SETS ( (a, b, c), (a, b), (a) )
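The CUBE, ROLLUP and mixed-grouping equivalences can be checked mechanically: ROLLUP yields every prefix of the field list, CUBE yields every subset, and a plain field in front of them is simply added to each resulting set. The sketch below does this expansion in pure Python; the function names rollup, cube and prefixed are hypothetical helpers for this illustration, not part of any SQL dialect.

```python
from itertools import combinations


def rollup(*cols):
    """All prefixes of the column list, longest first, ending with ()."""
    return [cols[:n] for n in range(len(cols), -1, -1)]


def cube(*cols):
    """All subsets of the column list, largest first, ending with ()."""
    return [c for n in range(len(cols), -1, -1) for c in combinations(cols, n)]


def prefixed(fixed, sets):
    """GROUP BY fixed, <sets>: the fixed columns join every grouping set."""
    return [tuple(fixed) + s for s in sets]


print(rollup("a", "b", "c"))
# [('a', 'b', 'c'), ('a', 'b'), ('a',), ()]
print(len(cube("a", "b", "c")))
# 8
print(prefixed(("a",), cube("b", "c")))
# [('a', 'b', 'c'), ('a', 'b'), ('a', 'c'), ('a',)]
print(prefixed(("a",), rollup("b", "c")))
# [('a', 'b', 'c'), ('a', 'b'), ('a',)]
```

With n fields, ROLLUP produces n + 1 grouping sets while CUBE produces 2^n, which is why CUBE results grow quickly as fields are added.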


OLTP? OLAP?
OLTP = Online Transaction Processing
OLAP = Online Analytical Processing
OLTP
– product » price
– invoice » amount
– client » name
OLAP
– Product category × Region » Gross margin
– Product × Warehouse » Inventory
– Supplier × Time × Product » Return rate
– Tables are usually a result of grouping!

OLTP vs OLAP
              | OLTP                                | OLAP
Application   | Operational: ERP, CRM, legacy apps  | Management Information System, Decision Support System
Typical users | Staff                               | Managers, Executives
Horizon       | Weeks, Months                       | Years
Refresh       | Immediate                           | Periodic
Data model    | Entity-relationship                 | Multi-dimensional
Schema        | Normalized                          | Star
Emphasis      | Update                              | Retrieval


Star data model?
The supervisor that gave the most discounts?
The quantity shipped on a particular date, month, year or quarter?
In which zip code did product A sell the most?

OLAP rules
Automated data transfer
– Extract data from OLTP system(s)
– Transform/standardize, if necessary
– Import to OLAP database
– Build cubes (GROUP BY!)
– Produce reports
Drilling
– Drill down: region → city → district
– Drill up: city → region → country
– Drill across: north region → south region → west region

OLAP vs Group by
Every dimension can be a result of a group by query
Every data cube will be a result of group by queries
One problem: missing/bad data points → We need trends and projections!


SELECT: order of suffixes (clauses)
1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. UNION/MINUS
6. INTERSECT
7. ORDER BY
8. INTO

BASIC PROBLEMS
Functions: in the selection list
Order by, group by: always executed after functions, so we might need sub-queries
ROWNUM s*cks (later...)
Solution: special functions that can work together with the ordering / grouping of records

RANK FUNCTIONS
SELECT ROW_NUMBER() OVER (ORDER BY ENAME ASC) AS RNUM, ENAME FROM EMP;
Simple rank functions:
RANK() → 1, 2, 2, 4
DENSE_RANK() → 1, 2, 2, 3
PERCENT_RANK() → percentage, [0..1]
NO PARAMETERS!

LET'S TRY THOSE…
SELECT ename, sal, RANK() over (ORDER BY sal desc) FROM emp;
+ DENSE_RANK(), PERCENT_RANK()
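To try the three rank functions side by side without an Oracle instance, here is a small sketch using Python's sqlite3 module. The assumptions: the bundled SQLite is version 3.25 or newer (window functions are not available before that), and the four rows are a hypothetical sample chosen so that two salaries tie.

```python
import sqlite3

# Hypothetical sample with one tie (two salaries of 3000).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("KING", 5000), ("SCOTT", 3000), ("FORD", 3000), ("JONES", 2975)],
)

rows = conn.execute(
    "SELECT ename, sal, "
    "       RANK() OVER (ORDER BY sal DESC), "
    "       DENSE_RANK() OVER (ORDER BY sal DESC), "
    "       PERCENT_RANK() OVER (ORDER BY sal DESC) "
    "FROM emp ORDER BY sal DESC, ename"
).fetchall()

ranks = [r[2] for r in rows]
dense_ranks = [r[3] for r in rows]
percent_ranks = [r[4] for r in rows]

print(ranks)        # [1, 2, 2, 4]
print(dense_ranks)  # [1, 2, 2, 3]
print(percent_ranks)
```

RANK leaves a gap after the tie, DENSE_RANK does not, and PERCENT_RANK is (rank - 1) / (number of rows - 1), so it runs from 0 to 1.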

RANK WITHIN A GROUP
SELECT deptno, ename, sal, RANK() OVER ( PARTITION BY deptno ORDER BY sal ) as RANG FROM emp;

RANK WITHIN A GROUP
SELECT deptno, job, ename, sal, RANK() OVER ( PARTITION BY deptno, job ORDER BY sal ) as RANG FROM emp;
+ ORDER BY …

GROUPING FUNCTIONS WITH ANALYTICAL CLAUSES
SELECT ename, sal, SUM(SAL) OVER (order by sal) as MySAL FROM emp;
Ordered list!
SELECT ename, sal, AVG(SAL) OVER (order by sal) as MySAL FROM emp;
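An aggregate with OVER (ORDER BY ...) becomes a running value over the ordered list. One detail worth seeing once: without an explicit window, records with equal ORDER BY values are treated as peers and receive the same running value. The sketch below compares that default with an explicit ROWS window. It assumes a hypothetical four-row sample and SQLite 3.25 or newer, used through Python's sqlite3 module only for runnability.

```python
import sqlite3

# Hypothetical sample with two equal salaries (1250).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("SMITH", 800), ("WARD", 1250), ("MARTIN", 1250), ("ALLEN", 1600)],
)

rows = conn.execute(
    "SELECT ename, sal, "
    "       SUM(sal) OVER (ORDER BY sal) AS default_total, "
    "       SUM(sal) OVER (ORDER BY sal, ename "
    "                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total "
    "FROM emp ORDER BY sal, ename"
).fetchall()

default_totals = [r[2] for r in rows]
rows_totals = [r[3] for r in rows]

print(default_totals)  # [800, 3300, 3300, 4900]
print(rows_totals)     # [800, 2050, 3300, 4900]
```

Both MARTIN and WARD show 3300 in the default column because the two 1250 salaries enter the running sum together.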

GROUPING FUNCTIONS WITH ANALYTICAL CLAUSES
SELECT deptno, ename, sal, SUM(SAL) OVER ( partition by deptno order by ename ) as MySum FROM emp ORDER BY deptno, ename;

GROUPING FUNCTIONS WITH ANALYTICAL CLAUSES
alter session set nls_date_format='YYYY-MM-DD';
select ename, hiredate, sal from emp order by hiredate;
select ename, hiredate, sal, sum(sal) over (order by hiredate) as TOTAL from emp order by hiredate;
select ename, hiredate, sal, sum(sal) over (partition by to_char(hiredate, 'YYYY') order by hiredate) as TOTAL from emp order by hiredate;

SUBSET (Sliding window)
SELECT ename, sal, avg(SAL) OVER ( order by sal rows between 1 preceding and 2 following ) as MyAvg FROM emp;
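The sliding window can be followed by hand on a few rows: for each record, the average covers the previous record, the record itself and the next two, as far as they exist. A minimal sketch, assuming five hypothetical rows with distinct salaries and SQLite 3.25 or newer (through Python's sqlite3 module):

```python
import sqlite3

# Hypothetical sample with distinct salaries, so the order is unambiguous.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("SMITH", 800), ("JAMES", 950), ("ADAMS", 1100), ("WARD", 1250), ("MILLER", 1300)],
)

rows = conn.execute(
    "SELECT ename, sal, "
    "       AVG(sal) OVER (ORDER BY sal "
    "                      ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING) AS my_avg "
    "FROM emp ORDER BY sal"
).fetchall()

my_avgs = [r[2] for r in rows]
for ename, sal, my_avg in rows:
    print(ename, sal, my_avg)
# SMITH: (800 + 950 + 1100) / 3        = 950.0   (no preceding record)
# JAMES: (800 + 950 + 1100 + 1250) / 4 = 1025.0
# MILLER: (1250 + 1300) / 2            = 1275.0  (no following records)
```

At the edges the window simply shrinks; missing neighbours are not counted as zeros.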

SUBSET (Sliding window)
SELECT deptno, ename, sal, sum(SAL) OVER ( partition by deptno order by sal rows between 0 preceding and 1 following ) as MySum FROM emp;

SUBSET (Sliding window)
We can use the RANGE keyword
SELECT deptno, ename, sal, sum(SAL) OVER ( order by sal range between current row and unbounded following ) as MySum FROM emp;

OTHER ANALYTICAL FUNCTIONS
FIRST_VALUE(), LAST_VALUE()
RATIO_TO_REPORT() → Ratio compared to the sum value
SELECT ename, sal, RATIO_TO_REPORT(sal) OVER () FROM emp ORDER BY sal desc;
→ + PARTITION BY
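RATIO_TO_REPORT(sal) OVER () is the record's value divided by the sum over the whole window, so it can be written as sal / SUM(sal) OVER (). The sketch below uses that equivalent form, because SQLite (used here through Python's sqlite3 module, version 3.25 or newer assumed) has no RATIO_TO_REPORT function; the three rows and their salaries are hypothetical.

```python
import sqlite3

# Hypothetical salaries chosen so that the ratios are easy to check.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("SMITH", 1000), ("JONES", 3000), ("KING", 4000)],
)

# sal * 1.0 forces a non-integer division; the empty OVER () spans all rows.
rows = conn.execute(
    "SELECT ename, sal, sal * 1.0 / SUM(sal) OVER () AS ratio "
    "FROM emp ORDER BY sal DESC"
).fetchall()

print(rows)
# [('KING', 4000, 0.5), ('JONES', 3000, 0.375), ('SMITH', 1000, 0.125)]
```

Adding PARTITION BY inside OVER () would make each ratio relative to the sum of its own partition instead of the whole table.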
