Presentation is loading. Please wait.

Presentation is loading. Please wait.

Database Programming Sections 5– GROUP BY, HAVING clauses, Rollup & Cube Operations, Grouping Set, Set Operations 11/2/10.

Similar presentations


Presentation on theme: "Database Programming Sections 5– GROUP BY, HAVING clauses, Rollup & Cube Operations, Grouping Set, Set Operations 11/2/10."— Presentation transcript:

1 Database Programming Sections 5– GROUP BY, HAVING clauses, Rollup & Cube Operations, Grouping Set, Set Operations 11/2/10

2 GROUP BY Clause Use the Group By clause to divide the rows in a table into groups than apply. The Group Functions to return summary information about that group In the example below, the rows are being GROUPed BY department_id. The AVG(group function) is then applied to each GROUP, or department_id. SELECT department_id, AVG(salary) FROM employees GROUP BY department_id; Chapter 6 Implicit order by, with group by – it is ascending by default Don’t need Order By clause if sort on group by field May have if Order By other fields. Marge Hohly

3 Results of previous query
SELECT department_id, AVG(salary) FROM employees GROUP BY department_id; SELECT department_id, ROUND((AVG(salary)),1) Marge Hohly

4 GROUP BY Clause With aggregate (group) functions in the SELECT clause, be sure to include individual columns from the SELECT clause in a GROUP BY clause!!! SELECT department_id, job_id, SUM(salary) FROM employees WHERE hire_date <= ’01-JUN-00’ GROUP BY department_id, job_id; SELECT d.department_id,d.department_name, MIN(e.hire_date) AS “Min Date” FROM departments d, employees e WHERE e.department_id=d.department_id GROUP BY ????? The correct statement SELECT d.department_id, d.department_name, MIN(e.hire_date) AS "Min Date" FROM departments d, employees e WHERE e.department_id = d.department_id GROUP BY d.department_id,d.department_name All single fields not in aggregate functions must be included in the Group By clause or it will not run. Demonstrate with only one of the columns not in the group function and show it fails. Marge Hohly

5 GROUP BY Rules... Use the Group By clause to divide the rows in a table into groups then apply the Group Functions to return summary information about that group. If the Group By clause is used, all individual columns in the SELECT clause must also appear in the GROUP BY clause. Columns in the GROUP BY clause do not have to appear in the SELECT clause No column aliases can be used in the Group By clause. Use the ORDER BY clause to sort the results other than the default ASC order. The WHERE clause, if used, can not have any group functions – use it to restrict any columns in the SELECT clause before they are divided into groups. Use the HAVING clause to restrict groups not the WHERE clause. Use a WHERE clause to exclude rows before the remaining rows are formed into groups. SELECT department_id, MAX(salary) FROM employees WHERE last_name <> ‘King’ GROUP BY department_id; Marge Hohly

6 GROUP BY examples: Show the average graduation rate of the schools in several cities; include only those students who have graduated in the last few years SELECT AVG(graduation_rate), city FROM students WHERE graduation_date >= ’01-JUN-07’ GROUP BY city; Count the number of students in the school, grouped by grade; include all students SELECT COUNT(first_name), grade FROM students GROUP BY grade; Marge Hohly

7 GROUP BY Guidelines Important guidelines to remember when using a GROUP BY clause are: If you include a group function (AVG, SUM, COUNT, MAX, MIN, STDDEV, VARIANCE) in a SELECT clause and any other individual columns, each individual column must also appear in the GROUP BY clause. You cannot use a column alias in the GROUP BY clause. The WHERE clause excludes rows before they are divided into groups. Marge Hohly

8 GROUPS WITHIN GROUPS Sometimes you need to divide groups into smaller groups. For example, you may want to group all employees by department; then, within each department, group them by job. This example shows how many employees are doing each job within each department. SELECT department_id, job_id, count(*) FROM employees WHERE department_id > 40 GROUP BY department_id, job_id; Dept_ID JOB_Idq COUNT(*) 50 ST_MAN 1 ST_CLERK 4 60 IT_PROG 3 80 SA_MAN SA_REP 2 Marge Hohly

9 Nesting Group functions
Group functions can be nested to a depth of two when GROUP BY is used. SELECT max(avg(salary)) FROM employees GROUP by department_id; How many values will be returned by this query? The answer is one – the query will find the average salary for each department, and then from that list, select the single largest value. Marge Hohly

10 The HAVING Clause With the HAVING clause the Oracle Server:
Having is used to restrict groups. Applies group function to the group(s). Displays the groups that match the criteria in the HAVING clause. SELECT department_id, job_id, SUM(salary) FROM employees WHERE hire_date <= ’01-JUN-00’ GROUP BY department_id, job_id HAVING department_id >50; When using the grouped by and Having, the group by functions are applied and then only those groups matching the HAVING clause are displayed. The Where clause is used to restrict rows; the Having clause is used to restrict groups returned by a GROUP BY clause. SELECT department_id, MAX(salary) FROM employees GROUP BY department_id HAVING COUNT(*) >1; Marge Hohly

11 The HAVING clause example
SELECT department_id, job_id, SUM(salary) FROM employees WHERE hire_date <= ’01-JUN-00’ GROUP BY department_id, job_id HAVING department_id >50; Marge Hohly

12 WHERE or HAVING ?? The WHERE clause is used to restrict rows.
SELECT department_id, MAX(salary) FROM employees WHERE department_id>=20 GROUP BY department_id; The HAVING clause is used to restrict groups returned by a GROUP BY clause. SELECT department_id, MAX(salary) FROM employees GROUP BY department_id HAVING MAX(salary)> 1000; Marge Hohly

13 HAVING Although the HAVING clause can precede the GROUP BY clause in a SELECT statement, it is recommended that you place each clause in the order shown. The ORDER BY clause (if used) is always last! SELECT column, group_function FROM table WHERE GROUP BY HAVING ORDER BY Marge Hohly

14 GROUP BY Extensions When using GROUP BY functions and you want to extend them you can add the ROLLUP, CUBE, & GROUPING SETS. These extensions make your life a lot easier and more efficient Chapter 6 Lesson 2 Marge Hohly

15 ROLLUP In GROUP BY queries you are quite often required to produce subtotals and totals, and the ROLLUP operation can do that for you. Action of ROLLUP is straightforward: it creates subtotals that roll up from the most detailed level to a grand total, following a grouping list specified in the ROLLUP clause. ROLLUP takes as its argument an ordered list of grouping columns. it calculates the standard aggregate values specified in the GROUP BY clause. it creates progressively higher-level subtotals, moving from right to left through the list of grouping columns it creates a grand total. This is a completely new concept for the students. Spend some time explaining how GROUP BY reports using SUM and COUNT are normally used for in the financial departments in all companies all over the world. It is quite normal that managers want not only the sum of salaries for a job role per department, but they probably also want the total per department and the total for all departments. Without using the ROLLUP operator that kind of requirement would mean writing several queries and then enter the results in for instance an excel spreadsheet to make it look good. ROLLUP creates subtotals that roll up from the most detailed level to a grand total, in the grouping list specified in the GROUP BY clause. Marge Hohly

16 This slide shows an example of the ROLLUP syntax along with the result of the statement. The rows in red in the result table are the superaggregate rows, or the rows created by the ROLLUP operation. Marge Hohly

17 CUBE CUBE is an extension to the GROUP BY clause like ROLLUP. It produces cross-tabulation reports. It can be applied to all aggregate functions including AVG, SUM, MIN, MAX and COUNT. Columns listed in the GROUP BY clause are cross-referenced to create a superset of groups. The aggregate functions specified in the SELECT list are applied to this group to create summary values for the additional super-aggregate rows. Every possible combination of rows is aggregated by CUBE. If you have n columns in the GROUP BY clause, there will be 2n possible super-aggregate combinations. Mathematically these combinations form an n-dimensional cube, which is how the operator got its name. Another new concept for the students but closely related to ROLLUP. CUBE is an extension to GROUP BY, in that is aggregates the same rows as ROLLUP, but then is also aggregates the aggregated rows, resulting in a report where every column mentioned in the GROUP BY clause has a super-aggregated result row created. Marge Hohly

18 CUBE (Cont.) CUBE is typically most suitable in queries that use columns from multiple tables rather than columns representing different rows of a single table. Imagine for example a user querying the Sales table for a company like AMAZON.COM. For instance, a commonly requested cross- tabulation might need subtotals for all the combinations of Month, Region and Product. These are three independent tables, and analysis of all possible subtotal combinations is commonplace. In contrast, a cross- tabulation showing all possible combinations of year, month and day would have several values of limited interest, because there is a natural hierarchy in the time table. Subtotals such as profit by day of month summed across year would be unnecessary in most analyses. Relatively few users need to ask "What were the total sales for the 16th of each month across the year?" This slide shows explains the concepts behind the CUBE statement Marge Hohly

19 CUBE (Cont.) This slide shows an example of CUBE and the result of a query using CUBE. As can be seen on the slide every column is cross-referenced with all other columns. Ask your students what the difference is between ROLLUP and CUBE. What does CUBE give us that ROLLUP does not? Marge Hohly

20 GROUPING SETS GROUPING SETS is another extension to the GROUP BY clause, like ROLLUP and CUBE. It is used to specify multiple groupings of data. It is like giving you the possibility to have multiple GROUP BY clauses in the same SELECT statement, which is not allowed in the syntax. The point of GROUPING SETS is that if you want to see data from the EMPLOYEES table grouped by (department_id, job_id , manager_id), but also by (department_id, manager_id) and also by (job_id, manager_id) then you would normally have to write 3 different select statements with the only difference being the GROUP BY clauses. For the database this means retrieving the same data in this case 3 times, and that can be quite a big overhead. Imagine if your company had 3,000,000 employees. Then you are asking the database to retrieve 9 million rows instead of just 3 million rows – quite a big difference. So GROUPING SETS are much more efficient when writing complex reports. One more extension to the GROUP BY clause is the concept of GROUPING SETS. This operation allows you to specify all the possible combinations of GROUP BY’ed groups you want from your query. In effect it is the same as writing: GROUP BY column 1, column 2, column 3 GROUP BY column 1, column 3 GROUP BY column 2, column 4 GROUP BY column 2, column 3 The above is not allowed, but instead we can write the query as GROUP BY GROUPING SETS ((column 1, column 2, column 3), (column 1, column 3), (column 2, column 4), (column 2, column 3)) Without using GROUPING SETS you would have to write 4 different queries and combine the results with the UNION statement, which, while simple to write, generates an overhead for the database, as it has to find, fetch and handle all the rows in the queries 4 times in the above example. Think of GROUPING SETS as a way of telling Oracle all the ways you want the data grouped by up front, and it will then go off and do it in one single pass over the data, as opposed to one pass over the data per GROUP BY clause. This is much more efficient. Marge Hohly

21 This slide shows an example of the GROUPING SETS syntax and the result of running a GROUPING SETS query with the different result sets in different colors to ease understanding. Ask your students what other grouping sets would have been possible and make sure you demonstrate any suggestions. Marge Hohly

22 Marge Hohly

23 Marge Hohly

24 GROUPING FUNCTIONS Continued.
The GROUPING function handles these problems. Using a single column from the query as its argument, GROUPING returns 1 when it encounters a NULL value created by a ROLLUP or CUBE operation. That is, if the NULL indicates the row is a subtotal, GROUPING returns a 1. Any other type of value, including a stored NULL, returns a 0. So the GROUPING function will return a 1 for an aggregated (computed) row and a 0 for a not aggregated (returned) row. The syntax for the GROUPING is simply GROUPING (column_name). It is used only in the SELECT clause and it takes only a single column expression as argument. Marge Hohly

25 GROUPING FUNCTIONS Continued.
SELECT department_id, job_id, SUM(salary), GROUPING(department_id) Dept_sub_total, DECODE(GROUPING(department_id), 1,'Dept Aggregate row', department_id) AS DT, GROUPING(job_id) job_sub_total, DECODE(GROUPING(job_id), 1,'JobID Aggregate row',job_id) AS JI FROM employees WHERE department_id < 50 GROUP BY CUBE (department_id, job_id) Marge Hohly

26 Marge Hohly

27 Marge Hohly

28 Marge Hohly

29 Marge Hohly

30 Group By Extensions: ROLLUP: Used to create subtotals that roll up from the most detailed level to a grand total, following a grouping list specified in the clause CUBE: An extension to the GROUP BY clause like ROLLUP that produces cross-tabulation reports. GROUPING SETS: Used to specify multiple groupings of data GROUPING FUNCTION: Used to return a value for an aggregated row and another value for a non aggregated row. Marge Hohly

31 What will be covered: Define and explain the purpose of Set Operators
Use a set operator to combine multiple queries into a single query Control the order of rows returned using set operators Marge Hohly

32 Why Set Operators Set operators are used to combine the results from different SELECT statements into one single result output. Sometimes you want a single output from more than one table. If you join the tables, you only get returned the rows that match, but what if you don’t want to do a join, or can’t do a join because a join will give the wrong result? This is where SET operators comes in. They can return the rows found in both statements, the rows that are in one table and not the other or the rows common to both statements Marge Hohly

33 This slide introduces the two tables that will be used throughout this lesson for demonstration purposes, and the rows they contain. Marge Hohly

34 Rules to Remember There are a few rules to remember when using SET operators: The number of columns and the data types of the columns must be identical in all of the SELECT statements used in the query. The names of the columns need not be identical. Column names in the output are taken from the column names in the first SELECT statement. So any column aliases should be entered in the first statement as you would want to see them in the finished report. The slide shows a few rules to remember when using the Set operators. Marge Hohly

35 This slide introduces the UNION
This slide introduces the UNION. Make sure you demonstrate both the result of joining A and B and also the query on the slide. Ask your students if all rows from both tables are returned and if not which ones are not returned? Marge Hohly

36 Like UNION but without removing duplicates.
Marge Hohly

37 Demonstrates the result of an INTERSECT statement
Demonstrates the result of an INTERSECT statement. Ask your students how else the result could have been achieved? Marge Hohly

38 The graph only displays the result of A MINUS B, not B MINUS B
The graph only displays the result of A MINUS B, not B MINUS B. Again, this can easily be demonstrated. Marge Hohly

39 SET Operator Examples Sometimes if you are selecting rows from tables that do not have columns in common, you may have to make up columns in order to match the queries. The easiest way to do this is to include one or more NULL values in the select list. Remember to give them suitable aliases and matching datatypes. For example: Table A contains a location id and a department name. Table B contains a location id and a warehouse name. You can use the TO_CHAR(NULL) function to fill in the missing columns as shown below. SELECT location_id, department_name "Department", TO_CHAR(NULL) "Warehouse" FROM departments UNION SELECT location_id, TO_CHAR(NULL) "Department", warehouse_name FROM warehouses; This slide introduces the students to the concept of matching the select lists in all the queries in Set operator statements. Marge Hohly

40 SET Operator Examples (Continued)
The keyword NULL can be used to match columns in a SELECT list. One NULL is included for each missing column. Furthermore, NULL is formatted to match the datatype of the column it is standing in for, so TO_CHAR, TO_DATE or TO_NUMBER functions are often used to achieve identical SELECT lists. Marge Hohly

41 Marge Hohly

42 Marge Hohly

43 Marge Hohly

44 Marge Hohly

45 Key terms SET operators: used to combine results into one single result from multiple SELECT statements UNION: operator that returns all rows from both tables and eliminates duplicates UNION ALL: operator that returns all rows from both tables, including duplicates INTERSECT: operator that returns rows common to both tables MINUS: operator that returns rows that are unique to each table TO_CHAR(null): columns that were made up to match queries in another table that are not in both tables Marge Hohly


Download ppt "Database Programming Sections 5– GROUP BY, HAVING clauses, Rollup & Cube Operations, Grouping Set, Set Operations 11/2/10."

Similar presentations


Ads by Google