SQL Unit 5 Aggregation, GROUP BY, and HAVING Kirk Scott 1.



5.1 Grouping By One Field 5.2 Grouping By More than One Field 5.3 GROUP BY with HAVING 5.4 More on Nulls 2

5.1 Grouping By One Field 3

1. Recall that the term aggregation referred to built-in functions like these: COUNT, SUM, AVG, MAX, MIN, etc. The results of such a function are based on the contents of more than one row in a table. 4

A simple example of the use of such a function would be: SELECT SUM(salesprice) FROM Carsale This would find the sum of the salesprices of all of the cars listed in the Carsale table. 5

2. Remember also that the records in the Carsale table include the spno, and it is possible to write a query that orders the results of a query by that field: SELECT * FROM Carsale ORDER BY spno 6

3. What if you would like subtotals of the salesprices for the cars sold by each salesperson? This would involve finding a SUM, and it would also depend on the spno. Both of these fields are in the Carsale table. Here is a query that accomplishes this: SELECT spno, SUM(salesprice) FROM Carsale GROUP BY spno 7

This query will give the subtotal for each spno in the Carsale table. There will be only one row for each spno in the results of the query. In a sense when you GROUP BY, it is like having the keyword DISTINCT in the query. 8

The aggregate functions ignore nulls, but GROUP BY does not. If any sales records had null spno's, the query results would also include a row showing the sum of the salesprices for those records. However, in calculating the sums, null values for salesprice would still be ignored. 9
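The behavior described above can be sketched with Python's built-in sqlite3 module standing in for the course database. The Carsale rows here are invented for illustration and only the spno and salesprice fields are modeled; one sale has a null spno and one has a null salesprice.

```python
import sqlite3

# In-memory database with a cut-down, hypothetical Carsale table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Carsale (spno INTEGER, salesprice INTEGER)")
conn.executemany(
    "INSERT INTO Carsale VALUES (?, ?)",
    [
        (111, 10000),
        (111, None),   # null salesprice: ignored by SUM
        (111, 2000),
        (222, 15000),
        (None, 5000),  # null spno: still forms its own group
    ],
)

rows = conn.execute(
    "SELECT spno, SUM(salesprice) FROM Carsale GROUP BY spno"
).fetchall()

# One row per distinct spno; the null group shows up as None in Python.
totals = dict(rows)
print(totals)
```

With this made-up data there are three result rows: one for spno 111, one for 222, and one for the null spno.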

The keyword GROUP has this in common with the keyword ORDER: In practice the results of this query will come back sorted by the spno values, although standard SQL only guarantees an order when ORDER BY is given. 10

4. Here is another example, using COUNT, where the function is applied to * rather than to a single field in the table. The results will be what you would expect —the count of the number of car sales by each salesperson: SELECT spno, COUNT(*) FROM Carsale GROUP BY spno 11

Recall that the meaning of COUNT(*) is to count all of the records where any of the fields are non-null. None of the records can be all null, so this counts all records. GROUP BY will include in the results a group that counts how many records had a null spno, if there were any such records. 12
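As a small illustration of the point above, the following sketch contrasts COUNT(*) with COUNT applied to a single field. It uses Python's sqlite3 module and an invented, cut-down Carsale table, so the data and the extra COUNT(salesprice) column are assumptions added for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Carsale (spno INTEGER, salesprice INTEGER)")
conn.executemany(
    "INSERT INTO Carsale VALUES (?, ?)",
    [(111, 10000), (111, None), (111, 2000), (222, 15000), (None, 5000)],
)

rows = conn.execute(
    "SELECT spno, COUNT(*), COUNT(salesprice) FROM Carsale GROUP BY spno"
).fetchall()

# Map each spno to (number of records, number of non-null salesprices).
counts = {spno: (n_rows, n_prices) for spno, n_rows, n_prices in rows}
print(counts)
```

COUNT(*) counts every record in the group, while COUNT(salesprice) skips the record whose salesprice is null. The records with a null spno are still counted in their own group.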

5. It's not necessary to include the GROUP BY field in the query results. These results may not be very useful, but this query is syntactically OK: SELECT SUM(salesprice) FROM Carsale GROUP BY spno 13

On the other hand, there are limitations on what fields can be included in the results of a GROUP BY query. A query like this is wrong: SELECT spno, custno, SUM(salesprice) FROM Carsale GROUP BY spno 14

The reason is simple. By definition, there will only be one row per spno in the results of the query. However, it is possible that there would be more than one custno per salesperson. 15

It would not be possible to show the multiple custno's belonging to a single spno, so this is not allowed. It's true that in some cases there may only be one custno for a given spno, but even so, the syntax will not support exceptions like these. 16

The bottom line is that in a GROUP BY query, the SELECT can include only the GROUP BY field or fields and aggregate functions. 17

6. It is possible to use GROUP BY and ORDER BY together in a single query. This is a simple, practical example. It illustrates the fact that you can order the results by the aggregate if you want to. Recall that the default order is by the GROUP BY field. SELECT spno, COUNT(*) FROM Carsale GROUP BY spno ORDER BY COUNT(*) 18
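A quick sketch of ordering by the aggregate, again using Python's sqlite3 module with made-up data (three hypothetical salespeople with 3, 1, and 2 sales respectively):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Carsale (spno INTEGER)")

# Hypothetical sales: 111 sold 3 cars, 222 sold 1, 333 sold 2.
sales = [(111,)] * 3 + [(222,)] * 1 + [(333,)] * 2
conn.executemany("INSERT INTO Carsale VALUES (?)", sales)

rows = conn.execute(
    "SELECT spno, COUNT(*) FROM Carsale GROUP BY spno ORDER BY COUNT(*)"
).fetchall()
print(rows)
```

The rows come back in ascending order of the count, not in order of spno.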

5.2 Grouping By More than One Field 19

1. It is also possible to GROUP BY more than one field at a time in a query. For example: SELECT make, model, SUM(stickerprice) FROM Car GROUP BY make, model 20

This query will give the sum of the stickerprices for every possible combination of make and model. Each of these combinations will appear only once in the results. Again, the effect is similar to having the keyword DISTINCT in a query. 21

The results would also include rows for the three cases where either the make, model, or both fields were null in the original records in the Car table. No fields other than make and model (and the aggregate) could be included in the select clause. Also, both make and model are optional in the SELECT, although in most cases the query results would probably be more useful if they were included. 22
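The grouping on two fields, including the groups that contain nulls, can be sketched as follows. This uses Python's sqlite3 module; the Car rows are invented, and only make, model, and stickerprice are modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Car (make TEXT, model TEXT, stickerprice INTEGER)")
conn.executemany(
    "INSERT INTO Car VALUES (?, ?, ?)",
    [
        ("Chevrolet", "Malibu", 20000),
        ("Chevrolet", "Malibu", 22000),
        ("Chevrolet", "Tahoe", 40000),
        ("Ford", "Focus", 18000),
        ("Ford", None, 15000),  # null model
        (None, None, 9000),     # null make and model
    ],
)

rows = conn.execute(
    "SELECT make, model, SUM(stickerprice) FROM Car GROUP BY make, model"
).fetchall()

# Key each sum by its (make, model) combination.
sums = {(make, model): total for make, model, total in rows}
print(sums)
```

Each combination appears once, and the combinations involving nulls get rows of their own.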

2. It is again useful to compare the GROUP BY query with the analogous ORDER BY query: SELECT make, model FROM Car ORDER BY make, model 23

In this query the primary sort key is make and the secondary sort key is model. The results of the query will show every combination of make and model that occurs in the Car table sorted first by make, and within make by model. The corresponding GROUP BY query will show the sums of the stickerprices for every combination of make and model in the table and the results will be given in the same order as the ORDER BY query. 24

3. Observe that it would also be possible to write queries where the order that the fields are selected is changed. The sums for the various combinations of make and model wouldn't change, but the orders of the columns and rows in the results would change. The first example would put the model column before the make column, but the sort order of the rows would be the same as in the previous example. 25

It is conceivable that someone might want to write a query like this: SELECT model, make, SUM(stickerprice) FROM Car GROUP BY make, model 26

The second example would put the make column first and the model column second, but the sort order has been changed to sort first by model and then by make. It seems unlikely that anyone would write the query in this way intentionally, but it is possible that all they're interested in is the sum for each combination of make and model and the sort order doesn't make a difference. 27

In any case, it's syntactically OK: SELECT make, model, SUM(stickerprice) FROM Car GROUP BY model, make 28

4. It bears repeating that including a GROUP BY field in the SELECT is optional. For example, the following example would be OK. The results will only show the make and sum in each row, but there will be a row for each combination of make and model: SELECT make, SUM(stickerprice) FROM Car GROUP BY make, model 29
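To see why leaving a GROUP BY field out of the SELECT can be confusing, here is a sketch using Python's sqlite3 module and invented Car data. The make Chevrolet appears twice in the output because there are two Chevrolet models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Car (make TEXT, model TEXT, stickerprice INTEGER)")
conn.executemany(
    "INSERT INTO Car VALUES (?, ?, ?)",
    [
        ("Chevrolet", "Malibu", 20000),
        ("Chevrolet", "Malibu", 22000),
        ("Chevrolet", "Tahoe", 40000),
        ("Ford", "Focus", 18000),
    ],
)

rows = conn.execute(
    "SELECT make, SUM(stickerprice) FROM Car GROUP BY make, model"
).fetchall()

# Sort in Python so the printed order does not depend on the database.
result = sorted(rows)
print(result)
```

There is one row per combination of make and model, but nothing in the output says which model each Chevrolet row belongs to.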

It also bears repeating that it is not possible to include in the SELECT any fields except for the aggregate field and the fields in the GROUP BY. This is because there may be multiple values for the additional field for each combination of the GROUP BY fields. 30

For example, this query is wrong: SELECT make, model, year, SUM(stickerprice) FROM Car GROUP BY make, model 31

5. It is always possible to specify an order for the results of a query in addition to doing GROUP BY. This example is kind of silly, because it simply accomplishes what could be accomplished by putting the fields in the GROUP BY in the other order. But it does illustrate how the syntax for ORDER BY will override the ordering that otherwise would be used by GROUP BY: SELECT make, model, SUM(stickerprice) FROM Car GROUP BY model, make ORDER BY make, model 32

This example illustrates a more practical use of the syntax. Notice again that it's possible to use the aggregate function in the ORDER BY: SELECT make, model, SUM(stickerprice) FROM Car GROUP BY make, model ORDER BY SUM(stickerprice) DESC 33

5.3 GROUP BY with HAVING 34

1. In a simple query, a WHERE clause causes the SELECT to pick out only certain sets of records in a table based on a condition on the value of an individual field. This is known as a selection or a restriction. It might also be called a refinement of the query's results. A query with a WHERE clause will potentially give as its results a subset of the results that would be returned by the same query without the WHERE clause. 35

In a query with GROUP BY, the HAVING clause plays a role similar to that of the WHERE clause in a simple query. In other words, it can be used to restrict the results based on the results of the aggregate function in the query. 36

For example, this query will show the spno's and the sums of the salesprices of cars that they sold, but only for those salespeople who sold a total of at least dollars worth of cars overall: SELECT spno, SUM(salesprice) FROM Carsale GROUP BY spno HAVING SUM(salesprice) >=

Here is another straightforward example which will find the salespeople and the counts of the numbers of cars they sold, if they sold more than 4 cars: SELECT spno, COUNT(*) FROM Carsale GROUP BY spno HAVING COUNT(*) > 4 38
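The HAVING COUNT(*) > 4 query can be tried out with the sketch below. It uses Python's sqlite3 module, and the sales counts (5, 4, and 6 for three hypothetical salespeople) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Carsale (spno INTEGER)")

# Hypothetical sales: 111 sold 5 cars, 222 sold 4, 333 sold 6.
sales = [(111,)] * 5 + [(222,)] * 4 + [(333,)] * 6
conn.executemany("INSERT INTO Carsale VALUES (?)", sales)

rows = conn.execute(
    "SELECT spno, COUNT(*) FROM Carsale GROUP BY spno HAVING COUNT(*) > 4"
).fetchall()

big_sellers = dict(rows)
print(big_sellers)
```

Salesperson 222 sold exactly 4 cars, so that group is dropped by the HAVING condition.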

2. For better or worse, the HAVING clause can also be applied to the GROUP BY field or fields. So, for example, this query is possible. It will find the sum of the stickerprices for all of the Chevrolets and only the Chevrolets. 39

There will be only one row in the results: SELECT make, SUM(stickerprice) FROM Car GROUP BY make HAVING make = 'Chevrolet' 40

There is nothing wrong with the previous example, but the following alternative may be preferable. It is possible to have both WHERE and GROUP BY in the same query, and it might be helpful to use WHERE instead of HAVING whenever that is possible. 41

Here is a query that has the same results as the previous one: SELECT make, SUM(stickerprice) FROM Car WHERE make = 'Chevrolet' GROUP BY make 42
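The equivalence of the two forms can be checked with a short sketch. It uses Python's sqlite3 module and invented Car data; the variable names are just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Car (make TEXT, model TEXT, stickerprice INTEGER)")
conn.executemany(
    "INSERT INTO Car VALUES (?, ?, ?)",
    [
        ("Chevrolet", "Malibu", 20000),
        ("Chevrolet", "Tahoe", 40000),
        ("Ford", "Focus", 18000),
    ],
)

# HAVING filters the groups after they have been formed.
with_having = conn.execute(
    "SELECT make, SUM(stickerprice) FROM Car "
    "GROUP BY make HAVING make = 'Chevrolet'"
).fetchall()

# WHERE filters the individual records before grouping.
with_where = conn.execute(
    "SELECT make, SUM(stickerprice) FROM Car "
    "WHERE make = 'Chevrolet' GROUP BY make"
).fetchall()

print(with_having, with_where)
```

Both queries return the same single row. The WHERE version discards the non-Chevrolet records before any grouping is done.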

Keep in mind that it is possible to do inequalities on text fields. This query would find the sums of the stickerprices for all makes whose names appear after Chevrolet in alphabetical order: SELECT make, SUM(stickerprice) FROM Car WHERE make > 'Chevrolet' GROUP BY make 43

3. It is possible to have both a condition on a GROUP BY field (a non-aggregate field) and the aggregate field in a query. Again, it may be helpful to keep them straight by using WHERE for the condition on the GROUP BY field. You have to use HAVING on the aggregate field in any case. 44

So, for example, this query will find the makes and the sums of their stickerprices for makes that appear after Chevrolet in alphabetical order, and whose stickerprice sums are greater than or equal to Notice that even though the word "and" appears in the verbal description, the keyword AND does not belong in the syntax of a correct query implementing this: 45

SELECT make, SUM(stickerprice) FROM Car WHERE make > 'Chevrolet' GROUP BY make HAVING SUM(stickerprice) >=

4. All of the examples so far have concentrated on conditions on the group by fields or the aggregate. As usual, most things in SQL mix and match. It is also possible to have a condition on any field or fields. 47

For example: SELECT make, SUM(stickerprice) FROM Car WHERE make > 'Chevrolet' AND year > 2005 GROUP BY make HAVING SUM(stickerprice) >=
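Here is a sketch of a query of this shape, combining WHERE conditions on ordinary fields with a HAVING condition on the aggregate. It uses Python's sqlite3 module; the Car rows and the threshold of 30000 are invented for illustration and are not taken from the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Car (make TEXT, model TEXT, year INTEGER, stickerprice INTEGER)"
)
conn.executemany(
    "INSERT INTO Car VALUES (?, ?, ?, ?)",
    [
        ("Chevrolet", "Malibu", 2010, 20000),  # fails make > 'Chevrolet'
        ("Ford", "Focus", 2008, 18000),
        ("Ford", "F150", 2012, 30000),
        ("Ford", "Pinto", 1975, 5000),         # fails year > 2005
        ("Toyota", "Corolla", 2010, 17000),    # group sum is below the threshold
    ],
)

threshold = 30000  # hypothetical cutoff for the HAVING condition

rows = conn.execute(
    "SELECT make, SUM(stickerprice) FROM Car "
    "WHERE make > 'Chevrolet' AND year > 2005 "
    "GROUP BY make HAVING SUM(stickerprice) >= ?",
    (threshold,),
).fetchall()
print(rows)
```

The WHERE conditions remove individual records first, and only then are the remaining records grouped and the sums tested by HAVING.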

5. The ability to mix and match extends to joins. It is possible to have a join query where the grouping is done on the field of one table, while the aggregate is done on a field of the other table. Such a query could also include the keyword HAVING as well as other elements of SQL queries unrelated to grouping. 49

This last example dispenses with HAVING and with WHERE conditions other than the joining condition in order to clearly illustrate doing a join and GROUP BY together: SELECT commrate, SUM(salesprice) FROM Salesperson, Carsale WHERE Salesperson.spno = Carsale.spno GROUP BY commrate 50
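The join with GROUP BY can be sketched as follows, using Python's sqlite3 module. Both tables are cut down to the fields the query needs, the rows are invented, and commrate is stored as a whole-number percentage purely to keep the arithmetic simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salesperson (spno INTEGER, commrate INTEGER)")
conn.execute("CREATE TABLE Carsale (spno INTEGER, salesprice INTEGER)")

# Hypothetical data: two salespeople at 5 percent, one at 10 percent.
conn.executemany(
    "INSERT INTO Salesperson VALUES (?, ?)",
    [(111, 5), (222, 5), (333, 10)],
)
conn.executemany(
    "INSERT INTO Carsale VALUES (?, ?)",
    [(111, 10000), (222, 20000), (333, 5000), (333, 7000)],
)

rows = conn.execute(
    "SELECT commrate, SUM(salesprice) "
    "FROM Salesperson, Carsale "
    "WHERE Salesperson.spno = Carsale.spno "
    "GROUP BY commrate"
).fetchall()

sales_by_rate = dict(rows)
print(sales_by_rate)
```

The grouping field comes from Salesperson and the summed field comes from Carsale, so the sales of salespeople 111 and 222 are combined into one group because they share a commrate.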

5.4 More on Nulls 51

1. For the purposes of the following discussion, here are the contents of the Salesperson table:
Salesperson
spno | name | addr | city | state | phone | bossno | commrate
111 | Fred Flintstone | 123 C Street | Anchorage | AK | | |
 | Wile E. Coyote | 456 Karluk | Anchorage | AK | | |
 | Bugs Bunny | 789 Otis | Anchorage | AK | | |
 | Rocky the Squirrel | 345 Tudor | Anchorage | AK | | |
 | Yosemite Sam | 678 Muldoon | Anchorage | AK | | |

The Salesperson table is the table in the example database which includes nulls. Recall that the aggregate functions, COUNT, SUM, AVG, MAX, MIN and so on, ignore nulls. The following query will return a result of 4: SELECT COUNT(commrate) FROM Salesperson 53

The following query will return an average calculated by dividing by 4 rather than 5: SELECT AVG(commrate) FROM Salesperson 54

If you want to make sure that nulls are included, you have to use the NZ function, which is specific to Microsoft Access (other systems provide COALESCE for the same purpose). NZ(commrate, 0) substitutes 0 for a null commrate. For sums, treating nulls as zero makes no difference, but for counts and averages it does. In the query below the average will be calculated by dividing by 5 rather than 4: SELECT AVG(NZ(commrate, 0)) FROM Salesperson 55
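The difference between the two averages can be seen in the sketch below. It runs under Python's sqlite3 module, which does not have NZ, so COALESCE is swapped in as the equivalent. The five rows are invented (commrate is stored as a whole-number percentage, with one null) and are not the values from the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Salesperson (spno INTEGER, bossno INTEGER, commrate INTEGER)"
)
# Hypothetical rows: five salespeople, one with a null commrate.
conn.executemany(
    "INSERT INTO Salesperson VALUES (?, ?, ?)",
    [
        (111, 333, 4),
        (222, 333, 6),
        (333, None, 10),
        (444, 333, 5),
        (555, 333, None),
    ],
)

count_all, count_rate, avg_skip, avg_zero = conn.execute(
    "SELECT COUNT(*), COUNT(commrate), AVG(commrate), "
    "AVG(COALESCE(commrate, 0)) FROM Salesperson"
).fetchone()

# avg_skip divides the sum 25 by 4; avg_zero divides it by 5.
print(count_all, count_rate, avg_skip, avg_zero)
```

The sum is the same in both cases. Only the divisor changes, because the substituted zero makes the fifth record count toward the average.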

2. The thing to remember is that GROUP BY will include a group for null values even though the aggregate functions ignore nulls. Focus on the last two columns in the Salesperson table and consider this query: SELECT bossno, AVG(commrate) FROM Salesperson GROUP BY bossno 56

This is what the results look like: the Query1 result table, with a bossno column and an Expr column holding the average commrate for each bossno. 57

There is nothing surprising here. GROUP BY returns a row for the case where bossno is null. There is only one record that meets this condition, and the commrate for that salesperson is That means that the average is also

GROUP BY also returns a row for the case where bossno equals 333, which happens to be a group where 4 records in the Salesperson table have that value. 3 of those 4 records in the Salesperson table have non-null commrates. The average for them is calculated as ( ) / 3, giving the value shown above. The null value is ignored both in the sum in the numerator and in the count in the denominator. 59

Other examples could be devised. The point is simply that you need to keep this in mind: GROUP BY will return rows for those cases where the GROUP BY fields are null. However, the aggregate functions still ignore nulls in the aggregate fields, unless you include NZ in the expression. 60
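As one more such example, the sketch below groups by bossno using Python's sqlite3 module. The rows are invented to match the shape described above: one record with a null bossno, and four records with bossno 333 of which three have non-null commrates. The commrate values themselves are hypothetical whole-number percentages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Salesperson (spno INTEGER, bossno INTEGER, commrate INTEGER)"
)
conn.executemany(
    "INSERT INTO Salesperson VALUES (?, ?, ?)",
    [
        (111, 333, 4),
        (222, 333, 6),
        (333, None, 10),   # null bossno: forms its own group
        (444, 333, 5),
        (555, 333, None),  # null commrate: ignored by AVG
    ],
)

rows = conn.execute(
    "SELECT bossno, AVG(commrate) FROM Salesperson GROUP BY bossno"
).fetchall()

avg_by_boss = dict(rows)
print(avg_by_boss)
```

The null bossno gets a row of its own, and the average for bossno 333 is (4 + 6 + 5) / 3, with the null commrate left out of both the sum and the count.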

The End 61