Using Relational Databases and SQL Steven Emory Department of Computer Science California State University, Los Angeles Chapter 6 Set Functions.


Topics for Today
- Set (Aggregate) Functions
- GROUP BY Clause
- HAVING Clause

Set Functions
Definition: A set function, or group (aggregate) function, is a function that operates on a group of rows and returns a single value for that group.

Aggregate Functions
Aggregate/non-aggregate similarities:
- Both take some kind of input
- Both perform operations using the input
- Both have a single output
Aggregate/non-aggregate differences:
- Input to an aggregate function is a group of data
- Input to a non-aggregate function is a single item
- Aggregate functions may not be nested
- Aggregate functions do not alter any table data

Examples
Function example:
    SELECT LEFT(Title, 1) FROM Movies;
Set function example:
    SELECT MPAA, COUNT(MPAA) FROM Movies GROUP BY MPAA;

Aggregate Functions
There are only 5 general aggregate functions:
- COUNT(*), COUNT(fieldname)
- AVG(fieldname)
- MIN(fieldname)
- MAX(fieldname)
- SUM(fieldname)

COUNT
COUNT(*)
- Counts the number of rows in a table
- Includes rows that contain NULLs
    -- This query returns 6.
    SELECT COUNT(*) AS 'Number of Movies' FROM Movies;
COUNT(fieldname)
- Counts the number of rows in which fieldname is not NULL
- Excludes NULLs (doesn't count them)
    -- This query also returns 6, provided no ArtistID is NULL.
    SELECT COUNT(ArtistID) AS 'Number of Movies' FROM Movies;
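To make the difference between the two COUNT forms concrete, here is a minimal sketch. It runs the queries through Python's built-in sqlite3 module rather than MySQL, and the six rows (including one movie with a NULL rating) are hypothetical sample data, not the course database.

```python
import sqlite3

# Hypothetical sample data: six movies, one with an unknown (NULL) rating.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (Title TEXT, MPAA TEXT)")
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?)",
    [("A", "PG-13"), ("B", "PG-13"), ("C", "R"),
     ("D", "PG-13"), ("E", None), ("F", "PG-13")],
)

# COUNT(*) counts every row; COUNT(MPAA) counts only rows where MPAA is not NULL.
all_rows, rated_rows = conn.execute(
    "SELECT COUNT(*), COUNT(MPAA) FROM Movies"
).fetchone()
print(all_rows, rated_rows)  # 6 5
```

The two counts agree only when the counted field has no NULLs.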

AVG
AVG(fieldname)
- Averages all the data under fieldname
- Excludes NULLs (doesn't count NULL as 0)
    -- Averages all movie runtimes.
    SELECT AVG(Runtime) AS 'Average Runtime' FROM Movies;

MIN and MAX
MIN(fieldname)
- Returns the minimum value under fieldname
    -- Returns the minimum movie runtime.
    SELECT MIN(Runtime) AS 'Shortest Runtime' FROM Movies;
MAX(fieldname)
- Returns the maximum value under fieldname
    -- Returns the maximum movie runtime.
    SELECT MAX(Runtime) AS 'Longest Runtime' FROM Movies;

SUM
SUM(fieldname)
- Sums all the data under fieldname
- Excludes NULLs (doesn't count NULL as 0)
    -- Sums all of the movie runtimes.
    SELECT SUM(Runtime) AS 'Total Runtime' FROM Movies;
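The NULL handling of AVG, MIN, MAX, and SUM can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the four runtimes (one of them NULL) are made-up values chosen so the arithmetic is easy to follow.

```python
import sqlite3

# Hypothetical sample data: three known runtimes and one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (Title TEXT, Runtime INTEGER)")
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?)",
    [("A", 100), ("B", 120), ("C", 140), ("D", None)],
)

# The NULL runtime is skipped: AVG divides 360 by 3, not by 4.
avg_rt, min_rt, max_rt, sum_rt = conn.execute(
    "SELECT AVG(Runtime), MIN(Runtime), MAX(Runtime), SUM(Runtime) FROM Movies"
).fetchone()
print(avg_rt, min_rt, max_rt, sum_rt)  # 120.0 100 140 360
```

If NULL were treated as 0, the average would be 90 and the minimum would be 0.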

Filtering Aggregate Calculations
To exclude items from being aggregated, you may use the WHERE clause.
    -- Example: Count the number of PG-13 movies.
    SELECT COUNT(*) FROM Movies WHERE MPAA = 'PG-13';
    -- Example: Count the number of rated R movies.
    SELECT COUNT(*) FROM Movies WHERE MPAA = 'R';
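A runnable sketch of the two filtered counts, assuming a hypothetical Movies table with five PG-13 movies and one R movie. It uses Python's sqlite3 module instead of MySQL, and `count_rated` is a helper name introduced only for this illustration.

```python
import sqlite3

# Hypothetical sample data: five PG-13 movies and one R movie.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (Title TEXT, MPAA TEXT)")
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?)",
    [("A", "PG-13"), ("B", "PG-13"), ("C", "PG-13"),
     ("D", "PG-13"), ("E", "PG-13"), ("F", "R")],
)

def count_rated(rating):
    """Count the rows that survive the WHERE filter for one rating."""
    query = "SELECT COUNT(*) FROM Movies WHERE MPAA = ?"
    return conn.execute(query, (rating,)).fetchone()[0]

pg13_count = count_rated("PG-13")
r_count = count_rated("R")
print(pg13_count, r_count)  # 5 1
```

Each rating needs its own query here, which is the limitation that grouping removes.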

Mixing Field Types
Can we calculate both with a single query?
    +-------+----------+
    | MPAA  | COUNT(*) |
    +-------+----------+
    | PG-13 |        5 |
    | R     |        1 |
    +-------+----------+
    2 rows in set (0.01 sec)
Well, we would need to mix non-aggregated fieldnames with aggregated ones.
    -- Example: What does this do? Does it work? No!
    SELECT MPAA, COUNT(MPAA) FROM Movies;

Grouping Tables
Solution: You can divide the table into groups.
    -- Groups the movies table by MPAA rating.
    SELECT MPAA FROM Movies GROUP BY MPAA;
    -- Groups and counts movies by MPAA rating.
    SELECT MPAA, COUNT(MPAA) FROM Movies GROUP BY MPAA;
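The grouped count can be tried with the sketch below, which assumes a hypothetical Movies table holding five PG-13 movies and one R movie and runs the query through Python's sqlite3 module rather than MySQL. The ORDER BY is an addition that only fixes the order of the printed rows.

```python
import sqlite3

# Hypothetical sample data: five PG-13 movies and one R movie.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (Title TEXT, MPAA TEXT)")
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?)",
    [("A", "PG-13"), ("B", "R"), ("C", "PG-13"),
     ("D", "PG-13"), ("E", "PG-13"), ("F", "PG-13")],
)

# One result row per rating, each with its own count.
counts = conn.execute(
    "SELECT MPAA, COUNT(MPAA) FROM Movies GROUP BY MPAA ORDER BY MPAA"
).fetchall()
print(counts)  # [('PG-13', 5), ('R', 1)]
```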

How GROUP BY Works
- GROUP BY conceptually begins by sorting the table based on the grouping attribute (in our case, Gender)
- If any aggregates are present, GROUP BY causes each aggregate to be applied per-group rather than per-table
- GROUP BY then condenses the table so that each group only appears once in the table (if listed) and displays any aggregated group values along with it
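The three steps can be imitated outside the database as a rough mental model. The sketch below is plain Python with made-up MPAA ratings as the grouping attribute. It illustrates the sort, aggregate, condense idea only and does not describe how a database engine implements GROUP BY internally.

```python
from itertools import groupby

# Hypothetical MPAA ratings of six movies, in table order.
ratings = ["PG-13", "R", "PG-13", "PG-13", "PG-13", "PG-13"]

# Step 1: sort so that rows with the same grouping value are adjacent.
ordered = sorted(ratings)

# Steps 2 and 3: apply the aggregate (a COUNT) to each group and
# keep a single output row per group.
grouped = [(rating, sum(1 for _ in members))
           for rating, members in groupby(ordered)]
print(grouped)  # [('PG-13', 5), ('R', 1)]
```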

GROUP BY Example

Grouping on Multiple Fields
GROUP BY can use multiple fieldnames (similar to how you can sort using multiple fieldnames).
    -- Example: Report the number of movies by MPAA rating and year of release.
    SELECT MPAA, YEAR(ReleaseDate), COUNT(*)
    FROM Movies
    GROUP BY MPAA, YEAR(ReleaseDate);
In the SELECT clause that contains one or more aggregates, you should only list table attributes that are als

Filtering Based on Aggregates
Can we use aggregate functions in the WHERE clause?
    -- List all genres that have an average movie runtime of over 2 hours.
    SELECT Genre, COUNT(*), AVG(Runtime)
    FROM Movies JOIN XRefGenresMovies USING(MovieID)
    WHERE AVG(Runtime) > 120
    GROUP BY Genre;
The answer is no, because WHERE filters rows before they are aggregated, so the averages do not exist yet! We need something that filters after!

The HAVING Clause
Solution is to use the HAVING clause. Example:
    -- List all genres that have an average movie runtime of over 2 hours.
    SELECT Genre, COUNT(*), AVG(Runtime)
    FROM Movies JOIN XRefGenresMovies USING(MovieID)
    GROUP BY Genre
    HAVING AVG(Runtime) > 120;
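Here is a minimal runnable sketch of the HAVING query. It uses Python's sqlite3 module instead of MySQL. The column layout is an assumption (Genre is placed directly in XRefGenresMovies so the slide's join works as written), and the four movies are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: Genre stored directly in the cross-reference table.
conn.execute("CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Runtime INTEGER)")
conn.execute("CREATE TABLE XRefGenresMovies (MovieID INTEGER, Genre TEXT)")
conn.executemany("INSERT INTO Movies VALUES (?, ?)",
                 [(1, 130), (2, 150), (3, 90), (4, 100)])
conn.executemany("INSERT INTO XRefGenresMovies VALUES (?, ?)",
                 [(1, "Drama"), (2, "Drama"), (3, "Comedy"), (4, "Comedy")])

# Drama averages 140 minutes and is kept; Comedy averages 95 and is dropped.
long_genres = conn.execute(
    """
    SELECT Genre, COUNT(*), AVG(Runtime)
    FROM Movies JOIN XRefGenresMovies USING(MovieID)
    GROUP BY Genre
    HAVING AVG(Runtime) > 120
    """
).fetchall()
print(long_genres)  # [('Drama', 2, 140.0)]
```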

How HAVING Works
In previous example, this is calculated first...
    SELECT Genre, COUNT(*), AVG(Runtime)
    FROM Movies JOIN XRefGenresMovies USING(MovieID)
    GROUP BY Genre;
Then the result is filtered using the HAVING clause...
    SELECT Genre, COUNT(*), AVG(Runtime)
    FROM Movies JOIN XRefGenresMovies USING(MovieID)
    GROUP BY Genre
    HAVING AVG(Runtime) > 120;

How HAVING Works
So in other words:
- WHERE filters per row (BEFORE aggregation)
- HAVING filters per group (AFTER aggregation)
Since HAVING filters on groups:
- You cannot use just any fieldname you want to in the SELECT or HAVING clause with an aggregate query; outside of aggregate functions, you can only use the ones you choose to group by
- Example on next page...
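The row-versus-group distinction shows up clearly when the same threshold is applied both ways. This sketch assumes a hypothetical Movies table with MPAA and Runtime columns and uses Python's sqlite3 module in place of MySQL. The ORDER BY is added only to fix the row order.

```python
import sqlite3

# Hypothetical sample data: PG-13 runtimes average 117, the one R movie is 140.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (MPAA TEXT, Runtime INTEGER)")
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?)",
    [("PG-13", 100), ("PG-13", 110), ("PG-13", 130),
     ("PG-13", 150), ("PG-13", 95), ("R", 140)],
)

# WHERE drops individual rows first, then the survivors are counted per rating.
where_result = conn.execute(
    "SELECT MPAA, COUNT(*) FROM Movies "
    "WHERE Runtime > 120 GROUP BY MPAA ORDER BY MPAA"
).fetchall()

# HAVING keeps or drops whole groups, based on a value computed per group.
having_result = conn.execute(
    "SELECT MPAA, COUNT(*) FROM Movies "
    "GROUP BY MPAA HAVING AVG(Runtime) > 120 ORDER BY MPAA"
).fetchall()

print(where_result)   # [('PG-13', 2), ('R', 1)]
print(having_result)  # [('R', 1)]
```

PG-13 appears in the first result because two of its rows pass the row filter, but not in the second, because the group as a whole averages under 120.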

Having Examples
This works:
    SELECT Genre, COUNT(*), AVG(Runtime)
    FROM Movies JOIN XRefGenresMovies USING(MovieID)
    GROUP BY Genre
    HAVING AVG(Runtime) > 120;
This doesn't work:
    SELECT Genre, COUNT(*), AVG(Runtime)
    FROM Movies JOIN XRefGenresMovies USING(MovieID)
    GROUP BY Genre
    HAVING AVG(Runtime) > Runtime;
HAVING only sees group attributes and aggregates.

Having Examples
Why doesn't it work? Because Runtime is an attribute of a movie and not an attribute of a group. You can only use group attributes and aggregate functions in a HAVING clause.
Since Genre is an attribute of the aggregated group (Genre is listed in the GROUP BY clause), we can use it in the HAVING clause.
    SELECT Genre, COUNT(*), AVG(Runtime)
    FROM Movies JOIN XRefGenresMovies USING(MovieID)
    GROUP BY Genre
    HAVING (AVG(Runtime) > 120 AND Genre <> 'Horror');

HAVING Summary
So in a HAVING clause:
- You can use aggregate functions
- You can use constant values
- You can use grouping attributes
Anything else and... Happy error time! The most common errors are “ERROR 1111 (HY000): Invalid use of group function” and “ERROR 1054 (42S22): Unknown column 'column name' in having clause”.

An Advanced HAVING Problem
List the country and average age of all (movie-related) people born in that country, for only those countries that have an average person age greater than 50.
Remember that nobody ever says “I'm years old!” Always truncate ages to zero decimal places.

Solution
    SELECT BirthCountry,
           TRUNCATE(AVG(TRUNCATE(DATEDIFF(CurDate(), BirthDate)/365, 0)), 0) AS 'Average Age'
    FROM People
    GROUP BY BirthCountry
    HAVING TRUNCATE(AVG(TRUNCATE(DATEDIFF(CurDate(), BirthDate)/365, 0)), 0) > 50
       AND BirthCountry IS NOT NULL;

Solution
Note that you may also define an alias for the aggregate function in MySQL and use it in the HAVING clause:
    SELECT BirthCountry,
           TRUNCATE(AVG(TRUNCATE(DATEDIFF(CurDate(), BirthDate)/365, 0)), 0) AS AverageAge
    FROM People
    GROUP BY BirthCountry
    HAVING AverageAge > 50
       AND BirthCountry IS NOT NULL;
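The alias form can be tried without a MySQL server using the sketch below, which swaps in SQLite through Python's sqlite3 module. SQLite has no TRUNCATE, DATEDIFF, or CurDate, so this version uses julianday and CAST(... AS INTEGER) as substitutes and passes a fixed reference date (`:today`, an assumption made so the result does not change over time). The four people are hypothetical.

```python
import sqlite3

# Hypothetical sample data; the last person has no known birth country.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Name TEXT, BirthCountry TEXT, BirthDate TEXT)")
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?)",
    [("P1", "USA", "1950-01-01"), ("P2", "USA", "1960-01-01"),
     ("P3", "France", "1990-01-01"), ("P4", None, "1940-06-01")],
)

# Each age is truncated to whole years, averaged per country, truncated again,
# and the alias AverageAge is then reused in the HAVING clause.
average_ages = conn.execute(
    """
    SELECT BirthCountry,
           CAST(AVG(CAST((julianday(:today) - julianday(BirthDate)) / 365
                         AS INTEGER)) AS INTEGER) AS AverageAge
    FROM People
    GROUP BY BirthCountry
    HAVING AverageAge > 50 AND BirthCountry IS NOT NULL
    """,
    {"today": "2020-01-01"},
).fetchall()
print(average_ages)  # [('USA', 65)]
```

On the reference date the two USA ages are 70 and 60, France averages 30 and is filtered out, and the NULL country group is removed by the IS NOT NULL test.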

Aggregating Distinct Values
- A normal SELECT DISTINCT query filters out duplicates in a second pass
- Aggregates are computed in the first pass, so if a field contains duplicate values and you aggregate on that field, SELECT DISTINCT WILL NOT filter out duplicate values from being aggregated
- The solution is to use the DISTINCT keyword in the aggregate function:
    SELECT COUNT(DISTINCT MPAA) FROM Movies;

Aggregating Distinct Values
Example:
    -- Returns 6 since there are 6 movies.
    SELECT COUNT(MPAA) FROM Movies;
    -- Returns 6 since there are 6 movies and the single result row, 6, is already unique.
    SELECT DISTINCT COUNT(MPAA) FROM Movies;
    -- Returns 2 since only PG-13 and R rated movies are currently in the database.
    SELECT COUNT(DISTINCT MPAA) FROM Movies;
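The three variants can be compared side by side with this sketch. It assumes a hypothetical Movies table with five PG-13 movies and one R movie and runs the queries with Python's sqlite3 module rather than MySQL.

```python
import sqlite3

# Hypothetical sample data: five PG-13 movies and one R movie.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (Title TEXT, MPAA TEXT)")
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?)",
    [("A", "PG-13"), ("B", "PG-13"), ("C", "PG-13"),
     ("D", "PG-13"), ("E", "PG-13"), ("F", "R")],
)

def single_value(query):
    """Run a query that returns one row with one column and return that value."""
    return conn.execute(query).fetchone()[0]

plain_count = single_value("SELECT COUNT(MPAA) FROM Movies")
distinct_rows = single_value("SELECT DISTINCT COUNT(MPAA) FROM Movies")
distinct_values = single_value("SELECT COUNT(DISTINCT MPAA) FROM Movies")
print(plain_count, distinct_rows, distinct_values)  # 6 6 2
```

SELECT DISTINCT acts on the one-row result, so it has nothing to remove. Only DISTINCT inside the aggregate changes what is counted.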