COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Slides:



Advertisements
Similar presentations
Advanced SQL (part 1) CS263 Lecture 7.
Advertisements

1 Advanced SQL Queries. 2 Example Tables Used Reserves sidbidday /10/04 11/12/04 Sailors sidsnameratingage Dustin Lubber Rusty.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Programming, Triggers Chapter 5 Modified by Donghui Zhang.
SQL Part II: Advanced Queries. 421B: Database Systems - SQL Queries II 2 Aggregation q Significant extension of relational algebra q “Count the number.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Programming, Triggers Chapter 5.
CS 405G: Introduction to Database Systems
SQL (2).
M ATH IN SQL. 222 A GGREGATION O PERATORS Operators on sets of tuples. Significant extension of relational algebra. SUM ( [DISTINCT] A): the sum of all.
Using Relational Databases and SQL Steven Emory Department of Computer Science California State University, Los Angeles Chapter 6 Set Functions.
SQL Review.
1 Database Systems Relations as Bags Grouping and Aggregation Database Modification.
Structured Query Language – Continued Rose-Hulman Institute of Technology Curt Clifton.
FALL 2004CENG 351 File Structures and Data Management1 SQL: Structured Query Language Chapter 5.
© 2002 by Prentice Hall 1 David M. Kroenke Database Processing Eighth Edition Chapter 9 Structured Query Language.
Using Relational Databases and SQL Steven Emory Department of Computer Science California State University, Los Angeles Lecture 7: Aggregates.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Constraints, Triggers Chapter 5.
Relational Algebra Chapter 4 - part I. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational.
A Guide to SQL, Seventh Edition. Objectives Retrieve data from a database using SQL commands Use compound conditions Use computed columns Use the SQL.
Exercises Product ( pname, price, category, maker) Purchase (buyer, seller, store, product) Company (cname, stock price, country) Person( per-name, phone.
Computer Science 101 Web Access to Databases SQL – Extended Form.
SQL Operations Aggregate Functions Having Clause Database Access Layer A2 Teacher Up skilling LECTURE 5.
1 Section 5 - Grouping Data u The GROUP BY clause allows the grouping of data u Aggregate functions are most often used with the GROUP BY clause u GROUP.
Using Relational Databases and SQL Department of Computer Science California State University, Los Angeles Lecture 8: Subqueries.
1 IT420: Database Management and Organization SQL - Data Manipulation Language 27 January 2006 Adina Crăiniceanu
Chapter 3 Single-Table Queries
1 ICS 184: Introduction to Data Management Lecture Note 10 SQL as a Query Language (Cont.)
1 Single Table Queries. 2 Objectives  SELECT, WHERE  AND / OR / NOT conditions  Computed columns  LIKE, IN, BETWEEN operators  ORDER BY, GROUP BY,
Instructor: Jinze Liu Fall Basic Components (2) Relational Database Web-Interface Done before mid-term Must-Have Components (2) Security: access.
MIS 3053 Database Design & Applications The University of Tulsa Professor: Akhilesh Bajaj RM/SQL Lecture 5 © Akhilesh Bajaj, 2000, 2002, 2003, All.
BACS 287 Structured Query Language 1. BACS 287 Visual Basic Table Access Visual Basic provides 2 mechanisms to access data in tables: – Record-at-a-time.
DATA RETRIEVAL WITH SQL Goal: To issue a database query using the SELECT command.
1 CSCE Database Systems Anxiao (Andrew) Jiang The Database Language SQL.
1 SQL Additional Notes. 2  1 Group and Aggregation*  2 Execution Order*  3 Join*  4 Find the maximum  5 Line Format SQL Additional Notes *partially.
1/18/00CSE 711 data mining1 What is SQL? Query language for structural databases (esp. RDB) Structured Query Language Originated from Sequel 2 by Chamberlin.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
1 SY306 Web and Databases for Cyber Operations Set #13: SQL SELECT Grouping and sub-queries.
COMP 430 Intro. to Database Systems Multi-table SQL Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!
A Guide to SQL, Eighth Edition Chapter Four Single-Table Queries.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 5 SQL.
1 SQL: The Query Language (Part II). 2 Expressions and Strings v Illustrates use of arithmetic expressions and string pattern matching: Find triples (of.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu.
1 SQL: The Query Language. 2 Example Instances R1 S1 S2 v We will use these instances of the Sailors and Reserves relations in our examples. v If the.
Sorting data and Other selection Techniques Ordering data results Allows us to view our data in a more meaningful way. Rather than just a list of raw.
1 Lecture 03: SQL Monday, January 9, Project t/Default.aspxhttp://iisqlsrv.cs.washington.edu/444/Projec.
1 Chapter 3 Single Table Queries. 2 Simple Queries Query - a question represented in a way that the DBMS can understand Basic format SELECT-FROM Optional.
SQL: Interactive Queries (2) Prof. Weining Zhang Cs.utsa.edu.
1 CS122A: Introduction to Data Management Lecture 9 SQL II: Nested Queries, Aggregation, Grouping Instructor: Chen Li.
CS3220 Web and Internet Programming More SQL
Slides are reused by the approval of Jeffrey Ullman’s
Chapter 3 Introduction to SQL(3)
CS 440 Database Management Systems
Lecture#7: Fun with SQL (Part 2)
SQL : Query Language Part II CS3431.
CS 405G: Introduction to Database Systems
MENAMPILKAN DATA DARI SATU TABEL (Chap 2)
Data warehouse Design Using Oracle
Introduction to Database Systems CSE 444 Lecture 03: SQL
GROUP BY & Subset Data Analysis
MIS2502: Review for Exam 1 JaeHwuen Jung
SQL – Entire Select.
Introduction to Database Systems CSE 444 Lecture 03: SQL
Chapter 4 Summary Query.
SQL: Structured Query Language
CS122 Using Relational Databases and SQL
Lectures 6: Introduction to SQL 5
M1G Introduction to Database Development
CSC 453 Database Systems Lecture
Section 4 - Sorting/Functions
Building Queries using the Principle of Simplest Query (POSQ)
Presentation transcript:

COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

One form of aggregation – Column totals Candy candy_nameprice Reese’s Cup th Avenue1.20 Almond Joy0.75 Jelly Babies2.39 SELECT Sum(price) FROM Candy; 4.84

Other aggregations Count Avg Max Min … What else depends on SQL version.

Aggregation + DISTINCT Candy candy_nameprice Reese’s Cup th Avenue1.20 Almond Joy0.75 Jelly Babies2.39 Chocolate Orange 2.39 Jelly Nougats0.50 SELECT Count(price), Count (DISTINCT price) FROM Candy; 64

Aggregation & NULL Candy candy_nameprice Reese’s Cup th AvenueNULL Almond Joy0.75 Jelly Babies2.39 Chocolate Orange 2.39 Jelly NougatsNULL SELECT Count(price) FROM Candy; 4 No price since no longer made in original form.

Counting rows Candy candy_nameprice Reese’s Cup th AvenueNULL Almond Joy0.75 Jelly Babies2.39 Chocolate Orange 2.39 Jelly NougatsNULL SELECT Count(*) FROM Candy; 6

Aggregation + other previous features Purchase productdatepricequantity bagel10/21120 banana10/ banana10/10110 bagel10/ SELECT Sum(price * quantity) FROM Purchase WHERE product = ‘bagel’; 50

Grouping + Aggregation – Column subtotals Purchase productdatepricequantity bagel10/21120 banana10/ banana10/10110 bagel10/ SELECT product, Sum(price * quantity) AS subtotal FROM Purchase GROUP BY product; productdatepricequantity bagel 10/ / banana 10/ /10110 productsubtotal bagel50 banana15

Grouping + Aggregation Purchase productdatepricequantity bagel10/21120 banana10/ banana10/10110 bagel10/ SELECT product, Count(*) AS purchases, Sum(quantity) AS items, Sum(price * quantity) AS subtotal FROM Purchase GROUP BY product; Can only SELECT fields that you GROUP BY. productpurchasesitemssubtotal bagel24050 banana22015

Conditions on aggregates SELECT product, Sum(price * quantity) AS subtotal FROM Purchase WHERE date > ‘10/05’ GROUP BY product HAVING SUM(quantity) >= 10; Condition on individual rows. Condition on aggregates. Purchase productdatepricequantity bagel10/21120 banana10/ banana10/10110 bagel10/ apple10/1015 productdatepricequantity bagel 10/ / banana10/10110 apple10/1015 productsubtotal bagel50 banana10

Query: Products with at least two purchases. A.Count(product) B.Count(*) C.Sum(product) D.product SELECT product FROM Purchase GROUP BY product HAVING ??? >= 2 ; Purchase productdatepricequantity bagel10/21120 banana10/ banana10/10110 bagel10/ apple10/1015

Sorting on aggregates Purchase productdatepricequantity bagel10/21120 banana10/ banana10/10110 bagel10/ SELECT product, Sum(price * quantity) FROM Purchase GROUP BY product ORDER BY Sum(price * quantity); productdatepricequantity bagel 10/ / banana 10/ /10110 productsubtotal banana15 bagel50

Sorting on aggregates Purchase productdatepricequantity bagel10/21120 banana10/ banana10/10110 bagel10/ SELECT product, Sum(price * quantity) AS subtotal FROM Purchase GROUP BY product ORDER BY subtotal; productdatepricequantity bagel 10/ / banana 10/ /10110 productsubtotal banana15 bagel50

Syntax summary SELECT [DISTINCT] Computations FROM Tables WHERE Condition 1 GROUP BY a 1, …, a k HAVING Condition 2 ORDER BY attributes LIMIT n; Uses attributes a 1, …, a k or aggregates. Condition on attributes. Condition on aggregates. Some SQLs: SELECT TOP(n) Computations Can also use computation results.

Semantics summary SELECT [DISTINCT] Computations FROM Tables WHERE Condition 1 GROUP BY a 1, …, a k HAVING Condition 2 ORDER BY attributes LIMIT n; 5.Computations, [eliminate duplicates] 8.Projections 1.Cross-product 2.Apply Condition 1 3.Group by attributes 4.Apply Condition 2 to each group 6.Sort 7.Use first n rows What we’re they thinking?!?

(Potentially confusing) details

DISTINCT Sum() vs. Sum(DISTINCT) SELECT DISTINCT Sum(DISTINCT value) FROM Data GROUP BY tag; Data itemtagvalue 1a2 2a2 3b1 4b3 5c4 ?

HAVING without grouping SELECT value, COUNT(*) AS count FROM Data HAVING count > 1; Data itemtagvalue 1a2 2a2 3b1 4b3 5c4 ? Fails in SQLite. Without GROUP BY, the entire results form one group.

Grouping without aggregation Purchase productdatepricequantity bagel10/21120 banana10/ banana10/10110 bagel10/ SELECT product FROM Purchase GROUP BY product; product bagel banana

GROUP BY vs. DISTINCT SELECT product FROM Purchase GROUP BY product; SELECT DISTINCT product FROM Purchase; But only GROUP BY works well with aggregation. Purchase productdatepricequantity bagel10/21120 banana10/ banana10/10110 bagel10/ product bagel banana

More complicated examples

Activity – will break activity into segments 04a-aggregation.ipynb Students (id and name) taking more than 5 courses Courses and their average ratings Highest average rating of a course Course with highest average rating

Students with >5 courses – Two solutions SELECT s_id, first_name, last_name FROM Student WHERE (SELECT Count(*) FROM Enrollment WHERE Student.s_id = Enrollment.s_id ) > 5; SELECT s.s_id, s.first_name, s.last_name FROM Student s, Enrollment e WHERE s.s_id = e.s_id GROUP BY Student.s_id HAVING COUNT(crn) > 5;

Efficiency comparison SELECT s_id, first_name, last_name FROM Student WHERE (SELECT Count(*) FROM Enrollment WHERE Student.s_id = Enrollment.s_id ) > 5; SELECT s.s_id, s.first_name, s.last_name FROM Student s, Enrollment e WHERE s.s_id = e.s_id GROUP BY Student.s_id HAVING COUNT(crn) > 5; Assuming no major optimizations: 1.How many times do we SELECT … FROM Enrollment? 2.How many joins?

Courses and their average ratings SELECT crn, Avg(rating) FROM Enrollment GROUP BY crn;

Highest average rating of a course SELECT Max(avg_rating) FROM (SELECT Avg(rating) as avg_rating FROM Enrollment GROUP BY crn ); SELECT crn, Max(Avg(rating)) FROM Enrollment GROUP BY crn; Nesting aggregate functions is not syntactically allowed.

Highest avg rating – Solution 1 SELECT crn FROM (SELECT crn, Avg(rating) AS avg_rating1 FROM Enrollment GROUP BY crn ) WHERE avg_rating1 = (SELECT Max(avg_rating2) FROM (SELECT Avg(rating) as avg_rating2 FROM Enrollment GROUP BY crn ) ); 1.Calculate all average ratings. 2.Calculate the maximum. 3.Find which courses have this value as their average.

What about repeated subquery? SELECT crn FROM (SELECT crn, Avg(rating) AS avg_rating1 FROM Enrollment GROUP BY crn ) WHERE avg_rating1 = (SELECT Max(avg_rating2) FROM (SELECT Avg(rating) AS avg_rating2 FROM Enrollment GROUP BY crn ) ); CREATE VIEW Average AS SELECT crn, Avg(rating) AS avg_rating FROM Enrollment GROUP BY crn; SELECT crn FROM Average WHERE avg_rating = (SELECT Max(avg_rating) FROM Average );

Highest avg rating – Solution 2 CREATE VIEW Average AS SELECT crn, Avg(rating) AS avg_rating FROM Enrollment GROUP BY crn; SELECT crn FROM Average ORDER BY avg_rating DESC LIMIT 1; Doesn’t work for ties.

Efficiency comparison CREATE VIEW Averages AS SELECT crn, Avg(rating) AS avg_rating FROM Enrollment GROUP BY crn; SELECT crn FROM Averages ORDER BY avg_rating LIMIT 1; SELECT crn FROM Averages WHERE avg_rating = (SELECT Max(avg_rating) FROM Averages ); Assuming no major optimizations: Cost estimate of each?

Aside: Code Style & Indentation Review previous examples.

Where we’re headed So far: Have seen a powerful subset of SQL queries. Raised questions about how example schemas are structured. Next: Switch to focusing on database design. Later: Return to more SQL.