January 17th – Subqueries

Slides:



Advertisements
Similar presentations
SQL Queries Principal form: SELECT desired attributes FROM tuple variables –– range over relations WHERE condition about tuple variables; Running example.
Advertisements

1 Database Systems Relations as Bags Grouping and Aggregation Database Modification.
1 Lecture 12: SQL Friday, October 26, Outline Simple Queries in SQL (5.1) Queries with more than one relation (5.2) Subqueries (5.3) Duplicates.
Subqueries Example Find the name of the producer of ‘Star Wars’.
1 Lecture 03: SQL Friday, January 7, Administrivia Have you logged in IISQLSRV yet ? HAVE YOU CHANGED YOUR PASSWORD ? Homework 1 is now posted.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #3.
Slides adapted from A. Silberschatz et al. Database System Concepts, 5th Ed. SQL - part 2 - Database Management Systems I Alex Coman, Winter 2006.
1 Lecture 3: More SQL Friday, January 9, Agenda Homework #1 on the web site today. Sign up for the mailing list! Next Friday: –In class ‘activity’
Exercises Product ( pname, price, category, maker) Purchase (buyer, seller, store, product) Company (cname, stock price, country) Person( per-name, phone.
SQL (almost end) April 26 th, Agenda HAVING clause Views Modifying views Reusing views.
Using Relational Databases and SQL Department of Computer Science California State University, Los Angeles Lecture 8: Subqueries.
1 SQL cont.. 2 Outline Unions, intersections, differences (6.2.5, 6.4.2) Subqueries (6.3) Aggregations (6.4.3 – 6.4.6) Hint for reading the textbook:
1 Lecture 04: SQL Wednesday, January 11, Outline Two Examples Nulls (6.1.6) Outer joins (6.3.8) Database Modifications (6.5)
1 ICS 184: Introduction to Data Management Lecture Note 10 SQL as a Query Language (Cont.)
1 CSCE Database Systems Anxiao (Andrew) Jiang The Database Language SQL.
1 SQL Additional Notes. 2  1 Group and Aggregation*  2 Execution Order*  3 Join*  4 Find the maximum  5 Line Format SQL Additional Notes *partially.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
1 Introduction to Database Systems CSE 444 Lecture 02: SQL September 28, 2007.
More SQL (and Relational Algebra). More SQL Extended Relational Algebra Outerjoins, Grouping/Aggregation Insert/Delete/Update.
1 Lecture 03: SQL Monday, January 9, Project t/Default.aspxhttp://iisqlsrv.cs.washington.edu/444/Projec.
1 CS122A: Introduction to Data Management Lecture 9 SQL II: Nested Queries, Aggregation, Grouping Instructor: Chen Li.
1 Introduction to Database Systems, CS420 SQL JOIN, Aggregate, Grouping, HAVING and DML Clauses.
Schedule Today: Jan. 28 (Mon) Jan. 30 (Wed) Next Week Assignments !!
Slides are reused by the approval of Jeffrey Ullman’s
CS580 Advanced Database Topics
Lecture 05: SQL Wednesday, January 12, 2005.
Cours 7: Advanced SQL.
Outerjoins, Grouping/Aggregation Insert/Delete/Update
Lecture 04: SQL Monday, January 10, 2005.
References: Text Chapters 8 and 9
CPSC-310 Database Systems
Schedule Today: Next After that Subqueries, Grouping and Aggregation.
Database Systems Subqueries, Aggregation
Server-Side Application and Data Management IT IS 3105 (FALL 2009)
Cse 344 April 4th – Subqueries.
Monday 3/27 and Wednesday 3/29, 2006
Introduction to Database Systems CSE 444 Lecture 04: SQL
Lecture 2 (cont’d) & Lecture 3: Advanced SQL – Part I
CS 405G: Introduction to Database Systems
Cse 344 January 12th –joins.
February 7th – Exam Review
March 30th – intro to joins
Cse 344 January 10th –joins.
Introduction to Database Systems CSE 444 Lecture 03: SQL
January 19th – Subqueries 2 and relational algebra
Lecture 4: Advanced SQL – Part II
Introduction to SQL Wenhao Zhang October 5, 2018.
Introduction to Database Systems CSE 444 Lecture 03: SQL
CSE544 SQL Wednesday, March 31, 2004.
SQL: Structured Query Language
SQL Introduction Standard language for querying and manipulating data
CMPT 354: Database System I
SQL: Structured Query Language
Lecture 12: SQL Friday, October 20, 2000.
Lectures 3: Introduction to SQL Part II
Introduction to Database Systems CSE 444 Lecture 02: SQL
Lecture 4: SQL Thursday, January 11, 2001.
Lectures 6: Introduction to SQL 5
M1G Introduction to Database Development
Lecture 3 Monday, April 8, 2002.
Lecture 4: SQL Wednesday, April 10, 2002.
Lecture 03: SQL Friday, October 3, 2003.
CPSC-608 Database Systems
More SQL Extended Relational Algebra Outerjoins, Grouping/Aggregation
Lecture 04: SQL Monday, October 6, 2003.
Lecture 14: SQL Wednesday, October 31, 2001.
CS 405G: Introduction to Database Systems
Presentation transcript:

January 17th – Subqueries Cse 344 January 17th – Subqueries

Grouping and Aggregation Purchase(product, price, quantity) Find total quantities for all sales over $1, by product. SELECT product, Sum(quantity) AS TotalSales FROM Purchase WHERE price > 1 GROUP BY product How is this query processed?

Grouping and Aggregation Purchase(product, price, quantity) Find total quantities for all sales over $1, by product. SELECT product, Sum(quantity) AS TotalSales FROM Purchase WHERE price > 1 GROUP BY product Do these queries return the same number of rows? Why? SELECT product, Sum(quantity) AS TotalSales FROM Purchase GROUP BY product

Grouping and Aggregation Purchase(product, price, quantity) Find total quantities for all sales over $1, by product. SELECT product, Sum(quantity) AS TotalSales FROM Purchase WHERE price > 1 GROUP BY product Do these queries return the same number of rows? Why? SELECT product, Sum(quantity) AS TotalSales FROM Purchase GROUP BY product Empty groups are removed, hence first query may return fewer groups

Grouping and Aggregation 1. Compute the FROM and WHERE clauses. 2. Group by the attributes in the GROUPBY 3. Compute the SELECT clause: grouped attributes and aggregates. FWGS = From Where Group by Select (I made up that acronym) TM FWGS CSE 344 - 2017au

1,2: From, Where FWGS Product Price Quantity Bagel 3 20 1.50 Banana 0.5 50 2 10 4 WHERE price > 1 SELECT product, Sum(quantity) AS TotalSales FROM Purchase WHERE price > 1 GROUP BY product

3,4. Grouping, Select FWGS Product Price Quantity Bagel 3 20 1.50 Banana 0.5 50 2 10 4 Product TotalSales Bagel 40 Banana 20 SELECT product, Sum(quantity) AS TotalSales FROM Purchase WHERE price > 1 GROUP BY product

Ordering Results FWGOS Purchase(pid, product, price, quantity, month) SELECT product, sum(price*quantity) as rev FROM Purchase GROUP BY product ORDER BY rev desc Reminder students of the syntax of “as” if they have not seen already (just renaming) Why need the alias for rev? FWGOS TM Note: some SQL engines want you to say ORDER BY sum(price*quantity) desc CSE 344 - 2017au

HAVING Clause Purchase(pid, product, price, quantity, month) Same query as before, except that we consider only products that had at least 30 sales. SELECT product, sum(price*quantity) FROM Purchase WHERE price > 1 GROUP BY product HAVING sum(quantity) > 30 HAVING clause contains conditions on aggregates. CSE 344 - 2017au

General form of Grouping and Aggregation SELECT S FROM R1,…,Rn WHERE C1 GROUP BY a1,…,ak HAVING C2 Why ? S = may contain attributes a1,…,ak and/or any aggregates but NO OTHER ATTRIBUTES C1 = is any condition on the attributes in R1,…,Rn C2 = is any condition on aggregate expressions and on attributes a1,…,ak CSE 344 - 2017au

Semantics of SQL With Group-By SELECT S FROM R1,…,Rn WHERE C1 GROUP BY a1,…,ak HAVING C2 FWGHOS Evaluation steps: Evaluate FROM-WHERE using Nested Loop Semantics Group by the attributes a1,…,ak Apply condition C2 to each group (may have aggregates) Compute aggregates in S and return the result From Where Groupby Having Orderby Select CSE 344 - 2017au

Exercise Purchase(pid, product, price, quantity, month) Compute the total income per month Show only months with less than 10 items sold Order by quantity sold and display as “TotalSold” Here is a series of slides that discusses how to construct a query from scratch CSE 344 - 2017au

Exercise Purchase(pid, product, price, quantity, month) Compute the total income per month Show only months with less than 10 items sold Order by quantity sold and display as “TotalSold” FROM Purchase CSE 344 - 2017au

Exercise Purchase(pid, product, price, quantity, month) Compute the total income per month Show only months with less than 10 items sold Order by quantity sold and display as “TotalSold” FROM Purchase GROUP BY month CSE 344 - 2017au

Exercise Purchase(pid, product, price, quantity, month) Compute the total income per month Show only months with less than 10 items sold Order by quantity sold and display as “TotalSold” FROM Purchase GROUP BY month HAVING sum(quantity) < 10 CSE 344 - 2017au

Exercise Purchase(pid, product, price, quantity, month) Compute the total income per month Show only months with less than 10 items sold Order by quantity sold and display as “TotalSold” SELECT month, sum(price*quantity), sum(quantity) as TotalSold FROM Purchase GROUP BY month HAVING sum(quantity) < 10 CSE 344 - 2017au

Exercise Purchase(pid, product, price, quantity, month) Compute the total income per month Show only months with less than 10 items sold Order by quantity sold and display as “TotalSold” SELECT month, sum(price*quantity), sum(quantity) as TotalSold FROM Purchase GROUP BY month HAVING sum(quantity) < 10 ORDER BY sum(quantity) CSE 344 - 2017au

WHERE vs HAVING WHERE condition is applied to individual rows The rows may or may not contribute to the aggregate No aggregates allowed here Occasionally, some groups become empty and are removed HAVING condition is applied to the entire group Entire group is returned, or removed May use aggregate functions on the group CSE 344 - 2017au

Mystery Query Purchase(pid, product, price, quantity, month) What do they compute? SELECT month, sum(quantity), max(price) FROM Purchase GROUP BY month SELECT month, sum(quantity) FROM Purchase GROUP BY month SELECT month FROM Purchase GROUP BY month

Mystery Query Purchase(pid, product, price, quantity, month) What do they compute? SELECT month, sum(quantity), max(price) FROM Purchase GROUP BY month SELECT month, sum(quantity) FROM Purchase GROUP BY month Lesson: DISTINCT is a special case of GROUP BY SELECT month FROM Purchase GROUP BY month

Aggregate + Join Product(pid,pname,manufacturer) Purchase(id,product_id,price,month) Aggregate + Join For each manufacturer, compute how many products with price > $100 they sold Group by x.manufacturer, y.month = create groups of the form (manufacturer, month)

Aggregate + Join Product(pid,pname,manufacturer) Purchase(id,product_id,price,month) Aggregate + Join For each manufacturer, compute how many products with price > $100 they sold Problem: manufacturer is in Purchase, price is in Product... Group by x.manufacturer, y.month = create groups of the form (manufacturer, month)

Aggregate + Join Product(pid,pname,manufacturer) Purchase(id,product_id,price,month) Aggregate + Join For each manufacturer, compute how many products with price > $100 they sold Problem: manufacturer is in Purchase, price is in Product... manu facturer ... price Hitachi 150 Canon 300 180 -- step 1: think about their join SELECT ... FROM Product x, Purchase y WHERE x.pid = y.product_id and y.price > 100 Group by x.manufacturer, y.month = create groups of the form (manufacturer, month)

Aggregate + Join Product(pid,pname,manufacturer) Purchase(id,product_id,price,month) Aggregate + Join For each manufacturer, compute how many products with price > $100 they sold Problem: manufacturer is in Purchase, price is in Product... manu facturer ... price Hitachi 150 Canon 300 180 -- step 1: think about their join SELECT ... FROM Product x, Purchase y WHERE x.pid = y.product_id and y.price > 100 Group by x.manufacturer, y.month = create groups of the form (manufacturer, month) -- step 2: do the group-by on the join SELECT x.manufacturer, count(*) FROM Product x, Purchase y WHERE x.pid = y.product_id and y.price > 100 GROUP BY x.manufacturer manu facturer count(*) Hitachi 2 Canon 1 ...

Aggregate + Join Product(pid,pname,manufacturer) Purchase(id,product_id,price,month) Aggregate + Join Variant: For each manufacturer, compute how many products with price > $100 they sold in each month SELECT x.manufacturer, y.month, count(*) FROM Product x, Purchase y WHERE x.pid = y.product_id and y.price > 100 GROUP BY x.manufacturer, y.month Group by x.manufacturer, y.month = create groups of the form (manufacturer, month) manu facturer month count(*) Hitachi Jan 2 Feb 1 Canon 3 ...

Including Empty Groups FWGHOS In the result of a group by query, there is one row per group in the result Count(*) is never 0 SELECT x.manufacturer, count(*) FROM Product x, Purchase y WHERE x.pname = y.product GROUP BY x.manufacturer Mention that this actually triggered the famous count bug that happened in the 80s that took 5 years to discover and fix. CSE 344 - 2017au

Including Empty Groups SELECT x.manufacturer, count(y.pid) FROM Product x LEFT OUTER JOIN Purchase y ON x.pname = y.product GROUP BY x.manufacturer Count(pid) is 0 when all pid’s in the group are NULL Extra exercises: 1) List all manufacturers with more than 10 items sold. Return the manufacturer name and the number of items sold. select manufacturer, sum(quantity) from Product, Purchase where pname=product group by manufacturer having sum(quantity) > 10; 2) List all manufacturers with more than 1 distinct product sold. Return the name of the manufacturer and the number of distinct products sold. select manufacturer, count( distinct product) from Product, Purchase where pname=product group by manufacturer having count(distinct product) > 1; 3) List all products with more than 2 purchases. Return the name of the product, its product ID, and the max price at which it was sold select R.pname, R.pid, max(R.price) from Product R, Purchase P where R.pname=P.product group by R.pname, R.pid having count(*) > 2; 4) Find manufacturers with at least 5 purchases in the same month. Return the manufacturer, month, and the total quantity of items sold select R.manufacturer, P.month, sum(quantity) from Product R, Purchase P group by P.month, R.manufacturer having count(*) > 2; CSE 344 - 2017au

Subqueries A subquery is a SQL query nested inside a larger query Such inner-outer queries are called nested queries A subquery may occur in: A SELECT clause A FROM clause A WHERE clause Rule of thumb: avoid nested queries when possible But sometimes it’s impossible, as we will see Subqueries can return a single constant and this constant can be compared with another value in a WHERE clause Subqueries can return relations that can be used in various ways in WHERE clauses Subqueries can appear in FROM clauses, followed by a tuple variable that represents the tuples in the result of the subquery You will understand why the last is true in a moment CSE 344 - 2017au

Subqueries… Can return a single value to be included in a SELECT clause Can return a relation to be included in the FROM clause, aliased using a tuple variable Can return a single value to be compared with another value in a WHERE clause Can return a relation to be used in the WHERE or HAVING clause under an existential quantifier Subqueries can return a single constant and this constant can be compared with another value in a WHERE clause Subqueries can return relations that can be used in various ways in WHERE clauses Subqueries can appear in FROM clauses, followed by a tuple variable that represents the tuples in the result of the subquery CSE 344 - 2017au

“correlated subquery” 1. Subqueries in SELECT Product (pname, price, cid) Company (cid, cname, city) For each product return the city where it is manufactured “correlated subquery” SELECT X.pname, (SELECT Y.city FROM Company Y WHERE Y.cid=X.cid) as City FROM Product X Putting query here because it is easier to paste into SQLite: SELECT X.pname, (SELECT Y.city FROM Company Y WHERE Y.cid=X.cid) As City FROM Product X; What happens if the subquery returns more than one city? We get a runtime error (and SQLite simply ignores the extra values…) CSE 344 - 2017au

We have “unnested” the query Product (pname, price, cid) Company (cid, cname, city) 1. Subqueries in SELECT Whenever possible, don’t use a nested queries: SELECT X.pname, (SELECT Y.city FROM Company Y WHERE Y.cid=X.cid) as City FROM Product X = SELECT X.pname, Y.city FROM Product X, Company Y WHERE X.cid=Y.cid We have “unnested” the query SELECT X.pname, Y.city FROM Product X, Company Y WHERE X.cid=Y.cid CSE 344 - 2017au

Product (pname, price, cid) Company (cid, cname, city) 1. Subqueries in SELECT Compute the number of products made by each company SELECT DISTINCT C.cname, (SELECT count(*) FROM Product P WHERE P.cid=C.cid) FROM Company C SELECT DISTINCT C.cname, (SELECT count(*) FROM Product P WHERE P.cid=C.cid) FROM Company C; SELECT C.cname, count(*) FROM Company C, Product P WHERE C.cid=P.cid GROUP BY C.cname; SELECT C.cname, count(pname) FROM Company C left outer join Product P on C.cid=P.cid GROUP BY C.cname; CSE 344 - 2017au

Product (pname, price, cid) Company (cid, cname, city) 1. Subqueries in SELECT Compute the number of products made by each company SELECT DISTINCT C.cname, (SELECT count(*) FROM Product P WHERE P.cid=C.cid) FROM Company C SELECT DISTINCT C.cname, (SELECT count(*) FROM Product P WHERE P.cid=C.cid) FROM Company C; SELECT C.cname, count(*) FROM Company C, Product P WHERE C.cid=P.cid GROUP BY C.cname; SELECT C.cname, count(pname) FROM Company C left outer join Product P on C.cid=P.cid GROUP BY C.cname; SELECT C.cname, count(*) FROM Company C, Product P WHERE C.cid=P.cid GROUP BY C.cname Better: we can unnest using a GROUP BY CSE 344 - 2017au

Product (pname, price, cid) Company (cid, cname, city) 1. Subqueries in SELECT But are these really equivalent? SELECT DISTINCT C.cname, (SELECT count(*) FROM Product P WHERE P.cid=C.cid) FROM Company C SELECT C.cname, count(*) FROM Company C, Product P WHERE C.cid=P.cid GROUP BY C.cname SELECT DISTINCT C.cname, (SELECT count(*) FROM Product P WHERE P.cid=C.cid) FROM Company C SELECT C.cname, count(*) FROM Company C, Product P WHERE C.cid=P.cid GROUP BY C.cname SELECT C.cname, count(pname) FROM Company C left outer join Product P on C.cid=P.cid GROUP BY C.cname CSE 344 - 2017au

Product (pname, price, cid) Company (cid, cname, city) 1. Subqueries in SELECT But are these really equivalent? SELECT DISTINCT C.cname, (SELECT count(*) FROM Product P WHERE P.cid=C.cid) FROM Company C SELECT C.cname, count(*) FROM Company C, Product P WHERE C.cid=P.cid GROUP BY C.cname No! Different results if a company has no products SELECT DISTINCT C.cname, (SELECT count(*) FROM Product P WHERE P.cid=C.cid) FROM Company C SELECT C.cname, count(*) FROM Company C, Product P WHERE C.cid=P.cid GROUP BY C.cname SELECT C.cname, count(pname) FROM Company C left outer join Product P on C.cid=P.cid GROUP BY C.cname SELECT C.cname, count(pname) FROM Company C LEFT OUTER JOIN Product P ON C.cid=P.cid GROUP BY C.cname CSE 344 - 2017au

Product (pname, price, cid) Company (cid, cname, city) 2. Subqueries in FROM Find all products whose prices is > 20 and < 500 SELECT X.pname FROM (SELECT * FROM Product AS Y WHERE price > 20) as X WHERE X.price < 500 CSE 344 - 2017au

Product (pname, price, cid) Company (cid, cname, city) 2. Subqueries in FROM Find all products whose prices is > 20 and < 500 SELECT X.pname FROM (SELECT * FROM Product AS Y WHERE price > 20) as X WHERE X.price < 500 Try unnest this query ! CSE 344 - 2017au

Product (pname, price, cid) Company (cid, cname, city) 2. Subqueries in FROM Find all products whose prices is > 20 and < 500 SELECT X.pname FROM (SELECT * FROM Product AS Y WHERE price > 20) as X WHERE X.price < 500 Side note: This is not a correlated subquery. (why?) Try unnest this query ! CSE 344 - 2017au

2. Subqueries in FROM Sometimes we need to compute an intermediate table only to use it later in a SELECT-FROM-WHERE Option 1: use a subquery in the FROM clause Option 2: use the WITH clause CSE 344 - 2017au

Product (pname, price, cid) Company (cid, cname, city) 2. Subqueries in FROM SELECT X.pname FROM (SELECT * FROM Product AS Y WHERE price > 20) as X WHERE X.price < 500 = A subquery whose result we called myTable WITH myTable AS (SELECT * FROM Product AS Y WHERE price > 20) SELECT X.pname FROM myTable as X WHERE X.price < 500 CSE 344 - 2017au

3. Subqueries in WHERE Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies that make some products with price < 200 Conditions involving relations. Remember to say: EXISTS R is true if and only if R is not empty SELECT DISTINCT C.cname FROM Company C WHERE EXISTS (SELECT * FROM Product P WHERE C.cid = P.cid and P.price < 200) CSE 344 - 2017au

Existential quantifiers Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies that make some products with price < 200 Existential quantifiers Conditions involving relations. Remember to say: EXISTS R is true if and only if R is not empty SELECT DISTINCT C.cname FROM Company C WHERE EXISTS (SELECT * FROM Product P WHERE C.cid = P.cid and P.price < 200) CSE 344 - 2017au

Existential quantifiers Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies that make some products with price < 200 Existential quantifiers Using EXISTS: Conditions involving relations. Remember to say: EXISTS R is true if and only if R is not empty SELECT DISTINCT C.cname FROM Company C WHERE EXISTS (SELECT * FROM Product P WHERE C.cid = P.cid and P.price < 200) SELECT DISTINCT C.cname FROM Company C WHERE EXISTS (SELECT * FROM Product P WHERE C.cid = P.cid and P.price < 200) CSE 344 - 2017au

Existential quantifiers Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies that make some products with price < 200 Existential quantifiers Using IN SELECT DISTINCT C.cname FROM Company C WHERE C.cid IN (SELECT P.cid FROM Product P WHERE P.price < 200) SELECT DISTINCT C.cname FROM Company C WHERE C.cid IN (SELECT P.cid FROM Product P WHERE P.price < 200) CSE 344 - 2017au

Existential quantifiers Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies that make some products with price < 200 Existential quantifiers Using ANY: SQLITE DOES NOT SUPPORT ANY AND ALL! S > ANY R is true iff s is greater than at least one value in R SELECT DISTINCT C.cname FROM Company C WHERE 100 > ANY (SELECT price FROM Product P WHERE P.cid = C.cid) SELECT DISTINCT C.cname FROM Company C WHERE 200 > ANY (SELECT price FROM Product P WHERE P.cid = C.cid) CSE 344 - 2017au

3. Subqueries in WHERE SELECT DISTINCT C.cname FROM Company C Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies that make some products with price < 200 Existential quantifiers Using ANY: SQLITE DOES NOT SUPPORT ANY AND ALL! S > ANY R is true iff s is greater than at least one value in R SELECT DISTINCT C.cname FROM Company C WHERE 100 > ANY (SELECT price FROM Product P WHERE P.cid = C.cid) SELECT DISTINCT C.cname FROM Company C WHERE 200 > ANY (SELECT price FROM Product P WHERE P.cid = C.cid) Not supported in sqlite CSE 344 - 2017au

Existential quantifiers Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies that make some products with price < 200 Existential quantifiers Now let’s unnest it: SELECT DISTINCT C.cname FROM Company C, Product P WHERE C.cid = P.cid and P.price < 200 CSE 344 - 2017au

Existential quantifiers Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies that make some products with price < 200 Existential quantifiers Now let’s unnest it: You can easily unnest existential quantifiers SELECT DISTINCT C.cname FROM Company C, Product P WHERE C.cid = P.cid and P.price < 200 Existential quantifiers are easy!  CSE 344 - 2017au

3. Subqueries in WHERE Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies s.t. all their products have price < 200 same as: Find all companies that make only products with price < 200 CSE 344 - 2017au

Universal quantifiers Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies s.t. all their products have price < 200 same as: Find all companies that make only products with price < 200 Universal quantifiers CSE 344 - 2017au

Universal quantifiers Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies s.t. all their products have price < 200 same as: Find all companies that make only products with price < 200 Universal quantifiers Universal quantifiers are hard!  CSE 344 - 2017au

3. Subqueries in WHERE SELECT DISTINCT C.cname FROM Company C Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies s.t. all their products have price < 200 1. Find the other companies that make some product ≥ 200 SELECT DISTINCT C.cname FROM Company C WHERE C.cid IN (SELECT P.cid FROM Product P WHERE P.price >= 200) SELECT DISTINCT C.cname FROM Company C WHERE C.cid IN (SELECT P.cid FROM Product P WHERE P.price >= 200); SELECT DISTINCT C.cname FROM Company C WHERE C.cid NOT IN (SELECT P.cid FROM Product P WHERE P.price >= 200); CSE 344 - 2017au

3. Subqueries in WHERE SELECT DISTINCT C.cname FROM Company C Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies s.t. all their products have price < 200 1. Find the other companies that make some product ≥ 200 SELECT DISTINCT C.cname FROM Company C WHERE C.cid IN (SELECT P.cid FROM Product P WHERE P.price >= 200) 2. Find all companies s.t. all their products have price < 200 SELECT DISTINCT C.cname FROM Company C WHERE C.cid IN (SELECT P.cid FROM Product P WHERE P.price >= 200); SELECT DISTINCT C.cname FROM Company C WHERE C.cid NOT IN (SELECT P.cid FROM Product P WHERE P.price >= 200); SELECT DISTINCT C.cname FROM Company C WHERE C.cid NOT IN (SELECT P.cid FROM Product P WHERE P.price >= 200) CSE 344 - 2017au

Universal quantifiers Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies s.t. all their products have price < 200 Universal quantifiers Using EXISTS: SELECT DISTINCT C.cname FROM Company C WHERE NOT EXISTS (SELECT * FROM Product P WHERE P.cid = C.cid and P.price >= 200); SELECT DISTINCT C.cname FROM Company C WHERE NOT EXISTS (SELECT * FROM Product P WHERE P.cid = C.cid and P.price >= 200) CSE 344 - 2017au

Universal quantifiers Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies s.t. all their products have price < 200 Universal quantifiers Using ALL: Does not work in SQLite: SELECT DISTINCT C.cname FROM Company C WHERE 200 > ALL (SELECT price FROM Product P WHERE P.cid = C.cid); SELECT DISTINCT C.cname FROM Company C WHERE 200 >= ALL (SELECT price FROM Product P WHERE P.cid = C.cid) CSE 344 - 2017au

3. Subqueries in WHERE SELECT DISTINCT C.cname FROM Company C Product (pname, price, cid) Company (cid, cname, city) 3. Subqueries in WHERE Find all companies s.t. all their products have price < 200 Universal quantifiers Using ALL: Does not work in SQLite: SELECT DISTINCT C.cname FROM Company C WHERE 200 > ALL (SELECT price FROM Product P WHERE P.cid = C.cid); SELECT DISTINCT C.cname FROM Company C WHERE 200 >= ALL (SELECT price FROM Product P WHERE P.cid = C.cid) Not supported in sqlite CSE 344 - 2017au

Question for Database Theory Fans and their Friends Can we unnest the universal quantifier query? We need to first discuss the concept of monotonicity CSE 344 - 2017au

Monotone Queries Definition A query Q is monotone if: Product (pname, price, cid) Company (cid, cname, city) Monotone Queries Definition A query Q is monotone if: Whenever we add tuples to one or more input tables, the answer to the query will not lose any of the tuples CSE 344 - 2017au

Monotone Queries Definition A query Q is monotone if: Product (pname, price, cid) Company (cid, cname, city) Monotone Queries Definition A query Q is monotone if: Whenever we add tuples to one or more input tables, the answer to the query will not lose any of the tuples Product Company pname price cid Gizmo 19.99 c001 Gadget 999.99 c004 Camera 149.99 c003 cid cname city c002 Sunworks Bonn c001 DB Inc. Lyon c003 Builder Lodtz pname city Gizmo Lyon Camera Lodtz Q CSE 344 - 2017au

Monotone Queries Definition A query Q is monotone if: Product (pname, price, cid) Company (cid, cname, city) Monotone Queries Definition A query Q is monotone if: Whenever we add tuples to one or more input tables, the answer to the query will not lose any of the tuples Product Company pname price cid Gizmo 19.99 c001 Gadget 999.99 c004 Camera 149.99 c003 cid cname city c002 Sunworks Bonn c001 DB Inc. Lyon c003 Builder Lodtz pname city Gizmo Lyon Camera Lodtz Q Product Company pname city Gizmo Lyon Camera Lodtz iPad pname price cid Gizmo 19.99 c001 Gadget 999.99 c004 Camera 149.99 c003 iPad 499.99 cid cname city c002 Sunworks Bonn c001 DB Inc. Lyon c003 Builder Lodtz Q So far it looks monotone...

Monotone Queries Definition A query Q is monotone if: Product (pname, price, cid) Company (cid, cname, city) Monotone Queries Definition A query Q is monotone if: Whenever we add tuples to one or more input tables, the answer to the query will not lose any of the tuples Product Company pname price cid Gizmo 19.99 c001 Gadget 999.99 c004 Camera 149.99 c003 cid cname city c002 Sunworks Bonn c001 DB Inc. Lyon c003 Builder Lodtz pname city Gizmo Lyon Camera Lodtz Q Q is not monotone! Product Company pname city Gizmo Lodtz Camera iPad Lyon pname price cid Gizmo 19.99 c001 Gadget 999.99 c004 Camera 149.99 c003 iPad 499.99 cid cname city c002 Sunworks Bonn c001 DB Inc. Lyon c003 Builder Lodtz c004 Crafter Q

Monotone Queries Theorem: If Q is a SELECT-FROM-WHERE query that does not have subqueries, and no aggregates, then it is monotone. CSE 344 - 2017au

Monotone Queries Theorem: If Q is a SELECT-FROM-WHERE query that does not have subqueries, and no aggregates, then it is monotone. Proof. We use the nested loop semantics: if we insert a tuple in a relation Ri, this will not remove any tuples from the answer for x1 in R1 do for x2 in R2 do … for xn in Rn do if Conditions output (a1,…,ak) SELECT a1, a2, …, ak FROM R1 AS x1, R2 AS x2, …, Rn AS xn WHERE Conditions CSE 344 - 2017au

Monotone Queries The query: is not monotone Product (pname, price, cid) Company (cid, cname, city) Monotone Queries The query: is not monotone Find all companies s.t. all their products have price < 200

Monotone Queries The query: is not monotone Product (pname, price, cid) Company (cid, cname, city) Monotone Queries The query: is not monotone Find all companies s.t. all their products have price < 200 pname price cid Gizmo 19.99 c001 cid cname city c001 Sunworks Bonn cname Sunworks

Product (pname, price, cid) Company (cid, cname, city) Monotone Queries The query: is not monotone Consequence: If a query is not monotonic, then we cannot write it as a SELECT-FROM-WHERE query without nested subqueries Find all companies s.t. all their products have price < 200 pname price cid Gizmo 19.99 c001 cid cname city c001 Sunworks Bonn cname Sunworks pname price cid Gizmo 19.99 c001 Gadget 999.99 cid cname city c001 Sunworks Bonn cname

Queries that must be nested Queries with universal quantifiers or with negation CSE 344 - 2017au

Queries that must be nested Queries with universal quantifiers or with negation Queries that use aggregates in certain ways sum(..) and count(*) are NOT monotone, because they do not satisfy set containment select count(*) from R is not monotone! CSE 344 - 2017au

Introduction to Data Management CSE 344 Lecture 7-8: SQL Wrap-up Relational Algebra CSE 344 - 2017au

Announcements You received invitation email to @cs You will be prompted to choose passwd Problems with existing account? In the worst case we will ask you to create a new @outlook account just for this class If OK, create the database server Choose cheapest pricing tier! Remember: WQ2 due on Friday

GROUP BY v.s. Nested Queries Purchase(pid, product, quantity, price) GROUP BY v.s. Nested Queries SELECT product, Sum(quantity) AS TotalSales FROM Purchase WHERE price > 1 GROUP BY product SELECT DISTINCT x.product, (SELECT Sum(y.quantity) FROM Purchase y WHERE x.product = y.product AND y.price > 1) AS TotalSales FROM Purchase x WHERE x.price > 1 At the beginning of lecture, we saw that we can express group by queries with a subquery in the select clause. Let’s take a look at some more complicated examples along those lines. First… what if we also want to apply a predicate in the where clause. We have to be careful, we need to use it twice. Why twice ? CSE 344 - 2017au

More Unnesting Author(login,name) Wrote(login,url) Find authors who wrote ≥ 10 documents: CSE 344 - 2017au

More Unnesting Author(login,name) Wrote(login,url) Find authors who wrote ≥ 10 documents: This is SQL by a novice Attempt 1: with nested queries SELECT DISTINCT Author.name FROM Author WHERE (SELECT count(Wrote.url) FROM Wrote WHERE Author.login=Wrote.login) >= 10 CSE 344 - 2017au

More Unnesting Author(login,name) Wrote(login,url) Find authors who wrote ≥ 10 documents: Attempt 1: with nested queries Attempt 2: using GROUP BY and HAVING SELECT Author.name FROM Author, Wrote WHERE Author.login=Wrote.login GROUP BY Author.name HAVING count(wrote.url) >= 10 This is SQL by an expert CSE 344 - 2017au

Finding Witnesses Product (pname, price, cid) Company (cid, cname, city) Finding Witnesses For each city, find the most expensive product made in that city CSE 344 - 2017au

Finding Witnesses Product (pname, price, cid) Company (cid, cname, city) Finding Witnesses For each city, find the most expensive product made in that city Finding the maximum price is easy… SELECT x.city, max(y.price) FROM Company x, Product y WHERE x.cid = y.cid GROUP BY x.city; But we need the witnesses, i.e., the products with max price CSE 344 - 2017au

Finding Witnesses Product (pname, price, cid) Company (cid, cname, city) Finding Witnesses To find the witnesses, compute the maximum price in a subquery (in FROM or in WITH) WITH CityMax AS (SELECT x.city, max(y.price) as maxprice FROM Company x, Product y WHERE x.cid = y.cid GROUP BY x.city) SELECT DISTINCT u.city, v.pname, v.price FROM Company u, Product v, CityMax w WHERE u.cid = v.cid and u.city = w.city and v.price = w.maxprice; CSE 344 - 2017au

Finding Witnesses Product (pname, price, cid) Company (cid, cname, city) Finding Witnesses To find the witnesses, compute the maximum price in a subquery (in FROM or in WITH) SELECT DISTINCT u.city, v.pname, v.price FROM Company u, Product v, (SELECT x.city, max(y.price) as maxprice FROM Company x, Product y WHERE x.cid = y.cid GROUP BY x.city) w WHERE u.cid = v.cid and u.city = w.city and v.price = w.maxprice; CSE 344 - 2017au

Finding Witnesses Product (pname, price, cid) Company (cid, cname, city) Finding Witnesses Or we can use a subquery in where clause SELECT u.city, v.pname, v.price FROM Company u, Product v WHERE u.cid = v.cid and v.price >= ALL (SELECT y.price FROM Company x, Product y WHERE u.city=x.city and x.cid=y.cid); CSE 344 - 2017au

Finding Witnesses Product (pname, price, cid) Company (cid, cname, city) Finding Witnesses There is a more concise solution here: SELECT u.city, v.pname, v.price FROM Company u, Product v, Company x, Product y WHERE u.cid = v.cid and u.city = x.city and x.cid = y.cid GROUP BY u.city, v.pname, v.price HAVING v.price = max(y.price) CSE 344 - 2017au