Database Systems
Subqueries, Aggregation
Gergely Lukács
Pázmány Péter Catholic University, Faculty of Information Technology
Budapest, Hungary
lukacs@itk.ppke.hu
Overview
Subqueries: WHERE clause, FROM clause
Views
Aggregation: GROUP BY, HAVING
Grouping and aggregating + joining: chasm trap, fan trap
SQL for analysis, window functions
Subqueries WHERE CLAUSE
Subqueries Returning Relations
Company(name, city)
Product(pname, maker)
Purchase(id, product, buyer)
Return cities where one can find companies that manufacture products bought by Joe Blow:
    SELECT Company.city
    FROM Company
    WHERE Company.name IN
          (SELECT Product.maker
           FROM Purchase INNER JOIN Product ON Product.pname = Purchase.product
           WHERE Purchase.buyer = 'Joe Blow');
Subqueries Returning Relations
Is it equivalent to this?
    SELECT company.city
    FROM company
         INNER JOIN product ON company.name = product.maker
         INNER JOIN purchase ON product.pname = purchase.product
                             AND purchase.buyer = 'Joe Blow';
Beware of duplicates! The join returns a city once for every matching purchase.
Removing Duplicates
    SELECT Company.city
    FROM Company
    WHERE Company.name IN
          (SELECT Product.maker
           FROM Purchase, Product
           WHERE Product.pname = Purchase.product
             AND Purchase.buyer = 'Joe Blow');
    SELECT DISTINCT company.city
    FROM company
         INNER JOIN product ON company.name = product.maker
         INNER JOIN purchase ON product.pname = purchase.product
                             AND purchase.buyer = 'Joe Blow';
Now they are equivalent (provided no two matching companies are located in the same city; otherwise the first query needs DISTINCT as well).
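The duplicate issue can be reproduced with a small sketch. It runs the slide queries in an in-memory SQLite database through Python's sqlite3 module; the choice of engine and all sample rows (companies, cities, products) are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (name TEXT, city TEXT);
    CREATE TABLE Product (pname TEXT, maker TEXT);
    CREATE TABLE Purchase (id INTEGER, product TEXT, buyer TEXT);
    -- Hypothetical sample data: Joe Blow buys two products made by Acme.
    INSERT INTO Company VALUES ('Acme', 'Budapest'), ('Bolt', 'Szeged');
    INSERT INTO Product VALUES ('gizmo', 'Acme'), ('widget', 'Acme'), ('nut', 'Bolt');
    INSERT INTO Purchase VALUES (1, 'gizmo', 'Joe Blow'),
                                (2, 'widget', 'Joe Blow'),
                                (3, 'nut', 'Someone Else');
""")

in_query = """
    SELECT Company.city FROM Company
    WHERE Company.name IN
          (SELECT Product.maker
           FROM Purchase INNER JOIN Product ON Product.pname = Purchase.product
           WHERE Purchase.buyer = 'Joe Blow')
"""
join_query = """
    SELECT company.city
    FROM company
         INNER JOIN product ON company.name = product.maker
         INNER JOIN purchase ON product.pname = purchase.product
                             AND purchase.buyer = 'Joe Blow'
"""
# Same join, with DISTINCT added to the outer SELECT.
distinct_query = join_query.replace("SELECT", "SELECT DISTINCT", 1)

in_rows = conn.execute(in_query).fetchall()
join_rows = conn.execute(join_query).fetchall()
distinct_rows = conn.execute(distinct_query).fetchall()

print(in_rows)        # one row per matching company
print(join_rows)      # one row per matching purchase
print(distinct_rows)  # duplicates removed
```

With this data the IN query yields Budapest once, the plain join yields it twice (once per purchase), and DISTINCT brings the join back to a single row.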
Subqueries Returning Relations
You can also use:
    s > ALL R
    s > ANY R
    EXISTS R
Product(pname, price, category, maker)
Find products that are more expensive than all those produced by "Gizmo-Works":
    SELECT pname
    FROM Product
    WHERE price > ALL (SELECT price
                       FROM Product
                       WHERE maker = 'Gizmo-Works')
Correlated Queries
Movie(title, year, director, length)
Find movies whose title appears more than once (that is, in more than one year).
    SELECT DISTINCT title
    FROM Movie AS x
    WHERE year <> ANY (SELECT year
                       FROM Movie
                       WHERE title = x.title);
Notes:
(1) scope of variables: x.title inside the subquery refers to the row of the outer query
(2) this can still be expressed as a single SELECT-FROM-WHERE (SFW) query
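A runnable sketch of the "more expensive than all Gizmo-Works products" query follows. It uses Python's sqlite3 module as an assumed test engine; SQLite does not implement the ALL quantifier, so the condition is rewritten with NOT EXISTS, which gives the same answer as long as no price is NULL. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (pname TEXT, price REAL, category TEXT, maker TEXT);
    -- Hypothetical sample data.
    INSERT INTO Product VALUES
        ('gizmo',      20, 'gadget', 'Gizmo-Works'),
        ('powergizmo', 30, 'gadget', 'Gizmo-Works'),
        ('camera',     50, 'photo',  'Other Maker'),
        ('cable',       5, 'misc',   'Other Maker');
""")

# "price > ALL (prices of Gizmo-Works)" rewritten as:
# there is no Gizmo-Works product whose price is >= this product's price.
expensive = conn.execute("""
    SELECT p.pname
    FROM Product AS p
    WHERE NOT EXISTS (SELECT 1
                      FROM Product AS g
                      WHERE g.maker = 'Gizmo-Works'
                        AND g.price >= p.price)
    ORDER BY p.pname
""").fetchall()

print(expensive)
```

Only the camera qualifies here: the Gizmo-Works products themselves are excluded because each is not strictly more expensive than itself.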
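Note (2) can be illustrated with a sketch that compares a correlated subquery with a single SELECT-FROM-WHERE self-join. Python's sqlite3 module is an assumed test engine; since SQLite has no ANY quantifier, the correlated form is written with EXISTS. The movie rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (title TEXT, year INTEGER, director TEXT, length INTEGER);
    -- Hypothetical sample data: 'Title A' exists in two different years.
    INSERT INTO Movie VALUES
        ('Title A', 1990, 'Director 1', 100),
        ('Title A', 2005, 'Director 2', 120),
        ('Title B', 1979, 'Director 3', 117);
""")

# Correlated subquery: the inner query refers to the outer row x.
correlated = conn.execute("""
    SELECT DISTINCT x.title
    FROM Movie AS x
    WHERE EXISTS (SELECT 1
                  FROM Movie AS y
                  WHERE y.title = x.title AND y.year <> x.year)
""").fetchall()

# The same question as a single SELECT-FROM-WHERE self-join.
self_join = conn.execute("""
    SELECT DISTINCT x.title
    FROM Movie AS x, Movie AS y
    WHERE x.title = y.title AND x.year <> y.year
""").fetchall()

print(correlated, self_join)
```

Both forms return only the title that occurs with two different years.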
IN NOT IN EXISTS NOT EXISTS ALL
Subqueries FROM CLAUSE
Subqueries can appear in each part of a query:
    SQL> SELECT … FROM … WHERE …
Subquery in the FROM clause
    SELECT DISTINCT company.city
    FROM company, product,
         (SELECT * FROM purchase WHERE purchase.buyer = 'Joe Blow') Purchase_filtered
    WHERE company.name = product.maker
      AND product.pname = Purchase_filtered.product
Very useful in more complex queries; see Aggregation later.
Also called an "inline view".
view
Views
In some cases, we want to have the results of queries available as tables without having to think about the query again.
In some cases, it is not desirable for all users to see the entire logical model (that is, all the actual relations stored in the database).
Consider a person who needs to know an instructor's name and department, but not the salary. This person should see a relation described, in SQL, by
    select ID, name, dept_name from instructor
A view provides a mechanism for these issues.
View: "virtual relation", defined by a query.
Example View
A view of instructors without their salary:
    CREATE VIEW faculty AS
    SELECT ID, name, dept_name
    FROM instructor
Find all instructors in the Biology department:
    SELECT name
    FROM faculty
    WHERE dept_name = 'Biology'
View Definition
A view is defined using the create view statement, which has the form
    create view v as <query expression>
where <query expression> is any legal SQL query expression. The view name is represented by v.
Once a view is defined, the view name can be used to refer to the virtual relation that the view generates.
A view definition is not the same as creating a new relation by evaluating the query expression. Rather, a view definition causes the saving of an expression; the expression is substituted into queries using the view.
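That a view stores an expression rather than a copy of the data can be checked with a short sketch. It assumes SQLite through Python's sqlite3 module as the engine; the instructor rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (ID INTEGER, name TEXT, dept_name TEXT, salary REAL);
    -- Hypothetical sample data.
    INSERT INTO instructor VALUES (1, 'Ann', 'Biology', 72000),
                                  (2, 'Bob', 'Physics', 95000);
    CREATE VIEW faculty AS
        SELECT ID, name, dept_name FROM instructor;
""")

biology = "SELECT name FROM faculty WHERE dept_name = 'Biology' ORDER BY name"
before = conn.execute(biology).fetchall()

# Insert into the base table only; the view is evaluated again on each use,
# so the new row is visible through it.
conn.execute("INSERT INTO instructor VALUES (3, 'Cid', 'Biology', 80000)")
after = conn.execute(biology).fetchall()

# Column names exposed by the view: the salary is hidden.
columns = [d[0] for d in conn.execute("SELECT * FROM faculty").description]

print(before, after, columns)
```

The second result contains the newly inserted instructor even though the view itself was never touched, and the salary column is not reachable through the view.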
aggregation
Aggregation
    SELECT avg(price) FROM Product WHERE maker = 'Toyota'
    SELECT count(*) FROM Product WHERE year > 1995
SQL supports several aggregation operations: sum, count, min, max, avg.
Except for count, all aggregations apply to a single attribute.
Aggregation: Count
COUNT counts duplicates as well, unless DISTINCT is specified. Count(category) is the same as Count(*), except that rows in which category is NULL are not counted.
    SELECT Count(category) FROM Product WHERE year > 1995
We probably want:
    SELECT Count(DISTINCT category) FROM Product WHERE year > 1995
More Examples
Purchase(product, date, price, quantity)
What do they mean?
    SELECT Sum(price * quantity) FROM Purchase
    SELECT Sum(price * quantity) FROM Purchase WHERE product = 'bagel'
Simple Aggregations
Purchase
Product  Date   Price  Quantity
Bagel    10/21  1      20
Banana   10/3   0.5    10
Banana   10/10
Bagel    10/25  1.50   20
    SELECT Sum(price * quantity) FROM Purchase WHERE product = 'bagel'
Result: 50 (= 20 + 30)
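The three counting variants can be compared side by side in a minimal sketch. It assumes SQLite via Python's sqlite3 module; the Product rows, including one with a NULL category, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (pname TEXT, category TEXT, year INTEGER);
    -- Hypothetical sample data; product 'd' has no category.
    INSERT INTO Product VALUES ('a', 'toy',  1996),
                               ('b', 'toy',  1997),
                               ('c', 'food', 1998),
                               ('d', NULL,   1999),
                               ('e', 'food', 1990);
""")

count_all, count_category, count_distinct = conn.execute("""
    SELECT Count(*),                 -- every row that passes the WHERE clause
           Count(category),          -- rows with a non-NULL category
           Count(DISTINCT category)  -- different non-NULL categories
    FROM Product
    WHERE year > 1995
""").fetchone()

print(count_all, count_category, count_distinct)
```

Four rows pass the filter, three of them have a category, and only two different categories occur.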
Grouping and Aggregation
Purchase(product, date, price, quantity)
Find total sales per product.
    SELECT product, Sum(price * quantity) AS TotalSales
    FROM Purchase
    GROUP BY product
Let's see what this means…
SELECT – FROM – GROUP BY
Purchase, with the rows of each group listed together:
Product  Date   Price  Quantity
Bagel    10/21  1      20
Bagel    10/25  1.50   20
Banana   10/3   0.5    10
Banana   10/10
    SELECT product, Sum(price * quantity) AS TotalSales
    FROM Purchase
    GROUP BY product
Result:
Product  TotalSales
Bagel    50
Banana   15
WHERE
Find total sales per product. Consider only sales after 10/1/2005.
    SELECT product, Sum(price * quantity) AS TotalSales
    FROM Purchase
    WHERE date > '10/1/2005'
    GROUP BY product
HAVING Clause
Same query, except that we consider only products whose total quantity is more than 30.
    SELECT product, Sum(price * quantity)
    FROM Purchase
    GROUP BY product
    HAVING Sum(quantity) > 30
The HAVING clause contains conditions on aggregates. It filters groups, whereas WHERE filters individual rows.
General form of Grouping and Aggregation
    SELECT S
    FROM R1,…,Rn
    WHERE C1
    GROUP BY a1,…,ak
    HAVING C2
    ORDER BY O
Evaluation steps:
1. Evaluate FROM-WHERE, apply condition C1
2. Group by the attributes a1,…,ak
3. Apply condition C2 to each group (may have aggregates)
4. Compute aggregates in S
5. Sort the result according to O
6. Return the result
General form of Grouping and Aggregation
    SELECT S
    FROM R1,…,Rn
    WHERE C1
    GROUP BY a1,…,ak
    HAVING C2
    ORDER BY O
S = may contain attributes a1,…,ak and/or any aggregates but NO OTHER ATTRIBUTES
C1 = is any condition on the attributes in R1,…,Rn
C2 = is any condition on aggregate expressions
O = may contain attributes a1,…,ak and/or any aggregates but NO OTHER ATTRIBUTES
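The difference between filtering rows with WHERE and filtering groups with HAVING can be tried out in a short sketch. It assumes SQLite through Python's sqlite3 module; the Purchase rows are hypothetical, and dates are stored as ISO strings (YYYY-MM-DD) so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Purchase (product TEXT, date TEXT, price REAL, quantity INTEGER);
    -- Hypothetical sample data.
    INSERT INTO Purchase VALUES ('Bagel',  '2005-09-15', 1.0,  5),
                                ('Bagel',  '2005-10-21', 1.0, 20),
                                ('Bagel',  '2005-10-25', 1.5, 20),
                                ('Banana', '2005-10-03', 0.5, 10),
                                ('Banana', '2005-10-10', 1.0, 10);
""")

# WHERE removes individual rows before the groups are formed.
sales_after_oct1 = conn.execute("""
    SELECT product, Sum(price * quantity) AS TotalSales
    FROM Purchase
    WHERE date > '2005-10-01'
    GROUP BY product
    ORDER BY product
""").fetchall()

# HAVING removes whole groups after they have been aggregated.
big_sellers = conn.execute("""
    SELECT product, Sum(price * quantity) AS TotalSales
    FROM Purchase
    GROUP BY product
    HAVING Sum(quantity) > 30
    ORDER BY product
""").fetchall()

print(sales_after_oct1)
print(big_sellers)
```

In the first query the September bagel purchase is dropped before grouping; in the second, the whole Banana group is dropped because its total quantity is only 20.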
Aggregating with Join
Person
Id  Name
1   Peter
2   Anna
Cat (User_id values are missing in the source)
Id  User_id  Name
1            Bella
2            Tiger
3            Molly
Dog (User_id values are missing in the source)
Id  User_id  Name
1            Max
2            Jack
3            Duke
Aggregating with OUTER JOIN
Chasm trap: with an INNER JOIN, persons without any cat drop out of the result; the LEFT OUTER JOIN keeps them with cat_count = 0.
With LEFT OUTER JOIN:
    SELECT p.id, p.name, Count(c.id) AS cat_count
    FROM jn_person p
         LEFT OUTER JOIN jn_cat c ON p.id = c.person_id
    GROUP BY p.id, p.name;
With INNER JOIN:
    SELECT p.id, p.name, Count(c.id) AS cat_count
    FROM jn_person p
         INNER JOIN jn_cat c ON p.id = c.person_id
    GROUP BY p.id, p.name;
Aggregating over three tables
Fan trap: joining both the cats and the dogs to a person pairs every cat with every dog of that person, so both counts are inflated.
    SELECT p.id, p.name,
           COUNT(c.id) AS cat_count,
           COUNT(d.id) AS dog_count
    FROM jn_person p
         LEFT OUTER JOIN jn_cat c ON p.id = c.person_id
         LEFT OUTER JOIN jn_dog d ON p.id = d.person_id
    GROUP BY p.id, p.name;
Aggregating over three tables
Count the cats in a subquery first, then join the dogs to its result:
    SELECT pc.id, pc.name, pc.cat_count, Count(d.id) AS dog_count
    FROM (SELECT p.id, p.name, Count(c.id) AS cat_count
          FROM jn_person p
               LEFT OUTER JOIN jn_cat c ON p.id = c.person_id
          GROUP BY p.id, p.name) pc
         LEFT OUTER JOIN jn_dog d ON pc.id = d.person_id
    GROUP BY pc.id, pc.name, pc.cat_count;
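The chasm trap can be observed with a minimal sketch, assuming SQLite via Python's sqlite3 module. The assignment of cats to persons is hypothetical: all three cats belong to Peter, so Anna has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jn_person (id INTEGER, name TEXT);
    CREATE TABLE jn_cat (id INTEGER, person_id INTEGER, name TEXT);
    INSERT INTO jn_person VALUES (1, 'Peter'), (2, 'Anna');
    -- Hypothetical ownership: every cat belongs to Peter.
    INSERT INTO jn_cat VALUES (1, 1, 'Bella'), (2, 1, 'Tiger'), (3, 1, 'Molly');
""")

template = """
    SELECT p.id, p.name, Count(c.id) AS cat_count
    FROM jn_person p
         {join} jn_cat c ON p.id = c.person_id
    GROUP BY p.id, p.name
    ORDER BY p.id
"""
inner_rows = conn.execute(template.format(join="INNER JOIN")).fetchall()
outer_rows = conn.execute(template.format(join="LEFT OUTER JOIN")).fetchall()

print(inner_rows)  # Anna is missing
print(outer_rows)  # Anna appears with a count of 0
```

Count(c.id) rather than Count(*) matters in the outer join: the padded row for Anna has a NULL c.id, which is not counted.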
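The fan trap and its repair can be compared in a short sketch, assuming SQLite via Python's sqlite3 module. The ownership of the animals is hypothetical: Peter has two cats and three dogs, Anna has one cat and no dog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jn_person (id INTEGER, name TEXT);
    CREATE TABLE jn_cat (id INTEGER, person_id INTEGER, name TEXT);
    CREATE TABLE jn_dog (id INTEGER, person_id INTEGER, name TEXT);
    INSERT INTO jn_person VALUES (1, 'Peter'), (2, 'Anna');
    -- Hypothetical ownership.
    INSERT INTO jn_cat VALUES (1, 1, 'Bella'), (2, 1, 'Tiger'), (3, 2, 'Molly');
    INSERT INTO jn_dog VALUES (1, 1, 'Max'), (2, 1, 'Jack'), (3, 1, 'Duke');
""")

# Naive version: each of Peter's cats is paired with each of his dogs.
naive = conn.execute("""
    SELECT p.id, p.name, COUNT(c.id) AS cat_count, COUNT(d.id) AS dog_count
    FROM jn_person p
         LEFT OUTER JOIN jn_cat c ON p.id = c.person_id
         LEFT OUTER JOIN jn_dog d ON p.id = d.person_id
    GROUP BY p.id, p.name
    ORDER BY p.id
""").fetchall()

# Repaired version: aggregate the cats first, then join and count the dogs.
fixed = conn.execute("""
    SELECT pc.id, pc.name, pc.cat_count, COUNT(d.id) AS dog_count
    FROM (SELECT p.id, p.name, COUNT(c.id) AS cat_count
          FROM jn_person p
               LEFT OUTER JOIN jn_cat c ON p.id = c.person_id
          GROUP BY p.id, p.name) pc
         LEFT OUTER JOIN jn_dog d ON pc.id = d.person_id
    GROUP BY pc.id, pc.name, pc.cat_count
    ORDER BY pc.id
""").fetchall()

print(naive)
print(fixed)
```

In the naive result both of Peter's counts come out as 2 × 3 = 6; after pre-aggregating the cats, the counts are 2 and 3.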
SQL for analysis and reporting, window functions
SELECT fname, lname, salary, Rank() OVER (ORDER BY salary) FROM employee;
SELECT fname, lname, salary,
       Rank() OVER (ORDER BY salary),
       Dense_rank() OVER (ORDER BY salary),
       Round(100 * Percent_rank() OVER (ORDER BY salary))
FROM employee;
Ranking functions: Row_number(), Rank(), Dense_rank(), Percent_rank(), Cume_dist(), Ntile()
SELECT fname, lname, salary,
       Avg(salary) OVER (ORDER BY salary),
       Sum(salary) OVER (ORDER BY salary)
FROM employee;
SELECT fname, lname, salary,
       Lag(salary) OVER (ORDER BY salary),
       salary - Lag(salary) OVER (ORDER BY salary)
FROM employee;
Functions: Lag(), Lead(), First_value(), Last_value()
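How the ranking functions treat ties is easiest to see on a small table. The sketch below assumes SQLite (version 3.25 or later, for window function support) through Python's sqlite3 module; the employee rows are hypothetical and contain one tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (fname TEXT, lname TEXT, dno INTEGER, salary INTEGER);
    -- Hypothetical sample data; B and C earn the same salary.
    INSERT INTO employee VALUES ('Ann', 'A', 1, 100),
                                ('Bob', 'B', 1, 200),
                                ('Cat', 'C', 2, 200),
                                ('Dan', 'D', 2, 300);
""")

ranks = conn.execute("""
    SELECT lname, salary,
           Rank()       OVER (ORDER BY salary) AS rnk,
           Dense_rank() OVER (ORDER BY salary) AS dense_rnk,
           Round(100 * Percent_rank() OVER (ORDER BY salary)) AS pct
    FROM employee
    ORDER BY salary, lname
""").fetchall()

for row in ranks:
    print(row)
```

Rank() leaves a gap after the tie (1, 2, 2, 4), Dense_rank() does not (1, 2, 2, 3), and Percent_rank() is (rank - 1) / (rows - 1), here 0, 33, 33 and 100 percent after rounding.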
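Lag() can be tried out in a minimal sketch, assuming SQLite (3.25 or later) via Python's sqlite3 module and a hypothetical employee table with distinct salaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (fname TEXT, lname TEXT, dno INTEGER, salary INTEGER);
    -- Hypothetical sample data.
    INSERT INTO employee VALUES ('Ann', 'A', 1, 100),
                                ('Bob', 'B', 1, 250),
                                ('Cat', 'C', 2, 400);
""")

gaps = conn.execute("""
    SELECT lname, salary,
           Lag(salary) OVER (ORDER BY salary) AS previous_salary,
           salary - Lag(salary) OVER (ORDER BY salary) AS gap
    FROM employee
    ORDER BY salary
""").fetchall()

for row in gaps:
    print(row)
```

The lowest-paid employee has no predecessor, so Lag() returns NULL (None in Python) there, and the difference is NULL as well.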
SELECT fname, lname, dno, salary, Rank() OVER (PARTITION BY dno ORDER BY salary ) FROM employee;
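PARTITION BY restarts the ranking for every department, as the following sketch shows. It assumes SQLite (3.25 or later) via Python's sqlite3 module; the employee rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (fname TEXT, lname TEXT, dno INTEGER, salary INTEGER);
    -- Hypothetical sample data: two departments with two employees each.
    INSERT INTO employee VALUES ('Ann', 'A', 1, 100),
                                ('Bob', 'B', 1, 300),
                                ('Cat', 'C', 2, 200),
                                ('Dan', 'D', 2, 400);
""")

dept_ranks = conn.execute("""
    SELECT dno, salary,
           Rank() OVER (PARTITION BY dno ORDER BY salary) AS rank_in_dept
    FROM employee
    ORDER BY dno, salary
""").fetchall()

for row in dept_ranks:
    print(row)
```

Unlike GROUP BY, the partitioning does not collapse rows: every employee is still returned, with a rank computed only among colleagues in the same department.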
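Window Functions Overview
Frame specifications:
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING
Default frame when the window has an ORDER BY: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. It differs from the ROWS variant only when the ordering column has ties.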
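The difference between a RANGE frame and a ROWS frame shows up when the ORDER BY column has ties. The sketch below assumes SQLite (3.25 or later) via Python's sqlite3 module and a hypothetical employee table in which two employees earn the same salary; it compares a running sum over the default frame with one over an explicit ROWS frame.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (fname TEXT, lname TEXT, dno INTEGER, salary INTEGER);
    -- Hypothetical sample data; two employees share the salary 200.
    INSERT INTO employee VALUES ('Ann', 'A', 1, 100),
                                ('Bob', 'B', 1, 200),
                                ('Cat', 'C', 2, 200),
                                ('Dan', 'D', 2, 300);
""")

# Sorted in Python because the order of the two tied rows is not fixed by SQL.
running = sorted(conn.execute("""
    SELECT salary,
           Sum(salary) OVER (ORDER BY salary) AS default_frame_sum,
           Sum(salary) OVER (ORDER BY salary
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS rows_frame_sum
    FROM employee
""").fetchall())

for row in running:
    print(row)
```

With the default (RANGE) frame both tied rows see the same sum, 500, because peers of the current row are included; with the ROWS frame the sum grows row by row (300, then 500).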