M.P. Johnson, DBMS, Stern/NYU, Spring 2008. C20.0046: Database Management Systems, Lecture #11. M.P. Johnson, Stern School of Business, NYU, Spring 2008.

1 C20.0046: Database Management Systems, Lecture #11
M.P. Johnson, Stern School of Business, NYU, Spring 2008

2 Agenda
Nulls & outer joins
Grouping & aggregation

3 Recall: correlated subqueries
Acc(name,bal,type,…)
Q2: Find holder of largest account of each type
    SELECT name, type
    FROM Acc a1
    WHERE bal >= ALL (SELECT bal FROM Acc WHERE type=a1.type)
Note:
1. scope of variables (the correlation: a1.type inside the subquery refers to the outer query's row)
2. this can still be expressed as single SFW

4 New topic: Nulls in SQL
If we don’t have a value, can put a NULL
Null can mean several things:
  - Value does not exist
  - Value exists but is unknown
  - Value not applicable
But null is not the same as 0
  - See David Foster Wallace…

5 Null Values
x = NULL ⇒ 4*(3-x)/7 = NULL
x = NULL ⇒ x + 3 – x = NULL
x = NULL ⇒ 3 + (x-x) = NULL
x = NULL ⇒ x = 'Joe' is UNKNOWN
In general: no row whose null fields appear in the selection test will pass the test
  - With one exception
Pace Boole, SQL has three boolean values:
  - FALSE = 0
  - TRUE = 1
  - UNKNOWN = 0.5

6 Null values in boolean expressions
C1 AND C2 = min(C1, C2)
C1 OR C2 = max(C1, C2)
NOT C1 = 1 – C1
    SELECT * FROM Person
    WHERE (age < 25) AND (height > 6 OR weight > 190)
E.g. age=20, height=NULL, weight=180:
  - height > 6 = UNKNOWN
  - UNKNOWN OR weight > 190 = UNKNOWN
  - (age < 25) AND UNKNOWN = UNKNOWN
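The min/max/1 – C1 rules can be checked mechanically. The following is a minimal Python sketch of that numeric encoding; the helper names (and3, or3, not3, compare) are made up for illustration and are not part of SQL or of any library.

```python
# Three-valued logic with the numeric encoding from the slides:
# FALSE = 0, UNKNOWN = 0.5, TRUE = 1.
FALSE, UNKNOWN, TRUE = 0.0, 0.5, 1.0

def and3(c1, c2):
    return min(c1, c2)

def or3(c1, c2):
    return max(c1, c2)

def not3(c1):
    return 1 - c1

def compare(value, predicate):
    """Evaluate a comparison; anything compared with NULL (None) is UNKNOWN."""
    if value is None:
        return UNKNOWN
    return TRUE if predicate(value) else FALSE

# The example row: age=20, height=NULL, weight=180.
age, height, weight = 20, None, 180

result = and3(
    compare(age, lambda a: a < 25),
    or3(compare(height, lambda h: h > 6), compare(weight, lambda w: w > 190)),
)

# A WHERE clause keeps a row only when its condition is TRUE,
# so an UNKNOWN result means the row is dropped.
selected = (result == TRUE)
print(result, selected)
```

Under this encoding the example row evaluates to UNKNOWN, so the SELECT does not return it.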

7 Comparing null and non-nulls
The schema specifies whether null is allowed for each attribute
  - NOT NULL to forbid
  - Nulls are allowed by default
Unexpected behavior:
    SELECT * FROM Person WHERE age < 25 OR age >= 25
Some Persons are not included!
The “trichotomy law” does not hold!

8 Testing for null values
Can test for NULL explicitly:
  - x IS NULL
  - x IS NOT NULL
But:
  - x = NULL is never true
    SELECT * FROM Person WHERE age < 25 OR age >= 25 OR age IS NULL
Now it includes all Persons
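The behavior of NULL in comparisons can be tried out with Python's built-in sqlite3 module. This is only a sketch: the Person rows and the names() helper are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# age has no NOT NULL constraint, so nulls are allowed by default.
conn.execute("CREATE TABLE Person (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [("Ann", 20), ("Bob", 25), ("Cy", None)],  # hypothetical rows
)

def names(where):
    """Return the names of the Persons that satisfy the given WHERE condition."""
    sql = "SELECT name FROM Person WHERE " + where + " ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

# "Every age is either < 25 or >= 25" fails for the row with a NULL age.
trichotomy = names("age < 25 OR age >= 25")

# Adding an explicit IS NULL test brings the missing row back.
with_null_test = names("age < 25 OR age >= 25 OR age IS NULL")

# Comparing with = NULL yields UNKNOWN for every row, so nothing is selected.
equals_null = names("age = NULL")
is_null = names("age IS NULL")

print(trichotomy, with_null_test, equals_null, is_null)
```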

9 Null/logic review
TRUE AND UNKNOWN = ?
TRUE OR UNKNOWN = ?
UNKNOWN OR UNKNOWN = ?
X = NULL = ?

10 Next: Outer join
Like inner join except that dangling tuples are included, padded with nulls
Left outerjoin: dangling tuples from left are included
  - Nulls appear “on the right”
Right outerjoin: dangling tuples from right are included
  - Nulls appear “on the left”

11 Cross join - example
MovieStar
Name    Address       Gender  Birthdate
Hanks   123 Palm Rd   M       01/01/60
Taylor  456 Maple Av  F       02/02/40
Lucas   789 Oak St    M       03/03/55
MovieExec
Name       Address       Networth
Spielberg  246 Palm Rd   10M
Taylor     456 Maple Av  20M
Lucas      789 Oak St    30M

12
Name    Address       G.  Birthdate  Name       Address       Net
Hanks   123 Palm Rd   M   01/01/60
Taylor  456 Maple Av  F   02/02/40   Taylor     456 Maple Av  20M
Lucas   789 Oak St    M   03/03/55   Lucas      789 Oak St    30M
                                     Spielberg  246 Palm Rd   10M

13 Outer Join - Example
    SELECT * FROM MovieStar LEFT OUTER JOIN MovieExec
    ON MovieStar.name=MovieExec.name
    SELECT * FROM MovieStar RIGHT OUTER JOIN MovieExec
    ON MovieStar.name=MovieExec.name
Name    Address       G.    Birthdate  Name       Address       Net
Hanks   123 Palm Rd   M     01/01/60   Null       Null          Null
Taylor  456 Maple Av  F     02/02/40   Taylor     456 Maple Av  20M
Lucas   789 Oak St    M     03/03/55   Lucas      789 Oak St    30M
Null    Null          Null  Null       Spielberg  246 Palm Rd   10M
The LEFT join returns the Hanks, Taylor, and Lucas rows; the RIGHT join returns the Taylor, Lucas, and Spielberg rows.
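The outer join example can be reproduced with Python's built-in sqlite3 module. This sketch makes two assumptions worth flagging: the column types are guesses, and because RIGHT and FULL OUTER JOIN are only available in newer SQLite releases (3.39 and later), the right outer join is emulated here by swapping the operands of a LEFT OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MovieStar (name TEXT, address TEXT, gender TEXT, birthdate TEXT)")
conn.execute("CREATE TABLE MovieExec (name TEXT, address TEXT, networth TEXT)")
conn.executemany("INSERT INTO MovieStar VALUES (?, ?, ?, ?)", [
    ("Hanks", "123 Palm Rd", "M", "01/01/60"),
    ("Taylor", "456 Maple Av", "F", "02/02/40"),
    ("Lucas", "789 Oak St", "M", "03/03/55"),
])
conn.executemany("INSERT INTO MovieExec VALUES (?, ?, ?)", [
    ("Spielberg", "246 Palm Rd", "10M"),
    ("Taylor", "456 Maple Av", "20M"),
    ("Lucas", "789 Oak St", "30M"),
])

# Left outer join: every MovieStar row survives; Hanks has no matching
# exec, so the MovieExec columns come back as NULL (None in Python).
left = conn.execute("""
    SELECT MovieStar.name, MovieExec.name, MovieExec.networth
    FROM MovieStar LEFT OUTER JOIN MovieExec
      ON MovieStar.name = MovieExec.name
    ORDER BY MovieStar.name
""").fetchall()

# Right outer join, emulated by swapping the operands: every MovieExec
# row survives; Spielberg has no matching star.
right = conn.execute("""
    SELECT MovieStar.name, MovieExec.name, MovieExec.networth
    FROM MovieExec LEFT OUTER JOIN MovieStar
      ON MovieStar.name = MovieExec.name
    ORDER BY MovieExec.name
""").fetchall()

for row in left + right:
    print(row)
```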

14 Outer Join - Example
MovieStar
Name    Address       Gender  Birthdate
Hanks   123 Palm Rd   M       01/01/60
Taylor  456 Maple Av  F       02/02/40
Lucas   789 Oak St    M       03/03/55
MovieExec
Name       Address       Networth
Spielberg  246 Palm Rd   10M
Taylor     456 Maple Av  20M
Lucas      789 Oak St    30M
    SELECT * FROM MovieStar FULL OUTER JOIN MovieExec
    ON MovieStar.name=MovieExec.name
Name    Address       G.    Birthdate  Name       Address       Net
Hanks   123 Palm Rd   M     01/01/60   Null       Null          Null
Taylor  456 Maple Av  F     02/02/40   Taylor     456 Maple Av  20M
Lucas   789 Oak St    M     03/03/55   Lucas      789 Oak St    30M
Null    Null          Null  Null       Spielberg  246 Palm Rd   10M

15 New-style outer joins
Outer joins may be left, right, or full
  - FROM A LEFT [OUTER] JOIN B;
  - FROM A RIGHT [OUTER] JOIN B;
  - FROM A FULL [OUTER] JOIN B;
OUTER is optional
  - If OUTER is included, then FULL is the default
Q: How to remember left v. right?
A: It indicates the side whose rows are always included

16 Next: Grouping & Aggregation
In SQL:
  - Aggregation operators in SELECT,
  - Grouping in GROUP BY clause
Recall aggregation operators:
  - sum, avg, min, max, count (strings, numbers, dates)
  - Each applies to scalars
  - Count also applies to rows: count(*)
  - Can use DISTINCT inside aggregation op: count(DISTINCT x)
Grouping: group rows that agree on single value
  - Each group becomes one row in result

17 Aggregation functions
Numerical: SUM, AVG, MIN, MAX
Char: MIN, MAX
  - In lexicographic/alphabetic order
Any attribute: COUNT
  - Number of values
A  B
1  2
3  4
1  2
1  2
SUM(B) = 10
AVG(A) = 1.5
MIN(A) = 1
MAX(A) = 3
COUNT(A) = 4
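The five aggregate values listed for the small A/B table can be confirmed with a quick sqlite3 session. The table name R is an assumption; the slide does not name the relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER)")  # table name R is assumed
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 2), (3, 4), (1, 2), (1, 2)])

# All five aggregates are computed over the whole table in a single pass.
sum_b, avg_a, min_a, max_a, count_a = conn.execute(
    "SELECT SUM(B), AVG(A), MIN(A), MAX(A), COUNT(A) FROM R"
).fetchone()

print(sum_b, avg_a, min_a, max_a, count_a)
```

Note that COUNT(A) counts duplicates: the value 1 appears three times and is counted three times.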

18 Straight aggregation
In R.A.: γ_{sum(x)→total}(R)
In SQL:
    SELECT SUM(x) total FROM R
Just put the aggregation op in SELECT
NB: aggreg. ops applied to each non-null val
  - count(x) counts the number of non-null vals in field x
  - Use count(*) to count the number of rows
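How aggregation operators treat nulls is easy to see on a tiny table. In this sqlite3 sketch the four values of x are invented; two of them are NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (x INTEGER)")
conn.executemany("INSERT INTO R VALUES (?)", [(10,), (None,), (20,), (None,)])

# SUM, COUNT(x) and AVG skip the NULLs; COUNT(*) counts rows regardless.
total, count_x, count_rows, avg_x = conn.execute(
    "SELECT SUM(x) total, COUNT(x), COUNT(*), AVG(x) FROM R"
).fetchone()

print(total, count_x, count_rows, avg_x)
```

AVG(x) divides by the number of non-null values (2 here), not by the number of rows (4).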

19 Straight aggregation example
COUNT applies to duplicates, unless otherwise stated:
    SELECT Count(category) FROM Product WHERE year > 1995
  - Same as Count(*), except excludes nulls
Better:
    SELECT COUNT(DISTINCT category) FROM Product WHERE year > 1995
Can we say:
    SELECT category, COUNT(category) FROM Product WHERE year > 1995

20 Straight aggregation example
Purchase(product, date, price, quantity)
Q: Find total sales for the entire database:
    SELECT SUM(price * quantity) FROM Purchase
Q: Find total sales of bagels:
    SELECT SUM(price * quantity) FROM Purchase
    WHERE product = 'bagel'

21 Largest balance again
Acc(name,bal,type)
Q: Who has the largest balance?
Q: Who has the largest balance of each type?
Can we do these with aggregation functions?

22 Straight grouping
Group rows together by field values
Produces one row for each group
  - I.e., by each (combin. of) grouped val(s)
  - Don’t select non-grouped fields
Reduces to DISTINCT selections:
    SELECT product FROM Purchase GROUP BY product
    SELECT DISTINCT product FROM Purchase

23 Grouping & aggregation
Sometimes want to group and compute aggregations by group
  - Aggregation op applied to rows in group,
  - not to all rows in table
Purchase(product, date, price, quantity)
Find total sales for products that sold for > 0.50:
    SELECT product, SUM(price*quantity) total
    FROM Purchase
    WHERE price > .50
    GROUP BY product

24 Illustrated G&A example
Purchase

25 Illustrated G&A example
First compute the FROM-WHERE
Then GROUP BY product:

26 Illustrated G&A example
Finally, aggregate and select:
    SELECT product, SUM(price*quantity) total
    FROM Purchase
    WHERE price > .50
    GROUP BY product
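The three evaluation steps (FROM-WHERE, then GROUP BY, then aggregate and select) can be traced by hand in plain Python. The Purchase rows below are hypothetical, since the illustrated table did not survive in this transcript, and prices are stored in integer cents (so the filter reads price > 50) purely to keep the arithmetic exact.

```python
from collections import defaultdict

# Hypothetical Purchase rows: (product, date, price_in_cents, quantity).
purchase = [
    ("bagel", "10/21", 85, 15),
    ("banana", "10/22", 52, 7),
    ("banana", "10/19", 40, 17),  # fails the price filter
    ("bagel", "10/20", 85, 20),
]

# Step 1: compute the FROM-WHERE (keep rows with price > .50).
filtered = [row for row in purchase if row[2] > 50]

# Step 2: GROUP BY product (collect the surviving rows of each product).
groups = defaultdict(list)
for product, date, price, quantity in filtered:
    groups[product].append((price, quantity))

# Step 3: aggregate and select (one output row per group).
result = {
    product: sum(price * quantity for price, quantity in rows)
    for product, rows in groups.items()
}

print(result)
```

Because the WHERE filter runs before grouping, the 40-cent banana purchase never reaches its group and does not contribute to the banana total.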

27 Illustrated G&A example
GROUP BY may be reduced to (a possibly more complicated) subquery:
    SELECT product, SUM(price*quantity) total
    FROM Purchase
    WHERE price > .50
    GROUP BY product
    SELECT DISTINCT x.product,
           (SELECT SUM(y.price*y.quantity)
            FROM Purchase y
            WHERE x.product = y.product AND y.price > .50) total
    FROM Purchase x
    WHERE x.price > .50
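That the GROUP BY query and the correlated-subquery form return the same rows can be checked on sample data. This sqlite3 sketch uses invented Purchase rows (the prices are chosen to be exactly representable as floats); ORDER BY is added only to make the two result lists directly comparable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Purchase (product TEXT, date TEXT, price REAL, quantity INTEGER)")
conn.executemany("INSERT INTO Purchase VALUES (?, ?, ?, ?)", [
    ("bagel", "2008-03-01", 1.0, 20),
    ("bagel", "2008-03-02", 1.5, 10),
    ("banana", "2008-03-01", 0.5, 30),   # price is not > .50
    ("banana", "2008-03-03", 0.75, 8),
    ("gum", "2008-03-02", 0.25, 100),    # the whole product is filtered out
])

grouped = conn.execute("""
    SELECT product, SUM(price*quantity) total
    FROM Purchase
    WHERE price > .50
    GROUP BY product
    ORDER BY product
""").fetchall()

subquery = conn.execute("""
    SELECT DISTINCT x.product,
           (SELECT SUM(y.price*y.quantity)
            FROM Purchase y
            WHERE x.product = y.product AND y.price > .50) total
    FROM Purchase x
    WHERE x.price > .50
    ORDER BY x.product
""").fetchall()

print(grouped)
print(subquery)
```

The price condition has to appear twice in the subquery form: once to decide which products are listed and once to decide which rows are summed.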

28 Multiple aggregations
For every product, what is the total sales and max quantity sold?
    SELECT product, SUM(price * quantity) SumSales,
           MAX(quantity) MaxQuantity
    FROM Purchase
    WHERE price > .50
    GROUP BY product

29 Another grouping/aggregation e.g.
Movie(title, year, length, studioName)
Q: How many total minutes of film have been produced by each studio?
Strategy: Divide movies into groups per studio, then add lengths per group

30 Another grouping/aggregation e.g.
Title                Year  Length  Studio
Star Wars            1977  120     Fox
Jedi                 1980  105     Fox
Aviator              2004  800     Miramax
Pulp Fiction         1995  110     Miramax
Lost in Translation  2003  95      Universal
    SELECT studio, sum(length) totalLength
    FROM Movies
    GROUP BY studio

31 Another grouping/aggregation e.g.
Title                Year  Length  Studio
Star Wars            1977  120     Fox
Jedi                 1980  105     Fox
Aviator              2004  800     Miramax
Pulp Fiction         1995  110     Miramax
Lost in Translation  2003  95      Universal
    SELECT studio, sum(length) length
    FROM Movies
    GROUP BY studio

32 Another grouping/aggregation e.g.
Title                Year  Length  Studio
Star Wars            1977  120     Fox
Jedi                 1980  105     Fox
Aviator              2004  800     Miramax
Pulp Fiction         1995  110     Miramax
Lost in Translation  2003  95      Universal
    SELECT studio, sum(length) totalLength
    FROM Movies
    GROUP BY studio
Studio     Length
Fox        225
Miramax    910
Universal  95
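The per-studio totals can be reproduced by loading the five movies into an in-memory SQLite database. The column types are assumptions, and ORDER BY is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (title TEXT, year INTEGER, length INTEGER, studio TEXT)")
conn.executemany("INSERT INTO Movies VALUES (?, ?, ?, ?)", [
    ("Star Wars", 1977, 120, "Fox"),
    ("Jedi", 1980, 105, "Fox"),
    ("Aviator", 2004, 800, "Miramax"),
    ("Pulp Fiction", 1995, 110, "Miramax"),
    ("Lost in Translation", 2003, 95, "Universal"),
])

# One output row per studio, with the lengths summed inside each group.
totals = conn.execute("""
    SELECT studio, sum(length) totalLength
    FROM Movies
    GROUP BY studio
    ORDER BY studio
""").fetchall()

for studio, total_length in totals:
    print(studio, total_length)
```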

33 Grouping/aggregation example
StarsIn(SName,Title,Year)
Q: Find the year of each star’s first movie
    SELECT sname, min(year) firstyear
    FROM StarsIn
    GROUP BY sname
Q: Find the span of each star’s career
  - Look up first and last movies
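One way to approach the career-span question is to take the difference between the last and first years within each group. This is a sketch of one possible answer rather than the lecture's own solution; the StarsIn rows and the career_span alias are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StarsIn (sname TEXT, title TEXT, year INTEGER)")
conn.executemany("INSERT INTO StarsIn VALUES (?, ?, ?)", [
    ("Hanks", "Big", 1988),          # illustrative rows
    ("Hanks", "Cast Away", 2000),
    ("Taylor", "Giant", 1956),
])

# min(year) is the first movie and max(year) the last one, both taken
# within each star's group; their difference is the span.
spans = conn.execute("""
    SELECT sname, min(year) firstyear, max(year) - min(year) career_span
    FROM StarsIn
    GROUP BY sname
    ORDER BY sname
""").fetchall()

print(spans)
```

A star with a single movie gets a span of 0 under this definition.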

34 Account types again
Acc(name,bal,type)
Q: Who has the largest balance of each type?
Can we do this with grouping/aggregation?
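One possible way to answer this with grouping is to compute the maximum balance per type in a grouped subquery and join it back to Acc. The sketch below is an illustration, not necessarily the answer intended in the lecture; the account rows and the maxbal alias are made up, and it assumes no NULL balances.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Acc (name TEXT, bal INTEGER, type TEXT)")
conn.executemany("INSERT INTO Acc VALUES (?, ?, ?)", [
    ("Ann", 500, "checking"),   # hypothetical accounts
    ("Bob", 900, "checking"),
    ("Cy", 300, "savings"),
    ("Dee", 300, "savings"),    # ties with Cy for the largest savings balance
])

# The grouped subquery yields one (type, maxbal) row per account type;
# the join then picks out the holder(s) whose balance equals that maximum.
holders = conn.execute("""
    SELECT a.name, a.type
    FROM Acc a
    JOIN (SELECT type, MAX(bal) AS maxbal
          FROM Acc
          GROUP BY type) m
      ON a.type = m.type AND a.bal = m.maxbal
    ORDER BY a.type, a.name
""").fetchall()

print(holders)
```

Grouping alone cannot return the name, because name is not a grouped field; that is why the maximum has to be joined back against the original table. Ties are kept, so both savings holders appear.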

35 G & A for constructed relations
Movie(title,year,producerSsn,length)
MovieExec(name,ssn,netWorth)
Can do the same thing for larger, non-atomic relations
Q: How many mins. of film did each producer make?
  - What happens to non-producer movie-execs?
    SELECT name, sum(length) total
    FROM Movie, MovieExec
    WHERE producerSsn = ssn
    GROUP BY name

36 HAVING clauses
Sometimes want to limit which rows may be grouped
Q: How many mins. of film did each rich producer make?
  - Rich = netWorth > 10000000
    SELECT name, sum(length) total
    FROM Movie, MovieExec
    WHERE producerSsn = ssn
    GROUP BY name
    HAVING netWorth > 10000000
Q: Is HAVING necessary here?
A: No, could just add rich req. to WHERE

37 HAVING clauses
Sometimes want to limit which rows may be grouped
Q: How many mins. of film did each old producer make?
  - Old = made movies before 1930
    SELECT name, sum(length) total
    FROM Movie, MovieExec
    WHERE producerSsn = ssn
    GROUP BY name
    HAVING min(year) < 1930
Q: Is HAVING necessary here?
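Whether HAVING is necessary becomes clearer when the same condition is tried in WHERE instead. In this sqlite3 sketch the producers and movies are hypothetical; the first query is the HAVING form, and the second moves a year test into WHERE for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT, year INTEGER, producerSsn INTEGER, length INTEGER)")
conn.execute("CREATE TABLE MovieExec (name TEXT, ssn INTEGER, netWorth INTEGER)")
conn.executemany("INSERT INTO MovieExec VALUES (?, ?, ?)", [
    ("Old Producer", 1, 5000000),    # hypothetical execs
    ("New Producer", 2, 20000000),
    ("Non Producer", 3, 1000000),    # produced no movies
])
conn.executemany("INSERT INTO Movie VALUES (?, ?, ?, ?)", [
    ("Silent One", 1925, 1, 90),     # hypothetical movies
    ("Talkie Two", 1940, 1, 100),
    ("Modern Three", 1990, 2, 120),
])

# HAVING tests a property of the whole group (its earliest year), and the
# sum still covers every movie of each qualifying producer.
having = conn.execute("""
    SELECT name, sum(length) total
    FROM Movie, MovieExec
    WHERE producerSsn = ssn
    GROUP BY name
    HAVING min(year) < 1930
""").fetchall()

# Putting a year test in WHERE filters rows before grouping, so only the
# pre-1930 movies themselves are summed: a different question.
where_only = conn.execute("""
    SELECT name, sum(length) total
    FROM Movie, MovieExec
    WHERE producerSsn = ssn AND year < 1930
    GROUP BY name
""").fetchall()

print(having, where_only)
```

On this data the two forms disagree (190 minutes versus 90), which suggests that a condition on an aggregate such as min(year) cannot simply be moved into WHERE the way the netWorth condition can.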

38 Review
Examples from sqlzoo.net
    SELECT L FROM R1, …, Rn WHERE C
π_L(σ_C(R1 × … × Rn))

