M.P. Johnson, DBMS, Stern/NYU, Sp2004
C : Database Management Systems, Lecture #9
Matthew P. Johnson, Stern School of Business, NYU, Spring 2004
Agenda
Last time: 4NF, RA
This time: 1. More RA 2. Bags
Project Part 2 due next time
Normalization Review
Q: What’s required for BCNF?
Q: What’s the loophole for 3NF?
Q: How do we fix a non-BCNF, non-3NF, or non-4NF relation?
Q: When are FDs also MVDs?
Q: When are MVDs also FDs?
Relational Algebra Review
Five basic operators: Union ∪, Difference −, Selection σ, Projection π, Cartesian product ×
Intersection ∩ is derived from these: R ∩ S = R − (R − S)
Extended operators: Joins ⋈ (natural, equijoin, theta join, semijoin), Renaming ρ, Extended projection
Renaming
Changes the schema, not the instance
Notation: ρ_{B1,…,Bn}(R)
ρ is spelled “rho”, pronounced “row”
Example: Employee(ssn, name)
ρ_{E2(social, name)}(Employee)
Or just: ρ_{E2}(Employee)
Complex RA Expressions
Q: How long was Star Wars (1977)?
Strategy: find the row with Star Wars; then project the length field
Title | Year | Length | inColor | Studio | Prdcr#
Star Wars | | | True | Fox | 12345
M.Ducks | | | True | Disney | 67890
W.World | 1992 | 95 | True | Paramount | 99999
Combining operations
Q: Which Fox movies are at least 100 minutes long?
Title | Year | Length | Filmtype | Studio
Star wars | | | Color | Fox
Mighty ducks | | | Color | Disney
Wayne’s world | 1992 | 85 | Color | Paramount
Complex RA Expressions
Reps(ssn, name, etc.)
Clients(ssn, name, rssn)
Q: Who are George’s clients?
π_{Clients.name}(σ_{Reps.name=“George”}(σ_{Reps.ssn=rssn}(Reps × Clients)))
Or: π_{Clients.name}(σ_{Reps.name=“George” AND Reps.ssn=rssn}(Reps × Clients))
Or: π_{Clients.name}(σ_{Reps.name=“George”}(Reps × Clients) ∩ σ_{Reps.ssn=rssn}(Reps × Clients))
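The first expression above can be traced step by step in code. The following is a minimal Python sketch, not part of the lecture: relations are modeled as lists of dicts, the sample tuples are invented, and the helper names `product`, `select`, and `project` are hypothetical.

```python
from itertools import product as cartesian

# Hypothetical sample data; only the schemas come from the slide.
reps = [{"ssn": 1, "name": "George"}, {"ssn": 2, "name": "Martha"}]
clients = [
    {"ssn": 10, "name": "Ann", "rssn": 1},
    {"ssn": 11, "name": "Bob", "rssn": 2},
    {"ssn": 12, "name": "Cy", "rssn": 1},
]

def product(r, r_name, s, s_name):
    """Cartesian product; attributes are qualified as 'Rel.attr'."""
    return [
        {**{f"{r_name}.{k}": v for k, v in t.items()},
         **{f"{s_name}.{k}": v for k, v in u.items()}}
        for t, u in cartesian(r, s)
    ]

def select(rel, pred):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection (bag version: duplicates are kept)."""
    return [tuple(t[a] for a in attrs) for t in rel]

rc = product(reps, "Reps", clients, "Clients")
matched = select(rc, lambda t: t["Reps.ssn"] == t["Clients.rssn"])
georges = select(matched, lambda t: t["Reps.name"] == "George")
george_clients = project(georges, ["Clients.name"])
print(george_clients)  # [('Ann',), ('Cy',)]
```

Applying the two selections in the other order, or as one combined predicate, gives the same result, which is what the three equivalent expressions say.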
Complex RA Expressions
People(ssn, name, street, city, state)
assume for clarity that cities are unique
Q: Who lives on George’s street?
A: First, find George: σ_{name=“George”}(People)
Get George’s street/city: π_{street,city}(σ_{name=“George”}(People))
Cross with People: People × π_{street,city}(σ_{name=“George”}(People))
Complex RA Expressions
How to specify street = street? Rename:
ρ_{p2(s2,c2)}(People) × π_{street,city}(σ_{name=“George”}(People))
Now can select:
σ_{street=s2 AND city=c2}(ρ_{p2(s2,c2)}(People) × π_{street,city}(σ_{name=“George”}(People)))
Then project names…
Only way? No. Join!
People ⋈ π_{street,city}(σ_{name=“George”}(People))
Q: Would the following work?
π_{street,city}(σ_{name=“George”}(People ⋈ People))
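The join version can be sketched in a few lines of Python. This is an illustration under assumptions: the tuples are invented, and the column-index constants are a convenience that is not in the lecture.

```python
# Hypothetical instance of People(ssn, name, street, city, state).
people = [
    (1, "George", "Elm St", "Springfield", "NY"),
    (2, "Ann", "Elm St", "Springfield", "NY"),
    (3, "Bob", "Oak Ave", "Springfield", "NY"),
    (4, "Cy", "Elm St", "Shelbyville", "NY"),
]
SSN, NAME, STREET, CITY, STATE = range(5)

# Projection of street, city over the selection name = "George".
george_addr = [(p[STREET], p[CITY]) for p in people if p[NAME] == "George"]

# Natural join on the shared attributes street and city, then project names.
neighbors = [
    p[NAME]
    for p in people
    for (street, city) in george_addr
    if (p[STREET], p[CITY]) == (street, city)
]
print(neighbors)  # ['George', 'Ann']
```

George himself appears in the answer, since he lives on his own street; a further selection on name would remove him.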
Complex RA Expressions
Scenario:
1. Purchase(pid, seller-ssn, buyer-ssn, etc.)
2. Person(ssn, name, etc.)
3. Product(pid, name, etc.)
Q: Who (give names) bought gizmos from Dick?
Where to start? Purchase uses pid, ssn, so must get them…
Complex RA Expressions
Expression tree over Person, Purchase, Person, Product, written as a single expression:
π_{name}(Person ⋈_{buyer-ssn=Person.ssn} ((π_{ssn}(σ_{name=“Dick”}(Person)) ⋈_{seller-ssn=ssn} Purchase) ⋈_{pid=pid} π_{pid}(σ_{name=“Gizmo”}(Product))))
Complex RA Expressions
Acc(name, ssn, balance)
Q: Who has the largest balance?
First, get two copies of rel to play with: Acc × ρ_{a2}(Acc)
Now, consider this: σ_{a2.bal < Acc.bal}(Acc × ρ_{a2}(Acc))
Q: What does it give us?
Now, subtract the names: π_{name}(Acc) − π_{a2.name}(σ_{a2.bal < Acc.bal}(Acc × ρ_{a2}(Acc)))
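The "maximum by subtraction" trick above can be checked on a small example. The sketch below uses invented accounts and assumes names are unique, since the final difference is taken on names only.

```python
# Hypothetical instance of Acc(name, ssn, balance).
acc = [("Ann", 1, 500), ("Bob", 2, 900), ("Cy", 3, 900), ("Dee", 4, 100)]

# Names of a2 in pairs (Acc, a2) with a2.bal < Acc.bal:
# everyone who is beaten by at least one other account.
beaten = {a2[0] for a in acc for a2 in acc if a2[2] < a[2]}

# All names minus the beaten names leaves the holders of the largest balance.
richest = {a[0] for a in acc} - beaten
print(sorted(richest))  # ['Bob', 'Cy']
```

Ties survive the subtraction: both accounts with the top balance are returned, because neither is strictly smaller than any other.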
Confession
Relations aren’t really sets! They’re bags!
Bag Theory (5.3)
Bags: like sets but elements may repeat (“multisets”)
Set ops change somewhat when applied to bags
Intuition: pretend identical elements are distinct
{a,b,b,c} ∪ {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f}
{a,b,b,b,c,c} − {b,c,c,c,d} = {a,b,b}
{a,b,b,b,c,c} ∩ {b,c,c,c,d} = {b,c,c}
Reading assignment: 5.3-5.4
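The three bag identities above can be reproduced with Python's standard `collections.Counter`, which behaves as a multiset: `+` adds multiplicities, `-` subtracts them (dropping non-positive counts), and `&` takes the minimum. This is only an illustration of the arithmetic, not how a DBMS implements bags.

```python
from collections import Counter

def as_sorted_list(bag):
    """Expand a Counter back into a sorted list of elements."""
    return sorted(bag.elements())

# Bag union: multiplicities add.
union = Counter("abbc") + Counter("abbbeff")

# Bag difference: multiplicities subtract, never going below zero.
difference = Counter("abbbcc") - Counter("bcccd")

# Bag intersection: minimum multiplicity of each element.
intersection = Counter("abbbcc") & Counter("bcccd")

print(as_sorted_list(union))         # ['a','a','b','b','b','b','b','c','e','f','f']
print(as_sorted_list(difference))    # ['a', 'b', 'b']
print(as_sorted_list(intersection))  # ['b', 'c', 'c']
```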
Bag theory
σ_C(R): preserves the number of occurrences
π_A(R), product, join: no duplicate elimination
|R1 × R2| = |R1| * |R2|
Can convert to sets when necessary
Why not sets? Too expensive: union, projection, and updates would each need duplicate elimination
Consider: average(π_{bal}(Acc)) (under set semantics, equal balances would collapse into one and change the average)
Some surprises in bag theory
Be careful about your set theory laws: not all hold in bag theory
(R ∪ S) − T = (R − T) ∪ (S − T)
Always true in set theory
But true in bag theory? Suppose t is in R, S and T
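The suggested test case, one copy of t in each of R, S and T, can be run directly. The sketch below uses `collections.Counter` as a stand-in for bags and built-in `set` for sets; it works through this one case and is not a general proof.

```python
from collections import Counter

# One copy of t in each relation, as the slide suggests.
R, S, T = Counter(["t"]), Counter(["t"]), Counter(["t"])

bag_lhs = (R + S) - T        # two copies minus one leaves one copy of t
bag_rhs = (R - T) + (S - T)  # each difference is empty, so the union is empty
print(bag_lhs, bag_rhs)      # Counter({'t': 1}) Counter()

# The same case with sets: both sides are empty, so the law holds.
r, s, t = {"t"}, {"t"}, {"t"}
set_lhs = (r | s) - t
set_rhs = (r - t) | (s - t)
print(set_lhs == set_rhs)    # True
```

Under bag semantics the left side keeps one copy of t while the right side keeps none, so the law fails for bags.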
Finally: RA has limitations
Cannot compute “transitive closure”
Find all direct and indirect relatives of Fred
Cannot express in RA! RA is not Turing-complete
Name1 | Name2 | Relationship
Fred | Mary | Father
Mary | Joe | Cousin
Mary | Bill | Spouse
Nancy | Lou | Sister
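For contrast, a general-purpose language can compute the closure with a loop. The sketch below uses the (Name1, Name2) pairs from the table and, as a simplifying assumption, follows each pair only from Name1 to Name2. Each pass of the loop corresponds to one join plus projection, but the number of passes depends on the data, which is why no single fixed RA expression suffices. The function name `closure_from` is hypothetical.

```python
# (Name1, Name2) pairs from the slide's table; the Relationship column is ignored.
relatives = [("Fred", "Mary"), ("Mary", "Joe"), ("Mary", "Bill"), ("Nancy", "Lou")]

def closure_from(edges, start):
    """Everyone reachable from start by following Name1 -> Name2 pairs."""
    found = set()
    frontier = {start}
    while frontier:
        # One RA-style step: join the frontier with the relation, project Name2.
        step = {b for (a, b) in edges if a in frontier}
        frontier = step - found   # keep only newly discovered people
        found |= frontier
    return found

print(sorted(closure_from(relatives, "Fred")))  # ['Bill', 'Joe', 'Mary']
```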
Extended Operators (5.4)
Duplicate-eliminator δ (lower-case delta): convert to set
Aggregation operators: compute functions of tuples (sum, average, etc.)
Grouping-and-aggregation operator γ (lower-case gamma): partition tuples into groups, then compute a function on each group
Sorting τ (lower-case tau)
Extended projection π: project onto new, computed columns
Outerjoin: include dangling tuples by padding them with nulls
Duplicate elimination
[Example tables: R(A, B) and δ(R)]
Aggregation operators
Numerical: SUM, AVG, MIN, MAX
Char: MIN, MAX (in lexicographic/alphabetic order)
Any attribute: COUNT (number of values)
SUM(B) = 10
AVG(A) = 1.5
MIN(A) = 1
MAX(B) = 4
COUNT(A) = 4
[Example table: R(A, B)]
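The aggregates can be tried on a small bag. The instance of R(A, B) below is an assumption (the slide's table did not survive), chosen so that SUM(B), AVG(A), MIN(A) and COUNT(A) come out as listed above. Duplicates matter here: aggregation works on the bag, while δ collapses it to a set.

```python
# Hypothetical instance of R(A, B); duplicates are intentional.
R = [(1, 2), (3, 4), (1, 2), (1, 2)]

A = [a for (a, b) in R]
B = [b for (a, b) in R]

sum_b = sum(B)            # SUM(B)
avg_a = sum(A) / len(A)   # AVG(A)
min_a = min(A)            # MIN(A)
max_b = max(B)            # MAX(B)
count_a = len(A)          # COUNT(A): counts duplicates too

# Duplicate elimination delta(R): keep one copy of each tuple.
delta_r = sorted(set(R))

print(sum_b, avg_a, min_a, max_b, count_a)  # 10 1.5 1 4 4
print(delta_r)                              # [(1, 2), (3, 4)]
```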
Grouping
Motivation: Movies(title, year, length, studioName)
Q: How many minutes of film have been produced by each studio?
Strategy: Divide movies into groups per studio, then add lengths
Our expression: γ_{Studio, SUM(length)→total}(Movies)
The subscript: list of attributes and aggregations
Movies is grouped by these attributes
Result includes both
Grouping example
γ_{Studio, SUM(length)→total}(Movies)
Title | Year | Length | Studio
Star Wars | | | Fox
Jedi | | | Fox
M.Ducks | | | Disney
Lion King | | | Disney
W.World | 1992 | 95 | Paramount
Result:
Studio | total
Fox | 225
Disney | 220
Paramount | 95
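The grouping step can be sketched as a dictionary keyed by studio. In the data below, the individual lengths and most years are placeholders, not values from the slide; they are chosen only so that the per-studio totals match the result table (Fox 225, Disney 220, Paramount 95).

```python
from collections import defaultdict

# (title, year, length, studio); placeholder years and lengths, see note above.
movies = [
    ("Star Wars", 1977, 124, "Fox"),
    ("Jedi", 1983, 101, "Fox"),
    ("M.Ducks", 1992, 104, "Disney"),
    ("Lion King", 1994, 116, "Disney"),
    ("W.World", 1992, 95, "Paramount"),
]

# Partition tuples into groups by studio, summing length within each group.
totals = defaultdict(int)
for title, year, length, studio in movies:
    totals[studio] += length

# The result relation has one tuple (studio, total) per group.
result = list(totals.items())
print(result)  # [('Fox', 225), ('Disney', 220), ('Paramount', 95)]
```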
Extended projection
Form: π_{a→b, a+b→c, a||b→d}(R)
a→b: rename attribute a as b
a+b→c: create attribute c as the sum of a and b
a||b→d: create attribute d as the concatenation of a and b
Example: π_{firstname||“ ”||lastname→fullname}(R)
Replaces the firstname and lastname fields with a fullname field
Grouping/extended projection example
StarsIn(SName, Title, Year)
Q: Find the year of each star’s first movie
γ_{SName, MIN(year)→firstYear}(StarsIn)
How about Q: Find the span of each star’s career
Idea: get max and min and subtract
A: γ_{SName, MIN(year)→firstY, MAX(year)→lastY}(StarsIn)
Now project onto the difference:
π_{SName, lastY−firstY+1→span}(γ_{SName, MIN(year)→firstY, MAX(year)→lastY}(StarsIn))
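The career-span query combines grouping with an extended projection. Below is a small Python sketch on invented StarsIn tuples (the star names, titles and years are placeholders), taking MIN(year) as the first year and MAX(year) as the last year of each group.

```python
# Hypothetical instance of StarsIn(SName, Title, Year).
stars_in = [
    ("Ford", "Star Wars", 1977),
    ("Ford", "Jedi", 1983),
    ("Fisher", "Star Wars", 1977),
]

# Grouping by SName with MIN(year) as firstY and MAX(year) as lastY.
first_y, last_y = {}, {}
for sname, title, year in stars_in:
    first_y[sname] = min(year, first_y.get(sname, year))
    last_y[sname] = max(year, last_y.get(sname, year))

# Extended projection: compute lastY - firstY + 1 as span.
span = {sname: last_y[sname] - first_y[sname] + 1 for sname in first_y}
print(span)  # {'Ford': 7, 'Fisher': 1}
```

The `+ 1` counts both endpoint years, so a star with a single movie has a span of 1 rather than 0.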