M.P. Johnson, DBMS, Stern/NYU, Sp2004
C : Database Management Systems
  Lecture #8
  Matthew P. Johnson
  Stern School of Business, NYU
  Spring, 2004
Agenda
  Last time: Normalization
  This time:
    1. 4NF
    2. Relational Algebra
  Pep talk
  OHs today, drop-ins (80809)
Normalization Review
  Q: What’s required for BCNF?
  Q: What are the two types of violations?
  Q: What’s the loophole for 3NF?
  Q: How do we fix a non-BCNF relation?
Normalization Review
  Q: If As → Bs violates BCNF, what do we do?
  Q: In this case, could the decomposition be lossy?
  Q: How do we combine two relations?
  Q: Can BCNF decomp. lose FDs?
  Q: Can 3NF decomp. lose FDs?
New topic: MVDs (3.7)
  Consider this relation: people ~ their jobs ~ their residences
  Person-address/city: many-many
  Person-job: many-many
  Address/city-job: independent
  [Table People(Name, SSN, Streets, Cities, Jobs); legible rows:]
    Name     SSN  Streets               Cities      Jobs
    Michael  123  111 East 60th Street  New York    CEO
    Michael  123  222 Brompton Road     London      CEO
    Hilary   456  333 Some Street       Chappaqua   First Lady
    Hilary   456  444 Embassy Row       Washington  First Lady
Redundancy in BCNF
    Name     Streets               Cities      Jobs
    Michael  111 East 60th Street  New York    Mayor
    Michael  222 Brompton Road     London      Mayor
    Michael  111 East 60th Street  New York    CEO
    Michael  222 Brompton Road     London      CEO
    Hilary   333 Some Street       Chappaqua   Senator
    Hilary   444 Embassy Row       Washington  Senator
    Hilary   333 Some Street       Chappaqua   First Lady
    Hilary   444 Embassy Row       Washington  First Lady
    Hilary   333 Some Street       Chappaqua   Lawyer
    Hilary   444 Embassy Row       Washington  Lawyer
  Lots of redundancy!
  Key? All fields
    None determined by others!
  Non-trivial FDs? None!
  In BCNF? Yes!
  Now what? New concept, leading to another normal form: multivalued dependencies
MVD definition
  As →→ Bs if, when the As are held fixed, the values in Bs are independent of the values in the rest
  More precisely: if t1 and t3 agree on As, then we can find t2 such that
    t1, t2, t3 agree on As
    t2, t1 agree on Bs
    t2, t3 agree on Cs
  [Figure: three tuples t1, t2, t3 over the columns As | Bs | Cs]
MVD example
  Claim: name →→ streets,cities
  If true: can pick arbitrary t1, t3 and find a t2
  We pick the first and last of Hilary’s tuples:
          Name    Streets          Cities      Jobs
      t1  Hilary  333 Some Street  Chappaqua   Senator
      t3  Hilary  444 Embassy Row  Washington  Lawyer
  Now: if true, can find another Hilary row with the street/city of t1 and the job of t3:
      t2  Hilary  333 Some Street  Chappaqua   Lawyer
MVD example
  Now: if true, can find another Hilary row with the street/city of t1 and the job of t3
  Sure enough:
          Name     Streets               Cities      Jobs
          Michael  111 East 60th Street  New York    Mayor
          Michael  222 Brompton Road     London      Mayor
          Michael  111 East 60th Street  New York    CEO
          Michael  222 Brompton Road     London      CEO
          Hilary   333 Some Street       Chappaqua   Senator
          Hilary   444 Embassy Row       Washington  Senator
          Hilary   333 Some Street       Chappaqua   First Lady
          Hilary   444 Embassy Row       Washington  First Lady
      t2  Hilary   333 Some Street       Chappaqua   Lawyer
          Hilary   444 Embassy Row       Washington  Lawyer
MVD rules
  No splitting rule:
    In the example, name →→ streets,cities
    Do we have name →→ streets?
    No: 444 Embassy Row doesn’t go with Chappaqua
          Name    Streets          Cities      Jobs
      t1  Hilary  333 Some Street  Chappaqua   Senator
      t3  Hilary  444 Embassy Row  Washington  Lawyer
  NB: City doesn’t determine street (could have >1 house)
    But city, street aren’t independent
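The MVD definition can be checked mechanically on a small instance. The following is a minimal sketch, not a standard routine: the function name `holds_mvd` and the lower-case attribute names are made up for illustration, and the rows are the ten People tuples of the running example. It builds, for every pair t1, t3 that agree on the As, the t2 that the definition demands and looks for it in the relation.

```python
from itertools import product

def holds_mvd(rows, lhs, rhs):
    """Return True if the MVD lhs ->> rhs holds in this relation instance.

    rows: list of dicts that all share the same keys (the attributes).
    lhs, rhs: lists of attribute names (the As and the Bs).
    """
    if not rows:
        return True
    attrs = list(rows[0].keys())
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1, t3 in product(rows, repeat=2):
        if any(t1[a] != t3[a] for a in lhs):
            continue  # the definition only constrains pairs that agree on the As
        # Required t2: Bs taken from t1, everything else (As and the rest) from t3.
        t2 = {**t3, **{b: t1[b] for b in rhs}}
        if tuple(t2[a] for a in attrs) not in present:
            return False
    return True

# The running example: every residence of a person is paired with every job.
residences = {
    "Michael": [("111 East 60th Street", "New York"), ("222 Brompton Road", "London")],
    "Hilary": [("333 Some Street", "Chappaqua"), ("444 Embassy Row", "Washington")],
}
jobs = {"Michael": ["Mayor", "CEO"], "Hilary": ["Senator", "First Lady", "Lawyer"]}
people = [
    {"name": n, "street": s, "city": c, "job": j}
    for n in residences
    for (s, c) in residences[n]
    for j in jobs[n]
]

mvd_street_city = holds_mvd(people, ["name"], ["street", "city"])
mvd_street_only = holds_mvd(people, ["name"], ["street"])
print(mvd_street_city, mvd_street_only)
```

Under these assumptions the first check succeeds and the second fails, which is the no-splitting observation: the required t2 would pair 333 Some Street with Washington, and no such row exists.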
MVD rules
  Trivial dependencies: As →→ Bs iff As →→ Bs ∪ {A_i}
  Transitive rule: As →→ Bs, Bs →→ Cs ⇒ As →→ Cs
  Complementation rule: As →→ Bs ⇒ As →→ rest
    Intuition: if each value in Bs is assoc’ed w/each value in rest, then each value of rest is assoc’ed w/each value in Bs
    Name     Streets               Cities    Jobs
    Michael  111 East 60th Street  New York  Mayor
    Michael  222 Brompton Road     London    Mayor
    Michael  111 East 60th Street  New York  CEO
    Michael  222 Brompton Road     London    CEO
MVDs and FDs
  MVD is a generalization of FD
  Every FD is an MVD
  Pf: Suppose As → Bs
    Pick t1, t3 that agree on As. Must find a t2.
    Let t2 be t3. Then
      1) t2 agrees on As with both
      2) t2 agrees on Bs with t1 (why?)
      3) t2 agrees on rest with t3 (why?)
  QED
Fourth Normal Form
  4NF: like BCNF, but with MVDs, not FDs
  An MVD As →→ Bs is nontrivial if
    No Bs are As
    Some attributes left over (why?)
  4NF: for every nontrivial MVD As →→ Bs, As is a superkey
  In the example, name →→ streets,cities, but name isn’t a superkey
    Name    Streets          Cities      Jobs
    Hilary  333 Some Street  Chappaqua   Senator
    Hilary  444 Embassy Row  Washington  Lawyer
Decomposition to 4NF
  Again, analogous to BCNF
  If we can find As →→ Bs for R where As isn’t a superkey, replace R with R1(As,Bs) and R2(As,rest)
  Running example: name →→ streets,cities
  People(name,streets,cities,jobs) becomes
    Residences(name,street,city) and
    Employment(name,job)
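The decomposition can be tried on the running example. This is a sketch under stated assumptions: the helpers `project`, `natural_join` and `as_set` are hypothetical names written for this illustration, and the rows are the ten People tuples from the earlier slides. It projects People onto the two new schemas and checks that joining them back yields exactly the original rows.

```python
def project(rows, attrs):
    """Projection: keep only attrs, then drop duplicate rows."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(r1, r2):
    """Natural join of two lists of dict rows on their shared attribute names."""
    out = []
    for a in r1:
        for b in r2:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.append({**a, **b})
    return out

def as_set(rows):
    """Order-independent view of a relation, for comparing instances."""
    return {tuple(sorted(r.items())) for r in rows}

homes = {
    "Michael": [("111 East 60th Street", "New York"), ("222 Brompton Road", "London")],
    "Hilary": [("333 Some Street", "Chappaqua"), ("444 Embassy Row", "Washington")],
}
work = {"Michael": ["Mayor", "CEO"], "Hilary": ["Senator", "First Lady", "Lawyer"]}
people = [
    {"name": n, "street": s, "city": c, "job": j}
    for n in homes for (s, c) in homes[n] for j in work[n]
]

residences = project(people, ["name", "street", "city"])   # R1(As, Bs)
employment = project(people, ["name", "job"])              # R2(As, rest)
rejoined = natural_join(residences, employment)

print(len(people), len(residences), len(employment))
print(as_set(rejoined) == as_set(people))
```

The two smaller relations store each residence and each job once per person, instead of once per residence-job combination.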
4NF: another construal
  In nontrivial As →→ Bs, As must be a superkey
  After the def. of 4NF, the text says:
    “That is, … every nontrivial MVD is really a FD with a superkey on the left” (p123).
  We know: FDs are* MVDs, but not vice versa
  So: Why does this follow? Is it true?
  Yes. As is a superkey ⇒ As → everything ⇒ As → Bs ⇒ the MVD is an FD
  Two kinds of MVDs: FDs and “true” MVDs
    4NF eliminates exactly the true ones
  * The typo swapping these was fixed.
Summary of normal forms
    Guaranteed to             3NF     BCNF  4NF
    Eliminate FD redundancy   Mostly  Yes   Yes
    Eliminate MVD redundancy  No      No    Yes
    Preserve FDs              Yes     No    No
    Preserve MVDs             No      No    No
Combined isa/weak example
  Exercise: convert from E/R to R, by E/R, OO and nulls
  [E/R diagram with labels: courses, Lab-courses, Depts, Computer-allocation, room, number, givenBy, name, chair, isa]
Next topic: relational algebra (5.1-2)
  Set operations: union, intersection, difference
  Projection, selection
  Cartesian product
  Joins: natural joins, theta joins
  Combining operations to form queries
  Dependent and independent operations
What is relational algebra?
  An algebra for relations
  “High-school” algebra is an algebra for numbers
  Formalism for constructing expressions
    Operations
    Operands: variables, constants, expressions
  Expressions:
    Vars & constants
    Operators applied to expressions
    Algebra      Vars/consts                   Operators
    High-school  Numbers                       + * - / etc.
    Relational   Relations (= sets of tuples)  union, intersection, join, etc.
Why do we care about relational algebra?
  Why construct expressions on relations?
    The exprs are the form that questions about the data take
    The relations these exprs cash out to are the answers to our questions
  First proof of RDBMS/RA concept: System R (1979)
  Modern implementation of RA: SQL
Relation operators
  Five basic operators:
    Union: ∪
    Difference: -
    Selection: σ
    Projection: π
    Cartesian product: ×
  Derived/auxiliary operators:
    Intersection (∩), complement
    Joins (natural, equijoin, theta join, semijoin)
    Renaming: ρ
Operators
  Relations are sets ⇒ have set-theoretic ops
    Venn diagrams
  Union: R1 ∪ R2
    Example: ActiveEmployees ∪ RetiredEmployees
  Difference: R1 - R2
    Example: AllEmployees - RetiredEmployees = ActiveEmployees
Set operations - example
  R:
    Name    Address    Gender  Birthdate
    Fisher  123 Maple  F       9/9/99
    Hamill  456 Oak    M       8/8/88
  S:
    Name    Address    Gender  Birthdate
    Fisher  123 Maple  F       9/9/99
    Ford    345 Palm   M       7/7/77
  R ∪ S:
    Name    Address    Gender  Birthdate
    Fisher  123 Maple  F       9/9/99
    Hamill  456 Oak    M       8/8/88
    Ford    345 Palm   M       7/7/77
Set operations - example
  R:
    Name    Address    Gender  Birthdate
    Fisher  123 Maple  F       9/9/99
    Hamill  456 Oak    M       8/8/88
  S:
    Name    Address    Gender  Birthdate
    Fisher  123 Maple  F       9/9/99
    Ford    345 Palm   M       7/7/77
  R - S:
    Name    Address    Gender  Birthdate
    Hamill  456 Oak    M       8/8/88
Operators
  Intersection: R1 ∩ R2
    Example: UnionizedEmployees ∩ RetiredEmployees
  Intersection can be derived from ∪ and -
    R1 ∩ R2 = R1 - (R1 - R2)
    R1 ∩ R2 = -(-R1 ∪ -R2) (allowed?)
Set operations - example
  R:
    Name    Address    Gender  Birthdate
    Fisher  123 Maple  F       9/9/99
    Hamill  456 Oak    M       8/8/88
  S:
    Name    Address    Gender  Birthdate
    Fisher  123 Maple  F       9/9/99
    Ford    345 Palm   M       7/7/77
  R ∩ S:
    Name    Address    Gender  Birthdate
    Fisher  123 Maple  F       9/9/99
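The three set operations, and the derivation of intersection from difference alone, can be reproduced with Python's built-in sets. This is only a sketch: representing a tuple of the relation as a Python tuple in the column order Name, Address, Gender, Birthdate is a choice made here for illustration.

```python
# R and S from the set-operation examples, one Python tuple per row.
R = {
    ("Fisher", "123 Maple", "F", "9/9/99"),
    ("Hamill", "456 Oak", "M", "8/8/88"),
}
S = {
    ("Fisher", "123 Maple", "F", "9/9/99"),
    ("Ford", "345 Palm", "M", "7/7/77"),
}

union = R | S           # R ∪ S
difference = R - S      # R - S
derived = R - (R - S)   # intersection written with difference only

print(len(union), len(difference), len(derived))
print(derived == (R & S))
```

The complement-based form -(-R1 ∪ -R2) is not sketched, since it needs a universe of all possible tuples to complement against, which is the point of the slide's "(allowed?)".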
Operators
  Selection
    Selects all tuples satisfying a condition
  Notation: σ_c(R)
  Examples
    σ_{salary > …}(Employee)
    σ_{name = “Smith”}(Employee)
  The condition c can have
    comparison ops: =, <, ≤, >, ≥, <>
    boolean ops: and, or
Selection example
  Select the movies at Angelica:
    σ_{Theater=“Angelica”}(Showings)
  Showings:
    Title        Theater     N’hood
    City of God  Angelica    Village
    Fog of War   Angelica    Village
    City of God  Film Forum  Village
  Result:
    Title        Theater     N’hood
    City of God  Angelica    Village
    Fog of War   Angelica    Village
Operators
  Projection: the op we used for decomposition
    Eliminates columns, then removes duplicates
  Notation: π_{A1,…,An}(R)
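Selection and projection are both easy to sketch over a list of rows. In the sketch below, `select` and `project` are hypothetical helper names, and the Showings rows are the three tuples as read from the selection example, so treat the data as an assumption.

```python
showings = [
    {"Title": "City of God", "Theater": "Angelica", "N'hood": "Village"},
    {"Title": "Fog of War", "Theater": "Angelica", "N'hood": "Village"},
    {"Title": "City of God", "Theater": "Film Forum", "N'hood": "Village"},
]

def select(rows, cond):
    """Selection: keep the rows for which the condition holds."""
    return [r for r in rows if cond(r)]

def project(rows, attrs):
    """Projection: drop the other columns, then remove duplicate rows."""
    out = []
    for r in rows:
        p = {a: r[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out

at_angelica = select(showings, lambda r: r["Theater"] == "Angelica")
theaters = project(showings, ["Theater"])

print([r["Title"] for r in at_angelica])
print(theaters)
```

Projecting the three rows onto Theater leaves two rows, because the two Angelica showings collapse into one once the other columns are gone.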
Operators
  Cartesian product
    Cross product
    Each tuple in R1 combines w/each tuple in R2
  Notation: R1 × R2
  If R1, R2 fields overlap, include both and disambiguate: R1.A, R2.A
  Fairly rare in practice; mainly used to express joins
  Q: Where does the name come from?
  Q: If R1 has n1 rows and R2 has n2, how large is R1 × R2?
Cartesian product example
  Hillary-addresses:
    Street           City
    333 Some Street  Chappaqua
    444 Embassy Row  Washington
  Hillary-jobs:
    Job
    Senator
    First Lady
    Lawyer
  Hillary-addresses × Hillary-jobs:
    Street           City        Job
    333 Some Street  Chappaqua   Senator
    444 Embassy Row  Washington  Senator
    333 Some Street  Chappaqua   First Lady
    444 Embassy Row  Washington  First Lady
    333 Some Street  Chappaqua   Lawyer
    444 Embassy Row  Washington  Lawyer
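The product in this example can be reproduced with `itertools.product` from the Python standard library. A minimal sketch: `cartesian_product` is a name chosen here, and rows are plain Python tuples.

```python
from itertools import product

hillary_addresses = [
    ("333 Some Street", "Chappaqua"),
    ("444 Embassy Row", "Washington"),
]
hillary_jobs = [("Senator",), ("First Lady",), ("Lawyer",)]

def cartesian_product(r1, r2):
    """Pair every tuple of r1 with every tuple of r2 by concatenating them."""
    return [t1 + t2 for t1, t2 in product(r1, r2)]

result = cartesian_product(hillary_addresses, hillary_jobs)
for row in result:
    print(row)
print(len(result) == len(hillary_addresses) * len(hillary_jobs))
```

Each of the n1 rows meets each of the n2 rows exactly once, so the result has n1 * n2 rows.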
Operators
  Natural join: our join up to now
    But always merging shared attributes
  Notation: R1 ⋈ R2
  Meaning: R1 ⋈ R2 = π_{every att once}(σ_{shared atts =}(R1 × R2))
  I.e., first compute the cross product R1 × R2
  Next, select the rows in which shared fields agree
  Finally, project onto the union of R1 and R2’s fields (remove duplicates)
Natural join example
  Addresses:
    Name    Street           City
    Hilary  333 Some Street  Chappaqua
    Hilary  444 Embassy Row  Washington
  Jobs:
    Name    Job
    Hilary  Senator
    Hilary  First Lady
    Hilary  Lawyer
  Addresses ⋈ Jobs:
    Name    Street           City        Job
    Hilary  333 Some Street  Chappaqua   Senator
    Hilary  444 Embassy Row  Washington  Senator
    Hilary  333 Some Street  Chappaqua   First Lady
    Hilary  444 Embassy Row  Washington  First Lady
    Hilary  333 Some Street  Chappaqua   Lawyer
    Hilary  444 Embassy Row  Washington  Lawyer
Natural Join
  R:
    A  B
    X  Y
    X  Z
    Y  Z
    Z  V
  S:
    B  C
    Z  U
    V  W
    Z  V
  R ⋈ S = ?
  Unpaired tuples are called dangling
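The three-step meaning of natural join (cross product, select on shared fields, project) can be sketched directly and applied to the R(A, B) and S(B, C) instances above. The function name `natural_join` and the representation (an attribute list plus a list of tuples) are assumptions made for this illustration.

```python
from itertools import product

def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """Natural join computed as project(select(cross product))."""
    shared = [a for a in r_attrs if a in s_attrs]
    out_attrs = r_attrs + [a for a in s_attrs if a not in shared]
    result = set()
    # Step 1: cross product, every r paired with every s.
    for r, s in product(r_rows, s_rows):
        rd, sd = dict(zip(r_attrs, r)), dict(zip(s_attrs, s))
        # Step 2: keep only the pairs that agree on the shared attributes.
        if all(rd[a] == sd[a] for a in shared):
            # Step 3: each attribute once; the set removes duplicates.
            merged = {**sd, **rd}
            result.add(tuple(merged[a] for a in out_attrs))
    return out_attrs, result

R = [("X", "Y"), ("X", "Z"), ("Y", "Z"), ("Z", "V")]   # R(A, B)
S = [("Z", "U"), ("V", "W"), ("Z", "V")]               # S(B, C)

attrs, joined = natural_join(["A", "B"], R, ["B", "C"], S)

# A tuple of R is dangling if it contributes to no row of the join.
paired_r = {(a, b) for (a, b, c) in joined}
dangling_r = [r for r in R if r not in paired_r]

print(attrs)
print(sorted(joined))
print(dangling_r)
```

With these instances the only B values that appear on both sides are Z and V, so the R tuple with B = Y finds no partner and is left dangling.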
Natural Join
  Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
  Given R(A, B, C), S(D, E), what is R ⋈ S?
  Given R(A, B), S(A, B), what is R ⋈ S?
Theta Join
  Like natural join, but includes only rows that satisfy an arbitrary condition θ
    Does not project away shared attributes
  R1 ⋈_θ R2 = σ_θ(R1 × R2)
  Here θ can be any condition
  If the condition is always satisfied, then the theta join becomes the Cartesian product
Theta-join example
  U(A, B, C), V(B, C, D)
  U ⋈_{A<D} V has columns A, U.B, U.C, V.B, V.C, D
Equijoin
  A theta join where θ is an equality
  R1 ⋈_{A=B} R2 = σ_{A=B}(R1 × R2)
    σ = lower-case sigma
  Example: Employee ⋈_{SSN=SSN} Dependents
  Most useful join in practice
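Theta join and equijoin are both just a selection over the cross product, with shared attribute names kept and qualified rather than merged. The sketch below assumes schemas U(A, B, C) and V(B, C, D); the row values are illustrative and not taken from the slide, and `theta_join` is a hypothetical helper.

```python
from itertools import product

def theta_join(r_name, r_attrs, r_rows, s_name, s_attrs, s_rows, theta):
    """sigma_theta(R x S); shared attribute names are qualified, not merged."""
    shared = set(r_attrs) & set(s_attrs)

    def qualify(name, attr):
        return f"{name}.{attr}" if attr in shared else attr

    out = []
    for r, s in product(r_rows, s_rows):
        row = {qualify(r_name, a): v for a, v in zip(r_attrs, r)}
        row.update({qualify(s_name, a): v for a, v in zip(s_attrs, s)})
        if theta(row):
            out.append(row)
    return out

# Illustrative data only.
U = [(1, 2, 3), (6, 7, 8), (9, 7, 8)]     # U(A, B, C)
V = [(2, 3, 4), (2, 3, 5), (7, 8, 10)]    # V(B, C, D)
args = ("U", ["A", "B", "C"], U, "V", ["B", "C", "D"], V)

a_less_than_d = theta_join(*args, lambda t: t["A"] < t["D"])
equi_on_b = theta_join(*args, lambda t: t["U.B"] == t["V.B"])
always_true = theta_join(*args, lambda t: True)

print(list(a_less_than_d[0].keys()))
print(len(a_less_than_d), len(equi_on_b), len(always_true))
```

Passing a condition that is always true returns every pairing, that is, the full cross product of the two inputs.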
Semijoin
  R ⋉ S = π_{atts of R}(R ⋈ S)
  Q: What does this mean?
    Natural join of R and S;
    then project onto R’s atts
  A: The rows of R for which ≥1 row in S agrees on the shared atts
Semijoin example
  Employee(SSN, Name, ...)
  Dependents(DSSN, Dname, SSN, ...)
  [Diagram labels: Employee, Dependents, network]
  Employee ⋉ Dependents = {employees who have dependents}
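A semijoin can be sketched without building the join at all: keep each row of R that has at least one matching row in S. The rows below are invented for illustration, the attribute names are lower-cased versions of the ones on the slide, and `semijoin` is a hypothetical helper name.

```python
def semijoin(r_rows, s_rows):
    """Rows of R that agree with at least one row of S on the shared attributes."""
    result = []
    for r in r_rows:
        has_match = any(
            all(r[a] == s[a] for a in r.keys() & s.keys()) for s in s_rows
        )
        if has_match and r not in result:
            result.append(r)
    return result

# Invented example rows; only ssn is shared between the two schemas.
employees = [
    {"ssn": 1, "name": "Ann"},
    {"ssn": 2, "name": "Bob"},
    {"ssn": 3, "name": "Cy"},
]
dependents = [
    {"dssn": 10, "dname": "Dee", "ssn": 1},
    {"dssn": 11, "dname": "Eli", "ssn": 1},
    {"dssn": 12, "dname": "Flo", "ssn": 3},
]

with_dependents = semijoin(employees, dependents)
print([e["name"] for e in with_dependents])
```

An employee with two dependents still appears once, which matches the projection in the definition removing duplicates.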
Renaming
  Changes the schema, not the instance
  Notation: ρ_{B1,…,Bn}(R)
  ρ is spelled “rho”, pronounced “row”
  Example: Employee(ssn,name)
    ρ_{…(social, name)}(Employee)
    Or just: ρ_{…}(Employee)
Complex RA Expressions
  Q: How long was Star Wars (1977)?
  Strategy: find the row with Star Wars; then project the length field
    Title      Year  Length  inColor  Studio     Prdcr#
    Star Wars                True     Fox        12345
    M.Ducks                  True     Disney     67890
    W.World    1992  95      True     Paramount  99999
Combining operations
  Schema: Movies(Title, year, length, filmType, studioName)
  Query: select titles and years of movies by Fox that are at least 100 minutes long.
    Title          Year  Length  Filmtype  Studio
    Star wars                    Color     Fox
    Mighty ducks                 Color     Disney
    Wayne’s world  1992  85      Color     Paramount
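One way to write this query in relational algebra, as a sketch: select on both conditions, then project onto the two requested attributes. The attribute names follow the Movies schema above; comparing studioName to the string 'Fox' is an assumption about how the studio is stored. The second form is equivalent and intersects two separate selections instead.

```latex
\pi_{\text{title},\,\text{year}}\Bigl(
  \sigma_{\text{length} \ge 100 \;\wedge\; \text{studioName} = \text{'Fox'}}(\text{Movies})
\Bigr)

\pi_{\text{title},\,\text{year}}\Bigl(
  \sigma_{\text{length} \ge 100}(\text{Movies})
  \;\cap\;
  \sigma_{\text{studioName} = \text{'Fox'}}(\text{Movies})
\Bigr)
```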
Complex RA Expressions
  Reps(ssn, name, etc.)
  Clients(ssn, name, rssn)
  Q: Find George’s client names
    π_{Clients.name}(σ_{Reps.name=George}(σ_{Reps.ssn=rssn}(Reps × Clients)))
  Or:
    π_{Clients.name}(σ_{Reps.name=George and Reps.ssn=rssn}(Reps × Clients))
  Or:
    π_{Clients.name}(σ_{Reps.name=George}(Reps × Clients) ∩ σ_{Reps.ssn=rssn}(Reps × Clients))
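The three expressions can be compared on a toy instance. Everything below is an illustrative sketch: the Reps and Clients rows are invented, Reps is cut down to (ssn, name), and the helper names `cross`, `select` and `project` are assumptions.

```python
from itertools import product

reps = [(1, "George"), (2, "Martha")]                      # (ssn, name)
clients = [(10, "Ann", 1), (11, "Bob", 2), (12, "Cy", 1)]  # (ssn, name, rssn)

def cross(reps, clients):
    """Reps x Clients, with the overlapping names ssn and name qualified."""
    return [
        {"Reps.ssn": rs, "Reps.name": rn,
         "Clients.ssn": cs, "Clients.name": cn, "rssn": rr}
        for (rs, rn), (cs, cn, rr) in product(reps, clients)
    ]

def select(rows, cond):
    return [r for r in rows if cond(r)]

def project(rows, attr):
    """Single-attribute projection; the set removes duplicates."""
    return {r[attr] for r in rows}

def is_george(r):
    return r["Reps.name"] == "George"

def is_rep_of(r):
    return r["Reps.ssn"] == r["rssn"]

rxc = cross(reps, clients)

# 1) nested selections
v1 = project(select(select(rxc, is_rep_of), is_george), "Clients.name")
# 2) one selection with a conjunction
v2 = project(select(rxc, lambda r: is_george(r) and is_rep_of(r)), "Clients.name")
# 3) intersection of two selections over the same product
george_rows = select(rxc, is_george)
rep_rows = select(rxc, is_rep_of)
v3 = project([r for r in george_rows if r in rep_rows], "Clients.name")

print(v1, v2, v3)
```

All three return the same set of client names here; they differ only in how the two conditions are combined.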
For next time
  Finish chapter 5
  Come to office hours!