M.P. Johnson, DBMS, Stern/NYU, Sp20041 C : Database Management Systems Lecture #5 Matthew P. Johnson Stern School of Business, NYU Spring, 2004
M.P. Johnson, DBMS, Stern/NYU, Sp Agenda Last time: relational model Homework 1 is up, due next Tuesday This time: 1. Functional dependencies Keys and superkeys in terms of FDs Finding keys for relations 2. Rules for combining FDs Next time: anomalies & normalization
M.P. Johnson, DBMS, Stern/NYU, Sp Review: converting inheritance Movies Cartoons Murder- Mysteries isa Voices Weapon stars length titleyear Lion King Component
M.P. Johnson, DBMS, Stern/NYU, Sp Review: converting inheritance OO-style: Nulls-style: TitleYearlength Star Wars TitleYearlengthMurder-Weapon Scream Knife TitleYearlength Lion King TitleYearlengthMurder-Weapon Roger Rabbit Knife 1. Plain movies3. Murder mysteries 2. Cartoons4. Cartoon murder-mysteries TitleYearlengthMurder-Weapon Star Wars NULL Lion King NULL Scream Knife Roger Rabbit Knife
M.P. Johnson, DBMS, Stern/NYU, Sp Review: converting inheritance E/R-style Root entity set: Movies(title, year, length) Lion King Year length Roger Rabbit Scream Star Wars Title Knife1990R. Rabbit 1988 Year Knife murderWeapon Scream Title Subclass: MurderMysteries(title, year, murderWeapon) Subclass: Cartoons(title, year) 1993Lion King 1990 Year Roger Rabbit Title
M.P. Johnson, DBMS, Stern/NYU, Sp E/R-style & quasi-redundancy Name and year of Roger Rabbit were listed in three different rows (in different tables) Suppose title changes (“Roger” “Roget”) must change all three places Q: Is this redundancy? A: No! name and year are independent multiple movies may have same name Real redundancy reqs. depenency two rows agree on SSN must agree on rest conflicting hair colors in these rows is an error two rows agree on movie title may still disagree conflicting years may be correct – or may not be Better: introduce “movie-id” key att
M.P. Johnson, DBMS, Stern/NYU, Sp Combined isa/weak example Exercise Convert from E/R to R, by E/R, OO and nulls courses Lab- courses Depts Computer- allocation room number givenBy name chair isa
M.P. Johnson, DBMS, Stern/NYU, Sp Next topic: Functional dependencies (3.4) FDs are constraints part of the schema can’t tell from particular relation instances FD may hold for some instances “accidentally” Finding all FDs is part of DB design Used in normalization
M.P. Johnson, DBMS, Stern/NYU, Sp Functional dependencies Definition: Notation: Read: A i functionally determines B j If two tuples agree on the attributes A 1, A 2, …, A n then they must also agree on the attributes B 1, B 2, …, B m A 1, A 2, …, A n B 1, B 2, …, B m
M.P. Johnson, DBMS, Stern/NYU, Sp Typical Examples of FDs Product name price, manufacturer Person ssn name, age father’s/husband’s-name last-name zipcode state phone state (notwithstanding inter-state area codes) Company name stockprice, president symbol name name symbol
M.P. Johnson, DBMS, Stern/NYU, Sp To check A B, erase all other columns; for each rows t 1, t 2 i.e., check if remaining relation is many-one no “divergences” i.e., if A B is a well-defined function thus, functional dependency Functional dependencies BmBm...B1B1 AmAm A1A1 t1t1 t2t2 if t 1, t 2 agree here then t 1, t 2 agree here
M.P. Johnson, DBMS, Stern/NYU, Sp FDs Example Product(name, category, color, department, price) name color category department color, category price name color category department color, category price Consider these FDs: What do they say ?
M.P. Johnson, DBMS, Stern/NYU, Sp FDs Example FDs are constraints: On some instances they hold On others they don’t namecategorycolordepartmentprice GizmoGadgetGreenToys49 TweakerGadgetGreenToys99 Does this instance satisfy all the FDs ? name color category department color, category price name color category department color, category price
M.P. Johnson, DBMS, Stern/NYU, Sp FDs Example namecategorycolordepartmentprice GizmoGadgetGreenToys49 TweakerGadgetBlackToys99 GizmoStationaryGreen Office- supp. 59 What about this one ? name color category department color, category price name color category department color, category price
M.P. Johnson, DBMS, Stern/NYU, Sp Q: Is Position Phone an FD here? A: It is for this instance, but no, presumably not in general Others FDs? EmpID Name, Phone, Position but Phone Position Recognizing FDs EmpIDNamePhonePosition E0045Smith1234Clerk E1847John9876Salesrep E1111Smith9876Salesrep E9999Mary1234Lawyer
M.P. Johnson, DBMS, Stern/NYU, Sp Keys of relations {A 1 A 2 A 3 …A n } is a key for relation R if 1. A 1 A 2 A 3 …A n functionally determine all other attributes Usual notation: A 1 A 2 A 3 …A n B 1 B 2 …B k rels = sets distinct rows can’t agree on all A i 2. A 1 A 2 A 3 …A n is minimal No proper subset of A 1 A 2 A 3 …A n functionally determines all other attributes of R Primary key: chosen if there are several possible keys
M.P. Johnson, DBMS, Stern/NYU, Sp Keys example Relation: Student(Name, Address, DoB, , Credits) Which (/why) of the following are keys? Name, Address Name, SSN , SSN SSN NB: minimal != smallest
M.P. Johnson, DBMS, Stern/NYU, Sp Superkeys A set of attributes that contains a key Satisfies first condition: functionally determines every other attribute in the relation Might not satisfy the second condition: minimality may be possible to peel away some attributes from the superkey keys are superkeys key are special case of superkey superkey set is superset of key set name;ssn is a superkey but not a key
M.P. Johnson, DBMS, Stern/NYU, Sp Discovering keys for relations Relation entity set Key of relation = (minimized) key of entity set Relation binary relationship Many-many: union of keys of both entity sets Many(M)-one(O): only key of M (Why?) One-one: key of either entity set (but not both!)
M.P. Johnson, DBMS, Stern/NYU, Sp Example – entity sets Key of entity set = (minimized) key of relation Student(Name, Address, DoB, SSN, , Credits) Student Name Address DoB SSN Credits
M.P. Johnson, DBMS, Stern/NYU, Sp Example – many-many Many-many key: union of both ES keys StudentEnrolls Course SSNCredits CourseID Name Enrolls(SSN,CourseID)
M.P. Johnson, DBMS, Stern/NYU, Sp Example – many-one Key of many ES but not of one ES keys from both would be non-minimal CourseMeetsIn Room CourseIDName RoomNo Capacity MeetsIn(CourseID,RoomNo)
M.P. Johnson, DBMS, Stern/NYU, Sp Example – one-one Keys of both ESs included in relation Key is key of either ES (but not both!) HusbandsMarriedWives SSNName SSN Name Married(HSSN, WSSN) or Married(HSSN, WSSN)
M.P. Johnson, DBMS, Stern/NYU, Sp Discovering keys: multiway Multiway relationships: Multiple ways – may not be obvious R:F,G,H E is many-one E’s key is included but not part of key Recall that relship atts are implicitly many-one CourseEnrolls Student CourseID Name SSN NameSection RoomNo Capacity Enrolls(CourseID,SSN,RoomNo)
M.P. Johnson, DBMS, Stern/NYU, Sp Example Exercise Relation relating particles in a box to locations and velocities InPosition(id,x,y,z,vx,vy,vz) Q: What FDs hold? Q: What are the keys?
M.P. Johnson, DBMS, Stern/NYU, Sp Combining FDs (3.5) If some FDs are satisfied, then others are satisfied too If all these FDs are true: name color category department color, category price name color category department color, category price Then this FD also holds: name, category price Why?
M.P. Johnson, DBMS, Stern/NYU, Sp Rules for FDs Reasoning about FDs: given a set of FDs, infer other FDs – useful E.g. A B, B C A C Definitions: for FD-sets S and T T follows from S if all relation-instances satisfying S also satisfy T. S and T are equivalent if the sets of relation- instances satisfying S and T are the same. I.e., S and T are equivalent if S follows from T, and T follows from S.
M.P. Johnson, DBMS, Stern/NYU, Sp Splitting & combining FDs Splitting rule: Combining rule: Note: doesn’t apply to the left side Q: Does it apply to the left side? A 1 A 2 …A n B 1 B 2 …B m A 1, A 2, …, A n B 1 A 1, A 2, …, A n B A 1, A 2, …, A n B m A 1, A 2, …, A n B 1 A 1, A 2, …, A n B A 1, A 2, …, A n B m Bm...B1Am...A1 t1 t2
M.P. Johnson, DBMS, Stern/NYU, Sp Reflexive rule: trivial FDs FD A 1 A 2 …A n B 1 B 2 …B k may be Trivial: Bs are a subset of As Nontrivial: >=1 of the Bs is not among the As Completely nontrivial: none of the Bs is among the As Trivial elimination rule: Eliminate common attributes from Bs, to get an equivalent completely nontrivial FD A 1, A 2, …, A n A i with i in 1..n is a trivial FD A1A1 …AnAn t t’
M.P. Johnson, DBMS, Stern/NYU, Sp Transitive rule If and then A 1, A 2, …, A n B 1, B 2, …, B m B 1, B 2, …, B m C 1, C 2, …, C p A 1, A 2, …, A n C 1, C 2, …, C p A1A1 …AmAm B1B1 …BmBm C1C1...CpCp t t’
M.P. Johnson, DBMS, Stern/NYU, Sp Augmentation rule If then A 1, A 2, …, A n B A 1, A 2, …, A n, C B, C, for any C A1A1 …AmAm B1B1 …BmBm C1C1...CpCp t t’
M.P. Johnson, DBMS, Stern/NYU, Sp Rules summary 1. A B AC B (by definition) 2. Separation/Combination 3. Reflexive 4. Augmentation 5. Transitivity Last 3 called Armstrong’s Axioms Complete: entire closure follows from these Sound: no other FDs follow from these
M.P. Johnson, DBMS, Stern/NYU, Sp Inferring FDs example Start from the following FDs: Infer the following FDs: 1. name color 2. category department 3. color, category price 1. name color 2. category department 3. color, category price Inferred FD Which Rule did we apply? 4. name, category name 5. name, category color 6. name, category category 7. name, category color, category 8. name, category price Reflexive rule Transitivity(4,1) Reflexive rule combine(5,6) or Aug(1) Transitivity(3,7)
M.P. Johnson, DBMS, Stern/NYU, Sp Problem: infer ALL FDs Given a set of FDs, infer all possible FDs How to proceed? Try all possible FDs, apply all rules E.g. R(A, B, C, D): how many FDs are possible? Drop trivial FDs, drop augmented FDs Still way too many Better: use the Closure Algorithm (next)
M.P. Johnson, DBMS, Stern/NYU, Sp Closure of a set of Attributes Given a set of attributes A 1, …, A n The closure, {A 1, …, A n } + = {B in Atts: A 1, …, A n B} Given a set of attributes A 1, …, A n The closure, {A 1, …, A n } + = {B in Atts: A 1, …, A n B} name color category department color, category price name color category department color, category price Example: Closures: {name} + = {name, color} {name, category} + = {name, category, color, department, price} {color} + = {color}
M.P. Johnson, DBMS, Stern/NYU, Sp Closure Algorithm Start with X={A 1, …, A n }. Repeat: if B 1, …, B n C is a FD and B 1, …, B n are all in X then add C to X. until X didn’t change Start with X={A 1, …, A n }. Repeat: if B 1, …, B n C is a FD and B 1, …, B n are all in X then add C to X. until X didn’t change {name, category} + = {name, category, color, department, price} name color category department color, category price name color category department color, category price Example:
M.P. Johnson, DBMS, Stern/NYU, Sp Example Compute {A,B} + X = {A, B, } Compute {A, F} + X = {A, F, } R(A,B,C,D,E,F) A, B C A, D E B D A, F B A, B C A, D E B D A, F B In class:
M.P. Johnson, DBMS, Stern/NYU, Sp More definitions Given FDs versus Derived FDs Given FDs: stated initially for the relation Derived FDs: those that are inferred using the rules or through closure Basis: A set of FDs from which we can infer all other FDs for the relation Minimal basis: a minimal set of FDs (Can’t discard any of them) No proper subset of the minimal basis can derive the complete set of FDs
M.P. Johnson, DBMS, Stern/NYU, Sp Problem: find all FDs Given a database instance Find all FD’s satisfied by that instance Useful if we don’t get enough information from our users: need to reverse engineer a data instance Q: How long does this take? A: Some time for each subset of atts Q: How many subsets? powerset exponential time in worst-case But can often be smarter…
M.P. Johnson, DBMS, Stern/NYU, Sp Using Closure to Infer ALL FDs A, B C A, D B B D A, B C A, D B B D Example: Step 1: Compute X +, for every X: A + = A, B + = BD, C + = C, D + = D AB + = ABCD, AC + = AC, AD + = ABCD ABC + = ABD + = ACD + = ABCD (no need to compute– why?) BCD + = BCD, ABCD + = ABCD A + = A, B + = BD, C + = C, D + = D AB + = ABCD, AC + = AC, AD + = ABCD ABC + = ABD + = ACD + = ABCD (no need to compute– why?) BCD + = BCD, ABCD + = ABCD Step 2: Enumerate all FD’s X Y, s.t. Y X + and X Y = : AB CD, AD BC, ABC D, ABD C, ACD B
M.P. Johnson, DBMS, Stern/NYU, Sp Example R(A,B,C) Each of three determines other two Q: What are the FDs? Closure of singleton sets Closure of doubletons Q: What are the keys? Q: What are the minimal bases?
M.P. Johnson, DBMS, Stern/NYU, Sp Computing Keys Rephrasing of definition of key: X is a key if X + = all attributes X is minimal Note: there can be many minimal keys ! Example: R(A,B,C), AB C, BC A Minimal keys: AB and BC
M.P. Johnson, DBMS, Stern/NYU, Sp Examples of Keys Product(name, price, category, color) name, category price category color Keys are: {name, category} Enrollment(student, address, course, room, time) student address room, time course student, course room, time Keys are: [in class]