Functional Dependencies
- Why FD's
- Meaning of FD's
- Keys and Superkeys
- Inferring FD's
Source: slides by Jeffrey Ullman
Improving Relation Schemas
- The set of relation schemas obtained by translating an E/R diagram might need improvement.
  - Modeling with E/R diagrams is an art, not a science; it relies on experience and intuition.
  - Multiple alternative designs are possible; how do we choose among them?
- Redundant storage of information causes problems:
  - wasted space
  - anomalies when updating, inserting, or deleting tuples
- Basic idea: replace a relation schema with a collection of "smaller" schemas. This is called decomposition.
Improving Relation Schemas, Cont.
- There is a theory, called normalization, that systematically guides the improvement of relational designs.
- Normalization uses the notion of "constraints" on the information:
  - functional dependencies
  - multivalued dependencies
  - referential integrity constraints
Functional Dependencies
- X -> A is an assertion about a relation R: whenever two tuples of R agree on all the attributes of X, they must also agree on the attribute A.
  - We say "X -> A holds in R."
  - Convention: …, X, Y, Z represent sets of attributes; A, B, C, … represent single attributes.
  - Convention: no set formers in sets of attributes; write just ABC rather than {A,B,C}.
Example
- Consumers(name, addr, candiesLiked, manf, favCandy)
- Reasonable FD's to assert:
  - name -> addr
  - name -> favCandy
  - candiesLiked -> manf
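Example Data

name     addr        candiesLiked  manf     favCandy
Janeway  Voyager     Twizzlers     Hershey  Smarties
Janeway  Voyager     Smarties      Nestle   Smarties
Spock    Enterprise  Twizzlers     Hershey  Twizzlers

- The two Janeway tuples repeat addr Voyager because name -> addr.
- The two Janeway tuples repeat favCandy Smarties because name -> favCandy.
- The two Twizzlers tuples repeat manf Hershey because candiesLiked -> manf.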
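The definition of X -> A can be checked directly against a stored instance. The sketch below is a minimal Python illustration, assuming each tuple is a dict keyed by attribute name; `fd_holds` is a hypothetical helper, not part of the original slides. Note that a single instance can only show that an FD is violated; it cannot prove that the FD holds for the schema.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on all attributes in lhs
    also agrees on attribute rhs (the definition of lhs -> rhs)."""
    for r1, r2 in combinations(rows, 2):
        if all(r1[a] == r2[a] for a in lhs) and r1[rhs] != r2[rhs]:
            return False
    return True

# The example data from the table above, one dict per tuple.
consumers = [
    {"name": "Janeway", "addr": "Voyager", "candiesLiked": "Twizzlers",
     "manf": "Hershey", "favCandy": "Smarties"},
    {"name": "Janeway", "addr": "Voyager", "candiesLiked": "Smarties",
     "manf": "Nestle", "favCandy": "Smarties"},
    {"name": "Spock", "addr": "Enterprise", "candiesLiked": "Twizzlers",
     "manf": "Hershey", "favCandy": "Twizzlers"},
]

print(fd_holds(consumers, ["name"], "addr"))          # True
print(fd_holds(consumers, ["name"], "favCandy"))      # True
print(fd_holds(consumers, ["candiesLiked"], "manf"))  # True
print(fd_holds(consumers, ["name"], "candiesLiked"))  # False
```

The last call fails because the two Janeway tuples agree on name but differ on candiesLiked.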
FD's With Multiple Attributes
- There is no need for FD's with more than one attribute on the right.
  - But it is sometimes convenient to combine FD's as a shorthand.
  - Example: name -> addr and name -> favCandy become name -> addr favCandy.
- More than one attribute on the left may be essential.
  - Example: store candy -> price
Keys of Relations
- K is a superkey for relation R if K functionally determines all the attributes of R.
- K is a key for R if K is a superkey, but no proper subset of K is a superkey.
Example
- Consumers(name, addr, candiesLiked, manf, favCandy)
- {name, candiesLiked} is a superkey because together these attributes determine all the other attributes:
  - name -> addr favCandy
  - candiesLiked -> manf
Example, Cont.
- {name, candiesLiked} is a key because neither {name} nor {candiesLiked} is a superkey.
  - name does not determine manf; candiesLiked does not determine addr.
- There are no other keys, but there are many superkeys:
  - any superset of {name, candiesLiked}.
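The superkey and key definitions can be tested mechanically by computing which attributes a candidate set determines. This minimal Python sketch assumes FD's are given as (left-side tuple, right-side attribute) pairs; `closure`, `is_superkey`, and `is_key` are hypothetical helpers, and `closure` anticipates the closure algorithm introduced later in these slides.

```python
def closure(attrs, fds):
    """All attributes determined by attrs, found by repeatedly applying FD's."""
    z = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and rhs not in z:
                z.add(rhs)
                changed = True
    return z

def is_superkey(k, attrs, fds):
    """K is a superkey if it functionally determines every attribute."""
    return closure(k, fds) >= set(attrs)

def is_key(k, attrs, fds):
    """K is a key if it is a superkey and no proper subset is a superkey.
    Checking the subsets with one attribute removed suffices, because any
    superset of a superkey is itself a superkey."""
    return is_superkey(k, attrs, fds) and not any(
        is_superkey(set(k) - {a}, attrs, fds) for a in k)

ATTRS = ["name", "addr", "candiesLiked", "manf", "favCandy"]
FDS = [(("name",), "addr"), (("name",), "favCandy"), (("candiesLiked",), "manf")]

print(is_key({"name", "candiesLiked"}, ATTRS, FDS))          # True
print(is_superkey({"name"}, ATTRS, FDS))                     # False
print(is_key({"name", "candiesLiked", "addr"}, ATTRS, FDS))  # False
```

The third call is a superkey but not a key, since dropping addr still leaves a superkey.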
E/R and Relational Keys
- Keys in E/R concern entities.
- Keys in relations concern tuples.
- Usually one tuple corresponds to one entity, so the ideas are the same.
- But in poor relational designs, one entity can become several tuples, so E/R keys and relational keys differ.
Example Data

name     addr        candiesLiked  manf     favCandy
Janeway  Voyager     Twizzlers     Hershey  Smarties
Janeway  Voyager     Smarties      Nestle   Smarties
Spock    Enterprise  Twizzlers     Hershey  Twizzlers

- Relational key = {name, candiesLiked}
- But in E/R, name is a key for Consumers, and candiesLiked is a key for Candies.
- Note: there are 2 tuples for the Janeway entity and 2 tuples for the Twizzlers entity.
Discovering Keys
- Suppose the schema was obtained from an E/R diagram.
- If relation R came from an entity set E, then the key for R is the key attributes of E.
- If R came from a binary relationship from E1 to E2:
  - many-many: the key for R is the keys of E1 and E2 together.
  - many-one: the key for R is the keys of E1 (but not the keys of E2).
  - one-one: one key for R is the keys of E1; another key for R is the keys of E2.
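Key Example
[Figure: E/R diagram with entity sets Consumers (name, addr) and Candies (name, manf), related by Likes and Favorite; Married relates Consumers in roles husband and wife; Buddies relates Consumers in roles 1 and 2.]
- Likes(consumer, candy); key: consumer candy
- Favorite(consumer, candy); key: consumer
- Married(husband, wife); keys: husband or wife
- Buddies(name1, name2); key: name1 name2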
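The rules for binary relationships can be written as a small lookup. In this Python sketch, the multiplicity strings and the function `relationship_keys` are hypothetical; each key is returned as a tuple of attribute names, and the calls reproduce the keys listed above for Likes, Favorite, Married, and Buddies.

```python
def relationship_keys(multiplicity, keys1, keys2):
    """Keys of a relation R built from a binary relationship from E1 to E2.

    keys1, keys2: the attributes of R that hold the keys of E1 and E2.
    Returns a list of keys, each a tuple of attribute names.
    """
    if multiplicity == "many-many":
        return [tuple(keys1) + tuple(keys2)]
    if multiplicity == "many-one":      # each E1 entity relates to at most one E2
        return [tuple(keys1)]
    if multiplicity == "one-one":       # either side alone is a key
        return [tuple(keys1), tuple(keys2)]
    raise ValueError(f"unknown multiplicity: {multiplicity!r}")

print(relationship_keys("many-many", ["consumer"], ["candy"]))  # [('consumer', 'candy')]
print(relationship_keys("many-one", ["consumer"], ["candy"]))   # [('consumer',)]
print(relationship_keys("one-one", ["husband"], ["wife"]))      # [('husband',), ('wife',)]
print(relationship_keys("many-many", ["name1"], ["name2"]))     # [('name1', 'name2')]
```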
More FD's From Application
- Example: "no two courses can meet in the same room at the same time" tells us: hour room -> course.
- Ultimately, FD's and keys come from the semantics of the application.
- FD's and keys apply to the schema, not to specific instances of the relations.
Manipulating FD's
- We need to be able to reason about FD's in order to support the normalization process (improving relational schemas).
- Some preliminaries:
  - We can split the FD X -> A B into X -> A and X -> B.
  - We can combine the FD's X -> A and X -> B into the FD X -> A B.
  - Since A -> A is trivially true, there is no point in having any attribute on the right side that is also on the left side.
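Splitting, combining, and dropping trivial right-side attributes are simple set operations. This Python sketch rests on an assumption: attributes are single letters, so a string such as "AB" stands for the set AB, matching the slides' convention. The names `split_fd` and `combine_fds` are hypothetical.

```python
def split_fd(lhs, rhs):
    """Split X -> A1...Ak into X -> A1, ..., X -> Ak, omitting any Ai that
    is already in X (such an FD would be trivial)."""
    x = frozenset(lhs)
    return [(x, a) for a in sorted(set(rhs) - x)]

def combine_fds(fds):
    """Combine single-attribute FD's with the same left side:
    X -> A and X -> B become X -> AB. Returns {left side: right-side set}."""
    combined = {}
    for lhs, a in fds:
        combined.setdefault(frozenset(lhs), set()).add(a)
    return combined

parts = split_fd("AB", "BCD")   # B is dropped: it is already on the left
print([("".join(sorted(x)), a) for x, a in parts])  # [('AB', 'C'), ('AB', 'D')]
print(combine_fds(parts))       # one entry: left side AB, right side {C, D}
```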
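Closure of a Set of FD's
- If we have FD's A -> B and B -> C, then A -> C also holds.
  - Ex: if name -> address and address -> phone, then name -> phone.
- What about a chain of such deductions?
- Following such chains as far as possible from a set of attributes {A1,…,An} yields every attribute those attributes determine; this set is called the closure of {A1,…,An}, written {A1,…,An}+.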
FD Closure Algorithm
Input: a set of attributes {A1,…,An} and a set of FD's S
1. Z := {A1,…,An}
2. Find an FD in S of the form X -> C such that all the attributes in X are in Z but C is not in Z. Add C to Z.
3. Repeat step 2 until nothing more can be added to Z.
4. Return Z as the closure of {A1,…,An}.
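Example of Closure Algorithm
- Given a relation with attributes A, B, C, D, E, F and FD's:
  - AB -> C
  - BC -> A, BC -> D
  - D -> E
  - CF -> B
- Compute the closure of {A,B}.
- Answer:
  - Z := {A,B}
  - add C (from AB -> C)
  - add D (from BC -> D; BC -> A adds nothing, since A is already in Z)
  - add E (from D -> E)
  - CF -> B never applies, since F is not in Z
  - final answer: Z = {A,B,C,D,E}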
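The closure algorithm translates almost line for line into code. This is a minimal Python sketch, assuming single-letter attribute names and FD's given as (left side, right-side attribute) pairs, which is how FD's look after splitting; the name `closure` is our own. It is run on the example above.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds.

    attrs: iterable of single-letter attribute names, e.g. "AB".
    fds:   list of (lhs, rhs) pairs with one attribute on the right,
           e.g. ("AB", "C") for AB -> C.
    """
    z = set(attrs)                      # step 1: Z := {A1,...,An}
    changed = True
    while changed:                      # step 3: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:            # step 2: X -> C with X in Z, C not in Z
            if set(lhs) <= z and rhs not in z:
                z.add(rhs)
                changed = True
    return z                            # step 4: Z is the closure

FDS = [("AB", "C"), ("BC", "A"), ("BC", "D"), ("D", "E"), ("CF", "B")]
print(sorted(closure("AB", FDS)))  # ['A', 'B', 'C', 'D', 'E']
```

Each pass over S adds at least one attribute or stops, so the loop runs at most once per attribute plus one final pass.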
Use of Closure Algorithm
- Now we can check whether a particular FD A1 … An -> B follows from a set of FD's S:
  - compute {A1,…,An}+ using S
  - if B is in the closure, then the FD follows
  - otherwise it does not
Example
- Given the same attributes and FD's as in the previous example:
- Does AB -> D follow?
  - Compute {A,B}+ = {A,B,C,D,E}. Since D is in {A,B}+, YES.
- Does D -> A follow?
  - Compute {D}+ = {D,E}. Since A is not in {D}+, NO.
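Testing whether an FD follows then needs only one closure computation. In this self-contained Python sketch, `closure` repeats the closure algorithm and `follows` is a hypothetical helper; it accepts several attributes on the right, which is equivalent to testing each one after splitting.

```python
def closure(attrs, fds):
    """Closure of attrs under single-attribute-RHS FD's (lhs, rhs)."""
    z = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and rhs not in z:
                z.add(rhs)
                changed = True
    return z

def follows(lhs, rhs, fds):
    """lhs -> rhs follows from fds iff every attribute of rhs is in lhs+."""
    return set(rhs) <= closure(lhs, fds)

FDS = [("AB", "C"), ("BC", "A"), ("BC", "D"), ("D", "E"), ("CF", "B")]
print(follows("AB", "D", FDS))  # True:  {A,B}+ = {A,B,C,D,E}
print(follows("D", "A", FDS))   # False: {D}+ = {D,E}
```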
Finding All Implied FD's
- Motivation: "normalization," the process where we break a relation schema into two or more schemas.
- Example: ABCD with FD's AB -> C, C -> D, and D -> A.
  - Decompose into ABC and AD. What FD's hold in ABC?
  - Not only AB -> C, but also C -> A!
Why?
- Suppose two tuples of the projection onto ABC agree on C: (a1, b1, c) and (a2, b2, c).
- They come from tuples of ABCD: (a1, b1, c, d1) and (a2, b2, c, d2).
  - d1 = d2 because C -> D.
  - a1 = a2 because D -> A.
- Thus, tuples in the projection with equal C's have equal A's; C -> A.
Basic Idea
- Start with the given FD's and find all nontrivial FD's that follow from the given FD's.
  - Nontrivial = left and right sides disjoint.
- Restrict to those FD's that involve only attributes of the projected schema.
Simple, Exponential Algorithm
- For each set of attributes X, compute X+.
- Add X -> A for all A in X+ - X.
- However, drop XY -> A whenever we discover X -> A.
  - Because XY -> A follows from X -> A in any projection.
- Finally, use only the FD's involving projected attributes.
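A Few Tricks
- No need to compute the closure of the empty set or of the set of all attributes.
- If we find X+ = all attributes, then the closure of any superset of X is also all attributes; every FD it would yield is either trivial or dropped in favor of one with left side X, so it need not be computed.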
Example
- ABC with FD's A -> B and B -> C. Project onto AC.
  - A+ = ABC; yields A -> B, A -> C. We do not need to compute AB+ or AC+.
  - B+ = BC; yields B -> C.
  - C+ = C; yields nothing.
  - BC+ = BC; yields nothing.
Example, Cont.
- Resulting FD's: A -> B, A -> C, and B -> C.
- Projection onto AC: A -> C.
  - This is the only one of these FD's whose attributes all lie in {A,C}.
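The exponential algorithm and its tricks can be sketched in Python as follows. Assumptions: single-letter attributes, FD's as (left side, right-side attribute) pairs, no FD with an empty left side, and hypothetical names `closure`, `project_fds`, and `show`. Unlike the slides, the sketch enumerates only subsets X of the projected attributes, since any FD whose left side uses other attributes would be discarded at the final step anyway; it tries smaller sets first so that XY -> A can be dropped once X -> A is known.

```python
from itertools import combinations

def closure(attrs, fds):
    """X+ under fds, by the closure algorithm."""
    z = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and rhs not in z:
                z.add(rhs)
                changed = True
    return z

def project_fds(fds, proj):
    """Nontrivial FD's X -> A (single A) that hold in the projection onto proj."""
    proj = set(proj)
    result = set()
    saturated = []                    # sets X whose closure covers all of proj
    # Skip the empty set and proj itself; try smaller sets first.
    for size in range(1, len(proj)):
        for xs in combinations(sorted(proj), size):
            x = frozenset(xs)
            # Trick: supersets of a saturated X yield nothing new.
            if any(s <= x for s in saturated):
                continue
            xplus = closure(x, fds)
            for a in (xplus & proj) - x:
                # Drop XY -> A when some smaller X -> A was already found.
                if not any(lhs < x and rhs == a for lhs, rhs in result):
                    result.add((x, a))
            if proj <= xplus:
                saturated.append(x)
    return result

def show(fds):
    """Sorted, readable form such as ['AB -> C', 'C -> A']."""
    return sorted("".join(sorted(lhs)) + " -> " + rhs for lhs, rhs in fds)

print(show(project_fds([("A", "B"), ("B", "C")], "AC")))   # ['A -> C']
print(show(project_fds([("A", "B"), ("B", "C")], "ABC")))  # ['A -> B', 'A -> C', 'B -> C']
print(show(project_fds([("AB", "C"), ("C", "D"), ("D", "A")], "ABC")))  # ['AB -> C', 'C -> A']
```

The last call reproduces the earlier decomposition example: projecting ABCD onto ABC yields both AB -> C and C -> A, while BC -> A is dropped because C -> A was found first.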
A Geometric View of FD's
- Imagine the set of all instances of a particular relation.
- That is, all finite sets of tuples that have the proper number of components.
- Each instance is a point in this space.
Example: R(A,B)
[Figure: four points in the space of instances of R(A,B): {(1,2), (3,4)}, {}, {(1,2), (3,4), (1,3)}, and {(5,1)}.]
An FD Is a Subset of Instances
- For each FD X -> A, there is a subset of all instances that satisfy the FD.
- We can represent an FD by a region in the space.
- Trivial FD = an FD that is represented by the entire space.
  - Example: A -> A.
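Example: A -> B for R(A,B)
[Figure: the same four instances, with a region labeled A -> B.]
- {(1,2), (3,4)}, {}, and {(5,1)} lie inside the region for A -> B.
- {(1,2), (3,4), (1,3)} lies outside: its tuples (1,2) and (1,3) agree on A but not on B.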
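Deciding which points fall inside the region for A -> B is a direct check of the definition. A minimal Python sketch, assuming each instance of R(A,B) is a set of (a, b) pairs; `satisfies_a_to_b` is a hypothetical name.

```python
def satisfies_a_to_b(instance):
    """True if no two tuples of the instance agree on A but differ on B."""
    b_for_a = {}
    for a, b in instance:
        # Remember the first B seen for each A; any different B is a violation.
        if b_for_a.setdefault(a, b) != b:
            return False
    return True

instances = [{(1, 2), (3, 4)}, set(), {(1, 2), (3, 4), (1, 3)}, {(5, 1)}]
print([satisfies_a_to_b(i) for i in instances])  # [True, True, False, True]
```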
Representing Sets of FD's
- If each FD is a set of relation instances, then a collection of FD's corresponds to the intersection of those sets.
  - Intersection = all instances that satisfy all of the FD's.
Example
[Figure: three overlapping regions labeled A -> B, B -> C, and CD -> A; their intersection is the set of instances satisfying A -> B, B -> C, and CD -> A.]
Implication of FD's
- If an FD Y -> B follows from FD's X1 -> A1, …, Xn -> An, then the region in the space of instances for Y -> B must include the intersection of the regions for the FD's Xi -> Ai.
  - That is, every instance satisfying all the FD's Xi -> Ai surely satisfies Y -> B.
  - But an instance could satisfy Y -> B, yet not be in this intersection.
Example
[Figure: regions for A -> B and B -> C; their intersection lies inside the region for A -> C.]
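The containment can be checked exhaustively on a tiny example. This Python sketch is an illustration, not a proof: it assumes a relation R(A,B,C) whose values come only from the domain {0, 1}, enumerates all 256 instances (subsets of the 8 possible tuples), and confirms that every instance satisfying A -> B and B -> C also satisfies A -> C, while some instances satisfy A -> C without lying in the intersection. The helper `satisfies` is a hypothetical name.

```python
from itertools import combinations, product

A, B, C = 0, 1, 2   # positions of the attributes in each tuple

def satisfies(instance, lhs, rhs):
    """True if tuples agreeing on the positions in lhs also agree on rhs."""
    seen = {}
    for t in instance:
        key = tuple(t[i] for i in lhs)
        if seen.setdefault(key, t[rhs]) != t[rhs]:
            return False
    return True

tuples = list(product([0, 1], repeat=3))          # all 8 possible tuples
instances = [frozenset(s) for k in range(len(tuples) + 1)
             for s in combinations(tuples, k)]    # all 2**8 = 256 instances

# Region for the set {A -> B, B -> C}: the intersection of the two regions.
intersection = {i for i in instances
                if satisfies(i, (A,), B) and satisfies(i, (B,), C)}
region_a_to_c = {i for i in instances if satisfies(i, (A,), C)}

print(len(instances))                         # 256
print(intersection <= region_a_to_c)          # True: intersection lies inside A -> C
print(len(region_a_to_c - intersection) > 0)  # True: A -> C's region is larger
```

For instance, {(0,0,0), (0,1,0)} satisfies A -> C but violates A -> B, so it lies in the region for A -> C but outside the intersection.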