Chapter 8: Relational Database Design First Normal Form First Normal Form Functional Dependencies Functional Dependencies Decomposition Decomposition Boyce-Codd Normal Form Boyce-Codd Normal Form Third Normal Form Third Normal Form
First Normal Form R is in first normal form if the domains of all attributes of R are atomic R is in first normal form if the domains of all attributes of R are atomic In object relational/object oriented databases, attributes can be composite or multi-valued In object relational/object oriented databases, attributes can be composite or multi-valued But in relational databases, composite attributes will need to be flatten out and multi-valued attributes need to be represented by another relation But in relational databases, composite attributes will need to be flatten out and multi-valued attributes need to be represented by another relation
Pitfalls in Relational Database Design We can create tables to represent an ER design in many different ways We can create tables to represent an ER design in many different ways Combine attributes differently to create tables Combine attributes differently to create tables Why do we choose some ways over the others? Why do we choose some ways over the others? Redundancy Redundancy Inability to represent certain information Inability to represent certain information E.g. relationships among attributes E.g. relationships among attributes
ER for Banking Enterprise
Schema Diagram for the Banking Enterprise
Example Consider the relation schema: Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) Consider the relation schema: Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) Redundancy, why? Redundancy, why? Inability to represent certain information, why? Inability to represent certain information, why? Cannot store information about a branch if no loans exist Cannot store information about a branch if no loans exist Can use null values, but they are difficult to handle. Can use null values, but they are difficult to handle.
Why Redundancy Is Bad? Wastes space Wastes space Complicates updating, introducing possibility of inconsistency of assets value Complicates updating, introducing possibility of inconsistency of assets value We know why inability to represent certain information is bad. We know why inability to represent certain information is bad.
Decomposition Decompose the relation schema Lending- schema into: Decompose the relation schema Lending- schema into: Branch-schema = (branch-name, branch-city,assets) Loan-info-schema = (customer-name, loan-number, branch-name, amount)
Lossless-join decomposition All attributes of an original schema (R) must appear in the decomposition (R 1, R 2 ): All attributes of an original schema (R) must appear in the decomposition (R 1, R 2 ): R = R 1 R 2 For all possible relations r on schema R For all possible relations r on schema R r = R1 (r) R2 (r)
Non Lossless-Join Decomposition Decomposition of R = (A, B) Decomposition of R = (A, B) R 1 = (A)R 2 = (B) R 1 = (A)R 2 = (B) AB A B 1212 r A(r)A(r) B(r) A (r) B (r) AB We do not loss any tuple but we lose the relationship between A and B
Relational DB Design Process Decide whether a particular relation R is in “good” form. Decide whether a particular relation R is in “good” form. In the case that a relation R is not in “good” form, decompose it into a set of relations {R 1, R 2,..., R n } such that In the case that a relation R is not in “good” form, decompose it into a set of relations {R 1, R 2,..., R n } such that each relation is in good form each relation is in good form the decomposition is a lossless-join decomposition the decomposition is a lossless-join decomposition Our theory is based on: Our theory is based on: functional dependencies functional dependencies multivalued dependencies multivalued dependencies
Functional Dependencies Constraints on the set of legal relations. Constraints on the set of legal relations. Require that the value for a certain set of attributes determines uniquely the value for another set of attributes. Require that the value for a certain set of attributes determines uniquely the value for another set of attributes. A functional dependency is a generalization of the notion of a key. A functional dependency is a generalization of the notion of a key.
Functional Dependencies Let R be a relation schema Let R be a relation schema R and R The functional dependency The functional dependency holds on R if and only if for any legal relations r(R), whenever any two tuples t 1 and t 2 of r agree on the attributes , they also agree on the attributes . That is, holds on R if and only if for any legal relations r(R), whenever any two tuples t 1 and t 2 of r agree on the attributes , they also agree on the attributes . That is, t 1 [ ] = t 2 [ ] t 1 [ ] = t 2 [ ] t 1 [ ] = t 2 [ ] t 1 [ ] = t 2 [ ]
Example Example: Consider r(A,B) with the following instance of r. Example: Consider r(A,B) with the following instance of r. On this instance, A B does NOT hold, but B A does hold. On this instance, A B does NOT hold, but B A does hold. Note: function dependency needs to hold for any possible instance of a relation. Note: function dependency needs to hold for any possible instance of a relation
More Example Consider the schema: Consider the schema: Loan-info-schema = (customer-name, loan- number, branch-name, amount). We expect this set of functional dependencies to hold: loan-number amount loan-number branch-name but would not expect the following to hold: loan-number customer-name (why?)
Keys Defined by Functional Dependencies K is a superkey for relation schema R if and only if K R K is a superkey for relation schema R if and only if K R K is a candidate key for R if and only if K is a candidate key for R if and only if K R, and K R, and for no K, R for no K, R
Example Keys Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) Superkeys? Superkeys? Candidate keys? Candidate keys?
Functional Dependencies A functional dependency is trivial if it is satisfied by all instances of a relation A functional dependency is trivial if it is satisfied by all instances of a relation E.g. E.g. customer-name, loan-number customer-name customer-name, loan-number customer-name customer-name customer-name customer-name customer-name In general, is trivial if In general, is trivial if
Closure of Attribute Sets Given a set of attributes define the closure of under F (denoted by + ) as the set of attributes that are functionally determined by under F: is in F + + Given a set of attributes define the closure of under F (denoted by + ) as the set of attributes that are functionally determined by under F: is in F + +
Algorithm to compute + The closure of under F result := ; while (changes to result) do for each in F do begin if result The closure of under F result := ; while (changes to result) do for each in F do begin if result then result := result end then result := result end
Example of Attribute Set Closure R = (A, B, C, G, H, I) R = (A, B, C, G, H, I) F = {A B A C CG H CG I B H} F = {A B A C CG H CG I B H} (AG) + (AG) + 1.result = AG 2.result = ABCG(A C and A B) 3.result = ABCGH(CG H and CG AGBC) 4.result = ABCGHI(CG I and CG AGBCH)
Example Closures Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) Closure of attribute set (loan-number)? Closure of attribute set (loan-number)? Closure of attribute set (customer-name, loan- number)? Closure of attribute set (customer-name, loan- number)? Closure of attribute set (branch-name)? Closure of attribute set (branch-name)? Given: Given: loan-number branch-name, amount loan-number branch-name, amount branch-name branch-city, assets branch-name branch-city, assets customer-name,loan-number Lending-schema customer-name,loan-number Lending-schema
Uses of Attribute Closure Testing for superkey and candidate key Testing for superkey and candidate key Testing functional dependencies Testing functional dependencies Computing closure of F Computing closure of F
Testing for Keys To test if is a superkey, we compute +, and check if + contains all attributes of R. To test if is a superkey, we compute +, and check if + contains all attributes of R. To test if is a candidate key, first make sure is a superkey. Then make sure no subset of is a superkey To test if is a candidate key, first make sure is a superkey. Then make sure no subset of is a superkey In the previous example, is AG a candidate key? In the previous example, is AG a candidate key? 1. Is AG a super key? 1. Does AG R? == Is (AG) + R 2. Is any subset of AG a superkey? 1. Does A R? == Is (A) + R 2. Does G R? == Is (G) + R
Example: Testing for keys Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) Is (loan-number) a superkey/candidate key? Is (loan-number) a superkey/candidate key? Is (customer-name, loan-number) a superkey/candidate key? Is (customer-name, loan-number) a superkey/candidate key? Is (branch-name) a superkey/candidate key? Is (branch-name) a superkey/candidate key? Is (customer-name, loan-number,branch-name) a superkey/candidate key? Is (customer-name, loan-number,branch-name) a superkey/candidate key? Given: Given: loan-number branch-name,amount loan-number branch-name,amount branch-name branch-city, assets branch-name branch-city, assets customer-name,loan-number Lending-schema customer-name,loan-number Lending-schema
Testing Functional Dependencies To check if a functional dependency holds (or, in other words, is in F + ), just check if +. To check if a functional dependency holds (or, in other words, is in F + ), just check if +.
Example: Testing for functional dependencies Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) Does loan-number branch-city, assets hold? Does loan-number branch-city, assets hold? Does customer-name, loan-number, amount amount hold? Does customer-name, loan-number, amount amount hold? Given: Given: loan-number branch-name,amount loan-number branch-name,amount branch-name branch-city, assets branch-name branch-city, assets customer-name,loan-number Lending-schema customer-name,loan-number Lending-schema
Computing Closure of F But what is a closure of F? But what is a closure of F? Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F. Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F. E.g. If A B and B C, then we can infer that A C E.g. If A B and B C, then we can infer that A C The set of all functional dependencies logically implied by F is the closure of F. The set of all functional dependencies logically implied by F is the closure of F. We denote the closure of F by F +. We denote the closure of F by F +.
Armstrong’s Axioms Sound and Complete rules: Sound and Complete rules: if , then (reflexivity) if , then (reflexivity) if , then (augmentation) if , then (augmentation) if , and , then (transitivity) if , and , then (transitivity) These rules are These rules are sound (generate only functional dependencies that actually hold) and sound (generate only functional dependencies that actually hold) and complete (generate all functional dependencies that hold). complete (generate all functional dependencies that hold).
Example R = (A, B, C, G, H, I) F = { A B, A C, CG H, CG I, B H } R = (A, B, C, G, H, I) F = { A B, A C, CG H, CG I, B H } some members of F + some members of F + A H A H by transitivity from A B and B H by transitivity from A B and B H AG I AG I by augmenting A C with G, to get AG CG and then transitivity with CG I by augmenting A C with G, to get AG CG and then transitivity with CG I CG HI CG HI from CG H and CG I : “union rule” can be inferred from from CG H and CG I : “union rule” can be inferred from definition of functional dependencies, or definition of functional dependencies, or Augmentation of CG I to infer CG CGI, augmentation of CG H to infer CGI HI, and then transitivity Augmentation of CG I to infer CG CGI, augmentation of CG H to infer CGI HI, and then transitivity
Closure of Functional Dependencies Derived rules from Armstrong’s axioms Derived rules from Armstrong’s axioms If holds and holds, then holds (union) If holds and holds, then holds (union) If holds, then holds and holds (decomposition) If holds, then holds and holds (decomposition) If holds and holds, then holds (pseudotransitivity) If holds and holds, then holds (pseudotransitivity)
When we compute the closure of attribute set , we already implicitly used Armstrong’s axioms When we compute the closure of attribute set , we already implicitly used Armstrong’s axioms Let us just use the closure of attribute set to calculate the closure of functional dependency set Let us just use the closure of attribute set to calculate the closure of functional dependency set
Now Compute Closure of F For each R, we find the closure +, and for each S +, we output a functional dependency S. For each R, we find the closure +, and for each S +, we output a functional dependency S.
Example: Closure of F Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) How to compute the closure of F (F is given below)? How to compute the closure of F (F is given below)? Given F: Given F: loan-number branch-name,amount loan-number branch-name,amount branch-name branch-city, assets branch-name branch-city, assets customer-name,loan-number Lending-schema customer-name,loan-number Lending-schema
We just talked about the maximal set of functional dependencies that can be derived from a functional dependent set F We just talked about the maximal set of functional dependencies that can be derived from a functional dependent set F Closure of F Closure of F What about the minimal set of functional dependencies that is equivalent functional dependent set F What about the minimal set of functional dependencies that is equivalent functional dependent set F Canonical cover of F Canonical cover of F
Canonical Cover Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies What does “equivalent” mean? What does “equivalent” mean? What does “minimal” mean? What does “minimal” mean?
Equivalent Means: Give two functional dependency sets F and F’, they are equivalent if and only if: Give two functional dependency sets F and F’, they are equivalent if and only if: F logically implies all functional dependencies in F’ F logically implies all functional dependencies in F’ F’ logically implies all functional dependencies in F F’ logically implies all functional dependencies in F We use to represent logically imply and to represent equivalent We use to represent logically imply and to represent equivalent How to test if F F’? How to test if F F’? For each F’, test if it holds in F For each F’, test if it holds in F You may need to calculate + in F You may need to calculate + in F
Minimal Means: No redundant functional dependencies! No redundant functional dependencies! Sets of functional dependencies may have redundant dependencies that can be inferred from the others Sets of functional dependencies may have redundant dependencies that can be inferred from the others Eg: A C is redundant in: {A B, B C, A C} Eg: A C is redundant in: {A B, B C, A C} Parts of a functional dependency may be redundant Parts of a functional dependency may be redundant E.g. on RHS: {A B, B C, A CD} can be simplified to {A B, B C, A D} E.g. on RHS: {A B, B C, A CD} can be simplified to {A B, B C, A D} E.g. on LHS: {A B, B C, AC D} can be simplified to {A B, B C, A D} E.g. on LHS: {A B, B C, AC D} can be simplified to {A B, B C, A D}
Extraneous Attributes For any in F For any in F Attribute A is extraneous if F (F – { }) {( – A) }. Attribute A is extraneous if F (F – { }) {( – A) }. Attribute A is extraneous if F (F – { }) { ( – A)} Attribute A is extraneous if F (F – { }) { ( – A)} Use attribute closures to check equivalence Use attribute closures to check equivalence
Examples Given F = {A C, AB C } Given F = {A C, AB C } B is extraneous in AB C because {A C, AB C} is equivalent to {A C, A C } = {A C} B is extraneous in AB C because {A C, AB C} is equivalent to {A C, A C } = {A C} Given F = {A C, AB CD} Given F = {A C, AB CD} C is extraneous in AB CD because {A C, AB CD} is equivalent to {A C, AB D} C is extraneous in AB CD because {A C, AB CD} is equivalent to {A C, AB D}
Canonical Cover A canonical cover for F is a set of dependencies F c such that A canonical cover for F is a set of dependencies F c such that F logically implies all dependencies in F c, and F logically implies all dependencies in F c, and F c logically implies all dependencies in F, and F c logically implies all dependencies in F, and No functional dependency in F c contains an extraneous attribute, and No functional dependency in F c contains an extraneous attribute, and Each left side of functional dependency in F c is unique. Each left side of functional dependency in F c is unique.
Compute a canonical cover for F repeat Use the union rule to replace any dependencies in F 1 1 and 1 2 with 1 1 2 Find a functional dependency with an extraneous attribute either in or in If an extraneous attribute is found, delete it repeat Use the union rule to replace any dependencies in F 1 1 and 1 2 with 1 1 2 Find a functional dependency with an extraneous attribute either in or in If an extraneous attribute is found, delete it from until F does not change from until F does not change Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied
Example R = (A, B, C) F = {A BC, B C, A B,AB C} R = (A, B, C) F = {A BC, B C, A B,AB C} Combine A BC and A B into A BC Combine A BC and A B into A BC Set is now {A BC, B C, AB C} Set is now {A BC, B C, AB C} Is B extraneous in A BC Is B extraneous in A BC {A BC, B C, AB C} ? {A C, B C, AB C} NO {A BC, B C, AB C} ? {A C, B C, AB C} NO Is C extraneous in A BC Is C extraneous in A BC {A BC, B C, AB C} ? {A B, B C, AB C} Yes {A BC, B C, AB C} ? {A B, B C, AB C} Yes Set is now {A B, B C, AB C} Set is now {A B, B C, AB C} Is A is extraneous in AB C Is A is extraneous in AB C {A B, B C, AB C} ? {A B, B C, B C} Yes {A B, B C, AB C} ? {A B, B C, B C} Yes Set is now {A B, B C} Set is now {A B, B C} The canonical cover is: {A B, B C} The canonical cover is: {A B, B C}
Example: Canonical Cover of F Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) How to compute the Canonical Cover of F (F is given below)? How to compute the Canonical Cover of F (F is given below)? Given F: Given F: loan-number branch-name,branch- city,assets,amount loan-number branch-name,branch- city,assets,amount branch-name branch-city, assets branch-name branch-city, assets customer-name,loan-number Lending-schema customer-name,loan-number Lending-schema