Presentation is loading. Please wait.

Presentation is loading. Please wait.

©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design.

Similar presentations


Presentation on theme: "©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design."— Presentation transcript:

1 ©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design Functional Dependencies Decomposition Boyce-Codd Normal Form Third Normal Form (Multivalued Dependencies and Fourth Normal Form) (Overall Database Design Process)

2 ©Silberschatz, Korth and Sudarshan7.2Database System Concepts First Normal Form Domain is atomic if its elements are considered to be indivisible units  Examples of non-atomic domains:  Set of names, composite attributes  Identification numbers like CS101 that can be broken up into parts A relational schema R is in first normal form if the domains of all attributes of R are atomic Non-atomic values complicate storage and encourage redundant (repeated) storage of data  E.g. Instead of a separate relation depositor  set of accounts stored with each customer, and set of owners stored with each account  In this chapter, we assume all relations are in first normal form

3 ©Silberschatz, Korth and Sudarshan7.3Database System Concepts First Normal Form (Cont.) Atomicity is actually a property of how the elements of the domain are used.  Strings would normally be considered indivisible  But  Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127  If the first two characters are extracted to find the department, the domain of roll numbers is not atomic.  Doing so is a bad idea: leads to encoding of information in application program rather than in the database.

4 ©Silberschatz, Korth and Sudarshan7.4Database System Concepts Pitfalls in Relational Database Design Relational database design requires that we find a “good” collection of relation schemas. A bad design may lead to  Repetition of Information.  Inability to represent certain information.

5 ©Silberschatz, Korth and Sudarshan7.5Database System ConceptsExample Consider the relation schema: Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) Redundancy  Data for branch-name, branch-city, assets are repeated for each loan that a branch makes  Wastes space  Complicates updating, introducing possibility of inconsistency of assets value Null values  Cannot store information about a branch if no loans exist  Can use null values  but, difficult to handle.

6 ©Silberschatz, Korth and Sudarshan7.6Database System Concepts Goal — Devise a Theory for the Following Decide whether a particular relation R is in “good” form. In the case that a relation R is not in “good” form, decompose it into a set of relations {R 1, R 2,..., R n } such that  each relation is in good form  the decomposition is a lossless-join decomposition Our theory is based on:  functional dependencies  multivalued dependencies

7 ©Silberschatz, Korth and Sudarshan7.7Database System Concepts Functional Dependencies A key role in differentiating good database designs from bad database designs. A functional dependency is a generalization of the notion of a key. FDs are constraints on the set of legal relations. Require that the value for a certain set of attributes determines uniquely the value for another set of attributes.

8 ©Silberschatz, Korth and Sudarshan7.8Database System Concepts Functional Dependencies (Cont.) Let R be a relation schema   R and   R The functional dependency    holds on R if and only if for any legal relations r(R), whenever any two tuples t 1 and t 2 of r agree on the attributes , they also agree on the attributes . That is, t 1 [  ] = t 2 [  ]  t 1 [  ] = t 2 [  ] Example: Consider r(A,B) with the following instance of r.  On this instance, A  B does NOT hold, but B  A does hold. 14 1 5 37

9 ©Silberschatz, Korth and Sudarshan7.9Database System Concepts Functional Dependencies (Cont.) K is a superkey for relation schema R if and only if K  R K is a candidate key for R if and only if  K  R, and  for no   K,   R FDs allow us to express constraints that cannot be expressed using superkeys. Consider the schema: Loan-info-schema = (customer-name, loan-number, branch-name, amount). The set of functional dependencies that we expect to hold: loan-number  amount loan-number  branch-name but would not expect the following to hold: loan-number  customer-name (e.g. a loan - husband-wife pair)

10 ©Silberschatz, Korth and Sudarshan7.10Database System Concepts Use of Functional Dependencies We shall use FDs in two ways  To test relations to see whether they are legal under a given set of functional dependencies.  If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.  To specify constraints on the set of legal relations  We say that F holds on R if all legal relations on R satisfy the set of F. Note  A specific instance of a relation schema may satisfy a FD even if the FD does not hold on all legal instances.  E.g. a specific instance of Loan-schema may, by chance, satisfy loan-number  customer-name.

11 ©Silberschatz, Korth and Sudarshan7.11Database System Concepts Functional Dependencies (Cont.) Some functional dependency are said to be trivial if they are satisfied by all relations.  E.g.  customer-name, loan-number  customer-name  customer-name  customer-name  In general, a FD    is trivial if   

12 ©Silberschatz, Korth and Sudarshan7.12Database System Concepts Closure of a Set of Functional Dependencies Given a set F of FDs, we can prove that certain other FDs hold. We say that such FDs are “logically implied” by F.  E.g. If A  B and B  C, then we can infer that A  C The set of all FDs logically implied by F is the closure of F. We denote the closure of F by F +. We can find all of F + by applying Armstrong’s axioms:  If   , then    (reflexivity)  If   , then      (augmentation)  If   , and   , then    (transitivity)  sound and complete Rules from Armstrong’s axioms:  If    and   , then     (union rule)  If    , then    and    (decomposition rule)  If    and    δ, then    δ (pseudotransitivity rule)

13 ©Silberschatz, Korth and Sudarshan7.13Database System ConceptsExample R = (A, B, C, G, H, I) F = { A  B A  C CG  H CG  I B  H } Some members of F +  A  H  By transitivity from A  B and B  H  AG  I  Pseudotransitivity rule from A  C and CG  I  By augmenting A  C with G, to get AG  CG and then transitivity with CG  I  CG  HI  Union rule from CG  H and CG  I  Augmentation of CG  I to infer CG  CGI, augmentation of CG  H to infer CGI  HI, and then transitivity

14 ©Silberschatz, Korth and Sudarshan7.14Database System Concepts Procedure for Computing F + To compute the closure of a set of F: F + = F repeat for each FD f in F + apply reflexivity and augmentation rules on f add the resulting FDs to F + for each pair of FDs f 1 and f 2 in F + if f 1 and f 2 can be combined using transitivity then add the resulting FD to F + until F + does not change any further

15 ©Silberschatz, Korth and Sudarshan7.15Database System Concepts Closure of Attribute Sets Given a set of attributes  define the closure of  under F (denoted by  + ) as the set of attributes that are functionally determined by  under F:    is in F     + Algorithm to compute  + result :=  ; while (changes to result) do for each    in F do begin if   result then result := result   end

16 ©Silberschatz, Korth and Sudarshan7.16Database System Concepts Example of Attribute Set Closure R = (A, B, C, G, H, I) F = { A  B A  C CG  H CG  I B  H } (AG) + 1.result = AG 2.result = ABCG(A  C and A  B) 3.result = ABCGH(CG  H and CG  ABCG) 4.result = ABCGHI(CG  I and CG  ABCGH)

17 ©Silberschatz, Korth and Sudarshan7.17Database System Concepts Uses of Attribute Closure Testing for superkey  To test if  is a superkey, we compute  +, and check if  + contains all attributes of R. Testing functional dependencies  To check if a FD    holds (in other words, is in F + ), just check if    +.  That is, we compute  + by using attribute closure, and then check if it contains . Computing closure of F  For each   R, we find the closure  +, and for each S   +, we output a functional dependency   S.

18 ©Silberschatz, Korth and Sudarshan7.18Database System Concepts Canonical Cover Sets of FDs may have redundant dependencies that can be inferred from the others  E.g. A  C is redundant in {A  B, B  C, A  C}  Parts of a functional dependency may be redundant  E.g. {A  B, B  C, A  CD} can be simplified to {A  B, B  C, A  D} Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, with no redundant dependencies or having redundant parts of dependencies

19 ©Silberschatz, Korth and Sudarshan7.19Database System Concepts Extraneous Attributes Consider a set F and the FD    in F  If A  , to check attribute A is extraneous let  =  -{A}, and check if    can be inferred from F. (compute  + under F, check if  + includes all attributes in  )  If A  , to check attribute A is extraneous consider F’ = (F – {    })  {   (  – A)} and check if   A can be inferred from F’. (compute  + under F’, check if  + includes A) Example: Given F = {A  C, AB  C }  B is extraneous in AB  C because A  C logically implies AB  C. Example: Given F = {A  C, AB  CD}  C is extraneous in AB  CD since AB  C can be inferred even after deleting C

20 ©Silberschatz, Korth and Sudarshan7.20Database System Concepts Canonical Cover A canonical cover for F is a set of dependencies F c such that  F logically implies all dependencies in F c, and  F c logically implies all dependencies in F, and  No functional dependency in F c contains an extraneous attribute, and  Each left side of functional dependency in F c is unique. To compute a canonical cover for F: F c =F repeat Use the union rule to replace any dependencies in F c  1   1 and  1   2 with  1   1  2 Find a functional dependency    with an extraneous attribute either in  or in  If an extraneous attribute is found, delete it from    until F c does not change

21 ©Silberschatz, Korth and Sudarshan7.21Database System Concepts Example R = (A, B, C) F = { A  BC B  C A  B AB  C } Combine A  BC and A  B into A  BC  Set is now {A  BC, B  C, AB  C} A is extraneous in AB  C because B  C logically implies AB  C.  Set is now {A  BC, B  C} C is extraneous in A  BC since A  BC is logically implied by A  B and B  C. The canonical cover is: A  B B  C Note: canonical cover might not be unique.

22 ©Silberschatz, Korth and Sudarshan7.22Database System Concepts Decomposition Decompose the relation schema Lending-schema into: Branch-schema = (branch-name, branch-city,assets) Loan-info-schema = (customer-name, loan-number, branch-name, amount) All attributes of an original schema (R) must appear in the decomposition (R 1, R 2 ): R = R 1  R 2 Lossless-join decomposition. For all possible relations r on schema R r =  R1 (r)  R2 (r) A decomposition of R into R 1 and R 2 is lossless join if and only if at least one of the following dependencies is in F + :  R 1  R 2  R 1  R 1  R 2  R 2

23 ©Silberschatz, Korth and Sudarshan7.23Database System Concepts Normalization Using Functional Dependencies When we decompose a relation schema R with a set of F into R 1, R 2,.., R n we want  Lossless-join decomposition: Otherwise decomposition would result in information loss.  No redundancy: The relations R i preferably should be in either Boyce- Codd Normal Form or Third Normal Form.  Dependency preservation: Let F i be the set of dependencies F + that include only attributes in R i.  Preferably the decomposition should be dependency preserving, that is, (F 1  F 2  …  F n ) + = F +  Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive.

24 ©Silberschatz, Korth and Sudarshan7.24Database System ConceptsExample R = (A, B, C) F = {A  B, B  C) R 1 = (A, B), R 2 = (B, C)  Lossless-join decomposition: R 1  R 2 = {B} and B  BC  Dependency preserving R 1 = (A, B), R 2 = (A, C)  Lossless-join decomposition: R 1  R 2 = {A} and A  AB  Not dependency preserving (cannot check B  C without computing R 1 R 2 )

25 ©Silberschatz, Korth and Sudarshan7.25Database System Concepts Boyce-Codd Normal Form     is trivial (i.e.,    )  is a superkey for R A relation schema R is in BCNF with respect to a set F of FDs if for all functional dependencies in F + of the form    , where   R and   R, at least one of the following holds:

26 ©Silberschatz, Korth and Sudarshan7.26Database System ConceptsExample R = (A, B, C) F = {A  B B  C} Key = {A} R is not in BCNF Decomposition R 1 = (A, B), R 2 = (B, C)  R 1 and R 2 in BCNF  Lossless-join decomposition  Dependency preserving

27 ©Silberschatz, Korth and Sudarshan7.27Database System Concepts Third Normal Form There are some situations where  BCNF is not dependency preserving, and  Efficient checking for FD violation on updates is important Solution: Define a weaker normal form, called Third Normal Form.  Allows some redundancy  But, FDs can be checked on individual relations without computing a join.  There is always a lossless-join, dependency-preserving decomposition into 3NF.

28 ©Silberschatz, Korth and Sudarshan7.28Database System Concepts A relation schema R is in third normal form (3NF) if for all:    in F + at least one of the following holds:     is trivial (i.e.,    )   is a superkey for R  Each attribute A in  –  is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key) If a relation is in BCNF it is in 3NF. Third condition is a minimal relaxation of BCNF to ensure dependency preservation 3NF (Cont.)

29 ©Silberschatz, Korth and Sudarshan7.29Database System Concepts R = (J, K, L) F = {JK  L, L  K} Two candidate keys: JK and JL R is in 3NF JK  LJK is a superkey L  KK is contained in a candidate keyExample

30 ©Silberschatz, Korth and Sudarshan7.30Database System Concepts Design Goals Goal for a relational database design is:  BCNF  Lossless join  Dependency preservation If we cannot achieve this, we accept one of  Lack of dependency preservation  Redundancy due to use of 3NF


Download ppt "©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design."

Similar presentations


Ads by Google