Presentation is loading. Please wait.

Presentation is loading. Please wait.

Database Management Systems Course 236363 Faculty of Computer Science Technion – Israel Institute of Technology Lecture 7: Schema Normalization.

Similar presentations


Presentation on theme: "Database Management Systems Course 236363 Faculty of Computer Science Technion – Israel Institute of Technology Lecture 7: Schema Normalization."— Presentation transcript:

1 Database Management Systems Course 236363 Faculty of Computer Science Technion – Israel Institute of Technology Lecture 7: Schema Normalization

2 Schema Anomalies Redundant storage –Repeatedly storing the same information Update anomaly –To change a repeated item, every occurrence should be changed Insertion anomaly –Some information cannot be stored without additional (possibly unavailable) information Deletion anomaly –Some information cannot be deleted without deleting additional (possibly desired) information studentemailcourseyearsemesterlecturer Almaalma@csDB2013SpringEldar Amiramir@eeDB2014WinterRoy

3 From ERD to Normalization We have learned how to design schemas using ERDs But it is often not enough for a proper translation into well designed relations ERD is limited in constraint representation; we need a more careful design to enforce such constraints It may be challenging to avoid anomalies when dependencies are complicated 3

4 Example name id addressStudent Enroll Track date name faculty campus Consult Consultant

5 Example A track has at most one consultant per faculty A track is contained in a single campus A consultant belongs to a single campus and faculty A faculty is in a single campus trackconsultantcampusfaculty track,facultyconsultant trackcampus consultantcampus,faculty trackconsultantfaculty facultycampus trackcampus What makes it “good”? Are there principles to follow? Can design be automated? Trackname faculty campus Consult Consultant facultycampus

6 The Refined Design Process (Normalization) 6 1. Define the involved attributes 2. Determine what dependencies hold in real life 3. Decide on desired properties 4. Decompose into multiple good (“normalized”) schemas 2a. Determine implied dependencies Which FDs allowed? Should all dependencies be enforced?

7 Notation During this lecture, we focus on schemas of a special type: a single relation over a set U of attributes, and a set F of FDs So, during this lecture a schema is simply a pair (U,F) where: – U is a finite set of attributes – F is a set of FDs over U –(In particular, we ignore the relation name and order among attributes) 7

8 Basic Terminology Let (U,F) be a schema Recall: A superkey is a set K of attributes such that K + contains every attribute in U Recall: A key is a superkey K that does not contain any other superkey –That is, if Y ⊊ K then Y is not a superkey Attributes of keys are called prime “Schema normalization” deals with the relationship between keys, prime attributes and nonprime attributes 8

9 History of Normal Forms 9 1970 1NF 2NF Nonprime attributes are not dependent on strict parts of keys Codd 3NF DB looks like a logical structure; assumed by default “Standard” normal form: a nonprime attribute can be determined only by a superkey 1971 BCNF 1974 Boyce & Codd 1977 4NF Fagin 1979 5NF A relation does not involve any “implicit” joins No nontrivial MVDs except for superkey No nontrivial FDs except for superkeys

10 Introduction Normal Forms  BCNF  3NF Decomposition  NF Decompositions  Preserving Data  Preserving Dependencies Decomposition Algorithms  BCNF  3NF  Note on 4NF Outline 10

11 Our Focus We mainly focus on BCNF and 3NF –Historically BCNF came after 3NF, but we start with BCNF since it is simpler In the end we will briefly review 4NF

12 Introduction Normal Forms  BCNF  3NF Decomposition  NF Decompositions  Preserving Data  Preserving Dependencies Decomposition Algorithms  BCNF  3NF  Note on 4NF Outline 12

13 Boyce-Codd Normal Form (BCNF) A schema (U,F) is in BCNF if every nontrivial FD implied by F has a superkey on its premise (lhs) That is, every XY in F + is such that – X is a superkey; or – Y ⊆ X

14 Examples 14 name, symbol,dean namesymbol, symboldean, deanname Faculty: BCNF follows,followed,fid follows,followedfid, fidfollows,followed Social network: BCNF track,faculty,consultant,campus Tracks: not BCNF state,city,street,zip state,city,streetzip, zipstate Address: not BCNF track,facultyconsultant, consultantfaculty, trackcampus, facultycampus

15 Can BCNF be Tested Efficiently? On the face of it, we need to consider every derived FD (exponentially many); however: T HEOREM : The following are equivalent: 1.The schema (U,F) is in BCNF (i.e., every nontrivial XY in F + is such that X is a superkey) 2.In every nontrivial XY in F, X is a superkey Hence, it suffices to check F Proof not given –But which direction is straightforward? So what would be an efficient BCNF testing? 15 Answer: For each XY in F, test whether U=Closure(F,X)

16 Introduction Normal Forms  BCNF  3NF Decomposition  NF Decompositions  Preserving Data  Preserving Dependencies Decomposition Algorithms  BCNF  3NF  Note on 4NF Outline 16

17 Third Normal Form (3NF) Recall: an attribute A is prime if it is a part of some key A schema is in 3NF if for every nonprime A and nontrivial derived XA, the set X is a superkey Equivalently, for every XA in F + at least one of the following holds: – X is a superkey –A∈X–A∈X – A is prime 17

18 Examples 18 name, symbol,dean namesymbol, symboldean, deanname Faculty: BCNF follows,followed,fid follows,followedfid, fidfollows,followed Social network: BCNF state,city,street,zip state,city,streetzip, zipstate Address: not BCNF 3NF not 3NF track,faculty,consultant,campus Tracks: not BCNF track,facultyconsultant, consultantfaculty, trackcampus, facultycampus

19 Testing 3NF The following algorithm works: For every nontrivial FD XY in F 1.Check whether X is a superkey 2.Check whether every attribute in Y\X is prime As we know, (1) can be tested efficiently What about (2)? –It is NP-complete! (hence, it is unlikely that it is solvable in polynomial time) And in fact, testing whether a schema is in 3NF is an NP-complete problem [Jou, Fischer1982] 19

20 Introduction Normal Forms  BCNF  3NF Decomposition  NF Decompositions  Preserving Data  Preserving Dependencies Decomposition Algorithms  BCNF  3NF  Note on 4NF Outline 20

21 Decomposition We can fix a “badly designed” schema by decomposing it into several smaller schemas But we need to do so correctly! –Do not change our intended information –Do not violate the FDs –Get a “well designed” fixed schema In this part, we will make the above formal First, we need a notation 21

22 Restricting a Set of FDs Let (U,F) be a schema, and let W be a subset of U We denote by F[W] the set of all the FDs XY in F + such that XY ⊆ W 22

23 Formal Definition A decomposition of a schema (U,F) is a collection (X 1,F 1 ), …, (X k,F k ) of schemas such that: – U = X 1 ∪⋯∪ X k That is, the X i cover all the attributes in U –For i=1,…,k we have (F i ) + =F + [X i ]= F [X i ] That is, each F i consists of the FDs imposed by F on X i 23

24 Decomposing and Composing Relations 24 A π X1X1 X2X2 X3X3 ⋯ XkXk r r1r1 r2r2 r3r3 rkrk (U,F)(X 1,F 1 )(X 2,F 2 )(X 3,F 3 )(X k,F k )

25 Representing F i Given the schema (U,F), it suffices to represent a decomposition using the collection {X 1,…, X k } without mentioning the FDs F i  Since F i can be F [X i ] up to equivalence Problem: naively constructing F i as F + [X i ] can be impractical, since F + and F + [X i ] can be exponentially larger than U –This problem is unavoidable: It may be that F + [X i ] is not equivalent to any sub-exponential #FDs! –We keep this problem in mind – we will not assume that F + [X i ] can be instantiated efficiently 25

26 Introduction Normal Forms  BCNF  3NF Decomposition  NF Decompositions  Preserving Data  Preserving Dependencies Decomposition Algorithms  BCNF  3NF  Note on 4NF Outline 26

27 Obtaining Normal Forms Let N be a normal form (e.g., 3NF, BCNF) An N decomposition of a schema (U,F) is a decomposition {X 1,…,X k } of (U,F) such that each (X i,F[X i ]) is in N We will discuss 3NF decompositions and BCNF decompositions 27

28 Examples 28 ABCD AB, BC, ABCD, DB BC BCBC BD AD DBDBADAD 3NF decomposition? BCNF decomposition? ABCD ABC AD ABC CB ABC CB ABCD ABC ACD ABBCACABBCAC AB BC CD CD ACD Answer: BCNF, 3NF Answer: 3NF, not BCNF Answer: not 3NF, not BCNF

29 Introduction Normal Forms  BCNF  3NF Decomposition  NF Decompositions  Preserving Data  Preserving Dependencies Decomposition Algorithms  BCNF  3NF  Note on 4NF Outline 29

30 Good Decomposition? 30 personbuildingroom AlmaTaub152 AmirMeyer35 AhuvaMeyer246 personbuilding,room personbuilding AlmaTaub AmirMeyer AhuvaMeyer buildingroom Taub152 Meyer35 Meyer246

31 Lossless Decomposition Let {X 1,...,X k } be a decomposition of (U,F) We say that {X 1,...,X k } is a lossless decomposition of (U,F) if for all relations r over (U,F) we have: π X 1 (r) ⋯ π X k (r) = r Containment in one direction always holds: π X 1 (r) ⋯ π X k (r) ⊇ r What about the other direction? Depends on F ! 31

32 Example 1 32 personbuildingroom AlmaTaub152 AmirMeyer35 AhuvaMeyer246 personbuilding,room personbuilding AlmaTaub AmirMeyer AhuvaMeyer buildingroom Taub152 Meyer35 Meyer246 =? personbuilding AlmaTaub AmirMeyer AhuvaMeyer =? personroom Alma152 Amir35 Ahuva246 =? personroom Alma152 Amir35 Ahuva246 buildingroom Taub152 Meyer35 Meyer246

33 Example 2 33 personbuildingroom AlmaTaubt152 AmirMeyerm35 AhuvaMeyerm246 personbuilding,room roombuilding personbuilding AlmaTaub AmirMeyer AhuvaMeyer buildingroom Taubt152 Meyerm35 Meyerm246 =? personbuilding AlmaTaub AmirMeyer AhuvaMeyer =? personroom Almat152 Amirm35 Ahuvam246 =? personroom Almat152 Amirm35 Ahuvam246 buildingroom Taubt152 Meyerm35 Meyerm246

34 Decision Algorithm The definition of lossless is not constructive (check every possible relation) Next, we present a polynomial-time algorithm for this decision problem 34 Losslessness Testing Given:Goal: U, F, X 1,...,X k {X 1,...,X k } is a decomposition of (U,F) Determine whether {X 1,...,X k } is a lossless decomposition

35 The Case of Binary Decomposition T HEOREM : Let {X 1,X 2 } be a decomposition of (U,F). The following are equivalent: 1.F ⊨ X 1 ∩X 2 X 1 or F ⊨ X 1 ∩X 2 X 2 2.{X 1,X 2 } is a lossless decomposition 35 So what would be a decision algorithm in this case? Answer: test whether Closure(F, X 1 ∩X 2 ) contains either X 1 or X 2

36 Proof: 1 ⇒ 2 36 1.F ⊨ X 1 ∩X 2 X 1 or F ⊨ X 1 ∩X 2 X 2 2.{X 1,X 2 } is a lossless decomposition 1.F ⊨ X 1 ∩X 2 X 1 or F ⊨ X 1 ∩X 2 X 2 2.{X 1,X 2 } is a lossless decomposition A1A1 A2A2 A3A3 A4A4 A5A5 A1A1 A2A2 A3A3 a1a1 a2a2 a3a3 A2A2 A3A3 A4A4 A5A5 a2a2 a3a3 a4a4 a5a5 A1A1 A2A2 A3A3 A4A4 A5A5 a1a1 a2a2 a3a3 a4a4 a5a5 t A1A1 A2A2 A3A3 A4A4 A5A5 a1a1 a2a2 a3a3 x 14 x 15 x 21 a2a2 a3a3 a4a4 a5a5 We know that this is a subset of r, for some x’s F ⊨ A 2 A 3 A 4 A 5 A1A1 A2A2 A3A3 A4A4 A5A5 a1a1 a2a2 a3a3 a4a4 a5a5 x 21 a2a2 a3a3 a4a4 a5a5 F ⊨ A 2 A 3 A 1 A1A1 A2A2 A3A3 A4A4 A5A5 a1a1 a2a2 a3a3 x 14 x 15 a1a1 a2a2 a3a3 a4a4 a5a5 In any case, we had t to begin with... hence lossless π t t We need to prove that t is here!

37 Proof: not 1 ⇒ not 2 37 Let Y=(X 1 ∩X 2 ) + and suppose that X 1 ⊈ Y, X 2 ⊈ Y Construct a relation r={t,u} over U : – t[Y]=u[Y]=(0,...,0) – t[U\Y]=(1,...,1) ; u[U\Y]=(2,...,2) Claim 1: r ⊨ F –Proof similar to completeness of Armstrong’s axioms Claim 2: π X 1 (r) π X 2 (r) ≠ r –The join contains a row with both 1 s and 2 s 1.F ⊨ X 1 ∩X 2 X 1 or F ⊨ X 1 ∩X 2 X 2 2.{X 1,X 2 } is a lossless decomposition 1.F ⊨ X 1 ∩X 2 X 1 or F ⊨ X 1 ∩X 2 X 2 2.{X 1,X 2 } is a lossless decomposition A1A1 A2A2 A3A3 A4A4 A5A5 10001 20002 X1X1 X2X2 Y t u

38 Proof of Claim 1 Suppose not(r ⊨ F ). There exists X  A such that F ⊨ X  A that is violated. For a violation, it must be Y ⊇ X. But Y=Y +. So Y ⊇ X +. So A in Y. Thus there is no violation. Contradiction. 38

39 Illustration: not 1 ⇒ not 2 39 A1A1 A2A2 A3A3 A4A4 A5A5 10001 20002 A1A1 A2A2 A3A3 100 200 A2A2 A3A3 A4A4 A5A5 0001 0002 A1A1 A2A2 A3A3 A4A4 A5A5 10001 20002 10002 20001 1.F ⊨ X 1 ∩X 2 X 1 or F ⊨ X 1 ∩X 2 X 2 2.{X 1,X 2 } is a lossless decomposition 1.F ⊨ X 1 ∩X 2 X 1 or F ⊨ X 1 ∩X 2 X 2 2.{X 1,X 2 } is a lossless decomposition A 2 A 4, A 2 A 5 A 1 π

40 The General Case Next, we handle the general case of a decomposition (2 #schemas) 40 Losslessness Testing Given:Goal: U, F, X 1,...,X k {X 1,...,X k } is a decomposition of (U,F) Determine whether {X 1,...,X k } is a lossless decomposition

41 The Idea 41 A1A1 A2A2 A3A3 A4A4 A5A5 A1A1 A3A3 a1a1 a3a3 A2A2 A3A3 A4A4 a2a2 a3a3 a4a4 A4A4 A5A5 a4a4 a5a5 A1A1 A2A2 A3A3 A4A4 A5A5 a1a1 a2a2 a3a3 a4a4 a5a5 t We need to prove that t is here! A1A1 A2A2 A3A3 A4A4 A5A5 a1a1 x 12 a3a3 x 14 x 15 x 21 a2a2 a3a3 a4a4 x 25 x 31 x 32 x 33 a4a4 a5a5 We know that this is a subset of r, for some x’s r But some of the x’s may be known due to the FDs! F={A 3 A 5, A 4 A 5, A 5 A 2 A 4 } A1A1 A2A2 A3A3 A4A4 A5A5 a1a1 x 12 a3a3 x 14 x 25 x 21 a2a2 a3a3 a4a4 x 25 x 31 x 32 x 33 a4a4 a5a5 A1A1 A2A2 A3A3 A4A4 A5A5 a1a1 x 12 a3a3 x 14 a5a5 x 21 a2a2 a3a3 a4a4 a5a5 x 31 x 32 x 33 a4a4 a5a5 A1A1 A2A2 A3A3 A4A4 A5A5 a1a1 a2a2 a3a3 a4a4 a5a5 x 21 a2a2 a3a3 a4a4 a5a5 x 31 x 32 x 33 a4a4 a5a5 t π A3A5A3A5 A4A5A4A5 A5A2A4A5A2A4

42 The General Case 1 st step: create the “known subset” –A table over U, one tuple t i for each X i : t i [A j ]=a j if X i contains A j, and t i [A j ]=x ij otherwise 2 nd step: chase –While the table changes do: Look for an FD violation and equate the conclusions “Equate” = change every occurrence of one to the other When equating a j with x ij, change x ij to a j 3 nd step: Return true iff there is a row of a i ’s 42 Losslessness Testing Given:Goal: U, F, X 1,...,X k {X 1,...,X k } is a decomposition of (U,F) Determine whether {X 1,...,X k } is a lossless decomposition

43 Example 43 A1A1 A2A2 A3A3 A4A4 A5A5 A1A1 A3A3 A2A2 A3A3 A4A4 A4A4 A5A5 Step 1: construct the known subset A1A1 A2A2 A3A3 A4A4 A5A5 a1a1 x 12 a3a3 x 14 x 15 x 21 a2a2 a3a3 a4a4 x 25 x 31 x 32 x 33 a4a4 a5a5 F = {A 3 A 5, A 4 A 5, A 5 A 2 A 4 } A1A1 A2A2 A3A3 A4A4 A5A5 a1a1 x 12 a3a3 x 14 x 25 x 21 a2a2 a3a3 a4a4 x 25 x 31 x 32 x 33 a4a4 a5a5 A1A1 A2A2 A3A3 A4A4 A5A5 a1a1 x 12 a3a3 x 14 a5a5 x 21 a2a2 a3a3 a4a4 a5a5 x 31 x 32 x 33 a4a4 a5a5 A1A1 A2A2 A3A3 A4A4 A5A5 a1a1 a2a2 a3a3 a4a4 a5a5 x 21 a2a2 a3a3 a4a4 a5a5 x 31 x 32 x 33 a4a4 a5a5 Step 2: chase Step 3: return true

44 Think Why is this algorithm terminating in polynomial time? 44 Answer: Each iteration eliminates one symbol, and we have a polynomial #symbols

45 Introduction Normal Forms  BCNF  3NF Decomposition  NF Decompositions  Preserving Data  Preserving Dependencies Decomposition Algorithms  BCNF  3NF  Note on 4NF Outline 45

46 Preserving Dependencies 46 A π X1X1 X2X2 X3X3 ⋯ XkXk r r1r1 r2r2 r3r3 rkrk (U,F)(X 1,F 1 )(X 2,F 2 )(X 3,F 3 )(X k,F k ) Is F preserved given that each F i is preserved in each relation?

47 Example 1 47 ABCD AB, BC, ABCD, DB BC BCBC BD AD DBDB ADAD Are dependencies preserved in this decomposition? {AD,BD,BC} Answer: Yes!

48 Example 2 48 ABC BC AC CBCB ABC CB Are dependencies preserved in this decomposition? Is there any decomposition into binary schemas where dependencies are preserved? {BC,AC} Answer: No!

49 Formal Definition A decomposition X 1, …,X k of (U,F) is dependency preserving if for all relations r 1,...,r k over (X 1,F [X 1 ]), …, (X k,F [X k ]), respectively, r 1 ⋯ r k satisfies F Can we test whether a given decomposition has this property? T HEOREM : The following are equivalent: 1.For all r 1,...,r k over (X 1,F [X 1 ]),…,(X k,F [X k ]), respectively, the join r 1 ⋯ r k satisfies F 2.F + = (F [X 1 ] ∪⋯∪ F [X k ]) + 49

50 Testing for Dependency Preservation We need to test whether F + = (F 1 ∪⋯∪ F k ) + F + ⊇ F 1 ∪⋯∪ F k, so F + ⊇ (F 1 ∪⋯∪ F k ) + So, need to test whether F + ⊆ (F 1 ∪⋯∪ F k ) + It suffices to test whether each XY in F is implied by F 1 ∪⋯∪ F k –Or in other words, whether Y is a subset of the closure of X under F 1 ∪⋯∪ F k Next slide: efficient computation of the closure of X under F 1 ∪⋯∪ F k 50 Without explicitly calculating the F i ’s

51 Basic Claim If Z ⊆ X i then ( Z + F ∩ X i ) = Z + Fi 51

52 Closure w.r.t. a Decomposition 52 ClosureDecomp(X,F,X 1,...,X k ) { Y := X while(Y changes) for(i=1,...k) Y := Y ∪ ( Closure(Y∩X i,F) ∩ X i ) return Y } ClosureDecomp(X,F,X 1,...,X k ) { Y := X while(Y changes) for(i=1,...k) Y := Y ∪ ( Closure(Y∩X i,F) ∩ X i ) return Y } Given:Goal: U, F, X 1,...,X k, X {X 1,...,X k } is a decomposition of (U,F) X ⊆ U Compute the closure of X under F[X 1 ] ∪⋯∪ F[X k ]

53 Testing for Dependency Preservation 53 DepPreserving(X 1,...,X k,F) { for all (XY in F) if(Y ⊈ ClosureDecomp(X,F,X 1,...,X k )) return false return true } DepPreserving(X 1,...,X k,F) { for all (XY in F) if(Y ⊈ ClosureDecomp(X,F,X 1,...,X k )) return false return true } Dependency Preservation Testing Given:Goal: U, F, X 1,...,X k {X 1,...,X k } is a decomposition of (U,F) Determine whether {X 1,...,X k } is dependency preserving

54 Introduction Normal Forms  BCNF  3NF Decomposition  NF Decompositions  Preserving Data  Preserving Dependencies Decomposition Algorithms  BCNF  3NF  Note on 4NF Outline 54

55 Decomposition Algorithms Given a normal form N, we ask: –Is there always a lossless N decomposition? –Is there always a lossless & dependency preserving N decomposition? –Is there an efficient decomposition? We discuss 2 decomposition algorithms –BCNF decomposition Lossless –3NF decomposition Lossless, dependency preserving, p-time

56 Introduction Normal Forms  BCNF  3NF Decomposition  NF Decompositions  Preserving Data  Preserving Dependencies Decomposition Algorithms  BCNF  3NF  Note on 4NF Outline 56

57 Key Insight Recall: BCNF means that in every nontrivial XY, the set X is a superkey C LAIM : If (U,F) is not in BCNF, then there is a lossless decomposition {X 1,X 2 } with X 1,X 2 ⊊ U Proof: –Let XY be a BCNF violation ( X is not a superkey and Y is not a subset of X ) –Take X 1 =X + and X 2 =X ∪ (U\X + ) –Why are X 1 and X 2 strict subsets of U ? –Why lossless? Recall the theorem on binary lossless decompositions... 57

58 BCNF Decomposition 58 BCNFDec(U,F) { if ((U,F) in BCNF) return {U} Find a BCNF violation XY X 1 := Closure(X,F) F 1 := F + [X 1 ] X 2 := X ∪ (U\X 1 ) F 2 := F + [X 2 ] return BCNFDec(X 1,F 1 ) ∪ BCNFDec(X 2,F 2 ) } BCNFDec(U,F) { if ((U,F) in BCNF) return {U} Find a BCNF violation XY X 1 := Closure(X,F) F 1 := F + [X 1 ] X 2 := X ∪ (U\X 1 ) F 2 := F + [X 2 ] return BCNFDec(X 1,F 1 ) ∪ BCNFDec(X 2,F 2 ) }

59 Execution Example 59 ABCD AB, BC, ABCD, DB BCBC BCBC ABDABD ABDABD BCBCABD, DB BDBD BDBD ADAD ADAD DBDBADAD Are dependencies preserved in this decomposition? {AD,BD,BC} Answer: Yes, we already saw that previously

60 About the Algorithm Lossless –Proof idea: every step is lossless Exponential time in the worst case There is a polynomial-time algorithm for BCNF decomposition –[Tsou, Fischer, Decomposition of a relation scheme into Boyce-Codd Normal Form, 1982] The algorithm does not preserve dependencies! –But the problem is not with the algorithm... 60

61 Can Dependencies be Preserved? 61 ABC BC AC CBCB No BCNF decomposition of this schema preserves both dependencies (why?) Conclusion: Lossless BCNF decomposition is always possible; lossless & dependency-preserving BCNF decomposition may be impossible ABC CB

62 Introduction Normal Forms  BCNF  3NF Decomposition  NF Decompositions  Preserving Data  Preserving Dependencies Decomposition Algorithms  BCNF  3NF  Note on 4NF Outline 62

63 Algorithm for 3NF Decomposition We next describe an algorithm for 3NF decomposition First, some intuition 63

64 Intuition F={ AB, ABC, CB, DC } Idea: for dependency preservation, each XA becomes a schema AB ABC CD ABABABC CB BC CBCB DCDC Problem: not in 3NF Solution: minimal cover instead of F Problem: lossy Solution: add a key AD ABCD 1001 2002 ABC 100 200 CD 01 02 64

65 Reminder: Minimal Cover Let F be a set of FDs A minimal cover of F is a set G of FDs such that G + =F + with the following properties: –FDs in G have a single attribute on the right hand side; that is, they have the form XA –All FDs are required: no FD XA in G is such that G\{XA} ⊨ XA –All attributes are required: no FD XBA in G is such that G ⊨ XA 65

66 Producing A Minimal Cover Transform the FDs in F - a single attribute on the r.h.s., let G be the result Do 1.If there is a FD X  A in G such that A in CLOSURE(X, G \ {X  A} ) delete X  A from G. 2.If there is XB  A in G such that A in CLOSURE(X, G), replace XB  A with X  A in G. Until there are no more changes to G G is a minimal cover for the original F. Can show (1 2)* equivalent to 2* 1* but not to 1* 2* 66

67 Revised Example F={ AC, CB, DC } AC CD ACAC BC CBCB DCDC AD ABCD 1001 2002 { AB, ABC, CB, DC } min-cover AC 10 20 BC 00 AD 11 22 CD 01 02

68 68

69 Algorithm for 3NF Decomposition 69 3NFDec(U,F) { D = G := MinCover(F) for all (XA in G) do D := D ∪ {XA} if (no set in D is a superkey) D := D ∪ {FindKey(U,F)} D := RemoveConained(D) return D } 3NFDec(U,F) { D = G := MinCover(F) for all (XA in G) do D := D ∪ {XA} if (no set in D is a superkey) D := D ∪ {FindKey(U,F)} D := RemoveConained(D) return D } No need for schemas contained in others

70 About the Proof We will not prove the correctness here Still, what needs to be proved? –Resulting schemas are all in 3NF Follows from minimality of the cover –Dependencies are preserved Straightforward: all dependencies of the minimal cover are presented –Lossless What would the lossless-testing algorithm do when one X i is a key and dependencies are preserved? 70

71 Example Revisited 71 min-cover track,facultyconsultant trackcampus consultantfaculty facultycampus trackcampustrackconsultantfacultyconsultantfaculty removed contained trackconsultantfaculty trackconsultantcampusfaculty track,facultyconsultant trackcampus consultantcampus,faculty facultycampus facultycampus trackcampusfacultycampus

72 72

73 73

74 A problem? 74

75 Introduction Normal Forms  BCNF  3NF Decomposition  NF Decompositions  Preserving Data  Preserving Dependencies Decomposition Algorithms  BCNF  3NF  Note on 4NF Outline 75

76 Fourth Normal Form (4NF) Recall: An MVD has the form X ↠ Y where X and Y are disjoint sets of attributes –For every two tuples that agree on X, swapping their Y component doesn’t change the relation Recall: An MVD X ↠ Y is trivial (always holds) if and only if Y= ∅ or Y=U\X Recall: an FD XY can be viewed as a special type of the MVD X ↠ Y (why?) A schema (U,F), where F contains both FDs and MVDs, is in 4NF if every nontrivial FD/MVD has a superkey in its premise (lhs) –When all dependencies are FDs, same as BCNF

77 4NF Decomposition T HEOREM : Let (U,F) be a schema, where F contains both FDs and MVDs. Then F satisfies X ↠ Y iff for all relations r over U we have: r = π X ∪ Y (r) π X ∪ (U\Y) (r) Hence, the recursive decomposition algorithm for BCNF decomposition works here – Decompose(X ∪ Y) ∪ Decompose(X ∪ (U\Y)) –A polynomial time is known for special cases In particular, there is always a lossless 4NF decomposition –What about dependency preserving? –Answer: No! Even if there are only FDs (recall BCNF) 77


Download ppt "Database Management Systems Course 236363 Faculty of Computer Science Technion – Israel Institute of Technology Lecture 7: Schema Normalization."

Similar presentations


Ads by Google