Published by Darren Bailey. Modified over 8 years ago.
Outline: functional dependencies; modification anomalies; major normal forms; relationship independence; practical concerns.
Functional dependencies. X -> Y is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X, they must also agree on all the attributes in Y. X and Y are both sets of attributes (Y is often a single attribute). We say “X -> Y holds in R.” Convention: …, X, Y, Z represent sets of attributes; A, B, C, … represent single attributes. Convention: no set formers in sets of attributes; we write just ABC rather than {A,B,C}.
Figure: two tuples t and u of R over attributes X, Y, Z. Tuple t is (a, b, c); u agrees with t on X (value a), so its Y value must be b, while its Z value could be anything. The figure suggests what the FD X -> Y tells us about two tuples t and u in R: if t and u agree on the X's, they must also agree on the Y's. The X's and Y's can be anywhere in the schema; they need not appear consecutively, and the X's need not precede the Y's.
Title | Year | Length | Genre | StudioName | StarName
Star Wars | 1977 | 124 | Sci-Fi | Fox | Carrie Fisher
Star Wars | 1977 | 124 | Sci-Fi | Fox | Mark Hamill
Star Wars | 1977 | 124 | Sci-Fi | Fox | Harrison Ford
Gone with the Wind | 1939 | 231 | Drama | MGM | Vivien Leigh
Wayne’s World | 1992 | 95 | Comedy | Paramount | Dana Carvey
Wayne’s World | 1992 | 95 | Comedy | Paramount | Mike Meyers
The table above is an instance of a relation Movies1(title, year, length, genre, studioName, starName). This relation tries to do too much, and it is not a good design. To see what is wrong with the design, we need to determine the FDs that hold in this relation. The FD title year -> length genre studioName holds, because we assume that no two movies with the same title are made in the same year.
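An FD is a claim about every legal instance of a relation, so a single instance can refute an FD but can never prove it. The Python sketch below checks whether the Movies1 instance above is consistent with a given FD. The helper name `fd_holds` and the one-dictionary-per-row representation are assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

Row = Dict[str, object]


def fd_holds(rows: Sequence[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return False if two rows agree on every lhs attribute but differ on some rhs attribute."""
    first_seen: Dict[Tuple[object, ...], Tuple[object, ...]] = {}
    for row in rows:
        x_values = tuple(row[a] for a in lhs)
        y_values = tuple(row[a] for a in rhs)
        # Remember the rhs values of the first row with these lhs values;
        # any later row with the same lhs values must match them.
        if first_seen.setdefault(x_values, y_values) != y_values:
            return False
    return True


COLUMNS = ("title", "year", "length", "genre", "studioName", "starName")
MOVIES1 = [
    dict(zip(COLUMNS, values))
    for values in [
        ("Star Wars", 1977, 124, "Sci-Fi", "Fox", "Carrie Fisher"),
        ("Star Wars", 1977, 124, "Sci-Fi", "Fox", "Mark Hamill"),
        ("Star Wars", 1977, 124, "Sci-Fi", "Fox", "Harrison Ford"),
        ("Gone with the Wind", 1939, 231, "Drama", "MGM", "Vivien Leigh"),
        ("Wayne's World", 1992, 95, "Comedy", "Paramount", "Dana Carvey"),
        ("Wayne's World", 1992, 95, "Comedy", "Paramount", "Mike Meyers"),
    ]
]

if __name__ == "__main__":
    # Consistent with this instance: prints True.
    print(fd_holds(MOVIES1, ["title", "year"], ["length", "genre", "studioName"]))
    # Refuted by the Star Wars tuples: prints False.
    print(fd_holds(MOVIES1, ["title", "year"], ["starName"]))
```

The second call fails because one movie has several stars, which is the redundancy that makes Movies1 a poor design.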
Splitting right sides of FDs: X -> A1A2…An holds for R exactly when each of X -> A1, X -> A2, …, X -> An holds for R. Example: A -> BC is equivalent to A -> B and A -> C. There is no splitting rule for left sides. We’ll generally express FDs with singleton right sides.
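The splitting rule can be sketched as a small transformation on FDs written in the single-letter shorthand of these slides. The representation of an FD as a pair of frozensets and the helper `fd` are assumptions for illustration only.

```python
from typing import FrozenSet, List, Tuple

# An FD as (left side, right side); single-letter attribute names follow the slides.
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Hypothetical helper: fd("A", "BC") stands for A -> BC."""
    return frozenset(lhs), frozenset(rhs)


def split_rhs(fds: List[FD]) -> List[FD]:
    """Splitting rule: X -> A1...An becomes X -> A1, ..., X -> An.

    Left sides are kept intact, because there is no splitting rule for them.
    """
    return [(lhs, frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]


if __name__ == "__main__":
    for lhs, rhs in split_rhs([fd("A", "BC"), fd("AB", "C")]):
        print("".join(sorted(lhs)), "->", "".join(rhs))
    # A -> B
    # A -> C
    # AB -> C
```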
Example: Consumers(name, addr, candiesLiked, manf, favCandy). Reasonable FDs to assert:
1. name -> addr favCandy. Note that this FD is the same as name -> addr and name -> favCandy.
2. candiesLiked -> manf.
Trivial FDs. An FD A1A2…An -> B1B2…Bm is trivial if {B1,…,Bm} is a subset of {A1,…,An}. For example, title year -> title is trivial, and so is title -> title. A non-trivial example on Student(sid, name, supervisor_id, specialization) is {supervisor_id} -> {specialization}. Non-trivial FDs are given implicitly in the form of constraints when designing a database; here, a student’s specialization must be the same as that of the supervisor. They constrain the set of legal relation instances: if we try to insert two students under the same supervisor with different specializations, a DBMS that enforces this constraint will reject the insertion.
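The definition of a trivial FD reduces to a subset test. The sketch below also includes a hypothetical helper, `nontrivial_part`, which keeps only the right-side attributes that are not already on the left; dropping the others does not change what the FD asserts.

```python
from typing import Iterable, Set


def is_trivial(lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """An FD is trivial when its right side is a subset of its left side."""
    return set(rhs) <= set(lhs)


def nontrivial_part(lhs: Iterable[str], rhs: Iterable[str]) -> Set[str]:
    """Right-side attributes that are not already on the left side."""
    return set(rhs) - set(lhs)


if __name__ == "__main__":
    print(is_trivial({"title", "year"}, {"title"}))                 # True
    print(is_trivial({"title"}, {"title"}))                         # True
    print(is_trivial({"supervisor_id"}, {"specialization"}))        # False
    print(nontrivial_part({"title", "year"}, {"title", "length"}))  # {'length'}
```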
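Example data for Consumers:
name | addr | candiesLiked | manf | favCandy
Janeway | Voyager | Twizzlers | Hershey | Smarties
Janeway | Voyager | Smarties | Nestle | Smarties
Spock | Enterprise | Twizzlers | Hershey | Twizzlers
The two Janeway tuples have the same addr because name -> addr and the same favCandy because name -> favCandy; the two Twizzlers tuples have the same manf because candiesLiked -> manf.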
Keys of relations. K is a superkey for relation R if K functionally determines all of R. K is a key for R if K is a superkey, but no proper subset of K is a superkey.
Example: in Consumers(name, addr, candiesLiked, manf, favCandy), {name, candiesLiked} is a superkey because together these attributes determine all the other attributes: name -> addr favCandy and candiesLiked -> manf.
{name, candiesLiked} is a key because neither {name} nor {candiesLiked} is a superkey: name does not determine manf, and candiesLiked does not determine addr. There are no other keys, but there are lots of superkeys: any superset of {name, candiesLiked}.
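Deducing the keys from the FDs can be sketched by brute force: compute the closure of every attribute subset (the closure computation is the algorithm described later in this section) and keep the superkeys that contain no smaller superkey. The names `closure` and `candidate_keys` are illustrative, and the search is exponential in the number of attributes, so this is only a sketch for small schemas.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema: List[str], fds: List[FD]) -> List[Set[str]]:
    """Try attribute subsets from smallest to largest; keep minimal superkeys."""
    keys: List[Set[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a superset of a key is a superkey, but not a key
            if closure(candidate, fds) >= set(schema):
                keys.append(candidate)
    return keys


CONSUMERS = ["name", "addr", "candiesLiked", "manf", "favCandy"]
CONSUMER_FDS: List[FD] = [
    (frozenset({"name"}), frozenset({"addr", "favCandy"})),
    (frozenset({"candiesLiked"}), frozenset({"manf"})),
]

if __name__ == "__main__":
    print([sorted(k) for k in candidate_keys(CONSUMERS, CONSUMER_FDS)])
    # [['candiesLiked', 'name']]
```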
Keys can be obtained in two ways:
1. Just assert a key K. Then the only FDs are K -> A for all attributes A.
2. Assert FDs and deduce the keys by systematic exploration.
Inferring FDs. We are given FDs X1 -> A1, X2 -> A2, …, Xn -> An, and we want to know whether an FD Y -> B must hold in any relation that satisfies the given FDs. Example: if A -> B and B -> C hold, surely A -> C holds, even if we don’t say so. Such inference is important for the design of good relation schemas.
Inference test. To test whether Y -> B follows from the given FDs:
1. Assume two tuples agree on all the attributes of Y.
2. Use the given FDs to infer that these tuples must also agree on certain other attributes.
3. If B is one of these attributes, then Y -> B holds. Otherwise, the two tuples, with any forced equalities, form a two-tuple relation that proves Y -> B does not follow from the given FDs.
An easier way to test is to compute the closure of Y, denoted Y+. Basis: Y+ = Y. Induction: look for an FD whose left side X is a subset of the current Y+. If the FD is X -> A, add A to Y+.
Figure: an FD X -> A whose left side X lies inside Y+ extends Y+ by A, giving the new Y+.
Transitivity: if we have FDs A -> B and B -> C, then it is also true that A -> C. Example: if name -> address and address -> phone, then name -> phone. Following a chain of such deductions as far as it goes yields the closure.
Closure algorithm.
Input: a set of attributes {A1,…,An} and a set of FDs S.
1. Z := {A1,…,An}.
2. Find an FD in S of the form X -> C such that all the attributes in X are in Z but C is not in Z. Add C to Z.
3. Repeat step 2 until nothing more can be put in Z.
4. Return Z as the closure of {A1,…,An}.
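Example: given a relation with attributes A, B, C, D, E, F and FDs AB -> C, BC -> A, BC -> D, D -> E, CF -> B, compute the closure of {A,B}.
Answer: start with Z := {A,B}. AB -> C adds C. BC -> A adds nothing (A is already in Z), but BC -> D adds D. D -> E adds E. CF -> B never applies, because F is not in Z. The final answer is Z = {A,B,C,D,E}.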
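The closure algorithm and the example above can be sketched in Python as follows. The comments map the loop to steps 1 to 4; the `parse_fds` helper for the single-letter shorthand is a hypothetical convenience, not part of the algorithm.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def parse_fds(text: str) -> List[FD]:
    """Hypothetical helper: parse single-letter FDs such as "AB->C, D->E"."""
    fds: List[FD] = []
    for part in text.split(","):
        lhs, rhs = part.split("->")
        fds.append((frozenset(lhs.strip()), frozenset(rhs.strip())))
    return fds


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    z = set(attrs)  # step 1: Z := {A1, ..., An}
    added = True
    while added:  # step 3: repeat until nothing more can be put in Z
        added = False
        for lhs, rhs in fds:
            # step 2: an FD X -> C with all of X in Z but C not in Z
            if lhs <= z and not rhs <= z:
                z |= rhs
                added = True
    return z  # step 4: Z is the closure


S = parse_fds("AB->C, BC->A, BC->D, D->E, CF->B")

if __name__ == "__main__":
    print("".join(sorted(closure("AB", S))))  # ABCDE
```

Each pass over S either adds at least one attribute or ends the loop, so the loop runs at most once per attribute plus one final pass.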
Now we can check whether a particular FD A1…An -> B follows from a set of FDs S: compute {A1,…,An}+ using S. If B is in the closure, the FD follows; otherwise it does not.
Example: consider the relation and FDs of the closure example above. Suppose we wish to test whether AB -> D follows from these FDs. We computed {A,B}+ = {A,B,C,D,E}; since D is in it, AB -> D does follow. On the other hand, consider the FD D -> A. To test it, first compute {D}+. Start with X = {D}; the FD D -> E adds E to X, and no other FD applies, so {D}+ = {D,E}. Since A is not a member of {D,E}, we conclude that D -> A does not follow.
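The closure test can be combined with the two-tuple argument of the inference test: two tuples that agree exactly on D+ = {D,E} and differ everywhere else satisfy all the given FDs, yet violate D -> A. The function names below are illustrative, and encoding the tuples with the values 0 and 1 is an assumption of this sketch.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]
Row = Dict[str, int]


def parse_fds(text: str) -> List[FD]:
    """Hypothetical helper: parse single-letter FDs such as "AB->C, D->E"."""
    fds: List[FD] = []
    for part in text.split(","):
        lhs, rhs = part.split("->")
        fds.append((frozenset(lhs.strip()), frozenset(rhs.strip())))
    return fds


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    z = set(attrs)
    added = True
    while added:
        added = False
        for lhs, rhs in fds:
            if lhs <= z and not rhs <= z:
                z |= rhs
                added = True
    return z


def follows(lhs: str, rhs: str, fds: List[FD]) -> bool:
    """lhs -> rhs follows from fds exactly when rhs lies inside lhs+."""
    return set(rhs) <= closure(lhs, fds)


def two_tuple_counterexample(lhs: str, schema: str, fds: List[FD]) -> Tuple[Row, Row]:
    """Two tuples that agree exactly on lhs+ (both 0) and differ elsewhere (0 vs 1)."""
    plus = closure(lhs, fds)
    t = {a: 0 for a in schema}
    u = {a: 0 if a in plus else 1 for a in schema}
    return t, u


def satisfies(t: Row, u: Row, lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """Does the two-tuple relation {t, u} satisfy lhs -> rhs?"""
    def agree(attrs: Iterable[str]) -> bool:
        return all(t[a] == u[a] for a in attrs)
    return not agree(lhs) or agree(rhs)


S = parse_fds("AB->C, BC->A, BC->D, D->E, CF->B")

if __name__ == "__main__":
    print(follows("AB", "D", S))  # True
    print(follows("D", "A", S))   # False
    t, u = two_tuple_counterexample("D", "ABCDEF", S)
    print(all(satisfies(t, u, x, y) for x, y in S))  # True: every given FD holds
    print(satisfies(t, u, "D", "A"))                  # False: D -> A fails
```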
Projecting FDs. Motivation: “normalization,” the process where we break a relation schema into two or more schemas. Example: ABCD with FDs AB -> C, C -> D, and D -> A. Decompose into ABC and AD. What FDs hold in ABC? Not only AB -> C, but also C -> A!
Why? Suppose two tuples of the projection onto ABC have equal C values: (a1, b1, c) and (a2, b2, c). They come from tuples (a1, b1, c, d1) and (a2, b2, c, d2) of ABCD. Then d1 = d2 because C -> D, and a1 = a2 because D -> A. Thus, tuples in the projection with equal C’s have equal A’s: C -> A.
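One straightforward (exponential) way to find the FDs that hold in a projection is to take the closure of every subset X of the projected attributes and keep X -> A for each A in X+ that lies in the projection but not in X. A minimal sketch, with illustrative names; the result is not minimized, so BC -> A appears even though it follows from C -> A.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def parse_fds(text: str) -> List[FD]:
    """Hypothetical helper: parse single-letter FDs such as "AB->C, D->E"."""
    fds: List[FD] = []
    for part in text.split(","):
        lhs, rhs = part.split("->")
        fds.append((frozenset(lhs.strip()), frozenset(rhs.strip())))
    return fds


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    z = set(attrs)
    added = True
    while added:
        added = False
        for lhs, rhs in fds:
            if lhs <= z and not rhs <= z:
                z |= rhs
                added = True
    return z


def project_fds(sub: str, fds: List[FD]) -> List[Tuple[str, str]]:
    """FDs X -> A holding in the projection onto sub (single-letter attributes)."""
    result: List[Tuple[str, str]] = []
    for size in range(1, len(sub) + 1):
        for combo in combinations(sorted(sub), size):
            x = set(combo)
            # Keep only attributes that survive the projection and are not already in X.
            for a in sorted((closure(x, fds) & set(sub)) - x):
                result.append(("".join(combo), a))
    return result


ABCD_FDS = parse_fds("AB->C, C->D, D->A")

if __name__ == "__main__":
    for lhs, rhs in project_fds("ABC", ABCD_FDS):
        print(lhs, "->", rhs)
    # C -> A
    # AB -> C
    # BC -> A
```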
Writing FDs: the right side of an FD can be written one attribute at a time, or FDs with the same left side can be combined. Example: StdSSN -> StdCity and StdSSN -> StdClass become StdSSN -> StdCity, StdClass. More than one attribute on the left side may be essential: the combination of StdSSN and OfferNo determines EnrGrade (neither column alone does), written StdSSN, OfferNo -> EnrGrade.
FDs and the space of instances. If an FD Y -> B follows from FDs X1 -> A1, …, Xn -> An, then the region in the space of instances for Y -> B must include the intersection of the regions for the FDs Xi -> Ai. That is, every instance satisfying all the FDs Xi -> Ai surely satisfies Y -> B. But an instance could satisfy Y -> B yet not be in this intersection.
Example: A -> B, B -> C, CD -> A. Figure: regions of the space of instances satisfying A -> B, B -> C, and CD -> A.
Modification anomalies. An anomaly is an occurrence or object that is strange, unusual, or unique. In a database, a modification anomaly is an unexpected side effect: inserting, modifying, or deleting more data than desired. Anomalies are caused by excessive redundancy, so strive for one fact in one place.
The goal of relational schema design is to avoid anomalies and redundancy. Update anomaly: one occurrence of a fact is changed, but not all occurrences. Deletion anomaly: a valid fact is lost when a tuple is deleted.
Example of bad design: Consumers(name, addr, candiesLiked, manf, favCandy)
name | addr | candiesLiked | manf | favCandy
Janeway | Voyager | Twizzlers | Hershey | Smarties
Janeway | ??? | Smarties | Nestle | ???
Spock | Enterprise | Twizzlers | ??? | Twizzlers
The data is redundant, because each of the ???’s can be figured out by using the FDs name -> addr favCandy and candiesLiked -> manf.
This bad design also exhibits anomalies:
name | addr | candiesLiked | manf | favCandy
Janeway | Voyager | Twizzlers | Hershey | Smarties
Janeway | Voyager | Smarties | Nestle | Smarties
Spock | Enterprise | Twizzlers | Hershey | Twizzlers
Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples?
Deletion anomaly: if nobody likes Twizzlers, we lose track of the fact that Hershey manufactures Twizzlers.
GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (This applies to individual relations and their attributes.) Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation. Only foreign keys should be used to refer to other entities. Entity and relationship attributes should be kept apart as much as possible. Bottom line: design a schema that can be explained easily relation by relation; the semantics of the attributes should be easy to interpret.
Storing information redundantly wastes storage and causes update anomalies: insertion anomalies, deletion anomalies, and modification anomalies.
Consider the relation EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours). Update anomaly: changing the name of project number P1 from “Billing” to “Customer-Accounting” may require this update to be made in the tuples of all 100 employees working on project P1.
Consider the relation EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours). Insert anomaly: we cannot insert a project unless an employee is assigned to it. Conversely, we cannot insert an employee unless he or she is assigned to a project.
Consider the relation EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours). Delete anomaly: when a project is deleted, all the employees who work on that project are deleted with it. Alternately, if an employee is the sole employee on a project, deleting that employee deletes the corresponding project.
Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation. Figure: a relation that combines EMPLOYEE attributes with attributes from DEPARTMENT and attributes from PROJECT.
GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, and update anomalies. If any anomalies are present, note them so that applications can be made to take them into account.
GUIDELINE 3: Relations should be designed so that their tuples have as few NULL values as possible. Attributes that are frequently NULL could be placed in separate relations (with the primary key). Reasons for NULLs: the attribute is not applicable or invalid; the attribute value is unknown (it may exist); or the value is known to exist but is unavailable.
Normalization is the process of removing unwanted redundancies by applying normal forms:
1. Identify the FDs.
2. Determine whether the FDs meet the normal form.
3. If there is a violation, split the table to meet the normal form.
1NF: least restrictive; every table is in 1NF.
2NF: more restrictive than 1NF; every table in 2NF is also in 1NF.
3NF/BCNF: BCNF is a revised definition of 3NF; BCNF is more restrictive than 3NF.
4NF: addresses inappropriate usage of an n-ary relationship; relationship independence and MVDs; does not involve FDs.
5NF: does not involve FDs; addresses inappropriate usage of an n-ary relationship; more specialized than 4NF.
DKNF: an ideal rather than a practical normal form.
1NF disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are non-atomic. It is considered to be part of the definition of a relation.
A relation is in 1NF if and only if all its attributes are single-valued (there are no multivalued attributes). The following DEPARTMENT table is not in 1NF, because Dlocalizations is multivalued:
Name | Ndpt | Ss_sup | Dlocalizations
Research | 1 | 333445555 | Beirut, Tripoli, Saida
Administration | 4 | 987654321 | Bchari
Social Admin | 5 | 888665555 | Bikaa
Solution: DEPARTMENT(name, ndpt, ss_sup), LOCALIZATIONS(ndpt, localization). FD: ndpt -> name, ss_sup.
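The 1NF solution can be sketched as splitting the DEPARTMENT rows into two relations. This sketch assumes that the multivalued Dlocalizations cell is stored as a comma-separated string; the function name `to_1nf` is illustrative.

```python
from typing import List, Tuple

DeptRow = Tuple[str, int, str, str]  # (name, ndpt, ss_sup, localizations)

# Rows of the non-1NF DEPARTMENT table; the multivalued cell is a comma-separated string.
DEPARTMENT_NOT_1NF: List[DeptRow] = [
    ("Research", 1, "333445555", "Beirut, Tripoli, Saida"),
    ("Administration", 4, "987654321", "Bchari"),
    ("Social Admin", 5, "888665555", "Bikaa"),
]


def to_1nf(rows: List[DeptRow]) -> Tuple[List[Tuple[str, int, str]], List[Tuple[int, str]]]:
    """Split into DEPARTMENT(name, ndpt, ss_sup) and LOCALIZATIONS(ndpt, localization)."""
    department = [(name, ndpt, ss_sup) for name, ndpt, ss_sup, _ in rows]
    # One LOCALIZATIONS row per value of the multivalued attribute.
    localizations = [
        (ndpt, place.strip())
        for _, ndpt, _, places in rows
        for place in places.split(",")
    ]
    return department, localizations


if __name__ == "__main__":
    department, localizations = to_1nf(DEPARTMENT_NOT_1NF)
    print(localizations)
    # [(1, 'Beirut'), (1, 'Tripoli'), (1, 'Saida'), (4, 'Bchari'), (5, 'Bikaa')]
```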
Second normal form uses the concepts of FDs and the primary key. Definitions: Prime attribute: an attribute that is a member of the primary key K. Full functional dependency: an FD Y -> Z where removing any attribute from Y means the FD no longer holds. Examples: {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds. {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key. R can be decomposed into 2NF relations via the process of 2NF normalization.
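A 2NF check can be sketched as a search for partial dependencies: compute the closure of every proper subset of the primary key and report any non-prime attribute it determines. The FDs for EMP_PROJ below are assumptions mirroring the slide's examples ({SSN, PNUMBER} -> HOURS and SSN -> ENAME); the function names are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(
    schema: List[str], key: List[str], fds: List[FD]
) -> List[Tuple[Tuple[str, ...], str]]:
    """(proper subset of the key, non-prime attribute it determines) pairs.

    An empty result means every non-prime attribute is fully functionally
    dependent on the key, i.e. the schema is in 2NF with respect to that key.
    """
    non_prime = set(schema) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_prime):
                found.append((subset, attr))
    return found


EMP_PROJ = ["Emp#", "Proj#", "Ename", "Pname", "No_hours"]
EMP_PROJ_KEY = ["Emp#", "Proj#"]
# Assumed FDs for illustration.
EMP_PROJ_FDS: List[FD] = [
    (frozenset({"Emp#", "Proj#"}), frozenset({"No_hours"})),
    (frozenset({"Emp#"}), frozenset({"Ename"})),
    (frozenset({"Proj#"}), frozenset({"Pname"})),
]

if __name__ == "__main__":
    for subset, attr in partial_dependencies(EMP_PROJ, EMP_PROJ_KEY, EMP_PROJ_FDS):
        print(", ".join(subset), "->", attr)
    # Emp# -> Ename
    # Proj# -> Pname
```

Splitting along the reported dependencies, for example into (Emp#, Ename), (Proj#, Pname), and (Emp#, Proj#, No_hours), is one way to reach 2NF under these assumed FDs.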
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z. Examples: SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold. SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key. R can be decomposed into 3NF relations via the process of 3NF normalization. NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. For example, consider EMP(SSN, Emp#, Salary). Here SSN -> Emp# -> Salary, and Emp# is a candidate key.
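A minimal sketch of the 3NF check: because the key determines every attribute set Y, a listed FD Y -> A with A non-prime, A not in Y, and Y not a superkey makes A transitively dependent on the key through Y. When Y is part of the key, this is the partial dependency of 2NF instead. The relation name EMP_DEPT and both FD lists are assumptions built from the slide's examples; "Emp# is a candidate key" is written as Emp# -> SSN Salary. Only the listed FDs are examined.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_violations(
    schema: List[str], key: List[str], fds: List[FD]
) -> List[Tuple[Tuple[str, ...], str]]:
    """Listed FDs Y -> A where A is non-prime, A is not in Y, and Y is not a superkey."""
    attrs = set(schema)
    non_prime = attrs - set(key)
    out = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attrs:
            continue  # Y is a superkey (e.g. a candidate key): no problem, per the NOTE
        for a in sorted((rhs & non_prime) - lhs):
            out.append((tuple(sorted(lhs)), a))
    return out


EMP_DEPT = ["SSN", "ENAME", "DNUMBER", "DMGRSSN"]
EMP_DEPT_FDS: List[FD] = [
    (frozenset({"SSN"}), frozenset({"ENAME", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DMGRSSN"})),
]
EMP = ["SSN", "Emp#", "Salary"]
EMP_FDS: List[FD] = [
    (frozenset({"SSN"}), frozenset({"Emp#"})),
    (frozenset({"Emp#"}), frozenset({"SSN", "Salary"})),
]

if __name__ == "__main__":
    print(transitive_violations(EMP_DEPT, ["SSN"], EMP_DEPT_FDS))  # [(('DNUMBER',), 'DMGRSSN')]
    print(transitive_violations(EMP, ["SSN"], EMP_FDS))            # []
```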
1st normal form: all attributes depend on the key.
2nd normal form: all attributes depend on the whole key.
3rd normal form: all attributes depend on nothing but the key.
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, X is a superkey of R. Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF. There exist relations that are in 3NF but not in BCNF. The goal is to have each relation in BCNF (or 3NF).
Example: Drinkers(name, addr, cokesLiked, manf, favCoke). FDs: name -> addr favCoke, cokesLiked -> manf. The only key is {name, cokesLiked}. In each FD, the left side is not a superkey, so either of these FDs shows that Drinkers is not in BCNF.
Example: Cokes(name, manf, manfAddr). FDs: name -> manf, manf -> manfAddr. The only key is {name}. name -> manf does not violate BCNF, but manf -> manfAddr does, because manf is not a superkey.
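The two BCNF examples can be checked mechanically: an FD violates BCNF when it is nontrivial and the closure of its left side is not the whole schema. When the FDs given are those of the relation itself, checking only the given FDs is enough; this shortcut does not carry over to relations obtained by decomposition, whose FDs must first be projected. The function and variable names below are illustrative.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema: List[str], fds: List[FD]) -> List[Tuple[Tuple[str, ...], Tuple[str, ...]]]:
    """Given FDs whose left side is not a superkey."""
    attrs = set(schema)
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FDs never violate BCNF
        if closure(lhs, fds) >= attrs:
            continue  # left side is a superkey
        violations.append((tuple(sorted(lhs)), tuple(sorted(rhs))))
    return violations


DRINKERS = ["name", "addr", "cokesLiked", "manf", "favCoke"]
DRINKERS_FDS: List[FD] = [
    (frozenset({"name"}), frozenset({"addr", "favCoke"})),
    (frozenset({"cokesLiked"}), frozenset({"manf"})),
]
COKES = ["name", "manf", "manfAddr"]
COKES_FDS: List[FD] = [
    (frozenset({"name"}), frozenset({"manf"})),
    (frozenset({"manf"}), frozenset({"manfAddr"})),
]

if __name__ == "__main__":
    print(bcnf_violations(DRINKERS, DRINKERS_FDS))
    # [(('name',), ('addr', 'favCoke')), (('cokesLiked',), ('manf',))]
    print(bcnf_violations(COKES, COKES_FDS))
    # [(('manf',), ('manfAddr',))]
```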
Two FDs exist in the relation TEACH: fd1: {student, course} -> instructor; fd2: instructor -> course. {student, course} is a candidate key for this relation, and the dependencies shown follow the pattern in Figure 10.12(b). This relation is in 3NF but not in BCNF: in fd2, instructor is not a superkey, but course is a prime attribute, which 3NF allows. A relation not in BCNF should be decomposed so as to meet this property, possibly forgoing the preservation of all functional dependencies in the decomposed relations.
Three possible decompositions of relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditive join property after decomposition. Of these three, only the third decomposition does not generate spurious tuples after a join (and hence has the nonadditive join property).
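Which decomposition has the nonadditive join property can be checked with the standard test for binary decompositions: a decomposition of R into R1 and R2 is nonadditive exactly when (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) follows from the FDs. A minimal sketch applying it to the three TEACH decompositions; the function name is illustrative.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1: Set[str], r2: Set[str], fds: List[FD]) -> bool:
    """Common attributes must determine all of R1 - R2 or all of R2 - R1."""
    plus = closure(r1 & r2, fds)
    return (r1 - r2) <= plus or (r2 - r1) <= plus


TEACH_FDS: List[FD] = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
DECOMPOSITIONS = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]

if __name__ == "__main__":
    print([is_lossless_binary(r1, r2, TEACH_FDS) for r1, r2 in DECOMPOSITIONS])
    # [False, False, True]
```

Only the third decomposition passes, because its common attribute, instructor, determines course by fd2.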
MVD (multivalued dependency): difficult to identify. A ->> B | C (A multidetermines B and C): A is associated with a collection of B values and a collection of C values, where B and C are independent. An FD is an MVD in which each collection is a single value. Nontrivial MVD: an MVD that is not also an FD. 4NF: no nontrivial MVDs.
A ->> B | C, for example OfferNo ->> StdSSN | TextNo. Given two rows that agree on A (the two rows above the line in the figure), the two rows obtained by exchanging their C values (the two rows below the line) must also be in the table if the MVD holds.
Assume the following relation: Employee(Eid: pk1, Language: pk2, Skill: pk3)
Skill | Language | Eid
Singing | Arabic | 200
Cooking | English | 200
Cooking | French | 100
Politic | Kurdish | 10000
Teaching | English | 100
Assume the following relation with multivalued dependencies: Employee(Eid: pk1, Languages: pk2, Skills: pk3), with Eid ->> Languages and Eid ->> Skills. Languages and Skills are independent.
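For a relation with exactly three attributes A, B, C (as Employee has), A ->> B | C holds exactly when, for each A value, every one of its B values appears with every one of its C values. The sketch below lists the rows an instance would still need; it uses only the two Eid 200 rows of the Employee table, and the function name is illustrative.

```python
from itertools import product
from typing import Any, Dict, List, Set, Tuple

Row = Dict[str, Any]


def mvd_missing_rows(rows: List[Row], a: str, b: str, c: str) -> List[Row]:
    """Rows that A ->> B | C requires but the instance lacks (relation over a, b, c only)."""
    values: Dict[Any, Tuple[Set[Any], Set[Any]]] = {}
    for row in rows:
        b_vals, c_vals = values.setdefault(row[a], (set(), set()))
        b_vals.add(row[b])
        c_vals.add(row[c])
    present = {(row[a], row[b], row[c]) for row in rows}
    missing: List[Row] = []
    for a_val, (b_vals, c_vals) in values.items():
        # Every B value of this A value must pair with every C value of it.
        for b_val, c_val in product(sorted(b_vals), sorted(c_vals)):
            if (a_val, b_val, c_val) not in present:
                missing.append({a: a_val, b: b_val, c: c_val})
    return missing


EMPLOYEE_200 = [
    {"Eid": 200, "Language": "Arabic", "Skill": "Singing"},
    {"Eid": 200, "Language": "English", "Skill": "Cooking"},
]

if __name__ == "__main__":
    for row in mvd_missing_rows(EMPLOYEE_200, "Eid", "Language", "Skill"):
        print(row)
    # {'Eid': 200, 'Language': 'Arabic', 'Skill': 'Cooking'}
    # {'Eid': 200, 'Language': 'English', 'Skill': 'Singing'}
```

The MVD forces these extra combinations, which is the redundancy that 4NF avoids by storing (Eid, Language) and (Eid, Skill) in separate relations.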
Skill | Language | Eid
Singing | Arabic | 200
Cooking | French | 100
Politic | Kurdish | 100
Teaching | English | 100
Higher normal forms: 5NF applies to n-ary relationships. DKNF is the absolute normal form; it is an ideal, not a practical normal form.
5NF: more specialized than 4NF and more difficult to understand than 4NF. It splits a three-way relationship into three (not two) binary relationships.
DKNF. Domain: a set of values. Key: a candidate key (uniqueness property). In DKNF, all constraints are derivable from domains and keys. It is not possible to test a table for DKNF compliance, and there is no known procedure to construct a DKNF table.