Lecture 17 4 th Normal Form Prof. Sin-Min Lee Department of Computer Science
Functional Dependencies oDependencies for this relation: A B A D BC EF oDo they all hold in this instance of the relation R? Functional dependencies are specified by the database programmer based on the intended meaning of the attributes.
Armstrong’s Axioms Armstrong’s Axioms: Let X, Y be sets of attributes from a relation T. [1] Inclusion rule: If Y X, then X Y. [2] Transitivity rule: If X Y, and Y Z, then X Z. [3] Augmentation rule: If X Y, then XZ YZ. Other derived rules: [1] Union rule: If X Y and X Z, then X YZ [2] Decomposition rule: If X YZ, then X Y and X Z [3] Pseudotransitivity: If X Y and WY Z, then XW Z [4] Accumulation rule: If X YZ and Z BW, then X YZB
Closure Let F be a set of functional dependencies. We say F implies a functional dependency g if g can be derived from the FDs in F using the axioms. The closure of F, written as F +, is the set of all FDs that are implied by F. Example: Given FDs { A BC, C D, AD B }, what is the closure? The closure is a potentially exponential set: Trivial dependencies: {A A, B B,…,ABC ABC,…}, other dependencies obtained by augmentation {AB ABC, BC BD,…}, dependencies obtained by other rules (or multiple rules), {A BC, C D, AD B, A D }
Closure Given a set F of functional dependencies. A functional dependency X Y is said to be entailed (implied) by F, if X Y is in F +. To sets of functional dependencies F1, F2 are said to be equivalent, if their closures are equivalent, I.e. F1 + = F2 +.
Checking Entailment To find whether a functional dependency X Y is implied by a set of functional dependencies F –We can apply all the rules in Armstrong’s Axioms to find whether we can obtain this dependency from the dependencies in F OR –We can determine the closure of attributes X, denoted by X + with respect to F, and check whether Y X +
Closure of a set of attributes Given a set of F of functional dependencies, the closure of a set of attributes X, denoted by X + is the largest set of attributes Y such that X Y. Algorithm Closure(X,F) X[0] = X; I = 0; repeat I = I + 1; X[I] = X[I-1]; FOR ALL Z W in F IF Z X[I] THEN X[I] = X[I] W END FOR until X[I] = X[I-1]; RETURN X + = X[I]
Entailment Given F = { C DE, AB CE, EB CF, G A } Find: GB + –Initialize: GB + = {G,B} –Use G A, add A, GB + = {A,B,G} –Use AB CE, add C,E, GB + = {A, B, C, E, G} –Use C DE, add D, GB + = {A, B, C, D, E, G} –Use EB CF, add F, GB + = {A, B, C, D, E, F, G} –Incidentally, GB is a superkey. Is it also a key?
Closure of a set of attributes Given a set of functional dependencies F for a relation R, X is said to be a superkey, if X + contains all the attributes in R. –In other words, X implies all other attributes. –Alternatively, if two tuples are the same with respect to X then they should be the same with respect to all other attributes.
Boyce-Codd Normal Form A table T is said to be in Boyce-Codd Normal Form (BCNF) with respect to a given set of functional dependencies F if for all functional dependencies of the form X A entailed by F the following is true: –If A is not a subset of X then X is a superkey, or –If A is not contained in X then, X contains all the attributes in a key. Given {AB C, AB D, AE D, C F} with Key: {A,B,E} –not in BCNF since C is a single attribute not in AB, but AB is not a superkey.
Boyce-Codd Normal Form Given head(T)={A,B,C,D,E,F} with functional dependencies {AC D, AC E, AF B, AD F, BC A, ABC F } and keys: {A, C}, {B, C}, is this relation T in BCNF? No. It is sufficient to find one violation! –AF B violates BCNF since B is not in AF and AF is not a superkey. –AD F violates BCNF since F is not in AD and AD is not a superkey. Note: ABC F does not violate BCNF since ABC is a superkey.
Decomposition A decomposition of a relation R with functional dependency set F is a sequence of pairs of the form –(R1,F1), …, (Rn,Fn) such that –The union of attributes in R1,…,Rn is equivalent to the attributes in R –All functional dependencies in F1,…,Fn are entailed by F. A decomposition is obtained by a simple projection of R onto the attributes in the decomposed relations. –For example, given R1 with schema R1(A1), then R1 = A1 (R)
Lossless Decompositions A decomposition of R to (R1, F1) and (R2,F2) is said to be lossless, if we join R1 and R2 on the common attributes, we are guaranteed to get R. In other words, given R1(A1) and R2(A2), we need to make sure that it is always the case that R=R1 join(A1 A2) R2 where join(A1 A2) means equi-join on the common attributes in A1 and A2.
{R1, R2} is not a lossless decompostion of relation R. The join with respect to R1.B=R2.B is not equal to R.
Lossless Decomposition Given a relation R with a set F of functional dependencies, and a decomposition of to (R1, F1) and (R2,F2) such that R1(A1) and R2(A2) is said to be lossless iff either A1 A2 A1 or A1 A2 A2 is entailed by F.
Lossless Decompositions Let F={AB C, CD E, AB E, DE AF} and R1(A,B,C,D) F1={AB C} R2(C,D,E,F) F2={CD E,DE F} is this lossless? ABCD CDEF = CD is CD CDEF entailed by F?
Dependency preservation A decomposition of relation (R,F) into (R1(A1), F1) and (R2(A2), F2) is said to be dependency preserving iff F1 F2 is equivalent to F. To check whether F1 F2 is equivalent to F, we need to check 1.Whether all functional dependencies in F1 F2 are entailed by F, and 2.Whether all functional dependencies in F are entailed by F1 F2.
Dependency preservation Given a decomposition R1(A1) of a relation R with functional dependency set F, the only dependencies that can be preserved in R1 are all dependencies in F + of the form B C such that BC A1. Let F={AB C, CD E, AB E, DE AF} Find all the functional dependencies that can be preserved in: R1(A,B,C,D) R2(C,D,E,F)
Dependency preservation When a decomposition is lossy, then information connecting tuples is lost. We get errorneous information. When dependencies are lost in a decomposition, they cannot be enforced as table constraints. They have to be enforced as additional constraints. It is vital that decompositions are lossless. It is important but not vital that decompositions are dependency preserving.
Normal Forms If a relation is not in BNCF normal form, then it can be decomposed by lossless decompositions into smaller relations that are in BCNF. BCNF decomposition: –Until all relations are in BCNF Find a dependency X Y in R(A) that violated BCNF Replace R, with R1(XY), and R2(A-Y) X This algorithm may cause many dependencies to be lost.
3NF Conversion Given R(A,B,C,D,E,F) and F= { ABC D, ABC E, BD E, E C, E F} Keys: ABC and ABE. Not in BNCF (violations: BD E, E C, E F), Not in 3NF (violations: E F). Convert to 3NF using the algorithm: –First compact functional dependencies with common left side, to get F= { ABC DE, BD E, E CF} –Create relations R1(A,B,C,D,E), R2(B,D,E), R3(E,C,F) –Since there exists relations that contain ABC, and ABE, we are done. –Incidentally, all relations are also in BCNF.
Multivalued Dependencies (MVDs) Let R be a relation schema and let R and R. The multivalued dependency holds on R if in any legal relation r(R), for all pairs for tuples t 1 and t 2 in r such that t 1 [ ] = t 2 [ ], there exist tuples t 3 and t 4 in r such that: t 1 [ ] = t 2 [ ] = t 3 [ ] = t 4 [ ] t 3 [ ] = t 1 [ ] t 3 [R – ] = t 2 [R – ] t 4 [ ] = t 2 [ ] t 4 [R – ] = t 1 [R – ]
Motivation There are schemas that are in BCNF that do not seem to be sufficiently normalized namestreet Stars citytitleyear C. Fisher 123 Maple Str. 5 Locust Ln. 123 Maple Str. 5 Locust Ln. 123 Maple Str. 5 Locust Ln.C. Fisher Hollywood Malibu Hollywood Malibu Hollywood Malibu Star Wars1977 Star Wars1977 Empire Strikes Back1980 Empire Strikes Back1980 Return of the Jedi1983 Return of the Jedi1983
Attribute Independence No reason to associate address with one movie and not another When we repeat address and movie facts in all combinations, there is obvious redundancy However, NO BCNF violation in Stars relation –There are no non-trivial FD’s at all, all five attributes form the only superkey –Why?
Multi-valued Dependency Definition: Multivalued dependency (MVD): A 1 A 2 …A n B 1 B 2 …B m holds for relation R if: For all tuples t, u in R If t[A 1 A 2...A n ] = u[A 1 A 2...A n ], then there exists a v in R such that: (1) v[A 1 A 2...A n ] = t[A 1 A 2...A n ] = u[A 1 A 2...A n ] (2) v[B 1 B 2 …B m ] = t[B 1 B 2 …B m ] (3) v[C 1 C 2 …C k ] = u[C 1 C 2 …C k ], where C 1 C 2 …C k is all attributes in R except (A 1 A 2...A n B 1 B 2 …B m )
Example: name street city namestreet Stars citytitleyear C. Fisher 123 Maple Str. 5 Locust Ln. Hollywood Malibu Star Wars1977 Empire Strikes Back1980 Empire Strikes Back1980 C. Fisher 5 Locust Ln. 123 Maple Str. 5 Locust Ln.C. Fisher Malibu Hollywood Malibu Star Wars1977 Return of the Jedi1983 Return of the Jedi1983 t u v
Example: name street city namestreet Stars citytitleyear C. Fisher 123 Maple Str. 5 Locust Ln. 123 Maple Str. 5 Locust Ln. Hollywood Malibu Hollywood Malibu Star Wars1977 Star Wars1977 Empire Strikes Back1980 Empire Strikes Back1980 C. Fisher123 Maple Str. 5 Locust Ln.C. Fisher Hollywood Malibu Return of the Jedi1983 Return of the Jedi1983 u t w v
More on MVDs Intuitively, A 1 A 2 …A n B 1 B 2 …B m says that the relationship between A 1 A 2 …A n and B 1 B 2 …B m is independent of the relationship between A 1 A 2 …A n and R -{B 1 B 2 …B m } –MVD's uncover situations where independent facts related to a certain object are being squished together in one relation Functional dependencies rule out certain tuples from being in a relation –How? Multivalued dependencies require that other tuples of a certain form be present in the relation –a.k.a. tuple-generating dependencies
Let’s Illustrate In Stars, we must repeat the movie (title, year) once for each address (street, city) a movie star has –Alternatively, we must repeat the address for each movie a star has made Example: Stars with name street city namestreetcitytitleyear C. Fisher 123 Maple Str. 5 Locust Ln. 123 Maple Str. Hollywood Malibu Hollywood Star Wars1977 Empire Strikes Back1980 Return of the Jedi1983 Is an incomplete extent of Stars –Infer the existence of a fourth tuple under the given MVD
Trivial MVDs Trivial MVD A 1 A 2 …A n B 1 B 2 …B m where B 1 B 2 …B m is a subset of A 1 A 2 …A n or (A 1 A 2 …A n B 1 B 2 …B m ) contains all attributes of R
Reasoning About MVDs FD-IS-AN-MVD Rule (Replication) If A 1 A 2 …A n B 1 B 2 …B m then A 1 A 2 …A n B 1 B 2 …B m holds
Reasoning About MVDs COMPLEMENTATION Rule If A 1 A 2 …A n B 1 B 2 …B m then A 1 A 2 …A n C 1 C 2 …C k where C 1 C 2 …C k is all attributes in R except (A 1 A 2 …A n B 1 B 2 …B m ) AUGMENTATION Rule If X Y and W Z then WX YZ TRANSITIVITY Rule If X Y and Y Z then X (Z Y)
Coalescence Rule for MVD X Y W:W Z Then: X Z If: Remark: Y and W have to be disjoint and Z has to be a subset of or equal to Y
Definition 4NF Given: relation R and set of MVD's for R Definition: R is in 4NF with respect to its MVD's if for every non-trivial MVD A 1 A 2 …A n B 1 B 2 …B m, A 1 A 2 …A n is a superkey Note: Since every FD is also an MVD, 4NF implies BCNF Example: Stars is not in 4NF
Decomposition Algorithm (1) apply closure to the user-specified FD's and MVD's**: (2) repeat until no more 4NF violations: if R with AA ->> BB violates 4NF then: (2a) decompose R into R1(AA,BB) and R2(AA,CC), where CC is all attributes in R except (AA BB) (2b) assign FD's and MVD's to the new relations** ** MVD's: hard problem! No simple test analogous to computing the attribute closure for FD’s exists for MVD’s. You are stuck to have to use the 5 inference rules for MVD’s when computing the closure!
Exercise Decompose Stars into a set of relations that are in 4NF. name street city is a 4NF violation Apply decomposition: R(name, street, city) S( name, title, year) What about name street city in R and name title year in S ?
MVD (Cont.) Tabular representation of
X ->> Y is trivial if (a)Y X or (b)Y U X = R
Multivalued Dependencies There are database schemas in BCNF that do not seem to be sufficiently normalized Consider a database classes(course, teacher, book) such that (c,t,b) classes means that t is qualified to teach c, and b is a required textbook for c The database is supposed to list for each course the set of teachers any one of which can be the course’s instructor, and the set of books, all of which are required for the course (no matter who teaches it).
There are no non-trivial functional dependencies and therefore the relation is in BCNF Insertion anomalies – i.e., if Sara is a new teacher that can teach database, two tuples need to be inserted (database, Sara, DB Concepts) (database, Sara, Ullman) courseteacherbook database operating systems Avi Hank Sudarshan Avi Jim DB Concepts Ullman DB Concepts Ullman DB Concepts Ullman OS Concepts Shaw OS Concepts Shaw classes Multivalued Dependencies
Therefore, it is better to decompose classes into: courseteacher database operating systems Avi Hank Sudarshan Avi Jim teaches coursebook database operating systems DB Concepts Ullman OS Concepts Shaw text We shall see that these two relations are in Fourth Normal Form (4NF) Multivalued Dependencies
Multivalued Dependencies (MVDs) Let R be a relation schema and let R and R. The multivalued dependency holds on R if in any legal relation r(R), for all pairs for tuples t 1 and t 2 in r such that t 1 [ ] = t 2 [ ], there exist tuples t 3 and t 4 in r such that: t 1 [ ] = t 2 [ ] = t 3 [ ] = t 4 [ ] t 3 [ ] = t 1 [ ] t 3 [R – ] = t 2 [R – ] t 4 [ ] = t 2 [ ] t 4 [R – ] = t 1 [R – ]
MVD (Cont.) Tabular representation of
4th Normal Form No multi-valued dependencies
4th Normal Form Note: 4th Normal Form violations occur when a triple (or higher) concatenated key represents a pair of double keys
4th Normal Form
Multuvalued dependencies
4th Normal Form INSTR-BOOK-COURSE(InstrID, Book, CourseID) COURSE-BOOK(CourseID, Book) COURSE-INSTR(CourseID, InstrID)
4NF (No multivalued dependencies) TABLE Independent repeating groups have been treated as a complex relationship.
Example Let R be a relation schema with a set of attributes that are partitioned into 3 nonempty subsets. Y, Z, W We say that Y Z (Y multidetermines Z) if and only if for all possible relations r(R) r and r then r and r Note that since the behavior of Z and W are identical it follows that Y Z if Y W
Theory of MVDs From the definition of multivalued dependency, we can derive the following rule: –If , then That is, every functional dependency is also a multivalued dependency The closure D + of D is the set of all functional and multivalued dependencies logically implied by D. –We can compute D + from D, using the formal definitions of functional dependencies and multivalued dependencies. –We can manage with such reasoning for very simple multivalued dependencies, which seem to be most common in practice –For complex dependencies, it is better to reason about sets of dependencies using a system of inference rules
Fourth Normal Form A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D + of the form , where R and R, at least one of the following hold: – is trivial (i.e., or = R) – is a superkey for schema R If a relation is in 4NF it is in BCNF
Restriction of Multivalued Dependencies The restriction of D to R i is the set D i consisting of –All functional dependencies in D + that include only attributes of R i –All multivalued dependencies of the form ( R i ) where R i and is in D +
4NF Decomposition Algorithm result: = {R}; done := false; compute D + ; Let D i denote the restriction of D + to R i while (not done) if (there is a schema R i in result that is not in 4NF) then begin let be a nontrivial multivalued dependency that holds on R i such that R i is not in D i, and ; result := (result - R i ) (R i - ) ( , ); end else done:= true; Note: each R i is in 4NF, and decomposition is lossless-join
Example R =(A, B, C, G, H, I) F ={ A B B HI CG H } R is not in 4NF since A B and A is not a superkey for R Decomposition a) R 1 = (A, B) (R 1 is in 4NF) b) R 2 = (A, C, G, H, I) (R 2 is not in 4NF) c) R 3 = (C, G, H) (R 3 is in 4NF) d) R 4 = (A, C, G, I) (R 4 is not in 4NF) Since A B and B HI, A HI, A I e) R 5 = (A, I) (R 5 is in 4NF) f)R 6 = (A, C, G) (R 6 is in 4NF)