cs3431 Normalization
cs3431 Why Normalization? To remove potential redundancy in design Redundancy causes several anomalies: insert, delete and update Redundancy wastes storage, and often slows down query processing Examples to follow next.
cs3431 What is Normalization? Normalization uses concept of dependencies Functional Dependencies Technique used: Decomposition Break R (A, B, C, D) into R1 (A, B) and R2 (B, C, D)
cs3431 Insert Anomaly sNumbersNamepNumberpName s1Davep1MM s2Gregp2ER Student Question: Could we insert any professor ? Note: We cannot insert a professor who has no students. Insert Anomaly: We are not able to insert “valid” value/(s)
cs3431 Delete Anomaly sNumbersNamepNumberpName s1Davep1MM s2Gregp2ER Student Question: Can we delete a student and keep a professor info ? Note: We cannot delete a student that is the only student of a professor. Note: In both cases, minimum cardinality of Professor in the corresponding ER schema is 0 Delete Anomaly: We are not able to perform a delete without losing some “valid” information.
cs3431 Update Anomaly sNumbersNamepNumberpName s1Davep1MM s2Gregp1MM Student Question: Can we simply update a professor’s name ? Note: To update the name of a professor, we have to update in multiple tuples. Note the maximum cardinality of Professor in the corresponding ER schema is * Update Anomaly: To update a value, we have to update multiple rows. Update anomalies are due to redundancy.
cs3431 Normalization Need a method to find such “dependencies” exist between attributes Functional dependencies Need a method to remove such harmful dependencies, when they exist Relational decomposition
cs3431 Keys : Revisited A key for a relation R (a1, a2, …, an) is a set of attributes, K, that together uniquely determine the values for all attributes of R. A key is minimal: no subset of K is a key. A superkey need not be minimal A prime attribute: an attribute that is part of a key
cs3431 Keys: Example sNumbersNameaddress 1Dave144FL 2Greg320FL Student Primary Key: Candidate key: Some superkeys: {,, } Prime Attribute: {sNumber, sName}
cs3431 Functional Dependencies (FDs) sNumbersNameaddress 1Dave144FL 2Greg320FL Student Suppose we have the FD: sName address That is, there is a function from sName to address Meaning: For any two rows in the Student relation with the same value for sName, the value for address must be same.
cs3431 FD and Keys sNumbersNameaddress 1Dave144FL 2Greg320FL Student Questions : Does a key implies functional dependencies? Which ones ? Does a functional dependency imply keys ? Which ones ? Observation : Any key (primary or candidate) or superkey of a relation R functionally determines all attributes of R. Primary Key : FD : sName address
cs3431 Properties of FDs Consider A, B, C, Z are sets of attributes Reflexive (trivial FD): if A B, then A B Transitive: if A B, and B C, then A C Augmentation: if A B, then AZ BZ Union: if A B, A C, then A BC Decomposition: if A BC, then A B, A C Note: Sound and complete inference rules for FDs
cs3431 Inferring FDs Suppose we have : a relation R (A, B, C) and functional dependencies A B, B C, C A Questions : What is a key for R? Should we split R into multiple relations? We can infer A ABC, B ABC, C ABC. Hence A, B, C are all keys.
cs3431 Reasoning About FDs An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold. = closure of F is the set of all FDs that are implied by F. Computing closure of a set of FDs can be expensive. Size of closure is exponential in # attrs!
cs3431 Reasoning About FDs Instead of computing closure F+ of a set of FDs Too expensive Typically, we just need to know if a given FD X Y is in closure of a set of FDs F. Algorithm for efficient check: Compute attribute closure of X (denoted ) wrt F: Set of all attributes A such that X A is in There is a linear time algorithm to compute this. Check if Y is in X+. If yes, then X A in F+.
cs3431 Reasoning About FDs (Contd.) Does F = {A B, B C, C D E } imply A E? Question : i.e, is A E in the functional set closure ? Equivalent Question : Is E in the attribute closure ?
cs3431 Algorithm for Inference of FDs Computing the closure of set of attributes {A1, A2, …, An}, denoted {A1, A2, …, An} + 1. Let X = {A1, A2, …, An} 2. If there exists a FD : B1, B2, …, Bm C, such that every Bi X, then X = X C 3. Repeat step 2 until no more attributes can be added. 4. {A1, A2, …, An} + = X
cs3431 Inferring FDs: Example 1 Consider R (A, B, C, D, E) with FDs A B, B C, CD E Does A E? (Is A E in F+ ?) Rephrase as : Is E in A+ ? Let us compute {A} + {A} + = {A, B, C} Therefore, A E is false
cs3431 Inferring FDs: Example 2 Given R (A, B, C), and FDs : A B, B C, C A What are possible keys for R ? Compute the closure of attributes: {A} + = {A, B, C} {B} + = {A, B, C} {C} + = {A, B, C} So keys for R are,,
cs3431 Decomposing Relations sNumbersNamepNumberpName s1Davep1MM s2Gregp2MM StudentProf FDs: pNumber pName sNumbersNamepNumber s1Davep1 s2Gregp2 Student pNumberpName p1MM p2MM Professor
cs3431 Decomposition Decomposition: Must be Lossless (no spurious tuples)
cs3431 Decomposition: Lossless Join Property sNumbersNamepName S1DaveMM S2GregMM Student pNumberpName p1MM p2MM Professor sNumbersNamepNumberpName s1Davep1MM s1Davep2MM s2Gregp1MM s2Gregp2MM StudentProf Spurious Tuples
cs3431 Normalization How do I decide if I need to further decompose? Once decided, what is the algorithm for decomposing?