Normalization cs3431.

Presentation on theme: "Normalization cs3431."— Presentation transcript:

1 Normalization cs3431

2 Why Normalization?
To remove potential redundancy in a design.
Redundancy causes several anomalies: insert, delete, and update.
Redundancy wastes storage and often slows down query processing.

3 Insert Anomaly
Student
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       ER
Question: Can we insert any professor?
Note: We cannot insert a professor who has no students.
Insert anomaly: we are unable to insert "valid" values.

4 Delete Anomaly
Student
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       ER
Question: Can we delete a student and keep the professor's information?
Note: We cannot delete a student who is the only student of a professor.
Delete anomaly: we are unable to perform a delete without losing some "valid" information.
Note: In both cases, the minimum cardinality of Professor in the corresponding ER schema is 0.

5 Update Anomaly
Student
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg
Question: Can we simply update a professor's name?
Note: To update the name of a professor, we have to update it in multiple tuples.
Update anomaly: to update a single value, we have to update multiple rows.
Update anomalies are due to redundancy.
Note: The maximum cardinality of Professor in the corresponding ER schema is *.

6 Normalization
We need a method to find "dependencies" between attributes: functional dependencies.
We need a method to remove such harmful dependencies when they exist: relational decomposition.
Example: break R (A, B, C, D) into R1 (A, B) and R2 (B, C, D).

7 Keys: Revisited
A key for a relation R (a1, a2, …, an) is a set of attributes, K, that together uniquely determine the values of all attributes of R.
A key is minimal: no proper subset of K is a key.
A superkey need not be minimal.
A prime attribute is an attribute that is part of a key.

8 Functional Dependencies (FDs)
Student
sNumber  group  address
1        DB     144FL
2        AI     320FL
Suppose we have the FD: group → address
That is, there is a function from group to address.
Meaning: for any two rows in the Student relation with the same value for group, the value for address must be the same.
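The meaning above can be checked mechanically on a relation instance. The following is a minimal sketch, not part of the original slides: the function name fd_holds and the third row are hypothetical, added only to show a violation. Note that an instance can refute an FD but cannot prove that it holds for every possible instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first row with this lhs value fixes the expected rhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# The Student instance from the slide.
student = [
    {"sNumber": 1, "group": "DB", "address": "144FL"},
    {"sNumber": 2, "group": "AI", "address": "320FL"},
]
print(fd_holds(student, ["group"], ["address"]))  # True

# Hypothetical extra row (not in the slides): same group, different address.
violating = student + [{"sNumber": 3, "group": "DB", "address": "999XX"}]
print(fd_holds(violating, ["group"], ["address"]))  # False
```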

9 FD and Keys
Student
sNumber  group  address
1        DB     144FL
2        AI     320FL
Primary key: <sNumber>
FD: group → address
Questions: Does a key imply functional dependencies? Which ones? Does a functional dependency imply keys? Which ones?
We assume no NULL values here.
Observation: Any key (primary or candidate) or superkey of a relation R functionally determines all attributes of R.

10 Properties of FDs
Let A, B, C, Z be sets of attributes.
Reflexive (trivial FD): if B ⊆ A, then A → B
Transitive: if A → B and B → C, then A → C
Augmentation: if A → B, then AZ → BZ
Union: if A → B and A → C, then A → BC
Decomposition: if A → BC, then A → B and A → C
Note: These are sound and complete inference rules for FDs.
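As a worked illustration of how these rules combine, the union rule can be derived from augmentation and transitivity alone. The derivation below is a sketch added for clarity, using the same symbols as the slide:

```latex
\begin{align*}
A \to B &\;\Rightarrow\; A \to AB
  && \text{(augment with } A\text{, since } AA = A\text{)} \\
A \to C &\;\Rightarrow\; AB \to BC
  && \text{(augment with } B\text{)} \\
A \to AB,\ AB \to BC &\;\Rightarrow\; A \to BC
  && \text{(transitivity)}
\end{align*}
```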

11 Inferring FDs
Suppose we have a relation R (A, B, C) and functional dependencies A → B, B → C, C → A.
Questions: What is a key for R? Should we split R into multiple relations?
We can infer A → ABC, B → ABC, C → ABC. Hence A, B, and C are all keys.

12 Reasoning About FDs
An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.
The closure of F, denoted F+, is the set of all FDs that are implied by F.
Computing the closure F+ of a set of FDs can be expensive: the size of the closure is exponential in the number of attributes.

13 Reasoning About FDs
But suppose the question is: is X → Y in the closure of a set of FDs F?
Fortunately, computing just the attribute closure is sufficient (and takes linear time).
Compute the attribute closure of X, denoted X+, with respect to F: the set of all attributes A such that X → A is in F+.
Check whether Y is in X+. If yes, then X → Y is in F+.

14 Reasoning About FDs (Contd.)
Question: Does F = {A → B, B → C, CD → E} imply A → E? That is, is A → E in the closure F+?
Equivalent question: Is E in the attribute closure A+?

15 Algorithm for Inference of FDs
Computing the closure of a set of attributes {A1, A2, …, An}:
1. Let X = {A1, A2, …, An}.
2. If there exists an FD B1, B2, …, Bm → C such that every Bi ∈ X, then X = X ∪ {C}.
3. Repeat step 2 until no more attributes can be added.
4. {A1, A2, …, An}+ = X
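The closure algorithm above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the course: the name attribute_closure and the encoding (each attribute is a single character, each FD a pair of strings) are assumptions. The simple loop rescans all FDs on every pass, so it is not the linear-time variant.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds.

    attrs: iterable of single-character attribute names (assumed encoding).
    fds:   list of (lhs, rhs) pairs, each side an iterable of attributes.
    """
    closure = set(attrs)              # step 1: X = {A1, ..., An}
    changed = True
    while changed:                    # step 3: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:
            # step 2: every attribute of lhs is already in X
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure                    # step 4: {A1, ..., An}+ = X


# F = {A -> B, B -> C, CD -> E}
fds = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(attribute_closure("A", fds)))   # ['A', 'B', 'C']
print(sorted(attribute_closure("AD", fds)))  # ['A', 'B', 'C', 'D', 'E']
```

With this F, E is not in {A}+ because CD → E can only fire once D is also in the set.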

16 Inferring FDs: Example 1
Consider R (A, B, C, D, E) with FDs A → B, B → C, CD → E.
Does A → E hold? (Is A → E in F+?)
Rephrase as: Is E in A+?
Let us compute {A}+: {A}+ = {A, B, C}
E is not in {A}+, so A → E is not implied by F.

17 Inferring FDs: Example 2
Given R (A, B, C) and FDs A → B, B → C, C → A.
What are the possible keys for R?
Compute the closure of each attribute:
{A}+ = {A, B, C}
{B}+ = {A, B, C}
{C}+ = {A, B, C}
So the keys for R are <A>, <B>, <C>.
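The same reasoning generalizes: an attribute set is a key when its closure covers all of R and no smaller key is contained in it. The brute-force sketch below is illustrative only; the function names and the single-character attribute encoding are assumptions, and the enumeration is exponential in the number of attributes, so it suits only small schemas.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Closure of attrs under fds (fds is a list of (lhs, rhs) pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(attributes, fds):
    """Enumerate minimal attribute sets whose closure is all of R."""
    attributes = set(attributes)
    keys = []
    # Smaller sets are tried first, so any superset of a found key is skipped.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a key, so it is a non-minimal superkey
            if attribute_closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


fds = [("A", "B"), ("B", "C"), ("C", "A")]
print(candidate_keys("ABC", fds))  # [{'A'}, {'B'}, {'C'}]
```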

18 Decomposing Relations
StudentProf (sNumber, sName, pNumber, pName)
FDs: pNumber → pName
[Tables: StudentProf (students s1 Dave and s2 Greg, professors p1 and p2, pName MM) and two alternative decompositions into a Student relation and a Professor relation.]

19 Decomposition
A decomposition must be lossless: joining the decomposed relations must produce no spurious tuples.

20 Decomposition: Lossless Join
[Tables: StudentProf is decomposed into Student (sNumber, sName, pName) and Professor; joining the two back together yields a StudentProf relation that contains spurious tuples.]
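A small experiment makes spurious tuples concrete. The sketch below is an illustration under stated assumptions, not a reconstruction of the slide: the helper names are hypothetical, Professor is assumed to be (pNumber, pName), and both professors are assumed to share the name MM so that joining on pName cannot tell them apart.

```python
def project(rows, attrs):
    """Project rows onto attrs, removing duplicates (set semantics)."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Join two relations on all attribute names they have in common."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


# Assumed data: both professors are named MM (an assumption, see above).
student_prof = [
    {"sNumber": "s1", "sName": "Dave", "pNumber": "p1", "pName": "MM"},
    {"sNumber": "s2", "sName": "Greg", "pNumber": "p2", "pName": "MM"},
]

# Lossy split: the shared attribute pName determines neither side.
lossy_student = project(student_prof, ["sNumber", "sName", "pName"])
lossy_professor = project(student_prof, ["pNumber", "pName"])
rejoined_lossy = natural_join(lossy_student, lossy_professor)
print(len(rejoined_lossy))  # 4 rows: 2 original + 2 spurious

# Lossless split: the shared attribute pNumber is a key of Professor.
good_student = project(student_prof, ["sNumber", "sName", "pNumber"])
good_professor = project(student_prof, ["pNumber", "pName"])
rejoined_good = natural_join(good_student, good_professor)
print(len(rejoined_good))  # 2 rows: exactly the original relation
```

The lossy join pairs every student with every professor named MM, which is where the extra rows come from.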

21 Normalization
Once we have decided to decompose, what is the algorithm for (lossless) decomposition?

22 Normalization Step: Decompose
Consider a relation R with set of attributes AR.
Consider an FD A → B (such that no other attribute in (AR – A – B) is functionally determined by A).
If A is not a superkey for R, we may decompose R as follows:
Create R' with attributes (AR – B).
Create R'' with attributes A ∪ B.
Key for R'' = A
Foreign key: R' (A) references R'' (A)
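The decomposition step can be sketched as plain set arithmetic on attribute names. This is a minimal sketch under assumptions: the function name decompose is hypothetical, and it does not check the preconditions from the slide (that A is not a superkey, and that B contains everything A determines); the caller is assumed to have verified them.

```python
def decompose(attributes, lhs, rhs):
    """Split R on the FD lhs -> rhs into (R', R'').

    R'  keeps AR - B (lhs is retained so it can act as the foreign key).
    R'' holds A ∪ B, with lhs as its key.
    """
    attributes, lhs, rhs = set(attributes), set(lhs), set(rhs)
    r_prime = attributes - (rhs - lhs)   # AR - B, never dropping A itself
    r_double_prime = lhs | rhs           # A ∪ B
    return r_prime, r_double_prime


student_prof = ["sNumber", "sName", "pNumber", "pName"]
r1, r2 = decompose(student_prof, ["pNumber"], ["pName"])
print(sorted(r1))  # ['pNumber', 'sName', 'sNumber']
print(sorted(r2))  # ['pName', 'pNumber']
```

Applied to StudentProf with pNumber → pName, this yields a Student-style relation that keeps pNumber as a foreign key and a Professor-style relation keyed on pNumber.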

23 Example Decomposition Revisited
StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2
FDs: pNumber → pName
Student
sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2
Professor
pNumber  pName
p1       MM
p2
Foreign key: Student (pNumber) references Professor (pNumber)

24 Normalization
How do I decide whether I need to decompose further?
Once decided, what is the algorithm for decomposing?

