1
Functional Dependencies and Normalization
Instructor: Mohamed Eltabakh
2
FDs and Normalization
Given a database schema, how do you judge whether or not the design is good? How do you ensure it does not have redundancy or anomaly problems? To ensure a database schema is in good form, we use functional dependencies and normalization rules.
3
What is Normalization
Normalization is a set of rules to systematically achieve a good design. If these rules are followed, then the DB design is guaranteed to avoid several problems: inconsistent data; insert, delete, and update anomalies; and redundancy, which wastes storage and often slows down query processing.
4
Problem I: Insert Anomaly
Student(sNumber, sName, pNumber, pName), where sNumber and sName are student info and pNumber and pName are professor info. Rows: (s1, Dave, p1, MM), (s2, Greg, p2, ER).
Question: Could we insert a professor without a student?
Note: We cannot insert a professor who has no students.
Insert Anomaly: We are not able to insert “valid” value(s).
5
Problem II: Delete Anomaly
Student(sNumber, sName, pNumber, pName), where sNumber and sName are student info and pNumber and pName are professor info. Rows: (s1, Dave, p1, MM), (s2, Greg, p2, ER).
Question: Can we delete a student and keep the professor's info?
Note: We cannot delete a student who is the only student of a professor.
Delete Anomaly: We are not able to perform a delete without losing some “valid” information.
6
Problem III: Update Anomaly
Student(sNumber, sName, pNumber, pName), where sNumber and sName are student info and pNumber and pName are professor info. Rows: (s1, Dave, p1, MM), (s2, Greg, VV, VV).
Question: Can we simply update a professor’s name?
Note: To update the name of a professor, we have to update it in multiple tuples.
Update Anomaly: To update a value, we have to update multiple rows. Update anomalies are due to redundancy.
7
Problem IV: Inconsistency
Student(sNumber, sName, pNumber, pName), where sNumber and sName are student info and pNumber and pName are professor info. Rows: (s1, Dave, p1, MM), (s2, Greg, VV).
What if the name of professor p1 is updated in one place and not the other?
Inconsistent Data: The same object has multiple values. Inconsistency is due to redundancy.
8
Schema Normalization
Following the normalization rules, we avoid: insert anomalies, delete anomalies, update anomalies, and inconsistency.
9
Combining Tables
Suppose we combine borrow and loan to get bor_loan = (customer_id, loan_number, amount). A loan can be given to multiple customers, so the result is possible repetition of information (L-100 in the slide's example). When to combine and when to decompose?
10
Decomposing Tables
After the join, we did not get back the original, correct data.
11
What is Needed…
Functional Dependency: a method to find “dependencies” between attributes.
Normalization Theory: rules to remove harmful dependencies, when they exist.
Relational decomposition: break R(A, B, C, D) into R1(A, B) and R2(B, C, D).
Together these are used to decide whether a particular relation R is in “good” form and, if not, how to decompose R to be in a “good” form.
12
What to Cover
Functional Dependencies (FDs); Closure of Functional Dependencies; Lossy & Lossless Decomposition; Normalization
13
Functional Dependencies (FDs)
14
Usage of Functional Dependencies
Discover all dependencies between attributes. Identify the keys of relations. Enable good (lossless) decomposition of a given relation.
15
Keys: Revisited
A key for a relation R(a1, a2, …, an) is a set of attributes, K, that together uniquely determine the values for all attributes of R. A candidate key is minimal: no proper subset of K is a key. A super key need not be minimal. A prime attribute is an attribute that is part of a candidate key.
16
Functional Dependencies (FDs)
Student(sNumber, sName, address) with rows (1, Dave, 144FL) and (2, Greg, 320FL).
Suppose we have the FD: sNumber → address. That is, there is a functional dependency from sNumber to address.
Meaning: A student number determines the student address. Or: For any two rows in the Student relation with the same value for sNumber, the value for address must be the same.
17
Functional Dependencies (FDs)
Require that the value for a certain set of attributes uniquely determines the value for another set of attributes. A functional dependency is a generalization of the notion of a key.
FD: A1, A2, …, An → B1, B2, …, Bm (L.H.S → R.H.S)
18
Functional Dependencies (FDs)
The basic form of an FD: A1, A2, …, An → B1, B2, …, Bm (L.H.S → R.H.S)
>> The values in the L.H.S uniquely determine the values in the R.H.S attributes (when you look up the DB)
>> It does not mean that the L.H.S values compute the R.H.S values
Examples:
SSN → personName, personDoB, personAddress
DepartmentID, CourseNum → CourseTitle, NumCredits
personName → personAddress (does not hold)
19
FD and Keys
Student(sNumber, sName, address) with rows (1, Dave, 144FL) and (2, Greg, 320FL). Primary key: <sNumber>.
Questions: Does a primary key imply functional dependencies? Which ones? Do unique keys imply functional dependencies? Which ones? Does a functional dependency imply keys? Which ones? We assume no NULL values here.
Observation: Any key (primary or candidate) or superkey of a relation R functionally determines all attributes of R.
20
Functional Dependencies (FDs)
Let R be a relation schema where α ⊆ R and β ⊆ R (α and β are subsets of R’s attributes). The functional dependency α → β holds on R if and only if, for any legal instance of R, whenever any two tuples t1 and t2 agree on the attributes α, they also agree on the attributes β. That is, t1[α] = t2[α] ⇒ t1[β] = t2[β].
For the slide's example instance over attributes A and B: A → B does not hold; B → A holds.
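The definition above can be checked mechanically on a single instance. The following is a minimal sketch, not part of the original slides: the function name `fd_holds` and the third row of the sample data are assumptions made for illustration. Note that an instance can only refute an FD; passing the check on one instance does not prove the FD holds on every legal instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by this instance.

    rows: list of dicts (one per tuple); lhs, rhs: lists of attribute names.
    """
    seen = {}  # lhs values -> rhs values first seen with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# First two rows follow the Student example; the third row is hypothetical.
student = [
    {"sNumber": 1, "sName": "Dave", "address": "144FL"},
    {"sNumber": 2, "sName": "Greg", "address": "320FL"},
    {"sNumber": 3, "sName": "Dave", "address": "77FL"},
]

print(fd_holds(student, ["sNumber"], ["address"]))  # True
print(fd_holds(student, ["sName"], ["address"]))    # False
```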
21
Functional Dependencies & Keys
K is a superkey for relation schema R if and only if K → R (K determines all attributes of R). K is a candidate key for R if and only if K → R and there is no α ⊂ K such that α → R. Keys imply FDs, and FDs imply keys.
22
Example I
Student(SSN, Fname, Mname, Lname, DoB, address, age, admissionDate)
If you know that SSN is a key, then SSN → Fname, Mname, Lname, DoB, address, age, admissionDate.
If you know that (Fname, Mname, Lname) is a key, then Fname, Mname, Lname → SSN, DoB, address, age, admissionDate.
23
Example II
Student(SSN, Fname, Mname, Lname, DoB, address, age, admissionDate)
If you know that SSN → Fname, Mname, Lname, DoB, address, age, admissionDate, then we infer that SSN is a candidate key.
If you know that Fname, Mname, Lname → SSN, DoB, address, age, admissionDate, then we infer that (Fname, Mname, Lname) is a key. Is it a candidate key or a super key? Does any pair of attributes together form a key? If no, (Fname, Mname, Lname) is a candidate key (minimal). If yes, (Fname, Mname, Lname) is a super key.
24
Example III
title, year → length, genre, studioName. Does this FD hold? YES.
title, year → starName. Does this FD hold? NO.
What is a key of this relation? {title, year, starName}. Is it a candidate key? For this instance it is not a candidate key, because (title, starName) can be a key.
25
Properties of FDs
Consider A, B, C, Z to be sets of attributes.
Reflexive (trivial): A → B is trivial if B ⊆ A.
26
Properties of FDs (Cont’d)
Consider A, B, C, Z to be sets of attributes.
Transitive: if A → B and B → C, then A → C.
Augmentation: if A → B, then AZ → BZ.
Union: if A → B and A → C, then A → BC.
Decomposition: if A → BC, then A → B and A → C.
Use these properties to derive more FDs.
27
Example
Use the FD properties to derive more FDs. Given R(A, B, C, D, E) and F = {A → BC, DE → C, B → D}.
Is A a key for R? Does A determine all other attributes? A → A B C D, so NO (E is not determined).
Is BE a key for R? BE → B E D C, so NO (A is not determined).
Is ABE a candidate or super key for R? ABE → A B E D C, but AE → A E B C D already.
>> ABE is a super key
>> AE is a candidate key
28
What to Cover
Functional Dependencies (FDs); Closure of Functional Dependencies; Lossy & Lossless Decomposition; Normalization
29
Closure of a Set of Functional Dependencies
Given a set F of functional dependencies, there are other FDs that can be inferred based on F. For example: if A → B and B → C, then we can infer that A → C. The closure of F, denoted F+, is the set of all FDs that can be inferred from F. F+ is a superset of F. Computing the closure F+ of a set of FDs can be expensive.
30
Inferring FDs
Suppose we have a relation R(A, B, C, D) and functional dependencies A → B, C → D, A → C.
Question: What is a key for R? We can infer A → ABC, and since C → D, then A → ABCD. Hence A is a key in R. Is it the only key?
31
Attribute Closure
Attribute closure of A: given a set of FDs, compute all attributes X that A determines (A → X). Attribute closure is easy to compute: just recursively apply the transitive property. A can be a single attribute or a set of attributes.
32
Algorithm for Computing Attribute Closures
Computing the closure of a set of attributes {A1, A2, …, An}:
1. Let X = {A1, A2, …, An}.
2. If there exists an FD B1, B2, …, Bm → C such that every Bi ∈ X, then X = X ∪ C.
3. Repeat step 2 until no more attributes can be added.
X is then the closure of the attributes {A1, A2, …, An}, written X = {A1, A2, …, An}+.
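The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the course's reference implementation: the function name `closure` and the representation of an FD as a `(lhs, rhs)` pair are assumptions. Single-letter attributes are written as strings, so "BC" stands for the attribute set {B, C}.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds.

    fds: list of (lhs, rhs) pairs; each side is an iterable of attribute names.
    """
    result = set(attrs)           # step 1: start from the given attributes
    changed = True
    while changed:                # step 3: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:      # step 2: apply any FD whose L.H.S is covered
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# F = {A -> BC, DE -> C, B -> D} over R(A, B, C, D, E)
F = [("A", "BC"), ("DE", "C"), ("B", "D")]
print(sorted(closure("AB", F)))  # ['A', 'B', 'C', 'D']
print(sorted(closure("BE", F)))  # ['B', 'C', 'D', 'E']
```

Neither closure contains all five attributes, so neither AB nor BE is a key for this R.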
33
Example 1: Inferring FDs
Assume relation R(A, B, C). Given FDs: A → B, B → C, C → A. What are the possible keys for R? Compute the closure of each attribute X, i.e., X+. If X+ contains all attributes, then X is a key. For example: {A}+ = {A, B, C}, {B}+ = {A, B, C}, {C}+ = {A, B, C}. So the keys for R are <A>, <B>, <C>.
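Finding keys by closure can be automated for small schemas by trying attribute subsets in order of increasing size. The sketch below is an assumption-laden illustration (the names `closure` and `candidate_keys` are not from the slides), and it enumerates all subsets, so it is only practical for a handful of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all candidate keys as sorted lists, smallest keys first."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) == attrs:
                keys.append(cand)
    return [sorted(k) for k in keys]


print(candidate_keys("ABC", [("A", "B"), ("B", "C"), ("C", "A")]))
# [['A'], ['B'], ['C']]
print(candidate_keys("ABCDE", [("A", "BC"), ("DE", "C"), ("B", "D")]))
# [['A', 'E']]
```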
34
Example 2: Attribute Closure
Given R(A, B, C, D, E) and F = {A → BC, DE → C, B → D}.
What is the attribute closure {AB}+? Start with {AB}+ = {A, B}; then {A, B, C}; then {A, B, C, D}.
What is the attribute closure {BE}+? Start with {BE}+ = {B, E}; then {B, E, D}; then {B, E, D, C}.
A set of attributes α is a key if α+ contains all attributes.
35
Example 3: Inferring FDs
Assume relation R(A, B, C, D, E). Given F = {A → B, B → C, CD → E}. Does A → E? This question is the same as asking: Is E in the attribute closure of A (A+)? Is A → E in the closure F+? A → E does not hold. AD → ABCDE does hold, so AD is a key for R.
36
Summary of FDs
They capture the dependencies between attributes. How to infer more FDs using properties such as transitivity, augmentation, and union. Functional closure F+. Attribute closure A+. Relationship between FDs and keys.
37
What to Cover
Functional Dependencies (FDs); Closure of Functional Dependencies; Lossy & Lossless Decomposition; Normalization
38
Decomposing Relations
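StudentProf(sNumber, sName, pNumber, pName) with rows (s1, Dave, p1, MM) and (s2, Greg, p2, MM). FDs: pNumber → pName.
Lossless: Student(sNumber, sName, pNumber) with rows (s1, Dave, p1), (s2, Greg, p2); Professor(pNumber, pName) with rows (p1, MM), (p2, MM).
Lossy: Student(sNumber, sName, pName) with rows (s1, Dave, MM), (s2, Greg, MM); Professor(pNumber, pName) with rows (p1, MM), (p2, MM).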
39
Lossless vs. Lossy Decomposition
Assume R is divided into R1 and R2.
Lossless Decomposition: R1 natural join R2 should create exactly R.
Lossy Decomposition: R1 natural join R2 does not create exactly R; typically it adds spurious records that were not in R.
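To make the distinction concrete, one can project an instance onto R1 and R2, join the projections, and compare with the original. The sketch below does this on a StudentProf instance; the helper names are hypothetical, and the data assumes that both professors are named MM (that is what makes the join on pName produce extra rows). A check on one instance is only an illustration, not a proof about the schema.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}


def natural_join(r1, r2):
    """Natural join of two sets of projected tuples."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            # tuples match when they agree on every shared attribute
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


def is_lossless_on(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly rows."""
    original = {frozenset(r.items()) for r in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original


# Assumed instance: both professors share the name MM.
student_prof = [
    {"sNumber": "s1", "sName": "Dave", "pNumber": "p1", "pName": "MM"},
    {"sNumber": "s2", "sName": "Greg", "pNumber": "p2", "pName": "MM"},
]

print(is_lossless_on(student_prof,
                     ["sNumber", "sName", "pNumber"], ["pNumber", "pName"]))  # True
print(is_lossless_on(student_prof,
                     ["sNumber", "sName", "pName"], ["pNumber", "pName"]))    # False
```

In the second call the join on pName pairs every student with every professor, giving four rows instead of two.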
40
Lossless Decomposition
StudentProf(sNumber, sName, pNumber, pName) with rows (s1, Dave, p1, MM) and (s2, Greg, p2, MM). FDs: pNumber → pName.
Lossless: Student(sNumber, sName, pNumber) with rows (s1, Dave, p1), (s2, Greg, p2); Professor(pNumber, pName) with rows (p1, MM), (p2, MM).
Student & Professor are a lossless decomposition of StudentProf (Student ⋈ Professor = StudentProf).
41
Lossy Decomposition
StudentProf(sNumber, sName, pNumber, pName) with rows (s1, Dave, p1, MM) and (s2, Greg, p2, MM). FDs: pNumber → pName.
Lossy: Student(sNumber, sName, pName) with rows (s1, Dave, MM), (s2, Greg, MM); Professor(pNumber, pName) with rows (p1, MM), (p2, MM).
Student & Professor are a lossy decomposition of StudentProf (Student ⋈ Professor != StudentProf).
42
Goal: Ensure Lossless Decomposition
How to ensure lossless decomposition? Answer: The common columns must be a candidate key in one of the two relations.
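This rule can be tested from the FDs alone using attribute closure: the decomposition is lossless if the common attributes determine all of R1 or all of R2. The sketch below uses that closure form of the test (common attributes as a superkey of one side); the function names are illustrative assumptions, and only the FD pNumber → pName from the running example is used.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """True if the attributes shared by r1 and r2 determine all of r1 or all of r2."""
    determined = closure(set(r1) & set(r2), fds)
    return set(r1) <= determined or set(r2) <= determined


fds = [(["pNumber"], ["pName"])]

# Split on pNumber: the common column is a key of Professor.
print(is_lossless(["sNumber", "sName", "pNumber"], ["pNumber", "pName"], fds))  # True
# Split on pName: the common column determines nothing else.
print(is_lossless(["sNumber", "sName", "pName"], ["pNumber", "pName"], fds))    # False
```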
43
Back to our example
StudentProf(sNumber, sName, pNumber, pName) with rows (s1, Dave, p1, MM) and (s2, Greg, p2, MM). FDs: pNumber → pName.
Lossless: Student(sNumber, sName, pNumber) and Professor(pNumber, pName). The common column pNumber is a candidate key.
Lossy: Student(sNumber, sName, pName) and Professor(pNumber, pName). The common column pName is not a candidate key.
44
What to Cover
Functional Dependencies (FDs); Closure of Functional Dependencies; Lossy & Lossless Decomposition; Normalization
45
Normalization
46
Normalization
Set of rules to avoid “bad” schema design. Decide whether a particular relation R is in “good” form. If not, decompose R to be in a “good” form.
Several levels of normalization: First Normal Form (1NF), BCNF, Third Normal Form (3NF), Fourth Normal Form (4NF).
If a relation is in a certain normal form, then it is known that certain kinds of problems are avoided or minimized.
47
First Normal Form (1NF)
An attribute domain is atomic if its elements are considered to be indivisible units (primitive attributes). Examples of non-atomic domains are multi-valued and composite attributes. A relational schema R is in first normal form (1NF) if the domains of all attributes of R are atomic. We assume all relations are in 1NF.
48
First Normal Form (1NF): Example
Since all attributes are primitive, the relation is in 1NF.
49
Boyce-Codd Normal Form (BCNF): Definition
A relation schema R is in BCNF with respect to a set F of functional dependencies if, for all functional dependencies in F+ of the form α → β where α ⊆ R and β ⊆ R, at least one of the following holds: α → β is trivial (i.e., β ⊆ α), or α is a superkey for R.
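The definition translates directly into a check. The sketch below relies on a standard simplification: to test whether R itself is in BCNF it is enough to examine the FDs in the given set F rather than all of F+ (this shortcut does not carry over to relations produced by decomposition). Function names are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(attrs, fds):
    """True if every FD in fds is trivial or has a superkey of attrs on its L.H.S."""
    attrs = set(attrs)
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD
        if closure(lhs, fds) >= attrs:
            continue  # L.H.S is a superkey
        return False
    return True


fds = [(["pNumber"], ["pName"])]
print(is_bcnf(["sNumber", "sName", "pNumber", "pName"], fds))  # False
print(is_bcnf(["pNumber", "pName"], fds))                      # True
```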
50
BCNF: Example
Student(sNumber, sName, pNumber, pName), where sNumber and sName are student info and pNumber and pName are professor info. Rows: (s1, Dave, p1, MM), (s2, Greg, p2, ER), (s3, Mike, …).
Is relation Student in BCNF given the FD pNumber → pName? NO: it is not a trivial FD, and pNumber is not a key in the Student relation. How do we fix it and bring it into BCNF?
51
Decomposing a Schema into BCNF
If R is not in BCNF because of a non-trivial dependency α → β, then decompose R into two relations:
R1 = (α ∪ β), where α is a superkey in R1
R2 = (R − (β − α)), where R2.α is a foreign key referencing R1.α
52
Example of BCNF Decomposition
StudentProf(sNumber, sName, pNumber, pName) with rows (s1, Dave, p1, MM) and (s2, Greg, p2, MM). FDs: pNumber → pName.
Student(sNumber, sName, pNumber) with rows (s1, Dave, p1), (s2, Greg, p2).
Professor(pNumber, pName) with rows (p1, MM), (p2, MM).
FOREIGN KEY: Student(pNumber) references Professor(pNumber)
53
What is Nice about this Decomposition?
R is decomposed into two relations: R1 = (α ∪ β), where α is a superkey in R1, and R2 = (R − (β − α)), where R2.α is a foreign key referencing R1.α. This decomposition is lossless (because R1 and R2 can be joined based on α, and α is unique in R1). When you join R1 and R2 on α, you get R back without loss of information.
54
StudentProf = Student ⋈ Professor
StudentProf(sNumber, sName, pNumber, pName) with rows (s1, Dave, p1, MM) and (s2, Greg, p2, MM). FDs: pNumber → pName.
Student(sNumber, sName, pNumber) with rows (s1, Dave, p1), (s2, Greg, p2).
Professor(pNumber, pName) with rows (p1, MM), (p2, MM).
FOREIGN KEY: Student(pNumber) references Professor(pNumber)
55
Multi-Step Decomposition
Relation R and set of functional dependencies F:
R = (customer_name, loan_number, branch_name, branch_city, assets, amount)
F = {branch_name → assets branch_city, loan_number → amount branch_name}
Is R in BCNF? NO. Decompose based on branch_name → assets branch_city:
R1 = (branch_name, assets, branch_city)
R2 = (customer_name, loan_number, branch_name, amount)
Are R1 and R2 in BCNF? R2 is not. Divide R2 based on loan_number → amount branch_name:
R3 = (loan_number, amount, branch_name)
R4 = (customer_name, loan_number)
The final schema has R1, R3, R4.
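The multi-step procedure can be sketched as a recursive function. This is an illustrative sketch under stated assumptions, not the course's algorithm verbatim: the names `find_violation` and `bcnf_decompose` are made up, violations are found by brute-force enumeration of attribute subsets (fine for small schemas only), and each split uses the full closure of α inside the relation as β. A different choice of violating FD can lead to a different, equally valid BCNF decomposition.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (alpha, beta) for some FD that violates BCNF inside rel, or None."""
    rel = set(rel)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            alpha = set(combo)
            determined = closure(alpha, fds) & rel
            # non-trivial (determines something new) but not a superkey of rel
            if determined != alpha and determined != rel:
                return alpha, determined - alpha
    return None


def bcnf_decompose(rel, fds):
    """Split rel into R1 = alpha U beta and R2 = rel - beta until no violation is left."""
    rel = frozenset(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return {rel}
    alpha, beta = violation
    return bcnf_decompose(alpha | beta, fds) | bcnf_decompose(rel - beta, fds)


R = ["customer_name", "loan_number", "branch_name", "branch_city", "assets", "amount"]
F = [
    (["branch_name"], ["assets", "branch_city"]),
    (["loan_number"], ["amount", "branch_name"]),
]
for relation in sorted(sorted(r) for r in bcnf_decompose(R, F)):
    print(relation)
# ['amount', 'branch_name', 'loan_number']
# ['assets', 'branch_city', 'branch_name']
# ['customer_name', 'loan_number']
```

On this input the sketch arrives at the same three relations as the worked example (R3, R1, and R4 respectively).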
56
What is NOT Nice about BCNF
Dependency Preservation: after the decomposition, all FDs in F+ should be preserved. BCNF does not guarantee dependency preservation. Can we always find a decomposition that is both in BCNF and dependency preserving? No, such a decomposition may not exist. That is why we study a weaker normal form called third normal form (3NF).
57
Decomposition : Dependency Preserving
Intuition: Can we check functional dependencies locally in each decomposed relation, and assure that globally all constraints are enforced by that?
58
Example of Lost FD
Assume relation R(C, S, J, D, T, Q, V), where C is a key, JT → C, and SD → T.
C → CSJDTQV (C is a key): good for BCNF
JT → CSJDTQV (JT is a key): good for BCNF
SD → T (SD is not a key): bad for BCNF
Decomposition: R1(C, S, J, D, Q, V) and R2(S, D, T). This is lossless and in BCNF.
Problem: Can JT → C be checked? This dependency is lost.
59
Dependency Preservation Test
Assume R is decomposed into R1 and R2. The closure of the FDs in R is F+. The FDs in R1 and R2 are FR1 and FR2, respectively (the projections of the dependencies onto R1 and onto R2). Then dependencies are preserved if: F+ = (FR1 ∪ FR2)+
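Computing F+ and the projections explicitly is expensive, so in practice the test is usually run one FD at a time. The sketch below uses the standard polynomial-time procedure (a swapped-in equivalent of the definition above, not something stated on the slides): for an FD α → β, grow a result set from α by repeatedly taking the closure, under F, of the part of the result that lies in each Ri and keeping what falls inside Ri; the FD is preserved if β ends up in the result. Function names are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(lhs, rhs, fds, relations):
    """True if lhs -> rhs can be enforced using only FDs local to the relations."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in relations:
            rel = set(rel)
            gained = closure(result & rel, fds) & rel
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


def preserves_all(fds, relations):
    """True if every FD in fds is preserved by the decomposition."""
    return all(is_preserved(lhs, rhs, fds, relations) for lhs, rhs in fds)


# R(C, S, J, D, T, Q, V) split into R1(C, S, J, D, Q, V) and R2(S, D, T)
F = [("C", "CSJDTQV"), ("JT", "C"), ("SD", "T")]
decomposition = ["CSJDQV", "SDT"]
print(is_preserved("JT", "C", F, decomposition))  # False
print(is_preserved("SD", "T", F, decomposition))  # True
print(preserves_all(F, decomposition))            # False
```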
60
Back to Our Example
Assume relation R(C, S, J, D, T, Q, V), where C is a key, JT → C, and SD → T.
C → CSJDTQV (C is a key): good for BCNF
JT → CSJDTQV (JT is a key): good for BCNF
SD → T (SD is not a key): bad for BCNF
Decomposition: R1(C, S, J, D, Q, V) and R2(S, D, T)
F+ = {C → CSJDTQV, JT → CSJDTQV, SD → T}
FR1 = {C → CSJDQV}
FR2 = {SD → T}
FR1 ∪ FR2 = {C → CSJDQV, SD → T}
(FR1 ∪ FR2)+ = {C → CSJDQV, SD → T, C → T}
JT → C is still missing.
61
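Another Example
Assume relation R(A, B, C) with F = {A → B, B → C, C → A}. Is the following decomposition dependency preserving? R1(AB), R2(BC).
YES. C → A looks lost because no relation contains both C and A, but F+ contains B → A (from B → C and C → A), which can be checked on R1, and C → B (from C → A and A → B), which can be checked on R2. Together C → B and B → A imply C → A.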
62
Dependency Preservation
BCNF does not necessarily preserve FDs. But 3NF is guaranteed to be able to preserve FDs.
63
Third Normal Form: Motivation
There are some situations where BCNF is not dependency preserving. Solution: define a weaker normal form, called Third Normal Form (3NF). It allows some redundancy (we will see examples later), but all FDs can be checked on individual relations without computing a join. There is always a lossless-join, dependency-preserving decomposition into 3NF.
64
Normal Form: 3NF
Relation R is in 3NF if, for every FD α → β in F+, where α ⊆ R and β ⊆ R, at least one of the following holds: α → β is trivial (i.e., β ⊆ α); α is a superkey for R; or each attribute in β − α is part of a candidate key (a prime attribute).
In short: the L.H.S is a superkey OR the R.H.S consists of prime attributes.
65
Comparison between 3NF & BCNF
If R is in BCNF, then R is also in 3NF, since the 3NF conditions include both BCNF conditions. If R is in 3NF, R may not be in BCNF. 3NF allows some redundancy and is weaker than BCNF. 3NF is a compromise to use when BCNF with good constraint enforcement is not achievable. Important: a lossless-join, dependency-preserving decomposition of R into a collection of 3NF relations is always possible.
66
Example
Relation R = (J, K, L) with F = {JK → L, L → K}. Two candidate keys: JK and JL.
Is R in BCNF? NO: L → K is non-trivial and L is not a superkey.
Is R in 3NF? YES: JK → L (JK is a superkey); L → K (K is contained in a candidate key).
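The two answers for R = (J, K, L) can be reproduced with a short check. The sketch below is illustrative only: the function names are assumptions, candidate keys are found by brute-force subset enumeration, and both tests examine only the FDs given in F, which is sufficient when testing R itself.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


def is_bcnf(attrs, fds):
    """Every FD is trivial or has a superkey on its L.H.S."""
    attrs = set(attrs)
    return all(set(r) <= set(l) or closure(l, fds) >= attrs for l, r in fds)


def is_3nf(attrs, fds):
    """Like BCNF, but an FD is also fine if its new R.H.S attributes are all prime."""
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    return all(
        closure(l, fds) >= attrs or set(r) - set(l) <= prime
        for l, r in fds
    )


F = [("JK", "L"), ("L", "K")]
print(sorted(sorted(k) for k in candidate_keys("JKL", F)))  # [['J', 'K'], ['J', 'L']]
print(is_bcnf("JKL", F))  # False
print(is_3nf("JKL", F))   # True
```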