Revisit FDs & BCNF Normalization 1 Instructor: Mohamed Eltabakh
Announcements Project Phase 2 is due Now !!! Project Phase 3 will be out today (Nov. 11) and due on Nov. 22 (8:00am) Project feedback !!! Keep in mind the midterm exam is on Nov. 22 2
FDs and Normalization Given a database schema, how do you judge whether or not the design is good? How do you ensure it does have redundancy or anomaly problems? To ensure your database schema is in a good form we use: Functional Dependencies Normalization Rules 3
Usage of Functional Dependencies Discover all dependencies between attributes Identify the keys of relations Enable good (Lossless) decomposition of a given relation 4
Functional Dependencies (FDs) The basic form of a FDs A1,A2,…An B1, B2,…Bm 5 L.H.SR.H.S >> The values in the L.H.S uniquely determine the values in the R.H.S attributes (when you lookup the DB) >> It does not mean that L.H.S values compute the R.H.S values >> The values in the L.H.S uniquely determine the values in the R.H.S attributes (when you lookup the DB) >> It does not mean that L.H.S values compute the R.H.S values Examples: SSN personName, personDoB, personAddress DepartmentID, CourseNum CourseTitle, NumCredits personName personAddress X
Functional Dependencies (FDs) Let R be a relation schema where α ⊆ R and β ⊆ R -- α and β are subsets of R’s attributes The functional dependency α→β holds on R if and only if: For any legal instance of R, whenever any two tuples t1 and t2 agree on the attributes α, they also agree on the attributes β. That is, t1[α]=t2[α] ⇒ t1[β] =t2[β] 6 AB A B (Does not hold) B A (holds) A B (holds) B A (holds) AB 4 A B (holds) B A (Does not holds) AB 4 4
Functional Dependencies & Keys K is a superkey for relation schema R if and only if K → R -- K determines all attributes of R K is a candidate key for R if and only if K→R, and No α ⊂ K, α→R 7 Keys imply FDs, and FDs imply keys Minimal superkey
Example I Student(SSN, Fname, Mname, Lname, DoB, address, age, admissionDate) If you know that SSN is a key, Then SSN Fname, Mname, Lname, DoB, address, age, admissionDate If you know that (Fname, Mname, Lname) is a key, Then Fname, Mname, Lname SSN, DoB, address, age, admissionDate If you know that SSN Fname, Mname, Lname, DoB, address, age, admissionDate Then, we infer that SSN is a candidate key If you know that Fname, Mname, Lname SSN, DoB, address, age, admissionDate Then, we infer that (Fname, Mname, Lname) is a key. Is it Candidate or super key??? Does any pair of attributes together form a key?? If no (Fname, Mname, Lname) is a candidate key If yes (Fname, Mname, Lname) is a super key 8
Example II Does this FD hold? title year length genre studioName Does this FD hold? title year starName What is a key of this relation? {title, year, starName} Is it candidate key? 9 YES NO >> For this instance not a candidate key >> In general it can be candidate key (depending on the assumptions) >> For this instance not a candidate key >> In general it can be candidate key (depending on the assumptions)
Properties of FDs Consider A, B, C, Z are sets of attributes Reflexive (trivial): A B is trivial if B A 10
Properties of FDs (Cont’d) Consider A, B, C, Z are sets of attributes Transitive: if A B, and B C, then A C Augmentation: if A B, then AZ BZ Union: if A B, A C, then A BC Decomposition: if A BC, then A B, A C 11 Use these properties to derive more FDs
Example Given R( A, B, C, D, E) F = {A BC, DE C, B D} Is A a key for R or not? Does A determine all other attributes? A A B C D Is BE a key for R? BE B E D C Is ABE a candidate or super key for R? ABE A B E D C AE A E B C D 12 NO >> ABE is a super key >> AE is a candidate key >> ABE is a super key >> AE is a candidate key
Closure of Functional Dependencies Given a set F set of functional dependencies, there are other FDs that can be inferred based on F For example: If A → B and B → C, then we can infer that A → C Closure set F F + The set of all FDs that can be inferred from F We denote the closure of F by F + F + is a superset of F 13
Functional Closure: Example Given R( A, B, C, D, E) F = {A BC, DE C, B D} Report 4 FDs in F + A A B C D AE A B C D E DEB C B B E B E D C Which properties did we use to infer these extra FDs ?? 14
Attribute Closure Attribute Closure of A Given a set of FDs, compute all attributes X that A determines A X Attribute closure is easy to compute Just recursively apply the transitive property A can be a single attribute or set of attributes 15
Algorithm for Computing Attribute Closures Computing the closure of set of attributes {A1, A2, …, An}: 1. Let X = {A1, A2, …, An} 2. If there exists a FD: B1, B2, …, Bm C, such that every Bi X, then X = X C 3. Repeat step 2 until no more attributes can be added. X is the closure of the {A1, A2, …, An} attributes X = {A1, A2, …, An} + 16
Example 1: Attribute Closure Assume relation R (A, B, C, D, E) Given F = {A B, B C, C D E } What is the attribute closure of A (A + )? A + = {A} A + = {A, B} A + = {A, B, C} 17
Example 2: Attribute Closure Given R( A, B, C, D, E) F = {A BC, DE C, B D} What is the attribute closure {AB} + ? {AB} + = {A B} {AB} + = {A B C} {AB} + = {A B C D} What is the attribute closure {BE} + ? {BE} + = {B E} {BE} + = {B E D} {BE} + = {B E D C} 18 Set of attributes α is a key if α + contains all attributes
Summary of FDs They capture the dependencies between attributes How to infer more FDs using properties such as transitivity, augmentation, and union Functional closure F + Attribute closure A + Relationship between FDs and keys 19
Normalization 20
Normalization Set of rules to avoid “bad” schema design Decide whether a particular relation R is in “good” form If not, decompose R to be in a “good” form Several levels of normalization First Normal Form (1NF) BCNF Third Normal Form (3NF) Fourth Normal Form (4NF) If a relation is in a certain normal form, then it is known that certain kinds of problems are avoided or minimized 21
First Normal Form (1NF) Attribute domain is atomic if its elements are considered to be indivisible units (primitive attributes) Examples of non-atomic domains are multi-valued and composite attributes A relational schema R is in first normal form (1NF) if the domains of all attributes of R are atomic 22 We assume all relations are in 1NF
First Normal Form (1NF): Example 23 Since all attributes are primitive It is in 1NF
Boyce-Codd Normal Form (BCNF): Definition A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F + of the form α → β where α ⊆ R and β ⊆ R, then at least one of the following holds: α → β is trivial (i.e.,β ⊆ α) α is a superkey for R 24
BCNF: Example 25 sNumbersNamepNumberpName s1Davep1MM s2Gregp2ER s3Mikep1MM Student Student InfoProfessor Info Is relation Student in BCNF given FD: pNumber pName It is not trivial FD pNumber is not a key in Student relation How to fix it and make it in BCNF??? NO
Decomposing a Schema into BCNF If R is not in BCNF because of non-trivial dependency α → β, then decompose R R is decomposed into two relations R1 = (α U β ) -- α is super key in R1 R2 = (R- (β - α)) -- R2.α is foreign keys to R1.α 26
Example of BCNF Decomposition sNumbersNamepNumberpName s1Davep1MM s2Gregp2MM StudentProf FDs: pNumber pName sNumbersNamepNumber s1Davep1 s2Gregp2 Student pNumberpName p1MM p2MM Professor FOREIGN KEY: Student (PNum) references Professor (PNum) 27
What is Nice about this Decomposing ??? R is decomposed into two relations R1 = (α U β ) -- α is super key in R1 R2 = (R- (β - α)) -- R2.α is foreign keys to R1.α 28 This decomposition is lossless (Because R1 and R2 can be joined based on α, and α is unique in R1) This decomposition is lossless (Because R1 and R2 can be joined based on α, and α is unique in R1) When you join R1 and R2 on α, you get R back without lose of information
StudentProf = Student ⋈ Professor sNumbersNamepNumberpName s1Davep1MM s2Gregp2MM StudentProf FDs: pNumber pName sNumbersNamepNumber s1Davep1 s2Gregp2 Student pNumberpName p1MM p2MM Professor FOREIGN KEY: Student (PNum) references Professor (PNum) 29