Functional Dependencies and Normalization


Functional Dependencies and Normalization Instructor: Mohamed Eltabakh meltabakh@cs.wpi.edu

What to Cover
  Functional Dependencies (FDs)
  Closure of Functional Dependencies
  Lossy & Lossless Decomposition
  Normalization

Normalization
  First Normal Form (1NF)
  Boyce-Codd Normal Form (BCNF)
  Third Normal Form (3NF)
  Canonical Cover of FDs

Normalization
  A set of rules to avoid “bad” schema design:
    Decide whether a particular relation R is in “good” form
    If not, decompose R into relations that are in “good” form
  Several levels of normalization:
    First Normal Form (1NF)
    BCNF
    Third Normal Form (3NF)
    Fourth Normal Form (4NF)
  If a relation is in a certain normal form, then certain kinds of problems are known to be avoided or minimized

First Normal Form (1NF)
  An attribute domain is atomic if its elements are considered to be indivisible units (primitive attributes)
  Examples of non-atomic domains are multi-valued and composite attributes
  A relational schema R is in first normal form (1NF) if the domains of all attributes of R are atomic
  We assume all relations are in 1NF

First Normal Form (1NF): Example
  Since all attributes are primitive → the relation is in 1NF

Boyce-Codd Normal Form (BCNF): Definition
  A relation schema R is in BCNF with respect to a set F of functional dependencies if, for every functional dependency α → β in F+ with α ⊆ R and β ⊆ R, at least one of the following holds:
    α → β is trivial (i.e., β ⊆ α)
    α is a superkey for R
  Remember: candidate keys are also superkeys
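The definition can be turned into a small check. The following is a minimal sketch, not part of the original slides: the helper names `closure` and `bcnf_violations` are made up for illustration, and the FD sNumber → sName pNumber is an assumption (the slides only say that pNumber is not a key of Student). It relies on the standard result that, for the original relation R, it is enough to test the FDs given in F rather than all of F+.

```python
def closure(attrs, fds):
    """Return the attribute closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the FDs in `fds` that are neither trivial nor have a superkey on the left."""
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= relation
        if not trivial and not superkey:
            violations.append((lhs, rhs))
    return violations


student = {"sNumber", "sName", "pNumber", "pName"}
fds = [
    ({"sNumber"}, {"sName", "pNumber"}),  # assumption: sNumber is the key
    ({"pNumber"}, {"pName"}),             # FD given on the slides
]
print(bcnf_violations(student, fds))
```

Under these assumptions the only FD reported is pNumber → pName, which is the one the example slides single out.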

BCNF: Example
  Student (student info: sNumber, sName; professor info: pNumber, pName)
    sNumber  sName  pNumber  pName
    s1       Dave   p1       MM
    s2       Greg   p2       ER
    s3       Mike
  Is relation Student in BCNF given pNumber → pName? NO
    It is not a trivial FD
    pNumber is not a key in the Student relation
  How to fix it and make it BCNF?

Decomposing a Schema into BCNF
  If R is not in BCNF because of a non-trivial dependency α → β, then decompose R into two relations:
    R1 = (α ∪ β)         -- α is a superkey in R1
    R2 = (R − (β − α))   -- R2.α is a foreign key referencing R1.α

Example of BCNF Decomposition
  FDs: pNumber → pName
  StudentProf
    sNumber  sName  pNumber  pName
    s1       Dave   p1       MM
    s2       Greg   p2
  Student
    sNumber  sName  pNumber
    s1       Dave   p1
    s2       Greg   p2
  Professor
    pNumber  pName
    p1       MM
    p2
  FOREIGN KEY: Student (pNumber) references Professor (pNumber)

What Is Nice about This Decomposition?
  R is decomposed into two relations:
    R1 = (α ∪ β)         -- α is a superkey in R1
    R2 = (R − (β − α))   -- R2.α is a foreign key referencing R1.α
  This decomposition is lossless (because R1 and R2 can be joined on α, and α is unique in R1)
  When you join R1 and R2 on α, you get R back without loss of information

StudentProf = Student ⋈ Professor
  FDs: pNumber → pName
  StudentProf
    sNumber  sName  pNumber  pName
    s1       Dave   p1       MM
    s2       Greg   p2
  Student
    sNumber  sName  pNumber
    s1       Dave   p1
    s2       Greg   p2
  Professor
    pNumber  pName
    p1       MM
    p2
  The BCNF decomposition rule creates a lossless decomposition
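The lossless property can be checked on sample data. The sketch below is illustrative only: the helpers `project`, `natural_join` and `same_rows` are hypothetical names, and the rows are sample values based on the earlier BCNF example (the professor values for s3 are an assumption). It projects StudentProf onto the two BCNF relations, joins them back on pNumber, and contrasts that with a split that shares no attribute.

```python
def project(rows, attrs):
    """Project dict rows onto `attrs`, dropping duplicate rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Join two lists of dict rows on all attributes they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined


def same_rows(a, b):
    """Compare two relations row by row, ignoring row and column order."""
    def canon(rows):
        return sorted(sorted(r.items()) for r in rows)
    return canon(a) == canon(b)


# Sample rows; the pNumber/pName values for s3 are assumed for illustration.
student_prof = [
    {"sNumber": "s1", "sName": "Dave", "pNumber": "p1", "pName": "MM"},
    {"sNumber": "s2", "sName": "Greg", "pNumber": "p2", "pName": "ER"},
    {"sNumber": "s3", "sName": "Mike", "pNumber": "p1", "pName": "MM"},
]

# BCNF decomposition on pNumber -> pName: R1 = Professor, R2 = Student.
student = project(student_prof, ["sNumber", "sName", "pNumber"])
professor = project(student_prof, ["pNumber", "pName"])
lossless = natural_join(student, professor)

# A split with no shared attribute degenerates into a cross product.
names = project(student_prof, ["sNumber", "sName"])
lossy = natural_join(names, professor)

print(same_rows(lossless, student_prof), len(lossy))
```

The first join reproduces StudentProf exactly, while the second produces spurious rows, which is what makes a decomposition lossy.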

Multi-Step Decomposition
  Relation R and set of functional dependencies F:
    R = (customer_name, loan_number, branch_name, branch_city, assets, amount)
    F = {branch_name → assets branch_city, loan_number → amount branch_name}
  Is R in BCNF? NO
  Based on branch_name → assets branch_city:
    R1 = (branch_name, assets, branch_city)
    R2 = (customer_name, loan_number, branch_name, amount)
  Are R1 and R2 in BCNF? R2 is not
  Divide R2 based on loan_number → amount branch_name:
    R3 = (loan_number, amount, branch_name)
    R4 = (customer_name, loan_number)
  Final schema has R1, R3, R4
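The repeated splitting can be sketched as a recursive procedure. This is a minimal illustration under stated assumptions, not the algorithm from the slides: `find_violation` and `bcnf_decompose` are hypothetical names, and the violation search tries every subset of attributes (exponential, acceptable only for small classroom schemas) so that FDs implied on a sub-relation are found too. The result can depend on which violation is picked first.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(relation, fds):
    """Return (alpha, beta) for some BCNF violation on `relation`, or None."""
    attrs = sorted(relation)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            alpha = set(combo)
            determined = closure(alpha, fds) & relation
            beta = determined - alpha
            # Non-trivial (beta not empty) and alpha is not a superkey of `relation`.
            if beta and determined != relation:
                return alpha, beta
    return None


def bcnf_decompose(relation, fds):
    """Split on a violating FD until every piece is in BCNF."""
    violation = find_violation(relation, fds)
    if violation is None:
        return [relation]
    alpha, beta = violation
    r1 = alpha | beta      # R1 = alpha U beta
    r2 = relation - beta   # R2 = R - (beta - alpha); beta already excludes alpha
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


R = {"customer_name", "loan_number", "branch_name", "branch_city", "assets", "amount"}
F = [
    ({"branch_name"}, {"assets", "branch_city"}),
    ({"loan_number"}, {"amount", "branch_name"}),
]
final_schema = bcnf_decompose(R, F)
for rel in final_schema:
    print(sorted(rel))
```

On this input the sketch ends with the same three relations as the slide: R1, R3 and R4.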

What Is NOT Nice about BCNF
  Before decomposition, we had a set of functional dependencies (say F)
  After decomposition, do we still have the same set of FDs, or did we lose something?

What Is NOT Nice about BCNF
  Dependency preservation: after the decomposition, all FDs in F+ should be preserved
  BCNF does not guarantee dependency preservation
  Can we always find a decomposition that is both in BCNF and dependency preserving?
    No: such a decomposition may not exist
  That is why we study a weaker normal form, called third normal form (3NF)

Dependency Preserving
  Assume R is decomposed into R1 and R2
  Dependencies of R1 and R2 include:
    Local dependencies α → β: all columns of α and β must be in a single relation
    Global dependencies: use the transitivity property to form more FDs across the R1 and R2 relations
  Do these dependencies match the ones in R?
    Yes → dependency preserving
    No → not dependency preserving

Example of Lost FD
  Assume relation R(C, S, J, D, T, Q, V)
  C is key, JT → C and SD → T
    C → CSJDTQV (C is key) -- good for BCNF
    JT → CSJDTQV (JT is key) -- good for BCNF
    SD → T (SD is not a key) -- bad for BCNF
  Decomposition: R1(C, S, J, D, Q, V) and R2(S, D, T)
    Lossless & in BCNF
  Does C → CSJDTQV still exist?
    Yes: C → CSJDQV (local), SD → T (local), C → CSJDQVT (global)

Example of Lost FD (Cont’d)
  Assume relation R(C, S, J, D, T, Q, V)
  C is key, JT → C and SD → T
    C → CSJDTQV (C is key) -- good for BCNF
    JT → CSJDTQV (JT is key) -- good for BCNF
    SD → T (SD is not a key) -- bad for BCNF
  Decomposition: R1(C, S, J, D, Q, V) and R2(S, D, T)
    Lossless & in BCNF
  Does SD → T still exist?
    Yes: SD → T (local)

Example of Lost FD (Cont’d)
  Assume relation R(C, S, J, D, T, Q, V)
  C is key, JT → C and SD → T
    C → CSJDTQV (C is key) -- good for BCNF
    JT → CSJDTQV (JT is key) -- good for BCNF
    SD → T (SD is not a key) -- bad for BCNF
  Decomposition: R1(C, S, J, D, Q, V) and R2(S, D, T)
    Lossless & in BCNF
  Does JT → CSJDTQV still exist?
    No, this one is lost (there is no way to derive it from the local FDs)

Dependency Preservation Test
  Assume R is decomposed into R1 and R2
  The closure of FDs in R is F+
  The FDs in R1 and R2 are FR1 (local dependencies in R1) and FR2 (local dependencies in R2), respectively
  Then dependencies are preserved if:
    F+ = (FR1 ∪ FR2)+

Back to Our Example
  Assume relation R(C, S, J, D, T, Q, V)
  C is key, JT → C and SD → T
    C → CSJDTQV (C is key) -- good for BCNF
    JT → CSJDTQV (JT is key) -- good for BCNF
    SD → T (SD is not a key) -- bad for BCNF
  Decomposition: R1(C, S, J, D, Q, V) and R2(S, D, T)
  F+ = {C → CSJDTQV, JT → CSJDTQV, SD → T}
  FR1 = {C → CSJDQV} (local for R1)
  FR2 = {SD → T} (local for R2)
  FR1 ∪ FR2 = {C → CSJDQV, SD → T}
  (FR1 ∪ FR2)+ = {C → CSJDQV, SD → T, C → T}
  JT → C is still missing
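The same check can be run mechanically. The sketch below uses the standard textbook test that avoids materializing F+: starting from the left-hand side of an FD, it repeatedly adds whatever each decomposed relation can derive locally. The function name `is_preserved` is made up for illustration, and this is one possible formulation rather than the procedure shown on the slides.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(fd, relations, fds):
    """True if `fd` follows from dependencies that are local to single relations."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in relations:
            # What the attributes known so far determine inside this relation.
            gained = closure(result & rel, fds) & rel
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


R = set("CSJDTQV")
F = [
    ({"C"}, set(R)),        # C is a key
    ({"J", "T"}, {"C"}),    # JT -> C
    ({"S", "D"}, {"T"}),    # SD -> T
]
R1, R2 = set("CSJDQV"), set("SDT")

for fd in F:
    lhs, rhs = fd
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)), is_preserved(fd, [R1, R2], F))
```

For this decomposition the test reports the FDs with left-hand sides C and SD as preserved and JT → C as lost, matching the conclusion above.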

Dependency Preservation
  A BCNF decomposition does not necessarily preserve FDs
  A 3NF decomposition that preserves all FDs is guaranteed to exist

Third Normal Form: Motivation
  There are some situations where BCNF is not dependency preserving
  Solution: define a weaker normal form, called Third Normal Form (3NF)
    Allows some redundancy (we will see examples later)
    But all FDs are preserved
  There is always a lossless, dependency-preserving decomposition in 3NF

Normal Form: 3NF
  Relation R is in 3NF if, for every FD α → β in F+, where α ⊆ R and β ⊆ R, at least one of the following holds:
    α → β is trivial (i.e., β ⊆ α)
    α is a superkey for R
    Each attribute in β − α is part of a candidate key (prime attribute)
  In short: L.H.S is a superkey OR R.H.S consists of prime attributes

Testing for 3NF
  For each dependency α → β, use attribute closure to check whether α is a superkey
  If α is not a superkey, verify that each attribute in (β − α) is contained in a candidate key of R

3NF: Example
  Lot (ID, county, lotNum, area, price, taxRate)
  Primary key: ID
  Candidate key: <county, lotNum>
  FDs: county → taxRate, area → price
  Is relation Lot in 3NF? NO
  Decomposition based on county → taxRate:
    Lot (ID, county, lotNum, area, price)
    County (county, taxRate)
  Are relations Lot and County in 3NF? Lot is not

3NF: Example (Cont’d)
  Lot (ID, county, lotNum, area, price)
  County (county, taxRate)
  Candidate key for Lot: <county, lotNum>
  FDs: county → taxRate, area → price
  Decompose Lot based on area → price:
    Lot (ID, county, lotNum, area)
    County (county, taxRate)
    Area (area, price)
  Is every relation in 3NF? YES
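The 3NF test can be sketched on the Lot example. This is an illustrative sketch with hypothetical helper names (`candidate_keys`, `violates_3nf`); the two key dependencies (ID → all attributes, county lotNum → all attributes) are written out explicitly as an assumption, because the slides state the keys rather than listing them as FDs. Candidate keys are found by brute force, which is fine only for small schemas, and only the given FDs are tested, which suffices for the relation the FDs were stated on.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(relation, fds):
    """All minimal attribute sets whose closure covers `relation` (brute force)."""
    keys = []
    attrs = sorted(relation)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) >= relation:
                keys.append(cand)
    return keys


def violates_3nf(relation, fds):
    """Return the FDs in `fds` that fail all three 3NF conditions."""
    prime = set().union(*candidate_keys(relation, fds))
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial
        if closure(lhs, fds) >= relation:
            continue  # left side is a superkey
        if (rhs - lhs) <= prime:
            continue  # right side consists of prime attributes
        bad.append((lhs, rhs))
    return bad


lot = {"ID", "county", "lotNum", "area", "price", "taxRate"}
lot_fds = [
    ({"ID"}, set(lot)),                 # assumption: key written as an FD
    ({"county", "lotNum"}, set(lot)),   # assumption: key written as an FD
    ({"county"}, {"taxRate"}),
    ({"area"}, {"price"}),
]
print(violates_3nf(lot, lot_fds))

final_lot = {"ID", "county", "lotNum", "area"}
final_fds = [({"ID"}, set(final_lot)), ({"county", "lotNum"}, set(final_lot))]
print(violates_3nf(final_lot, final_fds))
```

With these inputs the original Lot reports county → taxRate and area → price as violations, and the final Lot relation reports none.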

Comparison between 3NF & BCNF
  If R is in BCNF, then R is also in 3NF
  If R is in 3NF, R may not be in BCNF
  3NF allows some redundancy and is weaker than BCNF
  3NF is a compromise to use when BCNF with good constraint enforcement is not achievable
  Important: a lossless, dependency-preserving decomposition of R into a collection of 3NF relations is always possible

Canonical Cover of FDs
  Given a set of FDs (F) with functional closure F+
  Canonical cover (minimal cover) = G
    G is the smallest set of FDs that produces the same closure, i.e., G+ = F+
    There are no extra attributes in the L.H.S or R.H.S of any dependency in G
  Every FD in the canonical cover is needed; otherwise some dependencies are lost

Example: Canonical Cover
  Given F: A → B, ABCD → E, EF → GH, ACDF → EG
  Then the canonical cover G: A → B, ACD → E, EF → GH
  G is the smallest (minimal) set of FDs that can generate F+

Computing the Canonical Cover
  Given a set of functional dependencies F, how do we compute the canonical cover G?

Example: Canonical Cover (Let's Check L.H.S)
  Given F = {A → B, ABCD → E, EF → G, EF → H, ACDF → EG}
  Union step: {A → B, ABCD → E, EF → GH, ACDF → EG}
  Test ABCD → E
    Check A: {BCD}+ = {B C D} → A cannot be deleted
    Check B: {ACD}+ = {A B C D E} → B can be deleted
  Now the set is: {A → B, ACD → E, EF → GH, ACDF → EG}
  Test ACD → E
    Check C: {AD}+ = {A B D} → C cannot be deleted
    Check D: {AC}+ = {A B C} → D cannot be deleted

Example: Canonical Cover (Let's Check L.H.S, Cont’d)
  Now the set is: {A → B, ACD → E, EF → GH, ACDF → EG}
  Test EF → GH
    Check E: {F}+ = {F} → E cannot be deleted
    Check F: {E}+ = {E} → F cannot be deleted
  Test ACDF → EG
    None of the L.H.S attributes can be deleted

Example: Canonical Cover (Lets Check R.H.S) Now the set is: {A  B, ACD  E, EF  GH, ACDF  EG} Test EF  GH Check G: {EF}+ = {E F H}  G cannot be deleted Check H: {EF}+ = {E F G}  H cannot be deleted Test ACDF  EG Check E: {ACDF}+ = {A B C D F E G}  E can be deleted Now the set is: {A  B, ACD  E, EF  GH, ACDF  G} 8

Example: Canonical Cover (Let's Check R.H.S, Cont’d)
  Now the set is: {A → B, ACD → E, EF → GH, ACDF → G}
  Test ACDF → G
    Check G: with G dropped from the R.H.S, {ACDF}+ = {A B C D E F G H} → G can be deleted
  Now the set is: {A → B, ACD → E, EF → GH}
  The canonical cover is: {A → B, ACD → E, EF → GH}
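The walk-through above can be sketched in code. This is a minimal illustration, not the exact procedure from the slides: the function name `canonical_cover` is hypothetical, and the sketch handles L.H.S and R.H.S attributes one FD at a time (the slides check all left-hand sides first), repeating until nothing changes. Canonical covers are not unique in general, so another order of checks may give a different but equivalent result.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    """Merge FDs with equal left sides and strip extraneous attributes."""
    cover = [(set(l), set(r)) for l, r in fds]
    changed = True
    while changed:
        changed = False
        # Union step: merge FDs sharing a left side, drop FDs whose right side is empty.
        merged = {}
        for lhs, rhs in cover:
            if rhs:
                merged.setdefault(frozenset(lhs), set()).update(rhs)
        cover = [(set(l), r) for l, r in merged.items()]
        for lhs, rhs in cover:
            # L.H.S: `a` is extraneous if the remaining attributes still determine rhs.
            for a in sorted(lhs):
                if len(lhs) > 1 and rhs <= closure(lhs - {a}, cover):
                    lhs.discard(a)
                    changed = True
            # R.H.S: `a` is extraneous if lhs still determines it once it is dropped.
            for a in sorted(rhs):
                rhs.discard(a)
                if a in closure(lhs, cover):
                    changed = True
                else:
                    rhs.add(a)
    return cover


F = [
    (set("A"), set("B")),
    (set("ABCD"), set("E")),
    (set("EF"), set("G")),
    (set("EF"), set("H")),
    (set("ACDF"), set("EG")),
]
G = canonical_cover(F)
for lhs, rhs in G:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

On the example F this ends with A → B, ACD → E and EF → GH, the same cover the slides arrive at.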

Canonical Cover
  Used to find the smallest (minimal) set of FDs that has the same closure as the original set
  Used in the decomposition of relations into 3NF
  The resulting decomposition is lossless and dependency preserving
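How a canonical cover feeds a 3NF decomposition can be sketched with the standard textbook synthesis procedure, which is not spelled out on the slides: create one relation per FD in the cover, add a relation holding a candidate key if no relation already contains one, and drop relations contained in others. The function names and the choice of R(A, ..., H) as the relation for the earlier example cover are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def one_candidate_key(relation, fds):
    """Shrink the full attribute set down to one candidate key."""
    key = set(relation)
    for a in sorted(relation):
        if closure(key - {a}, fds) >= relation:
            key.discard(a)
    return key


def synthesize_3nf(relation, cover):
    """Build a 3NF schema from `cover`, assumed to be a canonical cover on `relation`."""
    schemas = []
    for lhs, rhs in cover:
        candidate = lhs | rhs  # one relation per FD keeps that FD local
        if not any(candidate <= s for s in schemas):
            schemas.append(candidate)
    # For a lossless join, some relation must contain a candidate key of `relation`.
    if not any(closure(s, cover) >= relation for s in schemas):
        schemas.append(one_candidate_key(relation, cover))
    # Drop relations that are strictly contained in another one.
    return [s for s in schemas if not any(s < other for other in schemas)]


R = set("ABCDEFGH")  # assumed relation for the example cover
G = [(set("A"), set("B")), (set("ACD"), set("E")), (set("EF"), set("GH"))]
for schema in synthesize_3nf(R, G):
    print("".join(sorted(schema)))
```

For this cover the sketch yields AB, ACDE and EFGH, plus ACDF as the relation holding a candidate key.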

Done with Normalization
  First Normal Form (1NF)
  Boyce-Codd Normal Form (BCNF)
  Third Normal Form (3NF)
  Canonical Cover of FDs

Questions?