COP 6726: New Directions in Database Systems Normalization
Outline. Goal: measure the quality of the logical model. Normalization: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Boyce-Codd Normal Form (BCNF).
What is the problem? z-no name address c-id c-name c-loc grade z1 James 5th st 1 Database CM128 A 2 Data mining CM25 3 Graph theory CM100 A+ 4 Optimization CM21 C z2 Paul 11th ave B 5 Network EE221 z3 Jin 9th st D
Normalization: Normalization is the process of decomposing large tables into smaller tables in order to eliminate redundant data and to avoid problems with inserting, deleting, or updating data. Goals: information preservation and minimum redundancy.
Top Down Approach z-no name address c-id c-name c-loc grade z-no name
Informal Guidelines: The semantics of attributes should be clear. Reduce redundant information in tuples. Reduce NULL values in tuples. Disallow the possibility of generating spurious tuples.
Insertion Anomalies: Insert a new employee: the employee works for the ‘Research’ department, or the employee does not work for any department. Insert a new department tuple: a department without an employee.
Deletion Anomalies: Delete a department, e.g. the ‘Research’ department.
Update Anomalies: Update a department, e.g. update the manager of the ‘Research’ department.
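The update anomaly can be illustrated with a small sketch. The table layout, names, and values below are hypothetical and not taken from the slides; the sketch assumes that employee and department data are stored together in one table, so the department manager is repeated in every employee row.

```python
# Hypothetical combined employee/department table (illustrative data only).
emp_dept = [
    {"Ename": "Smith", "Dname": "Research", "Dmgr_ssn": "333"},
    {"Ename": "Wong", "Dname": "Research", "Dmgr_ssn": "333"},
    {"Ename": "Zelaya", "Dname": "Administration", "Dmgr_ssn": "987"},
]


def managers_of(rows, dname):
    """Return the set of manager Ssn values recorded for a department."""
    return {row["Dmgr_ssn"] for row in rows if row["Dname"] == dname}


# A careless update that changes only the first matching row.
for row in emp_dept:
    if row["Dname"] == "Research":
        row["Dmgr_ssn"] = "888"
        break

# The department now appears to have two different managers.
research_managers = managers_of(emp_dept, "Research")
print(sorted(research_managers))
```

Because the manager is stored once per employee, every copy has to be changed together; missing one leaves the table inconsistent.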
Functional Dependency (FD): Given two tuples t1 and t2 and two attributes or sets of attributes (columns) X and Y, if t1[X] = t2[X], then t1[Y] = t2[Y]. Notation: X → Y. Examples: Ssn → {Ename, Bdate, Address, Dnumber}; Dnumber → {Dname, Dmgr_ssn}; {Ssn, Pnumber} → {Hours}; Ssn → Ename; Pnumber → {Pname, Plocation}.
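The definition can be tested directly on a relation instance. The following is a minimal sketch: the function name `fd_holds` and the sample rows are assumptions made for illustration. Note that an instance can only refute an FD; whether the FD truly holds depends on the meaning of the attributes.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x_value = tuple(row[a] for a in lhs)
        y_value = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for each X value.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Hypothetical sample rows (not from the slides).
works_on = [
    {"Ssn": "111", "Pnumber": 1, "Hours": 10, "Ename": "Smith"},
    {"Ssn": "111", "Pnumber": 2, "Hours": 20, "Ename": "Smith"},
    {"Ssn": "222", "Pnumber": 1, "Hours": 5, "Ename": "Wong"},
]

print(fd_holds(works_on, ["Ssn"], ["Ename"]))             # True
print(fd_holds(works_on, ["Ssn", "Pnumber"], ["Hours"]))  # True
print(fd_holds(works_on, ["Ssn"], ["Hours"]))             # False
```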
Functional Dependency (FD): If X is a candidate key of a relation R, then X → R. If X → Y holds in R, we cannot conclude that Y → X. Examples: Ssn → {Ename, Bdate, Address, Dnumber}; Dnumber → {Dname, Dmgr_ssn}; {Ssn, Pnumber} → {Hours}; Ssn → Ename; Pnumber → {Pname, Plocation}.
Functional Dependency (FD): A functional dependency is a property of the semantics or meaning of the attributes. Ssn → Ename: Ssn uniquely determines the employee name. Pnumber → {Pname, Plocation}: the project number uniquely determines the project name and location. {Ssn, Pnumber} → Hours: Ssn and Pnumber together uniquely determine the number of hours.
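One way to check whether a set of attributes X determines the whole relation (X → R) is the standard attribute-closure computation, which the slides do not cover explicitly. The sketch below assumes the three FDs listed above; the helper name `closure` and the attribute set `R` are chosen for illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left side is already determined.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


FDS = [
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
    ({"Ssn", "Pnumber"}, {"Hours"}),
]
R = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}

print(closure({"Ssn", "Pnumber"}, FDS) == R)  # True: {Ssn, Pnumber} -> R
print(sorted(closure({"Ssn"}, FDS)))          # ['Ename', 'Ssn']
```

Under these FDs, {Ssn, Pnumber} determines every attribute, while Ssn alone determines only itself and Ename.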
Normalization: The normalization process takes a relation schema through a series of tests to certify whether it satisfies a certain normal form. Goals: minimize redundancy; minimize insertion, deletion, and update anomalies. Normal forms (NF): 1NF, 2NF, 3NF, Boyce-Codd normal form (BCNF).
Normalization: Non-additive join (or lossless join) property: the spurious-tuple generation problem does not occur after decomposition. Dependency preservation property: each functional dependency is preserved after decomposition.
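The non-additive join property can be seen by decomposing a small relation two ways and joining the pieces back. This is only a sketch: the sample tuples, the two decompositions, and the helper names `relation`, `project`, and `natural_join` are all assumptions made for illustration.

```python
def relation(rows):
    """Represent a relation as a set of frozensets of (attribute, value)."""
    return {frozenset(row.items()) for row in rows}


def project(rel, attrs):
    """Project a relation onto attrs; duplicates collapse automatically."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}


def natural_join(left, right):
    """Join tuples that agree on all attributes they have in common."""
    joined = set()
    for l in left:
        for r in right:
            dl, dr = dict(l), dict(r)
            if all(dl[a] == dr[a] for a in dl.keys() & dr.keys()):
                joined.add(frozenset({**dl, **dr}.items()))
    return joined


# Hypothetical sample data.
emp_proj = relation([
    {"Ssn": "111", "Ename": "Smith", "Pnumber": 1, "Plocation": "Houston"},
    {"Ssn": "222", "Ename": "Wong", "Pnumber": 2, "Plocation": "Houston"},
])

# Decomposition sharing only Plocation: the join adds spurious tuples.
bad_join = natural_join(project(emp_proj, ["Ename", "Plocation"]),
                        project(emp_proj, ["Ssn", "Pnumber", "Plocation"]))
spurious = bad_join - emp_proj

# Decomposition sharing Ssn: the join returns exactly the original.
good_join = natural_join(project(emp_proj, ["Ssn", "Ename"]),
                         project(emp_proj, ["Ssn", "Pnumber", "Plocation"]))

print(len(spurious))
print(good_join == emp_proj)
```

In the first decomposition the shared attribute Plocation does not determine either side, so the join pairs each employee with every Houston project.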
Basic Concepts: Superkey. Candidate key. Primary key. A prime attribute is a member of some candidate key. A non-prime attribute is not a member of any candidate key.
Basic Concepts: Dependency preservation property: every functional dependency should be preserved after decomposition. Non-additive join (or lossless join) property.
First Normal Form: Each attribute holds a single atomic value.
First Normal Form
Second Normal Form: Every non-prime attribute is fully functionally dependent on every key. FD1: {Ssn, Pnumber} → Hours; FD2: Ssn → Ename; FD3: Pnumber → {Pname, Plocation}.
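A 2NF test can be sketched as a search for partial dependencies: non-prime attributes determined by a proper subset of the key. The sketch assumes that {Ssn, Pnumber} is the only candidate key; the function names are hypothetical, and `closure` is the standard attribute-closure helper.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, fds, attributes):
    """Map each proper subset of key to the non-prime attributes it fixes.

    Assumes key is the only candidate key, so non-prime = attributes - key.
    """
    non_prime = attributes - key
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_prime
            if determined:
                found[subset] = determined
    return found


FDS = [
    ({"Ssn", "Pnumber"}, {"Hours"}),      # FD1
    ({"Ssn"}, {"Ename"}),                 # FD2
    ({"Pnumber"}, {"Pname", "Plocation"}),  # FD3
]
ATTRS = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}

violations = partial_dependencies({"Ssn", "Pnumber"}, FDS, ATTRS)
for subset, attrs in violations.items():
    print(subset, "->", sorted(attrs))
```

FD2 and FD3 show up as partial dependencies, so under these assumptions the relation is not in 2NF, while Hours depends on the full key.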
Third Normal Form: For every nontrivial FD X → A that holds in the relation, either (a) X is a superkey or (b) A is a prime attribute. Transitive dependency example: FD1: Ssn → {Ename, Bdate, Address, Dnumber}; FD2: Dnumber → {Dname, Dmgr_ssn}.
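The 3NF condition can be checked mechanically for each FD. The sketch below assumes that the relation holds exactly the attributes named in FD1 and FD2 and that {Ssn} is its only candidate key; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(attributes, fds, candidate_keys):
    """List (X, A) pairs where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attributes
        for a in sorted(set(rhs) - set(lhs)):  # skip trivial dependencies
            if not is_superkey and a not in prime:
                violations.append((frozenset(lhs), a))
    return violations


ATTRS = {"Ssn", "Ename", "Bdate", "Address", "Dnumber", "Dname", "Dmgr_ssn"}
FDS = [
    ({"Ssn"}, {"Ename", "Bdate", "Address", "Dnumber"}),  # FD1
    ({"Dnumber"}, {"Dname", "Dmgr_ssn"}),                 # FD2
]

violations = third_nf_violations(ATTRS, FDS, [{"Ssn"}])
for lhs, attr in violations:
    print(sorted(lhs), "->", attr)
```

Only FD2 is reported: Dnumber is not a superkey and neither Dname nor Dmgr_ssn is prime, which is the transitive dependency Ssn → Dnumber → {Dname, Dmgr_ssn}.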
1NF, 2NF, 3NF
Example: FD1: Property_id# → {County_name, Lot#, Area, Price, Tax_rate}; FD2: {County_name, Lot#} → {Property_id#, Area, Price, Tax_rate}; FD3: County_name → Tax_rate; FD4: Area → Price.
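The candidate keys of this example can be found by brute force: try attribute subsets from smallest to largest and keep the minimal ones whose closure covers the whole relation. This is a sketch under the assumption that the relation contains exactly the six attributes named in FD1 to FD4; the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(sorted(attributes), size):
            candidate = frozenset(subset)
            # Skip supersets of keys already found: they are not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) >= attributes:
                keys.append(candidate)
    return keys


ATTRS = {"Property_id#", "County_name", "Lot#", "Area", "Price", "Tax_rate"}
FDS = [
    ({"Property_id#"},
     {"County_name", "Lot#", "Area", "Price", "Tax_rate"}),   # FD1
    ({"County_name", "Lot#"},
     {"Property_id#", "Area", "Price", "Tax_rate"}),          # FD2
    ({"County_name"}, {"Tax_rate"}),                          # FD3
    ({"Area"}, {"Price"}),                                    # FD4
]

keys = candidate_keys(ATTRS, FDS)
prime = set().union(*keys)
print([sorted(k) for k in keys])
print(sorted(ATTRS - prime))  # non-prime attributes
```

With the two keys {Property_id#} and {County_name, Lot#}, FD3 is a partial dependency on the second key (a 2NF problem), and FD4 has a non-superkey left side with a non-prime right side (a 3NF problem).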
Normalization (Top Down Approach)
BCNF (Boyce-Codd Normal Form): For every nontrivial FD X → A, X is a superkey. FD1: Property_id# → {County_name, Lot#, Area}; FD2: {County_name, Lot#} → {Property_id#, Area}; FD5: Area → County_name. Compare with Third Normal Form: for every nontrivial FD X → A, (a) X is a superkey or (b) A is a prime attribute.
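The difference between 3NF and BCNF shows up on FD5. The sketch below assumes the relation has exactly the four attributes in FD1, FD2, and FD5, with candidate keys {Property_id#} and {County_name, Lot#}; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return nontrivial FDs whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)
            and not closure(lhs, fds) >= attributes]


def third_nf_violations(attributes, fds, candidate_keys):
    """Return BCNF violations that also have a non-prime right-side attribute."""
    prime = set().union(*candidate_keys)
    return [(lhs, rhs) for lhs, rhs in bcnf_violations(attributes, fds)
            if (set(rhs) - set(lhs)) - prime]


ATTRS = {"Property_id#", "County_name", "Lot#", "Area"}
FDS = [
    ({"Property_id#"}, {"County_name", "Lot#", "Area"}),  # FD1
    ({"County_name", "Lot#"}, {"Property_id#", "Area"}),  # FD2
    ({"Area"}, {"County_name"}),                          # FD5
]
KEYS = [{"Property_id#"}, {"County_name", "Lot#"}]

not_bcnf = bcnf_violations(ATTRS, FDS)
not_3nf = third_nf_violations(ATTRS, FDS, KEYS)
print(not_bcnf)
print(not_3nf)
```

FD5 breaks BCNF because Area is not a superkey, but it does not break 3NF because County_name is a prime attribute.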
Take-Home Message: Functional dependency (FD). Non-additive join property. Dependency preservation property. Normalization: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Boyce-Codd Normal Form (BCNF).