Further Normalization I
Chapter 12
1NF, 2NF, 3NF, BCNF
Topics in this Chapter
Nonloss Decomposition and Functional Dependencies
First, Second, and Third Normal Forms
Dependency Preservation
Boyce/Codd Normal Form
A Note on Relation-Valued Attributes
Normalization and Database Design
The “normal forms” represent stages in achieving a more desirable design. (“More desirable” means more robust, with greater integrity.) First normal form (1NF) is what we achieved by specifying that relations contain single-valued attributes only (each tuple has exactly one value for each attribute). So relations are always in (at least) 1NF.
Normalization and Database Design
Additional constraints that produce “further normalization” lead to the other designations (2NF, 3NF, etc.). Each “higher” normal form (2nd, 3rd, etc.) includes the previous ones; i.e., a relvar in “third normal form” is necessarily also in 2NF and in 1NF.
Normalization
“Normalized” and 1NF mean the same thing; “normalized” is frequently (and incorrectly) used to mean 3NF.
Normalization helps control redundancy.
Normalization is reversible, i.e., nonloss (information-preserving).
Six normal forms are discussed: 1NF through 5NF, plus Boyce/Codd Normal Form (BCNF), a stronger version of 3NF.
First Normal Form
A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute. By this definition, relvars are always in 1NF.
A relvar in 1NF may exhibit functional dependencies other than those emanating from the primary key. Such non-primary-key dependencies promote a miasma of update anomalies.
The suppliers and parts database
S
+------+-------+--------+--------+
| snum | sname | status | city   |
+------+-------+--------+--------+
| S1   | Smith |     20 | London |
| S2   | Jones |     10 | Paris  |
| S3   | Blake |     30 | Paris  |
| S4   | Clark |     20 | London |
| S5   | Adams |     30 | Athens |
+------+-------+--------+--------+
SP
+------+------+-----+
| snum | pnum | qty |
+------+------+-----+
| S1   | P1   | 300 |
| S1   | P2   | 200 |
| S1   | P3   | 400 |
| S1   | P4   | 200 |
| S1   | P5   | 100 |
| S1   | P6   | 100 |
| S2   | P1   | 300 |
| S2   | P2   | 400 |
| S3   | P2   | 200 |
| S4   | P2   | 200 |
| S4   | P4   | 300 |
| S4   | P5   | 400 |
+------+------+-----+
P
+------+-------+-------+--------+--------+
| pnum | pname | color | weight | city   |
+------+-------+-------+--------+--------+
| P1   | Nut   | Red   |   12.0 | London |
| P2   | Bolt  | Green |   17.0 | Paris  |
| P3   | Screw | Blue  |   17.0 | Rome   |
| P4   | Screw | Red   |   14.0 | London |
| P5   | Cam   | Blue  |   12.0 | Paris  |
| P6   | Cog   | Red   |   19.0 | London |
+------+-------+-------+--------+--------+
Just “looks” right, because it is. Satisfies all normal forms.
The table “SCP”: recording supplier city in SCP rather than in S
+------+--------+------+-----+
| snum | scity  | pnum | qty |
+------+--------+------+-----+
| S1   | London | P1   | 300 |
| S1   | London | P2   | 200 |
| S2   | Paris  | P1   | 300 |
| S2   | Paris  | P2   | 400 |
| S3   | Paris  | P2   | 200 |
| S4   | London | P2   | 200 |
| S4   | London | P4   | 300 |
| S4   | London | P5   | 400 |
+------+--------+------+-----+
Primary key: {snum, pnum}
Redundancy! Update problems:
how to change S4’s city (in three places)?
how to record the city of a new supplier for whom there are no shipments?
Second Normal Form
A relation violates 2NF if a nonkey attribute is a fact about a proper subset of a key.
A relation satisfies 2NF (is in 2NF) if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key, i.e., dependent on the entire primary key and not on any proper subset of it.
Second Normal Form
A relvar is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key (assumes only one candidate key).
A relvar in 2NF is less susceptible to update anomalies, but may still exhibit transitive dependencies.
In a transitive dependency, A → B and B → C hold but B does not determine A: both B and C depend on the primary key A, yet C depends on A only by way of the nonkey attribute B.
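The 2NF condition can be probed mechanically. The Python sketch below checks the sample SCP rows for nonkey attributes that depend on a proper subset of the primary key {snum, pnum}. The helper names `fd_holds` and `partial_dependencies` are hypothetical, and a single table instance can only refute an FD: one that happens to hold in these eight rows is suggested, not proven.

```python
from itertools import combinations

# Sample SCP rows from the slides: (snum, scity, pnum, qty).
HEADER = ("snum", "scity", "pnum", "qty")
SCP = [dict(zip(HEADER, row)) for row in [
    ("S1", "London", "P1", 300), ("S1", "London", "P2", 200),
    ("S2", "Paris", "P1", 300), ("S2", "Paris", "P2", 400),
    ("S3", "Paris", "P2", 200), ("S4", "London", "P2", 200),
    ("S4", "London", "P4", 300), ("S4", "London", "P5", 400),
]]


def fd_holds(rows, lhs, rhs):
    """True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        value = tuple(r[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, primary_key, nonkey):
    """(subset, attribute) pairs where a nonkey attribute depends on a proper subset of the key."""
    found = []
    for size in range(1, len(primary_key)):
        for subset in combinations(primary_key, size):
            for attr in nonkey:
                if fd_holds(rows, subset, (attr,)):
                    found.append((subset, attr))
    return found


print(partial_dependencies(SCP, ("snum", "pnum"), ("scity", "qty")))
# [(('snum',), 'scity')]
```

The single violation found, snum → scity, is exactly the redundancy pointed out on the SCP slide.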
The table “Employees”
In 1NF (indeed in 2NF, since the key Emp_Id is a single attribute), but not good:
+--------+-----------+-------+------------+
| Emp_Id | Emp_Name  | Dept# | DeptName   |
+--------+-----------+-------+------------+
| A001   | Johnson   |    10 | Accounting |
| A023   | Chung     |    10 | Accounting |
| C085   | Allen     |    10 | Accounting |
| B120   | Gomez     |    20 | Sales      |
| B211   | Davis     |    20 | Sales      |
| A227   | Greenberg |    40 | Production |
| C340   | Brown     |    40 | Production |
| C389   | Lopez     |    40 | Production |
| C395   | Clark     |    40 | Production |
| A502   | Edwards   |    20 | Sales      |
| A616   | Scott     |    40 | Production |
| A700   | Sanyo     |    60 | Delivery   |
| A722   | Adams     |    20 | Sales      |
+--------+-----------+-------+------------+
REDUNDANCY! And update problems:
change the name of a department? (multiple updates required)
eliminate employee Sanyo? (then what is the name of Dept 60?)
Update Anomalies
“Update anomalies” come in three kinds:
An INSERT anomaly occurs when the user wishes to record a subordinate fact that does not depend on the whole primary key (e.g., recording a supplier’s city before the supplier supplies any part).
A DELETE anomaly, conversely, occurs when deleting one fact inadvertently deletes another (e.g., deleting a supplier’s last shipment also deletes the supplier’s city).
An UPDATE anomaly occurs when many updates are required to record a single fact (e.g., changing S4’s city in three tuples of SCP).
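All three anomalies can be reproduced on the SCP rows. In this Python sketch the `insert` helper is hypothetical: it simply refuses a tuple whose primary-key attributes are not all supplied, which is what blocks recording S5’s city (S5 is in Athens and ships nothing).

```python
# Sample SCP rows from the slides: (snum, scity, pnum, qty).
HEADER = ("snum", "scity", "pnum", "qty")
SCP = [dict(zip(HEADER, row)) for row in [
    ("S1", "London", "P1", 300), ("S1", "London", "P2", 200),
    ("S2", "Paris", "P1", 300), ("S2", "Paris", "P2", 400),
    ("S3", "Paris", "P2", 200), ("S4", "London", "P2", 200),
    ("S4", "London", "P4", 300), ("S4", "London", "P5", 400),
]]
KEY = ("snum", "pnum")


def insert(rows, row):
    """Hypothetical insert: reject a tuple whose primary key is incomplete."""
    if any(row.get(a) is None for a in KEY):
        raise ValueError("every primary-key attribute needs a value")
    rows.append(row)


# UPDATE anomaly: moving S4 to another city means changing every S4 tuple.
s4_rows = [r for r in SCP if r["snum"] == "S4"]
print(len(s4_rows))  # 3

# DELETE anomaly: S3 has one shipment, so deleting it also loses S3's city.
after_delete = [r for r in SCP if (r["snum"], r["pnum"]) != ("S3", "P2")]
print(any(r["snum"] == "S3" for r in after_delete))  # False

# INSERT anomaly: S5 has no shipment, hence no pnum, so its city cannot be stored.
try:
    insert(SCP, {"snum": "S5", "scity": "Athens", "pnum": None, "qty": None})
except ValueError as err:
    print("rejected:", err)
```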
The table “Employees”: a transitive dependency
In the “Employees” table above:
Emp_Id → Dept#
Dept# → DeptName
so Emp_Id transitively determines DeptName (Emp_Id → DeptName).
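A transitive dependency falls out of the standard attribute-closure computation: starting from a set of attributes, repeatedly add the right side of any FD whose left side is already covered. A minimal Python sketch, assuming the Employees FDs Emp_Id → {Emp_Name, Dept#} and Dept# → DeptName; `closure` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


EMPLOYEES_FDS = [
    ({"Emp_Id"}, {"Emp_Name", "Dept#"}),
    ({"Dept#"}, {"DeptName"}),
]

# DeptName is reached from Emp_Id only by way of Dept#.
print(sorted(closure({"Emp_Id"}, EMPLOYEES_FDS)))
# ['Dept#', 'DeptName', 'Emp_Id', 'Emp_Name']

# Dept# does not determine Emp_Id, so the dependency is genuinely transitive.
print("Emp_Id" in closure({"Dept#"}, EMPLOYEES_FDS))  # False
```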
Mutually Independent Attributes
Nonkey attributes are “mutually independent” if no one of them is functionally dependent on any combination of the others (assuming only one candidate key).
Mutually independent => no transitive dependencies, such as Emp_Id → Dept# and Dept# → DeptName.
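Mutual independence can be tested with the same closure idea: a nonkey attribute fails the test if the closure of all the other nonkey attributes contains it. The Python sketch below uses the Employees FDs; `dependent_nonkeys` is a hypothetical helper, and the second call assumes the decomposed Employees {Emp_Id, Emp_Name, Dept#}.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def dependent_nonkeys(nonkey, fds):
    """Nonkey attributes determined by some combination of the other nonkey attributes."""
    # Closure is monotone, so testing the set of *all* other nonkey attributes
    # covers every combination of them at once.
    return sorted(a for a in nonkey if a in closure(set(nonkey) - {a}, fds))


FDS = [({"Emp_Id"}, {"Emp_Name", "Dept#"}), ({"Dept#"}, {"DeptName"})]

print(dependent_nonkeys({"Emp_Name", "Dept#", "DeptName"}, FDS))  # ['DeptName']
print(dependent_nonkeys({"Emp_Name", "Dept#"}, FDS))  # []
```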
Third Normal Form
A relation violates 3NF if some nonkey attribute is a fact about another nonkey attribute.
A relation is in 3NF if it is in 2NF and its nonkey attributes are mutually independent.
Equivalently, a relation satisfies 3NF if it is in 2NF (and therefore also in 1NF) and every attribute is either part of the key or provides a fact about the key (all of it) and nothing else.
Third Normal Form
A relvar is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key (assumes only one candidate key).
The process of normalization is a series of projections that eliminate unwanted functional dependencies (those that do not emanate from the primary key).
Such projections must be recombinable, via JOIN, into the original relvar.
Third Normal Form
A table is in 3NF if every column is either the key, part of the key, or a fact about the key, the whole key, and nothing but the key.
Third Normal Form
A relvar is in 3NF if and only if the nonkey attributes are both mutually independent and irreducibly dependent on the primary key.
A relvar is in 3NF if and only if, for all time, each tuple consists of a primary key value that identifies some entity, together with a set of zero or more mutually independent attribute values that describe that entity in some way.
Nonloss Decomposition and Functional Dependencies
Normalization uses projection to decompose relvars; recomposition is done by joins.
The decomposition of relvar R into projections R1, …, Rn is nonloss if R equals the join of R1, …, Rn.
The normalization procedure can be seen as a method for eliminating functional dependencies that do not emanate from a candidate key.
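Heath’s theorem gives a way to know a split is nonloss without looking at data: if R{A, B, C} satisfies A → B, then R is the join of its projections on {A, B} and {A, C}. The Python sketch below applies that idea to a two-way split, asking whether the shared attributes determine all of one projection. It is a sufficient test only; the SCP FDs (snum → scity, {snum, pnum} → qty) and the helper `heath_nonloss` are assumptions for illustration.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def heath_nonloss(r1, r2, fds):
    """Sufficient test: the shared attributes determine all of r1 or all of r2."""
    determined = closure(r1 & r2, fds)
    return r1 <= determined or r2 <= determined


SCP_FDS = [({"snum"}, {"scity"}), ({"snum", "pnum"}, {"qty"})]

# Split on snum, as in the slides: guaranteed nonloss.
print(heath_nonloss({"snum", "scity"}, {"snum", "pnum", "qty"}, SCP_FDS))  # True
# Split on scity instead: the test cannot vouch for it.
print(heath_nonloss({"snum", "scity"}, {"scity", "pnum", "qty"}, SCP_FDS))  # False
```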
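The table “SCP”: decompose by projection
+------+--------+------+-----+
| snum | scity  | pnum | qty |
+------+--------+------+-----+
| S1   | London | P1   | 300 |
| S1   | London | P2   | 200 |
| S2   | Paris  | P1   | 300 |
| S2   | Paris  | P2   | 400 |
| S3   | Paris  | P2   | 200 |
| S4   | London | P2   | 200 |
| S4   | London | P4   | 300 |
| S4   | London | P5   | 400 |
+------+--------+------+-----+
Projections “S” and “SP”:
+------+--------+
| snum | scity  |
+------+--------+
| S1   | London |
| S2   | Paris  |
| S3   | Paris  |
| S4   | London |
+------+--------+
+------+------+-----+
| snum | pnum | qty |
+------+------+-----+
| S1   | P1   | 300 |
| S1   | P2   | 200 |
| S2   | P1   | 300 |
| S2   | P2   | 400 |
| S3   | P2   | 200 |
| S4   | P2   | 200 |
| S4   | P4   | 300 |
| S4   | P5   | 400 |
+------+------+-----+
The decomposition is lossless, since a join of the two tables reproduces the original.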
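The losslessness claim can be checked directly on the data. This Python sketch implements projection (with duplicate elimination) and natural join over lists of dicts; `project`, `natural_join`, and `as_set` are hypothetical helpers. For contrast it also joins on scity, which produces spurious tuples.

```python
# Sample SCP rows from the slides: (snum, scity, pnum, qty).
HEADER = ("snum", "scity", "pnum", "qty")
SCP = [dict(zip(HEADER, row)) for row in [
    ("S1", "London", "P1", 300), ("S1", "London", "P2", 200),
    ("S2", "Paris", "P1", 300), ("S2", "Paris", "P2", 400),
    ("S3", "Paris", "P2", 200), ("S4", "London", "P2", 200),
    ("S4", "London", "P4", 300), ("S4", "London", "P5", 400),
]]


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Pair up tuples that agree on every attribute the two relations share."""
    common = set(left[0]) & set(right[0])
    return [{**a, **b} for a in left for b in right
            if all(a[c] == b[c] for c in common)]


def as_set(rows):
    """Order-insensitive view of a relation body, for comparison."""
    return {frozenset(r.items()) for r in rows}


S = project(SCP, ("snum", "scity"))
SP = project(SCP, ("snum", "pnum", "qty"))
print(len(S), len(SP))  # 4 8
print(as_set(natural_join(S, SP)) == as_set(SCP))  # True

# A lossy split that shares only scity: 14 tuples, 6 of them spurious.
bad = natural_join(S, project(SCP, ("scity", "pnum", "qty")))
print(len(bad))  # 14
```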
“Employees”: decompose by projection
Starting from the “Employees” table above, decompose by projection into “Employees” and “Departments”:
+--------+-----------+-------+
| Emp_Id | Emp_Name  | Dept# |
+--------+-----------+-------+
| A001   | Johnson   |    10 |
| A023   | Chung     |    10 |
| C085   | Allen     |    10 |
| B120   | Gomez     |    20 |
| B211   | Davis     |    20 |
| A227   | Greenberg |    40 |
| C340   | Brown     |    40 |
| C389   | Lopez     |    40 |
| C395   | Clark     |    40 |
| A502   | Edwards   |    20 |
| A616   | Scott     |    40 |
| A700   | Sanyo     |    60 |
| A722   | Adams     |    20 |
+--------+-----------+-------+
+-------+------------+
| Dept# | DeptName   |
+-------+------------+
|    10 | Accounting |
|    20 | Sales      |
|    40 | Production |
|    60 | Delivery   |
+-------+------------+
The decomposition is lossless, since a join of the two tables reproduces the original.
Dependency Preservation
Dependency preservation refers to a specific case of nonloss decomposition in which the resulting projections are independent of each other, i.e., each can be updated without checking the other.
Some nonloss decompositions do not preserve dependencies.
Example: decompose {supplier, city, status}, where supplier determines city and status, and city determines status.
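Dependency Preservation
Dependency is preserved in this decomposition: SC {S#, CITY} and CS {CITY, STATUS}
Dependency is not preserved in this one: SC {S#, CITY} and SS {S#, STATUS}
Although the second is also nonloss, the two projections cannot be updated independently: the dependency CITY → STATUS spans both of them, so an update to one may require checking the other.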
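Whether a decomposition preserves an FD can be checked with the standard textbook test (a swapped-in algorithm, not one given in the slides): grow a set of attributes from the FD’s left side, using only what can be determined inside each projection, and see whether the right side is reached. A Python sketch assuming the FDs S# → CITY and CITY → STATUS; `preserves` is a hypothetical helper.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fd, schemas, fds):
    """True if fd follows from dependencies enforceable within single projections."""
    lhs, rhs = fd
    reached = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in schemas:
            # What the attributes reached so far determine inside this projection.
            gained = closure(reached & schema, fds) & schema
            if not gained <= reached:
                reached |= gained
                changed = True
    return rhs <= reached


FDS = [({"S#"}, {"CITY"}), ({"CITY"}, {"STATUS"})]
SC_CS = [{"S#", "CITY"}, {"CITY", "STATUS"}]
SC_SS = [{"S#", "CITY"}, {"S#", "STATUS"}]

print([preserves(fd, SC_CS, FDS) for fd in FDS])  # [True, True]
print([preserves(fd, SC_SS, FDS) for fd in FDS])  # [True, False]
```

The False entry is CITY → STATUS, the dependency that straddles SC and SS.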
The table “SSP” violates BCNF
(Again, assume unique supplier names, so snum → sname and sname → snum.)
+------+-------+------+-----+
| snum | sname | pnum | qty |
+------+-------+------+-----+
| S1   | Smith | P1   | 300 |
| S1   | Smith | P2   | 200 |
| S2   | Jones | P1   | 300 |
| S2   | Jones | P2   | 400 |
| S3   | Blake | P2   | 200 |
| S4   | Clark | P2   | 200 |
| S4   | Clark | P4   | 300 |
| S4   | Clark | P5   | 400 |
+------+-------+------+-----+
Obviously bad (redundancy, etc.), but it satisfies 3NF: every attribute is the key, part of the key, or a fact about the key, the whole key, and nothing but the key.
Candidate key {snum, pnum}: {snum, pnum} → qty
Candidate key {sname, pnum}: {sname, pnum} → qty
But snum → sname violates BCNF, because snum is not a candidate key.
Boyce/Codd Normal Form
BCNF differs from 3NF only for relvars that have more than one candidate key, where the candidate keys are composite and overlapping.
A relvar is in BCNF if and only if every nontrivial, left-irreducible FD has a candidate key as its determinant.
That is, a relvar is in BCNF if and only if every determinant is a candidate key.
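The SSP case can be checked mechanically. This Python sketch finds candidate keys by brute force, then flags BCNF violations (a nontrivial FD whose determinant is not a superkey) and 3NF violations, using the general textbook form of 3NF (an FD is acceptable if its determinant is a superkey or every attribute it adds is prime), which the slides state only for the single-key case. The helper names and the encoding of the SSP FDs are assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if closure(attrs, fds) >= schema and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys


def bcnf_violations(schema, fds):
    """Nontrivial FDs whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]


def third_nf_violations(schema, fds):
    """BCNF violations that also add a nonprime attribute."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(lhs, rhs) for lhs, rhs in bcnf_violations(schema, fds)
            if not (rhs - lhs) <= prime]


SSP = {"snum", "sname", "pnum", "qty"}
SSP_FDS = [
    ({"snum"}, {"sname"}), ({"sname"}, {"snum"}),
    ({"snum", "pnum"}, {"qty"}), ({"sname", "pnum"}, {"qty"}),
]

print([sorted(k) for k in candidate_keys(SSP, SSP_FDS)])
# [['pnum', 'sname'], ['pnum', 'snum']]
print([sorted(lhs) for lhs, _ in bcnf_violations(SSP, SSP_FDS)])
# [['snum'], ['sname']]
print(third_nf_violations(SSP, SSP_FDS))  # []
```

Because sname and snum are both prime, the offending FDs pass the 3NF test but fail BCNF.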
“SSP”: decompose by projection
Starting from the “SSP” table above (again, assume unique supplier names), decompose by projection into “S” and “SP”:
+------+-------+
| snum | sname |
+------+-------+
| S1   | Smith |
| S2   | Jones |
| S3   | Blake |
| S4   | Clark |
+------+-------+
+------+------+-----+
| snum | pnum | qty |
+------+------+-----+
| S1   | P1   | 300 |
| S1   | P2   | 200 |
| S2   | P1   | 300 |
| S2   | P2   | 400 |
| S3   | P2   | 200 |
| S4   | P2   | 200 |
| S4   | P4   | 300 |
| S4   | P5   | 400 |
+------+------+-----+
The decomposition is lossless, since a join of the two tables reproduces the original.
“Employees”: in BCNF, but lots of redundancy (violates 4NF)
+--------+----------+-------+----------+
| Emp_Id | Emp_Name | Skill | Language |
+--------+----------+-------+----------+
| A001   | Johnson  | Cook  | English  |
| A001   | Johnson  | Cook  | French   |
| A001   | Johnson  | Cook  | Spanish  |
| A001   | Johnson  | Type  | English  |
| A001   | Johnson  | Type  | French   |
| A001   | Johnson  | Type  | Spanish  |
| B211   | Davis    | Weld  | English  |
| B211   | Davis    | Weld  | German   |
| B211   | Davis    | Type  | English  |
| B211   | Davis    | Type  | German   |
+--------+----------+-------+----------+
etc.
The redundancy comes from multi-valued dependencies: an employee’s skills are independent of that employee’s languages, so every skill is repeated for every language. Again, the solution is projection: a skills table and a language table.
(Strictly, Emp_Id → Emp_Name is an FD whose determinant is not a candidate key, so the BCNF claim applies once Emp_Name is projected out, leaving {Emp_Id, Skill, Language}, which is all key.)
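The projection fix can be checked on the ten rows shown (the rows elided by “etc.” are unknown and not included). This Python sketch, with hypothetical helpers `project`, `natural_join`, and `as_set`, splits the table into names, skills, and languages and confirms that joining them back reproduces the sample exactly, which is what the independence of skills and languages implies.

```python
# The ten sample rows shown on the slide.
HEADER = ("Emp_Id", "Emp_Name", "Skill", "Language")
EMP = [dict(zip(HEADER, row)) for row in [
    ("A001", "Johnson", "Cook", "English"), ("A001", "Johnson", "Cook", "French"),
    ("A001", "Johnson", "Cook", "Spanish"), ("A001", "Johnson", "Type", "English"),
    ("A001", "Johnson", "Type", "French"), ("A001", "Johnson", "Type", "Spanish"),
    ("B211", "Davis", "Weld", "English"), ("B211", "Davis", "Weld", "German"),
    ("B211", "Davis", "Type", "English"), ("B211", "Davis", "Type", "German"),
]]


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Pair up tuples that agree on every attribute the two relations share."""
    common = set(left[0]) & set(right[0])
    return [{**a, **b} for a in left for b in right
            if all(a[c] == b[c] for c in common)]


def as_set(rows):
    """Order-insensitive view of a relation body, for comparison."""
    return {frozenset(r.items()) for r in rows}


names = project(EMP, ("Emp_Id", "Emp_Name"))
skills = project(EMP, ("Emp_Id", "Skill"))
languages = project(EMP, ("Emp_Id", "Language"))
print(len(names), len(skills), len(languages))  # 2 4 5

rejoined = natural_join(natural_join(names, skills), languages)
print(as_set(rejoined) == as_set(EMP))  # True
```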
The Normalization Process
The “normal forms” are simply formalisms for describing design problems that are usually apparent and that can cause obvious trouble. They usually show up as redundancy, and common sense says to remove it. Removal is a process of projecting the offending (proposed) table into two or more tables, in a lossless way.
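The “project until the problem is gone” process can be automated with the standard textbook BCNF decomposition algorithm (a swapped-in technique, not one given in the slides): find an attribute set X that determines something beyond itself but not the whole table, split off X together with what it determines, and repeat. A Python sketch assuming the Employees FDs; the helper names are hypothetical. The pieces it produces always join back losslessly, but, as the dependency-preservation slides warn, they need not preserve every FD.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(schema, fds):
    """Return (X, what X determines within schema) if X breaks BCNF, else None."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            determined = closure(x, fds) & schema
            # Nontrivial (adds attributes) but not a superkey of this schema.
            if determined != x and determined != schema:
                return x, determined
    return None


def bcnf_decompose(schema, fds):
    """Split schema by projection until every piece is in BCNF."""
    done, todo = [], [set(schema)]
    while todo:
        piece = todo.pop()
        violation = find_violation(piece, fds)
        if violation is None:
            done.append(piece)
        else:
            x, determined = violation
            todo.append(determined)                # X plus everything it determines
            todo.append(piece - (determined - x))  # the rest, keeping X as the join link
    return done


EMPLOYEES = {"Emp_Id", "Emp_Name", "Dept#", "DeptName"}
FDS = [({"Emp_Id"}, {"Emp_Name", "Dept#"}), ({"Dept#"}, {"DeptName"})]

print(sorted(sorted(p) for p in bcnf_decompose(EMPLOYEES, FDS)))
# [['Dept#', 'DeptName'], ['Dept#', 'Emp_Id', 'Emp_Name']]
```

The result matches the Employees/Departments split shown in the slides.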
Relation-Valued Attributes
A relation may include attributes whose values are themselves relations.
Traditionally this was seen to violate 1NF, which was held to prohibit repeating groups.
Such attributes are now considered theoretically sound, but in practice you should usually avoid them because they lead to complicated predicates.
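A relation-valued attribute, and the flattening that removes it, can be sketched in Python with a frozenset of (pnum, qty) pairs standing in for each relation value. The relvar name SPQ, the attribute name PQ, and the `ungroup` helper are hypothetical; `ungroup` only mimics the idea of ungrouping, and the sample uses SP rows for S2 and S3 plus S5, which has no shipments.

```python
# Hypothetical relvar SPQ: each supplier maps to a relation-valued attribute PQ.
SPQ = {
    "S2": frozenset({("P1", 300), ("P2", 400)}),
    "S3": frozenset({("P2", 200)}),
    "S5": frozenset(),  # no shipments: PQ is the empty relation
}


def ungroup(spq):
    """Flatten the relation-valued attribute into ordinary (snum, pnum, qty) tuples."""
    return {(snum, pnum, qty) for snum, pq in spq.items() for pnum, qty in pq}


flat = ungroup(SPQ)
print(sorted(flat))
# [('S2', 'P1', 300), ('S2', 'P2', 400), ('S3', 'P2', 200)]
print(any(t[0] == "S5" for t in flat))  # False
```

S5 disappears when flattened, because an empty PQ contributes no tuples.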