IMS 4212: Normalization 1 Dr. Lawrence West, Management Dept., University of Central Florida Normalization—Topics Functional Dependency Candidate Keys Normalization –1 st Normal Form –2 nd Normal Form –3 rd Normal Form –Boyce-Codd, 4 th & 5 th Normal Forms
IMS 4212: Normalization 2 Dr. Lawrence West, Management Dept., University of Central Florida Functional Dependency Functional dependency refers to the relationship between nonkey attributes and the primary key (or other attributes) of an entity Attribute A is functionally dependent on Attribute B if, having specified a value for Attribute B, there is only one true value for Attribute A E.g., if we identify a student by specifying their SSN there is only one current value of last name –Note that time dependency may come into play but we will handle that as with any time-dependent data
IMS 4212: Normalization 3 Dr. Lawrence West, Management Dept., University of Central Florida Functional Dependency (cont.) An attribute may be functionally dependent on a composite set of attributes E.g., the name of a course is functionally dependent on the course identified by Department Code and Course Number
IMS 4212: Normalization 4 Dr. Lawrence West, Management Dept., University of Central Florida Functional Dependency (cont.) Note that we don't define functional dependency to mean that there is only one possible value of the functionally dependent attribute –A student can change their name or address –A product can get a new price Identifying functional dependencies is a critical step in the normalization that follows Our goal is a structure in which each attribute in an entity is functionally dependent on only the PK and on the whole PK
IMS 4212: Normalization 5 Dr. Lawrence West, Management Dept., University of Central Florida Candidate Keys We use the term "Candidate Key" to indicate that an entity may have multiple attributes or sets of composite attributes that could be primary keys (uniquely identify each record) What are the candidate keys in the entity above? One candidate key is always selected as the PK Other(s) become alternate keys (AK) –Uniqueness must be enforced –May not be null (usually) –More on alternate keys later
IMS 4212: Normalization 6 Dr. Lawrence West, Management Dept., University of Central Florida Normalization Normalization is the process of distributing attributes among entities to achieve an efficient DB structure –Ensures entity satisfies six properties of relations (earlier lecture) –Ensure functional dependency on PK A fully normalized database has tremendous storage and retrieval advantages over a poorly designed DB We will look (later) at selectively denormalizing entities to achieve performance efficiencies
IMS 4212: Normalization 7 Dr. Lawrence West, Management Dept., University of Central Florida Normalization (cont.) Normalization consists of six steps –1 st Normal Form (1NF) –2 nd Normal Form(2NF) –3 rd Normal Form(3NF) –Boyce-Codd Normal Form(BCNF) –4 th Normal Form(4NF) –5 th Normal Form(5NF)
IMS 4212: Normalization 8 Dr. Lawrence West, Management Dept., University of Central Florida Normalization (cont.) Normalization is applied to entities An entity is said to be in 3NF, for example, if it has been checked for compliance with the rules of 3NF and any necessary corrections applied Each normalization step is applied in order –You do not attempt to put an entity in 2NF until it has been checked and corrected to be in 1NF –Etc. Corrections to normalization problems almost always result in the creation of new entities and relationships
IMS 4212: Normalization 9 Dr. Lawrence West, Management Dept., University of Central Florida First Normal Form "Place entities in 1NF by removing any repeating attributes" –Repeating attribute are multivalued attributes in another guise and are problems for the same reason –The definition doesn't tell you what to do with them Handle as with multivalued attributes –Create new entity –PK is PK of original entity plus the repeated attribute
IMS 4212: Normalization 10 Dr. Lawrence West, Management Dept., University of Central Florida Second Normal Form "Place entities in 2NF by placing them in 1NF and then ensuring that each nonkey attribute is fully functionally dependent on the whole primary key" Remove the offending attributes to a new entity –PK is the portion of the PK on which the problem attribute is dependent 2NF problems cannot exist in an entity with a single-attribute PK
IMS 4212: Normalization 11 Dr. Lawrence West, Management Dept., University of Central Florida Third Normal Form "Place entities in 3NF by placing them in 2NF and removing any transitive dependencies" A 'transitive dependency' is an attribute that is functionally dependent on another nonkey attribute (or set of attributes) which, in turn, is functionally dependent on the PK The problem attribute(s) are dependent on the PK but only because the intermediary attribute is dependent
IMS 4212: Normalization 12 Dr. Lawrence West, Management Dept., University of Central Florida Third Normal Form (cont.) If the value of the PK changes (i.e., we move to a new record) the value of the dependent attributes (as well as all other attributes) could change The problem arises because changing the value of the nonkey intermediary attribute also forces a change to the dependent attributes May also be a problem with redundant storage
IMS 4212: Normalization 13 Dr. Lawrence West, Management Dept., University of Central Florida Third Normal Form (cont.) Place entities in 3NF by removing the offending attributes to a new entity –The intermediate attribute(s) will become the PK of the new entity –New PK will remain in original entity as a foreign key –Attributes with transitive dependency will be removed to the new entity
IMS 4212: Normalization 14 Dr. Lawrence West, Management Dept., University of Central Florida Other Normal Forms Boyce-Codd, 4NF, and 5NF deal with unusual circumstances Boyce-Codd NF requires that each attribute be functionally dependent on any candidate key, not just the selected PK 4NF requires that independently multivalued composite PK attributes be removed to two new parent entities 5NF requires the removal of pairwise cyclic dependencies within composite primary keys with three or more component attributes
IMS 4212: Normalization 15 Dr. Lawrence West, Management Dept., University of Central Florida 4 th Normal Form Violation RestaurantPizza VarietyDelivery Area Vincenzo's PizzaThick CrustSpringfield Vincenzo's PizzaThick CrustShelbyville Vincenzo's PizzaThin CrustSpringfield Vincenzo's PizzaThin CrustShelbyville Elite PizzaThin CrustCapital City Elite PizzaStuffed CrustCapital City A1 PizzaThick CrustSpringfield A1 PizzaThick CrustShelbyville A1 PizzaThick CrustCapital City A1 PizzaStuffed CrustSpringfield A1 PizzaStuffed CrustShelbyville A1 PizzaStuffed CrustCapital City
IMS 4212: Normalization 16 Dr. Lawrence West, Management Dept., University of Central Florida Fix by splitting two independent PK components into two new entities Note that if there are attributes dependent on a subset of the original key this violates 2NF 4 th Normal Form (cont.) Varieties By Restaurant RestaurantPizza Variety Vincenzo's PizzaThick Crust Vincenzo's PizzaThin Crust Elite PizzaThin Crust Elite PizzaStuffed Crust A1 PizzaThick Crust A1 PizzaStuffed Crust Delivery Areas By Restaurant RestaurantDelivery Area Vincenzo's PizzaSpringfield Vincenzo's PizzaShelbyville Elite PizzaCapital City A1 PizzaSpringfield A1 PizzaShelbyville A1 PizzaCapital City
IMS 4212: Normalization 17 Dr. Lawrence West, Management Dept., University of Central Florida Fifth Normal Form If entities satisfy 4NF it will be rare that they violate 5NF and will do so only when certain constraints on allowable values apply. See Wikipedia for an example