RELATIONAL TABLE NORMALIZATION

Key Concepts
Guidelines for Primary Keys
Deletion anomaly
Update anomaly
Insertion anomaly
Functional dependency
Transitive dependency

Key Concepts (cont’d)
Multivalued dependency
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Fourth normal form (4NF)
Domain key normal form (DKNF)

Guidelines for Primary Keys
Guideline 1
  –The domain of the primary key should be large enough to accommodate the identification of unique rows for the next 100 years
Guideline 2
  –Primary keys should be a unique random collection of alphabetic, numeric, or alphanumeric characters

Guidelines for Primary Keys (cont’d)
Guideline 3
  –Avoid using smart keys. Primary keys should not contain “fact giving” data. If these facts are necessary, they should be entity attributes
Guideline 4
  –Use the suffix ID in constructing primary key names (CUST_ID, Vendor_ID, etc.)
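
Guidelines 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the alphabet, the key length of 12, and the helper name new_key are assumptions made for the example, not part of the guidelines.

```python
import secrets
import string

# Hypothetical alphabet and length; neither value comes from the guidelines.
ALPHABET = string.ascii_uppercase + string.digits
KEY_LENGTH = 12  # 36**12 possible keys, roughly 4.7e18


def new_key(existing: set) -> str:
    """Return a random alphanumeric key that is not already in `existing`."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(KEY_LENGTH))
        if candidate not in existing:
            existing.add(candidate)
            return candidate


issued = set()
cust_id = new_key(issued)    # carries no facts about the customer
vendor_id = new_key(issued)  # guaranteed to differ from cust_id
```

Because the key is random, it encodes nothing about the row it identifies, so it never has to change when a fact about that row changes.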

Data Anomalies
Definition of “anomaly”:
  –Deviation or departure from the normal or common order, form, or rule
  –An item that is peculiar, irregular, abnormal, or difficult to classify

Deletion Anomaly
Occurs when the removal of a record results in a loss of important information
For example, if all the information about a customer is contained in the ORDER table, deleting an order also deletes customer information
See the Recycled Tractor problem for an example
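
The customer/order case can be reproduced with Python's built-in sqlite3 module. The table is named ORDERS here because ORDER is a reserved word in SQL; the column names and the sample row are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical single-table design: customer facts live only in ORDERS.
conn.execute(
    "CREATE TABLE ORDERS ("
    "ORDER_ID TEXT PRIMARY KEY, CUST_NAME TEXT, CUST_PHONE TEXT, ITEM TEXT)"
)
conn.execute("INSERT INTO ORDERS VALUES ('O1', 'Ada Farms', '555-0100', 'Tractor')")

# Deleting the customer's only order ...
conn.execute("DELETE FROM ORDERS WHERE ORDER_ID = 'O1'")

# ... also removes the only copy of the customer's name and phone number.
remaining = conn.execute(
    "SELECT COUNT(*) FROM ORDERS WHERE CUST_NAME = 'Ada Farms'"
).fetchone()[0]
```

After the delete, remaining is 0: nothing in the database records that the customer ever existed.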

Update Anomaly
Occurs when a change to a single attribute value requires changing multiple records, although a change to only one record in the database should be necessary
Example: an evaluator at Recycled Tractor changes his/her cell phone number
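
A small sqlite3 sketch of the cell phone example follows. The LEAD layout, with the evaluator's number repeated on every lead, is an assumption about how the unnormalized table might look; the names and numbers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical unnormalized table: the evaluator's phone repeats on each lead.
conn.execute(
    "CREATE TABLE LEAD ("
    "LEAD_ID TEXT PRIMARY KEY, EVALUATOR_NAME TEXT, EVALUATOR_CELL TEXT)"
)
conn.executemany(
    "INSERT INTO LEAD VALUES (?, ?, ?)",
    [
        ("L1", "Pat", "555-0101"),
        ("L2", "Pat", "555-0101"),
        ("L3", "Pat", "555-0101"),
        ("L4", "Lee", "555-0102"),
    ],
)

# One real-world change (Pat's new number) has to touch every one of Pat's leads.
cursor = conn.execute(
    "UPDATE LEAD SET EVALUATOR_CELL = '555-0199' WHERE EVALUATOR_NAME = 'Pat'"
)
rows_touched = cursor.rowcount
```

Here rows_touched is 3. If any one of those rows were missed, the database would hold two different numbers for the same evaluator.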

Insertion Anomaly
Occurs when there does not appear to be any reasonable place to assign attributes and attribute values to records in a database
Two types of insertion anomalies:
  –Type 1: Adding new attributes to a record
  –Type 2: Updating only part of a record

Insertion Anomaly (cont’d)
Type 1 example:
  –Adding Recycled Tractor evaluator’s home address and phone number to the database

Insertion Anomaly (cont’d)
Type 2 example:
  –Essence of the insertion anomaly problem: when to enter values into the database
    Assign the new Recycled Tractor evaluator to a new dummy lead
    Or add the new evaluator to all records in the LEAD database, which can result in many null values
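
The dummy-lead workaround can be sketched with sqlite3. The column names, the DUMMY-1 identifier, and the separate EVALUATOR table shown for contrast are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical LEAD table that also carries the evaluator's name.
conn.execute(
    "CREATE TABLE LEAD (LEAD_ID TEXT PRIMARY KEY, LEAD_NAME TEXT, EVALUATOR_NAME TEXT)"
)

# A new evaluator with no leads yet can only be stored through a placeholder lead.
conn.execute("INSERT INTO LEAD VALUES ('DUMMY-1', NULL, 'Sam')")
null_lead_names = conn.execute(
    "SELECT COUNT(*) FROM LEAD WHERE LEAD_NAME IS NULL"
).fetchone()[0]

# With a table of its own, the evaluator can be inserted without any lead.
conn.execute(
    "CREATE TABLE EVALUATOR (EVALUATOR_ID TEXT PRIMARY KEY, EVALUATOR_NAME TEXT NOT NULL)"
)
conn.execute("INSERT INTO EVALUATOR VALUES ('E7', 'Sam')")
evaluator_count = conn.execute("SELECT COUNT(*) FROM EVALUATOR").fetchone()[0]
```

The placeholder row leaves a null LEAD_NAME behind, while the separate table stores the same fact with no nulls and no fake lead.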

Eliminating Data Anomalies
Normalization facilitates the removal of data anomalies
Basic rule of normalization:
  –The attribute values in a relational table should be functionally dependent on the primary key value

Eliminating Data Anomalies (cont’d)
Corollaries to the basic rule:
  –No repeating groups are allowed in relational tables
  –A relational table cannot have attributes involved in a transitive dependency with the primary key

Eliminating Data Anomalies (cont’d)
The different types of dependencies are critical to understanding and executing the normalization process
One of the primary responsibilities of the database designer is to formalize data relationships by identifying the dependencies among the attributes

Functional Dependency
A functionally dependent relationship exists between two attributes when one attribute value implies or determines the value of the other attribute
Example: the value of LEAD_NAME determines the value of LEAD_BANK in the Recycled Tractor problem
A functional dependency can be reciprocal
  –Social Security # and Name of person
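
One way to test a suspected functional dependency against sample data is sketched below. The helper name holds and the sample rows are assumptions. Sample data can only refute a dependency; whether it truly holds is a business rule that data alone cannot prove.

```python
def holds(rows, determinant, dependent):
    """Return True if no two rows share a determinant value but differ on the dependent."""
    seen = {}
    for row in rows:
        key = row[determinant]
        if key in seen and seen[key] != row[dependent]:
            return False
        seen[key] = row[dependent]
    return True


# Hypothetical sample rows for the LEAD_NAME / LEAD_BANK example.
leads = [
    {"LEAD_NAME": "Ada Farms", "LEAD_BANK": "First Rural"},
    {"LEAD_NAME": "Ada Farms", "LEAD_BANK": "First Rural"},
    {"LEAD_NAME": "Bo Ranch", "LEAD_BANK": "First Rural"},
]

name_determines_bank = holds(leads, "LEAD_NAME", "LEAD_BANK")  # True
bank_determines_name = holds(leads, "LEAD_BANK", "LEAD_NAME")  # False
```

In this sample the dependency runs one way only: each lead name maps to one bank, but one bank serves two leads, so it is not reciprocal.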

Transitive Dependency (TD)
Occurs when a nonkey attribute value is functionally dependent on another nonkey attribute value that is not a candidate key

Transitive Dependency (cont’d)
Example:
  –EMPLOYEE (EMPLOYEE_ID, JOB_CATEGORY, HOURLY_RATE)
If JOB_CATEGORY = SUPERVISOR
  –Then HOURLY_RATE is $25.00 per hour
If JOB_CATEGORY = WELDER
  –Then HOURLY_RATE is $18.00 per hour
HOURLY_RATE is dependent on JOB_CATEGORY, which in turn is dependent on EMPLOYEE_ID
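
The usual remedy is to move the dependent pair into a table of its own. The sqlite3 sketch below does this for the EMPLOYEE example. The table name JOB_RATE and the employee IDs are assumptions; the two rates are the ones given in the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical table holding each rate exactly once, keyed by job category.
    CREATE TABLE JOB_RATE (
        JOB_CATEGORY TEXT PRIMARY KEY,
        HOURLY_RATE  REAL NOT NULL
    );
    CREATE TABLE EMPLOYEE (
        EMPLOYEE_ID  TEXT PRIMARY KEY,
        JOB_CATEGORY TEXT NOT NULL REFERENCES JOB_RATE (JOB_CATEGORY)
    );
    INSERT INTO JOB_RATE VALUES ('SUPERVISOR', 25.00), ('WELDER', 18.00);
    INSERT INTO EMPLOYEE VALUES ('E1', 'SUPERVISOR'), ('E2', 'WELDER'), ('E3', 'WELDER');
    """
)

# A join recovers each employee's rate without storing it on the employee row.
rates = dict(
    conn.execute(
        "SELECT e.EMPLOYEE_ID, r.HOURLY_RATE "
        "FROM EMPLOYEE e JOIN JOB_RATE r ON r.JOB_CATEGORY = e.JOB_CATEGORY"
    )
)

# Changing the welder rate now means changing exactly one row.
rows_changed = conn.execute(
    "UPDATE JOB_RATE SET HOURLY_RATE = 19.00 WHERE JOB_CATEGORY = 'WELDER'"
).rowcount
```

With the rate stored once per category, a rate change no longer has to be repeated for every employee in that category.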

Multi-Valued Dependency (MVD)
Results from having multiple values for a particular attribute
Three types of MVDs
  –Simple
  –Independent
  –Transitive

Simple MVD
Similar to 1:N cardinality (one to many)
Most common type of MVD
Examples
  –A student can register for many courses
  –LEAD_ID determines many values of TRACTOR_ID

Independent and Transitive MVDs
Both types involve three or more attributes
Usually eliminated by the first three normal forms

First Normal Form (1NF)
A relational table is in first normal form if no attributes form repeating groups
Repeating group attributes are removed by creating another table
In the Recycled Tractor problem, tractor attributes are removed from LEAD and placed in the TRACTOR table, and evaluator attributes are placed in the EVALUATOR table
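
Removing a repeating group can be sketched in plain Python. The record layout, with a list of tractor IDs embedded in each lead, and the sample values are assumptions made for the illustration.

```python
# Hypothetical unnormalized leads: TRACTORS is a repeating group inside each record.
flat_leads = [
    {"LEAD_ID": "L1", "LEAD_NAME": "Ada Farms", "TRACTORS": ["T10", "T11"]},
    {"LEAD_ID": "L2", "LEAD_NAME": "Bo Ranch", "TRACTORS": ["T12"]},
]

# LEAD keeps one row per lead, with a single value per attribute.
lead_table = [
    {"LEAD_ID": rec["LEAD_ID"], "LEAD_NAME": rec["LEAD_NAME"]} for rec in flat_leads
]

# TRACTOR gets one row per tractor and carries LEAD_ID as a foreign key.
tractor_table = [
    {"TRACTOR_ID": tractor_id, "LEAD_ID": rec["LEAD_ID"]}
    for rec in flat_leads
    for tractor_id in rec["TRACTORS"]
]
```

Each tractor now occupies its own row, so a lead can have any number of tractors without changing the structure of LEAD.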

Second Normal Form (2NF)
A relational table is in second normal form when it is in first normal form and all nonkey attributes are functionally dependent on the entire primary key
Only tables with concatenated (composite) keys will have a problem in meeting the 2NF requirement
Does our new EVALUATOR table meet 2NF requirements?
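
A 2NF problem arises when a nonkey attribute depends on only part of a composite key. The sketch below looks for such partial dependencies in sample data. The table layout, the attribute VISIT_DATE, and both helper names are assumptions, and it tests single-attribute parts of the key only, which covers every case for a two-attribute key.

```python
def determines(rows, lhs, rhs):
    """Return True if equal values on the `lhs` attributes always give equal `rhs` values."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def partial_dependencies(rows, key, nonkey):
    """List (key_part, attribute) pairs where one part of the key alone determines the attribute."""
    return [
        (part, attr)
        for attr in nonkey
        for part in key
        if determines(rows, (part,), attr)
    ]


# Hypothetical table keyed on (LEAD_ID, EVALUATOR_ID).
visits = [
    {"LEAD_ID": "L1", "EVALUATOR_ID": "E1", "EVALUATOR_NAME": "Pat", "VISIT_DATE": "2024-03-01"},
    {"LEAD_ID": "L1", "EVALUATOR_ID": "E2", "EVALUATOR_NAME": "Lee", "VISIT_DATE": "2024-03-02"},
    {"LEAD_ID": "L2", "EVALUATOR_ID": "E1", "EVALUATOR_NAME": "Pat", "VISIT_DATE": "2024-03-05"},
]

violations = partial_dependencies(
    visits, ("LEAD_ID", "EVALUATOR_ID"), ("EVALUATOR_NAME", "VISIT_DATE")
)
```

In this sample EVALUATOR_NAME depends on EVALUATOR_ID alone, so it belongs in a table keyed by EVALUATOR_ID, while VISIT_DATE needs both parts of the key and can stay.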

Third Normal Form (3NF)
A relational table is in third normal form when
  –it is in second normal form
  –no attribute has a transitive dependency involving nonkey attributes
In the Recycled Tractor problem, TRACKER_PHONE# is functionally dependent on TRACKER_NAME, which is functionally dependent on LEAD_ID
Boyce-Codd normal form adds the requirement that every determinant is also a candidate key

Fourth Normal Form (4NF)
A relational table is in fourth normal form when all multivalued dependencies have been removed
In most situations, normalizing tables to third normal form removes multivalued dependencies
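
The sketch below shows an independent multivalued dependency and its removal. The attributes SKILL and REGION and all sample values are assumptions, chosen only because an evaluator's skills and regions would vary independently of each other.

```python
from itertools import product

# Hypothetical independent facts about one evaluator.
skills = {"Pat": {"Engines", "Hydraulics"}}
regions = {"Pat": {"North", "South", "West"}}

# One combined table must pair every skill with every region.
combined = {
    (evaluator, skill, region)
    for evaluator in skills
    for skill, region in product(skills[evaluator], regions[evaluator])
}

# Splitting into two tables stores each fact once.
evaluator_skill = {(evaluator, skill) for evaluator, skill, _ in combined}
evaluator_region = {(evaluator, region) for evaluator, _, region in combined}

# Joining the two tables on evaluator gives back the combined table exactly.
rejoined = {
    (evaluator, skill, region)
    for evaluator, skill in evaluator_skill
    for other, region in evaluator_region
    if evaluator == other
}
```

The combined table needs 6 rows for 2 skills and 3 regions, while the two split tables need 5 rows in total and lose no information.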

Domain-Key Normal Form (DKNF)
DKNF is a philosophy that focuses on developing themes for tables
  –A student table contains attributes describing students
A relational table is in DKNF if every constraint on the table or file is the result of defining primary keys for a relational table and defining domains for the attributes
Examples of data constraints:
  –Edit rules for attributes
  –Relationships of attributes
  –Functional and multi-valued dependencies

Comments on Normalization
The benefits of additional levels of normalization decrease rapidly after tables have been put in 3NF
The instances where higher-level normalization strategies are necessary are considered rare and theoretical