Copyright, Harris Corporation & Ophir Frieder

Normal Forms
“Why be normal?” (author unknown)
Objectives
To define first, second, third, and Boyce-Codd normal forms.
To discuss the motivation for normal forms and their implications for database design.
Normal Forms
A relational scheme is said to be in first normal form (1NF) if and only if each of its domains contains only scalar (atomic) values.
– No “repeating groups” or unbounded lists.
– A field cannot itself be a table, as it can be in Oracle8!
Question: Do CITY and STATE violate 1NF in the relation scheme R = (CITY, STATE, POPULATION)?
Motivation For 1NF
Representation - with a repeating group, some method must be devised for specifying the end or length of the list.
Space Allocation - how is space allocated on a per-record basis if repeating groups are allowed?
Motivation For 1NF, Cont.
Operations - without 1NF, all operations become more complex, and this complexity propagates throughout the database management system.
Theory - 1NF simplifies the theoretical basis of the relational model (e.g., proofs of algorithmic correctness).
Converting A Relational Scheme To 1NF
A repeating group is typically eliminated by “flattening” the table: each element of the group becomes its own row, and the record's remaining scalar values are repeated on each of those rows. This makes most things simpler.
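The flattening step can be sketched in Python. This is a minimal illustration, assuming each unnormalized record is a dict whose repeating group is stored as a list under one field; the function name `flatten` and the store data are hypothetical.

```python
def flatten(rows, group_field):
    """Flatten a repeating group into 1NF rows.

    rows: list of dicts; row[group_field] is a list of dicts (the repeating group).
    Returns a list of dicts holding only scalar values: one output row per
    group member, with the record's other fields copied onto each row.
    """
    flat = []
    for row in rows:
        # Fields outside the repeating group are repeated on every output row.
        base = {k: v for k, v in row.items() if k != group_field}
        for member in row[group_field]:
            flat.append({**base, **member})
    return flat


# Hypothetical unnormalized records: ITEMS is a repeating group.
stores = [
    {"STORE_ID#": 101, "CITY": "Orlando", "ITEMS": [
        {"ITEM": "lamp", "PRICE": 20},
        {"ITEM": "rug", "PRICE": 45},
    ]},
    {"STORE_ID#": 202, "CITY": "Austin", "ITEMS": [
        {"ITEM": "lamp", "PRICE": 22},
    ]},
]

for r in flatten(stores, "ITEMS"):
    print(r)
```

Each printed row now holds only scalar values. Note that a record with an empty ITEMS list produces no rows at all, and that CITY is copied onto every row for the same store.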
Single vs. Multiple Keys
For the sake of simplicity, we will initially assume that each relational scheme has exactly one key. Multiple keys do occur, but less frequently. In what follows, the formal definitions do not change in the case of multiple keys.
Given a relational scheme R, it will be helpful throughout the following to divide the attributes into two sets: those that are part of the key, and those that are not.
Example #1: Key And Non-Key Attributes
Consider the following relational scheme for a department store chain (e.g., Walmart):
– Attributes:
STORE_ID# - A store identification number.
CITY - The city in which the store is located.
STATE - The state in which the store is located.
ITEM - An item sold by the store.
PRICE - The price of the item.
– Functional Dependencies:
STORE_ID# => CITY
STORE_ID# => STATE
STORE_ID#, ITEM => PRICE
Example #1, Cont.
The only key is: STORE_ID#, ITEM
Key Attributes: STORE_ID#, ITEM
Non-Key Attributes: CITY, STATE, PRICE
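One way to confirm that STORE_ID#, ITEM is the only key is to compute attribute closures: a set of attributes is a key if its closure under the functional dependencies is the whole scheme and no smaller subset has that property. The Python sketch below does this by brute force, which is only practical for small schemes; the helper names `closure`, `candidate_keys`, and `fd` are hypothetical.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Represent the functional dependency lhs => rhs as a pair of frozensets."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Return every attribute derivable from attrs using the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already derivable, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the full scheme."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            # Supersets of a key already found are not minimal, so skip them.
            if any(k <= s for k in keys):
                continue
            if closure(s, fds) == set(attributes):
                keys.append(s)
    return keys


R = {"STORE_ID#", "CITY", "STATE", "ITEM", "PRICE"}
F = [
    fd({"STORE_ID#"}, {"CITY"}),
    fd({"STORE_ID#"}, {"STATE"}),
    fd({"STORE_ID#", "ITEM"}, {"PRICE"}),
]
print(candidate_keys(R, F))
```

Since STORE_ID# and ITEM never appear on the right side of a dependency, every key must contain both, which is why the search finds exactly one key.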
Partial Dependency
A functional dependency X => A is a partial dependency if:
– X is a proper subset of the key attributes, and
– A is a non-key attribute.
Example #1, Cont.
Key Attributes: STORE_ID#, ITEM
Non-Key Attributes: CITY, STATE, PRICE
STORE_ID# => CITY is a partial dependency. Similarly, STORE_ID# => STATE is a partial dependency.
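The partial dependency definition translates directly into a check over the listed dependencies. The Python sketch below assumes a single key (as these slides do) and, as a simplification, inspects only the dependencies as written, not every dependency they imply; the names `fd` and `partial_dependencies` are hypothetical.

```python
def fd(lhs, rhs):
    """Represent the functional dependency lhs => rhs as a pair of frozensets."""
    return (frozenset(lhs), frozenset(rhs))


def partial_dependencies(fds, key):
    """List (lhs, attribute) pairs where lhs is a proper subset of the key
    and attribute is a non-key attribute on the right-hand side."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        if lhs < key:                        # proper subset of the key
            for a in sorted(rhs - key):      # non-key attributes only
                found.append((set(lhs), a))
    return found


F = [
    fd({"STORE_ID#"}, {"CITY"}),
    fd({"STORE_ID#"}, {"STATE"}),
    fd({"STORE_ID#", "ITEM"}, {"PRICE"}),
]
for lhs, a in partial_dependencies(F, {"STORE_ID#", "ITEM"}):
    print(sorted(lhs), "=>", a)
# ['STORE_ID#'] => CITY
# ['STORE_ID#'] => STATE
```

STORE_ID#, ITEM => PRICE is not reported because its left side is the whole key rather than a proper subset of it.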
Normal Forms - 2NF
A relational scheme is said to be in second normal form (2NF) if and only if it is in 1NF and contains no partial dependencies.
Question: Why eliminate partial dependencies?
Example #1, Cont.
Note the redundancy in the following legal relation (each store's CITY and STATE are repeated on every row for an item that store sells):
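A minimal Python sketch makes the same point with a small, hypothetical relation for this scheme (all store IDs, cities, items, and prices are made up): it counts how many times each store's location fact is stored.

```python
from collections import Counter

# Hypothetical rows for R = (STORE_ID#, CITY, STATE, ITEM, PRICE).
rows = [
    (101, "Orlando", "FL", "lamp", 20),
    (101, "Orlando", "FL", "rug", 45),
    (101, "Orlando", "FL", "chair", 30),
    (202, "Austin", "TX", "lamp", 22),
    (202, "Austin", "TX", "rug", 40),
]

# The fact "store S is in CITY, STATE" depends only on STORE_ID#,
# yet it is stored once per item the store sells.
location_facts = Counter((sid, city, state) for sid, city, state, _, _ in rows)
for fact, copies in sorted(location_facts.items()):
    print(fact, "stored", copies, "times")
# (101, 'Orlando', 'FL') stored 3 times
# (202, 'Austin', 'TX') stored 2 times
```

Five rows carry only two distinct location facts, so three of the stored copies are redundant.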
Motivation for 2NF Example #2
Consider the following relational scheme for maintaining information associated with students at a university:
– Attributes:
STUDENT_ID# - The social security number of a student.
NAME - The student’s last name.
COURSE_ID# - The ID# of a course the student is registered for.
DEPT_ID# - The ID# of the department that offers the course.
– Functional Dependencies:
STUDENT_ID# => NAME
COURSE_ID# => DEPT_ID#
Example #2, Cont.
The only key is: STUDENT_ID#, COURSE_ID#
Key Attributes: STUDENT_ID#, COURSE_ID#
Non-Key Attributes: NAME, DEPT_ID#
STUDENT_ID# => NAME is a partial dependency, so the scheme is not in 2NF. Similarly, COURSE_ID# => DEPT_ID# is a partial dependency.
Example #2, Cont.
Note the redundancy in the following legal relation (each student's NAME is repeated for every course the student takes, and each course's DEPT_ID# is repeated for every student registered for it):
Anomalies Resulting From Partial Dependencies
Insertion Anomalies - A new student cannot be added unless they are currently registered for at least one course.
Deletion Anomalies - If a student drops their last (or only) course, then no record of the student remains.
Update Anomalies - Changing a student’s name requires updating all of their records. The same holds for changing a course ID#, or for assigning a course to a different department.
Note that the first two assume null values are not desirable, which is a matter of considerable debate in the database community.
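The deletion and update anomalies can be reproduced on a tiny, hypothetical relation for the student scheme (the IDs, names, and course numbers are made up; `names_of` is a hypothetical helper). This is a sketch of the effect, not a model of how a DBMS stores rows.

```python
# Hypothetical rows for R = (STUDENT_ID#, NAME, COURSE_ID#, DEPT_ID#).
rows = [
    {"STUDENT_ID#": 1, "NAME": "Lee",   "COURSE_ID#": "CS101", "DEPT_ID#": "CS"},
    {"STUDENT_ID#": 1, "NAME": "Lee",   "COURSE_ID#": "MA201", "DEPT_ID#": "MA"},
    {"STUDENT_ID#": 2, "NAME": "Patel", "COURSE_ID#": "CS101", "DEPT_ID#": "CS"},
]


def names_of(rows, sid):
    """All NAME values recorded for one student (ideally exactly one)."""
    return {r["NAME"] for r in rows if r["STUDENT_ID#"] == sid}


# Deletion anomaly: student 2 drops their only course, and with it
# the only record of the student's name.
after_drop = [r for r in rows
              if not (r["STUDENT_ID#"] == 2 and r["COURSE_ID#"] == "CS101")]
print(names_of(after_drop, 2))    # set()

# Update anomaly: renaming student 1 in only one of their rows leaves
# two conflicting names for the same student.
rows[0] = {**rows[0], "NAME": "Li"}
print(sorted(names_of(rows, 1)))  # ['Lee', 'Li']
```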
Transitive Dependency
A functional dependency X => A is a transitive dependency if:
– X is a proper subset of the non-key attributes, and
– A is a non-key attribute that is not in X.
Motivation For 3NF Example #1
Consider the following relational scheme for National Football League (NFL) athletes:
– Attributes:
PLAYER_ID# - The social security number of an NFL athlete.
TEAM - The name of the team the athlete plays for.
STATE - The state in which the team is located.
– Functional Dependencies:
PLAYER_ID# => TEAM
TEAM => STATE
Example #1, Cont.
Key Attributes: PLAYER_ID#
Non-Key Attributes: TEAM, STATE
TEAM => STATE is a transitive dependency.
Normal Forms - 3NF
A relational scheme is said to be in third normal form (3NF) if and only if it is in 2NF and contains no transitive dependencies. Every non-key attribute depends on the key, the whole key, and nothing but the key.
Question: Why eliminate transitive dependencies?
Motivation For 3NF, Cont.
Key Attributes: PLAYER_ID#
Non-Key Attributes: TEAM, STATE
There are no partial dependencies (the key is a single attribute), so the scheme is in 2NF. However, it is not in 3NF because of the transitive dependency TEAM => STATE.
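The transitive dependency definition can be checked mechanically in the same way as the partial one. This Python sketch assumes a single key and inspects only the listed dependencies; `fd` and `transitive_dependencies` are hypothetical names.

```python
def fd(lhs, rhs):
    """Represent the functional dependency lhs => rhs as a pair of frozensets."""
    return (frozenset(lhs), frozenset(rhs))


def transitive_dependencies(fds, key, attributes):
    """List (lhs, attribute) pairs where lhs is a nonempty proper subset of
    the non-key attributes and attribute is a non-key attribute not in lhs."""
    non_key = frozenset(attributes) - frozenset(key)
    found = []
    for lhs, rhs in fds:
        if lhs and lhs < non_key:
            for a in sorted((rhs & non_key) - lhs):
                found.append((set(lhs), a))
    return found


R = {"PLAYER_ID#", "TEAM", "STATE"}
F = [fd({"PLAYER_ID#"}, {"TEAM"}), fd({"TEAM"}, {"STATE"})]
print(transitive_dependencies(F, {"PLAYER_ID#"}, R))
# [({'TEAM'}, 'STATE')]
```

PLAYER_ID# => TEAM is not reported because its left side is the key itself, not a set of non-key attributes.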
Example #1, Cont.
Note the redundancy in the following legal relation (each team's STATE is repeated for every player on that team):
Motivation for 3NF Example #2
Consider the following relational scheme for a university student database:
– Attributes:
STUDENT_ID# - A student’s social security number.
CITY - The city of the student’s home address.
STATE - The state of the student’s home address.
ZIP - The zip code of the student’s home address.
– Functional Dependencies:
STUDENT_ID# => CITY
STUDENT_ID# => STATE
STUDENT_ID# => ZIP
ZIP => STATE
ZIP => CITY
Example #2, Cont.
Key Attributes: STUDENT_ID#
Non-Key Attributes: CITY, STATE, ZIP
There are no partial dependencies, so the scheme is in 2NF. ZIP => STATE and ZIP => CITY are transitive dependencies, so it is not in 3NF.
Example #2, Cont.
Note the redundancy in the following legal relation (the CITY and STATE for a given ZIP are repeated for every student with that ZIP):
Motivation for 3NF Example #3
Consider a relational scheme for tracking software licenses:
– Attributes:
LICENSE_ID# - The license ID number for a piece of software.
MACHINE_ID# - The ID number of the machine on which the software is installed.
EMPLOYEE_ID# - The social security number of the employee to whom the machine is assigned.
LOCATION - The location of the employee’s office.
– Functional Dependencies:
LICENSE_ID# => MACHINE_ID#
MACHINE_ID# => EMPLOYEE_ID#
EMPLOYEE_ID# => LOCATION
Example #3, Cont.
Key Attributes: LICENSE_ID#
Non-Key Attributes: MACHINE_ID#, EMPLOYEE_ID#, LOCATION
There are no partial dependencies, so the relational scheme is in 2NF. MACHINE_ID# => EMPLOYEE_ID# and EMPLOYEE_ID# => LOCATION are both transitive dependencies, so it is not in 3NF.
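Tracing how LICENSE_ID# reaches LOCATION shows the chain behind these transitive dependencies: the key determines LOCATION only by way of MACHINE_ID# and then EMPLOYEE_ID#. The Python sketch below is an attribute-closure computation that prints each dependency as it is applied; the helper name `derive` is hypothetical.

```python
def derive(start, fds):
    """Apply FDs repeatedly from `start`, printing each one as it fires,
    and return the set of attributes reached (the closure of `start`)."""
    known = set(start)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= known and not rhs <= known:
                print(", ".join(sorted(lhs)), "=>", ", ".join(sorted(rhs)))
                known |= rhs
                changed = True
    return known


F = [
    ({"LICENSE_ID#"}, {"MACHINE_ID#"}),
    ({"MACHINE_ID#"}, {"EMPLOYEE_ID#"}),
    ({"EMPLOYEE_ID#"}, {"LOCATION"}),
]
reached = derive({"LICENSE_ID#"}, F)
# LICENSE_ID# => MACHINE_ID#
# MACHINE_ID# => EMPLOYEE_ID#
# EMPLOYEE_ID# => LOCATION
```

Starting instead from MACHINE_ID# reaches EMPLOYEE_ID# and LOCATION but never LICENSE_ID#, so MACHINE_ID# is not a key even though it determines two other attributes.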
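Example #3, Cont.
Note the redundancy in the following legal relation (the employee assigned to a machine is repeated for every license installed on that machine, and each employee's LOCATION is repeated for every license on their machines):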
Anomalies Resulting From Transitive Dependencies
Insertion Anomalies - A license cannot be added until it is installed on a machine, and until that machine is assigned to an employee.
Deletion Anomalies - Deleting all of the records for a particular employee would delete any record of the machines or licenses assigned to that employee.
Update Anomalies - Changing the employee assigned to a particular machine requires updating one record per license installed on that machine. The same holds for changing an employee’s location.
As with partial dependencies, the first two assume null values are not desirable.
Normal Forms - BCNF
A relation is said to be in Boyce/Codd normal form (BCNF) if every attribute depends on the key, the whole key, and nothing but the key; that is, the left-hand side of every nontrivial functional dependency is a key or contains one.
BCNF Example #1
Consider a relational scheme for tracking employee salary adjustments:
– Attributes:
EMPLOYEE_ID# - An employee identification number.
DATE - A date on which the employee’s salary was adjusted.
AMOUNT - The amount of the salary adjustment.
EXPLANATION - An explanation for the adjustment.
– Functional Dependencies:
EMPLOYEE_ID#, DATE => AMOUNT
EMPLOYEE_ID#, DATE => EXPLANATION
BCNF Example #1, Cont.
Key Attributes: EMPLOYEE_ID#, DATE
Non-Key Attributes: AMOUNT, EXPLANATION
There are no partial dependencies, so the relational scheme is in 2NF. There are no transitive dependencies, so the relational scheme is in 3NF. Each of the non-key attributes depends on both of the key attributes (the key, the whole key, and nothing but the key), so the relational scheme is in BCNF.
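The BCNF condition can be sketched as a check that the left side of every nontrivial listed dependency determines the whole scheme (i.e., is a superkey). For a scheme and its given set of dependencies, checking the listed dependencies is sufficient. The names `closure` and `bcnf_violations` are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute derivable from attrs using the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the listed nontrivial FDs whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency: never a violation
        if closure(lhs, fds) != set(attributes):
            violations.append((lhs, rhs))
    return violations


R = {"EMPLOYEE_ID#", "DATE", "AMOUNT", "EXPLANATION"}
F = [
    ({"EMPLOYEE_ID#", "DATE"}, {"AMOUNT"}),
    ({"EMPLOYEE_ID#", "DATE"}, {"EXPLANATION"}),
]
print(bcnf_violations(R, F))  # []
```

An empty list means every dependency's left side is a superkey, matching the conclusion that the salary scheme is in BCNF.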
Normal Forms - BCNF, Cont.
Note that the definition of BCNF does not reference the definition of 3NF. This raises two questions:
– If a relational scheme is in BCNF, is it also in 3NF?
– If a relational scheme is in 3NF, is it also in BCNF?
The answer to the first is yes (the proof is left as an exercise). The answer to the second depends...
Normal Forms - BCNF, Cont.
Relational Scheme Has Only One Key:
– A relational scheme is in 3NF if and only if it is in BCNF.
Relational Scheme Has Multiple Keys:
– If the relational scheme is in BCNF, then it is in 3NF (already stated).
– If the relational scheme is in 3NF, however, it is not necessarily in BCNF.
In the case of multiple keys, the 1NF, 2NF, and 3NF definitions remain the same, with “key attributes” meaning the attributes that belong to any key.
Motivation For BCNF Example #1
Consider the following relational scheme:
– Attributes:
STUDENT_ID# - A student ID number.
COURSE_ID# - The ID# of a course being taken by the student.
FACULTY_ID# - The ID# of the faculty member who teaches the course taken by the student.
– Functional Dependencies:
STUDENT_ID#, COURSE_ID# => FACULTY_ID#
FACULTY_ID# => COURSE_ID#
Example #1, Cont.
There are two keys:
STUDENT_ID#, COURSE_ID#
STUDENT_ID#, FACULTY_ID#
The relation is in 1NF, 2NF, and 3NF (why?).
The relation is not in BCNF because of the dependency FACULTY_ID# => COURSE_ID#.
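The gap between 3NF and BCNF can be checked mechanically. This Python sketch swaps in the standard multiple-key formulation of 3NF (every nontrivial X => A has X a superkey, or A belongs to some key), which is phrased differently from the partial/transitive definitions on these slides; it checks only the listed dependencies, and the names `closure`, `candidate_keys`, and `classify` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute derivable from attrs using the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the full scheme."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == set(attributes):
                keys.append(s)
    return keys


def classify(attributes, fds):
    """Return (keys, in_3nf, in_bcnf), using:
    BCNF: every nontrivial X => A has X a superkey;
    3NF:  every nontrivial X => A has X a superkey or A in some key."""
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)           # attributes that belong to some key
    in_3nf = in_bcnf = True
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == set(attributes)
        for a in rhs - lhs:              # skip trivial parts of the FD
            if not is_superkey:
                in_bcnf = False
                if a not in prime:
                    in_3nf = False
    return keys, in_3nf, in_bcnf


R = {"STUDENT_ID#", "COURSE_ID#", "FACULTY_ID#"}
F = [
    ({"STUDENT_ID#", "COURSE_ID#"}, {"FACULTY_ID#"}),
    ({"FACULTY_ID#"}, {"COURSE_ID#"}),
]
keys, in_3nf, in_bcnf = classify(R, F)
print(in_3nf, in_bcnf)  # True False
```

FACULTY_ID# => COURSE_ID# breaks BCNF because FACULTY_ID# alone is not a superkey, but it does not break 3NF because COURSE_ID# is part of a key.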
Example #1, Cont.
Note the redundancy in the following legal relation (the course taught by a faculty member is repeated for every student taking that faculty member's course):
Motivation For BCNF Example #2
Consider the following relational scheme:
– Attributes:
LICENSE_ID# - A Florida state driver’s license number.
SS# - The social security number of the person holding the license.
CODE - A traffic violation code.
QTY - The number of times the person has been issued the violation within the past year.
– Functional Dependencies:
LICENSE_ID# => SS#
SS# => LICENSE_ID#
LICENSE_ID#, CODE => QTY
SS#, CODE => QTY
Example #2, Cont.
There are two keys:
LICENSE_ID#, CODE
SS#, CODE
The relation is in 1NF, 2NF, and 3NF (why?).
The relation is not in BCNF because of the dependencies:
LICENSE_ID# => SS#
SS# => LICENSE_ID#
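Attribute closures show both halves of this claim for the license scheme: each composite key determines every attribute, while LICENSE_ID# and SS# on their own determine only each other, so they are not superkeys. This is a minimal Python sketch; `closure` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Return every attribute derivable from attrs using the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


R = {"LICENSE_ID#", "SS#", "CODE", "QTY"}
F = [
    ({"LICENSE_ID#"}, {"SS#"}),
    ({"SS#"}, {"LICENSE_ID#"}),
    ({"LICENSE_ID#", "CODE"}, {"QTY"}),
    ({"SS#", "CODE"}, {"QTY"}),
]

# Both composite keys determine every attribute ...
for key in ({"LICENSE_ID#", "CODE"}, {"SS#", "CODE"}):
    print(sorted(key), closure(key, F) == R)
# ... but the left sides of the two single-attribute FDs do not.
for det in ({"LICENSE_ID#"}, {"SS#"}):
    print(sorted(det), sorted(closure(det, F)))
```

The right-hand sides of the offending dependencies (SS# and LICENSE_ID#) each belong to a key, which is why the scheme still satisfies 3NF.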
Example #2, Cont.
Note the redundancy in the following legal relation (the pairing of a LICENSE_ID# with an SS# is repeated for every violation code recorded for that driver):
Update Anomalies
Insertion Anomalies - The fact that a license ID# has been assigned to a particular person cannot be recorded unless they have at least one violation.
Deletion Anomalies - Deleting all of the violations for a particular driver would delete any record of the license ID# for that person.
Update Anomalies - Changing a driver’s license ID# requires changing the record for each type of violation the driver has committed.
As with partial dependencies, the first two assume null values are not desirable.
Normal Forms Summary
A relational scheme is said to be in first normal form (1NF) if and only if each of its domains contains only scalar values.
A relational scheme is said to be in second normal form (2NF) if and only if it is in 1NF and contains no partial dependencies.
A relational scheme is said to be in third normal form (3NF) if and only if it is in 2NF and contains no transitive dependencies.
A relational scheme is said to be in Boyce/Codd normal form (BCNF) if and only if the only nontrivial dependencies for the relational scheme are those in which a key functionally determines one or more attributes (“every attribute depends on the key, the whole key, and nothing but the key”).
Normal Forms - 4NF & 5NF (Currently Beyond The Scope Of This Course)
A relational scheme R is said to be in fourth normal form (4NF) if and only if, whenever there is a multivalued dependency X =>> Y where Y is neither empty nor a subset of X, and XY does not include all the attributes of R, X is a superkey of R.
A relational scheme R is said to be in fifth normal form (5NF), also called projection-join normal form (PJ/NF), if and only if every join dependency in R is implied by the candidate keys of R.