4TH NORMAL FORM By: Karen McVay
REVIEW OF NFs 1NF All values of the columns are atomic. That is, they contain no repeating values. 2NF it is in 1NF and every non-key column is fully dependent upon the primary key.
REVIEW OF NF Cont… 3NF it is already in 2NF and every non-key column is non transitively dependent upon its primary key. In other words, all non-key attributes are functionally dependent only upon the primary key. BCNF A relation is in BCNF if every determinant is a candidate key. This is an improved form of third normal form. Determinant: an attribute on which some other attribute is fully functionally dependent
4th Normal Form A Boyce Codd normal form relation is in fourth normal form if there is no multi value dependency in the relation or there are multi value dependency but the attributes, which are multi value dependent on a specific attribute, are dependent between themselves.
4th Normal Form Cont… This is best discussed through mathematical notation. Assume the following relation R(a:pk1, b:pk2, c:pk3) Recall that a relation is in BCNF if all its determinant are candidate keys, in other words each determinant can be used as a primary key. Because relation R has only one determinant (a, b, c), which is the composite primary key and since the primary is a candidate key therefore R is in BCNF.
4th Normal Form Cont… Now R may or may not be in fourth normal form. 1. If R contains no multi value dependency then R will be in Fourth normal form. 2. Assume R has the following two-multi value dependencies: a --->> b and a --->> c In this case R will be in the fourth normal form if b and c dependent on each other. However if b and c are independent of each other then R is not in fourth normal form and the relation has to be projected to following two non-loss projections. These non-loss projections will be in fourth normal form.
4th Normal Form Cont… Many-to-many relationships Fourth Normal Form applies to situations involving many-to-many relationships. In relational databases, many-to-many relationships are expressed through cross-reference tables.
Note about FDs and MVDs Every Functional Dependency is a MVD (if A1A2…An ® B1B2…Bn , then A1A2…An Þ B1B2…Bn ) FDs rule out certain tuples (i.e. if A ® B then two tuples will not have the same value for A and different values for B) MVDs do not rule out tuples. They guarantee that certain tuples must exist.
Formal Definitions Fourth Normal Form - if R is valid BCNF and… - given the “non-trivial” MVD: A1A2…An Þ B1B2…Bn {A1A2…An} is a superkey A MVD: A1A2…An Þ B1B2…Bn for a Relation R is “non-trivial” if: 1. none of the Bs are among the As 2. Not all of the attributes of R are among the As and Bs A MVD is “trivial” if it contains all the variations of A1A2…An x B1B2…Bn. A relation cannot be decomposed any further (under 4NF rules) if it has a trivial MVD
Example 1 Consider a case of class enrollment. Each student can be enrolled in one or more classes and each class can contain one or more students. Clearly, there is a many-to-many relationship between classes and students. This relationship can be represented by a Student/Class cross-reference table: {StudentID, ClassID}
Example 1 Cont… The key for this table is the combination of StudentID and ClassID. To avoid violation of 2NF, all other information about each student and each class is stored in separate Student and Class tables, respectively. Note that each StudentID determines not a unique ClassID, but a well-defined, finite set of values. This kind of behavior is referred to as multi-valued dependency of ClassID on StudentID.
Example 2 Consider another example with two many-to-many relationships, between students and classes and between classes and teachers. Students Classes * Classes Teachers * Also, a many-to-many relationship between students and teachers is implied.
Example 2 Cont… However, the business rules do not constrain this relationship in any way—the combination of StudentID and TeacherID does not contain any additional information beyond the information implied by the student/class and class/teacher relationships. Consequentially, the student/class and class/teacher relationships are independent of each other—these relationships have no additional constraints. The following table is, then, in violation of 4NF: {StudentID, ClassID, TeacherID}
4th NF and Anomalies As an example of the anomalies that can occur, realize that it is not possible to add a new class taught by some teacher without adding at least one student who is enrolled in this class. To achieve 4NF, represent each independent many-to-many relationship through its own cross-reference table.
4th Normal Form and anomalies Cont… Case 1: Assume the following relation: Employee (Eid:pk1, Language:pk2, Skill:pk3) No multi value dependency, therefore R is in fourth normal form.
4th Normal Form and anomalies Cont… case 2: Assume the following relation with multi-value dependency: Employee (Eid:pk1, Languages:pk2, Skills:pk3) Eid --->> Languages Eid --->> Skills Languages and Skills are dependent. This says an employee speak several languages and has several skills. However for each skill a specific language is used when that skill is practiced.
Eid Language Skill 100 English Teaching Kurdish Politic French Cooking Thus employee 100 when he/she teaches speaks English but when he cooks speaks French. This relation is in fourth normal form and does not suffer from any anomalies. Eid Language Skill 100 English Teaching Kurdish Politic French Cooking 200 Arabic Singing
4th Normal Form and anomalies Cont… case 3: Assume the following relation with multi-value dependency: Employee (Eid:pk1, Languages:pk2, Skills:pk3) Eid --->> Languages Eid --->> Skills Languages and Skills are independent.
4th Normal Form and anomalies Cont… This relation is not in fourth normal form and suffers from all three types of anomalies. Eid Language Skill 100 English Teaching Kurdish Politic 200 Arabic Singing
Eid Language Skill 100 English Teaching Kurdish Politics 200 Arabic Insertion anomaly: To insert row (200 English Cooking) we have to insert two extra rows (200 Arabic cooking), and (200 English Singing) otherwise the database will be inconsistent. Note the table will be as follow: Eid Language Skill 100 English Teaching Kurdish Politics 200 Arabic Singing Cooking
Eid Language Skill 100 English Teaching Kurdish Politics 200 Arabic Deletion anomaly: If employee 100 discontinue politic skill we have to delete two rows: (100 Kurdish Politic), and (100 English Politic) otherwise the database will be inconsistent. Eid Language Skill 100 English Teaching Kurdish Politics 200 Arabic Singing Cooking
More anomalies Update anomaly: If employee 200 changes his skill from singing to dancing we have to make changes in more than one place.
Eid Language 100 English Kurdish 200 Arabic The relation is projected to the following two non-loss projections which are in forth normal form Emplyee_Language(Eid:pk1, Languages:pk2) Eid Language 100 English Kurdish 200 Arabic
Emplyee_Language(Eid:pk1, Skills:pk2) Cont… Emplyee_Language(Eid:pk1, Skills:pk2) Eid Skill 100 Teaching Politic 200 Singing
References Functional Dependency (Normalization) http://www.emunix.emich.edu/~khailany/files/Normalization.htm Multivalued Dependencies (Ozmar Zaine): http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter7/node13.html