Copyright © Curt Hill
Schema Refinement III: 4th NF and 5th NF
Now what? An example
Consider a table that contains courses, instructors and textbooks.
There may be multiple instructors for multiple sections of the class.
There may be multiple textbooks as well.
Both instructors and textbooks come from a set of possibilities.
Course/Instructor/Book
Dept  Number  Instructor  Book
CIS   385     221         Smith & Boss
CIS   385     221         Noble
CIS   385     403         Smith & Boss
CIS   385     403         Noble
The key is the entire tuple.
Each instructor uses two books for the course.
There is redundancy.
Commentary
There is redundancy that we should deal with.
The table is in BCNF.
–No examination of FDs will help us
The two instructors and two textbooks are both determined by the course department and number.
This is an example of a multivalued dependency (MVD).
Commentary Again
First normal form disallows repeating groups.
A repeating group is often a set.
A multivalued dependency is a set depending on an item.
Examples:
–People working on many projects
–Each of these people has many dependents
Examples
In this example the course determines a set of instructors.
The course also determines a set of textbooks.
These two sets are independent.
If the sets are large we get plenty of redundancy and yet are still in BCNF.
–Every book is connected to every instructor connected to the course
Multivalued Dependency
An MVD X →→ Y says that X determines a set of Y values, independently of the other attributes.
Notation is two arrows:
–Dept,Number →→ Instructor
–Dept,Number →→ Book
The correct decomposition is splitting instructor from book.
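The definition above can be checked mechanically on a small instance. The following is a minimal sketch, not part of the original slides: the function name `mvd_holds` and the tuple layout are assumptions made for illustration. It groups the rows by the left side X and verifies that, within each group, every Y value appears with every combination of the remaining attributes.

```python
def mvd_holds(rows, attrs, x, y):
    """Return True if the MVD x ->> y holds in `rows` (tuples over `attrs`)."""
    # Z is everything that is neither in X nor in Y.
    z = [a for a in attrs if a not in x and a not in y]
    pos = {a: i for i, a in enumerate(attrs)}

    def pick(row, cols):
        return tuple(row[pos[c]] for c in cols)

    # For each X value, collect the (Y value, Z value) pairs that occur.
    groups = {}
    for row in set(rows):
        groups.setdefault(pick(row, x), set()).add((pick(row, y), pick(row, z)))

    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        # The MVD requires the full cross product of Y values and Z values.
        if len(pairs) != len(ys) * len(zs):
            return False
    return True


# The Course/Instructor/Book table from the slides.
ATTRS = ("Dept", "Number", "Instructor", "Book")
COURSE = [
    ("CIS", 385, 221, "Smith & Boss"),
    ("CIS", 385, 221, "Noble"),
    ("CIS", 385, 403, "Smith & Boss"),
    ("CIS", 385, 403, "Noble"),
]

print(mvd_holds(COURSE, ATTRS, ("Dept", "Number"), ("Instructor",)))      # True
print(mvd_holds(COURSE[:3], ATTRS, ("Dept", "Number"), ("Instructor",)))  # False
```

Dropping any one row breaks the dependency, which is exactly why the redundant rows cannot be removed while the table keeps all four columns.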
Course/Instructor/Book
Dept  Number  Instructor  Book
CIS   385     221         Smith & Boss
CIS   385     221         Noble
CIS   385     403         Smith & Boss
CIS   385     403         Noble
Project into:
Dept  Num  Instruct
CIS   385  221
CIS   385  403
Dept  Num  Book
CIS   385  Smith & Boss
CIS   385  Noble
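The projection shown here loses nothing: joining the two smaller tables gives back the original four rows. A minimal sketch of that check follows. The helpers `project` and `natural_join` are hypothetical names written for this illustration, not taken from the slides or from any library.

```python
def project(rows, attrs, keep):
    """Project `rows` (tuples over `attrs`) onto the attributes in `keep`."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(r[i] for i in idx) for r in rows}


def natural_join(r1, a1, r2, a2):
    """Natural join of two relations; returns (rows, attribute names)."""
    common = [a for a in a1 if a in a2]
    extra = [a for a in a2 if a not in a1]
    out = set()
    for t1 in r1:
        for t2 in r2:
            # Keep the pair only if the tuples agree on every shared attribute.
            if all(t1[a1.index(c)] == t2[a2.index(c)] for c in common):
                out.add(t1 + tuple(t2[a2.index(e)] for e in extra))
    return out, tuple(a1) + tuple(extra)


ATTRS = ("Dept", "Number", "Instructor", "Book")
COURSE = {
    ("CIS", 385, 221, "Smith & Boss"),
    ("CIS", 385, 221, "Noble"),
    ("CIS", 385, 403, "Smith & Boss"),
    ("CIS", 385, 403, "Noble"),
}

INSTR_ATTRS = ("Dept", "Number", "Instructor")
BOOK_ATTRS = ("Dept", "Number", "Book")
instructors = project(COURSE, ATTRS, INSTR_ATTRS)  # 2 rows
books = project(COURSE, ATTRS, BOOK_ATTRS)         # 2 rows

joined, joined_attrs = natural_join(instructors, INSTR_ATTRS, books, BOOK_ATTRS)
print(joined == COURSE)  # True: the decomposition is lossless
```

Each fact (an instructor teaches the course, a book is used in the course) is now stored once, instead of once per instructor and book pairing.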
Fourth Normal Form
The above two tables are in 4th NF.
A table is in 4th NF if and only if:
–The table is in BCNF
–Every nontrivial MVD is in effect an FD, that is, its left side is a superkey
If there are no nontrivial MVDs, then a BCNF table is also in 4NF.
Another View of 4th NF
If a relation is in 4th NF then for each MVD X →→ A, one of the following must hold:
The MVD is trivial
–A is part of X, or
–XA is the whole relation
X is a superkey
Is this 4th NF?
Dept  Number  Instructor  Book
CIS   385     221         Smith & Boss
CIS   385     221         Noble
CIS   385     403         Smith & Boss
CIS   385     403         Noble
There are two MVDs:
–Dept,Number →→ Instructor
–Dept,Number →→ Book
Trivial MVDs? No.
Dept,Number a superkey? No.
So the table is not in 4th NF.
Is this 4th NF?
Dept  Num  Instruct
CIS   385  221
CIS   385  403
There is one MVD:
–Dept,Num →→ Instructor
Trivial MVD?
–Yes, Dept,Num together with Instructor is the whole relation
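The two questions asked on these slides (is the MVD trivial, and is its left side a superkey) can be written down directly. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the candidate keys are supplied by hand rather than derived from FDs.

```python
def is_trivial_mvd(attrs, x, y):
    """An MVD x ->> y is trivial if y is part of x or x and y cover the relation."""
    return set(y) <= set(x) or set(x) | set(y) == set(attrs)


def is_superkey(x, candidate_keys):
    """x is a superkey if it contains some candidate key."""
    return any(set(k) <= set(x) for k in candidate_keys)


def mvd_allowed_in_4nf(attrs, candidate_keys, x, y):
    """True if the MVD x ->> y does not violate 4th NF."""
    return is_trivial_mvd(attrs, x, y) or is_superkey(x, candidate_keys)


# Original table: the key is the entire tuple.
FULL = ("Dept", "Number", "Instructor", "Book")
print(mvd_allowed_in_4nf(FULL, [FULL], ("Dept", "Number"), ("Instructor",)))  # False

# Projected table: the same MVD now covers the whole relation, so it is trivial.
PART = ("Dept", "Num", "Instructor")
print(mvd_allowed_in_4nf(PART, [PART], ("Dept", "Num"), ("Instructor",)))     # True
```

The first call reports a violation, matching the "No, No" answers for the four-column table; the second reports none, matching the "Yes, trivial" answer for the projection.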
Decomposability
A strange thing happens:
–There are relations that cannot be lossless-join decomposed into two relations
–But they can be decomposed into a larger number of relations
The following example shows a relation that can be decomposed into three relations but not two.
Example
[Table: relation A with attributes S, P, J]
What about this?
What is the key?
–Entire tuple
–Must be in 4th NF
What MVDs?
–S →→ P
–S →→ J
–P →→ J
–Among others
Decomposition
In the next slide we will see the table decomposed into tables of two fields.
However, no two of them can be joined into the original without extra rows.
All three of them can be joined into the original.
Example Decomposed
[Figure: relation A (S, P, J) decomposed into the two-attribute relations B, C, D over the attribute pairs SJ, SP, PJ]
What Just Happened?
A could not be lossless-join decomposed into any two of {B, C, D}.
–Decomposing into just two must break an MVD
It could be lossless-join decomposed into all three.
There is a join dependency between A and {B, C, D}.
There is no join dependency between any of:
–A and {B, C}
–A and {B, D}
–A and {C, D}
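The behaviour described here can be reproduced on a small instance. The rows below are hypothetical sample data chosen for illustration, not the rows from the original slide, and the helper names `rel`, `project` and `natural_join` are likewise assumptions. Rows are stored as frozensets of (attribute, value) pairs so that joins do not depend on column order.

```python
def rel(attrs, tuples):
    """Build a relation: a set of rows, each a frozenset of (attribute, value)."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(rows, keep):
    """Project every row onto the attributes in `keep`."""
    return {frozenset((a, v) for a, v in row if a in keep) for row in rows}


def natural_join(r1, r2):
    """Natural join on whatever attributes the two relations share."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


# Hypothetical sample rows for relation A(S, P, J).
A = rel("SPJ", [
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
])

sp = project(A, {"S", "P"})
pj = project(A, {"P", "J"})
sj = project(A, {"S", "J"})

# Any two projections join to five rows: one spurious row appears each time.
print(len(natural_join(sp, pj)), len(natural_join(sp, sj)), len(natural_join(pj, sj)))

# Joining all three removes the spurious row and gives back A.
print(natural_join(natural_join(sp, pj), sj) == A)  # True
```

With this data each pairwise join contains a different extra row, and the third projection is what filters it out.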
Join Dependencies
A join dependency ⋈{R1, R2, …, RN} holds over R if R1, R2, …, RN is a lossless-join decomposition of R.
–In other words, joining R1, R2, …, RN gives R
Notation: ⋈{R1, R2, …, RN}
A JD is a generalization of an MVD: an MVD corresponds to a JD with two components.
In the previous example, the MVDs S →→ P, S →→ J, P →→ J may be expressed as the join dependency ⋈{B, C, D}.
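The definition of a join dependency translates almost word for word into a check on a relation instance: project onto each component, join the projections, and compare with the original. The sketch below assumes hypothetical helper names (`rel`, `project`, `natural_join`, `jd_holds`) and uses the Course/Instructor/Book table to show an MVD expressed as a two-component JD.

```python
from functools import reduce


def rel(attrs, tuples):
    """Build a relation: a set of rows, each a frozenset of (attribute, value)."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(rows, keep):
    return {frozenset((a, v) for a, v in row if a in keep) for row in rows}


def natural_join(r1, r2):
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


def jd_holds(rows, components):
    """True if joining the projections onto `components` reproduces `rows`."""
    projections = [project(rows, set(c)) for c in components]
    return reduce(natural_join, projections) == rows


ATTRS = ("Dept", "Number", "Instructor", "Book")
COURSE = rel(ATTRS, [
    ("CIS", 385, 221, "Smith & Boss"),
    ("CIS", 385, 221, "Noble"),
    ("CIS", 385, 403, "Smith & Boss"),
    ("CIS", 385, 403, "Noble"),
])

# The MVD Dept,Number ->> Instructor written as a two-component JD.
COMPONENTS = [("Dept", "Number", "Instructor"), ("Dept", "Number", "Book")]
print(jd_holds(COURSE, COMPONENTS))  # True

# Remove one row and the JD (and hence the MVD) no longer holds.
PARTIAL = COURSE - rel(ATTRS, [("CIS", 385, 403, "Noble")])
print(jd_holds(PARTIAL, COMPONENTS))  # False
```

Note that this tests one instance only; a dependency is a statement about every legal instance of the schema, so a check like this can refute a JD but cannot prove it.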
Trivial Join Dependencies
The join dependency ⋈{R1, R2, …, RN} on R is trivial iff:
–At least one of R1, R2, …, RN is the set of all attributes of R
–In other words, there is a relation equivalent to R in the decomposition
Joining R with any of its own projections reproduces the original.
Implied Join Dependencies
Suppose the join dependency ⋈{R1, R2, …, RN} holds on R.
This join dependency is implied by the candidate key(s) iff each of R1, R2, …, RN is a superkey for R.
Fifth Normal Form
5th NF is also known as Projection-Join Normal Form (PJNF).
A relation R is in 5th NF if and only if every non-trivial join dependency that is satisfied by R is implied by the candidate key(s) of R.
Is this in 5th NF?
There is a non-trivial join dependency, ⋈{B, C, D}.
–None of these are A
This join dependency is not implied by the only candidate key, SPJ.
–None of these contain SPJ
No, not in 5NF.
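The two tests applied on this slide follow the definitions given on the preceding slides, and can be sketched as below. The function names are hypothetical, the components are written as attribute strings, and the "implied by the candidate keys" test is the one stated in these slides (every component is a superkey), not a general inference procedure.

```python
def is_trivial_jd(attrs, components):
    """A JD is trivial if some component is the full attribute set."""
    return any(set(c) == set(attrs) for c in components)


def implied_by_keys(components, candidate_keys):
    """Slide definition: every component must be a superkey."""
    return all(
        any(set(k) <= set(c) for k in candidate_keys)
        for c in components
    )


def jd_allowed_in_5nf(attrs, candidate_keys, components):
    """True if this JD does not violate 5th NF."""
    return is_trivial_jd(attrs, components) or implied_by_keys(components, candidate_keys)


ATTRS = "SPJ"
KEYS = ["SPJ"]                # the only candidate key is the entire tuple
JD = ["SP", "PJ", "SJ"]       # the three two-attribute projections

print(is_trivial_jd(ATTRS, JD))            # False: no component is all of SPJ
print(implied_by_keys(JD, KEYS))           # False: no component contains SPJ
print(jd_allowed_in_5nf(ATTRS, KEYS, JD))  # False: the relation is not in 5NF
```

Adding the full relation as a component, for example `["SP", "SPJ"]`, makes the JD trivial, so it would no longer count against 5th NF.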
Is 5th NF the Ultimate?
It is the ultimate that can be obtained with just projections.
–It is guaranteed to be free of every anomaly that can be removed by projections
Hence the name Projection-Join Normal Form.
However, there may be some anomalies that cannot be eliminated with just projections.
JDs and FDs
FDs and MVDs have a set of inference rules.
–This allows us to reason about them
JDs lack this set.
Thus finding JDs and using them to move to 5th NF has its problems.
We do have one tool.
3NF and 5NF
If a relation is in 3rd NF and each of its keys is atomic (a single attribute), then the relation is also in 5th NF.
–The same may be said of BCNF
There may be 5th NF relations that do not have atomic keys.
When this applies, we can determine that the table is in 5th NF without any consideration of JDs.
Denormalization
The argument against making everything 5th NF:
–Lots of separate relations
–These relations become separate files
–This means lots of I/O
Since SQL cannot separate a relation from a file, the argument has some merit.
Conclusion
MVDs are much less common than FDs.
Thus tables that are in BCNF are very often also in 5NF, because no nontrivial MVDs or JDs hold.
MVDs are also harder to observe and reason about.
Thus 3NF and BCNF are the most common normal forms.