Copyright © Curt Hill
Schema Refinement II
2nd NF to 3rd NF to BCNF
Recap
First Normal Form is just a standard rectangular table
No repeating groups
It is good, but may have anomalies
– Insert: extra information is needed to insert
– Delete: extra information may be lost when deleting
– Update: multiple updates may be needed
Functional Dependencies
Why?
– The analysis of the FDs shows the problems with First and Second Normal Forms
– It also shows why Third and Boyce-Codd Normal Forms are better
Notation: A → B
This is read: A determines B
Or: B is dependent on A
B is fully functionally dependent on A when
– B is functionally dependent on A, and
– B is not functionally dependent on any proper subset of A
– Notation is A ↠ B
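Both definitions can be checked mechanically with the attribute closure: A → B follows from a set of FDs exactly when B lies inside the closure of A. The following is a minimal Python sketch, assuming FDs are written as (left side, right side) pairs of attribute-name sets; the function names and the enrollment-style attributes (SID, Course, SName, Grade) are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def determines(lhs, rhs, fds):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


def fully_dependent(lhs, rhs, fds):
    """True if rhs is fully functionally dependent on lhs:
    lhs -> rhs holds and no proper subset of lhs determines rhs."""
    lhs = set(lhs)
    if not determines(lhs, rhs, fds):
        return False
    for size in range(len(lhs)):  # sizes 0 .. len(lhs) - 1: proper subsets only
        for subset in combinations(sorted(lhs), size):
            if determines(subset, rhs, fds):
                return False
    return True


# Hypothetical FDs: a student's name depends on SID alone,
# while a grade needs both SID and Course.
fds = [({"SID"}, {"SName"}), ({"SID", "Course"}, {"Grade"})]
print(fully_dependent({"SID", "Course"}, {"Grade"}, fds))  # True
print(fully_dependent({"SID", "Course"}, {"SName"}, fds))  # False: SID alone suffices
```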
Lossless Join Decomposition
The goal is to project a table in a lower normal form into several tables of a higher normal form
This is done using a lossless join decomposition
– When the tables are joined, exactly the original table is reproduced, with no spurious rows added
– For a split of R into R1 and R2, this is guaranteed when the shared attributes R1 ∩ R2 determine all of R1 or all of R2
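For a split into two tables, the condition above can be tested with the attribute closure. This minimal Python sketch assumes the Student FDs used later in these slides (SID → SName, LCode and LCode → Status); the function name is_lossless_binary is hypothetical, and the test covers only a split into exactly two tables.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds ((lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """A split of R into r1 and r2 is lossless when the shared attributes
    determine all of r1 or all of r2."""
    common = set(r1) & set(r2)
    determined = closure(common, fds)
    return set(r1) <= determined or set(r2) <= determined


fds = [({"SID"}, {"SName", "LCode"}), ({"LCode"}, {"Status"})]

# Shared attribute LCode determines all of (LCode, Status): lossless.
print(is_lossless_binary({"SID", "SName", "LCode"}, {"LCode", "Status"}, fds))  # True
# Shared attribute Status determines neither side: the join may add spurious rows.
print(is_lossless_binary({"SID", "SName", "Status"}, {"LCode", "Status"}, fds))  # False
```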
Second Normal Form (2nd NF)
A table is in Second Normal Form if and only if
– It is in 1st NF, and
– Every non-key attribute is fully functionally dependent on the whole key
This eliminates partial dependencies
Partial Dependencies
[Figure: a key containing X, with X → A]
– X is part of the key but not all of it
– This is a violation of 2nd NF
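Partial dependencies can be found by taking the closure of every proper, non-empty subset of the key. A minimal Python sketch, assuming the relation has a single key that the caller supplies; the relation Enroll(SID, Course, SName, Grade), its FDs, and the name partial_dependencies are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds ((lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(attrs, key, fds):
    """Return (X, A) pairs where X is a proper, non-empty part of the key
    and the non-key attribute A already depends on X alone."""
    key = set(key)
    non_key = set(attrs) - key
    found = []
    for size in range(1, len(key)):  # proper subsets only
        for subset in combinations(sorted(key), size):
            for a in sorted(non_key & closure(subset, fds)):
                found.append((subset, a))
    return found


# Hypothetical Enroll relation with composite key (SID, Course).
attrs = {"SID", "Course", "SName", "Grade"}
fds = [({"SID"}, {"SName"}), ({"SID", "Course"}, {"Grade"})]
print(partial_dependencies(attrs, {"SID", "Course"}, fds))  # [(('SID',), 'SName')]
```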
Student File Revisited
SID  SName     LCode  Status
21   Jones     A1     1
32   Smith     A1     1
36   Ericson   A3     2
39   Williams  A2     3
This is in 2nd NF but still exhibits anomalies, because LCode → Status
– Insert: we must know the LCode and Status before an insert
– Update: updating Jones's LCode could put conflicting information in the file
– Delete: deleting Ericson loses all information about LCode A3
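Whether a particular table instance respects LCode → Status can be checked by grouping rows on LCode. A data check like this can only show that an FD is violated; it cannot prove the FD holds for all future data. The Python sketch below uses the rows of the Student table above; fd_holds is a hypothetical helper name. The second check simulates the update anomaly by changing Jones's LCode to A3 without changing Status.

```python
import copy


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Remember the first rhs value per lhs value; a later mismatch is a violation.
        if seen.setdefault(left, right) != right:
            return False
    return True


student = [
    {"SID": 21, "SName": "Jones", "LCode": "A1", "Status": 1},
    {"SID": 32, "SName": "Smith", "LCode": "A1", "Status": 1},
    {"SID": 36, "SName": "Ericson", "LCode": "A3", "Status": 2},
    {"SID": 39, "SName": "Williams", "LCode": "A2", "Status": 3},
]
print(fd_holds(student, ["LCode"], ["Status"]))  # True

# Update anomaly: move Jones to LCode A3 but leave Status at 1.
updated = copy.deepcopy(student)
updated[0]["LCode"] = "A3"
print(fd_holds(updated, ["LCode"], ["Status"]))  # False: A3 now has Status 1 and 2
```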
Transitivity Again
Functional dependencies are transitive
– If A → B and B → C, then A → C
– Equality, greater than and less than are also transitive
Transitivity is the Problem
The Status depends on the LCode
The LCode depends on SID, which is the key
– Thus 2nd NF
Status depends directly on a non-key
– It depends only transitively on the key
There are still anomalies because of the transitive dependency
Transitive Dependencies
[Figure: SID → LCode → Status]
– SID → LCode
– LCode → Status
– LCode is not part of the key, so Status depends on SID only transitively
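The closure computation makes the transitive path visible. In this minimal Python sketch the FDs come from the Student slides, with SID → SName added as an assumption since SID is the key. The closure of SID reaches Status only through LCode; dropping LCode → Status removes Status from the closure.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds ((lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


key_fd = ({"SID"}, {"SName", "LCode"})   # SID is the key
transitive_fd = ({"LCode"}, {"Status"})  # a non-key determines a non-key

print(sorted(closure({"SID"}, [key_fd, transitive_fd])))
# ['LCode', 'SID', 'SName', 'Status']
print("Status" in closure({"SID"}, [key_fd]))  # False: Status is reached only via LCode
```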
Third Normal Form
A table is in 3rd Normal Form if and only if
– The table is in 2nd NF, and
– Every non-key item is intransitively dependent on the key
In other words: each item not in the key depends directly on the key and does not depend on anything not in the key
Another View of 3rd NF
For each functional dependency X → A in the relation R, one of the following conditions must be true:
– The FD is trivial: A is part of X
– X is a superkey: a key, possibly plus additional fields
– A is part of some key for R
Student File Revisited
SID  SName     LCode  Status
21   Jones     A1     1
32   Smith     A1     1
36   Ericson   A3     2
39   Williams  A2     3
Is LCode → Status allowed?
– The FD is trivial: No
– LCode is a superkey: No
– Status is part of some key for R: No
Not in 3rd Normal Form
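The three conditions can be applied to each listed FD in turn. This minimal Python sketch assumes the caller supplies the candidate keys (here just SID, as on the slides) and checks only the FDs as listed, a common shortcut when they are given for R itself rather than for a projection of it; third_nf_violations is a hypothetical name.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds ((lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attrs, fds, keys):
    """Return (X, A) pairs from fds that fail all three 3rd NF conditions.

    keys: the candidate keys of the relation, supplied by the caller.
    """
    prime = set().union(*keys)  # attributes that are part of some key
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attrs)
        for a in sorted(rhs - lhs):  # an A inside X would make the FD trivial
            if not is_superkey and a not in prime:
                bad.append((lhs, a))
    return bad


attrs = {"SID", "SName", "LCode", "Status"}
fds = [({"SID"}, {"SName", "LCode"}), ({"LCode"}, {"Status"})]
print(third_nf_violations(attrs, fds, keys=[{"SID"}]))  # [({'LCode'}, 'Status')]
```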
So how do we fix it? Projection!
SID  SName     LCode  Status
21   Jones     A1     1
32   Smith     A1     1
36   Ericson   A3     2
39   Williams  A2     3
Becomes
SID  SName     LCode
21   Jones     A1
32   Smith     A1
36   Ericson   A3
39   Williams  A2
and
LCode  Status
A1     1
A3     2
A2     3
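The projection, and the join that recovers the original table, can be sketched in Python with rows as dictionaries. This is a minimal illustration under that assumption; project, natural_join and as_set are hypothetical helper names, not a database API.

```python
def project(rows, attrs):
    """Keep only attrs from each row, dropping duplicate rows."""
    result = []
    for row in rows:
        picked = {a: row[a] for a in attrs}
        if picked not in result:
            result.append(picked)
    return result


def natural_join(left, right):
    """Combine every pair of rows that agree on all shared attributes."""
    common = set(left[0]) & set(right[0])  # assumes both tables are non-empty
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


def as_set(rows):
    """Order-insensitive view of a table, for comparison."""
    return {frozenset(row.items()) for row in rows}


student = [
    {"SID": 21, "SName": "Jones", "LCode": "A1", "Status": 1},
    {"SID": 32, "SName": "Smith", "LCode": "A1", "Status": 1},
    {"SID": 36, "SName": "Ericson", "LCode": "A3", "Status": 2},
    {"SID": 39, "SName": "Williams", "LCode": "A2", "Status": 3},
]

students = project(student, ["SID", "SName", "LCode"])
lcodes = project(student, ["LCode", "Status"])
print(len(lcodes))  # 3: the duplicate (A1, 1) row is stored once
print(as_set(natural_join(students, lcodes)) == as_set(student))  # True
```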
Notes
With LCode and Status in a separate table, some anomalies are eliminated
– Deleting a student no longer loses LCode information
– Inserting a student needs no status information, so it cannot conflict with other status information
– Changing the status for an LCode needs only a single update
Boyce-Codd Normal Form
A slight strengthening of 3rd NF
A table is in 3rd NF iff
– The table is in 2nd NF
– Every non-key item is intransitively dependent on the key
A table is in Boyce-Codd NF iff
– The table is in 2nd NF
– Every item is dependent only on the key
3rd NF and BCNF
For each FD X → A in the relation R:
One of the following conditions must be true for 3rd NF
– The FD is trivial
– X is a superkey
– A is part of some key for R
One of the following conditions must be true for BCNF
– The FD is trivial
– X is a superkey
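The BCNF test drops the third condition, so only trivial FDs and superkey determinants survive. A minimal Python sketch that checks only the FDs as listed; bcnf_violations is a hypothetical name. It is applied to the Student relation and to the six-attribute relation ABCDEF used in the example slide.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds ((lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= set(attrs)]


student_fds = [({"SID"}, {"SName", "LCode"}), ({"LCode"}, {"Status"})]
print(bcnf_violations({"SID", "SName", "LCode", "Status"}, student_fds))
# [({'LCode'}, {'Status'})]

# Single-letter attributes, so set("CE") == {"C", "E"}.
abcdef_fds = [(set("A"), set("ABCDEF")), (set("CE"), set("A")), (set("BD"), set("E"))]
print(bcnf_violations(set("ABCDEF"), abcdef_fds))  # only BD -> E is flagged
```

In the ABCDEF case, CE → A passes because CE is a superkey, while BD → E fails; the 3rd NF test would still accept BD → E because E is part of a key.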
Explanation
Many consider 3rd NF and BCNF to be identical
They differ only when a relation has alternate (more than one) candidate keys
– 3rd NF allows a non-superkey X to determine A when A is part of some candidate key
– BCNF removes that exception: every determinant must be a superkey
3NF and BCNF
[Figure: two cases of a dependency X → A drawn against the key; one is labeled "Disallowed by 3NF and BCNF", the other "Disallowed by BCNF"]
The Catchy Saying
Each item should be dependent on:
– the key (1st NF)
– the whole key (2nd NF)
– and nothing but the key (BCNF)
3NF or BCNF?
Is there a practical difference?
Yes: BCNF is slightly stronger
– It eliminates a type of redundancy that 3NF allows
– It may also introduce another problem: the decomposition may not preserve dependencies
An Example
The relation has 6 attributes: ABCDEF
FDs:
– A → ABCDEF (A is a key)
– CE → A (CE is also a key)
– BD → E
Not BCNF, but is 3NF
– BD is not a key, but E is part of the key CE
Project into ABCDF and BDE to make it BCNF
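The keys of ABCDEF, and the reason BD → E passes 3NF, can be confirmed by brute force over attribute subsets, which is practical only for small relations. A minimal Python sketch; candidate_keys is a hypothetical name. Besides A and CE, the search also reports BCD, which follows from BD → E and CE → A; this does not change the slide's conclusion, since BD itself is still not a key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds ((lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return the minimal attribute sets whose closure is the whole relation."""
    everything = set(attrs)
    keys = []
    for size in range(1, len(everything) + 1):
        for combo in combinations(sorted(everything), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= everything:
                keys.append(candidate)
    return keys


fds = [(set("A"), set("ABCDEF")), (set("CE"), set("A")), (set("BD"), set("E"))]
keys = candidate_keys("ABCDEF", fds)
print(["".join(sorted(k)) for k in keys])  # ['A', 'CE', 'BCD']

prime = set().union(*keys)
print(closure(set("BD"), fds) >= set("ABCDEF"))  # False: BD is not a superkey
print("E" in prime)  # True: E is part of key CE, so 3rd NF allows BD -> E
```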
Problem with this Projection
Project into ABCDF and BDE
– This is a lossless join decomposition
– Both tables are now in BCNF
There is an integrity constraint issue
– CE → A cannot be checked without doing a join
This was not a dependency preserving decomposition
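A small instance shows why CE → A cannot be enforced table by table. In this minimal Python sketch the row values (a1, b1 and so on) are invented for illustration: each table satisfies its own key, yet the joined rows break CE → A. fd_holds and natural_join are hypothetical helper names.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def natural_join(left, right):
    """Combine every pair of rows that agree on all shared attributes."""
    common = set(left[0]) & set(right[0])  # assumes both tables are non-empty
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


# Hypothetical rows for the two projected tables.
abcdf = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1", "F": "f1"},
    {"A": "a2", "B": "b2", "C": "c1", "D": "d2", "F": "f2"},
]
bde = [
    {"B": "b1", "D": "d1", "E": "e1"},
    {"B": "b2", "D": "d2", "E": "e1"},
]

# Attribute names are single letters, so a string like "BD" lists attributes B and D.
print(fd_holds(abcdf, "A", "BCDF"))  # True: A is a key of ABCDF
print(fd_holds(bde, "BD", "E"))      # True: BD is a key of BDE
# Only after the join do the two rows share C and E while differing on A.
print(fd_holds(natural_join(abcdf, bde), "CE", "A"))  # False
```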
Testing for Dependency Preserving Decompositions
If we decompose relation R, with FD set F, into S and T
– Let F_S and F_T be the FDs in F+ that use only attributes of S and only attributes of T
– If F+ = (F_S ∪ F_T)+, then it is dependency preserving
The split into ABCDF and BDE was not
– CE → A is in F+ but not in (F_S ∪ F_T)+
– Neither relation had all three fields
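Computing F+ directly is exponential, but a standard test avoids it: for each FD X → Y, grow X using only closures taken inside each decomposed table, then check whether Y is reached. A minimal Python sketch of that test; lost_dependencies is a hypothetical name.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds ((lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lost_dependencies(fds, tables):
    """Return the FDs that the decomposition into tables does not preserve."""
    lost = []
    for lhs, rhs in fds:
        reached = set(lhs)
        changed = True
        while changed:
            changed = False
            for table in tables:
                # Attributes this table alone can derive from what is reached so far.
                gained = closure(reached & table, fds) & table
                if not gained <= reached:
                    reached |= gained
                    changed = True
        if not rhs <= reached:
            lost.append((lhs, rhs))
    return lost


abcdef_fds = [(set("A"), set("ABCDEF")), (set("CE"), set("A")), (set("BD"), set("E"))]
print(lost_dependencies(abcdef_fds, [set("ABCDF"), set("BDE")]))  # CE -> A is lost

student_fds = [({"SID"}, {"SName", "LCode"}), ({"LCode"}, {"Status"})]
print(lost_dependencies(student_fds, [{"SID", "SName", "LCode"}, {"LCode", "Status"}]))  # []
```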
Decompositions May Be:
– Lossless join
– Dependency preserving
– Both (clearly the preferred)
– Neither
There is always a lossless join and dependency preserving decomposition into 3NF
This is not always the case with BCNF
– We can always get a lossless join decomposition into BCNF
– But is it desirable?
Perspective
Some redundancy is allowed in 3NF that is disallowed in BCNF
We cannot always get to BCNF with dependency preserving decompositions
– Even though we can always get to BCNF with a lossless join decomposition
We then have to decide where to stop
We may actually settle for 2NF for other reasons
– Such as efficient queries
Final Thoughts
This is as far as FDs and FFDs may be pushed
Higher normal forms require looking at something else
– Fourth Normal Form requires consideration of multi-valued dependencies