Normalisation
Reading: Connolly and Begg, chapters 13 & 14 (4th ed.)

[Title figure: Relation(A, B, C, D, E, F), "1NF? 2NF? 3NF? Help me Codd!!", decomposed into Relation1(A, B), Relation2(A*, C, D, E*) and Relation3(E, F)]
Normalisation
From this… …to this, in 3+ easy(?) steps
What is normalisation?
A method for database design
– Theory examines how “good” a schema is
– Transforms non-normalised schemas
– Minimises storage
Takes a set of attributes and derives the relational model
– By separating out the required tables
Completely different approach to ERM
– But should get the same result
A minimum of 3 steps is used. At each stage the normal form gets stronger (i.e. removes more redundancy), so the relations are less open to update anomalies
All based on functional dependencies
Functional Dependency
Underpins the normalisation process
If every value of column A uniquely determines the value in column B, then
– B is functionally dependent on A (B depends on A)
– A determines B, or, formally, A → B (A is called the determinant)
For example, in the relation (EmpID, Project, Age, Dept, Dsize, Budget, Role):
– EmpID → Age, Dept (A → B, C)
– EmpID, Project → Role (X, Y → Z)
– Note multiple attributes are often involved
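A functional dependency can be tested against sample data. The sketch below is illustrative only: the function name fd_holds and the sample rows are assumptions, not part of the lecture material. It checks that no two rows agree on the determinant while disagreeing on the dependent attributes. Note that data can only disprove a dependency; whether one really holds is a matter of the business rules.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a determinant must match every later one.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows, invented for this sketch.
employees = [
    {"EmpID": "E1", "Age": 33, "Dept": "D2"},
    {"EmpID": "E1", "Age": 33, "Dept": "D2"},
    {"EmpID": "E2", "Age": 34, "Dept": "D5"},
    {"EmpID": "E3", "Age": 33, "Dept": "D5"},
]

print(fd_holds(employees, ["EmpID"], ["Age", "Dept"]))  # True
print(fd_holds(employees, ["Age"], ["Dept"]))           # False
```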
Rules for functional dependency
A → B does NOT automatically mean B → A
– E.g. student ID → name, but not name → student ID
Transitive dependency: if A → B and B → C, then A → C
Many other rules
– E.g. if X, Y → Z but X → Z also
– In this case Z is partially dependent on X, Y
“Transitive” and “partial” dependency are two key concepts of the normalisation process
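The transitive rule can be applied mechanically by computing an attribute closure: the set of everything a given set of attributes determines. This is a minimal sketch; the function name closure and the (determinant, dependent) pair representation are choices made for this example.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds.

    fds is a list of (determinant, dependent) pairs of frozensets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already hold the whole determinant, we gain the dependents.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# A -> B and B -> C, as on the slide.
fds = [
    (frozenset({"A"}), frozenset({"B"})),
    (frozenset({"B"}), frozenset({"C"})),
]

print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C']  (so A -> C)
print(sorted(closure({"C"}, fds)))  # ['C']            (C determines nothing else)
```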
A Question for you!
EmpID  Project  Age  Dept  Dsize  Budget  Role
E1     P2       33   D2    10     100     Analyst
E1     P1       33   D2    10     200     Prog.
E2     P1       34   D5    10     200     Prog.
E2     P2       34   D5    20     100     Analyst
Which functional dependency is violated by the data? (options A, B, C, D)
Unnormalised Form
Relation contains:
– non-atomic attribute values
ID  Employee  Salary  Project
1   Grey      31000   A
2   Brown     35000   B,C
3   White     55000   A,B,C
4   Black     47000   A,C
The Project column holds non-atomic values: a violation of 1NF
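First Normal Form
Permits only single (atomic) attribute values
Flattening the table to atomic values gives one row per employee and project, which introduces redundancy; Project and Budget form the repeating group:
ID  Employee  Salary  Project  Budget
1   Grey      31000   A        10
2   Brown     35000   B        5
2   Brown     35000   C        5
3   White     55000   A        5
3   White     55000   B        5
3   White     55000   C        5
4   Black     47000   A        10
4   Black     47000   C        5
Remove the repeating group to a new table, along with a copy of the primary key of the original table:
ID  Employee  Salary
1   Grey      31000
2   Brown     35000
3   White     55000
4   Black     47000

ID (fk)  Project  Budget
1        A        10
2        B        5
2        C        5
3        A        5
3        B        5
3        C        5
4        A        10
4        C        5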
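The removal of a repeating group can be sketched in a few lines. The example below uses the unnormalised Employee table (without the Budget column) and assumes the multi-valued Project cell is stored as a comma-separated string; the function name to_1nf is hypothetical.

```python
unnormalised = [
    {"ID": 1, "Employee": "Grey", "Salary": 31000, "Project": "A"},
    {"ID": 2, "Employee": "Brown", "Salary": 35000, "Project": "B,C"},
    {"ID": 3, "Employee": "White", "Salary": 55000, "Project": "A,B,C"},
    {"ID": 4, "Employee": "Black", "Salary": 47000, "Project": "A,C"},
]


def to_1nf(rows, pk, multi, sep=","):
    """Split the multi-valued column into its own table, keyed by pk."""
    parent, child = [], []
    for row in rows:
        # The parent table keeps everything except the repeating group.
        parent.append({k: v for k, v in row.items() if k != multi})
        # The child table gets one atomic value per row, plus the key.
        for value in row[multi].split(sep):
            child.append({pk: row[pk], multi: value.strip()})
    return parent, child


employees, assignments = to_1nf(unnormalised, pk="ID", multi="Project")
print(len(employees), len(assignments))  # 4 8
print(assignments[1])                    # {'ID': 2, 'Project': 'B'}
```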
Second Normal Form
Full Functional Dependency (FFD)
X → Y is a FFD
– if removal of any attribute from X removes the dependency
X → Y is a partial dependency
– if removal of an attribute from X leaves the dependency intact
2NF test
– involves testing for partial dependency on the PK (therefore the PK MUST be composite to test for 2NF)
Relation R is in 2NF if:
– every non-primary-key attribute in R is FFD on the primary key of R
2NF
EmpID  Project  Age  Dept  Dsize  Budget  Role
So which FDs are violating 2NF?
“Second Normalised” by:
– removing the partially dependent non-primary-key attributes into new relations, where each is FFD on the appropriate part of the primary key
{EmpID, Age, Dept, Dsize}  {EmpID*, Project*, Role}  {Project, Budget}
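The 2NF test can be sketched as a search for non-key attributes determined by a proper subset of the composite primary key. The dependencies Dept → Dsize and Project → Budget used below are assumptions inferred from the decomposition shown on the slide; the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(pk, attributes, fds):
    """Map each proper subset of pk to the non-key attributes it determines."""
    non_key = set(attributes) - set(pk)
    found = {}
    for size in range(1, len(pk)):
        for part in combinations(sorted(pk), size):
            determined = closure(part, fds) & non_key
            if determined:
                found[part] = determined
    return found


attributes = {"EmpID", "Project", "Age", "Dept", "Dsize", "Budget", "Role"}
pk = {"EmpID", "Project"}
fds = [
    (frozenset({"EmpID"}), frozenset({"Age", "Dept"})),
    (frozenset({"Dept"}), frozenset({"Dsize"})),       # assumed
    (frozenset({"Project"}), frozenset({"Budget"})),   # assumed
    (frozenset({"EmpID", "Project"}), frozenset({"Role"})),
]

for part, attrs in partial_dependencies(pk, attributes, fds).items():
    print(part, "->", sorted(attrs))
# ('EmpID',) -> ['Age', 'Dept', 'Dsize']
# ('Project',) -> ['Budget']
```

Role does not appear in the output because it needs the whole key, which is why it stays with {EmpID*, Project*} in the decomposition.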
Third Normal Form
Remove Transitive Dependency
Conditions
– A non-primary-key attribute Z is transitively dependent on primary key X if: X → Y and Y → Z (the Y attribute provides the transition to the PK)
Which of the following could have a transitive dependency?
A  [EmpID*, Project*, Role]
B  [Project, Budget]
C  [EmpID, Age, Dept, Dsize]
D  None of the above
Here is an un-normalised Table
Ord#  Date     Cust#  Name   Prod#  Desc   Qty  Supplier  Tel
1     12/1/01  1      Jones  1      Disk   3    X         101
1     12/1/01  1      Jones  2      CD     5    Y         223
2     13/1/01  2      Black  1      Disk   1    X         101
2     13/1/01  2      Black  2      CD     1    Y         223
2     13/1/01  2      Black  3      Mouse  1    X         101
3     13/1/01  1      Jones  3      Mouse  1    X         101
Normalise it to 1NF
Ord#  Date     Cust#  Name
1     12/1/01  1      Jones
2     13/1/01  2      Black
3     13/1/01  1      Jones

Ord# (fk)  Prod#  Desc   Qty  Supplier  Tel
1          1      Disk   3    X         101
1          2      CD     5    Y         223
2          1      Disk   1    X         101
2          2      CD     1    Y         223
2          3      Mouse  1    X         101
3          3      Mouse  1    X         101
Now we normalise this to 2NF, remembering to test on the PK for any partial dependency
Already in 2NF:
Ord#  Date     Cust#  Name
1     12/1/01  1      Jones
2     13/1/01  2      Black
3     13/1/01  1      Jones

The (Ord#, Prod#, Desc, Qty, Supplier, Tel) table splits into:
Prod#  Desc   Supplier  Tel
1      Disk   X         101
2      CD     Y         223
3      Mouse  X         101

Ord# (fk)  Prod# (fk)  Qty
So, any transitive dependency?
Ord#  Date     Cust#  Name
1     12/1/01  1      Jones
2     13/1/01  2      Black
3     13/1/01  1      Jones

Prod#  Desc   Supplier  Tel
1      Disk   X         101
2      CD     Y         223
3      Mouse  X         101

Ord# (fk)  Prod# (fk)  Qty
Yes! But not in all of them…
The (Ord#, Date, Cust#, Name) table splits into:
Ord#  Date     Cust# (fk)
1     12/1/01  1
2     13/1/01  2
3     13/1/01  1

Cust#  Name
1      Jones
2      Black

The (Prod#, Desc, Supplier, Tel) table splits into:
Prod#  Desc   Supplier (fk)
1      Disk   X
2      CD     Y
3      Mouse  X

Supplier  Tel
X         101
Y         223

The (Ord#, Prod#, Qty) table is OK!
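The search for transitive dependencies can be sketched as follows. This is a simplified check, not a full 3NF test: it reports a dependency Y → Z when Y is reachable from the primary key, contains no key attribute, and does not itself determine the key. The FD lists are read off the tables above; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_dependencies(pk, fds):
    """List (Y, Z) pairs where pk -> Y -> Z and Y is a non-key determinant."""
    pk = set(pk)
    reachable = closure(pk, fds)
    return [
        (lhs, rhs - pk)
        for lhs, rhs in fds
        if lhs <= reachable            # pk -> Y
        and not lhs & pk               # Y holds no key attribute
        and not pk <= closure(lhs, fds)  # Y is not itself a key
        and rhs - pk                   # Z is a non-key attribute
    ]


order_fds = [
    (frozenset({"Ord#"}), frozenset({"Date", "Cust#"})),
    (frozenset({"Cust#"}), frozenset({"Name"})),
]
product_fds = [
    (frozenset({"Prod#"}), frozenset({"Desc", "Supplier"})),
    (frozenset({"Supplier"}), frozenset({"Tel"})),
]
line_fds = [(frozenset({"Ord#", "Prod#"}), frozenset({"Qty"}))]

for lhs, rhs in transitive_dependencies({"Ord#"}, order_fds):
    print(sorted(lhs), "->", sorted(rhs))   # ['Cust#'] -> ['Name']
for lhs, rhs in transitive_dependencies({"Prod#"}, product_fds):
    print(sorted(lhs), "->", sorted(rhs))   # ['Supplier'] -> ['Tel']
print(transitive_dependencies({"Ord#", "Prod#"}, line_fds))  # []
```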
Final Decomposition
Now in 3NF:
Ord# (fk)  Prod# (fk)  Qty

Ord#  Date     Cust# (fk)
1     12/1/01  1
2     13/1/01  2
3     13/1/01  1

Cust#  Name
1      Jones
2      Black

Prod#  Desc   Supplier (fk)
1      Disk   X
2      CD     Y
3      Mouse  X

Supplier  Tel
X         101
Y         223
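One way to see that nothing was lost is to build the five tables and join them back together. The sketch below uses Python's built-in sqlite3 module; the table and column names (Orders, OrderLine, ord_date and so on) are invented for this example, and the quantities are taken from the 1NF order-line table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (cust_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Supplier (supplier TEXT PRIMARY KEY, tel TEXT);
CREATE TABLE Product (
    prod_no INTEGER PRIMARY KEY,
    descr TEXT,
    supplier TEXT REFERENCES Supplier(supplier)
);
CREATE TABLE Orders (
    ord_no INTEGER PRIMARY KEY,
    ord_date TEXT,
    cust_no INTEGER REFERENCES Customer(cust_no)
);
CREATE TABLE OrderLine (
    ord_no INTEGER REFERENCES Orders(ord_no),
    prod_no INTEGER REFERENCES Product(prod_no),
    qty INTEGER,
    PRIMARY KEY (ord_no, prod_no)
);
""")

conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [(1, "Jones"), (2, "Black")])
conn.executemany("INSERT INTO Supplier VALUES (?, ?)",
                 [("X", "101"), ("Y", "223")])
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [(1, "Disk", "X"), (2, "CD", "Y"), (3, "Mouse", "X")])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [(1, "12/1/01", 1), (2, "13/1/01", 2), (3, "13/1/01", 1)])
conn.executemany("INSERT INTO OrderLine VALUES (?, ?, ?)",
                 [(1, 1, 3), (1, 2, 5), (2, 1, 1), (2, 2, 1), (2, 3, 1), (3, 3, 1)])

# Joining on the foreign keys rebuilds the original un-normalised rows.
rows = conn.execute("""
SELECT o.ord_no, o.ord_date, c.cust_no, c.name,
       p.prod_no, p.descr, l.qty, s.supplier, s.tel
FROM OrderLine AS l
JOIN Orders AS o ON o.ord_no = l.ord_no
JOIN Customer AS c ON c.cust_no = o.cust_no
JOIN Product AS p ON p.prod_no = l.prod_no
JOIN Supplier AS s ON s.supplier = p.supplier
ORDER BY o.ord_no, p.prod_no
""").fetchall()

for row in rows:
    print(row)
```

Each fact (a customer's name, a supplier's telephone number) is now stored once, yet the join returns the same six rows as the un-normalised table.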
The underlying E-R Model…
[E-R diagram for the un-normalised order table: entities Customer, Order, Product, Supplier; relationships makes, has, despatches; multiplicity labels 0..*, *, *]
How many tables would you get from mapping?
So Normalisation to 3NF is Normal!!
Remember, 2NF and 3NF disallow partial and transitive dependencies on the PK respectively; relations with such dependencies are open to update anomalies
But… even at 3NF, a relation may on rare occasions still contain redundancy and so be open to update anomalies
So we look briefly at two stronger forms:
– Boyce-Codd
– 4NF
Boyce-Codd NF
Is a stronger normal form than 3NF
Definition: a relation is in BCNF if and only if every determinant is a candidate key
And remember that a candidate key is any key that could become the PK of the relation (i.e. there may be competition for it!)
Potential to violate BCNF comes from:
– A relation containing at least 2 composite candidate keys
– Or candidate keys overlapping (i.e. they have at least one attribute in common)
BCNF Example
Consider the candidate keys for:
clientNo  interviewDate  interviewTime  staffNo  roomNo
CR76      13/5/08        …              SG5      G101
CR56      13/5/08        …              SG5      G101
CR74      13/5/08        …              SG37     G102
CR56      1/7/08         …              SG5      G102
FD1 {PK}: clientNo, interviewDate → interviewTime, staffNo, roomNo
FD2 {CK}: staffNo, interviewDate, interviewTime → clientNo
FD3 {CK}: roomNo, interviewDate, interviewTime → staffNo, clientNo
FD4: staffNo, interviewDate → roomNo
PK is primary key and CK is candidate key.
But what about FD4? Its determinant (staffNo, interviewDate) is not a candidate key, so the relation is not in BCNF
Adapted from Connolly and Begg, 2005, 4th ed., page 420
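The BCNF definition can be checked mechanically: a determinant is a candidate key (or at least a superkey) only if its closure covers every attribute of the relation. The sketch below applies that test to FD1 to FD4; the function names are hypothetical, and it tests for superkeys rather than minimal keys, which is sufficient for spotting violations.

```python
ATTRIBUTES = {"clientNo", "interviewDate", "interviewTime", "staffNo", "roomNo"}

fds = {
    "FD1": ({"clientNo", "interviewDate"},
            {"interviewTime", "staffNo", "roomNo"}),
    "FD2": ({"staffNo", "interviewDate", "interviewTime"}, {"clientNo"}),
    "FD3": ({"roomNo", "interviewDate", "interviewTime"},
            {"staffNo", "clientNo"}),
    "FD4": ({"staffNo", "interviewDate"}, {"roomNo"}),
}


def closure(attrs, fds):
    """Return every attribute determined by attrs under the named fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds.values():
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Names of the FDs whose determinant does not determine every attribute."""
    return [name for name, (lhs, _) in fds.items()
            if closure(lhs, fds) != attributes]


print(bcnf_violations(ATTRIBUTES, fds))  # ['FD4']
```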
So new decomposition?
clientNo  interviewDate*  interviewTime  staffNo*
CR76      13/5/08         …              SG5
CR56      13/5/08         …              SG5
CR74      13/5/08         …              SG37
CR56      1/7/08          …              SG5

interviewDate  staffNo  roomNo
13/5/08        SG5      G101
13/5/08        SG37     G102
1/7/08         SG5      G102
So duplication in the room number is now eradicated
4NF
Violations come from 2 multi-valued attributes in a relation
E.g. for each value of A there is a set of values for B and a set of values for C, while B and C remain independent of each other
Branch: BranchNo, staffName[1..*], ownerName[1..*]
So if you model your databases from ERMs, this type of dependency should not arise.
Example of 4NF
branchNo  staffName  ownerName
C003      Anne       Carol
C003      David      Carol
C003      Anne       Tina
C003      David      Tina

branchNo*  staffName
C003       Anne
C003       David

branchNo*  ownerName
C003       Carol
C003       Tina
Note: if step 9 is applied to the multi-valued attributes, then we map this correctly and avoid such redundancy, as the two smaller tables would be the result of the mapping!
Adapted from Connolly and Begg, 2005, 4th ed., page 428
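Because staff and owners are independent of each other, the single combined table is just every staff/owner pairing for the branch. A minimal sketch (the variable names are chosen for this example) shows that joining the two small tables on branchNo reproduces the combined table exactly, so the decomposition loses nothing:

```python
original = [
    ("C003", "Anne", "Carol"),
    ("C003", "David", "Carol"),
    ("C003", "Anne", "Tina"),
    ("C003", "David", "Tina"),
]

branch_staff = [("C003", "Anne"), ("C003", "David")]
branch_owner = [("C003", "Carol"), ("C003", "Tina")]

# Natural join on branchNo: pair every staff row with every owner row
# that belongs to the same branch.
joined = sorted(
    (branch, staff, owner)
    for branch, staff in branch_staff
    for owner_branch, owner in branch_owner
    if branch == owner_branch
)

print(joined == sorted(original))  # True
```

Adding a third staff member costs one row in branch_staff, but would cost one row per owner in the combined table.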
Normal Form Summary
A relation’s degree of normalisation
Stronger at each stage
– less vulnerable to update anomalies
First Normal Form (1NF)
– The relation has no non-atomic values
– Or the relation has “no repeating group”
Second Normal Form (2NF)
– The relation has no partial dependencies
– All non-key attributes are fully functionally dependent on the PK
Third Normal Form (3NF)
– The relation has no transitive dependencies
Boyce-Codd (BCNF)
– Every determinant is a candidate key
4NF
– No multi-valued dependencies