Published by Hilary Logan. Modified over 9 years ago.
1
DATA NORMALIZATION
CS 260 Database Systems
2
Overview
  Introduction
  Anomalies
  Functional dependence
  Normal forms
    1NF
    2NF
    3NF
    BCNF
  Denormalization
3
Introduction
Database normalization is a process for producing a schema that is free of unnecessary redundancy while still allowing information to be retrieved easily.
It consists primarily of breaking tables into smaller tables to remove redundant data that can lead to anomalies.
The result of the normalization process is a schema that can be described as adhering to a particular “normal form”.
4
Introduction
Normalization requires domain-specific knowledge in order to identify “functional dependencies”.
  Some of this knowledge may be expressed in an ER model, but not always.
Normalization is particularly useful for analyzing and fixing an existing (and possibly poorly designed) database schema.
Normalization allows for a design that is free of insertion, update, and deletion anomalies.
6
Anomalies
Insertion anomaly
  Occurs when inserting a new record causes data to become inconsistent.
    In the following example, an insertion anomaly occurred when Franklin T. Wong’s employee record was first inserted: his department manager’s SSN was entered incorrectly.
  Also occurs when a new record cannot be inserted because of missing data.
    In the following example, an insertion anomaly would occur if an attempt was made to insert a record for a new project in the EMPLOYEE-PROJECTS table: the project cannot be inserted if it doesn’t have any associated employees.
7
Anomalies
[Figure: example EMPLOYEE and EMPLOYEE-PROJECTS tables]
8
Update anomaly
  Occurs when some but not all instances of a data value are updated.
    In the following example, an update anomaly occurs if Joyce English’s records are updated to reflect a last name change, and the change is made in the EMPLOYEE table and in one record in the EMPLOYEE-PROJECTS table, but not in the second record in the EMPLOYEE-PROJECTS table.
    An update anomaly may also occur if an attempt is made to change Project X’s location to “Bellaire” and Project 1’s related data is missed in the update.
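To make the update anomaly concrete, here is a minimal Python sketch. The rows, the "E1" identifier, the project codes, and the new last name are all invented for illustration, since the slide's figure is not available; only the idea of a name stored redundantly in two tables comes from the slide.

```python
# Hypothetical denormalized data: the employee's name is stored once in
# EMPLOYEE and repeated in every EMPLOYEE-PROJECTS row for that employee.
employee = [{"ssn": "E1", "name": "Joyce English"}]
employee_projects = [
    {"ssn": "E1", "name": "Joyce English", "project": "P1"},
    {"ssn": "E1", "name": "Joyce English", "project": "P2"},
]


def names_for(ssn):
    """Collect every distinct name stored for one employee across both tables."""
    rows = employee + employee_projects
    return {row["name"] for row in rows if row["ssn"] == ssn}


# A partial update: the name changes in EMPLOYEE and in only one
# EMPLOYEE-PROJECTS row. "Joyce Smith" is an invented new name.
employee[0]["name"] = "Joyce Smith"
employee_projects[0]["name"] = "Joyce Smith"

# Two different names now coexist for the same person.
inconsistent = len(names_for("E1")) > 1
```

If the name were stored only in EMPLOYEE, a single update could not leave stale copies behind.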
9
Anomalies
[Figure: example EMPLOYEE and EMPLOYEE-PROJECTS tables]
10
Deletion anomaly
  Occurs when a record is deleted to remove some data instance, but other data is inadvertently deleted as well.
    In the following example, a deletion anomaly would occur if Franklin T. Wong’s records were removed from the EMPLOYEE and EMPLOYEE-PROJECTS tables: the data regarding the “Computerization” and “Reorganization” projects would then be gone as well.
11
Anomalies
[Figure: example EMPLOYEE and EMPLOYEE-PROJECTS tables]
13
Functional Dependence
Functional dependence occurs when the values of one or more attributes (A) in some entity unambiguously determine the values of one or more other attributes (B).
  Notation: A -> B
  In other words, if we know the values of all attributes in set A, then we can uniquely identify the values of all attributes in set B.
  Each of these sets can consist of one or more attributes.
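As a rough illustration of the definition, the following Python sketch tests whether A -> B holds in a given set of rows. The function name `fd_holds` and the sample values are hypothetical; the attribute names are borrowed from the project table used later in the deck. A sample of rows can only refute a dependency; whether it truly holds is a matter of domain knowledge.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in this set of rows."""
    seen = {}
    for row in rows:
        a = tuple(row[attr] for attr in determinant)
        b = tuple(row[attr] for attr in dependent)
        # setdefault stores b the first time a is seen and returns the
        # stored value afterwards, so a mismatch means a maps to two b's.
        if seen.setdefault(a, b) != b:
            return False
    return True


# Invented sample rows.
rows = [
    {"Emp#": 101, "JobType": "Engineer", "ChgPerHour": 65},
    {"Emp#": 102, "JobType": "Analyst", "ChgPerHour": 50},
    {"Emp#": 103, "JobType": "Engineer", "ChgPerHour": 65},
]

job_determines_charge = fd_holds(rows, ["JobType"], ["ChgPerHour"])
charge_determines_emp = fd_holds(rows, ["ChgPerHour"], ["Emp#"])
```

Here `job_determines_charge` is true for the sample, while `charge_determines_emp` is false because a charge of 65 appears with two different employees.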
14
Functional Dependence
Strategies for determining functional dependence
  For each field, ask whether its value can be determined if the values of one or more other fields are known.
    Is the field dependent on one or more other fields?
  For each field, ask whether knowing its value identifies the values of any other fields.
    Is the field a determinant for one or more other fields?
  Group functional dependencies with the same determinant into a single relation.
  The following (trivial) types of functional dependencies can be ignored:
    {A, B} -> A
    {A, B} -> B
    {A, B} -> {A, B}
15
Functional Dependence
Super Keys
  A super key is a set of attributes (possibly consisting of a single attribute) that uniquely identifies a record.
  Example
    In the CANDY_CUSTOMER table in our candy database, {cust_id} is a super key.
    {cust_id, cust_name} is also a super key.
    {cust_id, cust_type} is also a super key.
    Any combination of attributes in CANDY_CUSTOMER that includes a super key is also a super key.
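A small Python sketch of the super key idea follows. The column names come from the slide, but the rows and the helper name `is_super_key` are made up for illustration, and a check against sample rows can only show that an attribute set fails to be a super key.

```python
def is_super_key(rows, attrs):
    """True if no two rows share the same combination of values for attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


# Invented CANDY_CUSTOMER rows: two customers share a name and a type.
candy_customer = [
    {"cust_id": 1, "cust_name": "Jones, Joe", "cust_type": "P"},
    {"cust_id": 2, "cust_name": "Armstrong, Inc.", "cust_type": "R"},
    {"cust_id": 3, "cust_name": "Jones, Joe", "cust_type": "P"},
]

id_is_key = is_super_key(candy_customer, ["cust_id"])
id_name_is_key = is_super_key(candy_customer, ["cust_id", "cust_name"])
name_type_is_key = is_super_key(candy_customer, ["cust_name", "cust_type"])
```

Any attribute set containing `cust_id` passes the check, while `{cust_name, cust_type}` fails because rows 1 and 3 collide.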
16
Sample Database (CANDY)
Tables: CANDY_CUSTOMER, CANDY_PURCHASE, CANDY_CUST_TYPE, CANDY_PRODUCT
17
Functional Dependence
Candidate Keys
  A candidate key is a super key with a minimal set of attributes.
  A table has only one primary key, but it can potentially have more than one candidate key.
    So a primary key is a candidate key, but a candidate key is not necessarily a primary key.
  Example
    In the CANDY_CUSTOMER table in our candy database, {cust_id} is a candidate key.
    Adding any other attribute to this set yields a super key, but not a candidate key.
    If usernames also must be unique, then {username} is also a candidate key.
19
Functional Dependence
Functional dependence is often illustrated for a table using a dependency diagram.
  This diagram identifies the fields whose values determine the values of other fields.
  Arrows are drawn from the “determinant” fields to the “dependent” fields.
  We’ll see examples of these as we discuss normal forms in more detail.
21
First Normal Form
First normal form (1NF)
  A database table is in 1NF if it meets the following requirements:
    It does not contain any multivalued attributes.
    It does not contain any inappropriately complex attributes.
Solution for converting a database table to 1NF
  Replace multivalued attributes with multiple records.
  Replace complex attributes with atomic attributes.
22
First Normal Form
1NF example
  [Figure: a non-1NF table and the corresponding 1NF table]
  This assumes that an EmpName will never need to be searched, sorted, or formatted according to first/last names.
23
First Normal Form
Functional dependencies of 1NF table

Determinants     Dependents
{Proj#}          {ProjName}
{Proj#, Emp#}    {ProjName, EmpName, JobType, ChgPerHour, Hours}
{Emp#}           {EmpName, JobType, ChgPerHour}
{JobType}        {ChgPerHour}

Candidate key(s): {Proj#, Emp#}
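The candidate key on this slide can be derived mechanically from the listed dependencies. The sketch below uses the standard attribute-closure technique and a brute-force search over attribute subsets; the function names are my own, and the approach is only practical for small tables like this one.

```python
from itertools import combinations

ATTRS = {"Proj#", "ProjName", "Emp#", "EmpName", "JobType", "ChgPerHour", "Hours"}

# (determinant, dependents) pairs as listed on the slide.
FDS = [
    ({"Proj#"}, {"ProjName"}),
    ({"Proj#", "Emp#"}, {"ProjName", "EmpName", "JobType", "ChgPerHour", "Hours"}),
    ({"Emp#"}, {"EmpName", "JobType", "ChgPerHour"}),
    ({"JobType"}, {"ChgPerHour"}),
]


def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure covers every attribute."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) == all_attrs:
                keys.append(subset)
    return keys


keys = candidate_keys(ATTRS, FDS)
```

For these dependencies the search finds a single candidate key, `{Proj#, Emp#}`, matching the slide.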
24
First Normal Form Corresponding dependency diagram
26
Second Normal Form
Second normal form (2NF)
  A database table is in 2NF if it meets the following requirements:
    It is in 1NF, and
    All non-prime attributes depend on all attributes of each candidate key, not on just a part of one (no “partial dependencies”).
  A “non-prime” attribute is one that does not belong to any candidate key of the table.
  As a result, if all of a table’s candidate keys consist of only single attributes, and it is already in 1NF, then it is already in 2NF.
27
Second Normal Form
Non-2NF table (the previously seen table in 1NF)
  Some non-prime attributes depend on only a part of the lone candidate key.
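The partial dependencies in the 1NF project table can be found with a short check. This is a sketch, not the deck's own method: the helper name is hypothetical, and the dependency on the full key is reduced here to `Hours`, the one attribute that needs both key components.

```python
CANDIDATE_KEYS = [{"Proj#", "Emp#"}]

# Dependencies from the 1NF slide; the full-key dependency is trimmed to
# Hours for this illustration (an assumption made to keep the list short).
FDS = [
    ({"Proj#"}, {"ProjName"}),
    ({"Emp#"}, {"EmpName", "JobType", "ChgPerHour"}),
    ({"Proj#", "Emp#"}, {"Hours"}),
]


def partial_dependencies(fds, candidate_keys):
    """Dependencies whose determinant is a proper subset of some candidate
    key and whose dependents include at least one non-prime attribute."""
    prime = set().union(*candidate_keys)
    found = []
    for lhs, rhs in fds:
        non_prime = rhs - prime
        if non_prime and any(lhs < key for key in candidate_keys):
            found.append((lhs, non_prime))
    return found


violations = partial_dependencies(FDS, CANDIDATE_KEYS)
```

Both `{Proj#} -> {ProjName}` and `{Emp#} -> {EmpName, JobType, ChgPerHour}` are reported, which is why the table is not in 2NF.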
28
Second Normal Form
Solution for converting a table to 2NF
  Convert it to 1NF (if it’s not already in 1NF).
  Create a table for each of the functional dependencies that involve only a part of the candidate key.
    Those candidate key components should now be candidate keys in their new tables.
  If an M:M relationship exists between the entities that are now in separate tables, or the relationship has attributes:
    Create a linking table containing each of those candidate key components, as well as any attributes that were originally dependent on the entire candidate key.
  Otherwise, add a foreign key appropriate for the relationship type.
29
Second Normal Form Corresponding 2NF tables
31
Third Normal Form
Third normal form (3NF)
  A database table is in 3NF if it meets the following requirements:
    It is in 2NF, and
    All non-prime attributes depend directly on every candidate key in the table, not on other non-prime attributes (no “transitive dependencies”).
Solution for converting a table to 3NF
  Convert it to 2NF (if it isn’t already in 2NF).
  Create a table for each of the offending functional dependencies and join appropriately to the original table.
32
Third Normal Form
Non-3NF table (from the previously seen 2NF tables)
  Non-prime attributes: EmpName, JobType, ChgPerHour
  Violates 3NF: transitive dependency Emp# -> JobType -> ChgPerHour
    The non-prime attribute ChgPerHour depends on the non-prime attribute JobType.
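The transitive dependency can also be detected programmatically. The sketch below applies a common formulation of the 3NF test (a dependency violates 3NF when its determinant is not a super key and it determines a non-prime attribute) to the employee table; the function names are hypothetical.

```python
ATTRS = {"Emp#", "EmpName", "JobType", "ChgPerHour"}
CANDIDATE_KEYS = [{"Emp#"}]
FDS = [
    ({"Emp#"}, {"EmpName", "JobType", "ChgPerHour"}),
    ({"JobType"}, {"ChgPerHour"}),
]


def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attrs, fds, candidate_keys):
    """Dependencies X -> A where X is not a super key and A is non-prime."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        is_super_key = closure(lhs, fds) >= attrs
        offending = (rhs - lhs) - prime
        if offending and not is_super_key:
            violations.append((lhs, offending))
    return violations


violations = third_nf_violations(ATTRS, FDS, CANDIDATE_KEYS)
```

Only `{JobType} -> {ChgPerHour}` is flagged, so moving JobType and ChgPerHour into their own table removes the violation.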
33
Third Normal Form Corresponding 3NF tables
35
Boyce Codd Normal Form (BCNF)
Boyce Codd normal form (BCNF)
  A database table is in BCNF if it meets the following requirements:
    It is in 3NF, and
    Every determinant in the table is a candidate key.
It’s uncommon for a table to be in 3NF but not BCNF.
  3NF restricts how non-prime attributes may depend on candidate keys, while BCNF also restricts dependencies among candidate key components.
  For a table to be in 3NF but not BCNF, it must contain two or more overlapping composite candidate keys.
  In such a table, a component of one of these candidate keys determines a component of another candidate key.
36
Boyce Codd Normal Form (BCNF)
Non-BCNF table (3NF)

TENNIS COURT BOOKING
Court  Time Slot  Rate Type
1      1          SAVER
1      3          SAVER
1      4          STANDARD
2      1          PREMIUM-B
2      4          PREMIUM-B
2      5          PREMIUM-A

Candidate keys: {Court, Time Slot}, {Rate Type, Time Slot}
Offending functional dependency: {Rate Type} -> {Court}
  SAVER and STANDARD rate types apply to court 1, while PREMIUM-A and PREMIUM-B rate types apply to court 2.
37
Boyce Codd Normal Form (BCNF)
Solution for converting a table to BCNF
  Convert it to 3NF (if it isn’t already in 3NF).
  Create a separate table for the offending functional dependency and join appropriately to the original table.

TENNIS COURT BOOKING
Rate Type  Time Slot
SAVER      1
SAVER      3
STANDARD   4
PREMIUM-B  1
PREMIUM-B  4
PREMIUM-A  5

TENNIS COURT RATES
Court  Rate Type
1      SAVER
1      STANDARD
2      PREMIUM-A
2      PREMIUM-B
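One way to sanity-check this decomposition is to split the booking rows and join them back. The Python sketch below does that; the rows are read from the slide's tables with blank (merged) cells assumed to repeat the value above them, and the variable names are my own.

```python
# (court, time_slot, rate_type) rows of the original TENNIS COURT BOOKING table.
bookings = [
    (1, 1, "SAVER"), (1, 3, "SAVER"), (1, 4, "STANDARD"),
    (2, 1, "PREMIUM-B"), (2, 4, "PREMIUM-B"), (2, 5, "PREMIUM-A"),
]

# Rate Type is a determinant ({Rate Type} -> {Court}) but cannot be a key,
# because the same rate type appears in more than one row.
rate_type_is_key = len({rate for _, _, rate in bookings}) == len(bookings)

# Decompose on the offending dependency.
court_rates = {rate: court for court, _, rate in bookings}      # TENNIS COURT RATES
court_bookings = [(rate, slot) for _, slot, rate in bookings]   # TENNIS COURT BOOKING

# Join the two tables back together on rate type.
rejoined = [(court_rates[rate], slot, rate) for rate, slot in court_bookings]
lossless = rejoined == bookings
```

The rejoined rows equal the original ones, so no booking information is lost, and each court/rate pairing is now stored only once.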
38
Normal Forms
Other normal forms exist as well: 4NF, 5NF, and 6NF.
  The situations they address either rarely occur or are more theoretical.
  1NF through BCNF are usually adequate for practical use.
Table derivations from ER diagrams usually result in a 3NF design.
  Dependency diagrams can be used for revisions and verification.
40
Denormalization
Normalization results in more tables for the same data, which increases processing complexity.
  Complex joins require more processing.
  Disk I/O can also increase for select, insert, update, and delete operations.
If performance is significantly impacted, a database schema may need to be “denormalized”.
  Tables may be combined into fewer tables, or views may be created, to avoid the need for joins and multi-table inserts, updates, and deletes.
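As a hedged illustration of the view option, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and view names are assumptions loosely based on the PART and INVOICE-PART tables of the class exercise, and only two sample rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (part TEXT PRIMARY KEY, price REAL);
    CREATE TABLE invoice_part (
        invoice INTEGER,
        part TEXT REFERENCES part(part),
        quantity INTEGER,
        PRIMARY KEY (invoice, part)
    );
    INSERT INTO part VALUES ('Screw', 0.10), ('Nut', 0.05);
    INSERT INTO invoice_part VALUES (1001, 'Screw', 200), (1001, 'Nut', 300);

    -- The view hides the join from queries that read invoice lines.
    CREATE VIEW invoice_line AS
        SELECT ip.invoice, ip.part, p.price, ip.quantity
        FROM invoice_part AS ip
        JOIN part AS p ON p.part = ip.part;
""")

rows = conn.execute(
    "SELECT invoice, part, price, quantity FROM invoice_line ORDER BY part"
).fetchall()
conn.close()
```

Note that in SQLite a plain view like this is expanded at query time, so it simplifies the query text rather than removing the join work; physically combining the tables trades that work for redundancy and the anomalies discussed earlier.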
41
Class Exercise
Convert the following table into BCNF

INVOICE
Invoice  Customer  Name        Address      Part                Price             Quantity
1001     43        Jim Jones   12 Main St.  Screw, Nut, Washer  0.10, 0.05, 0.05  200, 300, 100
1002     55        John Smith  13 Main St.  Screw, Brace        0.10, 5.00        100, 1
1003     43        Jim Jones   12 Main St.  Saw                 12.00             10
42
Class Exercise
Is the table in 1NF?
  Suppose customer names may be searched and sorted according to first and last names.
  The table contains multivalued attributes (Part, Price, Quantity) as well as an inappropriately complex attribute (Name), so it is not in 1NF.
Solution for converting a database table to 1NF
  Replace multivalued attributes with multiple records.
  Replace complex attributes with atomic attributes.
  It may make sense to add fields to the table’s primary key.
43
Class Exercise

INVOICE (old)
Invoice  Customer  Name        Address      Part                Price             Quantity
1001     43        Jim Jones   12 Main St.  Screw, Nut, Washer  0.10, 0.05, 0.05  200, 300, 100
1002     55        John Smith  13 Main St.  Screw, Brace        0.10, 5.00        100, 1
1003     43        Jim Jones   12 Main St.  Saw                 12.00             10

INVOICE (new)
Invoice  Customer  FName  LName  Address      Part    Price  Quantity
1001     43        Jim    Jones  12 Main St.  Screw   0.10   200
1001     43        Jim    Jones  12 Main St.  Nut     0.05   300
1001     43        Jim    Jones  12 Main St.  Washer  0.05   100
1002     55        John   Smith  13 Main St.  Screw   0.10   100
1002     55        John   Smith  13 Main St.  Brace   5.00   1
1003     43        Jim    Jones  12 Main St.  Saw     12.00  10
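The 1NF conversion on this slide can be sketched in a few lines of Python. The function name `to_1nf` is hypothetical, and the code assumes the multivalued cells are comma-separated strings of equal length and that each name has the form "First Last".

```python
# The original INVOICE rows, with multivalued cells kept as strings.
old_invoice = [
    (1001, 43, "Jim Jones", "12 Main St.", "Screw, Nut, Washer", "0.10, 0.05, 0.05", "200, 300, 100"),
    (1002, 55, "John Smith", "13 Main St.", "Screw, Brace", "0.10, 5.00", "100, 1"),
    (1003, 43, "Jim Jones", "12 Main St.", "Saw", "12.00", "10"),
]


def to_1nf(rows):
    """Emit one row per part and split the name into first and last name."""
    result = []
    for invoice, customer, name, address, parts, prices, quantities in rows:
        fname, lname = name.split(" ", 1)
        triples = zip(parts.split(", "), prices.split(", "), quantities.split(", "))
        for part, price, qty in triples:
            result.append(
                (invoice, customer, fname, lname, address, part, float(price), int(qty))
            )
    return result


new_invoice = to_1nf(old_invoice)
```

The three original records become six, one per invoice and part, which is why {Invoice, Part} becomes the natural key of the new table.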
44
Class Exercise
Is the table in 2NF?
  Identify the table’s functional dependencies.
  Candidate key: {Invoice, Part}
  There are non-prime attributes that depend on only a component of the candidate key, so it is not in 2NF:
    {Invoice} -> {Customer, FName, LName, Address}
    {Part} -> {Price}

INVOICE
Invoice  Customer  FName  LName  Address      Part    Price  Quantity
1001     43        Jim    Jones  12 Main St.  Screw   0.10   200
1001     43        Jim    Jones  12 Main St.  Nut     0.05   300
1001     43        Jim    Jones  12 Main St.  Washer  0.05   100
1002     55        John   Smith  13 Main St.  Screw   0.10   100
1002     55        John   Smith  13 Main St.  Brace   5.00   1
1003     43        Jim    Jones  12 Main St.  Saw     12.00  10
45
Class Exercise
Solution for converting a table to 2NF
  Convert it to 1NF (if it’s not already in 1NF).
  Create a table for each of the functional dependencies that involve only a part of the candidate key.
    Those candidate key components should now be candidate keys in their new tables.
  Create a linking table containing each of those candidate key components, as well as any attributes that were originally dependent on the entire candidate key.
46
Class Exercise

INVOICE (old)
Invoice  Customer  FName  LName  Address      Part    Price  Quantity
1001     43        Jim    Jones  12 Main St.  Screw   0.10   200
1001     43        Jim    Jones  12 Main St.  Nut     0.05   300
1001     43        Jim    Jones  12 Main St.  Washer  0.05   100
1002     55        John   Smith  13 Main St.  Screw   0.10   100
1002     55        John   Smith  13 Main St.  Brace   5.00   1
1003     43        Jim    Jones  12 Main St.  Saw     12.00  10

INVOICE (new)
Invoice  Customer  FName  LName  Address
1001     43        Jim    Jones  12 Main St.
1002     55        John   Smith  13 Main St.
1003     43        Jim    Jones  12 Main St.

PART (new)
Part    Price
Screw   0.10
Nut     0.05
Washer  0.05
Brace   5.00
Saw     12.00

INVOICE-PART (new)
Invoice  Part    Quantity
1001     Screw   200
1001     Nut     300
1001     Washer  100
1002     Screw   100
1002     Brace   1
1003     Saw     10
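The 2NF split amounts to projecting the 1NF table onto the attributes of each dependency and dropping duplicate rows. A minimal Python sketch follows; the helper name `project` and the positional column indexes are my own choices.

```python
# 1NF INVOICE rows:
# (Invoice, Customer, FName, LName, Address, Part, Price, Quantity)
invoice_1nf = [
    (1001, 43, "Jim", "Jones", "12 Main St.", "Screw", 0.10, 200),
    (1001, 43, "Jim", "Jones", "12 Main St.", "Nut", 0.05, 300),
    (1001, 43, "Jim", "Jones", "12 Main St.", "Washer", 0.05, 100),
    (1002, 55, "John", "Smith", "13 Main St.", "Screw", 0.10, 100),
    (1002, 55, "John", "Smith", "13 Main St.", "Brace", 5.00, 1),
    (1003, 43, "Jim", "Jones", "12 Main St.", "Saw", 12.00, 10),
]


def project(rows, indexes):
    """Project rows onto the given column positions, dropping duplicates
    while keeping first-seen order."""
    return list(dict.fromkeys(tuple(row[i] for i in indexes) for row in rows))


invoice = project(invoice_1nf, [0, 1, 2, 3, 4])   # {Invoice} -> customer details
part = project(invoice_1nf, [5, 6])               # {Part} -> {Price}
invoice_part = project(invoice_1nf, [0, 5, 7])    # {Invoice, Part} -> {Quantity}
```

The screw's price is now stored once in `part` instead of once per invoice line.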
47
Class Exercise
Are the tables in 3NF?
  In INVOICE, the non-prime attributes FName, LName, and Address depend on Customer, which is itself a non-prime attribute (transitive dependency Invoice -> Customer -> {FName, LName, Address}), so INVOICE is not in 3NF.
Solution for converting a table to 3NF
  Convert it to 2NF (if it isn’t already in 2NF).
  Create a table for each of the offending functional dependencies and join appropriately to the original table.

INVOICE (new)
Invoice  Customer  FName  LName  Address
1001     43        Jim    Jones  12 Main St.
1002     55        John   Smith  13 Main St.
1003     43        Jim    Jones  12 Main St.

PART (new)
Part    Price
Screw   0.10
Nut     0.05
Washer  0.05
Brace   5.00
Saw     12.00

INVOICE-PART (new)
Invoice  Part    Quantity
1001     Screw   200
1001     Nut     300
1001     Washer  100
1002     Screw   100
1002     Brace   1
1003     Saw     10
48
Class Exercise
3NF Conversion

INVOICE (old)
Invoice  Customer  FName  LName  Address
1001     43        Jim    Jones  12 Main St.
1002     55        John   Smith  13 Main St.
1003     43        Jim    Jones  12 Main St.

PART (okay)
Part    Price
Screw   0.10
Nut     0.05
Washer  0.05
Brace   5.00
Saw     12.00

INVOICE-PART (okay)
Invoice  Part    Quantity
1001     Screw   200
1001     Nut     300
1001     Washer  100
1002     Screw   100
1002     Brace   1
1003     Saw     10

CUSTOMER (new)
Customer  FName  LName  Address
43        Jim    Jones  12 Main St.
55        John   Smith  13 Main St.

INVOICE (new)
Invoice  Customer
1001     43
1002     55
1003     43
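The 3NF step for INVOICE can be sketched as follows. The variable names are illustrative, and the dictionary keyed on the customer number is just one convenient way to drop the repeated customer rows.

```python
# 2NF INVOICE rows: (Invoice, Customer, FName, LName, Address)
invoice_2nf = [
    (1001, 43, "Jim", "Jones", "12 Main St."),
    (1002, 55, "John", "Smith", "13 Main St."),
    (1003, 43, "Jim", "Jones", "12 Main St."),
]

# CUSTOMER keeps one entry per customer number; keying a dict on Customer
# collapses the repeated rows for customer 43.
customer = {cust: (fname, lname, addr) for _, cust, fname, lname, addr in invoice_2nf}

# INVOICE keeps only its key and the foreign key that points at CUSTOMER.
invoice = [(inv, cust) for inv, cust, *_ in invoice_2nf]
```

After the split, a change of address for customer 43 touches a single CUSTOMER row rather than every invoice.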
49
Class Exercise
Are the tables in BCNF?
  Every determinant in all tables is a candidate key, so they are in BCNF.

PART
Part    Price
Screw   0.10
Nut     0.05
Washer  0.05
Brace   5.00
Saw     12.00

INVOICE-PART
Invoice  Part    Quantity
1001     Screw   200
1001     Nut     300
1001     Washer  100
1002     Screw   100
1002     Brace   1
1003     Saw     10

CUSTOMER
Customer  FName  LName  Address
43        Jim    Jones  12 Main St.
55        John   Smith  13 Main St.

INVOICE
Invoice  Customer
1001     43
1002     55
1003     43
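As a final check, the four tables can be joined back into the 1NF rows to confirm that the decomposition lost nothing. This Python sketch stores each table in a simple in-memory structure; the names and the `rebuild` helper are assumptions made for illustration.

```python
part = {"Screw": 0.10, "Nut": 0.05, "Washer": 0.05, "Brace": 5.00, "Saw": 12.00}
customer = {
    43: ("Jim", "Jones", "12 Main St."),
    55: ("John", "Smith", "13 Main St."),
}
invoice = {1001: 43, 1002: 55, 1003: 43}
invoice_part = [
    (1001, "Screw", 200), (1001, "Nut", 300), (1001, "Washer", 100),
    (1002, "Screw", 100), (1002, "Brace", 1), (1003, "Saw", 10),
]


def rebuild():
    """Join INVOICE-PART with INVOICE, CUSTOMER, and PART to recover rows of
    (Invoice, Customer, FName, LName, Address, Part, Price, Quantity)."""
    rows = []
    for inv, part_name, qty in invoice_part:
        cust = invoice[inv]
        fname, lname, addr = customer[cust]
        rows.append((inv, cust, fname, lname, addr, part_name, part[part_name], qty))
    return rows


rebuilt = rebuild()
```

Each lookup follows a key, so every INVOICE-PART row matches exactly one invoice, one customer, and one part.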