DATA NORMALIZATION CS 260 Database Systems. Overview  Introduction  Anomalies  Functional dependence  Normal forms  1NF  2NF  3NF  BCNF  Denormalization.


1 DATA NORMALIZATION CS 260 Database Systems

2 Overview  Introduction  Anomalies  Functional dependence  Normal forms  1NF  2NF  3NF  BCNF  Denormalization

3 Introduction  Database normalization is a process for producing a schema that avoids unnecessary redundancy while still allowing information to be retrieved easily.  It consists primarily of breaking tables into smaller tables to remove redundant data that can lead to anomalies.  The result of the normalization process is a schema that can be described as adhering to a particular “normal form”.

4 Introduction  Normalization requires domain-specific knowledge in order to identify “functional dependencies”.  Some of this knowledge may be expressed in an ER model, but not always.  Normalization is particularly useful for analyzing and fixing an existing (and possibly poorly designed) database schema.  Normalization allows for a design that is free of insertion, update, and deletion anomalies.

5 Overview  Introduction  Anomalies  Functional dependence  Normal forms  1NF  2NF  3NF  BCNF  Denormalization

6 Anomalies  Insertion anomaly  Occurs when inserting a new record causes data to become inconsistent. In the following example, an insertion anomaly occurred when Franklin T. Wong’s employee record was first inserted: his department manager’s SSN was entered incorrectly.  Also occurs when a new record cannot be inserted because required data is missing. In the following example, an insertion anomaly would occur if an attempt was made to insert a record for a new project in the EMPLOYEE-PROJECTS table: the project cannot be inserted if it doesn’t have any associated employees.

7 Anomalies

8  Update anomaly  Occurs when some but not all instances of a data value are updated. In the following example, an update anomaly occurs if Joyce English’s records are updated to accommodate her last name change and the change is applied in the EMPLOYEE table and in one record in the EMPLOYEE-PROJECTS table, but not in the second record in the EMPLOYEE-PROJECTS table. An update anomaly would also occur if an attempt was made to change Project X’s location to “Bellaire” and Project 1’s related data was missed in the update.
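
An update anomaly of this kind can be sketched in a few lines of Python. The rows below are hypothetical stand-ins for the EMPLOYEE-PROJECTS table (the `emp_id` values, project names, and the new surname are assumptions, not data from the slides); the point is only that a repeated value can be changed in one row and missed in another.

```python
# Hypothetical denormalized rows; emp_id values and the new surname are made up.
employee_projects = [
    {"emp_id": "E1", "ename": "Joyce English", "project": "Project X"},
    {"emp_id": "E1", "ename": "Joyce English", "project": "Project Y"},
]

def inconsistent_employees(rows):
    """Return the ids of employees that appear under more than one name."""
    names_by_id = {}
    for row in rows:
        names_by_id.setdefault(row["emp_id"], set()).add(row["ename"])
    return {emp_id for emp_id, names in names_by_id.items() if len(names) > 1}

before = inconsistent_employees(employee_projects)

# A partial update: only the first of the two rows receives the new name.
employee_projects[0]["ename"] = "Joyce Smith"

after = inconsistent_employees(employee_projects)
print(before, after)
```

If the name were stored once, in a separate employee table, there would be no second copy to miss.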

9 Anomalies

10  Deletion anomaly  Occurs when deleting a record to remove one piece of data inadvertently removes other data as well. In the following example, a deletion anomaly would occur if Franklin T. Wong’s records are removed from the EMPLOYEE and EMPLOYEE-PROJECTS tables: the data regarding the “Computerization” and “Reorganization” projects is then gone as well.

11 Anomalies

12 Overview  Introduction  Anomalies  Functional dependence  Normal forms  1NF  2NF  3NF  BCNF  Denormalization

13 Functional Dependence  Functional dependence occurs when the values of one or more attributes (A) in some entity unambiguously determine the values of one or more other attributes (B).  Notation: A -> B.  In other words, if we know the values of all attributes in set A, then we can uniquely identify the values of all attributes in set B.  Each of these sets can consist of one or more attributes.
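
As a rough illustration, the definition can be tested against sample rows: a dependency A -> B is contradicted as soon as two rows agree on A but differ on B. The function name `fd_holds` and the sample rows are assumptions made for this sketch. Note that sample data can only refute a dependency; establishing one still requires domain knowledge.

```python
def fd_holds(rows, determinant, dependent):
    """True if equal determinant values always come with equal dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Illustrative rows (assumed for this sketch).
rows = [
    {"part": "Screw", "price": 0.10, "quantity": 200},
    {"part": "Nut", "price": 0.05, "quantity": 300},
    {"part": "Screw", "price": 0.10, "quantity": 100},
]

part_determines_price = fd_holds(rows, ["part"], ["price"])
part_determines_quantity = fd_holds(rows, ["part"], ["quantity"])
print(part_determines_price, part_determines_quantity)  # True False
```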

14 Functional Dependence  Strategies for determining functional dependence  For each field, ask whether its value can be determined if the values of one or more other fields are known (is the field dependent on one or more other fields?).  For each field, ask whether, if its value is known, the values of any other fields can be determined (is the field a determinant for one or more other fields?).  Group functional dependencies with the same determinant into a single relation.  The following trivial functional dependencies, in which the dependent is a subset of the determinant, can be ignored: {A, B} -> {A}, {A, B} -> {B}, {A, B} -> {A, B}.

15 Functional Dependence  Super Keys  A super key is a set of attributes (possibly consisting of a single attribute) that uniquely identifies a record.  Example: in the CANDY_CUSTOMER table in our candy database, {cust_id} is a super key; {cust_id, cust_name} is also a super key; {cust_id, cust_type} is also a super key. Any combination of attributes in CANDY_CUSTOMER that includes a super key is also a super key.

16 Sample Database (CANDY) CANDY_CUSTOMER CANDY_PURCHASE CANDY_CUST_TYPE CANDY_PRODUCT

17 Functional Dependence  Candidate Keys  A candidate key is a super key with a minimal set of attributes. Unlike a primary key, a table can potentially have more than one candidate key. So a primary key is a candidate key, but a candidate key is not necessarily a primary key.  Example: in the CANDY_CUSTOMER table in our candy database, {cust_id} is a candidate key. Adding any other attribute to this set yields a super key, but not a candidate key. If usernames also must be unique, then {username} is also a candidate key.
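
The difference between a super key and a candidate key can be sketched against sample data. The rows below are hypothetical (only the column names `cust_id`, `cust_name`, `cust_type`, and `username` come from the slides), and the helper names are assumptions. As with any data-driven check, uniqueness in a sample does not prove uniqueness as a business rule.

```python
from itertools import combinations

def is_super_key(rows, attrs):
    """True if no two rows share the same values for attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)

def is_candidate_key(rows, attrs):
    """A super key with no proper, non-empty subset that is also a super key."""
    if not is_super_key(rows, attrs):
        return False
    return not any(
        is_super_key(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical CANDY_CUSTOMER rows.
customers = [
    {"cust_id": 1, "cust_name": "Ann", "cust_type": "R", "username": "ann1"},
    {"cust_id": 2, "cust_name": "Bob", "cust_type": "W", "username": "bob"},
    {"cust_id": 3, "cust_name": "Ann", "cust_type": "W", "username": "ann2"},
]

print(is_super_key(customers, ["cust_id", "cust_name"]))      # True
print(is_candidate_key(customers, ["cust_id", "cust_name"]))  # False: not minimal
print(is_candidate_key(customers, ["cust_id"]))               # True
print(is_candidate_key(customers, ["username"]))              # True
```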

18 Sample Database (CANDY) CANDY_CUSTOMER CANDY_PURCHASE CANDY_CUST_TYPE CANDY_PRODUCT

19 Functional Dependence  Functional dependence is often illustrated for a table using a dependency diagram  This diagram identifies the fields whose values determine the values of other fields  Arrows are drawn from the “determinant” fields to the “dependent” fields  We’ll see examples of these as we discuss normal forms in more detail

20 Overview  Introduction  Anomalies  Functional dependence  Normal forms  1NF  2NF  3NF  BCNF  Denormalization

21 First Normal Form  First normal form (1NF)  A database table is in 1NF if it meets the following requirements: it does not contain any multivalued attributes, and it does not contain any inappropriately complex attributes.  Solution for converting a database table to 1NF: replace multivalued attributes with multiple records, and replace complex attributes with atomic attributes.

22 First Normal Form  1NF example  Non-1NF Table  Corresponding 1NF Table This assumes that an EmpName will never need to be searched, sorted, or formatted according to first/last names

23 First Normal Form  Functional dependencies of 1NF table
  Determinants      Dependents
  {Proj#}        -> {ProjName}
  {Proj#, Emp#}  -> {ProjName, EmpName, JobType, ChgPerHour, Hours}
  {Emp#}         -> {EmpName, JobType, ChgPerHour}
  {JobType}      -> {ChgPerHour}
  Candidate key(s): {Proj#, Emp#}
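
The candidate key stated on this slide can be checked mechanically from the listed dependencies. The sketch below uses the standard attribute-closure technique, which the slides do not describe; the names `closure` and `candidate_keys` are assumptions. It tries attribute sets from smallest to largest and keeps those whose closure covers every attribute.

```python
from itertools import combinations

ATTRS = {"Proj#", "ProjName", "Emp#", "EmpName", "JobType", "ChgPerHour", "Hours"}

# (determinant, dependents) pairs as listed on the slide.
FDS = [
    ({"Proj#"}, {"ProjName"}),
    ({"Proj#", "Emp#"}, {"ProjName", "EmpName", "JobType", "ChgPerHour", "Hours"}),
    ({"Emp#"}, {"EmpName", "JobType", "ChgPerHour"}),
    ({"JobType"}, {"ChgPerHour"}),
]

def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is the whole table."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            combo = set(combo)
            if any(key <= combo for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == all_attrs:
                keys.append(combo)
    return keys

keys = candidate_keys(ATTRS, FDS)
print(keys)
```

The exhaustive search is exponential in the number of attributes, which is acceptable for a table this small.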

24 First Normal Form  Corresponding dependency diagram

25 Overview  Introduction  Anomalies  Functional dependence  Normal forms  1NF  2NF  3NF  BCNF  Denormalization

26 Second Normal Form  Second normal form (2NF)  A database table is in 2NF if it meets the following requirements: it is in 1NF, and all non-prime attributes depend on all attributes of each candidate key (no “partial dependencies”).  A “non-prime” attribute is one that does not belong to any candidate key of the table.  As a result, if all of a table’s candidate keys consist of only single attributes, and it is already in 1NF, then it is already in 2NF.

27 Second Normal Form  Non-2NF table (previously seen table in 1NF) Non-prime attributes dependent on only a part of the lone candidate key
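
A simplified check for partial dependencies is sketched below, using the dependencies and candidate key given for the 1NF table. The function name is an assumption, and the check only inspects the dependencies it is handed; it does not derive implied ones.

```python
CANDIDATE_KEYS = [{"Proj#", "Emp#"}]

FDS = [
    ({"Proj#"}, {"ProjName"}),
    ({"Proj#", "Emp#"}, {"ProjName", "EmpName", "JobType", "ChgPerHour", "Hours"}),
    ({"Emp#"}, {"EmpName", "JobType", "ChgPerHour"}),
    ({"JobType"}, {"ChgPerHour"}),
]

def partial_dependencies(fds, candidate_keys):
    """Dependencies of non-prime attributes on a proper subset of a candidate key."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        non_prime = rhs - prime
        # `lhs < key` is Python's proper-subset test for sets.
        if non_prime and any(lhs < key for key in candidate_keys):
            violations.append((lhs, non_prime))
    return violations

violations = partial_dependencies(FDS, CANDIDATE_KEYS)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```

{JobType} -> {ChgPerHour} is not reported here because JobType is not part of the candidate key; that dependency is a 3NF concern rather than a 2NF one.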

28 Second Normal Form  Solution for converting a table to 2NF  Convert it to 1NF (if it’s not already in 1NF).  Create a table for each of the functional dependencies that involved only a part of the candidate key. Those candidate key components should now be candidate keys in their new tables.  If an M:M relationship exists between the entities that are now in separate tables, or the relationship has attributes, create a linking table containing each of those candidate key components, as well as any attributes that were originally dependent on the entire candidate key.  Otherwise, add a foreign key appropriate for the relationship type.

29 Second Normal Form  Corresponding 2NF tables

30 Overview  Introduction  Anomalies  Functional dependence  Normal forms  1NF  2NF  3NF  BCNF  Denormalization

31 Third Normal Form  Third normal form (3NF)  A database table is in 3NF if it meets the following requirements: it is in 2NF, and every non-prime attribute depends directly on the candidate keys and not on any other non-prime attribute (no “transitive dependencies”).  Solution for converting a table to 3NF: convert it to 2NF (if it isn’t already in 2NF), then create a table for each of the offending functional dependencies and join appropriately to the original table.

32 Third Normal Form  Non-3NF table (previously seen tables in 2NF)  Violates 3NF: transitive dependency Emp# -> JobType -> ChgPerHour. The non-prime attribute ChgPerHour depends on the non-prime attribute JobType.  Non-prime attributes: EmpName, JobType, ChgPerHour.
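
The violation on this slide can be reproduced with a small check. It applies the usual textbook test (a dependency is acceptable if its determinant is a super key or its dependents are prime), which is an equivalent restatement rather than the wording used in the slides. The function name and the assumption that the caller supplies the complete list of candidate keys are both part of this sketch.

```python
# The employee table from the 2NF step: candidate key {Emp#}.
CANDIDATE_KEYS = [{"Emp#"}]

FDS = [
    ({"Emp#"}, {"EmpName", "JobType", "ChgPerHour"}),
    ({"JobType"}, {"ChgPerHour"}),
]

def third_nf_violations(fds, candidate_keys):
    """Dependencies whose determinant is not a super key and whose dependents
    include non-prime attributes. Assumes candidate_keys is complete."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        # A set is a super key exactly when it contains some candidate key.
        is_super_key = any(key <= lhs for key in candidate_keys)
        non_prime = rhs - lhs - prime
        if not is_super_key and non_prime:
            violations.append((lhs, non_prime))
    return violations

violations = third_nf_violations(FDS, CANDIDATE_KEYS)
print(violations)  # [({'JobType'}, {'ChgPerHour'})]
```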

33 Third Normal Form  Corresponding 3NF tables

34 Overview  Introduction  Anomalies  Functional dependence  Normal forms  1NF  2NF  3NF  BCNF  Denormalization

35 Boyce-Codd Normal Form (BCNF)  A database table is in BCNF if it meets the following requirements: it is in 3NF, and every determinant in the table is a candidate key.  It’s uncommon for a table to be in 3NF but not BCNF. 3NF adds restrictions between non-prime attributes and candidate keys, while BCNF adds restrictions between candidate key components. For a table to be in 3NF but not BCNF, it must contain two or more overlapping composite candidate keys, and a component of one of these candidate keys must determine a component of another candidate key.

36 Boyce-Codd Normal Form (BCNF)  Non-BCNF table (3NF)
  TENNIS COURT BOOKING
  Court  Time Slot  Rate Type
  1      1          SAVER
  1      3          SAVER
  1      4          STANDARD
  2      1          PREMIUM-B
  2      4          PREMIUM-B
  2      5          PREMIUM-A
  Candidate keys: {Court, Time Slot}, {Rate Type, Time Slot}
  Offending functional dependency: {Rate Type} -> {Court}
  SAVER and STANDARD rate types apply to court 1, while PREMIUM-A and PREMIUM-B rate types apply to court 2.

37 Boyce-Codd Normal Form (BCNF)  Solution for converting a table to BCNF  Convert it to 3NF (if it isn’t already in 3NF).  Create a separate table for the offending functional dependency and join appropriately to the original table.
  TENNIS COURT BOOKING
  Rate Type  Time Slot
  SAVER      1
  SAVER      3
  STANDARD   4
  PREMIUM-B  1
  PREMIUM-B  4
  PREMIUM-A  5
  TENNIS COURT RATES
  Court  Rate Type
  1      SAVER
  1      STANDARD
  2      PREMIUM-A
  2      PREMIUM-B
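
One way to convince yourself that this decomposition loses nothing is to project the booking rows into the two new tables and join them back. The sketch below does that in plain Python. The rows are read from the slide's tables, with rate types filled in where cells appear to have been merged, which is an assumption; the variable names are also assumptions.

```python
# (court, time_slot, rate_type) rows of the original booking table.
bookings = [
    (1, 1, "SAVER"), (1, 3, "SAVER"), (1, 4, "STANDARD"),
    (2, 1, "PREMIUM-B"), (2, 4, "PREMIUM-B"), (2, 5, "PREMIUM-A"),
]

# Projections corresponding to the two BCNF tables.
court_rates = {(court, rate) for court, _, rate in bookings}
rate_bookings = {(rate, slot) for _, slot, rate in bookings}

# {Rate Type} -> {Court} holds if each rate type maps to exactly one court.
rate_types = {rate for _, rate in court_rates}
fd_holds = len(rate_types) == len(court_rates)

# Natural join of the two projections on rate type.
rejoined = {
    (court, slot, rate)
    for rate, slot in rate_bookings
    for court, court_rate in court_rates
    if court_rate == rate
}

lossless = rejoined == set(bookings)
print(fd_holds, lossless)  # True True
```

The join reproduces the original rows exactly because the shared column, rate type, is a key of the rates table.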

38 Normal Forms  Other normal forms exist as well: 4NF, 5NF, and 6NF.  These either rarely occur or are more theoretical.  1NF through BCNF are adequate for practical use.  Table derivations from ER diagrams usually result in a 3NF design.  Dependency diagrams can be used for revisions and verification.

39 Overview  Introduction  Anomalies  Functional dependence  Normal forms  1NF  2NF  3NF  BCNF  Denormalization

40 Denormalization  Normalization results in more tables for the same data, which increases processing complexity.  Complex joins require more processing.  Select, insert, update, and delete operations may also incur increased disk I/O.  If performance is significantly impacted, a database schema may need to be “denormalized”.  Tables may be combined into fewer tables, or views may be created, so that queries do not need joins and changes do not need multi-table inserts, updates, and deletes.
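
The view option can be sketched with Python's built-in `sqlite3` module. The table and view names (`part`, `invoice_part`, `invoice_line`) and the rows are assumptions chosen for illustration. Keep in mind that an ordinary view only hides the join from whoever writes the query; the database still performs it when the view is read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part (part TEXT PRIMARY KEY, price REAL);
CREATE TABLE invoice_part (
    invoice INTEGER,
    part TEXT REFERENCES part(part),
    quantity INTEGER,
    PRIMARY KEY (invoice, part)
);
INSERT INTO part VALUES ('Screw', 0.10), ('Nut', 0.05);
INSERT INTO invoice_part VALUES (1001, 'Screw', 200), (1001, 'Nut', 300);

-- The view presents the two normalized tables as one wide table.
CREATE VIEW invoice_line AS
    SELECT ip.invoice, ip.part, p.price, ip.quantity
    FROM invoice_part AS ip
    JOIN part AS p ON p.part = ip.part;
""")

# Queries against the view need no explicit join.
rows = conn.execute(
    "SELECT invoice, part, price, quantity FROM invoice_line ORDER BY part"
).fetchall()
print(rows)
conn.close()
```

Physically combining the tables instead would remove the join at query time, at the cost of reintroducing the redundancy that normalization removed.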

41 Class Exercise  Convert the following table into BCNF
  INVOICE
  Invoice  Customer  Name        Address      Part                Price             Quantity
  1001     43        Jim Jones   12 Main St.  Screw, Nut, Washer  0.10, 0.05, 0.05  200, 300, 100
  1002     55        John Smith  13 Main St.  Screw, Brace        0.10, 5.00        100, 1
  1003     43        Jim Jones   12 Main St.  Saw                 12.00             10

42 Class Exercise  Is the table in 1NF?  Suppose customer names may be searched and sorted according to first and last names.  The table contains multivalued attributes as well as inappropriately complex attributes, so it is not in 1NF.  Solution for converting a database table to 1NF: replace multivalued attributes with multiple records, and replace complex attributes with atomic attributes. It may make sense to add fields to the table’s primary key.

43 Class Exercise
  INVOICE (old)
  Invoice  Customer  Name        Address      Part                Price             Quantity
  1001     43        Jim Jones   12 Main St.  Screw, Nut, Washer  0.10, 0.05, 0.05  200, 300, 100
  1002     55        John Smith  13 Main St.  Screw, Brace        0.10, 5.00        100, 1
  1003     43        Jim Jones   12 Main St.  Saw                 12.00             10
  INVOICE (new)
  Invoice  Customer  FName  LName  Address      Part    Price  Quantity
  1001     43        Jim    Jones  12 Main St.  Screw   0.10   200
  1001     43        Jim    Jones  12 Main St.  Nut     0.05   300
  1001     43        Jim    Jones  12 Main St.  Washer  0.05   100
  1002     55        John   Smith  13 Main St.  Screw   0.10   100
  1002     55        John   Smith  13 Main St.  Brace   5.00   1
  1003     43        Jim    Jones  12 Main St.  Saw     12.00  10
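
The 1NF conversion on this slide can be sketched as a small Python function that turns each multivalued row into one row per part and splits the name into first and last names. The function name `to_1nf` is an assumption, and splitting the name on the first space is a simplification that happens to work for this sample data.

```python
invoice_old = [
    {"Invoice": 1001, "Customer": 43, "Name": "Jim Jones", "Address": "12 Main St.",
     "Part": "Screw, Nut, Washer", "Price": "0.10, 0.05, 0.05", "Quantity": "200, 300, 100"},
    {"Invoice": 1002, "Customer": 55, "Name": "John Smith", "Address": "13 Main St.",
     "Part": "Screw, Brace", "Price": "0.10, 5.00", "Quantity": "100, 1"},
    {"Invoice": 1003, "Customer": 43, "Name": "Jim Jones", "Address": "12 Main St.",
     "Part": "Saw", "Price": "12.00", "Quantity": "10"},
]

def to_1nf(rows):
    """Emit one record per part and replace Name with FName and LName."""
    flat = []
    for row in rows:
        fname, lname = row["Name"].split(" ", 1)  # naive split, fine for this data
        parts = row["Part"].split(", ")
        prices = row["Price"].split(", ")
        quantities = row["Quantity"].split(", ")
        for part, price, quantity in zip(parts, prices, quantities):
            flat.append({
                "Invoice": row["Invoice"], "Customer": row["Customer"],
                "FName": fname, "LName": lname, "Address": row["Address"],
                "Part": part, "Price": float(price), "Quantity": int(quantity),
            })
    return flat

invoice_new = to_1nf(invoice_old)
print(len(invoice_new))  # 6
```

After the conversion, Invoice alone no longer identifies a row, which is why the slide suggests extending the primary key; here {Invoice, Part} is unique.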

44 Class Exercise  Is the table in 2NF?  Identify the table’s functional dependencies.  There are non-prime attributes that depend on components of candidate keys, so it is not in 2NF.
  {Invoice} -> {Customer, FName, LName, Address}
  {Part} -> {Price}
  INVOICE
  Invoice  Customer  FName  LName  Address      Part    Price  Quantity
  1001     43        Jim    Jones  12 Main St.  Screw   0.10   200
  1001     43        Jim    Jones  12 Main St.  Nut     0.05   300
  1001     43        Jim    Jones  12 Main St.  Washer  0.05   100
  1002     55        John   Smith  13 Main St.  Screw   0.10   100
  1002     55        John   Smith  13 Main St.  Brace   5.00   1
  1003     43        Jim    Jones  12 Main St.  Saw     12.00  10

45 Class Exercise  Solution for converting a table to 2NF  Convert it to 1NF (if it’s not already in 1NF).  Create a table for each of the functional dependencies that involved only a part of the candidate key. Those candidate key components should now be candidate keys in their new tables.  Create a linking table containing each of those candidate key components, as well as any attributes that were originally dependent on the entire candidate key.

46 Class Exercise
  INVOICE (old)
  Invoice  Customer  FName  LName  Address      Part    Price  Quantity
  1001     43        Jim    Jones  12 Main St.  Screw   0.10   200
  1001     43        Jim    Jones  12 Main St.  Nut     0.05   300
  1001     43        Jim    Jones  12 Main St.  Washer  0.05   100
  1002     55        John   Smith  13 Main St.  Screw   0.10   100
  1002     55        John   Smith  13 Main St.  Brace   5.00   1
  1003     43        Jim    Jones  12 Main St.  Saw     12.00  10
  INVOICE (new)
  Invoice  Customer  FName  LName  Address
  1001     43        Jim    Jones  12 Main St.
  1002     55        John   Smith  13 Main St.
  1003     43        Jim    Jones  12 Main St.
  PART (new)
  Part    Price
  Screw   0.10
  Nut     0.05
  Washer  0.05
  Brace   5.00
  Saw     12.00
  INVOICE-PART (new)
  Invoice  Part    Quantity
  1001     Screw   200
  1001     Nut     300
  1001     Washer  100
  1002     Screw   100
  1002     Brace   1
  1003     Saw     10

47 Class Exercise  Are the tables in 3NF?  In INVOICE, the non-prime attributes FName, LName, and Address depend on Customer, which is itself a non-prime attribute, rather than directly on the candidate key {Invoice}, so INVOICE is not in 3NF.  Solution for converting a table to 3NF: convert it to 2NF (if it isn’t already in 2NF), then create a table for each of the offending functional dependencies and join appropriately to the original table.
  INVOICE (new)
  Invoice  Customer  FName  LName  Address
  1001     43        Jim    Jones  12 Main St.
  1002     55        John   Smith  13 Main St.
  1003     43        Jim    Jones  12 Main St.
  PART (new)
  Part    Price
  Screw   0.10
  Nut     0.05
  Washer  0.05
  Brace   5.00
  Saw     12.00
  INVOICE-PART (new)
  Invoice  Part    Quantity
  1001     Screw   200
  1001     Nut     300
  1001     Washer  100
  1002     Screw   100
  1002     Brace   1
  1003     Saw     10

48 Class Exercise  3NF Conversion
  INVOICE (old)
  Invoice  Customer  FName  LName  Address
  1001     43        Jim    Jones  12 Main St.
  1002     55        John   Smith  13 Main St.
  1003     43        Jim    Jones  12 Main St.
  INVOICE (new)
  Invoice  Customer
  1001     43
  1002     55
  1003     43
  CUSTOMER (new)
  Customer  FName  LName  Address
  43        Jim    Jones  12 Main St.
  55        John   Smith  13 Main St.
  PART (okay)
  Part    Price
  Screw   0.10
  Nut     0.05
  Washer  0.05
  Brace   5.00
  Saw     12.00
  INVOICE-PART (okay)
  Invoice  Part    Quantity
  1001     Screw   200
  1001     Nut     300
  1001     Washer  100
  1002     Screw   100
  1002     Brace   1
  1003     Saw     10

49 Class Exercise  Are the tables in BCNF?  Every determinant in all tables is a candidate key, so they are in BCNF.
  INVOICE
  Invoice  Customer
  1001     43
  1002     55
  1003     43
  CUSTOMER
  Customer  FName  LName  Address
  43        Jim    Jones  12 Main St.
  55        John   Smith  13 Main St.
  PART
  Part    Price
  Screw   0.10
  Nut     0.05
  Washer  0.05
  Brace   5.00
  Saw     12.00
  INVOICE-PART
  Invoice  Part    Quantity
  1001     Screw   200
  1001     Nut     300
  1001     Washer  100
  1002     Screw   100
  1002     Brace   1
  1003     Saw     10

