Chapter 6 Normalization
6-2 In This Chapter You Will Learn:
- How tables that contain redundant data can suffer from update anomalies, which can introduce data inconsistencies into a database.
- The rules associated with the most commonly used normal forms, namely first (1NF), second (2NF), and third (3NF) normal forms.
- How tables that break the rules of 1NF, 2NF, or 3NF are likely to contain redundant data and suffer from update anomalies.
- How to restructure tables that break the rules of 1NF, 2NF, or 3NF.
6-3 Normalization
- A technique for producing a suitable set of tables given the data requirements of an enterprise.
- Developed by E.F. Codd (1972).
- For a given table, it is often performed as a series of tests to determine whether the rules of a given normal form are satisfied or violated.
- Normalization leads to a normalized table structure.
6-4 The Purpose of Normalization
To identify a suitable set of tables that supports the data requirements of an organization. A suitable set of tables has the following properties:
- A minimal number of attributes necessary to support the data requirements.
- Attributes with a close, logical relationship are placed in the same table.
- Minimal data redundancy: each attribute is represented only once, with the important exception of attributes that form part or all of a foreign key (FK).
6-5 How Normalization Supports DB Design © Pearson Education Limited 1995, 2005
6-6 Normalization
- The most commonly used normal forms are first (1NF), second (2NF), and third (3NF) normal forms.
- All these normal forms are based on rules about relationships among the columns of a table.
- A table can be normalized to prevent the possible occurrence of update anomalies.
Note: In general, the IT industry considers normalization to 3NF an acceptable level for removing data redundancy. Higher normalization levels are not widely used.
6-7 Data Redundancy and Update Anomalies
- A major aim of normalization is to minimize data redundancy by grouping related data columns into a single table. This reduces the file storage space required by base tables.
- The problems associated with data redundancy are illustrated by comparing the StaffBranch table with the Staff and Branch tables, which store information about staff and branches.
6-8 Data Redundancy – StaffBranch Table
Questions: 1. Is there any redundant data? 2. What is the primary key?
[Table: StaffBranch]
6-9 Data Redundancy – Staff and Branch Tables
Questions: 1. Is there any redundant data? 2. What is the primary key?
[Tables: Staff and Branch]
6-10 Data Redundancy and Update Anomalies
- The StaffBranch table has redundant data: the details of a branch are repeated for every member of staff.
- In contrast, the branch information appears only once for each branch in the Branch table, and only the branch number (branchNo) is repeated in the Staff table, to represent the location each staff member works at.
6-11 Data Redundancy and Update Anomalies
Tables that contain redundant information may suffer from update anomalies. Types of update anomalies:
- Insertion anomalies
- Deletion anomalies
- Modification anomalies
6-12 Insertion Anomalies
1. To insert a new member of staff located at branch B003, we must also enter the correct details of branch B003, so that the branch details are consistent with the values for branch B003 in other records of the StaffBranch table.
2. To insert a new branch that currently has no members of staff into the StaffBranch table, it is necessary to enter nulls into the staff-related columns, such as staffNo. However, because staffNo is the PK of the StaffBranch table, this is not allowed.
6-13 A Design Without Insertion Anomalies
1. To insert a new member of staff located at branch B003, …
2. To insert a new branch that currently has no members of staff, …
No problem with the new design!
6-14 Deletion Anomalies
If we delete a record from the StaffBranch table that represents the last member of staff located at a branch, the details of that branch are also lost from the database.
Example: if we delete the record for staff S0415, …
6-15 A Design Without Deletion Anomalies
If we delete the record for staff S0415, …
No problem with the new design!
6-16 Modification Anomalies
If we want to change the value of one of the columns of a particular branch in the StaffBranch table, for example the telephone number for branch B001, we must update the records of all staff located at that branch. If this modification is not carried out on all the appropriate records of the StaffBranch table, the database will become inconsistent.
6-17 A Design Without Modification Anomalies
If we want to change the value of the telephone number for branch B001, …
No problem with the new design!
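The modification anomaly and its removal can be sketched in a few lines of Python. This is only an illustration: the rows, names, and telephone numbers are invented, and the helper `tel_numbers` is a hypothetical name, not part of the course material.

```python
# Hypothetical sample rows; all values are invented for illustration.
staff_branch = [
    {"staffNo": "S1", "name": "Ann", "branchNo": "B001", "telNo": "555-0100"},
    {"staffNo": "S2", "name": "Bob", "branchNo": "B001", "telNo": "555-0100"},
    {"staffNo": "S3", "name": "Cho", "branchNo": "B003", "telNo": "555-0300"},
]

def tel_numbers(rows, branch_no):
    """Return the set of distinct telephone numbers recorded for one branch."""
    return {r["telNo"] for r in rows if r["branchNo"] == branch_no}

# Incomplete modification: only the first matching record is updated.
for row in staff_branch:
    if row["branchNo"] == "B001":
        row["telNo"] = "555-0199"
        break

# The same branch now has two different telephone numbers.
inconsistent = len(tel_numbers(staff_branch, "B001")) > 1

# Normalized design: the number is stored once, so a single update suffices.
branch = {"B001": {"telNo": "555-0100"}, "B003": {"telNo": "555-0300"}}
branch["B001"]["telNo"] = "555-0199"
```

In the single-table version, correctness depends on every update touching every copy. In the split version there is only one copy to change.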
6-18 Functional Dependency
A functional dependency (FD) describes the relationship between the columns of a table. Assume that A and B are columns of table R. B is functionally dependent on A (denoted A → B) if each value of A in R is associated with exactly one value of B in R at any moment in time.
For example, in the StaffBranch table:
- staffNo → branchNo holds.
- branchNo → staffNo does not hold.
6-19 Functional Dependency
The determinant of a functional dependency is the column or group of columns on the left-hand side of the arrow.
Diagrammatic representation, example: branchNo → branchAddress
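The definition of A → B can be tested against sample data with a short check. The sketch below is illustrative only: `holds` is a hypothetical helper and the rows are invented. Note that an FD is a statement about all valid states of a table, so sample data can refute a dependency but cannot prove it.

```python
def holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is not violated by the given rows.

    rows: list of dicts; lhs, rhs: tuples of column names.
    """
    seen = {}
    for r in rows:
        key = tuple(r[c] for c in lhs)
        val = tuple(r[c] for c in rhs)
        # The first value seen for a key is remembered; any later
        # different value for the same key violates the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical rows in the shape of the StaffBranch table.
rows = [
    {"staffNo": "S1", "branchNo": "B001"},
    {"staffNo": "S2", "branchNo": "B001"},
    {"staffNo": "S3", "branchNo": "B003"},
]

staff_determines_branch = holds(rows, ("staffNo",), ("branchNo",))
branch_determines_staff = holds(rows, ("branchNo",), ("staffNo",))
```

With these rows the first check succeeds and the second fails, because branch B001 is associated with two different staff numbers.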
6-20 Example - Functional Dependency
6-21 The Process of Normalization
- A formal technique for analyzing a relation (table) based on its PK and the FDs between its columns.
- Normalization consists of a series of rules that must be applied to convert an unnormalized structure into a normalized structure.
- The process is described as a series of steps that lead to "higher" levels of normalization. These levels are called normal forms.
- As normalization proceeds step by step, the relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
6-23 First Normal Form (1NF)
A table is in 1NF if the intersection of each record and each column contains exactly one value.
6-24 The Following Table Is Not In 1NF Branch Table
6-25 Converting Branch Table to 1NF (Method 1)
- Place the multi-valued column(s), along with a copy of the original key column(s), into a separate table.
- Remove the multi-valued column(s) from the original table.
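Method 1 can be sketched as a small function. The following is a minimal illustration, assuming a single key column and a multi-valued column held as a Python list. The function name `split_multivalued`, the column name `telNos`, and all data values are assumptions made for the example.

```python
def split_multivalued(rows, key, multi):
    """Split a multi-valued column into a separate table (Method 1).

    Returns (base, detail): base is the original table without the
    multi-valued column; detail has one row per (key, value) pair.
    """
    base = [{c: v for c, v in r.items() if c != multi} for r in rows]
    detail = [{key: r[key], multi: v} for r in rows for v in r[multi]]
    return base, detail

# Hypothetical Branch table that is not in 1NF: telNos holds several values.
branch = [
    {"branchNo": "B001", "branchAddress": "1 Main St",
     "telNos": ["555-0100", "555-0101"]},
    {"branchNo": "B003", "branchAddress": "3 High St",
     "telNos": ["555-0300"]},
]

branch_1nf, branch_tel = split_multivalued(branch, "branchNo", "telNos")
```

After the split, every cell in both tables holds a single value, and the copied key column in the new table links each telephone number back to its branch.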
6-26 Converting to 1NF: Method 1
6-27 Converting to 1NF: Method 2
[Figure callouts: copy, new record]
6-28 Second Normal Form (2NF)
- Relevant only to tables with composite primary keys.
- Based on the concept of full functional dependency.
- Full functional dependency: if A and B are columns of a table, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A.
- If B is dependent on a proper subset of A, this is referred to as a partial dependency (PD).
6-29 Second Normal Form (2NF)
A table is in 2NF if it is in 1NF and every non-key column is fully functionally dependent on the PK.
A table in 1NF will be in 2NF if any one of the following applies:
- The PK is composed of only one column.
- No non-key columns exist in the table.
- Every non-key column is dependent on all of the columns of the PK.
6-30 TempStaffAllocation Table Is Not In 2NF
- branchNo → branchAddress (a PD; branchNo is part of the PK)
- staffNo → name, position (a PD; staffNo is part of the PK)
- staffNo, branchNo → hoursPerWeek
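Partial dependencies such as those listed above can be searched for mechanically by testing each proper subset of the PK. The sketch below is illustrative: `holds` and `partial_dependencies` are hypothetical helpers, the rows are invented, and a dependency found in sample data is only a candidate that must still be confirmed against the business rules.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is not violated by the given rows."""
    seen = {}
    for r in rows:
        key = tuple(r[c] for c in lhs)
        val = tuple(r[c] for c in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def partial_dependencies(rows, pk, non_key):
    """List (subset, column) pairs where column depends on a proper subset of pk."""
    found = []
    for size in range(1, len(pk)):          # proper, non-empty subsets only
        for subset in combinations(pk, size):
            for col in non_key:
                if holds(rows, subset, (col,)):
                    found.append((subset, col))
    return found

# Hypothetical rows in the shape of the TempStaffAllocation table.
rows = [
    {"staffNo": "S1", "branchNo": "B001", "branchAddress": "1 Main St",
     "name": "Ann", "position": "Assistant", "hoursPerWeek": 16},
    {"staffNo": "S1", "branchNo": "B003", "branchAddress": "3 High St",
     "name": "Ann", "position": "Assistant", "hoursPerWeek": 9},
    {"staffNo": "S2", "branchNo": "B001", "branchAddress": "1 Main St",
     "name": "Bob", "position": "Supervisor", "hoursPerWeek": 10},
]

pds = partial_dependencies(
    rows,
    pk=("staffNo", "branchNo"),
    non_key=("branchAddress", "name", "position", "hoursPerWeek"),
)
```

For these rows the function reports that name and position depend on staffNo alone and branchAddress depends on branchNo alone, while hoursPerWeek needs the whole key.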
6-31 Converting to Second Normal Form
For each group of partial dependencies:
1. Determine which non-key columns are not dependent upon the table's entire PK.
2. Remove those columns from the base table.
3. Create a new table with those columns and the partial PK columns that they depend upon.
4. Create a FK in the original base table, which links to the PK of the new table.
6-32 Converting To 2NF
Using:
- branchNo → branchAddress
- staffNo → name, position
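The steps above amount to projecting the table onto several column groups and discarding duplicate rows. A minimal sketch follows, assuming invented rows shaped like the TempStaffAllocation table. The helper `project` and the result names `staff`, `branch`, and `allocation` are hypothetical.

```python
def project(rows, columns):
    """Project rows onto columns, dropping duplicate rows (order preserved)."""
    seen = set()
    out = []
    for r in rows:
        t = tuple(r[c] for c in columns)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(columns, t)))
    return out

# Hypothetical rows; the PK is (staffNo, branchNo).
rows = [
    {"staffNo": "S1", "branchNo": "B001", "branchAddress": "1 Main St",
     "name": "Ann", "position": "Assistant", "hoursPerWeek": 16},
    {"staffNo": "S1", "branchNo": "B003", "branchAddress": "3 High St",
     "name": "Ann", "position": "Assistant", "hoursPerWeek": 9},
    {"staffNo": "S2", "branchNo": "B001", "branchAddress": "1 Main St",
     "name": "Bob", "position": "Supervisor", "hoursPerWeek": 10},
]

# One new table per group of partial dependencies, keyed by the partial PK.
staff = project(rows, ("staffNo", "name", "position"))
branch = project(rows, ("branchNo", "branchAddress"))
# The base table keeps the full PK (now also two FKs) and the fully dependent column.
allocation = project(rows, ("staffNo", "branchNo", "hoursPerWeek"))
```

Each staff member and each branch is now described once, and the allocation table refers to them through staffNo and branchNo.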
6-33 Third Normal Form (3NF)
- Based on the concept of transitive dependency (TD).
- If A → B and B → C, then C is transitively dependent on A through B.
6-34 Third Normal Form (3NF)
- A table is in 3NF if it is in 1NF and 2NF, and no non-key column is transitively dependent on the PK.
- Equivalently, every non-key column depends directly on the PK, and not on another non-key column.
6-35 Converting to 3NF
For each group of transitive dependencies, remove any columns that depend upon another non-key column:
1. Determine which columns depend upon another non-key column(s).
2. Remove those columns from the base table.
3. Create a new table with those columns and the non-key column(s) that they depend upon.
4. Create a foreign key in the original table, which links to the PK of the new table.
6-36 StaffBranch Table Is Not In 3NF
- staffNo → name, position, salary, branchNo, branchAddress, telNo
- branchNo → branchAddress, telNo (a group of transitive dependencies)
6-37 Converting To 3NF
Using: branchNo → branchAddress, telNo
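One way to sanity-check a 3NF decomposition is to split the table and then rejoin the parts on the foreign key, confirming that no information was lost. The sketch below assumes invented rows shaped like a reduced StaffBranch table. The helper `project` and the variable names are hypothetical.

```python
def project(rows, columns):
    """Project rows onto columns, dropping duplicate rows (order preserved)."""
    seen = set()
    out = []
    for r in rows:
        t = tuple(r[c] for c in columns)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(columns, t)))
    return out

# Hypothetical rows; branchAddress and telNo depend on branchNo, not on staffNo directly.
staff_branch = [
    {"staffNo": "S1", "name": "Ann", "branchNo": "B001",
     "branchAddress": "1 Main St", "telNo": "555-0100"},
    {"staffNo": "S2", "name": "Bob", "branchNo": "B001",
     "branchAddress": "1 Main St", "telNo": "555-0100"},
    {"staffNo": "S3", "name": "Cho", "branchNo": "B003",
     "branchAddress": "3 High St", "telNo": "555-0300"},
]

# Steps 2 and 3: move the transitively dependent columns to a new table.
staff = project(staff_branch, ("staffNo", "name", "branchNo"))
branch = project(staff_branch, ("branchNo", "branchAddress", "telNo"))

# Step 4: branchNo in staff acts as the FK; rejoining on it rebuilds the original.
branch_by_no = {b["branchNo"]: b for b in branch}
rejoined = [{**s, **branch_by_no[s["branchNo"]]} for s in staff]
lossless = rejoined == staff_branch
```

The branch details are now stored in two rows instead of three, and the rejoin reproduces every original record.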
6-38 Example 1 – Normalization Property Rental Report
6-39 Example 1 – UNF to 1NF
[Figure callouts: copy, new record]
6-40 Example 1 – Define Primary Key
Candidate keys to consider:
- (Customer_No, RentStart)?
- (Customer_No, RentFinish)? Note: RentFinish could contain NULL values.
- (Property_No, RentStart)?
- (Property_No, RentFinish)?
- (Customer_No, Property_No)?
Any assumption? A customer does not rent the same property twice.
6-41 Example 1 – FDs for Customer_Rental
FDs:
- Customer_No, Property_No → CName, PAddress, RentStart, RentFinish, Rent, Owner_No, OName (primary key)
- Customer_No → CName
- Property_No → PAddress, Rent, Owner_No, OName
- Owner_No → OName
6-42 Example 1 – Converting Customer_Rental to 2NF
Remove the partial dependencies.
[Figure: Customer_Rental, with its primary key marked, split into tables 1, 2, and 3]
6-43 Converting Customer_Rental to 2NF
6-44 Example 1 – Converting Property_Owner To 3NF
Remove the transitive dependency.
[Figure: Property_Owner split into tables 1 and 2]
6-45 Example 1 – Converting Property_Owner To 3NF
6-46 Example 1 – Process of Normalization
[Figure: removing the PDs gives 3 tables, including Customer and Rental; removing the TD then gives 4 tables]
6-47 Example 1 – Summary of 3NF
[Figure: the original table and the resulting 3NF tables]
6-48 Example 2 – Property Inspection Report
6-49 Example 2 – Property Inspection
Business rules:
- When staff are required to undertake inspections, they are allocated a company car for use on the day of the inspections.
- However, a car may be allocated to several staff members as required throughout the working day.
- A staff member may inspect several properties on a given date, but a property is inspected only once on a given date.
6-50 Example 2 – UNF To 1NF
[Figure callouts: copy, new record]
6-51 Example 2 – Define Primary Key
- (Staff_No, IDate)?
- (Property_No, IDate)?
Check the business rules.
6-52 Example 2 – FDs Of Property_Inspection
FDs:
- FD1: Property_No, IDate → ITime, PAddress, Comments, Staff_No, SName, Car_Reg
- FD2: Property_No → PAddress
- FD3: Staff_No → SName
6-53 Example 2 – Converting To 2NF
Before:
- Property_Inspection (Property_No, IDate, ITime, PAddress, Comments, Staff_No, SName, Car_Reg)
Remove FD2 (partial dependency, Property_No → PAddress). After:
- Prop_Inspection (Property_No, IDate, ITime, Comments, Staff_No, SName, Car_Reg)
- Prop (Property_No, PAddress)
6-54 Example 2 – Converting To 3NF
Before:
- Prop (Property_No, PAddress)
- Prop_Inspection (Property_No, IDate, ITime, Comments, Staff_No, SName, Car_Reg)
Remove FD3 (transitive dependency, Staff_No → SName). After:
- Prop (Property_No, PAddress)
- Prop_Inspection (Property_No, IDate, ITime, Comments, Staff_No, Car_Reg)
- Staff (Staff_No, SName)
6-55 Summary: Normalization Rules
Normal form and rule description:
- First Normal Form (1NF): The table must express a set of unordered, two-dimensional tables. The table cannot contain repeating groups. (No repeating group.)
- Second Normal Form (2NF): The table must be in 1NF. Every non-key column must be dependent on all parts of the primary key. (No partial dependency.)
- Third Normal Form (3NF): The table must be in 2NF. No non-key column may be functionally dependent on another non-key column. (No transitive dependency.)
"Each non-primary key value MUST be dependent on the key, the whole key, and nothing but the key."
6-56 The Process of Normalization up to 3NF