Download presentation
Presentation is loading. Please wait.
1
Concept of Normalization in
More On ERD and Concept of Normalization in Relational Data Model
2
Weak Entities Before we discuss Normalization, we need to introduce one more type of entity and relationship: Weak entity and Weak relationship
3
Weak Entities Entities that we have talked about so far, are all strong (regular) entity types. A strong entity type has key attribute(s) of its own whereas a weak entity type does not have a particular key attribute that can uniquely identify its instances. Entities belonging to a weak entity type are identified by being related to specific entities from another entity type in combination with one of their attribute values. We can call the other entity type the Identifying or Owner Entity type and we call the relationship type that relates a weak entity to its owner entity Identifying Relationship.
4
Weak Entities -- Cont For example, we can store the information about the dependants of an employee in a separate entity called Dependents. The Dependants entity can keep track of the name and other personal information of dependants of the employees. Note that it might be possible to have two different employees that have a daughter named “Nancy”. So name cannot be used as the primary key of Dependants entity. In order to distinguish one instance of entity Dependants from another instance, we need to know the employee Id of the employees which is the primary key of Employee entity. The combination of employee id (that is a property of Employee entity) and the name attribute (that is a property of Dependants entity) can be used to create a primary key for the Dependants entity.
5
Mapping ERD to Tables for weak Entities
For each weak entity E in the ERD. Create a table T that includes all the simple attributes of E. Include only the simple component attributes of a composite attribute. Choose one of the key attributes of E as partial primary key for T If the chosen key of E is composite, the composite key will form the partial primary key of T. Include the primary key of the owner table in T. The primary key of the owner with the partial primary key chosen for T become the composite primary key of T.
6
Example of Weak Entity Relationship
Age Name Name Emp-Id Gender M 1 EMPLOYEE HAS-DEPENDANT DEPENDANTS Foreign key that references Employee table PART Table 1: DEPENDANTS (Emp-Id, Name, Age, Gender) A Composite Primary key for Dependants table
7
Another Example Of Changing ERD to tables
8
SEAT SeatNo 1
9
We start with taking care of the strong entities first. They include:
AIRPORT: AirportCode, Name, City, State, … AIRPLANE-TYPE: TypeName, MaxSeats, Company,… AIRPLANE AirPlaneId, TotalNoOfSeats,… FLIGHT: Number, WeekDays, Airline,…. We better check mark the entities we have taken care of
10
SEAT SeatNo 1
11
Again, lets check mark what we have done
Now we also take care of the week entities. As you see, the owner entity of some week entities are strong entities and the owner of others are week entities. It is a better idea to perform the ones that their owners are strong entities. So in our example, we get: FLIGHT-LEG FlightNumber, LegNo, … FARES FlightNumber, Code, Amount, Restrictions, …. LEG-INSTANCE FlightNumber, LegNo, Date, NoOfAvailSeats, … SEAT FlightNumber, LegNo, Date, SeatNo, CustomerName, CPhone,… Again, lets check mark what we have done
12
SEAT SeatNo 1
13
Here is what we have so far:
AIRPORT: AirportCode, Name, City, State, … AIRPLANE-TYPE: TypeName, MaxSeats, Company,… AIRPLANE AirPlaneId, TotalNoOfSeats, FLIGHT: Number, WeekDays, Airline,…. FLIGHT-LEG FlightNumber, LegNo, FARES FlightNumber, Code, Amount, Restrictions, …. LEG-INSTANCE FlightNumber, LegNo, Date, NoOfAvailSeats, SEAT FlightNumber, LegNo, Date, SeatNo, CustomerName, CPhone,…
14
Now we take care of 1:1 and 1:N relationships:
AIRPORT: AirportCode, Name, City, State, … AIRPLANE-TYPE: TypeName, MaxSeats, Company,… AIRPLANE AirPlaneId, TotalNoOfSeats, TypeName, FLIGHT: Number, WeekDays, Airline,…. FLIGHT-LEG FlightNumber, LegNo, DepAirPortCode, ScheduledDepTime, ArrAirportCode, ScheduledArrTime, FARES FlightNumber, Code, Amount, Restrictions, …. LEG-INSTANCE FlightNumber, LegNo, Date, NoOfAvailSeats, AirPlaneId, DepAirPortCode, DepTime, ArrAirportCode, ArrTime SEAT FlightNumber, LegNo, Date, SeatNo, CustomerName, CPhone,…
15
SEAT SeatNo 1
16
Now we take care of M:N relationships:
AIRPORT: AirportCode, Name, City, State AIRPLANE-TYPE: TypeName, MaxSeats, Company AIRPLANE AirPlaneId, TotalNoOfSeats, TypeName FLIGHT Number, WeekDays, Airline FLIGHT-LEG FlightNumber, LegNo, DepAirPortCode, ScheduledDepTime, ArrAirportCode, ScheduledArrTime FARES FlightNumber, Code, Amount, Restrictions LEG-INSTANCE FlightNumber, LegNo, Date, NoOfAvailSeats, AirPlaneId, DepAirPortCode, DepTime, ArrAirportCode, ArrTime SEAT FlightNumber, LegNo, Date, SeatNo, CustomerName, CPhone CAN-LAND AirportCode, TypeName
17
SEAT SeatNo 1
18
So far we learned how to develop an ERD for an organization
We also found how ERD can be changed to initial tables In developing a database, do all developers come up with the same ERD? Depending on assumptions, and style of development, it is very likely for two developers of the same database to come up with two different ERDs Your ERD does not have to be the best ERD that can possibly be created However, you need to do your best to relate entities properly and place attributes where they may fit the best No matter how well-structured your ERD is, your initial tables produced from your ERD can potentially contain redundancy. Redundancy can cause insertion, deletion, and update problems.
19
Assume a developer plans to develop a simple database for computer science department.
The developer’s ERD looks as follows: a single entity with some attributes. Name Dept Salary Course EmpID DateCourseCompleted ComputerScience
20
What is the problem with this database?
He changes his ERD to a table and inserts some data into the table. Therefore, his database contains only one table and here it is. What is the problem with this database? What is the primary key? What is/are the foreign key(s)? Can you see any redundancy in this table? EmpID Course Name Dept Salary DateCompleted 100 Database Margaret Marketing 42,000 6/19/2000 VB 10/7/2000 140 C++ Nancy Accounting 39,000 12/8/2000 Java 1/30/2001 110 Chris Info System 41,500 5/7/2000 190 ASP Steve Finance 38,000 150 Joe 8/12/2000
21
Well-Structured table
Definition: It is a table that contains a minimum amount of redundancy and allows users to insert, modify, and delete the rows in a table without errors or inconsistencies. Consider this table: EmpID Course Name Dept Salary DateCompleted 100 Database Margaret Marketing 42,000 6/19/2000 VB 10/7/2000 140 C++ Nancy Accounting 39,000 12/8/2000 Java 1/30/2001 110 Chris Info System 41,500 5/7/2000 190 ASP Steve Finance 38,000 150 Joe 8/12/2000
22
Anomalies Definition: Errors or inconsistencies that may result when a user attempts to update a table that contains redundant data. There are three types of anomalies: insertion, deletion, and modification anomalies. Insertion Anomaly: Every time you try to input the personal information of an employee into the employee table, you also need to insert the information about the course he has completed. What if we have an employee that has not assigned any course? Can we add personal information and add the course information later? No, because EmpID and Course (together) make the primary key and primary key cannot be NULL.
23
Deletion Anomaly: Suppose that ASP is no longer used by the company. They want to delete all information regarding ASP If we delete ASP, we lose information about employee 190. Modification Anomaly: Suppose that employee number 100 gets a salary increase. You must record the increase in each of the rows for that employee (two occurrences in this case); otherwise, the data is inconsistent.
24
Concept of Normalization
In order to take care of the anomalies, we need to normalize the tables. Normalization: Definition: The process of converting complex data structures into simple, stable data structures. Normal Form: Definition: A state of a table that can be determined by applying simple rules regarding dependencies to that table In order to ensure that our tables have no serious anomalies, they all have to be in third normal form. Before explaining different normal forms, we need to understand the concept of functional dependency.
25
Functional Dependencies
Definition: If an attribute of a table (Employee Name) can be determined from another attribute in the same table (Employee Number), the Name is functionally dependant on the Number. Common examples of functional dependencies are: SSN determines Name, Address and Birthday, A person’s name, address, and birthday are functionally dependent on that person’s Social Insurance number VIN determines Make, Model and Color The make, model, and color of a vehicle are functionally dependent on the vehicle identification number, ISBN determines Title The title of the book is functionally dependent on the book’s international standard book number (ISBN)
26
First Normal Form Definition: A table is in first normal form if each field (intersection of a row and a column) contains only one instance Having more than one data item in a field is called ‘Repeating Groups’ For example the following table is not in first normal form. EmpID Course Name Dept Salary DateCompleted 100 Database VB Margaret Marketing 42,000 6/19/2000 10/7/2000 140 C++ Java Nancy Accounting 39,000 12/8/2000 1/30/2001 110 Chris Info System 41,500 5/7/2000 190 ASP Steve Finance 38,000 150 Joe 8/12/2000
27
The simplest method to make a table into the first normal form is to expand the number of rows in the table to ensure that the value of each attribute is atomic. Thus, to make our table into the first normal formal we have to create two extra rows EmpID Course Name Dept Salary DateCompleted 100 Database Margaret Marketing 42,000 6/19/2000 VB 10/7/2000 140 C++ Nancy Accounting 39,000 12/8/2000 Java 1/30/2001 110 Chris Info System 41,500 5/7/2000 190 ASP Steve Finance 38,000 150 Joe 8/12/2000
28
Second Normal Form Definition: A table is in second normal form if it is in first normal form and non-key attributes don’t have partial functional dependency. Partial functional dependency: Definition: A dependency in which one or more non-key attributes are functionally dependent on part, but not on all of the primary key. Note that, a table in first normal form is also in second normal form if The primary key consists of only one attribute (column), or No non-key attribute exists, or Every non-key attribute is functionally dependent on the full primary key.
29
In the above table the primary key is “EmpID and Course”.
Name Dept Salary DateCompleted 100 Database Margaret Marketing 42,000 6/19/2000 VB 10/7/2000 140 C++ Nancy Accounting 39,000 12/8/2000 Java 1/30/2001 110 Chris Info System 41,500 5/7/2000 190 ASP Steve Finance 38,000 150 Joe 8/12/2000 In the above table the primary key is “EmpID and Course”. However, the EmpID can functionally determine Name, Dept, and Salary. Thus, Name, Dept, and Salary are only dependent on EmpID that is part of the primary key. EmpID and Course functionally determine the DateCompleted. Therefore, the above table is not in second normal form.
30
The following two tables are still related through the EmpId attribute
In order to solve this problem and making the table into second normal form we can split the table into two new tables Note that, if you split a table into two tables or more, they still have to be connected through primary and foreign key The following two tables are still related through the EmpId attribute Emp ID Course Date Completed 100 Database 6/19/2000 VB 10/7/2000 140 C++ 12/8/2000 Java 1/30/2001 110 5/7/2000 190 ASP 150 8/12/2000 Emp ID Name Dept Salary 100 Margaret Marketing 42,000 140 Nancy Accounting 39,000 110 Chris Info System 41,500 190 Steve Finance 38,000 150 Joe
31
Third Normal Form Definition: A table is in third normal form if it is in second normal form and neither Derived Dependencies nor Transitive Dependencies exist. Derived Dependency: Definition: A table has a Derived Dependency if one or more non-key attribute(s) in that table can be calculated (derived) from other non-key attributes in either the same table or in other tables in the database Transitive dependency: Definition: A table has a Transitive Dependency if one or more non-key attribute(s) depends on another non-key attribute (in the same table)
32
Example of Derived Dependency
Item Number Qty Ordered Price PerUnit Total 12341 4 $20 $80 12342 3 $15 $45 12343 5 $25 $125 The “Total” column can be derived from multiplying “QtyOrdered” and “PricePerUnit” column. Therefore, it can be removed Item Number Qty Ordered Price PerUnit 12341 4 $20 12342 3 $15 12343 5 $25
33
Example of Transitive Dependency
The following is not in 3rd normal form. Why? CustID CustName SalesRepID SalesRep Region 8023 Margaret 101 Smith South 9167 Jim 102 Hicks West 7924 Nancy 6837 Joe 103 Hernandez East 8596 Chris 7018 Steve 104 Faulb North Answer: The reason is that the non-key attribute “SalesRepID” can determine the Region of the SalesRep. Therefore, since transitive dependency exits, the above table is not in third normal form
34
Note that this table is in second normal form because the primary key is a single attribute(CustID).
However, the following anomalies still exists: Insertion Anomaly: A new salesperson assigned to the North region cannot be entered until a customer has been assigned to that sales person Deletion Anomaly:If customer number 6837 is deleted from the table, we lose the information that salesperson Hernandez is assigned to the East region Modification Anomaly: If salesperson Smith is assigned to the East region, several rows must be changed to reflect that fact.
35
Since a Primary Key is unique, using a name is never a good idea
Since a Primary Key is unique, using a name is never a good idea. It is better to add a new attribute (numeric) and use it instead. Cust-ID Cust-Name ID 8023 Margaret 101 9167 Jim 102 7924 Nancy 6837 Joe 103 8596 Chris 7018 Steve 104 ID SalesRep SalesRepRegion 101 Smith South 102 Hicks West 103 Hernandez East 104 Faulb North Foreign key and Primary key relationship
36
has Customer Name Customer Number Customer Address Order Date Order
Orders 1 has PartNumber Quantity Ordered N Details Total Price Part Description PartCost PerUnit
37
Another Example Consider the following tables that is generated from this ERD. The goal is to make the tables into third normal form. Orders Order Number Customer Number Customer Name Address Order Date 12341 121 Bob Smith 52 Clark Str. 12/10/2000 12342 122 Joe Clark 62 Park Ave 12/01/2001 12343 123 John Wong 72 Bay Str. 12/12/2000 Details Order Number Part Number Description PartCost PerUnit Quantity Ordered Total Price 12341 X1, X2, X3 Part1, Part2, Part3 $12, $15, $16 2, 5, 6 $24, $75, $96
38
First normal form -- Remove repeating groups.
Question: Are the two tables, Orders and Details, in first normal form? Orders table is in first normal form because the value for each field is atomic However, Details Table is not in first normal form because some fields have more than one value. To put Details Table in first normal form, we need to remove repeating group in Details table as follows: Details Order Number Part Number Description PartCost PerUnit Quantity Ordered Total Price 12341 X1 Part1 $12 2 $24 X2 Part2 $15 5 $75 X3 Part3 $16 6 $96
39
Second normal form - Remove partial dependencies
Question: Are the two tables, Orders and Details, in Second normal form? The Orders table has a single attribute primary key, (orderNumber) so it is in 2nd normal form already Order Number Customer Number Customer Name Address Order Date 12341 121 Bob Smith 52 Clark Str. 12/10/2000 12342 122 Joe Clark 62 Park Ave 12/01/2001 12343 123 John Wong 72 Bay Str. 12/12/2000
40
However, Details Table is not in Second Normal form. Why?
Order Number Part Number Description PartCost PerUnit Quantity Ordered Total Price 12341 X1 Part1 $12 2 $24 X2 Part2 $15 5 $75 X3 Part3 $16 6 $96 Note that Details Table has a multiple Attribute primary key (Order Number; Part Number) and there is a partial dependency in this table: “PartDescription” and “PartCostPerUnit” are dependant of Part Number We can separate “PartDescription” and “PartCostPerUnit” from the Details table and place them in a new table.
41
The old Details Table Details Parts Order Number Part Number
Description PartCost PerUnit Quantity Ordered Total Price 12341 X1 Part1 $12 2 $24 X2 Part2 $15 5 $75 X3 Part3 $16 6 $96 Details Parts Order Number Part Number Quantity Ordered Total Price 12341 X1 2 $24 X2 5 $75 X3 6 $96 Part Number Description PartCost PerUnit X1 Part1 $12 X2 Part2 $15 X3 Part3 $16 Foreign key and Primary key relationship
42
Third Normal Form -- Remove Derived Dependency and Transitive Dependency
The Details table has a column with derived dependency The “TotalPayable” can be derived from multiplying “QuantityOrdered” in Details table by the “PartCostPerUnit” in the parts table TotalPayable can be removed from the table. Here is what we get: TotalPayable = QuantityOrdered * PartCostPerUnit Details Parts Order Number Part Number Quantity Ordered 12341 X1 2 X2 5 X3 6 Part Number Description PartCost PerUnit X1 Part1 $12 X2 Part2 $15 X3 Part3 $16
43
Is there any more derived dependency in any of the non-key attributes?
Do you see anymore derived dependency or transitive dependency in the following tables? Details Parts Order Number Part Number Quantity Ordered 12341 X1 2 X2 5 X3 6 Part Number Description PartCost PerUnit X1 Part1 $12 X2 Part2 $15 X3 Part3 $16 Is there any more derived dependency in any of the non-key attributes? To have a transitive dependency in a table, we need to have at least two non-key attributes. Therefore, Details table cannot have transitive dependency either. In the Parts table the description and cost columns cannot functionally determine each other. Therefore, Details table and Parts table are in third normal form
44
How about the Orders table?
Order Number Customer Number Customer Name Address Order Date 12341 121 Bob Smith 52 Clark Str. 12/10/2000 12342 122 Joe Clark 62 Park Ave 12/01/2001 12343 123 John Wong 72 Bay Str. 12/12/2000 The Orders table contains Transitive data. Why? The customer name and address depend on Customer Number. Note that “CustomerNumber” is a non-key attribute We can place a customer with its name and address in a separate table and remove it from the Orders table.
45
The old Orders Table Orders Customers 12341 121 Bob Smith
Order Number Customer Number Customer Name Address Order Date 12341 121 Bob Smith 52 Clark Str. 12/10/2000 12342 122 Joe Clark 62 Park Ave 12/01/2001 12343 123 John Wong 72 Bay Str. 12/12/2000 Orders Customers Order Number Customer Number Order Date 12341 121 12/10/2000 12342 122 12/01/2001 12343 123 12/12/2000 Customer Number Customer Name Address 121 Bob Smith 52 Clark Str. 122 Joe Clark 62 Park Ave 123 John Wong 72 Bay Str. Primary key and foreign key relationship
46
What we are left with is 4 tables
These table can be used to represent any order we want Orders Customers Order Number Customer Number Order Date 12341 121 12/10/2000 12342 122 12/01/2001 12343 123 12/12/2000 Customer Number Customer Name Address 121 Bob Smith 52 Clark Str. 122 Joe Clark 62 Park Ave 123 John Wong 72 Bay Str. Details Parts Order Number Part Number Quantity Ordered 12341 X1 2 X2 5 X3 6 Part Number Description PartCost PerUnit X1 Part1 $12 X2 Part2 $15 X3 Part3 $16
47
Revising ERD Based on your normalization, you should revise your ERD
Customer Number Customer Address Order Number Order Date Customer Name 1 M CUSTOMER MAKES ORDER Quantity Ordered Part Number M Part Description PART INCLUDES L PartCostPerUnit
48
Some Comments The normalization process can be complicated by the ERD you start with In the order example, you should know right away that a Customer should have a table of its own How you do the normalization is not important. As long as the data reaches Third Normal Form is all that matters When normalizing data, there is no ‘Right’ way to do it. Some say there is no ‘wrong’ way either. It depends on the data you are working with
49
Sometimes, it even makes sense to ‘not’ fully normalize
EX. You calculate a ‘total’ several times a day, and each time it takes 5 minutes. This total changes at most once a day. It is faster to save the total, and update it when it changes than to recalculate it every time You should go through the normalization process to be sure though
50
Given This data, Derive normalized tables and ERD
Order No Inv Cust ID Prod Desc Price Name Add QTY Total Date Assumptions: Each customer can make several orders; but a particular order is only make for one customer Each product can be in many orders; and each order can contain many products Several invoices can be issues for the same order but each invoice refers to one order
51
Order No Inv Cust ID Prod Desc Price Name Add QTY Total Date ProdId
PRODUCT ProdId ProdDesc ProdPrice ProdId ProdDesc, ProdPrice PRODUCT (ProdId, ProdDesc, ProdPrice, ….)
52
Order No Inv Cust ID Prod Desc Price Name Add QTY Total Date
CustId CustName, CustAdd Customer (CustId, CustName, CustAdd, …) Order No Inv Cust ID Prod Desc Price Name Add QTY Total Date PRODUCT ProdId ProdDesc ProdPrice CUSTOMER CustId CustName CustAdd
53
Order No Inv Cust ID Prod Desc Price Name Add QTY Total Date
PRODUCT ProdId ProdDesc ProdPrice CUSTOMER CustId CustName CustAdd 1 N Is-In IS-For M N ORDER OrderNo OrderDate, OrderTotal Order (OrderNo, OrderDate, OrderTotal, … ) OrderDate OrderNo OrderTotal
54
Order No Inv Cust ID Prod Desc Price Name Add QTY Total Date
Invoice (InvNo, ….) Order No Inv Cust ID Prod Desc Price Name Add QTY Total Date PRODUCT ProdId ProdDesc ProdPrice CUSTOMER CustId CustName CustAdd Is-In IS-For M N ORDER OrderNo OrderDate OrderTotal 1 INVOICE InvNo Has
55
Order No Inv Cust ID Prod Desc Price Name Add QTY Total Date CUSTOMER
CustName CustAdd ProdDesc ProdId ProdPrice PRODUCT 1 N OrderQTY IS-For Is-In N M N Has ORDER INVOICE 1 OrderDate InvNo OrderNo OrderTotal
56
All Tables: ProductOrdered(ProdId, OrderNo, OrderQTY) Invoice (InvNo, OrderNo) Order (OrderNo, OrderDate, OrderTotal, CustId) Customer (CustId, CustName, CustAdd) Product (ProdId, ProdDesc, ProdPrice)
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.