Improving the Quality of Database Designs (Adapted from David Kroenke, Database Processing)
Improving the Quality of Database Designs
o Minimizing Redundancy in Database
o Avoiding Anomalies
o Functional Dependency
o Normal Forms
  o First Normal Form
  o Second Normal Form
  o Third Normal Form
o Exercise Problems
Minimizing Redundancy in DB
Redundancy
o Wastes space
o Wastes time
o Causes anomalies (incorrect data)
Avoiding Anomalies
Causes
o Update Anomaly
o Insertion Anomaly
o Deletion Anomaly
DVD Table
dvdID  acquired  title         genre      length  studio  country
120    1/25/03   The 39 Steps  Mystery    120     ABC     USA
150    2/5/03    Elizabeth     Drama      105     XYZ     England
172    12/31/03  Lady & Tramp  Animation  93      DEF     Poland
157    3/25/03   Elizabeth     Drama      105     XYZ     England
110    5/12/02   Annie Hall    Comedy     120     ABC     USA
125    3/8/03    Elizabeth     Drama      105     XYZ     England
Update Anomaly
A situation in which an update to one record requires the same update in other records.
E.g., suppose that for dvdID #150 (Elizabeth), length is changed to 100. If the length values for dvdID #157 and #125 are not changed as well, we have an anomaly.
Insertion Anomaly
A situation in which adding a record results in an inconsistency.
Suppose another copy of The 39 Steps is added to the table. If its values of genre, length, and studio are not the same as those of dvdID #120, we have an anomaly.
Deletion Anomaly
A situation in which deleting one record results in an unintended loss of data.
Suppose dvdID #172 is removed. Then all data about studio DEF and its country (Poland) will be lost.
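The update and deletion anomalies can be reproduced on a few rows of the DVD table. The following is a minimal sketch, not part of the original slides: the helper name `conflicting_lengths` is an assumption, and only the columns needed for the demonstration are kept.

```python
# Three copies of "Elizabeth" plus the only DVD from studio DEF,
# taken from the DVD table (acquired, genre, and country omitted for brevity).
dvds = [
    {"dvdID": 150, "title": "Elizabeth", "length": 105, "studio": "XYZ"},
    {"dvdID": 172, "title": "Lady & Tramp", "length": 93, "studio": "DEF"},
    {"dvdID": 157, "title": "Elizabeth", "length": 105, "studio": "XYZ"},
    {"dvdID": 125, "title": "Elizabeth", "length": 105, "studio": "XYZ"},
]


def conflicting_lengths(rows):
    """Return the titles that are stored with more than one length."""
    lengths = {}
    for row in rows:
        lengths.setdefault(row["title"], set()).add(row["length"])
    return {title: vals for title, vals in lengths.items() if len(vals) > 1}


# Update anomaly: only dvdID 150 is changed; 157 and 125 keep the old value.
for row in dvds:
    if row["dvdID"] == 150:
        row["length"] = 100
update_conflicts = conflicting_lengths(dvds)

# Deletion anomaly: removing dvdID 172 also removes the last mention of DEF.
remaining = [row for row in dvds if row["dvdID"] != 172]
studios_left = {row["studio"] for row in remaining}

print(update_conflicts)           # titles whose copies now disagree
print("DEF" in studios_left)      # whether studio DEF is still recorded
```

Both problems come from storing title-level and studio-level facts once per physical copy.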
Functional Dependency
Definition
Given: A and B are attributes of relation (table) R.
Then B is functionally dependent on A if and only if each value of A in R has exactly one value of B associated with it.
A → B (A determines B)
I.e., any two rows with the same value for A will have the same value for B.
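The "any two rows" wording of the definition translates directly into a check over sample rows. This is a sketch under stated assumptions: the function name `holds_fd` is hypothetical, and the rows are the DVD table from the earlier slide. Note that sample data can only refute a dependency; whether one truly holds is a statement about the meaning of the data, not about one snapshot.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


columns = ("dvdID", "title", "genre", "length", "studio", "country")
dvd_rows = [
    dict(zip(columns, values))
    for values in [
        (120, "The 39 Steps", "Mystery", 120, "ABC", "USA"),
        (150, "Elizabeth", "Drama", 105, "XYZ", "England"),
        (172, "Lady & Tramp", "Animation", 93, "DEF", "Poland"),
        (157, "Elizabeth", "Drama", 105, "XYZ", "England"),
        (110, "Annie Hall", "Comedy", 120, "ABC", "USA"),
        (125, "Elizabeth", "Drama", 105, "XYZ", "England"),
    ]
]

studio_determines_country = holds_fd(dvd_rows, ["studio"], ["country"])
id_determines_title = holds_fd(dvd_rows, ["dvdID"], ["title"])
# Two different titles share length 120, so length does not determine title.
length_determines_title = holds_fd(dvd_rows, ["length"], ["title"])
```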
Functional Dependence (1)
DVD (title, publisher, length, director, pubAddress)
o publisher → pubAddress (yes)
o title → length (no)
o title, publisher → length (yes)
Functional Dependence (2)
Books (bkID, ISBN, title, author, pubAddress)
o ISBN → title (yes)
o ISBN → author (yes)
o bkID → title (yes)
o bkID → author (yes)
o bkID → pubAddress (yes)
o title, publisher → length (yes)
A primary key determines each nonkey attribute.
First Normal Form (1NF)
A relation (table) is in 1NF if
o Each row is unique (with primary key)
o All attributes are atomic
Second Normal Form (2NF)
A relation (table) is in second normal form if
o All nonkey attributes are dependent on all of the key. (This means that the relation is not in 2NF if any nonkey attribute is dependent on only part of the key.)
E.g., in DVD with key (title, publisher), pubAddress is dependent only on publisher, not on title.
2NF? (No)
StudentActivities
stdID  activities  fee
100    Skiing      200
100    Golf        65
150    Swimming    50
175    Squash      50
175    Swimming    50
200    Swimming    50
200    Golf        65
Problems
Note
o Key: stdID + activities
o Attribute fee is dependent only on activities (part of the key).
Problems
o There are obvious redundancies.
o If student 175 is removed, the fee ($50) for Squash is deleted.
o A new activity, say Surfing, cannot be entered until a student is entered.
Solution
o Remove the attribute that is dependent on only part of the key and form a new table.
o Create a link between the new and the original tables using a foreign key.
Note: if a relation (table) is in 1NF and the primary key consists of a single attribute, the relation is automatically in 2NF.
Solution
Activities
stdID  activities
100    Skiing
100    Golf
150    Swimming
175    Squash
175    Swimming
200    Swimming
200    Golf
Fees
activities  fee
Skiing      200
Golf        65
Swimming    50
Squash      50
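The decomposition can be sketched in a few lines to show that it loses nothing and removes the two problems noted earlier. This is an illustrative sketch, not the slides' own method; the variable names are assumptions, and the Surfing fee of 75 is a made-up value used only to show the insertion.

```python
# Rows of the original table (stdID, activity, fee), as in the slides.
student_activities = [
    (100, "Skiing", 200), (100, "Golf", 65), (150, "Swimming", 50),
    (175, "Squash", 50), (175, "Swimming", 50),
    (200, "Swimming", 50), (200, "Golf", 65),
]

# Table 1: which student takes which activity (key: stdID + activity).
activities = {(std_id, activity) for std_id, activity, _ in student_activities}

# Table 2: one fee per activity (key: activity).
fees = {}
for _, activity, fee in student_activities:
    if fees.setdefault(activity, fee) != fee:
        raise ValueError(f"conflicting fees recorded for {activity}")

# Joining on activity rebuilds the original rows, so nothing was lost.
rejoined = {(std_id, activity, fees[activity]) for std_id, activity in activities}
lossless = rejoined == set(student_activities)

# Removing student 175 no longer deletes the Squash fee.
activities = {pair for pair in activities if pair[0] != 175}
squash_fee_kept = "Squash" in fees

# A new activity can be priced before any student signs up.
fees["Surfing"] = 75  # hypothetical fee, not from the source
```

Here the activity name in the first table plays the role of the foreign key into the fee table.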
Third Normal Form (3NF)
A relation is in 3NF if
o It is in 2NF and
o There are no transitive dependencies. (I.e., every nonkey attribute is dependent only on the primary key.)
Table satisfying 3NF (in common terms)
o Should have a field that uniquely identifies each record
o Each field in the table should describe the subject that the table represents
3NF? (No)
StudentHousing
stdID  building   fee
100    Randolf    1200
150    Ingersoll  1100
200    Randolf    1200
250    Pitkins    1100
300    Randolf    1200
Transitive Dependence
o stdID → building (I.e., building is dependent on stdID)
o building → fee (I.e., fee is dependent on building)
Thus, stdID → building → fee
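Chaining dependencies in this way is what the standard attribute-closure computation does: start from a set of attributes and keep adding whatever the known dependencies determine. A minimal sketch follows; the function name `closure` and the tuple encoding of dependencies are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs.

    fds is a list of (lhs, rhs) pairs, each side a tuple of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD once its whole left-hand side is already known.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


housing_fds = [(("stdID",), ("building",)), (("building",), ("fee",))]

from_std_id = closure({"stdID"}, housing_fds)
from_building = closure({"building"}, housing_fds)
```

`from_std_id` contains fee even though no dependency lists stdID → fee directly; fee is reached only through building, which is the transitive dependence.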
Problems
StudentHousing is in 2NF, but
o Redundant data will introduce modification anomalies.
o Removing stdID 150 deletes the fee value for Ingersoll.
o The fee for a new building, say Barrett, cannot be recorded until a new stdID is entered.
Solution
o Remove data that is not dependent on the primary key and form a new relation.
o Create a relationship between the new and the original tables using a foreign key.
Solution
StudentResidence
stdID  building
100    Randolf
150    Ingersoll
200    Randolf
250    Pitkins
300    Randolf
ResidenceFee
building   fee
Randolf    1200
Ingersoll  1100
Pitkins    1100
Try This (Customers Table)
Problem
Note that
o custNum → ZIP and ZIP → city, state; i.e., custNum → ZIP → city, state
o Transitive dependence results in redundancy and in modification, insertion, and deletion anomalies.
Solution
Summary
Examine the attributes of an entity and ask the following questions. If the answer to any of them is "Yes," the attribute probably belongs to another entity.
o Does an attribute (or set of attributes) describe an entity other than the current one?
o Does an attribute of the entity depend (functionally) on only part of the primary key?
o Does an attribute depend on something other than the primary key?
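The last two questions can be asked mechanically once the dependencies are written down. The sketch below is a simplification, assuming a single candidate key and dependencies whose right-hand sides are nonkey attributes; the function name and the label strings are hypothetical.

```python
def classify_dependencies(primary_key, fds):
    """Label each FD (lhs, rhs) relative to a single primary key."""
    key = set(primary_key)
    labels = {}
    for lhs, rhs in fds:
        determinant = set(lhs)
        if key <= determinant:
            kind = "depends on the whole key"
        elif determinant < key:
            kind = "partial dependency (2NF problem)"
        else:
            kind = "determinant outside the key (3NF problem)"
        labels[(lhs, rhs)] = kind
    return labels


# StudentActivities: key is stdID + activities, and fee depends on activities.
activities_report = classify_dependencies(
    ("stdID", "activities"),
    [(("activities",), ("fee",))],
)

# StudentHousing: key is stdID, with stdID -> building and building -> fee.
housing_report = classify_dependencies(
    ("stdID",),
    [(("stdID",), ("building",)), (("building",), ("fee",))],
)

for report in (activities_report, housing_report):
    for (lhs, rhs), kind in report.items():
        print(", ".join(lhs), "->", ", ".join(rhs), ":", kind)
```

Any dependency flagged as a problem points at attributes that probably belong in a table of their own.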
Company Database
All attributes: empId, empLastName, empFirstName, empMiddleName, empAddress, empCity, empState, empZip, empPhone, empPager, empPosition, empPositionDescrip, empDateHire, empPayRate, empDateLastRaise, custId, custName, custAddress, custCity, custState, custZip, custPhone, custFax, orderNum, orderQuantity, orderDate, prodId, prodDescrip, prodCost
Employees: empId, empLastName, empFirstName, empMiddleName, empAddress, empCity, empState, empZip, empPhone, empPager, empPosition, empPositionDescrip, empDateHire, empPayRate, empDateLastRaise
Customers: custId, custName, custAddress, custCity, custState, custZip, custPhone, custFax, orderNum, orderQuantity, orderDate
Products: prodId, prodDescrip, prodCost
Company Database (2)
Employees (before): empId, empLastName, empFirstName, empMiddleName, empAddress, empCity, empState, empZip, empPhone, empPager, empPosition, empDateHire, empPayRate, empDateLastRaise
Employees: empId, empLastName, empFirstName, empMiddleName, empAddress, empCity, empState, empZip, empPhone, empPager
EmployeePays: empId, empPosition, empPositionDescrip, empDateHire, empPayRate, empDateLastRaise
Company Database
Customers (before): custId, custName, custAddress, custCity, custState, custZip, custPhone, custFax, orderNum, orderQuantity, orderDate
Customers: custId, custName, custAddress, custCity, custState, custZip, custPhone, custFax
Orders: custId, orderNum, orderQuantity, orderDate
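The Customers/Orders split relies on custId in Orders acting as a foreign key. One way to try this out is Python's built-in sqlite3 module with an in-memory database. This is a sketch under assumptions: the column types, the reduced column list, the order quantity, and the order numbers are illustrative choices, while the customer ID and name come from the exercise data in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE Customers (custId TEXT PRIMARY KEY, custName TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE Orders (
           orderNum      INTEGER PRIMARY KEY,
           custId        TEXT NOT NULL REFERENCES Customers(custId),
           orderQuantity INTEGER,
           orderDate     TEXT
       )"""
)

conn.execute("INSERT INTO Customers VALUES (?, ?)", ("117A", "Reed's Dairy Bar"))
# Quantity 10 and order number 1 are hypothetical values.
conn.execute("INSERT INTO Orders VALUES (?, ?, ?, ?)", (1, "117A", 10, "1997-02-20"))

# An order for a customer that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Orders VALUES (?, ?, ?, ?)", (2, "999Z", 5, "1997-03-01"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Customer details are stored once and joined in when needed.
rows = conn.execute(
    "SELECT c.custName, o.orderNum FROM Orders o "
    "JOIN Customers c ON c.custId = o.custId"
).fetchall()
```

Customer details now live in one row per customer, however many orders that customer places.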
Quiz
o Normalization is the process of grouping logically related data into tables to reduce redundancy. (T/F)
o Having no duplicate or redundant data in a database, and having everything in the database normalized, is always the best way to go. (T/F)
o If data is in the third normal form, it is automatically in the first and second normal forms. (T/F)
o What is the major advantage of a denormalized database versus a normalized database?
o What are some major disadvantages of an unnormalized database?
Exercise: What Type of Relationships Do the Tables Have?
Positions: os_id, position, position_descrip
EmployeePays: empPayId, empDateHire, empPayRate, empDateLastRaise
Orders: orderNum, orderQuantity, orderDate
Customers: custId, custName, custAddress, custCity, custState, custZip, custPhone, custFax
Employees: empId, empLastName, empFirstName, empMiddleName, empAddress, empCity, empState, empZip, empPhone, empPager
Exercise: Normalize the following data.
Take the following data and normalize it. Keep in mind that, in a real DB, there would be many more items than what is given here.
Employees:
o Angela Smith, secretary, RR 1 Box 73, Greensburg, IN, 47890, $9.50/hour, started Jan. 22, 1996, SSN is
o Jack Lee Nelson, salesman, 3334 N. Main St., Brownsburg, IN, 45687, , $35,000.00/year, date started 10/28/95, SSN is
Customers:
o Robert’s Games & Things, 5612 Lafayette Rd., Indianapolis, IN, 46224, , customer ID is 432A
o Reed’s Dairy Bar, 4556 W 10th St., Indianapolis, IN, 46245, , customer ID is 117A
CustomerOrders:
o Customer ID is 117A, date of last order is 2/20/1997, product ordered was napkins, and product ID is 661
Tables
Employees: Ssn, lastName, firstName, street, city, state, zip, phoneNum, salary, hourlyRate, startDate, position
Customers: customerID, name, street, city, state, zip, phoneNum
Orders: orderID, customerID, productID, productDescrip, dateOrdered
Solutions