Chapter 15
Functional Dependencies and Normalization for Relational Databases
Notice: The slides are a means of illustrating the lesson and one tool among others for doing so. The primary reference for the course is the textbook specified in the course description.

Introduction
We have studied the relational model, which consists of a group of relation schemas. Each relation schema consists of a number of attributes. So far, the attributes have been grouped into relation schemas using the common sense of the database designer. We still need a formal measure of why one grouping of attributes into a relation schema may be better than another. So far, we have not developed any measure of appropriateness for judging the quality of a design.

Outline
Informal design guidelines for good and bad relation schemas:
1. Semantics of the Relation Attributes
2. Reducing the Redundant Information in Tuples
3. Reducing the Null Values in Tuples
4. Disallowing the Possibility of Generating Spurious Tuples

Semantics of the Relation Attributes
Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation. Intuitively, if a relation corresponds to one entity type or one relationship type, it is straightforward to explain its meaning. Otherwise, if the relation corresponds to a mixture of multiple entities and relationships, semantic ambiguities will result and the relation cannot be easily explained. The semantics of the attributes should be easy to interpret.

Simplified COMPANY (1)
EMPLOYEE(ENAME, SSN, BDATE, ADDRESS, DNUMBER)
DEPARTMENT(DNAME, DNUMBER, DMGR_SSN)
DEPARTMENT LOCATIONS(DNUMBER, DLOCATION)
PROJECT(PNAME, PNUMBER, PLOCATION, DNUM)
WORKS ON(SSN, PNUMBER, HOURS)

Simplified COMPANY (1)
The meaning of the EMPLOYEE relation is quite simple. Each tuple represents an employee, with values for the employee’s name, social security number, birth date, and address, and the number of the department that the employee works for. The DNUMBER attribute is a foreign key that represents an implicit relationship between EMPLOYEE and DEPARTMENT.

Simplified COMPANY (1)
The semantics of the DEPARTMENT and PROJECT schemas are also straightforward. Each DEPARTMENT tuple represents a department entity, and each PROJECT tuple represents a project entity. The attribute DMGR_SSN of DEPARTMENT relates a department to the employee who is its manager. The attribute DNUM of PROJECT relates a project to its controlling department. Both attributes are foreign keys.

Simplified COMPANY (1)
The semantics of the other two relation schemas are slightly more complex. Each tuple in DEPARTMENT LOCATIONS gives a department number (DNUMBER) and one of the locations of that department (DLOCATION). Each tuple in WORKS ON gives an employee's social security number (SSN), the project number of one of the projects that the employee works on (PNUMBER), and the number of hours per week that the employee works on that project (HOURS). All of these relation schemas are easy to explain and hence have clear semantics.

Informally
The ease with which the meaning of a relation’s attributes can be explained is an informal measure of how well the relation is designed.

Guideline 1
Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation.

Simplified COMPANY (2)
EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)
EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)

Simplified COMPANY (2)
The EMP_DEPT and EMP_PROJ relation schemas also have clear semantics. Each tuple in EMP_DEPT represents a single employee but includes additional information, namely the name of the department for which the employee works (DNAME) and the social security number of that department's manager (DMGRSSN). Each tuple in EMP_PROJ relates an employee to a project but also includes the employee name, project name, and project location.

Conclusion
Although there is nothing logically wrong with these two relations, they are considered poor design because they violate Guideline 1 by mixing attributes from distinct real-world entities. They may be used as views, but they cause problems when used as base relations.

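One way to read this conclusion is to keep EMPLOYEE and DEPARTMENT as base relations and define EMP_DEPT as a view over them. The following is a minimal sketch using Python's built-in sqlite3 module. The SSN values 'S1' and 'S2' and the sample rows are made up for illustration and are not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (
        DNAME    TEXT NOT NULL,
        DNUMBER  INTEGER PRIMARY KEY,
        DMGR_SSN TEXT
    );
    CREATE TABLE EMPLOYEE (
        ENAME   TEXT NOT NULL,
        SSN     TEXT NOT NULL PRIMARY KEY,
        BDATE   TEXT,
        ADDRESS TEXT,
        DNUMBER INTEGER REFERENCES DEPARTMENT(DNUMBER)
    );
    -- EMP_DEPT is derived, so each department fact is stored only once.
    CREATE VIEW EMP_DEPT AS
        SELECT E.ENAME, E.SSN, E.BDATE, E.ADDRESS,
               E.DNUMBER, D.DNAME, D.DMGR_SSN AS DMGRSSN
        FROM EMPLOYEE AS E
        JOIN DEPARTMENT AS D ON E.DNUMBER = D.DNUMBER;
""")

# Hypothetical sample data: SSN values are placeholders.
conn.execute("INSERT INTO DEPARTMENT VALUES ('Research', 5, 'S2')")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, NULL, NULL, ?)",
    [("Smith", "S1", 5), ("Wong", "S2", 5)],
)

# Changing the manager touches exactly one base tuple.
conn.execute("UPDATE DEPARTMENT SET DMGR_SSN = 'S1' WHERE DNUMBER = 5")

view_rows = conn.execute(
    "SELECT ENAME, DNAME, DMGRSSN FROM EMP_DEPT ORDER BY SSN"
).fetchall()
print(view_rows)
```

Because the view is computed from the base relations, every employee row reflects the new manager after a single update.
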
Redundant Information in Tuples and Update Anomalies
One goal of schema design is to minimize the storage space used by the base relations. Combining attributes from multiple entity types into one relation has a significant effect on storage space.

Redundant Information in Tuples and Update Anomalies
EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)
EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
Redundancy: in EMP_DEPT, DNAME and DMGRSSN are repeated for every employee who works for that department.

Redundant Information in Tuples and Update Anomalies
Another serious problem with using the previous schemas as base relations is the problem of update anomalies, which are classified into:
1. Insertion anomalies
2. Deletion anomalies
3. Modification anomalies

Insertion anomalies
EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)
Problem 1: To insert a new tuple for an employee who works in department 5, we must enter the attribute values of department 5 correctly so that they are consistent with the values for department 5 in the other tuples.
Problem 2: The only way to insert a new department that has no employees is to place NULL values in the employee attributes. This causes a problem because SSN is the primary key and cannot be NULL.

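Both insertion problems can be reproduced with a small experiment. This is a sketch using Python's sqlite3 module, assuming EMP_DEPT is stored as a base table with SSN declared as a NOT NULL primary key. The SSN placeholders, the misspelled department name, and department 6 ('Marketing') are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMP_DEPT (
        ENAME TEXT, SSN TEXT NOT NULL PRIMARY KEY, BDATE TEXT, ADDRESS TEXT,
        DNUMBER INTEGER, DNAME TEXT, DMGRSSN TEXT
    )
""")

# Problem 1: nothing stops a new employee tuple from carrying department
# values that disagree with the existing tuples for department 5.
conn.execute(
    "INSERT INTO EMP_DEPT VALUES ('Smith', 'S1', NULL, NULL, 5, 'Research', 'S2')"
)
conn.execute(
    "INSERT INTO EMP_DEPT VALUES ('Wong', 'S2', NULL, NULL, 5, 'Reserch', 'S2')"
)
names_for_dept_5 = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT DNAME FROM EMP_DEPT WHERE DNUMBER = 5 ORDER BY DNAME"
    )
]

# Problem 2: a department without employees needs NULL in SSN,
# which the primary key constraint rejects.
try:
    conn.execute(
        "INSERT INTO EMP_DEPT VALUES (NULL, NULL, NULL, NULL, 6, 'Marketing', NULL)"
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

print(names_for_dept_5, insert_rejected)
```
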
Deletion anomalies
EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)
Problem 3: If an employee is the sole employee in a department, deleting that employee's tuple also deletes the information about the department.

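A short sketch of the deletion anomaly, again with sqlite3 and a trimmed-down EMP_DEPT table. The SSN placeholders are invented, and only the columns needed for the demonstration are kept. Borg is assumed to be the only employee of the Headquarters department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP_DEPT ("
    "ENAME TEXT, SSN TEXT NOT NULL PRIMARY KEY, DNUMBER INTEGER, DNAME TEXT)"
)
conn.executemany(
    "INSERT INTO EMP_DEPT VALUES (?, ?, ?, ?)",
    [
        ("Smith", "S1", 5, "Research"),
        ("Wong", "S2", 5, "Research"),
        ("Borg", "S3", 1, "Headquarters"),  # sole employee of department 1
    ],
)


def department_names(connection):
    """Return the department names currently recorded in EMP_DEPT."""
    query = "SELECT DISTINCT DNAME FROM EMP_DEPT ORDER BY DNAME"
    return [row[0] for row in connection.execute(query)]


before = department_names(conn)
conn.execute("DELETE FROM EMP_DEPT WHERE ENAME = 'Borg'")
after = department_names(conn)

print(before, after)
```

After the delete, no tuple mentions Headquarters any more, so the database has lost the fact that the department exists.
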
Modification anomalies
EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)
Problem 4: If we change the manager of department 5, we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent. If some tuples are not updated, the same department will show two different values for its manager, which would be wrong.

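The modification anomaly can be sketched the same way. The example below assumes a reduced EMP_DEPT table and invented SSN placeholders. It first applies an incomplete update and then the complete one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP_DEPT ("
    "ENAME TEXT, SSN TEXT NOT NULL PRIMARY KEY, DNUMBER INTEGER, DMGRSSN TEXT)"
)
conn.executemany(
    "INSERT INTO EMP_DEPT VALUES (?, ?, ?, ?)",
    [("Smith", "S1", 5, "S2"), ("Wong", "S2", 5, "S2"), ("Narayan", "S3", 5, "S2")],
)


def managers_of_dept_5(connection):
    """Return the distinct manager values stored for department 5."""
    query = "SELECT DISTINCT DMGRSSN FROM EMP_DEPT WHERE DNUMBER = 5"
    return sorted(row[0] for row in connection.execute(query))


# Incomplete update: only one employee tuple gets the new manager.
conn.execute("UPDATE EMP_DEPT SET DMGRSSN = 'S1' WHERE SSN = 'S1'")
managers_after_partial_update = managers_of_dept_5(conn)

# Complete update: every tuple of department 5 must be changed.
conn.execute("UPDATE EMP_DEPT SET DMGRSSN = 'S1' WHERE DNUMBER = 5")
managers_after_full_update = managers_of_dept_5(conn)

print(managers_after_partial_update, managers_after_full_update)
```
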
Conclusion
Mixing attributes of multiple entities may cause problems:
1. Information is stored redundantly, wasting storage.
2. Update anomalies arise: insertion anomalies, deletion anomalies, and modification anomalies.

Guideline 2
Design a schema that does not suffer from insertion, deletion, and modification anomalies. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.

Null Values in Tuples
In some schema designs we may group many attributes together into a “fat” relation. If many of the attributes do not apply to all tuples in the relation, we end up with many NULLs in those tuples.
Problems with NULLs:
1. They waste space at the storage level.
2. They lead to problems when using aggregate operations such as SUM.
3. They have multiple interpretations: the attribute does not apply to this tuple, the attribute value is unknown, or the value is known to exist but is unavailable.

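The aggregate problem is easy to observe in SQL, where aggregate functions skip NULLs. Below is a minimal sketch with sqlite3. The table name WORKS_ON, the SSN placeholders, and the hour values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WORKS_ON (SSN TEXT, PNUMBER INTEGER, HOURS REAL)")
conn.executemany(
    "INSERT INTO WORKS_ON VALUES (?, ?, ?)",
    [("S1", 1, 10.0), ("S2", 1, 20.0), ("S3", 1, None)],  # None is stored as NULL
)

# COUNT(*) counts rows; COUNT(HOURS), SUM and AVG ignore the NULL row.
count_rows, count_hours, total_hours, average_hours = conn.execute(
    "SELECT COUNT(*), COUNT(HOURS), SUM(HOURS), AVG(HOURS) FROM WORKS_ON"
).fetchone()

print(count_rows, count_hours, total_hours, average_hours)
```

The average is computed over two employees even though three work on the project, so the result depends on which interpretation of NULL the reader has in mind.
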
Guideline 3
Avoid placing attributes in a base relation whose values may frequently be NULL. Attributes that are frequently NULL can be placed in separate relations (together with the primary key).
For example, if only 10% of employees have individual offices, it does not make sense to include an attribute called OFFICE_NUMBER in the EMPLOYEE relation. Rather, create a new relation EMP_OFFICE(ESSN, OFFICE_NUMBER).

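The EMP_OFFICE idea can be sketched as follows with sqlite3. The ten employees, their SSN placeholders, and the office number 'B-101' are hypothetical. The point is only that no NULL is stored, while a LEFT JOIN still reports employees without an office.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        SSN   TEXT NOT NULL PRIMARY KEY,
        ENAME TEXT NOT NULL
    );
    CREATE TABLE EMP_OFFICE (
        ESSN          TEXT NOT NULL PRIMARY KEY REFERENCES EMPLOYEE(SSN),
        OFFICE_NUMBER TEXT NOT NULL
    );
""")

# Ten hypothetical employees, of whom one (10%) has an individual office.
employees = [(f"S{i:02d}", f"Emp{i:02d}") for i in range(1, 11)]
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", employees)
conn.execute("INSERT INTO EMP_OFFICE VALUES ('S01', 'B-101')")

# Only one office row is stored; no NULL placeholders are needed.
stored_office_rows = conn.execute("SELECT COUNT(*) FROM EMP_OFFICE").fetchone()[0]

# The NULLs appear only in the query result, not in the stored data.
offices = conn.execute("""
    SELECT E.ENAME, O.OFFICE_NUMBER
    FROM EMPLOYEE AS E
    LEFT JOIN EMP_OFFICE AS O ON O.ESSN = E.SSN
    ORDER BY E.SSN
""").fetchall()

print(stored_office_rows, offices[:2])
```
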
Generation of Spurious Tuples
Emp_Dept(Ename, SSN, Bdate, Address, Dnumber, Dname, Dmgr_ssn)
(SSN, Bdate, and Dmgr_ssn values are not reproduced here.)

Ename                 Address                   Dnumber  Dname
Smith, John B.        731 Fondren, Houston, TX  5        Research
Wong, Franklin T.     638 Voss, Houston, TX     5        Research
Zelaya, Alicia J.     Castle, Spring, TX        4        Administration
Wallace, Jennifer S.  291 Berry, Bellaire, TX   4        Administration
Narayan, Ramesh K.    975 FireOak, Humble, TX   5        Research
English, Joyce A.     Rice, Houston, TX         5        Research
Jabbar, Ahmad V.      980 Dallas, Houston, TX   4        Administration
Borg, James E.        450 Stone, Houston, TX    1        Headquarters

Redundancy: the department information is repeated for every employee of the same department.

Generation of Spurious Tuples
Emp_Proj(SSN, Pnumber, Hours, Ename, Pname, Plocation)
(SSN, Pnumber, and Hours values are not reproduced here; Hours is null in the last row.)

Ename                 Pname            Plocation
Smith, John B.        ProductX         Bellaire
Smith, John B.        ProductY         Sugarland
Narayan, Ramesh K.    ProductZ         Houston
English, Joyce A.     ProductX         Bellaire
English, Joyce A.     ProductY         Sugarland
Wong, Franklin T.     ProductY         Sugarland
Wong, Franklin T.     ProductZ         Houston
Wong, Franklin T.     Computerization  Stafford
Wong, Franklin T.     Reorganization   Houston
Zelaya, Alicia J.     Newbenefits      Stafford
Zelaya, Alicia J.     Computerization  Stafford
Jabbar, Ahmad V.      Computerization  Stafford
Jabbar, Ahmad V.      Newbenefits      Stafford
Wallace, Jennifer S.  Newbenefits      Stafford
Wallace, Jennifer S.  Reorganization   Houston
Borg, James E.        Reorganization   Houston

Redundancy: the employee name is repeated for every project the employee works on, and the project name and location are repeated for every employee on the project.

Generation of Spurious Tuples
EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
Suppose that we used EMP_PROJ1(SSN, PNUMBER, HOURS, PNAME, PLOCATION) and EMP_LOCS(ENAME, PLOCATION) as the base relations instead of EMP_PROJ. This produces a bad schema design, because we cannot recover the information that was originally in EMP_PROJ from EMP_PROJ1 and EMP_LOCS.

Generation of Spurious Tuples
Emp_Proj1(SSN, Pnumber, Hours, Pname, Plocation)
(SSN, Pnumber, and Hours values are not reproduced here; Hours is null in the last row.)

Pname            Plocation
ProductX         Bellaire
ProductY         Sugarland
ProductZ         Houston
ProductX         Bellaire
ProductY         Sugarland
ProductY         Sugarland
ProductZ         Houston
Computerization  Stafford
Newbenefits      Stafford
Computerization  Stafford
Computerization  Stafford
Newbenefits      Stafford
Newbenefits      Stafford
Reorganization   Houston
Reorganization   Houston

EMP_LOCS(Ename, Plocation)

Ename                 Plocation
Smith, John B.        Bellaire
Smith, John B.        Sugarland
Narayan, Ramesh K.    Houston
English, Joyce A.     Bellaire
English, Joyce A.     Sugarland
Wong, Franklin T.     Sugarland
Wong, Franklin T.     Houston
Zelaya, Alicia J.     Stafford
Jabbar, Ahmad V.      Stafford
Wallace, Jennifer S.  Stafford
Wallace, Jennifer S.  Houston
Borg, James E.        Houston

Generation of Spurious Tuples
If we attempt a NATURAL JOIN operation on EMP_PROJ1 and EMP_LOCS, the result contains many more tuples than the original set of tuples in EMP_PROJ. The additional tuples that were not in EMP_PROJ are called spurious tuples because they represent wrong information that is not valid.

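The effect can be reproduced on a small subset of EMP_PROJ. The sketch below models relations as Python sets of tuples and joins EMP_PROJ1 with EMP_LOCS on PLOCATION, their only common attribute. The SSN placeholders, project numbers, and hour values are assumptions for illustration.

```python
# EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION); values are illustrative.
emp_proj = {
    ("S1", 1, 32.5, "Smith, John B.", "ProductX", "Bellaire"),
    ("S1", 2, 7.5, "Smith, John B.", "ProductY", "Sugarland"),
    ("S2", 1, 20.0, "English, Joyce A.", "ProductX", "Bellaire"),
    ("S2", 2, 20.0, "English, Joyce A.", "ProductY", "Sugarland"),
}

# Projections that form the bad decomposition.
emp_proj1 = {
    (ssn, pnumber, hours, pname, plocation)
    for ssn, pnumber, hours, _ename, pname, plocation in emp_proj
}
emp_locs = {
    (ename, plocation)
    for _ssn, _pnumber, _hours, ename, _pname, plocation in emp_proj
}

# Natural join: match tuples on the shared attribute PLOCATION.
joined = {
    (ssn, pnumber, hours, ename, pname, plocation)
    for ssn, pnumber, hours, pname, plocation in emp_proj1
    for ename, loc in emp_locs
    if plocation == loc
}

# Tuples in the join result that were never in EMP_PROJ.
spurious = joined - emp_proj

print(len(emp_proj), len(joined), len(spurious))
```

Every original tuple is still present in the join result, but it can no longer be told apart from the spurious ones, which is why the original information is lost.
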
Generation of Spurious Tuples
Result of Emp_Proj1 * Emp_Locs (natural join). Tuples marked with * are spurious.
(SSN, Pnumber, and Hours values are not reproduced here.)

   Pname            Plocation  Ename
   ProductX         Bellaire   Smith, John B.
*  ProductX         Bellaire   English, Joyce A.
   ProductY         Sugarland  Smith, John B.
*  ProductY         Sugarland  English, Joyce A.
*  ProductY         Sugarland  Wong, Franklin T.
   ProductZ         Houston    Narayan, Ramesh K.
*  ProductZ         Houston    Wong, Franklin T.
*  ProductX         Bellaire   Smith, John B.
   ProductX         Bellaire   English, Joyce A.
*  ProductY         Sugarland  Smith, John B.
   ProductY         Sugarland  English, Joyce A.
*  ProductY         Sugarland  Wong, Franklin T.
*  ProductY         Sugarland  Smith, John B.
*  ProductY         Sugarland  English, Joyce A.
   ProductY         Sugarland  Wong, Franklin T.
*  ProductZ         Houston    Narayan, Ramesh K.
   ProductZ         Houston    Wong, Franklin T.
   Computerization  Stafford   Wong, Franklin T.
*  Reorganization   Houston    Narayan, Ramesh K.
   Reorganization   Houston    Wong, Franklin T.

Guideline 4
Avoid relations that contain matching attributes that are not primary keys or foreign keys, because joining on such attributes may produce spurious tuples.

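For contrast, a decomposition whose relations share a key attribute can be joined back without spurious tuples. In this sketch the relation EMP_NAMES(SSN, ENAME) is a hypothetical replacement for EMP_LOCS. SSN is its primary key and acts as a foreign key in EMP_PROJ1. All data values are invented.

```python
# EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION); values are illustrative.
emp_proj = {
    ("S1", 1, 32.5, "Smith, John B.", "ProductX", "Bellaire"),
    ("S1", 2, 7.5, "Smith, John B.", "ProductY", "Sugarland"),
    ("S2", 1, 20.0, "English, Joyce A.", "ProductX", "Bellaire"),
    ("S2", 2, 20.0, "English, Joyce A.", "ProductY", "Sugarland"),
}

emp_proj1 = {
    (ssn, pnumber, hours, pname, plocation)
    for ssn, pnumber, hours, _ename, pname, plocation in emp_proj
}
# Hypothetical relation EMP_NAMES(SSN, ENAME): SSN determines ENAME.
emp_names = {
    (ssn, ename)
    for ssn, _pnumber, _hours, ename, _pname, _plocation in emp_proj
}

# Join on SSN, a (foreign key, primary key) pair.
rejoined = {
    (ssn, pnumber, hours, ename, pname, plocation)
    for ssn, pnumber, hours, pname, plocation in emp_proj1
    for key, ename in emp_names
    if ssn == key
}

print(rejoined == emp_proj)
```

Each EMP_PROJ1 tuple matches exactly one EMP_NAMES tuple, so the join returns the original relation and nothing else.
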
Conclusions
1. Anomalies cause redundant work during insertion and modification, and loss of information during deletion.
2. NULLs cause wasted storage space and difficulty in performing aggregate operations.
3. Joins on improperly related base relations generate invalid and spurious data.