Lecture 9: Functional Dependencies and Normalization for Relational Databases
Ref. Chapter
How to produce a good relation schema?
STEPS:
1. Start with a set of relations.
2. Define the functional dependencies for each relation to specify the PK.
3. Transform the relations to normal form.
Functional Dependencies
A functional dependency describes the relationship between attributes in a relation.
If A and B are attributes of relation R, B is functionally dependent on A, denoted by A -> B, if each value of A is associated with exactly one value of B. A given value of B may be associated with several values of A.
In A -> B, A is the determinant and B is the dependent.
Functional Dependencies
X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y.
For any two tuples t and u in any relation instance r(R): if t[X] = u[X], then t[Y] = u[Y].
In other words, if t and u agree on X, then they must agree on Y.
Functional Dependencies: Example
StaffNo -> position: position is functionally dependent on StaffNo (SL21 -> Manager). This is a 1:1 or M:1 relationship between attributes in a relation.
position -> StaffNo does not hold: StaffNo is NOT functionally dependent on position (Manager -> SL21, SG5). This is a 1:M relationship between attributes in a relation.
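The tuple-based definition can be checked mechanically on a relation instance. The following is a minimal sketch, assuming a relation is represented as a list of Python dictionaries; the function name fd_holds is hypothetical, and the two rows are the staff values from the example above. Note that a single instance can only refute a dependency; it cannot prove that the dependency holds for all time.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by this relation instance."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y for a new x, otherwise returns the stored value
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but differ on Y
    return True


# Sample rows taken from the slide example (assumed representation)
staff = [
    {"StaffNo": "SL21", "position": "Manager"},
    {"StaffNo": "SG5", "position": "Manager"},
]

print(fd_holds(staff, ["StaffNo"], ["position"]))  # StaffNo -> position
print(fd_holds(staff, ["position"], ["StaffNo"]))  # position -> StaffNo
```

The first call reports that the dependency is satisfied, and the second reports a violation because Manager is associated with two staff numbers.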
Examples of FD constraints
- Social security number determines employee name: SSN -> ENAME
- Project number determines project name and location: PNUMBER -> {PNAME, PLOCATION}
- Employee SSN and project number determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Identifying the PK
The purpose of functional dependencies is to specify the set of integrity constraints that must hold on a relation.
The determinant attribute(s) are a candidate key (CK) of the relation if:
- There is a 1:1 relationship between determinant and dependent.
- No subset of the determinant attribute(s) is itself a determinant (nontrivial): if (A, B) -> C, then neither A -> B nor B -> A holds.
- All attributes that are not part of the CK are functionally dependent on the key: CK -> all attributes of R.
- The dependency holds for all time.
The PK is the candidate key with the minimal set of functional dependencies.
Identifying the PK
- If a relation schema has more than one key, each is called a candidate key.
- One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
- A prime attribute is a member of some candidate key.
- A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
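Candidate keys can be derived from a set of FDs by computing attribute closures. The sketch below assumes FDs are stored as (left side, right side) pairs of frozensets and uses the three example FDs given earlier in the lecture; the helper names closure and candidate_keys are hypothetical. It is a brute-force search over attribute subsets, which is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Return the minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


# Schema and FDs assumed from the lecture's examples of FD constraints
attrs = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
fds = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]

keys = candidate_keys(attrs, fds)
prime = set().union(*keys)   # attributes that appear in some candidate key
nonprime = attrs - prime
print(keys, prime, nonprime)
```

Under these assumptions the only candidate key is {SSN, PNUMBER}, so SSN and PNUMBER are the prime attributes and the remaining four attributes are nonprime.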
The Purpose of Normalization
Normalization is a bottom-up approach to database design that begins by examining the relationships between attributes. It is performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form.
Purpose:
- Guarantees no redundancy due to FDs
- Guarantees no update anomalies
Normal Forms:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
Normal Forms Defined Informally
- 1st normal form: all attributes depend on the key
- 2nd normal form: all attributes depend on the whole key
- 3rd normal form: all attributes depend on nothing but the key
First Normal Form (1NF)
Unnormalized form (UNF): a relation that contains one or more repeating groups.
First normal form (1NF): a relation in which the intersection of each row and column contains one and only one value.
1NF disallows:
- composite attributes
- multivalued attributes
- nested relations (attributes whose values for an individual tuple are non-atomic)
First Normal Form (1NF)
CLIENT_PROPERTY, unnormalized form (UNF):
ClientNo   PropertyNo        Name
CR76       PG4, PG16         John Key
CR56       PG4, PG36, PG16   Aline Stewart
The table is not in 1NF because it contains a multivalued attribute (PropertyNo).
UNF -> 1NF: Approach 1
Expand the key so that there is a separate tuple in the original relation for each value of the repeated attribute(s). The primary key becomes the combination of the original primary key and the multivalued attribute, here (ClientNo, PropertyNo).
CLIENT_PROPERTY, 1NF relation:
ClientNo   PropertyNo   Name
CR76       PG4          John Key
CR76       PG16         John Key
CR56       PG4          Aline Stewart
CR56       PG36         Aline Stewart
CR56       PG16         Aline Stewart
Disadvantage: introduces redundancy in the relation.
UNF -> 1NF: Approach 2
If the maximum number of values is known for the attribute, replace the repeated attribute (PropertyNo) with a number of atomic attributes (PropertyNo1, PropertyNo2, PropertyNo3).
CLIENT_PROPERTY, 1NF relation:
ClientNo   PropertyNo1   PropertyNo2   PropertyNo3   Name
CR76       PG4           PG16          NULL          John Key
CR56       PG4           PG36          PG16          Aline Stewart
Disadvantage: introduces NULL values in the relation.
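Both approaches can be sketched in a few lines. This is an illustrative sketch only, assuming the unnormalized relation is held as a list of dictionaries in which PropertyNo is a Python list; the function names expand_rows and widen_columns are hypothetical, and None stands in for NULL.

```python
# CLIENT_PROPERTY in unnormalized form (values taken from the slides)
unf = [
    {"ClientNo": "CR76", "Name": "John Key",
     "PropertyNo": ["PG4", "PG16"]},
    {"ClientNo": "CR56", "Name": "Aline Stewart",
     "PropertyNo": ["PG4", "PG36", "PG16"]},
]


def expand_rows(rows, multi):
    """Approach 1: one output tuple per value of the multivalued attribute."""
    out = []
    for row in rows:
        for value in row[multi]:
            flat = dict(row)      # copy the other attributes (redundantly)
            flat[multi] = value   # replace the list with one atomic value
            out.append(flat)
    return out


def widen_columns(rows, multi, max_values):
    """Approach 2: one column per possible value, padded with None (NULL)."""
    out = []
    for row in rows:
        flat = {k: v for k, v in row.items() if k != multi}
        values = row[multi]
        for i in range(max_values):
            flat[f"{multi}{i + 1}"] = values[i] if i < len(values) else None
        out.append(flat)
    return out


approach1 = expand_rows(unf, "PropertyNo")
approach2 = widen_columns(unf, "PropertyNo", 3)
print(len(approach1), approach2[0])
```

The first result repeats each client's name once per property (redundancy), while the second leaves PropertyNo3 empty for client CR76 (NULL values), matching the disadvantages listed for each approach.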
Summary: First Normal Form (1NF)
A relation is in 1NF if all attribute values are atomic: no repeating groups and no composite attributes.
UNF (multivalued) -> 1NF
UNF (nested relations) -> 1NF
Example: First normal form (1NF)
The following table is not in 1NF because it contains nested relations.
Columns: DPT_NO, MG_NO, EMP_NO, EMP_NM
D   Carl Sagan / Mag James / Larry Bird
D   Jim Carter / Paul Simon
The table is in 1NF when all attribute values are atomic, because there are no repeating groups and no composite attributes.
Columns: DPT_NO, MG_NO, EMP_NO, EMP_NM
D   Carl Sagan
D   Mag James
D   Larry Bird
D   Jim Carter
D   Paul Simon
Second Normal Form
Uses the concepts of FDs and primary key.
Definitions:
- Prime attribute: an attribute that is a member of the primary key K.
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD no longer holds.
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
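Partial dependencies can be found by testing whether any proper subset of the key already determines a nonprime attribute. The sketch below assumes the FDs from the lecture's examples and a composite key {SSN, PNUMBER}; the names closure and partial_dependencies are hypothetical helpers, not a standard API.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, nonprime, fds):
    """List (subset, attribute) pairs where a proper subset of the key
    already determines a nonprime attribute."""
    found = []
    for size in range(1, len(key)):          # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds)
            for attr in sorted(nonprime):
                if attr in determined:
                    found.append((set(subset), attr))
    return found


fds = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]
key = {"SSN", "PNUMBER"}
nonprime = {"HOURS", "ENAME", "PNAME", "PLOCATION"}

partial = partial_dependencies(key, nonprime, fds)
print(partial)
```

HOURS never appears in the result because it needs the whole key, whereas ENAME, PNAME and PLOCATION each depend on only part of it.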
Second Normal Form
Second normal form (2NF) further addresses the removal of duplicated data.
A relation R is in 2NF if:
1. R is in 1NF, and
2. every non-prime attribute is fully functionally dependent on every candidate key.
A prime attribute is one that appears in a candidate key. There is no partial dependency in 2NF.
Moving a relation to 2NF creates new tables that are related to their predecessors through the use of foreign keys.
Summary: Second Normal Form (2NF)
1) Meet all the requirements of 1NF.
2) Remove columns that are not fully dependent upon the primary key.
Example 1: 1NF -> 2NF
Remove partial dependencies by placing the functionally dependent attributes in a new relation along with a copy of their determinants.
Example 2: Second normal form (2NF)
Inventory(Description, Supplier, Cost, Supplier Address), with PK (Description, Supplier)
There are two non-key fields, Cost and Supplier Address. Consider Cost first:
- If I know just Description, can I find out Cost? No, because we have more than one supplier for the same product.
- If I know just Supplier, can I find out Cost? No, because I need to know what the item is as well.
Therefore, Cost is fully functionally dependent upon the ENTIRE PK (Description, Supplier).
Example 2: Second normal form (2NF)
Inventory(Description, Supplier, Cost, Supplier Address), with PK (Description, Supplier)
Now consider Supplier Address:
- If I know just Description, can I find out Supplier Address? No, because we have more than one supplier for the same product.
- If I know just Supplier, can I find out Supplier Address? Yes. The address does not depend upon the description of the item.
Therefore, Supplier Address is NOT functionally dependent upon the ENTIRE PK (Description, Supplier); it depends on Supplier alone and is moved to a new relation Supplier(Name, Supplier Address).
Example 2: Second normal form (2NF)
Inventory(Description, Supplier, Cost)
Supplier(Name, Supplier Address)
The above relations are now in 2NF.
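The decomposition amounts to two projections of the original relation. The sketch below uses made-up sample rows (the slides give no data for this example), keeps the supplier column named Supplier in both relations for simplicity, and introduces hypothetical helpers project and natural_join to show that joining the two 2NF relations gives back the original rows.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    seen = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in seen:
            seen.append(t)
    return seen


def natural_join(left, right, on):
    """Join two relations on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


# Hypothetical sample data, invented for illustration only
inventory_1nf = [
    {"Description": "Widget", "Supplier": "Acme", "Cost": 10,
     "Supplier Address": "1 Main St"},
    {"Description": "Widget", "Supplier": "Bolt Co", "Cost": 12,
     "Supplier Address": "9 Oak Ave"},
    {"Description": "Gasket", "Supplier": "Acme", "Cost": 3,
     "Supplier Address": "1 Main St"},
]

# Attributes that depend on the whole key stay in Inventory
inventory = project(inventory_1nf, ["Description", "Supplier", "Cost"])
# The partially dependent attribute moves out with a copy of its determinant
supplier = project(inventory_1nf, ["Supplier", "Supplier Address"])

rejoined = natural_join(inventory, supplier, "Supplier")
print(len(inventory), len(supplier), rejoined == inventory_1nf)
```

In this sample the address of each supplier is stored once in the Supplier relation instead of once per item supplied, and the join on Supplier plays the role of the foreign key.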
Third Normal Form (1)
Transitive functional dependency: X, Y, Z are attributes of a relation such that if X -> Y and Y -> Z, then Z is transitively dependent on X via Y, provided X is NOT functionally dependent on Y or Z (nontrivial FD).
Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form (2)
A relation schema R is in third normal form (3NF) if:
1. R is in 2NF, and
2. no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency.
E.g., consider EMP(SSN, Emp#, Salary). Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
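The 3NF condition, including the candidate-key exception in the note, can be tested with the commonly used general form of the rule: every nontrivial FD X -> A must have X as a superkey or A as a prime attribute. The sketch below is one way to code that test, not necessarily the procedure intended by the lecture; the function names are hypothetical, the relation name EMP_DEPT and its attribute list are assumptions built from the lecture's SSN, DNUMBER and DMGRSSN example, and only the FDs listed explicitly are inspected.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(all_attrs, fds, candidate_keys):
    """Return (determinant, attribute) pairs that break 3NF:
    the determinant is not a superkey and the attribute is not prime."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == all_attrs
        for attr in sorted(rhs - lhs):       # skip trivial parts of the FD
            if not is_superkey and attr not in prime:
                bad.append((set(lhs), attr))
    return bad


# Assumed relation EMP_DEPT(SSN, ENAME, DNUMBER, DMGRSSN)
emp_dept = {"SSN", "ENAME", "DNUMBER", "DMGRSSN"}
emp_dept_fds = [
    (frozenset({"SSN"}), frozenset({"ENAME", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DMGRSSN"})),
]
v1 = third_nf_violations(emp_dept, emp_dept_fds, [{"SSN"}])

# EMP(SSN, Emp#, Salary) from the note: Emp# is also a candidate key
emp = {"SSN", "Emp#", "Salary"}
emp_fds = [
    (frozenset({"SSN"}), frozenset({"Emp#"})),
    (frozenset({"Emp#"}), frozenset({"SSN", "Salary"})),
]
v2 = third_nf_violations(emp, emp_fds, [{"SSN"}, {"Emp#"}])
print(v1, v2)
```

The first relation reports DNUMBER -> DMGRSSN as a violation, while the second reports none because the intermediate attribute Emp# is itself a candidate key.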
Summary: Third Normal Form (3NF)
1) Meet all the requirements of 1NF.
2) Meet all the requirements of 2NF.
3) Remove columns that are not directly dependent upon the primary key.
Example: 2NF -> 3NF
If transitive dependencies exist, place the transitively dependent attributes in a new relation along with a copy of their determinants.
Example: Third normal form (3NF)
The relation describes parcels of land for sale in various counties of a state. Suppose that there are two candidate keys: Property_id# and {County_name, Lot#}.
- Lot# values are unique only within each county.
- Property_id# values are unique across counties for the entire state.
Example: 2NF -> 3NF
Example: Third normal form (3NF)
Books(Name, Author's Name, Author's Nom de Plume, # of Pages)
- If I know # of Pages, can I find out Author's Name? No. Can I find out Author's Nom de Plume? No.
- If I know Author's Name, can I find out # of Pages? No. Can I find out Author's Nom de Plume? YES.
Therefore, Author's Nom de Plume is functionally dependent upon Author's Name, not upon the PK.
Resulting relations:
Books(Name, Author's Name, # of Pages)
Author(Name, Nom de Plume)
Review Example
STAFF_PROPERTY_INSPECTION (unnormalized relation)
Pno: PG4, PG16
pAddress: Lawrence St, Glasgow; 5 Novar Dr., Glasgow
iDate: 18-Oct, Apr-01, 1-Oct, Apr, Oct-01
iTime: 10:00, 09:00, 12:00, 13:00, 14:00
comments: Replace crockery, Good order, Damp rot, Replace carpet, Good condition
StaffNo: SG37, SG14, SG37
sName: Ann, David, Ann
CarReg: M23JGR, M53HDR, N72HFR, M53HDR, N72HFR
UNF -> 1NF
STAFF_PROPERTY_INSPECTION(Pno, pAddress, iDate, iTime, comments, StaffNo, sName, CarReg) is now in 1NF: the same inspection data, with one atomic value in every row and column position.
1NF -> 2NF
STAFF_PROPERTY_INSPECTION(Pno, pAddress, iDate, iTime, comments, StaffNo, CarReg, sName)
Partial dependency: Pno -> pAddress
1NF -> 2NF
PROPERTY_INSPECTION(Pno, iDate, iTime, comments, StaffNo, CarReg, sName) is in 2NF.
PROPERTY(Pno, pAddress) is in 2NF.
Transitive dependency: StaffNo -> sName
2NF -> 3NF
The resulting relations are all in 3NF:
PROPERTY(Pno, pAddress)
STAFF(StaffNo, sName)
PROPERTY_INSPECT(Pno, iDate, iTime, comments, StaffNo, CarReg)
References
Thomas Connolly, Carolyn Begg. "Database Systems: A Practical Approach to Design, Implementation and Management." 5th Edition, Addison-Wesley.