IS 230Lecture 8Slide 1 Normalization Lecture 9
Slide 2
Lecture 8: Normalization
1. Normalization
2. Data redundancy and anomalies
3. Spurious information
4. Functional dependencies
5. Normalization: first normal form, second normal form, third normal form, Boyce-Codd normal form (BCNF)
6. Normalization methodology: example
Slide 3
1. Normalization
A technique for producing a set of relations with desirable properties, given the data requirements of the applications.
Slide 4
2. Data redundancy and anomalies
Emp-Dept relation
Insertion: How do we insert a new department with no employees yet? (What would the key be?) Entering employees is difficult because the department information must be entered correctly every time.
Deletion: What happens when we delete CC's data? Do we lose department 6?
Modification: If we change the manager of department 5, we must change it in every tuple with Dept# = 5.
Slide 5
3. Spurious information
Avoid breaking up relations in such a way that spurious information is created.
A relation may be broken into smaller relations such that, joining them back together, we get NEW TUPLES that were not in the original relation!
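The slide's own tables are not available here, so the following is only a minimal sketch of the effect, using a hypothetical relation (name, project, location) and made-up rows. It splits the relation on a non-key attribute and joins the pieces back together.

```python
# Hypothetical relation (name, project, location); the data is invented for illustration.
original = {
    ("Smith", "ProductX", "Bellaire"),
    ("Wong", "ProductY", "Bellaire"),
}

# Break the relation up around the non-key attribute "location".
emp_loc = {(name, loc) for name, _, loc in original}
proj_loc = {(proj, loc) for _, proj, loc in original}

# Natural join of the two pieces on location.
joined = {
    (name, proj, loc)
    for name, loc in emp_loc
    for proj, loc2 in proj_loc
    if loc == loc2
}

# Tuples that appear after the join but were never in the original relation.
spurious = joined - original
print(sorted(spurious))
```

Because location does not identify a single employee or project, the join pairs every employee at a location with every project at that location.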
Slide 6
4. Functional dependencies
Formal concepts that may be used to exhibit the “goodness” and “badness” of individual relational schemas, and to describe relationships between attributes.
Examples of functional dependency:
Name, DateOfBirth and Dept all depend on NI.
Dname and Manager depend on Dept.
ProjLocation depends on ProjName.
Slide 7
Functional Dependence
An attribute X of a relation is functionally dependent on attributes A, B, ..., N if the same values of A, ..., N are always associated with the same value of X:
{A, …, N} → X
{A, ..., N} is called the determinant of the functional dependency.
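The same definition can be stated in terms of tuples. In the sketch below, r stands for any legal instance of the relation and t[...] for the values of a tuple t on the listed attributes; both symbols are notation introduced here, not taken from the slides.

```latex
\[
\{A,\dots,N\} \rightarrow X
\quad\Longleftrightarrow\quad
\forall\, t_1, t_2 \in r:\;
t_1[A,\dots,N] = t_2[A,\dots,N] \;\Longrightarrow\; t_1[X] = t_2[X]
\]
```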
Slide 8
Full Functional Dependency
X fully depends on A, …, N if it is not dependent on any proper subset of A, ..., N; otherwise we talk of partial dependency.
e.g. Age is dependent on {NI, Name} but only fully dependent on NI.
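As a rough illustration of the Age example, the sketch below tests a dependency against sample rows and then checks whether any smaller determinant also works. The function names and the three rows are hypothetical, and a check against sample data can only show that a dependency is not violated by that data, not that it holds in general.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def is_full(rows, lhs, rhs):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False
    return True

# Invented sample data: two people share a name, two share an age.
people = [
    {"NI": "AB1", "Name": "Ann", "Age": 30},
    {"NI": "AB2", "Name": "Ann", "Age": 41},
    {"NI": "AB3", "Name": "Bob", "Age": 30},
]

print(holds(people, ["NI", "Name"], ["Age"]))    # dependency holds
print(is_full(people, ["NI", "Name"], ["Age"]))  # but it is only partial
print(is_full(people, ["NI"], ["Age"]))          # full dependency on NI alone
```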
Slide 9
5. Normalization
Process: taking a set of relations and decomposing them into more relations that satisfy some criteria.
Decomposition: essentially a series of projections, chosen so that the original data can be reconstituted using joins.
Normal form: the form of the relations which satisfy the criteria.
Slide 10
First Normal Form (1)
A relation is in first normal form (1NF) if all values are atomic, i.e. single values such as small strings and numbers.
Identify and remove repeating groups (multi-valued attributes).
Slide 11
First Normal Form (2)
DEPARTMENT (with a multi-valued Locations attribute)
Two ways of normalising this:
1. Have a tuple for each location of each department.
2. Have a separate relation for (Dnumber, Locations) pairs.
The latter is better as it avoids redundancy.
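The two options can be sketched as follows. The department rows are hypothetical, since the slide's DEPARTMENT table is not available, and only the attribute names Dnumber and Locations come from the slide.

```python
# Hypothetical unnormalised data: Locations holds a list, so it is not atomic.
departments = [
    {"Dnumber": 5, "Dname": "Research", "Locations": ["Bellaire", "Houston"]},
    {"Dnumber": 4, "Dname": "Admin", "Locations": ["Stafford"]},
]

# Option 1: one tuple per location of each department (Dname is repeated).
flat = [
    (d["Dnumber"], d["Dname"], loc)
    for d in departments
    for loc in d["Locations"]
]

# Option 2: keep the department once and move locations to their own relation.
department = [(d["Dnumber"], d["Dname"]) for d in departments]
dept_locations = [(d["Dnumber"], loc) for d in departments for loc in d["Locations"]]

print(flat)
print(department)
print(dept_locations)
```

In option 1 the name "Research" is stored once per location, which is the redundancy that option 2 avoids.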
Slide 12
First Normal Form (3)
Example of a composite attribute: Room is not atomic, because Room includes Deptname and Room#.
Slide 13
Second Normal Form
By the definition of the primary key, every other attribute is functionally dependent on it.
If all the other attributes are fully functionally dependent on the primary key, then the relation is in Second Normal Form (2NF).
Clearly, any relation with a single-attribute primary key will be in 2NF.
If there are two primary key attributes, A and B, then each other attribute is either dependent on A alone, dependent on B alone, or dependent on both.
Normalizing to 2NF consists of creating a separate relation for each of the three cases.
Slide 14
Example of 2NF decomposition
EMP_PROJ(SSN, Pnumber, Hours, Ename, Pname, Ploc)
Functional dependencies:
SSN, Pnumber → Hours
SSN → Ename
Pnumber → Pname, Ploc
Decomposition into three 2NF relations:
Work(SSN, Pnumber, Hours)
EMP(SSN, Ename)
Project(Pnumber, Pname, Ploc)
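One way to arrive at this grouping mechanically is to compute, for each part of the key, which non-key attributes it determines. The sketch below does this with an attribute-closure helper; `closure` and `split_by_key_part` are hypothetical names introduced here, and only the attribute names and dependencies come from the slide.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def split_by_key_part(key, fds):
    """Map each part of the key to the non-key attributes it determines.

    Smaller key parts are tried first, so an attribute is assigned to the
    smallest part of the key that determines it.
    """
    relations = {}
    assigned = set(key)
    for size in range(1, len(key) + 1):
        for part in combinations(key, size):
            determined = closure(part, fds) - assigned
            if determined:
                relations[part] = sorted(determined)
                assigned |= determined
    return relations

# Dependencies of EMP_PROJ as listed on the slide.
fds = [
    ({"SSN", "Pnumber"}, {"Hours"}),
    ({"SSN"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Ploc"}),
]

relations = split_by_key_part(("SSN", "Pnumber"), fds)
for part, attrs in relations.items():
    print(part, "->", attrs)
```

The three entries correspond to EMP, Project and Work respectively.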
Slide 15
Third Normal Form
Third Normal Form eliminates transitive dependencies, i.e. those dependencies which hold only because of some intermediary.
An attribute is transitively dependent on the primary key if there is some other attribute on which it is dependent and which is, in turn, dependent on the key.
For example, Dname is dependent on NI, but only because it is dependent on Dnumber which is, in turn, dependent on NI.
Non-3NF relations are likely to hold redundant information.
A relation is in 3NF if, for any pair of attributes A and B such that A → B, there is no attribute X such that A → X and X → B.
Normalizing this would create a separate relation, keyed on Dnumber, that holds Dname.
Slide 16
Example of 3NF decomposition
EMP_DEPT(SSN, Ename, Bdate, Address, Dnum, Dname, Dman)
Functional dependencies:
SSN → Ename, Bdate, Address, Dnum
Dnum → Dname, Dman
Decomposition into two 3NF relations:
EMPLOYEE(SSN, Ename, Bdate, Address, Dnum)
DEPT(Dnum, Dname, Dman)
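A small sketch can show that this decomposition loses nothing: projecting onto the two relations and joining them on Dnum gives back exactly the original tuples. The rows are invented, and Bdate and Address are left out to keep the example short.

```python
# Hypothetical EMP_DEPT rows: (SSN, Ename, Dnum, Dname, Dman).
# Bdate and Address are omitted for brevity.
emp_dept = [
    ("111", "Smith", 5, "Research", "222"),
    ("222", "Wong", 5, "Research", "222"),
    ("333", "Zelaya", 4, "Admin", "333"),
]

# Projections onto the two 3NF relations.
employee = {(ssn, ename, dnum) for ssn, ename, dnum, _, _ in emp_dept}
dept = {(dnum, dname, dman) for _, _, dnum, dname, dman in emp_dept}

# Natural join on Dnum reconstructs the original relation.
rejoined = {
    (ssn, ename, dnum, dname, dman)
    for ssn, ename, dnum in employee
    for dnum2, dname, dman in dept
    if dnum == dnum2
}

print(rejoined == set(emp_dept))  # no tuples lost, none invented
print(len(dept))                  # each department is now stored once
```

The join is safe here because Dnum is the key of DEPT, so each employee matches exactly one department row.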
Slide 17
General Definitions of Normal Forms
Slide 18
Boyce-Codd Normal Form
A relation is in BCNF if, for every nontrivial functional dependency X → Y that holds on it, X is a superkey.
Every relation in BCNF is also in 3NF.
A relation in 3NF is not necessarily in BCNF.
A nontrivial FD is one that is not trivial.
A trivial functional dependency X → Y is one in which Y is a subset of X.
Example: A, B → B is a trivial FD.
Most relation schemas that are in 3NF are also in BCNF.
Slide 19
Example of not BCNF
The following is not in BCNF:
bor_loan = (customer_id, loan_number, amount)
The following is a functional dependency that may hold:
loan_number → amount
but loan_number is not a superkey of bor_loan.
We decompose into two relations:
R1 = (customer_id, loan_number)
R2 = (loan_number, amount)
R1 and R2 are in BCNF.
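The superkey test in this example can be sketched with an attribute closure: a determinant is a superkey of a relation if its closure covers every attribute of that relation. The helper names below are hypothetical. As a simplification, the check only looks at the listed dependencies rather than every dependency they imply, which is enough for this small example but is not a complete BCNF test in general.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Listed nontrivial dependencies on this relation whose determinant is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        # The dependency matters here only if its determinant lies in the relation
        # and it determines at least one other attribute of the relation.
        applies = lhs <= attributes and (rhs & attributes) - lhs
        if applies and not attributes <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations

fds = [({"loan_number"}, {"amount"})]

bor_loan = {"customer_id", "loan_number", "amount"}
r1 = {"customer_id", "loan_number"}
r2 = {"loan_number", "amount"}

print(bcnf_violations(bor_loan, fds))  # loan_number -> amount is reported
print(bcnf_violations(r1, fds))        # empty
print(bcnf_violations(r2, fds))        # empty
```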
Slide 20
R = (A, B, C, D, E, F)
A, B → D
B → E (not in BCNF)
D → F (not in 3NF)
Slide 21
Normalization Methodology: Example
Consider the following description of a company:
The company is divided into departments. Each department is identified by its department number. A department has a name and a manager (an employee).
The employees of the company are identified by their National Insurance number. An employee has a name, an address and an age, and works in one department only.
An employee is supervised by several supervisors (employees), and a supervisor can supervise several employees (supervisees).
An employee can have dependents, where each dependent is described by his/her name and age.
The company has a number of running projects. A project is identified by its project number. A project also has a name and a description.
Several employees can work on a project, and an employee can work on several projects, each for a fixed number of hours.
Slide 22
Functional dependencies
An attribute X of a relation is functionally dependent on attributes A, B, ..., N if the same values of A, ..., N are always associated with the same value of X:
{A, …, N} → X
Example:
A  B
1  4
1  5
3  7
A → B does NOT hold; B → A does hold.
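The two claims about this small table can be checked directly. The sketch below assumes the table's rows are (1, 4), (1, 5) and (3, 7), as read from the slide; `fd_holds` is a hypothetical helper name.

```python
# Rows of the example table as (A, B) pairs.
rows = [(1, 4), (1, 5), (3, 7)]
A, B = 0, 1  # column positions

def fd_holds(rows, lhs, rhs):
    """True if each value in column lhs is paired with only one value in column rhs."""
    mapping = {}
    for row in rows:
        if mapping.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

print(fd_holds(rows, A, B))  # A = 1 appears with both B = 4 and B = 5
print(fd_holds(rows, B, A))  # every B value appears with a single A value
```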
Slide 23
Functional dependencies (Cont.)
The company is divided into departments. Each department is identified by its department number. A department has a name and a manager (an employee).
Dnumber → Dname, Manager
The employees of the company are identified by their National Insurance number. An employee has a name, an address and an age, and works in one department only.
NI → Ename, Address, Eage, Dnumber
Slide 24
Functional dependencies (Cont.)
An employee is supervised by several supervisors (employees), and a supervisor can supervise several employees (supervisees).
NI → Supervisor is not a valid functional dependency, since an employee can have several supervisors.
An employee can have dependents, where each dependent is described by his/her name and age.
NI, DepName → DepAge
Slide 25
Functional dependencies (Cont.)
The company has a number of running projects. A project is identified by its project number. A project also has a name and a description.
Pno → Pname, Description
Several employees can work on a project, and an employee can work on several projects, each for a fixed number of hours.
Pno, NI → Hours
Slide 26
All Functional dependencies
Dnumber → Dname, Manager
NI → Ename, Address, Eage, Dnumber
NI, DepName → DepAge
Pno → Pname, Description
Pno, NI → Hours
The Universal Relation:
U(Dnumber, Dname, Manager, NI, Ename, Address, Eage, Pno, Pname, Description, DepName, DepAge, Hours, Supervisor)
Slide 27
The Primary key
What is the primary key? (Dnumber, NI, Pno, DepName)?
Supervisor must be part of the primary key because it is a multivalued attribute.
DepName must be part of the primary key since an employee can have several dependents.
Dnumber is not part of the primary key since it can be derived from NI.
Thus the primary key is (NI, Pno, DepName, Supervisor).
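This reasoning can be double-checked with an attribute closure over the functional dependencies listed for the company: a set of attributes is a key of U if its closure is all of U and no attribute can be dropped from it. The `closure` helper is a hypothetical name introduced for this sketch.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

U = {
    "Dnumber", "Dname", "Manager", "NI", "Ename", "Address", "Eage",
    "Pno", "Pname", "Description", "DepName", "DepAge", "Hours", "Supervisor",
}

fds = [
    ({"Dnumber"}, {"Dname", "Manager"}),
    ({"NI"}, {"Ename", "Address", "Eage", "Dnumber"}),
    ({"NI", "DepName"}, {"DepAge"}),
    ({"Pno"}, {"Pname", "Description"}),
    ({"Pno", "NI"}, {"Hours"}),
]

key = {"NI", "Pno", "DepName", "Supervisor"}

# The proposed key determines every attribute of U ...
covers_u = closure(key, fds) == U
# ... and stops doing so as soon as any one attribute is removed.
minimal = all(closure(key - {a}, fds) != U for a in key)
# The first guess on the slide fails to reach Supervisor.
first_guess_missing = U - closure({"Dnumber", "NI", "Pno", "DepName"}, fds)

print(covers_u, minimal, first_guess_missing)
```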
Slide 28
Is U in 1NF?
A relation is in first normal form (1NF) if all values are atomic, i.e. single values.
Supervisor is a multivalued attribute, hence U is not in 1NF.
To remove the multivalued attribute, the relation U is decomposed as follows:
U1(Dnumber, Dname, Manager, NI, Ename, Address, Eage, Pno, Pname, Description, DepName, DepAge, Hours)
Supervise(NI, Supervisor)
Slide 29
Is U in 2NF?
A relation is in Second Normal Form (2NF) if all the non-key attributes are fully functionally dependent on the primary key. A relation with a single-attribute primary key is in 2NF.
Supervise is in 2NF.
U1 is not in 2NF because some attributes are only partially functionally dependent on the primary key (e.g. DepAge, Ename, etc.).
We decompose U1 into the following 2NF relations:
Dept_Emp(Dnumber, Dname, Manager, NI, Ename, Address, Eage)
Dependent(NI, DepName, DepAge)
Project(Pno, Pname, Description)
Work(NI, Pno, Hours)
Slide 30
Is the previous decomposition in 3NF?
A relation is in 3NF if, for any pair of attributes A and B such that A → B, there is no attribute X such that A → X and X → B.
Supervise is in 3NF.
Among the above relations, only Dept_Emp is not in 3NF, since it has transitive dependencies (e.g. NI → Dnumber → Dname).
We therefore decompose the relation into the following two relations, which are in 3NF:
Department(Dnumber, Dname, Manager)
Employee(NI, Ename, Address, Eage, Dnumber)
Slide 31
Complete Schema in 3NF
Department(Dnumber, Dname, Manager)
Employee(NI, Ename, Address, Eage, Dnumber)
Dependent(NI, DepName, DepAge)
Project(Pno, Pname, Description)
Work(NI, Pno, Hours)
Supervise(NI, Supervisor)
Slide 32
End of Chapter