CSE2132 Database Systems
Week 6 Lecture: Normalization
Normalization 6.1
Week 5 Lecture Review: Logical Database Design Steps
1. The conceptual model (ER diagram) is mapped onto a logical model that depends on the DBMS characteristics.
2. De-normalization (optimize for efficiency):
   - Combine tables to avoid joins
   - Create more tables (horizontal and vertical partitioning)
   - Data replication (redundancy)
   - A combination of the above
Normalized relations solve data maintenance problems and minimize redundancy but, if implemented directly as physical records, may not yield efficient data processing.
NB: Use de-normalization only to gain processing speed when other design actions are not sufficient!
Normalization 6.2
Goal of Relational Design
What relations (tables) should exist, and what attributes (columns) should they contain?
- Avoid redundancy if possible (minimize storage space)
- Avoid anomalies (data that does not make business sense)
- Avoid nulls
- Avoid joins that produce spurious (false) tuples (rows)
Normalization 6.3
Dependency Theory
"One truly scientific part of the field [of database design]" (Date, 5th ed., p. 325)
Relational database design: a mechanical approach to producing a database schema with certain desirable properties.
What follows is a review of normal forms and the problems they solve.
Normalization 6.4
Data Normalization
- Normalization is a formal process for deciding which attributes should be grouped together.
- It is primarily a tool/technique to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data.
- It provides a formal measure of why one grouping of attributes may be better than another.
- Each normal form requires that a relation satisfy the criteria for that normal form, and each eliminates a different kind of redundancy.
- Database operations applied to unnormalized relations may lead to anomalies.
- Normalized relations remain consistent following database operations and store each fact only once.
Normalization 6.5
Assumptions
- A group of attributes has a natural "inherent" structure.
- This structure is independent of the way the data is used.
Normalization
- Introduced by E. F. Codd together with relational database theory.
- Originally Codd defined three normal forms.
- These were later extended with Boyce-Codd normal form and the fourth and fifth normal forms.
Normalization 6.6
Anomalies
Consider the poorly structured relation ASSIGN:

Person_Id  Project_budget  Project_Id  Time_Spent_on_Project
S75        32              P1          7
S75        40              P2          8
S79        32              P1          4
S79        27              P3          1
S80        40              P2          5
-          17              P4          -

Null values are considered to be anomalies.
Normalization 6.7
Anomalies
Insertion anomaly:
  add tuple (ASSIGN, <S85,35,P1,9>)
  - two conflicting budgets for P1
Deletion anomaly:
  delete tuple (ASSIGN, <S79,27,P3,1>)
  - removes the project budget for P3
Normalization 6.8
Anomalies
Update anomaly:
  update tuple (ASSIGN, <S75,32,P1,7>, <S75,35,P1,7>)
This example tries to update the budget for P1, but P1 is also listed in the row for S79. Either the budget is updated in multiple rows, or the relation becomes inconsistent.
Normalization 6.9
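The three anomalies on ASSIGN can be reproduced in a few lines. The following is a minimal Python sketch, not part of the original slides: the helper names `budgets_by_project` and `conflicting_projects` are hypothetical, and the relation is modelled simply as a list of tuples.

```python
# Rows of ASSIGN from slide 6.7:
# (person_id, project_budget, project_id, time_spent); None stands for a null.
ASSIGN = [
    ("S75", 32, "P1", 7),
    ("S75", 40, "P2", 8),
    ("S79", 32, "P1", 4),
    ("S79", 27, "P3", 1),
    ("S80", 40, "P2", 5),
    (None, 17, "P4", None),
]


def budgets_by_project(rows):
    """Map each project id to the set of distinct budgets recorded for it."""
    budgets = {}
    for _person, budget, project, _time in rows:
        budgets.setdefault(project, set()).add(budget)
    return budgets


def conflicting_projects(rows):
    """Return projects that have more than one recorded budget."""
    return {p: sorted(b) for p, b in budgets_by_project(rows).items() if len(b) > 1}


# Insertion anomaly: the new tuple carries a second budget for P1.
after_insert = ASSIGN + [("S85", 35, "P1", 9)]

# Deletion anomaly: removing the only P3 row also removes P3's budget.
after_delete = [r for r in ASSIGN if r != ("S79", 27, "P3", 1)]

# Update anomaly: only one of the two P1 rows is changed.
after_update = [
    ("S75", 35, "P1", 7) if r == ("S75", 32, "P1", 7) else r for r in ASSIGN
]

print(conflicting_projects(ASSIGN))               # {}
print(conflicting_projects(after_insert))         # {'P1': [32, 35]}
print("P3" in budgets_by_project(after_delete))   # False
print(conflicting_projects(after_update))         # {'P1': [32, 35]}
```

In each case the problem arises because the project budget is stored once per assignment rather than once per project.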
Normalization and Functional Dependencies
Normalization is based on the analysis of functional dependencies.
A functional dependency is a constraint between two attributes or two sets of attributes.
Normalization 6.10
Functional Dependencies
- The values of one set of attributes determine the value of another attribute.
- The value of X determines the value of Y.
- The value of Y is functionally dependent on the value of X.
- Y is a fact about X.
- The simplest case is one attribute determining another single attribute.
- Often two or three attributes are needed to determine another single attribute.
X -> Y
Normalization 6.11
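One way to make "X determines Y" concrete is to test it against sample rows. The sketch below is an illustration only: `fd_holds` is a hypothetical helper, and the data is the five fully populated rows of ASSIGN from slide 6.7. Note that a functional dependency is a rule about all possible data, so a sample can refute a dependency but can never prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in the given rows.

    rows is a list of dicts; lhs and rhs are tuples of attribute names.
    The dependency fails if two rows agree on lhs but differ on rhs.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(x, y) != y:
            return False
    return True


ASSIGN = [
    {"person_id": "S75", "project_budget": 32, "project_id": "P1", "time_spent": 7},
    {"person_id": "S75", "project_budget": 40, "project_id": "P2", "time_spent": 8},
    {"person_id": "S79", "project_budget": 32, "project_id": "P1", "time_spent": 4},
    {"person_id": "S79", "project_budget": 27, "project_id": "P3", "time_spent": 1},
    {"person_id": "S80", "project_budget": 40, "project_id": "P2", "time_spent": 5},
]

# One attribute determines another.
budget_fd = fd_holds(ASSIGN, ("project_id",), ("project_budget",))
# Two attributes together determine a third.
time_fd = fd_holds(ASSIGN, ("person_id", "project_id"), ("time_spent",))
# Not a dependency: S75 appears with budgets 32 and 40.
person_fd = fd_holds(ASSIGN, ("person_id",), ("project_budget",))

print(budget_fd, time_fd, person_fd)  # True True False
```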
Functional Dependencies
Referring to slide 6.7:
  Project_Id -> Project_budget
  Person_Id, Project_Id -> Time_Spent_on_Project
Alternative representation: functional dependency diagram
  [Diagram: Project_Id -> Project_budget]
Normalization 6.12
Task: Write down all the functional dependencies
EMPLOYEE1(Emp_id, Name, birthdate, salary)
Answer:
EMPLOYEE2(Emp_id, Course_id, Name, salary, date_completed)
Answer:
Normalization 6.13
First Normal Form (1NF)
A table is in 1NF if:
- it contains no repeating groups (i.e. no multi-valued attributes)
- every attribute is atomic
(The relational model does not handle repeating groups.)
The relationship between key and non-key fields will be one-to-one (1:1) or one-to-many (1:N).
Normalization 6.14
First Normal Form (Example)
Remove repeating groups.
All occurrences in a relation must have the same number of fields.
Relation:
  STUDENT(STUD#, SNAME (SUBCODE, TITLE, RESULT))
1NF relations:
  STUDENT(STUD#, SNAME)
  STUDENT-RESULT(STUD#, SUBCODE, TITLE, RESULT)
Normalization 6.15
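Removing the repeating group can be sketched as flattening nested records into two flat relations. The Python below is an illustration under stated assumptions: the student names, subject codes and results are made-up sample data, and `to_1nf` is a hypothetical helper name.

```python
# Hypothetical sample data: each student carries a repeating group of
# (SUBCODE, TITLE, RESULT) entries, so records have varying numbers of fields.
students_nested = [
    {
        "stud_no": 1,
        "sname": "Ann",
        "results": [
            ("CSE2132", "Database Systems", "D"),
            ("CSE1301", "Programming", "C"),
        ],
    },
    {
        "stud_no": 2,
        "sname": "Bob",
        "results": [("CSE2132", "Database Systems", "P")],
    },
]


def to_1nf(nested):
    """Split nested student records into STUDENT and STUDENT-RESULT rows."""
    # STUDENT(STUD#, SNAME): one row per student.
    student = [(s["stud_no"], s["sname"]) for s in nested]
    # STUDENT-RESULT(STUD#, SUBCODE, TITLE, RESULT): one row per result,
    # with STUD# copied in so each row can be traced back to its student.
    student_result = [
        (s["stud_no"], subcode, title, result)
        for s in nested
        for (subcode, title, result) in s["results"]
    ]
    return student, student_result


student, student_result = to_1nf(students_nested)
print(student)              # [(1, 'Ann'), (2, 'Bob')]
print(len(student_result))  # 3
```

Every row in each resulting relation now has the same number of fields, which is what 1NF requires.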
Second Normal Form
A relation is in 2NF if:
- it is in 1NF, and
- every non-key attribute is fully functionally dependent on the whole key.
Problems with relations not in 2NF:
- repeated information
- update anomalies
- potential inconsistency
- delete anomalies
Normalization 6.16
Second Normal Form (Example)
Remove partial dependencies.
A non-key attribute must not be determined by only part of a composite key.
Not in 2NF:
  ORDER-ITEM(ORDER#, ITEM#, DESC, QTY)
2NF relations:
  ORDER-ITEM(ORDER#, ITEM#, QTY)
  ITEM(ITEM#, DESC)
Normalization 6.17
Anomalies due to Partial Dependencies
ORDER-ITEM
ORDER#  ITEM#  DESC    QTY
27      873    NUT     2
28      402    BOLT    1
28      873    NUT     10
30      495    WASHER  50

UPDATE - DESC must be changed in many places
DELETE - data for an ITEM is lost when the ORDER is deleted
INSERT - cannot create a new ITEM until an ORDER requires that ITEM
Normalization 6.18
Solution to 2NF Anomalies
ORDER-ITEM
ORDER#  ITEM#  QTY
27      873    2
28      402    1
28      873    10
30      495    50
Delete ORDER# 30 and WASHER still remains.

ITEM
ITEM#  DESC
873    NUT
402    BOLT
495    WASHER
Add a new ITEM at any time.
Update BOLT in one place only.
Normalization 6.19
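The split into ORDER-ITEM and ITEM can be viewed as two projections of the original relation, and joining them back should give exactly the original rows. The following Python sketch illustrates this; `project` and `natural_join` are hypothetical helpers written for this example, not a library API.

```python
ORDER_ITEM = [
    {"order_no": 27, "item_no": 873, "desc": "NUT", "qty": 2},
    {"order_no": 28, "item_no": 402, "desc": "BOLT", "qty": 1},
    {"order_no": 28, "item_no": 873, "desc": "NUT", "qty": 10},
    {"order_no": 30, "item_no": 495, "desc": "WASHER", "qty": 50},
]


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows."""
    result = []
    for row in rows:
        reduced = {a: row[a] for a in attrs}
        if reduced not in result:
            result.append(reduced)
    return result


def natural_join(left, right):
    """Combine rows that agree on every attribute the two relations share."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


# The two 2NF relations from the slide.
order_item_2nf = project(ORDER_ITEM, ("order_no", "item_no", "qty"))
item = project(ORDER_ITEM, ("item_no", "desc"))

# NUT is now stored once instead of twice.
print(len(item))  # 3

# Joining on ITEM# rebuilds the original four rows, so nothing was lost.
rejoined = natural_join(order_item_2nf, item)
print(len(rejoined))  # 4

# Deleting order 30 no longer loses the description of item 495.
remaining_orders = [r for r in order_item_2nf if r["order_no"] != 30]
print(any(r["desc"] == "WASHER" for r in item))  # True
```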
Third Normal Form
A functional dependency between two (or more) non-key attributes gives rise to a transitive dependency.
A relation is in 3NF if:
- it is in 2NF, and
- it contains no transitive dependencies.
3NF is violated when a non-key field is a fact about another non-key field (that is, a functional dependency exists between them).
Problems with relations not in 3NF:
- as for 2NF
Normalization 6.20
Third Normal Form (Example)
The functional dependency between the non-key attributes DEPT# and DNAME gives rise to a transitive dependency (EMP# -> DNAME).
Remove transitive dependencies.
A non-key attribute must not be determined by another non-key attribute.
Not in 3NF:
  EMPLOYEE(EMP#, ENAME, DEPT#, DNAME)
3NF relations:
  EMPLOYEE(EMP#, ENAME, DEPT#)
  DEPARTMENT(DEPT#, DNAME)
EMP# -> DEPT# and DEPT# -> DNAME, therefore EMP# -> DNAME (transitively).
Normalization 6.21
Anomalies due to Transitive Dependencies
EMPLOYEE
EMP#  ENAME  DEPT#  DNAME
10    SMITH  D5     EDP
20    JONES  D7     FINANCE
25    SMITH  D7     FINANCE
30    BLACK  D8     SALES

UPDATE - DNAME must be changed in many places
DELETE - data for a DEPT is lost when the last EMP for that DEPT is deleted
INSERT - cannot create a new DEPT until an EMP starts in that DEPT
Normalization 6.22
Solution to 3NF Anomalies
EMPLOYEE
EMP#  ENAME  DEPT#
10    SMITH  D5
20    JONES  D7
25    SMITH  D7
30    BLACK  D8
DELETE the last EMP and the DEPT still remains.

DEPARTMENT
DEPT#  DNAME
D5     EDP
D7     FINANCE
D8     SALES
ADD a new DEPT at any time.
UPDATE DNAME once.
Normalization 6.23
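The three operations on the 3NF relations can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table and column names are lower-case renderings chosen for this example, and the new values 'ACCOUNTS', 'D9' and 'MARKETING' are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (
        dept_no TEXT PRIMARY KEY,
        dname   TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        ename   TEXT NOT NULL,
        dept_no TEXT NOT NULL REFERENCES department(dept_no)
    );
""")
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [("D5", "EDP"), ("D7", "FINANCE"), ("D8", "SALES")],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(10, "SMITH", "D5"), (20, "JONES", "D7"), (25, "SMITH", "D7"), (30, "BLACK", "D8")],
)

# UPDATE DNAME once: a single row changes, yet both D7 employees see it.
conn.execute("UPDATE department SET dname = 'ACCOUNTS' WHERE dept_no = 'D7'")

# DELETE the last EMP of D5: the department row is untouched.
conn.execute("DELETE FROM employee WHERE emp_no = 10")

# ADD a new DEPT before it has any employees.
conn.execute("INSERT INTO department VALUES ('D9', 'MARKETING')")

rows = conn.execute("""
    SELECT e.emp_no, e.ename, d.dname
    FROM employee AS e
    JOIN department AS d ON d.dept_no = e.dept_no
    ORDER BY e.emp_no
""").fetchall()
dept_names = [
    r[0] for r in conn.execute("SELECT dname FROM department ORDER BY dept_no")
]

print(rows)        # [(20, 'JONES', 'ACCOUNTS'), (25, 'SMITH', 'ACCOUNTS'), (30, 'BLACK', 'SALES')]
print(dept_names)  # ['EDP', 'ACCOUNTS', 'SALES', 'MARKETING']
```

The join reproduces the original four-column view whenever it is needed, while DNAME is stored only once per department.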
A Simple Test for 3NF
Each attribute should depend on:
- the key,
- the whole key,
- and nothing but the key
(so help me Codd).
Normalization 6.24
Steps in Normalization
Example Problem
Consider the following poorly formed relation. The HR department wishes to keep track of employees, departments, jobs and employee job assignments. The primary key of the relation is underlined.
  ASSIGNMENT(EMP-ID, JOB-CODE, DEPT-NO, EMP_NAME, JOB-DESCR, DATE_JOB_ASSIGNED, DEPT-DESC)
It is known that EMP-ID functionally determines EMP_NAME and DEPT-NO, that DEPT-NO functionally determines DEPT-DESC, and that JOB-CODE functionally determines JOB-DESCR.
The system also needs to keep track of the date on which a specific employee was assigned to a specific job. An employee can be assigned to more than one job over time.
Normalization 6.26
The Question
[1] In what normal form (if any) is the relation as it appears above?
[2] Rewrite the above relation as a number of relations, all of which are in third normal form. (It is not required to write down relations in 1st or 2nd normal form.)
Normalization 6.27
One Approach to Solving
- Draw a data structure diagram (DSD) that is a best guess as to the final relations.
- Identify the primary key in each relation.
- Make sure each attribute is functionally dependent on the primary key attribute(s).
- Check that a foreign key is present (at the many end) if the relation is related to some other relation.
- Scan the resulting DSD for any omitted relationships, repeating groups, partial dependencies or transitive dependencies.
- If relationships have been omitted, include them.
- If repeating groups, partial dependencies or transitive dependencies are present, break down the offending relation further.
Normalization 6.28
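An Answer
[1] It is in first normal form, as there are no repeating groups. It is not in 2NF: EMP_NAME, for example, depends on EMP-ID alone, which is only part of the key.
[2]
  EMPLOYEE(EMP-ID, EMP_NAME, DEPT-NO)
  JOB(JOB-CODE, JOB-DESCR)
  ASSIGNMENT(EMP-ID, JOB-CODE, DATE_JOB_ASSIGNED)
  DEPARTMENT(DEPT-NO, DEPT-DESC)
[Diagram: data structure diagram relating EMPLOYEE, JOB, DEPT and ASSIGNMENT]
Normalization 6.29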
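The reasoning behind this answer can be checked with an attribute closure computation. The Python sketch below is an illustration, not the lecture's own method: `closure` is a hypothetical helper, and the last dependency in `FDS` (EMP-ID and JOB-CODE together determine DATE_JOB_ASSIGNED) is an assumption read from the problem statement rather than a dependency it states explicitly.

```python
ALL_ATTRS = {
    "EMP-ID", "JOB-CODE", "DEPT-NO", "EMP_NAME",
    "JOB-DESCR", "DATE_JOB_ASSIGNED", "DEPT-DESC",
}

# Each pair is (determinant, dependent attributes).
FDS = [
    ({"EMP-ID"}, {"EMP_NAME", "DEPT-NO"}),
    ({"DEPT-NO"}, {"DEPT-DESC"}),
    ({"JOB-CODE"}, {"JOB-DESCR"}),
    # Assumption: one assignment date per employee and job.
    ({"EMP-ID", "JOB-CODE"}, {"DATE_JOB_ASSIGNED"}),
]


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD once its whole determinant is already in the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


key = {"EMP-ID", "JOB-CODE"}

# The pair determines every attribute, so it can serve as the key.
print(closure(key, FDS) == ALL_ATTRS)  # True

# Attributes determined by only one part of the key are partial
# dependencies, which is why the original relation is not in 2NF.
partial = set()
for part in key:
    partial |= closure({part}, FDS) - key
print(sorted(partial))  # ['DEPT-DESC', 'DEPT-NO', 'EMP_NAME', 'JOB-DESCR']

# DEPT-DESC is reached from EMP-ID only through DEPT-NO: a transitive
# dependency, which is why DEPARTMENT is split from EMPLOYEE.
print(closure({"DEPT-NO"}, FDS))
```

Only DATE_JOB_ASSIGNED needs the whole key, which matches the ASSIGNMENT relation in the answer above.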