Normalization Sept. 2012ACS-3902 Yangjun Chen1 Outline: Normalization Chapter 14 – 3rd ed. (Chap. 10 – 4 th, 5 th ed.; Chap. 6, 6 th ed.) Redundant information.

Slides:



Advertisements
Similar presentations
Functional Dependencies and Normalization for Relational Databases
Advertisements

ALAK ROY. Assistant Professor Dept. of CSE NIT Agartala N ATIONAL I NSTITUTE OF T ECHNOLOGY A GARTALA Aug-Dec,2010 Normalization 2 CSE-503 :: D ATABASE.
NORMALIZATION. Normalization Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
Chapter 3 Notes. 3.1 Functional Dependencies A functional dependency is a statement that – two tuples of a relation that agree on some particular set.
Ch 10, Functional Dependencies and Normal forms
Functional Dependencies and Normalization for Relational Databases.
Murali Mani Normalization. Murali Mani What and Why Normalization? To remove potential redundancy in design Redundancy causes several anomalies: insert,
September 24, R McFadyen1 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary.
Normalization Normalization We discuss four normal forms: first, second, third, and Boyce-Codd normal forms 1NF, 2NF, 3NF, and BCNF Normalization.
Functional Dependencies and Normalization for Relational Databases
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
METU Department of Computer Eng Ceng 302 Introduction to DBMS Functional Dependencies and Normalization for Relational Databases by Pinar Senkul resources:
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
1 Functional Dependency and Normalization Informal design guidelines for relation schemas. Functional dependencies. Normal forms. Normalization.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Databases 6: Normalization
Chapter 8 Normalization for Relational Databases Copyright © 2004 Pearson Education, Inc.
Chapter 10 Functional Dependencies and Normalization for Relational Databases.
Chapter 10 Functional Dependencies and Normalization for Relational Databases Copyright © 2004 Pearson Education, Inc.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 10 Functional Dependencies and Normalization for Relational Databases.
CS 405G: Introduction to Database Systems 18. Normal Forms and Normalization.
King Saud University College of Computer & Information Sciences Computer Science Department CS 380 Introduction to Database Systems Functional Dependencies.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Normalization for Relational Databases.
DatabaseIM ISU1 Chapter 10 Functional Dependencies and Normalization for RDBs Fundamentals of Database Systems.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Topic 10 Functional Dependencies and Normalization for Relational Databases Faculty of Information Science and Technology Mahanakorn University of Technology.
Instructor: Churee Techawut Functional Dependencies and Normalization for Relational Databases Chapter 4 CS (204)321 Database System I.
Functional Dependencies and Normalization for Relational Databases.
Chapter 10 Functional Dependencies and Normalization for Relational Databases Copyright © 2004 Pearson Education, Inc.
Chapter 10 Functional Dependencies and Normalization for Relational Databases Copyright © 2004 Pearson Education, Inc.
Chapter 10 Functional Dependencies and Normalization for Relational Databases Copyright © 2004 Pearson Education, Inc.
By Abdul Rashid Ahmad. E.F. Codd proposed three normal forms: The first, second, and third normal forms 1NF, 2NF and 3NF are based on the functional dependencies.
Chapter Functional Dependencies and Normalization for Relational Databases.
1 Functional Dependencies and Normalization Chapter 15.
Database: Review Sept. 2004Yangjun Chen Database Introduction system architecture, Basic concepts, ER-model, Data modeling, B+-tree Hashing Relational.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Lecture 8: Database Concepts May 4, Outline From last lecture: creating views Normalization.
1 CSE 480: Database Systems Lecture 18: Normal Forms and Normalization.
Normalization Sept. 2014ACS-3902 Yangjun Chen1 Outline: Normalization Redundant information and update anomalies Function dependencies Normal forms -1NF,
Chapter 7 Functional Dependencies Copyright © 2004 Pearson Education, Inc.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
Riyadh Philanthropic Society For Science Prince Sultan College For Woman Dept. of Computer & Information Sciences CS 340 Introduction to Database Systems.
Ch 7: Normalization-Part 1
Al-Imam University Girls Education Center Collage of Computer Science 1 st Semester, 1432/1433H Chapter 10_part 1 Functional Dependencies and Normalization.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
11/06/97J-1 Principles of Relational Design Chapter 12.
Database Design Theory CS405G: Introduction to Database Systems Jinze Liu 3/15/20161.
Functional Dependencies and Normalization for Relational Databases 1 Chapter 15 تنبيه : شرائح العرض (Slides) هي وسيلة لتوضيح الدرس واداة من الادوات في.
Chapter 10 Functional Dependencies and Normalization for Relational Databases Copyright © 2004 Pearson Education, Inc.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Chapter 14 Functional Dependencies and Normalization Informal Design Guidelines for Relational Databases –Semantics of the Relation Attributes –Redundant.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Functional Dependencies and Normalization for Relational Databases تنبيه : شرائح العرض (Slides) هي وسيلة لتوضيح الدرس واداة من الادوات في ذلك. حيث المرجع.
10/3/2017.
COP 6726: New Directions in Database Systems
Functional Dependency and Normalization
Functional Dependencies and Normalization for Relational Databases
Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases.
Functional Dependencies and Normalization for RDBs
Normalization There is a sequence to normal forms:
Database Management systems Subject Code: 10CS54 Prepared By:
Normalization Murali Mani.
Normalization Normalization
Outline: Normalization
Normalization.
Normalization February 28, 2019 DB:Normalization.
Database Design Theory CS405G: Introduction to Database Systems
Database Normalization
Presentation transcript:

Normalization Sept. 2012ACS-3902 Yangjun Chen1 Outline: Normalization Chapter 14 – 3rd ed. (Chap. 10 – 4 th, 5 th ed.; Chap. 6, 6 th ed.) Redundant information and update anomalies Function dependencies Normal forms -1NF, 2NF, 3NF -BCNF

Normalization Sept. 2012ACS-3902 Yangjun Chen2 Reading: (3 rd edition) Redundant … update anomalies (3rd edition) Functional dependencies (3rd edition) Inference rules for FDs (3rd edition) Equivalence of sets of FDs (3rd edition) Minimal sets of FDs 14.3 (3rd edition) Normal forms based on PKs (Chapter 15, 6 th ed.; Chapter 10, 5 th ed.)

Normalization Sept. 2012ACS-3902 Yangjun Chen3 Motivation: Certain relation schemas have redundancy and update anomalies - they may be difficult to understand and maintain Normalization theory recognizes this and gives us some principles to guide our designs Normal Forms: 1NF, 2NF, 3NF, BCNF, 4NF, … are each an improvement on the previous ones in the list Normalization is a process that generates higher normal forms. Denormalization moves from higher to lower forms and might be applied for performance reasons.

Normalization Sept. 2012ACS-3902 Yangjun Chen4 Suppose we have the following relation EmployeeProject ssnpnumberhoursenameplocation This is similar to Works_on, but we have included ename and plocation

Normalization Sept. 2012ACS-3902 Yangjun Chen5 Suppose we have the following relation enamessnbdateaddress EmployeeDepartment dnumberdname This is similar to Employee, but we have included dname

Normalization Sept. 2012ACS-3902 Yangjun Chen6 In the two prior cases with EmployeeDepartment and EmployeeProject, we have redundant information in the database … if two employees work in the same department, then that department name is replicated if more than one employee works on a project then the project location is replicated if an employee works on more than one project his/her name is replicated Redundant data leads to additional space requirements update anomalies

Normalization Sept. 2012ACS-3902 Yangjun Chen7 Suppose EmployeeDepartment is the only relation where department name is recorded insert anomalies adding a new department is complicated unless there is also an employee for that department deletion anomalies if we delete all employees for some department, what should happen to the department information? modification anomalies if we change the name of a department, then we must change it in all tuples referring to that department

Normalization Sept. 2012ACS-3902 Yangjun Chen8 If we design a database with a relation such as EmployeeDepartment then we will have complex update rules to enforce. difficult to code correctly will not be as efficient as possible Such designs mix concepts. E.g. EmployeeDepartment mixes the Employee and Department concept

Normalization Sept. 2012ACS-3902 Yangjun Chen9 Section 14.2 Functional dependencies Suppose we have a relation R comprising attributes X,Y, … We say a functional dependency exists between the attributes X and Y, if, whenever a tuple exists with the value x for X, it will always have the same value y for Y. XY XY LHSRHS

Normalization Sept. 2012ACS-3902 Yangjun Chen10 student_nostudent_namecourse_nogender Student Given a specific student number, there is only one value for student name and only one value for gender found with it. Student_noStudent_name gender

Normalization Sept. 2012ACS-3902 Yangjun Chen11 We always have functional dependencies between any candidate key and the other attributes. student_nostudent_namestudent_addressgender Student student_no is unique … given a specific student_no there is only one student name, only one student address, only one gender Student_no  student_name, Student_no  student_address, Student_no  gender

Normalization Sept. 2012ACS-3902 Yangjun Chen12 enamessnbdateaddress Employee dnumber ssn is unique … given a specific ssn there is only one ename, only one bdate, etc ssn  ename, ssn  bdate, ssn  address, ssn  dnumber.

Normalization Sept. 2012ACS-3902 Yangjun Chen13 Suppose we have the following relation ssnpnumberhoursename EmployeeProject plocation This is similar to Works_on, but we have included ename, and we know that ename is functionally dependent on ssn. We have included plocation … functionally dependent on pnumber {ssn, pnumber}  hours, ssn  ename, pnumber  plocation.

Normalization Sept. 2012ACS-3902 Yangjun Chen14 Suppose we have the following relation enamessnbdateaddress EmployeeDept dnumberdname This is similar to Employee, but we have included dname, and we know that dname is functionally dependent on dnumber, as well as being functionally dependent on ssn. ssn  ename, ssn  bdate, ssn  address, ssn  dnumber, dnumber  dname. ssn  dname

Normalization Sept. 2012ACS-3902 Yangjun Chen15 Minimal sets of FDs every dependency has a single attribute on the RHS the attributes on the LHS of a dependency are minimal we cannot remove a dependency without losing information.

Normalization Sept. 2012ACS-3902 Yangjun Chen16 Inference Rules for Function Dependencies From a set of FDs, we can derive some other FDs Example: F = {ssn  {Ename  Bdate, Address, dnumber}, dnumber  {dname, dmgrssn}} ssn  dnumber, dnumber  dname. ssn  {dname, dmgrssn}, inference F + (closure of F): The set of all FDs that can be deduced from F (with F together) is called the closure of F.

Normalization Sept. 2012ACS-3902 Yangjun Chen17 Inference Rules for Function Dependencies Inference rules: - IR1 (reflexive rule): If X  Y, then X  Y. (X  X.) - IR2 (augmentation rule): {X  Y} |= ZX  Y. - IR3 (transitive rule): {X  Y, Y  Z} |= X . - IR4 (decomposition, or projective, rule): {X  Y} |= X  Y, X  Z. - IR5 (union, or additive, rule): {X  Y, Y  Z} |= X  Y. - IR6 (pseudotransitive rule): {X  Y, WY  Z} |= WX .

Normalization Sept. 2012ACS-3902 Yangjun Chen18 Equivalence of Sets of FDs E and F are equivalent if E + = F +. Minimal sets of FDs every dependency has a single attribute on the RHS the attributes on the LHS of a dependency are minimal we cannot remove any dependency from F and still have a set of dependencies that is equivalent to F. ssnpnumberhoursenameplocation {ssn, pnumber}  hours, ssn  ename, pnumber  plocation.

Normalization Sept. 2012ACS-3902 Yangjun Chen19 Normal Forms A series of normal forms are known that have, successively, better update characteristics. We’ll consider 1NF, 2NF, 3NF, and BCNF. A technique used to improve a relation is decomposition, where one relation is replaced by two or more relations. When we do so, we want to eliminate update anomalies without losing any information.

Normalization Sept. 2012ACS-3902 Yangjun Chen20 1NF - First Normal Form The domain of an attribute must only contain atomic values. This disallows repeating values, sets of values, relations within relations, nested relations, … In the example database we have a department located in possibly several locations: department 5 is located in Bellaire, Sugarland, and Houston. If we had the relation then it would not be 1NF because there are multiple values to be kept in dlocations. Department dnumberdnamedmgrssndlocations 5Research Bellaire, Sugarland, Houston

Normalization Sept. 2012ACS-3902 Yangjun Chen21 1NF - First Normal Form If we have a non-1NF relation we can decompose it, or modify it appropriately, to generate 1NF relations. There are 3 options: option 1: split off the problem attribute into a new relation (create a DepartmentLocation relation). dnumberdnamedmgrssndlocation Department dnumber DepartmentLocation 5Research Bellaire5 5Sugarland 5Houston Generally considered the best solution

Normalization Sept. 2012ACS-3902 Yangjun Chen22 1NF - First Normal Form option 2: store just one value in the problem attribute, but create additional rows so that the other values can be stored too (department 5 would have 3 rows) dnumberdnamedmgrssndlocation Department 5Research Bellaire 5Research Sugarland 5Research Houston Redundancy is introduced! (not in 2NF) Dlocation becomes part of PK

Normalization Sept. 2012ACS-3902 Yangjun Chen23 1NF - First Normal Form option 3: if a maximum number of values is known, then create additional attributes so that the maximum number of values can be stored. (each location attribute would hold one location only) dnumberdnamedmgrssndloc1 Department dloc2dloc3 5Research BellaireSugarlandHouston

Normalization Sept. 2012ACS-3902 Yangjun Chen24 2NF - Second Normal Form full functional dependency X  Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more. ssnpnumberhoursenameplocation EmployeeProject {ssn, pnumber}  hours is a full dependency (neither ssn  hours, nor pnumber  hours).

Normalization Sept. 2012ACS-3902 Yangjun Chen25 2NF - Second Normal Form partial functional dependency X  Y is a partial functional dependency if removal of some attribute A from X does not affect the dependency. {ssn, pnumber}  ename is a partial dependency because ssn  ename holds.) ssnpnumberhoursenameplocation EmployeeProject

Normalization Sept. 2012ACS-3902 Yangjun Chen26 2NF - Second Normal Form A relation schema is in 2NF if (1) it is in 1NF and (2) every non-key attribute must be fully functionally dependent on the primary key. If we had the relation EmployeeProject ssnpnumberhoursenameplocation then this relation would not be 2NF because of two separate violations of the 2NF definition:

Normalization Sept. 2012ACS-3902 Yangjun Chen27 ename is functionally dependent on ssn, and plocation is functionally dependent on pnumber ename is not fully functionally dependent on ssn and pnumber and plocation is not fully functionally dependent on ssn and pnumber. {ssn, pnumber} is the primary key of EmployeeProject.

Normalization Sept. 2012ACS-3902 Yangjun Chen28 2NF - Second Normal Form We correct this by decomposing the relation into three relations - splitting off the offending attributes - splitting off partial dependencies on the key. ssnpnumberhoursenameplocation EmployeeProject ssnpnumberhours ename plocation ssn pnumber 2NF

Normalization Sept. 2012ACS-3902 Yangjun Chen29 3NF - Third Normal Form Transitive dependency A functional dependency X  Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is not a subset of any key of R, and both X  Z and Z  Y hold. enamessnbdateaddress EmployeeDept dnumberdname ssn  dnumber and dnumber  dname

Normalization Sept. 2012ACS-3902 Yangjun Chen30 3NF - Third Normal Form A relation schema is in 3NF if (1) it is in 2NF and (2) each non-key attribute must not be fully functionally dependent on another non-key attribute (there must be no transitive dependency of a non-key attribute on the PK) If we had the relation enamessnbdateaddressdnumberdname then this relation would not be 3NF because dname is functionally dependent on dnumber and neither is a key attribute

Normalization Sept. 2012ACS-3902 Yangjun Chen31 3NF - Third Normal Form We correct this by decomposing - splitting off the transitive dependencies enamessnbdateaddress EmployeeDept dnumberdname enamessnbdateaddressdnumber dnamednumber 3NF

Normalization Sept. 2012ACS-3902 Yangjun Chen32 inv_noline_noprod_noprod_desccust_noqty Consider: What normal form is it in? What relations will decomposition result in? {inv_no, line_no}  prod_no, {inv_no, line_no}  prod_desc, {inv_no, line_no}  cust_no, {inv_no, line_no}  qty, inv_no  cust_no, prod_no  prod_desc

Normalization Sept. 2012ACS-3902 Yangjun Chen33 inv_noline_noprod_noprod_desccust_noqty Change it into 2NF: cust_noinv_no line_noprod_noprod_descqty 2NF

Normalization Sept. 2012ACS-3902 Yangjun Chen34 cust_noinv_no line_noprod_noprod_descqty 2NF Change it into 3NF: cust_noinv_no line_noprod_noqty 3NF prod_descprod_no

Normalization Sept. 2012ACS-3902 Yangjun Chen35 cust_nonamehouse_nostreetcityprovpostal_code Consider:

Normalization Sept. 2012ACS-3902 Yangjun Chen36 cust_nonamehouse_nopostal_code provcitystreet

Normalization Sept. 2012ACS-3902 Yangjun Chen37 Boyce Codd Normal Form, BCNF Consider a different definition of 3NF, which is equivalent to the previous one. A relation schema R is in 3NF if, whenever a function dependency X  A holds in R, either (a)X is a superkey of R, or (b)A is a prime attribute of R. A superkey of a relation schema R = {A1, A2,..., An} is a set of attributes S  R with the propertity that no tuples t1 and t2 in any legal state r of R will have t1[S] = t2[S]. An attribute is called a prime attribute if it is a member of any key.

Normalization Sept. 2012ACS-3902 Yangjun Chen38 Boyce Codd Normal Form, BCNF If we remove (b) from the previous definition for 3NF, we have the definition for BCNF. A relation schema is in BCNF if every determinant is a superkey key. Stronger than 3NF: - no partial dependencies - no transitive dependencies where a non-key attribute is dependent on another non-key attribute - no non-key attributes appear in the LHS of a functional dependency.

Normalization Sept. 2012ACS-3902 Yangjun Chen39 Boyce Codd Normal Form, BCNF Consider: student_nocourse_noinstr_no Instructor teaches one course only. Student takes a course and has one instructor. In 3NF! {student_no, course_no}  instr_no instr_no  course_no

Normalization Sept. 2012ACS-3902 Yangjun Chen40 Boyce Codd Normal Form, BCNF Some sample data: student_nocourse_noinstr_no Instructor 99 teaches 1803 Instructor 77 teaches 1903 Instructor 66 teaches 1803

Normalization Sept. 2012ACS-3902 Yangjun Chen41 Boyce Codd Normal Form, BCNF student_nocourse_noinstr_no Instructor 99 teaches 1803 Instructor 77 teaches 1903 Instructor 66 teaches 1803 Deletion anomaly: If we delete all rows for course 1803 we’ll lose the information that instructors 99 teaches student 121 and 66 teaches student 222. Insertion anomaly: How do we add the fact that instructor 55 teaches course 2906?

Normalization Sept. 2012ACS-3902 Yangjun Chen42 Boyce Codd Normal Form, BCNF How do we decompose this to remove the redundancies? - without losing information? student_nocourse_noinstr_no ? ? ? course_noinstr_no student_nocourse_no instr_no course_noinstr_no student_noinstr_no student_nocourse_no student_no Note that these decompositions do lose one of the FDs.

Normalization Sept. 2012ACS-3902 Yangjun Chen43 course_noinstr_no student_nocourse_no instr_no ? course_noinstr_no student_noinstr_no student_nocourse_no student_no Joining these two tables leads to spurious tuples - result includes Boyce Codd Normal Form, BCNF Which decomposition preserves all the information? S#C# I#

Normalization Sept. 2012ACS-3902 Yangjun Chen44 student_nocourse_noinstr_no course_noinstr_nostudent_nocourse_no S#C# C#I# 

Normalization Sept. 2012ACS-3902 Yangjun Chen45 course_noinstr_no student_nocourse_no instr_no ? course_noinstr_no student_noinstr_no student_nocourse_no student_no Joining these two tables leads to spurious tuples - result includes S#C#I#S# Boyce Codd Normal Form, BCNF Which decomposition preserves all the information?

Normalization Sept. 2012ACS-3902 Yangjun Chen46 student_nocourse_noinstr_no student_noinstr_nostudent_nocourse_no  S#C# I#S#

Normalization Sept. 2012ACS-3902 Yangjun Chen47 course_noinstr_no student_nocourse_no instr_no ? course_noinstr_no student_noinstr_no student_nocourse_no student_no Joining these two tables leads to no spurious tuples - result is: S#C#I# Boyce Codd Normal Form, BCNF Which decomposition preserves all the information?

Normalization Sept. 2012ACS-3902 Yangjun Chen48 Boyce Codd Normal Form, BCNF This decomposition preserves all the information. course_noinstr_no student_noinstr_no S#C#I# Only FD is instr_no course_no but the join preserves {student_no, course_no} instr_no

Normalization Sept. 2012ACS-3902 Yangjun Chen49 student_nocourse_noinstr_no course_noinstr_nostudent_noInstr_no S#I# C#I#

Normalization Sept. 2012ACS-3902 Yangjun Chen50 Boyce Codd Normal Form, BCNF A relation schema is in BCNF if every determinant is a candidate key.

Normalization Sept. 2012ACS-3902 Yangjun Chen51 Boyce Codd Normal Form, BCNF ABC A BC C In 3NF Not in BCNF In BCNF Lossless decomposition pattern:Given: But this could be where a database designer may decide to go with: ABC BC Functional dependencies are preserved There is some redundancy Delete anomaly is avoided

Normalization Sept. 2012ACS-3902 Yangjun Chen52 The objective of normalization is sometimes stated “to create relations where every dependency is on the primary key, the whole primary key, and nothing but the primary key” Database designers are always looking for databases that are as simple as possible - ones that are easiest to keep consistent, ones where the semantics are clear.