Presentation is loading. Please wait.

Presentation is loading. Please wait.

Schema Refinement and Normal Forms 20131CS3754 Class Notes #7, John Shieh.

Similar presentations


Presentation on theme: "Schema Refinement and Normal Forms 20131CS3754 Class Notes #7, John Shieh."— Presentation transcript:

1 Schema Refinement and Normal Forms 20131CS3754 Class Notes #7, John Shieh

2 2CS4753/2006F Normalization It is a process that we can use to remove design flaws from a database A number of normal forms, which are sets of rules describing what we should and should not do in our table structure 3NF is sufficient to avoid the data redundancy problem of a designed relational database

3 Problems caused by redundancy Redundant Storage –Some information is stored repeatedly. Update Anomalies –If one copy of such repeated data is updated, an inconsistency is created, unless all copies are similarly updated. Insertion anomalies –It may not be possible to store certain information unless some other, unrelated, information is stored. Deletion Anomalies –It may not be possible to delete certain information without losing some other, unrelated, information. 20133CS3754 Class Notes #7, John Shieh

4 Redundant Storage –The hourly wages depend on rating levels. So, for example, hourly wage 10 for rating level 8 is repeated three times. Update Anomalies –The hourly_wages in the first tuple could be updated without making a similar change in the second tuple. IdnamelotratingHourly_wagesHours_worked 123-22-3666Attishoo4881040 231-31-5368Smiley2281030 131-24-3650Smethurst355730 434-26-3751Guldu355732 612-67-4134Madayan3581040 20134CS3754 Class Notes #7, John Shieh

5 Insertion Anomalies –We cannot insert a tuple for an employee unless we know the hourly wage for the employee’s rating value. Deletion Anomalies –If we delete all tuples with a given rating value (e.g. tuples of Smethurst and Guldu) we lose the association between the rating value and its hourly_wage value. IdnamelotratingHourly_wagesHours_worked 123-22-3666Attishoo4881040 231-31-5368Smiley2281030 131-24-3650Smethurst355730 434-26-3751Guldu355732 612-67-4134Madayan3581040 20135CS3754 Class Notes #7, John Shieh

6 Decompositions Intuitively, redundancy arise when a relational schema forces an association between attributes that is not natural. Functional dependencies can be used to identify such situations and suggest refinements to the schema. The essential idea is that many problems arising from redundancy can be addressed by replacing a relation with a collection of ‘smaller’ relation. 20136CS3754 Class Notes #7, John Shieh

7 IdnamelotratingHourly_wagesHours_worked 123-22-3666Attishoo4881040 231-31-5368Smiley2281030 131-24-3650Smethurst355730 434-26-3751Guldu355732 612-67-4134Madayan3581040 IdnamelotratingHours_worked 123-22-3666Attishoo48840 231-31-5368Smiley22830 131-24-3650Smethurst35530 434-26-3751Guldu35532 612-67-4134Madayan35840 ratingHourly_wages 810 57 A decomposition of a relation schema R consists of replacing the relation schema by two (or more) relation schemas each of which contains a subset of attributes of R and together include all attributes in R Functional dependency: - rating determines Hourly_wages 20137CS3754 Class Notes #7, John Shieh

8 Functional Dependencies A functional dependency (FD) is a kind of IC that generalizes the concept of a key. Let R be a relation schema, and X and Y be sets of nonempty sets of attributes in R. –An FD X  Y exists, if in every relation instance for R, any two tuples that agree on the value of X also agree on the value of Y. –More formally Let R be a relation schema and let X and Y be nonempty sets of attributes in R. An FD X  Y exists in R if every instance of R preserves the FD X  Y. We say that an instance r of R preserves the FD X  Y if the following holds for every pair of tuples t1 and t2 in r If t1.X = t2.X, then t1.Y = t2.Y The notation t1.X refers to the subset of fields of tuple t1 for the attributes in X 20138CS3754 Class Notes #7, John Shieh

9 Examples: course_ID  course_name is preserved? {student_ID, course_ID}  course_name is preserved ? if no two rows agree on  value, then    is trivially preserved. yes Take 20139CS3754 Class Notes #7, John Shieh

10 The table instance also preserves the following student_ID  student_name Student_ID, course_ID  {student_name, course_name} student_ID, course_ID  {student_ID, student_name, course_ID, course_Name} student_name  student_name (a trivial dependency) student_name, course_name  student_name (also trivial) many more …. 201310CS3754 Class Notes #7, John Shieh

11 How do we know if a FD exist in R? Can we check all instances of R to see if the FD is preserved? –Definitely, not possible! –Whether or not a functional dependency exists must be determined by assumptions given in advance, or common sense, not by individual relation instances. Given an instance r of R, we can check if r preserves some functional dependency f, but we cannot tell if f holds over R. course_ID  student_name ? Although it is preserved by this table, it does not fit the assumption. no 201311CS3754 Class Notes #7, John Shieh

12 The assumptions given in advance, or common sense, impose some constraints, and are called the semantics of a database Assumptions given in advance impose explicit constraints; common sense imposes implicit constraints 201312CS3754 Class Notes #7, John Shieh

13 Example: Application is to keep track of information about employees in a company. Information to be kept track of includes: eid: employee ’ s id number ename: employee name address: address of the employee sex: employee ’ s sex dname: name of the department that the employee works for dhname: department head ’ s name dhsex: department head ’ s sex 201313CS3754 Class Notes #7, John Shieh

14 Let ’ s construct a relation schema as follows: Which of the following dependencies are true? 1.eid  ename 2.ename  eid 3.eid  address 4.eid  sex 5.sex  address 6.dhname  dname 7.dhname  eid 8.dhsex  sex Assumptions: a:Employee’s id number is unique b:Each employee has a unique address c:Each employee works for only one dept. d:A person can be the head of at most one department e:All department heads have different names Implicit: common sense         Employee eidenameaddresssexdnamedhnamedhsex 2013 14 CS3754 Class Notes #7, John Shieh

15  is a superkey for relation schema R iff   attri(R) where attri(R) denotes the set of all the attributes in schema R  is a candidate key (or simply, key) for R iff   attri(R), and  is minimal, i.e., for any   ,   attri(R) In other words, a candidate key is a minimal superkey (student_ID, course_ID) is a candidate key (and the only one) (student_ID, course_ID, course_name) is a superkey, but not a candidate key (student_ID, course_ID, student_name) is another non-candidate superkey (student_ID, course_ID, course_name, student_name) is also a non-candidate superkey 201315CS3754 Class Notes #7, John Shieh

16 1 st Normal FormNo repeating data groups 2 nd Normal FormNo partial key dependency 3 rd Normal FormNo transitive dependency Boyce-Codd Normal FormReduce keys dependency 4 th Normal FormNo multi-valued dependency 5 th Normal FormNo join dependency Normal Forms 201316CS3754 Class Notes #7, John Shieh

17 17CS4753/2006F Normal Form (NF) 1NF: each attribute or column value must be atomic 2NF: if a schema is 1NF, and if its all attributes that are not part of the primary key are fully functionally dependent on the primary key 3NF: if a schema is 2NF, and all transitive dependencies have been removed Ex: employeeDept(employeeID, name, job, deptID, deptName) has to convert to employee(employeeID, name, job, deptID) Dept(deptID, deptName)

18 18CS4753/2006F 2NF It means that each non-key attribute must be functionally dependent on all parts of the primary key (i.e., the combination of the composite attributes of the key). Example: not 2NF Employee(employeeID, name, job, departmentID, skill) employeeID, skill  name, job, departmentID employeeID  name, job, departmentID (Note:  determine) Break the table into two tables to become 2NF Employee(employeeID, name, job, departmentID) employeeSkills(employeeID, skill)

19 19CS4753/2006F 3NF Example: 2NF but not 3NF Employee(employeeID, name, job, departmentID, departmentName) Here employeeID  departmentID employeeID  departmentName Also departmentID  departmentName, departmentID is not a key Therefore, employeeID  departmentName is a transitive dependency Convert the schema to 3NF by breaking to two tables: Employee(employeeID, name, job, departmentID) Department(departmentID, departmentName)

20 20CS4753/2006F Normal Forms Defined Informally 1 st normal form –All attributes depend on the key 2 nd normal form –All attributes depend on the whole key 3 rd normal form –All attributes depend on nothing but the key

21 21CS4753/2006F SUMMARY OF NORMAL FORMS based on Primary Keys


Download ppt "Schema Refinement and Normal Forms 20131CS3754 Class Notes #7, John Shieh."

Similar presentations


Ads by Google