1 Theory, Practice & Methodology of Relational Database Design and Programming Copyright © Ellis Cohen 2002-2008 Advanced Normalization These slides are.

Presentation transcript:

1 Theory, Practice & Methodology of Relational Database Design and Programming Copyright © Ellis Cohen Advanced Normalization These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. For more information on how you may use them, please see

2 © Ellis Cohen Overview of Lecture
  Lossless & Dependency-Preserving Decomposition
  Boyce Codd Normal Form
  Standard Decomposition of Non-Symmetric BCNF Violations
  Decomposition of Non-Symmetric BCNF Violations Using Inclusion Constraints
  Conceptual BCNF Resolution
  4th Normal Form & Multivalued Dependencies
  Pseudo-4NF Violations
  5th Normal Form
  Eliminating Assertions with Referential Integrity

3 © Ellis Cohen Lossless & Dependency-Preserving Decomposition

4 © Ellis Cohen Resolving a Pure 3NF Violation
  Emps_old( empno, ename, address, deptno, dname ), with deptno → dname
is decomposed into
  Emps_new( empno, ename, addr, deptno )
  Depts( deptno, dname )

5 © Ellis Cohen Lossless Decompositions
When you decompose a relation into two or more relations, the result of joining them should be equivalent to the original relation.
Is Emps_old = Emps_new |X| Depts ?
Is (SELECT * FROM Emps_old) = (SELECT * FROM Emps_new NATURAL JOIN Depts) ?

6 © Ellis Cohen NULLs & Normalization
If Emps_old had any NULL deptno's, then Emps_old ≠ Emps_new |X| Depts (although Emps_old = Emps_new :X| Depts, where :X| is the natural left join).
In general, NULLs complicate functional dependencies and normalization.
We will assume that relations do NOT have NULLs. In other words, assume that all NULLs have been turned into domain-specific special values (e.g. -1 for deptno, ' ' for dname).
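
The lossless-join question and the effect of NULLs can be checked mechanically. The following is a minimal sketch using Python's standard `sqlite3` module; the rows, the names `BASE` and `NULL_ROW`, and the helper functions are illustrative assumptions (the `address` column is left out for brevity), not data from the slides.

```python
import sqlite3

# Illustrative rows (assumed): (empno, ename, deptno, dname)
BASE = [(1, "ADAMS", 10, "ACCOUNTING"),
        (2, "BLAKE", 20, "RESEARCH"),
        (3, "CLARK", 20, "RESEARCH")]
NULL_ROW = (4, "DAVIS", None, None)   # an employee with no department

def build(rows):
    """Create Emps_old and decompose it on deptno -> dname."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Emps_old "
                 "(empno INTEGER, ename TEXT, deptno INTEGER, dname TEXT)")
    conn.executemany("INSERT INTO Emps_old VALUES (?, ?, ?, ?)", rows)
    conn.executescript("""
        CREATE TABLE Emps_new AS SELECT empno, ename, deptno FROM Emps_old;
        CREATE TABLE Depts AS SELECT DISTINCT deptno, dname FROM Emps_old;
    """)
    return conn

def same_rows(conn, join):
    """Compare Emps_old with the rejoined decomposition as sets of rows."""
    cols = "empno, ename, deptno, dname"
    original = set(conn.execute(f"SELECT {cols} FROM Emps_old"))
    rejoined = set(conn.execute(f"SELECT {cols} FROM Emps_new {join} Depts"))
    return original == rejoined

print(same_rows(build(BASE), "NATURAL JOIN"))                    # no NULLs
print(same_rows(build(BASE + [NULL_ROW]), "NATURAL JOIN"))       # NULL deptno
print(same_rows(build(BASE + [NULL_ROW]), "NATURAL LEFT JOIN"))  # left join
```

Under these assumptions the natural join drops the employee whose deptno is NULL, while the natural left join keeps it, which is why the remaining slides assume NULL-free relations.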

7 © Ellis Cohen Avoid Decompositions that Lose FD's
Emps( empno, deptno, dname )
  empno → deptno, dname
  deptno → dname
Decomposed into Emps1( empno, deptno ) and Emps2( empno, dname ):
  Emps2
  empno  dname
  7369   RESEARCH
  7499   SALES
  7521   SALES
  7839   ACCOUNTING

8 © Ellis Cohen Dependency Preserving Decompositions
Consider the decomposition of Emps into
  Emps1( empno, deptno ), with empno → deptno
  Emps2( empno, dname ), with empno → dname
It is a lossless decomposition: you can get the original Emps back by Emps1 |X| Emps2.
BUT it is not dependency-preserving: you can no longer directly infer the FD deptno → dname.
  Still update anomalies
  Still redundancies, but they are hidden, because they are spread across the new Emps1 & Emps2

9 © Ellis Cohen Enforcing Lost Dependencies
After decomposing Emps into Emps1 and Emps2, if we still want to enforce deptno → dname, we would have to do so by enforcing a state assertion:
  (SELECT count(DISTINCT dname)
   FROM (Emps1 NATURAL JOIN Emps2)
   GROUP BY deptno) ALL = 1
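
As a rough illustration of what checking this state assertion costs, here is a sketch in Python with `sqlite3`. The Emps2 rows follow the earlier slide; the deptno values (20, 30, 30, 10), the query name `VIOLATIONS` and the helper `fd_holds` are assumptions, since the deptno column is not legible in the transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emps1 (empno INTEGER PRIMARY KEY, deptno INTEGER);
CREATE TABLE Emps2 (empno INTEGER PRIMARY KEY, dname TEXT);
-- deptno values are assumed for illustration
INSERT INTO Emps1 VALUES (7369, 20), (7499, 30), (7521, 30), (7839, 10);
INSERT INTO Emps2 VALUES (7369, 'RESEARCH'), (7499, 'SALES'),
                         (7521, 'SALES'), (7839, 'ACCOUNTING');
""")

# Departments that map to more than one name violate deptno -> dname.
VIOLATIONS = """
SELECT deptno, count(DISTINCT dname)
FROM Emps1 NATURAL JOIN Emps2
GROUP BY deptno
HAVING count(DISTINCT dname) <> 1
"""

def fd_holds(conn):
    """True when every deptno is associated with exactly one dname."""
    return conn.execute(VIOLATIONS).fetchall() == []

before = fd_holds(conn)
# Neither table's key constraint objects to this update,
# yet it breaks the lost dependency.
conn.execute("UPDATE Emps2 SET dname = 'MARKETING' WHERE empno = 7521")
after = fd_holds(conn)
print(before, after)
```

The check needs a join across both tables on every relevant change, which is the extra expense the slides attribute to state assertions.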

10 © Ellis Cohen Combine Corresponding Tables
To help ensure that FD's are made visible: if two tables
  both have a unique attribute (or set of attributes) with the same meaning, and
  have the same set of values for those attributes,
then combine the tables.
Both Emps1 and Emps2 have an empno attribute, with the same meaning, and covering the same set of values, so combine Emps1 & Emps2 → Emps

11 © Ellis Cohen Normalization & Decomposition
Goals of Relational Normalization
  Eliminate redundancy
  Enable conceptual constraints (e.g. FD's) to be enforced by the database directly (e.g. uniqueness constraints & foreign keys)
    Any remaining FD's must be enforced as state assertions (which is more expensive)
Make sure decompositions are
  Lossless
  Dependency Preserving

12 © Ellis Cohen Boyce Codd Normal Form

13 © Ellis Cohen BCNF: Boyce Codd Normal Form A table is in 3NF if all the non-prime attributes are minimally determined by a candidate key A table is in BCNF if all attributes are minimally determined by a candidate key

14 © Ellis Cohen 2NF/3NF/BCNF Diagrams
[Diagrams over prime and non-prime attributes: 2NF Partial Key Violation, 3NF Transitivity Violation, BCNF Violation (pure and mixed)]

15 © Ellis Cohen Symmetric BCNF Violation
Depts( divid, divnam, dname, loc )
  prime attributes: divid, divnam, dname
  non-prime attributes: loc
  divid → divnam
  divnam → divid
Department names are unique within a division.
Divisions are uniquely identified by either their name (divnam) or id (divid).

16 © Ellis Cohen Resolving Symmetric BCNF Violation
divid + dname and divnam + dname are candidate keys; divid → divnam & divnam → divid
  divid  divnam           dname       loc
  G      Georgia Pacific  ACCOUNTING  Atlanta
  G      Georgia Pacific  SALES       Birmingham
  G      Georgia Pacific  OPERATIONS  Atlanta
  S      Southern Rail    ACCOUNTING  Charlotte
is decomposed into
  divid  dname       loc          (candidate key: divid + dname; non-prime: loc)
  G      ACCOUNTING  Atlanta
  G      SALES       Birmingham
  G      OPERATIONS  Atlanta
  S      ACCOUNTING  Charlotte
and
  divid  divnam
  G      Georgia Pacific
  S      Southern Rail

17 © Ellis Cohen Symmetric BCNF Normalization
Depts_old( divid, divnam, dname, loc ), with divid → divnam, becomes
  Divisions( divid, divnam )
  Depts_new( divid, dname, loc )
Normalize by
  1) Create a new relation with the determinant and everything dependent upon it
  2) Remove everything dependent upon the determinant from the original relation
  3) Add a foreign key constraint, possibly with cascading delete
What's the corresponding conceptual normalization?

18 © Ellis Cohen Conceptual Symmetric BCNF Normalization
[Diagram: Dept (divid, divnam, dname, loc), with divid → divnam, becomes Division (divid, divnam) and Dept (dname, loc), where Dept is part of Division]
Resolve exactly like 2NF violations

19 © Ellis Cohen Standard Decomposition of Non-Symmetric BCNF Violations

20 © Ellis Cohen Problem of Non-Symmetric BCNF
[Diagram: attributes A, B, C, D, divided into prime and non-prime attributes]
The usual approach to resolving NF violations, when used with BCNF violations, leads to a decomposition that is NOT dependency-preserving

21 © Ellis Cohen Non-Symmetric BCNF Example
Each project has mentors who work only with that project.
Each employee assigned to a project (for a specified # of hrs per week) is also assigned one of the project mentors.
Assignment( empno, ename, pno, pname, mentid, mname, hrs )
Candidate Keys
  empno + pno
  empno + mentid (since mentid → pno)
Prime Attributes
  empno, pno, mentid
FD's
  empno → ename
  pno → pname
  empno + pno → hrs, mentid
  mentid → mname, pno
Non-symmetric BCNF violation, since mentid → pno, but NOT( pno → mentid )
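
The candidate keys listed on this slide can be derived from the FD's with the standard attribute-closure computation. The sketch below is one way to do that in plain Python; the function names `closure` and `candidate_keys` are illustrative, and the brute-force search is only meant for small schemas like this one.

```python
from itertools import combinations

ATTRS = {"empno", "ename", "pno", "pname", "mentid", "mname", "hrs"}
FDS = [
    ({"empno"}, {"ename"}),
    ({"pno"}, {"pname"}),
    ({"empno", "pno"}, {"hrs", "mentid"}),
    ({"mentid"}, {"mname", "pno"}),
]

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a smaller key is contained in it: not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys

keys = candidate_keys(ATTRS, FDS)
print([sorted(k) for k in keys])

# mentid determines pno but is not itself a key: the BCNF violation.
print(closure({"mentid"}, FDS) == ATTRS)
```

Because mentid is a determinant whose closure does not cover the whole relation, the FD mentid → pno is what breaks BCNF here.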

22 © Ellis Cohen Non-Symmetric BCNF Violation
[Diagram: Assignment, with prime attributes empno, pno and mentid (candidate key marked) and non-prime attribute hrs]

23 © Ellis Cohen Resolve BCNF Violation
Resolve mentid → pno
  Mentors( mentid, mname, pno, pname )
  Assignments( mentid, empno, ename, hrs )
Corresponding Conceptual Model
  Mentor( mentid, mname, pno, pname )
  Assignment( empno, ename, hrs )
Next resolve empno → ename and pno → pname

24 © Ellis Cohen BCNF & Dependency Preservation
Relational model
  Mentors( mentid, mname, pno )
  Assigns( mentid, empno, hrs )
  Emps( empno, ename )
  Projects( pno, pname )
Conceptual model: Mentor (mentid, mname), Assignment (hrs), Employee (empno, ename), Project (pno, pname)
FD's
  empno → ename
  mentid → mname
  pno → pname
  empno + mentid → hrs
  mentid → pno
Everything is in BCNF & all decompositions are lossless, but empno + pno → mentid never got resolved, and can no longer be inferred! It must be added as an assertion!

25 © Ellis Cohen FD Violation
Mentors( mentid, mname, pno ), Assigns( mentid, empno, hrs ), Emps( empno, ename ), Projects( pno, pname )
Possible violation of empno + pno → mentid
  Mentors
  mentid  mname  pno
  1111    JONES  32
  2222    SMITH  32
Employee 4567 works with mentor 1111, assigned to proj 32
Employee 4567 works with mentor 2222, assigned to proj 32
So, 4567 works on proj 32, with mentors 1111 & 2222. OOPS!

26 © Ellis Cohen Required FD Assertion
Mentors( mentid, mname, pno ), Assigns( mentid, empno, hrs ), Emps( empno, ename ), Projects( pno, pname )
empno + pno → mentid
  (SELECT count(DISTINCT mentid)
   FROM (Assigns NATURAL JOIN Mentors)
   GROUP BY empno, pno) ALL = 1
How many mentors can have the same empno & pno? The answer must always be one!
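
To see the assertion catch the situation from the previous slide, the sketch below loads mentors 1111 and 2222 and employee 4567 into an in-memory SQLite database. The hrs values and the variable name `violations` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Mentors (mentid INTEGER PRIMARY KEY, mname TEXT,
                      pno INTEGER NOT NULL);
CREATE TABLE Assigns (mentid INTEGER REFERENCES Mentors (mentid),
                      empno INTEGER, hrs REAL,
                      PRIMARY KEY (empno, mentid));
INSERT INTO Mentors VALUES (1111, 'JONES', 32), (2222, 'SMITH', 32);
-- hrs values are assumed; both rows satisfy every declared constraint
INSERT INTO Assigns VALUES (1111, 4567, 10), (2222, 4567, 5);
""")

# The state assertion: each (empno, pno) pair must have exactly one mentor.
violations = conn.execute("""
    SELECT empno, pno, count(DISTINCT mentid)
    FROM Assigns NATURAL JOIN Mentors
    GROUP BY empno, pno
    HAVING count(DISTINCT mentid) <> 1
""").fetchall()
print(violations)
```

Both inserts pass the declared keys, so only the explicit query reveals that employee 4567 has two mentors on project 32.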

27 © Ellis Cohen Decomposing BCNF Violations Using Inclusion Constraints

28 © Ellis Cohen How to Tame BCNF Violations
Assignments( mentid, empno, pno, pname, ename, mname, hrs ), with both empno + mentid and empno + pno as candidate keys
  empno + pno → mentid
  mentid → pno
Resolve mentid → pno
  Mentors( mentid, mname, pno !, pname )
  Assignments( mentid, empno, pno, ename, hrs )
With BCNF decomposition, pno remains in the original table, but it is constrained by an inclusion constraint. This keeps the decomposition dependency-preserving, while "taming" the mentid → pno FD still left in Assignments.

29 © Ellis Cohen Inclusion Constraints
  Mentors( mentid, mname, pno !, pname )
  Assignments( mentid, empno, pno, ename, hrs )
  empno + pno → mentid
  mentid → pno
This requires declaring mentid + pno in Mentors as unique. That may seem unnecessary, since mentid is already a PK. However, it forces the DB to maintain an index on mentid + pno, so the foreign key reference can check that every mentid + pno pair in Assignments is also in Mentors.
Even though the mentid → pno FD is still in Assignments, it is enforced by the inclusion constraint with Mentors, so no state assertion is needed.

30 © Ellis Cohen SQL for BCNF Decomposition
CREATE TABLE Mentors(
  mentid number(3) primary key,
  mname varchar(20),
  pno number(4) not null,
  pname varchar(20),
  unique( mentid, pno )
);
CREATE TABLE Assignments(
  mentid number(3),
  empno number(4),
  pno number(4),
  ename varchar(30),
  hrs number(5,1),
  primary key( empno, pno ),
  foreign key( mentid, pno ) references Mentors( mentid, pno )
);
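
The following sketch exercises the same two constraints in SQLite through Python's `sqlite3` module. It is an adaptation rather than the slide's DDL: SQLite types replace `number`/`varchar`, the ename and pname columns are dropped, the Assignments columns are declared NOT NULL, and the sample rows and the helper `try_insert` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Mentors (
    mentid INTEGER PRIMARY KEY,
    mname  TEXT,
    pno    INTEGER NOT NULL,
    UNIQUE (mentid, pno)              -- target of the inclusion constraint
);
CREATE TABLE Assignments (
    mentid INTEGER NOT NULL,
    empno  INTEGER NOT NULL,
    pno    INTEGER NOT NULL,
    hrs    REAL,
    PRIMARY KEY (empno, pno),         -- enforces empno + pno -> mentid
    FOREIGN KEY (mentid, pno) REFERENCES Mentors (mentid, pno)
);
INSERT INTO Mentors VALUES (1111, 'JONES', 32), (2222, 'SMITH', 32);
INSERT INTO Assignments VALUES (1111, 4567, 32, 10);
""")

def try_insert(row):
    """Insert one Assignments row; return 'ok' or the constraint error text."""
    try:
        with conn:
            conn.execute("INSERT INTO Assignments VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

second_mentor = try_insert((2222, 4567, 32, 5))  # same empno + pno again
wrong_project = try_insert((1111, 7777, 99, 5))  # mentor 1111 is on project 32
valid = try_insert((2222, 8888, 32, 5))
print(second_mentor, wrong_project, valid, sep="\n")
```

In this sketch the primary key rejects a second mentor for the same employee and project, and the composite foreign key rejects a mentor paired with a project other than their own, so neither FD needs a separate state assertion.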

31 © Ellis Cohen Fully Normalized Relational Model
From
  Mentors( mentid, mname, pno !, pname )
  Assignments( mentid, empno, pno, ename, hrs )
to
  Mentors( mentid, mname, pno ! )
  Assigns( mentid, empno, pno, hrs )
  Emps( empno, ename )
  Projects( pno, pname )

32 © Ellis Cohen Conceptual BCNF Resolution

33 © Ellis Cohen 2NF/3NF Relational Normalization
Suppose we resolve everything EXCEPT mentid → pno
  Mentors( mentid, mname )
  Assigns( pno, empno, hrs, mentid )
  Emps( empno, ename )
  Projects( pno, pname )
FD's
  mentid → mname
  pno → pname
  empno → ename
  empno + pno → hrs
  empno + pno → mentid
  mentid → pno: still violates BCNF & would need to be enforced by an assertion

34 © Ellis Cohen 2NF/3NF Conceptual Normalization
Corresponding conceptual model: Employee (empno, ename), Project (pno, pname), Mentor (mentid, mname), Assignment (hrs)
Uniqueness Constraints: mentid → mname, pno → pname, empno → ename, empno + pno → hrs
Key Constraints: empno + pno → mentid
mentid → pno: BCNF violation hidden in relational mapping of Assignment

35 © Ellis Cohen Modeling Considerations Consider the most natural conceptual model corresponding to enforcement of mentid  pno This represents a key constraint between Mentor and Project -- i.e. every mentor is associated with a single project Since there is no symmetric constraint, we can assume that a project can have multiple mentors. This implies a 1:M relationship between Mentor and Project, which we will add to the conceptual model

36 © Ellis Cohen Natural Conceptual Model
[Diagram: Employee (empno, ename), Project (pno, pname), Mentor (mentid, mname), Assignment]
  mentid → mname
  pno → pname
  empno → ename
  empno + pno → hrs
  empno + pno → mentid
  mentid → pno
"Natural" because all FD's are derivable from key & uniqueness constraints.
What is the relational model?

37 © Ellis Cohen Relational Mapping of Natural Conceptual Model
  Mentors( mentid, mname, pno ! )
  Assigns( pno, empno, hrs, mentid )
  Emps( empno, ename )
  Projects( pno, pname )
But this has added redundancy, and now an assertion is needed to ensure that each pair of mentid/pno in Assigns is also in Mentors:
  (SELECT DISTINCT pno, mentid FROM Assigns) ⊆ (SELECT pno, mentid FROM Mentors)
This is exactly the inclusion dependency.

38 © Ellis Cohen Factored Conceptual Model
[Diagram: Employee (empno, ename), Project (pno, pname), Mentor (mentid, mname), Assignment; the "mentored by" relationship is factored (by Project)]
This corresponds to adding the conceptual state constraint: a mentor can mentor many assignments, but they are all for the mentor's project. It is represented by factoring the "mentored by" relationship.

39 © Ellis Cohen Relational Mapping of Factoring
  Mentors( mentid, mname, pno ! )
  Assigns( empno, hrs, pno, mentid )
  Emps( empno, ename )
  Projects( pno, pname )
The factored relationship is modeled by an inclusion constraint on mentid + pno, which directly enforces the inclusion dependency expressed by the factored relationship.
[Diagram labels: one foreign key "Implements factored relationship"; another is "Redundant except for cascading delete!"]

40 © Ellis Cohen Eliminating Redundancy
  Mentors( mentid, mname, pno ! )
  Assigns( mentid, empno, pno, hrs )
  Emps( empno, ename )
  Projects( pno, pname )
Eliminate the cascading delete and just add it as an explicit business rule.
Results in the same model as when using the BCNF normalization rule.

41 © Ellis Cohen 4th Normal Form & Multivalued Dependencies

42 © Ellis Cohen 4NF: 4th Normal Form
4NF (4th Normal Form) violations can occur when designers (trying to use one less table) put two independent 1:M or M:N relationships into a single table

43 © Ellis Cohen Actual Relations
  EmpProjs( empno, pno ): (*) Employee assigned to (*) Project
  EmpSkills( empno, sklid ): (*) Employee have (*) Skill
[Sample rows pair employees such as 7369 and 2169 with skills such as MSExcel, MSWord and Emacs]
Suppose a designer mistakenly put these into a single table, maintaining one or none of the tables above

44 © Ellis Cohen Join Produces 4NF Violation
[Table: EPS( empno, pno, sklid ), in which each employee's skills (MSExcel, MSWord, Emacs) repeat for every project of that employee]
Notice that the original EmpProjs table is just EPS{ empno, pno ! }, the original EmpSkills table is just EPS{ empno, sklid ! }, and EPS = EmpProjs |X| EmpSkills.
There is redundancy in this table, though it's a bit harder to see than with 2NF/3NF/BCNF violations. The remedy is the same though: decompose to get the original tables back!
Note the Deletion Anomaly: if an employee is decertified from all of their skills, we lose track of the projects to which they are assigned!

45 © Ellis Cohen Asserting Redundancy
[Table: EPS( empno, pno, sklid )]
If we were just given this table, what assertion expresses the redundancy?
  EPS{ empno, pno ! } |X| EPS{ empno, sklid ! } = EPS
This can also be written as the Multivalued Dependencies empno →→ pno and empno →→ sklid

46 © Ellis Cohen MVD Assertion
empno →→ pno (or empno →→ sklid)
  EPS{ empno, pno ! } |X| EPS{ empno, sklid ! } = EPS
  (SELECT * FROM
     (SELECT DISTINCT empno, pno FROM EPS)
     NATURAL JOIN
     (SELECT DISTINCT empno, sklid FROM EPS)) = EPS
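
The project-and-rejoin test behind this assertion can be written directly over sets of rows. The sketch below is plain Python; the helper names `project` and `mvd_holds` and the sample project numbers are assumptions (the skill names follow the slides).

```python
ATTRS = ["empno", "pno", "sklid"]

# Hypothetical data: every project of an employee paired with every skill.
ASSIGNED = {7369: [1, 2], 2169: [1]}
SKILLS = {7369: ["MSExcel", "MSWord", "Emacs"], 2169: ["MSWord"]}
eps = [{"empno": e, "pno": p, "sklid": s}
       for e in ASSIGNED for p in ASSIGNED[e] for s in SKILLS[e]]

def project(rows, cols):
    """Duplicate-free projection of rows onto cols."""
    return {tuple(r[c] for c in cols) for r in rows}

def mvd_holds(rows, lhs, rhs, attrs):
    """Check lhs ->-> rhs: projecting and rejoining must give back rows."""
    rest = [a for a in attrs if a not in lhs and a not in rhs]
    left = project(rows, lhs + rhs)
    right = project(rows, lhs + rest)
    k = len(lhs)
    joined = {l + r[k:] for l in left for r in right if l[:k] == r[:k]}
    return joined == project(rows, lhs + rhs + rest)

print(mvd_holds(eps, ["empno"], ["pno"], ATTRS))      # full cross product
print(mvd_holds(eps[1:], ["empno"], ["pno"], ATTRS))  # one combination missing
```

Dropping a single project/skill combination makes the rejoined table larger than the stored one, so the multivalued dependency no longer holds.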

47 © Ellis Cohen Multivalued Dependencies (MVDs)
Writing that a relation R has a multivalued dependency A →→ B (where both A and B could be individual attributes or sets of attributes) means that if you decomposed R into two tables,
  one with the attributes A + B
  one with the attributes R – B
and joined them together, you would get R back; that is, you'd have a lossless decomposition.
Formally (in REAL): R = R{ A, B ! } |X| R{ (R – B) ! }

48 © Ellis Cohen MVDs & FDs
Given that decompositions based on FDs are all lossless, it should be no surprise that one can prove that
  If A → B Then A →→ B
In other words, FDs are a special case of MVDs

49 © Ellis Cohen MVD's and 4NF
A relation R is in 4NF if it has no MVD A →→ B where
  A is part of a candidate key of R
  B is a proper subset of the attributes in R excluding those in A

50 © Ellis Cohen Ternary Relationships
[Diagram: ternary relationship "can use" among Skill (sklid), Project (pno) and Employee (empno)]
A 4NF violation arises if a designer mistakenly creates a single table for empno, pno and sklid that's equivalent to conceptually treating the relationship between Skill, Project & Employee as a ternary relationship

51 © Ellis Cohen Separate Relationships
BUT: The problem is that there are actually two separate relationships
[Diagram: Employee (empno) related separately to Project (pno) and to Skill (sklid)]
Note: In other cases, these could be 1:M relationships

52 © Ellis Cohen Pseudo-4NF Violations

53 © Ellis Cohen Pseudo-4NF Violations
A pseudo-4NF violation describes other ways that designers attempt to use a single table to represent two independent relationships.
They are ugly because
  They are "non-relational"
  They can be decomposed losslessly, but not based on joins
They are fragile because they are relatively hard to maintain.
However, they don't exhibit the deletion anomalies of real 4NF violations.

54 © Ellis Cohen Union Pseudo-4NF Violation
  (*) Employee assigned to (*) Project
  (*) Employee have (*) Skill
[Table: EPS( empno, pno, sklid ), formed by a union of the EmpProjs and EmpSkills tables, so each row fills in either pno or sklid]
Either don't specify a primary key or use some special value other than null (since null is not allowed as part of a primary key)

55 © Ellis Cohen Union Decomposition Assertion
  EPS[pno IS NOT NULL]{ empno, pno } U EPS[sklid IS NOT NULL]{ empno, sklid } = EPS
  (SELECT empno, pno, NULL AS sklid FROM EPS WHERE pno IS NOT NULL
   UNION
   SELECT empno, NULL AS pno, sklid FROM EPS WHERE sklid IS NOT NULL) = EPS

56 © Ellis Cohen Mixed Pseudo-4NF Violation
  (*) Employee assigned to (*) Project
  (*) Employee have (*) Skill
[Table: EPS( empno, pno, sklid ), in which a row may carry a project, a skill, or both]
Can also resolve by decomposing to represent each relationship in a separate table.
Either don't specify a primary key or use some special value other than null (since null is not allowed as part of a primary key)

57 © Ellis Cohen Mixed Decomposition Assertion
CREATE TABLE NEPS AS
  SELECT empno, pno, sklid, rownum AS n FROM EPS;
CREATE TABLE NEmpProjs AS
  SELECT empno, pno,
         n - (SELECT min(n) FROM NEPS WHERE empno = ne.empno) AS n
  FROM NEPS ne WHERE pno IS NOT NULL;
CREATE TABLE NEmpSkills AS
  SELECT empno, sklid,
         n - (SELECT min(n) FROM NEPS WHERE empno = ne.empno) AS n
  FROM NEPS ne WHERE sklid IS NOT NULL;
(SELECT empno, pno, sklid FROM NEmpProjs NATURAL FULL JOIN NEmpSkills) = EPS

58 © Ellis Cohen 5th Normal Form, also called PJNF: Project Join Normal Form

59 © Ellis Cohen 5NF: 5th Normal Form
A 5NF (5th Normal Form) violation describes any relation that could be decomposed into 2 or more relations, where the relations do not all have the same set of candidate keys, and the relations are joined together to get back the original relation.
These include the 4NF violations, but more complicated ones as well.

60 © Ellis Cohen Actual Relations
  EmpProjs( empno, pno ): (*) Employees assigned to (*) Projects
  EmpSkills( empno, sklid ): (*) Employees have (*) Skills
  ProjSkills( pno, sklid ): (*) Projects require (*) Skills
  EmpSkills
  empno  sklid
  7049   MSExcel
  7049   MSWord
  7049   MSProject
  7369   MSExcel
  7369   MSWord
  7369   Emacs
  7411   MSWord
  7411   Emacs
[Rows of the EmpProjs and ProjSkills tables are not legible in the transcript]
Suppose a designer mistakenly put these into a single table, maintaining some (but not all) of the tables above

61 © Ellis Cohen Join Produces a 5NF Violation
[Table: the join of the three relations, with columns empno, pno, sklid]
This table is not in 5NF because it could be decomposed into the tables on the previous page.
There is redundancy in this table, but it is much harder to see than the 4NF violation.
The table has a tuple for every possible combination in which an employee is assigned to a project, a project needs a skill & an employee has that skill.

62 © Ellis Cohen Ternary Relationships
A 5NF violation arises because the designer mistakenly creates a single table for empno, pno and sklid, equivalent to conceptually treating the relationship between Skill, Project & Employee as a ternary relationship
[Diagram: ternary relationship "can use" among Skill (sklid), Project (pno) and Employee (empno)]

63 © Ellis Cohen Separate Relationships
BUT: The problem is that there are actually three separate relationships
[Diagram: Employee (empno) "assigned to" Project (pno); Skill (sklid) "needed for" Project; Skill "available from" Employee]
A tuple is in the original table whenever an employee is assigned to a project, has a skill, and the project needs the skill. It is NOT a subset of these tuples (i.e. the skills that an employee actually uses on a project!)
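
A small worked instance may make the three-way join dependency more concrete. In the sketch below, all rows, project numbers and names (`emp_projs`, `join3`, and so on) are hypothetical; the data is chosen so that the ternary table is recovered by joining all three projections, while joining only two of them produces a tuple that was never in the table.

```python
# Hypothetical binary relations
emp_projs = {(7049, 1), (7049, 2), (7369, 1)}
emp_skills = {(7049, "MSExcel"), (7049, "MSWord"), (7369, "MSExcel")}
proj_skills = {(1, "MSExcel"), (1, "MSWord"), (2, "MSExcel")}

def join3(ep, es, ps):
    """(empno, pno, sklid) such that all three pairs are present."""
    return {(e, p, s)
            for (e, p) in ep
            for (e2, s) in es
            if e == e2 and (p, s) in ps}

# The mistaken single table is the join of the three relations.
eps = join3(emp_projs, emp_skills, proj_skills)

# Project the ternary table back onto each pair of attributes.
ep = {(e, p) for e, p, s in eps}
es = {(e, s) for e, p, s in eps}
ps = {(p, s) for e, p, s in eps}

three_way_ok = join3(ep, es, ps) == eps

# Joining only two projections (here on empno) is not enough.
two_way = {(e, p, s) for (e, p) in ep for (e2, s) in es if e == e2}
spurious = two_way - eps

print(sorted(eps))
print(three_way_ok, spurious)
```

With this data no single multivalued dependency holds, so the table has no 4NF-style two-table decomposition, yet it is still redundant because the three binary tables reproduce it exactly.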

64 © Ellis Cohen How 4NF/5NF Violations Arise
4NF/5NF violations can occur because
  of bad original design
  a table originally tracked only a subset of the tuples in the join, but that is no longer the case
  an attribute that applied to the combination of all the key values was removed without decomposing the tables

65 © Ellis Cohen Unused Tuple Selectivity
[Table: columns empno, pno, sklid]
The original application design had an EmpSkills table, which kept track of each employee's skills. When adding a new person to their project, it allowed a project manager to specify (based on EmpSkills) the skills they would use for each person assigned to their project. This required a ternary table, so no separate empno/pno assignment table was created.
But project managers always selected all the employee's skills they might possibly need on the project, so all the tuples in the join were included. Even after this was realized, the table was not decomposed, even though it could have been.

66 © Ellis Cohen Unused Attribute Columns
[Table: columns empno, pno, sklid, effectiveness]
The original application design required project managers to rate the effectiveness of each skill of each person assigned to their project (NULL if they hadn't used the skill yet on the project). This required a ternary table, so no separate empno/pno assignment table was created.
The project managers stopped using the effectiveness ratings, so the UI and the attribute for it were eliminated, but the table was never decomposed.

67 © Ellis Cohen Eliminating Assertions with Referential Integrity

68 © Ellis Cohen Assertions → Built-in Constraints
Each of the preceding normalization techniques used normalization to eliminate redundancy.
Redundancy causes anomalies or requires an assertion to be enforced (e.g. deptno → dname).
Normalization eliminates the need for an assertion; enforcement is provided instead by foreign key constraints.

69 © Ellis Cohen Assertions & Foreign Keys
Assertions can more generally be eliminated (or simplified) by using foreign keys.
Example: All project managers are dept managers
  EACH Emps e WHERE ( SOME Projs p SATISFIES e.empno = p.pmgr)
  SATISFIES e.job = 'DEPTMGR'
  Emps( empno, ename, job, hiredate, sal, comm, deptno )
  Projs( pno, pname, pmgr, persons, budget, pstart, pend )
Given this relational model, what kinds of operations might cause this assertion to be violated?

70 © Ellis Cohen Assertion Enforcement
  Emps( empno, ename, job, hiredate, sal, comm, deptno )
  Projs( pno, pname, pmgr, persons, budget, pstart, pend )
Enforcing: All project managers are department managers
If application-based enforcement is used, the application must check the assertion whenever it
  assigns a new project manager to a project
  changes the project manager of a project
  changes the job of an employee! (oops, that might easily be missed)
These involve looking in both the Emps and Projs tables.
Can we change the relational model to make enforcement easier?

71 © Ellis Cohen Adding Intermediate Table
Add a table DeptMgrs which just has the empno's of DeptMgrs
  Emps( empno, ename, job, hiredate, sal, comm, deptno )
  DeptMgrs( empno )
  Projs( pno, pname, pmgr, persons, budget, pstart, pend ), with pmgr as a foreign key referencing DeptMgrs
Since DeptMgrs holds the empno's of all the DeptMgrs, this foreign key constraint enforces the assertion!
Have we completely eliminated the need for an assertion?

72 © Ellis Cohen DeptMgr Assertion
  Emps( empno, ename, job, hiredate, sal, comm, deptno )
  DeptMgrs( empno )
We still need to assert that DeptMgrs corresponds to the DeptMgrs in Emps. That is,
  (SELECT empno FROM Emps WHERE job = 'DEPTMGR') = (SELECT empno FROM DeptMgrs)
When would an application need to enforce this?

73 © Ellis Cohen Enforcing DeptMgr Assertion
The DeptMgrs tuples correspond to the employees who have the job 'DEPTMGR'.
An application must enforce this assertion whenever it inserts, deletes or changes the job of an employee:
  When inserting a 'DEPTMGR' into Emps, or changing an employee's job to 'DEPTMGR', insert empno into DeptMgrs
  When deleting a 'DEPTMGR' from Emps, or changing an employee's job from 'DEPTMGR', delete empno from DeptMgrs (DELETE handled by cascading delete)
Only need to check the tuple affected; very easy to enforce!
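
The slides leave this bookkeeping to the application. As one possible variation, the sketch below moves it into SQLite triggers, combined with the foreign key from Projs to DeptMgrs; this is a swapped-in technique, not the slides' method. The trigger names, the reduced column lists, the sample rows and the helper `attempt` are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Emps (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT);
CREATE TABLE DeptMgrs (
    empno INTEGER PRIMARY KEY REFERENCES Emps (empno) ON DELETE CASCADE
);
CREATE TABLE Projs (
    pno INTEGER PRIMARY KEY, pname TEXT,
    pmgr INTEGER REFERENCES DeptMgrs (empno)  -- pmgr must be a dept manager
);
-- Keep DeptMgrs in step with Emps.job; only the affected tuple is touched.
CREATE TRIGGER emp_ins AFTER INSERT ON Emps WHEN NEW.job = 'DEPTMGR'
BEGIN INSERT INTO DeptMgrs VALUES (NEW.empno); END;
CREATE TRIGGER emp_promote AFTER UPDATE OF job ON Emps
WHEN NEW.job = 'DEPTMGR' AND OLD.job IS NOT 'DEPTMGR'
BEGIN INSERT INTO DeptMgrs VALUES (NEW.empno); END;
CREATE TRIGGER emp_demote AFTER UPDATE OF job ON Emps
WHEN OLD.job = 'DEPTMGR' AND NEW.job IS NOT 'DEPTMGR'
BEGIN DELETE FROM DeptMgrs WHERE empno = NEW.empno; END;

INSERT INTO Emps VALUES (7839, 'KING', 'DEPTMGR'), (7369, 'SMITH', 'CLERK');
INSERT INTO Projs VALUES (1, 'Payroll', 7839);
""")

def attempt(sql):
    """Run one statement; return 'ok' or the constraint error text."""
    try:
        with conn:
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

clerk_as_pmgr = attempt("INSERT INTO Projs VALUES (2, 'Audit', 7369)")
demote_pmgr = attempt("UPDATE Emps SET job = 'CLERK' WHERE empno = 7839")
promote = attempt("UPDATE Emps SET job = 'DEPTMGR' WHERE empno = 7369")
print(clerk_as_pmgr, demote_pmgr, promote, sep="\n")
```

In this sketch the foreign key rejects a clerk as project manager, and it also blocks demoting a manager who still runs a project, because the trigger's delete from DeptMgrs would orphan the Projs row.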

74 © Ellis Cohen Assertion Elimination/Simplification & Subclasses
Assertion elimination/simplification corresponds to adding subclasses
[Diagram: Employee (empno) and Project (pno), before and after adding DeptMgr as a subclass of Employee]