1 Theory, Practice & Methodology of Relational Database Design and Programming Copyright © Ellis Cohen Advanced Normalization These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. For more information on how you may use them, please see
2 Overview of Lecture
Lossless & Dependency-Preserving Decomposition
Boyce Codd Normal Form
Standard Decomposition of Non-Symmetric BCNF Violations
Decomposition of Non-Symmetric BCNF Violations Using Inclusion Constraints
Conceptual BCNF Resolution
4th Normal Form & Multivalued Dependencies
Pseudo-4NF Violations
5th Normal Form
Eliminating Assertions with Referential Integrity
3 Lossless & Dependency-Preserving Decomposition
4 Resolving a Pure 3NF Violation
Emps old( empno, ename, address, deptno, dname ), with deptno → dname
Emps new( empno, ename, address, deptno )
Depts( deptno, dname )
5 Lossless Decompositions
When you decompose a relation into two or more relations, the result of joining them should be equivalent to the original relation.
Is Emps old = Emps new |X| Depts ?
(SELECT * FROM Emps old ) = (SELECT * FROM Emps new NATURAL JOIN Depts) ?
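The lossless-join question above can be checked mechanically. The following is a minimal sketch using Python's built-in sqlite3 module; the table names EmpsOld/EmpsNew and all sample rows are assumptions made for illustration, not data from the slides.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EmpsOld(empno INTEGER PRIMARY KEY, ename TEXT, address TEXT,
                     deptno INTEGER, dname TEXT);
-- Illustrative rows only (no NULLs, as the lecture assumes)
INSERT INTO EmpsOld VALUES
  (7369, 'SMITH', '1 Elm St', 20, 'RESEARCH'),
  (7499, 'ALLEN', '2 Oak St', 30, 'SALES'),
  (7521, 'WARD',  '3 Ash St', 30, 'SALES');
-- The decomposition: project out the two new tables
CREATE TABLE EmpsNew AS SELECT empno, ename, address, deptno FROM EmpsOld;
CREATE TABLE Depts   AS SELECT DISTINCT deptno, dname FROM EmpsOld;
""")

def rows(sql):
    """Run a query and return its rows in a canonical (sorted) order."""
    return sorted(con.execute(sql).fetchall())

original = rows("SELECT empno, ename, address, deptno, dname FROM EmpsOld")
rejoined = rows("SELECT empno, ename, address, deptno, dname "
                "FROM EmpsNew NATURAL JOIN Depts")

# Lossless: joining the pieces gives back exactly the original relation
is_lossless = (original == rejoined)
```

Here `is_lossless` compares the two result sets row for row, which is what the `=` between the two SELECTs on the slide expresses.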
6 NULLs & Normalization
If Emps old had any NULL deptno's, then Emps old ≠ Emps new |X| Depts (although Emps old = Emps new :X| Depts, where :X| is the Natural Left Join)
In general, NULLs complicate Functional Dependencies and Normalization
We will assume that relations do NOT have NULLs. In other words, assume that all NULLs have been turned into domain-specific special values (e.g. -1 for deptno, ' ' for dname)
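A small sketch of the NULL problem, again with Python's sqlite3 module. The employee KING with a NULL deptno and the other rows are assumed sample data; the point is only that the natural join drops the NULL row while the natural left join keeps it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EmpsNew(empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
CREATE TABLE Depts(deptno INTEGER PRIMARY KEY, dname TEXT);
INSERT INTO Depts VALUES (20, 'RESEARCH'), (30, 'SALES');
-- KING has no department (assumed example row)
INSERT INTO EmpsNew VALUES (7369, 'SMITH', 20), (7499, 'ALLEN', 30),
                           (7839, 'KING', NULL);
""")

# Natural (inner) join: NULL never equals anything, so KING disappears
inner = con.execute(
    "SELECT empno FROM EmpsNew NATURAL JOIN Depts ORDER BY empno").fetchall()

# Natural left join: KING is kept, with a NULL dname
left = con.execute(
    "SELECT empno, dname FROM EmpsNew NATURAL LEFT JOIN Depts ORDER BY empno"
).fetchall()
```

Replacing the NULL with a special value such as -1, plus a matching Depts row, would let the plain natural join recover every employee, which is the assumption the lecture adopts.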
7 Avoid Decompositions that Lose FD's
Emps( empno, deptno, dname )
empno → deptno, dname
deptno → dname
Decomposed into Emps1( empno, deptno ) and Emps2( empno, dname )
Emps2
empno  dname
7369   RESEARCH
7499   SALES
7521   SALES
7839   ACCOUNTING
8 Dependency Preserving Decompositions
Consider decomposition of Emps into
Emps1: (empno, deptno) with empno → deptno
Emps2: (empno, dname) with empno → dname
It is a lossless decomposition: you can get the original Emps back by Emps1 |X| Emps2
BUT it is not dependency-preserving: you can no longer directly infer the FD deptno → dname
–Still update anomalies
–Still redundancies, but they are hidden, because they are spread across the new Emps1 & Emps2
9 Enforcing Lost Dependencies
After decomposing Emps into Emps1 and Emps2, if we still want to enforce deptno → dname, we would have to do so by enforcing a state assertion
(SELECT count(DISTINCT dname) FROM (Emps1 NATURAL JOIN Emps2) GROUP BY deptno) ALL = 1
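Most SQL engines cannot declare such a state assertion directly, so an application would typically run a query that lists the violating groups. The sketch below, using Python's sqlite3 module, rephrases the assertion as a HAVING query; the sample rows and the 'MARKETING' update are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Emps1(empno INTEGER PRIMARY KEY, deptno INTEGER);
CREATE TABLE Emps2(empno INTEGER PRIMARY KEY, dname TEXT);
INSERT INTO Emps1 VALUES (7369, 20), (7499, 30), (7521, 30);
INSERT INTO Emps2 VALUES (7369, 'RESEARCH'), (7499, 'SALES'), (7521, 'SALES');
""")

# deptno → dname holds iff no deptno is joined to more than one dname
VIOLATIONS = """
SELECT deptno FROM Emps1 NATURAL JOIN Emps2
GROUP BY deptno HAVING count(DISTINCT dname) <> 1
"""

def fd_violations():
    """Return the deptno values that break deptno → dname."""
    return [r[0] for r in con.execute(VIOLATIONS)]

before = fd_violations()
# An update that touches only Emps2 silently breaks the hidden FD
con.execute("UPDATE Emps2 SET dname = 'MARKETING' WHERE empno = 7521")
after = fd_violations()
```

The update is legal for Emps2 on its own, which is why the lost dependency can only be caught by a check that joins both tables.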
10 Combine Corresponding Tables
To help ensure that FD's are made visible: if two tables
–both have a unique attribute (or set of attributes) with the same meaning
–have the same set of values for those attributes
then combine the tables
Both Emps1 and Emps2 have an empno attribute, with the same meaning, and covering the same set of values, so combine Emps1 & Emps2 into Emps
11 Normalization & Decomposition
Goals of Relational Normalization
Eliminate redundancy
Enable conceptual constraints (e.g. FD's) to be enforced by the database directly (e.g. uniqueness constraints & foreign keys)
–Any remaining FD's must be enforced as state assertions (which is more expensive)
Make sure decompositions are
–Lossless
–Dependency Preserving
12 Boyce Codd Normal Form
13 BCNF: Boyce Codd Normal Form
A table is in 3NF if all the non-prime attributes are minimally determined by a candidate key
A table is in BCNF if all attributes are minimally determined by a candidate key
14 2NF/3NF/BCNF Diagrams
[Diagrams over prime and non-prime attributes: 2NF Partial Key Violation, 3NF Transitivity Violation (Pure and Mixed), BCNF Violation]
15 Symmetric BCNF Violation
Depts( divid, divnam, dname, loc )
Prime attributes: divid, divnam, dname; non-prime attribute: loc
divid → divnam and divnam → divid
Department names are unique within a division
Divisions are uniquely identified by either their name (divnam) or id (divid)
16 Resolving Symmetric BCNF Violation
Before decomposition:
divid  divnam           dname       loc
G      Georgia Pacific  ACCOUNTING  Atlanta
G      Georgia Pacific  SALES       Birmingham
G      Georgia Pacific  OPERATIONS  Atlanta
S      Southern Rail    ACCOUNTING  Charlotte
divid + dname and divnam + dname are candidate keys; loc is non-prime
divid → divnam & divnam → divid
After decomposition:
divid  dname       loc
G      ACCOUNTING  Atlanta
G      SALES       Birmingham
G      OPERATIONS  Atlanta
S      ACCOUNTING  Charlotte

divid  divnam
G      Georgia Pacific
S      Southern Rail
17 Symmetric BCNF Normalization
Depts old( divid, divnam, dname, loc ) with divid → divnam
Divisions( divid, divnam )
Depts new( divid, dname, loc )
Normalize by
1) Create a new relation with the determinant and everything dependent upon it
2) Remove everything dependent upon the determinant from the original relation
3) Add a foreign key constraint, possibly with cascading delete
What's the corresponding conceptual normalization?
18 Conceptual Symmetric BCNF Normalization
Dept( divid, divnam, dname, loc ) with divid → divnam
becomes Dept( dname, loc ) part of Division( divid, divnam )
Resolve exactly like 2NF Violations
19 Standard Decomposition of Non-Symmetric BCNF Violations
20 Problem of Non-Symmetric BCNF
[Diagram: prime attributes A, B, C and non-prime attribute D]
The usual approach to resolving NF violations, when used with BCNF violations, leads to a decomposition that is NOT dependency-preserving
21 Non-Symmetric BCNF Example
Each project has mentors who work only with that project
Each employee assigned to a project (for a specified # of hrs per week) is also assigned one of the project mentors
Assignment( empno, ename, pno, pname, mentid, mname, hrs )
Candidate Keys
–empno + pno
–empno + mentid (since mentid → pno)
Prime Attributes
–empno, pno, mentid
FD's
–empno → ename
–pno → pname
–empno + pno → hrs, mentid
–mentid → mname, pno
Non-symmetric BCNF violation since mentid → pno, but NOT( pno → mentid )
22 Non-Symmetric BCNF Violation
Assignment: prime attributes empno, pno, mentid; non-prime attribute hrs
Candidate Key: empno + pno
mentid → pno
23 Resolve BCNF Violation
Resolve mentid → pno
Mentors( mentid, mname, pno, pname )
Assignments( mentid, empno, ename, hrs )
Corresponding Conceptual Model: Mentor( mentid, mname, pno, pname ), Assignment( empno, ename, hrs )
Next resolve empno → ename and pno → pname
24 BCNF & Dependency Preservation
Assignment FD's resolved so far: empno → ename, mentid → mname, pno → pname, empno + mentid → hrs, mentid → pno
Mentors( mentid, mname, pno )
Assigns( mentid, empno, hrs )
Emps( empno, ename )
Projects( pno, pname )
Conceptual model: Mentor( mentid, mname ), Employee( empno, ename ), Project( pno, pname ), with hrs on the assignment
Everything is in BCNF & all decompositions are lossless, but empno + pno → mentid never got resolved, and can no longer be inferred! It must be added as an assertion!
25 FD Violation
Mentors( mentid, mname, pno ), Assigns( mentid, empno, hrs ), Emps( empno, ename ), Projects( pno, pname )
Possible violation of empno + pno → mentid
Mentors
mentid  mname  pno
1111    JONES  32
2222    SMITH  32
Assigns holds rows for ( mentid 1111, empno 4567 ) and ( mentid 2222, empno 4567 )
Employee 4567 works with mentor 1111, assigned to proj 32
Employee 4567 works with mentor 2222, assigned to proj 32
So, 4567 works on proj 32, with mentors 1111 & 2222. OOPS!
26 Required FD Assertion
Mentors( mentid, mname, pno ), Assigns( mentid, empno, hrs ), Emps( empno, ename ), Projects( pno, pname )
empno + pno → mentid
(SELECT count(DISTINCT mentid) FROM (Assigns NATURAL JOIN Mentors) GROUP BY empno, pno) ALL = 1
How many mentors can have the same empno & pno? The answer must always be one!
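As a sketch of how this assertion could be checked in practice, the query below (run through Python's sqlite3 module) lists every empno/pno pair with more than one mentor. The mentor and employee numbers follow the violation example; the hrs values are assumed, since they are not given.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Mentors(mentid INTEGER PRIMARY KEY, mname TEXT,
                     pno INTEGER NOT NULL);
CREATE TABLE Assigns(mentid INTEGER REFERENCES Mentors, empno INTEGER,
                     hrs REAL, PRIMARY KEY (empno, mentid));
INSERT INTO Mentors VALUES (1111, 'JONES', 32), (2222, 'SMITH', 32);
-- hrs values (10 and 5) are made up for the example
INSERT INTO Assigns VALUES (1111, 4567, 10), (2222, 4567, 5);
""")

# Every table constraint is satisfied, yet empno + pno → mentid is broken:
# the join reveals one employee with two mentors on the same project.
violations = con.execute("""
    SELECT empno, pno, count(DISTINCT mentid)
    FROM Assigns NATURAL JOIN Mentors
    GROUP BY empno, pno
    HAVING count(DISTINCT mentid) <> 1
""").fetchall()
```

An empty result means the assertion holds; here the single returned row identifies employee 4567 on project 32 with two mentors.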
27 Decomposing BCNF Violations Using Inclusion Constraints
28 How to Tame BCNF Violations
Assignments( mentid, empno, pno, pname, ename, mname, hrs ) with empno + pno → mentid and mentid → pno
Resolve mentid → pno, with both empno + mentid and empno + pno as candidate keys
Mentors( mentid, mname, pno !, pname )
Assignments( mentid, empno, pno, ename, hrs )
With BCNF decomposition, pno remains in the original table, but it is constrained by an inclusion constraint. This keeps the decomposition dependency-preserving, while "taming" the mentid → pno FD still left in Assignments
29 Inclusion Constraints
Mentors( mentid, mname, pno !, pname )
Assignments( mentid, empno, pno, ename, hrs ) with empno + pno → mentid and mentid → pno
The inclusion constraint requires declaring mentid + pno in Mentors as unique. That may seem unnecessary, since mentid is already a PK. However, it forces the DB to maintain an index on mentid + pno, so the foreign key reference can check that every pair of mentid + pno's in Assignments is also in Mentors
Even though the mentid → pno FD is still in Assignments, it is enforced by the inclusion constraint with Mentors, so no state assertion is needed
30 SQL for BCNF Decomposition
CREATE TABLE Mentors(
  mentid number(3) primary key,
  mname varchar(20),
  pno number(4) not null,
  pname varchar(20),
  unique( mentid, pno ) );
CREATE TABLE Assignments(
  mentid number(3),
  empno number(4),
  pno number(4),
  ename varchar(30),
  hrs number(5,1),
  primary key( empno, pno ),
  foreign key( mentid, pno ) references Mentors( mentid, pno ) );
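The effect of these declarations can be tried out with Python's sqlite3 module. This is only a sketch: SQLite column types replace the Oracle-style number/varchar types, foreign key enforcement has to be switched on with a PRAGMA, and the sample rows (names such as 'P32' and 'E4567', the hrs values) are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
con.executescript("""
CREATE TABLE Mentors(
  mentid INTEGER PRIMARY KEY, mname TEXT, pno INTEGER NOT NULL, pname TEXT,
  UNIQUE (mentid, pno));
CREATE TABLE Assignments(
  mentid INTEGER, empno INTEGER, pno INTEGER, ename TEXT, hrs REAL,
  PRIMARY KEY (empno, pno),
  FOREIGN KEY (mentid, pno) REFERENCES Mentors(mentid, pno));
INSERT INTO Mentors VALUES (1111, 'JONES', 32, 'P32'), (2222, 'SMITH', 32, 'P32');
INSERT INTO Assignments VALUES (1111, 4567, 32, 'E4567', 10);
""")

def try_insert(row):
    """Insert an Assignments row; return None on success, else the error text."""
    try:
        con.execute("INSERT INTO Assignments VALUES (?, ?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

# Mentor 2222 belongs to project 32: pairing it with project 99 would break
# mentid → pno, and the inclusion constraint (composite FK) rejects it.
wrong_project = try_insert((2222, 7777, 99, 'E7777', 5))

# Employee 4567 already has a mentor on project 32: a second one would break
# empno + pno → mentid, and the primary key rejects it.
second_mentor = try_insert((2222, 4567, 32, 'E4567', 5))

# A consistent row is accepted.
accepted = try_insert((2222, 7777, 32, 'E7777', 5))
```

Both FDs are enforced by declared constraints alone, so no separate state assertion has to be checked by the application.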
31 Fully Normalized Relational Model
Before: Mentors( mentid, mname, pno !, pname ), Assignments( mentid, empno, pno, ename, hrs )
After: Mentors( mentid, mname, pno ! ), Assigns( mentid, empno, pno, hrs ), Emps( empno, ename ), Projects( pno, pname )
32 Conceptual BCNF Resolution
33 2NF/3NF Relational Normalization
Suppose we resolve everything EXCEPT mentid → pno
FD's: mentid → mname, pno → pname, empno + pno → mentid, empno → ename, empno + pno → hrs
Mentors( mentid, mname )
Assigns( pno, empno, hrs, mentid )
Emps( empno, ename )
Projects( pno, pname )
mentid → pno still violates BCNF & would need to be enforced by an assertion
34 2NF/3NF Conceptual Normalization
Corresponding conceptual model: Employee( empno, ename ), Project( pno, pname ), Mentor( mentid, mname ), Assignment( hrs )
FD's from Uniqueness Constraints & Key Constraints: mentid → mname, pno → pname, empno + pno → mentid, empno → ename, empno + pno → hrs
mentid → pno: BCNF violation hidden in relational mapping of Assignment
35 Modeling Considerations
Consider the most natural conceptual model corresponding to enforcement of mentid → pno
This represents a key constraint between Mentor and Project, i.e. every mentor is associated with a single project
Since there is no symmetric constraint, we can assume that a project can have multiple mentors.
This implies a 1:M relationship between Project and Mentor, which we will add to the conceptual model
36 Natural Conceptual Model
Employee( empno, ename ), Project( pno, pname ), Mentor( mentid, mname ), Assignment
FD's: mentid → mname, pno → pname, empno + pno → mentid, empno → ename, empno + pno → hrs, mentid → pno
"Natural" because all FD's are derivable from key & uniqueness constraints
What is the relational model?
37 Relational Mapping of Natural Conceptual Model
Mentors( mentid, mname, pno ! )
Assigns( pno, empno, hrs, mentid )
Emps( empno, ename )
Projects( pno, pname )
But this has added redundancy, and now an assertion is needed to ensure that each pair of mentid/pno in Assigns is also in Mentors:
(SELECT DISTINCT pno, mentid FROM Assigns) ⊆ (SELECT pno, mentid FROM Mentors)
This is exactly the inclusion dependency
38 Factored Conceptual Model
Employee( empno, ename ), Project( pno, pname ), Mentor( mentid, mname ), Assignment mentored by Mentor (by Project)
This corresponds to adding the conceptual state constraint: a mentor can mentor many assignments, but they are all for the mentor's project. It is represented by factoring the mentored by relationship
39 Relational Mapping of Factoring
Mentors( mentid, mname, pno ! ), Assigns( empno, hrs, pno, mentid ), Emps( empno, ename ), Projects( pno, pname )
The factored relationship is modeled by an inclusion constraint on mentid + pno, which directly enforces the inclusion dependency expressed by the factored relationship
Foreign key labels in the diagram: "Implements factored relationship" and "Redundant except for cascading delete!"
40 Eliminating Redundancy
Before: Mentors( mentid, mname, pno ! ), Assigns( empno, hrs, pno, mentid ), Emps( empno, ename ), Projects( pno, pname )
Eliminate cascading delete and just add it as an explicit business rule
After: Mentors( mentid, mname, pno ! ), Assigns( mentid, empno, pno, hrs ), Emps( empno, ename ), Projects( pno, pname )
Results in same model as when using BCNF normalization rule
41 4th Normal Form & Multivalued Dependencies
42 4NF: 4th Normal Form
4NF (4th Normal Form) violations can occur when designers (trying to use one less table) put two independent 1:M or M:N relationships into a single table
43 Actual Relations
EmpProjs( empno, pno ): (*) Employee assigned to (*) Project
EmpSkills( empno, sklid ): (*) Employee has (*) Skill
EmpSkills sample rows include ( 7369, MSWord ), ( 7369, Emacs ), ( 2169, MSWord )
Suppose a designer mistakenly put these into a single table, maintaining one or none of the tables above
44 Join Produces 4NF Violation
[Table EPS( empno, pno, sklid ): every project of an employee paired with every skill of that employee]
Notice that the original EmpProjs table is just EPS{ empno, pno ! }, the original EmpSkills table is just EPS{ empno, sklid ! }, and EPS = EmpProjs |X| EmpSkills
There is redundancy in this table, though it's a bit harder to see than with 2NF/3NF/BCNF violations. The remedy is the same though: decompose to get the original tables back!
Note the Deletion Anomaly: If an employee is decertified from all of their skills, we lose track of the projects to which they are assigned!
45 Asserting Redundancy
[Table EPS( empno, pno, sklid )]
If we were just given this table, what assertion expresses the redundancy?
EPS{ empno, pno ! } |X| EPS{ empno, sklid ! } = EPS
This can also be written as the Multivalued Dependencies empno →→ pno and empno →→ sklid
46 MVD Assertion
empno →→ pno (or empno →→ sklid)
EPS{ empno, pno ! } |X| EPS{ empno, sklid ! } = EPS
(SELECT * FROM (SELECT DISTINCT empno, pno FROM EPS) NATURAL JOIN (SELECT DISTINCT empno, sklid FROM EPS)) = EPS
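The MVD assertion amounts to comparing EPS with the join of its two projections. A minimal sketch with Python's sqlite3 module follows; the pno values 1 and 2 and the particular rows are assumed for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EPS(empno INTEGER, pno INTEGER, sklid TEXT);
-- Employee 7369: projects {1, 2} x skills {MSWord, Emacs}; 2169: one of each
INSERT INTO EPS VALUES
  (7369, 1, 'MSWord'), (7369, 1, 'Emacs'),
  (7369, 2, 'MSWord'), (7369, 2, 'Emacs'),
  (2169, 1, 'MSWord');
""")

REJOIN = """
SELECT empno, pno, sklid
FROM (SELECT DISTINCT empno, pno FROM EPS) AS ep
NATURAL JOIN (SELECT DISTINCT empno, sklid FROM EPS) AS es
"""

def mvd_holds():
    """True when EPS equals the join of its two projections."""
    joined = set(con.execute(REJOIN))
    actual = set(con.execute("SELECT empno, pno, sklid FROM EPS"))
    return joined == actual

before = mvd_holds()
# Removing a single combination leaves both projections unchanged,
# so the join regenerates the deleted row and the MVD no longer holds.
con.execute("DELETE FROM EPS WHERE empno = 7369 AND pno = 2 AND sklid = 'Emacs'")
after = mvd_holds()
```

This also shows why maintaining EPS is fragile: every insert or delete of one project or skill has to touch all the paired rows to keep the assertion true.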
47 Multivalued Dependencies (MVDs)
Writing that a relation R has a multivalued dependency A →→ B (where both A and B could be individual attributes or sets of attributes) means that if you decomposed R into two tables,
–one with the attributes A + B
–one with the attributes R – B
and joined them together, you would get R back; that is, you'd have a lossless decomposition.
Formally (in REAL) R = R{ A, B ! } |X| R{ (R – B) ! }
48 MVDs & FDs
Given that decompositions based on FDs are all lossless, it should be no surprise that one can prove that
If A → B Then A →→ B
In other words, FDs are a special case of MVDs
49 MVD's and 4NF
A relation R is in 4NF if it has no MVD A →→ B where
–A is part of a candidate key of R
–B is a proper subset of the attributes in R excluding those in A
50 Ternary Relationships
A 4NF violation arises if a designer mistakenly creates a single table for empno, pno and sklid. That's equivalent to conceptually treating the relationship between Skill, Project & Employee as a ternary relationship ("can use") among Employee( empno ), Project( pno ) and Skill( sklid )
51 Separate Relationships
BUT: The problem is that there are actually two separate relationships: one between Employee( empno ) and Project( pno ), and one between Employee( empno ) and Skill( sklid )
Note: In other cases, these could be 1:M relationships
52 Pseudo-4NF Violations
53 Pseudo-4NF Violations
A pseudo-4NF violation describes other ways that designers attempt to use a single table to represent two independent relationships.
They are ugly because
–They are "non-relational"
–They can be decomposed losslessly, but not based on joins
They are fragile because they are relatively hard to maintain
However, they don't exhibit the deletion anomalies of real 4NF violations
54 Union Pseudo-4NF Violation
(*) Employee assigned to (*) Project
(*) Employee has (*) Skill
EPS( empno, pno, sklid ) is formed by a union of the EmpProjs and EmpSkills tables
Either don't specify a primary key or use some special value other than null (since null is not allowed as part of a primary key)
55 Union Decomposition Assertion
EPS[pno IS NOT NULL]{ empno, pno } U EPS[sklid IS NOT NULL]{ empno, sklid } = EPS
(SELECT empno, pno, NULL AS sklid FROM EPS WHERE pno IS NOT NULL
UNION
SELECT empno, NULL AS pno, sklid FROM EPS WHERE sklid IS NOT NULL) = EPS
56 Mixed Pseudo-4NF Violation
(*) Employee assigned to (*) Project
(*) Employee has (*) Skill
[Table EPS( empno, pno, sklid )]
Can also resolve by decomposing to represent each relationship in a separate table
Either don't specify a primary key or use some special value other than null (since null is not allowed as part of a primary key)
57 Mixed Decomposition Assertion
CREATE TABLE NEPS AS
  SELECT empno, pno, sklid, rownum AS n FROM EPS;
CREATE TABLE NEmpProjs AS
  SELECT empno, pno, n - (SELECT min(n) FROM NEPS WHERE empno = ne.empno) AS n
  FROM NEPS ne WHERE pno IS NOT NULL;
CREATE TABLE NEmpSkills AS
  SELECT empno, sklid, n - (SELECT min(n) FROM NEPS WHERE empno = ne.empno) AS n
  FROM NEPS ne WHERE sklid IS NOT NULL;
(SELECT empno, pno, sklid FROM NEmpProjs NATURAL FULL JOIN NEmpSkills) = EPS
58 5th Normal Form, also called PJNF: Project Join Normal Form
59 5NF: 5th Normal Form
A 5NF (5th Normal Form) violation describes any relation that could be decomposed into 2 or more relations, where the relations do not all have the same set of candidate keys, and the relations can be joined together to get back the original relation
These include the 4NF violations, but more complicated ones as well
60 Actual Relations
( empno, pno ): (*) Employees assigned to (*) Projects
( empno, sklid ): (*) Employees have (*) Skills
( pno, sklid ): (*) Projects require (*) Skills
Sample ( empno, sklid ) rows:
empno  sklid
7049   MSExcel
7049   MSWord
7049   MSProject
7369   MSExcel
7369   MSWord
7369   Emacs
7411   MSWord
7411   Emacs
Suppose a designer mistakenly put these into a single table, maintaining some (but not all) of the tables above
61 Join Produces a 5NF Violation
[Table with columns empno, pno, sklid]
This table is not in 5NF because it could be decomposed into the tables on the previous page
There is redundancy in this table, but it is much harder to see than the 4NF violation
The table has a tuple for every possible combination in which an employee is assigned to a project, a project needs a skill & an employee has that skill
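To make the hidden redundancy visible, the sketch below (Python's sqlite3 module) projects a small ternary table onto its three pairs of columns and joins them back. The rows and the view names EP, ES and PS are assumptions chosen so that two projections are not enough to rebuild the table, while all three are.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EPS(empno INTEGER, pno INTEGER, sklid TEXT);
INSERT INTO EPS VALUES
  (7049, 1, 'MSExcel'), (7049, 1, 'MSWord'),
  (7369, 1, 'MSExcel'),
  (7369, 2, 'MSExcel'), (7369, 2, 'Emacs');
-- The three binary projections (hypothetical view names)
CREATE VIEW EP AS SELECT DISTINCT empno, pno   FROM EPS;
CREATE VIEW ES AS SELECT DISTINCT empno, sklid FROM EPS;
CREATE VIEW PS AS SELECT DISTINCT pno, sklid   FROM EPS;
""")

def fetch(sql):
    """Return a query result as a set of tuples."""
    return set(con.execute(sql))

eps = fetch("SELECT empno, pno, sklid FROM EPS")

# Joining only two projections invents a tuple that was never in EPS
two_way = fetch("""
    SELECT EP.empno, EP.pno, ES.sklid
    FROM EP JOIN ES ON ES.empno = EP.empno""")
spurious = two_way - eps

# Joining all three projections gives back exactly EPS
three_way = fetch("""
    SELECT EP.empno, EP.pno, ES.sklid
    FROM EP JOIN ES ON ES.empno = EP.empno
            JOIN PS ON PS.pno = EP.pno AND PS.sklid = ES.sklid""")
```

In this data the table has no lossless two-table split on empno, which is why the violation is harder to spot than a 4NF one, yet the three-way decomposition is lossless.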
62 Ternary Relationships
A 5NF violation arises because the designer mistakenly creates a single table for empno, pno and sklid, equivalent to conceptually treating the relationship between Skill, Project & Employee as a ternary relationship ("can use") among Employee( empno ), Project( pno ) and Skill( sklid )
63 Separate Relationships
BUT: The problem is that there are actually three separate relationships among Employee( empno ), Project( pno ) and Skill( sklid ): assigned to, needed for, and available from
A tuple is in the original table whenever an employee is assigned to a project, has a skill, and the project needs the skill. It is NOT a subset of these tuples (i.e. the skills that an employee actually uses on a project!)
64 How 4NF/5NF Violations Arise
4NF/5NF Violations can occur because
–of bad original design
–a table originally tracked only a subset of the tuples in the join, but that is no longer the case
–an attribute that applied to the combination of all the key values was removed without decomposing the tables
65 Unused Tuple Selectivity
[Table with columns empno, pno, sklid]
The original application design had an EmpSkills table, which kept track of each employee's skills
When adding a new person to their project, it allowed a project manager to specify (based on EmpSkills) the skills they would use for each person assigned to their project. This required a ternary table, so no separate empno/pno assignment table was created.
But project managers always selected all of the employee's skills they might possibly need on the project, so all the tuples in the join were included. Even after this was realized, the table was not decomposed, even though it could have been.
66 Unused Attribute Columns
[Table with columns empno, pno, sklid, effectiveness]
The original application design required project managers to rate the effectiveness of each skill of each person assigned to their project (NULL if they hadn't used the skill yet on the project). This required a ternary table, so no separate empno/pno assignment table was created.
The project managers stopped using the effectiveness ratings, so the UI and the attribute for it were eliminated, but the table was never decomposed.
67 Eliminating Assertions with Referential Integrity
68 Assertions → Built-in Constraints
Each of the preceding normalization techniques used decomposition to eliminate redundancy
Redundancy causes anomalies or requires an assertion to be enforced (e.g. deptno → dname)
Normalization eliminates the need for an assertion; enforcement is provided instead by foreign key constraints.
69 Assertions & Foreign Keys
Assertions can more generally be eliminated (or simplified) by using foreign keys
Example: All project managers are dept managers
EACH Emps e WHERE ( SOME Projs p SATISFIES e.empno = p.pmgr) SATISFIES e.job = 'DEPTMGR'
Emps( empno, ename, job, hiredate, sal, comm, deptno )
Projs( pno, pname, pmgr, persons, budget, pstart, pend )
Given this relational model, what kinds of operations might cause this assertion to be violated?
70 Assertion Enforcement
Emps( empno, ename, job, hiredate, sal, comm, deptno )
Projs( pno, pname, pmgr, persons, budget, pstart, pend )
Enforcing: All project managers are department managers
If application-based enforcement is used, the application must check the assertion whenever it
–assigns a new project manager to a project
–changes the project manager of a project
–changes the job of an employee! (oops, that might easily be missed)
These involve looking in both the Emps and Projs tables
Can we change the relational model to make enforcement easier?
71 Adding Intermediate Table
Add a table DeptMgrs which just has the empno's of DeptMgrs
Emps( empno, ename, job, hiredate, sal, comm, deptno )
DeptMgrs( empno ), referencing Emps
Projs( pno, pname, pmgr, persons, budget, pstart, pend ), with pmgr referencing DeptMgrs
Since DeptMgrs holds the empno's of all the DeptMgrs, the foreign key constraint from Projs.pmgr to DeptMgrs enforces the assertion!
Have we completely eliminated the need for an assertion?
72 DeptMgr Assertion
Emps( empno, ename, job, hiredate, sal, comm, deptno )
DeptMgrs( empno )
We still need to assert that DeptMgrs corresponds to the DeptMgrs in Emps. That is
(SELECT empno FROM Emps WHERE job = 'DEPTMGR') = (SELECT empno FROM DeptMgrs)
When would an application need to enforce this?
73 Enforcing DeptMgr Assertion
The DeptMgrs tuples correspond to the employees who have the job 'DEPTMGR'
An application must enforce this assertion whenever it inserts, deletes or changes the job of an employee:
–When inserting a 'DEPTMGR' into Emps, or changing an employee's job to 'DEPTMGR', insert empno into DeptMgrs
–When deleting a 'DEPTMGR' from Emps, or changing an employee's job from 'DEPTMGR', delete empno from DeptMgrs (DELETE handled by cascading delete)
Only need to check the tuple affected; very easy to enforce!
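A minimal sketch of the intermediate-table approach, using Python's sqlite3 module. The tables are trimmed to the columns that matter here, and the employees, projects and the helper `assign_manager` are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
con.executescript("""
CREATE TABLE Emps(empno INTEGER PRIMARY KEY, ename TEXT, job TEXT);
CREATE TABLE DeptMgrs(empno INTEGER PRIMARY KEY
                      REFERENCES Emps(empno) ON DELETE CASCADE);
CREATE TABLE Projs(pno INTEGER PRIMARY KEY, pname TEXT,
                   pmgr INTEGER REFERENCES DeptMgrs(empno));
INSERT INTO Emps VALUES (7782, 'CLARK', 'DEPTMGR'), (7369, 'SMITH', 'CLERK');
-- Application rule: inserting a 'DEPTMGR' also inserts into DeptMgrs
INSERT INTO DeptMgrs VALUES (7782);
""")

def assign_manager(pno, pname, pmgr):
    """Create a project; the foreign key rejects managers not in DeptMgrs."""
    try:
        con.execute("INSERT INTO Projs VALUES (?, ?, ?)", (pno, pname, pmgr))
        return True
    except sqlite3.IntegrityError:
        return False

dept_mgr_ok = assign_manager(1, 'Payroll', 7782)   # CLARK is a DEPTMGR
clerk_ok = assign_manager(2, 'Audit', 7369)        # SMITH is a CLERK

# The remaining (simpler) assertion: DeptMgrs mirrors the DEPTMGRs in Emps
in_sync = (set(con.execute("SELECT empno FROM Emps WHERE job = 'DEPTMGR'"))
           == set(con.execute("SELECT empno FROM DeptMgrs")))
```

The project-manager rule is now checked by the database on every change to Projs, and the application only has to keep DeptMgrs in step with Emps, one tuple at a time.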
74 Assertion Elimination/Simplification & Subclasses
Assertion elimination/simplification corresponds to adding subclasses
Before: Employee( empno ) related to Project( pno )
After: DeptMgr, a subclass of Employee( empno ), related to Project( pno )