CS 222 Database Management System Spring 2010-11 Lecture 5 Database Design (Decomposition) Korra Sathya Babu Department of Computer Science NIT Rourkela.

Recap
- Database design is needed to reduce redundancy and anomalies
- The theory of functional dependencies has been studied in full
- Better design requires schema refinement
- One solution for schema refinement is the synthesis of relations
12/19/2015 Database Design 2

Relation Decomposition
[Figure: relation R decomposed into R1 and R2; the diagram labels are X, X+ - X, and R - X+]

Relation Decomposition
Reason for decomposition
- A solution for reducing redundancy and anomalies
Rules for synthesis
- Lossless join (information preservation)
- Dependency preservation (a special case of information preservation)
Decomposition (synthesis) types
- By functional dependency
- By multivalued dependency
- By join dependency

Lossless Join
Definition: A decomposition D = {R1, R2, ..., Rm} of R has the lossless join property with respect to the set of dependencies F on R if, for every relation r of R that satisfies F, the following holds:
⋈(π_R1(r), ..., π_Rm(r)) = r
where ⋈ is the natural join of the projections of r onto the relations in D.
The word "loss" in lossless refers to loss of information, not to loss of tuples: a lossy decomposition produces extra (spurious) tuples when its relations are joined.

Test for Lossless Join
Input: a relation R, a decomposition D = {R1, R2, ..., Rm} of R, and a set F of functional dependencies.
Lossless Join Test Algorithm:
Step 1: Create an initial matrix S with one row i for each relation Ri in D, and one column j for each attribute Aj in R.
Step 2: Set S(i, j) := b_ij for all matrix entries.
Step 3: For each row i representing relation schema Ri, and for each column j representing attribute Aj: if Ri includes Aj, set S(i, j) := a_j.

Test for Lossless Join (continued)
Step 4: Repeat the following loop until a complete loop execution results in no changes to S:
  For each functional dependency X → Y in F, and for all rows in S that have the same symbols in the columns corresponding to the attributes in X, make the symbols in each column that corresponds to an attribute in Y the same in all these rows, as follows:
  - If any of the rows has an "a" symbol for the column, set the other rows to that same "a" symbol in the column.
  - If no "a" symbol exists for the attribute in any of the rows, choose one of the "b" symbols that appears in one of the rows for the attribute and set the other rows to that same "b" symbol in the column.
Step 5: If a row is made up entirely of "a" symbols, then the decomposition has the lossless join property; otherwise it does not.
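The matrix test above can be sketched in Python as follows. This is only an illustrative sketch, not the lecture's own code: the function name `lossless_join`, the set-based encoding of relations and FDs, and the `b{i}_{j}` symbol names are assumptions made here. One deliberate choice is that a symbol is renamed in every row of its column, which is what equating two symbols amounts to. The two decompositions are the ones used in Example 1 and Example 2 of these slides.

```python
def lossless_join(attributes, decomposition, fds):
    """Matrix test for the lossless join property.

    attributes:    list of the attribute names of R
    decomposition: list of attribute sets, one per relation Ri
    fds:           list of (lhs, rhs) pairs of attribute sets
    """
    # Steps 1-3: entry is "a" if Ri contains Aj, otherwise a distinct "b" symbol.
    S = [
        {A: ("a" if A in Ri else f"b{i + 1}_{j + 1}")
         for j, A in enumerate(attributes)}
        for i, Ri in enumerate(decomposition)
    ]
    # Step 4: apply the FDs until a complete pass leaves S unchanged.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group the rows that agree on every attribute of X.
            groups = {}
            for row in S:
                key = tuple(row[A] for A in sorted(lhs))
                groups.setdefault(key, []).append(row)
            for rows in groups.values():
                for A in rhs:
                    symbols = {row[A] for row in rows}
                    if len(symbols) > 1:
                        # Prefer the "a" symbol, else pick one "b" symbol.
                        target = "a" if "a" in symbols else min(symbols)
                        # Rename the equated symbols throughout column A.
                        for row in S:
                            if row[A] in symbols:
                                row[A] = target
                        changed = True
    # Step 5: lossless iff some row consists only of "a" symbols.
    return any(all(v == "a" for v in row.values()) for row in S)


attributes = ["SSN", "ENAME", "PNUM", "PNAME", "PLOCATION", "HOURS"]
fds = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUM"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUM"}, {"HOURS"}),
]
example1 = [{"SSN", "ENAME"}, {"PNUM", "PNAME", "PLOCATION"},
            {"SSN", "PNUM", "HOURS"}]
example2 = [{"ENAME", "PLOCATION"},
            {"SSN", "PNUM", "HOURS", "PNAME", "PLOCATION"}]

print(lossless_join(attributes, example1, fds))  # True
print(lossless_join(attributes, example2, fds))  # False
```

The loop terminates because every change strictly reduces the number of distinct symbols in some column.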

Example 1
Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)
F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}
Decomposition:
R1(SSN, ENAME)
R2(PNUM, PNAME, PLOCATION)
R3(SSN, PNUM, hours)

Example 1
Initial matrix S (Steps 1 and 2):
      A1     A2      A3     A4      A5          A6
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    b11    b12     b13    b14     b15         b16
R2    b21    b22     b23    b24     b25         b26
R3    b31    b32     b33    b34     b35         b36
Matrix S after Step 3:
      A1     A2      A3     A4      A5          A6
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    a1     a2      b13    b14     b15         b16
R2    b21    b22     a3     a4      a5          b26
R3    a1     b32     a3     b34     b35         a6

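Example 1
Matrix S after applying SSN → ENAME (rows R1 and R3 agree on SSN):
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    a1     a2      b13    b14     b15         b16
R2    b21    b22     a3     a4      a5          b26
R3    a1     a2      a3     b34     b35         a6
Matrix S after applying PNUM → {PNAME, PLOCATION} (rows R2 and R3 agree on PNUM):
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    a1     a2      b13    b14     b15         b16
R2    b21    b22     a3     a4      a5          b26
R3    a1     a2      a3     a4      a5          a6
Row R3 now consists entirely of "a" symbols, so the decomposition is lossless.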

Example 2
Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)
F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}
Decomposition:
R1(ENAME, PLOCATION)
R2(SSN, PNUM, hours, PNAME, PLOCATION)

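Example 2
Initial matrix S (Steps 1 and 2):
      A1     A2      A3     A4      A5          A6
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    b11    b12     b13    b14     b15         b16
R2    b21    b22     b23    b24     b25         b26
Matrix S after Step 3:
      A1     A2      A3     A4      A5          A6
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    b11    a2      b13    b14     a5          b16
R2    a1     b22     a3     a4      a5          a6
Applying SSN → ENAME, PNUM → {PNAME, PLOCATION} and {SSN, PNUM} → hours changes nothing, because the two rows never agree on the left-hand side of any FD. No row consists entirely of "a" symbols, so the decomposition is lossy.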

Problems
Check whether the following decompositions are lossy or lossless:
1. R = ABCDE, with R1 = AD, R2 = AB, R3 = BE, R4 = CDE, R5 = AE and F = {A → C, B → C, C → D, DE → C, CE → A}
2. R(XYZWQ), with F = {X → Z, Y → Z, Z → W, WQ → Z, ZQ → X} and R1(XW), R2(XY), R3(YQ), R4(ZWQ), R5(XQ)
3. R(XYZ), with F = {X → Y, Z → Y} and R1(XY), R2(YZ)
4. R(XYWZPQ), with D = {R1(ZPQ), R2(XYZPQ)} and F = {XY → W, XW → P, PQ → Z, XY → Q}

Dependency Preservation
- R was decomposed (normalised) into R1, ..., Rn
- S: the set of FDs for R
- S1, ..., Sn: the sets of FDs for R1, ..., Rn (each Si refers only to the attributes of Ri)
- S' = S1 ∪ ... ∪ Sn (usually S' ≠ S)
- The decomposition is dependency preserving if S'+ = S+

Test for Dependency Preservation
Input: a decomposition D = {R1, ..., Rk} and a set of FDs F.
Dependency Preservation Test:
Step 1: For each X → Y ∈ F, initialize a set T of attributes with the attributes of X (the determinant of the FD under consideration), i.e. set T = X, and continue with Step 2.
Step 2: Repeat Step 3 until the set T no longer changes. When T no longer changes, continue with Step 4.
Step 3: For each relation Ri (1 ≤ i ≤ k) of the input decomposition, apply the corresponding Ri operation to the attribute set T with respect to the set of dependencies F, i.e. T = T ∪ ((T ∩ Ri)+ ∩ Ri).
Step 4: Test whether Y (the right-hand side of the FD under consideration) satisfies Y ⊆ T. There are two outcomes to this test:
- If Y is not a subset of T, stop the execution of the algorithm and report that the decomposition does not preserve the FD.
- If Y ⊆ T, then X → Y ∈ G+, where G is the union of the FDs that hold on the individual relations Ri. If other FDs in F still need to be considered, repeat Step 1 with an FD that has not been considered before. If no FDs remain, the decomposition preserves F.
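A minimal Python sketch of this test follows, using the standard update T = T ∪ ((T ∩ Ri)+ ∩ Ri), where + denotes attribute closure under F. The function names `closure` and `preserves_dependencies` and the small schema R(A, B, C) are hypothetical, chosen here only for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """Return True if every FD in fds is preserved by the decomposition."""
    for lhs, rhs in fds:
        T = set(lhs)                      # Step 1: start from the determinant
        changed = True
        while changed:                    # Step 2: repeat until T is stable
            changed = False
            for Ri in decomposition:      # Step 3: grow T through each Ri
                gained = closure(T & Ri, fds) & Ri
                if not gained <= T:
                    T |= gained
                    changed = True
        if not rhs <= T:                  # Step 4: is Y contained in T?
            return False
    return True


# Hypothetical schema R(A, B, C) with F = {A -> B, B -> C}.
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(preserves_dependencies([{"A", "B"}, {"B", "C"}], F))  # True
print(preserves_dependencies([{"A", "B"}, {"A", "C"}], F))  # False
```

In the second call B → C is lost: starting from T = {B}, neither relation lets T grow to include C.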

Problems
1. Given R(XYZ) and the set F = {Z → X, XY → Z}, check whether the decomposition R1(XY) and R2(XZ) preserves the set F.
2. Given R(ABCD) and the set F = {A → B, C → D}, check whether the decomposition R1(AB) and R2(CD) preserves the set F.
3. Determine whether the decomposition D = {R1(XY), R2(YZ), R3(ZW)} of the relation R(WXYZ) preserves the dependencies of the set F = {X → Y, Y → Z, Z → W, W → X}.
4. Given R(ABCDEF) and the set F = {A → B, C → DF, AC → E, D → F}, check whether the decomposition R1(ACE), R2(CD), R3(DF) and R4(AB) preserves the set F.

Normalization
- Normalization is the process of successively reducing a given set of relations to a better form (with reduced redundancy and fewer anomalies)
- The level of normalization to maintain depends on the workflow (a tradeoff between fast access and maintenance of integrity)
- It assumes that all possible functional dependencies are known
- First construct a minimal set of FDs
- Then apply algorithms that construct the required normal form
- Additional criteria may be needed to ensure that the set of relations in a relational database is satisfactory

1NF
- A relation is in first normal form (1NF) if it does not contain any repeating columns or repeating groups of columns
- Normalizing to 1NF converts complex data structures into simpler, stable data structures
- A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute
First Normal Form (1NF) requires:
- Unique rows
- All attributes are atomic

2NF
A table is in second normal form (2NF) if it is in first normal form and all non-key columns in the table depend on the entire primary key.
The following relation is in 1NF but not 2NF:
EMPLOYEE2(Emp_ID, Name, Dept, Salary, Course, Date_Completed)
Functional dependencies:
1. Emp_ID → Name, Dept, Salary (partial key dependency: the determinant is only part of the key (Emp_ID, Course))
2. Emp_ID, Course → Date_Completed
Decompose into 2NF:
EMPLOYEE1(Emp_ID, Name, Dept, Salary)
Functional dependency: Emp_ID → Name, Dept, Salary
EMPCOURSE(Emp_ID, Course, Date_Completed)
Functional dependency: Emp_ID, Course → Date_Completed

3NF
A table is in third normal form (3NF) if it is in second normal form and all non-key columns in the table depend non-transitively on the entire primary key.
SALES(Customer_ID, Customer_Name, SalesPerson, Region)
Functional dependencies:
1. Customer_ID → Customer_Name, SalesPerson, Region
2. SalesPerson → Region
Transitive dependency: Customer_ID → SalesPerson → Region
Decompose into 3NF:
SALES1(Customer_ID, Customer_Name, SalesPerson)
Functional dependency: Customer_ID → Customer_Name, SalesPerson
SPERSON(SalesPerson, Region)
Functional dependency: SalesPerson → Region

BCNF
- A table is in Boyce-Codd normal form (BCNF) if every column on which some other column is fully functionally dependent is also a candidate for the primary key of the table
- Equivalently, a table is in BCNF if the only determinants in the table are the candidate keys
SCHOOL(Student, Subject, Teacher)
Functional dependencies:
1. Student, Subject → Teacher
2. Student, Teacher → Subject
3. Teacher → Subject
Teacher is a determinant but not a candidate key, so SCHOOL is not in BCNF.
Decompose into BCNF (splitting on the determinant Teacher keeps the join lossless):
SCHOOL1(Student, Teacher)
SCHOOL2(Teacher, Subject)
All functional dependencies are lost except Teacher → Subject.
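The "only determinants are candidate keys" condition can be checked mechanically with attribute closures: a non-trivial FD X → Y violates BCNF when the closure of X does not cover the whole relation. The sketch below applies this to the SCHOOL relation; the function names `closure` and `bcnf_violations` are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    R = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != R
    ]


school = {"Student", "Subject", "Teacher"}
school_fds = [
    ({"Student", "Subject"}, {"Teacher"}),
    ({"Student", "Teacher"}, {"Subject"}),
    ({"Teacher"}, {"Subject"}),
]
print(bcnf_violations(school, school_fds))
# Only Teacher -> Subject is reported: the closure of {Teacher} is
# {Teacher, Subject}, which does not contain Student.
```

An empty result means that every listed determinant is a superkey, so the relation is in BCNF with respect to the given FDs.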

Comparison between 3NF and BCNF
It is always possible to decompose a relation into relations in 3NF such that:
- the decomposition is lossless
- the dependencies are preserved
It is always possible to decompose a relation into relations in BCNF such that:
- the decomposition is lossless
- but it may not be possible to preserve dependencies
BCNF may, however, eliminate more redundancy.

Multivalued Dependency
Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency α →→ β holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R – β] = t2[R – β]
t4[β] = t2[β]
t4[R – β] = t1[R – β]
An MVD is a tuple-generating dependency.

4NF
- A table is in fourth normal form (4NF) if it is in BCNF and does not have any independent multivalued parts of the primary key
- If there are two attributes A and B, and for a given value of A there exist multiple values of B, then we say that an MVD exists between A and B
- The normal forms beyond BCNF are mainly of theoretical interest

4NF
Student Table
Student    Subject      Language
Geeta      Mythology    English
Geeta      Psychology   English
Geeta      Mythology    Hindi
Geeta      Psychology   Hindi
Shekher    Gardening    English
Student →→ Subject
Student →→ Language

4NF
The primary key is (Student, Subject, Language).
Split the independent multivalued components of the primary key into two tables. This takes care of the update anomaly.
Student_Subject Table
Student    Subject
Geeta      Mythology
Geeta      Psychology
Shekher    Gardening
Student_Language Table
Student    Language
Geeta      English
Geeta      Hindi
Shekher    English
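The claim that Student →→ Subject holds in the Student table, and that the two-table split loses nothing, can be checked on the data itself. The following Python sketch does so under stated assumptions: the helper names `pick` and `satisfies_mvd` are hypothetical, and the check only concerns this one table instance, not every legal instance of the schema.

```python
def pick(row, columns, keep):
    """Project one row (a tuple) onto the columns listed in keep."""
    return tuple(row[columns.index(c)] for c in keep)


def satisfies_mvd(rows, columns, lhs, rhs):
    """Check lhs ->> rhs on a relation instance given as a set of tuples."""
    rest = [c for c in columns if c not in lhs and c not in rhs]
    groups = {}
    for row in rows:
        groups.setdefault(pick(row, columns, lhs), set()).add(
            (pick(row, columns, rhs), pick(row, columns, rest))
        )
    # For each lhs value, the (rhs, rest) pairs must form a full cross product.
    for pairs in groups.values():
        ys = {y for y, _ in pairs}
        zs = {z for _, z in pairs}
        if pairs != {(y, z) for y in ys for z in zs}:
            return False
    return True


columns = ["Student", "Subject", "Language"]
student_table = {
    ("Geeta", "Mythology", "English"),
    ("Geeta", "Psychology", "English"),
    ("Geeta", "Mythology", "Hindi"),
    ("Geeta", "Psychology", "Hindi"),
    ("Shekher", "Gardening", "English"),
}
print(satisfies_mvd(student_table, columns, ["Student"], ["Subject"]))  # True

# Project onto the two 4NF tables and join them back on Student.
student_subject = {(s, sub) for s, sub, _ in student_table}
student_language = {(s, lang) for s, _, lang in student_table}
rejoined = {
    (s, sub, lang)
    for s, sub in student_subject
    for s2, lang in student_language
    if s == s2
}
print(rejoined == student_table)  # True
```

Removing any one of Geeta's four rows makes the MVD check fail, which is why the unsplit table forces several rows to be inserted or deleted together.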

Surprise: Lossless Decomposition
There exist relations that cannot be nonloss-decomposed into two projections, but can be nonloss-decomposed into three or more.

Join Dependency
Definition: A relation R satisfies the join dependency (JD) *(X, Y, ..., Z) iff R is equal to the join of its projections on X, Y, ..., Z, where X, Y, ..., Z are subsets of the set of attributes of R.
Consider the following table of Suppliers (S), Parts (P) and the Locations they supply (L):
SPL Table
S     P     L
S1    P1    L2
S1    P2    L1
S2    P1    L1
S1    P1    L1
Actual decomposition:
SP Table
S     P
S1    P1
S1    P2
S2    P1
PL Table
P     L
P1    L2
P2    L1
P1    L1

Join Dependency
Joining the projections SP and PL of the SPL table gives:
S     P     L
S1    P1    L2
S1    P2    L1
S2    P1    L1
S1    P1    L1
S2    P1    L2    (spurious tuple)

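Join Dependency
Decomposition of the SPL table into three projections: SP, PL and LS.
LS Table
L     S
L2    S1
L1    S1
L1    S2
Joining SP, PL and LS gives back the original SPL table, because the spurious tuple (S2, P1, L2) has no matching (L2, S2) row in LS:
S     P     L
S1    P1    L2
S1    P2    L1
S2    P1    L1
S1    P1    L1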
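The SPL example can be replayed in a few lines of Python. This is only a sketch on the four tuples shown in the slides, with projections and joins written as set comprehensions; the variable names are chosen here for illustration.

```python
# SPL table from the slides: (Supplier, Part, Location).
spl = {
    ("S1", "P1", "L2"),
    ("S1", "P2", "L1"),
    ("S2", "P1", "L1"),
    ("S1", "P1", "L1"),
}

# The three binary projections.
sp = {(s, p) for s, p, _ in spl}
pl = {(p, l) for _, p, l in spl}
ls = {(l, s) for s, _, l in spl}

# Natural join of SP and PL on P.
two_way = {(s, p, l) for s, p in sp for p2, l in pl if p == p2}
print(sorted(two_way - spl))   # [('S2', 'P1', 'L2')]  the spurious tuple

# Joining the result with LS on (L, S) removes the spurious tuple.
three_way = {(s, p, l) for s, p, l in two_way if (l, s) in ls}
print(three_way == spl)        # True
```

So this instance satisfies the join dependency *(SP, PL, LS) even though no pair of its projections joins back losslessly.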

5NF
- A table is in fifth normal form (5NF) if it is in fourth normal form and every join dependency in the table is implied by the candidate keys
- It is also called Project-Join Normal Form (PJNF)

Normalization
- Un-normalized Relation to First Normal Form (1NF): arrange every atomic value in a cell (intersection of row and column) of a table
- 1NF to Second Normal Form (2NF): eliminate partial dependencies
- 2NF to Third Normal Form (3NF): eliminate transitive dependencies
- 3NF to Boyce-Codd Normal Form (BCNF): make every determinant a key
- BCNF to Fourth Normal Form (4NF): eliminate multivalued dependencies that are not functional dependencies
- 4NF to Fifth Normal Form (5NF): eliminate join dependencies that are not implied by candidate keys

Denormalization
- Denormalization is a process in which we retain or introduce some amount of redundancy for faster data access
- It applies where there is a tradeoff between fast access and maintenance of integrity

Summary
- Normalization helps to reduce redundancy and some anomalies
- The first three normal forms (1NF, 2NF and 3NF) are practical, while BCNF, 4NF and 5NF are more of theoretical interest
- Denormalization is done for fast access
