Design Theory for RDB Normal Forms. Lu Chaojun, SJTU


Design Theory for RDB Normal Forms

What's Bad Design?
Redundancy
– A fact is repeated in more than one tuple.
– E.g., we put course information into Students to represent the "take-course" relationship:
  StuCourse(sno, name, age, dept, cno, title, credit)

  sno  name  age  dept  cno  title  credit
  s1   zhao  20   CS    c1   DB     3
  s1   zhao  20   CS    c2   OS     3
  s2   qian  21   CS    c2   OS     3

– Callout on the repeated s1 values: redundant, because this information can be figured out by using the FD's.

What's Bad Design? (cont.)
Anomalies
– Update anomalies: e.g., when 'zhao' gets one year older, we may change his age in one tuple and leave the others unchanged.
– Deletion anomalies: e.g., if 'zhao' is the only student taking 'c1' and then he quits, we lose the information about 'c1'.
– Insertion anomalies: e.g., can we insert a student who has not yet selected any course?

What's Good Design?
Decompose into smaller relations
– E.g., S, SC, C
No loss of information
No redundancy
No anomalies
– Update anomalies
– Deletion anomalies
– Insertion anomalies

Decomposing Relations
Goal: decompose a relation into smaller ones in order to eliminate anomalies.
Def: Decompose R(A1,…,An) into S(B1,…,Bm) and T(C1,…,Ck) such that
1. {A1,…,An} = {B1,…,Bm} ∪ {C1,…,Ck}
2. S = π_{B1,…,Bm}(R)
3. T = π_{C1,…,Ck}(R)

Example
Stud(sno, name, age, dept, cno) is decomposed into
  S(sno, name, age, dept)
  SC(sno, cno)
Does the number of tuples change after decomposition?
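A minimal sketch of the question above, assuming three sample rows in the style of the StuCourse table (the rows and the helper name `project` are illustrative, not part of the slides). Projection is a set operation, so duplicate tuples collapse and the tuple count can shrink:

```python
def project(rows, attrs):
    """Relational projection: keep only `attrs` and drop duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}

# Hypothetical instance of Stud(sno, name, age, dept, cno).
stud = [
    {"sno": "s1", "name": "zhao", "age": 20, "dept": "CS", "cno": "c1"},
    {"sno": "s1", "name": "zhao", "age": 20, "dept": "CS", "cno": "c2"},
    {"sno": "s2", "name": "qian", "age": 21, "dept": "CS", "cno": "c2"},
]

s = project(stud, ["sno", "name", "age", "dept"])
sc = project(stud, ["sno", "cno"])
print(len(stud), len(s), len(sc))  # 3 2 3
```

In this instance S holds each student once, while SC keeps one tuple per enrollment.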

Boyce-Codd Normal Form
Goal: define conditions for good schemas. Intuitively, the key determines everything.
Def.: R is in BCNF iff for every nontrivial FD X → Y, X is a superkey for R.
BCNF violation: a nontrivial FD X → Y where X is not a superkey.
Example: StuCourse(sno, name, age, dept, cno, title, credit) is not in BCNF, because of the FD sno → name, age, dept.
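The BCNF test can be sketched with the attribute-closure algorithm. This is an illustrative sketch only: the function names are made up, and the second FD, cno → title, credit, is an assumption added so that {sno, cno} is a key of StuCourse.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, given as a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Given FDs that are nontrivial and whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not set(schema) <= closure(lhs, fds)]

schema = {"sno", "name", "age", "dept", "cno", "title", "credit"}
fds = [
    ({"sno"}, {"name", "age", "dept"}),
    ({"cno"}, {"title", "credit"}),  # assumed FD, not stated on this slide
]
violations = bcnf_violations(schema, fds)
print(len(violations))                         # 2
print(closure({"sno", "cno"}, fds) == schema)  # True
```

Checking only the listed FDs is enough for the original relation, since any implied violation is witnessed by a violation among the given FDs.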

Decomposition into BCNF
Any relation schema R can be decomposed into R1,…,Rn such that
1. Each Ri is in BCNF;
2. R can be reconstructed from R1,…,Rn.
Decomposition into BCNF strategy
– Find a BCNF violation: X → Y
– Compute X+ to augment the RHS
– Decompose R into R1: X+ and R2: (R − X+) ∪ X, or equivalently R − (X+ − X)
[Diagram: R split into the regions X, X+ − X, and R − X+]

Decomposition into BCNF (cont.)
Repeat the decomposition strategy if any Ri is not in BCNF, until all relations are in BCNF.
– Use the FD's projected onto Ri.
Always successful? Yes!
– Decomposition always yields smaller relation schemas.
– Any two-attribute relation is in BCNF.
Given R and a set F of FD's on R, we need only look among F for a BCNF violation, not among those that follow from F.

Example
StuCourse(sno, name, age, dept, cno, title, credit)
BCNF violation: sno → dept
  R1(sno, name, age, dept): in BCNF
  R2(sno, cno, title, credit): not in BCNF
BCNF violation on R2: cno → title
  R21(cno, title, credit): in BCNF
  R22(sno, cno): in BCNF
– Thus StuCourse is decomposed into R1, R21, and R22, exactly what constitutes our running DB example.
Each Ri is about one thing!
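The recursive strategy can be sketched as follows. This is a brute-force illustration under stated assumptions: the function names are hypothetical, the FD set (sno → name, age, dept and cno → title, credit) is assumed from the example, and projected FDs are found by computing closures of all attribute subsets, which is exponential and only suitable for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(schema, fds):
    """Return (X, X+ within schema) for some BCNF violation, or None."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            xplus = closure(x, fds) & schema
            # Nontrivial (X+ != X) and X is not a superkey (X+ != schema).
            if xplus != x and xplus != schema:
                return x, xplus
    return None

def bcnf_decompose(schema, fds):
    """Split on a violation X -> X+ into X+ and (R - X+) | X, then recurse."""
    schema = set(schema)
    found = find_violation(schema, fds)
    if found is None:
        return [schema]
    x, xplus = found
    return bcnf_decompose(xplus, fds) + bcnf_decompose((schema - xplus) | x, fds)

fds = [({"sno"}, {"name", "age", "dept"}), ({"cno"}, {"title", "credit"})]
stu_course = {"sno", "name", "age", "dept", "cno", "title", "credit"}
result = bcnf_decompose(stu_course, fds)
print(len(result))  # 3
```

The sketch may pick violations in a different order than the slide, but for this schema it ends with the same three relations.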

More on the BCNF Algorithm
What if we do not expand the RHS of the BCNF violation?
– See Ex
Which of several BCNF violations should we use?
– See Ex

Issues about Decomposition
Elimination of redundancy and anomalies
Recoverability of information
Preservation of dependencies

Lossless Join Decomposition
A decomposition has a lossless join if the projections of tuples can be joined again to produce all and only the original tuples.
Example
  R(A,B,C)    R1(A,B)    R2(B,C)
  a b c       a b        b c
(a,b) joins with (b,c) to recover (a,b,c).

Lossless Join Decomposition (cont.)
Projection/join can always recover the original tuples, but the process may produce too many tuples.
Example
  R(A,B,C)    R1(A,B)    R2(B,C)
  a b c       a b        b c
  d b e       d b        b e
(a,b) joins with (b,e) to give (a,b,e) ∉ R.
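The example can be replayed as a small sketch. The helper names `rel`, `project`, and `natural_join` are illustrative; tuples are stored as frozensets of (attribute, value) pairs so that relations can be compared as sets.

```python
def rel(attrs, tuples):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}

def project(rows, attrs):
    """Project each tuple onto `attrs`; duplicates collapse automatically."""
    return {frozenset((a, dict(r)[a]) for a in attrs) for r in rows}

def natural_join(left, right):
    """Join tuples that agree on every attribute they share."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out

r = rel("ABC", [("a", "b", "c"), ("d", "b", "e")])
joined = natural_join(project(r, "AB"), project(r, "BC"))
extra = joined - r
print(len(r), len(joined), len(extra))  # 2 4 2
```

Every original tuple comes back, but so do (a,b,e) and (d,b,c), which were never in R.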

Lossless Join Decomposition (cont.)
The decomposition-into-BCNF strategy has a lossless join, i.e. the original relation can be recovered exactly by natural join.
Why? We decompose according to an FD, here B → C:
  R(A,B,C)    R1(A,B)    R2(B,C)
  a b c       a b        b c
  d b e       d b        b e
– c must be the same as e!
The same is true for recursive decomposition: ⋈ is associative and commutative.

Testing for a Lossless Join
If we project R onto R1, R2,…, Rk, can we recover R by rejoining?
Any tuple in R can be recovered from its projected fragments.
So the only question is: when we rejoin, do we ever get back something we didn't have originally?

The Chase Test
Suppose tuple t comes back in the join.
Then t is the join of projections of some tuples of R, one for each Ri of the decomposition.
Can we use the given FD's to show that one of the tuples of R must be t?

The Chase Test (cont.)
Start by assuming t = abc….
For each i, there is a tuple si of R that has a, b, c,… in the attributes of Ri.
si can have any values in the other attributes.
We'll use the same letter as in t, but with a subscript, for these components.

Example: The Chase
Let R = ABCD, and the decomposition be AB, BC, and CD.
Let the given FD's be C → D and B → A.
Suppose the tuple t = abcd is the join of tuples projected onto AB, BC, CD.

Example: The Tableau
  A    B    C    D
  a    b    c1   d1
  a2   b    c    d2
  a3   b3   c    d
The rows are the tuples of R projected onto AB, BC, CD.
Use C → D: d2 becomes d.
Use B → A: a2 becomes a.
We've proved the second tuple must be t.

Summary of the Chase
If two rows agree on the left side of an FD, make their right sides agree too.
– Always replace a subscripted symbol by the corresponding unsubscripted one, if possible.
If we ever get an unsubscripted row, we know any tuple in the project-join is in the original.
– The join is lossless.
Otherwise, the join is not lossless.
– The final tableau is a counterexample.
– It's an instance of R that satisfies the given FD's.
– The join produces an unsubscripted tuple not in R.
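The summary above can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: it assumes single-letter attribute names so that schemas and FD sides can be written as strings, and it encodes the unsubscripted symbol for attribute A as ("A", 0) and the subscripted symbol in row i as ("A", i).

```python
def chase_lossless(schema, parts, fds):
    """Chase test: True if decomposing `schema` into `parts` is lossless under `fds`."""
    # Row i is unsubscripted on the attributes of parts[i], subscripted elsewhere.
    rows = [{a: (a, 0) if a in part else (a, i + 1) for a in schema}
            for i, part in enumerate(parts)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if all(r1[a] == r2[a] for a in lhs):
                        for a in rhs:
                            if r1[a] != r2[a]:
                                # Keep the smaller index, so 0 (unsubscripted) wins.
                                keep, drop = sorted([r1[a], r2[a]])
                                for row in rows:
                                    if row[a] == drop:
                                        row[a] = keep
                                changed = True
    # Lossless iff some row has become entirely unsubscripted.
    return any(all(row[a] == (a, 0) for a in schema) for row in rows)

parts = ["AB", "BC", "CD"]
print(chase_lossless("ABCD", parts, [("C", "D"), ("B", "A")]))  # True
print(chase_lossless("ABCD", parts, [("C", "D")]))              # False
```

Each equating step removes one symbol from the tableau, so the loop terminates.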

Example: Lossy Join
Same relation R = ABCD and same decomposition.
But with only the FD C → D.

Example: The Tableau
  A    B    C    D
  a    b    c1   d1
  a2   b    c    d2
  a3   b3   c    d
Use C → D: d2 becomes d.
These three tuples are an example R that shows the join is lossy: abcd is not in R, but we can project and rejoin to get abcd.

A Problem with BCNF
A kind of FD causes problems:
– If you decompose, you can't check the FD within a single relation.
– If you don't decompose, you violate BCNF.
An abstract example: AB → C and C → B
– Keys: {A,B} and {A,C}
– BCNF violation: C → B
– Decomposition: BC and AC
– You can't check the FD AB → C.

Example
STC(stud, course, teacher)
stud course → teacher and teacher → course
Keys: (stud, course) and (stud, teacher)
BCNF violation: teacher → course
Decomposition: TC(teacher, course), ST(stud, teacher)
Problem: stud course → teacher may not be satisfied.

  TC                  ST                 TC ⋈ ST
  course  teacher     stud  teacher      stud  course  teacher
  c1      t1          s1    t1           s1    c1      t1
  c1      t2          s1    t2           s1    c1      t2

– Although no FD's were violated in TC and ST, the FD stud course → teacher is violated by the database as a whole.
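A small sketch of the problem, using the TC and ST instances from the example. The helper `satisfies` is a hypothetical name; it checks an FD on a list of dict rows.

```python
def satisfies(rows, lhs, rhs):
    """True if the FD lhs -> rhs holds in `rows` (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, val) != val:
            return False
    return True

tc = [{"teacher": "t1", "course": "c1"}, {"teacher": "t2", "course": "c1"}]
st = [{"stud": "s1", "teacher": "t1"}, {"stud": "s1", "teacher": "t2"}]

# Natural join of ST and TC on the shared attribute teacher.
joined = [{**s, **t} for s in st for t in tc if s["teacher"] == t["teacher"]]

print(satisfies(tc, ["teacher"], ["course"]))              # True
print(satisfies(joined, ["stud", "course"], ["teacher"]))  # False
```

The only FD that can be checked locally, teacher → course in TC, holds, yet the joined data violates stud course → teacher.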

3NF
A relation R is in 3NF iff for every nontrivial FD X → Y, either
1. X is a superkey, or
2. Each A ∈ Y − X is contained in some key.
A is said to be prime if it is a member of some key.
We don't decompose into BCNF in this situation, at the price of some redundancy.

Example: 3NF
In our problem situation with FD's AB → C and C → B
– Keys: {A,B} and {A,C}
Thus A, B, and C are each prime.
Although C → B violates BCNF, it does not violate 3NF.
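The 3NF test can be sketched by enumerating candidate keys and collecting the prime attributes. The function names are illustrative, the key search is brute force (fine only for small schemas), and the check looks only at the FDs that are listed.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure covers the schema."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            c = set(combo)
            if closure(c, fds) >= schema and not any(k <= c for k in keys):
                keys.append(c)
    return keys

def violations_3nf(schema, fds):
    """Listed FDs whose LHS is not a superkey and whose RHS has a nonprime attribute."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(lhs, rhs) for lhs, rhs in fds
            if not closure(lhs, fds) >= set(schema)
            and any(a not in prime for a in rhs - lhs)]

fds = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
keys = candidate_keys("ABC", fds)
print(sorted(sorted(k) for k in keys))  # [['A', 'B'], ['A', 'C']]
print(violations_3nf("ABC", fds))       # []
```

C → B is still a BCNF violation, since the closure of {C} is only {B, C}, but B is prime, so 3NF accepts it.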

3NF vs BCNF
There are two important properties of a decomposition:
– P1 (Lossless Join). We are able to recover from the decomposed relations the data of the original.
– P2 (Dependency Preservation). We are able to check that the FD's for the original relation are satisfied by checking the projections of those FD's in the decomposed relations.

3NF vs BCNF (cont.)
It is always possible to decompose into BCNF and satisfy P1.
It is always possible to decompose into 3NF and satisfy both P1 and P2.
It is not always possible to decompose into BCNF and satisfy both P1 and P2.

Why no 1NF and 2NF?
1NF
– Atomic value for every attribute
2NF
– 1NF, and there is no partial dependency
3NF
– 2NF, and there is no transitive dependency

3NF Synthesis Algorithm
We can always decompose a relation into 3NF relations with a lossless join and dependency preservation.
Need a minimal basis for the FD's:
1. Right sides are single attributes.
2. No FD can be removed.
3. No attribute can be removed from a left side.

3NF Synthesis Algorithm (cont.)
One relation for each FD in the minimal basis.
– For X → A, create T(X,A).
If none of the relation schemas contains some key for R, then add one relation whose schema is some key.

Example: 3NF Synthesis
Relation R(A,B,C,D).
FD's: A → B and A → C.
Decomposition: AB and AC from the FD's, plus AD for a key.
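A minimal sketch of the synthesis step, assuming the FD set passed in is already a minimal basis (computing one is not shown). The function names are illustrative; the key is found by greedily dropping attributes from the full schema while it remains a superkey.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def synthesize_3nf(schema, minimal_basis):
    """One relation per FD, plus a key relation if no relation contains a key."""
    schema = set(schema)
    relations = []
    for lhs, rhs in minimal_basis:
        r = set(lhs) | set(rhs)
        if r not in relations:
            relations.append(r)
    if not any(closure(r, minimal_basis) >= schema for r in relations):
        key = set(schema)
        for a in sorted(schema):
            # Drop an attribute whenever the rest is still a superkey.
            if closure(key - {a}, minimal_basis) >= schema:
                key -= {a}
        relations.append(key)
    return relations

basis = [({"A"}, {"B"}), ({"A"}, {"C"})]
rels = synthesize_3nf("ABCD", basis)
print(sorted("".join(sorted(r)) for r in rels))  # ['AB', 'AC', 'AD']
```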

Why It Works
Lossless join: use the chase to show that the row for the relation that contains a key can be made all-unsubscripted variables.
Preserves dependencies: each FD from a minimal basis is contained in a relation, thus preserved.
3NF: hard part, a property of minimal bases.

MVD: Attribute Independence
CTX, nested view:
  course  teacher  text
  DB      Li       T1
          Lu       T2
                   T3
CTX, as a flat relation:
  course  teacher  text
  DB      Li       T1
  DB      Li       T2
  DB      Li       T3
  DB      Lu       T1
  DB      Lu       T2
  DB      Lu       T3
CTX is in BCNF!

MVD
A multivalued dependency X ↠ Y holds for R if whenever two tuples of R agree on X, we can swap their Y components and get two tuples that are also in R.
  X    Y    Z
  x1   y1   z1
  x1   y2   z2
  x1   y2   z1
  x1   y1   z2
– For any fixed X, the associated values of Y and Z appear in all possible combinations. Or, Y and Z are independent.
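The swap definition can be checked directly on a relation instance. This is an illustrative sketch with a hypothetical function name; the sample data is the six-tuple CTX(course, teacher, text) relation with one course, two teachers, and three texts.

```python
def satisfies_mvd(rows, attrs, x, y):
    """Check X ->> Y on an instance: swapping Y components must stay inside the relation."""
    xi = [attrs.index(a) for a in x]
    yi = [attrs.index(a) for a in y]
    rowset = set(rows)
    for t in rowset:
        for u in rowset:
            if all(t[i] == u[i] for i in xi):
                # Tuple t with its Y components replaced by those of u.
                swapped = tuple(u[i] if i in yi else t[i] for i in range(len(attrs)))
                if swapped not in rowset:
                    return False
    return True

attrs = ["course", "teacher", "text"]
ctx = [("DB", t, x) for t in ["Li", "Lu"] for x in ["T1", "T2", "T3"]]

print(satisfies_mvd(ctx, attrs, ["course"], ["teacher"]))       # True
print(satisfies_mvd(ctx[:-1], attrs, ["course"], ["teacher"]))  # False
```

Dropping a single tuple breaks the "all possible combinations" property, so the MVD no longer holds.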

Reasoning about MVD
Trivial MVD's
  X ↠ Y if Y ⊆ X
  X ↠ Y if R = X ∪ Y
– Nontrivial MVD: X ↠ Y where attributes of Y don't appear in X and X ∪ Y are not all the attributes of R.
Transitive rule
  If X ↠ Y and Y ↠ Z, then X ↠ Z
– Any attribute in X ∩ Z must be deleted from Z.

Reasoning about MVD (cont.)
FD promotion
  If X → Y, then X ↠ Y.
Complementation rule
  If X ↠ Y, then X ↠ Z, where Z is all attributes not in X and Y.
– Sometimes written as X ↠ Y | Z
No splitting rule!
– E.g. name ↠ city street | title year

4NF
Goal: eliminate the redundancy caused by MVD's.
R is in 4NF iff for every nontrivial MVD X ↠ Y, X is a superkey.
– If so, every nontrivial MVD is really an FD.
– 4NF implies BCNF, because an FD is also an MVD, so a BCNF violation is also a 4NF violation.
– E.g. CTX: C ↠ T and C is not a superkey.

Decomposition into 4NF
Algorithm: given R and its FD's/MVD's,
1. Find a 4NF violation: X ↠ Y.
– If there is none, then R is in 4NF.
2. Decompose R into R1(X,Y) and R2(X,Z), where Z = R − (X ∪ Y).
3. Find the FD's/MVD's on R1 and R2. Recursively decompose R1 and R2.

Example 1
CTX(course, teacher, text)
1. 4NF violation: course ↠ teacher
2. Decompose into CT(course, teacher) and CX(course, text)
3. No nontrivial MVD any more. So CT and CX are in 4NF.
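As a quick illustration on one instance (the six-tuple CTX relation with teachers Li, Lu and texts T1 to T3, assumed here as sample data), projecting onto CT and CX and rejoining recovers CTX exactly while storing fewer tuples:

```python
# Sample CTX instance: one course, two teachers, three texts.
ctx = {("DB", t, x) for t in ["Li", "Lu"] for x in ["T1", "T2", "T3"]}

ct = {(c, t) for c, t, _ in ctx}  # projection onto (course, teacher)
cx = {(c, x) for c, _, x in ctx}  # projection onto (course, text)

# Natural join of CT and CX on course.
rejoined = {(c, t, x) for c, t in ct for c2, x in cx if c == c2}

print(len(ctx), len(ct), len(cx), rejoined == ctx)  # 6 2 3 True
```

The join is exact here because course ↠ teacher holds in this instance.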

Example 2
Person(name, addr, phones, hobbies)
FD: name → addr
Nontrivial MVD's: name ↠ phones and name ↠ hobbies
Only key: {name, phones, hobbies}
All three dependencies violate 4NF.
Successive decomposition yields 4NF relations:
  P1(name, addr)
  P2(name, phones)
  P3(name, hobbies)

Relationships Among NF
4NF ⊂ BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF

  Property                            3NF    BCNF   4NF
  Eliminates redundancy due to FD     Most   Yes    Yes
  Eliminates redundancy due to MVD    No     No     Yes
  Preserves FD                        Yes    Maybe  Maybe
  Preserves MVD                       Maybe  Maybe  Maybe

Reasoning about FD/MVD's
Review: closure algorithm for inferring FD's
The closure algorithm can be seen as a variant of the chase.
The chase can be extended to incorporate MVD's as well as FD's.
– Inferring MVD's
– Projecting MVD's

Inferring FD using the Chase
Chase test for "X → Y follows from F"
– Start with a tableau having two rows that agree only on X.
– Chase the tableau using the FD's of F to equate columns in X+ − X.
– If the final tableau agrees on Y, then X → Y holds; otherwise, it does not.
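Since the two-row chase equates exactly the columns in X+, the test is equivalent to checking that Y is contained in X+. A minimal sketch of that closure-based form, with hypothetical FDs A → B and B → C as the example:

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def follows(fds, x, y):
    """True if the FD X -> Y follows from `fds`, i.e. Y is a subset of X+."""
    return set(y) <= closure(x, fds)

fds = [({"A"}, {"B"}), ({"B"}, {"C"})]  # hypothetical example FDs
print(follows(fds, {"A"}, {"C"}))  # True
print(follows(fds, {"C"}, {"A"}))  # False
```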

Inferring MVD using the Chase
An FD X → Y can be used to equate the values of Y for two tuples that agree on X.
An MVD X ↠ Y can be used to form new tuples by swapping Y for two tuples that agree on X.
Given a set of FD's/MVD's, infer X ↠ Y.
– Start with two tuples s and t that agree only on X;
– Apply the FD's and MVD's;
– If we find s[Y ← t[Y]] (s with its Y components replaced by those of t) in the tableau, then we have inferred X ↠ Y.

Problem and Solution
Since symbols may get equated and replaced, we may not recognize the desired tuple.
Solution:
– Define a target row with all unsubscripted letters, and never change its symbols.
– Let s[X], s[Y], t[X] and t[Z] have unsubscripted letters. All the other components of s and t have unique new symbols.
– Apply the chase.
– If the all-unsubscripted row appears in the tableau, then we have inferred the MVD.

Example
Given R(A,B,C,D) with A → B and B ↠ C, prove A ↠ C.
– Target row is (a,b,c,d).

Initial tableau:
  A   B    C    D
  a   b1   c    d1
  a   b    c2   d
After applying A → B (b1 becomes b):
  A   B    C    D
  a   b    c    d1
  a   b    c2   d
After applying B ↠ C (swap the C components):
  A   B    C    D
  a   b    c    d1
  a   b    c2   d
  a   b    c2   d1
  a   b    c    d
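The example can be replayed with a small chase sketch. This is an illustration under stated assumptions, not a definitive implementation: attribute names are single letters so schemas and dependency sides can be strings, the function names are hypothetical, and a symbol is encoded as (attribute, index) with index 0 meaning unsubscripted.

```python
def apply_once(rows, idx, fds, mvds):
    """Apply one chase step; return the new row set, or None if nothing applies."""
    for lhs, rhs in fds:
        for r1 in rows:
            for r2 in rows:
                if all(r1[idx[a]] == r2[idx[a]] for a in lhs):
                    for a in rhs:
                        p, q = r1[idx[a]], r2[idx[a]]
                        if p != q:
                            # Equate the two symbols, keeping the smaller index.
                            keep, drop = min(p, q), max(p, q)
                            return {tuple(keep if v == drop else v for v in r)
                                    for r in rows}
    for lhs, rhs in mvds:
        for r1 in rows:
            for r2 in rows:
                if all(r1[idx[a]] == r2[idx[a]] for a in lhs):
                    # r1 with its rhs components swapped in from r2.
                    new = tuple(r2[i] if a in rhs else r1[i] for a, i in idx.items())
                    if new not in rows:
                        return rows | {new}
    return None

def infers_mvd(schema, fds, mvds, x, y):
    """Chase test for whether X ->> Y follows from the given FDs and MVDs."""
    idx = {a: i for i, a in enumerate(schema)}
    # s is unsubscripted on X and Y; t is unsubscripted on X and Z.
    s = tuple((a, 0) if a in x or a in y else (a, 1) for a in schema)
    t = tuple((a, 0) if a in x or a not in y else (a, 2) for a in schema)
    target = tuple((a, 0) for a in schema)
    rows = {s, t}
    while target not in rows:
        nxt = apply_once(rows, idx, fds, mvds)
        if nxt is None:
            return False
        rows = nxt
    return True

print(infers_mvd("ABCD", [("A", "B")], [("B", "C")], "A", "C"))  # True
print(infers_mvd("ABCD", [], [("B", "C")], "A", "C"))            # False
```

Without A → B the two starting rows never agree on B, so the MVD B ↠ C cannot be applied and the target row is never produced.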

Why Does the Chase Work for MVD's?
A positive conclusion of the chase is nothing but another form of the familiar proof that the concluded FD/MVD holds.
When the chase ends in failure, the final tableau is a counterexample.
The chase can't possibly keep producing new rows forever, since it never creates new symbols.

Projecting MVD's
If R is decomposed into Ri's, we have to test every possible FD and MVD for each Ri using the chase.
– The chase is applied on R, but we only need to produce a row that has unsubscripted letters in all the attributes of Ri.
Often, we don't have to be exhaustive:
– Do not check trivial FD's/MVD's;
– Consider only FD's with a singleton RHS;
– Don't consider an FD/MVD whose LHS doesn't contain the LHS of any given FD/MVD.

End