Prof. Sin-Min Lee Department of Computer Science

CS157A Lecture 20: Midterm 3 Revision 2

Normal Forms: 1NF, 2NF, 3NF, BCNF (functional dependencies); 4NF (multivalued dependencies); 5NF (join dependencies)

F+: dependencies induced by Armstrong's Axioms. F+ is the set of dependencies which can be deduced from F by applying three inference rules for reasoning about FDs. Rules (i), (ii) and (iii) are Armstrong's axioms:
(i) reflexivity: if Y ⊆ X then X → Y
(ii) augmentation: if X → Y then XZ → YZ
(iii) transitivity: if X → Y and Y → Z then X → Z

Completeness of Armstrong's Axioms. Idea: if X → Y cannot be deduced using Armstrong's axioms, then there is a relational instance for R in which all the dependencies in F are true, but X → Y does not hold. Example: R = LMNO, X = L, F = {L → M, M → N, O → N}; then X+ = LMN, so L → O cannot be deduced in F+. Counterexample: an instance with two tuples that agree on L, M and N (the attributes of X+) but differ on O.

Closure of a Set of Attributes. Let U be a set of attributes and F be a set of functional dependencies on U. Suppose that X ⊆ U is a set of attributes. Definition: X+ = { A | F ⊨ X → A }. We would like to compute X+.

Example: R = (A, B, C), F = {A → B, B → C}. Note that A, B, C are attributes; we refer to the set {A,B} simply as AB.
F+ = {
A → A, B → B, C → C, AB → AB, BC → BC, AC → AC, ABC → ABC, AB → A, AB → B, BC → B, BC → C, AC → A, AC → C, ABC → AB, ABC → BC, ABC → AC, ABC → A, ABC → B, ABC → C, (using reflexivity, we can generate all trivial dependencies)
A → B, … (1) (given)
B → C, … (2) (given)
A → C, … (3) (transitivity on (1) and (2))
AC → BC, … (4) (augmentation on (1))
AC → B, … (5) (decomposition on (4))
A → AB, … (6) (augmentation on (1))
AB → AC, AB → C, B → BC, A → AC, AB → BC, AB → ABC, AC → ABC, A → BC, A → ABC }

Algorithm to compute the closure of attributes X+ under F:
closure := X;
repeat
    for each U → V in F do
        if U ⊆ closure then closure := closure ∪ V;
until there is no change in closure

Example: R(A, B, C, G, H, I), F = {A → B, A → C, CG → H, CG → I, B → H}. To compute AG+:
closure = AG
closure = ABG (A → B)
closure = ABCG (A → C)
closure = ABCGH (CG → H)
closure = ABCGHI (CG → I)
Is AG a candidate key? AG → R holds, since AG+ = R. For minimality we also ask: does A+ = R? Does G+ = R?
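The closure algorithm and the AG+ example above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `closure` and the encoding of each FD as a pair of attribute strings are assumptions made for this sketch.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs of attribute strings, e.g. ("CG", "H")
    stands for CG -> H. This encoding is an assumption of this sketch.
    """
    result = set(attrs)
    changed = True
    while changed:                      # repeat until no change in closure
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # closure := closure ∪ V
                changed = True
    return result


R = set("ABCGHI")
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

ag_closure = closure("AG", F)
# AG is a candidate key if it is a superkey and neither A nor G alone is one.
is_candidate_key = (
    ag_closure == R and closure("A", F) != R and closure("G", F) != R
)
print("".join(sorted(ag_closure)), is_candidate_key)  # prints: ABCGHI True
```

Here A+ = ABCH and G+ = G, so neither proper subset of AG determines all of R.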

Example R(ABCDE) F={ABC, CEB, DA, BCE} {A}+ = {A,B}+ = {B,D}+=

Theorem: Let R be a relation schema and F a set of functional dependencies on R. The decomposition of R into relations with attribute sets R1, R2 is a lossless-join decomposition iff (R1 ∩ R2) → R1 ∈ F+ or (R1 ∩ R2) → R2 ∈ F+, i.e., R1 ∩ R2 is a superkey for R1 or R2 (the attributes common to R1 and R2 must contain a key for either R1 or R2).
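The theorem gives a direct test for binary decompositions: compute the closure of the common attributes and see whether it covers R1 or R2. A hedged sketch follows; the helper names and the small schemas used to exercise it are illustrative assumptions, not taken from the slide.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


# Illustrative input: R = ABC with F = {A -> B, B -> C}.
F = [("A", "B"), ("B", "C")]
print(is_lossless_binary("AB", "AC", F))  # common attribute A determines AB
print(is_lossless_binary("AC", "BC", F))  # common attribute C determines only C
```

The test applies only to a split into two schemas; decompositions into more than two schemas need the tableau method or repeated binary splits.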

Example: R(A,B,C,D,E) with the decomposition {R1(A,D), R2(A,B), R3(B,E), R4(C,D,E), R5(A,E)} and the dependencies FD1. A → C, FD2. B → C, FD3. C → D, FD4. DE → C, FD5. CE → A. Decide whether the decomposition is lossless.

BCNF Decomposition. Suppose R is not in BCNF, A is an attribute, and X → A is an FD with X ∩ A = ∅ that violates the BCNF condition. Remove A from R. Create a new relational schema XA. Repeat this process until all the relations are in BCNF. The result is a lossless-join decomposition, but it is not necessarily dependency preserving.

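Example: CSJDPQV, key is C, with FDs SD → P and J → S. Using SD → P, decompose CSJDPQV into SDP and CSJDQV. Using J → S, decompose CSJDQV into JS and CJDQV.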

Example (continued): CSJDPQV, key is C, with FDs SD → P, J → S and JP → C. CSJDPQV is decomposed into SDP and CSJDQV; CSJDQV is decomposed into JS and CJDQV. The result is in BCNF. It does not preserve JP → C; we can add a schema CJP. Each of SDP, JS, CJDQV, CJP is in BCNF, but there is redundancy in CJP.

Possible refinement: CSJDPQV, key is C, with FDs SD → P and SD → Q. Using SD → P, decompose CSJDPQV into SDP and CSJDQV. Using SD → Q, decompose CSJDQV into SDQ and CSJDV. SD is a key in SDP and SDQ, and there is no dependency between P and Q, so we can combine SDP and SDQ into one schema, resulting in SDPQ, CSJDV.

Overview It is possible to decompose any relational schema into a set of relational schemas with the following properties: 1) Attribute Preserving 2) FDs preserving 3) Lossless Join

The Decomposition Algorithm
Step 1. Find a minimal cover G for F.
Step 2. For each left-hand side X that appears in G, create a relation schema with attributes {X ∪ A1 ∪ A2 ∪ ... ∪ Am}, where X → A1, X → A2, ..., X → Am are all the dependencies in G with X as left-hand side.
Step 3. If none of the relation schemas contains a key of R, create one more relation schema that contains attributes that form a key for R.

Example: Consider R(A, B, C, D, E) and F = {AB → C, A → BE, C → E}.
Step 1. Minimal cover: Fmin = {A → C, A → B, C → E}
Step 2. R1(A,B,C), R2(C,E)
Step 3. Key: {A,D}, so we have R3(A,D)
Final result: R1(A,B,C), R2(C,E), and R3(A,D)
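Steps 2 and 3 of the algorithm can be sketched as below, taking the minimal cover from Step 1 as given input (computing the cover is not shown). The function names `candidate_key` and `synthesize_3nf` and the string encoding of FDs are assumptions of this sketch.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_key(attrs, fds):
    """Find one candidate key by dropping attributes that are not needed."""
    key = set(attrs)
    for a in sorted(attrs):
        if closure(key - {a}, fds) == set(attrs):
            key.discard(a)
    return key


def synthesize_3nf(attrs, min_cover):
    """Steps 2 and 3, assuming `min_cover` is already a minimal cover."""
    # Step 2: one schema per left-hand side X, holding X and all its A_i.
    groups = {}
    for lhs, rhs in min_cover:
        groups.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    schemas = [frozenset(s) for s in groups.values()]
    # Step 3: add a key schema if no schema is a superkey of R.
    if not any(closure(s, min_cover) == set(attrs) for s in schemas):
        schemas.append(frozenset(candidate_key(attrs, min_cover)))
    return schemas


f_min = [("A", "C"), ("A", "B"), ("C", "E")]
result = ["".join(sorted(s)) for s in synthesize_3nf("ABCDE", f_min)]
print(result)  # prints: ['ABC', 'CE', 'AD']
```

On the example above this yields the same three schemas, R1(A,B,C), R2(C,E) and R3(A,D).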

BCNF Decomposition. Property LJ1: A decomposition D = {R1, R2} of R has the lossless join property with respect to a set of functional dependencies F on R if and only if either the FD ((R1 ∩ R2) → (R1 - R2)) is in F+, or the FD ((R1 ∩ R2) → (R2 - R1)) is in F+.

BCNF Decomposition Property LJ2: If a decomposition D={R1, R2, ..., Rm} of R has the lossless join property with respect to a set of functional dependencies F on R, and if a decomposition D1={Q1, Q2, ... ,Qk} of Ri has the lossless join property with respect to the projection of F on Ri, then the decomposition D2={R1, R2, ... Ri-1, Q1, Q2, ..., Qk, Ri+1, ..., Rm} of R has the lossless join property with respect to F

BCNF Decomposition - Algorithm
1. Set D ← {R}
2. While there is a relation schema Q in D that is not in BCNF do
begin
    choose a relation schema Q in D that is not in BCNF;
    find a functional dependency X → Y in Q that violates BCNF;
    replace Q in D by two schemas (Q - Y) and (X ∪ Y)
end;

BCNF Decomposition - Example. Consider R(A,B,C,D) and F = {A → B, B → C, D → B}. Decompose R into BCNF relations.
Step 1. D = {R(A,B,C,D)}
Step 2. Loop 1. R is not in BCNF because A → B holds and A is not a superkey; decompose R into R1(A,C,D) and R2(A,B).
Loop 2. R1 is not in BCNF because A → C holds and A is not a superkey; decompose R1 into R11(A,D) and R12(A,C).
Result: D = {R11(A,D), R12(A,C), R2(A,B)}
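A sketch of the loop above in Python. The slide does not say how a violating dependency is chosen; this sketch assumes one particular rule (smallest left-hand side first, then the alphabetically first single attribute on the right), which happens to reproduce the example. Other choices can give different, equally valid BCNF decompositions. The search over subsets is exponential, so it only suits small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(schema, fds):
    """Return (X, A) where X -> A violates BCNF on `schema`, else None."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            x_plus = closure(x, fds) & set(schema)  # projection onto schema
            if x_plus != set(schema):               # X is not a superkey
                extra = sorted(x_plus - x)
                if extra:                           # non-trivial X -> A
                    return x, extra[0]
    return None


def bcnf_decompose(attrs, fds):
    todo, done = [frozenset(attrs)], []
    while todo:
        q = todo.pop()
        violation = find_violation(q, fds)
        if violation is None:
            done.append(q)
        else:
            x, a = violation
            todo.append(q - {a})                 # Q - Y
            todo.append(frozenset(x | {a}))      # X ∪ Y
    return done


F = [("A", "B"), ("B", "C"), ("D", "B")]
schemas = sorted("".join(sorted(s)) for s in bcnf_decompose("ABCD", F))
print(schemas)  # prints: ['AB', 'AC', 'AD']
```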

Overview. Given a relation schema R(A1, A2, ..., An): if R is not in third normal form (3NF), we want to decompose it into a set of relation schemas D = {R1, R2, ..., Rm}, where each Ri is in 3NF, such that the following conditions hold: the attribute preservation condition, the dependency preservation condition, and the lossless join condition.

Attribute Preservation Condition. The attribute preservation condition states that the union of the attributes of the Ri equals the set of attributes of R. For example: given R(A, B, C, D) and the decomposition D1 = {R1(A,B), R2(B,C), R3(A,C,D)}, D1 satisfies the attribute preservation condition.

Attribute Preservation Condition Given R(A, B, C, D) and the decomposition D2={R1(A, B), R2(B,C), R3(A, C)}, The attribute preservation condition is violated because D is missing (not preserved in the decomposition).

Dependency Preservation Condition. We say that a decomposition D = {R1, R2, ..., Rm} of R is dependency preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F. That is: (F(R1) ∪ ... ∪ F(Rm))+ = F+. Given a set of dependencies F on R, the projection of F on Ri, denoted by F(Ri) where Ri is a subset of R, is the set of dependencies X → Y in F+ such that the attributes in X ∪ Y are all contained in Ri.

Dependency Preservation Condition. Given R(A, B, C, D) and F = {A → B, B → C, C → D}. Let D1 = {R1(A,B), R2(B,C), R3(C,D)}. F(R1) = {A → B}, F(R2) = {B → C}, F(R3) = {C → D}. The FDs are preserved.

Dependency Preservation Condition. Given R(A, B, C, D) and F = {A → B, B → C, C → D}. Let D2 = {R1(A,B), R2(B,C), R3(A,D)}; then the FDs are not preserved, because C → D cannot be derived from the projections of F on R1, R2 and R3.

Dependency Preservation Condition We want to preserve the dependencies because each dependency in F represents a constraint on a database.

Dependency Preservation Condition. If one of the dependencies is not represented by the dependencies on some individual relation Ri of the decomposition, we cannot enforce this constraint by looking only at an individual relation. Instead, to enforce the constraint, we have to join two or more of the relations in the decomposition and then check that the functional dependency holds in the result of the join operation. This is very inefficient.

Multivalued Dependencies and Fourth Normal Form Formal Definition of Multivalued Dependency

Multi-Valued Dependency. Problem: multi-valued (or binary join) dependency. Definition: if every instance r of schema R can be (losslessly) decomposed using attribute sets (X, Y) such that r = π_X(r) ⋈ π_Y(r), then a multi-valued dependency exists. Ex: Person = π_{SSN,PhoneN}(Person) ⋈ π_{SSN,ChildSSN}(Person)

Fourth Normal Form. A schema is in fourth normal form if for every multi-valued dependency R = X ⋈ Y either:
- X ⊆ Y or Y ⊆ X (trivial case), or
- X ∩ Y is a superkey of R (i.e., X ∩ Y → R)

4th Normal Form No multivalued dependencies and BCNF Create separate tables for each separate functional dependency

Example SalesForce (State, SalesPerson) Delivery (State, Delivery)

Beyond 4th Normal Form: 5th Normal Form (Project-Join Normal Form); Domain Key Normal Form (DKNF)

Assume the relation R(A B C D) contains the following two tuples: (1 2 3 4) and (1 5 6 7). What other tuples must R contain so that A ->-> B and A ->-> C hold for R? Answer: the tuples that must be included due to the two multi-valued dependencies are: (1 2 6 7), (1 5 3 4), (1 2 6 4), (1 5 3 7), (1 2 3 7) (second round), (1 5 6 4) (second round).
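The answer can be checked mechanically by repeatedly applying the definition of a multi-valued dependency until no new tuple appears. The sketch below is one way to do that; the function name `mvd_closure` and the encoding of X ->-> Y as a pair of attribute strings are assumptions made for illustration.

```python
def mvd_closure(tuples, attrs, mvds):
    """Add tuples until every MVD in `mvds` holds.

    `attrs` names the columns in order; `mvds` is a list of (X, Y) attribute
    strings meaning X ->-> Y. Both encodings are assumptions of this sketch.
    """
    rows = set(tuples)
    index = {a: i for i, a in enumerate(attrs)}
    changed = True
    while changed:
        changed = False
        for x, y in mvds:
            for t1 in list(rows):
                for t2 in list(rows):
                    if all(t1[index[a]] == t2[index[a]] for a in x):
                        # t3 takes its Y values from t1 and the rest from t2.
                        t3 = tuple(
                            t1[i] if a in y else t2[i]
                            for a, i in index.items()
                        )
                        if t3 not in rows:
                            rows.add(t3)
                            changed = True
    return rows


start = {(1, 2, 3, 4), (1, 5, 6, 7)}
full = mvd_closure(start, "ABCD", [("A", "B"), ("A", "C")])
added = sorted(full - start)
print(added)
```

The six added tuples are exactly those listed in the answer, giving eight tuples in total: every combination of B ∈ {2, 5}, C ∈ {3, 6} and D ∈ {4, 7} with A = 1.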

Example. Consider the following relation and functional dependencies: R(a,b,c,d) with a,c -> b,d and a,d -> b. To be in BCNF, every determinant must be a candidate key. In R, a,c -> b,d makes a,c a candidate key, so the first dependency is fine. For a,d -> b, the determinant a,d determines b but does not determine c, so a,d is not a candidate key, and thus R is not in BCNF.

Tuple Relational Calculus based on specifying a number of tuple variables a tuple variable refers to any tuple

Simple example 1. To find all employees whose salary is greater than $50,000: {t | EMPLOYEE(t) and t.Salary > 50000}, where EMPLOYEE(t) specifies the range of tuple variable t. The above operation selects all the attributes.

Simple example 2. To find only the names of employees whose salary is greater than $50,000: {t.FNAME, t.LNAME | EMPLOYEE(t) and t.Salary > 50000}. The above is equivalent to SELECT T.FNAME, T.LNAME FROM EMPLOYEE T WHERE T.SALARY > 50000

Elements of a tuple calculus In general, we need to specify the following in a tuple calculus expression: Range Relation (I.e, R(t)) = FROM Selected combination= WHERE Requested attributes= SELECT

Elements of formula. A formula is made of predicate calculus atoms: an atom of the form R(ti); ti.A op tj.B, where op ∈ {=, <, >, ...}; F1 and F2, where F1 and F2 are formulas; F1 or F2; not (F1); F' = (∃t)(F) or F' = (∀t)(F). Examples: (∃Y) friends(Y, John); (∀X) likes(X, ICE_CREAM)

More example. For every project located in 'Stafford', retrieve the project number, the controlling department number, and the last name, birth date, and address of the manager of that department.

Domain Relational Calculus (DRC) Another type of formal predicate calculus-based language QBE is based on DRC The language shares a lot of similarities with the tuple calculus

DRC. The only difference is the type of variables: variables range over single values from the domains of attributes. An expression of DRC is: {x1, x2, …, xn | COND(x1, x2, …, xn, xn+1, …, xn+m)}, where x1, x2, …, xn+m are domain variables ranging over the domains of attributes and COND is a condition (or formula).

Examples. Retrieve the birth date and address of the employee whose name is 'John B. Smith': {uv | (∃q)(∃r)(∃s)(EMPLOYEE(qrstuvwxyz) and q='John' and r='B' and s='Smith')}

Alternative notation. Assign the constants 'John', 'B', and 'Smith' directly: {uv | EMPLOYEE('John', 'B', 'Smith', t, u, v, w, x, y, z)}

More example. Retrieve the name and address of all employees who work for the 'Research' department: {qsv | (∃z)(EMPLOYEE(qrstuvwxyz) and (∃l)(∃m)(DEPARTMENT(lmno) and l='Research' and m=z))}

More example. List the names of managers who have at least one dependent: {sq | (∃t)(EMPLOYEE(qrstuvwxyz) and ((∃j)(DEPARTMENT(hijk) and ((∃l)(DEPENDENT(lmnop) and t=j and t=l)))))}

Characteristics of a Decomposition Two important characteristics of a decomposition: lossless join: necessary, otherwise original relation cannot be recreated, even if tables are not modified dependency preserving: allows us to check that inserts/updates are correct without joining the sub-relations

Lossless Join
T | C | S
Smith | DB | Cohen
Jones | OS | Levy
Projection on C, S:
C | S
DB | Cohen
OS | Levy

Checking Check for a lossless join using the algorithm from class (with the a-s and b-s) Check for dependency preserving using an algorithm shown today

Dependency Preservation. R = ABC, decomposition {AB, AC}, dependencies {A → B, B → C}. Is it lossless? Does this decomposition preserve B → C?

Dependency Preservation (cont'd) [Tables: example instances over A, B, C and over the decomposed schemas AB and AC]

Definitions. We define π_S(F) to be the set of dependencies X → Y in F+ such that X and Y are in S. We say that a decomposition R1...Rn of R is dependency preserving if (π_R1(F) ∪ ... ∪ π_Rn(F))+ = F+. Note that one inclusion clearly always holds. This definition implies an exponential algorithm to check whether a decomposition is dependency preserving. We give a polynomial algorithm.

Algorithm Let R be a relation, decomposed into R1, R2,…,Rn Let F be a set of functional dependencies To check whether R1,…,Rn preserves all the functional dependencies in F, run the algorithm on the next slide for each X -> Y in F If the answer is “Yes” for all FDs, then the decomposition preserves F If the answer is “No” for at least one FD, then the decomposition does not preserve F

Testing Dependency Preservation. To check if the decomposition preserves X → Y:
Z := X
while changes to Z occur do
    for i = 1 to n do
        Z := Z ∪ ((Z ∩ Ri)+ ∩ Ri)
if Y ⊆ Z return "yes" else return "no"

Example (1) R=ABCD F = {A -> B, B -> C, C -> D, D -> A} R1=AB, R2=BC, R3=CD Is this decomposition dependency preserving?

Example (2) R = ABCDE F = {A -> ABCDE, BC -> A, DE -> C} Suppose we decompose R into ABDE and DEC. Is the decomposition dependency preserving?
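The polynomial test can be sketched in Python and run on both examples. The function names and the string encoding of schemas and FDs are assumptions of this sketch; the closure (Z ∩ Ri)+ is taken with respect to the full set F, as in the algorithm.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fd, schemas, fds):
    """Check whether the decomposition `schemas` preserves one FD X -> Y."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:                        # while changes to Z occur
        changed = False
        for ri in schemas:
            ri = set(ri)
            gained = closure(z & ri, fds) & ri   # (Z ∩ Ri)+ ∩ Ri
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z


def is_dependency_preserving(schemas, fds):
    return all(preserves(fd, schemas, fds) for fd in fds)


# Example (1): R = ABCD decomposed into AB, BC, CD.
F1 = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(is_dependency_preserving(["AB", "BC", "CD"], F1))   # prints: True

# Example (2): R = ABCDE decomposed into ABDE and DEC.
F2 = [("A", "ABCDE"), ("BC", "A"), ("DE", "C")]
print(is_dependency_preserving(["ABDE", "DEC"], F2))      # prints: False
```

In Example (1), D → A is preserved even though no single schema contains both D and A: the projections D → C, C → B and B → A chain together. In Example (2), BC → A is lost because B and C never appear together in a schema and neither determines anything on its own.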

Normal Forms The basic idea: if a relation is in one of these forms, then it avoids certain problems (e.g., redundancy) Normal Forms: BCNF: Every dependency X->A in F+ must be (1) trivial or (2) X is a super-key 3NF: Every dependency X->A in F+ must be (1) trivial, (2) X is a super-key or (3) A is an attribute of a key

Example Reminder F+ = {X -> X+ | exist Y->Z in F st Y in X and Z not in X} Suppose that R = ABC. For each of the following values of F, decide whether R is in BCNF/3NF: F = {} F = {A -> B} F = {A -> B, A -> C} F = {A -> B, B -> C} F = {A -> B, BC -> A}
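One way to work through this exercise is to enumerate every attribute set X, compute X+, and test each non-trivial X → A against the BCNF and 3NF conditions. The sketch below does this by brute force, which is only practical for small schemas such as R = ABC; the function names and the returned labels are assumptions made for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def subsets(attrs):
    """Yield all subsets of `attrs`, smallest first."""
    attrs = sorted(attrs)
    for size in range(len(attrs) + 1):
        for combo in combinations(attrs, size):
            yield set(combo)


def candidate_keys(attrs, fds):
    keys = []
    for x in subsets(attrs):
        # A superkey is a key if no smaller key is contained in it.
        if closure(x, fds) == set(attrs) and not any(k <= x for k in keys):
            keys.append(x)
    return keys


def normal_form(attrs, fds):
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))  # attributes of some key
    bcnf = third = True
    for x in subsets(attrs):
        x_plus = closure(x, fds)
        if x_plus == attrs:
            continue                     # X is a superkey: no violation
        for a in x_plus - x:             # non-trivial X -> A in F+
            bcnf = False
            if a not in prime:
                third = False
    return "BCNF" if bcnf else "3NF" if third else "neither"


cases = [
    [],
    [("A", "B")],
    [("A", "B"), ("A", "C")],
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("BC", "A")],
]
answers = [normal_form("ABC", f) for f in cases]
print(answers)  # prints: ['BCNF', 'neither', 'BCNF', 'neither', '3NF']
```

In the last case the keys are AC and BC, so every attribute is part of some key: A → B violates BCNF but is allowed under 3NF.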

Decomposition into 3NF. Given a relation R with functional dependencies F:
Step 1: Find a non-redundant cover G of F
Step 2: For each FD X → A in G, create a schema XA
Step 3: If no schema created so far contains a key, add a key as a schema
Step 4: Remove schemas that are contained in other schemas
The result is a decomposition into 3NF that is dependency preserving and has a lossless join.

Example. Find a decomposition into 3NF for the relation R = ABCDEFGH, with the functional dependencies F = {A → B, ABCD → E, EF → GH, ACDF → EG}

Example. Non-redundant cover G = {A → B, ACD → E, EF → G, EF → H}. Key: ACDF. Schema: AB, ACDE, EFG, EFH, ACDF