Algorithm to Compute F+

Algorithm to Compute F+
F+ = F
repeat
    for each functional dependency f in F+
        apply the reflexivity and augmentation rules to f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using the transitivity rule
            add the resulting functional dependency to F+
until F+ does not change any further

Example: R = {A, B, C, D, E}, F = {A → B, B → C, CD → E}
• Step 1: For each f in F, apply the reflexivity rule
  – We get: CD → C; CD → D
  – Add them to F: F = {A → B, B → C, CD → E, CD → C, CD → D}
• Step 2: For each f in F, apply the augmentation rule
  – From A → B we get: AC → BC; AD → BD; ABC → BC; ABD → BD
  – From B → C we get: AB → AC; BC → C; BD → CD; ABC → AC; ABD → ACD; etc.
• Step 3: Apply transitivity on pairs of f's
• Keep repeating… You get the idea

Computing X+
Input: F (a set of FDs) and X (a set of attributes)
Output: result = X+ (the closure of X under F)
Method:
    result = X
    while (changes to result) do
        for each functional dependency Y → Z in F do
            if Y ⊆ result then result = result ∪ Z
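The loop above can be sketched in Python as follows. This is a minimal illustration rather than part of the original slides: the function name `closure` and the choice to write each FD as a `(lhs, rhs)` pair of attribute strings are assumptions made for the example. The FDs used are those of the (AG)+ example in this deck.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is an iterable of
    attribute names (a string works for single-letter attributes).
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply Y -> Z only if Y is already covered and Z adds something.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# F = {A -> B, A -> C, CG -> H, CG -> I, B -> H} over R = (A, B, C, G, H, I)
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(sorted(closure("AG", F)))
```

The outer loop stops as soon as a full pass over F adds nothing, which mirrors the "while (changes to result)" condition.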

Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
• Testing for superkey:
  – To test if X is a superkey, we compute X+ and check if X+ contains all attributes of R.
• Testing functional dependencies:
  – To check if a functional dependency X → Y holds (or, in other words, is in F+), just check if Y ⊆ X+.
  – That is, we compute X+ by using attribute closure, and then check if it contains Y.
  – This is a simple and cheap test, and very useful.
• Computing the closure of F
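The first two uses can be sketched as thin wrappers around attribute closure. The helper names `is_superkey` and `fd_holds` are hypothetical, chosen for this illustration, and the FDs are the ones from the (AG)+ example in this deck.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    # X is a superkey of R exactly when X+ contains every attribute of R.
    return closure(attrs, fds) >= set(relation)


def fd_holds(lhs, rhs, fds):
    # X -> Y is in F+ exactly when Y is a subset of X+.
    return set(rhs) <= closure(lhs, fds)


R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_superkey("AG", R, F))
print(fd_holds("A", "H", F))
print(fd_holds("G", "A", F))
```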

Example of X+
R = (A, B, C, G, H, I), F = {A → B, A → C, CG → H, CG → I, B → H}
Compute (AG)+:
Step 1: result = AG
Step 2: A → B causes us to include B in result. We know that A → B is in F and A ⊆ result (which is AG), so result = result ∪ B = ABG.
A → C causes result to become ABCG.
CG → H causes result to become ABCGH.
CG → I causes result to become ABCGHI.

How to Find the List of Candidate Keys
If α+ = R, then α is a superkey for R. If we find α+ for all α ⊆ R, we've computed F+ (except that we'd need to use decomposition to get all of it).

Compute the closure for relational schema R = {A, B, C, D, E} with F = {A → BC, CD → E, B → D, E → A}. List the candidate keys of R.

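1. Find the attributes that appear on neither the left nor the right side of any FD.
2. Find the attributes that appear only on the right side.
3. Find the attributes that appear only on the left side.
4. Combine the attributes from steps 1 and 3. Every candidate key must contain these attributes; if their closure is not all of R, extend them with attributes that appear on both sides until the closure is R.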
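One way to turn these steps into a search is sketched below. It is a brute-force illustration under stated assumptions, not the deck's own procedure: `candidate_keys` is a hypothetical name, FDs are `(lhs, rhs)` pairs of attribute strings, and the search simply tries ever larger extensions of the mandatory attributes. The data is the exercise R = {A, B, C, D, E} with A → BC, CD → E, B → D, E → A.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    attrs = set(relation)
    lhs_attrs = set().union(*(set(l) for l, _ in fds))
    rhs_attrs = set().union(*(set(r) for _, r in fds))
    # Steps 1 and 3: attributes never on a right side are in every key.
    core = attrs - rhs_attrs
    # Step 2: attributes only on a right side are in no key.
    optional = sorted(attrs & lhs_attrs & rhs_attrs)
    keys = []
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            cand = core | set(extra)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) >= attrs:
                keys.append(cand)
    return keys


F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
for key in candidate_keys("ABCDE", F):
    print("".join(sorted(key)))
```

Because candidates are tried in order of increasing size and supersets of known keys are skipped, every set that survives is a minimal key.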

Canonical Cover
A canonical cover for F is a set of dependencies Fc such that:
• F logically implies all dependencies in Fc, and
• Fc logically implies all dependencies in F, and
• no functional dependency in Fc contains an extraneous attribute, and
• each left side of a functional dependency in Fc is unique.
Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies.

Extraneous Attributes
Consider F, and a functional dependency A → B in F. “Extraneous”: are there any attributes in A or B that can be safely removed without changing the constraints implied by F?

Every functional dependency α → β in Fc contains no extraneous attributes in α (ones that can be removed from α without changing Fc+). So A is extraneous in α if A ∈ α and (Fc – {α → β}) ∪ {(α – A) → β} logically implies Fc.

Every functional dependency α → β in Fc contains no extraneous attributes in β (ones that can be removed from β without changing Fc+). So A is extraneous in β if A ∈ β and (Fc – {α → β}) ∪ {α → (β – A)} logically implies Fc.

Testing if an Attribute is Extraneous
Consider a set F of functional dependencies and the functional dependency α → β in F.
To test if attribute A ∈ α is extraneous in α:
  1. compute ({α} – A)+ using the dependencies in F
  2. check that ({α} – A)+ contains β; if it does, A is extraneous
To test if attribute A ∈ β is extraneous in β:
  1. compute α+ using only the dependencies in F’ = (F – {α → β}) ∪ {α → (β – A)}
  2. check that α+ contains A; if it does, A is extraneous
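Both tests reduce to one attribute-closure call each, as the sketch below shows. The function names are hypothetical, and the left-side check uses the usual textbook formulation: the reduced left side must still determine the whole right side. The sample FD sets are F = {A → C, AB → C} and F = {A → C, AB → CD}.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def extraneous_in_lhs(attr, lhs, rhs, fds):
    # Drop `attr` from the left side and ask whether the right side
    # is still determined, using all of F.
    reduced = set(lhs) - {attr}
    return set(rhs) <= closure(reduced, fds)


def extraneous_in_rhs(attr, lhs, rhs, fds):
    # Build F' = (F - {lhs -> rhs}) U {lhs -> (rhs - attr)} and ask
    # whether `attr` is still reachable from the left side.
    target = (set(lhs), set(rhs))
    f_prime = [(l, r) for l, r in fds if (set(l), set(r)) != target]
    f_prime.append((lhs, set(rhs) - {attr}))
    return attr in closure(lhs, f_prime)


F1 = [("A", "C"), ("AB", "C")]
F2 = [("A", "C"), ("AB", "CD")]
print(extraneous_in_lhs("B", "AB", "C", F1))
print(extraneous_in_rhs("C", "AB", "CD", F2))
print(extraneous_in_rhs("D", "AB", "CD", F2))
```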

Example: to test whether A is extraneous in ACD → B, we compute ({ACD} – A)+ = (CD)+ using the dependencies in F and check whether the result contains the right-hand side B. Computing CD+, we get ACDEGB, which contains B (and also A), so CD → B holds and A is extraneous.

To compute a canonical cover for F:
repeat
    Use the union rule to replace any dependencies in F of the form α1 → β1 and α1 → β2 with α1 → β1 β2
    Find a functional dependency α → β with an extraneous attribute either in α or in β
    If an extraneous attribute is found, delete it from α → β
until F does not change
Note: the union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied.
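The repeat loop can be sketched as follows. This is an illustrative implementation under assumptions: the names `canonical_cover` and `remove_one_extraneous` are hypothetical, FDs are `(lhs, rhs)` pairs, and attributes are tried in alphabetical order, so a different order may produce a different (equally valid) cover. The input is F = {A → BC, B → C, A → B, AB → C}.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def remove_one_extraneous(cover):
    """Return a cover with one extraneous attribute removed, or None."""
    for i, (lhs, rhs) in enumerate(cover):
        rest = cover[:i] + cover[i + 1:]
        for a in sorted(lhs):
            # Left side: the reduced left side must still determine rhs.
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, cover):
                return rest + [(lhs - {a}, rhs)]
        for a in sorted(rhs):
            # Right side: `a` must be derivable without this occurrence.
            if a in closure(lhs, rest + [(lhs, rhs - {a})]):
                return rest + [(lhs, rhs - {a})]
    return None


def canonical_cover(fds):
    cover = [(frozenset(l), frozenset(r)) for l, r in fds]
    while True:
        # Union rule: merge dependencies that share a left side.
        merged = {}
        for lhs, rhs in cover:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        cover = [(l, r) for l, r in merged.items() if r]
        reduced = remove_one_extraneous(cover)
        if reduced is None:
            return cover
        cover = reduced


F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

Re-running the union step at the top of every iteration reflects the note that the union rule may become applicable again after a deletion.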

{ACB, DEG, BCD, CGD, CEA} No more extraneous attributes FC = {ACB, DEG, BCD, CGD, CEA} * Different order of considering the extraneous attributes can result in different FC

Example: R = (A, B, C), F = {A → BC, B → C, A → B, AB → C}
• Combine A → BC and A → B into A → BC
  – Set is now {A → BC, B → C, AB → C}
• A is extraneous in AB → C
  – Check if the result of deleting A from AB → C is implied by the other dependencies
  – Yes: in fact, B → C is already present!
  – Set is now {A → BC, B → C}
• C is extraneous in A → BC
  – Check if A → C is logically implied by A → B and the other dependencies
  – Yes: using transitivity on A → B and B → C. Can use attribute closure of A in more complex cases
• The canonical cover is: A → B, B → C

Given F = {A → C, AB → C}: B is extraneous in AB → C because {A → C, AB → C} is equivalent to {A → C, A → C} = {A → C}.
Given F = {A → C, AB → CD}: C is extraneous in AB → CD because {A → C, AB → CD} is equivalent to {A → C, AB → D}.

A canonical cover might not be unique
R = (A, B, C, D, E), F = {AB → CD, A → E, E → C}
To check if C is extraneous in AB → CD, we compute (AB)+ under F’ = {AB → D, A → E, E → C}: AB+ = ABCDE, which includes C, so we can say that C is extraneous.
To check if D is extraneous in AB → CD, we compute (AB)+ under F’ = {AB → C, A → E, E → C}: AB+ = ABCE, which does not include D, so we can say that D is not extraneous.
Now we have AB → D, so Fc = {AB → D, A → E, E → C}.

Normalization
• Formal technique for analyzing a relation
• Based on primary key and functional dependencies
• Series of steps
• With each successive step the relation gets more restricted

The process of normalization was developed by Dr. E. F. Codd. Three normal forms were initially proposed. It is a formal process for deciding which attributes should be grouped together. The database design flow is from logical design to physical design. In the process of normalization, the designers analyze and decompose complex relations and transform them into smaller, simpler and well-structured relations. With the help of normalization, a record structure is replaced with a new record structure that is simpler and more manageable.

Goals of Normalization
• To eliminate redundant data.
• To eliminate repeating groups.
• To provide simple retrieval of data in response to queries.
• To simplify operations like update, deletion, and insertion.
• To eliminate columns not dependent on the key.
• To isolate independent multiple relationships.
• To isolate semantically related multiple relationships.

1NF (First Normal Form)
A domain is atomic if elements of the domain are considered to be indivisible units. This means there are no multivalued attributes (repeating groups); we eliminate the substructure of attributes. We say that a relation schema R is in First Normal Form if the domains of all attributes of R are atomic. Composite attributes, such as address with components street and city, have non-atomic domains. The set of integers is an atomic domain. The set of all sets of integers is a non-atomic domain. We remove the repeating group and create a separate relation containing the repeating group. The original record and the new records are related by a common data item. All relational tables must satisfy the 1NF requirements.

1NF(First Normal Form) A relation in which intersection of each row and column contains one and only one value. (Each Data Value Stored in relation is single – valued) 1NF – a relation that contains no repeating groups

UNF to 1NF Nominate an attribute or group of attributes to act as the key for the Unnormalized table. Identify repeating group(s) in unnormalized table which repeats for the key attribute(s). Remove repeating group by: entering appropriate data into the empty columns of rows containing repeating data

Second Normal Form (2NF) Based on concept of full functional dependency: A and B are attributes of a relation, B is fully dependent on A if B is functionally dependent on A but not on any proper subset of A. 2NF - A relation that is in 1NF and every non-primary-key attribute is fully functionally dependent on the primary key. Relations with a simple key are automatically in 2NF Need to concentrate on relations with composite keys

1NF to 2NF Identify primary key for the 1NF relation. Identify functional dependencies in the relation. If partial dependencies exist on the primary key remove them by placing them in a new relation along with copy of their determinant.

3NF (Third Normal Form)
Based on the concept of transitive dependency: A, B and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A through B (provided that A is not functionally dependent on B or C).
3NF - A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key. No non-key attribute is a determinant for another non-key attribute.

2NF to 3NF Identify the primary key in the 2NF relation. Identify functional dependencies in the relation. If transitive dependencies exist on the primary key remove them by placing them in a new relation along with copy of their determinant.

Definition of Decomposition Let R be a relation schema A set of relation schemas { R1, R2,…, Rn } is a decomposition of R if R = R1 U R2 U …..U Rn each Ri is a subset of R ( for i = 1,2…,n)

Example of Decomposition For relation R(x,y,z) there can be 2 subsets: R1(x,z) and R2(y,z) If we union R1 and R2, we get R R = R1 U R2

Goal of Decomposition Eliminate redundancy by decomposing a relation into several relations in a higher normal form. It is important to check that a decomposition does not lead to bad design

Example: Problem with Decomposition
R:
Model Name  Price  Category
a11         100    Canon
s20         200    Nikon
a70         150    Canon
R1:
Model Name  Category
a11         Canon
s20         Nikon
a70         Canon
R2:
Price  Category
100    Canon
200    Nikon
150    Canon

Example: Problem with Decomposition
Joining R1 and R2 on Category:
Model Name  Price  Category
a11         100    Canon
a11         150    Canon
s20         200    Nikon
a70         100    Canon
a70         150    Canon
The join contains two tuples that are not in the original R.

Lossy decomposition In previous example, additional tuples are obtained along with original tuples Although there are more tuples, this leads to less information Due to the loss of information, decomposition for previous example is called lossy decomposition or lossy-join decomposition
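The effect can be reproduced with plain Python sets. This is only a sketch: it assumes that the category of model a70 is Canon, which is how the flattened table in this transcript reads, and it models the natural join with a set comprehension rather than a database engine.

```python
# Original relation R(model, price, category).
R = {("a11", 100, "Canon"), ("s20", 200, "Nikon"), ("a70", 150, "Canon")}

# Project R onto (model, category) and (price, category).
R1 = {(model, category) for model, _price, category in R}
R2 = {(price, category) for _model, price, category in R}

# Natural join of R1 and R2 on the shared attribute `category`.
joined = {
    (model, price, category)
    for model, category in R1
    for price, category2 in R2
    if category == category2
}

print(len(R), len(joined))
print(sorted(joined - R))  # tuples that were never in R
```

The join returns every original tuple plus spurious ones, so it is no longer possible to tell which price belongs to which Canon model.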

Lossless Decomposition A decomposition {R1, R2,…, Rn} of a relation R is called a lossless decomposition for R if the natural join of R1, R2,…, Rn produces exactly the relation R.

Lossless Decomposition Property
R: relation; F: set of functional dependencies on R; X, Y: decomposition of R.
The decomposition is lossless if:
• X ∩ Y → X, that is, the attributes common to both X and Y functionally determine ALL the attributes in X, OR
• X ∩ Y → Y, that is, the attributes common to both X and Y functionally determine ALL the attributes in Y.

Lossless Decomposition Property In other words, if X ∩ Y forms a superkey of either X or Y, the decomposition of R is a lossless decomposition

Example: Lossless Decomposition
Given: Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
Required FDs:
  branch-name → branch-city assets
  loan-number → amount branch-name
Decompose Lending-schema into two schemas:
  Branch-schema = (branch-name, branch-city, assets)
  Loan-info-schema = (branch-name, customer-name, loan-number, amount)

Since Branch-schema ∩ Loan-info-schema = {branch-name} and branch-name → branch-city assets, the common attribute determines all of Branch-schema. Thus, this decomposition is a lossless decomposition.
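The check for a two-way split can be sketched with attribute closure. The function name `is_lossless_binary` is hypothetical, FDs are `(lhs, rhs)` pairs of attribute-name lists, and the second split below is an invented counterexample added only to show a failing case.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    # Lossless if the common attributes determine all of r1 or all of r2.
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


F = [
    (["branch-name"], ["branch-city", "assets"]),
    (["loan-number"], ["amount", "branch-name"]),
]
branch = ["branch-name", "branch-city", "assets"]
loan_info = ["branch-name", "customer-name", "loan-number", "amount"]
print(is_lossless_binary(branch, loan_info, F))

# Hypothetical split that shares only customer-name, which determines nothing.
bad1 = ["branch-name", "branch-city", "customer-name"]
bad2 = ["customer-name", "assets", "loan-number", "amount"]
print(is_lossless_binary(bad1, bad2, F))
```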

Dependency Preservation
A decomposition D = {R1, R2, ..., Rn} of R is dependency-preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F; that is, if (F1 ∪ F2 ∪ … ∪ Fn)+ = F+

Dependency preservation is another important requirement, since a dependency is a constraint on the database. If a constraint is split over more than one relation, it becomes difficult to enforce (the dependency is not preserved).

Example of Dependency Preservation
R(A B C D) with FD1: A → B, FD2: B → C, FD3: C → D
Decomposition: R1(A B C), R2(C D)

R1(A B C) contains FD1 and FD2.

R2(C D) contains FD3. Together, R1 and R2 have all 3 functional dependencies! Therefore, the decomposition is preserving the dependencies.

Example of Non-Dependency Preservation
R(A B C D) with FD1: A → B, FD2: B → C, FD3: C → D
Decomposition: R1(A C D), R2(B C)

R1(A C D) contains FD3 and R2(B C) contains FD2, but the decomposition does not support FD1: A → B. Therefore, it does not preserve the dependencies.

Checking Dependency Preservation
If all the attributes appearing on the left and right side of a dependency appear in the same relation, then the dependency is preserved.

Algorithm for Checking Dependency Preservation
Test each X → Y in F for dependency preservation:
    result = X
    while (changes to result) do
        for each Ri in the decomposition
            t = (result ∩ Ri)+ ∩ Ri
            result = result ∪ t
    if Y ⊆ result, return true; else, return false
[Note: If the algorithm returns false for any dependency, the whole decomposition is not dependency preserving.]

1. Choose a functional dependency in set F, say X → Y.
2. Set Z to the left-hand side of the functional dependency, so that Z = X. Start with R1 in the decomposed set {R1, R2, …, Rn}.
3. Intersect Z with R1: Z ∩ R1.
4. Find the closure of the result from step 3, (Z ∩ R1)+, using the original set F.
5. Intersect the result from step 4, (Z ∩ R1)+, with R1 again.

6. Update Z with any new attributes in the result from step 5.
7. Repeat steps 3-6 for R2, R3, …, Rn.
8. If Z changed between step 3 and the end of step 7, repeat steps 3-7.
9. Check whether Y is a subset of the current Z. If it is not, this decomposition violates dependency preservation. You can stop now.
10. If Y is a subset of the current Z, repeat steps 1-9 until you have checked ALL functional dependencies in set F.
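The numbered procedure can be sketched in Python as below. The names `is_preserved` and `preserves_dependencies` are hypothetical, and FDs are written as `(lhs, rhs)` pairs of attribute strings. The data is the deck's first worked example: F = {AB → C, C → E, B → D, E → A} with R1(B,C,D) and R2(A,C,E).

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(lhs, rhs, decomposition, fds):
    z = set(lhs)
    changed = True
    while changed:                      # step 8: repeat while Z grows
        changed = False
        for ri in decomposition:        # step 7: visit every Ri
            ri = set(ri)
            gained = closure(z & ri, fds) & ri   # steps 3 to 5
            if not gained <= z:
                z |= gained             # step 6
                changed = True
    return set(rhs) <= z                # step 9


def preserves_dependencies(decomposition, fds):
    # Step 10: every FD in F has to pass the check.
    return all(is_preserved(l, r, decomposition, fds) for l, r in fds)


F = [("AB", "C"), ("C", "E"), ("B", "D"), ("E", "A")]
D = ["BCD", "ACE"]
for lhs, rhs in F:
    print(lhs, "->", rhs, is_preserved(lhs, rhs, D, F))
print(preserves_dependencies(D, F))
```

The closure is always taken under the original F, so no explicit projection of F onto each Ri is needed.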

Example using Algorithm
Given the following: R(A,B,C,D,E), F = {AB → C, C → E, B → D, E → A}, decomposition R1(B,C,D), R2(A,C,E). Is this decomposition dependency preserving?

Check AB → C: Z = AB
For R1: Z ∩ R1 = AB ∩ BCD = B; {B}+ = BD; {B}+ ∩ R1 = BD ∩ BCD = BD. Update Z = AB ∪ BD = ABD, continue.

Z = ABD
For R2: Z ∩ R2 = ABD ∩ ACE = A; {A}+ = A; {A}+ ∩ R2 = A ∩ ACE = A. Z is still ABD. Since Z changed during this pass (from AB to ABD), repeat the check from R1 to R2.

Z = ABD
For R1: Z ∩ R1 = ABD ∩ BCD = BD; {BD}+ = BD; {BD}+ ∩ R1 = BD ∩ BCD = BD. Update Z = ABD ∪ BD = ABD, so Z hasn’t changed, but you still have to continue with R2.

Z = ABD, and checking R2 gives the same result as before, so Z will still be ABD. Since Z hasn’t changed and C is not in Z, you can conclude that AB → C is not preserved. Let’s practice with the other functional dependencies.

Check B → D: Z = X = B
For R1: Z ∩ R1 = B ∩ BCD = B; {B}+ = BD; {B}+ ∩ R1 = BD ∩ BCD = BD. Update Z = B ∪ BD = BD. Since Y = D is a subset of BD, B → D is preserved.

Check C → E: Z = X = C
For R2: Z ∩ R2 = C ∩ ACE = C; {C}+ = CEA; {C}+ ∩ R2 = CEA ∩ ACE = ACE. Update Z = C ∪ ACE = ACE. Since Y = E is a subset of ACE, C → E is preserved.

Check E → A: Z = X = E
For R2: Z ∩ R2 = E ∩ ACE = E; {E}+ = EA; {E}+ ∩ R2 = EA ∩ ACE = EA. Update Z = E ∪ EA = EA. Since Y = A is a subset of EA, E → A is preserved.

R(A,B,C,D,E) F = {ABC, CE, BD, EA} Decomposition: R1(B,C,D) R2(A,C,E) Shortcut: For any functional dependency, if both LHS and RHS collectively are within any of the sub scheme Ri. Then this functional dependency is preserved.

Example 2
Let R{A,B,C,D} and F = {A → B, B → C, C → D, D → A}. Let’s decompose R into R1 = AB, R2 = BC, and R3 = CD. Is this a dependency-preserving decomposition?

Yes, it is. You can immediately see that A → B, B → C, C → D are preserved in R1, R2, R3. The key is to check whether D → A is preserved. Let’s walk through the algorithm.

Z = X = D
For R1: Z ∩ R1 = D ∩ AB = empty set
For R2: Z ∩ R2 = D ∩ BC = empty set
For R3: Z ∩ R3 = D ∩ CD = D; {D}+ = DABC; {D}+ ∩ R3 = DABC ∩ CD = CD. Update Z to CD. Since Z changed, repeat.

Z = CD
For R2: Z ∩ R2 = CD ∩ BC = C; {C}+ = CDAB; {C}+ ∩ R2 = CDAB ∩ BC = BC. Update Z = CD ∪ BC = BCD. Since Z changed, repeat.
Z = BCD
For R1: Z ∩ R1 = BCD ∩ AB = B; {B}+ = BCDA; {B}+ ∩ R1 = BCDA ∩ AB = AB. Update Z = BCD ∪ AB = ABCD. Since Y = A is a subset of ABCD, D → A is preserved.

Example 3
R{A,B,C,D,E}, F = {A → BD, B → E}, decomposition R1{A,B,C}, R2{A,D}, R3{B,D,E}. Is this a dependency-preserving decomposition?

Let’s start with A → BD: Z = A
For R1: Z ∩ R1 = A ∩ ABC = A; {A}+ = ABDE; {A}+ ∩ R1 = ABDE ∩ ABC = AB. Update Z = A ∪ AB = AB.

Z = AB
For R2: Z ∩ R2 = AB ∩ AD = A; {A}+ = ABDE; {A}+ ∩ R2 = ABDE ∩ AD = AD. Update Z = AB ∪ AD = ABD. Thus A → BD is preserved.

Based on R3, B → E is preserved. Check B → E with the algorithm: Z = B
For R1: Z ∩ R1 = B ∩ ABC = B; {B}+ = BE; {B}+ ∩ R1 = BE ∩ ABC = B. Z = B, still the same.

Z = B
For R2: Z ∩ R2 = B ∩ AD = empty set
For R3: Z ∩ R3 = B ∩ BDE = B; {B}+ = BE; {B}+ ∩ R3 = BE ∩ BDE = BE. Update Z = B ∪ BE = BE. Thus B → E is preserved.

A trivial functional dependency occurs when you describe a functional dependency of an attribute on a collection of attributes that includes the original attribute, for example {A, B} → B. A functional dependency X → Y is non-trivial if Y is not a subset of X.

Definition: BCNF
A relation schema R is in BCNF with respect to a set F of functional dependencies if, for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
• α → β is a trivial functional dependency (β ⊆ α)
• α is a superkey for schema R
A database design is in BCNF if each member of the set of relation schemas that constitute the design is in BCNF.

BCNF When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF. 3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys i.e. composite candidate keys with at least one attribute in common. BCNF is based on the concept of a determinant.

A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent. A relation is in Boyce-Codd normal form (BCNF) if, and only if, every determinant is a candidate key. BCNF is a slightly stronger version of the third normal form (3NF).

BCNF Decomposition Algorithm
If R is not in BCNF, we can decompose R into a collection of BCNF schemas R1, R2, …, Rn.
result := {R};
done := false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF) then
    begin
        let α → β be a nontrivial functional dependency that holds on Ri
            such that α → Ri is not in F+, and α ∩ β = ∅;
        result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
    end
    else done := true;

BCNF_Decompose(R)
    find X such that X ≠ X+ ≠ [all attributes]
    if (not found) then “R is in BCNF”
    let Y = X+ - X
    let Z = [all attributes] - X+
    decompose R into R1(X ∪ Y) and R2(X ∪ Z)
    continue to decompose R1 and R2 recursively

Example: R(A, B, C, D, E) with A → B and CD → E
Iteration 1: in R, A+ = AB. Decompose into R1 = AB and R2 = ACDE. R1 is already in BCNF, so continue to decompose R2.
Iteration 2: in R2, CD+ = CDE. Decompose R2 into R3 = CDE and R4 = CDA.
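The recursive BCNF_Decompose procedure can be sketched as follows. This is an illustration under assumptions, not the deck's own code: `find_violation` and `bcnf_decompose` are hypothetical names, the search over subsets X is exponential, and closures are computed under the original F and then restricted to the current schema. The input is the example above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, X+) with X != X+ != rel, or None if rel is in BCNF."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel  # closure restricted to rel
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None


def bcnf_decompose(rel, fds):
    rel = set(rel)
    found = find_violation(rel, fds)
    if found is None:
        return [rel]
    x, x_plus = found
    r1 = x_plus               # X U Y, where Y = X+ - X
    r2 = x | (rel - x_plus)   # X U Z, where Z = R - X+
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


F = [("A", "B"), ("CD", "E")]
for schema in bcnf_decompose("ABCDE", F):
    print("".join(sorted(schema)))
```

Smaller X are tried first, so the first violation found for the example is A → B, matching iteration 1 on the slide.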

Conditions for BCNF
A relation in 3NF can still violate BCNF when all of the following hold:
• The candidate keys in the relation are composite keys.
• There is more than one candidate key in the relation, and the keys overlap: some attributes in the keys are shared and some are not.
• There is an FD from a non-overlapping attribute of one candidate key to a non-overlapping attribute of another candidate key.

Movies (title, year, length, filmType, studioName, starName)
Star Wars, 1977, 124, color, Fox; stars: Fisher, Hamill, Ford
Mighty Ducks, 1991, 104, Disney; star: Esteves
Wayne’s World, 1992, 95, Paramount; stars: Carvey, Meyers

{title, year, starName} is the candidate key.
title, year → length, filmType, studioName
The above FD (functional dependency) violates the BCNF condition because title and year do not determine the sixth attribute, starName, so {title, year} is not a superkey.

We solve this BCNF violation by decomposing relation Movies into 1. The schema with all the attributes of the FD {title, year, length, filmType, studioName} 2. The schema with all attributes of Movies except the three that appear on the right of the FD {title, year, starName}

Difference between BCNF and 3NF
In a table that is in BCNF, for every non-trivial functional dependency of the form A → B, A is a superkey, whereas a table that complies with 3NF should be in 2NF, and every non-prime attribute should depend directly on every candidate key of that table. BCNF is considered a stronger normal form than 3NF, and it was developed to capture some of the anomalies that could not be captured by 3NF.

A decomposition into 3NF can always be made lossless and dependency preserving, whereas a decomposition into BCNF is lossless but may or may not be dependency preserving.

What is the relationship between 3NF and BCNF? 3NF = BCNF 3NF≠ BCNF 3NF is subset of BCNF BCNF is subset of 3NF

Example
Grade_report (StudNo, StudName, Major, Advisor, CourseNo, Ctitle, InstrucName, InstrucLocn, Grade)
• Functional dependencies
  – StudNo → StudName
  – CourseNo → Ctitle, InstrucName
  – InstrucName → InstrucLocn
  – StudNo, Major, CourseNo → Grade
  – StudNo, Major → Advisor
  – Advisor → Major

1NF: Remove repeating groups
  – Student (StudNo, StudName) [StudNo → StudName]
  – StudMajor (StudNo, Major, Advisor) [StudNo, Major → Advisor]
  – StudCourse (StudNo, Major, CourseNo, Ctitle, InstrucName, InstrucLocn, Grade)

2NF: Remove partial dependencies
  – Student (StudNo, StudName)
  – StudMajor (StudNo, Major, Advisor)
  – StudCourse (StudNo, Major, CourseNo, Grade)
  – Course (CourseNo, Ctitle, InstrucName, InstrucLocn)

3NF: Remove transitive dependencies
  – Student (StudNo, StudName)
  – StudMajor (StudNo, Major, Advisor)
  – StudCourse (StudNo, Major, CourseNo, Grade)
  – Course (CourseNo, Ctitle, InstrucName)
  – Instructor (InstrucName, InstrucLocn)

BCNF: Every determinant must be a candidate key
  – Student: only determinant is StudNo
  – StudCourse: only determinant is StudNo, Major, CourseNo
  – Course: only determinant is CourseNo
  – Instructor: only determinant is InstrucName
  – StudMajor: the determinants are
    • StudNo, Major [StudNo, Major → Advisor]
    • Advisor [Advisor → Major]
  Of these determinants, only StudNo, Major is a candidate key, so StudMajor (StudNo, Major, Advisor) is not in BCNF.

BCNF: Remove non-candidate determinants
• Student (StudNo, StudName), StudCourse (StudNo, Major, CourseNo, Grade), Course (CourseNo, Ctitle, InstrucName) and Instructor (InstrucName, InstrucLocn) stay as they are.
• StudMajor (StudNo, Major, Advisor) with Advisor → Major is replaced by StudMajor (StudNo, Advisor) and Advisor (Advisor, Major).

Problems that BCNF overcomes
StudMajor (StudNo, Major, Advisor)
StudNo  Major       Advisor
123     .net        Kirti roshania
123     Java        MitalVora
456     Sql server  Jelam vora
789     Android     Vimal Parmar

In BCNF we get two tables: Stud Major (StudNo, Advisor) & Advisor (Advisor, Major)

Testing for BCNF
• To check if a nontrivial dependency α → β causes a violation of BCNF, compute α+ (the attribute closure of α) and verify that it includes all attributes of R; that is, that α is a superkey of R.
• If we can show that none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.
• For a relation Ri in a decomposition of R, check for every subset α of attributes in Ri that α+ (the attribute closure of α under F) either includes no attribute of Ri – α or includes all attributes of Ri.

Lossless-join, dependency-preserving decomposition into 3NF
Let Fc be a canonical cover for F;
i := 0;
for each FD α → β in Fc do
    if none of the schemas Rj, j = 1, 2, …, i contains αβ
    then begin
        i := i + 1;
        Ri := αβ
    end
if none of the schemas Rj, j = 1, 2, …, i contains a candidate key for R
then begin
    i := i + 1;
    Ri := any candidate key for R
end
return (R1, R2, …, Ri)

1) Construct a canonical cover of F. In our case Fc = F.
2) Initially the set of schemas Rj is empty (i = 0), so none of the Rj contains ABC (we take the dependency A → BC from the canonical cover). So R1 = (A, B, C). Consider CD → E: CDE is not contained in R1, hence we add R2 = (C, D, E). Similarly, we add R3 = (B, D) and R4 = (E, A).
3) R1 contains a candidate key for R, therefore we do not need to add a relation consisting of a candidate key. The resulting decomposition is (A, B, C), (C, D, E), (B, D), (E, A).
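The worked example can be replayed with a short Python sketch of the synthesis procedure. Two assumptions are made here: the dependency set is taken to be F = {A → BC, CD → E, B → D, E → A} over R = (A, B, C, D, E), which is inferred from the schemas produced above rather than stated on this slide, and the canonical cover is passed in ready-made instead of being computed. The function names are illustrative only.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(schema, fds):
    """Brute-force search for minimal attribute sets whose closure covers the schema."""
    schema = frozenset(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            combo = frozenset(combo)
            if any(k <= combo for k in keys):
                continue  # a smaller key is already inside, so combo is not minimal
            if closure(combo, fds) >= schema:
                keys.append(combo)
    return keys


def synthesize_3nf(schema, canonical_cover):
    """3NF synthesis; `canonical_cover` is assumed to already be a canonical cover."""
    relations = []
    for lhs, rhs in canonical_cover:
        candidate = lhs | rhs
        # Add a schema for this FD unless an existing schema already contains it.
        if not any(candidate <= r for r in relations):
            relations.append(candidate)
    keys = candidate_keys(schema, canonical_cover)
    # Guarantee a lossless join by making sure some schema holds a candidate key.
    if not any(k <= r for k in keys for r in relations):
        relations.append(keys[0])
    return relations


# Assumed dependency set for the example (single-letter attribute names).
cover = [
    (frozenset("A"), frozenset("BC")),
    (frozenset("CD"), frozenset("E")),
    (frozenset("B"), frozenset("D")),
    (frozenset("E"), frozenset("A")),
]
result = synthesize_3nf("ABCDE", cover)
print(["".join(sorted(r)) for r in result])
```

Under these assumptions the candidate keys are A, E, BC and CD, and the key A already sits inside R1, so no extra key relation is appended. The brute-force key search is exponential in the number of attributes, which is acceptable only for classroom-sized schemas.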

ClientInterview (clientNo, interviewDate, interviewTime, staffNo, roomNo)
Sample values: CR76 13-May-02 10.30 SG5 G101 12.00 CR74 SG37 G102 CR56 1-Jul-02
FD1 clientNo, interviewDate → interviewTime, staffNo, roomNo (primary key)
FD2 staffNo, interviewDate, interviewTime → clientNo (candidate key)
FD3 roomNo, interviewDate, interviewTime → clientNo, staffNo (candidate key)
FD4 staffNo, interviewDate → roomNo (not a candidate key)
As a consequence, the ClientInterview relation may suffer from update anomalies, because the determinant of FD4 is not a candidate key. For example, two tuples have to be updated if the roomNo needs to be changed for staffNo SG5 on 13-May-02.

4th Normal Form (4NF) A relation is in 4NF if it is in BCNF and either there is no multivalued dependency in the relation, or there are multivalued dependencies but the attributes that are multivalued-dependent on a specific attribute are dependent on each other.

The redundancy that comes from MVD’s is not removable by putting the database schema in BCNF. There is a stronger normal form, called 4NF, that (intuitively) treats MVD’s as FD’s when it comes to decomposition, but not when determining keys of the relation. That means no relation may contain two or more 1:N or N:M relationships that are not directly related.

Definition A relation R is in 4NF if whenever X ->-> Y is a nontrivial MVD, then X is a superkey.
– Nontrivial means that:
1. Y is not a subset of X, and
2. X and Y are not, together, all the attributes.
– Note that the definition of “superkey” still depends on FD’s only.

If a relation is in 4NF, then it is also in BCNF. If a relation is in BCNF, then it is also in 3NF.

What is a multivalued dependency (MVD)? Multivalued dependencies (MVD’s) express a condition among tuples of a relation that exists when the relation is trying to represent more than one many-many relationship. An MVD is a dependency between attributes (for example, A, B, and C) in a relation such that for each value of A there is a set of values for B and a set of values for C, where the sets of values for B and C are independent of each other.

Drinkers(name, addr, phones, beersLiked)
• A drinker’s phones are independent of the beers they like.
• Thus, each of a drinker’s phones appears with each of the beers they like in all combinations.
– If a drinker has 3 phones and likes 10 beers, then the drinker has 30 tuples,
– where each phone is repeated 10 times and each beer 3 times.
• This repetition is unlike redundancy due to FD’s, of which name -> addr is the only one.
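The tuple count on this slide is easy to confirm with a small Python sketch. The drinker name, address, and the phone and beer values below are invented placeholders; only the 3 phones and 10 beers come from the slide.

```python
from collections import Counter
from itertools import product

# Placeholder data: one drinker with 3 phones and 10 liked beers.
phones = [f"phone{i}" for i in range(1, 4)]
beers = [f"beer{i}" for i in range(1, 11)]

# Phones and beers are independent, so every combination must be stored.
drinkers = [("sue", "addr1", phone, beer) for phone, beer in product(phones, beers)]

phone_counts = Counter(row[2] for row in drinkers)
beer_counts = Counter(row[3] for row in drinkers)

print(len(drinkers))                # number of tuples for this one drinker
print(set(phone_counts.values()))   # how often each phone is repeated
print(set(beer_counts.values()))    # how often each beer is repeated
```

Storing the phones and the beers in two separate relations would need 3 + 10 = 13 tuples for the same drinker instead of 30.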

Definition of MVD A multivalued dependency (MVD) X ‐>‐>Y is an assertion that if two tuples of a relation agree on all the attributes of X, then their components in the set of attributes Y may be swapped, and the result will be two tuples that are also in the relation.
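The swap condition in this definition can be checked directly on a relation instance. The following Python sketch is one possible reading of it: the function name `satisfies_mvd` and the sample rows are assumptions for illustration, and the check only says whether one given instance is consistent with the MVD, not whether the MVD holds for the schema in general.

```python
def satisfies_mvd(rows, x, y):
    """Check X ->-> Y on `rows`, a list of dicts that all share the same keys."""
    x, y = set(x), set(y)
    present = {frozenset(row.items()) for row in rows}
    for t in rows:
        for u in rows:
            if all(t[a] == u[a] for a in x):
                # Take t and replace its Y components with those of u.
                swapped = dict(t)
                swapped.update({a: u[a] for a in y})
                if frozenset(swapped.items()) not in present:
                    return False
    return True


# Placeholder instance: one drinker, two phones, two beers, all combinations.
full = [
    {"name": "sue", "phone": p, "beer": b}
    for p in ("p1", "p2")
    for b in ("b1", "b2")
]
print(satisfies_mvd(full, ["name"], ["phone"]))       # all four combinations present
print(satisfies_mvd(full[:-1], ["name"], ["phone"]))  # the (p2, b2) combination is missing
```

Looping over ordered pairs (t, u) covers both swapped tuples from the definition, because the pair (u, t) produces the second one.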

MVD Rules
• Every FD is an MVD.
– If X -> Y, then swapping Y’s between two tuples that agree on X doesn’t change the tuples.
– Therefore, the “new” tuples are surely in the relation, and we know X ->-> Y.
• The definition of keys depends on FD’s, not on MVD’s.

3NF to 4NF The relation must be in 3NF. Identify the multivalued dependencies in the relation. If multiple sets of multivalued dependencies exist, remove them by placing each one in a new relation along with a copy of its determinant.
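As a rough illustration of this step, the Python sketch below splits a relation instance on an MVD X ->-> Y into one relation holding X and Y and another holding X and the remaining attributes, then rejoins them to confirm nothing is lost. The helper names and the simplified Drinkers instance (without addr) are assumptions made for the example.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    attrs = sorted(attrs)
    seen = {tuple((a, row[a]) for a in attrs) for row in rows}
    return [dict(t) for t in sorted(seen)]


def decompose_on_mvd(rows, x, y):
    """Split on X ->-> Y: one relation over X and Y, one over X and the rest."""
    attrs = set(rows[0])
    x, y = set(x), set(y)
    return project(rows, x | y), project(rows, attrs - (y - x))


def natural_join(left, right):
    """Join two lists of dicts on the attributes they share."""
    common = set(left[0]) & set(right[0])
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


def as_set(rows):
    """Order-independent view of a relation instance."""
    return {frozenset(row.items()) for row in rows}


# Placeholder instance: one drinker, two phones, two beers, all combinations.
drinkers = [
    {"name": "sue", "phone": p, "beer": b}
    for p in ("p1", "p2")
    for b in ("b1", "b2")
]

phones_rel, beers_rel = decompose_on_mvd(drinkers, ["name"], ["phone"])
print(phones_rel)  # one row per (name, phone)
print(beers_rel)   # one row per (name, beer)
print(as_set(natural_join(phones_rel, beers_rel)) == as_set(drinkers))
```

The determinant (name) is copied into both new relations, which is what makes the join recover the original tuples when the MVD holds.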