4NF and 5NF Prof. Sin-Min Lee Department of Computer Science.

Slides:



Advertisements
Similar presentations
1 Lecture 7 Design Theory for Relational Databases (part 1) Slides based on
Advertisements

Functional Dependencies (FDs)
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 16 Relational Database Design Algorithms and Further Dependencies.
Lecture 21 CS 157 B Revision of Midterm3 Prof. Sin-Min Lee.
1 Loss-Less Joins. 2 Decompositions uDependency-preservation property: enforce constraints on original relation by enforcing some constraints on resulting.
Chapter 3 Notes. 3.1 Functional Dependencies A functional dependency is a statement that – two tuples of a relation that agree on some particular set.
Lossless Decomposition (2) Prof. Sin-Min Lee Department of Computer Science San Jose State University.
Functional Dependencies, Normalization Rose-Hulman Institute of Technology Curt Clifton.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 227 Database Systems I Design Theory for Relational Databases.
Functional Dependencies. Babies At a birth, there is one baby (twins would be represented by two births), one mother, any number of nurses, and a doctor.
603 Database Systems Senior Lecturer: Laurie Webster II, M.S.S.E.,M.S.E.E., M.S.BME, Ph.D., P.E. Lecture 8 A First Course in Database Systems.
Multivalued Dependency Prof. Sin-Min Lee Department of Computer Science.
1 Multi-valued Dependencies Salman Azhar Multi-valued Dependencies Fourth Normal Form These slides use some figures, definitions, and explanations from.
1 Multivalued Dependencies Fourth Normal Form Source: Slides by Jeffrey Ullman.
Relational Design. DatabaseDesign Process Conceptual Modeling -- ER diagrams ER schema transformed to relational schema Designer may add additional integrity.
1 Multivalued Dependencies Fourth Normal Form. 2 Definition of MVD uA multivalued dependency (MVD) on R, X ->->Y, says that if two tuples of R agree on.
1 Multivalued Dependencies Fourth Normal Form Sources: Slides by Jeffrey Ullman book by Ramakrishnan & Gehrke.
Multivalued Dependency Prof. Sin-Min Lee Department of Computer Science.
The principal problem that we encounter is redundancy, where a fact is repeated in more than one tuple. Most common cause: attempts to group into one relation.
Winter 2002Arthur Keller – CS 1804–1 Schedule Today: Jan. 15 (T) u Normal Forms, Multivalued Dependencies. u Read Sections Assignment 1 due. Jan.
1 Multivalued Dependencies Fourth Normal Form. 2 A New Form of Redundancy uMultivalued dependencies (MVD’s) express a condition among tuples of a relation.
1 Normalization Anomalies Boyce-Codd Normal Form 3 rd Normal Form.
Fall 2001Arthur Keller – CS 1805–1 Schedule Today Oct. 9 (T) Multivalued Dependencies, Relational Algebra u Read Sections 3.7, Assignment 2 due.
1 Normalization Anomalies Boyce-Codd Normal Form 3 rd Normal Form Source: Slides by Jeffrey Ullman.
Fall 2001Arthur Keller – CS 1804–1 Schedule Today Oct. 4 (TH) Functional Dependencies and Normalization. u Read Sections Project Part 1 due. Oct.
Ch 7: Normalization-Part 2 Much of the material presented in these slides was developed by Dr. Ramon Lawrence at the University of Iowa.
Decompositions uDo we need to decompose a relation? wSeveral normal forms for relations. If schema in these normal forms certain problems don’t.
Database Management Systems Chapter 3 The Relational Data Model (III) Instructor: Li Ma Department of Computer Science Texas Southern University, Houston.
Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2.
1 Normalization Anomalies Boyce-Codd Normal Form 3 rd Normal Form.
Databases 1 8th lecture. Topics of the lecture Multivalued Dependencies Fourth Normal Form Datalog 2.
Normalization Goal = BCNF = Boyce-Codd Normal Form = all FD’s follow from the fact “key  everything.” Formally, R is in BCNF if for every nontrivial FD.
Copyright © Curt Hill Schema Refinement III 4 th NF and 5 th NF.
Databases 2 Storage optimization: functional dependencies, normal forms, data ware houses.
Database Normalization.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
SCUJ. Holliday - coen 1784–1 Schedule Today: u Normal Forms. u Section 3.6. Next u Relational Algebra. Read chapter 5 to page 199 After that u SQL Queries.
BCNF & Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Functional Dependencies. FarkasCSCE 5202 Reading and Exercises Database Systems- The Complete Book: Chapter 3.1, 3.2, 3.3., 3.4 Following lecture slides.
4NF (Multivalued Dependency), and 5NF (Join Dependency)
IST 210 Normalization 2 Todd Bacastow IST 210. Normalization Methods Inspection Closure Functional dependencies are key.
© D. Wong Ch. 3 (continued)  Database design problems  Functional Dependency  Keys of relations  Decompositions based on Functional Dependency.
Third Normal Form (3NF) Zaki Malik October 23, 2008.
1 CSE 480: Database Systems Lecture 18: Normal Forms and Normalization.
Normalization.
1 Multivalued Dependencies Fourth Normal Form Reasoning About FD’s + MVD’s.
1 Multivalued Dependencies Fourth Normal Form Reasoning About FD’s + MVD’s.
Design Theory for RDB Normal Forms. Lu Chaojun, SJTU 2 Redundant because these info may be figured out by using FD s1  … What’s Bad Design? Redundancy.
3 Spring Chapter Normalization of Database Tables.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Final Review Zaki Malik November 20, Basic Operators Covered.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
Design Theory for Relational Databases Functional Dependencies Decompositions Normal Forms: BCNF, Third Normal Form Introduction to Multivalued Dependencies.
1 Lecture 8 Design Theory for Relational Databases (part 2) Slides from
4NF & MULTIVALUED DEPENDENCY By Kristina Miguel. Review  Superkey – a set of attributes which will uniquely identify each tuple in a relation  Candidate.
1 Database Design: DBS CB, 2 nd Edition Physical RDBMS Model: Schema Design and Normalization Ch. 3.
Design Theory for Relational Databases
Schedule Today: Next After that Normal Forms. Section 3.6.
CPSC-310 Database Systems
Schedule Today: Jan. 23 (wed) Week of Jan 28
3.1 Functional Dependencies
BCNF and Normalization
CPSC-310 Database Systems
Multivalued Dependencies & Fourth Normal Form (4NF)
Multivalued Dependencies & Fourth Normal Form
Multivalued Dependencies & Fourth Normal Form
Multivalued Dependencies
Anomalies Boyce-Codd Normal Form 3rd Normal Form
CS4222 Principles of Database System
Presentation transcript:

4NF and 5NF Prof. Sin-Min Lee Department of Computer Science

Schedule Today: Jan. 15 (T) –Normal Forms, Multivalued Dependencies. –Read Sections Assignment 1 due. Jan. 17 (TH) –Relational Algebra. –Read Chapter 5. Project Part 1 due. Jan. 22 (T) –SQL Queries. –Read Sections Assignment 2 due. Jan. 24 (TH) –Subqueries, Grouping and Aggregation. –Read Sections Project Part 2 due.

Normalization Goal = BCNF = Boyce-Codd Normal Form = all FD’s follow from the fact “key  everything.” Formally, R is in BCNF if for every nontrivial FD for R, say X  A, then X is a superkey. –“Nontrivial” = right-side attribute not in left side. Why? 1. Guarantees no redundancy due to FD’s. 2. Guarantees no update anomalies = one occurrence of a fact is updated, not all. 3. Guarantees no deletion anomalies = valid fact is lost when tuple is deleted.

Example of Problems Drinkers(name, addr, beersLiked, manf, favoriteBeer) FD’s: 1. name  addr 2. name  favoriteBeer 3. beersLiked  manf ???’s are redundant, since we can figure them out from the FD’s. Update anomalies: If Janeway gets transferred to the Intrepid, will we change addr in each of her tuples? Deletion anomalies: If nobody likes Bud, we lose track of Bud’s manufacturer.

Each of the given FD’s is a BCNF violation: Key = { name, beersLiked } –Each of the given FD’s has a left side that is a proper subset of the key. Another Example Beers(name, manf, manfAddr). FD’s = name  manf, manf  manfAddr. Only key is name. –Manf  manfAddr violates BCNF with a left side unrelated to any key.

Decomposition to Reach BCNF Setting: relation R, given FD’s F. Suppose relation R has BCNF violation X  B. We need only look among FD’s of F for a BCNF violation, not those that follow from F. Proof: If Y  A is a BCNF violation and follows from F, then the computation of Y + used at least one FD X  B from F. –X must be a subset of Y. –Thus, if Y is not a superkey, X cannot be a superkey either, and X  B is also a BCNF violation.

1. Compute X +. –Cannot be all attributes – why? 2. Decompose R into X + and (R–X + )  X. 3. Find the FD’s for the decomposed relations. –Project the FD’s from F = calculate all consequents of F that involve only attributes from X + or only from (R  X + )  X. R X + X

Example R = Drinkers(name, addr, beersLiked, manf, favoriteBeer) F = 1. name  addr 2. name  favoriteBeer 3. beersLiked  manf Pick BCNF violation name  addr. Close the left side: name + = name addr favoriteBeer. Decomposed relations: Drinkers1(name, addr, favoriteBeer) Drinkers2(name, beersLiked, manf) Projected FD’s (skipping a lot of work that leads nowhere interesting): –For Drinkers1 : name  addr and name  favoriteBeer. –For Drinkers2 : beersLiked  manf.

(Repeating) Decomposed relations: Drinkers1(name, addr, favoriteBeer) Drinkers2(name, beersLiked, manf) Projected FD’s: –For Drinkers1 : name  addr and name  favoriteBeer. –For Drinkers2 : beersLiked  manf. BCNF violations? –For Drinkers1, name is key and all left sides of FD’s are superkeys. –For Drinkers2, { name, beersLiked } is the key, and beersLiked  manf violates BCNF.

Decompose Drinkers2 First set of decomposed relations: Drinkers1(name, addr, favoriteBeer) Drinkers2(name, beersLiked, manf) Close beersLiked + = beersLiked, manf. Decompose Drinkers2 into: Drinkers3(beersLiked, manf) Drinkers4(name, beersLiked) Resulting relations are all in BCNF: Drinkers1(name, addr, favoriteBeer) Drinkers3(beersLiked, manf) Drinkers4(name, beersLiked)

3NF One FD structure causes problems: If you decompose, you can’t check all the FD’s only in the decomposed relations. If you don’t decompose, you violate BCNF. Abstractly: AB  C and C  B. Example 1: title city  theatre and theatre  city. Example 2: street city  zip, zip  city. Keys: {A, B} and {A, C}, but C  B has a left side that is not a superkey. Suggests decomposition into BC and AC. –But you can’t check the FD AB  C in only these relations.

Example A = street, B = city, C = zip. Join: zip  city street city  zip

“Elegant” Workaround Define the problem away. A relation R is in 3NF iff (if and only if) for every nontrivial FD X  A, either: 1. X is a superkey, or 2. A is prime = member of at least one key. Thus, the canonical problem goes away: you don’t have to decompose because all attributes are prime.

What 3NF Gives You There are two important properties of a decomposition: 1.We should be able to recover from the decomposed relations the data of the original. –Recovery involves projection and join, which we shall defer until we’ve discussed relational algebra. 2.We should be able to check that the FD’s for the original relation are satisfied by checking the projections of those FD’s in the decomposed relations. Without proof, we assert that it is always possible to decompose into BCNF and satisfy (1). Also without proof, we can decompose into 3NF and satisfy both (1) and (2). But it is not possible to decompose into BNCF and get both (1) and (2). –Street-city-zip is an example of this point.

Multivalued Dependencies The multivalued dependency X  Y holds in a relation R if whenever we have two tuples of R that agree in all the attributes of X, then we can swap their Y components and get two new tuples that are also in R. X Y others

Example Drinkers(name, addr, phones, beersLiked) with MVD Name  phones. If Drinkers has the two tuples: nameaddrphonesbeersLiked sueap1b1 sueap2b2 it must also have the same tuples with phones components swapped: nameaddrphonesbeersLiked sueap2b1 sueap1b2 Note: we must check this condition for all pairs of tuples that agree on name, not just one pair.

MVD Rules 1.Every FD is an MVD. –Because if X  Y, then swapping Y’s between tuples that agree on X doesn’t create new tuples. –Example, in Drinkers : name  addr. 2.Complementation: if X  Y, then X  Z, where Z is all attributes not in X or Y. –Example: since name  phones holds in Drinkers, so does name  addr beersLiked.

Splitting Doesn’t Hold Sometimes you need to have several attributes on the right of an MVD. For example: Drinkers(name, areaCode, phones, beersLiked, beerManf) nameareaCodephonesbeersLikedbeerManf Sue BudA.B. Sue Wicked AlePete’s Sue BudA.B. Sue Wicked AlePete’s name  areaCode phones holds, but neither name  areaCode nor name  phones do.

4NF Eliminate redundancy due to multiplicative effect of MVD’s. Roughly: treat MVD’s as FD's for decomposition, but not for finding keys. Formally: R is in Fourth Normal Form if whenever MVD X  Y is nontrivial (Y is not a subset of X, and X  Y is not all attributes), then X is a superkey. –Remember, X  Y implies X  Y, so 4NF is more stringent than BCNF. Decompose R, using 4NF violation X  Y, into XY and X  (R—Y). R Y X

Example Drinkers(name, addr, phones, beersLiked) FD: name  addr Nontrivial MVD’s: name  phones and name  beersLiked. Only key: { name, phones, beersLiked } All three dependencies above violate 4NF. Successive decomposition yields 4NF relations: D1(name, addr) D2(name, phones) D3(name, beersLiked)

1NF 2NF 3NF BCNF 4NF 5NF functional dependencies multivalued dependencies join dependencies HIGHER NORMAL FORMS

ASSUMPTIONS 24-HOUR FLIGHT-TIMETABLE, ALL FLIGHTS EVERY DAY ALL PLANES TAKE-OFF AND LAND (BUT DO NOT CRASH) NO AIRPORT IS LANDING-ONLY & NO AIRPORT IS TAKE-OFF-ONLY   TTAB (ORG) =  TTAB (DST) THERE IS AT LEAST ONE TIME DELAY ENTRY FOR EVERY FLIGHT SIMILARLY IN WEATHER REPORT HISTORY IF NO TWO FLIGHTS CAN TAKE OFF AT THE SAME TIME IN THE SAME AIRPORT WES CAN BE POSTED TO FLIGHT AND ELIMINATED

UNIVERSITYDISCIPLINEDEGREE Old TownComputingBSc Old TownMathematicsPhD New CityComputingPhD Old TownComputingPhD AWARD teaches (UNIVERSITY, DISCIPLINE) is_read_for (DISCIPLINE, DEGREE) awards (UNIVERSITY, DEGREE) teaches (NewCity, Computing) = true awards (NewCity, PhD) = true is_read_for (Computing, BSc) = true FROM (NewCity teaches Computing) and (Computing is_read_for BSc) IT DOES NOT FOLLOW NewCity awards BSc for_reading Computing

Join Dependency JD * (R 1, R 2, R 3,..., R m ) holds in R iff R = join (R 1, R 2, R 3,..., R m ), R i - a projection of R

R JD* (, ) then R is in 5NF if JD* (, ) A relation R is in 5NF iff for all JD * (R 1, R 2, R 3,..., R m ) in R, every R i is a superkey for R. preventing illogical conjunction of facts Fifth Normal Form JD* (, ) holds for R then R is not in 5NF if does not contain key

UNIVERSITYDISCIPLINE Old TownComputing Old TownMathematics New CityComputing DISCIPLINEDEGREE ComputingBSc MathematicsPhD ComputingPhD UNIVERSITYDEGREE Old TownBSc Old TownPhD New CityPhD AWARD

join dependencies JD1 * ((NAME, CODE, TEACHING), (NAME, RESEARCH)) JD2 * ((NAME, CODE, RESEARCH), (NAME, TEACHING)) JD3 * ((NAME, CODE, TEACHING), (CODE, RESEARCH)) JD4 * ((NAME, CODE, RESEARCH), (CODE, TEACHING)) JD5 * ((NAME, CODE), (NAME, TEACHING), (CODE,RESEARCH)) candidate keys - NAME or CODE all projections in JD1 - to JD5 are superkeys for RANKING  5NF