M.P. Johnson, DBMS, Stern/NYU, Sp2004. C20.0046: Database Management Systems, Lecture #7. Matthew P. Johnson, Stern School of Business, NYU, Spring 2004.


1 C20.0046: Database Management Systems, Lecture #7. Matthew P. Johnson, Stern School of Business, NYU, Spring 2004

2 Agenda
Last time: Normalization
Homework 1 due now
Project part 2 is up, due on the 19th (Thurs.)
This time:
  1. Finish BCNF
  2. 3NF
  3. 4NF
  4. Relational Algebra…

3 BCNF Review
Q: What’s required for BCNF?
Q: What’s the slogan for BCNF?
Q: Who are B & C?
Q: What are the two types of violations?

4 BCNF Review
Q: How do we fix a non-BCNF relation?
Q: If As → Bs violates BCNF, what do we do?
Q: In this case, could the decomposition be lossy?
Q: Under what circumstances could a decomposition be lossy?
Q: How do we combine two relations?

5 Decomposition algorithm example
R(N,O,R,P)
F = {N → O, O → R, R → N}
Key: N,P
Violations of BCNF: N → O, O → R, N → OR
  which kinds of violations are these?
Pick N → OR (on board)
Can we rejoin? (on board)
What happens if we pick N → O instead?
Can we rejoin? (on board)
Name   | Office | Residence | Phone
George | Pres.  | WH        | 202-…
George | Pres.  | WH        | 486-…
Dick   | VP     | NO        | 202-…
Dick   | VP     | NO        | 307-…
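The "which FDs violate BCNF" question on this slide can be checked mechanically with attribute closure. The following is a minimal sketch, not the lecture's own code: the helper names `closure`, `bcnf_violations` and `fd` are made up for illustration, and only the three FDs listed in F are tested (so the derived FD N → OR is not listed separately, while R → N is flagged because R alone is not a superkey).

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds` (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """List the given nontrivial FDs whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(schema)]


def fd(lhs, rhs):
    """Hypothetical shorthand: fd("N", "O") stands for N -> O."""
    return frozenset(lhs), frozenset(rhs)


SCHEMA = set("NORP")
FDS = [fd("N", "O"), fd("O", "R"), fd("R", "N")]

n_closure = closure("N", FDS)      # attributes determined by N alone
np_closure = closure("NP", FDS)    # attributes determined by the key N,P
violations = bcnf_violations(SCHEMA, FDS)

print(sorted(n_closure))
print(sorted(np_closure))
for lhs, rhs in violations:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

Because the closure of N never reaches P, N is not a superkey, which is exactly why N → O is a bad FD here.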

6 Lossless BCNF decomposition
Consider simple relation: R(A,B,C)
Only FD: A → B (assume C ↛ A)
Key: A,C
  Diff vars from text!
  Also goes through if assumption is false
BCNF violation (which kind?): no key on the left
Thus: Decomposition to BCNF: Create R1(A,B) and R2(A,C)
Could this be lossy? We will join R1 and R2 on A to find out
Q: If C → A, then what kind do we have?
Q: Since C ↛ A, what kind of bad FD do we have?

7 Lossless BCNF decomposition
Suppose R contains (b,a,c) and (b’,a,c’)
In projection onto (B,A):
  (b,a,c) → (b,a), (b’,a,c’) → (b’,a)
In projection onto (A,C):
  (b,a,c) → (a,c), (b’,a,c’) → (a,c’)
In joining, (b’,a), (a,c) → (b’,a,c)
Q: Is/must/can this be correct?
A: Yes! A → B, so b = b’
So this was lossless
We assumed C ↛ A, but argument also goes through when C → A
Moral: BCNF decomp alg is always lossless
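The argument above can be replayed on concrete rows. This is a small sketch under assumed sample data (the values a1, b1, c1 and so on are invented): it projects R(A,B,C) onto (A,B) and (A,C), joins on A, and compares with the original. A second relation that breaks A → B shows the spurious tuples the FD rules out.

```python
def project(rows, attrs):
    """Project dict rows onto `attrs`; each result row is a tuple of (attr, value)."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two sets of projected rows on their shared attributes."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


def as_set(rows):
    """Normalize dict rows to the same tuple form natural_join produces."""
    return {tuple(sorted(row.items())) for row in rows}


# A -> B holds: rows that agree on A also agree on B.
good = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b1", "C": "c2"},
    {"A": "a2", "B": "b2", "C": "c1"},
]
# A -> B fails: same A value, two different B values.
bad = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b2", "C": "c2"},
]

good_rejoined = natural_join(project(good, "AB"), project(good, "AC"))
bad_rejoined = natural_join(project(bad, "AB"), project(bad, "AC"))

print(good_rejoined == as_set(good))          # lossless when A -> B holds
print(len(bad_rejoined), len(as_set(bad)))    # extra tuples when it does not
```

The second case is not a BCNF decomposition at all (there is no FD to split on), which is the point: the losslessness guarantee comes from the FD used for the split.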

8 BCNF summary
BCNF decomposition is lossless
  Can reproduce original by joining
Saw last time: Every 2-attribute relation is in BCNF
Final set of decomposed relations might be different depending on
  Order of bad FDs chosen
Saw last time: But all results will be in BCNF

9 A problem with BCNF
Relation: R(Title, Theater, Neighborhood)
FDs:
  Title,N’hood → Theater
    Assume movie can’t play twice in same neighborhood
  Theater → N’hood
Keys:
  {Title, N’hood}
  {Theater, Title}
Title       | Theater  | N’hood
City of God | Angelica | Village
Fog of War  | Angelica | Village

10 A problem with BCNF
BCNF violation: Theater → N’hood
Decompose:
  {Theater, N’hood}
  {Theater, Title}
Resulting relations:
R1:
Theater  | N’hood
Angelica | Village
R2:
Theater  | Title
Angelica | City of God
Angelica | Fog of War

11 Problem - continued
Suppose we add new rows to R1 and R2:
R1:
Theater    | N’hood
Angelica   | Village
Film Forum | Village
R2:
Theater    | Title
Angelica   | City of God
Angelica   | Fog of War
Film Forum | City of God
Their join (R’):
Title       | Theater    | N’hood
City of God | Angelica   | Village
Fog of War  | Angelica   | Village
City of God | Film Forum | Village
R1 and R2 could not enforce the FD Title,N’hood → Theater
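The lost dependency can be seen by joining the two tables and testing the FD on the result. A minimal sketch using the rows from this slide; the function name `violates_fd` and the column spelling `Nhood` are assumptions made for the example.

```python
def violates_fd(rows, lhs, rhs):
    """True if two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this key.
        if seen.setdefault(key, val) != val:
            return True
    return False


r1 = [
    {"Theater": "Angelica", "Nhood": "Village"},
    {"Theater": "Film Forum", "Nhood": "Village"},
]
r2 = [
    {"Theater": "Angelica", "Title": "City of God"},
    {"Theater": "Angelica", "Title": "Fog of War"},
    {"Theater": "Film Forum", "Title": "City of God"},
]

# Natural join on the shared attribute Theater.
joined = [{**a, **b} for a in r1 for b in r2 if a["Theater"] == b["Theater"]]

local_ok = not violates_fd(r1, ["Theater"], ["Nhood"])
lost_fd_broken = violates_fd(joined, ["Title", "Nhood"], ["Theater"])

print(local_ok)        # Theater -> N'hood still holds inside R1
print(lost_fd_broken)  # Title,N'hood -> Theater fails in the join
```

Each table is fine on its own, yet the join shows City of God playing at two theaters in the Village; detecting that required the join.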

12 Third normal form: motivation
There are some situations in which
  BCNF is not dependency-preserving, and
  efficient checking for FD violation on updates is important
In these cases BCNF is too severe a requirement
Solution: define a weaker normal form, called Third Normal Form,
  in which FDs can be checked on individual relations without performing a join (no inter-relational FDs)
  to which relations can be converted, preserving both data and FDs

13 Third Normal Form
BCNF decomposition is not dependency-preserving!
We now define the (weaker) Third Normal Form
  Turns out: this example was already in 3NF
A relation R is in 3rd normal form if, for every nontrivial dependency A1, A2, ..., An → B for R, either {A1, A2, ..., An} is a super-key for R, or B is part of a key, i.e., B is prime
Tradeoff:
  BCNF = no FD anomalies, but may lose some FDs
  3NF = keeps all FDs, but may have some anomalies

14 BCNF: vices and virtues
Be clear on the problem just described v. the arg. that BCNF decomp is lossless
BCNF decomp does not lose data
  Resulting relations can be rejoined to obtain the original
But: it can lose dependencies
  After decomp, possible to add rows whose corresponding rows would be illegal in (rejoined) original

15 Recap: goals of normalization
When we decompose a relation R with FDs F into R1..Rn we want:
1. Lossless-join decomposition: no data lost
2. No/little redundancy: the relations Ri should be in either BCNF or at least 3NF
3. Dependency preservation: let Fi be the set of dependencies in F+ that include only attributes in Ri:
  F is the “sum” of the FDs of the new relations
  (F1 ∪ F2 ∪ F3 ∪ … ∪ Fn)+ = F+
  Otherwise checking updates for violation of FDs may require computing joins, which is expensive
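Goal 3 can be tested without enumerating F+. The sketch below uses the standard textbook procedure rather than anything shown in the lecture: for each FD X → Y in F, grow X using only what each Ri can "see" (closure of the current set restricted to Ri), and check that Y is reached. The function names and the small A,B,C schema are illustrative assumptions; the movie schema is the one from the earlier slides.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(fds, parts):
    """True if every FD in `fds` follows from the FDs projected onto `parts`."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # Attributes of `part` implied by what we already have in `part`.
                gained = closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True


movie_fds = [({"Title", "Nhood"}, {"Theater"}), ({"Theater"}, {"Nhood"})]
movie_parts = [{"Theater", "Nhood"}, {"Theater", "Title"}]

chain_fds = [({"A"}, {"B"}), ({"B"}, {"C"})]   # hypothetical schema R(A,B,C)
chain_parts = [{"A", "B"}, {"B", "C"}]

movie_preserved = preserves_dependencies(movie_fds, movie_parts)
chain_preserved = preserves_dependencies(chain_fds, chain_parts)
print(movie_preserved, chain_preserved)
```

The check runs in polynomial time because it never materializes the projected FD sets Fi.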

16 Dependency preservation
Saw that last req. didn’t hold in movie-theater example
Did it hold in R(N,O,R,P) example? (on board)

17 Testing for 3NF
For each dependency X → Y, use attribute closure to check if X is a superkey
If X is not a superkey, verify that each attribute in Y is prime
  This test is rather more expensive, since it involves finding candidate keys
  Testing for 3NF is NP-complete (in what?)
  Interestingly, decomposition into 3NF can be done in polynomial time
  ⇒ Testing for 3NF is harder than decomposing into 3NF!
Optimization: need to check only FDs in F, need not check all FDs in F+ (why?)

18 3NF Example
R = (J, K, L)
F = (JK → L, L → K)
Two candidate keys: JK and JL
R is in 3NF
  JK → L: JK is a superkey
  L → K: K is prime
BCNF decomposition yields R1 = (L,K), R2 = (L,J)
  testing for JK → L requires a join
There is some redundancy in R
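The 3NF test (superkey on the left, or prime attributes on the right) can be run on this example. This is a brute-force sketch, assuming a schema small enough to enumerate every attribute subset when searching for candidate keys; that enumeration is exponential, in line with the test being expensive. Function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so not minimal
            if closure(attrs, fds) == set(schema):
                keys.append(attrs)
    return keys


def is_bcnf(schema, fds):
    """Every nontrivial FD in `fds` has a superkey on the left."""
    return all(rhs <= lhs or closure(lhs, fds) == set(schema)
               for lhs, rhs in fds)


def is_3nf(schema, fds):
    """Like is_bcnf, but also accepts FDs whose new right-side attributes are prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return all(rhs <= lhs
               or closure(lhs, fds) == set(schema)
               or (rhs - lhs) <= prime
               for lhs, rhs in fds)


SCHEMA = {"J", "K", "L"}
FDS = [({"J", "K"}, {"L"}), ({"L"}, {"K"})]

keys = sorted("".join(sorted(key)) for key in candidate_keys(SCHEMA, FDS))
in_3nf = is_3nf(SCHEMA, FDS)
in_bcnf = is_bcnf(SCHEMA, FDS)
print(keys, in_3nf, in_bcnf)
```

Only the FDs in F are checked, following the optimization noted on the testing slide.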

19 BCNF and 3NF Comparison
Example of problems due to redundancy in 3NF
  R = (J, K, L)
  F = (JK → L, L → K)
A schema that is in 3NF but not BCNF has the problems of:
  redundancy (e.g., the relationship between l1 and k1)
  need to use null values (if allowed!), e.g. to represent the relationship between l2 and k2 when there is no corresponding value for attribute J
J    | K  | L
j1   | k1 | l1
j2   | k1 | l1
j3   | k1 | l1
NULL | k2 | l2

20 Comparison of BCNF and 3NF
It is always possible to decompose a relation into relations in 3NF such that:
  the decomposition is lossless
  the dependencies are preserved
It is always possible to decompose a relation into relations in BCNF such that:
  the decomposition is lossless
  but it may not be possible to preserve dependencies
  But may eliminate more redundancy

21 The Normal Forms (so far)
1NF: every attribute has an atomic value
2NF: 1NF and no partial dependencies
3NF: for each FD X → Y either it is trivial, or X is a superkey, or Y is a part of some key
  I.e., 2NF and no transitive dependencies
BCNF: 3NF with the third 3NF option (Y is part of some key) disallowed

22 Distinguishing examples
1NF but not 2NF: R(Name, SSN, Mailing-address, Phone)
  Key: SSN,Phone
  Partial: ssn → name, address
2NF but not 3NF: R(Title, Year, Studio, Pres, Pres-Addr)
  Key: Title,Year
  Transitive: studio → president
3NF but not BCNF: R(Title, Theater, N’hood)
  Title,N’hood → Theater
  Prime-on-right: Theater → N’hood

23 Design Goals
Goal for a relational database design is:
  No redundancy
  Lossless Join
  Dependency Preservation
If we cannot achieve this, we accept one of
  dependency loss
  use of more expensive inter-relational methods to preserve dependencies
  data redundancy due to use of 3NF
Interesting: SQL does not provide a direct way of specifying FDs other than superkeys
  can specify FDs using assertions, but they are expensive to test

24 3NF
3NF means we may have anomalies
Example: TEACH(student, teacher, subject)
  student, subject → teacher (students not allowed in the same subject with two teachers)
  teacher → subject (each teacher teaches one subject)
  Subject is prime, so this is 3NF
But we have anomalies:
  Insertion: cannot insert a teacher until we have a student taking his subject
If we convert to BCNF, we lose student, subject → teacher

25 BCNF and over-normalization
What is the problem? Schema overload: trying to capture two meanings:
  1) subject X can be taught by teacher Y
  2) student Z takes subject W from teacher V
What to do? 3NF has anomalies, normalizing to BCNF loses FDs
One soln: keep the 3NF TEACH and another (BCNF) relation SUBJECT-TAUGHT(teacher, subject)
Still (more!) redundancy, but no more insert and delete anomalies

26 New topic: MVDs (3.7)
Consider this relation
  People ~ their jobs ~ their residences
  Person-address/city: many-many
  Person-job: many-many
  Address/city-job: independent
[Table with columns Name, SSN, Jobs, Streets, Citys, listing jobs and addresses for Michael and Hilary; the rows are scrambled in this transcript.]

27 Redundancy in BCNF
Lots of redundancy!
Key? All fields
  None determined by others!
Non-trivial FDs? None!
  In BCNF? Yes!
Name    | Streets              | Citys      | Jobs
Michael | 111 East 60th Street | New York   | Mayor
Michael | 222 Brompton Road    | London     | Mayor
Michael | 111 East 60th Street | New York   | CEO
Michael | 222 Brompton Road    | London     | CEO
Hilary  | 333 Some Street      | Chappaqua  | Senator
Hilary  | 444 Embassy Row      | Washington | Senator
Hilary  | 333 Some Street      | Chappaqua  | First Lady
Hilary  | 444 Embassy Row      | Washington | First Lady
Hilary  | 333 Some Street      | Chappaqua  | Lawyer
Hilary  | 444 Embassy Row      | Washington | Lawyer
Now what? New concept, leading to another normal form: Multivalued dependencies
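The redundancy comes from pairing every address of a person with every job of that person. One way to see this, sketched below with the rows from this slide: split the table into (Name, Street, City) and (Name, Job), rejoin on Name, and compare. Getting exactly the original rows back is what the multivalued dependency Name ↠ Jobs amounts to. The split itself is an illustration chosen for this example, not a step shown on the slide.

```python
ROWS = [
    ("Michael", "111 East 60th Street", "New York", "Mayor"),
    ("Michael", "222 Brompton Road", "London", "Mayor"),
    ("Michael", "111 East 60th Street", "New York", "CEO"),
    ("Michael", "222 Brompton Road", "London", "CEO"),
    ("Hilary", "333 Some Street", "Chappaqua", "Senator"),
    ("Hilary", "444 Embassy Row", "Washington", "Senator"),
    ("Hilary", "333 Some Street", "Chappaqua", "First Lady"),
    ("Hilary", "444 Embassy Row", "Washington", "First Lady"),
    ("Hilary", "333 Some Street", "Chappaqua", "Lawyer"),
    ("Hilary", "444 Embassy Row", "Washington", "Lawyer"),
]

# Project onto (Name, Street, City) and (Name, Job); sets drop duplicates.
addresses = {(name, street, city) for name, street, city, _ in ROWS}
jobs = {(name, job) for name, _, _, job in ROWS}

# Rejoin on Name: every address of a person paired with every job of that person.
rejoined = {
    (name, street, city, job)
    for name, street, city in addresses
    for other_name, job in jobs
    if name == other_name
}

mvd_holds = rejoined == set(ROWS)
cells_before = len(ROWS) * 4
cells_after = len(addresses) * 3 + len(jobs) * 2

print(mvd_holds)
print(len(addresses), len(jobs))
print(cells_before, cells_after)
```

The two smaller tables store each address and each job once per person, instead of once per address-job combination.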

