The principal problem that we encounter is redundancy, where a fact is repeated in more than one tuple. Most common cause: attempts to group into one relation.

The principal problem that we encounter is redundancy, where a fact is repeated in more than one tuple. Most common cause: attempts to group into one relation both single valued and multi-valued properties of an object. Now we will tackle the problem of designing good relational schemas. Design of Relational Database Schemas

Redundancy. –Information may be repeated unnecessarily in several tuples. –E.g. length and filmType. Update anomalies. –We may change information in one tuple but leave it unchanged in other tuples. –E.g. we could change the length of Star Wars to 125, in the first tuple, and forget to do the same in the second and third tuple. Deletion anomalies. –If a set of values becomes empty, we may lose other information as a side effect. –E.g. if we delete Emilio Estevez we will lose all the information about Mighty Ducks. Anomalies Mike MeyersParamountcolor951992Wayne’s World Dana CarveyParamountcolor951992Wayne’s World Emilio EstevezDisneycolor1041991Mighty Ducks Harrison FordFoxcolor1241977Star Wars Mark HamillFoxcolor1241977Star Wars Carrie FisherFoxcolor1241977Star Wars starNamestudioNamfilmTyplengthyeartitle

Decomposing Relations - Example Star Wars title Wayne’s World Foxcolor1241977 studioNamfilmTyplengthyear Paramountcolor951992 Disneycolor1041991Mighty Ducks Wayne’s World Mighty Ducks Star Wars title 1992 1991 1977 Mike Meyers Dana Carvey Emilio Estevez Harrison Ford Mark Hamill Carrie Fisher starNameyear No true redundancy! The update anomaly disappeared. If we change the length of a movie, it is done only once. The deletion anomaly disappeared. If we delete all the stars from Movie 2 we still will have the other info for a movie. Movie 1 relation Movie 2 relation

Boyce-Codd Normal Form The goal of decomposition is to replace a relation by several that do not exhibit anomalies. There is a simple condition under which the anomalies can be guaranteed not to exist. This condition is called Boyce-Codd Normal Form, or BCNF. A relation is in BCNF if: –Whenever there is a nontrivial dependency A 1 A 2 …A n  B 1 B 2 …B m for R, it must be the case that {A 1, A 2, …, A n } is a superkey for R.

Boyce-Codd Normal Form - Example Relation Movie in the previous figure is not in BCNF. –Consider the FD: title year  length filmType studioName –Unfortunately, the left side of the above dependency is not a superkey. In particular we know that the title and the year does not functionally determine starName. On the other hand, Movie 1 is in BCNF. –The only key is {title, year} and –title year  length filmType studioName holds in the relation Violating BCNF

Decomposition into BCNF The decomposition strategy is: –Find a non-trivial FD A 1 A 2 …A n  B 1 B 2 …B m that violates BCNF, i.e. A 1 A 2 …A n is not a superkey. –Decompose the relation schema into two overlapping relation schemas: One is all the attributes involved in the violating dependency and the other is the left side and all the other attributes not involved in the dependency. By repeatedly, choosing suitable decompositions, we can break any relation schema into a collection of smaller schemas in BCNF. The data in the original relation is represented faithfully by the data in the relations that are the result of the decomposition. –i.e. we can reconstruct the original relation exactly from the decomposed relations.

Boyce-Codd Normal Form - Example Consider relation schema: Movies(title, year, studioName, president, presAddr) and functional dependencies: title year  studioName studioName  president president  presAddr Last two violate BCNF. Why? Compute {title, year}+, {studioName}+, {president}+ and see if you get all the attributes of the relation. If not, you got a BCNF violation, and need to break relation.

Boyce-Codd Normal Form – Example Let’s decompose starting with: studioName  president Let’s add to the right-hand side any other attributes in the closure of studioName (optional “rule of thumb”). 1.X={studioName} studioName  president 2.X={studioName, president} president  presAddr 3.X={studioName} + ={studioName, president, presAddr}

Boyce-Codd Normal Form – Example From the closure we get: studioName  president presAddr We decompose the relation schema into the following two schemas: Movies1(studioName, president, presAddr) Movies2(title, year, studioName) The second schema is in BCNF. What about the first schema? The following dependency violates BCNF. president  presAddr Why it’s bad to leave Movies1 table as is? If many studios share the same president than we would have redundancy when repeating the presAddr in all those studios.

Boyce-Codd Normal Form – Example We must decompose Movies1, using the FD: president  presAddr The resulting relation schemas, both in BCNF, are: Movies11(title, year, studioName) Movies12(studioName, president) In general, we must keep applying the decomposition rule as many times as needed, until all our relations are in BCNF. So, finally we got Movies11, Movies12, and Movies2.

Finding FDs for the decomposed relations When we decompose a relation, we need to check that the resulting schemas are in BCNF. We can’t tell a relation is in BCNF, unless we can determine the FDs that hold for that relation.

Suppose S is one of the resulting relations in a decomposition of R. For this: Consider each subset X of attributes of S. Compute X + using the FD on R. At the end throw out the attributes of R, which aren’t in S. Then, for each attribute B such that: B is an attribute of S, B is in X + we have that the functional dependency X  B holds in S. Finding FDs for the decomposed relations

Example: Consider R(A, B, C, D, E) decomposed into S(A, B, C) and another relation. Let FDs of R be: A  D, B  E, DE  C First, consider {A} + ={A,D}. Since D is not in the schema of S, we get no dependency here. Similarly, {B} + ={B,E} and {C} + ={C}, yielding no FDs for S. Now consider pairs. {A,B} + ={A, B, C, D, E}. Thus, we deduce AB  C for S. Neither of the other pairs give us any FD for S. Of course the set of all three attributes of S, {A, B, C}, cannot yield any nontrivial dependencies for S. Thus, the only dependency we need assert for S is AB  C.

A Few Tricks Never need to compute the closure of the empty set or of the set of all attributes. If we find X + = all attributes, don’t bother computing the closure of any supersets of X.

Another Example R(A,B,C) with FD’s A  B and B  C. Project onto S(A,C). –A + =ABC ; yields A  B, A  C. We do not need to compute AB + or AC +. –B + =BC ; yields B  C. –C + =C ; yields nothing. –BC + =BC ; yields nothing. Resulting FD’s: A  B, A  C, and B  C. Projection onto AC: A  C. –This is the only FD that involves a subset of {A,C }.

The principal problem that we encounter is redundancy, where a fact is repeated in more than one tuple. Most common cause: attempts to group into one relation.

Similar presentations

Presentation on theme: "The principal problem that we encounter is redundancy, where a fact is repeated in more than one tuple. Most common cause: attempts to group into one relation."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

The principal problem that we encounter is redundancy, where a fact is repeated in more than one tuple. Most common cause: attempts to group into one relation.

Similar presentations

Presentation on theme: "The principal problem that we encounter is redundancy, where a fact is repeated in more than one tuple. Most common cause: attempts to group into one relation."— Presentation transcript:

Similar presentations

About project

Feedback