CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 227 Database Systems I Design Theory for Relational Databases.


1 Database Systems I: Design Theory for Relational Databases

2 Introduction
Typically, the first database design uses a high-level database model such as the ER model. This model is then translated into a relational schema.
Sometimes a relational database schema is developed directly, without going through the high-level design.
Either way, the initial relational schema usually has room for improvement, in particular by eliminating redundancy. Redundancy leads to undesirable update and deletion anomalies.
Relational database design theory introduces normal forms that rule out particular types of redundancy, together with algorithms that convert a relational schema into these normal forms.

3 Functional Dependency
Normal forms are based on the concept of functional dependencies between sets of attributes.
A functional dependency (FD) X → Y is an assertion about a relation R: whenever two tuples of R agree on all the attributes of set X, they must also agree on all the attributes of set Y. We say "X → Y holds in R."
Convention: …, X, Y, Z represent sets of attributes of relation R; A, B, C, … represent single attributes of R.
Convention: no set braces are used to denote sets of attributes; we write ABC rather than {A, B, C}.
An FD X → Y is called trivial if Y ⊆ X.
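The definition above can be checked mechanically against one relation instance. The following is a minimal sketch, assuming tuples are stored as Python dictionaries; the helper name fd_holds and the three sample rows are illustrative and not part of the slides. Note that a single instance can only refute an FD, never prove that it holds for all instances.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}  # maps the lhs values of a row to the rhs values first seen with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val on first sight and returns the stored value otherwise
        if seen.setdefault(key, val) != val:
            return False
    return True


# Illustrative sample rows (an assumption for this sketch)
movies = [
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Mark Hamill"},
    {"title": "Wayne's World", "year": 1992, "length": 95, "starName": "Dana Carvey"},
]

length_ok = fd_holds(movies, ["title", "year"], ["length"])   # not refuted by this instance
star_ok = fd_holds(movies, ["title", "year"], ["starName"])   # refuted: two stars for Star Wars
```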

4 Splitting / Combining Rule
X → A1 A2 … An holds for R if and only if each of X → A1, X → A2, …, X → An holds for R.
Example: the FD A → BC is equivalent to the two FDs A → B and A → C.
This rule can be used to split an FD into several FDs with singleton right sides, or to combine several FDs with the same left side and singleton right sides into one FD.
There is no splitting / combining rule for left sides.
We will generally express FDs with singleton right sides.
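As a small illustration of the rule, here is a sketch that splits and recombines FDs. The representation of an FD as a pair (left side, right side) of frozensets and the names split_fds and combine_fds are assumptions made for this example.

```python
from collections import defaultdict


def split_fds(fds):
    """Splitting rule: X -> A1...An becomes X -> A1, ..., X -> An."""
    return [(lhs, frozenset([a])) for lhs, rhs in fds for a in sorted(rhs)]


def combine_fds(fds):
    """Combining rule: merge all FDs that share the same left side."""
    merged = defaultdict(set)
    for lhs, rhs in fds:
        merged[frozenset(lhs)] |= set(rhs)
    return [(lhs, frozenset(rhs)) for lhs, rhs in merged.items()]


fds = [(frozenset("A"), frozenset("BC"))]   # A -> BC
split = split_fds(fds)                      # A -> B and A -> C
combined = combine_fds(split)               # back to A -> BC
```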

5 Functional Dependency
Consider the relation Movies1 (title, year, length, genre, studioName, starName).
title year → length genre studioName holds, assuming that no two movies have the same title in the same year.
title year → starName does not hold, since a movie can have more than one star.
An FD makes an assertion about all possible instances of a relation, not only about one instance (such as the current one).

6 Keys
Given a relation R with attributes X = {A1, ..., An}.
K ⊆ X is a superkey for relation R if K functionally determines X, i.e. K → X.
K is a key for R if K is a superkey, but no proper subset of K is a superkey.
Keys are a special case of FDs.
Keys can be deduced systematically if all FDs for relation R are given.

7 Keys
{title, year, starName} is a superkey of Movies1, since
title → title, year → year, title year → length, title year → genre, title year → studioName, starName → starName.
Remember that title year → starName does not hold.
{title, year}, {year, starName} and {title, starName} are not superkeys.
Thus, {title, year, starName} is a key of Movies1.
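The superkey and key tests can be sketched in a few lines, using the attribute closure that the following slides introduce. The function names and the encoding of FDs as pairs of Python sets are assumptions for this illustration; the only FD listed is the one the slides state for Movies1.

```python
def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(k, all_attrs, fds):
    return closure(k, fds) >= set(all_attrs)


def is_key(k, all_attrs, fds):
    if not is_superkey(k, all_attrs, fds):
        return False
    # Removing one attribute at a time is enough: if any proper subset were a
    # superkey, so would be every set of size |k| - 1 that contains it.
    return not any(is_superkey(set(k) - {a}, all_attrs, fds) for a in k)


MOVIES1 = {"title", "year", "length", "genre", "studioName", "starName"}
FDS = [({"title", "year"}, {"length", "genre", "studioName"})]

key_result = is_key({"title", "year", "starName"}, MOVIES1, FDS)
title_year_superkey = is_superkey({"title", "year"}, MOVIES1, FDS)
```

Under these assumptions, key_result is True and title_year_superkey is False, matching the argument on the slide.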

8 Closure of Attributes
Given a set of attributes {A1, ..., An} and a set S of FDs.
The closure of {A1, ..., An} under S is the largest set of attributes X such that every relation that satisfies all the FDs in S also satisfies {A1, ..., An} → X, i.e. {A1, ..., An} → X follows from the FDs in S.
The closure of a set Y is denoted by Y+.
Example:
  attribute set {A, B, C}
  FDs {AB → D, D → E, BC → F, G → H}
  {A, B, C}+ = {A, B, C, D, E, F}

9 Computing the Closure of Attributes
Given a set of attributes {A1, ..., An} and a set S of FDs.
If necessary, apply the splitting rule to the FDs in S.
Initialize X to {A1, ..., An}.
Repeat
  search for some FD B1 ... Bm → C in S such that Bi ∈ X for all 1 ≤ i ≤ m and C ∉ X, and add C to the set X
until no more attributes can be added.
Now X = {A1, ..., An}+; return X.
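The algorithm translates almost line by line into code. This is a minimal sketch, assuming single-letter attribute names and FDs written as pairs of strings such as ("AB", "D"); neither convention comes from the slides. The first call uses the example from the closure definition slide.

```python
def closure(attrs, fds):
    """Compute the closure of `attrs` under `fds` (pairs of attribute strings)."""
    # Apply the splitting rule so that every FD has a singleton right side.
    singleton_fds = [(frozenset(lhs), c) for lhs, rhs in fds for c in rhs]
    x = set(attrs)                      # initialize X to the given attributes
    added = True
    while added:                        # repeat until nothing can be added
        added = False
        for lhs, c in singleton_fds:
            if lhs <= x and c not in x:
                x.add(c)
                added = True
    return x


fds = [("AB", "D"), ("D", "E"), ("BC", "F"), ("G", "H")]
result = closure("ABC", fds)            # G -> H never fires, so H is not included
```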

10 Computing the Closure of Attributes
Given the set of attributes {A, B, C, D, E, F} and the FDs {AB → C, BC → AD, D → E, CF → B}. What is {A, B}+?
Apply the splitting rule: split BC → AD into BC → A and BC → D.
Initialize X = {A, B}.
Iterations:
  apply AB → C, X = {A, B, C}
  apply BC → D, X = {A, B, C, D}
  apply D → E, X = {A, B, C, D, E}
Return {A, B}+ = {A, B, C, D, E}.

11 Relational Schema Design
The goal of relational schema design is to avoid anomalies and redundancy. Redundancy leads to certain forms of anomalies.
Update anomaly: one occurrence of a fact is changed, but not all occurrences.
Deletion anomaly: a valid fact is lost when a tuple is deleted.
In the following example, consider the relation Movies1 (title, year, length, genre, studioName, starName).

12 Relational Schema Design
title               year  length  genre   studioName  starName
Star Wars           1977  124     SciFi   Fox         Carrie Fisher
Star Wars           1977  124     SciFi   Fox         Mark Hamill
Star Wars           1977  124     SciFi   Fox         Harrison Ford
Gone with the Wind  1939  231     drama   MGM         Vivien Leigh
Wayne’s World       1992  95      comedy  Paramount   Dana Carvey
Wayne’s World       1992  95      comedy  Paramount   Mike Meyers
Update anomaly: update the length to 125 for the first Star Wars tuple only; the other two Star Wars tuples still say 124.
Deletion anomaly: delete the Vivien Leigh tuple, and all information about Gone with the Wind is lost with it.

13 Decomposing Relations
How can these anomalies be eliminated?
Decompose the relation into two or more relations that together have the same attributes.
Given a relation R {A1, ..., An}. A decomposition of R consists of two relations S {B1, ..., Bm} and T {C1, ..., Ck} such that
  {A1, ..., An} = {B1, ..., Bm} ∪ {C1, ..., Ck},
  S = π_{B1, ..., Bm}(R),
  T = π_{C1, ..., Ck}(R).

14 Decomposing Relations
Decompose Movies1 (title, year, length, genre, studioName, starName) into Movies2 (title, year, length, genre, studioName) and Movies3 (title, year, starName).
Movies2:
title               year  length  genre   studioName
Star Wars           1977  124     SciFi   Fox
Gone with the Wind  1939  231     drama   MGM
Wayne’s World       1992  95      comedy  Paramount
Movies3:
title               year  starName
Star Wars           1977  Carrie Fisher
Star Wars           1977  Mark Hamill
Star Wars           1977  Harrison Ford
Gone with the Wind  1939  Vivien Leigh
Wayne’s World       1992  Dana Carvey
Wayne’s World       1992  Mike Meyers
→ The update and deletion anomalies are gone!
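Decomposing a relation amounts to projecting it onto each attribute subset and eliminating duplicates. The sketch below reproduces the Movies2 and Movies3 tables from the Movies1 data; the tuple-based representation and the helper name project are assumptions made for the illustration.

```python
COLUMNS = ["title", "year", "length", "genre", "studioName", "starName"]

MOVIES1 = [
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Carrie Fisher"),
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Mark Hamill"),
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Harrison Ford"),
    ("Gone with the Wind", 1939, 231, "drama", "MGM", "Vivien Leigh"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", "Dana Carvey"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", "Mike Meyers"),
]


def project(rows, columns, attrs):
    """Relational projection: keep only `attrs` and drop duplicate tuples."""
    idx = [columns.index(a) for a in attrs]
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in rows))


movies2 = project(MOVIES1, COLUMNS, ["title", "year", "length", "genre", "studioName"])
movies3 = project(MOVIES1, COLUMNS, ["title", "year", "starName"])
```

Each movie's length, genre and studio now appear exactly once in movies2, which is why the update anomaly can no longer occur.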

15 Boyce-Codd Normal Form
There are many ways of decomposing a relation. Which decompositions lead to anomaly-free relations?
Boyce-Codd normal form defines a condition under which the anomalies discussed so far cannot exist.
A relation R is in Boyce-Codd normal form (BCNF) if and only if the following condition holds: for every non-trivial FD A1 ... An → B1 ... Bm that holds for R, {A1, ..., An} is a superkey for R.
I.e., the left side of every non-trivial FD needs to contain a key.

16 Boyce-Codd Normal Form
Consider Movies1 (title, year, length, genre, studioName, starName).
{title, year, starName} is a key, and it is the only one.
We have the FD title year → length genre studioName. The left side of this FD is not a superkey, so Movies1 is not in BCNF.
Consider Movies2 (title, year, length, genre, studioName).
{title, year} is its only key, since neither title → length genre studioName nor year → length genre studioName holds.
All non-trivial FDs must have at least title and year on the left side. Thus, Movies2 is in BCNF.
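The BCNF condition can be tested by computing the closure of each left side. The following sketch assumes that the FDs passed in are the complete set of given FDs for exactly this relation; for a sub-relation produced by a decomposition, the FDs would first have to be projected onto its attributes. The function names are illustrative.

```python
def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the non-trivial given FDs whose left side is not a superkey."""
    attrs = set(attrs)
    violations = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        nontrivial = not rhs <= lhs
        if nontrivial and not closure(lhs, fds) >= attrs:
            violations.append((lhs, rhs))
    return violations


FDS = [({"title", "year"}, {"length", "genre", "studioName"})]
MOVIES1 = {"title", "year", "length", "genre", "studioName", "starName"}
MOVIES2 = {"title", "year", "length", "genre", "studioName"}

violations1 = bcnf_violations(MOVIES1, FDS)   # title year -> ... violates BCNF
violations2 = bcnf_violations(MOVIES2, FDS)   # empty: Movies2 is in BCNF
```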

17 Decomposition into BCNF
We want to find a decomposition of R into R1, ..., Rk such that each of the resulting relations is in BCNF and the original relation R can be reconstructed from R1, ..., Rk.
Look for a non-trivial FD that violates BCNF, e.g. A1 ... An → B1 ... Bm, where {A1, ..., An} is not a superkey.
Add to the right side all other attributes {C1, ..., Ck} that functionally depend on {A1, ..., An}. This requires computing the closure of {A1, ..., An} under the given FDs.
This step is optional, but leads to a smaller number of component relations.

18 Decomposition into BCNF
Decompose R into R1 and R2:
  R1 contains {A1, ..., An}, {B1, ..., Bm} and {C1, ..., Ck};
  R2 contains {A1, ..., An} and all attributes not involved in the FD.
Continue to decompose the resulting relations until no FD violates BCNF in any of the Ri.

19 Decomposition into BCNF
Consider Movies1 (title, year, length, genre, studioName, starName).
title year → length genre studioName violates BCNF.
Decompose Movies1 into Movies2 (title, year, length, genre, studioName) and Movies3 (title, year, starName).
{title, year, starName} is the key of Movies3, so Movies3 has no non-trivial FDs and is in BCNF.
In general, more than one decomposition step may be necessary to reach a schema in BCNF.

20 Decomposition into BCNF
Consider the relation schema {title, year, studioName, president, presAddr}.
FDs:
  title year → studioName
  studioName → president
  president → presAddr
{title, year} is the only key. The last two FDs violate BCNF.
Assume that we first deal with studioName → president. The closure of studioName adds presAddr to the right side, so we decompose into {title, year, studioName} and {studioName, president, presAddr}.
While the first of these relations is in BCNF, the second one is not: president → presAddr holds, but the key is studioName and president is not a superkey.
Decompose the second relation further into {studioName, president} and {president, presAddr}.
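The whole procedure can be sketched recursively. This version is an illustration under stated assumptions, not the course's reference implementation: it finds violations by trying every attribute subset of a relation and intersecting its closure with that relation, which is exponential and only suitable for small schemas, and the names find_violation and bcnf_decompose are hypothetical. It may pick violations in a different order than the slide does; for this example it arrives at the same three relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, X+ restricted to rel) for some X violating BCNF in rel, else None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel
            # X -> X+ is non-trivial if X+ != X; X is no superkey if X+ != rel
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None


def bcnf_decompose(rel, fds):
    """Split `rel` until no sub-relation contains a BCNF violation."""
    rel = frozenset(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return {rel}
    x, x_plus = violation
    r1 = x_plus                  # left side plus everything it determines
    r2 = x | (rel - x_plus)      # left side plus the attributes not involved
    return bcnf_decompose(r1, fds) | bcnf_decompose(r2, fds)


FDS = [
    ({"title", "year"}, {"studioName"}),
    ({"studioName"}, {"president"}),
    ({"president"}, {"presAddr"}),
]
schema = {"title", "year", "studioName", "president", "presAddr"}
result = bcnf_decompose(schema, FDS)
```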

21 Recoverability of Information
So far, we know how to decompose R into R1, ..., Rk such that each of the resulting relations is in BCNF.
But can we reconstruct R from R1, ..., Rk? More precisely: is R = R1 ⋈ ... ⋈ Rk?
Yes! To see why, consider a relation R(A, B, C) and an FD B → C that violates BCNF.
Decompose R into R1(A, B) and R2(B, C).
Let t = (a, b, c) be an arbitrary tuple from R. Then (a, b) ∈ R1 and (b, c) ∈ R2, and consequently t ∈ R1 ⋈ R2.
Thus, all tuples in R are in R1 ⋈ R2.

22 Recoverability of Information
Conversely, let t = (a, b, c) and v = (d, b, e) be arbitrary tuples in R that agree on B, which implies that (a, b) ∈ R1 and (b, e) ∈ R2.
Then (a, b, e) ∈ R1 ⋈ R2. Is it also in R?
B → C and t[B] = v[B] = b imply c = e.
Since (a, b, c) is in R, (a, b, e) is also in R.
Thus, all tuples in R1 ⋈ R2 are in R.
This means that the BCNF decomposition is lossless.
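The argument can be tried out on small instances of R(A, B, C). The sketch below uses made-up sample data: one instance that satisfies B → C and one that violates it. The second instance shows why the FD matters, since without it the join produces tuples that were never in R.

```python
def decompose(r):
    """Project R(A, B, C) onto R1(A, B) and R2(B, C)."""
    r1 = {(a, b) for a, b, c in r}
    r2 = {(b, c) for a, b, c in r}
    return r1, r2


def join(r1, r2):
    """Natural join of R1(A, B) and R2(B, C) on the shared attribute B."""
    return {(a, b, c) for a, b in r1 for b2, c in r2 if b == b2}


# Illustrative instances (assumptions for this sketch)
r_good = {(1, "x", 10), (2, "x", 10), (3, "y", 20)}   # B -> C holds
r_bad = {(1, "x", 10), (2, "x", 11)}                  # B -> C is violated

lossless_good = join(*decompose(r_good)) == r_good
lossless_bad = join(*decompose(r_bad)) == r_bad
spurious = join(*decompose(r_bad)) - r_bad             # tuples that were not in R
```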

23 Summary
A functional dependency (FD) states that two tuples that agree on some set of attributes also agree on another set of attributes.
Keys are special cases of functional dependencies.
Redundancy in a relational table leads to anomalies such as update and deletion anomalies.
A relation is in Boyce-Codd normal form (BCNF) if the left sides of all non-trivial FDs contain a key. A schema in BCNF avoids the above anomalies.
A given schema can be decomposed into subsets of attributes such that the resulting tables are all in BCNF and the join of these tables recovers the original table.

