CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 227 Database Systems I Design Theory for Relational Databases.

Presentation transcript:


Introduction
Typically, the first database design uses a high-level database model such as the ER model, which is then translated into a relational schema. Sometimes a relational database schema is developed directly, without going through the high-level design. Either way, the initial relational schema usually has room for improvement, in particular by eliminating redundancy. Redundancy leads to undesirable update and deletion anomalies. Relational database design theory introduces normal forms of a schema that avoid various types of redundancy, together with algorithms that convert a relational schema into these normal forms.

Functional Dependency
Normal forms are based on the concept of functional dependencies between sets of attributes.
A functional dependency (FD) X → Y is an assertion about a relation R: whenever two tuples of R agree on all the attributes of set X, they must also agree on all attributes of set Y. We say “X → Y holds in R.”
Convention: …, X, Y, Z represent sets of attributes of relation R; A, B, C, … represent single attributes of R.
Convention: no set braces to denote sets of attributes, just ABC rather than {A, B, C}.
An FD X → Y is called trivial if Y ⊆ X.

Splitting / Combining Rule
X → A1 A2 … An holds for R if and only if each of X → A1, X → A2, …, X → An holds for R.
Example: the FD A → BC is equivalent to the two FDs A → B and A → C.
This rule can be used to split an FD into several FDs with singleton right sides, or to combine several FDs that share a left side and have singleton right sides into one FD.
There is no splitting / combining rule for left sides.
We will generally express FDs with singleton right sides.
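
The splitting / combining rule can be sketched in a few lines of Python. This is only an illustration: the helper names `split_fd` and `combine_fds` are hypothetical, and an FD is assumed to be represented as a (left side, right side) pair of attribute sets.

```python
def split_fd(lhs, rhs):
    """Split X -> A1 ... An into the list [X -> A1, ..., X -> An]."""
    return [(frozenset(lhs), a) for a in sorted(rhs)]


def combine_fds(singleton_fds):
    """Combine FDs with singleton right sides that share the same left side."""
    combined = {}
    for lhs, attribute in singleton_fds:
        combined.setdefault(lhs, set()).add(attribute)
    return combined


# A -> BC is equivalent to A -> B and A -> C.
parts = split_fd({"A"}, {"B", "C"})
print(parts)
print(combine_fds(parts))
```

Combining groups by the whole left side only, which reflects that the rule does not apply to left sides.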

Functional Dependency
Consider the relation Movies1 (title, year, length, genre, studioName, starName).
title year → length genre studioName holds, assuming that no two movies have the same title in the same year.
title year → starName does not hold, since a movie can have more than one star.
An FD makes an assertion about all possible instances of a relation, not only about one (such as the current) instance.
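
As a small illustration, the following Python sketch tests whether one particular instance is consistent with an FD. The function name `satisfies_fd` is hypothetical and the length values are illustrative assumptions. Because an FD is a statement about all possible instances, such a check can refute an FD but can never prove that it holds.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first row with this lhs value fixes the expected rhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Part of a Movies1 instance; the length values are assumed for illustration.
movies1 = [
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Mark Hamill"},
    {"title": "Gone with the Wind", "year": 1939, "length": 231, "starName": "Vivien Leigh"},
]

holds_length = satisfies_fd(movies1, ["title", "year"], ["length"])
holds_star = satisfies_fd(movies1, ["title", "year"], ["starName"])
print(holds_length, holds_star)
```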

Keys
Given a relation R with attributes X = {A1, ..., An}.
K ⊆ X is a superkey for relation R if K functionally determines X, i.e. K → X.
K is a key for R if K is a superkey, but no proper subset of K is a superkey.
Keys are a special case of FDs.
Keys can be deduced systematically if all FDs for relation R are given.

Keys
{title, year, starName} is a superkey of Movies1, since title → title, year → year, title year → length, title year → genre, title year → studioName, and starName → starName.
Remember that title year → starName does not hold.
{title, year}, {year, starName} and {title, starName} are not superkeys.
Thus, {title, year, starName} is a key of Movies1.
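
One way to deduce keys systematically from a set of FDs is a brute-force search over attribute subsets, from small to large. The sketch below assumes that FDs are given as (left side, right side) pairs of sets; `closure` and `candidate_keys` are hypothetical helper names, and the search is exponential in the number of attributes, so it is only suitable for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            k = set(combo)
            if any(key <= k for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(k, fds) == set(attributes):
                keys.append(k)
    return keys


movies1_attrs = {"title", "year", "length", "genre", "studioName", "starName"}
movies1_fds = [({"title", "year"}, {"length", "genre", "studioName"})]
keys = candidate_keys(movies1_attrs, movies1_fds)
print(keys)
```

For Movies1 the search finds the single key {title, year, starName}, in line with the argument above.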

Closure of Attributes
Given a set of attributes {A1, ..., An} and a set S of FDs.
The closure of {A1, ..., An} under S is the set X of all attributes B such that every relation that satisfies all the FDs in S also satisfies {A1, ..., An} → B, i.e. {A1, ..., An} → B follows from the FDs in S.
The closure of a set Y is denoted by Y+.
Example: for the attribute set {A, B, C} and the FDs {AB → D, D → E, BC → F, G → H}, {A, B, C}+ = {A, B, C, D, E, F}.

Computing the Closure of Attributes
Given a set of attributes {A1, ..., An} and a set S of FDs.
If necessary, apply the splitting rule to the FDs in S.
Initialize X to {A1, ..., An}.
Repeat: search for some FD B1 ... Bm → C in S such that Bi ∈ X for all i = 1, ..., m and C ∉ X, and add C to the set X; stop when no more attribute C can be added.
Now X = {A1, ..., An}+; return X.

Computing the Closure of Attributes
Given a set of attributes {A, B, C, D, E, F} and FDs {AB → C, BC → AD, D → E, CF → B}. What is {A, B}+?
Apply the splitting rule: split BC → AD into BC → A and BC → D.
Initialize X = {A, B}.
Iterations:
apply AB → C, X = {A, B, C}
apply BC → D, X = {A, B, C, D}
apply D → E, X = {A, B, C, D, E}
Return {A, B}+ = {A, B, C, D, E}.
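
The closure computation can be written as a short fixpoint loop. The following is a minimal sketch, assuming FDs are represented as (left side, right side) pairs of attribute strings; the name `closure` is a hypothetical helper. Adding a whole right side at once has the same effect as first applying the splitting rule.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds by repeated application of FDs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An FD applies if its whole left side is already in X
            # and it contributes at least one new attribute.
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


# Worked example: FDs {AB -> C, BC -> AD, D -> E, CF -> B}.
fds = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
closure_ab = closure("AB", fds)
print(sorted(closure_ab))

# Earlier example: FDs {AB -> D, D -> E, BC -> F, G -> H}.
fds2 = [("AB", "D"), ("D", "E"), ("BC", "F"), ("G", "H")]
closure_abc = closure("ABC", fds2)
print(sorted(closure_abc))
```

F is not in {A, B}+ because CF → B can never fire: F is not reachable from A and B.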

Relational Schema Design
The goal of relational schema design is to avoid anomalies and redundancy.
Redundancy leads to certain forms of anomalies.
Update anomaly: one occurrence of a fact is changed, but not all occurrences.
Deletion anomaly: a valid fact is lost when a tuple is deleted.
In the following example, consider the relation Movies1 (title, year, length, genre, studioName, starName).

Relational Schema Design
title              | year | length | genre  | studioName | starName
Star Wars          | 1977 |        | SciFi  | Fox        | Carrie Fisher
Star Wars          | 1977 |        | SciFi  | Fox        | Mark Hamill
Star Wars          | 1977 |        | SciFi  | Fox        | Harrison Ford
Gone with the Wind | 1939 |        | drama  | MGM        | Vivien Leigh
Wayne’s World      | 1992 |        | comedy | Paramount  | Dana Carvey
Wayne’s World      | 1992 |        | comedy | Paramount  | Mike Meyers
Update anomaly: updating the length to 125 only in the first Star Wars tuple leaves the other two Star Wars tuples with the old length.
Deletion anomaly: deleting Vivien Leigh removes the only tuple for Gone with the Wind, so the information about that movie is lost as well.
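
Both anomalies can be reproduced on a small in-memory instance. This is only a sketch: the tuples are reduced to (title, year, length, starName), and the original length values 124 and 231 are illustrative assumptions.

```python
# Reduced Movies1 tuples: (title, year, length, starName).
# The length values are assumed for illustration.
movies1 = [
    ("Star Wars", 1977, 124, "Carrie Fisher"),
    ("Star Wars", 1977, 124, "Mark Hamill"),
    ("Star Wars", 1977, 124, "Harrison Ford"),
    ("Gone with the Wind", 1939, 231, "Vivien Leigh"),
]

# Update anomaly: change the length in the first Star Wars tuple only.
updated = [("Star Wars", 1977, 125, "Carrie Fisher")] + movies1[1:]
star_wars_lengths = {length for title, _, length, _ in updated if title == "Star Wars"}
print(star_wars_lengths)  # two contradictory lengths for the same movie

# Deletion anomaly: removing the star also removes the movie's only tuple.
remaining = [row for row in movies1 if row[3] != "Vivien Leigh"]
titles_left = {row[0] for row in remaining}
print(titles_left)
```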

Decomposing Relations
How can these anomalies be eliminated?
Decompose the relation into two or more relations that together have the same attributes.
Given relation R {A1, ..., An}. A decomposition of R consists of two relations S {B1, ..., Bm} and T {C1, ..., Ck} such that {A1, ..., An} = {B1, ..., Bm} ∪ {C1, ..., Ck}, S = π_{B1, ..., Bm}(R), and T = π_{C1, ..., Ck}(R).

Decomposing Relations
Decompose Movies1 (title, year, length, genre, studioName, starName) into Movies2 (title, year, length, genre, studioName) and Movies3 (title, year, starName).
Movies2
title              | year | length | genre  | studioName
Star Wars          | 1977 |        | SciFi  | Fox
Gone with the Wind | 1939 |        | drama  | MGM
Wayne’s World      | 1992 |        | comedy | Paramount
Movies3
title              | year | starName
Star Wars          | 1977 | Carrie Fisher
Star Wars          | 1977 | Mark Hamill
Star Wars          | 1977 | Harrison Ford
Gone with the Wind | 1939 | Vivien Leigh
Wayne’s World      | 1992 | Dana Carvey
Wayne’s World      | 1992 | Mike Meyers
The update and deletion anomalies are gone!
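
The decomposition amounts to two projections with duplicate elimination. The sketch below uses a hypothetical helper `project`; the length values are illustrative assumptions.

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples (set semantics)."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


columns = ["title", "year", "length", "genre", "studioName", "starName"]
data = [  # length values are assumed for illustration
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Carrie Fisher"),
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Mark Hamill"),
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Harrison Ford"),
    ("Gone with the Wind", 1939, 231, "drama", "MGM", "Vivien Leigh"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", "Dana Carvey"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", "Mike Meyers"),
]
movies1 = [dict(zip(columns, row)) for row in data]

movies2 = project(movies1, ["title", "year", "length", "genre", "studioName"])
movies3 = project(movies1, ["title", "year", "starName"])
print(len(movies2), len(movies3))
```

Each movie's length is now stored in exactly one Movies2 tuple, so an update cannot leave contradictory copies behind.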

Boyce-Codd Normal Form
There are many ways of decomposing a relation. Which decompositions lead to anomaly-free relations?
Boyce-Codd normal form defines a condition under which the anomalies discussed so far cannot exist.
A relation R is in Boyce-Codd normal form (BCNF) if and only if the following condition holds: for every non-trivial FD A1 ... An → B1 ... Bm for R, {A1, ..., An} is a superkey for R.
I.e., the left side of every non-trivial FD needs to contain a key.

Boyce-Codd Normal Form
Consider Movies1 (title, year, length, genre, studioName, starName). {title, year, starName} is a key, and it is the only one. We have the FD title year → length genre studioName. The left side of this FD is not a superkey, so Movies1 is not in BCNF.
Consider Movies2 (title, year, length, genre, studioName). {title, year} is its only key, since neither title → length genre studioName nor year → length genre studioName holds. All non-trivial FDs must have at least title and year on the left side. Thus, Movies2 is in BCNF.
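
The BCNF condition can be checked mechanically with attribute closures. The sketch below uses hypothetical helper names and assumes that the FD list passed in contains exactly the FDs stated for that relation; under that assumption it is enough to test the listed FDs.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(attributes, fds):
    """Check that every non-trivial FD in fds has a superkey as its left side."""
    for lhs, rhs in fds:
        nontrivial = not set(rhs) <= set(lhs)
        is_superkey = closure(lhs, fds) >= set(attributes)
        if nontrivial and not is_superkey:
            return False
    return True


fds = [({"title", "year"}, {"length", "genre", "studioName"})]
movies1 = {"title", "year", "length", "genre", "studioName", "starName"}
movies2 = {"title", "year", "length", "genre", "studioName"}
movies1_ok = is_bcnf(movies1, fds)
movies2_ok = is_bcnf(movies2, fds)
print(movies1_ok, movies2_ok)
```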

Decomposition into BCNF
We want to find a decomposition of R into R1, ..., Rk such that each of the resulting relations is in BCNF, and the original relation R can be reconstructed from R1, ..., Rk.
Look for a non-trivial FD that violates BCNF, e.g. A1 ... An → B1 ... Bm, where {A1, ..., An} is not a superkey.
Add to the right side all other attributes {C1, ..., Ck} that functionally depend on {A1, ..., An}. This requires computing the closure of {A1, ..., An} under the given FDs. This step is optional, but leads to a smaller number of component relations.

Decomposition into BCNF
Decompose R into R1 and R2:
R1 contains {A1, ..., An}, {B1, ..., Bm} and {C1, ..., Ck}.
R2 contains {A1, ..., An} and all attributes not involved in the FD.
Continue to decompose the resulting relations until no FD for any of the Ri violates BCNF.

Decomposition into BCNF
Consider Movies1 (title, year, length, genre, studioName, starName).
title year → length genre studioName violates BCNF.
Decompose Movies1 into Movies2 (title, year, length, genre, studioName) and Movies3 (title, year, starName).
{title, year, starName} is the key for Movies3.
In general, more than one decomposition step is necessary to reach a schema in BCNF.

Decomposition into BCNF
Consider relation schema {title, year, studioName, president, presAddr}.
FDs:
title year → studioName
studioName → president
president → presAddr
{title, year} is the only key. The last two FDs violate BCNF.
Assume that we first deal with studioName → president. Decompose into {title, year, studioName} and {studioName, president, presAddr}.
While the first of these relations is in BCNF, the second one is not: president → presAddr holds, but the only key is studioName, so president is not a superkey.
Decompose the second relation further into {studioName, president} and {president, presAddr}.
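
The decomposition procedure can be sketched recursively. The helper names below are hypothetical, and the violation search simply tries all attribute subsets of a relation, which is exponential and only meant for small schemas. Closures are taken under the original FDs and restricted to the relation at hand, which accounts for FDs that only follow by inference.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, X+ restricted to rel) for some X violating BCNF, or None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel
            # Non-trivial (X determines something new) and X is not a superkey.
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None


def bcnf_decompose(rel, fds):
    """Split rel on violating FDs until every component is in BCNF."""
    rel = set(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, x_plus = violation
    r1 = x_plus              # left side plus everything it determines
    r2 = x | (rel - x_plus)  # left side plus the attributes not involved
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


schema = {"title", "year", "studioName", "president", "presAddr"}
fds = [
    ({"title", "year"}, {"studioName"}),
    ({"studioName"}, {"president"}),
    ({"president"}, {"presAddr"}),
]
result = bcnf_decompose(schema, fds)
print(result)
```

This sketch happens to pick president → presAddr as the first violation rather than studioName → president, but for this schema it ends with the same three relations as the example above.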

Recoverability of Information
So far, we know how to decompose R into R1, ..., Rk such that each of the resulting relations is in BCNF.
But can we reconstruct R from R1, ..., Rk? More precisely: is R = R1 ⋈ ... ⋈ Rk? Yes!
To see why, consider a relation R(A, B, C) and an FD B → C that violates BCNF. Decompose R into R1(A, B) and R2(B, C).
Let t = (a, b, c) be an arbitrary tuple from R. Then (a, b) ∈ R1 and (b, c) ∈ R2, and consequently (a, b, c) ∈ R1 ⋈ R2.
Thus, all tuples in R are in R1 ⋈ R2.

Recoverability of Information
Let t = (a, b, c) and v = (d, b, e) be arbitrary tuples in R, which implies that (a, b) ∈ R1 and (b, e) ∈ R2. Then (a, b, e) ∈ R1 ⋈ R2. Is it also in R?
The FD B → C and the fact that t and v agree on B imply c = e.
Since (a, b, c) is in R, (a, b, e) is also in R.
Thus, all tuples in R1 ⋈ R2 are in R.
This means that the BCNF decomposition is lossless.
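
The argument can be checked on a concrete instance. The sketch below uses made-up tuples for R(A, B, C) and hypothetical helpers `project` and `join_on_b`. The first instance satisfies B → C and is recovered exactly by the join; the second one violates B → C, and the join produces tuples that were never in the relation.

```python
def project(rows, positions):
    """Project a set of tuples onto the given column positions."""
    return {tuple(row[i] for i in positions) for row in rows}


def join_on_b(r1, r2):
    """Natural join of R1(A, B) and R2(B, C) on the shared attribute B."""
    return {(a, b, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


# Instance of R(A, B, C) that satisfies B -> C (values are made up).
r = {(1, "x", 10), (2, "x", 10), (3, "y", 20)}
joined = join_on_b(project(r, (0, 1)), project(r, (1, 2)))
print(joined == r)

# Instance that violates B -> C: the same decomposition loses information.
bad = {(1, "x", 10), (2, "x", 30)}
bad_joined = join_on_b(project(bad, (0, 1)), project(bad, (1, 2)))
print(sorted(bad_joined - bad))
```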

Summary
A functional dependency (FD) states that two tuples that agree on some set of attributes also agree on another set of attributes.
Keys are special cases of functional dependencies.
Redundancy in a relational table leads to anomalies such as update and deletion anomalies.
A relation is in Boyce-Codd normal form (BCNF) if the left side of every non-trivial FD contains a key.
A schema in BCNF avoids the above anomalies.
A given schema can be decomposed into subsets of attributes such that the resulting tables are all in BCNF and the join of these tables recovers the original table.