Functional Dependencies. Babies At a birth, there is one baby (twins would be represented by two births), one mother, any number of nurses, and a doctor.

Presentation transcript:

Functional Dependencies

Babies
At a birth, there is one baby (twins would be represented by two births), one mother, any number of nurses, and a doctor. Suppose, therefore, that we have a table:
Births(baby, mother, nurse, doctor)
Some facts and assumptions:
a) For every baby, there is a unique mother.
b) For every (existing) combination of a baby and a mother, there is a unique doctor.
c) There may be many nurses at a birth.

Anomalies
Redundancy.
– Information may be repeated unnecessarily in several tuples.
Update anomalies.
– We may change information in one tuple but leave it unchanged in other tuples.
Deletion anomalies.
– If a set of values becomes empty, we may lose other information as a side effect.
– E.g. if we delete Smith we will lose all the information about baby Jason.

Baby   Mother  Nurse   Doctor
Ben    Mary    Ann     Brown
Ben    Mary    Alice   Brown
Ben    Mary    Paula   Brown
Jason  Mary    Angela  Smith
Jason  Mary    Peggy   Smith
Jason  Mary    Rita    Smith

Fix

Baby   Mother
Ben    Mary
Jason  Mary

Baby   Doctor
Ben    Brown
Jason  Smith

Baby   Nurse
Ben    Ann
Ben    Alice
Ben    Paula
Jason  Angela
Jason  Peggy
Jason  Rita

Functional Dependencies
Convention:
– X, Y, Z represent sets of attributes; A, B, C, … represent single attributes.
– We will write just ABC, rather than {A,B,C}.
X -> A for a relation R says that
– whenever two tuples of R agree on all the attributes of X, then they must also agree on the attribute A.
Example:
baby -> mother
baby mother -> doctor
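To illustrate the definition, the check below tests whether an FD holds in one particular instance of Births. It is a minimal sketch: the function name fd_holds and the row-as-dictionary representation are assumptions made for this example and do not come from the slides. An FD is a claim about every legal instance of a relation, so sample data can refute an FD but cannot prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs while differing on rhs."""
    seen = {}  # lhs values -> rhs values first seen with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # two tuples agree on lhs but disagree on rhs
    return True


columns = ["baby", "mother", "nurse", "doctor"]
births = [dict(zip(columns, values)) for values in [
    ("Ben", "Mary", "Ann", "Brown"),
    ("Ben", "Mary", "Alice", "Brown"),
    ("Ben", "Mary", "Paula", "Brown"),
    ("Jason", "Mary", "Angela", "Smith"),
    ("Jason", "Mary", "Peggy", "Smith"),
    ("Jason", "Mary", "Rita", "Smith"),
]]

print(fd_holds(births, ["baby"], ["mother"]))            # True
print(fd_holds(births, ["baby", "mother"], ["doctor"]))  # True
print(fd_holds(births, ["baby"], ["nurse"]))             # False
```

The last call fails because Ben appears with three different nurses, which matches assumption (c) about Births.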

Another Example
Drinkers(name, addr, beersLiked, manf, favBeer)
Reasonable FD’s to assert:
1. name -> addr
2. name -> favBeer
3. beersLiked -> manf

Example Data

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud

The two Janeway rows agree on addr because name -> addr, and on favBeer because name -> favBeer. The two Bud rows agree on manf because beersLiked -> manf.

FD’s With Multiple Attributes
No need for FD’s with more than one attribute on the right.
– But sometimes it is convenient to combine FD’s as a shorthand.
– Example: name -> addr and name -> favBeer become name -> addr favBeer
More than one attribute on the left may be essential.
– Example: bar beer -> price

Keys of Relations
K is a superkey for relation R if K functionally determines all of R’s attributes.
K is a key for R if K is a superkey, but no proper subset of K is a superkey.
Example. Attributes {name, beersLiked} form a key for the previous Drinkers relation. Why?

Trivial Dependencies
A functional dependency A1 A2 … An -> B is said to be trivial if B is one of the A’s.
For example, bar beer -> beer is a trivial dependency.

Refining of Relational Schemas
The principal problem that we encounter is redundancy, where a fact is repeated in more than one tuple.
Now we will tackle the problem of refining relational schemas.

Boyce-Codd Normal Form
Boyce-Codd Normal Form (BCNF): a simple condition under which the anomalies can be guaranteed not to exist.
A relation R is in BCNF if: whenever there is a nontrivial dependency A1 A2 … An -> B1 B2 … Bm for R, it must be the case that {A1, A2, …, An} is a superkey for R.

BCNF Violation - Example
Relation Births isn’t in BCNF.
FD: baby -> mother
The left side isn't a superkey.
– We know that baby doesn't functionally determine nurse.

Decomposition into BCNF
The goal of decomposition is to replace a relation by several that don't exhibit anomalies.
The decomposition strategy is:
– Find a nontrivial FD A1 A2 … An -> B1 B2 … Bm that violates BCNF, i.e. A1 A2 … An isn’t a superkey.
– Decompose the relation schema into two overlapping relation schemas: one has all the attributes involved in the violating dependency, and the other has the left side plus all the attributes not involved in the dependency.
By repeatedly choosing suitable decompositions, we can break any relation schema into a collection of smaller schemas in BCNF.

Babies Example
Births(baby, mother, nurse, doctor)
baby -> mother is a violating FD, so we decompose:

Baby   Mother
Ben    Mary
Jason  Mary

Baby   Nurse   Doctor
Ben    Ann     Brown
Ben    Alice   Brown
Ben    Paula   Brown
Jason  Angela  Smith
Jason  Peggy   Smith
Jason  Rita    Smith

The second relation needs to be further decomposed using the baby -> doctor FD. We will see a formal algorithm for deducing this FD.

Rules About Functional Dependencies
Suppose we are told of a set of functional dependencies that a relation satisfies. Without knowing exactly what tuples are in the relation, we can deduce other dependencies.
Example. baby -> mother and baby mother -> doctor imply baby -> doctor.
But what's the algorithm?

Computing the Closure of Attributes
There is a general principle from which all possible FD’s follow.
Suppose {A1, A2, …, An} is a set of attributes and S is a set of FD’s.
The closure of {A1, A2, …, An} under the dependencies in S is the set of attributes B that are functionally determined by A1, A2, …, An, i.e. those B for which A1 A2 … An -> B follows from S.
– The closure is denoted by {A1, A2, …, An}+.
– A1, A2, …, An are always in {A1, A2, …, An}+.

Computing the Closure - Algorithm (Brief)
Starting with the given set of attributes, repeatedly expand the set by adding the right sides of FD’s as soon as we have included their left sides.
Eventually, we cannot expand the set any more, and the resulting set is the closure.

Computing the Closure - Algorithm (Detailed)
1. Let X be a set of attributes that eventually will become the closure. First initialize X to be {A1, A2, …, An}.
2. Now, repeatedly search for some FD in S, B1 B2 … Bm -> C, such that all of the B’s are in set X, but C isn’t. Add C to X.
3. Repeat step 2 as many times as necessary until no more attributes can be added to X. Since X can only grow, and the number of attributes is finite, eventually nothing more can be added to X.
4. The set X, after no more attributes can be added to it, is {A1, A2, …, An}+.

Computing the Closure - Example
Consider a relation with schema R(A, B, C, D, E, F) and FD’s: AB -> C, BC -> AD, D -> E, CF -> B.
Compute {A,B}+.
Iterations:
X = {A,B}          Use: AB -> C
X = {A,B,C}        Use: BC -> AD
X = {A,B,C,D}      Use: D -> E
X = {A,B,C,D,E}    No more changes to X are possible, so X = {A,B}+.
FD CF -> B wasn't used because its left side is never contained in X.
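The closure algorithm can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name closure and the encoding of each FD as a (left side, right side) pair of strings of single-letter attributes are choices made here, not something the slides prescribe.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:  # stop once a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its whole left side is already in the set
            # and its right side would contribute something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs from the example: AB -> C, BC -> AD, D -> E, CF -> B
fds = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]

print(sorted(closure("AB", fds)))  # ['A', 'B', 'C', 'D', 'E']
```

The loop terminates for the reason given in the algorithm: the set only grows and the number of attributes is finite.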

Why Compute the Closure?
Having {A1, A2, …, An}+, we can test any given functional dependency A1 A2 … An -> B.
If B ∈ {A1, A2, …, An}+ then FD A1 A2 … An -> B holds.
If B ∉ {A1, A2, …, An}+ then FD A1 A2 … An -> B doesn’t hold.

Example
Consider the previous example: R(A, B, C, D, E, F) and FD’s: AB -> C, BC -> AD, D -> E, CF -> B.
Suppose we want to test whether FD AB -> D follows. Yes! Since D ∈ {A,B,C,D,E} = {A,B}+.
On the other hand, consider testing FD D -> A.
– First compute {D}+.
– {D}+ = {D,E} and A ∉ {D}+.
– We conclude that D -> A doesn't follow from the given set of dependencies.
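The membership test can be written directly on top of a closure routine. The sketch below is self-contained and hedged in the same way as any illustration: the names closure and follows, and the string encoding of single-letter attributes, are assumptions made for this example.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def follows(lhs, rhs, fds):
    """True if lhs -> rhs follows from fds, i.e. rhs lies inside lhs+."""
    return set(rhs) <= closure(lhs, fds)


fds = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]

print(follows("AB", "D", fds))  # True
print(follows("D", "A", fds))   # False
```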

Closures and Keys
{A1, A2, …, An} is a superkey iff {A1, A2, …, An}+ is the set of all attributes.
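This criterion gives a mechanical answer to the earlier question of why {name, beersLiked} is a key for Drinkers. The sketch below assumes the three FD's asserted for Drinkers (name -> addr, name -> favBeer, beersLiked -> manf); the helper names is_superkey and is_key are hypothetical.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """A superkey's closure contains every attribute of the relation."""
    return closure(attrs, fds) >= set(all_attrs)


def is_key(attrs, all_attrs, fds):
    """A key is a superkey none of whose proper subsets is a superkey."""
    attrs = set(attrs)
    if not is_superkey(attrs, all_attrs, fds):
        return False
    # Any superset of a superkey is a superkey, so it is enough to check
    # that removing any single attribute loses the superkey property.
    return not any(is_superkey(attrs - {a}, all_attrs, fds) for a in attrs)


drinkers = {"name", "addr", "beersLiked", "manf", "favBeer"}
fds = [({"name"}, {"addr"}), ({"name"}, {"favBeer"}), ({"beersLiked"}, {"manf"})]

print(is_key({"name", "beersLiked"}, drinkers, fds))          # True
print(is_superkey({"name"}, drinkers, fds))                   # False
print(is_key({"name", "beersLiked", "addr"}, drinkers, fds))  # False
```

The last set is a superkey but not a key, because dropping addr still leaves a superkey.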

A Few Tricks
To deduce all the FDs, compute the closure of each subset of attributes, but
– We never need to compute the closure of the empty set or of the set of all attributes.
– If we find X+ = all attributes, don’t bother computing the closure of any supersets of X.

Movie Example
Movies(title, year, studioName, president, presAddr) and FDs:
title year -> studioName
studioName -> president
president -> presAddr
The last two violate BCNF. Why?
Compute {title, year}+, {studioName}+, {president}+ and see if you get all the attributes of the relation.
If not, you have a BCNF violation and need to decompose.

Example (Continued)
Let’s decompose starting with: studioName -> president
Optional rule of thumb: add to the right-hand side any other attributes in the closure of studioName.
{studioName}+ = {studioName, president, presAddr}
Thus, we get: studioName -> president presAddr

Example (Continued)
Using studioName -> president presAddr, we decompose into:
Movies1(studioName, president, presAddr)
Movies2(title, year, studioName)
Movies2 is in BCNF. What about Movies1?
FD president -> presAddr violates BCNF.
Why is it bad to leave Movies1 as is? If many studios share the same president, then we would have redundancy when repeating the presAddr for all those studios.

Example (Continued)
We decompose Movies1, using FD president -> presAddr.
The resulting relation schemas, both in BCNF, are:
Movies11(president, presAddr)
Movies12(studioName, president)
So, finally we got Movies11, Movies12, and Movies2.
In general, we must keep applying the decomposition rule as many times as needed, until all our relations are in BCNF.
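The repeated decomposition can be sketched as a short recursive procedure. This is an illustration under assumptions, not the slides' exact procedure: bcnf_decompose is a hypothetical name, it searches subsets in sorted order and splits on the first violation it finds (with the right side expanded to the full closure, as in the rule of thumb), and it relies on the fact that the closure of X within a sub-schema equals X+ computed from the original FDs, restricted to that sub-schema. Other choices of violating FD can give a different, equally valid BCNF decomposition.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_decompose(attrs, fds):
    """Split the schema attrs until no sub-schema has a BCNF violation."""
    attrs = frozenset(attrs)
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & attrs  # closure restricted to this schema
            if x_plus != x and x_plus != attrs:
                # x -> (x_plus - x) is nontrivial and x is not a superkey.
                first = x_plus                 # attributes involved in the FD
                second = x | (attrs - x_plus)  # left side plus the rest
                return bcnf_decompose(first, fds) + bcnf_decompose(second, fds)
    return [attrs]  # no violation found


movies = {"title", "year", "studioName", "president", "presAddr"}
fds = [
    ({"title", "year"}, {"studioName"}),
    ({"studioName"}, {"president"}),
    ({"president"}, {"presAddr"}),
]

for schema in bcnf_decompose(movies, fds):
    print(sorted(schema))
```

On this input the sketch happens to split on president first rather than studioName, yet it arrives at the same three schemas as Movies11, Movies12, and Movies2.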

Closures in the Decomposed Relations
Suppose S is one of the resulting relations in a decomposition of R. Then do the following:
– Consider each subset X of the attributes of S.
– Compute X+ using the FDs of R.
– At the end, throw out the attributes of R that aren’t in S.
Then, for each attribute B such that
– B is an attribute of S, and
– B is in X+,
the functional dependency X -> B holds in S.

Example: Consider R(A, B, C, D, E) decomposed into S(A, B, C) and another relation.
Let the FDs of R be: A -> D, B -> E, DE -> C
{A}+ = {A,D}, {B}+ = {B,E}, {C}+ = {C}, yielding no FDs for S.
Now consider pairs. {A,B}+ = {A, B, C, D, E}. Thus, we deduce AB -> C for S.
Neither of the other pairs gives us any FD for S.
Of course, the set of all three attributes of S, {A, B, C}, cannot yield any nontrivial dependencies for S.
Thus, the only dependency we need to assert for S is AB -> C.
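The projection of FDs onto S can be reproduced mechanically. In the sketch below, projected_fds is a hypothetical helper name and attributes are encoded as single letters; it enumerates the nonempty proper subsets of S, computes each closure with the FDs of R, and keeps only the newly determined attributes that belong to S.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def projected_fds(sub_attrs, fds):
    """Map each proper subset X of sub_attrs to the attributes of S it determines."""
    sub = set(sub_attrs)
    found = {}
    for size in range(1, len(sub)):  # the full set yields only trivial FDs
        for combo in combinations(sorted(sub), size):
            x = set(combo)
            gained = (closure(x, fds) & sub) - x  # drop attributes outside S
            if gained:
                found[frozenset(x)] = gained
    return found


# FDs of R(A, B, C, D, E): A -> D, B -> E, DE -> C
fds = [("A", "D"), ("B", "E"), ("DE", "C")]

for lhs, rhs in projected_fds("ABC", fds).items():
    print(sorted(lhs), "->", sorted(rhs))  # ['A', 'B'] -> ['C']
```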

Recovering Info from a Decomposition
Why does a decomposition based on an FD preserve the information of the original relation? Because the projections of the original tuples can be “joined” again to produce all and only the original tuples.
Example: Consider R(A, B, C) and FD B -> C, which we suppose is a BCNF violation. Let’s decompose based on B -> C: R1(A,B) and R2(B,C).
Let (a,b,c) be a tuple of R. It projects as (a,b) for R1, and as (b,c) for R2.
It's possible to join a tuple from R1 with a tuple from R2 when they agree on the B component.
– In particular, (a,b) joins with (b,c) to give us the original tuple (a,b,c) back again.
Getting back the tuples we started with isn't enough. Do we also get false tuples, i.e. tuples that weren’t in the original relation?

Example continued
What might happen if there were two tuples of R, say (a,b,c) and (d,b,e)?
We get:
(a,b) and (d,b) in R1
(b,c) and (b,e) in R2
Now if we join R1 with R2 we get:
(a,b,c)
(d,b,e)
(a,b,e) (is it bogus?)
(d,b,c) (is it bogus?)
They aren’t bogus. By the FD B -> C we know that if two tuples agree on B, they must agree on C as well. Hence c = e and we have:
(a,b,c)
(d,b,e)
(a,b,e) = (a,b,c)
(d,b,c) = (d,b,e)

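What if B -> C isn’t a true FD?
Suppose R consists of two tuples that agree on B but not on C, say (a,b,c) and (d,b,e) with c ≠ e.
The projections of R onto the relations with schemas R1(A,B) and R2(B,C) are {(a,b), (d,b)} and {(b,c), (b,e)}.
When we try to reconstruct R by joining, we get (a,b,c), (a,b,e), (d,b,c), and (d,b,e).
That is, we get “too much”: (a,b,e) and (d,b,c) were not in R.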
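Both outcomes can be reproduced with a tiny project-and-join experiment on R(A, B, C). The numeric tuples below are made-up illustrative values and project_and_rejoin is a hypothetical helper name; the sketch only shows that the join returns exactly R when B -> C holds and returns extra tuples when it does not.

```python
def project_and_rejoin(r):
    """Project R(A,B,C) onto R1(A,B) and R2(B,C), then natural-join on B."""
    r1 = {(a, b) for a, b, c in r}
    r2 = {(b, c) for a, b, c in r}
    return {(a, b, c) for a, b in r1 for b2, c in r2 if b == b2}


# B -> C holds: both tuples with B = 2 have C = 3.
good = {(1, 2, 3), (4, 2, 3)}
print(project_and_rejoin(good) == good)  # True

# B -> C fails: B = 2 appears with C = 3 and with C = 5.
bad = {(1, 2, 3), (4, 2, 5)}
print(sorted(project_and_rejoin(bad) - bad))  # [(1, 2, 5), (4, 2, 3)]
```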

Problems
For each of the following relations:
1. R(A,B,C,D) with AB -> C, C -> D, and D -> A
2. R(A,B,C,D) with B -> C and B -> D
– Indicate all BCNF violations.
– Decompose into relations that are in BCNF.