Handout 4 Functional Dependencies


Handout 4 Functional Dependencies
CIS550 Handout 4

Why we need relational design theory
We don’t need it to design databases: ER diagrams and related tools are much more understandable and effective.
The theory is useful
  as a check on our designs,
  to understand certain things that ER diagrams cannot do,
  to help us understand the consequences of redundancy (which we may use for efficiency).

Not all designs are equally good
Why is this design bad?
  Data(Id#, Name, Address, C#, Description, Grade)
And why is this one preferable?
  Student(Id#, Name, Address)
  Course(C#, Description)
  Enrolled(Id#, C#, Grade)

An example of “bad” design
Values appearing in the columns of Data:
  Id#:          124, 456, 789
  Name:         Jones, Smith, Brown
  Address:      Phila, NYC, Boston
  C#:           Phil7, Math8, Eng12
  Description:  Plato, Topology, Chaucer
  Grade:        A, B, C
Information is redundantly given, e.g., Name and Address.
Some information, e.g., course information, depends on the existence of some student.

Functional Dependencies
Recall that a key is a set of attribute names. If two tuples agree on a key, they agree everywhere -- they are the same.
In our “bad” design, if two tuples agree on Id#, they agree on Address, even though they are not the same.
We can say “Id# determines Address”, written Id# → Address.
This is a functional dependency.

Here are some functional dependencies that we expect to hold in our student-course database:
  Id# → Name, Address
  C# → Description
  Id#, C# → Grade
Note that any relation (good or bad design) should be constrained by these dependencies.
A functional dependency X → Y is simply a pair of sets.
Notice the “sloppy” notation: A,B → C,D or AB → CD rather than {A, B} → {C, D}.

The Meaning of fd’s
Defn. Given a relation scheme R (a set of attributes) and subsets X, Y of R, an instance r of R satisfies X → Y if, for any two tuples t1, t2 in r, t1[X] = t2[X] implies t1[Y] = t2[Y].
N.B. We cannot look at a relation instance to determine which fd’s hold; we can only tell when it fails to satisfy an fd.
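The definition above translates almost directly into a check on an instance. The following is a minimal sketch, assuming tuples are represented as Python dicts; the function name `satisfies` and the sample rows are illustrative and not taken from the handout.

```python
from itertools import combinations

def satisfies(r, X, Y):
    """Return True if instance r (a list of dicts) satisfies the fd X -> Y."""
    for t1, t2 in combinations(r, 2):
        agree_on_X = all(t1[a] == t2[a] for a in X)
        agree_on_Y = all(t1[a] == t2[a] for a in Y)
        if agree_on_X and not agree_on_Y:
            return False
    return True

# Hypothetical instance: student 124 is enrolled in two courses.
r = [
    {"Id#": 124, "Name": "Jones", "Address": "Phila", "C#": "Phil7", "Grade": "A"},
    {"Id#": 124, "Name": "Jones", "Address": "Phila", "C#": "Math8", "Grade": "B"},
    {"Id#": 456, "Name": "Smith", "Address": "NYC", "C#": "Math8", "Grade": "B"},
]

print(satisfies(r, ["Id#"], ["Address"]))  # True
print(satisfies(r, ["Id#"], ["Grade"]))    # False
print(satisfies(r, ["C#"], ["Grade"]))     # True, but only by accident
```

The last call illustrates the N.B.: this particular instance happens to satisfy C# → Grade, yet nobody would expect that dependency to hold in general.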

Basic Intuition in Relational Design
A database scheme is “good” if all fd’s are of the form K → R where K is a key for R.
Example: Our “bad” design is bad because, for example, Id# → Address holds but Id# is not a key for the relation scheme in which these attributes occur.
However, it isn’t as simple as this. A → A is a functional dependency for any attribute A. Are all attributes keys??

Armstrong’s Axioms
Some fd’s occur as consequences of others. These can be deduced by Armstrong’s axioms:
Reflexivity. If Y ⊆ X then X → Y. (These are called trivial dependencies.)
  Example: Name, Address → Address
Augmentation. If X → Y then XW → YW.
  Example: From C# → Description we deduce C#, Id# → Description, Id#
Transitivity. If X → Y and Y → Z then X → Z.
  Example: From Id#, C# → C# and C# → Description, we deduce Id#, C# → Description

Consequences of Armstrong’s Axioms
Union. If X → Y and X → Z then X → YZ.
Pseudotransitivity. If X → Y and WY → Z then XW → Z.
Decomposition. If X → Y and Z ⊆ Y then X → Z.
Prove these from Armstrong’s Axioms.

Closure of a set of fd’s
Defn. Let F be a set of fd’s. The closure of F, F+, is the set of fd’s
  {X → Y | X → Y can be deduced from F by Armstrong’s Axioms}
Which of the following are in the closure of our Student-Course fd’s?
  Address → Address
  C# → Description
  C# → Description, Name
  C#, Id# → Description, Name
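One way to check the four candidates mechanically is sketched below. It does not enumerate F+; instead it uses the standard attribute-closure fixpoint (an assumption here, since the handout does not give an algorithm) together with the fact that X → Y is in F+ exactly when Y is contained in the set of attributes determined by X. The names `attribute_closure` and `in_closure` are illustrative.

```python
def attribute_closure(X, fds):
    """All attributes determined by X under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an fd once its whole left-hand side has been derived.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def in_closure(X, Y, fds):
    """True if X -> Y can be deduced from fds."""
    return set(Y) <= attribute_closure(X, fds)

# The Student-Course fd's from the handout.
F = [
    (frozenset({"Id#"}), frozenset({"Name", "Address"})),
    (frozenset({"C#"}), frozenset({"Description"})),
    (frozenset({"Id#", "C#"}), frozenset({"Grade"})),
]

print(in_closure({"Address"}, {"Address"}, F))                 # True (trivial)
print(in_closure({"C#"}, {"Description"}, F))                  # True
print(in_closure({"C#"}, {"Description", "Name"}, F))          # False
print(in_closure({"C#", "Id#"}, {"Description", "Name"}, F))   # True
```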

Equivalence of fd sets
Defn. Two sets of fd’s, F and G, are equivalent if F+ = G+.
Example: {AB → C, A → B} and {A → C, A → B} are equivalent.
F+ contains a huge number of fd’s (exponential in the size of the scheme). One naturally looks for small equivalent fd sets.

Minimal Cover
Defn. A fd set F is minimal if
  1. Every fd in F is of the form X → A where A is a (single) attribute.
  2. For no X → A in F is F \ {X → A} equivalent to F.
  3. For no X → A in F and Z ⊂ X is (F \ {X → A}) ∪ {Z → A} equivalent to F.
Example (from previous slide): {A → C, A → B} is a minimal cover for {AB → C, A → B}.
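The three conditions suggest a three-step procedure: split right-hand sides, shrink left-hand sides, then drop redundant fd’s. The sketch below is one common way to do this and is not taken from the handout; `closure` and `minimal_cover` are illustrative names, and fd’s are assumed to be pairs of frozensets. Minimal covers are not unique, so the result depends on the order in which attributes and fd’s are tried.

```python
def closure(X, fds):
    """Attributes determined by X under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Condition 1: one attribute on every right-hand side.
    G = [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]
    # Condition 3: drop left-hand-side attributes that are not needed.
    reduced = []
    for lhs, rhs in G:
        lhs = set(lhs)
        for b in sorted(lhs):
            if rhs <= closure(lhs - {b}, G):
                lhs = lhs - {b}
        reduced.append((frozenset(lhs), rhs))
    # Condition 2: drop fd's that follow from the remaining ones.
    result = list(dict.fromkeys(reduced))  # remove duplicates, keep order
    for fd in list(result):
        rest = [g for g in result if g != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result

F = [(frozenset("AB"), frozenset("C")), (frozenset("A"), frozenset("B"))]
cover = minimal_cover(F)
for lhs, rhs in cover:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

On the handout’s example this yields A → C and A → B, matching the minimal cover stated above.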

More on closures
Fact. If F is a set of fd’s and X → Y ∉ F+ then there exists an attribute A ∈ Y s.t. X → A ∉ F+.
Proof. Assume otherwise. Let Y = {A1, ..., An}. Then X → A1, ..., X → An are in F+. Therefore, by the union rule, X → A1 ... An is in F+, i.e., X → Y is in F+, a contradiction.
Notation: F(X) for ∪ {Y | X → Y ∈ F+}

Why Armstrong’s Axioms?
Why are Armstrong’s axioms (or an equivalent rule set) appropriate for fd’s? They are consistent and complete.
“Consistent” means that any relation that satisfies the fd’s in F will satisfy the fd’s in F+.
“Complete” means that if an fd X → Y cannot be derived by Armstrong’s axioms from F, then there is a relational instance satisfying F but not X → Y.
In other words, Armstrong’s axioms derive all the fd’s that should hold.

Proof of consistency
This comes directly from the definition. Consider augmentation, for example. This says that if X → Y then XW → YW.
If a relation instance r satisfies X → Y then for any tuples t1, t2 ∈ r, if t1[X] = t2[X] then t1[Y] = t2[Y]. If, in addition, t1[W] = t2[W] then t1[YW] = t2[YW] (remember that we are using “sloppy” notation -- YW for Y ∪ W).

Proof of Completeness
To prove completeness we suppose X → Y ∉ F+ and construct a relation instance that satisfies F+ but not X → Y.
By our previous result, we know there is an attribute A ∈ Y such that X → A ∉ F+.
Our relation has 2 tuples. They agree on F(X) but disagree everywhere else.

  X               A      F(X) \ X        rest of R
  x1 x2 ... xn    a1,1   v1 v2 ... vm    w1,1 w2,1 ...
  x1 x2 ... xn    a1,2   v1 v2 ... vm    w1,2 w2,2 ...

Proof of Completeness cont’d
It is immediate that this relation fails to satisfy X → A and hence X → Y.
We also have to check that it satisfies every fd in F+. The tuples agree only on F(X). Thus the only fd’s that might be violated are of the form X’ → Y’ where X’ ⊆ F(X).
But if X’ → Y’ ∈ F+ and X’ ⊆ F(X) then Y’ ⊆ F(X) (since X → F(X) ∈ F+, reflexivity gives F(X) → X’ and transitivity gives X → Y’). Therefore X’ → Y’ is satisfied.

Decomposition
Consider our attribute set
  Data(Id#, Name, Address, C#, Description, Grade)
We could decompose it into
  R1(Id#, Name, Address)
  R2(C#, Description, Grade)
But this decomposition loses information about the relationship between students and courses. Why?

Lossless Join Decomposition
R1, …, Rk is a lossless join decomposition of R with respect to a fd set F if, for every instance r of R that satisfies F,
  π_R1(r) ⋈ ... ⋈ π_Rk(r) = r
Consider

  Id#   Name    Address   C#      Description   Grade
  124   Jones   Phila     Phil7   Plato         A
  789   Brown   Boston    Math8   Topology      C

What happens if we decompose on (Id#, Name, Address) and (C#, Description, Grade)?
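The question can be answered by simply carrying out the projections and the join. The sketch below assumes the two-row instance pairs up values row by row (124 with Phil7, 789 with Math8); the helpers `project` and `natural_join` are illustrative, not part of the handout.

```python
def project(r, attrs):
    """Projection of r (a list of dicts) onto attrs, as a set of (attr, value) tuples."""
    return {tuple((a, t[a]) for a in attrs) for t in r}

def natural_join(r1, r2):
    """Natural join of two projected relations; tuples must agree on shared attributes."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(tuple(sorted({**d1, **d2}.items())))
    return joined

r = [
    {"Id#": 124, "Name": "Jones", "Address": "Phila",
     "C#": "Phil7", "Description": "Plato", "Grade": "A"},
    {"Id#": 789, "Name": "Brown", "Address": "Boston",
     "C#": "Math8", "Description": "Topology", "Grade": "C"},
]

r1 = project(r, ["Id#", "Name", "Address"])
r2 = project(r, ["C#", "Description", "Grade"])
joined = natural_join(r1, r2)
original = {tuple(sorted(t.items())) for t in r}

print(len(original), len(joined))  # 2 4
print(original < joined)           # True: the join contains spurious tuples
```

Because the two schemes share no attribute, the join degenerates into a cross product: every student is paired with every course, and the information about who took what is gone.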

Testing for lossless join
Fact. R1, R2 is a lossless join decomposition of R with respect to F iff at least one of the following dependencies is in F+:
  (R1 ∩ R2) → R1 \ R2
  (R1 ∩ R2) → R2 \ R1
Example: WRT the fd set
  Id# → Name, Address
  C# → Description
  Id#, C# → Grade
is (Student, Name, Address) and (Student, C#, Description, Grade) a lossless decomposition?
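The test is easy to mechanize with an attribute closure. The sketch below is illustrative only: the `closure` helper is the standard fixpoint computation rather than something the handout defines, and it assumes that the attribute written “Student” in the example stands for Id#.

```python
def closure(X, fds):
    """Attributes determined by X under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(R1, R2, fds):
    """True if (R1 ∩ R2) -> R1 \\ R2 or (R1 ∩ R2) -> R2 \\ R1 is in F+."""
    determined = closure(R1 & R2, fds)
    return (R1 - R2) <= determined or (R2 - R1) <= determined

F = [
    (frozenset({"Id#"}), frozenset({"Name", "Address"})),
    (frozenset({"C#"}), frozenset({"Description"})),
    (frozenset({"Id#", "C#"}), frozenset({"Grade"})),
]

# Assumption: "Student" in the example is read as Id#.
print(is_lossless_binary({"Id#", "Name", "Address"},
                         {"Id#", "C#", "Description", "Grade"}, F))  # True
# The earlier decomposition with no shared attribute.
print(is_lossless_binary({"Id#", "Name", "Address"},
                         {"C#", "Description", "Grade"}, F))         # False
```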

Dependency preservation
Suppose we update a relation in a database. Can we easily check whether a fd X → Y is violated? We can if X ∪ Y is contained within the set of attributes of a single relation scheme.
The projection of an fd set F onto a set of attributes Z, F_Z, is
  {X → Y | X → Y ∈ F+ and X ∪ Y ⊆ Z}
A decomposition R1, …, Rk is dependency preserving if
  F+ = (F_R1 ∪ ... ∪ F_Rk)+
This means that the decomposition hasn’t “lost” any essential fd’s.

An example
A relation scheme: {Sname, Sadd, City, Zip, Item, Price}
A fd set:
  Sname → Sadd, City
  Sadd, City → Zip
  Sname, Item → Price
Consider the decomposition {Sname, Sadd, City, Zip} and {Sname, Item, Price}.
Is it lossless? Is it dependency preserving?
What if we replaced the first fd by Sname, Sadd → City?

Another example
The scheme: {Student, Teacher, Subject}
The fd set:
  Teacher → Subject
  Student, Subject → Teacher
The decomposition: {Student, Teacher} and {Teacher, Subject}
Is it lossless? Is it dependency preserving?
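Dependency preservation can be checked without materializing the projections F_R1, …, F_Rk. The sketch below uses a well-known iterative technique (not given in the handout): starting from the left-hand side of an fd, repeatedly add whatever can be derived while staying inside one scheme at a time. All function names are illustrative.

```python
def closure(X, fds):
    """Attributes determined by X under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def closure_under_projections(X, schemes, fds):
    """Closure of X using only fd's of F+ that fit inside a single scheme."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for S in schemes:
            gained = closure(result & S, fds) & S
            if not gained <= result:
                result |= gained
                changed = True
    return result

def preserves_dependencies(schemes, fds):
    return all(rhs <= closure_under_projections(lhs, schemes, fds)
               for lhs, rhs in fds)

teacher_fds = [
    (frozenset({"Teacher"}), frozenset({"Subject"})),
    (frozenset({"Student", "Subject"}), frozenset({"Teacher"})),
]
teacher_schemes = [{"Student", "Teacher"}, {"Teacher", "Subject"}]

supplier_fds = [
    (frozenset({"Sname"}), frozenset({"Sadd", "City"})),
    (frozenset({"Sadd", "City"}), frozenset({"Zip"})),
    (frozenset({"Sname", "Item"}), frozenset({"Price"})),
]
supplier_schemes = [{"Sname", "Sadd", "City", "Zip"}, {"Sname", "Item", "Price"}]

print(preserves_dependencies(teacher_schemes, teacher_fds))    # False
print(preserves_dependencies(supplier_schemes, supplier_fds))  # True
```

In the teacher example the decomposition is lossless (the shared attribute Teacher determines Subject), but Student, Subject → Teacher can no longer be checked within a single scheme.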

Fd’s and keys
Earlier we stated that the idea in relational database design (from fd’s) is to obtain a design such that for each nontrivial dependency X → Y, X is a super-key for some relation scheme in R.
The last example shows that this cannot always be achieved in a way that preserves dependencies.
This leads to two notions of normal forms.

Normal forms
Boyce-Codd Normal Form (BCNF). For every relation scheme R and for every X → A that holds over R, either
  A ∈ X (it is trivial), or
  X is a superkey for R.
Third Normal Form (3NF). For every relation scheme R and for every X → A that holds over R, either
  A ∈ X (it is trivial), or
  X is a superkey for R, or
  A is a member of some key for R.

Normal Forms contd.
BCNF is clearly desirable, but the teacher/student/subject example shows that it is not always obtainable.
BCNF is stronger than 3NF.
There are algorithms to obtain
  a BCNF lossless join decomposition,
  a 3NF lossless join, dependency preserving decomposition.
The 3NF algorithm uses a minimal cover.

BCNF Decomposition Algorithm
RES := {R}   // R = set of all attributes
while there is a scheme S in RES that is not in BCNF do
begin
  let A → B be a nontrivial functional dependency that holds on S
    such that A → S is not in F+ and A and B are disjoint
  RES := (RES - {S}) ∪ {S - B} ∪ {AB}
end
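A runnable sketch of the loop above follows. Two choices here are assumptions rather than part of the handout: violations are found by brute force over subsets of S (exponential, fine for small schemes), and B is taken to be everything in S that A determines outside A. Different choices of A → B can give different, equally valid decompositions.

```python
from itertools import combinations

def closure(X, fds):
    """Attributes determined by X under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(S, fds):
    """Return (A, B) with A -> B nontrivial on S and A not a superkey of S, else None."""
    attrs = sorted(S)
    for size in range(1, len(attrs)):
        for A in combinations(attrs, size):
            A = set(A)
            B = (closure(A, fds) & S) - A
            if B and A | B != S:
                return A, B
    return None

def bcnf_decompose(R, fds):
    pending, done = [set(R)], []
    while pending:
        S = pending.pop()
        violation = find_violation(S, fds)
        if violation is None:
            done.append(S)
        else:
            A, B = violation
            pending.append(S - B)   # S - B
            pending.append(A | B)   # AB
    return done

F = [
    (frozenset({"Id#"}), frozenset({"Name", "Address"})),
    (frozenset({"C#"}), frozenset({"Description"})),
    (frozenset({"Id#", "C#"}), frozenset({"Grade"})),
]
Data = {"Id#", "Name", "Address", "C#", "Description", "Grade"}
result = bcnf_decompose(Data, F)
for scheme in result:
    print(sorted(scheme))
```

On the Data scheme this produces the three schemes Student(Id#, Name, Address), Course(C#, Description) and Enrolled(Id#, C#, Grade) from the start of the handout.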

3NF Decomposition Algorithm
let F be a minimal cover
RES := {}
for each A → B in F do
  if none of the schemes in RES contains AB then RES := RES ∪ {AB}
if none of the schemes in RES contains a candidate key for R then
  RES := RES ∪ {any candidate key for R}
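The steps above can be sketched as follows, assuming the input fd set is already a minimal cover. The helper names are illustrative; a scheme “contains a candidate key” exactly when it is a superkey, which is what the closure test checks, and the candidate key itself is found by brute force. The second example scheme is hypothetical and is there only to exercise the final step.

```python
from itertools import combinations

def closure(X, fds):
    """Attributes determined by X under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_key(R, fds):
    """A smallest attribute set whose closure is all of R (brute force)."""
    attrs = sorted(R)
    for size in range(len(attrs) + 1):
        for K in combinations(attrs, size):
            if closure(K, fds) >= set(R):
                return set(K)

def synthesize_3nf(R, cover):
    res = []
    for lhs, rhs in cover:
        AB = set(lhs) | set(rhs)
        if not any(AB <= S for S in res):
            res.append(AB)
    # A scheme contains a candidate key iff it is a superkey of R.
    if not any(closure(S, cover) >= set(R) for S in res):
        res.append(candidate_key(R, cover))
    return res

# Teacher example from the handout (its fd set is already minimal).
teacher = synthesize_3nf(
    {"Student", "Teacher", "Subject"},
    [(frozenset({"Teacher"}), frozenset({"Subject"})),
     (frozenset({"Student", "Subject"}), frozenset({"Teacher"}))],
)
print([sorted(s) for s in teacher])

# Hypothetical scheme where no fd mentions a key, so one must be added.
no_key = synthesize_3nf(
    {"Id#", "Name", "C#", "Description"},
    [(frozenset({"Id#"}), frozenset({"Name"})),
     (frozenset({"C#"}), frozenset({"Description"}))],
)
print([sorted(s) for s in no_key])
```

Following the handout’s algorithm literally, the teacher example keeps {Teacher, Subject} even though it is contained in {Student, Subject, Teacher}; the result is in 3NF but not in BCNF.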