Functional Dependencies (FDs)

A functional dependency on a relation means that some attribute is a function of a group of other attributes: A4 = f(A1, A2, A3). Functions are single-valued, so if two tuples agree on A1, A2, A3, then they must also agree on A4. This forced agreement is the functional dependency.

The function may also produce multiple attributes: (A4, A5, A6) = f(A1, A2, A3). We write A1, A2, A3 -> A4, A5, A6.

Keys of Relations

A set of attributes (A1, A2, A3) is a key of a relation if:
A1, A2, A3 -> all other attributes
No proper subset of (A1, A2, A3) -> all other attributes

A key must be minimal: we can't throw out an attribute from the key and have it remain a key. In other words, a key is a minimal set of attributes that functionally determines all the other attributes of a relation. Minimal does not mean fewest attributes; a relation may have several keys of different sizes. In ER, keys need not be minimal. In the relational data model, keys must be minimal.

Superkey

A superkey is a set of attributes that contains a key.

Keys of relations that come from an E-R design:
Entity set (ES): the relation's key is the key attributes of the ES
E-R-F (many-many): R's key contains the key attributes from both E and F
E-R->F (many-one): R's key contains the key attributes only from E
E<-R->F (one-one): R's key contains the key attributes from either E or F (the choice is not unique)

Rules for FDs and Reasoning about Them

A1, A2, A3 -> B1, B2, B3 is shorthand for the three FDs
A1, A2, A3 -> B1
A1, A2, A3 -> B2
A1, A2, A3 -> B3

Splitting Rule: going from the combined form (several attributes on the right) to the list of FDs with a single attribute on the right.
Combining Rule: going from the single-attribute forms back to the combined form.

Trivial Dependencies

An FD A1, A2 … An -> B is said to be trivial if B is one of the A's. For example, title, year -> title.

In general, A1, A2, A3 -> B1, B2, B3 is:
Trivial if the B's are a subset of the A's
Nontrivial if at least one B is not an A
Completely nontrivial if no B is an A

Any trivial dependency holds in every relation, so it can always be assumed.

Computing the Closure of Attributes

We are given a set of attributes A = {A1, A2, A3} and a set of FDs S. We can think of A as a subset of the attributes of a relation R, and of S as FDs of that relation R. The closure of A under S, written {A1, A2, A3}+, is the set of attributes B such that every relation that satisfies the FDs in S also satisfies A1, A2, A3 -> B. (We want to compute the B's!)

Start with a set T = A. Apply every FD in S whose left-hand side is contained in T; its right-hand side gives you new attributes. Union those attributes into T. Repeat until no FD introduces a new attribute. Then T is the closure of A under S.
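
The loop just described can be sketched in a few lines of Python. This is only an illustration: the name `closure` and the choice to represent an FD as a pair of Python sets are assumptions made here, not something from the slides.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a set of attribute names.
    """
    result = set(attrs)              # T = A
    changed = True
    while changed:                   # repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            # An FD applies when its whole left side is already in T.
            if lhs <= result and not rhs <= result:
                result |= rhs        # union the right side into T
                changed = True
    return result


# FDs of the Candidate example schema from these slides.
CANDIDATE_FDS = [
    ({"lastname", "firstname", "address"}, {"party"}),
    ({"party"}, {"partychair"}),
    ({"party"}, {"partyaddress"}),
]

print(sorted(closure({"party"}, CANDIDATE_FDS)))
# ['party', 'partyaddress', 'partychair']
```

The repeated scan over all FDs is the simplest version, not the fastest; it is enough for schemas of classroom size.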

Given closure, we can now determine whether a new FD follows from existing FDs.

FOLLOWS-FROM TEST: to test A1, A2, A3 -> B, find {A1, A2, A3}+. If it contains B, then A1, A2, A3 -> B follows from the FDs in S; otherwise it does not.
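
As a small sketch of the follows-from test, assuming the same set-pair representation of FDs (the names `closure` and `follows` are illustrative only):

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def follows(lhs, rhs, fds):
    """True if lhs -> rhs follows from `fds`: rhs must lie inside lhs+."""
    return set(rhs) <= closure(lhs, fds)


# Hypothetical FDs: A -> B and B -> C.
FDS = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(follows({"A"}, {"C"}, FDS))   # True  (by transitivity)
print(follows({"C"}, {"A"}, FDS))   # False
```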

Closure and Keys

{A1, A2, A3}+ contains all the attributes of R iff A1, A2, A3 is a superkey for R. Given a superkey, we can test its subsets to see if they are also superkeys. If no proper subset is a superkey, then the superkey must be a key. The proof of why closure works is omitted.
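
One way to turn this into a key finder is to try attribute sets from smallest to largest and keep every superkey that does not contain a key already found. The sketch below does exactly that; the function names are made up for illustration, and the brute-force search is exponential in the number of attributes, so it only suits small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all keys (minimal superkeys) of a relation, smallest first."""
    attributes = set(attributes)
    keys = []
    for size in range(len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # Skip anything that contains a smaller key: it is not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:   # superkey test
                keys.append(candidate)
    return keys


# The Offices example that appears in the 3NF slides.
OFFICE_ATTRS = {"office", "pollingplace", "city"}
OFFICE_FDS = [
    ({"pollingplace"}, {"city"}),
    ({"office", "city"}, {"pollingplace"}),
]

print(candidate_keys(OFFICE_ATTRS, OFFICE_FDS))
```

For Offices this finds the two keys named in the slides: {office, city} and {office, pollingplace}.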

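Transitivity

If A1, A2, A3 -> B1, B2 and B1, B2 -> C1, C2, then A1, A2, A3 -> C1, C2.

GIVEN versus DERIVED FDs: given FDs are the ones stated for the relation; derived FDs follow from them by rules such as transitivity.
BASIS: a set of FDs from which all the FDs of the relation can be derived.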

PROJECTION

Suppose we have a relation R with some FDs F, and we “project” R by eliminating certain attributes from the schema to get S. What FDs hold in S? Compute all FDs that follow from F and involve only attributes of S. In general this means taking the closure of every subset of S's attributes, so the calculation is exponential in the number of attributes, and many of the resulting FDs may be redundant since they follow from other such FDs.
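
A brute-force sketch of this calculation, under the same assumptions as before (FDs as pairs of sets; `project_fds` is a name invented here). It takes the closure of every nonempty subset of S's attributes and keeps the part that stays inside S. It makes no attempt to remove redundant FDs.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(sub_attrs, fds):
    """Nontrivial FDs that follow from `fds` and mention only `sub_attrs`."""
    sub_attrs = set(sub_attrs)
    projected = []
    for size in range(1, len(sub_attrs) + 1):
        for combo in combinations(sorted(sub_attrs), size):
            lhs = set(combo)
            # Keep only new attributes that survive the projection.
            rhs = (closure(lhs, fds) & sub_attrs) - lhs
            if rhs:
                projected.append((lhs, rhs))
    return projected


# Hypothetical relation R(A, B, C) with A -> B and B -> C, projected onto (A, C).
FDS = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(project_fds({"A", "C"}, FDS))   # [({'A'}, {'C'})]
```

Here B has been projected away, but the derived FD A -> C still holds in the projection.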

All Inference Rules

Reflexivity: if {B1, B2, B3} is a subset of {A1, A2, A3}, then A1, A2, A3 -> B1, B2, B3
Augmentation: if A1, A2, A3 -> B1, B2, B3, then A1, A2, A3, C1, C2 -> B1, B2, B3, C1, C2 for all C1, C2
Transitivity: if A1, A2 -> B1, B2 and B1, B2 -> C1, C2, then A1, A2 -> C1, C2

Relational Data Model (schema design and normal forms)

Recall that a schema is a template for a relation, and a relation is just a table, as we saw in our SQL days at the beginning of class. Here, however, we use the original relational model, in which a relation is a set and not a bag.

Projection Rehash

Given a relation R and its FDs F (A -> B = “A determines B”):
We can use the FDs to find the keys and superkeys.
We can talk about a minimal basis of FDs from which we can derive all the others using our inference rules.
We can use closure to figure out everything the FDs let us know, given some initial set of knowns.

But what does this mean when it comes to the data in the relation?

Let’s say you have two tuples (a, b, c1) and (a, b, c2), and you project out the third attribute. Now we have (a, b), (a, b). But you no longer have a set, because you have duplicates. So projection means that you turn the result back into a set. Multiple tuples in the original relation can get flattened into a single tuple in the projected relation.
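
In Python, a set of tuples behaves the same way, which makes for a compact illustration. The helper `project` and the attribute names X, Y, Z are placeholders chosen for this sketch.

```python
def project(relation, schema, keep):
    """Project a set of tuples onto the attributes in `keep`.

    Building a Python set removes duplicates, as set semantics require.
    """
    positions = [schema.index(attr) for attr in keep]
    return {tuple(row[i] for i in positions) for row in relation}


schema = ("X", "Y", "Z")
relation = {("a", "b", "c1"), ("a", "b", "c2")}

print(project(relation, schema, ("X", "Y")))   # {('a', 'b')}
```

Two tuples went in and one came out: the duplicate (a, b) was flattened away.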

What about FDs? We would expect that fewer FDs (or the same number) hold in the projected relation than in the original one. Which are they? They are the ones that follow from the original FDs but involve only attributes of the new relation.

Design

After we have finished converting ER to relations, or if we started with relations in the first place, we have a set of relations, each with a set of associated FDs. Now we’ll talk about what it means to “normalize” each of those relations.

Normalization

Take each relation and its FDs and convert it into a set of relations, each with its own FDs, such that the relations make it easy to avoid anomalies. Avoiding anomalies boils down to avoiding redundancy and duplication while still preserving all the meaning in the originals.

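There’s a range of “normal forms” for a “relation schema”, ordered from strictest to weakest: 5NF >= 4NF >= BCNF >= 3NF >= 2NF >= 1NF. A schema in one form is also in every form to its right. The book talks about BCNF first, then 3NF, and then 4NF. The handout talks about the other normal forms.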

Example schema

Candidate(lastname, firstname, address, party, partychair, partyaddress)

lastname, firstname, address -> party, partychair
party -> partychair
party -> partyaddress

We will pretend that (lastname, firstname, address) is the key.

Anomalies

There are three kinds of problem: redundancy, update anomalies, and deletion anomalies.

Redundancy: information is repeated in more than one tuple.
Update anomalies: what if information is repeated in more than one tuple, but when you update one tuple you forget to update the others? (You have broken an FD.)
Deletion anomalies: what if information is repeated in more than one tuple and you delete *all* of those tuples? (Where is the information now? We may lose other information as a side effect.)

Decomposing Relations

Take some relation and turn it into two relations: R(A1, A2…An) can be decomposed into S(B1, B2…Bm) and T(C1, C2…Ck). If we union the attributes of the two relations, we get the set of attributes of the original relation: {B1, B2…Bm} U {C1, C2…Ck} = {A1, A2…An}. The tuples in S are the projections of the tuples of R onto B1, B2…Bm. This means there could be fewer tuples in S than in R. Similarly for T.

BCNF

A relation R is in BCNF iff whenever there is a nontrivial FD A1, A2…An -> B for R, it is the case that {A1, A2…An} is a superkey for R. Recall that a superkey need not be minimal, so an equivalent statement of BCNF is that the left side of every nontrivial FD must contain a key.
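
The BCNF condition can be checked mechanically with closure. The sketch below (function names are illustrative) looks only at the FDs as given, which is enough to decide whether the relation as stated is in BCNF; for a relation produced by decomposition you would first need its projected FDs.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the given FDs that are nontrivial and whose left side is not a superkey."""
    attributes = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]


CANDIDATE_ATTRS = {"lastname", "firstname", "address",
                   "party", "partychair", "partyaddress"}
CANDIDATE_FDS = [
    ({"lastname", "firstname", "address"}, {"party"}),
    ({"party"}, {"partychair"}),
    ({"party"}, {"partyaddress"}),
]

for lhs, rhs in bcnf_violations(CANDIDATE_ATTRS, CANDIDATE_FDS):
    print(sorted(lhs), "->", sorted(rhs))
# ['party'] -> ['partychair']
# ['party'] -> ['partyaddress']
```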

How do we convert a relation into BCNF? We don’t. We decompose it into relations that are each separately in BCNF, then we project the tuples into those relations. For example, in Candidate, party -> partychair is a violating FD, since party is not a superkey. How do we decompose?

Make two new schemas by PROJECTION

Step 1: augment the violating FD with the right-hand sides of any other FDs that are determined by its left-hand side or a subset of it. We have party -> partychair and party -> partyaddress, so we’ll create party -> partychair, partyaddress.
Step 2: make one schema that contains all the attributes of the FD: Parties(party, partychair, partyaddress).
Step 3: make a second schema that contains all the attributes of the original relation except those on the right-hand side of the FD: Candidates(last, first, address, party).
Step 4: use the projection scheme to find the FDs of each.
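
A sketch of one decomposition step, assuming FDs are pairs of sets and using the closure of the violating left side to do the "augment" part (the name `bcnf_split` is made up here). It returns only the two attribute sets; finding the FDs of each new schema is a separate projection step.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_split(attributes, fds, violating_lhs):
    """Split a schema on a violating FD; return the two new attribute sets."""
    attributes = set(attributes)
    lhs = set(violating_lhs)
    # Augmented FD: lhs -> everything it determines inside this relation.
    determined = closure(lhs, fds) & attributes
    first = determined                          # all attributes of the FD
    second = attributes - (determined - lhs)    # original minus the right side
    return first, second


CANDIDATE_ATTRS = {"lastname", "firstname", "address",
                   "party", "partychair", "partyaddress"}
CANDIDATE_FDS = [
    ({"lastname", "firstname", "address"}, {"party"}),
    ({"party"}, {"partychair"}),
    ({"party"}, {"partyaddress"}),
]

parties, candidates = bcnf_split(CANDIDATE_ATTRS, CANDIDATE_FDS, {"party"})
print(sorted(parties))      # ['party', 'partyaddress', 'partychair']
print(sorted(candidates))   # ['address', 'firstname', 'lastname', 'party']
```

The left side (party) stays in both schemas; it is the attribute the two relations will later be joined on.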

Getting Stuff Back (Joins)

Pick some tuple in Candidates. We now know the party. Use party to find the matching tuple in Parties (there is just one) and append its attributes to the Candidates tuple. Now we have the original tuple back.

Alternatively, pick some tuple in Parties. Now we have a party. Find all matching tuples in Candidates and append the Parties tuple to each. Doing this for every party gives us the original set of tuples back.

This is sometimes called an “equijoin”, because you are joining tuples in Parties to tuples in Candidates that match (are equal) on party. If we decompose a relation according to the above algorithm, we are guaranteed that we can always recover the original relation through the join.
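
To see the recovery on actual data, here is a small sketch with rows as Python dicts. The sample rows are invented for illustration, and `project` and `equijoin` are hypothetical helper names, not library functions.

```python
def project(rows, attrs):
    """Project dict rows onto `attrs`, removing duplicates (set semantics)."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]


def equijoin(left, right, on):
    """Pair up rows from both sides whose `on` attribute is equal."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


def as_set(rows):
    """Order-independent view of a list of dict rows, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


# Made-up sample data for the Candidate relation.
original = [
    {"last": "Smith", "first": "Ann", "address": "1 Elm St", "party": "Blue",
     "partychair": "Lee", "partyaddress": "10 Main St"},
    {"last": "Jones", "first": "Bob", "address": "2 Oak St", "party": "Blue",
     "partychair": "Lee", "partyaddress": "10 Main St"},
    {"last": "Diaz", "first": "Eva", "address": "3 Ash St", "party": "Green",
     "partychair": "Kim", "partyaddress": "22 Hill Rd"},
]

candidates = project(original, ("last", "first", "address", "party"))
parties = project(original, ("party", "partychair", "partyaddress"))
rejoined = equijoin(candidates, parties, "party")

print(len(parties))                            # 2
print(as_set(rejoined) == as_set(original))    # True
```

Parties holds each party's chair and address once, and the join on party brings back exactly the three original tuples.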

3NF

Compared with BCNF, the condition is relaxed a bit. A relation R is in 3NF if, whenever A1, A2…An -> B is a nontrivial FD, either {A1, A2…An} is a superkey or B is a member of some key.

Example: we want to convert this schema into BCNF:

Offices(office, pollingplace, city), with
pollingplace -> city
office, city -> pollingplace

“office, city” is clearly a key, and “pollingplace, office” is also a key, since pollingplace determines city. So pollingplace -> city is a BCNF violation, because pollingplace is not a superkey. We split into (pollingplace, city) and (office, pollingplace). But this breaks the original FD office, city -> pollingplace: its attributes are now spread over two relations, so it can no longer be checked within a single relation. Now let’s look at the offending FD in the context of 3NF.

We would say that it’s OK because “city” is part of the key “office, city”; hence pollingplace -> city is acceptable even though pollingplace is not a superkey. The key point here is that 3NF relaxes BCNF’s requirement to allow relation schemas like the one above. It can be proved that 3NF is in fact adequate for its purpose: we can always decompose a relation schema, in a way that does not lose information, into schemas that are in 3NF and allow all FDs to be checked.
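
The 3NF test adds one escape clause to the BCNF test: a right-side attribute that belongs to some key is allowed. A sketch, with illustrative function names and a brute-force key search that only suits small schemas:

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal superkeys, found by trying subsets from smallest to largest."""
    keys = []
    for size in range(len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


def third_nf_violations(attributes, fds):
    """(lhs, attribute) pairs where lhs is no superkey and attribute is in no key."""
    attributes = set(attributes)
    in_some_key = set().union(*candidate_keys(attributes, fds))
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == attributes:
            continue                        # left side is a superkey: fine
        for attr in sorted(rhs - lhs):      # nontrivial part of the right side
            if attr not in in_some_key:
                violations.append((lhs, attr))
    return violations


OFFICE_ATTRS = {"office", "pollingplace", "city"}
OFFICE_FDS = [
    ({"pollingplace"}, {"city"}),
    ({"office", "city"}, {"pollingplace"}),
]

print(third_nf_violations(OFFICE_ATTRS, OFFICE_FDS))   # []
```

Offices passes because city belongs to the key {office, city}, even though pollingplace -> city fails the BCNF test.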

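Other normal forms

1NF: every component of every tuple must be atomic. No structures or complex data, not even arrays. (SQL forces this.)
2NF: 1NF + no nonkey attribute is determined by a proper subset of a key. The whole key determines all the nonkey attributes.
3NF: 2NF + every nontrivial FD has a superkey on the left or a member of some key on the right.
BCNF: 3NF + the left side of every nontrivial FD is a superkey.
4NF: BCNF + multivalued dependencies.
5NF (join-projection normal form, JPNF): stricter than 4NF. A relation in 3NF whose keys are all single attributes is already in 5NF.
DKNF (domain-key normal form): may be impossible to achieve.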

Multivalued Dependencies

A multivalued dependency (MVD) is an assertion that two attributes or sets of attributes are independent of one another. We say that the MVD A1A2…An ->-> B1B2…Bm holds for a relation R if, viewing each tuple as (A1A2…An C1C2…Ck B1B2…Bm), once we pick an (A1A2…An) value, the set of (B1B2…Bm) values that appear with it is independent of the (C1C2…Ck) values.
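
Independence here means that whenever two tuples agree on the A's, swapping their B values must give tuples that are also in the relation. The sketch below checks that on one relation instance; the function name and the sample data are invented for illustration. A single instance can show that an MVD fails, but it cannot prove that the MVD holds for the schema in general.

```python
def mvd_holds(rows, schema, lhs, rhs):
    """Check lhs ->-> rhs on one instance (a list of dict rows)."""
    lhs, rhs = set(lhs), set(rhs)
    present = {tuple(row[a] for a in schema) for row in rows}
    for t in rows:
        for u in rows:
            if all(t[a] == u[a] for a in lhs):
                # Take the A's and B's from t and everything else from u.
                mixed = tuple(
                    t[a] if (a in lhs or a in rhs) else u[a] for a in schema
                )
                if mixed not in present:
                    return False
    return True


schema = ("A", "B", "C")
partial = [
    {"A": "a", "B": "b1", "C": "c1"},
    {"A": "a", "B": "b2", "C": "c2"},
]
# Add the two missing combinations so B and C vary independently.
full = partial + [
    {"A": "a", "B": "b1", "C": "c2"},
    {"A": "a", "B": "b2", "C": "c1"},
]

print(mvd_holds(partial, schema, {"A"}, {"B"}))   # False
print(mvd_holds(full, schema, {"A"}, {"B"}))      # True
```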

Reasoning about multivalued dependencies

Trivial dependencies rule: if the MVD A1A2…An ->-> B1B2…Bm holds for a relation, then so does A1A2…An ->-> C1C2…Ck, where the C’s are the B’s plus one or more of the A’s. Conversely, we may remove from the B’s any attributes that are also A’s: A1A2…An ->-> D1D2…Dr also holds, where the D’s are those B’s that are not among the A’s.

Transitive rule: if the MVDs A1A2…An ->-> B1B2…Bm and B1B2…Bm ->-> C1C2…Ck hold for a relation, then so does A1A2…An ->-> C1C2…Ck. However, any C’s that are also B’s must be deleted from the right side.

Complementation rule: if the MVD A1A2…An ->-> B1B2…Bm holds for a relation R, then R also satisfies A1A2…An ->-> C1C2…Ck, where the C’s are all the attributes of R not among the A’s and B’s.