Lecture 09: Functional Dependencies, Database Design

Slides:



Advertisements
Similar presentations
Lecture 6: Design Constraints and Functional Dependencies January 21st, 2004.
Advertisements

Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science San Jose State University.
Functional Dependencies Definition: If two tuples agree on the attributes A, A, … A 12n then they must also agree on the attributes B, B, … B 12m Formally:
1 Introduction to Database Systems CSE 444 Lectures 8 & 9 Database Design April 16 & 18, 2008.
Lecture #3 Functional Dependencies Normalization Relational Algebra Thursday, October 12, 2000.
Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science San Jose State University.
1 Lecture 06 The Relational Data Model. 2 Outline Relational Data Model Functional Dependencies FDs in ER Logical Schema Design Reading Chapter 8.
M.P. Johnson, DBMS, Stern/NYU, Spring C : Database Management Systems Lecture #7 Matthew P. Johnson Stern School of Business, NYU Spring,
Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee.
Functional Dependencies CS 186, Spring 2006, Lecture 21 R&G Chapter 19 Science is the knowledge of consequences, and dependence of one fact upon another.
Functional Dependencies and Relational Schema Design.
E/R Diagrams and Functional Dependencies. Modeling Subclasses The world is inherently hierarchical. Some entities are special cases of others We need.
Lecture 08: E/R Diagrams and Functional Dependencies.
Lecture 09: Functional Dependencies. Outline Functional dependencies (3.4) Rules about FDs (3.5) Design of a Relational schema (3.6)
Functional Dependencies. FarkasCSCE 5202 Reading and Exercises Database Systems- The Complete Book: Chapter 3.1, 3.2, 3.3., 3.4 Following lecture slides.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
Functional Dependencies. Outline Functional dependencies (3.4) Rules about FDs (3.5) Design of a Relational schema (3.6)
1 Lecture 7: Normal Forms, Relational Algebra Monday, 10/15/2001.
Functional Dependencies R&G Chapter 19 Science is the knowledge of consequences, and dependence of one fact upon another. Thomas Hobbes ( )
Functional Dependencies and Relational Schema Design.
Lecture 3: Conceptual Database Design and Schema Design April 12 th, 2004.
1 Lecture 08: E/R Diagrams and Functional Dependencies Friday, January 21, 2005.
CS542 1 Schema Refinement Chapter 19 (part 1) Functional Dependencies.
CS411 Database Systems Kazuhiro Minami 04: Relational Schema Design.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
1 Lecture 9: Database Design Wednesday, January 25, 2006.
Formal definition of a key A key is a set of attributes A 1,..., A n such that for any other attribute B: A 1,..., A n  B A minimal key is a set of attributes.
Lecture 11: Functional Dependencies
CS422 Principles of Database Systems Normalization
Functional Dependencies
COP4710 Database Systems Relational Design.
CS422 Principles of Database Systems Normalization
3.1 Functional Dependencies
Handout 4 Functional Dependencies
Schema Refinement & Normalization Theory
Functional Dependencies
Problems in Designing Schema
Introduction to Database Systems CSE 544
Cse 344 May 18th – Loss and views.
Lecture 6: Design Theory
Faloutsos & Pavlo SCS /615 Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization.
Normalization Murali Mani.
Functional Dependencies and Normalization
Cse 344 May 11th – Entities.
Functional Dependencies and Normalization
Lecture 2: Database Modeling (end) The Relational Data Model
Cse 344 May 16th – Normalization.
Functional Dependencies and Relational Schema Design
Functional Dependencies
Introduction to Database Systems CSE 444 Lectures 8 & 9 Database Design October 12 & 15, 2007.
Lecture 8: Database Design
Functional Dependencies
Lecture 07: E/R Diagrams and Functional Dependencies
Functional Dependencies
Normalization cs3431.
Instructor: Mohamed Eltabakh
CSE544 Data Modeling, Conceptual Design
Lectures 13: Design Theory II
Functional Dependencies
Lecture 06: SQL Monday, October 11, 2004.
Chapter 19 (part 1) Functional Dependencies
Functional Dependencies
Terminology Product Attribute names Name Price Category Manufacturer
Lecture 08: E/R Diagrams and Functional Dependencies
Lecture 6: Functional Dependencies
Lecture 11: Functional Dependencies
Lecture 09: Functional Dependencies
CS4222 Principles of Database System
Presentation transcript:

Lecture 09: Functional Dependencies, Database Design Friday, October 17, 2003

Outline Functional dependencies (3.4) Rules about FDs (3.5) Design of a Relational schema (3.6)

Functional Dependencies Definition: A1, ..., Am  B1, ..., Bn holds in R if: t, t’  R, (t.A1=t’.A1  ...  t.Am=t’.Am  t.B1=t’.B1  ...  t.Bm=t’.Bm ) R A1 ... Am B1 Bm t if t, t’ agree here then t, t’ agree here t’

Inference If some FDs are satisfied, then others are satisfied too name  color category  department color, category  price If all these FDs are true: Then this FD also holds: name, category  price Why ??

Inference Rules for FD’s A1, A2, …, An  B1, B2, …, Bm Splitting rule and Combing rule Is equivalent to A1 ... Am B1 Bm A1, A2, …, An  B1 A1, A2, …, An  B2 . . . . . A1, A2, …, An  Bm

Inference Rules for FD’s (continued) Trivial Rule A1, A2, …, An  Ai where i = 1, 2, ..., n A1 … Am Why ?

Inference Rules for FD’s (continued) Transitive Closure Rule If A1, A2, …, An  B1, B2, …, Bm Why ? and B1, B2, …, Bm  C1, C2, …, Cp then A1, A2, …, An  C1, C2, …, Cp A1 … Am B1 Bm C1 ... Cp

Example (continued) Which Rule did we apply ? Inferred FD 1. name  color 2. category  department 3. color, category  price Start from the following FDs: Infer the following FDs: Inferred FD Which Rule did we apply ? 4. name, category  name 5. name, category  color 6. name, category  category 7. name, category  color, category 8. name, category  price

Example (continued) 1. name  color 2. category  department 3. color, category  price Answers: Inferred FD Which Rule did we apply ? 4. name, category  name Trivial rule 5. name, category  color Transitivity on 4, 1 6. name, category  category 7. name, category  color, category Split/combine on 5, 6 8. name, category  price Transitivity on 3, 7

Another Example Enrollment(student, major, course, room, time) course  time What else can we infer ? [in class, or at home]

Another Rule Augmentation A1, A2, …, An  B If then A1, A2, …, An , C1, C2, …, Cp  B Augmentation follows from trivial rules and transitivity How ?

Problem: infer ALL FDs Given a set of FDs, infer all possible FDs How to proceed ? Try all possible FDs, apply all 3 rules E.g. R(A, B, C, D): how many FDs are possible ? Drop trivial FDs, drop augmented FDs Still way too many Better: use the Closure Algorithm (next)

Closure of a set of Attributes Given a set of attributes A1, …, An The closure, {A1, …, An}+ , is the set of attributes B s.t. A1, …, An  B Example: name  color category  department color, category  price Closures: name+ = {name, color} {name, category}+ = {name, category, color, department, price} color+ = {color}

Closure Algorithm Start with X={A1, …, An}. Example: Repeat until X doesn’t change do: if B1, …, Bn  C is a FD and B1, …, Bn are all in X then add C to X. Example: name  color category  department color, category  price {name, category}+ = {name, category, color, department, price}

Example In class: A, B  C R(A,B,C,D,E,F) A, D  E B  D A, F  B Compute {A,B}+ X = {A, B, } Compute {A, F}+ X = {A, F, }

Using Closure to Infer ALL FDs Example: A, B  C A, D  B B  D Step 1: Compute X+, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ = ABCD, AC+ = AC, AD+ = ABCD ABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?) BCD+ = BCD, ABCD+ = ABCD Step 2: Enumerate all FD’s X  Y, s.t. Y  X+ and XY = : AB  CD, ADBC, ABC  D, ABD  C, ACD  B

Problem: find FD from the data Given a database instance Find all FD’s satisfied by that instance Useful if we don’t get enough information from our users: need to reverse engineer a data instance

In Class: Find All FDs Student Dept Course Room Alice CSE C++ 020 Bob EE HW 040 Carol DB 045 Dan Java 050 Elsa Frank Circuits Do all FDs make sense in practice ? Course  Dept, Room Dept, Room  Course Student, Dept  Course, Room Student, Course  Dept, Room Student, Room  Dept, Course

Answer Course  Dept, Room Do all FDs make sense in practice ? Dept, Room  Course Student, Dept  Course, Room Student, Course  Dept, Room Student, Room  Dept, Course Do all FDs make sense in practice ?

Keys A key is a set of attributes A1, ..., An s.t. for any other attribute B, we have A1, ..., An  B A minimal key is a set of attributes which is a key and for which no subset is a key Note: book calls them superkey and key

Computing Keys Compute X+ for all sets X If X+ = all attributes, then X is a key List only the minimal keys Note: there can be many minimal keys ! Example: R(A,B,C), ABC, BCA Minimal keys: AB and BC

Examples of Keys Product(name, price, category, color) name, category  price category  color Keys are: {name, category} and all supersets Enrollment(student, address, course, room, time) student  address room, time  course student, course  room, time Keys are: [in class] Keys: {student, room, time}, {student, course} and all supersets

FD’s for E/R Diagrams Given a relation constructed from an E/R diagram, what is its key? Rule 1: If the relation comes from an entity set, the key of the relation is the set of attributes which is the key of the entity set. address name ssn Person Person(address, name, ssn)

FD’s for E/R Diagrams buys(name, ssn, date) Rule 2: If the relation comes from a many-many relationship, the key of the relation is the set of all attribute keys in the relations corresponding to the entity sets name buys Person Product price name ssn date buys(name, ssn, date)

FD’s for E/R Diagrams Purchase(name , sname, ssn, card-no) Except: if there is an arrow from the relationship to E, then we don’t need the key of E as part of the relation key. Purchase Product Person Store CreditCard sname name card-no ssn Purchase(name , sname, ssn, card-no)

FD’s for E/R Diagrams More rules: Many-one, one-many, one-one relationships Multi-way relationships Weak entity sets (Try to find them yourself, or check book)

Incomplete (what does it say ?) FD’s for E/R Diagrams Say: “the CreditCard determines the Person” Product sname Purchase name Store Incomplete (what does it say ?) card-no CreditCard Person ssn Purchase(name , sname, ssn, card-no) card-no  ssn

Relational Schema Design (or Logical Design) Main idea: Start with some relational schema Find out its FD’s Use them to design a better relational schema

Data Anomalies When a database is poorly designed we get anomalies: Redundancy: data is repeated Updated anomalies: need to change in several places Delete anomalies: may lose data when we don’t want

Relational Schema Design Recall set attributes (persons with several phones): Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle 206-555-6543 Joe 987-65-4321 908-555-2121 Westfield SSN  Name, City but not SSN  PhoneNumber Anomalies: Redundancy = repeat data Update anomalies = Fred moves to “Bellevue” Deletion anomalies = Joe deletes his phone number: what is his city ?

Relation Decomposition Break the relation into two: Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle 206-555-6543 Joe 987-65-4321 908-555-2121 Westfield Name SSN City Fred 123-45-6789 Seattle Joe 987-65-4321 Westfield SSN PhoneNumber 123-45-6789 206-555-1234 206-555-6543 987-65-4321 908-555-2121 Anomalies have gone: No more repeated data Easy to move Fred to “Bellevue” (how ?) Easy to delete all Joe’s phone number (how ?)

Relational Schema Design Person buys Product name price ssn Conceptual Model: Relational Model: plus FD’s Normalization: Eliminates anomalies

Decompositions in General R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp) R1 = projection of R on A1, ..., An, B1, ..., Bm R2 = projection of R on A1, ..., An, C1, ..., Cp

Decomposition Sometimes it is correct: Lossless decomposition Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera Name Price Gizmo 19.99 OneClick 24.99 Name Category Gizmo Gadget OneClick Camera Lossless decomposition

Incorrect Decomposition Sometimes it is not: Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera What’s incorrect ?? Name Category Gizmo Gadget OneClick Camera Price Category 19.99 Gadget 24.99 Camera Lossy decomposition

Decompositions in General R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp) If A1, ..., An  B1, ..., Bm Then the decomposition is lossless Note: don’t need necessarily A1, ..., An  C1, ..., Cp Example: name  price, hence the first decomposition is lossless