Lecture 8: Database Design


Lecture 8: Database Design Friday, October 14, 2005

Announcements
Homework 1: solutions are posted
Homework 2: posted (due: Oct. 24)
Project Phase 1A: due tonight
Midterm: still November 2nd
Final: still December 12
Website is now updated with all dates

Outline Design of a Relational schema (3.6)

Keys
A superkey is a set of attributes A1, ..., An s.t. for any other attribute B, we have A1, ..., An → B
A key is a minimal superkey, i.e. a set of attributes which is a superkey and for which no proper subset is a superkey

Computing (Super)Keys
Compute X+ for all sets X
If X+ = all attributes, then X is a superkey
List only the minimal X’s (these are the keys)
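The procedure above can be sketched in Python. This is a minimal illustration, not code from the lecture: the function names `closure` and `candidate_keys` are made up here, and the brute-force enumeration of all subsets is exponential, so it only suits small schemas. The relation and FD are the Name/SSN/PhoneNumber/City example used later in these slides.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return X+: every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Try subsets in increasing size and keep only the minimal superkeys."""
    all_attrs = set(all_attrs)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue  # x contains a smaller key, so it is not minimal
            if closure(x, fds) == all_attrs:
                keys.append(x)
    return keys

attrs = {"Name", "SSN", "PhoneNumber", "City"}
fds = [({"SSN"}, {"Name", "City"})]
keys = candidate_keys(attrs, fds)
print(keys)
```

Because subsets are visited smallest first, any superkey that survives the `any(k <= x ...)` filter is minimal by construction.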

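Example
Product(name, price, category, color)
name, category → price
What is the key?
(name, category)+ = name, category, price, color
Hence (name, category) is a key
Note: the FD listed here derives only price. The closure contains color only if a further FD that determines color (not shown on this slide) also holds; with the listed FD alone, (name, category)+ = name, category, price and the key is (name, category, color).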

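Examples of Keys
Enrollment(student, address, course, room, time)
room, time → course
student, course → room, time
Keys: {student, room, time}, {student, course}; all supersets are superkeys
(these keys assume student also determines address, which the FDs listed here do not state)
(find keys at home)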

Eliminating Anomalies
Main idea:
X → A is OK if X is a (super)key
X → A is not OK otherwise

Example
  Name  SSN          PhoneNumber   City
  Fred  123-45-6789  206-555-1234  Seattle
  Fred  123-45-6789  206-555-6543  Seattle
  Joe   987-65-4321  908-555-2121  Westfield
  Joe   987-65-4321  908-555-1234  Westfield
SSN → Name, City
What is the key? {SSN, PhoneNumber}
Hence SSN → Name, City is a “bad” dependency

Key or Keys?
Can we have more than one key?
Given R(A,B,C), define FD’s s.t. there are two or more keys
AB → C, BC → A
or
A → BC, B → AC
What are the keys here?
Can you design FDs such that there are three keys?

Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation R is in BCNF if: whenever A1, ..., An → B is a non-trivial dependency in R, then {A1, ..., An} is a superkey for R
In other words: there are no “bad” FDs
Equivalently: ∀ X, either (X+ = X) or (X+ = all attributes)
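The closure form of the BCNF condition translates directly into a check. The sketch below is an illustration under assumptions: `closure` and `bcnf_violations` are names invented here, and every proper non-empty subset X is tested by brute force. It reports each X with X ≠ X+ ≠ all attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return X+: every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """List every X whose closure is neither X itself nor all attributes."""
    all_attrs = set(all_attrs)
    bad = []
    for size in range(1, len(all_attrs)):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            x_plus = closure(x, fds)
            if x_plus != x and x_plus != all_attrs:
                bad.append(x)
    return bad

fds = [({"SSN"}, {"Name", "City"})]
# The four-attribute table is not in BCNF: SSN determines something but not everything.
print(bcnf_violations({"Name", "SSN", "PhoneNumber", "City"}, fds))
# Without PhoneNumber, SSN is a key and no violation remains.
print(bcnf_violations({"Name", "SSN", "City"}, fds))
```

An empty list means the relation is in BCNF with respect to the FDs supplied.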

BCNF Decomposition Algorithm
repeat
  choose A1, …, Am → B1, …, Bn that violates BCNF
  split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
  continue with both R1 and R2
until no more violations
[Diagram: the attributes of R split into B’s, A’s and Others; R1 covers the A’s and B’s, R2 covers the A’s and Others]
Is there a 2-attribute relation that is not in BCNF?
In practice, we have a better algorithm (coming up)

Example
  Name  SSN          PhoneNumber   City
  Fred  123-45-6789  206-555-1234  Seattle
  Fred  123-45-6789  206-555-6543  Seattle
  Joe   987-65-4321  908-555-2121  Westfield
  Joe   987-65-4321  908-555-1234  Westfield
SSN → Name, City
What is the key? {SSN, PhoneNumber}
Use SSN → Name, City to split

Example
  Name  SSN          City
  Fred  123-45-6789  Seattle
  Joe   987-65-4321  Westfield
SSN → Name, City
  SSN          PhoneNumber
  123-45-6789  206-555-1234
  123-45-6789  206-555-6543
  987-65-4321  908-555-2121
  987-65-4321  908-555-1234
Let’s check anomalies: Redundancy? Update? Delete?

Example Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN → name, age
age → hairColor
Decompose in BCNF (in class):

BCNF Decomposition Algorithm
BCNF_Decompose(R)
  find X s.t.: X ≠ X+ ≠ [all attributes]
  if (not found) then “R is in BCNF”
  let Y = X+ - X
  let Z = [all attributes] - X+
  decompose R into R1(X ∪ Y) and R2(X ∪ Z)
  continue to decompose recursively R1 and R2

Example BCNF Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN → name, age
age → hairColor
Find X s.t.: X ≠ X+ ≠ [all attributes]
Iteration 1: Person
  SSN+ = SSN, name, age, hairColor
  Decompose into: P(SSN, name, age, hairColor), Phone(SSN, phoneNumber)
Iteration 2: P
  age+ = age, hairColor
  Decompose: People(SSN, name, age), Hair(age, hairColor), Phone(SSN, phoneNumber)
What are the keys?
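The recursive BCNF_Decompose procedure can be sketched as follows. This is an illustrative implementation, not the lecture’s code: the function names are invented here, and it assumes that the FDs holding in a sub-relation are obtained by computing closures with the original FDs and intersecting with the sub-relation’s attributes. It picks the first violating X in sorted order, so other choices may yield a different (equally valid) decomposition.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return X+: every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, X+ within rel) for the first X with X != X+ != rel, else None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel  # restrict the closure to this relation
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None

def bcnf_decompose(rel, fds):
    """Split rel recursively until no sub-relation has a BCNF violation."""
    rel = set(rel)
    found = find_violation(rel, fds)
    if found is None:
        return [rel]
    x, x_plus = found
    r1 = x_plus              # X ∪ Y, where Y = X+ - X
    r2 = x | (rel - x_plus)  # X ∪ Z, where Z = rel - X+
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)

person = {"name", "SSN", "age", "hairColor", "phoneNumber"}
fds = [({"SSN"}, {"name", "age"}), ({"age"}, {"hairColor"})]
result = bcnf_decompose(person, fds)
print(result)
```

On the Person example this reproduces the three relations from the slide: (SSN, name, age), (age, hairColor) and (SSN, phoneNumber).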

Example
R(A,B,C,D)
A → B
B → C
A+ = ABC ≠ ABCD: decompose R into R1(A,B,C) and R2(A,D)
B+ = BC ≠ ABC: decompose R1 into R11(B,C) and R12(A,B)
What are the keys?
What happens if in R we first pick B+? Or AB+?

Decompositions in General R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp) R1 = projection of R on A1, ..., An, B1, ..., Bm R2 = projection of R on A1, ..., An, C1, ..., Cp

Theory of Decomposition
Sometimes it is correct:
  Name      Price  Category
  Gizmo     19.99  Gadget
  OneClick  24.99  Camera
decomposed into
  Name      Price
  Gizmo     19.99
  OneClick  24.99
and
  Name      Category
  Gizmo     Gadget
  OneClick  Camera
Lossless decomposition

Incorrect Decomposition
Sometimes it is not:
  Name      Price  Category
  Gizmo     19.99  Gadget
  OneClick  24.99  Camera
decomposed into
  Name      Category
  Gizmo     Gadget
  OneClick  Camera
and
  Price  Category
  19.99  Gadget
  24.99  Camera
What’s incorrect??
Lossy decomposition

Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An → B1, ..., Bm, then the decomposition is lossless
Note: don’t need A1, ..., An → C1, ..., Cp
BCNF decomposition is always lossless. WHY?
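To see the difference between a lossless and a lossy split on concrete data, the sketch below projects a table onto two attribute sets, joins the projections back, and compares with the original. Everything here is illustrative: the helper names are invented, the check covers one instance only (it does not prove losslessness for all instances), and the third row is a hypothetical addition to the Gizmo/OneClick data so that two products share a category.

```python
def project(rows, attrs):
    """Project dict rows onto attrs, dropping duplicate tuples."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Join two sets of rows on the attribute names they have in common."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            a, b = dict(t1), dict(t2)
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.add(frozenset({**a, **b}.items()))
    return out

def is_lossless_on(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {frozenset(row.items()) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

rows = [
    {"Name": "Gizmo", "Price": 19.99, "Category": "Gadget"},
    {"Name": "OneClick", "Price": 24.99, "Category": "Camera"},
    # Hypothetical extra row (not taken from the slides): Gizmo also listed as a Camera.
    {"Name": "Gizmo", "Price": 19.99, "Category": "Camera"},
]

# Shared attribute Name determines Price, so this split is lossless.
print(is_lossless_on(rows, ["Name", "Price"], ["Name", "Category"]))
# Shared attribute Category determines neither side, so the join invents rows.
print(is_lossless_on(rows, ["Name", "Category"], ["Price", "Category"]))
```

In the second split the join pairs every Camera name with every Camera price, which produces tuples that were never in the original table.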

3NF: A Problem with BCNF
Relation with attributes Unit, Company, Product
Unit → Company
Company, Product → Unit
Unit+ = Unit, Company
Decompose into (Unit, Company) and (Unit, Product)
(Unit, Company) keeps Unit → Company
We lose the FD: Company, Product → Unit !!

So What’s the Problem?
  Unit      Company
  Galaga99  UW
  Bingo     UW
and
  Unit      Product
  Galaga99  Databases
  Bingo     Databases
Unit → Company
No problem so far. All local FD’s are satisfied.
Let’s put all the data back into a single table again:
  Unit      Company  Product
  Galaga99  UW       Databases
  Bingo     UW       Databases
Violates the FD: Company, Product → Unit
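The lost dependency can be checked mechanically on this data. The sketch below is illustrative only: `satisfies_fd` is a name invented here, and the rows assume that both units share the company UW and the product Databases, which is how the slide’s merged table cells are read.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if rows that agree on the lhs attributes also agree on the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, val) != val:
            return False
    return True

unit_company = [
    {"Unit": "Galaga99", "Company": "UW"},
    {"Unit": "Bingo", "Company": "UW"},
]
unit_product = [
    {"Unit": "Galaga99", "Product": "Databases"},
    {"Unit": "Bingo", "Product": "Databases"},
]

# Join the two tables back together on Unit.
joined = [
    {**uc, **up}
    for uc in unit_company
    for up in unit_product
    if uc["Unit"] == up["Unit"]
]

print(satisfies_fd(unit_company, ["Unit"], ["Company"]))       # local FD holds
print(satisfies_fd(joined, ["Company", "Product"], ["Unit"]))  # global FD is violated
```

Each decomposed table passes every FD that can be stated over its own attributes, so nothing in the decomposed schema stops these rows from being inserted.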

The Problem
We started with a table R and a set of FDs, FD
We decomposed R into BCNF tables R1, R2, … with their own FD1, FD2, …
We can reconstruct R from R1, R2, …
But we cannot reconstruct FD from FD1, FD2, …

Solution: 3rd Normal Form (3NF)
A simple condition for removing anomalies from relations:
A relation R is in 3rd normal form if: whenever there is a nontrivial dependency A1, A2, ..., An → B for R, then {A1, A2, ..., An} is a superkey for R, or B is part of a key.
Tradeoff:
BCNF = no anomalies, but may lose some FDs
3NF = keeps all FDs, but may have some anomalies

3NF Decomposition Algorithm
3NF_Decompose(R)
  let K = [all attributes that are part of some key]
  find X s.t.: X+ - X - K ≠ ∅ and X+ ≠ [all attributes]
  if (not found) then “R is already in 3NF”
  let Y = X+ - X - K
  let Z = [all attributes] - (X ∪ Y)
  decompose into R1(X ∪ Y) and R2(X ∪ Z)
  decompose, recursively, R1 and R2

Example of 3NF decomposition
R(A,B,C,D,E):
AB → C
C → D
D → B
D → E
Keys: AB, AC, AD (need to compute X+, for several Xs)
K = {A, B, C, D}
Pick X = C
C+ = BCDE
C → BDE is a BCNF violation
For 3NF: remove B, D (part of K): C → E is a 3NF violation
Decompose: R1(C, E), R2(A,B,C,D)
R1 is in 3NF
R2 is in 3NF (because its keys: AB, AC, AD)
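The steps of this example can be verified with a short script. It is a sketch under assumptions: the function names are invented here, subsets are enumerated by brute force, and FDs of a sub-relation are taken to be closures under the original FDs restricted to that sub-relation’s attributes. A violation is reported as a pair (X, Y) with Y = X+ - X - K non-empty and X+ not covering the whole relation.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return X+: every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def subsets(attrs):
    """Yield non-empty subsets of attrs, smallest first."""
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)

def keys_of(rel, fds):
    """Minimal subsets of rel whose closure covers all of rel."""
    keys = []
    for x in subsets(rel):
        if any(k <= x for k in keys):
            continue  # contains a smaller key
        if rel <= closure(x, fds):
            keys.append(x)
    return keys

def third_nf_violations(rel, fds):
    """Pairs (X, Y): Y = X+ - X - K is non-empty and X is not a superkey of rel."""
    rel = set(rel)
    prime = set().union(*keys_of(rel, fds))  # K: attributes that are part of some key
    bad = []
    for x in subsets(rel):
        x_plus = closure(x, fds) & rel
        extra = x_plus - x - prime
        if extra and x_plus != rel:
            bad.append((x, extra))
    return bad

fds = [("AB", "C"), ("C", "D"), ("D", "B"), ("D", "E")]
print(keys_of(set("ABCDE"), fds))                 # the keys AB, AC, AD
print(third_nf_violations(set("ABCDE"), fds)[0])  # first violation found: C → E
print(third_nf_violations(set("CE"), fds))        # R1(C, E): no violations
print(third_nf_violations(set("ABCD"), fds))      # R2(A,B,C,D): no violations
```

R2(A,B,C,D) still fails the stricter BCNF test (C → D with C not a superkey), but every attribute on a right-hand side is part of some key, which is exactly what 3NF permits.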

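3NF vs. BCNF Decomposition
[Diagram: a decomposition tree. R(A, B, C, D, E, F, G, H, K) splits into (A, B, C, D, E) and (E, F, G, H, K), then into (A, B, C), (C, D, E), (E, F, G), (G, H, K), and finally into two-attribute relations; levels of the tree are labelled 3NF and BCNF.]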

FD’s for E/R Diagrams
Given a relation constructed from an E/R diagram, what is its key?
Rule 1: If the relation comes from an entity set, the key of the relation is the set of attributes which is the key of the entity set.
[Diagram: entity set Person with attributes address, name, ssn]
Person(address, name, ssn)

FD’s for E/R Diagrams
Rule 2: If the relation comes from a many-many relationship, the key of the relation is the set of all attribute keys in the relations corresponding to the entity sets
[Diagram: entity sets Person (name, ssn) and Product (name, price) connected by the relationship buys, which has attribute date]
buys(name, ssn, date)

FD’s for E/R Diagrams
Except: if there is an arrow from the relationship to E, then we don’t need the key of E as part of the relation key.
[Diagram: relationship Purchase connecting Product (name), Store (sname), Person (ssn) and CreditCard (card-no)]
Purchase(name, sname, ssn, card-no)

FD’s for E/R Diagrams More rules: Many-one, one-many, one-one relationships Multi-way relationships Weak entity sets (Try to find them yourself, or check book)

FD’s for E/R Diagrams
Say: “the CreditCard determines the Person”
[Diagram: relationship Purchase connecting Product (name), Store (sname), Person (ssn) and CreditCard (card-no)]
Incomplete (what does it say?)
Purchase(name, sname, ssn, card-no)
card-no → ssn