1 Lecture 9: Database Design Wednesday, January 25, 2006.

2 Closure of a set of Attributes
Given a set of attributes A1, …, An, the closure {A1, …, An}+ is the set of attributes B s.t. A1, …, An → B.
Example:
  name → color
  category → department
  color, category → price
Closures:
  name+ = {name, color}
  {name, category}+ = {name, category, color, department, price}
  color+ = {color}

3 Closure Algorithm
Start with X = {A1, …, An}.
Repeat until X doesn't change:
  if B1, …, Bn → C is a FD and B1, …, Bn are all in X, then add C to X.
Example:
  name → color
  category → department
  color, category → price
{name, category}+ = {name, category, color, department, price}
Hence: name, category → color, department, price
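The closure loop can be sketched in a few lines of Python. This is a minimal sketch, not code from the lecture: the function name `closure` and the representation of an FD as a pair of attribute sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a set of attribute names.
    """
    x = set(attrs)
    changed = True
    while changed:  # repeat until X doesn't change
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already in X, add the right side.
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


# FDs from the slide's example.
product_fds = [
    ({"name"}, {"color"}),
    ({"category"}, {"department"}),
    ({"color", "category"}, {"price"}),
]

print(sorted(closure({"name", "category"}, product_fds)))
# ['category', 'color', 'department', 'name', 'price']
```

The loop rescans every FD after each change, which is simple rather than fast; that is fine for classroom-sized schemas.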

4 Example
In class: R(A,B,C,D,E,F)
  A, B → C
  A, D → E
  B → D
  A, F → B
Compute {A,B}+: X = {A, B, }
Compute {A,F}+: X = {A, F, }

5 Why Do We Need Closure
With closure we can find all FD's easily.
To check if X → A:
– Compute X+
– Check if A ∈ X+
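As a rough illustration of this test, the sketch below checks whether a single FD follows from a given set. The helper names `closure` and `follows`, and the set-pair encoding of FDs, are hypothetical choices for this example; the FDs are the product ones from the closure slides.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def follows(lhs, attr, fds):
    """True if lhs → attr can be inferred: compute lhs+ and test membership."""
    return attr in closure(lhs, fds)


product_fds = [
    ({"name"}, {"color"}),
    ({"category"}, {"department"}),
    ({"color", "category"}, {"price"}),
]

print(follows({"name", "category"}, "price", product_fds))  # True
print(follows({"name"}, "price", product_fds))              # False
```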

6 Using Closure to Infer ALL FDs
Example:
  A, B → C
  A, D → B
  B → D
Step 1: Compute X+, for every X:
  A+ = A, B+ = BD, C+ = C, D+ = D
  AB+ = ABCD, AC+ = AC, AD+ = ABCD
  ABC+ = ABD+ = ACD+ = ABCD (no need to compute: why ?)
  BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all FD's X → Y, s.t. Y ⊆ X+ and X ∩ Y = ∅:
  AB → CD, AD → BC, ABC → D, ABD → C, ACD → B
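The two steps can be sketched as a brute-force enumeration. This is an illustrative sketch under assumptions: `closure` and `all_fds` are made-up names, and for each X it reports only the largest right side, X+ minus X (every non-empty subset of that right side is implied too). Because it visits every subset, it also reports B → D and BC → D next to the five FDs listed on the slide.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def all_fds(attributes, fds):
    """For every non-empty X, record X → (X+ - X) when the right side is non-empty."""
    attrs = sorted(attributes)
    result = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            x = set(combo)
            rhs = closure(x, fds) - x  # Y ⊆ X+ with X ∩ Y = ∅
            if rhs:
                result.append(("".join(combo), "".join(sorted(rhs))))
    return result


example_fds = [({"A", "B"}, {"C"}), ({"A", "D"}, {"B"}), ({"B"}, {"D"})]
for lhs, rhs in all_fds("ABCD", example_fds):
    print(lhs, "->", rhs)
```

The enumeration is exponential in the number of attributes, so it only suits small examples like this one.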

7 Another Example
Enrollment(student, major, course, room, time)
  student → major
  major, course → room
  course → time
What else can we infer ? [in class, or at home]

8 Back to Conceptual Design
Now that we know how to find more FDs, it's easy:
Search for "bad" FDs.
If there are any, decompose the table into two tables, and repeat for the subtables.
When done, the database schema is normalized.
Unfortunately, there are several normal forms…

9 Normal Forms
First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Third Normal Form (3NF) = will discuss
Boyce Codd Normal Form (BCNF) = will discuss
Others...

10 Keys
A superkey is a set of attributes A1, ..., An s.t. for any other attribute B, we have A1, ..., An → B.
A key is a minimal superkey
– I.e. a set of attributes which is a superkey and for which no proper subset is a superkey

11 Computing (Super)Keys
Compute X+ for all sets X
If X+ = all attributes, then X is a superkey
List only the minimal X's: these are the keys
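A minimal sketch of this search, assuming the same set-pair encoding of FDs; `closure` and `candidate_keys` are hypothetical names. It visits subsets smallest first, so any set that contains an already-found key is skipped as non-minimal. The example is Product(name, price, category, color) with name, category → price and category → color.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def candidate_keys(attributes, fds):
    """Return every minimal X with X+ = all attributes."""
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(x, fds) == all_attrs:
                keys.append(x)
    return keys


product_fds = [
    ({"name", "category"}, {"price"}),
    ({"category"}, {"color"}),
]
print(candidate_keys({"name", "price", "category", "color"}, product_fds))
```

For this schema the only key found is {name, category}.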

12 Example
Product(name, price, category, color)
  name, category → price
  category → color
What is the key ?

13 Example
Product(name, price, category, color)
  name, category → price
  category → color
What is the key ?
(name, category)+ = name, category, price, color
Hence (name, category) is a key

14 Examples of Keys
Enrollment(student, address, course, room, time)
  student → address
  room, time → course
  student, course → room, time
(find keys at home)

15 Eliminating Anomalies
Main idea:
X → A is OK if X is a (super)key
X → A is not OK otherwise

16 Example
  Name   SSN   PhoneNumber   City
  Fred   …     …             Seattle
  Fred   …     …             Seattle
  Joe    …     …             Westfield
  Joe    …     …             Westfield
SSN → Name, City
What is the key? {SSN, PhoneNumber}
Hence SSN → Name, City is a "bad" dependency

17 Key or Keys ?
Can we have more than one key ?
Given R(A,B,C) define FD's s.t. there are two or more keys

18 Key or Keys ?
Can we have more than one key ?
Given R(A,B,C) define FD's s.t. there are two or more keys:
  AB → C, BC → A
or
  A → BC, B → AC
What are the keys here ?
Can you design FDs such that there are three keys ?

19 Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation R is in BCNF if:
  If A1, ..., An → B is a non-trivial dependency in R, then {A1, ..., An} is a superkey for R
In other words: there are no "bad" FDs
Equivalently: ∀X, either (X+ = X) or (X+ = all attributes)
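The "equivalently" form translates directly into a brute-force test. The sketch below is only an illustration: `closure` and `is_bcnf` are assumed names, and the FD list must be the one that holds on the relation being tested. The example uses the Name/SSN/PhoneNumber/City table with SSN → Name, City.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def is_bcnf(attributes, fds):
    """True if every X satisfies X+ = X or X+ = all attributes."""
    all_attrs = set(attributes)
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            x_plus = closure(x, fds)
            if x_plus != x and x_plus != all_attrs:
                return False  # X → X+ is a "bad" FD
    return True


ssn_fds = [({"SSN"}, {"Name", "City"})]
print(is_bcnf({"Name", "SSN", "PhoneNumber", "City"}, ssn_fds))  # False
print(is_bcnf({"Name", "SSN", "City"}, ssn_fds))                 # True
```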

20 BCNF Decomposition Algorithm
repeat
  choose A1, …, Am → B1, …, Bn that violates BCNF
  split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
  continue with both R1 and R2
until no more violations
[Diagram: R1 covers the A's and the B's; R2 covers the A's and the others]
Is there a 2-attribute relation that is not in BCNF ?
In practice, we have a better algorithm (coming up)

21 Example
  Name   SSN   PhoneNumber   City
  Fred   …     …             Seattle
  Fred   …     …             Seattle
  Joe    …     …             Westfield
  Joe    …     …             Westfield
SSN → Name, City
What is the key? {SSN, PhoneNumber}
Use SSN → Name, City to split

22 Example
  Name   SSN   City
  Fred   …     Seattle
  Joe    …     Westfield
SSN → Name, City
  SSN   PhoneNumber
  …     …
Let's check anomalies: Redundancy ? Update ? Delete ?

23 Example Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
  SSN → name, age
  age → hairColor
Decompose in BCNF (in class):

24 BCNF Decomposition Algorithm
BCNF_Decompose(R)
  find X s.t.: X ≠ X+ ≠ [all attributes]
  if (not found) then "R is in BCNF"
  let Y = X+ - X
  let Z = [all attributes] - X+
  decompose R into R1(X ∪ Y) and R2(X ∪ Z)
  continue to decompose recursively R1 and R2
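BCNF_Decompose can be sketched recursively in Python. This is a sketch under stated assumptions rather than the lecture's own code: the names `closure` and `bcnf_decompose` are invented here, and inside a sub-relation the closure is taken with the original FDs and then intersected with that sub-relation's attributes, which is one common way to handle projected FDs. The example is the Person relation with SSN → name, age and age → hairColor.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def bcnf_decompose(attributes, fds):
    """Split `attributes` following BCNF_Decompose; returns a list of attribute sets."""
    r = frozenset(attributes)
    for size in range(1, len(r)):
        for combo in combinations(sorted(r), size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & r  # X+ restricted to this relation
            if x != x_plus and x_plus != r:  # X ≠ X+ ≠ [all attributes]
                y = x_plus - x
                z = r - x_plus
                return bcnf_decompose(x | y, fds) + bcnf_decompose(x | z, fds)
    return [r]  # no such X: this relation is in BCNF


person_fds = [({"SSN"}, {"name", "age"}), ({"age"}, {"hairColor"})]
person = {"name", "SSN", "age", "hairColor", "phoneNumber"}
for relation in bcnf_decompose(person, person_fds):
    print(sorted(relation))
```

The result depends on which violating X is found first; this sketch simply takes the first one in sorted, smallest-first order.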

25 Example BCNF Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
  SSN → name, age
  age → hairColor
Find X s.t.: X ≠ X+ ≠ [all attributes]
Iteration 1: Person
  SSN+ = SSN, name, age, hairColor
  Decompose into: P(SSN, name, age, hairColor), Phone(SSN, phoneNumber)
Iteration 2: P
  age+ = age, hairColor
  Decompose: People(SSN, name, age), Hair(age, hairColor), Phone(SSN, phoneNumber)
What are the keys ?

26 Example
R(A,B,C,D)
  A → B
  B → C
A+ = ABC ≠ ABCD: decompose R(A,B,C,D) into R1(A,B,C) and R2(A,D)
B+ = BC ≠ ABC: decompose R1(A,B,C) into R11(B,C) and R12(A,B)
What are the keys ?
What happens if in R we first pick B+ ? Or AB+ ?

27 Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)    R2(A1, ..., An, C1, ..., Cp)
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp

28 Theory of Decomposition
Sometimes it is correct:
  Name      Price   Category
  Gizmo     19.99   Gadget
  OneClick  24.99   Camera
  Gizmo     19.99   Camera
decomposes into
  Name      Price
  Gizmo     19.99
  OneClick  24.99
  Gizmo     19.99
and
  Name      Category
  Gizmo     Gadget
  OneClick  Camera
  Gizmo     Camera
Lossless decomposition

29 Incorrect Decomposition
Sometimes it is not:
  Name      Price   Category
  Gizmo     19.99   Gadget
  OneClick  24.99   Camera
  Gizmo     19.99   Camera
decomposes into
  Name      Category
  Gizmo     Gadget
  OneClick  Camera
  Gizmo     Camera
and
  Price   Category
  19.99   Gadget
  24.99   Camera
  19.99   Camera
What's incorrect ??
Lossy decomposition

30 Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)    R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An → B1, ..., Bm then the decomposition is lossless
Note: don't need A1, ..., An → C1, ..., Cp
BCNF decomposition is always lossless. WHY ?
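To see the difference on actual data, the sketch below projects the Name/Price/Category table from slides 28 and 29 and joins the pieces back together. The helper names (`project`, `natural_join`, `is_lossless_on`) are assumptions for illustration, and the check is about one table instance, not a proof about the schema.

```python
def project(rows, columns):
    """Project dict rows onto `columns`, removing duplicates."""
    return {tuple((c, row[c]) for c in columns) for row in rows}


def natural_join(left, right):
    """Natural join of two projections produced by `project`."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            shared = ld.keys() & rd.keys()
            if all(ld[c] == rd[c] for c in shared):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


def is_lossless_on(rows, cols1, cols2):
    """True if joining the two projections gives back exactly `rows`."""
    original = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, cols1), project(rows, cols2)) == original


products = [
    {"Name": "Gizmo", "Price": "19.99", "Category": "Gadget"},
    {"Name": "OneClick", "Price": "24.99", "Category": "Camera"},
    {"Name": "Gizmo", "Price": "19.99", "Category": "Camera"},
]

# Shared attribute Name, and Name → Price holds in this table.
print(is_lossless_on(products, ["Name", "Price"], ["Name", "Category"]))      # True
# Shared attribute Category determines neither side: spurious rows appear.
print(is_lossless_on(products, ["Name", "Category"], ["Price", "Category"]))  # False
```

In the second case the join produces extra rows such as (OneClick, 19.99, Camera), which is what makes the decomposition lossy.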

31 3NF: A Problem with BCNF
R(Unit, Company, Product)
  Unit → Company
  Company, Product → Unit
Unit+ = Unit, Company
Decompose into R1(Unit, Company), with Unit → Company, and R2(Unit, Product)
We lose the FD: Company, Product → Unit !!

32 So What's the Problem?
  Unit      Company
  Galaga99  UW
  Bingo     UW
Unit → Company
  Unit      Product
  Galaga99  Databases
  Bingo     Databases
No problem so far. All local FD's are satisfied.
Let's put all the data back into a single table again:
  Unit      Company  Product
  Galaga99  UW       Databases
  Bingo     UW       Databases
Violates the FD: Company, Product → Unit
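The same check can be run mechanically. In this small sketch, `holds` is a hypothetical helper that tests one FD against a list of rows; the data are the two tables from the slide.

```python
def holds(rows, lhs, rhs):
    """True if the FD lhs → rhs holds in `rows` (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


unit_company = [
    {"Unit": "Galaga99", "Company": "UW"},
    {"Unit": "Bingo", "Company": "UW"},
]
unit_product = [
    {"Unit": "Galaga99", "Product": "Databases"},
    {"Unit": "Bingo", "Product": "Databases"},
]

# Put the data back into a single table by joining on Unit.
joined = [
    {**uc, **up}
    for uc in unit_company
    for up in unit_product
    if uc["Unit"] == up["Unit"]
]

print(holds(unit_company, ["Unit"], ["Company"]))       # True: the local FD is fine
print(holds(joined, ["Company", "Product"], ["Unit"]))  # False: the lost FD is violated
```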

33 The Problem
We started with a table R and FD
We decomposed R into BCNF tables R1, R2, … with their own FD1, FD2, …
We can reconstruct R from R1, R2, …
But we cannot reconstruct FD from FD1, FD2, …

34 Solution: 3rd Normal Form (3NF)
A simple condition for removing anomalies from relations:
A relation R is in 3rd normal form if:
  Whenever there is a nontrivial dependency A1, A2, ..., An → B for R, then {A1, A2, ..., An} is a super-key for R, or B is part of a key.
Tradeoff:
  BCNF = no anomalies, but may lose some FDs
  3NF = keeps all FDs, but may have some anomalies
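A rough sketch of the 3NF condition as a check. The names `closure`, `candidate_keys`, and `third_nf_violations` are assumptions, and the sketch inspects only the FDs it is given, one right-side attribute at a time. The example is the Unit/Company/Product relation from slide 31, which is not in BCNF but passes this test because Company is part of a key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def candidate_keys(attributes, fds):
    """Every minimal X with X+ = all attributes (smallest subsets first)."""
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            if not any(k <= x for k in keys) and closure(x, fds) == all_attrs:
                keys.append(x)
    return keys


def third_nf_violations(attributes, fds):
    """Return the listed FDs X → B where X is no superkey and B is in no key."""
    all_attrs = set(attributes)
    prime = set().union(*candidate_keys(all_attrs, fds))  # attributes in some key
    bad = []
    for lhs, rhs in fds:
        for b in sorted(rhs - lhs):  # skip the trivial part
            if closure(lhs, fds) != all_attrs and b not in prime:
                bad.append((lhs, b))
    return bad


unit_fds = [({"Unit"}, {"Company"}), ({"Company", "Product"}, {"Unit"})]
print(third_nf_violations({"Unit", "Company", "Product"}, unit_fds))  # []
```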

35 3NF Decomposition Algorithm
3NF_Decompose(R)
  let K = [all attributes that are part of some key]
  find X s.t.: X+ - X - K ≠ ∅ and X+ ≠ [all attributes]
  if (not found) then "R is already in 3NF"
  let Y = X+ - X - K
  let Z = [all attributes] - (X ∪ Y)
  decompose into R1(X ∪ Y) and R2(X ∪ Z)
  decompose, recursively, R1 and R2
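One step of 3NF_Decompose, without the recursion, might look like the sketch below. The function names are hypothetical, and the choice of X (first match in smallest-first, sorted order) is an assumption; the slide does not say which X to pick. The example is R(A,B,C,D,E) with AB → C, C → D, D → B, D → E.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def candidate_keys(attributes, fds):
    """Every minimal X with X+ = all attributes (smallest subsets first)."""
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            if not any(k <= x for k in keys) and closure(x, fds) == all_attrs:
                keys.append(x)
    return keys


def three_nf_step(attributes, fds):
    """One split of 3NF_Decompose: return (R1, R2), or None if no X qualifies."""
    all_attrs = set(attributes)
    k = set().union(*candidate_keys(all_attrs, fds))  # K: part of some key
    for size in range(1, len(all_attrs)):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            x_plus = closure(x, fds)
            y = x_plus - x - k
            if y and x_plus != all_attrs:
                z = all_attrs - (x | y)
                return x | y, x | z
    return None  # R is already in 3NF


r_fds = [({"A", "B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"B"}), ({"D"}, {"E"})]
r1, r2 = three_nf_step("ABCDE", r_fds)
print(sorted(r1), sorted(r2))  # ['C', 'E'] ['A', 'B', 'C', 'D']
```

A full version would recurse into R1 and R2 and recompute K for each sub-relation; that part is left out here.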

36 Example of 3NF decomposition
R(A,B,C,D,E):
  AB → C
  C → D
  D → B
  D → E
Keys (need to compute X+, for several Xs): AB, AC, AD
K = {A, B, C, D}
Pick X = C
  C+ = BCDE
  C → BDE is a BCNF violation
  For 3NF: remove B, D (part of K): C → E is a 3NF violation
Decompose: R1(C, E), R2(A,B,C,D)
  R1 is in 3NF
  R2 is in 3NF (because its keys: AB, AC, AD)

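37 3NF v.s. BCNF Decomposition
[Diagram: decomposition tree. ABCDEFGHK splits into ABCDE and EFGHK; ABCDE into ABC and CDE; EFGHK into EFG and GHK; lower levels split further. Labels on the tree: 3NF, BCNF]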

38 FD's for E/R Diagrams
Given a relation constructed from an E/R diagram, what is its key?
Rule 1: If the relation comes from an entity set, the key of the relation is the set of attributes which is the key of the entity set.
[Diagram: entity set Person with attributes address, name, ssn]
Person(address, name, ssn)

39 FD's for E/R Diagrams
Rule 2: If the relation comes from a many-many relationship, the key of the relation is the set of all attribute keys in the relations corresponding to the entity sets
[Diagram: Person (name, ssn) buys Product (name, price); the relationship buys has attribute date]
buys(name, ssn, date)

40 FD's for E/R Diagrams
Except: if there is an arrow from the relationship to E, then we don't need the key of E as part of the relation key.
[Diagram: relationship Purchase among Product (name), Store (sname), Person (ssn), CreditCard (card-no)]
Purchase(name, sname, ssn, card-no)

41 FD's for E/R Diagrams
More rules:
Many-one, one-many, one-one relationships
Multi-way relationships
Weak entity sets
(Try to find them yourself, or check book)

42 FD's for E/R Diagrams
Say: "the CreditCard determines the Person"
[Diagram: relationship Purchase among Product (name), Store (sname), Person (ssn), CreditCard (card-no)]
Purchase(name, sname, ssn, card-no)
card-no → ssn
Incomplete (what does it say ?)