1 Introduction to Database Systems CSE 444 Lectures 8 & 9 Database Design April 16 & 18, 2008.

Presentation transcript:

1 Introduction to Database Systems CSE 444 Lectures 8 & 9 Database Design April 16 & 18, 2008

2 Outline The relational data model: 3.1 Functional dependencies: 3.4

3 Schema Refinements = Normal Forms
1st Normal Form = all tables are flat
2nd Normal Form = obsolete
Boyce-Codd Normal Form = will study
3rd Normal Form = see book

4 First Normal Form (1NF)
A database schema is in First Normal Form if all tables are flat.
Not flat (Courses is set-valued):
Student
Name    GPA   Courses
Alice   3.8   {Math, DB, OS}
Bob     3.7   {DB, OS}
Carol   3.9   {Math, OS}
Flat:
Student
Name    GPA
Alice   3.8
Bob     3.7
Carol   3.9
Course
Course
Math
DB
OS
Takes
Student   Course
Alice     Math
Carol     Math
Alice     DB
Bob       DB
Alice     OS
Carol     OS
May need to add keys

5 Relational Schema Design
Conceptual Model: [E/R diagram: Person buys Product, with attributes name, price, name, ssn]
Relational Model: plus FDs
Normalization: eliminates anomalies

6 Data Anomalies
When a database is poorly designed we get anomalies:
Redundancy: data is repeated
Update anomalies: the same change must be made in several places
Delete anomalies: we may lose data that we did not intend to delete

7 Relational Schema Design
Recall set attributes (persons with several phones):
Name   SSN   PhoneNumber   City
Fred   ...   ...           Seattle
Fred   ...   ...           Seattle
Joe    ...   ...           Westfield
One person may have multiple phones, but lives in only one city
Anomalies:
Redundancy = repeated data
Update anomalies = Fred moves to “Bellevue”
Deletion anomalies = Joe deletes his phone number: what is his city?

8 Relation Decomposition
Break the relation into two:
Name   SSN   PhoneNumber   City
Fred   ...   ...           Seattle
Fred   ...   ...           Seattle
Joe    ...   ...           Westfield
becomes
Name   SSN   City
Fred   ...   Seattle
Joe    ...   Westfield
and
SSN   PhoneNumber
...   ...
Anomalies are gone:
No more repeated data
Easy to move Fred to “Bellevue” (how?)
Easy to delete all Joe’s phone numbers (how?)

9 Relational Schema Design (or Logical Design)
Main idea:
Start with some relational schema
Find out its functional dependencies
Use them to design a better relational schema

10 Functional Dependencies
A form of constraint
– hence, part of the schema
Finding them is part of the database design
Also used in normalizing the relations

11 Functional Dependencies
Definition: If two tuples agree on the attributes A1, A2, …, An then they must also agree on the attributes B1, B2, …, Bm
Formally: A1, A2, …, An → B1, B2, …, Bm

12 When Does an FD Hold
Definition: A1, ..., Am → B1, ..., Bn holds in R if:
∀ t, t’ ∈ R, (t.A1 = t’.A1 ∧ ... ∧ t.Am = t’.Am ⇒ t.B1 = t’.B1 ∧ ... ∧ t.Bn = t’.Bn)
[Figure: relation R with columns A1 ... Am and B1 ... Bn; if tuples t, t’ agree on the A columns, then t, t’ agree on the B columns]

13 Examples
An FD holds, or does not hold, on an instance:
EmpID   Name    Phone   Position
E0045   Smith   1234    Clerk
E3542   Mike    9876    Salesrep
E1111   Smith   9876    Salesrep
E9999   Mary    1234    Lawyer
EmpID → Name, Phone, Position
Position → Phone
but not Phone → Position

14 Example
Position → Phone holds: the two Salesrep rows agree on Phone.
EmpID   Name    Phone   Position
E0045   Smith   1234    Clerk
E3542   Mike    9876    Salesrep   <-
E1111   Smith   9876    Salesrep   <-
E9999   Mary    1234    Lawyer

15 Example
but not Phone → Position: the two rows with Phone 1234 have different positions.
EmpID   Name    Phone   Position
E0045   Smith   1234    Clerk      <-
E3542   Mike    9876    Salesrep
E1111   Smith   9876    Salesrep
E9999   Mary    1234    Lawyer     <-
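
The definition of when an FD holds on an instance can be checked mechanically by comparing every pair of tuples. The following is a minimal Python sketch, not part of the original slides; the function name `fd_holds` and the dictionary representation of tuples are assumptions made for illustration. It uses the employee instance from the examples above.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds on this instance.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names. The FD fails as soon as two tuples agree on every
    lhs attribute but disagree on some rhs attribute.
    """
    for t, u in combinations(rows, 2):
        agree_lhs = all(t[a] == u[a] for a in lhs)
        agree_rhs = all(t[b] == u[b] for b in rhs)
        if agree_lhs and not agree_rhs:
            return False
    return True

# Employee instance taken from the slides.
employees = [
    {"EmpID": "E0045", "Name": "Smith", "Phone": "1234", "Position": "Clerk"},
    {"EmpID": "E3542", "Name": "Mike",  "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E1111", "Name": "Smith", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E9999", "Name": "Mary",  "Phone": "1234", "Position": "Lawyer"},
]

print(fd_holds(employees, ["Position"], ["Phone"]))  # True
print(fd_holds(employees, ["Phone"], ["Position"]))  # False
```

Note that such a check only says whether an FD holds on one particular instance; whether it is a constraint of the schema is a design decision.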

16 Example
FDs are constraints: on some instances they hold, on others they don’t.
name → color
category → department
color, category → price
name      category   color   department   price
Gizmo     Gadget     Green   Toys         49
Tweaker   Gadget     Green   Toys         99
Does this instance satisfy all the FDs?

17 Example
name → color
category → department
color, category → price
name      category     color   department     price
Gizmo     Gadget       Green   Toys           49
Tweaker   Gadget       Black   Toys           99
Gizmo     Stationary   Green   Office-supp.   59
What about this one?

18 An Interesting Observation
If all these FDs are true:
name → color
category → department
color, category → price
then this FD also holds: name, category → price
Why?

19 Goal: Find ALL Functional Dependencies
Anomalies occur when certain “bad” FDs hold
We know some of the FDs
Need to find all FDs, then look for the bad ones

20 Armstrong’s Rules (1/3)
Splitting rule and Combining rule:
A1, A2, …, An → B1, B2, …, Bm
is equivalent to
A1, A2, …, An → B1
A1, A2, …, An → B2
...
A1, A2, …, An → Bm
[Figure: table with columns A1 ... An and B1 ... Bm]

21 Armstrong’s Rules (2/3)
Trivial Rule:
A1, A2, …, An → Ai, where i = 1, 2, ..., n
Why?
[Figure: table with columns A1 … An]

22 Armstrong’s Rules (3/3)
Transitive Closure Rule:
If A1, A2, …, An → B1, B2, …, Bm
and B1, B2, …, Bm → C1, C2, …, Cp
then A1, A2, …, An → C1, C2, …, Cp
Why?

23 [Figure: table with columns A1 … An, B1 … Bm, C1 ... Cp, illustrating the transitive closure rule]

24 Example (continued)
Start from the following FDs:
1. name → color
2. category → department
3. color, category → price
Infer the following FDs:
Inferred FD                              Which rule did we apply?
4. name, category → name
5. name, category → color
6. name, category → category
7. name, category → color, category
8. name, category → price

25 Example (continued)
1. name → color
2. category → department
3. color, category → price
Answers:
Inferred FD                              Which rule did we apply?
4. name, category → name                 Trivial rule
5. name, category → color                Transitivity on 4, 1
6. name, category → category             Trivial rule
7. name, category → color, category      Split/combine on 5, 6
8. name, category → price                Transitivity on 3, 7
THIS IS TOO HARD! Let’s see an easier way.

26 Closure of a set of Attributes
Given a set of attributes A1, …, An
The closure, {A1, …, An}+ = the set of attributes B s.t. A1, …, An → B
Example:
name → color
category → department
color, category → price
Closures:
name+ = {name, color}
{name, category}+ = {name, category, color, department, price}
color+ = {color}

27 Closure Algorithm
X = {A1, …, An}.
Repeat until X doesn’t change:
  if B1, …, Bn → C is a FD and B1, …, Bn are all in X
  then add C to X.
Example:
name → color
category → department
color, category → price
{name, category}+ = {name, category, color, department, price}
Hence: name, category → color, department, price
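
The closure algorithm translates almost line for line into code. The sketch below is one possible Python rendering, not taken from the slides; representing each FD as a pair of attribute sets and the name `closure` are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of FDs.

    fds is a list of (lhs, rhs) pairs, each side being a set of
    attribute names. Starting from X = attrs, keep adding the right
    side of any FD whose left side is already contained in X, until
    X stops changing.
    """
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x

# FDs from the running example in the slides.
fds = [
    ({"name"}, {"color"}),
    ({"category"}, {"department"}),
    ({"color", "category"}, {"price"}),
]

print(sorted(closure({"name", "category"}, fds)))
# ['category', 'color', 'department', 'name', 'price']
```

To check whether X → A follows from the given FDs, it is then enough to test `A in closure(X, fds)`.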

28 Example
In class:
R(A,B,C,D,E,F)
A, B → C
A, D → E
B → D
A, F → B
Compute {A, B}+: X = {A, B, }
Compute {A, F}+: X = {A, F, }

29 Why Do We Need Closure
With closure we can find all FDs easily
To check if X → A:
– Compute X+
– Check if A ∈ X+

30 Using Closure to Infer ALL FDs
Example:
A, B → C
A, D → B
B → D
Step 1: Compute X+, for every X:
A+ = A, B+ = BD, C+ = C, D+ = D
AB+ = ABCD, AC+ = AC, AD+ = ABCD, BC+ = BCD, BD+ = BD, CD+ = CD
ABC+ = ABD+ = ACD+ = ABCD (no need to compute; why?)
BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all FDs X → Y, s.t. Y ⊆ X+ and X ∩ Y = ∅ (shown here with the largest such Y for each X):
B → D, AB → CD, AD → BC, BC → D, ABC → D, ABD → C, ACD → B
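
The two steps can be automated by computing the closure of every non-empty subset of the attributes. The following Python sketch is an illustration only; the function names and the use of single-letter strings for attributes are assumptions. For each X it reports the largest Y with Y ⊆ X+ and X ∩ Y = ∅, skipping sets whose closure adds nothing.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x

def nontrivial_fds(attributes, fds):
    """Map each attribute set X to X+ - X, whenever that is non-empty."""
    attributes = sorted(attributes)
    result = {}
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            x = frozenset(combo)
            extra = closure(x, fds) - x
            if extra:
                result[x] = frozenset(extra)
    return result

# FDs of the example: A,B -> C; A,D -> B; B -> D.
fds = [("AB", "C"), ("AD", "B"), ("B", "D")]

for x, y in nontrivial_fds("ABCD", fds).items():
    print("".join(sorted(x)), "->", "".join(sorted(y)))
```

The number of subsets grows exponentially with the number of attributes, so this brute-force enumeration is only practical for small schemas like the ones in these examples.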

31 Another Example
Enrollment(student, major, course, room, time)
student → major
major, course → room
course → time
What else can we infer? [in class, or at home]

32 Keys
A superkey is a set of attributes A1, ..., An s.t. for any other attribute B, we have A1, ..., An → B
A key is a minimal superkey
– i.e. a set of attributes which is a superkey and for which no proper subset is a superkey

33 Computing (Super)Keys
Compute X+ for all sets X
If X+ = all attributes, then X is a superkey
List only the minimal X’s: these are the keys

34 Example
Product(name, price, category, color)
name, category → price
category → color
What is the key?

35 Example
Product(name, price, category, color)
name, category → price
category → color
What is the key?
(name, category)+ = name, category, price, color
Hence (name, category) is a key
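
The recipe for computing keys (compute X+ for all X, keep the minimal X whose closure is everything) can be sketched as follows. This is an illustrative Python version under the assumption that FDs are given as pairs of attribute sets; the name `candidate_keys` is not from the slides.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x

def candidate_keys(attributes, fds):
    """Return all keys (minimal superkeys) of a relation.

    Subsets are visited in order of increasing size, so any superkey
    that contains an already-found key is not minimal and is skipped.
    """
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            if any(key <= x for key in keys):
                continue  # contains a smaller key, hence not minimal
            if closure(x, fds) == all_attrs:
                keys.append(x)
    return keys

# Product(name, price, category, color) from the slides.
product = {"name", "price", "category", "color"}
product_fds = [
    ({"name", "category"}, {"price"}),
    ({"category"}, {"color"}),
]

print(candidate_keys(product, product_fds))
```

For the Product example this yields the single key {name, category}, matching the closure computed on the slide.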

36 Examples of Keys
Enrollment(student, address, course, room, time)
student → address
room, time → course
student, course → room, time
(find keys at home)

37 Eliminating Anomalies
Main idea:
X → A is OK if X is a (super)key
X → A is not OK otherwise

38 Example
Name   SSN   PhoneNumber   City
Fred   ...   ...           Seattle
Fred   ...   ...           Seattle
Joe    ...   ...           Westfield
Joe    ...   ...           Westfield
SSN → Name, City
What is the key? {SSN, PhoneNumber}
Hence SSN → Name, City is a “bad” dependency

39 Key or Keys?
Can we have more than one key?
Given R(A,B,C), define FDs s.t. there are two or more keys

40 Key or Keys?
Can we have more than one key?
Given R(A,B,C), define FDs s.t. there are two or more keys:
AB → C, BC → A
or
A → BC, B → AC
What are the keys here?
Can you design FDs such that there are three keys?

41 Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation R is in BCNF if:
  If A1, ..., An → B is a non-trivial dependency in R, then {A1, ..., An} is a superkey for R
In other words: there are no “bad” FDs
Equivalently: ∀ X, either (X+ = X) or (X+ = all attributes)
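
The equivalent formulation (for every X, either X+ = X or X+ = all attributes) gives a direct, if brute-force, BCNF test. The Python sketch below is an illustration; the name `is_bcnf` is an assumption, and closures are intersected with the relation's attributes so that the same FD list can also be used on a sub-relation.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x

def is_bcnf(attributes, fds):
    """Check that every X has X+ = X or X+ = all attributes."""
    all_attrs = set(attributes)
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            # Restrict the closure to attributes of this relation.
            x_plus = closure(x, fds) & all_attrs
            if x_plus != x and x_plus != all_attrs:
                return False  # X determines something but is not a superkey
    return True

# The phone example from the slides: SSN -> Name, City.
fds = [({"SSN"}, {"Name", "City"})]

print(is_bcnf({"Name", "SSN", "PhoneNumber", "City"}, fds))  # False
print(is_bcnf({"Name", "SSN", "City"}, fds))                 # True
print(is_bcnf({"SSN", "PhoneNumber"}, fds))                  # True
```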

42 BCNF Decomposition Algorithm
repeat
  choose A1, …, Am → B1, …, Bn that violates BCNF
  split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
  continue with both R1 and R2
until no more violations
[Figure: R split into R1 (A’s and B’s) and R2 (A’s and Others)]
Is there a 2-attribute relation that is not in BCNF?
In practice, we have a better algorithm (coming up)

43 Example
Name   SSN   PhoneNumber   City
Fred   ...   ...           Seattle
Fred   ...   ...           Seattle
Joe    ...   ...           Westfield
Joe    ...   ...           Westfield
SSN → Name, City
What is the key? {SSN, PhoneNumber}
Use SSN → Name, City to split

44 Example
Name   SSN   City
Fred   ...   Seattle
Joe    ...   Westfield
SSN → Name, City
SSN   PhoneNumber
...   ...
Let’s check anomalies:
Redundancy?
Update?
Delete?

45 Example Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN → name, age
age → hairColor
Decompose in BCNF (in class):

46 BCNF Decomposition Algorithm
BCNF_Decompose(R)
  find X s.t.: X ≠ X+ ≠ [all attributes]
  if (not found) then “R is in BCNF”
  let Y = X+ - X
  let Z = [all attributes] - X+
  decompose R into R1(X ∪ Y) and R2(X ∪ Z)
  continue to decompose recursively R1 and R2

47 Example BCNF Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN → name, age
age → hairColor
Find X s.t.: X ≠ X+ ≠ [all attributes]
Iteration 1: Person
  SSN+ = SSN, name, age, hairColor
  Decompose into: P(SSN, name, age, hairColor), Phone(SSN, phoneNumber)
Iteration 2: P
  age+ = age, hairColor
  Decompose: People(SSN, name, age), Hair(age, hairColor), Phone(SSN, phoneNumber)
What are the keys?
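
The BCNF_Decompose procedure can be written as a short recursive function. The following Python sketch is one possible rendering under stated assumptions: FDs are pairs of attribute sets, violations are searched by brute force over subsets, and closures are computed from the original FDs and then restricted to the attributes of the relation being decomposed. The helper names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return frozenset(x)

def find_violation(relation, fds):
    """Return (X, X+) with X != X+ != all attributes, or None."""
    attrs = sorted(relation)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & relation
            if x_plus != x and x_plus != relation:
                return x, x_plus
    return None

def bcnf_decompose(relation, fds):
    """Recursively split a relation until no violating X remains."""
    relation = frozenset(relation)
    violation = find_violation(relation, fds)
    if violation is None:
        return [relation]             # R is in BCNF
    x, x_plus = violation
    r1 = x_plus                       # X ∪ Y, where Y = X+ - X
    r2 = x | (relation - x_plus)      # X ∪ Z, where Z = all - X+
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)

# Person example from the slides.
person = {"name", "SSN", "age", "hairColor", "phoneNumber"}
person_fds = [({"SSN"}, {"name", "age"}), ({"age"}, {"hairColor"})]

for r in bcnf_decompose(person, person_fds):
    print(sorted(r))
```

On the Person example this produces the same three relations as the slide: (SSN, name, age), (age, hairColor) and (SSN, phoneNumber). In general the result may depend on which violating X is picked first.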

48 Example
R(A,B,C,D)
A → B
B → C
A+ = ABC ≠ ABCD: decompose R(A,B,C,D) into R1(A,B,C) and R2(A,D)
B+ = BC ≠ ABC: decompose R1(A,B,C) into R11(B,C) and R12(A,B)
What are the keys?
What happens if in R we first pick B+? Or AB+?

49 Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) is decomposed into:
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp

50 Theory of Decomposition
Sometimes it is correct:
Name       Price   Category
Gizmo      19.99   Gadget
OneClick   24.99   Camera
Gizmo      19.99   Camera
decomposed into
Name       Price
Gizmo      19.99
OneClick   24.99
Gizmo      19.99
and
Name       Category
Gizmo      Gadget
OneClick   Camera
Gizmo      Camera
Lossless decomposition

51 Incorrect Decomposition
Sometimes it is not:
Name       Price   Category
Gizmo      19.99   Gadget
OneClick   24.99   Camera
Gizmo      19.99   Camera
decomposed into
Name       Category
Gizmo      Gadget
OneClick   Camera
Gizmo      Camera
and
Price   Category
19.99   Gadget
24.99   Camera
19.99   Camera
What’s incorrect?
Lossy decomposition
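
Whether a decomposition loses information on a given instance can be tested by projecting and joining back. The Python sketch below is illustrative only: the helper names are assumptions, tuples are modelled as sets of (attribute, value) pairs, and the check applies to one instance rather than to the schema in general.

```python
def project(rows, attrs):
    """Projection with duplicate elimination (set semantics)."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Natural join of two sets of tuples on their shared attributes."""
    result = set()
    for t in r1:
        for u in r2:
            merged = dict(t)
            # Tuples join only if they agree on every common attribute.
            if all(merged.get(a, v) == v for a, v in u):
                merged.update(dict(u))
                result.add(frozenset(merged.items()))
    return result

def is_lossless_on(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly rows."""
    original = {frozenset(row.items()) for row in rows}
    joined = natural_join(project(rows, attrs1), project(rows, attrs2))
    return joined == original

# Instance from the slides (prices kept as strings).
products = [
    {"Name": "Gizmo",    "Price": "19.99", "Category": "Gadget"},
    {"Name": "OneClick", "Price": "24.99", "Category": "Camera"},
    {"Name": "Gizmo",    "Price": "19.99", "Category": "Camera"},
]

print(is_lossless_on(products, ["Name", "Price"], ["Name", "Category"]))      # True
print(is_lossless_on(products, ["Name", "Category"], ["Price", "Category"]))  # False
```

In the second decomposition the join on Category produces extra tuples such as (OneClick, 19.99, Camera) that were not in the original relation, which is what makes it lossy.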

52 Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) is decomposed into R1(A1, ..., An, B1, ..., Bm) and R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An → B1, ..., Bm
then the decomposition is lossless
Note: don’t need A1, ..., An → C1, ..., Cp
BCNF decomposition is always lossless. WHY?