1 Lecture 02: Conceptual Design Wednesday, October 6, 2010 Dan Suciu -- CSEP544 Fall 2010.

Slides:



Advertisements
Similar presentations
Information Systems Chapter 4 Relational Databases Modelling.
Advertisements

Lecture 6: Design Constraints and Functional Dependencies January 21st, 2004.
1 Lecture 04 Entity/Relationship Modelling. 2 Outline E/R model (Chapter 5) From E/R diagrams to relational schemas (Chapter 5) Constraints in SQL (Chapter.
Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science San Jose State University.
1 Lecture 10: Database Design XML Wednesday, October 20, 2004.
Functional Dependencies Definition: If two tuples agree on the attributes A, A, … A 12n then they must also agree on the attributes B, B, … B 12m Formally:
1 Introduction to Database Systems CSE 444 Lectures 8 & 9 Database Design April 16 & 18, 2008.
Lecture #3 Functional Dependencies Normalization Relational Algebra Thursday, October 12, 2000.
Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science San Jose State University.
1 Lecture 06 The Relational Data Model. 2 Outline Relational Data Model Functional Dependencies FDs in ER Logical Schema Design Reading Chapter 8.
M.P. Johnson, DBMS, Stern/NYU, Spring C : Database Management Systems Lecture #7 Matthew P. Johnson Stern School of Business, NYU Spring,
1 Lecture 3: Database Modeling (continued) April 5, 2002.
M.P. Johnson, DBMS, Stern/NYU, Spring C : Database Management Systems Lecture #4 M.P. Johnson Stern School of Business, NYU Spring, 2008.
M.P. Johnson, DBMS, Stern/NYU, Sp20041 C : Database Management Systems Lecture #5 Matthew P. Johnson Stern School of Business, NYU Spring, 2004.
Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee.
M.P. Johnson, DBMS, Stern/NYU, Sp20041 C : Database Management Systems Lecture #6 Matthew P. Johnson Stern School of Business, NYU Spring, 2004.
Lecture 9: Conceptual Database Design January 27 th, 2003.
Multiplicity in E/R Diagrams
Functional Dependencies and Relational Schema Design.
1 Lecture 4: Database Modeling (end) The Relational Data Model April 8, 2002.
CS411 Database Systems Kazuhiro Minami
Database Design April 3, Projects, More Details Goal: build a DB application. (almost) anything goes. Groups of 3-4. End of week 2: groups formed.
1 Design Theory for Relational Databases Functional Dependencies Decompositions Normal Forms.
CS411 Database Systems Kazuhiro Minami 02: The Entity-Relationship Model.
E/R Diagrams and Functional Dependencies. Modeling Subclasses The world is inherently hierarchical. Some entities are special cases of others We need.
Lecture 08: E/R Diagrams and Functional Dependencies.
1 Schema Design & Refinement (aka Normalization).
Lecture 2: E/R Diagrams and the Relational Model Thursday, January 4, 2001.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
Lecture 09: Functional Dependencies. Outline Functional dependencies (3.4) Rules about FDs (3.5) Design of a Relational schema (3.6)
Conceptual Database Design. Building an Application with a DBMS Requirements modeling (conceptual, pictures) –Decide what entities should be part of the.
1 Introduction to Database Systems CSE 444 Lecture 07 E/R Diagrams October 10, 2007.
Functional Dependencies. FarkasCSCE 5202 Reading and Exercises Database Systems- The Complete Book: Chapter 3.1, 3.2, 3.3., 3.4 Following lecture slides.
Lecture 2: E/R Diagrams and the Relational Model Wednesday, April 3 rd, 2002.
Functional Dependencies. Outline Functional dependencies (3.4) Rules about FDs (3.5) Design of a Relational schema (3.6)
1 Lecture 7: Normal Forms, Relational Algebra Monday, 10/15/2001.
Tallahassee, Florida, 2015 COP4710 Database Systems Relational Design Fall 2015.
1 Database Systems Lecture #4 Yan Pan School of Software, SYSU 2011.
Lectures 5 & 7: Design Theory Lectures 5 & 7. Announcements Homework #1 due today! Homework was not easy You learned a new, declarative way of programming!
1 Lecture 10: Database Design Wednesday, January 26, 2005.
Functional Dependencies and Relational Schema Design.
Lecture 3: Conceptual Database Design and Schema Design April 12 th, 2004.
1 Lecture 10: Database Design and Relational Algebra Monday, October 20, 2003.
1 Lecture 08: E/R Diagrams and Functional Dependencies Friday, January 21, 2005.
CS411 Database Systems Kazuhiro Minami 04: Relational Schema Design.
COMP 430 Intro. to Database Systems Normal Forms Slides use ideas from Chris Ré.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
1 Lecture 9: Database Design Wednesday, January 25, 2006.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
Formal definition of a key A key is a set of attributes A 1,..., A n such that for any other attribute B: A 1,..., A n  B A minimal key is a set of attributes.
Lecture 11: Functional Dependencies
Lecture 5: Conceptual Database Design
Conceptual Database Design
Cse 344 May 14th – Entities.
Introduction to Database Systems CSE 544
Lecture 6: Design Theory
Cse 344 May 11th – Entities.
Lecture 2: Database Modeling (end) The Relational Data Model
Lecture 06 Data Modeling: E/R Diagrams
Cse 344 May 16th – Normalization.
Functional Dependencies and Relational Schema Design
Introduction to Database Systems CSE 444 Lectures 8 & 9 Database Design October 12 & 15, 2007.
Lecture 09: Functional Dependencies, Database Design
Lecture 8: Database Design
Lecture 07: E/R Diagrams and Functional Dependencies
CSE544 Data Modeling, Conceptual Design
Lecture 08: E/R Diagrams and Functional Dependencies
Lecture 6: Functional Dependencies
Lecture 09: Functional Dependencies
Presentation transcript:

1 Lecture 02: Conceptual Design Wednesday, October 6, 2010 Dan Suciu -- CSEP544 Fall 2010

Nulls count(category) != count(*) WHY ? Office hours: Thursdays, 5-6pm Dan Suciu -- CSEP544 Fall 20102

Announcements Homework 2 is posted: due October 19 th You need to create tables, import data: –On SQL Server, in your own database, OR –On postgres (we will use it for Project 2) Follow Web instructions for importing data Read book about CREATE TABLE, INSERT, DELETE, UPDATE Dan Suciu -- CSEP544 Fall 20103

Discussion SQL Databases v. NoSQL Databases, Mike Stonebraker What are “No-SQL Databases” ? What are the two main types of workloads in a database ? (X and Y) How can one improve performance of X ? Where does the time of a single server go ? What are “single-record transactions” ? 4

5 Outline E/R diagrams From E/R diagrams to relations Dan Suciu -- CSEP544 Fall 2010

6 Database Design Why do we need it? – Agree on structure of the database before deciding on a particular implementation. Consider issues such as: –What entities to model –How entities are related –What constraints exist in the domain –How to achieve good designs Several formalisms exists –We discuss E/R diagrams Dan Suciu -- CSEP544 Fall 2010

7 Entity / Relationship Diagrams Objects  entities Classes  entity sets Attributes: Relationships - first class citizens (not associated with classes) - not necessarily binary Product address buys Dan Suciu -- CSEP544 Fall 2010

Person Company Product 8

9 Person Company Product prod-IDcategory price address namessn stockprice name

10 Person Company Product buys makes employs prod-IDcategory price address namessn stockprice name

11 Keys in E/R Diagrams Every entity set must have a key May be a multi-attribute key: Product prod-IDcategory price Dan Suciu -- CSEP544 Fall 2010 Order prod-IDcust-ID date

12 What is a Relation ? A mathematical definition: –if A, B are sets, then a relation R is a subset of A  B A={1,2,3}, B={a,b,c,d}, A  B = {(1,a),(1,b),..., (3,d)} R = {(1,a), (1,c), (3,b)} - makes is a subset of Product  Company: a b c d A= B= makes CompanyProduct

13 Multiplicity of E/R Relations one-one: many-one many-many abcdabcd abcdabcd abcdabcd Dan Suciu -- CSEP544 Fall 2010

Notation in Class v.s. the Book 14 makes CompanyProduct makes CompanyProduct In class: In the book:

15 address namessn Person buys makes employs Company Product prod-IDcategory stockprice name price What does this say ?

16 Multi-way Relationships Purchase Product Person Store Dan Suciu -- CSEP544 Fall 2010 date

17 Converting Multi-way Relationships to Binary Purchase Person Store Product StoreOf ProductOf BuyerOf date Arrows are missing: which ones ?

18 3. Design Principles Purchase Product Person What’s wrong? President PersonCountry Moral: be faithful! Dan Suciu -- CSEP544 Fall 2010

19 Design Principles: What’s Wrong? Purchase Product Store date personName personAddr Moral: pick the right kind of entities.

20 Design Principles: What’s Wrong? Purchase Product Person Store date Dates Moral: don’t complicate life more than it already is.

21 From E/R Diagrams to Relational Schema Entity set  relation Relationship  relation Dan Suciu -- CSEP544 Fall 2010

Entity Set to Relation Product prod-IDcategory price Product(prod-ID, category, price) 22Dan Suciu -- CSEP544 Fall 2010 prod-IDcategoryprice Gizmo55Camera99.99 Pokemn19Toy29.99

Create Table (SQL) Dan Suciu -- CSEP544 Fall CREATE TABLE Product ( prod-ID CHAR(30) PRIMARY KEY, category VARCHAR(20), price double) CREATE TABLE Product ( prod-ID CHAR(30) PRIMARY KEY, category VARCHAR(20), price double)

Relationships to Relations 24 Orders prod-IDcust-ID date Shipment Shipping-Co address name Shipment(prod-ID,cust-ID, name, date) date Dan Suciu -- CSEP544 Fall 2010 prod-IDcust-IDnamedate Gizmo55Joe12UPS4/10/2010 Gizmo55Joe12FEDEX4/9/2010

Create Table (SQL) Dan Suciu -- CSEP544 Fall CREATE TABLE Shipment( name CHAR(30) REFERENCES Shipping-Co, prod-ID CHAR(30), cust-ID VARCHAR(20), date DATETIME, PRIMARY KEY (name, prod-ID, cust-ID), FOREIGN KEY (prod-ID, cust-ID) REFERENCES Orders ) CREATE TABLE Shipment( name CHAR(30) REFERENCES Shipping-Co, prod-ID CHAR(30), cust-ID VARCHAR(20), date DATETIME, PRIMARY KEY (name, prod-ID, cust-ID), FOREIGN KEY (prod-ID, cust-ID) REFERENCES Orders )

26 Multi-way Relationships to Relations Purchase Product Person Store prod-ID price ssnname address Dan Suciu -- CSEP544 Fall 2010 How do we represent that in a relation ?

27 Modeling Subclasses Products Software products Educational products Dan Suciu -- CSEP544 Fall 2010

28 Product namecategory price isa Educational ProductSoftware Product Age Groupplatforms Subclasses Dan Suciu -- CSEP544 Fall 2010

29 Understanding Subclasses Think in terms of records: –Product –SoftwareProduct –EducationalProduct field1 field2 field1 field2 field1 field2 field3 field4 field5

Subclasses to Relations Product namecategory price isa Educational ProductSoftware Product Age Groupplatforms NamePriceCategory Gizmo99gadget Camera49photo Toy39gadget Nameplatforms Gizmounix Name Age Group Gizmotodler Toyretired Product Sw.Product Ed.Product Dan Suciu -- CSEP544 Fall 2010

31 Modeling UnionTypes With Subclasses FurniturePiece Person Company Say: each piece of furniture is owned either by a person, or by a company Dan Suciu -- CSEP544 Fall 2010

32 Modeling Union Types with Subclasses Say: each piece of furniture is owned either by a person, or by a company Solution 1. Acceptable (What’s wrong ?) FurniturePiecePerson Company ownedByPerson Dan Suciu -- CSEP544 Fall 2010

33 Modeling Union Types with Subclasses Solution 2: More faithful isa FurniturePiece Person Company ownedBy Owner isa Dan Suciu -- CSEP544 Fall 2010

34 Constraints in E/R Diagrams Finding constraints is part of the modeling process. Commonly used constraints: Keys: social security number uniquely identifies a person. Single-value constraints: a person can have only one father. Referential integrity constraints: if you work for a company, it must exist in the database. Other constraints: peoples’ ages are between 0 and 150. Dan Suciu -- CSEP544 Fall 2010

35 Keys in E/R Diagrams Product namecategory price Underline: Multi-attribute key v.s. Multiple keys Not possible in E/R

36 Single Value Constraints makes v. s. Dan Suciu -- CSEP544 Fall 2010

37 Referential Integrity Constraints CompanyProduct makes CompanyProduct makes Each product made by at most one company. Some products made by no company Each product made by exactly one company.

38 Other Constraints CompanyProduct makes <100 What does this mean ? Dan Suciu -- CSEP544 Fall 2010

39 Weak Entity Sets Entity sets are weak when their key comes from other classes to which they are related. UniversityTeam affiliation numbersportname Notice: we encountered this when converting multiway relationships to binary relationships

40 Handling Weak Entity Sets UniversityTeam affiliation numbersportname How do we represent this with relations ? Dan Suciu -- CSEP544 Fall 2010

41 Weak Entity Sets Weak entity set = entity where part of the key comes from another number sport name Convert to a relational schema (in class) affiliation Team University

42 What Are the Keys of R ? R A B S T V Q UW V Z C D E G K H F L

Design Theory Dan Suciu -- CSEP544 Fall

44 Schema Refinements = Normal Forms 1st Normal Form = all tables are flat 2nd Normal Form = obsolete Boyce Codd Normal Form = will study 3rd Normal Form = see book Dan Suciu -- CSEP544 Fall 2010

45 First Normal Form (1NF) A database schema is in First Normal Form if all tables are flat NameGPACourses Alice3.8 Bob3.7 Carol3.9 Math DB OS DB OS Math OS Student NameGPA Alice3.8 Bob3.7 Carol3.9 Student Course Math DB OS StudentCourse AliceMath CarolMath AliceDB BobDB AliceOS CarolOS Takes Course May need to add keys

46 Relational Schema Design Person buys Product name pricenamessn Conceptual Model: Relational Model: plus FD’s Normalization: Eliminates anomalies Dan Suciu -- CSEP544 Fall 2010

47 Data Anomalies When a database is poorly designed we get anomalies: Redundancy: data is repeated Updated anomalies: need to change in several places Delete anomalies: may lose data when we don’t want Dan Suciu -- CSEP544 Fall 2010

48 Relational Schema Design Anomalies: Redundancy = repeat data Update anomalies = Fred moves to “Bellevue” Deletion anomalies = Joe deletes his phone number: what is his city ? Recall set attributes (persons with several phones): NameSSNPhoneNumberCity Fred Seattle Fred Seattle Joe Westfield One person may have multiple phones, but lives in only one city

49 Relation Decomposition Break the relation into two: NameSSNCity Fred Seattle Joe Westfield SSNPhoneNumber Anomalies have gone: No more repeated data Easy to move Fred to “Bellevue” (how ?) Easy to delete all Joe’s phone number (how ?) NameSSNPhoneNumberCity Fred Seattle Fred Seattle Joe Westfield

50 Relational Schema Design (or Logical Design) Main idea: Start with some relational schema Find out its functional dependencies Use them to design a better relational schema Dan Suciu -- CSEP544 Fall 2010

51 Functional Dependencies A form of constraint –hence, part of the schema Finding them is part of the database design Also used in normalizing the relations Dan Suciu -- CSEP544 Fall 2010

52 Functional Dependencies Definition: If two tuples agree on the attributes then they must also agree on the attributes Formally: A 1, A 2, …, A n  B 1, B 2, …, B m A 1, A 2, …, A n B 1, B 2, …, B m Dan Suciu -- CSEP544 Fall 2010

53 When Does an FD Hold Definition: A 1,..., A m  B 1,..., B n holds in R if:  t, t’  R, (t.A 1 =t’.A 1 ...  t.A m =t’.A m  t.B 1 =t’.B 1 ...  t.B n =t’.B n ) A1A1...AmAm B1B1 nmnm if t, t’ agree here then t, t’ agree here t t’ R

54 Examples EmpID  Name, Phone, Position Position  Phone but not Phone  Position An FD holds, or does not hold on an instance: EmpIDNamePhonePosition E0045Smith1234Clerk E3542Mike9876Salesrep E1111Smith9876Salesrep E9999Mary1234Lawyer Dan Suciu -- CSEP544 Fall 2010

55 Example Position  Phone EmpIDNamePhonePosition E0045Smith1234Clerk E3542Mike 9876  Salesrep E1111Smith 9876  Salesrep E9999Mary1234Lawyer Dan Suciu -- CSEP544 Fall 2010

56 Example EmpIDNamePhonePosition E0045Smith 1234  Clerk E3542Mike9876Salesrep E1111Smith9876Salesrep E9999Mary 1234  Lawyer but not Phone  Position Dan Suciu -- CSEP544 Fall 2010

57 Example FD’s are constraints: On some instances they hold On others they don’t namecategorycolordepartmentprice GizmoGadgetGreenToys49 TweakerGadgetGreenToys99 Does this instance satisfy all the FDs ? name  color category  department color, category  price name  color category  department color, category  price Dan Suciu -- CSEP544 Fall 2010

58 Example namecategorycolordepartmentprice GizmoGadgetGreenToys49 TweakerGadgetBlackToys99 GizmoStationaryGreenOffice-supp.59 What about this one ? name  color category  department color, category  price name  color category  department color, category  price

59 An Interesting Observation If all these FDs are true: name  color category  department color, category  price name  color category  department color, category  price Then this FD also holds: name, category  price Why ??

60 Goal: Find ALL Functional Dependencies Anomalies occur when certain “bad” FDs hold We know some of the FDs Need to find all FDs, then look for the bad ones Dan Suciu -- CSEP544 Fall 2010

61 Armstrong’s Rules (1/3) Is equivalent to Splitting rule and Combing rule A1...AmB1...Bm A 1, A 2, …, A n  B 1, B 2, …, B m A 1, A 2, …, A n  B 1 A 1, A 2, …, A n  B A 1, A 2, …, A n  B m A 1, A 2, …, A n  B 1 A 1, A 2, …, A n  B A 1, A 2, …, A n  B m Dan Suciu -- CSEP544 Fall 2010

62 Armstrong’s Rules (2/3) Trivial Rule Why ? A1A1 …AmAm where i = 1, 2,..., n A 1, A 2, …, A n  A i Dan Suciu -- CSEP544 Fall 2010

63 Armstrong’s Rules (3/3) Transitive Closure Rule If and then Why ? A 1, A 2, …, A n  B 1, B 2, …, B m B 1, B 2, …, B m  C 1, C 2, …, C p A 1, A 2, …, A n  C 1, C 2, …, C p Dan Suciu -- CSEP544 Fall 2010

64 A1A1 …AmAm B1B1 …BmBm C1C1...CpCp Dan Suciu -- CSEP544 Fall 2010

65 Example (continued) Start from the following FDs: Infer the following FDs: 1. name  color 2. category  department 3. color, category  price 1. name  color 2. category  department 3. color, category  price Inferred FD Which Rule did we apply ? 4. name, category  name 5. name, category  color 6. name, category  category 7. name, category  color, category 8. name, category  price

66 Example (continued) Answers: Inferred FD Which Rule did we apply ? 4. name, category  name Trivial rule 5. name, category  color Transitivity on 4, 1 6. name, category  category Trivial rule 7. name, category  color, category Split/combine on 5, 6 8. name, category  price Transitivity on 3, 7 1. name  color 2. category  department 3. color, category  price 1. name  color 2. category  department 3. color, category  price THIS IS TOO HARD ! Let’s see an easier way.

67 Closure of a set of Attributes Given a set of attributes A 1, …, A n The closure, {A 1, …, A n } + = the set of attributes B s.t. A 1, …, A n  B Given a set of attributes A 1, …, A n The closure, {A 1, …, A n } + = the set of attributes B s.t. A 1, …, A n  B name  color category  department color, category  price name  color category  department color, category  price Example: Closures: name + = {name, color} {name, category} + = {name, category, color, department, price} color + = {color}

68 Closure Algorithm X={A1, …, An}. Repeat until X doesn’t change do: if B 1, …, B n  C is a FD and B 1, …, B n are all in X then add C to X. X={A1, …, An}. Repeat until X doesn’t change do: if B 1, …, B n  C is a FD and B 1, …, B n are all in X then add C to X. {name, category} + = { } name  color category  department color, category  price name  color category  department color, category  price Example: Hence: name, category  color, department, price

69 Example Compute {A,B} + X = {A, B, } Compute {A, F} + X = {A, F, } R(A,B,C,D,E,F) A, B  C A, D  E B  D A, F  B A, B  C A, D  E B  D A, F  B In class: Dan Suciu -- CSEP544 Fall 2010

70 Why Do We Need Closure With closure we can find all FD’s easily To check if X  A –Compute X + –Check if A  X + Dan Suciu -- CSEP544 Fall 2010

71 Using Closure to Infer ALL FDs A, B  C A, D  B B  D A, B  C A, D  B B  D Example: Step 1: Compute X +, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ =ABCD, AC+=AC, AD+=ABCD, BC+=BCD, BD+=BD, CD+=CD ABC+ = ABD+ = ACD + = ABCD (no need to compute– why ?) BCD + = BCD, ABCD+ = ABCD A+ = A, B+ = BD, C+ = C, D+ = D AB+ =ABCD, AC+=AC, AD+=ABCD, BC+=BCD, BD+=BD, CD+=CD ABC+ = ABD+ = ACD + = ABCD (no need to compute– why ?) BCD + = BCD, ABCD+ = ABCD Step 2: Enumerate all FD’s X  Y, s.t. Y  X + and X  Y =  : AB  CD, AD  BC, ABC  D, ABD  C, ACD  B

72 Another Example Enrollment(student, major, course, room, time) student  major major, course  room course  time What else can we infer ? [in class, or at home] Dan Suciu -- CSEP544 Fall 2010

73 Keys A superkey is a set of attributes A 1,..., A n s.t. for any other attribute B, we have A 1,..., A n  B A key is a minimal superkey –I.e. set of attributes which is a superkey and for which no subset is a superkey Dan Suciu -- CSEP544 Fall 2010

74 Computing (Super)Keys Compute X + for all sets X If X + = all attributes, then X is a key List only the minimal X’s Dan Suciu -- CSEP544 Fall 2010

75 Example Product(name, price, category, color) name, category  price category  color name, category  price category  color What is the key ? Dan Suciu -- CSEP544 Fall 2010

76 Example Product(name, price, category, color) name, category  price category  color name, category  price category  color What is the key ? (name, category) + = name, category, price, color Hence (name, category) is a key

77 Examples of Keys Enrollment(student, address, course, room, time) student  address room, time  course student, course  room, time student  address room, time  course student, course  room, time (find keys at home) Dan Suciu -- CSEP544 Fall 2010

78 Eliminating Anomalies Main idea: X  A is OK if X is a (super)key X  A is not OK otherwise Dan Suciu -- CSEP544 Fall 2010

79 Example What the key?} NameSSNPhoneNumberCity Fred Seattle Fred Seattle Joe Westfield Joe Westfield SSN  Name, City

80 Example What the key? {SSN, PhoneNumber} NameSSNPhoneNumberCity Fred Seattle Fred Seattle Joe Westfield Joe Westfield SSN  Name, City Hence SSN  Name, City is a “bad” dependency

81 Key or Keys ? Can we have more than one key ? Given R(A,B,C) define FD’s s.t. there are two or more keys Dan Suciu -- CSEP544 Fall 2010

82 Key or Keys ? Can we have more than one key ? Given R(A,B,C) define FD’s s.t. there are two or more keys AB  C BC  A A  BC B  AC or what are the keys here ? Can you design FDs such that there are three keys ?

83 Boyce-Codd Normal Form A simple condition for removing anomalies from relations: In other words: there are no “bad” FDs A relation R is in BCNF if: If A 1,..., A n  B is a non-trivial dependency in R, then {A 1,..., A n } is a superkey for R A relation R is in BCNF if: If A 1,..., A n  B is a non-trivial dependency in R, then {A 1,..., A n } is a superkey for R Equivalently:  X, either (X + = X) or (X + = all attributes) Dan Suciu -- CSEP544 Fall 2010

84 BCNF Decomposition Algorithm A’s Others B’s R1R1 Is there a 2-attribute relation that is not in BCNF ? repeat choose A 1, …, A m  B 1, …, B n that violates BNCF split R into R 1 (A 1, …, A m, B 1, …, B n ) and R 2 (A 1, …, A m, [others]) continue with both R 1 and R 2 until no more violations R2R2 In practice, we have a better algorithm (coming up)

85 Example What the key? {SSN, PhoneNumber} NameSSNPhoneNumberCity Fred Seattle Fred Seattle Joe Westfield Joe Westfield SSN  Name, City use SSN  Name, City to split Dan Suciu -- CSEP544 Fall 2010

86 Example NameSSNCity Fred Seattle Joe Westfield SSNPhoneNumber SSN  Name, City Let’s check anomalies: Redundancy ? Update ? Delete ? Dan Suciu -- CSEP544 Fall 2010

87 Example Decomposition Person(name, SSN, age, hairColor, phoneNumber) SSN  name, age age  hairColor Decompose in BCNF (in class): Dan Suciu -- CSEP544 Fall 2010

88 BCNF Decomposition Algorithm BCNF_Decompose(R) find X s.t.: X ≠X + ≠ [all attributes] if (not found) then “R is in BCNF” let Y = X + - X let Z = [all attributes] - X + decompose R into R1(X  Y) and R2(X  Z) continue to decompose recursively R1 and R2 BCNF_Decompose(R) find X s.t.: X ≠X + ≠ [all attributes] if (not found) then “R is in BCNF” let Y = X + - X let Z = [all attributes] - X + decompose R into R1(X  Y) and R2(X  Z) continue to decompose recursively R1 and R2

89 Example BCNF Decomposition Person(name, SSN, age, hairColor, phoneNumber) SSN  name, age age  hairColor Find X s.t.: X ≠X + ≠ [all attributes] Dan Suciu -- CSEP544 Fall 2010

90 Example BCNF Decomposition Person(name, SSN, age, hairColor, phoneNumber) SSN  name, age age  hairColor Find X s.t.: X ≠X + ≠ [all attributes] What are the keys ? Dan Suciu -- CSEP544 Fall 2010 Iteration 1: Person: SSN+ = SSN, name, age, hairColor Decompose into: P(SSN, name, age, hairColor) Phone(SSN, phoneNumber) Iteration 2: P: age+ = age, hairColor Decompose: People(SSN, name, age) Hair(age, hairColor) Phone(SSN, phoneNumber) Iteration 1: Person: SSN+ = SSN, name, age, hairColor Decompose into: P(SSN, name, age, hairColor) Phone(SSN, phoneNumber) Iteration 2: P: age+ = age, hairColor Decompose: People(SSN, name, age) Hair(age, hairColor) Phone(SSN, phoneNumber)

91 Example A  B B  C R(A,B,C,D) A + = ABC ≠ ABCD R(A,B,C,D) Dan Suciu -- CSEP544 Fall 2010

92 Example What are the keys ? A  B B  C R(A,B,C,D) A + = ABC ≠ ABCD R(A,B,C,D) What happens if in R we first pick B + ? Or AB + ? R 1 (A,B,C) B + = BC ≠ ABC R 2 (A,D) R 11 (B,C) R 12 (A,B) Dan Suciu -- CSEP544 Fall 2010

93 Decompositions in General R 1 = projection of R on A 1,..., A n, B 1,..., B m R 2 = projection of R on A 1,..., A n, C 1,..., C p R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) Dan Suciu -- CSEP544 Fall 2010

94 Theory of Decomposition Sometimes it is correct: NamePriceCategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera NamePrice Gizmo19.99 OneClick24.99 Gizmo19.99 NameCategory GizmoGadget OneClickCamera GizmoCamera Lossless decomposition

95 Incorrect Decomposition Sometimes it is not: NamePriceCategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera NameCategory GizmoGadget OneClickCamera GizmoCamera PriceCategory 19.99Gadget 24.99Camera 19.99Camera What’s incorrect ?? Lossy decomposition

96 Decompositions in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) If A 1,..., A n  B 1,..., B m Then the decomposition is lossless R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) BCNF decomposition is always lossless. WHY ? Note: don’t need A 1,..., A n  C 1,..., C p