CSE544 Data Modeling, Conceptual Design

Slides:



Advertisements
Similar presentations
Information Systems Chapter 4 Relational Databases Modelling.
Advertisements

Lecture 6: Design Constraints and Functional Dependencies January 21st, 2004.
1 Lecture 04 Entity/Relationship Modelling. 2 Outline E/R model (Chapter 5) From E/R diagrams to relational schemas (Chapter 5) Constraints in SQL (Chapter.
Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science San Jose State University.
Functional Dependencies Definition: If two tuples agree on the attributes A, A, … A 12n then they must also agree on the attributes B, B, … B 12m Formally:
1 Introduction to Database Systems CSE 444 Lectures 8 & 9 Database Design April 16 & 18, 2008.
Lecture #3 Functional Dependencies Normalization Relational Algebra Thursday, October 12, 2000.
Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science San Jose State University.
1 Lecture 06 The Relational Data Model. 2 Outline Relational Data Model Functional Dependencies FDs in ER Logical Schema Design Reading Chapter 8.
M.P. Johnson, DBMS, Stern/NYU, Spring C : Database Management Systems Lecture #7 Matthew P. Johnson Stern School of Business, NYU Spring,
Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee.
Lecture 9: Conceptual Database Design January 27 th, 2003.
Functional Dependencies and Relational Schema Design.
E/R Diagrams and Functional Dependencies. Modeling Subclasses The world is inherently hierarchical. Some entities are special cases of others We need.
Lecture 08: E/R Diagrams and Functional Dependencies.
Lecture 2: E/R Diagrams and the Relational Model Thursday, January 4, 2001.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
Lecture 09: Functional Dependencies. Outline Functional dependencies (3.4) Rules about FDs (3.5) Design of a Relational schema (3.6)
Conceptual Database Design. Building an Application with a DBMS Requirements modeling (conceptual, pictures) –Decide what entities should be part of the.
1 Introduction to Database Systems CSE 444 Lecture 07 E/R Diagrams October 10, 2007.
Functional Dependencies. Outline Functional dependencies (3.4) Rules about FDs (3.5) Design of a Relational schema (3.6)
1 Lecture 7: Normal Forms, Relational Algebra Monday, 10/15/2001.
1 Lecture 10: Database Design Wednesday, January 26, 2005.
Functional Dependencies and Relational Schema Design.
Lecture 3: Conceptual Database Design and Schema Design April 12 th, 2004.
1 Lecture 10: Database Design and Relational Algebra Monday, October 20, 2003.
1 Lecture 08: E/R Diagrams and Functional Dependencies Friday, January 21, 2005.
CS411 Database Systems Kazuhiro Minami 04: Relational Schema Design.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
1 Lecture 9: Database Design Wednesday, January 25, 2006.
Formal definition of a key A key is a set of attributes A 1,..., A n such that for any other attribute B: A 1,..., A n  B A minimal key is a set of attributes.
Lecture 11: Functional Dependencies
Lecture 5: Conceptual Database Design
Database Design Oct. 3, 2001.
CS422 Principles of Database Systems Normalization
COP4710 Database Systems Relational Design.
CS422 Principles of Database Systems Normalization
Conceptual Database Design
3.1 Functional Dependencies
Problems in Designing Schema
Cse 344 May 14th – Entities.
Introduction to Database Systems CSE 544
Cse 344 May 18th – Loss and views.
Lecture 6: Design Theory
Normalization Murali Mani.
Functional Dependencies and Normalization
Cse 344 May 11th – Entities.
Lecture 2: Database Modeling (end) The Relational Data Model
Lecture 06 Data Modeling: E/R Diagrams
Cse 344 May 16th – Normalization.
Functional Dependencies and Relational Schema Design
Introduction to Database Systems CSE 444 Lectures 8 & 9 Database Design October 12 & 15, 2007.
Lecture 09: Functional Dependencies, Database Design
name category name price makes Company Product stockprice buys employs
Lecture 8: Database Design
Lecture 9: The E/R Model II
Lecture 07: E/R Diagrams and Functional Dependencies
Lecture 10: The E/R Model III
CS4433 Database Systems Relational Design.
Functional Dependencies
Lecture 06: SQL Monday, October 11, 2004.
Chapter 19 (part 1) Functional Dependencies
Terminology Product Attribute names Name Price Category Manufacturer
Lecture 08: E/R Diagrams and Functional Dependencies
Lecture 6: Functional Dependencies
Lecture 11: Functional Dependencies
Lecture 09: Functional Dependencies
CS4222 Principles of Database System
Presentation transcript:

CSE544 Data Modeling, Conceptual Design Wednesday, April 7, 2004

Outline ER diagrams (Chapter 2) Conceptual Design (Chapter 19)

Database Design Data Modeling SQL Tables Refinement Files E/R diagrams Relations

Relational Schema Design Person buys Product name price ssn Conceptual Model: Relational Model: plus FD’s Normalization: Eliminates anomalies

Entity / Relationship Diagrams Attributes address Entity sets Product buys Relationships

name category name price makes Company Product stockprice buys employs Person name ssn address

Keys in E/R Diagrams Every entity set must have a key name category price Product

Multiplicity of E/R Relations one-one: many-one many-many 1 2 3 a b c d 1 2 3 a b c d 1 2 3 a b c d

name category name price makes Company Product stockprice What does this say ? buys employs Person name ssn address

Multi-way Relationships Purchase Product Person Store

Arrows in Multiway Relationships Q: what does the arrow mean ? Rental VideoStore Person Movie Invoice A: if I know the store, person, invoice, I know the movie too

Arrows in Multiway Relationships Q: what do these arrow mean ? Invoice VideoStore Rental Movie Person A: store, person, invoice determines movie and store, invoice, movie determines person

Arrows in Multiway Relationships Q: how do I say: “invoice determines store” ? A: no good way; best approximation: Incomplete (why ?) Rental VideoStore Person Movie Invoice

Roles in Relationships What if we need an entity set twice in one relationship? Product Purchase Store buyer salesperson Person

Attributes on Relationships date Product Purchase Store Person

Converting Multi-way Relationships to Binary ProductOf date Product Purchase StoreOf Store BuyerOf Person Need arrows here ! Which direction ?

From E/R Diagrams to Relational Schema Entity set  relation Relationship  relation

Entity Set to Relation name category price Product Product(name, category, price) name category price gizmo gadgets $19.99

Relationships to Relations price name category Start Year name makes Company Product Stock price Makes(product-name, product-category, company-name, year) (watch out for attribute name conflicts)

Relationships to Relations price name category Start Year name makes Company Product Stock price No need for Makes. Modify Product: Product(name, category, price, startYear, companyName)

Multi-way Relationships to Relations address name Product Purchase price Store name Purchase(prodName,stName,ssn) Person ssn name

Design Principles What’s wrong? Purchase Product Person President Country Person Moral: be faithful!

Design Principles: What’s Wrong? date Product Purchase Store Moral: pick the right kind of entities. personAddr personName

Design Principles: What’s Wrong? date Dates Product Purchase Store Moral: don’t complicate life more than it already is. Person

Subclasses name category price Product isa isa Software Product Educational Product platforms Age Group

Subclasses to Relations Product Name Price Category Gizmo 99 gadget Camera 49 photo Toy 39 Product name category price isa Educational Product Software Product Age Group platforms Sw.Product Name platforms Gizmo unix Ed.Product Name Age Group Gizmo todler Toy retired Notice: subclass = subset Alternative: disjoint classes (Java, C++)

Modeling UnionTypes With Subclasses FurniturePiece Person Company Each piece of furniture is owned either by a person, or by a company

Modeling Union Types with Subclasses Solution 1. Acceptable, imperfect (What’s wrong ?) FurniturePiece Person Company ownedByPerson

Modeling Union Types with Subclasses Solution 2: better Owner isa isa ownedBy Person Company FurniturePiece

Referential Integrity Constraints makes Product Company Each product made by at most one company. Some products made by no company makes Product Company Each product made by exactly one company.

Other Constraints makes <100 Product Company What does this mean ?

Weak Entity Sets Entity sets are weak when their key comes from other classes to which they are related. affiliation Team University sport number name University(name) Team(universityName, number, sport)

Weak Entity Sets R1 E3 E1 A3 A1 A2 R2 B3 E2 R3 What are the keys ? E4

Schema Refinement For the relational model Relation: R(A1, A2, …, Am) Schema: relation name, attribute names Instance: a mathematical m-ary relation Database: R1, R2, …, Rn Schema Instance Schema refinement = normalization

First Normal Form (1NF) A database schema is in First Normal Form if all tables are flat Student Student Name GPA Alice 3.8 Bob 3.7 Carol 3.9 Name GPA Courses Alice 3.8 Bob 3.7 Carol 3.9 Math DB OS Takes Course Student Course Alice Math Carol DB Bob OS Course Math DB OS DB OS May need to add keys Math OS

More Normal Forms Based on Functional Dependencies 2nd Normal Form (obsolete) 3rd Normal Form Boyce Codd Normal Form (BCNF) Based on Multivalued Dependencies 4th Normal Form Based on Join Dependencies 5th Normal Form Discuss next

Functional Dependencies A form of constraint hence, part of the schema Finding them is part of the database design

Functional Dependencies Functional Dependency: A1, A2, …, An  B1, B2, …, Bm Meaning: If two tuples agree on the attributes then they must also agree on the attributes A1, A2, …, An B1, B2, …, Bm

Functional Dependencies Definition: A1, ..., An  B1, ..., Bm holds in R if: t, t’  R, (t.A1=t’.A1  ...  t.An=t’.An  t.B1=t’.B1  ...  t.Bm=t’.Bm ) R A1 ... An B1 Bm t if t, t’ agree here then t, t’ agree here t’

Examples EmpID  Name, Phone, Position Position  Phone but Phone  Position E0045 Smith 1234 Clerk E1847 John 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer

Example Product(name, category, color, department, price) Consider these FDs: name  color category  department color, category  price What do they say ?

Example FD’s are constraints: On some instances they hold name  color On others they don’t name  color category  department color, category  price name category color department price Gizmo Gadget Green Toys 49 Tweaker 99 No: color,category  price doesn’t hold Does this instance satisfy all the FDs ?

Example name  color category  department color, category  price Gizmo Gadget Green Toys 49 Tweaker Black 99 Stationary Office-supp. 59 yes What about this one ?

Inference If some FDs are satisfied, then others are satisfied too name  color category  department color, category  price If all these FDs are true: Then this FD also holds: name, category  price Why ?? We say that the new FD is implied

Armstrong’s Axioms Splitting rule and Combing rule A1, A2, …, An  B1, B2, …, Bm Splitting rule and Combing rule Is equivalent to A1 ... An B1 Bm A1, A2, …, An  B1 A1, A2, …, An  B2 . . . . . A1, A2, …, An  Bm

Armstrong’s Axioms Trivial Rule A1, A2, …, An  Ai where i = 1, 2, ..., n A1 … Am Why ?

Armstrong’s Axioms Transitive Closure Rule If A1, A2, …, An  B1, B2, …, Bm Why ? and B1, B2, …, Bm  C1, C2, …, Cp then A1, A2, …, An  C1, C2, …, Cp A1 … An B1 Bm C1 ... Cp

Example (continued) 1. name  color 2. category  department 3. color, category  price To: name, category  price From: Inferred FD Which Rule did we apply ? 4. name, category  name 5. name, category  color 6. name, category  category 7. name, category  color, category 8. name, category  price

Example (continued) 1. name  color 2. category  department 3. color, category  price Answers: Inferred FD Which Rule did we apply ? 4. name, category  name Trivial rule 5. name, category  color Transitivity on 4, 1 6. name, category  category 7. name, category  color, category Split/combine on 5, 6 8. name, category  price Transitivity on 3, 7

Closure of a Set of FDs Definition. Given a set F of functional dependencies, the closure, F+, denotes all FDs implied by F Theorem. Armstrong axioms are sound and complete for computing F+ What do sound and complete mean ? Sound = if Armstrong’s axioms derive some functional dependency X -> Y, then X->Y is indeed implied by F Complete = if some functional dependency X->Y is implied by F, then there exists a sequence of steps that allow us to derive X->Y from Armstrong’s axioms

Variation Augmentation A1, A2, …, An  B If then A1, A2, …, An , C1, C2, …, Cp  B Augmentation follows from trivial rules and transitivity How ?

Problem: Compute F+ Given F compute its closure F+. How to proceed ? Apply Armstrong’s Axioms repeatedly Better: use the Closure Algorithm for a set of attributes (next)

Closure of a set of Attributes Given a set of attributes A1, …, An The closure, {A1, …, An}+ , is the set of attributes B s.t. A1, …, An  B Example: name  color category  department color, category  price Closures: name+ = {name, color} {name, category}+ = {name, category, color, department, price} color+ = {color}

Closure Algorithm (for Attributes) Start with X={A1, …, An}. Repeat until X doesn’t change do: if B1, …, Bn  C is a FD and B1, …, Bn are all in X then add C to X. Example: name  color category  department color, category  price {name, category}+ = {name, category, color, department, price}

Example In class: A, B  C R(A,B,C,D,E,F) A, D  E B  D A, F  B Compute {A,B}+ X = {A, B, } Compute {A, F}+ X = {A, F, }

Closure Algorithm (for FDs) Example: A, B  C A, D  B B  D Step 1: Compute X+, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ = ABCD, AC+ = AC, AD+ = ABCD ABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?) BCD+ = BCD, ABCD+ = ABCD Step 2: Enumerate all FD’s X  Y, s.t. Y  X+ and XY = : AB  CD, ADBC, ABC  D, ABD  C, ACD  B

Keys A superkey is a set of attributes A1, ..., An s.t. A1, ..., An  B for all attributes B A key is a minimal superkey

Computing Keys Compute X+ for all sets X If X+ = all attributes, then X is a superkey Consider only the minimal superkeys Note: there can be exponentially many keys ! Example: R(A,B,C), ABC, BCA Keys: AB and BC

Examples of Keys Product(name, price, category, color) name, category  price category  color Key: {name, category} Superkeys: supersets Enrollment(student, address, course, room, time) student  address room, time  course student, course  room, time Keys are: [in class] Keys: {student, room, time}, {student, course} and all supersets

Incomplete (what does it say ?) FD’s for E/R Diagrams Say: “the CreditCard determines the Person” Product sname Purchase name Store Incomplete (what does it say ?) card-no CreditCard Person ssn Purchase(name , sname, ssn, card-no) card-no  ssn

Data Anomalies When a database is poorly designed we get anomalies: Redundancy: data is repeated Updated anomalies: need to change in several places Delete anomalies: may lose data when we don’t want Schema refinement means removing the data anomalies.

Data Anomalies Anomalies: Recall set attributes (persons with several phones): Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle 206-555-6543 Joe 987-65-4321 908-555-2121 Westfield SSN  Name, City but not SSN  PhoneNumber Anomalies: Redundancy = repeat data Update anomalies = Fred moves to “Bellevue” Deletion anomalies = Joe deletes his phone number: what is his city ?

Relation Decomposition Break the relation into two: Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle 206-555-6543 Joe 987-65-4321 908-555-2121 Westfield Name SSN City Fred 123-45-6789 Seattle Joe 987-65-4321 Westfield SSN PhoneNumber 123-45-6789 206-555-1234 206-555-6543 987-65-4321 908-555-2121 Anomalies have gone: No more repeated data Easy to move Fred to “Bellevue” (how ?) Easy to delete all Joe’s phone number (how ?)

Decompositions in General R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp) R1 = projection of R on A1, ..., An, B1, ..., Bm R2 = projection of R on A1, ..., An, C1, ..., Cp

Problems With Decomposition Can we get the data back correctly ? Lossless decomposition Discuss next Can we recover the FD’s on the ‘big’ table from the FD’s on the small tables ? Dependency-preserving decomposition Figure out yourself, or read 19.5.2

Lossless Decomposition Sometimes it is correct: Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera Name Price Gizmo 19.99 OneClick 24.99 Name Category Gizmo Gadget OneClick Camera

Lossy Decomposition Sometimes it is not: What’s wrong ?? Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera What’s wrong ?? Name Category Gizmo Gadget OneClick Camera Price Category 19.99 Gadget 24.99 Camera

Decompositions in General R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp) Theorem If A1, ..., An  B1, ..., Bm Then the decomposition is lossless Note: don’t need necessarily A1, ..., An  C1, ..., Cp Example: name  price, hence the first decomposition is lossless

Normal Forms Decomposition into Boyce Codd Normal Form (BCNF) Losselss Decomposition into 3rd Normal Form Losless Dependency preserving

Boyce-Codd Normal Form A simple condition for removing anomalies from relations: A relation R is in BCNF if: If A1, ..., An  B is a non-trivial dependency in R , then {A1, ..., An} is a superkey for R Equivalently: for any set of attributes X, either X+ = X or X+ = all attributes

BCNF Decomposition Algorithm Repeat choose A1, …, Am  B1, …, Bn that violates the BNCF condition split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [rest]) continue with both R1 and R2 Until no more violations Heuristics: choose B1, …, Bn “as large as possible” Is there a 2-attribute relation that is not in BCNF ? B’s A’s rest Note: need to compute the FDs on R1, R2 (how ?) R1 R2

BCNF Example FD: SSN  Name, City Key: {SSN, PhoneNumber} Fred 123-45-6789 206-555-1234 Seattle 206-555-6543 Joe 987-65-4321 908-555-2121 Westfield 908-555-1234 FD: SSN  Name, City Key: {SSN, PhoneNumber} Is it in BCNF? Another way: SSN+ ={SSN, Name, City} but no PhoneNumber

BCNF Example SSN  Name, City Let’s check anomalies: Redundancy ? Fred 123-45-6789 Seattle Joe 987-65-4321 Westfield SSN  Name, City Let’s check anomalies: Redundancy ? Update ? Delete ? SSN PhoneNumber 123-45-6789 206-555-1234 206-555-6543 987-65-4321 908-555-2121 908-555-1234

Example R(A,B,C,D) A  B, B  C Key: AD Violations of BCNF: A  B, A C, ABC, B  C Pick A BC first: split into R1(A,B,C) R2(A,D) In R1 : B  C; split into R11(A,B), R12(B,C) Final answer: R11(A,B), R12(B,C), R2(A,D)

Example (cont’d) R(A,B,C,D) A  B, B  C Order matters ! Pick A C first: R1(A,C), R2(A,B,D) In R2: A  B; decompose into R21(A,B), R22(A,D) Final answer: R1(A,C), R21(A,B), R22(A,D) Which one is better ? Answer: the first decomposition is better, because it is dependency preserving. In the second decomposition we loose the dependency B->C.

BCNF and Dependencies FD’s: Unit  Company; Company, Product  Unit So, there is a BCNF violation, and we decompose. Unit Company Unit  Company Unit Product No FDs In BCNF we loose the FD: Company, Product  Unit

Solution: 3rd Normal Form (3NF) A simple condition for removing anomalies from relations: A relation R is in 3rd normal form if : Whenever there is a nontrivial dependency A1, A2, ..., An  B for R , then {A1, A2, ..., An } a super-key for R, or B is part of a key. Please read in the book !!!

3NF Discussion 3NF decomposition v.s. BCNF decomposition: Tradeoffs Use same decomposition steps, for a while 3NF may stop decomposing, while BCNF continues Tradeoffs BCNF = no anomalies, but may lose some FDs 3NF = keeps all FDs, but may have some anomalies