M.P. Johnson, DBMS, Stern/NYU, Spring 2005
C20.0046: Database Management Systems
Lecture #7
Matthew P. Johnson
Stern School of Business, NYU
Spring 2005



Agenda
Last time: FDs
This time:
1. Combining FDs
2. Anomalies
3. Normalization
Next time: Relational Algebra
Now: review

Closure Algorithm
Start with X = {A1, …, An}.
Repeat:
  if B1, …, Bm → C is an FD and B1, …, Bm are all in X, then add C to X
until X no longer changes.

Example:
name → color
category → department
color, category → price

{name, category}+ = {name, category, color, department, price}
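As a minimal sketch of the loop above, assuming FDs are represented as (left side, right side) pairs of attribute collections (a representation chosen here for illustration, not taken from the lecture), the closure could be computed like this:

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of
    attribute names. This representation is an assumption of the sketch.
    """
    result = set(attrs)
    changed = True
    while changed:                      # repeat until X stops growing
        changed = False
        for lhs, rhs in fds:
            # If every left-side attribute is already in X, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    ({"name"}, {"color"}),
    ({"category"}, {"department"}),
    ({"color", "category"}, {"price"}),
]
print(sorted(closure({"name", "category"}, fds)))
# prints ['category', 'color', 'department', 'name', 'price']
```

The loop runs at most once per attribute that can still be added, so it always terminates.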

Example
In class:
R(A, B, C, D, E, F)
A, B → C
A, D → E
B → D
A, F → B

Compute {A, B}+: X = {A, B, ____}
Compute {A, F}+: X = {A, F, ____}

More definitions
Given FDs versus derived FDs
- Given FDs: stated initially for the relation
- Derived FDs: those that are inferred using the rules or through closure
Basis: a set of FDs from which we can infer all other FDs for the relation
Minimal basis: a minimal set of FDs (can't discard any of them)
- No proper subset of the minimal basis can derive the complete set of FDs

Problem: find all FDs
Given a relation instance and a set of given FDs
Find all FDs satisfied by that instance
Useful if we don't get enough information from our users: need to reverse engineer a data instance
Q: How long does this take?
A: Some time for each subset of attributes
Q: How many subsets?
- powerset
- exponential time in the worst case
But can often be smarter…
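To make "satisfied by that instance" concrete, here is a small sketch that tests one candidate FD against a data instance. The function name fd_holds and the list-of-dicts row format are assumptions made for this illustration; the rows are the Name/SSN/Mailing-address/Phone tuples used in this lecture's anomalies example.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by every pair of rows.

    rows is a list of dicts (an assumed format); lhs and rhs are
    lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


people = [
    {"name": "Michael", "ssn": 123, "address": "NY", "phone": "212-111-1111"},
    {"name": "Michael", "ssn": 123, "address": "NY", "phone": "917-111-1111"},
    {"name": "Hilary", "ssn": 456, "address": "DC", "phone": "202-222-2222"},
    {"name": "Hilary", "ssn": 456, "address": "DC", "phone": "914-222-2222"},
    {"name": "Bill", "ssn": 789, "address": "Chappaqua", "phone": "914-222-2222"},
    {"name": "Bill", "ssn": 789, "address": "Chappaqua", "phone": "212-333-3333"},
]
print(fd_holds(people, ["ssn"], ["name", "address"]))  # prints True
print(fd_holds(people, ["ssn"], ["phone"]))            # prints False
```

A relation with n attributes has 2^n attribute subsets to try as left sides, which is where the exponential worst case comes from. Note also that an instance can only refute an FD; an FD that happens to hold in one instance may still be false of the real world.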

Using Closure to Infer ALL FDs
Example:
A, B → C
A, D → B
B → D

Compute X+ for every set X (AB is shorthand for {A, B}):
A+ = A, B+ = BD, C+ = C, D+ = D
AB+ = ABCD, AC+ = AC, AD+ = ABCD, BC+ = BCD, BD+ = BD, CD+ = CD
ABC+ = ABD+ = ACD+ = ABCD (no need to compute; why?)
BCD+ = BCD, ABCD+ = ABCD
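The table of closures for this example can be reproduced mechanically. The sketch below assumes single-letter attribute names so that a string such as "AB" can stand for the set {A, B}; the helper name all_closures is made up for this illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def all_closures(attributes, fds):
    """Map every non-empty attribute subset (as a string) to its closure."""
    table = {}
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            table["".join(combo)] = "".join(sorted(closure(combo, fds)))
    return table


# FDs from the slide: AB -> C, AD -> B, B -> D
fds = [("AB", "C"), ("AD", "B"), ("B", "D")]
for subset, closed in all_closures("ABCD", fds).items():
    print(f"{subset}+ = {closed}")
```

Every pair (X, X+) with X+ larger than X yields nontrivial FDs X → Y for Y inside X+. The loop visits all 15 non-empty subsets, which illustrates why the brute-force approach is exponential in the number of attributes.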

Example
R(A, B, C)
Given FDs: AB → C, BC → A
Q: What are the FDs?
- Closure of singleton sets
- Closure of doubletons
Q: What are the keys?
Q: What are the minimal FD bases?

Computing Keys
Rephrasing of definition of key: X is a key if
- X+ = all attributes
- X is minimal
Note: there can be many minimal keys!
Example: R(A, B, C), AB → C, BC → A
Minimal keys: AB and BC
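The two conditions translate directly into a brute-force search: try attribute sets from smallest to largest, and keep those whose closure is everything and which contain no smaller key. This is only a sketch under that assumption (the name candidate_keys is invented here), and it is exponential in the number of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal keys, smallest first."""
    attributes = frozenset(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            x = frozenset(combo)
            if any(k <= x for k in keys):
                continue              # contains a smaller key: not minimal
            if closure(x, fds) >= attributes:
                keys.append(x)        # X+ = all attributes
    return keys


keys = candidate_keys("ABC", [("AB", "C"), ("BC", "A")])
print(["".join(sorted(k)) for k in keys])  # prints ['AB', 'BC']
```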

Examples of Keys
Product(name, price, category, color)
FDs are:
name, category → price
category → color
Keys are: {name, category}

Enrollment(student, address, course, room, time)
FDs are:
student → address
room, time → course
student, course → room, time
Keys are:

Next topic: Anomalies
Identify anomalies in existing schema
How to decompose a relation
Boyce-Codd Normal Form (BCNF)
Recovering information from a decomposition
Third Normal Form

Types of anomalies
Redundancy
- Repeat info unnecessarily in several tuples
Update anomalies:
- Change info in one tuple but not in another
Deletion anomalies:
- Delete some values and lose other values too
Insert anomalies:
- Inserting a row means having to insert other, separate info, or null-ing it out

Example of anomalies

Name     SSN  Mailing-address  Phone
Michael  123  NY               212-111-1111
Michael  123  NY               917-111-1111
Hilary   456  DC               202-222-2222
Hilary   456  DC               914-222-2222
Bill     789  Chappaqua        914-222-2222
Bill     789  Chappaqua        212-333-3333

SSN → Name, Mailing-address
SSN ↛ Phone

Redundancy: name, mailing address
Update anomaly: Bill moves
Delete anomaly: Bill doesn't pay bills, loses phones → lose Bill!
Insert anomaly: can't insert someone without a (non-null) phone
Underlying cause: SSN-phone is many-many
Effect: partial dependency ssn → name, mailing address
- Whereas key = {ssn, phone}

Decomposition by projection
Solution: replace anomalous R with projections of R onto two subsets of attributes
Projection: an operation in Relational Algebra
- Corresponds to the column list of the SELECT clause in SQL
Projecting R onto attributes (A1, …, An) means removing all other attributes
- Result of projection is another relation
- Yields tuples whose fields are A1, …, An
- Resulting duplicate tuples are dropped
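A short sketch of projection with duplicate elimination follows. It assumes a relation is held as a list of dicts and that the result is a set of tuples; both choices, and the name project, are assumptions of this illustration. The rows are the Name/SSN/Mailing-address/Phone tuples from the anomalies example.

```python
def project(rows, attrs):
    """Project rows onto attrs; using a set drops duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}


people = [
    {"name": "Michael", "ssn": 123, "address": "NY", "phone": "212-111-1111"},
    {"name": "Michael", "ssn": 123, "address": "NY", "phone": "917-111-1111"},
    {"name": "Hilary", "ssn": 456, "address": "DC", "phone": "202-222-2222"},
    {"name": "Hilary", "ssn": 456, "address": "DC", "phone": "914-222-2222"},
    {"name": "Bill", "ssn": 789, "address": "Chappaqua", "phone": "914-222-2222"},
    {"name": "Bill", "ssn": 789, "address": "Chappaqua", "phone": "212-333-3333"},
]

r1 = project(people, ("name", "ssn", "address"))
r2 = project(people, ("ssn", "phone"))
print(len(r1), len(r2))  # prints 3 6
```

Six input rows collapse to three in the first projection because each person's name and address were repeated once per phone.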

Projection for decomposition
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)

R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp
{A1, ..., An} ∪ {B1, ..., Bm} ∪ {C1, ..., Cp} = all attributes, usually disjoint sets
R1 and R2 may or may not be reassembled to produce the original R

Decomposition example

Name     SSN  Mailing-address  Phone
Michael  123  NY               212-111-1111
Michael  123  NY               917-111-1111
Hilary   456  DC               202-222-2222
Hilary   456  DC               914-222-2222
Bill     789  Chappaqua        914-222-2222
Bill     789  Chappaqua        212-333-3333

Break the relation into two:

Name     SSN  Mailing-address
Michael  123  NY
Hilary   456  DC
Bill     789  Chappaqua

SSN  Phone
123  212-111-1111
123  917-111-1111
456  202-222-2222
456  914-222-2222
789  914-222-2222
789  212-333-3333

The anomalies are gone
- No more redundant data
- Easy for Bill to move
- Okay for Bill to lose all phones

Thus: high-level strategy
Conceptual Model: [diagram: Person buys Product, with attributes name, price, name, ssn]
Relational Model: plus FDs
Normalization: eliminates anomalies

Using FDs to produce good schemas
1. Start with set of relations
2. Define FDs (and keys) for them based on real world
3. Transform your relations to "normal form" (normalize them)
- Do this using "decomposition"
Intuitively, good design means
- No anomalies
- Can reconstruct all (and only the) original information

Decomposition terminology
Projection: eliminating certain attributes from a relation
Decomposition: separating a relation into two by projection
Join: (re)assembling two relations
- Whenever a row from R1 and a row from R2 have the same values for the shared attributes A, join them together to form a row of R3
- We join on the attributes R1 and R2 have in common (the As)
If joining the relations reproduces exactly the original rows, then the decomposition was lossless
Otherwise, the decomposition was lossy

Lossless Decompositions
A decomposition is lossless if we can recover:
R(A, B, C)
  Decompose: R1(A, B), R2(A, C)
  Recover: R'(A, B, C), which should be the same as R(A, B, C)
R' always contains R and is in general larger than R. Must ensure R' = R

Lossless decomposition

Name    Price  Category
Word    100    WP
Oracle  1000   DB
Access  100    DB

Name    Price
Word    100
Oracle  1000
Access  100

Name    Category
Word    WP
Oracle  DB
Access  DB

Sometimes the same set of data is reproduced:
(Word, 100) + (Word, WP) → (Word, 100, WP)
(Oracle, 1000) + (Oracle, DB) → (Oracle, 1000, DB)
(Access, 100) + (Access, DB) → (Access, 100, DB)

Lossy decomposition

Name    Price  Category
Word    100    WP
Oracle  1000   DB
Access  100    DB

Category  Name
WP        Word
DB        Oracle
DB        Access

Category  Price
WP        100
DB        1000
DB        100

Sometimes it's not:
(Word, WP) + (100, WP) → (Word, 100, WP)
(Oracle, DB) + (1000, DB) → (Oracle, 1000, DB)
(Oracle, DB) + (100, DB) → (Oracle, 100, DB)
(Access, DB) + (1000, DB) → (Access, 1000, DB)
(Access, DB) + (100, DB) → (Access, 100, DB)
What's wrong?
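Both decompositions of the Name/Price/Category table can be replayed in a few lines. This is a sketch only: relations are assumed to be non-empty lists of dicts, and the helper names project, natural_join and same_rows are invented for the illustration.

```python
def project(rows, attrs):
    """Project onto attrs and drop duplicate rows."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]


def natural_join(r1, r2):
    """Join two non-empty relations on the attributes they share."""
    common = [a for a in r1[0] if a in r2[0]]
    return [{**t1, **t2} for t1 in r1 for t2 in r2
            if all(t1[a] == t2[a] for a in common)]


def same_rows(r1, r2):
    """Compare two relations as sets of rows, ignoring order."""
    def canon(rows):
        return {tuple(sorted(row.items())) for row in rows}
    return canon(r1) == canon(r2)


product = [
    {"name": "Word", "price": 100, "category": "WP"},
    {"name": "Oracle", "price": 1000, "category": "DB"},
    {"name": "Access", "price": 100, "category": "DB"},
]

# Split on name: the join gives back exactly the original rows.
by_name = natural_join(project(product, ["name", "price"]),
                       project(product, ["name", "category"]))
print(len(by_name), same_rows(by_name, product))   # prints 3 True

# Split on category: the join invents rows that were never in R.
by_cat = natural_join(project(product, ["category", "name"]),
                      project(product, ["category", "price"]))
print(len(by_cat), same_rows(by_cat, product))     # prints 5 False
```

The second join returns all three original rows plus two spurious ones, (Oracle, 100, DB) and (Access, 1000, DB), so information about which price belongs to which product is lost even though the result is larger.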

Ensuring lossless decomposition
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)

If A1, ..., An → B1, ..., Bm or A1, ..., An → C1, ..., Cp, then the decomposition is lossless
Note: don't need both
Examples:
- name → price, so the first decomposition was lossless
- category ↛ name and category ↛ price, and so the second decomposition was lossy
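The condition above can be checked from the FDs alone, without looking at any data: compute the closure of the shared attributes and see whether it covers all of R1 or all of R2. The sketch below assumes FDs are (lhs, rhs) pairs of attribute sets; the name is_lossless is invented for the illustration.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """True if the shared attributes determine all of r1 or all of r2."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


fds = [({"name"}, {"price"})]
print(is_lossless({"name", "price"}, {"name", "category"}, fds))      # prints True
print(is_lossless({"category", "name"}, {"category", "price"}, fds))  # prints False
```

A False result here means only that the given FDs do not guarantee a lossless join; a particular instance might still happen to rejoin correctly.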

Quick lossless/lossy example

X  Y  Z
1  2  3
4  2  5

At a glance: can we decompose into R1(Y, X), R2(Y, Z)?
At a glance: can we decompose into R1(X, Y), R2(X, Z)?

Next topic: Normal Forms
First Normal Form = all attributes are atomic
- As opposed to set-valued
- Assumed all along
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)

Most important: BCNF
A simple condition for removing anomalies from relations:

A relation R is in BCNF if:
If As → Bs is a non-trivial dependency in R, then As is a superkey for R

I.e.: the left side must always contain a key
I.e.: if a set of attributes determines other attributes, it must determine all the attributes

B: Ray Boyce, IBM researcher, helped develop SQL in the 1970s
C: Ted Codd, IBM researcher, inventor of relational model, 1970
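The BCNF condition can be tested with the closure operation: the left side of an FD is a superkey exactly when its closure is the whole relation. The sketch below checks only the given FDs of a relation, which is enough when those FDs all belong to that relation; the function names are assumptions of this illustration, and the data is the name/SSN/phone schema from this lecture.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    """attrs is a superkey if its closure contains every attribute."""
    return closure(attrs, fds) >= set(relation)


def bcnf_violations(relation, fds):
    """Given FDs that are non-trivial and whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)
            and not is_superkey(lhs, relation, fds)]


person = {"name", "ssn", "mailing_address", "phone"}
fds = [({"ssn"}, {"name", "mailing_address"})]
print(len(bcnf_violations(person, fds)))  # prints 1
```

The single violation is ssn → name, mailing_address: the closure of {ssn} lacks phone, so ssn is not a superkey.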

BCNF decomposition algorithm
Repeat
  choose A1, …, Am → B1, …, Bn that violates the BCNF condition
  split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
  continue with both R1 and R2
Until no more violations

//Heuristic: choose Bs as large as possible

[Diagram: R1 covers the A's and the B's; R2 covers the A's and the others]
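One way to sketch this loop in code is shown below. It is a brute-force illustration under stated assumptions, not the lecture's implementation: violations inside a sub-relation are found by computing the closure of each attribute subset and restricting it to that sub-relation, and the Bs are taken to be everything the As determine there, following the heuristic. The names find_violation and bcnf_decompose are invented here.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (As, Bs) for a BCNF violation inside rel, or None."""
    rel = frozenset(rel)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & rel   # what X determines within rel
            if x_plus != x and x_plus != rel:
                return x, x_plus - x         # non-trivial, X not a superkey
    return None


def bcnf_decompose(rel, fds):
    """Split rel until no sub-relation has a violation."""
    rel = frozenset(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    a_side, b_side = violation
    r1 = a_side | b_side          # R1(As, Bs)
    r2 = rel - b_side             # R2(As, others)
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


person = {"name", "ssn", "mailing_address", "phone"}
fds = [({"ssn"}, {"name", "mailing_address"})]
for relation in bcnf_decompose(person, fds):
    print(sorted(relation))
# prints ['mailing_address', 'name', 'ssn'] then ['phone', 'ssn']
```

Each split produces two strictly smaller relations, so the recursion terminates. Because the As determine the Bs, every split also satisfies the lossless-join condition.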

Boyce-Codd Normal Form

Name     SSN  Mailing-address  Phone
Michael  123  NY               212-111-1111
Michael  123  NY               917-111-1111

Name/phone example is not BCNF:
- {ssn, phone} is key
- FD: ssn → name, mailing-address holds
Violates BCNF: ssn is not a superkey

Name     SSN  Mailing-address
Michael  123  NY

SSN  PhoneNumber
123  212-111-1111
123  917-111-1111

Its decomposition is BCNF
- Only superkeys → anything else

