C20.0046: Database Management Systems, Lecture #6. Matthew P. Johnson, Stern School of Business, NYU, Spring 2004.



Agenda
- Last time: FDs
- Project part 2 up soon
- This time:
  1. Anomalies
  2. Normalization
- Next time: Relational Algebra

Review examples: finding FDs
- Product(name, price, category, color)
  - name, category → price
  - category → color
  - Keys are: {name, category}
- Enrollment(student, address, course, room, time)
  - student → address
  - room, time → course
  - student, course → room, time
  - Keys are: [in class]
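The key of Product can be checked mechanically by computing attribute closures. The following is a minimal sketch, not part of the original slides: the helper names `closure` and `minimal_keys` are hypothetical, FDs are represented as (left side, right side) pairs of Python sets, and the key search is brute force, which is only reasonable for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its left side is already covered.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_keys(all_attrs, fds):
    """Brute force, smallest sets first: keep sets whose closure is everything."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(all_attrs):
                keys.append(candidate)
    return keys

# Product(name, price, category, color) with the two FDs from the slide.
product_attrs = {"name", "price", "category", "color"}
product_fds = [({"name", "category"}, {"price"}), ({"category"}, {"color"})]
product_keys = minimal_keys(product_attrs, product_fds)
print(product_keys)
```

Under these assumptions the search finds a single minimal key, {name, category}, matching the slide.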

Another review example
- Relation R(A,B,C)
- Each of three attributes determines other two
- Q: What are the FDs?
  - Closure of singleton sets
  - Closure of doubletons
- Q: What are the keys?
- Q: What are the minimal bases?
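One way to explore the closure questions is to compute the closures directly. This sketch assumes that "each attribute determines the other two" is written as the three FDs A → B,C; B → A,C; C → A,B. The `closure` helper is hypothetical and not from the slides.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Assumed reading of the slide: every attribute determines the other two.
fds = [({"A"}, {"B", "C"}), ({"B"}, {"A", "C"}), ({"C"}, {"A", "B"})]

singleton_closures = {a: closure({a}, fds) for a in "ABC"}
doubleton_closures = {pair: closure(set(pair), fds) for pair in combinations("ABC", 2)}

for attrs, closed in {**singleton_closures, **doubleton_closures}.items():
    print(attrs, "->", sorted(closed))
```

Every singleton closure comes out as the full attribute set, so under this reading each single attribute is already a key.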

Next topic: Anomalies (3.6)
- Identify anomalies in existing schema
- How to decompose a relation
- Boyce-Codd Normal Form (BCNF)
- Recovering information from a decomposition
- Third Normal Form

Types of anomalies
- Redundancy: repeat info unnecessarily in several tuples
- Update anomalies: change info in one tuple but not in another
- Deletion anomalies: delete some values and lose other values too
- Insert anomalies: inserting a row means NULL-ing some fields

Example of anomalies

Name     SSN  Mailing-address  Phone
Michael  123  NY
Michael  123  NY
Hilary   456  DC
Hilary   456  DC
Bill     789  Chappaqua
Bill     789  Chappaqua

SSN → Name, Mailing-address
SSN ↛ Phone

- Redundancy: name, address
- Update anomaly: Bill moves
- Delete anomaly: Bill doesn't pay bills, lose phones → lose Bill!
- Underlying cause: SSN-phone is many-many
- Effect: partial dependency ssn → name, address

Decomposition by projection
- Solution: replace anomalous R with projections of R onto two subsets of attributes
- Projection: an operation in Relational Algebra
  - projection = the column list of SELECT in SQL (not the WHERE clause, which is relational selection)
- Projecting R onto attributes (A1, ..., An) means removing all other attributes
  - Result of projection is another relation
  - Yields tuples whose fields are A1, ..., An
  - Resulting duplicates ignored
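Projection with duplicate elimination can be sketched in a few lines. This is an illustration only: the `project` helper is hypothetical, rows are modeled as Python dicts, and the sample rows are the Product tuples used in the lossless-decomposition example of this lecture.

```python
def project(rows, attrs):
    """Project dict rows onto attrs; using a set drops duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}

product = [
    {"name": "MSOffice", "price": 100, "category": "WP"},
    {"name": "Oracle", "price": 1000, "category": "DB"},
    {"name": "MSOffice", "price": 100, "category": "DB"},
]

# Three input rows, but (MSOffice, 100) appears twice and is kept once.
name_price = project(product, ["name", "price"])
print(sorted(name_price))
```

Returning a set mirrors the relational view that a relation has no duplicate tuples; SQL needs SELECT DISTINCT for the same effect.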

Projection for decomposition
  R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
  R1(A1, ..., An, B1, ..., Bm)
  R2(A1, ..., An, C1, ..., Cp)
- R1 = projection of R on A1, ..., An, B1, ..., Bm
- R2 = projection of R on A1, ..., An, C1, ..., Cp
- {A1, ..., An} ∪ {B1, ..., Bm} ∪ {C1, ..., Cp} = all attributes, usually disjoint sets
- R1 and R2 may (or may not) be reassembled to produce original R

Decomposition example
Break the relation into two:

Name     SSN  Mailing-address  Phone
Michael  123  NY
Michael  123  NY
Hilary   456  DC
Hilary   456  DC
Bill     789  Chappaqua
Bill     789  Chappaqua

Name     SSN  Mailing-address
Michael  123  NY
Hilary   456  DC
Bill     789  Chappaqua

SSN  Phone

The anomalies are gone
- No more redundant data
- Easy for Bill to move
- Okay for Bill to lose all phones

Thus: high-level strategy
- Conceptual Model: [E/R diagram: Person buys Product; attributes name, price, name, ssn]
- Relational Model: plus FDs
- Normalization: eliminates anomalies

Using FDs to produce good schemas
- Start with set of relations
- Define FDs (and keys) for them based on real world
- Transform your relations to "normal form" (normalize them)
  - Do this using "decomposition"
- Intuitively, good design means
  - No anomalies
  - Can reconstruct all original information

Decomposition terminology
- Projection: eliminating certain attributes from relation
- Decomposition: separating a relation into two by projection
- Join: (re)assembling two relations
  - Whenever a row from R1 and a row from R2 have the same value for attribute A, join them to form a row of R3
- If the original data can be reproduced after a decomposition by joining the relations, then the decomposition was lossless
  - We join on the attributes R1 and R2 have in common (the As)
- If it can't, the decomposition was lossy

Lossless Decompositions
- A decomposition is lossless if we can recover:
  - R(A,B,C)
  - Decompose: R1(B,C), R2(B,A)
  - Recover: R'(A,B,C) should be the same as R(A,B,C)
- R' is in general larger than R. Must ensure R' = R

Lossless decomposition

Name      Price  Category
MSOffice  100    WP
Oracle    1000   DB
MSOffice  100    DB

Name      Price
MSOffice  100
Oracle    1000
MSOffice  100

Name      Category
MSOffice  WP
Oracle    DB
MSOffice  DB

Sometimes the data can be reproduced:
- (MSOffice, 100) + (MSOffice, WP) → (MSOffice, 100, WP)
- (MSOffice, 100) + (MSOffice, DB) → (MSOffice, 100, DB)
- (Oracle, 1000) + (Oracle, DB) → (Oracle, 1000, DB)

Lossy decomposition

Name      Price  Category
MSOffice  100    WP
Oracle    1000   DB
MSOffice  100    DB

Name      Category
MSOffice  WP
Oracle    DB
MSOffice  DB

Price  Category
100    WP
1000   DB
100    DB

Sometimes it's not:
- (MSOffice, WP) + (100, WP) → (MSOffice, 100, WP)
- (Oracle, DB) + (1000, DB) → (Oracle, 1000, DB)
- (Oracle, DB) + (100, DB) → (Oracle, 100, DB)
- (MSOffice, DB) + (1000, DB) → (MSOffice, 1000, DB)
- (MSOffice, DB) + (100, DB) → (MSOffice, 100, DB)

What's wrong?
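The two decompositions can be compared by actually projecting and re-joining the three Product rows. The sketch below is an illustration under assumptions: `project` and `natural_join` are hypothetical helpers, and each tuple is stored as (attribute, value) pairs so the join can find the shared attributes by name.

```python
def project(rows, attrs):
    """Project dict rows onto attrs as a set of ((attr, value), ...) tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Join two projected relations on whatever attributes they share."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            common = d1.keys() & d2.keys()
            if all(d1[a] == d2[a] for a in common):
                merged = {**d1, **d2}
                joined.add(tuple(sorted(merged.items())))
    return joined

product = [
    {"name": "MSOffice", "price": 100, "category": "WP"},
    {"name": "Oracle", "price": 1000, "category": "DB"},
    {"name": "MSOffice", "price": 100, "category": "DB"},
]
original = {tuple(sorted(row.items())) for row in product}

# Decomposition 1 shares name; decomposition 2 shares category.
lossless_join = natural_join(project(product, ["name", "price"]),
                             project(product, ["name", "category"]))
lossy_join = natural_join(project(product, ["name", "category"]),
                          project(product, ["price", "category"]))

spurious = lossy_join - original
print(lossless_join == original, len(lossy_join), sorted(spurious))
```

Joining on category pairs every DB name with every DB price, which is where the two extra tuples (Oracle, 100, DB) and (MSOffice, 1000, DB) come from.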

Ensuring lossless decomposition
  R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
  R1(A1, ..., An, B1, ..., Bm)
  R2(A1, ..., An, C1, ..., Cp)
- If A1, ..., An → B1, ..., Bm, then the decomposition is lossless
- Note: don't necessarily need A1, ..., An → C1, ..., Cp
- Examples:
  - name → price, so first decomposition was lossless
  - category ↛ name (and category ↛ price), so second decomposition was lossy
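The FD condition can be turned into a small check: the shared attributes must determine all of R1 or all of R2. This is a sketch under assumptions, not the lecture's own code. The helper names are hypothetical, the check applies the slide's condition in either direction, and the only FD supplied is name → price, the one the slide cites.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def passes_lossless_test(r1, r2, fds):
    """True if the common attributes determine every attribute of r1 or of r2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

fds = [({"name"}, {"price"})]  # the FD cited on the slide

first = passes_lossless_test({"name", "price"}, {"name", "category"}, fds)
second = passes_lossless_test({"name", "category"}, {"price", "category"}, fds)
print(first, second)
```

The first decomposition passes because name determines price; the second fails because category alone determines nothing else.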

Quick lossless/lossy example
[Table with columns X, Y, Z]
- At a glance: can we decompose into R1(Y,X), R2(Y,Z)?
- At a glance: can we decompose into R1(X,Y), R2(X,Z)?

Next topic: Normal Forms
- First Normal Form = all attributes are atomic
  - As opposed to set-valued
  - Assumed all along
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)

Most important: BCNF
- A simple condition for removing anomalies from relations:
  - A relation R is in BCNF if: whenever As → Bs is a non-trivial dependency in R, then As is a superkey for R
- I.e.: the left side must always contain a key
- I.e.: if a set of attributes determines other attributes, it must determine all the attributes
- Codd: Ted Codd, IBM researcher, inventor of relational model, 1970
- Boyce: Ray Boyce, IBM researcher, helped develop SQL in the 1970s
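The BCNF condition can be tested against a list of FDs by checking whether each left side is a superkey. The following is a minimal sketch with hypothetical helper names. It inspects only the FDs passed in for the relation, and it uses the name/phone relation from the anomalies example, with the attribute spelled `mailing_address` for Python's sake.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Return the non-trivial listed FDs whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        non_trivial = not set(rhs) <= set(lhs)
        is_superkey = closure(lhs, fds) >= set(attrs)
        if non_trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations

person = {"name", "ssn", "mailing_address", "phone"}
person_fds = [({"ssn"}, {"name", "mailing_address"})]

before = bcnf_violations(person, person_fds)
# After splitting off phone, ssn is a key of the remaining relation.
after = bcnf_violations({"name", "ssn", "mailing_address"}, person_fds)
print(before, after)
```

In the full relation ssn does not determine phone, so ssn is not a superkey and the FD is reported; in the projected relation the same FD is harmless.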

BCNF decomposition algorithm
  Repeat
    choose A1, ..., Am → B1, ..., Bn that violates the BCNF condition
    split R into R1(A1, ..., Am, B1, ..., Bn) and R2(A1, ..., Am, [others])
    continue with both R1 and R2
  Until no more violations
  // Heuristic: choose Bs as large as possible
[Diagram: R1 spans the Bs and the As; R2 spans the As and the others]

Boyce-Codd Normal Form
- Name/phone example is not BCNF:
  - {ssn, phone} is key
  - FD: ssn → name, mailing-address holds
- Violates BCNF: ssn is not a superkey
- Its decomposition is BCNF
  - Only superkeys → anything else

Name     SSN  Mailing-address  Phone
Michael  123  NY
Michael  123  NY

Name     SSN  Mailing-address
Michael  123  NY

SSN  PhoneNumber

BCNF Decomposition
- Larger example: multiple decompositions
- {Title, Year, Studio, President, Pres-Address}
- FDs:
  - Title, Year → Studio
  - Studio → President
  - President → Pres-Address
  - ⇒ Studio → President, Pres-Address (why?)
- No many-many this time
- Problem cause: transitive FDs:
  - Title, Year → Studio → President

BCNF Decomposition
- Illegal: As → Bs, where As don't include key
- Decompose: Studio → President, Pres-Address
  - As = {studio}
  - Bs = {president, pres-address}
  - Cs = {title, year}
- Result:
  1. Studios(studio, president, pres-address)
  2. Movies(studio, title, year)
- Is (2) in BCNF?
- Is (1) in BCNF?
  - Key: Studio
  - FD: President → Pres-Address
  - Q: Does president → studio? If so, president is a key
  - But if not, it violates BCNF

BCNF Decomposition
- Studios(studio, president, pres-address)
- Illegal: As → Bs, where As don't include key
  - Decompose: President → Pres-Address
  - As = {president}
  - Bs = {pres-address}
  - Cs = {studio}
- {Studio, President, Pres-Address} becomes
  - {President, Pres-Address}
  - {Studio, President}
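The repeat-until-no-violations algorithm can be sketched recursively and run on this movie example. This is one possible implementation under assumptions, not the lecture's own code: helper names are hypothetical, violations inside a sub-relation are found by brute force over attribute subsets using closures under the original FDs, and the right side is taken as large as possible, following the heuristic. The sketch may pick violations in a different order than the slides do.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(rel, fds):
    """Return (As, Bs) for some BCNF violation inside rel, or None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            lhs = set(combo)
            determined = closure(lhs, fds) & rel
            # Non-trivial (determines something new) but not a superkey of rel.
            if lhs < determined < rel:
                return lhs, determined - lhs
    return None

def bcnf_decompose(rel, fds):
    """Split rel on a violation and recurse until none remain."""
    rel = set(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    lhs, rhs = violation
    return bcnf_decompose(lhs | rhs, fds) + bcnf_decompose(rel - rhs, fds)

movie_attrs = {"title", "year", "studio", "president", "pres_address"}
movie_fds = [
    ({"title", "year"}, {"studio"}),
    ({"studio"}, {"president"}),
    ({"president"}, {"pres_address"}),
]
result = bcnf_decompose(movie_attrs, movie_fds)
print([sorted(r) for r in result])
```

Under these assumptions the sketch splits on President → Pres-Address first, then on Studio → President, and ends with the same three relations as the slides: {President, Pres-Address}, {Studio, President}, and {Studio, Title, Year}.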

BCNF and two-att relations
- Must a two-attribute relation be in BCNF?
  - Case 1: there are no non-trivial FDs
  - Case 2: A → B but not B → A
  - Case 3: B → A but not A → B
  - Case 4: Both A → B and B → A
- Note that relations may have multiple keys
- BCNF requires a key on the left, not all keys
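The four cases can be checked one by one with a small BCNF test. This is a sketch with hypothetical helper names. For R(A,B) the only possible non-trivial FDs are A → B and B → A, so listing them per case covers everything that needs checking.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(attrs, fds):
    """Every listed FD must be trivial or have a superkey on its left side."""
    return all(
        set(rhs) <= set(lhs) or closure(lhs, fds) >= set(attrs)
        for lhs, rhs in fds
    )

cases = {
    "case 1: no non-trivial FDs": [],
    "case 2: A -> B only": [({"A"}, {"B"})],
    "case 3: B -> A only": [({"B"}, {"A"})],
    "case 4: A -> B and B -> A": [({"A"}, {"B"}), ({"B"}, {"A"})],
}
results = {label: is_bcnf({"A", "B"}, fds) for label, fds in cases.items()}
for label, ok in results.items():
    print(label, "BCNF" if ok else "not BCNF")
```

In every case the left side of each FD already determines both attributes, so each case passes the test.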

Future agenda
- Skipping chapter 4 (for now)
- Next topic: Relational Algebra (Codd)
- Then: SQL!
- For Tuesday:
  - Read
  - Homework 1 due