C20.0046: Database Management Systems Lecture #8

Presentation transcript:

C20.0046: Database Management Systems Lecture #8 Matthew P. Johnson Stern School of Business, NYU Spring, 2005 M.P. Johnson, DBMS, Stern/NYU, Spring 2005

Roadmap
Want to remove redundancy/anomalies
Convert to BCNF
  Find FDs (closure algorithm)
  Check whether each FD A → B is OK: it is if A contains a key
  If not, decompose into R1(A,B) and R2(A,rest)
  Because A → B, this will be lossless
    Could check by joining R1 and R2
    Would get no rows not in the original
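The closure step and the "does A contain a key?" check in the roadmap can be sketched as follows. This is a minimal illustration, not code from the lecture: the function names `closure` and `is_superkey` and the set-of-strings representation of attributes are assumptions made for the example.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already implied, the right side is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """attrs is a superkey when its closure covers the whole schema."""
    return closure(attrs, fds) >= set(schema)


# Hypothetical relation R(A, B, C) with the single FD A -> B.
schema = {"A", "B", "C"}
fds = [({"A"}, {"B"})]
a_is_superkey = is_superkey({"A"}, schema, fds)        # closure of A misses C
ac_is_superkey = is_superkey({"A", "C"}, schema, fds)  # closure of {A, C} is everything
```

Here A → B fails the BCNF check because A alone is not a superkey, which is what triggers the split into R1(A,B) and R2(A,C).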

Normal Forms
First Normal Form (1NF) = all attributes are atomic
  As opposed to set-valued
  Assumed all along
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)

Decomposition example
Name | SSN | Mailing-address | Phone
Michael | 123 | NY | 212-111-1111
Michael | 123 | NY | 917-111-1111
Hilary | 456 | DC | 202-222-2222
Hilary | 456 | DC | 914-222-2222
Bill | 789 | Chappaqua | 212-333-3333
Break the relation into two:
Name | SSN | Mailing-address
Michael | 123 | NY
Hilary | 456 | DC
Bill | 789 | Chappaqua
SSN | Phone
123 | 212-111-1111
123 | 917-111-1111
456 | 202-222-2222
456 | 914-222-2222
789 | 212-333-3333
The anomalies are gone
  No more redundant data
  Easy for Bill to move
  Okay for Bill to lose all phones

Boyce-Codd Normal Form
Name/phone example is not in BCNF:
  {ssn, phone} is the key
  FD: ssn → name, mailing-address holds
  Violates BCNF: ssn is not a superkey
Its decomposition is in BCNF
  Only superkeys → anything else
Name | SSN | Mailing-address | Phone
Michael | 123 | NY | 212-111-1111
Michael | 123 | NY | 917-111-1111
Name | SSN | Mailing-address
Michael | 123 | NY
SSN | PhoneNumber
123 | 212-111-1111
123 | 917-111-1111

BCNF motivation
Two big ideas:
  Only a key field can determine other fields
  Key values are unique ⇒ no FD-caused redundancy
Slogan: “Every FD must contain the key, the whole key and nothing but the key.”
More accurate: “Every FD must contain (on the left) a key, a whole key, and maybe other fields.”

Design examples
Consider situation:
  Entities: Parts, Suppliers, Departments
  Relationship: Contracts(P,S,D,id,quant)
Draw E/R
New rule: no department can buy multiple parts from the same supplier (why?)
Translate to FD
Normalize

Design examples
Consider situation:
  Entities: Emp(ssn,name,lot), Dept(id,name,budg)
  Relationship: Works(E,D,since)
Draw E/R
New info: in each dept, everyone parks in the same lot
Translate to FD
Normalize

BCNF Decomposition
Larger example: multiple decompositions
{Title, Year, Studio, President, Pres-Address}
FDs:
  Title, Year → Studio
  Studio → President
  President → Pres-Address
  ⇒ Studio → President, Pres-Address (why?)
No many-many this time
Problem cause: transitive FDs:
  Title, Year → Studio → President

BCNF Decomposition
Illegal: As → Bs, where As don’t include a key
Decompose: Studio → President, Pres-Address
  As = {studio}
  Bs = {president, pres-address}
  Cs = {title, year}
Result:
  (1) Studios(studio, president, pres-address)
  (2) Movies(studio, title, year)
Is (2) in BCNF? Is (1) in BCNF?
  Key: Studio
  FD: President → Pres-Address
  Q: Does president → studio? If so, president is a key
  But if not, it violates BCNF

BCNF Decomposition
Studios(studio, president, pres-address)
Illegal: As → Bs, where As don’t include a key
⇒ Decompose: President → Pres-Address
  As = {president}
  Bs = {pres-address}
  Cs = {studio}
{Studio, President, Pres-Address} becomes
  {President, Pres-Address}
  {Studio, President}
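The repeated "find a bad FD, split on it, recurse" procedure from the studio example can be sketched in a few lines. This is an illustrative brute-force version under stated assumptions, not the lecture's code: `bcnf_decompose` is a hypothetical name, attribute names are lower-cased with underscores, and every subset of attributes is tried as a left-hand side, so it is exponential and only suitable for tiny schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_decompose(attrs, fds):
    """Split attrs on the first BCNF violation found, then recurse on both parts."""
    attrs = frozenset(attrs)
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            lhs = frozenset(combo)
            determined = frozenset(closure(lhs, fds)) & attrs
            # Violation: lhs determines something new but is not a superkey of attrs.
            if determined != lhs and determined != attrs:
                r1 = determined                   # As plus everything As determines
                r2 = lhs | (attrs - determined)   # As plus the rest
                return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)
    return [attrs]


fds = [
    ({"title", "year"}, {"studio"}),
    ({"studio"}, {"president"}),
    ({"president"}, {"pres_address"}),
]
result = bcnf_decompose({"title", "year", "studio", "president", "pres_address"}, fds)
```

On this input the sketch happens to split on President → Pres-Address first, but it ends with the same three relations as the slides: {President, Pres-Address}, {Studio, President}, and {Studio, Title, Year}.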

Decomposition algorithm example
R(N,O,R,P)
F = {N → O, O → R, R → N}
Key: N,P
Violations of BCNF: N → O, O → R, N → OR
  Which kinds of violations are these?
Pick N → OR (on board)
Can we rejoin? (on board)
What happens if we pick N → O instead?
Can we rejoin? (on board)
Name | Office | Residence | Phone
George | Pres. | WH | 202-…
George | Pres. | WH | 486-…
Dick | VP | NO | 307-…

BCNF and two-att relations
Must a two-attribute relation be in BCNF?
  Case 1: there are no non-trivial FDs
  Case 2: A → B but not B → A
  Case 3: B → A but not A → B
  Case 4: both A → B and B → A
Note that relations may have multiple keys
BCNF requires a key on the left, not all keys

Lossless BCNF decomposition
Consider simple relation: R(A,B,C)
Only FD: A → B (assume C ↛ A)
  Key: A,C
  Also goes through if C → A
BCNF violation: no key on the left
Thus: decomposition to BCNF:
  Create R1(A,B) and R2(A,C)
Could this be lossy?
  We will join R1 and R2 on A to find out

Lossless BCNF decomposition
Suppose R contains the rows (b,a,c) and (b’,a,c’)
In projection onto (B,A): (b,a,c) ↦ (b,a), (b’,a,c’) ↦ (b’,a)
In projection onto (A,C): (b,a,c) ↦ (a,c), (b’,a,c’) ↦ (a,c’)
In joining, (b’,a) and (a,c) become (b’,a,c), and (b,a) and (a,c’) become (b,a,c’)
Q: Is/must/can this be correct?
A: Yes! A → B, so b = b’
So this was lossless
We assumed C ↛ A, but the argument also goes through when C → A
Moral: the BCNF decomposition algorithm really is lossless
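The projection-and-rejoin argument can be tried on concrete rows. The sketch below is only an illustration with made-up data: `good` satisfies A → B and `bad` does not, and tuples are written in (A, B, C) order rather than the slide's (B, A, C).

```python
def decompose_and_rejoin(rows):
    """Project R(A, B, C) onto (A, B) and (A, C), then natural-join on A."""
    r1 = {(a, b) for a, b, c in rows}
    r2 = {(a, c) for a, b, c in rows}
    return {(a, b, c) for a, b in r1 for a2, c in r2 if a == a2}


# A -> B holds: every row with A = 1 has B = "x".
good = {(1, "x", "p"), (1, "x", "q"), (2, "y", "p")}
# A -> B fails: A = 1 appears with both "x" and "z".
bad = {(1, "x", "p"), (1, "z", "q")}

good_rejoined = decompose_and_rejoin(good)
bad_rejoined = decompose_and_rejoin(bad)
```

With `good`, the rejoined relation equals the original. With `bad`, the join produces extra rows such as (1, "x", "q"), which is exactly the case the FD A → B rules out.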

BCNF summary
BCNF decomposition is lossless
  Can reproduce the original by joining
Saw: every 2-attribute relation is in BCNF
Final set of decomposed relations might be different
  Depends on the order of bad FDs chosen
Saw: but all results will be in BCNF

A problem with BCNF
Relation: R(Title, Theater, Neighborhood)
FDs:
  Title, N’hood → Theater
    Assume a movie shouldn’t play twice in the same neighborhood
  Theater → N’hood
Keys:
  {Title, N’hood}
  {Theater, Title}
Title | Theater | N’hood
Aviator | Angelica | Village
Life Aquatic | Angelica | Village

A problem with BCNF
BCNF violation: Theater → N’hood
Decompose:
  {Theater, N’hood}
  {Theater, Title}
Resulting relations:
R1:
Theater | N’hood
Angelica | Village
R2:
Theater | Title
Angelica | Aviator
Angelica | Life Aquatic

Problem - continued
Suppose we add new rows to R1 and R2:
R1:
Theater | N’hood
Angelica | Village
Film Forum | Village
R2:
Theater | Title
Angelica | Life Aquatic
Angelica | Aviator
Film Forum | Life Aquatic
Their join (R’):
Theater | Title | N’hood
Angelica | Life Aquatic | Village
Angelica | Aviator | Village
Film Forum | Life Aquatic | Village
R1 and R2 could not enforce the FD Title, N’hood → Theater
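The lost dependency can be checked mechanically. The following is a small sketch, assuming R1 and R2 hold the Angelica and Film Forum rows from this example; `satisfies_fd` and the lower-case column names are hypothetical helpers introduced for illustration.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


r1 = [
    {"theater": "Angelica", "nhood": "Village"},
    {"theater": "Film Forum", "nhood": "Village"},
]
r2 = [
    {"theater": "Angelica", "title": "Life Aquatic"},
    {"theater": "Angelica", "title": "Aviator"},
    {"theater": "Film Forum", "title": "Life Aquatic"},
]

# Natural join of R1 and R2 on the shared attribute theater.
joined = [{**a, **b} for a in r1 for b in r2 if a["theater"] == b["theater"]]

local_fd_ok = satisfies_fd(r1, ["theater"], ["nhood"])
lost_fd_ok = satisfies_fd(joined, ["title", "nhood"], ["theater"])
```

Each insert was legal when checked against R1 or R2 alone, yet the join shows Life Aquatic playing at two theaters in the Village, so the FD Title, N’hood → Theater is violated.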

Third normal form: motivation
There are some situations in which
  BCNF is not dependency-preserving, and
  efficient checking for FD violation on updates is important
In these cases BCNF is too severe a requirement
Solution: define a weaker normal form, called Third Normal Form,
  in which FDs can be checked on individual relations without performing a join (no inter-relational FDs)
  to which relations can be converted, preserving both data and FDs

Third Normal Form
BCNF decomposition is not dependency-preserving!
We now define the (weaker) Third Normal Form
  Turns out: this example was already in 3NF
A relation R is in third normal form if:
  for every nontrivial dependency A1, A2, ..., An → B for R,
  {A1, A2, ..., An} is a superkey for R, or B is part of a key, i.e., B is prime
Tradeoff:
  BCNF = no FD anomalies, but may lose some FDs
  3NF = keeps all FDs, but may have some anomalies

BCNF: vices and virtues
Be clear on the problem just described vs. the argument that BCNF decomposition is data-lossless
BCNF decomposition does not lose data
  Resulting relations can be rejoined to obtain the original
But: it can lose dependencies
  After decomposition, it is now legal to add rows whose corresponding rows would be illegal in the (rejoined) original

Recap: goals of normalization
When we decompose a relation R with FDs F into R1..Rn we want:
  Lossless-join decomposition: no data lost
  No/little redundancy: the relations Ri should be in either BCNF or at least 3NF
  Dependency preservation: if Fi is the set of dependencies in F+ that include only attributes in Ri, then F is the “sum” of the FDs of the new relations:
    (F1 ∪ F2 ∪ F3 ∪ … ∪ Fn)+ = F+
    Otherwise checking updates for violation of FDs may require computing joins, which is expensive

Dependency preservation
Saw that the last requirement didn’t hold in the movie-theater example
Did it hold in the R(N,O,R,P) example? (on board)

Testing for 3NF
For each dependency X → Y, use attribute closure to check if X is a superkey
If X is not a superkey, verify that each attribute in Y is prime
  This test is rather more expensive, since it involves finding candidate keys
  Testing for 3NF is NP-complete
  Interestingly, decomposition into 3NF can be done in polynomial time
  ⇒ Testing for 3NF is harder than decomposing into 3NF!
Optimization: need to check only FDs in F, need not check all FDs in F+ (why?)

3NF Example
R = (J, K, L)
F = (JK → L, L → K)
Two candidate keys: JK and JL
R is in 3NF
  JK → L: JK is a superkey
  L → K: K is prime
BCNF decomposition yields R1 = (L,K), R2 = (L,J)
  Testing for JK → L requires a join
There is some redundancy in R
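The 3NF test (superkey on the left, or prime attributes on the right) can be sketched for the R = (J, K, L) example. This is a brute-force illustration rather than an efficient algorithm: `candidate_keys`, `is_3nf`, and `is_bcnf` are hypothetical names, and the key search enumerates all attribute subsets, consistent with the slide's point that the test is expensive.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal superkeys, found by trying subsets in increasing size."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if closure(attrs, fds) >= set(schema) and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys


def is_bcnf(schema, fds):
    """Every nontrivial FD in the given list has a superkey on the left."""
    return all(rhs <= lhs or closure(lhs, fds) >= set(schema) for lhs, rhs in fds)


def is_3nf(schema, fds):
    """Like BCNF, but an FD is also allowed if its new right-side attributes are prime."""
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue
        if not (rhs - lhs) <= prime:
            return False
    return True


schema = {"J", "K", "L"}
fds = [({"J", "K"}, {"L"}), ({"L"}, {"K"})]
keys = candidate_keys(schema, fds)
in_3nf = is_3nf(schema, fds)
in_bcnf = is_bcnf(schema, fds)
```

The sketch finds the two candidate keys JK and JL, so every attribute is prime; L → K is then acceptable for 3NF but still fails the BCNF check.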

BCNF and 3NF Comparison
Example of problems due to redundancy in 3NF
R = (J, K, L)
F = (JK → L, L → K)
A schema that is in 3NF but not BCNF has the problems of:
  redundancy (e.g., the relationship between l1 and k1)
  need to use null values (if allowed!), e.g. to represent the relationship between l2 and k2 when there is no corresponding value for attribute J
J | K | L
j1 | k1 | l1
j2 | k1 | l1
j3 | k1 | l1
NULL | k2 | l2

Comparison of BCNF and 3NF
It is always possible to decompose a relation into relations in 3NF such that:
  the decomposition is lossless
  the dependencies are preserved
It is always possible to decompose a relation into relations in BCNF such that:
  the decomposition is lossless
  but it may not be possible to preserve dependencies
  But may eliminate more redundancy

The Normal Forms (so far)
1NF: every attribute has an atomic value
2NF: no longer used
3NF: for each FD X → Y, either
  it is trivial, or
  X is a superkey, or
  Y is a part of some key
BCNF: 3NF with the third 3NF option disallowed

Distinguishing examples
1NF but not 2NF: R(Name, SSN, Mailing-address, Phone)
  Key: SSN, Phone
  Partial: ssn → name, address
3NF but not BCNF: R(Title, Theater, N’hood)
  Title, N’hood → Theater
  Prime-on-right: Theater → N’hood

Design Goals
Goal for a relational database design is:
  No redundancy
  Lossless join
  Dependency preservation
If we cannot achieve this, we accept one of:
  dependency loss
  use of more expensive inter-relational methods to preserve dependencies
  data redundancy due to use of 3NF
Interesting: SQL does not provide a direct way of specifying FDs other than superkeys
  Can specify FDs using assertions, but they are expensive to test

3NF
3NF means we may have anomalies
Example: TEACH(student, teacher, subject)
  student, subject → teacher (students not allowed in the same subject with two teachers)
  teacher → subject (each teacher teaches one subject)
Subject is prime, so this is 3NF
  But we have anomalies:
  Insertion: cannot insert a teacher until we have a student taking his subject
If we convert to BCNF, we lose student, subject → teacher

BCNF and over-normalization
What is the problem? Schema overload: trying to capture two meanings:
  1) subject X can be taught by teacher Y
  2) student Z takes subject W from teacher V
What to do? 3NF has anomalies; normalizing to BCNF loses FDs
One solution: keep the 3NF TEACH and add another (BCNF) relation SUBJECT-TAUGHT(teacher, subject)
  Still (more!) redundancy, but no more insert and delete anomalies

Normalization Review
Q: What’s required for BCNF?
Q: How do we fix a non-BCNF relation?
Q: What’s the loophole for 3NF?

Normalization Review
Q: If As → Bs violates BCNF, what do we do?
Q: In this case, could the decomposition be lossy?
Q: How do we combine two relations?
Q: Can BCNF decomposition lose FDs?
Q: Can 3NF decomposition lose FDs?

New topic: MVDs
Consider this relation: people ~ their jobs ~ their residences
  Person-address/city: many-many
  Person-job: many-many
  Address/city-job: independent
Name | SSN | Jobs | Streets | Citys
Michael | 123 | Mayor, CEO | 111 East 60th Street; 222 Brompton Road | New York; London
Hilary | 456 | Senator, First Lady, Lawyer | 333 Some Street; 444 Embassy Row | Chappaqua; Washington
 | 789 | | |

Redundancy in BCNF
Name | Streets | Citys | Jobs
Michael | 111 East 60th Street | New York | Mayor
Michael | 222 Brompton Road | London | CEO
Hilary | 333 Some Street | Chappaqua | Senator
Hilary | 444 Embassy Row | Washington | First Lady
Hilary | … | … | Lawyer
Lots of redundancy!
Key? All fields
  None determined by others!
Non-trivial FDs? None!
⇒ In BCNF? Yes!
Now what?
New concept, leading to another normal form: multivalued dependencies

MVD definition
As ↠ Bs if, when As are held fixed, values in Bs are independent of values in the rest (Cs)
More precisely: if t1 and t3 agree on As, we can then find t2 such that
  t1, t2, t3 agree on As
  t2, t1 agree on Bs
  t2, t3 agree on Cs
t1: As | Bs | Cs
t2: As | Bs | Cs
t3: As | Bs | Cs

MVD example
Claim: name ↠ streets, cities
If true: can pick arbitrary t1, t3 and find a t2
We pick the first and last of Hilary’s tuples:
   | Name | Streets | Citys | Jobs
t1 | Hilary | 333 Some Street | Chappaqua | Senator
t3 | Hilary | 444 Embassy Row | Washington | Lawyer
Now: if true, can find another Hilary row with the street/city of t1 and the job of t3
t2 | Hilary | 333 Some Street | Chappaqua | Lawyer

MVD example
Now: if true, can find another Hilary row with the street/city of t1 and the job of t3
Sure enough: t2 = (Hilary, 333 Some Street, Chappaqua, Lawyer)
Name | Streets | Citys | Jobs
Michael | 111 East 60th Street | New York | Mayor
Michael | 222 Brompton Road | London | CEO
Hilary | 333 Some Street | Chappaqua | Senator
Hilary | 444 Embassy Row | Washington | First Lady
Hilary | 333 Some Street | Chappaqua | Lawyer (t2)

MVD rules
No splitting rule:
  In the example, name ↠ streets, cities
  Do we have name ↠ streets?
  No: 444 Embassy Row doesn’t go with Chappaqua
  NB: City doesn’t determine street (could have >1 house)
  But city and street aren’t independent
   | Name | Streets | Citys | Jobs
t1 | Hilary | 333 Some Street | Chappaqua | Senator
t3 | Hilary | 444 Embassy Row | Washington | Lawyer
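The t1/t2/t3 definition translates directly into a check over a relation instance. The sketch below assumes, for illustration, that the relation holds every combination of Hilary's two addresses and three jobs; `satisfies_mvd` and the lower-case attribute names are hypothetical.

```python
from itertools import product


def satisfies_mvd(rows, schema, lhs, rhs):
    """For every t1, t3 agreeing on lhs, require a row t2 that takes
    lhs and rhs values from t1 and all remaining values from t3."""
    index = {attr: i for i, attr in enumerate(schema)}
    rowset = set(rows)
    for t1 in rows:
        for t3 in rows:
            if all(t1[index[a]] == t3[index[a]] for a in lhs):
                t2 = tuple(
                    t1[index[a]] if a in lhs or a in rhs else t3[index[a]]
                    for a in schema
                )
                if t2 not in rowset:
                    return False
    return True


schema = ("name", "street", "city", "job")
addresses = [("333 Some Street", "Chappaqua"), ("444 Embassy Row", "Washington")]
jobs = ["Senator", "First Lady", "Lawyer"]
# Assumed instance: each address paired with each job.
rows = [("Hilary", street, city, job) for (street, city), job in product(addresses, jobs)]

whole_address = satisfies_mvd(rows, schema, ["name"], ["street", "city"])
street_only = satisfies_mvd(rows, schema, ["name"], ["street"])
```

The first check succeeds and the second fails, because splitting would require a row pairing 333 Some Street with Washington. That is the no-splitting rule in action.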

MVD rules
Trivial dependencies: As ↠ Bs iff As ↠ Bs ∪ Ai
Transitive rule: As ↠ Bs, Bs ↠ Cs ⇒ As ↠ Cs
Complementation rule: As ↠ Bs ⇒ As ↠ rest
  Intuition: if each value in Bs is associated with each value in rest, then each value of rest is associated with each value in Bs
Name | Streets | Citys | Jobs
Michael | 111 East 60th Street | New York | Mayor
Michael | 222 Brompton Road | London | CEO

MVDs and FDs
MVD is a generalization of FD
Every FD is an MVD
Pf: Suppose As → Bs
  Pick t1, t3 that agree on As. Must find a t2.
  Let t2 be t3. Then
  1) t2 agrees on As with both
  2) t2 agrees on Bs with t1 (why?)
  3) t2 agrees on rest with t3 (why?)
  QED

Fourth Normal Form
4NF: like BCNF, but with MVDs, not FDs
An MVD As ↠ Bs is nontrivial if
  No Bs are As
  Some attributes are left over (why?)
4NF: for every nontrivial MVD As ↠ Bs, As is a superkey
In the example, name ↠ streets, cities, but name isn’t a superkey
Name | Streets | Citys | Jobs
Hilary | 333 Some Street | Chappaqua | Senator
Hilary | 444 Embassy Row | Washington | Lawyer

Decomposition to 4NF
Again, analogous to BCNF
If we can find As ↠ Bs for R where As isn’t a superkey, replace R with
  R1(As,Bs) and
  R2(As,rest)
Running example: name ↠ streets, cities
⇒ People(name,streets,cities,jobs) becomes
  Residences(name,street,city) and
  Employment(name,job)

4NF: another construal
In nontrivial As ↠ Bs, As must be a superkey
After the definition of 4NF, the text says: “That is, … every nontrivial MVD is really a FD with a superkey on the left” (p123).
We know: FDs are* MVDs but not vice versa
So: Why does this follow? Is it true?
Yes. As is a superkey ⇒ As → everything ⇒ As → Bs ⇒ the MVD is an FD
Two kinds of MVDs: FDs and “true” MVDs
  4NF eliminates exactly the true ones
* The typo swapping these was fixed.

Summary of normal forms
Guaranteed to | 3NF | BCNF | 4NF
Eliminate FD redundancy | Mostly | Yes | Yes
Eliminate MVD redundancy | No | No | Yes
Preserve FDs | | |
Preserve MVDs | | |

Next topic: relational algebra
Set operations: union, intersection, difference
Projection, selection
Cartesian product
Joins: natural joins, theta joins
Combining operations to form queries
Dependent and independent operations

What is relational algebra?
An algebra for relations
“High-school” algebra is an algebra for numbers
Formalism for constructing expressions
  Operations
  Operands: variables, constants, expressions
Expressions:
  Vars & constants
  Operators applied to expressions
Algebra | Vars/consts | Operators
High-school | Numbers | + * - / etc.
Relational | Relations (= sets of tuples) | union, intersection, join, etc.

Why do we care about relational algebra?
Why construct expressions on relations?
The exprs are the form that questions about the data take
  The relations these exprs cash out to are the answers to our questions
First proof of RDBMS/RA concept: System R (1979)
Modern implementation of RA: SQL

Relation operators
Five basic operators:
  Union: ∪
  Difference: -
  Selection: σ
  Projection: Π
  Cartesian product: ×
Derived/auxiliary operators:
  Intersection (∩), complement
  Joins (natural, equijoin, theta join, semijoin)
  Renaming: ρ

Operators
Relations are sets ⇒ have set-theoretic ops
Venn diagrams
Union: R1 ∪ R2
  Example: ActiveEmployees ∪ RetiredEmployees
Difference: R1 – R2
  Example: AllEmployees – RetiredEmployees = ActiveEmployees

Set operations - example
R:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Hamill | 456 Oak | M | 8/8/88
S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Ford | 345 Palm | M | 7/7/77
R ∪ S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Hamill | 456 Oak | M | 8/8/88
Ford | 345 Palm | M | 7/7/77

Set operations - example
R:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Hamill | 456 Oak | M | 8/8/88
S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Ford | 345 Palm | M | 7/7/77
R - S:
Name | Address | Gender | Birthdate
Hamill | 456 Oak | M | 8/8/88

Operators
Intersection: R1 ∩ R2
  Example: UnionizedEmployees ∩ RetiredEmployees
Intersection can be derived from ∪ and –:
  R1 ∩ R2 = R1 – (R1 – R2)
  R1 ∩ R2 = -(-R1 ∪ -R2) (allowed?)

Set operations - example
R:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Hamill | 456 Oak | M | 8/8/88
S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Ford | 345 Palm | M | 7/7/77
R ∩ S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
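Because relations are sets of tuples, the three set operations on R and S map directly onto Python's built-in set type. This is a minimal sketch using the rows from the examples; the variable names are chosen for illustration.

```python
R = {
    ("Fisher", "123 Maple", "F", "9/9/99"),
    ("Hamill", "456 Oak", "M", "8/8/88"),
}
S = {
    ("Fisher", "123 Maple", "F", "9/9/99"),
    ("Ford", "345 Palm", "M", "7/7/77"),
}

union = R | S
difference = R - S
# Intersection derived from difference alone: R1 - (R1 - R2).
derived_intersection = R - (R - S)
```

The derived form gives the same single Fisher row as the built-in `R & S`, which illustrates why intersection does not need to be a basic operator.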

Operators
Selection
Selects all tuples satisfying a condition
Notation: σ_c(R)
Examples:
  σ_{salary > 100000}(Employee)
  σ_{name = “Smith”}(Employee)
The condition c can have
  comparison ops: =, <, ≤, >, ≥, <>
  boolean ops: and, or

Selection example
Showings:
Theater | Title | N’hood
Angelica | Fog of War | Village
Angelica | City of God | Village
Film Forum | … | Village
Select the movies at Angelica: σ_{Theater=“Angelica”}(Showings)
Theater | Title | N’hood
Angelica | Fog of War | Village
Angelica | City of God | Village

Operators
Projection: the op we used for decomposition
Eliminates columns, then removes duplicates
Notation: Π_{A1,…,An}(R)

Operators
Cartesian product
  Cross product
Each tuple in R1 combines with each tuple in R2
Notation: R1 × R2
If R1, R2 fields overlap, include both and disambiguate: R1.A, R2.A
Fairly rare in practice; mainly used to express joins
Q: Where does the name come from?
Q: If R1 has n1 rows and R2 has n2, how large is R1 × R2?

Cartesian product example
Hillary-addresses:
Street | City
333 Some Street | Chappaqua
444 Embassy Row | Washington
Hillary-jobs:
Job
Senator
First Lady
Lawyer
Hillary-addresses × Hillary-jobs:
Street | City | Job
333 Some Street | Chappaqua | Senator
333 Some Street | Chappaqua | First Lady
333 Some Street | Chappaqua | Lawyer
444 Embassy Row | Washington | Senator
444 Embassy Row | Washington | First Lady
444 Embassy Row | Washington | Lawyer

Operators
Natural join: our join up to now
  But always merging shared attributes
Notation: R1 ⋈ R2
Meaning: R1 ⋈ R2 = Π_{every att once}(σ_{shared atts =}(R1 × R2))
I.e., first compute the cross product R1 × R2
Next, select the rows in which shared fields agree
Finally, project onto the union of R1 and R2’s fields (remove duplicates)

Natural join example
Addresses:
Name | Street | City
Hilary | 333 Some Street | Chappaqua
Hilary | 444 Embassy Row | Washington
Jobs:
Name | Job
Hilary | Senator
Hilary | First Lady
Hilary | Lawyer
Addresses ⋈ Jobs:
Name | Street | City | Job
Hilary | 333 Some Street | Chappaqua | Senator
Hilary | 333 Some Street | Chappaqua | First Lady
Hilary | 333 Some Street | Chappaqua | Lawyer
Hilary | 444 Embassy Row | Washington | Senator
Hilary | 444 Embassy Row | Washington | First Lady
Hilary | 444 Embassy Row | Washington | Lawyer
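The three steps of a natural join (cross product, select rows where shared fields agree, project each attribute once) can be sketched on the Addresses and Jobs relations. Rows are modeled as dictionaries here, which is an assumption of the sketch; `natural_join` is a hypothetical helper, not a library function.

```python
def natural_join(r1, r2):
    """Pair every row of r1 with every row of r2, keep pairs that agree
    on all shared attribute names, and merge each pair into one row."""
    result = []
    for a in r1:
        for b in r2:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                row = {**a, **b}        # shared attributes appear once
                if row not in result:   # set semantics: drop duplicates
                    result.append(row)
    return result


addresses = [
    {"name": "Hilary", "street": "333 Some Street", "city": "Chappaqua"},
    {"name": "Hilary", "street": "444 Embassy Row", "city": "Washington"},
]
jobs = [
    {"name": "Hilary", "job": "Senator"},
    {"name": "Hilary", "job": "First Lady"},
    {"name": "Hilary", "job": "Lawyer"},
]
joined = natural_join(addresses, jobs)
```

Every address row matches every job row on name, so the result has 2 × 3 = 6 rows. If the two inputs shared no attribute names, the condition would be vacuously true and the function would return the full cross product.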

Natural Join
R ⋈ S = ?
Unpaired tuples are called dangling
[Tables: R(A, B) and S(B, C)]

Natural Join
Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
Given R(A, B, C), S(D, E), what is R ⋈ S?
Given R(A, B), S(A, B), what is R ⋈ S?

Theta Join
Like natural join, but includes only rows that satisfy an arbitrary condition
  Does not project away shared attributes
R1 ⋈_θ R2 = σ_θ(R1 × R2)
Here θ can be any condition
If the condition is always satisfied, the theta join becomes the Cartesian product

Theta-join example
U:
A | B | C
1 | 2 | 3
6 | 7 | 8
9 | 7 | 8
V:
B | C | D
2 | 3 | 4
2 | 3 | 5
7 | 8 | 10
U ⋈_{A<D} V:
A | U.B | U.C | V.B | V.C | D
1 | 2 | 3 | 2 | 3 | 4
1 | 2 | 3 | 2 | 3 | 5
1 | 2 | 3 | 7 | 8 | 10
6 | 7 | 8 | 7 | 8 | 10
9 | 7 | 8 | 7 | 8 | 10
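A theta join is a selection over the cross product, which the following sketch makes explicit. It assumes U holds the rows (1,2,3), (6,7,8), (9,7,8) and V holds (2,3,4), (2,3,5), (7,8,10); `theta_join` is a hypothetical helper, and columns are addressed by position.

```python
def theta_join(r1, r2, condition):
    """Concatenate every pair of rows that satisfies the condition."""
    return [a + b for a in r1 for b in r2 if condition(a, b)]


U = [(1, 2, 3), (6, 7, 8), (9, 7, 8)]      # columns A, B, C
V = [(2, 3, 4), (2, 3, 5), (7, 8, 10)]     # columns B, C, D

# Condition A < D: first column of U against third column of V.
a_less_than_d = theta_join(U, V, lambda u, v: u[0] < v[2])

# A condition that always holds keeps every pairing.
all_pairs = theta_join(U, V, lambda u, v: True)
```

The A < D join keeps five of the nine pairings and retains both copies of B and C, unlike a natural join. The always-true condition returns all nine, i.e., U × V.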

Equijoin
A theta join where θ is an equality
R1 ⋈_{A=B} R2 = σ_{A=B}(R1 × R2)
  σ = lower-case sigma
Example: Employee ⋈_{SSN=SSN} Dependents
Most useful join in practice

Semijoin
R ⋉ S = Π_{atts of R}(R ⋈ S)
Q: What does this mean?
  Natural join of R and S;
  then project onto R’s atts
A: The rows of R for which ≥ 1 row in S agrees on the shared atts

Semijoin example
Employee(SSN, Name, . . .)
Dependents(DSSN, Dname, SSN, . . .)
Employee ⋉ Dependents = {employees who have dependents}
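A semijoin keeps the rows of the left relation that have at least one match on the right. The sketch below uses made-up Employee and Dependents rows with only the columns needed; `semijoin` and all sample values are assumptions for illustration.

```python
def semijoin(r, s):
    """Rows of r that agree with at least one row of s on all shared attributes."""
    result = []
    for a in r:
        has_match = any(
            all(a[k] == b[k] for k in a.keys() & b.keys()) for b in s
        )
        if has_match and a not in result:
            result.append(a)
    return result


# Hypothetical sample data; the shared attribute is ssn.
employees = [
    {"ssn": 1, "name": "Ann"},
    {"ssn": 2, "name": "Bob"},
    {"ssn": 3, "name": "Cy"},
]
dependents = [
    {"ssn": 1, "dname": "Al"},
    {"ssn": 1, "dname": "Bea"},
    {"ssn": 3, "dname": "Dee"},
]
with_dependents = semijoin(employees, dependents)
```

Ann appears once in the result even though she has two dependents, and the output carries only Employee's columns, matching the project-onto-R's-attributes step.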

Renaming
Changes the schema, not the instance
Notation: ρ_{B1,…,Bn}(R)
  ρ is spelled “rho”, pronounced “row”
Example: Employee(ssn,name)
  ρ_{E2(social, name)}(Employee)
  Or just: ρ_E(Employee)

Complex RA Expressions
Q: How long was Star Wars (1977)?
Strategy: find the row with Star Wars; then project the length field
Title | Year | Length | inColor | Studio | Prdcr#
Star Wars | 1977 | 124 | True | Fox | 12345
M.Ducks | 1991 | 104 | True | Disney | 67890
W.World | 1992 | 95 | True | Paramount | 99999

Combining operations
Schema: Movies(Title, year, length, filmType, studioName)
Query: select titles and years of movies by Fox that are at least 100 minutes long.
Title | Year | Length | Filmtype | Studio
Star wars | 1977 | 124 | Color | Fox
Mighty ducks | 1991 | 104 | Color | Disney
Wayne’s world | 1992 | 85 | Color | Paramount
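The query combines a selection with a projection, which can be sketched as two small functions applied in sequence. The helper names `select` and `project` are hypothetical, and the Filmtype value for the second and third movies is assumed to be Color.

```python
def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Projection: keep only the named columns, then drop duplicates."""
    result = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in result:
            result.append(values)
    return result


movies = [
    {"title": "Star wars", "year": 1977, "length": 124, "filmType": "Color", "studioName": "Fox"},
    {"title": "Mighty ducks", "year": 1991, "length": 104, "filmType": "Color", "studioName": "Disney"},
    {"title": "Wayne's world", "year": 1992, "length": 85, "filmType": "Color", "studioName": "Paramount"},
]

long_fox_movies = project(
    select(movies, lambda m: m["studioName"] == "Fox" and m["length"] >= 100),
    ["title", "year"],
)
```

In relational-algebra terms this evaluates a projection onto title and year of a selection on studioName = Fox and length ≥ 100 over Movies; only Star wars qualifies in this sample.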

Complex RA Expressions
Reps(ssn, name, etc.)
Clients(ssn, name, rssn)
Q: Find George’s client names
Π_{Clients.name}(σ_{Reps.name=George}(σ_{Reps.ssn=rssn}(Reps × Clients)))
Or: Π_{Clients.name}(σ_{Reps.name=George and Reps.ssn=rssn}(Reps × Clients))
Or: Π_{Clients.name}(σ_{Reps.name=George}(Reps × Clients) ∩ σ_{Reps.ssn=rssn}(Reps × Clients))
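The three formulations (nested selections, one combined selection, and an intersection of two selections) can be compared on a small instance. The Reps and Clients rows below are invented for the sketch, as are the helper names.

```python
# Hypothetical sample data.
reps = [{"ssn": 1, "name": "George"}, {"ssn": 2, "name": "Dick"}]
clients = [
    {"ssn": 10, "name": "Ann", "rssn": 1},
    {"ssn": 11, "name": "Bob", "rssn": 2},
    {"ssn": 12, "name": "Cy", "rssn": 1},
]

# Reps x Clients as (rep, client) pairs.
pairs = [(r, c) for r in reps for c in clients]


def is_george(pair):
    return pair[0]["name"] == "George"


def rep_matches(pair):
    return pair[0]["ssn"] == pair[1]["rssn"]


def select(rows, predicate):
    return [row for row in rows if predicate(row)]


def client_names(rows):
    """Projection onto Clients.name, with duplicates removed."""
    return {client["name"] for _, client in rows}


nested = client_names(select(select(pairs, rep_matches), is_george))
combined = client_names(select(pairs, lambda p: is_george(p) and rep_matches(p)))
george_rows = select(pairs, is_george)
matching_rows = select(pairs, rep_matches)
intersected = client_names([p for p in george_rows if p in matching_rows])
```

All three evaluate to the same set of client names, which is why a query optimizer is free to pick whichever form is cheapest.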

For next time
Finish chapter 5
Come to office hours!

BCNF Review
Q: What’s required for BCNF?
Q: What’s the slogan for BCNF?
Q: Who are B & C?
Q: What are the two types of violations?

BCNF Review
Q: How do we fix a non-BCNF relation?
Q: If As → Bs violates BCNF, what do we do?
Q: In this case, could the decomposition be lossy?
Q: Under what circumstances could a decomposition be lossy?
Q: How do we combine two relations?