M.P. Johnson, DBMS, Stern/NYU, Sp2004. C20.0046: Database Management Systems, Lecture #8. Matthew P. Johnson, Stern School of Business, NYU, Spring 2004.

Presentation transcript:

C20.0046: Database Management Systems, Lecture #8
Matthew P. Johnson, Stern School of Business, NYU
Spring 2004

Agenda
- Last time: Normalization
- This time:
  1. 4NF
  2. Relational Algebra
- Pep talk
- OHs today, drop-ins (80809)

Normalization Review
- Q: What's required for BCNF?
- Q: What are the two types of violations?
- Q: What's the loophole for 3NF?
- Q: How do we fix a non-BCNF relation?

Normalization Review
- Q: If As → Bs violates BCNF, what do we do?
  - Q: In this case, could the decomposition be lossy?
- Q: How do we combine two relations?
- Q: Can BCNF decomp. lose FDs?
- Q: Can 3NF decomp. lose FDs?

New topic: MVDs (3.7)
Consider this relation:
- People ~ their jobs ~ their residences
- Person-address/city: many-many
- Person-job: many-many
- Address/city-job: independent

Name     SSN  Streets               Citys       Jobs
Michael  123  111 East 60th Street  New York    CEO
Michael  123  222 Brompton Road     London      CEO
Hilary   456  333 Some Street       Chappaqua   First Lady
Hilary   456  444 Embassy Row       Washington  First Lady

Redundancy in BCNF
- Lots of redundancy!
- Key? All fields
  - None determined by others!
- Non-trivial FDs? None!
  - In BCNF? Yes!

Name     Streets               Citys       Jobs
Michael  111 East 60th Street  New York    Mayor
Michael  222 Brompton Road     London      Mayor
Michael  111 East 60th Street  New York    CEO
Michael  222 Brompton Road     London      CEO
Hilary   333 Some Street       Chappaqua   Senator
Hilary   444 Embassy Row       Washington  Senator
Hilary   333 Some Street       Chappaqua   First Lady
Hilary   444 Embassy Row       Washington  First Lady
Hilary   333 Some Street       Chappaqua   Lawyer
Hilary   444 Embassy Row       Washington  Lawyer

Now what? New concept, leading to another normal form: multivalued dependencies

MVD definition
As ↠ Bs if, when the As are held fixed, the values in Bs are independent of the values in the rest (Cs).
More precisely: if t1 and t3 agree on As, then we can find t2 such that
- t1, t2, t3 agree on As
- t2, t1 agree on Bs
- t2, t3 agree on Cs

t1:  As  Bs  Cs
t2:  As  Bs  Cs   (Bs taken from t1, Cs taken from t3)
t3:  As  Bs  Cs

MVD example
Claim: name ↠ streets,cities
If true: can pick arbitrary t1, t3 and find a t2
We pick the first and last of Hilary's tuples:

     Name    Streets          Citys       Jobs
t1   Hilary  333 Some Street  Chappaqua   Senator
t3   Hilary  444 Embassy Row  Washington  Lawyer

Now: if true, can find another Hilary row with the street/city of t1 and the job of t3:

t2   Hilary  333 Some Street  Chappaqua   Lawyer

MVD example
Now: if true, can find another Hilary row with the street/city of t1 and the job of t3
Sure enough, t2 = (Hilary, 333 Some Street, Chappaqua, Lawyer) is in the relation:

     Name     Streets               Citys       Jobs
     Michael  111 East 60th Street  New York    Mayor
     Michael  222 Brompton Road     London      Mayor
     Michael  111 East 60th Street  New York    CEO
     Michael  222 Brompton Road     London      CEO
     Hilary   333 Some Street       Chappaqua   Senator
     Hilary   444 Embassy Row       Washington  Senator
     Hilary   333 Some Street       Chappaqua   First Lady
     Hilary   444 Embassy Row       Washington  First Lady
t2   Hilary   333 Some Street       Chappaqua   Lawyer
     Hilary   444 Embassy Row       Washington  Lawyer

MVD rules
No splitting rule:
- In the example, name ↠ streets,cities
- Do we have name ↠ streets?
- No: 444 Embassy Row doesn't go with Chappaqua
- NB: City doesn't determine street (could have >1 house), but city and street aren't independent

     Name    Streets          Citys       Jobs
t1   Hilary  333 Some Street  Chappaqua   Senator
t3   Hilary  444 Embassy Row  Washington  Lawyer
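The t1/t2/t3 definition can be checked by brute force on a single relation instance. The sketch below is only an illustration: the function name mvd_holds and the tuple-plus-schema representation are assumptions made here, not part of the lecture. It tests the claim name ↠ streets,cities on the ten-row People relation and then the split version name ↠ streets.

```python
def mvd_holds(rows, schema, lhs, rhs):
    """Brute-force test of lhs ->> rhs on one relation instance."""
    rest = [a for a in schema if a not in lhs and a not in rhs]

    def pick(t, attrs):
        return tuple(t[schema.index(a)] for a in attrs)

    # Every tuple, split into its (As, Bs, rest) parts.
    parts = {(pick(t, lhs), pick(t, rhs), pick(t, rest)) for t in rows}
    for t1 in rows:
        for t3 in rows:
            if pick(t1, lhs) != pick(t3, lhs):
                continue
            # t2 must take As from both, Bs from t1 and the rest from t3.
            if (pick(t1, lhs), pick(t1, rhs), pick(t3, rest)) not in parts:
                return False
    return True


schema = ("name", "street", "city", "job")
people = {
    ("Michael", "111 East 60th Street", "New York", "Mayor"),
    ("Michael", "222 Brompton Road", "London", "Mayor"),
    ("Michael", "111 East 60th Street", "New York", "CEO"),
    ("Michael", "222 Brompton Road", "London", "CEO"),
    ("Hilary", "333 Some Street", "Chappaqua", "Senator"),
    ("Hilary", "444 Embassy Row", "Washington", "Senator"),
    ("Hilary", "333 Some Street", "Chappaqua", "First Lady"),
    ("Hilary", "444 Embassy Row", "Washington", "First Lady"),
    ("Hilary", "333 Some Street", "Chappaqua", "Lawyer"),
    ("Hilary", "444 Embassy Row", "Washington", "Lawyer"),
}

print(mvd_holds(people, schema, ["name"], ["street", "city"]))  # True
print(mvd_holds(people, schema, ["name"], ["street"]))          # False
```

A check like this only says whether the MVD holds in the given instance; an MVD proper is a constraint on every legal instance of the schema.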

MVD rules
Trivial dependencies:
- As ↠ Bs iff As ↠ Bs A_i (Bs extended with any attribute A_i of As)
Transitive rule:
- As ↠ Bs, Bs ↠ Cs ⇒ As ↠ Cs
Complementation rule:
- As ↠ Bs ⇒ As ↠ rest
- Intuition: if each value in Bs is assoc'ed w/each value in rest, then each value of rest is assoc'ed w/each value in Bs

Name     Streets               Citys     Jobs
Michael  111 East 60th Street  New York  Mayor
Michael  222 Brompton Road     London    Mayor
Michael  111 East 60th Street  New York  CEO
Michael  222 Brompton Road     London    CEO

MVDs and FDs
MVD is a generalization of FD: every FD is an MVD
Pf: Suppose As → Bs
Pick t1, t3 that agree on As. Must find a t2.
Let t2 be t3. Then
1) t2 agrees on As with both
2) t2 agrees on Bs with t1 (why?)
3) t2 agrees on rest with t3 (why?)
QED

Fourth Normal Form
4NF: like BCNF, but with MVDs not FDs
An MVD As ↠ Bs is nontrivial if
- No Bs are As
- Some attributes left over (why?)
4NF: for every nontrivial MVD As ↠ Bs, As is a superkey
In the example, name ↠ streets,cities, but name isn't a superkey

Name    Streets          Citys       Jobs
Hilary  333 Some Street  Chappaqua   Senator
Hilary  444 Embassy Row  Washington  Lawyer

Decomposition to 4NF
Again, analogous to BCNF
If we can find As ↠ Bs for R where As isn't a superkey, replace R with R1(As, Bs) and R2(As, rest)
Running example: name ↠ streets,cities
- People(name, streets, cities, jobs) becomes Residences(name, street, city) and Employment(name, job)
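The running example can be replayed in a few lines. This is a sketch under assumptions: the helper names project and natural_join and the tuple-plus-schema representation are chosen here for illustration. It projects People onto Residences and Employment and then joins them back to confirm that nothing is lost or invented.

```python
def project(rows, schema, attrs):
    """Keep only the listed attributes; the set removes duplicates."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rows}


def natural_join(r1, schema1, r2, schema2):
    """Join on the attributes the two schemas share."""
    shared = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in schema1]
    joined = set()
    for t in r1:
        for u in r2:
            if all(t[schema1.index(a)] == u[schema2.index(a)] for a in shared):
                joined.add(t + tuple(u[schema2.index(a)] for a in extra))
    return joined


schema = ("name", "street", "city", "job")
people = {
    ("Michael", "111 East 60th Street", "New York", "Mayor"),
    ("Michael", "222 Brompton Road", "London", "Mayor"),
    ("Michael", "111 East 60th Street", "New York", "CEO"),
    ("Michael", "222 Brompton Road", "London", "CEO"),
    ("Hilary", "333 Some Street", "Chappaqua", "Senator"),
    ("Hilary", "444 Embassy Row", "Washington", "Senator"),
    ("Hilary", "333 Some Street", "Chappaqua", "First Lady"),
    ("Hilary", "444 Embassy Row", "Washington", "First Lady"),
    ("Hilary", "333 Some Street", "Chappaqua", "Lawyer"),
    ("Hilary", "444 Embassy Row", "Washington", "Lawyer"),
}

res_schema = ("name", "street", "city")
emp_schema = ("name", "job")
residences = project(people, schema, res_schema)   # 4 rows
employment = project(people, schema, emp_schema)   # 5 rows
rejoined = natural_join(residences, res_schema, employment, emp_schema)
print(len(residences), len(employment), rejoined == people)  # 4 5 True
```

The ten original rows shrink to four plus five, and each address and each job is now stored once per person.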

4NF: another construal
In nontrivial As ↠ Bs, As must be a superkey
After the definition of 4NF, the text says: "That is, … every nontrivial MVD is really a FD with a superkey on the left" (p123).
We know: FDs are* MVDs but not vice versa
So: Why does this follow? Is it true?
Yes. As is a superkey ⇒ As → everything ⇒ As → Bs ⇒ the MVD is an FD
Two kinds of MVDs: FDs and "true" MVDs
4NF eliminates exactly the true ones
* The typo swapping these was fixed.

Summary of normal forms

Guaranteed to             3NF     BCNF  4NF
Eliminate FD redundancy   Mostly  Yes   Yes
Eliminate MVD redundancy  No      No    Yes
Preserve FDs              Yes     No    No
Preserve MVDs             No      No    No

Combined isa/weak example
Exercise:
- Convert from E/R to R, by E/R, OO and nulls
[E/R diagram: entity sets Courses, Lab-courses, Depts; relationship givenBy; isa; attributes number, room, computer-allocation, name, chair]

Next topic: relational algebra (5.1-2)
- Set operations: union, intersection, difference
- Projection, selection
- Cartesian product
- Joins: natural joins, theta joins
- Combining operations to form queries
- Dependent and independent operations

What is relational algebra?
An algebra for relations
"High-school" algebra is an algebra for numbers
Formalism for constructing expressions
- Operations
- Operands: variables, constants, expressions
Expressions:
- Vars & constants
- Operators applied to expressions

Algebra      Vars/consts                   Operators
High-school  Numbers                       + * - / etc.
Relational   Relations (= sets of tuples)  union, intersection, join, etc.

Why do we care about relational algebra?
Why construct expressions on relations?
- The exprs are the form that questions about the data take
- The relations these exprs cash out to are the answers to our questions
First proof of RDBMS/RA concept: System R (1979)
Modern implementation of RA: SQL

Relation operators
Five basic operators:
- Union: ∪
- Difference: −
- Selection: σ
- Projection: π
- Cartesian product: ×
Derived/auxiliary operators:
- Intersection, complement
- Joins (natural, equijoin, theta join, semijoin)
- Renaming: ρ

Operators
Relations are sets ⇒ have set-theoretic ops
- Venn diagrams
Union: R1 ∪ R2
Example:
- ActiveEmployees ∪ RetiredEmployees
Difference: R1 − R2
Example:
- AllEmployees − RetiredEmployees = ActiveEmployees

Set operations - example

R:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Hamill  456 Oak    M       8/8/88

S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Ford    345 Palm   M       7/7/77

R ∪ S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Hamill  456 Oak    M       8/8/88
Ford    345 Palm   M       7/7/77

Set operations - example
(R and S as in the previous example)

R − S:
Name    Address    Gender  Birthdate
Hamill  456 Oak    M       8/8/88

Operators
Intersection: R1 ∩ R2
Example:
- UnionizedEmployees ∩ RetiredEmployees
Intersection can be derived from ∪ and −:
- R1 ∩ R2 = R1 − (R1 − R2)
- R1 ∩ R2 = −(−R1 ∪ −R2) (allowed?)

Set operations - example
(R and S as in the previous examples)

R ∩ S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
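Since relations are sets, the three examples map directly onto Python's built-in set operators. The snippet below is a small sketch using the R and S rows from the slides; representing each tuple as a plain Python tuple is an assumption made for illustration. It also checks the identity R ∩ S = R − (R − S).

```python
# Each row is (Name, Address, Gender, Birthdate).
R = {
    ("Fisher", "123 Maple", "F", "9/9/99"),
    ("Hamill", "456 Oak", "M", "8/8/88"),
}
S = {
    ("Fisher", "123 Maple", "F", "9/9/99"),
    ("Ford", "345 Palm", "M", "7/7/77"),
}

union = R | S            # Fisher, Hamill, Ford
difference = R - S       # Hamill only
intersection = R & S     # Fisher only
derived = R - (R - S)    # intersection built from difference alone

print(len(union), len(difference), len(intersection))  # 3 1 1
print(derived == intersection)                         # True
```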

Operators: Selection
Selects all tuples satisfying a condition
Notation: σ_c(R)
Examples:
- σ_{salary > …}(Employee)
- σ_{name = "Smith"}(Employee)
The condition c can have
- comparison ops: =, <, ≤, >, ≥, <>
- boolean ops: and, or

Selection example
Select the movies at Angelica: σ_{Theater="Angelica"}(Showings)

Showings:
Title        Theater     N'hood
Fog of War   Angelica    Village
City of God  Angelica    Village
City of God  Film Forum  Village

σ_{Theater="Angelica"}(Showings):
Title        Theater     N'hood
Fog of War   Angelica    Village
City of God  Angelica    Village
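Selection is a row filter, so it can be sketched as a one-line function. In the snippet below the name select, the dict-per-row representation, and the attribute key Nhood are illustrative assumptions; the Showings rows are as best read from the slide.

```python
def select(rows, condition):
    """sigma_c(R): keep the rows for which condition(row) is true."""
    return [r for r in rows if condition(r)]


showings = [
    {"Title": "Fog of War", "Theater": "Angelica", "Nhood": "Village"},
    {"Title": "City of God", "Theater": "Angelica", "Nhood": "Village"},
    {"Title": "City of God", "Theater": "Film Forum", "Nhood": "Village"},
]

at_angelica = select(showings, lambda r: r["Theater"] == "Angelica")
print([r["Title"] for r in at_angelica])  # ['Fog of War', 'City of God']

# Conditions can combine comparisons with and/or.
narrowed = select(
    showings,
    lambda r: r["Theater"] == "Angelica" and r["Title"] != "Fog of War",
)
print([r["Title"] for r in narrowed])  # ['City of God']
```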

Operators: Projection
The op we used for decomposition
- Eliminates columns, then removes duplicates
Notation: π_{A1,…,An}(R)
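The "then removes duplicates" step is easy to miss, so here is a minimal sketch of projection. The function name project and the dict-per-row representation are assumptions for illustration; the rows are the three people from the set-operation examples. Projecting onto Gender leaves two rows, not three.

```python
def project(rows, attrs):
    """pi_{attrs}(R): drop the other columns, then drop duplicate rows."""
    out = []
    for r in rows:
        slim = {a: r[a] for a in attrs}
        if slim not in out:          # relations are sets: no repeated rows
            out.append(slim)
    return out


persons = [
    {"Name": "Fisher", "Address": "123 Maple", "Gender": "F", "Birthdate": "9/9/99"},
    {"Name": "Hamill", "Address": "456 Oak", "Gender": "M", "Birthdate": "8/8/88"},
    {"Name": "Ford", "Address": "345 Palm", "Gender": "M", "Birthdate": "7/7/77"},
]

print(project(persons, ["Gender"]))            # [{'Gender': 'F'}, {'Gender': 'M'}]
print(len(project(persons, ["Name", "Gender"])))  # 3
```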

Operators: Cartesian product
- Also called cross product
- Each tuple in R1 combines w/each tuple in R2
- Notation: R1 × R2
- If R1, R2 fields overlap, include both and disambiguate: R1.A, R2.A
- Fairly rare in practice; mainly used to express joins
- Q: Where does the name come from?
- Q: If R1 has n1 rows and R2 has n2, how large is R1 × R2?

Cartesian product example

Hillary-addresses:
Street           City
333 Some Street  Chappaqua
444 Embassy Row  Washington

Hillary-jobs:
Job
Senator
First Lady
Lawyer

Hillary-addresses × Hillary-jobs:
Street           City        Job
333 Some Street  Chappaqua   Senator
444 Embassy Row  Washington  Senator
333 Some Street  Chappaqua   First Lady
444 Embassy Row  Washington  First Lady
333 Some Street  Chappaqua   Lawyer
444 Embassy Row  Washington  Lawyer
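A cross product pairs every row of one relation with every row of the other, so the result has n1 · n2 rows. The sketch below assumes a dict-per-row representation and a hypothetical helper named cross; overlapping attribute names are qualified with the relation name, as in R1.A and R2.A.

```python
def cross(r1, name1, r2, name2):
    """R1 x R2; attributes that appear in both get a 'Relation.' prefix."""
    shared = set(r1[0]) & set(r2[0]) if r1 and r2 else set()

    def label(rel, attr):
        return f"{rel}.{attr}" if attr in shared else attr

    return [
        {**{label(name1, a): v for a, v in t.items()},
         **{label(name2, a): v for a, v in u.items()}}
        for t in r1
        for u in r2
    ]


addresses = [
    {"Street": "333 Some Street", "City": "Chappaqua"},
    {"Street": "444 Embassy Row", "City": "Washington"},
]
jobs = [{"Job": "Senator"}, {"Job": "First Lady"}, {"Job": "Lawyer"}]

pairs = cross(addresses, "Addresses", jobs, "Jobs")
print(len(pairs))  # 6
print(cross([{"A": 1}], "R1", [{"A": 2}], "R2"))  # [{'R1.A': 1, 'R2.A': 2}]
```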

Operators: Natural join
Our join up to now
- But always merging shared attributes
Notation: R1 ⋈ R2
Meaning: R1 ⋈ R2 = π_{every att once}(σ_{shared atts =}(R1 × R2))
I.e.:
- First compute the cross product R1 × R2
- Next, select the rows in which shared fields agree
- Finally, project onto the union of R1 and R2's fields (remove duplicates)

Natural join example

Addresses:
Name    Street           City
Hilary  333 Some Street  Chappaqua
Hilary  444 Embassy Row  Washington

Jobs:
Name    Job
Hilary  Senator
Hilary  First Lady
Hilary  Lawyer

Addresses ⋈ Jobs:
Name    Street           City        Job
Hilary  333 Some Street  Chappaqua   Senator
Hilary  444 Embassy Row  Washington  Senator
Hilary  333 Some Street  Chappaqua   First Lady
Hilary  444 Embassy Row  Washington  First Lady
Hilary  333 Some Street  Chappaqua   Lawyer
Hilary  444 Embassy Row  Washington  Lawyer
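The three steps of the natural join (cross product, select on shared fields, project each attribute once) can be folded into one loop. This is a sketch, with the function name natural_join and the dict-per-row representation assumed for illustration, applied to the Addresses and Jobs relations of the example.

```python
def natural_join(r1, r2):
    """R1 join R2 on all attributes the two relations share."""
    out = []
    for t in r1:
        for u in r2:                                   # step 1: every pair
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):      # step 2: shared fields agree
                row = {**t, **u}                       # step 3: each attribute once
                if row not in out:
                    out.append(row)
    return out


addresses = [
    {"Name": "Hilary", "Street": "333 Some Street", "City": "Chappaqua"},
    {"Name": "Hilary", "Street": "444 Embassy Row", "City": "Washington"},
]
jobs = [
    {"Name": "Hilary", "Job": "Senator"},
    {"Name": "Hilary", "Job": "First Lady"},
    {"Name": "Hilary", "Job": "Lawyer"},
]

joined = natural_join(addresses, jobs)
print(len(joined))        # 6
print(sorted(joined[0]))  # ['City', 'Job', 'Name', 'Street']
```

A real system would not materialize the cross product first; the nested loop here only mirrors the definition.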

Natural Join

R:      S:
A  B    B  C
X  Y    Z  U
X  Z    V  W
Y  Z    Z  V
Z  V

R ⋈ S = ?
Unpaired tuples are called dangling

Natural Join
- Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
- Given R(A, B, C), S(D, E), what is R ⋈ S?
- Given R(A, B), S(A, B), what is R ⋈ S?

Theta Join
Like natural join, but
- Includes only rows that satisfy an arbitrary condition
- Does not project away shared attributes
R1 ⋈_θ R2 = σ_θ(R1 × R2)
Here θ can be any condition
If the condition is always satisfied, the theta join becomes the Cartesian product

Theta-join example
U(A, B, C), V(B, C, D)
U ⋈_{A<D} V has schema (A, U.B, U.C, V.B, V.C, D)
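A theta join is a cross product followed by a selection, which can be sketched as follows. The function name theta_join and the dict-per-row representation are assumptions, and the numbers in U and V are illustrative values chosen here, since the slide's table contents are not legible in this transcript.

```python
def theta_join(r1, name1, r2, name2, theta):
    """sigma_theta(R1 x R2); shared attribute names are kept and qualified."""
    out = []
    for t in r1:
        for u in r2:
            if theta(t, u):
                shared = t.keys() & u.keys()
                row = {(f"{name1}.{a}" if a in shared else a): v for a, v in t.items()}
                row.update({(f"{name2}.{a}" if a in shared else a): v for a, v in u.items()})
                out.append(row)
    return out


# Illustrative data only (assumed, not taken from the slide).
U = [{"A": 1, "B": 2, "C": 3}, {"A": 6, "B": 7, "C": 8}, {"A": 9, "B": 7, "C": 8}]
V = [{"B": 2, "C": 3, "D": 4}, {"B": 2, "C": 3, "D": 5}, {"B": 7, "C": 8, "D": 10}]

result = theta_join(U, "U", V, "V", lambda t, u: t["A"] < u["D"])
print(list(result[0]))  # ['A', 'U.B', 'U.C', 'V.B', 'V.C', 'D']
print(len(result))      # 5

# With a condition that is always true, every pair survives.
print(len(theta_join(U, "U", V, "V", lambda t, u: True)))  # 9
```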

Equijoin
A theta join where θ is an equality
R1 ⋈_{A=B} R2 = σ_{A=B}(R1 × R2)
σ = lower-case sigma
Example:
- Employee ⋈_{SSN=SSN} Dependents
Most useful join in practice

Semijoin
R ⋉ S = π_{atts of R}(R ⋈ S)
Q: What does this mean?
- Natural join of R and S;
- Then project onto R's atts
A: The rows of R for which at least one row in S agrees on the shared atts

Semijoin example
Employee(SSN, Name, ...)
Dependents(DSSN, Dname, SSN, ...)
Employee ⋉ Dependents = {employees who have dependents}
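The semijoin keeps a row of R when some row of S matches it on the shared attributes, and it never adds S's columns. Below is a minimal sketch; the function name semijoin, the dict-per-row representation, and all employee and dependent values are hypothetical, invented only to exercise the definition.

```python
def semijoin(r, s):
    """R semijoin S: rows of R matched by at least one row of S."""
    kept = []
    for t in r:
        matched = any(
            all(t[a] == u[a] for a in t.keys() & u.keys()) for u in s
        )
        if matched and t not in kept:
            kept.append(t)
    return kept


# Hypothetical sample data.
employee = [
    {"SSN": "111", "Name": "Ann"},
    {"SSN": "222", "Name": "Bob"},
    {"SSN": "333", "Name": "Cy"},
]
dependents = [
    {"DSSN": "901", "Dname": "Dee", "SSN": "111"},
    {"DSSN": "902", "Dname": "Eli", "SSN": "111"},
    {"DSSN": "903", "Dname": "Flo", "SSN": "333"},
]

with_dependents = semijoin(employee, dependents)
print([e["Name"] for e in with_dependents])  # ['Ann', 'Cy']
```

Ann has two dependents but appears once, and Bob, who has none, is dropped.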

Renaming
Changes the schema, not the instance
Notation: ρ_{B1,…,Bn}(R)
ρ is spelled "rho", pronounced "row"
Example:
- Employee(ssn, name)
- ρ_{…(social, name)}(Employee)
- Or just: ρ_{…}(Employee)
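Renaming touches only the attribute names, which a short sketch makes concrete. The helper name rename, the mapping argument, and the sample row are assumptions made for illustration; the ssn-to-social change mirrors the slide's example.

```python
def rename(rows, mapping):
    """rho: give attributes new names; the values stay exactly the same."""
    return [{mapping.get(a, a): v for a, v in r.items()} for r in rows]


employee = [{"ssn": "111", "name": "Ann"}]   # hypothetical row
renamed = rename(employee, {"ssn": "social"})

print(renamed)   # [{'social': '111', 'name': 'Ann'}]
print(employee)  # [{'ssn': '111', 'name': 'Ann'}]
```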

Complex RA Expressions
Q: How long was Star Wars (1977)?
Strategy: find the row with Star Wars; then project the length field

Title      Year  Length  inColor  Studio     Prdcr#
Star Wars  1977  …       True     Fox        12345
M.Ducks    …     …       True     Disney     67890
W.World    1992  95      True     Paramount  99999

Combining operations
Schema: Movies(Title, year, length, filmType, studioName)
Query: select titles and years of movies by Fox that are at least 100 minutes long.

Title          Year  Length  Filmtype  Studio
Star wars      …     …       Color     Fox
Mighty ducks   …     …       Color     Disney
Wayne's world  1992  85      Color     Paramount
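One way to read this query is as a selection followed by a projection: π_{title,year}(σ_{length ≥ 100 and studioName = "Fox"}(Movies)). The sketch below runs that expression; the helper names are assumptions, and the years and lengths for the first two movies are placeholder values assumed here, because those cells are not legible in this transcript.

```python
def select(rows, condition):
    return [r for r in rows if condition(r)]


def project(rows, attrs):
    out = []
    for r in rows:
        slim = {a: r[a] for a in attrs}
        if slim not in out:
            out.append(slim)
    return out


# Year/length for the first two rows are ASSUMED placeholder values.
movies = [
    {"title": "Star Wars", "year": 1977, "length": 124,
     "filmType": "Color", "studioName": "Fox"},
    {"title": "Mighty Ducks", "year": 1991, "length": 104,
     "filmType": "Color", "studioName": "Disney"},
    {"title": "Wayne's World", "year": 1992, "length": 85,
     "filmType": "Color", "studioName": "Paramount"},
]

long_fox = project(
    select(movies, lambda m: m["length"] >= 100 and m["studioName"] == "Fox"),
    ["title", "year"],
)
print(long_fox)  # [{'title': 'Star Wars', 'year': 1977}]
```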

Complex RA Expressions
Reps(ssn, name, etc.)
Clients(ssn, name, rssn)
Q: Find George's client names
- π_{Clients.name}(σ_{Reps.name=George}(σ_{Reps.ssn=rssn}(Reps × Clients)))
- Or: π_{Clients.name}(σ_{Reps.name=George and Reps.ssn=rssn}(Reps × Clients))
- Or: π_{Clients.name}(σ_{Reps.name=George}(Reps × Clients) ∩ σ_{Reps.ssn=rssn}(Reps × Clients))
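The three expressions can be run side by side to see that they return the same relation. This is a sketch under assumptions: the helper functions and the dict-per-row representation are chosen for illustration, and every value in reps and clients is hypothetical sample data.

```python
def cross(r1, name1, r2, name2):
    shared = set(r1[0]) & set(r2[0]) if r1 and r2 else set()
    q = lambda rel, a: f"{rel}.{a}" if a in shared else a
    return [{**{q(name1, a): v for a, v in t.items()},
             **{q(name2, a): v for a, v in u.items()}}
            for t in r1 for u in r2]

def select(rows, condition):
    return [r for r in rows if condition(r)]

def project(rows, attrs):
    out = []
    for r in rows:
        slim = {a: r[a] for a in attrs}
        if slim not in out:
            out.append(slim)
    return out

def intersect(a, b):
    return [r for r in a if r in b]

# Hypothetical sample data.
reps = [{"ssn": 1, "name": "George"}, {"ssn": 2, "name": "Martha"}]
clients = [
    {"ssn": 10, "name": "Ann", "rssn": 1},
    {"ssn": 11, "name": "Bob", "rssn": 2},
    {"ssn": 12, "name": "Cy", "rssn": 1},
]

pairs = cross(reps, "Reps", clients, "Clients")
is_george = lambda r: r["Reps.name"] == "George"
is_rep = lambda r: r["Reps.ssn"] == r["rssn"]

v1 = project(select(select(pairs, is_rep), is_george), ["Clients.name"])
v2 = project(select(pairs, lambda r: is_george(r) and is_rep(r)), ["Clients.name"])
v3 = project(intersect(select(pairs, is_george), select(pairs, is_rep)), ["Clients.name"])

print(v1)              # [{'Clients.name': 'Ann'}, {'Clients.name': 'Cy'}]
print(v1 == v2 == v3)  # True
```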

For next time
- Finish chapter 5
- Come to office hours!