M.P. Johnson, DBMS, Stern/NYU, Spring C : Database Management Systems Lecture #9 Matthew P. Johnson Stern School of Business, NYU Spring, 2005
You don’t have the right to remain silent.
Our long national nightmare is over.
Economic man
Normalization Review
Q: What’s required for BCNF?
Q: How do we fix a non-BCNF relation?
Q: If As → Bs violates BCNF, what do we do?
Q: In this case, could the decomposition be lossy?
Q: How do we combine two relations?
Q: Can BCNF decomp. lose FDs?
Q: Can 3NF decomp. lose FDs?
Recap: You are here
First part of course is done: conceptual foundations
You now know:
  E/R Model
  Relational Model
  Relational Algebra
You now know how to:
  Capture part of world as an E/R model
  Convert E/R models to relational models
  Convert relational models to good (normal) forms
Next:
  Create, update, query tables with R.A/SQL
  Write SQL/DB-connected applications

Next topic: relational algebra
Projection, selection
Cartesian Product
Joins: natural joins, theta joins
Set operations: union, intersection, difference
Combining operations to form queries
Dependent and independent operations
What is relational algebra?
An algebra for relations
“High-school” algebra is an algebra for numbers
Formalism for constructing expressions
  Operations
  Operands: variables, constants, expressions
Expressions:
  Vars & constants
  Operators applied to expressions
  They evaluate to values

Algebra      Vars/consts                    Operators                        Eval to
High-school  Numbers                        + * - / etc.                     Numbers
Relational   Relations (= sets of tuples)   union, intersection, join, etc.  Relations
Why do we care about relational algebra?
Why construct expressions on relations?
  The expressions are the form that questions about the data take
  The relations these expressions evaluate to are the answers to our questions
First proof of RDBMS/RA concept: System R
“Modern” implementation of RA: SQL
  Both state of the art, 1970s
Relation operators
Five basic operators:
  Selection: σ
  Projection: π
  Cartesian Product: ×
  Union: ∪
  Difference: −
Derived/auxiliary operators:
  Intersection, complement
  Joins (natural, equijoin, theta join, semijoin)
  Renaming: ρ
Operators - Selection
Selects all tuples satisfying a condition
Notation: σ_c(R)
Examples:
  σ_{salary > …}(Employee)
  σ_{name = “Smith”}(Employee)
The condition c can have
  comparison ops: =, <, <=, >, >=, <>
  boolean ops: and, or
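As a rough illustration of selection, here is a minimal Python sketch that models a relation as a list of dicts. The `employee` data, the `select` helper, and the 40000 salary threshold are all assumptions made for the example; they are not from the slides.

```python
# Hypothetical Employee relation: one dict per tuple (illustrative data only).
employee = [
    {"name": "Smith", "salary": 52000},
    {"name": "Jones", "salary": 38000},
    {"name": "Smith", "salary": 41000},
]

def select(relation, condition):
    """sigma_condition(relation): keep the tuples for which condition holds."""
    return [t for t in relation if condition(t)]

# The threshold 40000 is an assumed value chosen for the example.
high_paid = select(employee, lambda t: t["salary"] > 40000)
smiths = select(employee, lambda t: t["name"] == "Smith")
```

Selection never changes the schema: every result tuple has the same attributes as the input relation.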
Selection example
Select the movies at Sunshine: σ_{Theater=“Sunshine”}(Showings)

Showings:
Theater     Title       N’hood
Sunshine    Annie Hall  Village
Sunshine    Bad Edu.    Village
Film Forum  Masc. Fem.  Village

Result:
Theater     Title       N’hood
Sunshine    Annie Hall  Village
Sunshine    Bad Edu.    Village
Operators - Projection
Keep only certain columns
Projection: the op we used for decomposition
Eliminates other columns, then removes duplicates
Notation: π_{A1,…,An}(R)
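A small Python sketch of projection under set semantics may help here. The `employee` relation and the `project` helper are hypothetical names introduced only for this illustration.

```python
# Hypothetical Employee relation; attribute names are illustrative only.
employee = [
    {"ssn": 1, "name": "Smith", "dept": "Sales"},
    {"ssn": 2, "name": "Jones", "dept": "Sales"},
    {"ssn": 3, "name": "Smith", "dept": "Sales"},
]

def project(relation, attrs):
    """pi_attrs(relation): keep only attrs, then drop duplicate tuples."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:  # set semantics: each tuple appears once
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result

# Two employees named Smith in Sales collapse into a single result tuple.
names_and_depts = project(employee, ["name", "dept"])
```

Dropping `ssn` makes two input tuples identical, so the result has two tuples rather than three.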
Next topic: SQL
Standard language for querying and manipulating data
Structured Query Language
Many standards: ANSI SQL, SQL92/SQL2, SQL3/SQL99
Vendors support various subsets/extensions
We’ll do SQL99/Oracle/MySQL
  “No one ever got fired for buying Oracle.”
Basic form (many more bells and whistles in addition):
  SELECT attributes
  FROM relations (possibly multiple, joined)
  WHERE conditions (selections)
“Tables”
Table name: Product
Attribute names: PName, Price, Category, Manufacturer
Tuples or rows:

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi
Data Types in SQL
Characters:
  CHAR(20)     -- fixed length
  VARCHAR(40)  -- variable length
Numbers:
  BIGINT, INT, SMALLINT, TINYINT
  REAL, FLOAT  -- differ in precision
  MONEY
Times and dates:
  DATE
  DATETIME     -- SQL Server
Simple SQL Query

Product:
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

  SELECT *
  FROM Product
  WHERE category='Gadgets'

Result (“selection”):
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
Simple SQL Query

Product:
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

  SELECT PName, Price, Manufacturer
  FROM Product
  WHERE Price > 100

Result (“selection” and “projection”):
PName        Price    Manufacturer
SingleTouch  $149.99  Canon
MultiTouch   $203.99  Hitachi
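To try the two simple queries, one option is Python's built-in `sqlite3` module with an in-memory database. This is only a sketch: it uses SQLite rather than Oracle or MySQL, the column types are assumptions, and the `ORDER BY` clauses are added solely to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; the slides show only names and values.
conn.execute(
    "CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?, ?)",
    [
        ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
        ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
        ("SingleTouch", 149.99, "Photography", "Canon"),
        ("MultiTouch", 203.99, "Household", "Hitachi"),
    ],
)

# Selection only: all columns of the rows in the Gadgets category.
gadgets = conn.execute(
    "SELECT * FROM Product WHERE Category = 'Gadgets' ORDER BY PName"
).fetchall()

# Selection plus projection: three columns of the rows priced above 100.
expensive = conn.execute(
    "SELECT PName, Price, Manufacturer FROM Product "
    "WHERE Price > 100 ORDER BY PName"
).fetchall()
```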
A Notation for SQL Queries
  SELECT PName, Price, Manufacturer
  FROM Product
  WHERE Price > 100
Input Schema: Product(PName, Price, Category, Manufacturer)
Output Schema: (PName, Price, Manufacturer)
R.A. → SQL
R.A. Projection → SQL SELECT
R.A. Selection → SQL WHERE
R.A. Join → SQL FROM
  Comma-separated list…
What goes in the WHERE clause:
  x = y, x < y, x <= y, etc.
  For numbers, they have the usual meanings
  For CHARs/VARCHARs: lexicographic ordering
    Expected conversion between CHAR and VARCHAR
  For dates and times, what you expect
Complex RA Expressions
Q: How long was Star Wars (1977)?
Strategy: find the row with Star Wars; then project the length field

Title      Year  Length  inColor  Studio     Prdcr#
Star Wars                True     Fox        12345
M.Ducks                  True     Disney     67890
W.World    1992  95      True     Paramount  99999
Combining operations
Schema: Movies(Title, year, length, filmType, studioName)
Query: select titles and years of movies by Fox that are at least 100 minutes long.

Title          Year  Length  Filmtype  Studio
Star wars                    Color     Fox
Mighty ducks                 Color     Disney
Wayne’s world  1992  85      Color     Paramount
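One way to read this query is as a selection followed by a projection: π_{title, year}(σ_{length >= 100 AND studioName = 'Fox'}(Movies)). The Python sketch below evaluates that expression on a made-up instance. The year and length values, and the extra "Short Fox film" row, are assumptions chosen only so the filter has something to reject.

```python
# Hypothetical Movies instance; years and lengths are illustrative values.
movies = [
    {"title": "Star wars", "year": 1977, "length": 124,
     "filmType": "Color", "studioName": "Fox"},
    {"title": "Mighty ducks", "year": 1991, "length": 104,
     "filmType": "Color", "studioName": "Disney"},
    {"title": "Wayne's world", "year": 1992, "length": 85,
     "filmType": "Color", "studioName": "Paramount"},
    {"title": "Short Fox film", "year": 1990, "length": 80,
     "filmType": "Color", "studioName": "Fox"},
]

# Step 1 (selection): Fox movies that run at least 100 minutes.
selected = [
    m for m in movies
    if m["studioName"] == "Fox" and m["length"] >= 100
]

# Step 2 (projection): keep title and year; a set removes duplicates.
result = {(m["title"], m["year"]) for m in selected}
```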
Operators - Cartesian Product
Cross product
Each tuple in R1 combines w/ each tuple in R2
Notation: R1 × R2
If R1, R2 fields overlap, include both and disambiguate: R1.A, R2.A
Q: Where does the name come from?
Q: If R1 has n1 rows and R2 has n2, how large is R1 × R2?
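A quick way to experiment with the cross product is `itertools.product` from the Python standard library. The sketch below models tuples as Python tuples and uses the address and job values from the lecture's Hillary example; the variable names are made up for the illustration.

```python
from itertools import product

# Relations as lists of tuples (values taken from the lecture's example).
addresses = [("333 Some Street", "Chappaqua"), ("444 Embassy Row", "Washington")]
jobs = [("Senator",), ("First Lady",), ("Lawyer",)]

# Every address tuple is concatenated with every job tuple.
cross = [a + j for a, j in product(addresses, jobs)]

# The result size is the product of the input sizes.
size_matches = len(cross) == len(addresses) * len(jobs)
```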
Cartesian product example

Hillary-addresses:
Street           City
333 Some Street  Chappaqua
444 Embassy Row  Washington

Hillary-jobs:
Job
Senator
First Lady
Lawyer

Hillary-addresses × Hillary-jobs:
Street           City        Job
333 Some Street  Chappaqua   Senator
444 Embassy Row  Washington  Senator
333 Some Street  Chappaqua   First Lady
444 Embassy Row  Washington  First Lady
333 Some Street  Chappaqua   Lawyer
444 Embassy Row  Washington  Lawyer
Operators - Natural join
Natural join: our join up to now
  But always merging shared attributes
Notation: R1 ⋈ R2
Meaning: R1 ⋈ R2 = π_{every att once}(σ_{shared atts agree}(R1 × R2))
I.e., first compute the cross product R1 × R2
Next, select the rows in which shared fields agree
Finally, project onto the union of R1 and R2’s fields (remove duplicates)
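The three steps (cross product, select on shared attributes, project each attribute once) can be sketched in a few lines of Python. The `natural_join` helper and the `emp` and `dept` relations are hypothetical. The sketch assumes each relation is a non-empty list of dicts with a uniform schema.

```python
from itertools import product

def natural_join(r1, r2):
    """Natural join of two relations given as lists of dicts."""
    if not r1 or not r2:
        return []
    shared = [a for a in r1[0] if a in r2[0]]
    result = []
    for t1, t2 in product(r1, r2):               # step 1: cross product
        if all(t1[a] == t2[a] for a in shared):  # step 2: shared fields agree
            merged = {**t1, **t2}                # step 3: each attribute once
            if merged not in result:             # remove duplicates
                result.append(merged)
    return result

# Hypothetical relations sharing the attribute "dept".
emp = [
    {"name": "Ann", "dept": "Sales"},
    {"name": "Bob", "dept": "IT"},
    {"name": "Cy", "dept": "HR"},   # no matching dept row: a dangling tuple
]
dept = [{"dept": "Sales", "floor": 1}, {"dept": "IT", "floor": 2}]

joined = natural_join(emp, dept)
```

A real DBMS would not materialize the full cross product. This version follows the definition literally and is meant only to show the meaning.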
Natural join example

Addresses:
Name    Street           City
Hilary  333 Some Street  Chappaqua
Hilary  444 Embassy Row  Washington
Bill    Somewhere else   ddd

Jobs:
Name    Job
Hilary  Senator
Hilary  First Lady
Hilary  Lawyer

Addresses ⋈ Jobs:
Name    Street           City        Job
Hilary  333 Some Street  Chappaqua   Senator
Hilary  444 Embassy Row  Washington  Senator
Hilary  333 Some Street  Chappaqua   First Lady
Hilary  444 Embassy Row  Washington  First Lady
Hilary  333 Some Street  Chappaqua   Lawyer
Hilary  444 Embassy Row  Washington  Lawyer
Natural Join

R:
A  B
X  Y
X  Z
Y  Z
Z  V

S:
B  C
Z  U
V  W
Z  V

R ⋈ S = ?
Unpaired tuples are called dangling

Natural Join
Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
Given R(A, B, C), S(D, E), what is R ⋈ S?
Given R(A, B), S(A, B), what is R ⋈ S?
Theta Join
Like natural join, but includes only the rows that satisfy an arbitrary condition θ
Does not project away shared attributes
R1 ⋈_θ R2 = σ_θ(R1 × R2)
Here θ can be any condition
If the condition is always satisfied, then the theta join becomes the Cartesian product
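The definition R1 ⋈_θ R2 = σ_θ(R1 × R2) translates almost directly into Python. In this sketch the `theta_join` helper, the relation names `U` and `V`, and the data values are all hypothetical. Attributes are prefixed with the relation name so that shared attributes are kept on both sides.

```python
from itertools import product

def theta_join(r1, r2, theta, name1="R1", name2="R2"):
    """sigma_theta(r1 x r2), keeping both copies of any shared attributes."""
    result = []
    for t1, t2 in product(r1, r2):
        if theta(t1, t2):
            row = {f"{name1}.{a}": v for a, v in t1.items()}
            row.update({f"{name2}.{a}": v for a, v in t2.items()})
            result.append(row)
    return result

# Hypothetical instances of U(A, B, C) and V(B, C, D).
u = [{"A": 1, "B": 2, "C": 3}, {"A": 6, "B": 7, "C": 8}]
v = [{"B": 2, "C": 3, "D": 4}, {"B": 7, "C": 8, "D": 5}]

# Theta join on the condition A < D.
rows = theta_join(u, v, lambda t1, t2: t1["A"] < t2["D"], "U", "V")

# A condition that is always true gives back the full cross product.
everything = theta_join(u, v, lambda t1, t2: True, "U", "V")
```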
Theta-join example
U(A, B, C)
V(B, C, D)
U ⋈_{A<D} V has schema (A, U.B, U.C, V.B, V.C, D)
Equijoin
A theta join where θ is an equality
R1 ⋈_{A=B} R2 = σ_{A=B}(R1 × R2)
  σ = lower-case Greek sigma
Example: Employee ⋈_{SSN=SSN} Dependents
Common join in practice
Semijoin
R ⋉ S = π_{atts of R}(R ⋈ S)
Q: What does this mean?
  Natural join of R and S;
  Then project onto R’s atts
A: The rows of R for which at least one row in S agrees on the shared atts
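A semijoin can be sketched in Python without building the full join: keep each tuple of R that matches some tuple of S on the shared attributes. The `semijoin` helper and the sample data are hypothetical. The sketch assumes non-empty lists of dicts with uniform schemas and no duplicate tuples in R.

```python
def semijoin(r, s):
    """Tuples of r that agree with at least one tuple of s on shared attributes."""
    if not r or not s:
        return []
    shared = [a for a in r[0] if a in s[0]]
    return [
        t for t in r
        if any(all(t[a] == u[a] for a in shared) for u in s)
    ]

# Hypothetical data: Ann has two dependents, Bob has none.
employee = [{"SSN": "111", "Name": "Ann"}, {"SSN": "222", "Name": "Bob"}]
dependents = [
    {"DSSN": "901", "Dname": "Kim", "SSN": "111"},
    {"DSSN": "902", "Dname": "Lee", "SSN": "111"},
]

with_dependents = semijoin(employee, dependents)
```

Ann appears once in the result even though two rows of `dependents` match her, which is what projecting the natural join onto R's attributes and removing duplicates would give.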
Semijoin example
Employee(SSN, Name, ...)
Dependents(DSSN, Dname, SSN, ...)
Employee ⋉ Dependents = {employees who have dependents}
Renaming
Changes the schema, not the instance
Notation: ρ_{B1,…,Bn}(R)
  ρ is spelled “rho”, pronounced “row”
Example: Employee(ssn, name)
  ρ_{…(social, name)}(Employee)
  Or just: ρ_{…}(Employee)
Joins in SQL
Connect two or more tables:

Product:
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company:
CName       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan

What is the connection between them?
Joins in SQL
Product(pname, price, category, manufacturer)
Company(cname, stockPrice, country)
Find all products under $200 manufactured in Japan; return their names and prices.
  SELECT PName, Price
  FROM Product, Company
  WHERE Manufacturer=CName AND Country='Japan' AND Price <= 200
Manufacturer=CName is the join between Product and Company
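This join can be run as written against an in-memory SQLite database from Python. The sketch below assumes column types that the slides do not state, and it uses SQLite rather than Oracle or MySQL; the data values come from the lecture's Product and Company tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; the slides give only names and values.
conn.executescript("""
CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT);
CREATE TABLE Company (CName TEXT, StockPrice REAL, Country TEXT);
INSERT INTO Product VALUES
  ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
  ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon'),
  ('MultiTouch', 203.99, 'Household', 'Hitachi');
INSERT INTO Company VALUES
  ('GizmoWorks', 25, 'USA'),
  ('Canon', 65, 'Japan'),
  ('Hitachi', 15, 'Japan');
""")

# Manufacturer = CName is the join condition; the rest are selections.
rows = conn.execute("""
    SELECT PName, Price
    FROM Product, Company
    WHERE Manufacturer = CName AND Country = 'Japan' AND Price <= 200
""").fetchall()
```

MultiTouch is made in Japan but costs more than 200, so only SingleTouch survives both conditions.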
Joins in SQL

Product:
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company:
CName       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan

  SELECT PName, Price
  FROM Product, Company
  WHERE Manufacturer=CName AND Country='Japan' AND Price <= 200

Result:
PName        Price
SingleTouch  $149.99
Joins in SQL
Product(pname, price, category, manufacturer)
Company(cname, stockPrice, country)
Find all countries that manufacture some product in the ‘Gadgets’ category.
  SELECT Country
  FROM Product, Company
  WHERE Manufacturer=CName AND Category='Gadgets'
Joins in SQL

Product:
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company:
CName       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan

  SELECT Country
  FROM Product, Company
  WHERE Manufacturer=CName AND Category='Gadgets'

Result:
Country
??
??

What is the problem?
What’s the solution?
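Running the query makes the issue concrete. The following Python sketch uses `sqlite3` with assumed column types and loads only the rows relevant to the question. It runs the query as written and then a variant with `DISTINCT`, which is one standard way to get each country once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; only the rows needed for the query are loaded.
conn.executescript("""
CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT);
CREATE TABLE Company (CName TEXT, StockPrice REAL, Country TEXT);
INSERT INTO Product VALUES
  ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
  ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon');
INSERT INTO Company VALUES
  ('GizmoWorks', 25, 'USA'),
  ('Canon', 65, 'Japan');
""")

query = """
    SELECT {kw} Country
    FROM Product, Company
    WHERE Manufacturer = CName AND Category = 'Gadgets'
"""

# SQL uses bag semantics by default: one output row per matching product.
with_duplicates = conn.execute(query.format(kw="")).fetchall()

# DISTINCT restores set semantics, as in relational algebra.
without_duplicates = conn.execute(query.format(kw="DISTINCT")).fetchall()
```

Both Gadgets products come from GizmoWorks, so the plain query returns the same country twice.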
Joins
Product(pname, price, category, manufacturer)
Purchase(buyer, seller, store, product)
Person(name, phone, city)
Find names of Seattleites who bought Gadgets, and the names of the stores they bought such products from.
  SELECT DISTINCT name, store
  FROM Person, Purchase, Product
  WHERE name=buyer AND product=pname AND city='Seattle' AND category='Gadgets'
Review
Examples from sqlzoo.net
  SELECT L
  FROM R1, …, Rn
  WHERE C
is equivalent to
  π_L(σ_C(R1 × … × Rn))
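The correspondence between SELECT-FROM-WHERE and π_L(σ_C(R1 × … × Rn)) can be sketched as a small Python evaluator. The `select_from_where` helper is hypothetical and assumes attribute names are distinct across the input relations. Like SQL without DISTINCT, it keeps duplicate rows.

```python
from itertools import product

def select_from_where(attrs, relations, condition):
    """Evaluate pi_attrs(sigma_condition(R1 x ... x Rn)) over lists of dicts.

    Assumes attribute names are distinct across the relations.
    Duplicates are kept, mirroring SQL's default bag semantics.
    """
    rows = []
    for combo in product(*relations):        # R1 x ... x Rn
        merged = {}
        for t in combo:
            merged.update(t)
        if condition(merged):                # sigma_C
            rows.append(tuple(merged[a] for a in attrs))  # pi_L
    return rows

product_rel = [
    {"PName": "SingleTouch", "Price": 149.99, "Manufacturer": "Canon"},
    {"PName": "MultiTouch", "Price": 203.99, "Manufacturer": "Hitachi"},
    {"PName": "Gizmo", "Price": 19.99, "Manufacturer": "GizmoWorks"},
]
company_rel = [
    {"CName": "GizmoWorks", "Country": "USA"},
    {"CName": "Canon", "Country": "Japan"},
    {"CName": "Hitachi", "Country": "Japan"},
]

rows = select_from_where(
    ["PName", "Price"],
    [product_rel, company_rel],
    lambda t: t["Manufacturer"] == t["CName"]
    and t["Country"] == "Japan"
    and t["Price"] <= 200,
)
```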