M.P. Johnson, DBMS, Stern/NYU, Spring C : Database Management Systems Lecture #6 M.P. Johnson Stern School of Business, NYU Spring, 2008
M.P. Johnson, DBMS, Stern/NYU, Spring Agenda Receive proj1 Basic SQL RA…
M.P. Johnson, DBMS, Stern/NYU, Spring Recap: You are here First part of course is done: conceptual foundations You now know: E/R Model Relational Model Relational Algebra You now know how to: Capture part of world as an E/R model Convert E/R models to relational models Convert relational models to good (normal) forms Next: Create, update, query tables with R.A/SQL Write SQL/DB-connected applications
M.P. Johnson, DBMS, Stern/NYU, Spring minute Normalization Review 1. Q: What’s required for BCNF? 2. Q: How do we fix a non-BCNF relation? 3. Q: If As Bs violates BCNF, what do we do? 4. Q: Can BCNF decomposition ever be lossy? 5. Q: How do we combine two relations? 6. Q: Can BCNF decomp. lose FDs? 7. Q: Why would you ever use 3NF?
M.P. Johnson, DBMS, Stern/NYU, Spring Normalization example: bookstore
M.P. Johnson, DBMS, Stern/NYU, Spring Normalization example: bookstore Orders tbl: key = ordernum,isbn Orders tbl FDs:
M.P. Johnson, DBMS, Stern/NYU, Spring Next topic: SQL Standard language for querying and manipulating data Structured Query Language Many standards: ANSI SQL, SQL92/SQL2, SQL3/SQL99 Originally: Structured English Query Language (SEQUEL) Vendors support various subsets/extensions We’ll do Oracle/MySQL/generic “No one ever got fired for buying Oracle.” Basic form (many more bells and whistles in addition): SELECT attributes FROM relations (possibly multiple, joined) WHERE conditions (selections) SELECT attributes FROM relations (possibly multiple, joined) WHERE conditions (selections)
M.P. Johnson, DBMS, Stern/NYU, Spring Data Types in SQL Characters: CHAR(20)-- fixed length VARCHAR(40)-- variable length Numbers: BIGINT, INT, SMALLINT, TINYINT REAL, FLOAT -- differ in precision MONEY Times and dates: DATE DATETIME-- SQL Server
M.P. Johnson, DBMS, Stern/NYU, Spring “Tables” PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi Product Attribute names Table name Tuples or rows
M.P. Johnson, DBMS, Stern/NYU, Spring Simple SQL Query PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi SELECT * FROM Product WHERE category='Gadgets' Product PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks “selection”
M.P. Johnson, DBMS, Stern/NYU, Spring Simple SQL Query PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi SELECT PName, Price, Manufacturer FROM Product WHERE Price > 100 Product PNamePriceManufacturer SingleTouch$149.99Canon MultiTouch$203.99Hitachi “selection” and “projection”
M.P. Johnson, DBMS, Stern/NYU, Spring A Notation for SQL Queries SELECT Name, Price, Manufacturer FROM Product WHERE Price > 100 Product(PName, Price, Category, Manfacturer) (PName, Price, Manfacturer) Input Relation Output Relation
M.P. Johnson, DBMS, Stern/NYU, Spring The WHERE clause Contains a boolean expression Teach literal is a test: x = y, x < y, x <= y, etc. For numbers, they have the usual meanings For CHARs/VARCHARs: lexicographic ordering Expected conversion between CHAR and VARCHAR For dates and times, what you expect
M.P. Johnson, DBMS, Stern/NYU, Spring Complex RA Expressions Schema: Movies (Title, year, length, inColor, studioName, Prdcr#) Q: How long was Star Wars (1977)? Strategy: find the row with Star Wars; then project the length field TitleYearLengthinColorStudioPrdcr# Star Wars TrueFox12345 M.Ducks TrueDisney67890 W.World199295TrueParamount99999
M.P. Johnson, DBMS, Stern/NYU, Spring Combining operations Query: Which Fox moves were >= 100 minutes long? TitleYearLengthFilmtypeStudio Star wars ColorFox Mighty ducks ColorDisney Wayne’s world199285ColorParamount
M.P. Johnson, DBMS, Stern/NYU, Spring Operators Cross product again “Cartesian Product” Each tuple in R 1 combines w/each tuple in R 2 Algebraic notation: R 1 R 2 Not actual SQL! If R1, R2 fields overlap, include both and disambiguate: R1.A, R2.A Q: Where does the name come from? Q: If R1 has n1 rows and R2 has n2, how large is R1 x R2?
M.P. Johnson, DBMS, Stern/NYU, Spring Cartesian product example StreetCity 333 Some StreetChappaqua 444 Embassy RowWashington Hillary-addresses Job Senator First Lady Lawyer Hillary-jobs StreetCityJob 333 Some StreetChappaquaSenator 444 Embassy RowWashingtonSenator 333 Some StreetChappaquaFirst Lady 444 Embassy RowWashingtonFirst Lady 333 Some StreetChappaquaLawyer 444 Embassy RowWashingtonLawyer Hillary-addresses x Hillary-jobs
M.P. Johnson, DBMS, Stern/NYU, Spring Operators Natural join: our join up to now merging shared attributes Algebraic notation: R1 R2 SQL query: a = shared fields SELECT * FROM R1,R2 WHERE R1.a = R2.a
M.P. Johnson, DBMS, Stern/NYU, Spring Natural join example NameStreetCity Hilary333 Some StreetChappaqua Hilary444 Embassy RowWashington BillSomewhere elseddd Addresses NameJob HilarySenator HilaryFirst Lady HilaryLawyer Jobs Addresses Jobs NameStreetCityJob Hilary333 Some StreetChappaquaSenator Hilary444 Embassy RowWashingtonSenator Hilary333 Some StreetChappaquaFirst Lady Hilary444 Embassy RowWashingtonFirst Lady Hilary333 Some StreetChappaquaLawyer Hilary444 Embassy RowWashingtonLawyer
M.P. Johnson, DBMS, Stern/NYU, Spring Natural Join R S R S= ? Unpaired tuples called dangling AB XY XZ YZ ZV BC ZU VW ZV
M.P. Johnson, DBMS, Stern/NYU, Spring Natural Join Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R S ? Given R(A, B, C), S(D, E), what is R S? Given R(A, B), S(A, B), what is R S?
M.P. Johnson, DBMS, Stern/NYU, Spring Join on arbitrary test ABC BCD AU.BU.CV.BV.CD UV U V A<D “Theta-join”
M.P. Johnson, DBMS, Stern/NYU, Spring Next (parallel) topic: relational algebra Projection Selection Cartesian Product Joins: natural joins, theta joins Set operations: union, intersection, difference Combining operations to form queries Dependent and independent operations
M.P. Johnson, DBMS, Stern/NYU, Spring What is relational algebra? An algebra for relations “High-school” algebra: an algebra for numbers Algebra = formalism for constructing expressions Operations Operands: Variables, Constants, expressions Expressions: Vars & constants Operators applied to expressions They evaluate to values AlgebraVars/constsOperatorsEval to High-schoolNumbers+ * - / etc.Numbers RelationalRelations (=sets of tupes) union, intersection, join, etc. Relations
M.P. Johnson, DBMS, Stern/NYU, Spring Why do we care about relational algebra? 1. The exprs are the form that questions about the data take The relations these exprs cash out to are the answers to our questions 2. RA ~ more succinct rep. of many SQL queries 3. DBMS parse SQL into something like RA First proofs of concept for RDBMS/RA: System R at IBM Ingress at Berkeley “Modern” implementation of RA: SQL Both state of the art, mid-70s
M.P. Johnson, DBMS, Stern/NYU, Spring Relation operators Basic operators: Selection: Projection: Cartesian Product: Other set-theoretic ops: Union: Intersection: Difference: - Additional operators: Joins (natural, equijoin, theta join, semijoin) Renaming: Grouping…
M.P. Johnson, DBMS, Stern/NYU, Spring Selection op Selects all tuples satisfying a condition Notation: c (R) Examples salary > (Employee) name = “Smith” (Employee) The condition c can have comparison ops:=,, , <> boolean ops: and, or
M.P. Johnson, DBMS, Stern/NYU, Spring Selection example Select the movies at Angelica: Theater=“Sunshine” (Showings) Masc. Fem.VillageFilm Forum Village N’hood Bad Edu. Annie Hall Title Sunshine Theater Village N’hood Bad Edu. Annie Hall Title Sunshine Theater
M.P. Johnson, DBMS, Stern/NYU, Spring Projection op Keep only certain columns Projection: op we used for decomposition Eliminates other columns, then removes duplicates Notation: A1,…,An (R)
M.P. Johnson, DBMS, Stern/NYU, Spring Join op Corresponds to SQL query doing cross & equality test Specifically: R 1 R 2 = every att once ( shared atts = (R 1 R 2 )) I.e., first compute the cross product R 1 x R 2 Next, select the rows in which shared fields agree Finally, project onto the union of R 1 and R 2 ’s fields (remove duplicates)
M.P. Johnson, DBMS, Stern/NYU, Spring Rename op Changes the schema, not the instance Notation: B1,…,Bn (R) is spelled “rho”, pronounced “row” Example: Employee(ssn,name) social, name) (Employee) Or just: (Employee)
M.P. Johnson, DBMS, Stern/NYU, Spring RA SQL SQL SELECT RA Projection SQL WHERE RA Selection SQL FROM RA Join/cross Comma-separated list… SQL renaming RA rho More ops later Keep RA in the back of your mind…
M.P. Johnson, DBMS, Stern/NYU, Spring Review Examples from sqlzoo.netsqlzoo.net SELECT L FROM R 1, …, R n WHERE C SELECT L FROM R 1, …, R n WHERE C L ( C (R 1 x … R n )