Matthew P. Johnson, OCL4, CISDD CUNY, Sept OCL4 Oracle 10g: SQL & PL/SQL Session #3 Matthew P. Johnson CISDD, CUNY June, 2005
Agenda
  Review
  Lab 2
  SQL
  Lab 3
  SQL
  Lab 4

High-level design strategy
  Conceptual model: E/R diagram in which Person (name, ssn) buys Product (name, price)
  Relational model: plus FDs
  Normalization: eliminates anomalies

Functional dependencies
  Definition: if two tuples agree on the attributes A1, A2, …, An, then they must also agree on the attributes B1, B2, …, Bm
  Notation: A1, A2, …, An → B1, B2, …, Bm
  Read: the Ai functionally determine the Bj

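The definition above can be sketched as a check over one table instance. This is only an illustration: the helper name fd_holds and the sample rows (including the phone numbers) are made up here, and passing the check on one instance shows only that the data does not violate the FD, not that the FD holds in general.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    for r1, r2 in combinations(rows, 2):
        agree_lhs = all(r1[a] == r2[a] for a in lhs)
        agree_rhs = all(r1[b] == r2[b] for b in rhs)
        if agree_lhs and not agree_rhs:
            return False
    return True


# Hypothetical sample rows, invented for this sketch.
people = [
    {"ssn": 123, "name": "Michael", "phone": "555-0100"},
    {"ssn": 123, "name": "Michael", "phone": "555-0101"},
    {"ssn": 456, "name": "Hilary", "phone": "555-0102"},
]

ssn_determines_name = fd_holds(people, ["ssn"], ["name"])
ssn_determines_phone = fd_holds(people, ["ssn"], ["phone"])
print(ssn_determines_name, ssn_determines_phone)
```
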
Typical Examples of FDs
  Product:
    name → price, manufacturer
  Person:
    ssn → name, age
    father’s/husband’s-name → last-name
    zipcode → state
    phone → state (notwithstanding inter-state area codes)
  Company:
    name → stockprice, president
    symbol → name
    name → symbol

Example of anomalies

  Name     SSN  Mailing-address  Phone
  Michael  123  NY               …
  Michael  123  NY               …
  Hilary   456  DC               …
  Hilary   456  DC               …
  Bill     789  Chappaqua        …
  Bill     789  Chappaqua        …

  SSN → Name, Mailing-address
  SSN ↛ Phone

  Redundancy: name, maddress
  Update anomaly: Bill moves
  Delete anomaly: Bill doesn’t pay bills, lose phones → lose Bill!
  Insert anomaly: can’t insert someone without a (non-null) phone
  Underlying cause: SSN-phone is many-many
  Effect: partial dependency ssn → name, maddress, whereas key = {ssn, phone}

Most important: BCNF
  A simple condition for removing anomalies from relations:
    A relation R is in BCNF if: whenever As → Bs is a non-trivial dependency in R, As is a superkey for R
  I.e.: the left side must always contain a key
  I.e.: if a set of attributes determines other attributes, it must determine all the attributes
  Codd: Ted Codd, IBM researcher, inventor of the relational model, 1970
  Boyce: Ray Boyce, IBM researcher, helped develop SQL in the 1970s

Boyce-Codd Normal Form
  Name/phone example is not in BCNF:
    {ssn, phone} is the key
    FD: ssn → name, mailing-address holds
    Violates BCNF: ssn is not a superkey
  Its decomposition is in BCNF:
    Only superkeys → anything else

  Original relation:
  Name     SSN  Mailing-address  Phone
  Michael  123  NY               …
  Michael  123  NY               …

  Decomposition:
  Name     SSN  Mailing-address
  Michael  123  NY

  SSN  PhoneNumber
  …    …

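The BCNF condition can be sketched with an attribute-closure computation. The helper names closure and bcnf_violations are made up for this illustration, and the check is simplified: it looks only at the FDs listed explicitly, not at every FD they imply, which is enough for the name/phone example but is not a complete BCNF test.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Listed non-trivial FDs on relation whose left side is not a superkey."""
    relation = set(relation)
    violations = []
    for lhs, rhs in fds:
        if not set(lhs) <= relation:
            continue  # this FD does not apply to the relation
        determined = (set(rhs) & relation) - set(lhs)
        is_superkey = relation <= closure(lhs, fds)
        if determined and not is_superkey:
            violations.append((lhs, rhs))
    return violations


# The single FD from the slide: ssn -> name, mailing-address.
fds = [(("ssn",), ("name", "address"))]

original = bcnf_violations(("name", "ssn", "address", "phone"), fds)
person_part = bcnf_violations(("name", "ssn", "address"), fds)
phone_part = bcnf_violations(("ssn", "phone"), fds)
print(original, person_part, phone_part)
```

The original relation reports the ssn dependency as a violation, while neither piece of the decomposition does.
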
Lab 2

Spooling
  Review lab 1
  SQL

Joins in SQL
  Connect two or more tables:

  Product
  PName        Price    Category     Manufacturer
  Gizmo        $19.99   Gadgets      GizmoWorks
  Powergizmo   $29.99   Gadgets      GizmoWorks
  SingleTouch  $149.99  Photography  Canon
  MultiTouch   $203.99  Household    Hitachi

  Company
  CName       StockPrice  Country
  GizmoWorks  25          USA
  Canon       65          Japan
  Hitachi     15          Japan

  What is the connection between them?

Joins in SQL
  Product (pname, price, category, manufacturer)
  Company (cname, stockPrice, country)
  Find all products under $200 manufactured in Japan; return their names and prices.

    SELECT PName, Price
    FROM Product, Company
    WHERE Manufacturer = CName
      AND Country = 'Japan'
      AND Price <= 200

  The condition Manufacturer = CName is the join between Product and Company.

Joins in SQL

  Product
  PName        Price    Category     Manufacturer
  Gizmo        $19.99   Gadgets      GizmoWorks
  Powergizmo   $29.99   Gadgets      GizmoWorks
  SingleTouch  $149.99  Photography  Canon
  MultiTouch   $203.99  Household    Hitachi

  Company
  CName       StockPrice  Country
  GizmoWorks  25          USA
  Canon       65          Japan
  Hitachi     15          Japan

    SELECT PName, Price
    FROM Product, Company
    WHERE Manufacturer = CName
      AND Country = 'Japan'
      AND Price <= 200

  Result
  PName        Price
  SingleTouch  $149.99

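The join above can be tried outside Oracle. The sketch below is an assumption-laden stand-in: it loads the two slide tables into an in-memory SQLite database through Python's sqlite3 module, with column types chosen here for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle schema on the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT);
CREATE TABLE Company (CName TEXT, StockPrice INTEGER, Country TEXT);
INSERT INTO Product VALUES
  ('Gizmo',       19.99,  'Gadgets',     'GizmoWorks'),
  ('Powergizmo',  29.99,  'Gadgets',     'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon'),
  ('MultiTouch',  203.99, 'Household',   'Hitachi');
INSERT INTO Company VALUES
  ('GizmoWorks', 25, 'USA'),
  ('Canon',      65, 'Japan'),
  ('Hitachi',    15, 'Japan');
""")

# Manufacturer = CName pairs each product with its maker;
# the remaining conditions filter the paired rows.
cheap_japanese = conn.execute("""
    SELECT PName, Price
    FROM Product, Company
    WHERE Manufacturer = CName
      AND Country = 'Japan'
      AND Price <= 200
""").fetchall()
print(cheap_japanese)
```

MultiTouch is also made in Japan but is filtered out by the price condition.
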
Joins in SQL
  Product (pname, price, category, manufacturer)
  Company (cname, stockPrice, country)
  Find all countries that manufacture some product in the 'Gadgets' category.

    SELECT Country
    FROM Product, Company
    WHERE Manufacturer = CName
      AND Category = 'Gadgets'

Joins in SQL

  Product
  PName        Price    Category     Manufacturer
  Gizmo        $19.99   Gadgets      GizmoWorks
  Powergizmo   $29.99   Gadgets      GizmoWorks
  SingleTouch  $149.99  Photography  Canon
  MultiTouch   $203.99  Household    Hitachi

  Company
  CName       StockPrice  Country
  GizmoWorks  25          USA
  Canon       65          Japan
  Hitachi     15          Japan

    SELECT Country
    FROM Product, Company
    WHERE Manufacturer = CName
      AND Category = 'Gadgets'

  Result
  Country
  ??

  What is the problem? What’s the solution?

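One way to explore the question above is to run the query and look at the rows. This is a sketch on an in-memory SQLite database (an assumption, not the lab's Oracle setup) that compares the query as written with a DISTINCT variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT);
CREATE TABLE Company (CName TEXT, StockPrice INTEGER, Country TEXT);
INSERT INTO Product VALUES
  ('Gizmo',       19.99,  'Gadgets',     'GizmoWorks'),
  ('Powergizmo',  29.99,  'Gadgets',     'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon'),
  ('MultiTouch',  203.99, 'Household',   'Hitachi');
INSERT INTO Company VALUES
  ('GizmoWorks', 25, 'USA'),
  ('Canon',      65, 'Japan'),
  ('Hitachi',    15, 'Japan');
""")

# As written on the slide: one output row per matching (product, company) pair.
with_duplicates = conn.execute("""
    SELECT Country
    FROM Product, Company
    WHERE Manufacturer = CName AND Category = 'Gadgets'
""").fetchall()

# DISTINCT collapses the repeated country into a single row.
without_duplicates = conn.execute("""
    SELECT DISTINCT Country
    FROM Product, Company
    WHERE Manufacturer = CName AND Category = 'Gadgets'
""").fetchall()
print(with_duplicates, without_duplicates)
```

Both gadgets come from GizmoWorks, so the plain query lists USA twice.
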
Joins
  Product (pname, price, category, manufacturer)
  Purchase (buyer, seller, store, product)
  Person (persname, phoneNumber, city)
  Find names of Seattleites who bought Gadgets, and the names of the stores they bought such products from.

    SELECT DISTINCT persname, store
    FROM Person, Purchase, Product
    WHERE persname = buyer
      AND product = pname
      AND city = 'Seattle'
      AND category = 'Gadgets'

Disambiguating Attributes
  Sometimes two relations have the same attribute:
    Person(pname, address, worksfor)
    Company(cname, address)

    SELECT DISTINCT pname, address
    FROM Person, Company
    WHERE worksfor = cname

  Which address?

    SELECT DISTINCT Person.pname, Company.address
    FROM Person, Company
    WHERE Person.worksfor = Company.cname

Tuple Variables
  Product (pname, price, category, manufacturer)
  Purchase (buyer, seller, store, product)
  Person (persname, phoneNumber, city)
  Find all stores that sold at least one product that the store 'BestBuy' also sold:

    SELECT DISTINCT x.store
    FROM Purchase AS x, Purchase AS y
    WHERE x.product = y.product
      AND y.store = 'BestBuy'

  Answer: (store)

Tuple Variables
  Tuple variables are introduced automatically:
    Product (name, price, category, manufacturer)

    SELECT name
    FROM Product
    WHERE price > 100

  Becomes:

    SELECT Product.name
    FROM Product AS Product
    WHERE Product.price > 100

  Doesn’t work when Product occurs more than once
  In that case the user needs to define variables explicitly

Details: Disambiguation in SQL
  Every selected field must be unambiguous
  For R(A,B):
    Select A from R, R
    → Select R1.A from R R1, R R2
  Consider:

    SQL> Select * from R, R;
    Select * from R, R
    *
    ERROR at line 1:
    ORA-00918: column ambiguously defined

  Why? * is shorthand for all fields, and each must be unambiguous
    → Select * from R R1, R R2

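The named-column case can be reproduced in SQLite, used here as an assumed stand-in for Oracle. The error text differs from ORA-00918, and SQLite's treatment of SELECT * over the same table twice is not the same as the Oracle session above, so only the Select A case is exercised in this sketch. The one sample row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER)")
conn.execute("INSERT INTO R VALUES (1, 2)")  # hypothetical sample row

# Unqualified A could come from either copy of R, so the query is rejected.
try:
    conn.execute("SELECT A FROM R R1, R R2").fetchall()
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# Qualifying the column with a tuple variable removes the ambiguity.
qualified = conn.execute("SELECT R1.A FROM R R1, R R2").fetchall()
print(ambiguous_error, qualified)
```
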
Details: Disambiguation in Oracle SQL
  Can rename fields by:
    Select name as n …
    Select name n …
  But not by:
    Select name=n …
  Can rename relations only by:
    … from tab t1, tab t2
  Lesson: if you get errors, remove all =s and ASs

SQL Query Semantics

    SELECT a1, a2, …, ak
    FROM R1 AS x1, R2 AS x2, …, Rn AS xn
    WHERE Conditions

  1. Nested loops:

    Answer = {}
    for x1 in R1 do
      for x2 in R2 do
        …
          for xn in Rn do
            if Conditions
              then Answer = Answer ∪ {(a1,…,ak)}
    return Answer

SQL Query Semantics

    SELECT a1, a2, …, ak
    FROM R1 AS x1, R2 AS x2, …, Rn AS xn
    WHERE Conditions

  2. Parallel assignment:

    Answer = {}
    for all assignments x1 in R1, …, xn in Rn do
      if Conditions
        then Answer = Answer ∪ {(a1,…,ak)}
    return Answer

  Doesn’t impose any order!

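The parallel-assignment reading can be sketched directly in Python. The function name evaluate and its parameters are made up for this illustration. It follows the pseudocode's set-valued Answer, so duplicates collapse, and it describes the meaning of a query rather than how a database actually executes it.

```python
from itertools import product


def evaluate(relations, condition, select):
    """Conceptual SELECT-FROM-WHERE: try every assignment of tuple variables."""
    answer = set()
    for assignment in product(*relations):  # one tuple drawn from each relation
        if condition(*assignment):
            answer.add(select(*assignment))
    return answer


# Rows from the Product and Company slides, as plain tuples:
# (pname, price, category, manufacturer) and (cname, stockPrice, country).
products = [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
]
companies = [
    ("GizmoWorks", 25, "USA"),
    ("Canon", 65, "Japan"),
    ("Hitachi", 15, "Japan"),
]

cheap_japanese = evaluate(
    [products, companies],
    condition=lambda p, c: p[3] == c[0] and c[2] == "Japan" and p[1] <= 200,
    select=lambda p, c: (p[0], p[1]),
)
print(cheap_japanese)
```
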
SQL e.g.
  Acc(name,ssn,balance)
  Q: Who has the largest balance?
  Conceptually:
    $\pi_{name}(Acc) - \pi_{a2.name}(\sigma_{a2.bal < Acc.bal}(Acc \times \rho_{a2}(Acc)))$
  In SQL?

New topic: Subqueries
  Powerful feature of SQL: one clause can contain other SQL queries
    Anywhere a value or relation is allowed
  Several ways:
    Selection → single constant (scalar) in SELECT
    Selection → single constant (scalar) in WHERE
    Selection → relation in WHERE
    Selection → relation in FROM

Subquery motivation
  Consider the standard multi-table example:
    Purchase(prodname, buyerssn, etc.)
    Person(name, ssn, etc.)
  What did Christo buy?
  As usual, need to AND on equality identifying ssn’s row and buyerssn’s row

    SELECT Purchase.prodname
    FROM Purchase, Person
    WHERE buyerssn = ssn AND name = 'Christo'

Subquery motivation
  Purchase(prodname, buyerssn, etc.)
  Person(name, ssn, etc.)
  What did Christo buy?
  Natural intuition:
    Go find Christo’s ssn

    SELECT ssn
    FROM Person
    WHERE name = 'Christo'

    Then find purchases

    SELECT Purchase.prodname
    FROM Purchase
    WHERE buyerssn = Christo’s-ssn

Subqueries
  Subquery: copy in Christo’s selection for his ssn:

    SELECT Purchase.prodname
    FROM Purchase
    WHERE buyerssn = (SELECT ssn
                      FROM Person
                      WHERE name = 'Christo')

  The subquery returns one value, so the = is valid
  If it returns more (or fewer), we get a run-time error

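A runnable sketch of the scalar subquery, on an in-memory SQLite database with made-up rows. One caveat about this stand-in: SQLite is more lenient than the behavior described above and does not raise an error when a scalar subquery yields several rows, so the run-time error is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (name TEXT, ssn INTEGER);
CREATE TABLE Purchase (prodname TEXT, buyerssn INTEGER);
-- Hypothetical rows, invented for this sketch.
INSERT INTO Person VALUES ('Christo', 111), ('Martha', 222);
INSERT INTO Purchase VALUES ('Umbrella', 111), ('Fabric', 111), ('Oven', 222);
""")

# The inner query yields exactly one ssn, which the outer = compares against.
christo_bought = sorted(conn.execute("""
    SELECT Purchase.prodname
    FROM Purchase
    WHERE buyerssn = (SELECT ssn FROM Person WHERE name = 'Christo')
""").fetchall())
print(christo_bought)
```
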
Operators on subqueries
  Several new operators applied to (unary) selections:
    1. IN R
    2. EXISTS R
    3. UNIQUE R
    4. s > ALL R
    5. s > ANY R
    6. x IN R
  > is just an example op
  Each expression can be negated with NOT

Subqueries with IN
  Product(name,maker), Person(name,ssn), Purchase(buyerssn,product)
  Q: Find companies Martha bought products from
  Strategy:
    1. Find Martha’s ssn
    2. Find products listed with that ssn as buyer
    3. Find company names of those products

    SELECT DISTINCT Product.maker
    FROM Product
    WHERE Product.name IN
      (SELECT Purchase.product
       FROM Purchase
       WHERE Purchase.buyerssn =
         (SELECT ssn
          FROM Person
          WHERE name = 'Martha'))

Subqueries returning relations
  Equivalent to:

    SELECT DISTINCT Product.maker
    FROM Product, Purchase, Person
    WHERE Product.name = Purchase.product
      AND Purchase.buyerssn = Person.ssn
      AND Person.name = 'Martha'

  (Both Product and Person have a name attribute, so name must be qualified here.)

FROM subqueries
  Motivation for another way: suppose we’re given Martha’s purchases
    Then could just cross with Products to get product makers
  Substitute (named) subquery for Martha’s purchases

    SELECT Product.maker
    FROM Product,
         (SELECT Purchase.product
          FROM Purchase
          WHERE Purchase.buyerssn =
            (SELECT ssn
             FROM Person
             WHERE name = 'Martha')) Marthas
    WHERE Product.name = Marthas.product

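The three formulations of the Martha query (IN subquery, flat join, FROM subquery) can be compared side by side. This sketch uses an in-memory SQLite database with made-up rows. The join form qualifies name as Person.name, and the FROM form adds an explicit AS product alias as a precaution; both are choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (name TEXT, maker TEXT);
CREATE TABLE Person (name TEXT, ssn INTEGER);
CREATE TABLE Purchase (buyerssn INTEGER, product TEXT);
-- Hypothetical rows, invented for this sketch.
INSERT INTO Product VALUES
  ('Gizmo', 'GizmoWorks'), ('Camera', 'Canon'), ('Toaster', 'Hitachi');
INSERT INTO Person VALUES ('Martha', 1), ('Bill', 2);
INSERT INTO Purchase VALUES (1, 'Gizmo'), (1, 'Camera'), (2, 'Toaster');
""")

in_form = """
    SELECT DISTINCT Product.maker
    FROM Product
    WHERE Product.name IN
      (SELECT Purchase.product
       FROM Purchase
       WHERE Purchase.buyerssn =
         (SELECT ssn FROM Person WHERE name = 'Martha'))
"""

join_form = """
    SELECT DISTINCT Product.maker
    FROM Product, Purchase, Person
    WHERE Product.name = Purchase.product
      AND Purchase.buyerssn = Person.ssn
      AND Person.name = 'Martha'
"""

from_form = """
    SELECT Product.maker
    FROM Product,
         (SELECT Purchase.product AS product
          FROM Purchase
          WHERE Purchase.buyerssn =
            (SELECT ssn FROM Person WHERE name = 'Martha')) Marthas
    WHERE Product.name = Marthas.product
"""

# Compare as sets, since the FROM form has no DISTINCT.
makers_in = set(conn.execute(in_form).fetchall())
makers_join = set(conn.execute(join_form).fetchall())
makers_from = set(conn.execute(from_form).fetchall())
print(makers_in, makers_join, makers_from)
```
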
ALL op
  Employees(name, job, divid, salary)
  Find which employees are paid more than all the programmers

    SELECT name
    FROM Employees
    WHERE salary > ALL (SELECT salary
                        FROM Employees
                        WHERE job = 'programmer')

ANY/SOME op
  Employees(name, job, divid, salary)
  Find which employees are paid more than at least one vice president

    SELECT name
    FROM Employees
    WHERE salary > ANY (SELECT salary
                        FROM Employees
                        WHERE job = 'VP')

ANY/SOME op
  Employees(name, job, divid, salary)
  Find which employees are paid more than at least one vice president

    SELECT name
    FROM Employees
    WHERE salary > SOME (SELECT salary
                         FROM Employees
                         WHERE job = 'VP')

Existential/Universal Conditions
  Employees(name, job, divid, salary)
  Division(name, id, head)
  Find all divisions with an employee whose salary is > …
  Existential: easy!

    SELECT DISTINCT Division.name
    FROM Employees, Division
    WHERE salary > … AND divid = id

Existential/Universal Conditions
  Employees(name, job, divid, salary)
  Division(name, id, head)
  Find all divisions in which everyone makes > …
  Universal: hard!

Existential/universal with IN
  1. Find the other divisions: those in which someone makes <= …

    SELECT name
    FROM Division
    WHERE id IN (SELECT divid
                 FROM Employees
                 WHERE salary <= …)

  2. Select the divisions we didn’t find

    SELECT name
    FROM Division
    WHERE id NOT IN (SELECT divid
                     FROM Employees
                     WHERE salary <= …)

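The two steps can be sketched on an in-memory SQLite database. The salary threshold is not given in the source, so the value 100 below is a hypothetical placeholder passed as a bind parameter, and all rows are made up. The data includes a division with no employees to show that it passes the universal test vacuously.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Division (name TEXT, id INTEGER, head TEXT);
CREATE TABLE Employees (name TEXT, job TEXT, divid INTEGER, salary INTEGER);
-- Hypothetical rows, invented for this sketch. Legal has no employees.
INSERT INTO Division VALUES
  ('Research', 1, 'Ann'), ('Sales', 2, 'Bob'), ('Legal', 3, 'Cy');
INSERT INTO Employees VALUES
  ('Ann', 'manager', 1, 150), ('Dee', 'programmer', 1, 120),
  ('Bob', 'manager', 2, 200), ('Eli', 'clerk', 2, 90);
""")

threshold = 100  # hypothetical value; the source does not state the threshold

# Step 1: divisions in which someone makes <= threshold.
someone_at_or_below = conn.execute("""
    SELECT name FROM Division
    WHERE id IN (SELECT divid FROM Employees WHERE salary <= ?)
""", (threshold,)).fetchall()

# Step 2: the divisions step 1 did not find, i.e. everyone makes > threshold.
everyone_above = sorted(conn.execute("""
    SELECT name FROM Division
    WHERE id NOT IN (SELECT divid FROM Employees WHERE salary <= ?)
""", (threshold,)).fetchall())
print(someone_at_or_below, everyone_above)
```

If divid could be NULL, NOT IN would need extra care, since a NULL in the subquery result makes the NOT IN test fail for every row.
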
Acc(name,bal,type,…)
  Q: Who has the largest balance?
  Can we do this with subqueries?

Correlated Queries
  Last time: Acc(name,bal,type,…)
  Q: Find holder of largest account

    SELECT name
    FROM Acc
    WHERE bal >= ALL (SELECT bal
                      FROM Acc)

Correlated Queries
  So far, subquery executed once; result used for higher query
  More complicated: correlated queries
  “[T]he subquery… [is] evaluated many times, once for each assignment of a value to some term in the subquery that comes from a tuple variable outside the subquery” (Ullman, p286).
  Q: What does this mean?
  A: That subqueries refer to vars from outer queries

Correlated Queries
  Last time: Acc(name,bal,type,…)
  Q2: Find holder of largest account of each type

    SELECT name, type
    FROM Acc
    WHERE bal >= ALL (SELECT bal
                      FROM Acc
                      WHERE type=type)

  Correlation: the type=type condition in the subquery

Correlated Queries
  Last time: Acc(name,bal,type,…)
  Q2: Find holder of largest account of each type

    SELECT name, type
    FROM Acc a1
    WHERE bal >= ALL (SELECT bal
                      FROM Acc
                      WHERE type=a1.type)

  Correlation: type=a1.type refers to the outer tuple variable a1
  Note:
    1. scope of variables
    2. this can still be expressed as single SFW

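A runnable sketch of a correlated subquery, with one substitution stated plainly: SQLite, the stand-in used here, does not support the ALL operator, so the query is rewritten with NOT EXISTS ("no account of the same type has a larger balance"). For data without NULLs this asks the same question as the >= ALL version. The account rows and the alias a2 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Acc (name TEXT, bal INTEGER, type TEXT);
-- Hypothetical rows, invented for this sketch.
INSERT INTO Acc VALUES
  ('Ann', 500, 'checking'), ('Bob', 900, 'checking'),
  ('Cy',  300, 'savings'),  ('Di',  100, 'savings');
""")

# The subquery mentions a1, a tuple variable from the outer query, so it is
# conceptually re-evaluated once per outer row: that is the correlation.
largest_per_type = sorted(conn.execute("""
    SELECT name, type
    FROM Acc a1
    WHERE NOT EXISTS (SELECT *
                      FROM Acc a2
                      WHERE a2.type = a1.type
                        AND a2.bal > a1.bal)
""").fetchall())
print(largest_per_type)
```
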
EXCEPT and INTERSECT

    (SELECT R.A, R.B FROM R)
    INTERSECT
    (SELECT S.A, S.B FROM S)

  can be written as:

    SELECT R.A, R.B
    FROM R
    WHERE EXISTS (SELECT *
                  FROM S
                  WHERE R.A = S.A AND R.B = S.B)

    (SELECT R.A, R.B FROM R)
    EXCEPT
    (SELECT S.A, S.B FROM S)

  can be written as:

    SELECT R.A, R.B
    FROM R
    WHERE NOT EXISTS (SELECT *
                      FROM S
                      WHERE R.A = S.A AND R.B = S.B)

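The two rewrites can be checked on small made-up tables in an in-memory SQLite database. Two assumptions of this sketch: SQLite does not accept parentheses around the operands of a set operator, so they are dropped, and the results coincide only because R has no duplicate rows and no NULLs (set operators eliminate duplicates, the EXISTS forms do not). Oracle 10g spells EXCEPT as MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R (A INTEGER, B INTEGER);
CREATE TABLE S (A INTEGER, B INTEGER);
-- Hypothetical rows, invented for this sketch.
INSERT INTO R VALUES (1, 2), (3, 4), (5, 6);
INSERT INTO S VALUES (3, 4), (7, 8);
""")


def run(sql):
    """Execute a query and return its rows in sorted order."""
    return sorted(conn.execute(sql).fetchall())


intersect_rows = run("SELECT R.A, R.B FROM R INTERSECT SELECT S.A, S.B FROM S")
exists_rows = run("""
    SELECT R.A, R.B FROM R
    WHERE EXISTS (SELECT * FROM S WHERE R.A = S.A AND R.B = S.B)
""")

except_rows = run("SELECT R.A, R.B FROM R EXCEPT SELECT S.A, S.B FROM S")
not_exists_rows = run("""
    SELECT R.A, R.B FROM R
    WHERE NOT EXISTS (SELECT * FROM S WHERE R.A = S.A AND R.B = S.B)
""")
print(intersect_rows, except_rows)
```
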
More on Set-Comparison Operators
  We’ve already seen IN R, NOT IN R
  Can also use EXISTS R, NOT EXISTS R
  Also available: op ANY R, op ALL R
  Find sailors whose rating is greater than that of some sailor called Alberto:

    SELECT R.SID
    FROM Reserves R
    WHERE R.rating > ANY (SELECT R2.rating
                          FROM Reserves R2
                          WHERE R2.sname = 'Alberto')

Extended e.g.
  Scenario:
    1. Purchase(pid, seller-ssn, buyer-ssn, etc.)
    2. Person(ssn, name, etc.)
    3. Product(pid, name, etc.)
  Q: Who (give names) bought gizmos from Dick?
  Where to start?
    Purchase uses pid, ssn, so must get them…

Last time: Complex RA Expressions
  Scenario:
    1. Purchase(pid, seller-ssn, buyer-ssn, etc.)
    2. Person(ssn, name, etc.)
    3. Product(pid, name, etc.)
  Q: Who (give names) bought gizmos from Dick?
  Where to start?
    Purchase uses pid, ssn, so must get them…

Complex RA Expressions
  [Expression tree: select name='Dick' from Person and project ssn; select name='Gizmo' from Product and project pid; join both with Purchase on seller-ssn=ssn and pid=pid; join the result with Person on buyer-ssn=Person.ssn; project name]

Translation to SQL
  We’re converting the tree on the last slide into SQL
  The result of the query should be the names indicated above (the names of the people who bought gadgets from Dick)
  One step at a time, we’ll make the query more complete, until we’ve translated the English-language description to an actual SQL query
  We’ll also simplify the query when possible

Translation to SQL
  Blue type = actual SQL
  Black italics = description of subquery

    SELECT DISTINCT name buyer
    FROM (the info, along with buyer names, for purchases of gadgets sold by Dick)

  Note: the subquery above consists of purchase records, except with the info describing the buyers attached
  In the results, the column header for name will be 'buyer'

Translation to SQL

    SELECT DISTINCT name buyer
    FROM (SELECT *
          FROM Person, (the purchases of gadgets from Dick) P2
          WHERE Person.ssn = P2.buyer-ssn)

  Note: the subquery in this version is being given the name P2
  We’re pairing our rows from Person with rows from P2

Translation to SQL

    SELECT DISTINCT name buyer
    FROM Person, (the purchases of gadgets from Dick) P2
    WHERE Person.ssn = P2.buyer-ssn

  We simplified by combining the two SELECTs

Translation to SQL

    SELECT DISTINCT name buyer
    FROM Person,
         (SELECT *
          FROM Purchase
          WHERE seller-ssn = (Dick’s ssn)
            AND pid = (the id of gadget)) P2
    WHERE Person.ssn = P2.buyer-ssn

  P2 is still the name of the subquery
  It’s just been filled in with a query that contains two subqueries
  Outer parentheses are bolded for clarity

Translation to SQL

    SELECT DISTINCT name buyer
    FROM Person,
         (SELECT *
          FROM Purchase
          WHERE seller-ssn = (SELECT ssn
                              FROM Person
                              WHERE name = 'Dick')
            AND pid = (the id of gadget)) P2
    WHERE Person.ssn = P2.buyer-ssn

  Now the subquery to find Dick’s ssn is filled in

Translation to SQL

    SELECT DISTINCT name buyer
    FROM Person,
         (SELECT *
          FROM Purchase
          WHERE seller-ssn = (SELECT ssn
                              FROM Person
                              WHERE name = 'Dick')
            AND pid = (SELECT pid
                       FROM Product
                       WHERE name = 'Gadget')) P2
    WHERE Person.ssn = P2.buyer-ssn

  And now the subquery to find Gadget’s product id is filled in, too
  Note: the SQL was simplified by using subqueries
    Not used in relational algebra

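The finished query can be run end to end on made-up data. This is a sketch on an in-memory SQLite database, with two assumptions: the hyphenated attribute names are written with underscores (seller_ssn, buyer_ssn), because a hyphen is not valid in an unquoted SQL identifier, and all rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (ssn INTEGER, name TEXT);
CREATE TABLE Product (pid INTEGER, name TEXT);
CREATE TABLE Purchase (pid INTEGER, seller_ssn INTEGER, buyer_ssn INTEGER);
-- Hypothetical rows, invented for this sketch.
INSERT INTO Person VALUES (1, 'Dick'), (2, 'Jane'), (3, 'Tom'), (4, 'Sue');
INSERT INTO Product VALUES (10, 'Gadget'), (20, 'Widget');
INSERT INTO Purchase VALUES
  (10, 1, 2),  -- Jane bought a Gadget from Dick
  (20, 1, 3),  -- Tom bought a Widget from Dick
  (10, 4, 3);  -- Tom bought a Gadget from Sue
""")

# P2 holds the purchases of gadgets sold by Dick; the outer query attaches
# the buyer's name and labels the output column 'buyer'.
cursor = conn.execute("""
    SELECT DISTINCT name buyer
    FROM Person,
         (SELECT *
          FROM Purchase
          WHERE seller_ssn = (SELECT ssn FROM Person WHERE name = 'Dick')
            AND pid = (SELECT pid FROM Product WHERE name = 'Gadget')) P2
    WHERE Person.ssn = P2.buyer_ssn
""")
buyers = cursor.fetchall()
print(buyers)
```

Only Jane satisfies both conditions: Tom bought from Dick, and bought a Gadget, but never both in the same purchase.
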
Review
  Examples from sqlzoo.net

    SELECT L
    FROM R1, …, Rn
    WHERE C

  corresponds to:

    $\pi_L(\sigma_C(R_1 \times \dots \times R_n))$