Presentation is loading. Please wait.

Presentation is loading. Please wait.

Principles of Database Systems CSE 544p Lecture #1 September 28, 2011 1Dan Suciu -- p544 Fall 2011.

Similar presentations


Presentation on theme: "Principles of Database Systems CSE 544p Lecture #1 September 28, 2011 1Dan Suciu -- p544 Fall 2011."— Presentation transcript:

1 Principles of Database Systems CSE 544p Lecture #1 September 28, 2011 1Dan Suciu -- p544 Fall 2011

2 Staff Instructor: Dan Suciu – CSE 662, suciu@cs.washington.edusuciu@cs.washington.edu – Office hours: Wednesdays, 5:30-6:20 TAs: – Sandra Fan, sbfan@cs.washington.edusbfan@cs.washington.edu Dan Suciu -- p544 Fall 20112

3 Communications Web page: http://www.cs.washington.edu/p544 http://www.cs.washington.edu/p544 – Lectures will be available here – Homework will be posted here – Announcements may be posted here Mailing list: – Announcements, group discussions – If you registered, you are automatically subscribed 3Dan Suciu -- p544 Fall 2011

4 Textbook(s) Main textbook: Database Management Systems, Ramakrishnan and Gehrke Second textbook: Database Systems: The Complete Book, Garcia-Molina, Ullman, Widom 4Dan Suciu -- p544 Fall 2011

5 Course Format Lectures Wednesdays, 6:30-9:20 7 Homework Assignments Take-home Final 5Dan Suciu -- p544 Fall 2011

6 Grading Homework:70 % Take-home Final:30% 6Dan Suciu -- p544 Fall 2011

7 Homework Assignments 1.SQL 2.Conceptual design 3.JAVA/SQL 4.Transactions 5.Database tuning 6.XML/XPath/XQuery 7.Pig Latin, on AWS 7Dan Suciu -- p544 Fall 2011 Due: Mondays’, by 11:59pm. Three late days per person

8 Take-home Final Posted on December 8, at 11:59pm Due on December 10, by 10:00pm No late days/hours/minutes/seconds Dan Suciu -- p544 Fall 20118

9 Software Tools Postgres: – Preferred usage: download from download http://www.postgresql.org/download/download http://www.postgresql.org/download/ – Other option: use postgres on lab machines SQL Server 2008 – Download client from http://msdnaa.cs.washington.eduhttp://msdnaa.cs.washington.edu – Username is your full @cs.washington.edu email address – Doesn’t work ? Email ms-sw-admin@cs.washington.edums-sw-admin@cs.washington.edu – Connect to IPROJSRV (may need tunneling) – OK to use you own server, just import IMDB Xquery: download one interpreter from – Preferred: Saxon: http://saxon.sourceforge.net/ (from apache; very popular)http://saxon.sourceforge.net/ – Others: Zorba: http://www.zorba-xquery.com/ (I used this one: ½ day installation)http://www.zorba-xquery.com/ Galax: http://galax.sourceforge.net/ (great in the past, seems less well maintained)http://galax.sourceforge.net/ Pig Latin: – We will run it on Amazon Web Services – You may download from http://hadoop.apache.org/pig/, but you won’t need ithttp://hadoop.apache.org/pig/ Dan Suciu -- p544 Fall 20119

10 Accessing SQL Server SQL Server Management Studio Server Type = Database Engine Server Name = IPROJSRV Authentication = SQL Server Authentication – Login = your UW email address (not the CSE email) – Password = [in class] Must connect from within CSE, or must use tunneling Alternatively: install your own, get it from MSDNAA (see earlier slide) Then play with IMDB, start working on HW 1 Dan Suciu -- p544 Fall 201110

11 Rest of Today’s Lecture Overview of DBMS Overview of the course content SQL Dan Suciu -- p544 Fall 201111

12 Database What is a database ? Give examples of databases 12Dan Suciu -- p544 Fall 2011

13 Database What is a database ? A collection of files storing related data Give examples of databases Accounts database; payroll database; UW’s students database; Amazon’s products database; airline reservation database 13Dan Suciu -- p544 Fall 2011

14 Database Management System What is a DBMS ? Give examples of DBMS 14Dan Suciu -- p544 Fall 2011

15 Database Management System What is a DBMS ? A big C program written by someone else that allows us to manage efficiently a large database and allows it to persist over long periods of time Give examples of DBMS DB2 (IBM), SQL Server (MS), Oracle, Sybase MySQL, Postgres, … 15 SQL for Nerds, Greenspun, http://philip.greenspun.com/sql/ (Chap 1)http://philip.greenspun.com/sql/ SQL for Nerds, Greenspun, http://philip.greenspun.com/sql/ (Chap 1)http://philip.greenspun.com/sql/ Dan Suciu -- p544 Fall 2011

16 Market Shares From 2006 Gartner report: IBM: 21% market with $3.2BN in sales Oracle: 47% market with $7.1BN in sales Microsoft: 17% market with $2.6BN in sales 16Dan Suciu -- p544 Fall 2011

17 An Example The Internet Movie Database http://www.imdb.com http://www.imdb.com Entities: Actors (800k), Movies (400k), Directors, … Relationships: who played where, who directed what, … 17Dan Suciu -- p544 Fall 2011

18 Key concept 1: Relational Data Model 18Dan Suciu -- p544 Fall 2011 Actor:Cast: Movie: idfNamelNamegender 195428TomHanksM 645947AmyHanksF... idNameyear 337166Toy Story1995... pidmid 195428337166...

19 Key concept 2: Declarative Query Language 19 SELECT * FROM Actor SELECT * FROM Actor Dan Suciu -- p544 Fall 2011 SELECT count(*) FROM Actor SELECT count(*) FROM Actor SELECT * FROM Actor WHERE lName = ‘Hanks’ SELECT * FROM Actor WHERE lName = ‘Hanks’ SQL We write what we want, not how we want it.

20 Key concept 3: Data Independence 20 SELECT * FROM Actor, Casts, Movie WHERE lname='Hanks' and Actor.id = Casts.pid and Casts.mid=Movie.id and Movie.year=1995 SELECT * FROM Actor, Casts, Movie WHERE lname='Hanks' and Actor.id = Casts.pid and Casts.mid=Movie.id and Movie.year=1995 817k actors, 3.5M casts, 380k movies; How can it be so fast ? Physical data independence: query is independent of physical storage Physical data independence: query is independent of physical storage

21 21 How Can We Evaluate the Query ? Actor:Cast: Movie: idfNamelNamegender...Hanks... idNameyear...1995... pidmid... Plan 1:.... [ in class ] Plan 2:.... [ in class ] Dan Suciu -- p544 Fall 2011

22 22 ActorCastMovie  lName=‘Hanks’  year=1995 ActorCastMovie  lName=‘Hanks’  year=1995 Indexes: on Actor.lName, on Movie.year Alternative query plans: Query optimization Database Statistics histograms, synopses, etc

23 Key concept 4: Transactions Dan Suciu -- p544 Fall 201123 X = Read(Account_1); X.amount = X.amount - 100; Write(Account_1, X); Y = Read(Account_2); Y.amount = Y.amount + 100; Write(Account_2, Y); X = Read(Account_1); X.amount = X.amount - 100; Write(Account_1, X); Y = Read(Account_2); Y.amount = Y.amount + 100; Write(Account_2, Y); CRASH ! What is the problem ? Recovery from systems failures: Transfer $100 from account 1 to account 2:

24 Dan Suciu -- p544 Fall 201124 X = Read(Account); if (X.amount >= 100) { dispense_money( ); X.amount = X.amount – 100; } else error(“Insufficient funds”); X = Read(Account); if (X.amount >= 100) { dispense_money( ); X.amount = X.amount – 100; } else error(“Insufficient funds”); X = Read(Account); if (X.amount >= 100) { dispense_money( ); X.amount = X.amount – 100; } else error(“Insufficient funds”); X = Read(Account); if (X.amount >= 100) { dispense_money( ); X.amount = X.amount – 100; } else error(“Insufficient funds”); What can go wrong ? Concurrency Control Overdrafting an account: User 1: User 2:

25 Transactions ACID = Atomicity ( = recovery) Consistency Isolation ( = concurrency control) Durability 25Dan Suciu -- p544 Fall 2011

26 Client/Server Database Architecture Single server that stores the database Many clients running apps and connecting to DBMS Performance bottlenecks: – Client/server communication – Transactional semantics Other architectures: – main memory database – replicated databases 26Dan Suciu -- p544 Fall 2011

27 Two Types of Database Usage OLTP (online-transaction-processing) – Many updates – Many simple “point queries” – Few (or no) complex aggregate queries Decision-Support – Many aggregate/group-by queries. – Few (or no) updates Dan Suciu -- p544 Fall 201127

28 Trends in Data Management Large scale data analytics: Map/Reduce, Pig, … Cloud based database service: AWS, Azure, … NoSQL: sacrifice ACID for performance Data privacy Data provenance Complex data analytics: probabilistic databases Dan Suciu -- p544 Fall 201128

29 Outline of Course Content 1.SQL 2.Relational Calculus, Database Design 3.Constraints, Views 4.Transactions: recovery 5.Transactions: concurrency control 6.XML, XPath, XQuery 7.Data storage, indexes, physical tuning 8.Query execution 9.Query optimization 10.Big Data: Parallel databases, Map/Reduce, Pig Latin 11.Advanced topics: privacy, provenance, probabilistic dbs Dan Suciu -- p544 Fall 201129

30 Announcement: Homework 1 Homework 1 is posted; Due on Monday, Oct. 10 Tools: – Postgres: install on your computer (PREFERRED) or use the installation in the lab – SQL Server, for testing only; connect to IPROJSRV: login: your UW email address; password: …….. Tasks: create db, import data, create indices, write 11 SQL queries Dan Suciu -- p544 Fall 201130

31 31 Outline for rest of today Basics SQL (Chapters 5.2, 5.3) Aggregates (Chapter 5.5.) Nulls, Outer joins (Chapter 5.6) Subqueries (Chapters 5.4) – This is tough ! Next lecture we will discuss Relational Calculus (a.k.a. Tuple Calculus, Chapter 4.3). See supplementary text Three Query Language Formalisms Dan Suciu -- p544 Fall 2011

32 32 SQL Data Definition Language (DDL) – Create/alter/delete tables and their attributes – Read from the book Data Manipulation Language (DML) – Query tables, Insert/delete/modify – Discussed in class Dan Suciu -- p544 Fall 2011

33 33 Tables in SQL PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi Product Attribute names Table name Tuples or rows Key Dan Suciu -- p544 Fall 2011

34 The Relational Data Model Data is stored in tables, a.k.a. relations Each relation has: 1.A schema = name+attributes – Product(PName, Price, Category, Manufacturer) – Each relation has a key, which we underline 2.An instance = set of rows SQL departs from the pure relational model in that it allows duplicate tuples Set semantics  bag semantics {1, 2, 3}  {1, 1, 2, 3, 3, 3} Dan Suciu -- p544 Fall 201134

35 35 Data Types in SQL Atomic types: – Characters: CHAR(20), VARCHAR(50) – Numbers: INT, BIGINT, SMALLINT, FLOAT – Others: MONEY, DATETIME, … Record (aka tuple) – Has atomic attributes Table (relation) – A set of tuples Dan Suciu -- p544 Fall 2011

36 36 Simple SQL Query PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi SELECT * FROM Product WHERE category=‘Gadgets’ Product PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks “selection” Dan Suciu -- p544 Fall 2011

37 37 Simple SQL Query PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi SELECT PName, Price, Manufacturer FROM Product WHERE Price > ‘$100’ Product PNamePriceManufacturer SingleTouch$149.99Canon MultiTouch$203.99Hitachi “selection” and “projection” Dan Suciu -- p544 Fall 2011

38 38 Details Case insensitive: SELECT = Select = select Product = product BUT: ‘Seattle’ ≠ ‘seattle’ Constants: ‘abc’ - yes “abc” - no Dan Suciu -- p544 Fall 2011

39 39 Eliminating Duplicates SELECT DISTINCT category FROM Product SELECT DISTINCT category FROM Product Compare to: SELECT category FROM Product SELECT category FROM Product Category Gadgets Photography Household Category Gadgets Photography Household Dan Suciu -- p544 Fall 2011

40 40 Ordering the Results SELECT pname, price, manufacturer FROM Product WHERE category=‘Gadgets’ AND price > ‘$10’ ORDER BY price, pname SELECT pname, price, manufacturer FROM Product WHERE category=‘Gadgets’ AND price > ‘$10’ ORDER BY price, pname Ties are broken by the second attribute on the ORDER BY list. Ordering is ascending, unless you specify the DESC keyword. Dan Suciu -- p544 Fall 2011

41 41 SELECT Category FROM Product ORDER BY PName SELECT Category FROM Product ORDER BY PName PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi ? SELECT DISTINCT category FROM Product ORDER BY category SELECT DISTINCT category FROM Product ORDER BY category SELECT DISTINCT category FROM Product ORDER BY PName SELECT DISTINCT category FROM Product ORDER BY PName ? ? Dan Suciu -- p544 Fall 2011

42 42 Keys and Foreign Keys PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi Product Company CNameCountry GizmoWorksUSA CanonJapan HitachiJapan Key Foreign key Dan Suciu -- p544 Fall 2011

43 43 Joins Product (PName, Price, Category, Manufacturer) Company (CName,, Country) Find all products under $200 manufactured in Japan; return their names and prices. SELECT PName, Price FROM Product, Company WHERE Manufacturer=CName AND Country=‘Japan’ AND Price <= ‘$200’ Join between Product and Company Dan Suciu -- p544 Fall 2011

44 44 Joins PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi ProductCompany CnameCountry GizmoWorksUSA CanonJapan HitachiJapan PNamePrice SingleTouch$149.99 SELECT PName, Price FROM Product, Company WHERE Manufacturer=CName AND Country=‘Japan’ AND Price <= ‘$200’ Dan Suciu -- p544 Fall 2011

45 45 Tuple Variables SELECT DISTINCT name, country FROM Person, Company WHERE worksfor = cname Which country ? Product (PName, Price, Category, Manufacturer) Company (CName,, Country) Person(name, Country, Worksfor) SELECT DISTINCT Person.name, Company.country FROM Person, Company WHERE Person.worksfor = Company.cname Dan Suciu -- p544 Fall 2011 SELECT DISTINCT x.name, y.country FROM Person AS x, Company AS y WHERE x.worksfor = y.cname

46 46 In Class Product (pname, price, category, manufacturer) Company (cname, country) Find all Chinese companies that manufacture products in the ‘toy’ category SELECT cname FROM WHERE Dan Suciu -- p544 Fall 2011

47 47 In Class Product (pname, price, category, manufacturer) Company (cname, country) Find all Chinese companies that manufacture products both in the ‘electronic’ and ‘toy’ categories SELECT cname FROM WHERE Dan Suciu -- p544 Fall 2011

48 48 The Nested Loop Semantics of SQL Queries SELECT a 1, a 2, …, a k FROM R 1 AS x 1, R 2 AS x 2, …, R n AS x n WHERE Conditions SELECT a 1, a 2, …, a k FROM R 1 AS x 1, R 2 AS x 2, …, R n AS x n WHERE Conditions Dan Suciu -- p544 Fall 2011 Answer = {} for x 1 in R 1 do for x 2 in R 2 do ….. for x n in R n do if Conditions then Answer = Answer  {(a 1,…,a k )} return Answer Answer = {} for x 1 in R 1 do for x 2 in R 2 do ….. for x n in R n do if Conditions then Answer = Answer  {(a 1,…,a k )} return Answer

49 49 SELECT DISTINCT R.A FROM R, S, T WHERE R.A=S.A OR R.A=T.A SELECT DISTINCT R.A FROM R, S, T WHERE R.A=S.A OR R.A=T.A Using the Formal Semantics If S ≠ ∅ and T ≠ ∅ then returns R  (S  T) else returns ∅ What do these queries compute ? SELECT DISTINCT R.A FROM R, S WHERE R.A=S.A SELECT DISTINCT R.A FROM R, S WHERE R.A=S.A Returns R  S Dan Suciu -- p544 Fall 2011

50 50 Aggregation SELECT count(*) FROM Product SELECT count(*) FROM Product Except count, all aggregations apply to a single attribute SELECT sum(price) FROM Product WHERE manufacturer=‘GizmoWorks’ SELECT sum(price) FROM Product WHERE manufacturer=‘GizmoWorks’ SQL supports several aggregation operations: sum, count, min, max, avg Dan Suciu -- p544 Fall 2011 Product (pname, price, category, manufacturer) Company (cname, country)

51 51 COUNT applies to duplicates, unless otherwise stated: SELECT count(category) FROM Product WHERE price > ‘$20’ SELECT count(category) FROM Product WHERE price > ‘$20’ If category has no nulls, then count(category)=count(*) We probably want: SELECT count(DISTINCT category) FROM Product WHERE price > ‘$20’ SELECT count(DISTINCT category) FROM Product WHERE price > ‘$20’ Aggregation: Count Dan Suciu -- p544 Fall 2011 Product (pname, price, category, manufacturer) Company (cname, country)

52 52 Grouping and Aggregation SELECT manufacturer, count(*) AS total FROM Product WHERE price < ‘$200’ GROUP BY manufacturer SELECT manufacturer, count(*) AS total FROM Product WHERE price < ‘$200’ GROUP BY manufacturer Let’s see what this means… For each manufacturer, find total number of its products under $200. Dan Suciu -- p544 Fall 2011 Product (pname, price, category, manufacturer) Company (cname, country)

53 53 Grouping and Aggregation 1. Compute the FROM and WHERE clauses. 2. Group by the attributes in the GROUPBY 3. Compute the SELECT clause, including aggregates. Dan Suciu -- p544 Fall 2011

54 54 1&2. FROM-WHERE-GROUPBY Dan Suciu -- p544 Fall 2011 PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi SELECT manufacturer, count(*) AS total FROM Product WHERE price < ‘$200’ GROUP BY manufacturer SELECT manufacturer, count(*) AS total FROM Product WHERE price < ‘$200’ GROUP BY manufacturer

55 55 3. SELECT Dan Suciu -- p544 Fall 2011 SELECT manufacturer, count(*) AS total FROM Product WHERE price < ‘$200’ GROUP BY manufacturer SELECT manufacturer, count(*) AS total FROM Product WHERE price < ‘$200’ GROUP BY manufacturer PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi count(*)Manufacturer 2GizmoWorks 1Canon

56 56 HAVING Clause SELECT manufacturer, count(*) AS total FROM Product WHERE price < ‘$200’ GROUP BY manufacturer HAVING min(price) >’$20’ SELECT manufacturer, count(*) AS total FROM Product WHERE price < ‘$200’ GROUP BY manufacturer HAVING min(price) >’$20’ Same query, except that we return only those manufacturers that make only products with price > $20 HAVING clause contains conditions on aggregates. Dan Suciu -- p544 Fall 2011 Product (pname, price, category, manufacturer) Company (cname, country)

57 57 General form of Grouping and Aggregation SELECT S FROM R 1,…,R n WHERE C1 GROUP BY a 1,…,a k HAVING C2 S = may contain attributes a 1,…,a k and/or any aggregates but NO OTHER ATTRIBUTES C1 = is any condition on the attributes in R 1,…,R n C2 = is any condition on aggregate expressions Why ? Dan Suciu -- p544 Fall 2011

58 58 General form of Grouping and Aggregation Evaluation steps: 1.Evaluate FROM-WHERE, apply condition C1 2.Group by the attributes a 1,…,a k 3.Apply condition C2 to each group (may have aggregates) 4.Compute aggregates in S and return the result SELECT S FROM R 1,…,R n WHERE C1 GROUP BY a 1,…,a k HAVING C2 SELECT S FROM R 1,…,R n WHERE C1 GROUP BY a 1,…,a k HAVING C2 Dan Suciu -- p544 Fall 2011

59 59 NULLS in SQL Whenever we don’t have a value, we can put a NULL Can mean many things: – Value does not exists – Value exists but is unknown – Value not applicable – Etc. The schema specifies for each attribute if can be null (nullable attribute) or not How does SQL cope with tables that have NULLs ? Dan Suciu -- p544 Fall 2011

60 60 Null Values If x= NULL then 4*(3-x)/7 is still NULL If x= NULL then x=‘Joe’ is UNKNOWN In SQL there are three boolean values: FALSE = 0 UNKNOWN = 0.5 TRUE = 1 Dan Suciu -- p544 Fall 2011

61 61 Null Values C1 AND C2 = min(C1, C2) C1 OR C2 = max(C1, C2) NOT C1 = 1 – C1 Rule in SQL: include only tuples that yield TRUE SELECT * FROM Person WHERE (age < 25) AND (height > 6 OR weight > 190) SELECT * FROM Person WHERE (age < 25) AND (height > 6 OR weight > 190) E.g. age=20 heigth=NULL weight=200 Dan Suciu -- p544 Fall 2011

62 62 Null Values Unexpected behavior: Some Persons are not included ! SELECT * FROM Person WHERE age = 25 SELECT * FROM Person WHERE age = 25 Dan Suciu -- p544 Fall 2011

63 63 Null Values Can test for NULL explicitly: – x IS NULL – x IS NOT NULL Now it includes all Persons SELECT * FROM Person WHERE age = 25 OR age IS NULL SELECT * FROM Person WHERE age = 25 OR age IS NULL Dan Suciu -- p544 Fall 2011

64 Outerjoins 64 SELECT x.country, y.pname FROM Company x JOIN Product y ON x.cname = y.manufacturer SELECT x.country, y.pname FROM Company x JOIN Product y ON x.cname = y.manufacturer SELECT x.country, y.pname FROM Company x, Product y WHERE x.cname = y.manufacturer SELECT x.country, y.pname FROM Company x, Product y WHERE x.cname = y.manufacturer Same as: But countries that don’t manufacture will not be listed ! Product (pname, price, category, manufacturer) Company (cname, country) Normally, joins are “inner joins”: Dan Suciu -- p544 Fall 2011

65 Outerjoins 65 SELECT x.country, y.pname FROM Company x LEFT OUTER JOIN Product y ON x.cname = y.manufacturer SELECT x.country, y.pname FROM Company x LEFT OUTER JOIN Product y ON x.cname = y.manufacturer If we want to see the companies that don’t produce anything, then we use an outer join: Dan Suciu -- p544 Fall 2011 Product (pname, price, category, manufacturer) Company (cname, country)

66 66 Product Company Dan Suciu -- p544 Fall 2011 PNamePriceCategoryManufacturer Gizmo$19.99GadgetsGizmoWorks Powergizmo$29.99GadgetsGizmoWorks SingleTouch$149.99PhotographyCanon MultiTouch$203.99HouseholdHitachi CnameCountry GizmoWorksUSA CanonJapan HitachiJapan MuseumPassVatican CnamePName USA GizmoWorks USAGizmoWorks JapanCanon JapanHitachi Vatican NULL

67 Application Dan Suciu -- p544 Fall 201167 SELECT x.country, count(*) FROM Company x, Product y WHERE x.cname = y.manufacturer GROUP BY x.country SELECT x.country, count(*) FROM Company x, Product y WHERE x.cname = y.manufacturer GROUP BY x.country What’s wrong ? Product (pname, price, category, manufacturer) Company (cname, country) Compute the total number of products made by each country

68 Application Dan Suciu -- p544 Fall 201168 SELECT x.country, count(y.pname) FROM Company x LEFT OUTER JOIN Product y ON x.cname = y.manufacturer GROUP BY x.country SELECT x.country, count(y.pname) FROM Company x LEFT OUTER JOIN Product y ON x.cname = y.manufacturer GROUP BY x.country Now we also get the products who sold in 0 quantity Product (pname, price, category, manufacturer) Company (cname, country) Compute the total number of products made by each country Note: we don’t use count(*) WHY ? Note: we don’t use count(*) WHY ?

69 69 Outer Joins Left outer join: – Include the left tuple even if there’s no match Right outer join: – Include the right tuple even if there’s no match Full outer join: – Include the both left and right tuples even if there’s no match Dan Suciu -- p544 Fall 2011

70 Subqueries A subquery is another SQL query nested inside a larger query Such inner-outer queries are called nested queries A subquery may occur in: 1.A SELECT clause 2.A FROM clause 3.A WHERE clause Dan Suciu -- p544 Fall 201170 Rule of thumb: avoid writing nested queries when possible; sometimes it’s impossible

71 71 1. Subqueries in SELECT Product (pname, price, category, manufacturer) Company (cname, country) For each product return the country that manufactures it SELECT X.pname, (SELECT Y.country FROM Company Y WHERE Y.cname=X.manufacturer) FROM Product X SELECT X.pname, (SELECT Y.country FROM Company Y WHERE Y.cname=X.manufacturer) FROM Product X What happens if a subquery returns more than one country ? Dan Suciu -- p544 Fall 2011

72 72 1. Subqueries in SELECT Whenever possible, don’t use a nested queries: = We have “unnested” the query Dan Suciu -- p544 Fall 2011 SELECT X.pname, (SELECT Y.country FROM Company Y WHERE Y.cname=X.manufacturer) FROM Product X SELECT X.pname, (SELECT Y.country FROM Company Y WHERE Y.cname=X.manufacturer) FROM Product X SELECT pname, country FROM Product, Company WHERE cname=manufacturer SELECT pname, country FROM Product, Company WHERE cname=manufacturer Product (pname, price, category, manufacturer) Company (cname, country)

73 73 1. Subqueries in SELECT Compute the number of products made by each country SELECT DISTINCT x.country, (SELECT count(*) FROM Company y, Product WHERE y.cname=manufacturer and y.country = x.country) FROM Company x SELECT DISTINCT x.country, (SELECT count(*) FROM Company y, Product WHERE y.cname=manufacturer and y.country = x.country) FROM Company x Better: we can unnest by using a GROUP BY Dan Suciu -- p544 Fall 2011 Product (pname, price, category, manufacturer) Company (cname, country) SELECT x.country, count(*) FROM Company x, Product z WHERE x.cname = z.manufacturer GROUP BY x.country SELECT x.country, count(*) FROM Company x, Product z WHERE x.cname = z.manufacturer GROUP BY x.country

74 74 GROUP BY v.s. Nested Quereis SELECT manufacturer, count(*) AS total FROM Product WHERE price < '$200’ GROUP BY manufacturer SELECT manufacturer, count(*) AS total FROM Product WHERE price < '$200’ GROUP BY manufacturer SELECT DISTINCT x.manufacturer, (SELECT count(*) FROM Product y WHERE x.manufacturer = y.manufacturer AND price < '$200’) AS total FROM Product x WHERE price < '$200’ SELECT DISTINCT x.manufacturer, (SELECT count(*) FROM Product y WHERE x.manufacturer = y.manufacturer AND price < '$200’) AS total FROM Product x WHERE price < '$200’ Why twice ? Dan Suciu -- p544 Fall 2011

75 75 2. Subqueries in FROM Find all products whose prices is > 20 and < 30 SELECT * FROM (SELECT * FROM Product AS Y WHERE Y.price > ‘$20’) AS x WHERE x.price < ‘$30’ SELECT * FROM (SELECT * FROM Product AS Y WHERE Y.price > ‘$20’) AS x WHERE x.price < ‘$30’ Unnest this query ! Dan Suciu -- p544 Fall 2011 Product (pname, price, category, manufacturer) Company (cname, country)

76 76 3. Subqueries in WHERE Find all countries that make some products with price < 100 SELECT DISTINCT x.country FROM Company x WHERE EXISTS (SELECT * FROM Product y WHERE y.manufacturer = x.cname and y.price < ‘$100’) SELECT DISTINCT x.country FROM Company x WHERE EXISTS (SELECT * FROM Product y WHERE y.manufacturer = x.cname and y.price < ‘$100’) Existential quantifiers Using EXISTS: Dan Suciu -- p544 Fall 2011 Product (pname, price, category, manufacturer) Company (cname, country) Correlated subqery: uses x from outer query

77 77 3. Subqueries in WHERE Find all countries that make some products with price < 100 Predicate Calculus (a.k.a. First Order Logic) Dan Suciu -- p544 Fall 2011 { y | ∃ x.Company(x,y) ∧ ( ∃ z. ∃ p. ∃ c.Product(z,p,c,x) ∧ p<100)} Existential quantifiers Product (pname, price, category, manufacturer) Company (cname, country)

78 78 3. Subqueries in WHERE Find all countries that make some products with price < 100 SELECT DISTINCT country FROM Company WHERE cname IN (SELECT Product.manufacturer FROM Product WHERE Product.price < ‘$100’) SELECT DISTINCT country FROM Company WHERE cname IN (SELECT Product.manufacturer FROM Product WHERE Product.price < ‘$100’) Using IN Dan Suciu -- p544 Fall 2011 Existential quantifiers Product (pname, price, category, manufacturer) Company (cname, country) De-correlated subqery

79 79 3. Subqueries in WHERE Find all countries that make some products with price < 100 SELECT DISTINCT Company.country FROM Company WHERE ‘$100’ > ANY (SELECT price FROM Product WHERE manufacturer = cname) SELECT DISTINCT Company.country FROM Company WHERE ‘$100’ > ANY (SELECT price FROM Product WHERE manufacturer = cname) Using ANY: Dan Suciu -- p544 Fall 2011 Existential quantifiers Product (pname, price, category, manufacturer) Company (cname, country)

80 80 3. Subqueries in WHERE Find all countries that make some products with price < 100 SELECT DISTINCT x.country FROM Company x, Product y WHERE x.cname = y.manufacturer and y.price < ‘$100’ SELECT DISTINCT x.country FROM Company x, Product y WHERE x.cname = y.manufacturer and y.price < ‘$100’ Existential quantifiers are easy ! Now let’s unnest it: Dan Suciu -- p544 Fall 2011 Existential quantifiers Product (pname, price, category, manufacturer) Company (cname, country)

81 81 3. Subqueries in WHERE Universal quantifiers are hard !  Find the countries of all companies that make only products with price < 100 Dan Suciu -- p544 Fall 2011 Universal quantifiers Product (pname, price, category, manufacturer) Company (cname, country)

82 82 3. Subqueries in WHERE Predicate Calculus (a.k.a. First Order Logic) Dan Suciu -- p544 Fall 2011 { y | ∃ x.Company(x,y) ∧ ( ∀ z. ∀ p. ∀ c.Product(z,p,c,x)  p<100) } Find the countries of all companies that make only products with price < 100 Universal quantifiers Product (pname, price, category, manufacturer) Company (cname, country)

83 83 3. Subqueries in WHERE Dan Suciu -- p544 Fall 2011 { y | ∃ x. Company(x,y) ∧ ( ∀ z. ∀ p. ∀ c.Product(z,p,c,x)  p<100) } De Morgan’s Laws: ¬(A ∧ B) = ¬A ∨ ¬B ¬(A ∨ B) = ¬A ∧ ¬B ¬ ∀ x. P(x) = ∃ x. ¬ P(x) ¬ ∃ x. P(x) = ∀ x. ¬ P(x) ¬(A ∧ B) = ¬A ∨ ¬B ¬(A ∨ B) = ¬A ∧ ¬B ¬ ∀ x. P(x) = ∃ x. ¬ P(x) ¬ ∃ x. P(x) = ∀ x. ¬ P(x) { y| ∃ x.Company(x,y) ∧ ¬( ∃ z ∃ p. ∃ p.Product(z,p,c,x) ∧ p≥100) } { y | ∃ x. Company(x,y)) } − { y | ∃ x. Company(x,y) ∧ ( ∃ z ∃ p. ∃ c.Product(z,p,c,x) ∧ p≥100) } { y | ∃ x. Company(x,y)) } − { y | ∃ x. Company(x,y) ∧ ( ∃ z ∃ p. ∃ c.Product(z,p,c,x) ∧ p≥100) } ¬(A  B) = A ∧ ¬B = =

84 84 3. Subqueries in WHERE 2. Find all companies s.t. all their products have price < 100 1. Find the other companies: i.e. s.t. some product  100 Dan Suciu -- p544 Fall 2011 SELECT DISTINCT country FROM Company WHERE cname IN (SELECT manufacturer FROM Product WHERE price >= ‘$100’) SELECT DISTINCT country FROM Company WHERE cname IN (SELECT manufacturer FROM Product WHERE price >= ‘$100’) SELECT DISTINCT country FROM Company WHERE cname NOT IN (SELECT manufacturer FROM Product WHERE price >= ‘$100’) SELECT DISTINCT country FROM Company WHERE cname NOT IN (SELECT manufacturer FROM Product WHERE price >= ‘$100’)

85 85 3. Subqueries in WHERE Find the countries of all companies that make only products with price < 100 Universal quantifiers Using EXISTS: Dan Suciu -- p544 Fall 2011 SELECT DISTINCT x.country FROM Company x WHERE NOT EXISTS (SELECT * FROM Product y WHERE y.manufacturer = x.cname and y.price >= ‘$100’) SELECT DISTINCT x.country FROM Company x WHERE NOT EXISTS (SELECT * FROM Product y WHERE y.manufacturer = x.cname and y.price >= ‘$100’) Product (pname, price, category, manufacturer) Company (cname, country)

86 86 3. Subqueries in WHERE SELECT DISTINCT Company.country FROM Company WHERE ‘$100’ > ALL (SELECT price FROM Product WHERE manufacturer = cname) SELECT DISTINCT Company.country FROM Company WHERE ‘$100’ > ALL (SELECT price FROM Product WHERE manufacturer = cname) Using ALL: Dan Suciu -- p544 Fall 2011 Find the countries of all companies that make only products with price < 100 Universal quantifiers Product (pname, price, category, manufacturer) Company (cname, country)

87 87 Question for Database Fans and their Friends Can we unnest this query ? Dan Suciu -- p544 Fall 2011 Find the countries of all companies that make only products with price < 100

88 88 Monotone Queries A query Q is monotone if: – Whenever we add tuples to one or more of the tables… – … the answer to the query cannot contain fewer tuples Fact: all unnested queries are monotone – Proof: using the “nested for loops” semantics Fact: A query a universal quantifier is not monotone Consequence: we cannot unnest a query with a universal quantifier Dan Suciu -- p544 Fall 2011

89 Queries that must be nested Dan Suciu -- p544 Fall 201189 Rule of Thumb: Non-monotone queries cannot be unnested. In particular, queries with a universal quantifier cannot be unnested Rule of Thumb: Non-monotone queries cannot be unnested. In particular, queries with a universal quantifier cannot be unnested

90 More SQL Read the following commands in the book CREATE TABLE INSERT DELETE UPDATE They are easy; but we need/use them all the time in class, and in the homework assignments Dan Suciu -- p544 Fall 201190

91 91 Advanced SQLizing 1.Unnesting Aggregates 2.Finding witnesses Dan Suciu -- p544 Fall 2011

92 Unnesting Aggregates For each category, find the maximum price SELECT DISTINCT X.category, (SELECT max(Y.price) FROM Product Y WHERE X.category = Y.category) FROM Product X SELECT DISTINCT X.category, (SELECT max(Y.price) FROM Product Y WHERE X.category = Y.category) FROM Product X SELECT category, max(price) FROM Product GROUP BY category SELECT category, max(price) FROM Product GROUP BY category Equivalent queries Note: no need for DISTINCT (DISTINCT is the same as GROUP BY) Note: no need for DISTINCT (DISTINCT is the same as GROUP BY) Product (pname, price, category, manufacturer) Company (cname, country)

93 Unnesting Aggregates Find the number of products made in each country SELECT DISTINCT X.country, (SELECT count(*) FROM Company Y, Product Z WHERE Y.cname=Z.manufacturer AND Y.country = X.country) FROM Company X SELECT DISTINCT X.country, (SELECT count(*) FROM Company Y, Product Z WHERE Y.cname=Z.manufacturer AND Y.country = X.country) FROM Company X SELECT X.country, count(*) FROM Company X, Product Y WHERE X.cname=Y.manufacturer GROUP BY X.country SELECT X.country, count(*) FROM Company X, Product Y WHERE X.cname=Y.manufacturer GROUP BY X.country They are NOT equivalent ! (WHY?) Product (pname, price, category, manufacturer) Company (cname, country)

94 94 More Unnesting Find authors who wrote  10 documents: Attempt 1: with nested queries SELECT DISTINCT Author.name FROM Author WHERE count(SELECT Wrote.url FROM Wrote WHERE Author.login=Wrote.login) > 10 SELECT DISTINCT Author.name FROM Author WHERE count(SELECT Wrote.url FROM Wrote WHERE Author.login=Wrote.login) > 10 This is SQL by a novice Author(login,name) Wrote(login,url) Dan Suciu -- p544 Fall 2011

95 95 More Unnesting Find all authors who wrote at least 10 documents: Attempt 2: SQL style (with GROUP BY) SELECT DISTINCT Author.name FROM Author, Wrote WHERE Author.login=Wrote.login GROUP BY Author.name HAVING count(wrote.url) > 10 SELECT DISTINCT Author.name FROM Author, Wrote WHERE Author.login=Wrote.login GROUP BY Author.name HAVING count(wrote.url) > 10 This is SQL by an expert Dan Suciu -- p544 Fall 2011

96 96 Finding Witnesses For each country, find its most expensive products Dan Suciu -- p544 Fall 2011 Product (pname, price, category, manufacturer) Company (cname, country)

97 Finding Witnesses SELECT x.country, max(y.price) FROM Company x, Product y WHERE x.cname = y.manufacturer GROUP BY x.country SELECT x.country, max(y.price) FROM Company x, Product y WHERE x.cname = y.manufacturer GROUP BY x.country Finding the maximum price is easy… But we need the witnesses, i.e. the products with max price For each country, find its most expensive products Product (pname, price, category, manufacturer) Company (cname, country)

98 98 Finding Witnesses SELECT u.country, v.pname, v.price FROM Company u, Product v, (SELECT x.country, max(y.price) as mprice FROM Company x, Product y WHERE x.cname = y.manufacturer GROUP BY x.country) AS p WHERE u.country = p.country and v.price = p.mprice SELECT u.country, v.pname, v.price FROM Company u, Product v, (SELECT x.country, max(y.price) as mprice FROM Company x, Product y WHERE x.cname = y.manufacturer GROUP BY x.country) AS p WHERE u.country = p.country and v.price = p.mprice To find the witnesses, compute the maximum price in a subquery Dan Suciu -- p544 Fall 2011

99 99 Finding Witnesses There is a more concise solution here: SELECT x.country, y.pname, y.price FROM Company x, Product y WHERE x.cname = y.manufacturer and y.price >= ALL (SELECT z.price FROM Product z WHERE x.cname = z.manufacturer) SELECT x.country, y.pname, y.price FROM Company x, Product y WHERE x.cname = y.manufacturer and y.price >= ALL (SELECT z.price FROM Product z WHERE x.cname = z.manufacturer) Dan Suciu -- p544 Fall 2011 Product (pname, price, category, manufacturer) Company (cname, country)


Download ppt "Principles of Database Systems CSE 544p Lecture #1 September 28, 2011 1Dan Suciu -- p544 Fall 2011."

Similar presentations


Ads by Google