Principles of Database Systems, CSE 544p, Lecture #1, September 28, 2011. Dan Suciu -- p544 Fall 2011

Presentation transcript:


Staff
Instructor: Dan Suciu
– CSE 662
– Office hours: Wednesdays, 5:30-6:20
TAs:
– Sandra Fan

Communications
Web page:
– Lectures will be available here
– Homework will be posted here
– Announcements may be posted here
Mailing list:
– Announcements, group discussions
– If you registered, you are automatically subscribed

Textbook(s)
Main textbook: Database Management Systems, Ramakrishnan and Gehrke
Second textbook: Database Systems: The Complete Book, Garcia-Molina, Ullman, Widom

Course Format
Lectures: Wednesdays, 6:30-9:20
7 Homework Assignments
Take-home Final

Grading
Homework: 70%
Take-home Final: 30%

Homework Assignments
1. SQL
2. Conceptual design
3. JAVA/SQL
4. Transactions
5. Database tuning
6. XML/XPath/XQuery
7. Pig Latin, on AWS
Due: Mondays, by 11:59pm. Three late days per person.

Take-home Final
Posted on December 8, at 11:59pm
Due on December 10, by 10:00pm
No late days/hours/minutes/seconds

Software Tools
Postgres:
– Preferred usage: download and install on your own computer
– Other option: use Postgres on lab machines
SQL Server 2008:
– Download the client (see MSDNAA)
– Username is your address
– Doesn't work? Connect to IPROJSRV (may need tunneling)
– OK to use your own server, just import IMDB
XQuery: download one interpreter
– Preferred: Saxon (from apache; very popular)
– Others: Zorba (I used this one: ½ day installation); Galax (great in the past, seems less well maintained)
Pig Latin:
– We will run it on Amazon Web Services
– You may download it from http://hadoop.apache.org/pig/ but you won't need it

Accessing SQL Server
SQL Server Management Studio
Server Type = Database Engine
Server Name = IPROJSRV
Authentication = SQL Server Authentication
– Login = your UW address (not the CSE)
– Password = [in class]
Must connect from within CSE, or must use tunneling
Alternatively: install your own, get it from MSDNAA (see earlier slide)
Then play with IMDB, start working on HW 1

Rest of Today's Lecture
Overview of DBMS
Overview of the course content
SQL

Database
What is a database?
– A collection of files storing related data
Give examples of databases
– Accounts database; payroll database; UW's students database; Amazon's products database; airline reservation database

Database Management System
What is a DBMS?
– A big C program written by someone else that allows us to manage a large database efficiently and allows it to persist over long periods of time
Give examples of DBMS
– DB2 (IBM), SQL Server (MS), Oracle, Sybase
– MySQL, Postgres, …
Reading: SQL for Nerds, Greenspun (Chap 1)

Market Shares
From 2006 Gartner report:
– IBM: 21% market with $3.2BN in sales
– Oracle: 47% market with $7.1BN in sales
– Microsoft: 17% market with $2.6BN in sales

An Example
The Internet Movie Database
Entities: Actors (800k), Movies (400k), Directors, …
Relationships: who played where, who directed what, …

Key concept 1: Relational Data Model
Actor (id, fName, lName, gender): rows such as (…, Tom, Hanks, M), (…, Amy, Hanks, F), ...
Cast (pid, mid)
Movie (id, Name, year): rows such as (…, Toy Story, …)

Key concept 2: Declarative Query Language
SQL:
    SELECT * FROM Actor

    SELECT count(*) FROM Actor

    SELECT * FROM Actor WHERE lName = 'Hanks'
We write what we want, not how we want it.

Key concept 3: Data Independence
    SELECT *
    FROM Actor, Casts, Movie
    WHERE lname = 'Hanks' AND Actor.id = Casts.pid
      AND Casts.mid = Movie.id AND Movie.year = 1995
…k actors, 3.5M casts, 380k movies. How can it be so fast?
Physical data independence: the query is independent of physical storage.

How Can We Evaluate the Query?
Actor (id, fName, lName, gender), Cast (pid, mid), Movie (id, Name, year)
Plan 1: .... [ in class ]
Plan 2: .... [ in class ]

[Figure: two alternative query plans over Actor, Cast and Movie, each applying the selections lName = 'Hanks' and year = 1995]
Indexes: on Actor.lName, on Movie.year
Alternative query plans: query optimization
Database statistics: histograms, synopses, etc.

Key concept 4: Transactions
Recovery from system failures. Transfer $100 from account 1 to account 2:
    X = Read(Account_1);
    X.amount = X.amount - 100;
    Write(Account_1, X);
    Y = Read(Account_2);
    Y.amount = Y.amount + 100;
    Write(Account_2, Y);
CRASH! What is the problem?
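The transfer above can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than the course's Postgres or SQL Server; the Account table layout, the transfer helper and the raised exception that stands in for a crash are all assumptions made for illustration. A real DBMS recovers from an actual crash using its log; here the rollback is triggered by the exception.

```python
import sqlite3

# Hypothetical schema: one row per account, integer dollar amounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 500), (2, 300)])
conn.commit()

def transfer(conn, src, dst, amount, crash=False):
    """Move `amount` from src to dst as one atomic unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute("UPDATE Account SET amount = amount - ? WHERE id = ?",
                         (amount, src))
            if crash:
                raise RuntimeError("simulated crash between the two writes")
            conn.execute("UPDATE Account SET amount = amount + ? WHERE id = ?",
                         (amount, dst))
    except RuntimeError:
        pass  # the partial debit has already been rolled back

def balances(conn):
    return dict(conn.execute("SELECT id, amount FROM Account ORDER BY id"))

transfer(conn, 1, 2, 100, crash=True)
after_crash = balances(conn)   # neither account changed
transfer(conn, 1, 2, 100)
after_ok = balances(conn)      # both writes applied together
```

Without the transaction, the simulated crash would leave account 1 debited and account 2 not credited, which is the problem the slide asks about.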

Concurrency Control
Overdrafting an account. User 1 and User 2 each run:
    X = Read(Account);
    if (X.amount >= 100) {
        dispense_money( );
        X.amount = X.amount - 100;
    } else error("Insufficient funds");
What can go wrong?

Transactions
ACID =
– Atomicity ( = recovery)
– Consistency
– Isolation ( = concurrency control)
– Durability

Client/Server Database Architecture
Single server that stores the database
Many clients running apps and connecting to the DBMS
Performance bottlenecks:
– Client/server communication
– Transactional semantics
Other architectures:
– Main memory databases
– Replicated databases

Two Types of Database Usage
OLTP (online transaction processing)
– Many updates
– Many simple "point queries"
– Few (or no) complex aggregate queries
Decision support
– Many aggregate/group-by queries
– Few (or no) updates

Trends in Data Management
Large scale data analytics: Map/Reduce, Pig, …
Cloud based database service: AWS, Azure, …
NoSQL: sacrifice ACID for performance
Data privacy
Data provenance
Complex data analytics: probabilistic databases

Outline of Course Content
1. SQL
2. Relational Calculus, Database Design
3. Constraints, Views
4. Transactions: recovery
5. Transactions: concurrency control
6. XML, XPath, XQuery
7. Data storage, indexes, physical tuning
8. Query execution
9. Query optimization
10. Big Data: Parallel databases, Map/Reduce, Pig Latin
11. Advanced topics: privacy, provenance, probabilistic dbs

Announcement: Homework 1
Homework 1 is posted; due on Monday, Oct. 10
Tools:
– Postgres: install on your computer (PREFERRED) or use the installation in the lab
– SQL Server, for testing only; connect to IPROJSRV: login: your UW address; password: ……..
Tasks: create db, import data, create indices, write 11 SQL queries

Outline for rest of today
Basic SQL (Chapters 5.2, 5.3)
Aggregates (Chapter 5.5)
Nulls, Outer joins (Chapter 5.6)
Subqueries (Chapter 5.4)
– This is tough!
Next lecture we will discuss Relational Calculus (a.k.a. Tuple Calculus, Chapter 4.3). See the supplementary text Three Query Language Formalisms.

SQL
Data Definition Language (DDL)
– Create/alter/delete tables and their attributes
– Read from the book
Data Manipulation Language (DML)
– Query tables, insert/delete/modify
– Discussed in class

Tables in SQL
Table name: Product

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

The column headers are the attribute names, the rows are the tuples, and PName is the key.

The Relational Data Model
Data is stored in tables, a.k.a. relations. Each relation has:
1. A schema = name + attributes
   – Product(PName, Price, Category, Manufacturer)
   – Each relation has a key, which we underline
2. An instance = set of rows
SQL departs from the pure relational model in that it allows duplicate tuples:
set semantics → bag semantics, {1, 2, 3} → {1, 1, 2, 3, 3, 3}

Data Types in SQL
Atomic types:
– Characters: CHAR(20), VARCHAR(50)
– Numbers: INT, BIGINT, SMALLINT, FLOAT
– Others: MONEY, DATETIME, …
Record (a.k.a. tuple)
– Has atomic attributes
Table (relation)
– A set of tuples

Simple SQL Query
    SELECT *
    FROM Product
    WHERE category = 'Gadgets'
On the Product table this "selection" returns:

PName       Price   Category  Manufacturer
Gizmo       $19.99  Gadgets   GizmoWorks
Powergizmo  $29.99  Gadgets   GizmoWorks

Simple SQL Query
    SELECT PName, Price, Manufacturer
    FROM Product
    WHERE Price > '$100'
On the Product table this "selection" and "projection" returns:

PName        Price    Manufacturer
SingleTouch  $149.99  Canon
MultiTouch   $203.99  Hitachi
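A minimal runnable sketch of the two queries, using Python's built-in sqlite3 module as a stand-in for the course's Postgres or SQL Server. One assumption to note: prices are stored as plain numbers (REAL) rather than as a MONEY type, so the comparison is written price > 100 instead of price > '$100'. The ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Product (
    pname TEXT PRIMARY KEY, price REAL, category TEXT, manufacturer TEXT)""")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
])

# Selection: keep whole rows that satisfy the condition.
selection = conn.execute(
    "SELECT * FROM Product WHERE category = 'Gadgets' ORDER BY pname").fetchall()

# Selection and projection: filter rows, then keep only three columns.
projection = conn.execute(
    "SELECT pname, price, manufacturer FROM Product "
    "WHERE price > 100 ORDER BY pname").fetchall()
```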

Details
Case insensitive:
– SELECT = Select = select
– Product = product
– BUT: 'Seattle' ≠ 'seattle'
Constants:
– 'abc' - yes
– "abc" - no

Eliminating Duplicates
    SELECT DISTINCT category
    FROM Product
returns each category once: Gadgets, Photography, Household.
Compare to:
    SELECT category
    FROM Product
which keeps the duplicate: Gadgets, Gadgets, Photography, Household.

Ordering the Results
    SELECT pname, price, manufacturer
    FROM Product
    WHERE category = 'Gadgets' AND price > '$10'
    ORDER BY price, pname
Ties are broken by the second attribute on the ORDER BY list.
Ordering is ascending, unless you specify the DESC keyword.

What do these queries return on the Product table?
    SELECT Category FROM Product ORDER BY PName                  ?
    SELECT DISTINCT category FROM Product ORDER BY category      ?
    SELECT DISTINCT category FROM Product ORDER BY PName         ?

Keys and Foreign Keys
Product (key: PName; foreign key: Manufacturer, referring to Company.CName)

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company (key: CName)

CName       Country
GizmoWorks  USA
Canon       Japan
Hitachi     Japan

Joins
Product (PName, Price, Category, Manufacturer)
Company (CName, Country)
Find all products under $200 manufactured in Japan; return their names and prices.
    SELECT PName, Price
    FROM Product, Company
    WHERE Manufacturer = CName AND Country = 'Japan' AND Price <= '$200'
The condition Manufacturer = CName is the join between Product and Company.

Joins
On the Product and Company tables, the query returns:

PName        Price
SingleTouch  $149.99
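The join can be checked against the slide's data with a short sketch. It assumes SQLite (via Python's sqlite3) instead of the course servers, and numeric prices instead of MONEY values, so the price condition becomes price <= 200.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT PRIMARY KEY, country TEXT);
CREATE TABLE Product (
    pname TEXT PRIMARY KEY, price REAL, category TEXT,
    manufacturer TEXT REFERENCES Company(cname));
INSERT INTO Company VALUES
    ('GizmoWorks', 'USA'), ('Canon', 'Japan'), ('Hitachi', 'Japan');
INSERT INTO Product VALUES
    ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
    ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
    ('SingleTouch', 149.99, 'Photography', 'Canon'),
    ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

# Join on manufacturer = cname, then filter by country and price.
japan_under_200 = conn.execute("""
    SELECT pname, price
    FROM Product, Company
    WHERE manufacturer = cname AND country = 'Japan' AND price <= 200
""").fetchall()
```

MultiTouch is also made in Japan, but its price of 203.99 fails the price condition, so only SingleTouch is returned.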

Tuple Variables
Product (PName, Price, Category, Manufacturer)
Company (CName, Country)
Person (name, Country, Worksfor)
    SELECT DISTINCT name, country
    FROM Person, Company
    WHERE worksfor = cname
Which country? Qualify the attributes with relation names:
    SELECT DISTINCT Person.name, Company.country
    FROM Person, Company
    WHERE Person.worksfor = Company.cname
or use tuple variables:
    SELECT DISTINCT x.name, y.country
    FROM Person AS x, Company AS y
    WHERE x.worksfor = y.cname

In Class
Product (pname, price, category, manufacturer)
Company (cname, country)
Find all Chinese companies that manufacture products in the 'toy' category
    SELECT cname
    FROM
    WHERE

In Class
Product (pname, price, category, manufacturer)
Company (cname, country)
Find all Chinese companies that manufacture products both in the 'electronic' and 'toy' categories
    SELECT cname
    FROM
    WHERE

The Nested Loop Semantics of SQL Queries
    SELECT a1, a2, …, ak
    FROM R1 AS x1, R2 AS x2, …, Rn AS xn
    WHERE Conditions

    Answer = {}
    for x1 in R1 do
      for x2 in R2 do
        …..
          for xn in Rn do
            if Conditions
              then Answer = Answer ∪ {(a1,…,ak)}
    return Answer
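The nested-loop semantics can be written out as a few lines of ordinary Python. This is only a sketch of the definition, not how a DBMS evaluates queries: the function name evaluate and the tuple layouts are assumptions, itertools.product plays the role of the n nested for-loops, and the answer is kept as a list so that duplicates survive, as they do in SQL without DISTINCT.

```python
from itertools import product as cross_product

def evaluate(relations, condition, select):
    """SELECT select(...) FROM relations WHERE condition(...), by brute force."""
    answer = []
    for xs in cross_product(*relations):   # one tuple variable per relation
        if condition(*xs):
            answer.append(select(*xs))
    return answer

# Tuples follow Product(pname, price, category, manufacturer)
# and Company(cname, country).
Product = [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
]
Company = [("GizmoWorks", "USA"), ("Canon", "Japan"), ("Hitachi", "Japan")]

answer = evaluate(
    [Product, Company],
    condition=lambda p, c: p[3] == c[0] and c[1] == "Japan" and p[1] <= 200,
    select=lambda p, c: (p[0], p[1]),
)

# An empty relation in FROM makes the whole loop nest run zero times.
empty = evaluate([Product, []], lambda p, c: True, lambda p, c: p[0])
```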

Using the Formal Semantics
What do these queries compute?
    SELECT DISTINCT R.A
    FROM R, S
    WHERE R.A = S.A
Returns R ∩ S.
    SELECT DISTINCT R.A
    FROM R, S, T
    WHERE R.A = S.A OR R.A = T.A
If S ≠ ∅ and T ≠ ∅ then returns R ∩ (S ∪ T), else returns ∅.
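The surprising empty-table case is easy to reproduce. The sketch below assumes SQLite through Python's sqlite3 module and single-column integer tables R, S and T; the helper name run is hypothetical, and ORDER BY is added only for a predictable result order.

```python
import sqlite3

def run(r, s, t):
    """Load three one-column tables and run the three-way query."""
    conn = sqlite3.connect(":memory:")
    for name, values in (("R", r), ("S", s), ("T", t)):
        conn.execute(f"CREATE TABLE {name} (A INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(v,) for v in values])
    query = """SELECT DISTINCT R.A FROM R, S, T
               WHERE R.A = S.A OR R.A = T.A ORDER BY R.A"""
    return [row[0] for row in conn.execute(query)]

both_nonempty = run([1, 2, 3], [2], [3])   # R ∩ (S ∪ T)
t_empty = run([1, 2, 3], [2], [])          # the cross product is empty
```

With T empty, the value 2 is lost even though it appears in both R and S, because the nested loop over T never executes.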

Aggregation
Product (pname, price, category, manufacturer)
Company (cname, country)
    SELECT count(*)
    FROM Product

    SELECT sum(price)
    FROM Product
    WHERE manufacturer = 'GizmoWorks'
SQL supports several aggregation operations: sum, count, min, max, avg.
Except count, all aggregations apply to a single attribute.

Aggregation: Count
COUNT applies to duplicates, unless otherwise stated:
    SELECT count(category)
    FROM Product
    WHERE price > '$20'
If category has no nulls, then count(category) = count(*).
We probably want:
    SELECT count(DISTINCT category)
    FROM Product
    WHERE price > '$20'

Grouping and Aggregation
Product (pname, price, category, manufacturer)
Company (cname, country)
For each manufacturer, find the total number of its products under $200.
    SELECT manufacturer, count(*) AS total
    FROM Product
    WHERE price < '$200'
    GROUP BY manufacturer
Let's see what this means…

Grouping and Aggregation
1. Compute the FROM and WHERE clauses.
2. Group by the attributes in the GROUP BY.
3. Compute the SELECT clause, including aggregates.

1&2. FROM-WHERE-GROUP BY
Rows of Product with price < $200, grouped by manufacturer:

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon

3. SELECT

count(*)  Manufacturer
2         GizmoWorks
1         Canon
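The grouping query can be run as written, with two assumptions: SQLite (through Python's sqlite3) stands in for the course servers, and prices are stored as numbers so the filter reads price < 200. ORDER BY is added only to fix the row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Product (
    pname TEXT PRIMARY KEY, price REAL, category TEXT, manufacturer TEXT)""")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
])

# WHERE removes MultiTouch before grouping, so Hitachi forms no group at all.
totals = conn.execute("""
    SELECT manufacturer, count(*) AS total
    FROM Product
    WHERE price < 200
    GROUP BY manufacturer
    ORDER BY manufacturer
""").fetchall()
```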

HAVING Clause
Product (pname, price, category, manufacturer)
Company (cname, country)
    SELECT manufacturer, count(*) AS total
    FROM Product
    WHERE price < '$200'
    GROUP BY manufacturer
    HAVING min(price) > '$20'
Same query, except that we return only those manufacturers that make only products with price > $20.
The HAVING clause contains conditions on aggregates.

General form of Grouping and Aggregation
    SELECT S
    FROM R1,…,Rn
    WHERE C1
    GROUP BY a1,…,ak
    HAVING C2
S may contain attributes a1,…,ak and/or any aggregates, but NO OTHER ATTRIBUTES. Why?
C1 is any condition on the attributes in R1,…,Rn.
C2 is any condition on aggregate expressions.

General form of Grouping and Aggregation
Evaluation steps:
1. Evaluate FROM-WHERE, apply condition C1
2. Group by the attributes a1,…,ak
3. Apply condition C2 to each group (may have aggregates)
4. Compute aggregates in S and return the result
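The four evaluation steps can be traced by hand in plain Python for the HAVING query from the earlier slide (C1 is price < 200, C2 is min(price) > 20). This is a sketch of the conceptual order only; the dictionary layout and variable names are assumptions, and a real query engine is free to use a different physical plan.

```python
from collections import defaultdict

product = [
    {"pname": "Gizmo", "price": 19.99, "manufacturer": "GizmoWorks"},
    {"pname": "Powergizmo", "price": 29.99, "manufacturer": "GizmoWorks"},
    {"pname": "SingleTouch", "price": 149.99, "manufacturer": "Canon"},
    {"pname": "MultiTouch", "price": 203.99, "manufacturer": "Hitachi"},
]

# Step 1: FROM-WHERE, apply condition C1.
rows = [r for r in product if r["price"] < 200]

# Step 2: group by the GROUP BY attribute.
groups = defaultdict(list)
for r in rows:
    groups[r["manufacturer"]].append(r)

# Step 3: apply condition C2 to each group.
kept = {m: g for m, g in groups.items()
        if min(r["price"] for r in g) > 20}

# Step 4: compute the aggregates in S, here count(*).
result = sorted((m, len(g)) for m, g in kept.items())
```

GizmoWorks passes step 1 with two products but is dropped in step 3, because its cheapest product costs 19.99.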

NULLs in SQL
Whenever we don't have a value, we can put a NULL.
It can mean many things:
– Value does not exist
– Value exists but is unknown
– Value not applicable
– Etc.
The schema specifies for each attribute whether it can be null (nullable attribute) or not.
How does SQL cope with tables that have NULLs?

Null Values
If x = NULL then 4*(3-x)/7 is still NULL
If x = NULL then x = 'Joe' is UNKNOWN
In SQL there are three boolean values:
– FALSE = 0
– UNKNOWN = 0.5
– TRUE = 1

Null Values
C1 AND C2 = min(C1, C2)
C1 OR C2 = max(C1, C2)
NOT C1 = 1 – C1
    SELECT *
    FROM Person
    WHERE (age < 25) AND (height > 6 OR weight > 190)
E.g. age = 20, height = NULL, weight = 200
Rule in SQL: include only tuples that yield TRUE.
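The min/max encoding of three-valued logic is short enough to execute directly. The sketch below is an illustration of the rules on the slide, with None standing in for NULL; the helper names compare, and3, or3, not3 and where are made up for this example.

```python
import operator

FALSE, UNKNOWN, TRUE = 0.0, 0.5, 1.0

def compare(x, op, y):
    """A comparison involving NULL (None) is UNKNOWN."""
    if x is None or y is None:
        return UNKNOWN
    return TRUE if op(x, y) else FALSE

def and3(a, b):
    return min(a, b)

def or3(a, b):
    return max(a, b)

def not3(a):
    return 1.0 - a

def where(age, height, weight):
    """(age < 25) AND (height > 6 OR weight > 190)"""
    return and3(compare(age, operator.lt, 25),
                or3(compare(height, operator.gt, 6),
                    compare(weight, operator.gt, 190)))

slide_example = where(20, None, 200)   # UNKNOWN OR TRUE is TRUE
lighter_person = where(20, None, 180)  # UNKNOWN OR FALSE stays UNKNOWN
```

The slide's tuple is returned because the OR already yields TRUE from the weight test; the second tuple evaluates to UNKNOWN and is therefore left out of the result.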

Null Values
Unexpected behavior:
    SELECT *
    FROM Person
    WHERE age < 25 OR age >= 25
Some Persons are not included!

Null Values
Can test for NULL explicitly:
– x IS NULL
– x IS NOT NULL
    SELECT *
    FROM Person
    WHERE age < 25 OR age >= 25 OR age IS NULL
Now it includes all Persons.
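A minimal sketch of how a NULL age drops out of a WHERE clause that looks like it should match everyone, and how IS NULL brings the row back. It assumes SQLite through Python's sqlite3 module; the Person rows and the helper names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [("Ann", 20), ("Bob", 30), ("Cat", None)])  # None becomes NULL

def names(where):
    sql = f"SELECT name FROM Person WHERE {where} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

# Both comparisons are UNKNOWN for Cat, so the OR is UNKNOWN and Cat is dropped.
without_null_test = names("age < 25 OR age >= 25")

# The explicit test is TRUE for Cat, so every Person is returned.
with_null_test = names("age < 25 OR age >= 25 OR age IS NULL")
```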

Outerjoins
Product (pname, price, category, manufacturer)
Company (cname, country)
Normally, joins are "inner joins":
    SELECT x.country, y.pname
    FROM Company x JOIN Product y ON x.cname = y.manufacturer
Same as:
    SELECT x.country, y.pname
    FROM Company x, Product y
    WHERE x.cname = y.manufacturer
But countries whose companies don't manufacture anything will not be listed!

Outerjoins
If we want to see the companies that don't produce anything, then we use an outer join:
    SELECT x.country, y.pname
    FROM Company x LEFT OUTER JOIN Product y ON x.cname = y.manufacturer

Product

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company

Cname       Country
GizmoWorks  USA
Canon       Japan
Hitachi     Japan
MuseumPass  Vatican

Result of the left outer join

Country  PName
USA      Gizmo
USA      Powergizmo
Japan    SingleTouch
Japan    MultiTouch
Vatican  NULL

Application
Product (pname, price, category, manufacturer)
Company (cname, country)
Compute the total number of products made by each country
    SELECT x.country, count(*)
    FROM Company x, Product y
    WHERE x.cname = y.manufacturer
    GROUP BY x.country
What's wrong?

Application
Compute the total number of products made by each country
    SELECT x.country, count(y.pname)
    FROM Company x LEFT OUTER JOIN Product y ON x.cname = y.manufacturer
    GROUP BY x.country
Now we also get the countries that make 0 products.
Note: we don't use count(*). WHY?
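The difference between the inner join, count(y.pname) and count(*) shows up clearly on the four-company example. The sketch assumes SQLite via Python's sqlite3 module and numeric prices; ORDER BY is added for a stable row order, and the helper name per_country is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT PRIMARY KEY, country TEXT);
CREATE TABLE Product (
    pname TEXT PRIMARY KEY, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Company VALUES ('GizmoWorks', 'USA'), ('Canon', 'Japan'),
    ('Hitachi', 'Japan'), ('MuseumPass', 'Vatican');
INSERT INTO Product VALUES
    ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
    ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
    ('SingleTouch', 149.99, 'Photography', 'Canon'),
    ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

def per_country(aggregate, join):
    sql = f"""SELECT x.country, {aggregate}
              FROM Company x {join} Product y ON x.cname = y.manufacturer
              GROUP BY x.country ORDER BY x.country"""
    return conn.execute(sql).fetchall()

inner = per_country("count(*)", "JOIN")                       # Vatican missing
outer_pname = per_country("count(y.pname)", "LEFT OUTER JOIN")  # Vatican: 0
outer_star = per_country("count(*)", "LEFT OUTER JOIN")         # Vatican: 1
```

count(*) counts the padded row that the outer join creates for MuseumPass, while count(y.pname) skips it because y.pname is NULL there.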

Outer Joins
Left outer join:
– Include the left tuple even if there's no match
Right outer join:
– Include the right tuple even if there's no match
Full outer join:
– Include both the left and right tuples even if there's no match

Subqueries
A subquery is another SQL query nested inside a larger query.
Such inner-outer queries are called nested queries.
A subquery may occur in:
1. A SELECT clause
2. A FROM clause
3. A WHERE clause
Rule of thumb: avoid writing nested queries when possible; sometimes it's impossible to avoid them.

1. Subqueries in SELECT
Product (pname, price, category, manufacturer)
Company (cname, country)
For each product return the country that manufactures it
    SELECT X.pname, (SELECT Y.country
                     FROM Company Y
                     WHERE Y.cname = X.manufacturer)
    FROM Product X
What happens if a subquery returns more than one country?

1. Subqueries in SELECT
Whenever possible, don't use nested queries. The query above is the same as:
    SELECT pname, country
    FROM Product, Company
    WHERE cname = manufacturer
We have "unnested" the query.

1. Subqueries in SELECT
Compute the number of products made by each country
    SELECT DISTINCT x.country, (SELECT count(*)
                                FROM Company y, Product
                                WHERE y.cname = manufacturer
                                  AND y.country = x.country)
    FROM Company x
Better: we can unnest by using a GROUP BY
    SELECT x.country, count(*)
    FROM Company x, Product z
    WHERE x.cname = z.manufacturer
    GROUP BY x.country

GROUP BY vs. Nested Queries
    SELECT manufacturer, count(*) AS total
    FROM Product
    WHERE price < '$200'
    GROUP BY manufacturer

    SELECT DISTINCT x.manufacturer, (SELECT count(*)
                                     FROM Product y
                                     WHERE x.manufacturer = y.manufacturer
                                       AND price < '$200') AS total
    FROM Product x
    WHERE price < '$200'
Why does the condition price < '$200' appear twice?

2. Subqueries in FROM
Product (pname, price, category, manufacturer)
Company (cname, country)
Find all products whose price is > 20 and < 30
    SELECT *
    FROM (SELECT * FROM Product AS Y WHERE Y.price > '$20') AS x
    WHERE x.price < '$30'
Unnest this query!

3. Subqueries in WHERE: existential quantifiers
Product (pname, price, category, manufacturer)
Company (cname, country)
Find all countries that make some products with price < 100
Using EXISTS:
    SELECT DISTINCT x.country
    FROM Company x
    WHERE EXISTS (SELECT *
                  FROM Product y
                  WHERE y.manufacturer = x.cname AND y.price < '$100')
Correlated subquery: it uses x from the outer query.

3. Subqueries in WHERE: existential quantifiers
Find all countries that make some products with price < 100
Predicate Calculus (a.k.a. First Order Logic):
    { y | ∃x. Company(x,y) ∧ (∃z. ∃p. ∃c. Product(z,p,c,x) ∧ p < 100) }

3. Subqueries in WHERE: existential quantifiers
Find all countries that make some products with price < 100
Using IN:
    SELECT DISTINCT country
    FROM Company
    WHERE cname IN (SELECT Product.manufacturer
                    FROM Product
                    WHERE Product.price < '$100')
De-correlated subquery.

3. Subqueries in WHERE: existential quantifiers
Find all countries that make some products with price < 100
Using ANY:
    SELECT DISTINCT Company.country
    FROM Company
    WHERE '$100' > ANY (SELECT price
                        FROM Product
                        WHERE manufacturer = cname)

3. Subqueries in WHERE: existential quantifiers
Find all countries that make some products with price < 100
Now let's unnest it:
    SELECT DISTINCT x.country
    FROM Company x, Product y
    WHERE x.cname = y.manufacturer AND y.price < '$100'
Existential quantifiers are easy!
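That the EXISTS, IN and unnested join formulations agree can be checked on the example data. The sketch assumes SQLite through Python's sqlite3 module and numeric prices; the ANY form is left out because SQLite does not support ANY over a subquery. The helper name countries is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT PRIMARY KEY, country TEXT);
CREATE TABLE Product (
    pname TEXT PRIMARY KEY, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Company VALUES ('GizmoWorks', 'USA'), ('Canon', 'Japan'),
    ('Hitachi', 'Japan'), ('MuseumPass', 'Vatican');
INSERT INTO Product VALUES
    ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
    ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
    ('SingleTouch', 149.99, 'Photography', 'Canon'),
    ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

def countries(sql):
    return [row[0] for row in conn.execute(sql)]

with_exists = countries("""
    SELECT DISTINCT x.country FROM Company x
    WHERE EXISTS (SELECT * FROM Product y
                  WHERE y.manufacturer = x.cname AND y.price < 100)
    ORDER BY x.country""")

with_in = countries("""
    SELECT DISTINCT country FROM Company
    WHERE cname IN (SELECT manufacturer FROM Product WHERE price < 100)
    ORDER BY country""")

unnested = countries("""
    SELECT DISTINCT x.country FROM Company x, Product y
    WHERE x.cname = y.manufacturer AND y.price < 100
    ORDER BY x.country""")
```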

81 3. Subqueries in WHERE
Universal quantifiers
Find the countries of all companies that make only products with price < 100.
Universal quantifiers are hard!
Schema: Product (pname, price, category, manufacturer), Company (cname, country)

82 3. Subqueries in WHERE
Universal quantifiers
Find the countries of all companies that make only products with price < 100.
Predicate Calculus (a.k.a. First Order Logic):
    { y | ∃x. Company(x,y) ∧ (∀z. ∀p. ∀c. Product(z,p,c,x) ⇒ p < 100) }
Schema: Product (pname, price, category, manufacturer), Company (cname, country)

83 3. Subqueries in WHERE
    { y | ∃x. Company(x,y) ∧ (∀z. ∀p. ∀c. Product(z,p,c,x) ⇒ p < 100) }
  = { y | ∃x. Company(x,y) ∧ ¬(∃z. ∃p. ∃c. Product(z,p,c,x) ∧ p ≥ 100) }
  = { y | ∃x. Company(x,y) } − { y | ∃x. Company(x,y) ∧ (∃z. ∃p. ∃c. Product(z,p,c,x) ∧ p ≥ 100) }
De Morgan’s Laws:
    ¬(A ∧ B) = ¬A ∨ ¬B
    ¬(A ∨ B) = ¬A ∧ ¬B
    ¬∀x. P(x) = ∃x. ¬P(x)
    ¬∃x. P(x) = ∀x. ¬P(x)
Negated implication:
    ¬(A ⇒ B) = A ∧ ¬B

84 3. Subqueries in WHERE
1. Find the other companies, i.e. those s.t. some product has price ≥ 100:
    SELECT DISTINCT country
    FROM Company
    WHERE cname IN (SELECT manufacturer
                    FROM Product
                    WHERE price >= 100)
2. Find all companies s.t. all their products have price < 100:
    SELECT DISTINCT country
    FROM Company
    WHERE cname NOT IN (SELECT manufacturer
                        FROM Product
                        WHERE price >= 100)

85 3. Subqueries in WHERE
Universal quantifiers
Find the countries of all companies that make only products with price < 100.
Using EXISTS:
    SELECT DISTINCT x.country
    FROM Company x
    WHERE NOT EXISTS (SELECT *
                      FROM Product y
                      WHERE y.manufacturer = x.cname AND y.price >= 100)
Schema: Product (pname, price, category, manufacturer), Company (cname, country)

86 3. Subqueries in WHERE
Universal quantifiers
Find the countries of all companies that make only products with price < 100.
Using ALL:
    SELECT DISTINCT Company.country
    FROM Company
    WHERE 100 > ALL (SELECT price
                     FROM Product
                     WHERE manufacturer = cname)
Schema: Product (pname, price, category, manufacturer), Company (cname, country)
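
The universal query can be tried out in the same way. This is a minimal sketch with Python's built-in sqlite3 module; the rows, including the company 'Acme' that makes no products, are hypothetical. Only the NOT EXISTS and NOT IN forms are shown, since SQLite does not accept ALL.

```python
import sqlite3

# Hypothetical sample data; 'Acme' deliberately has no products.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT, country TEXT);
CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Company VALUES
    ('GizmoWorks', 'USA'), ('Canon', 'Japan'), ('Acme', 'Canada');
INSERT INTO Product VALUES
    ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
    ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
    ('SingleTouch', 149.99, 'Photography', 'Canon');
""")


def countries(sql):
    """Run a one-column query and return its values as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# "All products cost < 100" rewritten as "no product costs >= 100".
with_not_exists = countries("""
    SELECT DISTINCT x.country FROM Company x
    WHERE NOT EXISTS (SELECT * FROM Product y
                      WHERE y.manufacturer = x.cname AND y.price >= 100)
""")

with_not_in = countries("""
    SELECT DISTINCT country FROM Company
    WHERE cname NOT IN (SELECT manufacturer FROM Product WHERE price >= 100)
""")

print(with_not_exists, with_not_in)
```

Note that a company with no products at all satisfies the condition vacuously, so its country appears in the answer.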

87 Question for Database Fans and their Friends
Can we unnest this query?
Find the countries of all companies that make only products with price < 100.

88 Monotone Queries
A query Q is monotone if:
– Whenever we add tuples to one or more of the tables…
– … the answer to the query cannot contain fewer tuples
Fact: all unnested queries are monotone
– Proof: using the “nested for loops” semantics
Fact: a query with a universal quantifier is not monotone
Consequence: we cannot unnest a query with a universal quantifier

Queries that must be nested
Rule of Thumb: Non-monotone queries cannot be unnested. In particular, queries with a universal quantifier cannot be unnested.
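
The monotonicity argument can be observed directly by adding one tuple and re-running both kinds of query. The sketch below uses Python's built-in sqlite3 module; the rows and the product name 'Luxgizmo' are hypothetical.

```python
import sqlite3

# Hypothetical one-company instance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT, country TEXT);
CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Company VALUES ('GizmoWorks', 'USA');
INSERT INTO Product VALUES ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks');
""")

# Existential query, unnested: countries making SOME product under 100.
SOME_CHEAP = """
    SELECT DISTINCT x.country FROM Company x, Product y
    WHERE x.cname = y.manufacturer AND y.price < 100
"""

# Universal query: countries of companies making ONLY products under 100.
ONLY_CHEAP = """
    SELECT DISTINCT x.country FROM Company x
    WHERE NOT EXISTS (SELECT * FROM Product y
                      WHERE y.manufacturer = x.cname AND y.price >= 100)
"""


def countries(sql):
    """Run a one-column query and return its values as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


before_some, before_only = countries(SOME_CHEAP), countries(ONLY_CHEAP)

# Add a single tuple: an expensive product by the same company.
conn.execute(
    "INSERT INTO Product VALUES ('Luxgizmo', 500.0, 'Gadgets', 'GizmoWorks')"
)

after_some, after_only = countries(SOME_CHEAP), countries(ONLY_CHEAP)

print("some:", before_some, "->", after_some)
print("only:", before_only, "->", after_only)
```

The existential answer keeps every tuple it had, while the universal answer loses one after the insertion, so no monotone (unnested) query can compute it.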

More SQL
Read about the following commands in the book:
– CREATE TABLE
– INSERT
– DELETE
– UPDATE
They are easy, but we use them all the time in class and in the homework assignments.

91 Advanced SQLizing
1. Unnesting Aggregates
2. Finding witnesses

Unnesting Aggregates
For each category, find the maximum price.
Nested:
    SELECT DISTINCT X.category,
           (SELECT max(Y.price)
            FROM Product Y
            WHERE X.category = Y.category)
    FROM Product X
Unnested:
    SELECT category, max(price)
    FROM Product
    GROUP BY category
Equivalent queries.
Note: no need for DISTINCT in the second query (DISTINCT is the same as GROUP BY).
Schema: Product (pname, price, category, manufacturer), Company (cname, country)
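
As a quick check of the claimed equivalence, both queries can be run against a small table. This is a minimal sketch with Python's built-in sqlite3 module; the product rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Product VALUES
    ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
    ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
    ('SingleTouch', 149.99, 'Photography', 'Canon'),
    ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

# Nested version: one correlated aggregate subquery per outer row.
nested = sorted(conn.execute("""
    SELECT DISTINCT X.category,
           (SELECT max(Y.price) FROM Product Y WHERE X.category = Y.category)
    FROM Product X
""").fetchall())

# Unnested version: a single GROUP BY.
grouped = sorted(conn.execute("""
    SELECT category, max(price) FROM Product GROUP BY category
""").fetchall())

print(nested)
print(grouped)
```

Every category in the output comes from at least one Product row, so no group can be empty; that is why the two forms agree here.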

Unnesting Aggregates
Find the number of products made in each country.
Nested:
    SELECT DISTINCT X.country,
           (SELECT count(*)
            FROM Company Y, Product Z
            WHERE Y.cname = Z.manufacturer AND Y.country = X.country)
    FROM Company X
Unnested:
    SELECT X.country, count(*)
    FROM Company X, Product Y
    WHERE X.cname = Y.manufacturer
    GROUP BY X.country
They are NOT equivalent! (WHY?)
Schema: Product (pname, price, category, manufacturer), Company (cname, country)
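
One way to see the difference is to include a country whose companies make no products. The sketch below uses Python's built-in sqlite3 module; the rows (in particular 'Acme' in 'Canada') are hypothetical, and the LEFT OUTER JOIN variant at the end is one possible repair that is not taken from the slides.

```python
import sqlite3

# Hypothetical sample data; 'Acme' deliberately has no products.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT, country TEXT);
CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Company VALUES
    ('GizmoWorks', 'USA'), ('Canon', 'Japan'), ('Acme', 'Canada');
INSERT INTO Product VALUES
    ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
    ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
    ('SingleTouch', 149.99, 'Photography', 'Canon');
""")

# Nested version: every country appears, possibly with count 0.
nested = sorted(conn.execute("""
    SELECT DISTINCT X.country,
           (SELECT count(*) FROM Company Y, Product Z
            WHERE Y.cname = Z.manufacturer AND Y.country = X.country)
    FROM Company X
""").fetchall())

# Unnested version: the join drops countries without any product.
grouped = sorted(conn.execute("""
    SELECT X.country, count(*)
    FROM Company X, Product Y
    WHERE X.cname = Y.manufacturer
    GROUP BY X.country
""").fetchall())

# Possible repair (an assumption, not from the slides): outer join and
# count a Product column, so unmatched companies contribute 0.
outer = sorted(conn.execute("""
    SELECT X.country, count(Y.pname)
    FROM Company X LEFT OUTER JOIN Product Y ON X.cname = Y.manufacturer
    GROUP BY X.country
""").fetchall())

print(nested)
print(grouped)
print(outer)
```

On this instance the nested query reports Canada with a count of 0, while the inner-join query omits Canada entirely.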

94 More Unnesting
Find all authors who wrote at least 10 documents.
Attempt 1: with nested queries
    SELECT DISTINCT Author.name
    FROM Author
    WHERE (SELECT count(Wrote.url)
           FROM Wrote
           WHERE Author.login = Wrote.login) >= 10
This is SQL by a novice.
Schema: Author (login, name), Wrote (login, url)

95 More Unnesting
Find all authors who wrote at least 10 documents.
Attempt 2: SQL style (with GROUP BY)
    SELECT DISTINCT Author.name
    FROM Author, Wrote
    WHERE Author.login = Wrote.login
    GROUP BY Author.name
    HAVING count(Wrote.url) >= 10
This is SQL by an expert.
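
Both attempts can be compared on a toy instance. This is a minimal sketch with Python's built-in sqlite3 module; the authors, logins, and example.org URLs are hypothetical, and the threshold follows the "at least 10" wording of the task.

```python
import sqlite3

# Hypothetical data: 'alice' wrote 10 documents, 'bob' wrote 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Author (login TEXT, name TEXT);
CREATE TABLE Wrote (login TEXT, url TEXT);
INSERT INTO Author VALUES ('alice', 'Alice'), ('bob', 'Bob');
""")
rows = [("alice", f"http://example.org/a{i}") for i in range(10)]
rows += [("bob", f"http://example.org/b{i}") for i in range(3)]
conn.executemany("INSERT INTO Wrote VALUES (?, ?)", rows)


def names(sql):
    """Run a one-column query and return its values as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# Nested: a correlated scalar subquery counts documents per author.
nested = names("""
    SELECT DISTINCT Author.name FROM Author
    WHERE (SELECT count(Wrote.url) FROM Wrote
           WHERE Author.login = Wrote.login) >= 10
""")

# Unnested: join, group, and filter the groups with HAVING.
grouped = names("""
    SELECT Author.name FROM Author, Wrote
    WHERE Author.login = Wrote.login
    GROUP BY Author.name
    HAVING count(Wrote.url) >= 10
""")

print(nested, grouped)
```

HAVING filters whole groups after aggregation, whereas WHERE filters individual rows before grouping, which is why the count condition belongs in HAVING.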

96 Finding Witnesses
For each country, find its most expensive products.
Schema: Product (pname, price, category, manufacturer), Company (cname, country)

Finding Witnesses
For each country, find its most expensive products.
Finding the maximum price is easy…
    SELECT x.country, max(y.price)
    FROM Company x, Product y
    WHERE x.cname = y.manufacturer
    GROUP BY x.country
But we need the witnesses, i.e. the products with max price.
Schema: Product (pname, price, category, manufacturer), Company (cname, country)

98 Finding Witnesses
To find the witnesses, compute the maximum price in a subquery:
    SELECT u.country, v.pname, v.price
    FROM Company u, Product v,
         (SELECT x.country, max(y.price) AS mprice
          FROM Company x, Product y
          WHERE x.cname = y.manufacturer
          GROUP BY x.country) AS p
    WHERE u.cname = v.manufacturer
      AND u.country = p.country
      AND v.price = p.mprice
The condition u.cname = v.manufacturer ties each product to its own company, so a product is only compared against the maximum of its own country.

99 Finding Witnesses
There is a more concise solution here:
    SELECT x.country, y.pname, y.price
    FROM Company x, Product y
    WHERE x.cname = y.manufacturer
      AND y.price >= ALL (SELECT z.price
                          FROM Company w, Product z
                          WHERE w.cname = z.manufacturer
                            AND w.country = x.country)
The subquery ranges over all products made in x's country, so y is a witness for the country and not just for its own company.
Schema: Product (pname, price, category, manufacturer), Company (cname, country)
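
The witness queries can be exercised on an instance where one country has two companies. The sketch below uses Python's built-in sqlite3 module with hypothetical rows. Because SQLite does not accept ALL, the concise form is written with NOT EXISTS ("no product of the same country is more expensive"), which is a substitute formulation and not the slide's exact query.

```python
import sqlite3

# Hypothetical sample data; Japan has two companies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT, country TEXT);
CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Company VALUES
    ('GizmoWorks', 'USA'), ('Canon', 'Japan'), ('Hitachi', 'Japan');
INSERT INTO Product VALUES
    ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
    ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
    ('SingleTouch', 149.99, 'Photography', 'Canon'),
    ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

# Subquery in FROM: compute each country's maximum price, then join back.
via_subquery = sorted(conn.execute("""
    SELECT u.country, v.pname, v.price
    FROM Company u, Product v,
         (SELECT x.country, max(y.price) AS mprice
          FROM Company x, Product y
          WHERE x.cname = y.manufacturer
          GROUP BY x.country) AS p
    WHERE u.cname = v.manufacturer
      AND u.country = p.country
      AND v.price = p.mprice
""").fetchall())

# Correlated form: keep y if no product from the same country costs more.
via_not_exists = sorted(conn.execute("""
    SELECT x.country, y.pname, y.price
    FROM Company x, Product y
    WHERE x.cname = y.manufacturer
      AND NOT EXISTS (SELECT * FROM Company w, Product z
                      WHERE w.cname = z.manufacturer
                        AND w.country = x.country
                        AND z.price > y.price)
""").fetchall())

print(via_subquery)
print(via_not_exists)
```

Canon's SingleTouch is the most expensive product of its company but not of Japan, so it is correctly absent from both results.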