Lecture 7: Subqueries Tarik Booker California State University, Los Angeles.


What we will cover…
Subqueries: What, Why, How
Where to use
Subqueries in the WHERE clause
Subqueries in the HAVING clause
IN / NOT IN
ALL / ANY
Subqueries in the SELECT clause
Nested Subqueries
Correlated Subqueries
EXISTS with Subqueries
Subqueries vs. Joins

Subqueries
There might be cases where a result can't be produced with only one query!
Ex: Find the name of the youngest friend of "Helen X. World."
I can find the latest birthdate (MAX(birthdate)), but not the name of that person.
In this case, use a subquery!
What is a subquery? A query within a query!
Also called an inner query.
It must be surrounded by parentheses!
The query that contains a subquery is called an outer query.
We will come back to the above problem later in the lecture.

Subqueries (2): Example
We can also use subqueries to simplify longer queries.
Ex: List the names of all salespeople that don't represent any members.
What we've done so far:
    SELECT s.firstname, s.lastname
    FROM Salespeople s LEFT JOIN Members m USING (SalesID)
    WHERE m.memberid IS NULL;
Using a subquery:
    SELECT firstname, lastname
    FROM Salespeople
    WHERE salesID NOT IN (SELECT DISTINCT salesID FROM Members);

Subqueries (3): When to use
When to use a subquery?
When it is impossible or very difficult to solve the problem using a single query
When a subquery runs faster than a non-subquery solution (not as necessary with MySQL)
When it is easier to understand than an alternate solution (an outer join or union might be harder)
When you want to use an aggregate function in a WHERE clause (subqueries are mostly used this way)
The subquery executes separately.

Types of Subqueries
Single Value Subqueries: the subquery returns a single value (one column, one row)
List Subqueries: the subquery returns a list (one column, multiple rows)
Table Subqueries: the subquery returns an entire table (multiple columns, multiple rows)

Subqueries in the WHERE Clause
WHERE clause subqueries are the most common.
They filter rows from the outer query using a value (or a list of values) from the inner query.
We can now apply the results of aggregate functions in the WHERE clause!!!
Note: An aggregate function cannot appear directly in the WHERE clause, so a subquery is how its result gets there.
Structure: Outer Query WHERE (Inner Query)

WHERE Clause Subqueries (2)
HOW to do this?
Ex: List all tracks with runtime greater than the average runtime of all tracks.
WRONG Answer:
    SELECT Tracktitle, LengthSeconds
    FROM Tracks t
    WHERE LengthSeconds > AVG(t.Lengthseconds)
Why? The WHERE clause is processed before aggregate functions are computed, so AVG() cannot be evaluated there.
That order is part of the design of SQL.
We have to process the aggregate function before the WHERE clause.
How? Use a subquery!

WHERE Clause Subqueries (3)
Ex: List all tracks with runtime greater than the average runtime of all tracks.
CORRECT Answer:
    SELECT Tracktitle, LengthSeconds
    FROM Tracks
    WHERE LengthSeconds > (SELECT AVG(LengthSeconds) FROM Tracks);
Note: The aggregate function is processed before the WHERE clause, because it's in an entirely different query (a subquery).
The inner query doesn't actually filter anything out; it just supplies a value (in this case the average) to the WHERE clause in the outer query.
The inner query runs first and returns a value, then the outer query runs.
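The wrong and correct answers above can be tried out in a minimal sketch. It assumes Python's built-in sqlite3 module in place of MySQL, and a made-up three-row Tracks table (the titles and lengths are illustrative, not from the Lyric database).

```python
import sqlite3

# Hypothetical stand-in for the Tracks table; SQLite is used here, not MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tracks (TrackTitle TEXT, LengthSeconds INTEGER)")
conn.executemany(
    "INSERT INTO Tracks VALUES (?, ?)",
    [("Intro", 60), ("Ballad", 240), ("Anthem", 300)],
)

# An aggregate placed directly in WHERE is rejected by the database.
try:
    conn.execute(
        "SELECT TrackTitle FROM Tracks WHERE LengthSeconds > AVG(LengthSeconds)"
    )
    aggregate_in_where_failed = False
except sqlite3.OperationalError:
    aggregate_in_where_failed = True

# The subquery computes the average (200) and the outer WHERE compares against it.
long_tracks = conn.execute(
    """
    SELECT TrackTitle, LengthSeconds
    FROM Tracks
    WHERE LengthSeconds > (SELECT AVG(LengthSeconds) FROM Tracks)
    ORDER BY TrackTitle
    """
).fetchall()
print(aggregate_in_where_failed, long_tracks)
```

The ORDER BY is only there to make the printed result predictable.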

WHERE Clause Subqueries (4)
The inner query runs first, the outer query runs last.
The inner query's result(s) supply the values the outer query compares against.
Note: The inner query can be any legal query!

WHERE Clause Subqueries (5)
Expanded from the midterm review DB:
Ex: Find the name of the youngest friend(s) of "Helen X. World."
We can only do this with a subquery.
First: Find the query to get the birth date of Helen's youngest friend:
    SELECT MAX(birthday)
    FROM Users JOIN XrefUsersChats USING(chatID) JOIN Chats USING(ChatID)
    WHERE Firstname = 'Helen' AND Lastname = 'World' AND MI = 'X';
Next: Use the resulting (date) value to look up the matching names:
    SELECT Firstname, Lastname, MI
    FROM Users
    WHERE Birthday = (Resulting_Value);
Last: Substitute the inner query for the resulting value:

WHERE Clause Subqueries (6)
Inner query:
    SELECT MAX(birthday)
    FROM Users JOIN XrefUsersChats USING(chatID) JOIN Chats USING(ChatID)
    WHERE Firstname = 'Helen' AND Lastname = 'World' AND MI = 'X';
Outer query:
    SELECT Firstname, Lastname, MI
    FROM Users
    WHERE Birthday = (Resulting_Value);
Complete Query:
    SELECT Firstname, Lastname, MI
    FROM Users
    WHERE Birthday = (
        SELECT MAX(birthday)
        FROM Users JOIN XrefUsersChats USING(chatID) JOIN Chats USING(ChatID)
        WHERE Firstname = 'Helen' AND Lastname = 'World' AND MI = 'X');
Note: This looks hard, but we've just combined two simple queries.

WHERE Clause Subquery Example
Using the Lyric database: List all titles recorded at MakeTrax or Lone Star Recording. Do not use a join and do not hard-code studio IDs.
We will use subqueries.
Outer query:
    SELECT Title FROM Titles
    WHERE StudioID = (X) OR StudioID = (Y);
Inner Query X (collecting the StudioID for MakeTrax):
    SELECT StudioID FROM Studios WHERE studioname = 'MakeTrax';
Inner Query Y (collecting the StudioID for Lone Star Recording):
    SELECT StudioID FROM Studios WHERE studioname = 'Lone Star Recording';

WHERE Clause Subquery Example (2)
Complete Query:
    SELECT Title
    FROM Titles
    WHERE StudioID = (
            SELECT StudioID FROM Studios
            WHERE Studioname = 'MakeTrax')
       OR StudioID = (
            SELECT StudioID FROM Studios
            WHERE Studioname = 'Lone Star Recording');
Note: Indenting helps set subqueries apart, but is not required.
    SELECT Title FROM Titles WHERE StudioID = (SELECT StudioID FROM Studios WHERE Studioname = 'MakeTrax') OR StudioID = (SELECT StudioID FROM Studios WHERE Studioname = 'Lone Star Recording');
This is okay.
What's another way to do this problem (without joins)?
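One possible answer to the closing question is a single list subquery tested with IN (the keyword covered a few slides later). The sketch below is an assumption about the intended answer, not the lecture's stated solution. It runs on Python's built-in sqlite3 module instead of MySQL, with made-up Studios and Titles rows, and checks that both forms return the same titles.

```python
import sqlite3

# Hypothetical miniature of the Lyric tables; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Studios (StudioID INTEGER, StudioName TEXT);
    CREATE TABLE Titles (Title TEXT, StudioID INTEGER);
    INSERT INTO Studios VALUES
        (1, 'MakeTrax'), (2, 'Lone Star Recording'), (3, 'Elsewhere');
    INSERT INTO Titles VALUES ('T1', 1), ('T2', 2), ('T3', 3), ('T4', 1);
    """
)

# Form from the slide: two single-value subqueries combined with OR.
two_subqueries = conn.execute(
    """
    SELECT Title FROM Titles
    WHERE StudioID = (SELECT StudioID FROM Studios WHERE StudioName = 'MakeTrax')
       OR StudioID = (SELECT StudioID FROM Studios
                      WHERE StudioName = 'Lone Star Recording')
    ORDER BY Title
    """
).fetchall()

# Alternative: one list subquery, tested with IN.
one_subquery = conn.execute(
    """
    SELECT Title FROM Titles
    WHERE StudioID IN (SELECT StudioID FROM Studios
                       WHERE StudioName IN ('MakeTrax', 'Lone Star Recording'))
    ORDER BY Title
    """
).fetchall()
print(two_subqueries == one_subquery, one_subquery)
```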

Tips for WHERE Clause Subqueries
Do's:
Do remember to remove semicolons from inner queries.
Do separate inner and outer queries if you are having trouble.
Do check that each query works and gives the correct result before combining them.
Don'ts:
Don't forget the closing parenthesis for each subquery!
Don't forget that an aggregate function can't go directly in the WHERE clause; put it in a subquery!

IN / NOT IN
IN (Keyword)
Tests if an expression matches any item in a list.
List = one column, many rows
Typically used with the result of a (list) subquery.
NOT IN
True if the expression is not in the list.
Syntax:
    expression IN (list_subquery)
    expression NOT IN (list_subquery)

IN / NOT IN Example
Ex: List the names of salespeople that represent members in the USA without using a join.
How to solve?
Get the USA members' SalesIDs, then compare them with the salespeople.
Member information is in the Members table.
Salespeople information is in the Salespeople table.
How to structure the subquery?
Listing salespeople names is the final result, so that should be the outer query.
Therefore, getting the SalesIDs associated with members should be the inner query.
Test whether each salesperson's SalesID is IN the list of SalesIDs returned for the members.

IN / NOT IN Example (2)
Outer query: Get the names for a particular set of SalesIDs.
    SELECT Firstname, Lastname
    FROM Salespeople
    WHERE SalesID IN (salesID_values);
I'm using IN (instead of "=") because there are (possibly) multiple values.
Note: No single quotes around the salesID_values (between the parentheses) when using a subquery.
Inner query: Get the SalesIDs of members from the USA.
    SELECT SalesID
    FROM Members
    WHERE Country = 'USA';
When you combine them, insert the inner query directly between the parentheses.

IN / NOT IN Example (3)
Final Result:
    SELECT Firstname, Lastname
    FROM Salespeople
    WHERE SalesID IN (
        SELECT SalesID
        FROM Members
        WHERE Country = 'USA');
Or (no indentation):
    SELECT Firstname, Lastname FROM Salespeople WHERE SalesID IN (SELECT SalesID FROM Members WHERE Country = 'USA');
Note: I didn't need to use a table alias for my outer query. Why?

IN / NOT IN Example (4)
Keep in mind: IN is NOT A JOIN!!!
IN simply compares against a group of values: whatever is in the results of the subquery (in the previous case, SalesIDs).
You can compare SalesIDs from anywhere without having to chain tables together.
What does this query do?
    SELECT Firstname, Lastname
    FROM Salespeople
    WHERE SalesID IN (
        SELECT SalesID
        FROM Studios
        WHERE Country = 'USA');
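A small sketch of IN and NOT IN with list subqueries follows. It assumes Python's built-in sqlite3 module rather than MySQL and uses invented rows. One member deliberately has a NULL SalesID to show standard SQL behavior that is easy to miss: if the list contains a NULL, NOT IN never evaluates to true, so the NULLs need to be filtered out of the subquery.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Salespeople (SalesID INTEGER, Firstname TEXT, Lastname TEXT);
    CREATE TABLE Members (MemberID INTEGER, Country TEXT, SalesID INTEGER);
    INSERT INTO Salespeople VALUES
        (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
    INSERT INTO Members VALUES
        (10, 'USA', 1), (11, 'USA', 1), (12, 'Canada', 2), (13, 'USA', NULL);
    """
)

# IN: salespeople who represent at least one USA member.
usa_reps = conn.execute(
    """
    SELECT Firstname FROM Salespeople
    WHERE SalesID IN (SELECT SalesID FROM Members WHERE Country = 'USA')
    """
).fetchall()

# NOT IN against a list that contains NULL returns no rows at all.
naive_unassigned = conn.execute(
    """
    SELECT Firstname FROM Salespeople
    WHERE SalesID NOT IN (SELECT SalesID FROM Members)
    """
).fetchall()

# Filtering the NULLs out of the list gives the expected answer.
unassigned = conn.execute(
    """
    SELECT Firstname FROM Salespeople
    WHERE SalesID NOT IN (SELECT SalesID FROM Members
                          WHERE SalesID IS NOT NULL)
    """
).fetchall()
print(usa_reps, naive_unassigned, unassigned)
```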

ALL and ANY
ALL
The condition must hold true for all elements in the list.
Syntax: expression operator ALL (list_subquery)
ANY
The condition must hold true for at least one element in the list.
Syntax: expression operator ANY (list_subquery)

ALL and ANY Example
Ex: List the names of all members whose birthdays are later than those of all members from CA or OH.
How to solve?
Get the birthdays of members from CA and OH, then compare them with the other members' birthdays.
How to structure?
Listing the later birthdays is the final result, so this is the outer query.
Birthdays from CA or OH is the inner query.
Later than all of those members, so use ALL.

ALL and ANY Example (2)
Outer query:
    SELECT LastName, FirstName
    FROM Members
    WHERE Birthday > ALL (X) AND Birthday > ALL (Y);
Inner query (X):
    SELECT birthday FROM Members WHERE Region = 'CA';
Inner query (Y):
    SELECT birthday FROM Members WHERE Region = 'OH';
Note: What's another way to do this query?

ALL and ANY Example (3)
Combined Query:
    SELECT LastName, FirstName
    FROM Members
    WHERE Birthday > ALL (
            SELECT birthday FROM Members
            WHERE Region = 'CA')
      AND Birthday > ALL (
            SELECT birthday FROM Members
            WHERE Region = 'OH');
What's the other way to do this? Instead of two subqueries…
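One way to answer with a single subquery is to put both regions into one list with Region IN ('CA', 'OH'). The sketch below is an assumption about the intended answer and comes with a caveat: it runs on Python's built-in sqlite3 module, and SQLite does not support ALL / ANY, so it uses the MAX form. "Birthday > ALL (list)" and "Birthday > (SELECT MAX(...))" agree as long as the list is non-empty and contains no NULLs. The rows are invented.

```python
import sqlite3

# Hypothetical rows; birthdays are ISO strings, which compare in date order.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Members (LastName TEXT, FirstName TEXT,
                          Birthday TEXT, Region TEXT);
    INSERT INTO Members VALUES
        ('Adams', 'Al', '1980-01-01', 'CA'),
        ('Baker', 'Bo', '1985-06-15', 'OH'),
        ('Clark', 'Cy', '1990-03-03', 'TX'),
        ('Davis', 'Di', '1970-12-12', 'GA');
    """
)

# MySQL form with one subquery (not runnable in SQLite):
#   WHERE Birthday > ALL (SELECT Birthday FROM Members
#                         WHERE Region IN ('CA', 'OH'))
# Equivalent MAX form for a non-empty, NULL-free list:
later_than_all = conn.execute(
    """
    SELECT LastName, FirstName
    FROM Members
    WHERE Birthday > (SELECT MAX(Birthday) FROM Members
                      WHERE Region IN ('CA', 'OH'))
    """
).fetchall()
print(later_than_all)
```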

Subqueries in the HAVING Clause
You can also have subqueries in the HAVING clause!
The same substitution technique works here as well.
Ex: List the number of members in each region that has more members than California.
Outer Query = Number of members in each region
Inner Query = Number of members in California
Since we want the number of members in each region, we must also group the results (in the outer query) by region.

Outer Query:
    SELECT Region, COUNT(*)
    FROM Members
    GROUP BY Region
    HAVING COUNT(*) > (number_of_members_in_ca);
Inner Query (number of members in CA):
    SELECT COUNT(*)
    FROM Members
    WHERE Region = 'CA';

Subqueries in the HAVING Clause (2)
Solution:
    SELECT Region, COUNT(*)
    FROM Members
    GROUP BY Region
    HAVING COUNT(*) > (SELECT COUNT(*) FROM Members WHERE Region = 'CA');
Note: We don't need DISTINCT because we are working from the Members table (all rows are unique).
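The HAVING solution above can be checked on a toy table. This is a minimal sketch assuming Python's built-in sqlite3 module in place of MySQL; the member rows are invented so that CA has two members and two other regions have three.

```python
import sqlite3

# Hypothetical Members rows: CA has 2, NY has 3, OH has 1, TX has 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Members (MemberID INTEGER, Region TEXT)")
regions = ["CA", "CA", "TX", "TX", "TX", "OH", "NY", "NY", "NY"]
conn.executemany(
    "INSERT INTO Members VALUES (?, ?)",
    list(enumerate(regions, start=1)),
)

# The inner query yields CA's count (2); HAVING keeps the larger groups.
bigger_than_ca = conn.execute(
    """
    SELECT Region, COUNT(*)
    FROM Members
    GROUP BY Region
    HAVING COUNT(*) > (SELECT COUNT(*) FROM Members WHERE Region = 'CA')
    ORDER BY Region
    """
).fetchall()
print(bigger_than_ca)
```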

Subqueries in the SELECT Clause
If you put a subquery in the SELECT clause, the subquery must return a single value, NOT a list or a table!
Examples:
    SELECT (SELECT 1) + (SELECT 2);          Ans: 3
    SELECT (SELECT COUNT(*) FROM Tracks);    Ans: 50
    SELECT (SELECT * FROM Tracks);           Ans: Error! Why?

Subqueries in the SELECT Clause (2)
SELECT clause subqueries are useful when computing single-value calculations, such as percentages.
Ex: What percentage of members are male?
How to solve?
Percentage = 100 * (total number of male members) / (total number of members)
Outer Query = 100 * (x) / (y);
Inner Query (x): Total number of male members
    SELECT COUNT(*) FROM Members WHERE Gender = 'M';
Inner Query (y): Total number of members
    SELECT COUNT(*) FROM Members;

Subqueries in the SELECT Clause (3)
Complete Query:
    SELECT 100 * (SELECT COUNT(*) FROM Members WHERE Gender = 'M')
               / (SELECT COUNT(*) FROM Members);
Remember, only single-valued subqueries in the SELECT clause.
When in doubt, test each query individually!
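A runnable sketch of the percentage query follows, again assuming Python's built-in sqlite3 module instead of MySQL and four invented members. One portability detail is an addition here, not part of the slide: MySQL's / operator returns a decimal result, while SQLite divides two integers as integers, so the sketch multiplies by 100.0 to force a fractional result in either system.

```python
import sqlite3

# Hypothetical Members rows: three of the four members are male.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Members (MemberID INTEGER, Gender TEXT)")
conn.executemany(
    "INSERT INTO Members VALUES (?, ?)",
    [(1, "M"), (2, "M"), (3, "F"), (4, "M")],
)

# Two single-value subqueries combined arithmetically in the SELECT clause.
(percent_male,) = conn.execute(
    """
    SELECT 100.0 * (SELECT COUNT(*) FROM Members WHERE Gender = 'M')
                 / (SELECT COUNT(*) FROM Members)
    """
).fetchone()
print(percent_male)
```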

Nested Subqueries
Subqueries within subqueries!
Use the same techniques, but with more layers.
Ex: List the birthdays of all members who belong to artists which have recorded titles that include the word "the". Do not use any joins.
How to solve?
Outermost query: Birthdays of all members
Inner query: Members who belong to artists with titles
Innermost query: Titles that include the word "the"
Note: We want the word "the", not everything with the letters t-h-e in it.

Nested Subqueries (2)
Outermost query:
    SELECT Birthday FROM Members
    WHERE Memberid IN (group_of_memberids);
Inner query:
    SELECT memberid FROM XrefArtistsMembers
    WHERE ArtistID IN (group_of_artistids);
Innermost query:
    SELECT ArtistID FROM Titles
    WHERE Title LIKE '% the %' OR Title LIKE '% the' OR Title LIKE 'the %';
Complete Query:
    SELECT Birthday FROM Members
    WHERE Memberid IN
        (SELECT memberid FROM XrefArtistsMembers
         WHERE ArtistID IN
            (SELECT ArtistID FROM Titles
             WHERE Title LIKE '% the %' OR Title LIKE '% the' OR Title LIKE 'the %'));
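The three-layer query above can be exercised on a toy schema. The sketch assumes Python's built-in sqlite3 module in place of MySQL, and the rows are invented. 'Other Songs' contains the letters t-h-e but not the word, so its artist's member should be excluded. SQLite's LIKE ignores ASCII case by default, which is why 'The Hits' matches 'the %' here.

```python
import sqlite3

# Hypothetical miniature of the Lyric tables; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Titles (Title TEXT, ArtistID INTEGER);
    CREATE TABLE XrefArtistsMembers (ArtistID INTEGER, MemberID INTEGER);
    CREATE TABLE Members (MemberID INTEGER, Birthday TEXT);
    INSERT INTO Titles VALUES
        ('Meet the Band', 1), ('Other Songs', 2), ('The Hits', 3);
    INSERT INTO XrefArtistsMembers VALUES (1, 10), (2, 11), (3, 12);
    INSERT INTO Members VALUES
        (10, '1980-01-01'), (11, '1981-02-02'), (12, '1982-03-03');
    """
)

# Innermost query finds artists, the middle one their members,
# and the outermost one those members' birthdays.
birthdays = conn.execute(
    """
    SELECT Birthday FROM Members
    WHERE MemberID IN
        (SELECT MemberID FROM XrefArtistsMembers
         WHERE ArtistID IN
            (SELECT ArtistID FROM Titles
             WHERE Title LIKE '% the %'
                OR Title LIKE '% the'
                OR Title LIKE 'the %'))
    ORDER BY Birthday
    """
).fetchall()
print(birthdays)
```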

Nested Subqueries (3)
SQL -> English:
    SELECT A.artistName
    FROM artists A
    WHERE (A.artistID IN
        (SELECT artistID FROM titles
         WHERE (titles.studioID IN
            (SELECT studioID FROM studios P
             WHERE P.salesID IN
                (SELECT salesID FROM salespeople
                 WHERE base > 100)))));

Nested Subqueries (4) Find all artists who have recorded titles at studios which are represented by salespeople whose base salaries are greater than $100

Correlated Subqueries
Our previous subqueries have been non-correlated.
Non-correlated? This means no "dependencies": the inner query could be run separately.
We will now deal with correlated subqueries:
You can't run the inner query separately.
The result of the inner query "depends on" data given to it by the outer query.
The correlated subquery is executed for each row returned by the outer query.
The WHERE clause of the subquery is tied to the outer query.
Note: Correlated subqueries cannot be debugged (checked) on their own the way independent subqueries can.

Correlated Subqueries (2)
Ex: List the first track of each title with its length in seconds and the total length in seconds of all tracks for that title.
Display: First Track, Length(s) of First Track, Total length of all tracks
    SELECT TrackTitle, LengthSeconds AS Sec,
        (SELECT SUM(LengthSeconds)
         FROM Tracks SC
         WHERE SC.TitleID = T.TitleID) AS TotSec
    FROM Tracks T
    WHERE TrackNum = 1;
Note: The third field is a subquery in the SELECT clause, but its TitleID is tied to the Tracks table of the outer query.
For each outer row, the inner query sums only the tracks that share that row's TitleID.
Note: A correlated subquery in the SELECT clause can only return one value!

Correlated Subqueries (3)
Use an alias for the table in the outer query to make the query more readable.
Aliases aren't required in correlated subqueries in general, but they are required when the inner and outer queries use the same table.
Ex: Find the titles of all tracks that are shorter than the mean (average) length of the tracks for the title on which they occur:
    SELECT tr.TrackTitle, tr.Lengthseconds
    FROM Tracks tr
    WHERE tr.lengthseconds < (
        SELECT AVG(LengthSeconds)
        FROM Tracks
        WHERE titleid = tr.titleid);
Note: Because the inner query's titleid is linked to the outer query's titleid, this becomes a correlated subquery.
The inner query runs once per row in the outer query.
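Both correlated examples (the per-title total in the SELECT clause and the per-title average in the WHERE clause) can be run on a small table. This is a sketch assuming Python's built-in sqlite3 module rather than MySQL, with invented tracks chosen so that each title's average length is 200 seconds.

```python
import sqlite3

# Hypothetical Tracks rows: two titles, each averaging 200 seconds per track.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Tracks (TitleID INTEGER, TrackNum INTEGER,
                         TrackTitle TEXT, LengthSeconds INTEGER);
    INSERT INTO Tracks VALUES
        (1, 1, 'A', 100), (1, 2, 'B', 300),
        (2, 1, 'C', 250), (2, 2, 'D', 150), (2, 3, 'E', 200);
    """
)

# Correlated subquery in SELECT: total length for the outer row's title.
first_tracks = conn.execute(
    """
    SELECT TrackTitle, LengthSeconds AS Sec,
           (SELECT SUM(LengthSeconds) FROM Tracks SC
            WHERE SC.TitleID = T.TitleID) AS TotSec
    FROM Tracks T
    WHERE TrackNum = 1
    ORDER BY TrackTitle
    """
).fetchall()

# Correlated subquery in WHERE: compare each track with its own title's average.
below_average = conn.execute(
    """
    SELECT tr.TrackTitle, tr.LengthSeconds
    FROM Tracks tr
    WHERE tr.LengthSeconds < (SELECT AVG(LengthSeconds) FROM Tracks
                              WHERE TitleID = tr.TitleID)
    ORDER BY tr.TrackTitle
    """
).fetchall()
print(first_tracks, below_average)
```

In the second query the unqualified TitleID belongs to the inner Tracks table, while tr.TitleID reaches out to the current outer row; that reference is what makes the subquery correlated.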

Using EXISTS with Subqueries
EXISTS (Keyword)
Checks if data exists in a subquery.
If there is data, it returns true. If not, it returns false.
Ex: List the names of artists who have recorded at least one title.
    SELECT artistname
    FROM Artists A
    WHERE EXISTS
        (SELECT ArtistID
         FROM Titles T
         WHERE T.ArtistID = A.ArtistID);
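A minimal sketch of the EXISTS query follows, assuming Python's built-in sqlite3 module in place of MySQL and three invented artists. For contrast it also runs a plain inner join, which repeats an artist once per matching title unless DISTINCT is added, while EXISTS yields each artist at most once.

```python
import sqlite3

# Hypothetical rows: Alpha has two titles, Beta none, Gamma one.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Artists (ArtistID INTEGER, ArtistName TEXT);
    CREATE TABLE Titles (TitleID INTEGER, ArtistID INTEGER);
    INSERT INTO Artists VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO Titles VALUES (100, 1), (101, 1), (102, 3);
    """
)

# EXISTS only asks whether the correlated subquery returns any row.
exists_rows = conn.execute(
    """
    SELECT ArtistName FROM Artists A
    WHERE EXISTS (SELECT ArtistID FROM Titles T
                  WHERE T.ArtistID = A.ArtistID)
    ORDER BY ArtistName
    """
).fetchall()

# A join produces one row per matching title, so Alpha appears twice.
join_rows = conn.execute(
    """
    SELECT ArtistName
    FROM Artists A JOIN Titles T ON T.ArtistID = A.ArtistID
    ORDER BY ArtistName
    """
).fetchall()
print(exists_rows, join_rows)
```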

More on Subqueries
Ex: Find all artists that have members from Georgia (GA).
We can do this in different ways.
Without subqueries:
    SELECT DISTINCT Artistname
    FROM Artists A
        INNER JOIN XRefArtistsMembers X USING(ArtistID)
        INNER JOIN Members M USING(MemberID)
    WHERE M.Region = 'GA';
With one subquery (and two joins):
    SELECT DISTINCT Artistname
    FROM Artists A
        INNER JOIN XRefArtistsMembers X USING(ArtistID)
        INNER JOIN (SELECT MemberID FROM Members WHERE Region = 'GA') M USING(MemberID);
How can you do this without joins at all?

More on Subqueries (2)
Use multiple subqueries!
Two subqueries:
    SELECT DISTINCT Artistname
    FROM Artists A
    WHERE ArtistID IN
        (SELECT ArtistID FROM XrefArtistsMembers X
         WHERE X.memberid IN
            (SELECT MemberID FROM Members
             WHERE Region = 'GA'));

Subqueries vs. Joins
Conceptually, joins construct Cartesian products, then filter (remove) data.
Subqueries select matching records.
Subqueries can be much faster, although how much depends on the database and its query optimizer.

Updating Records
EXISTS with a subquery can be faster than a join. Why? With an EXISTS subquery, SQL does not have to perform a full row-by-row join, building the Cartesian product and then tossing out unmatched rows. It simply runs the subquery for each row of the outer query. It may not even have to run the entire subquery, since as soon as it finds one good record it knows that at least some data exists.

Tips on Solving Subquery Problems
When solving subquery problems: Think substitution!
Analyze the question, looking for subqueries within the question.
Replace subqueries in the original question with substitution variables such as X, Y, and Z.
Write queries for your substitution variables.
Write a query that solves the original question using your substitution variables.
Replace the substitution variables with your subqueries.

Tips on Solving Subquery Problems (2) If you find yourself unsure of a problem: Try to solve the problem using a single query, and when you get stuck, write a subquery for the part you get stuck on! Good Luck!