Lecture 7: Subqueries Tarik Booker California State University, Los Angeles
What we will cover… Subqueries What, Why, How Where to use Subqueries in the WHERE clause Subqueries in the HAVING clause IN / NOT IN ALL / ANY Subqueries in the SELECT clause Nested Subqueries Correlated Subqueries EXISTS with Subqueries Subqueries vs. Joins
Subqueries There might be cases where certain results can’t be done with only one query! Ex: Find the name of the youngest friend of “Helen X. World.” I can find the youngest age (max(birthdate)), but not name of that person. In this case, use a subquery! What is a subquery? A query within a query! Also called an inner query These must be surrounded by parentheses! The query that contains a subquery is called an outer query We will come back to the above problem (later in the lecture).
Subqueries (2): Example We can also use subqueries to simplify longer queries: Ex: List the name of all salespeople that don’t represent any members What we’ve done so far: SELECT s.firstname, s.lastname FROM Salespeople s LEFT JOIN Members m USING (SalesID) WHERE m.memberid is NULL; Using a Subquery: SELECT firstname, lastname FROM Salespeople WHERE salesID NOT IN (SELECT DISTINCT salesID FROM Members);
Subqueries (3): When to use When to use a subquery? Impossible or very difficult to solve the problem using a single query When a subquery runs faster than a non-subquery solution Not as necessary with MySQL When it is easier to understand than an alternate solution (Outer join, union, might be harder) When you want to use an aggregate function in a WHERE clause Mostly used this way Subquery executes separately
Types of Subqueries Single Value Subqueries The subquery returns a single value Data is in one column, one row List Subqueries Subquery returns a list One column, multiple rows Table Subqueries Subquery returns an entire table Multiple columns, multiple rows
Subqueries in the WHERE Clause WHERE Clause subqueries are most common Filters out values from the outer query using values (or a list of values) from the inner query We can now apply the results of aggregate functions in the WHERE clause!!! Note: This is the only way of doing this (to get the right result)! Outer Query Inner Query WHERE ()
WHERE Clause Subqueries (2) HOW to do this? Ex: List all tracks with runtime greater than the average runtime of all tracks. WRONG Answer: SELECT Tracktitle, LengthSeconds FROM Tracks t WHERE LengthSeconds > AVG(t.Lengthseconds) Why? The WHERE Clause processes before the aggregate function AVG () cannot be processed. That order is part of the design of SQL We have to process the aggregate function before the WHERE clause How? Use a subquery!
WHERE Clause Subqueries (3) Ex: List all tracks with runtime greater than the average runtime of all tracks. CORRECT Answer: SELECT Tracktitle, LengthSeconds FROM Tracks WHERE LengthSeconds > (SELECT AVG(LengthSeconds) FROM Tracks); Note: The aggregate function is processed before the WHERE clause, because it’s in an entirely different query (a subquery) The inner query doesn’t actually filter anything out, it just supplies a value (in this case the average) to the WHERE statement in the outer query The inner query runs first, returns a value, then the outer query runs last.
WHERE Clause Subqueries (4) The inner query runs first, the outer query runs last. Inner query’s result(s) produces outer query’s values Note: Inner query can be any legal query! Outer Query Inner Query
WHERE Clause Subqueries (5) Expanded from the midterm review DB: Ex: Find the name of the youngest friend (s) of “Helen X. World.” We can only do this with a subquery: First: Find the query to get the date of Helen’s youngest friend: SELECT MAX(birthday) FROM Users JOIN XrefUsersChats USING(chatID) JOIN Chats USING(ChatID) WHERE Firstname = ‘Helen’ AND Lastname = ‘World’ AND MI = ‘X’; Next: Use the resulting (date) value to compare names with SELECT Firstname, Lastname, MI FROM Users WHERE Birthday = (Resulting_Value); Last: Substitute the inner query for the resulting value:
WHERE Clause Subqueries (6) Inner query: SELECT MAX(birthday) FROM Users JOIN XrefUsersChats USING(chatID) JOIN Chats USING(ChatID) WHERE Firstname = ‘Helen’ AND Lastname = ‘World’ AND MI = ‘X’; Outer query: SELECT Firstname, Lastname, MI FROM Users WHERE Birthday = (Resulting_Value); Complete Query: SELECT Firstname, Lastname, MI FROM Users WHERE Birthday = ( SELECT MAX(birthday) FROM Users JOIN XrefUsersChats USING(chatID) JOIN Chats USING(ChatID) WHERE Firstname = ‘Helen’ AND Lastname = ‘World’ AND MI = ‘X’); Note: This looks hard, but we’ve just combined two simple queries.
Where Clause Subquery Example Using the Lyric database: List all titles recorded at MakeTrax or LoneStar Recording. Do not use a join and do not hard-code company ID’s. We will use subqueries: Outer query: SELECT Title FROM Titles WHERE StudioID = (X) OR StudioID = (Y); Inner Query X (Collecting the StudioID for MakeTrax): SELECT StudioID FROM Studios WHERE studioname = ‘MakeTrax’; Inner Query Y (Collecting the StudioID for LoneStar): SELECT StudioID FROM Studios WHERE studioname = ‘Lone Star Recording’;
WHERE Clause Subquery Example (2) Complete Query: SELECT Title FROM Titles WHERE StudioID = ( SELECT StudioID FROM Studios WHERE Studioname = ‘MakeTrax’) OR StudioID = ( SELECT StudioID FROM Studios WHERE Studioname = ‘Lone Star Recording’); Note: Indenting helps specify subqueries, but is not required. SELECT Title FROM Titles WHERE StudioID = (SELECT StudioID FROM Studios WHERE Studioname = ‘MakeTrax’) OR StudioID = (SELECT StudioID FROM Studios WHERE Studioname = ‘Lone Star Recording’); This is okay. What’s another way to do this problem (without joins)?
Tips for WHERE Clause Subqueries Do’s: Do remember to remove semicolons from inner queries. Do separate inner and outer queries if you are having trouble. Do check that each query works and gives the correct result before combining them. Don’ts: Don’t forget the ending parentheses for subqueries! Don’t forget this is the only way to use aggregate functions in the WHERE Clause!
IN / NOT IN IN (Keyword) Tests if an expression matches any items in a list List = one column, many rows Typically used in the result of a (list) subquery. NOT IN If the expression is not in the list Syntax: expression IN (list_subquery) expression NOT IN (list subquery)
IN / NOT IN Example Ex: List the names of salespeople that represent Members in the USA without using a join. How to solve? Get the USA Members’ salesid’s, then compare with salespeople Members info is in Members Table Salespeople information is in SalesPeople Table How to structure the subquery? Listing salespeople names is the final result, so that should be an outer query. Therefore, getting member associated salesIDs should be an inner query Test if the salespeople names are IN the group returned by the member salesID’s
IN / NOT IN Example (2) Outer query: Get names of a particular salesID. SELECT SalesID FROM Salespeople WHERE SalesID IN (salesID_values); I’m using IN (instead of “=“) because there are (possibly) multiple values. Note: No single quotes in the salesID_values (between the parentheses) when using a subquery. Inner query:Get SalesID’s of Members from the USA. SELECT SalesID FROM Members WHERE Country = ‘USA’; When you combine, insert the inner query directly between the parentheses.
IN / NOT IN Example (3) Final Result : SELECT Firstname, Lastname FROM Salespeople WHERE SalesID IN ( SELECT SalesID FROM Members WHERE Country = ‘USA’); Or (no indentation): SELECT Firstname, Lastname FROM Salespeople WHERE SalesID IN ( SELECT SalesID FROM Members WHERE Country = ‘USA’); Note: I didn’t need to use a table alias for my outer query. Why?
IN / NOT IN Example(4) Keep in mind: IN is NOT A JOIN!!! IN simply compares a group of values Whatever is in the results of the subquery (In the previous case, salesids) You can compare salesid’s anywhere without having to chain together tables What does this query do? SELECT Firstname, Lastname FROM Salespeople WHERE SalesID IN ( SELECT SalesID FROM Studios WHERE Country = ‘USA’);
ALL and ANY ALL Condition must hold true for all elements in the list Syntax: expression operator ALL ANY Condition must hold true for any element in the list Syntax: expression operator ANY
ALL and ANY Example Ex: List the names of all members whose birthdays are later than those of all members from CA or OH How to solve? Get birthdays of members from CA and OH, then compare with other member birthdays How to structure? Listing later birthdays is final result, so this is outer query Birthdays from CA or OH is inner query All members, so ALL
ALL and ANY Example (2) Outer query: SELECT LastName, FirstName FROM Members WHERE Birthday > ALL (X) AND Birthday > ALL (Y); Inner query (X): SELECT birthday FROM Members WHERE Region = ‘CA’; Inner query (Y): SELECT birthday FROM Members WHERE Region = ‘OH’; Note: What’s another way to do this query?
ALL and ANY Example (3) Combined Query: SELECT LastName, FirstName FROM Members WHERE Birthday > ALL ( SELECT birthday FROM Members WHERE Region = ‘CA’) AND Birthday > ALL (SELECT birthday FROM Members WHERE Region = ‘OH’); What’s the other way to do this? Instead of two subqueries…
Subqueries in the HAVING Clause You can also have subqueries in the HAVING clause! You can substitute each subquery as well Ex: List the number of members in each region that has more members than California. Outer Query = Number of members in each region Inner Query = Number of members in California Since we want number of members in each region, we also must group results (in outer query) by region.
Outer Query: SELECT Region, COUNT(*) FROM Members GROUP BY Region HAVING COUNT(*) > (number_of_members_in_ca); Inner Query (Number of members in CA): SELECT COUNT( *) FROM Members WHERE Region = ‘CA’;
Subqueries in the HAVING Clause (2) Solution: SELECT Region, COUNT(*) FROM Members GROUP BY Region HAVING COUNT(*) > (SELECT COUNT(*) FROM Members WHERE Region = ‘CA’); Note: We don’t need DISTINCT because we are working from the Members table (all rows are unique)
Subqueries in the SELECT Clause If you put a subquery in the SELECT clause, the subquery must return a single value NOT a list, or table! Examples: SELECT (SELECT 1) + (SELECT 2); Ans: 3 SELECT (SELECT COUNT(*) FROM Tracks); Ans: 50 SELECT (SELECT * From Tracks); Ans: Error! Why?
Subqueries in the SELECT Clause (2) SELECT Clause subqueries are useful when computing single- value calculations Percentages Ex: What percentage of members are male? How to solve? Percentage = 100 * (total number of male members) / (total number of members) Outer Query = 100 * (x) / (y); Inner Query (x): Total number of male members SELECT COUNT(*) FROM Members WHERE Gender = ‘M’; Inner Query (y): Total number of members SELECT COUNT(*) FROM Members;
Subqueries in the SELECT Clause(3) Complete Query: SELECT 100 * (SELECT COUNT(*) FROM Members WHERE Gender = ‘M’) / (SELECT COUNT(*) FROM Members); Remember, only single-valued subqueries in the SELECT Clause When in doubt, test each query individually!
Nested Subqueries Subqueries within Subqueries! Do the same techniques, but with more layers Ex: List the birthdays of all members who belong to artists which have recorded titles that include the word “the”. Do not use any joins. How to solve? Outermost query: Birthdays of all members Inner query : Members who belong to artists with titles Innermost query:Titles that include the word “the”. Note: We want the word “the”, not everything with the letters t-h-e in it.
Nested Subqueries (2) Outermost query: SELECT Birthday FROM Members WHERE Memberid IN (group_of_memberids); Inner query: SELECT memberid FROM XrefArtistsMembers WHERE ArtistID IN (group_of_artistids); Innermost query: SELECT ArtistID FROM Titles WHERE Title LIKE ‘% the %’ OR Title LIKE ‘% the’ OR Title LIKE ‘the %’ ; Complete Query: SELECT Birthday FROM Members WHERE Memberid IN (SELECT memberid FROM XrefArtistsMembers WHERE ArtistID IN (SELECT ArtistID FROM Titles WHERE Title LIKE ‘% the %’ OR Title LIKE ‘% the’ OR Title LIKE ‘the %’ ));
Nested Subqueries (3) SQL -> English: SELECT A.artistName FROM artists A WHERE (A.artistID IN (select artistID FROM titles WHERE (titles.studioID IN (select studioID FROM studios P WHERE P.salesID IN (select salesID FROM salespeople WHERE base > 100)))));
Nested Subqueries (4) Find all artists who have recorded titles at studios which are represented by salespeople whose base salaries are greater than $100
Correlated Subqueries Our Previous subqueries have been non-correlated. Non-correlated? This means no “dependencies.” The inner query could be run separately We will now deal with correlated subqueries: You can’t run the inner query separately The result of the inner query “depends on” data given it by the outer query The correlated subquery is executed for each row returned by an outer query The WHERE clause of the subquery is joined to the outer query Note: Correlated subqueries cannot be debugged (checked) like independent subqueries.
Correlated Subqueries (2) Ex: List the first track of each title with its length in seconds and the total length in seconds of all tracks for that title: Display: First Track, Length(s) of First Track, Total length of all tracks SELECT TrackTitle, LengthSeconds As Sec, (SELECT SUM(LengthSeconds) FROM Tracks SC WHERE SC.TitleID = T.TitleID) AS TotSec FROM Tracks T WHERE TrackNum = 1; Note: Third field is a subquery in the SELECT Clause, but its TitleID is tied into the Tracks table of the outer query This will make the inner query operate on the same rows as the outer query (at the same time) Note: A correlated subquery in the SELECT clause can only return one value!
Correlated Subqueries (3) Use an alias for the results of the outer query to make the results more readable. Aliases aren’t required in correlated subqueries, but required if using the same table. Ex: Find the titles of all tracks that are less than the mean (average) lengths of tracks for the titles on which they occur: SELECT tr.TrackTitle, tr.Lengthseconds FROM Tracks tr WHERE tr.lengthseconds < ( SELECT AVG(LengthSeconds) FROM Tracks WHERE titleid = tr.titleid); Note: Because the inner query’s titleid is linked to the outer query’s titleid, this becomes a correlated subquery Inner query runs once per row in the outer query
Using EXISTS with Subqueries EXISTS (Keyword) Checks if data exists in a subquery If there is data, returns true If not, returns false Ex: List the name of artists who have recorded at least one title SELECT artistname FROM Artists A WHERE EXISTS (SELECT ArtistID FROM Titles T WHERE T.ArtistID = A.ArtistID);
More on Subqueries Ex: Find all artists that have members from Georgia (GA). We can do this different ways: Without subqueries: SELECT DISTINCT Artistname FROM Artists A INNER JOIN XRefArtistsMembers X USING(ArtistID) INNER JOIN Members M USING(MemberID) WHERE M.Region = ‘GA’; With one subquery (and two joins): SELECT DISTINCT Artistname FROM Artists A INNER JOIN XRefArtistsMembers X USING(ArtistID) INNER JOIN (SELECT MemberID FROM Members WHERE Region = ‘GA’) M USING(MemberID); How can you do this without joins at all?
More on Subqueries (2) Use multiple subqueries! Two subqueries: SELECT DISTINCT Artistname FROM Artists A WHERE ArtistID IN (SELECT ArtistID FROM XrefArtistsMembers X WHERE X.memberid IN (SELECT MemberID FROM Members WHERE Region = ‘GA’));
Subqueries vs. Joins Joins construct Cartesian Products Then filter (remove) data Subqueries select matching records Subqueries tend to be much faster
Updating Records EXISTS with a subquery is faster than a join. Why? With an EXISTS sub-query, SQL does not have to perform a full row by row join, building the Cartesian product and then tossing out unmatched rows. It simply runs the sub-query for each row of the outer query. It may not even have to run the entire sub-query, since as soon as it finds one good record it knows that at least some data exists.
Tips on Solving Subquery Problems When solving subquery problems: Think substitution! Analyze the question, looking for subqueries within the question Replace subqueries in the original question with substitution variables such as X, Y, and Z Write queries for your substitution variables Write a query to that solves the original question using your substitution variables Replace substitution variables with your subqueries
Tips on Solving Subquery Problems (2) If you find yourself unsure of a problem: Try to solve the problem using a single query, and when you get stuck, write a subquery for the part you get stuck on! Good Luck!