Subqueries CIS 4301 Lecture Notes Lecture /23/2006
Lecture 17© CIS Spring

Subqueries
- So far, conditions in the WHERE clause have involved scalar values.
- A SELECT statement appearing in a WHERE clause is a "subquery".
- It is treated as a value if it returns a single-tuple result.
- Otherwise it is treated as a relation and compared to values using IN, EXISTS, ALL, ANY.

Subqueries Involving Scalar Values
Find the name of the producer of 'Star Wars':
    SELECT name
    FROM MovieExec
    WHERE cert# = (SELECT DISTINCT producerC#
                   FROM Movie
                   WHERE title = 'Star Wars');
- Different way to look at this: the subquery returns a scalar value (producerC#), that is, a unary relation with one tuple.
- Run-time error if zero or more than one tuple is produced! (In practice, many systems return NULL rather than an error when no tuple is produced.)
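
The scalar subquery above can be tried out with a small sketch. It assumes SQLite through Python's sqlite3 module; the columns cert# and producerC# are renamed cert_no and producer_c_no to avoid quoting, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (title TEXT, year INTEGER, producer_c_no INTEGER);
    CREATE TABLE MovieExec (name TEXT, cert_no INTEGER);
    -- Hypothetical sample rows
    INSERT INTO Movie VALUES ('Star Wars', 1977, 3456);
    INSERT INTO MovieExec VALUES ('G. Lucas', 3456), ('Another Exec', 9999);
""")

# The subquery yields exactly one value, so it can be compared with "=".
producer = conn.execute("""
    SELECT name
    FROM MovieExec
    WHERE cert_no = (SELECT DISTINCT producer_c_no
                     FROM Movie
                     WHERE title = 'Star Wars')
""").fetchall()
print(producer)
```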

Conditions Involving Relations
Apply set operations to subqueries that produce a relation R (R will be the result of a select-from-where query).
1. EXISTS R: true iff R is not empty.
Let s be a scalar value; R must be a one-column relation.
2. s IN R: true iff s is in R (can be relaxed to include tuples).
3. s op ALL R, op in {<, >, <>, =, …}: true iff s is greater, smaller, etc. than ALL values in R.
4. s op ANY R: true iff s is greater, smaller, etc. than at least one value in R.

Set Operators in Subqueries
    SELECT ... WHERE ... att op ALL/ANY (subquery)
Here att is the scalar value s, and the comparison is one of:
    <  ALL/ANY     <= ALL/ANY
    >  ALL/ANY     >= ALL/ANY
    =  ALL/ANY     <> ALL/ANY
Can also precede the expression with NOT, e.g.:
    SELECT ... WHERE ... NOT (att < ALL/ANY ...)

Works for Comparing Tuples
    SELECT ... WHERE ... t IN (subquery)
If a tuple t has the same number of components as R, it makes sense to test for set membership using the IN operator.

Example
Find the producers of Harrison Ford's movies.
Option 1: Using Cartesian product

Using Nested Subquery
Option 2: Using nested subqueries and the comparison operator IN
    SELECT name
    FROM MovieExec
    WHERE cert# IN
        (SELECT producerC#
         FROM Movie
         WHERE (title, year) IN
             (SELECT movieTitle, movieYear
              FROM StarsIn
              WHERE starName = 'Harrison Ford'));
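
Both options for the Harrison Ford query can be compared side by side. This is a sketch under assumptions, not the lecture's own code: it uses SQLite through Python's sqlite3 (row-value IN needs SQLite 3.15 or later), renames cert# and producerC# to cert_no and producer_c_no, and uses hypothetical sample rows. The Option 1 query is one possible formulation with a Cartesian product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (title TEXT, year INTEGER, producer_c_no INTEGER);
    CREATE TABLE MovieExec (name TEXT, cert_no INTEGER);
    CREATE TABLE StarsIn (movieTitle TEXT, movieYear INTEGER, starName TEXT);
    -- Hypothetical sample rows
    INSERT INTO Movie VALUES ('Star Wars', 1977, 3456), ('Other Film', 1990, 9999);
    INSERT INTO MovieExec VALUES ('G. Lucas', 3456), ('Another Exec', 9999);
    INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Harrison Ford'),
                               ('Other Film', 1990, 'Someone Else');
""")

# Option 1: Cartesian product of the three relations, filtered in WHERE.
product_rows = conn.execute("""
    SELECT name
    FROM MovieExec, Movie, StarsIn
    WHERE cert_no = producer_c_no
      AND title = movieTitle AND year = movieYear
      AND starName = 'Harrison Ford'
""").fetchall()

# Option 2: nested subqueries; (title, year) is compared as a tuple.
nested_rows = conn.execute("""
    SELECT name
    FROM MovieExec
    WHERE cert_no IN
        (SELECT producer_c_no
         FROM Movie
         WHERE (title, year) IN
             (SELECT movieTitle, movieYear
              FROM StarsIn
              WHERE starName = 'Harrison Ford'))
""").fetchall()
print(product_rows, nested_rows)
```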

Correlated Subqueries
- So far, subqueries have been evaluated ONCE!
- Correlated subqueries are evaluated many times, once for each assignment to the subquery parameter.
Find the titles that have been used for two or more movies.
Step 1: Start with an outer query that looks at all tuples in
    Movie(title, year, length, inColor, studioName, producerC#)
Step 2: For each tuple, ask the subquery whether there is a movie with the same title and a greater year.

Correlated Subqueries
Find the titles that have been used for two or more movies:
    SELECT title
    FROM Movie AS Old
    WHERE year < ANY
        (SELECT year
         FROM Movie
         WHERE title = Old.title);
- For each iteration of the outer query, a value (Old.title) is provided to the inner query.
- Scope of tuple variable "Movie": the subquery only.
- Scope of tuple variable "Old": the whole query, including the subquery.
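
A runnable sketch of the correlated query follows, assuming SQLite through Python's sqlite3. SQLite has no ANY operator, so the condition "year < ANY (subquery)" is expressed here with an equivalent EXISTS test; the sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (title TEXT, year INTEGER);
    -- Illustrative sample rows
    INSERT INTO Movie VALUES ('King Kong', 1933), ('King Kong', 1976),
                             ('King Kong', 2005), ('Star Wars', 1977);
""")

# The subquery is re-evaluated for every tuple Old of the outer query:
# does some movie with the same title have a greater year?
repeated = conn.execute("""
    SELECT title
    FROM Movie AS Old
    WHERE EXISTS (SELECT *
                  FROM Movie
                  WHERE title = Old.title AND year > Old.year)
    ORDER BY Old.year
""").fetchall()
print(repeated)
```

Note that a title used n times is reported n - 1 times by this query (every copy except the most recent one qualifies), so SELECT DISTINCT would be needed to list each title once.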

Example
Consider MovieStar(name, address, gender, age).
Find all movie stars who are younger than the oldest colleague of their gender.

Another Example
"Find all producers who made at least one film before 1960"
can be rewritten to
"Find all producers such that there exists at least one film that they produced before 1960."
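
The "there exists" phrasing maps directly onto EXISTS with a correlated subquery. The following is one possible formulation, assuming SQLite through Python's sqlite3, the renamed columns cert_no and producer_c_no, and hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (title TEXT, year INTEGER, producer_c_no INTEGER);
    CREATE TABLE MovieExec (name TEXT, cert_no INTEGER);
    -- Hypothetical sample rows
    INSERT INTO MovieExec VALUES ('Exec A', 1), ('Exec B', 2);
    INSERT INTO Movie VALUES ('Old Film', 1955, 1), ('New Film', 1999, 1),
                             ('Recent Film', 2001, 2);
""")

# cert_no inside the subquery refers to the current MovieExec tuple
# of the outer query (Movie has no such column).
early_producers = conn.execute("""
    SELECT name
    FROM MovieExec
    WHERE EXISTS (SELECT *
                  FROM Movie
                  WHERE producer_c_no = cert_no AND year < 1960)
""").fetchall()
print(early_producers)
```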

SQL Join Expressions
Can produce new relations using a number of join operators:
- R1 CROSS JOIN R2 behaves just like the Cartesian product.
- R1 JOIN R2 ON condition behaves like a theta join.
- R1 NATURAL JOIN R2 behaves like a natural join.
- Outer joins

Examples
    Movie CROSS JOIN StarsIn;
    Movie JOIN StarsIn ON title = movieTitle AND year = movieYear;
    MovieStar NATURAL JOIN MovieExec;

But Why?
Results in more readable queries:
    SELECT MovieStar.name
    FROM MovieStar, MovieExec
    WHERE MovieStar.name = MovieExec.name
      AND MovieStar.address = MovieExec.address;
or
    SELECT name
    FROM MovieStar NATURAL JOIN MovieExec;
Drawback? Not supported in all dialects of SQL.

Outer Joins
- In "inner" joins, dangling tuples do NOT show up in the result.
- If we want to see those as well, we need a new type of join, the "outer join".
How does it work?
- Join tuples as before.
- Dangling tuples will be added to the result, padded with NULL values.
- FULL outer join pads dangling tuples from both relations.
- LEFT or RIGHT outer join pads dangling tuples from the left or right relation only.

Example
    MovieStar(name, address, gender, bdate)
    MovieExec(name, address, cert#, networth)
    MovieStar NATURAL FULL OUTER JOIN MovieExec;
Assume we have two movie stars, one of whom is also a movie exec, and one movie exec who is not a movie star:
    name              address     gender  bdate   cert#  networth
    Mary Tyler Moore  Maple Str.  'F'     9/9/           $100M
    T. Hanks          Cherry Ln.  'M'     8/8/88  NULL   NULL
    G. Lucas          Oak Rd.     NULL    NULL    3456   $400M
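
The NULL padding can be observed with a small sketch. It assumes SQLite through Python's sqlite3 and shows only a LEFT outer join, because FULL OUTER JOIN is available only in newer SQLite versions; the bdate and cert# columns are left out and networth is stored as text for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MovieStar (name TEXT, address TEXT, gender TEXT);
    CREATE TABLE MovieExec (name TEXT, address TEXT, networth TEXT);
    INSERT INTO MovieStar VALUES ('Mary Tyler Moore', 'Maple Str.', 'F'),
                                 ('T. Hanks', 'Cherry Ln.', 'M');
    INSERT INTO MovieExec VALUES ('Mary Tyler Moore', 'Maple Str.', '$100M'),
                                 ('G. Lucas', 'Oak Rd.', '$400M');
""")

# LEFT outer join: dangling tuples of MovieStar (the left relation) are
# kept and padded with NULL; the dangling exec G. Lucas is dropped.
rows = conn.execute("""
    SELECT name, address, gender, networth
    FROM MovieStar NATURAL LEFT OUTER JOIN MovieExec
    ORDER BY name
""").fetchall()
for row in rows:
    print(row)  # SQL NULL shows up as Python None
```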

Duplicate Elimination
- Be careful about whether or not to expect duplicates in the result.
- SQL's notion of a relation is different from that used by "pure" relational algebra: SQL uses bags (multisets).
- To eliminate duplicates, use SELECT DISTINCT …
- However, not all queries suffer from duplicates.

Example
    SELECT name
    FROM MovieExec
    WHERE cert# IN
        (SELECT producerC#
         FROM Movie
         WHERE (title, year) IN
             (SELECT movieTitle, movieYear
              FROM StarsIn
              WHERE starName = 'Harrison Ford'));
Does it return duplicates or not?
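
One way to explore the question is to give a producer two Harrison Ford movies and count the result rows of the nested query against a join formulation. This sketch assumes SQLite through Python's sqlite3 (3.15 or later for the tuple IN), the renamed columns cert_no and producer_c_no, and hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (title TEXT, year INTEGER, producer_c_no INTEGER);
    CREATE TABLE MovieExec (name TEXT, cert_no INTEGER);
    CREATE TABLE StarsIn (movieTitle TEXT, movieYear INTEGER, starName TEXT);
    -- Hypothetical rows: one producer, two films starring Harrison Ford
    INSERT INTO MovieExec VALUES ('G. Lucas', 3456);
    INSERT INTO Movie VALUES ('Film A', 1980, 3456), ('Film B', 1985, 3456);
    INSERT INTO StarsIn VALUES ('Film A', 1980, 'Harrison Ford'),
                               ('Film B', 1985, 'Harrison Ford');
""")

# Nested version: each MovieExec tuple is tested once against the IN set.
nested_rows = conn.execute("""
    SELECT name FROM MovieExec
    WHERE cert_no IN
        (SELECT producer_c_no FROM Movie
         WHERE (title, year) IN
             (SELECT movieTitle, movieYear FROM StarsIn
              WHERE starName = 'Harrison Ford'))
""").fetchall()

# Join version: one result row per matching (exec, movie, star) combination.
join_rows = conn.execute("""
    SELECT name
    FROM MovieExec, Movie, StarsIn
    WHERE cert_no = producer_c_no
      AND title = movieTitle AND year = movieYear
      AND starName = 'Harrison Ford'
""").fetchall()
print(len(nested_rows), len(join_rows))
```

With this data the nested version lists the producer once, while the join version lists the producer once per movie. The nested version can still show a name twice if MovieExec itself contains two tuples with the same name.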

More on Duplicates
- UNION, INTERSECT, EXCEPT automatically remove duplicates.
- If you want to retain them, use the keyword ALL.
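
The effect of ALL can be checked on a tiny example. This is a sketch assuming SQLite through Python's sqlite3 and simplified one-column relations; only UNION versus UNION ALL is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MovieStar (name TEXT);
    CREATE TABLE MovieExec (name TEXT);
    INSERT INTO MovieStar VALUES ('Mary Tyler Moore'), ('T. Hanks');
    INSERT INTO MovieExec VALUES ('Mary Tyler Moore'), ('G. Lucas');
""")

# UNION treats the operands as sets and removes duplicates.
set_union = conn.execute(
    "SELECT name FROM MovieStar UNION SELECT name FROM MovieExec"
).fetchall()

# UNION ALL keeps bag semantics: the shared name appears twice.
bag_union = conn.execute(
    "SELECT name FROM MovieStar UNION ALL SELECT name FROM MovieExec"
).fetchall()
print(len(set_union), len(bag_union))
```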

Aggregation
Operations that form a single value from the list of values appearing in a column: SUM, AVG, MIN, MAX, COUNT.
Find the total number of movie executives:
    SELECT COUNT(*) FROM MovieExec;
The * is only allowed with COUNT.
If we want to be sure not to count the same exec twice:
    SELECT COUNT(DISTINCT name) FROM MovieExec;
This works even if name is not a key.
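
The difference between the two counts shows up as soon as a name occurs in more than one tuple. A minimal sketch, assuming SQLite through Python's sqlite3, the renamed column cert_no, and hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MovieExec (name TEXT, cert_no INTEGER);
    -- Hypothetical rows: the same name appears under two certificates
    INSERT INTO MovieExec VALUES ('G. Lucas', 3456), ('G. Lucas', 7777),
                                 ('Another Exec', 9999);
""")

# COUNT(*) counts tuples; COUNT(DISTINCT name) counts different names.
tuple_count = conn.execute("SELECT COUNT(*) FROM MovieExec").fetchone()[0]
name_count = conn.execute(
    "SELECT COUNT(DISTINCT name) FROM MovieExec"
).fetchone()[0]
print(tuple_count, name_count)
```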

Grouping of Tuples
Often, we do not want to apply an aggregate operator to an entire column.
Find the total number of minutes of movies produced by each studio.
Want output like:
    studio    SUM(length)
    Disney
    MGM
Use GROUP BY followed by a list of grouping attributes.

GROUP BY
Answer:
    SELECT studioName, SUM(length)
    FROM Movie
    GROUP BY studioName;
Very important!! The SELECT clause has two different kinds of terms:
(1) Aggregations, evaluated on a per-group basis
(2) Attributes that appear in the GROUP BY clause
In a SELECT clause that has aggregations, only those attributes that are mentioned in the GROUP BY clause may appear un-aggregated.
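
The GROUP BY answer can be run on a few rows to see one output tuple per studio. This sketch assumes SQLite through Python's sqlite3; the movie rows and lengths are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (title TEXT, length INTEGER, studioName TEXT);
    -- Hypothetical rows
    INSERT INTO Movie VALUES ('Film A', 100, 'Disney'),
                             ('Film B', 90, 'Disney'),
                             ('Film C', 120, 'MGM');
""")

# studioName is the grouping attribute; SUM(length) is computed per group.
totals = conn.execute("""
    SELECT studioName, SUM(length)
    FROM Movie
    GROUP BY studioName
    ORDER BY studioName
""").fetchall()
print(totals)
```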

Another Group-By Example
Find each producer's total length of film produced.

Restricted Grouping
Find the total length of movies for producers with net worth greater than $10M.

Another Example
Find total film length for only those producers who made at least one film before 1930:
    SELECT name, SUM(length)
    FROM Movie, MovieExec
    WHERE producerC# = cert# AND year < 1930
    GROUP BY name;
Not correct! The total includes only the movies made before 1930.
Need one more clause.

HAVING
Mechanism to choose groups based on some aggregate property of the group itself.
Find total film length for only those producers who made at least one film before 1960:
    SELECT name, SUM(length)
    FROM Movie, MovieExec
    WHERE producerC# = cert#
    GROUP BY name
    HAVING MIN(year) < 1960;
HAVING removes from the grouped relation all those groups in which every tuple has a year component of 1960 or higher.
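
The contrast between filtering tuples with WHERE and filtering groups with HAVING can be seen on a small data set. This is a sketch assuming SQLite through Python's sqlite3, the renamed columns cert_no and producer_c_no, and hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (title TEXT, year INTEGER, length INTEGER,
                        producer_c_no INTEGER);
    CREATE TABLE MovieExec (name TEXT, cert_no INTEGER);
    -- Hypothetical rows
    INSERT INTO MovieExec VALUES ('Exec A', 1), ('Exec B', 2);
    INSERT INTO Movie VALUES ('Old Film', 1955, 80, 1),
                             ('New Film', 1999, 120, 1),
                             ('Recent Film', 2001, 100, 2);
""")

# WHERE drops the later films before grouping, so the total is too small.
where_totals = conn.execute("""
    SELECT name, SUM(length)
    FROM Movie, MovieExec
    WHERE producer_c_no = cert_no AND year < 1960
    GROUP BY name
""").fetchall()

# HAVING keeps or drops whole groups, so kept groups sum all their films.
having_totals = conn.execute("""
    SELECT name, SUM(length)
    FROM Movie, MovieExec
    WHERE producer_c_no = cert_no
    GROUP BY name
    HAVING MIN(year) < 1960
""").fetchall()
print(where_totals, having_totals)
```

In both versions only Exec A qualifies, but only the HAVING version reports the length of all of Exec A's films.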