
Presentation transcript:

1 Execution Strategies for SQL Subqueries
Mostafa Elhemali, César Galindo-Legaria, Torsten Grabs, Milind Joshi
Microsoft Corp
With additional slides from material in paper, added by S. Sudarshan

2 Motivation
Optimization of subqueries has been studied for some time.
Challenges:
• Mixing scalar and relational expressions
• Appropriate abstractions for correct and efficient processing
• Integration of special techniques in a complete system
This talk presents the approach followed in SQL Server.
Framework where specific optimizations can be plugged in.
Framework applies also to “nested loops” languages.

3 Outline Query optimizer context Subquery processing framework Subquery disjunctions

4 Algebraic query representation
Relational operator trees. Not SQL-block-focused.
SELECT SUM(T.a) FROM T, R WHERE T.b = R.b AND R.c = 5 GROUP BY T.c
Algebrize: GroupBy [T.c, sum(T.a)] over Select (T.b = R.b and R.c = 5) over Cross product (R, T)
Transform: GroupBy [T.c, sum(T.a)] over Join (T.b = R.b) of (Select (R.c = 5) over R) and T

5 Operator tree transformations
Simplification / normalization: Select (A.x = 5) over Join (A, B) → Join (Select (A.x = 5) over A, B)
Exploration: GroupBy [A.x, B.k, sum(A.y)] over Join (A, B) → Join (GroupBy [A.x, sum(A.y)] over A, B)
Implementation: Join (A, B) → Hash-Join (A, B)

6 SQL Server optimization process
T0 (input) → simplify → T1 → cost-based optimization over a pool of alternatives: search(0), search(1), search(2) → T2 (output)
• Simplify: use simplification / normalization rules
• Cost-based optimization: use exploration and implementation rules, cost alternatives

7 Plan Generation Overview

8 Outline Query optimizer context Subquery processing framework Subquery disjunctions

9 SQL Subquery
A relational expression where you expect a scalar:
• Existential test, e.g. NOT EXISTS (SELECT …)
• Quantified comparison, e.g. T.a = ANY (SELECT …)
• Scalar-valued, e.g. T.a = (SELECT …) + (SELECT …)
Convenient and widely used by query generators.

10 Algebrization
select * from customer where 100,000 < (select sum(o_totalprice) from orders where o_custkey = c_custkey)
Subqueries: relational operators with scalar parents.
Commonly “correlated,” i.e. they have outer references.

11 Subquery removal
Executing a subquery requires mutual recursion between the scalar engine and the relational engine.
Subquery removal: transform the tree to remove relational operators from under scalar operators.
Preserve the special semantics of using a relational expression in a scalar context, e.g. at-most-one-row.

12 The Apply operator
R Apply E(r)
• For each row r of R, execute function E on r
• Return the union: {r1} × E(r1) ∪ {r2} × E(r2) ∪ …
• Abstracts “for each” and relational function invocation
Also known as d-join and tuple-substitution join.
Variants: left outer join, semi-join, anti-join.
Exposed in SQL Server (FROM clause)
• LATERAL clause in the SQL standard
Useful to invoke table-valued functions.
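The definition on this slide can be sketched in a few lines of Python. This is only an illustrative model, not how SQL Server implements Apply: rows are plain dicts, E is any function from an outer row to an iterable of rows, and the function names (`apply_cross`, `apply_semijoin`, `apply_antijoin`, `apply_outerjoin`) and the sample data are made up for the example.

```python
def apply_cross(outer, inner):
    """R Apply E(r): union over all r of {r} x E(r)."""
    for r in outer:
        for s in inner(r):
            yield {**r, **s}

def apply_semijoin(outer, inner):
    """Keep r when E(r) returns at least one row."""
    for r in outer:
        # any() stops pulling rows from E(r) after the first one
        if any(True for _ in inner(r)):
            yield r

def apply_antijoin(outer, inner):
    """Keep r when E(r) returns no rows."""
    for r in outer:
        if not any(True for _ in inner(r)):
            yield r

def apply_outerjoin(outer, inner, inner_columns):
    """Like apply_cross, but pad unmatched outer rows with NULLs (None)."""
    for r in outer:
        matched = False
        for s in inner(r):
            matched = True
            yield {**r, **s}
        if not matched:
            yield {**r, **dict.fromkeys(inner_columns)}

# Hypothetical sample data
customers = [{"c_custkey": 1}, {"c_custkey": 2}]
orders = [
    {"o_custkey": 1, "o_totalprice": 10},
    {"o_custkey": 1, "o_totalprice": 20},
]

def orders_of(r):
    """E(r): the orders of customer r (parameterized by the outer row)."""
    return (o for o in orders if o["o_custkey"] == r["c_custkey"])

with_orders = list(apply_semijoin(customers, orders_of))
without_orders = list(apply_antijoin(customers, orders_of))
```

Each variant differs only in what it does with the rows that E(r) returns, which is why a single operator with a join-type flag is enough to cover all of them.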

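13 Subquery removal
Operator tree after subquery removal, top to bottom:
SELECT(100,000 < X)
APPLY(bind: C_CUSTKEY), with outer input CUSTOMER and inner input:
Gb(X = SUM(O_TOTALPRICE)) over SELECT(O_CUSTKEY = C_CUSTKEY) over ORDERS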
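As a rough illustration, the tree on this slide can be evaluated directly in Python. The sketch assumes the threshold 100,000 from the query on slide 10, models rows as dicts, and uses invented helper names and data; it is not the engine's implementation. Note that the scalar aggregate always returns exactly one row, with X = NULL (None) when a customer has no orders.

```python
customers = [{"c_custkey": 1}, {"c_custkey": 2}, {"c_custkey": 3}]
orders = [
    {"o_custkey": 1, "o_totalprice": 70_000},
    {"o_custkey": 1, "o_totalprice": 50_000},
    {"o_custkey": 2, "o_totalprice": 90_000},
]

def select_orders(c):
    """SELECT(O_CUSTKEY = C_CUSTKEY) over ORDERS, parameterized by customer c."""
    return [o for o in orders if o["o_custkey"] == c["c_custkey"]]

def scalar_sum(rows):
    """Gb(X = SUM(O_TOTALPRICE)) with no grouping columns: always one row."""
    values = [o["o_totalprice"] for o in rows]
    return [{"X": sum(values) if values else None}]

def apply(outer, inner):
    """APPLY(bind: C_CUSTKEY): evaluate the inner expression once per outer row."""
    return [{**r, **s} for r in outer for s in inner(r)]

applied = apply(customers, lambda c: scalar_sum(select_orders(c)))

# SELECT(100,000 < X): a comparison with NULL is not TRUE, so the row is dropped
result = [row for row in applied if row["X"] is not None and 100_000 < row["X"]]
```

After this step the comparison is an ordinary relational Select above the Apply; no relational operator is left underneath a scalar one.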

14 Algebraization of SubQueries
SQL query:
• SELECT *, (SELECT C_NAME FROM CUSTOMER WHERE C_CUSTKEY = O_CUSTKEY) FROM ORDERS
Translated to:
• ORDERS Apply OJ (π[C_NAME] σ[C_CUSTKEY = O_CUSTKEY] CUSTOMER)
In general:
• R Apply OJ max1row(E(r))
Subqueries with exists / not exists become:
• R Apply SJ E(r)
• R Apply ASJ E(r)
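The general form R Apply OJ max1row(E(r)) can be modelled as follows. This is a sketch under assumptions: `max1row` is written here as a generator that raises `ValueError` when the scalar subquery yields more than one row (the actual error raised by a database will differ), and all names and data are hypothetical.

```python
def max1row(rows):
    """Pass through at most one row; fail if the subquery returns more."""
    it = iter(rows)
    first = next(it, None)
    if first is None:
        return
    if next(it, None) is not None:
        raise ValueError("scalar subquery returned more than one row")
    yield first

def apply_outerjoin(outer, inner, inner_columns):
    """Apply OJ: keep every outer row, padding with None when E(r) is empty."""
    for r in outer:
        matched = False
        for s in inner(r):
            matched = True
            yield {**r, **s}
        if not matched:
            yield {**r, **dict.fromkeys(inner_columns)}

def orders_with_customer_name(orders, customers):
    """SELECT *, (SELECT C_NAME FROM CUSTOMER WHERE C_CUSTKEY = O_CUSTKEY) FROM ORDERS"""
    def subquery(o):
        return max1row(
            {"c_name": c["c_name"]}
            for c in customers
            if c["c_custkey"] == o["o_custkey"]
        )
    return list(apply_outerjoin(orders, subquery, ["c_name"]))

sample_orders = [{"o_custkey": 1}, {"o_custkey": 9}]
sample_customers = [{"c_custkey": 1, "c_name": "Ann"}]
named = orders_with_customer_name(sample_orders, sample_customers)
```

The outer-join variant is what makes an empty subquery evaluate to NULL rather than dropping the order row.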

15 Conditional Scalar Execution
Expression:
• CASE WHEN EXISTS(E1(r)) THEN E2(r) ELSE 0 END
Translated to:
• π[CASE WHEN p = 1 THEN e2 ELSE 0 END] ((R Apply[semijoin, probe as p] E1(r)) Apply[outerjoin, pass-through p=1] max1row(E2(r)) as e2)

16 Disjunction of SubQueries
WHERE p(r) OR EXISTS(E1(r)) OR EXISTS(E2(r))
R Apply SJ ((σ p(r) CT(1)) UA E1(r) UA E2(r))
• CT(1): Constant Table returning 1
• UA: Union All
Can also translate to Apply with pass-through.

17 Quantification and NULLs
Consider the predicate 5 NOT IN S, which is equivalent to 5 <> ALL S. The result of this predicate p is as follows, for various cases of the set S:
1. If S = {} then p is TRUE.
2. If S = {1} then p is TRUE.
3. If S = {5} then p is FALSE.
4. If S = {NULL, 5} then p is FALSE.
5. If S = {NULL, 1} then p is UNKNOWN.
(FOR ALL s in S: p) = (NOT EXISTS s in S: NOT p), but only without NULLs.
• In general, predicate A cmp B is translated as A cmp’ B OR A IS NULL OR B IS NULL, where cmp’ is the complement of cmp.
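The five cases above can be checked with a small model of SQL three-valued logic. In this sketch UNKNOWN and NULL are both represented by Python's `None`, and the helper names are invented for illustration. The last function shows the antijoin-style rewrite: a row survives a WHERE clause only when the predicate is TRUE, which matches NOT EXISTS over the widened predicate (equality is the complement of <>).

```python
UNKNOWN = None  # SQL UNKNOWN (and NULL) modelled as None

def sql_eq(a, b):
    """a = b under three-valued logic."""
    if a is None or b is None:
        return UNKNOWN
    return a == b

def sql_not(v):
    return UNKNOWN if v is UNKNOWN else (not v)

def sql_and_all(values):
    """Conjunction: FALSE dominates, then UNKNOWN, else TRUE."""
    result = True
    for v in values:
        if v is False:
            return False
        if v is UNKNOWN:
            result = UNKNOWN
    return result

def not_in(x, s):
    """x NOT IN S, i.e. x <> ALL S."""
    return sql_and_all(sql_not(sql_eq(x, e)) for e in s)

def not_in_is_true_via_antijoin(x, s):
    """NOT EXISTS (e in S where x = e OR x IS NULL OR e IS NULL)."""
    return not any(x is None or e is None or x == e for e in s)

cases = [[], [1], [5], [None, 5], [None, 1]]
results = [not_in(5, s) for s in cases]
```

Running it gives TRUE, TRUE, FALSE, FALSE, UNKNOWN for the five sets, in that order, and the antijoin form agrees with "NOT IN is TRUE" on every case.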

18 Apply removal
Executing Apply forces nested loops execution into the subquery.
Apply removal: transform the tree to remove the Apply operator.
The crux of efficient processing.
Not specific to SQL subqueries.
Also goes by “unnesting,” “decorrelation,” “unrolling loops.”
Get joins, outerjoins, semijoins, … as a result.

19 Apply removal
Apply does not add expressive power to relational algebra.
Removal rules exist for different operators.

20 Why remove Apply?
Goal is NOT to avoid nested loops execution, but to normalize the query.
Queries formulated using a “for each” surface syntax may be executed more efficiently using set-oriented algorithms …
… and queries formulated using declarative join syntax may be executed more efficiently using nested loops, “for each” algorithms.

21 Removing Apply (cont.)
Apply removal that preserves the size of the expression:
• With Apply: ORDERS Apply OJ (σ[C_CUSTKEY = O_CUSTKEY] CUSTOMER)
• Removing Apply: ORDERS OJ [C_CUSTKEY = O_CUSTKEY] CUSTOMER
Apply removal that duplicates subexpressions.
Apply removal is not always possible:
• max1row / pass-through predicates, opaque functions
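The size-preserving rewrite above can be observed from the SQL side with a small experiment. The sketch below uses Python's standard `sqlite3` module purely as a convenient test bed (SQLite is not SQL Server, and its plans are not being compared here); the table contents are invented, and `c_custkey` is declared a primary key so that the scalar subquery returns at most one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_name TEXT);
CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# "With Apply": correlated scalar subquery, evaluated per ORDERS row
correlated = conn.execute("""
    SELECT o_orderkey,
           (SELECT c_name FROM customer WHERE c_custkey = o_custkey)
    FROM orders
    ORDER BY o_orderkey
""").fetchall()

# "Removing Apply": the same result expressed as a left outer join
joined = conn.execute("""
    SELECT o_orderkey, c_name
    FROM orders LEFT OUTER JOIN customer ON c_custkey = o_custkey
    ORDER BY o_orderkey
""").fetchall()

conn.close()
```

Both queries return the same rows, including the NULL name for the order whose customer does not exist. The equivalence relies on the key constraint; without it the subquery form needs max1row and the plain outer join is no longer a valid replacement.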

22 Magic Sets
Originally formulated for recursive query processing.
Special case for non-recursive queries.

23 Magic Sets with Group By
Other options:
• B: Pull groupby above join
• C: “Segmented execution”, when R and S are the same
E.g. select all students with the highest mark.

24 Reordering Semijoins and Antijoins
Pushing down semi/anti joins.
Converting semi-join to join (to allow reordering):
• How about anti-joins?

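25 Subquery Disjunctions
[expression missing in transcript] generates an antijoin with predicate [missing in transcript], which can be rewritten using [missing in transcript].
Another useful rule: [missing in transcript]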

26 Categories of execution strategies
select … from customer where exists(… orders …) and …
Normalized logical tree: semijoin (customer, orders)
• Forward lookup: apply (customer, orders lookup)
• Reverse lookup: apply (orders, customer lookup)
• Set oriented: hash / merge join (customer, orders)

27 Forward lookup
Plan: APPLY[semijoin](bind: C_CUSTKEY), with outer input CUSTOMER and inner input Lkup(O_CUSTKEY = C_CUSTKEY) on ORDERS
The “natural” form of subquery execution.
Early termination due to semijoin – pull execution model.
Best alternative if there are few CUSTOMERs and an index on ORDERS exists.

28 Reverse lookup
Plan: APPLY(bind: O_CUSTKEY), with outer input ORDERS and inner input Lkup(C_CUSTKEY = O_CUSTKEY) on CUSTOMERS
Mind the duplicates.
Consider reordering GroupBy (DISTINCT) around the join:
• DISTINCT on C_CUSTKEY above the APPLY, or
• DISTINCT on O_CUSTKEY over ORDERS, below the APPLY
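The three strategy categories (forward lookup, reverse lookup, set oriented) can be compared on a toy version of the customer/orders EXISTS query. This is a minimal sketch with made-up data; Python dicts and sets stand in for the indexes and hash tables a real engine would use, and the function names are illustrative only.

```python
from collections import defaultdict

customers = [{"c_custkey": k} for k in (1, 2, 3)]
orders = [{"o_custkey": 1}, {"o_custkey": 1}, {"o_custkey": 3}]

def forward_lookup(customers, orders):
    """For each customer, probe an index on ORDERS(o_custkey)."""
    index = defaultdict(list)  # stand-in for an existing index
    for o in orders:
        index[o["o_custkey"]].append(o)
    # a semijoin only needs to know whether at least one match exists
    return [c for c in customers if index.get(c["c_custkey"])]

def reverse_lookup(customers, orders):
    """For each order, probe an index on CUSTOMER(c_custkey)."""
    index = {c["c_custkey"]: c for c in customers}
    seen = set()  # DISTINCT on O_CUSTKEY, applied before the lookup
    out = []
    for o in orders:
        key = o["o_custkey"]
        if key in seen:
            continue
        seen.add(key)
        c = index.get(key)
        if c is not None:
            out.append(c)
    return out

def set_oriented(customers, orders):
    """Hash semijoin: build on ORDERS keys, probe with CUSTOMER."""
    keys = {o["o_custkey"] for o in orders}
    return [c for c in customers if c["c_custkey"] in keys]

fwd = forward_lookup(customers, orders)
rev = reverse_lookup(customers, orders)
hashed = set_oriented(customers, orders)
```

Without the `seen` set, the reverse lookup would return customer 1 twice, which is the duplicate problem the slide warns about. Which of the three is cheapest depends on table sizes and available indexes, so the choice is left to costing.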

29 Subquery processing overview
Inputs: SQL without subquery; SQL with subquery; “nested loops” languages
Parsing and normalization: removal of Subquery (giving a relational expr with Apply), then removal of Apply (giving a relational expr without Apply)
Cost-based optimization: logical reordering and physical optimizations
Outputs: set-oriented execution; navigational, “nested loops” execution

30 The fine print
Can you always remove subqueries?
• Yes, but you need a quirky Conditional Apply
• Subqueries in CASE WHEN expressions
Can you always remove Apply?
• Not Conditional Apply
• Not with opaque table-valued functions
Beyond the yes/no answer: Apply removal can explode the size of the original relational expression.

31 Outline Query optimizer context Subquery processing framework Subquery disjunctions

32 Subquery disjunctions
select * from customer where c_category = 'preferred' or exists(select * from nation where n_nation = c_nation and …) or exists(select * from orders where o_custkey = c_custkey and …)
Plan: APPLY[semijoin](bind: C_CUSTKEY, C_NATION, C_CATEGORY), with outer input CUSTOMER and inner input UNION ALL of:
• SELECT(C_CATEGORY = 'preferred') over the constant 1
• SELECT over NATION
• SELECT over ORDERS
Natural forward lookup plan.
Union All with early termination short-circuits the “OR” computation.
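The short-circuit behaviour of Union All under a semijoin Apply can be imitated with lazy Python generators. The sketch below is a simplified model with only two branches (the constant-table branch and the ORDERS branch), invented data, and a `probed` list added solely to record which customers actually reached the ORDERS branch.

```python
from itertools import chain

customers = [
    {"c_custkey": 1, "c_category": "preferred"},
    {"c_custkey": 2, "c_category": "regular"},
    {"c_custkey": 3, "c_category": "regular"},
]
orders = [{"o_custkey": 2}]
probed = []  # customers for which the ORDERS branch was actually executed

def preferred_branch(r):
    """SELECT(C_CATEGORY = 'preferred') over the constant table CT(1)."""
    if r["c_category"] == "preferred":
        yield 1

def orders_branch(r):
    """SELECT(O_CUSTKEY = C_CUSTKEY) over ORDERS."""
    probed.append(r["c_custkey"])
    for o in orders:
        if o["o_custkey"] == r["c_custkey"]:
            yield 1

def disjunction(r):
    """UNION ALL of the branches; generators run only when pulled."""
    return chain(preferred_branch(r), orders_branch(r))

def apply_semijoin(outer, inner):
    """Keep r as soon as the inner side produces its first row."""
    for r in outer:
        if next(iter(inner(r)), None) is not None:
            yield r

qualifying = [r["c_custkey"] for r in apply_semijoin(customers, disjunction)]
```

Customer 1 qualifies from the first branch alone, so the ORDERS branch is never started for that row; this is the pull-model early termination the slide refers to.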

33 Apply removal on Union
Plan: UNION (DISTINCT) of:
• SELECT(C_CATEGORY = “preferred”) over CUSTOMER
• SEMIJOIN (CUSTOMER, NATION)
• SEMIJOIN (CUSTOMER, ORDERS)
Distributivity replicates the outer expression.
Allows set-oriented and reverse lookup plans.
This form of Apply removal is done in cost-based optimization, not simplification.

34 Optimizing Apply
Caching of results from earlier calls:
• Trivial if no correlation variables
• In-memory if few distinct values / small results
• May or may not be worthwhile if large results
Asynchronous IO
Batch Sort
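Caching of results from earlier calls amounts to memoizing the inner expression on its correlation values. The following sketch assumes an unbounded in-memory dict as the cache, which fits only the "few distinct values / small results" case on the slide; the function names and data are hypothetical.

```python
def cached_apply(outer, inner, bind):
    """Apply that re-executes the inner side only for unseen bind values."""
    cache = {}
    for r in outer:
        key = tuple(r[col] for col in bind)  # correlation values for this row
        if key not in cache:
            cache[key] = list(inner(r))      # materialize once, reuse later
        for s in cache[key]:
            yield {**r, **s}

calls = []  # records each real execution of the inner expression
nations = {10: "FR", 20: "DE"}

def nation_of(r):
    """Inner expression, correlated on c_nation."""
    calls.append(r["c_nation"])
    name = nations.get(r["c_nation"])
    return [{"n_name": name}] if name is not None else []

customers = [
    {"c_custkey": 1, "c_nation": 10},
    {"c_custkey": 2, "c_nation": 10},
    {"c_custkey": 3, "c_nation": 20},
]
rows = list(cached_apply(customers, nation_of, bind=["c_nation"]))
```

Three outer rows trigger only two inner executions here. With no correlation variables the key is the empty tuple, so the inner side runs exactly once, which is the trivial case mentioned above.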

35 Asynchronous IO
Ask the OS to prefetch data, and continue doing other things while the prefetch is happening:
• Better use of resources, esp. with multiple disks/CPUs
SELECT … FROM PART natural join SUPPLIER natural join PARTSUPP WHERE … AND PS_SUPPLYCOST = (SELECT MIN(PS_SUPPLYCOST) FROM PARTSUPP, SUPPLIER WHERE P_PARTKEY = PS_PARTKEY AND S_SUPPKEY = PS_SUPPKEY)
Plan used by SQL Server (how the hell did it come up with this?): [plan diagram missing in transcript]

36 Batch Sort
Sort order of parameters can help the inner query.
But sorting the outer query can be time consuming:
• esp. if we stop after a few answers
So: batch a group of outer parameters, sort them, then invoke the inner in sorted order.
Batch size increased step-wise:
• so the first few answers are fast at the cost of more IO; later ones optimize IO more but with some delay
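A possible shape for batch sort is sketched below. The slide only says the batch size grows step-wise; the starting size of 2 and the doubling factor used here are assumptions chosen for the example, as is the function name.

```python
from itertools import islice

def batch_sorted(params, initial_size=2, growth=2):
    """Yield outer parameters batch by batch, each batch in sorted order."""
    it = iter(params)
    size = initial_size
    while True:
        batch = list(islice(it, size))  # pull the next batch from the outer side
        if not batch:
            return
        batch.sort()                    # inner side is then probed in key order
        yield from batch
        size *= growth                  # later batches trade latency for better IO

outer_keys = [5, 3, 9, 1, 8, 2, 7]
probe_order = list(batch_sorted(outer_keys))
```

With these settings the batches are [5, 3], [9, 1, 8, 2] and [7], so the inner query is invoked in the order 3, 5, 1, 2, 8, 9, 7. The first answers arrive after sorting only two parameters, while later, larger batches give longer sorted runs.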

37 Summary
Presentation focused on the overall framework for processing SQL subqueries and “for each” constructs.
Many optimization pockets within such a framework – you can read in the paper:
• Optimizations for semijoin, antijoin, outerjoin
• “Magic subquery decorrelation” technique
• Optimizations for general Apply
• …
Goal of “decorrelation” is not set-oriented execution, but to normalize and open up execution alternatives.

38 A question of costing
• Fwd-lookup … 10ms to 3 days
• Set-oriented execution … 2 to 3 hours
• Bwd-lookup … 10ms to 3 days, cases opposite to fwd-lkp
• Optimizer that picks the right strategy for you … priceless

39 Execution Strategies for Semijoin Outer query on orders, exists subquery on lineitem (Section 6.1)

40 Execution Strategies for Antijoin Outer query uses only orders, exists subquery on lineitem

41 Strategies for Subquery Disjunction Section 7.1: One disjunction is a select, other is an exists on a subquery

42 Execution Optimization for Apply