Rewriting Procedures for Batched Bindings Ravindra Guravannavar and S. Sudarshan Indian Institute of Technology Bombay.

Slides:



Advertisements
Similar presentations
1 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Advertisements

CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
AN INTRODUCTION TO PL/SQL Mehdi Azarmi 1. Introduction PL/SQL is Oracle's procedural language extension to SQL, the non-procedural relational database.
SQL*PLUS, PLSQL and SQLLDR Ali Obaidi. SQL Advantages High level – Builds on relational algebra and calculus – Powerful operations – Enables automatic.
The Volcano/Cascades Query Optimization Framework
CS 540 Database Management Systems
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Query Optimization Goal: Declarative SQL query
Midterm Review Lecture 14b. 14 Lectures So Far 1.Introduction 2.The Relational Model 3.Disks and Files 4.Relational Algebra 5.File Org, Indexes 6.Relational.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
Introduction to Databases Chapter 7: Data Access and Manipulation.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 14 – Join Processing.
Efficient Evaluation of XQuery over Streaming Data Xiaogang Li Gagan Agrawal The Ohio State University.
Context Tailoring the DBMS –To support particular applications Beyond alphanumerical data Beyond retrieve + process –To support particular hardware New.
DB RIDGE : A PROGRAM REWRITE TOOL FOR SET - ORIENTED QUERY EXECUTION Mahendra Chavan*, Ravindra Guravannavar, Prabhas Kumar Samanta, Karthik Ramachandra,
Access Path Selection in a Relational Database Management System Selinger et al.
Module 7 Reading SQL Server® 2008 R2 Execution Plans.
QMapper for Smart Grid: Migrating SQL-based Application to Hive Yue Wang, Yingzhong Xu, Yue Liu, Jian Chen and Songlin Hu SIGMOD’15, May 31–June 4, 2015.
Ashwani Roy Understanding Graphical Execution Plans Level 200.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
CS 338Query Evaluation7-1 Query Evaluation Lecture Topics Query interpretation Basic operations Costs of basic operations Examples Textbook Chapter 12.
1 Execution Strategies for SQL Subqueries Mostafa Elhemali, César Galindo- Legaria, Torsten Grabs, Milind Joshi Microsoft Corp With additional slides from.
6 1 Lecture 8: Introduction to Structured Query Language (SQL) J. S. Chou, P.E., Ph.D.
1 Chapter 10 Joins and Subqueries. 2 Joins & Subqueries Joins – Methods to combine data from multiple tables – Optimizer information can be limited based.
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
Exploiting Asynchronous IO using the Asynchronous Iterator Model Suresh Iyengar * S. Sudarshan Santosh Kumar # Raja Agrawal & IIT Bombay Current affiliations:
CS4432: Database Systems II Query Processing- Part 2.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
1 Execution Strategies for SQL Subqueries Mostafa Elhemali, César Galindo- Legaria, Torsten Grabs, Milind Joshi Microsoft Corp.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Query Processing – Implementing Set Operations and Joins Chap. 19.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
CS 540 Database Management Systems
Text TCS INTERNAL Oracle PL/SQL – Introduction. TCS INTERNAL PL SQL Introduction PLSQL means Procedural Language extension of SQL. PLSQL is a database.
Rewriting Procedures for Batched Bindings Ravindra Guravannavar and S. Sudarshan Indian Institute of Technology, Bombay Appeared in VLDB 2008.
SQL Triggers, Functions & Stored Procedures Programming Operations.
CHAPTER 19 Query Optimization. CHAPTER 19 Query Optimization.
More SQL: Complex Queries, Triggers, Views, and Schema Modification
Practical Database Design and Tuning
Efficient Evaluation of XQuery over Streaming Data
Tuning Transact-SQL Queries
CS 540 Database Management Systems
CS 440 Database Management Systems
Database Management System
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Chapter 12: Query Processing
Database Performance Tuning and Query Optimization
Introduction to Query Optimization
COST ESTIMATION FOR THE RELATIONAL ALGEBRA OPERATIONS MIT 813 GROUP 15 PRESENTATION.
Database Management Systems (CS 564)
Evaluation of Relational Operations: Other Operations
Introduction to Database Systems
Examples of Physical Query Plan Alternatives
Physical Join Operators
File Processing : Query Processing
April 20th – RDBMS Internals
Yan Huang - CSCI5330 Database Implementation – Access Methods
Practical Database Design and Tuning
Chapters 15 and 16b: Query Optimization
Lecture 2- Query Processing (continued)
Contents Preface I Introduction Lesson Objectives I-2
Relational Database Design
Chapter 8 Advanced SQL.
Chapter 11 Database Performance Tuning and Query Optimization
CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li.
Evaluation of Relational Operations: Other Techniques
A Framework for Testing Query Transformation Rules
Query Optimization.
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Evaluation of Relational Operations: Other Techniques
Presentation transcript:

Rewriting Procedures for Batched Bindings Ravindra Guravannavar and S. Sudarshan Indian Institute of Technology Bombay

1/56 Motivation Reason 1. Nested sub-queries (with correlated evaluation) SELECT o_orderkey, o_custkey FROM orders WHERE o_shipdate NOT IN ( SELECT l_shipdate FROM lineitem WHERE l_orderkey = o_orderkey ); Database queries/updates are often invoked repeatedly with different parameter values in a single decision support task.

2/56 Motivation Reason 2. Queries inside user-defined functions (UDFs) that are invoked from other queries SELECT * FROM category WHERE count_items(category-id) > 50; Database queries/updates are often invoked repeatedly with different parameter values. Database queries/updates are often invoked repeatedly with different parameter values in a single decision support task.

3/56 Motivation int count_items(int categoryId) { … SELECT count(item-id) INTO icount FROM item WHERE category-id = :categoryId; … } Procedural logic with embedded queries Database queries/updates are often invoked repeatedly with different parameter values in a single decision support task.

4/56 Motivation Reason 3: Queries/updates inside stored-procedures, which are called repeatedly by external batch jobs. Reason 4: Queries/updates invoked from imperative loops in application programs or complex procedures. // A procedure to count items in a given category and sub-categories int count_subcat_items(int categoryId) { … while(…) { … SELECT count(item-id) INTO icount FROM item WHERE category-id = :curcat; … }

5/56 Problem Overview Naïve iterative execution of queries and updates is often inefficient Causes:  No sharing of work (e.g., disk IO)  Random IO and poor buffer effects  Network round-trip delays

6/56 Problem Overview SELECT o_orderkey, o_custkey FROM orders WHERE o_shipdate NOT IN ( SELECT l_shipdate FROM lineitem WHERE l_orderkey = o_orderkey ); One random read for each outer tuple (assuming all internal pages of the index are cached), avg seek time 4ms. ORDERS 1.5M tuples LINEITEM 6M tuples Primary Index on l_orderkey

7/56 Known Approach: Decorrelation Rewrite a nested query using set operations (such as join, semi-join and outer-join). [Kim82, Dayal87, Murali89, Seshadri96, Galindo01, ElHemali07] Increases the number of alternative plans available to the query optimizer.  Makes it possible choose set-oriented plans (e.g., plans that use hash or sort-merge join) which are often more efficient than naïve iterative plans.

8/56 Gaps in the Known Approach Decorrelation does not speed up iterative plans; it only enables the use of alternative plans.  For some nested queries an iterative plan can be the best of the available alternatives [Graefe03] Known decorrelation techniques are not applicable for:  complex nested blocks such as user-defined functions and procedures  repeated query executions from imperative program loops

9/56 Our Approach A two-pronged approach 1. Improve the efficiency of iterative plans by exploiting sorted parameter bindings.  Earlier work [Graefe03], ElHemali[07] and Iyengar[08] consider improved iterative plans through asynchronous IO and caching.  Our work adds new techniques and considers optimization issues Earlier work, not part of the VLDB08 paper or this talk. 2. Enable set-oriented execution of repeatedly called stored procedures and UDFs, by rewriting them to accept batches of parameter bindings.  Involves program analysis and rewrite.

10/56 // UDF to count items in a category int count_items(int categoryId) { … SELECT count(item-id) INTO :icount FROM item WHERE category-id = :categoryId; … } Optimizing Iterative Calls to Stored Procedures and UDFs SPs and UDFs are written using procedural extensions to SQL (e.g., PL/SQL) Called iteratively from  Outer query block  Batch applications Known decorrelation techniques do not apply (except in very simple cases)

11/56 // UDF to count items in a category and all // the sub-categories int count_subcat_items(int categoryId) { … while(…) { … SELECT count(item-id) INTO :icount FROM item WHERE category-id = :curcat; … } Optimizing Iterative Calls to Stored Procedures and UDFs UDFs can involve complex control-flow and looping. Our techniques can deal with:  Iterative calls to a UDF from an application.  Iterative execution of queries within a single UDF invocation.

12/56 Problem Motivated by Real World Applications National Informatics Centre (NIC)  Iterative updates from an external Java application A financial enterprise  SQLJ stored procedures called iteratively National Securities Depository Ltd. (NSDL)  SPs called repeatedly for EOD processing

13/56 Optimizing Iterative Calls to Stored Procedures and UDFs Approach: Use Parameter Batching Repeated invocation of an operation is replaced by a single invocation of its batched form. Batched form: Processes a set of parameters  Enables set-oriented execution of queries/updates inside the procedure Possible to use alternative plans  Repeated selection  join  Reduced random disk access Efficient integrity checks and index updates Reduced network round-trip delays

14/56 Effect of Batch Size on Inserts Bulk Load: 1.3 min

15/56 Iterative and Set-Oriented Updates on the Server Side TPC-H PARTSUPP (800,000 records), Clustering index on (partkey, suppkey) Iterative update of all the records using T-SQL script (each update has an index lookup plan) Single commit at the end of all updates Takes 1 minute Same update processed as a merge (update … from …) Takes 15 seconds

16/56 Batched Forms of Queries/Updates Batched forms of queries/updates are known Example: SELECT item-id FROM item WHERE category-id=? Batched form: SELECT pb.category-id, item-id FROM param-batch pb LEFT OUTER JOIN item ON pb.category-id = item.category-id;  INSERT : INSERT INTO … SELECT FROM …  UPDATE : SQL MERGE

17/56 SQL Merge empidnamegrants S101Ramesh8000 S204Gopal4000 S305Veena3500 S602Mayur2000 GRANTMASTER empidgrants S S GRANTLOAD merge into GRANTMASTER GM using GRANTLOAD GL on GM.empid=GL.empid when matched then update set GM.grants=GL.grants; Notation: M c1=c1’,c2=c2’,… cn=cn’ (r, s)

18/56 Batched Forms of Procedures // Batched form of a simple UDF Table count_items_batched( Table params(categoryId) ) { RETURN ( SELECT params.categoryId, item.count(item-id) FROM params LEFT OUTER JOIN item ON params.categoryId = item.categoryId; ); } Possible to write manually: Time-consuming and error-prone Our aim: Automate the generation of batched forms

19/56 Challenges in Generating Batched Forms of Procedures Must deal with control-flow, looping and variable assignments Inter-statement dependencies may not permit batching of desired operations Presence of “non batch-safe” (order sensitive) operations along with queries to batch Approach: Equivalence rules for program transformation Make use of the data dependence graph

20/56 Batch Safe Operations Batched forms – no guaranteed order of parameter processing Can be a problem for operations having side-effects Batch-Safe operations All operations without side effects Also a few operations with side effects  E.g.: INSERT on a table with no constraints  Operations inside unordered loops (e.g., cursor loops with no order-by)

21/56 Generating Batched Form of a Procedure Step 1: Create trivial batched form. Transform: procedure p(r) { body of p} To procedure p_batched(pb) { for each record r in pb { } return the collected results paired with corrsp. params; } Step 2: Move the query exec stmts out of the loop and replace them with a call to their batched form.

22/56 Rule 1A: Rewriting a Simple Set Iteration Loop where q is any batch-safe operation with qb as its batched form for each t in r loop insert into orders values (t.order-key, t.order-date,…); end loop; insert into orders select … from r;

23/56 Rule 1B: Assignment of Scalar Result for each r in t { t.count = select count(itemid) from item where category=t.catid; } merge into t using bq on t.catid=bq.catid when matched then update set t.count=bq.item-count; bq: select t.catid, count(itemid) as item-count from t left outer join item on item.category=t.catid; * SQL Merge: Lets merging the result of a query into an existing relation

24/56 Rewriting Simple Set Iteration Loops Rule 1B: Unconditional invocation with return value SQL Merge

25/56 Rule 1C: Batching Conditional Statements // Merge the results* Operation to Batch Return values Condition for Invocation Let b= // Parameter batch Let * SQL Merge: Lets merging the results of a query into an existing relation

26/56 Rule 2: Splitting a Loop while (p) { ss1; s q ; ss2; } Table(T) t; while(p) { ss1 modified to save local variables as a tuple in t } Collect the parameters for each r in t { s q modified to use attributes of r; } Can apply Rule 1A-1C and batch. for each r in t { ss2 modified to use attributes of r; } Process the results * Conditions Apply

27/56 Loop Splitting: An Example while(top > 0) { catid = stack[--top]; count = select count(itemid) from item where category=:catid; total += count; } Table(key, catid, count) t; int loopkey = 0; while(top > 0) { Record r; catid = stack[--top]; r.catid = catid; r.key = loopkey++; t.addRecord(r) } for each r in t order by key { t.count = select count(itemid) from item where category=t.catid; } for each r in t order by key { total += t.count; } // order-by is removed if operation is batch-safe

28/56 Rule 2: Pre-conditions Conditions are on the data dependence graph (DDG)  Nodes: program statements  Edges: data dependencies between statements Types of dependence edges  Flow (Write  Read), Anti (Read  Write) and Output (Write  Write)  Loop-carried flow/anti/output: Dependencies across loop iterations Pre-conditions for Rule-2 (Loop splitting)  No loop-carried flow dependencies cross the points at which the loop is split  No loop-carried dependencies through external data (e.g., DB) Formal specification of Rule-2 in paper

29/56 Dependency Analysis

30/56 More Notation

31/56 Yet More Notation

32/56 Rule 2

33/56 Is rewritten to

34/56 Rule 2a contd.

35/56 Rule 2a contd.

36/56 Cycles of Flow Dependencies

37/56 Rule 3: Separating Batch Safe Expressions

38/56 Rule 4: Control Dependencies while (…) { item = …; qty = …; brcode = …; if (brcode == 58) brcode = 1; insert into … values (item, qty, brcode); } Store the branching decision in a boolean variable while (…) { item = …; qty = …; brcode = …; boolean cv = (brcode == 58); cv? brcode = 1; cv? insert into … values (item, qty, brcode); }

39/56 Cascading of Rules Table(…) t; while (…) { r.item = …; r.qty = …; r.brcode = …; r.cv = (r.brcode == 58); r.cv=true? r.brcode = 1; t.addRecord(r); } for each r in t { r.cv=true? insert into … values (r.item, r.qty, r.brcode); } insert into.. ( select item, qty, brcode from t where cv=true ); After applying Rule 2 (loop splitting) Rule 1C deals with batching conditional statements. Formal spec in the Thesis.

40/56 Need for Reordering Statements (s1) while (category != null) { (s2) item-count = q1(category); (s3) sum = sum + item-count; (s4) category = getParent(category); } Flow Dependence Anti Dependence Output Dependence Loop-Carried Control Dependence Data Dependencies

41/56 while (category != null) { int item-count = q1(category); // Query to batch sum = sum + item-count; category = getParent(category); } Splitting made possible after reordering while (category != null) { int temp = category; category = getParent(category); int item-count = q1(temp); sum = sum + item-count; } Reordering Statements to Enable Rule 2

42/56 Rule 5a and 5b

43/56 Rule 5a and 5b

44/56 Statement Reordering FD+ X X X X Case-1 Case-2 Case-3A Case-3B X  Cannot exist

45/56 Statement Reordering Algorithm reorder: Reorders statements in a loop to enable loop splitting. (Details in thesis.) Theorem: (Not in VLDB08 paper) If a query execution statement does not lie on a true- dependence cycle in the DDG, then algorithm reorder always reorders the statements such that the query execution can be batched. True-dependence cycle: A directed cycle made up of only FD and LCFD edges.

46/56 Batching Across Nested Loops Statement to batch can be present inside nested loops Rules 6A, 6B handle nested loops Batch the stmt w.r.t. the inner loop and then w.r.t. the outer loop Batches as nested tables: use nest and unnest

47/56 Batching Across Multiple Levels while(…) { …. while(…) {... q(v 1, v 2, … v n ); … } … } while(…) { …. Table t (…); while(…) {... } qb(t); … } Batch q w.r.t inner loop

48/56 Parameter Batches as Nested Tables cust-idcust-classorders C101Gold C180Regular ord-iddate ord-iddate

49/56 Nest and Unnest Operations μ c (T) : Unnest T w.r.t. table-valued column c v S  s (T) : Group T on columns other than S and nest the columns in S under the name s

50/56 Rule 6: Unnesting Nested Batches

51/56 Implementation and Evaluation Conceptually the techniques can be used with any language (PL/SQL, Java, C#-LINQ) We implemented for Java using the SOOT framework for program analysis Evaluation No benchmarks for procedural SQL Scenarios from three real-world applications, which faced performance problems Data Sets: TPC-H and synthetic

52/56 Application 1: ESOP Management App Process records from a file in custom format. Repeatedly called a stored procedure which: - Validates inputs - Looks up existing record - Updates or Inserts Rewritten program generated by our implementation used outer-join and merge.

53/56 Application 2: Category Traversal Find the maximum size of any part in a given category and its sub-categories. Clustered Index CATEGORY (category-id) Secondary Index PART (category-id) Original Program Repeatedly executed a query that performed selection followed by grouping. Rewritten Program Group-By followed by Join

54/56 Application 3: Value Range Expansion Log scale Expand records of the form: (start-num, end-num, issued-to, …) Performed repeated inserts. Rewritten program Pulled the insert stmt out of the loop and replaced it with batched insert. ~75% improvement ~10% overhead

55/56 Related Work Query unnesting - Kim [TODS82], Dayal [VLDB87], …  We extend the benefits of unnesting to procedural nested blocks Optimizing set iteration loops in database programming languages - Lieuwen and DeWitt [SIGMOD 92]  Perform rule-based program rewriting, but  Do not address batching of queries/procedure calls within the loop  Limited language constructs – e.g., no WHILE loops Rewriting CODASYL programs – Katz and Wong [TODS82], Demo and Kundu [SIGMOD 85]  Employ data-flow analysis, for identifying dependencies between FIND statements. Parallelizing compilers Kennedy[90], Padua[95]  We borrow and extend their techniques  Transformation for batching can afford extra CPU operations to save IO.

56/56 Summary and Conclusions Parameter batching is the key for set-oriented evaluation of repeatedly called procedures We presented a program transformation based approach, which can be used for: a. Generating batched forms of procedures and b. Replacing imperative loops with calls to batched forms Our work combines query optimization with program analysis and transformation. Experiments on real-world applications show the applicability and benefits of the proposed approach.

57/56 Future Work Cost-based decision of calls to batch Pipelined execution of queries within a UDF Inferring batch-safety of an operation Recognizing/handling JDBC calls (for Java) Implementation for PL/SQL Exploring the use of program analysis beyond batching  E.g., pushing externally applied predicates to the database.

58/56 Thanks