Rewriting Procedures for Batched Bindings Ravindra Guravannavar and S. Sudarshan Indian Institute of Technology, Bombay Appeared in VLDB 2008.

Presentation transcript:

Rewriting Procedures for Batched Bindings Ravindra Guravannavar and S. Sudarshan Indian Institute of Technology, Bombay Appeared in VLDB 2008

2 Motivation
Queries/updates are often invoked repeatedly. E.g.:
- Nested subqueries (with correlated evaluation)
- Queries embedded in user-defined functions (UDFs) that are invoked from other queries
- Queries/updates in stored procedures that are invoked repeatedly (e.g. batch jobs)
- Queries/updates invoked inside loops in application programs

3 Example: A Nested Query
List the orders of high worth (> 10K):
SELECT O.orderid, O.custid FROM orders O WHERE 10000 < (SELECT sum(lineprice) FROM lineitem L WHERE L.orderid=O.orderid);
Iterative Execution: For each record in the result of the outer query block
- bind the parameters for the nested sub-query
- evaluate the nested sub-query
- process the results (check predicate, output)

4 Optimizing Repeated Invocations
Iterative execution of queries often performs poorly:
- Redundant I/O
- Random I/O and poor buffer effects
- Network round-trip delays
Decorrelation is widely used for optimizing nested queries.

5 Query Decorrelation
Original Query:
SELECT O.orderid, O.custid FROM orders O WHERE 10000 < (SELECT sum(lineprice) FROM lineitem L WHERE L.orderid=O.orderid);
After Decorrelation (most systems do this automatically):
SELECT O.orderid, O.custid FROM orders O, lineitem L WHERE O.orderid=L.orderid GROUP BY O.orderid, O.custid HAVING sum(L.lineprice) > 10000;
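
A minimal sketch of the three evaluation strategies discussed so far, assuming SQLite through Python's standard `sqlite3` module. The rows and `custid` values are made up for illustration; only the table and column names come from the slides. It runs the correlated query, the decorrelated join, and an explicit iterative loop, so their results can be compared:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders(orderid INTEGER PRIMARY KEY, custid INTEGER);
    CREATE TABLE lineitem(orderid INTEGER, lineprice REAL);
    -- Illustrative data, not from the slides.
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO lineitem VALUES (1, 6000), (1, 5000), (2, 9000), (3, 12000);
""")

# Correlated form: the sub-query is logically evaluated once per order.
rows_correlated = conn.execute("""
    SELECT O.orderid, O.custid FROM orders O
    WHERE 10000 < (SELECT sum(lineprice) FROM lineitem L
                   WHERE L.orderid = O.orderid)
    ORDER BY O.orderid
""").fetchall()

# Decorrelated form: one join followed by grouping.
rows_decorrelated = conn.execute("""
    SELECT O.orderid, O.custid FROM orders O, lineitem L
    WHERE O.orderid = L.orderid
    GROUP BY O.orderid, O.custid
    HAVING sum(L.lineprice) > 10000
    ORDER BY O.orderid
""").fetchall()

# Iterative execution spelled out in application code:
# bind the parameter, evaluate the sub-query, check the predicate.
rows_iterative = []
outer = conn.execute(
    "SELECT orderid, custid FROM orders ORDER BY orderid").fetchall()
for orderid, custid in outer:
    (total,) = conn.execute(
        "SELECT sum(lineprice) FROM lineitem WHERE orderid = ?",
        (orderid,)).fetchone()
    if total is not None and 10000 < total:
        rows_iterative.append((orderid, custid))
```

On this data all three variants select orders 1 and 3. The iterative loop issues one query per order, which is the cost that decorrelation and batching aim to remove.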

6 Limitations of Decorrelation
Tricky at times…
- COUNT aggregate
- Non-equality correlation predicates
The solutions may not produce the best plan.
Decorrelation techniques are not applicable for:
- Nested invocation of procedures with complex logic and embedded query invocations
- Iterative invocation of queries/updates from application code
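
The COUNT difficulty can be illustrated with a small sketch, again assuming SQLite via `sqlite3` and made-up rows. The query here ("orders with fewer than 2 line items") is a hypothetical example, not one from the slides. A naive inner-join rewrite loses orders that have no line items at all, while an outer join preserves them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders(orderid INTEGER PRIMARY KEY);
    CREATE TABLE lineitem(orderid INTEGER, lineprice REAL);
    -- Illustrative data: order 2 has no line items.
    INSERT INTO orders VALUES (1), (2);
    INSERT INTO lineitem VALUES (1, 100), (1, 200);
""")

# Correlated form: count(*) over an empty set is 0, so order 2 qualifies.
correlated = conn.execute("""
    SELECT O.orderid FROM orders O
    WHERE (SELECT count(*) FROM lineitem L
           WHERE L.orderid = O.orderid) < 2
""").fetchall()

# Naive decorrelation with an inner join: order 2 never reaches the
# GROUP BY because it has no matching lineitem row.
naive = conn.execute("""
    SELECT O.orderid FROM orders O, lineitem L
    WHERE O.orderid = L.orderid
    GROUP BY O.orderid
    HAVING count(*) < 2
""").fetchall()

# Outer-join rewrite: unmatched orders survive with NULLs, and
# count(L.orderid) ignores those NULLs, giving 0.
fixed = conn.execute("""
    SELECT O.orderid FROM orders O
    LEFT OUTER JOIN lineitem L ON O.orderid = L.orderid
    GROUP BY O.orderid
    HAVING count(L.orderid) < 2
""").fetchall()
```

Here `correlated` and `fixed` both return order 2, while `naive` returns nothing.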

7 Example: UDF Invoked from a Query
SELECT * FROM category WHERE count_items(category-id) > 50;
// Count the items in a given category and its sub-categories
int count_items(int categoryId) {
  …
  while(…) {
    … …
    SELECT count(item-id) INTO icount FROM item WHERE category-id = :curcat;
    …
  }
}
Procedural logic with embedded queries

8 Key Idea: Parameter Batching
Repeated invocation of an operation is replaced by a single invocation of its batched form.
- Batched form: works on a set of parameters
Benefits
- Choice of efficient set-oriented plans
  - Repeated selection → Join
  - Efficient integrity checks
  - Efficient disk access (sort RIDs before fetch)
- Reduced network round-trip delay

9 Batched Forms of Basic Operations
Insert
- insert into <table> select … from …
- Bulk load (SQL Server: bcp, Oracle: sqlldr, DB2: load)
Update
- update <table> from <table> where … (equivalent to SQL:2003 merge statement)
Queries
- Make use of join or outer-join (seen in decorrelation)

10 SQL Merge
GRANTMASTER
empid | name | grants
S101 | Ramesh | 8000
S204 | Gopal | 4000
S305 | Veena | 3500
S602 | Mayur | 2000
GRANTLOAD
empid | grants
(two rows; values not preserved in the transcript)
merge into GRANTMASTER GM using GRANTLOAD GL on GM.empid=GL.empid when matched then update set GM.grants=GL.grants;
Notation: $M_{c_1=c_1',\, c_2=c_2',\, \ldots,\, c_n=c_n'}(r, s)$
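
A hedged sketch of the same set-oriented update, assuming SQLite via `sqlite3`. SQLite has no MERGE statement, so a single UPDATE with a correlated sub-query stands in for it here. The GRANTMASTER rows are the ones on the slide; the GRANTLOAD rows are hypothetical, because the transcript lost them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grantmaster(empid TEXT PRIMARY KEY, name TEXT, grants INTEGER);
    CREATE TABLE grantload(empid TEXT PRIMARY KEY, grants INTEGER);
    INSERT INTO grantmaster VALUES
        ('S101', 'Ramesh', 8000), ('S204', 'Gopal', 4000),
        ('S305', 'Veena', 3500), ('S602', 'Mayur', 2000);
    -- Hypothetical load rows (assumption, not from the slide).
    INSERT INTO grantload VALUES ('S204', 4500), ('S602', 2500);
""")

# One statement updates every matching master row, instead of one
# UPDATE per row of grantload issued from a loop.
conn.execute("""
    UPDATE grantmaster
    SET grants = (SELECT GL.grants FROM grantload GL
                  WHERE GL.empid = grantmaster.empid)
    WHERE empid IN (SELECT empid FROM grantload)
""")

result = conn.execute(
    "SELECT empid, grants FROM grantmaster ORDER BY empid").fetchall()
```

Only the rows matched on `empid` change; unmatched master rows keep their old `grants` value, which mirrors the "when matched then update" clause.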

11 Effect of Batch Size on Inserts Bulk Load: 1.3 min

12 Iterative and Set-Oriented Updates on the Server Side
TPC-H PARTSUPP (800,000 records), clustering index on (partkey, suppkey)
- Iterative update of all the records using a T-SQL script (each update has an index lookup plan), with a single commit at the end of all updates: takes 1 minute
- Same update processed as a merge (update … from …): takes 15 seconds

13 The Challenge Given a procedure, how to obtain its batched form? Possible to manually rewrite, but time consuming and error-prone.

14 Our Work
- Automatic generation of batched forms of UDFs/stored procedures using rewrite rules
- Automatic rewrite of programs to replace looping with batched invocation

15 Batched Forms
Batched form qb of a pure function q
- Returns results for a set of parameters
- Result in the form {(parameter, result)}
- For queries: standard techniques for creating batched forms (from work on decorrelation)
Example:
Original query: SELECT item-id FROM item WHERE category-id=?
Batched form: SELECT pb.category-id, item-id FROM param-batch pb LEFT OUTER JOIN item ON pb.category-id = item.category-id;
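
A minimal sketch comparing a query `q` with its batched form `qb`, assuming SQLite via `sqlite3` and made-up rows. Identifiers use underscores (`category_id`, `param_batch`) because the hyphenated names on the slide are not valid SQL identifiers. The helper functions are illustrative, not part of the authors' tool:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item(item_id INTEGER, category_id INTEGER);
    CREATE TABLE param_batch(category_id INTEGER);
    -- Illustrative data: category 30 has no items.
    INSERT INTO item VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO param_batch VALUES (10), (20), (30);
""")

def q(category_id):
    """Original query: one invocation per parameter value."""
    rows = conn.execute(
        "SELECT item_id FROM item WHERE category_id = ? ORDER BY item_id",
        (category_id,))
    return [row[0] for row in rows]

def qb():
    """Batched form: one query over the whole parameter batch."""
    rows = conn.execute("""
        SELECT pb.category_id, item.item_id
        FROM param_batch pb LEFT OUTER JOIN item
          ON pb.category_id = item.category_id
        ORDER BY pb.category_id, item.item_id
    """)
    result = {}
    for category_id, item_id in rows:
        items = result.setdefault(category_id, [])
        if item_id is not None:   # NULL marks a parameter with no matches
            items.append(item_id)
    return result

iterative = {c: q(c) for c in (10, 20, 30)}
batched = qb()
```

The outer join matters: with an inner join, parameter 30 would vanish from the batched result instead of being paired with an empty result.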

16 Batch Safe Operations
Batched forms give no guaranteed order of parameter processing. This can be a problem for operations having side effects.
Batch-safe operations
- All operations without side effects
- Also a few operations with side effects
  - E.g.: INSERT on a table with no constraints
  - Operations inside unordered loops (e.g., cursor loops with no order-by)

17 Generating Batched Forms of Procedures
Step 1: Create trivial batched form. Transform:
procedure p(r) { body of p }
To:
procedure p_batched(pb) { for each record r in pb { <body of p> } return the collected results paired with corresponding params; }
Step 2: Optimize query invocations in the trivial batched form
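
Step 1 can be sketched in a few lines. The procedure `p` below is a hypothetical stand-in for an arbitrary procedure body; only the shape of `p_batched` follows the slide:

```python
def p(r):
    """Hypothetical procedure body; stands in for arbitrary logic on r."""
    return r * r

def p_batched(pb):
    """Trivial batched form: run the body of p once per parameter in pb
    and return each result paired with its parameter."""
    results = []
    for r in pb:
        results.append((r, p(r)))
    return results

batch = [3, 1, 2]
paired = p_batched(batch)
```

This form is no faster than calling `p` in a loop. Its value is as a starting point: the loop inside `p_batched` is what the rewrite rules of Step 2 then transform.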

18 Rule 1A: Rewriting a Simple Set Iteration Loop where q is any batch-safe operation with qb as its batched form Rule 1B Handles return values

19 Rule 1C: Batching Conditional Statements
[Rule diagram not preserved. Recoverable labels: operation to batch, return values, condition for invocation; the rewritten form performs a batched invocation and then merges the results.]

20 Rule 2: Splitting a Loop
Original:
while (p) { ss1; s_q; ss2; }
Rewritten:
Table(T) t;
while (p) { ss1 modified to save local variables as a tuple in t }   // Collect the parameters
for each r in t { s_q modified to use attributes of r; }   // Can apply Rules 1A-1C and batch
for each r in t { ss2 modified to use attributes of r; }   // Process the results
* Conditions apply

21 Rule 2: Pre-conditions
The conditions make use of the data dependence graph.
Data Dependence Graph
- Nodes: program statements
- Edges: dependencies between statements that read/write the same location
Types of Dependencies
- Flow (Write → Read), Anti (Read → Write) and Output (Write → Write)
- Loop-carried flow/anti/output: dependencies across iterations
Pre-conditions for Rule 2
- No loop-carried flow/output dependencies cross the points at which the loop is split
- No loop-carried dependencies through external data (e.g., DB)

22 Need for Reordering Statements
(s1) while (category != null) {
(s2)   item-count = q1(category);
(s3)   sum = sum + item-count;
(s4)   category = getParent(category);
     }
[Dependence graph not preserved. Legend: flow dependence, anti dependence, output dependence, loop-carried, control dependence, data dependencies.]

23 Reordering Statements to Enable Rule 2
Original:
while (category != null) { int item-count = q1(category); // Query to batch
  sum = sum + item-count; category = getParent(category); }
Splitting made possible after reordering:
while (category != null) { int temp = category; category = getParent(category); int item-count = q1(temp); sum = sum + item-count; }
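
The reordering and the subsequent loop split can be sketched as follows. The category tree, the item counts, and the in-memory `q1`/`q1_batched` functions are hypothetical stand-ins for the database queries; the three loop shapes follow the slides:

```python
# Hypothetical data standing in for the database.
PARENT = {"c3": "c2", "c2": "c1", "c1": None}
ITEM_COUNT = {"c1": 5, "c2": 7, "c3": 2}

def get_parent(category):
    return PARENT[category]

def q1(category):
    """Stand-in for the query to batch: one call per category."""
    return ITEM_COUNT[category]

def q1_batched(categories):
    """Stand-in for the batched form: returns (parameter, result) pairs."""
    return [(c, ITEM_COUNT[c]) for c in categories]

def original(category):
    total = 0
    while category is not None:
        item_count = q1(category)
        total += item_count
        category = get_parent(category)
    return total

def reordered(category):
    total = 0
    while category is not None:
        temp = category                  # save the value q1 needs
        category = get_parent(category)  # loop-control update moved up
        item_count = q1(temp)
        total += item_count
    return total

def split_and_batched(category):
    t = []                               # parameter table
    while category is not None:          # first loop: collect parameters
        t.append(category)
        category = get_parent(category)
    total = 0
    for _param, item_count in q1_batched(t):  # one batched invocation
        total += item_count
    return total
```

After reordering, the statements that drive the loop come before the query, so the loop can be cut between them and the query without a loop-carried flow dependence crossing the cut.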

24 Cycles of Flow Dependencies

25 Rule 4: Control Dependencies
Original:
while (…) { item = …; qty = …; brcode = …; if (brcode == 58) { brcode = 1; q(item, qty, brcode); } }
Remember the branching decision in a boolean variable:
while (…) { item = …; qty = …; brcode = …; boolean cv = (brcode == 58); cv? brcode = 1; cv? q(item, qty, brcode); }

26 Cascading of Rules
After applying Rule 2:
Table(…) t;
while (…) { r.item = …; r.qty = …; r.brcode = …; r.cv = (r.brcode == 58); r.cv? r.brcode = 1; t.addRecord(r); }
for each r in t { r.cv? q(r.item, r.qty, r.brcode); }
After applying Rule 1C:
$qb(\pi_{item,qty,brcode}(\sigma_{cv=true}(t)))$
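
A minimal sketch of this cascade, with hypothetical input rows and with `q`/`qb` reduced to functions that record their arguments. It checks that the guarded loop and the batched call over the selected, projected tuples do the same work:

```python
def q(item, qty, brcode, log):
    """Hypothetical single-tuple operation; records its arguments."""
    log.append((item, qty, brcode))

def qb(batch, log):
    """Hypothetical batched form of q; handles a whole set of tuples."""
    log.extend(batch)

# Made-up input rows: (item, qty, brcode).
INPUTS = [("pen", 2, 58), ("ink", 1, 7), ("pad", 5, 58)]

def original():
    log = []
    for item, qty, brcode in INPUTS:
        if brcode == 58:
            brcode = 1
            q(item, qty, brcode, log)
    return log

def rewritten():
    log = []
    t = []                                   # Table(...) t
    for item, qty, brcode in INPUTS:
        r = {"item": item, "qty": qty, "brcode": brcode}
        r["cv"] = (r["brcode"] == 58)        # remember the branch decision
        if r["cv"]:
            r["brcode"] = 1
        t.append(r)
    # Select the tuples with cv = true, project (item, qty, brcode),
    # then make a single batched call.
    selected = [(r["item"], r["qty"], r["brcode"]) for r in t if r["cv"]]
    qb(selected, log)
    return log
```

Storing the condition in `cv` turns the control dependence into a data dependence, so the guard becomes an ordinary selection on the parameter table.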

27 Batching Across Multiple Levels
Original:
while(…) { …. while(…) { ... q(v1, v2, … vn); … } … }
Batch q w.r.t. inner loop:
while(…) { …. Table t (…); while(…) { ... } qb(t); … }

28 Parameter Batches as Nested Tables
cust-id | cust-class | orders (nested table with columns ord-id, date)
C101 | Gold | (nested rows not preserved)
C180 | Regular | (nested rows not preserved)

29 Nest and Unnest Operations
$\mu_c(T)$: Unnest T w.r.t. table-valued column c
$\nu_{S \to s}(T)$: Group T on columns other than S and nest the columns in S under the name s
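
The two operators can be sketched over lists of dictionaries. This is an illustrative in-memory version, not the authors' implementation; the customer rows reuse the ids from the nested-table slide, and the order ids and dates are made up:

```python
def unnest(table, c):
    """mu_c(T): flatten the table-valued column c into the outer row."""
    out = []
    for row in table:
        outer = {k: v for k, v in row.items() if k != c}
        for inner in row[c]:
            out.append({**outer, **inner})
    return out

def nest(table, S, s):
    """nu_{S->s}(T): group on the columns other than S and nest the
    columns in S as a list of rows under the name s."""
    groups = {}
    for row in table:
        key = tuple((k, v) for k, v in row.items() if k not in S)
        groups.setdefault(key, []).append({k: row[k] for k in S})
    return [{**dict(key), s: members} for key, members in groups.items()]

# Order ids and dates below are hypothetical.
T = [
    {"cust_id": "C101", "cust_class": "Gold",
     "orders": [{"ord_id": 1, "date": "2008-01-05"},
                {"ord_id": 2, "date": "2008-02-11"}]},
    {"cust_id": "C180", "cust_class": "Regular",
     "orders": [{"ord_id": 3, "date": "2008-03-02"}]},
]

flat = unnest(T, "orders")
renested = nest(flat, ["ord_id", "date"], "orders")
```

In this sketch `unnest` drops outer rows whose nested table is empty, so nesting after unnesting restores the input only when every nested table has at least one row.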

30 Rule 6: Unnesting Nested Batches

31 Implementation and Evaluation
- Conceptually the techniques can be used with any language (PL/SQL, Java, C#-LINQ)
- We implemented for Java using the SOOT framework for program analysis
Evaluation
- No benchmarks for procedural SQL
- Scenarios from three real-world applications, which faced performance problems
- Data sets: TPC-H and synthetic

32 Application 1: ESOP Management App
Process records from a file in custom format. Repeatedly called a stored procedure with a mix of queries, updates and business logic:
- Validate inputs
- Lookup existing record
- Update or Insert
Rewritten program used outer-join and merge.

33 Application 2: Category Traversal
Find the maximum size of any part in a given category and its sub-categories.
Clustered index: CATEGORY (category-id). Secondary index: PART (category-id).
Original program: repeatedly executed a query that performed selection followed by grouping.
Rewritten program: group-by followed by join.

34 Application 3: Value Range Expansion
Expand records of the form: (start-num, end-num, issued-to, …)
Performed repeated inserts.
Rewritten program: pulled the insert stmt out of the loop and replaced it with batched insert.
[Chart not preserved. Log scale; annotations: ~75% improvement, ~10% overhead.]

35 Related Work
Query unnesting
- E.g. Kim [TODS82], Dayal [VLDB87], Seshadri et al. [ICDE96], Galindo-Legaria et al. [SIGMOD01]
- We extend the benefits of unnesting to procedural nested blocks
Graefe [BTW03] highlights the importance of batching in nested iteration plans (a motivation for our work)
Optimizing set iteration loops in database programming languages - Lieuwen and DeWitt [SIGMOD 92]
- Also perform program rewriting, but
  - Do not address batching of queries/procedure calls within the loop
  - Limited language constructs: no WHILE loops, IF-THEN-ELSE
Parallelizing compilers: Kennedy [90], Padua [95]
- We borrow and extend the techniques

36 Conclusion
Automatic rewrite of programs for set-orientation is possible
- Combining query rewrite with program analysis is the key
Our experiments on real-world scenarios show significant benefits due to batching
Future Work
- Cost-based selection of operations to batch
- Handling exceptions
- Automatically deciding whether an operation is batch-safe
- Implementing rewriting for PL/SQL

37 Questions?