Rewriting Procedures for Batched Bindings Ravindra Guravannavar and S. Sudarshan Indian Institute of Technology, Bombay Appeared in VLDB 2008.


1 Rewriting Procedures for Batched Bindings Ravindra Guravannavar and S. Sudarshan Indian Institute of Technology, Bombay Appeared in VLDB 2008

2 Motivation
Queries/updates are often invoked repeatedly. E.g.:
- Nested subqueries (with correlated evaluation)
- Queries embedded in user-defined functions (UDFs) that are invoked from other queries
- Queries/updates in stored procedures that are invoked repeatedly (e.g. batch jobs)
- Queries/updates invoked inside loops in application programs

3 Example: A Nested Query
List the orders of high worth (> 10K):
SELECT O.orderid, O.custid FROM orders O WHERE 10000 < (SELECT sum(lineprice) FROM lineitem L WHERE L.orderid=O.orderid);
Iterative execution: for each record in the result of the outer query block
- bind the parameters for the nested sub-query
- evaluate the nested sub-query
- process the results (check predicate, output)

4 Optimizing Repeated Invocations
Iterative execution of queries often performs poorly:
- Redundant I/O
- Random I/O and poor buffer effects
- Network round-trip delays
Decorrelation is widely used for optimizing nested queries.

5 Query Decorrelation
Original query:
SELECT O.orderid, O.custid FROM orders O WHERE 10000 < (SELECT sum(lineprice) FROM lineitem L WHERE L.orderid=O.orderid);
After decorrelation (most systems do this automatically):
SELECT O.orderid, O.custid FROM orders O, lineitem L WHERE O.orderid=L.orderid GROUP BY O.orderid, O.custid HAVING sum(L.lineprice) > 10000;
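
To check that the two queries on this slide agree, here is a minimal sketch that runs both against an in-memory SQLite database. The sample rows and the Python variable names are made up for illustration; only the two SQL statements come from the slide (with an ORDER BY added so the comparison is deterministic).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, custid INTEGER);
    CREATE TABLE lineitem (orderid INTEGER, lineprice REAL);
    -- Illustrative data: orders 1 and 3 total more than 10000, order 2 does not.
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO lineitem VALUES (1, 6000), (1, 5000), (2, 9000), (3, 12000);
""")

# Nested form: the sub-query is logically evaluated once per outer row.
correlated = """
    SELECT O.orderid, O.custid FROM orders O
    WHERE 10000 < (SELECT sum(lineprice) FROM lineitem L
                   WHERE L.orderid = O.orderid)
    ORDER BY O.orderid
"""

# Decorrelated form: a single join followed by grouping.
decorrelated = """
    SELECT O.orderid, O.custid FROM orders O, lineitem L
    WHERE O.orderid = L.orderid
    GROUP BY O.orderid, O.custid
    HAVING sum(L.lineprice) > 10000
    ORDER BY O.orderid
"""

rows_correlated = conn.execute(correlated).fetchall()
rows_decorrelated = conn.execute(decorrelated).fetchall()
print(rows_correlated, rows_decorrelated)
```

Both statements select orders 1 and 3 on this data. How each one is actually executed depends on the database engine's optimizer.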

6 Limitations of Decorrelation
Tricky at times:
- COUNT aggregate
- Non-equality correlation predicates
The solutions may not produce the best plan.
Decorrelation techniques are not applicable for:
- Nested invocation of procedures with complex logic and embedded query invocations
- Iterative invocation of queries/updates from application code
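
One well-known way the COUNT aggregate makes decorrelation tricky is that a naive join-based rewrite loses outer rows that have no matching inner rows, whereas the nested query sees a count of zero for them. The sketch below illustrates this with SQLite. The query ("orders with fewer than two line items"), the data and the variable names are assumptions chosen for the demonstration, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY);
    CREATE TABLE lineitem (orderid INTEGER, lineprice REAL);
    INSERT INTO orders VALUES (1), (2), (3);
    -- Order 3 deliberately has no line items.
    INSERT INTO lineitem VALUES (1, 10), (1, 20), (2, 30);
""")

def ids(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

# Nested form: count(*) over an empty set is 0, so order 3 qualifies.
correlated = ids("""
    SELECT O.orderid FROM orders O
    WHERE 2 > (SELECT count(*) FROM lineitem L WHERE L.orderid = O.orderid)
    ORDER BY O.orderid
""")

# Naive rewrite with an inner join: order 3 never reaches the GROUP BY.
inner_join = ids("""
    SELECT O.orderid FROM orders O, lineitem L
    WHERE O.orderid = L.orderid
    GROUP BY O.orderid HAVING count(*) < 2
    ORDER BY O.orderid
""")

# Rewrite with an outer join, counting a column of the inner table.
outer_join = ids("""
    SELECT O.orderid FROM orders O
    LEFT OUTER JOIN lineitem L ON O.orderid = L.orderid
    GROUP BY O.orderid HAVING count(L.orderid) < 2
    ORDER BY O.orderid
""")

print(correlated, inner_join, outer_join)
```

The inner-join rewrite misses order 3, while the outer-join rewrite matches the nested query.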

7 Example: UDF Invoked from a Query
SELECT * FROM category WHERE count_items(category-id) > 50;
Procedural logic with embedded queries:
// Count the items in a given category and its sub-categories
int count_items(int categoryId) {
  …
  while(…) {
    … …
    SELECT count(item-id) INTO icount FROM item WHERE category-id = :curcat;
    …
  }
}

8 Key Idea: Parameter Batching
Repeated invocation of an operation is replaced by a single invocation of its batched form.
- Batched form: works on a set of parameters
Benefits:
- Choice of efficient set-oriented plans
  - Repeated selection → Join
  - Efficient integrity checks
  - Efficient disk access (sort RIDs before fetch)
- Reduced network round-trip delay

9 Batched Forms of Basic Operations
Insert
- insert into select … from …
- Bulk load (SQLServer: bcp, Oracle: sqlldr, DB2: load)
Update
- update from where … (equivalent to SQL:2003 merge statement)
Queries
- Make use of join or outer-join (seen in decorrelation)

10 SQL Merge
GRANTMASTER
empid  name    grants
S101   Ramesh  8000
S204   Gopal   4000
S305   Veena   3500
S602   Mayur   2000
GRANTLOAD
empid  grants
S204   5000
S602   2600
merge into GRANTMASTER GM using GRANTLOAD GL on GM.empid=GL.empid when matched then update set GM.grants=GL.grants;
Notation: M_{c1=c1', c2=c2', …, cn=cn'}(r, s)
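
The effect of the merge can be tried out on the slide's two tables. SQLite has no MERGE statement, so the sketch below substitutes a single correlated UPDATE as the set-oriented form (an assumption made for this example, not the statement from the slide) and compares it with the iterative form that issues one UPDATE per GRANTLOAD row. The helper names `fresh_db` and `grants` are hypothetical.

```python
import sqlite3

def fresh_db():
    """Create an in-memory database holding the two tables from the slide."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE grantmaster (empid TEXT PRIMARY KEY, name TEXT, grants INTEGER);
        CREATE TABLE grantload (empid TEXT PRIMARY KEY, grants INTEGER);
        INSERT INTO grantmaster VALUES
            ('S101', 'Ramesh', 8000), ('S204', 'Gopal', 4000),
            ('S305', 'Veena', 3500), ('S602', 'Mayur', 2000);
        INSERT INTO grantload VALUES ('S204', 5000), ('S602', 2600);
    """)
    return conn

def grants(conn):
    return conn.execute(
        "SELECT empid, grants FROM grantmaster ORDER BY empid").fetchall()

# Iterative form: one UPDATE statement per GRANTLOAD row.
iterative = fresh_db()
load_rows = iterative.execute("SELECT empid, grants FROM grantload").fetchall()
for empid, amount in load_rows:
    iterative.execute(
        "UPDATE grantmaster SET grants = ? WHERE empid = ?", (amount, empid))

# Set-oriented form: a single statement that updates every matching row.
set_oriented = fresh_db()
set_oriented.execute("""
    UPDATE grantmaster
    SET grants = (SELECT gl.grants FROM grantload gl
                  WHERE gl.empid = grantmaster.empid)
    WHERE EXISTS (SELECT 1 FROM grantload gl
                  WHERE gl.empid = grantmaster.empid)
""")

iterative_result = grants(iterative)
set_result = grants(set_oriented)
print(set_result)
```

Both forms leave S101 and S305 untouched and overwrite the grants of S204 and S602 with the GRANTLOAD values.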

11 Effect of Batch Size on Inserts
[Chart not captured in this transcript] Bulk load: 1.3 min

12 Iterative and Set-Oriented Updates on the Server Side
TPC-H PARTSUPP (800,000 records), clustering index on (partkey, suppkey)
- Iterative update of all the records using a T-SQL script (each update has an index lookup plan), with a single commit at the end of all updates: takes 1 minute
- Same update processed as a merge (update … from …): takes 15 seconds

13 The Challenge
Given a procedure, how to obtain its batched form? Possible to manually rewrite, but time-consuming and error-prone.

14 Our Work
- Automatic generation of batched forms of UDFs/stored procedures using rewrite rules
- Automatic rewrite of programs to replace looping with batched invocation

15 Batched Forms
Batched form qb of a pure function q:
- Returns results for a set of parameters
- Result in the form {(parameter, result)}
- For queries: standard techniques for creating batched forms (from work on decorrelation)
Example:
Original query: SELECT item-id FROM item WHERE category-id=?
Batched form: SELECT pb.category-id, item-id FROM param-batch pb LEFT OUTER JOIN item ON pb.category-id = item.category-id;
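
The relationship between the original query and its batched form can be sketched with SQLite as follows. Identifiers use underscores instead of the slide's hyphens so that they are valid SQL; the sample rows and the Python variable names are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE param_batch (category_id INTEGER);
    INSERT INTO item VALUES (1, 10), (2, 10), (3, 20);
    -- Category 30 has no items; the outer join must still report it.
    INSERT INTO param_batch VALUES (10), (20), (30);
""")
params = [10, 20, 30]

# Iterative form: one query execution per parameter.
iterative = {}
for category_id in params:
    rows = conn.execute(
        "SELECT item_id FROM item WHERE category_id = ? ORDER BY item_id",
        (category_id,)).fetchall()
    iterative[category_id] = [item_id for (item_id,) in rows]

# Batched form: one query returns (parameter, result) pairs for the whole batch.
batched = {}
for category_id, item_id in conn.execute("""
        SELECT pb.category_id, item_id
        FROM param_batch pb LEFT OUTER JOIN item
             ON pb.category_id = item.category_id
        ORDER BY pb.category_id, item_id"""):
    items = batched.setdefault(category_id, [])
    if item_id is not None:  # NULL marks a parameter with an empty result
        items.append(item_id)

print(iterative, batched)
```

The left outer join is what keeps category 30 in the batched result; with an inner join, parameters whose result is empty would vanish from the output.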

16 Batch Safe Operations
Batched forms give no guaranteed order of parameter processing, which can be a problem for operations having side effects.
Batch-safe operations:
- All operations without side effects
- Also a few operations with side effects
  - E.g.: INSERT on a table with no constraints
  - Operations inside unordered loops (e.g., cursor loops with no order-by)

17 Generating Batched Forms of Procedures
Step 1: Create the trivial batched form. Transform
procedure p(r) { body of p }
to
procedure p_batched(pb) {
  for each record r in pb { body of p }
  return the collected results paired with the corresponding params;
}
Step 2: Optimize query invocations in the trivial batched form.
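
Step 1 can be sketched in a few lines of Python. The wrapper `make_trivial_batched` and the example procedure `count_vowels` are hypothetical names introduced here; the sketch only shows the shape of the trivial batched form, which still invokes the body once per parameter and gains nothing until Step 2 rewrites the query invocations inside it.

```python
def make_trivial_batched(p):
    """Return p_batched, which runs the body of p once per record in the batch."""
    def p_batched(pb):
        results = []
        for r in pb:
            # Pair each parameter with the result computed for it.
            results.append((r, p(r)))
        return results
    return p_batched

# Hypothetical procedure standing in for "body of p".
def count_vowels(word):
    return sum(1 for ch in word if ch in "aeiou")

count_vowels_batched = make_trivial_batched(count_vowels)
pairs = count_vowels_batched(["batch", "query", "loop"])
print(pairs)
```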

18 Rule 1A: Rewriting a Simple Set Iteration Loop
[Rule diagram not captured in this transcript] Here q is any batch-safe operation with qb as its batched form.
Rule 1B: Handles return values.

19 Rule 1C: Batching Conditional Statements
[Rule body not captured in this transcript] Surviving code comments: "Batched invocation", "Merge the results". Callouts: Operation to Batch, Return values, Condition for Invocation.

20 Rule 2: Splitting a Loop
Original loop:
while (p) { ss1; s_q; ss2; }
After splitting (* conditions apply):
Table(T) t;
while (p) { ss1 modified to save local variables as a tuple in t }   // Collect the parameters
for each r in t { s_q modified to use attributes of r; }             // Can apply Rules 1A-1C and batch
for each r in t { ss2 modified to use attributes of r; }             // Process the results
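
A small Python sketch of the loop split, under stated assumptions: `q` is a stand-in for a per-call query (here a dictionary lookup), `qb` is its hand-written batched form, and the statements playing the roles of ss1 and ss2 are invented for illustration. The call counters exist only to show that the split version invokes the batched operation once.

```python
PRICES = {"a": 5, "b": 7, "c": 11}   # stand-in for a database table
calls = {"q": 0, "qb": 0}

def q(key):
    """Per-call operation (would be one query round trip)."""
    calls["q"] += 1
    return PRICES[key]

def qb(keys):
    """Batched form: one invocation returns (parameter, result) pairs."""
    calls["qb"] += 1
    return [(k, PRICES[k]) for k in keys]

def original(lines):
    total, i = 0, 0
    while i < len(lines):
        key = lines[i].strip().lower()   # ss1
        price = q(key)                   # s_q
        total += price                   # ss2
        i += 1
    return total

def split(lines):
    t, i = [], 0
    while i < len(lines):                # loop 1: collect the parameters
        t.append({"key": lines[i].strip().lower()})
        i += 1
    results = qb([r["key"] for r in t])  # loop 2 replaced by one batched call
    for r, (_, price) in zip(t, results):
        r["price"] = price
    total = 0
    for r in t:                          # loop 3: process the results
        total += r["price"]
    return total

data = [" A", "b ", "C"]
total_original = original(data)
total_split = split(data)
print(total_original, total_split, calls)
```

The split is valid here because the accumulation into `total` depends only on results that have already been stored in `t`, so no loop-carried dependency crosses the split points.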

21 Rule 2: Pre-conditions
The conditions make use of the data dependence graph.
Data dependence graph:
- Nodes: program statements
- Edges: dependencies between statements that read/write the same location
Types of dependencies:
- Flow (Write → Read), Anti (Read → Write) and Output (Write → Write)
- Loop-carried flow/anti/output: dependencies across iterations
Pre-conditions for Rule 2:
- No loop-carried flow/output dependencies cross the points at which the loop is split
- No loop-carried dependencies through external data (e.g., DB)

22 Need for Reordering Statements
(s1) while (category != null) {
(s2)   item-count = q1(category);
(s3)   sum = sum + item-count;
(s4)   category = getParent(category);
     }
Legend of the dependence graph (edges not captured in this transcript): Flow Dependence, Anti Dependence, Output Dependence, Loop-Carried, Control Dependence, Data Dependencies

23 Reordering Statements to Enable Rule 2
Original loop:
while (category != null) {
  int item-count = q1(category); // Query to batch
  sum = sum + item-count;
  category = getParent(category);
}
Splitting made possible after reordering:
while (category != null) {
  int temp = category;
  category = getParent(category);
  int item-count = q1(temp);
  sum = sum + item-count;
}
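
The reordering can be checked with a small Python sketch. The category hierarchy, the item counts, and the helpers `get_parent`, `q1` and `q1_batched` are assumptions standing in for the deck's getParent and q1. The third function shows the loop split that the reordering makes possible: the first loop only walks the hierarchy and collects parameters, and the query runs once on the collected batch.

```python
PARENT = {3: 2, 2: 1, 1: None}    # hypothetical category hierarchy
ITEM_COUNT = {1: 4, 2: 6, 3: 5}   # hypothetical per-category item counts

def get_parent(category):
    return PARENT[category]

def q1(category):
    """Stand-in for the per-category count query."""
    return ITEM_COUNT[category]

def q1_batched(categories):
    """Batched form of q1: returns (parameter, result) pairs."""
    return [(c, ITEM_COUNT[c]) for c in categories]

def original(category):
    total = 0
    while category is not None:
        item_count = q1(category)
        total += item_count
        category = get_parent(category)
    return total

def reordered(category):
    total = 0
    while category is not None:
        temp = category
        category = get_parent(category)   # moved ahead of the query
        item_count = q1(temp)
        total += item_count
    return total

def reordered_and_split(category):
    t = []
    while category is not None:           # collect the parameters
        t.append(category)
        category = get_parent(category)
    return sum(count for _, count in q1_batched(t))

totals = (original(3), reordered(3), reordered_and_split(3))
print(totals)
```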

24 Cycles of Flow Dependencies

25 Rule 4: Control Dependencies
Original loop:
while (…) {
  item = …; qty = …; brcode = …;
  if (brcode == 58) {
    brcode = 1;
    q(item, qty, brcode);
  }
}
Remember the branching decision in a boolean variable:
while (…) {
  item = …; qty = …; brcode = …;
  boolean cv = (brcode == 58);
  cv? brcode = 1;
  cv? q(item, qty, brcode);
}

26 Cascading of Rules
After applying Rule 2:
Table(…) t;
while (…) {
  r.item = …; r.qty = …; r.brcode = …;
  r.cv = (r.brcode == 58);
  r.cv? r.brcode = 1;
  t.addRecord(r);
}
for each r in t { r.cv? q(r.item, r.qty, r.brcode); }
After applying Rule 1C:
qb(π_{item,qty,brcode}(σ_{cv=true}(t)))
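
The cascade of Rule 4, Rule 2 and Rule 1C can be sketched in Python as follows. The input rows are invented, and `q` is modelled as appending to a list (a stand-in for a batch-safe operation such as an insert); `qb` is its hypothetical batched form. The selection on `cv` and the projection onto (item, qty, brcode) appear as a list comprehension.

```python
def original(rows):
    sink = []
    def q(item, qty, brcode):          # stand-in for the operation to batch
        sink.append((item, qty, brcode))
    for item, qty, brcode in rows:
        if brcode == 58:
            brcode = 1
            q(item, qty, brcode)
    return sink

def rewritten(rows):
    sink = []
    def qb(batch):                     # batched form: one call for all parameters
        sink.extend(batch)
    t = []
    for item, qty, brcode in rows:
        r = {"item": item, "qty": qty, "brcode": brcode}
        r["cv"] = (r["brcode"] == 58)  # Rule 4: remember the branching decision
        if r["cv"]:
            r["brcode"] = 1
        t.append(r)                    # Rule 2: save the locals as a tuple in t
    # Rule 1C: select the records with cv = true, project the parameters, batch.
    qb([(r["item"], r["qty"], r["brcode"]) for r in t if r["cv"]])
    return sink

rows = [("x", 1, 58), ("y", 2, 7), ("z", 3, 58)]
out_original = original(rows)
out_rewritten = rewritten(rows)
print(out_original, out_rewritten)
```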

27 Batching Across Multiple Levels
Original:
while (…) { …. while (…) { ... q(v1, v2, … vn); … } … }
After batching q w.r.t. the inner loop:
while (…) { …. Table t (…); while (…) { ... } qb(t); … }

28 Parameter Batches as Nested Tables
cust-id  cust-class  orders
C101     Gold        ord-id  date
                     1011    10-12-08
                     1012    12-01-09
C180     Regular     ord-id  date
                     1801    10-12-08
                     1802    20-12-08
                     1803    08-01-09

29 Nest and Unnest Operations
- μ_c(T): Unnest T w.r.t. table-valued column c
- ν_{S→s}(T): Group T on columns other than S and nest the columns in S under the name s
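
The two operators can be sketched over lists of dictionaries. The functions `unnest` and `nest` below are a simplified illustration, not the authors' implementation, and the sample table assumes that the nested order tables from the previous slide belong to customers C101 and C180 in that order. Note that this `unnest` drops rows whose nested table is empty.

```python
def unnest(table, c):
    """Flatten table-valued column c: one output row per nested row."""
    out = []
    for row in table:
        rest = {k: v for k, v in row.items() if k != c}
        for inner in row[c]:
            out.append({**rest, **inner})
    return out

def nest(table, S, s):
    """Group on the columns outside S and nest the S columns under name s."""
    groups = {}
    for row in table:
        key = tuple((k, v) for k, v in row.items() if k not in S)
        groups.setdefault(key, []).append({k: row[k] for k in S})
    return [{**dict(key), s: inner} for key, inner in groups.items()]

T = [
    {"cust_id": "C101", "cust_class": "Gold", "orders": [
        {"ord_id": 1011, "date": "10-12-08"},
        {"ord_id": 1012, "date": "12-01-09"}]},
    {"cust_id": "C180", "cust_class": "Regular", "orders": [
        {"ord_id": 1801, "date": "10-12-08"},
        {"ord_id": 1802, "date": "20-12-08"},
        {"ord_id": 1803, "date": "08-01-09"}]},
]

flat = unnest(T, "orders")
renested = nest(flat, ("ord_id", "date"), "orders")
print(len(flat), renested == T)
```

On this table, nesting the unnested rows restores the original, because every customer has at least one order.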

30 Rule 6: Unnesting Nested Batches

31 Implementation and Evaluation
- Conceptually the techniques can be used with any language (PL/SQL, Java, C#-LINQ)
- We implemented them for Java, using the SOOT framework for program analysis
Evaluation:
- No benchmarks for procedural SQL
- Scenarios from three real-world applications that faced performance problems
- Data sets: TPC-H and synthetic

32 Application 1: ESOP Management App
Process records from a file in a custom format. The original program repeatedly called a stored procedure with a mix of queries, updates and business logic:
- Validate inputs
- Look up existing record
- Update or insert
The rewritten program used outer-join and merge.

33 Application 2: Category Traversal
Find the maximum size of any part in a given category and its sub-categories.
Clustered index: CATEGORY (category-id). Secondary index: PART (category-id).
Original program: repeatedly executed a query that performed selection followed by grouping.
Rewritten program: group-by followed by join.

34 Application 3: Value Range Expansion
Expand records of the form: (start-num, end-num, issued-to, …)
Original program: performed repeated inserts.
Rewritten program: pulled the insert stmt out of the loop and replaced it with a batched insert.
Chart annotations (log scale): ~75% improvement, ~10% overhead

35 Related Work
Query unnesting
- E.g. Kim [TODS82], Dayal [VLDB87], Seshadri et al. [ICDE96], Galindo-Legaria et al. [SIGMOD01]
- We extend the benefits of unnesting to procedural nested blocks
Graefe [BTW03] highlights the importance of batching in nested iteration plans (a motivation for our work)
Optimizing set iteration loops in database programming languages: Lieuwen and DeWitt [SIGMOD 92]
- Also perform program rewriting, but
  - Do not address batching of queries/procedure calls within the loop
  - Limited language constructs: no WHILE loops, IF-THEN-ELSE
Parallelizing compilers: Kennedy [90], Padua [95]
- We borrow and extend the techniques

36 Conclusion
- Automatic rewrite of programs for set-orientation is possible
  - Combining query rewrite with program analysis is the key
- Our experiments on real-world scenarios show significant benefits due to batching
Future work:
- Cost-based selection of operations to batch
- Handling exceptions
- Automatically deciding whether an operation is batch-safe
- Implementing rewriting for PL/SQL

37 Questions?

