Download presentation
Presentation is loading. Please wait.
Published byGeorgiana Riley Modified over 8 years ago
1
Rewriting Procedures for Batched Bindings Ravindra Guravannavar and S. Sudarshan Indian Institute of Technology, Bombay Appeared in VLDB 2008
2
2 Motivation Nested subqueries (with correlated evaluation) Queries embedded in user-defined functions (UDFs) that are invoked from other queries Queries/updates in stored procedures that are invoked repeatedly (e.g. batch jobs) Queries/updates invoked inside loops in application programs Queries/updates are often invoked repeatedly. E.g.:
3
3 Example: A Nested Query List the orders of high worth (> 10K) SELECT O.orderid, O.custid FROM orders O WHERE 10000 < (SELECT sum(lineprice) FROM lineitem L WHERE L.orderid=O.orderid); Iterative Execution: For each record in the result of the outer query block - bind the parameters for the nested sub-query - evaluate the nested sub-query - process the results (check predicate, output)
4
4 Optimizing Repeated Invocations Iterative execution of queries often performs poorly Redundant I/O Random I/O and poor buffer effects Network round-trip delays Decorrelation is widely used for optimizing nested queries
5
5 Query Decorrelation Original Query SELECT O.orderid, O.custid FROM orders O WHERE 10000 < (SELECT sum(lineprice) FROM lineitem L WHERE L.orderid=O.orderid); After Decorrelation (Most systems do this automatically) SELECT O.orderid, O.custid FROM orders O, lineitem L WHERE O.orderid=L.orderid GROUP BY O.orderid, O.custid HAVING sum(L.lineprice) > 10000;
6
6 Limitations of Decorrelation Tricky at times… COUNT aggregate Non-equality correlation predicates The solutions may not produce the best plan Decorrelation techniques are not applicable for: Nested invocation of procedures with complex logic and embedded query invocations Iterative invocation of queries/updates from application code
7
7 Example: UDF Invoked from a Query SELECT * FROM category WHERE count_items(category-id) > 50; // Count the items in a given category and its sub-categories int count_items(int categoryId) { … while(…) { … … SELECT count(item-id) INTO icount FROM item WHERE category-id = :curcat; … } } Procedural logic with embedded queries
8
8 Key Idea: Parameter Batching Repeated invocation of an operation is replaced by a single invocation of its batched form Batched form: Works on a set of parameters Benefits Choice of efficient set-oriented plans Repeated selection → Join Efficient integrity checks Efficient disk access (sort RIDs before fetch) Reduced network round-trip delay
9
9 Batched Forms of Basic Operations Insert insert into select … from … Bulk load (SQLServer: bcp, Oracle: sqlldr, DB2: load) Update update from where … (equivalent to SQL:2003 merge statement) Queries Make use of join or outer-join (seen in decorrelation)
10
10 SQL Merge empidnamegrants S101Ramesh8000 S204Gopal4000 S305Veena3500 S602Mayur2000 GRANTMASTER empidgrants S2045000 S6022600 GRANTLOAD merge into GRANTMASTER GM using GRANTLOAD GL on GM.empid=GL.empid when matched then update set GM.grants=GL.grants; Notation: M c1=c1’,c2=c2’,… cn=cn’ (r, s)
11
11 Effect of Batch Size on Inserts Bulk Load: 1.3 min
12
12 Iterative and Set-Oriented Updates on the Server Side TPC-H PARTSUPP (800,000 records), Clustering index on (partkey, suppkey) Iterative update of all the records using T-SQL script (each update has an index lookup plan) Single commit at the end of all updates Takes 1 minute Same update processed as a merge (update … from …) Takes 15 seconds
13
13 The Challenge Given a procedure, how to obtain its batched form? Possible to manually rewrite, but time consuming and error-prone.
14
14 Our Work Automatic generation of batched forms of UDFs/stored procedures using rewrite rules Automatic rewrite of programs to replace looping with batched invocation
15
15 Batched Forms Batched form qb of a pure function q Returns results for a set of parameters Result in the form {(parameter, result)} For queries: standard techniques for creating batched forms (from work on decorrelation) Example: Original query: SELECT item-id FROM item WHERE category-id=? Batched form: SELECT pb.category-id, item-id FROM param-batch pb LEFT OUTER JOIN item ON pb.category-id = item.category-id;
16
16 Batch Safe Operations Batched forms – no guaranteed order of parameter processing Can be a problem for operations having side-effects Batch-Safe operations All operations without side effects Also a few operations with side effects E.g.: INSERT on a table with no constraints Operations inside unordered loops (e.g., cursor loops with no order-by)
17
17 Generating Batched Forms of Procedures Step 1: Create trivial batched form. Transform: procedure p(r) { body of p} To procedure p_batched(pb) { for each record r in pb { } return the collected results paired with corrsp. params; } Step 2: Optimize query invocations in the trivial batched form
18
18 Rule 1A: Rewriting a Simple Set Iteration Loop where q is any batch-safe operation with qb as its batched form Rule 1B Handles return values
19
19 Rule 1C: Batching Conditional Statements Let s = // Batched invocation where s // Merge the results Operation to Batch Return values Condition for Invocation
20
20 Rule 2: Splitting a Loop while (p) { ss1; s q ; ss2; } Table(T) t; while(p) { ss1 modified to save local variables as a tuple in t } Collect the parameters for each r in t { s q modified to use attributes of r; } Can apply Rule 1A-1C and batch. for each r in t { ss2 modified to use attributes of r; } Process the results * Conditions Apply
21
21 Rule 2: Pre-conditions The conditions make use of the data dependence graph Data Dependence Graph Nodes: program statements Edges: dependencies between statements that read/write same location Types of Dependencies Flow (Write Read), Anti (Read Write) and Output (Write Write) Loop-carried flow/anti/output Dependencies across iterations Pre-conditions for Rule-2 No loop-carried flow/output dependencies cross the points at which the loop is split No loop-carried dependencies through external data (e.g., DB)
22
22 Need for Reordering Statements (s1) while (category != null) { (s2) item-count = q1(category); (s3) sum = sum + item-count; (s4) category = getParent(category); } Flow Dependence Anti Dependence Output Dependence Loop-Carried Control Dependence Data Dependencies
23
23 while (category != null) { int item-count = q1(category); // Query to batch sum = sum + item-count; category = getParent(category); } Splitting made possible after reordering while (category != null) { int temp = category; category = getParent(category); int item-count = q1(temp); sum = sum + item-count; } Reordering Statements to Enable Rule 2
24
24 Cycles of Flow Dependencies
25
25 Rule 4: Control Dependencies while (…) { item = …; qty = …; brcode = …; if (brcode == 58) { brcode = 1; q(item, qty, brcode); } } Remember the branching decision in a boolean variable while (…) { item = …; qty = …; brcode = …; boolean cv = (brcode == 58); cv? brcode = 1; cv? q(item, qty, brcode); }
26
26 Cascading of Rules Table(…) t; while (…) { r.item = …; r.qty = …; r.brcode = …; r.cv = (r.brcode == 58); r.cv? r.brcode = 1; t.addRecord(r); } for each r in t { r.cv? q(r.item, r.qty, r.brcode); } qb( item,qty,brcode ( cv=true (t)) Rule 1C After applying Rule 2
27
27 Batching Across Multiple Levels while(…) { …. while(…) {... q(v 1, v 2, … v n ); … } … } while(…) { …. Table t (…); while(…) {... } qb(t); … } Batch q w.r.t inner loop
28
28 Parameter Batches as Nested Tables cust-idcust-classorders C101Gold C180Regular ord-iddate 101110-12-08 101212-01-09 ord-iddate 180110-12-08 180220-12-08 180308-01-09
29
29 Nest and Unnest Operations μ c (T) : Unnest T w.r.t. table-valued column c v S s (T) : Group T on columns other than S and nest the columns in S under the name s
30
30 Rule 6: Unnesting Nested Batches
31
31 Implementation and Evaluation Conceptually the techniques can be used with any language (PL/SQL, Java, C#-LINQ) We implemented for Java using the SOOT framework for program analysis Evaluation No benchmarks for procedural SQL Scenarios from three real-world applications, which faced performance problems Data Sets: TPC-H and synthetic
32
32 Application 1: ESOP Management App Process records from a file in custom format. Repeatedly called a stored procedure with mix of queries, updates and business logic. - Validate inputs - Lookup existing record - Update or Insert Rewritten program used outer-join and merge.
33
33 Application 2: Category Traversal Find the maximum size of any part in a given category and its sub-categories. Clustered Index CATEGORY (category-id) Secondary Index PART (category-id) Original Program Repeatedly executed a query that performed selection followed by grouping. Rewritten Program Group-By followed by Join
34
34 Application 3: Value Range Expansion Log scale Expand records of the form: (start-num, end-num, issued-to, …) Performed repeated inserts. Rewritten program Pulled the insert stmt out of the loop and replaced it with batched insert. ~75% improvement ~10% overhead
35
35 Related Work Query unnesting E.g. Kim [TODS82], Dayal [VLDB87], Seshadri et al. [ICDE96], Galindo Legaria et al. [SIGMOD01] We extend the benefits of unnesting to procedural nested blocks Graefe [BTW03] highlights the importance of batching in nested iteration plans (a motivation for our work) Optimizing set iteration loops in database programming languages - Lieuwen and DeWitt [SIGMOD 92] Also perform program rewriting, but Do not address batching of queries/procedure calls within the loop Limited language constructs - No WHILE loops, IF-THEN-ELSE Parallelizing compilers Kennedy[90], Padua[95] We borrow and extend the techniques
36
36 Conclusion Automatic rewrite of programs for set-orientation is possible Combining query rewrite with program analysis is the key Our experiments on real-world scenarios show significant benefits due to batching Future Work Cost-based selection of operations to batch Handling exceptions Automatically deciding whether an operation is batch- safe Implementing rewriting for PL/SQL
37
37 Questions?
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.