Multi-Query Optimization and Applications Prasan Roy Indian Institute of Technology - Bombay.

1 Multi-Query Optimization and Applications Prasan Roy Indian Institute of Technology - Bombay

2 Motivation [May 2000, Multi-Query Optimization and Applications]
Queries often involve repeated computation
–Queries on overlapping views, stored procedures, nested queries, etc.
–Update expressions for a set of overlapping materialized views
–Automatically generated queries: XML-QL complex path expressions → SQL query batches
Our focus: faster query processing by avoiding repeated computation

3 Outline
Multi-query optimization
Application to related problems
–Query result caching
–Materialized view selection and maintenance
Conclusions and future work

4 Multi-Query Optimization Prasan Roy, S. Seshadri, S. Sudarshan and Siddhesh Bhobe, Efficient and Extensible Algorithms for Multi-Query Optimization, ACM SIGMOD 2000

5 Motivating Example
Best plan for A JOIN B JOIN C, and best plan for B JOIN C JOIN D
Foreign key dependencies along the chain A, B, C, D
Total cost = 460
[Figure: the two plan trees annotated with per-node costs]

6 Motivating Example
Plans for A JOIN B JOIN C and B JOIN C JOIN D sharing the node BC
Foreign key dependencies along the chain A, B, C, D
Total cost = 370, benefit = 90
[Figure: combined plan with the shared BC node, annotated with per-node costs]

7 Problem Statement
Find the cheapest plan exploiting transiently materialized common subexpressions (CSEs)
–Assumption: no shared pipelines
[Figure: query plans over A, B, C, D with the common subexpression marked]

8 Problems
Locally optimal subplans may not be globally optimal
Mutually exclusive alternatives
–Queries: (A JOIN B JOIN C), (B JOIN C JOIN D), (C JOIN D JOIN E)
–What to share: (B JOIN C) or (C JOIN D)?
Materializing and sharing a CSE is not necessarily cheaper

9 Example
Best plan for A JOIN B JOIN C, and best plan for B JOIN C JOIN D
Foreign key dependencies along the chain A, B, C, D
Total cost = 154
[Figure: the two plan trees annotated with per-node costs]

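10 Example
Plans for A JOIN B JOIN C and B JOIN C JOIN D sharing the node BC
Foreign key dependencies along the chain A, B, C, D
Total cost = 172, benefit = -18 (sharing is more expensive here)
[Figure: combined plan with the shared BC node, annotated with per-node costs]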

11 Approach
1. Set up the search space of execution plans
2. Explore the search space to find the best execution plan

12 Representation of Plan Space
AND/OR Query DAG
–Equivalence class (OR node)
–Operation (AND node)
Example plan (solution graph)
[Figure: AND/OR DAG over relations A, B, C, D with equivalence nodes AB, BC, CD, ABC, BCD]
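A minimal sketch of how the best cost of an equivalence (OR) node could be computed over such an AND/OR DAG. The dictionary layout, the function name `best_cost`, and all cost numbers are assumptions made for illustration; they are not taken from the slides or from Volcano.

```python
from functools import lru_cache

# Hypothetical AND/OR DAG: each equivalence (OR) node maps to its alternative
# operations (AND nodes), given as (local operation cost, input nodes).
DAG = {
    "A": [(10, ())],
    "B": [(10, ())],
    "C": [(10, ())],
    "AB": [(100, ("A", "B"))],
    "BC": [(80, ("B", "C"))],
    "ABC": [(100, ("AB", "C")), (100, ("A", "BC"))],
}

@lru_cache(maxsize=None)
def best_cost(node):
    # OR node: take the cheapest alternative.
    # AND node: local cost plus the best cost of every input.
    return min(
        op_cost + sum(best_cost(child) for child in inputs)
        for op_cost, inputs in DAG[node]
    )

print(best_cost("ABC"))  # cheapest way to compute A JOIN B JOIN C in this toy DAG
```

With these made-up costs, the alternative A JOIN (B JOIN C) wins because the BC join is the cheaper of the two inner joins.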

13 DAG Generation Modifications: Unification
Volcano: duplicate subexpressions are kept separate, so no CSEs
Modification: duplicate subexpressions unified
[Figure: DAG for ABC and BCD before and after unifying the duplicate BC node]

14 DAG Generation Modifications: Subsumption
Volcano: no expression subsumption, so CSEs are missed
Modification: subsumption derivations introduced
[Figure: selection predicates on A linked by a subsumption derivation]

15 Exploring the Search Space: An Exhaustive Algorithm
Input: DAG for query Q
Output: set of nodes to materialize, and the corresponding best plan
1. Y = set of equivalence nodes in DAG
2. Pick X ⊆ Y which minimizes BestCost(Q, X)
3. Return X
BestCost(Q, X) = cost of the best plan for Q given that the nodes in X are transiently materialized
Too expensive! Need heuristics.
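The exhaustive search can be sketched as a loop over every subset of candidate nodes. This is only an illustration: `best_cost` stands in for BestCost(Q, X) and is backed by a hypothetical lookup table. The totals 460 and 370 come from the motivating example; the other two entries are invented.

```python
from itertools import chain, combinations

# Hypothetical BestCost table for two candidate nodes.
TOY_COST = {
    frozenset(): 460,              # no sharing (motivating example)
    frozenset({"BC"}): 370,        # BC materialized (motivating example)
    frozenset({"CD"}): 450,        # invented value
    frozenset({"BC", "CD"}): 380,  # invented value
}

def best_cost(materialized):
    return TOY_COST[frozenset(materialized)]

def exhaustive(candidates, best_cost):
    # Enumerate all 2^|Y| subsets X of Y and keep the cheapest one.
    ordered = sorted(candidates)
    subsets = chain.from_iterable(
        combinations(ordered, r) for r in range(len(ordered) + 1)
    )
    return set(min(subsets, key=best_cost))

chosen = exhaustive({"BC", "CD"}, best_cost)
print(chosen, best_cost(chosen))
```

The number of BestCost evaluations doubles with every additional candidate, which is why the slide calls this approach too expensive.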

16 Exploring the Search Space: A Greedy Heuristic
Input: DAG for query Q
Output: set of nodes to materialize, and the corresponding best plan
1. X = {}; Y = set of equivalence nodes in DAG
2. While (Y ≠ {})
     Pick z ∈ Y which maximizes Benefit(z | Q, X)
     If (Benefit(z | Q, X) > 0)
       Y = Y - {z}; X = X ∪ {z}
     Else
       Y = {}
3. Return X
Benefit(z | Q, X) = BestCost(Q, X) - BestCost(Q, X ∪ {z})
Appeared in [Gupta, ICDT97]. Our contribution: improve efficiency
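A direct transcription of the greedy loop into Python, as a sketch. `best_cost` again stands in for BestCost(Q, X); the lookup table is hypothetical apart from the 460 and 370 totals of the motivating example.

```python
# Hypothetical BestCost table for two candidate nodes.
TOY_COST = {
    frozenset(): 460,              # no sharing (motivating example)
    frozenset({"BC"}): 370,        # BC materialized (motivating example)
    frozenset({"CD"}): 450,        # invented value
    frozenset({"BC", "CD"}): 380,  # invented value
}

def best_cost(materialized):
    return TOY_COST[frozenset(materialized)]

def greedy(candidates, best_cost):
    X = set()            # nodes chosen for materialization
    Y = set(candidates)  # remaining candidates

    def benefit(z):
        # Benefit(z | Q, X) = BestCost(Q, X) - BestCost(Q, X U {z})
        return best_cost(X) - best_cost(X | {z})

    while Y:
        z = max(sorted(Y), key=benefit)  # sorted() makes tie-breaking deterministic
        if benefit(z) > 0:
            Y.remove(z)
            X.add(z)
        else:
            break  # same effect as Y = {} in the pseudocode
    return X

chosen = greedy({"BC", "CD"}, best_cost)
print(chosen)
```

In this toy table BC is picked first (benefit 90); CD then has benefit -10 given BC, so the loop stops.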

17 Improving Efficiency: Summary
Input: DAG for query Q
Output: set of nodes to materialize, and the corresponding best plan
1. X = {}; Y = set of equivalence nodes in DAG
2. While (Y ≠ {})
     Pick z ∈ Y which maximizes Benefit(z | Q, X)
     If (Benefit(z | Q, X) > 0)
       Y = Y - {z}; X = X ∪ {z}
     Else
       Y = {}
3. Return X
Improvements:
–Restrict the set of materialization candidates
–Compute Benefit efficiently
–Heuristically avoid computing Benefit for some nodes

18 Improving Efficiency: Only CSEs Materialized
CSEs identified in a bottom-up traversal
[Figure: query DAG over A, B, C, D with nodes AB, BC, CD, ABC, BCD and the common subexpression marked]
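One simple way to approximate CSE detection, sketched under strong assumptions: treat a non-base equivalence node as a CSE when more than one distinct parent node can use it. This is a simplified stand-in, not the sharability test of the paper. The DAG layout and the function name `common_subexpressions` are hypothetical.

```python
# Hypothetical DAG: node -> alternative operations, each a tuple of inputs.
# Base relations have no operations.
DAG = {
    "A": [], "B": [], "C": [], "D": [],
    "AB": [("A", "B")],
    "BC": [("B", "C")],
    "CD": [("C", "D")],
    "ABC": [("AB", "C"), ("A", "BC")],
    "BCD": [("BC", "D"), ("B", "CD")],
}

def common_subexpressions(dag):
    # Collect, for every node, the set of parent nodes that can consume it.
    parents = {node: set() for node in dag}
    for node, operations in dag.items():
        for inputs in operations:
            for child in inputs:
                parents[child].add(node)
    # Base relations are already stored, so only derived nodes qualify.
    return {n for n, ps in parents.items() if dag[n] and len(ps) > 1}

print(common_subexpressions(DAG))
```

Only BC is reported here: it can feed both ABC and BCD, while AB and CD each have a single parent.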

20 Efficient Benefit Computation: Incremental Re-optimization
X: set of CSEs already materialized
z: unmaterialized CSE
Best plan given X materialized → best plan given X ∪ {z} materialized
Observation: best plans change only for the ancestors of z

21 Incremental Re-optimization: Example
Best plan for X = {}, then re-optimized for z = (B JOIN C)
[Figure: query DAG over A, B, C, D with node costs before and after materializing BC]

22 Incremental Re-optimization: Efficient Propagation
Ancestor nodes visited bottom-up in a topological order
–Guarantees no revisits
Propagation path pruned if the current node's best cost remains unchanged
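The propagation step could look roughly like the sketch below. It assumes a toy DAG, a fixed `REUSE_COST` for reading a materialized result, and made-up operation costs; none of these names or numbers come from the slides. Ancestors are processed through a heap keyed by topological rank, so each node is recomputed at most once, and a node whose cost does not change does not wake up its parents.

```python
import heapq

# Hypothetical AND/OR DAG: node -> list of (operation cost, input nodes),
# listed children before parents so the dict order is a topological order.
OPS = {
    "A": [(10, ())], "B": [(10, ())], "C": [(10, ())], "D": [(10, ())],
    "AB": [(100, ("A", "B"))],
    "BC": [(100, ("B", "C"))],
    "CD": [(100, ("C", "D"))],
    "ABC": [(100, ("AB", "C")), (100, ("A", "BC"))],
    "BCD": [(100, ("BC", "D")), (100, ("B", "CD"))],
}
RANK = {node: i for i, node in enumerate(OPS)}
PARENTS = {node: set() for node in OPS}
for node, ops in OPS.items():
    for _, inputs in ops:
        for child in inputs:
            PARENTS[child].add(node)
REUSE_COST = 10  # assumed cost of reading a materialized result

def node_cost(node, cost, materialized):
    def use(child):
        return REUSE_COST if child in materialized else cost[child]
    return min(c + sum(use(i) for i in inputs) for c, inputs in OPS[node])

def optimize_all():
    cost = {}
    for node in OPS:  # full bottom-up pass with nothing materialized
        cost[node] = node_cost(node, cost, frozenset())
    return cost

def add_materialized(z, cost, materialized):
    materialized = materialized | {z}
    heap = [(RANK[p], p) for p in PARENTS[z]]
    heapq.heapify(heap)
    queued = set(PARENTS[z])
    visited = []
    while heap:
        _, node = heapq.heappop(heap)  # lowest rank first: bottom-up order
        visited.append(node)
        new_cost = node_cost(node, cost, materialized)
        if new_cost == cost[node]:
            continue  # prune: ancestors of this node are unaffected
        cost[node] = new_cost
        for parent in PARENTS[node]:
            if parent not in queued:
                queued.add(parent)
                heapq.heappush(heap, (RANK[parent], parent))
    return materialized, visited

cost = optimize_all()
materialized, visited = add_materialized("BC", cost, frozenset())
print(visited, cost["ABC"], cost["BCD"])
```

Only the ancestors of BC are touched; AB and CD keep the costs computed by the initial full pass.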

24 Avoiding Benefit Computation
Monotonicity assumption
–Benefit of a node does not increase due to materialization of other nodes
–Often true
Consequences
–An earlier benefit of a node is an upper bound on its current benefit
–Do not recompute a node's benefit if another node's current benefit is greater
Optimization costs decrease by 90%
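Under the monotonicity assumption, the greedy loop can keep stale benefits in a max-heap and recompute a benefit only when its node reaches the top. The sketch below illustrates that idea; the cost function `toy_best_cost`, its numbers (other than 460 and 370), and the stamping scheme are assumptions for illustration, not the implementation from the paper.

```python
import heapq
import math

def toy_best_cost(X):
    # Hypothetical BestCost: BC saves 90; CD saves 10 alone but costs 10
    # once BC is materialized; AB always costs 5 extra.
    cost = 460
    if "BC" in X:
        cost -= 90
    if "CD" in X:
        cost += 10 if "BC" in X else -10
    if "AB" in X:
        cost += 5
    return cost

def lazy_greedy(candidates, best_cost):
    X = frozenset()
    # Heap entries: (-upper bound on benefit, node, |X| when bound was computed).
    heap = [(-math.inf, z, -1) for z in sorted(candidates)]
    heapq.heapify(heap)
    evaluations = 0
    while heap:
        neg_bound, z, stamp = heapq.heappop(heap)
        if -neg_bound <= 0:
            break  # largest upper bound is non-positive: nothing can help
        if stamp == len(X):
            X = X | {z}  # bound is current and the largest, so pick z
            continue
        benefit = best_cost(X) - best_cost(X | {z})  # refresh a stale bound
        evaluations += 1
        heapq.heappush(heap, (-benefit, z, len(X)))
    return set(X), evaluations

chosen, evaluations = lazy_greedy({"AB", "BC", "CD"}, toy_best_cost)
print(chosen, evaluations)
```

On this toy input the benefit of AB is never recomputed after BC is chosen, because its earlier (stale) benefit is already non-positive.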

25 Experimental Results
TPCD-0.1 on Microsoft SQL Server 6.5
–using SQL rewriting for MQO

26 Alternatives to Greedy: Volcano-SH
A lightweight post-pass heuristic
1. Compute the best plan for each query independently, using Volcano
2. Find the set of nodes in the best plans to materialize (cost-based)
Similar previous work [Subramanian and Venkataraman, SIGMOD 1998]

27 Alternatives to Greedy: Volcano-RU
A lightweight extension of Volcano
1. Batched queries optimized in sequence Q1, Q2, …, Qn
2. Find the best plan for query Qi given the best plans for queries Qj, j < i
3. Cost-based materialization of nodes in best plans of Qj, j < i
Plan quality is sensitive to the query sequence

28 Experimental Results
TPCD-0.1 query batches

29 Experimental Results
TPCD-0.1 query batches

30 Features
Easily implemented
–First MQO implementation integrated with a state-of-the-art optimizer (as far as we know)
–Also partially prototyped on Microsoft SQL Server
Support for index selection
–Index modeled as physical property (like "interesting order")
Extensible and flexible
–New operators, data models
–Readily adapts to other problems: query result caching, materialized view selection/maintenance

31 Query Result Caching P. Roy, K. Ramamritham, S. Seshadri, P. Shenoy and S. Sudarshan, Don't Trash Your Intermediate Results, Cache 'em, Submitted for publication

32 Problem Statement
Minimize the total execution time of an online workload by
–Caching intermediate/final results of individual queries, and
–Using these cached results to answer later queries

33 System Model

34 Contributions
Intermediate as well as final results cached
–Optimizer-driven cache management
–Adapts to workload changes
Cache-aware cost-based optimization
–Novel framework for cached result matching

35 Experimental Results
Overheads negligible
Performance on a 900-query, TPCD-1 based uniform cube-point workload

36 Materialized View Selection and Maintenance Hoshi Mistry, Prasan Roy, K. Ramamritham and S. Sudarshan, Materialized View Selection and Maintenance Using Multi-Query Optimization, Submitted for publication

37 Problem Statement
Speed up maintenance of a set of materialized views by
–Exploiting CSEs between different view maintenance expressions
–Selecting additional views to be materialized

38 Contributions
Optimization of maintenance expressions
–Support for transiently materialized "delta" views
Nicely integrates transient vs. permanent view materialization choices

39 Experimental Results
Overheads negligible
Performance benefit for maintenance of two TPCD-0.1 based SPJA views

40 Conclusion
MQO is practical
–Low overheads, high benefits
–Easily implemented and integrated
Leads to novel solutions to related problems
–Query result caching
–Materialized view selection and maintenance

41 Future Work
Further extensions of MQO
–Shared execution pipelines
Query result caching in presence of updates
Other problems
–Continuous queries, XML view caching, etc.

42 Other Contributions
Garbage Collection in Object-Oriented Databases
–Developed a "transaction-aware" cyclic reference counting algorithm
–Provided a formal proof of correctness
S. Ashwin, Prasan Roy, S. Seshadri, Avi Silberschatz and S. Sudarshan, Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting, VLDB 1997
Prasan Roy, S. Seshadri, Avi Silberschatz, S. Sudarshan and S. Ashwin, Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting, Invited Paper, VLDB Journal, August 1998

