Multi-Query Optimization Prasan Roy Indian Institute of Technology - Bombay.



Overview
Multi-Query Optimization: What?
  – Problem statement
Multi-Query Optimization: Why?
  – Application scenarios
Multi-Query Optimization: How?
  – A cost-based practical approach
  – Prototyping Multi-Query Optimization On MS SQL-Server at Microsoft Research prototype at IIT-Bombay

Multi-Query Optimization: What?
Exploit common subexpressions (CSEs) in query optimization
Consider DAG execution plans in addition to tree execution plans

Example
[Figure: two separate plan trees. Best plan for A JOIN B JOIN C over relations A, B, C; best plan for B JOIN C JOIN D over relations B, C, D.]

Example (contd)
Alternative:
[Figure: a single DAG plan over relations A, B, C, D, with one node labeled "Common Subexpression".]

Multi-Query Optimization: Why?
Queries on views, nested queries, …
Overlapping query batches generated by applications
Update expressions for materialized views
Query invocations with different parameters
...
Practical solutions needed!

Multi-Query Optimization: How?
Set up the search space
  – Identify the common subexpressions
Explore the search space efficiently
  – Find the best way to exploit the common subexpressions

Problems
Materializing and sharing a CSE is not necessarily cheaper
Mutually exclusive alternatives
[Figure: queries (A JOIN B JOIN C), (B JOIN C JOIN D), (C JOIN D JOIN E), with candidate shared subexpressions (B JOIN C) and (C JOIN D).]
What to share: (B JOIN C) or (C JOIN D)?
Huge search space!

Earlier Work: Practical Solutions
As early as 1976: preprocess query before optimization [Hall, IBM-JRD76]
As late as 1998: postprocess optimized plans [Subramanian and Venkataraman, SIGMOD98]
Query optimizer is not aware!

Earlier Work: Theoretical Studies
[Sellis, TODS88], [Cosar et al., CIKM93], [Shim et al., DKE94], ...
Set of queries {Q1, Q2, …, Qn}
For each query Qi, set of execution plans {Pi1, Pi2, …, Pim}
Pij is a set of tasks from a common pool
Pick a plan for each query such that the cost of tasks in the union is minimized
Not integrated with existing optimizers, no practical study
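The formulation above can be made concrete with a small exhaustive search. This is only a sketch of the problem statement, not of any cited algorithm: the task names, the costs, and the function `best_plan_combination` are all hypothetical.

```python
from itertools import product

def best_plan_combination(plans_per_query, task_cost):
    """Pick one plan per query so that the union of their tasks is cheapest."""
    best_cost, best_choice = float("inf"), None
    for choice in product(*plans_per_query):
        tasks = set().union(*choice)          # shared tasks are paid for once
        cost = sum(task_cost[t] for t in tasks)
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice

# Hypothetical task costs (assumed numbers, for illustration only).
task_cost = {"AB": 10, "AB_C": 10, "BC": 12, "A_BC": 10,
             "CD": 10, "B_CD": 10, "BC_D": 10}

# Q1 = A JOIN B JOIN C, Q2 = B JOIN C JOIN D; each plan is a set of tasks.
P11, P12 = frozenset({"AB", "AB_C"}), frozenset({"BC", "A_BC"})
P21, P22 = frozenset({"CD", "B_CD"}), frozenset({"BC", "BC_D"})

cost, choice = best_plan_combination([[P11, P12], [P21, P22]], task_cost)
print(cost, [sorted(p) for p in choice])
```

With these assumed numbers the locally best plans (P11 and P21, 20 each) give a combined cost of 40, while the two locally worse plans share the task BC and cost 32 together. The search is exponential in the number of queries, which is consistent with the "huge search space" concern raised earlier.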

Microsoft Experience with Paul Larson, Microsoft Research

Prototyping MQO on SQL-Server
Add multi-query optimization capability to SQL-Server
Well integrated with the existing optimization framework
  – another optimization level
  – minimal changes, minimal extra lines of code
First cut: exhaustive
  – How slow can it be?
A working prototype by the summer-end

What (almost) already exists in the SQL-Server Optimizer
AND/OR Query-DAG representation of plan space
[Figure: Query-DAG over relations A, B, C, D, with labels "Group (OR node)" and "Op (AND node)".]

What actually exists in the SQL-Server Optimizer
Relations cloned for each use
[Figure: Query-DAG over A, B1, C1, D, B2, C2, where B1/B2 and C1/C2 are clones of B and C.]

Preprocessing Step: Query-DAG Unification
Performed in a bottom-up traversal
[Figure: Query-DAG over A, B1, C1, D, B2, C2 with equivalent nodes being unified.]

Common Subexpression Identification
Unified nodes are CSEs
[Figure: unified Query-DAG over A, B, C, D, with one node labeled "Common Subexpression".]
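A minimal sketch of bottom-up unification, assuming queries are given as nested tuples and that JOIN is commutative. It only merges structurally equal subtrees; the optimizer described in the slides unifies equivalence groups of an AND/OR DAG, which is more general. The names `find_cses` and `consumers` are hypothetical.

```python
def find_cses(queries):
    """Return operator nodes used by more than one consumer after unification."""
    consumers = {}      # canonical node key -> set of distinct consumers
    operators = set()   # keys of non-leaf nodes

    def visit(expr):
        if isinstance(expr, str):             # base relation: its name is its key
            return expr
        op, *children = expr
        # Children are visited first (bottom-up), then sorted so that
        # B JOIN C and C JOIN B map to the same key.
        child_keys = sorted(visit(c) for c in children)
        key = "(" + f" {op} ".join(child_keys) + ")"
        operators.add(key)
        for c in child_keys:
            consumers.setdefault(c, set()).add(key)
        return key

    for name, expr in queries.items():
        root = visit(expr)
        consumers.setdefault(root, set()).add("query:" + name)

    return sorted(k for k in operators if len(consumers[k]) > 1)

queries = {
    "Q1": ("JOIN", "A", ("JOIN", "B", "C")),
    "Q2": ("JOIN", ("JOIN", "C", "B"), "D"),   # clone of B JOIN C, written reversed
}
print(find_cses(queries))
```

Because every subexpression is reduced to one canonical key, the two clones of B JOIN C collapse into a single node with two consumers, which is what marks it as a common subexpression.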

Exploring the Search Space: A Naïve Algorithm
For each set S of common subexpressions
  – materialize each node in S
  – MatCost(S) = sum of materialization costs of the nodes in S
  – invoke optimizer to find the best plan for the root and for each node in S
  – CompCost(S) = sum of costs of above plans
  – Cost(S) = MatCost(S) + CompCost(S)
Pick S with the minimum Cost
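The naïve algorithm can be sketched on a toy AND/OR DAG. Everything concrete here is an assumption made for illustration: the DAG, the operator costs, the fixed `MAT_COST` and `REUSE_COST`, and the tiny recursive `compute_cost` that stands in for the real optimizer.

```python
from itertools import combinations

MAT_COST = 5     # assumed cost of writing a materialized result
REUSE_COST = 5   # assumed cost of reading it back

# Hypothetical AND/OR DAG: group -> list of (local_cost, child_groups).
DAG = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])],
    "D": [(10, [])], "E": [(10, [])],
    "AB": [(20, ["A", "B"])], "BC": [(30, ["B", "C"])],
    "CD": [(28, ["C", "D"])], "DE": [(20, ["D", "E"])],
    "ABC": [(10, ["AB", "C"]), (10, ["A", "BC"])],
    "BCD": [(10, ["BC", "D"]), (10, ["B", "CD"])],
    "CDE": [(10, ["CD", "E"]), (10, ["C", "DE"])],
    "Q": [(0, ["ABC", "BCD", "CDE"])],   # pseudo-root over the three queries
}

def compute_cost(group, S, memo):
    """Best plan cost for `group`, reading children in S from their materialization."""
    if group not in memo:
        memo[group] = min(
            local + sum(REUSE_COST if c in S else compute_cost(c, S, memo)
                        for c in children)
            for local, children in DAG[group]
        )
    return memo[group]

def total_cost(S, root="Q"):
    memo = {}
    mat_cost = MAT_COST * len(S)                                  # MatCost(S)
    comp_cost = compute_cost(root, S, memo) + sum(
        compute_cost(n, S, memo) for n in S)                      # CompCost(S)
    return mat_cost + comp_cost

def naive_mqo(candidates):
    """Try every subset of the candidate CSEs and keep the cheapest."""
    subsets = (frozenset(c) for r in range(len(candidates) + 1)
               for c in combinations(candidates, r))
    return min(((total_cost(S), S) for S in subsets), key=lambda pair: pair[0])

print(naive_mqo(["BC", "CD"]))
```

With these assumed costs, sharing nothing costs 188, sharing only BC costs 165, sharing only CD costs 163, and sharing both costs 183, so the two candidates behave as mutually exclusive alternatives. The loop visits 2^k subsets for k candidates, which is why the first cut is described as exhaustive.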

Doing Better: Incremental Reoptimization
Goal: best plan for Si → best plan for Sj
Observation
  – Best plans change for only the ancestors of nodes in Si XOR Sj
Algorithm:
  – Propagate changed costs in bottom-up topological order from nodes in Si XOR Sj
  – Update min-cost plan at each node visited
  – Do not propagate further up if min-cost plan remains unchanged at a node
Work done at IIT-Bombay

Incremental Optimization: Example
Si = ∅
[Figure: Query-DAG over A, B, C, D with the min-cost plan marked at each node.]

Incremental Optimization: Example
Si = ∅, Sj = {(B JOIN C)}
[Figure: the same Query-DAG with (B JOIN C) now materialized; previous and new min-cost plans marked.]
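The propagation step can be sketched as follows. The DAG, the costs, and the helper names (`full_optimize`, `reoptimize`, `RANK`, `PARENTS`) are assumptions for illustration and are not taken from the prototype.

```python
import heapq

REUSE_COST = 5   # assumed cost of reading a materialized result

# Hypothetical AND/OR DAG: group -> list of (local_cost, child_groups),
# listed so that children appear before their parents.
DAG = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])],
    "D": [(10, [])], "E": [(10, [])],
    "AB": [(20, ["A", "B"])], "BC": [(30, ["B", "C"])],
    "CD": [(28, ["C", "D"])], "DE": [(20, ["D", "E"])],
    "ABC": [(10, ["AB", "C"]), (10, ["A", "BC"])],
    "BCD": [(10, ["BC", "D"]), (10, ["B", "CD"])],
    "CDE": [(10, ["CD", "E"]), (10, ["C", "DE"])],
    "Q": [(0, ["ABC", "BCD", "CDE"])],
}
ORDER = list(DAG)                              # bottom-up topological order
RANK = {g: i for i, g in enumerate(ORDER)}
PARENTS = {}
for g, ops in DAG.items():
    for _, children in ops:
        for c in children:
            PARENTS.setdefault(c, set()).add(g)

def best_cost(g, cost, S):
    return min(local + sum(REUSE_COST if c in S else cost[c] for c in children)
               for local, children in DAG[g])

def full_optimize(S):
    cost = {}
    for g in ORDER:
        cost[g] = best_cost(g, cost, S)
    return cost

def reoptimize(cost, S_old, S_new):
    """Update `cost` in place from S_old to S_new; return the groups visited."""
    heap, queued = [], set()
    def push_parents(n):
        for p in PARENTS.get(n, ()):
            if p not in queued:
                queued.add(p)
                heapq.heappush(heap, (RANK[p], p))
    for n in S_old ^ S_new:                    # nodes whose status changed
        push_parents(n)
    visited = []
    while heap:
        _, g = heapq.heappop(heap)             # lowest rank first: bottom-up
        visited.append(g)
        new = best_cost(g, cost, S_new)
        if new != cost[g]:                     # propagate only real changes
            cost[g] = new
            push_parents(g)
    return visited

cost = full_optimize({"CD"})
visited = reoptimize(cost, {"CD"}, {"BC", "CD"})
print(visited, cost["Q"])
```

In this run only ABC, BCD, and the root are revisited. The cost at BCD stays at 25, so it does not trigger further propagation; the root is reached only because ABC changed.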

Current Status
A first-cut implementation working
  – Lines of C++ code added: 1500 approx.

Future Work
Performance tuning and smarter data structures needed
Ways to restrict enumeration taking DAG structure into account

Research at IIT-Bombay: Heuristics for MQO with S. Sudarshan, S. Seshadri

A Greedy Heuristic
Pick nodes for materialization one at a time, in “benefit” order
Benefit(n) = reduction in cost on materialization of n
Benefit computation is expensive
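A minimal sketch of the greedy loop, assuming a toy AND/OR DAG, fixed `MAT_COST` and `REUSE_COST`, and a small recursive cost function in place of the real optimizer. All of these are hypothetical. Each benefit computation calls `total_cost`, that is, a reoptimization, which is where the expense comes from.

```python
MAT_COST = 5     # assumed cost of writing a materialized result
REUSE_COST = 5   # assumed cost of reading it back

# Hypothetical AND/OR DAG: group -> list of (local_cost, child_groups).
DAG = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])],
    "D": [(10, [])], "E": [(10, [])],
    "AB": [(20, ["A", "B"])], "BC": [(30, ["B", "C"])],
    "CD": [(28, ["C", "D"])], "DE": [(20, ["D", "E"])],
    "ABC": [(10, ["AB", "C"]), (10, ["A", "BC"])],
    "BCD": [(10, ["BC", "D"]), (10, ["B", "CD"])],
    "CDE": [(10, ["CD", "E"]), (10, ["C", "DE"])],
    "Q": [(0, ["ABC", "BCD", "CDE"])],
}

def compute_cost(group, S, memo):
    if group not in memo:
        memo[group] = min(
            local + sum(REUSE_COST if c in S else compute_cost(c, S, memo)
                        for c in children)
            for local, children in DAG[group]
        )
    return memo[group]

def total_cost(S, root="Q"):
    memo = {}
    return (MAT_COST * len(S) + compute_cost(root, S, memo)
            + sum(compute_cost(n, S, memo) for n in S))

def greedy(candidates):
    """Add the highest-benefit node until no node has positive benefit."""
    S = frozenset()
    current = total_cost(S)
    remaining = set(candidates)
    while remaining:
        # Benefit(n) = reduction in total cost if n is also materialized.
        benefit, n = max((current - total_cost(S | {n}), n) for n in remaining)
        if benefit <= 0:
            break
        S |= {n}
        current -= benefit
        remaining.remove(n)
    return current, S

print(greedy(["BC", "CD"]))
```

With the assumed costs, CD has benefit 25 and BC has benefit 23 at the start, so CD is picked first. After that, adding BC would raise the cost from 163 to 183, so the loop stops.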

Monotonicity Assumption
Benefit of a node does not increase due to materialization of other nodes
Exploited to avoid some benefit computations
Optimization costs decrease by 90%
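One common way to exploit such an assumption is lazy evaluation: previously computed benefits are kept in a heap as upper bounds and recomputed only for the node at the top. The sketch below uses that standard technique; whether the original implementation works exactly this way is not stated in the slides. The cost table `COST` and the names `lazy_greedy` and `total_cost` are hypothetical.

```python
import heapq

# Hypothetical total cost for every set of materialized nodes (monotone benefits).
COST = {
    frozenset(): 100,
    frozenset("X"): 70, frozenset("Y"): 80, frozenset("Z"): 95,
    frozenset("XY"): 52, frozenset("XZ"): 65, frozenset("YZ"): 75,
    frozenset("XYZ"): 48,
}

def total_cost(S):
    return COST[frozenset(S)]

def lazy_greedy(candidates, total_cost):
    S = frozenset()
    current = total_cost(S)
    evaluations = 0
    heap = []
    for n in candidates:
        heapq.heappush(heap, (-(current - total_cost(S | {n})), n))
        evaluations += 1
    fresh = set(candidates)        # nodes whose heap entry is exact for current S
    while heap:
        neg_bound, n = heapq.heappop(heap)
        if n in fresh:
            benefit = -neg_bound
        else:
            benefit = current - total_cost(S | {n})   # recompute stale bound
            evaluations += 1
            fresh.add(n)
            if heap and benefit < -heap[0][0]:
                # Another node may still be better: requeue and look at it.
                heapq.heappush(heap, (-benefit, n))
                continue
        if benefit <= 0:
            break                  # by monotonicity no other node can help
        S |= {n}
        current -= benefit
        fresh = set()              # all remaining entries are now upper bounds
    return current, S, evaluations

print(lazy_greedy(["X", "Y", "Z"], total_cost))
```

On this toy table the lazy variant needs 5 benefit computations where the plain greedy loop would need 6 (3 + 2 + 1). After X is chosen, Y's recomputed benefit of 18 already exceeds Z's old bound of 5, so Z is not re-evaluated in that round.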

A Postpass Heuristic: Volcano-SH
No change in Volcano best plan computation
Cost-based materialization of nodes in best Volcano plan
Implementation easy
Low overhead
Optimizer is not aware
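A rough sketch of such a postpass over an already chosen plan. The decision rule is an assumption (materialize a node when computing it once, writing it, and reading it back for each use is cheaper than recomputing it for each use); the slide only says the choice is cost-based. The plan, the costs, and the function name `postpass` are hypothetical.

```python
MAT_COST = 5     # assumed cost of writing a materialized result
REUSE_COST = 5   # assumed cost of reading it back

# Hypothetical consolidated best plan: node -> (local_cost, children),
# listed bottom-up. BC is used by both query roots.
PLAN = {
    "A": (10, []), "B": (10, []), "C": (10, []), "D": (10, []),
    "BC": (30, ["B", "C"]),
    "ABC": (10, ["A", "BC"]),
    "BCD": (10, ["BC", "D"]),
    "Q": (0, ["ABC", "BCD"]),     # pseudo-root over the two queries
}

def postpass(plan, root="Q"):
    # Count how many times each node is used in the plan.
    uses = {}
    for _, children in plan.values():
        for c in children:
            uses[c] = uses.get(c, 0) + 1

    cost, materialized = {}, set()
    for n, (local, children) in plan.items():      # bottom-up traversal
        cost[n] = local + sum(REUSE_COST if c in materialized else cost[c]
                              for c in children)
        k = uses.get(n, 0)
        # Assumed rule: compute once + write + k reads  <  k recomputations.
        if k > 1 and cost[n] + MAT_COST + k * REUSE_COST < k * cost[n]:
            materialized.add(n)

    total = cost[root] + sum(cost[n] + MAT_COST for n in materialized)
    return materialized, total

print(postpass(PLAN))
```

The plan itself is never revised here, only annotated, which matches the remark that the optimizer is not aware of the sharing.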

A Volcano Variant: Volcano-RU
Volcano best plan search aware of best plans for earlier queries
  – Cost based materialization of best plan nodes that are used by later queries
Implementation easy
Low overhead
Local decisions, plan quality sensitive to query sequence

Experimental Conclusion
Greedy
  – Expensive, but practical
  – Overheads typically offset by plan quality, especially for expensive “canned” queries
  – Almost linear scaleup with query batch size; typically, only the width of the Query DAG affected
Volcano-RU
  – Mostly better than Volcano-SH, same overhead
  – Negligible overhead over Volcano; recommended for cheap but complex queries

Conclusion
Multi-query optimization is needed
Multi-query optimization is practical!
Multi-query optimization is an easy next step for DAG-based optimizers