Multi-Query Optimization and Applications
Prasan Roy, Indian Institute of Technology - Bombay


May 2000, Multi-Query Optimization and Applications

Motivation
Queries often involve repeated computation
– Queries on overlapping views, stored procedures, nested queries, etc.
– Update expressions for a set of overlapping materialized views
– Automatically generated queries, e.g. XML-QL complex path expressions → SQL query batches
Our focus: faster query processing by avoiding repeated computation

Outline
Multi-query optimization
Application to related problems
– Query result caching
– Materialized view selection and maintenance
Conclusions and future work

Multi-Query Optimization
Prasan Roy, S. Seshadri, S. Sudarshan and Siddhesh Bhobe, Efficient and Extensible Algorithms for Multi-Query Optimization, ACM SIGMOD 2000

Motivating Example
[Figure: best plan for A JOIN B JOIN C and best plan for B JOIN C JOIN D]
Foreign key dependency: A → B → C → D
Total Cost =

Motivating Example
[Figure: plan over A, B, C, D with a shared BC node]
Foreign key dependency: A → B → C → D
Total Cost = 370
Benefit =

Problem Statement
Find the cheapest plan exploiting transiently materialized common subexpressions (CSEs)
– Assumption: No shared pipelines
[Figure: plan over A, B, C, D with the common subexpression marked]

Problems
Locally optimal subplans may not be globally optimal
Mutually exclusive alternatives
– Queries: (A JOIN B JOIN C), (B JOIN C JOIN D), (C JOIN D JOIN E)
– What to share: (B JOIN C) or (C JOIN D)?
Materializing and sharing a CSE is not necessarily cheaper

Example
[Figure: best plan for A JOIN B JOIN C and best plan for B JOIN C JOIN D]
Foreign key dependency: A → B → C → D
Total Cost =

Example
[Figure: plan over A, B, C, D with a shared BC node]
Foreign key dependency: A → B → C → D
Total Cost = 172
Benefit = -18

Approach
1. Set up the search space of execution plans
2. Explore the search space to find the best execution plan

Representation of Plan Space
AND/OR Query DAG
– Equivalence Class (OR node)
– Operation (AND node)
[Figure: AND/OR DAG over A, B, C, D with equivalence nodes AB, BC, CD, ABC and BCD, and an example plan (solution graph)]
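
The AND/OR structure on this slide can be sketched in a few lines of Python. This is only an illustration under assumed names and numbers: the classes Operation and EquivalenceClass, the helper scan, and all costs are hypothetical and are not taken from the talk. An equivalence class (OR node) picks the cheapest of its alternatives; an operation (AND node) needs all of its inputs.

```python
from dataclasses import dataclass, field

@dataclass
class Operation:
    """AND node: one operator that needs all of its input equivalence classes."""
    name: str
    cost: float                      # hypothetical local operator cost
    inputs: list = field(default_factory=list)

@dataclass
class EquivalenceClass:
    """OR node: alternative operations that all produce the same result."""
    name: str
    alternatives: list = field(default_factory=list)

def best_cost(node):
    # OR node: take the cheapest alternative; AND node: add up all inputs.
    return min(op.cost + sum(best_cost(child) for child in op.inputs)
               for op in node.alternatives)

def scan(name, cost=10):
    # Base relations are modeled as classes with a single scan operation.
    return EquivalenceClass(name, [Operation("scan " + name, cost)])

a, b, c = scan("A"), scan("B"), scan("C")
ab = EquivalenceClass("AB", [Operation("A JOIN B", 30, [a, b])])
bc = EquivalenceClass("BC", [Operation("B JOIN C", 50, [b, c])])
abc = EquivalenceClass("ABC", [
    Operation("A JOIN (BC)", 20, [a, bc]),   # one way to compute ABC
    Operation("(AB) JOIN C", 60, [ab, c]),   # an equivalent alternative
])
print(best_cost(abc))
```

Choosing one alternative at every OR node reached from the root yields a plan, which is what the slide calls a solution graph.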

DAG Generation Modifications: Unification
Volcano: duplicate subexpressions ⇒ no CSEs!
Modification: duplicate subexpressions unified
[Figure: DAGs for ABC and BCD in which the BC node appears twice]
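
One common way to unify duplicate subexpressions is to look every expression up under a canonical key before creating a node. The sketch below assumes join-only queries, so the set of joined relations can serve as the key; the Memo class and its dictionary layout are hypothetical and stand in for whatever the optimizer actually uses.

```python
class Memo:
    """Maps a canonical key (the set of joined relations) to one equivalence node."""

    def __init__(self):
        self.classes = {}

    def insert(self, expr):
        # expr is a relation name or a pair (left, right) meaning left JOIN right.
        if isinstance(expr, str):
            key = frozenset([expr])
            return self.classes.setdefault(key, {"relations": key, "ops": set()})
        left, right = (self.insert(e) for e in expr)
        key = left["relations"] | right["relations"]
        # setdefault returns the existing node when the key is already known,
        # so a repeated subexpression is unified instead of duplicated.
        node = self.classes.setdefault(key, {"relations": key, "ops": set()})
        node["ops"].add(frozenset([left["relations"], right["relations"]]))
        return node

memo = Memo()
memo.insert(("A", ("B", "C")))      # query 1: A JOIN (B JOIN C)
memo.insert((("C", "B"), "D"))      # query 2: (C JOIN B) JOIN D
bc = memo.classes[frozenset("BC")]
print(len(memo.classes))            # A, B, C, D, BC, ABC, BCD
```

Both queries end up pointing at the same BC node, which is what makes it visible as a common subexpression.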

DAG Generation Modifications: Subsumption
Volcano: no expression subsumption ⇒ missed CSEs
Modification: subsumption derivations introduced
[Figure: selection nodes including σ(A<10), σ(A>50) and σ(A>10), linked by subsumption derivations]

Exploring the Search Space: An Exhaustive Algorithm
Input: DAG for query Q
Output: Set of nodes to materialize, and the corresponding best plan
1. Y = set of equivalence nodes in DAG
2. Pick X ⊆ Y which minimizes BestCost(Q, X)
3. Return X
BestCost(Q, X) = cost of the best plan for Q given that the nodes in X are transiently materialized
Too expensive! Need heuristics.
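
A minimal sketch of the exhaustive search, assuming a simplified cost model that is not spelled out on the slide: a materialized node is computed once, written at MAT_COST, and read at REUSE_COST each time it is used. The DAG, all numbers, and the pseudo-root Q that ties the batch together are made up for illustration.

```python
from itertools import combinations

# Toy AND/OR DAG: equivalence node -> list of (operator_cost, child_nodes).
DAG = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])], "D": [(10, [])],
    "BC": [(100, ["B", "C"])],
    "ABC": [(20, ["A", "BC"])],
    "BCD": [(20, ["BC", "D"])],
    "Q": [(0, ["ABC", "BCD"])],      # pseudo-root for the query batch
}
MAT_COST, REUSE_COST = 5, 5          # assumed write and read costs

def best_cost(root, X):
    """Cost of the best plan for root when the nodes in X are materialized."""
    memo = {}

    def cost(n):
        # Cost of computing n once; materialized children are read, not recomputed.
        if n not in memo:
            memo[n] = min(
                c + sum(REUSE_COST if k in X else cost(k) for k in kids)
                for c, kids in DAG[n]
            )
        return memo[n]

    return cost(root) + sum(cost(m) + MAT_COST for m in X)

def exhaustive(root):
    nodes = list(DAG)
    subsets = (frozenset(s)
               for r in range(len(nodes) + 1)
               for s in combinations(nodes, r))
    return min(subsets, key=lambda X: best_cost(root, X))

best = exhaustive("Q")
print(sorted(best), best_cost("Q", best))
```

With n equivalence nodes the loop evaluates 2^n subsets, which is why the slide calls the approach too expensive.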

Exploring the Search Space: A Greedy Heuristic
Input: DAG for query Q
Output: Set of nodes to materialize, and the corresponding best plan
1. X = {}; Y = set of equivalence nodes in DAG
2. While( Y ≠ {} )
     Pick z ∈ Y which maximizes Benefit(z | Q, X)
     If( Benefit(z | Q, X) > 0 )
       Y = Y - {z}; X = X U {z}
     Else
       Y = {}
3. Return X
Benefit(z | Q, X) = BestCost(Q, X) - BestCost(Q, X U {z})
Appeared in [Gupta, ICDT97]. Our contribution: improve efficiency
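
The greedy loop translates almost line for line into Python. In this sketch best_cost is passed in as a function of X alone (the query Q is assumed to be fixed inside it), and TOY_COSTS is a made-up lookup table standing in for a real optimizer call.

```python
def greedy(candidates, best_cost):
    """Pick nodes to materialize one at a time while the benefit stays positive."""
    X, Y = frozenset(), set(candidates)
    while Y:
        current = best_cost(X)
        # Benefit(z | Q, X) = BestCost(Q, X) - BestCost(Q, X U {z})
        benefit = {z: current - best_cost(X | {z}) for z in Y}
        z = max(sorted(Y), key=benefit.get)   # sorted only to break ties predictably
        if benefit[z] > 0:
            Y.remove(z)
            X = X | {z}
        else:
            Y = set()                         # nothing left that helps: stop
    return X

# Hypothetical BestCost values for a batch with candidates BC, ABC and BCD.
TOY_COSTS = {
    frozenset(): 300,
    frozenset({"BC"}): 195,
    frozenset({"ABC"}): 310,
    frozenset({"BCD"}): 310,
    frozenset({"BC", "ABC"}): 205,
    frozenset({"BC", "BCD"}): 205,
}
chosen = greedy(["BC", "ABC", "BCD"], TOY_COSTS.__getitem__)
print(sorted(chosen))
```

Each round calls best_cost once per remaining candidate, which is the cost the following slides set out to reduce.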

Improving Efficiency: Summary
The greedy algorithm of the previous slide is made efficient in three ways:
– Restrict the set of materialization candidates
– Compute Benefit efficiently
– Heuristically avoid computing Benefit for some nodes

Improving Efficiency: Only CSEs Materialized
CSEs identified in a bottom-up traversal
[Figure: AND/OR DAG over A, B, C, D with the common subexpression marked]

Efficient Benefit Computation: Incremental Re-optimization
X: set of CSEs already materialized
z: unmaterialized CSE
Best plan given X materialized → best plan given X U {z} materialized
Observation: best plans change only for the ancestors of z

Incremental Re-optimization: Example
X = {}, z = (B JOIN C)
[Figure: DAG with nodes AB, BC, CD, ABC and BCD over A, B, C, D, showing the best plan]

Incremental Re-optimization: Efficient Propagation
Ancestor nodes visited bottom-up in a topological order
– Guarantees no revisits
Propagation path pruned if the current node’s best cost remains unchanged
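
The propagation step can be sketched with a priority queue keyed on topological position. Everything concrete here is an assumption for illustration: the toy DAG, its costs, REUSE_COST, and the function names. Only the parents of z are queued at first; a node's parents are queued only if its best cost actually changed.

```python
import heapq

# Toy AND/OR DAG, listed children-first: node -> [(operator_cost, children), ...]
DAG = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])], "D": [(10, [])],
    "AB": [(30, ["A", "B"])],
    "BC": [(100, ["B", "C"])],
    "ABC": [(20, ["A", "BC"]), (200, ["AB", "C"])],
    "BCD": [(20, ["BC", "D"])],
    "Q": [(0, ["ABC", "BCD"])],
}
REUSE_COST = 5                                   # assumed cost of reading a materialized node
ORDER = {n: i for i, n in enumerate(DAG)}        # topological position
PARENTS = {n: set() for n in DAG}
for node, alternatives in DAG.items():
    for _, children in alternatives:
        for child in children:
            PARENTS[child].add(node)

def node_cost(n, cost, X):
    return min(c + sum(REUSE_COST if k in X else cost[k] for k in kids)
               for c, kids in DAG[n])

def full_optimize(X):
    cost = {}
    for n in DAG:                                # children-first, so inputs are ready
        cost[n] = node_cost(n, cost, X)
    return cost

def reoptimize(cost, X, z):
    """Update cost in place after adding z to X; return the nodes revisited."""
    X = set(X) | {z}
    heap = [(ORDER[p], p) for p in PARENTS[z]]
    heapq.heapify(heap)
    queued, visited = set(PARENTS[z]), []
    while heap:
        _, n = heapq.heappop(heap)               # lowest topological position first
        visited.append(n)
        new = node_cost(n, cost, X)
        if new == cost[n]:
            continue                             # best cost unchanged: prune this path
        cost[n] = new
        for p in PARENTS[n]:
            if p not in queued:                  # each ancestor is queued at most once
                queued.add(p)
                heapq.heappush(heap, (ORDER[p], p))
    return visited

cost = full_optimize(set())
first = reoptimize(cost, set(), "AB")            # ABC keeps its old best plan
second = reoptimize(cost, {"AB"}, "BC")          # change propagates up to Q
print(first, second, cost["Q"])
```

In the first call only ABC is revisited, because its best plan does not use AB and so nothing above it can change.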

Avoiding Benefit Computation
Monotonicity assumption
– Benefit of a node does not increase due to materialization of other nodes
– Often true
⇒ An earlier benefit of a node is an upper bound on its current benefit
⇒ Do not recompute a node’s benefit if another node’s current benefit is greater
Optimization costs decrease by 90%
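
Under the monotonicity assumption, stale benefits can be kept in a max-heap and refreshed only when they reach the top. The sketch below is one way to do this, not necessarily the implementation used in the work; lazy_greedy, the stamp bookkeeping, and the TOY_COSTS table are all hypothetical.

```python
import heapq

def lazy_greedy(candidates, best_cost):
    """Greedy selection that recomputes a benefit only when its stale bound is on top."""
    X = frozenset()
    current = best_cost(X)
    evaluations = 0
    heap = []                                  # entries: (-benefit_bound, node, len(X) when computed)
    for z in sorted(candidates):
        evaluations += 1
        heap.append((-(current - best_cost(X | {z})), z, 0))
    heapq.heapify(heap)
    while heap:
        neg_bound, z, stamp = heapq.heappop(heap)
        bound = -neg_bound
        if bound <= 0:
            break                              # even the largest upper bound does not help
        if stamp == len(X):
            # Fresh benefit, and at least as large as every other (upper) bound: accept.
            X = X | {z}
            current -= bound
        else:
            # Stale bound: recompute against the current X and put it back.
            evaluations += 1
            heapq.heappush(heap, (-(current - best_cost(X | {z})), z, len(X)))
    return X, evaluations

# Hypothetical BestCost values that satisfy the monotonicity assumption.
TOY_COSTS = {
    frozenset(): 100,
    frozenset("P"): 60, frozenset("Q"): 70, frozenset("R"): 95,
    frozenset("PQ"): 50, frozenset("PR"): 58,
    frozenset("PQR"): 49,
}
chosen, evaluations = lazy_greedy("PQR", TOY_COSTS.__getitem__)
print(sorted(chosen), evaluations)
```

On this toy table the lazy version needs 5 benefit evaluations where the plain greedy loop would need 6, because R's stale bound stays below Q's fresh benefit in the second round. If the assumption fails for some workload, the result may differ from plain greedy.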

Experimental Results
TPCD-0.1 on Microsoft SQL Server 6.5
– using SQL rewriting for MQO

Alternatives to Greedy: Volcano-SH
A lightweight post-pass heuristic
1. Compute the best plan for each query independently, using Volcano
2. Find the set of nodes in the best plans to materialize (cost-based)
Similar previous work [Subramanian and Venkataraman, SIGMOD 1998]

Alternatives to Greedy: Volcano-RU
A lightweight extension of Volcano
1. Batched queries optimized in sequence Q1, Q2, …, Qn
2. Find the best plan for query Qi given the best plans for queries Qj, j < i
3. Cost-based materialization of nodes in best plans of Qj, j < i
Plan quality sensitive to the query sequence

Experimental Results
TPCD-0.1 query batches

Experimental Results
TPCD-0.1 query batches

Features
Easily implemented
– First MQO implementation integrated with a state-of-the-art optimizer (as far as we know)
– Also partially prototyped on Microsoft SQL-Server
Support for index selection
– Index modeled as physical property (like “interesting order”)
Extensible and flexible
– New operators, data models
– Readily adapts to other problems: query result caching, materialized view selection/maintenance

Query Result Caching
P. Roy, K. Ramamritham, S. Seshadri, P. Shenoy and S. Sudarshan, Don’t Trash Your Intermediate Results, Cache ‘em, Submitted for publication

Problem Statement
Minimize the total execution time of an online workload by
– Caching intermediate/final results of individual queries, and
– Using these cached results to answer later queries

System Model

Contributions
Intermediate as well as final results cached
– Optimizer-driven cache management
– Adapts to workload changes
Cache-aware cost-based optimization
– Novel framework for cached result matching

Experimental Results
Overheads negligible
Performance on 900-query TPCD-1 based uniform cube-point workload

Materialized View Selection and Maintenance
Hoshi Mistry, Prasan Roy, K. Ramamritham and S. Sudarshan, Materialized View Selection and Maintenance Using Multi-Query Optimization, Submitted for publication

Problem Statement
Speed up maintenance of a set of materialized views by
– Exploiting CSEs between different view maintenance expressions
– Selecting additional views to be materialized

Contributions
Optimization of maintenance expressions
– Support for transiently materialized “delta” views
Nicely integrates transient vs permanent view materialization choices

Experimental Results
Overheads negligible
Performance benefit for maintenance of two TPCD-0.1 based SPJA views

Conclusion
MQO is practical
– Low overheads, high benefits
– Easily implemented and integrated
Leads to novel solutions to related problems
– Query result caching
– Materialized view selection and maintenance

Future Work
Further extensions of MQO
– Shared execution pipelines
Query result caching in presence of updates
Other problems
– Continuous queries, XML view caching, etc.

Other Contributions
Garbage Collection in Object Oriented Databases
– Developed a “transaction-aware” cyclic reference counting algorithm
– Provided a formal proof of correctness
S. Ashwin, Prasan Roy, S. Seshadri, Avi Silberschatz and S. Sudarshan, Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting, VLDB 1997
Prasan Roy, S. Seshadri, Avi Silberschatz, S. Sudarshan and S. Ashwin, Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting, Invited Paper, VLDB Journal, August 1998