The Volcano/Cascades Query Optimization Framework

The Volcano/Cascades Query Optimization Framework S. Sudarshan

Transformation Rules
- Commutativity
- Associativity
- Selection Push Down
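The three rule families named on this slide can be written as equivalences. The block below is a sketch for natural joins; the relation names A, B, C and the predicate symbol \theta are illustrative and do not come from the slides.

```latex
\begin{align*}
A \bowtie B &\equiv B \bowtie A
  && \text{(commutativity)} \\
(A \bowtie B) \bowtie C &\equiv A \bowtie (B \bowtie C)
  && \text{(associativity)} \\
\sigma_{\theta}(A \bowtie B) &\equiv \sigma_{\theta}(A) \bowtie B
  && \text{(selection push down, if $\theta$ uses only attributes of $A$)}
\end{align*}
```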

Volcano/Cascades Framework for Query Optimization
- Based on equivalence rules
- Key benefit: extensibility
  - Compared with System R style join-order optimization plus extensions, it is easy to add rules to deal with new operators, e.g. outer join, group-by/aggregate, limit, ...
- Memoization technique that generalizes System R style dynamic programming and is applicable even with equivalence rules
- Developed by Goetz Graefe as a follow-up to the Exodus optimizer
- Used in SQL Server, Tandem, Greenplum/Orca, and several other databases, with increasing adoption
- Description in this talk is based on Prasan Roy’s thesis
IIT Bombay

Enumeration of Equivalent Expressions
- Query optimizers use equivalence rules to systematically generate expressions equivalent to the given expression
- All equivalent expressions can be generated as follows:
  Repeat
    apply all applicable equivalence rules to every equivalent expression found so far
    add newly generated expressions to the set of equivalent expressions
  Until no new equivalent expressions are generated
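The repeat/until loop can be sketched in a few lines of Python. This is a minimal illustration, not the optimizer's actual code: expressions are modelled as nested tuples, only join commutativity and one direction of associativity are included, and the function names (commute, associate, rewrites, enumerate_equivalents) are hypothetical.

```python
# Expressions: a relation name (str) or ("join", left, right).

def commute(e):
    """Join commutativity: A join B -> B join A."""
    if isinstance(e, tuple):
        _, left, right = e
        yield ("join", right, left)

def associate(e):
    """Join associativity: (A join B) join C -> A join (B join C)."""
    if isinstance(e, tuple) and isinstance(e[1], tuple):
        _, (_, a, b), c = e
        yield ("join", a, ("join", b, c))

RULES = (commute, associate)  # illustrative rule set, not a full optimizer's

def rewrites(e):
    """Expressions obtained from e by one rule application at any node."""
    for rule in RULES:
        yield from rule(e)
    if isinstance(e, tuple):
        op, left, right = e
        for new_left in rewrites(left):
            yield (op, new_left, right)
        for new_right in rewrites(right):
            yield (op, left, new_right)

def enumerate_equivalents(expr):
    """Apply rules until no new equivalent expression is generated."""
    found = {expr}
    frontier = [expr]
    while frontier:
        e = frontier.pop()
        for new in rewrites(e):
            if new not in found:   # keep only newly generated expressions
                found.add(new)
                frontier.append(new)
    return found
```

Every expression is stored in full here, which is exactly the space cost that the sharing of sub-expressions discussed next is meant to avoid.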

- The above approach is very expensive in space and time
- Two approaches:
  - Optimized plan generation based on transformation rules
  - Special-case approach for queries with only selections, projections and joins

Implementing Transformation Based Optimization
- Space requirements are reduced by sharing common sub-expressions:
  - When E1 is generated from E2 by an equivalence rule, usually only the top levels of the two differ; the subtrees below are the same and can be shared using pointers
    - E.g. when applying join commutativity
  - The same sub-expression may get generated multiple times
    - Detect duplicate sub-expressions and share one copy
[Figure: expressions E1 and E2 sharing common subtrees]

Implementing Transformation Based Optimization
- Time requirements are reduced by not generating all expressions
  - Dynamic programming
    - We will study only the special case of dynamic programming for join order optimization

Steps in Transformation Rule Based Query Optimization
1. Logical plan space generation
2. Physical plan space generation
3. Search for best plan

Logical Query DAG

Logical Query DAG
- A Logical Query DAG (LQDAG) is a directed acyclic graph whose nodes can be divided into equivalence nodes and operation nodes
- Equivalence nodes have only operation nodes as children, and operation nodes have only equivalence nodes as children

Steps in Creating LQDAG

Creating the LQDAG How to do this efficiently?

Checking for Duplicates
- Each equivalence node has an ID
  - Base case: relation IDs
- When a transformation is applied, need to check whether the resulting expression is already present
  - Idea: a transformation is local; some equivalence nodes are just copied unchanged
  - For each new operation in the transformation result, check (bottom up) whether it is already present, using a hash table
- Hash table (aka the memo structure in Volcano/Cascades)
  - Hash function h(operation, IDs of operation inputs)
  - Stores the ID of the equivalence node of which that operation is a child
  - If not present in the hash table, create a new equivalence node
  - Else reuse the existing equivalence node's ID when computing the hash for the parent
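The bottom-up duplicate check can be sketched with a Python dictionary standing in for the hash table. The class name Memo, the tuple encoding of expressions, and the eq_id parameter are assumptions made for illustration; merging two existing equivalence nodes that turn out to be equal is deliberately left out.

```python
class Memo:
    """Hash table keyed on (operation, IDs of the operation's inputs)."""

    def __init__(self):
        self.table = {}    # (op, input IDs...) -> equivalence node ID
        self.next_id = 0

    def _new_id(self):
        self.next_id += 1
        return self.next_id - 1

    def insert(self, expr, eq_id=None):
        """Insert expr bottom-up and return its equivalence node ID.

        expr is a relation name (str) or a tuple (op, child, child, ...).
        eq_id, if given, is the equivalence node that a transformation
        result is known to belong to.
        """
        if isinstance(expr, str):
            key = ("rel", expr)                       # base case: relation
        else:
            op, *children = expr
            key = (op, *(self.insert(c) for c in children))
        if key in self.table:
            return self.table[key]                    # duplicate: reuse ID
        if eq_id is None:
            eq_id = self._new_id()                    # new equivalence node
        self.table[key] = eq_id
        return eq_id
```

Because the key uses the IDs of the input equivalence nodes rather than the input expressions themselves, (B join A) join C hashes to the same entry as (A join B) join C once B join A has been registered under the equivalence node of A join B.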

Physical Query DAG
- Takes into account
  - algorithms for computing operations
  - useful physical properties
- Physical properties generalize the System R notion of “interesting sort order”
  - e.g. compression, encryption, location (in a distributed DB), etc.
- Enforcers return the same logical result, but with different physical properties
- Algorithms may also generate results with useful physical properties

Physical DAG Generation (e, p)

Physical DAG Generation

Physical Query DAG
[Figure: Physical Query DAG for A ⋈_{A.X=B.Y} B]

Physical Property Subsumption
- E.g. sort on (A,B) subsumes sort on (A), and sort on (A) subsumes unsorted
- Physical equivalence node e subsumes physical equivalence node e’ iff any plan that computes e can be used as a plan that computes e’
- Useful for multi-query optimization
  - But ignored by Volcano
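For sort orders, subsumption reduces to a prefix test. The sketch below assumes a sort order is represented as a tuple of attribute names, with the empty tuple meaning unsorted; the function name subsumes is hypothetical.

```python
def subsumes(p, q):
    """True if sort order p subsumes sort order q, i.e. q is a prefix of p.

    Sort orders are tuples of attribute names; () means unsorted.
    """
    return tuple(p[:len(q)]) == tuple(q)
```

With this definition, subsumes(("A", "B"), ("A",)) and subsumes(("A",), ()) both hold, while subsumes(("A",), ("A", "B")) does not.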

Finding The Best Plan
- In Volcano: physical DAG generation is interleaved with finding the best plan
  - Branch-and-bound pruning avoids exploring much of the search space
- In Prasan’s version: no pruning (required for multi-query optimization, MQO)
- Also in Prasan’s version: the find-best-plan procedure is split into two procedures
  - one for the best enforcer plan, and one for the best algorithm plan

Finding The Best Plan

Finding Best Enforcer Plan

Finding Best Algorithm Plan

Original Volcano FindBestPlan

FindBestPlan (LogExpr, PhysProp, Limit)
    if the pair LogExpr and PhysProp is in the look-up table
        if the cost in the look-up table < Limit
            return Plan and Cost
        else
            return failure
    /* else: optimization required */
    create the set of possible "moves" from
        applicable transformations
        algorithms that give the required PhysProp
        enforcers for required PhysProp
    order the set of moves by promise
    for the most promising moves
        if the move uses a transformation
            apply the transformation creating NewLogExpr
            call FindBestPlan (NewLogExpr, PhysProp, Limit)
        else if the move uses an algorithm
            TotalCost := cost of the algorithm
            for each input I while TotalCost < Limit
                determine required physical properties PP for I
                Cost = FindBestPlan (I, PP, Limit − TotalCost)
                add Cost to TotalCost
        else /* move uses an enforcer */
            TotalCost := cost of the enforcer
            modify PhysProp for enforced property
            call FindBestPlan for LogExpr with new PhysProp
    /* maintain the look-up table of explored facts */
    if LogExpr is not in the look-up table
        insert LogExpr into the look-up table
    insert PhysProp and best plan found into look-up table
    return best Plan and Cost
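The control flow of FindBestPlan (look-up table, cost limit, algorithm moves, enforcer moves) can be sketched in Python under heavy simplifying assumptions. Everything concrete here is hypothetical: the cardinalities, the uniform selectivity, the cost formulas, and a single abstract "sorted" property standing in for real sort orders. Logical alternatives are enumerated directly as splits of a relation set instead of being produced by transformation moves, moves are not ordered by promise, and failures are not memoized.

```python
import math
from itertools import combinations

CARD = {"A": 1000, "B": 100, "C": 10}  # hypothetical base cardinalities
SEL = 0.01                             # hypothetical selectivity of every join
memo = {}                              # (relations, prop) -> (plan, cost)

def card(rels):
    """Estimated result size of joining the given relations."""
    return math.prod(CARD[r] for r in rels) * SEL ** (len(rels) - 1)

def splits(rels):
    """All ordered ways to split rels into two non-empty join inputs."""
    items = sorted(rels)
    for k in range(1, len(items)):
        for left in combinations(items, k):
            yield frozenset(left), rels - frozenset(left)

# (algorithm, property required of both inputs, output is sorted?)
ALGORITHMS = (("hash_join", None, False), ("merge_join", "sorted", True))

def find_best_plan(rels, prop, limit):
    """Return (plan, cost) with cost < limit, or None on failure."""
    key = (rels, prop)
    if key in memo:                    # look-up table hit
        plan, cost = memo[key]
        return (plan, cost) if cost < limit else None
    best = None

    def consider(plan, cost):
        nonlocal best, limit
        if cost < limit:               # branch and bound: tighten the limit
            best, limit = (plan, cost), cost

    if prop == "sorted":               # enforcer move: sort an unsorted plan
        n = card(rels)
        sort_cost = n * math.log2(n + 1)
        sub = find_best_plan(rels, None, limit - sort_cost)
        if sub:
            consider(("sort", sub[0]), sort_cost + sub[1])
    if len(rels) == 1:
        if prop is None:               # a plain scan delivers no sort order
            (r,) = rels
            consider(("scan", r), CARD[r])
    else:
        for left, right in splits(rels):
            for algo, in_prop, out_sorted in ALGORITHMS:
                if prop == "sorted" and not out_sorted:
                    continue           # algorithm cannot deliver the property
                total = card(left) + card(right)   # cost of the algorithm
                inputs = []
                for inp in (left, right):
                    if total >= limit:
                        break          # limit exceeded, abandon this move
                    sub = find_best_plan(inp, in_prop, limit - total)
                    if sub is None:
                        break
                    inputs.append(sub[0])
                    total += sub[1]
                if len(inputs) == 2:
                    consider((algo, *inputs), total)
    if best:
        memo[key] = best               # remember the best plan found
    return best
```

A call such as find_best_plan(frozenset("ABC"), None, math.inf) returns a plan tree and its cost; calling again with a limit equal to that cost returns None, mirroring the "return failure" branch of the pseudocode.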

Complexity of Rule Sets
- Pellenkoft et al. [VLDB97] showed that associativity + commutativity leads to O(4^n) time cost
  - Due to duplicates, as against O(3^n) with System R style dynamic programming
- RS-B1: commutativity + left associativity, takes O(4^n) time
- RS-B2: new rule set proposed by Pellenkoft et al. [VLDB97], ensuring O(3^n) time

Pellenkoft et al.’s Rule Set RS-B2
- Key idea: disable certain transformations on the result of a transformation

Avoiding Cross Products
- System R algorithm
  - Dynamic programming algorithm to find the best join order
  - Time complexity: O(3^n) for bushy join orders
  - Plan space considered includes cross products
- For some common join topologies, the number of cross-product-free intermediate join results is polynomial
  - E.g. chain, cycle, ...
- Can we reduce optimization time by avoiding cross products?
- Algorithms for generating the cross-product-free join space
  - Bottom-up: DPccp (Moerkotte and Neumann [VLDB06])
  - Top-down: TDMinCutBranch (Fender et al. [ICDE11]), TDMinCutConservative (Fender et al. [ICDE12])
  - Time complexity is polynomial if the number of cross-product-free intermediate join results is polynomial
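To see where the O(3^n) bound for bushy dynamic programming comes from, the sketch below runs a subset-based DP that allows cross products and counts the (S1, S2) splits it examines. The cost model (sum of intermediate result sizes), the uniform selectivity, and the function name dp_bushy are assumptions for illustration only. For n relations the count is 3^n - 2^(n+1) + 1.

```python
def dp_bushy(cards, sel=0.1):
    """Bushy join-order DP over all subsets, cross products included.

    cards maps relation name -> cardinality; sel is a hypothetical
    selectivity applied to every join. Returns ((cost, tree), pairs),
    where pairs is the number of (S1, S2) splits examined.
    """
    rels = sorted(cards)
    n = len(rels)
    size, best = {}, {}
    for i, r in enumerate(rels):
        size[1 << i] = cards[r]
        best[1 << i] = (0.0, r)        # base relations cost nothing here
    pairs = 0
    for s in range(1, 1 << n):         # subsets as bitmasks, smaller first
        if (s & (s - 1)) == 0:
            continue                   # singleton, already initialised
        s1 = (s - 1) & s               # largest proper subset of s
        while s1:
            s2 = s ^ s1
            pairs += 1
            if s not in size:
                size[s] = size[s1] * size[s2] * sel
            cost = best[s1][0] + best[s2][0] + size[s]
            if s not in best or cost < best[s][0]:
                best[s] = (cost, (best[s1][1], best[s2][1]))
            s1 = (s1 - 1) & s          # next proper subset of s
    return best[(1 << n) - 1], pairs
```

For four relations the loop examines 50 splits and for three it examines 12, regardless of whether join predicates actually connect the two sides, which is the overhead the cross-product-free algorithms aim to remove.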

Cross-Product-Free Join Order Enumeration using Graph Partitioning
- Key idea for avoiding cross products while finding the best join tree:
  - For a set S of relations, find all ways to partition S into S1 and S2 such that
    - the join graph of S1 is connected, and so is the join graph of S2
    - there is an edge (join predicate) between S1 and S2
- A simple recursive algorithm finds the best plan in the cross-product-free join space using partitioning as above
- Efficient algorithms for finding all ways to partition S into S1 and S2 as above
  - MinCutLazy (DeHaan and Tompa [SIGMOD07])
  - Fender et al. proposed MinCutBranch [ICDE11] and MinCutConservative [ICDE12]
  - MinCutConservative is currently the most efficient
[Figure: S partitioned into S1 (R1, R2) and S2 (R4, R3)]
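The partitioning condition can be checked with a naive generate-and-test sketch. This is not MinCutLazy, MinCutBranch, or MinCutConservative: it tries every subset and filters, so it is exponential in the size of S and only illustrates which partitions qualify. The function names and the edge-list representation of the join graph are assumptions.

```python
from itertools import combinations

def connected(nodes, edges):
    """True if the join graph induced on nodes is connected."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == u and y in nodes and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == nodes

def partitions(s, edges):
    """Yield unordered (S1, S2) with both sides connected and joined by an edge."""
    s = frozenset(s)
    items = sorted(s)
    first, rest = items[0], items[1:]      # pin one relation to S1
    for k in range(len(rest)):             # S2 must stay non-empty
        for extra in combinations(rest, k):
            s1 = frozenset((first, *extra))
            s2 = s - s1
            crossing = any((a in s1) != (b in s1)
                           for a, b in edges if a in s and b in s)
            if crossing and connected(s1, edges) and connected(s2, edges):
                yield s1, s2
```

On a chain R1-R2-R3-R4 this yields three partitions (one per edge that can be cut), and adding the edge R4-R1 to form a cycle yields six.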

Avoiding Cross-products in Transformation-Based Optimizers
- Key idea: suppress a transformation if it results in a cross product
- Shanbhag and Sudarshan [VLDB 2014] show that
  - RS-B1 modified to suppress cross products is complete but expensive
  - RS-B2 extended to suppress cross products is not complete
- They propose a new rule set for inner joins which
  - works in a non-local manner (considers maximal sets of adjacent joins)
  - exploits graph partitioning to avoid cross products
  - is very efficient in practice

Cascades Optimization Framework
- Extension to the Volcano framework, by Graefe et al.
- Notion of tasks, e.g. application of a logical or physical equivalence rule
  - At an equivalence node or at an operation node
- Execution of a task may result in the creation of other tasks
- Allows tasks to be prioritized (but still in DFS order)