Olga Papaemmanouil, Brandeis University Nga Tran, Brandeis University Mitch Cherniack, Brandeis University.

Presentation transcript:

• natural consequence of One-Size-D-N-F-A
• many new niche data management systems
• many new “redesigns” of DBMS components
  ◦ exploiting HW Architecture (e.g., shared-nothing)
  ◦ exploiting SW Architecture (e.g., column-stores)
• needed: Development Environment (DE) for
  ◦ Rapid Prototyping
  ◦ Evaluation
  ◦ Refinement
  of alternative component designs

• Development Environment for Query Optimizers
  ◦ Rapid Prototyping: Declarative Specifications, Generator Tools
  ◦ Evaluation: Component Benchmarks
  ◦ Refinement: Component Profiling Tools
• Proposal-Ware: |vision| >> |substance|
• Results for Join Plan Enumerator (JPE)
  ◦ SQL Server: better plans with less optimizer work
  ◦ Vertica: Part of version 2.0 Query Optimizer

• Development Environment Overview
• Initial Results: JPEs
• Future Work: Other Components

Join Plan Selector
1. Chooses Join Order
2. Chooses Join Algorithms and Access Methods
3. Chooses Pre-Join Data Transfer (Distributed DBs)
Components: Join Plan Enumerator (JPE), Physical Plan Mapper (PPM), Pruner

[Figure, shown in three build steps: the query SELECT ... FROM L,O,C,S WHERE ... passes through the Parser, the Join Plan Selector (JPE, PPM, Pruner), and the Execution Engine. The diagram shows the join graph over L, O, C, S, candidate logical plans, and physical plans annotated with INLJ and HJ. Legends: Borrowed vs DE Supported; Fixed vs Generated; Join Graph, Logical Plan, Physical Plan.]

[Figure: Parser, Join Plan Selector (JPE, PPM, Pruner, Controller), Execution Engine. Input: Join Graph; output: Physical Plan. Labeled types: JoinGraph; Plan; PNLJ, LJoin, …; Property; Cost, Order, …. Legends: Borrowed vs DE Supported; Fixed vs Generated.]

[Figure: JPE, PPM, Pruner, Controller; Plan (LJoin, …); Property (Order, …).]

[Figure: each component paired with a specification: JPE / JPE Spec; PPM / PPM Spec; Pruner / Pruner Spec; Controller / Controller Spec; Plan (LJoin, …) / Plan Op Spec; Property (Order, …) / Property Spec.]

• Development Environment Overview
• Initial Results: JPEs
• Future Work: Other Components

• Why not universal (exhaustive) enumeration?
  1. Scalability
     ◦ most products include optimization-level “knob” (low = fast)
     ◦ DBA Guide [Tow03]: high knob setting for simple queries only
     ◦ issue of space (not just time)
  2. Relies on Brittle Cost Models
     ◦ Cardinality Errors of Cost Models well-known
     ◦ DBA Guide [Tow03]: low knob setting can improve plans
  3. Lightweight Query Optimizers
     ◦ Useful for adaptive query optimization, automatic DB design
• Therefore, we assume targeted JPEs desirable

• Intuition:
  ◦ good plans usually perform certain joins before others
  ◦ a good ‘early’ join might …
    ▪ produce small intermediate results
    ▪ exploit ephemeral properties of inputs (e.g., order)
• JPEG-Generated JPEs:
  ◦ specified by “Join ranking function”
    ▪ higher rank = preferable to perform earlier in plan
  ◦ only enumerate plans that perform joins in rank order
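The rank-order idea above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the JPEG implementation: ranks are static per join-graph edge, rank 1 is taken to be the most preferable (as in the example rankings), and the function name `enumerate_rank_ordered`, the chain join graph, and the rank values are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Join = Tuple[str, str]  # an edge of the join graph: (table, table)


def enumerate_rank_ordered(
    tables: Sequence[str],
    joins: Sequence[Join],
    rank: Callable[[Join], int],
) -> List[List[Join]]:
    """Return every join order that always performs a best-ranked join next.

    Assumptions (not from the slides): rank 1 is the most preferable rank,
    and ranks do not change as intermediate results are formed.
    """
    orders: List[List[Join]] = []

    def extend(groups: Dict[str, FrozenSet[str]], order: List[Join]) -> None:
        # A join is still pending if its two tables are not yet part of the
        # same intermediate result.
        pending = [j for j in joins if groups[j[0]] != groups[j[1]]]
        if not pending:
            orders.append(order)
            return
        best = min(rank(j) for j in pending)
        for j in pending:
            if rank(j) != best:
                continue  # this join would run out of rank order
            merged = groups[j[0]] | groups[j[1]]
            new_groups = {t: (merged if t in merged else g) for t, g in groups.items()}
            extend(new_groups, order + [j])

    extend({t: frozenset([t]) for t in tables}, [])
    return orders


# Hypothetical chain join graph S - L - O - C with made-up ranks.
ranks = {("L", "O"): 1, ("O", "C"): 1, ("L", "S"): 2}
orders = enumerate_rank_ordered("LOCS", list(ranks), ranks.__getitem__)
for order in orders:
    print(order)
```

With these made-up ranks the sketch returns two join orders, where enumerating every ordering of the same three joins would produce six. Ties in rank are what leave the enumerator any choice at all.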

• An Example JPEG Enumeration
[Figure: enumeration steps over a join graph of tables L, O, C, and N, with joins labeled by rank.]

• Some Example Join Rankings
  A. Cardinality: 1 = m::1, 2 = m::n
  B. Order: 1 = Both, 2 = Larger, 3 = Smaller, 4 = Neither
  C. Indexed: 1 = Either, 2 = Neither
  D. Size: 1 = At Least One Very Small Table, 2 = Everything Else, 3 = At Least One Very Large Table
  E. Distribution: 1 = “Both Join Partitioned/One Replicated”, 2 = “Larger Partitioned”, 3 = “Smaller Partitioned”, 4 = “Neither Partitioned”
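Two of the rankings above (Indexed and Size) might be written as small functions like the following. This is a sketch only: the `JoinInput` record, its fields, and the row-count thresholds for "very small" and "very large" are hypothetical, since the slides do not define them. Where a join has both a very small and a very large input, the sketch assumes rank 1 takes precedence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinInput:
    """Hypothetical summary of one input to a join."""
    rows: int
    indexed_on_join_key: bool


def indexed_rank(left: JoinInput, right: JoinInput) -> int:
    """Ranking C: 1 = either input indexed, 2 = neither."""
    return 1 if (left.indexed_on_join_key or right.indexed_on_join_key) else 2


def size_rank(
    left: JoinInput,
    right: JoinInput,
    very_small: int = 1_000,        # assumed threshold, not from the slides
    very_large: int = 10_000_000,   # assumed threshold, not from the slides
) -> int:
    """Ranking D: 1 = at least one very small table, 3 = at least one very
    large table, 2 = everything else. A very small input wins over a very
    large one (an assumption)."""
    if min(left.rows, right.rows) <= very_small:
        return 1
    if max(left.rows, right.rows) >= very_large:
        return 3
    return 2


small = JoinInput(rows=500, indexed_on_join_key=False)
medium = JoinInput(rows=50_000, indexed_on_join_key=True)
huge = JoinInput(rows=20_000_000, indexed_on_join_key=False)
print(indexed_rank(small, medium), size_rank(small, huge), size_rank(medium, huge))
```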

• Results: Replacing the SQL Server JPE
[Table: for each query, Chosen Plan (Sec) and DJWUs under SQL Server and under SQL Server + JPEG. Queries (# Tables): Q2 (5), Q3 (3), Q5 (6), Q7 (6), Q8 (8), Q9 (6), Q10 (4), Q11 (3), Q18 (3), Q21 (4).]
• Queries Based on TPC-H, Scale Factor 10
• DJWU = “Distinct Join Work Unit”: Measure of Optimizer Work (# of distinct logical joins that are costed)

• Results: Replacing the SQL Server JPE
[Table: Chosen Plan (Sec) and DJWUs under SQL Server and under SQL Server + JPEG for Q2 (5), Q3 (3), Q5 (6), Q7 (6), Q8 (8), Q9 (6), Q10 (4), Q11 (3), Q18 (3), Q21 (4), with rows grouped by outcome: Better Plan, Less Work; Same Plan, Less Work; Same Plan, Same Work.]

• How to benchmark JPE independently of cost model?
  ◦ Best Runtime of Enumerated Plans
  ◦ Worst Runtime of Enumerated Plans
  ◦ Average Runtime of Enumerated Plans
  ◦ Number of Enumerated Plans
• Issues:
  ◦ Runtimes depend on execution engine
  ◦ How to reconcile # of Plans vs Quality of Plans
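The four benchmark metrics listed above reduce to simple aggregates once every enumerated plan has been executed. A minimal sketch, assuming the measured runtimes are available as a list of seconds (the function name `jpe_benchmark` and the sample runtimes are hypothetical):

```python
from statistics import mean
from typing import Dict, Sequence


def jpe_benchmark(runtimes_sec: Sequence[float]) -> Dict[str, float]:
    """Summarize the measured runtimes of all plans a JPE enumerated."""
    if not runtimes_sec:
        raise ValueError("the JPE enumerated no plans")
    return {
        "best": min(runtimes_sec),
        "worst": max(runtimes_sec),
        "average": mean(runtimes_sec),
        "count": len(runtimes_sec),
    }


# Made-up runtimes for three enumerated plans.
print(jpe_benchmark([2.0, 4.0, 9.0]))
```

Because the inputs are measured runtimes rather than cost estimates, the summary does not depend on the cost model, although it still depends on the execution engine, as the slide notes.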

• Primary JPE Tuning Strategies:
  ◦ Merging/Splitting Join Ranks → Generates more/fewer enumerated plans
  ◦ Reorder Join Ranks → Generates different enumerated plans
• Profiling Tools to Guide Above:
  ◦ Rank Prevalence: What % of joins have rank i?
  ◦ Merge Impact: What is % increase in enumerated plans resulting from merging ranks i and j?
  ◦ Reorder Impact: What is effect on benchmark metrics from swapping ranks i and j?
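Rank Prevalence and Merge Impact are both percentages and can be sketched directly. The helper names and sample numbers below are hypothetical, and the sketch assumes the plan counts before and after a merge have already been obtained by running the enumerator twice.

```python
from collections import Counter
from typing import Dict, Sequence


def rank_prevalence(join_ranks: Sequence[int]) -> Dict[int, float]:
    """Percentage of joins that received each rank."""
    total = len(join_ranks)
    if total == 0:
        return {}
    return {r: 100.0 * n / total for r, n in Counter(join_ranks).items()}


def merge_impact(plans_before: int, plans_after: int) -> float:
    """Percentage increase in enumerated plans after merging two ranks."""
    if plans_before <= 0:
        raise ValueError("plans_before must be positive")
    return 100.0 * (plans_after - plans_before) / plans_before


print(rank_prevalence([1, 1, 2, 3]))  # made-up ranks for four joins
print(merge_impact(plans_before=2, plans_after=6))
```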

• Development Environment Overview
• Initial Results: JPEs
• Future Work: Other Components

• Possible Specification:
  ◦ >: Plan × Plan → Boolean
  ◦ Ps: Set of “interesting” properties
• Possible Benchmarks:
  ◦ >-quality: % of times predicted better plan is actually better
  ◦ P-quality: % of plans identified as best that would have been pruned with property P identified as interesting
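The >-quality benchmark above can be illustrated with a comparator passed in as a function of type Plan × Plan → Boolean. This is a sketch under assumptions: the `Plan` record, its fields, and the cost-based comparator `by_cost` are hypothetical, and a tie in measured runtime is counted as a miss.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical plan record: an estimate plus a measured runtime."""
    name: str
    estimated_cost: float
    runtime_sec: float


def comparator_quality(
    pairs: Iterable[Tuple[Plan, Plan]],
    better: Callable[[Plan, Plan], bool],
) -> float:
    """Percentage of pairs where the plan predicted better is actually faster."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = 0
    for a, b in pairs:
        predicted, other = (a, b) if better(a, b) else (b, a)
        if predicted.runtime_sec < other.runtime_sec:  # ties count as misses
            hits += 1
    return 100.0 * hits / len(pairs)


def by_cost(a: Plan, b: Plan) -> bool:
    """Example comparator: prefer the lower estimated cost."""
    return a.estimated_cost < b.estimated_cost


p1 = Plan("p1", estimated_cost=10, runtime_sec=2.0)
p2 = Plan("p2", estimated_cost=20, runtime_sec=3.0)
p3 = Plan("p3", estimated_cost=5, runtime_sec=9.0)  # badly underestimated
print(comparator_quality([(p1, p2), (p1, p3)], by_cost))
```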

• Possible Specification:
  ◦ Top-Down vs Bottom-Up Traversal
  ◦ Handoff Policy: conditions for transferring control from JPE to PPM/Pruner
• Possible Benchmarks (C_eager vs C_lazy):
  ◦ Eager-Benefit: % of subplans pruned earlier by C_eager than by C_lazy (assuming pruned by both)
  ◦ Eager-Penalty: % of subplans pruned by C_eager but not by C_lazy
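One possible reading of the two controller benchmarks above, as a sketch. The slides do not fix the denominators, so the choices here are assumptions: Eager-Benefit is taken over subplans pruned by both controllers, and Eager-Penalty over all subplans pruned by C_eager. The input format (a map from subplan name to the step at which it was pruned) and the function name are also hypothetical.

```python
from typing import Dict, Tuple


def eager_metrics(
    eager_pruned_at: Dict[str, int],
    lazy_pruned_at: Dict[str, int],
) -> Tuple[float, float]:
    """Return (eager_benefit, eager_penalty) as percentages.

    Each dict maps a subplan to the optimizer step at which that controller
    pruned it; subplans a controller never pruned are absent from its dict.
    """
    both = eager_pruned_at.keys() & lazy_pruned_at.keys()
    earlier = [s for s in both if eager_pruned_at[s] < lazy_pruned_at[s]]
    eager_only = eager_pruned_at.keys() - lazy_pruned_at.keys()

    # Assumed denominators: see the lead-in above.
    benefit = 100.0 * len(earlier) / len(both) if both else 0.0
    penalty = 100.0 * len(eager_only) / len(eager_pruned_at) if eager_pruned_at else 0.0
    return benefit, penalty


# Made-up pruning traces for four subplans.
eager = {"a": 1, "b": 2, "c": 3, "d": 1}
lazy = {"a": 4, "b": 2, "c": 5}
print(eager_metrics(eager, lazy))
```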

• Early-Stage Project: Development Environment for Join Plan Selection Components of Query Optimizers
• Will Support Per Component:
  ◦ Rapid Prototyping: Declarative Specification, Generator Tool
  ◦ Evaluation: Component Benchmarks
  ◦ Refinement: Profiling Tools
• Early Results for JPEs:
  ◦ JPEG: Enumerate According to “Join Ranking Functions”
  ◦ Used to build JPEs for SQL Server, Vertica

?