EN 600.619: Adv. Storage and TP Systems
Cost-Based Query Optimization

The Optimization Process
Logical query plan
  – As an expression tree
Rewrite the query plan to improve performance
Create a physical plan
  – Select algorithms to implement the logical plan

An Expression Tree
SELECT title, birthdate
FROM MovieStar, StarsIn
WHERE year = 1996 AND gender = 'F' AND starName = name;

An Alternate (Better) Logical Plan
SELECT title, birthdate
FROM MovieStar, StarsIn
WHERE year = 1996 AND gender = 'F' AND starName = name;

Query Optimization Heuristics
Push operators as far down the plan as possible
Do selections as soon as possible
  – Reduce intermediate result sizes
Select, then project
Perform joins as late as possible
  – They are more costly
Group associative and commutative operators
  – Let the physical plan reorder execution
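The effect of "do selections as soon as possible" can be checked on toy data. The sketch below evaluates the MovieStar/StarsIn query two ways and counts the tuple pairs each plan builds before the final answer. The schemas StarsIn(title, year, starName) and MovieStar(name, address, gender, birthdate) and all rows are hypothetical.

```python
from itertools import product

# Hypothetical toy data; schemas are assumed, not taken from the slides.
STARS_IN = [
    {"title": "Movie A", "year": 1996, "starName": "Ann"},
    {"title": "Movie B", "year": 1995, "starName": "Ann"},
    {"title": "Movie C", "year": 1996, "starName": "Bob"},
]
MOVIE_STAR = [
    {"name": "Ann", "address": "x", "gender": "F", "birthdate": "1970-01-01"},
    {"name": "Bob", "address": "y", "gender": "M", "birthdate": "1965-05-05"},
]

def naive_plan(stars_in, movie_star):
    """Cross product first, then a single selection with all three conditions."""
    pairs = [{**s, **m} for s, m in product(stars_in, movie_star)]
    hits = [t for t in pairs
            if t["year"] == 1996 and t["gender"] == "F" and t["starName"] == t["name"]]
    return sorted((t["title"], t["birthdate"]) for t in hits), len(pairs)

def pushed_down_plan(stars_in, movie_star):
    """Single-relation selections first, then the join on starName = name."""
    s96 = [s for s in stars_in if s["year"] == 1996]
    women = [m for m in movie_star if m["gender"] == "F"]
    hits = [{**s, **m} for s in s96 for m in women if s["starName"] == m["name"]]
    return sorted((t["title"], t["birthdate"]) for t in hits), len(s96) * len(women)

print(naive_plan(STARS_IN, MOVIE_STAR))        # same answer, 6 pairs built
print(pushed_down_plan(STARS_IN, MOVIE_STAR))  # same answer, 2 pairs examined
```

Both plans return the same answer; only the size of the intermediate result differs, which is the point of the heuristic.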

Improving the Plan Through Query Rewriting
Split the selection
Push the projection
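The two rewrites can be sketched over a toy plan representation. All class and function names below are hypothetical. A conjunctive selection is split into its conjuncts, each conjunct is pushed below a join when one input holds all of its attributes, and then a projection is placed right after each selection so attributes are dropped once nothing above needs them. Join is modeled as a Cartesian product whose join condition stays in a selection above it.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Rel:
    name: str
    attrs: frozenset

@dataclass(frozen=True)
class Select:
    preds: tuple      # conjunction of (frozenset_of_attrs, text) pairs
    child: Plan

@dataclass(frozen=True)
class Join:           # Cartesian product; join predicates sit in a Select above
    left: Plan
    right: Plan

@dataclass(frozen=True)
class Project:
    keep: frozenset
    child: Plan

Plan = Union[Rel, Select, Join, Project]

def attrs(p: Plan) -> frozenset:
    if isinstance(p, Rel):
        return p.attrs
    if isinstance(p, Project):
        return p.keep
    if isinstance(p, Select):
        return attrs(p.child)
    return attrs(p.left) | attrs(p.right)

def push_pred(pred, p: Plan) -> Plan:
    # Place one conjunct as low as possible.
    if isinstance(p, Join):
        if pred[0] <= attrs(p.left):
            return Join(push_pred(pred, p.left), p.right)
        if pred[0] <= attrs(p.right):
            return Join(p.left, push_pred(pred, p.right))
    elif isinstance(p, Select):  # selections commute, so pass through
        return Select(p.preds, push_pred(pred, p.child))
    return Select((pred,), p)

def push_selections(p: Plan) -> Plan:
    if isinstance(p, Select):
        child = push_selections(p.child)
        for pred in p.preds:     # split: sigma_{C1 AND C2}(R) = sigma_C1(sigma_C2(R))
            child = push_pred(pred, child)
        return child
    if isinstance(p, Join):
        return Join(push_selections(p.left), push_selections(p.right))
    return p

def push_projections(p: Plan, needed: frozenset) -> Plan:
    # Keep only attributes needed above p; project right after each selection.
    if isinstance(p, Rel):
        keep = p.attrs & needed
        return p if keep == p.attrs else Project(keep, p)
    if isinstance(p, Select):
        pred_attrs = frozenset().union(*(a for a, _ in p.preds))
        out = Select(p.preds, push_projections(p.child, needed | pred_attrs))
        return Project(attrs(out) & needed, out) if attrs(out) - needed else out
    if isinstance(p, Join):
        return Join(push_projections(p.left, needed), push_projections(p.right, needed))
    return push_projections(p.child, needed & p.keep)  # fold an existing Project

def optimize(p: Plan, output: frozenset) -> Plan:
    q = push_projections(push_selections(p), output)
    return q if attrs(q) == output else Project(output, q)

def show(p: Plan) -> str:
    if isinstance(p, Rel):
        return p.name
    if isinstance(p, Project):
        return f"PROJECT[{','.join(sorted(p.keep))}]({show(p.child)})"
    if isinstance(p, Select):
        return f"SELECT[{' AND '.join(t for _, t in p.preds)}]({show(p.child)})"
    return f"JOIN({show(p.left)}, {show(p.right)})"

stars_in = Rel("StarsIn", frozenset({"title", "year", "starName"}))
movie_star = Rel("MovieStar", frozenset({"name", "address", "gender", "birthdate"}))
where = Select(((frozenset({"year"}), "year=1996"),
                (frozenset({"gender"}), "gender='F'"),
                (frozenset({"starName", "name"}), "starName=name")),
               Join(stars_in, movie_star))
print(show(optimize(where, frozenset({"title", "birthdate"}))))
```

In the result, year=1996 sits directly on StarsIn, gender='F' directly on MovieStar, and only starName=name remains above the join; address is projected away before any selection runs.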

Grouping Operators
The physical (not logical) plan should pick the order
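Because joins are associative and commutative, the logical plan can collapse a run of nested joins into one n-ary node and leave the order open. A minimal sketch, assuming a hypothetical representation where a plan is a leaf (a relation name or any non-join subtree) or a ("join", left, right) tuple:

```python
def flatten_joins(plan):
    """Collect the operands of a maximal run of nested joins, left to right.

    Any value that is not a ("join", left, right) tuple, e.g. a relation
    name or a ("select", cond, child) subtree, is treated as a leaf.
    """
    if isinstance(plan, tuple) and len(plan) == 3 and plan[0] == "join":
        return flatten_joins(plan[1]) + flatten_joins(plan[2])
    return [plan]

def group_joins(plan):
    """Replace a join tree with one n-ary node; the physical planner picks the order."""
    operands = flatten_joins(plan)
    return ("multijoin", tuple(operands)) if len(operands) > 1 else plan

left_deep = ("join", ("join", ("join", "R", "S"), "T"), "U")
bushy = ("join", ("join", "R", "S"), ("join", "T", "U"))
print(group_joins(left_deep))
print(group_joins(bushy))
```

Both trees group to the same node, so the logical plan no longer commits to either shape.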

The Physical Plan
Choose algorithms and estimate result sizes to generate concrete costs for a plan
E.g., joins
  – Discipline: hash, index, sort
  – Materialize, pipeline, ripple, parallel, etc.
Large literature on different disciplines for all operations
  – Suitable for an entire (albeit detailed) course
Also, how to search for good plans
  – Branch and bound, hill climbing, dynamic programming, etc.
Result size and choice of algorithm are independent
  – For relational algebra operations
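Of the search strategies listed, dynamic programming is the classic System R approach. The sketch below is a simplified version under stated assumptions: only left-deep orders are considered, result sizes come from base sizes times join selectivities (independence assumption), and the cost of a plan is the sum of its intermediate result sizes. The statistics are hypothetical.

```python
from itertools import combinations
from math import prod

def estimate_size(rels, sizes, selectivity):
    """Product of base sizes times the selectivity of every join predicate
    whose two relations are both in `rels` (independence assumption)."""
    n = prod(sizes[r] for r in rels)
    for (a, b), s in selectivity.items():
        if a in rels and b in rels:
            n *= s
    return n

def best_left_deep_order(sizes, selectivity):
    """DP over subsets; cost = sum of intermediate (not base, not final) sizes."""
    rels = sorted(sizes)
    best = {frozenset([r]): (0.0, (r,)) for r in rels}
    for k in range(2, len(rels) + 1):
        for subset in combinations(rels, k):
            s = frozenset(subset)
            options = []
            for r in subset:  # r is the relation joined last
                left = s - {r}
                left_cost, left_order = best[left]
                step = estimate_size(left, sizes, selectivity) if len(left) > 1 else 0.0
                options.append((left_cost + step, left_order + (r,)))
            best[s] = min(options)  # keep only the cheapest plan per subset
    return best[frozenset(rels)]

# Hypothetical statistics for R(a,b) join S(b,c) join T(c,d).
sizes = {"R": 1000, "S": 1000, "T": 10}
selectivity = {("R", "S"): 0.01, ("S", "T"): 0.1}
print(best_left_deep_order(sizes, selectivity))
```

Joining S with the small relation T first yields an intermediate result of about 1,000 tuples versus 10,000 for either alternative, so the DP picks that order. Keeping one best plan per subset is what makes the search polynomial in the number of subsets rather than factorial in the number of relations.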

Estimating Result Sizes
Most inaccurate and difficult part of query processing
  – Cost of an operation is a function f(algorithm, size estimate)
  – Given the exact size, costing is very accurate
Sometimes sizing can be exact
  – Equality queries on unique attributes return 0 or 1 tuples
  – Joins on key (foreign key) fields
  – Good schema design improves query execution
For many operations it is difficult
  – Joins: expand (cross product) or reduce (more often)
  – Range queries: produce multiple tuples
50% accuracy is considered good... ugh!
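The exact cases above fall out of the standard textbook estimates, which are not given on the slide and are assumed here: an equality selection returns T(R)/V(R,A) tuples, and a join on Y returns T(R)T(S)/max(V(R,Y), V(S,Y)), where V is the number of distinct values. The numbers in the example are hypothetical.

```python
def eq_selection_size(t_r, v_r_a):
    """sigma_{A=c}(R): T(R) / V(R,A), assuming values are uniformly distributed."""
    return t_r / v_r_a

def join_size(t_r, t_s, v_r_y, v_s_y):
    """R join S on Y: T(R) T(S) / max(V(R,Y), V(S,Y)), assuming the smaller
    set of Y-values is contained in the larger one."""
    return t_r * t_s / max(v_r_y, v_s_y)

# Unique attribute: V(R,A) = T(R), so an equality query returns at most one tuple.
print(eq_selection_size(10_000, 10_000))
# Foreign key join: Orders.cust references the key Customers.cust, so
# V(Customers, cust) = T(Customers) and the estimate is exactly T(Orders).
print(join_size(50_000, 2_000, 1_800, 2_000))
```

With a key on the join attribute, each referencing tuple matches exactly one referenced tuple (assuming referential integrity), so the estimate equals the true size; that is one way good schema design helps the optimizer.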

Problems with Estimating Size
Need to know result sizes a priori
  – Know them exactly only after query execution
Techniques need to be lightweight
  – Performing I/O as part of estimation reduces query performance
General approach
  – Statistics on underlying tables for important queries
  – Small, summary data structures (in-memory execution)
Techniques
  – Histograms, sampling, wavelets
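Sampling is the simplest of the listed techniques: evaluate the predicate on a small uniform sample and scale up. This sketch samples an in-memory list for illustration; to stay lightweight, a real system would keep a precomputed sample rather than reading the table during optimization (an assumption, not stated on the slide).

```python
import random

def sample_size_estimate(table, predicate, sample_size, seed=0):
    """Estimate |sigma_p(table)| as len(table) * (fraction of sample matching p).

    Uses a uniform random sample without replacement; `seed` makes it repeatable.
    """
    if not table:
        return 0.0
    rng = random.Random(seed)
    sample = rng.sample(table, min(sample_size, len(table)))
    hits = sum(1 for row in sample if predicate(row))
    return len(table) * hits / len(sample)

table = list(range(10_000))  # hypothetical single-column table
print(sample_size_estimate(table, lambda x: x < 2_500, sample_size=200))
```

The error shrinks roughly with the square root of the sample size, which is why sampling works well for common predicates but poorly for very selective ones.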

Histograms
SELECT Jan.day, July.day
FROM Jan, July
WHERE Jan.temp = July.temp;
Join estimate per histogram band = $T_1 T_2 / V$ (product of the two bands' tuple counts divided by the band width $V$)
Estimate: $5 \cdot 20/10 + 10 \cdot 5/10 = 10 + 5 = 15$
Better than the estimate without a histogram: $245 \cdot 245/100 \approx 600$
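The per-band rule can be sketched as follows, assuming both relations have equi-width histograms with identical band boundaries (width 10, matching the divisor in the slide's arithmetic) and that each band's tuples spread uniformly over its values. The band counts are hypothetical.

```python
def histogram_join_estimate(h1, h2, width):
    """Equi-join size from two equi-width histograms with the same bands.

    h1, h2 map band start -> tuple count. Only bands present in both
    histograms can produce matches; each contributes T1 * T2 / width.
    """
    return sum(h1[b] * h2[b] / width for b in h1.keys() & h2.keys())

def uniform_join_estimate(t1, t2, v):
    """Without histograms: T1 * T2 / V over V distinct values in the domain."""
    return t1 * t2 / v

jan = {30: 50, 40: 10, 50: 5}    # hypothetical counts per 10-degree band
july = {40: 5, 50: 20, 60: 50}
print(histogram_join_estimate(jan, july, width=10))
print(uniform_join_estimate(245, 245, 100))
```

The histogram version ignores the bands where only one relation has tuples, which is exactly why it can be far smaller than the uniform estimate.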

On Histograms
Workload defined
  – Keep for important fields; similar concept to indexes
Data defined
  – Keep when they improve performance
  – No histogram is needed for a uniform distribution
Complications
  – Update queries invalidate statistics
  – Need to be precomputed, often before the workload is seen
  – Composing histograms (for multiple attributes) leads to inaccuracies
What the world needs: fully incremental histograms that support multi-attribute queries

STHoles
Bruno, Chaudhuri, and Gravano. "STHoles: A Multidimensional Workload-Aware Histogram," SIGMOD.
Generate histograms by analyzing query results
  – No examination of data sets
  – Leverage workload information and query feedback
Supports overlapping and nested buckets
  – Multi-resolution histogram
  – Buckets allocated where they are most needed; e.g., if there are no queries to a region, no statistics are kept

Feedback-Based Optimization

Visualizing Histograms

Histogram Construction
Start with an empty histogram
New queries punch ‘holes’ in the histogram, creating regions of refinement

Policies
Identify and drill candidate holes

Policies
Shrink regions to preserve rectangular spaces
  – Ease of description and improved accuracy
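The construction policies can be illustrated with a heavily simplified one-dimensional analogue; it is not the paper's algorithm. Assumptions: buckets are half-open intervals; each bucket's `freq` counts tuples in its interval but outside its children; the histogram starts as one root bucket holding the known table size; and query feedback supplies the values of the returned tuples. A candidate hole is the query's intersection with a bucket, shrunk until no child partially overlaps it ("keep the larger piece" is an assumed rule), then drilled as a new child.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)          # identity comparison: distinct buckets never compare equal
class Bucket:
    lo: float
    hi: float
    freq: float = 0.0         # tuples in [lo, hi) that are not inside any child
    children: list = field(default_factory=list)

    def own_volume(self):
        return (self.hi - self.lo) - sum(c.hi - c.lo for c in self.children)

def overlap(lo1, hi1, lo2, hi2):
    return max(0.0, min(hi1, hi2) - max(lo1, lo2))

def estimate(b, lo, hi):
    """Estimated tuples in [lo, hi), assuming uniform density in each bucket's own region."""
    total = sum(estimate(c, lo, hi) for c in b.children)
    own = overlap(lo, hi, b.lo, b.hi) - sum(overlap(lo, hi, c.lo, c.hi) for c in b.children)
    vol = b.own_volume()
    return total + (b.freq * own / vol if vol > 0 else 0.0)

def shrink(lo, hi, children):
    """Shrink [lo, hi) until no child partially overlaps it, keeping the larger piece."""
    changed = True
    while changed and lo < hi:
        changed = False
        for c in children:
            if overlap(lo, hi, c.lo, c.hi) > 0 and not (lo <= c.lo and c.hi <= hi):
                left, right = (lo, min(hi, c.lo)), (max(lo, c.hi), hi)
                lo, hi = max(left, right, key=lambda p: p[1] - p[0])
                changed = True
    return lo, hi

def drill(b, lo, hi, values):
    """Refine b and its descendants using the result values of query [lo, hi)."""
    for c in list(b.children):
        if overlap(lo, hi, c.lo, c.hi) > 0:
            drill(c, lo, hi, values)
    clo, chi = shrink(max(lo, b.lo), min(hi, b.hi), b.children)
    if clo >= chi:
        return
    inside = [c for c in b.children if clo <= c.lo and c.hi <= chi]
    count = sum(1 for v in values
                if clo <= v < chi and not any(c.lo <= v < c.hi for c in inside))
    if (clo, chi) == (b.lo, b.hi):
        b.freq = count        # the hole is the whole bucket: just correct its count
        return
    b.children = [c for c in b.children if c not in inside] + [Bucket(clo, chi, count, inside)]
    b.freq = max(0.0, b.freq - count)  # these tuples now belong to the new hole

root = Bucket(0, 100, freq=1000)                   # assumed: domain [0, 100), 1000 rows
drill(root, 20, 40, [25.0] * 300 + [35.0] * 200)   # hypothetical feedback: 500 rows
drill(root, 30, 60, [35.0] * 200 + [50.0] * 100)   # hypothetical feedback: 300 rows
print(estimate(root, 30, 60), estimate(root, 0, 100))
```

The second query partially overlaps the first hole, so at the root its candidate shrinks to [40, 60) while a nested hole [30, 40) is drilled inside the first one; the estimate for the repeated region then matches the feedback exactly and the total is preserved.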

Policies
Merge buckets (with similar densities) to improve the histogram under a space budget
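Merging under a space budget can be sketched greedily. This is a simplified analogue on a flat list of adjacent one-dimensional buckets, not the paper's parent-child and sibling merges. The assumed penalty is the absolute estimation error introduced on the two original buckets when they are replaced by one bucket of uniform density, so pairs with similar densities merge first.

```python
def merge_penalty(a, b):
    """Error from replacing adjacent buckets a, b (each (lo, hi, freq), a.hi == b.lo)
    with one bucket of uniform density over [a.lo, b.hi)."""
    (alo, ahi, af), (blo, bhi, bf) = a, b
    density = (af + bf) / (bhi - alo)
    return abs(af - density * (ahi - alo)) + abs(bf - density * (bhi - blo))

def merge_to_budget(buckets, budget):
    """Repeatedly merge the adjacent pair with the smallest penalty until at
    most `budget` buckets remain. `buckets` must be sorted and contiguous."""
    buckets = list(buckets)
    while len(buckets) > max(budget, 1):
        i = min(range(len(buckets) - 1),
                key=lambda k: merge_penalty(buckets[k], buckets[k + 1]))
        (lo, _, f1), (_, hi, f2) = buckets[i], buckets[i + 1]
        buckets[i:i + 2] = [(lo, hi, f1 + f2)]
    return buckets

hist = [(0, 10, 100), (10, 20, 110), (20, 30, 10), (30, 40, 500)]  # hypothetical
print(merge_to_budget(hist, 3))
```

The first two buckets have nearly equal densities (10 and 11 tuples per unit), so merging them costs little accuracy, while the dense last bucket is kept separate.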

STHoles Redux
Quality histograms
Runtime overhead (<10%)
  – Dynamic construction of histograms
  – But no preprocessing
Preferable in several situations
  – Frequently updated data, whose distribution changes
  – Shifting workloads: STHoles can redirect attention to new regions dynamically (this is what's cool)