Chapter 13: Query Optimization. Yonsei University, 1st Semester, 2015. Sanghyun Park.



Outline
- Introduction
- Catalog Information for Cost Estimation
- Estimation of Statistics of Expression Results
- Transformation of Relational Expressions
- Choice of Evaluation Plans

Introduction (1/2)
- Query optimization is the process of selecting the most efficient query-evaluation plan from among the many strategies possible for processing a given query
- One aspect of optimization occurs at the relational-algebra level, where the system attempts to find an expression that is equivalent to the given expression but more efficient to execute
- Another aspect is selecting a detailed strategy for processing the query, such as choosing the algorithm for each operation
- The difference in cost between a good strategy and a bad strategy is often substantial

Introduction (2/2)
- Generating the cheapest query-evaluation plan involves several steps:
  1. Generate logically equivalent expressions
  2. Annotate the resulting expressions to get alternative execution plans
  3. Choose the cheapest plan based on estimated cost
- The overall process is called cost-based optimization

Catalog Information for Cost Estimation
- $n_r$: the number of tuples in relation $r$
- $b_r$: the number of blocks containing tuples of relation $r$
- $l_r$: the size of a tuple of relation $r$ in bytes
- $f_r$: the blocking factor of $r$, that is, the number of tuples of relation $r$ that fit into one block
- $V(A, r)$: the number of distinct values that appear in $r$ for attribute $A$; same as the size of $\Pi_A(r)$
- If tuples of $r$ are stored together physically in a file, then: $b_r = \lceil n_r / f_r \rceil$
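As a small illustration of how these catalog values relate, the sketch below derives the blocking factor and the block count from the tuple count and tuple size. The class name `RelationStats`, the `block_size` parameter, and the sample numbers are assumptions made for this example, and it assumes tuples do not span blocks.

```python
import math
from dataclasses import dataclass


@dataclass
class RelationStats:
    """Hypothetical container for catalog statistics of one relation."""
    n_r: int         # number of tuples
    l_r: int         # tuple size in bytes
    block_size: int  # bytes per disk block (assumed parameter)

    @property
    def f_r(self) -> int:
        # Blocking factor: whole tuples that fit into one block.
        return self.block_size // self.l_r

    @property
    def b_r(self) -> int:
        # Blocks needed when tuples are stored together in one file.
        return math.ceil(self.n_r / self.f_r)


stats = RelationStats(n_r=10_000, l_r=200, block_size=4096)
print(stats.f_r, stats.b_r)  # 20 500
```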

Selection Size Estimation
Equality selection $\sigma_{A=v}(r)$
- If we assume a uniform distribution of values, the selection result can be estimated to have $n_r / V(A, r)$ tuples
- It is often not realistic to assume that each value appears with equal probability; however, it is a reasonable approximation of reality in many cases
- Some databases store the distribution of values for each attribute as a histogram

Selections Involving Comparisons
Selections of the form $\sigma_{A \le v}(r)$
- Let $c$ denote the estimated number of tuples satisfying the condition
- If $\min(A, r)$ and $\max(A, r)$ are available in the catalog:
  - if $v < \min(A, r)$, then $c = 0$
  - if $v \ge \max(A, r)$, then $c = n_r$
  - otherwise, $c = n_r \cdot \dfrac{v - \min(A, r)}{\max(A, r) - \min(A, r)}$
- In the absence of statistical information, $c$ is assumed to be $n_r / 2$
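The two selection estimates (equality and comparison) can be sketched as plain functions. This is a minimal sketch under the uniform-distribution assumption; the function names and sample numbers are made up for illustration.

```python
def estimate_equality_selection(n_r, distinct_values):
    """Estimated size of sigma_{A=v}(r): n_r / V(A, r)."""
    return n_r / distinct_values


def estimate_leq_selection(n_r, v, min_a=None, max_a=None):
    """Estimated size of sigma_{A<=v}(r)."""
    if min_a is None or max_a is None:
        # No statistics in the catalog: assume half of the tuples qualify.
        return n_r / 2
    if v < min_a:
        return 0
    if v >= max_a:
        return n_r
    # Linear interpolation between min(A, r) and max(A, r).
    return n_r * (v - min_a) / (max_a - min_a)


print(estimate_equality_selection(10_000, 50))     # 200.0
print(estimate_leq_selection(10_000, 30, 0, 100))  # 3000.0
print(estimate_leq_selection(10_000, 30))          # 5000.0
```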

Complex Selections
- The selectivity of a condition $\theta_i$ is the probability that a tuple in the relation $r$ satisfies $\theta_i$. If $s_i$ is the number of satisfying tuples in $r$, the selectivity of $\theta_i$ is given by $s_i / n_r$
- Conjunction: $\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$. The estimated number of tuples in the result is: $n_r \cdot \dfrac{s_1 \cdot s_2 \cdots s_n}{n_r^{\,n}}$
- Disjunction: $\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$. The estimated number of tuples in the result is: $n_r \cdot \left(1 - \left(1 - \dfrac{s_1}{n_r}\right) \cdot \left(1 - \dfrac{s_2}{n_r}\right) \cdots \left(1 - \dfrac{s_n}{n_r}\right)\right)$
- Negation: $\sigma_{\neg\theta}(r)$. The estimated number of tuples in the result is: $n_r - \mathrm{size}(\sigma_\theta(r))$
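The three formulas translate directly into code. The sketch below treats the individual conditions as independent, which is what multiplying the selectivities amounts to; the function names and sample numbers are illustrative assumptions.

```python
from math import prod


def conjunction_size(n_r, sizes):
    """n_r * (s_1 * s_2 * ... * s_n) / n_r^n, written as a product of selectivities."""
    return n_r * prod(s / n_r for s in sizes)


def disjunction_size(n_r, sizes):
    """n_r * (1 - (1 - s_1/n_r) * ... * (1 - s_n/n_r))."""
    return n_r * (1 - prod(1 - s / n_r for s in sizes))


def negation_size(n_r, size_theta):
    """n_r - size(sigma_theta(r))."""
    return n_r - size_theta


n_r = 1000
sizes = [100, 500]  # s_1 and s_2: tuples satisfying each condition on its own
print(conjunction_size(n_r, sizes))  # roughly 50 tuples
print(disjunction_size(n_r, sizes))  # roughly 550 tuples
print(negation_size(n_r, 100))       # 900
```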

Join Size Estimation (1/2)
- Let $r(R)$ and $s(S)$ be relations
- The Cartesian product $r \times s$ contains $n_r \cdot n_s$ tuples; each tuple occupies $l_r + l_s$ bytes
- If $R \cap S = \emptyset$, then $r \bowtie s$ is the same as $r \times s$
- If $R \cap S$ is a key for $R$, a tuple of $s$ will join with at most one tuple from $r$; therefore the number of tuples in $r \bowtie s$ is no greater than the number of tuples in $s$
- If $R \cap S$ is a foreign key in $S$ referencing $R$, the number of tuples in $r \bowtie s$ is exactly the same as the number of tuples in $s$

Join Size Estimation (2/2)
If $R \cap S = \{A\}$ is not a key for $R$ or $S$:
- We estimate that every tuple in $r$ produces $n_s / V(A, s)$ tuples in $r \bowtie s$
- Considering all tuples in $r$, we estimate that there are $(n_r \cdot n_s) / V(A, s)$ tuples in $r \bowtie s$
- If we reverse the roles of $r$ and $s$ in the preceding estimate, we obtain the estimate $(n_r \cdot n_s) / V(A, r)$
- The lower of these two estimates is probably the more accurate one
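For the non-key case, the estimate is the smaller of the two candidate values. A minimal sketch follows; the function names and the sample statistics are assumptions for illustration.

```python
def cartesian_size(n_r, n_s):
    """Size of r x s, which equals r join s when R and S share no attributes."""
    return n_r * n_s


def join_size_non_key(n_r, n_s, v_a_r, v_a_s):
    """Join-size estimate when the common attribute A is not a key for R or S."""
    estimate_via_s = n_r * n_s / v_a_s  # each tuple of r matches n_s / V(A, s) tuples
    estimate_via_r = n_r * n_s / v_a_r  # roles of r and s reversed
    return min(estimate_via_s, estimate_via_r)


# Hypothetical statistics: n_r = 5000, n_s = 10000, V(A, r) = 5000, V(A, s) = 2500.
print(join_size_non_key(5000, 10_000, v_a_r=5000, v_a_s=2500))  # 10000.0
```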

Size Estimation for Other Operations
- Projection: estimated size of $\Pi_A(r)$ = $V(A, r)$
- Aggregation: estimated size of ${}_A g_F(r)$ = $V(A, r)$
- For unions/intersections of selections on the same relation: rewrite and use the size estimate for selections
  - E.g. $\sigma_{\theta_1}(r) \cup \sigma_{\theta_2}(r)$ can be rewritten as $\sigma_{\theta_1 \vee \theta_2}(r)$
- For operations on different relations:
  - Estimated size of $r \cup s$ = size of $r$ + size of $s$
  - Estimated size of $r \cap s$ = minimum of size of $r$ and size of $s$
  - Estimated size of $r - s$ = size of $r$
  - All three estimates may be quite inaccurate, but they provide upper bounds on the sizes

Transformation of Relational Expressions
- Two relational-algebra expressions are said to be equivalent if, on every legal database instance, the two expressions generate the same set of tuples (the order of the tuples is irrelevant)
- An equivalence rule says that expressions of two forms are equivalent; we can replace an expression of the first form by an expression of the second form, or vice versa
- The optimizer uses equivalence rules to transform expressions into other logically equivalent expressions

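Some Equivalence Rules
- Rule 5: theta-join operations are commutative: $E_1 \bowtie_\theta E_2 = E_2 \bowtie_\theta E_1$
- Rule 6a: natural-join operations are associative: $(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
- Rule 7a: selection distributes over theta join when all the attributes in $\theta_0$ involve only the attributes of one of the expressions being joined (here $E_1$): $\sigma_{\theta_0}(E_1 \bowtie_\theta E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_\theta E_2$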

Transformation Example
- Performing the selection as early as possible reduces the size of the relation to be joined
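To make the effect concrete, the sketch below evaluates the same query two ways over tiny in-memory relations: selection after the join, and selection pushed below the join. The relations `instructor` and `teaches`, their contents, and the helper names are all hypothetical and chosen only for this illustration.

```python
# Hypothetical relations, represented as lists of dicts.
instructor = [
    {"id": 1, "name": "Kim", "dept": "Music"},
    {"id": 2, "name": "Lee", "dept": "Physics"},
    {"id": 3, "name": "Cho", "dept": "Physics"},
]
teaches = [
    {"id": 1, "course": "MU-199"},
    {"id": 2, "course": "PHY-101"},
    {"id": 3, "course": "PHY-201"},
    {"id": 3, "course": "PHY-301"},
]


def select(relation, predicate):
    """sigma: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def join(r, s, attr):
    """Equi-join on a single common attribute (naive nested loops)."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]


def is_music(t):
    return t["dept"] == "Music"


# Selection applied after the join: the join processes all 3 instructors.
late = select(join(instructor, teaches, "id"), is_music)

# Selection pushed down: the join input shrinks from 3 tuples to 1.
music_only = select(instructor, is_music)
early = join(music_only, teaches, "id")

print(late == early)                     # True: the two expressions are equivalent
print(len(instructor), len(music_only))  # 3 1
```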

Enumeration of Equivalent Expressions
- Query optimizers use equivalence rules to systematically generate expressions equivalent to the given expression
- Conceptually, all equivalent expressions are generated by repeatedly applying equivalence rules until no more new expressions can be found
- This approach is very expensive in space and time
  - Space requirements are reduced by sharing common subexpressions
  - Time requirements are reduced by not generating all expressions

Evaluation Plan
- An evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the operations is coordinated

Choice of Evaluation Plans (1/2)
- One way to choose an evaluation plan for a query expression is simply to choose, for each operation, the cheapest algorithm for evaluating it
- However, choosing the cheapest algorithm for each operation independently is not necessarily a good idea:
  - Merge-join may be costlier than hash-join, but may provide a sorted output, which reduces the cost of an outer-level aggregation
- Therefore, to choose the best overall algorithm, we must consider even nonoptimal algorithms for individual operations
- Thus, in addition to considering alternative expressions for a query, we must also consider alternative algorithms for each operation in an expression
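A toy calculation shows why per-operation choices can mislead. All cost figures below are invented for illustration and are not measurements; the point is only that the plan with the cheaper join is not the cheaper plan overall once the aggregation's sort is included.

```python
# Hypothetical cost units for two plans that join and then aggregate.
plans = {
    "hash-join, then sort for aggregation": {"join": 100, "sort_for_aggregation": 60},
    "merge-join, output already sorted": {"join": 120, "sort_for_aggregation": 0},
}

# Total cost of each plan is the sum of its operation costs.
totals = {name: sum(parts.values()) for name, parts in plans.items()}

cheapest_join = min(plans, key=lambda name: plans[name]["join"])
cheapest_overall = min(totals, key=totals.get)

print(cheapest_join)     # hash-join, then sort for aggregation
print(cheapest_overall)  # merge-join, output already sorted
```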

Choice of Evaluation Plans (2/2)
- There are two broad approaches to choosing the best evaluation plan
  - The first searches all the plans and chooses the best plan in a cost-based fashion
  - The second uses heuristics to choose a plan
- Practical query optimizers incorporate elements of both approaches

Cost-Based Optimization
- A cost-based optimizer generates a range of query-evaluation plans from the given query, and chooses the one with the least cost
- For a complex query, the number of different query plans that are equivalent to a given plan can be large
- As an illustration, consider finding the best join order for $r_1 \bowtie r_2 \bowtie \dots \bowtie r_n$
- There are $(2(n-1))! / (n-1)!$ different join orders for this expression; with $n = 7$, the number is 665,280; with $n = 10$, the number is greater than 17.6 billion
- Luckily, it is not necessary to generate all the join orders; using dynamic programming, the least-cost join order for any subset of $\{r_1, r_2, \dots, r_n\}$ is computed only once and stored for future use
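The join-order count can be checked directly from the formula. The function name below is an assumption made for this short sketch.

```python
from math import factorial


def join_orders(n):
    """Number of join orders for n relations: (2(n - 1))! / (n - 1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


print(join_orders(7))   # 665280
print(join_orders(10))  # 17643225600
```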

Join Order Optimization Algorithm
procedure findbestplan(S) {
    if (bestplan[S].cost ≠ ∞)
        return bestplan[S]
    // else bestplan[S] has not been computed earlier, compute it now
    if (S contains only one relation) {
        // base case: without it, every cost would stay at ∞
        set bestplan[S].plan and bestplan[S].cost based on the best way of accessing S
    } else for each non-empty subset S1 of S such that S1 ≠ S {
        P1 = findbestplan(S1)
        P2 = findbestplan(S − S1)
        A = best algorithm for joining results of P1 and P2
        cost = P1.cost + P2.cost + cost of A
        if cost < bestplan[S].cost {
            bestplan[S].cost = cost
            bestplan[S].plan = “execute P1.plan; execute P2.plan; join results of P1 and P2 using A”
        }
    }
    return bestplan[S]
}
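A runnable sketch of this dynamic-programming idea follows. It memoizes the best plan per subset with `functools.lru_cache` rather than an explicit `bestplan` table, and it uses a deliberately crude cost model: the relation sizes, the fixed 1/100 join selectivity, and the helper names `est_size`, `scan_cost`, and `join_cost` are all assumptions invented for the example, not part of the slides.

```python
from functools import lru_cache
from itertools import combinations
from math import prod


def find_best_plan(relations, scan_cost, join_cost):
    """Return (cost, plan) for joining all relations, memoizing each subset."""

    @lru_cache(maxsize=None)
    def best(subset):
        if len(subset) == 1:
            # Base case: a single relation is simply accessed.
            (name,) = subset
            return scan_cost(name), name
        best_cost, best_plan = float("inf"), None
        members = sorted(subset)
        # Try every split of the subset into two non-empty parts.
        for k in range(1, len(members)):
            for left in combinations(members, k):
                s1 = frozenset(left)
                s2 = subset - s1
                cost1, plan1 = best(s1)
                cost2, plan2 = best(s2)
                cost = cost1 + cost2 + join_cost(s1, s2)
                if cost < best_cost:
                    best_cost, best_plan = cost, (plan1, plan2)
        return best_cost, best_plan

    return best(frozenset(relations))


# Hypothetical cost model: every join keeps 1/100 of the cross product.
sizes = {"r1": 1000, "r2": 10, "r3": 100}


def est_size(subset):
    return prod(sizes[name] for name in subset) // 100 ** (len(subset) - 1)


def scan_cost(name):
    return sizes[name]


def join_cost(s1, s2):
    return est_size(s1) * est_size(s2)  # nested-loop style cost


cost, plan = find_best_plan(sizes, scan_cost, join_cost)
print(cost, plan)  # 12110 ('r1', ('r2', 'r3'))
```

Under this toy model the cheapest plan joins the two small relations first and brings in the large one last. A real optimizer would also choose a join algorithm per step and track properties such as sort order.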

Heuristic Optimization
- Cost-based optimization is expensive, even with dynamic programming
- Systems may use heuristics to reduce the number of choices that must be made in a cost-based fashion
- Heuristic optimization transforms the query tree by using a set of rules that typically (but not in all cases) improve execution performance:
  - Perform selection early (reduces the number of tuples)
  - Perform projection early (reduces the number of attributes)
  - Perform the most restrictive selection and join operations before other similar operations
- Some systems use only heuristics; others combine heuristics with partial cost-based optimization