1 Lecture 25: Query Optimization Wednesday, November 26, 2003.

Slides:



Advertisements
Similar presentations
Copyright © 2004 Pearson Education, Inc.. Chapter 15 Algorithms for Query Processing and Optimization.
Advertisements

Databases and Information Systems 1 Prof. Dr. Stefan Böttcher Fakultät EIM, Institut für Informatik Universität Paderborn WS 2009 / 2010 Contents: selectivity.
Query Optimization May 31st, Today A few last transformations Size estimation Join ordering Summary of optimization.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Query Optimization. Query Optimization Process (simplified a bit) Parse the SQL query into a logical tree: –identify distinct blocks (corresponding to.
Lecture 14: Query Optimization. This Lecture Query rewriting Cost estimation –We have learned how atomic operations are implemented and their cost –We’ll.
Cost-Based Transformations. Why estimate costs? Well, sometimes we don’t need cost estimations to decide applying some heuristic transformation. –E.g.
Cost-Based Plan Selection Chapter 16 Section 5 ID: 213.
Cs44321 CS4432: Database Systems II Query Optimizer – Cost Based Optimization.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #9.
1 Distributed Databases CS347 Lecture 14 May 30, 2001.
Cost based transformations Initial logical query plan Two candidates for the best logical query plan.
Cost-Based Transformations. Why estimate costs? Sometimes we don’t need cost estimations to decide applying some heuristic transformation. –E.g. Pushing.
CMSC724: Database Management Systems Instructor: Amol Deshpande
Database System Concepts 5 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 14: Query Optimization.
Cost-Based Plan Selection Choosing an Order for Joins Chapter 16.5 and16.6 by:- Vikas Vittal Rao ID: 124/227 Chiu Luk ID: 210.
Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.
16.5 Introduction to Cost- based plan selection Amith KC Student Id: 109.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.
T HE Q UERY C OMPILER Prepared by : Ankit Patel (226)
Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.
CS411 Database Systems Kazuhiro Minami 12: Query Optimization.
Cost based transformations Initial logical query plan Two candidates for the best logical query plan.
Query Optimization Arash Izadpanah. Introduction: What is Query Optimization? Query optimization is the process of selecting the most efficient query-evaluation.
CPS216: Advanced Database Systems Notes 08:Query Optimization (Plan Space, Query Rewrites) Shivnath Babu.
Lecture 4 - Query Optimization Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Query Optimization March 6 th, Query Optimization Process (simplified a bit) Parse the SQL query into a logical tree: –identify distinct blocks.
Query Optimization March 10 th, Very Big Picture A query execution plan is a program. There are many of them. The optimizer is trying to chose a.
CPS216: Data-Intensive Computing Systems Introduction to Query Processing Shivnath Babu.
1 Lecture 25 Friday, November 30, Outline Query execution –Two pass algorithms based on indexes (6.7) Query optimization –From SQL to logical.
CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.
CS411 Database Systems Kazuhiro Minami 12: Query Optimization.
CS4432: Database Systems II Query Processing- Part 2.
CS 440 Database Management Systems Query Optimization 1.
1 Choosing an Order for Joins. 2 What is the best way to join n relations? SELECT … FROM A, B, C, D WHERE A.x = B.y AND C.z = D.z Hash-Join Sort-JoinIndex-Join.
1 Chapter 4 Generating Permutations and Combinations.
Chapter 13 Query Optimization Yonsei University 1 st Semester, 2015 Sanghyun Park.
Query Optimization Problem Pick the best plan from the space of physical plans.
CSE 544: Lecture 14 Wednesday, 5/15/2002 Optimization, Size Estimation.
Tallahassee, Florida, 2016 COP5725 Advanced Database Systems Query Optimization Spring 2016.
Query Processing and Query Optimization Database System Implementation CSE 507 Slides adapted from Silberschatz, Korth and Sudarshan Database System Concepts.
CHAPTER 19 Query Optimization. CHAPTER 19 Query Optimization.
Chapter 14: Query Optimization
CS 440 Database Management Systems
Query Optimization Heuristic Optimization
Database System Implementation CSE 507
Seung-won Hwang, Kevin Chen-Chuan Chang
Lecture 26: Query Optimizations and Cost Estimation
Prepared by : Ankit Patel (226)
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Lecture 24: Query Execution and Optimization
Introduction to Query Optimization
Introduction to Database Systems CSE 444 Lecture 22: Query Optimization November 26-30, 2007.
Lecture 26: Query Optimization
Lecture 21: ML Optimizers
Query Optimization and Perspectives
Lecture 27: Optimizations
Lecture 24: Query Execution
CPSC-608 Database Systems
Monday, 5/13/2002 Hash table indexes, query optimization
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
CSE 544: Optimizations Wednesday, 5/10/2006.
Lecture 26 Monday, December 3, 2001.
Relational Query Optimization
Lecture 26: Wednesday, December 4, 2002.
Relational Query Optimization
Lecture 27 Wednesday, December 5, 2001.
Lecture 29: Final Review Wednesday, December 11, 2002.
Lecture 24: Wednesday, November 27, 2002.
Presentation transcript:

1 Lecture 25: Query Optimization Wednesday, November 26, 2003

2 Outline Cost-based optimization 16.5, 16.6

3 Cost-based Optimizations Main idea: apply algebraic laws, until estimated cost is minimal Practically: start from partial plans, introduce operators one by one –Will see in a few slides Problem: there are too many ways to apply the laws, hence too many (partial) plans

4 Cost-based Optimizations Approaches: Top-down: the partial plan is a top fragment of the logical plan Bottom up: the partial plan is a bottom fragment of the logical plan

5 Search Strategies Branch-and-bound: –Remember the cheapest complete plan P seen so far and its cost C –Stop generating partial plans whose cost is > C –If a cheaper complete plan is found, replace P, C Hill climbing: –Remember only the cheapest partial plan seen so far Dynamic programming: –Remember the all cheapest partial plans

6 Dynamic Programming Unit of Optimization: select-project-join Push selections down, pull projections up

7 Join Trees R1 ⋈ R2 ⋈ …. ⋈ Rn Join tree: A plan = a join tree A partial plan = a subtree of a join tree R3R1R2R4

8 Types of Join Trees Left deep: R3 R1 R5 R2 R4

9 Types of Join Trees Bushy: R3 R1 R2R4 R5

10 Types of Join Trees Right deep: R3 R1 R5 R2R4

11 Problem Given: a query R1 ⋈ R2 ⋈ … ⋈ Rn Assume we have a function cost() that gives us the cost of every join tree Find the best join tree for the query

12 Dynamic Programming Idea: for each subset of {R1, …, Rn}, compute the best plan for that subset In increasing order of set cardinality: –Step 1: for {R1}, {R2}, …, {Rn} –Step 2: for {R1,R2}, {R1,R3}, …, {Rn-1, Rn} –… –Step n: for {R1, …, Rn} It is a bottom-up strategy A subset of {R1, …, Rn} is also called a subquery

13 Dynamic Programming For each subquery Q ⊆ {R1, …, Rn} compute the following: –Size(Q) –A best plan for Q: Plan(Q) –The cost of that plan: Cost(Q)

14 Dynamic Programming Step 1: For each {Ri} do: –Size({Ri}) = B(Ri) –Plan({Ri}) = Ri –Cost({Ri}) = (cost of scanning Ri)

15 Dynamic Programming Step i: For each Q ⊆ {R1, …, Rn} of cardinality i do: –Compute Size(Q) (later…) –For every pair of subqueries Q’, Q’’ s.t. Q = Q’  Q’’ compute cost(Plan(Q’) ⋈ Plan(Q’’)) –Cost(Q) = the smallest such cost –Plan(Q) = the corresponding plan

16 Dynamic Programming Return Plan({R1, …, Rn})

17 Dynamic Programming To illustrate, we will make the following simplifications: Cost(P1 ⋈ P2) = Cost(P1) + Cost(P2) + size(intermediate result(s)) Intermediate results: –If P1 = a join, then the size of the intermediate result is size(P1), otherwise the size is 0 –Similarly for P2 Cost of a scan = 0

18 Dynamic Programming Example: Cost(R5 ⋈ R7) = 0 (no intermediate results) Cost((R2 ⋈ R1) ⋈ R7) = Cost(R2 ⋈ R1) + Cost(R7) + size(R2 ⋈ R1) = size(R2 ⋈ R1)

19 Dynamic Programming Relations: R, S, T, U Number of tuples: 2000, 5000, 3000, 1000 Size estimation: T(A ⋈ B) = 0.01*T(A)*T(B)

20 SubquerySizeCostPlan RS RT RU ST SU TU RST RSU RTU STU RSTU

21 SubquerySizeCostPlan RS100k0RS RT60k0RT RU20k0RU ST150k0ST SU50k0SU TU30k0TU RST3M60k(RT)S RSU1M20k(RU)S RTU0.6M20k(RU)T STU1.5M30k(TU)S RSTU30M60k+50k=110k(RT)(SU)

22 Dynamic Programming Summary: computes optimal plans for subqueries: –Step 1: {R1}, {R2}, …, {Rn} –Step 2: {R1, R2}, {R1, R3}, …, {Rn-1, Rn} –… –Step n: {R1, …, Rn} We used naïve size/cost estimations In practice: –more realistic size/cost estimations (next time) –heuristics for Reducing the Search Space Restrict to left linear trees Restrict to trees “without cartesian product” –need more than just one plan for each subquery: “interesting orders”

23 Reducing the Search Space Left-linear trees v.s. Bushy trees Trees without cartesian product Example: R(A,B) ⋈ S(B,C) ⋈ T(C,D) Plan: (R(A,B) ⋈ T(C,D)) ⋈ S(B,C) has a cartesian product – most query optimizers will not consider it

24 Counting the Number of Join Orders The mathematics we need to know: Given x 1, x 2,..., x n, how many permutations can we construct ? Answer: n! Example: permutations of x 1 x 2 x 3 x 4 are: x 1 x 2 x 3 x 4 x 2 x 4 x 3 x (there are 4! = 4*3*2*1 = 24 permutations)

25 Counting the Number of Join Orders The mathematics we need to know: Given the product x 0 x 1 x 2...x n, in how many ways can we place n pairs of parenthesis around them ? Answer: 1/(n+1)*C 2n n = (2n)!/((n+1)*(n!) 2 ) Example: for n=3 ((x 0 x 1 )(x 2 x 3 )) ((x 0 (x 1 x 2 ))x 3 ) (x 0 ((x 1 x 2 )x 3 )) ((x 0 x 1 )x 2 )x 3 ) (x 0 (x 1 (x 2 x 3 ))) There are 6!/(4*3!*3!) = 5 ways

26 Counting the Number of Join Orders (Exercise) R 0 (A 0,A 1 ) ⋈ R 1 (A 1,A 2 ) ⋈... ⋈ R n (A n,A n+1 ) The number of left linear join trees is: The number of left linear join trees without cartesian products is: The number of bushy join trees is: The number of bushy join trees without cartesian product is:

27 Number of Subplans Inspected by Dynamic Programming R 0 (A 0,A 1 ) ⋈ R 1 (A 1,A 2 ) ⋈... ⋈ R n (A n,A n+1 ) The number of left linear subplans inspected is: The number of left linear subplans without cartesian products inspected is: The number of bushy join subplans inspected is: The number of bushy join subplans without cartesian product:

28 Happy Thanksgivings !