1 Lecture 25: Query Optimization Wednesday, November 26, 2003
2 Outline Cost-based optimization 16.5, 16.6
3 Cost-based Optimizations Main idea: apply algebraic laws, until estimated cost is minimal Practically: start from partial plans, introduce operators one by one –Will see in a few slides Problem: there are too many ways to apply the laws, hence too many (partial) plans
4 Cost-based Optimizations Approaches: Top-down: the partial plan is a top fragment of the logical plan Bottom up: the partial plan is a bottom fragment of the logical plan
5 Search Strategies Branch-and-bound: –Remember the cheapest complete plan P seen so far and its cost C –Stop generating partial plans whose cost is > C –If a cheaper complete plan is found, replace P, C Hill climbing: –Remember only the cheapest partial plan seen so far Dynamic programming: –Remember the all cheapest partial plans
6 Dynamic Programming Unit of Optimization: select-project-join Push selections down, pull projections up
7 Join Trees R1 ⋈ R2 ⋈ …. ⋈ Rn Join tree: A plan = a join tree A partial plan = a subtree of a join tree R3R1R2R4
8 Types of Join Trees Left deep: R3 R1 R5 R2 R4
9 Types of Join Trees Bushy: R3 R1 R2R4 R5
10 Types of Join Trees Right deep: R3 R1 R5 R2R4
11 Problem Given: a query R1 ⋈ R2 ⋈ … ⋈ Rn Assume we have a function cost() that gives us the cost of every join tree Find the best join tree for the query
12 Dynamic Programming Idea: for each subset of {R1, …, Rn}, compute the best plan for that subset In increasing order of set cardinality: –Step 1: for {R1}, {R2}, …, {Rn} –Step 2: for {R1,R2}, {R1,R3}, …, {Rn-1, Rn} –… –Step n: for {R1, …, Rn} It is a bottom-up strategy A subset of {R1, …, Rn} is also called a subquery
13 Dynamic Programming For each subquery Q ⊆ {R1, …, Rn} compute the following: –Size(Q) –A best plan for Q: Plan(Q) –The cost of that plan: Cost(Q)
14 Dynamic Programming Step 1: For each {Ri} do: –Size({Ri}) = B(Ri) –Plan({Ri}) = Ri –Cost({Ri}) = (cost of scanning Ri)
15 Dynamic Programming Step i: For each Q ⊆ {R1, …, Rn} of cardinality i do: –Compute Size(Q) (later…) –For every pair of subqueries Q’, Q’’ s.t. Q = Q’ Q’’ compute cost(Plan(Q’) ⋈ Plan(Q’’)) –Cost(Q) = the smallest such cost –Plan(Q) = the corresponding plan
16 Dynamic Programming Return Plan({R1, …, Rn})
17 Dynamic Programming To illustrate, we will make the following simplifications: Cost(P1 ⋈ P2) = Cost(P1) + Cost(P2) + size(intermediate result(s)) Intermediate results: –If P1 = a join, then the size of the intermediate result is size(P1), otherwise the size is 0 –Similarly for P2 Cost of a scan = 0
18 Dynamic Programming Example: Cost(R5 ⋈ R7) = 0 (no intermediate results) Cost((R2 ⋈ R1) ⋈ R7) = Cost(R2 ⋈ R1) + Cost(R7) + size(R2 ⋈ R1) = size(R2 ⋈ R1)
19 Dynamic Programming Relations: R, S, T, U Number of tuples: 2000, 5000, 3000, 1000 Size estimation: T(A ⋈ B) = 0.01*T(A)*T(B)
20 SubquerySizeCostPlan RS RT RU ST SU TU RST RSU RTU STU RSTU
21 SubquerySizeCostPlan RS100k0RS RT60k0RT RU20k0RU ST150k0ST SU50k0SU TU30k0TU RST3M60k(RT)S RSU1M20k(RU)S RTU0.6M20k(RU)T STU1.5M30k(TU)S RSTU30M60k+50k=110k(RT)(SU)
22 Dynamic Programming Summary: computes optimal plans for subqueries: –Step 1: {R1}, {R2}, …, {Rn} –Step 2: {R1, R2}, {R1, R3}, …, {Rn-1, Rn} –… –Step n: {R1, …, Rn} We used naïve size/cost estimations In practice: –more realistic size/cost estimations (next time) –heuristics for Reducing the Search Space Restrict to left linear trees Restrict to trees “without cartesian product” –need more than just one plan for each subquery: “interesting orders”
23 Reducing the Search Space Left-linear trees v.s. Bushy trees Trees without cartesian product Example: R(A,B) ⋈ S(B,C) ⋈ T(C,D) Plan: (R(A,B) ⋈ T(C,D)) ⋈ S(B,C) has a cartesian product – most query optimizers will not consider it
24 Counting the Number of Join Orders The mathematics we need to know: Given x 1, x 2,..., x n, how many permutations can we construct ? Answer: n! Example: permutations of x 1 x 2 x 3 x 4 are: x 1 x 2 x 3 x 4 x 2 x 4 x 3 x (there are 4! = 4*3*2*1 = 24 permutations)
25 Counting the Number of Join Orders The mathematics we need to know: Given the product x 0 x 1 x 2...x n, in how many ways can we place n pairs of parenthesis around them ? Answer: 1/(n+1)*C 2n n = (2n)!/((n+1)*(n!) 2 ) Example: for n=3 ((x 0 x 1 )(x 2 x 3 )) ((x 0 (x 1 x 2 ))x 3 ) (x 0 ((x 1 x 2 )x 3 )) ((x 0 x 1 )x 2 )x 3 ) (x 0 (x 1 (x 2 x 3 ))) There are 6!/(4*3!*3!) = 5 ways
26 Counting the Number of Join Orders (Exercise) R 0 (A 0,A 1 ) ⋈ R 1 (A 1,A 2 ) ⋈... ⋈ R n (A n,A n+1 ) The number of left linear join trees is: The number of left linear join trees without cartesian products is: The number of bushy join trees is: The number of bushy join trees without cartesian product is:
27 Number of Subplans Inspected by Dynamic Programming R 0 (A 0,A 1 ) ⋈ R 1 (A 1,A 2 ) ⋈... ⋈ R n (A n,A n+1 ) The number of left linear subplans inspected is: The number of left linear subplans without cartesian products inspected is: The number of bushy join subplans inspected is: The number of bushy join subplans without cartesian product:
28 Happy Thanksgivings !