CPSC-608 Database Systems Fall 2018 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Email: chen@cse.tamu.edu Notes #20
Query Optimization
An input database program P
→ parser → parse tree
→ preprocessing (view processing, semantic checking) → parse tree
→ parse tree-lqp convertor → logic query plan
→ apply logic laws (push selections, group joins) → logic query plan
→ Optimization via logic and size (reduce the size of intermediate results) → logic query plan
→ Lqp-pqp convertor (choices of algorithms, data structures, and computational modes) → physical query plan
→ Optimization via algorithms and cost → Machine executable code
Prepare a collection C of efficient algorithms for operations in relational algebra; take care of issues in optimization and security.
Improving logic plan via relation size
Major steps:
1. Collect size parameters for stored relations: T(R) (the number of tuples), B(R) (the number of blocks), and V(R,A) (the number of distinct values on attribute A).
2. Set up estimation rules for the size parameters of the results of relational algebra operators.
3. Use logic laws to convert a logic query plan into an equivalent one that minimizes the (estimated) sizes of intermediate relations.
Estimating size parameters (T,B,V): the number of tuples T
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{T(R)/2, ΠA V(R,A)} (product over all attributes A of R)
γ: T(γ(R)) = min{T(R)/2, Πgrouping A V(R,A)} (product over the grouping attributes)
σA=c: T(σA=c(R)) = T(R)/V(R,A)
σA<c: T(σA<c(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is the smaller relation)
∪: T(R ∪ S) = T(R) + T(S)/2 (assume S is the smaller relation)
−: T(R − S) = T(R) − T(S)/2
⋈: T(R ⋈ S) = T(R)•T(S)/max{V(R,A), V(S,A)} (A is the shared attribute)
⋈C: T(R ⋈C S) = T(σC(R × S))
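The tuple-count rules above are simple enough to write down directly. Below is a minimal Python sketch with one function per rule; the function names are hypothetical, the inputs are assumed to be already-collected statistics (T and V values as plain numbers), and the rules are applied literally with no clamping (for example, nothing stops the difference estimate from going below zero).

```python
from math import prod
from typing import Sequence

# Tuple-count (T) estimators, one per rule in the table above.
# All arguments are plain numbers: T(R), T(S), V(R,A), ...

def est_distinct(t_r: float, v_all: Sequence[float]) -> float:
    """delta: min{T(R)/2, product of V(R,A) over all attributes A of R}."""
    return min(t_r / 2, prod(v_all))

def est_group(t_r: float, v_grouping: Sequence[float]) -> float:
    """gamma: same form, product taken over the grouping attributes only."""
    return min(t_r / 2, prod(v_grouping))

def est_select_eq(t_r: float, v_r_a: float) -> float:
    """sigma_{A=c}: T(R)/V(R,A)."""
    return t_r / v_r_a

def est_select_lt(t_r: float) -> float:
    """sigma_{A<c}: T(R)/3."""
    return t_r / 3

def est_intersect(t_r: float, t_s: float) -> float:
    """R intersect S: half of the smaller relation."""
    return min(t_r, t_s) / 2

def est_union(t_r: float, t_s: float) -> float:
    """R union S: the larger relation plus half of the smaller one."""
    return max(t_r, t_s) + min(t_r, t_s) / 2

def est_difference(t_r: float, t_s: float) -> float:
    """R - S: T(R) - T(S)/2, exactly as stated in the rule."""
    return t_r - t_s / 2

def est_natural_join(t_r: float, t_s: float, v_r_a: float, v_s_a: float) -> float:
    """R join S on one shared attribute A: T(R)T(S)/max{V(R,A), V(S,A)}."""
    return t_r * t_s / max(v_r_a, v_s_a)

print(est_select_eq(1000, 100))                 # 10.0
print(est_natural_join(1000, 1000, 200, 100))   # 5000.0
```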
Estimating size parameters (T,B,V): the number of blocks B
π, τ, ×: size parameters can be calculated precisely
δ, γ, σA=c, σA<c, ∩, ∪, −, ⋈, ⋈C: first estimate T(W) for the result W using the rules above, then derive B(W) from it:
B(W) = T(W)/#tuples-per-block
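A small worked example of the two-step estimate (T first, then B), using made-up numbers that are not from the notes: T(R) = 10000, V(R,A) = 50, and an assumed 20 tuples per block for the result.

```latex
\begin{aligned}
T(\sigma_{A=c}(R)) &= \frac{T(R)}{V(R,A)} = \frac{10000}{50} = 200,\\
B(\sigma_{A=c}(R)) &= \frac{T(\sigma_{A=c}(R))}{\#\text{tuples-per-block}} = \frac{200}{20} = 10.
\end{aligned}
```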
Estimating size parameters (T,B,V): the number of distinct values V
π, τ, ×: size parameters can be calculated precisely
δ: V(δ(R),A) = V(R,A)
γ: V(γ(R),A) = V(R,A) (A is a grouping attribute)
σA=c: V(σA=c(R),A) = 1, and V(σA=c(R),B) = V(R,B) for any attribute B ≠ A
σA<c: V(σA<c(R),A) = V(R,A)/3, and V(σA<c(R),B) = V(R,B) for any attribute B ≠ A
∩: V(R ∩ S,A) = V(S,A)/2 (assume V(R,A) ≥ V(S,A))
∪: V(R ∪ S,A) = V(R,A) + V(S,A)/2 (assume V(R,A) ≥ V(S,A))
−: V(R − S,A) = V(R,A) − max{V(R,A)/2, V(S,A)/2}
⋈: V(R ⋈ S,A) = min{V(R,A), V(S,A)} (A is a shared attribute)
   V(R ⋈ S,A) = V(R,A) or V(S,A), whichever relation has A (A is not shared)
⋈C: V(R ⋈C S,A) = V(σC(R × S),A)
Logic Plan Improvement for Join via Size
Two techniques:
1. Estimate the sizes of intermediate relations. For natural join:
   T(R(X, y) ⋈ S(y, Z)) = T(R)•T(S)/max{V(R, y), V(S, y)}
   also note: V(R(X, y) ⋈ S(y, Z), y) = min{V(R, y), V(S, y)}
   V(R(X, y) ⋈ S(y, Z), z) = V(R, z) or V(S, z), whichever relation has z, for z ≠ y
2. Consider different orders of the joins (natural join is commutative and associative), e.g.
   (((R ⋈ S) ⋈ T) ⋈ U) = (R ⋈ U) ⋈ (S ⋈ T)
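The two natural-join rules above (the size estimate and how V propagates) can be combined into one estimator that returns the statistics of the join result, so that it can be fed into the next join. Below is a minimal Python sketch; the `Stats` class and `natural_join` function are hypothetical names. It also assumes the multi-attribute form the notes use later in the example, where each shared attribute contributes its own max{V(R, y), V(S, y)} factor to the denominator; with no shared attribute the estimate becomes the Cartesian product.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Stats:
    """Estimated size parameters of a (stored or intermediate) relation."""
    t: float                 # T: number of tuples
    v: Dict[str, float]      # V(., attr) for every attribute of the relation

def natural_join(r: Stats, s: Stats) -> Stats:
    """Estimate T and V of R natural-join S using the rules above."""
    shared = r.v.keys() & s.v.keys()
    # One max{V(R,y), V(S,y)} factor per shared attribute y; with no shared
    # attribute the denominator stays 1 and the estimate is T(R)*T(S).
    denom = 1.0
    for y in shared:
        denom *= max(r.v[y], s.v[y])
    v: Dict[str, float] = {}
    for attr in r.v.keys() | s.v.keys():
        if attr in shared:
            v[attr] = min(r.v[attr], s.v[attr])   # shared: the smaller V survives
        elif attr in r.v:
            v[attr] = r.v[attr]                   # attribute only in R
        else:
            v[attr] = s.v[attr]                   # attribute only in S
    return Stats(r.t * s.t / denom, v)

# Relations A(a, b), B(b, c), C(c, d) from the example that follows.
A = Stats(1000, {"a": 100, "b": 200})
B = Stats(1000, {"b": 100, "c": 500})
C = Stats(1000, {"c": 20, "d": 1000})

AB = natural_join(A, B)
print(AB.t, AB.v["c"])          # 5000.0 500
print(natural_join(AB, C).t)    # 10000.0
```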
Consider:
A(a, b): T(A) = 1000, V(A, a) = 100, V(A, b) = 200
B(b, c): T(B) = 1000, V(B, b) = 100, V(B, c) = 500
C(c, d): T(C) = 1000, V(C, c) = 20, V(C, d) = 1000
D(d, a): T(D) = 1000, V(D, d) = 1000, V(D, a) = 50
We want a good LQP for A ⋈ B ⋈ C ⋈ D.
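Left-deep join tree
In a left-deep join tree, the right child of every join node is a stored relation (a leaf). For A ⋈ B ⋈ C ⋈ D the tree has four leaves, and the question is which relation to place at which leaf.
[Figure: a left-deep join tree with four leaves, each marked ?]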
Left-deep join tree (all 4! = 24 permutations)
A B C D | A B D C | A C B D | A C D B | A D B C | A D C B
B A C D | B A D C | B C A D | B C D A | B D A C | B D C A
C A B D | C A D B | C B A D | C B D A | C D A B | C D B A
D A B C | D A C B | D B A C | D B C A | D C A B | D C B A
Left-deep join tree: two examples
[Figure: the left-deep trees ((A ⋈ B) ⋈ C) ⋈ D and ((B ⋈ D) ⋈ A) ⋈ C]
Relations A, B, C, D as above. Join size rules:
T(R(X, y) ⋈ S(y, Z)) = T(R)•T(S)/max{V(R, y), V(S, y)}
T(R(X, y1, y2) ⋈ S(y1, y2, Z)) = T(R)•T(S)/(max{V(R, y1), V(S, y1)}•max{V(R, y2), V(S, y2)})
Tree ((A ⋈ B) ⋈ C) ⋈ D:
T(A ⋈ B) = 1000•1000/max{200, 100} = 5000, with V(*, c) = 500
T((A ⋈ B) ⋈ C) = 5000•1000/max{500, 20} = 10000
cost = 5000 + 10000 = 15000 (the sum of the sizes of the intermediate relations)
Tree ((B ⋈ D) ⋈ A) ⋈ C:
T(B ⋈ D) = 1000•1000 = 1000000 (no shared attribute), with V(*, a) = 50, V(*, b) = 100
T((B ⋈ D) ⋈ A) = 1000000•1000/(max{50, 100}•max{100, 200}) = 50000
cost = 1000000 + 50000 = 1050000
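Left-deep join tree (all 4! = 24 permutations), with cost = sum of the sizes of the two intermediate relations
A B C D 15000 | A B D C 55000 | A C B D 1010000 | A C D B 1010000 | A D B C 60000 | A D C B 20000
B A C D (= A B C D) | B A D C (= A B D C) | B C A D 12000 | B C D A 4000 | B D A C 1050000 | B D C A 1002000
C A B D (= A C B D) | C A D B (= A C D B) | C B A D (= B C A D) | C B D A (= B C D A) | C D A B 11000 | C D B A 3000
D A B C (= A D B C) | D A C B (= A D C B) | D B A C (= B D A C) | D B C A (= B D C A) | D C A B (= C D A B) | D C B A (= C D B A)
Swapping the first two relations does not change the cost, since the first join is commutative. The cheapest left-deep plans are C D B A and D C B A, with cost 3000.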
Left-deep tree: general algorithm
Input: n relations R1, R2, …, Rn
Output: the best left-deep join of R1, R2, …, Rn
1. Construct a left-deep tree T with n leaves;
2. For each permutation P of the n relations R1, R2, …, Rn do
   assign the n relations to the leaves of T in the order of P;
   evaluate the cost of the plan;
3. Pick the plan whose permutation gives the minimum cost.
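The algorithm above can be sketched in Python by pairing the permutation loop with the natural-join size rules. This is a minimal sketch under stated assumptions: the cost of a plan is taken to be the sum of the estimated sizes of the intermediate relations (stored relations and the final result are not counted), as in the worked example, and the helper names `join`, `left_deep_cost`, and `best_left_deep` are hypothetical. It enumerates all n! permutations, so it is only practical for small n.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple

Stats = Tuple[float, Dict[str, float]]   # (T, {attribute: V})

def join(r: Stats, s: Stats) -> Stats:
    """Natural-join estimate; one max{V(R,y), V(S,y)} factor per shared attribute."""
    (t_r, v_r), (t_s, v_s) = r, s
    shared = v_r.keys() & v_s.keys()
    denom = 1.0
    for y in shared:
        denom *= max(v_r[y], v_s[y])
    # Shared attribute: min of the two V's; otherwise keep the V of whichever side has it.
    v = {a: min(v_r[a], v_s[a]) if a in shared else v_r.get(a, v_s.get(a))
         for a in v_r.keys() | v_s.keys()}
    return t_r * t_s / denom, v

def left_deep_cost(order: Sequence[str], rels: Dict[str, Stats]) -> float:
    """Cost of ((R1 join R2) join R3) ... = sum of the intermediate sizes."""
    current = rels[order[0]]
    cost = 0.0
    for i in range(1, len(order)):
        current = join(current, rels[order[i]])
        if i < len(order) - 1:          # the final result is not an intermediate
            cost += current[0]
    return cost

def best_left_deep(rels: Dict[str, Stats]):
    """Evaluate every permutation and keep the cheapest (first one on ties)."""
    costs = {p: left_deep_cost(p, rels) for p in permutations(rels)}
    best = min(costs, key=costs.get)
    return best, costs

# The four relations of the running example.
rels = {
    "A": (1000, {"a": 100, "b": 200}),
    "B": (1000, {"b": 100, "c": 500}),
    "C": (1000, {"c": 20, "d": 1000}),
    "D": (1000, {"d": 1000, "a": 50}),
}
best, costs = best_left_deep(rels)
print(best, costs[best])   # ('C', 'D', 'B', 'A') 3000.0
```

The returned `costs` dictionary reproduces the 24-entry cost table for this data, including the mirrored entries whose first two relations are swapped.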
Dynamic Programming
Consider all tree structures, not only left-deep ones. Again consider A ⋈ B ⋈ C ⋈ D.
Five tree structures:
[Figure: five join-tree shapes (a), (b), (c), (d), (e) for four relations]
Each of (a)-(d) has 12 different assignments, and (e) has 3 different assignments, so in total there are 4•12 + 3 = 51 different ways to join the 4 relations. This is too many when the number of relations is relatively large.
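Dynamic Programming
Consider the join trees in which D is joined last with a join of A, B, C:
[Figure: four join trees, each joining D last with a different join of A, B, C]
We really only need to find the best way to join A ⋈ B ⋈ C, then join D with this best join.
How do we find the best join of A ⋈ B ⋈ C? We consider all possible ways: (A ⋈ B) ⋈ C, (A ⋈ C) ⋈ B, (B ⋈ C) ⋈ A.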
Dynamic programming: general algorithm
Input: n relations R1, R2, …, Rn
Output: the best join of R1, R2, …, Rn
1. FOR each Ri DO {cost(Ri) = 0; size(Ri) = 0};
   (size(Ri) = 0 because stored relations are not counted as intermediate results; the join-size estimates still use T(Ri).)
2. FOR each pair of Ri and Rj DO {cost(Ri, Rj) = 0; compute size(Ri ⋈ Rj)};
3. FOR k = 3 TO n DO
     FOR any k relations S1, S2, …, Sk of R1, R2, …, Rn DO
       FOR each partition P = {(Si1, …, Sij), (Sij+1, …, Sik)} of S1, S2, …, Sk DO
         cost(P) = cost(Si1, …, Sij) + size(Si1 ⋈ … ⋈ Sij) + cost(Sij+1, …, Sik) + size(Sij+1 ⋈ … ⋈ Sik);
       let cost(S1, S2, …, Sk) be the smallest cost(P) among the above partitions (and remember this partition P);
       compute size(S1 ⋈ S2 ⋈ … ⋈ Sk);
4. Return cost(R1, R2, …, Rn).
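The bottom-up algorithm above can be sketched in Python over subsets of relation names. This is a minimal sketch, assuming the same cost model as the example (stored relations charged size 0, every other subset charged its estimated size) and the natural-join size rules with one max-term per shared attribute; `dp_join_order`, `join`, and the table layout are hypothetical names. Each unordered two-way partition is visited once by always placing the first relation of the subset on the left, and the size of a subset is estimated from the best partition found, as the pseudocode describes.

```python
from itertools import combinations
from typing import Dict, Tuple

Stats = Tuple[float, Dict[str, float]]   # (T, {attribute: V})

def join(r: Stats, s: Stats) -> Stats:
    """Natural-join estimate; one max{V(R,y), V(S,y)} factor per shared attribute."""
    (t_r, v_r), (t_s, v_s) = r, s
    shared = v_r.keys() & v_s.keys()
    denom = 1.0
    for y in shared:
        denom *= max(v_r[y], v_s[y])
    v = {a: min(v_r[a], v_s[a]) if a in shared else v_r.get(a, v_s.get(a))
         for a in v_r.keys() | v_s.keys()}
    return t_r * t_s / denom, v

def dp_join_order(rels: Dict[str, Stats]):
    """Map each subset (frozenset of names) to (cost, size, stats, plan)."""
    names = list(rels)
    table = {}
    # Step 1: stored relations have cost 0 and are charged size 0.
    for n in names:
        table[frozenset([n])] = (0.0, 0.0, rels[n], n)
    # Steps 2-3: subsets of size k = 2..n (pairs come out with cost 0).
    for k in range(2, len(names) + 1):
        for subset in combinations(names, k):
            whole = frozenset(subset)
            first, rest = subset[0], subset[1:]
            best = None
            for j in range(k - 1):                  # j extra members join `first` on the left
                for extra in combinations(rest, j):
                    left = frozenset((first,) + extra)
                    right = whole - left
                    cost = (table[left][0] + table[left][1]
                            + table[right][0] + table[right][1])
                    if best is None or cost < best[0]:
                        best = (cost, left, right)
            cost, left, right = best
            stats = join(table[left][2], table[right][2])
            table[whole] = (cost, stats[0], stats, (table[left][3], table[right][3]))
    return table

rels = {
    "A": (1000, {"a": 100, "b": 200}),
    "B": (1000, {"b": 100, "c": 500}),
    "C": (1000, {"c": 20, "d": 1000}),
    "D": (1000, {"d": 1000, "a": 50}),
}
table = dp_join_order(rels)
cost, size, _, plan = table[frozenset(rels)]
print(cost, plan)   # 3000.0 ('A', ('B', ('C', 'D')))
```

Unlike the left-deep search, this also explores bushy trees. On this data it reaches the same cost 3000 as the best left-deep plan; the plan A ⋈ (B ⋈ (C ⋈ D)) is the left-deep order C D B A with the two inputs of each join swapped.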
Dynamic Programming: Example
Relations A, B, C, D as above; T(R(X, y) ⋈ S(y, Z)) = T(R)•T(S)/max{V(R, y), V(S, y)}
Step 1, single relations: A, B, C, D each have cost = 0, size = 0
Step 2, pairs (cost = 0; size = estimated join size):
{A, B}: size = 5000 | {A, C}: size = 1000000 | {A, D}: size = 10000
{B, C}: size = 2000 | {B, D}: size = 1000000 | {C, D}: size = 1000
Step 3, triples {A, B, C}, {A, B, D}, {A, C, D}, {B, C, D}. For example, for {B, C, D}:
D | {B, C}: cost = cost(B, C) + size(B ⋈ C) = 0 + 2000 = 2000
C | {B, D}: cost = 0 + 1000000 = 1000000
B | {C, D}: cost = 0 + 1000 = 1000
so cost(B, C, D) = 1000, via B ⋈ (C ⋈ D), and size(B ⋈ C ⋈ D) = 2000.
Results for all triples:
{A, B, C}: cost = 2000, size = 10000
{A, B, D}: cost = 5000, size = 50000
{A, C, D}: cost = 1000, size = 10000
{B, C, D}: cost = 1000, size = 2000
Step 4, {A, B, C, D}: cost of each partition
A | {B, C, D}: 1000 + 2000 = 3000
B | {A, C, D}: 1000 + 10000 = 11000
C | {A, B, D}: 5000 + 50000 = 55000
D | {A, B, C}: 2000 + 10000 = 12000
{A, B} | {C, D}: 5000 + 1000 = 6000
{A, C} | {B, D}: 1000000 + 1000000 = 2000000
{A, D} | {B, C}: 10000 + 2000 = 12000
Result: cost(A, B, C, D) = 3000, from the partition A | {B, C, D}; the best plan is A ⋈ (B ⋈ (C ⋈ D)).
Summary: Logic Plan Improvement for Join via Size
1. Estimating sizes of intermediate relations
2. Considering different orders of the joins:
   left-deep trees
   dynamic programming