CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #9
LQP Optimization with Size
Two techniques:
  Estimating sizes of intermediate relations
    For natural join: T(R(X, y) ⋈ S(y, Z)) = T(R)T(S) / max{V(R, y), V(S, y)}
  Considering different orders of an operation
    (((R ⋈ S) ⋈ T) ⋈ U) = (R ⋈ U) ⋈ (S ⋈ T)
Consider:
  A(a, b): T(A) = 1000, V(A, a) = 100, V(A, b) = 200
  B(b, c): T(B) = 1000, V(B, b) = 100, V(B, c) = 500
  C(c, d): T(C) = 1000, V(C, c) = 20, V(C, d) = 1000
  D(d, a): T(D) = 1000, V(D, d) = 1000, V(D, a) = 50
We want a good LQP for A ⋈ B ⋈ C ⋈ D.
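The natural-join estimate can be applied directly to these statistics. The sketch below is only an illustration: the names `STATS`, `join_size`, and `pair_size` are made up for this example, and integer division is an assumption that happens to be exact for these numbers.

```python
# Statistics from the example: relation -> (T, {attribute: V}).
STATS = {
    "A": (1000, {"a": 100, "b": 200}),
    "B": (1000, {"b": 100, "c": 500}),
    "C": (1000, {"c": 20, "d": 1000}),
    "D": (1000, {"d": 1000, "a": 50}),
}

def join_size(t_r, t_s, v_r, v_s):
    """T(R join S) = T(R)T(S) / max{V(R, y), V(S, y)} for one shared attribute y."""
    return t_r * t_s // max(v_r, v_s)

def pair_size(r, s, attr):
    """Estimated size of joining two base relations on the shared attribute attr."""
    (t_r, v_r), (t_s, v_s) = STATS[r], STATS[s]
    return join_size(t_r, t_s, v_r[attr], v_s[attr])

ab = pair_size("A", "B", "b")  # 1000 * 1000 / max(200, 100) = 5000
bc = pair_size("B", "C", "c")  # 1000 * 1000 / max(500, 20) = 2000
cd = pair_size("C", "D", "d")  # 1000 * 1000 / max(1000, 1000) = 1000
```

Joining on the attribute with the larger number of distinct values gives the smaller estimate, which is why C ⋈ D is the cheapest pair here.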
Left-deep join tree
[Figure: a left-deep join tree with four leaves, each to be filled with one of the relations]
Left-deep join tree (all 4! = 24 permutations):
  ABCD  BACD  ABDC  BADC  CABD  CADB
  BCAD  CBAD  DABC  DACB  DBAC  DBCA
  DCAB  DCBA  CBDA  CDAB  CDBA  BCDA
  BDAC  BDCA  ACBD  ACDB  ADBC  ADCB
Swapping the two relations that are joined first does not change any intermediate relation, so only 4!/2 = 12 permutations need to be considered.
Left-deep join tree: example
Using the statistics of A, B, C, D given earlier and T(R(X, y) ⋈ S(y, Z)) = T(R)T(S) / max{V(R, y), V(S, y)}:
Order A, B, C, D, that is ((A ⋈ B) ⋈ C) ⋈ D:
  T(A ⋈ B) = 1000 · 1000 / max{200, 100} = 5000, with V(A ⋈ B, c) = 500
  T((A ⋈ B) ⋈ C) = 5000 · 1000 / max{500, 20} = 10000
  cost = 5000 + 10000 = 15000 (the total size of the intermediate relations)
Order B, D, A, C, that is ((B ⋈ D) ⋈ A) ⋈ C:
  B and D share no attribute, so B ⋈ D is a Cartesian product with T(B ⋈ D) = 1000 · 1000 = 1,000,000, V(B ⋈ D, a) = 50, V(B ⋈ D, b) = 100
  The first intermediate relation alone is far larger than 15000, so this order is much more expensive.
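The cost of a single left-deep order can be checked with a short script. This is a sketch under assumptions that go beyond the slides: a join on several shared attributes divides by the max-V of each one, a join with no shared attribute is a Cartesian product, and V of an attribute in the result is the minimum V among the inputs that carry it. The names `STATS`, `join`, and `left_deep_cost` are hypothetical.

```python
# Statistics from the example: relation -> (T, {attribute: V}).
STATS = {
    "A": (1000, {"a": 100, "b": 200}),
    "B": (1000, {"b": 100, "c": 500}),
    "C": (1000, {"c": 20, "d": 1000}),
    "D": (1000, {"d": 1000, "a": 50}),
}

def join(left, right):
    """Estimate (T, V-map) of left join right; each side is (T, {attribute: V})."""
    (t_l, v_l), (t_r, v_r) = left, right
    denom = 1
    for attr in v_l.keys() & v_r.keys():
        # Assumption: one max{V, V} factor per shared attribute.
        denom *= max(v_l[attr], v_r[attr])
    merged = dict(v_l)
    for attr, val in v_r.items():
        # Assumption: the result keeps the smaller V for each attribute.
        merged[attr] = min(merged.get(attr, val), val)
    return t_l * t_r // denom, merged

def left_deep_cost(order):
    """Sum the sizes of the intermediate relations of a left-deep join order."""
    current = STATS[order[0]]
    cost = 0
    for i, name in enumerate(order[1:], start=1):
        current = join(current, STATS[name])
        if i < len(order) - 1:  # the final result is not an intermediate relation
            cost += current[0]
    return cost

cost_abcd = left_deep_cost(["A", "B", "C", "D"])  # 5000 + 10000 = 15000
cost_bdac = left_deep_cost(["B", "D", "A", "C"])  # starts with a Cartesian product
```

Under these assumptions the first order reproduces the cost of 15000, while the second already pays 1,000,000 tuples for B ⋈ D.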
Left-deep tree: general algorithm
Input: n relations R_1, R_2, …, R_n
Output: the best left-deep join of R_1, R_2, …, R_n
1. Construct a left-deep tree T of n leaves;
2. For each permutation P of the n relations R_1, R_2, …, R_n do
     assign the n relations to the leaves of T in the order of P;
     evaluate the cost of the plan;
3. Pick the plan with the permutation that gives the minimum cost.
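The three steps can be sketched as an exhaustive search over permutations. As before, this is only an illustration under stated assumptions: multi-attribute joins divide by one max-V factor per shared attribute, joins without a shared attribute are Cartesian products, V in a result is the minimum over the inputs, and the cost of a plan is the total size of its intermediate relations. All function names are hypothetical.

```python
from itertools import permutations

# Statistics from the running example: relation -> (T, {attribute: V}).
STATS = {
    "A": (1000, {"a": 100, "b": 200}),
    "B": (1000, {"b": 100, "c": 500}),
    "C": (1000, {"c": 20, "d": 1000}),
    "D": (1000, {"d": 1000, "a": 50}),
}

def join(left, right):
    """Estimate (T, V-map) of left join right; each side is (T, {attribute: V})."""
    (t_l, v_l), (t_r, v_r) = left, right
    denom = 1
    for attr in v_l.keys() & v_r.keys():
        denom *= max(v_l[attr], v_r[attr])  # assumption: one factor per shared attribute
    merged = dict(v_l)
    for attr, val in v_r.items():
        merged[attr] = min(merged.get(attr, val), val)  # assumption: keep the smaller V
    return t_l * t_r // denom, merged

def left_deep_cost(order):
    """Sum the sizes of the intermediate relations of a left-deep join order."""
    current = STATS[order[0]]
    cost = 0
    for i, name in enumerate(order[1:], start=1):
        current = join(current, STATS[name])
        if i < len(order) - 1:  # skip the final result
            cost += current[0]
    return cost

def best_left_deep(names):
    """Try every left-deep order, keeping one of each pair that differs
    only in the order of the first two relations."""
    candidates = [p for p in permutations(sorted(names)) if p[0] < p[1]]
    best = min(candidates, key=left_deep_cost)
    return best, left_deep_cost(best), len(candidates)

best_order, best_cost, tried = best_left_deep(STATS)
```

For the four example relations this evaluates 12 orders; the search space still grows as n!/2, which is why the exhaustive approach is only practical for a small number of relations.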
Dynamic Programming
Consider all tree structures.
Again consider A ⋈ B ⋈ C ⋈ D.
Five tree structures, (a)-(e): each of (a)-(d) has 12 different assignments, and (e) has 3 different assignments.
In total there are 4 · 12 + 3 = 51 different ways to join the 4 relations.
This is too many when the number of relations is relatively large.
[Figure: the five tree structures (a)-(e)]
Dynamic Programming
Consider:
[Figure: four join trees over the relations A, B, C, D]
We really only need to find the best way to join A ⋈ B ⋈ C, then join D with this best join.
How do we find the best join of A ⋈ B ⋈ C?
We consider all possible ways: (A ⋈ B) ⋈ C, (A ⋈ C) ⋈ B, (B ⋈ C) ⋈ A.
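The possible ways of splitting a set of relations into two groups can be enumerated mechanically. The sketch below uses a hypothetical helper, `two_way_partitions`, that lists each unordered split exactly once; a set of k relations has 2^(k-1) - 1 such splits.

```python
from itertools import combinations

def two_way_partitions(relations):
    """Yield each unordered split of the relations into two non-empty groups."""
    items = sorted(relations)
    first, rest = items[0], items[1:]
    # Keep the first relation in the left group so no split is listed twice.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = (first,) + extra
            right = tuple(x for x in rest if x not in extra)
            yield left, right

splits_abc = list(two_way_partitions("ABC"))
splits_abcd = list(two_way_partitions("ABCD"))
```

For A, B, C this gives the three splits A | BC, AB | C, and AC | B, which correspond to (B ⋈ C) ⋈ A, (A ⋈ B) ⋈ C, and (A ⋈ C) ⋈ B; for four relations it gives seven splits.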
Dynamic programming: general algorithm
Input: n relations R_1, R_2, …, R_n
Output: the best join of R_1, R_2, …, R_n
1. FOR each R_i DO {cost(R_i) = 0; size(R_i) = 0};
2. FOR each pair of R_i and R_j DO {cost(R_i, R_j) = 0; compute size(R_i ⋈ R_j)};
3. FOR k = 3 TO n DO
     FOR any k relations S_1, S_2, …, S_k of R_1, R_2, …, R_n DO
       FOR each partition P = {(S_i1, …, S_ij), (S_i(j+1), …, S_ik)} of S_1, S_2, …, S_k DO
         cost(P) = cost(S_i1, …, S_ij) + size(S_i1 ⋈ … ⋈ S_ij) + cost(S_i(j+1), …, S_ik) + size(S_i(j+1) ⋈ … ⋈ S_ik);
       let cost(S_1, S_2, …, S_k) be the smallest cost(P) among the above partitions;
       compute size(S_1 ⋈ S_2 ⋈ … ⋈ S_k) (and remember this partition P);
4. Return cost(R_1, R_2, …, R_n).
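The algorithm can be sketched as a table indexed by subsets of relations. This is an illustrative sketch and not a definitive implementation: the size estimator `join` assumes one max-V factor per shared attribute, Cartesian products when nothing is shared, and the minimum V in results. The field names `cost`, `counted`, `est`, and `plan` are invented here; `counted` plays the role of size(...) in the cost formula and is 0 for single relations, as in step 1. Pairs are handled by the same loop as larger subsets, which yields cost 0 for them as in step 2.

```python
from itertools import combinations

# Statistics from the running example: relation -> (T, {attribute: V}).
STATS = {
    "A": (1000, {"a": 100, "b": 200}),
    "B": (1000, {"b": 100, "c": 500}),
    "C": (1000, {"c": 20, "d": 1000}),
    "D": (1000, {"d": 1000, "a": 50}),
}

def join(left, right):
    """Estimate (T, V-map) of left join right; each side is (T, {attribute: V})."""
    (t_l, v_l), (t_r, v_r) = left, right
    denom = 1
    for attr in v_l.keys() & v_r.keys():
        denom *= max(v_l[attr], v_r[attr])
    merged = dict(v_l)
    for attr, val in v_r.items():
        merged[attr] = min(merged.get(attr, val), val)
    return t_l * t_r // denom, merged

def best_join(stats):
    """Fill the DP table: subset -> best cost, counted size, estimate, plan."""
    names = sorted(stats)
    table = {frozenset([n]): {"cost": 0, "counted": 0, "est": stats[n], "plan": n}
             for n in names}
    for k in range(2, len(names) + 1):
        for subset in combinations(names, k):
            first, rest = subset[0], subset[1:]
            best = None
            # Enumerate each two-way partition once by fixing `first` on the left.
            for r in range(len(rest)):
                for extra in combinations(rest, r):
                    left = frozenset((first,) + extra)
                    right = frozenset(subset) - left
                    lo, hi = table[left], table[right]
                    cost = lo["cost"] + lo["counted"] + hi["cost"] + hi["counted"]
                    if best is None or cost < best["cost"]:
                        est = join(lo["est"], hi["est"])
                        best = {"cost": cost, "counted": est[0], "est": est,
                                "plan": (lo["plan"], hi["plan"])}
            table[frozenset(subset)] = best
    return table

table = best_join(STATS)
answer = table[frozenset("ABCD")]
```

The table has one entry per non-empty subset, so the work grows exponentially in n but far more slowly than enumerating every join tree, because each subset's best plan is computed once and reused.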
Dynamic Programming: Example
Using the statistics of A, B, C, D given earlier and T(R(X, y) ⋈ S(y, Z)) = T(R)T(S) / max{V(R, y), V(S, y)}:
Single relations:
  A: cost = 0, size = 0
  B: cost = 0, size = 0
  C: cost = 0, size = 0
  D: cost = 0, size = 0
Pairs:
  A, B: cost = 0, size = 5000
  C, D: cost = 0, size = 1000
  B, C: cost = 0, size = 2000
  B, D: cost = 0, size =
  A, D: cost = 0, size =
  A, C: cost = 0, size =
Triples:
  For B, C, D the candidates are (B ⋈ C) ⋈ D, (B ⋈ D) ⋈ C, and (C ⋈ D) ⋈ B.
  (B ⋈ C) ⋈ D costs size(B ⋈ C) = 2000, while (C ⋈ D) ⋈ B costs size(C ⋈ D) = 1000, which is the minimum.
  A, B, C: cost = 2000, size =
  B, C, D: cost = 1000, size = 2000
  A, C, D: cost = 1000, size =
  A, B, D: cost = 5000, size =
All four relations:
  Partitions of {A, B, C, D}: A | {B,C,D}; B | {A,C,D}; C | {A,B,D}; D | {A,B,C}; {A,B} | {C,D}; {A,C} | {B,D}; {A,D} | {B,C}
  The partition A | {B,C,D} gives cost(B, C, D) + size(B ⋈ C ⋈ D) = 1000 + 2000 = 3000, which is the minimum.
  A, B, C, D: cost = 3000
  Best plan: A ⋈ ((C ⋈ D) ⋈ B)
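The last step of the example, choosing among the seven partitions of {A, B, C, D}, can be replayed from the table of costs and sizes. In the sketch below the entries marked "derived" are not shown on the slides; they are computed from the size formula under the assumptions that a join without a shared attribute is a Cartesian product and that V values carry over to join results. The names `TABLE`, `PARTITIONS`, and `partition_cost` are hypothetical.

```python
# (cost, size) for each subset of relations in the example.
TABLE = {
    "A": (0, 0), "B": (0, 0), "C": (0, 0), "D": (0, 0),
    "AB": (0, 5000),
    "CD": (0, 1000),
    "BC": (0, 2000),
    "AD": (0, 10_000),      # size derived: 1000 * 1000 / max(100, 50)
    "AC": (0, 1_000_000),   # size derived: Cartesian product
    "BD": (0, 1_000_000),   # size derived: Cartesian product
    "ABC": (2000, 10_000),  # size derived
    "BCD": (1000, 2000),
    "ACD": (1000, 10_000),  # size derived
    "ABD": (5000, 50_000),  # size derived
}

# The seven two-way partitions of {A, B, C, D}.
PARTITIONS = [
    ("A", "BCD"), ("B", "ACD"), ("C", "ABD"), ("D", "ABC"),
    ("AB", "CD"), ("AC", "BD"), ("AD", "BC"),
]

def partition_cost(left, right):
    """cost(P) = cost(left) + size(left) + cost(right) + size(right)."""
    (cost_l, size_l), (cost_r, size_r) = TABLE[left], TABLE[right]
    return cost_l + size_l + cost_r + size_r

costs = {p: partition_cost(*p) for p in PARTITIONS}
best = min(costs, key=costs.get)
```

With these values the partition A | {B,C,D} is the cheapest at 3000, matching the cost recorded for A, B, C, D.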
LQP Optimization with Size: Summary
  Estimating sizes of intermediate relations
  Considering different orders of an operation
    left-deep tree
    dynamic programming
Construction of Physical Query Plan
Input: an optimized LQP T, and a main memory constraint M
1. Replace each leaf R of T by "scan(R)";
2. Combine the scans with other operations;
3. Replace each internal node v of T by a proper algorithm;
4. For each edge e in T, decide if e should be "materialized";
5. Cut all materialized edges;
6. Each subtree is a call to the subroutine at the root of the subtree. The order of the calls follows the bottom-up order in the structure.
This produces executable code for the input DB program.
[Figure: an LQP tree with operators ×, ∩, π, σ, σ, σ over the relations A-G. The leaves become scan(A), …, scan(G), one scan is combined into an index-scan, the internal nodes are labeled with the algorithms J2P, J1P, CJ, I1P, and the subtrees left after cutting are numbered 1, 2, 3 in call order.]
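Steps 5 and 6 can be sketched as a post-order walk over the plan tree. This is a minimal illustration, not the construction used by any particular system: the `Node` class, the `materialized` flag, and the tree shape below are hypothetical, and the operator labels are borrowed from the figure only as strings.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                   # e.g. "scan(A)" or "J1P"
    children: list = field(default_factory=list)
    materialized: bool = False                # True if the edge to the parent is materialized

def call_order(root):
    """Return the roots of the subtrees left after cutting materialized
    edges, in bottom-up order (a subtree comes after the ones it reads)."""
    order = []

    def visit(node):
        for child in node.children:
            visit(child)
        # A node starts its own subroutine call if it is the root of the
        # whole plan or if its output is materialized for its parent.
        if node.materialized or node is root:
            order.append(node.op)

    visit(root)
    return order

# Hypothetical plan: two materialized joins feeding a top-level join.
plan = Node("J2P", [
    Node("J1P", [Node("scan(A)"), Node("scan(B)")], materialized=True),
    Node("CJ", [Node("scan(C)"), Node("scan(D)")], materialized=True),
])
calls = call_order(plan)
```

Nodes connected by edges that are not materialized stay in the same subtree and are pipelined inside a single call.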
Physical Query Plan: Summary
  Replacing internal nodes of an LQP by proper algorithms;
  Deciding if a subroutine call should be pipelined or materialized;
  Many optimization techniques are involved here;
  In practice, heuristic optimization techniques are used to construct good physical query plans;
  The resulting physical query plan is executable code.
[Figure: DBMS architecture. The database administrator (DDL language) and the database programmer (DML (query) language) interact with the DBMS. Its components are the DDL compiler, DML compiler, query execution engine, index/file manager, buffer manager with main memory buffers, file manager, and transaction manager (concurrency control, lock table, logging & recovery). Data are kept on secondary storage (disks) in tables (relations). Label: graduate database.]