CPSC-608 Database Systems

CPSC-608 Database Systems Fall 2018 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Email: chen@cse.tamu.edu Notes #20

Query Optimization
Input: a database program P. Prepare a collection C of efficient algorithms for the operations in relational algebra.
parser → parse tree
preprocessing (view processing, semantic checking) → parse tree
parse tree-lqp convertor → logic query plan
apply logic laws (push selections, group joins) → logic query plan
reduce the size of intermediate results → logic query plan   [Optimization via logic and size]
lqp-pqp convertor (choices of algorithms, data structures, and computational modes) → physical query plan   [Optimization via algorithms and cost]
physical query plan → machine executable code (take care of issues in optimization and security)

Improving logic plan via relation size
Major steps:
1. Collect size parameters for stored relations: T(R) (the # of tuples), B(R) (the # of blocks), and V(R, A) (the # of different values on attribute A);
2. Set up estimation rules for size parameters on relational algebraic operators;
3. Use logic laws to convert a logic query plan into one that minimizes the (estimated) sizes of intermediate relations.
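As a small illustration of step 1, the sketch below computes T, B, and V for a stored relation held in memory as a list of tuples. The function name `collect_stats` and the fixed `tuples_per_block` blocking factor are assumptions for illustration; in practice a DBMS keeps such statistics in its catalog rather than rescanning the data.

```python
import math


def collect_stats(tuples, attributes, tuples_per_block):
    """Compute T(R), B(R) and V(R, A) for a relation given as a list of rows.

    `attributes` names the columns in order; `tuples_per_block` is an
    assumed fixed number of tuples stored per disk block.
    """
    t = len(tuples)
    # Assumption: a partially filled last block still occupies a whole block.
    b = math.ceil(t / tuples_per_block)
    # V(R, A): number of different values appearing in column A.
    v = {a: len({row[i] for row in tuples}) for i, a in enumerate(attributes)}
    return t, b, v


print(collect_stats([(1, "x"), (2, "x"), (3, "y")], ["a", "b"], 2))
```

Here the three rows fit in two blocks, attribute a has three distinct values and attribute b has two.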

Estimating size parameters (T, B, V): the number of tuples T
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{T(R)/2, Π_A V(R, A)}
γ: T(γ(R)) = min{T(R)/2, Π_{grouping A} V(R, A)}
σA=c: T(σA=c(R)) = T(R)/V(R, A)
σA<c: T(σA<c(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
∪: T(R ∪ S) = T(R) + T(S)/2 (assume S is smaller)
−: T(R − S) = T(R) − T(S)/2
⋈: T(R ⋈ S) = T(R)•T(S)/max{V(R, A), V(S, A)} (A is the shared attribute)
⋈C: T(R ⋈C S) = T(σC(R × S))
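The T rules above translate directly into arithmetic. The following is a minimal sketch with hypothetical helper names (`t_select_eq`, `t_join_one_attr`, and so on); each function takes the already-known statistics as plain numbers, and the union and intersection helpers pick the smaller relation themselves instead of assuming that S is the smaller one.

```python
from math import prod


def t_distinct(t_r, v_attrs):
    # δ(R): min{T(R)/2, product of V(R, A) over all attributes A}
    return min(t_r / 2, prod(v_attrs))


def t_group(t_r, v_grouping):
    # γ(R): same rule, product over the grouping attributes only
    return min(t_r / 2, prod(v_grouping))


def t_select_eq(t_r, v_r_a):
    # σA=c(R): T(R) / V(R, A)
    return t_r / v_r_a


def t_select_lt(t_r):
    # σA<c(R): T(R) / 3
    return t_r / 3


def t_intersect(t_r, t_s):
    # R ∩ S: half of the smaller relation
    return min(t_r, t_s) / 2


def t_union(t_r, t_s):
    # R ∪ S: the larger relation plus half of the smaller one
    return max(t_r, t_s) + min(t_r, t_s) / 2


def t_difference(t_r, t_s):
    # R − S: T(R) − T(S)/2
    return t_r - t_s / 2


def t_join_one_attr(t_r, t_s, v_r_a, v_s_a):
    # R ⋈ S on a single shared attribute A
    return t_r * t_s / max(v_r_a, v_s_a)
```

For instance, `t_join_one_attr(1000, 1000, 200, 100)` evaluates the join of two 1000-tuple relations whose shared attribute has 200 and 100 distinct values.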

Estimating size parameters (T, B, V): the number of blocks B
π, τ, ×: size parameters can be calculated precisely
δ, γ, σA=c, σA<c, ∩, ∪, −, ⋈, ⋈C: first estimate T(W) of the result W with the rules for T, then derive B(W) from it:
B(W) = T(W)/#tuples-per-block
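A one-line sketch of the B rule, applied to a selection whose T was estimated with the rules for T. The name `blocks` is hypothetical, and rounding up to whole blocks is an assumption added here; the slide states the plain ratio.

```python
import math


def blocks(t_w, tuples_per_block):
    """B(W) = T(W) / (# tuples per block), rounded up to whole blocks (assumption)."""
    return math.ceil(t_w / tuples_per_block)


# σA=c(R) with T(R) = 1000 and V(R, A) = 100 gives T = 10 tuples.
t_sel = 1000 / 100
print(blocks(t_sel, 4))  # prints 3
```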

Estimating size parameters (T, B, V): the number of distinct values V
π, τ, ×: size parameters can be calculated precisely
δ: V(δ(R), A) = V(R, A)
γ: V(γ(R), A) = V(R, A) (A is a grouping attribute)
σA=c: V(σA=c(R), B) = V(R, B) for B ≠ A, and V(σA=c(R), A) = 1
σA<c: V(σA<c(R), B) = V(R, B) for B ≠ A, and V(σA<c(R), A) = V(R, A)/3
∩: V(R ∩ S, A) = V(S, A)/2 (assume V(R, A) ≥ V(S, A))
∪: V(R ∪ S, A) = V(R, A) + V(S, A)/2 (assume V(R, A) ≥ V(S, A))
−: V(R − S, A) = V(R, A) − max{V(R, A)/2, V(S, A)/2}
⋈: V(R ⋈ S, A) = min{V(R, A), V(S, A)} (A is a shared attribute)
   V(R ⋈ S, A) = V(R, A) or V(S, A), whichever relation contains A (A is non-shared)
⋈C: V(R ⋈C S, A) = V(σC(R × S), A)
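The V rules for selection and natural join are the ones the join-ordering examples below rely on. This sketch propagates a dictionary of V values through them; the function names and the dictionary representation (attribute name to V) are assumptions for illustration.

```python
def v_after_select_eq(v, attr):
    # σA=c(R): A keeps exactly one value; other attributes are unchanged
    out = dict(v)
    out[attr] = 1
    return out


def v_after_select_lt(v, attr):
    # σA<c(R): A keeps about a third of its values; others unchanged
    out = dict(v)
    out[attr] = v[attr] / 3
    return out


def v_after_join(v_r, v_s):
    # R ⋈ S: min for shared attributes, the owning relation's V otherwise
    out = {}
    for a in v_r.keys() | v_s.keys():
        if a in v_r and a in v_s:
            out[a] = min(v_r[a], v_s[a])
        else:
            out[a] = v_r[a] if a in v_r else v_s[a]
    return out


print(v_after_join({"a": 100, "b": 200}, {"b": 100, "c": 500}))
```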

Logic Plan Improvement for Join via Size
Two techniques:
1. Estimating sizes of intermediate relations. For natural join:
   T(R(X, y) ⋈ S(y, Z)) = T(R)•T(S)/max{V(R, y), V(S, y)}
   also note: V(R(X, y) ⋈ S(y, Z), y) = min{V(R, y), V(S, y)}
   V(R(X, y) ⋈ S(y, Z), z) = V(R, z) or V(S, z) for z ≠ y
2. Considering different orders of an operation, e.g. ((R ⋈ S) ⋈ T) ⋈ U = (R ⋈ U) ⋈ (S ⋈ T)

Consider:
A(a, b): T(A) = 1000, V(A, a) = 100, V(A, b) = 200
B(b, c): T(B) = 1000, V(B, b) = 100, V(B, c) = 500
C(c, d): T(C) = 1000, V(C, c) = 20, V(C, d) = 1000
D(d, a): T(D) = 1000, V(D, d) = 1000, V(D, a) = 50
We want to find a good LQP for A ⋈ B ⋈ C ⋈ D.
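To check the natural-join size rule against this example, the sketch below encodes the four relations' statistics and estimates every pairwise join. The `STATS` dictionary and the `join_size` helper are hypothetical names; when two relations share no attribute, the loop divides by nothing and the estimate is the Cartesian product, and with several shared attributes it divides by one max{...} factor per attribute.

```python
from itertools import combinations

# T(R) and V(R, attribute) for each relation of the example.
STATS = {
    "A": (1000, {"a": 100, "b": 200}),
    "B": (1000, {"b": 100, "c": 500}),
    "C": (1000, {"c": 20, "d": 1000}),
    "D": (1000, {"d": 1000, "a": 50}),
}


def join_size(t_r, v_r, t_s, v_s):
    """T(R ⋈ S) = T(R)•T(S) / product of max{V(R, y), V(S, y)} over shared y."""
    size = t_r * t_s
    for y in v_r.keys() & v_s.keys():
        size /= max(v_r[y], v_s[y])
    return size


for x, y in combinations("ABCD", 2):
    print(x, y, join_size(*STATS[x], *STATS[y]))
```

The pairs A, C and B, D share no attribute, so their estimates are 1000•1000.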

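Left-deep join tree: the right child of every join node is a stored relation, so for A ⋈ B ⋈ C ⋈ D the tree has the shape ((? ⋈ ?) ⋈ ?) ⋈ ?, with its four leaves to be filled by A, B, C, D.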

Left-deep join tree: two example assignments (statistics as above; the cost of a plan is the sum of the sizes of its intermediate results)
((A ⋈ B) ⋈ C) ⋈ D:
   T(A ⋈ B) = 1000•1000/max{200, 100} = 5000, with V(*, c) = 500
   T((A ⋈ B) ⋈ C) = 5000•1000/max{500, 20} = 10000
   cost = 5000 + 10000 = 15000
((B ⋈ D) ⋈ A) ⋈ C:
   B and D share no attribute, so T(B ⋈ D) = 1000•1000 = 1000000, with V(*, a) = 50, V(*, b) = 100
   T((B ⋈ D) ⋈ A) = 1000000•1000/(max{50, 100}•max{100, 200}) = 50000
   cost = 1000000 + 50000 = 1050000
Size rules used:
T(R(X, y)⋈S(y, Z)) = T(R)•T(S)/max{V(R, y), V(S, y)}
T(R(X, y1, y2)⋈S(y1, y2, Z)) = T(R)•T(S)/(max{V(R, y1), V(S, y1)}•max{V(R, y2), V(S, y2)})

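Left-deep join tree (all 4! = 24 permutations), cost of each leaf order:
A B C D: 15000    A B D C: 55000    A C B D: 1010000    A C D B: 1010000    A D B C: 60000      A D C B: 20000
B A C D: *        B A D C: *        B C A D: 12000      B C D A: 4000       B D A C: 1050000    B D C A: 1002000
C A B D: *        C A D B: *        C B A D: *          C B D A: *          C D A B: 11000      C D B A: 3000
D A B C: *        D A C B: *        D B A C: *          D B C A: *          D C A B: *          D C B A: *
(*: same cost as the order with the first two relations swapped, since the first join is the same.)
The cheapest left-deep plan is C D B A (equivalently D C B A), i.e. ((C ⋈ D) ⋈ B) ⋈ A, with cost 3000.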

Left-deep tree: general algorithm
Input: n relations R1, R2, …, Rn
Output: the best left-deep join of R1, R2, …, Rn
1. Construct a left-deep tree T with n leaves;
2. For each permutation P of the n relations R1, R2, …, Rn:
   assign the n relations to the leaves of T in the order of P;
   evaluate the cost of the plan;
3. Pick the plan whose permutation gives the minimum cost.
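The brute-force algorithm above can be sketched in a few lines. This version assumes the same cost convention as the worked example (sum of the sizes of the intermediate results; stored relations and the final result are not counted) and propagates V values with the natural-join rules. `STATS`, `join_stats`, `left_deep_cost`, and `best_left_deep` are hypothetical names introduced here.

```python
from itertools import permutations

# T(R) and V(R, attribute) for each relation of the example.
STATS = {
    "A": (1000, {"a": 100, "b": 200}),
    "B": (1000, {"b": 100, "c": 500}),
    "C": (1000, {"c": 20, "d": 1000}),
    "D": (1000, {"d": 1000, "a": 50}),
}


def join_stats(r, s):
    """Estimate (T, V) of a natural join from the (T, V) of its two inputs."""
    (t_r, v_r), (t_s, v_s) = r, s
    shared = v_r.keys() & v_s.keys()
    t = t_r * t_s
    for y in shared:
        t /= max(v_r[y], v_s[y])
    v = {a: min(v_r[a], v_s[a]) if a in shared else v_r.get(a, v_s.get(a))
         for a in v_r.keys() | v_s.keys()}
    return t, v


def left_deep_cost(order):
    """Cost of ((R1 ⋈ R2) ⋈ R3) ⋈ ...: sum of the intermediate result sizes."""
    result = STATS[order[0]]
    cost = 0
    for i, name in enumerate(order[1:], start=2):
        result = join_stats(result, STATS[name])
        if i < len(order):  # i relations joined so far; skip the final result
            cost += result[0]
    return cost


def best_left_deep(names):
    """Try every permutation of the leaves and keep the first cheapest one."""
    return min(permutations(names), key=left_deep_cost)


best = best_left_deep("ABCD")
print(best, left_deep_cost(best))
```

Enumerating all n! orders is fine for four relations but grows factorially, which motivates the dynamic-programming approach that follows.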

Dynamic Programming
Consider all tree structures. Again consider A ⋈ B ⋈ C ⋈ D. There are five tree structures, (a)-(e) [figure: the five tree shapes for joining four relations]. Each of (a)-(d) has 12 different assignments, and (e) has 3 different assignments, so in total there are 51 different ways to join the 4 relations. The number of choices becomes too large when the number of relations is relatively large.

Dynamic Programming
Consider the join trees in which D is joined last, with a join of A, B, C [figure: join trees with D joined to the result of A ⋈ B ⋈ C]. We really only need to find the best way to join A ⋈ B ⋈ C, then join D with this best join. How do we find the best join of A ⋈ B ⋈ C? We consider all possible ways: (A ⋈ B) ⋈ C, (A ⋈ C) ⋈ B, (B ⋈ C) ⋈ A.

Dynamic programming: general algorithm
Input: n relations R1, R2, …, Rn
Output: the best join of R1, R2, …, Rn
1. FOR each Ri DO {cost(Ri) = 0; size(Ri) = 0};
   (stored relations are not counted in the cost; T(Ri) is still used to estimate join sizes)
2. FOR each pair of Ri and Rj DO {cost(Ri, Rj) = 0; compute size(Ri ⋈ Rj)};
3. FOR k = 3 TO n DO
     FOR any k relations S1, S2, …, Sk of R1, R2, …, Rn DO
       FOR each partition P = {(Si1, …, Sij), (Sij+1, …, Sik)} of S1, S2, …, Sk DO
         cost(P) = cost(Si1, …, Sij) + size(Si1 ⋈ … ⋈ Sij) + cost(Sij+1, …, Sik) + size(Sij+1 ⋈ … ⋈ Sik);
       let cost(S1, S2, …, Sk) be the smallest cost(P) among the above partitions, remember this partition P, and compute size(S1 ⋈ S2 ⋈ … ⋈ Sk);
4. Return cost(R1, R2, …, Rn); the remembered partitions give the best join tree.
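The algorithm above can be sketched as a dynamic program over subsets of relations. This is a minimal version under stated assumptions: the `STATS` encoding and the helper names are hypothetical, sizes use the natural-join rules with V propagation, and each unordered partition of a subset is visited once by forcing the first relation of the subset into the left part. Steps 1 and 2 fall out of the same loop, since every pair is a partition of two singletons with cost 0 and size 0.

```python
from itertools import combinations

# T(R) and V(R, attribute) for each relation of the example.
STATS = {
    "A": (1000, {"a": 100, "b": 200}),
    "B": (1000, {"b": 100, "c": 500}),
    "C": (1000, {"c": 20, "d": 1000}),
    "D": (1000, {"d": 1000, "a": 50}),
}


def join_stats(r, s):
    """Estimate (T, V) of a natural join from the (T, V) of its two inputs."""
    (t_r, v_r), (t_s, v_s) = r, s
    shared = v_r.keys() & v_s.keys()
    t = t_r * t_s
    for y in shared:
        t /= max(v_r[y], v_s[y])
    v = {a: min(v_r[a], v_s[a]) if a in shared else v_r.get(a, v_s.get(a))
         for a in v_r.keys() | v_s.keys()}
    return t, v


def best_join(names):
    """Return (cost, plan) of the cheapest join tree found by subset DP."""
    # table[S] = (cost, size, stats, plan). A stored relation has cost 0 and
    # size 0 (it is not an intermediate result); its stats still hold T(R).
    table = {frozenset([n]): (0, 0, STATS[n], n) for n in names}
    for k in range(2, len(names) + 1):
        for subset in combinations(sorted(names), k):
            s = frozenset(subset)
            best = None
            # Visit each unordered partition {L, S - L} once: L holds subset[0].
            for j in range(1, k):
                for left in combinations(subset, j):
                    if subset[0] not in left:
                        continue
                    l = frozenset(left)
                    cost_l, size_l, st_l, plan_l = table[l]
                    cost_r, size_r, st_r, plan_r = table[s - l]
                    cost = cost_l + size_l + cost_r + size_r
                    if best is None or cost < best[0]:
                        best = (cost, st_l, st_r, f"({plan_l} ⋈ {plan_r})")
            cost, st_l, st_r, plan = best
            # Size is estimated from the remembered (cheapest) partition.
            stats = join_stats(st_l, st_r)
            table[s] = (cost, stats[0], stats, plan)
    cost, _, _, plan = table[frozenset(names)]
    return cost, plan


print(best_join("ABCD"))
```

Unlike the left-deep search, this also considers bushy trees such as (A ⋈ B) ⋈ (C ⋈ D); on this example the cheapest tree it finds still has cost 3000.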

Dynamic Programming: Example, finding the best join of {B, C, D} (relations A, B, C, D as above)
From the pairs: size(B ⋈ C) = 2000, size(B ⋈ D) = 1000000, size(C ⋈ D) = 1000, each with cost 0.
(B ⋈ C) ⋈ D: cost = 2000
(B ⋈ D) ⋈ C: cost = 1000000
(C ⋈ D) ⋈ B: cost = 1000
So cost(B, C, D) = 1000, using the partition {C, D}, {B}, and size(B ⋈ C ⋈ D) = 2000.

Dynamic Programming: Example (relations A, B, C, D as above; T(R(X, y) ⋈ S(y, Z)) = T(R)•T(S)/max{V(R, y), V(S, y)})
Single relations: A, B, C, D: cost = 0, size = 0
Pairs (cost = 0 for each):
   A, B: size = 5000        A, C: size = 1000000     A, D: size = 10000
   B, C: size = 2000        B, D: size = 1000000     C, D: size = 1000
Triples:
   A, B, C: cost = 2000, size = 10000
   A, B, D: cost = 5000, size = 50000
   A, C, D: cost = 1000, size = 10000
   B, C, D: cost = 1000, size = 2000
A, B, C, D, cost of each partition:
   A | {B,C,D}: 3000        B | {A,C,D}: 11000       C | {A,B,D}: 55000       D | {A,B,C}: 12000
   {A,B} | {C,D}: 6000      {A,C} | {B,D}: 2000000   {A,D} | {B,C}: 12000
A, B, C, D: cost = 3000, via A ⋈ (B ⋈ (C ⋈ D)) [figure: the resulting join tree]

Summary: Logic Plan Improvement for Join via Size
1. Estimating sizes of intermediate relations
2. Considering different orders of an operation: left-deep tree, dynamic programming