CPSC-608 Database Systems Fall 2018 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Email: chen@cse.tamu.edu Notes #19
Query Optimization

Input: a database program P.
Preparation: a collection C of efficient algorithms for operations in relational algebra; take care of issues in optimization and security.

Processing pipeline:
  parser → parse tree
  preprocessing (view processing, semantic checking) → parse tree
  parse tree-lqp convertor → logic query plan
  apply logic laws (push selections, group joins) → logic query plan
  reduce the size of intermediate results → logic query plan
  lqp-pqp convertor → physical query plan
  choices of algorithms, data structures, and computational modes → machine executable code

The steps that produce the improved logic query plan are "optimization via logic and size"; the steps that produce the physical query plan and the executable code are "optimization via algorithms and cost".
Improving logic plan via relation size

Major steps:
1. Collect size parameters for stored relations: T(R), B(R), V(R,A) (the number of different values on attribute A).
2. Set up estimation rules for size parameters on relational algebraic operators.
3. Use logic laws to convert a logic query plan into one that minimizes the (estimated) sizes of intermediate relations.

[Figure: two plans over R and S with a δ operator, nodes annotated with estimated sizes; the preferred plan is marked √.]
Estimating size parameters (T,B,V)

π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)

Reasoning for ∩: assume T(R) ≥ T(S). T(R ∩ S) can be as large as T(S) and as small as 0, so on average take T(R ∩ S) = T(S)/2.
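The unary rules above can be sketched in a few lines of Python. This is only an illustration: the function names and the dictionary representation of V(R,A) are assumptions made for the example, and the sample numbers are arbitrary.

```python
from math import prod

def est_distinct(t, v):
    """T(δ(R)) = min{T(R)/2, product of V(R,A) over all attributes A}."""
    return min(t / 2, prod(v.values()))

def est_group(t, v, grouping):
    """T(γ(R)) = min{T(R)/2, product of V(R,A) over the grouping attributes}."""
    return min(t / 2, prod(v[a] for a in grouping))

def est_select_eq(t, v, attr):
    """T(σ_{A=c}(R)) = T(R)/V(R,A)."""
    return t / v[attr]

def est_select_lt(t):
    """T(σ_{A<c}(R)) = T(R)/3."""
    return t / 3

# Hypothetical relation: 5000 tuples, V(R,a) = 50, V(R,b) = 60.
t_r = 5000
v_r = {"a": 50, "b": 60}

print(est_select_eq(t_r, v_r, "a"))   # 5000/50
print(est_distinct(t_r, v_r))         # min(2500, 50*60)
print(est_group(t_r, v_r, ["a"]))     # min(2500, 50)
```

The min in the δ and γ rules keeps the estimate from exceeding the number of distinct tuples (or groups) that the attribute values allow.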
∪: T(R ∪ S) = T(R) + T(S)/2 (assume S is smaller)

Reasoning for ∪: assume T(R) ≥ T(S). T(R ∪ S) can be as large as T(R) + T(S) and as small as T(R), so on average take T(R ∪ S) = T(R) + T(S)/2, the midpoint of the two bounds.
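Both set-operation rules are midpoints between a lower and an upper bound. The sketch below makes that explicit; the helper names are hypothetical, and picking the smaller relation with min/max is an assumption that stands in for "assume S is smaller".

```python
def est_intersection(t_r, t_s):
    """Midpoint of [0, T(smaller)]: T(R ∩ S) = T(smaller)/2."""
    small = min(t_r, t_s)
    return (0 + small) / 2

def est_union(t_r, t_s):
    """Midpoint of [T(larger), T(R)+T(S)]: T(R ∪ S) = T(larger) + T(smaller)/2."""
    large, small = max(t_r, t_s), min(t_r, t_s)
    lower, upper = large, large + small
    return (lower + upper) / 2

# Hypothetical sizes: T(R) = 5000, T(S) = 2000.
print(est_intersection(5000, 2000))  # 2000/2
print(est_union(5000, 2000))         # 5000 + 2000/2
```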
−: T(R − S) = T(R) − T(S)/2

Reasoning for −: T(R − S) can be as large as T(R) and as small as 0, which suggests taking T(R − S) = T(R)/2. But then S has no impact on the estimate. An alternative that takes S into account is T(R − S) = T(R) − max{T(R)/2, T(S)/2}.
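To see how the two candidate rules for the difference behave, the following sketch evaluates both on a few hypothetical sizes. The function names are made up for this illustration, and neither rule is presented here as the preferred one.

```python
def est_difference_half_s(t_r, t_s):
    """T(R − S) = T(R) − T(S)/2."""
    return t_r - t_s / 2

def est_difference_max(t_r, t_s):
    """T(R − S) = T(R) − max{T(R)/2, T(S)/2}."""
    return t_r - max(t_r / 2, t_s / 2)

# Hypothetical sizes; T(R) is fixed at 5000 while T(S) grows.
for t_s in (2000, 5000, 8000):
    print(t_s, est_difference_half_s(5000, t_s), est_difference_max(5000, t_s))
```

When T(S) ≤ T(R) the second rule always returns T(R)/2, so S only matters once it is the larger relation. Both formulas turn negative when T(S) > 2·T(R), so an implementation would probably clamp the result at 0; that clamp is not part of the rules as stated.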
⋈: T(R ⋈ S) = T(R)T(S)/max{V(R,A), V(S,A)} (A is the shared attribute)

Assumption: if V(R,A) > V(S,A), then every A-value in S also appears in R. Thus, for a tuple t in S with A-value a, the value a appears on average T(R)/V(R,A) times in R, so t can be joined with T(R)/V(R,A) tuples in R. Summing over the T(S) tuples of S gives T(R ⋈ S) = T(R)T(S)/V(R,A) = T(R)T(S)/max{V(R,A), V(S,A)}.
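The join rule is exact when its assumptions hold, which a small synthetic check can illustrate. The relations below are made up for the example: every A-value of S appears in R, and values are uniformly distributed in both relations.

```python
from itertools import product

def est_join(t_r, t_s, v_r_a, v_s_a):
    """T(R ⋈ S) = T(R)T(S)/max{V(R,A), V(S,A)} for shared attribute A."""
    return t_r * t_s / max(v_r_a, v_s_a)

# Tuples are (A-value, payload). R has 4 distinct A-values, 3 tuples each;
# S has 2 of those A-values, 5 tuples each (containment holds).
R = [(a, i) for a in range(4) for i in range(3)]
S = [(a, j) for a in range(2) for j in range(5)]

# Brute-force join size: count pairs that agree on A.
actual = sum(1 for r, s in product(R, S) if r[0] == s[0])
estimate = est_join(len(R), len(S), v_r_a=4, v_s_a=2)

print(actual, estimate)
```

With skewed data or A-values of S missing from R, the estimate and the true size would differ.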
⋈_C: T(R ⋈_C S) = T(σ_C(R × S))
Estimating B: assume that the tables are stored in a clustered way. If we know the schemas of relations R and S, we also know the schema of the relation W obtained by applying an operation on R and/or S, and from it how much space a tuple in W takes. Therefore, the value B(W) can be computed from the value T(W):

B(W) = T(W)/#tuples-per-block

This applies to every operator (δ, γ, σ_{A=c}, σ_{A<c}, ∩, ∪, −, ⋈, ⋈_C): each estimate of T yields an estimate of B.
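A minimal sketch of the conversion from T(W) to B(W) follows. The tuple size and block size are hypothetical parameters, and rounding up to a whole block is an assumption added here; the rule as stated is the plain quotient.

```python
from math import ceil

def est_blocks(t_w, tuple_bytes, block_bytes):
    """B(W) = T(W) / #tuples-per-block, rounded up to whole blocks (assumption)."""
    if tuple_bytes > block_bytes:
        raise ValueError("a tuple does not fit in one block")
    # Clustered storage: each block holds as many whole tuples as fit.
    tuples_per_block = block_bytes // tuple_bytes
    return ceil(t_w / tuples_per_block)

# Hypothetical layout: 100-byte tuples in 4096-byte blocks -> 40 tuples per block.
print(est_blocks(1000, tuple_bytes=100, block_bytes=4096))
```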
Estimating V (similar to that for the parameter T):

π, τ, ×: size parameters can be calculated precisely
δ: V(δ(R),A) = V(R,A)
γ: V(γ(R),A) = V(R,A) (A is a grouping attribute)
σ_{A=c}: V(σ_{A=c}(R),B) = V(R,B) for any other attribute B, V(σ_{A=c}(R),A) = 1
σ_{A<c}: V(σ_{A<c}(R),B) = V(R,B) for any other attribute B, V(σ_{A<c}(R),A) = V(R,A)/3
∩: V(R ∩ S,A) = V(S,A)/2 (assume V(R,A) ≥ V(S,A))
∪: V(R ∪ S,A) = V(R,A) + V(S,A)/2 (assume V(R,A) ≥ V(S,A))
−: V(R − S,A) = V(R,A) − max{V(R,A)/2, V(S,A)/2}
⋈: V(R ⋈ S,A) = min{V(R,A), V(S,A)} (A is a shared attribute)
   V(R ⋈ S,A) = max{V(R,A), V(S,A)} (A is non-shared)
⋈_C: V(R ⋈_C S,A) = V(σ_C(R × S),A)

Containment Law: if V(R,A) > V(S,A), then all A-values in S are in R.
Preservation Law: if attribute A is not involved in the operation, then the number of A-values is unchanged.
The formulas for set/bag operations may depend on applications.
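The V rules for selection and natural join can be sketched as operations on a dictionary that maps attribute names to distinct-value counts. The representation and the function names are assumptions for illustration; for a non-shared attribute the sketch simply carries over the count from the relation that has it, which is what the Preservation Law and the max rule both give.

```python
def v_after_select_eq(v, attr):
    """σ_{A=c}: V becomes 1 on A and is unchanged on every other attribute."""
    out = dict(v)
    out[attr] = 1
    return out

def v_after_join(v_r, v_s):
    """⋈: min on shared attributes, carried over unchanged on non-shared ones."""
    out = {}
    for a in v_r.keys() | v_s.keys():
        if a in v_r and a in v_s:
            out[a] = min(v_r[a], v_s[a])   # shared attribute (Containment Law)
        else:
            out[a] = v_r[a] if a in v_r else v_s[a]  # non-shared attribute
    return out

# Hypothetical statistics for R(a,b) and S(b,c).
v_r = {"a": 50, "b": 60}
v_s = {"b": 200, "c": 100}

v_sel = v_after_select_eq(v_r, "a")
print(v_sel)
print(sorted(v_after_join(v_sel, v_s).items()))
```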
Improving logic plan via relation size

R(a,b): T(R) = 5000, V(R,a) = 50, V(R,b) = 60
S(b,c): T(S) = 2000, V(S,b) = 200, V(S,c) = 100

Rules used:
T(σ_{A=c}(R)) = T(R)/V(R,A)
T(δ(S)) = min{ T(S)/2, ∏_A V(S,A) }
T(R ⋈ S) = T(R)T(S)/max{V(R,A), V(S,A)}

Both plans start with σ_{a=10}(R): T = 5000/50 = 100, V(a) = 1, V(b) = 60.

Plan with δ applied after the join, δ(σ_{a=10}(R) ⋈ S):
  σ_{a=10}(R) ⋈ S: T = 100·2000/max{60, 200} = 1000
  Cost = 100 + 1000 = 1100 √

Plan with δ pushed below the join, δ(σ_{a=10}(R)) ⋈ δ(S):
  δ(σ_{a=10}(R)): T = min{100/2, 1·60} = 50
  δ(S): T = min{2000/2, 200·100} = 1000
  Cost = 100 + 50 + 1000 = 1150

The cost of a plan here is the sum of the estimated sizes of its intermediate relations. To be more precise, we may also need to consider the number of blocks.
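The arithmetic of this example can be replayed with a short script. It is a sketch under assumptions: the helper names are invented here, the two plan shapes are read off the size annotations, and the cost is taken to be the sum of the estimated intermediate sizes, which is how the figures 1100 and 1150 add up.

```python
from math import prod

def t_select_eq(t, v, attr):
    """T(σ_{A=c}(R)) = T(R)/V(R,A)."""
    return t / v[attr]

def t_distinct(t, v):
    """T(δ(R)) = min{T(R)/2, product of V(R,A) over all attributes A}."""
    return min(t / 2, prod(v.values()))

def t_join(t_r, v_r, t_s, v_s, attr):
    """T(R ⋈ S) = T(R)T(S)/max{V(R,A), V(S,A)} on shared attribute A."""
    return t_r * t_s / max(v_r[attr], v_s[attr])

t_r, v_r = 5000, {"a": 50, "b": 60}
t_s, v_s = 2000, {"b": 200, "c": 100}

# Common first step: σ_{a=10}(R); V(a) drops to 1, V(b) is preserved.
t_sel = t_select_eq(t_r, v_r, "a")
v_sel = {"a": 1, "b": v_r["b"]}

# Plan with δ after the join: intermediates are the selection and the join.
t_j = t_join(t_sel, v_sel, t_s, v_s, "b")
cost_delta_after = t_sel + t_j

# Plan with δ below the join: intermediates are the selection and both δ results.
t_dr = t_distinct(t_sel, v_sel)
t_ds = t_distinct(t_s, v_s)
cost_delta_below = t_sel + t_dr + t_ds

print(cost_delta_after, cost_delta_below)
```

Under these estimates, pushing δ below the join does not pay off: it adds the 50-tuple δ result and the 1000-tuple δ(S) in exchange for avoiding the 1000-tuple join result.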