CPSC-608 Database Systems

CPSC-608 Database Systems Fall 2018 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Email: chen@cse.tamu.edu Notes #19

Query Optimization

Input: a database program P.
Preparation: a collection C of efficient algorithms for operations in relational algebra.

Pipeline:
1. parser: P → parse tree
2. preprocessing (view processing, semantic checking): parse tree → parse tree
3. parse tree-lqp convertor: parse tree → logic query plan
4. Optimization via logic and size:
   - apply logic laws (push selections, group joins): logic query plan → logic query plan
   - reduce the size of intermediate results: logic query plan → logic query plan
5. Optimization via algorithms and cost:
   - lqp-pqp convertor (choices of algorithms, data structures, and computational modes): logic query plan → physical query plan
6. physical query plan → machine executable code

Slide note: take care of issues in optimization and security.

Improving logic plan via relation size

Major Steps:
1. Collect size parameters for stored relations: T(R) (the # of tuples), B(R) (the # of blocks), V(R,A) (the # of different values on attribute A).
2. Set up estimation rules for size parameters on relational algebraic operators.
3. Use logic laws to convert a logic query plan into one that minimizes the (estimated) sizes of intermediate relations.

[Figure: two candidate query plans over R and S with δ, annotated with estimated sizes; the better plan is marked √]

Estimating size parameters (T,B,V)

π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)

Why T(S)/2 for ∩: assuming T(R) ≥ T(S), T(R ∩ S) can be as large as T(S) and as small as 0. So on average, take T(R ∩ S) = T(S)/2.
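The unary-operator rules on this slide can be sketched in Python as follows. This is a minimal illustration, not part of the course notes: the function names and the dictionary representation of V(R,A) are assumptions made for the example.

```python
from math import prod


def est_distinct(t_r, v):
    """T(delta(R)) = min{T(R)/2, product of V(R,A) over all attributes A}.

    `v` is a hypothetical dict mapping attribute name -> V(R, attribute).
    """
    return min(t_r / 2, prod(v.values()))


def est_group(t_r, v, grouping):
    """T(gamma(R)) = min{T(R)/2, product of V(R,A) over grouping attributes A}."""
    return min(t_r / 2, prod(v[a] for a in grouping))


def est_select_eq(t_r, v_a):
    """T(sigma_{A=c}(R)) = T(R)/V(R,A)."""
    return t_r / v_a


def est_select_lt(t_r):
    """T(sigma_{A<c}(R)) = T(R)/3."""
    return t_r / 3
```

For instance, with T(R) = 5000 and V(R,a) = 50, `est_select_eq(5000, 50)` follows the rule T(R)/V(R,A).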

Estimating size parameters (T,B,V)

∪: T(R ∪ S) = T(R) + T(S)/2 (assume S is smaller)

Why T(R) + T(S)/2: assuming T(R) ≥ T(S), T(R ∪ S) can be as large as T(R) + T(S) and as small as T(R). So on average, take the midpoint T(R ∪ S) = T(R) + T(S)/2.

Estimating size parameters (T,B,V)

−: T(R − S) = T(R) − T(S)/2

T(R − S) can be as large as T(R) and as small as 0. So take T(R − S) = T(R)/2? Then S has no impact on the estimate. So take T(R − S) = T(R) − max{T(R)/2, T(S)/2}?
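A short Python sketch of the set-operation estimates (∩, ∪, −) from these slides. The function names are illustrative assumptions; `est_difference_alt` implements the alternative that the slide raises only as a question, and is included purely for comparison.

```python
def est_intersection(t_r, t_s):
    """T(R ∩ S) = (size of the smaller relation)/2."""
    return min(t_r, t_s) / 2


def est_union(t_r, t_s):
    """T(R ∪ S) = (larger size) + (smaller size)/2."""
    return max(t_r, t_s) + min(t_r, t_s) / 2


def est_difference(t_r, t_s):
    """T(R − S) = T(R) − T(S)/2.

    Note: taken literally, this goes negative when T(S) > 2·T(R).
    """
    return t_r - t_s / 2


def est_difference_alt(t_r, t_s):
    """Alternative posed on the slide: T(R) − max{T(R)/2, T(S)/2}."""
    return t_r - max(t_r / 2, t_s / 2)
```

With T(R) = 5000 and T(S) = 2000, the two difference estimates disagree (T(R) − T(S)/2 versus T(R)/2), which is the tension the slide points out.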

Estimating size parameters (T,B,V)

⋈: T(R ⋈ S) = T(R)T(S)/max{V(R,A),V(S,A)} (A is the shared attribute)

Assumption: if V(R,A) > V(S,A), then every A-value in S appears in R. Thus, for a tuple t in S with A-value a, the value a appears on average T(R)/V(R,A) times in R, so t can be joined with T(R)/V(R,A) tuples in R. Over all T(S) tuples of S this gives T(R)T(S)/V(R,A); the case V(S,A) > V(R,A) is symmetric. Together: T(R ⋈ S) = T(R)T(S)/max{V(R,A),V(S,A)}.
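The join estimate and the counting argument behind it can be written side by side. This is a sketch under the slide's containment assumption; both function names are hypothetical.

```python
def est_natural_join(t_r, t_s, v_r_a, v_s_a):
    """T(R ⋈ S) = T(R)·T(S) / max{V(R,A), V(S,A)} for one shared attribute A."""
    return t_r * t_s / max(v_r_a, v_s_a)


def est_join_by_matching(t_r, t_s, v_r_a, v_s_a):
    """Same estimate, derived per tuple as in the slide's argument.

    If V(R,A) > V(S,A), each tuple of S matches T(R)/V(R,A) tuples of R;
    otherwise each tuple of R matches T(S)/V(S,A) tuples of S.
    """
    if v_r_a > v_s_a:
        return t_s * (t_r / v_r_a)
    return t_r * (t_s / v_s_a)
```

Both functions agree up to floating-point rounding; the second only makes the reasoning step explicit.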

Estimating size parameters (T,B,V)

π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
∪: T(R ∪ S) = T(R) + T(S)/2 (assume S is smaller)
−: T(R − S) = T(R) − T(S)/2
⋈: T(R ⋈ S) = T(R)T(S)/max{V(R,A),V(S,A)}
⋈_C: T(R ⋈_C S) = T(σ_C(R × S))

Estimating size parameters (T,B,V)

Assume that the tables are stored in a clustered way. If we know the schemas of relations R and S, we also know the schema of the relation W obtained by applying an operation on R and/or S, and from it how much space a tuple in W takes. Therefore, the value B(W) can be computed from the value T(W):

B(W) = T(W)/#tuples-per-block
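A small Python sketch of deriving B(W) from T(W). The parameter names, the fixed tuple size, and the rounding up to whole blocks are assumptions added for the example; the slide only states B(W) = T(W)/#tuples-per-block.

```python
import math


def est_blocks(t_w, tuple_bytes, block_bytes):
    """Estimate B(W) from T(W) for clustered storage.

    Assumes fixed-size tuples that do not span block boundaries,
    so #tuples-per-block = floor(block_bytes / tuple_bytes).
    """
    tuples_per_block = block_bytes // tuple_bytes
    if tuples_per_block == 0:
        raise ValueError("a tuple does not fit in one block")
    # Round up: a partially filled last block still occupies a block.
    return math.ceil(t_w / tuples_per_block)
```

For example, with hypothetical 100-byte tuples and 4096-byte blocks, 40 tuples fit in a block, so 1000 tuples need 25 blocks.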

Estimating size parameters (T,B,V)

π, τ, ×: size parameters can be calculated precisely
For the other operators, the estimate of T gives the estimate of B through B(W) = T(W)/#tuples-per-block:
δ: T(δ(R)) ⟹ B(δ(R))
γ: T(γ(R)) ⟹ B(γ(R))
σ_{A=c}: T(σ_{A=c}(R)) ⟹ B(σ_{A=c}(R))
σ_{A<c}: T(σ_{A<c}(R)) ⟹ B(σ_{A<c}(R))
∩: T(R ∩ S) ⟹ B(R ∩ S)
∪: T(R ∪ S) ⟹ B(R ∪ S)
−: T(R − S) ⟹ B(R − S)
⋈: T(R ⋈ S) ⟹ B(R ⋈ S)
⋈_C: T(R ⋈_C S) ⟹ B(R ⋈_C S)

Estimating size parameters (T,B,V)

The estimation of V is similar to that for the parameter T.

π, τ, ×: size parameters can be calculated precisely
δ: V(δ(R),A) = V(R,A)
γ: V(γ(R),A) = V(R,A) (A is a grouping attribute)
σ_{A=c}: V(σ_{A=c}(R),B) = V(R,B), V(σ_{A=c}(R),A) = 1
σ_{A<c}: V(σ_{A<c}(R),B) = V(R,B), V(σ_{A<c}(R),A) = V(R,A)/3
∩: V(R ∩ S,A) = V(S,A)/2 (assume V(R,A) ≥ V(S,A))
∪: V(R ∪ S,A) = V(R,A) + V(S,A)/2 (assume V(R,A) ≥ V(S,A))
−: V(R − S,A) = V(R,A) − max{V(R,A)/2, V(S,A)/2}
⋈: V(R ⋈ S,A) = min{V(R,A),V(S,A)} (A is a shared attribute)
   V(R ⋈ S,A) = max{V(R,A),V(S,A)} (A is non-shared)
⋈_C: V(R ⋈_C S,A) = V(σ_C(R × S),A)

Containment Law: if V(R,A) > V(S,A), then all A-values in S are in R.
Preservation Law: if attribute A is not involved in the operation, then the # of A-values is unchanged.
The formulas for set/bag operations may depend on applications.
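As an illustration, the V rules for equality selection and natural join can be sketched in Python. The dictionary representation (attribute name to number of distinct values) and the function names are assumptions for this example; for a non-shared attribute, the relation that lacks it is treated as contributing 0 to the max.

```python
def v_after_select_eq(v, attr):
    """V after sigma_{attr=c}: the selected attribute has 1 value;
    all other attributes are unchanged (Preservation Law)."""
    out = dict(v)
    out[attr] = 1
    return out


def v_after_join(v_r, v_s):
    """V after R ⋈ S: min for shared attributes, max for non-shared ones."""
    out = {}
    for a in sorted(set(v_r) | set(v_s)):
        if a in v_r and a in v_s:
            out[a] = min(v_r[a], v_s[a])
        else:
            # Non-shared attribute: only one side has it (missing side = 0).
            out[a] = max(v_r.get(a, 0), v_s.get(a, 0))
    return out
```

Chaining the two, `v_after_join(v_after_select_eq({"a": 50, "b": 60}, "a"), {"b": 200, "c": 100})` estimates the distinct-value counts of σ_{a=c}(R) ⋈ S.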

Improving logic plan via relation size

Major Steps:
1. Collect size parameters for stored relations: T(R), B(R), V(R,A) (the # of different values on attribute A).
2. Set up estimation rules for size parameters on relational algebraic operators.
3. Use logic laws to convert a logic query plan into one that minimizes the (estimated) sizes of intermediate relations.

Improving logic plan via relation size

Given:
R(a,b): T(R) = 5000, V(R,a) = 50, V(R,b) = 60
S(b,c): T(S) = 2000, V(S,b) = 200, V(S,c) = 100

Rules used:
T(σ_{A=c}(R)) = T(R)/V(R,A)
T(R ⋈ S) = T(R)T(S)/max{V(R,A),V(S,A)}
T(δ(S)) = min{ T(S)/2, ∏_A V(S,A) }

Plan with δ above the join: δ(σ_{a=10}(R) ⋈ S)
  σ_{a=10}(R): T = 5000/50 = 100, V(a) = 1, V(b) = 60
  σ_{a=10}(R) ⋈ S: T = 100 · 2000/max{60, 200} = 1000
  Cost = 100 + 1000 = 1100 √

Plan with δ below the join: δ(σ_{a=10}(R)) ⋈ δ(S)
  σ_{a=10}(R): T = 100, V(a) = 1, V(b) = 60
  δ(σ_{a=10}(R)): T = min{ 100/2, 1 · 60 } = 50
  δ(S): T = min{ 2000/2, 200 · 100 } = 1000
  Cost = 100 + 50 + 1000 = 1150

The cost of a plan is the total estimated size of its intermediate relations; the stored relations and the final result are not counted. The plan with δ above the join is the better one.

To be more precise, we may also need to consider the # of blocks.
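The comparison of the two plans can be reproduced with a few lines of Python. This is a sketch: the variable names are made up for the example, and the cost is taken to be the sum of the estimated sizes of intermediate relations, which matches the totals 1100 and 1150 shown on the slide.

```python
from math import prod

# Statistics of the stored relations, as given on the slide.
T_R, V_R = 5000, {"a": 50, "b": 60}
T_S, V_S = 2000, {"b": 200, "c": 100}

# Both plans start with sigma_{a=10}(R): T = T(R)/V(R,a), V(a) = 1.
t_sel = T_R / V_R["a"]
v_sel = {"a": 1, "b": V_R["b"]}

# Plan 1: delta above the join. The only other intermediate is the join.
t_join_1 = t_sel * T_S / max(v_sel["b"], V_S["b"])
cost_1 = t_sel + t_join_1

# Plan 2: delta pushed below the join on both inputs.
t_delta_r = min(t_sel / 2, prod(v_sel.values()))
t_delta_s = min(T_S / 2, prod(V_S.values()))
cost_2 = t_sel + t_delta_r + t_delta_s

best = "delta above join" if cost_1 <= cost_2 else "delta below join"
print(cost_1, cost_2, best)
```

Changing the statistics (for example V(R,b) or T(S)) shows how sensitive the choice between the two plans is to the estimates.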