Proactive Re-Optimization Shivnath Babu, Pedro Bizarro, David DeWitt SIGMOD 2005 (presented by Steve Blundy & Oleg Rekutin)
Overview What’s wrong with reactive? Proactive via 3 core techniques Experiments
Reactive Re-optimization select * from R, S where R.a=S.a and R.b>K1 and R.c>K2 [Figure: query plan with a buffer above σ(R), comparing σ(R) actual against σ(R) estimated; labels A, B]
Single-Point Limitation A: B:
Limited Information for Re-opt select * from R, S, T where R.a=S.a and S.b=T.b and R.c>K1 and R.d=K2 [Figure: query plan comparing σ(R) act against σ(R) est]
Choosing a plan 1. Compute bounding boxes 2. Use them to generate robust plans and switchable plans 3. Use randomization to collect statistics
Bounding Boxes “Representing Uncertainty in Statistics” A bounding box gives the upper and lower bounds for each estimated statistic
Bounding Boxes
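The bounding-box idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the integer `uncertainty` level and the `down_step` / `up_step` widths are hypothetical parameters chosen for the example, and the slides do not specify how the bounds are derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    low: float       # lower bound on the statistic
    estimate: float  # the optimizer's single-point estimate
    high: float      # upper bound on the statistic

    def contains(self, actual: float) -> bool:
        """True if an observed value falls inside the box."""
        return self.low <= actual <= self.high


def bounding_box(estimate: float, uncertainty: int,
                 down_step: float = 0.1, up_step: float = 0.2) -> BoundingBox:
    """Build an interval around `estimate` that widens with `uncertainty`.

    The step sizes are assumptions made for illustration only.
    """
    if uncertainty < 0:
        raise ValueError("uncertainty must be non-negative")
    low = estimate * max(0.0, 1.0 - down_step * uncertainty)
    high = estimate * (1.0 + up_step * uncertainty)
    return BoundingBox(low, estimate, high)
```

With `uncertainty=0` the box collapses to the single-point estimate, which is the situation a traditional optimizer assumes. The three values `low`, `estimate`, and `high` are the 3 points at which candidate plans are then compared.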
Optimal Plan 1 Plan is optimal for all 3 points Choice is easy
Robust Plan 1 plan is optimal, or close to optimal, at all 3 points That plan can be safely chosen
Switchable Plan At each point there is a plan with close-to-optimal cost Additional Requirements The decision can be deferred Actual statistics must lie within the bounding box It is possible to switch between the plans
What is a “Switchable” Plan “Any two members of a switchable plan are said to be switchable with each other.”
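The optimal, robust, and switchable cases can be told apart with a small classifier over plan costs at the 3 points of a bounding box. This is a sketch under stated assumptions: the 20% `tolerance` for "close to optimal" is an illustrative value, and `switchable` is a hypothetical, symmetric predicate supplied by the caller, since the slides do not define when two plans can be switched.

```python
from itertools import combinations, product
from typing import Callable, Dict, Tuple

Costs = Dict[str, Tuple[float, float, float]]  # plan -> cost at (low, est, high)


def classify(costs: Costs,
             switchable: Callable[[str, str], bool],
             tolerance: float = 0.2):
    """Return (case, plans) where case is optimal, robust, switchable, or none."""
    points = range(3)
    best = [min(c[i] for c in costs.values()) for i in points]
    # Plans whose cost is within `tolerance` of the best cost at each point.
    near = [[p for p in costs if costs[p][i] <= best[i] * (1.0 + tolerance)]
            for i in points]

    # Case 1: a single plan is the cheapest at every point.
    for plan in costs:
        if all(costs[plan][i] == best[i] for i in points):
            return "optimal", (plan,)
    # Case 2: a single plan is close to the cheapest at every point.
    for plan in costs:
        if all(plan in near[i] for i in points):
            return "robust", (plan,)
    # Case 3: one near-optimal plan per point, all mutually switchable.
    for members in product(*near):
        distinct = sorted(set(members))
        if all(switchable(p, q) for p, q in combinations(distinct, 2)):
            return "switchable", members
    # Case 4: none of the above; only a single-point plan can be chosen.
    return "none", ()
```

For example, with costs `{"NLJ": (10, 50, 200), "HJ": (40, 45, 60)}` no single plan is close to the best at all 3 points, so the result depends entirely on whether `switchable("HJ", "NLJ")` holds.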
Collecting statistics 1. Each operator collects some % of its input in a buffer 2. The eos(f) is emitted & statistics are calculated 3. A plan is chosen from the switchable plan's members, or re-optimization is run 4. Query processing proceeds
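The four steps above can be sketched as follows. This is an illustration only: `sample_selectivity`, `choose_at_runtime`, `cost_at`, and `reoptimize` are hypothetical names, and drawing the sample with `random.sample` stands in for whatever randomization the real operators use.

```python
import random


def sample_selectivity(rows, predicate, fraction, seed=0):
    """Estimate a predicate's selectivity from a random fraction of the rows."""
    if not rows:
        raise ValueError("rows must be non-empty")
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))  # size of the buffered sample
    sample = rng.sample(rows, k)
    return sum(1 for r in sample if predicate(r)) / k


def choose_at_runtime(observed, box_low, box_high, members, cost_at, reoptimize):
    """Pick a switchable-plan member, or fall back to re-optimization.

    cost_at(plan, observed) and reoptimize(observed) are caller-supplied
    callbacks (assumptions for this sketch).
    """
    if box_low <= observed <= box_high:
        # Inside the bounding box: switch to the cheapest member, no re-opt.
        return min(members, key=lambda plan: cost_at(plan, observed))
    # Outside the box: none of the prepared members is known to be good.
    return reoptimize(observed)
```

The design point is that the comparison against the box happens after a small sample has been processed, so the choice among members is deferred until better statistics exist, while the expensive `reoptimize` path is taken only when the observation leaves the box.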
Questions Prevalence of switchable plans vs. case 4 How good is Rio at preventing re-optimizations How is Rio affected by a large # of estimates
Experiments Traditional Optimizer (TRAD) Validity-Ranges Optimizer (VRO)
2-Way Join Queries: Robust σ(A) est
2-Way Join Queries: Switchable σ(A) est σ(A) b. box
3-Way Join Example Shows the use of a Switchable Plan Some re-optimization still necessary
Pt | σ1(A) | TRAD | VRO | Rio | Opt
A | 6 MB | P17a | Inside range, P17a | Outside box, re-optimize, P17a | P17a
B | 80 MB | P17a | Inside range, P17a | Inside box, P17a | P17a
C | 160 MB | P17a | Outside range, re-optimize, P17d | Inside box, P17d | P17b
D | 310 MB | P17a | Outside range, re-optimize, P17d | Outside box, re-optimize, P17b | P17b
Correlation-Based Mistakes
Query Complexity
Conclusion Rio refines statistics and uses switchable plans to forestall re-optimizations and avoid losing partially completed work Questions?