Proactive Re-Optimization Shivnath Babu, Pedro Bizarro, David DeWitt SIGMOD 2005 (presented by Steve Blundy & Oleg Rekutin)


Overview
- What’s wrong with reactive?
- Proactive via 3 core techniques
- Experiments

Reactive Re-optimization
select * from R, S where R.a = S.a and R.b > K1 and R.c > K2
[Figure: plans A and B; labels: σ, buffer, σ(R) actual, σ(R) estimated]

Single-Point Limitation
[Figure: plans A and B]

Limited Information for Re-opt
select * from R, S, T where R.a = S.a and S.b = T.b and R.c > K1 and R.d = K2
[Figure: labels: σ(R) act, σ(R) est]

Choosing a plan
1. Compute bounding boxes
2. Use them to generate robust plans and switchable plans
3. Use randomization to collect statistics

Bounding Boxes
“Representing Uncertainty in Statistics”
A bounding box gives the upper and lower bounds for each estimated statistic.
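
A minimal sketch of the idea, in Python: a bounding box is an interval around a point estimate whose width grows with how uncertain the estimate is. The slides do not give the formula, so the integer `uncertainty` level and the `shrink` and `grow` factors below are illustrative assumptions, as are all the names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Lower bound, point estimate, and upper bound for one statistic."""
    low: float
    estimate: float
    high: float

    def contains(self, value: float) -> bool:
        """True if an observed value falls inside the box."""
        return self.low <= value <= self.high


def bounding_box(estimate: float, uncertainty: int,
                 shrink: float = 0.1, grow: float = 0.2) -> BoundingBox:
    """Widen the interval around `estimate` as `uncertainty` grows.

    `shrink` and `grow` are hypothetical per-level factors, not values
    taken from the slides.
    """
    if uncertainty < 0:
        raise ValueError("uncertainty level must be non-negative")
    low = max(0.0, estimate * (1 - shrink * uncertainty))
    high = estimate * (1 + grow * uncertainty)
    return BoundingBox(low, estimate, high)


if __name__ == "__main__":
    # An estimate of 100 MB with uncertainty level 2.
    print(bounding_box(100.0, 2))
```

An uncertainty level of 0 collapses the box to the single-point estimate, which is the case the reactive approach is limited to.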

Bounding Boxes

Optimal Plan
- One plan is optimal at all 3 points
- The choice is easy

Robust Plan
- One plan is optimal, or close to optimal, at all 3 points
- That plan can be safely chosen
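
One way to picture the robust-plan test, as a hedged Python sketch: given each candidate plan's cost at the 3 points, keep the plans whose cost stays within some tolerance of the best cost at every point. The `tolerance` value, the tie-breaking rule, and the plan names are assumptions made for illustration; the slides do not specify them.

```python
from typing import Dict, Optional, Tuple

# Cost of one plan at the three points considered on the slide.
Costs = Tuple[float, float, float]


def pick_robust_plan(costs: Dict[str, Costs],
                     tolerance: float = 0.2) -> Optional[str]:
    """Return a plan that is within `tolerance` of optimal at every point.

    Returns None when no single plan qualifies. `tolerance` is a
    hypothetical threshold (0.2 means "at most 20% above the best").
    """
    # Best achievable cost at each of the three points.
    best = [min(c[i] for c in costs.values()) for i in range(3)]
    robust = [
        name for name, c in costs.items()
        if all(c[i] <= (1 + tolerance) * best[i] for i in range(3))
    ]
    if not robust:
        return None
    # Assumed tie-break: the cheapest plan at the middle (second) point.
    return min(robust, key=lambda name: costs[name][1])


if __name__ == "__main__":
    candidates = {
        "P1": (10.0, 20.0, 90.0),  # best at two points, poor at the third
        "P2": (11.0, 22.0, 40.0),  # never best, but always close
        "P3": (30.0, 35.0, 38.0),  # best only at the third point
    }
    print(pick_robust_plan(candidates))
```

With a tolerance of 0 the same function only accepts a plan that is optimal at all 3 points, which is the "Optimal Plan" case.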

Switchable Plan
There is a plan with close-to-optimal cost at each point.
Additional requirements:
- The decision can be deferred
- Actual statistics must lie within the bounding box
- It is possible to switch between the plans

What is a “Switchable” Plan “Any two members of a switchable plan are said to be switchable with each other.”

Collecting statistics
1. Each operator collects some percentage in a buffer
2. The eos(f) is emitted and statistics are calculated
3. A plan is chosen from the switchable plan's members, or re-optimization is run
4. Query processing proceeds
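
The decision in step 3 can be sketched as follows, in Python. This is an illustration under stated assumptions rather than the presenters' implementation: the member plans are modeled as hypothetical cost functions of the observed statistic, and the names `decide`, `hash_join`, and `index_nl_join` are invented for the example.

```python
from typing import Callable, Dict, Optional, Tuple


def decide(observed: float,
           box: Tuple[float, float],
           members: Dict[str, Callable[[float], float]]
           ) -> Tuple[str, Optional[str]]:
    """Choose what to do once statistics have been calculated.

    observed: the statistic measured from the buffered input.
    box:      (low, high) bounds of the bounding box.
    members:  switchable plan members, each mapped to an assumed
              cost function of the observed statistic.
    """
    low, high = box
    if not (low <= observed <= high):
        # The actual statistic fell outside the box: fall back to re-optimization.
        return ("re-optimize", None)
    # Inside the box: pick the cheapest member at the observed value.
    best = min(members, key=lambda name: members[name](observed))
    return ("switch", best)


if __name__ == "__main__":
    members = {
        "hash_join": lambda n: 500 + 2 * n,   # hypothetical cost model
        "index_nl_join": lambda n: 10 * n,    # hypothetical cost model
    }
    for observed in (30, 150, 400):
        print(observed, decide(observed, (20, 200), members))
```

The point of the sketch is that re-optimization is only triggered when the observed statistic leaves the bounding box; inside the box the choice is a cheap lookup among plans that were prepared in advance.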

Questions
- Prevalence of switchable plans vs. case 4
- How good is Rio at preventing re-optimizations?
- How is Rio affected by a large number of estimates?

Experiments
- Traditional Optimizer (TRAD)
- Validity-Ranges Optimizer (VRO)

2-Way Join Queries: Robust
[Figure: labels: σ(A) est]

2-Way Join Queries: Switchable
[Figure: labels: σ(A) est, σ(A) b. box]

3-Way Join Example
- Shows the use of a switchable plan
- Some re-optimization is still necessary

Pt  |σ1(A)|  TRAD  VRO                               Rio                             Opt
A   6 MB     P17a  Inside range, P17a                Outside box, re-optimize, P17a  P17a
B   80 MB    P17a  Inside range, P17a                Inside box, P17a                P17a
C   160 MB   P17a  Outside range, re-optimize, P17d  Inside box, P17d                P17b
D   310 MB   P17a  Outside range, re-optimize, P17d  Outside box, re-optimize, P17b  P17b

Correlation-Based Mistakes

Query Complexity

Conclusion
Rio refines statistics and uses switchable plans to forestall re-optimizations and prevent partial data loss.
Questions?