CONTROL group Joe Hellerstein, Ron Avnur, Christian Hidber, Bruce Lo, Chris Olston, Vijayshankar Raman, Tali Roth, Kirk Wylie, UC Berkeley CONTROL: Continuous.

Presentation transcript:

CONTROL group: Joe Hellerstein, Ron Avnur, Christian Hidber, Bruce Lo, Chris Olston, Vijayshankar Raman, Tali Roth, Kirk Wylie, UC Berkeley
CONTROL: Continuous Output and Navigation Technology with Refinement On-Line

Batch vs. On-Line Processing
Batch Processing
  – Gives 100% accurate answers, but users must wait for the entire query to finish.
On-Line Processing
  – Gives progressively refining answers as the query runs.
  – Allows users to control processing.
Applications of On-Line Processing
  – Large, ad-hoc queries in domains where approximate answers are acceptable (the “big picture”).

Demo Outline
On-Line Aggregation
  – Refining estimates
      Statistics give confidence
  – User Control
      The user can speed up the processing of certain groups
      The user can stop the processing at any time
On-Line Visualization
  – Displays an approximation of an image based on data while the data is being fetched
      Shows the estimated density and distribution of the data
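To make "refining estimates" with statistical confidence concrete, here is a minimal sketch of a running average with a confidence interval. It assumes rows arrive in random order and uses a Central Limit Theorem interval; the slides say only that "statistics give confidence", so the choice of interval, the function name `online_avg`, and the synthetic `grades` data are all illustrative assumptions, not the CONTROL implementation.

```python
import math
import random
from statistics import NormalDist

def online_avg(values, confidence=0.95, report_every=1000):
    """Yield (rows_seen, running_mean, half_width) while scanning `values`.

    Assumes `values` is delivered in random order, so each prefix is a
    random sample. The interval is a CLT approximation (an assumption).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        # Welford's update keeps the mean and sum of squared deviations.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            std_err = math.sqrt(m2 / (n - 1) / n)
            yield n, mean, z * std_err

# Hypothetical data standing in for a column such as ENROLL.grade.
random.seed(0)
grades = [random.gauss(3.0, 0.5) for _ in range(10_000)]
random.shuffle(grades)
estimates = list(online_avg(grades))
```

Each yielded tuple is one refresh of the display: the user can stop as soon as the half-width is small enough for their purposes.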

On-Line Agg.: Query Processing
New Access Methods
  – Randomly delivered data
  – Index Striding
      We can take advantage of B-Trees to access the groups
  – Heap Striding
      More generally, on-line permutation
Non-blocking Join Algorithms
  – Ripple Join Family
      RIPL = Rectangles of Increasing Perimeter Length
      Join progressively larger samples of two tables

Access Methods for On-Line Agg.
Heap Stride (On-Line Permutation)
  – Reorder tuples on the fly to get a fair sample
      Heap File: AAABABACDCDAAA...
      Fair Sample Output: ABCDABCDABCD...
Index Stride
  – Round-robin through the groups to get a fair sample
      Works with an index on the grouping column
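The round-robin idea behind index striding can be sketched as follows. This is only an illustration: a dictionary of per-group queues stands in for the B-tree index on the grouping column, all rows are loaded up front rather than fetched lazily, and the name `index_stride` is hypothetical.

```python
from collections import defaultdict, deque

def index_stride(rows, key):
    """Yield rows round-robin across groups so every group is sampled fairly.

    A dict of queues stands in for an index on the grouping column
    (an assumption; a real system would fetch lazily through the index).
    """
    groups = defaultdict(deque)
    for row in rows:
        groups[key(row)].append(row)
    active = deque(sorted(groups))       # groups that still have rows
    while active:
        g = active.popleft()
        yield groups[g].popleft()        # take one row from this group
        if groups[g]:
            active.append(g)             # revisit it on the next round

# The heap-file prefix shown on the slide, one character per tuple.
heap_file = list("AAABABACDCDAAA")
fair = "".join(index_stride(heap_file, key=lambda r: r))
print(fair)  # prints "ABCDABCDAAAAAA"
```

Once the small groups B, C, and D are exhausted, only A remains, which is why the tail of the output is all A's.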

Multi-Table On-Line Aggregation
Progressively refining join: Ripple Join
  – Ever-larger rectangles in R × S
  – Comes in naive, block, and hash flavors
[Figure: traditional join vs. ripple join over R and S]
Benefits:
  – Sample from both relations simultaneously
  – Gives better statistical confidences much faster
  – Intimate relationship between delivery and estimation
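A naive "square" ripple join can be sketched as below: at step k it draws one new tuple from each input and joins each new tuple against everything seen so far, so the explored region of R × S grows from a (k-1) × (k-1) square to a k × k square. The sketch assumes both inputs are already in random order and have equal length, and it scales the match count to estimate a COUNT over the full join; the function name and test data are hypothetical, and the block and hash variants are not shown.

```python
import random

def ripple_join_count(R, S, predicate):
    """Yield (k, estimated_join_count) after each step of a square ripple join.

    Assumes R and S are in random order and of equal length (both are
    simplifying assumptions for this sketch).
    """
    seen_r, seen_s = [], []
    matches = 0
    total_pairs = len(R) * len(S)
    for k, (r, s) in enumerate(zip(R, S), start=1):
        matches += sum(predicate(r, s_old) for s_old in seen_s)  # new r vs. old S
        matches += sum(predicate(r_old, s) for r_old in seen_r)  # old R vs. new s
        matches += predicate(r, s)                               # the new corner
        seen_r.append(r)
        seen_s.append(s)
        # Scale matches in the k x k sample up to the full cross product.
        yield k, matches * total_pairs / (k * k)

random.seed(1)
R = [random.randrange(50) for _ in range(400)]
S = [random.randrange(50) for _ in range(400)]
exact = sum(r == s for r in R for s in S)
estimates = list(ripple_join_count(R, S, lambda r, s: r == s))
```

Because both relations are sampled at the same rate, an estimate is available after every step, and the final step, which covers all of R × S, reproduces the exact count.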

On-Line Aggregation User Interface
[Screenshot labels: User Controls; Graph of Estimates w/Confidence Intervals; Estimates for Each Group]

On-Line Visualization: CLOUDS
CLOUDS displays an approximation of an image based on data while the data is being fetched.
[Figure panels: Conventional Algorithm; CLOUDS Algorithm; CLOUDS (with Index)]
Note that CLOUDS predicts the high density of cities in the Midwest.

Quantifying the benefit of CLOUDS
CLOUDS gives a better approximate image faster than the conventional algorithm.
[Plot: Error vs. Time (seconds), for Conventional and CLOUDS]