Adaptive Sampling: based on a hot-list algorithm by Gibbons and Matias (SIGMOD 1998)

Presentation transcript:

Adaptive Sampling
- Based on a hot-list algorithm by Gibbons and Matias (SIGMOD 1998)
- Sample elements from the input set
  - Frequently occurring elements will be sampled more often
  - Sampling probability is determined at runtime, according to the allowed memory usage
- Tradeoff between overhead and accuracy
- Give an estimate of the sample's accuracy
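The idea of choosing the sampling probability at runtime from a memory budget can be sketched as follows. This is a minimal illustration in the spirit of the Gibbons and Matias approach, not their exact algorithm: the class name `AdaptiveSampler`, the `budget` and `growth` parameters, and the footprint accounting (one unit for a value sampled once, two for a value stored with a count) are assumptions made for the example.

```python
import random
from collections import Counter


class AdaptiveSampler:
    """Hypothetical sketch: sample with probability 1/tau, raising tau when memory runs out."""

    def __init__(self, budget, growth=1.1, seed=None):
        self.budget = budget      # allowed footprint, in storage units (assumed accounting)
        self.growth = growth      # factor by which tau is raised (assumed parameter)
        self.tau = 1.0            # current threshold; sampling probability is 1/tau
        self.counts = Counter()   # sampled value -> number of times it is in the sample
        self.rng = random.Random(seed)

    def footprint(self):
        # A value sampled once costs 1 unit; a (value, count) pair costs 2.
        return sum(1 if c == 1 else 2 for c in self.counts.values())

    def sample_size(self):
        # Number of sample points represented, which may exceed the footprint.
        return sum(self.counts.values())

    def add(self, x):
        # Admit the new element with the current probability 1/tau.
        if self.rng.random() < 1.0 / self.tau:
            self.counts[x] += 1
            while self.footprint() > self.budget:
                self._raise_threshold()

    def _raise_threshold(self):
        # Keep each existing sample point with probability tau/new_tau, so the
        # whole sample looks as if it had been drawn with probability 1/new_tau.
        new_tau = self.tau * self.growth
        keep = self.tau / new_tau
        for value in list(self.counts):
            kept = sum(1 for _ in range(self.counts[value]) if self.rng.random() < keep)
            if kept:
                self.counts[value] = kept
            else:
                del self.counts[value]
        self.tau = new_tau

    def estimate(self, x):
        # Scale the sampled count back up by tau to estimate the true frequency.
        return self.counts[x] * self.tau
```

A smaller `budget` forces a larger `tau`, which lowers overhead but makes `estimate` coarser; that is the overhead versus accuracy tradeoff the slide refers to. Frequent elements survive each threshold increase with higher probability simply because they have more sample points to lose.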

Concise Samples
- Uniform random sampling
- Maintain a pair for each element (in a concise sample, the element's value and its count in the sample)
- The sample size can be much larger than the memory size
- For skewed input sets the gain is much larger
- Sampling is not applied at every block
  - Vitter's reservoir sampling
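One way to read the last bullet is that the sampler does not flip a coin for every input position; instead it draws how many positions to skip before the next sampled one, in the style of the skip counts used in Vitter's reservoir sampling. The sketch below assumes a fixed inclusion probability `p` and uses a geometric skip; the function names `next_skip` and `sample_indices` are hypothetical, and this is not Vitter's reservoir algorithm itself.

```python
import math
import random


def next_skip(p, rng):
    """Draw the number of positions to skip before the next sampled one.

    With inclusion probability p per position, the skip length is geometric:
    P(skip >= k) = (1 - p) ** k.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if p == 1.0:
        return 0
    u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is defined
    return int(math.log(u) / math.log(1.0 - p))


def sample_indices(n, p, rng=None):
    """Yield the positions in range(n) selected with probability p each,
    touching the random generator only once per selected position."""
    rng = rng or random.Random()
    i = next_skip(p, rng)
    while i < n:
        yield i
        i += 1 + next_skip(p, rng)
```

The selected positions have the same distribution as independent per-position coin flips, but the work is proportional to the number of sampled positions rather than to the input length, which is why the sampling code need not run at every block.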
