Adaptive Ordering of Pipelined Stream Filters Babu, Motwani, Munagala, Nishizawa, and Widom SIGMOD 2004 Jun 13-18, 2004 presented by Joshua Lee Mingzhu.

Slides:



Advertisements
Similar presentations
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Advertisements

CS4432: Database Systems II
ARGUS: Rete + DBMS = Efficient Persistent Profile Matching on Large-Volume Data Streams Chun Jin Language Technologies Institute School of Computer Science.
Probabilistic Skyline Operator over Sliding Windows Wenjie Zhang University of New South Wales & NICTA, Australia Joint work: Xuemin Lin, Ying Zhang, Wei.
Efficient Constraint Monitoring Using Adaptive Thresholds Srinivas Kashyap, IBM T. J. Watson Research Center Jeyashankar Ramamirtham, Netcore Solutions.
Fast Bayesian Matching Pursuit Presenter: Changchun Zhang ECE / CMR Tennessee Technological University November 12, 2010 Reading Group (Authors: Philip.
Adaptive Monitoring of Bursty Data Streams Brian Babcock, Shivnath Babu, Mayur Datar, and Rajeev Motwani.
Static Optimization of Conjunctive Queries with Sliding Windows over Infinite Streams Presented by: Andy Mason and Sheng Zhong Ahmed M.Ayad and Jeffrey.
IBM Software Group ® Recommending Materialized Views and Indexes with the IBM DB2 Design Advisor (Automating Physical Database Design) Jarek Gryz.
ESTIMATION AND HYPOTHESIS TESTING
Quicksort Many of the slides are from Prof. Plaisted’s resources at University of North Carolina at Chapel Hill.
SIGMOD 2006University of Alberta1 Approximately Detecting Duplicates for Streaming Data using Stable Bloom Filters Presented by Fan Deng Joint work with.
IntroductionAQP FamiliesComparisonNew IdeasConclusions Adaptive Query Processing in the Looking Glass Shivnath Babu (Stanford Univ.) Pedro Bizarro (Univ.
Adaptive Sampling for Sensor Networks Ankur Jain ٭ and Edward Y. Chang University of California, Santa Barbara DMSN 2004.
Database Systems: A Practical Approach to Design, Implementation and Management International Computer Science S. Carolyn Begg, Thomas Connolly Lecture.
Adaptive Ordering of Pipelined Stream Filters S. Babu, R. Motwani, K. Munagala, I. Nishizawa, and J. Widom In Proc. of SIGMOD 2004, June 2004.
Evaluating Window Joins Over Unbounded Streams By Nishant Mehta and Abhishek Kumar.
Operator Placement for In-Network Stream Query Processing.
Exploiting Correlated Attributes in Acquisitional Query Processing Amol Deshpande University of Maryland Joint work with Carlos Sam
Chain: Operator Scheduling for Memory Minimization in Data Stream Systems Authors: Brian Babcock, Shivnath Babu, Mayur Datar, and Rajeev Motwani (Dept.
An Adaptive Multi-Objective Scheduling Selection Framework For Continuous Query Processing Timothy M. Sutherland Bradford Pielech Yali Zhu Luping Ding.
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
Adaptive Processing in Data Stream Systems Shivnath Babu stanfordstreamdatamanager Stanford University.
Chapter 17 Methodology – Physical Database Design for Relational Databases Transparencies © Pearson Education Limited 1995, 2005.
Query Processing Presented by Aung S. Win.
Data Selection In Ad-Hoc Wireless Sensor Networks Olawoye Oyeyele 11/24/2003.
Distributed Quality-of-Service Routing of Best Constrained Shortest Paths. Abdelhamid MELLOUK, Said HOCEINI, Farid BAGUENINE, Mustapha CHEURFA Computers.
Lecture 9 Methodology – Physical Database Design for Relational Databases.
Approximate Frequency Counts over Data Streams Loo Kin Kong 4 th Oct., 2002.
Chapter 16 Methodology – Physical Database Design for Relational Databases.
Query Optimization Arash Izadpanah. Introduction: What is Query Optimization? Query optimization is the process of selecting the most efficient query-evaluation.
Join Synopses for Approximate Query Answering Swarup Achrya Philip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented by Bhushan Pachpande.
Join Synopses for Approximate Query Answering Swarup Acharya Phillip B. Gibbons Viswanath Poosala Sridhar Ramaswamy.
Researchers: Preet Bola Mike Earnest Kevin Varela-O’Hara Han Zou Advisor: Walter Rusin Data Storage Networks.
Swarup Acharya Phillip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented By Vinay Hoskere.
10/10/2012ISC239 Isabelle Bichindaritz1 Physical Database Design.
Energy-Efficient Monitoring of Extreme Values in Sensor Networks Loo, Kin Kong 10 May, 2007.
Adaptive Query Processing in Data Stream Systems Paper written by Shivnath Babu Kamesh Munagala, Rajeev Motwani, Jennifer Widom stanfordstreamdatamanager.
Weikang Qian. Outline Intersection Pattern and the Problem Motivation Solution 2.
Methodology – Physical Database Design for Relational Databases.
CS6321 Query Optimization Over Web Services Utkarsh Kamesh Jennifer Rajeev Shrivastava Munagala Wisdom Motwani Presented By Ajay Kumar Sarda.
BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data ACM EuroSys 2013 (Best Paper Award)
Eddies: Continuously Adaptive Query Processing Ross Rosemark.
De novo discovery of mutated driver pathways in cancer Discussion leader: Matthew Bernstein Scribe: Kun-Chieh Wang Computational Network Biology BMI 826/Computer.
Adaptivity in continuous query systems Luis A. Sotomayor & Zhiguo Xu Professor Carlo Zaniolo CS240B - Spring 2003.
1 Semantics and Evaluation Techniques for Window Aggregates in Data Streams Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, Peter Tucker This work.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Reestimation Equations Continuous Distributions.
Query Optimization for Stream Databases Presented by: Guillermo Cabrera Fall 2008.
Written By: Presented By: Swarup Acharya,Amr Elkhatib Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy Join Synopses for Approximate Query Answering.
Adaptive Processing in Data Stream Systems Shivnath Babu stanfordstreamdatamanager Stanford University.
ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan Shruti P. Gopinath CSE 6339.
SDN Network Updates Minimum updates within a single switch
Practical Database Design and Tuning
Corelite Architecture: Achieving Rated Weight Fairness
QianZhu, Liang Chen and Gagan Agrawal
A paper on Join Synopses for Approximate Query Answering
Methodology – Physical Database Design for Relational Databases
Chapter 12: Query Processing
A Framework for Automatic Resource and Accuracy Management in A Cloud Environment Smita Vijayakumar.
國立臺北科技大學 課程:資料庫系統 fall Chapter 18
Chapter 6 Normalization of Database Tables
Practical Database Design and Tuning
SAT-Based Optimization with Don’t-Cares Revisited
Smita Vijayakumar Qian Zhu Gagan Agrawal
Brian Babcock, Shivnath Babu, Mayur Datar, and Rajeev Motwani
Approximate Frequency Counts over Data Streams
Greedy Algorithms.
Recommending Materialized Views and Indexes with the IBM DB2 Design Advisor (Automating Physical Database Design) Jarek Gryz.
DATABASE HISTOGRAMS E0 261 Jayant Haritsa
Lu Tang , Qun Huang, Patrick P. C. Lee
Presentation transcript:

Adaptive Ordering of Pipelined Stream Filters Babu, Motwani, Munagala, Nishizawa, and Widom SIGMOD 2004 Jun 13-18, 2004 presented by Joshua Lee Mingzhu Wei Some slides are adapted from the presentation slides in sigmod2004

Presentation Overview Motivation A-GREEDY Algorithm INDEPENDENT Algorithm SWEEP Algorithm LOCALSWAPS Algorithm Adaptive Ordering of Multiway Joins Over Streams Conclusion

Motivation Many queries on stream data involve processing commutative sets of filters Each filter is a stateless predicate that indicates whether a tuple should be further processed A tuple is output only if it satisfies all filters Applying filters in different orders can affect the processing time of the query The changing nature of stream data requires adaptive algorithms for filter ordering

Motivation: Example Consider the following example usage of a stream processing system: –A network traffic monitoring application built on a stream engine is used to monitor the amount of common traffic flowing through four routers, R 1, R 2, R 3, and R 4 over the last ten minutes.

Motivation: Filter Ordering If the filters are dependent, then the order of probing affects processing time For instance, it could be known that most traffic through R 1 comes from R 3 or R 4, but rarely from R 2 For incoming data on R 1, probing W 2 first would drop a tuple quickest, thus saving the time of probing W 3 and W 4

Motivation: Filter Ordering Challenges Filter selectivities may be correlated The candidate filter orderings to consider for N filters is N!, which increases quickly Attributes of stream data and arrival characteristics can change over time

Motivation: Adaptive Filter Ordering Challenges Run-time overhead incurred for continuous monitoring of statistics Convergence properties required to prove an adaptive algorithm generates an approximation to correct answer Speed of adaptivity in the face of changing data characteristics There is a three-way tradeoff: Algorithms that adapt quickly and have good convergence properties incur a lot of run-time overhead

A-GREEDY Algorithm The cost of a query plan ordering is t i is the cost of processing one tuple in filter i d(j|j-1) is the conditional probability that filter j will drop a tuple, given that no filter from 1 to j-1 dropped it

A-GREEDY Algorithm: Static Version and Greedy Invariant Static greedy algorithm 1.Choose the filter with highest unconditional drop probability d(i|0) as the first filter 2.Choose the filter with the highest conditional drop probability d(j|1) as the next filter 3.Choose the filter with highest conditional drop probability d(k|2) as the next filter 4.And so on Greedy Invariant occurs when a filter ordering satisfies the following

A-GREEDY Algorithm: Profiler Two logical components of A-Greedy: –profiler: continuously collects and maintains statistics about filter selectivities and processing costs –Reoptimizer: detects and corrects violations of the GI in the current filter ordering Challenge faced by the profiler: n2 n-1 conditional selectivities for n filters Solution: profile of recently dropped tuples

A-GREEDY Algorithm: Profiler Profile: a sliding window of profile tuples A profile tuple contains n boolean attributes b 1,…,bn corresponding to n filters Dropped tuples are sampled with some probability p If a tuple e is chosen for profiling, it will be tested by all remaining filters A new profile tuple e: bi = 1 if Fi drop e, otherwise bi=0

A-GREEDY Algorithm: Reoptimizer The reoptimizer maintains an ordering O such that O satisfies the GI How does the reoptimizer use profile to derive estimates of conditional selectivities? –It incrementally maintains a view over the profile window

A-GREEDY Algorithm: Reoptimizer Matrix: View over the profile window: n×n V[i, j]: number of tuples in the profile window that were dropped by F f(j) but not dropped by F f (1) F f (2),…, F f (i-1) V[i, j] is proportional to d (j|i-1)

A-GREEDY Algorithm: Run-time Overhead and Adaptivity Profile-tuple creation: needs additional n-I evaluations for a tuple dropped by F f(i). Profile-window maintenance: insertion and deletion of profile tuples, averages of filter processing times Matrix-view update: every update would cause access to up to n 2 /4 entries Detection and correction of GI violations Good convergence properties Fast adaptivity

Can we sacrifice some of A-Greedy’s convergence properties or adaptivity speed to reduce its run-time overhead? Three variants of A greedy are proposed Variants of A Greedy

Proceeds in stages During one stage, only checks for GI violations involving the filter at one specific position j Do not need to maintain entire matrix view: b f (1),…,b f (j) For each profiled tuple, needs to additionally evaluate F f (j) only By rotating j over 2,…,n, eventually detects and corrects all GI violations Same convergence properties as A-Greedy Reduced view and need for additional evaluations less overhead slower adaptivity: only 1 filter is profiled in each stage SWEEP Algorithm

INDEPENDENT Algorithm Assumes filters are independent we only need to maintain estimates of unconditional selectivities Lower view maintenance overhead Fast adaptivity Convergence: if assumption holds, optimal. Otherwise, can be O(n) times worse than GI orderings

LOCALSWAP Algorithm Monitors “local” violations only, i.e., violations involving adjacent filters LocalSwaps detects situations where a swap between adjacent filters Only need to maintain two diagonals of the view For each profiled tuple dropped by F f(i), only needs to additionally evaluate F f(i+1) xx xx xx x

Adaptive Ordering of Multiway Joins Over Streams MJoins maintain an ordering of {S0, S1,…, Sn-1}-{Si} for each stream Si New tuples arriving from Si is joined with other stream windows in that order Two-phase join algorithm –Drop-probing phase: the new tuple is used to probe all other windows in the specified order. If any window drops it, no further processing will be needed for it –output-generation phase: if no window drops the tuple, proceeds as conventional MJoins Drop probing resembles pipelined filters A-Greedy and its variants can be used to determine the orderings

Conclusion A-Greedy has good convergence properties and fast adaptivity, but incurs significant run-time overhead Three variants of A-Greedy are proposed, each lying at a different points along the tradeoff spectrum among convergence, runtime overhead, and speed of adaptivity