Static Optimization of Conjunctive Queries with Sliding Windows over Infinite Streams Presented by: Andy Mason and Sheng Zhong Ahmed M.Ayad and Jeffrey.

Presentation transcript:

Static Optimization of Conjunctive Queries with Sliding Windows over Infinite Streams. Presented by: Andy Mason and Sheng Zhong. Paper by Ahmed M. Ayad and Jeffrey F. Naughton, Database Group, University of Wisconsin. Material is partially referenced from SIGMOD 2004 [1].

Overview. Introduction; Semantics of Sliding Window Continuous Queries; Cost Model; Load Shedding; Optimization Framework; Experiments.

Introduction. The intent of the paper: find an execution plan that minimizes resource usage when resources are sufficient, and find an execution plan that sheds tuples when resources are insufficient. Given a continuous query in a steady state, each execution plan resembles a queuing network system: arriving tuples are the clients and query operators are the servers. An execution plan is feasible if the system is stable. If the plan is infeasible, load shedding is needed.
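The queuing analogy above can be sketched as a utilization check. This is a minimal illustration, not the paper's cost model: the `Operator` fields, the function names, and the single-processor assumption are all hypothetical, and the threshold of 1 follows the "cost > 1" test used later in the deck.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    """One query operator acting as a 'server' (hypothetical structure)."""
    input_rate: float      # tuples per second reaching the operator
    cost_per_tuple: float  # seconds of processing time per tuple


def utilization(operators):
    """Fraction of one processor's time the plan needs per second of input."""
    return sum(op.input_rate * op.cost_per_tuple for op in operators)


def is_feasible(operators):
    """Assumed criterion: the plan keeps up when utilization is at most 1."""
    return utilization(operators) <= 1.0
```

For instance, two operators that need 0.2 seconds of work each per second of input give a utilization of 0.4, so the plan is feasible; doubling a per-tuple cost tenfold would push it past 1 and call for load shedding.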

Feasible and Infeasible Query Plan. [Figure: a plan with utilization < 1 is feasible; a plan with utilization > 1 requires load shedding.]

Assumptions. Time stamps are unique (no ties). Tuples arrive in each stream in monotonically increasing order of their time stamps (no out-of-order arrival). No relational tables are involved in the query. Discussion: why make these assumptions? Static optimization implies that the rates of the input streams change slowly. There is enough memory to hold the buffering requirements of any query plan.

Semantics. Definitions: data stream, time-based window, tuple-based window. Selection: a filter that takes a stream as input and outputs a stream. Join: a symmetric operator that takes two input streams. The cost model.
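The two window types can be sketched as follows. This is an illustrative sketch only: the class names are hypothetical, the time-based window is assumed to cover the interval (t - span, t], and eviction relies on the in-order arrival assumption stated earlier.

```python
from collections import deque


class TupleWindow:
    """Tuple-based window: keeps the most recent `size` tuples."""

    def __init__(self, size):
        self.items = deque(maxlen=size)  # oldest tuple falls out automatically

    def insert(self, timestamp, value):
        self.items.append((timestamp, value))


class TimeWindow:
    """Time-based window: keeps tuples newer than `timestamp - span`."""

    def __init__(self, span):
        self.span = span
        self.items = deque()

    def insert(self, timestamp, value):
        self.items.append((timestamp, value))
        # In-order arrival means expired tuples are always at the front.
        while self.items and self.items[0][0] <= timestamp - self.span:
            self.items.popleft()
```

A join operator would hold one such window per input and probe the opposite window whenever a tuple arrives.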

Variables

Rate and Window Calculations. (1) Select output rate; (2) active window size; (3) output rate of a window join; (4) active window size of a window join; (5) output rate of an n-ary join of n streams; (6) active window size of an n-ary join.
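The slide's formulas did not survive in this transcript. As a rough guide, the following sketch uses commonly cited steady-state estimates for selection and binary window joins; the function names are hypothetical and the expressions are assumptions that may differ from the paper's exact definitions.

```python
def select_output_rate(input_rate, selectivity):
    """Assumed estimate: a filter passes a fixed fraction of its input."""
    return selectivity * input_rate


def join_output_rate_tuple_windows(rate_a, rate_b, window_a, window_b,
                                   selectivity):
    """Assumed estimate for tuple-based windows (sizes in tuples).

    Each arriving A tuple probes window_b stored B tuples, and each
    arriving B tuple probes window_a stored A tuples.
    """
    return selectivity * (rate_a * window_b + rate_b * window_a)


def join_output_rate_time_windows(rate_a, rate_b, span_a, span_b,
                                  selectivity):
    """Assumed estimate for time-based windows (spans in seconds).

    A window of span T over a stream of rate r holds about r * T tuples.
    """
    return selectivity * rate_a * rate_b * (span_a + span_b)
```

The time-based form follows from the tuple-based one by substituting `rate * span` for each window size.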

Cost Model. SELECT A.a, B.b, C.c FROM A [ROWS 10], B [ROWS 10], C [ROWS 10] WHERE A.a = B.a AND B.b = C.b. A concrete example of applying the cost model.

Cost Model Plans

Outcome after Load Shedding

Load Shedding. A form of approximation that reduces load by dropping tuples from the incoming streams. Methods of load shedding: random dropping of tuples (the method presented in this paper), achieved by inserting random drop boxes at several points in the query plan; and semantic dropping of tuples. Goal: maximize the output rate of the approximated query. Problems addressed: the optimal placement of drop boxes in an execution plan and the optimal setting of their sampling rates; and the choice of plan to shed load from.
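A random drop box can be sketched as a Bernoulli sampler over a stream. The function name and the generator-based interface are hypothetical; only the idea of keeping each tuple independently with a given sampling rate comes from the slide.

```python
import random


def drop_box(stream, sampling_rate, rng=None):
    """Yield each tuple of `stream` with probability `sampling_rate`.

    `rng` is an optional random.Random instance, useful for repeatable runs.
    """
    rng = rng or random.Random()
    for tup in stream:
        # random() is in [0, 1), so a rate of 1.0 keeps every tuple
        # and a rate of 0.0 drops every tuple.
        if rng.random() < sampling_rate:
            yield tup
```

Placing such a sampler in front of an operator reduces its expected input rate by the factor `sampling_rate`.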

Selection Only Queries. Initial condition: a query consisting of n consecutive filters, and an execution plan for it that orders the filters in ascending order by a designated number. There are n+1 possible positions for a drop box. Observation: tuples only need to be dropped directly from the streaming source, before they are processed by any of the filters. Conclusion: the plan with the lowest cost yields the highest rate.
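The conclusion can be checked with a small sketch. It assumes, as an illustration, that each filter has a per-tuple cost and a selectivity, that plan cost is total processing time per second of input, and that a single drop box at the source scales the load linearly; the function names are hypothetical.

```python
def filter_chain_cost(source_rate, filters):
    """Return (cost, output_rate) for filters applied in the given order.

    filters: list of (cost_per_tuple_seconds, selectivity) pairs.
    """
    cost, rate = 0.0, source_rate
    for cost_per_tuple, selectivity in filters:
        cost += rate * cost_per_tuple  # work done on tuples reaching this filter
        rate *= selectivity            # tuples surviving to the next filter
    return cost, rate


def shed_at_source(source_rate, filters):
    """Return (sampling_rate, output_rate) with one drop box at the source."""
    cost, out_rate = filter_chain_cost(source_rate, filters)
    # Dropping at the source scales the whole cost linearly, so a rate of
    # 1 / cost brings an overloaded chain exactly to full utilization.
    sampling_rate = min(1.0, 1.0 / cost) if cost > 0 else 1.0
    return sampling_rate, sampling_rate * out_rate
```

Every ordering of the same filters has the same unshed output rate, so under this model the shed output rate is that rate divided by the plan cost, which is why the cheapest ordering gives the highest throughput.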

Join Queries. Only tuple-based windows are considered. Two questions: shedding load from a specific plan, and the choice of plan for load shedding.

Shedding Load from a Specific Plan. Where do we put the drop boxes? Consider a query plan joining n streams with binary joins. A drop box can be put before each of the two inputs to the n - 1 join operators, plus one right after the last join is performed, giving 2n - 1 possible locations. Observation: it is sufficient to drop tuples from the input sources before they are processed by any join operator.

Choice of Load Shedding Plan. Intuition for selection queries: pick the plan with the lowest resource utilization. For join queries, is it also the plan with the lowest resource utilization? This intuition does not always work. Why?

Load Shedding Plan Example. Plans shed load in the order of their average utilization. The switch-over occurs at about 4.5 milliseconds (plan b becomes the best).

Observations from Example. The plan with the lowest utilization is not always the best choice for shedding load. When the join cost is about 14 milliseconds, the throughput of the best plan is more than twice that of the lowest-utilization plan. The lowest-utilization plan could even be the worst choice. Conclusion: load shedding must be integrated into the optimization process.

Optimization Framework. Two areas: throughput of the plan and utilization cost of the plan. Feasible queries: the goal is to minimize the cost of the plan, where throughput is fixed at its maximum value for all feasible queries. Infeasible queries: the goal is to maximize the throughput of the plan, where cost is fixed at its maximum value for all p. Assumption: the search space of alternative plans is always equipped with drop boxes, so all plans in the search space are feasible and the problem can be treated as unconstrained.

Optimization Goal. Maximize R(p) = plan throughput / plan cost. Simplest optimization algorithm: generate the set of all plans of the query; for each plan in the set, compute the cost of the plan, insert drop boxes if cost > 1, and compute R; return the plan that maximizes R(p).
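The exhaustive algorithm above can be sketched as a simple loop. The plan representation and the two callbacks (`evaluate`, `insert_drop_boxes`) are hypothetical placeholders for the cost model and the load shedding algorithm, which the slides do not spell out.

```python
def exhaustive_optimize(plans, evaluate, insert_drop_boxes):
    """Return (best_plan, best_ratio) maximizing R(p) = throughput / cost.

    plans: iterable of candidate plans (any representation).
    evaluate: hypothetical callback, plan -> (cost, throughput), cost > 0.
    insert_drop_boxes: hypothetical callback, plan -> plan with drop boxes
        tuned so that its cost is at most 1.
    """
    best_plan, best_ratio = None, float("-inf")
    for plan in plans:
        cost, throughput = evaluate(plan)
        if cost > 1.0:
            # Infeasible plan: shed load first, then re-evaluate.
            plan = insert_drop_boxes(plan)
            cost, throughput = evaluate(plan)
        ratio = throughput / cost
        if ratio > best_ratio:
            best_plan, best_ratio = plan, ratio
    return best_plan, best_ratio


# Toy usage: a plan is (name, cost, throughput), and shedding scales both
# figures by 1 / cost. Linear scaling is a simplification for illustration;
# throughput of join plans does not generally scale linearly.
def toy_evaluate(plan):
    return plan[1], plan[2]


def toy_shed(plan):
    name, cost, throughput = plan
    return (name, 1.0, throughput / cost)


best, ratio = exhaustive_optimize(
    [("a", 0.5, 10.0), ("b", 2.0, 30.0)], toy_evaluate, toy_shed)
```

In the toy run, plan "a" has R = 20 and plan "b" has R = 15 after shedding, so "a" is returned. Enumerating all plans is exponential in the number of streams, which motivates the heuristic optimizer.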

Heuristic Optimizer. Based on the original System R optimizer. Builds the plan bottom-up by storing the best plans for successively larger subsets of the input streams. Computing the best plan for any subset: test whether this subplan is feasible; if it is infeasible, tune the values of the drop boxes placed at its input streams using the load shedding algorithm.

Computing the best subset plan. Test whether this subplan is feasible. If it is infeasible, tune the values of the drop boxes placed at its input streams using the load shedding algorithm. Store the subplan. At any stage, if a drop box is placed in front of a stream that already has one from a previous round, the two are combined into one drop box whose selectivity is the product of the original two.
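The combining rule for drop boxes can be sketched as follows. Representing drop boxes as a mapping from stream name to sampling rate is an assumption made for illustration, as is the function name.

```python
def combine_drop_boxes(existing, new):
    """Merge two rounds of drop boxes, keyed by input stream name.

    existing, new: dicts mapping stream name -> sampling rate in (0, 1].
    A stream that appears in both rounds gets a single drop box whose
    rate is the product of the two.
    """
    combined = dict(existing)
    for stream, rate in new.items():
        # A stream without an earlier drop box behaves as rate 1.0.
        combined[stream] = combined.get(stream, 1.0) * rate
    return combined
```

For example, a stream sampled at 0.5 in one round and 0.5 again in the next ends up with one drop box at 0.25, which matches applying the two samplers in sequence.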

Experiment Setup. 1000 random continuous queries. Each query represents a join of five input streaming sources: A, B, C, D, E. Window sizes and join selectivities were fixed. Rates were randomly picked from 10 to 1000 tuples/sec.

Need for Reoptimization

Average Gain in Throughput over Using the Lowest Utilization Plan. At very low resources, the gain is very significant (almost eightfold at the 1% mark).

Average and Maximum Gain

Heuristic Optimizer. Except at very low resources, the heuristic optimizer performs well.

Summary. Presented a framework for static optimization of sliding window conjunctive queries over infinite streams: cost model, load shedding, optimization framework, and experimental results. Load shedding must be integrated into the optimization process!

References [1] homepage_files/LITERATURE/SIGMOD04-opt-shed-wisconsin.pdf [2] eek8/Stream_Maryam.pdf