Load Shedding in a Data Stream Manager
Kevin Hoeschele, Anurag Shakti Maskey

Overview
Load shedding in Streams: an example
How Aurora looks at load shedding
The algorithms used by Aurora
Experiments and results

Load Shedding in a DSMS
Systems have a limit on how fast data can be processed.
When the input rate is too high, queues build up waiting for system resources.
Load shedding discards some data so the system can keep up.
This is different from load shedding in networking: data has semantic value in a DSMS, so QoS can be used to find the best stream to drop.

Hospital Network Example
A stream of free doctors' locations.
A stream of untreated patients' locations and their condition (dying, critical, injured, barely injured).
Output: match a patient with doctors within a certain distance.
(Diagram: the Doctors and Patients streams feed a Join whose output is the doctors who can work on a patient.)

Too many patients - what to do?
Load shedding based on condition; the official name is "triage".
The most critical patients get treated first.
A filter is added before the join, with selectivity based on the number of untreated patients.
(Diagram: Patients pass through a Condition Filter before joining with Doctors.)
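A minimal Python sketch of the triage idea, with hypothetical patient and doctor records and illustrative thresholds (none of these names come from the paper): a condition filter in front of the expensive distance join lets fewer severity classes through as the untreated backlog grows.

```python
import math
from dataclasses import dataclass

# Severity ranking: lower number = more urgent (illustrative).
SEVERITY = {"dying": 0, "critical": 1, "injured": 2, "barely injured": 3}

@dataclass
class Patient:
    x: float
    y: float
    condition: str

@dataclass
class Doctor:
    x: float
    y: float

def condition_filter(patients, backlog):
    """Semantic drop: the larger the untreated backlog, the fewer
    severity classes are allowed through to the join."""
    if backlog > 100:
        cutoff = 0          # only 'dying'
    elif backlog > 50:
        cutoff = 1          # 'dying' and 'critical'
    else:
        cutoff = 3          # everyone
    return [p for p in patients if SEVERITY[p.condition] <= cutoff]

def distance_join(patients, doctors, max_dist=5.0):
    """Match each surviving patient with free doctors within max_dist."""
    for p in patients:
        for d in doctors:
            if math.hypot(p.x - d.x, p.y - d.y) <= max_dist:
                yield (p, d)
```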

Aurora Overview
Push-based data from streaming sources.
Three kinds of Quality of Service (QoS) graphs:
Latency: shows how utility drops as answers take longer to arrive.
Value-based: shows which output values are most important.
Loss-tolerance: shows how approximate answers affect a query.

Load Shedding Techniques
Filters (semantic drop): choose what to shed based on QoS. A filter is inserted with a predicate whose selectivity is 1 - p, so the lowest-utility tuples are dropped.
Drops (random drop): eliminate a random fraction of the input; each incoming tuple has a p% chance of being dropped.
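A sketch of the two operator types in Python (the utility function and batch handling are illustrative assumptions, not the paper's implementation): a random drop discards each tuple with probability p, while a semantic drop ranks tuples by their QoS utility and keeps the top fraction 1 - p, so the lowest-utility tuples are the ones shed.

```python
import random

def random_drop(stream, p):
    """Drop operator: each incoming tuple has probability p of being discarded."""
    for t in stream:
        if random.random() >= p:
            yield t

def semantic_drop(tuples, p, utility):
    """Filter operator with selectivity 1 - p over a batch of tuples:
    rank by QoS utility and shed the lowest-utility fraction p."""
    ranked = sorted(tuples, key=utility, reverse=True)
    keep = int(len(ranked) * (1 - p))
    return ranked[:keep]
```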

Three Questions of Load Shedding
When? The load of the system needs constant evaluation.
Where? Dropping as early as possible saves the most resources, but this can be a problem with streams that fan out and are used by multiple queries.
How much? The percentage for a random drop, or the predicate for a semantic drop (filter).

Load Shedding in Aurora
The Aurora catalog holds the QoS graphs, other statistics, and the network description.
The load shedder monitors these along with the input rates, makes load-shedding decisions, and inserts drops/filters into the query network; the resulting changes to the query plans are stored in the catalog.
(Diagram: input streams enter the query network and produce output; the load shedder reads data rates and the network description from the catalog and pushes changes back to the query plans.)

The Load Equation
N = the query network, I = the input streams, C = the processing capacity.
U_accuracy = the utility from the loss-tolerance QoS graph.
H = the headroom factor, the percentage of system resources that can be used at steady state.
If Load(N(I)) > C, then load shedding is needed (why no H?).
The goal is a new network N' based on N where
min { U_accuracy(N(I)) - U_accuracy(N'(I)) }  subject to  Load(N'(I)) < H * C.

Load Shedding Algorithm
Evaluation step: when to shed load?
Load Shedding Road Map (LSRM): where to shed load? How much load to shed?

Load Evaluation: Load Coefficients (L)
The load coefficient of a point in the network is the number of processor cycles required to push a single tuple from that point through the network to the outputs.
For a chain of n operators with costs c_i and selectivities s_i:
L = c_1 + s_1 * c_2 + s_1 * s_2 * c_3 + ... = sum over i of ( c_i * product of s_j for j < i ).

Load Evaluation: Load Coefficient Example
(Diagram: input I feeds operator 1 (c = 10, s = 0.5), which fans out to operator 2 (c = 10, s = 0.8) followed by operator 3 (c = 5, s = 1.0) toward output O1, and to operator 4 (c = 10, s = 0.9) toward output O2.)
L_1 = 10 + (0.5 * 10) + (0.5 * 0.8 * 5) + (0.5 * 10) = 22
L_2 = 10 + (0.8 * 5) = 14
L_3 = 5, L_4 = 10, and L(I) = 22.
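A small Python check of the load-coefficient arithmetic for the example network (costs, selectivities, and topology as read off the slide; the recursive form follows the formula above):

```python
# Each operator: cost c, selectivity s, and its downstream operators.
network = {
    "op1": {"c": 10, "s": 0.5, "out": ["op2", "op4"]},
    "op2": {"c": 10, "s": 0.8, "out": ["op3"]},
    "op3": {"c": 5,  "s": 1.0, "out": []},   # output O1
    "op4": {"c": 10, "s": 0.9, "out": []},   # output O2
}

def load_coefficient(op):
    """Cycles to push one tuple from this operator to the outputs: its own
    cost plus, for each surviving tuple, the downstream coefficients."""
    node = network[op]
    return node["c"] + node["s"] * sum(load_coefficient(o) for o in node["out"])

print(load_coefficient("op3"))  # 5
print(load_coefficient("op2"))  # 10 + 0.8 * 5 = 14
print(load_coefficient("op4"))  # 10
print(load_coefficient("op1"))  # 10 + 0.5 * (14 + 10) = 22, so L(I) = 22
```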

Load Evaluation: Stream Load (S)
The stream load is the load created by the current stream rates:
S = sum over the m input streams of ( L_i * r_i ), where L_i is the load coefficient of input i and r_i is its input rate.

Load Evaluation: Stream Load Example
With the example network above (L(I) = 22) and an input rate r = 10:
S = 22 * 10 = 220.

Load Evaluation: Queue Load (Q)
The queue load is the load due to any queues that have built up since the last load-evaluation step.
MELT_RATE is how fast to shrink the queues (queue-length reduction per unit time).
Q = MELT_RATE * sum over queues of ( L_i * q_i ), where L_i is the load coefficient and q_i the queue length at operator i.

Load Evaluation: Queue Load Example
With a queue of q = 100 tuples at the operator with load coefficient 5 (operator 3) and MELT_RATE = 0.1:
Q = 0.1 * 5 * 100 = 50.

Load Evaluation: Total Load
Total load T = S + Q.
For the example: T = 220 + 50 = 270.

The system is overloaded when T > H * C, where H is the headroom factor and C the processing capacity.
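Continuing the example in Python: stream load, queue load, total load, and the overload test. The rate, queue length, and MELT_RATE are the slides' numbers; the headroom factor H and capacity C are illustrative, since the slides do not give them.

```python
L_input = 22              # L(I), load coefficient of the input point
L_op3 = 5                 # load coefficient of the operator with the queue

# Stream load: sum over input streams of L_i * r_i.
r = 10                    # tuples per time unit
S = L_input * r           # 22 * 10 = 220

# Queue load: MELT_RATE * L_i * q_i for each built-up queue.
MELT_RATE = 0.1
q = 100                   # tuples queued in front of operator 3
Q = MELT_RATE * L_op3 * q # 0.1 * 5 * 100 = 50

# Total load and the overload check T > H * C.
T = S + Q                 # 270
H, C = 0.9, 250           # illustrative headroom factor and capacity
print(T, T > H * C)       # 270 > 225, so load shedding is needed
```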

Load Shedding Algorithm
Evaluation step: when to drop?
Load Shedding Road Map (LSRM): how much to drop? Where to drop?

Load Shedding Road Map (LSRM)
Each LSRM entry is a triple <Cycle Savings Coefficient (CSC), Drop Insertion Plan (DIP), Percent Delivery Cursors (PDC)>:
CSC: how many cycles will be saved.
DIP: the set of drops that will be inserted.
PDC: where on the QoS graphs the system will be running once the DIP is adopted.
Entries run from entry 1 (less load shedding, starting at (0, 0, ..., 0)) to entry n (more load shedding, maximum savings); a cursor marks the current entry.

LSRM Construction
Set the drop locations, then compute and sort the Loss/Gain ratios.
Drop-based LS: take the least ratio, decide how much to drop, insert a Drop, and create an LSRM entry.
Filter-based LS: take the least ratio, decide how much to drop, determine the filter predicate, insert a Filter, and create an LSRM entry.
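A sketch of the construction loop in Python. The <CSC, DIP, PDC> entry structure is from the slides; the loss/gain bookkeeping, the STEP_SIZE handling, and the PDC update are simplified, illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class LSRMEntry:
    csc: float   # Cycle Savings Coefficient: cycles saved so far
    dip: dict    # Drop Insertion Plan: {location: drop percentage}
    pdc: dict    # Percent Delivery Cursors on the QoS graphs

STEP_SIZE = 0.05  # how much the drop percentage grows per step

def build_lsrm(locations, loss, gain, target_savings):
    """Greedy construction: repeatedly shed a little more at the location
    with the smallest loss/gain ratio, recording an LSRM entry each time."""
    lsrm = [LSRMEntry(csc=0.0, dip={}, pdc={})]        # entry 0: no shedding
    dip, saved = {}, 0.0
    while saved < target_savings:
        loc = min(locations, key=lambda l: loss(l) / gain(l))
        dip[loc] = dip.get(loc, 0.0) + STEP_SIZE       # drop a bit more here
        saved += gain(loc) * STEP_SIZE
        lsrm.append(LSRMEntry(csc=saved, dip=dict(dip),
                              pdc={l: 1.0 - x for l, x in dip.items()}))
    return lsrm
```

At run time the load shedder then only has to move the cursor forward (more load shedding) or backward (less load shedding) along these precomputed entries.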

Drop Locations: Single Query
(Diagram: a single query I -> operator 1 (c = 10, s = 0.5) -> operator 2 (c = 10, s = 0.8) -> operator 3 (c = 5, s = 1.0) -> O, with L_1 = 17, L_2 = 14, L_3 = 5 and candidate drop locations A, B, C, D along the query path.)

For a single query, only location A (the earliest point, at the input) needs to be considered: dropping there saves the most resources.

Drop Locations: Shared Query
(Diagram: the shared network from the load-coefficient example, with operator 1 fanning out toward outputs O1 and O2; L_1 = 22, L_2 = 14, L_3 = 5, L_4 = 10 and candidate drop locations A through F along the arcs.)

With sharing, the candidate drop locations reduce to A, B, and C: the shared input and the start of each branch after the fan-out.

Loss/Gain Ratio: Loss
Loss is the utility lost as tuples are dropped, determined from the loss-tolerance QoS graph.
For the first piece of the graph, where utility falls from 1 to 0.7 across 50 percentage points of tuples:
Loss = (1 - 0.7) / 50 = 0.006 utility per percent.
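The 0.006 figure is just the slope of the first linear piece of the loss-tolerance QoS graph. A tiny Python sketch of that calculation (only the first piece is given on the slide; the remaining graph point is illustrative):

```python
# Loss-tolerance QoS graph as (% of tuples shed, utility) points;
# the (100, 0.0) endpoint is illustrative, not from the slide.
qos = [(0, 1.0), (50, 0.7), (100, 0.0)]

def loss_per_percent(p0, p1):
    """Utility lost per percent of tuples shed on one linear piece."""
    (x0, u0), (x1, u1) = p0, p1
    return (u0 - u1) / (x1 - x0)

print(loss_per_percent(qos[0], qos[1]))  # (1 - 0.7) / 50 = 0.006
```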

Loss/Gain Ratio: Gain
Gain G(x) is the number of processor cycles gained by a drop, as a function of:
R = the input rate into the drop operator,
L = the load coefficient at the drop location,
x = the drop percentage,
D = the cost of the drop operator itself.
x is increased in STEP_SIZE increments to evaluate G(x).

Drop-Based Load Shedding: How Much to Drop
Take the location with the least Loss/Gain ratio and determine its drop percentage p.

Drop-Based Load Shedding: Where to Drop
(Diagram: the drop operator is inserted at location A, at the input of the single-query network.)
If there are other drops in the network, modify their drop percentages.

Drop-Based Load Shedding: Make the LSRM Entry
All drop operators with the modified percentages form the DIP.
Compute the CSC.
Advance the QoS cursors and store them in the PDC.
The entry is the triple <CSC, DIP, PDC>.

Filter-Based Load Shedding: How Much to Drop, and the Filter Predicate
Start dropping from the value interval with the lowest utility.
Keep a list of intervals sorted by their utility and relative frequency.
Find out how much to drop and which intervals are needed to achieve it.
Determine the predicate for the filter.

Filter-Based Load Shedding: Place the Filter
(Diagram: the filter is inserted at location A, at the input of the single-query network.)
If there are other filters in the network, modify their selectivities.
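One way the filter predicate could be assembled, sketched in Python under illustrative assumptions (the interval bounds, utilities, frequencies, and the cycles-per-frequency conversion are made up; the paper's bookkeeping is more detailed): walk the interval list from lowest utility upward until enough load is shed, then keep only tuples whose value avoids the dropped intervals.

```python
def build_filter_predicate(intervals, cycles_needed, cycles_per_freq):
    """intervals: (low, high, utility, relative_frequency), sorted by
    utility ascending. Drop lowest-utility intervals until the dropped
    frequency buys enough cycles, then return the filter predicate."""
    dropped, freq_dropped = [], 0.0
    for low, high, utility, freq in intervals:
        if freq_dropped * cycles_per_freq >= cycles_needed:
            break
        dropped.append((low, high))
        freq_dropped += freq

    def predicate(value):
        # Keep the tuple unless its value falls in a dropped interval.
        return not any(low <= value < high for low, high in dropped)

    return predicate

# Illustrative intervals, least useful first.
intervals = [(0, 10, 0.1, 0.2), (10, 20, 0.4, 0.3), (20, 30, 0.9, 0.5)]
keep = build_filter_predicate(intervals, cycles_needed=30, cycles_per_freq=100)
print(keep(5), keep(25))  # False True
```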

Experiment Setup
Simulated network: processing time per tuple is simulated by having the simulator process occupy the CPU for the amount of time an operator needs to consume a tuple.
One process per input stream.
Randomly created networks: the number of queries and the number of operators per query are chosen.
Are random networks a good benchmark?

Experiments
Only the Join, Filter, and Union Aurora operators were used.
Filters were simple comparison predicates of the form input_value > filter_constant.
Filter- and Drop-based load shedding were compared to four admission-control algorithms, which are similar in style to load shedding in networking.

Evaluation Methods
Loss-tolerance and value-based QoS were used.
Tuple utility is the utility from the loss-tolerance QoS, computed over K time segments, where n_i is the number of tuples in time segment i and u_i is the loss-tolerance utility for each tuple during segment i.
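A sketch of that metric in Python, assuming tuple utility is the per-segment tuple count weighted by the segment's loss-tolerance utility and summed over the K segments (the slide's formula image is not reproduced here, so this reading is an assumption):

```python
def tuple_utility(segments):
    """segments: list of (n_i, u_i) pairs, one per time segment:
    n_i = tuples delivered in segment i,
    u_i = loss-tolerance utility of a tuple during segment i.
    Assumed form: sum of n_i * u_i over all K segments."""
    return sum(n * u for n, u in segments)

# Illustrative: three time segments.
print(tuple_utility([(100, 1.0), (80, 0.9), (50, 0.7)]))  # 207.0
```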

Value Utility
Value utility is the utility from the value-based QoS, where:
f_i = the relative frequency of tuples in value interval i with no drops,
f_i' = the frequency relative to the total number of tuples,
U_i = the average value utility for value interval i.
When there are multiple queries, the overall utility is the sum of the utilities of each query.

The Admission-Control Algorithms
Input-Random: one random stream is chosen and tuples are shed until the excess load is covered; if the whole stream is shed and there is still excess load, another random stream is chosen.
Input-Cost-Top: like Input-Random, but starts with the most costly input stream.
Input-Uniform: distributes the load shedding uniformly across the input streams.
Input-Cost-Uniform: load is shed from all input streams, weighted by their cost.
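A Python sketch contrasting the two uniform variants, under one reasonable reading of the descriptions above (the stream loads and the excess figure are illustrative): Input-Uniform gives every input stream an equal share of the shedding, while Input-Cost-Uniform makes each stream's share proportional to the load (cost) it contributes.

```python
def input_uniform(stream_loads, excess):
    """Each input stream sheds an equal share of the excess load.
    stream_loads: {name: load contributed, i.e. L_i * r_i}."""
    share = excess / len(stream_loads)
    return {name: share for name in stream_loads}

def input_cost_uniform(stream_loads, excess):
    """Each stream's share of the shedding is weighted by its cost."""
    total = sum(stream_loads.values())
    return {name: excess * load / total for name, load in stream_loads.items()}

# Illustrative: three streams contributing 220, 100, and 80 cycles of load.
loads = {"s1": 220, "s2": 100, "s3": 80}
print(input_uniform(loads, excess=120))       # 40 cycles each
print(input_cost_uniform(loads, excess=120))  # 66, 30, 24 cycles
```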

Results: Tuple Utility Loss
Observations: the QoS-driven algorithms perform better, and Filter works better than Drop.

Results: Value Utility Loss
Filter-LS is clearly the best; Drop-LS is no better than the admission-control algorithms.

Conclusion
Load shedding is important to a DSMS.
There are many variables to consider when planning to use load shedding.
Drop and Filter are two QoS-driven algorithms.
QoS-based strategies work better than admission control.

Questions
Drop and Filter were the two QoS-driven load-shedding algorithms given here. Are there any others?
Admission control may be a viable option for processing network requests, but in a streaming database system the connection is already made. Would putting the incoming tuples into a buffer, in effect denying the stream bandwidth, increase utility?
Why is RED (Random Early Detection) useful or not useful for streaming databases?

More Questions
When we have a low-bandwidth connection to an unreliable sensor and a significant amount of traffic arrives out of order, is TCP the best transport protocol?
When there is high traffic, to what extent should the network do the load shedding? Should the database system be doing more, since it knows the semantics of the tuples?
So the idea of admission control doesn't directly carry over from networks to streaming databases. But does buffering the input when the processor becomes overloaded achieve the same effect? Why doesn't Aurora do this?