1 Load Shedding in a Data Stream Manager
Slides edited from the original slides of Kevin Hoeschele Anurag Shakti Maskey
2 Overview
Load shedding in streams: an example
How Aurora looks at load shedding
The algorithms used by Aurora
Experiments and results
3 Load Shedding in a DSMS
Systems have a limit on how fast data can be processed
When the input rate is too high, queues build up waiting for system resources
Load shedding discards some data so the system can keep flowing
Different from load shedding in networking
–Data has semantic value in a DSMS
–QoS can be used to find the best stream to drop
4 Hospital - Network
–Stream of free doctors' locations
–Stream of untreated patients' locations and their condition (dying, critical, injured, barely injured)
–Output: match a patient with doctors within a certain distance
[Diagram: the Doctors and Patients streams feed a Join whose output is the doctors who can work on a patient]
5 Too many Patients, what to do?
Load shedding based on condition
–Official name: “Triage”
–Most critical patients get treated first
–Filter added before the Join; its selectivity is based on the number of untreated patients
[Diagram: the Patients stream passes through a Condition Filter before the Join with the Doctors stream; the output is the doctors who can work on a patient]
6 Aurora Overview
Push-based data from streaming sources
3 kinds of Quality of Service (QoS)
–Latency: shows how utility drops as answers take longer to produce
–Value-based: shows which output values are most important
–Loss-tolerance: shows how approximate answers affect a query
7 Load Shedding Techniques
Filters (semantic drop)
–Choose what to shed based on QoS
–Filter with a predicate whose selectivity = 1 - p, where p is the fraction of tuples to drop
–Lowest-utility tuples are dropped
Drops (random drop)
–Eliminate a random fraction of the input
–Each incoming tuple is dropped with probability p
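The two techniques can be contrasted with a minimal Python sketch. It works on an in-memory list rather than a live stream, and the function names, the utility ranking of patient conditions, and the sample tuples are all assumptions made for illustration, not part of Aurora.

```python
import random

def random_drop(tuples, p, rng=None):
    """Random drop: discard each tuple independently with probability p."""
    rng = rng or random.Random()
    return [t for t in tuples if rng.random() >= p]

def semantic_filter(tuples, utility, p):
    """Semantic drop: discard the fraction p of tuples with the lowest utility."""
    n_drop = int(len(tuples) * p)
    # Indices ordered from lowest to highest utility.
    order = sorted(range(len(tuples)), key=lambda i: utility(tuples[i]))
    dropped = set(order[:n_drop])
    # Preserve arrival order for the tuples that survive.
    return [t for i, t in enumerate(tuples) if i not in dropped]

# Hypothetical utility ranking for the hospital example.
CONDITION_UTILITY = {"dying": 4, "critical": 3, "injured": 2, "barely injured": 1}

patients = [("p1", "injured"), ("p2", "dying"),
            ("p3", "barely injured"), ("p4", "critical")]
kept = semantic_filter(patients, lambda t: CONDITION_UTILITY[t[1]], p=0.5)
print(kept)
```

With p = 0.5 the semantic filter keeps the dying and critical patients, while a random drop with the same p would discard about half of the tuples regardless of condition.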
8 3 Questions of Load Shedding
When
–The load of the system needs constant evaluation
Where
–Dropping as early as possible saves the most resources
–Can be a problem with streams that fan out and are used by multiple queries
How much
–The percentage for a random drop
–The predicate for a semantic drop (filter)
9 Load Shedding in Aurora
Aurora Catalog
–Holds QoS and other statistics
–Holds the network description
Load Shedder monitors these and the input rates, and makes load shedding decisions
–Inserts drops/filters into the query network, which are stored in the catalog
[Diagram: Load Shedder, Catalog, and Query Network with input streams and output; arrows labelled "Network description", "Changes to query plans", and "Data rates"]
10 Equation
$N$ = network
$I$ = input streams
$C$ = processing capacity
$U_{accuracy}$ = utility from the loss-tolerance QoS graph
$H$ = headroom factor, the percentage of system resources that can be used at a steady state
If $\mathrm{Load}(N(I)) > C$ then load shedding is needed (why no $H$?)
Goal is to get a new network $N'$ based on $N$ such that:
$\min \{ U_{accuracy}(N(I)) - U_{accuracy}(N'(I)) \}$ and $\mathrm{Load}(N'(I)) < H \times C$
11 Load Shedding Algorithm
Evaluation Step
–When to shed load?
Load Shedding Road Map (LSRM)
–Where to shed load?
–How much load to shed?
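12 Load Evaluation
Load Coefficients (L) [processor cycles / tuple]
–The number of processor cycles required to push a single tuple through the network to the outputs
[Diagram: input I passes through a chain of n operators with costs and selectivities $(c_1, s_1), (c_2, s_2), \ldots, (c_n, s_n)$ to output O]
$L = \sum_{i=1}^{n} \left( \prod_{j=1}^{i-1} s_j \right) c_i$
$n$ = number of operators, $c_i$ = cost of operator $i$, $s_i$ = selectivity of operator $i$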
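13 Load Evaluation: Load Coefficient
[Diagram: input I feeds operator 1 (c = 10, s = 0.5), whose output fans out to operator 2 (c = 10, s = 0.8) followed by operator 3 (c = 5, s = 1.0) on the way to output O1, and to operator 4 (c = 10, s = 0.9) on the way to output O2]
$L_1 = 10 + (0.5 \times 10) + (0.5 \times 0.8 \times 5) + (0.5 \times 10) = 22$
$L_2 = 10 + (0.8 \times 5) = 14$
$L_3 = 5$, $L_4 = 10$, $L(I) = 22$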
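The load coefficients above can be reproduced with a short recursive sketch. The `Operator` class and `load_coefficient` function are hypothetical names introduced here. The recursion (own cost plus selectivity times the downstream coefficients) is one way to extend the per-chain sum of products to a network that fans out, and it matches the slide's arithmetic.

```python
from dataclasses import dataclass, field

@dataclass
class Operator:
    cost: float         # processor cycles per input tuple
    selectivity: float  # output tuples per input tuple
    children: list = field(default_factory=list)  # downstream operators

def load_coefficient(op):
    """Cycles needed to push one tuple entering `op` through to the outputs."""
    downstream = sum(load_coefficient(child) for child in op.children)
    return op.cost + op.selectivity * downstream

# The shared query network from the slide.
op3 = Operator(cost=5, selectivity=1.0)
op2 = Operator(cost=10, selectivity=0.8, children=[op3])
op4 = Operator(cost=10, selectivity=0.9)
op1 = Operator(cost=10, selectivity=0.5, children=[op2, op4])

for name, op in [("L1", op1), ("L2", op2), ("L3", op3), ("L4", op4)]:
    print(name, load_coefficient(op))
```

For these operators the sketch yields 22, 14, 5, and 10 for L1 through L4.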
14 Load Evaluation
Stream Load (S)
–Load created by the current stream rates
$S = \sum_{i=1}^{m} L_i \times r_i$
$m$ = number of input streams, $L_i$ = load coefficient of input stream $i$, $r_i$ = input rate of stream $i$
15 Load Evaluation: Stream Load
$S = 22 \times 10 = 220$
[Diagram: the shared query network from slide 13, with $L(I) = 22$ and input rate $r = 10$]
16 Load Evaluation
Queue Load (Q)
–Load due to any queues that may have built up since the last load evaluation step
$Q = \mathrm{MELT\_RATE} \times L_i \times q_i$
MELT_RATE = how fast to shrink the queues (queue length reduction per unit time), $L_i$ = load coefficient, $q_i$ = queue length
17 Load Evaluation: Queue Load
MELT_RATE = 0.1
$Q = 0.1 \times 5 \times 100 = 50$
[Diagram: the shared query network from slide 13, with input rate $r = 10$ and a queue of length $q = 100$; the factor 5 in the calculation matches $L_3$]
18 Load Evaluation: Total Load
Total Load (T) = S + Q
$T = 220 + 50 = 270$
[Diagram: the shared query network from slide 13, with $r = 10$ and $q = 100$]
19 Load Evaluation
The system is overloaded when
$T > H \times C$
$H$ = headroom factor, $C$ = processing capacity
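The evaluation step can be sketched end to end in a few lines of Python. The function names are hypothetical, summing the queue load over several queues is an assumption (the slides show a single queue), and the headroom factor 0.9 and capacity 250 are made-up values chosen only to demonstrate the overload test.

```python
def stream_load(load_coeffs, rates):
    """S: sum of L_i * r_i over the input streams."""
    return sum(l * r for l, r in zip(load_coeffs, rates))

def queue_load(melt_rate, queues):
    """Q: MELT_RATE * L_i * q_i, summed over (load coefficient, queue length) pairs."""
    return sum(melt_rate * l * q for l, q in queues)

def is_overloaded(total_load, headroom, capacity):
    """The system is overloaded when T > H * C."""
    return total_load > headroom * capacity

S = stream_load([22], [10])       # one input with L(I) = 22 and r = 10
Q = queue_load(0.1, [(5, 100)])   # one queue with L = 5 and q = 100
T = S + Q

# H = 0.9 and C = 250 are illustrative assumptions, not values from the slides.
print(S, Q, T, is_overloaded(T, headroom=0.9, capacity=250))
```

With the slide's numbers this gives S = 220, Q = 50, and T = 270, which exceeds the assumed budget of 0.9 × 250 = 225 cycles.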
20 Load Shedding Algorithm
Evaluation Step
–When to drop?
Load Shedding Road Map (LSRM)
–How much to drop?
–Where to drop?
21 Load Shedding Road Map (LSRM)
Each entry: <Cycle Savings Coefficients (CSC), Drop Insertion Plan (DIP), Percent Delivery Cursors (PDC)>
–CSC: how many cycles will be saved
–DIP: set of drops that will be inserted
–PDC: where the system will be running when the DIP is adopted
[Diagram: the LSRM as a sequence of entries ENTRY 1 ... ENTRY n, each holding a CSC, a DIP, and a PDC, ranging from (0,0,0,…,0) to max savings; a cursor moves toward more load shedding in one direction and less load shedding in the other]
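One way to picture the LSRM is as a list of entries ordered by increasing savings, with a cursor that advances until the excess load is covered. The sketch below assumes that reading. The encoding of each field, the rule that total savings equal the sum of CSC times input rate, and all sample values are assumptions for illustration rather than Aurora's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class LSRMEntry:
    csc: tuple  # cycle savings coefficient per input stream (assumed encoding)
    dip: dict   # drop location -> drop fraction (assumed encoding)
    pdc: tuple  # percent of tuples delivered per output (assumed encoding)

def savings(entry, rates):
    """Assumed rule: cycles saved = sum of CSC_i * r_i over the inputs."""
    return sum(c * r for c, r in zip(entry.csc, rates))

def pick_entry(lsrm, rates, excess):
    """Advance the cursor to the first entry whose savings cover the excess load."""
    for entry in lsrm:
        if savings(entry, rates) >= excess:
            return entry
    return lsrm[-1]  # even maximum shedding does not cover the excess

# Hypothetical road map for a single input stream, least shedding first.
lsrm = [
    LSRMEntry(csc=(0.0,), dip={}, pdc=(100,)),
    LSRMEntry(csc=(2.0,), dip={"A": 0.1}, pdc=(90,)),
    LSRMEntry(csc=(5.0,), dip={"A": 0.25}, pdc=(75,)),
]
chosen = pick_entry(lsrm, rates=[10], excess=45)
print(chosen.dip)
```

Because the entries are precomputed, the run-time decision reduces to moving the cursor, which is presumably why the road map is built ahead of time.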
22 LSRM Construction
[Flowchart: steps common to both approaches, followed by one branch per approach]
Common steps: set drop locations; compute and sort Loss/Gain ratios
Drop-Based LS: how much to drop?; take the least ratio; insert Drop; create LSRM entry
Filter-Based (Semantic) LS: how much to drop?; take the least ratio; determine predicate; insert Filter; create LSRM entry
23 Drop Locations: Single Query
[Diagram: input I passes through operator 1 (c = 10, s = 0.5), operator 2 (c = 10, s = 0.8), and operator 3 (c = 5, s = 1.0) to output O, with $L_1 = 17$, $L_2 = 14$, $L_3 = 5$; candidate drop locations A, B, C, and D are marked]
24 Drop Locations: Single Query
[Diagram: the single-query chain from slide 23, with only drop location A kept]
25 Drop Locations: Shared Query
[Diagram: the shared query network from slide 13 ($L_1 = 22$, $L_2 = 14$, $L_3 = 5$, $L_4 = 10$), with candidate drop locations A through F marked]
26 Drop Locations: Shared Query
[Diagram: the shared query network from slide 25, with only drop locations A, B, and C kept]
27 Loss/Gain Ratio: Loss
Loss: utility loss as tuples are dropped
–Determined using the loss-tolerance QoS graph
[Graph: loss-tolerance QoS, utility (ticks at 0, 0.7, 1) against % tuples (ticks at 100, 50, 0); the first piece runs from utility 1 at 100% to utility 0.7 at 50%]
Loss for the first piece of the graph = (1 - 0.7) / 50 = 0.006
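The loss per piece is the slope of the loss-tolerance graph, which a small sketch can compute for every piece at once. The function name is hypothetical, and the final point (0% of tuples, utility 0) is an assumption read off the axis ticks. Only the first piece is confirmed by the slide's own calculation.

```python
def piece_losses(points):
    """Utility lost per percentage point of tuples dropped, for each linear piece.

    `points` are (percent of tuples delivered, utility) pairs, sorted from
    100% delivered downward.
    """
    return [
        (u_hi - u_lo) / (p_hi - p_lo)
        for (p_hi, u_hi), (p_lo, u_lo) in zip(points, points[1:])
    ]

# (100, 1.0) and (50, 0.7) come from the slide; (0, 0.0) is assumed.
qos_graph = [(100, 1.0), (50, 0.7), (0, 0.0)]
losses = piece_losses(qos_graph)
print(losses)
```

The first value is approximately 0.006, matching the slide. Under the assumed end point, the second piece is steeper, so each further percentage point dropped costs more utility.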
28 Loss/Gain Ratio: Gain
Gain: processor cycles gained
$R$ = input rate into the drop operator
$L$ = load coefficient
$x$ = drop percentage (as a fraction)
$D$ = cost of the drop operator
STEP_SIZE = increment for $x$ used to find $G(x)$ (chosen to guarantee $G(x) > 0$)
$G(x) = R \times (x \times L - D)$ if $x > 0$, and $G(x) = 0$ otherwise
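A small sketch can show why the step size matters. It assumes the gain takes the form G(x) = R × (x × L − D) for x > 0, since the slide's own formula did not survive in this transcript. The function names and the numeric values (R = 10, L = 22, D = 1) are illustrative assumptions.

```python
def gain(x, rate, load_coeff, drop_cost):
    """Cycles saved by dropping fraction x of the tuples (assumed form).

    Every tuple still pays the drop operator's cost D, while a fraction x
    of tuples no longer pays the downstream load coefficient L.
    """
    if x <= 0:
        return 0.0
    return rate * (x * load_coeff - drop_cost)

def break_even_fraction(load_coeff, drop_cost):
    """The gain is positive only for x > D / L under the assumed form."""
    return drop_cost / load_coeff

R, L, D = 10, 22, 1  # illustrative values, not from the slides
print(break_even_fraction(L, D))
print(gain(0.04, R, L, D), gain(0.10, R, L, D))
```

Below the break-even fraction D / L the drop operator costs more than it saves, which is consistent with the slide's remark that STEP_SIZE must guarantee G(x) > 0.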
29 Drop-Based Load Shedding: how much to drop?
Take the least Loss/Gain ratio
Determine the drop percentage p
30 Drop-Based Load Shedding: where to drop?
[Diagram: the single-query chain from slide 23, with a drop operator inserted at location A]
If there are other drops in the network, modify their drop percentages.
31 Drop-Based Load Shedding: make LSRM entry
All drop operators with the modified percentages form the DIP
Compute the CSC
Advance the QoS cursors and store them in the PDC
LSRM Entry: <Cycle Savings Coefficients (CSC), Drop Insertion Plan (DIP), Percent Delivery Cursors (PDC)>
32 Filter-Based (Semantic) Load Shedding: how much to drop? predicate for filter
Start dropping from the interval with the lowest utility.
Keep a sorted list of intervals according to their utility and relative frequency.
Find out how much to drop and which intervals are needed to do so.
Determine the predicate for the filter.
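The interval selection can be sketched as a greedy pass over the value intervals. This is a simplified illustration: it sorts by utility only, drops whole intervals (so it may shed slightly more than requested), and uses hypothetical function names and sample intervals.

```python
def choose_intervals(intervals, drop_fraction):
    """Pick value intervals to drop, lowest utility first.

    `intervals` holds (low, high, utility, relative_frequency) tuples.
    Returns the dropped (low, high) ranges and the fraction actually covered.
    """
    dropped, covered = [], 0.0
    for low, high, _utility, freq in sorted(intervals, key=lambda iv: iv[2]):
        if covered >= drop_fraction:
            break
        dropped.append((low, high))
        covered += freq
    return dropped, covered

def make_predicate(dropped):
    """Filter predicate: keep a value only if it lies outside every dropped range."""
    return lambda v: not any(low <= v < high for low, high in dropped)

# Hypothetical value-based QoS: higher values are worth more.
intervals = [(50, 100, 1.0, 0.5), (0, 25, 0.2, 0.25), (25, 50, 0.6, 0.25)]
dropped, covered = choose_intervals(intervals, drop_fraction=0.5)
keep = make_predicate(dropped)
print(dropped, covered, keep(10), keep(60))
```

Here the two lowest-utility intervals together hold half of the tuples, so the resulting predicate keeps only values of 50 and above.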
33 Filter-Based Load Shedding: place the filter
[Diagram: the single-query chain from slide 23, with a filter inserted at location A]
If there are other filters in the network, modify their selectivities.
34 Experiment setup
Simulated network
–Tuple processing time is simulated by having the simulator process use the CPU for the amount of time an operator needs to consume a tuple
–One process for each input stream
–Randomly created network: the number of queries and the number of operations per query are chosen
Are random networks a good benchmark?
35 Experiments
Used only the Join, Filter, and Union Aurora operators
–Filters were simple comparison predicates of the form: Input_value > filter_constant
Filter and Drop load shedding were compared to 4 Admission Control algorithms
–Similar in style to networking load shedding
36 Evaluation Methods
Loss-tolerance and value-based QoS were used
Tuple Utility is the utility from loss-tolerance QoS
–$k$ = number of time segments
–$n_i$ = number of tuples in time segment $i$
–$u_i$ = loss-tolerance utility for each tuple during time segment $i$
37 Value Utility
Value Utility is the utility from value-based QoS
–$f_i$ = relative frequency of tuples in value interval $i$ with no drops
–$f_i'$ = frequency relative to the total number of tuples
–$U_i$ = average value utility for value interval $i$
When there are multiple queries, Overall Utility is the sum of the utilities for each query
38 Algorithms
Input-Random
–One random stream is chosen, and tuples are shed until the excess load is covered
–If the whole stream is shed and there is still excess load, another random stream is chosen
Input-Cost-Top
–Similar to Input-Random, but uses the most costly input stream
Input-Uniform
–Distributes load shedding uniformly across the input streams
Input-Cost-Uniform
–Load is shed from all input streams, weighted by their cost
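Two of these admission control strategies can be sketched as functions that return a drop fraction per input stream. The sketch makes several assumptions: a stream's "cost" is taken to be its load coefficient, its load is cost times rate, and "uniformly" is read as the same drop fraction on every stream. The function names and sample numbers are hypothetical.

```python
def input_cost_top(costs, rates, excess):
    """Shed from the costliest stream first, moving on only when it is fully shed."""
    fractions = [0.0] * len(costs)
    by_cost = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    for i in by_cost:
        if excess <= 0:
            break
        load = costs[i] * rates[i]
        if load == 0:
            continue  # nothing to save on an idle stream
        shed = min(load, excess)
        fractions[i] = shed / load
        excess -= shed
    return fractions

def input_uniform(costs, rates, excess):
    """Apply the same drop fraction to every input stream."""
    total = sum(c * r for c, r in zip(costs, rates))
    fraction = min(1.0, excess / total) if total > 0 else 0.0
    return [fraction] * len(costs)

costs, rates = [22, 10], [10, 10]  # stream loads of 220 and 100 cycles
print(input_cost_top(costs, rates, excess=270))
print(input_uniform(costs, rates, excess=270))
```

Neither strategy looks at QoS: both decide purely from load, which is the contrast the experiments draw with Drop-LS and Filter-LS.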
39 Results: Tuple Utility Loss
Observations:
–QoS-driven algorithms perform better
–Filter works better than Drop
40 Results: Value Utility Loss
Filter-LS is clearly the best
Drop-LS is no better than the Admission Control algorithms
41 Conclusion
Load shedding is important to a DSMS
There are many variables to consider when planning to use load shedding
Drop and Filter are two QoS-driven algorithms
QoS-based strategies work better than Admission Control
42 Questions
Drop and Filter were the two QoS load shedding algorithms given here. Are there any others?
Admission Control may be a viable option in processing network requests, but in a streaming database system the connection is already made. Would putting the incoming tuples into a buffer, to in effect deny the stream bandwidth, increase utility?
Why are REDs useful or not useful for streaming databases?
43 More Questions
When we have a low-bandwidth connection, such as an unreliable sensor, and a significant amount of traffic is out of order, is TCP the best transport protocol?
When there is high traffic, to what extent should the network do the load shedding? Should the database system be doing more because it knows the semantics of the tuples?
So the idea of Admission Control doesn't directly cross over from networks to streaming databases. But does buffering the input when the process becomes overloaded achieve the same effect? Why doesn't Aurora have this?