Published by Gregory Stafford. Modified over 9 years ago.
1
Providing Resiliency to Load Variations in Distributed Stream Processing
Ying Xing, Jeong-Hyon Hwang, Ugur Cetintemel, Stan Zdonik
Brown University
2
Jeong-Hyon Hwang (jhhwang@cs.brown.edu)
Stream Processing
[Figure labels: Monitoring Apps, Financial Data Streams, Surveillance, Network Monitoring, Click Stream Analysis, Traffic Monitoring, Sensor Network]
3
Distributed Stream Processing
4
Roadmap
 - Problem Statement
 - Linear Load Model
 - Feasible Set
 - The Algorithm
 - Extensions
    - Lower Bound of Input Rates
    - Non-linear Load Model
    - Network Bandwidth / Communication Overhead
 - Experimental Results
 - Related Work
 - Conclusions
5
Problem Statement
Goal: Find an operator distribution with the largest feasible set size.
[Figure: an operator distribution and its feasible set in the input rate space (axes r1, r2), with feasible and infeasible regions labeled]
6
Linear Load Model
 - r_j: input rate of input j (tuples/sec)
 - c_k: processing cost of operator o_k (CPU cycles/tuple)
 - l(o_k): processing load of operator o_k (CPU cycles/sec)
 - s_k: selectivity of operator o_k (number of output tuples / number of input tuples)
[Figure: query graph with operators o1, o2, o3, o4]
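As a minimal sketch of these definitions, the snippet below computes per-operator loads for a four-operator query graph. The topology (o1 feeding o2 on input r1, o3 feeding o4 on input r2), the function name, and all numeric values are assumptions chosen for illustration, not values from the slides.

```python
# Linear load model sketch. Topology and numbers are illustrative assumptions.

def operator_loads(rates, costs, selectivities, chains):
    """Return {operator: load in CPU cycles/sec}.

    rates:         {input_name: tuples/sec}
    costs:         {operator: CPU cycles/tuple}
    selectivities: {operator: output tuples / input tuples}
    chains:        {input_name: [operators in pipeline order]}
    """
    loads = {}
    for input_name, ops in chains.items():
        rate = rates[input_name]          # tuples/sec arriving at the first operator
        for op in ops:
            loads[op] = costs[op] * rate  # l(o_k) = c_k * (input rate of o_k)
            rate *= selectivities[op]     # rate seen by the next operator
    return loads


chains = {"r1": ["o1", "o2"], "r2": ["o3", "o4"]}
costs = {"o1": 10, "o2": 20, "o3": 5, "o4": 8}
selectivities = {"o1": 0.5, "o2": 1.0, "o3": 2.0, "o4": 1.0}
loads = operator_loads({"r1": 100, "r2": 50}, costs, selectivities, chains)
print(loads)
```

In this sketch every operator's load is a constant multiple of exactly one system input rate, which is what makes the load of a node a linear function of (r1, r2).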
7
Example Feasible Sets
[Figure: three distributions of operators o1, o2, o3, o4, each shown with its feasible set in the (r1, r2) plane]
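To make the comparison concrete, here is a rough sketch that tests whether a rate point is feasible for a given placement and estimates the feasible set size on a grid. The load coefficients (CPU cycles per tuple of system input), the capacities, and the two placements are assumed values, not the ones pictured on the slide.

```python
# Feasibility sketch: a rate point is feasible when no node exceeds its CPU capacity.
# Coefficients, capacities, and placements below are assumptions for illustration.

def node_loads(placement, coeffs, rates):
    """placement: {operator: node}; coeffs: {operator: (l1, l2)} load per unit of r1, r2."""
    loads = {}
    for op, node in placement.items():
        l1, l2 = coeffs[op]
        loads[node] = loads.get(node, 0.0) + l1 * rates[0] + l2 * rates[1]
    return loads


def is_feasible(placement, coeffs, capacities, rates):
    loads = node_loads(placement, coeffs, rates)
    return all(load <= capacities[node] for node, load in loads.items())


def feasible_fraction(placement, coeffs, capacities, r_max, steps=200):
    """Fraction of a steps x steps grid over [0, r_max]^2 that is feasible."""
    hits = 0
    for i in range(steps):
        for j in range(steps):
            point = ((i + 0.5) * r_max / steps, (j + 0.5) * r_max / steps)
            hits += is_feasible(placement, coeffs, capacities, point)
    return hits / steps ** 2


coeffs = {"o1": (10, 0), "o2": (10, 0), "o3": (0, 10), "o4": (0, 10)}
capacities = {"n1": 1000, "n2": 1000}
by_input = {"o1": "n1", "o2": "n1", "o3": "n2", "o4": "n2"}  # each node serves one input
mixed = {"o1": "n1", "o3": "n1", "o2": "n2", "o4": "n2"}     # each node serves both inputs

frac_by_input = feasible_fraction(by_input, coeffs, capacities, r_max=100)
frac_mixed = feasible_fraction(mixed, coeffs, capacities, r_max=100)
print(frac_by_input, frac_mixed)
```

With these assumed numbers the first placement is feasible only on the square where both rates stay at or below 50, while the second is feasible on the triangle r1 + r2 <= 100, which has twice the area. The same operators on the same hardware can therefore tolerate very different sets of input rates.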
8
“Ideal” Feasible Set
Theorem 1. The feasible set is maximized when the load coefficients of each input are perfectly balanced over all nodes (relative to their capacities).
[Figure: a distribution of operators o1, o2, o3, o4 and feasible sets in the (r1, r2) plane]
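A short derivation shows why balanced coefficients give the largest set. The notation is an assumption introduced here, not taken from the slides: l_{ik} is the total load coefficient of input k on node i, l_k = \sum_i l_{ik}, C_i is the capacity of node i, and C_T = \sum_i C_i.

```latex
% Feasible set of any distribution: every node stays within its capacity.
F = \Bigl\{ r \ge 0 : \sum_{k} l_{ik}\, r_k \le C_i \ \text{for all nodes } i \Bigr\}

% Summing the node constraints gives a bound that holds for every distribution:
\sum_{k} l_k\, r_k \le C_T
\quad\Longrightarrow\quad
F \subseteq F^{*} = \Bigl\{ r \ge 0 : \sum_{k} l_k\, r_k \le C_T \Bigr\}

% With perfectly balanced coefficients, l_{ik} = (C_i / C_T)\, l_k,
% each node constraint becomes
\frac{C_i}{C_T} \sum_{k} l_k\, r_k \le C_i
\iff \sum_{k} l_k\, r_k \le C_T ,
\quad\text{so } F = F^{*}.
```

Because operators are indivisible, a real distribution can typically only approximate F^{*}; the ideal set serves as the target to get close to.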
9
Resilient Operator Distribution Algorithm
1. Compute the ideal feasible set.
2. Sort operators based on load coefficients.
3. For each operator, determine the destination server.
[Figure: ideal feasible set in the (r1, r2) plane]
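The three steps can be sketched as a simple greedy loop. This is only an illustration in the spirit of the slide: the slide does not say how the destination server is chosen, so the scoring rule in step 3 (pick the node whose coefficients stay closest to its capacity-proportional share) is an assumption, as are the function name and the example values.

```python
# Greedy placement sketch. The step 3 scoring rule is an assumption,
# not the authors' exact criterion.
import math


def greedy_distribution(coeffs, capacities):
    """coeffs: {operator: [load coefficient per input]}; capacities: {node: cycles/sec}."""
    n_inputs = len(next(iter(coeffs.values())))
    total_cap = sum(capacities.values())
    # Step 1: the ideal share of each input's total coefficient is capacity-proportional.
    totals = [sum(c[k] for c in coeffs.values()) for k in range(n_inputs)]
    # Step 2: place "heavy" operators first (largest coefficient vector norm).
    order = sorted(coeffs, key=lambda op: math.hypot(*coeffs[op]), reverse=True)
    assigned = {node: [0.0] * n_inputs for node in capacities}
    placement = {}

    def worst_ratio(node, extra):
        # Largest ratio of actual to ideal coefficient on this node after adding `extra`.
        ideal = [totals[k] * capacities[node] / total_cap for k in range(n_inputs)]
        return max((assigned[node][k] + extra[k]) / ideal[k]
                   for k in range(n_inputs) if ideal[k] > 0)

    # Step 3: send each operator to the node that stays closest to its ideal share.
    for op in order:
        best = min(capacities, key=lambda node: worst_ratio(node, coeffs[op]))
        placement[op] = best
        for k in range(n_inputs):
            assigned[best][k] += coeffs[op][k]
    return placement


coeffs = {"o1": [10, 0], "o2": [10, 0], "o3": [0, 10], "o4": [0, 10]}
capacities = {"n1": 1000, "n2": 1000}
placement = greedy_distribution(coeffs, capacities)
print(placement)
```

Placing the heaviest operators first leaves the light ones to fill in remaining imbalance, which is the usual reason for sorting before a greedy assignment.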
10
Result: R.O.D. vs. Load Balancing
10 nodes, 5 input streams
11
Result: Latency of a Network Monitoring Query
12
Extension: Network Bandwidth and Communication Overhead
[Figure panels: Network Bandwidth; Communication Overhead]
13
Extension: Nonlinear Load Model
Add an artificial variable.
[Figure: two query graphs over operators o_1 … o_u, o_{u+1} … o_m with inputs r1 and r2]
14
Extension: Lower Bound of Input Rates
Use the lower bound instead of the origin.
[Figure: two feasible sets in the (r1, r2) plane]
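One way to read this slide: when input rates never fall below a known floor, headroom is measured from that floor rather than from the origin. The sketch below computes the distance from a reference point to a node's feasibility boundary, taken to be the hyperplane sum_k l_k * r_k = C. The function name, this interpretation, and the numbers are assumptions.

```python
import math


def boundary_distance(coeffs, capacity, reference=None):
    """Distance from `reference` to the hyperplane sum(coeffs[k] * r[k]) == capacity.

    reference defaults to the origin; pass the lower-bound rate point to
    measure headroom from there instead. A negative result means the
    reference point itself already overloads the node.
    """
    if reference is None:
        reference = [0.0] * len(coeffs)
    used = sum(l * r for l, r in zip(coeffs, reference))
    return (capacity - used) / math.hypot(*coeffs)


from_origin = boundary_distance([3.0, 4.0], 100.0)
from_lower_bound = boundary_distance([3.0, 4.0], 100.0, reference=[10.0, 5.0])
print(from_origin, from_lower_bound)
```

Measuring from the lower bound ignores the part of the rate space that never occurs, so two distributions with the same distance from the origin can rank differently once the floor is taken into account.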
15
Related Work
Traditional Distributed Systems
 - Load balancing and load sharing [Shivaratri92] [Diekmann97]
 - Parallel query processing [DeWitt92]
 - Graph partitioning [Walshaw97] [Schloegel00]
Stream Processing Systems
 - Load management
    - Flux [Shah03]: data-partitioning-based parallel continuous query processing
    - Medusa [Balazinska04]: federated distributed stream processing
16
Conclusion
Distributed Stream Processing
Resilient Operator Distribution
 - Maximize feasible set size
Performance
 - Much better than conventional load distribution algorithms
17
Backup Slides
18
Computation Complexity
Computation time is determined by:
 - n: number of nodes
 - m: number of operators
 - d: number of system input streams
 - k: number of samples in load time series
[Table rows: Static operator distribution; Dynamic operator distribution]
19
Heuristics
Heuristic #1: Choose the case where the feasibility boundaries are close on each axis.
Heuristic #2: Choose the case where all the feasibility boundaries are far from the origin.
[Figure: four example feasible sets in the (r1, r2) plane]
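Both heuristics can be expressed as scores on the node boundaries. The sketch below is one possible reading, with hypothetical function names and values: Heuristic #1 is scored by how far apart the nodes' boundaries cross each rate axis, and Heuristic #2 by the distance from the origin to the nearest boundary. It assumes every node has a strictly positive load coefficient for every input.

```python
import math


def axis_intercepts(node_coeffs, capacity):
    """Where a node's boundary sum(l_k * r_k) == capacity crosses each rate axis.
    Assumes every coefficient is strictly positive."""
    return [capacity / l for l in node_coeffs]


def intercept_spread(nodes):
    """Heuristic #1 score: per-axis gap between the nearest and farthest boundary.
    nodes: list of (coefficients, capacity). Smaller is better."""
    per_node = [axis_intercepts(c, cap) for c, cap in nodes]
    return [max(col) - min(col) for col in zip(*per_node)]


def min_origin_distance(nodes):
    """Heuristic #2 score: distance from the origin to the closest boundary.
    Larger is better."""
    return min(cap / math.hypot(*c) for c, cap in nodes)


balanced = [([10, 10], 1000), ([10, 10], 1000)]
skewed = [([15, 5], 1000), ([5, 15], 1000)]
print(intercept_spread(balanced), min_origin_distance(balanced))
print(intercept_spread(skewed), min_origin_distance(skewed))
```

In this example the balanced case scores better on both measures: its boundaries coincide on each axis, and its nearest boundary lies farther from the origin than in the skewed case.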
20
Resilient vs. Optimal
2 nodes, 4 input streams
21
Varying Bandwidth Constraints
Resilient vs. Connected-Load-Balancing
22
Varying Data Communication CPU Overhead
Resilient vs. Connected-Load-Balancing