Providing Resiliency to Load Variations in Distributed Stream Processing
Ying Xing, Jeong-Hyon Hwang, Ugur Cetintemel, Stan Zdonik
Brown University

Jeong-Hyon Hwang 2
Stream Processing
Monitoring applications: financial data streams, surveillance, network monitoring, click-stream analysis, traffic monitoring, sensor networks

Distributed Stream Processing

Roadmap
- Problem Statement
- Linear Load Model
- Feasible Set
- The Algorithm
- Extensions: Lower Bound of Input Rates, Non-linear Load Model, Network Bandwidth / Communication Overhead
- Experimental Results
- Related Work
- Conclusions

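Problem Statement
Goal: find an operator distribution with the largest feasible set size. The feasible set of a distribution is the set of input-rate combinations (r1, r2, ...) under which no node is overloaded.
[Figure: an operator distribution divides the input rate space (axes r1, r2) into feasible and infeasible regions; the feasible region is the feasible set.]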
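A minimal Python sketch of the feasibility test, assuming the linear load model of the next slide. Here L_ij (a name introduced only for illustration) is the number of CPU cycles node i spends per tuple of input j, summed over the operators placed on node i, and C_i is node i's capacity in CPU cycles/sec. A point of the input rate space is feasible exactly when every node satisfies Σ_j L_ij r_j ≤ C_i. All numbers are hypothetical.

```python
from typing import Sequence

def is_feasible(load_coeffs: Sequence[Sequence[float]],
                capacities: Sequence[float],
                rates: Sequence[float]) -> bool:
    """Return True if no node is overloaded at the given input rates.

    load_coeffs[i][j]: CPU cycles node i spends per tuple of input j (hypothetical)
    capacities[i]:     CPU cycles/sec available on node i
    rates[j]:          input rate r_j in tuples/sec
    """
    for row, cap in zip(load_coeffs, capacities):
        node_load = sum(l * r for l, r in zip(row, rates))  # CPU cycles/sec
        if node_load > cap:
            return False
    return True

# Two hypothetical nodes, two input streams.
L = [[2.0, 1.0],
     [1.0, 3.0]]
C = [10.0, 12.0]
print(is_feasible(L, C, [3.0, 2.0]))  # node loads 8 and 9 -> True
print(is_feasible(L, C, [1.0, 4.0]))  # node loads 6 and 13 -> False
```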

Linear Load Model
- r_j: input rate of input j (tuples/sec)
- c_k: processing cost of operator o_k (CPU cycles/tuple)
- l(o_k): processing load of operator o_k (CPU cycles/sec)
- s_k: selectivity of operator o_k (# output tuples / # input tuples)
[Figure: example query graph with operators o1, o2, o3, o4]
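With these definitions, an operator's input rate is the stream rate scaled by the selectivities of the operators upstream of it, so for a chain fed by input j, l(o_k) = c_k · s_1 ⋯ s_{k-1} · r_j, which is linear in r_j. A minimal sketch for a single hypothetical chain (the function name and numbers are illustrative, and the topology of the slide's figure is not assumed):

```python
def chain_load_coefficients(costs, selectivities):
    """Load coefficient (CPU cycles per input tuple) of each operator in a chain.

    costs[k]:         c_k, CPU cycles per tuple processed by operator k
    selectivities[k]: s_k, output tuples per input tuple of operator k
    Operator k sees r * s_0 * ... * s_{k-1} tuples/sec, so its load is
    c_k * (product of upstream selectivities) * r, which is linear in r.
    """
    coeffs = []
    rate_factor = 1.0  # fraction of the input rate that reaches operator k
    for c, s in zip(costs, selectivities):
        coeffs.append(c * rate_factor)
        rate_factor *= s
    return coeffs

# Hypothetical chain o1 -> o2 fed by input r1.
coeffs = chain_load_coefficients(costs=[4.0, 10.0], selectivities=[0.5, 1.0])
print(coeffs)                     # [4.0, 5.0]
r1 = 100.0                        # tuples/sec
print([l * r1 for l in coeffs])   # [400.0, 500.0] CPU cycles/sec
```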

Example Feasible Sets
[Figure: three different distributions of operators o1 to o4, each shown with its feasible set in the (r1, r2) input rate space]
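To compare distributions like these numerically, one can sum the per-operator load coefficients on each node and approximate the area of the two-dimensional feasible set on a grid. This is an illustration under stated assumptions: the operator load coefficients, the two unit-capacity nodes, the [0, 1]² bounding box, and the midpoint-grid estimate are all hypothetical choices, not the method used for these slides.

```python
def node_load_coefficients(op_coeffs, assignment, num_nodes):
    """L[i][j] = sum of load coefficients for input j of the operators on node i."""
    d = len(op_coeffs[0])
    L = [[0.0] * d for _ in range(num_nodes)]
    for k, node in enumerate(assignment):
        for j in range(d):
            L[node][j] += op_coeffs[k][j]
    return L

def feasible_area(L, capacities, r_max, n=200):
    """Approximate the 2-D feasible set area inside [0, r_max]^2 on an n x n midpoint grid."""
    step = r_max / n
    count = 0
    for a in range(n):
        r1 = (a + 0.5) * step
        for b in range(n):
            r2 = (b + 0.5) * step
            if all(row[0] * r1 + row[1] * r2 <= cap
                   for row, cap in zip(L, capacities)):
                count += 1
    return count * step * step

# Hypothetical (per-r1, per-r2) load coefficients of o1..o4; two nodes of capacity 1.
ops = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
caps = [1.0, 1.0]
split_by_input = node_load_coefficients(ops, [0, 0, 1, 1], 2)  # {o1,o2} | {o3,o4}
mixed = node_load_coefficients(ops, [0, 1, 0, 1], 2)           # {o1,o3} | {o2,o4}
print(round(feasible_area(split_by_input, caps, 1.0), 2))  # about 0.25
print(round(feasible_area(mixed, caps, 1.0), 2))           # about 0.5
```

In this toy setting, putting both r1 operators on one node and both r2 operators on the other yields the square r1 ≤ 0.5, r2 ≤ 0.5, while mixing them yields the larger triangle r1 + r2 ≤ 1.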

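“Ideal” Feasible Set
Theorem 1. The feasible set is maximized when the load coefficients of each input are perfectly balanced across all nodes, relative to the nodes' capacities. (The load coefficient l_kj of operator o_k for input j is defined by l(o_k) = Σ_j l_kj r_j.)
[Figure: a distribution of operators o1 to o4 and feasible sets in the (r1, r2) input rate space]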
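The ideal set can be made concrete. Let l_j be the total load coefficient of input j over all operators and C_T = Σ_i C_i the total capacity. Summing the node constraints of any distribution gives Σ_j l_j r_j ≤ C_T, so no feasible set can extend beyond that simplex. If node i receives exactly the capacity-proportional share l_j C_i / C_T of every input, its constraint Σ_j (l_j C_i / C_T) r_j ≤ C_i reduces to Σ_j l_j r_j ≤ C_T, the same hyperplane for every node, and the bound is attained. Its volume is C_T^d / (d! ∏_j l_j). A short sketch (function names and numbers are hypothetical):

```python
import math

def ideal_node_coefficients(total_coeffs, capacities):
    """Give node i the share C_i / C_T of every input's total load coefficient l_j."""
    c_total = sum(capacities)
    return [[l * c / c_total for l in total_coeffs] for c in capacities]

def ideal_feasible_volume(total_coeffs, capacities):
    """Volume of the simplex {r >= 0 : sum_j l_j * r_j <= C_T}."""
    c_total = sum(capacities)
    d = len(total_coeffs)
    return c_total ** d / (math.factorial(d) * math.prod(total_coeffs))

# Hypothetical: two inputs with total coefficients 4 and 2; node capacities 1 and 3.
print(ideal_node_coefficients([4.0, 2.0], [1.0, 3.0]))  # [[1.0, 0.5], [3.0, 1.5]]
print(ideal_feasible_volume([4.0, 2.0], [1.0, 3.0]))    # 1.0
```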

Resilient Operator Distribution Algorithm
1. Compute the ideal feasible set.
2. Sort operators based on their load coefficients.
3. For each operator, determine the destination server.
[Figure: the ideal feasible set in the (r1, r2) input rate space]
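The three steps can be sketched as a greedy placement. This is a simplified, hypothetical variant, not the exact selection rule of the slides (the backup slides list the heuristics used in step 3): operators are sorted by the Euclidean norm of their load-coefficient vectors, largest first, and each goes to the node whose worst ratio of accumulated to ideal load coefficient, taken over all inputs, stays smallest. The coefficients and capacities in the example are made up.

```python
import math

def greedy_distribution(op_coeffs, capacities):
    """Hypothetical greedy placement toward the ideal (capacity-proportional) split.

    op_coeffs[k][j]: load coefficient of operator k for input j
    capacities[i]:   CPU capacity of node i
    Returns (assignment, node_coeffs); assignment[k] is the node of operator k.
    """
    d = len(op_coeffs[0])
    c_total = sum(capacities)
    totals = [sum(op[j] for op in op_coeffs) for j in range(d)]
    # Step 1: ideal per-node coefficients, proportional to node capacity.
    ideal = [[totals[j] * c / c_total for j in range(d)] for c in capacities]
    node_coeffs = [[0.0] * d for _ in capacities]
    assignment = [None] * len(op_coeffs)

    def worst_ratio(i, k):
        # Largest accumulated/ideal ratio on node i if operator k were added.
        return max(((node_coeffs[i][j] + op_coeffs[k][j]) / ideal[i][j]
                    for j in range(d) if ideal[i][j] > 0), default=0.0)

    # Step 2: heaviest operators (largest coefficient-vector norm) first.
    order = sorted(range(len(op_coeffs)),
                   key=lambda k: math.sqrt(sum(x * x for x in op_coeffs[k])),
                   reverse=True)
    # Step 3: place each operator on the node that stays closest to its ideal share.
    for k in order:
        best = min(range(len(capacities)), key=lambda i: worst_ratio(i, k))
        for j in range(d):
            node_coeffs[best][j] += op_coeffs[k][j]
        assignment[k] = best
    return assignment, node_coeffs

ops = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]  # hypothetical o1..o4
assignment, node_coeffs = greedy_distribution(ops, [1.0, 1.0])
print(assignment)    # [0, 1, 0, 1]
print(node_coeffs)   # [[1.0, 1.0], [1.0, 1.0]]
```

On this toy input the greedy pass reaches the ideal split exactly; in general a greedy placement only approximates it, because operators cannot be divided between nodes.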

Result: R.O.D. vs. Load Balancing
[Figure: 10 nodes, 5 input streams]

Result: Latency of a Network Monitoring Query

Extension: Network Bandwidth & Communication Overhead
[Figure panels: Network Bandwidth; Communication Overhead]

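Extension: Nonlinear Load Model
Add an artificial input variable for the output rate of a nonlinear operator, so that the loads of the operators downstream of it remain linear in the extended set of input variables.
[Figure: an operator chain r1 → o1 … o_u → o_{u+1} … o_m rewritten so that r1 drives o1 … o_u and an artificial input r2 drives the operators up to o_m]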
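One way to picture the artificial variable in code, assuming a single chain in which the operator at a (hypothetical, 0-based) index u has an output rate that is not a fixed fraction of its input rate. Its own load is assumed to stay proportional to its input rate; every operator after it is expressed in terms of a new variable r2, that operator's output rate, so each operator gets a load-coefficient vector over (r1, r2). Names and numbers are illustrative only.

```python
def split_chain_coefficients(costs, selectivities, u):
    """Load-coefficient vectors over (r1, r2) for a chain fed by r1.

    The operator at index u has a nonlinear output rate, which becomes the
    artificial variable r2; selectivities[u] is ignored. Operators up to and
    including u are driven by r1, the operators after u by r2.
    """
    coeffs = []
    factor = 1.0  # fraction of the driving rate (r1, later r2) reaching operator k
    for k, c in enumerate(costs):
        if k == u + 1:
            factor = 1.0  # downstream of operator u, rates are measured from r2
        coeffs.append((c * factor, 0.0) if k <= u else (0.0, c * factor))
        if k != u:
            factor *= selectivities[k]
    return coeffs

coeffs = split_chain_coefficients([2.0, 3.0, 4.0, 5.0], [0.5, None, 0.5, 1.0], u=1)
print(coeffs)  # [(2.0, 0.0), (1.5, 0.0), (0.0, 4.0), (0.0, 2.5)]
r1, r2 = 10.0, 3.0  # r2: observed output rate of the nonlinear operator
print([a * r1 + b * r2 for a, b in coeffs])  # [20.0, 15.0, 12.0, 7.5]
```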

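Extension: Lower Bound of Input Rates
Use the lower bound of the input rates, instead of the origin, as the reference point for the feasible set.
[Figure: feasible sets in the (r1, r2) input rate space, measured from the origin 0 and from the lower bound]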
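Why the lower bound changes the picture: if each input rate never drops below b_j, write r = b + x with x ≥ 0. Node i's constraint Σ_j L_ij (b_j + x_j) ≤ C_i becomes Σ_j L_ij x_j ≤ C_i − Σ_j L_ij b_j, so the same analysis applies from the point b, with each node's capacity reduced by the load the lower bound already commits. A small sketch; L_ij (per-node load coefficients), the function name, and all numbers are hypothetical.

```python
def residual_capacities(load_coeffs, capacities, lower_bounds):
    """Capacity left on each node once every input runs at its lower bound.

    A negative entry means the lower bound itself already overloads that node.
    """
    return [cap - sum(l * b for l, b in zip(row, lower_bounds))
            for row, cap in zip(load_coeffs, capacities)]

L = [[2.0, 1.0],
     [1.0, 3.0]]
C = [10.0, 12.0]
print(residual_capacities(L, C, [1.0, 2.0]))  # [6.0, 5.0]
```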

Related Work
Traditional Distributed Systems
- Load balancing and load sharing [Shivaratri92] [Diekmann97]
- Parallel query processing [DeWitt92]
- Graph partitioning [Walshaw97] [Schloegel00]
Stream Processing Systems: load management
- Flux [Shah03]: data-partitioning-based parallel continuous query processing
- Medusa [Balazinska04]: federated distributed stream processing

Conclusion
- Distributed stream processing
- Resilient Operator Distribution: maximizes the feasible set size
- Performance: much better than conventional load distribution algorithms

Backup Slides

Computation Complexity
Computation time is determined by:
- n: number of nodes
- m: number of operators
- d: number of system input streams
- k: number of samples in the load time series
Static operator distribution
Dynamic operator distribution

Heuristics
- Heuristic #1: choose the case where the feasibility boundaries are close to each other on each axis.
- Heuristic #2: choose the case where all the feasibility boundaries are far from the origin.
[Figure: four example feasible sets in the (r1, r2) input rate space]
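Under the linear load model, node i's feasibility boundary is the hyperplane Σ_j L_ij r_j = C_i. It crosses axis j at C_i / L_ij and lies at Euclidean distance C_i / ‖L_i‖ from the origin, which gives one way to measure what the two heuristics describe. This sketch only computes those quantities for two hypothetical two-node distributions with unit capacities; how they are combined into a placement decision is left open.

```python
import math

def axis_intercepts(row, cap):
    """Where the boundary sum_j row[j] * r_j = cap crosses each axis (inf if row[j] == 0)."""
    return [cap / l if l > 0 else math.inf for l in row]

def plane_distance(row, cap):
    """Euclidean distance from the origin to the hyperplane sum_j row[j] * r_j = cap."""
    return cap / math.sqrt(sum(l * l for l in row))

# Hypothetical per-node load coefficients; both nodes have capacity 1.
split = [[2.0, 0.0], [0.0, 2.0]]
mixed = [[1.0, 1.0], [1.0, 1.0]]
for name, L in (("split", split), ("mixed", mixed)):
    print(name,
          [axis_intercepts(row, 1.0) for row in L],
          round(min(plane_distance(row, 1.0) for row in L), 3))
```

For "split", the two boundaries cross each axis at very different points (0.5 and infinity) and the nearest lies 0.5 from the origin; for "mixed", they coincide at 1.0 on both axes and lie about 0.707 from the origin, so both heuristics favor it.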

Resilient vs. Optimal
[Figure: 2 nodes, 4 input streams]

Varying Bandwidth Constraints
[Figure: Resilient vs. Connected-Load-Balancing]

Varying Data Communication CPU Overhead
[Figure: Resilient vs. Connected-Load-Balancing]