E-Storm: Replication-based State Management in Distributed Stream Processing Systems
Xunyun Liu, Aaron Harwood, Shanika Karunasekera, Benjamin Rubinstein and Rajkumar Buyya
The Cloud Computing and Distributed Systems Lab, The University of Melbourne, Australia

Outline of Presentation
- Background: stream processing; Apache Storm; performance issue with the current approach
- Solution overview: basic idea; framework design
- State management framework: error-free execution; failure recovery
- Evaluation
- Conclusions and future work

Stream Processing (Background)
Stream data: arriving continuously and possibly infinite; various data sources and structures; transient value and short data lifespan; asynchronous and unpredictable.
Process-once-arrival paradigm. Computation: queries over the most recent data; computations are generally independent; strong latency constraints. Result: incremental result updates; persistence of data is not required.
Stream processing is an emerging paradigm that harnesses the potential of transient data in motion. Asynchronous: the data source does not interact with the stream processing system directly, e.g. by waiting for an answer.

Distributed Stream Processing System (Background)
Logic level: inter-connected operators; data streams flow through these operators to undergo different types of computation.
Middleware level: Data Stream Management Systems (DSMS) such as Apache Storm, Samza…
Infrastructure level: a set of distributed hosts in a cloud or cluster environment, organised in a master/slave model.
So far we have introduced stream processing only as an abstract concept; it has to be carried out by concrete stream processing applications, also known as streaming applications. A typical streaming application consists of three tiers. The highest tier is the logic level, where continuous queries are implemented as standing, inter-connected operators that continuously filter the data streams until the developers explicitly shut them off. The second tier is the middleware level: much like database management systems, various Data Stream Management Systems live here to support the upper-level logic and manage continuous data streams with intermediate event queues and processing entities. The third tier is the computing infrastructure, composed of a centralised machine or a set of distributed hosts.

A Sketch of Apache Storm (Background)
Operator parallelization; topology; logical view of Storm; physical view of Storm; task scheduling.
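
For concreteness, a minimal sketch of how a topology and its operator parallelism are declared with Storm's standard TopologyBuilder API; SentenceSpout, SplitBolt and CountBolt are hypothetical application components (the two bolts are sketched on later slides), and task scheduling then maps the resulting tasks onto worker processes on the slave nodes.

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // A spout pulls tuples from the external source; the parallelism hint
        // controls how many tasks (operator instances) Storm creates for it.
        builder.setSpout("sentences", new SentenceSpout(), 2);

        // Bolts consume upstream streams; fieldsGrouping routes tuples with the
        // same key to the same task so per-key state stays local.
        builder.setBolt("splitter", new SplitBolt(), 4)
               .shuffleGrouping("sentences");
        builder.setBolt("counter", new CountBolt(), 4)
               .fieldsGrouping("splitter", new Fields("word"));

        Config conf = new Config();
        conf.setNumWorkers(4); // worker processes spread over the slave nodes

        StormSubmitter.submitTopology("word-count", conf, builder.createTopology());
    }
}
```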

Fault-tolerance in Storm (Background)
Supervised and stateless daemon execution.
Worker processes heartbeat back to the Supervisors and Nimbus via ZooKeeper, as well as locally.
If a worker process dies (fails to heartbeat), the Supervisor will restart it; if it dies repeatedly, Nimbus will reassign the work to other nodes in the cluster.
If a Supervisor dies, an external process-monitoring tool will restart it. If a worker node dies, the tasks assigned to that machine will time out and Nimbus will reassign them to other machines.
If Nimbus dies, topologies will continue to function normally, but reassignments can no longer be performed. Storm v1.0.0 introduces a highly available Nimbus to eliminate this single point of failure.

Fault-tolerance in Storm (Background)
Message delivery guarantee (at-least-once by default): the spout re-emits a tuple if any tuple in its tuple tree fails or times out before being fully acknowledged.
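
A minimal sketch of how a bolt participates in this at-least-once guarantee through Storm's anchoring and acking API; SplitBolt is the hypothetical bolt from the topology sketch above.

```java
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        for (String word : input.getString(0).split("\\s+")) {
            // Anchoring the emitted tuple to its input ties them into one tuple
            // tree, so a downstream failure replays the original spout tuple.
            collector.emit(input, new Values(word));
        }
        // Ack only after all derived tuples are emitted; a missing ack causes
        // the spout to replay the tuple after the message timeout.
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
```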

Fault-tolerance in Storm (Background)
Checkpointing-based state persistence: a new checkpoint spout is added, which sends checkpoint messages across the whole topology through a separate internal stream. Stateful bolts save their states as snapshots. The Chandy-Lamport algorithm is used to guarantee the consistency of the distributed snapshots.
Storm has abstractions for bolts to save and retrieve the state of their operations. There is a default implementation that provides state persistence in a remote Redis cluster, so the framework automatically and periodically snapshots the state of the bolts across the topology in a consistent manner.
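
A minimal sketch of this abstraction using Storm's stateful-bolt API (BaseStatefulBolt with a KeyValueState), assuming a hypothetical word-counting bolt; the configured state provider (Redis in the default setup described above) decides where the snapshots are persisted.

```java
import java.util.Map;
import org.apache.storm.state.KeyValueState;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.base.BaseStatefulBolt;
import org.apache.storm.tuple.Tuple;

// Stateful word counter: Storm injects the state via initState() and
// periodically checkpoints it through the configured state provider.
public class CountBolt extends BaseStatefulBolt<KeyValueState<String, Long>> {
    private KeyValueState<String, Long> counts;
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void initState(KeyValueState<String, Long> state) {
        // Called before execute(); after a restart the last committed
        // snapshot is restored here.
        this.counts = state;
    }

    @Override
    public void execute(Tuple input) {
        String word = input.getStringByField("word");
        counts.put(word, counts.get(word, 0L) + 1);
        // Acknowledge the input once the state update has been applied.
        collector.ack(input);
    }
}
```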

Performance Issue with the Current Approach (Background)
A remote data store is constantly involved: high state synchronization overhead and significant access delay to the remote data store.
The frequency of checkpointing is hard to tune: checkpointing too often causes excessive overhead, while checkpointing too rarely risks losing uncommitted states.
[Figure: access delay and synchronization overhead between the topology and the remote Redis store.]
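
The trade-off shows up in two Storm configuration keys, sketched below; the concrete values are arbitrary and only illustrate the knobs being tuned.

```java
import org.apache.storm.Config;

public class CheckpointTuning {
    // Returns a Config illustrating the two knobs behind the trade-off above:
    // where snapshots go, and how often they are taken.
    public static Config stateConfig() {
        Config conf = new Config();
        // Default Redis-backed provider: every snapshot crosses the network
        // to the remote store (the synchronization overhead discussed above).
        conf.put("topology.state.provider",
                 "org.apache.storm.redis.state.RedisKeyValueStateProvider");
        // Shorter intervals bound the uncommitted state lost on failure but
        // raise the per-checkpoint overhead; longer intervals do the opposite.
        conf.put("topology.state.checkpoint.interval.ms", 1000);
        return conf;
    }
}
```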

Basic Idea: Fine-grained Active Replication (Solution Overview)
Duplicate the execution of stateful tasks so that multiple state backups are maintained independently. Each replicated task consists of a primary task and its shadow task(s).

Basic Idea: Fine-grained Active Replication (Solution Overview)
The primary task and its shadow tasks are placed on separate nodes, so a single node failure cannot wipe out every copy of the state. Restarted tasks recover their states from the alive partners.

Framework Design (Solution Overview)
Provide a replication API and hide the adaptation effort from developers.
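
The replication API itself is not shown in the slides, so the sketch below is purely hypothetical: it illustrates the idea of declaring a replication degree for a stateful bolt while leaving the rest of the topology code untouched. The component-level key "estorm.replicas" is an invented placeholder, not E-Storm's real interface; SentenceSpout, SplitBolt and CountBolt are the hypothetical components from the earlier sketches.

```java
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class ReplicatedTopologySketch {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();

        // The ordinary Storm wiring stays exactly as before...
        builder.setSpout("sentences", new SentenceSpout(), 2);
        builder.setBolt("splitter", new SplitBolt(), 4)
               .shuffleGrouping("sentences");

        // ...and the only adaptation is declaring how many replicas the
        // stateful bolt should have; the framework would then create the
        // shadow tasks, rewire the groupings and manage recovery.
        builder.setBolt("counter", new CountBolt(), 4)
               .fieldsGrouping("splitter", new Fields("word"))
               .addConfiguration("estorm.replicas", 2); // invented key, for illustration
    }
}
```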

Framework Design (Solution Overview)
The state monitor tracks the health of states and sends a recovery request after detecting an issue.

Framework Design (Solution Overview)
The recovery manager watches ZooKeeper for recovery requests and initialises, oversees and finalises the recovery process.

Framework Design (Solution Overview)
The task wrapper encapsulates task execution with the logic to handle state transfer and recovery.

Framework Design (Solution Overview)
The state transmit station decouples senders and receivers during the state transfer process, so task wrappers perform state management without synchronization or leader election.

State Management Framework: Error-free Execution
Each task determines its role (primary or shadow) based on its task ID. Tasks are rewired using a replication-aware grouping policy so that every replica receives the same input.
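
As an illustration (not E-Storm's implementation), such a replication-aware grouping can be expressed with Storm's CustomStreamGrouping interface: hash the key as a fields grouping would, then return one target task per replica of the chosen logical task, so a primary and its shadows see identical input.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.storm.generated.GlobalStreamId;
import org.apache.storm.grouping.CustomStreamGrouping;
import org.apache.storm.task.WorkerTopologyContext;

public class ReplicationAwareGrouping implements CustomStreamGrouping {
    private final int replicas;          // copies per logical task (primary + shadows)
    private List<List<Integer>> fleets;  // fleets.get(i) = all replicas of logical task i

    public ReplicationAwareGrouping(int replicas) {
        this.replicas = replicas;
    }

    @Override
    public void prepare(WorkerTopologyContext context, GlobalStreamId stream,
                        List<Integer> targetTasks) {
        // Assumption of this sketch: target task IDs are ordered so that
        // consecutive blocks of size `replicas` form one fleet.
        fleets = new ArrayList<>();
        for (int i = 0; i < targetTasks.size(); i += replicas) {
            fleets.add(new ArrayList<>(targetTasks.subList(i, i + replicas)));
        }
    }

    @Override
    public List<Integer> chooseTasks(int taskId, List<Object> values) {
        // Same key-based decision a fields grouping would make, then fan the
        // tuple out to every replica in the selected fleet.
        int logicalTask = Math.floorMod(values.get(0).hashCode(), fleets.size());
        return fleets.get(logicalTask);
    }
}
```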

State Management Framework: Error-free Execution
Replication-aware task placement, based on a greedy heuristic: it only places shadow tasks; shadow tasks from the same fleet are spread as far apart as possible, while communicating tasks are placed as close together as possible.
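
A sketch of a greedy heuristic in this spirit (invented data structures, not the E-Storm scheduler): for each shadow task it skips nodes already hosting a member of the same fleet and prefers the node co-located with the most communicating tasks.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ShadowPlacementSketch {

    /** Picks a node for each shadow task; primary tasks keep Storm's own placement. */
    static Map<Integer, String> placeShadows(
            List<Integer> shadowTasks,
            Map<Integer, String> fixedPlacement,          // task -> node for already-placed tasks
            Map<Integer, Set<Integer>> fleetOf,           // task -> other tasks in its fleet
            Map<Integer, Set<Integer>> communicatesWith,  // task -> upstream/downstream tasks
            List<String> nodes) {

        Map<Integer, String> placement = new HashMap<>(fixedPlacement);
        for (int task : shadowTasks) {
            String best = null;
            int bestScore = Integer.MIN_VALUE;
            for (String node : nodes) {
                // Never share a node with another task of the same fleet, otherwise
                // one node failure could wipe out every copy of the state.
                boolean clashes = fleetOf.getOrDefault(task, Set.of()).stream()
                        .anyMatch(t -> node.equals(placement.get(t)));
                if (clashes) continue;
                // Greedy score: co-location with communicating tasks keeps traffic local.
                int score = (int) communicatesWith.getOrDefault(task, Set.of()).stream()
                        .filter(t -> node.equals(placement.get(t)))
                        .count();
                if (score > bestScore) {
                    bestScore = score;
                    best = node;
                }
            }
            if (best != null) placement.put(task, best);
        }
        return placement;
    }
}
```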

State Management Framework: Failure Recovery
Storm restarts the failed tasks; the state monitor sends a recovery request; the recovery manager initialises the recovery process; the task wrappers then conduct the state transfer autonomously and transparently.

State Management Framework: Failure Recovery
Simultaneous state transfer without synchronization: in a failure-affected fleet, only one alive task gets to write its state, and the restarted tasks query the state transmit station to retrieve their lost state.
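
A high-level sketch of this protocol with invented names (TransmitStation, onRecoveryRequest, onRestart are placeholders): one designated survivor per fleet pushes its state to the transmit station and each restarted task pulls it from there, so no coordination among the survivors is required.

```java
public class RecoverySketch {

    interface TransmitStation {                 // decouples senders from receivers
        void put(String fleetId, byte[] state);
        byte[] get(String fleetId);             // blocks until the state is available
    }

    static void onRecoveryRequest(String fleetId, boolean isDesignatedWriter,
                                  byte[] localState, TransmitStation station) {
        if (isDesignatedWriter) {
            // Exactly one alive task per fleet is designated as the writer
            // (e.g. the survivor with the smallest task ID), so no locking or
            // leader election is needed among the survivors.
            station.put(fleetId, localState);
        }
    }

    static byte[] onRestart(String fleetId, TransmitStation station) {
        // A restarted task simply pulls its fleet's state before resuming,
        // instead of replaying input or reading a remote checkpoint store.
        return station.get(fleetId);
    }
}
```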

Experiment Setup (Evaluation)
Profiling environment: Nectar IaaS cloud; 10 worker nodes (2 VCPUs, 6 GB RAM and 30 GB disk each); 1 Nimbus, 1 ZooKeeper and 1 Kestrel node.
Two test applications: a synthetic test application and a URL-extraction topology.

Overhead of State Persistence (Evaluation)
Synthetic application. [Figures: throughput and latency comparison.]

Overhead of State Persistence (Evaluation)
Realistic application. [Figures: throughput and latency comparison.]

Overhead of Maintaining More Replicas (Evaluation)
[Figures: throughput changes and latency changes.]

Performance of Recovery (Evaluation)
[Figure: recovery performance.]

Outline of Presentation Background Stream Processing Apache Storm Performance Issue with the Current Approach Solution Overview Basic Idea Framework Design State Management Framework Error-free Execution Failure Recovery Evaluation Conclusions and Future Work

Conclusions and Future Work
Proposed a replication-based state management system with low overhead during error-free execution and concurrent, high-performance recovery in the case of failures. Identified the overhead of checkpointing: frequent state access and remote synchronization.
Future work: adaptive replication schemes; intelligent replica placement strategies; a location-aware recovery protocol.

© Copyright The University of Melbourne 2017