From Rivulets to Rivers: Elastic Stream Processing in Heron

Presentation transcript:

From Rivulets to Rivers: Elastic Stream Processing in Heron
Bill Graham, Twitter (@billgraham)
Ashvin Agrawal, Microsoft
Avrilia Floratou, Microsoft

"Prediction is very difficult, especially if it's about the future." (Niels Bohr)
"We cannot direct the wind, but we can adjust the sails." (Dolly Parton)

Outline
- Heron Overview
- Elastic Scaling Challenges
- Current Implementation
- Work in Progress: Auto-scaling

Heron: a realtime, distributed, fault-tolerant stream processing engine.

About Heron
- Developed by Twitter in 2014, open sourced in May 2016
- Storm API compatible
- Isolation at all levels: topology, container, task (process-based)
- At-least-once and at-most-once semantics
- Backpressure
- Low resource overhead (< 10%)

Logical Topology
(Diagram: the topology as a directed graph of components, with Spout 1 and Spout 2 emitting tuples into Bolts 1-5.)
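Because Heron is Storm API compatible, a logical topology like the one sketched above is declared with the familiar TopologyBuilder API. The snippet below is illustrative only: the Spout1/Spout2 and Bolt1-Bolt5 classes, the parallelism hints, the package names, and the exact edges and groupings are assumptions rather than the precise wiring shown in the talk's diagram.

```java
// Hypothetical declaration of the logical topology using the
// Storm-compatible API. Spout*/Bolt* are placeholder classes the
// topology author would supply; package names may differ in Heron's
// compatibility layer.
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class ExampleTopology {
  public static void main(String[] args) {
    TopologyBuilder builder = new TopologyBuilder();

    // Two sources of tuples.
    builder.setSpout("spout1", new Spout1(), 2);
    builder.setSpout("spout2", new Spout2(), 2);

    // Bolts subscribe to the spouts and to upstream bolts.
    builder.setBolt("bolt1", new Bolt1(), 4).shuffleGrouping("spout1");
    builder.setBolt("bolt2", new Bolt2(), 4).shuffleGrouping("spout1")
                                            .shuffleGrouping("spout2");
    builder.setBolt("bolt3", new Bolt3(), 4).shuffleGrouping("spout2");
    builder.setBolt("bolt4", new Bolt4(), 2).fieldsGrouping("bolt1", new Fields("key"));
    builder.setBolt("bolt5", new Bolt5(), 2).shuffleGrouping("bolt2")
                                            .shuffleGrouping("bolt3");

    // The resulting topology is then submitted with the heron CLI (heron submit ...).
  }
}
```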

Physical Execution
(Diagram: the same spouts and bolts as the logical topology, each expanded into multiple parallel instances running across containers.)

Packing Plan
How should instances be distributed onto containers? The mapping is produced by IPacking.pack().
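To make the packing problem concrete, here is a minimal, self-contained sketch of a round-robin packing: component instances are spread evenly across a fixed number of containers. This is not Heron's IPacking interface itself (only the pack() entry point is named in the talk); real packing algorithms also account for per-instance CPU, RAM, and disk requirements.

```java
// Illustrative only: a round-robin "packing plan" that spreads component
// instances across containers. Real Heron packings also consider the
// resources required by each instance.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RoundRobinPacking {

  /** parallelism: component name -> number of instances to run. */
  static Map<Integer, List<String>> pack(Map<String, Integer> parallelism,
                                         int numContainers) {
    Map<Integer, List<String>> plan = new TreeMap<>();
    for (int c = 0; c < numContainers; c++) {
      plan.put(c, new ArrayList<>());
    }
    int next = 0;
    for (Map.Entry<String, Integer> e : parallelism.entrySet()) {
      for (int i = 0; i < e.getValue(); i++) {
        // An instance id encodes its component and index, e.g. "bolt1[2]".
        plan.get(next % numContainers).add(e.getKey() + "[" + i + "]");
        next++;
      }
    }
    return plan;
  }

  public static void main(String[] args) {
    Map<String, Integer> parallelism = new TreeMap<>();
    parallelism.put("spout1", 2);
    parallelism.put("bolt1", 4);
    parallelism.put("bolt2", 3);
    // Prints {0=[bolt1[0], bolt1[3], bolt2[2]], 1=[bolt1[1], bolt2[0], spout1[0]], ...}
    System.out.println(pack(parallelism, 3));
  }
}
```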

Topology Submission
(Diagram: "heron submit" invokes the Heron Client, which builds a PackingPlan and hands it to the Heron Scheduler. Containers are allocated and processes initialized; the instances (spouts and bolts) register with the Stream Manager in their container, the Stream Managers register with the Topology Master in container 0, and data begins to flow.)

Data Rate Variations

Parallelism Challenges
- Anticipating component parallelism is difficult
- Changing parallelism is costly, O(hour): code change, review, merge, build, kill, submit
- Tuning for load spikes or valleys is manual, O(day)
- Under-provisioning leads to backpressure, which drives up support costs
- Over-provisioning is the norm

Over-provisioning
(Chart comparing CPU requested with CPU used; only a fraction of the requested CPU, on the order of 25-40%, is actually used.)

Elastic Scaling Opportunity
- Reduce administration cost
- Reduce support cost
- Reduce hardware cost
- Provide better SLAs

Ordinary Topology Management Process
(Diagram of user tasks vs. Heron system tasks: to rescale, the user monitors and estimates load, kills the running topology (releasing its resources), and submits it again; Heron then creates a new packing, acquires resources, installs and starts the topology, and rebuilds state. Several of these are time-consuming tasks.)

Low-cost Topology "update"
(Diagram: per-component instance counts being adjusted in place by an update, without killing the topology.)

Optimized Topology Scale-up Process
(Diagram of user tasks vs. Heron system tasks: the optimized path replaces kill-and-resubmit with an update. The user monitors and estimates load and issues an update; Heron creates a new packing, pauses the topology, adds or reduces resources, prepares the affected components, and un-pauses the topology, avoiding the expensive kill, submit, acquire, install, start, and build-state steps.)

heron “update” …

$ heron update my_cluster/user/dev MyTopology \
    --component-parallelism=bolt1:20 \
    --component-parallelism=bolt2:40

- Available in 0.14.5
- Aims to maintain a uniform component distribution
- Execution time: O(mins)
- Aggressively prunes containers
- Minimizes disruption
- Customizable through IRepacking.repack()

Current Limitations
- Automated state transition is not yet supported
- Component scaling event notification: IUpdatable.update()
  Example: a KafkaSpout's queue partition mappings, or fields-grouping routing, might change when parallelism changes (see the sketch below).
  Workaround: pause the topology for longer than the cache flush interval before scaling.
- Algorithmic auto-scaling: modifying an existing packing plan can be more complex than creating one from scratch
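The scaling-notification limitation is easiest to see with a partition-owning spout. The sketch below models the idea behind an IUpdatable-style callback: when parallelism changes, each instance recomputes which Kafka partitions it owns. The method name, fields, and modulo assignment are assumptions for illustration, not Heron's or KafkaSpout's actual API.

```java
// Hypothetical spout showing why a scaling event must be propagated to
// components: each instance re-derives its share of Kafka partitions from
// its task index and the new parallelism. Names are illustrative only.
import java.util.ArrayList;
import java.util.List;

public class PartitionAwareSpout {
  private int taskIndex;    // this instance's index within the component
  private int totalTasks;   // current component parallelism
  private final List<Integer> owned = new ArrayList<>();

  /** Conceptually what an IUpdatable-style update() hook would trigger. */
  public void onScalingEvent(int newTaskIndex, int newTotalTasks, int numPartitions) {
    this.taskIndex = newTaskIndex;
    this.totalTasks = newTotalTasks;
    owned.clear();
    // Simple modulo re-assignment: partition p is owned by task (p % totalTasks).
    for (int p = 0; p < numPartitions; p++) {
      if (p % totalTasks == taskIndex) {
        owned.add(p);
      }
    }
  }

  public List<Integer> ownedPartitions() {
    return owned;
  }

  public static void main(String[] args) {
    PartitionAwareSpout spout = new PartitionAwareSpout();
    spout.onScalingEvent(1, 4, 12);              // task 1 of 4, 12 partitions
    System.out.println(spout.ownedPartitions()); // [1, 5, 9]
  }
}
```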

Algorithmic Auto-Scaling …
(Diagram: the same steps as the optimized scale-up process, but the monitor/estimate and update-topology steps move from the user to the Heron system, so scaling decisions are made automatically.)

Auto-Scaling
Heron should automatically identify variations in the incoming load and react to them. To adjust to such external shocks, Heron uses Dhalion, a framework that provides self-regulating capabilities to Heron and will be open-sourced in the near future. Dhalion periodically observes the state of the topology and determines whether resources should be scaled up or down.

Using Dhalion to Auto-Scale
Dhalion scales the topology's resources up and down as needed, while keeping the topology in a steady state in which backpressure is not observed.
(Diagram: the policy runs in three stages. Symptom detection: the Pending Packets Detector, Backpressure Detector, and Processing Rate Skew Detector turn metrics into symptoms. Diagnosis generation: the Resource Overprovisioning, Resource Underprovisioning, Data Skew, and Slow Instances Diagnosers turn symptoms into a diagnosis. Resolution: the matching resolver (Bolt Scale Down, Bolt Scale Up, Data Skew, or Restart Instances) is invoked.)
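To show how those three stages fit together, here is a hypothetical model of the control loop: detectors turn metrics into symptoms, diagnosers turn symptoms into a diagnosis, and a resolver acts on it. The interfaces, names, and metric keys below are assumptions for illustration; they are not the actual Dhalion API, which had not been open-sourced at the time of the talk.

```java
// Hypothetical Dhalion-style policy loop: observe -> detect symptoms ->
// diagnose -> resolve. All names here are illustrative, not Dhalion's API.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DhalionStylePolicy {
  interface Detector  { List<String> detect(Map<String, Double> metrics); }
  interface Diagnoser { String diagnose(List<String> symptoms); }  // null = no match
  interface Resolver  { void resolve(String diagnosis); }

  private final List<Detector> detectors;
  private final List<Diagnoser> diagnosers;
  private final Map<String, Resolver> resolvers;  // diagnosis -> action

  DhalionStylePolicy(List<Detector> detectors, List<Diagnoser> diagnosers,
                     Map<String, Resolver> resolvers) {
    this.detectors = detectors;
    this.diagnosers = diagnosers;
    this.resolvers = resolvers;
  }

  /** One iteration: metrics -> symptoms -> diagnosis -> resolver invocation. */
  void step(Map<String, Double> metrics) {
    List<String> symptoms = new ArrayList<>();
    for (Detector d : detectors) {
      symptoms.addAll(d.detect(metrics));
    }
    for (Diagnoser d : diagnosers) {
      String diagnosis = d.diagnose(symptoms);
      if (diagnosis != null && resolvers.containsKey(diagnosis)) {
        resolvers.get(diagnosis).resolve(diagnosis);
        return;  // act on one diagnosis per cycle, then re-observe
      }
    }
  }

  public static void main(String[] args) {
    // Hypothetical wiring: a backpressure detector, an under-provisioning
    // diagnoser, and a scale-up resolver.
    Detector backpressure = m ->
        m.getOrDefault("backpressure_secs", 0.0) > 0 ? List.of("BACKPRESSURE")
                                                     : List.<String>of();
    Diagnoser underProvisioned = s -> s.contains("BACKPRESSURE") ? "SCALE_UP" : null;
    Resolver scaleUp = d -> System.out.println("resolver: increase bolt parallelism");

    DhalionStylePolicy policy = new DhalionStylePolicy(
        List.of(backpressure), List.of(underProvisioned), Map.of("SCALE_UP", scaleUp));
    policy.step(Map.of("backpressure_secs", 12.0));  // triggers the scale-up resolver
  }
}
```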

Initial Results
Dhalion is able to adjust the topology resources on the fly when workload spikes occur. Our policy eventually reaches a healthy state where backpressure is not observed and the overall throughput is maximized.

Future Plans
- Use Dhalion to enforce throughput and latency SLOs and to auto-tune Heron topologies
- Open-source Dhalion and the auto-scaling policy as part of Heron
- Combine scaling with stateful stream processing

Get Involved
- http://github.com/twitter/heron
- http://heronstreaming.io
- @heronstreaming

Up Next: Anomaly detection in real-time data streams using Heron
Arun Kejariwal, Machine Zone
Karthik Ramasamy, Twitter

Questions?