From Rivulets to Rivers: Elastic Stream Processing in Heron

Presentation transcript:

From Rivulets to Rivers: Elastic Stream Processing in Heron
Bill Graham, Twitter (@billgraham)
Ashvin Agrawal, Microsoft
Avrilia Floratou, Microsoft

"Prediction is very difficult, especially if it's about the future." (Niels Bohr)
"We cannot direct the wind, but we can adjust the sails." (Dolly Parton)

Outline
- Heron Overview
- Elastic Scaling Challenges
- Current Implementation
- Work in Progress: Auto-scaling

Heron: a realtime, distributed, fault-tolerant stream processing engine.

About Heron
- Developed by Twitter in 2014, open sourced in May 2016
- Storm API compatible
- Isolation at all levels: topology, container, task (process-based)
- At-least-once and at-most-once semantics
- Backpressure
- Low resource overhead (< 10%)

Logical Topology
(Diagram: the topology as a directed graph of components, with Spout 1 and Spout 2 emitting tuples into Bolts 1-5.)
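Because Heron is Storm API compatible, a logical topology like the one sketched above is declared with the familiar TopologyBuilder API. The snippet below is illustrative only: the Spout1/Spout2 and Bolt1-Bolt5 classes, the parallelism hints, the package names, and the exact edges and groupings are assumptions rather than the precise wiring shown in the talk's diagram.

```java
// Hypothetical declaration of the logical topology using the
// Storm-compatible API. Spout*/Bolt* are placeholder classes the
// topology author would supply; package names may differ in Heron's
// compatibility layer.
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class ExampleTopology {
  public static void main(String[] args) {
    TopologyBuilder builder = new TopologyBuilder();

    // Two sources of tuples.
    builder.setSpout("spout1", new Spout1(), 2);
    builder.setSpout("spout2", new Spout2(), 2);

    // Bolts subscribe to the spouts and to upstream bolts.
    builder.setBolt("bolt1", new Bolt1(), 4).shuffleGrouping("spout1");
    builder.setBolt("bolt2", new Bolt2(), 4).shuffleGrouping("spout1")
                                            .shuffleGrouping("spout2");
    builder.setBolt("bolt3", new Bolt3(), 4).shuffleGrouping("spout2");
    builder.setBolt("bolt4", new Bolt4(), 2).fieldsGrouping("bolt1", new Fields("key"));
    builder.setBolt("bolt5", new Bolt5(), 2).shuffleGrouping("bolt2")
                                            .shuffleGrouping("bolt3");

    // The resulting topology is then submitted with the heron CLI (heron submit ...).
  }
}
```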

Physical Execution
(Diagram: the same spouts and bolts as the logical topology, each expanded into multiple parallel instances running across containers.)

Packing Plan
How should instances be distributed onto containers? The mapping is produced by IPacking.pack().
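To make the packing problem concrete, here is a minimal, self-contained sketch of a round-robin packing: component instances are spread evenly across a fixed number of containers. This is not Heron's IPacking interface itself (only the pack() entry point is named in the talk); real packing algorithms also account for per-instance CPU, RAM, and disk requirements.

```java
// Illustrative only: a round-robin "packing plan" that spreads component
// instances across containers. Real Heron packings also consider the
// resources required by each instance.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RoundRobinPacking {

  /** parallelism: component name -> number of instances to run. */
  static Map<Integer, List<String>> pack(Map<String, Integer> parallelism,
                                         int numContainers) {
    Map<Integer, List<String>> plan = new TreeMap<>();
    for (int c = 0; c < numContainers; c++) {
      plan.put(c, new ArrayList<>());
    }
    int next = 0;
    for (Map.Entry<String, Integer> e : parallelism.entrySet()) {
      for (int i = 0; i < e.getValue(); i++) {
        // An instance id encodes its component and index, e.g. "bolt1[2]".
        plan.get(next % numContainers).add(e.getKey() + "[" + i + "]");
        next++;
      }
    }
    return plan;
  }

  public static void main(String[] args) {
    Map<String, Integer> parallelism = new TreeMap<>();
    parallelism.put("spout1", 2);
    parallelism.put("bolt1", 4);
    parallelism.put("bolt2", 3);
    // Prints {0=[bolt1[0], bolt1[3], bolt2[2]], 1=[bolt1[1], bolt2[0], spout1[0]], ...}
    System.out.println(pack(parallelism, 3));
  }
}
```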

Topology Submission
(Diagram: "heron submit" invokes the Heron Client, which builds a PackingPlan and hands it to the Heron Scheduler. Containers are allocated and processes initialized; the instances (spouts and bolts) register with the Stream Manager in their container, the Stream Managers register with the Topology Master in container 0, and data begins to flow.)

Data Rate Variations

Parallelism Challenges
- Anticipating component parallelism is difficult
- Changing parallelism is costly, O(hour): code change, review, merge, build, kill, submit
- Tuning for load spikes or valleys is manual, O(day)
- Under-provisioning leads to backpressure, which drives up support costs
- Over-provisioning is the norm

Over-provisioning
(Chart comparing CPU requested with CPU used; only a fraction of the requested CPU, on the order of 25-40%, is actually used.)

Elastic Scaling Opportunity
- Reduce administration cost
- Reduce support cost
- Reduce hardware cost
- Provide better SLAs

Ordinary Topology Management Process
(Diagram of user tasks vs. Heron system tasks: to rescale, the user monitors and estimates load, kills the running topology (releasing its resources), and submits it again; Heron then creates a new packing, acquires resources, installs and starts the topology, and rebuilds state. Several of these are time-consuming tasks.)

Low-cost Topology "update"
(Diagram: per-component instance counts being adjusted in place by an update, without killing the topology.)

Optimized Topology Scale-up Process
(Diagram of user tasks vs. Heron system tasks: the optimized path replaces kill-and-resubmit with an update. The user monitors and estimates load and issues an update; Heron creates a new packing, pauses the topology, adds or reduces resources, prepares the affected components, and un-pauses the topology, avoiding the expensive kill, submit, acquire, install, start, and build-state steps.)

heron “update” …

$ heron update my_cluster/user/dev MyTopology \
    --component-parallelism=bolt1:20 \
    --component-parallelism=bolt2:40

- Available in 0.14.5
- Aims to maintain a uniform component distribution
- Execution time: O(mins)
- Aggressively prunes containers
- Minimizes disruption
- Customizable through IRepacking.repack()

Current Limitations
- Automated state transition is not yet supported
- Component scaling event notification: IUpdatable.update()
  Example: a KafkaSpout's queue partition mappings, or fields-grouping routing, might change when parallelism changes (see the sketch below).
  Workaround: pause the topology for longer than the cache flush interval before scaling.
- Algorithmic auto-scaling: modifying an existing packing plan can be more complex than creating one from scratch
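The scaling-notification limitation is easiest to see with a partition-owning spout. The sketch below models the idea behind an IUpdatable-style callback: when parallelism changes, each instance recomputes which Kafka partitions it owns. The method name, fields, and modulo assignment are assumptions for illustration, not Heron's or KafkaSpout's actual API.

```java
// Hypothetical spout showing why a scaling event must be propagated to
// components: each instance re-derives its share of Kafka partitions from
// its task index and the new parallelism. Names are illustrative only.
import java.util.ArrayList;
import java.util.List;

public class PartitionAwareSpout {
  private int taskIndex;    // this instance's index within the component
  private int totalTasks;   // current component parallelism
  private final List<Integer> owned = new ArrayList<>();

  /** Conceptually what an IUpdatable-style update() hook would trigger. */
  public void onScalingEvent(int newTaskIndex, int newTotalTasks, int numPartitions) {
    this.taskIndex = newTaskIndex;
    this.totalTasks = newTotalTasks;
    owned.clear();
    // Simple modulo re-assignment: partition p is owned by task (p % totalTasks).
    for (int p = 0; p < numPartitions; p++) {
      if (p % totalTasks == taskIndex) {
        owned.add(p);
      }
    }
  }

  public List<Integer> ownedPartitions() {
    return owned;
  }

  public static void main(String[] args) {
    PartitionAwareSpout spout = new PartitionAwareSpout();
    spout.onScalingEvent(1, 4, 12);              // task 1 of 4, 12 partitions
    System.out.println(spout.ownedPartitions()); // [1, 5, 9]
  }
}
```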

Algorithmic Auto-Scaling …
(Diagram: the same steps as the optimized scale-up process, but the monitor/estimate and update-topology steps move from the user to the Heron system, so scaling decisions are made automatically.)

Auto-Scaling
Heron should automatically identify variations in the incoming load and react to them. To adjust to such external shocks, Heron uses Dhalion, a framework that provides self-regulating capabilities to Heron and will be open-sourced in the near future. Dhalion periodically observes the state of the topology and determines whether resources should be scaled up or down.

Using Dhalion to Auto-Scale
Dhalion scales the topology's resources up and down as needed, while keeping the topology in a steady state in which backpressure is not observed.
(Diagram: the policy runs in three stages. Symptom detection: the Pending Packets Detector, Backpressure Detector, and Processing Rate Skew Detector turn metrics into symptoms. Diagnosis generation: the Resource Overprovisioning, Resource Underprovisioning, Data Skew, and Slow Instances Diagnosers turn symptoms into a diagnosis. Resolution: the matching resolver (Bolt Scale Down, Bolt Scale Up, Data Skew, or Restart Instances) is invoked.)
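To show how those three stages fit together, here is a hypothetical model of the control loop: detectors turn metrics into symptoms, diagnosers turn symptoms into a diagnosis, and a resolver acts on it. The interfaces, names, and metric keys below are assumptions for illustration; they are not the actual Dhalion API, which had not been open-sourced at the time of the talk.

```java
// Hypothetical Dhalion-style policy loop: observe -> detect symptoms ->
// diagnose -> resolve. All names here are illustrative, not Dhalion's API.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DhalionStylePolicy {
  interface Detector  { List<String> detect(Map<String, Double> metrics); }
  interface Diagnoser { String diagnose(List<String> symptoms); }  // null = no match
  interface Resolver  { void resolve(String diagnosis); }

  private final List<Detector> detectors;
  private final List<Diagnoser> diagnosers;
  private final Map<String, Resolver> resolvers;  // diagnosis -> action

  DhalionStylePolicy(List<Detector> detectors, List<Diagnoser> diagnosers,
                     Map<String, Resolver> resolvers) {
    this.detectors = detectors;
    this.diagnosers = diagnosers;
    this.resolvers = resolvers;
  }

  /** One iteration: metrics -> symptoms -> diagnosis -> resolver invocation. */
  void step(Map<String, Double> metrics) {
    List<String> symptoms = new ArrayList<>();
    for (Detector d : detectors) {
      symptoms.addAll(d.detect(metrics));
    }
    for (Diagnoser d : diagnosers) {
      String diagnosis = d.diagnose(symptoms);
      if (diagnosis != null && resolvers.containsKey(diagnosis)) {
        resolvers.get(diagnosis).resolve(diagnosis);
        return;  // act on one diagnosis per cycle, then re-observe
      }
    }
  }

  public static void main(String[] args) {
    // Hypothetical wiring: a backpressure detector, an under-provisioning
    // diagnoser, and a scale-up resolver.
    Detector backpressure = m ->
        m.getOrDefault("backpressure_secs", 0.0) > 0 ? List.of("BACKPRESSURE")
                                                     : List.<String>of();
    Diagnoser underProvisioned = s -> s.contains("BACKPRESSURE") ? "SCALE_UP" : null;
    Resolver scaleUp = d -> System.out.println("resolver: increase bolt parallelism");

    DhalionStylePolicy policy = new DhalionStylePolicy(
        List.of(backpressure), List.of(underProvisioned), Map.of("SCALE_UP", scaleUp));
    policy.step(Map.of("backpressure_secs", 12.0));  // triggers the scale-up resolver
  }
}
```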

Initial Results
Dhalion is able to adjust the topology resources on the fly when workload spikes occur. Our policy eventually reaches a healthy state where backpressure is not observed and the overall throughput is maximized.

Future Plans
- Use Dhalion to enforce throughput and latency SLOs and to auto-tune Heron topologies
- Open-source Dhalion and the auto-scaling policy as part of Heron
- Combine scaling with stateful stream processing

Get Involved
- http://github.com/twitter/heron
- http://heronstreaming.io
- @heronstreaming

Up Next: Anomaly detection in real-time data streams using Heron
Arun Kejariwal, Machine Zone
Karthik Ramasamy, Twitter

Questions?