From Rivulets to Rivers: Elastic Stream Processing in Heron
Bill Graham, Twitter
Ashvin Agrawal, Microsoft
Avrilia Floratou, Microsoft

Prediction is very difficult, especially if it’s about the future. (Niels Bohr)
We cannot direct the wind, but we can adjust the sails. (Dolly Parton)

Outline
Heron Overview
Elastic Scaling Challenges
Current Implementation
Work in Progress – Auto-scaling

Heron
A realtime, distributed, fault-tolerant stream processing engine.

About Heron
Developed by Twitter in 2014
Open sourced in May 2016
Storm API compatible
Isolation at all levels: topology, container, task (process-based)
At-least-once and at-most-once semantics
Backpressure
Low resource overhead (< 10%)

Logical Topology
[Diagram: Spout 1, Spout 2, Bolt 1, Bolt 2, Bolt 3, Bolt 4, Bolt 5]

Physical Execution
[Diagram: Spout 1, Spout 2, Bolt 1, Bolt 2, Bolt 3, Bolt 4, Bolt 5]

Packing Plan
How to distribute instances onto containers?
IPacking.pack()
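As a rough illustration of the question a packing algorithm answers, the sketch below spreads component instances over containers in round-robin order. It is written in Python for brevity and does not use Heron's Java IPacking interface; the function name `round_robin_pack` and the data shapes are assumptions made for this example. Containers are numbered from 1 because the deck shows container 0 hosting the Topology Master.

```python
def round_robin_pack(parallelism, num_containers):
    """Assign instances to containers 1..num_containers in round-robin order.

    parallelism: dict mapping component name -> number of instances.
    Returns a dict mapping container id -> list of (component, index).
    Hypothetical helper, not part of Heron.
    """
    if num_containers < 1:
        raise ValueError("need at least one container")
    plan = {cid: [] for cid in range(1, num_containers + 1)}
    slot = 0
    for component, count in parallelism.items():
        for index in range(count):
            cid = slot % num_containers + 1  # cycle through containers
            plan[cid].append((component, index))
            slot += 1
    return plan


plan = round_robin_pack({"spout1": 2, "bolt1": 4}, num_containers=3)
for cid, instances in plan.items():
    print(cid, instances)
```

Round-robin ignores per-instance resource needs; a real packing algorithm would also account for CPU and memory limits per container.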

Topology Submission
Steps shown: heron submit, Containers Allocated, Processes Initialize, Instances Register, Stream Manager Registers, Data Flows
[Diagram: Heron Client, PackingPlan, Heron Scheduler, Container 0 (Topology Master), and three further containers, each with a Stream Manager and spout/bolt instances]

Data Rate Variations

Parallelism Challenges
Anticipating component parallelism is difficult
Changing parallelism is costly - O(hour)
  code change, review, merge, build, kill, submit
Tuning for load spikes or valleys is manual - O(day)
Under-provisioning leads to back pressure, which leads to support costs
Over-provisioning is the norm

Over-provisioning
[Chart: CPU Requested vs. CPU Used; labels shown: 40%, 25%]

Elastic Scaling Opportunity
Reduce administration cost
Reduce support cost
Reduce hardware cost
Provide better SLA

Ordinary Topology Management Process
[Diagram. Lanes: User Tasks, Heron System Tasks. Steps shown: Monitor / Estimate, Kill Topology, Releases Resources, Submit Topology, Create Packing, Acquire Resources, Install Topology, Build State, Start Topology. Some steps are marked as Time Consuming Tasks.]

Low-cost Topology “update”

Optimized Topology Scale-up Process
[Diagram. Lanes: User Tasks, Heron System Tasks. Steps shown: Monitor / Estimate, Kill Topology, Submit Topology, Create Packing, Acquire Resources, Install Topology, Build State, Start Topology, Update Topology, Pause Topology, Add / Reduce Resources, Prepare Components, Un-Pause Topology]

heron “update” …
  $ heron update my_cluster/user/dev MyTopology \
      --component-parallelism=bolt1:20 \
      --component-parallelism=bolt2:40
Available in
Aims to Maintain Uniform Component Distribution
Execution Time O(mins)
Aggressively Prunes Containers
Minimizes Disruption
Customizable Through IRepacking.repack()
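The goals listed for the update (uniform distribution, pruning containers, minimal disruption) can be illustrated with a small repacking sketch. This is not Heron's IRepacking implementation: the function `repack`, the `max_per_container` limit, and the plan layout are all assumptions chosen for the example. Existing instances stay where they are, new instances go to the least loaded container, removals come from the most loaded container, and containers left empty are dropped.

```python
def repack(plan, changes, max_per_container=4):
    """Return a new packing plan after changing component parallelism.

    plan: dict container id -> list of component names (one per instance).
    changes: dict component name -> new parallelism.
    Hypothetical sketch; names and limits are not from Heron.
    """
    new_plan = {cid: list(instances) for cid, instances in plan.items()}
    for component, target in changes.items():
        current = sum(inst.count(component) for inst in new_plan.values())
        # Scale down: take instances from the most loaded container first.
        while current > target:
            holders = [c for c, inst in new_plan.items() if component in inst]
            cid = max(holders, key=lambda c: (len(new_plan[c]), c))
            new_plan[cid].remove(component)
            current -= 1
        # Scale up: fill the least loaded container, adding one when full.
        while current < target:
            cid = min(new_plan, key=lambda c: (len(new_plan[c]), c), default=None)
            if cid is None or len(new_plan[cid]) >= max_per_container:
                cid = max(new_plan, default=0) + 1
                new_plan[cid] = []
            new_plan[cid].append(component)
            current += 1
    # Prune containers that no longer host any instance.
    return {cid: inst for cid, inst in new_plan.items() if inst}


old = {1: ["spout1", "bolt1"], 2: ["spout1", "bolt1"]}
print(repack(old, {"bolt1": 4}, max_per_container=3))
```

Because the existing plan is modified rather than rebuilt, instances that are not affected by the change keep their container, which is what limits disruption in this sketch.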

Current Limitations
Automated state transition not yet supported
  Component scaling event notification: IUpdatable.update()
  Example: KafkaSpout queue partition mappings
Fields group routing might change
  Workaround: pause topology > cache flush interval before scaling
Algorithmic Auto-Scaling
  Modifying an existing packing plan can be more complex than creating one from scratch
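Why fields group routing might change can be seen with a small sketch. It assumes keys are routed by a stable hash taken modulo the downstream parallelism, which is a common scheme used here for illustration and not a statement of Heron's exact implementation. The function `task_for` and the sample keys are hypothetical.

```python
import zlib


def task_for(key, parallelism):
    """Pick a downstream task index for a key (assumed hash-modulo routing)."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


keys = [f"user-{i}" for i in range(1000)]
before = {k: task_for(k, 20) for k in keys}  # parallelism before the update
after = {k: task_for(k, 40) for k in keys}   # parallelism after the update
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys now route to a different task")
```

Any bolt that caches per-key state would see some keys arrive at a different instance after scaling, which is why the workaround pauses the topology for longer than the cache flush interval first.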

Algorithmic Auto-Scaling …
[Diagram. Lanes: User Tasks, Heron System Tasks. Steps shown: Monitor / Estimate, Submit Topology, Create Packing, Acquire Resources, Install Topology, Build State, Start Topology, Update Topology, Pause Topology, Add / Reduce Resources, Prepare Components, Un-Pause Topology]

Auto-Scaling
Heron uses Dhalion to adjust to external shocks.
Dhalion is a framework that provides self-regulating capabilities to Heron and will be open-sourced in the near future.
Dhalion periodically observes the state of the topology and determines whether resources should be scaled up or down.
Heron should automatically identify variations in the incoming load and react to them.

Using Dhalion to Auto-Scale
Dhalion scales the topology resources up and down as needed while keeping the topology in a steady state where backpressure is not observed.
[Diagram: Metrics feed three phases]
Symptom Detection (produces Symptoms): Pending Packets Detector, Backpressure Detector, Processing Rate Skew Detector
Diagnosis Generation (produces Diagnosis): Resource Overprovisioning Diagnoser, Resource Underprovisioning Diagnoser, Data Skew Diagnoser, Slow Instances Diagnoser
Resolution (Resolver Invocation): Bolt Scale Down Resolver, Bolt Scale Up Resolver, Data Skew Resolver, Restart Instances Resolver
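The detector, diagnoser, and resolver stages named on this slide can be sketched as a three-step pipeline. The code below is only a toy model of that flow and does not use Dhalion's API: every function name, metric name, threshold, and scaling factor is an assumption made for illustration.

```python
import math

LOW_PENDING = 10  # assumed threshold for "almost no queued packets"


def detect_symptoms(metrics):
    """Symptom detection: turn raw metrics into a set of symptom names."""
    symptoms = set()
    if metrics["backpressure_seconds"] > 0:
        symptoms.add("backpressure")
    if metrics["pending_packets"] < LOW_PENDING:
        symptoms.add("low_pending_packets")
    return symptoms


def diagnose(symptoms):
    """Diagnosis generation: map symptoms to at most one diagnosis."""
    if "backpressure" in symptoms:
        return "resource_underprovisioning"
    if "low_pending_packets" in symptoms:
        return "resource_overprovisioning"
    return None


def resolve(diagnosis, parallelism):
    """Resolution: return the new bolt parallelism for a diagnosis."""
    if diagnosis == "resource_underprovisioning":
        return math.ceil(parallelism * 1.5)  # assumed scale-up factor
    if diagnosis == "resource_overprovisioning":
        return max(1, parallelism - 1)       # assumed scale-down step
    return parallelism


def run_policy_once(metrics, parallelism):
    """One observation cycle: detect, diagnose, resolve."""
    return resolve(diagnose(detect_symptoms(metrics)), parallelism)


print(run_policy_once({"backpressure_seconds": 12, "pending_packets": 500}, 4))
```

Running such a cycle periodically matches the slide's description of a policy that keeps adjusting resources until backpressure is no longer observed.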

Initial Results
Dhalion is able to adjust the topology resources on the fly when workload spikes occur.
Our policy eventually reaches a healthy state where backpressure is not observed and the overall throughput is maximized.

Future Plans
Use Dhalion to enforce throughput and latency SLOs and to auto-tune Heron topologies.
Open-source Dhalion and the auto-scaling policy as part of Heron.
Combine scaling with stateful stream processing.

Get Involved

Up Next
Anomaly detection in real-time data streams using Heron
Arun Kejariwal, Machine Zone
Karthik Ramasamy, Twitter

Questions?
