1
Stela: Enabling Stream Processing Systems to Scale-in and Scale-out On-demand
Boyang Peng, Le Xu, Indranil Gupta
University of Illinois at Urbana-Champaign
Distributed Protocol Research Group (DPRG)
2
Contributions
First work to describe and implement on-demand elasticity within Storm
Development of a novel metric, Expected Throughput Percentage (ETP)
Evaluation of our system on micro-benchmark applications as well as on applications used in production by Yahoo!
3
Data Processing Model
Directed acyclic graph (DAG) of operators
Operators are stateless
An instance (of an operator) is an instantiation of the operator’s processing logic and the physical entity that executes the operator’s logic
4
Expected Throughput Percentage
The impact (as a percentage) that each operator has on the application's throughput
Component 1 has ETP = 0
Component 3 has ETP = 1000/4500 = 2/9, about 22% (note that component 6 is congested)
5
Find ETP for each operator
function FINDETP(o)
  if o.child = null then
    return ProcessingRateMap.get(o) // o is a sink
  end if
  SubtreeSum ← 0
  for each descendant child ∈ o do
    if child.congested = true then
      continue // if the child is congested, give up the subtree rooted at that child
    else
      SubtreeSum += FINDETP(child)
    end if
  end for
  return SubtreeSum
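The FINDETP pseudocode can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Operator` class, the `find_etp` name, and the example topology are hypothetical, and the final division by total sink throughput is an assumption inferred from the 1000/4500 example on the previous slide.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    # Hypothetical stand-in for an operator in the topology DAG.
    name: str
    congested: bool = False
    children: list = field(default_factory=list)


def find_etp(op, processing_rate):
    """Sum the sink throughput reachable from op without crossing a congested child."""
    if not op.children:
        # op is a sink: its own processing rate counts.
        return processing_rate[op.name]
    subtree_sum = 0.0
    for child in op.children:
        if child.congested:
            # Give up the subtree rooted at a congested child.
            continue
        subtree_sum += find_etp(child, processing_rate)
    return subtree_sum


# Hypothetical topology (not the figure from the slides):
#   source -> a -> sink_small (1000 tuples/s)
#   source -> b (congested) -> sink_big (3500 tuples/s)
sink_small = Operator("sink_small")
sink_big = Operator("sink_big")
a = Operator("a", children=[sink_small])
b = Operator("b", congested=True, children=[sink_big])
source = Operator("source", children=[a, b])

rates = {"sink_small": 1000.0, "sink_big": 3500.0}
total = sum(rates.values())

etp_a = find_etp(a, rates) / total            # 1000 / 4500
etp_source = find_etp(source, rates) / total  # only the branch through a counts
```

In this toy topology, `a` and `source` both end up with an ETP of 1000/4500, because the branch through the congested operator `b` is skipped.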
6
Stela Scale-out
Compute N: # of instances being added on new machines
N = # of new machines * current instance count / current machine count
For each instance slot:
  Pick component C with highest ETP
  Update C’s execution rate assuming new instance assigned
  Update all components’ ETPs
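The scale-out steps can be sketched as a greedy loop. This is an illustration under stated assumptions: the function names are hypothetical, floor division is an assumed rounding rule (the slide does not specify one), and the ETP update is passed in as a callback because the slide does not give the update formula. The `halve_target` callback is a toy stand-in, not Stela's actual projection.

```python
def instances_to_add(new_machines, current_instances, current_machines):
    # N = # of new machines * current instance count / current machine count.
    # Floor division is an assumption; the slide does not say how to round.
    return new_machines * current_instances // current_machines


def plan_scale_out(n_slots, etp, recompute_etps):
    """Pick, for each new instance slot, the component with the highest ETP."""
    etp = dict(etp)  # work on a copy so the caller's map is untouched
    chosen = []
    for _ in range(n_slots):
        target = max(etp, key=etp.get)
        chosen.append(target)
        # Project the effect of the new instance and refresh every ETP.
        etp = recompute_etps(etp, target)
    return chosen


def halve_target(etp, target):
    # Toy update rule for illustration only: the chosen component's ETP halves.
    updated = dict(etp)
    updated[target] = etp[target] / 2
    return updated


n = instances_to_add(new_machines=1, current_instances=12, current_machines=4)
plan = plan_scale_out(n, {"A": 0.6, "B": 0.5, "C": 0.1}, halve_target)
```

With these toy numbers, `n` is 3 and the plan assigns the slots to A, then B, then A again, since A's projected ETP (0.3) overtakes B's (0.25) after B receives an instance.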
7
Stela Scale-in
Find ETPSum for all machines
Remove the machine with the lowest ETPSum
Round-robin schedule the instances from the removed machine onto the remaining machines, starting with the machine with the lowest ETPSum
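A sketch of the scale-in steps follows. It assumes that a machine's ETPSum is the sum of the ETPs of the operators whose instances it hosts; the slide does not spell this out. The `plan_scale_in` name and the placement data are hypothetical.

```python
def plan_scale_in(placement, etp):
    """Choose the machine to remove and reassign its instances round-robin.

    placement maps machine name -> list of operator names, one per hosted instance.
    etp maps operator name -> ETP. At least two machines are required.
    """
    etp_sum = {m: sum(etp[op] for op in ops) for m, ops in placement.items()}
    victim = min(etp_sum, key=etp_sum.get)
    # Remaining machines, ordered from lowest to highest ETPSum.
    survivors = sorted((m for m in placement if m != victim), key=etp_sum.get)
    new_placement = {m: list(placement[m]) for m in survivors}
    for i, op in enumerate(placement[victim]):
        new_placement[survivors[i % len(survivors)]].append(op)
    return victim, new_placement


etp = {"A": 0.5, "B": 0.3, "C": 0.1}
placement = {
    "m1": ["A", "B"],       # ETPSum 0.8
    "m2": ["C", "C", "C"],  # ETPSum about 0.3, the lowest
    "m3": ["A", "C"],       # ETPSum 0.6
}
victim, new_placement = plan_scale_in(placement, etp)
```

Here `m2` is removed, and its three instances go to `m3`, `m1`, and `m3` in that order, because `m3` has the lowest ETPSum among the remaining machines.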
8
Overview of Storm
Nimbus node (master node, similar to the Hadoop JobTracker):
  Uploads computations for execution
  Distributes code across the cluster
  Launches workers across the cluster
  Monitors computation and reallocates workers as needed
ZooKeeper nodes: coordinate the Storm cluster
Supervisor nodes: communicate with Nimbus through ZooKeeper; start and stop workers according to signals from Nimbus
9
Overview of Storm
Five key abstractions help to understand how Storm processes data:
Tuples: an ordered list of elements. For example, a “4-tuple” might be (7, 1, 3, 7)
Streams: an unbounded sequence of tuples
Spouts: sources of streams in a computation (e.g. a Twitter API)
Bolts: process input streams and produce output streams. They can run functions; filter, aggregate, or join data; or talk to databases
Topologies: the overall calculation, represented visually as a network of spouts and bolts
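To make the five abstractions concrete, here is a toy model in plain Python generators. It is not Storm's API (Storm topologies are normally written against its Java interfaces); every name below is hypothetical, and the stream is bounded only so the example terminates.

```python
def number_spout(limit):
    # Spout: the source of a stream. Emits 2-tuples (n, n squared).
    for n in range(limit):
        yield (n, n * n)


def even_filter_bolt(stream):
    # Bolt: filters the input stream, passing on tuples whose first field is even.
    for tup in stream:
        if tup[0] % 2 == 0:
            yield tup


def running_sum_bolt(stream):
    # Bolt: aggregates the input stream, emitting a 1-tuple with the running total.
    total = 0
    for _, square in stream:
        total += square
        yield (total,)


def run_topology(limit):
    # Topology: the spout and bolts wired together into a pipeline.
    return list(running_sum_bolt(even_filter_bolt(number_spout(limit))))


result = run_topology(5)  # [(0,), (4,), (20,)]
```

The spout emits tuples for 0 through 4, the first bolt keeps 0, 2, and 4, and the second bolt emits the running sum of their squares.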
10
Implementation
11
Evaluation
Experimental Setup (Emulab)
Ubuntu 12.04
100 Mbps VLAN connecting all machines
PC 3000: 3 GHz dual-core processor, 2 GB of memory, 10,000 RPM 146 GB SCSI disks
D710: 2.4 GHz 64-bit quad-core processor, 12 GB of memory, 750 GB SATA disks
12
Micro-benchmark Experiments
13
Micro-benchmark Experiments
14
Micro-benchmark Experiments
Stela 65% better
Stela 65% better
Stela 45% better
Stela 120% better
15
Yahoo Topologies
16
Yahoo Topologies
17
Convergence Time
18
Scale-in Experiments
Stela achieves 87.5% and 75% less downtime
19
Scale-in Experiments
Yahoo PageLoad Topology
20
Summary
For scale-out, Stela achieves throughput that is % higher than Storm’s and reduces interruption to 12.5%
For scale-in, Stela performs % better
21
Questions?