Boyang Peng, Le Xu, Indranil Gupta

Stela: Enabling Stream Processing Systems to Scale-in and Scale-out On-demand
Boyang Peng, Le Xu, Indranil Gupta
University of Illinois at Urbana-Champaign
Distributed Protocol Research Group (DPRG)

Contributions
First work to describe and implement on-demand elasticity within Storm.
Development of a novel metric, ETP (Expected Throughput Percentage).
Evaluation of our system on micro-benchmark applications as well as on applications used in production by Yahoo!

Data Processing Model
An application is a directed acyclic graph (DAG) of operators; operators are stateless.
An instance (of an operator) is an instantiation of the operator's processing logic and is the physical entity that executes the operator's logic.

Expected Throughput Percentage
ETP is the impact (as a percentage) that each operator has on the application throughput. In the slide's example, Component 1 has ETP = 0 and Component 3 has ETP = 1000/4500 = 2/9 (note that Component 6 is congested).
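One way to write the ETP example out, assuming (as the FINDETP pseudocode on the next slide suggests) that an operator's ETP is the summed processing rate of the sinks it reaches without passing through a congested child, divided by the total processing rate of all sinks. The symbols $S$ (the set of sinks) and $r_s$ (the processing rate of sink $s$) are introduced here for illustration and do not appear on the slides:

$$
\mathrm{ETP}(o) = \frac{\mathrm{FINDETP}(o)}{\sum_{s \in S} r_s},
\qquad
\mathrm{ETP}(3) = \frac{1000}{4500} = \frac{2}{9} \approx 0.222
$$

Under this reading, 1000 is the rate of the sinks reachable from Component 3 over uncongested paths and 4500 is the application's total sink throughput.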

Find ETP for each operator
FINDETP(o):
    if o.child = null then                    // o is a sink
        return ProcessingRateMap.get(o)
    end if
    SubtreeSum ← 0
    for each child ∈ o.children do
        if child.congested = true then
            continue                          // the child is congested: give up the subtree rooted at that child
        else
            SubtreeSum += FINDETP(child)
        end if
    end for
    return SubtreeSum
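A minimal Python sketch of the FINDETP recursion, assuming the topology is given as a hypothetical `children` adjacency dict, sink processing rates as a `processing_rate` dict, and congested operators as a set. The helper `etp` (also hypothetical) divides by the total sink throughput to turn the subtree sum into a percentage. The topology in the usage check is made up for illustration; it is not the slide's figure.

```python
from typing import Dict, List, Set


def find_etp(op: str,
             children: Dict[str, List[str]],
             processing_rate: Dict[str, float],
             congested: Set[str]) -> float:
    """Sum the processing rates of sinks reachable from `op`,
    skipping any subtree rooted at a congested child (FINDETP)."""
    kids = children.get(op, [])
    if not kids:  # op is a sink
        return processing_rate[op]
    subtree_sum = 0.0
    for child in kids:
        if child in congested:
            continue  # give up the subtree rooted at a congested child
        subtree_sum += find_etp(child, children, processing_rate, congested)
    return subtree_sum


def etp(op: str,
        children: Dict[str, List[str]],
        processing_rate: Dict[str, float],
        congested: Set[str]) -> float:
    """Hypothetical helper: FINDETP(op) divided by the total sink throughput."""
    sinks = [o for o in processing_rate if not children.get(o)]
    total = sum(processing_rate[s] for s in sinks)
    return find_etp(op, children, processing_rate, congested) / total
```

Like the slide's pseudocode, a sink reachable from an operator along two uncongested paths is counted once per path.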

Stela Scale-out
Compute N, the number of instances to add on the new machines:
N = (# of new machines × current instance count) / current machine count
For each of the N instance slots:
    pick the component C with the highest ETP;
    update C's execution rate assuming the new instance is assigned to it;
    update all components' ETPs.
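A minimal sketch of the scale-out loop, under several assumptions that are not stated on the slide: N is rounded down, a component's execution rate is assumed to grow linearly with its instance count, and the "update all components' ETPs" step is passed in as a hypothetical `recompute_etps` callback, because a faithful version depends on a congestion model the slides do not give. The names `instances_to_add` and `plan_scale_out` are illustrative.

```python
from typing import Callable, Dict, List


def instances_to_add(new_machines: int, instance_count: int, machine_count: int) -> int:
    """N = new machines * current instances / current machines (rounded down: an assumption)."""
    return new_machines * instance_count // machine_count


def plan_scale_out(etp: Dict[str, float],
                   instances: Dict[str, int],
                   exec_rate: Dict[str, float],
                   n_slots: int,
                   recompute_etps: Callable[[Dict[str, float]], Dict[str, float]]) -> List[str]:
    """Greedily give each new instance slot to the component with the highest ETP."""
    etp, instances, exec_rate = dict(etp), dict(instances), dict(exec_rate)
    plan: List[str] = []
    for _ in range(n_slots):
        c = max(etp, key=etp.get)  # component with the highest ETP
        # Assumption: execution rate scales linearly with instance count
        # (every component already has at least one instance).
        exec_rate[c] *= (instances[c] + 1) / instances[c]
        instances[c] += 1
        plan.append(c)
        etp = recompute_etps(exec_rate)  # update every component's ETP
    return plan
```

A real `recompute_etps` would presumably re-evaluate congestion from the projected rates and rerun FINDETP; any stand-in with the same signature can be plugged in to exercise the greedy loop.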

Stela Scale-in
Find the ETPSum of every machine.
Remove the machine with the lowest ETPSum.
Round-robin schedule the instances from the removed machine onto the remaining machines, starting with the machine with the lowest ETPSum.
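A minimal sketch of the scale-in steps, assuming a machine's ETPSum is the sum of the ETPs of the component instances it hosts, and that ties are broken by dictionary order. `plan_scale_in` and its `placement` (machine to list of hosted instances, named by component) and `etp` (component to ETP) inputs are hypothetical names for illustration.

```python
from typing import Dict, List, Tuple


def plan_scale_in(placement: Dict[str, List[str]],
                  etp: Dict[str, float]) -> Tuple[str, Dict[str, List[str]]]:
    """Choose a machine to remove and round-robin its instances onto the rest."""
    if len(placement) < 2:
        raise ValueError("need at least two machines to scale in")
    # Assumption: ETPSum(machine) = sum of the ETPs of the instances it hosts.
    etp_sum = {m: sum(etp[c] for c in insts) for m, insts in placement.items()}
    victim = min(etp_sum, key=etp_sum.get)  # lowest ETPSum is removed
    # Remaining machines, lowest ETPSum first: round robin starts here.
    targets = sorted((m for m in placement if m != victim), key=etp_sum.get)
    new_placement = {m: list(placement[m]) for m in targets}
    for i, inst in enumerate(placement[victim]):
        new_placement[targets[i % len(targets)]].append(inst)
    return victim, new_placement
```

Starting the round robin at the lowest-ETPSum machine means the least throughput-critical machines absorb the displaced instances first.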

Overview of Storm
Nimbus node (master node, similar to the Hadoop JobTracker): uploads computations for execution, distributes code across the cluster, launches workers across the cluster, and monitors computation and reallocates workers as needed.
ZooKeeper nodes: coordinate the Storm cluster.
Supervisor nodes: communicate with Nimbus through ZooKeeper, and start and stop workers according to signals from Nimbus.

Overview of Storm
Five key abstractions help to understand how Storm processes data:
Tuples: ordered lists of elements. For example, a "4-tuple" might be (7, 1, 3, 7).
Streams: unbounded sequences of tuples.
Spouts: sources of streams in a computation (e.g., the Twitter API).
Bolts: process input streams and produce output streams. They can run functions; filter, aggregate, or join data; or talk to databases.
Topologies: the overall calculation, represented visually as a network of spouts and bolts (as in the slide's diagram).
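To make the abstractions concrete, here is a toy, single-threaded Python model of a topology: a spout's tuples flow through a DAG of stateless bolts, and bolts with no downstream subscribers act as sinks. This is only an illustration of the concepts; it is not Storm's API (real Storm topologies are built with Storm's own builder classes), and `run_topology`, the `"spout"` source name, and the `edges` layout are hypothetical.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, Iterable, List, Tuple


def run_topology(spout_tuples: Iterable[Tuple],
                 bolts: Dict[str, Callable[[Tuple], List[Tuple]]],
                 edges: Dict[str, List[str]]) -> Dict[str, List[Tuple]]:
    """Push every spout tuple through the bolt DAG; collect what the sink bolts emit.

    edges maps a source ("spout" or a bolt name) to the bolts subscribed to its stream.
    """
    collected: Dict[str, List[Tuple]] = defaultdict(list)
    queue = deque(("spout", t) for t in spout_tuples)
    while queue:
        src, tup = queue.popleft()
        for dst in edges.get(src, []):
            for out in bolts[dst](tup):  # a bolt may emit zero or more tuples
                if edges.get(dst):
                    queue.append((dst, out))  # forward to downstream bolts
                else:
                    collected[dst].append(out)  # dst is a sink
    return dict(collected)
```

For example, a "split" bolt subscribed to the spout can break sentences into one-word tuples, and a downstream "long" bolt can filter out short words; "long" has no subscribers, so its output is collected as the topology's result.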

Implementation

Evaluation: Experimental Setup (Emulab)
Ubuntu 12.04; a 100 Mbps VLAN connects all machines.
PC 3000: 3 GHz dual-core processor, 2 GB of memory, 10,000 RPM 146 GB SCSI disks.
D710: 2.4 GHz 64-bit quad-core processor, 12 GB of memory, 750 GB SATA disks.

Micro-benchmark Experiments
Results, one per micro-benchmark topology shown on the slide: Stela 65% better; Stela 65% better; Stela 45% better; Stela 120% better.

Yahoo Topologies

Convergence Time

Scale-in Experiments
Stela achieves 87.5% and 75% less downtime.

Scale-in Experiments Yahoo PageLoad Topology

Summary
For scale-out, Stela achieves throughput that is % higher than Storm's and reduces interruption to 12.5%.
For scale-in, Stela performs % better.

Questions?
