Coflow: A Networking Abstraction for Cluster Applications. Mosharaf Chowdhury, Ion Stoica. UC Berkeley.
Cluster Applications. Multi-stage data flows: computation interleaved with communication. Computation: distributed, runs on many machines. Communication: structured, between machine groups. [Figure label: Driver]
A Flow: a sequence of packets; independent; often the unit for network scheduling, traffic engineering, load balancing, etc. Multiple Parallel Flows: independent, yet semantically bound. Shared objective: minimize completion time. Communication abstraction. [Figure label: Driver]
Coflow: a collection of flows between two groups of machines that are bound together by application-specific semantics. Captures: 1. Structure 2. Shared objective 3. Semantics
We Want To… Better schedule the network: intra-coflow and inter-coflow. Write the communication layer of a new application without reinventing the wheel. Add unsupported coflows to an application, or replace an existing coflow implementation, independent of applications.
Coflow API. [Figure: Cluster Applications; Coflow API; The Network (physically or logically centralized controller)]
Coflow API Goals: 1. Separate intent from mechanisms 2. Convey application-specific semantics to the network
Coflow API (MapReduce shuffle): create(SHUFFLE) → handle; put(handle, id, content); get(handle, id) → content; terminate(handle). [Figure: driver timeline with markers "Shuffle finishes" and "MapReduce job finishes"]
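The four calls above can be sketched as a toy, single-process store. This is only an illustration of the call sequence, not the authors' implementation: the class name `InMemoryCoflows`, the integer handles, and the data id `"map0-part1"` are all assumptions made for the example, and no scheduling or networking takes place.

```python
import itertools


class InMemoryCoflows:
    """Toy, single-process stand-in for a coflow service (hypothetical)."""

    def __init__(self):
        self._next_handle = itertools.count(1)
        self._coflows = {}  # handle -> {"pattern": str, "data": dict}

    def create(self, pattern):
        # Driver registers a new coflow and receives an opaque handle.
        handle = next(self._next_handle)
        self._coflows[handle] = {"pattern": pattern, "data": {}}
        return handle

    def put(self, handle, data_id, content):
        # Sender publishes one piece of data under an id.
        self._coflows[handle]["data"][data_id] = content

    def get(self, handle, data_id):
        # Receiver fetches a piece of data by id.
        return self._coflows[handle]["data"][data_id]

    def terminate(self, handle):
        # Driver ends the coflow; its data is released.
        del self._coflows[handle]


coflows = InMemoryCoflows()
shuffle = coflows.create("SHUFFLE")            # driver
coflows.put(shuffle, "map0-part1", b"rows")    # a mapper (sender)
fetched = coflows.get(shuffle, "map0-part1")   # a reducer (receiver)
coflows.terminate(shuffle)                     # driver, once the shuffle is done
```

The point of the sketch is the division of roles: only the driver creates and terminates, while senders and receivers refer to data by handle and id and never name each other directly.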
Coflow Flexibility. Choice of algorithms: default, WSS [1]. Choice of mechanism: app vs. network layer; pull vs. push. [Figure: shuffle between mappers and reducers] 1. Orchestra, SIGCOMM'2011
Coflow Flexibility. b ← create(BCAST) … put(b, id, content) … get(b, id) … [Figure: driver (JobTracker), broadcast, mappers, shuffle, reducers]
Coflow Flexibility. b ← create(BCAST); s ← create(SHUFFLE, ord=[b ~> s]); put(b, id, content) … terminate(b); get(b, id); put(s, id_s1) … [Figure: driver (JobTracker), broadcast, mappers, shuffle, reducers]
Throughput-Sensitive Applications. Minimize completion time. [Figure panel: after 2 seconds]
Throughput-Sensitive Applications. Minimize completion time. [Figure panels: after 2 seconds; after 4 seconds; after 7 seconds]
Throughput-Sensitive Applications. Minimize completion time. Free up resources without hurting application-perceived communication time. [Figure panels: after 2 seconds; after 7 seconds]
Latency-Sensitive Applications. [Figure: query "HotNets 2012"; top-level aggregator, mid-level aggregators, workers]
Latency-Sensitive Applications. Meet deadline [1, 2]. Limit impact to as few requests as possible. [Figure: query "HotNets 2012" with top-level aggregator, mid-level aggregators, and workers; the search results shown include the HotNets-XI home page and call for papers, and the entry "Coflow accepted at HotNets'" (Sep 13, 2012)] 1. D3, SIGCOMM' 2. PDQ, SIGCOMM'2012
One More Thing… 1. Critical Path Scheduling 2. OpenTCP 3. Structured Streams 4. …
Coflow. Mosharaf Chowdhury, UC Berkeley. A semantically-bound collection of flows. Conveys application intent to the network. Allows better management of network resources. Provides greater flexibility in designing applications.
Critical Path Scheduling. Communication of a cluster application is represented by a partially-ordered set of coflows. Network allocation takes place among these partially-ordered sets of coflows. [Figure: graph of coflows with nodes labeled S, B, and A]
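One way to read "critical path" over a partially-ordered set of coflows is the classic longest-path computation on a dependency graph. The sketch below assumes that reading; the slide does not spell out an algorithm. The coflow names, the per-coflow durations, and the `critical_path` helper are hypothetical, and a real scheduler would not know durations in advance.

```python
from graphlib import TopologicalSorter


def critical_path(durations, predecessors):
    """Return (length, path) of the longest chain of dependent coflows.

    `predecessors` maps every coflow to the coflows that must finish first.
    """
    finish = {}  # earliest possible finish time of each coflow
    via = {}     # predecessor that determines that finish time
    for node in TopologicalSorter(predecessors).static_order():
        preds = predecessors.get(node, ())
        prev = max(preds, key=lambda p: finish[p], default=None)
        via[node] = prev
        start = finish[prev] if prev is not None else 0.0
        finish[node] = start + durations[node]
    end = max(finish, key=finish.get)
    length = finish[end]
    path = []
    while end is not None:
        path.append(end)
        end = via[end]
    return length, path[::-1]


# Hypothetical example: a broadcast feeds two shuffles, which feed an aggregation.
durations = {"B": 2.0, "S1": 5.0, "S2": 3.0, "A": 1.0}
predecessors = {"B": (), "S1": ("B",), "S2": ("B",), "A": ("S1", "S2")}
length, path = critical_path(durations, predecessors)
```

Under these made-up numbers the chain B, S1, A determines when the application's communication can finish, so S2 has slack that an allocator could in principle give to other coflows.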
Coflow API
Operation                                    Caller
create(PATTERN, [opt]) → handle              Driver
put(handle, id, content, [opt]) → result     Sender
get(handle, id, [opt]) → content             Receiver
terminate(handle, [opt]) → result            Driver
Throughput-Sensitive Applications. Minimize completion time [1]. [Figure: data flow through a MapReduce framework, Map Stage then Reduce Stage, with markers "Local shuffle finishes", "Shuffle finishes", and "Job finishes"] 1. Orchestra, SIGCOMM'2011
Coflow Resource Allocation. 1. Weights [Across Apps]. Weighted sharing between the two shuffles: shuffle1 ← create(SHUFFLE, weight=1); shuffle2 ← create(SHUFFLE, weight=2) … [Figure: Job 1 with mappers, shuffle 1, reducers; Job 2 with mappers, shuffle 2, reducers]
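As a rough illustration of what weights 1 and 2 could mean, the sketch below splits a single shared capacity in proportion to the weights. The single-bottleneck model, the capacity figure, and the `weighted_shares` helper are assumptions for the example; the slide does not say how the network enforces the weights.

```python
def weighted_shares(capacity, weights):
    """Split `capacity` among coflows in proportion to their weights."""
    total = sum(weights.values())
    return {name: capacity * w / total for name, w in weights.items()}


# Hypothetical 900 Mbps bottleneck shared by the two shuffles.
shares = weighted_shares(900.0, {"shuffle1": 1, "shuffle2": 2})
```

With these inputs, shuffle2 receives twice the bandwidth of shuffle1 for as long as both are active.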
Coflow Resource Allocation. 2. Priorities [Across Apps]. Strict priorities: shuffle1 ← create(SHUFFLE, pri=3); shuffle2 ← create(SHUFFLE, pri=5) … [Figure: Job 1 with mappers, shuffle 1, reducers; Job 2 with mappers, shuffle 2, reducers]
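A minimal sketch of strict prioritization on one shared capacity follows. It assumes that a larger `pri` value means higher priority, which the slide does not state, and it invents the demand figures and the `strict_priority_shares` helper for illustration.

```python
def strict_priority_shares(capacity, demands, priorities):
    """Serve coflows in priority order; lower ones get only what is left.

    Assumption: a larger `pri` value means higher priority.
    """
    remaining = capacity
    shares = {}
    for name in sorted(demands, key=lambda n: priorities[n], reverse=True):
        shares[name] = min(demands[name], remaining)
        remaining -= shares[name]
    return shares


shares = strict_priority_shares(
    capacity=10.0,
    demands={"shuffle1": 8.0, "shuffle2": 6.0},   # hypothetical demands
    priorities={"shuffle1": 3, "shuffle2": 5},    # pri values from the slide
)
```

Unlike weighted sharing, the lower-priority shuffle here is squeezed to whatever the higher-priority one leaves unused.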
Coflow Resource Allocation. 3. Dependencies [Within Apps]. b ← create(BCAST); shuffle2 ← create(SHUFFLE, ord=[b ~> shuffle2]); agg ← create(AGGR, ord=[shuffle2 ~> agg]). Figure annotation: finishes_before. [Figure: Job 1 with mappers, shuffle 1, reducers; Job 2 with driver, broadcast (b), mappers, shuffle 2, reducers, aggregation (agg)]
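The `ord=[...]` constraints form a dependency graph, and a valid finish order can be derived with a topological sort. The sketch below assumes that `x ~> y` means "x finishes before y", following the finishes_before annotation; the pair-list representation is a choice made for this example, not part of the API shown.

```python
from graphlib import TopologicalSorter

# (earlier, later) pairs, read as "earlier finishes_before later".
ord_constraints = [("b", "shuffle2"), ("shuffle2", "agg")]

sorter = TopologicalSorter()
for earlier, later in ord_constraints:
    sorter.add(later, earlier)  # `later` depends on `earlier`

# Predecessors come first in the resulting order.
finish_order = list(sorter.static_order())
```

If the constraints were contradictory, for example `b ~> s` together with `s ~> b`, `static_order()` would raise `graphlib.CycleError`, which is one way a controller could reject an invalid ordering.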
Coflow Resource Allocation. Communication of a cluster application is represented by a partially-ordered set of coflows. Network allocation takes place among these partially-ordered sets of coflows. [Figure: graph of coflows with nodes labeled S, B, and A]