1
Managing Data Transfer in Computer Clusters with Orchestra
Mosharaf Chowdhury, Matei Zaharia, Justin Ma, Michael I. Jordan, Ion Stoica. Presented by Guanhua Wang, with some figures from Mosharaf's SIGCOMM talk.
2
Motivation: Data Transfer Is Expensive
Facebook: shuffle accounts for 33% of job running time. Twitter (spam classifier): broadcast and shuffle account for 42% of the iteration time.
3
Motivation: Data Transfer Limits Scalability
Collaborative filtering job (figure from the talk: scaling is limited by communication as nodes are added).
4
Problems: Flow-Level Scheduling Does Not Work Well
Lack of job-level semantics: no global optimization across the flows within a shuffle/broadcast stage, leading to tail-latency effects and straggler flows.
5
Flow -> Transfer: Maximize Job Performance by Optimizing at the Transfer Level
We define a transfer as the set of all flows transporting data between two stages of a job, e.g., shuffle and broadcast in frameworks like Dryad and Spark.
6
Orchestra Architecture
(Diagram.) An Inter-Transfer Controller (ITC) sits above all transfers and implements the cross-transfer policy (fair sharing, FIFO, or priority), providing weighted fair sharing between transfers. Below it, each shuffle gets a Shuffle Transfer Controller (TC), which assigns flow rates to optimize shuffle completion time (mechanisms: Hadoop shuffle, WSS), and each broadcast gets a Broadcast Transfer Controller (TC), which infers and utilizes topology information to reduce transmission overhead (mechanisms: HDFS, tree, Cornet).
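A minimal sketch of this two-level control structure, with hypothetical class and attribute names (Orchestra's actual interfaces are not shown in the talk): the ITC applies one cluster-wide policy across registered TCs, here weighted fair sharing realized as per-host TCP-flow counts.

```python
class TransferController:
    """Manages one transfer (a shuffle or a broadcast) and decides how
    that transfer's flows use whatever share the ITC grants it (a
    shuffle TC would run WSS, a broadcast TC would run Cornet)."""
    def __init__(self, transfer_id, weight):
        self.transfer_id = transfer_id
        self.weight = weight
        self.flows_per_host = 0   # assigned by the ITC

class InterTransferController:
    """Implements the cluster-wide inter-transfer policy; here,
    weighted fair sharing via per-host TCP-flow counts (see WFA)."""
    def __init__(self, flows_per_host_budget):
        self.budget = flows_per_host_budget
        self.transfers = []

    def register(self, tc):
        self.transfers.append(tc)
        self._rebalance()

    def _rebalance(self):
        # Split the per-host flow budget in proportion to weights.
        total = sum(tc.weight for tc in self.transfers)
        for tc in self.transfers:
            tc.flows_per_host = max(1, round(self.budget * tc.weight / total))

itc = InterTransferController(flows_per_host_budget=10)
itc.register(TransferController("shuffle-1", weight=2))
itc.register(TransferController("broadcast-1", weight=1))
print([(tc.transfer_id, tc.flows_per_host) for tc in itc.transfers])
# -> [('shuffle-1', 7), ('broadcast-1', 3)]
```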
7
Orchestra (architecture recap): next, the Inter-Transfer Controller (ITC).
8
Inter-Transfer Controller (ITC)
Weighted fair sharing: each transfer i is assigned a weight w_i, and congested links are shared proportionally to the transfers' weights. Implementation: Weighted Flow Assignment (WFA). Each host may open F TCP flows in total across transfers, and each transfer gets a number of TCP connections proportional to its weight. This requires no changes in the network or in end-host OSes.
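A minimal sketch of the WFA split itself, dividing the per-host budget F exactly across transfers in proportion to their weights w_i (the largest-remainder rounding is an assumption; the slide only states proportionality):

```python
def wfa_allocation(weights, F):
    """Split a per-host budget of F permissible TCP flows across
    active transfers in proportion to their weights (hypothetical
    helper illustrating Weighted Flow Assignment).

    weights: dict mapping transfer id -> weight w_i
    F:       total TCP flows a host may open across all transfers
    Returns a dict mapping transfer id -> number of TCP connections.
    """
    total = sum(weights.values())
    # Ideal (fractional) share for each transfer.
    ideal = {t: F * w / total for t, w in weights.items()}
    # Give each transfer the floor of its share...
    alloc = {t: int(share) for t, share in ideal.items()}
    # ...then hand leftover connections to the largest remainders.
    leftover = F - sum(alloc.values())
    by_remainder = sorted(ideal, key=lambda t: ideal[t] - alloc[t], reverse=True)
    for t in by_remainder[:leftover]:
        alloc[t] += 1
    return alloc

# Example: two transfers with weights 2:1 sharing F = 9 flows per host.
print(wfa_allocation({"shuffle": 2, "broadcast": 1}, 9))
# -> {'shuffle': 6, 'broadcast': 3}
```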
9
ITC Performance
Shuffle using 30 nodes on EC2. Two priority classes with FIFO within each class: one low-priority transfer with 2 GB per reducer, and high-priority transfers with 250 MB per reducer. (Figure panels: without inter-transfer scheduling vs. priority scheduling in Orchestra.) Result: 43% reduction in high-priority transfer times, for a 6% increase in the low-priority transfer's time.
10
Orchestra (architecture recap): next, broadcast with Cornet.
11
Broadcast Transfer (Cornet)
BitTorrent-like protocol optimized for datacenters. Differences from BitTorrent (observation -> design decision):
- High-bandwidth, low-latency network -> large block size (4-16 MB)
- No selfish or malicious peers -> no need for incentives (e.g., tit-for-tat) and no (un)choking; everyone stays till the end
- Topology matters -> topology-aware broadcast
(Choking is a temporary refusal to upload.)
12
Topology-Aware Cornet
Transfer times between two nodes in the same rack are lower than between racks, and Cornet knows which rack each node is in. When a receiver requests a new set of peers from the TC, instead of choosing among all possible recipients (as in vanilla Cornet), the TC gives priority to nodes in the same rack as the receiver.
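A minimal sketch of this rack-aware peer selection (function and parameter names are illustrative; the TC's real selection logic is not shown in the talk):

```python
import random

def choose_peers(receiver, candidates, rack_of, k=4):
    """Pick up to k peers for `receiver`, preferring nodes in the
    same rack (illustrative sketch of Topology-Aware Cornet's TC).

    receiver:   node id requesting new peers
    candidates: node ids that already hold blocks of the data
    rack_of:    dict mapping node id -> rack id (known to the TC)
    """
    same_rack = [n for n in candidates if rack_of[n] == rack_of[receiver]]
    other_rack = [n for n in candidates if rack_of[n] != rack_of[receiver]]
    random.shuffle(same_rack)
    random.shuffle(other_rack)
    # Fill the peer list from the same rack first, then spill over
    # to remote racks only if needed.
    return (same_rack + other_rack)[:k]
```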
13
Cornet Performance 1 GB data to 100 receivers on EC2
4.5x to 5x improvement
14
Orchestra (architecture recap): next, shuffle with Weighted Shuffle Scheduling (WSS).
15
Status Quo in Shuffle
(Figure: receivers r1, r2 fetching from senders s1-s5; each receiver needs 1 unit from each of its two exclusive senders and 2 units from the shared sender s3.) With per-flow fair sharing, the links into r1 and r2 are full for 3 time units: each inbound link splits evenly three ways, so s1, s2, s4, s5 finish at t = 3 while s3 still owes each receiver 1 unit. After that, the link from s3 is full for 2 more time units, serving each receiver at rate 1/2. Completion time: 5 time units.
16
Weighted Shuffle Scheduling (WSS)
Idea: allocate rates to each flow using weighted fair sharing, where the weight of a sender-receiver flow is proportional to the amount of data to be sent on it. Implementation: each receiver opens a number of TCP connections to each sender proportional to the amount of data it needs to fetch from that sender.
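A minimal sketch of this receiver-side rule, assuming a per-receiver connection budget max_conns (the budget and the rounding are assumptions; the slide only states proportionality to bytes fetched):

```python
def wss_connections(bytes_from, max_conns=10):
    """Per-receiver connection counts under Weighted Shuffle
    Scheduling: open TCP connections to each sender in proportion
    to the bytes this receiver must fetch from it (sketch).

    bytes_from: dict mapping sender id -> bytes to fetch
    """
    largest = max(bytes_from.values())
    # At least one connection per sender; more connections to
    # senders holding more of this receiver's data.
    return {s: max(1, round(max_conns * b / largest))
            for s, b in bytes_from.items()}

# Receiver r1 in the running example: 1 unit each from s1 and s2,
# and 2 units from the shared sender s3.
print(wss_connections({"s1": 1, "s2": 1, "s3": 2}, max_conns=4))
# -> {'s1': 2, 's2': 2, 's3': 4}
```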
17
Weighted Shuffle Scheduling (WSS)
(Figure: same topology, with r1's link shared in a 1:1:2 ratio among s1, s2, s3, i.e., rates 1/4, 1/4, 1/2.) All flows, including both of s3's, finish together. Completion time: 4 time units. Up to 1.5x improvement over fair sharing.
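A worked check of the example's arithmetic, assuming unit-capacity links and the per-receiver demands from the figure (1 unit each from s1/s2, 2 units from s3):

```python
# Demands for receiver r1 (r2 is symmetric); link capacity = 1 unit/time.
demand = {"s1": 1.0, "s2": 1.0, "s3": 2.0}

# Status quo (per-flow fair sharing): r1's inbound link splits 1/3 per
# flow, so s1 and s2 finish at t = 3 and s3 has delivered 1 of its 2
# units.  Then s3's outbound link is the bottleneck, serving the
# remaining unit to each receiver at rate 1/2.
delivered_by_t3 = 3 * (1.0 / 3.0)
t_fair = 3.0 + (demand["s3"] - delivered_by_t3) / 0.5
print(t_fair)   # 5.0 time units

# WSS: r1's link splits 1:1:2 (rates 1/4, 1/4, 1/2), so every flow and
# s3's outbound link finish together once all 4 units have arrived.
t_wss = sum(demand.values()) / 1.0
print(t_wss)    # 4.0 time units (5/4 = 1.25x faster in this instance)
```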
18
Orchestra Performance
Faster spam classification, evaluated on Spark: communication reduced from 42% to 28% of the iteration time, for an overall 22% reduction in iteration time.
19
Orchestra Performance
Scalability evaluation (before vs. after Orchestra): 1.9x faster at 90 nodes.
20
Contributions
- Optimize at the level of transfers instead of individual flows (Weighted Shuffle Scheduling & Cornet)
- Inter-transfer coordination (Weighted Flow Assignment, WFA)
21
Discussion: Orchestra Controls Transfers at the Application Level
Good: no changes in the network or in end-host OSes. Bad: it cannot precisely control the network or protect against misbehaving hosts. Is weight = number of TCP connections enough? Systems like Seawall handle cross-flow congestion control and are more robust and flexible. Also, WSS outperforming fair sharing is rare in practice: when data between senders and receivers is roughly balanced, Weighted Shuffle Scheduling reduces to plain fair sharing.