Efficient Coflow Scheduling with Varys
Mosharaf Chowdhury, Yuan Zhong, Ion Stoica
Presenters: Chi-Fan Chu, Chao-Han Tsai
EECS 582 – W16
Outline
Background and motivation
SEBF and MADD
Varys overview
Evaluation
Q & A
Motivation
Network-level optimization is agnostic to application-level requirements.
This mismatch hurts application-level performance, even when flow-level metrics such as flow completion time (FCT) or fairness improve.
We need application-aware network flow scheduling.
Traditional techniques that optimize flow-level metrics do not perform well on the data flows of data-parallel applications. Because of this mismatch in requirements, application-level performance can suffer even when flow-level metrics, such as flow completion time and flow fairness, improve.
Coflow
A collection of flows that share a common performance goal, e.g.:
Minimizing the completion time of the last flow to finish.
Ensuring that all flows meet a common deadline.
We assume the amount of data in each flow is known before it starts.
The flows of a coflow are independent: the input of a flow does not depend on the output of another flow in the same coflow.
The endpoints of these flows can be on one or more machines.
How do we represent application-level requirements? Here we use the abstraction of a coflow.
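A minimal Python sketch of the coflow abstraction, assuming each flow is described by hypothetical `src`/`dst` port names plus a size known in advance, and that all flows start at time 0. It captures the two goals listed above: the CCT is the finish time of the last flow, and a deadline is met only if that last flow finishes in time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Flow:
    src: str      # ingress port, e.g. "in1" (hypothetical naming)
    dst: str      # egress port, e.g. "out2" (hypothetical naming)
    size: float   # bytes; assumed known before the flow starts


@dataclass
class Coflow:
    name: str
    flows: List[Flow] = field(default_factory=list)


def coflow_completion_time(finish_times: List[float]) -> float:
    """CCT = finish time of the last flow (all flows assumed to start at t = 0)."""
    return max(finish_times)


def meets_deadline(finish_times: List[float], deadline: float) -> bool:
    """A coflow meets its deadline only if every one of its flows does."""
    return coflow_completion_time(finish_times) <= deadline


# A hypothetical two-flow coflow: one large flow and one small flow.
orange = Coflow("orange", [Flow("in1", "out1", 4), Flow("in2", "out2", 1)])
# Finishing the small flow at t=1 or at t=3 gives the same CCT of 4.
print(coflow_completion_time([4, 1]), coflow_completion_time([4, 3]))
print(meets_deadline([4, 3], deadline=4), meets_deadline([4, 3], deadline=3))
```

Because the CCT is a maximum, speeding up any flow other than the last one does not change it; this is the property the scheduling ideas in the following slides exploit.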
Network Model
Here is the network model we use for analysis. We treat the entire datacenter network fabric as one non-blocking switch and focus only on its ingress and egress ports, that is, the machines' NICs. This abstraction is simple and keeps the analysis tractable. For example, here we have two coflows, one orange and one blue. The number on each flow represents its size. Now we need to schedule these flows over the network.
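Under the big-switch model, the fabric itself never congests, so a rate allocation is valid as long as no NIC is oversubscribed. A minimal sketch of that check, assuming every port has the same capacity and using hypothetical port names:

```python
from collections import defaultdict
from typing import Dict, Tuple


def is_feasible(rates: Dict[Tuple[str, str], float], capacity: float = 1.0) -> bool:
    """Check a rate allocation under the non-blocking big-switch model.

    rates maps (ingress_port, egress_port) -> rate, one entry per port pair.
    The only constraints are NIC capacities: the total rate leaving each
    ingress port and entering each egress port must not exceed `capacity`
    (assumed identical for all ports here).
    """
    ingress_use: Dict[str, float] = defaultdict(float)
    egress_use: Dict[str, float] = defaultdict(float)
    for (src, dst), rate in rates.items():
        ingress_use[src] += rate
        egress_use[dst] += rate
    eps = 1e-9  # tolerance for floating-point rounding
    return all(u <= capacity + eps for u in ingress_use.values()) and all(
        u <= capacity + eps for u in egress_use.values()
    )


# Two flows sharing ingress port in1, each at half rate: feasible.
print(is_feasible({("in1", "out1"): 0.5, ("in1", "out2"): 0.5}))
# Two flows converging on egress port out1 at 0.7 + 0.6: infeasible.
print(is_feasible({("in1", "out1"): 0.7, ("in2", "out1"): 0.6}))
```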
Different Flow Scheduling
The first three schedules use traditional flow-level scheduling. The first is… The second is… The third is…
None of them takes coflow information into account. No matter how the flows are scheduled at P2 and P3, the orange coflow always needs at least 4 time units at port P1. There is therefore no benefit to finishing orange flows at P2 and P3 before time 4: the coflow cannot complete before time 4, because its bottleneck is at P1. The idea, then, is to schedule other flows first and only make sure the orange flows finish by the bottleneck time. The schedule that takes inter-coflow information into account gives the best performance (minimum CCT).
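The bottleneck argument can be checked with a toy simulation. This sketch makes simplifying assumptions that are not in the slides: ports have unit capacity, time advances in discrete steps, each port serves one flow per step, and the flow sizes and port names are hypothetical, not the figure's numbers. Orange needs 4 steps at `in1` regardless of the schedule, so letting blue go first at the shared port `in2` lowers blue's CCT without delaying orange.

```python
from typing import Callable, Dict, Tuple

# flow name -> (coflow, ingress port, egress port, size in time units > 0)
Flows = Dict[str, Tuple[str, str, str, int]]


def simulate(flows: Flows, priority: Callable[[str], tuple]) -> Dict[str, int]:
    """Toy discrete-time schedule on a big switch with unit-capacity ports.

    In each step, every port carries at most one flow, and flows are
    admitted greedily in priority order. Returns the CCT of each coflow.
    """
    remaining = {name: spec[3] for name, spec in flows.items()}
    finish: Dict[str, int] = {}
    t = 0
    while remaining:
        busy_in, busy_out = set(), set()
        for name in sorted(remaining, key=priority):
            _, src, dst, _ = flows[name]
            if src not in busy_in and dst not in busy_out:
                busy_in.add(src)
                busy_out.add(dst)
                remaining[name] -= 1
        t += 1
        for name in [n for n, left in remaining.items() if left == 0]:
            finish[name] = t
            del remaining[name]
    # CCT of a coflow = finish time of its last flow.
    cct: Dict[str, int] = {}
    for name, (coflow, _, _, _) in flows.items():
        cct[coflow] = max(cct.get(coflow, 0), finish[name])
    return cct


# Hypothetical sizes: orange needs 4 units at in1 no matter what.
flows: Flows = {
    "o1": ("orange", "in1", "out1", 4),
    "o2": ("orange", "in2", "out2", 1),
    "b1": ("blue", "in2", "out2", 2),
}
orange_first = simulate(flows, lambda n: (flows[n][0] != "orange", n))
blue_first = simulate(flows, lambda n: (flows[n][0] != "blue", n))
print(orange_first)  # {'orange': 4, 'blue': 3}
print(blue_first)    # {'orange': 4, 'blue': 2}
```

In this toy case the average CCT drops from 3.5 to 3 while orange still finishes at its bottleneck time of 4.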
Inter-coflow Scheduling
Objectives: minimize coflow completion time (CCT); guarantee completion within coflow deadlines; ensure starvation freedom.
Unfortunately, the problem is NP-hard.
Proved by a reduction from concurrent open shop scheduling with coupled resources.
Effective heuristics are required to obtain practical solutions.
Ordering Heuristic: SEBF
Coflow characteristics in modern datacenters are diverse:
Length: size of the largest flow, in bytes
Width: number of parallel flows
Size: sum of the sizes of all its flows, in bytes
Skew: coefficient of variation of its flow sizes
Before we start, we show the diversity of coflows in terms of these four characteristics. According to an analysis of a real trace from Facebook, coflows vary widely in all four. For example, the ratio of sender to receiver ports can be very different across coflows, and the senders can be the bottleneck in almost a third of the coflows. Combining these two figures shows that most of the traffic in the datacenter comes from a handful of large coflows, while most coflows are very small. This motivates our focus on scheduling large coflows.
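The four characteristics can be computed directly from a coflow's flow sizes. A minimal sketch; using the population standard deviation for the coefficient of variation is an assumption, since the slide does not say which variant is meant.

```python
import statistics
from typing import Dict, List


def coflow_characteristics(flow_sizes: List[float]) -> Dict[str, float]:
    """Length, width, size, and skew of a coflow, given its flow sizes in bytes."""
    length = max(flow_sizes)          # size of the largest flow
    width = len(flow_sizes)           # number of parallel flows
    size = sum(flow_sizes)            # total bytes across all flows
    # Coefficient of variation = standard deviation / mean.
    # Population standard deviation is an assumption; the slide does not say.
    skew = statistics.pstdev(flow_sizes) / (size / width)
    return {"length": length, "width": width, "size": size, "skew": skew}


print(coflow_characteristics([1, 3]))
# {'length': 3, 'width': 2, 'size': 4, 'skew': 0.5}
print(coflow_characteristics([2, 2, 2, 2])["skew"])  # 0.0
```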
Ordering Heuristic: SEBF (Cont.)
Why not a shortest- or smallest-first policy, as used to minimize FCT? It does not use coflow information.
Why not Shortest-Coflow-First (SCF)? It does not take a coflow's width into account.
Why not Narrowest-Coflow-First (NCF)? It cannot differentiate a short coflow from a long one.
Some heuristics already exist, so why are they not suitable for inter-coflow scheduling?
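To make the critique concrete, here is a small sketch with hypothetical coflows, each flow given as (ingress port, egress port, bytes). It orders the same coflows by length (SCF) and by width (NCF).

```python
from typing import Dict, List, Tuple

Flow = Tuple[str, str, float]  # (ingress port, egress port, bytes)

# Hypothetical coflows chosen to expose each heuristic's blind spot.
coflows: Dict[str, List[Flow]] = {
    "long_narrow": [("in1", "out1", 10)],
    "short_narrow": [("in2", "out2", 1)],
    # 50 short flows that all leave through the same ingress port in1.
    "short_wide": [("in1", f"out{k}", 2) for k in range(50)],
}


def length(flows: List[Flow]) -> float:
    """Size of the largest flow."""
    return max(size for _, _, size in flows)


def width(flows: List[Flow]) -> int:
    """Number of parallel flows."""
    return len(flows)


scf = sorted(coflows, key=lambda name: length(coflows[name]))
ncf = sorted(coflows, key=lambda name: width(coflows[name]))
print(scf)  # ['short_narrow', 'short_wide', 'long_narrow']
print(ncf)  # ['long_narrow', 'short_narrow', 'short_wide']
```

SCF places `short_wide` ahead of `long_narrow` even though it pushes 100 bytes through `in1` versus 10. NCF ties the two narrow coflows and, because Python's sort is stable, leaves the long one first here.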
Ordering Heuristic: SEBF (Cont.)
Smallest-Effective-Bottleneck-First (SEBF)
A heuristic that considers coflows' length, width, size, and skew.
A coflow C transfers $\sum_j d_{ij}$ bytes through ingress port $P_i^{in}$ and $\sum_i d_{ij}$ bytes through egress port $P_j^{out}$, where $d_{ij}$ is the amount of data transferred from $P_i^{in}$ to $P_j^{out}$ at rate $r_{ij}$.
Minimum CCT ($\Gamma$):
$$\Gamma = \max\left(\max_i \frac{\sum_j d_{ij}}{\mathrm{Rem}(P_i^{in})},\; \max_j \frac{\sum_i d_{ij}}{\mathrm{Rem}(P_j^{out})}\right)$$
SEBF schedules the coflow with the smallest $\Gamma$ first.
Basically, we use this formula to calculate the bottleneck time, the minimum time the coflow will need. $\mathrm{Rem}(\cdot)$ denotes the remaining bandwidth of a port, as estimated by our system.
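The formula for $\Gamma$ translates almost line for line into code. A minimal sketch, assuming remaining bandwidths are given as two dictionaries keyed by hypothetical port names, with unlisted ports defaulting to unit bandwidth; the coflows are the same hypothetical ones used to critique SCF and NCF.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Flow = Tuple[str, str, float]  # (ingress port, egress port, bytes)


def _worst_port_time(load: Dict[str, float], rem: Dict[str, float]) -> float:
    """Largest (bytes through port) / Rem(port); missing ports default to 1.0."""
    worst = 0.0
    for port, total in load.items():
        r = rem.get(port, 1.0)
        worst = max(worst, total / r if r > 0 else math.inf)
    return worst


def bottleneck(flows: List[Flow], rem_in: Dict[str, float], rem_out: Dict[str, float]) -> float:
    """Gamma = max over ingress ports i and egress ports j of load / Rem."""
    load_in: Dict[str, float] = defaultdict(float)
    load_out: Dict[str, float] = defaultdict(float)
    for src, dst, size in flows:
        load_in[src] += size   # accumulates sum_j d_ij for ingress port i
        load_out[dst] += size  # accumulates sum_i d_ij for egress port j
    return max(_worst_port_time(load_in, rem_in), _worst_port_time(load_out, rem_out))


def sebf_order(coflows: Dict[str, List[Flow]], rem_in, rem_out) -> List[str]:
    """Smallest-Effective-Bottleneck-First: schedule coflows in ascending Gamma."""
    return sorted(coflows, key=lambda name: bottleneck(coflows[name], rem_in, rem_out))


coflows: Dict[str, List[Flow]] = {
    "long_narrow": [("in1", "out1", 10)],
    "short_narrow": [("in2", "out2", 1)],
    "short_wide": [("in1", f"out{k}", 2) for k in range(50)],
}
print(sebf_order(coflows, {}, {}))  # ['short_narrow', 'long_narrow', 'short_wide']
# With only a quarter of in2 left, short_narrow's Gamma grows from 1.0 to 4.0.
print(bottleneck(coflows["short_narrow"], {"in2": 0.25}, {}))  # 4.0
```

Unlike SCF and NCF, this ranks `short_wide` last, because its 50 flows pile 100 bytes onto `in1`.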
Allocation Algorithm: MADD
Observation: completing a flow faster than the coflow's bottleneck does not improve the CCT in coflow scheduling.
Given $\Gamma$ (the coflow bottleneck computed for the SEBF heuristic), the minimum coflow completion time is achieved as long as ALL flows finish at time $\Gamma$.
Minimum-Allocation-for-Desired-Duration (MADD): allocates the least amount of bandwidth needed to complete a coflow in the minimum possible time (similar to PACMan's concept).
Once the bottleneck time $\Gamma$ is estimated, we use the MADD algorithm to allocate a transmission rate to each flow: each flow gets just enough rate, $r_{ij} = d_{ij}/\Gamma$, to finish at time $\Gamma$.
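A minimal sketch of the MADD rate rule $r_{ij} = d_{ij}/\Gamma$, assuming the same flow representation and hypothetical port names as before, with unlisted ports defaulting to unit remaining bandwidth (assumed nonzero).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Flow = Tuple[str, str, float]  # (ingress port, egress port, bytes)


def bottleneck(flows: List[Flow], rem_in: Dict[str, float], rem_out: Dict[str, float]) -> float:
    """Gamma under remaining bandwidths; ports absent from rem_* default to 1.0 (assumed > 0)."""
    load_in: Dict[str, float] = defaultdict(float)
    load_out: Dict[str, float] = defaultdict(float)
    for src, dst, size in flows:
        load_in[src] += size
        load_out[dst] += size
    return max(
        max(total / rem_in.get(p, 1.0) for p, total in load_in.items()),
        max(total / rem_out.get(p, 1.0) for p, total in load_out.items()),
    )


def madd(flows: List[Flow], rem_in: Dict[str, float], rem_out: Dict[str, float]) -> List[float]:
    """Give each flow exactly the rate d_ij / Gamma, so every flow finishes at Gamma."""
    gamma = bottleneck(flows, rem_in, rem_out)
    return [size / gamma for _, _, size in flows]


orange = [("in1", "out1", 4), ("in2", "out2", 1)]
print(madd(orange, {}, {}))  # [1.0, 0.25]
```

The small flow needs only a quarter of `in2`/`out2`, leaving 0.75 of those ports for other coflows, yet the orange CCT is still 4. Because $\Gamma$ is the largest per-port load divided by Rem, these rates never exceed any port's remaining bandwidth.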
Varys
Varys master: schedules coflows from different frameworks using global coordination. It works in one of two modes: minimizing CCT or meeting deadlines. Frameworks use a client library to interact with Varys to register and define coflows. The master aggregates all interactions to build a global view of the network and determines the rates of the flows in each coflow.
Varys daemons: handle time-decoupled coflows, in which senders and receivers are not simultaneously active. Senders in data-intensive applications often complete after writing their output to disk; whenever the corresponding receivers are ready, Varys daemons serve them by coordinating with the master. These daemons also periodically measure network usage and report it to the Varys master, which aggregates the reports to estimate current utilization and schedules using the remaining bandwidth.
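The master's use of daemon reports can be pictured as turning per-port usage measurements into the Rem values that feed $\Gamma$. This is a hypothetical sketch, not the Varys API: the report format, the function name, and the rule that the latest report for a port wins are all assumptions.

```python
from typing import Dict, Iterable, Tuple


def estimate_remaining(
    capacity: Dict[str, float], reports: Iterable[Tuple[str, float]]
) -> Dict[str, float]:
    """Estimate per-port remaining bandwidth from daemon usage reports.

    capacity: port -> NIC capacity. reports: (port, measured usage) pairs in
    arrival order; the latest report for a port overrides older ones (an
    assumption). Ports with no report are treated as idle. Estimates are
    clamped at zero.
    """
    latest: Dict[str, float] = {}
    for port, usage in reports:
        latest[port] = usage
    return {port: max(cap - latest.get(port, 0.0), 0.0) for port, cap in capacity.items()}


capacity = {"in1": 1000.0, "out1": 1000.0, "out2": 1000.0}  # e.g. Mbps
reports = [("in1", 300.0), ("out1", 900.0), ("in1", 500.0)]
print(estimate_remaining(capacity, reports))
# {'in1': 500.0, 'out1': 100.0, 'out2': 1000.0}
```

Any error in these estimates propagates directly into $\Gamma$ and therefore into scheduling and admission decisions.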
Performance in Minimizing CCT
Figure 6a shows that inter-coflow scheduling reduced the average and 95th-percentile completion times of communication-dominated jobs by up to 2.50X and 2.94X, respectively, in EC2 experiments. Figure 6b shows that the corresponding improvements in average and 95th-percentile CCT were up to 3.16X and 3.84X. Jobs become increasingly faster as communication represents a higher fraction of their completion times.
Performance in Minimizing CCT (Cont.)
Figure 8a presents CDFs of CCTs for all coflows. Per-flow fairness performs better only for some of the tiny, sub-second coflows, which still use TCP fair sharing under Varys. As coflows become larger, the advantage of inter-coflow scheduling becomes more prominent.
Performance for Deadline-Sensitive Coflows
Inter-coflow scheduling allowed almost 2X more coflows to complete within their deadlines in EC2 experiments: 57% of coflows met their deadlines using Varys, versus only 30% under the default setting.
However, more than 25% of the admitted coflows missed their deadlines, meaning Varys admitted more coflows than it should have. This is due to the lack of network support for estimating network utilization. After doubling the deadlines, almost 94% of the admitted coflows succeeded using Varys.
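How an inaccurate utilization estimate leads to over-admission can be illustrated with a hypothetical sketch. It assumes an admission rule of the form $\Gamma \le D$ evaluated with the *estimated* Rem, followed by MADD with the deadline $D$ as the desired duration; the finish-time model $\max(D, \Gamma_{true})$ is a simplification, and all sizes and bandwidths are made up.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Flow = Tuple[str, str, float]  # (ingress port, egress port, bytes)


def bottleneck(flows: List[Flow], rem: Dict[str, float]) -> float:
    """Gamma with one remaining-bandwidth map; ingress and egress names must differ."""
    load: Dict[str, float] = defaultdict(float)
    for src, dst, size in flows:
        load[src] += size
        load[dst] += size
    return max(total / rem[port] for port, total in load.items())


def admit(flows: List[Flow], est_rem: Dict[str, float], deadline: float) -> Optional[List[float]]:
    """Admit only if Gamma <= deadline under the estimated Rem; then MADD with D."""
    if bottleneck(flows, est_rem) > deadline:
        return None  # reject: cannot finish in time even at full remaining rate
    return [size / deadline for _, _, size in flows]


def actual_finish(flows: List[Flow], true_rem: Dict[str, float], deadline: float) -> float:
    """Simplification: the coflow finishes at D unless the true Rem forces it later."""
    return max(deadline, bottleneck(flows, true_rem))


flows = [("in1", "out1", 8.0)]
est_rem = {"in1": 1.0, "out1": 1.0}   # the master believes in1 is idle
true_rem = {"in1": 0.5, "out1": 1.0}  # unseen traffic halves in1's bandwidth
print(admit(flows, est_rem, 10.0), actual_finish(flows, true_rem, 10.0))  # [0.8] 16.0
print(admit(flows, est_rem, 20.0), actual_finish(flows, true_rem, 20.0))  # [0.4] 20.0
```

With $D = 10$ the coflow is admitted but finishes at 16 and misses; doubling the deadline to 20 absorbs the estimation error in this case.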
Conclusion
Slow down all the flows in a coflow to match the completion time of the flow that will take the longest to finish.
The combination of SEBF and MADD is not necessarily optimal, but it works well in practice.
Q & A
The centralized architecture of Varys may become a bottleneck as the cluster grows. Can Varys evolve to be decentralized?
The deadline miss rate is still high in the experiments due to the lack of network support for estimating network utilization. Is such support easy to obtain? If not, is there a workaround?
Doesn't the network model used in the paper oversimplify the real environment?