Efficient Coflow Scheduling with Varys

Presentation transcript:

Efficient Coflow Scheduling with Varys
Mosharaf Chowdhury, Yuan Zhong, Ion Stoica
Presenters: Chi-Fan Chu, Chao-Han Tsai
EECS 582 – W16

Outline
Background and motivation; SEBF and MADD; Varys overview; Evaluation; Q & A.

Motivation
Network-level optimization is agnostic to application-level requirements. Traditional techniques that optimize flow-level metrics, such as flow completion time (FCT) and flow fairness, do not perform well for the data flows of data-parallel applications. This mismatch hurts application-level performance even when the flow-level metrics improve. What is needed is application-aware network flow scheduling.

Coflow
How can the application-level requirement be represented? Here we use the idea of a coflow: a collection of flows that share a common performance goal, e.g. minimizing the completion time of the flow that finishes last, or ensuring that all flows meet a common deadline. We assume that the amount of data in each flow is known before it starts. The flows of a coflow are independent: the input of one flow does not depend on the output of another flow in the same coflow. The endpoints of these flows can be on one or more machines.

Network Model
This is the network model we use for analysis. We treat the whole datacenter network fabric as one non-blocking switch and focus only on its ingress and egress ports, that is, the machine NICs. This abstraction is simple and keeps the analysis tractable. In the example there are two coflows, one orange and one blue; the number on each flow represents its size. The task is to schedule these flows over the network.

Different Flow Scheduling
The first three are the traditional flow-level schedules. The first is… The second is… The third is… None of them takes coflow information into account. No matter how the flows are scheduled at P2 and P3, the orange coflow always requires at least 4 time units to finish at port P1. There is therefore no benefit in finishing the orange flows at P2 and P3 before time 4, because the bottleneck is at P1 and the coflow can never complete earlier. The idea is to schedule other flows first and only make sure that the orange flows finish by the bottleneck time. Scheduling that takes inter-coflow information into account gives the best performance (minimum CCT).

Inter-coflow Scheduling
The objectives are to minimize coflow completion time (CCT) and to guarantee completion within coflow deadlines, with starvation freedom. Unfortunately, the problem is NP-hard, as proved by a reduction from concurrent open shop scheduling with coupled resources. Effective heuristics are therefore required to get practical solutions.

Ordering Heuristic: SEBF
Before discussing the heuristic, we show how diverse coflows are in modern data centers along four characteristics:
Length: the size of the largest flow in bytes.
Width: the number of parallel flows.
Size: the sum of all its flows in bytes.
Skew: the coefficient of variation of its flow sizes.
According to an analysis of a real trace from Facebook, coflows vary widely in all four characteristics. For example, the ratio of sender to receiver ports can differ greatly across coflows, and the senders can be the bottleneck in almost a third of the coflows. Taken together, the two figures show that most of the traffic in the data center comes from a handful of large coflows, while most coflows are very small. For this reason we focus only on scheduling the large coflows.
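The four characteristics can be computed directly from the flow sizes of one coflow. The following is a minimal sketch, not code from Varys: the function name `coflow_characteristics` is hypothetical, and it assumes that skew uses the population standard deviation, which the slide does not specify.

```python
from statistics import mean, pstdev


def coflow_characteristics(flow_sizes):
    """Return length, width, size, and skew for one coflow.

    flow_sizes: non-empty list of flow sizes in bytes (hypothetical input format).
    """
    length = max(flow_sizes)   # size of the largest flow
    width = len(flow_sizes)    # number of parallel flows
    size = sum(flow_sizes)     # total bytes across all flows
    avg = mean(flow_sizes)
    # Coefficient of variation: standard deviation divided by the mean.
    # Assumption: population standard deviation (pstdev), not sample.
    skew = pstdev(flow_sizes) / avg if avg > 0 else 0.0
    return {"length": length, "width": width, "size": size, "skew": skew}


uniform = coflow_characteristics([4, 4, 4, 4])   # all flows equal, so no skew
uneven = coflow_characteristics([1, 3])          # unequal flows, non-zero skew
```

A coflow with identical flows has zero skew, while unequal flow sizes raise it.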

Ordering Heuristic: SEBF (Cont.)
Some ordering heuristics already exist. Why are they not suitable for inter-coflow scheduling? Why not the shortest-first or smallest-first policies used to minimize FCT? They do not use coflow information. Why not Shortest-Coflow-First (SCF)? It does not take the width of a coflow into account. Why not Narrowest-Coflow-First (NCF)? It cannot differentiate a short coflow from a long one.
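The blind spots of SCF and NCF show up on a toy input. This sketch uses three hypothetical coflows (the names and sizes are made up for illustration) and orders them by length and by width only.

```python
# Hypothetical coflows: name -> list of flow sizes in bytes.
coflows = {
    "A": [10],          # one long flow
    "B": [1, 1, 1, 1],  # four short flows
    "C": [1],           # one short flow
}


def length(flows):
    """Size of the largest flow (the SCF sort key)."""
    return max(flows)


def width(flows):
    """Number of parallel flows (the NCF sort key)."""
    return len(flows)


# Python's sort is stable, so ties keep their insertion order.
scf_order = sorted(coflows, key=lambda name: length(coflows[name]))
ncf_order = sorted(coflows, key=lambda name: width(coflows[name]))

# SCF sees B and C as equal even though B is four flows wide.
scf_tie = length(coflows["B"]) == length(coflows["C"])
# NCF sees A and C as equal even though A is ten times longer.
ncf_tie = width(coflows["A"]) == width(coflows["C"])
```

Each key ties two coflows that clearly differ in the dimension the key ignores.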

Ordering Heuristic: SEBF (Cont.)
Smallest-Effective-Bottleneck-First (SEBF) is a heuristic that considers a coflow's length, width, size, and skew. A coflow $C$ transfers $\sum_j d_{ij}$ amount of data through ingress port $P_i^{in}$ and $\sum_i d_{ij}$ amount of data through egress port $P_j^{out}$, where $d_{ij}$ is the amount of data transferred from $P_i^{in}$ to $P_j^{out}$ at rate $r_{ij}$. The minimum CCT ($\Gamma$) is the least time the coflow requires at its bottleneck port:
$$\Gamma = \max\left( \max_i \frac{\sum_j d_{ij}}{Rem(P_i^{in})},\ \max_j \frac{\sum_i d_{ij}}{Rem(P_j^{out})} \right)$$
Here $Rem(\cdot)$ is the remaining bandwidth of a port as estimated by the system.
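The bottleneck computation can be sketched in a few lines: for every port, divide the data the coflow moves through it by the port's remaining bandwidth, and take the largest result. The function names, the dictionary layout, and the example coflows below are assumptions for illustration, not the Varys implementation.

```python
def effective_bottleneck(demand, rem_in, rem_out):
    """Minimum completion time (Gamma) of one coflow.

    demand:  {(i, j): bytes sent from ingress port i to egress port j}
    rem_in:  {i: remaining bandwidth of ingress port i}
    rem_out: {j: remaining bandwidth of egress port j}
    (Hypothetical data layout; demand must be non-empty.)
    """
    sent, received = {}, {}
    for (i, j), size in demand.items():
        sent[i] = sent.get(i, 0) + size          # total data through ingress i
        received[j] = received.get(j, 0) + size  # total data through egress j
    in_time = max(total / rem_in[i] for i, total in sent.items())
    out_time = max(total / rem_out[j] for j, total in received.items())
    return max(in_time, out_time)


def sebf_order(coflows, rem_in, rem_out):
    """Order coflow names by smallest effective bottleneck first."""
    return sorted(
        coflows,
        key=lambda name: effective_bottleneck(coflows[name], rem_in, rem_out),
    )


# Hypothetical example: every port has 1 unit of bandwidth per time unit.
rem_in = {1: 1.0, 2: 1.0}
rem_out = {2: 1.0, 3: 1.0}
coflows = {
    "orange": {(1, 2): 4, (2, 3): 2},
    "blue": {(1, 3): 1, (2, 3): 1},
}
gamma_orange = effective_bottleneck(coflows["orange"], rem_in, rem_out)
gamma_blue = effective_bottleneck(coflows["blue"], rem_in, rem_out)
order = sebf_order(coflows, rem_in, rem_out)
```

In this made-up example the blue coflow has the smaller bottleneck, so SEBF schedules it first.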

Allocation Algorithm: MADD
Observation: completing a flow faster than the bottleneck does not improve the CCT in coflow scheduling. Given $\Gamma$ (the coflow bottleneck computed by the SEBF heuristic), the minimum coflow completion time is achieved as long as all flows finish at time $\Gamma$. Once $\Gamma$ is estimated, we use the MADD algorithm to allocate a transmission rate to each flow. Minimum-Allocation-for-Desired-Duration (MADD) allocates the least amount of bandwidth needed to complete a coflow in the minimum possible time, an idea very similar to PACMan's.
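One way to read the MADD idea is that each flow gets exactly the rate that makes it finish at the bottleneck time, and no more. The sketch below assumes the bottleneck time is already known; `madd_rates` and `port_usage` are hypothetical helper names, and the example values are made up.

```python
def madd_rates(demand, gamma):
    """Give each flow the smallest rate that still finishes it at time gamma.

    demand: {(i, j): bytes from ingress port i to egress port j}
    gamma:  bottleneck completion time of the coflow (must be positive)
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return {pair: size / gamma for pair, size in demand.items()}


def port_usage(rates):
    """Sum the allocated rates per ingress and per egress port."""
    ingress, egress = {}, {}
    for (i, j), rate in rates.items():
        ingress[i] = ingress.get(i, 0.0) + rate
        egress[j] = egress.get(j, 0.0) + rate
    return ingress, egress


# Hypothetical coflow whose bottleneck time is 4 on unit-bandwidth ports.
demand = {(1, 2): 4, (2, 3): 2}
rates = madd_rates(demand, gamma=4.0)
ingress, egress = port_usage(rates)
# Every flow finishes exactly at the bottleneck time.
finish_times = {pair: demand[pair] / rates[pair] for pair in demand}
```

Here the smaller flow runs at half rate, which leaves half of ingress port 2 and egress port 3 free for other coflows without delaying this one.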

Varys
Varys master: The Varys master schedules coflows from different frameworks using global coordination. It works in two modes, either to minimize CCT or to meet deadlines. Frameworks use a client library to interact with Varys to register and define coflows. The master aggregates all interactions to create a global view of the network and determines the rates of the flows in each coflow.
Varys daemons: Varys daemons handle time-decoupled coflows, where senders and receivers are not simultaneously active. Data-intensive applications often complete after writing their output to disk. Whenever the corresponding receivers are ready, the Varys daemons serve them by coordinating with the master. The daemons also periodically measure network usage and report it to the Varys master. The master aggregates these reports to estimate the current utilization and the remaining bandwidth it can use.

Performance in Minimizing CCT
Figure 6a shows that inter-coflow scheduling reduced the average and 95th percentile completion times of communication-dominated jobs by up to 2.50X and 2.94X in EC2 experiments. Figure 6b shows that the corresponding average and 95th percentile improvements in average CCT were up to 3.16X and 3.84X. Jobs speed up more as communication represents a higher fraction of their completion times.

Performance in Minimizing CCT
Figure 8a presents CDFs of CCTs for all coflows. Per-flow fairness performs better only for some of the tiny sub-second coflows, which still use TCP fair sharing. As coflows become larger, the advantage of inter-coflow scheduling becomes more prominent.

Performance for Deadline-Sensitive Coflows
Inter-coflow scheduling allowed almost 2X more coflows to complete within their deadlines in EC2 experiments: 57% of coflows met their deadlines using Varys, while only 30% did so using the default setting. However, more than 25% of the admitted coflows missed their deadlines. Varys admitted more coflows than it should have, because it lacks network support for estimating network utilization. After doubling the deadlines, almost 94% of the admitted coflows succeeded using Varys.

Conclusion
Slow down all the flows in a coflow to match the completion time of the flow that will take the longest to finish. The combination of SEBF and MADD is not necessarily optimal, but it works well in practice.

Q & A
The centralized architecture of Varys may become a bottleneck as the cluster size grows. Can Varys evolve to be decentralized?
The deadline miss rate is still high in the experiments because of the lack of network support for estimating network utilization. Is such utilization information easy to obtain? If not, is there a workaround?
Doesn't the network model used in the paper oversimplify the real environment?