Published by Hilda Bridges. Modified over 9 years ago.
1
UNIVERSITY OF ELECTRONIC SCIENCE & TECHNOLOGY OF CHINA
IEEE INFOCOM 2015, Hong Kong
RAPIER: Integrating Routing and Scheduling for Coflow-aware Data Center Networks
Yangming Zhao (UESTC), Kai Chen (HKUST), Wei Bai (HKUST), Minlan Yu (USC), Chen Tian (HUST), Yanhui Geng (Huawei), Yiming Zhang (NUDT), Dan Li (Tsinghua), Sheng Wang (UESTC)
zhaoyangming@uestc.edu.cn
2
Coflow-aware Traffic Optimization
Why traffic optimization in data center networks?
–Improve traffic scalability
–Improve QoS
Why coflow-aware?
–Minimize average flow completion time
–Minimize average coflow completion time
How to optimize network traffic?
–Routing (Hedera, Micro-TE)
–Scheduling (Varys, Baraat, pFabric)
In cluster computing frameworks, a stage cannot complete, or sometimes even start, before it receives all the flows in a coflow from the previous stage.
An individual flow can be treated as a special coflow.
Why not joint optimization?
3
Motivation Example
Two coflows:
–Coflow a: f_a1 = 40 Mb, f_a2 = 100 Mb
–Coflow b: f_b1 = 60 Mb, f_b2 = 100 Mb
Link bandwidths are all 100 Mbps.
Case 1: ECMP + Scheduling
–Traffic imbalance may occur due to the route collisions incurred by ECMP
–Average CCT = 1.5 s
4
Motivation Example
Case 2: Coflow-agnostic Load balancing + Scheduling
–Average CCT = 1.5 s
–Considering routing and scheduling separately cannot optimize the average CCT
–Routing should also take the flow dependence within a coflow into account
5
Motivation Example
Case 3: Coflow-aware routing + scheduling
–Average CCT = 1.3 s
–Jointly optimizing routing and scheduling can minimize the average CCT
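The averages in the three cases can be checked with a small calculation. The sketch below is a hypothetical two-link model, not the topology shown on the slides: it assumes every flow is pinned to one of two 100 Mbps bottleneck links, that each link serves its flows one at a time in coflow priority order (coflow a before coflow b), and that the placements listed for each case are the ones that occur. With sizes in Mb and bandwidth in Mbps, the computed times come out in seconds.

```python
# Hypothetical two-link model of the motivation example (not the slide topology).
FLOWS = {"a1": 40, "a2": 100, "b1": 60, "b2": 100}  # flow sizes in Mb
BANDWIDTH = 100  # Mbps on every link


def average_cct(placement, order=("a", "b")):
    """Return the average coflow completion time in seconds.

    placement: list of links, each a list of flow names routed over that link.
    order: coflow priority; each link serves its flows one at a time in this order.
    """
    finish = {}
    for link in placement:
        clock = 0.0
        # Flows of the higher-priority coflow go first on this link.
        for name in sorted(link, key=lambda f: order.index(f[0])):
            clock += FLOWS[name] / BANDWIDTH
            finish[name] = clock
    # A coflow completes when its last flow completes.
    ccts = [max(t for f, t in finish.items() if f[0] == c) for c in order]
    return sum(ccts) / len(ccts)


# Assumed placements for the three cases.
case1 = average_cct([["a2", "b2"], ["a1", "b1"]])  # collision of the two large flows
case2 = average_cct([["a1", "a2"], ["b1", "b2"]])  # balanced load, coflow-agnostic
case3 = average_cct([["a2", "b1"], ["a1", "b2"]])  # coflow-aware placement
print(round(case1, 2), round(case2, 2), round(case3, 2))
```

Under these assumptions the first two placements both give an average of 1.5 and the third gives 1.3: in the third placement coflow a still finishes after 1.0, while coflow b finishes after 1.6 instead of 2.0 (case 1), and unlike case 2 coflow a is not delayed to 1.4.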
6
Desirable Properties of RAPIER
7
Main idea
Coflow-level Routing
–Distribute all the flows in a coflow evenly across the network
Coflow-level Scheduling
–Minimal remaining time first principle
Starvation-free
–Schedule a coflow first if it has been waiting for a long time
Work-conserving
–Allocate all available bandwidth whenever there is demand to serve
Coexistence
–Route mice flows with ECMP and give them the highest priority
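The scheduling and starvation-free rules above can be illustrated with a short selection routine. This is only a sketch of the two principles, not RAPIER's actual algorithm: the `Coflow` record, the `pick_next` function, and the `starvation_threshold` parameter are hypothetical names introduced here, and the slides do not say how "a long time" is quantified.

```python
from dataclasses import dataclass


@dataclass
class Coflow:
    name: str
    remaining_time: float  # estimated seconds needed to finish if served now
    waiting_time: float    # seconds the coflow has waited so far


def pick_next(coflows, starvation_threshold=10.0):
    """Pick the coflow to serve next.

    starvation_threshold is an assumed tuning knob (seconds), not from the slides.
    """
    # Starvation-free: a coflow that has waited too long is served first.
    starving = [c for c in coflows if c.waiting_time >= starvation_threshold]
    if starving:
        return max(starving, key=lambda c: c.waiting_time)
    # Otherwise apply the minimal remaining time first principle.
    return min(coflows, key=lambda c: c.remaining_time)


queue = [
    Coflow("small", remaining_time=1.0, waiting_time=0.5),
    Coflow("large", remaining_time=8.0, waiting_time=2.0),
]
print(pick_next(queue).name)  # the coflow with the least remaining time

queue[1].waiting_time = 12.0  # the large coflow has now waited past the threshold
print(pick_next(queue).name)  # the starving coflow is promoted
```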
8
RAPIER in a Nutshell
[Figure with steps annotated: "For starvation-free", "For minimal remaining time first", "For work-conserving"]
9
Minimize single coflow completion time
Non-linear with integer variable
–Let a_i = 1/t_i
Non-linear with integer variable
–Relax integer constraint
Non-linear without integer variable
–Let m^k_ij = a_i x^k_ij
Linear programming
–Route demand i to j on the path with the largest x and re-solve (2)
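The final rounding step, routing each demand on the path with the largest x, can be sketched as follows. The input format (a dictionary from a source/destination pair to a list of fractional path shares taken from the relaxed LP solution) and the name `round_paths` are assumptions made for illustration; solving the LP itself and re-solving problem (2) with the chosen paths are left out.

```python
def round_paths(fractional):
    """Pick one path per demand from a fractional LP solution.

    fractional: {(src, dst): [x_1, ..., x_K]}, where x_k is the fractional share
    of the demand placed on candidate path k (assumed input format).
    Returns {(src, dst): index of the path with the largest share}.
    """
    return {
        demand: max(range(len(shares)), key=shares.__getitem__)
        for demand, shares in fractional.items()
    }


# Made-up fractional solution for two demands with up to three candidate paths.
solution = {
    ("M1", "M4"): [0.2, 0.7, 0.1],
    ("M2", "M5"): [0.5, 0.5],
}
print(round_paths(solution))
```

Ties are broken toward the lowest path index here; that choice is arbitrary and not specified by the slides.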
10
Relaxation and Rounding
Problem (2)
Problem (4)
Theorem 1: Let t_min be the minimum CCT and t_alg the CCT obtained by Algorithm 2, and let K be the number of candidate paths for each flow. (The bound itself is not preserved in this transcript.)
11
Bandwidth Allocation
–Large coflow first for starvation-free
–Large flow first to reduce CCT
12
Implementation
Central controller
–Algorithm 1
End host enforcement modules
–OpenFlow-based explicit routing
–Bandwidth enforcement
No device modification is required!
13
Experiment on Testbed
–Pronto 3295 48-port Gigabit Ethernet switch running PicOS 2.04
–Each server has a 4-core Intel E5-1410 2.8 GHz CPU, 8 GB of memory, a 500 GB hard disk, and 1G Ethernet NICs
–The servers run 64-bit Debian 6.0 with the Linux 2.6.38.3 kernel
14
Experiment Results

                                                              Coflow Completion Time (s)
Coflow ID  Flow IDs  Source / Destination  Volume (GB)    RAPIER   Routing   Baseline
1          1, 2, 3   M1 M2 M3 M4 M5 M9     3.17, 5.29     50.6     84.1      107.1
2          4, 5      M8 M6 M5              10.6, 5.29     100.9    203.0     289.5
3          6, 7      M7 M9 M4 M6           17.9, 10.6     201.1    204.1     289.2
Average completion time                                   117.5    163.7     228.6

RAPIER reduces the average CCT by 48.6% compared to the baseline scheme and by 28.22% compared to the routing-only scheme.
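The averages and the two percentages follow directly from the per-coflow completion times in the table. The short check below uses only those nine reported values; the dictionary layout and the `reduction` helper are illustrative names, not part of the original work.

```python
# Per-coflow completion times in seconds, as reported in the results table.
cct = {
    "RAPIER": [50.6, 100.9, 201.1],
    "Routing": [84.1, 203.0, 204.1],
    "Baseline": [107.1, 289.5, 289.2],
}

# Average completion time per scheme.
avg = {scheme: sum(times) / len(times) for scheme, times in cct.items()}


def reduction(new, old):
    """Percentage by which `new` is smaller than `old`."""
    return 100.0 * (old - new) / old


for scheme, value in avg.items():
    print(f"{scheme}: {value:.1f} s")
print(f"vs baseline: {reduction(avg['RAPIER'], avg['Baseline']):.1f}%")
print(f"vs routing-only: {reduction(avg['RAPIER'], avg['Routing']):.2f}%")
```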
15
Simulation Settings
–C/C++ based flow-level simulator
–CPLEX 10.0 for solving the LP
–Fattree and VL2 topologies with 512 servers
–Flows in a coflow arrive simultaneously
–Inter-coflow arrivals follow a Poisson distribution
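One common way to generate such a workload is to draw exponentially distributed gaps between consecutive coflows, which yields Poisson arrivals. The sketch below assumes that interpretation; it is not the authors' C/C++ simulator, and `coflow_arrival_times`, `mean_interval`, and `seed` are names chosen here for illustration.

```python
import random


def coflow_arrival_times(n, mean_interval, seed=0):
    """Return n coflow arrival times (seconds) for a Poisson arrival process.

    mean_interval: assumed average gap between consecutive coflows, in seconds.
    seed: fixed seed so that a run can be repeated.
    """
    rng = random.Random(seed)
    now = 0.0
    times = []
    for _ in range(n):
        # Exponential inter-arrival gaps with rate 1 / mean_interval.
        now += rng.expovariate(1.0 / mean_interval)
        times.append(now)
    return times


arrivals = coflow_arrival_times(n=5, mean_interval=2.0)
# All flows of a coflow share that coflow's arrival time.
print([round(t, 3) for t in arrivals])
```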
16
Impact of coflow width
–RAPIER reduces the average CCT by up to 79.44% in Fattree and 55.55% in VL2
–The routing-only scheme performs better when the coflow width is small
–The scheduling-only scheme performs better when the coflow width is large
17
Impact of coflow number
–RAPIER keeps relatively stable performance across different numbers of coflows
–The scheduling-only scheme is more effective in VL2 than in Fattree
18
Impact of inter-coflow arrival interval
–The average CCT decreases as the average inter-coflow arrival interval increases
–RAPIER shows the same trend as the scheduling-only scheme when the inter-coflow arrival interval is small
–RAPIER shows the same trend as the routing-only scheme when the inter-coflow arrival interval is large
19
Simulation Results Summary
–In the light-load scenario, routing contributes more by solving the flow path collision problem in ECMP
–In the heavy-load scenario, scheduling contributes more by determining the sending order of flows and coflows
–RAPIER integrates both schemes and gets the benefits of each
20
Conclusion
–RAPIER is a system that optimizes the average coflow completion time in DCNs by integrating routing and scheduling
–RAPIER follows the minimal remaining time first principle to reduce the average coflow completion time
–We implemented a prototype of RAPIER
–Simulation results show that RAPIER can greatly reduce the average coflow completion time in DCNs
21
The end! Thanks for your attention!