Fast Congestion Control in RDMA-Based Datacenter Networks

Jiachen Xue, Muhammad Usama Chaudhry, Balajee Vamanan, Mithuna Thottethodi, and T. N. Vijaykumar

INTRODUCTION

Our proposal, Dart, employs a divide-and-specialize approach to congestion control:
- It addresses receiver congestion via direct apportioning of sending rates (DASR), using the sender count to achieve accurate, one-RTT convergence of sending rates, whereas previous schemes converge iteratively.
- It addresses spatially-localized in-network congestion via in-order flow deflection, whereas previous schemes reorder packets, which RDMA does not support.

MOTIVATION

- RDMA is becoming prevalent in datacenters. It shortens latency by a factor of 50 compared to TCP and is a potential replacement for TCP in datacenters.
- Existing RDMA networks are inefficient: they provide hop-by-hop flow control and rate-based end-to-end congestion control, which is suboptimal for the well-known datacenter congestion problem called incast, where multiple flows collide at a switch, causing queueing delays and long latency tails. Convergence can also be slow.
- Existing congestion control schemes react to congestion by (1) detecting congestion via RTT or ECN and (2) throttling at the senders, converging to the desired rate over several iterations.

DASR: DIRECT APPORTIONING OF SENDING RATES

A single sender continues to transmit at the line rate without throttling, as it sees 'N = 1' in the ACKs from the receiver. When a second sender initiates a flow to the same receiver, the receiver piggybacks the updated 'N = 2' value to each sender, throttling their rates to half of the line rate, which can be sustained in steady state. (A minimal sketch of this apportioning appears after the design sections below.)

DART STATE MACHINE

Key observation: while general congestion is complex and may require iterative convergence, the simpler and common case of receiver congestion can be addressed more quickly via specialization. Dart uses a state machine to separate receiver congestion from in-network congestion:
- Line-rate receiver throughput and no ECN marks indicate no contention at the receiver, so no action is required.
- Receiver throughput below the line rate with no ECN marks indicates receiver congestion, which DASR handles by piggybacking the 'N' values while omitting ECN marks.
- Receiver throughput below the line rate with ECN marks present indicates in-network congestion; Dart falls back to DCQCN and includes the ECN marks in the ACKs.
(A sketch of this decision also appears below.)

IN-ORDER FLOW DEFLECTION

Dart deflects packets of short flows to avoid the serialization penalty. Packet routing uses a deflected flow table (DFT) lookup:
- Each switch maintains a small content-addressable memory (CAM) called the deflected flow table (DFT).
- Entries in the DFT are allocated when the start packet of an RDMA message is chosen for deflection.
- Entries in the DFT are deallocated when the end of an RDMA message passes through the switch.
(A sketch of the DFT behavior follows.)
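To make the DASR apportioning concrete, here is a minimal sketch, assuming an illustrative Receiver/Sender interface and a 56 Gbps line rate; it is not the authors' implementation, only an illustration of rate = line rate / N converging from a single ACK.

```python
# Minimal sketch of DASR-style direct apportioning of sending rates.
# All names are illustrative; this is not the authors' implementation.

LINE_RATE_GBPS = 56.0  # e.g., the 56 Gbps ConnectX-3 links from the test bed


class Receiver:
    """Tracks the number of active senders and piggybacks it on ACKs."""

    def __init__(self):
        self.active_senders = set()

    def on_flow_start(self, sender_id):
        self.active_senders.add(sender_id)

    def on_flow_end(self, sender_id):
        self.active_senders.discard(sender_id)

    def ack_fields(self):
        # The receiver piggybacks N (its current sender count) on every ACK.
        return {"N": max(1, len(self.active_senders))}


class Sender:
    """Sets its sending rate directly from the piggybacked sender count."""

    def __init__(self):
        self.rate_gbps = LINE_RATE_GBPS  # start at line rate (N = 1)

    def on_ack(self, ack):
        # One ACK is enough to converge: rate = line rate / N, no iteration.
        self.rate_gbps = LINE_RATE_GBPS / ack["N"]


# Example: a second sender joins, so both senders drop to half line rate.
rx, s1, s2 = Receiver(), Sender(), Sender()
rx.on_flow_start("s1")
s1.on_ack(rx.ack_fields())          # N = 1 -> 56 Gbps
rx.on_flow_start("s2")
s1.on_ack(rx.ack_fields())          # N = 2 -> 28 Gbps
s2.on_ack(rx.ack_fields())          # N = 2 -> 28 Gbps
print(s1.rate_gbps, s2.rate_gbps)   # 28.0 28.0
```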
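The state-machine decision can likewise be summarized as a small classification function; the field names (rx_throughput_gbps, ecn_marked) and the exact comparison against the line rate are assumptions for illustration, not the paper's interface.

```python
# Sketch of the Dart state-machine decision described above.
# Field names and the classification interface are assumptions.

from enum import Enum, auto


class Congestion(Enum):
    NONE = auto()        # line rate achieved, no ECN marks
    RECEIVER = auto()    # below line rate, no ECN marks -> handled by DASR
    IN_NETWORK = auto()  # ECN marks present -> fall back to DCQCN


def classify(rx_throughput_gbps, line_rate_gbps, ecn_marked):
    """Classify congestion from receiver throughput and ECN marks."""
    if not ecn_marked and rx_throughput_gbps >= line_rate_gbps:
        return Congestion.NONE
    if not ecn_marked:
        return Congestion.RECEIVER   # DASR piggybacks N, omits ECN marks
    return Congestion.IN_NETWORK     # include ECN marks in ACKs, use DCQCN


assert classify(56.0, 56.0, ecn_marked=False) is Congestion.NONE
assert classify(28.0, 56.0, ecn_marked=False) is Congestion.RECEIVER
assert classify(28.0, 56.0, ecn_marked=True) is Congestion.IN_NETWORK
```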
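Finally, a sketch of the deflected flow table's allocate-on-start, deallocate-on-end behavior; the dictionary-based CAM, its capacity, and the port-selection policy are placeholders rather than the paper's switch design.

```python
# Sketch of a deflected flow table (DFT): allocate an entry when the start
# packet of an RDMA message is deflected, deallocate it when the end packet
# passes through. The dict-based CAM and port choice are placeholders.


class DeflectedFlowTable:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}  # flow_id -> deflection output port

    def route(self, pkt, default_port, deflect_port):
        flow = pkt["flow_id"]
        if flow in self.entries:
            port = self.entries[flow]            # keep the flow in order
        elif pkt["is_start"] and pkt["deflect"] and len(self.entries) < self.capacity:
            self.entries[flow] = deflect_port    # allocate on deflected start packet
            port = deflect_port
        else:
            port = default_port
        if pkt["is_end"]:
            self.entries.pop(flow, None)         # deallocate on the end packet
        return port


dft = DeflectedFlowTable()
start = {"flow_id": 7, "is_start": True, "is_end": False, "deflect": True}
mid = {"flow_id": 7, "is_start": False, "is_end": False, "deflect": False}
end = {"flow_id": 7, "is_start": False, "is_end": True, "deflect": False}
print([dft.route(p, default_port=1, deflect_port=2) for p in (start, mid, end)])  # [2, 2, 2]
```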
METHODOLOGY

- Micro-benchmark: a 20-node test bed, each node equipped with eight 4-core AMD Opteron processors running at 2.8 GHz, 256 GB of RAM, and a Mellanox ConnectX-3 HCA (56 Gbps). Workload: a synthetic workload with fixed-size messages.
- At-scale simulations (ns-3): 1024 hosts connected in an over-subscribed Clos topology with an over-subscription factor of 4. Workload: based on real datacenter traffic characteristics.

MICRO-BENCHMARK RESULTS

- DASR reduces the median and tail latencies by 2.5x to 3.3x.

AT-SCALE SIMULATIONS

- 99th-percentile latency of short flows: Dart reduces tail latency by 4.8x.
- Throughput of long flows: Dart achieves 58% higher throughput.

PREVIOUS WORK

Congestion control:
- DCTCP modulates the sending rate by observing ECN marks in each RTT.
- DCQCN uses ECN marks and provides congestion control for RoCE.
- Timely uses round-trip time (RTT) measurements for rate control.
- All of these schemes need many RTTs to converge to the appropriate sending rate.

Load balancing:
- DIBS deflects packets randomly to avoid dropping them.
- DRILL distributes load at the granularity of individual packets based on queue lengths and randomization.
- Hermes reroutes based on global congestion and failures.
- All of these schemes require a reordering mechanism at the end host, but RDMA does not support packet reordering.

CONCLUDING REMARKS

- Dart's divide-and-specialize approach isolates the common case of receiver congestion and further sub-divides the remaining in-network congestion into the simpler spatially-localized and the harder spatially-dispersed cases.
- Dart converges to the desired sending rate in one RTT and achieves 60% (2.5x) lower latency than, and similar throughput to, InfiniBand.

ABOUT THE AUTHORS

- Jiachen Xue (xuej@purdue.edu), Ph.D. student at Purdue University (ECE Department)
- Muhammad Usama Chaudhry (mchaud30@uic.edu), M.S. student at UIC (CS Department)
- Balajee Vamanan (bvamanan@uic.edu), Assistant Professor at UIC (CS Department), https://www.cs.uic.edu/~balajee
- Mithuna Thottethodi (mithuna@purdue.edu), Professor at Purdue University (ECE Department), https://engineering.purdue.edu/~mithuna
- T. N. Vijaykumar (vijay@ecn.purdue.edu), https://engineering.purdue.edu/~vijay