ICTCP: Incast Congestion Control for TCP in Data Center Networks∗


ICTCP: Incast Congestion Control for TCP in Data Center Networks∗ Haitao Wu⋆, Zhenqian Feng⋆†, Chuanxiong Guo⋆, Yongguang Zhang⋆ {hwu, v-zhfe, chguo, ygz}@microsoft.com, ⋆Microsoft Research Asia, China, †School of Computer, National University of Defense Technology, China. Presented by 謝宗昊 (B99106017, Library and Information Science, Year 3)

Outline Background Design Rationale Algorithm Implementation Experimental results Discussion and conclusion

Background In distributed file systems, files are stored at multiple servers. TCP does not work well for the many-to-one traffic pattern in high-bandwidth, low-latency networks.

Background Three preconditions for incast in data centers: (1) the network is well structured and layered to achieve high bandwidth and low latency, and the buffer size of ToR (top-of-rack) switches is small; (2) the barrier-synchronized many-to-one traffic pattern is common in data center networks; (3) the transmission data volume for such a traffic pattern is usually small.

Background TCP incast collapse: caused by multiple connections overflowing the Ethernet switch buffer in a short period of time, leading to intense packet losses and thus TCP retransmissions and timeouts. Previous solutions: reduce the waiting time for packet-loss recovery (a smaller retransmission timeout), or control switch buffer occupation to avoid overflow by using ECN and a modified TCP on both the sender and receiver sides.

Background This paper focuses on: avoiding packet losses before incast congestion happens, and modifying the TCP receiver only. The receiver side knows the throughput of all TCP connections and the available bandwidth.

Background Controlling the receive window well is challenging: the window should be small enough to avoid incast congestion, yet large enough for good performance in non-incast cases. A good setting for one scenario may not fit others.

Background The technical novelties of this paper: use the available bandwidth as a quota to coordinate receive-window increases; perform per-flow congestion control independently, in time slots of one RTT per connection; adjust the receive window based on the ratio of the difference between measured and expected throughput to the expected throughput.

Background TCP incast congestion. "Goodput" is the throughput obtained and observed at the application layer. TCP incast congestion happens when multiple sending servers under the same ToR switch send to one receiver server simultaneously; TCP throughput is severely degraded under incast congestion.

Background TCP goodput, receive window and RTT. A small static TCP receive buffer may prevent TCP incast congestion collapse, but it cannot adapt dynamically, and it requires either losses or ECN marks to trigger a window decrease.

Background TCP goodput, receive window and RTT. TCP Vegas assumes that an increase in RTT is caused only by packet queuing at the bottleneck buffer. Unfortunately, the increase of TCP RTT in high-bandwidth, low-latency networks does not follow this assumption.

Outline Background Design Rationale Algorithm Implementation Experimental results Discussion and conclusion

Design Rationale Goal: improve TCP performance under incast congestion, with no new TCP option and no modification to the TCP header.

Design Rationale Three observations form the basis of ICTCP: (1) the available bandwidth at the receiver side is the signal for the receiver to perform congestion control; (2) receive-window-based congestion control should be paced according to each connection's feedback loop, independently per flow; (3) a receive-window-based scheme should adjust the window according to both the link congestion status and the application requirement, setting a proper receive window for all TCP connections sharing the same last hop, since parallel TCP connections may belong to the same job.

Outline Background Design Rationale Algorithm Implementation Experimental results Discussion and conclusion

Algorithm Available bandwidth. C: the link capacity of the interface on the receiver server. BW_T: the bandwidth of the total incoming traffic observed on that interface. α ∈ [0, 1]: a parameter to absorb potential oversubscription during window adjustment. BW_A: the quota that all incoming connections share when increasing their receive windows for higher throughput.

Algorithm Available bandwidth: BW_A = max(0, α · C − BW_T)
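A minimal sketch of this quota computation (the paper's evaluation uses α = 0.9; the function and parameter names below are illustrative assumptions, not the authors' driver code):

```python
# Sketch only: the available-bandwidth quota on the receiver interface.

def available_bandwidth(link_capacity_bps: float,
                        incoming_traffic_bps: float,
                        alpha: float = 0.9) -> float:
    """BW_A = max(0, alpha * C - BW_T): quota left for window increases."""
    return max(0.0, alpha * link_capacity_bps - incoming_traffic_bps)

# Example: a 1 Gbps interface carrying 800 Mbps of incoming traffic
# leaves a 100 Mbps quota when alpha = 0.9.
quota = available_bandwidth(1e9, 800e6)
```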

Algorithm Window adjustment on a single connection. b^m_i: measured incoming throughput of connection i. b^s_i: sample of the current throughput on connection i.

Algorithm Window adjustment on a single connection. b^e_i = max(b^m_i, rwnd_i / RTT_i): expected throughput, where rwnd_i is the receive window of connection i. We take the max to ensure b^m_i ≤ b^e_i.

Algorithm Window adjustment on a single connection. d^b_i = (b^e_i − b^m_i) / b^e_i: the throughput difference ratio of connection i. Since b^m_i ≤ b^e_i, we have 0 ≤ d^b_i ≤ 1.

Algorithm Window adjustment on a single connection. Two thresholds γ1, γ2 (γ2 > γ1) differentiate three cases:
(1) d^b_i ≤ γ1 or d^b_i ≤ MSS_i / rwnd_i → increase the receive window, if we are in the global second sub-slot and there is enough quota of available bandwidth;
(2) d^b_i > γ2 → decrease the receive window by one MSS, if this condition holds for three continuous RTTs;
(3) otherwise, keep the current receive window.
A newly established or long-idle connection starts in slow start; it enters congestion avoidance when case (2) or (3) is met, or when case (1) is met but there is not enough quota. A sketch of this decision logic follows.
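A sketch of the per-connection adjustment, using the definitions above (the Conn fields are assumed bookkeeping; γ1 = 0.1 and γ2 = 0.5 are the threshold values reported in the paper; this is illustrative, not the authors' NDIS code):

```python
from dataclasses import dataclass

GAMMA1, GAMMA2 = 0.1, 0.5

@dataclass
class Conn:
    b_m: float        # measured throughput b^m_i (bytes/s)
    rwnd: int         # receive window rwnd_i (bytes)
    rtt: float        # RTT_i (seconds)
    mss: int          # MSS_i (bytes)
    streak: int = 0   # continuous RTTs with d^b_i > gamma2

def adjust_rwnd(c: Conn, quota: float, in_second_subslot: bool):
    """Return (new rwnd, quota consumed in bytes/s) for one RTT slot."""
    b_e = max(c.b_m, c.rwnd / c.rtt)      # expected throughput b^e_i
    d_b = (b_e - c.b_m) / b_e             # difference ratio, 0 <= d_b <= 1

    if d_b <= GAMMA1 or d_b <= c.mss / c.rwnd:
        # Case 1: measured throughput is close to expected -- grow the
        # window by one MSS, but only in the global second sub-slot and
        # only if the extra throughput fits in the available quota.
        extra = c.mss / c.rtt
        c.streak = 0
        if in_second_subslot and quota >= extra:
            return c.rwnd + c.mss, extra
    elif d_b > GAMMA2:
        # Case 2: expected throughput is well above measured -- the window
        # is oversized; shrink by one MSS after three continuous RTTs.
        c.streak += 1
        if c.streak >= 3:
            c.streak = 0
            return max(2 * c.mss, c.rwnd - c.mss), 0.0
    return c.rwnd, 0.0                    # Case 3: keep the current window
```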

Algorithm Fairness controller for multiple connections. Fairness is only considered among low-latency flows. For window decrease, cut the receive window by one MSS on connections whose receive window is larger than the average. Window increase is achieved automatically by the single-connection algorithm above. A sketch follows.
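A sketch of the window-decrease side of the fairness controller, reusing the Conn objects from the sketch above (names are illustrative):

```python
def fairness_decrease(conns):
    """Connections above the average window give back one MSS each."""
    avg = sum(c.rwnd for c in conns) / len(conns)
    for c in conns:
        if c.rwnd > avg:
            # Keep a floor of 2 * MSS so no connection is starved.
            c.rwnd = max(2 * c.mss, c.rwnd - c.mss)
```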

Outline Background Design Rationale Algorithm Implementation Experimental results Discussion and conclusion

Implementation ICTCP is developed as an NDIS driver on the Windows OS. It naturally supports the virtual machine case, the incoming throughput over very short time scales can be easily obtained, and it does not touch the TCP/IP implementation in the Windows kernel.

Implement Redirect the packet to header parser module Packet header is parsed and the information on flow table is updated Algorithm module is responsible for receive window calculation If a TCP ACK packet is sent out, the header modifier change the receive window field in TCP header if need.

Implementation Support for virtual machines. The total capacity of the virtual NICs is typically configured higher than the physical NIC, since most virtual machines are not busy at the same time; hence the observed virtual link capacity and available bandwidth do not represent the real values. There are two solutions: change the settings so that the total capacity of the virtual NICs equals that of the physical NIC, or deploy an ICTCP driver on the virtual-machine host server.

Implementation Obtaining fine-grained RTT at the receiver. Define the reverse RTT as the RTT obtained, after an exponential filter, at the TCP receiver side. The reverse RTT can be measured from data traffic when it flows in both directions; since data traffic in the reverse direction may not be enough to keep obtaining a live reverse RTT, the TCP timestamp option is used. For the implementation, the timestamp counter is modified to 100 ns granularity. A sketch of the filter follows.
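A sketch of the exponential filter over raw reverse-RTT samples (the gain value is an assumption chosen to mirror TCP's SRTT smoothing; the paper states only that an exponential filter is used):

```python
def smooth_rtt(prev_rtt: float, sample: float, gain: float = 0.125) -> float:
    """EWMA of reverse-RTT samples; units follow the 100 ns timestamp tick."""
    return (1.0 - gain) * prev_rtt + gain * sample
```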

Outline Background Design Rationale Algorithm Implementation Experimental results Discussion and conclusion

Experimental results (figure-only slides; no transcript text was captured)

Outline Background Design Rationale Algorithm Implementation Experimental results Discussion and conclusion

Discussion and Conclusion Three issues are discussed: (1) scalability: if the number of connections becomes extremely large, switch the receive window between several values; (2) handling congestion when the sender and receiver are not under the same switch: use ECN to obtain congestion information; (3) whether ICTCP works for future high-bandwidth, low-latency networks: the switch buffer should be enlarged correspondingly, and so should the MSS.

Discussion and Conclusion ICTCP focuses on receiver-based congestion control to prevent packet loss. It adjusts the TCP receive window based on the ratio of the difference between achieved and expected per-connection throughput. Experimental results show that ICTCP is effective at avoiding incast congestion.

Thanks for listening