TCP Tuning Domenico Vicinanza DANTE, Cambridge, UK EGI Technical Forum 2013, Madrid, Spain

2 Connect | Communicate | Collaborate TCP
Transmission Control Protocol (TCP): one of the original core protocols of the Internet protocol suite, carrying more than 90% of Internet traffic.
A transport-layer protocol: it delivers a stream of bytes between programs running on computers connected to a local area network, an intranet or the public Internet.
TCP communication is: connection oriented, reliable, ordered, error-checked.
Web browsers, mail servers and file transfer programs all use TCP.

3 Connect | Communicate | Collaborate Connection-Oriented A connection is established before any user data is transferred. If the connection cannot be established, the user program is notified. If the connection is ever interrupted, the user program(s) are notified.

4 Connect | Communicate | Collaborate Reliable
TCP uses a sequence number to identify each byte of data. The sequence number identifies the order of the bytes sent, so the data can be reconstructed in order regardless of any fragmentation, reordering or packet loss that may occur during transmission. For every payload byte transmitted, the sequence number is incremented.

5 Connect | Communicate | Collaborate TCP Segments
The block of data that TCP asks IP to deliver is called a TCP segment. Each segment contains data and control information.

6 Connect | Communicate | Collaborate TCP Segment Format
Header fields: Source Port, Destination Port, Sequence Number, Acknowledgment Number, Data Offset, Reserved, Control flags, Window, Checksum, Urgent Pointer, Options (if any), followed by the Data.

7 Connect | Communicate | Collaborate Client Starts
A client starts by sending a SYN segment with the following information:
Client's ISN (generated pseudo-randomly)
Maximum receive window for the client
Optionally (but usually) the MSS (largest segment it will accept)

8 Connect | Communicate | Collaborate Server Response
When a waiting server sees a new connection request, the server sends back a SYN segment with:
Server's ISN (generated pseudo-randomly)
Acknowledgment Number is Client's ISN+1
Maximum receive window for the server
Optionally (but usually) the MSS

9 Connect | Communicate | Collaborate Connection established! When the Server’s SYN is received, the client sends back an ACK with: Acknowledgment Number is Server’s ISN+1

10 Connect | Communicate | Collaborate In blocks:
Client → Server: SYN, ISN=X
Server → Client: SYN, ISN=Y, ACK=X+1
Client → Server: ACK=Y+1

11 Connect | Communicate | Collaborate Cumulative acknowledgement
Cumulative acknowledgment: the receiver sends an acknowledgment when it has received all data preceding the acknowledged sequence number.
This is inefficient when packets are lost.
Example: 10,000 bytes are sent in 10 different TCP packets and the first packet is lost during transmission. The receiver cannot say that it received bytes 1,000 to 9,999 successfully, so the sender may then have to resend all 10,000 bytes.

12 Connect | Communicate | Collaborate Selective acknowledgment
The selective acknowledgment (SACK) option is defined in RFC 2018.
It lets the receiver acknowledge discontinuous blocks of data received correctly; the acknowledgement can specify a number of SACK blocks.
In the previous example: the receiver would send a SACK covering sequence numbers 1000 to 10000, and the sender would thus retransmit only the first packet, bytes 0 to 999.

13 Connect | Communicate | Collaborate Buffering
TCP works by buffering data at both the sender and the receiver.

14 Connect | Communicate | Collaborate Dynamic transmission tuning
Data to be sent is temporarily stored in a send buffer, where it stays until it is ACK'd.
The TCP layer won't accept data from the application unless there is buffer space.
Both the client and the server announce how much buffer space remains (the Window field in a TCP segment) with every ACK, so TCP knows when it is time to send a datagram.

15 Connect | Communicate | Collaborate Flow control
Limits the sender rate to guarantee reliable delivery and avoid flooding the receiver.
The receiver continually hints to the sender how much data it can receive.
When the receiving host's buffer fills, the next ACK carries a window size of 0; this stops the transfer and allows the data in the buffer to be processed.

16 Connect | Communicate | Collaborate TCP Tuning
Adjusting the network congestion avoidance parameters for TCP.
Typically used over high-bandwidth, high-latency networks: long-haul links (Long Fat Networks), intercontinental circuits.
A well-tuned connection can perform many times faster than an untuned one.

17 Connect | Communicate | Collaborate Tuning buffers
Most operating systems limit the amount of system memory that can be used by a TCP connection: the maximum TCP buffer (memory) space.
Default maximum values are typically too small for network measurement and troubleshooting purposes.
Linux (like many OSes) supports separate send and receive buffer limits.
Buffer limits can be adjusted by the user, the application or other mechanisms, within the maximum memory limits above.

18 Connect | Communicate | Collaborate BDP Bandwidth Delay Product
BDP = bandwidth * round-trip time
The number of bytes in flight needed to fill the path: the maximum amount of unacknowledged data on the wire, i.e. the maximum number of simultaneous bits in transit between the transmitter and the receiver.
High performance networks have very large BDPs.
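As a worked example (using the 1 Gbit/s bandwidth and ~168 ms RTT figures that appear later in these slides for the Madrid-Mumbai path), the BDP can be computed with simple shell arithmetic:
$ # BDP = bandwidth [bytes/s] * RTT [s] = (10^9 / 8) * 0.168
$ echo $(( 1000000000 / 8 * 168 / 1000 ))
21000000
i.e. roughly 20 MBytes of data must be in flight to keep such a path full.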

19 Connect | Communicate | Collaborate TCP receive buffer

20 Connect | Communicate | Collaborate Optimising buffers
The TCP receive and send buffers need tuning: they should ideally be equal to the BDP to achieve maximum throughput.
The sending side should allocate the same amount of memory as the receiving side: after data has been sent on the network, the sending side must hold it in memory until it has been ACK'd.
If the receiver is far away, ACKs take a long time to arrive; if the send buffer is small, it can saturate and block transmission.

21 Connect | Communicate | Collaborate Madrid-Mumbai Bandwidth

22 Connect | Communicate | Collaborate Madrid-Mumbai Retransmission

23 Connect | Communicate | Collaborate BDP as optimal buffer parameter
Bandwidth increases with buffer size until the buffer reaches the BDP.
With an RTT of ~168 ms and the bandwidth limited by the 1GE interface, the BDP is ~20 MBytes.

24 Connect | Communicate | Collaborate Checking send and receive buffers
To check the current values type either:
$ sysctl net.core.rmem_max
net.core.rmem_max = 65535
$ sysctl net.core.wmem_max
net.core.wmem_max = 65535
or
$ cat /proc/sys/net/core/rmem_max
65535
$ cat /proc/sys/net/core/wmem_max
65535

25 Connect | Communicate | Collaborate Setting send and receive buffers
To change those values simply type:
$ sysctl -w net.core.rmem_max=33554432
$ sysctl -w net.core.wmem_max=33554432
In this example the value 32 MByte has been chosen: 32 x 1024 x 1024 = 33554432 Byte
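Note that values written with sysctl -w last only until the next reboot. To make them persistent, the usual approach (a general Linux convention, not covered on these slides) is to add the corresponding lines to /etc/sysctl.conf and reload them:
# /etc/sysctl.conf - example entries, assuming the 32 MByte maximum chosen above
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
$ sysctl -p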

26 Connect | Communicate | Collaborate Autotuning buffers
Autotuning automatically tunes the TCP receive window size for each individual connection, based on the BDP and on the rate at which the application reads data from the connection.
The Linux autotuning TCP buffer limits can also be tuned. They are arrays of three values: minimum, initial and maximum buffer size, used to set the bounds on autotuning and to balance memory usage while under memory stress.
These are controls on the actual memory usage (not just the TCP window size), so they include the memory used by the socket data structures.
The maximum values have to be larger than the BDP. Example: for a BDP of the order of 20 MB, we can choose 32 MB.

27 Connect | Communicate | Collaborate Check and set autotuning buffers
To check the TCP autotuning buffers we can use sysctl:
$ sysctl net.ipv4.tcp_rmem
$ sysctl net.ipv4.tcp_wmem
It is best to set the initial (default) value to something suitable for typical small flows: an excessively large initial buffer wastes memory and can even hurt performance.
To set them:
$ sysctl -w net.ipv4.tcp_rmem="min default max"
$ sysctl -w net.ipv4.tcp_wmem="min default max"
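As a hypothetical illustration (the minimum and default values below are assumptions, not taken from the slides; only the 32 MB maximum follows from the earlier BDP discussion), the commands might look like:
$ # assumed example values: 4 KB minimum, ~85/64 KB default, 32 MB maximum
$ sysctl -w net.ipv4.tcp_rmem="4096 87380 33554432"
$ sysctl -w net.ipv4.tcp_wmem="4096 65536 33554432"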

28 Connect | Communicate | Collaborate Checking and enabling autotuning
TCP autotuning is normally enabled by default. To check, type:
$ sysctl net.ipv4.tcp_moderate_rcvbuf
net.ipv4.tcp_moderate_rcvbuf = 1
or
$ cat /proc/sys/net/ipv4/tcp_moderate_rcvbuf
1
If the parameter tcp_moderate_rcvbuf is present and has value 1, then autotuning is enabled. With autotuning, the receive buffer size (and TCP window size) is dynamically updated (autotuned) for each connection.
If it is not enabled, it can be enabled by typing:
$ sysctl -w net.ipv4.tcp_moderate_rcvbuf=1

29 Connect | Communicate | Collaborate Interface queue length
An improvement at the NIC driver level: increase the size of the interface queue. To do this, run the following command:
$ ifconfig eth0 txqueuelen 1000
txqueuelen is the maximum number of packets that can be buffered on the egress queue of a Linux network interface. With a longer queue, more packets can be buffered and hence not lost.
In TCP, an overflow of this queue will cause loss, and TCP will enter congestion control mode.
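On systems where the iproute2 tools are preferred over ifconfig, the equivalent setting (same parameter, different tool; eth0 is just the example interface name) would be:
$ ip link set dev eth0 txqueuelen 1000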

30 Connect | Communicate | Collaborate Additional tuning
Verify that the following variables are all set to the default value of 1:
net.ipv4.tcp_window_scaling
net.ipv4.tcp_timestamps
net.ipv4.tcp_sack
Otherwise set them using:
$ sysctl -w net.ipv4.tcp_window_scaling=1
$ sysctl -w net.ipv4.tcp_timestamps=1
$ sysctl -w net.ipv4.tcp_sack=1

31 Connect | Communicate | Collaborate What not to change
We suggest not adjusting tcp_mem unless there is some specific need. It is an array that determines how the system balances the total network buffer space against all other LOWMEM memory usage; it is initialized at boot time to appropriate fractions of the available system memory.
In the same way, there is normally no need to adjust rmem_default or wmem_default: these are the default buffer sizes for non-TCP sockets (e.g. Unix domain and UDP sockets).

32 Connect | Communicate | Collaborate Congestion window and slow start
Congestion window: an estimate of how much congestion there is between sender and receiver, maintained at the sender.
Slow start: increases the congestion window after a connection is initialized and after a timeout. It starts with a window of 1 maximum segment size (MSS); for every packet acknowledged, the congestion window increases by 1 MSS, so the congestion window effectively doubles every round-trip time (RTT). Actually not so slow…
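A quick worked example of that doubling (assuming no loss and an unlimited receive window): starting from 1 MSS, after n round trips the window is 2^n MSS, so after 10 RTTs it has already grown to 2^10 = 1024 MSS.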

33 Connect | Communicate | Collaborate TCP Congestion control
Initially only one algorithm was available: Reno, with a linear increase of the congestion window that typically drops to half its size when a packet is lost.
Starting from Linux 2.6.7, alternative congestion control algorithms were implemented to recover quickly from packet loss on high-speed, high-BDP networks.
The set of available congestion control options is selected when the kernel is built.

34 Connect | Communicate | Collaborate Some congestion control examples
The following are some of the options available in the 2.6 kernel:
reno: traditional TCP used by almost all other OSes (default with old Linux kernels). It adjusts the congestion window based on packet loss, with an additive increase of the window on each ACK and a multiplicative decrease on loss.
cubic: faster (cubic-function) recovery on packet loss; efficient for high-BDP networks.
bic: combines two schemes, additive increase and binary search increase. It promises fairness as well as good scalability; under small congestion windows, binary search increase is designed to provide TCP friendliness. Default congestion control in many Linux distributions.

35 Connect | Communicate | Collaborate Some congestion control examples (cont.)
hstcp: an adaptive algorithm that increases its additive-increase parameter and decreases its decrease parameter in relation to the current congestion window size.
vegas: measures bandwidth based on RTT and adjusts the congestion window accordingly.
westwood: optimized for lossy networks; the focus is on wireless networks (where packet loss does not necessarily mean congestion).
htcp (Hamilton TCP): an optimized congestion control algorithm for high-speed networks with high latency (LFN: Long Fat Networks). Hamilton TCP increases the rate of additive increase as the time since the previous loss increases; this avoids the problem of making flows more aggressive if their windows are already large (as cubic does).

36 Connect | Communicate | Collaborate Congestion control: Reno with two flows

37 Connect | Communicate | Collaborate Congestion control: BIC with two flows

38 Connect | Communicate | Collaborate Congestion control: Hamilton with two flows

39 Connect | Communicate | Collaborate During the next session…
Practical examples of working with TCP buffers, using the perfSONAR web UI.

40 Connect | Communicate | Collaborate

41 Connect | Communicate | Collaborate Checking and setting congestion control
To get a list of the congestion control algorithms available in your kernel, run:
$ sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = cubic reno bic
To know which congestion control is in use:
$ sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = reno
To set the congestion control:
$ sysctl -w net.ipv4.tcp_congestion_control=cubic
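If an algorithm does not appear in the available list, it may have been built as a kernel module and need loading first; for example (assuming the module is shipped with your kernel, as it is on most distributions):
$ modprobe tcp_htcp
$ sysctl net.ipv4.tcp_available_congestion_control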

42 Connect | Communicate | Collaborate Final considerations
Large MTUs: if a Linux host is configured to use 9 KB MTUs but the connection uses 1500-byte packets, then roughly 9/1.5 = 6 times more buffer space is actually needed in order to fill the pipe.
In fact, some device drivers only allocate memory in power-of-two sizes, so up to 16/1.5 ≈ 11 times more may be needed.

43 Connect | Communicate | Collaborate Final considerations (cont.)
Very large BDP paths: for very large BDP links (>20 MB) there could be a problem with the Linux SACK implementation.
With too many packets in flight, when a SACK event arrives it takes too long to locate the SACKed packet; the result is a TCP timeout and CWND goes back to 1 packet.
Restricting the TCP buffer size to about 12 MB seems to avoid this problem, but it clearly limits the total throughput. Another solution is to disable SACK.
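Should disabling SACK be needed, it uses the same tcp_sack parameter shown earlier (set it back to 1 to re-enable):
$ sysctl -w net.ipv4.tcp_sack=0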

44 Connect | Communicate | Collaborate GÉANT Slides

45 Connect | Communicate | Collaborate Europe's 100Gbps Network - e-Infrastructure for the "data deluge"
Latest transmission and switching technology: routers with 100Gbps capability, and an optical transmission platform designed to provide 500Gbps super-channels.
12,000km of dark fibre and over 100,000km of leased capacity (including transatlantic connections).
28 main sites covering the European footprint.

46 Connect | Communicate | Collaborate GÉANT Global Connectivity - at the heart of global research networking GÉANT connects 65 countries outside of Europe, reaching all continents through international partners

47 Connect | Communicate | Collaborate Supporting the growth of R&E Communities - transforming how researchers collaborate
GÉANT delivers real value and benefit to society by enabling research communities to transform the way they collaborate on ground-breaking research.
Together with Europe's NRENs, GÉANT connects 50 million users in 10,000 institutions across Europe.
Health and Medicine | Energy | Environment | Particle Physics | Radio Astronomy | Arts & Education | Society

48 Connect | Communicate | Collaborate Innovation through collaboration - for delivery of advanced networking services
Building the GÉANT "eco-system" through development and delivery of a world-class networking service portfolio: flexible connectivity options & test-bed facilities, performance tools & expertise, advanced AAI, cloud and mobility services.
Collaborative research into state-of-the-art technology: network architectures (mobility, cloud, sensor, scientific content delivery, high-speed mobile identity and trust technologies), paradigm shifts in service provisioning and management, and influencing global standards development.
Open Calls to widen the scope and agility for innovation.
Delivering innovative services to end users, their projects and institutions across Europe and beyond: secure access to the network and resources they need, when and where they want it.