Anders Magnusson TCP Tuning and E2E Performance TREFpunkt - October 20, 2004.


The speed-of-light problem
The sender must store every sent packet until it has received an ACK from the receiver.
Because of the speed of light this can take a while, even in small countries like Sweden.
The theoretical RTT Luleå-Stockholm is (1000 km / 300000 km/s) * 2 = 6.7 ms; in reality it is about 20 ms.
The TCP window needed to keep up with 1 Gbit/s must then be (1000 Mbit/s / 8) * 0.02 s = 2.5 Mbyte.
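The arithmetic on this slide can be reproduced with a short sketch. The numbers come from the slide; the function names are illustrative only.

```python
# Values from the slide; function names are hypothetical helpers.
SPEED_OF_LIGHT_KM_S = 300_000  # rounded value used on the slide


def theoretical_rtt_s(distance_km: float) -> float:
    """Round-trip time if packets travelled at the speed of light."""
    return distance_km / SPEED_OF_LIGHT_KM_S * 2


def required_window_bytes(bandwidth_bit_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bit_s / 8 * rtt_s


rtt_ideal = theoretical_rtt_s(1000)         # Luleå-Stockholm, about 1000 km
window = required_window_bytes(1e9, 0.020)  # 1 Gbit/s at the observed 20 ms

print(f"theoretical RTT: {rtt_ideal * 1000:.1f} ms")
print(f"window needed:   {window / 1e6:.1f} Mbyte")
```

The same two functions can be reused for any other path by changing the distance, bandwidth, and RTT.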

Operating system buffers
Inside the operating system kernel there are usually several different buffers that affect performance.
The term "buffers" is somewhat misleading: usually it is just some sort of data structure used to reference data in memory (although in theory it could just as well be real buffers).

TCP window buffers
The TCP window sizes can be adjusted on virtually all operating systems.
There are two windows, send and receive.
The window size for one direction of flow is set to MIN(sender's send window, receiver's receive window).
The send window must be large enough to hold all segments sent during one RTT.
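A minimal sketch of the MIN rule and its consequence for throughput. The window sizes below are hypothetical examples, not values from the talk; the point is that the smaller of the two windows caps the rate at window/RTT.

```python
def effective_window(send_window: int, recv_window: int) -> int:
    """Window used for one direction of flow: the smaller of the two."""
    return min(send_window, recv_window)


def max_throughput_bit_s(window_bytes: int, rtt_s: float) -> float:
    """At most one window of data can be delivered per round trip."""
    return window_bytes * 8 / rtt_s


# Hypothetical hosts: a tuned sender (4 MiB) talking to an untuned
# receiver (64 KiB) over a path with a 20 ms RTT.
win = effective_window(4 * 1024 * 1024, 64 * 1024)
rate = max_throughput_bit_s(win, 0.020)

print(f"effective window: {win} bytes")
print(f"throughput cap:   {rate / 1e6:.1f} Mbit/s")
```

Tuning only one end is therefore not enough: the untuned side sets the limit.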

Socket buffers
The socket buffer limits the amount of data an application may write to the kernel before being blocked.
It is often combined with the TCP send window; when ACKs are received, the socket buffer data is adjusted accordingly.
It must be >= the TCP window size to avoid becoming a limitation.
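From the application side, socket buffer sizes are requested per socket. The sketch below uses the standard SO_SNDBUF and SO_RCVBUF socket options; the requested size is an example value, and the helper name is hypothetical. The kernel may clamp or adjust the request according to its system-wide limits, so the granted value is read back rather than assumed.

```python
import socket

REQUESTED = 2_500_000  # bytes; example value, roughly 1 Gbit/s at 20 ms RTT


def request_buffers(size: int) -> tuple[int, int]:
    """Ask for send/receive socket buffers of `size` bytes and return
    what the kernel actually granted (send, receive)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        for option in (socket.SO_SNDBUF, socket.SO_RCVBUF):
            try:
                sock.setsockopt(socket.SOL_SOCKET, option, size)
            except OSError:
                # Some systems reject requests above the system-wide
                # maximum instead of clamping them; keep the default.
                pass
        granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return granted_snd, granted_rcv


granted_snd, granted_rcv = request_buffers(REQUESTED)
print(f"send buffer granted:    {granted_snd} bytes")
print(f"receive buffer granted: {granted_rcv} bytes")
```

If the granted values are smaller than requested, the system-wide maximums need to be raised first.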

MBUF clusters
There are limits on how many network buffers (called MBUFs in many OSes) may be allocated.
MBUFs may have external storage associated with them, allocated out of a separate (limited) area.
These buffers are often allocated at compile time, and it is not uncommon for physical memory to be statically allocated for them.

Other knobs to turn
RFC1323: turns on the window scaling option, needed to use TCP windows larger than 64k.
Initial window size: avoid slow-start by injecting many packets into the network at connection startup.
Interface queues: must be able to store the packets that are ready to send until the network interface can transmit them.
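Why window scaling is needed can be shown numerically. The TCP header carries the window in a 16-bit field, and the RFC 1323 option shifts that value left by a negotiated count of at most 14. The sketch below finds the smallest shift that can express a given window; the function name and example sizes are illustrative.

```python
MAX_UNSCALED_WINDOW = 65_535  # largest value of the 16-bit window field
MAX_SHIFT = 14                # upper limit on the shift count in RFC 1323


def window_scale_shift(window_bytes: int) -> int:
    """Smallest shift count that lets the 16-bit field express the window."""
    for shift in range(MAX_SHIFT + 1):
        if (MAX_UNSCALED_WINDOW << shift) >= window_bytes:
            return shift
    raise ValueError("window too large even with the maximum shift")


for size in (65_535, 2_500_000, 25_000_000):
    print(f"{size:>10} bytes -> shift {window_scale_shift(size)}")
```

Without the option the shift is effectively 0, which is the 64k ceiling mentioned on the slide.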

Problems often seen: Packet loss
On a long-distance high-speed connection, packet loss in a TCP flow will reduce the speed significantly.
If the sender enters congestion avoidance, the congestion window will open linearly, and with large windows this is really slow.
With an RTT of 185 ms and a window size of 25 MB it will take around 50 minutes to reach full speed.
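The slide does not state the assumptions behind the 50-minute figure. The sketch below is one way to arrive at that order of magnitude, assuming a 1460-byte segment size, a window that is halved on loss, and linear growth of one segment per RTT, or per two RTTs when the receiver acknowledges every second packet. All three are assumptions made for illustration.

```python
def recovery_time_s(window_bytes: float, rtt_s: float,
                    mss_bytes: int = 1460,
                    rtts_per_segment: int = 1) -> float:
    """Time for linear window growth to get from half the window back
    to the full window after a single loss."""
    full = window_bytes / mss_bytes   # full window in segments
    after_loss = full / 2             # assumed: window is halved on loss
    rtts_needed = (full - after_loss) * rtts_per_segment
    return rtts_needed * rtt_s


# Figures from the slide: 185 ms RTT, 25 MB window.
every_rtt = recovery_time_s(25_000_000, 0.185)
delayed_acks = recovery_time_s(25_000_000, 0.185, rtts_per_segment=2)

print(f"one segment per RTT:    {every_rtt / 60:.1f} minutes")
print(f"one segment per 2 RTTs: {delayed_acks / 60:.1f} minutes")
```

Under these assumptions the second case lands close to the slide's figure; either way, a single lost packet costs tens of minutes of reduced throughput.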

Problems often seen: Packet bursts
During the startup of a TCP bulk flow, the exponential increase in packet injection into the network during slow-start may cause packet bursts on links with a large bandwidth-delay product.
The result may be that intermediate switches/routers must drop packets, even though the TCP self-clocking would not permit more packets to be sent than could be received.
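How quickly the bursts grow can be seen from a simplified slow-start trace. The sketch assumes a 1460-byte segment size, an initial window of one segment, and a clean doubling per RTT with no loss; real stacks differ in the details.

```python
def slow_start_rounds(target_window_bytes: int, mss_bytes: int = 1460,
                      initial_segments: int = 1) -> list[int]:
    """Congestion window in segments at the start of each RTT, doubling
    until the target window is reached."""
    target = -(-target_window_bytes // mss_bytes)  # ceiling division
    cwnd = initial_segments
    rounds = [cwnd]
    while cwnd < target:
        cwnd = min(cwnd * 2, target)
        rounds.append(cwnd)
    return rounds


# Window from the 1 Gbit/s, 20 ms example: 2.5 Mbyte.
rounds = slow_start_rounds(2_500_000)
for rtt_number, segments in enumerate(rounds):
    print(f"RTT {rtt_number:2d}: {segments:5d} segments in flight")
```

The last few rounds each add hundreds of segments within a single RTT, which is the burst that queues along the path have to absorb.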

Problems often seen: ACK/window updates
The traditional approach for bulk flows is for the receiver to send an ACK for every second received packet.
Window updates are sent as soon as data is delivered to the receiving process.
This causes the return traffic to be more than half the number of transmitted packets.
Interrupts and packet handling in the sending host may use a significant amount of CPU.
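A rough sense of the return-path packet rate, as a sketch. The 1 Gbit/s rate and 1500-byte packet size are example values, not measurements from the talk; window updates come on top of the ACK count shown here.

```python
def data_packets_per_s(rate_bit_s: float, packet_bytes: int) -> float:
    """Packets per second needed to carry the given bit rate."""
    return rate_bit_s / 8 / packet_bytes


def min_acks_per_s(data_pps: float, packets_per_ack: int = 2) -> float:
    """ACK rate when the receiver acknowledges every second packet."""
    return data_pps / packets_per_ack


pps = data_packets_per_s(1e9, 1500)  # example: 1 Gbit/s, 1500-byte packets
acks = min_acks_per_s(pps)

print(f"data packets: {pps:,.0f} per second")
print(f"ACKs (lower bound on return traffic): {acks:,.0f} per second")
```

Each of those return packets has to be received and processed by the sending host, which is where the CPU cost comes from.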

Problems often seen: ARP timeouts
When an ARP entry times out, it is usually just removed from the ARP cache, and the next packet will initiate a new ARP request.
If there is an ongoing packet flow, this approach may cause packets to be dropped until an ARP reply is received.

Tuning of NetBSD
sysctl -w net.inet.tcp.rfc1323=1
  Activate window scaling and timestamp options according to RFC1323.
sysctl -w kern.somaxkva=[sbmax]
  Set maximum size for all socket buffers together in the system.
sysctl -w kern.sbmax=[sbmax]
  Set maximum size of socket buffer for one TCP flow.
sysctl -w net.inet.tcp.recvspace=[wstd]
sysctl -w net.inet.tcp.sendspace=[wstd]
  Set max size of TCP windows.
sysctl kern.mbuf.nmbclusters
  View maximum number of mbuf clusters. Used for storage of data packets to/from the network interface. Can only be set by recompiling your kernel.

Tuning of FreeBSD
sysctl net.inet.tcp.rfc1323=1
  Activate window scaling and timestamp options according to RFC1323.
sysctl kern.ipc.maxsockbuf=[sbmax]
  Set maximum size of TCP window.
sysctl net.inet.tcp.recvspace=[wstd]
sysctl net.inet.tcp.sendspace=[wstd]
  Set max size of TCP windows.
sysctl kern.ipc.nmbclusters
  View maximum number of mbuf clusters. Used for storage of data packets to/from the network interface. Can only be set at boot time.

Tuning of Linux
echo "1" > /proc/sys/net/ipv4/tcp_window_scaling
  Activate window scaling according to RFC 1323.
echo [wmax] > /proc/sys/net/core/rmem_max
echo [wmax] > /proc/sys/net/core/wmem_max
  Set maximum size of TCP windows.
echo [wmax] > /proc/sys/net/core/rmem_default
echo [wmax] > /proc/sys/net/core/wmem_default
  Set default size of TCP windows.
echo "[wmin] [wstd] [wmax]" > /proc/sys/net/ipv4/tcp_rmem
echo "[wmin] [wstd] [wmax]" > /proc/sys/net/ipv4/tcp_wmem
  Set min, default, max windows. Used by the autotuning function.
echo "bmin bdef bmax" > /proc/sys/net/ipv4/tcp_mem
  Set maximum total TCP buffer-space allocatable. Used by the autotuning function.
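The [wmax] placeholder can be derived from the bandwidth-delay product. The sketch below only builds the command strings from this slide and does not write to /proc; the [wmin] and [wstd] values used here are placeholder assumptions, and the function name is hypothetical.

```python
def linux_tuning_commands(bandwidth_bit_s: float, rtt_s: float,
                          wmin: int = 4096, wstd: int = 87380) -> list[str]:
    """Build the echo commands from the slide with [wmax] set to the
    bandwidth-delay product. wmin and wstd are assumed placeholders."""
    wmax = round(bandwidth_bit_s / 8 * rtt_s)
    return [
        'echo "1" > /proc/sys/net/ipv4/tcp_window_scaling',
        f"echo {wmax} > /proc/sys/net/core/rmem_max",
        f"echo {wmax} > /proc/sys/net/core/wmem_max",
        f'echo "{wmin} {wstd} {wmax}" > /proc/sys/net/ipv4/tcp_rmem',
        f'echo "{wmin} {wstd} {wmax}" > /proc/sys/net/ipv4/tcp_wmem',
    ]


# Example: 1 Gbit/s path with a 20 ms RTT.
commands = linux_tuning_commands(1e9, 0.020)
for line in commands:
    print(line)
```

The printed lines can be reviewed before being run as root; the tcp_mem setting is left out because its values depend on the memory available in the host.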

Tuning of Windows (2k, XP, 2k3)
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Tcp1323Opts=1
  Turn on window scaling option.
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize=[wmax]
  Set maximum size of TCP window.

How to set a Land Speed Record
Recipe:
Really high-quality networks
Hardware capable of sending/receiving fast enough
Operating system without foolish bottlenecks
Enthusiasts who spend weekends sending an obscene amount of data between Luleå and San Jose

SUNET Internet Land Speed Record - Network setup
[Diagram: end host in Luleå, Sweden; GigaSunet OC-192 core; Sprintlink OC-192 core; end host in San Jose, CA; links labelled 10GE and OC192]
The network path consists of 42(!) router hops, using paths shared with other users of the networks.

Records submitted September
bytes in 3648 real seconds = 4310 Mbit/second
1831 Gbytes in almost exactly an hour
packets/second transferred with an MTU of 4470 bytes
Record submitted for the IPv4 single and multiple stream class is Petabit-meters/second (which is a 78% increase over our previous record)

Compared with others
Compared to the previous record, we achieved this using:
Less powerful end hosts
200% longer distance
Less than half the MTU size (which generates heavier CPU load on the end hosts)
The normal GigaSunet and Sprintlink production infrastructures

Fiber path for the Internet LSR
The distance from Luleå, Sweden to San Jose, CA is approximately 28,983 km (18,013 miles).
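The Land Speed Record is scored as throughput multiplied by distance. The sketch below multiplies the two rounded figures quoted in these slides (4310 Mbit/second and 28,983 km); the result is only as precise as those inputs and is not necessarily the exact value that was submitted. The function name is illustrative.

```python
def petabit_meters_per_s(rate_bit_s: float, distance_km: float) -> float:
    """Throughput times distance, expressed in petabit-meters per second."""
    distance_m = distance_km * 1000
    return rate_bit_s * distance_m / 1e15


# Rounded figures quoted in the slides.
score = petabit_meters_per_s(4310e6, 28_983)
print(f"approximately {score:.1f} petabit-meters/second")
```

The metric rewards both speed and distance, which is why the long fiber path matters as much as the bit rate.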

Network load

More to read…
Describes how the Land Speed Record(s) were achieved
About end-to-end performance in GigaSunet