Slide 1 High TCP performance over wide area networks
Sylvain Ravot (Caltech)
HENP Working Group, Arlington, VA, May 8, 2002

Slide 2 HENP WG Goal #3
Share information and provide advice on the configuration of routers, switches, PCs and network interfaces, and network testing and problem resolution, to achieve high performance over local and wide area networks in production.

Slide 3 Overview
- TCP
- TCP congestion avoidance algorithm
- TCP parameters tuning
- Gigabit Ethernet adapter performance

Slide 4 TCP Algorithms
- Slow Start: connection opening, cwnd = 1 segment; exponential increase for cwnd until cwnd = SSTHRESH, then enter Congestion Avoidance.
- Congestion Avoidance: additive increase for cwnd; when 3 duplicate ACKs are received, cwnd := cwnd/2 and Fast Recovery is entered.
- Fast Recovery: exponential increase beyond cwnd; when the expected ACK is received, return to Congestion Avoidance.
- Retransmission timeout (from any state): SSTHRESH := cwnd/2, cwnd := 1 segment, return to Slow Start.
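The state machine above can be summarized in a few lines of code. The following is a minimal Python sketch of these rules, not code from the talk; the MSS value is an assumed example, and (as standard Reno does) it also halves SSTHRESH on three duplicate ACKs, which the slide's diagram does not show explicitly.

# Minimal sketch of the TCP congestion control state machine described above.
# Illustrative only; MSS is an assumed example value.
MSS = 1460  # bytes per segment (assumption)

class RenoWindow:
    def __init__(self, initial_ssthresh):
        self.cwnd = 1 * MSS              # connection opening: cwnd = 1 segment
        self.ssthresh = initial_ssthresh

    def on_ack_window(self):
        """Called once per RTT in which a full window was acknowledged."""
        if self.cwnd < self.ssthresh:
            self.cwnd *= 2               # slow start: exponential increase
        else:
            self.cwnd += MSS             # congestion avoidance: additive increase

    def on_three_dup_acks(self):
        self.ssthresh = self.cwnd // 2   # standard Reno (not drawn on the slide)
        self.cwnd = self.cwnd // 2       # cwnd := cwnd/2, enter fast recovery

    def on_timeout(self):
        self.ssthresh = self.cwnd // 2   # SSTHRESH := cwnd/2
        self.cwnd = 1 * MSS              # cwnd := 1 segment, back to slow start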

Slide 5 TCP Congestion Avoidance behavior (I)
Assumptions:
- The time spent in slow start is neglected.
- The time to recover a loss is neglected.
- No buffering (maximum congestion window size = bandwidth-delay product).
- Constant RTT.
The congestion window is opened at the constant rate of one segment per RTT, so each cycle lasts W/2 RTTs while cwnd oscillates between W/2 and W. The throughput is the area under the curve.
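Under these assumptions the average throughput follows directly from the area under the sawtooth. The formula below is not written on the slide; it is the standard consequence of the model stated there (W is the maximum window in segments, MSS the segment size):

\[
T_{\mathrm{cycle}} = \frac{W}{2}\,\mathrm{RTT},
\qquad
\overline{\mathrm{throughput}}
  = \frac{\text{data sent per cycle}}{T_{\mathrm{cycle}}}
  = \frac{\tfrac{3W}{4}\,\mathrm{MSS}\cdot\tfrac{W}{2}}{\tfrac{W}{2}\,\mathrm{RTT}}
  = \frac{3}{4}\,\frac{W\,\mathrm{MSS}}{\mathrm{RTT}}.
\]

In other words, halving the window after each loss costs on average a quarter of the available bandwidth under this model.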

Slide 6 Example
Assumptions:
- Bandwidth = 600 Mbps
- RTT = 170 ms (CERN - Caltech)
- BDP = 12.75 Mbytes
- Cycle = 12.3 minutes
Time to transfer 10 Gbytes?
- 3.8 minutes if cwnd = 6.45 Mbytes at the beginning of the congestion avoidance state (throughput = 350 Mbps).
- 2.4 minutes if cwnd is close to the BDP at the beginning of the congestion avoidance state (throughput = 550 Mbps).
The congestion window at the start of congestion avoidance is set by the initial SSTHRESH.
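The figures on this slide can be cross-checked with a short calculation under the slide's model. The MSS (1460 bytes) and the interpretation of 10 Gbytes as 1e10 bytes are assumptions, so the script only approximately reproduces the 12.3, 3.8 and 2.4 minute values.

# Cross-check of the CERN - Caltech example under the additive-increase model.
# Assumptions: MSS = 1460 bytes, 10 Gbytes = 1e10 bytes.
MSS = 1460
BANDWIDTH = 600e6          # bits/s
RTT = 0.170                # seconds

bdp = BANDWIDTH / 8 * RTT              # bandwidth-delay product, in bytes
W = bdp / MSS                          # maximum window, in segments
print(f"BDP   = {bdp / 1e6:.2f} Mbytes")
print(f"Cycle = {W / 2 * RTT / 60:.1f} minutes")   # W/2 RTTs per cycle

def transfer_minutes(total_bytes, cwnd0):
    """Additive increase of one MSS per RTT, starting from cwnd0 bytes."""
    sent, cwnd, rtts = 0.0, cwnd0, 0
    while sent < total_bytes:
        sent += min(cwnd, bdp)          # at most one BDP of data per RTT
        cwnd += MSS
        rtts += 1
    return rtts * RTT / 60

print(f"10 Gbytes from cwnd = 6.45 Mbytes: {transfer_minutes(1e10, 6.45e6):.1f} min")
print(f"10 Gbytes from cwnd = BDP:         {transfer_minutes(1e10, bdp):.1f} min")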

Slide 7 TCP Congestion Avoidance behavior (II)
Here we take the buffering space into account, so the maximum congestion window is the BDP plus the buffering capacity.
- Area #1 (cwnd < BDP): throughput < bandwidth; RTT constant; throughput = cwnd / RTT.
- Area #2 (cwnd > BDP): throughput = bandwidth; RTT increases (proportionally to cwnd).
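A compact way to express the two areas on this slide is the short model below. It is a sketch of the slide's reasoning only, and the buffering capacity used here is an arbitrary illustrative value.

# Throughput and RTT as a function of cwnd once path buffering is considered.
# The buffer size is an arbitrary example value, not taken from the slides.
BANDWIDTH = 600e6 / 8      # bytes/s
BASE_RTT = 0.170           # seconds (propagation delay only)
BDP = BANDWIDTH * BASE_RTT # bytes
BUFFER = 2e6               # bytes of buffering along the path (assumption)

def throughput_and_rtt(cwnd):
    if cwnd <= BDP:
        # Area #1: pipe not yet full, RTT constant, throughput = cwnd / RTT
        return cwnd / BASE_RTT, BASE_RTT
    if cwnd <= BDP + BUFFER:
        # Area #2: pipe full, throughput = bandwidth; the excess data queues
        # in the buffers, so the RTT grows in proportion to cwnd
        queued = cwnd - BDP
        return BANDWIDTH, BASE_RTT + queued / BANDWIDTH   # = cwnd / BANDWIDTH
    # Beyond BDP + buffering capacity the queue overflows and a loss occurs
    raise ValueError("cwnd exceeds BDP + buffer: packet loss")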

Slide 8 Tuning
Keep the congestion window size in the region where the throughput equals the bandwidth (between the BDP and the window size at which loss occurs):
- Limit the maximum congestion avoidance window size to avoid losses: in the application or in the OS.
- Use a smaller backoff: after a loss, cwnd := cwnd × back_off, with 0.5 < back_off < 1.
- Use TCP multi-streams.
By limiting the maximum congestion avoidance window size and setting a large initial ssthresh, we reached 125 Mbps throughput between CERN and Caltech and 143 Mbps throughput between CERN and Chicago through the 155 Mbps transatlantic link.
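The first two tuning ideas (a cap on the congestion avoidance window and a gentler backoff) amount to a one-line change of the earlier window update. The sketch below illustrates this; cwnd_max and back_off are illustrative parameters, not existing kernel settings.

# Sketch of a clamped congestion-avoidance window with a tunable backoff.
# cwnd_max and back_off are illustrative parameters, not kernel settings.
MSS = 1460

def next_cwnd(cwnd, loss, cwnd_max, back_off=0.75):
    """One RTT of congestion avoidance with a window cap and tunable backoff."""
    if loss:
        return max(MSS, int(cwnd * back_off))   # cwnd := cwnd * back_off
    return min(cwnd + MSS, cwnd_max)            # additive increase, clamped

With back_off = 0.5 and no cap this is ordinary congestion avoidance; keeping cwnd_max just below the window size at which losses occur avoids the halving altogether.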

Slide 9 Tuning TCP parameters
Buffer space that the kernel allocates for each socket:
- Kernel 2.2: write the desired maximum into /proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max.
- Kernel 2.4: write three values into /proc/sys/net/ipv4/tcp_rmem and /proc/sys/net/ipv4/tcp_wmem; the 3 values are respectively min, default, and max.
Socket buffer settings:
- setsockopt() with SO_RCVBUF and SO_SNDBUF; has to be set after calling socket() but before bind().
- Kernel 2.2: the default value is 32 KB. Kernel 2.4: the default value can be set in /proc/sys/net/ipv4 (see above).
Initial SSTHRESH:
- Set the initial ssthresh to a value larger than the bandwidth-delay product.
- There is no parameter to set this value in Linux 2.2 and 2.4, so a modified Linux kernel is required.
(The small diagram on the slide repeats the slide 4 state machine: connection opening with cwnd = 1 segment, exponential increase in slow start until cwnd = SSTHRESH, then additive increase in congestion avoidance.)
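As a concrete illustration of the per-socket setting described above, here is a minimal Python sketch. The 16 Mbyte request and the destination port are example values chosen for illustration, and the kernel silently clamps the request to the rmem_max / wmem_max limits set via /proc.

# Request large per-socket buffers before connecting, as described above.
# Buffer size and destination port are illustrative assumptions.
import socket

BUF_SIZE = 16 * 1024 * 1024   # bytes; should exceed the bandwidth-delay product

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_SIZE)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_SIZE)
# The options must be set before bind()/connect() for them to take effect.
s.connect(("pcgiga-gbe", 5001))   # host from the iperf example; port assumed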

Slide 10 Gigabit Ethernet NIC performance
NICs tested:
- 3Com: 3C996-T
- Syskonnect: SK-9843 SK-NET GE SX
- Intel: PRO/1000 T and PRO/1000 XF
- 32-bit and 64-bit PCI motherboards
Measurements:
- Back-to-back Linux PCs, latest drivers available; TCP throughput.
- Two different tests: Iperf and gensink. Gensink is a tool written at CERN for benchmarking TCP network performance.
- Iperf: we ran 10 consecutive TCP transfers of 20 seconds each (iperf -c pcgiga-gbe -t 20) and measured the CPU utilization with the time command. We report the min/avg/max throughput of the 10 transfers.
- Gensink: we ran transfers of 10 Gbytes. Gensink allows us to measure the throughput and the CPU utilization over the last 10 Mbytes transmitted.
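A measurement loop like the one described here could be scripted roughly as follows. This is a hypothetical wrapper: only the -c and -t iperf options quoted on the slide are used, and getrusage() stands in for the time command.

# Hypothetical wrapper for the iperf runs described on this slide:
# 10 consecutive 20-second TCP transfers, with the CPU time of each run.
import resource
import subprocess

HOST, RUNS, DURATION = "pcgiga-gbe", 10, 20   # host name taken from the slide

for run in range(1, RUNS + 1):
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(["iperf", "-c", HOST, "-t", str(DURATION)], check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    print(f"run {run}: iperf used {cpu:.1f} s CPU over {DURATION} s wall clock")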

Slide 11 Syskonnect - SX, PCI 32-bit 33 MHz
Setup: GbE adapter: SK-9843 SK-NET GE SX (driver included in the kernel); CPU: PIV (1500 MHz); PCI: 32-bit 33 MHz; Motherboard: Intel D850GB; RedHat 7.2 kernel.
Iperf test:
Gensink test: throughput min / avg / max = 256 / 448 / 451 Mbps; CPU utilization average = sec/Mbyte

Slide 12 Intel - SX, PCI 32-bit 33 MHz
Setup: GbE adapter: Intel PRO/1000 XF (driver e1000); CPU: PIV (1500 MHz); PCI: 32-bit 33 MHz; Motherboard: Intel D850GB; RedHat 7.2 kernel.
Iperf test:
Gensink test: throughput min / avg / max = 380 / 609 / 631 Mbps; CPU utilization average = sec/Mbyte

Slide 13 3Com - Cu, PCI 64-bit 66 MHz
Setup: GbE adapter: 3C996-T (driver bcm5700); CPU: 2 x AMD Athlon MP; PCI: 64-bit 66 MHz; Motherboard: dual AMD Athlon MP motherboard; RedHat 7.2 kernel.
Iperf test: min / max / average of throughput (Mbps), CPU utilization (%), and CPU utilization per Mbit/s (% / Mbps).
Gensink test: throughput min / avg / max = 232 / 889 / 945 Mbps; CPU utilization average = sec/Mbyte

Slide 14 Intel - Cu, PCI 64-bit 66 MHz
Setup: GbE adapter: Intel PRO/1000 T (driver e1000); CPU: 2 x AMD Athlon MP; PCI: 64-bit 66 MHz; Motherboard: dual AMD Athlon MP motherboard; RedHat 7.2 kernel.
Iperf test:
Gensink test: throughput min / avg / max = 429 / 905 / 943 Mbps; CPU utilization average = sec/Mbyte

Slide 15 Intel - SX, PCI 64-bit 66 MHz
Setup: GbE adapter: Intel PRO/1000 XF (driver e1000); CPU: 2 x AMD Athlon MP; PCI: 64-bit 66 MHz; Motherboard: dual AMD Athlon MP motherboard; RedHat 7.2 kernel.
Iperf test:
Gensink test: throughput min / avg / max = 222 / 799 / 940 Mbps; CPU utilization average = sec/Mbyte

Slide 16 Syskonnect - SX, PCI 64-bit 66 MHz
Setup: GbE adapter: SK-9843 SK-NET GE SX (driver included in the kernel); CPU: 2 x AMD Athlon MP; PCI: 64-bit 66 MHz; Motherboard: dual AMD Athlon MP motherboard; RedHat 7.2 kernel.
Iperf test: throughput min / avg / max = 146 / 936 / 947 Mbps; CPU utilization average = sec/Mbyte
Gensink test:

Slide 17 Summary
32-bit PCI bus:
- Intel NICs achieved the highest throughput (600 Mbps) with the smallest CPU utilization.
- Syskonnect NICs achieved only 450 Mbps, with a higher CPU utilization.
32-bit vs. 64-bit PCI bus:
- A 64-bit PCI bus is needed to get high throughput: we doubled the throughput by moving the Syskonnect NICs from the 32-bit to the 64-bit PCI bus, and increased the throughput by 300 Mbps by moving the Intel NICs from the 32-bit to the 64-bit PCI bus.
64-bit PCI bus:
- Syskonnect NICs achieved the highest throughput (930 Mbps), with the highest CPU utilization.
- Intel NIC performance is unstable.
- 3Com NICs are a good compromise between stability, performance, CPU utilization and cost. Unfortunately, we could not test the 3Com NIC with a fiber connector.
Cu vs. fiber connector:
- We could not measure significant differences.
Strange behavior of Intel NICs:
- The throughput achieved by the Intel NICs is unstable.

Slide 18 Questions?