High TCP performance over wide area networks
Sylvain Ravot, Caltech
HENP Working Group, Arlington, VA, May 8, 2002
Slide 2: HENP WG Goal #3
Share information and provide advice on the configuration of routers, switches, PCs and network interfaces, and on network testing and problem resolution, to achieve high performance over local and wide area networks in production.
Slide 3: Overview
- TCP
- TCP congestion avoidance algorithm
- TCP parameters tuning
- Gigabit Ethernet adapter performance
Slide 4: TCP Algorithms
- Connection opening: cwnd = 1 segment (Slow Start).
- Slow Start: exponential increase of cwnd; when cwnd = SSTHRESH, enter Congestion Avoidance.
- Congestion Avoidance: additive increase of cwnd.
- On 3 duplicate ACKs received: SSTHRESH := cwnd/2, cwnd := cwnd/2, enter Fast Recovery (exponential increase beyond cwnd); when the expected ACK is received, return to Congestion Avoidance.
- On retransmission timeout (from any state): SSTHRESH := cwnd/2, cwnd := 1 segment, return to Slow Start.
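The state diagram above can be written down in a few lines; the sketch below is not from the original slides, counts the window in segments, and uses illustrative names.

```python
# Minimal sketch of the congestion-control state machine on slide 4:
# Slow Start, Congestion Avoidance and Fast Recovery, with the window
# counted in segments. Illustrative only, not kernel code.

class TcpCwnd:
    def __init__(self, initial_ssthresh=64.0):
        self.cwnd = 1.0                   # connection opening: 1 segment
        self.ssthresh = initial_ssthresh
        self.state = "slow_start"

    def on_ack(self):
        """A new (expected) ACK arrives."""
        if self.state == "slow_start":
            self.cwnd += 1.0              # exponential growth
            if self.cwnd >= self.ssthresh:
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += 1.0 / self.cwnd  # additive: ~1 segment per RTT
        else:                             # fast recovery ends
            self.state = "congestion_avoidance"

    def on_triple_dup_ack(self):
        """Three duplicate ACKs: halve the window, enter Fast Recovery."""
        self.ssthresh = max(self.cwnd / 2, 2.0)
        self.cwnd = self.ssthresh
        self.state = "fast_recovery"

    def on_timeout(self):
        """Retransmission timeout: back to Slow Start with cwnd = 1."""
        self.ssthresh = max(self.cwnd / 2, 2.0)
        self.cwnd = 1.0
        self.state = "slow_start"
```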
Slide 5: TCP Congestion Avoidance behavior (I)
Assumptions:
- The time spent in slow start is neglected.
- The time to recover a loss is neglected.
- No buffering (maximum congestion window size = bandwidth-delay product).
- Constant RTT.
The congestion window is opened at the constant rate of one segment per RTT, so each cycle lasts W/2 round trips while cwnd grows back from W/2 to W. The amount of data transferred is the area under this sawtooth curve; see the sketch below.
[Figure: cwnd sawtooth oscillating between W/2 and W versus time in RTTs]
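A quick numerical check of this sawtooth model; a sketch under the slide's no-buffering assumption (maximum window W equal to the BDP), the code itself is not from the slides.

```python
# Average throughput of the congestion-avoidance sawtooth on slide 5:
# cwnd grows linearly from W/2 back to W over W/2 round trips, so the
# mean window is 3W/4, i.e. 3/4 of the link capacity when W = BDP.

def sawtooth_avg_throughput(bandwidth_bps, rtt_s):
    bdp_bytes = bandwidth_bps * rtt_s / 8      # W = BDP (no buffering)
    avg_cwnd = 0.75 * bdp_bytes                # mean of the W/2..W ramp
    return avg_cwnd * 8 / rtt_s                # bits per second

# 600 Mbps link, 170 ms RTT (the slide-6 numbers): about 450 Mbps average.
print(sawtooth_avg_throughput(600e6, 0.170) / 1e6)
```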
Slide 6: Example
Assumptions: bandwidth = 600 Mbps; RTT = 170 ms (CERN to Caltech); BDP = 12.75 MBytes; cycle = 12.3 minutes.
Time to transfer 10 GBytes?
- 3.8 minutes if cwnd = 6.45 MBytes at the beginning of the congestion avoidance state (throughput = 350 Mbps).
- 2.4 minutes if cwnd = 12.05 MBytes at the beginning of the congestion avoidance state (throughput = 550 Mbps).
The starting cwnd of congestion avoidance is set by the initial SSTHRESH, so a larger initial SSTHRESH shortens the transfer; see the simulation below.
[Figure: cwnd sawtooth between W/2 and W; one cycle = 12.3 minutes]
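The transfer times above can be checked with a rough per-RTT simulation; this is only a sketch, and the MSS of 1460 bytes and the reading of 10 GBytes as 1e10 bytes are assumptions, not values from the slide.

```python
# Per-RTT simulation of the slide-6 example: starting congestion avoidance
# at a given cwnd, send cwnd bytes each RTT and grow cwnd by one MSS per
# RTT, capped at the BDP (no losses during the transfer).

MSS = 1460                       # bytes (assumed)
RTT = 0.170                      # seconds
BDP = 600e6 * RTT / 8            # 12.75 MBytes

def transfer_time(total_bytes, initial_cwnd):
    cwnd, sent, rtts = initial_cwnd, 0.0, 0
    while sent < total_bytes:
        sent += cwnd
        cwnd = min(cwnd + MSS, BDP)
        rtts += 1
    return rtts * RTT

for w0 in (6.45e6, 12.05e6):
    t = transfer_time(10e9, w0)
    print(f"cwnd0 = {w0/1e6:.2f} MB: {t/60:.1f} min, "
          f"avg {10e9 * 8 / t / 1e6:.0f} Mbps")
# Gives roughly 3.8 min and 2.3 min, close to the slide's 3.8 and 2.4 minutes.
```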
Slide 7: TCP Congestion Avoidance behavior (II)
Here we take the buffering space into account.
- Area #1 (cwnd below the BDP): throughput < bandwidth, RTT constant, throughput = cwnd / RTT.
- Area #2 (cwnd > BDP): throughput = bandwidth, RTT increases (proportionally to cwnd) as the buffers fill.
[Figure: cwnd sawtooth between W/2 and W; Area #2 spans the band between the BDP and the BDP plus the buffering capacity]
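The two regimes can be put into a couple of formulas; a small illustrative sketch, not from the slides:

```python
# Slide 7's two areas: below the BDP the window limits the rate and the
# RTT stays at its base value; above the BDP the link is saturated and
# extra window only queues in the buffer, inflating the RTT.

def link_state(cwnd_bytes, bandwidth_bps, base_rtt_s):
    bdp = bandwidth_bps * base_rtt_s / 8
    if cwnd_bytes <= bdp:                        # Area #1
        return cwnd_bytes * 8 / base_rtt_s, base_rtt_s
    # Area #2: throughput pinned at the bandwidth, RTT grows with cwnd
    return bandwidth_bps, cwnd_bytes * 8 / bandwidth_bps

throughput, rtt = link_state(15e6, 600e6, 0.170)   # cwnd above the BDP
print(f"{throughput/1e6:.0f} Mbps, RTT {rtt*1000:.0f} ms")
```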
Slide 8: Tuning
Keep the congestion window size in the yellow area of the figure:
- Limit the maximum congestion avoidance window size to avoid loss (in the application or in the OS).
- Use a smaller backoff: after a loss, cwnd := cwnd × back_off with 0.5 < back_off < 1.
- Use TCP multi-streams.
[Figures: cwnd versus time with the BDP marked, illustrating a capped window and a smaller backoff after a loss]
By limiting the maximum congestion avoidance window size and setting a large initial ssthresh, we reached 125 Mbps throughput between CERN and Caltech and 143 Mbps between CERN and Chicago through the 155 Mbps transatlantic link.
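How much a gentler backoff helps can be read off the sawtooth directly; the sketch below is an illustrative model (it assumes the maximum window sits at the BDP), not a measurement from the slides.

```python
# Slide 8's "smaller backoff": after a loss cwnd drops to back_off * W
# instead of W/2, so cwnd oscillates between back_off * W and W and the
# mean window is (1 + back_off) / 2 * W. With W at the BDP this is the
# average fraction of the bottleneck bandwidth that gets used.

def avg_utilization(back_off):
    return (1 + back_off) / 2

for b in (0.5, 0.7, 0.9):
    print(f"back_off = {b}: about {avg_utilization(b):.0%} of capacity")
# 0.5 (standard halving) -> 75%, 0.9 -> 95%
```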
Slide 9: Tuning TCP parameters
Buffer space that the kernel allocates for each socket:
- Kernel 2.2:
  echo 262144 > /proc/sys/net/core/rmem_max
  echo 262144 > /proc/sys/net/core/wmem_max
- Kernel 2.4:
  echo "4096 87380 4194304" > /proc/sys/net/ipv4/tcp_rmem
  echo "4096 65536 4194304" > /proc/sys/net/ipv4/tcp_wmem
  The three values are respectively min, default, and max.
Socket buffer settings:
- setsockopt() with SO_RCVBUF and SO_SNDBUF; has to be called after socket() but before bind().
- Kernel 2.2: default value is 32 KB.
- Kernel 2.4: default value can be set in /proc/sys/net/ipv4 (see above).
Initial SSTHRESH:
- Set the initial ssthresh to a value larger than the bandwidth-delay product.
- There is no parameter to set this value in Linux 2.2 and 2.4 => modified Linux kernel.
(Reminder from slide 4: connection opening with cwnd = 1 segment; Slow Start increases cwnd exponentially until cwnd = SSTHRESH; Congestion Avoidance then increases cwnd additively.)
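At the application level, the socket-buffer part of this tuning looks roughly like the sketch below, using Python's standard socket module; the 4 MB size, host name and port are placeholders.

```python
import socket

# Request large socket buffers before the connection is set up, as slide 9
# recommends (setsockopt after socket() but before bind()/connect()).
BUF_SIZE = 4 * 1024 * 1024          # illustrative value

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_SIZE)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_SIZE)

# The kernel may clamp the request (rmem_max / wmem_max); check the result.
print("rcvbuf:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
print("sndbuf:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))

sock.connect(("pcgiga-gbe", 5001))  # placeholder endpoint
```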
Slide 10: Gigabit Ethernet NIC performance
NICs tested:
- 3Com: 3C996-T
- Syskonnect: SK-9843 SK-NET GE SX
- Intel: PRO/1000 T and PRO/1000 XF
- 32-bit and 64-bit PCI motherboards
Measurements:
- Back-to-back Linux PCs, latest drivers available, TCP throughput.
- Two different tests: Iperf and gensink. Gensink is a tool written at CERN for benchmarking TCP network performance.
- Iperf: we ran 10 consecutive TCP transfers of 20 seconds each and measured the CPU utilization with the time command (see the sketch below):
  [root@pcgiga-2]# time iperf -c pcgiga-gbe -t 20
  We report the min/avg/max throughput of the 10 transfers.
- Gensink: we ran transfers of 10 GBytes. Gensink allows us to measure the throughput and the CPU utilization over the last 10 MBytes transmitted.
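A scripted version of this Iperf procedure might look like the sketch below; the host name is taken from the slide, and parsing the throughput out of iperf's classic "... Mbits/sec" summary line is an assumption about the output format.

```python
import re
import subprocess
import time

# Slide 10's Iperf methodology: 10 consecutive 20-second TCP transfers,
# reporting min/avg/max throughput; wall-clock timing stands in for the
# `time` command used on the slide.
HOST = "pcgiga-gbe"
results = []

for _ in range(10):
    start = time.time()
    out = subprocess.run(["iperf", "-c", HOST, "-t", "20"],
                         capture_output=True, text=True).stdout
    print(f"run took {time.time() - start:.1f} s")
    match = re.search(r"([\d.]+)\s+Mbits/sec", out)
    if match:
        results.append(float(match.group(1)))

if results:
    print(f"throughput min / avg / max = {min(results):.0f} / "
          f"{sum(results) / len(results):.0f} / {max(results):.0f} Mbps")
```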
Slide 11: Syskonnect - SX, PCI 32-bit 33 MHz
Setup: GbE adapter SK-9843 SK-NET GE SX (driver included in the kernel); CPU: Pentium 4 (1500 MHz); PCI: 32-bit 33 MHz; motherboard: Intel D850GB; RedHat 7.2, kernel 2.4.17.
Iperf test: [figure]
Gensink test: throughput min / avg / max = 256 / 448 / 451 Mbps; CPU utilization average = 0.097 sec/MByte.
Slide 12: Intel - SX, PCI 32-bit 33 MHz
Setup: GbE adapter Intel PRO/1000 XF (driver e1000, version 4.1.7); CPU: Pentium 4 (1500 MHz); PCI: 32-bit 33 MHz; motherboard: Intel D850GB; RedHat 7.2, kernel 2.4.17.
Iperf test: [figure]
Gensink test: throughput min / avg / max = 380 / 609 / 631 Mbps; CPU utilization average = 0.040 sec/MByte.
Slide 13: 3Com - Cu, PCI 64-bit 66 MHz
Setup: GbE adapter 3C996-T (driver bcm5700, version 2.0.18); CPU: 2 x AMD Athlon MP; PCI: 64-bit 66 MHz; motherboard: dual AMD Athlon MP motherboard; RedHat 7.2, kernel 2.4.7.
Iperf test:
              Throughput (Mbps)   CPU utilization (%)   CPU utilization per Mbit/s (%/Mbps)
  Min.              835                 43.8                       0.052
  Max.              843                 51.5                       0.061
  Average           838                 46.9                       0.056
Gensink test: throughput min / avg / max = 232 / 889 / 945 Mbps; CPU utilization average = 0.0066 sec/MByte.
Slide 14: Intel - Cu, PCI 64-bit 66 MHz
Setup: GbE adapter Intel PRO/1000 T (driver e1000, version 4.1.7); CPU: 2 x AMD Athlon MP; PCI: 64-bit 66 MHz; motherboard: dual AMD Athlon MP motherboard; RedHat 7.2, kernel 2.4.7.
Iperf test: [figure]
Gensink test: throughput min / avg / max = 429 / 905 / 943 Mbps; CPU utilization average = 0.0065 sec/MByte.
Slide 15: Intel - SX, PCI 64-bit 66 MHz
Setup: GbE adapter Intel PRO/1000 XF (driver e1000, version 4.1.7); CPU: 2 x AMD Athlon MP; PCI: 64-bit 66 MHz; motherboard: dual AMD Athlon MP motherboard; RedHat 7.2, kernel 2.4.7.
Iperf test: [figure]
Gensink test: throughput min / avg / max = 222 / 799 / 940 Mbps; CPU utilization average = 0.0062 sec/MByte.
Slide 16: Syskonnect - SX, PCI 64-bit 66 MHz
Setup: GbE adapter SK-9843 SK-NET GE SX (driver included in the kernel); CPU: 2 x AMD Athlon MP; PCI: 64-bit 66 MHz; motherboard: dual AMD Athlon MP motherboard; RedHat 7.2, kernel 2.4.7.
Iperf test: throughput min / avg / max = 146 / 936 / 947 Mbps; CPU utilization average = 0.0083 sec/MByte.
Gensink test: [figure]
Slide 17: Summary
32-bit PCI bus:
- Intel NICs achieved the highest throughput (600 Mbps) with the lowest CPU utilization.
- Syskonnect NICs achieved only 450 Mbps, with a higher CPU utilization.
32-bit vs. 64-bit PCI bus:
- A 64-bit PCI bus is needed to get high throughput: we doubled the throughput by moving the Syskonnect NICs from the 32-bit to the 64-bit PCI bus, and increased the throughput by 300 Mbps by doing the same with the Intel NICs.
64-bit PCI bus:
- Syskonnect NICs achieved the highest throughput (930 Mbps), with the highest CPU utilization.
- Intel NIC performance is unstable.
- 3Com NICs are a good compromise between stability, performance, CPU utilization and cost. Unfortunately, we could not test the 3Com NIC with a fiber connector.
Copper vs. fiber connector:
- We could not measure significant differences.
Strange behavior of Intel NICs: the throughput they achieve is unstable.
Slide 18: Questions?