Final EU DataTAG Review – High Performance Network Demonstration – 24 March 2004
Richard Hughes-Jones, The University of Manchester, UK
DataTAG is a project funded by the European Commission under contract IST
It works – so what's the problem with TCP?
TCP has two phases: Slowstart and Congestion Avoidance.
AIMD and high-bandwidth, long-distance networks: the poor performance of TCP in high-bandwidth wide-area networks is due in part to the TCP congestion control algorithm acting on the congestion window (cwnd):
- For each ACK in an RTT without loss: cwnd -> cwnd + a/cwnd (Additive Increase, a = 1)
- For each window experiencing loss: cwnd -> cwnd - b·cwnd (Multiplicative Decrease, b = 1/2)
The time to recover from a single packet loss at ~100 ms RTT is very long, because cwnd grows back by only a segments per RTT (see the sketch below).
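As an aside (not from the original slides), the scale of this recovery time can be estimated directly from the AIMD rules above; the 10 Gbit/s line rate and 1500-byte packet size below are assumptions chosen purely for illustration:

```python
# Sketch: recovery time after one loss under standard TCP AIMD,
# cwnd -> cwnd + 1 per RTT (a = 1), cwnd -> cwnd/2 on loss (b = 1/2).
# Assumed illustrative values: 10 Gbit/s path, 100 ms RTT, 1500-byte packets.

link_rate_bps = 10e9        # assumed line rate
rtt_s = 0.100               # ~100 ms round-trip time
packet_bits = 1500 * 8      # assumed packet size

# cwnd (in packets) needed to fill the pipe: the bandwidth-delay product
w_full = link_rate_bps * rtt_s / packet_bits

# After a single loss cwnd halves; additive increase restores one packet per RTT,
# so it takes w_full/2 round trips to get back to full rate.
recovery_s = (w_full / 2) * rtt_s
print(f"cwnd to fill pipe: {w_full:.0f} packets")
print(f"time to recover from one loss: {recovery_s:.0f} s (~{recovery_s/3600:.1f} h)")
```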
Investigation of new TCP stacks
HighSpeed TCP: a and b vary depending on the current cwnd, using a table.
- a increases more rapidly with larger cwnd, so the flow returns to the 'optimal' cwnd for the network path sooner.
- b decreases less aggressively and, as a consequence, so does cwnd; the effect is that throughput does not drop as much on loss.
Scalable TCP: a and b are fixed adjustments for the increase and decrease of cwnd.
- a = 1/100 – the increase is greater than TCP Reno's.
- b = 1/8 – the decrease on loss is smaller than TCP Reno's.
- Scalable over any link speed.
Fast TCP: uses round-trip time as well as packet loss to indicate congestion, with rapid convergence to a fair equilibrium for throughput.
HSTCP-LP: HighSpeed (Low Priority) – backs off if the RTT increases.
BiC-TCP: additive increase at large cwnd; binary search at small cwnd.
H-TCP: after congestion, behaves as standard TCP, then switches to high performance.
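To make the increase/decrease rules concrete, here is a minimal sketch (not project code) of the per-ACK and per-loss updates for standard Reno and Scalable TCP; HighSpeed TCP is table-driven (a(cwnd), b(cwnd) as in RFC 3649) and is only indicated in the comments:

```python
# Hedged sketch of the per-ACK increase and per-loss decrease rules described
# on the slide, for standard Reno and Scalable TCP. cwnd is in segments.
# HighSpeed TCP instead looks up a(cwnd) and b(cwnd) in a table (RFC 3649).

def reno_ack(cwnd, a=1.0):
    # Additive increase: roughly +a segments per RTT, i.e. +a/cwnd per ACK
    return cwnd + a / cwnd

def reno_loss(cwnd, b=0.5):
    # Multiplicative decrease: lose a fraction b of the window
    return cwnd * (1 - b)

def scalable_ack(cwnd, a=0.01):
    # Scalable TCP: fixed per-ACK increment (a = 1/100)
    return cwnd + a

def scalable_loss(cwnd, b=0.125):
    # Scalable TCP: lose only 1/8 of the window on a loss event
    return cwnd * (1 - b)

if __name__ == "__main__":
    w = 10000.0  # segments
    print("after one loss:  Reno", reno_loss(w), " Scalable", scalable_loss(w))
    print("per-ACK growth:  Reno", reno_ack(w) - w, " Scalable", scalable_ack(w) - w)
```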
Comparison of TCP stacks: the TCP response function
Throughput vs loss rate – the steeper the curve, the faster the recovery.
Packets were dropped in the kernel; tests on MB-NG (rtt 6 ms) and DataTAG (rtt 120 ms).
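For reference (not shown on the slide), the commonly quoted response functions give the average window w as a function of the packet loss rate p; the HighSpeed TCP form is the target response of RFC 3649, and Scalable TCP's window scales roughly as 1/p:

\[
w_{\mathrm{Reno}} \;\approx\; \frac{1.2}{\sqrt{p}}\,,
\qquad
w_{\mathrm{HSTCP}} \;\approx\; \frac{0.12}{p^{0.835}}\,,
\qquad
\mathrm{throughput} \;\approx\; \frac{w \cdot \mathrm{MSS}}{\mathrm{RTT}}\,.
\]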
Gigabit: tuning PCI-X (mmrbc)
Intel PRO/10GbE LR adapter – PCI-X bus occupancy vs mmrbc (512, 1024, 2048 and 4096 bytes).
Each PCI-X sequence consists of CSR access, data transfer, then interrupt and CSR update; packets sent every 200 µs.
Measured times, and times based on PCI-X timings from the logic analyser: 5.7 Gbit/s with mmrbc 4096 bytes; expected throughput ~7 Gbit/s.
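The trade-off can be sketched with a simple model (assumed numbers, not the logic-analyser measurements): a larger mmrbc means fewer PCI-X read transactions per packet and hence less per-transaction overhead. The 9000-byte packet size, bus parameters and overhead cycles below are illustrative assumptions:

```python
# Hedged model of PCI-X bus occupancy per packet vs mmrbc (maximum memory read
# byte count). Assumptions for illustration: 133 MHz x 64-bit PCI-X (8 bytes per
# clock), a 9000-byte packet, and a fixed per-transaction overhead in clocks.

BUS_CLOCK_HZ = 133e6
BYTES_PER_CLOCK = 8
PACKET_BYTES = 9000          # assumed jumbo-frame payload
OVERHEAD_CLOCKS = 20         # assumed per-read-transaction overhead (arbitration,
                             # address/attribute phases, turnaround) - illustrative only

for mmrbc in (512, 1024, 2048, 4096):
    transactions = -(-PACKET_BYTES // mmrbc)          # ceiling division
    data_clocks = PACKET_BYTES / BYTES_PER_CLOCK
    total_clocks = data_clocks + transactions * OVERHEAD_CLOCKS
    occupancy_us = total_clocks / BUS_CLOCK_HZ * 1e6
    print(f"mmrbc {mmrbc:4d} B: {transactions:2d} reads, "
          f"bus occupied ~{occupancy_us:.2f} us per packet")
```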
Multi-Gigabit flows at the SC2003 Bandwidth Challenge
Three server systems with 10 Gigabit Ethernet NICs used the DataTAG altAIMD stack with a 9000-byte MTU, sending memory-to-memory iperf TCP streams from the SLAC/FNAL booth in Phoenix to:
- Palo Alto PAIX: rtt 17 ms, window 30 MB; shared with the Caltech booth. 4.37 Gbit/s with HSTCP (I = 5%), then 2.87 Gbit/s (I = 16%) – throughput fell when 10 Gbit was on the link. 3.3 Gbit/s with Scalable TCP (I = 8%); two flows tested, sum 1.9 Gbit/s (I = 39%).
- Chicago StarLight: rtt 65 ms, window 60 MB; Phoenix CPU 2.2 GHz. 3.1 Gbit/s with HSTCP (I = 1.6%).
- Amsterdam SARA: rtt 175 ms, window 200 MB; Phoenix CPU 2.2 GHz. 4.35 Gbit/s with HSTCP (I = 6.9%) – very stable.
Both used Abilene to Chicago.
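The tests themselves used iperf; purely as an illustration of the essential tuning (a socket buffer large enough to cover the bandwidth-delay product), here is a minimal memory-to-memory sender sketch. The host name, port, buffer size and duration are placeholders, and the OS limits (e.g. net.core.wmem_max) must allow such a buffer:

```python
# Minimal sketch (not the tool used at SC2003, which was iperf): a memory-to-memory
# TCP sender that requests a large socket buffer, as needed for long fat pipes
# (e.g. the ~30-200 MB windows quoted above). Host and port are placeholders.
import socket
import time

HOST, PORT = "receiver.example.org", 5001   # hypothetical endpoint
WINDOW_BYTES = 30 * 1024 * 1024             # request ~30 MB send buffer
CHUNK = bytes(64 * 1024)                    # 64 kB of zeros, sent repeatedly

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, WINDOW_BYTES)  # before connect()
s.connect((HOST, PORT))

sent, start = 0, time.time()
while time.time() - start < 20:             # 20-second test
    s.sendall(CHUNK)
    sent += len(CHUNK)
elapsed = time.time() - start
print(f"mean throughput: {sent * 8 / elapsed / 1e9:.2f} Gbit/s")
s.close()
```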
Application design – throughput
A 2 GByte file was transferred from RAID0 disks over MB-NG, with Web100 output every 10 ms.
- GridFTP: throughput alternates between 600/800 Mbit/s and zero.
- Apache web server + curl-based client: steady 720 Mbit/s.
DataTAG Testbed
High Throughput Demo
Setup (network diagram): CERN PoP (DataTAG, Geneva) to Caltech PoP (StarLight, Chicago); end hosts v11gva and v11chi with dual 3 GHz Xeon CPUs; 10 Gigabit Ethernet through a Juniper T320 and Cisco 7609/7606 routers over the DataTAG SDH link.
Send data with TCP, drop packets, and monitor the TCP connection with Web100.
HS-TCP: Slowstart then congestion avoidance phases. Drop 1 in 10^6.
HS-TCP with Limited Slowstart: drop 1 in 10^6, but this time with Limited Slowstart (see the sketch below).
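Limited Slow-Start is specified in RFC 3742; the sketch below (illustrative only, with an arbitrary max_ssthresh) shows how cwnd growth is capped at roughly max_ssthresh/2 segments per RTT once cwnd exceeds max_ssthresh, avoiding the huge bursts of standard slow start on large-window paths:

```python
# Hedged sketch of Limited Slow-Start (RFC 3742). cwnd is in segments;
# max_ssthresh and the number of round trips are illustrative values.

def limited_slow_start_ack(cwnd, max_ssthresh=100):
    """Return the new cwnd after one ACK arrives during slow start."""
    if cwnd <= max_ssthresh:
        return cwnd + 1                      # standard slow start: +1 per ACK
    k = int(cwnd / (0.5 * max_ssthresh))     # limits growth to ~max_ssthresh/2 per RTT
    return cwnd + 1.0 / k

# Illustrative run: each "RTT" delivers roughly cwnd ACKs
cwnd = 1.0
for rtt in range(20):
    for _ in range(int(cwnd)):
        cwnd = limited_slow_start_ack(cwnd)
    print(f"RTT {rtt:2d}: cwnd ~ {cwnd:.0f} segments")
```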
HS-TCP (2): drop 1 in 10^6.
Scalable TCP: drop 1 in 10^6.
Standard Reno TCP: drop 1 in 10^6; transition from HighSpeed to Standard at 520 s.
Standard Reno TCP: drop 1 in 10^6.
Helping real users: throughput CERN – SARA
Using the GÉANT backup link, 1 GByte disk-to-disk transfers (blue is the data, red is the TCP ACKs):
- Standard TCP: average throughput 167 Mbit/s – yet users in practice see only a fraction of this!
- HighSpeed TCP: average throughput 345 Mbit/s.
- Scalable TCP: average throughput 340 Mbit/s.
Technology link to the DataGrid and GÉANT EU projects.
Summary
- Multi-Gigabit transfers are possible and stable.
- Demonstrated that new TCP stacks help performance.
- DataTAG has made major contributions to the understanding of high-speed networking.
- There has been significant technology transfer between DataTAG and other projects.
- Now reaching out to real users.
But there is still much research to do:
- Achieving performance – protocol vs implementation issues.
- Stability and sharing issues.
- Optical transports and hybrid networks.
GridFTP throughput + Web100 (MB-NG)
Throughput (Mbit/s) alternates between 600/800 Mbit/s and zero; cwnd is smooth; no duplicate ACKs, send stalls or timeouts.
HTTP data transfers with HighSpeed TCP (MB-NG)
Apache web server out of the box; prototype client built on the curl HTTP library; 1 MByte TCP buffers; 2 GByte file.
Throughput 72 MBytes/s; cwnd shows some variation; no duplicate ACKs, send stalls or timeouts.
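The prototype client used the curl HTTP library; as a hedged stand-in, the sketch below performs a plain HTTP GET over a socket with an enlarged receive buffer, which is the tuning highlighted on the slide. Host, port, path and buffer size are placeholder values:

```python
# Hedged sketch (not the project's prototype client): an HTTP/1.0 GET over a
# socket with an enlarged receive buffer, illustrating the "1 MByte TCP buffers"
# tuning. Host, port and path are hypothetical.
import socket
import time

HOST, PORT, PATH = "server.example.org", 80, "/bigfile.dat"   # placeholders
RCVBUF = 1 * 1024 * 1024                                      # request 1 MB receive buffer

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, RCVBUF)     # set before connect()
s.connect((HOST, PORT))
s.sendall(f"GET {PATH} HTTP/1.0\r\nHost: {HOST}\r\n\r\n".encode())

received, start = 0, time.time()
while True:
    data = s.recv(256 * 1024)
    if not data:                       # server closes the HTTP/1.0 connection
        break
    received += len(data)             # counts headers + body
elapsed = time.time() - start
print(f"{received/1e6:.0f} MB in {elapsed:.1f} s "
      f"-> {received/1e6/elapsed:.0f} MBytes/s")
s.close()
```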
The performance of the end host / disks – BaBar case study: RAID bandwidth and PCI activity
- 3Ware RAID5 controller with parallel EIDE disks; the 3Ware card forces the PCI bus to 33 MHz.
- BaBar Tyan host to MB-NG SuperMicro host: network memory-to-memory 619 Mbit/s.
- Disk-to-disk throughput with bbcp (320 – 360 Mbit/s): the PCI bus is effectively full!
- User throughput ~250 Mbit/s – the user was surprised!
Plots: read from RAID5 disks; write to RAID5 disks.
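A rough back-of-envelope (assumptions only, not the measured PCI traces) suggests why the 33 MHz PCI bus saturates: in a network-disk transfer the data crosses the shared bus twice, and only a fraction of the raw bus bandwidth is usable for data. The bus width and efficiency below are assumed values:

```python
# Hedged back-of-envelope: load on a shared 33 MHz PCI bus when data crosses it
# twice (storage controller <-> memory and memory <-> NIC). Bus width and the
# usable-bandwidth fraction are assumptions, not measured values.

BUS_MHZ = 33
BUS_WIDTH_BYTES = 4          # assumed 32-bit PCI; use 8 for a 64-bit slot
EFFICIENCY = 0.6             # assumed fraction of raw bandwidth usable for data

raw_bps = BUS_MHZ * 1e6 * BUS_WIDTH_BYTES * 8
usable_bps = raw_bps * EFFICIENCY

for user_mbit in (250, 350):
    bus_load = 2 * user_mbit * 1e6       # data crosses the bus twice
    print(f"user rate {user_mbit} Mbit/s -> bus traffic {bus_load/1e6:.0f} Mbit/s "
          f"({bus_load/usable_bps:.0%} of ~{usable_bps/1e6:.0f} Mbit/s usable)")
```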