Bandwidth Challenge, Land Speed Record, TCP/IP and You
Richard Hughes-Jones, Manchester
Networkshop, March 2005
Bandwidth Lust at SC2003
- The SC network
- Working with S2io, Cisco & folks
- At the SLAC booth, running the BW Challenge
The Bandwidth Challenge at SC2003
- The peak aggregate bandwidth from the 3 booths was 23.21 Gbit/s
- 1-way link utilisations of >90%
- 6.6 TBytes transferred in 48 minutes
Multi-Gigabit flows at SC2003 BW Challenge
- Three server systems with 10 Gigabit Ethernet NICs
- Used the DataTAG altAIMD stack, 9000 byte MTU
- Sent mem-mem iperf TCP streams from the SLAC/FNAL booth in Phoenix to:
  - Palo Alto PAIX: rtt 17 ms, window 30 MB; shared with the Caltech booth; 4.37 Gbit HighSpeed TCP (I=5%), then 2.87 Gbit (I=16%), falling when 10 Gbit was on the link; 3.3 Gbit Scalable TCP (I=8%); tested 2 flows, sum 1.9 Gbit (I=39%)
  - Chicago Starlight: rtt 65 ms, window 60 MB; Phoenix CPU 2.2 GHz; 3.1 Gbit HighSpeed TCP (I=1.6%)
  - Amsterdam SARA: rtt 175 ms, window 200 MB; Phoenix CPU 2.2 GHz; 4.35 Gbit HighSpeed TCP (I=6.9%), very stable
  - Both the Chicago and Amsterdam paths used Abilene to Chicago
- SCinet collaboration at SC2004
- Setting up the BW bunker
- The BW Challenge at the SLAC booth
- Working with S2io, Sun, Chelsio
UKLight & ESLEA at SC2004
- UK e-Science researchers from Manchester, UCL & ULCC involved in the Bandwidth Challenge
- Collaborated with scientists & engineers from Caltech, CERN, FERMI, SLAC, Starlight, UKERNA & U. of Florida
- Networks used by the SLAC/UK team:
  - 10 Gbit Ethernet link from SC2004 to the ESnet/QWest PoP in Sunnyvale
  - 10 Gbit Ethernet link from SC2004 to the CENIC/NLR/Level(3) PoP in Sunnyvale
  - 10 Gbit Ethernet link from SC2004 to Chicago and on to UKLight
- UKLight focused on Gigabit disk-to-disk transfers between UK sites and Pittsburgh
- The UK had generous support from Boston Ltd, who loaned the servers
- The BWC collaboration had support from S2io (NICs), Chelsio (TOE) and Sun (who loaned servers)
- Essential support from Boston, Sun & Cisco
The Bandwidth Challenge – SC2004
- The peak aggregate bandwidth from the booths was 101.13 Gbit/s
- That is 3 full-length DVDs per second!
- 4 times greater than SC2003!
- Saturated TEN 10 Gigabit Ethernet waves
- SLAC booth: Sunnyvale to Pittsburgh, LA to Pittsburgh, and Chicago to Pittsburgh (with UKLight)
Land Speed Record – SC2004
Pittsburgh-Tokyo-CERN, single stream TCP
- LSR metric = distance x speed (worked through in the note after this slide)
- Categories: single stream, multiple stream, IPv4 and IPv6, standard TCP
- Current single-stream IPv4 record: University of Tokyo, Fujitsu & WIDE, 9 Nov 04
- 20,645 km connection from the SC2004 booth to CERN via Tokyo
- Latency 433 ms RTT
- 10 Gbit Chelsio TOE card
- 7.21 Gbps (TCP payload), 1500 B MTU, taking about 10 min
- 148,850 Terabit metre / second (Internet2 LSR approved record)
- Full DVD in 5 s
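As a quick sanity check (not from the original slides), the headline LSR figure and the "DVD in 5 s" claim follow directly from the quoted rate and distance; a minimal sketch, assuming a 4.7 GByte single-layer DVD:

```python
# Illustrative check of the quoted Land Speed Record numbers.
distance_m = 20_645 * 1_000            # 20,645 km path, in metres
rate_bps = 7.21e9                      # 7.21 Gbit/s TCP payload rate

lsr = distance_m * rate_bps            # LSR metric: bit-metres per second
print(f"LSR metric: {lsr / 1e12:,.0f} Tbit*m/s")   # ~148,850 Tbit*m/s

dvd_bits = 4.7e9 * 8                   # single-layer DVD, ~4.7 GByte (assumption)
print(f"Time per DVD: {dvd_bits / rate_bps:.1f} s")  # ~5.2 s
```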
Just a Well Engineered End-to-End Connection
- End-to-end "no loss" environment: NO contention, NO sharing on the end-to-end path
- Processor speed and system bus characteristics
- TCP configuration: window size and frame size (MTU)
- Tuned PCI-X bus
- Tuned network interface card driver
- A single TCP connection on the end-to-end path
- Memory-to-memory transfer, no disk system involved
- No real user application (but we did do file transfers!!)
- Not a typical user or campus situation, BUT...
- So what's the matter with TCP? Did we cheat?
[Diagram: client and server connected across campus, regional network and the Internet, contrasted with client and server connected over UKLight; from Robin Tasker]
TCP (Reno) – What's the problem?
- TCP has 2 phases:
  - Slowstart: probe the network to estimate the available BW; exponential growth
  - Congestion avoidance: the main data transfer phase; the transfer rate grows "slowly"
- AIMD and high-bandwidth, long-distance networks:
  - Poor performance of TCP in high-bandwidth wide-area networks is due in part to the TCP congestion control algorithm
  - For each ACK in an RTT without loss: cwnd -> cwnd + a/cwnd (Additive Increase, a = 1)
  - For each window experiencing loss: cwnd -> cwnd - b*cwnd (Multiplicative Decrease, b = 1/2)
  - (A short sketch of these update rules follows this slide.)
- Packet loss is a killer!!
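To make the AIMD rules concrete, here is a minimal, illustrative Python sketch of the Reno-style window update (a packet-counted cwnd; this is not code from the talk):

```python
def ack_update(cwnd, a=1.0):
    # Additive increase applied per ACK: cwnd -> cwnd + a/cwnd
    return cwnd + a / cwnd

def loss_update(cwnd, b=0.5):
    # Multiplicative decrease applied per loss event: cwnd -> cwnd - b*cwnd
    return cwnd * (1 - b)

cwnd = 1000.0
for _ in range(1000):        # roughly one RTT's worth of ACKs at this window
    cwnd = ack_update(cwnd)
print(cwnd)                  # ~1001: the window gains about a = 1 packet per RTT
print(loss_update(cwnd))     # ~500.5: a single loss event still halves it
```

At gigabit rates the window is thousands of packets, so winning back half the window at one packet per RTT takes thousands of RTTs; hence "packet loss is a killer".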
TCP (Reno) – Details
- Time for TCP to recover its throughput from 1 lost packet (the expression was shown on the slide; see the sketch after this slide for the standard estimate)
- For an rtt of ~200 ms: ~2 min
- Typical rtts: UK 6 ms, Europe 20 ms, USA 150 ms
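The recovery-time expression was an image on the original slide; what follows is the usual back-of-envelope estimate for standard TCP, t ~ C * RTT^2 / (2 * MSS) for a path of capacity C, not taken from the slide. The absolute numbers depend strongly on the assumed capacity and MTU, so a 1 Gbit/s, 1500-byte example is only indicative:

```python
def reno_recovery_time(capacity_bps, rtt_s, mss_bytes=1500):
    """Rough time for standard TCP to regain full rate after one loss:
    the window drops by C*RTT/(2*MSS) packets and grows back at
    ~1 packet per RTT, giving t ~ C * RTT^2 / (2 * MSS)."""
    return capacity_bps * rtt_s ** 2 / (2 * mss_bytes * 8)

# Assumed 1 Gbit/s path, 1500-byte packets (illustrative values only)
for name, rtt_ms in [("UK", 6), ("Europe", 20), ("USA", 150)]:
    print(f"{name}: {reno_recovery_time(1e9, rtt_ms / 1000):.0f} s")
# UK ~1.5 s, Europe ~17 s, USA ~940 s
```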
Investigation of new TCP Stacks
- The AIMD algorithm – standard TCP (Reno):
  - For each ACK in an RTT without loss: cwnd -> cwnd + a/cwnd (Additive Increase, a = 1)
  - For each window experiencing loss: cwnd -> cwnd - b*cwnd (Multiplicative Decrease, b = 1/2)
- HighSpeed TCP:
  - a and b vary depending on the current cwnd, using a table
  - a increases more rapidly with larger cwnd, so the flow returns to the 'optimal' cwnd size for the network path sooner
  - b decreases less aggressively and, as a consequence, so does the cwnd; the effect is that throughput does not drop as far
- Scalable TCP:
  - a and b are fixed adjustments for the increase and decrease of cwnd
  - a = 1/100: the increase is greater than TCP Reno
  - b = 1/8: the decrease on loss is less than TCP Reno
  - Scalable over any link speed
- Fast TCP:
  - Uses round-trip time as well as packet loss to indicate congestion, with rapid convergence to a fair equilibrium for throughput
- Also: HSTCP-LP, H-TCP, BiC-TCP
(A sketch comparing the Reno and Scalable update rules follows this slide.)
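A minimal, illustrative comparison of per-RTT window updates for two of the stacks above, using the constants quoted on the slide (HighSpeed TCP is deliberately omitted because its a and b come from a cwnd-indexed table, so there is no single constant pair):

```python
def reno_update(cwnd, lost):
    # Standard TCP: +a (=1) packet per RTT, halve (b = 1/2) on loss
    return cwnd * 0.5 if lost else cwnd + 1.0

def scalable_update(cwnd, lost):
    # Scalable TCP: +1/100 of cwnd per RTT (0.01 per ACK),
    # and only -1/8 of cwnd on loss
    return cwnd * (1 - 1/8) if lost else cwnd * (1 + 1/100)

def rtts_to_recover(update, cwnd0=10_000.0):
    """RTTs needed to regain the pre-loss window after one loss event."""
    cwnd, rtts = update(cwnd0, lost=True), 0
    while cwnd < cwnd0:
        cwnd, rtts = update(cwnd, lost=False), rtts + 1
    return rtts

print("Reno    :", rtts_to_recover(reno_update), "RTTs")      # ~5,000
print("Scalable:", rtts_to_recover(scalable_update), "RTTs")  # ~14, independent of cwnd
```

The point of the comparison is that Scalable TCP's recovery time is a fixed number of RTTs whatever the window size, whereas Reno's grows with the window and hence with the bandwidth-delay product.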
Packet Loss with new TCP Stacks
- TCP response function: throughput vs loss rate; the further a curve sits to the right, the faster the recovery
- Packets dropped in the kernel to emulate loss
- MB-NG rtt 6 ms; DataTAG rtt 120 ms
- (The standard Reno response curve is sketched after this slide.)
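For reference (not from the slide), the standard TCP curve these plots are usually compared against is the Mathis et al. estimate, throughput ~ (MSS/RTT) * sqrt(3 / (2p)); a minimal sketch with the two rtts quoted above:

```python
from math import sqrt

def reno_throughput_bps(loss_rate, rtt_s, mss_bytes=1500):
    """Mathis et al. estimate of steady-state Reno throughput for a
    packet-loss probability p and round-trip time RTT."""
    return (mss_bytes * 8 / rtt_s) * sqrt(1.5 / loss_rate)

# Example: a loss rate of 1 in 10^6 on the two paths mentioned above
for name, rtt in [("MB-NG, 6 ms", 0.006), ("DataTAG, 120 ms", 0.120)]:
    print(name, f"{reno_throughput_bps(1e-6, rtt) / 1e6:.0f} Mbit/s")
# ~2450 Mbit/s (i.e. above a 1 Gbit line) vs ~122 Mbit/s on the long path
```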
Packet Loss and new TCP Stacks
- TCP response function measured on UKLight: London-Chicago-London, rtt 177 ms
- 2.6.6 kernel
- Agreement with theory is good
High Throughput Demonstrations
[Diagram: Manchester (Geneva) host man03 to London (Chicago) host lon01 across the MB-NG core; 1 GEth to Cisco 7609s at each end, 2.5 Gbit SDH via a Cisco GSR in the core; dual Xeon 2.2 GHz hosts]
- Send data with TCP
- Drop packets in the kernel
- Monitor TCP with Web100
High Performance TCP – DataTAG
- Different TCP stacks tested on the DataTAG network
- rtt 128 ms
- Drop 1 in 10^6
- HighSpeed: rapid recovery
- Scalable: very fast recovery
- Standard: recovery would take ~20 mins
Is TCP fair? TCP Flows – Sharing the Bandwidth
Test of TCP Sharing: Methodology (1 Gbit/s)
- Chose 3 paths from SLAC (California): Caltech (10 ms), Univ. Florida (80 ms), CERN (180 ms)
- Used iperf/TCP and UDT/UDP to generate traffic
- Each run was 16 minutes, in 7 regions
[Diagram: ping at 1/s plus iperf or UDT flows from SLAC across the TCP/UDP bottleneck to Caltech/UFL/CERN, with 2-minute and 4-minute test regions]
(Les Cottrell, PFLDnet 2005)
TCP Reno single stream (SLAC to CERN)
- Low performance on fast long-distance paths:
  - AIMD (add a=1 pkt to cwnd per RTT, decrease cwnd by factor b=0.5 on congestion)
  - Net effect: recovers slowly, does not effectively use the available bandwidth, so poor throughput
  - Unequal sharing
- Congestion has a dramatic effect and recovery is slow; the recovery rate needs to be increased
- RTT increases when the flow achieves its best throughput
- Remaining flows do not take up the slack when a flow is removed
(Les Cottrell, PFLDnet 2005)
UK Transfers: MB-NG and SuperJANET4 – Throughput for Real Users
iperf Throughput + Web100
- SuperMicro on the MB-NG network: HighSpeed TCP, line speed 940 Mbit/s, DupACKs? <10 (expect ~400)
- BaBar on the production network: standard TCP, 425 Mbit/s, DupACKs 350-400 and re-transmits
Applications: Throughput Mbit/s
- HighSpeed TCP
- 2 GByte file on RAID5
- SuperMicro + SuperJANET
- Applications tested: bbcp, bbftp, Apache, GridFTP
- Previous work used RAID0 (not disk limited)
bbftp: What else is going on?
- Scalable TCP
- BaBar + SuperJANET; SuperMicro + SuperJANET
- Congestion window and duplicate-ACK traces
- Variation not TCP related? Possibly disk speed / bus transfer, or the application
SC2004 & Transfers with UKLight: A Taster for Lambda & Packet-Switched Hybrid Networks
Transatlantic Ethernet: TCP Throughput Tests
- Supermicro X5DPE-G2 PCs
- Dual 2.9 GHz Xeon CPU, FSB 533 MHz
- 1500 byte MTU
- 2.6.6 Linux kernel
- Memory-to-memory TCP throughput, standard TCP
- Wire-rate throughput of 940 Mbit/s in the first 10 s
- Work in progress to study: implementation detail, advanced stacks, effect of packet loss, sharing
SC2004 Disk-Disk bbftp (work in progress)
- The bbftp file transfer program uses TCP/IP
- UKLight path: London-Chicago-London; PCs: Supermicro + 3Ware RAID0
- MTU 1500 bytes; socket size 22 MBytes (see the note after this slide); rtt 177 ms; SACK off
- Move a 2 GByte file; Web100 plots
- Standard TCP: average 825 Mbit/s (bbcp: 670 Mbit/s)
- Scalable TCP: average 875 Mbit/s (bbcp: 701 Mbit/s, ~4.5 s of overhead)
- Disk-TCP-Disk at 1 Gbit/s is here!
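The 22 MByte socket size quoted above is essentially the bandwidth-delay product of the path; a quick illustrative check, assuming a 1 Gbit/s path (the rate is implied rather than stated on the slide):

```python
def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product: the TCP window needed to keep the pipe full."""
    return rate_bps * rtt_s / 8

window = bdp_bytes(1e9, 0.177)          # 1 Gbit/s, 177 ms London-Chicago-London
print(f"{window / 1e6:.1f} MBytes")     # ~22.1 MBytes, matching the socket size used
```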
Summary, Conclusions & Thanks
- The Super Computing Bandwidth Challenge gives the opportunity to make world-wide high-performance tests
- The Land Speed Record shows what can be achieved with state-of-the-art kit
- Standard TCP is not optimum for high-throughput, long-distance links
- Packet loss is a killer for TCP
  - Check campus links & equipment, and access links to backbones
  - Users need to collaborate with the campus network teams
  - Dante PERT
- New stacks are stable and give better response & performance
  - Still need to set the TCP buffer sizes! (see the sketch after this slide)
  - Check other kernel settings, e.g. the window-scale maximum
  - Watch for "TCP stack implementation enhancements"
- The host is critical: think server quality, not supermarket PC
- Motherboards, NICs, RAID controllers and disks matter
  - The NIC should use 64-bit 133 MHz PCI-X; 66 MHz PCI can be OK, but 32-bit 33 MHz is too slow for Gigabit rates
  - Worry about the CPU-memory bandwidth as well as the PCI bandwidth: data crosses the memory bus at least 3 times
  - Separate the data transfers: use motherboards with multiple 64-bit PCI-X buses
  - Choose a modern high-throughput RAID controller; consider SW RAID0 of RAID5 HW controllers
- Users are now able to perform sustained 1 Gbit/s transfers
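As a concrete illustration of "set the TCP buffer sizes" (a generic sketch, not the configuration used in these tests), the per-connection knobs are the socket send and receive buffers, sized to the bandwidth-delay product and capped by the kernel's per-socket maxima:

```python
import socket

# Size the buffers to the bandwidth-delay product of the path
# (example: 1 Gbit/s x 177 ms ~ 22 MBytes; illustrative value).
BUF = 22 * 1024 * 1024

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF)

# The request is silently capped by the kernel limits
# (net.core.wmem_max / net.core.rmem_max on Linux), so check what you got:
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
```

Setting the buffers before connecting matters, since the window-scale option is negotiated at connection set-up.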
More Information - Some URLs
- UKLight web site: http://www.uklight.ac.uk
- MB-NG project web site: http://www.mb-ng.net/
- DataTAG project web site: http://www.datatag.org/
- UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
- Motherboard and NIC tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
- "Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards", FGCS special issue 2004: http://www.hep.man.ac.uk/~rich/
- TCP tuning information: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
- TCP stack comparisons: "Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks", Journal of Grid Computing, 2004
- PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
- Dante PERT: http://www.geant2.net/server/show/nav.00d00h002
Any Questions?
Backup Slides
Topology of the MB-NG Network
[Network diagram: Manchester, UCL and RAL domains (hosts man01-man03, lon01-lon03, ral01-ral02 with HW RAID), each with Cisco 7609 edge/boundary routers, joined over the UKERNA development network; key: Gigabit Ethernet, 2.5 Gbit POS access, MPLS, admin domains]
Topology of the Production Network
[Network diagram: man01 (Manchester domain) to ral01 (RAL domain), both with HW RAID, across the production network of 3 routers and 2 switches; key: Gigabit Ethernet, 2.5 Gbit POS access, 10 Gbit POS]
SC2004 UKLight Overview
[Network diagram: the SC2004 SLAC booth (Cisco 6509) and Caltech booth (UltraLight IP, Caltech 7600) linked over an NLR lambda (NLR-PITT-STAR-10GE-16) to Chicago Starlight; UKLight 10G links to ULCC, UCL HEP / UCL network and the Manchester MB-NG 7600 OSR; SURFnet/EuroLink 10G to Amsterdam; four 1GE and two 1GE channels at the edges]
High Performance TCP – MB-NG
- Drop 1 in 25,000
- rtt 6.2 ms
- Recover in 1.6 s
- Plots shown for Standard, HighSpeed and Scalable TCP
bbftp: Host & Network Effects
- 2 GByte file on RAID5 disks: 1200 Mbit/s read, 600 Mbit/s write
- Scalable TCP
- BaBar + SuperJANET: instantaneous 220-625 Mbit/s
- SuperMicro + SuperJANET: instantaneous 400-665 Mbit/s for 6 s, then 0-480 Mbit/s
- SuperMicro + MB-NG: instantaneous 880-950 Mbit/s for 1.3 s, then 215-625 Mbit/s
Average Transfer Rates Mbit/s

App     | TCP Stack | SuperMicro on MB-NG | SuperMicro on SuperJANET4 | BaBar on SuperJANET4 | SC2004 on UKLight
iperf   | Standard  | 940     | 350-370 | 425     | 940
iperf   | HighSpeed | 940     | 510     | 570     | 940
iperf   | Scalable  | 940     | 580-650 | 605     | 940
bbcp    | Standard  | 434     | 290-310 | 290     |
bbcp    | HighSpeed | 435     | 385     | 360     |
bbcp    | Scalable  | 432     | 400-430 | 380     |
bbftp   | Standard  | 400-410 | 325     | 320     | 825
bbftp   | HighSpeed | 370-390 | 380     |         |
bbftp   | Scalable  | 430     | 345-532 | 380     | 875
apache  | Standard  | 425     | 260     | 300-360 |
apache  | HighSpeed | 430     | 370     | 315     |
apache  | Scalable  | 428     | 400     | 317     |
Gridftp | Standard  | 405     | 240     |         |
Gridftp | HighSpeed | 320     |         |         |
Gridftp | Scalable  | 335     |         |         |

(Slide annotations: new stacks give more throughput; rate decreases)
UKLight and ESLEA
- Collaboration forming for SC2005: Caltech, CERN, FERMI, SLAC, Starlight, UKLight, ...
- Current proposals include:
  - Bandwidth Challenge with even faster disk-to-disk transfers between UK sites and SC2005
  - Radio astronomy demo at 512 Mbit or 1 Gbit of user data: Japan, Haystack (MIT), Jodrell Bank, JIVE
  - High-bandwidth link-up between UK and US HPC systems
  - 10 Gig NLR wave to Seattle
- Set up a 10 Gigabit Ethernet test bench
  - Experiments (CALICE) need to investigate >25 Gbit to the processor
- ESLEA/UKLight need resources to study:
  - New protocols and congestion / sharing
  - The interaction between protocol processing, applications and storage
  - Monitoring L1/L2 behaviour in hybrid networks
10 Gigabit Ethernet: UDP Throughput Tests
- 1500 byte MTU gives ~2 Gbit/s
- Used 16144 byte MTU, max user length 16080
- DataTAG Supermicro PCs: dual 2.2 GHz Xeon CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes; wire-rate throughput of 2.9 Gbit/s
- CERN OpenLab HP Itanium PCs: dual 1.0 GHz 64-bit Itanium CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes; wire rate of 5.7 Gbit/s
- SLAC Dell PCs: dual 3.0 GHz Xeon CPU, FSB 533 MHz, PCI-X mmrbc 4096 bytes; wire rate of 5.4 Gbit/s
10 Gigabit Ethernet: Tuning PCI-X
- 16080 byte packets every 200 µs
- Intel PRO/10GbE LR adapter
- PCI-X bus occupancy vs mmrbc:
  - Measured times, and times based on PCI-X transactions from the logic analyser
  - Expected throughput ~7 Gbit/s; measured 5.7 Gbit/s
[Plots of PCI-X bus occupancy for mmrbc of 512, 1024, 2048 and 4096 bytes, showing the CSR access, PCI-X sequence, data transfer, and interrupt & CSR update phases; 5.7 Gbit/s reached]
10 Gigabit Ethernet: SC2004 TCP Tests
- Sun AMD Opteron compute servers (v20z)
- Chelsio TOE tests between Linux 2.6.6 hosts
  - 10 Gbit Ethernet link from SC2004 to the CENIC/NLR/Level(3) PoP in Sunnyvale
  - Two 2.4 GHz AMD 64-bit Opteron processors with 4 GB of RAM at SC2004
  - 1500 B MTU, all Linux 2.6.6: 9.43 Gbit/s in one direction, i.e. 9.07 Gbit/s goodput (see the note after this slide), and 5.65 Gbit/s in the reverse direction, i.e. 5.44 Gbit/s goodput; a total of 15+ Gbit/s on the wire
  - 10 Gbit Ethernet link from SC2004 to the ESnet/QWest PoP in Sunnyvale
  - One 2.4 GHz AMD 64-bit Opteron at each end
  - 2 MByte window, 16 streams, 1500 B MTU, all Linux 2.6.6: 7.72 Gbit/s in one direction, i.e. 7.42 Gbit/s goodput, over 120 mins (6.6 Tbits shipped)
- S2io NICs with Solaris 10 in a 4 x 2.2 GHz Opteron v40z to one or more S2io or Chelsio NICs with Linux 2.6.5 or 2.6.6 in 2 x 2.4 GHz v20zs
  - LAN 1: S2io NIC back to back: 7.46 Gbit/s
  - LAN 2: S2io in the v40z to 2 v20zs: each NIC ~6 Gbit/s, total 12.08 Gbit/s
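The wire-rate vs goodput pairs above are consistent with simply discounting the per-frame header overhead of a 1500-byte MTU; the exact header accounting below (1460 B of TCP payload in a 1518 B Ethernet frame, ignoring preamble and inter-frame gap) is my assumption, not stated on the slide:

```python
def goodput(wire_rate_gbps, mss=1460, frame=1518):
    """Approximate TCP goodput from the on-the-wire rate by the ratio of
    TCP payload (MSS) to the Ethernet frame carrying it."""
    return wire_rate_gbps * mss / frame

for wire in (9.43, 7.72, 5.65):
    print(f"{wire} Gbit/s on the wire -> ~{goodput(wire):.2f} Gbit/s goodput")
# ~9.07, ~7.43 and ~5.43 Gbit/s: close to the quoted 9.07 / 7.42 / 5.44
```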
Transatlantic Ethernet: Disk-to-Disk Tests
- Supermicro X5DPE-G2 PCs
- Dual 2.9 GHz Xeon CPU, FSB 533 MHz
- 1500 byte MTU
- 2.6.6 Linux kernel
- RAID0 (6 SATA disks)
- bbftp (disk-disk) throughput, standard TCP
- Throughput of 436 Mbit/s in the first 10 s
- Work in progress to study: throughput limitations, helping real users
SC2004 Disk-Disk bbftp (work in progress)
- UKLight path: London-Chicago-London; PCs: Supermicro + 3Ware RAID0
- MTU 1500 bytes; socket size 22 MBytes; rtt 177 ms; SACK off
- Move a 2 GByte file; Web100 plots
- HighSpeed TCP
- Don't believe this is a protocol problem!