13th-14th July 2004 University College London End-user systems: NICs, MotherBoards, TCP Stacks & Applications Richard Hughes-Jones.


13th-14th July 2004, University College London
End-user systems: NICs, MotherBoards, TCP Stacks & Applications
Richard Hughes-Jones
Work reported is from many network collaborations. Special mention: Yee Ting Lee (UCL) and Stephen Dallison (Manchester).

Slide: 2 Richard Hughes-Jones – Network Performance Issues
- End system issues
  - Network Interface Card, its driver and their configuration
  - Processor speed
  - Motherboard configuration, bus speed and capability
  - Disk system
  - TCP and its configuration
  - Operating system and its configuration
- Network infrastructure issues
  - Obsolete network equipment
  - Configured bandwidth restrictions
  - Topology
  - Security restrictions (e.g., firewalls)
  - Sub-optimal routing
  - Transport protocols
- Network capacity and the influence of others!
  - Congestion – group, campus, access links
  - Many, many TCP connections
  - Mice and elephants on the path

Slide: 3 Richard Hughes-Jones Methodology used in testing NICs & Motherboards

Slide: 4 Richard Hughes-Jones – Latency Measurements
- UDP/IP packets sent between back-to-back systems
  - Processed in a similar manner to TCP/IP
  - Not subject to flow control & congestion avoidance algorithms
  - Used the UDPmon test program
- Latency
  - Round-trip times measured using Request-Response UDP frames (a code sketch follows this slide)
  - Latency as a function of frame size
    - Slope is given by: mem-mem copy(s) + PCI + Gigabit Ethernet + PCI + mem-mem copy(s)
    - Intercept indicates: processing times + hardware latencies
  - Histograms of 'singleton' measurements
- Tells us about:
  - Behaviour of the IP stack
  - The way the hardware operates
  - Interrupt coalescence
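The request-response measurement can be illustrated with a minimal sketch (this is not the UDPmon code; the echo host, port and frame sizes are assumptions): one side echoes UDP frames, the other timestamps each request-response pair and records round-trip times per frame size.

```python
# Minimal request-response UDP latency probe (illustrative sketch, not UDPmon).
# A UDP echo service must be running on the remote host; host/port are assumptions.
import socket, time, statistics

HOST, PORT = "remote-host.example", 14000   # hypothetical echo endpoint
N_SAMPLES = 1000

def rtt_for_size(size_bytes):
    """Return a list of round-trip times (microseconds) for one frame size."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1.0)                    # a lost reply raises socket.timeout
    payload = b"\x00" * size_bytes
    samples = []
    for _ in range(N_SAMPLES):
        t0 = time.perf_counter()
        sock.sendto(payload, (HOST, PORT))
        sock.recvfrom(65536)                # wait for the echoed frame
        samples.append((time.perf_counter() - t0) * 1e6)
    sock.close()
    return samples

if __name__ == "__main__":
    for size in (64, 512, 1024, 1400):      # latency as a function of frame size
        rtts = rtt_for_size(size)
        print(f"{size:5d} bytes: median {statistics.median(rtts):.1f} us, "
              f"min {min(rtts):.1f} us")
```

Plotting the median RTT against frame size gives the slope and intercept the slide describes.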

Slide: 5 Richard Hughes-Jones – Throughput Measurements (1)
- UDP throughput
  - Send a controlled stream of UDP frames spaced at regular intervals (n bytes, number of packets, wait time) – a paced-sender sketch follows this slide
[Diagram: sender-receiver timeline – zero remote statistics; send data frames at regular intervals; signal end of test; get remote statistics. Receiver reports: no. received, no. lost + loss pattern, no. out-of-order, CPU load & no. of interrupts, 1-way delay. Sender records: time to send, time to receive, inter-packet time (histogram).]
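A paced UDP stream of the kind the slide describes can be sketched as follows (illustrative only; the destination, packet size and spacing are assumptions, and a real tool such as UDPmon uses much more careful timing and adds the statistics exchange shown in the diagram).

```python
# Illustrative paced UDP sender: n-byte frames at a fixed inter-packet interval.
# Destination and timing parameters are assumptions, not UDPmon defaults.
import socket, time

DEST = ("receiver-host.example", 14001)   # hypothetical receiver
PACKET_BYTES = 1472                       # max UDP payload in a 1500-byte MTU frame
WAIT_US = 15                              # inter-packet spacing in microseconds
N_PACKETS = 10000

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
payload = bytearray(PACKET_BYTES)
interval = WAIT_US / 1e6

start = time.perf_counter()
next_send = start
for seq in range(N_PACKETS):
    payload[0:4] = seq.to_bytes(4, "big")   # sequence number for loss/reorder checks
    sock.sendto(payload, DEST)
    next_send += interval
    while time.perf_counter() < next_send:  # busy-wait to hold the spacing
        pass
elapsed = time.perf_counter() - start

bits_sent = N_PACKETS * PACKET_BYTES * 8
print(f"user-data rate: {bits_sent / elapsed / 1e6:.1f} Mbit/s")
```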

Slide: 6 Richard Hughes-Jones – PCI Bus & Gigabit Ethernet Activity
- PCI activity
  - Logic analyser with PCI probe cards in the sending PC
  - Gigabit Ethernet fibre probe card
  - PCI probe cards in the receiving PC
[Diagram: in each PC the path CPU – memory – chipset – NIC crosses the PCI bus; probes on both PCI buses and on the Gigabit Ethernet fibre feed the logic analyser display. These are the possible bottlenecks.]

Slide: 7 Richard Hughes-Jones – A "Server Quality" Motherboard
- SuperMicro P4DP6
- Dual Xeon Prestonia (2 CPU/die)
- 400 MHz front-side bus
- Intel® E7500 chipset
- 6 PCI / PCI-X slots
- 4 independent PCI buses
- Can select: 64 bit 66 MHz PCI, 100 MHz PCI-X, 133 MHz PCI-X
- … Mbit Ethernet
- Adaptec AIC-7899W dual-channel SCSI
- UDMA/100 bus-master EIDE channels, data transfer rates of 100 MB/s burst
- P4DP8-2G variant: dual Gigabit Ethernet

Slide: 8 Richard Hughes-Jones NIC & Motherboard Evaluations

Slide: 9 Richard Hughes-Jones – SuperMicro 370DLE: Latency: SysKonnect
- PCI: 32 bit 33 MHz
  - Latency small, 62 µs & well behaved
  - Latency slope … µs/byte; expect … µs/byte (PCI + GigE 0.008 + PCI) – the slope model is written out below
- PCI: 64 bit 66 MHz
  - Latency small, 56 µs & well behaved
  - Latency slope … µs/byte; expect … µs/byte (PCI + GigE 0.008 + PCI)
- Possible extra data moves?
- Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; RedHat 7.1 kernel
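The expected slope can be written out explicitly. The only number taken from the slide is the Gigabit Ethernet term of 0.008 µs/byte; the PCI values in the comment are theoretical peak bus rates, and the memory-copy terms are left symbolic.

```latex
% Latency vs. frame size: the slope is the sum of per-byte transfer costs
% along the path memory -> PCI -> GigE -> PCI -> memory.
\[
  \frac{dT}{dL} \;=\; t_{\mathrm{mem}}^{\mathrm{send}} + t_{\mathrm{PCI}}^{\mathrm{send}}
                 + t_{\mathrm{GigE}} + t_{\mathrm{PCI}}^{\mathrm{recv}} + t_{\mathrm{mem}}^{\mathrm{recv}},
  \qquad
  t_{\mathrm{GigE}} = \frac{8\ \mathrm{bit/byte}}{10^{9}\ \mathrm{bit/s}} = 0.008\ \mu\mathrm{s/byte}.
\]
% Example PCI terms at theoretical peak rates:
%   32 bit 33 MHz PCI: 1/(4 bytes x 33 MHz) ~= 0.0076 us/byte
%   64 bit 66 MHz PCI: 1/(8 bytes x 66 MHz) ~= 0.0019 us/byte
```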

Slide: 10 Richard Hughes-Jones – SuperMicro 370DLE: Throughput: SysKonnect
- PCI: 32 bit 33 MHz
  - Max throughput 584 Mbit/s
  - No packet loss for > 18 µs spacing
- PCI: 64 bit 66 MHz
  - Max throughput 720 Mbit/s
  - No packet loss for > 17 µs spacing (the rate-vs-spacing relation is written out below)
  - Packet loss during the bandwidth drop (plots also show % kernel-mode CPU)
- Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; RedHat 7.1 kernel
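For reference, the user-data rate implied by a given inter-packet spacing follows directly from the packet size; the worked number below is a derived relation, not a figure quoted from the slide.

```latex
% User-data rate for L-byte UDP payloads sent every t_spacing:
\[
  R \;=\; \frac{8L}{t_{\mathrm{spacing}}},
  \qquad\text{e.g.}\quad
  L = 1472\ \mathrm{bytes},\; t_{\mathrm{spacing}} = 17\ \mu\mathrm{s}
  \;\Rightarrow\; R \approx 693\ \mathrm{Mbit/s}.
\]
```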

Slide: 11 Richard Hughes-Jones – SuperMicro 370DLE: PCI: SysKonnect
- Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1 kernel
- 1400 bytes sent, wait 100 µs
[Logic-analyser traces of send and receive PCI activity: send setup, send transfer, packet on the Ethernet fibre, receive PCI transfer; annotations: ~8 µs for send or receive, stack & application overhead ~10 µs per node, ~36 µs overall]

Slide: 12 Richard Hughes-Jones – Signals on the PCI bus
- 1472-byte packets every 15 µs, Intel Pro/1000
- PCI: 64 bit 33 MHz – 82% usage
- PCI: 64 bit 66 MHz – 65% usage; data transfers half as long
[Logic-analyser traces for both bus speeds: send PCI, receive PCI, send setup, data transfers, receive transfers]

Slide: 13 Richard Hughes-Jones – SuperMicro 370DLE: PCI: SysKonnect
- Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1 kernel
- 1400 bytes sent, wait 20 µs: frames on the Ethernet fibre at 20 µs spacing
- 1400 bytes sent, wait 10 µs: frames are back-to-back
- The 800 MHz CPU can drive at line speed – cannot go any faster!

Slide: 14 Richard Hughes-Jones – SuperMicro 370DLE: Throughput: Intel Pro/1000
- Max throughput 910 Mbit/s
- No packet loss for > 12 µs spacing
- Packet loss during the bandwidth drop
- CPU load 65-90% for spacing < 13 µs
- Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1 kernel

Slide: 15 Richard Hughes-Jones – SuperMicro 370DLE: PCI: Intel Pro/1000
- Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1 kernel
- Request – Response traffic
- Demonstrates interrupt coalescence
- No processing directly after each transfer

Slide: 16 Richard Hughes-Jones – SuperMicro P4DP6: Latency: Intel Pro/1000
- Some steps in the latency curve
- Slope … µs/byte; slope of the flat sections … µs/byte; expect … µs/byte
- No variation with packet size; FWHM 1.5 µs – confirms the timing is reliable
- Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI: 64 bit 66 MHz; RedHat 7.2 kernel

Slide: 17 Richard Hughes-Jones – SuperMicro P4DP6: Throughput: Intel Pro/1000
- Max throughput 950 Mbit/s
- No packet loss
- CPU utilisation on the receiving PC was ~25% for packets > 1000 bytes and …% for smaller packets
- Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI: 64 bit 66 MHz; RedHat 7.2 kernel

Slide: 18 Richard Hughes-Jones – SuperMicro P4DP6: PCI: Intel Pro/1000
- … bytes sent, wait 12 µs
- ~5.14 µs on the send PCI bus; PCI bus ~68% occupancy
- ~3 µs on the PCI bus for data receive
- CSR access inserts PCI STOPs; the NIC takes ~1 µs per CSR access
- The CPU is faster than the NIC!
- Similar effect with the SysKonnect NIC
- Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI: 64 bit 66 MHz; RedHat 7.2 kernel

Slide: 19 Richard Hughes-Jones – SuperMicro P4DP8-G2: Throughput: Intel onboard
- Max throughput 995 Mbit/s
- No packet loss
- 20% CPU utilisation on the receiver for packets > 1000 bytes; 30% CPU utilisation for smaller packets
- Motherboard: SuperMicro P4DP8-G2; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.4 GHz; PCI-X: 64 bit; RedHat 7.3 kernel

Slide: 20 Richard Hughes-Jones Interrupt Coalescence: Throughput Intel Pro 1000 on 370DLE

Slide: 21 Richard Hughes-Jones – Interrupt Coalescence Investigations
- Set kernel parameters for socket buffer size = RTT × BW (see the bandwidth-delay-product sketch below)
- TCP mem-mem lon2–man1
- Tx 64, Tx-abs 64; Rx 0, Rx-abs 128: … Mbit/s ± … Mbit/s
- Tx 64, Tx-abs 64; Rx 20, Rx-abs 128: … Mbit/s ± … Mbit/s
- Tx 64, Tx-abs 64; Rx 80, Rx-abs 128: … Mbit/s ± 1 Mbit/s
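The "socket buffer = RTT × BW" rule is simply the bandwidth-delay product. A tiny sketch, using the ~6.2 ms MB-NG round-trip time quoted later in the talk and an assumed 1 Gbit/s line rate (not the measured lon2–man1 figures):

```python
# Bandwidth-delay product: the socket buffer needed to keep a path full.
def socket_buffer_bytes(rtt_seconds: float, rate_bits_per_s: float) -> int:
    """Return the bandwidth-delay product in bytes."""
    return int(rtt_seconds * rate_bits_per_s / 8)

if __name__ == "__main__":
    rtt = 6.2e-3            # ~6.2 ms MB-NG round-trip time
    rate = 1e9              # assumed 1 Gbit/s line rate
    bdp = socket_buffer_bytes(rtt, rate)
    print(f"BDP = {bdp} bytes (~{bdp / 2**20:.2f} MiB)")
    # In practice this value would be applied through the net.core.rmem_max /
    # wmem_max limits and the net.ipv4.tcp_rmem / tcp_wmem sysctls.
```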

Slide: 22 Richard Hughes-Jones Supermarket Motherboards

Slide: 23 Richard Hughes-Jones – Tyan Tiger S2466N
- Motherboard: Tyan Tiger S2466N
- PCI: 1 × 64 bit 66 MHz
- CPU: Athlon MP2000+
- Chipset: AMD-760 MPX
- 3Ware controller forces the PCI bus to 33 MHz
- BaBar Tyan to MB-NG SuperMicro: network mem-mem 619 Mbit/s

Slide: 24 Richard Hughes-Jones – IBM das: Throughput: Intel Pro/1000
- Max throughput 930 Mbit/s
- No packet loss for > 12 µs spacing; clean behaviour
- Packet loss during the throughput drop
- Motherboard: IBM das; Chipset: ServerWorks CNB20LE; CPU: Dual PIII 1 GHz; PCI: 64 bit 33 MHz; RedHat 7.1 kernel
- … bytes sent at 11 µs spacing: signals clean
- ~9.3 µs on the send PCI bus; PCI bus ~82% occupancy
- ~5.9 µs on the PCI bus for data receive

Slide: 25 Richard Hughes-Jones 10 Gigabit Ethernet

Slide: 26 Richard Hughes-Jones – 10 Gigabit Ethernet: UDP Throughput
- 1500-byte MTU gives ~2 Gbit/s
- Used a … byte MTU (maximum user data length …)
- DataTAG SuperMicro PCs
  - Dual 2.2 GHz Xeon CPU, FSB 400 MHz
  - PCI-X mmrbc 512 bytes
  - Wire-rate throughput of 2.9 Gbit/s
- CERN OpenLab HP Itanium PCs
  - Dual 1.0 GHz 64-bit Itanium CPU, FSB 400 MHz
  - PCI-X mmrbc 4096 bytes
  - Wire rate of 5.7 Gbit/s
- SLAC Dell PCs
  - Dual 3.0 GHz Xeon CPU, FSB 533 MHz
  - PCI-X mmrbc 4096 bytes
  - Wire rate of 5.4 Gbit/s

Slide: 27 Richard Hughes-Jones – 10 Gigabit Ethernet: Tuning PCI-X
- 16080-byte packets every 200 µs
- Intel PRO/10GbE LR adapter
- PCI-X bus occupancy vs mmrbc
  - Measured times, and times based on PCI-X timing from the logic analyser
  - Expected throughput ~7 Gbit/s; measured 5.7 Gbit/s
[Plots of bus activity for mmrbc = 512, 1024, 2048 and 4096 bytes (5.7 Gbit/s at 4096); traces show CSR access, PCI-X sequence, data transfer, and interrupt & CSR update]

Slide: 28 Richard Hughes-Jones Different TCP Stacks

Slide: 29 Richard Hughes-Jones – Investigation of new TCP Stacks
- The AIMD algorithm – standard TCP (Reno); a schematic of these update rules follows below
  - For each ACK in an RTT without loss: cwnd → cwnd + a/cwnd (additive increase, a = 1)
  - For each window experiencing loss: cwnd → cwnd − b·cwnd (multiplicative decrease, b = 1/2)
- HighSpeed TCP
  - a and b vary depending on the current cwnd, using a table
  - a increases more rapidly with larger cwnd – returns to the 'optimal' cwnd size sooner for the network path
  - b decreases less aggressively and, as a consequence, so does the cwnd; the effect is that there is not such a decrease in throughput
- Scalable TCP
  - a and b are fixed adjustments for the increase and decrease of cwnd
  - a = 1/100 – the increase is greater than for TCP Reno
  - b = 1/8 – the decrease on loss is less than for TCP Reno
  - Scalable to any link speed
- FAST TCP
  - Uses round-trip time as well as packet loss to indicate congestion, with rapid convergence to fair equilibrium for throughput
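A schematic (not kernel code) of how the per-ACK and per-loss updates differ. The Reno and Scalable constants are those quoted on the slide; the HighSpeed TCP table is only hinted at in the final comment, since the actual a(cwnd)/b(cwnd) values come from the RFC 3649 lookup table.

```python
# Schematic congestion-window updates, in packets (not an actual kernel implementation).

def reno_on_ack(cwnd: float, a: float = 1.0) -> float:
    """Standard TCP (Reno): per-ACK increase of a/cwnd, i.e. +a packet per RTT."""
    return cwnd + a / cwnd

def reno_on_loss(cwnd: float, b: float = 0.5) -> float:
    """Standard TCP (Reno): multiplicative decrease, halving the window."""
    return cwnd - b * cwnd

def scalable_on_ack(cwnd: float, a: float = 0.01) -> float:
    """Scalable TCP: fixed per-ACK increase, so growth is proportional to cwnd."""
    return cwnd + a

def scalable_on_loss(cwnd: float, b: float = 0.125) -> float:
    """Scalable TCP: smaller back-off (b = 1/8) than Reno."""
    return cwnd - b * cwnd

# HighSpeed TCP instead looks a(cwnd) and b(cwnd) up in a table keyed on the current
# cwnd (larger cwnd -> larger a, smaller b), as defined in RFC 3649.
```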

Slide: 30 Richard Hughes-Jones – Comparison of TCP Stacks
- TCP response function: throughput vs loss rate – further to the right means faster recovery (the standard response function is quoted below)
- Packets dropped in the kernel
- MB-NG rtt 6 ms; DataTAG rtt 120 ms
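For reference, the response function of standard TCP is the familiar Mathis et al. relation (quoted from the literature, not from the slide); the new stacks are designed to lift this curve at low loss rates.

```latex
% Standard (Reno-like) TCP response function (Mathis et al., 1997):
\[
  \mathrm{Throughput} \;\approx\; \frac{\mathrm{MSS}}{\mathrm{RTT}}\cdot\frac{C}{\sqrt{p}},
  \qquad C \approx 1.22,
\]
% where p is the packet-loss probability: at a given throughput, a stack whose
% response curve lies further to the right tolerates a higher loss rate.
```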

Slide: 31 Richard Hughes-Jones – High Throughput Demonstrations
[Diagram: Manchester ('Geneva') host man03 and London ('Chicago') host lon…, dual 2.2 GHz Xeon machines, each attached by 1 GEth to a Cisco 7609 and linked across the MB-NG core (Gbit SDH, Cisco GSR)]
- Send data with TCP
- Drop packets
- Monitor TCP with Web100

Slide: 32 Richard Hughes-Jones – High Performance TCP – MB-NG
- Drop 1 in 25,000
- rtt 6.2 ms
- Recover in 1.6 s
[Plots for Standard, HighSpeed and Scalable TCP]

Slide: 33 Richard Hughes-Jones – High Performance TCP – DataTAG
- Different TCP stacks tested on the DataTAG network
- rtt 128 ms
- Drop 1 in 10^6
- HighSpeed: rapid recovery
- Scalable: very fast recovery
- Standard: recovery would take ~20 mins (a rough recovery-time estimate follows below)
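The long standard-TCP recovery follows from the AIMD rules above: after a halving, cwnd grows back by roughly one segment per RTT, so recovery takes about (W/2) round trips. A rough order-of-magnitude check, assuming a 1 Gbit/s path and ~1500-byte segments (both assumptions, not figures from the slide):

```latex
% Window needed to fill the path at RTT = 128 ms:
\[
  W \;=\; \frac{\mathrm{RTT}\times \mathrm{BW}}{\mathrm{MSS}}
    \;\approx\; \frac{0.128\ \mathrm{s}\times 10^{9}\ \mathrm{bit/s}}{8\times 1500\ \mathrm{bytes}}
    \;\approx\; 1.1\times 10^{4}\ \mathrm{segments},
  \qquad
  t_{\mathrm{recover}} \;\approx\; \frac{W}{2}\,\mathrm{RTT}
    \;\approx\; 11\ \mathrm{min},
\]
% i.e. of the same order as the ~20 minutes quoted on the slide (the exact figure
% depends on the segment size and line rate assumed).
```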

Slide: 34 Richard Hughes-Jones Application Throughput

Slide: 35 Richard Hughes-Jones – Topology of the MB-NG Network
[Network diagram – key: Gigabit Ethernet, 2.5 Gbit POS access, MPLS, admin domains. The UCL, Manchester and RAL domains (hosts lon01–lon03, man01–man03, ral01–ral02, some with HW RAID) connect through Cisco 7609 edge and boundary routers to the UKERNA Development Network.]

Slide: 36 Richard Hughes-Jones – Gridftp Throughput + Web100
- RAID0 disks: 960 Mbit/s read, 800 Mbit/s write
- Throughput (Mbit/s): see alternating 600/800 Mbit/s and zero
- Data rate: 520 Mbit/s
- Cwnd smooth
- No duplicate ACKs / send stalls / timeouts

Slide: 37 Richard Hughes-Jones – HTTP data transfers with HighSpeed TCP
- Same hardware
- Bulk data moved by web servers
- Apache web server out of the box!
- Prototype client using the curl HTTP library
- 1 Mbyte TCP buffers
- 2 Gbyte file
- Throughput ~720 Mbit/s
- Cwnd – some variation
- No duplicate ACKs / send stalls / timeouts

Slide: 38 Richard Hughes-Jones – bbcp & GridFTP Throughput
- 2 Gbyte file transferred, RAID5 (4 disks), Manchester – RAL
- bbcp: mean 710 Mbit/s
- GridFTP: see many zeros; means ~710 and ~620 Mbit/s
- DataTAG altAIMD kernel in BaBar & ATLAS

Slide: 39 Richard Hughes-Jones – Summary, Conclusions & Thanks
- The NICs should be well designed: use advanced PCI commands – the chipset will then make efficient use of memory
- The drivers need to be well written: CSR access / clean management of buffers / good interrupt handling
- Worry about the CPU-memory bandwidth as well as the PCI bandwidth – data crosses the memory bus at least 3 times
- Separate the data transfers – use motherboards with multiple 64-bit PCI-X buses
  - 32 bit 33 MHz is too slow for Gigabit rates
  - 64 bit 33 MHz is > 80% used
- Need plenty of CPU power for sustained 1 Gbit/s transfers
- Use of jumbo frames, interrupt coalescence and tuning the PCI-X bus helps
- New TCP stacks are stable and run with 10 Gigabit Ethernet NICs
- New stacks give better performance
- Application architecture & implementation is also important

Slide: 40 Richard Hughes-Jones – More Information: Some URLs
- MB-NG project web site:
- DataTAG project web site:
- UDPmon / TCPmon kit + writeup:
- Motherboard and NIC tests:  &
- TCP tuning information may be found at:  &

Slide: 41 Richard Hughes-Jones