
PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 1 UDP Performance and PCI-X Activity of the Intel 10 Gigabit Ethernet Adapter on:
- HP rx2600 Dual Itanium 2
- SuperMicro P4DP8-2G Dual Xeon
- Dell Poweredge 2650 Dual Xeon
Richard Hughes-Jones
Many people helped, including:
- Sverre Jarp and Glen Hisdal, CERN Open Lab
- Sylvain Ravot, Olivier Martin and Elise Guyot, DataTAG project
- Les Cottrell, Connie Logg and Gary Buhrmaster, SLAC
- Stephen Dallison, MB-NG

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 2
- Introduction
- 10 GigE on Itanium IA64
- 10 GigE on Xeon IA32
- 10 GigE on Dell Xeon IA32
- Tuning the PCI-X bus
- SC2003 Phoenix

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 3 Latency & Throughput Measurements
- UDP/IP packets sent between back-to-back systems
- Similar processing to TCP/IP but no flow control & congestion avoidance algorithms
- Used the UDPmon test program
- Latency
  - Round trip times using Request-Response UDP frames
  - Latency as a function of frame size: slope s given by mem-mem copy(s) + PCI + Gig Ethernet + PCI + mem-mem copy(s) (written out as an equation below); the intercept indicates processing times + HW latencies
  - Histograms of 'singleton' measurements
  - Tells us about: behaviour of the IP stack, the way the HW operates, interrupt coalescence
- UDP Throughput
  - Send a controlled stream of UDP frames spaced at regular intervals
  - Vary the frame size and the frame transmit spacing & measure: the time of first and last frames received; the number of packets received, lost & out of order; a histogram of the inter-packet spacing of received packets; the packet loss pattern; 1-way delay; CPU load; number of interrupts
  - Tells us about: behaviour of the IP stack, the way the HW operates, capacity & available throughput of the LAN / MAN / WAN
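The slope bullet above can be written out as an equation. This is only a restatement of the slide: $s_{\text{mem}}$ stands for the per-byte cost of each memory-to-memory copy and each $1/B$ term is the reciprocal bandwidth of an element the frame crosses.

\[
s \;\approx\; s_{\text{mem}} \;+\; \frac{1}{B_{\text{PCI}}} \;+\; \frac{1}{B_{\text{GigE}}} \;+\; \frac{1}{B_{\text{PCI}}} \;+\; s_{\text{mem}} \qquad [\mu\text{s/byte}]
\]

The intercept of the latency-vs-frame-size fit then collects the fixed protocol processing times and hardware latencies.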

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 4 The Throughput Measurements
- UDP Throughput: send a controlled stream of UDP frames spaced at regular intervals
- Test sequence between sender and receiver (parameters: n bytes per frame, number of packets, wait time):
  - Zero stats (OK / done)
  - Send data frames at regular intervals, recording the time to send, the time to receive and the inter-packet time (histogram)
  - Signal end of test (OK / done)
  - Get remote statistics; the receiver sends back: no. received, no. lost + loss pattern, no. out-of-order, CPU load & no. of interrupts, 1-way delay
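For concreteness, here is a minimal sketch in C of the paced sender described above. It is not UDPmon itself: the destination address, port, frame size, packet count and inter-frame wait are illustrative values, and the control handshake (zero stats, signal end of test, fetch statistics) is omitted.

```c
/* Minimal sketch of a paced UDP sender.  Not UDPmon: addresses, sizes and
 * timings are illustrative; the control exchange with the receiver is omitted. */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    const char *dest_ip = "192.168.0.2";   /* receiver address (assumed)      */
    const int   port    = 14196;           /* receiver port (assumed)         */
    const int   nbytes  = 1472;            /* UDP payload per frame           */
    const int   npkts   = 10000;           /* number of frames to send        */
    const long  wait_ns = 200000;          /* inter-frame spacing, 200 us     */

    char buf[16384];
    memset(buf, 0, sizeof buf);

    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in to = {0};
    to.sin_family = AF_INET;
    to.sin_port   = htons(port);
    inet_pton(AF_INET, dest_ip, &to.sin_addr);

    struct timespec start, now, next;
    clock_gettime(CLOCK_MONOTONIC, &start);
    next = start;

    for (int i = 0; i < npkts; i++) {
        sendto(s, buf, nbytes, 0, (struct sockaddr *)&to, sizeof to);
        /* busy-wait until the next transmit slot to keep frames evenly spaced */
        next.tv_nsec += wait_ns;
        if (next.tv_nsec >= 1000000000L) { next.tv_sec++; next.tv_nsec -= 1000000000L; }
        do {
            clock_gettime(CLOCK_MONOTONIC, &now);
        } while (now.tv_sec < next.tv_sec ||
                 (now.tv_sec == next.tv_sec && now.tv_nsec < next.tv_nsec));
    }

    clock_gettime(CLOCK_MONOTONIC, &now);
    double secs = (now.tv_sec - start.tv_sec) + (now.tv_nsec - start.tv_nsec) / 1e9;
    printf("sent %d frames of %d bytes in %.3f s -> %.2f Mbit/s offered\n",
           npkts, nbytes, secs, npkts * nbytes * 8.0 / secs / 1e6);
    close(s);
    return 0;
}
```

A real tool repeats this loop over a range of frame sizes and wait times and combines the sender timing with the receiver statistics to produce the throughput, loss and inter-packet-spacing plots shown on the following slides.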

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 5 The PCI Bus & Gigabit Ethernet Measurements
- PCI Activity: Logic Analyzer with
  - PCI Probe cards in the sending PC
  - Gigabit Ethernet Fiber Probe Card
  - PCI Probe cards in the receiving PC
- (Diagram: CPU - mem - chipset - NIC in each PC, with the Gigabit Ethernet probe on the fibre between them and all probes feeding the Logic Analyser display)

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 6 Example: The 1 Gigabit NIC Intel PRO/1000
- Motherboard: Supermicro P4DP6
- Chipset: E7500 (Plumas)
- CPU: Dual Xeon 2.2 GHz with 512k L2 cache
- Mem bus 400 MHz; PCI-X 64 bit 66 MHz
- HP Linux Kernel SMP
- MTU 1500 bytes
- Intel PRO/1000 XT
- (Plots: Latency, Throughput, and Bus Activity - Send Transfer and Receive Transfer)

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 7 Data Flow: SuperMicro 370DLE: SysKonnect
- Motherboard: SuperMicro 370DLE
- Chipset: ServerWorks III LE
- CPU: PIII 800 MHz
- PCI: 64 bit 66 MHz
- RedHat 7.1 Kernel
- Frames sent with a wait of 100 µs; ~8 µs for send or receive; stack & application overhead ~10 µs / node
- (Logic-analyser trace: send CSR setup, send PCI transfer, packet on the Ethernet fibre, receive PCI transfer; ~36 µs marked on the trace)

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 8 10 Gigabit Ethernet NIC with the PCI-X probe card.

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 9 Intel PRO/10GbE LR Adapter in the HP rx2600 system

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 10 10 GigE on Itanium IA64: UDP Latency
- Motherboard: HP rx2600 IA64
- Chipset: HP zx1
- CPU: Dual Itanium 2 1 GHz with 512k L2 cache
- Mem bus dual 622 MHz, 4.3 GByte/s
- PCI-X 133 MHz
- HP Linux Kernel SMP
- Intel PRO/10GbE LR Server Adapter
- NIC driver with RxIntDelay=0, XsumRX=1, XsumTX=1, RxDescriptors=2048, TxDescriptors=2048
- MTU 1500 bytes
- Latency 100 µs & very well behaved
- Latency slope … µs/byte; B2B expect … µs/byte (PCI + 10 GigE + PCI)

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 11 10 GigE on Itanium IA64: Latency Histograms
- Double peak structure with the peaks separated by 3-4 µs
- Peaks are ~1-2 µs wide
- Similar to that observed with 1 Gbit Ethernet NICs on IA32 architectures

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 12 10 GigE on Itanium IA64: UDP Throughput
- HP Linux Kernel SMP, MTU 16114 bytes
- Max throughput … Gbit/s
- Int on every packet; no packet loss in 10M packets
- Sending host: 1 CPU is idle. For the largest packets one CPU is 40% in kernel mode; as the packet size decreases the load rises to ~90% for packets of 4000 bytes or less.
- Receiving host: both CPUs busy. Large packets: 40% kernel mode; small packets: 80% kernel mode
- TCP gensink data rate was 745 MBytes/s = 5.96 Gbit/s

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 13 10 GigE on Itanium IA64: UDP Throughput [04]
- HP Linux Kernel #17 SMP, MTU 16114 bytes
- Max throughput 5.81 Gbit/s
- Int on every packet; some packet loss for pkts < 4000 bytes
- Sending host: 1 CPU is idle, but the CPUs swap over. For the largest packets one CPU is 20-30% in kernel mode; as the packet size decreases the load rises to ~90% for packets of 4000 bytes or less.
- Receiving host: 1 CPU is idle, but the CPUs swap over. Large packets: 40% kernel mode; small packets: 70% kernel mode

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 14 10 GigE on Itanium IA64: PCI-X bus Activity
- 16114 byte packets every 200 µs, Intel PRO/10GbE LR Server Adapter, MTU 16114
- setpci -s 02:1.0 e6.b=2e sets mmrbc to 4096 bytes
- PCI-X signals, transmit - memory to NIC:
  - Interrupt and processing: 48.4 µs after start
  - Data transfer takes ~22 µs; data transfer rate over PCI-X: 5.86 Gbit/s
  - Made up of 4 PCI-X sequences of ~4.55 µs, then a gap of 700 ns
  - Each sequence contains 16 PCI-X bursts of 256 bytes; sequence length 4096 bytes (= mmrbc)
- (Logic-analyser trace: CSR access, then the transfer of the packet as 4096-byte PCI-X sequences of 256-byte bursts separated by 700 ns gaps)
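A quick cross-check of the quoted transmit rate (my arithmetic, assuming the full 16114-byte frame is what crosses the bus):

\[
\frac{16114\ \text{bytes} \times 8\ \text{bit/byte}}{22\ \mu\text{s}} \;\approx\; 5.9\ \text{Gbit/s},
\]

consistent with the 5.86 Gbit/s read off the analyser and comfortably below the 64 bit x 133 MHz PCI-X ceiling of ~8.5 Gbit/s quoted in the summary.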

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 15 10 GigE on Itanium IA64: PCI-X bus Activity
- 16114 byte packets every 200 µs, Intel PRO/10GbE LR Server Adapter, MTU 16114
- setpci -s 02:1.0 e6.b=2e sets mmrbc to 4096 bytes
- PCI-X signals, transmit - memory to NIC:
  - Interrupt and processing: 48.4 µs after start
  - Data transfer takes ~22 µs; data transfer rate over PCI-X: 5.86 Gbit/s
- PCI-X signals, receive - NIC to memory:
  - Interrupt on every packet
  - Data transfer takes ~18.4 µs; data transfer rate over PCI-X: 7.0 Gbit/s
  - Note: receive is faster, cf. the 1 GigE NICs
- (Logic-analyser traces: transmit as CSR access plus 256-byte PCI-X bursts; receive as interrupt, CSR access plus 512-byte PCI-X bursts)

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 16 10 GigE on Xeon IA32: UDP Latency
- Motherboard: Supermicro P4DP8-G2
- Chipset: Intel E7500 (Plumas)
- CPU: Dual Xeon 2.2 GHz with 512k L2 cache
- Mem bus 400 MHz
- PCI-X 133 MHz
- RedHat Kernel SMP
- Intel(R) PRO/10GbE Network Driver v…
- Intel PRO/10GbE LR Server Adapter
- NIC driver with RxIntDelay=0, XsumRX=1, XsumTX=1, RxDescriptors=2048, TxDescriptors=2048
- MTU 1500 bytes
- Latency 144 µs & reasonably behaved
- Latency slope … µs/byte; B2B expect … µs/byte

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 17 10 GigE on Xeon IA32: Latency Histograms
- Double peak structure with the peaks separated by 3-4 µs
- Peaks are ~1-2 µs wide
- Similar to that observed with 1 Gbit Ethernet NICs on IA32 architectures

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 18 10 GigE on Xeon IA32: Throughput
- MTU 16114 bytes
- Max throughput 2.75 Gbit/s with mmrbc 512 bytes; 3.97 Gbit/s with mmrbc 4096 bytes
- Int on every packet; no packet loss in 10M packets
- Sending host: for closely spaced packets, the other CPU is ~60-70% in kernel mode
- Receiving host: small packets 80% in kernel mode; >9000 bytes ~50% in kernel mode

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 19 10 GigE on Xeon IA32: PCI-X bus Activity
- 16114 byte packets every 200 µs, Intel PRO/10GbE LR, mmrbc 512 bytes
- PCI-X signals, transmit - memory to NIC:
  - Interrupt and processing: 70 µs after start
  - Data transfer takes ~44.7 µs; data transfer rate over PCI-X: 2.88 Gbit/s
- PCI-X signals, receive - NIC to memory:
  - Interrupt on every packet
  - Data transfer takes ~18.29 µs; data transfer rate over PCI-X: 7.014 Gbit/s, the same as the Itanium
- (Logic-analyser traces: transmit as CSR access plus 256-byte PCI-X bursts ending in an interrupt; receive as interrupt, CSR access and the data transfer)

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 20 10 GigE on Dell Xeon: Throughput
- MTU 16114 bytes
- Max throughput 5.4 Gbit/s
- Int on every packet; some packet loss for pkts < 4000 bytes
- Sending host: for closely spaced packets, one CPU is ~70% in kernel mode; CPU usage swaps between the CPUs
- Receiving host: 1 CPU is idle but CPU usage swaps; for closely spaced packets ~80% in kernel mode

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 21 10 GigE on Dell Xeon: UDP Latency
- Motherboard: Dell Poweredge 2650
- Chipset: Intel E7500 (Plumas)
- CPU: Dual Xeon 3.06 GHz with 512k L2 cache
- Mem bus 533 MHz
- PCI-X 133 MHz
- RedHat Kernel with the altAIMD stack
- Intel(R) PRO/10GbE Network Driver v…
- Intel PRO/10GbE LR Server Adapter
- NIC driver with RxIntDelay=0, XsumRX=1, XsumTX=1, RxDescriptors=2048, TxDescriptors=2048
- MTU … bytes
- Latency 36 µs with some steps
- Latency slope … µs/byte; B2B expect … µs/byte

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 22 10 GigEthernet: Throughput
- 1500 byte MTU gives ~2 Gbit/s
- Used the 16114 byte MTU (maximum user length)
- DataTAG Supermicro PCs: Dual 2.2 GHz Xeon CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes: wire rate throughput of 2.9 Gbit/s
- SLAC Dell PCs: Dual 3.0 GHz Xeon CPU, FSB 533 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.4 Gbit/s
- CERN OpenLab HP Itanium PCs: Dual 1.0 GHz 64 bit Itanium CPU, FSB 400 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.7 Gbit/s

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 23 Tuning PCI-X: Variation of mmrbc IA32
- 16114 byte packets every 200 µs
- Intel PRO/10GbE LR Adapter
- PCI-X bus occupancy vs mmrbc
- Plot shows: measured times, times based on the PCI-X times from the logic analyser, and the expected throughput (a simple model of this is sketched below)
- (Logic-analyser traces for mmrbc of 512, 1024, 2048 and 4096 bytes, annotated with CSR access, PCI-X sequence, data transfer, interrupt & CSR update)
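For illustration only, the expected-throughput curve can be approximated with a simple bus-occupancy model built from the sequence and gap timings on the earlier PCI-X slides. The per-burst overhead value, the reuse of the 700 ns gap for every mmrbc setting, and the assumption that only the PCI-X transfer limits the rate are mine, not the talk's:

```c
/* Rough model of PCI-X transmit occupancy vs mmrbc, a sketch based on the
 * timings quoted on the earlier PCI-X activity slides.  Assumptions (not from
 * the slides): each sequence moves data at the raw 64-bit x 133 MHz bus rate
 * plus a fixed per-burst overhead, and sequences are separated by the ~700 ns
 * gap observed with mmrbc = 4096. */
#include <stdio.h>
#include <math.h>

#define BUS_BYTES_PER_US  (133.0e6 * 8 / 1e6)  /* ~1064 bytes/us on 64-bit 133 MHz PCI-X */
#define BURST_BYTES        256.0               /* burst size seen on the analyser        */
#define BURST_OVERHEAD_US  0.04                /* assumed per-burst protocol overhead    */
#define SEQ_GAP_US         0.7                 /* gap between PCI-X sequences (slide 14) */

static double transfer_time_us(double frame_bytes, double mmrbc)
{
    double n_seq    = ceil(frame_bytes / mmrbc);       /* sequences per frame      */
    double n_bursts = ceil(mmrbc / BURST_BYTES);       /* bursts per sequence      */
    double seq_us   = mmrbc / BUS_BYTES_PER_US + n_bursts * BURST_OVERHEAD_US;
    return n_seq * seq_us + (n_seq - 1) * SEQ_GAP_US;  /* total bus occupancy      */
}

int main(void)
{
    const double frame   = 16114.0;                    /* bytes per packet in the tests */
    const double mmrbc[] = { 512, 1024, 2048, 4096 };
    for (int i = 0; i < 4; i++) {
        double t = transfer_time_us(frame, mmrbc[i]);
        printf("mmrbc %4.0f: transfer %6.1f us -> %.2f Gbit/s on the bus\n",
               mmrbc[i], t, frame * 8 / (t * 1000.0));
    }
    return 0;
}
```

With these assumptions the model reproduces the measured trend, rising from roughly 3 Gbit/s at mmrbc 512 towards above 6 Gbit/s at 4096, though not the exact values read from the analyser.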

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 24 Tuning PCI-X: Throughput vs mmrbc
- DataTAG IA32: 2.2 GHz Xeon, 400 MHz FSB, 2.7 - 4.0 Gbit/s
- SLAC Dell: 3.0 GHz Xeon, 533 MHz FSB, up to 5.4 Gbit/s
- OpenLab IA64: 1.0 GHz Itanium, 622 MHz FSB, 3.2 - 5.7 Gbit/s
- CPU load is for 1 CPU, not an average

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 25 10 GigEthernet at SC2003 BW Challenge
- Three server systems with 10 GigEthernet NICs
- Used the DataTAG altAIMD stack, 9000 byte MTU
- Sent mem-mem iperf TCP streams from the SLAC/FNAL booth in Phoenix to:
  - Palo Alto PAIX: rtt 17 ms, window 30 MB; shared with the Caltech booth; 4.37 Gbit hstcp I=5%, then 2.87 Gbit I=16% (the fall corresponds to 10 Gbit on the link); 3.3 Gbit Scalable I=8%; tested 2 flows, sum 1.9 Gbit I=39%
  - Chicago Starlight: rtt 65 ms, window 60 MB; Phoenix CPU 2.2 GHz; 3.1 Gbit hstcp I=1.6%
  - Amsterdam SARA: rtt 175 ms, window 200 MB; Phoenix CPU 2.2 GHz; 4.35 Gbit hstcp I=6.9%; very stable
- Both used Abilene to Chicago
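As a rough check (my arithmetic, assuming a 10 Gbit/s target line rate), the TCP windows above are of the order of the bandwidth-delay product of each path; for the Amsterdam route:

\[
\text{BDP} = B \times \text{RTT} = \frac{10\ \text{Gbit/s} \times 175\ \text{ms}}{8\ \text{bit/byte}} \approx 219\ \text{MB},
\]

close to the 200 MB window used.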

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 26 Summary & Conclusions
- The Intel PRO/10GbE LR Adapter and driver gave stable throughput and worked well
- Need a large MTU (9000 or 16114) - 1500 bytes gives ~2 Gbit/s
- PCI-X tuning: mmrbc = 4096 bytes increases throughput by 55% (3.2 to 5.7 Gbit/s)
- PCI-X sequences clear on transmit; gaps ~950 ns
- Transfers: transmission (22 µs) takes longer than receiving (18 µs)
- Tx rate 5.85 Gbit/s, Rx rate 7.0 Gbit/s (Itanium) (PCI-X max 8.5 Gbit/s)
- CPU load considerable: 60% Xeon, 40% Itanium
- BW of the memory system is important - the data crosses it 3 times!
- Sensitive to OS / driver updates
- More study needed

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 27

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 28 Test setup with the CERN Open Lab Itanium systems

PFLDNet Argonne Feb 2004 R. Hughes-Jones Manchester 29 1 GigE on IA32: PCI bus Activity 33 & 66 MHz
- 1472 byte packets every 15 µs, Intel PRO/1000
- PCI 64 bit 33 MHz: 82% usage
- PCI 64 bit 66 MHz: 65% usage; data transfers half as long
- (Logic-analyser traces at both bus speeds showing send setup, send PCI data transfers and receive PCI transfers)