Some Performance Measurements: Gigabit Ethernet NICs & Server Quality Motherboards (Richard Hughes-Jones, PFLDNet Workshop, February 2003)

Presentation transcript:

Some Performance Measurements: Gigabit Ethernet NICs & Server Quality Motherboards
Richard Hughes-Jones, The University of Manchester
Workshop on Protocols for Fast Long-Distance Networks (PFLDNet), February 2003
Session: Close to Hardware

The Latency Measurements Made
• UDP/IP packets sent between back-to-back systems
  - Processed in a similar manner to TCP/IP
  - Not subject to flow control & congestion avoidance algorithms
  - Used the UDPmon test program
• Latency
  - Round trip times measured using Request-Response UDP frames
  - Latency measured as a function of frame size: the slope s is given by mem-mem copy(s) + PCI + Gig Ethernet + PCI + mem-mem copy(s); the intercept indicates processing times + hardware latencies
  - Histograms of 'singleton' measurements
• Tells us about:
  - Behaviour of the IP stack
  - The way the hardware operates
  - Interrupt coalescence
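As an illustration of the request-response method above, the sketch below sends a UDP frame of the chosen size, waits for the far end to echo it back, and prints each round-trip time so the 'singleton' measurements can be histogrammed offline. It is a minimal sketch, not the UDPmon code; the peer address, port number and frame size are placeholders.

    /* Minimal request-response UDP latency probe: a sketch of the idea only,
     * not the UDPmon source.  Peer address, port and frame size are
     * illustrative placeholders; the peer is assumed to echo each frame. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(void)
    {
        const int frame_size = 1400;        /* UDP payload under test    */
        const int n_probes   = 1000;        /* number of singletons      */
        char buf[1500];
        struct sockaddr_in peer;
        struct timeval t0, t1;

        int s = socket(AF_INET, SOCK_DGRAM, 0);
        memset(&peer, 0, sizeof(peer));
        peer.sin_family = AF_INET;
        peer.sin_port   = htons(5001);                      /* placeholder port */
        inet_pton(AF_INET, "192.168.1.2", &peer.sin_addr);  /* placeholder peer */

        for (int i = 0; i < n_probes; i++) {
            gettimeofday(&t0, NULL);
            sendto(s, buf, frame_size, 0, (struct sockaddr *)&peer, sizeof(peer));
            recv(s, buf, sizeof(buf), 0);   /* peer echoes the frame back */
            gettimeofday(&t1, NULL);
            double rtt_us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
            printf("%d %.1f\n", i, rtt_us); /* histogram these offline    */
        }
        close(s);
        return 0;
    }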

The Throughput Measurements Made (1)
• UDP Throughput
• Send a controlled stream of UDP frames spaced at regular intervals
• Test sequence shown in the slide diagram (sender and receiver sides):
  - Zero stats -> OK done
  - Send data frames at regular intervals (n bytes per frame, number of packets, wait time between frames), recording the time to send, the time to receive and the inter-packet time (histogram)
  - Signal end of test -> OK done
  - Get remote statistics -> send statistics: no. received, no. lost + loss pattern, no. out-of-order, CPU load & no. of interrupts, 1-way delay
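A minimal sketch of the data-sending phase described above: transmit a fixed number of UDP frames with a chosen inter-frame wait and report the achieved payload rate. This is only an outline of the technique, not UDPmon itself; the peer address, port, frame size and spacing are assumed values, and the control/statistics exchange is omitted.

    /* Paced UDP frame sender (sketch).  Sends n_frames of frame_size bytes
     * with a nominal wait_us inter-frame spacing and prints the payload
     * data rate achieved.  Addresses and parameters are placeholders. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    static double now_us(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec * 1e6 + tv.tv_usec;
    }

    int main(void)
    {
        const int    frame_size = 1400;     /* bytes of UDP payload         */
        const int    n_frames   = 10000;    /* frames per burst             */
        const double wait_us    = 20.0;     /* inter-frame transmit spacing */
        char buf[1500];
        struct sockaddr_in peer;

        int s = socket(AF_INET, SOCK_DGRAM, 0);
        memset(&peer, 0, sizeof(peer));
        peer.sin_family = AF_INET;
        peer.sin_port   = htons(5001);                      /* placeholder */
        inet_pton(AF_INET, "192.168.1.2", &peer.sin_addr);  /* placeholder */

        double start = now_us();
        for (int i = 0; i < n_frames; i++) {
            sendto(s, buf, frame_size, 0, (struct sockaddr *)&peer, sizeof(peer));
            /* Busy-wait until the next transmit slot; usleep() is far too
             * coarse for spacings of a few tens of microseconds. */
            while (now_us() < start + (i + 1) * wait_us)
                ;
        }
        double elapsed = now_us() - start;
        printf("sent %d frames of %d bytes in %.0f us => %.1f Mbit/s (payload only)\n",
               n_frames, frame_size, elapsed,
               n_frames * frame_size * 8.0 / elapsed);
        close(s);
        return 0;
    }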

The Throughput Measurements Made (2)
• UDP Throughput
  - Send a controlled stream of UDP frames spaced at regular intervals
  - Vary the frame size and the frame transmit spacing
• At the receiver, record:
  - The time of the first and last frames received
  - The number of packets received, the number lost, the number out of order
  - The received inter-packet spacing (histogrammed)
  - The time each packet is received, which provides the packet loss pattern
  - CPU load and number of interrupts
• Use the Pentium CPU cycle counter for times and delays
  - Only a few lines of user code (see the sketch below)
• Tells us about:
  - Behaviour of the IP stack
  - The way the hardware operates
  - Capacity and available throughput of the LAN / MAN / WAN
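The slide notes that the Pentium CPU cycle counter is used for the timestamps with only a few lines of user code. A sketch of that idea follows; it assumes an x86 CPU, the gcc inline-assembly syntax, and a known, constant CPU clock frequency (here the 800 MHz of the PIII used in these tests).

    /* Reading the CPU cycle counter (TSC) from user space: a sketch of the
     * "few lines of user code" mentioned above.  Assumes x86 + gcc and a
     * constant, known CPU clock. */
    #include <stdio.h>
    #include <stdint.h>

    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
        const double cpu_hz = 800e6;   /* e.g. PIII 800 MHz; must be measured/known */
        uint64_t t0 = rdtsc();
        /* ... code being timed, e.g. a sendto() or recv() call ... */
        uint64_t t1 = rdtsc();
        printf("elapsed: %.3f us\n", (t1 - t0) / cpu_hz * 1e6);
        return 0;
    }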

The PCI Bus & Gigabit Ethernet Measurements
• PCI activity measured with a logic analyser:
  - PCI probe cards in the sending PC
  - Gigabit Ethernet fibre probe card
  - PCI probe cards in the receiving PC
• Slide diagram: each PC (CPU, memory, chipset, NIC) is probed on its PCI bus and on the Gigabit Ethernet fibre between them, with all probes feeding the logic analyser display

• Examine the behaviour of different NICs
• A nice example of running at 33 MHz
• A quick look at some new server boards

SuperMicro 370DLE: Latency: SysKonnect
• PCI: 32 bit, 33 MHz
  - Latency small (62 µs) & well behaved
  - Latency slope µs/byte; expect µs/byte (PCI + GigE 0.008 + PCI)
• PCI: 64 bit, 66 MHz
  - Latency small (56 µs) & well behaved
  - Latency slope µs/byte; expect µs/byte (PCI + GigE 0.008 + PCI)
  - Possible extra data moves?
• Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; RedHat 7.1 kernel
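The numeric slope values did not survive in this transcript, but the "expect" figure is simply the sum of the per-byte cost of each hop the data crosses (the formula on the latency-measurement slide). As an illustration, those per-byte costs can be derived from the bus width and clock; the figures below are computed here, not read off the slide:

    % Per-byte transfer cost of each hop, from bus width and clock (illustrative)
    \begin{align*}
    \text{Gigabit Ethernet:}\; & 8\ \text{bit/byte} \,/\, 1000\ \text{Mbit/s} = 0.008\ \mu\text{s/byte} \\
    \text{PCI 32 bit, 33 MHz:}\; & 1 \,/\, (4\ \text{byte} \times 33\ \text{MHz}) \approx 0.0076\ \mu\text{s/byte} \\
    \text{PCI 64 bit, 66 MHz:}\; & 1 \,/\, (8\ \text{byte} \times 66\ \text{MHz}) \approx 0.0019\ \mu\text{s/byte}
    \end{align*}

For the 64 bit, 66 MHz case the slide's formula (PCI + GigE + PCI) then predicts roughly 2 x 0.0019 + 0.008 ≈ 0.012 µs/byte before any extra memory copies are counted.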

SuperMicro 370DLE: Throughput: SysKonnect
• PCI: 32 bit, 33 MHz
  - Max throughput 584 Mbit/s
  - No packet loss for spacings > 18 µs
• PCI: 64 bit, 66 MHz
  - Max throughput 720 Mbit/s
  - No packet loss for spacings > 17 µs
  - Packet loss during the bandwidth drop
• Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; RedHat 7.1 kernel

SuperMicro 370DLE: PCI: SysKonnect
• Logic analyser trace: 1400 bytes sent, wait 100 µs
• Trace labels: send setup, send PCI transfer, packet on the Ethernet fibre, receive PCI transfer, receive transfer
• ~8 µs for send or receive on the PCI
• Stack & application overhead ~10 µs / node; ~36 µs
• Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit, 66 MHz; RedHat 7.1 kernel

SuperMicro 370DLE: PCI: SysKonnect
• 1400 bytes sent, wait 20 µs: frames on the Ethernet fibre show 20 µs spacing
• 1400 bytes sent, wait 10 µs: frames are back-to-back
  - Can drive at line speed
  - Cannot go any faster! (see the wire-time calculation below)
• Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit, 66 MHz; RedHat 7.1 kernel
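Why a requested 10 µs spacing collapses to back-to-back frames follows from standard Ethernet framing; this arithmetic is added here for clarity and is not on the slide:

    % Wire occupancy of one 1400-byte UDP datagram on Gigabit Ethernet
    \begin{align*}
    \text{bytes on the wire} &= 1400_{\text{UDP payload}} + 8_{\text{UDP}} + 20_{\text{IP}} + 18_{\text{Eth header+FCS}} + 20_{\text{preamble+IFG}} = 1466 \\
    \text{wire time at } 1\ \text{Gbit/s} &= \frac{1466 \times 8\ \text{bit}}{1000\ \text{Mbit/s}} \approx 11.7\ \mu\text{s}
    \end{align*}

So back-to-back 1400-byte frames cannot leave the NIC closer together than about 11.7 µs, and any requested spacing below that simply degenerates to line rate.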

SuperMicro 370DLE: Latency: Intel Pro/1000
• Latency high but well behaved
  - Indicates interrupt coalescence
• Slope µs/byte; expect µs/byte (PCI + GigE 0.008 + PCI)
• Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit, 66 MHz; RedHat 7.1 kernel

SuperMicro 370DLE: Throughput: Intel Pro/1000
• Max throughput 910 Mbit/s
• No packet loss for spacings > 12 µs
• Packet loss during the bandwidth drop
• CPU load 65-90 % for spacings < 13 µs
• Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit, 66 MHz; RedHat 7.1 kernel

SuperMicro 370DLE: PCI: Intel Pro/1000
• Request-Response traces
• Demonstrates interrupt coalescence
• No processing directly after each transfer
• Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit, 66 MHz; RedHat 7.1 kernel

SuperMicro 370DLE: PCI: Intel Pro/1000
• 1400 bytes sent, wait 11 µs
  - ~4.7 µs on the send PCI bus; ~43 % occupancy
  - ~3.25 µs on the PCI bus for the data receive; ~30 % occupancy
• 1400 bytes sent, wait 11 µs: action of pause packets visible
• Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit, 66 MHz; RedHat 7.1 kernel

SuperMicro 370DLE: Throughput: Alteon
• PCI: 64 bit, 33 MHz
  - Max throughput 674 Mbit/s
  - Packet loss for spacings < 10 µs
• PCI: 64 bit, 66 MHz
  - Max throughput 930 Mbit/s
  - Packet loss for spacings < 10 µs
  - Packet loss during the bandwidth drop
• Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; RedHat 7.1 kernel

SuperMicro 370DLE: PCI: Alteon
• PCI: 64 bit, 33 MHz, 1400 byte packets
  - Send PCI and receive PCI signals nice and clean
• PCI: 64 bit, 66 MHz, 1400 byte packets, 16 µs spacing
  - Pauses in the NIC-to-memory transfer slow down the receive transfer
• Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; RedHat 7.1 kernel

IBM das: Throughput: Intel Pro/1000
• Max throughput 930 Mbit/s
• No packet loss for spacings > 12 µs; clean behaviour
• Packet loss during the drop
• Motherboard: IBM das; Chipset: ServerWorks CNB20LE; CPU: Dual PIII 1 GHz; PCI: 64 bit, 33 MHz; RedHat 7.1 kernel

IBM das: PCI: Intel Pro/1000
• Frames sent with 11 µs spacing; signals clean
• ~9.3 µs on the send PCI bus; ~82 % occupancy
• ~5.9 µs on the PCI bus for the data receive
• Motherboard: IBM das; Chipset: ServerWorks CNB20LE; CPU: Dual PIII 1 GHz; PCI: 64 bit, 33 MHz; RedHat 7.1 kernel

SuperMicro P4DP6: Latency: Intel Pro/1000
• Some steps in the latency curve
• Slope µs/byte; slope of the flat sections µs/byte; expect µs/byte
• No variation with packet size
• FWHM 1.5 µs confirms the timing is reliable
• Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI: 64 bit, 66 MHz; RedHat 7.2 kernel

SuperMicro P4DP6: Throughput: Intel Pro/1000
• Max throughput 950 Mbit/s
• No packet loss
• CPU utilisation on the receiving PC was ~25 % for packets larger than 1000 bytes; % for smaller packets
• Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI: 64 bit, 66 MHz; RedHat 7.2 kernel

SuperMicro P4DP6: PCI: Intel Pro/1000
• 1400 bytes sent, wait 12 µs
  - ~5.14 µs on the send PCI bus; ~68 % occupancy
  - ~3 µs on the PCI bus for the data receive
• CSR access inserts PCI STOPs
  - The NIC takes ~1 µs per CSR access
  - The CPU is faster than the NIC!
  - Similar effect with the SysKonnect NIC
• Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI: 64 bit, 66 MHz; RedHat 7.2 kernel

SuperMicro P4DP8-G2: Throughput: SysKonnect
• Max throughput 990 Mbit/s
• New card cf. the other tests
• CPU utilisation: % on the sender, ~30 % on the receiver
• Motherboard: SuperMicro P4DP8-G2; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.4 GHz; PCI: 64 bit, 66 MHz; RedHat 7.3 kernel

SuperMicro P4DP8-G2: Throughput: Intel onboard
• Max throughput 995 Mbit/s
• No packet loss
• 20 % CPU utilisation on the receiver for packets > 1000 bytes; 30 % for smaller packets
• Motherboard: SuperMicro P4DP8-G2; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.4 GHz; PCI-X: 64 bit; RedHat 7.3 kernel

Futures & Work in Progress
• Dual Gigabit Ethernet controllers
• More detailed study of PCI-X
• Interaction of multiple PCI Gigabit flows
• What happens when you have disks?
• 10 Gigabit Ethernet NICs

Summary & Conclusions (1)
• All NICs & motherboards were stable over 1000s of GBytes of transfers
• The Alteon could handle 930 Mbit/s on 64 bit / 66 MHz
• SysKonnect gave Mbit/s, improving to Mbit/s on later motherboards
• Intel gave 910-950 Mbit/s, and Mbit/s on later motherboards
• The PCI and Gigabit Ethernet signals show that an 800 MHz CPU can drive large packets at line speed
• More CPU power is required for receiving; loss is due to IP discards
  - Rule of thumb: at least 1 GHz of CPU power free for 1 Gbit/s
• Times for DMA transfers scale with the PCI bus speed, but CSR access time is constant
  - New PCI-X and on-board controllers are better
• Buses: 64 bit 66 MHz PCI, or faster PCI-X, are required for performance
  - A 32 bit 33 MHz PCI bus is REALLY busy!
  - 64 bit 33 MHz buses are > 80 % used

Summary, Conclusions & Thanks
• The NICs should be well designed:
  - Use advanced PCI commands
  - The chipset will then make efficient use of memory
  - CSRs well designed: minimum number of accesses
• The drivers need to be well written:
  - CSR access / clean management of buffers / good interrupt handling
• Worry about the CPU-memory bandwidth as well as the PCI bandwidth
  - Data crosses the CPU bus several times
• Separate the data transfers: use motherboards with multiple PCI buses
• The OS must be up to it too!

Throughput Measured for 1472 byte Packets, by motherboard and NIC (Alteon AceNIC, SysKonnect SK-9843, Intel Pro/1000):
• SuperMicro 370DLE; ServerWorks III LE; PCI 32 bit, 33 MHz; RedHat 7.1: SysKonnect 584 Mbit/s
• SuperMicro 370DLE; ServerWorks III LE; PCI 64 bit, 66 MHz; RedHat 7.1: SysKonnect 720 Mbit/s, Intel Pro/1000 910 Mbit/s
• IBM das; ServerWorks CNB20LE; PCI 64 bit, 33 MHz; RedHat 7.1: Intel Pro/1000 930 Mbit/s
• SuperMicro P4DP6; Intel E7500; PCI 64 bit, 66 MHz; RedHat 7.2 SMP: SysKonnect 876 Mbit/s, Intel Pro/1000 950 Mbit/s
• SuperMicro P4DP8-G2; Intel E7500; PCI 64 bit, 66 MHz; RedHat 7.2 SMP: SysKonnect 990 Mbit/s, Intel Pro/1000 995 Mbit/s

The SuperMicro P4DP6 Motherboard
• Dual Xeon Prestonia (2 CPU/die)
• 400 MHz front side bus
• Intel E7500 chipset
• 6 PCI-X slots
• 4 independent PCI buses
• Can select: 64 bit 66 MHz PCI, 100 MHz PCI-X, or 133 MHz PCI-X
• Mbit Ethernet
• Adaptec AIC-7899W dual channel SCSI
• UDMA/100 bus master EIDE channels: data transfer rates of 100 MB/s burst
• P4DP8-G2: dual Gigabit Ethernet

More Information: Some URLs
• UDPmon / TCPmon kit + write-up
• ATLAS investigation of the performance of 100 Mbit and Gigabit Ethernet components using raw Ethernet frames
• DataGrid WP7 networking
• Motherboard and NIC tests
• IEPM-BW site

SuperMicro P4DP6: Latency: SysKonnect
• Latency low; interrupts every packet
• Latency well behaved
• Slope µs/byte; expect µs/byte (PCI + GigE 0.008 + PCI)
• Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI: 64 bit, 66 MHz; RedHat 7.3 kernel

SuperMicro P4DP6: Throughput: SysKonnect
• Max throughput 876 Mbit/s; a big improvement
• Loss is not due to user-to-kernel moves
• Loss traced to "indiscards" in the receiving IP layer
• CPU utilisation on the receiving PC was ~25 % for packets larger than 1000 bytes; % for smaller packets
• Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI: 64 bit, 66 MHz; RedHat 7.3 kernel
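One way to see such IP-layer discards on a Linux receiver is to read the kernel's SNMP counters before and after a test and compare the Ip InDiscards and Udp InErrors columns. The sketch below simply dumps the relevant lines of /proc/net/snmp; it is a diagnostic aid assumed here, not part of UDPmon.

    /* Dump the Ip: and Udp: counter lines from /proc/net/snmp so that
     * InDiscards / InErrors can be compared before and after a test.
     * Assumes the Linux /proc/net/snmp interface. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/net/snmp", "r");
        char line[1024];

        if (!f) { perror("/proc/net/snmp"); return 1; }
        while (fgets(line, sizeof(line), f)) {
            if (strncmp(line, "Ip:", 3) == 0 || strncmp(line, "Udp:", 4) == 0)
                fputs(line, stdout);   /* header + value rows for IP and UDP */
        }
        fclose(f);
        return 0;
    }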

SuperMicro P4DP6: PCI: SysKonnect
• 1400 bytes sent
• DMA transfers clean
• PCI STOP signals when accessing the NIC CSRs
  - The NIC takes ~0.7 µs per CSR access
  - The CPU is faster than the NIC!
• Trace labels: send PCI, receive PCI, PCI STOP
• Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI: 64 bit, 66 MHz; RedHat 7.2 kernel

SuperMicro P4DP8-G2: Latency: SysKonnect
• Latency low; interrupt every packet
• Several steps in the latency curve; slope µs/byte; expect µs/byte (PCI + GigE 0.008 + PCI)
• Plot is smooth for PC-switch-PC! Slope µs/byte
• Motherboard: SuperMicro P4DP8-G2; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.4 GHz; PCI: 64 bit, 66 MHz; RedHat 7.3 kernel

Interrupt Coalescence: Throughput: Intel Pro/1000 on 370DLE