R&D on data transmission FPGA → PC using UDP over 10-Gigabit Ethernet
Domenico Galli
Università di Bologna and INFN, Sezione di Bologna
XII SuperB Project Workshop, Annecy-le-Vieux, 18th March 2010
Commodity Links
Increasingly used in HEP for DAQ, event building and high-level trigger systems:
– Limited cost;
– Maintainability;
– Upgradability.
Demand for data throughput in HEP is increasing, driven by:
– Physical event rate;
– Number of electronic channels;
– Reduction of the on-line event filter (trigger) stages.
Industry has moved on since the DAQ for the LHC experiments was designed:
– 10-Gigabit Ethernet is well established;
– 4x DDR InfiniBand (16 Gb/s) is ready;
– 100-Gigabit Ethernet is being actively worked on.
Evaluation of New Commercial Link Technologies
The Bologna group, in its spare time, continually evaluates new commodity link technologies:
– With a view to their use in DAQ/EB/HLT.
Evaluated parameters:
– Maximum throughput;
– Maximum datagram rate;
– CPU load;
– Datagram loss rate.
Recently tested links:
– Gigabit Ethernet (presented at IEEE RT-05);
– 10-Gigabit Ethernet (presented at IEEE RT-09);
– InfiniBand (2010).
The choice of technology for the experiment must be delayed as long as possible.
10-GbE Point-to-Point Tests
We start the technology evaluation with PC-to-PC tests:
– NICs mounted on the PCI-E bus of commodity PCs act as transmitters and receivers.
Under real operating conditions, the maximum transfer rate is limited not only by the capacity of the link itself, but also:
– by the capacity of the data buses (PCI and FSB/QPI);
– by the ability of the CPUs and of the OS to handle, in due time, the packet processing and the interrupt rates raised by the network interface cards.
[Figure label: 10GBase-SR]
10-GbE Network I/O
“Fast network, slow host” scenario:
– Already seen in the transition to 1-Gigabit Ethernet.
Three major system bottlenecks may limit the efficiency of high-performance I/O adapters:
– The peripheral bus bandwidth: PCI-X (peak throughput 8.5 Gbit/s in the 133 MHz flavor) has been replaced by PCI-E (20 Gbit/s peak throughput in the x8 flavor).
– The memory bandwidth: the FSB clock has increased from 533 MHz to 1600 MHz, and the FSB has then been replaced by AMD HyperTransport and Intel QuickPath Interconnect.
– The CPU utilization: multi-core architectures.
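The peak bus figures quoted above can be reproduced with simple arithmetic. The sketch below assumes that the PCI-X figure refers to a 64-bit bus and that the PCI-E figure is the raw first-generation signalling rate (2.5 GT/s per lane), before 8b/10b line coding; the function names are illustrative only.

```python
def parallel_bus_gbps(width_bits, clock_mhz):
    """Peak throughput of a parallel bus moving width_bits per clock cycle."""
    return width_bits * clock_mhz * 1e6 / 1e9


def pcie_gen1_gbps(lanes, after_line_coding=False):
    """Peak per-direction throughput of PCI-E 1.x (2.5 GT/s per lane).

    With after_line_coding=True the 8b/10b overhead (8 data bits carried
    in every 10 transmitted bits) is subtracted.
    """
    raw = 2.5 * lanes
    return raw * 8 / 10 if after_line_coding else raw


if __name__ == "__main__":
    print("PCI-X 64 bit @ 133 MHz:", parallel_bus_gbps(64, 133), "Gbit/s")
    print("PCI-E x8, raw:         ", pcie_gen1_gbps(8), "Gbit/s")
    print("PCI-E x8, after 8b/10b:", pcie_gen1_gbps(8, True), "Gbit/s")
```

Under these assumptions PCI-X falls short of the 10 Gbit/s line rate, while PCI-E x8 leaves headroom even after line coding.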
CPU Affinity Settings
CPU Affinity Settings (II)
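As a general illustration of what CPU affinity settings involve on a Linux host (not necessarily the settings used in these tests), the sketch below pins a process to one core and builds the bitmask that Linux expects in `/proc/irq/<N>/smp_affinity`. The IRQ number and the helper names are hypothetical.

```python
import os


def cpu_mask_hex(cpus):
    """Hex bitmask with one bit set per CPU index, in the format
    accepted by /proc/irq/<N>/smp_affinity on Linux."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def irq_affinity_command(irq, cpus):
    """Shell command (to be run as root) routing an IRQ to the given CPUs."""
    return f"echo {cpu_mask_hex(cpus)} > /proc/irq/{irq}/smp_affinity"


def pin_current_process(cpu):
    """Restrict the calling process to a single CPU core (Linux only)."""
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # IRQ 77 is a placeholder; the real number is listed in /proc/interrupts.
    print(irq_affinity_command(irq=77, cpus=[1]))
    if hasattr(os, "sched_setaffinity"):
        first_allowed = min(os.sched_getaffinity(0))
        print("now running on CPUs:", pin_current_process(first_allowed))
```

Keeping the sender or receiver process on one core and the NIC interrupt on another is a common way to make the per-core load breakdown (user, system, IRQ, soft IRQ) reproducible.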
UDP Protocol
UDP/IP is the simplest IP protocol that can be implemented in an FPGA:
– It does not hide network problems at the lower layers.
– SCTP/IP (Stream Control Transmission Protocol) could be an alternative.
– TCP/IP is too complex:
   – Thousands of connections (and buffers) would need to be kept open on the FPGA side.
   – Too many mechanisms that slow down the data flow would have to be tuned: congestion control, slow start, sliding windows, retransmission timer, Nagle’s algorithm, etc.
   – Large protocol overhead.
   – The retransmission timer would have to be tuned to keep the latency low.
Experience in DAQ shows that a protocol stack as complete as possible is very useful to simplify debugging in the commissioning phase:
– Including ARP, RARP, ICMP (ping), etc.
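To show how little framing UDP needs, the sketch below builds and parses the 8-byte UDP header (source port, destination port, length, checksum). It is a software illustration only, not the FPGA implementation; the port numbers are arbitrary and the checksum is left at zero, which UDP over IPv4 permits as "no checksum computed".

```python
import struct

# Four 16-bit big-endian fields: source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port, dst_port, payload):
    """Prepend a UDP header to payload; checksum 0 means 'not computed'."""
    length = UDP_HEADER.size + len(payload)
    if length > 0xFFFF:
        raise ValueError("payload too large for the 16-bit UDP length field")
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_udp_datagram(datagram):
    """Return (src_port, dst_port, checksum, payload) from a raw UDP datagram."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, checksum, datagram[UDP_HEADER.size:length]


if __name__ == "__main__":
    raw = build_udp_datagram(5000, 6000, b"event")
    print(raw.hex())
    print(parse_udp_datagram(raw))
```

There is no connection state, window or retransmission timer to maintain, which is the property the comparison with TCP relies on.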
UDP – Standard Frames
1500 B MTU (Maximum Transmission Unit).
UDP datagrams sent as fast as they can be sent.
Bottleneck: sender CPU core 2 (sender process at 100% system load).
[Plot annotations: ~4.8 Gb/s; ~440 kHz; 2 frames, 3 frames, 4 frames. CPU-load curves: User, System, IRQ, Soft IRQ, Total; labels: 100% (bottleneck), fake softIRQ, softIRQ (4/5), IRQ (1/5), softIRQ (~50%), system (~50%)]
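The "2 frames", "3 frames" and "4 frames" marks on the plots presumably indicate the datagram sizes at which IP fragmentation splits one UDP datagram across several Ethernet frames. A minimal sketch of that count, assuming IPv4 without options (20-byte header) and an 8-byte UDP header:

```python
IP_HEADER = 20   # IPv4 header without options, bytes
UDP_HEADER = 8   # UDP header, bytes


def frames_per_datagram(udp_payload, mtu=1500):
    """Number of IP packets (Ethernet frames) needed for one UDP datagram."""
    ip_payload = udp_payload + UDP_HEADER
    last_capacity = mtu - IP_HEADER          # the last fragment may be any size
    if ip_payload <= last_capacity:
        return 1                             # fits in one frame, no fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes.
    inner_capacity = last_capacity // 8 * 8
    remaining = ip_payload - last_capacity
    return 1 + -(-remaining // inner_capacity)   # 1 + ceil(remaining / inner_capacity)


if __name__ == "__main__":
    for size in (1472, 1473, 2952, 2953):
        print(size, "B payload ->", frames_per_datagram(size), "frame(s)")
```

With a 1500 B MTU the thresholds fall just above 1472 B and 2952 B of UDP payload; with a 9000 B MTU the first threshold moves to just above 8972 B.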
UDP – Jumbo Frames
9000 B MTU.
Significant improvement with respect to the 1500 B MTU.
[Plot annotations: ~9.7 Gb/s; ~440 kHz; 2 frames, 3 frames, 4 frames; 2 PCI-E frames, 3 PCI-E frames. CPU-load curves: User, System, IRQ, Soft IRQ, Total; labels: 100% (bottleneck), fake softIRQ, softIRQ (4/5), IRQ (1/5), softIRQ (~50%), system (~50%)]
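For reference, the theoretical ceiling set by the link itself can be estimated from the per-frame overhead. The sketch below assumes standard Ethernet framing (8 B preamble and start-of-frame delimiter, 14 B header, 4 B FCS, 12 B inter-frame gap), IPv4 without options, and one unfragmented datagram per frame.

```python
WIRE_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, Ethernet header, FCS, inter-frame gap
IP_UDP_HEADERS = 20 + 8           # IPv4 header (no options) + UDP header


def max_frame_rate_hz(mtu, link_bps=10e9):
    """Maximum rate of full-size frames that the link can carry."""
    return link_bps / ((mtu + WIRE_OVERHEAD) * 8)


def max_udp_goodput_gbps(mtu, link_bps=10e9):
    """UDP payload throughput when every frame carries a full-size datagram."""
    return link_bps * (mtu - IP_UDP_HEADERS) / (mtu + WIRE_OVERHEAD) / 1e9


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {max_frame_rate_hz(mtu) / 1e3:.1f} kHz, "
              f"{max_udp_goodput_gbps(mtu):.3f} Gb/s")
```

Under these assumptions the ceiling is about 9.57 Gb/s for a 1500 B MTU and about 9.93 Gb/s for a 9000 B MTU, so the ~9.7 Gb/s quoted for jumbo frames is close to the link limit.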
UDP – Jumbo Frames, 2 Sender Processes
Doubled availability of CPU cycles on the sender PC.
10-GbE fully saturated.
Receiver (playing against 2 senders) not yet saturated.
[Plot annotations: ~10 Gb/s; ~600 kHz; 2 frames, 3 frames, 4 frames; ~3 KiB; ~5 KiB; no more CPU bottleneck. CPU-load curves: User, System, IRQ, Soft IRQ, Total; labels: % (bottleneck), fake softIRQ, softIRQ (4/5), IRQ (1/5), softIRQ (25-75%), system (75-90%)]
R&D Project
An R&D project (PRIN) has been funded by the Italian Education and Research Ministry (MIUR):
– TeraDAQ: prototype demonstrator of a high-performance data acquisition system based on a PC cluster and using ultra-high-speed networking standards. The project targets particle physics experiments at next-generation, very-high-luminosity accelerators.
– INFN Bologna, Bologna University and Roma Tor Vergata University.
– 51,700 €.
Electronics
Xilinx ML605 evaluation kit:
– Equipped with a latest-generation Xilinx Virtex-6 FPGA;
– FPGA Mezzanine Connector (FMC).
FMC XM104 connectivity board:
– 10-GbE CX4 connector.
[Diagram labels: PC; 10GBASE-CX4 (max 10 m), 10 Gb/s; CX4 10-GbE connector; Xilinx FMC XM104 mezzanine connectivity card; FMC; Xilinx Virtex-6 FPGA; ML605 evaluation board. Blocks in the FPGA: Software VHDL UDP/IP; Software core 10-GbE MAC; Software core XAUI SERDES]
Electronics (II)
FMC XM104 Connectivity Card:
– Designed to provide access to eight serial transceivers on the FMC HPC connector found on Xilinx FMC-supported boards, including the Virtex-6 ML605.
[Figure: ML605 board]
Software
XAUI SERDES and 10-GbE MAC:
– Available free of charge as evaluation software.
UDP/IP:
– Possible solutions are being evaluated.
Domenico Galli Dipartimento di Fisica, Alma Mater Studiorum - Università di Bologna and INFN, Sezione di Bologna
Test Platform
Motherboard: IBM X3650
Processor type: Intel Xeon E5335
Processors x cores x clock (GHz): 2 x 4 x 2.00
L2 cache (MiB): 8
L2 speed (GHz): 2.00
FSB speed (MHz): 1333
Chipset: Intel 5000P
RAM: 4 GiB
NIC: Myricom 10G-PCIE-8A-S
NIC DMA speed (Gbit/s) ro / wo / rw: 10.44 / / 19.07
Settings
net.core.rmem_max (B):
net.core.wmem_max (B):
net.ipv4.tcp_rmem (B): 4096 / /
net.ipv4.tcp_wmem (B): 4096 / /
net.core.netdev_max_backlog:
Interrupt Coalescence (μs): 25
PCI-E speed (Gbit/s): 2.5
PCI-E width: x8
Write Combining: enabled
Interrupt Type: MSI
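On a Linux host the kernel parameters listed above are exposed as files under `/proc/sys`, with the dots of the parameter name replaced by path separators. The sketch below reads their current values; it is only a way to inspect a machine, the helper names are hypothetical, and no particular values are implied.

```python
import posixpath

# Kernel parameters named on the slide.
SETTINGS = (
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
    "net.core.netdev_max_backlog",
)


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return posixpath.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the parameter's fields as a list of strings, or None if unavailable."""
    try:
        with open(sysctl_path(name, root)) as handle:
            return handle.read().split()
    except OSError:
        return None


if __name__ == "__main__":
    for name in SETTINGS:
        print(name, "=", read_sysctl(name))
```

The `tcp_rmem` and `tcp_wmem` entries hold three fields each (minimum, default and maximum buffer size), which matches the three-slot layout in the table.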