CMS DAQ Event Builder Based on Gigabit Ethernet


CMS DAQ Event Builder Based on Gigabit Ethernet
Presented by Marco Pieri, University of California San Diego
CHEP06, 13-17 February 2006, Mumbai, India

Outline
- Introduction
- CMS DAQ architecture
- Readout Builder and Gigabit Ethernet networking
- Results of the tests
- Summary

DAQ Requirements
[Detector drawing: CMS superconducting coil, iron yoke and tracker; total weight 12,500 t, overall diameter 15 m, overall length 21.6 m, magnetic field 4 T]
- Beam crossing rate: 40 MHz
- Interaction rate: 1 GHz
- Max Level-1 trigger rate: 100 kHz
- Event size: 1 MByte
- EVB inputs: ~700

CMS DAQ Structure
Only two physical trigger stages:
- 40 MHz → Level-1 trigger: custom design, implemented in ASICs and FPGAs
- 100 kHz, 1 Tbit/s → High Level Trigger (HLT): "offline" code running on a PC farm
- 100 Hz output
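As a quick sanity check of the quoted HLT input bandwidth (my own arithmetic, not from the slides): a 100 kHz Level-1 rate at 1 MByte per event gives roughly the stated 1 Tbit/s.

```python
# Back-of-the-envelope check of the Level-1 output bandwidth quoted above.
l1_rate_hz = 100e3           # max Level-1 trigger rate: 100 kHz
event_size_bytes = 1e6       # event size: 1 MByte
bandwidth_tbps = l1_rate_hz * event_size_bytes * 8 / 1e12  # Tbit/s
print(bandwidth_tbps)        # → 0.8, i.e. of order 1 Tbit/s as stated
```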

EVB Architecture
[Diagram: Data-to-Surface links feed the Readout Builders; each slice adds 12.5 kHz of event-building capacity]
- Events are built in two stages
- The required throughput is 200 MBytes/s per node

EVB Architecture (II)
Baseline DAQ configuration:
- 512 inputs
- 64 8x8 FED Builders
- 8 64x64 RU Builder slices, each with 1 EVM, 64 RUs, 64 BUs and 256 FUs
- Probably only 4 slices for the first few years

Event Builder Networking
Myrinet (http://www.myri.com):
- Data rate 2 Gbit/s per link, low latency
- Flow control and backpressure in hardware
- No load on the PCs; exploits the NIC CPU
- Relatively easy to implement custom drivers in the NIC
Gigabit Ethernet (GbE):
- Reliable, widely used transfer protocols available (TCP/IP)
- Standard technology, steadily improving
- CPUs are heavily loaded by TCP/IP
Technology choices:
- FED Builder: Myrinet, which also provides a rather cost-effective optical-fibre solution for the Data-to-Surface transfer
- Readout Builder: Myrinet or GbE
- Filter data network: GbE (impractical to have Myrinet on the FUs)
Note: 10 Gbit Ethernet is not yet considered because it is still too expensive

FED Builder
- Uses Myrinet technology
- Link speed 2 Gbit/s for data; networking efficiency ~50%, so two links are used
- 8x8 FED Builder throughput above 200 MByte/s at the nominal input fragment size of 2 kBytes
- Average output fragment size: 16 kBytes
- Also possible to merge up to 16 inputs into 16 outputs
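The two-link choice can be motivated with simple arithmetic (my own estimate, combining the link speed and the ~50% efficiency figure from the slide):

```python
link_speed_gbps = 2.0    # Myrinet link speed: 2 Gbit/s
efficiency = 0.5         # quoted networking efficiency: ~50%
links = 2                # two links per FED Builder

# Usable throughput in MByte/s (1 Gbit/s = 125 MByte/s).
throughput_mbs = links * link_speed_gbps * efficiency * 125
print(throughput_mbs)    # → 250.0, above the required 200 MByte/s
```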

Readout Builder Options
- Baseline: RU, BU and FU in different PCs; 64+64 PCs needed for one slice
- Folded: RU and BU in the same PC; only 64 PCs per slice; exploits the full-duplex features of interfaces and switches; throughput per PC doubles (200 MB/s Myrinet input, 200+400 MB/s GbE in+out)
- Trapezoidal: BU and FU in the same PC

Readout Builder Based on Myrinet
- In a 64 RU x 64 BU (or 32x32) configuration, head-of-line blocking further reduces the throughput without traffic shaping; two links are not enough for the Readout Builder
- With older hardware we reached 94% link utilization using traffic shaping (barrel-shifter) in the NIC driver:
  - Sources divide messages into fixed-size packets and cycle through all destinations
  - Sources send to destinations in such a way that at any moment the packets take different paths inside the switch
- Not yet implemented for the final two-port Myrinet NICs

Gigabit Ethernet Readout Builder
- Uses standard hardware, firmware and software
- Easy to upgrade to 10 Gbit Ethernet when it becomes affordable
- Uses TCP/IP:
  - No development and maintenance of drivers
  - Reliable protocol: no worry about the packet loss that may occur when operating links at wire speed
- We must be able to use the GbE links with almost 100% efficiency, which is not a trivial task
- We addressed and solved all problems related to TCP/IP usage in large configurations
- We implemented a software component, integrated in our software framework, for optimal use of TCP/IP

ATCP - Asynchronous TCP
- ATCP is fully integrated in XDAQ, the CMS DAQ software framework
- XDAQ, a C++ platform for distributed data acquisition systems, implements:
  - configuration (parametrization) and control
  - communication over multiple network technologies concurrently
  - high-level system services (memory management, tasks, ...)
- The ATCP peer transport provides non-blocking TCP/IP communication:
  - All outgoing messages are placed in separate queues and processed asynchronously in another thread for sending and receiving
  - It writes to the current socket until it would block, then moves to another open socket and continues
- No blocking observed for any tested configuration (~100% link usage)
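The "send until it would block, then move on" idea can be sketched with non-blocking sockets. This is a hypothetical illustration of the principle, not the real XDAQ/ATCP peer transport (which is C++ inside XDAQ):

```python
import selectors
import socket
from collections import deque

class AsyncSender:
    """Toy sketch of the ATCP idea: per-destination send queues, and a
    pump that drains each writable socket until it would block, then
    moves on to the next socket instead of stalling."""

    def __init__(self):
        self.queues = {}                      # socket -> deque of pending bytes
        self.sel = selectors.DefaultSelector()

    def add_destination(self, sock):
        sock.setblocking(False)               # never let one peer stall the rest
        self.queues[sock] = deque()
        self.sel.register(sock, selectors.EVENT_WRITE)

    def send(self, sock, data):
        self.queues[sock].append(data)        # queue now, transmit in pump()

    def pump(self):
        # One pass over all currently writable sockets.
        for key, _ in self.sel.select(timeout=0):
            q = self.queues[key.fileobj]
            while q:
                try:
                    sent = key.fileobj.send(q[0])
                except BlockingIOError:
                    break                     # socket full: move to the next one
                if sent < len(q[0]):
                    q[0] = q[0][sent:]        # keep the unsent tail queued
                else:
                    q.popleft()
```

A usage sketch with a local socket pair: `add_destination()` each peer socket, `send()` to queue messages, and call `pump()` from the I/O thread's loop.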

Design Choices for the Use of TCP/IP
- Jumbo frames:
  - Increasing the Maximum Transmission Unit (MTU) to ~7000 bytes yields a performance increase of ~50%
  - Requires switches that support jumbo frames
- Multiple links (multi-rail):
  - At least two GbE links are needed to achieve the design performance
  - Different physical networks are used depending on the source and destination hosts
[Diagram: RU1...RU64 connected to BU1...BU64 over separate physical networks]
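The multi-rail idea amounts to a static mapping from a (source, destination) pair to one of the physical networks. The scheme below is a hypothetical illustration of such a mapping, not the actual CMS rail assignment:

```python
def rail(ru_index, bu_index, n_rails=2):
    """Pick a physical network (rail) for a RU->BU flow.
    Hypothetical scheme: (src + dst) mod n_rails balances each
    host's flows evenly across its NICs."""
    return (ru_index + bu_index) % n_rails

# With 64 BUs, every RU puts exactly half of its flows on rail 0.
loads = [sum(1 for bu in range(64) if rail(ru, bu) == 0) for ru in range(64)]
print(loads[0], loads[1])   # → 32 32
```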

Design Choices for the Use of TCP/IP (II)
- Data and control messages must be treated differently:
  - Control messages that steer the RU Builder operation must be delivered with low latency and need little throughput
  - With the TCP Nagle algorithm on, latency is sometimes large, O(10^-2) s, when few messages are in the pipeline
  - With the Nagle algorithm off, throughput decreases over time when sending data
- Optimal performance is obtained by using different sockets for data and control messages, setting:
  - Nagle on for data
  - Nagle off for control messages
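Toggling Nagle per socket maps to the standard `TCP_NODELAY` option; a minimal sketch (the socket roles are illustrative, not the DAQ code):

```python
import socket

def make_socket(low_latency):
    """Create a TCP socket with Nagle configured as on the slide:
    Nagle ON  (TCP_NODELAY = 0) for bulk data sockets,
    Nagle OFF (TCP_NODELAY = 1) for low-latency control sockets."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY,
                 1 if low_latency else 0)
    return s

data_sock = make_socket(low_latency=False)   # event data: Nagle on
ctrl_sock = make_socket(low_latency=True)    # control messages: Nagle off
```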

Pre-series System
- Our current test bed is installed at the LHC interaction point P5
- Roughly equivalent to 1/16 of the full DAQ EVB system, with all its functionality
- Readout Builder PCs: 64 dual-Xeon 2.6 GHz RU-BU PCs with Myrinet + GbE interfaces
- 16 dual-Xeon 2.6 GHz filter nodes
- OS: Linux 2.4
- Interconnected via GbE through a FORCE10 E1200 switch
- The same switch can also be used in the first RU Builder slices

Baseline Readout Builder Configuration
- Uses the FED Builder Myrinet input
- BUs connected to FUs or not
- BUs with FUs reach a throughput of 165 MByte/s at the nominal fragment size of 16 kBytes
[Plot: throughput vs fragment size, 30 RU x 30 (BU + 0 FU)]

Baseline Readout Builder Configuration (II)
- Configuration: 30 RU x 28 BU x 16 FU (4 of the BUs have 4 FUs each)
- No limitations from the network, only from PC resources
[Plot: throughput vs fragment size, nominal 16 kBytes]

Folded RU Builder Configuration
- No Myrinet input to the RUs, no GbE output to the FUs
- Verified scaling and full-duplex performance for the RU-BU communication up to 60 RU x 60 BU
- Full duplex does not introduce any penalty (16 kByte fragments)

Trapezoidal RU Builder
- In the baseline configurations the Builder Units are the most heavily loaded part (TCP sending and receiving)
- Events are built inside the FU PCs (BU-FU)
- The number of connections from a RU increases from 64 to 256
- The lower throughput on the switch output ports reduces the stress on the switch, making it possible to use oversubscribed switches
- The CPU load on the FUs due to event building is 5-10%

Trapezoidal Readout Builder Test
- Tested almost 1/4 of the final RU Builder: a 16 RU x 60 BU-FU system
- Throughput ~240 MByte/s at the nominal fragment size of 16 kBytes
- To approach the full-scale test in terms of number of connections, we also ran multiple BU-FU applications on each BU-FU PC; similar results up to a 16 RU x 240 virtual-BU system

GbE RU Builder Performance
- The baseline Readout Builder (nRUs = nBUs) currently reaches ~165 MByte/s for a 16 kByte fragment size
- The trapezoidal Readout Builder is the most promising configuration (240 MByte/s); it might allow reducing the number of RUs
- ~30% gain in throughput with current 32-bit CPUs; further performance improvement is expected from 64-bit and dual-core CPUs
- We will try to carry out full-scale RU Builder tests for one slice before finalizing the system

Summary
CMS Event Builder:
- The Myrinet FED Builder is at the required level of performance
- The GbE trapezoidal RU Builder satisfies the requirements in all tested configurations
- The Myrinet RU Builder was already shown to be feasible
Now finalizing the architecture of the Readout Builders:
- Test the largest possible Event Builder system
- Test new dual-socket, dual-core CPU based PCs
- Order the RU Builder hardware, then install and commission the system to be ready for data taking in summer 2007

Backup Slides

FED Builder Performance
[Plot: FED Builder performance]

Myrinet Barrel-Shifter
- Each source has a message queue per destination
- Sources divide messages into fixed-size packets (carriers) and cycle through all destinations
- Messages can span more than one packet, and a packet can contain data from more than one message
- No external synchronization (relies on Myrinet backpressure via hardware flow control)
- The principle works for multi-stage switches
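The cycling rule above can be illustrated with a toy schedule (an illustration of the principle, not the NIC driver code): in time slot t, source s sends to destination (s + t) mod N, so no two sources ever target the same output port in the same slot, and each source visits every destination once per cycle.

```python
def barrel_shifter_schedule(n):
    """Return schedule[t][s] = destination of source s in time slot t
    for an n x n barrel shifter."""
    return [[(s + t) % n for s in range(n)] for t in range(n)]

sched = barrel_shifter_schedule(4)
# Within each slot all destinations are distinct -> no output contention.
print(sched[1])   # → [1, 2, 3, 0]
```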

Myrinet Readout Builder Traffic Shaping
- It is possible to use the links with an efficiency approaching 100% with fully synchronized barrel-shifter traffic shaping in the NIC driver
- Working point: Myrinet 32x32 Readout Builder with barrel shifter
- Throughput: 234 MB/s = 94% of the link bandwidth

Pre-Series System
[Diagram: 32 GIII FED emulators, up to 32 detector FEDs, 64 FRLs in 4 FRL crates, 8 8x8 FED Builders, 64x64 RU Builders, 16 FUs, filter subfarm; 200 m fibre data links to the surface]