Remigius K Mommsen Fermilab CMS Run 2 Event Building.

Slides:



Advertisements
Similar presentations
Computing Infrastructure
Advertisements

CSC 360- Instructor: K. Wu Overview of Operating Systems.
R2: An application-level kernel for record and replay Z. Guo, X. Wang, J. Tang, X. Liu, Z. Xu, M. Wu, M. F. Kaashoek, Z. Zhang, (MSR Asia, Tsinghua, MIT),
Copyright© 2000 OPNET Technologies, Inc. R.W. Dobinson, S. Haas, K. Korcyl, M.J. LeVine, J. Lokier, B. Martin, C. Meirosu, F. Saka, K. Vella Testing and.
Remigius K Mommsen Fermilab A New Event Builder for CMS Run II A New Event Builder for CMS Run II on behalf of the CMS DAQ group.
CS533 Concepts of Operating Systems Class 2 The Duality of Threads and Events.
ECE 526 – Network Processing Systems Design
LHCb readout infrastructure NA62 TDAQ WG Meeting April 1 st, 2009 Niko Neufeld, PH/LBC.
Xen and the Art of Virtualization. Introduction  Challenges to build virtual machines Performance isolation  Scheduling priority  Memory demand  Network.
© 2013 Mellanox Technologies 1 NoSQL DB Benchmarking with high performance Networking solutions WBDB, Xian, July 2013.
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical.
ECE 526 – Network Processing Systems Design Network Processor Architecture and Scalability Chapter 13,14: D. E. Comer.
LNL CMS G. MaronCPT Week CERN, 23 April Legnaro Event Builder Prototypes Luciano Berti, Gaetano Maron Luciano Berti, Gaetano Maron INFN – Laboratori.
Profiling Grid Data Transfer Protocols and Servers George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA.
Artdaq Introduction artdaq is a toolkit for creating the event building and filtering portions of a DAQ. A set of ready-to-use components along with hooks.
A TCP/IP transport layer for the DAQ of the CMS Experiment Miklos Kozlovszky for the CMS TriDAS collaboration CERN European Organization for Nuclear Research.
Boosting Event Building Performance Using Infiniband FDR for CMS Upgrade Andrew Forrest – CERN (PH/CMD) Technology and Instrumentation in Particle Physics.
MIDeA :A Multi-Parallel Instrusion Detection Architecture Author: Giorgos Vasiliadis, Michalis Polychronakis,Sotiris Ioannidis Publisher: CCS’11, October.
The NE010 iWARP Adapter Gary Montry Senior Scientist
GBT Interface Card for a Linux Computer Carson Teale 1.
The Red Storm High Performance Computer March 19, 2008 Sue Kelly Sandia National Laboratories Abstract: Sandia National.
LECC2003 AmsterdamMatthias Müller A RobIn Prototype for a PCI-Bus based Atlas Readout-System B. Gorini, M. Joos, J. Petersen (CERN, Geneva) A. Kugel, R.
10GE network tests with UDP
A Measurement Based Memory Performance Evaluation of High Throughput Servers Garba Isa Yau Department of Computer Engineering King Fahd University of Petroleum.
Srihari Makineni & Ravi Iyer Communications Technology Lab
VLVNT Amsterdam 2003 – J. Panman1 DAQ: comparison with an LHC experiment J. Panman CERN VLVNT workshop 7 Oct 2003 Use as example CMS (slides taken from.
Network Architecture for the LHCb DAQ Upgrade Guoming Liu CERN, Switzerland Upgrade DAQ Miniworkshop May 27, 2013.
Dynamic configuration of the CMS Data Acquisition Cluster Hannes Sakulin, CERN/PH On behalf of the CMS DAQ group Part 1: Configuring the CMS DAQ Cluster.
Upgrade of the CMS Event Builder Andrea Petrucci - CERN (PH/CMD) on behalf of the CMS DAQ group 19 th International Conference on Computing in High Energy.
The new CMS DAQ system for LHC operation after 2014 (DAQ2) CHEP2013: Computing in High Energy Physics Oct 2013 Amsterdam Andre Holzner, University.
1 Network Performance Optimisation and Load Balancing Wulf Thannhaeuser.
DAQ2 Shift TutorialcDAQ group1 Monitoring of the DAQ2 system Remi Mommsen, FNAL.
Overview of DAQ at CERN experiments E.Radicioni, INFN MICE Daq and Controls Workshop.
Sep. 17, 2002BESIII Review Meeting BESIII DAQ System BESIII Review Meeting IHEP · Beijing · China Sep , 2002.
Authors: Danhua Guo 、 Guangdeng Liao 、 Laxmi N. Bhuyan 、 Bin Liu 、 Jianxun Jason Ding Conf. : The 4th ACM/IEEE Symposium on Architectures for Networking.
LNL 1 SADIRC2000 Resoconto 2000 e Richieste LNL per il 2001 L. Berti 30% M. Biasotto 100% M. Gulmini 50% G. Maron 50% N. Toniolo 30% Le percentuali sono.
The CMS Event Builder Demonstrator based on MyrinetFrans Meijers. CHEP 2000, Padova Italy, Feb The CMS Event Builder Demonstrator based on Myrinet.
Computing Issues for the ATLAS SWT2. What is SWT2? SWT2 is the U.S. ATLAS Southwestern Tier 2 Consortium UTA is lead institution, along with University.
1 Farm Issues L1&HLT Implementation Review Niko Neufeld, CERN-EP Tuesday, April 29 th.
Niko Neufeld HL-LHC Trigger, Online and Offline Computing Working Group Topical Workshop Sep 5 th 2014.
Management of the LHCb DAQ Network Guoming Liu *†, Niko Neufeld * * CERN, Switzerland † University of Ferrara, Italy.
FroNtier Stress Tests at Tier-0 Status report Luis Ramos LCG3D Workshop – September 13, 2006.
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
DAQ Overview + selected Topics Beat Jost Cern EP.
Major OS Components CS 416: Operating Systems Design, Spring 2001 Department of Computer Science Rutgers University
L1/HLT trigger farm Bologna setup 0 By Gianluca Peco INFN Bologna Genève,
CRISP WP18, High-speed data recording Krzysztof Wrona, European XFEL PSI, 18 March 2013.
The Evaluation Tool for the LHCb Event Builder Network Upgrade Guoming Liu, Niko Neufeld CERN, Switzerland 18 th Real-Time Conference June 13, 2012.
1© Copyright 2015 EMC Corporation. All rights reserved. NUMA(YEY) BY JACOB KUGLER.
Introduction to Data Analysis with R on HPC Texas Advanced Computing Center Feb
EPS 2007 Alexander Oh, CERN 1 The DAQ and Run Control of CMS EPS 2007, Manchester Alexander Oh, CERN, PH-CMD On behalf of the CMS-CMD Group.
The ALICE Data-Acquisition Read-out Receiver Card C. Soós et al. (for the ALICE collaboration) LECC September 2004, Boston.
Eric Hazen1 Ethernet Readout With: E. Kearns, J. Raaf, S.X. Wu, others... Eric Hazen Boston University.
HTCC coffee march /03/2017 Sébastien VALAT – CERN.
Balazs Voneki CERN/EP/LHCb Online group
MPD Data Acquisition System: Architecture and Solutions
M. Bellato INFN Padova and U. Marconi INFN Bologna
LHCb and InfiniBand on FPGA
Presented by Yoon-Soo Lee
Electronics Trigger and DAQ CERN meeting summary.
RT2003, Montreal Niko Neufeld, CERN-EP & Univ. de Lausanne
CMS DAQ Event Builder Based on Gigabit Ethernet
What is an Operating System?
PCI BASED READ-OUT RECEIVER CARD IN THE ALICE DAQ SYSTEM
Example of DAQ Trigger issues for the SoLID experiment
Event Building With Smart NICs
High-Performance Storage System for the LHCb Experiment
LHCb Trigger, Online and related Electronics
Network Processors for a 1 MHz Trigger-DAQ System
Cluster Computers.
Presentation transcript:

Remigius K Mommsen Fermilab CMS Run 2 Event Building

Remi Mommsen – CMS Run 2 Event Builder Overview Overview of CMS event builder for LHC run II Event-building protocol Measurements Overview of CMS event builder for LHC run II Event-building protocol Measurements 2 2

Remi Mommsen – CMS Run 2 Event Builder CMS Event Builder Detector front-end (custom electronics) Front-End Readout Optical Link (FEROL) Optical 10 GbE TCP/IP Data Concentrator switches Data to Surface Aggregate into 40 GbE links Up to 108 Readout Units (RUs) Combine FEROL fragments into super-fragment Event Builder switch Infiniband FDR 56 Gbps CLOS network 72 Builder Units (BUs) Event building Temporary recording to RAM disk Filter Units (FUs) Run HLT selection using files from RAM disk Detector front-end (custom electronics) Front-End Readout Optical Link (FEROL) Optical 10 GbE TCP/IP Data Concentrator switches Data to Surface Aggregate into 40 GbE links Up to 108 Readout Units (RUs) Combine FEROL fragments into super-fragment Event Builder switch Infiniband FDR 56 Gbps CLOS network 72 Builder Units (BUs) Event building Temporary recording to RAM disk Filter Units (FUs) Run HLT selection using files from RAM disk x 10 GbE 200 m 108 x 40 GbE 108 x 72 IB 56 Gbps

Remi Mommsen – CMS Run 2 Event Builder EvB Protocol 4 4 EVM RU 1RU 2 BU 1BU 2 FEROLs Fragment Assign event to BU1 Event Request Event Request Event Request Fragment Super- fragment Event Fragment

Remi Mommsen – CMS Run 2 Event Builder Achieving Performance Avoid high rate of small messages Request multiple events at the same time Pack data of multiple events into one message Avoid copying data Operate on pointers to data in receiving buffers Copy data directly into RDMA buffers of Infiniband NICs Stay in kernel space when writing data Parallelize the work Use multiple threads for data transmission and event handling Write events concurrently into multiple files Bind everything to CPU cores and memory (NUMA) Each thread bound to a core Memory structures allocated on pre-defined CPU Interrupts from NICs restricted to certain cores Tune Linux TCP stack for maximum performance Avoid high rate of small messages Request multiple events at the same time Pack data of multiple events into one message Avoid copying data Operate on pointers to data in receiving buffers Copy data directly into RDMA buffers of Infiniband NICs Stay in kernel space when writing data Parallelize the work Use multiple threads for data transmission and event handling Write events concurrently into multiple files Bind everything to CPU cores and memory (NUMA) Each thread bound to a core Memory structures allocated on pre-defined CPU Interrupts from NICs restricted to certain cores Tune Linux TCP stack for maximum performance 5 5

Remi Mommsen – CMS Run 2 Event Builder Computers Readout Unit (RU) Dell PowerEdge R620 Dual 8 core Xeon CPU E GHz 32 GB of memory Builder Unit (BU) Dell PowerEdge R720 Dual 8 core Xeon CPU E GHz GB of memory (240 GB for Ramdisk on CPU 1) Readout Unit (RU) Dell PowerEdge R620 Dual 8 core Xeon CPU E GHz 32 GB of memory Builder Unit (BU) Dell PowerEdge R720 Dual 8 core Xeon CPU E GHz GB of memory (240 GB for Ramdisk on CPU 1) 6 6

Remi Mommsen – CMS Run 2 Event Builder Data Network 40/56Gb NICs (Infiniband or Ethernet) Mellanox Technologies MT27500 Family [ConnectX-3] 10/40 GbE switches Mellanox SX1024 & SX1036 Infiniband switches Mellanox SX /56Gb NICs (Infiniband or Ethernet) Mellanox Technologies MT27500 Family [ConnectX-3] 10/40 GbE switches Mellanox SX1024 & SX1036 Infiniband switches Mellanox SX Infiniband CLOS network

Remi Mommsen – CMS Run 2 Event Builder Measurement Technique Semi-automatic scanning using python scripts (w/o run control) Generate fake data on the FEROL in free-running mode EVM gets always 1024 Bytes fragments & is not counted as RU Full event building, but events not written to disk unless noted 20 measurements every 5s done at each point after waiting >60s Measurement schemes Data-concentrator measurement Vary number of FEDs (TCP streams) sent to 1 RU Canonical setup uses 8 FEDs (TCP streams) per RU All streams use the same fragment size Scan different fragment sizes Measure various sizes of the event-building system Real FED builder setup with different number of FEDs 1-18 FEDs per RU Set fragment sizes to roughly expected size for each FED Scale fragment sizes linearly for different event sizes Semi-automatic scanning using python scripts (w/o run control) Generate fake data on the FEROL in free-running mode EVM gets always 1024 Bytes fragments & is not counted as RU Full event building, but events not written to disk unless noted 20 measurements every 5s done at each point after waiting >60s Measurement schemes Data-concentrator measurement Vary number of FEDs (TCP streams) sent to 1 RU Canonical setup uses 8 FEDs (TCP streams) per RU All streams use the same fragment size Scan different fragment sizes Measure various sizes of the event-building system Real FED builder setup with different number of FEDs 1-18 FEDs per RU Set fragment sizes to roughly expected size for each FED Scale fragment sizes linearly for different event sizes 8 8

Remi Mommsen – CMS Run 2 Event Builder Data Concentrator – 24 Streams (1 stream / FEROL) BU 1 1 kB 1 RU 2 BUs EVM RU BU 2 256B - 16kB 1 kB1 – 256 kB

Remi Mommsen – CMS Run 2 Event Builder Data Concentrator 10

Remi Mommsen – CMS Run 2 Event Builder 1 kB kB 1 kB 256B - 16kB Scalability of the Event Builder 11 8 Streams per RU (1 or 2 streams / FEROL) 1 – 101 RUs 1 – 72 BUs EVM 3 kB – 8 MB

Remi Mommsen – CMS Run 2 Event Builder Scalability of the Event Builder 12

Remi Mommsen – CMS Run 2 Event Builder Effect of NUMA Settings ‘/usr/bin/numactl —physcpubind=10,12,14,26,28,30 —membind=1’ used to start executives Threads and memory structures are bound to cores/memory using XDAQ policies ‘/usr/bin/numactl —physcpubind=10,12,14,26,28,30 —membind=1’ used to start executives Threads and memory structures are bound to cores/memory using XDAQ policies 13

Remi Mommsen – CMS Run 2 Event Builder Custom IB Routing Optimize the routing to prefer traffic from RU to BU See next talk from Andre about IB routing Optimize the routing to prefer traffic from RU to BU See next talk from Andre about IB routing 14

Remi Mommsen – CMS Run 2 Event Builder 1 kB kB 1 kB 256B - 16kB BU Scalability 15 1 – 101 RUs 3 BUs EVM 8 Streams per RU (1 or 2 streams / FEROL)

Remi Mommsen – CMS Run 2 Event Builder BU Scaling vs Number of RUs 16

Remi Mommsen – CMS Run 2 Event Builder BU Performance 17

Remi Mommsen – CMS Run 2 Event Builder 1 kB kB 1 kB 4B – 12kB Production FED Builders or 82 RUs 72 BUs EVM 100kB – 5MB 2-18 Streams per RU (1 or 2 streams / FEROL)

Remi Mommsen – CMS Run 2 Event Builder Standard pp FED Builder 19

Remi Mommsen – CMS Run 2 Event Builder Prod. FED Builder vs Canonical 20

Remi Mommsen – CMS Run 2 Event Builder Summary CMS has a complete new event-builing system for LHC run 2 State-of-the-art technology Order of magnitude smaller than run-1 DAQ system Optimal use of high-end hardware New event-building protocol New software to exploit hardware capabilities A lot of fine-tuning to get full performance Infiniband performance sensible to traffic pattern Requires custom routing Performance diminishes the more uneven the system becomes CMS has a complete new event-builing system for LHC run 2 State-of-the-art technology Order of magnitude smaller than run-1 DAQ system Optimal use of high-end hardware New event-building protocol New software to exploit hardware capabilities A lot of fine-tuning to get full performance Infiniband performance sensible to traffic pattern Requires custom routing Performance diminishes the more uneven the system becomes 21

Remi Mommsen – CMS Run 2 Event Builder

1 kB 256B - 16kB RU Scalability 23 8 FEROLs per RU 1 – 72 BUs EVM 1 RU RU 1 – 256 kB

Remi Mommsen – CMS Run 2 Event Builder RU Scaling vs Number of BUs 24

Remi Mommsen – CMS Run 2 Event Builder Checksums RUs and/or BUs can verify the CRC16 calculate by the FED on the payload BUs calculate a CRC32c on the complete event which is rechecked by the HLT RUs and/or BUs can verify the CRC16 calculate by the FED on the payload BUs calculate a CRC32c on the complete event which is rechecked by the HLT 25

Remi Mommsen – CMS Run 2 Event Builder Number of Builder Threads 26