HTCC coffee, March 2017 – 13/03/2017 – Sébastien Valat, CERN

Reminder on the LHC
- An accelerator of 27 km, with 10 000 superconducting magnets
- Collision energy up to 14 TeV
- Proton-proton collisions, but also heavy ions
- 4 big experiments: ALICE, ATLAS, CMS, LHCb

LHCb, an upgrade for 2018-2020
- Update of the sub-detectors
- Removal of the hardware (L0) trigger
  - currently in FPGAs
  - hard to maintain and update
  - close to the radiation area
- The filter farm will need to handle:
  - a larger event rate (1 MHz to 40 MHz)
  - a larger event size (50 kB to ~100 kB)
- Much more data for DAQ & trigger
[Diagram: L0 trigger -> DAQ -> High Level Trigger filter units -> CERN long-term storage -> offline physics analysis; rates: 40 MHz, 1 MHz, 20 kHz]

Why triggering?
- We cannot store all the collisions: far too much data!
- Most collisions produce already well-known physics
- We keep only the interesting events for new physics
- Challenge for the upgrade: we need to trigger in software only
  - need to improve the performance of the current software
  - look at GPUs for some costly functions
  - look at possible CPU-embedded FPGAs for some costly functions

40 Tb/s for LHCb in 2020
Numbers:
- ~10 000 optical links going out from the detector to the surface (~100 m), at up to ~4.8 Gb/s each (48 Tb/s at full fiber usage)
- ~500 readout nodes (up to 24 input links each)
- An event rate of 40 MHz with ~100 kB events leads to a total of ~40 Tb/s
- Need a 100 Gb/s network per node, considering a network load of 80%
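
A quick back-of-the-envelope check of these numbers (a sketch; the 80% load factor and the ~500-node split are taken from the slide, the rest is plain arithmetic):

    #include <cstdio>

    int main() {
        // Fiber capacity: ~10 000 links at up to 4.8 Gb/s each.
        std::printf("full fiber usage: %.0f Tb/s\n", 10000 * 4.8 / 1000); // 48 Tb/s

        // Event-building payload: 40 MHz of ~100 kB events.
        double payload_tbps = 40e6 * 100e3 * 8 / 1e12;                    // 32 Tb/s
        std::printf("raw event data: %.0f Tb/s (~40 Tb/s with overheads)\n",
                    payload_tbps);

        // Per-node share of the slide's ~40 Tb/s over ~500 readout nodes,
        // assuming links can only be loaded to 80% of their nominal rate.
        double per_node_gbps = 40e3 / 500;                                // 80 Gb/s
        std::printf("per node: %.0f Gb/s -> %.0f Gb/s links needed\n",
                    per_node_gbps, per_node_gbps / 0.8);                  // 100 Gb/s
    }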

Event building network evaluation

Event building network topology
Works with four kinds of units:
- Readout units (RU): read the data from the PCIe readout boards
- Builder units (BU): merge the data from the readout units and send them to the filter units
- An event manager (EM): dispatches the work over the builder units (credits)
- Filter units (FU): receive the built events, downstream of the event builder
The RU/BU traffic is mostly a kind of "all-to-all", to aggregate the data chunks from each collision.
[Diagram: EM plus RU-1/BU-1 ... RU-500/BU-500, one RU and one BU per node]
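
To make the aggregation pattern concrete, here is a minimal MPI sketch (my own illustration, not the DAQPIPE code): every rank plays both RU and BU, and a single MPI_Alltoall delivers to each BU one fragment from every RU, which is exactly the exchange benchmarked on the MPI_Alltoall slide further down.

    #include <mpi.h>
    #include <vector>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        // Each rank acts as one RU and one BU. Per round, RU `rank` holds
        // one fragment for each of the nranks events being built (event j
        // is assembled on BU j). The fragment size is a placeholder, not
        // the ~1 MB data chunks quoted later in the talk.
        const int frag_ints = 1024;
        std::vector<int> send(nranks * frag_ints, rank); // one chunk per BU
        std::vector<int> recv(nranks * frag_ints);       // one chunk per RU

        // The all-to-all exchange: afterwards, BU `rank` holds the nranks
        // fragments of its event, one from each RU, concatenated.
        MPI_Alltoall(send.data(), frag_ints, MPI_INT,
                     recv.data(), frag_ints, MPI_INT, MPI_COMM_WORLD);

        if (rank == 0)
            std::printf("BU 0 built an event from %d fragments\n", nranks);
        MPI_Finalize();
    }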

IO nodes hardware
- Three IO boards at 100 Gb/s per node:
  - PCIe40 for the fiber input
  - event building (DAQ) network
  - output to the filter farm
- This also stresses the memory: each stream crosses RAM on its way in and out, i.e. ~400 Gb/s of total memory traffic
[Diagram: PCIe40 -> buffer -> CPUs -> buffers -> DAQ network and filter network]

Memory bandwidth [plot]

The DAQ network technologies
- We need 100 Gb/s per node (RU/BU)
  - with some margin; 80 Gb/s might be fine
- Unlike HPC apps, we need to keep the network fully filled, continuously
  - bad traffic pattern: many gathers (all-to-all)!
- Thinking of a fat-tree topology
- Technologies we looked at:
  - InfiniBand EDR
  - Intel Omni-Path
  - 100 Gb/s Ethernet

Interfaces to exploit them
- MPI
  - supports all networks, optimized for IB/Omni-Path
  - eases tests on HPC sites
  - but how to support fault tolerance? We need to run 24/7
- InfiniBand verbs
  - IB only, low level
  - might be OK for fault tolerance
- Libfabric
  - for IB, Omni-Path, and TCP
  - node failure and recovery support still to be studied
- TCP/UDP (we don't depend on latency)

DAQPIPE
- A benchmark to evaluate event-building solutions
- Provides the EM/RU/BU units
- Supports various APIs: MPI, TCP/UDP, libfabric, PSM2, verbs
- Three message sizes on the network:
  - commands: ~64 B
  - meta-data: ~10 kB
  - data: ~1 MB
- Manages several communication scheduling models (see the sketch below):
  - barrel-shift ordering (with N in flight)
  - random ordering (with N in flight)
  - send everything in one go
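
A minimal sketch of the two throttled scheduling orders (function names and structure are mine, not DAQPIPE's): each unit walks its peer list either in barrel-shift order or in a random permutation, keeping at most N requests from the front of this list in flight at any time.

    #include <algorithm>
    #include <numeric>
    #include <random>
    #include <vector>

    // Order in which a given BU requests the fragments of its event from
    // the nranks RUs. Keeping only the first N entries "in flight" at a
    // time is the throttling mentioned above.
    std::vector<int> barrelShiftOrder(int rank, int nranks) {
        std::vector<int> order(nranks);
        // Step s: BU `rank` talks to RU (rank + s) % nranks, so at any
        // given step all pairs are distinct (a rotating one-to-one pattern).
        for (int s = 0; s < nranks; ++s)
            order[s] = (rank + s) % nranks;
        return order;
    }

    std::vector<int> randomOrder(int rank, int nranks, unsigned seed) {
        std::vector<int> order(nranks);
        std::iota(order.begin(), order.end(), 0);
        // Each BU shuffles independently; conflicts remain possible but
        // are statistically spread over the fabric.
        std::mt19937 gen(seed + rank);
        std::shuffle(order.begin(), order.end(), gen);
        return order;
    }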

Results

Micro-benchmarks on Intel Omni-Path [plots]

Micro-benchmarks on IB [plots]

Performance of one-to-one
- By default, one-to-one sends to the neighbor node
- One variant is to shift; each shift represents a step in DAQPIPE (barrel shifting)
- Does it provide the same performance? (A sketch of the pattern follows below.)
[Plots: shift = 1 and shift = 3]
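
A sketch of this shifted one-to-one micro-benchmark (my own minimal version, assuming the full-duplex pairwise exchange shown in the plots):

    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        const int shift = 3;                               // the "shift" of the slide
        const int to   = (rank + shift) % nranks;          // send target
        const int from = (rank - shift + nranks) % nranks; // receive source

        // ~1 MB buffers, matching the DAQPIPE "data" message size.
        std::vector<char> out(1 << 20), in(1 << 20);

        // Full-duplex: every node sends and receives simultaneously,
        // like one step of the barrel-shift schedule.
        MPI_Sendrecv(out.data(), (int)out.size(), MPI_CHAR, to,   0,
                     in.data(),  (int)in.size(),  MPI_CHAR, from, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Finalize();
    }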

One-to-one with shifts
[Plots: full-duplex, 16 nodes; full-duplex, 46 nodes]

MPI_Alltoall [plot]

DAQPIPE & LSEB [plots]

DAQPIPE timings on Omni-Path [plot]

Routing & cabling

Real & ideal topologies
[Diagrams: real vs ideal topology; nodes 01-10 and 12-18 populated, leaving holes in the leaf switches]

Conflicts
- The real topology implies many conflicts, due to:
  - the routing tables
  - missing nodes in the leaf switches
- 64 nodes imply 4096 (64 × 64) communications
- 870 conflicts => 21%, 704 conflicts => 17%, 160 conflicts => 3%
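
To make the counting concrete, here is a toy model (the assumptions are entirely mine for illustration: a two-level fat-tree, static destination-based routing with uplink = destination port % uplinks, and one missing node to model a hole; this is not the measured fabric) that counts how many barrel-shift flows have to share an uplink in the same step:

    #include <cstdio>
    #include <map>
    #include <utility>
    #include <vector>

    int main() {
        // Toy two-level fat-tree: leaves of `perLeaf` node ports with
        // `uplinks` uplinks each, and a static destination-based route.
        const int perLeaf = 16, uplinks = 16, nodes = 64;

        // Physical port of each rank; skipping port 11 models a "hole"
        // (a missing node) in the first leaf switch.
        std::vector<int> port;
        for (int p = 0; (int)port.size() < nodes; ++p)
            if (p != 11) port.push_back(p);

        int conflicts = 0;
        for (int step = 1; step < nodes; ++step) {         // barrel-shift steps
            std::map<std::pair<int, int>, int> load;       // (leaf, uplink) -> flows
            for (int r = 0; r < nodes; ++r) {
                int s = port[r], d = port[(r + step) % nodes];
                if (s / perLeaf == d / perLeaf) continue;  // intra-leaf: no uplink
                ++load[{s / perLeaf, d % uplinks}];
            }
            for (const auto& kv : load)
                if (kv.second > 1) conflicts += kv.second - 1; // extra flows serialize
        }
        std::printf("conflicting flows: %d of %d\n",
                    conflicts, nodes * (nodes - 1));
    }

With a fully populated leaf and clean routing this model reports zero conflicts; the single hole is enough to misalign the residues and create sharing, which is the effect described on this slide.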

Collisions over barrel steps (on 64 nodes) [plot]

Scaling the ideal topology to 512 nodes: 2157 conflicts (0.8%) [plot]

Trying to generate more steps (tested on 64 nodes):
- Random: we randomly selected a non-conflicting step (+10%)
- Barrel rand: move the conflicts to another step (+23%)

Fully populating the leaf switches
Barrel shifting shows no conflict if:
- the topology is ideal
- the routing table is good
- the leaf switches are fully populated
With 36-port switches (18 nodes per leaf), this implies running with:
- 72 nodes instead of 64 (+12%)
- 540 nodes instead of 512 (+5%)
With 48-port switches:
- 96 nodes instead of 64 (+50%)
- 528 nodes instead of 512 (+3%)

Cabling on the director switch: is it better?

    { 'OmniPth00117501fa000404 L110B': {
          '1': 'node027 hfi1_0',  '2': 'node030 hfi1_0',  '3': 'node126 hfi1_0',
          '5': 'node036 hfi1_0',  '6': 'node031 hfi1_0',  '7': 'node123 hfi1_0',
          '8': 'node122 hfi1_0',  '9': 'node006 hfi1_0', '10': 'node033 hfi1_0',
         '11': 'node034 hfi1_0', '12': 'node016 hfi1_0', '13': 'node004 hfi1_0',
         '14': 'node002 hfi1_0', '15': 'node125 hfi1_0' },
      'OmniPth00117501fa000404 L118A': {
          '2': 'node116 hfi1_0',  '7': 'node008 hfi1_0',
         '11': 'node021 hfi1_0', '15': 'node012 hfi1_0' },
      'OmniPth00117501fa000404 L118B': {
          '2': 'node010 hfi1_0',  '3': 'node009 hfi1_0',  '5': 'node121 hfi1_0',
          '6': 'node022 hfi1_0',  '7': 'node120 hfi1_0',  '8': 'node117 hfi1_0',
          '9': 'node020 hfi1_0', '10': 'node011 hfi1_0', '13': 'node115 hfi1_0' } }

Slurm numbering
- Slurm can use a topology description to improve placement
- It provides the list of nodes on the same switch
- But inside a switch, it still orders the ranks by node name
(An example topology description follows below.)
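
For reference, such a tree topology is described to Slurm in topology.conf, roughly like this (a sketch for the topology/tree plugin; the switch and node names are made up):

    # topology.conf sketch (hypothetical names): leaf switches and their
    # nodes, plus a spine connecting the leaves. Slurm packs jobs by
    # switch, but still orders ranks inside a switch by node name.
    SwitchName=leaf1 Nodes=node[001-018]
    SwitchName=leaf2 Nodes=node[019-036]
    SwitchName=spine Switches=leaf[1-2]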

Next step
- Generate a node re-numbering
- Based not on the cabling, but on the routing tables

Backup

Picture of the cables: https://www.euro-fusion.org/newsletter/full-computational-throttle/

Network topology [diagram]

Mapping
- Marconi has a 2:1 (oversubscribed) mapping: 15 uplinks for 32 nodes
- We selected 16 nodes per leaf switch

Summary: scalability on Marconi, day 1 [plot]

During the night: micro-benchmarks [plots]

Day 2: 2 hours before leaving, we tried random scheduling; 15 min before leaving: 30.37; 5 min… [plots]

Final summary

Compared to the DELL cluster [plot: full duplex]

IB: single-way, 46 nodes [plot]