LHCb Upgrade Online Computing Challenges
Niko Neufeld (niko.neufeld@cern.ch)
CERN openlab Workshop on Data Center Technologies and Infrastructures, March 2017

The Large Hadron Collider

LHC long-term planning

Run3 Online System

Dimensioning the system:
- ~10,000 versatile links, ~300 m long
- ~500 readout nodes
- ~40 MHz event-building rate, ~130 kB event size
- High bisection bandwidth in the event-builder network: ~40 Tb/s aggregate
- Industry-leading 100 Gbit/s LAN technologies
- Global configuration and control via the ECS subsystem; global synchronization via the TFC subsystem
- ~100 PB of buffer storage

(Diagram) Detector front-end electronics → event-builder network → event builders (PC + readout board, ×500, 6 × 100 Gbit/s each) → sub-farm switches → event-filter farm (1000 – 4000 nodes) and HLT buffer storage; the TFC distributes the clock and fast commands and receives throttle signals from the PCIe40s; installed in UX85B and on the Point 8 surface (possibly also Prevessin).
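As a sanity check on these numbers, a minimal back-of-the-envelope sketch in Python, using only the figures quoted in the bullets above:

```python
# Back-of-the-envelope check of the event-builder bandwidth figures quoted above.
EVENT_RATE_HZ = 40e6      # ~40 MHz event-building rate
EVENT_SIZE_B  = 130e3     # ~130 kB average event size
LINK_GBPS     = 100       # 100 Gbit/s LAN technology

aggregate_bps = EVENT_RATE_HZ * EVENT_SIZE_B * 8          # bits per second
print(f"aggregate bandwidth: {aggregate_bps / 1e12:.1f} Tbit/s")   # ~41.6 Tbit/s

links = aggregate_bps / (LINK_GBPS * 1e9)
print(f"100G links needed at 100% load: {links:.0f}")              # ~416 -> ~500 nodes with headroom
```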

Required components for a TB/s scientific instrument at the LHC
- "Zero-suppression" on the front-end in real time, ~10,000 radiation-hard links over ~350 m, etc. (not covered here)
- A 100 Gbit/s acquisition card (one slide!)
- "Event-builder" PCs handling 400 Gbit/s each
- A high-throughput, high-link-load network with ~1000 × 100 Gbit/s ports; candidates: Omni-Path, InfiniBand, Ethernet
- A ~2 MW data centre for about 2000 to 4000 servers (with accelerators: FPGAs? GPGPUs?)
- O(100) PB at O(100) GB/s read and write for intermediate storage, ~4000 streams
- Processing capacity to reduce all these data to an amount we can (afford to) store, which is about O(10) GB/s
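To make the data-reduction requirement concrete, a small illustrative sketch of the input/output budget implied by the figures above:

```python
# Input vs. output data rates implied by the figures above (illustrative only).
input_rate_GBps  = 40e6 * 130e3 / 1e9   # 40 MHz x 130 kB ~= 5200 GB/s into the event builder
output_rate_GBps = 10                   # O(10) GB/s that can affordably be stored

print(f"input : {input_rate_GBps:.0f} GB/s")
print(f"output: {output_rate_GBps:.0f} GB/s")
print(f"required reduction factor: ~{input_rate_GBps / output_rate_GBps:.0f}x")   # ~500x, done in software
```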

Design principles
- Architecture-centred, not tied to a specific technology
- Guiding principle is the overall cost-efficiency of the entire system; the lowest initial capital expense is favoured
- Open to any technological development that can help or cut cost, as long as it is (mostly) COTS
- Focus on firmware and software

Custom hardware and I/O
An in-house-built PCIe card receives data from the custom links into a server PC:
- PCI Express add-in card with an Altera Arria 10 FPGA
- 100 Gbit/s DMA engine into event-builder memory
- High-density optical I/O: up to 48 transceivers (Avago MiniPODs)
- The same hardware is reused for the timing-distribution system
- Decouples the FPGA from the network, giving maximum flexibility in network technology
- Exploits commercial technologies: PCI Express Gen3 interconnect, COTS servers designed for GPU acceleration
- With sufficient on-FPGA / on-card memory this could be re-purposed as a very powerful OpenCL accelerator
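A quick, hedged check that a PCIe Gen3 x16 slot can indeed sustain the 100 Gbit/s DMA quoted above (standard PCIe Gen3 parameters; the x16 width is an assumption here):

```python
# PCIe Gen3 x16 headroom check for a 100 Gbit/s DMA engine (illustrative).
lanes    = 16            # ASSUMPTION: x16 card
gen3_gts = 8.0           # GT/s per lane for PCIe Gen3
encoding = 128 / 130     # 128b/130b line encoding

usable_gbps = lanes * gen3_gts * encoding               # ~126 Gbit/s usable line rate
print(f"PCIe Gen3 x16 usable line rate: {usable_gbps:.0f} Gbit/s")
print(f"headroom over 100 Gbit/s DMA : {usable_gbps - 100:.0f} Gbit/s (before protocol overhead)")
```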

Online cost optimisation: it's all about TCO
- Control and monitoring costs are mostly fixed, determined by the number of "devices"
- Main cost drivers: CPU (+ accelerators), storage, number and length of fast interconnects
- The detector links are there and optical anyhow → bring all data to a single (new) data centre

NETWORK

DAQ network challenges
- Transport multiple Tbit/s reliably and cost-effectively
- 500-port, full-duplex, full bisection-bandwidth network, aiming at 80% sustained link load at >= 100 Gbit/s per link (the industry average is normally ~50% on a fully bisectional network)
- Integrate the network closely and efficiently with the compute resources, be they classical CPU, "many-core" or accelerator-assisted (FPGA)
- Multiple network technologies should seamlessly co-exist in the same integrated fabric ("the right link for the right task")
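The 80% figure follows directly from the event-building traffic pattern; a minimal sketch, assuming a uniform all-to-all exchange across ~500 nodes:

```python
# Why ~80% sustained link load: each of the ~500 event-builder nodes must ship
# (and receive) its share of the full aggregate bandwidth.
# ASSUMPTION: uniform all-to-all event building.
aggregate_tbps = 40       # ~40 Tbit/s aggregate (see dimensioning slide)
nodes          = 500
link_gbps      = 100

per_node_gbps = aggregate_tbps * 1000 / nodes
print(f"per-node sustained traffic: {per_node_gbps:.0f} Gbit/s")          # ~80 Gbit/s
print(f"required link utilisation : {per_node_gbps / link_gbps:.0%}")     # ~80% of a 100G link
```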

Long-distance data-transport option
- The CPU cluster can be hosted elsewhere; in that case the 32 Tbit/s of raw data need to be transported off-site
- Distance: about 10 km
- The amount of SMF available dictates the use of DWDM
- Using 25 GHz wavelengths, about 1200 lambdas are needed in the maximum configuration
- The solution should be compact and scalable (starting small, to be ramped up to the maximum later); it does not need redundancy (non-critical data)
- Traffic is essentially unidirectional → could this be exploited?
- A Layer 1 option (protocol-agnostic) is preferred, but Layer 2 or Layer 3 is acceptable if it lowers cost
- In any case the data-transport cost per Tbit/s is significant, so data compression is attractive if it is cost-efficient
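A hedged sketch of the wavelength count, assuming roughly 25 Gbit/s of payload per 25 GHz wavelength (the exact per-lambda capacity depends on the modulation and is an assumption here):

```python
import math

# Rough wavelength budget for the off-site transport option (illustrative).
raw_rate_gbps   = 32_000   # 32 Tbit/s to move ~10 km off-site
per_lambda_gbps = 25       # ASSUMPTION: ~25 Gbit/s payload per 25 GHz wavelength

lambdas = math.ceil(raw_rate_gbps / per_lambda_gbps)
print(f"wavelengths needed: ~{lambdas}")   # ~1280, consistent with "about 1200 lambdas"
```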

STORAGE

Buffer needs for Run3
- Same concept as today, but now with very challenging numbers
- 1 MHz × 130 kB → 4.5 PB/day at a Hübner factor(*) of 0.4
- Covering a good LHC week might need > 100 PB
- Very benign I/O (streams, large files, single write / single read), no need for POSIX
- Data flow: LHC bunch crossing (30 MHz) → HLT1 software trigger (1 MHz) → buffer, with real-time alignment and calibrations → HLT2 software trigger → O(100) kHz output

(*) Hübner factor: availability of the accelerator, on average ~0.3 over a year; it can, however, reach 0.8 over good days and is typically 0.4 – 0.5 during physics production runs.
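The daily-volume figure is straightforward to reproduce; a minimal sketch using the Hübner factors quoted on the slide:

```python
# Reproduce the buffer-volume arithmetic quoted above.
hlt1_rate_hz    = 1e6       # 1 MHz out of HLT1
event_size_b    = 130e3     # 130 kB
seconds_per_day = 86_400

for huebner in (0.4, 0.8):  # typical physics production vs. good days
    pb_per_day = hlt1_rate_hz * event_size_b * huebner * seconds_per_day / 1e15
    print(f"Huebner {huebner}: {pb_per_day:.1f} PB/day, {7 * pb_per_day:.0f} PB/week")
# 0.4 -> ~4.5 PB/day; several days of good running quickly accumulate tens of PB,
# which motivates a buffer at the > 100 PB scale.
```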

Implementation possibilities (example numbers: > 100 PB, ~100 GB/s; not binding)
- Completely decentralised, in the local disks of the servers: most likely the cheapest, but not well adaptable to different node speeds / histories, limits the node choice, operationally the most complicated, and offers no redundancy
- Fully central: maximum flexibility, easiest to operate, many implementation options (tiered storage); but expensive, and as a single point of failure it requires a high level of redundancy
- Partially centralised: several hardware options, redundancy is relatively cheap, not a single point of failure (depending on granularity), copes well with node differences; but probably more costly than local disks and more complicated to operate
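To compare the options it helps to see what the fully decentralised variant would mean per server; a hedged sketch, assuming the example figures above are spread over the 2000 – 4000 farm nodes quoted earlier:

```python
# What "> 100 PB at ~100 GB/s" would mean per node with purely local disks (illustrative).
total_capacity_pb   = 100
total_bandwidth_gbs = 100            # GB/s aggregate read + write

for nodes in (2000, 4000):           # farm-size range quoted earlier
    tb_per_node  = total_capacity_pb * 1000 / nodes
    mbs_per_node = total_bandwidth_gbs * 1000 / nodes
    print(f"{nodes} nodes: ~{tb_per_node:.0f} TB and ~{mbs_per_node:.0f} MB/s per node")
# ~25-50 TB and ~25-50 MB/s per node: modest I/O, but a lot of disk to manage per server.
```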

More storage facts
- The asynchronous second phase of processing allows maximising the utilisation of the compute resources by exploiting the non-availability of the accelerator: up to a factor 3(!) more processing at equal cost, given sufficient buffering
- The application makes practically no assumptions about the storage: it can be solid-state, spinning, even tape, and can be shared with other users / experiments as long as the performance is guaranteed (IMHO a prime example of synergy potential between IT, the experiments and industry)
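The factor 3 comes directly from the accelerator duty cycle; a minimal sketch, using the ~0.3 yearly Hübner factor quoted earlier:

```python
# Gain from deferring HLT2 processing into LHC downtime (illustrative).
huebner_year = 0.3     # average accelerator availability over a year (see earlier slide)

# Synchronous-only processing can use the farm ~30% of the time; with a large enough
# buffer the asynchronous HLT2 can keep it busy close to 100% of the time.
gain = 1.0 / huebner_year
print(f"potential utilisation gain: ~{gain:.1f}x at equal hardware cost")   # ~3x
```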

COMPUTE

Data processing: example of track finding in LHCb
- An iterative algorithm finds straight lines in the collision-event data of the VeloPixel sub-detector
- Seeding: triplets of hits with the best criterion are searched for
- Triplets are extended to tracks if a fitting hit can be found (see the sketch below)
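A deliberately simplified Python sketch of this seed-and-extend pattern (2D straight lines, a toy scoring criterion; the real LHCb implementation is optimised C++ and differs in many details):

```python
# Toy seed-and-extend straight-line finder, loosely following the scheme above.
# ASSUMPTIONS: 2D hits (plane index, y-position), perfect straight-line tracks,
# simple residual-based scoring. This is an illustration, not the LHCb algorithm.

def find_tracks(planes, seed_tol=0.1, extend_tol=0.2):
    """planes: list of lists of y-positions, one list per detector plane."""
    tracks = []
    used = [set() for _ in planes]                     # hits already assigned to a track

    # Seeding: triplets over three consecutive planes, keep the straightest ones.
    for p in range(len(planes) - 2):
        for i, y0 in enumerate(planes[p]):
            for j, y1 in enumerate(planes[p + 1]):
                for k, y2 in enumerate(planes[p + 2]):
                    if i in used[p] or j in used[p + 1] or k in used[p + 2]:
                        continue
                    curvature = abs((y2 - y1) - (y1 - y0))   # 0 for a straight line
                    if curvature > seed_tol:
                        continue
                    # Extension: follow the line into the remaining planes.
                    hits = [(p, i), (p + 1, j), (p + 2, k)]
                    slope, y_last, p_last = y2 - y1, y2, p + 2
                    for q in range(p + 3, len(planes)):
                        prediction = y_last + slope * (q - p_last)
                        candidates = [(abs(y - prediction), m)
                                      for m, y in enumerate(planes[q])
                                      if m not in used[q] and abs(y - prediction) < extend_tol]
                        if not candidates:
                            break
                        _, best = min(candidates)
                        hits.append((q, best))
                        slope = (planes[q][best] - y_last) / (q - p_last)
                        y_last, p_last = planes[q][best], q
                    tracks.append(hits)
                    for q, m in hits:
                        used[q].add(m)
    return tracks


if __name__ == "__main__":
    # Two straight tracks (slopes 0.5 and -0.3) plus one non-collinear noise hit per plane.
    noise = [2.2, 4.0, 1.1, 3.3, 0.7, 2.9]
    planes = [[0.0 + 0.5 * p, 5.0 - 0.3 * p, noise[p]] for p in range(6)]
    for t in find_tracks(planes):
        print(t)      # two tracks of six (plane, hit-index) pairs each
```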

Flexible infrastructure
- Ideally, batch-type workloads (simulation, "offline" analysis) and (near) real-time workloads could run side by side, or be switched between seamlessly
- Requirements: easy access to disk storage; a high-speed network between (some) servers; housing for custom I/O cards (PCIe); a flexible amount of accelerators (FPGA, GPGPU, Xeon Phi) that can easily be reassigned to different workloads
- And all this ideally in the same infrastructure: an easily reconfigurable, rack-level and data-centre-oriented design

A building block for online compute: chassis or logical group
The hardware required for efficient online processing profits from locality and from sharing some resources. Currently we organise entire racks into "pods"; the ideal granularity could be a bit smaller:
- 8 – 16 servers, dual- or single-socket CPUs, 25/50 Gbit/s networking
- Internal network over a back-plane with very high-speed uplinks (200G/400G) → reduces cabling
- Shared storage: cost-effective redundancy (local drives need mirroring), spinning drives (capacity), deployed as NAS (cost)
- A flexible amount of accelerators (GPGPUs, FPGAs, etc.), not in a 1:1 ratio with the servers
- Optional: the possibility to plug in a custom PCIe card (and link it to a server) for the DAQ

(Diagram: servers, internal network with uplinks, accelerators, a custom PCIe card and shared storage grouped in one chassis.) A sketch of the uplink arithmetic follows below.
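A minimal sketch of the uplink dimensioning for such a building block (the two-uplink count is an assumption for illustration):

```python
# Uplink dimensioning for one 16-server building block (illustrative).
servers         = 16
server_nic_gbps = 50          # 25/50 Gbit/s per server (slide above)
uplinks         = 2           # ASSUMPTION: two uplinks per chassis
uplink_gbps     = 400         # 200G/400G-class uplinks

internal_gbps = servers * server_nic_gbps
uplink_total  = uplinks * uplink_gbps
print(f"internal traffic capacity: {internal_gbps} Gbit/s")
print(f"uplink capacity          : {uplink_total} Gbit/s")
print(f"oversubscription         : {internal_gbps / uplink_total:.1f}:1")   # 1.0:1 in this configuration
```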