1
Challenges in ALICE and LHCb in LHC Run3
Niko Neufeld (including material kindly provided by Pierre vande Vyvre) CERN/EP
2
The Large Hadron Collider
3
LHC long-term planning
ALICE & LHCb upgrade
4
IT infrastructure for LHCb experiment after 2019
Driven by the upgrade of the experiment's data acquisition and trigger, which will read all (!) of the 40 million bunch crossings per second in the LHC into a computer farm
Requires 2 MW of computing power
Will consist of two separate, connected clusters with different characteristics: one predominantly I/O, one predominantly CPU
I/O cluster: requires a lot of external I/O (50 Tbit/s), arriving on optical fibres; the current plan is to enter them as MPO12 (a back-of-envelope check of the data rate follows below)
CPU cluster: requires an estimated 4000 dual-socket Xeon servers of the 2020 type (or equivalent CPU power)
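A rough back-of-envelope check of where these I/O numbers come from, sketched in Python: the 40 MHz bunch-crossing rate is from the slide above, while the ~100 kB average event size is an assumption used only for illustration.

```python
# Back-of-envelope check of the LHCb upgrade raw data rate.
# The 40 MHz crossing rate is taken from the slide above; the ~100 kB
# average event size is an assumed figure, used here for illustration only.

CROSSING_RATE_HZ = 40e6        # every bunch crossing is read out
AVG_EVENT_SIZE_BYTES = 100e3   # assumed average raw event size (~100 kB)

raw_rate_bit_s = CROSSING_RATE_HZ * AVG_EVENT_SIZE_BYTES * 8
print(f"Raw data rate: {raw_rate_bit_s / 1e12:.0f} Tbit/s")
# -> 32 Tbit/s, of the same order as the 40-50 Tbit/s quoted in this talk
```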
5
LHCb Data Acquisition & Trigger 2018
[Diagram] Detector front-end electronics (UX85B), clock & fast commands, ~10000 Versatile Links into the I/O cluster of 500 event-builder PCs (software LLT, TFC throttle from the PCIe40), 100 Gbit/s event-builder network, and on the Point 8 surface the CPU cluster: sub-farm switches, online storage, and an event-filter farm of ~4000 dual-socket nodes
Versatile Link: proprietary 4.8 Gbit/s link connecting the detector, in the radiation and strong B-field area, to the commodity hardware of the I/O cluster
Eventbuilder PC: element of the I/O cluster
Sub-farm: logical unit in the CPU farm (flexible size)
TFC (timing and fast control): synchronous, hard real-time distribution mechanism which is interfaced to the I/O cluster
The 6x100 Gbit/s in the picture are only indicative of one particular network topology used for modelling; any other topology providing at least the required 40 Tbit/s is possible
6
Network facts
Total bandwidth: about 40 Tbit/s, full duplex
Target is about 500 nodes with 100 Gbit/s interfaces: each node sends and receives round-robin to and from all other nodes (full duplex!); a sketch of this traffic pattern follows below
Candidate technologies: InfiniBand EDR, Intel OPA, 100 G Ethernet
“Event-builder” PCs in the I/O cluster have two interfaces: one for event building, one to feed the completely assembled data to the filter units (workers)
These can be, but need not be, the same technology
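The all-to-all round-robin pattern can be illustrated with a minimal barrel-shift schedule. This is a sketch only; the function name and the small node count are illustrative and not taken from any LHCb software.

```python
# Illustrative barrel-shift schedule for all-to-all event building:
# in time slot t, node i sends its event fragment to node (i + t) mod N,
# so every node sends and receives exactly one fragment per slot
# (full duplex), keeping the load on the 100 Gbit/s links balanced.
# At t = 0 a node "sends" to itself, i.e. keeps its own fragment locally.

def barrel_shift_schedule(num_nodes: int, num_slots: int):
    """Yield (slot, sender, receiver) triples of a round-robin schedule."""
    for slot in range(num_slots):
        for sender in range(num_nodes):
            yield slot, sender, (sender + slot) % num_nodes

# Small example with 4 event-builder nodes instead of ~500:
for slot, src, dst in barrel_shift_schedule(num_nodes=4, num_slots=4):
    print(f"slot {slot}: EB{src:03d} -> EB{dst:03d}")
```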
7
Anatomy of a server in the I/O cluster
PCIe40: custom-made 16-lane PCIe Gen3 card (FPGA-based, using an Altera Arria 10); puts about 100 Gbit/s into server memory
Server: needs to be able to sustain at least 400 Gbit/s of I/O per PCIe40 card (baseline: one card per server)
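As a sanity check, not from the slide itself, the usable bandwidth of a 16-lane PCIe Gen3 slot can be estimated as below; protocol overhead is ignored, and one reading of the 400 Gbit/s figure is that it counts the ~100 Gbit/s stream once in from the card, once each to and from the event-builder network, and once out to the filter farm.

```python
# Rough upper bound on what a 16-lane PCIe Gen3 slot can carry,
# to compare against the ~100 Gbit/s delivered by the PCIe40 card.
# TLP/DLLP protocol overhead is ignored, so real throughput is lower.

LANES = 16
GEN3_RATE_GT_S = 8.0     # 8 GT/s per lane (PCIe Gen3)
ENCODING = 128 / 130     # 128b/130b line encoding

usable_gbit_s = LANES * GEN3_RATE_GT_S * ENCODING
print(f"PCIe Gen3 x16 upper bound: {usable_gbit_s:.0f} Gbit/s per direction")
# -> ~126 Gbit/s, comfortably above the ~100 Gbit/s the card must sustain
```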
8
Requirements
Lots of high-speed (>= 100 Gbit/s) connections; short distances for economical cabling; at least Gbit/s ports
I/O cluster: needs to house custom-made PCIe cards, so it consists of lower-density servers (GPGPU type) with lots of I/O ports; the I/O precludes high density (1 U / node); needs only moderate room for growth, since the full system is deployed from day 1 (2019)
CPU cluster: will be upgraded and extended regularly; can be a mixture of CPUs and accelerators; servers will be CPU-bound, so dense packing is possible (0.5 U / server) and deployment should be as dense as possible; up to 2600 U needed
9
Workload aspects
I/O cluster: needs efficient I/O (low CPU overhead); ideally want to use the remaining CPU cycles parasitically (similar to the CPU cluster), but need to watch out for memory-bandwidth and I/O constraints (a minimal sketch of such parasitic use follows below)
CPU cluster: current workloads are mostly sequential; work on better threading is ongoing (complex); use of accelerators (GPGPUs, Xeon Phis, FPGAs) is under study
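A trivial sketch of what "parasitic" use of spare cycles on an I/O-cluster node could look like, assuming a Linux host: the batch work is started at the lowest CPU scheduling priority so the data-acquisition processes keep precedence. The command is a placeholder, and the memory-bandwidth and I/O contention mentioned above are not addressed by priorities alone.

```python
import subprocess

# Start opportunistic ("parasitic") batch work at the lowest CPU priority
# (nice 19) so that the data-acquisition processes on the node always win
# the CPU. This does nothing about memory-bandwidth or I/O contention.
# The command below is a placeholder, not an actual LHCb workload.

def run_opportunistic(cmd):
    """Launch a batch job with nice 19 and return the process handle."""
    return subprocess.Popen(["nice", "-n", "19", *cmd])

proc = run_opportunistic(["echo", "simulation placeholder"])
proc.wait()
```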
10
Facility aspects
New data centre to be constructed (2 MW)
Requires about 4000 U (or equivalent)
Power available (not battery-backed)
Location fixed (by the arrival of the fibres from underground)
Cooling available (25 C), or outside air: average high throughout the year < 26 C; average low throughout the year > -3 C; annual average T = 10 C (a check of these criteria is sketched below)
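A small sketch of the outside-air criteria as a check one might run against a candidate site's climate statistics; the two thresholds are from the slide, while the example inputs are placeholders, not measured values.

```python
# Check a site's yearly climate statistics against the outside-air
# (free-cooling) criteria listed above. Thresholds are from the slide;
# the example inputs below are placeholders, not measurements.

def outside_air_cooling_ok(avg_high_c: float, avg_low_c: float) -> bool:
    """True if the yearly average high/low meet the listed limits."""
    return avg_high_c < 26.0 and avg_low_c > -3.0

# Example: a site with a yearly average high of 15 C and low of 2 C
print(outside_air_cooling_ok(avg_high_c=15.0, avg_low_c=2.0))  # True
```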
11
Long-distance data-transport option
The CPU cluster could be hosted elsewhere; in that case the 32 Tbit/s of raw data need to be transported off-site
Distance: about 10 km
The amount of SMF available dictates the use of DWDM
Using 25 GHz wavelengths, about 1200 lambdas are needed in the maximum configuration (the arithmetic is sketched below)
The solution should be compact, does not need redundancy (the data are non-critical), and should be scalable (starting smaller, ramping up to the maximum later)
Traffic is essentially uni-directional; this should be exploited to reduce the cost of the optics (I think I know how to do this at layer 3, in IP)
Would prefer a Layer 1 option (protocol-agnostic), but willing to consider Layer 2 or Layer 3 if lower in cost
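The wavelength count can be reproduced with a one-line estimate; the 32 Tbit/s total comes from the slide, while the 25 Gbit/s of usable capacity per lambda assumed below is an illustrative figure chosen to land near the quoted number.

```python
import math

# Estimate the number of DWDM wavelengths for the off-site transport option.
# The 32 Tbit/s total is from the slide; the usable capacity per wavelength
# is an assumption made here for illustration only.

TOTAL_RATE_GBIT_S = 32_000
PER_LAMBDA_GBIT_S = 25      # assumed usable rate per DWDM wavelength

lambdas = math.ceil(TOTAL_RATE_GBIT_S / PER_LAMBDA_GBIT_S)
print(f"Wavelengths needed in the maximum configuration: {lambdas}")
# -> 1280, i.e. of the order of the ~1200 lambdas quoted above
```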
12
Tentative planning (subject to change)
Decision on the data centre by Q2 2017
Decision on the network technology by Q1 2019, procurement during 2019
Decision on the I/O server by Q1 2019, procurement during 2019
Decision on the CPU server in Q4 2020, followed by procurement in early 2021
13
Acceleration with FPGA in ALICE
14
High efficiency
LHC up-time, integrated over the year, is "only" about 30%
The classical reconstruction step (which requires re-reading the full data to create an intermediate format then used for analysis) is being phased out in favour of a "streaming" DAQ, which provides the data directly at the source in the "best" possible state, aligned and calibrated
Tight integration of the offline and online facilities is needed; idle cycles are used for analysis, simulation, etc.
16
Flexible infrastructure
Ideally, batch-type workloads (simulation, analysis) and (near) real-time workloads could run seamlessly on the same infrastructure
Requires easy access to disk storage
High-speed network between (some) servers
Housing of custom I/O cards (PCIe)
A flexible number of accelerators (FPGA, GPGPU, Xeon Phi), which should also be flexibly assignable to different workloads
All of this ideally in the same infrastructure, easily reconfigurable; a rack-level and data-centre-oriented design might be the way to go?
17
Backup
18
ALICE O2 Technical Design Report