1 Model evolution for LHC Run3
Workshop CCR LNGS, May 2017
ALICE Computing: model evolution for LHC Run3
Domenico Elia (INFN Bari)

2 Outline
Introduction:
- ALICE computing for Run1/Run2
- current data volume and prospects for 2017-2018
Computing for Run3:
- ALICE upgrade, data taking target for Run3
- new computing model and O2 system
- first rough estimate of the needed resources
Conclusions

3 A Large Ion Collider Experiment
Introduction
A Large Ion Collider Experiment at the LHC
Low material budget: tracking down to 50 MeV/c
Complementary PID: dE/dx, ToF, Cherenkov, topological decays
Recorded collisions in Run1/Run2: 2.76, 5 TeV (Pb-Pb); 2.76, 7, 8, 13 TeV (pp); 5, 5 TeV (p-Pb)
Recording bandwidth: 1.3 GB/s for Pb-Pb
Data volume: a few PB/year
Reconstruction: almost completely offline

4 Resource usage and INFN share
Introduction
Average ~75K concurrent jobs in 2016

5 Run2 data taking 2015-2016
Introduction
[Plot: accumulated raw data volume in 2015-2016, with annotated periods: p-p, Pb-Pb, p-A; no HLT compression / HLT compression (high IR) / HLT + ROOT compression (low IR); end-of-year stop]
Total 2016: ~7 PB raw, 80% replicated to T1s (ran out of tape)
Total Run1+Run2: ~25 PB

6 Run2 data taking objectives
Introduction
Pb-Pb collisions:
- reach the target of 1 nb^-1 integrated luminosity for rare triggers
- increase statistics of min-bias and centrality-triggered events
pp collisions:
- collect a reference rare-trigger sample of 40 pb^-1 (equivalent to the 1 nb^-1 sample in Pb-Pb; see the scaling check below)
- enlarge statistics of the unbiased data sample (including min-bias collisions at top energy)
p-Pb collisions:
- enlarge the existing data sample (in particular the unbiased events at 5.02 TeV)
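A quick way to see the quoted pp/Pb-Pb equivalence, assuming it refers to binary-collision (roughly A^2) scaling of rare, hard processes; the slide does not spell out the reasoning, so this is only an interpretation:

# Back-of-envelope check of the pp reference sample size, assuming N_coll ~ A^2
# scaling for rare/hard probes (this assumption is mine, not stated on the slide).
A = 208                                   # mass number of Pb
L_pbpb_nb = 1.0                           # target Pb-Pb integrated luminosity [nb^-1]
L_pp_equiv_pb = L_pbpb_nb * A**2 / 1000   # equivalent pp luminosity [pb^-1]
print(L_pp_equiv_pb)                      # ~43 pb^-1, consistent with the quoted ~40 pb^-1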

7 Expected in 2017-2018
Introduction
pp 2017-2018: the data taking mode will be set to limit the TPC readout rate to 400 Hz; the total amount of data recorded will be 17.5 PB
Pb-Pb run in 2018: assuming HLT compression by a factor of 6 and a total readout rate of 10 GB/s, the total amount of data recorded will be ~12 PB (back-of-envelope check below)
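The length of the Pb-Pb run behind the 12 PB figure is not given on the slide; a minimal sketch of the effective readout time implied by the two numbers quoted above:

# Effective Pb-Pb readout time implied by the 2018 numbers above (consistency
# check only; the actual length of the Pb-Pb run is not stated here).
readout_rate = 10e9                  # bytes/s after HLT compression
total_volume = 12e15                 # bytes (~12 PB)
effective_seconds = total_volume / readout_rate
print(effective_seconds / 86400)     # ~14 days of effective readout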

8 LHC Run3 and ALICE upgrade
ALICE Run3
Phase I upgrade:
- ALICE, LHCb major upgrades; ATLAS, CMS minor upgrades
- heavy-ion luminosity from 10^27 to 7 x 10^27 cm^-2 s^-1
Phase II upgrade:
- ATLAS, CMS major upgrades
- HL-LHC: pp luminosity from 10^34 cm^-2 s^-1 (peak) to 5 x 10^34 cm^-2 s^-1 (levelled)
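As a rough cross-check, the Phase I heavy-ion luminosity is consistent with the 50 kHz peak interaction rate quoted on the following slides, assuming a Pb-Pb hadronic cross section of about 8 b (the cross section is my assumption, it is not given in the presentation):

# Interaction rate = luminosity x cross section (order-of-magnitude check).
lumi = 7e27                          # cm^-2 s^-1, Phase I heavy-ion luminosity
sigma_pbpb = 8e-24                   # cm^2 (~8 b hadronic Pb-Pb cross section, assumed)
print(lumi * sigma_pbpb)             # ~5.6e4 interactions/s, i.e. of order 50 kHz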

9 ALICE upgrade
ALICE Run3
New conditions after LS2 (2019-2020):
- expected peak interaction rate: 50 kHz (now 8 kHz)
- no reliable trigger strategies for several physics channels

10 ALICE upgrade
ALICE Run3
New conditions after LS2 (2019-2020):
- expected peak interaction rate: 50 kHz (now 8 kHz)
- no reliable trigger strategies for several physics channels
Goal for Run3:
- increase the readout rate to 50 kHz (now ~500 Hz)
- improve the pointing resolution both in the barrel (new ITS) and in the forward muon arm (new Muon Forward Tracker, MFT)
- capability of reducing online the data volume delivered by the detectors, to reach a target integrated luminosity > 10 nb^-1 for Pb-Pb (x100 wrt Run1)
New ITS: 7 pixel layers, 10 m^2 of silicon, 12.5 Gpixels (pixel-size check below)
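The three ITS numbers above fix the implied pixel size; a quick check (the actual pixel pitch is not quoted in the presentation):

# Implied pixel size of the new ITS from "10 m^2 of silicon" and "12.5 Gpixels".
import math
area_m2 = 10.0
n_pixels = 12.5e9
pixel_area = area_m2 / n_pixels      # ~8e-10 m^2 per pixel
print(math.sqrt(pixel_area) * 1e6)   # ~28 micron pitch for a square pixel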

11 ALICE upgrade
ALICE Run3
New conditions after LS2 (2019-2020):
- expected peak interaction rate: 50 kHz (now 8 kHz)
- no reliable trigger strategies for several physics channels
Goal for Run3:
[Diagram comparing Run2 and Run3 data flow: readout rate 500 Hz vs 50 kHz; data rates 10 GB/s, 90 GB/s, 1.1 TB/s, 3 TB/s]
- capability of reducing online the data volume delivered by the detectors, to reach a target integrated luminosity > 10 nb^-1 for Pb-Pb (x100 wrt Run1)

12 ALICE upgrade: O2 system
ALICE Run3
O2 Project: aims to integrate DAQ, HLT and Offline (for the reconstruction part) in a single infrastructure
O2 TDR approved in September 2015 by the LHCC

13 ALICE upgrade: O2 system
ALICE Run3
O2 Project: aims to integrate DAQ, HLT and Offline in a single infrastructure; O2 TDR approved in September 2015 by the LHCC
- the data volume coming from the detectors must be substantially reduced before sending data to mass storage: online processing is the only option
- the computing strategy must rely on a heterogeneous architecture to match the interaction rate:
  ~250 FLP (First Level Processor) worker nodes equipped with FPGAs
  ~1500 EPN (Event Processing Node) worker nodes equipped with GPUs
- yearly amount of data: ~50 PB
(a schematic sketch of the two-stage FLP/EPN flow follows)
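A schematic sketch of the two-stage flow described above: FLPs perform a first, detector-local reduction, EPNs assemble time frames and apply the global, reconstruction-based compression. Only the 1.1 TB/s input and the overall factor of ~14 (slide 15) come from the presentation; the stage-by-stage factors below are illustrative assumptions, and this is not the real O2 software.

RAW_RATE_TBS = 1.1                   # input rate from the detectors, TB/s
FLP_REDUCTION = 2.5                  # assumed first-stage reduction factor (illustrative)
EPN_REDUCTION = 5.5                  # assumed second-stage reduction factor (illustrative)

def flp_stage(rate_tbs):
    # First Level Processors: readout, zero suppression, fast cluster finding.
    return rate_tbs / FLP_REDUCTION

def epn_stage(rate_tbs):
    # Event Processing Nodes: tracking-based cluster rejection, format optimization.
    return rate_tbs / EPN_REDUCTION

to_storage = epn_stage(flp_stage(RAW_RATE_TBS))
print(f"to storage: {to_storage * 1000:.0f} GB/s")   # ~80 GB/s, overall factor ~14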

14 ALICE upgrade: O2 system
ALICE Run3
O2 Project: aims to integrate DAQ, HLT and Offline in a single infrastructure; O2 TDR approved in September 2015 by the LHCC
Impressive online data volume reduction for the TPC:
- zero suppression
- clustering and compression
- removal of clusters not associated with interesting particle tracks (e.g. very low momentum electrons)
- data format optimization (largely based on the present HLT results)
(a toy zero-suppression example follows)
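A toy example of the first step in the list, zero suppression: samples below a pedestal-plus-threshold are simply never shipped. The threshold and data are invented; this is not the actual TPC front-end logic.

THRESHOLD = 3                        # ADC counts above pedestal (illustrative)

def zero_suppress(samples):
    # Keep only (index, value) pairs for samples above threshold.
    return [(i, v) for i, v in enumerate(samples) if v > THRESHOLD]

raw = [0, 1, 0, 0, 7, 12, 9, 2, 0, 0, 0, 5, 4, 0]
kept = zero_suppress(raw)
print(kept)                          # [(4, 7), (5, 12), (6, 9), (11, 5), (12, 4)]
print(len(kept) / len(raw))          # fraction of samples actually stored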

15 ALICE upgrade: O2 system
ALICE Run3
O2 Facility:
- 463 FPGAs: detector readout and fast cluster finder
- 100,000 CPU cores: to compress the 1.1 TB/s data stream by an overall factor of 14
- 3000 GPUs: to speed up the reconstruction (3 CPUs + 1 GPU = 28 CPUs)
- 60 PB of disk: to buy us extra time and allow more precise calibration
Considerable (but heterogeneous) computing capacity that will be used for both Online and Offline tasks; identical software should work in the Online and Offline environments
(disk-buffer check below)
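A back-of-envelope check of how much extra time the 60 PB disk buffer buys at the compressed output rate implied by this slide (not an official estimate):

input_rate = 1.1e12                  # bytes/s from the detectors
compression = 14                     # overall reduction factor quoted above
output_rate = input_rate / compression       # ~79 GB/s into the O2 disk buffer
buffer_bytes = 60e15                         # 60 PB
print(buffer_bytes / output_rate / 86400)    # ~9 days of continuous data taking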

16 Run3 vs Run2 data volume
ALICE Computing @ Run3
[Table: Run3 vs Run2 increase in number of events (x 30 Pb-Pb, x 88 pp) and in raw data volume (x 4.2)]
While the event statistics will increase by a factor 30 in Pb-Pb (x 88 in pp), the data volume will increase only by a factor 4.2, thanks to data reduction in the O2 facility:
- online tracking allows rejection of clusters not associated with tracks
- large effect in case of pileup (pp)
(per-event size check below)
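The two factors imply a large reduction in stored size per event; a quick check, assuming the x4.2 volume increase refers to the same (Pb-Pb) sample as the x30 event increase (the slide does not split the volume factor by collision system):

events_factor = 30.0                 # Run3 / Run2 number of Pb-Pb events
volume_factor = 4.2                  # Run3 / Run2 raw data volume
print(events_factor / volume_factor) # ~7: each stored event is ~7x smaller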

17 Data management
ALICE Run3
No replication policy:
- only one instance of each raw data file (CTF) stored on disk
- backup on tape (restore from tape in case of data loss)
Deletion policy:
- with the exception of raw data (CTF) and derived analysis data (AOD), all other intermediate data from the various processing stages are transient (removed after a given processing step) or temporary (limited lifetime)
- all CTFs stored on disk buffers (in O2 and at T1s) in the previous year will have to be removed before new data taking starts
- all data not fully processed during this period will remain parked on tape until the next opportunity for re-processing arises (LS3)
(a compact restatement in code form follows)
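A compact restatement of the policy above as a lookup table. The categories and rules come from the slide; the layout and the None placeholders (for details the slide does not give) are mine, not an actual ALICE/O2 configuration format.

data_policy = {
    "CTF (raw)": {
        "disk_copies": 1,            # only one instance on disk
        "tape_backup": True,         # restore from tape in case of data loss
        "disk_lifetime": "removed before the next year's data taking",
    },
    "AOD (derived analysis data)": {
        "disk_copies": None,         # not specified on this slide
        "tape_backup": None,
        "disk_lifetime": "kept (exception to the deletion policy)",
    },
    "intermediate data": {
        "disk_copies": None,
        "tape_backup": None,
        "disk_lifetime": "transient (removed after a processing step) or temporary",
    },
}
for category, rules in data_policy.items():
    print(category, rules)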

18 Complexity management
ALICE Run3
Need to transform (logically) hundreds of individual sites into tens of clouds/regions
Each cloud/region should provide reliable data management and sufficient processing capability, to simplify scheduling and high-level data management

19 Run3 computing model
ALICE Run3
Grid Tiers mostly specialized for a given role:
- O2 facility: 2/3 of reconstruction & calibration
- T1s: 1/3 of reconstruction & calibration, archiving to tape
- T2s: simulation
[Diagram: data flow between the O2 facility, T1s and T2s, with roles Reconstruction / Calibration / Archiving / Simulation / Analysis; CTF = Compressed Time Frame]

20 Run3 computing model
ALICE Run3
Grid Tiers mostly specialized for a given role:
- O2 facility: 2/3 of reconstruction & calibration
- T1s: 1/3 of reconstruction & calibration, archiving to tape
- T2s: simulation
AODs collected on a few specialized Analysis Facility (AF) sites, capable of processing ~5 PB of data on a half-day scale; typically (a fraction of) an HPC facility with ~20-30,000 cores and 5-10 PB of disk storage on a very performant file system (throughput check below)
GOAL: minimize data movement and optimize processing efficiency!
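What processing ~5 PB in half a day means in terms of required I/O, for a mid-range AF of 25,000 cores (the slide quotes 20-30,000); purely a back-of-envelope translation of the numbers above:

data_bytes = 5e15                    # ~5 PB to be processed
wall_time = 12 * 3600                # half a day, in seconds
cores = 25_000
aggregate = data_bytes / wall_time           # aggregate read throughput
print(aggregate / 1e9)                       # ~116 GB/s from the shared file system
print(aggregate / cores / 1e6)               # ~4.6 MB/s sustained per core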

21 Expected resource needs
ALICE Run3
Estimates based on:
- the no-replication + deletion policies (see slide 17)
- an online compression factor of ~16
giving ~20% yearly growth during Run3, i.e. ~x2 resources at the end of Run3 wrt the end of Run2
Pessimistic estimates, based on a compression factor of ~12:
~27% yearly growth during Run3, ~x2.5 at the end of Run3 wrt Run2
(compound-growth check below)
AFs are not included in these evaluations:
- 2-3 centres, progressive deployment (no need for full size at the start of Run3)
- impact: ~10% of the total WLCG resources, provided as in-kind contribution to the experiment?
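A consistency check of the quoted growth figures: a four-year compounding period reproduces both numbers, but the exact number of years is my assumption, not stated on the slide.

years = 4
print(1.20 ** years)                 # ~2.07 -> "~x2" at the end of Run3
print(1.27 ** years)                 # ~2.60 -> "~x2.5" in the pessimistic case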

22 Conclusions
Data taking conditions in Run3 and the O2 system:
- expected interaction rate of 50 kHz
- continuous detector readout, aiming at 10 nb^-1 of Pb-Pb
- online reconstruction and data compression: from 1.1 TB/s down to ~90 GB/s
New computing model:
- changes in data and complexity management
- re-definition of the role of the Tiers, introducing the AFs
Resource needs:
- expect to fit under the flat-budget envelope in Run3
- still some uncertainties:
  Run2 resource evolution
  compression efficiency (expected in the range 12-16)
  CPU and AOD size estimates (assumed to scale with raw data, as in Run2)
  availability of the AFs and their ability to grow

