Published by Daniela McKenzie. Modified over 8 years ago.
1
ALICE computing: status and perspectives
Domenico Elia, INFN Bari
Workshop CCR INFN (Workshop Commissione Calcolo e Reti), LNS Catania, 27-30 May 2014
2
Status of the ALICE computing: ALICE, heavy ions @ LHC
A Large Ion Collider Experiment at the LHC
Recorded collisions in Run1:
- PbPb @ 2.76 TeV
- pp @ 0.9, 2.76, 7, 8 TeV
- pPb @ 5.02 TeV
3
Status of the ALICE computing: Original Computing Model
- Similar to other experiments for pp collisions
- Different model for PbPb collisions:
  - ~12.5 MB/event raw, ~3 MB ESD+AOD
  - ~10 kHS06 s/event
  - online first calibration
  - pilot reconstructions and partial data export during data taking
  - complete data distribution and Pass1 reconstruction at Tier-0 in the four months after HI data taking (during shutdown)
  - further reconstruction passes (Pass2) at the Tier-1s
4
Status of the ALICE computing: Original Computing Model
- Similar to other experiments for pp collisions
- Different model for PbPb collisions
Role of the Tiers:
- Tier-0 (CERN): first-pass reconstruction, calibration and alignment
  - one copy of Raw, calibration data and first-pass ESDs
- Tier-1s: further reconstructions and scheduled batch analysis
  - second collective copy of Raw, one copy of “good” data (on tape)
  - disk replicas of derived data (ESDs and AODs)
- Tier-2s: simulation and end-user analysis
  - disk replicas of derived data (ESDs and AODs)
5
Status of the ALICE computing: Main evolution of the CM
- More complex job structure:
  - offline calibration passes before a reconstruction pass (on limited statistics) + comprehensive QA
  - organized analysis in trains
  - more complex MC simulations
- More access to calibration:
  - OCDB bigger than anticipated
  - access 30x more frequent
  - large increase due to more Pass0, QA-Trains, Tenders
Workflow diagram labels: Data taking, Online calibration, Immediate Pass1 reco, CPass0/CPass1, Validation pass (10%) + QA, Full reconstruction pass, … (Current implementation)
6
Status of the ALICE computing: Run1 data taking (figure: integrated raw data)
- 2010: pp @ 0.9 – 7 TeV, PbPb @ 2.76 TeV (MB)
- 2011: pp @ 2.76 – 7 TeV (MB & rare), PbPb @ 2.76 TeV (MB & rare)
- 2012-2013: pp @ 8 TeV (rare), pPb @ 5.02 TeV (pilot, 2012), pPb @ 5.02 TeV (MB & rare, 2013)
Total data volume:
- 7.3 PB raw data
  - 2 copies @ CERN (T0) + 1 replica @ 6 T1s
  - copies on tape @ T1s (“good” data only)
- 16 PB derived data
7
Status of the ALICE computing: ALICE Grid
Over 100 sites, 50k concurrent jobs running at any time, 22 PB of disk
- 10 in Asia: 8 operational, 2 future
- 2 in Africa: 1 operational, 1 future
- 53 in Europe
- 8 in North America: 4 operational, 4 future + 1 past
- 2 in South America: 1 operational, 1 future
8
Status of the ALICE computing: Federation view
9
Status of the ALICE computing: Grid running profile
Progressive reduction of user analysis, with increasing usage of “LEGO trains”, in the last 2 years
10
Status of the ALICE computing: Working on CPU efficiency
Improved a lot in the last few years: from ~50-60% (2011) to the current average (unweighted):
- Tier-0/1: ~85%
- Tier-2: ~80%
Main actions:
- modifications of the OCDB structure
- improvement of raw processing
- more efficient analysis trains (LEGO framework)
- moving users from ESDs to AODs
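The slide quotes unweighted averages. As a minimal sketch of what that distinction means, the snippet below compares an unweighted mean of per-site CPU/wall ratios with a wall-time-weighted one. The site names and hours are hypothetical, chosen only for illustration; they are not ALICE monitoring data.

```python
# Hypothetical per-site accounting: (CPU hours, wall-clock hours).
# Illustrative numbers only, not taken from ALICE monitoring.
sites = {
    "site_a": (850.0, 1000.0),
    "site_b": (150.0, 200.0),
    "site_c": (45.0, 50.0),
}

def unweighted_efficiency(accounting):
    """Mean of per-site CPU/wall ratios; every site counts equally."""
    ratios = [cpu / wall for cpu, wall in accounting.values()]
    return sum(ratios) / len(ratios)

def weighted_efficiency(accounting):
    """Total CPU time over total wall time; large sites dominate."""
    total_cpu = sum(cpu for cpu, _ in accounting.values())
    total_wall = sum(wall for _, wall in accounting.values())
    return total_cpu / total_wall

print(f"unweighted: {unweighted_efficiency(sites):.3f}")
print(f"weighted:   {weighted_efficiency(sites):.3f}")
```

The two values differ whenever site sizes and efficiencies vary together, which is why the slide's "(unweighted)" qualifier matters when comparing tiers.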
11
Status of the ALICE computing: Computing sharing in 2013
- ALICE total: 50/50 (T0+T1s)/T2s, ~290M wall hours (260M in 2012)
- Italian contribution: 50/50 T1/T2s, ~43M wall hours (15% of the total)
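The quoted share can be cross-checked from the slide's own rounded figures. The year-on-year growth below is derived here and is not stated on the slide.

```python
# Wall-hour totals as quoted on the slide (rounded values).
alice_wall_hours_2013 = 290e6
alice_wall_hours_2012 = 260e6
italy_wall_hours_2013 = 43e6

# Italian share of the 2013 total.
italian_share = italy_wall_hours_2013 / alice_wall_hours_2013
# Growth of the ALICE total from 2012 to 2013 (derived, not on the slide).
growth = alice_wall_hours_2013 / alice_wall_hours_2012 - 1.0

print(f"Italian share 2013: {italian_share:.1%}")
print(f"Growth 2012 to 2013: {growth:.1%}")
```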
12
Status of the ALICE computing: Italian contribution @ Tier-1
- ALICE monitoring data compare well with those from CNAF
- Pledge 2013: 18600 HS06 (~1850 job slots)
- Average running jobs: 2796 (roughly 151% of the pledge)
- Pledge 2014: 20900 HS06
Source: L. Morganti, S. Taneja, CdG Tier-1 25.3.2014
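A rough consistency check of the pledge usage, as a sketch. The conversion of about 10 HS06 per job slot is an assumption inferred from the slide's own ratio of running jobs to pledge, not a figure given by the author.

```python
# Figures quoted on the slide.
pledge_hs06_2013 = 18600
avg_running_jobs = 2796

# Assumption (not from the slide): one job slot corresponds to ~10 HS06.
hs06_per_slot = 10.0

pledged_slots = pledge_hs06_2013 / hs06_per_slot
usage = avg_running_jobs / pledged_slots

print(f"pledged slots: {pledged_slots:.0f}")
print(f"usage of pledge: {usage:.0%}")
```

With this conversion the average number of running jobs comes out at roughly one and a half times the pledge, in line with the ~151% quoted.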
13
Status of the ALICE computing: Italian contribution @ Tier-2s
ALICE INFN local centers:
- 4 Tier-2s: Bari, Catania, Padova-LNL and Torino
- other centers: Cagliari (some pledge), Bologna and Trieste
Available resources (end of 2014):
        Bari   Catania   LNL-Padova   Torino   Cagliari   Total
HS06    8264   10757     8264         7805     1960       37050
TB      812    683       717          814      70         3096
All sites performing quite well:
- very good local support (despite the shortage of manpower)
- internal coordination/monitoring with monthly meetings
- activity on the VAF (Torino and other Italian T2 sites)
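The per-site figures in the resource table can be checked against the quoted totals. The split of the run-together digits into per-site values is a reconstruction; the check below confirms that this split reproduces both totals.

```python
# Per-site resources (end of 2014) as read from the slide's table.
hs06 = {
    "Bari": 8264,
    "Catania": 10757,
    "LNL-Padova": 8264,
    "Torino": 7805,
    "Cagliari": 1960,
}
disk_tb = {
    "Bari": 812,
    "Catania": 683,
    "LNL-Padova": 717,
    "Torino": 814,
    "Cagliari": 70,
}

total_hs06 = sum(hs06.values())
total_tb = sum(disk_tb.values())

print(f"total CPU:  {total_hs06} HS06")
print(f"total disk: {total_tb} TB")
```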
14
Perspectives: Activity on Virtual AF
Elastic expandable Virtual AF deployed in Torino:
- fully in production for a couple of years
- using up to 360 CPU cores, on average ~10% of the Torino Tier-2
- main ingredients: PROOF, PoD, CERNVM, underlying Cloud infrastructure
- converged into mainstream PROOF development @ CERN
- fully documented in Dario’s PhD thesis: http://www.infn.it/thesis/index.php#8887 (see Dario’s talk tomorrow)
15
Perspectives: Activity on Virtual AF
- Elastic expandable Virtual AF deployed in Torino
- Some developments also in Trieste
- Propagating to other Italian sites (tests):
  - ongoing activity in BA, PD-LNL, TO, TS (PRIN STOA-LHC); Catania also going to join
  - aim to develop parallel interactive Cloud-based AFs
    - ready to experiment with Cloud-oriented evolution of the ALICE CM
  - main issues under development:
    - Virtual SE for accessing data stored on local SE (no dedicated storage)
    - resource accounting
  - ERC Consolidator Grant just submitted (S. Piano, TS):
    - Interoperating Cloud-based Virtual Farms for ALICE
16
Perspectives: Preparing for Run2 (2015-2017)
Changing data taking conditions:
- targeting an integrated luminosity of 1 nb^-1 for PbPb collisions
- 4-fold increase in instantaneous luminosity for PbPb collisions
- additional detectors (TRD 60% → 100%, DCAL)
- consolidation of TPC/TRD r/o electronics (r/o rate x2 wrt Run1)
- increased capacity of HLT/DAQ systems (up to 8 GB/s to T0)
17
Perspectives: Preparing for Run2 (2015-2017)
- Changing data taking conditions
- (No) Evolution of the CM:
  - minor changes, basically the same CM as for Run1
  - use HLT for online Raw data compression (factor 4)
    - already tested in Run1, implies reduction of tape storage @ Tier-0/1
  - resource estimate based on:
    - same CPU power needed for reconstruction
    - 25% increase in raw event size (additional detectors, higher multiplicity with increased energy and pile-up)
    - MC productions: 100% pp, pPb + 30-40% PbPb events
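To get a feel for how the two Run2 factors interact, here is a back-of-the-envelope sketch. It assumes that the 25% size increase and the factor-4 HLT compression both apply multiplicatively to the ~12.5 MB/event PbPb raw size quoted for the original computing model; that combination is an illustrative assumption, not a number from the talk.

```python
# PbPb raw event size from the original computing model (MB/event).
run1_raw_mb = 12.5

# Run2 planning figures quoted on the slide.
raw_size_increase = 0.25  # 25% larger raw events
hlt_compression = 4.0     # online compression factor in the HLT

# Assumption: both factors apply multiplicatively to the Run1 baseline.
run2_raw_mb = run1_raw_mb * (1.0 + raw_size_increase)
run2_stored_mb = run2_raw_mb / hlt_compression

print(f"Run2 raw size before compression: {run2_raw_mb:.2f} MB/event")
print(f"Run2 size after HLT compression:  {run2_stored_mb:.2f} MB/event")
```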
18
Perspectives: Preparing for Run2 (2015-2017)
- Changing data taking conditions
- (No) Evolution of the CM
- Evolution of the software framework (AliRoot 5.x):
  - calibration (will make use of HLT in Run2)
    - move one calibration iteration to online
  - reconstruction (use HLT track seeds for offline reconstruction)
    - include TRD information in the track fit
    - requires precise ITS/TPC/TRD alignment
    - improve double track resolution
    - improve matching efficiency (handling ITS and TPC standalone tracks)
  - simulation (start transition from G3 to G4)
    - improve G4 for ALICE, further develop fast and parametrized MC
19
Perspectives: Preparing for Run2 (2015-2017)
- Changing data taking conditions
- (No) Evolution of the CM
- Evolution of the software framework (AliRoot 5.x)
- Additional code development during LS-1 (but for Run3):
  - simulation/reconstruction for the upgrade
  - large involvement of the INFN groups for the new ITS
20
Perspectives: Towards Run3 (2019-2021)
Detectors and running scenario:
- ALICE upgrade aiming at a high-statistics sample (10 nb^-1)
- continuous readout TPC, upgraded ITS
- 50 kHz PbPb interaction rate (current rate x100)
- ~1.1 TB/s detector readout
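The readout figures imply an average data volume per interaction, which can be worked out directly. The sketch assumes decimal units (1 TB = 10^12 bytes) and a uniform average over interactions; both are simplifications introduced here.

```python
# Run3 figures quoted on the slide.
interaction_rate_hz = 50e3    # 50 kHz PbPb interaction rate
readout_bytes_per_s = 1.1e12  # ~1.1 TB/s, decimal terabytes assumed

# Average readout volume per interaction (simple ratio, an approximation).
bytes_per_interaction = readout_bytes_per_s / interaction_rate_hz

# "Current rate x100" implies the rate at the time of the talk.
current_rate_hz = interaction_rate_hz / 100

print(f"average readout per interaction: {bytes_per_interaction / 1e6:.0f} MB")
print(f"implied current rate: {current_rate_hz:.0f} Hz")
```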
21
Perspectives: Towards Run3 (2019-2021)
- Detectors and running scenario
- Reconstruction strategy:
  - data reduction by (partial) online reconstruction and compression
  - store only reconstruction results, discard raw data
    - demonstrated with TPC cluster finder running on HLT since PbPb 2011
    - using data structures optimized for lossless compression
    - using algorithms designed to allow for subsequent offline reconstruction passes with improved calibrations
  - implies much tighter coupling between online and offline reconstruction software
22
Perspectives: Towards Run3 (2019-2021)
- Detectors and running scenario
- Reconstruction strategy:
  - data reduction by (partial) online reconstruction and compression
  - store only reconstruction results, discard raw data
    - demonstrated with TPC cluster finder running on HLT since PbPb 2011
    - using data structures optimized for lossless compression
    - using algorithms designed to allow for subsequent offline reconstruction passes with improved calibrations
From Detector Readout to Analysis, from DAQ, HLT to Offline: together, one computing framework (O2 project)
23
Perspectives: Towards Run3 (2019-2021)
- Detectors and running scenario
- Reconstruction strategy
- Simulation strategy:
  - migrate from G3 to G4
    - expect to profit from future G4 developments
    - multithreaded G4, G4 on GPU …
  - be ready to use contributed resources
    - supercomputers, volunteer computing resources
  - must work on (fast) parametrized simulation
    - basic support exists in the current framework
  - must make more use of embedding, event mixing …
24
Perspectives: Towards Run3 (2019-2021)
- Detectors and running scenario
- Reconstruction strategy
- Simulation strategy
- O2 project and basics of the new CM:
  - project started in March 2013
  - reconstruction @ Tier-0 (online, using FPGA, GPU, MCCPU etc.)
  - MC and analysis @ Tier-1/2s (AF on Demand)
  - evolution of the framework AliRoot 5.x → 6.x
    - work on AliRoot 6.x already started, in collaboration with FAIR
25
Perspectives: Towards Run3 (2019-2021)
Figure: software development timeline (Predrag Buncic), 2010 to 2024-26, spanning Run 1, LS1, Run 2, LS2, Run 3, LS3, Run 4
- AliRoot 5.x: evolution of the current framework; Root 5.x; improve the algorithms and procedures
- AliRoot 6.x: new modern framework; Root 6.x, C++11; optimized for I/O; FPGA, GPU, MIC…
26
Perspectives: Towards Run3 (2019-2021)
- Detectors and running scenario
- Reconstruction strategy
- Simulation strategy
- O2 project and basics of the new CM:
  - project started in March 2013
  - reconstruction @ Tier-0 (online, using FPGA, GPU, MCCPU etc.)
  - MC and analysis @ Tier-1/2s (AF on Demand)
  - evolution of the framework AliRoot 5.x → 6.x
  - adapting ALICE distributed computing to Cloud
27
Perspectives: Towards Run3 (2019-2021)
28
Perspectives: Towards Run3 (2019-2021)
In order to reduce complexity, national or regional T1/T2 centers could transform themselves into Cloud regions, providing IaaS and reliable data services with a very good network between the sites.
29
Perspectives: Towards Run3 (2019-2021)
- Detectors and running scenario
- Reconstruction strategy
- Simulation strategy
- O2 project and basics of the new CM:
  - project started in March 2013
  - reconstruction @ Tier-0 (online, using FPGA, GPU, MCCPU etc.)
  - MC and analysis @ Tier-1/2s (AF on Demand)
  - evolution of the framework AliRoot 5.x → 6.x
  - adapting ALICE distributed computing to Cloud
Computing Upgrade TDR expected by October 2014
30
Conclusions
Status of the ALICE computing:
- quite smooth and reasonably streamlined Grid operations
- fraction of organized analysis increasing
- very good performance of the Italian sites
Perspectives:
- finish re-processing of raw data and MC (next 6 months)
- increasing activity on Virtual AF in Italy
- preparing for Run2:
  - move calibration to online
  - improve and speed up offline reconstruction
- designing the new CM for Run3:
  - raw reconstruction and compression online @ T0
  - MC and analysis activities @ T1/T2 (external clouds)
31
Backup slides
32
Status of the ALICE computing: Original Computing Model
Similar to other experiments for pp collisions:
- ~1.3 MB/event raw, ~0.2 MB derived data (ESD+AOD)
- ~0.1 kHS06 s/event
- online first calibration
- quasi-online data distribution and first reconstruction at Tier-0
- further reconstructions at the Tier-1s
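Setting these pp figures next to the PbPb figures of the original computing model (~12.5 MB/event raw, ~3 MB ESD+AOD, ~10 kHS06 s/event) shows why PbPb needs a different model. The ratios below are derived here from those approximate values and are only indicative.

```python
# Per-event figures from the original computing model (approximate).
pp = {"raw_mb": 1.3, "derived_mb": 0.2, "cpu_khs06_s": 0.1}
pbpb = {"raw_mb": 12.5, "derived_mb": 3.0, "cpu_khs06_s": 10.0}

# PbPb-to-pp ratio for each quantity.
ratios = {key: pbpb[key] / pp[key] for key in pp}

for key, value in ratios.items():
    print(f"{key}: PbPb / pp = {value:.1f}")
```

The CPU cost per event grows far faster than the event size, which is consistent with the PbPb model deferring the full Pass1 reconstruction to the months after heavy-ion data taking.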
33
Status of the ALICE computing: Grid jobs, three main classes
- MC simulation & reco productions:
  - low I/O, high CPU efficiency
  - data export after job completion
  - managed, scheduled
- Analysis (LEGO) trains:
  - optimized I/O (read once, do many tasks)
  - streamlined code (as much as possible …)
  - managed by train operators (from PWGs)
- User jobs:
  - lowest CPU efficiency
  - variable job duration, many failures, less good code …
  - managed by the users
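The "read once, do many tasks" idea behind analysis trains can be sketched in a few lines. This is a generic illustration of the pattern, not the LEGO framework's API: `run_train`, the wagon classes and the event fields are all hypothetical names.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, float]
Wagon = Callable[[Event], None]

def run_train(events: Iterable[Event], wagons: List[Wagon]) -> int:
    """Read each event once and hand it to every analysis task (wagon)."""
    n_read = 0
    for event in events:
        n_read += 1
        for wagon in wagons:
            wagon(event)
    return n_read

class MultiplicityCounter:
    """Hypothetical task: sum the track multiplicity over all events."""
    def __init__(self) -> None:
        self.total = 0.0
    def __call__(self, event: Event) -> None:
        self.total += event["n_tracks"]

class HighPtSelector:
    """Hypothetical task: count events with a track above a pT threshold."""
    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.selected = 0
    def __call__(self, event: Event) -> None:
        if event["max_pt"] > self.threshold:
            self.selected += 1

# Toy input standing in for events read from storage.
events = [
    {"n_tracks": 10, "max_pt": 1.5},
    {"n_tracks": 20, "max_pt": 6.0},
    {"n_tracks": 5, "max_pt": 0.8},
]
counter = MultiplicityCounter()
selector = HighPtSelector(threshold=5.0)
n_read = run_train(events, [counter, selector])
```

Each event is read a single time however many wagons are attached, so the I/O cost is shared across tasks instead of being paid once per user job.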
34
Status of the ALICE computing: CPU and storage, 2013 sharing
Extra-CERN resources:
35
Status of the ALICE computing: Italian contribution @ Tier-1
Figures: DISK (TB), TAPE (TB)
Low consumption of tape storage compared to requirements, due to the revised strategy of data preservation:
- ALICE analysis does not use data on tape because of the high latency of the tape system (ESD and AOD data reside on disk)
- obsolete data are permanently deleted and not saved to tape
New requirements reflect this practice
Source: L. Morganti, S. Taneja, CdG Tier-1 25.3.2014
36
Status of the ALICE computing: Italian contribution @ Tier-2s
Site availability from EGI monthly reports (Jan 2013 – Mar 2014):
CNAF      99.0%
BA        95.1%
CT        92.7%
PD-LNL    98.8%
TO        91.2%
Average   90.8%
Italian Tier-2s quite satisfactory, all above the average
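The "all above the average" statement can be verified from the quoted availabilities. The margins computed below are derived here; the slide does not define "Average" further, so it is taken as given.

```python
# Site availability (percent) from the slide, Jan 2013 to Mar 2014.
availability = {
    "CNAF": 99.0,
    "BA": 95.1,
    "CT": 92.7,
    "PD-LNL": 98.8,
    "TO": 91.2,
}
# Reference average quoted on the slide (its definition is not given there).
average = 90.8

# Margin of each Italian site over the quoted average, in percentage points.
margin = {site: round(value - average, 1) for site, value in availability.items()}

for site, points in margin.items():
    print(f"{site}: +{points} points above average")
```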
37
Perspectives: Activity on Virtual AF
- Elastic expandable Virtual AF deployed in Torino
- Some developments also in Trieste:
  - smaller static AF deployed in 2013
  - recently moved to a (Torino-like) VAF in a Cloud infrastructure
38
Perspectives: Preparing for Run2 (2015-2017)
- Changing data taking conditions
- (No) Evolution of the CM
- Evolution of the software framework (AliRoot 5.x)
- Additional software and process improvements:
  - start adapting ALICE distributed computing to Cloud, using the HLT farm for offline processing (additional 3% CPU resources)
  - improving performance of the organized analysis trains
  - speeding up and improving the efficiency of the analysis activity by active data management
  - collaborating with other experiments to explore contributed resources (i.e. spare CPU cycles and supercomputers)
39
Perspectives: Plan for the next six months
- Pass2/3 of 2011 pp data and associated MC:
  - full detector calibration, 2 years of software updates
- Pass2 of LHC12 pp, Pass3 of 2013 pPb data
- From August/September, start cosmic trigger run:
  - upgraded detector readout, Trigger, DAQ, HLT
  - data will be reconstructed offline
40
Perspectives: Resource requirements Run2
Request for 2015: scrutinized and approved by CRSG in April 2014
- CPU request growth compatible with “flat” budget
- tape and disk resources increase after 2013/14 flat profile
- major demand on resources towards end of 2015 (PbPb run)