ALICE computing: status and perspectives. Domenico Elia, INFN Bari. Workshop CCR INFN / LNS Catania, 27-30.5.2014.


1 ALICE computing: status and perspectives
Domenico Elia, INFN Bari
Workshop Commissione Calcolo e Reti (CCR INFN), LNS - Catania, 27-30 May 2014

2 Status of the ALICE computing: ALICE, heavy ions @ LHC
A Large Ion Collider Experiment at the LHC
Recorded collisions in Run1:
- PbPb @ 2.76 TeV
- pp @ 0.9, 2.76, 7, 8 TeV
- pPb @ 5.02 TeV

3 Status of the ALICE computing: Original Computing Model
- Similar to other experiments for pp collisions
- Different model for PbPb collisions:
  - ~12.5 MB/event raw, ~3 MB ESD+AOD
  - ~10 kHS06 s/event
  - online first calibration
  - pilot reconstructions and partial data export during data taking
  - complete data distribution and Pass1 reconstruction at Tier-0 in four months after HI data taking (during shutdown)
  - further reconstruction passes (Pass2) at Tier-1’s

4 Status of the ALICE computing: Original Computing Model
- Similar to other experiments for pp collisions
- Different model for PbPb collisions
- Role of the Tiers:
  - Tier-0 (CERN): first-pass reconstruction, calibration and alignment; one copy of Raw, calibration data and first-pass ESDs
  - Tier-1’s: further reconstructions and scheduled batch analysis; second collective copy of Raw, one copy of “good” data (on tape); disk replicas of derived data (ESDs and AODs)
  - Tier-2’s: simulation and end-user analysis; disk replicas of derived data (ESDs and AODs)

5 Status of the ALICE computing: Main evolution of the CM
- More complex job structure:
  - offline calibration passes before a reconstruction pass (on limited statistics) + comprehensive QA
  - organized analysis in trains
  - more complex MC simulations
- More access to calibration:
  - OCDB bigger than anticipated
  - access 30x more frequent
  - large increase due to more Pass0, QA-Trains, Tenders
[Diagram labels: Data taking; Online calibration; Immediate Pass1 reco; CPass0/CPass1; Validation pass (10%) + QA; Full reconstruction pass; … Current implementation]

6 Status of the ALICE computing: Integrated raw data Run1
Data taking:
- 2010:
  - pp @ 0.9 – 7 TeV
  - PbPb @ 2.76 TeV (MB)
- 2011:
  - pp @ 2.76 – 7 TeV (MB & rare)
  - PbPb @ 2.76 TeV (MB & rare)
- 2012-2013:
  - pp @ 8 TeV (rare)
  - pPb @ 5.02 TeV (pilot, 2012)
  - pPb @ 5.02 TeV (MB & rare, 2013)
Total data volume:
- 7.3 PB raw data: 2 copies @ CERN (T0) + 1 replica @ 6 T1s; copies on tape @ T1s (“good” data only)
- 16 PB derived data

7 Status of the ALICE computing: ALICE Grid
Over 100 sites, 50k concurrent jobs running at any time, 22 PB of disk
- 10 in Asia: 8 operational, 2 future
- 2 in Africa: 1 operational, 1 future
- 53 in Europe
- 8 in North America: 4 operational, 4 future + 1 past
- 2 in South America: 1 operational, 1 future

8 Status of the ALICE computing: Federation view

9 Status of the ALICE computing: Grid running profile
Progressive reduction of user analysis, with increasing usage of “LEGO trains”, in the last 2 years

10 Status of the ALICE computing: Working on CPU efficiency
- Improved a lot in the last few years: from ~50-60% (2011) to the current average (unweighted) of ~85% at Tier-0/1 and ~80% at Tier-2
- Main actions:
  - modifications of the OCDB structure
  - improvement of raw processing
  - more efficient analysis trains (LEGO framework)
  - moving users from ESDs to AODs
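The slide quotes an "unweighted" average efficiency. As a minimal sketch of what that distinction means, assuming CPU efficiency is defined as CPU time divided by wall-clock time, the snippet below compares the unweighted mean over sites with the mean weighted by wall time. The site names and numbers are hypothetical and are not ALICE monitoring data.

```python
from dataclasses import dataclass


@dataclass
class SiteUsage:
    name: str
    cpu_hours: float   # CPU time actually consumed by jobs
    wall_hours: float  # wall-clock time the job slots were occupied


def efficiency(site: SiteUsage) -> float:
    """CPU efficiency of one site: CPU time / wall time."""
    return site.cpu_hours / site.wall_hours


def unweighted_average(sites: list[SiteUsage]) -> float:
    """Plain mean of per-site efficiencies; every site counts equally."""
    return sum(efficiency(s) for s in sites) / len(sites)


def weighted_average(sites: list[SiteUsage]) -> float:
    """Mean weighted by wall time; large sites dominate."""
    return sum(s.cpu_hours for s in sites) / sum(s.wall_hours for s in sites)


# Hypothetical numbers, for illustration only.
sites = [
    SiteUsage("site_a", cpu_hours=900.0, wall_hours=1000.0),  # large, 90%
    SiteUsage("site_b", cpu_hours=70.0, wall_hours=100.0),    # small, 70%
]

unweighted = unweighted_average(sites)  # (0.90 + 0.70) / 2
weighted = weighted_average(sites)      # 970 / 1100
```

With these made-up inputs the weighted figure is higher than the unweighted one because the larger site is also the more efficient one; the ordering can go either way in practice.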

11 Status of the ALICE computing: Computing sharing in 2013
- ALICE total: 50/50 (T0+T1s)/T2s, ~290M wall hours (260M in 2012)
- Italian contribution: 50/50 T1/T2s, ~43M wall hours (15% of the total)
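The shares quoted on this slide follow from simple ratios. The short check below uses only the rounded wall-hour figures from the slide; the variable names are illustrative.

```python
# Rounded figures quoted on the slide (wall hours).
ALICE_WALL_HOURS_2013 = 290e6
ALICE_WALL_HOURS_2012 = 260e6
ITALY_WALL_HOURS_2013 = 43e6

# Italian fraction of the 2013 ALICE total.
italian_share = ITALY_WALL_HOURS_2013 / ALICE_WALL_HOURS_2013

# Year-on-year growth of the ALICE total.
growth = ALICE_WALL_HOURS_2013 / ALICE_WALL_HOURS_2012 - 1

# The Italian contribution is quoted as split 50/50 between T1 and T2s.
italian_t1_hours = 0.5 * ITALY_WALL_HOURS_2013
italian_t2_hours = 0.5 * ITALY_WALL_HOURS_2013

print(f"Italian share of 2013 wall hours: {italian_share:.1%}")
print(f"Growth of ALICE wall hours 2012 -> 2013: {growth:.1%}")
```

The Italian share comes out at about 14.8%, consistent with the rounded 15% on the slide.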

12 Status of the ALICE computing: Italian contribution @ Tier-1
ALICE monitoring data compares well with that from CNAF
- Pledge 2013: 18600 HS06 (~1850 job slots)
- Av. running jobs: 2796 (~151% of the pledge, roughly)
- Pledge 2014: 20900 HS06
Source: L. Morganti, S. Taneja, CdG Tier-1 25.3.2014
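As a cross-check of the figures on this slide, the number of job slots that the 2013 pledge corresponds to can be derived from the average number of running jobs and the quoted ~151% usage. The sketch below assumes the percentage is simply running jobs divided by pledged slots; the variable names are illustrative.

```python
# Figures quoted on the slide.
PLEDGE_2013_HS06 = 18600
PLEDGE_2014_HS06 = 20900
AVG_RUNNING_JOBS = 2796
QUOTED_USAGE = 1.51  # "~151%, roughly"

# Job slots implied by the usage ratio (assumption: usage = jobs / slots).
implied_slots = AVG_RUNNING_JOBS / QUOTED_USAGE

# HS06 per job slot implied by the pledge.
implied_hs06_per_slot = PLEDGE_2013_HS06 / implied_slots

# Relative increase of the pledge from 2013 to 2014.
pledge_growth = PLEDGE_2014_HS06 / PLEDGE_2013_HS06 - 1

print(f"Implied pledged job slots: {implied_slots:.0f}")
print(f"Implied HS06 per slot: {implied_hs06_per_slot:.1f}")
print(f"Pledge growth 2013 -> 2014: {pledge_growth:.1%}")
```

The implied figure is roughly 1850 slots at about 10 HS06 per slot.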

13 Status of the ALICE computing: Italian contribution @ Tier-2’s
- ALICE INFN local centers:
  - 4 Tier-2’s: Bari, Catania, Padova-LNL and Torino
  - other centers: Cagliari (some pledge), Bologna and Trieste
  - available resources (end of 2014):

          Bari    Catania  LNL-Padova  Torino  Cagliari  Total
    HS06  8264    10757    8264        7805    1960      37050
    TB    812     683      717         814     70        3096

- All sites performing quite well:
  - very good local support (despite the shortage of manpower)
  - internal coordination/monitoring with monthly meetings
  - activity on the VAF (Torino and other Italian T2 sites)
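The resource figures on this slide, read as five sites plus a total, are self-consistent. The short check below recomputes the totals and each site's share; the dictionary layout is just one convenient way to hold the numbers.

```python
# Per-site resources at the end of 2014, as read from the slide:
# site -> (CPU in HS06, disk in TB)
resources = {
    "Bari": (8264, 812),
    "Catania": (10757, 683),
    "LNL-Padova": (8264, 717),
    "Torino": (7805, 814),
    "Cagliari": (1960, 70),
}

total_hs06 = sum(cpu for cpu, _ in resources.values())
total_tb = sum(disk for _, disk in resources.values())

# Fraction of the Italian Tier-2 CPU capacity provided by each site.
cpu_share = {site: cpu / total_hs06 for site, (cpu, _) in resources.items()}

print(f"Total CPU: {total_hs06} HS06, total disk: {total_tb} TB")
for site, share in cpu_share.items():
    print(f"{site}: {share:.1%} of CPU")
```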

14 Perspectives: Activity on Virtual AF
- Elastic expandable Virtual AF deployed in Torino:
  - fully in production for a couple of years
  - using up to 360 CPU cores, on average ~10% of the Torino Tier-2
  - main ingredients: PROOF, PoD, CERNVM, underlying Cloud infrastructure
  - converged into mainstream PROOF development @ CERN
  - fully documented in Dario’s PhD thesis: http://www.infn.it/thesis/index.php#8887 (see Dario’s talk tomorrow)

15 Perspectives: Activity on Virtual AF
- Elastic expandable Virtual AF deployed in Torino
- Some developments also in Trieste
- Propagating to other Italian sites (tests):
  - ongoing activity in BA, PD-LNL, TO, TS (PRIN STOA-LHC)
  - Catania also going to join
  - aim to develop parallel interactive Cloud-based AFs, ready to experiment with Cloud-oriented evolution of the ALICE CM
  - main issues under development:
    - Virtual SE for accessing data stored on local SE (no dedicated storage)
    - resource accounting
- ERC Consolidator Grant just submitted (S. Piano, TS): Interoperating Cloud-based Virtual Farms for ALICE

16 Perspectives: Preparing for Run2 (2015-2017)
- Changing data taking conditions:
  - targeting an integrated luminosity of 1 nb^-1 for PbPb collisions
  - 4-fold increase in instantaneous luminosity for PbPb collisions
  - additional detectors (TRD 60% → 100%, DCAL)
  - consolidation of TPC/TRD r/o electronics (r/o rate x2 wrt Run1)
  - increased capacity of HLT/DAQ systems (up to 8 GB/s to T0)
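To give a feeling for the scale of the 8 GB/s figure, the sketch below converts it into a daily volume. It assumes decimal units (1 TB = 1000 GB). The duty factor is a hypothetical parameter introduced only for illustration, since the slide gives a peak rate ("up to") and no sustained rate.

```python
MAX_RATE_GB_PER_S = 8.0   # peak HLT/DAQ rate to Tier-0 quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_gb_per_s: float, duty_factor: float = 1.0) -> float:
    """Data volume per day in TB for a given rate and duty factor.

    duty_factor is a hypothetical fraction of the day spent writing at
    the given rate (1.0 means continuously at peak).
    """
    return rate_gb_per_s * SECONDS_PER_DAY * duty_factor / 1000.0


peak_tb_per_day = daily_volume_tb(MAX_RATE_GB_PER_S)
half_duty_tb_per_day = daily_volume_tb(MAX_RATE_GB_PER_S, duty_factor=0.5)

print(f"Upper bound at sustained peak rate: {peak_tb_per_day:.1f} TB/day")
```

The upper bound at a sustained peak rate is about 691 TB per day; any realistic duty factor scales it down linearly.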

17 Perspectives: Preparing for Run2 (2015-2017)
- Changing data taking conditions
- (No) Evolution of the CM:
  - minor changes, basically the same CM as for Run1
  - use HLT for online Raw data compression (factor 4): already tested in Run1, implies reduction of tape storage @ Tier-0/1
  - resource estimate based on:
    - same CPU power needed for reconstruction
    - 25% increase in raw event size (additional detectors, higher multiplicity with increased energy and pile-up)
  - MC productions: 100% pp, pPb + 30-40% PbPb events

18 Perspectives: Preparing for Run2 (2015-2017)
- Changing data taking conditions
- (No) Evolution of the CM
- Evolution of the software framework (AliRoot 5.x):
  - calibration (will make use of HLT in Run2):
    - move one calibration iteration to online
  - reconstruction (use HLT track seeds for offline reconstruction):
    - include TRD information in the track fit
    - requires precise ITS/TPC/TRD alignment
    - improve double track resolution
    - improve matching efficiency (handling ITS and TPC standalone tracks)
  - simulation (start transition from G3 to G4):
    - improve G4 for ALICE, further develop fast and parametrized MC

19 Perspectives: Preparing for Run2 (2015-2017)
- Changing data taking conditions
- (No) Evolution of the CM
- Evolution of the software framework (AliRoot 5.x)
- Additional code development during LS-1 (but for Run3):
  - simulation/reconstruction for the upgrade
  - large involvement of the INFN groups for the new ITS

20 Perspectives: Towards Run3 (2019-2021)
- Detectors and running scenario:
  - ALICE upgrade aiming at a high-statistics sample (10 nb^-1)
  - continuous readout TPC, upgraded ITS
  - 50 kHz PbPb interaction rate (current rate x100)
  - ~1.1 TB/s detector readout
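The two rates on this slide can be combined into a rough average data size per interaction. This is a back-of-the-envelope sketch, assuming decimal units (1 TB = 10^12 bytes) and that the full readout bandwidth is spread evenly over the interactions. With continuous readout this is only an average, not an event size in the triggered sense.

```python
INTERACTION_RATE_HZ = 50e3     # 50 kHz PbPb interaction rate
READOUT_BYTES_PER_S = 1.1e12   # ~1.1 TB/s detector readout
RATE_INCREASE_FACTOR = 100     # "current rate x100"

# Average readout volume per interaction, in MB.
mb_per_interaction = READOUT_BYTES_PER_S / INTERACTION_RATE_HZ / 1e6

# Current interaction rate implied by the x100 statement.
implied_current_rate_hz = INTERACTION_RATE_HZ / RATE_INCREASE_FACTOR

print(f"Average readout per interaction: {mb_per_interaction:.0f} MB")
print(f"Implied current rate: {implied_current_rate_hz:.0f} Hz")
```

This gives about 22 MB per interaction and an implied current rate of about 500 Hz.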

21 Perspectives: Towards Run3 (2019-2021)
- Detectors and running scenario
- Reconstruction strategy:
  - data reduction by (partial) online reconstruction and compression; store only reconstruction results, discard raw data:
    - demonstrated with TPC cluster finder running on HLT since PbPb 2011
    - using data structures optimized for lossless compression
    - using algorithms designed to allow for subsequent offline reconstruction passes with improved calibrations
  - implies much tighter coupling between online and offline reconstruction software

22 Perspectives: Towards Run3 (2019-2021)
- Detectors and running scenario
- Reconstruction strategy:
  - data reduction by (partial) online reconstruction and compression; store only reconstruction results, discard raw data:
    - demonstrated with TPC cluster finder running on HLT since PbPb 2011
    - using data structures optimized for lossless compression
    - using algorithms designed to allow for subsequent offline reconstruction passes with improved calibrations
From Detector Readout to Analysis, from DAQ, HLT to Offline: together, one computing framework → O2 project

23 Perspectives: Towards Run3 (2019-2021)
- Detectors and running scenario
- Reconstruction strategy
- Simulation strategy:
  - migrate from G3 to G4:
    - expect to profit from future G4 developments
    - multithreaded G4, G4 on GPU …
  - be ready to use contributed resources:
    - supercomputers, volunteer computing resources
  - must work on (fast) parametrized simulation:
    - basic support exists in the current framework
  - must make more use of embedding, event mixing …

24 Perspectives: Towards Run3 (2019-2021)
- Detectors and running scenario
- Reconstruction strategy
- Simulation strategy
- O2 project and basics of the new CM:
  - project started in March 2013
  - reconstruction @ Tier-0 (online, using FPGA, GPU, MCCPU etc)
  - MC and analysis @ Tier-1/2’s (AF on Demand)
  - evolution of the framework AliRoot 5.x → 6.x:
    - work on AliRoot 6.x already started in collaboration with FAIR

25 Perspectives: Towards Run3 (2019-2021)
[Figure: software development timeline (Predrag Buncic). Time axis 2010 to 2024-26, covering Run 1, LS1, Run 2, LS2, Run 3, LS3, Run 4.]
- AliRoot 5.x: evolution of current framework; Root 5.x; improve the algorithms and procedures
- AliRoot 6.x: new modern framework; Root 6.x, C++11; optimized for I/O; FPGA, GPU, MIC…

26 Perspectives: Towards Run3 (2019-2021)
- Detectors and running scenario
- Reconstruction strategy
- Simulation strategy
- O2 project and basics of the new CM:
  - project started in March 2013
  - reconstruction @ Tier-0 (online, using FPGA, GPU, MCCPU etc)
  - MC and analysis @ Tier-1/2’s (AF on Demand)
  - evolution of the framework AliRoot 5.x → 6.x
  - adapting ALICE distributed computing to Cloud

27 Perspectives: Towards Run3 (2019-2021)

28 Perspectives: Towards Run3 (2019-2021)
In order to reduce complexity, national or regional T1/T2 centers could transform themselves into Cloud regions, providing IaaS and reliable data services with a very good network between the sites.

29 Perspectives: Towards Run3 (2019-2021)
- Detectors and running scenario
- Reconstruction strategy
- Simulation strategy
- O2 project and basics of the new CM:
  - project started in March 2013
  - reconstruction @ Tier-0 (online, using FPGA, GPU, MCCPU etc)
  - MC and analysis @ Tier-1/2’s (AF on Demand)
  - evolution of the framework AliRoot 5.x → 6.x
  - adapting ALICE distributed computing to Cloud
Computing Upgrade TDR expected by October 2014

30 Conclusions
- Status of the ALICE computing:
  - quite smooth and reasonably streamlined Grid operations
  - fraction of organized analysis increasing
  - very good performance of the Italian sites
- Perspectives:
  - finish re-processing of raw data and MC (next 6 months)
  - increasing activity on Virtual AF in Italy
  - preparing for Run2:
    - move calibration to online
    - improve and speed up offline reconstruction
  - designing the new CM for Run3:
    - raw reconstruction and compression online @ T0
    - MC and analysis activities @ T1/T2 (external clouds)

31 Backup slides

32 Status of the ALICE computing: Original Computing Model
- Similar to other experiments for pp collisions:
  - ~1.3 MB/event raw, ~0.2 MB derived data (ESD+AOD)
  - ~0.1 kHS06 s/event
  - online first calibration
  - quasi-online data distribution and first reconstruction at Tier-0
  - further reconstructions at the Tier-1’s
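The per-event figures for pp on this slide can be set against the PbPb figures from the main Original Computing Model slide (~12.5 MB raw, ~3 MB ESD+AOD, ~10 kHS06 s per event). The sketch below only takes ratios of the quoted, approximate numbers; the dictionary keys are illustrative names.

```python
# Approximate per-event figures quoted in the talk.
pp = {"raw_mb": 1.3, "derived_mb": 0.2, "cpu_khs06_s": 0.1}
pbpb = {"raw_mb": 12.5, "derived_mb": 3.0, "cpu_khs06_s": 10.0}

# How much heavier a PbPb event is than a pp event, per quantity.
ratios = {key: pbpb[key] / pp[key] for key in pp}

for key, ratio in ratios.items():
    print(f"PbPb / pp for {key}: {ratio:.1f}x")
```

A PbPb event is roughly 10 times larger in raw size, 15 times larger in derived data and 100 times more expensive in CPU, which is why the model for PbPb differs from the pp one.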

33 Status of the ALICE computing: Grid jobs, three main classes
- MC simulation & reco productions:
  - low I/O, high CPU efficiency
  - data export after job completion
  - managed, scheduled
- Analysis (LEGO) trains:
  - optimized I/O (read once, do many tasks)
  - streamlined code (as much as possible …)
  - managed by train operators (from PWGs)
- User jobs:
  - lowest CPU efficiency
  - variable job duration, many failures, less good code …
  - managed by the users
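The "read once, do many tasks" idea behind analysis trains can be sketched in a few lines. This is a toy illustration and not the LEGO framework or any AliRoot API: the `run_train` function, the "wagon" dictionary layout and the event fields are all hypothetical names chosen for the example.

```python
from typing import Any, Callable, Iterable

# A wagon is an (initial state, update function) pair; hypothetical layout.
Wagon = tuple[Any, Callable[[Any, dict], Any]]


def run_train(events: Iterable[dict], wagons: dict[str, Wagon]):
    """Read each event once and hand it to every analysis task (wagon)."""
    results = {name: init for name, (init, _) in wagons.items()}
    reads = 0
    for event in events:          # a single pass over the input data
        reads += 1
        for name, (_, update) in wagons.items():
            results[name] = update(results[name], event)
    return results, reads


# Toy input standing in for events read from storage.
events = [{"multiplicity": 3}, {"multiplicity": 5}, {"multiplicity": 10}]

wagons = {
    "n_events": (0, lambda acc, ev: acc + 1),
    "total_multiplicity": (0, lambda acc, ev: acc + ev["multiplicity"]),
}

results, train_reads = run_train(events, wagons)

# Running the same tasks as separate user jobs would read every event
# once per task.
separate_job_reads = len(events) * len(wagons)
```

In this toy case the train performs 3 event reads where independent jobs would perform 6; the saving grows with the number of tasks attached to the train.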

34 Status of the ALICE computing: CPU and storage, 2013 sharing
Extra-CERN resources:

35 Status of the ALICE computing: Italian contribution @ Tier-1
[Plots: DISK (TB), TAPE (TB)]
Low consumption of tape storage compared to requirements, due to the revised strategy of data preservation:
- ALICE analysis not using data on tape due to high latency of the tape system (ESD and AOD data reside on disk)
- obsolete data permanently deleted and not saved to tape
New requirements reflect this practice
Source: L. Morganti, S. Taneja, CdG Tier-1 25.3.2014

36 Status of the ALICE computing: Italian contribution @ Tier-2’s
Site availability from EGI monthly reports (Jan 2013 – Mar 2014): Italian Tier-2’s quite satisfactory, all above the average

    Site      Availability
    CNAF      99.0%
    Average   90.8%
    BA        95.1%
    CT        92.7%
    PD-LNL    98.8%
    TO        91.2%
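The statement that all Italian Tier-2 sites are above the average can be checked directly from the quoted percentages. The snippet below is a small sketch using only the numbers on this slide; it takes the "Average" entry at face value as the reference value.

```python
# Availability (%) from the slide, Jan 2013 to Mar 2014.
availability = {"CNAF": 99.0, "BA": 95.1, "CT": 92.7, "PD-LNL": 98.8, "TO": 91.2}
AVERAGE = 90.8  # reference value quoted on the slide

italian_tier2 = ["BA", "CT", "PD-LNL", "TO"]

# Margin of each Tier-2 over the quoted average, in percentage points.
margins = {site: round(availability[site] - AVERAGE, 1) for site in italian_tier2}

all_above_average = all(margin > 0 for margin in margins.values())
closest_to_average = min(margins, key=margins.get)

for site, margin in margins.items():
    print(f"{site}: +{margin} points over the average")
```

All four Tier-2 sites are above the reference, with margins ranging from 0.4 points (TO) to 8.0 points (PD-LNL).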

37 Perspectives: Activity on Virtual AF
- Elastic expandable Virtual AF deployed in Torino
- Some developments also in Trieste:
  - smaller static AF deployed in 2013
  - recently moved to a (Torino-like) VAF in a Cloud infrastructure

38 Perspectives: Preparing for Run2 (2015-2017)
- Changing data taking conditions
- (No) Evolution of the CM
- Evolution of the software framework (AliRoot 5.x)
- Additional software and process improvements:
  - start adapting ALICE distributed computing to Cloud, using the HLT farm for offline processing (additional 3% CPU resources)
  - improving performance of the organized analysis trains
  - speeding up and improving the efficiency of the analysis activity by active data management
  - collaborating with other experiments to explore contributed resources (i.e. spare CPU cycles and supercomputers)

39 Perspectives: Plan for the next six months
- Pass2/3 of 2011 pp data and associated MC:
  - full detector calibration, 2 years of software updates
- Pass2 of LHC12 pp, Pass3 of 2013 pPb data
- From August/September start cosmic trigger run:
  - upgraded detector readout, Trigger, DAQ, HLT
  - data will be reconstructed offline

40 Perspectives: Resource requirements Run2
- Request for 2015:
  - scrutinized and approved by CRSG in April 2014
  - CPU request growth compatible with “flat” budget
  - tape and disk resources increase after 2013/14 flat profile
  - major demand on resources towards end of 2015 (PbPb run)

