Slide 1: INFN Tier1 for the LHC Experiments: ALICE, ATLAS, CMS, LHCb
P. Capiluppi, 1-2 March 2006
Slide 2: LHC: role of a Tier1 (short reminder)
- Custodial storage of a fraction (1/#Tier1s) of Raw & Reconstructed (ESD, aka RECO, aka DST) data
- Full set of AODs
- Reprocessing of data (re-reconstruction)
- Skimming and selection of data (large analysis jobs by physics groups) {ALICE, ATLAS, CMS}
- User analyses {LHCb}
- Distribution of data to the Tier2s
- Many services needed for that (SLA): accounting (fair share of resources), permanent storage, data access, location and distribution, user access, job tracking and monitoring, 24x7 availability, etc.
Slide 3: The LHC (shared) Tier1: CNAF
Slide 4: Usage so far of the Tier1-CNAF by the Italian LHC community
- Simulation & reconstruction
- Data & Computing Challenges
- WLCG Service Challenges
- Analysis of simulated data
- Custodial storage of much of the produced data (simulated and real: test-beam data, cosmics, etc.)
- Testing of many "new" functionalities, both "Grid-like" and "experiment-specific"
- LHC is (still) not running, so the LHC experiments' activity comes in spikes; the role (and use) of the Tier1 is (still) not that foreseen in the Computing TDRs
Slide 5: Use of Tier1 (Grid only)
[Charts: total CPU time by experiment (ALICE, ATLAS, CMS, LHCb), Nov 2005; LHC jobs/day, Dec 2005 - Feb 2006]
Slide 6: Analysis: CMS-CRAB monitor
[Plots: submitted jobs, submission sites, destination sites of the jobs]
Slide 7: ALICE simulation jobs at the CNAF Tier1
[Plot: done jobs, Pb-Pb events, Jan 27 - Feb 10]
Slide 8: WLCG SC3 rates at CNAF
[Plot: SC3 transfer rates]
Slide 9: Services: really "a number" of…
CEs, SEs, RBs, UIs, VOMS, information systems, catalogs, etc. And: Mass Storage system, disk file systems, LAN configuration, database services, accounting, monitoring, compilers, libraries, experiments' software and libraries, shared caches, etc.
- Most of them are there, however:
- Integration of WLCG with the specific needs of the experiments might be a problem
- The INFN Tier1 is part of the WLCG, INFN-Grid and EGEE programs: mostly integrated, but…
- Castor(2) is quite new (and still evolving), yet it is a key element of the Tier1: support by and collaboration with CERN?
- File transfer from/to the Tier0, the Tier1s and the Tier2s is still largely untested (SC4)
- Publication of data (files, or better "datasets") needs strong interaction with the experiments for the implementation at the Tier1
Slide 10: Services: really "a number" of…
[Diagram: storage, storage, storage (and CPUs)]
And in addition:
- WLCG Service Challenge 4? "Production" vs "Challenge": duplication of effort?
- Partitioning the Tier1? For different scopes, different experiments, and also different needs within an experiment? Is it needed, desirable, possible? And if yes, how?
Slide 11: Supporting the Experiments
Single shared farm, same file system, experiment-dedicated Storage Elements, common services (WLCG). Good, but:
- Too many experiments? Is integration with WLCG enough? And the non-WLCG experiments?
  → Also: compatibility with the other experiments' Tier1s; problem solving and competition for resources
- Specific services needed by the experiments (maybe temporarily): how to manage them? Procedures?
- User support: accounts (on UIs), interactive access (development machines), dedicated queues for particular purposes, etc.
Slide 12: And … experiments supporting the Tier1
LHC experiment personnel are actively working with the Tier1 personnel on:
- Disk storage file-system performance testing {LHCb, CMS, ALICE, …} → GPFS, dCache, parallel file systems, …
- File transfer tools {CMS, ALICE, …} → FTS, …
- Catalogs {ALICE, ATLAS, CMS, …} → database services usage and implementation (Oracle, MySQL, Squid, Frontier, …)
- VO-Boxes {ALICE, ATLAS, CMS, …} → harmonization with the common services
- WLCG Service Challenges {CMS, ALICE, ATLAS, …} → service implementation and running
- WMS (Workload Management System) {ATLAS, ALICE, CMS, …} → feedback on performance and setup
Slide 13: WMS performance tests: ATLAS/LCG/EGEE Task Force
[Table: times per job (sec/job), split into submission, match-making and overall, for gLite and for LCG, measured for three job types:]
- Simple hello world: ~
- Simple hello world with CE requirement: ~
- ls with 48 KB input sandbox (partially shared): ~
Observations:
- gLite: the observable effect comes from the number of matched CEs; the input-sandbox effect on submission is still not fully understood from the data in the table
- LCG: match-making takes place right after submission; no observable effect from the number of matched CEs; submission of a job with an input sandbox is about 2 times slower than a simple hello-world job
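For context, the "simple hello world" test job used in these measurements would be described by a small JDL file. The sketch below is a hedged illustration, not the Task Force's actual test file: the file names are invented, and the commented CE requirement (corresponding to the second test row) uses a made-up CE identifier.

```
// Hypothetical minimal JDL for a "hello world" test job (illustrative only)
Executable    = "/bin/echo";
Arguments     = "hello world";
StdOutput     = "hello.out";
StdError      = "hello.err";
OutputSandbox = {"hello.out", "hello.err"};
// Variant with a CE requirement, as in the second test row
// (the GlueCEUniqueID value here is a placeholder, not a real endpoint):
// Requirements = other.GlueCEUniqueID == "ce.example.it:2119/jobmanager-lcgpbs-short";
```

Such a file would be submitted with `glite-wms-job-submit -a hello.jdl` on gLite or `edg-job-submit hello.jdl` on LCG; the table compares how long the submission and match-making phases take in the two middleware stacks.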
Slide 14: What we would like to have now … and is (still) missing
- Storage accounting and quotas, for use by the experiments (per run or dataset, not per file)
- Job priority (the Tier1 has to guarantee the agreed sharing)
- Catalogs: databases for data access, both common and experiment-specific
- Experiment support: too few people, less than ~1 person per (LHC) experiment
- Transparent access to heterogeneous hardware
- Link and coordination with the experiments is a KEY ISSUE:
  - Operations
  - Planning
  (both functions urgently needed)
  - Clear interfaces and contacts (for every issue)
  - Testing & integration of experiment-specific software → how, when and whether possible (decision-taking process)
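The "job priority" item above amounts to configuring fair-share targets in the site batch scheduler. As a hedged illustration only, the fragment below shows how agreed shares could be expressed in Maui scheduler syntax; the slides do not say which batch system or shares the Tier1 actually uses, and the group names and percentages here are invented.

```
# Hypothetical Maui fair-share stanza enforcing agreed experiment shares
# (all values illustrative, not from the slides)
FSPOLICY        DEDICATEDPS    # account fair share in dedicated processor-seconds
FSDEPTH         7              # number of past fair-share windows considered
FSINTERVAL      24:00:00       # length of each window
FSDECAY         0.80           # weight decay of older windows

GROUPCFG[alice] FSTARGET=20
GROUPCFG[atlas] FSTARGET=30
GROUPCFG[cms]   FSTARGET=30
GROUPCFG[lhcb]  FSTARGET=20
```

With such a configuration the scheduler boosts or lowers job priority so that each group's consumed CPU converges toward its target percentage, which is one way a Tier1 can "guarantee the agreed sharing" across experiments.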
Slide 15: Conclusions
- A Tier1 already working for the LHC experiments, with reasonable satisfaction
  - We got a lot of support, both common and specific, from very technically competent personnel
- However, we are very worried about:
  - The Tier1 is largely understaffed, with very few senior personnel for management → consequently, a lack of personnel dedicated to experiment support (although experiments' people from outside sites are already supporting the Tier1)
  - The organization of the link/interaction with the experiments must be improved, to guarantee the experiments' commitments and the running of the Tier1
- In addition, a few (many?) other areas need urgent investment:
  - Storage access and use
  - User interaction with the Centre
  - Procedures for interventions (emergency and routine)
  - Means for notification of events (of any kind, not only for Italy)
  - Last but not least: hardware procurement