ARDA-ALICE activity in 2005 and tasks in 2006
CERN-Russian JWG, CERN, 6.03.06
G. Shabratova, Joint Institute for Nuclear Research

Goals of ALICE-SC3
- Verification of the ALICE distributed computing infrastructure
- Production of meaningful physics
  - Signals and statistics requested by the ALICE PWG
- Not so much a test of the complete computing model as described in the TDR
- Functioning of the central AliEn services and their interaction with the services at the remote centres

ARDA+ALICE visits in 2005
GOAL: Configuration and testing of Russian sites for participation in the ALICE PDC'05 / LCG SC3
November-December 2005, less than 1 month
- Installation and testing of VO-boxes, interaction with LCG, installation of application software on the VO-boxes
- Connection to central services, stability, job submission to the RB with AliEn v.2-5
Mikalai Kutouski from JINR

ARDA+ALICE visits in 2005
What has been done:
- VO-boxes at the ITEP and JINR sites
- Installation of application software (AliEn, ROOT, AliRoot, GEANT3)
- Test of Tier1<->Tier2 links: FZK (dCache), ITEP (DPM), JINR (DPM)

Running job profile
[Figure: running job profile, 2450 jobs]
Negative slope: see Results (4)

Results
- 15 sites
- CPU utilization (80% T1 / 20% T2)
  - T1s: CERN 8%, CCIN2P3 12%, CNAF 20%, GridKA 41%
  - T2s: Bari 0.5%, GSI 2%, Houston 2%, Muenster 3.5%, NIHAM 1%, OSC 2.5%, Prague 4%, Torino 2%, ITEP 1%, SARA 0.1%, Clermont 0.5%
- Number of jobs: 98% of target number
- Duration: 12 hours (1/2 of the target duration)
- Jobs done: 2500 (33% of target number)
- Storage: 33% of target
- Number of running jobs (2450): 25% more than the entire installed lxbatch capacity at CERN
- Special thanks to K. Schwarz and the GridKa team for making 1200 CPUs available for the test

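The quoted 80% / 20% split between T1 and T2 CPU utilization can be cross-checked against the per-site shares listed on this slide. The following is a minimal Python sketch of that arithmetic; the dictionary and variable names are illustrative, and only the percentages come from the slide.

```python
# Per-site CPU utilization shares in percent, as listed on the slide.
t1_shares = {"CERN": 8, "CCIN2P3": 12, "CNAF": 20, "GridKA": 41}
t2_shares = {
    "Bari": 0.5, "GSI": 2, "Houston": 2, "Muenster": 3.5, "NIHAM": 1,
    "OSC": 2.5, "Prague": 4, "Torino": 2, "ITEP": 1, "SARA": 0.1,
    "Clermont": 0.5,
}

# Sum each tier; rounding hides floating-point noise in the T2 sum.
t1_total = round(sum(t1_shares.values()), 1)
t2_total = round(sum(t2_shares.values()), 1)
n_sites = len(t1_shares) + len(t2_shares)

print(f"{n_sites} sites: T1 {t1_total}%, T2 {t2_total}%")
```

The sums come out at about 81% for the T1s and about 19% for the T2s, consistent with the rounded 80% / 20% figure, and the listed sites add up to the 15 quoted.
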
Results (2)
VO-box behaviour
- No problems with services running, no interventions necessary
- Load profile on VO-boxes: on average proportional to the number of jobs running on the site, nothing special
[Figures: VO-box load profiles at CERN and GridKA]

Physics Data Challenge
- 12500 jobs; 100M SI2K/hours; 2 TB output
- This represents 20% more computing power than the entire CERN batch capacity
- FZK 43%, CNAF 23%, CCIN2P3 10%, CERN 8%, Muenster 5%, remaining centres 10%

Tasks in 2006
- February: data transfers T0->T1 (CCIN2P3, CNAF, GridKa, RAL)
- March: bulk production at T1/T2; data back to T0
- April: first push out of simulated data; reconstruction at T1s
- May-June:
- July-August: reconstruction at CERN and remote centers
- September: scheduled + unscheduled (T2s?) analysis challenges
Extensive testing on PPS by all VOs
Deployment of gLite 3.0 at major sites for SC4 production

ARDA+ALICE visits in 2006
GOAL: Participation in the preparation and operation of the distributed data production and user analysis in the ALICE PDC'06 / LCG SC4
15.05.06 - 31.93.06 => 1.53 months
- Familiarization with the ALICE distributed computing environment and with the agents and services running at the computing centers
- Familiarization with the LCG VO-box environment
20.07.06 - 20.10.06 => 3.0 months
- Responsible for the installation, debugging and operation of the analysis environment
- Coordination of on-site activities with the local system

Tier2s resources available in 2006
Columns: Site; CPU (MKSI2K); Disk; Tape (TB); BW to CERN/T1 (Gb/s)
USA: 513 (285%); 21 (54%)
FZU Prague: 60; 14
RDIG: 240 (48%); (6%)
French T2: 130 (146%); 28 (184%)
GSI: 100; 30
U. Muenster: 132; 10; 1
Polish T2*: 198; 7.1
Slovakia: 25; 5; 0.6
Total: 4232 (134%); 913 (105%); 689 (100%)

What we can do
Assuming 85% CPU efficiency

        Number of events  Number of jobs  CPU work [CPU/days]  Duration [days]  Data [TB]       BW [MB/s]
pp      100 M             2 M             59,500               18               28 RAW, 3 ESD
PbPb    1 M               1 M             172,000              51               210 RAW, 2 ESD
Total                     3 M             231,500              68               238 RAW, 5 ESD  51
Total                     3 M             231,500              68                               33

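One way to read these numbers is to divide the CPU work by the duration, which gives the average number of CPUs kept busy during each period. The Python sketch below does that arithmetic, reading the rows as pp: 59,500 CPU days over 18 days, PbPb: 172,000 over 51 days, Total: 231,500 over 68 days. Treating the 85% efficiency as the ratio of busy to installed CPUs is an assumption made here for illustration, not something the slide states, and the function name is hypothetical.

```python
def busy_cpus(cpu_days: float, duration_days: float) -> float:
    """Average number of CPUs busy over the period."""
    return cpu_days / duration_days

CPU_EFFICIENCY = 0.85  # value from the slide; how it enters is an assumption

# (CPU work in CPU days, duration in days), as read from the table.
rows = {
    "pp": (59_500, 18),
    "PbPb": (172_000, 51),
    "Total": (231_500, 68),
}

summary = {}
for name, (cpu_days, days) in rows.items():
    busy = busy_cpus(cpu_days, days)
    # Assumed relation: busy CPUs = efficiency * installed CPUs.
    installed = busy / CPU_EFFICIENCY
    summary[name] = (round(busy), round(installed))
    print(f"{name}: ~{busy:.0f} busy CPUs, ~{installed:.0f} installed")
```

Under these assumptions all three rows correspond to roughly 3300 to 3400 busy CPUs, which suggests the durations were derived from a fixed pool of available CPUs.
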
ARDA+ALICE in 2006 (Russia)
At this time we have:
[Table columns: site; VO boxes; Appl. software at VO boxes]
- ITEP (Moscow) +
- IHEP (Protvino)
- INR (Troitsk)
- JINR (Dubna) +
- KI (Moscow)
- SPtSU (St. Petersburg)
Under installation: PNPI (Gatchina), SINP (Moscow)

Computing resources ALICE in 2006
AliRoot: Execution Flow
[Diagram]
- AliSimulation: Initialization, Event Generation, Particle Transport, Hits, Summable Digits, Event merging, Digits / Raw digits
- AliReconstruction: Clusters, Tracking, PID, ESD
- Analysis

Analysis
- Documentation pre-released: http://project-arda-dev.web.cern.ch/project-arda-dev/alice/apiservice/
- Services opened to expert users at the beginning of February
- Debugging progressing
- Good performance in terms of functionality, not fully stable yet, very good user support
- Services opened to more non-expert users at the end of last week

ALICE Analysis Basic Concepts
Analysis Models
- Prompt analysis at T0 using PROOF (+ file catalogue) infrastructure
- Batch analysis using GRID infrastructure
- Interactive analysis using PROOF (+ GRID) infrastructure
User Interface
- ALICE users access any GRID infrastructure via the AliEn or ROOT/PROOF UIs
AliEn
- Native and "GRID on a GRID" (LCG/EGEE, ARC, OSG)
- Integrate as many common components as possible: LFC, FTS, WMS, MonALISA ...
PROOF/ROOT
- Single- and multi-tier static and dynamic PROOF clusters
- GRID API: class TGrid (virtual) -> TAlien (real)

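The "virtual -> real" GRID API pattern mentioned on this slide is the usual abstract-interface design: user code talks to an abstract class, and a backend-specific subclass does the work. The Python sketch below only illustrates that pattern. The class names, the `query` method and the file names are hypothetical and are not the ROOT API.

```python
import fnmatch
from abc import ABC, abstractmethod

class Grid(ABC):
    """Abstract GRID interface, playing the role of the virtual class."""

    @abstractmethod
    def query(self, path: str, pattern: str) -> list:
        """Return the logical file names under `path` matching `pattern`."""

class ToyAlienGrid(Grid):
    """Hypothetical concrete backend using an in-memory catalogue."""

    def __init__(self, catalogue):
        self._catalogue = list(catalogue)

    def query(self, path, pattern):
        # Match the pattern against the file name part of each LFN.
        return [lfn for lfn in self._catalogue
                if lfn.startswith(path)
                and fnmatch.fnmatch(lfn.rsplit("/", 1)[-1], pattern)]

# User code depends only on the abstract interface.
grid: Grid = ToyAlienGrid([
    "/alice/sim/run1/galice.root",
    "/alice/sim/run1/Kinematics.root",
    "/alice/sim/run2/galice.root",
])
hits = grid.query("/alice/sim/", "galice.root")
print(hits)
```

Because the analysis code only sees the abstract interface, another backend can be substituted without changing user macros.
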
Distributed analysis
[Diagram, flow from top to bottom]
1. File Catalogue query -> data set (ESDs, AODs)
2. User job (many events)
3. Job Optimizer: sub-job 1, sub-job 2, ... sub-job n, grouped by SE location of the files
4. Job Broker: submit each sub-job to the CE with the closest SE
5. Processing at each CE and SE -> output file 1, output file 2, ... output file n
6. File merging job -> job output

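The split, process and merge flow on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the AliEn Job Optimizer: the function names, the SE names and the mapping of files to SEs are all hypothetical, and the "analysis" is a stand-in that just counts files.

```python
from collections import defaultdict

def split_by_se(file_locations):
    """Group input files by storage element: one sub-job per SE."""
    groups = defaultdict(list)
    for lfn, se in file_locations.items():
        groups[se].append(lfn)
    return [{"se": se, "input_data": sorted(lfns)}
            for se, lfns in sorted(groups.items())]

def process(sub_job):
    """Stand-in for the user analysis: report how many files were read."""
    return {"se": sub_job["se"], "files_read": len(sub_job["input_data"])}

def merge(outputs):
    """Stand-in for the file merging job: add up the partial results."""
    return sum(out["files_read"] for out in outputs)

# Hypothetical catalogue query result: LFN -> SE holding the file.
file_locations = {
    "/alice/file1.root": "SE_A", "/alice/file2.root": "SE_A",
    "/alice/file3.root": "SE_B", "/alice/file4.root": "SE_B",
    "/alice/file5.root": "SE_B", "/alice/file6.root": "SE_C",
    "/alice/file7.root": "SE_C",
}

sub_jobs = split_by_se(file_locations)
outputs = [process(job) for job in sub_jobs]
total_files = merge(outputs)
print(len(sub_jobs), "sub-jobs,", total_files, "files processed")
```

Grouping by SE before submission is what lets the broker send each sub-job to a CE close to its data.
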
Batch Analysis: input
Input Files
- Downloaded into the local job sandbox: macros, configuration…
Input Data
- Created from Catalogue queries
- Stored as ROOT objects (TChain, TDSet, TAlienCollection) in a registered GRID file
- Stored in XML file format in a registered GRID file
- Stored in a regular AliEn JDL on demand
- GRID jobs don't stage Input Data into the job sandbox (no download)
- GRID jobs access Input Data via the "xrootd" protocol using the TAlienFile class implementation in ROOT:
  TFile::Open("alien://alice/...../Kinematics.root");

ALICE Analysis - File Access from ROOT “all files accessible via LFNs”
ALICE AliEn Batch Analysis: Scheduling
- After optimization, a Job Scheduler periodically assigns priorities to jobs in the Task Queue (TQ)
- Scheduling is defined per user, based on a reference and a maximum number of parallel jobs
- Avoids flooding of the TQ and resources by a single user submitting many jobs
- Dynamic configuration: users can be privileged or blocked in the TQ by a system administrator
Reordering of the Task Queue by the Job Scheduler
  Before: ID 1 (priority -1), ID 2 (priority -1), ID 3 (priority -1), ID 4 (priority -1)
  After:  ID 4 (priority 10), ID 2 (priority 5), ID 1 (priority 3), ID 3 (priority 1)

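The reordering example on this slide amounts to assigning a priority to each queued job and then sorting the queue by descending priority. The Python sketch below reproduces only that step. How the real Job Scheduler derives the priorities is not described on the slide, so the values are simply taken from the example, and the function and field names are hypothetical.

```python
# Task queue before scheduling: every job starts at priority -1.
task_queue = [
    {"id": 1, "priority": -1},
    {"id": 2, "priority": -1},
    {"id": 3, "priority": -1},
    {"id": 4, "priority": -1},
]

# Priorities assigned by the scheduler, copied from the slide's example.
new_priorities = {1: 3, 2: 5, 3: 1, 4: 10}

def reorder(queue, priorities):
    """Apply the new priorities and sort the queue, highest priority first."""
    updated = [{**job, "priority": priorities.get(job["id"], job["priority"])}
               for job in queue]
    return sorted(updated, key=lambda job: job["priority"], reverse=True)

reordered = reorder(task_queue, new_priorities)
order = [job["id"] for job in reordered]
print(order)
```

The resulting order of job IDs is 4, 2, 1, 3, matching the reordered queue shown on the slide.
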
Batch Analysis via Agents on heterogeneous GRIDs
Requirements to run AliEn as a "GRID on a GRID"
- Provide few (one) user logins per VO
- Install the agent software
- Start up agents via queue/broker systems or run them as permanent daemons
- Access the local storage element
  - all data access from the application goes via xrootd
  - run "xrootd" as a front-end daemon to any mass storage system
    - ideally via the SRM interface
    - read-write mode
    - enforce strong authorization through file catalogue tokens
  - run "xrootd" with every JobAgent / WN as an analysis cache
    - read-only mode
    - strong authorization only for specific secure MSS paths => "public access SE"

ALICE Batch Analysis via agents in heterogeneous GRIDs
[Diagram: the input files /alice/file1.root ... /alice/file7.root go through the Job Optimizer (splitting), which produces sub-jobs, each with its own JDL:InputData list; at Site A, Site B and Site C a Job Agent provides the list as a TAlienCollection XML file to ROOT, which reads the data through xrootd from the MSS]

Interactive Analysis Model: PROOF
Four different use cases to consider
Local Setups
- Conventional single-tier PROOF cluster at sites for interactive analysis (data pre-staged on the cluster disks)
  - site autonomy
  - site policies apply
  - manual work for data deployment, but quite easy to do
- Integrate single-tier PROOF clusters into AliEn
  - a permanent PROOF cluster (proofd + xrootd) is registered as a read-only storage element in AliEn, working on an MSS backend
  - PROOF chains are queried from the AliEn File Catalogue
  - data files are located in the xrootd cache using the xrootd redirector

Interactive Analysis Model: PROOF
Multi-tier Static Setup
- Permanent PROOF clusters are configured in a multi-tier structure
- A PROOF analysis chain is queried directly from the File Catalogue
- A chain is looked up by sub-masters using the local xrootd redirectors
- During an analysis query the PROOF master assigns analysis packets to the sub-masters, so that workers have the right (= local) data accessible
Multi-tier Dynamic Setup
- Everything as in the multi-tier static setup, but
  - proofd/xrootd are started up as jobs for a specific user in several tiers using the AliEn Task Queue
  - …or… proofd/xrootd are started up as generic agents by a Job Agent; the assignment of proofd to a specific user then has to be implemented in the PROOF master

PROOF@GRID: Multi-tier Hierarchical Setup with xrootd read-cache
Depending on the Catalogue model, LFNs can either be resolved by the PROOF master using a centralized file catalogue, or be only SE-indexed and resolved by the sub-masters using local file catalogues.
[Diagram labels: Client; PROOF Master; Storage Index Catalogue; Site 1, Site 2; Submaster; Local File Catalogue; proofd; xrootd; MSS]

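The two catalogue models described on this slide differ in where an LFN is turned into a physical location. The Python sketch below contrasts them with plain dictionaries. All names, site labels and URLs are hypothetical placeholders, and the real catalogues are services rather than in-memory tables.

```python
# Model 1: a centralized file catalogue maps each LFN to (site, physical name).
central_catalogue = {
    "/alice/file1.root": ("Site 1", "root://site1.example//data/file1.root"),
    "/alice/file2.root": ("Site 2", "root://site2.example//data/file2.root"),
}

# Model 2: the master only knows which site (SE) holds an LFN ...
storage_index = {
    "/alice/file1.root": "Site 1",
    "/alice/file2.root": "Site 2",
}
# ... and each sub-master resolves the physical name in its local catalogue.
local_catalogues = {
    "Site 1": {"/alice/file1.root": "root://site1.example//data/file1.root"},
    "Site 2": {"/alice/file2.root": "root://site2.example//data/file2.root"},
}

def resolve_central(lfn):
    """Master-side resolution against the centralized catalogue."""
    return central_catalogue[lfn]

def resolve_indexed(lfn):
    """Master picks the site from the index; the site resolves the rest."""
    site = storage_index[lfn]
    return site, local_catalogues[site][lfn]

lfn = "/alice/file2.root"
print(resolve_central(lfn))
print(resolve_indexed(lfn))
```

Both paths end at the same physical file; the second keeps site-specific details out of the central service, at the cost of an extra lookup at the sub-master.
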
Services structure
- Phase 1: Event production and storage at CERN
- Phase 2: Test of file transfer utilities (FTS)
- Phase 3: Analysis, batch and interactive with PROOF
Central services: Catalogue, Task queue, Job optimization, etc.
[Diagram labels: File registration; Job submission; AliEn CE/SE; LCG UI; LCG RB; LCG CE; LCG SE/SRM]

Courtesy of I. Bird, LCG GDB, May 2005
VO "Agents & Daemons"
- VO-specific services/agents
  - Appeared in the discussions of FTS, catalogs, etc.
- All experiments need the ability to run "long-lived agents" on a site
  - At Tier 1 and at Tier 2
- How do they get machines for this, who runs it, can we make a generic service framework?

ALICE & LCG Service Challenge 3
AliEn and monitoring agents and services running on the VO node:
- Storage Element Service (SES): interface to local storage (via SRM or directly)
- File Transfer Daemon (FTD): scheduled file transfers agent (possibly using FTS implementation)
- xrootd: application file access
- Cluster Monitor (CM): local queue monitoring
- MonALISA: general monitoring agent
- PackMan (PM): application software distribution and management
- Computing Element (CE)

Issues (1)
VO-Box support/operation:
- Experts needed for site installation, tuning and operation
- Patricia is validating the LCG installation on VO-boxes
- Stefano and Pablo are installing and tuning ALICE agents/services
- Mikalai is responsible for the Russian sites, Artem for France, Kilian for Germany
- An instruction manual on LCG-AliEn interoperability and site tuning (Stefano's idea) will help a lot to speed up the process

PPS Schedule for gLite 3.0
[Timeline: February, March, April, May, June; phases: Certification, PPS, Deploy in production, PRODUCTION!!; marker: YOU ARE HERE]
- Tuesday 28/2/06: gLite 3.0β exits certification and enters PPS
- Wednesday 15/3/06: gLite 3.0β available to users in the PPS
- Friday 28/4/06: gLite 3.0 exits PPS and enters production
- Thursday 1/6/06: SC4 starts!!
Deployment of gLite 3.0β in PPS. Patches for bugs are continually passed to PPS.

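The milestone dates on this slide can be checked for internal consistency (weekday names and spacing) with a few lines of Python. This sketch uses only the dates quoted on the slide; the variable names and shortened milestone labels are illustrative.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Milestones as quoted on the slide (day/month/2006).
milestones = [
    (date(2006, 2, 28), "gLite 3.0 beta exits certification, enters PPS"),
    (date(2006, 3, 15), "gLite 3.0 beta available to users in the PPS"),
    (date(2006, 4, 28), "gLite 3.0 exits PPS, enters production"),
    (date(2006, 6, 1), "SC4 starts"),
]

# date.weekday() returns 0 for Monday, so it indexes DAY_NAMES directly.
weekdays = [DAY_NAMES[day.weekday()] for day, _ in milestones]

# Number of days between consecutive milestones.
gaps = [(later[0] - earlier[0]).days
        for earlier, later in zip(milestones, milestones[1:])]

for (day, label), name in zip(milestones, weekdays):
    print(f"{name} {day.isoformat()}: {label}")
print("days between milestones:", gaps)
```

The computed weekdays agree with those on the slide, and the gaps show about five weeks between gLite 3.0 entering production and the start of SC4.
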
In the case of ALICE
Take some points into account:
- You require a UI inside the VO-box
  - The UI configuration changes: it is a combination of the LCG and gLite UIs
  - If you want to run something like glite-job-submit from the VO-box, you have to include the new UI
- You require the FTS and LFC clients inside
- Next Monday there is a meeting to clarify the status of these services with the experts and the sites
- More details at the next TF meeting after that meeting

FTD-FTS integration
It works!! Several transfers done successfully.
[Diagram labels: gridftp, bbftp, …, fts; BDII; FTS Endpoint; MyProxy]

Current limitations
- Only for specified channels
- Only for SRM SEs
  - At the moment, only CNAF and CERN
- Requires a MyProxy password
  - At the moment, my certificate…
- GGUS response time
  - It takes quite long (more than 15 hours!!) to assign tickets to the responsible person
  - Once assigned, tickets are solved pretty fast
- Endpoints have to be defined in the BDII