ARDA-ALICE activity in 2005 and tasks in 2006


ARDA-ALICE activity in 2005 and tasks in 2006
G. Shabratova, Joint Institute for Nuclear Research
CERN-Russia JWG, CERN, 6.03.06

Goals of ALICE-SC3
- Verification of the ALICE distributed computing infrastructure
- Production of meaningful physics: signals and statistics requested by the ALICE PWGs
- Not so much a test of the complete computing model as described in the TDR
- Functioning of the central AliEn services and their interaction with the services at the remote centres

ARDA+ALICE visits in 2005
GOAL: configuration and testing of the Russian sites for participation in the ALICE PDC'05 / LCG SC3
November-December 2005, less than 1 month (Mikalai Kutouski from JINR):
- Installation and testing of VO-boxes, interaction with LCG, installation of application software on the VO-boxes
- Connection to the central services, stability, job submission to the RB: with AliEn v2-5

ARDA+ALICE visits in 2005
What has been done:
- VO-boxes at the ITEP and JINR sites
- Installation of application software (AliEn, ROOT, AliRoot, Geant3)
- Test of Tier1<->Tier2 links: FZK (dCache) <-> ITEP (DPM), JINR (DPM)

Running job profile (plot): peak of 2450 concurrent jobs; negative slope: see Results (4)

Results
- 15 sites
- CPU utilization (80% T1 / 20% T2):
  T1s: CERN 8%, CCIN2P3 12%, CNAF 20%, GridKa 41%
  T2s: Bari 0.5%, GSI 2%, Houston 2%, Muenster 3.5%, NIHAM 1%, OSC 2.5%, Prague 4%, Torino 2%, ITEP 1%, SARA 0.1%, Clermont 0.5%
- Number of jobs: 98% of the target number
- Special thanks to K. Schwarz and the GridKa team for making 1200 CPUs available for the test
- Duration: 12 hours (1/2 of the target duration)
- Jobs done: 2500 (33% of the target number)
- Storage: 33% of the target
- Number of running jobs (2450): 25% more than the entire installed lxbatch capacity at CERN

Results (2)
VO-box behaviour:
- No problems with the running services, no interventions necessary
- Load profile on the VO-boxes: on average proportional to the number of jobs running on the site, nothing special
(load plots shown for CERN and GridKa)

Physics Data Challenge
- 12500 jobs; 100M SI2K hours; 2 TB output
- This represents 20% more computing power than the entire CERN batch capacity
- Share per site: FZK 43%, CNAF 23%, CCIN2P3 10%, CERN 8%, Muenster 5%, remaining centres 10%

Tasks in 2006
- February: data transfers T0 -> T1 (CCIN2P3, CNAF, GridKa, RAL)
- March: bulk production at T1/T2; data back to T0
- April: first push out of simulated data; reconstruction at the T1s
- May-June, July-August: reconstruction at CERN and at the remote centres
- September: scheduled + unscheduled (T2s?) analysis challenges
- Extensive testing on the PPS by all VOs
- Deployment of gLite 3.0 at the major sites for SC4 production

ARDA+ALICE visits in 2006
GOAL: participation in the preparation and operation of the distributed data production and user analysis in the ALICE PDC'06 / LCG SC4
15.05.06 - 31.93.06 => 1.53 months:
- Familiarization with the ALICE distributed computing environment, with the agents and services running at the computing centres
- Familiarization with the LCG VO-box environment
20.07.06 - 20.10.06 => 3.0 months:
- Responsible for the installation, debugging and operation of the analysis environment
- Coordination of on-site activities with the local system

Tier2 resources available in 2006 (columns: Site | CPU (MKSI2K) | Disk | Tape (TB) | BW to CERN/T1 (Gb/s); per-site values listed in that order):
USA: 513 (285%), 21 (54%)
FZU Prague: 60, 14
RDIG: 240 (48%), (6%)
French T2: 130 (146%), 28 (184%)
GSI: 100, 30
U. Muenster: 132, 10, 1
Polish T2*: 198, 7.1
Slovakia: 25, 5, 0.6
Total: 4232 (134%), 913 (105%), 689 (100%)

What we can do, assuming 85% CPU efficiency (columns: Number of events | Number of jobs | CPU work [CPU/days] | Duration [days] | Data [TB] | BW [MB/s]):
pp:    100 M | 2 M | 59,500  | 18 | 3 ESD + 28 RAW  | -
PbPb:  1 M   | 1 M | 172,000 | 51 | 2 ESD + 210 RAW | -
Total: -     | 3 M | 231,500 | 68 | 5 ESD + 238 RAW | 51
Total: -     | 3 M | 231,500 | 68 | 68              | 33

ARDA+ALICE in 2006 (Russia)
At this time we have VO-boxes and application software on the VO-boxes at:
ITEP (Moscow), IHEP (Protvino), INR (Troitsk), JINR (Dubna), KI (Moscow), SPbSU (St. Petersburg)
Under installation at PNPI (Gatchina) and SINP (Moscow)

Computing resources of ALICE in 2006

AliRoot: execution flow (diagram)
- AliSimulation: initialization -> event generation -> particle transport -> hits -> summable digits -> event merging -> digits / raw digits
- AliReconstruction: clusters -> tracking -> PID -> ESD -> analysis
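A minimal sketch of how this flow is driven in practice, assuming a standard AliRoot environment with a Config.C describing the detector setup (not taken from the presentation; file and macro names are the usual defaults):

  // run.C -- minimal AliRoot simulation + reconstruction macro (sketch)
  void run(Int_t nEvents = 10)
  {
     AliSimulation sim("Config.C");   // generation, transport, hits, (s)digitization
     sim.Run(nEvents);

     AliReconstruction rec;           // clusterization, tracking, PID -> AliESDs.root
     rec.Run();
  }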

Analysis
- Documentation pre-released: http://project-arda-dev.web.cern.ch/project-arda-dev/alice/apiservice/
- Services opened to expert users at the beginning of February
- Debugging progressing
- Good performance in terms of functionality, not fully stable yet, very good user support
- Services opened to more non-expert users at the end of last week

ALICE Analysis Basic Concepts
Analysis models:
- Prompt analysis at T0 using the PROOF (+ file catalogue) infrastructure
- Batch analysis using the GRID infrastructure
- Interactive analysis using the PROOF (+ GRID) infrastructure
User interface:
- ALICE users access any GRID infrastructure via the AliEn or ROOT/PROOF UIs
AliEn:
- Native and "GRID on a GRID" (LCG/EGEE, ARC, OSG); integrate common components (LFC, FTS, WMS, MonALISA ...) as much as possible
PROOF/ROOT:
- Single- and multi-tier, static and dynamic PROOF clusters
- GRID API: class TGrid (virtual) -> TAlien (real)
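As an illustration of the TGrid/TAlien entry point from a ROOT session (a sketch only; the catalogue path and file pattern below are made-up examples, not from the slides):

  // Connect to AliEn through the virtual TGrid interface and query the file catalogue
  TGrid::Connect("alien://");                       // loads the TAlien plugin, sets gGrid
  if (gGrid) {
     TGridResult *res = gGrid->Query("/alice/sim/2006", "AliESDs.root");
     res->Print();                                  // list the matching LFNs / TURLs
  }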

Distributed analysis (workflow diagram)
- User job (many events) -> File Catalogue query -> data set (ESDs, AODs)
- Job Optimizer: splits the job into sub-jobs 1..n, grouped by the SE location of the files
- Job Broker: submits each sub-job to the CE with the closest SE
- Processing at the CE/SE pairs produces output files 1..n
- A file merging job collects the sub-job outputs into the final job output
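The grouping step can be pictured with a small standalone C++ sketch (purely illustrative; the SE names, LFNs and the simple map-based grouping are inventions of this note, not the AliEn Job Optimizer code):

  #include <iostream>
  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  int main() {
     // Each input file is known to the catalogue together with the SE holding a replica.
     std::vector<std::pair<std::string, std::string>> files = {
        {"ALICE::CERN::Castor", "/alice/sim/run1/AliESDs.root"},
        {"ALICE::FZK::SE",      "/alice/sim/run2/AliESDs.root"},
        {"ALICE::CERN::Castor", "/alice/sim/run3/AliESDs.root"}};

     // Group the files by SE: one sub-job per group, brokered to a CE close to that SE.
     std::map<std::string, std::vector<std::string>> subjobs;
     for (const auto &f : files)
        subjobs[f.first].push_back(f.second);

     for (const auto &s : subjobs)
        std::cout << "sub-job near " << s.first << ": " << s.second.size() << " file(s)\n";
     return 0;
  }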

Batch Analysis: input
Input files:
- Downloaded into the local job sandbox: macros, configuration…
Input data:
- Created from catalogue queries
- Stored as ROOT objects (TChain, TDSet, TAlienCollection) in a registered GRID file
- Stored in XML file format in a registered GRID file
- Stored in a regular AliEn JDL on demand
Access:
- GRID jobs don't stage input data into the job sandbox (no download)
- GRID jobs access input data via the “xrootd” protocol using the TAlienFile class implementation in ROOT:
  TFile::Open("alien://alice/...../Kinematics.root");
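A possible ROOT-session sketch of this access pattern (assumed, not from the slides; the collection file name "wn.xml" and the selector name are placeholders):

  // Build a TChain of ESD trees from an AliEn XML collection and process it;
  // the files themselves are read remotely via xrootd, nothing is downloaded.
  TGrid::Connect("alien://");
  TGridCollection *coll = TAlienCollection::Open("wn.xml");
  TChain chain("esdTree");
  while (coll->Next())
     chain.Add(coll->GetTURL(""));        // xrootd TURL of the current entry
  chain.Process("EsdSelector.cxx+");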

ALICE Analysis - File Access from ROOT “all files accessible via LFNs”

ALICE AliEn Batch Analysis: scheduling
- After optimization, a Job Scheduler periodically assigns priorities to the jobs in the Task Queue (TQ)
- Scheduling defined per user, based on a reference and a maximum number of parallel jobs
- Avoids flooding of the TQ and of the resources by a single user submitting many jobs
- Dynamic configuration: users can be privileged or blocked in the TQ by a system administrator
- Reordering of the Task Queue (example): before the Job Scheduler all jobs sit at priority -1 (ID 1, ID 2, ID 3, ID 4); afterwards they are ordered ID 4 (priority 10), ID 2 (priority 5), ID 1 (priority 3), ID 3 (priority 1)
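The reordering in the example above can be illustrated with a tiny standalone C++ sketch (the struct and field names are invented here; this is not the AliEn Task Queue code):

  #include <algorithm>
  #include <cstdio>
  #include <vector>

  struct QueuedJob { int id; int priority; };

  int main() {
     // Priorities as assigned by the scheduler in the slide's example.
     std::vector<QueuedJob> tq = {{1, 3}, {2, 5}, {3, 1}, {4, 10}};

     // Higher priority first: the queue is then served in the order 4, 2, 1, 3.
     std::stable_sort(tq.begin(), tq.end(),
                      [](const QueuedJob &a, const QueuedJob &b) { return a.priority > b.priority; });

     for (const QueuedJob &j : tq)
        std::printf("ID %d  priority %d\n", j.id, j.priority);
     return 0;
  }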

Batch analysis via agents on heterogeneous GRIDs
Requirements to run AliEn as a “GRID on a GRID”:
- Provide a few (one) user logins per VO
- Install the agent software
- Start up agents via the queue/broker systems, or run them as permanent daemons
- Access to the local storage element: all data access from the application via xrootd
  - Run “xrootd” as a front-end daemon to any mass storage system, ideally via the SRM interface: read-write mode, strong authorization enforced through file catalogue tokens
  - Run “xrootd” with every JobAgent / WN as an analysis cache: read-only mode, strong authorization only for specific secure MSS paths => “public access SE”
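Seen from the application side, both deployment modes end up as plain xrootd reads in ROOT (a sketch; the host name and paths below are placeholders, not from the slides):

  // Catalogue-mediated access: the LFN is resolved and authorized through AliEn,
  // the actual I/O goes over xrootd via the TAlienFile plugin.
  TGrid::Connect("alien://");
  TFile *f1 = TFile::Open("alien://alice/sim/2006/run1/AliESDs.root");

  // Direct access to a "public access SE": the xrootd redirector is addressed explicitly.
  TFile *f2 = TFile::Open("root://xrootd-se.example.org//alice/cache/run1/AliESDs.root");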

ALICE batch analysis via agents in heterogeneous GRIDs (diagram)
- The input data set (/alice/file1.root … /alice/file7.root) is listed as InputData in the JDL
- The Job Optimizer splits it into several JDLs, each with its own InputData subset
- At sites A, B and C a Job Agent receives a sub-job together with a TAlienCollection XML file describing its input
- ROOT reads the files through xrootd, which fronts the local MSS at each site

Interactive Analysis Model: PROOF
Four different use cases to consider.
Local setups:
- Conventional single-tier PROOF cluster in a site for interactive analysis (data pre-staged on the cluster disks)
- Site autonomy, site policies apply
- Manual work for data deployment, but quite easy to do
Integration of single-tier PROOF clusters into AliEn:
- A permanent PROOF cluster (proofd + xrootd) is registered as a read-only storage element in AliEn, working on an MSS backend
- PROOF chains are queried from the AliEn File Catalogue
- Location of the data files in the xrootd cache using the xrootd redirector
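For the local-setup case, a minimal PROOF session could look like this in ROOT (a sketch under assumptions: the master URL, dataset paths and selector name are placeholders):

  // Open a PROOF session on a conventional single-tier cluster and process
  // a data set that has been pre-staged on the cluster disks.
  TProof *proof = TProof::Open("proofmaster.example.org");
  TDSet *set = new TDSet("TTree", "esdTree");
  set->Add("/pool/alice/run1/AliESDs.root");
  set->Add("/pool/alice/run2/AliESDs.root");
  proof->Process(set, "EsdSelector.cxx+");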

Interactive Analysis Model: PROOF
Multi-tier static setup:
- Permanent PROOF clusters are configured in a multi-tier structure
- A PROOF analysis chain is queried directly from the File Catalogue
- A chain is looked up by the sub-masters using the local xrootd redirectors
- During an analysis query the PROOF master assigns analysis packets to the sub-masters whose workers have the right (= local) data accessible
Multi-tier dynamic setup:
- Everything as in the multi-tier static setup, but proofd/xrootd are started up as jobs for a specific user in several tiers using the AliEn Task Queue
- …or… proofd/xrootd are started up as generic agents by a Job Agent; the assignment of a proofd to a specific user then has to be implemented in the PROOF master

PROOF@GRID: multi-tier hierarchical setup with an xrootd read-cache
Depending on the catalogue model, LFNs can either be resolved by the PROOF master using a centralized file catalogue, or only indexed by SE and resolved by the sub-masters using local file catalogues.
(Diagram: client -> PROOF master with storage index catalogue -> sub-masters with local file catalogues at Site 1 and Site 2, each running proofd and xrootd in front of the MSS.)

Services structure (diagram)
Central services: catalogue, task queue, job optimization, etc.; file registration and job submission.
Site components (shown for three sites in the diagram): AliEn CE/SE interfaced to the LCG UI, LCG RB, LCG CE and LCG SE/SRM.
- Phase 1: event production and storage at CERN
- Phase 2: test of the file transfer utilities (FTS)
- Phase 3: analysis, batch and interactive with PROOF

VO “Agents & Daemons” (courtesy of I. Bird, LCG GDB, May 2005)
- VO-specific services/agents appeared in the discussions of FTS, catalogues, etc.: all experiments need the ability to run “long-lived agents” on a site, at Tier 1 and at Tier 2
- How do they get machines for this, who runs them, can we make a generic service framework?

ALICE & LCG Service Challenge 3
AliEn and monitoring agents and services running on the VO node:
- Storage Element Service (SES): interface to the local storage (via SRM or directly)
- File Transfer Daemon (FTD): scheduled file transfer agent (possibly using the FTS implementation)
- xrootd: application file access
- Cluster Monitor (CM): local queue monitoring
- MonALISA: general monitoring agent
- PackMan (PM): application software distribution and management
- Computing Element (CE)

Issues (1)
VO-box support/operation:
- Experts needed for site installation, tuning and operation
- Patricia is validating the LCG installation on the VO-boxes
- Stefano and Pablo are installing and tuning the ALICE agents/services
- Mikalai is responsible for the Russian sites, Artem for France, Kilian for Germany
- An instruction manual on LCG-AliEn interoperability and site tuning (Stefano’s idea) will help a lot to speed up the process

PSS schedule for gLite 3.0 (timeline, February to June: certification -> PPS -> deployment in production)
- Tuesday 28/2/06: gLite 3.0β exits certification and enters the PPS
- Wednesday 15/3/06: gLite 3.0β available to users in the PPS
- Friday 28/4/06: gLite 3.0 exits the PPS and enters production
- Thursday 1/6/06: SC4 starts
During the deployment of gLite 3.0β in the PPS, patches for bugs are continually passed to the PPS.

In the case of ALICE
Some points to take into account:
- You require a UI inside the VO-box
- The UI configuration changes: it is a combination of the LCG and gLite UIs
- If you want to do something such as glite-job-submit from the VO-box, you have to include the new UI
- You require the FTS and LFC clients inside as well
- Next Monday there is a meeting here to clarify the status of these services with the experts and the sites; more details at the next TF meeting after that

FTD-FTS integration
It works! Several transfers done successfully.
(Diagram: FTD transfer back-ends gridftp, bbftp, …, fts; services involved: BDII, FTS endpoint, MyProxy.)

Current limitations
- Only for specified channels
- Only for SRM SEs; at the moment only CNAF and CERN
- A MyProxy password is required; at the moment, my certificate…
- GGUS response time: it takes quite long (more than 15 hours!) to assign the tickets to the responsible party; once assigned, tickets are solved pretty fast
- Endpoints have to be defined in the BDII