ATLAS Data Challenges on EDG
Oxana Smirnova, LCG/ATLAS/Lund
4th NorduGrid Workshop, Uppsala, November 11, 2002

EU DataGrid project
Started on January 1, 2001; to deliver by the end of 2003
 Aim: to develop Grid middleware suitable for High Energy Physics, Earth Observation and biology applications
 Initial development based on existing tools, e.g. Globus, LCFG, GDMP etc.
The core testbed consists of the central site at CERN and a few facilities across Western Europe; many more sites are expected to join soon
 Italy and the UK come with several sites each; Spain, Germany and others join via CrossGrid
 ATLAS-affiliated sites: Canada, Taiwan etc.
By now the testbed has reached a level of stability sufficient to test submission of production-style tasks

EDG Testbed
EDG is committed to providing a stable testbed to be used by applications for real tasks
 This started to materialize in mid-August…
 …and coincided with the ATLAS DC1
 ATLAS asked for and was given first priority
Most sites are installed from scratch using the EDG tools (RedHat 6.2 based)
 NIKHEF: EDG installation and configuration only
 Lyon: installation on top of an existing farm
 A lightweight EDG installation is available
Central element: the Resource Broker (RB), which distributes jobs between the resources
 Currently, only one RB (at CERN) is available for applications
 In the future, there may be one RB per Virtual Organization (VO)

EDG functionality as of today
[Diagram, borrowed from Guido Negri's slides: job and data flow between the UI, Resource Broker, CE, Replica Catalog, SE and CASTOR (CERN nodes lxshare0393, lxshare033, testbed010, lxshare0399), showing JDL submission, RSL, LDAP, NFS, rfcp staging and GDMP/Replica Manager replication of input and output]

ATLAS-EDG Task Force
ATLAS is eager to use Grid tools for the Data Challenges
 ATLAS Data Challenges are already on the Grid (NorduGrid, VDT)
 The DC1/phase2 (starting now) is expected to make greater use of Grid tools
The ATLAS-EDG Task Force was put together in August with the aims:
 To assess the usability of the EDG testbed for the immediate production tasks
 To introduce Grid awareness to the ATLAS collaboration
The Task Force has representatives from both ATLAS and EDG: 40+ members (!) on the mailing list, ca. 10 of them working nearly full time
The initial task: to process 5 input partitions of Dataset 2000 on the EDG Testbed plus one non-EDG site (Karlsruhe); if this works, continue with other datasets

Execution of jobs
It was expected that we could make full use of the Resource Broker functionality:
 Data-driven job steering
 Best available resources otherwise
Input files are pre-staged once (copied from CASTOR and replicated elsewhere)
A job consists of the standard DC1 shell script, very much the way it is done on a conventional cluster
A Job Description Language (JDL) file is used to wrap up the job (see the sketch after this slide), specifying:
 The executable file (script)
 Input data
 Files to be retrieved manually by the user
 Optionally, other attributes (maxCPU, Rank etc.)
Storage and registration of output files is part of the job script, i.e., the application manages its output data the way it needs
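For illustration, a minimal sketch of such a JDL wrapper and its submission, assuming the EDG 1.x user interface commands (dg-job-submit etc.); the script name, partition number and logical file name below are hypothetical, and the attribute set is reduced to the essentials (executable, sandboxes, input data, optional rank):

  dc1.jdl:
    Executable    = "dc1.sim.sh";
    Arguments     = "0042";
    StdOutput     = "dc1.0042.log";
    StdError      = "dc1.0042.err";
    InputSandbox  = {"dc1.sim.sh"};
    OutputSandbox = {"dc1.0042.log", "dc1.0042.err"};
    InputData     = {"LF:dc1.002000.evgen.0042.zebra"};
    Rank          = other.FreeCPUs;

  # submit the wrapped job, check its status and retrieve the sandbox output
  dg-job-submit dc1.jdl
  dg-job-status <jobId>
  dg-job-get-output <jobId>

Storage and registration of the real output files would happen inside the DC1 script itself, as described above; only the log files come back through the output sandbox.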

Hurdles  EDG can not replicate files directly from CASTOR and can not register them in the Replica Catalog  Replication was done via CERN SE; EDG is working on a better (though temporary) solution. CASTOR team writes a GridFTP interface, which will help a lot.  Big file transfer interrupts after 21 minutes Also known Globus GridFTP server problem, temporary fixed by using multi-threaded GridFTP instead of EDG tools  Jobs were “lost” by the system after 20 minutes of execution Known problem of the Globus software (GASS Cache mechanism), temporary fixed on expense of frequent job submission  Static information system: if a site goes down, it should be removed manually from the index Attempts are under way to switch to the dynamic hierarchical MDS; not yet stable due to the Globus bugs

Other minor problems
Installation of ATLAS software:
 Cyclic dependencies
 External dependencies, especially on system software
Authentication & authorization, users and services:
 EDG cannot instantly accept a dozen new national Certificate Authorities
 The default proxy lives only 12 hours – users keep forgetting to request longer ones to accommodate long jobs (see the example after this slide)
Documentation:
 Is abundant but not very user-oriented
 Things are improving as more users arrive
Information system:
 Faulty information providers, affecting brokering
 Very difficult to browse/search and retrieve relevant information
Data management:
 Information about existing file collections is not easy to find
 Management of output data is mostly manual (cannot be done via JDL)
General instability of most EDG services
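For reference, a longer-lived proxy can be requested explicitly when logging in; a minimal sketch using the standard Globus 2 tools (the 48-hour value is an arbitrary illustration, not a recommendation from the Task Force):

  # request a proxy long enough to cover a multi-hour DC1 job plus time spent in the queue
  grid-proxy-init -hours 48
  grid-proxy-info -timeleft    # remaining proxy lifetime in seconds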

Achievements:
A team of hard-working people across Europe (the ATLAS VO is 45 members strong as of today)
ATLAS software (starting from release 3.2.1) is packed into relocatable RPMs, distributed and validated elsewhere
The DC1 production script is “gridified”, and a submission script is produced
A user-friendly testbed status monitor and an ATLAS VO information page are deployed
5 Dataset 2000 input files are replicated to 5 sites each
Two production-style tests completed:
 The first 100 partitions of Dataset 2000 are processed
 Other (smaller) datasets: 4 input files (ca. 400 MB each) replicated to 4 sites; 250 jobs submitted, adjusted to run ca. 4 hours each. The jobs were distributed across the whole testbed by the Resource Broker

Summary: success/failure rate per environment

Testbed 1.2.0
  Job execution: GASS Cache problems, 100% failure
  Data management: big file replication fails (GridFTP timeout); no CASTOR support

Testbed 1.2(.1) – only the CERN site is available, GASS Cache “unfixed”
  Job execution: half of the Dataset 2000 jobs are executed, 100% success
  Data management: not applicable (only one site is used)

Testbed – all the core sites have GASS Cache “unfixed”
  Job execution: 400 short jobs are executed across the testbed; the rest of the Dataset 2000 jobs proceeded with > 50% re-submission rate
  Data management: short files are replicated everywhere; longer files are copied manually (GridFTP not fixed)

Testbed 1.3, a.k.a. “The Showstopper” release
  Job execution: to be tested (GASS Cache is expected to be fixed)
  Data management: to be tested (GridFTP is expected to be fixed)

What next  Testbed 1.3 is available for testing (not on production site yet) from today  Precise quantification of failure/success rate using Dataset 2000 partitions to be done on Testbed 1.3  ATLAS DC1, pile-up: the runtime environment is ready, scripts are prepared oTestbed feature: the “old” runtime environment (3.2.1) has to be replaced with a new one (4.0.1)  CASTOR-EDG interface has to be tested; GridFTP server on CASTOR is expected to arrive soon  Some ATLAS production sites may join the EDG Testbed soon

Ingo Augustin Vandy Berten Jean-Jacques Blaising Frederic Brochu Stephen Burke Serban Constantinescu Francois Etienne Michael Gardner Luc Goossens Marcus Hardt Frank Harris Fabio Hernandez Bob Jones Roger Jones Christos Kanellopoulos Andrey Kiryanov Peter Kunszt Emanuele Leonardi Cal Loomis Fairouz Malek-Ohlsson Gonzalo Merino Armin Nairz Guido Negri Steve O'Neale Laura Perini Gilbert Poulard Alois Putzer Di Qing Mario Reale David Rebatto Zhongliang Ren Silvia Resconi Alessandro De Salvo Markus Schulz Massimo Sgaravatto Oxana Smirnova Chun Lik Tan Jeff Templon Stan Thompson Luca Vaccarossa Peter Watkins
No animals were harmed in the production tests. MMII