Testing the HEPCAL use cases
J.J. Blaising, F. Harris, Andrea Sciabà
GAG Meeting, April 17, 2003


Introduction
- The LHC experiments have defined their common use cases for production in HEPCAL
- It is desirable to have a tool to test the available implementations of these use cases
- For example, this tool could simulate a Monte Carlo production
- Eventually, this tool should try to implement as many HEPCAL use cases as possible

D8.3 analysis of 43 HEPCAL use cases for EDG
Classification: 'not implemented' (No) / 'partially implemented' (Partially) / 'mostly implemented' (Almost) / 'implemented' (Fully)

Category                                 Use cases  Fully  Almost  Partially  No
General                                      4        2      1        1       0
Data Management                             19        4      3        3       9
Job Management                              15        4      3        3       5
VO and software environment Management       5        0      1        1       3

- General: authorisation, login, browse resources
- Data Management: metadata and data operations; the 9 unimplemented cases include metadata, virtual data…
- Job Management: submission, control, monitoring, error, resource estimation, job splitting…; the 5 unimplemented cases include job catalogues, job splitting
- VO and software environment Management: resource reservation, user rights, conditions publishing, software publishing…; the 3 unimplemented cases include VO resource handling, condition publishing
- Global fraction of use cases at least partially implemented: 60% (26 of 43)
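The 60% figure can be reproduced from the per-category counts on this slide. The short Python sketch below is only an arithmetic check, not part of the test suite (which is written in Perl); categories for which the slide lists no "Fully" or "No" entry are assumed to have a count of 0, which is consistent with the stated totals.

```python
# Counts per category as read from the slide: (fully, almost, partially, no).
# Entries missing on the slide are assumed to be 0; this matches the totals.
counts = {
    "General": (2, 1, 1, 0),
    "Data Management": (4, 3, 3, 9),
    "Job Management": (4, 3, 3, 5),
    "VO and software environment Management": (0, 1, 1, 3),
}

total = sum(sum(row) for row in counts.values())
not_implemented = sum(row[3] for row in counts.values())
at_least_partial = total - not_implemented
fraction = at_least_partial / total

print(f"{at_least_partial} of {total} use cases = {fraction:.1%}")
# prints: 26 of 43 use cases = 60.5%
```

The quoted "60%" therefore counts every use case classified as fully, almost or partially implemented.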

Use cases not satisfied by EDG 1.4
Entries read "use case: EDG 2.0 / EDG 2.1"; a single entry is given where the slide shows only one value.
- DS Metadata Update: LFN attributes (strings) / general attributes
- DS Metadata Access: LFN attributes (strings) / general attributes
- Virtual Dataset Declaration: out of scope
- Virtual Dataset Materialisation: out of scope
- User-Defined Catalogue Creation: no / unknown
- Dataset Access Cost Evaluation: get_accessCost()
- Data Retrieval from Remote Dataset: no / maybe
- Dataset Verification: no / unsure
- Browse Experiment Database: out of scope
- Job Catalogue Update: user defined attributes stored in L&B
- Job Catalogue Query: user defined attributes stored in L&B
- Error Recovery for Failed Production Jobs: checkpointing
- Job Splitting: no / DAG support
- Analysis I: out of scope
- Conditions Publishing: no
- VO Wide Resource Reservation: no / maybe
- VO Wide Resource Allocation to Users: no / maybe
(Thanks to E. Laure)

This test suite
- Main purpose: to simulate a Monte Carlo production on an EDG-based Grid environment
- Inspired by the test suite written by J. J. Blaising
- Implemented in Perl
  - requires Getopt::Long, Pod::Usage, Net::LDAP, Term::ReadKey
- Tested on the EDG application testbed and on LCG-0
- Supports both LDAP RC and RLS
- Available from the EDG CVS repository

Functionalities
- Job submission
  - arbitrary job duration
  - arbitrary output data size
  - match making with input data (for two-stage productions)
- Data management
  - output retrieval with sandbox
  - supported protocols: GridFTP
  - copy and registration of input and output files
  - replication and deletion of files

Covered use cases

Covered:
- Grid login
- Dataset registration
- Dataset upload
- Dataset access
- Dataset replication (not tested)
- Dataset deletion
- Dataset browsing
- Job submission
- Job output access or retrieval
- Steer job submission (not tested)
- Production job
- Job monitoring
- Simulation job

The covered list amounts to 30% of the HEPCAL use cases (13 of 43) and to 50% of the HEPCAL use cases that are at least partially implemented (13 of 26).

Not covered:
- Obtain Grid authorization
- Ask for revocation of Grid authorization
- Browse Grid resources
- Dataset transfer to non-Grid storage
- Dataset replica upload to the Grid
- Physical dataset instance deletion
- Catalogue deletion
- Job control
- Job resource estimation
- Job environment modification
- Data transformation
- Software publishing
- Experiment software development for the Grid

Categories of not-covered use cases: Not scriptable, “Trivial”, Barely implemented

Commands
- Test.exe
  - represents the event generator
  - performs an amount of useless computation proportional to the number of events and creates a file of the requested size in bytes
- prod_submit
  - submits a bunch of jobs following the directives in the card file
  - creates a *.db file which will contain all the information about the jobs in the “production”
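The role of Test.exe, a stand-in for an event generator that only burns CPU time and produces a file of a chosen size, can be illustrated with a minimal Python sketch. This is not the actual Test.exe: the function name `generate`, the constant `WORK_PER_EVENT` and the file name are hypothetical, and the real program's arguments are not given on the slide.

```python
import math
import os
import tempfile

WORK_PER_EVENT = 1000  # hypothetical number of loop iterations per event


def generate(events, size_bytes, path):
    """Burn CPU time proportional to `events`, then write `size_bytes` bytes to `path`."""
    acc = 0.0
    for i in range(events * WORK_PER_EVENT):
        acc += math.sqrt(i)  # the result is discarded; only the CPU time matters
    with open(path, "wb") as out:
        out.write(b"\0" * size_bytes)
    return acc


# Example: 50 "events" and a 4096-byte output file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "run0001.dat")
    generate(events=50, size_bytes=4096, path=target)
    written = os.path.getsize(target)
print(written)  # prints: 4096
```

Decoupling CPU time from output size in this way lets a test production vary job duration and data volume independently.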

Other commands
- prod_status
  - updates the database with the most recent information about the jobs and prints the status and the CE of each job
- prod_getout
  - retrieves the output of finished jobs
- prod_summary
  - prints a summary of the production
- prod_delete
  - deletes from the SE and from the replica catalog the files of the production

Card file parameters

Variable      Definition
OUTPUT_DIR    directory where to store the outputs
DATASET       dataset name
INIT_RUN      first run number
RUNS          number of runs (= jobs)
EVENTS        number of events (proportional to CPU time)
DATA_NAME     output file name (the run no. is appended)
INPUT_DATA    LFN of the input file (the run no. is appended)
DATA_SIZE     size of the output file
OUT_SANDBOX   YES if the output file must be returned in the sandbox
COPY2SE       YES if the output file must be copied to a SE
OUTPUT_SE     CLOSE for the close SE, otherwise the SE host name
LFN_DIR       “directory” where to store the file (part of the LFN)
REPLICATION   YES if the output file must also be replicated elsewhere
REP_SE        host name of the SE where to replicate the output file
VO            virtual organization

The rc.conf file is embedded in the card file
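A later slide notes that the suite performs no consistency or syntax checks on its configuration. The Python sketch below shows what such checks could look like for the card file variables listed above. It is an illustration under stated assumptions, not the suite's behaviour: the `KEY = VALUE` line syntax, the choice of mandatory and integer-valued variables, and the example values are all hypothetical, since the slides do not show the actual card file format.

```python
REQUIRED = ("OUTPUT_DIR", "DATASET", "INIT_RUN", "RUNS", "EVENTS",
            "DATA_NAME", "DATA_SIZE", "VO")              # assumed mandatory
INTEGERS = ("INIT_RUN", "RUNS", "EVENTS", "DATA_SIZE")   # assumed integer-valued
FLAGS = ("OUT_SANDBOX", "COPY2SE", "REPLICATION")        # YES/NO switches
DEPENDS = {"COPY2SE": "OUTPUT_SE", "REPLICATION": "REP_SE"}


def parse_card(text):
    """Parse hypothetical 'KEY = VALUE' lines into a dict; blank lines are skipped."""
    card = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {lineno}: expected KEY = VALUE, got {raw!r}")
        card[key.strip()] = value.strip()
    return card


def check_card(card):
    """Return a list of problems found; an empty list means the card passed."""
    problems = [f"missing {k}" for k in REQUIRED if not card.get(k)]
    for k in INTEGERS:
        if card.get(k) and not card[k].isdigit():
            problems.append(f"{k} must be a non-negative integer, got {card[k]!r}")
    for k in FLAGS:
        if k in card and card[k] not in ("YES", "NO"):
            problems.append(f"{k} must be YES or NO, got {card[k]!r}")
    for flag, needed in DEPENDS.items():
        if card.get(flag) == "YES" and not card.get(needed):
            problems.append(f"{flag} = YES requires {needed}")
    return problems


# Example card with made-up values: COPY2SE is requested but OUTPUT_SE is missing.
example = """
OUTPUT_DIR = out
DATASET = test
INIT_RUN = 1
RUNS = 5
EVENTS = 100
DATA_NAME = run
DATA_SIZE = 1000000
COPY2SE = YES
VO = cms
"""
result = check_card(parse_card(example))
print(result)  # prints: ['COPY2SE = YES requires OUTPUT_SE']
```

Collecting all problems in a list, rather than stopping at the first one, lets a user fix a card file in a single pass before submitting a production.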

Output analysis
- This information is collected:
  - success/failure of the dg-job-* commands (submission, status, output retrieval)
  - success/failure of every relevant command in the job script (e.g. the replica manager commands)
  - CE name, WN name, SE name, storage directory, LFN of the output file
  - duration of the job
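The per-job information listed above maps naturally onto one record per job, from which a production summary can be derived. The following Python sketch is one possible layout, not the format of the suite's *.db file; the class name `JobRecord`, its field names, the `summarize` helper and the host name are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRecord:
    run: int
    submitted: bool            # job submission command succeeded
    status_ok: bool            # job status command succeeded
    output_retrieved: bool     # output retrieval command succeeded
    script_ok: bool            # every relevant command in the job script succeeded
    ce: Optional[str] = None
    wn: Optional[str] = None
    se: Optional[str] = None
    storage_dir: Optional[str] = None
    lfn: Optional[str] = None
    duration_s: Optional[float] = None

    @property
    def succeeded(self):
        return all((self.submitted, self.status_ok,
                    self.output_retrieved, self.script_ok))


def summarize(records):
    """Aggregate job records into counts, jobs per CE and mean duration of good jobs."""
    good = [r for r in records if r.succeeded]
    durations = [r.duration_s for r in good if r.duration_s is not None]
    return {
        "jobs": len(records),
        "succeeded": len(good),
        "jobs_per_ce": dict(Counter(r.ce for r in records if r.ce)),
        "mean_duration_s": sum(durations) / len(durations) if durations else None,
    }


records = [
    JobRecord(1, True, True, True, True, ce="ce1.example.org", duration_s=120.0),
    JobRecord(2, True, True, True, False, ce="ce1.example.org", duration_s=30.0),
    JobRecord(3, False, False, False, False),
]
summary = summarize(records)
print(summary)
```

Keeping the command-level and script-level outcomes as separate flags makes it possible to tell middleware failures apart from failures inside the job itself.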

Present issues
- Fragile: no consistency and syntax checks on the config file
- A job can require at most one input file
- The replication is not tested
- Additional requirements in the JDL are not (yet) available
- Poor documentation

EDG 2.0/2.1 expectations
- With EDG 2.0, the fraction of supported use cases should be (76.7 ± 4.7) %
- With EDG 2.1, the fraction of supported use cases should be (85 ± 8) %
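These percentages are easier to read as numbers of use cases out of the 43 analysed. The Python sketch below does the conversion; the helper name `to_counts` is hypothetical, and reading the quoted uncertainties as a spread in the number of supported use cases is an interpretation, not something the slide states.

```python
TOTAL_USE_CASES = 43  # number of HEPCAL use cases analysed in D8.3


def to_counts(percent, error_percent, total=TOTAL_USE_CASES):
    """Convert a percentage and its uncertainty into numbers of use cases."""
    return percent / 100 * total, error_percent / 100 * total


edg20 = to_counts(76.7, 4.7)
edg21 = to_counts(85.0, 8.0)
print(edg20, edg21)
```

Under this reading, the EDG 2.0 figure corresponds to about 33 ± 2 of the 43 use cases, and the EDG 2.1 figure to roughly 36.5 ± 3.5.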

Further development
- Make it compatible with EDG2/LCG-1 as soon as possible
- Add new use cases and new tests when possible
  - find new manpower in LCG
  - help from the loose cannons welcome (but available only until EDG ends)