Status of Globus activities within INFN
Massimo Sgaravatto, INFN Padova, for the INFN Globus group
INFN WP “Installation and Evaluation of the Globus Toolkit”
- Work package of the INFN-GRID Project (WP 1)
- Goal: evaluate the Globus toolkit as a GRID framework providing basic services
  - Which services can be useful?
  - What needs to be integrated or modified?
  - What is missing?
- Duration: 6 months
- Results of this first evaluation used to plan future activities
Globus
- Project led by Ian Foster and Carl Kesselman
- Basic research on GRID (resource management, security, QoS, ...)
- Development of the Globus Toolkit
  - Core services for GRID tools and applications
Globus Architecture
[Figure: layered architecture of the Globus toolkit, from top to bottom]
- Applications
- High-level Services and Tools: DUROC, globusrun, MPI, Nimrod/G, MPI-IO, CC++, GlobusView, Testbed Status
- Core Services: Metacomputing Directory Service, GRAM, Globus Security Interface, Heartbeat Monitor, Nexus, Gloperf, GASS
- Local Services: LSF, Condor, MPI, NQE, Easy, TCP, UDP, Solaris, Irix, AIX
Tasks
- Security
  - Mechanisms for user authentication are needed to access GRID resources
  - Evaluation of the GSI service
- Information Service
  - To discover GRID resources (CPU, storage, network, ...), mechanisms to “publish” them must be defined
  - Analysis of the GIS service to “publish” information through a uniform, standard interface
- Resource Management
  - A uniform interface is needed to submit jobs to GRID resources
    - Uniform standard interface to different resource management systems
    - Uniform standard language for task management
  - Assessment of the Globus GRAM service for resource allocation and process management
Tasks
- Data Access and Migration
  - High-performance, reliable tools are needed to “manage” data (data transfers, wide-area replicas, ...)
  - Assessment of Globus tools for data management (GASS, GlobusFTP, Replica Management tools)
- Fault Monitoring
  - Faults in a GRID environment must be detected promptly, and recovery mechanisms must be implemented
  - Evaluation of the HBM service for fault detection
- Execution Environment Management
  - Code migration (moving the application to where the job will actually be executed) as a possible implementation strategy
  - Evaluation of the GEM service to support code migration
- Globus deployment
  - Reduce the complexity and manpower needed for Globus installation and maintenance
Status
- Globus installed on ~30 machines in 11 sites
[Figure: map of the INFN sites: TORINO, PADOVA, BARI, PALERMO, FIRENZE, PAVIA, MILANO, GENOVA, NAPOLI, CAGLIARI, TRIESTE, ROMA, PISA, L’AQUILA, CATANIA, BOLOGNA, UDINE, TRENTO, PERUGIA, LNF, LNGS, SASSARI, LECCE, LNS, LNL, SALERNO, COSENZA, S.Piero, FERRARA, PARMA, CNAF, ROMA2]
Security (GSI)
- Already done:
  - Evaluation of the Globus security architecture
    - We like the “one-time login” paradigm, but some improvements are needed
  - Globus certificates (for hosts and users) signed by the INFN certification authority
- On-going activities:
  - Definition and implementation of the CA architecture
    - This is up to a task force of the European DataGrid project
  - Periodic update of CRLs
  - “Management” of grid-mapfile updates (the grid-mapfile defines the mappings between GRID users and local users)
    - E.g.: a certain Globus resource must be available to all members of a specific physics group
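The grid-mapfile task can be illustrated with a small sketch. Assuming each grid-mapfile line holds a quoted certificate subject DN followed by a local account name, the hypothetical helper below regenerates the file from a physics group's member list, mapping every member to one shared local account while letting hand-maintained per-user entries take precedence. The function name, the DNs and the `atlas`/`bob` accounts are made up for illustration.

```python
from typing import Dict, Iterable, Optional


def build_grid_mapfile(group_members: Iterable[str], local_account: str,
                       static_entries: Optional[Dict[str, str]] = None) -> str:
    """Return grid-mapfile text: one '"<subject DN>" <local user>' line per user.

    Hypothetical helper: every member of the physics group is mapped to the
    shared local_account; static_entries (per-user mappings) take precedence.
    """
    mappings: Dict[str, str] = dict(static_entries or {})
    for dn in group_members:
        # setdefault keeps an existing hand-maintained mapping for this DN.
        mappings.setdefault(dn, local_account)
    # The DN is quoted because certificate subjects contain spaces;
    # sorting makes the generated file stable between runs.
    return "".join(f'"{dn}" {user}\n' for dn, user in sorted(mappings.items()))


if __name__ == "__main__":
    # Hypothetical subject DNs and account names, for illustration only.
    atlas_members = [
        "/C=IT/O=INFN/L=Padova/CN=Alice Example",
        "/C=IT/O=INFN/L=Bari/CN=Bob Example",
    ]
    print(build_grid_mapfile(atlas_members, "atlas",
                             {"/C=IT/O=INFN/L=Bari/CN=Bob Example": "bob"}), end="")
```

Generating the whole file from the group roster, rather than editing it by hand, means a periodic job can simply rewrite it (ideally to a temporary file that is then renamed over the old one) whenever group membership changes.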
Information Service (GIS)
- Already done:
  - INFN MDS server (a single LDAP server) serving the INFN Globus installations
    - Many problems using the “default” American MDS server
  - Definition and implementation of a test GIS architecture for the Globus installations (distributed model)
  - Web interface for browsing
- On-going activities:
  - Performance improvement (Netscape LDAP server as top-level GIIS)
  - Performance and scalability tests
    - Results used to define and implement the GIS architecture
  - Review of the information gathered from the various machines and published in the GIS
GIS Architecture (test phase)
[Figure: hierarchy of GIIS servers with GRISes. Top-level INFN GIIS (dc=infn, dc=it, o=grid); Bologna GIIS (dc=bo, dc=infn, dc=it, o=grid); Milano GIIS (dc=mi, dc=infn, dc=it, o=grid); INFN ATLAS GIIS (exp=atlas, o=grid). Legend: implemented; implemented using the INFNGRID distribution; to be implemented.]
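The DNs in the test GIS hierarchy nest by suffix: a site GIIS DN is the top-level INFN DN with one more RDN on the left. A minimal sketch of how the parent node and a subtree (LDAP `sub` scope) search follow from that nesting, assuming simple DNs (no escaped commas, single-valued RDNs) and case-insensitive comparison. The helper names and the `hn=...` host entries are hypothetical.

```python
from typing import List, Optional, Tuple

RDN = Tuple[str, str]


def parse_dn(dn: str) -> Tuple[RDN, ...]:
    """Split a DN into (attribute, value) pairs, most specific RDN first.

    Simplifications (assumptions): no escaped commas, single-valued RDNs,
    and case-insensitive names and values (so "Dc=bo" equals "dc=bo").
    """
    rdns = []
    for part in dn.split(","):
        name, value = part.split("=", 1)
        rdns.append((name.strip().lower(), value.strip().lower()))
    return tuple(rdns)


def format_dn(rdns: Tuple[RDN, ...]) -> str:
    return ", ".join(f"{name}={value}" for name, value in rdns)


def parent_dn(dn: str) -> Optional[str]:
    """DN one level up the hierarchy (drop the leftmost RDN); None at the root."""
    rdns = parse_dn(dn)
    return format_dn(rdns[1:]) if len(rdns) > 1 else None


def is_under(entry_dn: str, base_dn: str) -> bool:
    """True if entry_dn equals base_dn or lies below it in the tree."""
    entry, base = parse_dn(entry_dn), parse_dn(base_dn)
    return len(entry) >= len(base) and entry[len(entry) - len(base):] == base


def subtree_search(entry_dns: List[str], base_dn: str) -> List[str]:
    """Entries a subtree search rooted at base_dn would return."""
    return [dn for dn in entry_dns if is_under(dn, base_dn)]


if __name__ == "__main__":
    # Hypothetical host entries published by site GRISes.
    entries = [
        "hn=host1.bo.infn.it, dc=bo, dc=infn, dc=it, o=grid",
        "hn=host2.mi.infn.it, dc=mi, dc=infn, dc=it, o=grid",
    ]
    print(parent_dn("Dc=bo, Dc=infn, dc=it,o=grid"))  # the top-level INFN GIIS DN
    print(subtree_search(entries, "dc=infn, dc=it, o=grid"))
```

The same suffix rule explains why the ATLAS GIIS (`exp=atlas, o=grid`) is not reached by a search rooted at the INFN top-level GIIS: it sits in a different branch under `o=grid`.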
Resource Management (GRAM)
- Already done:
  - Job submission tests using Globus tools with real applications in real production environments (GRAM as a uniform interface to different underlying resource management systems: LSF, Condor, PBS)
    - Some bugs found and fixed
      - Many, many memory leaks! ...
    - Some bugs can be solved without major re-design and/or re-implementation
    - Two major problems:
      - Scalability
      - Fault tolerance
  - Submission of Condor jobs to Globus resources (Condor-G and GlideIn)
  - Evaluation of RSL as a uniform language to specify resources
    - More flexibility is required: resource administrators should be allowed to define new attributes, and users should be allowed to use them in resource specification expressions (Condor ClassAds model)
  - “Cooperation” between GRAM and GIS
    - The information on the characteristics and status of local resources and on jobs is not sufficient (local resources here must include farms)
    - The default schema must be integrated with other information provided by the underlying resource management systems or by specific agents
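The flexibility requested for RSL can be illustrated with the ClassAds idea in miniature: resource descriptions are open-ended sets of attributes, and a job's requirements are an expression over whatever attributes exist. The sketch below is plain Python, not Condor's ClassAd language; the attribute names (`LocalDiskGB`, `HasAtlasSoftware`, ...) and resource names are hypothetical, and a missing attribute is treated as undefined, so it never produces a match.

```python
from typing import Callable, Dict, List

Ad = Dict[str, object]


def attr(ad: Ad, name: str) -> object:
    """Look up an attribute; a missing one yields None (like ClassAd UNDEFINED)."""
    return ad.get(name)


def matches(requirements: Callable[[Ad], object], resource: Ad) -> bool:
    """A resource matches only if the requirements evaluate to exactly True."""
    try:
        return requirements(resource) is True
    except TypeError:
        # Comparing an undefined (None) attribute with a number raises
        # TypeError in Python 3; treat that as "no match" instead of failing.
        return False


def match(requirements: Callable[[Ad], object], resources: List[Ad]) -> List[str]:
    """Names of all resource ads that satisfy the job's requirements."""
    return [r["Name"] for r in resources if matches(requirements, r)]


if __name__ == "__main__":
    resources = [
        # Administrators added LocalDiskGB / HasAtlasSoftware with no schema change.
        {"Name": "lsf-farm-pd", "OpSys": "LINUX", "LocalDiskGB": 80, "HasAtlasSoftware": True},
        {"Name": "condor-pool-bo", "OpSys": "LINUX", "LocalDiskGB": 20},
        {"Name": "pbs-cluster-ct", "OpSys": "SOLARIS", "LocalDiskGB": 200},
    ]

    def job_requirements(r: Ad) -> object:
        # User expression over attributes defined by the resource administrators.
        return (attr(r, "OpSys") == "LINUX"
                and attr(r, "LocalDiskGB") >= 50
                and attr(r, "HasAtlasSoftware") is True)

    print(match(job_requirements, resources))  # ['lsf-farm-pd']
```

The point of the design is that neither side needs a fixed, centrally agreed attribute list: a new attribute becomes usable as soon as one administrator publishes it and one user refers to it.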
Resource Management (GRAM)
- On-going activities:
  - Tests with the GRAM API
  - Identify a set of useful attributes of a Condor pool, an LSF cluster and a PBS cluster that should be reported to the GIS, and extend the default schema accordingly
  - Tests with MPICH-G2
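Reporting Condor, LSF and PBS farms to the GIS amounts to translating each system's own status fields into one common set of attributes. A minimal sketch of that translation step follows; the raw field names in `FIELD_MAP` are placeholders, not the real output of LSF, PBS or Condor tools, and the common attribute names (`TotalCPUs`, `FreeCPUs`, `QueuedJobs`) and the example DN are hypothetical.

```python
from typing import Any, Dict, List, Mapping

# Hypothetical raw field names per resource management system (placeholders,
# NOT the real output fields of LSF, PBS or Condor commands), mapped onto one
# common attribute set to be published in the GIS.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "lsf": {"ncpus": "TotalCPUs", "ncpus_free": "FreeCPUs", "njobs_pend": "QueuedJobs"},
    "pbs": {"np_total": "TotalCPUs", "np_free": "FreeCPUs", "jobs_queued": "QueuedJobs"},
    "condor": {"slots": "TotalCPUs", "slots_idle": "FreeCPUs", "idle_jobs": "QueuedJobs"},
}
COMMON_ATTRIBUTES = ("TotalCPUs", "FreeCPUs", "QueuedJobs")


def to_common_schema(lrms: str, raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Translate one farm's raw status into the common attribute set."""
    entry: Dict[str, Any] = {"LRMSType": lrms}
    for raw_name, common_name in FIELD_MAP[lrms].items():
        if raw_name in raw:
            entry[common_name] = int(raw[raw_name])
    missing = [a for a in COMMON_ATTRIBUTES if a not in entry]
    if missing:
        # Refuse to publish a partial entry rather than report misleading data.
        raise ValueError(f"{lrms}: missing attributes {missing}")
    return entry


def to_ldif_lines(dn: str, entry: Mapping[str, Any]) -> List[str]:
    """Render the entry as LDIF-style 'attribute: value' lines under dn."""
    return [f"dn: {dn}"] + [f"{name}: {value}" for name, value in entry.items()]


if __name__ == "__main__":
    farm = to_common_schema("pbs", {"np_total": "16", "np_free": "4", "jobs_queued": 7})
    # Hypothetical DN for the farm entry.
    print("\n".join(to_ldif_lines("cn=farm, dc=pd, dc=infn, dc=it, o=grid", farm)))
```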
Globus deployment
- Already done:
  - INFN-GRID 1.0
    - Non-precompiled Globus
    - Globus bug fixes
    - Installation instructions (in particular for INFN customizations)
  - INFN-GRID 1.1
    - Precompiled Globus for Linux Red Hat 6.x
    - gsiwuftpd
    - Support for LSF and Condor as underlying resource management systems
    - Possibility to implement INFN customizations
      - Certificates signed by the INFN CA
      - Preliminary architecture for GIS
    - Installation instructions
  - INFN-GRID 1.2
    - All INFN-GRID 1.1 functionality, plus:
    - Support for Solaris 2.6
    - Support for PBS as resource management system
    - Support for GDMP (for Linux)
    - Tool to upgrade from INFN-GRID 1.1 to INFN-GRID 1.2
    - Installation instructions
Globus deployment
- On-going activities:
  - Web software repository
  - INFN-GRID 1.3
    - Fixes for Globus jobmanager memory leaks
    - Support for Solaris 7
    - Full support for GDMP
    - Distribution of various Globus builds (Kerberos, MPICH-G2)
  - INFN-GRID toolkit available to DataGrid partners
    - Globus team interested in this toolkit
Data Management
- Already done:
  - Preliminary tests with GASS and gsiftp
- To do:
  - Tests with GlobusFTP and the Replica Catalog software (Globus Data Grid Alpha Release 2)
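The idea behind the replica catalog to be tested is a mapping from a logical file name to the physical copies of that file at different sites. The toy in-memory class below sketches only that idea, plus a simple "prefer a local copy" choice; it is not the Globus Replica Catalog API, and the class name, file names and gsiftp URLs are hypothetical.

```python
from typing import Dict, List, Optional
from urllib.parse import urlparse


class ReplicaCatalog:
    """Hypothetical in-memory toy: logical file name -> list of replica URLs."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[str]] = {}

    def register(self, lfn: str, url: str) -> None:
        """Record a physical copy of lfn, ignoring duplicate registrations."""
        urls = self._replicas.setdefault(lfn, [])
        if url not in urls:
            urls.append(url)

    def lookup(self, lfn: str) -> List[str]:
        """All known replicas of lfn (a copy, so callers cannot corrupt the catalog)."""
        return list(self._replicas.get(lfn, []))

    def choose(self, lfn: str, preferred_domain: str) -> Optional[str]:
        """Prefer a replica hosted in preferred_domain, else the first one known."""
        replicas = self.lookup(lfn)
        for url in replicas:
            host = urlparse(url).hostname or ""
            if host == preferred_domain or host.endswith("." + preferred_domain):
                return url
        return replicas[0] if replicas else None


if __name__ == "__main__":
    rc = ReplicaCatalog()
    # Hypothetical storage hosts and paths.
    rc.register("run1234.raw", "gsiftp://se.pd.infn.it/data/run1234.raw")
    rc.register("run1234.raw", "gsiftp://se.bo.infn.it/store/run1234.raw")
    print(rc.choose("run1234.raw", "bo.infn.it"))
```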
Other tasks
- Fault Monitoring (HBM)
  - Evaluation of HBM for fault detection (for “system” and “user” processes)
  - Data collectors (implementing automatic recovery mechanisms)
  - ... but the HBM package is no longer being actively developed
- Execution Environment Management (GEM)
  - Evaluation of GEM as a service for code migration
  - ... but the GEM service now provides only limited capabilities (executable staging)
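Heartbeat-based fault detection with a data collector that triggers recovery can be sketched as follows: monitored processes report periodically, and the collector flags any process silent for longer than a timeout and calls a recovery hook once per outage. This is a minimal illustration of the technique, not the HBM API; the class name, the process identifiers and the timeout are hypothetical, and the clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Dict, List, Set


class HeartbeatCollector:
    """Hypothetical data collector: flags processes whose heartbeats stop."""

    def __init__(self, timeout_s: float, on_failure: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.on_failure = on_failure  # recovery hook, e.g. restart or notify
        self.clock = clock            # injectable time source, for testing
        self._last_seen: Dict[str, float] = {}
        self._failed: Set[str] = set()

    def heartbeat(self, process_id: str) -> None:
        """Record a heartbeat; a previously failed process counts as alive again."""
        self._last_seen[process_id] = self.clock()
        self._failed.discard(process_id)

    def check(self) -> List[str]:
        """Report each process silent for longer than timeout_s, once per outage."""
        now = self.clock()
        newly_failed = []
        # Iterate over a snapshot so on_failure may safely call heartbeat().
        for process_id, seen in list(self._last_seen.items()):
            if now - seen > self.timeout_s and process_id not in self._failed:
                self._failed.add(process_id)
                newly_failed.append(process_id)
                self.on_failure(process_id)
        return newly_failed


if __name__ == "__main__":
    collector = HeartbeatCollector(timeout_s=30.0, on_failure=print)
    collector.heartbeat("jobmanager@host1")  # hypothetical process id
    print(collector.check())  # nothing has timed out yet
```

Reporting a process only once per outage keeps the recovery hook from firing repeatedly on every `check()` while the process stays down; a fresh heartbeat re-arms detection.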
Other info