Globus activities within INFN Massimo Sgaravatto INFN Padova for the INFN Globus group

Globus activities within INFN
WP “Installation and Evaluation of the Globus Toolkit” of the INFN-GRID Project
Goal: evaluate the Globus toolkit as a GRID framework providing basic services
- Which services can be useful?
- What is necessary to integrate/modify?
- What is missing?
Duration: 6 months
Results of this first evaluation used to plan future activities

Tasks
- Security
- Information Service
- Resource Management
- Globus deployment
- Data Access and Migration
- Fault Monitoring
- Execution Environment Management

Status
- Globus installed on ~35 machines in 11 sites
[Map of Italy with the INFN sites marked: Torino, Padova, Bari, Palermo, Firenze, Pavia, Milano, Genova, Napoli, Cagliari, Trieste, Roma, Pisa, L'Aquila, Catania, Bologna, Udine, Trento, Perugia, LNF, LNGS, Sassari, Lecce, LNS, LNL, Salerno, Cosenza, S. Piero, Ferrara, Parma, CNAF, Roma2]

Security (GSI)
Already done:
- Evaluation of the Globus security architecture
  We like the general architecture, but:
  - Granting local "identities" based only on certificate subjects allows the existence of multiple valid certificates for the same subject
  - Authentication library not in sync with OpenSSL development
  - Cryptic diagnostics (e.g. "certificate chain too long" when the CA policy check fails)
- Globus certificates (for hosts and users) signed by the INFN certification authority

Security (GSI)
To do:
- Definition and implementation of an architecture of CAs
  - Up to the task force of the DataGrid project
- Make certificate requests easier
- Periodic update of the CRLs
- "Management" of grid-mapfile updates (see the sketch below)
  - E.g.: a certain Globus resource must be made available to all members of a specific physics group
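
For context on what such updates would maintain: a grid-mapfile is a plain list mapping certificate subjects to local accounts, one entry per line. A minimal sketch, with invented subjects and account names:

    # /etc/grid-security/grid-mapfile: certificate subject -> local account
    "/C=IT/O=INFN/L=Padova/CN=Mario Rossi"   cms001
    "/C=IT/O=INFN/L=Milano/CN=Anna Bianchi"  cms001

Keeping such entries in sync on every resource, for every member of a physics group, is the management problem referred to above.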

Information Service (GIS)
Already done:
- INFN MDS server serving the Globus installations
  - Many problems using the "default" American MDS server
- Definition and implementation of a test architecture for the GIS (for Globus 1.1.3)
- Web interface for browsing

GIS Architecture (test phase)
[Diagram: hierarchy of GIS servers. A Top Level INFN GIIS (dc=infn, dc=it, o=grid) sits above the site GIISes for Bologna (dc=bo, dc=infn, dc=it, o=grid) and Milano (dc=mi, dc=infn, dc=it, o=grid) and an INFN ATLAS experiment GIIS (exp=atlas, o=grid); GRISes on the individual machines register with the site GIISes. Legend: implemented / implemented using the INFNGRID distribution / to be implemented]

Information Service (GIS)
To do:
- Netscape LDAP server as Top Level INFN GIIS
- Tests on performance and scalability
  - Results used to define and implement the final GIS architecture
- Review the information gathered from the various machines and published in the GIS
- Other tools and interfaces for Grid users and administrators (an example LDAP query is sketched below)
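
For reference, the GIS can also be browsed with standard LDAP tools; a minimal sketch of a query against a site GIIS, assuming an invented host name and the default MDS port 2135 (the objectclass follows the MDS schema of the time):

    % ldapsearch -h giis.pd.infn.it -p 2135 \
        -b "dc=pd, dc=infn, dc=it, o=grid" \
        "(objectclass=GlobusServiceJobManager)"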

Resource Management (GRAM)
Already done:
- Job submission tests using the Globus tools (globusrun, globus-job-run, globus-job-submit)
- GRAM as uniform interface to different underlying resource management systems (LSF, Condor, PBS)
- Some bugs found and fixed
  - Standard output and error for vanilla Condor jobs
  - globus-job-status
  - …
- Some bugs can be solved without major re-design and/or re-implementation:
  - For LSF the RSL parameter (count=x) is translated into: bsub -n x …
    It should instead be: bsub … repeated x times (see the sketch below)
- Two major problems:
  - Scalability
  - Fault tolerance
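
To make the (count=x) translation bug concrete, a sketch of the two behaviours (command lines are illustrative, not lifted from the jobmanager source):

    # What the jobmanager generates for (count=4) on LSF:
    # one parallel job spanning 4 processors
    bsub -n 4 /diskCms/startcmsim.sh

    # What the RSL semantics call for: 4 independent instances of the job
    for i in 1 2 3 4; do bsub /diskCms/startcmsim.sh; done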

Globus GRAM Architecture
[Diagram: a client on pc1 contacts the Globus front-end machine pc2, where the jobmanager hands the job to the underlying LSF/Condor/PBS/… system]

    pc1% globusrun -b -r pc2.pd.infn.it/jobmanager-xyz -f file.rsl

    file.rsl:
      & (executable=/diskCms/startcmsim.sh)
        (stdin=/diskCms/PythiaOut/filename)
        (stdout=/diskCms/Cmsim/filename)
        (count=1)

Scalability
- One jobmanager for each globusrun
  - If I want to submit 1000 jobs ???
  - 1000 globusrun → 1000 jobmanagers running on the front-end machine !!!

    % globusrun -b -r pc2.infn.it/jobmanager-xyz -f file.rsl

    file.rsl:
      & (executable=/diskCms/startcmsim.sh)
        (stdin=/diskCms/PythiaOut/filename)
        (stdout=/diskCms/CmsimOut/filename)
        (count=1000)

- It is not possible to specify in the RSL file 1000 different input files and 1000 different output files
  - Unlike $(Process) in Condor (see the submit file sketched below)
- Problems with job monitoring (globus-job-status)
- Therefore (count=x) with x>1 is not very useful!
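
For comparison, the Condor mechanism referred to above: one submit file queues 1000 jobs, with input and output files parameterized by $(Process) (0..999); the paths reuse the CMS example and are illustrative:

    # Condor submit description file: 1000 jobs, distinct I/O per job
    universe   = vanilla
    executable = /diskCms/startcmsim.sh
    input      = /diskCms/PythiaOut/filename.$(Process)
    output     = /diskCms/CmsimOut/filename.$(Process)
    queue 1000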

Fault tolerance
- The jobmanager is not persistent
- If the jobmanager can't be contacted, Globus assumes that the job(s) have been completed
- Example of the problem:
  - Submission of n jobs to a cluster managed by a local resource management system
  - Reboot of the front-end machine
  - The jobmanager(s) don't restart
  - Orphan jobs → Globus assumes that the jobs have been successfully completed

Resource Management (GRAM)
Already done:
- Submission of Condor jobs to Globus resources (Condor-G and GlideIn mechanisms)
- Evaluation of RSL as uniform language to specify resources
  - The RSL syntax model seems suitable to define even complicated resource specification expressions
  - The common set of RSL attributes is often not sufficient
    - Attributes not belonging to the common set are ignored
  - More flexibility is required
    - Resource administrators should be allowed to define new attributes, and users should be allowed to use them in resource specification expressions (Condor ClassAds model)
  - Using the same language to describe the offered resources and the requested resources (Condor ClassAds model) seems a better approach (see the sketch below)
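
A minimal sketch of the ClassAds symmetry mentioned above (standard Condor attribute names, invented values): resource and request are written in the same language and matched against each other.

    # Machine ClassAd (offered resource)
    OpSys  = "LINUX"
    Memory = 256
    Mips   = 400

    # Job ClassAd (requested resource)
    Requirements = (OpSys == "LINUX" && Memory >= 128)
    Rank         = Mips   # prefer faster machines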

Resource Management (GRAM)
Already done:
- "Cooperation" between GRAM and GIS
  - The information on characteristics and status of local resources and on jobs is not enough
    - As local resources we must consider farms, not the single workstations
    - Other information (e.g. total and available CPU power) needed
  - The default schema must be integrated with other info provided by the underlying resource management systems or by specific agents (a sketch follows)
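
As an illustration of such an integrated schema, a hypothetical LDIF fragment for a farm entry; the DN layout and attribute names (totalcpus, freecpus, …) are invented here, not part of the default MDS schema:

    # Hypothetical GIS entry for a farm (names and values illustrative)
    dn: farm=cmsfarm, dc=pd, dc=infn, dc=it, o=grid
    manager: lsf
    totalnodes: 20
    totalcpus: 40
    freecpus: 12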

GRAM & Condor & GIS

GRAM & LSF & GIS Must be fixed

Jobs & GIS Info on Globus jobs published in the GIS: User Subject of certificate Local user name RSL string Globus job id LSF/Condor/… job id Status: Run/Pending/…

Resource Management (GRAM)
To do:
- Tests with the GRAM API
- Tests with real applications and real environments (CMS fall production)
  - Already started
  - Memory leak in the jobmanager ?!?!?!?
- Solve the problems
- Identify a set of useful attributes of a Condor pool, LSF cluster, PBS cluster that should be reported to the GIS, and integrate the default schema
  - Let's start with the information provided by the underlying resource management system
  - Second step: specific agents

Globus deployment
- Tools to enable local administrators to deploy the GRID software (now Globus and related packages: OpenLDAP, …)
  - Reduce the complexity and manpower necessary for installation
  - Decrease errors during installations
  - Collect bug fixes
  - Include INFN customizations:
    - Certificates (for hosts and users) signed by the INFN CA
      … but user certificates signed by the Globus CA are accepted as well
    - Preliminary architecture for the GIS

First step (July 2000)
- Software distribution available on AFS
  - Fixes for bugs found during the first Globus evaluations included
- INFNGRID installation guide
  - Instructions for INFN customizations included
- Scripts to make certain steps (e.g. post-install operations) automatic

Second step (now)
- Pre-compiled distribution (available now for Linux Red Hat 6.1): INFNGRID 1.1
- Script for installation and deployment: infngrid-install
  - Users decide whether to use the INFN customizations or the "standard" setup:
      Would you like the INFN setup (Y/N) ?
  - Steps performed by the script:
      (1) Copy the INFNGRID tar files from /afs/infn.it/project/infngrid/1.1/Linux to the download dir
      (2) Decompress and untar the INFNGRID distribution files in the install dir
      (3) Configure the INFNGRID software
      (4) Globus setup
      (5) Configure the GRAM services (Condor and LSF)
      (6) Globus local deploy
      (7) GIIS configuration

Second step
- Script for post-install operations: globus-root-setup
    (1) Modify system files and reactivate the inetd daemon
    (2) Change the owner to root of certain files for tighter security
    (3) Modify system-wide login files
    (4) Start/restart Globus now
    (5) Configure gsi-wuftpd and restart the inetd daemon
- Installation instructions for special environments (configuration of client machines, shared install directory) included
- List of included bug fixes
Status:
- Tests performed in different environments (INFN, CERN, FNAL)
- "Officially" released
- Available to DATAGRID partners

Next steps
- Configuration of PBS as local resource management system: 1.2
- Support for Solaris 2.6: 1.2
  - We don't plan (at least for now) to support other platforms
- Improvement of the current non-precompiled distribution
  - Possible use of the infngrid-install script for both the pre-compiled and the non-precompiled distribution
- "Unattended" installation
- Management of updates
- Inclusion of GDMP: 1.2
- Inclusion of other GRID software packages ??
- Other work will be "triggered" by local administrators and users

Data Management
Already done:
- Preliminary tests with GASS and gsiftp (a sample transfer is sketched below)
To do:
- Tests with GlobusFTP and the Replica Catalog software (Globus Data Grid Alpha Release 2)
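
For reference, a minimal sketch of the kind of transfer these tools provide, using the standard grid-proxy-init and globus-url-copy clients with invented host and file names:

    % grid-proxy-init        # obtain a GSI proxy first
    % globus-url-copy \
        gsiftp://pc2.pd.infn.it/diskCms/CmsimOut/filename \
        file:///tmp/filename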

GARA
- Preliminary tests considering both network and CPU advance reservation
[Diagram of the testbed: a client uses the GARA API to contact the GARA Network Resource Manager; sunlab2 and sunlab3 are connected to a server across a Cisco 7500 and a Cisco 7200 linked by a VC over 100 Mbps Fast Ethernet]

Other tasks
Fault Monitoring (HBM)
- Evaluation of the HBM for fault detection (for "system" and "user" processes)
- Data collectors (implementing automatic recovery mechanisms)
- … but the HBM package is not seeing active development
Execution Environment Management (GEM)
- Evaluation of GEM as a service for code migration
- … but the GEM service currently provides only limited capabilities (executable staging)

Other info INFN-GRID/Globus