GRID Workload Management System Massimo Sgaravatto INFN Padova.



What do we want to implement (simplified design)
[Diagram. Labels: Master; Condor-G; condor_submit (Globus Universe); globusrun; Globus GRAM in front of CONDOR, LSF, … at Site1, Site2, Site3; Farms; Local Resource Management Systems; Grid Information Service (GIS); Resource Discovery; Submit jobs (using Class-Ads); Information on characteristics and status of local resources]
- Globus GRAM as uniform interface to different local resource management systems
- Condor-G able to provide reliability
- Use of Condor tools for job monitoring, logging, …
- Master chooses to which Globus resources the jobs must be submitted
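The Master's role in this design can be illustrated with a minimal matchmaking sketch in the spirit of ClassAds: each job carries a requirements expression and a rank, and the Master picks the best matching Globus resource. This is not the GWMS implementation; the `Resource` and `Job` classes, their fields and the contact strings are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Resource:
    """A Globus resource (farm) as it might be described by the GIS (hypothetical fields)."""
    contact: str      # hypothetical GRAM contact string
    lrms: str         # local resource management system behind GRAM
    free_cpus: int
    os: str


@dataclass
class Job:
    """A job with ClassAd-like requirements and rank, expressed as Python callables."""
    name: str
    requirements: Callable[[Resource], bool]
    rank: Callable[[Resource], float]


def choose_resource(job: Job, resources: List[Resource]) -> Optional[Resource]:
    """Return the matching resource with the highest rank, or None if nothing matches."""
    candidates = [r for r in resources if job.requirements(r)]
    if not candidates:
        return None
    return max(candidates, key=job.rank)


if __name__ == "__main__":
    farms = [
        Resource("site1.example.org/jobmanager-condor", "condor", 2, "linux"),
        Resource("site2.example.org/jobmanager-lsf", "lsf", 16, "linux"),
    ]
    job = Job("pythia", lambda r: r.os == "linux" and r.free_cpus >= 4,
              lambda r: r.free_cpus)
    print(choose_resource(job, farms))
```

In release 0 there is no Master, so this choice is made by hand by the user.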

What can be implemented now (GWMS release 0)
[Diagram. Labels: Condor-G; condor_submit (Globus Universe); globusrun; Globus GRAM in front of CONDOR, LSF, … at Site1, Site2, Site3; Farms; Local Resource Management Systems; Grid Information Service (GIS); Submit jobs; Information on characteristics and status of local resources; annotation "Not very useful in this model"]
- Globus GRAM as uniform interface to different local resource management systems
- Condor-G able to provide reliability
- Use of Condor tools for job monitoring, logging, …

Overview
- Job management (submission, monitoring) from a single machine using Condor tools
- The user must explicitly define to which Globus resource (which farm) the jobs must be submitted
- The applications and the input files must be stored in the file system of the executing machine
- The output files will be created in the file system of the executing machine
- We can try to have just the standard input and/or output and/or error files (useful to check the “status” of the production) on the submitting machine, using Bypass and/or Globus GASS
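The submission model above can be sketched as a small Python helper that writes a Condor-G submit description naming the Globus resource explicitly. The submit keywords (`universe = globus`, `globusscheduler`, `transfer_executable`) are given as they are recalled from Condor-G documentation of that period and should be treated as assumptions to check against the installed Condor version. The host name and paths are hypothetical.

```python
def make_submit_description(executable: str, globus_resource: str,
                            stdout: str = "job.out", stderr: str = "job.err",
                            log: str = "job.log") -> str:
    """Build the text of a Condor-G submit description for one job.

    The keywords below are assumptions based on Condor-G documentation of
    that period; verify them against the Condor release actually in use.
    """
    lines = [
        "universe = globus",
        # The user names the target farm explicitly (no Master in release 0).
        f"globusscheduler = {globus_resource}",
        # The application already lives on the executing machine's file system.
        f"executable = {executable}",
        "transfer_executable = false",
        # Standard output/error are the files one would like on the submitting machine.
        f"output = {stdout}",
        f"error = {stderr}",
        f"log = {log}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical GRAM contact string and executable path.
    print(make_submit_description("/opt/exp/bin/pythia",
                                  "farm.example.org/jobmanager-lsf"))
```

The resulting text would be saved to a file and passed to condor_submit; job status can then be followed with condor_q.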

Bypass vs. GASS
- Bypass
  - Written by Douglas Thain (Condor team)
  - Redirects the standard input/output/error of a program to a remote machine while the program is running
  - Can be used for dynamically linked programs
  - Successfully tested with Pythia
  - Uses the Globus Security Infrastructure
- Globus GASS
  - Possibility to copy the input file to the remote machine before the execution, and to get the output file back after the execution (otherwise it is necessary to modify the source code)

Status of GWMS release 0
- Tests on basic capabilities and functionalities have been performed
- Some tests with real applications (Pythia, CMSIM) performed
- No “stress” tests performed to evaluate scalability, reliability, …
- Problems with scalability and fault tolerance found (Globus jobmanager)

What is necessary for GWMS rel. 0
- Local farms with a shared file system between the various nodes
- Installation of the proper experiment environment and applications on these farms
- Local resource management system to manage the local farm
  - Fork: strongly discouraged (even for a single machine)
    - Necessary to install Globus on each machine
    - Job queuing is left to the production manager
  - LSF
  - Local Condor pool
  - PBS
    - Tests on Globus-PBS interaction must be completed (i.e. farm environment)
    - Tests on Condor-G – Globus – PBS not performed yet
- Globus
  - One installation per farm (on a “visible” node)
  - Installation using the INFNGRID distribution

INFNGRID distribution
- Done by the INFN GRID release team (F. Donno, A. Sciaba`, Z. Xie)
- Version 1.1 released !!!
- Precompiled version for Linux Red Hat 6.1
- Scripts that make installation and deployment simpler and more “automatic”
- Supported local resource management systems: LSF, Condor
- Possibility to implement INFN customizations
  - Certificates
  - “Test” GIS Architecture
- Installation instructions (

Certificates
- Use of personal certificates and host certificates signed by INFN CA
- User certificates signed by Globus CA are accepted as well
- By default it is not possible to “use” Globus resources outside INFN using personal certificates signed by INFN CA. Is this a problem ???
  - Workaround 1: Users also have personal certificates signed by Globus CA
  - Workaround 2: “Small” modification in the Globus configuration of these resources outside INFN in order to accept “our” certificates too
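The acceptance rule described on this slide can be modelled as a membership test against the set of certificate authorities a resource trusts. The sketch below is only a model of the policy, not Globus configuration. The CA names are placeholder strings and `with_infn_ca` is a hypothetical helper standing in for workaround 2.

```python
from typing import FrozenSet

# Placeholder identifiers; real CAs are identified by their certificate subject.
GLOBUS_CA = "Globus CA"
INFN_CA = "INFN CA"


def is_accepted(cert_issuer: str, trusted_cas: FrozenSet[str]) -> bool:
    """A resource accepts a certificate only if it trusts the issuing CA."""
    return cert_issuer in trusted_cas


def with_infn_ca(trusted_cas: FrozenSet[str]) -> FrozenSet[str]:
    """Workaround 2 as a model: the outside resource also trusts the INFN CA."""
    return trusted_cas | {INFN_CA}


infn_resource = frozenset({INFN_CA, GLOBUS_CA})   # INFN resources accept both CAs
outside_resource = frozenset({GLOBUS_CA})         # default configuration outside INFN

if __name__ == "__main__":
    print(is_accepted(INFN_CA, outside_resource))                 # default case
    print(is_accepted(INFN_CA, with_infn_ca(outside_resource)))   # after workaround 2
```

Workaround 1 corresponds to the user presenting a certificate whose issuer is `GLOBUS_CA`, which both kinds of resource already accept.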

GIS Architecture (test phase)
[Diagram. Nodes: Top Level INFN GIIS (dc=infn, dc=it, o=grid); Bologna GIIS (dc=bo, dc=infn, dc=it, o=grid); Milano GIIS (dc=mi, dc=infn, dc=it, o=grid); INFN ATLAS GIIS (exp=atlas, o=grid); GRIS. Legend: Implemented; Implemented using INFNGRID distribution; To be implemented]
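The hierarchy in this architecture follows from the directory names themselves: a site GIIS sits under the top-level INFN GIIS because its name ends with the top-level name. A minimal sketch of that suffix test, using the names from the slide, is given below. The functions `parse_dn` and `is_under` are hypothetical helpers, and the parsing is deliberately naive (no escaped commas).

```python
from typing import Tuple

RDN = Tuple[str, str]


def parse_dn(dn: str) -> Tuple[RDN, ...]:
    """Split a name such as 'dc=bo, dc=infn, dc=it, o=grid' into normalized (attr, value) pairs."""
    rdns = []
    for part in dn.split(","):
        attr, _, value = part.strip().partition("=")
        rdns.append((attr.strip().lower(), value.strip().lower()))
    return tuple(rdns)


def is_under(child: str, ancestor: str) -> bool:
    """True if `child` lies strictly below `ancestor` in the directory tree."""
    c, a = parse_dn(child), parse_dn(ancestor)
    return len(c) > len(a) and c[-len(a):] == a


TOP_LEVEL_INFN = "Dc=infn,dc=it, o=grid"

if __name__ == "__main__":
    for name in ("Dc=bo, Dc=infn, dc=it,o=grid",
                 "Dc=mi,Dc=infn, dc=it,o=grid",
                 "Exp=atlas, o=grid"):
        print(name, "->", is_under(name, TOP_LEVEL_INFN))
```

The experiment-oriented name (exp=atlas, o=grid) does not share the INFN suffix, which is why it forms a separate branch under o=grid.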

INFNGRID distribution: next release
- Solaris 2.6
- Support of PBS as local resource management system
- GDMP
- Other works, changes, bug fixes “triggered” by users/administrators
- Necessary to define relationship with DataGrid !!!

What is necessary
- Condor-G
  - Used by the production manager to submit jobs
- Scripts to run productions using this GRID environment
- Tools to “monitor” production
  - condor_q
  - Condor Job Viewer Java GUI

(Some) next steps
- Tests with real applications and real environments
  - CMS fall production
- Fix the problems
  - Globus jobmanager
  - Who, how, relations with Globus team, relations with Condor team ???
  - …
- GIS – ClassAds converter
  - Globus team ???
- Master implementation
  - Who, how, … ???
- The default GIS schema must be integrated with other information (the information on characteristics and status of local resources and on jobs is not enough)
  - We need to identify which other information is necessary
  - This will become much clearer during the Master design
- Packaging ???
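As an illustration of what a GIS – ClassAds converter might do, the sketch below turns one resource entry, represented as a plain Python dictionary, into ClassAd-style `Attribute = value` lines. It is a guess at the general shape of such a tool, not a design from the project. The attribute names are hypothetical and only a simple subset of the ClassAd syntax (numbers, booleans, quoted strings) is covered.

```python
from typing import Dict, Union

Value = Union[str, int, float, bool]


def to_classad(entry: Dict[str, Value]) -> str:
    """Render a flat attribute dictionary as ClassAd-style 'Name = value' lines."""
    lines = []
    for key in sorted(entry):
        value = entry[key]
        if isinstance(value, bool):          # check bool before int: bool is an int subclass
            rendered = "TRUE" if value else "FALSE"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            # Strings are double-quoted; embedded quotes are backslash-escaped.
            rendered = '"' + str(value).replace('"', '\\"') + '"'
        lines.append(f"{key} = {rendered}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical attributes for one farm as they might be read from the GIS.
    gis_entry = {"FreeCpus": 8, "OpSys": "LINUX", "Lrms": "lsf", "Production": True}
    print(to_classad(gis_entry))
```

Which attributes the converter needs to carry is exactly the open question about extending the default GIS schema.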