First ideas for a Resource Management Architecture for Productions
Massimo Sgaravatto, INFN Padova

First step
[Diagram: jobs are submitted with globusrun to GRAM at Site1, Site2 and Site3, which front Condor, LSF and PBS respectively]

Overview
- GRAM as a uniform interface to different resource management systems
- Job submission from a single location
- Users must explicitly specify the Globus resource (Condor pool, LSF cluster, …) on which the jobs must be executed
- Globus tools (globusrun, globus-job-status, …) are used to “manage” the jobs
- Are these “robust” tools with all the required capabilities???
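In this first step the user, not the system, chooses the resource. The Python sketch below makes that explicit: a hypothetical table maps site names to GRAM contact strings (the Padova and Bologna contacts come from the usage examples; the PBS entry is an invented placeholder) and builds the globusrun argument list, refusing unknown names instead of guessing.

```python
from typing import Dict, List

# Hypothetical site table. The first two contacts come from the usage
# examples; the PBS contact is a placeholder (PBS was not tested).
GRAM_CONTACTS: Dict[str, str] = {
    "padova-lsf": "lxpd.pd.infn.it/jobmanager-lsf",
    "bologna-condor": "lxbo.bo.infn.it/jobmanager-condor",
    "site3-pbs": "site3.example.org/jobmanager-pbs",
}


def globusrun_argv(site: str, rsl_file: str, batch: bool = True) -> List[str]:
    """Build the globusrun command line for a site chosen by the user.

    There is no broker in this step, so an unknown site is an error.
    """
    if site not in GRAM_CONTACTS:
        raise KeyError(f"unknown Globus resource: {site!r}")
    argv = ["globusrun"]
    if batch:
        argv.append("-b")  # batch mode: return once the job is submitted
    argv += ["-r", GRAM_CONTACTS[site], "-f", rsl_file]
    return argv


# Example: the command for the Padova LSF cluster.
print(" ".join(globusrun_argv("padova-lsf", "file.rsl")))
# globusrun -b -r lxpd.pd.infn.it/jobmanager-lsf -f file.rsl
```

The sketch only builds the command; actually running it would still require a Globus installation and a valid proxy.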

Usage examples
%globusrun -b -r lxpd.pd.infn.it/jobmanager-lsf -f file.rsl
file.rsl:
  & (executable=$(CMS)/startcmsim.sh)
    (stdin=$(CMS)/Pythia/run.1)
    (stdout=$(CMS)/Cmsim/log.1)
    (count=1)
    (queue=cmsprod)

%globusrun -b -r lxbo.bo.infn.it/jobmanager-condor -f file.rsl
file.rsl:
  & (executable=$(CMS)/startcmsim.sh)
    (stdin=$(CMS)/Pythia/run.1)
    (stdout=$(CMS)/Cmsim/log.1)
    (count=1)
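The two RSL files differ only in the LSF-specific queue attribute. A minimal Python sketch of how such files could be generated from one job description, assuming flat attribute=value pairs only (no nested or multi-request RSL). The helper names are hypothetical; values are written verbatim so that $(CMS) is left for GRAM to substitute, and values that would need RSL quoting are rejected rather than quoted.

```python
from typing import Dict, Optional


def to_rsl(attrs: Dict[str, str]) -> str:
    """Serialize flat attribute=value pairs as an RSL conjunction."""
    for key, value in attrs.items():
        # Quoting is not implemented, so reject values that would need it.
        if any(ch.isspace() for ch in value):
            raise ValueError(f"value for {key!r} needs RSL quoting: {value!r}")
    return "&" + "".join(f"({key}={value})" for key, value in attrs.items())


def cmsim_job(run: int, queue: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical job description for one CMSIM run, as on the slide."""
    attrs = {
        "executable": "$(CMS)/startcmsim.sh",
        "stdin": f"$(CMS)/Pythia/run.{run}",
        "stdout": f"$(CMS)/Cmsim/log.{run}",
        "count": "1",
    }
    if queue is not None:
        attrs["queue"] = queue  # LSF queue, used only for the Padova cluster
    return attrs


print(to_rsl(cmsim_job(1, queue="cmsprod")))  # file.rsl for jobmanager-lsf
print(to_rsl(cmsim_job(1)))                   # file.rsl for jobmanager-condor
```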

What has been tested so far
- See INFN-GRID/Globus/gram-report.pdf
- Tests only with simple programs (just to evaluate the capabilities and functionality)
- No tests with “real” applications
- No “stress tests” (to evaluate reliability, robustness, …)
- GRAM – LSF: tested, seems to work

What has been tested so far (cont.)
- GRAM – Condor: tested
  - GRAM assumes that the underlying environment is a “uniform” Condor pool (in particular for Vanilla jobs)
  - It is therefore difficult to treat the INFN WAN Condor pool as a Globus resource
  - Use local “uniform” Condor pools instead???
- GRAM – PBS: not tested
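One plausible reading of “uniform”, assumed here rather than taken from the slides, is that every machine has the same Arch, OpSys and FileSystemDomain (real Condor machine attributes). The Python sketch checks this on hypothetical machine records and splits a heterogeneous pool into uniform sub-pools, which is the idea behind the question about local “uniform” Condor pools.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Machine = Dict[str, str]
# Assumed uniformity key: real Condor machine attributes, chosen here
# as one plausible meaning of "uniform" for Vanilla jobs.
KEY_ATTRS = ("Arch", "OpSys", "FileSystemDomain")


def uniformity_key(machine: Machine) -> Tuple[str, ...]:
    return tuple(machine[attr] for attr in KEY_ATTRS)


def split_into_uniform_pools(machines: List[Machine]) -> Dict[Tuple[str, ...], List[Machine]]:
    """Group machines so that every group is uniform under KEY_ATTRS."""
    pools: Dict[Tuple[str, ...], List[Machine]] = defaultdict(list)
    for machine in machines:
        pools[uniformity_key(machine)].append(machine)
    return dict(pools)


def is_uniform(machines: List[Machine]) -> bool:
    """True if the whole set could be exposed as a single GRAM resource."""
    return len(split_into_uniform_pools(machines)) <= 1
```

Each group returned by split_into_uniform_pools would then be a candidate for its own GRAM jobmanager.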

Second step
[Diagram: jobs are submitted with condor_submit (Globus Universe) to a Personal Condor, which passes them through GRAM (globusrun) to Condor, LSF and PBS at Site1, Site2 and Site3]

Overview
- Personal Condor is able to provide robustness and reliability
- Job submission from a single location
- Users still must explicitly specify the Globus resource on which the jobs must be executed
- Condor interface and tools (condor_submit, condor_q, …) are used to “manage” the jobs
- “Robust” tools with all the required capabilities (monitoring, logging, …)
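In Condor, robustness rests largely on a persistent job queue and a user log. The toy Python model below is not Condor's implementation; it only illustrates that principle: every state change is appended to a log file, so after a crash the submitter can replay the log and find which jobs are still unfinished. The class, the file format and the state names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List

TERMINAL_STATES = {"COMPLETED", "REMOVED"}  # hypothetical state names


class JobLog:
    """Toy append-only job log: each state change is written to disk,
    so the job table can be rebuilt after the submitter restarts."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def record(self, job_id: int, state: str) -> None:
        with self.path.open("a", encoding="utf-8") as log:
            log.write(json.dumps({"job": job_id, "state": state}) + "\n")

    def replay(self) -> Dict[int, str]:
        """Return the last recorded state of every job."""
        states: Dict[int, str] = {}
        if self.path.exists():
            for line in self.path.read_text(encoding="utf-8").splitlines():
                event = json.loads(line)
                states[event["job"]] = event["state"]
        return states


def unfinished_jobs(states: Dict[int, str]) -> List[int]:
    """Jobs that must be reconnected or resubmitted after a restart."""
    return sorted(job for job, state in states.items() if state not in TERMINAL_STATES)
```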

Usage examples
%condor_submit file.cnd
file.cnd:
  Universe=globus
  executable=$(CMS)/startcmsim.sh
  input=$(CMS)/Pythia/run.1
  output=$(CMS)/Cmsim/log.1
  GlobusScheduler=lxpd.pd.infn.it/jobmanager-lsf
  queue 1

%condor_submit file.cnd
file.cnd:
  Universe=globus
  executable=$(CMS)/startcmsim.sh
  input=$(CMS)/Pythia/run.1
  output=$(CMS)/Cmsim/log.1
  GlobusScheduler=lxbo.bo.infn.it/jobmanager-condor
  queue 1
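Compared with the RSL files, the Condor submit files change in one place only: GlobusScheduler selects the GRAM resource. The sketch below generates such a file from a contact string, reproducing the slide's commands verbatim (including $(CMS)); the function name and the run parameter are hypothetical.

```python
def globus_universe_submit(gram_contact: str, run: int = 1, count: int = 1) -> str:
    """Return a Condor submit description for the Globus Universe.

    The commands mirror the slide; only GlobusScheduler and the run
    number vary. $(CMS) is copied verbatim from the example.
    """
    lines = [
        "Universe=globus",
        "executable=$(CMS)/startcmsim.sh",
        f"input=$(CMS)/Pythia/run.{run}",
        f"output=$(CMS)/Cmsim/log.{run}",
        f"GlobusScheduler={gram_contact}",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


lsf = globus_universe_submit("lxpd.pd.infn.it/jobmanager-lsf")
condor = globus_universe_submit("lxbo.bo.infn.it/jobmanager-condor")
# Only the GlobusScheduler line differs between the two sites.
print([a for a, b in zip(lsf.splitlines(), condor.splitlines()) if a != b])
```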

Second step (option 2)
[Diagram: as in the second step, the Personal Condor submits jobs with condor_submit (Globus Universe) through GRAM to LSF and PBS; the Condor site is reached by flocking (condor_submit)]

Second step (option 3)
[Diagram: the Personal Condor uses the Globus Universe to reach LSF and PBS through GRAM, while jobs for the Condor site are submitted with condor_submit within a single Condor pool]
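The variants of the second step differ only in how jobs reach the Condor site; LSF and PBS are always reached through the Globus Universe and GRAM. The Python sketch encodes that routing decision as a lookup; the enum names and the returned labels are hypothetical.

```python
from enum import Enum


class Option(Enum):
    GLOBUS_ONLY = 1   # second step: every site reached through GRAM
    FLOCKING = 2      # option 2: the Condor site is a flocking target
    SINGLE_POOL = 3   # option 3: Personal Condor and Condor site share one pool


def submission_path(site_rms: str, option: Option) -> str:
    """Describe how a job for a site with the given local RMS leaves the Personal Condor."""
    if site_rms not in {"condor", "lsf", "pbs"}:
        raise ValueError(f"unknown resource management system: {site_rms!r}")
    if site_rms == "condor" and option is Option.FLOCKING:
        return "condor_submit, then flocking to the remote Condor pool"
    if site_rms == "condor" and option is Option.SINGLE_POOL:
        return "condor_submit within the single Condor pool"
    return "condor_submit (Globus Universe), then GRAM"


for opt in Option:
    print(opt.name, "->", submission_path("condor", opt))
```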

Problems
- The Globus Universe architecture is only a prototype
  - Only best-effort support by the Condor team
- Tests not completed
  - Ongoing tests (with the fork system call as the underlying resource management system)
  - Tests with the Globus Universe and LSF or Condor as the underlying resource management system have not yet been performed
- PBS
  - Is it supported by the Globus Universe mechanisms???
  - Do we need it???

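Third step
[Diagram: jobs are submitted to a Master, which performs resource discovery in the GIS and submits them through the Personal Condor (condor_submit, Globus Universe) to GRAM at Site1, Site2 and Site3 (Condor, LSF, PBS); the sites publish information on the characteristics and status of their local resources in the GIS]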

Overview
- The Master is smart enough to decide to which Globus resources the jobs must be submitted
- The Master uses the information on the characteristics and status of resources published in the GIS
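The Master's decision can be pictured as matchmaking over GIS records. This is a minimal Python sketch, assuming each resource publishes its GRAM contact, the underlying system, its free CPUs and its queue length; these attribute names and the ranking rule (fewest queued jobs, then most free CPUs) are illustrative assumptions, not a design from the slides.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class ResourceInfo:
    """Hypothetical GIS record describing one Globus resource."""
    contact: str      # GRAM contact string
    rms: str          # local system: "condor", "lsf" or "pbs"
    free_cpus: int
    queued_jobs: int


def choose_resource(
    resources: Iterable[ResourceInfo],
    min_free_cpus: int = 1,
    allowed_rms: Optional[Set[str]] = None,
) -> Optional[ResourceInfo]:
    """Filter on the job's requirements, then rank what is left.

    Ranking (an assumption): fewest queued jobs first, then most free CPUs.
    Returns None when no resource qualifies, so the job stays with the Master.
    """
    candidates = [
        r for r in resources
        if r.free_cpus >= min_free_cpus and (allowed_rms is None or r.rms in allowed_rms)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.queued_jobs, -r.free_cpus))
```

The chosen contact would then become the GlobusScheduler value of the job handed to the Personal Condor.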

Problems and work needed
- The Master doesn't exist → we have to implement it
- The GIS architecture must be defined
- The local GRAMs do not provide the GIS with enough information → the default schema must be extended
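The schema problem can be phrased as a gap check: compare what a GIS entry contains with what the Master needs, then add the missing attributes from locally collected information. A Python sketch with hypothetical attribute names, since the slides do not list which attributes are missing.

```python
from typing import Dict, Set

# Hypothetical attribute names: the slides say the default schema lacks
# information but do not say which attributes are missing.
MASTER_NEEDS: Set[str] = {"contact", "rms", "free_cpus", "queued_jobs"}


def missing_attributes(entry: Dict[str, object], required: Set[str] = MASTER_NEEDS) -> Set[str]:
    """Attributes the Master needs that this GIS entry does not publish."""
    return required - set(entry)


def extend_entry(entry: Dict[str, object], local_info: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of the entry extended with locally collected values."""
    extended = dict(entry)
    extended.update(local_info)
    return extended


default_entry = {"contact": "lxpd.pd.infn.it/jobmanager-lsf", "rms": "lsf"}
print(sorted(missing_attributes(default_entry)))  # ['free_cpus', 'queued_jobs']
```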

GRAM & Condor & GIS

GRAM & LSF & GIS

Fourth step
[Diagram: as in the third step (Master, GIS, Personal Condor with the Globus Universe, GRAM at Site1, Site2 and Site3 over Condor, LSF and PBS), plus a Data Catalog used by the Master for data discovery, and a Data Mover]
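In the fourth step the Master can also take data location into account. A minimal sketch, assuming the Data Catalog maps a dataset name to the set of sites holding a replica: prefer a site that already has the data and free CPUs; otherwise pick the site with the most free CPUs and name a replica site from which the Data Mover would copy the input. All names and the policy itself are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def place_job(
    dataset: str,
    catalog: Dict[str, Set[str]],
    free_cpus: Dict[str, int],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (execution site, transfer source), or (None, None) if the job cannot be placed.

    The transfer source is None when the chosen site already holds a replica;
    otherwise it names the site the Data Mover would copy the input from.
    """
    replicas = catalog.get(dataset, set())
    usable = {site: n for site, n in free_cpus.items() if n > 0}
    if not replicas or not usable:
        return None, None
    with_data = [site for site in usable if site in replicas]
    if with_data:
        # Data discovery wins: run where the input already is.
        return max(with_data, key=lambda s: (usable[s], s)), None
    # Otherwise run where capacity is largest and move the data there.
    target = max(usable, key=lambda s: (usable[s], s))
    return target, min(replicas)  # min() only makes the choice deterministic
```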