GRID Workload Management System for CMS fall production

GRID Workload Management System for CMS fall production Massimo Sgaravatto INFN Padova

Preliminary remarks
- I am going to present only some PERSONAL ideas
- I think that everything must be defined asap (now!)
- I am talking only about the workload management system

What do we want to implement (simplified design)
[Diagram: job flow from the master down to the local farms]
- Submit jobs (using Class-Ads) to the Master
- Master: resource discovery through the Grid Information Service (GIS), which holds information on characteristics and status of local resources; the Master chooses in which Globus resources the jobs must be submitted
- Master → Condor-G via condor_submit (Globus Universe)
  - Condor-G able to provide reliability
  - Use of Condor tools for job monitoring, logging, …
- Condor-G → Globus GRAM at each site (globusrun)
  - Globus GRAM as uniform interface to different local resource management systems
- Local Resource Management Systems (CONDOR, LSF, …) on the farms of Site1, Site2, Site3
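The master's role in this design is to match job requirements against what the GIS publishes and then pick a Globus resource. A toy Python sketch can illustrate that step. It is not the Condor ClassAd language or any real GIS schema. The `Resource` fields, host names and ranking rule (fewest waiting jobs, then most free CPUs) are all hypothetical, chosen only to show the Requirements/Rank idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """Hypothetical snapshot of what the GIS might publish about one farm."""
    contact: str       # GRAM contact string of the farm's gatekeeper
    free_cpus: int
    waiting_jobs: int
    has_cms_env: bool  # CMS environment installed on the farm


def choose_resource(resources: List[Resource],
                    min_free_cpus: int = 1) -> Optional[str]:
    """Pick a Globus resource for a job, or return None if nothing matches.

    Requirements (stand-in for a ClassAd Requirements expression):
      CMS environment installed and at least `min_free_cpus` free CPUs.
    Rank (stand-in for a ClassAd Rank expression):
      fewest waiting jobs first, then most free CPUs.
    """
    candidates = [r for r in resources
                  if r.has_cms_env and r.free_cpus >= min_free_cpus]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: (r.waiting_jobs, -r.free_cpus))
    return best.contact


if __name__ == "__main__":
    gis = [
        Resource("site1.example.org/jobmanager-lsf", 10, 5, True),
        Resource("site2.example.org/jobmanager-condor", 4, 0, True),
        Resource("site3.example.org/jobmanager-pbs", 20, 0, False),
    ]
    # site3 lacks the CMS environment, site2 has no waiting jobs
    print(choose_resource(gis))  # -> site2.example.org/jobmanager-condor
```

In the real design this decision would be taken by the master before calling condor_submit. The sketch only makes the selection logic concrete.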

What can be implemented now
[Diagram: the same design without the Master]
- Submit jobs via condor_submit (Globus Universe) to Condor-G
  - Condor-G able to provide reliability
  - Use of Condor tools for job monitoring, logging, …
- Grid Information Service (GIS), with information on characteristics and status of local resources: not very useful in this model
- Condor-G → Globus GRAM at each site (globusrun)
  - Globus GRAM as uniform interface to different local resource management systems
- Local Resource Management Systems (CONDOR, LSF, …) on the farms of Site1, Site2, Site3
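Job monitoring and logging rely on Condor tools, so a small script can summarize the state of a production from the Condor user log. The sketch below assumes the classic text format of that log, where each event starts with a three-digit event code followed by `(cluster.proc.subproc)`. It handles only a few codes (000 submitted, 001 executing, 005 terminated, 009 aborted, 012 held) and ignores all others. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Header of an event in the classic Condor user log:
#   "NNN (cluster.proc.subproc) MM/DD HH:MM:SS message"
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\)")

# Only a few event codes are mapped to a coarse state; others are ignored.
STATE_BY_EVENT = {0: "submitted", 1: "running", 5: "terminated",
                  9: "aborted", 12: "held"}


def job_states(log_lines: Iterable[str]) -> Dict[Tuple[int, int], str]:
    """Return {(cluster, proc): last known state} from user-log lines."""
    states: Dict[Tuple[int, int], str] = {}
    for line in log_lines:
        m = HEADER.match(line)
        if m is None:
            continue  # event body lines and "..." separators
        code = int(m.group(1))
        if code in STATE_BY_EVENT:
            states[(int(m.group(2)), int(m.group(3)))] = STATE_BY_EVENT[code]
    return states


def summary(states: Dict[Tuple[int, int], str]) -> Dict[str, int]:
    """Count jobs per state."""
    return dict(Counter(states.values()))


if __name__ == "__main__":
    sample = [  # hypothetical excerpt of a user log
        "000 (012.000.000) 10/22 14:35:10 Job submitted from host: <10.0.0.1:1234>",
        "...",
        "001 (012.000.000) 10/22 14:40:00 Job executing on host: site1",
        "...",
    ]
    print(summary(job_states(sample)))
```

Because the log stays on the submitting machine, this kind of check can run there without touching the remote farms.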

Status
- Tests on basic capabilities and functionalities have been performed
- Problems with scalability and fault tolerance were found
- The CMS production is a useful exercise to test everything with real applications in real environments

CMS production
- Application: Pythia + Cmsim ?
- Overview
  - Job management (submission, monitoring) from a single machine using Condor tools
  - The user must explicitly define in which Globus resource (which farm) the jobs must be submitted
  - The applications and the input files must be stored in the file system of the executing machine
  - The output files will be created in the file system of the executing machine
  - We can try to have just the standard output/error files (useful to check the “status” of the production) created on the submitting machine, using Bypass and/or Globus GASS
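Without a master, the target farm of each job is written by hand into a Condor-G submit description. Below is a minimal Python sketch that generates one. It assumes the Globus universe syntax of Condor-G (`universe = globus`, with the GRAM contact in `globusscheduler`). The executable path, arguments and file tag are hypothetical. `transfer_executable = false` reflects that the application is already installed on the executing machine.

```python
def globus_submit_description(contact: str, executable: str,
                              arguments: str, tag: str) -> str:
    """Build a Condor-G submit description for one Globus-universe job.

    contact    -- GRAM contact of the farm, chosen explicitly by the user
    executable -- path on the executing farm (application pre-installed)
    arguments  -- command-line arguments for the application
    tag        -- hypothetical label used to name stdout/stderr/log files
    """
    lines = [
        "universe = globus",
        f"globusscheduler = {contact}",
        f"executable = {executable}",
        f"arguments = {arguments}",
        # The executable already lives on the remote farm: do not ship it.
        "transfer_executable = false",
        # stdout/stderr file names refer to the submitting machine, where
        # the production status is checked; the Condor log stays there too.
        f"output = {tag}.out",
        f"error = {tag}.err",
        f"log = {tag}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(globus_submit_description(
        "site1.example.org/jobmanager-lsf",  # hypothetical farm
        "/cms/bin/run_cmsim.sh",             # hypothetical script
        "1001", "cmsim_run1001"), end="")
```

The generated text would be saved to a file and passed to condor_submit on the submitting machine. Output and input data files other than stdout/stderr stay on the executing farm, as described above.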

What is necessary
- Local farms with shared file system between the various nodes
  - Done using CMS installation toolkit
  - Installation and support up to CMS/local administrators
- Installation of CMS environment on these farms
  - Support up to CMS

What is necessary
- Local resource management system to manage the local farm
  - LSF
    - Installation and support up to CMS/local administrators
    - We should define in a “common” way how to configure the queue/s where the jobs run. Who ???
  - Local Condor pool
    - Installation and configuration (for “dedicated” machines) using CMS toolkit
    - Support ???
  - PBS
    - Are there sites where PBS will be used ???
    - Tests on Globus-PBS interaction must be completed (i.e. in a farm environment)
    - Tests on Condor-G – Globus – PBS not performed yet
  - Fork
    - Strongly discouraged (even for a single machine)
    - Necessary to install Globus on each machine
    - Job queuing up to the production manager
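Condor-G reaches each local resource management system through a GRAM job manager. That job manager is conventionally named after the system in the gatekeeper contact string. The sketch below builds such contact strings and an RSL `queue` clause. This clause is one possible way to express a "common" queue configuration, for example through the Condor-G `globusrsl` submit command. The host names and the queue name `cms` are hypothetical. The port is added only when given explicitly.

```python
from typing import Optional

# Conventional GRAM job manager service names, one per local resource
# management system, on a Globus Toolkit 2 gatekeeper.
JOBMANAGERS = {
    "lsf": "jobmanager-lsf",
    "condor": "jobmanager-condor",
    "pbs": "jobmanager-pbs",
    "fork": "jobmanager-fork",
}


def gram_contact(host: str, lrms: str, port: Optional[int] = None) -> str:
    """Return a contact string like "host[:port]/jobmanager-lsf"."""
    try:
        service = JOBMANAGERS[lrms.lower()]
    except KeyError:
        raise ValueError(f"no job manager known for {lrms!r}") from None
    hostpart = f"{host}:{port}" if port is not None else host
    return f"{hostpart}/{service}"


def queue_rsl(queue: str) -> str:
    """RSL clause selecting the local queue where the job should run."""
    return f"(queue={queue})"


if __name__ == "__main__":
    print(gram_contact("farm.example.org", "LSF"))         # hypothetical host
    print(gram_contact("farm.example.org", "pbs", 2119))
    print(queue_rsl("cms"))                                # hypothetical queue
```

Agreeing on one queue name across LSF and PBS farms would let the same RSL clause be reused in every submit description.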

What is necessary
- Globus
  - One installation per farm (on a “visible” node)
  - Use of personal certificates and host certificates signed by INFN CA
    - User certificates signed by Globus CA are accepted as well
    - By default it is not possible to “use” Globus resources outside INFN with personal certificates signed by INFN CA
      - Workaround 1: users also get personal certificates signed by Globus CA
      - Workaround 2: “small” modification in the Globus configuration of these resources outside INFN so that they accept “our” certificates too
  - Installation
    - Done by CMS/local administrators/WP1 member (if present) using the distribution and procedures provided by the INFN GRID release team (http://www.pi.infn.it/GRID/GRID_INST_1.1.html)
    - In case of problems: globus@infn.it
  - Is CMS going to include the Globus package in its installation toolkit ???
    - In case of problems: ???
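The certificate issue comes down to which CAs each resource trusts. The toy Python model below shows why an INFN CA certificate is rejected by default outside INFN, and how either workaround changes the outcome. The site names and CA labels are hypothetical. In a real Globus installation, trusting a CA means installing its certificate and signing policy in the resource's trusted-certificates directory (typically `/etc/grid-security/certificates`). The model deliberately leaves that detail out.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class Site:
    name: str
    trusted_cas: Set[str] = field(default_factory=set)


def accepted(user_cert_issuers: Iterable[str], site: Site) -> bool:
    """True if at least one user certificate comes from a CA the site trusts.

    Only CA trust is modelled; mapping the user to a local account
    (the grid-mapfile) is a separate step not shown here.
    """
    return any(ca in site.trusted_cas for ca in user_cert_issuers)


if __name__ == "__main__":
    infn_site = Site("INFN farm", {"INFN CA", "Globus CA"})
    outside = Site("non-INFN farm", {"Globus CA"})
    user = ["INFN CA"]

    print(accepted(user, infn_site))                # True
    print(accepted(user, outside))                  # False (default)
    print(accepted(user + ["Globus CA"], outside))  # True (workaround 1)
    outside.trusted_cas.add("INFN CA")
    print(accepted(user, outside))                  # True (workaround 2)
```

Workaround 1 changes what the user holds. Workaround 2 changes what the remote resource trusts, which is why it requires the cooperation of those sites' administrators.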

What is necessary
- Condor-G
  - Just one installation, used by the production manager (Ivano Lippi ?)
  - Installation and maintenance: Massimo Sgaravatto ???
- Scripts to run the CMS production using this GRID environment
  - Up to CMS
- Run the production
  - Up to the production manager
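The production scripts are left to CMS. As an illustration only, here is a hypothetical Python helper that plans a production. It assigns consecutive run numbers to the explicitly chosen Globus resources in round-robin order and derives a distinct random seed per run. The round-robin policy, the seed rule and the contact strings are assumptions, not part of this proposal.

```python
from itertools import cycle
from typing import List, Tuple


def plan_production(first_run: int, n_runs: int, contacts: List[str],
                    seed_base: int = 1000) -> List[Tuple[int, int, str]]:
    """Assign runs to Globus resources round-robin.

    Returns (run number, random seed, GRAM contact) tuples; the seed rule
    seed_base + run is a hypothetical choice that keeps seeds distinct.
    """
    if not contacts:
        raise ValueError("at least one Globus resource is needed")
    runs = range(first_run, first_run + n_runs)
    return [(run, seed_base + run, contact)
            for run, contact in zip(runs, cycle(contacts))]


if __name__ == "__main__":
    farms = ["site1.example.org/jobmanager-lsf",     # hypothetical
             "site2.example.org/jobmanager-condor"]
    for run, seed, contact in plan_production(1, 4, farms):
        print(run, seed, contact)
```

Each planned entry could then be turned into one Globus-universe submit description. A real script would probably weight farms by their capacity rather than cycling through them evenly.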

Some items/actors missing ???
- When ???
- Relations with other activities ???
- Data Management (GDMP, …) ???
- ???