Published by Amerigo Valentino Costantini. Modified over 5 years ago.
1
GRID Workload Management System for CMS fall production
Massimo Sgaravatto INFN Padova
2
Preliminary remarks
I am going to present only some PERSONAL ideas
I think that everything must be defined ASAP (now!)
I am talking only about the workload management system
3
What do we want to implement (simplified design)
[Diagram: layered architecture, top to bottom]
- Master: jobs are submitted to it (described using ClassAds); it performs resource discovery through the Grid Information Service (GIS) and chooses the Globus resources to which the jobs must be submitted
- Grid Information Service (GIS): information on characteristics and status of local resources
- Condor-G (condor_submit, Globus Universe): able to provide reliability; Condor tools used for job monitoring, logging, …
- globusrun / Globus GRAM: uniform interface to the different local resource management systems (one GRAM per site: Site1, Site2, Site3)
- Local resource management systems (Condor, LSF, …) managing the farms
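The Master's job in this design (match each job's requirements against the resource information published in the GIS, then pick a Globus resource) can be sketched as a toy ClassAd-style match. This is a minimal illustration only, assuming resources are plain Python dictionaries; the attribute names (`contact`, `os`, `free_cpus`) and the host names are hypothetical, not the real GIS schema or the ClassAd language.

```python
from typing import Any, Callable, Dict, List, Optional

ResourceAd = Dict[str, Any]

def choose_resource(resources: List[ResourceAd],
                    requirements: Callable[[ResourceAd], bool],
                    rank: Callable[[ResourceAd], float]) -> Optional[ResourceAd]:
    """Pick the matching resource with the highest rank, or None.

    ClassAd-like in spirit only: `requirements` filters the resources,
    `rank` orders the ones that pass.
    """
    candidates = [r for r in resources if requirements(r)]
    if not candidates:
        return None
    return max(candidates, key=rank)

# Hypothetical snapshot of what the GIS might report about three farms
resources = [
    {"contact": "farm1.example.org/jobmanager-lsf", "os": "Linux", "free_cpus": 4},
    {"contact": "farm2.example.org/jobmanager-condor", "os": "Linux", "free_cpus": 12},
    {"contact": "farm3.example.org/jobmanager-pbs", "os": "Solaris", "free_cpus": 30},
]

best = choose_resource(
    resources,
    requirements=lambda r: r["os"] == "Linux" and r["free_cpus"] > 0,
    rank=lambda r: r["free_cpus"],
)
print(best["contact"] if best else "no match")
```

Here farm3 is excluded by the requirements even though it has the most free CPUs, and farm2 wins on rank among the Linux farms.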
4
What can be implemented now
[Diagram: current setup, without the Master]
- Jobs are submitted directly through Condor-G (condor_submit, Globus Universe)
- Grid Information Service (GIS): information on characteristics and status of local resources; not very useful in this model
- Condor-G: able to provide reliability; Condor tools used for job monitoring, logging, …
- globusrun / Globus GRAM: uniform interface to the different local resource management systems (one GRAM per site: Site1, Site2, Site3)
- Local resource management systems (Condor, LSF, …) managing the farms
5
Status
Tests on basic capabilities and functionalities have been performed
Problems with scalability and fault tolerance were found
The CMS production is a useful exercise to test everything with real applications in real environments
6
CMS production
Application: Pythia + Cmsim
Overview
- Job management (submission, monitoring) from a single machine using Condor tools
- The user must explicitly define the Globus resource (i.e. the farm) to which the jobs are submitted
- The applications and the input files must be stored in the file system of the executing machine
- The output files will be created in the file system of the executing machine
- We can try to have just the standard output/error files (useful to check the "status" of the production) created in the submitting machine, using bypass and/or Globus GASS
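A job in this model can be described by a Condor-G submit description that names the target farm explicitly. The sketch below builds one as text; it assumes the legacy Globus-universe keywords (`universe = globus`, `globusscheduler`) as we recall them from Condor-G of that period, so check them against your Condor version. The gatekeeper contact, script path and run name are hypothetical.

```python
def condor_g_submit_description(gatekeeper: str, executable: str,
                                arguments: str, tag: str) -> str:
    """Build the text of a Condor-G submit description for one job.

    Assumes the legacy Globus-universe keywords and an executable that is
    already installed on the remote farm (so it is not transferred).
    """
    lines = [
        "universe = globus",
        # Target farm chosen explicitly by the user: there is no broker yet
        f"globusscheduler = {gatekeeper}",
        f"executable = {executable}",
        "transfer_executable = false",
        f"arguments = {arguments}",
        # Standard output/error are the only files meant to come back to
        # the submitting machine; data files stay on the executing farm
        f"output = {tag}.out",
        f"error = {tag}.err",
        # Condor user log on the submitting machine, for monitoring
        f"log = {tag}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"

# Hypothetical gatekeeper, script path and run name
print(condor_g_submit_description(
    "farm1.example.org/jobmanager-lsf",
    "/cms/bin/run_cmsim.sh",
    "run001",
    "run001",
))
```

The resulting text would be saved to a file and passed to `condor_submit`; changing the farm means changing only the `globusscheduler` line.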
7
What is necessary
Local farms with a shared file system between the various nodes
- Done using the CMS installation toolkit
- Installation and support up to CMS/local administrators
Installation of the CMS environment on these farms
- Support up to CMS
8
What is necessary
Local resource management system to manage the local farm
- LSF
  - Installation and support up to CMS/local administrators
  - We should define in a "common" way how to configure the queue(s) where the jobs run. Who???
- Local Condor pool
  - Installation and configuration (for "dedicated" machines) using the CMS toolkit
  - Support???
- PBS
  - Are there sites where PBS will be used???
  - Tests on the Globus-PBS interaction must be completed (i.e. in a farm environment)
  - Tests on the Condor-G/Globus/PBS chain not performed yet
- Fork
  - Strongly discouraged (even for a single machine)
  - Globus must be installed on each machine
  - Job queuing up to the production manager
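Because GRAM is the uniform interface, the local resource management system mostly shows up in the job manager name of the resource contact, while the batch queue is selected through the RSL `queue` attribute. The sketch below assumes the usual GT2-style job manager names and RSL attributes (`executable`, `count`, `queue`); the queue name `cmsprod` and the host names are hypothetical. It illustrates why a "common" queue name would help: the same RSL would then work unchanged on every LSF or PBS site.

```python
from typing import Optional

# Usual GT2-style job manager service names per local scheduler
JOBMANAGERS = {
    "fork": "jobmanager-fork",
    "lsf": "jobmanager-lsf",
    "condor": "jobmanager-condor",
    "pbs": "jobmanager-pbs",
}

def gram_contact(host: str, lrms: str) -> str:
    """Return a GRAM resource contact such as host/jobmanager-lsf."""
    try:
        return f"{host}/{JOBMANAGERS[lrms]}"
    except KeyError:
        raise ValueError(f"unsupported local resource manager: {lrms}") from None

def rsl(executable: str, queue: Optional[str] = None, count: int = 1) -> str:
    """Build a GRAM RSL string; `queue` selects the local batch queue."""
    parts = [f'(executable="{executable}")', f"(count={count})"]
    if queue is not None:
        parts.append(f'(queue="{queue}")')
    return "&" + "".join(parts)

# Hypothetical farm, script and common queue name
print(gram_contact("farm1.example.org", "lsf"))
print(rsl("/cms/bin/run_cmsim.sh", queue="cmsprod"))
```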
9
What is necessary Globus
- One installation per farm (on a "visible" node)
- Use of personal certificates and host certificates signed by the INFN CA
  - User certificates signed by the Globus CA are accepted as well
  - By default it is not possible to "use" Globus resources outside INFN with personal certificates signed by the INFN CA
    - Workaround 1: users also get personal certificates signed by the Globus CA
    - Workaround 2: a "small" modification to the Globus configuration of these resources outside INFN, so that they accept "our" certificates too
- Installation
  - Done by CMS/local administrators/WP1 member (if present), using the distribution and procedures provided by the INFN GRID release team (…)
  - Is CMS going to include the Globus package in its installation toolkit???
  - In case of problems: ???
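Why an INFN certificate is rejected outside INFN, and what workaround 2 changes, can be shown with a simplified model of a gatekeeper's decision: the certificate's issuing CA must be trusted by the site, and the user's subject DN must be mapped to a local account in the grid-mapfile (lines of the form `"<DN>" <account>`). This is a sketch, not how Globus is implemented: a real gatekeeper verifies the whole certificate chain and the CA signing policy. All DNs and the account name below are hypothetical.

```python
import shlex
from typing import Dict, Optional, Set

def parse_grid_mapfile(text: str) -> Dict[str, str]:
    """Map certificate subject DNs to local accounts.

    Expects lines like: "<subject DN>" <account>. Comments (#) and blank
    lines are skipped; only the first account on a line is kept.
    """
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        fields = shlex.split(line, comments=True)
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping

def authorize(subject: str, issuer: str,
              trusted_cas: Set[str], gridmap: Dict[str, str]) -> Optional[str]:
    """Return the local account for a user, or None if rejected.

    Simplified model: a real gatekeeper verifies the full certificate chain
    (and CA signing policy), not just the issuer name.
    """
    if issuer not in trusted_cas:
        return None  # certificate signed by a CA this site does not trust
    return gridmap.get(subject)  # None if the DN is not mapped

# Hypothetical DNs, for illustration only
INFN_CA = "/C=IT/O=INFN/CN=Example INFN CA"
OTHER_CA = "/C=US/O=Example/CN=Example Globus CA"
USER = "/C=IT/O=INFN/L=Padova/CN=Example User"

gridmap = parse_grid_mapfile(f'# local users\n"{USER}" cmsprod\n')

# A site outside INFN that trusts only the other CA rejects the user...
print(authorize(USER, INFN_CA, {OTHER_CA}, gridmap))
# ...until it also trusts the INFN CA (workaround 2)
print(authorize(USER, INFN_CA, {OTHER_CA, INFN_CA}, gridmap))
```

Workaround 1 corresponds to the user presenting a certificate whose issuer is already in the site's trusted set.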
10
What is necessary Condor-G
- Just one installation, used by the production manager (Ivano Lippi?)
- Installation and maintenance: Massimo Sgaravatto???
- Scripts to run the CMS production in this GRID environment: up to CMS
- Running the production: up to the production manager
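Since everything is submitted from the single Condor-G installation, the production manager could follow the production by summarizing the Condor user log named by each job's `log` entry. The sketch assumes the user-log layout as we understand it (each event starts with a three-digit event number followed by `(cluster.proc.subproc)`, and events are separated by `...` lines) and uses only a few well-known event numbers; check these against the Condor manual. The sample log is made up.

```python
import re
from collections import Counter
from typing import Dict, Tuple

# Condor user-log event numbers used for a simple progress summary
EVENT_STATUS = {
    0: "submitted",
    1: "running",
    5: "terminated",
    9: "aborted",
    12: "held",
    13: "released",
}

# Event header: 3-digit event number, then (cluster.proc.subproc)
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\)")

def job_states(log_text: str) -> Dict[Tuple[int, int], str]:
    """Return the last known state of each (cluster, proc) in a user log."""
    states: Dict[Tuple[int, int], str] = {}
    for line in log_text.splitlines():
        m = EVENT_HEADER.match(line)
        if m is None:
            continue  # event body lines and "..." separators
        code, cluster, proc = (int(g) for g in m.groups())
        if code in EVENT_STATUS:
            states[(cluster, proc)] = EVENT_STATUS[code]
    return states

# Illustrative log excerpt (hosts and times are made up)
SAMPLE_LOG = """\
000 (012.000.000) 05/14 10:01:02 Job submitted from host: <192.0.2.10:9618>
...
000 (012.001.000) 05/14 10:01:03 Job submitted from host: <192.0.2.10:9618>
...
001 (012.000.000) 05/14 10:05:40 Job executing on host: farm1.example.org
...
005 (012.000.000) 05/14 11:30:12 Job terminated.
...
"""

states = job_states(SAMPLE_LOG)
print(Counter(states.values()))
```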
11
Some items/actors missing???
When???
Relations with other activities???
Data Management (GDMP, …)???
???