Job Submission
M. Jouvin (LAL-Orsay), jouvin@lal.in2p3.fr
Grid Administration Training, LAL, Orsay, 15-19 September 2008

Agenda
- Main components
- Main commands
- Job Description Language (JDL)
- Advanced features
- Summary

Job Submission - M. Jouvin 13/07/2019

Workload Management System
- Responsible for optimizing resource usage and executing user jobs as quickly as possible
- Uses several components:
  - UI (User Interface): access point for users
  - WMS: grid resource broker, responsible for selecting the most appropriate resources to run user jobs
    - Formerly called RB (Resource Broker)
    - The name is easily confused with the overall Workload Management service
  - LB (Logging and Bookkeeping): stores information about all the successive states a job goes through
  - BDII (Information System): real-time view of the grid resources available and their state

Job Submission Workflow
[Diagram: job submission workflow. Components shown: User Interface, WMS (Resource Broker), Information System, Replica Catalogs, Storage Element, Computing, at Site 1 and Site 2. Arrow labels: 0. create proxy, 1. submit, 2. query, 3. query, 4. submit, 5. retrieve, 6. retrieve, publish status.]

User Interface
- UI: a host "outside" the grid, installed with the grid client tools and interfaces that allow a user to interact with the grid
  - Get a proxy
  - Submit and manage jobs
  - Transfer and manage data
- Job submission commands: glite-wms-job-xxx
  - The legacy LCG RB service (now decommissioned) used other commands: edg-job-xxx
  - edg-job-xxx commands are incompatible with WMS, and glite-wms-job-xxx commands are incompatible with LCG RB
  - Options are basically the same, apart from the new ones
  - glite-job-xxx (intermediate generation) commands are no longer supported

Main Commands
(The legacy edg-job-xxx equivalent, where one exists, is given in parentheses.)
- glite-wms-job-submit [-d|-a] (edg-job-submit)
  - Submits a job
  - Returns the jobID
- glite-wms-job-status (edg-job-status)
  - Returns the job status
- glite-wms-job-output (edg-job-get-output)
  - Retrieves from the WMS the files specified in the OutputSandbox attribute of the JDL
- glite-wms-job-cancel (edg-job-cancel)
  - Cancels a job
- glite-wms-job-delegate-proxy -d identifier
  - Creates a delegation proxy to use when submitting a job
- glite-wms-job-list-match (edg-job-list-match)
  - Lists the resources compatible with the requirements in the job description
  - Does the matchmaking without submitting the job
- glite-wms-job-logging-info (edg-job-get-logging-info)
  - Returns information about the successive states of a job, with detailed information about problems
- C, C++ and Java APIs are available for all these features

Job Submission
- Job submission uses a job description file (JDL)
  - An application is never submitted directly
  - The application may be pre-installed on the grid or transmitted with the job description
- The job submission command returns a job identifier
  - It is important to save it, in order to retrieve the job status and output later
  - Option '-o file' writes the job identifier to a file
    - Use option '-i file' with the same file in the other commands
    - The file may contain a list of jobids (it is appended to, not overwritten)
- Proxy delegation is required to interact with the WMS (WMProxy). 2 modes are available:
  - Automatic: option -a at submission time
  - Explicit: glite-wms-job-delegate-proxy -d identifier
    - Use the same '-d identifier' at submission time
    - Much more efficient when submitting a lot of jobs

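As a rough illustration of why keeping one jobid file across submissions is convenient, the Python sketch below collects the identifiers from the contents of such a file. The file layout is an assumption (one identifier per line, with blank lines and lines starting with '#' skipped), and the helper name read_job_ids and the sample identifiers are hypothetical.

```python
def read_job_ids(text):
    """Return the job identifiers found in the contents of a '-o file' file.

    Assumption: one identifier per line; blank lines and lines starting
    with '#' are treated as comments and skipped.
    """
    job_ids = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            job_ids.append(line)
    return job_ids


# Hypothetical contents after two submissions appended to the same file.
sample = """# job identifiers
https://wms.example.org:9000/aaaa

https://wms.example.org:9000/bbbb
"""

for job_id in read_job_ids(sample):
    print(job_id)
```

A script built this way can loop over the saved identifiers instead of relying on copy and paste from the terminal.
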
Job Monitoring and Output
- The status of a running job can be displayed with the command glite-wms-job-status
  - Use option '-i jobids_file' if the jobid was saved into a file at submission (glite-wms-job-submit -o)
  - 'watch -n seconds... glite-wms-job-status' gives a periodically refreshed view
    - Do not use too short an interval (minimum 30 s)
    - Do not use it for long jobs or for many jobs at once
  - '--all' shows all of one's jobs
- Detailed status information: glite-wms-job-logging-info [-v 2]
- Output retrieval: glite-wms-job-output
  - stdout, stderr, output sandbox: all the files listed in OutputSandbox
  - The user must retrieve job outputs: they cannot be sent to the user
  - Only possible after the job has "successfully" completed
  - The output sandbox is generally kept 3 weeks on the WMS

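The advice on refresh intervals can be sketched in Python as follows. The helper only assembles an argument list from the commands and options quoted on this slide (watch -n, glite-wms-job-status -i) and does not run anything; the function name and the constant are hypothetical, and the 30 s floor is simply the minimum recommended above.

```python
MIN_INTERVAL_S = 30  # minimum refresh interval recommended on the slide


def status_watch_command(jobids_file, interval_s):
    """Build a 'watch' command line that never refreshes faster than 30 s."""
    interval = max(int(interval_s), MIN_INTERVAL_S)
    return [
        "watch", "-n", str(interval),
        "glite-wms-job-status", "-i", jobids_file,
    ]


# A request for a 5 s refresh is raised to the 30 s floor.
print(" ".join(status_watch_command("jobids.txt", 5)))
```

Clamping the interval in one place keeps accidental short polling loops from loading the WMS and LB.
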
JDL File Example
- JDL: Job Description Language
  - Application and its arguments
  - Input and output files
  - "Requirements" and "Rank"
- Syntax based on Condor ClassAd

Job attributes:
    Executable = "gridTest";
    StdError = "stderr.log";
    StdOutput = "stdout.log";
    InputSandbox = {"/home/joda/test/gridTest"};
    OutputSandbox = {"stderr.log", "stdout.log"};
Data attributes:
    InputData = "lfn:testbed0-00019";
    DataAccessProtocol = "gridftp";
Resource attributes:
    Requirements = other.Architecture=="INTEL" && other.OpSys=="LINUX" && other.FreeCpus >=4;
    Rank = other.GlueHostBenchmarkSF00;

JDL Attributes
- Job attributes
  - Define the job and its parameters
- Data and resource attributes
  - Used by the WMS for resource (CE) matchmaking
  - Different types of supported resources:
    - Computing resources (based on BDII)
    - Software resources (tags) (based on BDII)
    - Data location (based on file catalogs, e.g. LFC)

Job Definition
- Executable (mandatory)
  - Application/command name
- Arguments (optional)
  - Arguments to pass to the application on the command line
- StdInput, StdOutput, StdError (optional)
  - File names for the job standard input/output/error
- Environment (optional)
  - Environment variables to define before running the application
- InputSandbox (optional)
  - List of files local to the UI that must be sent with the job to the CE
- OutputSandbox (optional)
  - List of job output files that must be retrieved from the CE and staged on the WMS for later retrieval by the user with glite-wms-job-output
  - Size limit generally between 10 and 100 MB (site dependent)
  - Files generally kept on the WMS for 3 weeks (site dependent)

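To make the job attributes above concrete, here is a minimal Python sketch that renders a dictionary as JDL text. It is only an illustration: the helper names jdl_value and render_jdl are hypothetical, it handles plain strings, numbers, booleans and lists only, and it does not escape quotes inside strings or handle expression-valued attributes such as Requirements and Rank.

```python
def jdl_value(value):
    """Format a Python value as a JDL literal (strings, numbers, lists)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    return '"' + str(value) + '"'


def render_jdl(attrs):
    """Render one 'Name = value;' line per attribute, in insertion order."""
    return "\n".join(
        f"{name} = {jdl_value(value)};" for name, value in attrs.items()
    )


# Attribute values taken from the JDL example in this deck.
job = {
    "Executable": "gridTest",
    "StdOutput": "stdout.log",
    "StdError": "stderr.log",
    "InputSandbox": ["/home/joda/test/gridTest"],
    "OutputSandbox": ["stderr.log", "stdout.log"],
}
print(render_jdl(job))
```

Generating the JDL from a data structure is mainly useful when many similar jobs are prepared by a script, for example one file per job in a collection directory.
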
Resource Attributes
- Requirements
  - Criteria to select the appropriate CE for running the job
  - May use either resources published into the BDII, or LFC (file catalog) file names/guids, or physical file names (SURL)
  - A default requirement is defined in each UI (site specific) and is used if none is specified by the user
  - LCG CE: requirements are used only by the WMS and are not passed to the CE/batch scheduler
    - This will change with the new generation of CE, the CREAM CE, about to be released
- Rank
  - Criteria to sort the resources (CE) matching the requirements
  - Based on resources published into the BDII, not on LFC names
  - Default ranking rules are defined in the UI (site dependent) and used when not explicitly specified by the user

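The split between Requirements (filtering) and Rank (sorting) can be mimicked in a few lines of Python. This is only an analogy: the real WMS evaluates ClassAd expressions against BDII data, whereas the sketch below uses plain Python functions over hypothetical CE records, with made-up host names and field names, and assumes that a higher rank value is preferred.

```python
def match_and_rank(ces, requirements, rank):
    """Keep the CEs satisfying 'requirements', best 'rank' first."""
    matching = [ce for ce in ces if requirements(ce)]
    # Assumption: the CE with the highest rank value is preferred.
    return sorted(matching, key=rank, reverse=True)


# Hypothetical CE descriptions standing in for published information.
ces = [
    {"id": "ce1.example.org", "OpSys": "LINUX", "FreeCpus": 2, "Benchmark": 1500},
    {"id": "ce2.example.org", "OpSys": "LINUX", "FreeCpus": 8, "Benchmark": 1200},
    {"id": "ce3.example.org", "OpSys": "LINUX", "FreeCpus": 16, "Benchmark": 1800},
]

selected = match_and_rank(
    ces,
    requirements=lambda ce: ce["OpSys"] == "LINUX" and ce["FreeCpus"] >= 4,
    rank=lambda ce: ce["Benchmark"],
)
print([ce["id"] for ce in selected])
```

Here ce1 is dropped by the requirement on free CPUs, and the two remaining CEs are ordered by the benchmark value used as rank.
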
Data Attributes
- InputData (optional)
  - List of data files that must be present on a "close SE" of the CE for the job to run
  - Can be an LFN (LFC name) or a PFN (SE name, called SURL)
  - A close SE is an SE published into the BDII as having a good connection with the CE (based on declaration, not on tests)
  - Different from InputSandbox: no file transfer will occur for these files
- DataAccessProtocol (mandatory if InputData is specified)
  - A list of transfer protocols the application may use to access the files listed in InputData (e.g. gsiftp, rfio, dcap…)
- OutputSE and OutputData have been deprecated and must not be used

Job Collections…
- Goal: submit many jobs at once to improve submission performance
- Arbitrary collections: 1 JDL per job
  - Option '--collection directory'
- DAG (Directed Acyclic Graph): chained jobs
  - Option '--dag directory'
- Parametric jobs: 1 job run with several sets of parameters
  - Deprecated: use job collections instead
- For DAGs and collections, the directory given to the option contains the JDL files of all sub-jobs
  - The 'JDL file' parameter should not be specified

… Job Collections
- 1 jobid is attached to the whole collection
  - Allows checking the status, retrieving the outputs or cancelling all the jobs with one command (same command as for a single job)
  - Output: 1 directory per sub-job + 1 ids_nodes.map file giving the mapping between sub-jobs and directories
- The collection status is the aggregated status of all jobs
  - Completed when all jobs are completed
  - Successful if all jobs are successful
- Each sub-job also has its own jobid
  - Sub-jobs can be managed like single jobs: check status, retrieve output, cancel…
  - When submitting many jobs at once, it is more efficient to use a collection, even if each job is managed independently afterwards

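The aggregation rule for a collection can be written down as a small Python sketch. The SubJob record and the collection_summary helper are hypothetical names used only for illustration; the actual states reported by the WMS are richer than two booleans.

```python
from dataclasses import dataclass


@dataclass
class SubJob:
    job_id: str
    completed: bool
    successful: bool


def collection_summary(sub_jobs):
    """Aggregate sub-job states following the two rules stated above."""
    # Completed only when every sub-job is completed.
    completed = all(job.completed for job in sub_jobs)
    # Successful only when the collection is completed and every sub-job succeeded.
    successful = completed and all(job.successful for job in sub_jobs)
    return {"completed": completed, "successful": successful}


jobs = [
    SubJob("node_0", completed=True, successful=True),
    SubJob("node_1", completed=True, successful=False),
    SubJob("node_2", completed=False, successful=False),
]
print(collection_summary(jobs))      # one sub-job still running
print(collection_summary(jobs[:2]))  # all finished, one failed
print(collection_summary(jobs[:1]))  # all finished and successful
```
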
Job Perusal
- Goal: access (some) output files while the job is running
  - Can be done for any file produced by the job
  - Done by the WMS periodically "polling" the CE and retrieving the current file contents
- Requires 2 additional lines in the JDL:
    PerusalFileEnable = true;
    PerusalTimeInterval = 120; # In seconds, not too low
- Definition and retrieval of files with 1 command:
    glite-wms-job-perusal [--set|--get|--unset] -f file jobid
  - --set: defines the files to monitor, 1 -f per file
  - --get: retrieves the difference between the file and the previously retrieved version
    - --all retrieves the whole file, even if some chunks have already been transferred
    - --nodisplay stores the files rather than displaying them
  - --unset: cancels file monitoring (per file)
- Use only when needed: it may impact WMS performance

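For scripted use, the perusal command lines described above can be assembled as in the Python sketch below. It only builds argument lists from the options listed on this slide (--set, --get, -f, --all, --nodisplay) and does not execute anything; the function names, file names and job identifier are hypothetical, and the exact option ordering accepted by the client is an assumption.

```python
def perusal_set_command(job_id, files):
    """Command enabling perusal for several files: one -f per file."""
    argv = ["glite-wms-job-perusal", "--set"]
    for name in files:
        argv += ["-f", name]
    argv.append(job_id)
    return argv


def perusal_get_command(job_id, name, whole_file=False, store=False):
    """Command retrieving the new chunks (or the whole file) of one file."""
    argv = ["glite-wms-job-perusal", "--get", "-f", name]
    if whole_file:
        argv.append("--all")        # retrieve everything, not just the difference
    if store:
        argv.append("--nodisplay")  # store the file instead of displaying it
    argv.append(job_id)
    return argv


job_id = "https://wms.example.org:9000/aaaa"  # hypothetical identifier
print(" ".join(perusal_set_command(job_id, ["stdout.log", "progress.txt"])))
print(" ".join(perusal_get_command(job_id, "stdout.log", store=True)))
```
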
VO Software Area
- Each VO has a dedicated space on each CE to install its software
  - Space shared between the CE and the WNs
  - A job references this space through an environment variable: VO_VONAME_SW_DIR
    - VONAME is the VO name with '.' and '-' replaced by '_'
- Write access only for the VO SW manager
  - World readable: do not store sensitive or private data there, unless the default permissions are adjusted
  - Software Manager is a VOMS role (managed by the VO manager)
  - The SW area is managed and updated by submitting jobs with the SW manager role
- The SW area contents may be published into the BDII using tags
  - lcg-ManageVOTag -host CE -vo voname …
  - Allows selecting a CE based on the SW installed

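A job script can derive the variable name from the VO name, as in the following Python sketch. The replacement of '.' and '-' by '_' comes from the rule above; converting the name to upper case is an assumption suggested by the VONAME notation, and the helper names and the sample path are hypothetical.

```python
import os


def vo_variable(vo_name, suffix):
    """Build the name of a per-VO environment variable such as VO_<VONAME>_SW_DIR."""
    # '.' and '-' are replaced by '_'; upper-casing is an assumption.
    token = vo_name.replace(".", "_").replace("-", "_").upper()
    return f"VO_{token}_{suffix}"


def vo_software_dir(vo_name, environ=os.environ):
    """Return the VO software area path, or None if the variable is not set."""
    return environ.get(vo_variable(vo_name, "SW_DIR"))


print(vo_variable("vo.lal.in2p3.fr", "SW_DIR"))

# Hypothetical worker-node environment, passed explicitly for the example.
fake_env = {"VO_ATLAS_SW_DIR": "/opt/exp_soft/atlas"}
print(vo_software_dir("atlas", fake_env))
```

Passing the environment as a parameter keeps the helper easy to try out on a UI, where the variable is normally not defined.
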
Writing Data to SE
- A running job has the same credential (proxy) as the user who submitted it
  - Do not forget to use proxy renewal if the job may end after the proxy expiration
    - Use the MyProxyServer attribute in the JDL
  - The job can access any SE the user has the right to access
- To write an output file to an SE, the job must select an SE
  - It may be hard coded in the job: not necessarily at the same site where the job is running
  - A (per VO) default SE is defined on each worker node and may be discovered with an environment variable: VO_VONAME_DEFAULT_SE
    - VONAME is the VO name with '.' and '-' replaced by '_'

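The two ways of selecting an SE, hard coded or discovered from the worker node environment, could be combined as in this Python sketch. The function name choose_se and the host names are hypothetical, and upper-casing the VO name when building the variable name is an assumption.

```python
import os


def choose_se(vo_name, preferred_se=None, environ=os.environ):
    """Return the SE to write to: an explicit choice, else the VO default SE."""
    if preferred_se:
        return preferred_se  # hard-coded SE, possibly at another site
    # '.' and '-' replaced by '_' as stated above; upper-casing is an assumption.
    token = vo_name.replace(".", "_").replace("-", "_").upper()
    variable = f"VO_{token}_DEFAULT_SE"
    default_se = environ.get(variable)
    if default_se is None:
        raise RuntimeError(f"{variable} is not defined on this node")
    return default_se


# Hypothetical worker-node environment, passed explicitly for the example.
fake_env = {"VO_DTEAM_DEFAULT_SE": "se.site1.example.org"}
print(choose_se("dteam", environ=fake_env))
print(choose_se("dteam", preferred_se="se.site2.example.org", environ=fake_env))
```

Failing loudly when the variable is missing avoids a job silently writing nowhere after hours of computation.
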
Summary
- The grid is like a very large batch system
  - The main component is the WMS, in charge of selecting the appropriate CE based on the user's requirements
  - For this matchmaking, the WMS uses the information system and the file catalogs
- gLite WMS is a generic job submission service with many advanced features:
  - "Bulk" submission: job collections, DAG jobs…
  - Shallow resubmission, fuzzy ranking…
  - VOMS proxy renewal (including VOMS attributes)
  - Significant performance (demonstrated > 20 kjobs/day/WMS)
- Not the only solution:
  - Other brokers like GridWay
  - Pilot job frameworks (similar to Condor glide-ins) like DIRAC, PanDA…
  - "Workflow managers" like TAVERNA, MOTEUR, …

Useful Links
- Man pages for the glite-wms-job-xxx commands
- GRIF gLite tutorial:
  - https://trac.lal.in2p3.fr/GridSupport/wiki/Tutorial/JobSubm
  - https://trac.lal.in2p3.fr/GridSupport/wiki/Tutorial/VOSoftware