Workload Management System (WMS) & Job Description Language (JDL)
Marcello Iacono Manno, INFN Catania
PI2S2 First Tutorial, Messina, 09.01.2007
Summary
- WMS (Workload Management System) is a part of the Grid middleware that
  - collects the requests for resources from the jobs
  - collects information about the status of the infrastructure
  - matches requests with offers
  - specifically, chooses a CE (Computing Element) for execution
- JDL (Job Description Language) is a language for describing a job request to the Grid; it
  - specifies input, output and executable files
  - indicates the kind of job (normal, collection, DAG, etc.)
  - gives user preferences about matchmaking
Catania, PI2S2 Tutorial, 09.01.2007
WMS’s Architecture
- Job management requests (submission, cancellation) are expressed via a Job Description Language (JDL)
WMS’s Architecture
- Keeps submission requests
- Requests are kept for a while if no matching resources are available
The Task Queue
- Task Queue for non-matching requests
  - makes it possible to keep a submission request for a while if no resources that match the job requirements are immediately available
  - a technique also used by the AliEn and Condor systems
- Non-matching requests will be retried either
  - periodically (eager scheduling approach), or
  - as soon as notifications of available resources appear in the ISM (Information Super Market) (lazy scheduling approach)
WMS’s Scheduling Policies
The WM can adopt:
- eager scheduling ("push" model)
  - a job is bound to a resource as soon as possible and, once the decision has been taken, the job is passed to the selected resource for execution
- lazy scheduling ("pull" model)
  - the job is held by the WM until a resource becomes available, at which point that resource is matched against the submitted jobs
  - the job that fits best is passed to the resource for immediate execution
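The two policies can be contrasted with a minimal Python sketch. It is not the WM implementation: the dictionary fields (`name`, `cpus`, `free_cpus`), the matching rule and the "best fit" criterion are all assumptions made for illustration.

```python
from collections import deque

def satisfies(job, resource):
    """Assumed matching rule: the resource has enough free CPUs for the job."""
    return resource["free_cpus"] >= job["cpus"]

def eager_schedule(job, resources, task_queue):
    """Push model: bind the job to a matching resource right away."""
    for resource in resources:
        if satisfies(job, resource):
            resource["free_cpus"] -= job["cpus"]
            return resource["name"]
    task_queue.append(job)  # no match: keep the request for a later retry
    return None

def lazy_schedule(resource, task_queue):
    """Pull model: a resource became available, pick the held job that fits best."""
    candidates = [job for job in task_queue if satisfies(job, resource)]
    if not candidates:
        return None
    # Assumed "best fit": the job that leaves the fewest CPUs idle.
    best = max(candidates, key=lambda job: job["cpus"])
    task_queue.remove(best)
    resource["free_cpus"] -= best["cpus"]
    return best["name"]

queue = deque()
resources = [{"name": "ce1", "free_cpus": 2}]
first = eager_schedule({"name": "j1", "cpus": 1}, resources, queue)
second = eager_schedule({"name": "j2", "cpus": 4}, resources, queue)
pulled = lazy_schedule({"name": "ce2", "free_cpus": 8}, queue)
```

In the eager case the decision is driven by the arrival of a job; in the lazy case it is driven by the arrival of a resource, which is why the held jobs have to live in a queue.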
WMS’s Architecture
- Finds an appropriate CE for each submission request, taking into account
  - job requests and preferences
  - Grid status
  - utilization policies on resources
WMS’s Architecture
- Performs the actual job submission and monitoring
WMS’s Architecture
- Repository of resource information available to the matchmaker
- Updated via notifications and/or active polling on sources
The Information Supermarket
- The Information SuperMarket (ISM) has been inherited from the EU DataGrid (EDG) project
- decoupling the collection of information concerning resources from its use
  - allows flexible application of different policies
- basically consists of a repository of resource information that is available in read-only mode to the matchmaking engine
- the update is the result of
  - the arrival of notifications
  - active polling of resources
  - some arbitrary combination of both
- can be configured so that certain notifications trigger the matchmaking engine
  - improves the modularity of the software
  - supports the implementation of lazy scheduling policies
Job Submission Services
WMS components handling the job during its lifetime and performing the submission:
- Job Adapter is responsible for
  - making the final touches to the JDL expression for a job before it is passed to CondorC for the actual submission
  - creating the job wrapper script that creates the appropriate execution environment on the CE worker node
    - transfer of the input and of the output sandboxes
- CondorC is responsible for
  - performing the actual job management operations
    - job submission, job removal
- DAGMan is a meta-scheduler
  - its purpose is to navigate the graph
  - determine which nodes are free of dependencies
  - follow the execution of the corresponding jobs
  - an instance is spawned by CondorC for each handled DAG
- Log Monitor is responsible for
  - watching the CondorC log file
  - intercepting interesting events concerning active jobs
    - events affecting the job state machine
  - triggering appropriate actions
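The DAGMan step "determine which nodes are free of dependencies" can be sketched in a few lines of Python. This is only an illustration of the idea, not DAGMan code; the function name, the `PARENTS` mapping and the example graph are hypothetical.

```python
def ready_nodes(parents, done, running=()):
    """Return the nodes whose parents have all completed and that have not started yet."""
    done = set(done)
    started = done | set(running)
    return sorted(
        node
        for node, deps in parents.items()
        if node not in started and set(deps) <= done
    )

# Hypothetical DAG: node "a" feeds "b", "c" and "d", which all feed "e".
PARENTS = {"a": [], "b": ["a"], "c": ["a"], "d": ["a"], "e": ["b", "c", "d"]}

at_start = ready_nodes(PARENTS, done=[])
after_a = ready_nodes(PARENTS, done=["a"])
```

Calling the function again each time a node finishes gives the next batch of jobs that can be submitted.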
Job Logging & Bookkeeping
- L&B tracks jobs in terms of events
  - important points of job life: submission, finding a matching CE, starting execution, etc.
  - gathered from various WMS components
- events are passed to a physically close component of the L&B infrastructure
  - the locallogger, to avoid network problems
  - it stores them in a local disk file and takes over the responsibility to deliver them further
- the destination of an event is one of the bookkeeping servers
  - assigned statically to a job upon its submission
  - processes the incoming events to give a higher-level view of the job states: Submitted, Running, Done
  - various recorded attributes: JDL, destination CE name, job exit code
- retrieval of both job states and raw events is available via legacy (EDG) and WS querying interfaces
  - the user may also register to receive notifications on particular job state changes
Job State Machine
- Submitted: the job has been entered by the user via the User Interface but not yet transferred to the Network Server for processing.
- Waiting: the job has been accepted by the NS and is waiting for Workload Manager processing or is being processed by WM Helper modules.
- Ready: the job has been processed by the WM and its Helper modules (CE found) but not yet transferred to the CE (local batch system queue) via JC and CondorC. This state does not exist for a DAG, as a DAG is not subject to matchmaking (its nodes are) but is passed directly to DAGMan.
- Scheduled: the job is waiting in the queue on the CE. This state does not exist for a DAG either, as a DAG is not sent directly to a CE (its nodes are).
- Running: the job is running. For a DAG this means that DAGMan has started processing it.
- Done: the job has exited or is considered to be in a terminal state by CondorC (e.g., submission to the CE has failed in an unrecoverable way).
- Aborted: job processing was aborted by the WMS (waiting in the WM queue or on the CE for too long, over-use of quotas, expiration of user credentials).
- Cancelled: the job has been successfully cancelled on user request.
- Cleared: the output sandbox was transferred to the user or removed due to the timeout.
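The states listed above can be written down as a small Python state machine. The state names come from the slides; the transition table is an assumption inferred from the state descriptions (the slides show the real diagram only as a figure), so treat it as a sketch.

```python
from enum import Enum

class JobState(Enum):
    SUBMITTED = "Submitted"
    WAITING = "Waiting"
    READY = "Ready"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    DONE = "Done"
    ABORTED = "Aborted"
    CANCELLED = "Cancelled"
    CLEARED = "Cleared"

S = JobState

# Assumed transitions: the normal path, plus Done reachable when CondorC
# gives up on a submission before the job actually runs.
TRANSITIONS = {
    S.SUBMITTED: {S.WAITING},
    S.WAITING: {S.READY},
    S.READY: {S.SCHEDULED, S.DONE},
    S.SCHEDULED: {S.RUNNING, S.DONE},
    S.RUNNING: {S.DONE},
    S.DONE: {S.CLEARED},
    S.ABORTED: set(),
    S.CANCELLED: set(),
    S.CLEARED: set(),
}

# Assumption: a job can be aborted by the WMS or cancelled by the user
# at any point before it reaches Done.
for state in (S.SUBMITTED, S.WAITING, S.READY, S.SCHEDULED, S.RUNNING):
    TRANSITIONS[state] |= {S.ABORTED, S.CANCELLED}

def advance(current, new):
    """Return the new state, or raise ValueError if the transition is not allowed."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```

A client polling L&B could use such a table to sanity-check the sequence of states it observes for a job.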
DAG State Machine
JDL introduction
[Figure: job, executable, input files, output files, resources, interface, JDL, GRID]
- JDL describes a request to the WMS
  - allowing the submission of one or more jobs
- the single items are called attributes
  - job descriptors are mandatory attributes
  - resource descriptors are supplied by the Information Service (IS) following the Glue (gLite) schema
JDL generalities
- JDL describes a request to the WMS
- JDL allows
  - submission, cancellation, status query, output retrieval
  - with either CLI or GUI
  - two versions: legacy Network Server interface (socket), WMProxy (web service)
- job description: Condor classified advertisements (classads)
  - descriptors are called attributes
  - mandatory attributes: job fundamental descriptors
  - resource attributes are related to the status and characteristics of grid resources
    - Information Service (IS) schema (Glue for gLite)
JDL format
- a list of entries enclosed by [ ] (each terminated by ;)
- entry: <attribute> = <value> | <list of values>;
- attribute: a string with the name of the attribute
- value:
  - String: "abc" (a double-quoted string)
  - Integer: 1234
  - Floating Point: 12.34
  - Boolean: true, false, or an expression (see GLUESchema)
  - classads (see Nodes)
- list of values: enclosed by { } and separated by ,
  - { "abc", "bcd", "def" }
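The format rules above are simple enough to generate mechanically. The Python sketch below renders a dictionary as a JDL description; `to_jdl`, `to_jdl_value` and the `Expr` marker class are hypothetical helpers written for this tutorial, not part of any gLite tool, and nested classads are deliberately left out.

```python
class Expr(str):
    """Marker for values that must be emitted unquoted (e.g. a Requirements expression)."""

def to_jdl_value(value):
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, bool):  # must be tested before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return '"' + value.replace('"', '\\"') + '"'
    if isinstance(value, (list, tuple)):
        return "{ " + ", ".join(to_jdl_value(item) for item in value) + " }"
    raise TypeError(f"unsupported JDL value: {value!r}")

def to_jdl(attributes):
    """Render a mapping of attribute names to values as a [ ... ] list of entries."""
    lines = [f"  {name} = {to_jdl_value(value)};" for name, value in attributes.items()]
    return "[\n" + "\n".join(lines) + "\n]"

print(to_jdl({
    "Type": "Job",
    "Executable": "a.exe",
    "InputSandbox": ["a.exe", "in.dat"],
    "NodeNumber": 3,
    "FuzzyRank": False,
}))
```

The file names and values in the usage example are placeholders; the attribute names are those introduced in the rest of this tutorial.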
Request types
Type = <string>
- "Job": a simple job (default)
- "DAG": a Directed Acyclic Graph of dependent jobs
- "Collection": a set of independent jobs
[Figure: a Job (node a), a DAG (nodes a, b, c, d, e with dependencies), a Collection (nodes a, b, c, d, e); legend: node, dependency]
JobType
JobType = <string>
- "Normal": a simple job
- "Interactive": a job whose standard streams are forwarded to the submitting client
- "MPICH": an MPI parallel job
- "Partitionable": a job composed of a set of independent steps / iterations for parallel execution
- "Checkpointable": a job able to save its state in order to be suspended and resumed from the same point
- "Parametric": a job with parametric attributes in its JDL, used to submit many similar instances with a single command (only parameterized attributes vary)
Executable
Executable = <string>
- the path of the command/executable file (*)
  - "/usr/bin/java/j2sdk1.5.0/bin/java", "/home/user/executable.exe"
- environment variables accepted (*)
  - "$JAVA/bin/java"
- local or absolute paths accepted (*)
  - "executable.exe" requires an identical file entry in the Input Sandbox
- remote files can be specified by gsiftp (local path exec)
- mandatory for all jobs
- wild cards not allowed
- arguments are reported in a dedicated attribute
(*) on the executing WN
Arguments
Arguments = <string>
- the arguments for the executable file, e.g. "-out outputfile.dat"
- together with Executable = "execprog"; this originates on the WN the command line:
    $ execprog -out outputfile.dat
- quotes (") have to follow a backslash (\)
  - "-a \"quoted string\" -bcd" becomes (with the above executable)
    $ execprog -a "quoted string" -bcd
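The quoting rule is easy to get wrong by hand, so here is a minimal Python sketch that builds the Arguments entry from a plain command-line string. The helper name `jdl_arguments` is hypothetical, and only the backslash-before-quote rule stated above is handled.

```python
def jdl_arguments(arguments):
    """Build a JDL Arguments entry, putting a backslash before every double quote."""
    escaped = arguments.replace('"', '\\"')
    return f'Arguments = "{escaped}";'

# Prints the Arguments entry for the quoted example above.
print(jdl_arguments('-a "quoted string" -bcd'))
```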
StdInput, StdOutput, StdError
StdInput, StdOutput, StdError = <string>
- the paths of the standard input, output and error files
- same rules as the 'Executable' attribute
  - require identical entries in the Input/Output Sandboxes
- StdOutput and StdError can be the same file
- StdInput is not required for Interactive jobs
- examples:
    StdInput = "/home/iacono/config.dat";
    StdOutput = "gsiftp://grid999.ct.infn.it:1234/tmp/file.out";
InputSandbox & OutputSandbox
InputSandbox, OutputSandbox = <string | list of strings>
identifying …
- the input file(s) to be copied from the local UI (or a gridFTP server) to the WN before starting execution
- the output file(s) to be prepared for downloading from the WN after job execution completion (transfer: glite-job-output)
- wild cards admitted if resolved locally
- LFN files not admitted (InputData or script copy required)
- files in InputSandbox must not exceed 10 MB in size
- different file names required (same destination directory)
- example:
    InputSandbox = { "myinp1.dat", "data/myinp2.dat" };
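Two of the rules above (the size limit and the distinct file names) can be checked on the UI before submitting. The Python sketch below is a hypothetical pre-submission check, not a gLite tool; it assumes the 10 MB limit applies to the whole sandbox and interprets 10 MB as 10 * 1024 * 1024 bytes, and it takes file sizes as input so that it does not touch the file system.

```python
import posixpath

MAX_SANDBOX_BYTES = 10 * 1024 * 1024  # assumed reading of the "10 MB" limit

def check_input_sandbox(files):
    """files maps each sandbox path to its size in bytes; returns a list of problems."""
    problems = []
    seen = {}
    for path in files:
        # All files land in the same destination directory, so base names must differ.
        name = posixpath.basename(path)
        if name in seen:
            problems.append(f"duplicate file name {name!r}: {seen[name]} and {path}")
        else:
            seen[name] = path
    total = sum(files.values())
    if total > MAX_SANDBOX_BYTES:
        problems.append(f"sandbox too large: {total} bytes")
    return problems

ok = check_input_sandbox({"myinp1.dat": 100, "data/myinp2.dat": 200})
clash = check_input_sandbox({"a/in.dat": 1, "b/in.dat": 1})
```

Larger inputs would go through InputData or be copied by the job script, as noted in the list above.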
InputData
InputData = <string | list of strings>
- identifies LFNs, GUIDs, LDS entries and/or queries
- the related Data Catalog is queried to retrieve the PFNs
- input queried files into the WN current directory
- influences the WMS matchmaking decision
- example:
    InputData = {
      "lfn:/grid/gilda/isospin.dat",
      "guid:135b7b23-4a6a-87e7-9d101f8c8b70",
      "lds:testfile.inp",        // LDS catalog
      "query: select my_files",  // LDS catalog
      "si-lfn:/file.inp"         /* StorageIndex catalog */
    };
- when the catalog is not specified (first two entries): if the StorageIndex attribute is declared, the catalog it points to is used; otherwise the WMS tries first RLS and then DLI
StorageIndex & DataCatalog
StorageIndex = <string> (*)
- the endpoint URL of the StorageIndex (SI) service used to resolve file names
- for si-lfn: or si-guid: files in InputData
- when StorageIndex is not specified, the VO default SI is used
- example:
    StorageIndex = "https://grid017.ct.infn.it:8443/gilda/glite-data-catalog-service-fr-mysql/services/SEIndex";
DataCatalog = <string> (*)
- the endpoint URL of the RLS or DLI service to be used to resolve file names
- example:
    DataCatalog = "https://grid017.ct.infn.it:8443/gilda/glite-data-catalog-service-fr-mysql/services/FiremanCatalog";
(*) for usage with InputData only
OutputSE & OutputData
OutputSE = <string>
- a string representing the URL of an SE for storing output data
- influences the matchmaking decision by the RB
- example: OutputSE = "grid009.ct.infn.it";
OutputData = <list of classads> (*)
- list of classads describing output files
- similar to DataRequirements
- automatic upload of output files upon job completion
(*) not yet supported
VirtualOrganisation, RetryCount & ShallowRetryCount
VirtualOrganisation = <string>
- the name of the VO
- has to match with the WMS
- default overwritten by the --vo option in glite-job-submit or edg-job-submit
- example: VirtualOrganisation = "cometa";
RetryCount, ShallowRetryCount = <Integer>
- indicate how many times the job must be re-submitted upon a failure due to some grid resource
- not valid for DAG / Collection
- limited by the MaxRetryCount parameter of the WMS
- resubmission is 'shallow' if the user job aborted before running
- a 'deep' resubmission resets the shallow retry count
LBAddress & MyProxyServer
LBAddress = <string>
- the address of the LB server
- format: <host>[:<port>]
- default taken from the WMS configuration (port = 9000)
- example: LBAddress = "lb-grid.ct.infn.it";
MyProxyServer = <string>
- the address of the proxy server
- automatic renewal of the proxy certificate (long jobs)
- port defaults to 7512
- example: MyProxyServer = "grid001.ct.infn.it:7512";
NodeNumber & ListenerPort
NodeNumber = <Integer>
- an integer >1 specifying how many CPUs are needed for an MPICH job
- mandatory for JobType = "MPICH"
- example: NodeNumber = 3;
ListenerPort = <Integer>
- the port number where condor_console_shadow listens for job standard streams
- for usage with JobType = "Interactive"
- example: ListenerPort = 44000;
ListenerHost & ListenerPipeName
ListenerHost = <string>
- the name of the host on which condor_console_shadow listens for job standard streams
- for usage with JobType = "Interactive"
- used when the submission and the interactive session are on different machines
ListenerPipeName = <string>
- the absolute path of the pipes where the job standard streams are located
- example: ListenerPipeName = "/tmp/pipe";
  - means: stdin=/tmp/pipe.in, stdout=/tmp/pipe.out
  - (default=/tmp/listener/<jobID unique string>)
JobSteps
JobSteps = <Integer | list of strings>
- either an integer representing the number of steps to run, or
- a list of strings with the labels associated with the steps of a partitionable or checkpointable job
- the main stepper is a part of the user job (not of the WMS) that links it to the run-time checkpointing library
- if also specified in the JobState classad, this definition prevails
- examples:
    JobSteps = 1000;                (runs 1000 steps of the main stepper)
    JobSteps = { "a", "b", "c" };   (runs sections "a", "b", "c" of the main stepper)
CurrentStep & JobState
CurrentStep = <Integer>
- an integer number (>0) indicating the step number to be taken as the initial one when submitting a checkpointable or partitionable job
- example: CurrentStep = 2; (default=0)
JobState = <classad>
- a job checkpoint state to start with when submitting a checkpointable job
- example:
    JobState = [
      JobSteps = 1000;
      CurrentStep = 350;
      UserData = [ DumpPath = "gsiftp://grid999.ct.infn.it:1234/tmp/dumpfile" ]
    ]
GLUESchema
GLUESchema (Grid Laboratory Uniform Environment)
- an information model to describe grid sites and services
- independent of any specific implementation
- syntax: <entity>.<property>
- examples:
    entity = < Site | Service | SubCluster | StorageElement | Host | ... | Other >
    property = < UniqueID | NameUniqueID >
    Requirements = other.GlueCEUniqueID == "grid010.ct.infn.it:2119/jobmanager-lcgpbs-infinite";
    Requirements = other.GlueCEInfoLRMSType == "PBS" || other.GlueCEInfoLRMSType == "LSF";
Requirements & UserTags
Requirements = <Boolean expression built with classads>
- a Boolean classad expression with C-like operators
- describes the job requirements on resource (CE) attributes according to the GlueSchema
- CE attributes published in the IS are referenced with the prefix "other."
- example:
    Requirements = other.GlueCEInfoTotalCPUs > 2 && other.GlueCEPolicyMaxRunningJobs < 2;
UserTags = <classad>
- a classad attribute that allows the user to specify user-defined key, value pairs (values must be strings)
- tags can be used to query the LB
Rank & FuzzyRank
Rank = <Floating Point expression built with classads>
- a classad Floating Point expression giving the rank of each CE that matches the Requirements expression
- the CE with the highest rank will be selected for job execution
- examples:
    Rank = other.GlueCEPolicyMaxRunningJobs - other.GlueCEStateRunningJobs;
      (CE with the greatest number of free slots)
    Rank = - other.GlueCEStateEstimateResponseTime;
      (CE with the shortest estimated traversal time through the local batch system queue)
FuzzyRank = <Boolean>
- a Boolean attribute enabling the fuzzification of the rank attribution
- defaults to false
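How Requirements and Rank work together can be shown with a small Python sketch of the matchmaking step: filter the CEs with the Boolean expression, then pick the one with the highest rank. The Glue attribute names are the ones used in this tutorial; the CE identifiers and their values are invented for the example, and `select_ce` is a hypothetical helper rather than the real matchmaker.

```python
def select_ce(ces, requirements, rank):
    """Keep the CEs that satisfy the requirements, then return the highest-ranked one."""
    matching = [ce for ce in ces if requirements(ce)]
    if not matching:
        return None
    return max(matching, key=rank)

# Invented CE descriptions using Glue attribute names.
CES = [
    {"GlueCEUniqueID": "ce-a", "GlueCEInfoTotalCPUs": 2,
     "GlueCEPolicyMaxRunningJobs": 50, "GlueCEStateRunningJobs": 0},
    {"GlueCEUniqueID": "ce-b", "GlueCEInfoTotalCPUs": 16,
     "GlueCEPolicyMaxRunningJobs": 20, "GlueCEStateRunningJobs": 18},
    {"GlueCEUniqueID": "ce-c", "GlueCEInfoTotalCPUs": 8,
     "GlueCEPolicyMaxRunningJobs": 10, "GlueCEStateRunningJobs": 3},
]

# Requirements: more than 2 CPUs in total.
def requirements(ce):
    return ce["GlueCEInfoTotalCPUs"] > 2

# Rank: number of free slots (maximum running jobs minus currently running jobs).
def rank(ce):
    return ce["GlueCEPolicyMaxRunningJobs"] - ce["GlueCEStateRunningJobs"]

best = select_ce(CES, requirements, rank)
```

Note that "ce-a" has the most free slots but is never ranked, because it fails the requirements first.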
Nodes (1/2)
Nodes = <list of classads>
- classads describe nodes and their dependencies
- description: a classad containing a JDL file to describe a node
- file: a string indicating the absolute path of a JDL file describing a node
- example:
    nodes = [
      a = [                        /* node "a" */
        description = [
          JobType = "Normal";
          Executable = "a.exe";
          InputSandbox = {…};
        ];
      ];
      b = [                        /* node "b" */
        file = "node_b.jdl";
      ];
      …
Parametric Jobs
JobType = "Parametric"
- results in the submission of a set of jobs that are identical except for a parameter varying from one job to another
- each job receives a different jobID, so…
  - it is possible to trace it separately from the others
- but…
  - the parametric job handle allows them to be treated together
- a special variable (_PARAM_) marks the variable items amongst the JDL attributes
- _PARAM_ takes numerical values or values from a list of declared strings
ParameterStart, ParameterStep, Parameters, NodesCollocation
Parameters = <Integer | list of strings>
- an Integer indicating the number of steps, or
- a list of strings (each of them is the name of a step)
ParameterStart, ParameterStep = <Integer>
- ParameterStart indicates the initial step
- ParameterStep indicates the increment between two subsequent values of _PARAM_
NodesCollocation = <Boolean>
- if true, all the job instances are sent to the same CE
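The expansion of a parametric job can be sketched in Python as follows. The sketch follows the wording of this slide (an integer Parameters is taken as the number of steps); the actual WMS may interpret the integer differently, the defaults for start and step are assumptions, and `parameter_values` and `expand` are hypothetical helper names.

```python
def parameter_values(parameters, start=0, step=1):
    """Values taken by _PARAM_: an arithmetic sequence, or the declared strings."""
    if isinstance(parameters, int):
        return [start + i * step for i in range(parameters)]
    return list(parameters)

def expand(template, parameters, start=0, step=1):
    """Return one attribute dictionary per instance, with _PARAM_ substituted."""
    jobs = []
    for value in parameter_values(parameters, start, step):
        jobs.append({
            name: attr.replace("_PARAM_", str(value)) if isinstance(attr, str) else attr
            for name, attr in template.items()
        })
    return jobs

# Placeholder template: only the attributes containing _PARAM_ vary between instances.
template = {
    "Executable": "run.sh",
    "Arguments": "--seed _PARAM_",
    "StdOutput": "out-_PARAM_.txt",
}
numeric = expand(template, 3, start=1, step=2)
named = expand(template, ["alpha", "beta"])
```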
Job Collections
Type = "Collection"
- a set of independent jobs that must be submitted, monitored and controlled as a single request
- similar to a DAG, but without dependencies
  - all the clauses for DAGs are extended to Collections
  - nodes are described by classads
- attributes refer to the whole Collection
  - an inheritance mechanism is also present
GangMatching
Requirements = anyMatch(<list of classads>) | whichMatch(<list of classads>) | allMatch(<list of classads>)
- the RB uses classads to perform matchmaking
  - the job and the CE are usually the only entities involved
- if an SE is also to be considered, a more general mechanism is provided
  - with new classad built-in functions
- example:
    Requirements = anyMatch(other.storage.CloseSEs, target.GlueSAStateAvailableSpace > 200);
  - forces the RB to select a CE close to an SE with more than 200 MB of available space
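The three built-in functions can be mimicked in Python to show what the example expression computes. This is only an analogy: the semantics assumed here (anyMatch is true if at least one classad satisfies the expression, allMatch if all do, whichMatch returns the matching classads) should be checked against the JDL documentation, and the CE and SE dictionaries are invented.

```python
def any_match(ads, predicate):
    """True if at least one classad satisfies the predicate."""
    return any(predicate(ad) for ad in ads)

def all_match(ads, predicate):
    """True if every classad satisfies the predicate."""
    return all(predicate(ad) for ad in ads)

def which_match(ads, predicate):
    """The classads that satisfy the predicate (assumed semantics)."""
    return [ad for ad in ads if predicate(ad)]

def enough_space(se):
    return se["GlueSAStateAvailableSpace"] > 200

# Invented CE with two close SEs.
ce = {
    "GlueCEUniqueID": "ce-x",
    "CloseSEs": [
        {"name": "se-1", "GlueSAStateAvailableSpace": 50},
        {"name": "se-2", "GlueSAStateAvailableSpace": 500},
    ],
}

acceptable = any_match(ce["CloseSEs"], enough_space)
```

With these values the CE is acceptable because one of its close SEs has enough space, even though not all of them do.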
Questions…