Workload Management System Mike Mineter mjm@nesc.ac.uk Footer

Contents
- What is the Workload Management System (WMS)?
- How do you use it?
- Further information

Without WMS…
[Diagram labels: User; Nodes]
- Without the WMS, the user needs to interact directly with nodes
- The user needs to know resource addresses and capabilities
- Users usually want a higher-level abstraction: submit a job “to a Grid”, not “to a CE”

Which CE do you want to use?
Without the WMS, use the Information System to see what’s available, then choose…

    lcg-infosites --vo gilda ce

    #CPU  Free  Total Jobs  Running  Waiting  ComputingElement
    ----------------------------------------------------------
      10    10           1        0        1  grid011f.cnaf.infn.it:2119/jobmanager-lcgpbs-short
      10    10           0        0        0  grid011f.cnaf.infn.it:2119/jobmanager-lcgpbs-long
      10    10           2        0        2  grid011f.cnaf.infn.it:2119/jobmanager-lcgpbs-infinite
      48    48           0        0        0  grid010.ct.infn.it:2119/jobmanager-lcgpbs-short
      48    48           0        0        0  grid010.ct.infn.it:2119/jobmanager-lcgpbs-long
      48    48           0        0        0  grid010.ct.infn.it:2119/jobmanager-lcgpbs-infinite
    ……. [30% shown]

WMS does this for you! It chooses the CE for each job, balances workload, and manages jobs and their files.
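
The manual choice described on this slide can be sketched in Python. This is only an illustration, assuming the column layout shown in the listing (#CPU, Free, Total Jobs, Running, Waiting, ComputingElement). The names `CEInfo`, `parse_infosites` and `pick_ce` are hypothetical, and the "fewest waiting jobs, then most free CPUs" rule is an assumed heuristic, not the ranking the WMS itself applies.

```python
from typing import List, NamedTuple

# Sample rows copied from the lcg-infosites listing on this slide.
SAMPLE = """\
#CPU Free Total Jobs Running Waiting ComputingElement
----------------------------------------------------------
10 10 1 0 1 grid011f.cnaf.infn.it:2119/jobmanager-lcgpbs-short
10 10 0 0 0 grid011f.cnaf.infn.it:2119/jobmanager-lcgpbs-long
10 10 2 0 2 grid011f.cnaf.infn.it:2119/jobmanager-lcgpbs-infinite
48 48 0 0 0 grid010.ct.infn.it:2119/jobmanager-lcgpbs-short
48 48 0 0 0 grid010.ct.infn.it:2119/jobmanager-lcgpbs-long
48 48 0 0 0 grid010.ct.infn.it:2119/jobmanager-lcgpbs-infinite
"""


class CEInfo(NamedTuple):
    cpus: int
    free: int
    total_jobs: int
    running: int
    waiting: int
    ce_id: str


def parse_infosites(text: str) -> List[CEInfo]:
    """Keep only data rows: five integer columns followed by the CE identifier."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6 and all(f.isdigit() for f in fields[:5]):
            numbers = [int(f) for f in fields[:5]]
            rows.append(CEInfo(*numbers, fields[5]))
    return rows


def pick_ce(rows: List[CEInfo]) -> CEInfo:
    """Assumed heuristic: fewest waiting jobs first, then most free CPUs."""
    return min(rows, key=lambda r: (r.waiting, -r.free))


if __name__ == "__main__":
    print(pick_ce(parse_infosites(SAMPLE)).ce_id)
```

Even this small sketch has to be rerun whenever the listing changes, which is the bookkeeping the WMS takes off the user's hands.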

With WMS
[Diagram labels: User; WMS; Compute Elements]
- WMS manages jobs on users’ behalf
  - User doesn’t decide where jobs are run
  - User defines the job and its requirements; WMS matches these with available CEs
- Effect:
  - Easier submission
  - Users insulated from change in Compute Elements
  - WMS can optimise your jobs, e.g. by choosing which CE to use

Basics
Why does the Workload Management System exist?
- Grids have
  - Many users
  - Running many jobs (a “job” = an executable / script you want to run)
  - Many compute nodes available
- WMS makes running jobs easier for the user AND optimises use of available resources
- It builds on the basic grid services
  - Authorisation, Authentication, Security, Information Systems, Job submission
- Terminology: a “Compute element” is defined as a batch queue. One cluster can have many queues

WMS
[Diagram labels: Local Workstation; ssh; UI; WMS; CEs]
- User describes the job in a text file using Job Description Language
- User submits the job to the WMS using (usually) the command-line interface
- UI (user interface) has preinstalled client software
- WMS: Workload Management System

Using WMS
Jobs run in batch mode on grids.
Steps in running a job on a gLite grid with WMS:
1. Create a text file in “Job Description Language”
2. Optional check: list the compute elements that match your requirements (“list match” command)
3. Submit the job (non-blocking). Each job is given an id.
4. Occasionally check the status of your job
5. When “Done”, retrieve the output
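
The "submit, check occasionally, retrieve when done" pattern in these steps amounts to a polling loop. The Python sketch below shows only the loop. `wait_for_job`, `get_status` and `fake_status_source` are hypothetical names: `get_status` stands in for whatever wraps the real status command on the UI, and the state names are the ones used in this talk.

```python
import time
from typing import Callable

# State names as used in the "Job states" slide of this talk.
FINISHED_STATES = {"DONE", "CLEARED", "ABORT"}


def wait_for_job(
    job_id: str,
    get_status: Callable[[str], str],   # hypothetical: returns the current state
    poll_seconds: float = 60.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a finished state, then return that state."""
    for _ in range(max_polls):
        state = get_status(job_id)
        if state in FINISHED_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")


def fake_status_source() -> Callable[[str], str]:
    """Stand-in for a real status query: walks through a typical state sequence."""
    states = iter(["SUBMITTED", "WAIT", "READY", "SCHEDULED", "RUNNING", "DONE"])
    return lambda job_id: next(states)


if __name__ == "__main__":
    final = wait_for_job("job-1", fake_status_source(), sleep=lambda s: None)
    print(final)
```

Because submission is non-blocking, nothing forces the user to sit in such a loop; checking by hand from time to time works equally well.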

Example JDL file

    Executable = "gridTest";
    StdError = "stderr.log";
    StdOutput = "stdout.log";
    InputSandbox = {"/home/joda/test/gridTest"};
    OutputSandbox = {"stderr.log", "stdout.log"};
    Requirements = other.GlueCEPolicyMaxCPUTime > 480;
    ShallowRetryCount = 3;

(Also: example JDL file with InputData, read from an SE using a logical file name.)
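
When many similar jobs are needed, a JDL file like the one on this slide can be generated rather than typed. The Python sketch below is a minimal illustration that reproduces exactly that example. `Expr`, `jdl_value` and `render_jdl` are hypothetical helpers, not part of any gLite client library, and only strings, integers, lists and unquoted expressions are handled.

```python
class Expr(str):
    """Marks a value written unquoted, e.g. a Requirements expression."""


def jdl_value(value) -> str:
    """Format one Python value in the style of the example JDL file."""
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    if isinstance(value, str):
        if '"' in value:
            raise ValueError("embedded quotes are not handled in this sketch")
        return f'"{value}"'
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def render_jdl(attributes: dict) -> str:
    """Emit one 'Name = value;' line per attribute, in insertion order."""
    return "".join(f"{name} = {jdl_value(v)};\n" for name, v in attributes.items())


# The attributes of the example JDL file on this slide.
JOB = {
    "Executable": "gridTest",
    "StdError": "stderr.log",
    "StdOutput": "stdout.log",
    "InputSandbox": ["/home/joda/test/gridTest"],
    "OutputSandbox": ["stderr.log", "stdout.log"],
    "Requirements": Expr("other.GlueCEPolicyMaxCPUTime > 480"),
    "ShallowRetryCount": 3,
}

if __name__ == "__main__":
    print(render_jdl(JOB), end="")
```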

Job states

    Flag        Meaning
    SUBMITTED   submission logged in the Logging & Bookkeeping service
    WAIT        job match making for resources
    READY       job being sent to executing CE
    SCHEDULED   job scheduled in the CE queue manager
    RUNNING     job executing on a Worker Node of the selected CE queue
    DONE        job terminated without grid errors
    CLEARED     job output retrieved
    ABORT       job aborted by middleware, check reason
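
As a quick reference, the state table can be turned into a small lookup that suggests what the user does next. This Python sketch uses the flags and meanings exactly as listed on this slide. The helper `next_action` and the wording of its suggestions are assumptions made for illustration.

```python
# Flags and meanings copied from the "Job states" table.
JOB_STATES = {
    "SUBMITTED": "submission logged in the Logging & Bookkeeping service",
    "WAIT": "job match making for resources",
    "READY": "job being sent to executing CE",
    "SCHEDULED": "job scheduled in the CE queue manager",
    "RUNNING": "job executing on a Worker Node of the selected CE queue",
    "DONE": "job terminated without grid errors",
    "CLEARED": "job output retrieved",
    "ABORT": "job aborted by middleware, check reason",
}


def next_action(state: str) -> str:
    """Suggest a next step for a given state flag (illustrative wording)."""
    key = state.strip().upper()
    if key not in JOB_STATES:
        raise KeyError(f"unknown job state: {state!r}")
    if key == "DONE":
        return "retrieve the output"
    if key == "ABORT":
        return "check the reason for the abort"
    if key == "CLEARED":
        return "nothing further to do"
    return "check the status again later"


if __name__ == "__main__":
    for flag, meaning in JOB_STATES.items():
        print(f"{flag:<10} {meaning} -> {next_action(flag)}")
```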

WMS: role of WMProxy
[Diagram labels: Local Workstation; UI; WMProxy; Workload Manager; CEs]
- Client on the UI communicates with the “WM Proxy”
- On the UI, run: glite-wms-… commands
- WMProxy acts on your behalf in using the WM. It needs a “delegated proxy”, hence the “-a” option on commands
- UI (user interface) has preinstalled client software

More about WMProxy
[Diagram labels: Local Workstation; UI; WMProxy; Workload Manager; CEs]
- WMProxy can manage complex jobs
- Before WMProxy, the user had to script or create software to manage these on the UI
- UI (user interface) has preinstalled client software

Main commands
- glite-wms-job-submit (edg-job-submit)
  - Submits a job
  - Returns the jobID
- glite-wms-job-status (edg-job-status)
  - Gives the status of the job
- glite-wms-job-output (edg-job-get-output)
  - Retrieves the files specified in the OutputSandbox attribute
- glite-wms-job-cancel (edg-job-cancel)
  - Cancels a job
- glite-wms-job-list-match (edg-job-list-match)
  - Lists the resources compatible with the job description
  - Performs the matchmaking without submitting the job
- glite-wms-job-logging-info (edg-job-get-logging-info)
  - Gives logging information about submitted jobs (all the events recorded by the various WMS components)
  - Very useful for debugging

WMS commands
- glite-wms-job-submit
  - Submits a job
  - Returns a jobID, which is used in subsequent commands
- glite-wms-job-status
  - Checks the status of a job
- glite-wms-job-output
  - Obtains the result files
- glite-wms-job-cancel
  - Cancels a job
- glite-wms-job-list-match
  - Lists resources that can accept this job
  - Does the matchmaking against resource characteristics for the job
- glite-wms-job-logging-info
  - Retrieves information from the logging service
  - Important for debugging

Further information
- gLite Users Guide
  - Follow http://www.glite.org and “Documentation”
- GILDA wiki
  - We are using some of these pages
  - https://grid.ct.infn.it/twiki/bin/view/GILDA/
- EGEE Digital Library
  - http://egee.lib.ed.ac.uk/

What next?
- Practical to show basic use of the WMS
- Then the next talk shows more complex jobs
- And another practical to run these