Introduction to Computing Element
HsiKai Wang
Academia Sinica Grid Computing Center, Taiwan
Outline
- Introduction
- WMS - Workload Management System
- CREAM - Computing Resource Execution And Management
- Example
  - Simple case for WMS
  - Simple case for CREAM
Overview of gLite Middleware
[Figure: gLite middleware service groups]
- Access: API, CLI
- Security Services: Authentication, Authorization, Auditing
- Information & Monitoring Services: Information & Monitoring, Service Discovery, Network Monitoring
- Data Services: Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement
- Job Mgmt. Services: Computing Element, Workload Management, Accounting, Job Provenance, Package Manager
How it works
Computing Element
[Figure: submission paths from the user to the batch system]
- User side: User, User Interface, Globus client, CREAM or BES client
- Workload Manager: gLite WMS, ICE, Condor-G
- Computing Element: CREAM, CEMon, LCG-CE (GT2/4 + add-ons), Condor-C, BLAH, GIP
- Resource side: Batch System
- Shared services: EGEE authZ, InfoSys, Accounting
- Legend: in production / existing prototype; gLite component / non-gLite component
Workload Management System
Ref: gLite-3.2 User Guide

The purpose of the Workload Management System (WMS):
- To accept user jobs
- To assign them to the most appropriate Computing Element
- To record their status
- To retrieve their output

The WMS used to be called Resource Broker (RB). The service is called gLite-WMS.
Job Workflow in gLite-WMS
[Figure: components involved in the job workflow]
- UI: User Interface (submits the JDL and Input Sandbox, receives the Output Sandbox)
- WMS: Workload Management System
- IS: Information System
- File catalog
- SE: Storage Element
- CE: Computing Element
- WN: Worker Node
[Figure sequence: the same WMS diagram (UI, WMProxy, Workload Manager, Match-Maker/Broker, Information Supermarket, Task Queue, Job Controller - CondorG, LFC, Information Service, Computing Element, Storage Element, RB storage) is repeated with one step highlighted per slide. The job states reached so far are shown on each slide.]

1. Submitted: the user runs glite-wms-job-submit myjob.jdl on the UI. WMProxy is responsible for accepting incoming requests. The Input Sandbox files are placed in the RB storage.
2. Waiting: the Workload Manager (WM) is responsible for taking the appropriate actions to satisfy the request.
3. The Matchmaker (Match-Maker/Broker) is responsible for finding the "best" CE to which the job should be submitted, answering the question "Where must this job be executed?".
4. The Information Supermarket is responsible for making resource information (CE and SE characteristics and status) available to the Matchmaker.
5. The Matchmaker returns the CE choice.
6. Ready: the job passes through the Task Queue to the Job Controller (JC), which is responsible for the actual job management operations (done via CondorG).
7. Scheduled: the job and its Input Sandbox files are sent to the Computing Element.
8. Running: the job executes and performs "Grid enabled" data transfers/accesses against the Storage Element.
9. Done: the Output Sandbox files are returned to the RB storage.
10. Cleared: the user retrieves the Output Sandbox files with glite-wms-job-output.

Logging & Bookkeeping (LB): receives and stores job events and processes the corresponding job status. The WMS components report through the LB proxy. The user reads the job status with glite-wms-job-status and the log of job events with glite-wms-job-logging-info.
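The matchmaking step can be illustrated with a small sketch. It is not the WMS implementation: the `CE` record, its fields (`free_cpus`, `waiting_jobs`), the example host names, and the ranking rule (fewer waiting jobs is better) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CE:
    """Hypothetical snapshot of one Computing Element as a matchmaker might see it."""
    ce_id: str
    free_cpus: int
    waiting_jobs: int


def match(ces: List[CE],
          requirement: Callable[[CE], bool],
          rank: Callable[[CE], float]) -> Optional[CE]:
    """Keep the CEs that satisfy the requirement, then return the highest-ranked one."""
    candidates = [ce for ce in ces if requirement(ce)]
    if not candidates:
        return None  # no compatible resource
    return max(candidates, key=rank)


# Illustrative data only; these are not real resources.
ces = [
    CE("ce-a.example.org:8443/cream-pbs-demo", free_cpus=0, waiting_jobs=12),
    CE("ce-b.example.org:8443/cream-pbs-demo", free_cpus=4, waiting_jobs=3),
    CE("ce-c.example.org:8443/cream-pbs-demo", free_cpus=8, waiting_jobs=1),
]

best = match(
    ces,
    requirement=lambda ce: ce.free_cpus >= 1,  # assumed requirement
    rank=lambda ce: -ce.waiting_jobs,          # assumed rank: fewer waiting jobs is better
)
print(best.ce_id if best else "no match")
```

Separating the requirement (a filter) from the rank (an ordering) mirrors the two questions the Matchmaker answers: which CEs are compatible, and which of those is "best".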
Job state machine
gLite-WMS Job States
Ref: gLite-3.2 User Guide

Status      Description
SUBMITTED   submission logged in the LB
WAITING     job matchmaking for resources
READY       job being sent to the executing CE
SCHEDULED   job scheduled in the CE queue manager
RUNNING     job executing on a WN of the selected CE queue
DONE        job terminated without grid errors
CLEARED     job output retrieved
ABORTED     job aborted by the middleware; check the reason
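The states in this table can be read as a small state machine. The sketch below is a simplification under stated assumptions: it models only the straight path SUBMITTED, WAITING, READY, SCHEDULED, RUNNING, DONE, CLEARED plus ABORTED, and it assumes the middleware may abort a job from any state before DONE. Resubmission and cancellation are left out.

```python
# Simplified transition table for the gLite-WMS job states.
# Assumption: only the straight path plus ABORTED is modelled.
TRANSITIONS = {
    "SUBMITTED": {"WAITING", "ABORTED"},
    "WAITING": {"READY", "ABORTED"},
    "READY": {"SCHEDULED", "ABORTED"},
    "SCHEDULED": {"RUNNING", "ABORTED"},
    "RUNNING": {"DONE", "ABORTED"},
    "DONE": {"CLEARED"},
    "CLEARED": set(),
    "ABORTED": set(),
}


def can_move(current: str, new: str) -> bool:
    """True if the simplified model allows going from `current` to `new`."""
    return new in TRANSITIONS.get(current, set())


def is_final(state: str) -> bool:
    """A state is final when no further transition is modelled from it."""
    return not TRANSITIONS[state]


def valid_history(history: list) -> bool:
    """Check that every consecutive pair of states is an allowed transition."""
    return all(can_move(a, b) for a, b in zip(history, history[1:]))


happy_path = ["SUBMITTED", "WAITING", "READY", "SCHEDULED",
              "RUNNING", "DONE", "CLEARED"]
print(valid_history(happy_path))
print(valid_history(["SUBMITTED", "RUNNING"]))
```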
CREAM: Web Service Computing Element
(Computing Resource Execution And Management)
- CREAM WSDL allows defining a custom user interface
- C++ CLI interface allows direct submission
- Lightweight
- Fast notification of job status changes via CEMon
- Improved security: no "fork-scheduler"
- Will support bulk jobs on the CE: optimization of the staging of input sandboxes for jobs with shared files
- ICE (Interface to CREAM Environment): being integrated in the WMS for submissions to CREAM
Job State Machine
Ref: gLite-3.2 User Guide
CREAM Job States

Status           Description
REGISTERED       the job has been registered but it has not been started yet.
PENDING          the job has been started, but it has still to be submitted to the LRMS abstraction layer module (i.e. BLAH).
IDLE             the job is idle in the Local Resource Management System (LRMS).
RUNNING          the job wrapper, which "encompasses" the user job, is running in the LRMS.
REALLY-RUNNING   the actual user job (the one specified as Executable in the job JDL) is running in the LRMS.
HELD             the job is held (suspended) in the LRMS.
CANCELLED        the job has been cancelled.
DONE-OK          the job has successfully been executed.
DONE-FAILED      the job has been executed, but some errors occurred.
ABORTED          errors occurred during the "management" of the job, e.g. the submission to the LRMS abstraction layer software (BLAH) failed.
UNKNOWN          the job is in an unknown status.
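A client that waits for a CREAM job usually needs to know when to stop polling. The sketch below assumes that CANCELLED, DONE-OK, DONE-FAILED and ABORTED are the states after which the status no longer changes; that grouping is an assumption of this example, not something stated on the slide. The `poll` callable is hypothetical and stands in for whatever actually queries the CE.

```python
from typing import Callable

CREAM_STATES = {
    "REGISTERED", "PENDING", "IDLE", "RUNNING", "REALLY-RUNNING", "HELD",
    "CANCELLED", "DONE-OK", "DONE-FAILED", "ABORTED", "UNKNOWN",
}

# Assumption of this sketch: these four states are treated as final.
FINAL_STATES = {"CANCELLED", "DONE-OK", "DONE-FAILED", "ABORTED"}


def is_final(status: str) -> bool:
    """True if no further status change is expected (under the assumption above)."""
    if status not in CREAM_STATES:
        raise ValueError(f"not a CREAM job state: {status!r}")
    return status in FINAL_STATES


def wait_until_final(poll: Callable[[], str], max_polls: int = 100) -> str:
    """Call `poll` repeatedly until it reports a final state, then return that state."""
    for _ in range(max_polls):
        status = poll()
        if is_final(status):
            return status
    raise TimeoutError("job did not reach a final state")


# Simulated sequence of observed states instead of a real CE query.
observed = iter(["REGISTERED", "PENDING", "IDLE", "RUNNING",
                 "REALLY-RUNNING", "DONE-OK"])
final = wait_until_final(lambda: next(observed))
print(final)
```

A real poller would also sleep between calls; that is omitted here to keep the example deterministic.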
Job Control Commands
Ref: gLite-3.2 User Guide

Delegate proxy
  gLite WMS:   glite-wms-job-delegate-proxy -d delegID
  gLite CREAM: glite-ce-job-delegate-proxy -e endpoint -d delegID
Submit
  gLite WMS:   glite-wms-job-submit [-d delegID] [-a] [-o joblist] jdlfile
  gLite CREAM: glite-ce-job-submit [-d delegID] [-a] [-o joblist] -r ceIDs jdlfile
Status
  gLite WMS:   glite-wms-job-status -i joblist | jobIDs
  gLite CREAM: glite-ce-job-status -i joblist | jobIDs
Logging
  gLite WMS:   glite-wms-job-logging-info -i joblist | jobIDs
Output
  gLite WMS:   glite-wms-job-output [--dir outdir] -i joblist | jobIDs
Cancel
  gLite WMS:   glite-wms-job-cancel -i joblist | jobID
  gLite CREAM: glite-ce-job-cancel -i joblist | jobID
Compatible resources
  gLite WMS:   glite-wms-job-list-match [-d delegID] [-a] jdlfile
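The main difference between the two submit commands is that CREAM needs an explicit target CE (`-r`), while the WMS chooses one itself. A minimal sketch that builds either command line from the options listed in the table; the function name `submit_command` and its parameters are hypothetical, and the command is only assembled, not executed.

```python
import shlex
from typing import List, Optional


def submit_command(jdl_file: str,
                   ce_id: Optional[str] = None,
                   deleg_id: Optional[str] = None,
                   joblist: Optional[str] = None) -> List[str]:
    """Build a submit command line using only the options shown in the table.

    With ce_id the job goes directly to CREAM (glite-ce-job-submit -r ce_id);
    without it the job goes to the WMS (glite-wms-job-submit).
    """
    cmd = ["glite-ce-job-submit" if ce_id else "glite-wms-job-submit"]
    # -d reuses a previously delegated proxy; -a requests automatic delegation.
    cmd += ["-d", deleg_id] if deleg_id else ["-a"]
    if joblist:
        cmd += ["-o", joblist]  # file in which the returned job ID is stored
    if ce_id:
        cmd += ["-r", ce_id]
    cmd.append(jdl_file)
    return cmd


wms_cmd = submit_command("Host_wms.jdl")
cream_cmd = submit_command(
    "Host_cream.jdl",
    ce_id="as-ce01.euasiagrid.org:8443/cream-pbs-euasia",
)
print(shlex.join(wms_cmd))
print(shlex.join(cream_cmd))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` on a machine where the gLite clients are installed.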
Job Description Language for WMS
Ref: gLite-3.2 User Guide

    wms]$ ls
    checkHost.sh  Host_wms.jdl

    wms]$ cat Host_wms.jdl
    JobType = "Normal";
    CPUNumber = 1;
    Executable = "checkHost.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"checkHost.sh"};
    OutputSandbox = {"std.out", "std.err", "Host.log"};
    RetryCount = 5;
    Requirements = other.GlueCEUniqueID == "as-ce01.euasiagrid.org:8443/cream-pbs-euasia";

    wms]$ cat checkHost.sh
    #!/bin/sh
    echo "HOST: `hostname`" >> Host.log
    printenv >> Host.log
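A JDL file is a flat list of `Attribute = value;` lines, where strings are quoted, numbers are bare, and lists are written in braces. The following sketch generates such a file from a Python dictionary. The helper names `jdl_value` and `render_jdl` are hypothetical, and the sketch assumes string values contain no double quotes. `Requirements` is passed separately because it is an expression, not a quoted string.

```python
from typing import Optional


def jdl_value(value) -> str:
    """Render one Python value in JDL syntax (strings quoted, lists in braces)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    return '"' + str(value) + '"'  # assumes no embedded double quotes


def render_jdl(attrs: dict, requirements: Optional[str] = None) -> str:
    """Render attributes as 'Name = value;' lines, one per attribute."""
    lines = [f"{name} = {jdl_value(value)};" for name, value in attrs.items()]
    if requirements:
        lines.append(f"Requirements = {requirements};")
    return "\n".join(lines) + "\n"


jdl = render_jdl(
    {
        "JobType": "Normal",
        "CPUNumber": 1,
        "Executable": "checkHost.sh",
        "StdOutput": "std.out",
        "StdError": "std.err",
        "InputSandbox": ["checkHost.sh"],
        "OutputSandbox": ["std.out", "std.err", "Host.log"],
        "RetryCount": 5,
    },
    requirements='other.GlueCEUniqueID == '
                 '"as-ce01.euasiagrid.org:8443/cream-pbs-euasia"',
)
print(jdl)
```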
Example for WMS
Ref: gLite-3.2 User Guide

    wms]$ glite-wms-job-submit -a Host_wms.jdl
    ====================== glite-wms-job-submit Success ======================
    The job has been successfully submitted to the WMProxy
    Your job identifier is:
    ==========================================================================

    wms]$ glite-wms-job-status
    ======================= glite-wms-job-status Success =====================
    BOOKKEEPING INFORMATION:
    Status info for the Job : 7qE54xk0Sw
    Current Status: Scheduled
    Status Reason: unavailable
    Destination: as-ce01.euasiagrid.org:8443/cream-pbs-euasia
    Submitted: Sat Feb 2 16:35: UTC
    ==========================================================================
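When scripting around glite-wms-job-status, the `Label: value` lines of the report are the useful part. A minimal sketch that extracts them follows; the function name `parse_status_report` is hypothetical, and the sample text is the status report as it appears on the slide (where the job identifier is abbreviated), not fresh command output.

```python
def parse_status_report(text: str) -> dict:
    """Collect 'Label: value' pairs, splitting each line at its first colon."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("="):
            continue  # skip blank lines and banner lines
        label, sep, value = line.partition(":")
        if sep and value.strip():
            fields[label.strip()] = value.strip()
    return fields


sample = """\
======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:
Status info for the Job : 7qE54xk0Sw
Current Status: Scheduled
Status Reason: unavailable
Destination: as-ce01.euasiagrid.org:8443/cream-pbs-euasia
==========================================================================
"""

report = parse_status_report(sample)
print(report["Current Status"], "on", report["Destination"])
```

Splitting at the first colon only is what keeps the port number in the destination intact.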
Example for WMS
Ref: gLite-3.2 User Guide

    wms]$ glite-wms-job-output --dir .
    ================================================================================
    JOB GET OUTPUT OUTCOME
    Output sandbox files for the job:
    have been successfully retrieved and stored in the directory:
    /home/hkw00/HAII/ce/wms/hkw00_FtH87_dKEfp-7qE54xk0Sw
    ================================================================================

    wms]$ ls /home/hkw00/HAII/ce/wms/hkw00_FtH87_dKEfp-7qE54xk0Sw
    Host.log  std.err  std.out
Job Description Language for CREAM
Ref: gLite-3.2 User Guide

    cream]$ ls
    checkHost.sh  Host_cream.jdl

    cream]$ cat Host_cream.jdl
    JobType = "Normal";
    CPUNumber = 1;
    Executable = "checkHost.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandBox = {"/home/hkw00/HAII/ce/cream/checkHost.sh"};
    OutputSandBox = {"Host.log"};
    OutputSandboxDestURI = {"gsiftp://as-ds01.euasiagrid.org/dpm/euasiagrid.org/home/euasia/hkw00/"};
    RetryCount = 5;

    cream]$ cat checkHost.sh
    #!/bin/sh
    echo "HOST: `hostname`" >> Host.log
    printenv >> Host.log
Example CREAM
Ref: gLite-3.2 User Guide

    cream]$ lcg-infosites --vo euasia ce
    # CPU  Free  Total Jobs  Running  Waiting  ComputingElement
    as-ce01.euasiagrid.org:8443/cream-pbs-euasia
    ce-qamar.utmgrid.utm.my:8443/cream-pbs-euasia
    ce.utmgrid.utm.my:8443/cream-pbs-euasia
    (...)

    cream]$ glite-ce-job-submit -r as-ce01.euasiagrid.org:8443/cream-pbs-euasia -a Host_cream.jdl

    cream]$ glite-ce-job-status ce01.euasiagrid.org:8443/CREAM
    ****** JobID=[
    Status = [DONE-OK]
    ExitCode = [0]
Example CREAM
Ref: gLite-3.2 User Guide

    cream]$ lcg-ls srm://as-ds01.euasiagrid.org/dpm/euasiagrid.org/home/euasia/hkw00/
    /dpm/euasiagrid.org/home/euasia/hkw00//Host.log
    (...)

    cream]$ lcg-cp srm://as-ds01.euasiagrid.org/dpm/euasiagrid.org/home/euasia/hkw00/Host.log file:`pwd`/Host.log

    cream]$ ls
    checkHost.sh  Host_cream.jdl  Host.log
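In this example the job writes Host.log to a gsiftp OutputSandboxDestURI, and the file is then listed and copied through an srm URL with the same host and path. The sketch below derives the srm URL from the gsiftp destination. It assumes, as in this example, that the storage element exposes the same path under both protocols; that need not hold at every site. The function name `srm_url_for` is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def srm_url_for(dest_uri: str, filename: str) -> str:
    """Map a gsiftp destination directory and a file name to an srm URL.

    Assumption: the storage element serves the same path via gsiftp and srm.
    """
    parts = urlsplit(dest_uri)
    if parts.scheme != "gsiftp":
        raise ValueError(f"expected a gsiftp URI, got {dest_uri!r}")
    return "srm://" + parts.netloc + posixpath.join(parts.path, filename)


dest = "gsiftp://as-ds01.euasiagrid.org/dpm/euasiagrid.org/home/euasia/hkw00/"
url = srm_url_for(dest, "Host.log")
print(url)
```

The resulting URL is the one passed to lcg-cp in the example above.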