Published by Phillip Fowler. Modified over 8 years ago.
1
Introduction to Computing Element
HsiKai Wang
Academia Sinica Grid Computing Center, Taiwan
2
Outline
- Introduction
- WMS - Workload Management System
- CREAM - Computing Resource Execution And Management
- Example
  - Simple case for WMS
  - Simple case for CREAM
3
Overview of gLite Middleware
- Access: API, CLI
- Security Services: Authentication, Authorization, Auditing
- Information & Monitoring Services: Information & Monitoring, Network Monitoring, Service Discovery
- Data Services: Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement
- Job Management Services: Computing Element, Workload Management, Job Provenance, Package Manager, Accounting
4
How it works
5
Compute Element
[Diagram of the Computing Element layers: User / User Interface, Workload Manager, Computing Element, Batch System (Resource). Components shown: gLite WMS with ICE, Condor-G, Globus client, CREAM or BES client, CREAM with CEMon, LCG-CE (GT2/4 + add-ons), Condor-C, BLAH, GIP, and EGEE authZ, InfoSys, Accounting. Legend: in production, existing prototype; gLite component, non-gLite component.]
7
Workload Management System (Ref: gLite 3.2 User Guide)
The purpose of the Workload Management System (WMS) is:
- to accept user jobs;
- to assign them to the most appropriate Computing Element;
- to record their status;
- to retrieve their output.
The WMS used to be called the Resource Broker (RB). The service is called gLite-WMS.
8
Job Workflow in gLite-WMS
[Diagram: UI, WMS, File catalog, IS, SE, CE, WN; the JDL, Input Sandbox and Output Sandbox flows.]
Abbreviations: UI = User Interface; WMS = Workload Management System; IS = Information System; SE = Storage Element; CE = Computing Element; WN = Worker Node.
9
Job submission (WMS workflow diagram; components: UI, WMProxy, Workload Manager, Job Controller - CondorG, LFC, Information Service, Computing Element, Storage Element, RB storage)
The user runs glite-wms-job-submit myjob.jdl on the UI. The job and its Input Sandbox files are sent to the WMS and stored in the RB storage. Job state: submitted.
WMProxy is responsible for accepting incoming requests.
10
Job state: waiting.
WM (Workload Manager): responsible for taking the appropriate actions to satisfy the request.
11
Job state: waiting. Match-Maker/Broker: where must this job be executed?
Matchmaker: responsible for finding the "best" CE to which the job should be submitted.
12
Job state: waiting.
Information Supermarket: responsible for making resource information (CE and SE characteristics and status) available to the Matchmaker.
13
Job state: waiting. The Matchmaker, using the Information Supermarket, makes the CE choice.
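The matchmaking step can be illustrated with a toy Python sketch. This is not the WMS Matchmaker. The CERecord type and the match function are hypothetical, a Python predicate stands in for the JDL Requirements expression, and ranking by free CPUs is an assumption made only for illustration. The sample CE values are the ones that lcg-infosites reports in the CREAM example of this presentation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CERecord:
    """Hypothetical, simplified view of one CE as published by the information system."""
    ce_id: str
    total_cpus: int
    free_cpus: int
    waiting_jobs: int


def match(ces: List[CERecord],
          requirements: Callable[[CERecord], bool],
          rank: Callable[[CERecord], float]) -> Optional[CERecord]:
    """Filter CEs with a requirements predicate, then return the highest-ranked one."""
    candidates = [ce for ce in ces if requirements(ce)]
    if not candidates:
        return None  # no compatible resource: the job cannot be dispatched
    return max(candidates, key=rank)


# Sample values from the lcg-infosites listing in the CREAM example.
ces = [
    CERecord("as-ce01.euasiagrid.org:8443/cream-pbs-euasia", 256, 155, 0),
    CERecord("ce-qamar.utmgrid.utm.my:8443/cream-pbs-euasia", 64, 48, 0),
    CERecord("ce.utmgrid.utm.my:8443/cream-pbs-euasia", 56, 50, 0),
]

# Counterpart of the Requirements line in Host_wms.jdl (pin the job to one CE).
pinned = match(ces,
               requirements=lambda ce: ce.ce_id == "as-ce01.euasiagrid.org:8443/cream-pbs-euasia",
               rank=lambda ce: ce.free_cpus)

# Hypothetical requirement: only CEs with fewer than 100 CPUs, ranked by free CPUs.
small = match(ces, requirements=lambda ce: ce.total_cpus < 100, rank=lambda ce: ce.free_cpus)

print(pinned.ce_id)  # as-ce01.euasiagrid.org:8443/cream-pbs-euasia
print(small.ce_id)   # ce.utmgrid.utm.my:8443/cream-pbs-euasia
```

The real Matchmaker evaluates the ClassAd Requirements and Rank expressions of the JDL against the resource information held in the Information Supermarket. The two-stage split into filtering and ranking is the part this sketch tries to convey.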
14
Job state: ready. The job is placed in the Task Queue.
JC (Job Controller): responsible for the actual job management operations (done via CondorG).
15
Job state: scheduled. The job and its Input Sandbox files are transferred to the selected Computing Element.
16
Job state: running. The job runs on the Computing Element with its Input Sandbox and can perform "Grid enabled" data transfers and accesses.
17
Job state: done. The Output Sandbox files are transferred back to the RB storage.
18
Job state: cleared. The user retrieves the Output Sandbox files with glite-wms-job-output.
19
Logging & Bookkeeping (diagram; components: UI, WMProxy, Workload Manager, Job Controller - CondorG, Computing Element, LB proxy)
LB: receives and stores job events, and processes them into the corresponding job status.
The log of job events is queried with glite-wms-job-logging-info, and the job status with glite-wms-job-status.
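The LB's role of turning a stream of job events into a job status can be sketched as follows. The event fields, the shared clock and the "latest event wins" rule are simplifying assumptions. The real LB applies its own, more elaborate rules to events arriving from different components. The job ID and the class names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class JobEvent:
    job_id: str
    source: str        # component that logged the event, e.g. "WMProxy"
    timestamp: float   # hypothetical common clock, in seconds
    new_state: str     # job state implied by this event


@dataclass
class ToyBookkeeping:
    """Toy LB: stores every event and derives the job status on demand."""
    events: Dict[str, List[JobEvent]] = field(default_factory=dict)

    def log(self, event: JobEvent) -> None:
        self.events.setdefault(event.job_id, []).append(event)

    def logging_info(self, job_id: str) -> List[JobEvent]:
        # Loosely analogous to glite-wms-job-logging-info: the ordered event history.
        return sorted(self.events.get(job_id, []), key=lambda e: e.timestamp)

    def status(self, job_id: str) -> Optional[str]:
        # Loosely analogous to glite-wms-job-status: the state of the latest event.
        history = self.logging_info(job_id)
        return history[-1].new_state if history else None


lb = ToyBookkeeping()
job = "https://lb.example.org:9000/hypothetical-job-id"
lb.log(JobEvent(job, "WMProxy", 1.0, "SUBMITTED"))
lb.log(JobEvent(job, "JobController", 3.0, "READY"))     # arrives before the WM event
lb.log(JobEvent(job, "WorkloadManager", 2.0, "WAITING"))
print(lb.status(job))  # READY
```

Keeping the raw events and deriving the status on demand is why both a full event log and a current status can be served from the same store.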
20
Job state machine
21
gLite-WMS Job States (Ref: gLite 3.2 User Guide)
| Status | Description |
|---|---|
| SUBMITTED | submission logged in the LB |
| WAITING | job is being matched to resources (matchmaking) |
| READY | job is being sent to the executing CE |
| SCHEDULED | job is scheduled in the CE queue manager |
| RUNNING | job is executing on a WN of the selected CE queue |
| DONE | job terminated without grid errors |
| CLEARED | job output has been retrieved |
| ABORTED | job aborted by the middleware; check the reason |
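The normal path through these states can be encoded as a small transition table. This is a minimal sketch under an explicit assumption. Only the happy path and "aborted before DONE" are modelled. Resubmission (see RetryCount) and cancellation, which the real state machine also covers, are deliberately left out. The enum and function names are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Set


class WmsState(Enum):
    SUBMITTED = "Submitted"
    WAITING = "Waiting"
    READY = "Ready"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    DONE = "Done"
    CLEARED = "Cleared"
    ABORTED = "Aborted"


# Normal path of a successful job through the states in the table.
HAPPY_PATH: List[WmsState] = [
    WmsState.SUBMITTED, WmsState.WAITING, WmsState.READY, WmsState.SCHEDULED,
    WmsState.RUNNING, WmsState.DONE, WmsState.CLEARED,
]

# Simplified transition table (assumption): advance along the happy path, or be
# ABORTED by the middleware at any point before DONE.
TRANSITIONS: Dict[WmsState, Set[WmsState]] = {
    state: {nxt} for state, nxt in zip(HAPPY_PATH, HAPPY_PATH[1:])
}
for state in HAPPY_PATH[:HAPPY_PATH.index(WmsState.DONE)]:
    TRANSITIONS[state].add(WmsState.ABORTED)
TRANSITIONS[WmsState.CLEARED] = set()   # terminal
TRANSITIONS[WmsState.ABORTED] = set()   # terminal in this simplified model


def is_valid_history(history: List[WmsState]) -> bool:
    """True if each consecutive pair of states is an allowed transition."""
    return all(b in TRANSITIONS[a] for a, b in zip(history, history[1:]))


print(is_valid_history(HAPPY_PATH))                              # True
print(is_valid_history([WmsState.SUBMITTED, WmsState.RUNNING]))  # False
```

The enum values use the capitalisation printed by glite-wms-job-status (e.g. "Current Status: Scheduled"), so WmsState("Scheduled") maps a status string onto the table.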
23
CREAM: Computing Resource Execution And Management
- Web service Computing Element
  - the CREAM WSDL allows custom user interfaces to be defined
  - a C++ CLI allows direct submission
- Lightweight
- Fast notification of job status changes via CEMon
- Improved security: no "fork-scheduler"
- Will support bulk jobs on the CE, with optimized staging of input sandboxes for jobs with shared files
- ICE (Interface to CREAM Environment): being integrated into the WMS for submissions to CREAM
24
Job State Machine (Ref: gLite 3.2 User Guide)
25
CREAM Job States
| Status | Description |
|---|---|
| REGISTERED | the job has been registered but has not been started yet |
| PENDING | the job has been started, but has yet to be submitted to the LRMS abstraction layer module (i.e. BLAH) |
| IDLE | the job is idle in the Local Resource Management System (LRMS) |
| RUNNING | the job wrapper, which "encompasses" the user job, is running in the LRMS |
| REALLY-RUNNING | the actual user job (the one specified as Executable in the job JDL) is running in the LRMS |
| HELD | the job is held (suspended) in the LRMS |
| CANCELLED | the job has been cancelled |
| DONE-OK | the job has been executed successfully |
| DONE-FAILED | the job has been executed, but some errors occurred |
| ABORTED | errors occurred during the "management" of the job, e.g. the submission to the LRMS abstraction layer software (BLAH) failed |
| UNKNOWN | the job is in an unknown status |
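A client that polls glite-ce-job-status needs to know when to stop. Below is a minimal sketch, based on our reading of the table above. CANCELLED, DONE-OK, DONE-FAILED and ABORTED are treated as final. HELD and UNKNOWN are treated as non-final, which is an assumption. The enum and function names are hypothetical.

```python
from enum import Enum


class CreamState(str, Enum):
    REGISTERED = "REGISTERED"
    PENDING = "PENDING"
    IDLE = "IDLE"
    RUNNING = "RUNNING"
    REALLY_RUNNING = "REALLY-RUNNING"
    HELD = "HELD"
    CANCELLED = "CANCELLED"
    DONE_OK = "DONE-OK"
    DONE_FAILED = "DONE-FAILED"
    ABORTED = "ABORTED"
    UNKNOWN = "UNKNOWN"


# States after which the job is assumed not to change any more.
FINAL_STATES = {CreamState.CANCELLED, CreamState.DONE_OK,
                CreamState.DONE_FAILED, CreamState.ABORTED}


def parse_state(status: str) -> CreamState:
    """Map a status string (e.g. 'DONE-OK') to the enum; raises ValueError if unknown."""
    return CreamState(status.strip().upper())


def should_keep_polling(status: str) -> bool:
    """True while the job may still change state (HELD and UNKNOWN included)."""
    return parse_state(status) not in FINAL_STATES


def user_executable_running(status: str) -> bool:
    """RUNNING only means the job wrapper runs; the Executable runs in REALLY-RUNNING."""
    return parse_state(status) is CreamState.REALLY_RUNNING


print(should_keep_polling("REALLY-RUNNING"))  # True
print(should_keep_polling("DONE-OK"))         # False
```

Separating RUNNING from REALLY-RUNNING matters in practice. A job can sit in RUNNING while the wrapper is still staging its input sandbox, before the user's program has started.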
26
Job Control Commands (Ref: gLite 3.2 User Guide)
("-i joblist | jobIDs" means either -i with a job-list file, or one or more job IDs.)
Delegate proxy
  gLite WMS:   glite-wms-job-delegate-proxy -d delegID
  gLite CREAM: glite-ce-job-delegate-proxy -e endpoint -d delegID
Submit
  gLite WMS:   glite-wms-job-submit [-d delegID] [-a] [-o joblist] jdlfile
  gLite CREAM: glite-ce-job-submit [-d delegID] [-a] [-o joblist] -r ceIDs jdlfile
Status
  gLite WMS:   glite-wms-job-status -i joblist | jobIDs
  gLite CREAM: glite-ce-job-status -i joblist | jobIDs
Logging
  gLite WMS:   glite-wms-job-logging-info -i joblist | jobIDs
Output
  gLite WMS:   glite-wms-job-output [--dir outdir] -i joblist | jobIDs
Cancel
  gLite WMS:   glite-wms-job-cancel -i joblist | jobID
  gLite CREAM: glite-ce-job-cancel -i joblist | jobID
Compatible resources
  gLite WMS:   glite-wms-job-list-match [-d delegID] [-a] jdlfile
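When submissions are scripted, it helps to build the command lines in one place. The sketch below uses only the options listed in the table: -d or -a for delegation, -o for the job-list file, -r for the CREAM target CE, and -i for status queries. The helper names are hypothetical, and the code only builds argument lists without running anything.

```python
from typing import List, Optional


def _delegation_opts(delegation_id: Optional[str]) -> List[str]:
    # Either reuse a proxy delegated earlier (-d) or request automatic delegation (-a).
    return ["-d", delegation_id] if delegation_id else ["-a"]


def wms_submit_argv(jdl_file: str, delegation_id: Optional[str] = None,
                    job_list_file: Optional[str] = None) -> List[str]:
    """Command line for glite-wms-job-submit, using only options from the table."""
    argv = ["glite-wms-job-submit", *_delegation_opts(delegation_id)]
    if job_list_file:
        argv += ["-o", job_list_file]  # save the new job ID into this file
    return argv + [jdl_file]


def cream_submit_argv(jdl_file: str, ce_id: str, delegation_id: Optional[str] = None,
                      job_list_file: Optional[str] = None) -> List[str]:
    """Command line for glite-ce-job-submit; CREAM needs the target CE via -r."""
    argv = ["glite-ce-job-submit", *_delegation_opts(delegation_id)]
    if job_list_file:
        argv += ["-o", job_list_file]
    return argv + ["-r", ce_id, jdl_file]


def status_argv(use_cream: bool, job_list_file: str) -> List[str]:
    """Status query for all jobs recorded in a job-list file (-i)."""
    tool = "glite-ce-job-status" if use_cream else "glite-wms-job-status"
    return [tool, "-i", job_list_file]


print(cream_submit_argv("Host_cream.jdl", "as-ce01.euasiagrid.org:8443/cream-pbs-euasia"))
```

On a UI with the gLite clients installed, such a list could be passed to Python's subprocess.run(argv, check=True). Keeping the -o/-i job-list file avoids parsing job IDs from the human-readable output.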
28
Job Description Language for WMS (Ref: gLite 3.2 User Guide)
[hkw00@ui03 wms]$ ls
checkHost.sh Host_wms.jdl
[hkw00@ui03 wms]$ cat Host_wms.jdl
JobType = "Normal";
CPUNumber = 1;
Executable = "checkHost.sh";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"checkHost.sh"};
OutputSandbox = {"std.out", "std.err", "Host.log"};
RetryCount = 5;
Requirements = other.GlueCEUniqueID == "as-ce01.euasiagrid.org:8443/cream-pbs-euasia";
[hkw00@ui03 wms]$ cat checkHost.sh
#!/bin/sh
echo "HOST: `hostname`" >> Host.log
printenv >> Host.log
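A JDL file such as Host_wms.jdl can also be generated from a script. This is a minimal sketch under stated assumptions. Values are Python str, int, bool or list of str. A hypothetical Expr wrapper marks values like Requirements that must be written as raw expressions rather than quoted strings. Backslash escaping of embedded quotes follows ClassAd conventions, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Expr:
    """Hypothetical marker for a value written verbatim as a JDL expression (no quotes)."""
    text: str


Value = Union[str, int, bool, List[str], Expr]


def quote(s: str) -> str:
    # Straight double quotes; backslash-escape embedded backslashes and quotes.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_value(v: Value) -> str:
    if isinstance(v, Expr):
        return v.text
    if isinstance(v, bool):          # check bool before int: bool is an int subclass
        return "true" if v else "false"
    if isinstance(v, int):
        return str(v)
    if isinstance(v, str):
        return quote(v)
    return "{" + ", ".join(quote(item) for item in v) + "}"


def render_jdl(attrs: Dict[str, Value]) -> str:
    """One 'Name = value;' line per attribute, in insertion order."""
    return "".join(f"{name} = {render_value(value)};\n" for name, value in attrs.items())


jdl = render_jdl({
    "JobType": "Normal",
    "CPUNumber": 1,
    "Executable": "checkHost.sh",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["checkHost.sh"],
    "OutputSandbox": ["std.out", "std.err", "Host.log"],
    "RetryCount": 5,
    "Requirements": Expr('other.GlueCEUniqueID == "as-ce01.euasiagrid.org:8443/cream-pbs-euasia"'),
})
print(jdl, end="")
```

Generating the file also guarantees straight ASCII quotes. Typographic "smart" quotes, which slide and word-processing software tend to insert, are not valid JDL string delimiters.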
29
Example for WMS (Ref: gLite 3.2 User Guide)
[hkw00@ui03 wms]$ glite-wms-job-submit -a Host_wms.jdl
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://lb04.grid.sinica.edu.tw:9000/FtH87_dKEfp-7qE54xk0Sw
==========================================================================
[hkw00@ui03 wms]$ glite-wms-job-status https://lb04.grid.sinica.edu.tw:9000/FtH87_dKEfp-7qE54xk0Sw
======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:
Status info for the Job : https://lb04.grid.sinica.edu.tw:9000/FtH87_dKEfp-7qE54xk0Sw
Current Status: Scheduled
Status Reason: unavailable
Destination: as-ce01.euasiagrid.org:8443/cream-pbs-euasia
Submitted: Sat Feb 2 16:35:10 2013 UTC
==========================================================================
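A script may need the job ID and status from the output shown above. A minimal sketch follows, assuming exactly the text layout of this session. The layout may differ between client versions. The sample strings are copied from the session, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional

SUBMIT_SAMPLE = """\
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://lb04.grid.sinica.edu.tw:9000/FtH87_dKEfp-7qE54xk0Sw
==========================================================================
"""

STATUS_SAMPLE = """\
======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:
Status info for the Job : https://lb04.grid.sinica.edu.tw:9000/FtH87_dKEfp-7qE54xk0Sw
Current Status: Scheduled
Status Reason: unavailable
Destination: as-ce01.euasiagrid.org:8443/cream-pbs-euasia
Submitted: Sat Feb 2 16:35:10 2013 UTC
==========================================================================
"""

JOB_ID_RE = re.compile(r"Your job identifier is:\s*(https://\S+)")
STATUS_KEYS = {"Current Status", "Status Reason", "Destination", "Submitted"}


def parse_job_id(submit_output: str) -> Optional[str]:
    """Return the job ID printed by glite-wms-job-submit, or None if absent."""
    m = JOB_ID_RE.search(submit_output)
    return m.group(1) if m else None


def parse_status_fields(status_output: str) -> Dict[str, str]:
    """Collect the 'Key: value' lines of interest from glite-wms-job-status output."""
    fields: Dict[str, str] = {}
    for line in status_output.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip() in STATUS_KEYS:
            fields[key.strip()] = value.strip()
    return fields


print(parse_job_id(SUBMIT_SAMPLE))
print(parse_status_fields(STATUS_SAMPLE)["Current Status"])  # Scheduled
```

Splitting on the first colon keeps values such as "as-ce01.euasiagrid.org:8443/..." intact. Scraping is still fragile. The -o/-i job-list options from the command table are the sturdier way to track job IDs.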
30
Example for WMS (Ref: gLite 3.2 User Guide)
[hkw00@ui03 wms]$ glite-wms-job-output --dir . https://lb04.grid.sinica.edu.tw:9000/FtH87_dKEfp-7qE54xk0Sw
================================================================================
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
https://lb04.grid.sinica.edu.tw:9000/FtH87_dKEfp-7qE54xk0Sw
have been successfully retrieved and stored in the directory:
/home/hkw00/HAII/ce/wms/hkw00_FtH87_dKEfp-7qE54xk0Sw
========================================================
[hkw00@ui03 wms]$ ls /home/hkw00/HAII/ce/wms/hkw00_FtH87_dKEfp-7qE54xk0Sw
Host.log std.err std.out
32
Job Description Language for CREAM (Ref: gLite 3.2 User Guide)
[hkw00@ui03 cream]$ ls
checkHost.sh Host_cream.jdl
[hkw00@ui03 cream]$ cat Host_cream.jdl
JobType = "Normal";
CPUNumber = 1;
Executable = "checkHost.sh";
StdOutput = "std.out";
StdError = "std.err";
InputSandBox = {"/home/hkw00/HAII/ce/cream/checkHost.sh"};
OutputSandBox = {"Host.log"};
OutputSandboxDestURI = {"gsiftp://as-ds01.euasiagrid.org/dpm/euasiagrid.org/home/euasia/hkw00/"};
RetryCount = 5;
[hkw00@ui03 cream]$ cat checkHost.sh
#!/bin/sh
echo "HOST: `hostname`" >> Host.log
printenv >> Host.log
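Two details of this JDL can be checked mechanically. First, attribute names are case-insensitive, so InputSandBox and OutputSandBox here are the same attributes as InputSandbox and OutputSandbox in the WMS example. Second, OutputSandboxDestURI is expected to give one destination per OutputSandbox file. That second rule is our reading of the JDL documentation and is stated here as an assumption. A minimal checker, with hypothetical function names:

```python
from typing import Dict, List, Union

JdlValue = Union[str, int, List[str]]


def normalize(attrs: Dict[str, JdlValue]) -> Dict[str, JdlValue]:
    """Lower-case attribute names, since JDL (ClassAd) attribute names are case-insensitive."""
    out: Dict[str, JdlValue] = {}
    for name, value in attrs.items():
        key = name.lower()
        if key in out:
            raise ValueError(f"attribute {name!r} is defined twice (ignoring case)")
        out[key] = value
    return out


def as_list(value: Union[str, List[str]]) -> List[str]:
    # A JDL attribute may hold a single string or a list of strings.
    return [value] if isinstance(value, str) else list(value)


def check_output_destinations(attrs: Dict[str, JdlValue]) -> List[str]:
    """Return problems found when pairing OutputSandbox with OutputSandboxDestURI."""
    a = normalize(attrs)
    if "outputsandboxdesturi" not in a:
        return []  # no explicit destinations to check
    files = as_list(a.get("outputsandbox", []))
    dests = as_list(a["outputsandboxdesturi"])
    if len(files) != len(dests):
        return [f"{len(files)} OutputSandbox file(s) but {len(dests)} OutputSandboxDestURI entries"]
    return []


cream_jdl = {
    "Executable": "checkHost.sh",
    "InputSandBox": ["/home/hkw00/HAII/ce/cream/checkHost.sh"],
    "OutputSandBox": ["Host.log"],
    "OutputSandboxDestURI": ["gsiftp://as-ds01.euasiagrid.org/dpm/euasiagrid.org/home/euasia/hkw00/"],
}
print(check_output_destinations(cream_jdl))  # []
```

In this example the single destination URI ends with "/". The lcg-ls listing later in the CREAM example shows Host.log stored under that directory. Note also that std.out and std.err are not listed in OutputSandBox here, so they are not shipped anywhere.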
33
Example for CREAM (Ref: gLite 3.2 User Guide)
[hkw00@ui03 cream]$ lcg-infosites --vo euasia ce
# CPU Free Total Jobs Running Waiting ComputingElement
----------------------------------------------------------------
256 155 0 0 0 as-ce01.euasiagrid.org:8443/cream-pbs-euasia
64 48 0 0 0 ce-qamar.utmgrid.utm.my:8443/cream-pbs-euasia
56 50 0 0 0 ce.utmgrid.utm.my:8443/cream-pbs-euasia
(...)
[hkw00@ui03 cream]$ glite-ce-job-submit -r as-ce01.euasiagrid.org:8443/cream-pbs-euasia -a Host_cream.jdl
https://as-ce01.euasiagrid.org:8443/CREAM378465856
[hkw00@ui03 cream]$ glite-ce-job-status https://as-ce01.euasiagrid.org:8443/CREAM378465856
****** JobID=[https://as-ce01.euasiagrid.org:8443/CREAM378465856]
Status = [DONE-OK]
ExitCode = [0]
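The CE listing and the glite-ce-job-status output above are plain text. The minimal sketch below parses both. It assumes the column layout shown here (CPU, Free, Total Jobs, Running, Waiting, ComputingElement) and the "Key = [value]" format of the status lines, and either may differ with other client versions. Picking the CE with the most free CPUs is only an illustrative choice; in the session the CE was passed to -r by hand. Function names are hypothetical.

```python
import re
from typing import Dict, List

INFOSITES_SAMPLE = """\
# CPU Free Total Jobs Running Waiting ComputingElement
----------------------------------------------------------------
256 155 0 0 0 as-ce01.euasiagrid.org:8443/cream-pbs-euasia
64 48 0 0 0 ce-qamar.utmgrid.utm.my:8443/cream-pbs-euasia
56 50 0 0 0 ce.utmgrid.utm.my:8443/cream-pbs-euasia
"""

CE_STATUS_SAMPLE = """\
****** JobID=[https://as-ce01.euasiagrid.org:8443/CREAM378465856]
Status = [DONE-OK]
ExitCode = [0]
"""

# 'Key = [value]' lines; leading asterisks (as on the JobID line) are skipped.
FIELD_RE = re.compile(r"^[*\s]*(\w+)\s*=\s*\[(.*)\]\s*$", re.MULTILINE)


def parse_infosites_ce(text: str) -> List[Dict[str, object]]:
    """Parse data rows: five integer columns followed by the CE identifier."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 6 and all(p.isdigit() for p in parts[:5]):
            cpu, free, total, running, waiting = map(int, parts[:5])
            rows.append({"ce": parts[5], "cpu": cpu, "free": free,
                         "total_jobs": total, "running": running, "waiting": waiting})
    return rows


def parse_ce_job_status(text: str) -> Dict[str, str]:
    """Extract fields such as JobID, Status and ExitCode from glite-ce-job-status output."""
    return dict(FIELD_RE.findall(text))


ces = parse_infosites_ce(INFOSITES_SAMPLE)
most_free = max(ces, key=lambda row: row["free"])
status = parse_ce_job_status(CE_STATUS_SAMPLE)
print(most_free["ce"])                       # as-ce01.euasiagrid.org:8443/cream-pbs-euasia
print(status["Status"], status["ExitCode"])  # DONE-OK 0
```

Header, separator and "(...)" lines are skipped simply because they do not have five integer columns followed by a CE identifier.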
34
Example for CREAM (Ref: gLite 3.2 User Guide)
[hkw00@ui03 cream]$ lcg-ls srm://as-ds01.euasiagrid.org/dpm/euasiagrid.org/home/euasia/hkw00/
/dpm/euasiagrid.org/home/euasia/hkw00//Host.log
(...)
[hkw00@ui03 cream]$ lcg-cp srm://as-ds01.euasiagrid.org/dpm/euasiagrid.org/home/euasia/hkw00/Host.log file:`pwd`/Host.log
[hkw00@ui03 cream]$ ls
checkHost.sh Host_cream.jdl Host.log