Presentation is loading. Please wait.

Presentation is loading. Please wait.

The gLite Workload Management System

Similar presentations


Presentation on theme: "The gLite Workload Management System"— Presentation transcript:

1 The gLite Workload Management System
Giuseppe La Rocca INFN Catania Italy First International Grid School for Industrial Applications 30 June - 07 July 2007 Acitrezza

2 Outline This presentation will cover the following arguments:
Overview of the gLite WMS Architecture Job Description Language Overview - Principal Attributes References and hands-on I’ll show you how the WMS works First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

3 Overview of gLite Middleware
So a grid middleware is a layer that allow you to run in a transparent way your application in a specific grid resource who belong to a specific VO. First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

4 Workload Management System
Workload Management System (WMS) comprises a set of Grid middleware components responsible for distribution and management of tasks across Grid resources. Purpose of Workload Manager (WM) is accept and satisfy requests for job management coming from its clients meaning of the submission request is to pass the responsibility of the job to the WM. WM will pass the job to an appropriate CE for executiontaking into account requirements and the preferences expressed in the job description. The decision of which resource should be used is the outcome of a matchmaking process. First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

5 Job submission LFC Logging & Book-keeping UI Information System
Network Server LFC UI Information System Job Contr. - Condor CE characts & status SE characts & status Storage Element Computing Element Logging & Book-keeping First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 5

6 LFC Logging & Book-keeping UI UI: allows users to
Job Status Network Server LFC UI Information System UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs) WMS: Workload Management System Job Contr. - CondorG CE characts & status SE characts & status Computing Element Storage Element Logging & Book-keeping First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 6

7 edg-job-submit myjob.jdl myjob.jdl
JobType = “Normal”; Executable = "$(CMS)/exe/sum.exe"; InputSandbox = {"/home/user/WP1testC","/home/file*”}; OutputSandbox = {“sim.err”, “test.out”, “sim.log"}; Requirements = other. GlueHostOperatingSystemName == “linux" && other.GlueCEPolicyMaxCPUTime > 10000; Rank = other.GlueCEStateFreeCPUs; Job Status RB node submitted Network Server LFC UI Information System Job Description Language (JDL) to specify job characteristics and requirements Job Contr. - CondorG Job Status CE characts & status SE characts & status LB is the component used to logs all the job management events Computing Element Storage Element Logging & Book-keeping First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 7

8 LFC Logging & Book-keeping NS: network daemon
responsible for accepting incoming requests Job Status submitted Network Server LFC Job UI Input Sandbox files WMS storage Information System Job Contr. - CondorG Job Status CE characts & status SE characts & status Computing Element Storage Element Logging & Book-keeping First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 8

9 LFC Logging & Book-keeping UI WMS WM: responsible to take
Job Status waiting submitted Network Server LFC UI Match- Maker/ Broker Job WMS storage Where must this job be executed ? Information System WM: responsible to take the appropriate actions to satisfy the request Job Contr. - CondorG Job Status CE characts & status SE characts & status Computing Element Storage Element Logging & Book-keeping First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 9

10 LFC Logging & Book-keeping Matchmaker: responsible
Where are (which SEs) the needed data ? Job Status waiting submitted Network Server LFC Matchmaker: responsible to find the “best” CE where to submit a job UI Match- Maker/ Broker WMS storage Information System What is the status of the Grid ? Job Contr. - Condor Job Status CE characts & status SE characts & status Computing Element Storage Element Logging & Book-keeping First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 10

11 LFC Logging & Book-keeping UI WMS Information System waiting submitted
Job Status waiting submitted Network Server LFC UI Match- Maker/ Broker WMS storage CE choice Information System Job Contr. - Condor Job Status CE characts & status SE characts & status Computing Element Storage Element Logging & Book-keeping First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 11

12 LFC Logging & Book-keeping UI WMS Information System
Job Status waiting submitted Network Server LFC UI WMS storage Information System Job Adapter Job Contr. - Condor Job Status CE characts & status JA: responsible for the final “touches” to the job before it’s passed to Condor (e.g. creation of wrapper script, etc.) SE characts & status . Computing Element Storage Element Logging & Book-keeping First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 12

13 LFC Logging & Book-keeping UI WMS Information System
Job Status submitted waiting ready Network Server LFC UI WMS storage Information System Job Job Contr. - Condor Job Status JC: responsible for the actual job management operations (done via CondorG) CE characts & status SE characts & status JC is responsible Computing Element Storage Element Logging & Book-keeping First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 13

14 LFC Logging & Book-keeping UI Information System Network Server
Job Status Network Server submitted waiting ready scheduled LFC UI WMS storage Information System Job Contr. - Condor Job Status Input Sandbox files CE characts & status SE characts & status Job Computing Element Storage Element Logging & Book-keeping First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 14

15 LFC Logging & Book-keeping UI WMS Information System Network Server
Job Status Network Server submitted waiting ready scheduled running LFC UI WMS storage Information System Job Contr. - Condor Job Status Input Sandbox “Grid enabled” data transfers/ accesses Computing Element Storage Element Logging & Book-keeping Job First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 15

16 LFC Logging & Book-keeping UI WMS Information System submitted waiting
Job Status submitted waiting ready scheduled running done Network Server LFC UI WMS storage Information System Job Contr. - Condor Job Status Output Sandbox files Computing Element Storage Element Logging & Book-keeping First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 16

17 LFC Logging & Book-keeping edg-job-get-output <dg-job-id> UI WMS
Job Status edg-job-get-output <dg-job-id> submitted waiting ready scheduled running done Network Server LFC UI WMS storage Information System Job Contr. - Condor Job Status Output Sandbox Computing Element Storage Element Logging & Book-keeping First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 17

18 LFC Logging & Book-keeping UI WMS Information System Network Server
Job Status Network Server submitted LFC UI waiting Output Sandbox files ready WMS storage Information System scheduled Job Contr. - Condor running Job Status done cleared Computing Element Storage Element Logging & Book-keeping First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 18

19 Possible job states Flag Meaning SUBMITTED submission logged in the LB
WAIT job match making for resources READY job being sent to executing CE SCHEDULED job scheduled in the CE queue manager RUNNING job executing on a WN of the selected CE queue DONE job terminated without grid errors CLEARED job output retrieved ABORT job aborted by middleware, check reason First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 19

20 Workload Management System
LFC Catalogue “User interface” Input “sandbox” DataSets info Information Service Output “sandbox” SE & CE info Resource Broker (WorkLoad Mgr.) Author. &Authen. Job Submit Event Input “sandbox” + Broker Info Output “sandbox” Job Query Job Status Publish Storage Element Logging & Book-keeping Computing Element Job Status First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza Footer 20

21 Command Line Interface
Job Submission $ edg-job-submit [options] <jdl_file> --vo <vo name> : perform submission with a different VO than the UI default one. --output, -o <output file> save jobId on a file. --resource, -r <resource value> specify the resource for execution. --nomsgi neither message nor errors on the stdout will be displayed. First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

22 If the request has been correctly submitted this is the typical output that you can get:
edg-job-submit test.jdl ====================edg-job-submit Success ===================== The job has been successfully submitted to the Network Server. Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is: - ============================================================== In case of failure, an error message will be displayed instead, and an exit status different form zero will be retured. First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

23 If the command returns the following error message:
**** Error: API_NATIVE_ERROR **** Error while calling the "NSClient::multi" native api AuthenticationException: Failed to establish security context... **** Error: UI_NO_NS_CONTACT **** Unable to contact any Network Server it means that there are authentication problems between the UI and the Network Server (check your proxy or contact the site administrator). First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

24 It is possible to see which CEs are eligible to run a job specified by a given JDL file using the command edg-job-list-match test.jdl Connecting to host lxshare0380.cern.ch, port 7772 Selected Virtual Organisation name (from UI conf file): dteam ********************************************************************* COMPUTING ELEMENT IDs LIST The following CE(s) matching your job requirements have been found: adc0015.cern.ch:2119/jobmanager-lcgpbs-infinite adc0015.cern.ch:2119/jobmanager-lcgpbs-long adc0015.cern.ch:2119/jobmanager-lcgpbs-short ********************************************************************** First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

25 After a job is submitted, it is possible to see its status using the glite-job-status command.
edg-job-status ************************************************************* BOOKKEEPING INFORMATION: Printing status info for the Job: Current Status: Scheduled Status Reason: Job successfully submitted to Globus Destination: lxshare0277.cern.ch:2119/jobmanager-pbs-infinite reached on: Fri Aug 1 12:21: First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

26 output using the - -dir <path name> option.
After the job has finished (it reaches the DONE status), its output can be copied to the UI edg-job-get-output Retrieving files from host lxshare0234.cern.ch ***************************************************************** JOB GET OUTPUT OUTCOME Output sandbox files for the job: - have been successfully retrieved and stored in the directory: /tmp/jobOutput/larocca_snPegp1YMJcnS22yF5pFlg By default, the output is stored under /tmp/jobOutput, but it is possible to specify in which directory to save the output using the - -dir <path name> option. First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

27 A job can be canceled before it ends using the command edg-job-cancel.
Are you sure you want to remove specified job(s)? [y/n]n :y =================== edg-job-cancel Success==================== The cancellation request has been successfully submitted for the following job(s) - =========================================================== First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

28 In gLite Job Description Language (JDL) is used to
describe jobs for execution on Grid. The JDL adopted within the gLite middleware is based upon Condor CLASSified Advertisement language (ClassAd). A ClassAd is a record-like structure composed of a finite number of attributes separated by a semi-colon (;) A ClassAd is highly flexible and can be used to represent arbitrary services The JDL is used in gLite to specify the job’s characteristics and constrains, which are used during the match-making process to select the best resources that satisfy job’s requirements. First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

29 The JDL syntax consists on statements like: Attribute = value;
Comments must be preceded by a sharp character ( # ) or have to follow the C++ syntax WARNING: The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line. First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

30 In a JDL, some attributes are mandatory while others are optional.
An “essential” JDL is the following: Executable = “test.sh”; StdOutput = “std.out”; StdError = “std.err”; InputSandbox = {“test.sh”,”input.dat”}; OutputSandbox = {“std.out”,”std.err”}; If needed, arguments to the executable can be passed: Arguments = “Hello World!”; First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

31 e.g. Arguments = “\”Hello World!\“ 10”;
If the argument contains quoted strings, the quotes must be escaped with a backslash e.g. Arguments = “\”Hello World!\“ 10”; Special characters such as &, |, >, < are only allowed if specified inside a quoted string or preceded by triple \ (e.g. Arguments = "-f file1\\\&file2";) The backtick character ` cannot be specified in the JDL. First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

32 JDL : Relevant Attributes
First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

33 “Interactive” + “MPI” not yet permitted
JobType (optional) Normal (simple, sequential job), Interactive, MPICH, Checkpointable, Partitionable, Parametric Or combination of them Checkpointable, Interactive Checkpointable, MPI E.g. JobType = “Interactive”; JobType = {“Interactive”,”Checkpointable”}; “Interactive” + “MPI” not yet permitted The main idea behind the concept of partitionable jobs is the ability to describe the processing of a job as a set of independent steps/iterations so that the job can be thought of as composed by a set of independent sub-jobs each one taking care of a step or of a sub-set of steps, and which can be executed in parallel. First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

34 This is a string representing the executable/command name.
Executable (mandatory) This is a string representing the executable/command name. The user can specify an executable which is already on the remote CE Executable = {“/opt/EGEODE/GCT/egeode.sh”}; The user can provide a local executable name which will be staged from the UI to the WN Executable = {“egeode.sh”}; InputSandbox = {“/home/larocca/egeode/ egeode.sh”}; First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

35 This is a string containing all the job command line arguments.
Arguments (optional) This is a string containing all the job command line arguments. E.g.: If your executable sum has to be started as: $ sum N1 N2 –out result.out Executable = “sum”; Arguments = “N1 N2 –out result.out”; First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

36 List of environment settings needed by the job to run properly
Environment (optional) List of environment settings needed by the job to run properly E.g. Environment = {“JAVABIN=/usr/local/java”}; InputSandbox (optional) List of files on the UI local disk needed by the job for running The listed files will automatically staged to the remote resource E.g. InputSandbox ={“myscript.sh”,”/tmp/cc,sh”}; First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

37 List of files, generated by the job, which have to be retrieved
OutputSandbox (optional) List of files, generated by the job, which have to be retrieved E.g. OutputSandbox = { “std.out”,”std.err”, “image.png” }; First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

38 Job requirements on computing resources
Requirements (optional) Job requirements on computing resources Specified using attributes of resources published in the Information Service If not specified, default value defined in UI config\uration file is considered Default. Requirements = other.GlueCEStateStatus == "Production“; Requirements = other.GlueCEInfoLRMSType == “PBS” && other.GlueCEInfoTotalCPUs > 2 && Member (“ALICE-2.1.7”, other.GlueHostApplicationSoftwareRunTimeEnvironment); First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

39 E.g.: Rank = other.GlueCEStateFreeCPUs;
Rank (optional) Floating-point expression used to rank CEs that have already fulfill the Requirements expression. The Rank expression can contain attributes that describe the CE in the Information System (IS). The evaluation of the rank expression is performed by the Resource Broker (RB) during the match-making phase. A higher numeric value equals a better rank. E.g.: Rank = other.GlueCEStateFreeCPUs; First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

40 InputData (optional) This is a string or a list of strings representing the Logical File Name (LFN) orGrid Unique Identifier (GUID) needed by the job as input. The list is used by the RB to find the CE from which the specified files can be better accessed and schedules the job to run there. InputData = { “lfn:cmstestfile”, “guid:135b7b23-4a6a-11d7-87e7-9d101f8c8b70” }; First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

41 DataAccessProtocol (mandatory if InputData has been specified)
The protocol or the list of protocols which the application is able to “speak” with for accessing files listed in InputData on a given SE. Supported protocols in gLite are currently gsiftp, and file. DataAccessProtocol = {“file”,“gsiftp”}; First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

42 OutputSE = “aliserv6.ct.infn.it”;
OutputSE (optional) This string representing the URI of the Storage Element (SE) where the user wants to store the output data. This attribute is used by the Resource Broker to find the bestCE “close” to this SE and schedule the job there. OutputSE = “aliserv6.ct.infn.it”; First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

43 This attribute contains the following three attributes:
OutputData (optional) This attribute allows the user to ask for the automatic upload and registration of datasets produced by the job on the Worker Node (WN). This attribute contains the following three attributes: OutputFile StorageElement LogicalFileName First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

44 NodeNumber (mandatory if JobType=MPICH)
NodeNumber attribute is an integer specifying the number of nodes needed for a MPI job. The RB uses this attribute during the matchmaking for selecting those CE having a number of CPUs equals or greater the one specified in NodeNumber. NodeNumber = 5; First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

45 JobSteps (mandatory for checkpointable or partitionable jobs)
JobSteps attribute can be either an integer representing the number of steps for a checkpointable or partitionable job e.g.: JobSteps = ; or a list of strings representing labels associated to the steps of a checkpointable or partitionable job e.g.: JobSteps = {“d0”, “d1”, ”gmos”}; First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

46 CurrentStep (mandatory for checkpointable or partitionable jobs)
CurrentStep attribute used to indicate the initial step when submitting a checkpointable or partitionable job. CurrentStep = 2; First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

47 Remember to initialize the proxy before to interact with the WMS!
References & Hands-on JDL (sottomissione via WMS Netrwork Server) Remember to initialize the proxy before to interact with the WMS! First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza

48 Thank you for your attention !!!!
First International Grid School for Industrial Applications – 30 June, 07 July - Acitrezza


Download ppt "The gLite Workload Management System"

Similar presentations


Ads by Google