Workload Management System on gLite middleware


1 Workload Management System on gLite middleware
Valeria Ardizzone INFN EGEE User Tutorial Bologna, June 2007

2 Outline
Overview of WMS Architecture: Task Queue, Information Supermarket, MatchMaker, Scheduling Policies, Job Submission Service, Job Logging & Bookkeeping.
Job Description Language Overview: basic attributes, advanced attributes.
Practice: command line, exercises.

3 Workload Management System (WMS)
The WMS is the gLite 3 component that allows users to submit jobs and performs all the tasks required to execute them.
It comprises a set of Grid middleware components responsible for distributing and managing tasks across Grid resources.
It hides the complexity of the Grid from the user.

4 WMS’s Architecture

5 WMS’s Architecture
Job management requests (submission, cancellation) are expressed via a Job Description Language (JDL).

6 WMS’s Architecture
Finds an appropriate CE for each submission request, taking into account job requests and preferences, Grid status, and utilization policies on resources.

7 WMS’s Architecture
Repository of resource information available to the matchmaker, updated via notifications and/or active polling on resources.
The information stored in the Information Supermarket is updated by the ISM Updater. Each resource stored in the Information Supermarket is queried periodically to update its status. If the resource answers correctly, its entry in the Information Supermarket is updated; otherwise the entry is removed.
If the MatchMaker does not find any compatible resource in the Information Supermarket, it tries to satisfy the request by contacting the Information System. If a valid resource is found during this second interaction, a copy of it is stored in the Information Supermarket so that it can be used for later requests.
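
The update and fallback behaviour described on this slide can be sketched roughly as follows. This is only a toy illustration in Python, not the WMS implementation: the class `InformationSupermarket`, the callables `poll`, `matches` and `query_information_system`, and the dictionary layout are hypothetical names introduced for the example.

```python
from typing import Callable, Dict, Optional, Tuple

Resource = Dict[str, object]  # hypothetical layout: attribute name -> value


class InformationSupermarket:
    """Toy cache of resource information (all names are illustrative)."""

    def __init__(self) -> None:
        self.entries: Dict[str, Resource] = {}

    def refresh(self, poll: Callable[[str], Optional[Resource]]) -> None:
        """Periodic update: query every cached resource once."""
        for ce_id in list(self.entries):
            status = poll(ce_id)
            if status is None:
                del self.entries[ce_id]       # no valid answer: remove the entry
            else:
                self.entries[ce_id] = status  # valid answer: update the entry

    def find(
        self,
        matches: Callable[[Resource], bool],
        query_information_system: Callable[
            [Callable[[Resource], bool]], Optional[Tuple[str, Resource]]
        ],
    ) -> Optional[str]:
        """Look in the cache first, then fall back to the Information System."""
        for ce_id, info in self.entries.items():
            if matches(info):
                return ce_id
        found = query_information_system(matches)  # hypothetical fallback call
        if found is None:
            return None
        ce_id, info = found
        self.entries[ce_id] = info  # keep a copy for later requests
        return ce_id
```

In this sketch a resource that fails to answer is dropped immediately; a real updater might retry before removing an entry.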

8 WMS’s Architecture
Keeps submission requests.
Requests that are pending because no Grid resources are immediately available are kept for a while.

9 WMS’s Architecture
Performs the actual job submission and monitoring.

10 WMS Components (1)
Network Server (NS) / WMProxy
  Accepts incoming requests from the UI (job submission, job removal).
  If valid, passes them to the Workload Manager.

11 WMS Components (2)
Workload Manager (WM)
  Core component of the WMS.
  Takes appropriate actions to satisfy requests.
Resource Broker (RB), also called MatchMaker
  Finds the resources that best match the request.
Information SuperMarket (ISM)
  Repository of resource information, available in read-only mode to the RB.
Task Queue
  Makes it possible to keep a request if no resources are immediately available.
  A request that does not match is retried periodically (eager scheduling) or waits for notification of available resources (lazy scheduling).

12 WMS Components (3)
Eager scheduling ("push" model): a job is bound to a resource as soon as possible. Once the decision has been taken, the job is passed to the selected resource for execution.
Lazy scheduling ("pull" model): the job is held by the WM until a resource becomes available. When this happens, the resource is matched against the submitted jobs.
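
The difference between the two policies can be illustrated with a small sketch. It is a simplification under stated assumptions, not the Workload Manager's actual logic: the `fits` test on CPU counts and the dictionary keys `cpus` and `free_cpus` are invented for the example.

```python
from collections import deque
from typing import Deque, Dict, Optional

Job = Dict[str, int]       # hypothetical layout: {"cpus": CPUs requested}
Resource = Dict[str, int]  # hypothetical layout: {"free_cpus": CPUs available}


def fits(job: Job, res: Resource) -> bool:
    """Illustrative compatibility test between a job and a resource."""
    return res["free_cpus"] >= job["cpus"]


def eager_schedule(job: Job, resources: Dict[str, Resource]) -> Optional[str]:
    """Push model: start from the job and bind it to a resource right away."""
    for name, res in resources.items():
        if fits(job, res):
            return name
    return None  # no match now: the request stays queued and is retried later


def lazy_schedule(pending: Deque[Job], res: Resource) -> Optional[Job]:
    """Pull model: start from a newly available resource and pick a held job."""
    for job in list(pending):
        if fits(job, res):
            pending.remove(job)
            return job
    return None
```

In the eager case the loop runs over resources for one job; in the lazy case it runs over held jobs for one resource.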

13 WMS Components (4)
WMS components that handle the job during its lifetime and perform the submission:
Job Adapter (JA) is responsible for
  making the final touches to the JDL expression for a job before it is passed to CondorC for the actual submission
  creating the job wrapper script that sets up the appropriate execution environment on the CE worker node
    transfer of the input and of the output sandboxes
CondorC is responsible for
  performing the actual job management operations (job submission, job removal)
With the latest release of the middleware, users can create and submit workflow jobs to the Grid; in particular, the user can define the dependencies between the different nodes of the graph.

14 WMS Components (5)
Log Monitor (LM) is responsible for
  watching the CondorC log file
  intercepting interesting events concerning active jobs
Proxy Renewal Service is responsible for
  ensuring that a valid user proxy exists within the WMS for the whole lifetime of a job
  contacting the MyProxy Server to renew the user's credential
Logging & Bookkeeping (LB) is responsible for
  storing events generated by the various components of the WMS
  By querying the LB, the user can retrieve information about the job status.

15 Jobs State Machine (1/9)
Submitted: the job has been entered by the user on the User Interface but not yet transferred to the Network Server for processing.
In the next few slides I will show you how the job status changes during its lifetime. The starting point, as I said before, is a user who, from a trusted UI, submits a request to the Grid for a job to be executed on one of the Grid resources. The request has to be expressed in JDL. With this language the user can also specify a list of input files that need to be transferred from the UI to the WN. The job status is SUBMITTED when the job has been entered into the Grid but has not yet been received by the Network Server.

16 Jobs State Machine (2/9)
Waiting: the job has been accepted by the NS and is waiting for Workload Manager processing, or is being processed by WMHelper modules.
When the job request reaches the WMS, the MatchMaker (MM) starts searching for the best resource on which the request can be satisfied. To do so, the MM interacts with the BDII to retrieve the status of all the Grid resources, and with the Catalog to locate the data if the job needs to use files that have been stored on a Storage Element.

17 Jobs State Machine (3/9)
Ready: the job has been processed by the WM but not yet transferred to the CE (local batch system queue).
If the matchmaking algorithm finds an available resource, the JA starts preparing the wrapper script that correctly sets up the environment on the WN, and the job status becomes READY.

18 Jobs State Machine (4/9)
Scheduled: the job is waiting in the queue on the CE.
When the job status becomes Scheduled, the job has reached the Grid resource and is waiting for the CE to dispatch it to one of the available worker nodes so that execution can start.

19 Jobs State Machine (5/9)
Running: the job is running on a Worker Node.

20 Jobs State Machine (6/9)
Done: the job has exited, or is considered to be in a terminal state by CondorC (e.g., submission to the CE has failed in an unrecoverable way).

21 Jobs State Machine (7/9)
Aborted: job processing was aborted by the WMS (waiting in the WM queue or on the CE for too long, expiration of user credentials).
If something goes wrong, for example the job stays too long in a CE's queue or the user's credentials expire, the job status becomes Aborted.

22 Jobs State Machine (8/9)
Cancelled: the job has been successfully cancelled on user request.
The job status is CANCELLED if the job has been successfully deleted by the user.

23 Jobs State Machine (9/9)
Cleared: the output sandbox was transferred to the user or removed due to the timeout.
The job status becomes CLEARED if the output files have been correctly retrieved or removed due to the timeout.
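
The nine states from the preceding slides can be summarised as a small state machine. The state names come from the slides; the transition table is an assumption made for this sketch (a single main path, with Aborted and Cancelled reachable from any state before Done) and is not taken from the gLite specification.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "Submitted"
    WAITING = "Waiting"
    READY = "Ready"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    DONE = "Done"
    ABORTED = "Aborted"
    CANCELLED = "Cancelled"
    CLEARED = "Cleared"


# Normal path, in the order in which the slides present the states.
MAIN_PATH = [
    JobState.SUBMITTED, JobState.WAITING, JobState.READY,
    JobState.SCHEDULED, JobState.RUNNING, JobState.DONE, JobState.CLEARED,
]

# Assumption for this sketch: each state on the main path moves only to the
# next one, and every state before Done may also end in Aborted or Cancelled.
TRANSITIONS = {src: {dst} for src, dst in zip(MAIN_PATH, MAIN_PATH[1:])}
for state in MAIN_PATH[:5]:
    TRANSITIONS[state] |= {JobState.ABORTED, JobState.CANCELLED}


def can_move(src: JobState, dst: JobState) -> bool:
    """Return True if the sketch allows a direct transition from src to dst."""
    return dst in TRANSITIONS.get(src, set())
```

Aborted, Cancelled and Cleared have no outgoing transitions here, so they act as final states.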

24 Job Description Language

25 The JDL language
The Job Description Language (JDL) describes jobs for execution on the Grid.
The JDL adopted within the gLite middleware is based on Condor’s CLASSified ADvertisement language (ClassAd).
A ClassAd is a record-like structure composed of a finite number of attributes separated by semicolons (;).
A ClassAd is highly flexible and can be used to represent arbitrary services.
The JDL file is processed by the match-making process to select the best resource that satisfies the job’s requirements.

26 The JDL language
The lines of a JDL file have the format:
  Attribute = expression;
There are 2 categories of attributes:
  Job attributes: define the job itself
  Resources: indicate the job constraints in terms of
    computing resources
    data and storage resources
Comments are indicated by # or //
The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.

27 JDL: basic attributes
In a JDL, some attributes are mandatory while others are optional. An "essential" JDL is the following:
  [
    Executable = "test.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh"};
    OutputSandbox = {"std.out","std.err"};
  ]
If needed, arguments to the executable can be passed:
  Arguments = "arguments list";
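
As an illustration of the file layout, the following Python sketch renders a dictionary of basic attributes as JDL text. The helper `render_jdl` is hypothetical and is not part of any gLite tool; it only handles string and string-list values, does not escape quotes, and would not work for expression attributes such as Requirements or Rank.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]


def render_jdl(attrs: Dict[str, Value]) -> str:
    """Render string and string-list attributes in the 'Attribute = expression;' form."""

    def fmt(value: Value) -> str:
        if isinstance(value, list):
            # A string list is written as {"a","b"}.
            return "{" + ",".join(fmt(v) for v in value) + "}"
        return '"' + value + '"'

    lines = ["["]
    for name, value in attrs.items():
        # Nothing is emitted after the semicolon, so no trailing blanks or tabs.
        lines.append(f"  {name} = {fmt(value)};")
    lines.append("]")
    return "\n".join(lines)


print(render_jdl({
    "Executable": "test.sh",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["test.sh"],
    "OutputSandbox": ["std.out", "std.err"],
}))
```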

28 JDL: basic attributes
Executable = < string > (mandatory)
  represents the executable/command name
  you can specify an executable that:
    already exists on the remote WN
    will be copied from the UI to the WN
  the arguments are given in a separate attribute

29 JDL: basic attributes
Arguments = < string > (optional)
  arguments for the executable file, e.g. "-out outputfile.dat"
  with:
    Executable = "execprog";
  on the Worker Node (WN) we will have:
    $ execprog -out outputfile.dat
  the character " should be preceded by \ :
    " -a \"quoted string\" -bcd"
  becomes:
    $ execprog -a "quoted string" -bcd
  special characters (&, |, >, <) should be preceded by a triple \ :
    Arguments = "-f file1\\\&file2";

30 JDL: basic attributes
StdOutput, StdError, StdInput = < string > (optional)
  paths of the output / error / input files
  StdOutput and StdError:
    must also be listed in the OutputSandbox
    can have the same value

31 JDL: basic attributes
InputSandbox = < string | string list > (optional)
  contains the input files to be copied from the UI to the WN before the job execution
  only local UI files (for LFNs use the InputData attribute)
  each file must not be larger than 10 MB
  the files must have different names (the destination directory is the same)

32 JDL: basic attributes
OutputSandbox = < string | string list >
  contains the output files to be transferred from the WN to the UI after the job execution
  the files must have different names (the destination directory is the same)

33 JDL: advanced attributes
Requirements (mandatory)
  Job requirements on Grid resources (CE, SE, …)
  Evaluation performed by the Match Maker
  Specified using attributes published by the Information Service
  If not specified, the default value is:
    Requirements = other.GlueCEStateStatus == "Production";
  Examples:
    Requirements = other.GlueCEUniqueID == "adc006.cern.ch:2119/jobmanager-pbs-infinite";
    Requirements = Member("ALICE", other.GlueHostApplicationSoftwareRunTimeEnvironment);
    Requirements = other.GlueCEInfoTotalCPUs > 2 && other.GlueCEPolicyMaxRunningJobs < 2;

34 JDL: advanced attributes
Rank (mandatory)
  Floating-point expression used to rank the CEs that have already met the Requirements expression.
  Can contain attributes that describe the CE in the Information System (IS).
  Evaluation performed by the Resource Broker (RB) during the match-making phase.
  A higher numeric value means a better rank.
  If not specified, the default value is:
    Rank = -other.GlueCEStateEstimatedResponseTime;
  E.g.:
    Rank = other.GlueCEStateFreeCPUs;
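
How Requirements and Rank work together can be sketched in Python: Requirements filters the CEs, then Rank picks the best of those that remain. The Glue attribute names and the two default expressions are the ones quoted on the slides; the function `match`, the CE identifiers and all attribute values are invented for the illustration, and the real matchmaker evaluates ClassAd expressions rather than Python callables.

```python
from typing import Callable, Dict, Optional

CE = Dict[str, object]  # Glue attribute name -> value (sample values are invented)


def default_requirements(ce: CE) -> bool:
    # Mirrors: Requirements = other.GlueCEStateStatus == "Production";
    return ce["GlueCEStateStatus"] == "Production"


def default_rank(ce: CE) -> float:
    # Mirrors: Rank = -other.GlueCEStateEstimatedResponseTime;
    return -float(ce["GlueCEStateEstimatedResponseTime"])


def match(
    ces: Dict[str, CE],
    requirements: Callable[[CE], bool] = default_requirements,
    rank: Callable[[CE], float] = default_rank,
) -> Optional[str]:
    """Keep the CEs that satisfy requirements, then return the highest-ranked one."""
    candidates = {ce_id: ce for ce_id, ce in ces.items() if requirements(ce)}
    if not candidates:
        return None
    return max(candidates, key=lambda ce_id: rank(candidates[ce_id]))
```

With the default Rank, the CE with the shortest estimated response time wins, because negating the value turns "smaller is better" into "higher is better".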

35 JDL: advanced attributes
Environment = < string | string list > (optional)
  environment variables
  string format: < variable name > = < string >
  example:
    Environment = { "JOB_LOG_FILE=/tmp/job.log", "INP_DIR=/tmp/input_files" };

36 References
EGEE User Guide
JDL Attributes: Attributes-v0-8.pdf
Exercises on GILDA Wiki: WithRB withedgcommands

