www.eu-eela.org E-science grid facility for Europe and Latin America gLite Job Management. User and Site Admin Tutorial Elisa Ingrà – INFN Catania Dublin.


E-science grid facility for Europe and Latin America gLite Job Management. User and Site Admin Tutorial Elisa Ingrà – INFN Catania Dublin (Ireland), September 2008

Dublin (Ireland), Tutorial for User and Site Admin., – Overview of gLite Middleware

Outline
The Workload Management System (WMS) is the gLite component that allows users to submit jobs and performs all tasks required to execute them, without exposing the user to the complexity of the Grid.
– It is the responsibility of the user to describe their jobs and requirements, and to retrieve the output when the jobs are finished.
In the WLCG/EGEE Grid, two different workload management systems are deployed: the legacy LCG-2 system and the new system from the EGEE project, which is an evolution of the former and therefore offers more functionality.
The following sections describe the basic concepts of the language used to describe a job and the basic command line interface to submit and manage simple jobs and special jobs.

Workload Management System
The Workload Management System (WMS) comprises a set of Grid middleware components responsible for the distribution and management of tasks across Grid resources.
The purpose of the Workload Manager (WM) is to accept and satisfy requests for job management coming from its clients.
– The meaning of a submission request is to pass the responsibility for the job to the WM.
– The WM passes the job to an appropriate CE for execution, taking into account the requirements and preferences expressed in the job description.
The decision of which resource should be used is the outcome of a matchmaking process.

Job Description Language
The Job Description Language (JDL) is a high-level language based on the Classified Advertisement (ClassAd) language, used to describe jobs and aggregates of jobs with arbitrary dependency relations.
– The JDL is used in WLCG/EGEE to specify the desired job characteristics and constraints, which are taken into account by the WMS to select the best resource to execute the job.
– A job description is a file (called a JDL file) consisting of lines having the format:
    attribute = expression;
– Expressions can span several lines, but only the last one must be terminated by a semicolon.
Literal strings are enclosed in double quotes. If a string itself contains double quotes, they must be escaped with a backslash, e.g.:
    Arguments = "\"hello\" 10";

Job Description Language
The single-quote character (') cannot be used in the JDL.
Comments must be preceded by a sharp character (#) or a double slash (//) at the beginning of each line.
Multi-line comments must be enclosed between "/*" and "*/".
Attention! The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.
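The syntax rules above can be sketched in a few lines of Python that render a dictionary as JDL text. The helper names `jdl_string` and `to_jdl` and the sample job are assumptions made for illustration; they are not part of any gLite tool, and the sketch only handles strings and lists of strings.

```python
def jdl_string(value):
    """Quote a string for JDL, escaping embedded double quotes with a backslash."""
    if "'" in value:
        # The slides state that the single-quote character cannot be used in the JDL.
        raise ValueError("single quotes are not allowed in JDL strings")
    return '"' + value.replace('"', '\\"') + '"'


def to_jdl(attributes):
    """Render {attribute: value} as 'attribute = expression;' lines."""
    lines = []
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            rendered = "{" + ", ".join(jdl_string(v) for v in value) + "}"
        else:
            rendered = jdl_string(value)
        # No blank or tab is emitted after the closing semicolon.
        lines.append(f"{name} = {rendered};")
    return "\n".join(lines) + "\n"


# Hypothetical job used only to exercise the helpers.
job = {
    "Executable": "test.sh",
    "Arguments": '"hello" 10',
    "InputSandbox": ["/home/doe/test.sh"],
    "StdOutput": "std.out",
    "StdError": "std.err",
}
print(to_jdl(job), end="")
```

Generating the file this way keeps the quoting and the "nothing after the semicolon" rule in one place instead of relying on hand editing.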

Simple example
    Executable = "/bin/hostname";
    StdOutput = "std.out";
    StdError = "std.err";
The Executable attribute specifies the command to be run by the job. If the command is already present on the WN, it must be expressed as an absolute path; if it has to be copied from the UI, only the file name must be specified, and the path of the command on the UI should be given in the InputSandbox attribute.
    Executable = "test.sh";
    InputSandbox = {"/home/doe/test.sh"};
    StdOutput = "std.out";
    StdError = "std.err";

The Arguments attribute can contain a string value, which is taken as the argument list for the executable:
    Arguments = "fileA 10";
In the Executable and Arguments attributes it may be necessary to use special characters, such as &, \, |, >, <. These characters should be preceded by a triple \ in the JDL, or specified inside quoted strings, e.g.:
    Arguments = "-f file1\\\&file2";
The attributes StdOutput and StdError define the names of the files containing the standard output and standard error of the executable, once the job output is retrieved.

If files have to be copied from the UI to the execution node, they must be listed in the InputSandbox attribute:
    InputSandbox = {"test.sh",.., "fileN"};
The files to be transferred back to the UI after the job is finished can be specified using the OutputSandbox attribute:
    OutputSandbox = {"std.out","std.err"};
The InputSandbox cannot contain two files with the same name, even if they have different absolute paths, because they would overwrite each other when transferred.
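The name-clash rule for the InputSandbox can be checked before submission. The following is a minimal Python sketch; the function name `duplicate_sandbox_names` and the sample paths are hypothetical, and the check only compares the final path component.

```python
import posixpath
from collections import Counter


def duplicate_sandbox_names(input_sandbox):
    """Return the file names that occur more than once in an InputSandbox list."""
    names = Counter(posixpath.basename(path) for path in input_sandbox)
    return sorted(name for name, count in names.items() if count > 1)


# Hypothetical sandbox: two different directories, same file name.
sandbox = [
    "/home/doe/test.sh",
    "/home/doe/run1/config.txt",
    "/home/doe/run2/config.txt",
]
clashes = duplicate_sandbox_names(sandbox)
if clashes:
    print("files that would overwrite each other:", clashes)
```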

The shell environment of the job can be modified using the Environment attribute:
    Environment = {"CMS_PATH=$HOME/cms", "CMS_DB=$CMS_PATH/cmdb"};
The VirtualOrganisation attribute can be used to explicitly specify the VO of the user:
    VirtualOrganisation = "gilda";

JobType
– Normal (simple, sequential job), Interactive, MPICH, Checkpointable, Partitionable, Parametric
– Or a combination of them:
    Checkpointable, Interactive
    Checkpointable, MPI
    JobType = "Interactive";
    JobType = {"Interactive","Checkpointable"};
"Interactive" + "MPI" is not yet permitted.

The Requirements attribute can be used to express constraints on the resources where the job should run.
– Its value is a Boolean expression that must evaluate to true for a job to run on that specific CE.
Note: Only one Requirements attribute can be specified (if there is more than one, only the last one is considered). If several conditions must be applied to the job, they must all be combined in a single Requirements attribute.
For example, suppose that the user wants to run on a CE using PBS as batch system, and whose WNs have at least two CPUs. The job description file would then contain:
    Requirements = other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 1;

The WMS can also be asked to send a job to a particular queue in a CE with the following expression:
    Requirements = other.GlueCEUniqueID == "lxshare0286.cern.ch:2119/jobmanager-pbs-short";
It is also possible to use regular expressions when expressing a requirement.
– Suppose, for example, that the user wants all jobs to run on any CE in the domain cern.ch. This can be achieved by putting the following expression in the JDL file:
    Requirements = RegExp("cern.ch",other.GlueCEUniqueID);
The opposite can be required by using:
    Requirements = (!RegExp("cern.ch", other.GlueCEUniqueID));

If the job duration is significant, it is strongly advised to put a requirement on the maximum CPU time, or the wallclock time (both expressed in minutes), needed for the job to complete.
– For example, to express the fact that the job needs at least 8 CPU hours (480 minutes) and 12 wallclock hours (720 minutes):
    Requirements = other.GlueCEPolicyMaxCPUTime > 480 && other.GlueCEPolicyMaxWallClockTime > 720;
It is possible to have the WMS automatically resubmit jobs which, for some reason, are aborted by the Grid. The user can limit the number of times the WMS should resubmit a job by using the JDL attribute RetryCount:
    RetryCount = 7;
or
    RetryCount = 0;

The proxy renewal feature of the WMS is automatically enabled, as long as the user has stored a long-term proxy in the default MyProxy server (usually defined in the MYPROXY_SERVER environment variable).
However, it is possible to indicate a different MyProxy server to the WMS in the JDL file:
    MyProxyServer = "grid001.ct.infn.it";

The choice of the CE where the job is executed, among all the ones satisfying the requirements, is based on the rank of the CE, a quantity expressed as a floating-point number. The CE with the highest rank is the one selected.
– By default, the rank is equal to -other.GlueCEStateEstimatedResponseTime, where the estimated response time is an estimate of the time interval between the job submission and the beginning of the job execution.
– The rank can be redefined with the Rank attribute, for example:
    Rank = other.GlueCEStateFreeCPUs;
which will rank best the CE with the most free CPUs.
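The interplay of Requirements and Rank can be pictured as "filter, then take the maximum". The Python sketch below is only an illustration of that idea and not the WMS matchmaker: the function `select_ce`, the host names under example.org, and all attribute values are hypothetical.

```python
def select_ce(ces, requirements, rank):
    """Keep the CEs for which requirements(ce) is true, then return the highest-ranked one."""
    candidates = [ce for ce in ces if requirements(ce)]
    if not candidates:
        return None  # nothing matches the job requirements
    return max(candidates, key=rank)


# Hypothetical snapshot of CE attributes (names follow the Glue attributes used in the slides).
ces = [
    {"GlueCEUniqueID": "ce-a.example.org:2119/jobmanager-pbs-short",
     "GlueCEInfoLRMSType": "PBS", "GlueCEInfoTotalCPUs": 8, "GlueCEStateFreeCPUs": 2},
    {"GlueCEUniqueID": "ce-b.example.org:2119/jobmanager-pbs-long",
     "GlueCEInfoLRMSType": "PBS", "GlueCEInfoTotalCPUs": 64, "GlueCEStateFreeCPUs": 40},
    {"GlueCEUniqueID": "ce-c.example.org:2119/jobmanager-lsf-atlas",
     "GlueCEInfoLRMSType": "LSF", "GlueCEInfoTotalCPUs": 128, "GlueCEStateFreeCPUs": 90},
]


# Requirements = other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 1;
def requirements(other):
    return other["GlueCEInfoLRMSType"] == "PBS" and other["GlueCEInfoTotalCPUs"] > 1


# Rank = other.GlueCEStateFreeCPUs;
def rank(other):
    return other["GlueCEStateFreeCPUs"]


best = select_ce(ces, requirements, rank)
print(best["GlueCEUniqueID"] if best else "no matching CE")
```

Note that the LSF site is excluded by the requirements even though it has the most free CPUs; ranking only orders the CEs that already match.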

The WMProxy
The WMProxy is the service responsible for providing access to the WMS functionality through a Web Service interface.
The gLite WMProxy server can be accessed directly through the published WSDL, through the C++ command line interface, or through the API.
It has been designed to efficiently handle a large number of requests for job submission and control to the WMS.
– It provides additional features such as bulk submission and support for shared and compressed sandboxes for compound jobs.
– It is the natural replacement of the Network Server (NS) in the move to the SOA approach.

gLite WMS Architecture
[Figure: gLite WMS architecture diagram, repeated over several slides with one callout per slide]
– Job management requests (submission, cancellation) are expressed via a Job Description Language (JDL).
– An appropriate CE is found for each submission request, taking into account job requests and preferences, Grid status, and utilization policies on resources.
– Submission requests are kept for a while if no resources are immediately available.
– A repository of resource information is available to the matchmaker; it is updated via notifications and/or active polling of resources.
– The actual job submission and monitoring are performed.
– The LB stores events generated by the various components of the WMS; by querying the LB, the user can retrieve information about the status of the job.

The Information Supermarket
The ISM represents one of the most notable improvements in the WM.
The ISM basically consists of a repository of resource information that is available in read-only mode to the matchmaking engine.
– The update is the result of:
    the arrival of notifications
    active polling of resources
    some arbitrary combination of both

The Task Queue
The Task Queue represents the second most notable improvement in the WM internal design.
– It makes it possible to keep a submission request for a while if no resources that match the job requirements are immediately available (a technique used by the AliEn and Condor systems).
Non-matching requests will be retried:
– either periodically (eager scheduling approach)
– or as soon as notifications of available resources appear in the ISM (lazy scheduling approach)

Job Submission Services
WMS components responsible for handling the job during its lifetime and performing the submission.
Job Adapter (JA) is responsible for:
– making the final touches to the JDL expression for a job, before it is passed to CondorC for the actual submission
– creating the job wrapper script that creates the appropriate execution environment on the CE worker node (transfer of the input and of the output sandboxes)
CondorC is responsible for:
– performing the actual job management operations (job submission, job removal)
DAGMan is a meta-scheduler whose purpose is to:
– navigate the graph
– determine which nodes are free of dependencies
– follow the execution of the corresponding jobs

Job Submission Services
Log Monitor (LM) is responsible for:
– watching the CondorC log file
– intercepting interesting events concerning active jobs
Proxy Renewal Service is responsible for ensuring that:
– a valid user proxy exists within the WMS for the whole lifetime of a job
– the MyProxy server is contacted in order to renew the user's credential
Logging & Bookkeeping (LB) is responsible for:
– storing events generated by the various components of the WMS
– answering queries: by querying the LB, the user can retrieve information about the status of the job

Job State Machine
[Figure: job state diagram, shown over nine slides with one state highlighted per slide]
– Submitted: the job has been entered by the user via the User Interface.
– Waiting: the job has been accepted and is waiting for Workload Manager processing.
– Ready: the job has been processed by the WM but not yet transferred to the CE (local batch system queue).
– Scheduled: the job is waiting in the queue on the CE.
– Running: the job is running.
– Done: the job has exited or is considered to be in a terminal state by CondorC (e.g., submission to the CE has failed in an unrecoverable way).
– Aborted: job processing was aborted by the WMS (waiting in the WM queue or CE for too long, expiration of user credentials).
– Cancelled: the job has been successfully cancelled on user request.
– Cleared: the output sandbox was transferred to the user or removed due to the timeout.
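When scripting around these states, it helps to name them once and decide in one place when to stop polling. The Python sketch below is an assumption-laden illustration: the enum and the helper `should_keep_polling` are hypothetical, and treating Done, Aborted, Cancelled and Cleared as "the job will no longer run" is a simplification drawn from the state descriptions, not from the middleware.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "Submitted"
    WAITING = "Waiting"
    READY = "Ready"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    DONE = "Done"
    ABORTED = "Aborted"
    CANCELLED = "Cancelled"
    CLEARED = "Cleared"


# Assumption: in these states the job is no longer going to execute.
NO_LONGER_RUNNING = frozenset(
    {JobState.DONE, JobState.ABORTED, JobState.CANCELLED, JobState.CLEARED}
)


def should_keep_polling(status_text):
    """Interpret a status string such as 'Done (Success)' or 'Scheduled'."""
    name = status_text.split("(")[0].strip()  # drop a trailing qualifier like '(Success)'
    return JobState(name) not in NO_LONGER_RUNNING


for status in ("Waiting", "Running", "Done (Success)"):
    print(status, "-> keep polling" if should_keep_polling(status) else "-> stop")
```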

A useful reminder

The Command Line Interface
The gLite WMS implements two different services to manage jobs: the Network Server and the WMProxy.
– The recommended method to manage jobs is through the gLite WMS via WMProxy, because it gives the best performance and allows the use of the most advanced functionality.
The WMProxy implements several functionalities, among which:
– submission of job collections;
– faster authentication;
– faster match-making;
– faster response time for users;
– higher job throughput.

Delegating a proxy to WMProxy
Each job submitted to WMProxy must be associated with a proxy credential previously delegated by the owner of the job to the WMProxy server.
– This proxy is then used any time WMProxy needs to interact with other services for job-related operations (e.g. submission to the CE, a GridFTP file transfer, etc.).
– There are two possible mechanisms to ask for a delegation of the user credentials:
    asking for the "automatic" delegation of the credentials during the submission operation
    asking for an "explicit" delegation

To explicitly delegate a user proxy to WMProxy, the command to use is:
    glite-wms-job-delegate-proxy -d <delegID>
where <delegID> is a string chosen by the user. For example, to delegate a proxy:
    $ glite-wms-job-delegate-proxy -d mydelegID
    Connecting to the service
    ======= glite-wms-job-delegate-proxy Success ========
    Your proxy has been successfully delegated to the WMProxy:
    with the delegation identifier: mydelegID
    ====================================================

Submitting a simple job
Starting from a simple JDL file, we can submit it via WMProxy with:
    $ glite-wms-job-submit -d mydelegID test.jdl
    Connecting to the service
    ======== glite-wms-job-submit Success ========
    The job has been successfully submitted to the WMProxy
    Your job identifier is:
    ==============================================
For automatic delegation, use:
    glite-wms-job-submit -a test.jdl

The command returns to the user the job identifier (jobID), which uniquely defines the job and can be used to perform further operations on the job, like interrogating the system about its status, or cancelling it. The format of the jobID is:
    https://<LB_hostname>[:<port>]/<unique_string>
where <unique_string> is guaranteed to be unique and <LB_hostname> is the host name of the Logging and Bookkeeping (LB) server for the job, which usually sits on the WMS used to submit the job.
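Since the jobID names the LB server, a script can extract the host from it. The Python sketch below assumes the jobID is an https URL of the form host, optional port, then the unique string; the function `parse_job_id` and the sample identifier are hypothetical.

```python
from urllib.parse import urlsplit


def parse_job_id(job_id):
    """Split a jobID into (LB host name, port or None, unique string)."""
    parts = urlsplit(job_id)
    unique = parts.path.lstrip("/")
    if parts.scheme != "https" or not parts.hostname or not unique:
        raise ValueError(f"not a well-formed jobID: {job_id!r}")
    return parts.hostname, parts.port, unique


# Hypothetical identifier; real ones are returned by glite-wms-job-submit.
host, port, unique = parse_job_id("https://lb.example.org:9000/AbC123_xyz-0")
print("LB server:", host, "port:", port, "unique string:", unique)
```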

Troubleshooting
To submit jobs via WMProxy, it is required to have a valid VOMS proxy; otherwise the submission will fail with an error like the following:
    Error - Operation failed
    Unable to delegate the credential to the endpoint:
    User not authorized: unable to check credential permission (/opt/glite/etc/glite_wms_wmproxy.gacl) (credential entry not found)
    credential type: person
    input dn: /C=CH/O=CERN/OU=GRID/CN=John Doe

Authorization
The client must be properly authorized when it interacts with the WMProxy service. This means that either the FQAN or the DN (in the case of Globus-style proxies) of the client must be properly listed and authorized in the glite_wms_wmproxy.gacl file on the WMProxy machine.
    etc]# cat glite_wms_wmproxy.gacl
    bio/Role=NULL

Troubleshooting
If the command returns the following error:
    Error - WMProxy Server Error
    LCMAPS failed to map user credential
    Method: getFreeQuota
    Error code: 1208
it means that there are authentication problems between the UI and the WMProxy server (you may not be authorized to use that WMProxy server).

Options
The -o option allows users to specify a file to which the jobID of the submitted job will be appended. This file can be given to other job management commands to perform operations on more than one job with a single command, and it is a convenient way to keep track of one's jobs.
The -r option is used to send a job directly to a particular CE. If it is used, the matchmaking will not be carried out.
– The drawback is that the BrokerInfo file, which provides information about the evolution of the job, will not be created; the use of this option is therefore discouraged.

A CE is identified by <CEId>, which is a string with one of the following formats:
    <CE_hostname>:<port>/jobmanager-<LRMS_type>-<queue_name>
    <CE_hostname>:<port>/blah-<LRMS_type>-<queue_name>
where <CE_hostname> and <port> are the host name of the machine and the port where the Grid Gate is running (the Globus Gatekeeper for the LCG CE and CondorC+BLAH for the gLite CE), <queue_name> is the name of one of the corresponding LRMS queues, and <LRMS_type> is the LRMS type, such as lsf, pbs, condor. E.g.:
    adc0015.cern.ch:2119/jobmanager-lcgpbs-infinite
    prep-ce-01.pd.infn.it:2119/blah-lsf-atlas
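The CE identifier can be taken apart with a regular expression. This Python sketch assumes the layout host, port, then "jobmanager" or "blah", then LRMS type and queue name separated by hyphens, and that the LRMS type itself contains no hyphen; the function `parse_ce_id` and the field names are hypothetical.

```python
import re

# host:port/(jobmanager|blah)-<LRMS type>-<queue name>
CE_ID = re.compile(
    r"^(?P<host>[^:/]+):(?P<port>\d+)/"
    r"(?P<gate>jobmanager|blah)-(?P<lrms>[^-]+)-(?P<queue>.+)$"
)


def parse_ce_id(ce_id):
    """Return the components of a CE identifier as a dictionary."""
    match = CE_ID.match(ce_id)
    if match is None:
        raise ValueError(f"not a recognised CE identifier: {ce_id!r}")
    fields = match.groupdict()
    fields["port"] = int(fields["port"])
    return fields


# The two identifiers quoted in the slides.
for ce in ("adc0015.cern.ch:2119/jobmanager-lcgpbs-infinite",
           "prep-ce-01.pd.infn.it:2119/blah-lsf-atlas"):
    print(parse_ce_id(ce))
```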

Listing the CE(s) matching a job
It is possible to see which CEs are eligible to run a job described by a given JDL using:
    $ glite-wms-job-list-match -d mydelegID --rank test.jdl
    Connecting to the service
    ====================================================
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    *CEId* *Rank*
    - CE.pakgrid.org.pk:2119/jobmanager-lcgpbs-cms 0
    - grid-ce0.desy.de:2119/jobmanager-lcgpbs-cms
    gw-2.ccc.ucl.ac.uk:2119/jobmanager-sge-default
    grid-ce2.desy.de:2119/jobmanager-lcgpbs-cms -107
    ====================================================

Retrieving the status of a job
    $ glite-wms-job-status <jobID>
    ************************************************************
    BOOKKEEPING INFORMATION:
    Status info for the Job :
    Current Status: Done (Success)
    Exit code: 0
    Status Reason: Job terminated successfully
    Destination: ce1.inrne.bas.bg:2119/jobmanager-lcgpbs-cms
    Submitted: Mon Dec 4 15:05: CET
    ************************************************************
The verbosity level controls the amount of information provided. The value of the -v option ranges from 0 to 3.
The commands to get the job status can take several jobIDs as arguments, i.e.:
    glite-wms-job-status <jobID1> ... <jobIDN>
or, more conveniently, the -i option can be used to specify a file containing a list of jobIDs.

The --noint option suppresses the interactivity and all the jobs are considered.
If the --all option is used instead, the status of all the jobs owned by the user submitting the command is retrieved.
The --from / --to [MM:DD:]hh:mm[:[CC]YY] options make the command query the LB for jobs that were submitted after/before the specified date and time.
The --status option makes the command retrieve only the jobs that are in the specified status.
The --exclude option makes it retrieve the jobs that are not in the specified status.
With the -o option, the command output can be written to a file.

Cancelling a job
    $ glite-wms-job-cancel <jobID>
    Are you sure you want to remove specified job(s) [y/n]y : y
    Connecting to the service
    ========== glite-wms-job-cancel Success ============
    The cancellation request has been successfully submitted for the following job(s):
    -
    ====================================================
If the cancellation is successful, the job will terminate in status CANCELLED.

Retrieving the output(s)
    $ glite-wms-job-output <jobID>
    Connecting to the service
    ====================================================
    JOB GET OUTPUT OUTCOME
    Output sandbox files for the job:
    have been successfully retrieved and stored in the directory:
    /tmp/doe_yabp72aERhofLA6W2-LrJw
    ====================================================
The default location for storing the outputs (normally /tmp) is defined in the UI configuration, but it is possible to specify the directory in which to save the output using the --dir option.

Retrieving Logging Information
    $ glite-wms-job-logging-info <jobID>
    *********************************************************
    LOGGING INFORMATION:
    Printing info for the Job :
    ---
    Event: RegJob
    - source = NetworkServer
    - timestamp = Thu Dec 14 14:35: CET
    ---
    Event: RegJob
    - source = NetworkServer
    - timestamp = Thu Dec 14 14:35: CET
    ---
    Event: UserTag
    - source = NetworkServer
    - timestamp = Thu Dec 14 14:35: CET
    [..]

'Scattered' Input Sandboxes
A new feature introduced by the gLite WMS is the possibility to indicate input sandbox files stored not on the UI but on a GridFTP server and, similarly, to specify that output files should be transferred to a GridFTP server when the job finishes.
    InputSandbox = {"gsiftp://lxb0707.cern.ch/cms/fileA", "fileB"};
It is also possible to specify a base GridFTP URI with the attribute InputSandboxBaseURI.
– Files expressed as simple file names or as relative paths will be looked for under that base URI.
    InputSandbox = {"fileA", "data/fileB", "file:///home/doe/fileC"};
    InputSandboxBaseURI = "gsiftp://lxb0707.cern.ch/cms/doe";
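The effect of InputSandboxBaseURI on the slide's example can be mimicked in Python. This is a sketch of the lookup rule as described (simple names and relative paths go under the base URI, entries with their own scheme are kept); the function `resolve_input_sandbox` is hypothetical, and leaving absolute local paths untouched is an assumption.

```python
def resolve_input_sandbox(files, base_uri):
    """Return the location each InputSandbox entry would be looked for."""
    base = base_uri.rstrip("/")
    resolved = []
    for entry in files:
        if "://" in entry or entry.startswith("/"):
            # Explicit URI, or (assumption) an absolute local path: leave unchanged.
            resolved.append(entry)
        else:
            # Simple file name or relative path: looked for under the base URI.
            resolved.append(f"{base}/{entry}")
    return resolved


# Values taken from the JDL fragment above.
sandbox = ["fileA", "data/fileB", "file:///home/doe/fileC"]
for location in resolve_input_sandbox(sandbox, "gsiftp://lxb0707.cern.ch/cms/doe"):
    print(location)
```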

Storing output files in a GridFTP Server
In order to store the output sandbox files on a GridFTP server, the OutputSandboxDestURI attribute must be used together with the usual OutputSandbox attribute.
– The latter is used to list the output files created by the job on the WN that are to be transferred.
– The former is used to express where the output files are to be transferred.
    OutputSandbox = {"fileA", "data/fileB", "fileC"};
    OutputSandboxDestURI = {"gsiftp://lxb0707.cern.ch/cms/doe/fileA", "gsiftp://lxb0707.cern.ch/cms/doe/fileB","fileC"};
– Here the first two files are copied to a GridFTP server, while the third file is copied back to the WMS with the usual mechanism. Consequently, glite-wms-job-output will retrieve only the third file.

Another possibility is to use the OutputSandboxBaseDestURI attribute to specify a base URI on a GridFTP server where the files listed in OutputSandbox will be copied.
    OutputSandbox = {"fileA", "fileB"};
    OutputSandboxBaseDestURI = "gsiftp://lxb0707.cern.ch/cms/doe/";
will copy both files under the specified GridFTP URI.
Note: the directory on the GridFTP server where the files have to be copied must already exist.

'Compressed' Sandboxes
A compressed archive is created from the input sandbox files using the libtar and zlib libraries.
– This is done automatically by the WMProxy client commands.
– This mechanism can be enabled/disabled by the user through the JDL (AllowZippedISB attribute).
The archive is transferred (instead of single files) to the WMS.
– Besides the gain brought by compression, this saves the overhead of several calls to globus-url-copy.
The WMProxy service untars the files in the job directories when the job is 'started' and removes the archive.

Real Time Output Retrieval

The user can enable job perusal by setting the attribute PerusalFileEnable to true in the job JDL.
– This makes the WN upload, at regular time intervals (defined by the PerusalTimeInterval attribute and expressed in seconds), a copy of the output files specified using the glite-wms-job-perusal command to the WMS machine (by default), or to a GridFTP server specified by the attribute PerusalFilesDestURI.

    Executable = "job.sh";
    StdOutput = "stdout.log";
    StdError = "stderr.log";
    InputSandbox = {"job.sh"};
    OutputSandbox = {"stdout.log", "stderr.log", "testfile.txt"};
    PerusalFileEnable = true;
    PerusalTimeInterval = 30;
    RetryCount = 0;

After the job has been submitted with glite-wms-job-submit, the user can choose which output files should be inspected:

    $ glite-wms-job-perusal --set -f stdout.log -f stderr.log -f testfile.txt \

    Connecting to the service
    Connecting to the service
    ============ glite-wms-job-perusal Success ================
    Files perusal has been successfully enabled for the job:
    ===========================================================

and, when the job starts, the user can see one output file:

    $ glite-wms-job-perusal --get -f testfile.txt \

    Connecting to the service
    Connecting to the service
    =========== glite-wms-job-perusal Success ==============
    The retrieved files have been successfully stored in:
    /tmp/doe_OoDVmWCAnhx_HiSPvASGsg
    ========================================================

Special Jobs

– DAG jobs
– Job collections
– Parametric jobs

DAG jobs

A DAG job is a set of jobs where the input, output, or execution of one or more jobs can depend on other jobs.

Dependencies are represented through Directed Acyclic Graphs:
– the nodes (vertices) of the graph are the jobs
– the edges (arcs) identify the dependencies

Their management has been improved with:
– shared sandboxes
– attribute inheritance
– attribute references between nodes and with the ‘parent’

[Figure: example DAG with nodes nodeA, nodeB, nodeC, nodeF and nodeD]

DAG -- JDL structure

The JDL description of a DAG must not contain any OutputSandbox attribute at the DAG level; only the descriptions of the nodes may specify one.

The OutputSandbox of the DAG has to be considered as the sum of the output sandboxes of all its nodes.

Attribute: Nodes

Attribute: File

The File attribute is a string representing the path on the local file system to a file containing the JDL description of a job.

Note that this kind of representation can only be used when submitting to the WMS through a client (glite-wms-job-submit) able to resolve the path locally and to expand the JDL with the full description before passing it to the WMS.

The File attribute cannot be specified together with the Description attribute within the same node description!

Attribute: Dependencies

DAG jdl

    [
      type = "dag";
      max_nodes_running = 4;
      nodes = [
        nodeA = [ file = "nodes/nodeA.jdl"; ];
        nodeB = [ file = "nodes/nodeB.jdl"; ];
        nodeC = [ file = "nodes/nodeC.jdl"; ];
        nodeD = [ file = "nodes/nodeD.jdl"; ];
        dependencies = {
          {nodeA, nodeB},
          {nodeA, nodeC},
          { {nodeB, nodeC}, nodeD }
        }
      ];
    ]
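To see what these dependencies imply for execution order, the graph can be fed to a topological sort. The sketch below uses Python's standard graphlib module (Python 3.9 or later); it only illustrates the ordering constraint and makes no claim about how the WMS schedules nodes internally. The execution_waves helper is hypothetical, and the max_nodes_running limit is not modelled.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (its parents),
# transcribed from the dependencies attribute of the JDL:
# {nodeA, nodeB}, {nodeA, nodeC}, { {nodeB, nodeC}, nodeD }
predecessors = {
    "nodeA": set(),
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
}


def execution_waves(graph):
    """Group nodes into waves; a wave can start once the previous one is done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(predecessors), start=1):
        print(number, wave)
```

For this graph the waves are nodeA first, then nodeB and nodeC (which do not depend on each other), and finally nodeD.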

Job Collection

A job collection is a set of independent jobs that the user wants to submit and monitor as a single request.
– The jobs of a collection are submitted as DAG nodes without dependencies.
– The JDL is a list of classads, which describe the sub-jobs.

    [
      Type = "collection";
      VirtualOrganisation = "gilda";
      nodes = { [ ], … };
    ]

Job collection example

    [
      type = "collection";
      InputSandbox = {"date.sh"};
      RetryCount = 3;
      nodes = {
        [ file = "jobs/job1.jdl"; ],
        [
          Executable = "/bin/sh";
          Arguments = "date.sh";
          StdOutput = "date.out";
          StdError = "date.err";
          OutputSandbox = {"date.out", "date.err"};
        ],
        [ file = "jobs/job3.jdl"; ]
      };
    ]

All nodes will share the InputSandbox specified at the collection level.

Parametric Job

A parametric job is a job where one or more of its attributes are parameterized.
– The values of the attributes vary according to a parameter.
– Job monitoring and management is always done through a unique jobID, as if the job were a single job (see submission of a collection).

    [
      JobType = "Parametric";
      Executable = "/bin/sh";
      Arguments = "md5.sh input_PARAM_.txt";
      InputSandbox = {"md5.sh", "input_PARAM_.txt"};
      StdOutput = "out_PARAM_.txt";
      StdError = "err_PARAM_.txt";
      Parameters = 4;
      ParameterStart = 1;
      ParameterStep = 1;
      OutputSandbox = {"out_PARAM_.txt", "err_PARAM_.txt"};
    ]

Parametric Job

– The parameter can be either a number or a list of items (typically strings, but not enclosed within double quotes).
– The input sandbox (if present) has to be coherent with the parameters.

    [ui-test] /home/giorgio/param > cat param2.jdl
    [
      JobType = "Parametric";
      Executable = "/bin/cat";
      Arguments = "input_PARAM_.txt";
      InputSandbox = "input_PARAM_.txt";
      StdOutput = "myoutput_PARAM_.txt";
      StdError = "myerror_PARAM_.txt";
      Parameters = {EARTH,MOON,MARS};
      OutputSandbox = {"myoutput_PARAM_.txt"};
    ]
    [ui-test] /home/giorgio/param > ls
    inputEARTH.txt  inputMARS.txt  inputMOON.txt  param2.jdl

Parameters is the list of the values the parameter must take.

Parametric Job: example

    [
      JobType = "Parametric";
      Executable = "/bin/cat";
      Arguments = "input_PARAM_.txt";
      InputSandbox = "input_PARAM_.txt";
      StdOutput = "myoutput_PARAM_.txt";
      StdError = "myerror_PARAM_.txt";
      Parameters = {EARTH,MOON,MARS};
      OutputSandbox = {"myoutput_PARAM_.txt"};
    ]

This JDL corresponds to three sub-jobs, one per parameter value:

    Executable = "/bin/cat";
    Arguments = "inputEARTH.txt";
    InputSandbox = "inputEARTH.txt";
    StdOutput = "myoutputEARTH.txt";
    StdError = "myerrorEARTH.txt";
    OutputSandbox = {"myoutputEARTH.txt"};

    Executable = "/bin/cat";
    Arguments = "inputMOON.txt";
    InputSandbox = "inputMOON.txt";
    StdOutput = "myoutputMOON.txt";
    StdError = "myerrorMOON.txt";
    OutputSandbox = {"myoutputMOON.txt"};

    Executable = "/bin/cat";
    Arguments = "inputMARS.txt";
    InputSandbox = "inputMARS.txt";
    StdOutput = "myoutputMARS.txt";
    StdError = "myerrorMARS.txt";
    OutputSandbox = {"myoutputMARS.txt"};
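The substitution shown in this example amounts to replacing every occurrence of _PARAM_ with each value in turn. A minimal Python sketch of that idea follows; expand_parametric is a hypothetical helper written for illustration, the JDL is represented as a plain dictionary, and the real expansion is performed by the WMS, not by user code.

```python
def expand_parametric(template, values, placeholder="_PARAM_"):
    """Return one attribute dictionary per parameter value."""

    def substitute(item, value):
        # Replace the placeholder in strings and, recursively, in lists.
        if isinstance(item, str):
            return item.replace(placeholder, value)
        if isinstance(item, list):
            return [substitute(element, value) for element in item]
        return item

    return [
        {name: substitute(attr, str(value)) for name, attr in template.items()}
        for value in values
    ]


# Attributes of the parametric JDL, transcribed as a Python dictionary.
template = {
    "Executable": "/bin/cat",
    "Arguments": "input_PARAM_.txt",
    "InputSandbox": "input_PARAM_.txt",
    "StdOutput": "myoutput_PARAM_.txt",
    "StdError": "myerror_PARAM_.txt",
    "OutputSandbox": ["myoutput_PARAM_.txt"],
}

if __name__ == "__main__":
    for job in expand_parametric(template, ["EARTH", "MOON", "MARS"]):
        print(job["Arguments"], job["StdOutput"], job["StdError"])
```

This also makes the coherence requirement concrete: each generated InputSandbox name (inputEARTH.txt, inputMOON.txt, inputMARS.txt) must exist on the UI.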

Parametric job

    [
      JobType = "Parametric";
      Executable = "myjob.exe";
      StdInput = "input_PARAM_.txt";
      StdOutput = "output_PARAM_.txt";
      StdError = "error_PARAM_.txt";
      Parameters = 100;
      ParameterStart = 1;
      ParameterStep = 1;
      InputSandbox = {"myjob.exe", "input_PARAM_.txt"};
      OutputSandbox = {"output_PARAM_.txt", "error_PARAM_.txt"};
    ]

– Parameters: here the parameter is a number.
– ParameterStart: the initial value of the running parameter.
– ParameterStep: the increment of the running parameter between consecutive jobs.

Both attributes, ParameterStart and ParameterStep, can be set only if Parameters is a number.

Shared Sandboxes for sub-jobs of compound jobs

The JDL has been extended to allow the input sandbox to be specified at the level of the compound request (i.e. DAGs, Collections and Parametric jobs).

This input sandbox is transferred only once by the new WMS client commands, but can be accessed by all sub-jobs of the compound job.

Each sub-job can refer to a single file of the “shared sandbox”:

    InputSandbox = root.InputSandbox[0];

or to the sandboxes of other sub-jobs:

    InputSandbox = root.nodes.nodeA.description.OutputSandbox[2];

DAG JDL with shared sandbox: example

    [
      Type = "dag";
      max_nodes_running = 5;
      InputSandbox = {
        "/tmp/foo/*.exe",
        "/home/larocca/bar",
        "gsiftp://neo.datamat.it:5678/tmp/cms_sim.exe",
        "file:///tmp/myconf"
      };
      InputSandboxBaseURI = "gsiftp://matrix.datamat.it:5432/tmp";
      nodes = [
        nodeA = [
          description = [
            JobType = "Normal";
            Executable = "a.exe";
            InputSandbox = { "/home/larocca/myfile.txt", root.InputSandbox };
          ];
        ];
        nodeF = [
          description = [
            JobType = "Normal";
            Executable = "b.exe";
            Arguments = "1 2 3";
            OutputSandbox = { "myoutput.txt", "myerror.txt" };
          ];
        ];
        nodeD = [
          description = [
            JobType = "Checkpointable";
            Executable = "b.exe";
            Arguments = "1 2 3";
            InputSandbox = { "file:///home/larocca/data.txt", root.nodes.nodeF.description.OutputSandbox[0] };
          ];
        ];
        nodeC = [ file = "/home/larocca/nodec.jdl"; ];
        nodeB = [ file = "foo.jdl"; ];
      ];
      dependencies = {
        { nodeA, nodeB },
        { nodeA, nodeC },
        { nodeA, nodeF },
        { { nodeB, nodeC, nodeF }, nodeD }
      };
    ]

[Figure: DAG with nodeA as parent of nodeB, nodeC and nodeF, which are in turn parents of nodeD]



Hands-on

    ssh
    OS passwd  : GridDUBXX
    PassPhrase : DUBLIN
    where XX = 01,..,25

https://grid.ct.infn.it/twiki/bin/view/GILDA/SimpleJobSubmission