EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Job Submission Fokke Dijkstra RuG/SARA Grid.


EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Job Submission Fokke Dijkstra RuG/SARA Grid tutorial Groningen September 2006

Contents
The LCG Workload Management System (WMS) in gLite
Job Submission to EGEE / NL-Grid
  – Job Preparation
  – A simple example & Job Lifecycle
  – Job Description Language (JDL)
  – Job Submission & Monitoring
  – Some more advanced topics

WMS?

The LCG WMS
The user submits jobs via the Workload Management System (WMS).
The goal of the WMS is distributed scheduling and resource management in a Grid environment.
What does it allow Grid users to do?
  – Submit their jobs
  – Execute them
  – Get information about their status
  – Retrieve their output
The WMS tries to
  – Optimize the usage of resources
  – Execute user jobs as fast as possible

WMS components (diagram)
User Interface (UI), JDL, Resource Broker (RB), Job Submission Service (JSS), Logging & Bookkeeping (LB), Information System (BDII), Computing Element (CE), Storage Element (SE), LCG File Catalog (LFC)

Job Preparation
You need to provide
  – A complete (enough) job description
      • What program?
      • What data?
      • Any requirements on OS, installed software, etc.?
  – Possibly a program
      • You're submitting into unknown territory!
      • Program portably!
      • Don't rely on hard-coded paths or special locations
      • The program you send may not even be in $HOME!
  – Perhaps some input data
  – Perhaps instructions on what to do with the output

How to Write a Job Description
Here is a minimal job description (call it hello.jdl):

    Executable = "/bin/echo";
    Arguments = "Goedemiddag";
    StdError = "stderr.log";
    StdOutput = "stdout.log";
    OutputSandbox = {"stderr.log", "stdout.log"};

In it we
  – Specified the program to run and its arguments
  – Directed the standard error and output streams to files
  – Told the system what to do with the output (return both log files in the OutputSandbox)
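
A job description like the one above can also be generated from a script. The following is a minimal sketch, assuming only string values and lists of strings are needed; `render_jdl` is a hypothetical helper written for this illustration and is not part of the gLite or EDG tools.

```python
def render_jdl(attrs):
    """Render a dict as flat JDL text (illustrative; handles strings and lists of strings only)."""
    lines = []
    for name, value in attrs.items():
        if isinstance(value, (list, tuple)):
            # Lists are written as {"a", "b"}
            rendered = "{" + ", ".join(f'"{item}"' for item in value) + "}"
        else:
            rendered = f'"{value}"'
        # Every attribute ends with a semicolon
        lines.append(f"{name} = {rendered};")
    return "\n".join(lines) + "\n"


hello = {
    "Executable": "/bin/echo",
    "Arguments": "Goedemiddag",
    "StdError": "stderr.log",
    "StdOutput": "stdout.log",
    "OutputSandbox": ["stderr.log", "stdout.log"],
}

print(render_jdl(hello), end="")
```

Writing the result to hello.jdl would give a file equivalent to the hand-written one; expressions such as Requirements, which must not be quoted, are deliberately outside the scope of this sketch.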

Job Submission Example
The user issues voms-proxy-init
  – Enters the certificate's password
  – Receives a valid Globus proxy
The user issues edg-job-submit mytest.jdl and gets back from the system a unique Job Identifier (JobId).
The user issues edg-job-status JobId to get logging information about the current status of the job.
When the "OutputReady" status is reached, the user can issue edg-job-get-output JobId, and the system returns the name of the temporary directory on the UI machine where the job output can be found.

Submitting it

    $ voms-proxy-init --voms tutor
    Your identity: /O=edgtutorial/O=users/O=rug/OU=rc/CN=Fokke Dijkstra
    Enter GRID pass phrase:
    Creating temporary proxy Done
    Contacting mu4.matrix.sara.nl:30007 [/O=dutchgrid/O=hosts/OU=sara.nl/CN=mu4.matrix.sara.nl] "tutor" Done
    Creating proxy Done
    Your proxy is valid until Mon Sep 11 23:22:

    $ edg-job-submit hello.jdl
    Selected Virtual Organisation name (from UI conf file): tutor
    Connecting to host mu3.matrix.sara.nl, port 7772
    Logging to host mu3.matrix.sara.nl, port 9002
    *******************************************************************************
    JOB SUBMIT OUTCOME
    The job has been successfully submitted to the Network Server.
    Use edg-job-status command to check job current status.
    Your job identifier (edg_jobId) is:
    -
    *******************************************************************************

Callout on the job identifier line: JobId

A Job Submission Example (diagram)
Components: User Interface (UI), JDL, Resource Broker (RB), Job Submission Service (JSS), Logging & Bookkeeping (LB), Information System (IS), Computing Element (CE), Storage Element (SE), LCG File Catalog (LFC)
Shown: Job Submit Event, Input Sandbox
Job Status: submitted, waiting

Checking the status

    $ edg-job-status
    *************************************************************
    BOOKKEEPING INFORMATION:
    Status info for the Job :
    Current Status: Done (Success)
    Exit code: 0
    Status Reason: Job terminated successfully
    Destination: mu6.matrix.sara.nl:2119/jobmanager-pbs-long
    reached on: Tue Jun 1 08:14:
    *************************************************************

A Job Submission Example (diagram, continued)
Components: User Interface (UI), JDL, Resource Broker (RB), Job Submission Service (JSS), Logging & Bookkeeping (LB), Information System (IS), Computing Element (CE), Storage Element (SE), LCG File Catalog (LFC)
Shown: BrokerInfo, Input Sandbox, Output Sandbox
Job Status: submitted, waiting, ready, scheduled, running, done, outputready

Getting the Output

    $ edg-job-get-output
    Retrieving files from host: mu3.matrix.sara.nl ( for )
    *******************************************************************************
    JOB GET OUTPUT OUTCOME
    Output sandbox files for the job:
    -
    have been successfully retrieved and stored in the directory:
    /tmp/jobOutput/fokke_Nz6PWWJCjtT7YY3PJWDu5Q
    *******************************************************************************

    $ cat /tmp/jobOutput/fokke_Nz6PWWJCjtT7YY3PJWDu5Q/std.out
    Goedemiddag

A Job Submission Example (diagram, continued)
Components: User Interface (UI), JDL, Resource Broker (RB), Job Submission Service (JSS), Logging & Bookkeeping (LB), Information System (IS), Computing Element (CE), Storage Element (SE), LCG File Catalog (LFC)
Shown: Output Sandbox
Job Status: submitted, waiting, ready, scheduled, running, done, outputready, cleared
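
The status sequence shown across these diagrams can be written down as a small table. This is only a sketch of the successful path as it appears on the slides; failure statuses are not shown there and are left out, and the function names are made up for this illustration.

```python
# Successful-path job statuses, in the order shown in the diagrams.
LIFECYCLE = [
    "submitted", "waiting", "ready", "scheduled",
    "running", "done", "outputready", "cleared",
]


def next_status(current):
    """Return the status that follows `current`, or None after the last one."""
    index = LIFECYCLE.index(current)  # raises ValueError for an unknown status
    if index + 1 < len(LIFECYCLE):
        return LIFECYCLE[index + 1]
    return None


def can_fetch_output(current):
    """The output sandbox is retrievable once the job is 'outputready'."""
    return current == "outputready"


status = "submitted"
while status is not None:
    print(status, "(output can be fetched)" if can_fetch_output(status) else "")
    status = next_status(status)
```

A polling script would compare the status reported for a job against this list and only attempt to retrieve the output once "outputready" is reached.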

Job Description Language (JDL)
Based upon Condor's CLASSified ADvertisement language (ClassAd)
  – ClassAd is an extensible language
  – A job description is a sequence of attributes (key, value pairs) separated by semicolons

    Executable = "/bin/echo";
    Arguments = "Goedemiddag";
    StdError = "stderr.log";
    StdOutput = "stdout.log";
    OutputSandbox = {"stderr.log", "stdout.log"};
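
To make the "sequence of attributes separated by semicolons" concrete, here is a minimal sketch of reading such a description into a dictionary. It is not a ClassAd parser: `parse_jdl` is a hypothetical helper that assumes flat `Name = value;` attributes and no semicolons inside string values.

```python
import re


def parse_jdl(text):
    """Parse a flat JDL fragment into a dict (illustrative only, not a full ClassAd parser)."""
    attrs = {}
    # Attributes are separated by semicolons; skip empty pieces.
    for statement in text.split(";"):
        statement = statement.strip()
        if not statement:
            continue
        # Split at the first "=" only, so "==" inside an expression is preserved.
        name, _, value = statement.partition("=")
        name, value = name.strip(), value.strip()
        if value.startswith("{") and value.endswith("}"):
            # List value: collect every double-quoted string inside the braces.
            attrs[name] = re.findall(r'"([^"]*)"', value)
        elif value.startswith('"') and value.endswith('"'):
            attrs[name] = value[1:-1]
        else:
            # Unquoted expressions are kept as raw text.
            attrs[name] = value
    return attrs


HELLO_JDL = '''
Executable = "/bin/echo";
Arguments = "Goedemiddag";
StdError = "stderr.log";
StdOutput = "stdout.log";
OutputSandbox = {"stderr.log", "stdout.log"};
'''

job = parse_jdl(HELLO_JDL)
print(job["Executable"], job["OutputSandbox"])
```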

Types of Attributes
The supported attributes are grouped in two categories:
  – Job attributes
      • Define the job itself
  – Resource attributes
      • Taken into account by the RB when carrying out the matchmaking algorithm
      • Computing resource attributes
          Used by the user to build the expressions of the Requirements and/or Rank attributes
          Have to be prefixed with "other."
      • Data and storage resource attributes
          Input data to process, SE where to store output data, protocols spoken by the application when accessing SEs

Job Definition Attributes
Executable (mandatory)
  – The command name
Arguments (optional)
  – Job command line arguments
StdInput, StdOutput, StdError (optional)
  – Standard input/output/error of the job
Environment (optional)
  – List of environment settings
InputSandbox (optional)
  – List of files on the UI local disk needed by the job for running
  – The listed files are staged from the UI to the remote CE
OutputSandbox (optional)
  – List of files, generated by the job, which have to be retrieved

Resource Attributes
Requirements
  – Job requirements on computing resources
  – Specified using attributes of resources published in the Information System
  – If not specified, the default value defined in the UI configuration file is used
      • Default: other.GlueCEStateStatus == "Production" (the CE has to be in Production status)
Rank
  – Expresses preference (how to rank resources that have already met the Requirements expression)
  – Specified using attributes of resources published in the Information System
  – If not specified, the default value defined in the UI configuration file is used
      • Default: other.GlueCEStateFreeCPUs (prefer the CE with the highest number of free CPUs)

"Data" Attributes
InputData (optional)
  – Refers to data used as input by the job (these data are published in the Replica Catalog and stored in the SEs)
  – PFNs and/or LFNs
DataAccessProtocol (mandatory if InputData is specified)
  – The protocol, or list of protocols, that the application can use to access InputData on a given SE
OutputSE (optional)
  – The hostname of the output SE
  – The RB uses it to choose a CE that is compatible with the job and is close to the SE
OutputData (optional)
  – Output data that will be registered at the end of the job

Example JDL File

    Executable = "gridTest";
    StdError = "stderr.log";
    StdOutput = "stdout.log";
    InputSandbox = {"/home/joda/test/gridTest"};
    OutputSandbox = {"stderr.log", "stdout.log"};
    InputData = "lfn:/grid/tutor/testbed ";
    DataAccessProtocol = "gridftp";
    Requirements = other.Architecture=="INTEL" && \
                   other.OpSys=="LINUX" && other.FreeCpus >=4;
    Rank = other.GlueHostBenchmarkSF00;

Job Submission
edg-job-submit [-r ] [-n ] [-c ] [-o ]
  -r    the job is submitted by the RB directly to the computing element identified by the argument
  -c    the given configuration file is used by the UI instead of the standard configuration file
  -o    the generated edg_jobId is written to the given file
        Useful for other commands, e.g. edg-job-status -i (or edg_jobId), where -i displays the status information for the edg_jobIds contained in that file
  --vo  the VO under which the job will be run

Other WMS UI Commands
edg-job-list-match
  – Lists resources matching a job description
  – Performs the matchmaking without submitting the job
edg-job-cancel
  – Cancels a given job
edg-job-status
  – Displays the status of the job
edg-job-get-output
  – Returns the job output (the OutputSandbox files) to the user
edg-job-get-logging-info
  – Displays logging information about submitted jobs (all the events "pushed" by the various components of the WMS)
  – Very useful for debugging

WMS Match Making
The RB is the core component of the WMS. It has to find the most suitable computing resource (CE) on which the job will be executed.
It interacts with the Data Management service and the Information System
  – They supply the RB with all the information required to resolve the matches
The CE chosen by the RB has to match the job requirements (e.g. runtime environment, data access requirements, and so on)
If two or more CEs satisfy all the requirements, the one with the best Rank is chosen

Direct Job Submission
The RB has to deal with three possible scenarios.
Scenario 1: Direct Job Submission
  • The job is scheduled on a given CE (specified in the edg-job-submit command via the -r option)
  • The RB doesn't perform any matchmaking
  • Take care if InputData is specified!

Brokered Job Submission, No InputData
Scenario 2: Job Submission without data-access Requirements
  • Neither the CE nor input data are specified.
  • The RB starts the matchmaking algorithm, which consists of two phases:
      Requirements check (the RB contacts the IS to check which CEs satisfy all the requirements)
      Rank computation (if more than one CE satisfies the job requirements, the CE with the best rank is chosen by the RB)
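
The two phases can be sketched in a few lines. This is an illustration of the idea, not the RB's implementation: the CE records are invented, the host names are placeholders, and the requirements and rank used are the defaults quoted in this tutorial.

```python
def match(ces, requirements, rank):
    """Two-phase matchmaking: filter by requirements, then pick the highest rank."""
    # Phase 1: requirements check
    candidates = [ce for ce in ces if requirements(ce)]
    # Phase 2: rank computation (None if no CE satisfied the requirements)
    return max(candidates, key=rank, default=None)


# Hypothetical CE records, as they might be published in the Information System.
ces = [
    {"id": "ce-a.example.org", "GlueCEStateStatus": "Production", "GlueCEStateFreeCPUs": 4},
    {"id": "ce-b.example.org", "GlueCEStateStatus": "Production", "GlueCEStateFreeCPUs": 12},
    {"id": "ce-c.example.org", "GlueCEStateStatus": "Draining", "GlueCEStateFreeCPUs": 40},
]

best = match(
    ces,
    requirements=lambda ce: ce["GlueCEStateStatus"] == "Production",
    rank=lambda ce: ce["GlueCEStateFreeCPUs"],
)
print(best["id"])  # ce-b.example.org
```

The third CE has the most free CPUs but is never ranked, because it fails the requirements check first.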

Brokered Job Submission, Grid Data
Scenario 3: CE is not specified in the JDL
  • The RB contacts the Data Management service to find out which SEs have copies of the requested input data sets
  • The RB makes a best-effort match between
      Computing resources for which the user is authorized
      SEs "nearby" which can provide the requested data sets via the requested transfer protocol
      Any optional output SE specified in the job description
  • The RB strategy consists of submitting jobs close to the data!
  • The two main phases of the matchmaking algorithm remain unchanged:
      Requirements check
      Rank computation
  • The matchmaking is only performed for CEs satisfying the data-access requirements (i.e. which are close to the data)
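
A rough sketch of the data-aware variant follows. It assumes, for simplicity, that a CE counts as "close to the data" when at least one of its nearby SEs holds a replica of every requested file; the real broker's notion of closeness may be more involved. All names, hosts and records are hypothetical.

```python
def ses_holding(replicas, input_files):
    """Return the SEs that hold a replica of every requested file."""
    holders = [set(replicas.get(name, ())) for name in input_files]
    return set.intersection(*holders) if holders else set()


def match_with_data(ces, replicas, input_files, requirements, rank):
    """Restrict matchmaking to CEs close to the data, then filter and rank as usual."""
    data_ses = ses_holding(replicas, input_files)
    near_data = [ce for ce in ces if data_ses & set(ce["close_ses"])]
    candidates = [ce for ce in near_data if requirements(ce)]
    return max(candidates, key=rank, default=None)


# Hypothetical replica locations and CE records.
replicas = {"lfn:/grid/tutor/data1": {"se-1.example.org", "se-2.example.org"}}
ces = [
    {"id": "ce-a.example.org", "close_ses": ["se-1.example.org"], "free_cpus": 2},
    {"id": "ce-b.example.org", "close_ses": ["se-3.example.org"], "free_cpus": 50},
    {"id": "ce-c.example.org", "close_ses": ["se-2.example.org"], "free_cpus": 8},
]

chosen = match_with_data(
    ces,
    replicas,
    ["lfn:/grid/tutor/data1"],
    requirements=lambda ce: True,
    rank=lambda ce: ce["free_cpus"],
)
print(chosen["id"])  # ce-c.example.org
```

Note that ce-b, although it has by far the most free CPUs, is excluded before ranking because none of its nearby SEs holds the input file.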

Proxy Renewal
Why?
  – To avoid job failure because the job outlived the validity of the initial proxy
The WMS supports an automatic proxy renewal mechanism, as long as the user credentials are handled by a proxy server.
1. Create a proxy using voms-proxy-init
2. Register this proxy with the MyProxy server using
       myproxy-init -s [-t -c ] -d -n
   server is the server address (e.g. px.matrix.sara.nl)
   cred is the number of hours the proxy should be valid on the server
   proxy is the number of hours renewed proxies should be valid
3. Short-term proxies can then be used to start jobs, using the grid-proxy-init -hours command
4. The proxy is automatically renewed by the WMS, without user intervention, for the whole lifetime of the job
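
Whether renewal matters for a given job comes down to simple date arithmetic. The sketch below only illustrates that reasoning; `outlives_proxy` and its parameters are hypothetical and do not correspond to any WMS or MyProxy call, and the times are example values.

```python
from datetime import datetime, timedelta


def outlives_proxy(proxy_expiry, job_start, expected_runtime, queue_wait=timedelta(0)):
    """Return True if the job is expected to finish after the proxy expires."""
    expected_end = job_start + queue_wait + expected_runtime
    return expected_end > proxy_expiry


# Example values: a proxy valid until late evening, a job started at noon.
expiry = datetime(2006, 9, 11, 23, 0)
start = datetime(2006, 9, 11, 12, 0)

print(outlives_proxy(expiry, start, timedelta(hours=2)))   # False
print(outlives_proxy(expiry, start, timedelta(hours=24)))  # True
```

Time spent waiting in a queue counts against the proxy lifetime as well, which is why the sketch takes it as a separate argument.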

MPI jobs
MPI
  – Message passing
  – Link with parallel library
  – Run on multiple processors
gLite
  – Limited support
  – Some sites can run MPI jobs
JobType
  – JobType = "MPICH";
  – NodeNumber = 8;
  – Adds MPICH support as a requirement
  – Executable runs in parallel on 8 CPUs

Other JobTypes
Interactive
  – StdOutput, StdInput and StdError forwarded to user
  – default X window
  – Other tools
Checkpointable
  – Job must save checkpoints
  – Checkpoints can be retrieved
  – Not fully supported yet

Further Information
The gLite User Guide!
ClassAd
Sara Grid pages

UI configuration file
Can be customized if the (expert) user is not happy with the default one
Most relevant attributes:
  – RB(s)
      • When submitting a job, the first specified RB is tried; if the operation fails, the second one is considered, etc.
  – LBserver(s)
      • The LB to be used for a job is chosen by the RB
      • So when an edg-job-status is issued, the LB to contact is specified in the edg_jobId
      • This list specifies the LB(s) that must be contacted when issuing edg-job-status –all / edg-job-get-logging-info –all (to have information for all the jobs belonging to that user)
  – Default JDL Requirements
      • other.GlueCEStateStatus == "Production"
  – Default JDL Rank
      • other.GlueCEStateFreeCPUs;
  – Default Virtual Organisation
      • Which VO the job should use to run
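
The "try the first RB, then the next" behaviour described for the RB list can be sketched as follows. This is an illustration only: `submit_with_failover` and `fake_submit` are hypothetical, the host names are placeholders, and the real UI commands handle this internally.

```python
def submit_with_failover(brokers, submit):
    """Try each resource broker in order; return (broker, job_id) for the first success."""
    failures = []
    for broker in brokers:
        try:
            return broker, submit(broker)
        except ConnectionError as exc:
            # Remember why this broker failed and move on to the next one.
            failures.append(f"{broker}: {exc}")
    raise RuntimeError("no resource broker accepted the job: " + "; ".join(failures))


def fake_submit(broker):
    """Stand-in for a real submission call; the first broker is pretended to be down."""
    if broker == "rb1.example.org:7772":
        raise ConnectionError("not available")
    return "job-0001"


used, job_id = submit_with_failover(
    ["rb1.example.org:7772", "rb2.example.org:7772"], fake_submit
)
print(used, job_id)  # rb2.example.org:7772 job-0001
```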

UI Command Error Messages
The UI commands accept arguments on the command line. If the user makes a mistake there, the following messages can appear:
  – Argument * is not allowed (the argument is not known)
  – Argument * must be specified at the end of the command (both the jobId and the JDL file name must be put at the end of the command line)
  – Argument * is missing for the "--output" option (the user forgot to add the parameter required by the argument)
  – Argument "-all" cannot be specified with argument "--input" (some arguments are mutually exclusive)
  – CEId format is: ; /jobmanager-. The provided CEID: " has a wrong format. (the user has misspelled the CE identifier after --resource)

Resource Broker errors
While the RB API is being called, the following can happen:
  – Resource Broker "grid013g.cnaf.infn.it:7771" not available (can't open a connection with the RB specified in the UI configuration file)
  – Unable to get LB address from RB "grid013g.cnaf.infn.it" (the function get_lb_contact returned an error)

JDL & Proxy Error Messages
While the UI commands are checking the JDL file, the following errors may occur:
  – Mandatory Attribute default error in the configuration file "/opt/edg/etc/UI_ConfigENV.cfg" (there aren't any default values)
  – Mandatory Attribute missing in JDL file "Executable" (Executable is one of the mandatory attributes)
  – Multiple "InputSandbox" attribute found in JDL file (the InputSandbox attribute appears more than once)
  – Wrong function call for list attribute *. Function usage is: "Member/IsMember(List, Value)" (e.g. in the Requirements attribute the function Member/IsMember is used with a wrong syntax)
Proxy (this refers to the grid security proxy and not to a proxy machine)
  – If the user specifies a duration for the proxy to be provided, using the option -h of edg-job-submit, a possible message is: Proxy certificate will expire in less then X hours. Creating a new X-hours-duration certificate (this makes sure that at least the required proxy validity is granted)
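
Two of the JDL checks above (a missing mandatory attribute and a repeated attribute) are easy to reproduce before submitting. The following is a minimal sketch under the assumption that each attribute starts on its own line; `check_jdl` is a hypothetical helper, and its messages are not the ones printed by the UI commands.

```python
import re

MANDATORY = ("Executable",)


def check_jdl(text):
    """Return a list of problems found in a flat JDL fragment (illustrative checks only)."""
    # An attribute name is a word at the start of a line, followed by "=".
    names = re.findall(r"^\s*(\w+)\s*=", text, flags=re.MULTILINE)
    problems = []
    for attr in MANDATORY:
        if attr not in names:
            problems.append(f"mandatory attribute missing: {attr}")
    for attr in sorted(set(names)):
        if names.count(attr) > 1:
            problems.append(f"attribute repeated: {attr}")
    return problems


broken = 'InputSandbox = {"a.dat"};\nInputSandbox = {"b.dat"};\n'
for problem in check_jdl(broken):
    print(problem)
```

Running it on the broken fragment reports both the missing Executable and the duplicated InputSandbox; a well-formed description yields an empty list.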