Job Submission M. Jouvin (LAL-Orsay)

Job Submission
M. Jouvin (LAL-Orsay), jouvin@lal.in2p3.fr
Grid Administration Training, LAL, Orsay, September 15-19, 2008

Job Submission - M. Jouvin
Agenda
- Main components
- Main commands
- Job Description Language (JDL)
- Advanced Features
- Summary
Job Submission - M. Jouvin 13/07/2019

Workload Management System
- Responsible for optimizing resource usage and executing user jobs as quickly as possible
- Uses several components:
  - UI (User Interface): access point for users
  - WMS: grid resource broker, responsible for selecting the most appropriate resources to run user jobs
    - Formerly called RB (Resource Broker)
    - The name is easily confused with the global Workload Management service
  - LB (Logging and Bookkeeping): stores information about all the successive states a job goes through
  - BDII (Information System): real-time view of the available grid resources and their state

Job Submission Workflow
[Figure: job submission workflow. Components shown: User Interface, Resource Broker (WMS), Information System, Replica Catalogs, and a Storage Element and Computing Element at Site 1 and Site 2. Labelled arrows: 0. create proxy, 1. submit, 2. query, 3. query, 4. submit, 5. retrieve, 6. retrieve, plus "publish status".]

User Interface
- UI: host “outside” the grid, installed with the grid client tools and interfaces that let a user interact with the grid
  - Get a proxy
  - Submit and manage jobs
  - Transfer and manage data
- Job submission commands: glite-wms-job-xxx
  - The legacy LCG RB service (now decommissioned) used other commands: edg-job-xxx
  - edg-job-xxx commands are incompatible with WMS, and glite-wms-job-xxx commands are incompatible with LCG RB
  - Options are basically the same, except for new ones
  - glite-job-xxx (intermediate generation) commands are no longer supported

Main Commands
- glite-wms-job-submit [-d|-a] (edg-job-submit)
  - Submits a job
  - Returns the jobID
- glite-wms-job-status (edg-job-status)
  - Returns the job status
- glite-wms-job-output (edg-job-get-output)
  - Retrieves from WMS the files specified in the OutputSandbox attribute of the JDL
- glite-wms-job-cancel (edg-job-cancel)
  - Cancels a job
- glite-wms-job-delegate-proxy -d identifier
  - Creates a delegation proxy to use when submitting a job
- glite-wms-job-list-match (edg-job-list-match)
  - Lists the resources compatible with the requirements in the job description
  - Does the matchmaking without submitting the job
- glite-wms-job-logging-info (edg-job-get-logging-info)
  - Returns information about the successive states of jobs, with detailed information about problems
- C, C++ and Java APIs are available for all these features

Job Submission
- Job submission uses a job description file (JDL)
  - An application is never submitted directly
  - The application may be pre-installed on the grid or transmitted with the job description
- The job submission command returns a job identifier
  - It is important to save it, to be able to retrieve the job status and output
  - Option '-o file' writes the job identifier to a file
  - Use option '-i file' with the same file for the other commands
  - The file may contain a list of jobids (it is appended to, not overwritten)
- Proxy delegation is required to interact with WMS (WMProxy). 2 modes are available:
  - Automatic: option -a at submission time
  - Explicit: glite-wms-job-delegate-proxy -d identifier
    - Use the same '-d identifier' at submission time
    - Much more efficient when submitting a lot of jobs
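The explicit delegation mode described on this slide can be sketched as a small Python helper. This is only an illustration: it builds the command lines and does not run them. The function names, the delegation identifier and the file names are hypothetical; only the two commands and the options -a, -d and -o come from the slides.

```python
from typing import List, Optional


def delegate_proxy_command(delegation_id: str) -> List[str]:
    """Command line that creates a named delegation proxy (explicit mode)."""
    return ["glite-wms-job-delegate-proxy", "-d", delegation_id]


def submit_command(jdl_file: str,
                   delegation_id: Optional[str] = None,
                   jobids_file: Optional[str] = None) -> List[str]:
    """Command line that submits one JDL file.

    With a delegation identifier, reuse the explicit delegation (-d);
    without one, fall back to automatic delegation (-a).
    """
    cmd = ["glite-wms-job-submit"]
    cmd += ["-d", delegation_id] if delegation_id else ["-a"]
    if jobids_file:
        # -o saves the returned job identifier in this file.
        cmd += ["-o", jobids_file]
    cmd.append(jdl_file)
    return cmd


if __name__ == "__main__":
    # Delegate once, then reuse the same identifier for every submission.
    print(" ".join(delegate_proxy_command("my-delegation")))
    for jdl in ["job1.jdl", "job2.jdl"]:
        print(" ".join(submit_command(jdl, "my-delegation", "jobids.txt")))
```

Delegating once and passing the same identifier to every submission is what makes the explicit mode cheaper when many jobs are submitted.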

Job Monitoring and Output
- The status of a running job can be displayed with the command glite-wms-job-status
  - Use option '-i jobids_file' if the jobid was saved to a file at submission (glite-wms-job-submit -o)
  - 'watch -n seconds ... glite-wms-job-status' for a refreshed view
    - Don't use too short intervals (minimum 30s)
    - Don't use for long jobs or for many jobs at once
  - '--all' shows all of one's jobs
- Detailed status information: glite-wms-job-logging-info [-v 2]
- Output retrieval: glite-wms-job-output
  - stdout, stderr, output sandbox: all files listed in OutputSandbox
  - The user must retrieve job outputs: they cannot be sent to the user
  - Only possible after the job has “successfully” completed
  - The output sandbox is generally kept 3 weeks on WMS
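As a rough alternative to 'watch', the polling advice on this slide (no interval shorter than 30 seconds, a limited number of refreshes) could be sketched in Python as follows. The function names and default values are hypothetical; only glite-wms-job-status, its -i option and the 30 s minimum come from the slides.

```python
import subprocess
import time

MIN_INTERVAL_S = 30  # the slides advise against polling more often than this


def safe_interval(requested_s):
    """Never return an interval shorter than the advised minimum."""
    return max(float(requested_s), float(MIN_INTERVAL_S))


def status_command(jobids_file):
    """Command line that queries the jobs listed in a jobids file."""
    return ["glite-wms-job-status", "-i", jobids_file]


def poll_status(jobids_file, interval_s=60, rounds=3,
                run=subprocess.run, sleep=time.sleep):
    """Query the job status a fixed number of times, pausing in between.

    `run` and `sleep` are injectable so the loop can be exercised
    without a grid User Interface.
    """
    wait = safe_interval(interval_s)
    for n in range(rounds):
        run(status_command(jobids_file), check=False)
        if n < rounds - 1:
            sleep(wait)
```

Bounding the number of rounds keeps the loop from hammering the service for long-running jobs, which the slide explicitly warns against.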

JDL File Example
- JDL: Job Description Language
  - Application and its arguments
  - Input and output files
  - « Requirements » and « Rank »
- Syntax based on « Condor ClassAd »

  Job attributes:
    Executable = "gridTest";
    StdError = "stderr.log";
    StdOutput = "stdout.log";
    InputSandbox = {"/home/joda/test/gridTest"};
    OutputSandbox = {"stderr.log", "stdout.log"};
  Data attributes:
    InputData = "lfn:testbed0-00019";
    DataAccessProtocol = "gridftp";
  Resource attributes:
    Requirements = other.Architecture=="INTEL" && other.OpSys=="LINUX" && other.FreeCpus >=4;
    Rank = other.GlueHostBenchmarkSF00;
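When many similar jobs are needed, a JDL file like the example on this slide can be generated rather than written by hand. The following is a minimal sketch, not a gLite tool: the helper names (Expr, jdl_value, render_jdl) are hypothetical, and it only covers the value shapes seen in the example (quoted strings, lists, numbers, booleans and unquoted expressions).

```python
class Expr(str):
    """Marks text to be written unquoted, e.g. Requirements or Rank."""


def jdl_value(value):
    """Format one Python value as a JDL literal."""
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    text = str(value)
    if '"' in text:
        # Quote escaping is deliberately left out of this sketch.
        raise ValueError("double quotes are not supported: %r" % text)
    return '"' + text + '"'


def render_jdl(attributes):
    """Render a dict of attributes as 'Name = value;' lines."""
    lines = ["%s = %s;" % (name, jdl_value(value))
             for name, value in attributes.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_jdl({
        "Executable": "gridTest",
        "StdOutput": "stdout.log",
        "StdError": "stderr.log",
        "InputSandbox": ["/home/joda/test/gridTest"],
        "OutputSandbox": ["stderr.log", "stdout.log"],
        "Requirements": Expr('other.OpSys=="LINUX" && other.FreeCpus >=4'),
    }), end="")
```

Keeping expressions in a separate marker type avoids accidentally quoting Requirements or Rank, which would turn them into plain strings.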

JDL Attributes
- Job attributes
  - Define the job and its parameters
- Data and resource attributes
  - Used by WMS for resource (CE) matchmaking
  - Different types of supported resources:
    - Computing resources (based on BDII)
    - Software resources (tags) (based on BDII)
    - Data location (based on file catalogs, e.g. LFC)

Job Definition
- Executable (mandatory)
  - Application/command name
- Arguments (optional)
  - Arguments to pass to the application on the command line
- StdInput, StdOutput, StdError (optional)
  - File names for the job standard input/output/error
- Environment (optional)
  - Environment variables to define before running the application
- InputSandbox (optional)
  - List of files local to the UI that must be sent with the job to the CE
- OutputSandbox (optional)
  - List of job output files that must be retrieved from the CE and staged on WMS for later retrieval by the user with glite-wms-job-output
  - Size limit generally between 10 and 100 MB (site dependent)
  - Files generally kept on WMS for 3 weeks (site dependent)

Resource Attributes
- Requirements
  - Criteria to select the appropriate CE for running the job
  - May use either resources published into BDII, or LFC (file catalog) file names/guids, or physical file names (SURL)
  - A default requirement is defined in each UI (site specific) and is used if none is specified by the user
  - LCG CE: requirements are used only by WMS and are not passed to the CE/batch scheduler
    - Will change with the new generation of CE, CREAM CE, about to be released
- Rank
  - Criteria to sort the resources (CE) matching the requirements
  - Based on resources published into BDII, not on LFC names
  - Default ranking rules are defined in the UI (site dependent) and used when none is explicitly specified by the user
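The roles of Requirements and Rank can be illustrated with a toy model: Requirements filters the candidate CEs, Rank orders the survivors, and the highest rank is preferred. This sketch is not how WMS is implemented; the CE records and attribute names below are made up for the illustration.

```python
def match_and_rank(ces, requirements, rank):
    """Keep the CEs that satisfy `requirements`, best `rank` first."""
    matching = [ce for ce in ces if requirements(ce)]
    return sorted(matching, key=rank, reverse=True)


if __name__ == "__main__":
    # Hypothetical CE descriptions, standing in for BDII information.
    ces = [
        {"name": "ce-a", "OpSys": "LINUX", "FreeCpus": 2, "Benchmark": 1500},
        {"name": "ce-b", "OpSys": "LINUX", "FreeCpus": 8, "Benchmark": 1200},
        {"name": "ce-c", "OpSys": "LINUX", "FreeCpus": 16, "Benchmark": 1800},
    ]
    chosen = match_and_rank(
        ces,
        requirements=lambda ce: ce["OpSys"] == "LINUX" and ce["FreeCpus"] >= 4,
        rank=lambda ce: ce["Benchmark"],
    )
    for ce in chosen:
        print(ce["name"], ce["Benchmark"])
```

The split matters in practice: an overly strict Requirements expression leaves no candidates at all, whereas Rank only changes the order among CEs that already qualify.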

Data Attributes
- InputData (optional)
  - Lists the data files that must be present on a “close SE” of a CE for the job to run there
  - Can be an LFN (LFC name) or a PFN (SE name, called SURL)
  - A close SE is an SE published into BDII as having a good connection with the CE (based on declaration, not on tests)
  - Different from InputSandbox: no file transfer will occur for these files
- DataAccessProtocol (mandatory if InputData is specified)
  - A list of transfer protocols the application may use to access the files listed in InputData (e.g. gsiftp, rfio, dcap…)
- OutputSE and OutputData have been deprecated and must not be used

Job Collections…
- Goal: submit many jobs at once to improve submission performance
- Arbitrary collections: 1 JDL per job
  - Option '--collection directory'
- DAG (Directed Acyclic Graph): chained jobs
  - Option '--dag directory'
- Parametric jobs: 1 job run with several sets of parameters
  - Deprecated: use job collections instead
- For DAG and collections, the directory given to the option contains the JDL of all sub-jobs
  - The 'JDL file' parameter should not be specified
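Preparing an arbitrary collection amounts to filling a directory with one JDL file per sub-job and pointing the submit command at it. A minimal sketch follows, assuming the JDL text of each sub-job is already available as a string. The function names, the file naming scheme (job001.jdl, ...) and the use of automatic delegation are choices made for this illustration, not requirements of WMS.

```python
from pathlib import Path


def write_collection(directory, jdl_texts):
    """Write one JDL file per sub-job into `directory`; return the paths."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(jdl_texts, start=1):
        path = target / ("job%03d.jdl" % index)
        path.write_text(text)
        paths.append(path)
    return paths


def collection_submit_command(directory, jobids_file=None):
    """Command line that submits the whole directory as one collection.

    No 'JDL file' argument is given: the directory replaces it.
    """
    cmd = ["glite-wms-job-submit", "-a"]
    if jobids_file:
        cmd += ["-o", jobids_file]
    cmd += ["--collection", str(directory)]
    return cmd


if __name__ == "__main__":
    jobs = ['Executable = "/bin/hostname";\n', 'Executable = "/bin/date";\n']
    write_collection("my_collection", jobs)
    print(" ".join(collection_submit_command("my_collection", "jobids.txt")))
```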

… Job Collections
- 1 jobid is attached to the whole collection
  - Allows checking the status, retrieving the outputs and cancelling all the jobs with one command (same commands as for a single job)
  - Output: 1 directory per sub-job + 1 ids_nodes.map file giving the mapping between sub-jobs and directories
- The collection status is the aggregated status of all jobs
  - Completed when all jobs are completed
  - Successful if all jobs are successful
- Each sub-job also has its own jobid
  - Sub-jobs can be managed as single jobs: check status, retrieve output, cancel…
- When submitting many jobs at once, it is more efficient to use a collection, even if each job is managed independently afterwards

Job Perusal
- Goal: access (some) output files while the job is running
  - Can be done for any file produced by the job
  - Done by WMS periodically “polling” the CE and retrieving the current file contents
- Requires 2 additional lines in the JDL:
    PerusalFileEnable = true;
    PerusalTimeInterval = 120; # In seconds, not too low
- Definition and retrieval of files with 1 command: glite-wms-job-perusal [--set|--get|--unset] -f file jobid
  - --set: defines the files to monitor, 1 -f per file
  - --get: retrieves the file difference with the previously retrieved version
    - --all retrieves the whole file, even if some chunks were already transferred
    - --nodisplay stores files rather than displaying them
  - --unset: cancels file monitoring (per file)
- Use only when needed: may impact WMS performance
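The three uses of glite-wms-job-perusal listed on this slide can be summarised by small command builders. This is an illustrative sketch only: the function names are hypothetical and nothing is executed; the options (--set, --get, --unset, -f, --all, --nodisplay) are those given in the slide.

```python
def perusal_set(jobid, files):
    """Enable monitoring of several files: one -f per file."""
    cmd = ["glite-wms-job-perusal", "--set"]
    for name in files:
        cmd += ["-f", name]
    cmd.append(jobid)
    return cmd


def perusal_get(jobid, name, whole_file=False, store=False):
    """Retrieve what is new in one monitored file.

    whole_file adds --all (fetch everything again);
    store adds --nodisplay (save to disk instead of printing).
    """
    cmd = ["glite-wms-job-perusal", "--get", "-f", name]
    if whole_file:
        cmd.append("--all")
    if store:
        cmd.append("--nodisplay")
    cmd.append(jobid)
    return cmd


def perusal_unset(jobid, name):
    """Stop monitoring one file."""
    return ["glite-wms-job-perusal", "--unset", "-f", name, jobid]


if __name__ == "__main__":
    jobid = "https://wms.example.org:9000/hypothetical-job-id"  # placeholder
    print(" ".join(perusal_set(jobid, ["stdout.log", "stderr.log"])))
    print(" ".join(perusal_get(jobid, "stdout.log")))
    print(" ".join(perusal_unset(jobid, "stdout.log")))
```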

VO Software Area
- Each VO has a dedicated space on each CE to install its software
  - Space shared between the CE and the WNs
  - A job references this space through an environment variable: VO_VONAME_SW_DIR
    - VONAME is the VO name with ‘.’ and ‘-’ replaced by ‘_’
  - Write access only for the VO SW manager
  - World readable: do not store sensitive/private data there, or adjust the default permissions
- Software Manager is a VOMS role (managed by the VO manager)
  - The SW area is managed and updated by submitting jobs with the SW manager role
- SW area contents may be published into BDII using tags
  - lcg-ManageVOTag -host CE -vo voname …
  - Allows selecting a CE based on the SW installed

Writing Data to SE
- A running job has the same credential (proxy) as the user who submitted it
  - Don’t forget to use proxy renewal if the job may end after proxy expiration
    - Use the MyProxyServer attribute in the JDL
  - Can access any SE the user has the right to access
- To write an output file to an SE, the job must select an SE
  - May be hard coded in the job: not necessarily on the same site where the job is running
  - A (per VO) default SE is defined on each worker node and may be discovered with an environment variable: VO_VONAME_DEFAULT_SE
    - VONAME is the VO name with ‘.’ and ‘-’ replaced by ‘_’
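The naming rule for VO_VONAME_SW_DIR and VO_VONAME_DEFAULT_SE can be applied programmatically inside a job. A minimal sketch follows; the function names are hypothetical, and converting the VO name to upper case is an assumption suggested by the way the slides write VONAME, so check it against the worker node environment.

```python
import os
import re


def vo_env_prefix(vo_name):
    """'VO_' + VO name with '.' and '-' replaced by '_'.

    Upper-casing is an assumption, not stated in the slides.
    """
    return "VO_" + re.sub(r"[.-]", "_", vo_name).upper()


def sw_dir_variable(vo_name):
    """Name of the variable pointing at the VO software area."""
    return vo_env_prefix(vo_name) + "_SW_DIR"


def default_se_variable(vo_name):
    """Name of the variable holding the VO default SE."""
    return vo_env_prefix(vo_name) + "_DEFAULT_SE"


def lookup(vo_name, environ=None):
    """Return (software dir, default SE); None for anything undefined."""
    env = os.environ if environ is None else environ
    return (env.get(sw_dir_variable(vo_name)),
            env.get(default_se_variable(vo_name)))


if __name__ == "__main__":
    vo = "vo.lal.in2p3.fr"
    print(sw_dir_variable(vo), default_se_variable(vo))
    print(lookup(vo))
```

Returning None rather than raising lets a job fall back to a hard-coded SE when the default is not defined on the worker node.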

Summary
- The grid is like a very large batch system
  - The main component is WMS, in charge of selecting the appropriate CE based on the user’s requirements
  - For this matchmaking, WMS uses the information system and file catalogs
- gLite WMS is a generic job submission service with many advanced features:
  - « Bulk » submission: job collections, DAG jobs…
  - Shallow resubmission, Fuzzy Ranking…
  - VOMS proxy renewal (including VOMS attributes)
  - Significant performance (demonstrated > 20 kjobs/day/WMS)
- Not the only solution:
  - Other brokers like GridWay
  - Pilot job (similar to Condor Glide-ins) frameworks like DIRAC, PanDA…
  - « Workflow managers » like TAVERNA, MOTEUR, …

Useful Links
- Man pages for the glite-wms-job-xxx commands
- GRIF gLite tutorial:
  - https://trac.lal.in2p3.fr/GridSupport/wiki/Tutorial/JobSubm
  - https://trac.lal.in2p3.fr/GridSupport/wiki/Tutorial/VOSoftware