1 Architecture of the gLite WMS Esther Montes Prado CIEMAT 10th EELA Tutorial Madrid, 8.5.2007.

Slides:



Advertisements
Similar presentations
Workload Management David Colling Imperial College London.
Advertisements

EGEE is a project funded by the European Union under contract IST EGEE Tutorial Turin, January Hands on Job Services.
EU 2nd Year Review – Jan – Title – n° 1 WP1 Speaker name (Speaker function and WP ) Presentation address e.g.
Workload management Owen Maroney, Imperial College London (with a little help from David Colling)
INFSO-RI Enabling Grids for E-sciencE Workload Management System and Job Description Language.
The Grid Constantinos Kourouyiannis Ξ Architecture Group.
Job Submission The European DataGrid Project Team
E-infrastructure shared between Europe and Latin America 12th EELA Tutorial for Users and System Administrators Architecture of the gLite.
SEE-GRID-SCI Hands-On Session: Workload Management System (WMS) Installation and Configuration Dusan Vudragovic Institute of Physics.
INFSO-RI Enabling Grids for E-sciencE EGEE Middleware The Resource Broker EGEE project members.
IST E-infrastructure shared between Europe and Latin America Architecture of the gLite WMS Alexandre Duarte CERN Fifth EELA.
E-infrastructure shared between Europe and Latin America Architecture of the WMS Manuel Rubio del Solar CETA-CIEMAT EELA Tutorial, Mérida,
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Job Submission Fokke Dijkstra RuG/SARA Grid.
Special Jobs Claudio Cherubino INFN - Catania. 2 MPI jobs on gLite DAG Job Collection Parametric jobs Outline.
Querétaro (Mexico), E2GRIS – Job Description Language JDL 1.
Basic Grid Job Submission Alessandra Forti 28 March 2006.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Architecture of the WMS Yaodong Cheng CC-IHEP, Chinese Academy of Sciences.
Glite WMS overview Alessandra Forti Computing Seminar Manchester 20th November 2008.
FESR Consorzio COMETA - Progetto PI2S2 Using MPI to run parallel jobs on the Grid Marcello Iacono Manno Consorzio COMETA
FP6−2004−Infrastructures−6-SSA E-infrastructure shared between Europe and Latin America Special Jobs Matias Zabaljauregui UNLP.
Grid Initiatives for e-Science virtual communities in Europe and Latin America The Job Description Language JDL 1.
INFSO-RI Enabling Grids for E-sciencE The Workload Management System: an overview Giuseppe La Rocca INFN – Catania ICTP/INFM-Democritos.
The gLite API – PART I Giuseppe LA ROCCA INFN Catania ACGRID-II School 2-14 November 2009 Kuala Lumpur - Malaysia.
INFSO-RI Enabling Grids for E-sciencE GILDA Praticals GILDA Tutors INFN Catania ICTP/INFM-Democritos Workshop on Porting Scientific.
Enabling Grids for E-sciencE Workload Management System on gLite middleware Matthieu Reichstadt CNRS/IN2P3 ACGRID School, Hanoi (Vietnam)
Nadia LAJILI User Interface User Interface 4 Février 2002.
INFSO-RI Enabling Grids for E-sciencE Workload Management System Mike Mineter
LCG Middleware Testing in 2005 and Future Plans E.Slabospitskaya, IHEP, Russia CERN-Russia Joint Working Group on LHC Computing March, 6, 2006.
1 Esther Montes Prado CIEMAT 10th EELA Tutorial Madrid, Hands-on on WMS (Review and Summary)
INFSO-RI Enabling Grids for E-sciencE The gLite Workload Management System Elisabetta Molinari (INFN-Milan) on behalf of the JRA1.
EGEE-III INFSO-RI Enabling Grids for E-sciencE Feb. 06, Introduction to High Performance and Grid Computing Faculty of Sciences,
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Job Submission Fokke Dijkstra RuG/SARA Grid.
Architecture of the gLite WMS (Workload Management System) Hands-on Paola Celio Universita’ Roma TRE INFN Roma TRE Sevilla Septembre 2007.
EGEE is a project funded by the European Union under contract IST EGEE Tutorial Turin, January Job Services Emidio.
Job Management DIRAC Project. Overview  DIRAC JDL  DIRAC Commands  Tutorial Exercises  What do you have learned? KEK 10/2012DIRAC Tutorial.
1 Andrea Sciabà CERN Critical Services and Monitoring - CMS Andrea Sciabà WLCG Service Reliability Workshop 26 – 30 November, 2007.
E-infrastructure shared between Europe and Latin America 1 Workload Management System-WMS Luciano Diaz Universidad Nacional Autónoma de México - UNAM Mexico.
INFSO-RI Enabling Grids for E-sciencE Claudio Cherubino, INFN Catania Grid Tutorial for users Merida, April 2006 Special jobs.
Enabling Grids for E-sciencE Workload Management System on gLite middleware - commands Matthieu Reichstadt CNRS/IN2P3 ACGRID School, Hanoi.
High-Performance Computing Lab Overview: Job Submission in EDG & Globus November 2002 Wei Xing.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Workload management in gLite 3.x - MPI P. Nenkova, IPP-BAS, Sofia, Bulgaria Some of.
INFSO-RI Enabling Grids for E-sciencE Job Submission Tutorial (material from INFN Catania)
Workload Management System Jason Shih WLCG T2 Asia Workshop Dec 2, 2006: TIFR.
INFSO-RI Enabling Grids for E-sciencE EGEE is a project funded by the European Union under contract IST Job sandboxes.
INFSO-RI Enabling Grids for E-sciencE Job Description Language (JDL) Giuseppe La Rocca INFN First gLite tutorial on GILDA Catania,
INFSO-RI Enabling Grids for E-sciencE GILDA Praticals Giuseppe La Rocca INFN – Catania gLite Tutorial at the EGEE User Forum CERN.
Development of test suites for the certification of EGEE-II Grid middleware Task 2: The development of testing procedures focused on special details of.
Enabling Grids for E-sciencE Sofia, 17 March 2009 INFSO-RI Introduction to Grid Computing, EGEE and Bulgarian Grid Initiatives –
EGEE-II INFSO-RI Enabling Grids for E-sciencE Practical using WMProxy advanced job submission.
EGEE 3 rd conference - Athens – 20/04/2005 CREAM JDL vs JSDL Massimo Sgaravatto INFN - Padova.
Biomed tutorial 1 Enabling Grids for E-sciencE INFSO-RI EGEE is a project funded by the European Union under contract IST JDL Flavia.
User Interface UI TP: UI User Interface installation & configuration.
LCG2 Tutorial Viet Tran Institute of Informatics Slovakia.
Job Management Beijing, 13-15/11/2013. Overview Beijing, /11/2013 DIRAC Tutorial2  DIRAC JDL  DIRAC Commands  Tutorial Exercises  What do you.
Introduction to Computing Element HsiKai Wang Academia Sinica Grid Computing Center, Taiwan.
FESR Consorzio COMETA - Progetto PI2S2 Using MPI to run parallel jobs on the Grid Marcello Iacono Manno Consorzio Cometa
Introduction to Job Description Language (JDL) Alessandro Costa INAF Catania Corso di Calcolo Parallelo Grid Computing Catania - ITALY September.
Enabling Grids for E-sciencE Work Load Management & Simple Job Submission Practical Shu-Ting Liao APROC, ASGC EGEE Tutorial.
FESR Trinacria Grid Virtual Laboratory Practical using WMProxy advanced job submission Emidio Giorgio INFN Catania.
Architecture of the gLite WMS
Workload Management System on gLite middleware
Workload Management System ( WMS )
EGEE tutorial, Job Description Language - more control over your Job Assaf Gottlieb Tel-Aviv University EGEE is a project.
Alexandre Duarte CERN Fifth EELA Tutorial Santiago, 06/09-07/09,2006
Workload Management System
gLite Job Management Mario Reale GARR
gLite Job Management Amina KHEDIMI CERIST
The gLite Workload Management System
Job Description Language
Job Description Language (JDL)
Presentation transcript:

1 Architecture of the gLite WMS Esther Montes Prado CIEMAT 10th EELA Tutorial Madrid,

2 = New in gLite 3.0 New! Outline 1.This presentation will cover the following arguments:  Overview of WMS Architecture  Job Description Language Overview  WMProxy overview

3 First Part Architecture of the gLite WMS

4 User request WMS traslator Workload Manager Services

5 The Workload Management System The Workload Management System (WMS) comprises a set of Grid middleware components responsible for distribution and management of tasks across Grid resources. The purpose of the Workload Manager (WM) is accept and satisfy requests for job management coming from its clients  meaning of the submission request is to pass the responsibility of the job to the WM.  WM will pass the job to an appropriate CE for execution  taking into account requirements and the preferences expressed in the job description file matchmaking The decision of which resource should be used is the outcome of a matchmaking process. WMS Objectives

6 WMS Scheduling Policies A WM can adopt different policies to schedule a job:  Eager scheduling:  a job is bound to a resource as soon as possible  once the decision has been taken, the job is passed to the selected resource for execution  Lazy scheduling:  a job is held by the WM until a resource becomes available  when this happen, the resource is matched against the submitted jobs  Intermediate approches are possible

7 ISM represents one of the most notable improvements in the WM The ISM basically consists of a repository of resource information that is available in read only mode to the matchmaking engine  the update is the result of  the arrival of notifications  active polling of resources  some arbitrary combination of both New! WMS Information Supermarket (ISM)

8 The Task Queue represents the second most notable improvement in the WM internal design  possibility to keep a submission request for a while if no resources are immediately available that match the job requirements  technique used by the AliEn and Condor systems Non-matching requests  will be retried either periodically  eager scheduling approach  or as soon as notifications of available resources appear in the ISM  lazy scheduling approach New! WMS Task Queue

9 Job management requests (submission, cancellation) expressed via a Job Description Language (JDL) New! WMS Overall Architecture

10 Job management requests (submission, cancellation) expressed via a Job Description Language (JDL) Finds an appropriate CE for each submission request, taking into account job requests and preferences, Grid status, utilization policies on resources New! WMS Overall Architecture

11 Job management requests (submission, cancellation) expressed via a Job Description Language (JDL) Keeps submission requests Requests are kept for a while for a while if no resources are immediately available New! WMS Overall Architecture

12 Job management requests (submission, cancellation) expressed via a Job Description Language (JDL) Keeps submission requests Requests are kept for a while for a while if no resources are immediately available Repository of resource information information available to matchmaker Updated via notifications and/or active polling on resources New! WMS Overall Architecture

13 Performs the actual job submission and monitoring New! WMS Overall Architecture

14 The Network Server (NS) is a generic network daemon that provides support for the job control functionality. It is responsible for accepting incoming requests from the WMS-UI (e.g. job submission, job removal), which, if valid, are then passed to the Workload Manager. The Workload Manager Proxy (WMProxy) is a service providing access to WMS functionality through a Web Services based interface. Besides being the natural replacement of the NS in the passage to the SOA approach for the WMS architecture, it provides additional features such as bulk submission and the support for shared and compressed sandboxes for compound jobs. New! NS and WMProxy

15 WMS components handling the job during its lifetime and performs the submission Job Adapter (JA)  is responsible for  making the final touches to the JDL expression for a job, before it is passed to CondorC for the actual submission  creating the job wrapper script that creates the appropriate execution environment in the CE worker node  transfer of the input and of the output sandboxes CondorC  responsible for  performing the actual job management operations  job submission, job removal DAG Manager (DAGMan)  meta-scheduler from Condor  purpose is to navigate the graph  determine which nodes are free of dependencies  follow the execution of the corresponding jobs New! WMS Job Submission Services

16 Log Monitor (LM)  is responsible for  watching the CondorC log file  intercepting interesting events concerning active jobs Proxy Renewal Service  is responsible for assuring that,  for all the lifetime of a job, a valid user proxy exists within the WMS  MyProxy Server is contacted in order to renew the user's credential Logging & Bookkeeping (LB)  is responsible for  Storing events generated by the various components of the WMS  Delivering to the user information about the job ‘ s status WMS Job Submission Services

17 Jobs State Machine (1/9) Submitted job is entered by the user to the User Interface but not yet transferred to Network Server for processing WMS Job Submission Services

18 Jobs State Machine (2/9) Waiting job accepted by NS and waiting for Workload Manager processing or being processed by WMHelper modules. WMS Job Submission Services

19 Jobs State Machine (3/9) Ready job processed by WM but not yet transferred to the CE (local batch system queue). WMS Job Submission Services

20 Jobs State Machine (4/9) Scheduled job waiting in the queue on the CE. WMS Job Submission Services

21 Jobs State Machine (5/9) Running job is running. WMS Job Submission Services

22 Jobs State Machine (6/9) Done job exited or considered to be in a terminal state by CondorC (e.g., submission to CE has failed in an unrecoverable way). WMS Job Submission Services

23 Jobs State Machine (7/9) Aborted job processing was aborted by WMS (waiting in the WM queue or CE for too long, expiration of user credentials). WMS Job Submission Services

24 Jobs State Machine (8/9) Cancelled job has been successfully canceled on user request. WMS Job Submission Services

25 Jobs State Machine (9/9) Cleared output sandbox was transferred to the user or removed due to the timeout. WMS Job Submission Services

26 “User interface” “possible operations” Find the list of resources suitable to run a specific job Submit a job/DAG for execution on a remote Computing Element Check the status of a submitted job/DAG Cancel one or more submitted jobs/DAGs Retrieve the output files of a completed job/DAG (output sandbox) Retrieve and display bookkeeping information about submitted jobs/DAGs Retrieve and display logging information about submitted jobs/DAGs Retrieve checkpoint states of a submitted checkpointable job Start a local listener for an interactive job Service architecture

27 The most relevant commands to interact with the WMS (NS):  edg-job-submit  edg-job-list-match  edg-job-status  edg-job-get-output  edg-job-cancel In gLite 3.0:  glite-job-submit  glite-job-list-match  glite-job-status  glite-job-output  glite-job-cancel Command Line Interface

28 Command Line Interface Job Submission  Perform the job submission to the Grid.  $ edg-job-submit [options]  $ glite-job-submit [options]  where is a file containing the job description, usually with extension.jdl. Now, all examples with edg-*

29 Command Line Interface If the request has been correctly submitted this is the tipical output that you can get: edg-job-submit test.jdl ===================glite-job-submit Success==================== The job has been successfully submitted to the Network Server. Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is: - =============================================================== In case of failure, an error message will be displayed instead, and an exit status different form zero will be retured. Command Line Interface

30 Command Line Interface It is possible to see which CEs are eligible to run a job specified by a given JDL file using the command edg-job-list-match test.jdl Connecting to host lxshare0380.cern.ch, port 7772 Selected Virtual Organisation name (from UI conf file): dteam ***************************************************************** COMPUTING ELEMENT IDs LIST The following CE(s) matching your job requirements have been found: adc0015.cern.ch:2119/jobmanager-lcgpbs-infinite adc0015.cern.ch:2119/jobmanager-lcgpbs-long adc0015.cern.ch:2119/jobmanager-lcgpbs-short ***************************************************************** Command Line Interface

31 Command Line Interface After a job is submitted, it is possible to see its status using the glite-job-status command. edg-job-status ************************************************************* BOOKKEEPING INFORMATION: Printing status info for the Job: Current Status: Scheduled Status Reason: unavailable Destination: lxshare0277.cern.ch:2119/jobmanager-pbs-infinite reached on: Fri Aug 1 12:21: ************************************************************* Command Line Interface

32 Command Line Interface A job can be canceled before it ends using the command glite-job-cancel. edg-job-cancel Are you sure you want to remove specified job(s)? [y/n]n :y ===================glite-job-cancel Success==================== The cancellation request has been successfully submitted for the following job(s) - =========================================================== Command Line Interface

33 Command Line Interface After the job has finished (it reaches the DONE status), its output can be copied to the UI edg-job-get-output Retrieving files from host lxshare0234.cern.ch ***************************************************************** JOB GET OUTPUT OUTCOME Output sandbox files for the job: - have been successfully retrieved and stored in the directory: /tmp/jobOutput/snPegp1YMJcnS22yF5pFlg ***************************************************************** By default, the output is stored under /tmp, but it is possible to specify in which directory to save the output using the - -dir option. Command Line Interface

34 Second Part Job Description Language

35 Job Description Language match-making process The JDL is used in gLite to specify the job ’ s characteristics and constrains, which are used during the match-making process to select the best resources that satisfy job ’ s requirements.

36 JDL syntax JDL syntax The JDL syntax consists on statements like: Attribute = value; Attribute = value; Comments must be preceded by a sharp character # ( # ) or have to follow the C++ syntax WARNING: The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.

In a JDL, some attributes are mandatory while others are optional. An “ essential ” JDL is the following: Executable = “ test.sh ” ; StdOutput = “ std.out ” ; StdError = “ std.err ” ; InputSandbox = { “ test.sh ” }; OutputSandbox = { “ std.out ”, ” std.err ” }; If needed, arguments to the executable can be passed: Arguments = “ Hello World! ” ; JDL Syntax

38 JDL Syntax If the argument contains quoted strings, the quotes must be escaped with a backslash e.g. Arguments = “\”Hello World!\“ 10”; Special characters such as &, |, >, < are only allowed if specified inside a quoted string or preceded by triple \ e.g. Arguments = "-f file1\\\&file2";

39 Workload Manager Service The JDL allows the description of the following request types supported by the WMS:  Job: a simple application  DAG: a direct acyclic graph of dependent jobs  With WMProxy  Collection: a set of independent jobs  With WMProxy

40 Jobs The Workload Management System currently supports the following types for Jobs :  Normal: a simple batch, a set of commands to be processed as single unit  Interactive: a job whose standard streams are forwarded to the submitting client  MPICH: a parallel application using MPICH-P4 implementation of MPI  Partitionable: a job which is composed by a set of independent steps/iterations  Checkpointable: a job able to save its state  Parametric: a job whose JDL contains parametric attributes (e.g. Arguments, StdInput etc.) whose values can be made vary in order to obtain submission of several instances of similar jobs only differing for the value of the parameterized attributes. Support for parametric jobs is only available when the submission to the WMS is done through the WMProxy service a set of independent sub- jobs, each one taking care of a step or of a sub-set of steps, and which can be executed in parallel the job execution can be suspended and resumed later, starting from the same point where it was first stopped

41 JDL: Relevant Attributes JobType JobType (optional) Normal (simple, sequential job), Interactive, MPICH, Checkpointable, Partitionable, Parametric Or combination of them Checkpointable, Interactive Checkpointable, MPI E.g.:  JobType = “Interactive”;  JobType = {“Interactive”,”Checkpointable”}; “ Interactive ” + “ MPI ” not yet permitted “ Interactive ” + “ MPI ” not yet permitted

42 JDL: Relevant Attributes (cont.) Executable Executable (mandatory) This is a string representing the executable/command name. The user can specify an executable which is already on the remote CE Executable = {“/opt/EGEODE/GCT/egeode.sh”}; The user can provide a local executable name, which will be staged from the UI to the WN. Executable = {“egeode.sh”}; InputSandbox = {“/home/larocca/egeode/egeode.sh”};

43 JDL: Relevant Attributes (cont.) Arguments Arguments (optional) This is a string containing all the job command line arguments. E.g.: If your executable sum has to be started as: $ sum N1 N2 –out result.out Executable = “sum”; Arguments = “N1 N2 –out result.out”;

44 JDL: Relevant Attributes (cont.) Environment Environment (optional) List of environment settings needed by the job to run properly Environment = {“JAVA_HOME=/usr/java/j2sdk1.4.2_08”}; E.g. Environment = {“JAVA_HOME=/usr/java/j2sdk1.4.2_08”}; InputSandbox InputSandbox (optional) List of files on the UI local disk needed by the job for proper running The listed files will be automatically staged to the remote resource InputSandbox ={“myscript.sh”,”/tmp/cc.sh”}; E.g. InputSandbox ={“myscript.sh”,”/tmp/cc.sh”};

45 JDL: Relevant Attributes (cont.) OutputSandbox OutputSandbox (optional) List of files, generated by the job, which have to be retrieved from the CE OutputSandbox ={ “std.out”,”std.err”, “image.png”}; E.g. OutputSandbox ={ “std.out”,”std.err”, “image.png”};

46 JDL: Relevant Attributes (cont.) Requirements Requirements (optional) Job requirements on computing resources Specified using attributes of resources published in the Information Service If not specified, default value defined in UI configuration file is considered Requirements = other.GlueCEStateStatus == "Production“; Default. Requirements = other.GlueCEStateStatus == "Production“; Requirements=other.GlueCEUniqueID == “adc006.cern.ch:2119/jobmanager-pbs-infinite” Requirements=Member(“ALICE ”, other.GlueHostApplicationSoftwareRunTimeEnvir onment);

47 References JDL Attributes:  grid/docs/DataGrid-01-TEN _2.pdf grid/docs/DataGrid-01-TEN _2.pdf   wm/api_doc/wms_jdl/index.html wm/api_doc/wms_jdl/index.html LCG-2 User Guide Manual Series:  UserGuide.html UserGuide.html

48 Third Part Workload Manager Proxy

49 WMProxy WMProxy (Workload Manager Proxy)  is a new service providing access to the gLite Workload Management System (WMS) functionality through a simple Web Services based interface.  has been designed to handle a large number of requests for job submission  gLite 1.5 => ~180 secs for 500 jobs  goal is to get in the short term to ~60 secs for 1000 jobs  it provides additional features such as bulk submission and the support for shared and compressed sandboxes for compound jobs.  It’s the natural replacement of the NS in the passage to the SOA approach.

50 New request types Support for new types strongly relies on newly developed JDL converters and on the DAG submission support  all JDL conversions are performed on the server  a single submission for several jobs All new request types can be monitored and controlled through a single handle (the request id)  each sub-jobs can be however followed-up and controlled independently through its own id “ Smarter ” WMS client commands/API  allow submission of DAGs, collections and parametric jobs exploiting the concept of “shared sandbox”  allow automatic generation and submission of collections and DAGs from sets of JDL files located in user specified directories on the UI

51 WMProxy C++ client commands The commands to interact with WMProxy Service are: glite-wms-job-submit glite-wms-job-list-match glite-wms-job-cancel glite-wms-job-output In our examples: glite-wms-job-* are edg-job-*

52 References gLite 3.0 User Guide  R-GMA overview page  GLUE Schema  JDL attributes specification for WM proxy  WMProxy quickstart  wm/wmproxy_client_quickstart.shtml wm/wmproxy_client_quickstart.shtml WMS user guides 

Edificio Bronce Plaza Manuel Gómez Moreno s/n Madrid. España Tel.: / 25 Fax: Questions …