Enabling Grids for E-sciencE www.eu-egee.org Workload Management System on gLite middleware Matthieu Reichstadt CNRS/IN2P3 ACGRID School, Hanoi (Vietnam)


Enabling Grids for E-sciencE Workload Management System on gLite middleware Matthieu Reichstadt CNRS/IN2P3 ACGRID School, Hanoi (Vietnam) November 5th, 2007 Credits: Valeria Ardizzone and other EGEE colleagues…

ACGrid School 5-9/ Enabling Grids for E-sciencE
Outline
- Overview of the WMS architecture: Task Queue, Information SuperMarket, MatchMaker, scheduling policies, Job Submission Service, Job Logging & Bookkeeping
- Job Description Language overview: basic attributes, advanced attributes
- Practice: command line, exercises

Workload Management System (WMS)
The WMS is the gLite 3 component that allows users to submit jobs. It performs all the tasks required to execute them. It is a set of Grid middleware components responsible for distributing and managing tasks across Grid resources, and it hides the complexity of the Grid from the user.

WMS Architecture
- Requests: job management requests (submission, cancellation) are expressed in the Job Description Language (JDL).
- MatchMaker: finds an appropriate CE for each submission request. It takes into account the job's requirements and preferences, the Grid status, and the utilization policies on resources.
- Information SuperMarket (ISM): the repository of resource information available to the matchmaker. It is updated through notifications and/or active polling of the resources.
- Task Queue: keeps submission requests for a while if no resources are immediately available.
- Job submission and monitoring: performs the actual job submission and monitoring.

WMS Components (1)
Network Server (NS) / WMProxy: accepts incoming requests from the UI (job submission, job removal). If a request is valid, it passes it to the Workload Manager.

WMS Components (2)
Workload Manager (WM): the core component of the WMS. It takes the appropriate actions to satisfy requests, using:
- Resource Broker (RB, the MatchMaker): finds the resources that best match the request.
- Information SuperMarket (ISM): the repository of resource information, available to the RB in read-only mode.
- Task Queue: keeps a request when no resources are immediately available. A request that does not match is either retried periodically (eager scheduling) or held until the WM is notified of available resources (lazy scheduling).

WMS Components (3)
Eager scheduling ("push" model): a job is bound to a resource as soon as possible. Once the decision has been taken, the job is passed to the selected resource for execution.
Lazy scheduling ("pull" model): the job is held by the WM until a resource becomes available. When this happens, the resource is matched against the submitted job.
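The two policies can be contrasted with a toy Python model. This is a minimal sketch, not the gLite WM code. Resources are reduced to a hypothetical free-CPU count and matchmaking to "enough free CPUs". `Resource`, `Job`, `try_bind`, `EagerScheduler` and `LazyScheduler` are illustrative names. In the eager scheduler, unmatched jobs wait in a task queue until the next periodic `retry()`. In the lazy scheduler, nothing is bound until a resource announces itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    free_cpus: int


@dataclass
class Job:
    job_id: str
    cpus: int                       # hypothetical single requirement: CPUs needed
    bound_to: Optional[str] = None  # name of the resource the job was bound to


def try_bind(job: Job, resource: Resource) -> bool:
    """Bind the job to the resource if it fits (toy stand-in for matchmaking)."""
    if resource.free_cpus >= job.cpus:
        job.bound_to = resource.name
        resource.free_cpus -= job.cpus
        return True
    return False


class EagerScheduler:
    """Push model: bind each job as soon as possible; jobs that cannot be
    bound wait in the task queue and are retried periodically."""

    def __init__(self, resources: List[Resource]) -> None:
        self.resources = resources
        self.task_queue: List[Job] = []

    def _bind_anywhere(self, job: Job) -> bool:
        # any() stops at the first resource that accepts the job.
        return any(try_bind(job, r) for r in self.resources)

    def submit(self, job: Job) -> None:
        if not self._bind_anywhere(job):
            self.task_queue.append(job)

    def retry(self) -> None:
        # Called periodically: keep only the jobs that still cannot be bound.
        self.task_queue = [j for j in self.task_queue if not self._bind_anywhere(j)]


class LazyScheduler:
    """Pull model: hold jobs until a resource becomes available, then match
    that resource against the held jobs."""

    def __init__(self) -> None:
        self.held: List[Job] = []

    def submit(self, job: Job) -> None:
        self.held.append(job)

    def resource_available(self, resource: Resource) -> None:
        for job in list(self.held):  # iterate over a copy while removing
            if try_bind(job, resource):
                self.held.remove(job)
```

The difference lies in who triggers the match. The eager scheduler polls from the job side. The lazy scheduler reacts to a notification from the resource side.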

WMS Components (4)
WMS components that handle the job during its lifetime and perform the submission:
- Job Adapter (JA): responsible for
  - making the final touches to the JDL expression of a job before it is passed to CondorC for the actual submission;
  - creating the job wrapper script, which sets up the appropriate execution environment on the CE worker node and transfers the input and output sandboxes.
- CondorC: responsible for performing the actual job management operations (job submission, job removal).

WMS Components (5)
- Log Monitor (LM): watches the CondorC log file and intercepts interesting events concerning active jobs.
- Proxy Renewal Service: ensures that a valid user proxy exists within the WMS for the whole lifetime of a job. To do so, it contacts the MyProxy server to renew the user's credential.
- Logging & Bookkeeping (LB): stores the events generated by the various components of the WMS. By querying the LB, the user can retrieve information about the job status.
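The Logging & Bookkeeping role can be illustrated with a small Python sketch. Components log timestamped events, and a status query returns the state reported by the most recent event. This is a simplification under stated assumptions: the real LB derives job states from typed events with more elaborate rules. `Event` and `LoggingAndBookkeeping` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    job_id: str
    source: str      # WMS component that generated the event (e.g. "LogMonitor")
    status: str      # job state reported by the event, e.g. "Waiting", "Running"
    timestamp: float


class LoggingAndBookkeeping:
    """Toy LB: components log events, users query the job status."""

    def __init__(self) -> None:
        self._events: Dict[str, List[Event]] = {}

    def log(self, event: Event) -> None:
        self._events.setdefault(event.job_id, []).append(event)

    def history(self, job_id: str) -> List[Event]:
        # Events may arrive out of order from different components: sort by time.
        return sorted(self._events.get(job_id, []), key=lambda e: e.timestamp)

    def status(self, job_id: str) -> Optional[str]:
        events = self.history(job_id)
        return events[-1].status if events else None
```

Because several components report on the same job independently, the sketch orders events by timestamp rather than by arrival.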

Job State Machine
- Submitted: the job has been entered by the user on the User Interface but not yet transferred to the Network Server for processing.
- Waiting: the job has been accepted by the NS and is waiting for Workload Manager processing, or is being processed by the WMHelper modules.
- Ready: the job has been processed by the WM but not yet transferred to the CE (local batch system queue).
- Scheduled: the job is waiting in the queue on the CE.
- Running: the job is running on a Worker Node.
- Done: the job has exited, or CondorC considers it to be in a terminal state (e.g. the submission to the CE has failed in an unrecoverable way).
- Aborted: job processing was aborted by the WMS (e.g. the job waited too long in the WM queue or on the CE, or the user credentials expired).
- Cancelled: the job has been successfully cancelled at the user's request.
- Cleared: the output sandbox has been transferred to the user or removed because of a timeout.
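The nine job states can be encoded as a small Python state machine that rejects illegal transitions. The transition table is an assumption built only from the state descriptions above. It is not the complete LB state graph; for example, resubmission paths are omitted. `JobState`, `TRANSITIONS`, `advance` and `is_terminal` are illustrative names.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "Submitted"
    WAITING = "Waiting"
    READY = "Ready"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    DONE = "Done"
    ABORTED = "Aborted"
    CANCELLED = "Cancelled"
    CLEARED = "Cleared"


S = JobState

# Simplified transition table (assumption for illustration only).
TRANSITIONS = {
    S.SUBMITTED: {S.WAITING, S.CANCELLED},
    S.WAITING: {S.READY, S.ABORTED, S.CANCELLED},
    # Ready -> Done models an unrecoverable failure of the submission to the CE.
    S.READY: {S.SCHEDULED, S.DONE, S.ABORTED, S.CANCELLED},
    S.SCHEDULED: {S.RUNNING, S.ABORTED, S.CANCELLED},
    S.RUNNING: {S.DONE, S.ABORTED, S.CANCELLED},
    # Done is terminal for CondorC; the job becomes Cleared once its
    # output sandbox has been retrieved or removed.
    S.DONE: {S.CLEARED},
    S.ABORTED: set(),
    S.CANCELLED: set(),
    S.CLEARED: set(),
}


def advance(current: JobState, new: JobState) -> JobState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


def is_terminal(state: JobState) -> bool:
    return not TRANSITIONS[state]
```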

Job Description Language

The JDL language
Job Description Language (JDL): the JDL describes jobs for execution on the Grid.
ClassAd: the JDL adopted within the gLite middleware is based on Condor's CLASSified Advertisement language (ClassAd). A ClassAd is a record-like structure composed of a finite number of attributes separated by semicolons (;). ClassAds are highly flexible and can be used to represent arbitrary services.
The JDL file is processed by the match-making process, which selects the best resource that satisfies the job's requirements.
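The record-like structure of a ClassAd can be shown by rendering a Python dictionary as JDL text. This is a sketch, not a gLite API. `to_jdl` and `Expr` are hypothetical helpers, and only plain strings, numbers, booleans and lists of strings are handled. `Expr` marks values such as Requirements or Rank expressions, which must be written verbatim instead of quoted.

```python
from typing import Mapping


class Expr(str):
    """A ClassAd expression to be written verbatim (not as a quoted string)."""


def _render(value) -> str:
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return '"' + value.replace('"', '\\"') + '"'
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(_render(v) for v in value) + "}"
    raise TypeError(f"unsupported JDL value: {value!r}")


def to_jdl(attributes: Mapping[str, object]) -> str:
    """Render attributes as a ClassAd record: one 'name = value;' per line,
    enclosed in square brackets, with nothing after each semicolon."""
    body = [f"  {name} = {_render(value)};" for name, value in attributes.items()]
    return "\n".join(["[", *body, "]"])
```

For example, `to_jdl({"Executable": "test.sh", "Rank": Expr("other.GlueCEStateFreeCPUs")})` yields a two-attribute record in which the executable name is quoted and the Rank expression is not.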

JDL file lines
Each line of a JDL file has the format:
Attribute = expression;
There are two categories of attributes:
1. Job attributes define the job itself.
2. Resource attributes express the job's constraints in terms of computing resources and of data and storage resources.
Comments start with # or //.
The JDL is sensitive to blank characters and tabs: no blank or tab may follow the semicolon at the end of a line.
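These line rules can be checked with a small Python parser. It is a sketch under simplifying assumptions: one attribute per line, whole-line comments only, and the enclosing `[` and `]` on lines of their own. It is not the parser used by the gLite UI. `parse_jdl` is a hypothetical name. The parser returns each expression as raw text and rejects lines with trailing blanks or tabs.

```python
from typing import Dict


def parse_jdl(text: str) -> Dict[str, str]:
    """Parse 'Attribute = expression;' lines into {attribute: expression text}."""
    attributes: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line in ("[", "]") or line.startswith(("#", "//")):
            continue  # blank line, record delimiter or comment
        if not line.endswith(";"):
            raise ValueError(f"line {lineno}: missing ';'")
        if raw != raw.rstrip():
            raise ValueError(f"line {lineno}: blank or tab after the final ';'")
        # Split on the first '=' only, so '==' inside expressions is preserved.
        name, sep, expression = line[:-1].partition("=")
        if not sep or not name.strip():
            raise ValueError(f"line {lineno}: expected 'Attribute = expression;'")
        attributes[name.strip()] = expression.strip()
    return attributes
```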

JDL: basic attributes
In a JDL, some attributes are mandatory while others are optional. An "essential" JDL is the following:
[
  Executable = "test.sh";
  StdOutput = "std.out";
  StdError = "std.err";
  InputSandbox = {"test.sh"};
  OutputSandbox = {"std.out", "std.err"};
]
If needed, arguments can be passed to the executable:
Arguments = "arguments list";

JDL: basic attributes
Executable (mandatory): the name of the executable or command. You can specify an executable that either:
- already exists on the remote WN, or
- will be copied from the UI to the WN (as test.sh is, by listing it in the InputSandbox).
The arguments are given in a separate attribute (Arguments).

JDL: basic attributes
Arguments (optional): the arguments for the executable. For example, with
Executable = "execprog";
Arguments = "-out outputfile.dat";
the Worker Node (WN) will run:
$ execprog -out outputfile.dat
Double quotes inside the arguments must be preceded by \. For example,
Arguments = "-a \"quoted string\" -bcd";
becomes:
$ execprog -a "quoted string" -bcd
Special characters (&, |, >, <) must be preceded by three backslashes:
Arguments = "-f file1\\\&file2";
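The two escaping rules can be applied mechanically. The following Python sketch turns a command line into an Arguments attribute. It assumes the only characters needing treatment are the double quote (one backslash) and &, |, >, < (three backslashes), as stated above; backslashes already present in the input are not handled. `escape_jdl_arguments` and `arguments_line` are hypothetical helpers.

```python
SPECIAL_CHARACTERS = "&|<>"


def escape_jdl_arguments(arguments: str) -> str:
    """Escape a command line for use inside Arguments = "...";"""
    escaped = []
    for ch in arguments:
        if ch == '"':
            escaped.append('\\"')          # writes \"
        elif ch in SPECIAL_CHARACTERS:
            escaped.append("\\\\\\" + ch)  # writes three backslashes, then the character
        else:
            escaped.append(ch)
    return "".join(escaped)


def arguments_line(arguments: str) -> str:
    return f'Arguments = "{escape_jdl_arguments(arguments)}";'
```

For instance, `arguments_line("-f file1&file2")` produces the last JDL line on the slide.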

JDL: basic attributes
StdOutput, StdError, StdInput (optional): paths of the standard output, error and input files.
StdOutput and StdError must also be listed in the OutputSandbox. They may have the same value.

JDL: basic attributes
InputSandbox (optional): the input files to be copied from the UI to the WN before the job executes.
- Only files local to the UI can be listed (for LFNs, use the InputData attribute).
- Each file must not exceed 10 MB.
- All files must have different names, because they are copied into the same destination directory.
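A pre-submission check for these InputSandbox constraints might look like the following Python sketch. The function name is hypothetical. Two assumptions apply: LFNs are recognised by the `lfn:` prefix, and "10 MB" is interpreted as 10 × 1024 × 1024 bytes. The function returns a list of problems and is empty when the sandbox looks valid.

```python
import os
from collections import Counter
from typing import List, Sequence

# The 10 MB per-file limit quoted above, interpreted as MiB (assumption).
MAX_FILE_BYTES = 10 * 1024 * 1024


def check_input_sandbox(paths: Sequence[str], max_bytes: int = MAX_FILE_BYTES) -> List[str]:
    problems: List[str] = []
    for path in paths:
        if path.startswith("lfn:"):
            problems.append(f"{path!r}: LFNs belong in InputData, not InputSandbox")
        elif not os.path.isfile(path):
            problems.append(f"{path!r}: not a local file on the UI")
        elif os.path.getsize(path) > max_bytes:
            problems.append(f"{path!r}: larger than {max_bytes} bytes")
    # All files land in the same directory on the WN, so names must be unique.
    counts = Counter(os.path.basename(p) for p in paths)
    problems += [f"{n} input files are named {name!r}" for name, n in counts.items() if n > 1]
    return problems
```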

JDL: basic attributes
OutputSandbox: the output files to be transferred from the WN to the UI after the job has executed. All files must have different names, because they are copied into the same destination directory.
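The output-side rules on the StdOutput/StdError and OutputSandbox slides can also be checked before submission. The following Python sketch assumes the JDL has already been read into a dictionary of plain values. It verifies that StdOutput and StdError are listed in the OutputSandbox and that no two output files share a name. `check_output_sandbox` is a hypothetical helper.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Dict, List, Sequence


def check_output_sandbox(jdl: Dict[str, object]) -> List[str]:
    problems: List[str] = []
    sandbox: Sequence[str] = jdl.get("OutputSandbox", [])
    for attr in ("StdOutput", "StdError"):
        path = jdl.get(attr)
        # StdOutput and StdError may share a value; each just has to be listed.
        if path is not None and path not in sandbox:
            problems.append(f"{attr} = {path!r} is not listed in OutputSandbox")
    # Files are all copied into one destination directory, so names must differ.
    counts = Counter(PurePosixPath(p).name for p in sandbox)
    problems += [f"OutputSandbox contains {n} files named {name!r}"
                 for name, n in counts.items() if n > 1]
    return problems
```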

JDL: advanced attributes
JobType (optional):
- Normal: a simple job.
- Interactive: an interactive session with the user is established.
- MPICH: an MPI parallel job.
- Checkpointable: the job execution can be suspended and resumed later, starting from the point where it was stopped.
- Partitionable: the job is composed of a set of independent steps that can be executed in parallel.
- Parametric: the job contains parametric attributes whose values vary from one submission to another.

JDL: advanced attributes
Requirements (mandatory): the job's requirements on Grid resources (CE, SE, …).
- The evaluation is performed by the MatchMaker.
- Requirements are specified using attributes published by the Information Service.
- If the user does not specify Requirements, the following default value is used:
  Requirements = other.GlueCEStateStatus == "Production";
Examples:
Requirements = other.GlueCEUniqueID == "clrlcgce01.in2p3.fr:2119/jobmanager-lcgpbs-auvergrid";
Requirements = Member("AUVERGRID", other.GlueHostApplicationSoftwareRunTimeEnvironment);
Requirements = other.GlueCEInfoTotalCPUs > 2 && other.GlueCEPolicyMaxRunningJobs < 2;
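Conceptually, the MatchMaker evaluates the Requirements expression against each CE's published attributes, with `other` standing for the CE. The following Python sketch translates the default and two of the example expressions into predicates over a CE record. The dictionary layout and the sample CE values are hypothetical. `member` only approximates the ClassAd `Member()` function.

```python
from typing import Any, Callable, Dict, List

CERecord = Dict[str, Any]  # published GLUE attributes of one CE (hypothetical layout)
Requirement = Callable[[CERecord], bool]


def member(value: Any, values: List[Any]) -> bool:
    """Rough Python analogue of the ClassAd Member() function."""
    return value in values


def default_requirements(other: CERecord) -> bool:
    # Requirements = other.GlueCEStateStatus == "Production";
    return other["GlueCEStateStatus"] == "Production"


def needs_auvergrid(other: CERecord) -> bool:
    # Requirements = Member("AUVERGRID", other.GlueHostApplicationSoftwareRunTimeEnvironment);
    return member("AUVERGRID", other["GlueHostApplicationSoftwareRunTimeEnvironment"])


def cpus_and_running_jobs(other: CERecord) -> bool:
    # Requirements = other.GlueCEInfoTotalCPUs > 2 && other.GlueCEPolicyMaxRunningJobs < 2;
    return other["GlueCEInfoTotalCPUs"] > 2 and other["GlueCEPolicyMaxRunningJobs"] < 2


def matching_ces(ces: List[CERecord], requirements: Requirement) -> List[CERecord]:
    """Keep only the CEs for which the requirements evaluate to true."""
    return [ce for ce in ces if requirements(ce)]
```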

JDL: advanced attributes
Rank (mandatory): a floating-point expression used to rank the CEs that have already met the Requirements expression.
- It can contain attributes that describe the CE in the Information System (IS).
- It is evaluated by the Resource Broker (RB) during the match-making phase.
- A higher numeric value means a better rank.
- If the user does not specify Rank, the following default value is used:
  Rank = -other.GlueCEStateEstimatedResponseTime;
Example:
Rank = other.GlueCEStateFreeCPUs;
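Ranking can be pictured as "evaluate Rank on every CE that passed Requirements and take the highest value". The following Python sketch covers the default rank and the FreeCPUs example. The CE records are hypothetical dictionaries. In this sketch, ties go to the first candidate in the list, which is not necessarily how the RB breaks ties.

```python
from typing import Any, Callable, Dict, List, Optional

CERecord = Dict[str, Any]  # published GLUE attributes of one CE (hypothetical layout)


def default_rank(other: CERecord) -> float:
    # Rank = -other.GlueCEStateEstimatedResponseTime;  (shorter wait ranks higher)
    return -float(other["GlueCEStateEstimatedResponseTime"])


def free_cpus_rank(other: CERecord) -> float:
    # Rank = other.GlueCEStateFreeCPUs;
    return float(other["GlueCEStateFreeCPUs"])


def best_ce(candidates: List[CERecord],
            rank: Callable[[CERecord], float]) -> Optional[CERecord]:
    """Return the candidate with the highest rank (the first one wins on ties)."""
    return max(candidates, key=rank) if candidates else None
```

The minus sign in the default rank converts "smallest estimated response time" into "highest rank", so a single maximisation serves both examples.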

JDL: advanced attributes
Environment (optional): a list of environment variable strings, each in the format <name>=<value>. Example:
Environment = {"JOB_LOG_FILE=/tmp/job.log", "INP_DIR=/tmp/input_files"};
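Each Environment entry is a single `<name>=<value>` string, so turning the list into variables only requires splitting at the first `=`. The following Python sketch does this and rejects malformed entries. `parse_environment` is a hypothetical helper; a value that itself contains `=` is kept intact.

```python
from typing import Dict, Iterable


def parse_environment(entries: Iterable[str]) -> Dict[str, str]:
    """Turn JDL Environment strings ("<name>=<value>") into a dictionary."""
    env: Dict[str, str] = {}
    for entry in entries:
        name, sep, value = entry.partition("=")  # split at the first '=' only
        if not sep or not name:
            raise ValueError(f"expected <name>=<value>, got {entry!r}")
        env[name] = value
    return env
```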


Thank you