Job Description Language


Job Description Language
Gergely Sipos, Péter Kacsuk
MTA SZTAKI
EGEE is funded by the European Union under contract IST-2003-508833
Grid Computing School – 10-12 July 2006, Rio de Janeiro

Job Description Language
The supported attributes are grouped in two categories:
- Job attributes: define the job itself
- Resource expression attributes: taken into account by the RB when carrying out the matchmaking algorithm (to choose the "best" resource to submit the job to)
  - Computing resource: used by the user to build the expressions of the Requirements and/or Rank attributes; have to be prefixed with "other." (external) or "self." (internal)
  - Data and storage resources: input data to process, SE where to store output data, protocols spoken by the application when accessing SEs

JDL: some relevant attributes
- JobType: Normal (simple, sequential job), Interactive, MPICH, Checkpointable, or a combination of them
- Executable (mandatory): the command name
- Arguments (optional): job command-line arguments
- StdInput, StdOutput, StdError (optional): standard input/output/error of the job
- Environment (optional): list of environment settings
- InputSandbox (optional): list of files on the UI local disk needed by the job for running; the listed files are automatically staged to the remote resource
- OutputSandbox (optional): list of files generated by the job that have to be retrieved
- VirtualOrganisation (optional): a different way to specify the VO of the user

JDL: some relevant attributes II
- InputData: used by the broker, but causes no data movement
- DataAccessProtocol: file | gridftp | rfio (used together with InputData)
- OutputData: { OutputFile = [CE path] [StorageElement = SE] [LogicalFileName = lfn:fileName] } (real data movement in LCG, no data movement in gLite)
- OutputSE
- Rank
- Requirements
- MyProxyServer
- RetryCount
- NodeNumber
- JobSteps

Example of JDL file

    [
    JobType = "Normal";
    Executable = "/exe/sum.exe";
    InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
    OutputSandbox = {"sim.err", "test.out", "sim.log"};
    Requirements = (other.GlueHostOperatingSystemName == "linux") &&
                   (other.GlueCEPolicyMaxWallClockTime > 10000);
    Rank = other.GlueCEStateFreeCPUs;
    ]

A "real world" JDL file: job attributes part

    [
    JobType = "normal";
    Executable = "lexor_wrap.sh";
    StdOutput = "dc2.003020.digit.A8_QCD._01730.job.log.3";
    StdError = "dc2.003020.digit.A8_QCD._01730.job.log.3";
    OutputSandbox = {
        "metadata.xml", "lexor_wrap.log",
        "dq_337704_stagein.log", "dq_337704_stageout.log",
        "dc2.003020.digit.A8_QCD._01730.job.log.3"
    };
    RetryCount = 0;
    Arguments = "dc2.003020.simul.A8_QCD._01730.pool.root,\
    dc2.003020.digit.A8_QCD._01730.pool.root.3 100 0";
    Environment = {
        "LEXOR_WRAPPER_LOG=lexor_wrap.log",
        "LEXOR_STAGEOUT_MAXATTEMPT=5",
        "LEXOR_STAGEOUT_INTERVAL=60",
        "LEXOR_LCG_GFAL_INFOSYS=atlas-bdii.cern.ch:2170",
        "LEXOR_T_RELEASE=8.0.7",
        "LEXOR_T_PACKAGE=8.0.7.5/JobTransforms",
        "LEXOR_T_BASEDIR=JobTransforms-08-00-07-05",
        "LEXOR_TRANSFORMATION=share/dc2.g4digit.trf",
        "LEXOR_STAGEIN_LOG=dq_337704_stagein.log",
        "LEXOR_STAGEIN_SCRIPT=dq_337704_stagein.sh",
        "LEXOR_STAGEOUT_LOG=dq_337704_stageout.log",
        "LEXOR_STAGEOUT_SCRIPT=dq_337704_stageout.sh"
    };
    MyProxyServer = "lxb0727.cern.ch";
    VirtualOrganisation = "atlas";
    rank = -other.GlueCEStateEstimatedResponseTime;

A "real world" JDL file (cont.): resource attributes part

    requirements = (
        Member("VO-atlas-lcg-release-0.0.2", other.GlueHostApplicationSoftwareRunTimeEnvironment)
        && (other.GlueCEStateStatus == "Production")
        && !Member("VO-atlas-has-m1", other.GlueHostApplicationSoftwareRunTimeEnvironment)
        && (other.GlueCEInfoHostName != "lcgce02.gridpp.rl.ac.uk")
        && (other.GlueCEInfoHostName != "lcg-ce.lps.umontreal.ca")
        && (other.GlueCEInfoHostName != "lcgce02.triumf.ca")
        && (other.GlueCEInfoHostName != "ce-a.ccc.ucl.ac.uk")
        && Member("VO-atlas-release-8.0.7", other.GlueHostApplicationSoftwareRunTimeEnvironment)
        && (other.GlueCEPolicyMaxCPUTime >=
            (Member("LCG-2_1_0", other.GlueHostApplicationSoftwareRunTimeEnvironment)
                ? (36000000 / 60) : 36000000) / other.GlueHostBenchmarkSI00)
        && (other.GlueHostNetworkAdapterOutboundIP == true)
        && (other.GlueHostMainMemoryRAMSize >= 512)
    );
    ]
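The CPU-time clause in the expression above is easier to read as arithmetic: a fixed budget of 36000000 is divided by the CE's SpecInt2000 benchmark, and for CEs that publish the LCG-2_1_0 tag the budget is first divided by 60. A plausible reading is that those CEs still publish GlueCEPolicyMaxCPUTime in minutes rather than seconds, but that interpretation is an assumption. The function names below are hypothetical; this is a sketch of the arithmetic only, not WMS code.

```python
def required_cpu_time(si00: float, publishes_lcg_2_1_0: bool) -> float:
    """Smallest GlueCEPolicyMaxCPUTime that satisfies the clause.

    si00 stands for other.GlueHostBenchmarkSI00; publishes_lcg_2_1_0 stands
    for Member("LCG-2_1_0", ...RunTimeEnvironment) in the JDL expression.
    """
    budget = 36000000 / 60 if publishes_lcg_2_1_0 else 36000000
    return budget / si00


def cpu_time_clause(max_cpu_time: float, si00: float, publishes_lcg_2_1_0: bool) -> bool:
    """True when the CE's published CPU-time limit is large enough."""
    return max_cpu_time >= required_cpu_time(si00, publishes_lcg_2_1_0)


if __name__ == "__main__":
    # Hypothetical CE with a benchmark value of 1000.
    print(required_cpu_time(1000, False))
    print(required_cpu_time(1000, True))
    print(cpu_time_clause(48000, 1000, False))
```

A faster CE (larger SI00) therefore needs a smaller published limit to match, which is the point of normalising by the benchmark.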

Requirements
- Job requirements on the resources
- Specified using GLUE attributes of resources published in the Information Service
- Its value is a boolean expression
- Only one Requirements expression can be specified (a single C-like logical expression); if there is more than one, only the last one is taken into account
- If not specified, the default value defined in the UI configuration file is used
- Default: other.GlueCEStateStatus == "Production" (the resource has to be able to accept jobs and dispatch them to WNs)
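The rules on this slide (boolean expression, last definition wins, default when absent) can be sketched in a few lines of Python. This is an illustration only: CEs are modelled as plain dictionaries of GLUE attributes, requirements as Python predicates, and the names `effective_requirements` and `matching_ces` as well as the sample CEs are hypothetical.

```python
from typing import Callable, Dict, List

CE = Dict[str, object]
Predicate = Callable[[CE], bool]


def default_requirements(ce: CE) -> bool:
    """Default from the slide: other.GlueCEStateStatus == "Production"."""
    return ce.get("GlueCEStateStatus") == "Production"


def effective_requirements(declared: List[Predicate]) -> Predicate:
    """Only the last declared expression counts; none declared means default."""
    return declared[-1] if declared else default_requirements


def matching_ces(ces: List[CE], declared: List[Predicate]) -> List[CE]:
    """Keep the CEs for which the effective requirements evaluate to True."""
    requirements = effective_requirements(declared)
    return [ce for ce in ces if requirements(ce)]


# Hypothetical CEs, for illustration only.
ces = [
    {"id": "ce-a", "GlueCEStateStatus": "Production", "GlueCEPolicyMaxWallClockTime": 20000},
    {"id": "ce-b", "GlueCEStateStatus": "Draining", "GlueCEPolicyMaxWallClockTime": 90000},
]

if __name__ == "__main__":
    print([ce["id"] for ce in matching_ces(ces, [])])
```

Note that in this sketch a user-supplied expression replaces the default entirely, so a job that still wants only production queues has to say so in its own expression.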

Relevant Glue Attributes 1 (State)
State (objectclass GlueCEState)
- GlueCEStateRunningJobs: number of running jobs
- GlueCEStateWaitingJobs: number of jobs not running
- GlueCEStateTotalJobs: total number of jobs (running + waiting)
- GlueCEStateStatus: queue status
  - queueing: jobs are accepted but not run
  - production: jobs are accepted and run
  - closed: jobs are neither accepted nor run
  - draining: jobs are not accepted, but those in the queue are run
- GlueCEStateWorstResponseTime: worst possible time between the submission of a job and the start of its execution
- GlueCEStateEstimatedResponseTime: estimated time between the submission of a job and the start of its execution
- GlueCEStateFreeCPUs: number of CPUs available to the scheduler

Relevant Glue Attributes 2 (Hardware)
Architecture (objectclass GlueHostArchitecture)
- GlueHostArchitecturePlatformType: platform description
- GlueHostArchitectureSMPSize: number of CPUs
Processor (objectclass GlueHostProcessor)
- GlueHostProcessorVendor: name of the CPU vendor
- GlueHostProcessorModel: name of the CPU model
- GlueHostProcessorVersion: version of the CPU
- GlueHostProcessorOtherProcessorDescription: other description for the CPU
- […]

Relevant Glue Attributes 3 (HW & Software)
Application software (objectclass GlueHostApplicationSoftware)
- GlueHostApplicationSoftwareRunTimeEnvironment: list of software installed on this host
Main memory (objectclass GlueHostMainMemory)
- GlueHostMainMemoryRAMSize: physical RAM
- GlueHostMainMemoryVirtualSize: size of the configured virtual memory
Benchmark (objectclass GlueHostBenchmark)
- GlueHostBenchmarkSI00: SpecInt2000 benchmark
- GlueHostBenchmarkSF00: SpecFloat2000 benchmark
Network adapter (objectclass GlueHostNetworkAdapter)
- […]
- GlueHostNetworkAdapterOutboundIP: permission for outbound connectivity
- GlueHostNetworkAdapterInboundIP: permission for inbound connectivity

Relevant Glue Attributes 4: policy of LRMS
- GlueCEPolicyMaxWallClockTime: maximum wall clock time available to jobs submitted to the CE, in seconds (previously it was in minutes)
- GlueCEPolicyMaxCPUTime: maximum CPU time available to jobs submitted to the CE, in seconds (previously it was in minutes)
- GlueCEPolicyMaxTotalJobs: maximum allowed total number of jobs in the queue
- GlueCEPolicyMaxRunningJobs: maximum allowed number of running jobs in the queue

Exercise: JDL Requirements
- other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 1
  The resource has to use PBS as its LRMS and has to have at least two CPUs.
- Member("CMSIM-133", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  A particular piece of experiment software has to be installed on the resource, and this information has to be published in the resource's run-time environment. The Member operator tests whether its first argument is a member of its second argument; it is used with multi-valued attributes.
- RegExp("cern.ch", other.GlueCEUniqueId)
  The job has to run on a CE in the domain cern.ch. RegExp tests whether the attribute matches the regular expression.
- (other.GlueHostNetworkAdapterOutboundIP == true) && Member("VO-alice-Alien", other.GlueHostApplicationSoftwareRunTimeEnvironment) && Member("VO-alice-Alien-v4-01-Rev-01", other.GlueHostApplicationSoftwareRunTimeEnvironment) && (other.GlueCEPolicyMaxWallClockTime > 86000)
  The resource must allow outbound connectivity, must have the packages VO-alice-Alien and VO-alice-Alien-v4-01-Rev-01 installed, and must let the job run for more than 86000 wall-clock time units.
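To make the semantics of Member and RegExp concrete, here is a minimal Python analogue. It is a simplification and not the ClassAd implementation: `member` is plain list membership, `reg_exp` is an unanchored `re.search`, and the CE identifiers and software tags in the example are hypothetical.

```python
import re
from typing import Iterable


def member(value: str, values: Iterable[str]) -> bool:
    """Analogue of Member(value, list): is value one of the list's entries?"""
    return value in list(values)


def reg_exp(pattern: str, text: str) -> bool:
    """Analogue of RegExp(pattern, text): does the pattern match anywhere?"""
    return re.search(pattern, text) is not None


# Hypothetical published values, for illustration only.
runtime_env = ["VO-alice-Alien", "VO-alice-Alien-v4-01-Rev-01", "LCG-2_1_0"]
ce_unique_id = "ce01.cern.ch:2119/jobmanager-lsf-grid01"

if __name__ == "__main__":
    print(member("VO-alice-Alien", runtime_env))
    print(member("CMSIM-133", runtime_env))
    print(reg_exp("cern.ch", ce_unique_id))
```

Because the pattern is a regular expression, the dot in "cern.ch" matches any character; writing `cern\.ch$` would be the stricter way to pin the match to the domain suffix of a bare host name.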

Rank
- Expresses preference (how to rank the resources that have already met the Requirements expression)
- Its value is a floating-point number
- The CE with the highest rank is the one selected (see Matchmaking later on)
- If not specified, the default value defined in the UI configuration file is used
- Example: -other.GlueCEStateEstimatedResponseTime (prefers the CE with the lowest estimated traversal time); this is usually the default
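The sign in the example rank matters: because the highest rank wins, negating the estimated response time makes the CE with the shortest wait the winner. A small sketch, with hypothetical CEs modelled as dictionaries and a hypothetical helper `best_ce`:

```python
from typing import Callable, Dict, List

CE = Dict[str, float]


def default_rank(ce: CE) -> float:
    """Rank = -other.GlueCEStateEstimatedResponseTime."""
    return -ce["GlueCEStateEstimatedResponseTime"]


def best_ce(ces: List[CE], rank: Callable[[CE], float] = default_rank) -> CE:
    """Return the CE with the highest rank value."""
    return max(ces, key=rank)


# Hypothetical CEs that have already passed the Requirements check.
candidates = [
    {"id": 1, "GlueCEStateEstimatedResponseTime": 1200.0, "GlueCEStateFreeCPUs": 4},
    {"id": 2, "GlueCEStateEstimatedResponseTime": 30.0, "GlueCEStateFreeCPUs": 1},
    {"id": 3, "GlueCEStateEstimatedResponseTime": 600.0, "GlueCEStateFreeCPUs": 16},
]

if __name__ == "__main__":
    print(best_ce(candidates)["id"])
    print(best_ce(candidates, rank=lambda ce: ce["GlueCEStateFreeCPUs"])["id"])
```

Swapping in a different rank function, such as the number of free CPUs, changes the winner without touching the set of candidates.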

WMS Matchmaking

The Matchmaking algorithm
- The matchmaker's goal is to find the most suitable CE on which to execute the job
- To accomplish this task, the WMS interacts with other EGEE components (File Catalogue and Information Service)
- There are three different scenarios to deal with:
  - Direct job submission
  - Job submission without data-access requirements
  - Job submission with data-access requirements

The Matchmaking algorithm: direct job submission
- CE defined in the JDL
  - The WMS does not perform any matchmaking at all
  - The job is simply submitted to the specified CE
- CE defined on the edg-job-submit (glite-job-submit) command line: if the CEId is specified, the WMS
  - does NOT check whether the user is authorised to access the CE
  - does NOT interact with the File Catalog to resolve file requirements
  - only checks the JDL syntax while converting the JDL into a ClassAd
- Syntax: edg-job-submit --resource <ce_id> <job.jdl>
  where ce_id = hostname:port/jobmanager-lsf-grid01
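Since direct submission skips matchmaking, a malformed CE identifier is only caught late, so it can be worth checking its shape before submitting. The sketch below splits an identifier of the form shown on the slide, hostname:port/jobmanager-lsf-grid01. Reading the last part as jobmanager-<lrms>-<queue> is an assumption, and `parse_ce_id`, `CEId` and the example host are hypothetical names, not part of any grid client library.

```python
from typing import NamedTuple


class CEId(NamedTuple):
    host: str
    port: int
    lrms: str   # assumed meaning of the middle part, e.g. "lsf"
    queue: str  # assumed meaning of the last part, e.g. "grid01"


def parse_ce_id(ce_id: str) -> CEId:
    """Split 'host:port/jobmanager-<lrms>-<queue>' into its components."""
    endpoint, _, manager = ce_id.partition("/")
    host, _, port = endpoint.partition(":")
    parts = manager.split("-", 2)
    if not host or not port.isdigit() or len(parts) != 3 or parts[0] != "jobmanager":
        raise ValueError(f"unexpected CE identifier: {ce_id!r}")
    return CEId(host, int(port), parts[1], parts[2])


if __name__ == "__main__":
    # Hypothetical host name and port, for illustration only.
    print(parse_ce_id("ce.example.org:2119/jobmanager-lsf-grid01"))
```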

The Matchmaking algorithm: job submission without data access requirements (I)
- The user JDL contains some requirements
- Once the JDL has been received by the WMS and converted into a ClassAd, the WMS invokes the matchmaker
- The matchmaker has to find out whether the characteristics and status of the Grid resources match the job requirements

The Matchmaking algorithm: job submission without data access requirements (II)
There are two phases of evaluation:
- Requirements check
  - The Matchmaker contacts the BDII in order to create the set of suitable CEs, that is, CEs that comply with the user requirements and where the user is authorized to submit jobs
- Ranking phase
  - The Matchmaker contacts the BDII again to obtain the values of the attributes that appear in the rank expression
  - The CE with the maximum rank value is selected
  - If two or more CEs have the same rank, the Matchmaker selects one of them at random
  - The Matchmaker can instead adopt a stochastic selection (enabling fuzziness)
    - The user has to set the JDL FuzzyRank attribute to true
    - The rank value then acts as the probability of selecting the CE: the higher the rank value, the higher the probability
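The ranking phase described above can be sketched as follows. With fuzziness off, the highest rank wins and ties are broken at random; with fuzziness on, CEs are drawn with probability proportional to their rank. The slides do not say how the WMS turns rank values into probabilities, so the proportional weighting here is an assumption, and it only makes sense for non-negative ranks with a positive sum (for example a rank based on free CPUs). The function name `select_ce` is hypothetical.

```python
import random
from typing import Dict, Optional


def select_ce(ranks: Dict[str, float], fuzzy: bool = False,
              rng: Optional[random.Random] = None) -> str:
    """Pick a CE id from {ce_id: rank} for CEs that passed the requirements check."""
    if rng is None:
        rng = random.Random()
    ids = list(ranks)
    if fuzzy:
        # Assumed mapping: selection probability proportional to rank.
        weights = [ranks[ce_id] for ce_id in ids]
        return rng.choices(ids, weights=weights, k=1)[0]
    top = max(ranks.values())
    tied = [ce_id for ce_id in ids if ranks[ce_id] == top]
    return rng.choice(tied)  # random tie-break among equally ranked CEs


if __name__ == "__main__":
    free_cpus = {"ce-a": 2.0, "ce-b": 6.0, "ce-c": 6.0}  # hypothetical ranks
    print(select_ce(free_cpus))              # "ce-b" or "ce-c"
    print(select_ce(free_cpus, fuzzy=True))  # any CE, the larger ones more often
```

The stochastic mode trades a slightly worse expected rank for less herding: many users submitting at the same moment do not all land on the single top-ranked CE.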

The Matchmaking algorithm: job submission with data access requirements (I)
The user can specify the following attributes in the JDL:
- InputData represents the input files (lfn = logical file name, see Data Management)

      InputData = {"lfn:my-file-001"};

- OutputSE represents the SE where the output file should be staged

      OutputSE = "gilda-se-01.pd.infn.it";

- OutputData represents the output files

      OutputFile = "dummy.dat";
      StorageElement = "gilda-se-01.pd.infn.it";
      LogicalFileName = "lfn:iome_outputData";

- DataAccessProtocol represents the protocol spoken by the application to access the file

      DataAccessProtocol = "gsiftp";

[Figure: the Matchmaker/Broker interacting with the FC and the IS]

The Matchmaking algorithm: job submission with data access requirements (II)
- The Matchmaker finds the most suitable CEs taking into account
  - the SEs where the input data are physically stored
  - the SE where the output data should be staged
- Before the requirements and ranking checks, the broker
  - performs a pre-match processing step in which it interacts with the File Catalog
  - filters the CEs that satisfy both the data access and the user authorization requirements

The Matchmaking algorithm: job submission with data access requirements (III)
Summary
- The Matchmaker interacts with a File Catalogue and the Information Service
  - The FC is used to resolve the location of data (see the Data Management talk for more details)
- The Matchmaker finds the most suitable CEs considering
  - the SEs where the input data are physically stored
  - the SEs where the output data should be staged
- Before the requirements and ranking checks, the broker
  - performs a pre-match processing step (it accesses the FC)
  - filters the CEs that satisfy both the data access and the user authorization requirements
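The pre-match step summarised above can be pictured as a filter that runs before requirements and ranking. The sketch below is one plausible reading, not the WMS implementation: the File Catalogue is a dictionary from logical file name to the SEs holding a replica, each CE lists the SEs it can reach ("close_ses") and the VOs it accepts, and a CE survives if the user's VO is accepted, every input file has a replica on a reachable SE, and the requested output SE (if any) is reachable. All names and hosts are hypothetical.

```python
from typing import Dict, List, Optional, Set


def prematch(ces: List[dict], input_lfns: List[str],
             file_catalog: Dict[str, Set[str]], user_vo: str,
             output_se: Optional[str] = None) -> List[str]:
    """Return the ids of CEs that pass the data-access and authorization filter."""
    selected = []
    for ce in ces:
        if user_vo not in ce["authorized_vos"]:
            continue  # user is not allowed to submit to this CE
        close = ce["close_ses"]
        # Every input file needs at least one replica on an SE the CE can reach.
        data_ok = all(file_catalog.get(lfn, set()) & close for lfn in input_lfns)
        output_ok = output_se is None or output_se in close
        if data_ok and output_ok:
            selected.append(ce["id"])
    return selected


# Hypothetical catalogue and CEs, for illustration only.
catalog = {"lfn:my-file-001": {"se-1.example.org"}}
grid = [
    {"id": "ce-a", "close_ses": {"se-1.example.org"}, "authorized_vos": {"atlas"}},
    {"id": "ce-b", "close_ses": {"se-2.example.org"}, "authorized_vos": {"atlas"}},
    {"id": "ce-c", "close_ses": {"se-1.example.org"}, "authorized_vos": {"cms"}},
]

if __name__ == "__main__":
    print(prematch(grid, ["lfn:my-file-001"], catalog, "atlas"))
```

The CEs that survive this filter would then go through the usual requirements check and ranking.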