EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org www.glite.org Architecture of the WMS Yaodong Cheng CC-IHEP, Chinese Academy of Sciences.


EGEE-II INFSO-RI, Enabling Grids for E-sciencE
Architecture of the WMS
Yaodong Cheng, CC-IHEP, Chinese Academy of Sciences
The 6th Joint Training of OMII-Europe & CNGrid, Hong Kong, January 2008

Outline
- gLite WMS overview
- Workload architecture and components
  - Workload components
    - CE
  - Job states
  - WMProxy
  - Job Description Language
- References

gLite components
[Figure: the User Interface, Workload Management System, Information Service, Replica Catalogue, Logging & Bookkeeping, Storage Element and Computing Element. Labeled flows: job submit event, job query, job status, input "sandbox" + broker info, output "sandbox", dataset info, authorization and authentication, publication of SE and CE info.]

WMS functionality
The Workload Management System (WMS) is the gLite component responsible for managing users' jobs:
- submission
- scheduling
- execution
- status monitoring
- output retrieval
Its core component is the Workload Manager (WM).
The WM handles the job management requests coming from the WMS clients:
- A submission request hands responsibility for the job over to the WM.
- The WM dispatches the job to an appropriate Computing Element for execution, taking into account the requirements and preferences expressed in the job description (JDL file).
The choice of the best matching resource is the outcome of the so-called match-making process.

Current WMS architecture
gLite provides modules for the following workload-related components:
- UI (all major gLite client components of the system)
- WMS node (scheduling on the grid, match-making, complete job management)
  - gLite WMS
  - LCG RB
- LB server (logging and bookkeeping)
- Computing Element (access point to a pool of resources)
  - gLite CE
  - LCG CE
- Worker Node (the actual execution host in a given cluster)
- Torque/Maui LRMS (local scheduler and job management for the CE and the WNs; LSF is also interfaced)
- The StorageIndex interface (to data catalogs)
- MyProxy server (user proxy renewal)
- VOMS (security: authentication / authorization)
- BD-II (LCG)

gLite WMS
[Figure: deployment diagram of the gLite WMS, based on Condor-C, spanning the UI, WMS, CE, LB and worker nodes WN 1 to WN 4. Labels in the figure: WMS client, R-GMA and BDII clients, I/O clients, LFC clients, Globus clients; Network Server, WMProxy, Workload Manager (WM), Local Logger, Job Controller, Condor-C (master, schedd, collector, negotiator, launcher, advertiser), Proxy renewd, Log Monitor; Globus gatekeeper, Condor-C (master, schedd), BLAHPD, LRMS (PBS server and scheduler, LSF server), GridFTP, CEmon; PBS mom on each WN; logd, interlogger, Bookkeeping server; SEIndex, BD-II, VOMS, LFC File Catalog.]

LCG RB
[Figure: deployment diagram of the LCG RB, based on Condor-G and Globus GRAM, spanning the UI, WMS, CE, LB and worker nodes WN 1 to WN 4. Labels in the figure: WMS client, R-GMA and BDII clients, LFC clients, Globus clients; Network Server, Resource Broker, WM, Local Logger, Job Controller, Condor-G, Globus GRAM, Proxy renewd, Log Monitor; Globus gatekeeper, Globus GRAM, Globus JobManager (fork, pbs, lsf), LRMS (PBS server and scheduler, LSF server), GridFTP; PBS mom on each WN; logd, interlogger, Bookkeeping server; File Catalogs, BD-II, VOMS, LFC File Catalog.]

The architecture of the WMS

Job management requests (submission, cancellation) are expressed via a Job Description Language (JDL).

WMS architecture: Task Queue
- Keeps submission requests.
- If no matching resource is available, requests are kept for a while, waiting to be dispatched.

WMS architecture: Information Supermarket (ISM)
- Repository of resource information.
- Updated via notifications and/or active polling of the sources.
- Provides the matchmaker with the information it needs to decide on the best resources for a request.

WMS architecture: Matchmaker
- Finds an appropriate CE or resource for a job request according to the information from the ISM.
- Takes into account job preferences, resource status, and the policies on resources.
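
As a rough illustration of the match-making step, the following Python sketch filters candidate CEs by a requirements predicate and then picks the one with the highest rank. The resource fields (`status`, `free_cpus`), the CE host names, and the `match` function are hypothetical; the real WMS evaluates ClassAd expressions against the resource information it holds, which this sketch does not attempt to reproduce.

```python
from typing import Callable, Optional

Resource = dict  # one entry of a hypothetical resource repository


def match(resources: list[Resource],
          requirements: Callable[[Resource], bool],
          rank: Callable[[Resource], float]) -> Optional[Resource]:
    """Return the best-ranked resource that satisfies the requirements, or None."""
    candidates = [r for r in resources if requirements(r)]
    if not candidates:
        return None  # nothing matches: the request would have to wait
    return max(candidates, key=rank)


# Hypothetical snapshot of resource information (names and fields are made up).
snapshot = [
    {"ce": "ce-a.example.org", "status": "Production", "free_cpus": 4},
    {"ce": "ce-b.example.org", "status": "Production", "free_cpus": 12},
    {"ce": "ce-c.example.org", "status": "Draining", "free_cpus": 40},
]

best = match(
    snapshot,
    requirements=lambda r: r["status"] == "Production",
    rank=lambda r: r["free_cpus"],
)
print(best["ce"])
```

Requirements act as a hard filter and the rank only orders the survivors, so `ce-c` is never chosen here despite having the most free CPUs.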

WMS architecture: job submission and monitoring
- Performs the actual job submission and monitoring.
- Normally this is Condor.

WMS architecture: Computing Element
- The Computing Element is the place where your jobs run.

Access points to the workload: NS and WMProxy
The Network Server (NS) is a generic network daemon providing support for the job control functionality. It is responsible for accepting incoming requests from the WMS-UI (e.g. job submission, job removal), which, if valid, are then passed to the Workload Manager.
The Workload Manager Proxy (WMProxy) is a service providing access to WMS functionality through a Web Services based interface. Besides being the natural replacement of the NS in the move to the SOA approach for the WMS architecture, it provides additional features such as bulk submission and support for shared and compressed sandboxes for compound jobs.

WMS components
- Network Server, NS (old service)
  - A generic daemon that accepts requests from the User Interface and verifies the user's credentials.
- Workload Manager Proxy, WMProxy (new service)
  - Provides access to WMS functionality through a Web Services based interface.
  - Each job submitted to a WMProxy service is given the delegated credentials of the user who submitted it.
  - These credentials can then be used to perform operations that require interaction with other services.
  - WMProxy advantages:
    - web service, SOAP
    - job collections, DAG jobs, shared and compressed sandboxes
  - WMProxy caveats:
    - needs delegated credentials
    - delegate once, submit many

WMS components (cont.)
- Workload Manager (WM) is responsible for:
  - calling the Matchmaker to find the resource that best matches the job requirements
  - interacting with the Information System and the file catalog
  - calculating the ranking of all the matched resources
- Information Supermarket (ISM)
  - Basically a repository of resource information that is available in read-only mode to the matchmaking engine.
- Job Adapter is responsible for:
  - making the final touches to the JDL expression for a job, before it is passed to CondorC for the actual submission
  - creating the job wrapper script, which sets up the appropriate execution environment on the CE worker node, including the transfer of the input and output sandboxes

WMS components (cont.)
- Job Controller (JC) is responsible for:
  - converting the Condor submit file into a ClassAd
  - handing the job over to CondorC
- Condor is responsible for:
  - performing the actual job management operations: job submission, removal
- DAG Manager (DAGMan)
  - A meta-scheduler whose purpose is to navigate the graph (DAG), determine dependencies, and follow the execution of the corresponding jobs.
- Log Monitor is responsible for:
  - watching the Condor log file
  - intercepting interesting events concerning active jobs, i.e. events affecting the job state machine
  - triggering appropriate actions
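
To make the idea of navigating a DAG of dependent jobs more concrete, here is a minimal Python sketch that groups jobs into "waves" in which every job has all of its parents finished. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) and is not DAGMan itself; the job names and the `execution_waves` helper are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each key maps to the set of jobs it depends on.
dependencies = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "merge": {"simulate_a", "simulate_b"},
}


def execution_waves(deps):
    """Group jobs into waves; a job appears only after all its parents are done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are finished
        waves.append(ready)
        sorter.done(*ready)  # pretend every job in the wave ran successfully
    return waves


waves = execution_waves(dependencies)
print(waves)
```

Jobs within one wave are independent of each other, which is what allows a meta-scheduler to run them in parallel.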

Task Queue and scheduling policies
- Task Queue
  - Makes it possible to keep track of requests when no resources are immediately available.
  - Non-matching requests are retried periodically (eager scheduling),
  - or wait for notification of available resources (lazy scheduling).
- Eager scheduling ("push" model)
  - A job is bound to a resource as soon as possible. Once the decision has been taken, the job is handed over to the selected resource for execution.
- Lazy scheduling ("pull" model)
  - The job is held by the WM until a resource becomes available. When this happens, the resource is matched against the submitted job.
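
The two policies can be contrasted with a small Python sketch. It is a toy model under stated assumptions: jobs and resources are plain dictionaries, matching is reduced to a single CPU-count comparison, and the class and method names (`TaskQueue`, `retry_all`, `on_resource_available`) are invented here and do not come from the WMS code.

```python
from collections import deque


class TaskQueue:
    """Holds requests that could not be matched yet (illustrative only)."""

    def __init__(self):
        self.pending = deque()

    def add(self, job):
        self.pending.append(job)

    def retry_all(self, resources):
        """Eager style: retry every pending job against the known resources."""
        dispatched = []
        for _ in range(len(self.pending)):
            job = self.pending.popleft()
            target = next(
                (r for r in resources if job["needs"] <= r["free_cpus"]), None
            )
            if target is None:
                self.pending.append(job)  # still no match, keep it queued
            else:
                dispatched.append((job["id"], target["ce"]))
        return dispatched

    def on_resource_available(self, resource):
        """Lazy style: a resource announces itself and is matched against the queue."""
        for job in list(self.pending):
            if job["needs"] <= resource["free_cpus"]:
                self.pending.remove(job)
                return (job["id"], resource["ce"])
        return None


queue = TaskQueue()
queue.add({"id": "job-1", "needs": 8})
queue.add({"id": "job-2", "needs": 2})

eager = queue.retry_all([{"ce": "ce-a.example.org", "free_cpus": 4}])
lazy = queue.on_resource_available({"ce": "ce-b.example.org", "free_cpus": 16})
print(eager, lazy)
```

For brevity the sketch does not decrement `free_cpus` after a dispatch; a real scheduler would have to account for the capacity it has just consumed.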

gLite WMS server architecture
- NS
  - accepts/instantiates connections to/from the UI
  - checks user authorization
  - forwards requests to the Workload Manager
- WM
  - accepts requests from the NS
  - performs match-making
  - submits ClassAd-based job requests
  - handles submission and job management via the Job Controller
- JC
  - starts all job-related Condor daemons
  - hands the job over to Condor
- Condor-C: contacts the CE
- ismdump.fl file: available resources
- WMS configuration file: $GLITE_LOCATION/etc/glite_wms.conf
[Figure labels also show a CEmon contact and a BD-II contact.]

WMS -> CE
- A Computing Element is built on a homogeneous farm of computing nodes (called Worker Nodes).
- A CE also contains several other components, such as the gatekeeper and the globus-jobmanager.

Gatekeeper
- Grants access to the CE and maps the grid user to a local user ID.

Batch system
- A cluster of compute nodes controlled by a head node.
- Handles the job execution.
- Examples: Torque (derived from OpenPBS), PBS.

A typical gLite-enabled grid
- There are many CEs in a gLite-enabled grid.
- A few WMS instances coordinate the CEs and broker jobs to suitable CEs.

Computing Element components
- Gatekeeper
  - Grants access to the CE: authenticates users and maps them to local accounts.
  - Forks the globus-jobmanager.
- globus-jobmanager
  - Forks Condor-C (on the CE) to help submit jobs to the batch system.
- BLAHPD (Batch Local ASCII Helper Protocol Daemon)
  - Offers a single interface through which Condor-C (on the CE) can submit jobs to different batch systems.
  - BLAHPD commands are used by Condor-C (on the CE) to submit jobs to the batch system.
- Batch system
  - Handles job execution on the available local worker nodes.
  - Consists of:
    - Torque (derived from OpenPBS) resource manager
    - Maui job scheduler
  - A cluster MUST be homogeneous.
- Worker nodes
  - The hosts that execute the jobs.
  - Also responsible for downloading and uploading job data from or to the WMS or an SE.

The current gLite CE
[Figure: the CE with its Gatekeeper, LCAS, LCMAPS, WSS, CEMon, Condor-C and Blahpd in front of the local batch system (LSF, PBS/Torque, Condor). Labeled flows: notifications, launch Condor-C, submit job.]

WMS - user interaction: JDL file
Write your job description file (JDL file, based on ClassAds):
    Type = "Job";
    JobType = "Normal";
    Executable = "/bin/bash";
    Arguments = "mySimulationShellScritp.sh";
    StdInput = "stdin";
    StdOutput = "stdout";
    StdError = "stderr";
    InputSandbox = {"mySimulationShellScritp.sh", "stdin", "data-card-1.file", "data-card-2.file"};
    OutputSandbox = {"stderr", "stdout", "outputfile1.data", "histos.zebra"};
    Environment = {"JOB_LOG_FILE=/tmp/myJob.log"};
    Requirements = Member("EGEE-preprod ", other.GlueHostApplicationSoftwareRunTimeEnvironment);

How to deal with your job
- Create a proxy:
    voms-proxy-init --voms egtest
- Check which available CEs match the job:
    edg-job-list-match myVeryFirstJob.jdl
    glite-job-list-match myVeryFirstJob.jdl
- Submit your JDL to the WMS / Network Server:
    edg-job-submit myVeryFirstJob.jdl
    glite-job-submit myVeryFirstJob.jdl
- Query the job status:
    edg-job-status
    glite-job-status
- Get the job's output:
    edg-job-get-output
    glite-job-output

What happens when we submit a job to the gLite WMS server
- A JDL file is defined on the UI.
- The user creates a proxy file from their certificate using VOMS (voms-proxy-init).
- The JDL is submitted to the WMS (NS):
  - A job wrapper is created on the UI and transferred to the WMS node, together with the user's proxy.
  - The Network Server checks authorization and then forwards the job to the Workload Manager on the same machine (WMS).

A user submits a job...
- The WM performs the match-making, matching the available resources stored in the ISM against the ClassAds describing the requirements of the job.
- Hand-off to Condor-C: all major required Condor processes are started by the Job Controller.
  - The corresponding user's Condor schedd on the target CE has to be started.
  - This is done using the Globus gatekeeper and the fork jobmanager running on the matching CE.
- The ball is now in Condor's court: Condor-to-Condor job management (Condor-C).

What happens when we submit a job to the LCG RB
- A JDL file is defined on the UI.
- The user creates a proxy file from their certificate using VOMS (voms-proxy-init).
- The JDL is submitted to the WMS (NS):
  - A job wrapper is created on the UI and transferred to the WMS node, together with the user's proxy.
  - The Network Server checks authorization and then forwards the job to the Resource Broker.

A user submits a job...
- The RB performs the match-making, matching the available resources stored in the Information System (BDII) against the ClassAds describing the requirements of the job.
  - There is no local cache of the IS on the LCG RB.
- Hand-off to Condor-G:
  - Globus GRAM is used to handle the job and the proxy.
  - Through the Globus gatekeeper, the Globus jobmanager (usually PBS or LSF) is accessed on the destination CE.

Job State Machine
- Submitted: the job has been entered by the user on the User Interface but not yet transferred to the Network Server for processing.
- Waiting: the job has been accepted by the NS and is waiting for Workload Manager processing, or is being processed by WMHelper modules.
- Ready: the job has been processed by the WM but not yet transferred to the CE (local batch system queue).
- Scheduled: the job is waiting in the queue on the CE.
- Running: the job is running.
- Done: the job has exited or is considered to be in a terminal state by CondorC (e.g., submission to the CE has failed in an unrecoverable way).
- Aborted: job processing was aborted by the WMS (waiting in the WM queue or on the CE for too long, expiration of user credentials).
- Cancelled: the job has been successfully cancelled on user request.
- Cleared: the output sandbox was transferred to the user or removed due to the timeout.

Possible job states

Flag        Meaning
SUBMITTED   submission logged in the LB
WAIT        job match-making for resources
READY       job being sent to the executing CE
SCHEDULED   job scheduled in the CE queue manager
RUNNING     job executing on a WN of the selected CE queue
DONE        job terminated without grid errors
CLEARED     job output retrieved
ABORT       job aborted by middleware, check reason
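
The job states lend themselves to a small state-machine sketch in Python. The transition table below is an assumption inferred from the state descriptions (a linear happy path, with abort or cancel possible before completion); it is a simplification and not the Logging and Bookkeeping specification, which also allows paths such as resubmission that are not modeled here.

```python
# Assumed transition table, inferred from the state descriptions (not the LB spec).
TRANSITIONS = {
    "SUBMITTED": {"WAITING", "ABORTED", "CANCELLED"},
    "WAITING": {"READY", "ABORTED", "CANCELLED"},
    "READY": {"SCHEDULED", "ABORTED", "CANCELLED"},
    "SCHEDULED": {"RUNNING", "ABORTED", "CANCELLED"},
    "RUNNING": {"DONE", "ABORTED", "CANCELLED"},
    "DONE": {"CLEARED"},
    "CLEARED": set(),    # terminal
    "ABORTED": set(),    # terminal
    "CANCELLED": set(),  # terminal
}


def advance(state, new_state):
    """Return new_state if the transition is allowed, otherwise raise ValueError."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state


def replay(events, start="SUBMITTED"):
    """Apply a sequence of state changes and return the final state."""
    state = start
    for event in events:
        state = advance(state, event)
    return state


final = replay(["WAITING", "READY", "SCHEDULED", "RUNNING", "DONE", "CLEARED"])
print(final)
```

Encoding the transitions as data keeps the checking logic to a single lookup and makes it easy to adjust the table if the real service permits more paths.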

WMS client tools
The most relevant commands to interact with the WMS (NS):
- edg-job-submit
- edg-job-list-match
- edg-job-status
- edg-job-get-output
- edg-job-cancel
In gLite 3.0:
- glite-job-submit
- glite-job-list-match
- glite-job-status
- glite-job-output
- glite-job-cancel

gLite WMProxy
WMProxy (Workload Manager Proxy):
- is a new service providing access to the gLite Workload Management System (WMS) functionality through a simple Web Services based interface
- has been designed to handle a large number of job submission requests
  - gLite 1.5: ~180 secs for 500 jobs
  - the short-term goal is ~60 secs for 1000 jobs
- provides additional features such as bulk submission and support for shared and compressed sandboxes for compound jobs
- is the natural replacement of the NS in the move to the SOA approach

WMProxy: new request types
- Support for the new types relies strongly on newly developed JDL converters and on the DAG submission support:
  - all JDL conversions are performed on the server
  - a single submission covers several jobs
- All new request types can be monitored and controlled through a single handle (the request ID):
  - each sub-job can nevertheless be followed up and controlled independently through its own ID
- "Smarter" WMS client commands/API:
  - allow submission of DAGs, collections and parametric jobs, exploiting the concept of a "shared sandbox"
  - allow automatic generation and submission of collections and DAGs from sets of JDL files located in user-specified directories on the UI

WMProxy CLI commands
The commands to interact with the WMProxy service are:
- glite-wms-job-submit
- glite-wms-job-list-match
- glite-wms-job-cancel
- glite-wms-job-output

WMProxy: submitting a collection of jobs
Place all JDLs to be submitted in a directory (for example ./Collect), then run:
    voms-proxy-init --voms gilda
    glite-wms-job-delegate-proxy -d DelegString
    glite-wms-job-submit -d DelegString -o myJIDs --collection ./Collect
    glite-wms-job-status -i myJIDs
    glite-wms-job-output -i myJIDs
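
For scripting the collection workflow, a Python sketch like the following can assemble the command lines without executing them. Only commands and options that appear on the slide are used; the submit step is written as `glite-wms-job-submit` on the assumption that the WMProxy client is intended, and the `collection_commands` helper and its parameter names are hypothetical.

```python
import shlex


def collection_commands(vo, delegation_id, jdl_dir, jobid_file):
    """Return the command lines of the collection workflow, without running them."""
    return [
        ["voms-proxy-init", "--voms", vo],
        ["glite-wms-job-delegate-proxy", "-d", delegation_id],
        ["glite-wms-job-submit", "-d", delegation_id,
         "-o", jobid_file, "--collection", jdl_dir],
        ["glite-wms-job-status", "-i", jobid_file],
        ["glite-wms-job-output", "-i", jobid_file],
    ]


commands = collection_commands("gilda", "DelegString", "./Collect", "myJIDs")
for argv in commands:
    print(shlex.join(argv))  # shell-quoted form, for review before running
```

Keeping each command as an argument list means it could later be handed to `subprocess.run` on a configured UI host without any shell quoting concerns.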

Job Description Language

Job Description Language
The JDL is used in gLite to specify the job's characteristics and constraints, which are used during the match-making process to select the best resources that satisfy the job's requirements.

Job Description Language (cont.)
- The JDL syntax consists of statements like:
    Attribute = value;
- Comments must be preceded by a sharp character (#) or follow the C++ syntax.
- WARNING: the JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.

Job Description Language (cont.)
In a JDL, some attributes are mandatory while others are optional. An "essential" JDL is the following:
    Executable = "test.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh"};
    OutputSandbox = {"std.out", "std.err"};
If needed, arguments can be passed to the executable:
    Arguments = "Hello World!";
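
When JDL files are generated by a script, a small renderer helps keep the `Attribute = value;` form consistent and avoids trailing blanks after the semicolon. The Python sketch below is an illustration under assumptions: it handles only strings and lists of strings, and the `jdl_value` and `render_jdl` helpers are invented here rather than taken from any gLite API.

```python
def jdl_value(value):
    """Render a Python value as a JDL literal (strings and lists of strings only)."""
    if isinstance(value, str):
        if '"' in value:
            raise ValueError("embedded double quotes must be escaped first")
        return f'"{value}"'
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    raise TypeError(f"unsupported JDL value: {value!r}")


def render_jdl(attributes):
    """Emit one 'Attribute = value;' statement per line, with no trailing blanks."""
    return "\n".join(
        f"{name} = {jdl_value(val)};" for name, val in attributes.items()
    )


job = {
    "Executable": "test.sh",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["test.sh"],
    "OutputSandbox": ["std.out", "std.err"],
}

jdl_text = render_jdl(job)
print(jdl_text)
```

The rendered text reproduces the "essential" JDL, one statement per line.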

Job Description Language (cont.)
- If the argument contains quoted strings, the quotes must be escaped with a backslash, e.g.
    Arguments = "\"Hello World!\" 10";
- Special characters such as &, |, >, < are only allowed if specified inside a quoted string or preceded by a triple backslash, e.g.
    Arguments = "-f file1\\\&file2";
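
The quote-escaping rule can be applied mechanically. The following Python sketch covers only that rule (it does not handle the special characters &, |, >, <), and the `quote_arguments` helper is a hypothetical name introduced for illustration.

```python
def quote_arguments(arguments):
    """Wrap an argument string in double quotes, escaping embedded double quotes."""
    return '"' + arguments.replace('"', '\\"') + '"'


statement = "Arguments = " + quote_arguments('"Hello World!" 10') + ";"
print(statement)
```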

Workload Manager Service
The JDL allows the description of the following request types supported by the WMS:
- Job: a simple application
- DAG: a directed acyclic graph of dependent jobs (with WMProxy)
- Collection: a set of independent jobs (with WMProxy)

JDL: relevant attributes
JobType (optional)
- Normal (simple, sequential job), Interactive, MPICH, Checkpointable, Partitionable, Parametric
- Or a combination of them:
  - Checkpointable, Interactive
  - Checkpointable, MPI
- E.g.
    JobType = "Normal";

JDL: relevant attributes (cont.)
Executable (mandatory)
- A string representing the executable/command name.
- The user can specify an executable that is already on the remote CE:
    Executable = "/opt/EGEODE/GCT/egeode.sh";
- The user can provide a local executable name, which will be staged from the UI to the WN:
    Executable = "egeode.sh";
    InputSandbox = {"/home/larocca/egeode/egeode.sh"};

JDL: relevant attributes (cont.)
Arguments (optional)
- A string containing all the job command line arguments.
- E.g., if your executable sum has to be started as
    $ sum N1 N2 -out result.out
  then:
    Executable = "sum";
    Arguments = "N1 N2 -out result.out";

JDL: relevant attributes (cont.)
Environment (optional)
- List of environment settings needed by the job to run properly.
- E.g.
    Environment = {"JAVA_HOME=/usr/java/j2sdk1.4.2_08"};
InputSandbox (optional)
- List of files on the UI local disk needed by the job to run properly.
- The listed files are automatically staged to the remote resource.
- E.g.
    InputSandbox = {"myscript.sh", "/tmp/cc.sh"};

JDL: relevant attributes (cont.)
OutputSandbox (optional)
- List of files, generated by the job, which have to be retrieved from the CE.
- E.g.
    OutputSandbox = {"std.out", "std.err", "image.png"};

JDL: relevant attributes (cont.)
Requirements (optional)
- Job requirements on computing resources.
- Specified using attributes of resources published in the Information Service.
- If not specified, the default value defined in the UI configuration file is used. Default:
    Requirements = other.GlueCEStateStatus == "Production";
- Other examples:
    Requirements = other.GlueCEUniqueID == "adc006.cern.ch:2119/jobmanager-pbs-infinite";
    Requirements = Member("ALICE ", other.GlueHostApplicationSoftwareRunTimeEnvironment);

References
- JDL Attributes: 0_2.pdf, wm/api_doc/wms_jdl/index.html
- LCG-2 User Guide, Manual Series

References (cont.)
- gLite 3.0 User Guide
- GLUE Schema
- JDL attributes specification for WMProxy
- WMProxy quickstart
- WMS user guides
- EGEE
- gLite
- LCG
- Open Grid Forum
- Globus Alliance
- VDT

Questions…