1
Generic MPI Job Submission by the P-GRADE Grid Portal Zoltán Farkas (zfarkas@sztaki.hu) MTA SZTAKI
2
Contents – MPI Standards – Implementations – P-GRADE Grid Portal – Workflow execution, file handling – Direct job submission – Brokered job submission
3
MPI MPI stands for Message Passing Interface Standards 1.1 and 2.0 MPI Standard features: Collective communication (1.1+) Point-to-point communication (1.1+) Group management (1.1+) Dynamic processes (2.0) Programming language APIs …
4
MPI Implementations
MPICH – Freely available implementation of MPI – Runs on many architectures (even on Windows) – Implements Standards 1.1 (MPICH) and 2.0 (MPICH2) – Supports Globus (MPICH-G2) – Nodes are allocated upon application execution
LAM/MPI – Open-source implementation of MPI – Implements Standard 1.1 and parts of 2.0 – Many interesting features (e.g. checkpointing) – Nodes are allocated before application execution
Open MPI – Implements Standard 2.0 – Uses technologies of other projects
5
MPICH execution on x86 clusters The application is started using mpirun, specifying: the number of requested nodes (-np), optionally a file listing the nodes to be allocated (-machinefile), the executable, and the executable's arguments. $ mpirun -np 7 ./cummu -N -M -p 32 Processes are spawned using rsh or ssh, depending on the configuration.
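As an illustration only (the node names and the machinefile name are hypothetical), a machinefile simply lists one host per line and is passed to mpirun:

    $ cat machines.txt
    node01
    node02
    node03
    node04

    # run 4 processes, taking hosts from the machinefile; MPICH spawns the
    # remote processes via rsh or ssh
    $ mpirun -np 4 -machinefile machines.txt ./cummu -N -M -p 32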
6
MPICH x86 execution – requirements The executable (and input files) must be present on the worker nodes: using a shared filesystem, or the user distributes the files before invoking mpirun. The worker nodes must be accessible from the host running mpirun: using rsh or ssh, without user interaction (host-based authentication).
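Without a shared filesystem the files have to be staged by hand before mpirun is called; a minimal sketch, reusing the hypothetical machines.txt above and assuming key- or host-based ssh authentication:

    # copy the executable and an input file to every node in the machinefile
    $ for host in $(cat machines.txt); do scp ./cummu input.dat "$host:$PWD/"; done

    # check that ssh works without a password prompt
    $ ssh node01 hostname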
7
P-GRADE Grid Portal Technologies: – Apache Tomcat – GridSphere – Java Web Start – Condor – Globus – EGEE Middleware – Scripts
8
P-GRADE Grid Portal Workflow execution:
– DAGMan is used as the workflow scheduler
– pre and post scripts perform tasks around job execution
Direct job execution using GT-2 (GridFTP, GRAM):
– pre: create a temporary storage directory, copy input files
– job: Condor-G executes a wrapper script
– post: download results
Job execution using the EGEE broker (both LCG and gLite):
– pre: create the application context as the input sandbox
– job: a Scheduler universe Condor job executes a script that performs job submission, status polling and output downloading; a wrapper script is submitted to the broker
– post: error checking
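To illustrate the DAGMan-based scheduling (the file and node names here are hypothetical, not the Portal's actual ones), each workflow job maps to a DAG node with PRE and POST scripts attached:

    # hypothetical DAGMan description for a two-job workflow
    $ cat > workflow.dag <<'EOF'
    # one DAG node per workflow job; each node has a Condor-G submit file
    JOB jobA jobA.submit
    JOB jobB jobB.submit
    # jobB consumes jobA's output
    PARENT jobA CHILD jobB
    # PRE: create remote dir, stage inputs; POST: download results, check errors
    SCRIPT PRE  jobA pre.sh jobA
    SCRIPT POST jobA post.sh jobA
    SCRIPT PRE  jobB pre.sh jobB
    SCRIPT POST jobB post.sh jobB
    EOF

    # hand the DAG over to DAGMan / Condor
    $ condor_submit_dag workflow.dag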
9
Workflow Manager Portlet
10
Workflow example Jobs Input/output files Data transfers
11
Portal: File handling Local files: – User has access to these files through the Portal – Local input files are uploaded from the user machine – Local output files are downloaded to the user machine Remote files: – Files reside on EGEE Storage Elements or are accessible using GridFTP – EGEE SE files: lfn:/… guid:… – GridFTP files: gsiftp://…
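For reference, such remote files are usually moved with the standard grid data tools; the hosts, LFN and VO name below are made up, and the Portal performs the equivalent transfers internally:

    # GridFTP file, copied with globus-url-copy
    $ globus-url-copy gsiftp://gridftp.example.org/data/input.dat file:///tmp/input.dat

    # EGEE Storage Element file, copied by its logical file name with lcg-cp
    $ lcg-cp --vo myvo lfn:/grid/myvo/inputs/input.dat file:/tmp/input.dat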
12
Workflow Files File Types – In/Out – Local/Remote File Names – Internal – Global File Lifetime – Permanent – Volatile
13
Portal: Direct job execution The resource to be used is known before job execution The user must have a valid, accepted certificate Local files are supported Remote GridFTP files are supported, even in case of grid-unaware applications Jobs may be sequential or MPI applications
14
Direct exec: step-by-step I.
1. Pre script:
– creates a storage directory on the selected site's front-end node, using the fork jobmanager
– local input files are copied to this directory from the Portal machine using GridFTP
– remote input files are copied using GridFTP (in case of errors, a two-phase copy via the Portal machine is tried)
2. Condor-G job:
– a wrapper script (wrapper_p) is specified as the real executable
– a single job is submitted to the requested jobmanager; for MPI jobs the hostcount RSL attribute specifies the number of requested nodes
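A minimal sketch of what such a pre script does (host names, paths and the job directory are hypothetical; the Portal's real script is more involved):

    #!/bin/bash
    # pre.sh - prepare the remote working directory for one job (sketch)
    CE=ce.example.org
    WORKDIR=/tmp/pgrade_job_1234

    # create the storage directory via the fork jobmanager (GT-2)
    globus-job-run $CE/jobmanager-fork /bin/mkdir -p $WORKDIR

    # stage a local input file from the Portal machine with GridFTP
    globus-url-copy file://$PWD/input.dat gsiftp://$CE$WORKDIR/input.dat

    # stage a remote input file directly between the two GridFTP servers
    # (third-party transfer; on failure the Portal falls back to a two-phase
    #  copy through the Portal machine)
    globus-url-copy gsiftp://storage.example.org/data/big.dat gsiftp://$CE$WORKDIR/big.dat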
15
Direct exec: step-by-step II.
1. LRMS:
– allocates the requested number of nodes (if needed)
– starts wrapper_p on one of the allocated nodes (the master worker node)
2. wrapper_p (running on the master worker node):
– copies the executable and input files from the front-end node (scp or rcp)
– in case of PBS jobmanagers, the executable and input files are copied to the allocated nodes (PBS_NODEFILE); in case of non-PBS jobmanagers a shared filesystem is required, as the host names of the allocated nodes cannot be determined
– wrapper_p searches for mpirun
– the real executable is started using the mpirun found
– in case of PBS jobmanagers, output files are copied from the allocated worker nodes to the master worker node
– output files are copied to the front-end node
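A much simplified sketch of such a wrapper (every path, host and variable name is illustrative only, and output gathering from the slaves is omitted):

    #!/bin/bash
    # wrapper_p.sh - runs on the master worker node (sketch)
    FRONTEND=ce.example.org
    WORKDIR=/tmp/pgrade_job_1234
    EXE=cummu

    # fetch the executable and inputs from the front-end node
    scp $FRONTEND:$WORKDIR/$EXE $FRONTEND:$WORKDIR/input.dat .
    chmod +x ./$EXE

    # under PBS the allocated hosts are listed in $PBS_NODEFILE,
    # so the files can be pushed to every slave node as well
    if [ -n "$PBS_NODEFILE" ]; then
        for host in $(sort -u "$PBS_NODEFILE"); do
            scp ./$EXE input.dat "$host:$PWD/"
        done
    fi

    # locate mpirun and start the real executable on all allocated nodes
    MPIRUN=$(command -v mpirun)
    "$MPIRUN" -np "$(wc -l < "$PBS_NODEFILE")" -machinefile "$PBS_NODEFILE" ./$EXE "$@"

    # copy the output back to the front-end node
    scp output.dat $FRONTEND:$WORKDIR/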
16
Direct exec: step-by-step III.
1. Post script:
– local output files are copied from the temporary working directory created by the pre script to the Portal machine using GridFTP
– remote output files are copied using GridFTP (in case of errors, a two-phase copy via the Portal machine is tried)
2. DAGMan: continues scheduling the remaining jobs of the workflow…
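Again a hypothetical sketch, mirroring the pre script above:

    #!/bin/bash
    # post.sh - retrieve results and clean up (sketch)
    CE=ce.example.org
    WORKDIR=/tmp/pgrade_job_1234

    # download the local output file to the Portal machine
    globus-url-copy gsiftp://$CE$WORKDIR/output.dat file://$PWD/output.dat

    # push a remote output file straight to its final GridFTP location
    globus-url-copy gsiftp://$CE$WORKDIR/result.dat gsiftp://storage.example.org/data/result.dat

    # remove the temporary working directory via the fork jobmanager
    globus-job-run $CE/jobmanager-fork /bin/rm -rf $WORKDIR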
17
[Diagram: Direct execution – the Portal machine copies files over GridFTP to temporary storage on the front-end node (fork jobmanager); the job goes through PBS to the master worker node, where wrapper_p runs mpirun; the executable, input and output files flow between the master and slave worker nodes, the front-end node and the remote file storage.]
18
Direct Submission Summary Pros: – Users can add remote file support to legacy applications – Works for both sequential and MPI(CH) applications – For PBS jobmanagers there is no need for a shared filesystem (support for other jobmanagers can be added, depending on the information they provide) – Works even with jobmanagers that do not support MPI – Faster than submitting through the broker Cons: – the user needs to specify the execution resource – currently doesn't work on non-PBS jobmanagers without a shared filesystem
19
Portal: Brokered job submission EGEE Resource Broker is used The resource to be used is unknown before job execution The user must have a valid, accepted certificate Local files are supported Remote files residing on Storage Elements are supported, even in case of grid-unaware applications Jobs may be sequential or MPI applications
20
Broker exec: step-by-step I.
1. Pre script:
– creates the Scheduler universe Condor submit file
2. Scheduler universe Condor job:
– the job is a shell script
– the script is responsible for: job submission (a wrapper script, wrapper_rb, is specified as the real executable in the JDL file), job status polling, and job output downloading
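A hedged illustration of what such a JDL description and submission might look like for an MPICH job; the attribute values and file names are invented, the Portal generates the real files, and on gLite the corresponding glite-wms-job-* commands are used instead:

    # hypothetical JDL for the broker submission
    $ cat > job.jdl <<'EOF'
    JobType       = "MPICH";
    NodeNumber    = 4;
    Executable    = "wrapper_rb.sh";
    Arguments     = "cummu -p 32";
    InputSandbox  = {"wrapper_rb.sh", "cummu", "input.dat"};
    OutputSandbox = {"std.out", "std.err", "output.dat"};
    StdOutput     = "std.out";
    StdError      = "std.err";
    EOF

    # submit, poll and fetch output (done by the Scheduler universe job's script)
    $ edg-job-submit -o job.id job.jdl
    $ edg-job-status -i job.id
    $ edg-job-get-output -i job.id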
21
Broker exec: step-by-step II.
1. Resource Broker:
– handles the requests of the Scheduler universe Condor job
– sends the job to a CE
– watches its execution
– reports errors …
2. LRMS on the CE:
– allocates the requested number of nodes
– starts wrapper_rb on the master worker node using mpirun
22
Broker exec: step-by-step III.
1. wrapper_rb:
– the script is started by mpirun, so it runs on every allocated worker node just like an MPICH process
– checks whether the remote input files are already present; if not, they are downloaded from the Storage Element
– if the user specified any remote output files, they are removed from the storage first
– the real executable is started with the arguments passed to the script (these already contain the MPICH-specific ones)
– after the executable has finished, remote output files are uploaded to the Storage Element (only in case of gLite)
2. Post script: nothing special…
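A simplified sketch of wrapper_rb (the VO name, LFNs and file names are placeholders):

    #!/bin/bash
    # wrapper_rb.sh - started by mpirun on every allocated worker node (sketch)
    VO=myvo

    # download the remote input from the Storage Element unless already present
    if [ ! -f input.dat ]; then
        lcg-cp --vo $VO lfn:/grid/$VO/inputs/input.dat file:$PWD/input.dat
    fi

    # remove an existing remote output so it can be re-registered after the run
    lcg-del --vo $VO -a lfn:/grid/$VO/outputs/output.dat 2>/dev/null

    # start the real executable with the arguments passed in (already MPICH-aware)
    ./cummu "$@"

    # upload and register the remote output on the Storage Element (gLite case)
    lcg-cr --vo $VO -l lfn:/grid/$VO/outputs/output.dat file:$PWD/output.dat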
23
[Diagram: Broker execution – the Portal machine submits to the EGEE Resource Broker, which forwards the job via Globus/PBS to the front-end node of a CE; mpirun starts wrapper_rb and the real executable on the master and slave worker nodes, and remote files are exchanged with a Storage Element.]
24
Broker Submission Summary Pros: – adds remote file handling support to legacy applications – extends the functionality of the EGEE broker – one solution supports both sequential and MPI applications Cons: – slow application execution – status polling generates high load with 500+ jobs
25
Experimental results Selected SEEGRID CEs were tested using the broker from the command line and the direct job submission from the P-GRADE Portal, with a job requesting 3 nodes:

Portal Direct Result | Broker Result          | CE Name
OK                   | Failed (exe not found) | grid2.cs.bilkent.edu.tr
OK                   | Failed                 | grid01.rcub.bg.ac.yu (?)
OK                   | Failed                 | cluster1.csk.kg.ac.yu
OK                   |                        | ce02.grid.acad.bg
Failed (job held)    | OK                     | ce01.isabella.grnet.gr
OK                   | Scheduled              | ce.ulakbim.gov.tr
OK                   | Failed (exe not found) | ce.phy.bg.ac.yu
26
Thank you for your attention. Questions?