1
Quick Tutorial on MPICH for NIC-Cluster
CS 387 Class Notes
2
NIC CLUSTER OVERVIEW
Start page: http://hpc.mst.edu/
Node Allocation and Usage policy: http://hpc.mst.edu/accessandpolicies/
The Shared NIC Cluster Hardware and Software
The NIC cluster had 64-bit nodes (http://hpc.mst.edu/hardware/) with an Ethernet network and, eventually, an InfiniBand interconnect, with the following standard software suite:
  The Torque/PBS scheduler.
  Compilers: GCC, Intel-9, Intel-10, and Intel-11 Compiler Suites.
  Applications and Libraries listed at http://hpc.mst.edu/applications/
InfiniBand offers point-to-point bidirectional serial links intended for the connection of processors with high-speed peripherals such as disks. InfiniBand also offers multicast operations.
3
Cluster pictures
4
PBS Job Scripts
The NIC cluster uses PBS (Portable Batch System). Why?
  It improves overall system efficiency.
  It gives all users fair access, since it maintains a scheduling policy.
  It provides protection against dead nodes.
5
How PBS works
  The user writes a batch script for the job and submits it to PBS with the qsub command.
  PBS places the job into a queue based on its resource requests and runs the job when those resources become available.
  The job runs until it either completes or exceeds one of its resource request limits.
  PBS copies the job’s output into the directory from which the job was submitted and optionally notifies the user via email that the job has ended.
6
Step 1: Login
Off-campus machine: connect to campus using the MST VPN, then
  > ssh nic.mst.edu
On-campus machine (VPN is not required):
  > ssh nic.mst.edu
Use the following to set up your MPI path correctly:
  $ module load openmpi/gnu
To make this your default, run:
  $ savemodules
Visit http://hpc.mst.edu/examples/openmpi/c/ to get information about Open MPI.
7
No DFS Support
DFS files are not directly accessible from the cluster.
Use the sftp command to transfer any files from your DFS space (S: drive), e.g.
  > sftp minersftp.mst.edu
  > get X.c
  > quit
You may also use WinSCP on Windows or Fugu on OS X.
8
Step 2: Compile MPICH Programs
Syntax:
  C:   mpicc -o hello hello.c
  C++: mpiCC -o hello hello.cpp
Here hello, the argument to -o, is the executable file.
Note: Before compilation, make sure the MPICH library path is set, or use the export command as below:
  export PATH=/opt/mpich/gnu/bin:$PATH
9
Step 3: Write PBS batch script file
Ex1: A simple script file (pbs_script)
A job named “HELLO” requests 8 nodes and at most 15 minutes of runtime.
  #!/bin/bash
  #PBS -N HELLO
  #PBS -l walltime=0:15:00
  #PBS -l nodes=8
  #PBS -q @nic-cluster.mst.edu
  mpirun -n 8 /nethome/users/ercal/MPI/hello
10
Some PBS Directive options
  -N jobname               name the job “jobname”
  -q @nic-cluster.mst.edu  the cluster address to send the job to
  -e errfile               redirect standard error to a file named errfile
  -o outfile               redirect standard output to a file named outfile
  -j oe                    combine standard output and standard error
  -l walltime=N            request a walltime of N in the form hh:mm:ss
  -l cput=N                request N sec of CPU time; or in the form hh:mm:ss
  -l mem=N[KMG][BW]        request a total of N {kilo|mega|giga}{bytes|words} of memory on all requested processors together
  -l nodes=N:ppn=M         request N nodes with M processors per node
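The hh:mm:ss form accepted by -l walltime and -l cput maps to seconds as h*3600 + m*60 + s. A minimal sketch of that conversion follows; the helper name hms_to_seconds is hypothetical and is not part of PBS.

```shell
# hms_to_seconds: hypothetical helper (not a PBS command) that converts
# an hh:mm:ss string, as used by -l walltime and -l cput, into seconds.
hms_to_seconds() {
    # awk splits on ':' and treats fields such as "08" as decimal numbers.
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

hms_to_seconds 0:15:00   # prints 900
```

So the walltime=0:15:00 request used in these notes corresponds to 900 seconds.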
11
Step 3.1: Submit a Job
Use the PBS command qsub.
Syntax:  qsub pbs-job-filename
Example: > qsub pbs_script
         returns the message 555.nic-p1.srv.mst.edu
         (555 is the job ID that PBS automatically assigns to your job)
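When a script needs the numeric job ID later (for example to pass it to qstat or qdel), it can be cut out of the identifier that qsub prints. A minimal sketch, assuming the identifier has the form shown above; job_id_from_qsub is a hypothetical helper name, not a PBS command.

```shell
# job_id_from_qsub: hypothetical helper (not a PBS command) that extracts
# the numeric job ID from the identifier printed by qsub.
job_id_from_qsub() {
    # Drop everything from the first '.' onward, leaving only "555".
    printf '%s\n' "${1%%.*}"
}

# In a real session the argument would come from: full_id=$(qsub pbs_script)
job_id_from_qsub "555.nic-p1.srv.mst.edu"   # prints 555
```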
12
Result after job completion
An error file and an output file are created. The names are usually of the form:
  jobfilename.o(jobid)
  jobfilename.e(jobid)
Ex:
  simplejob.e555   contains STDERR
  simplejob.o555   contains STDOUT
Use -j oe to combine standard output and standard error.
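The naming pattern can be sketched as a small function that builds both file names from a name prefix and a job ID. The helper name output_files is hypothetical; the pattern itself is the one given on this slide.

```shell
# output_files: hypothetical helper that prints the expected STDOUT and
# STDERR file names for a given name prefix ($1) and job ID ($2).
output_files() {
    printf '%s.o%s\n%s.e%s\n' "$1" "$2" "$1" "$2"
}

output_files simplejob 555
# prints:
# simplejob.o555
# simplejob.e555
```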
13
Ex2: Another sample batch script (pbs_script)
  #PBS -N hello
  #PBS -l mem=200mb
  #PBS -l walltime=0:15:00
  #PBS -l nodes=2:ppn=2
  #PBS -j oe
  #PBS -m abe
  mpirun -n 8 /nethome/users/ercal/MPI/hello
This job “hello” requests 15 minutes of wall-time, 2 nodes using 2 processors each (4 processors in total), and 200MB of memory (100MB per node; 50MB per processor). Also, the output and error are written to one file.
How many processes are created?
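The arithmetic behind Ex2 can be checked with shell integer arithmetic. This is only a sketch using the values from the script; the variable names are illustrative. Note that mpirun -n 8 asks for 8 processes while the allocation provides 4 processors, so the request works out to 2 processes per processor; whether the MPI launcher accepts more processes than allocated processors depends on how it is configured.

```shell
# Values taken from the Ex2 script; variable names are illustrative only.
nodes=2            # from: #PBS -l nodes=2:ppn=2
ppn=2
mem_mb=200         # from: #PBS -l mem=200mb
requested_procs=8  # from: mpirun -n 8

slots=$(( nodes * ppn ))                       # processors allocated
mem_per_node=$(( mem_mb / nodes ))             # MB per node
mem_per_proc=$(( mem_mb / slots ))             # MB per processor
procs_per_slot=$(( requested_procs / slots ))  # MPI processes per processor

echo "processors allocated:    $slots"
echo "memory per node (MB):    $mem_per_node"
echo "memory per processor:    $mem_per_proc"
echo "processes requested:     $requested_procs"
echo "processes per processor: $procs_per_slot"
```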
14
Tools
qstat (jobid)
qstat -u (userid)
qstat -a
  These commands return the status of the specified job(s).
qdel (jobid)
  This command deletes a job.
size executable_file_name
  Gives output in the following format:
     text    data     bss     dec     hex filename
  1650312   71928 6044136 7766376  768168 hello
  (This can help to check memory requirements before submitting a job.)
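In the size output, the dec column is the sum of the text, data, and bss columns, and hex is the same total in hexadecimal. A short sketch that reproduces the totals for the hello example above (variable names are illustrative):

```shell
# Section sizes in bytes, copied from the size output for hello.
text=1650312
data=71928
bss=6044136

# dec is the sum of the three sections; hex is that sum in base 16.
dec=$(( text + data + bss ))
printf 'dec=%d hex=%x\n' "$dec" "$dec"   # prints dec=7766376 hex=768168
```

Here bss, the zero-initialized data that takes up memory only at run time, accounts for most of the total of about 7.8 MB.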
15
Tips for programming in MPICH
Use compiler flags for faster and cleaner code. Some of them are:
  -O2             moderate optimization
  -funroll-loops  enables loop unrolling optimizations
  -Wall           enables all common warnings
  -ansi           enables ANSI C/C++ compliance
  -pedantic       enables strictness of language compliance
Of these, -O2 and -funroll-loops affect speed; -Wall, -ansi, and -pedantic help catch errors and portability problems.
Avoid using pointers in your program, unless absolutely necessary.
16
Tips (cont.)
No scanf allowed in MPICH (fscanf is allowed). Instead, pass input to your program using the command line arguments argc and argv in the mpirun command of the PBS script.
  Example: mpirun -n 8 /nethome/users/ercal/MPI/cpi …arguments…
To give your job a high priority, set wall-time ≤ 15 minutes (#PBS -l walltime=0:15:00) and number of nodes ≤ 32 (#PBS -l nodes=16:ppn=2).