Assignment 2: Using Paraguin to Create Parallel Programs
Cluster at UNCW
  [Figure: cluster diagram showing user computers, an Ethernet interface, the dedicated cluster with its master node, a switch, and the compute nodes]
  Submit Host: babbage
  Head Node: harpua
  Compute Nodes: compute-0-0, compute-0-1, compute-0-2, …
  9/4/2012
Cluster at UNCW
  We use the Sun Grid Engine (SGE) to schedule jobs on the cluster.
  This gives users exclusive use of the compute nodes, so that one user's application does not interfere with the performance of another's.
  The scheduler (SGE) is responsible for allocating compute nodes to jobs exclusively.
  Compile as normal:
      $ mpicc hello.c -o hello
SGE
  Running, however, is done through a job submission file.
  Some SGE commands:
      qsub <job submission file> : submits a job to the scheduler to run
      qstat : shows the status of submitted jobs (waiting, queued, running, terminated, etc.)
      qdel <#> : deletes a job (by number) from the system
      qhost : shows a list of hosts
SGE
  Example job submission file (hello.sge):
      #!/bin/sh
      # Usage: qsub hello.sge
      #$ -S /bin/sh
      #$ -pe orte 16          # Specify how many processors we want
      # -- our name ---
      #$ -N Hello             # Name for the job
      #$ -l h_rt=00:01:00     # Request 1 minute to execute
      #$ -cwd                 # Make sure that the .e and .o file arrive in the working directory
      #$ -j y                 # Merge the standard out and standard error to one file
      mpirun -np $NSLOTS ./hello
SGE
  Example job submission file (hello.sge), job name and time limit:
      # -- our name ---
      #$ -N Hello             # Name for the job
      #$ -l h_rt=00:01:00     # Request 1 minute to execute
  -N sets the name of the job and the prefix of the output files: Hello.o### and Hello.po###.
  -l h_rt=00:01:00 indicates that the job will need only a minute. This is important so that SGE will clean up if the program hangs or terminates incorrectly. Longer programs may need a larger time limit, or SGE will terminate them before they have completed.
SGE
  Example job submission file (hello.sge), output options:
      #$ -cwd                 # Make sure that the .e and .o file arrive in the working directory
      #$ -j y                 # Merge the standard out and standard error to one file
  -cwd runs the job in the current directory.
  SGE will create 3 files: Hello.o##, Hello.e##, and Hello.po##. The -j y option merges the Hello.o and Hello.e files (standard output and standard error).
SGE
  Example job submission file (hello.sge), the run command:
      mpirun -np $NSLOTS ./hello
  And finally, the command to run the MPI program. $NSLOTS is the same number given on the #$ -pe orte 16 line.
SGE Example
      $ qsub hello.sge
      Your job 106 ("Hello") has been submitted
      $ qstat
      job-ID  prior    name   user     state  submit/start at      queue  slots  ja-task-ID
      --------------------------------------------------------------------------------------
         106  0.00000  Hello  cferner  qw     09/04/2012 09:08:38         16
      $
  The state "qw" means queued and waiting.
SGE Example
      $ qstat
      job-ID  prior    name   user     state  submit/start at      queue                    slots  ja-task-ID
      --------------------------------------------------------------------------------------------------------
         106  0.55500  Hello  cferner  r      09/04/2012 09:11:43  all.q@compute-0-0.local  16
      [cferner@babbage mpi_assign]$
  The state "r" means running.
SGE Example
      $ ls
      hello  hello.c  Hello.o106  Hello.po106  hello.sge  ring  ring.c  ring.sge  test  test.c  test.sge
      $ cat Hello.o106
      Hello world from master process 0 running on compute-0-2.local
      Message from process = 1 : Hello world from process 1 running on compute-0-2.local
      Message from process = 2 : Hello world from process 2 running on compute-0-2.local
      …
  Clean up the output files when you are done with them, or you will end up with a lot of clutter.
Deleting a job
      $ qstat
      job-ID  prior    name   user     state  submit/start at      queue  slots  ja-task-ID
      --------------------------------------------------------------------------------------
         108  0.00000  Hello  cferner  qw     09/04/2012 09:18:20         16
      $ qdel 108
      cferner has registered the job 108 for deletion
      $
Assignment 2
  Setup (do this only once)
  Put these lines in the file .bash_profile:
      export MACHINE=x86_64-redhat-linux
      export SUIFHOME=/share/apps/suifhome
      export COMPILER_NAME=gcc
      `perl $SUIFHOME/setup_suif -sh`
  Run the command:
      $ . .bash_profile
  Notice the 2 periods and the space between them.
Hello World Program
  The program is given to you.
  You simply need to compile it and run it (using a job submission file).
  Try running it on many processors.
  Produce documentation of compiling and running the program.
Matrix Multiplication
  A matrix multiplication skeleton program is given to you in the Appendix. It includes:
      Opening the input file
      Reading the input
      Taking a time stamp
      Taking a 2nd time stamp
      Computing the elapsed time between the time stamps
      Printing the results
Matrix Multiplication
  You need to:
      Broadcast the error to the processors and exit if necessary
      Scatter the input
      Compute the partial results
      Gather the partial results
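The "compute the partial results" step can be sketched in plain C. This is only an illustration under stated assumptions: it contains no MPI calls or Paraguin pragmas, the function name multiply_rows and the row-major layout are hypothetical choices rather than anything taken from the skeleton program, and it assumes the scatter hands each process a contiguous block of rows of A while every process holds all of B.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the assignment skeleton).
 * Multiplies `rows` rows of A (a_rows, rows x n, row-major) by the full
 * n x n matrix B and writes the matching rows of C into c_rows.
 * In the parallel program, a_rows would be the block this process received
 * from the scatter, and c_rows the block it contributes to the gather. */
void multiply_rows(const double *a_rows, const double *b,
                   double *c_rows, int rows, int n)
{
    assert(rows >= 0 && n > 0);
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++) {
                sum += a_rows[(size_t)i * n + k] * b[(size_t)k * n + j];
            }
            c_rows[(size_t)i * n + j] = sum;
        }
    }
}
```

Calling the function once per row block, with a_rows and c_rows offset to that block, produces the same C as a single call over all rows, which is what makes the scatter and gather around it straightforward.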
Heat Distribution
  Using the stencil pattern, model the distribution of heat in a room that has a fireplace along one wall.
Heat Distribution
  Each newly computed value is the average of its eight neighbors (diagonals included) and its own old value.
  So each value at location i,j should be the average of 9 values.
  This reduces oscillations.
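One way to read the 9-value average is as a single stencil step over the grid. The following is a serial sketch in plain C, with no MPI or Paraguin pragmas; the function name stencil_step, the row-major layout, and the choice to copy edge cells unchanged (treating the walls and fireplace as fixed temperatures) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: one averaging step on a rows x cols grid stored
 * row-major. Each interior cell of `next` becomes the mean of the 3x3
 * block of `old` centered on it (8 neighbors plus its own old value).
 * Edge cells are copied unchanged, an assumed fixed boundary. */
void stencil_step(const double *old, double *next, int rows, int cols)
{
    assert(rows >= 3 && cols >= 3);
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            size_t idx = (size_t)i * cols + j;
            if (i == 0 || j == 0 || i == rows - 1 || j == cols - 1) {
                next[idx] = old[idx];   /* boundary: keep the old value */
                continue;
            }
            double sum = 0.0;
            for (int di = -1; di <= 1; di++) {
                for (int dj = -1; dj <= 1; dj++) {
                    sum += old[(size_t)(i + di) * cols + (j + dj)];
                }
            }
            next[idx] = sum / 9.0;
        }
    }
}
```

Reading from old and writing to next keeps every update based on the previous iteration's values; the two arrays would be swapped between iterations.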
Producing a Visual of the Output
  [Figure: output produced with X11 graphics]
  [Figure: output produced with Excel]
Producing a Visual of the Output
  See the document http://coitweb.uncc.edu/~abw/ITCS4145F13/Assignments/X11GraphicsNotes.pdf for help with creating graphics using X11.
  The Excel graph is a surface plot.
Monte Carlo Estimation of π
  (required for graduates, optional for undergraduates)
  Scatter/Gather pattern, but uses broadcast and reduce
  This is not a workflow pattern
  π can also be estimated by numerical integration, but you aren't asked to do this.
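A serial sketch of the idea in plain C may help before parallelizing it. Everything here is an assumption for illustration: the slides do not say which Monte Carlo formulation to use, so this uses the common one of sampling points in the unit square and counting those inside the quarter circle. The names next_uniform, count_hits, and estimate_pi are hypothetical, and the loop over workers stands in for what the processes would do in parallel, with the shared sample count playing the role of the broadcast value and the running sum playing the role of the reduce.

```c
#include <assert.h>
#include <stdint.h>

/* Small 64-bit linear congruential generator, used only so the sketch is
 * deterministic and self-contained. Returns a double in [0, 1). */
static double next_uniform(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*state >> 11) / 9007199254740992.0;  /* divide by 2^53 */
}

/* Counts how many of `samples` pseudo-random points in the unit square
 * fall inside the quarter circle x*x + y*y <= 1. */
long count_hits(uint64_t seed, long samples)
{
    assert(samples >= 0);
    uint64_t state = seed;
    long hits = 0;
    for (long s = 0; s < samples; s++) {
        double x = next_uniform(&state);
        double y = next_uniform(&state);
        if (x * x + y * y <= 1.0) {
            hits++;
        }
    }
    return hits;
}

/* Serial stand-in for the parallel version: each "worker" gets the same
 * sample count (the broadcast) and its own seed; the hit counts are
 * summed (the reduce) and scaled to an estimate of pi. */
double estimate_pi(int workers, long samples_per_worker)
{
    assert(workers > 0 && samples_per_worker > 0);
    long total_hits = 0;
    for (int rank = 0; rank < workers; rank++) {
        total_hits += count_hits(12345u + (uint64_t)rank, samples_per_worker);
    }
    return 4.0 * (double)total_hits
           / ((double)workers * (double)samples_per_worker);
}
```

The fraction of points inside the quarter circle approaches π/4, so multiplying it by 4 gives the estimate; more samples give a tighter estimate.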
Questions?