Published by Eleanore Watts. Modified over 9 years ago.
1
Using Parallel Computing Resources at Marquette
2
HPC Resources
Local Resources:
HPCL Cluster: hpcl.mscs.mu.edu
PARIO Cluster: pario.eng.mu.edu
PERE Cluster: pere.marquette.edu
MU Grid
Regional Resources:
Milwaukee Institute
SeWhip
National Resources:
NCSA
ANL
TeraGrid resources
Commercial Resources:
Amazon EC2
3
Pere Cluster
128 HP ProLiant BL280c G6 server blades
1024 Intel Xeon 5550 (Nehalem) cores
50 TB raw storage
3 TB main memory
Diagram: a head node and compute nodes #1 through #128, connected by a Gigabit Ethernet interconnect and an InfiniBand interconnect; the head node links to MARQNET.
4
Steps to Run A Parallel Code
Get the source code: either write it on your local computer and then transfer it to hpcl.mscs.mu.edu, or use vi to create it directly on hpcl.mscs.mu.edu.
Compile your source code using mpicc, mpicxx, or mpif77.
Write a submission script for your job: vi myscript.sh
Use qsub to submit the script: qsub myscript.sh
5
Getting Parallel Code hello.c
You can write the code on your development machine using an IDE and then transfer it to the cluster (recommended). For small programs, you can also edit the code directly on the cluster.
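The hello.c referred to here is not shown on the slide; below is a minimal MPI program of the usual form, as a sketch. It needs an MPI installation and is compiled with the mpicc wrapper shown on the later slides.

```c
/* hello.c: minimal MPI example (sketch; requires an MPI installation) */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);               /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();                       /* shut down MPI */
    return 0;
}
```

Compile with `mpicc -o hello hello.c` and launch with mpirun; each process prints one line.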
6
Transfer File to Cluster
Method 1: sftp (text or GUI)
sftp <username>@hpcl.mscs.mu.edu
put simple.c
bye
Method 2: scp
scp simple.c <username>@hpcl.mscs.mu.edu:
Method 3: rsync
rsync --rsh=ssh -av example <username>@hpcl.mscs.mu.edu:
7
Compile MPI Programs Method 1: Using MPI compiler wrappers
mpicc: for C code
mpicxx/mpic++/mpiCC: for C++ code
mpif77, mpif90: for Fortran code
Examples:
mpicc -o hello hello.c
mpif90 -o hello hello.f
8
Compile MPI Programs (cont.)
Method 2: Using standard compilers with the MPI library
Note: MPI is just a library, so you can link it into your code to get the executable.
Example:
gcc -o ping ping.c \
  -I/usr/mpi/gcc/openmpi-1.2.8/include \
  -L/usr/mpi/gcc/openmpi-1.2.8/lib64 -lmpi
9
Compiling Parallel Code – Using Makefile
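The slide's Makefile appears only as an image; below is a minimal sketch of a Makefile for an MPI program built with the mpicc wrapper (the target and file names are illustrative).

```
# Minimal Makefile sketch for an MPI program (names are illustrative)
CC     = mpicc
CFLAGS = -O2 -Wall

hello: hello.c
	$(CC) $(CFLAGS) -o hello hello.c

clean:
	rm -f hello
```

Running `make` rebuilds hello only when hello.c has changed; `make clean` removes the binary.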
10
Job Scheduler
Software that provides:
Job submission and automatic execution
Job monitoring and control
Resource management
Priority management
Checkpointing
...
Usually implemented as a master/slave architecture.
Commonly used job schedulers:
PBS: PBS Pro / TORQUE
SGE (Sun Grid Engine, Oracle)
LSF (Platform Computing)
Condor (UW-Madison)
11
Access the Pere Cluster
Access via ssh: ssh <username>@pere.marquette.edu
Account management: based on Active Directory, so you log in to Pere with the same username and password as your Marquette account. You need your professor to help you sign up.
Transfer files from/to Pere.
12
Modules
The Modules package is used to customize your environment settings and to control which versions of a software package are used when you compile or run a program.
Using modules:
module avail: check which modules are available
module load <module>: set up shell variables to use a module
module unload <module>: remove a module
module list: show all loaded modules
module help: get help on using modules
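A typical session with these commands might look like the following (the module name openmpi is only an example; available names vary by site):

```
module avail            # check which modules are available
module load openmpi     # set up the environment for a module (example name)
module list             # show all loaded modules
module unload openmpi   # remove the module again
```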
13
Using MPI on Pere
Multiple MPI compilers are available; each may need different syntax.
OpenMPI compiler (/usr/mpi/gcc/openmpi-1.2.8):
mpicc -o prog prog.c
mpif90 -o prog prog.f
mvapich compiler (/usr/mpi/gcc/mvapich-1.1.0)
PGI compiler (/cluster/pgi/linux86-64/10.2):
pgcc -Mmpi -o prog prog.c
pgf90 -Mmpi -o prog prog.f
Intel compiler:
icc -o prog prog.c -lmpi
ifort -o prog prog.f -lmpi
14
Pere Batch Queues
Pere currently runs PBS/TORQUE.
TORQUE usage:
qsub myjob.qsub: submit a job script
qstat: view job status
qdel job-id: delete a job
pbsnodes: show node status
pbstop: show queue status
15
Sample Job Scripts on Pere
#!/bin/sh
#PBS -N hpl
#PBS -l nodes=64:ppn=8,walltime=01:00:00
#PBS -q batch
#PBS -j oe
#PBS -o hpl-$PBS_JOBID.log
cd $PBS_O_WORKDIR
cat $PBS_NODEFILE
mpirun -np 512 -hostfile $PBS_NODEFILE ./xhpl
Line by line:
Assign a name to the job.
Request resources: 64 nodes, each with 8 processors, for 1 hour.
Submit to the batch queue.
Merge stdout and stderr output.
Redirect output to a file.
Change the working directory to the submission directory.
Print the allocated nodes (not required).
Run the MPI program (512 processes = 64 nodes x 8 processors per node).
16
Extra Help For Accessing Pere
Contact me. See the user's guide for Pere.
17
Using Condor Resources:
18
Using Condor
1. Write a submit script, simple.job:
Universe = vanilla
Executable = simple
Arguments = 4 10
Log = simple.log
Output = simple.out
Error = simple.error
Queue
2. Submit the script to the Condor pool:
condor_submit simple.job
3. Watch the job run:
condor_q
condor_q -sub <your-username>
19
Doing a Parameter Sweep
You can put a collection of jobs in the same submit script to do a parameter sweep.
Universe = vanilla
Executable = simple
Arguments = 4 10
Log = simple.log
Output = simple.$(Process).out
Error = simple.$(Process).error
Queue
Arguments = 4 11
Queue
Arguments = 4 12
Queue
Notes: $(Process) tells Condor to use a different output file for each job; each Queue statement submits one individual job; the jobs can run independently.
20
Condor DAGMan
DAGMan lets you submit complex sequences of jobs, as long as they can be expressed as a directed acyclic graph (DAG).
Each job in the DAG may have only one Queue statement in its submit file.
Commands:
condor_submit_dag simple.dag
./watch_condor_q
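A DAG is described in a small text file; a sketch of what simple.dag might contain follows (the job names and submit-file names are illustrative):

```
# simple.dag: each JOB line names a node and its submit file;
# PARENT/CHILD lines give the execution order.
JOB  A  A.job
JOB  B  B.job
JOB  C  C.job
PARENT A CHILD B C    # B and C start only after A finishes
```

condor_submit_dag simple.dag then submits the whole graph, and DAGMan releases each job as its parents complete.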
21
Submit MPI Jobs to Condor
Differences from serial jobs:
Use the MPI universe.
Set machine_count > 1.
When there is no shared file system, transfer executables and output from/to the local system by specifying should_transfer_files and when_to_transfer_output.
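Putting these options together, a submit file for an MPI job might look like the following sketch (the executable name and machine count are illustrative; newer HTCondor versions use the parallel universe in place of the MPI universe):

```
universe                = MPI
executable              = simple
machine_count           = 4
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
log    = simple.log
output = simple.$(NODE).out
error  = simple.$(NODE).error
queue
```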
22
Questions
How do you implement a parameter sweep using SGE/PBS?
How do you implement a DAG on SGE/PBS?
Is there a better way to run a large number of jobs on the cluster?
Which resource should I use, and where can I find help?
23
HPCL Cluster (diagram): a head node and compute nodes #1 through #4, connected by a Gigabit Ethernet interconnect; the head node links to MARQNET.
24
How to Access HPCL Cluster
On Windows: use SSH Secure Shell or PuTTY.
On Linux: use the ssh command.
25
Developing & Running Parallel Code
Identify the problem & analyze requirements
Design the parallel algorithm
Write the parallel code (coding)
Build the binary code (compiling)
Test the code (running)
Solve realistic problems (running the production release)
Analyze performance bottlenecks
26
Steps to Run A Parallel Code
Get the source code: either write it on your local computer and then transfer it to hpcl.mscs.mu.edu, or use vi to create it directly on hpcl.mscs.mu.edu.
Compile your source code using mpicc, mpicxx, or mpif77. They are located under /opt/openmpi/bin; use the which command to find their location. If they are not in your PATH, add the following line to your shell initialization file (e.g., ~/.bash_profile):
export PATH=/opt/openmpi/bin:$PATH
Write a submission script for your job: vi myscript.sh
Use qsub to submit the script: qsub myscript.sh
27
Getting Parallel Code hello.c
You can write the code on your development machine using an IDE and then transfer it to the cluster (recommended). For small programs, you can also edit the code directly on the cluster.
28
Transfer File to Cluster
Method 1: sftp (text or GUI)
sftp <username>@hpcl.mscs.mu.edu
put simple.c
bye
Method 2: scp
scp simple.c <username>@hpcl.mscs.mu.edu:
Method 3: rsync
rsync --rsh=ssh -av example <username>@hpcl.mscs.mu.edu:
Method 4: svn or cvs
svn co svn+ssh://hpcl.mscs.mu.edu/mscs6060/example
29
Compile MPI Programs Method 1: Using MPI compiler wrappers
mpicc: for C code
mpicxx/mpic++/mpiCC: for C++ code
mpif77, mpif90: for Fortran code
Examples:
mpicc -o hello hello.c
mpif90 -o hello hello.f
Check the cluster documentation or consult the system administrators for the available compilers and their locations.
30
Compile MPI Programs (cont.)
Method 2: Using standard compilers with the MPI library
Note: MPI is just a library, so you can link it into your code to get the executable.
Example:
gcc -o ping ping.c \
  -I/usr/mpi/gcc/openmpi-1.2.8/include \
  -L/usr/mpi/gcc/openmpi-1.2.8/lib64 -lmpi
31
Compiling Parallel Code – Using Makefile
32
Job Scheduler
Software that provides:
Job submission and automatic execution
Job monitoring and control
Resource management
Priority management
Checkpointing
...
Usually implemented as a master/slave architecture.
Commonly used job schedulers:
PBS: PBS Pro / TORQUE
SGE (Sun Grid Engine, Oracle)
LSF (Platform Computing)
Condor (UW-Madison)
33
Using SGE to Manage Jobs
The HPCL cluster uses SGE as its job scheduler.
Basic commands:
qsub: submit a job to the batch scheduler
qstat: examine the job queue
qdel: delete a job from the queue
Other commands:
qconf: SGE queue configuration
qmon: graphical user interface for SGE
qhost: show the status of SGE hosts, queues, and jobs
34
Submit a Serial Job simple.sh
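The script itself appears only as an image; below is a reconstruction of a minimal serial simple.sh under SGE (the command on the last line is illustrative):

```
#!/bin/bash
#$ -S /bin/bash    # force bash as the shell interpreter
#$ -cwd            # run in the current (submission) directory
#$ -j y            # merge stdout and stderr
#$ -o simple.log   # redirect output to a log file
./simple 4 10      # the serial program to run (placeholder)
```

Submit it with qsub simple.sh and watch it with qstat.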
35
Submit Parallel Jobs to HPCL Cluster
The script (shown as an image) is annotated as follows:
Force bash as the shell interpreter.
Request the parallel environment orte with 64 slots (processors).
Run the job in the specified directory.
Merge the two output files (stdout, stderr).
Redirect output to a log file.
Run the MPI program.
For your own program, you may need to change the processor count, the program name on the last line, and the job name.
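Reconstructed from the annotations above, the script might look like this sketch (the log-file and program names are placeholders):

```
#!/bin/bash
#$ -S /bin/bash          # force bash as the shell interpreter
#$ -pe orte 64           # request parallel environment orte with 64 slots
#$ -cwd                  # run the job in the current (submission) directory
#$ -j y                  # merge the two output files (stdout, stderr)
#$ -o myjob.log          # redirect output to a log file
mpirun -np 64 ./myprog   # run the MPI program
```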
36
References
SUN Grid Engine User's Guide
Commonly used commands:
Submit a job: qsub
Check status: qstat
Delete a job: qdel
Check configuration: qconf
Check the manual of a command: man qsub