Published by Diana Hines. Modified over 8 years ago.
1
HPC for Statistics Grad Students
2
A Cluster
Not just a bunch of computers
Linked CPUs managed by queuing software
– Cluster
– Node
– CPU
3
Clusters: Stat Cluster
– Need FAS account
– If you have an FAS account and can’t access the cluster, contact earneson@sfu.ca
– Read Matt Pratola’s webpage first (see references)!
– 160 CPUs in 20 nodes (1 node = 2 x Intel quad-core Xeon 2.66GHz w/ 2GB RAM per CPU)
– Access:
    ssh warrior.stat.sfu.ca
    ssh stat-cl.stat.sfu.ca
4
Clusters: IRMACS Cluster
– Need IRMACS account
– Contact support@irmacs.sfu.ca for access if you don’t have one
– 80 CPUs in 10 nodes (1 node = 2 x Intel quad-core Xeon 2.66GHz w/ 2GB RAM per CPU)
– Access:
    ssh head.irmacs.sfu.ca
5
WestGrid
– Need to apply for an account with permission from Charmaine
– > 5000 CPUs with various chips and >> 10 TB of storage
6
Cluster Login & File Access
Log in from the terminal:
    ssh head.irmacs.sfu.ca
File transfer: scp (secure copy) and Fugu
    scp Script-pbs.txt username@head.irmacs.sfu.ca:/directory
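The scp command on this slide copies a file up to the cluster; the same command with the arguments reversed brings results back. The sketch below only prints the two commands rather than running them, so it can be tried without an account. The file name results.txt and the helper names upload_cmd and download_cmd are hypothetical; username and /directory are the placeholders from the slide.

```shell
#!/bin/bash
# Placeholders from the slide; replace with your own account and path.
REMOTE="username@head.irmacs.sfu.ca"
REMOTE_DIR="/directory"

# Print the scp command that copies a local file up to the cluster.
upload_cmd() {
  printf 'scp %s %s:%s\n' "$1" "$REMOTE" "$REMOTE_DIR"
}

# Print the scp command that copies a remote file back to the current directory.
download_cmd() {
  printf 'scp %s:%s/%s .\n' "$REMOTE" "$REMOTE_DIR" "$1"
}

upload_cmd Script-pbs.txt
download_cmd results.txt   # results.txt is a hypothetical output file
```

Paste the printed lines into a terminal (with real values) to perform the transfers.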
7
Submitting Jobs to the Queuing Software (stat-cl)
PBS Script
    #!/bin/bash
    #PBS -N job_name
    #PBS -q queue_name
    #PBS -M user_email
    # Email when the job (b)egins, (a)borts, or (e)nds
    #PBS -m bae
    # Number of nodes needed (one processor each unless :ppn is given)
    #PBS -l nodes=2
    #PBS -o path_to_job_log
    #PBS -e path_to_error_log
Example: Single R job
    #!/bin/bash
    #PBS -N Chi-square
    #PBS -q batch
    #PBS -M ejuarezc@sfu.ca
    #PBS -m bae
    /usr/local/bin/R CMD BATCH Chi-square.R
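Once a PBS script like the one on this slide is saved to a file, it is handed to the queue with qsub. The following is a minimal sketch, assuming a hypothetical file name chisq.pbs and a placeholder email address; it writes the single-R-job script to a temporary directory and submits it only if qsub is actually on the PATH.

```shell
#!/bin/bash
# Write the single-R-job PBS script to a file (chisq.pbs is a hypothetical name).
workdir=$(mktemp -d)
script="$workdir/chisq.pbs"

# The quoted 'EOF' stops the shell from expanding anything inside the script.
cat > "$script" <<'EOF'
#!/bin/bash
#PBS -N Chi-square
#PBS -q batch
#PBS -M username@sfu.ca
#PBS -m bae
/usr/local/bin/R CMD BATCH Chi-square.R
EOF

# Submit only when the queuing software is available (i.e. on the cluster).
if command -v qsub >/dev/null 2>&1; then
  qsub "$script"
else
  echo "qsub not found; script written to $script"
fi
```

On the cluster, qsub prints the job identifier; qstat can then be used to watch the job in the queue.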
8
Threading & Parallel Processing
Lots of statistical jobs can use parallel processing, but threading is much less common
Threading: sending calls to subroutines out to separate CPUs for simultaneous processing
Parallel processing: separate CPUs performing similar, but independent jobs
– Simulation
– Bootstrapping
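The "similar, but independent jobs" pattern can be tried on a single multi-core machine before moving to the cluster. This is a sketch, not the deck's method: it starts four background processes, each running the same toy simulation (the mean of uniform random draws, computed with awk) under a different seed, then waits for all of them. The function name simulate and the output file names are hypothetical.

```shell
#!/bin/bash
outdir=$(mktemp -d)

# One independent replicate: $1 = random seed, $2 = number of uniform draws.
simulate() {
  awk -v seed="$1" -v n="$2" 'BEGIN {
    srand(seed)
    s = 0
    for (i = 0; i < n; i++) s += rand()
    printf "%d %.4f\n", seed, s / n
  }'
}

# Launch the replicates in the background so they can run on separate CPUs.
for seed in 1 2 3 4; do
  simulate "$seed" 10000 > "$outdir/rep_$seed.txt" &
done

# Block until every background replicate has finished, then collect results.
wait
cat "$outdir"/rep_*.txt
```

Because no replicate needs another's output, the work splits cleanly; that independence is what makes simulation and bootstrapping easy to parallelise.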
9
Parallel Processing in Clusters
MPI – Message Passing Interface
– Software which manages how the parallel (or threaded) processes are sent arguments and return their results
Use the Rmpi package to construct a parallel job in R, then use mpirun to send that job to CPUs on the cluster
– Master
– Slaves
10
Example PBS Script for RMPI Job (Stat Cluster)
    #!/bin/bash
    #PBS -N Cox_MPI
    #PBS -q default
    #PBS -M dthompso@stat.sfu.ca
    #PBS -j oe
    #PBS -o cox_mpi.out
    #PBS -e cox_mpi.err
    #PBS -d /home/math2/dthompso/RMPIex
    #PBS -m ea
    #PBS -l nodes=4:ppn=8
    # The mpirun command line is rather complicated, so we define it here. DO NOT CHANGE!
    # (Note: "set VAR=value" is csh/tcsh syntax; if the job runs under bash, write MPIRUN="..." without "set".)
    set MPIRUN="mpirun -np 1 -hostfile $PBS_NODEFILE --mca btl ^openib,udapl --mca pls_rsh_agent /usr/bin/ssh"
    # Here is where you specify the executable you want the cluster to run.
    $MPIRUN /math/local2-linux/stat/bin/R --vanilla --no-save --no-restore -f RMPI_cox_test.R
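The script requests nodes=4:ppn=8 and passes $PBS_NODEFILE to mpirun as the host file. Inside a job, that file normally lists one host name per allocated processor, so counting its lines gives the number of slots available to the R master and its slaves. The sketch below assumes that layout; outside a PBS job it builds a stand-in file with hypothetical host names node01 to node04 so it can be run anywhere.

```shell
#!/bin/bash
# Outside a PBS job PBS_NODEFILE is unset, so build a stand-in that mimics
# nodes=4:ppn=8 (host names are hypothetical).
if [ -z "${PBS_NODEFILE:-}" ]; then
  PBS_NODEFILE=$(mktemp)
  for node in node01 node02 node03 node04; do
    i=0
    while [ "$i" -lt 8 ]; do
      echo "$node"
      i=$((i + 1))
    done
  done > "$PBS_NODEFILE"
fi

# One line per processor slot; unique host names give the node count.
nslots=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
nnodes=$(sort -u "$PBS_NODEFILE" | wc -l | tr -d ' ')
echo "slots: $nslots on $nnodes nodes"
```

With the stand-in file this reports 32 slots on 4 nodes, matching 4 nodes x 8 processors per node.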
11
R Code for an RMPI Job
Components of RMPI R code:
– initialization of slaves
– creation of function to pass to slaves
– submitting data & functions to the slaves
– output results
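The four components listed above could be laid out roughly as follows. This is a hedged skeleton, not the deck's RMPI_cox_test.R: the file name rmpi_sketch.R, the function boot_mean, the data x, and the slave count of 4 are all assumptions made for illustration. The shell snippet only writes the R file; running it requires R with the Rmpi package on a machine with MPI.

```shell
#!/bin/bash
# Write a hypothetical Rmpi skeleton to a temporary directory.
workdir=$(mktemp -d)
rfile="$workdir/rmpi_sketch.R"

# The quoted 'EOF' keeps the shell from touching the R code.
cat > "$rfile" <<'EOF'
library(Rmpi)

# 1. Initialization of slaves (4 is an arbitrary choice for this sketch)
mpi.spawn.Rslaves(nslaves = 4)

# 2. Function to pass to the slaves: one bootstrap replicate of the mean
boot_mean <- function(i, x) mean(sample(x, replace = TRUE))

# 3. Submit data and function to the slaves, then farm out the replicates
x <- rnorm(100)
mpi.bcast.Robj2slave(x)
mpi.bcast.Robj2slave(boot_mean)
res <- mpi.parSapply(1:1000, boot_mean, x = x)

# 4. Output results, then shut the slaves down cleanly
print(summary(res))
mpi.close.Rslaves()
mpi.quit()
EOF

echo "R skeleton written to $rfile"
```

A real job would also want independent random-number streams on each slave; that detail is left out here to keep the outline short.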
12
R Packages on the Cluster
Some packages are already installed – check!
1. Run an interactive session
    qsub -I
2. Start R
    (stat cluster) R
    (IRMACS cluster) /usr/local/bin/R
3. Attempt to access the library
    library( )
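The same check can be scripted instead of typed into an interactive R session. A minimal sketch, assuming R is on the PATH; the helper name check_r_package is hypothetical, and Rmpi is used only as an example package name. It asks R whether the package appears in installed.packages() and reports the answer through the exit status.

```shell
#!/bin/bash
# Exit status: 0 = package installed, 1 = not installed, 2 = R not found.
check_r_package() {
  pkg="$1"
  if ! command -v R >/dev/null 2>&1; then
    return 2
  fi
  R --vanilla --slave -e \
    "quit(status = if ('$pkg' %in% rownames(installed.packages())) 0 else 1)"
}

status=0
check_r_package Rmpi || status=$?
case "$status" in
  0) echo "Rmpi is installed" ;;
  1) echo "Rmpi is not installed" ;;
  *) echo "R was not found on the PATH" ;;
esac
```

On the IRMACS cluster the slide starts R as /usr/local/bin/R, so the helper would need that directory on the PATH or the full path written in.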
13
R Packages on the Cluster (cont’d)
If you need a new package
1. Download the binary and place the gz file on the cluster (Fugu or scp)
2. Make a package installation folder:
    mkdir $HOME/R
    mkdir $HOME/R/x86_64-unknown-linux-gnu-library
    mkdir $HOME/R/x86_64-unknown-linux-gnu-library/2.7
3. Install:
    cd $HOME
    R CMD INSTALL PACKAGENAME.gz
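Steps 2 and 3 can be collapsed into a short script. This sketch assumes the 2.7 in the folder name is the R version on the cluster, as on the slide, and keeps PACKAGENAME.gz as the slide's placeholder. It uses mkdir -p to create the nested folders in one call and passes the folder explicitly with -l, which is a choice made here rather than something the slide does.

```shell
#!/bin/bash
# Folder layout from the slide; 2.7 is assumed to match the cluster's R version.
R_VERSION="2.7"
LIB_DIR="$HOME/R/x86_64-unknown-linux-gnu-library/$R_VERSION"

# -p creates every missing parent folder and does nothing if they already exist.
mkdir -p "$LIB_DIR"

# Placeholder file name from the slide; replace with the real package file.
PKG_FILE="PACKAGENAME.gz"

# Install into the personal library only if both R and the file are present.
if command -v R >/dev/null 2>&1 && [ -f "$PKG_FILE" ]; then
  R CMD INSTALL -l "$LIB_DIR" "$PKG_FILE"
else
  echo "Skipping install: R or $PKG_FILE not found"
fi
```

Naming the library with -l makes the destination explicit, so the install does not depend on which library R happens to pick first.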
14
Helpful online resources
RMPI
    http://math.acadiau.ca/ACMMaC/Rmpi/index.html
Stat Cluster
    http://www.stat.sfu.ca/~mtpratol/computing.html
IRMACS Cluster
    http://www.irmacs.sfu.ca/infrastructure/cluster/support
    http://simon.bonners.ca/blog/blog5.php/irmacs
WestGrid
    http://www.westgrid.ca/