Advanced Topics Cluster Training
Center for Simulation and Modeling
September 4, 2015
Advanced Cluster Usage Topics
- Advanced Linux usage: file editing, functions, exit codes
- Frank: resource monitoring
- Frank: array jobs
- Frank: dependencies
- Frank: local scratch
- MPI: job submission
ADVANCED BASH SCRIPTING
Files and editors
Many editors to choose from:
- vim
- emacs
- nano
Nano is the easiest. To enable shell syntax highlighting, do this once:
  echo "include /usr/share/nano/sh.nanorc" > ~/.nanorc
BASH: Input and Output
Simple scripts are commands in a file:
- Run through the shell: sh script.sh
- Or include #!/bin/bash at the top and make the file executable
Input arguments: ./script.sh arg1 arg2
- $1 is the first argument
- $2 is the second
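A minimal sketch of positional parameters; `set --` is used here only to simulate the arguments a caller would pass on the command line:

```shell
#!/bin/bash
# Positional parameters: $1 is the first argument, $2 the second,
# $# is the count. "set --" replaces them, simulating "./script.sh 3 4".
set -- 3 4
echo "first:  $1"
echo "second: $2"
echo "count:  $#"
echo "sum:    $(( $1 + $2 ))"
```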
BASH: loops
Iterate over collections with for.
Numeric ranges:
- for VARIABLE in 1 2 3 4 5
- for i in {1..5}
- for (( c=1; c<=5; c++ ))
File names:
- for VARIABLE in file1 file2 file3
Command output:
- for OUTPUT in $(Linux-Or-Unix-Command-Here)
http://swcarpentry.github.io/shell-novice/04-loop.html
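The loop forms above side by side, as a runnable sketch:

```shell
#!/bin/bash
# The same 1..5 iteration written three ways; each prints "1 2 3 4 5"
for i in 1 2 3 4 5; do printf '%s ' "$i"; done; echo
for i in {1..5}; do printf '%s ' "$i"; done; echo
for (( c=1; c<=5; c++ )); do printf '%s ' "$c"; done; echo
# Iterating over command output: one word per iteration
for w in $(echo alpha beta gamma); do
    echo "word: $w"
done
```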
BASH: conditionals
Type man test for the testing commands ([ is the same command):
  [ VAR1 OPERATOR VAR2 ]
- Integers: -eq, -ge, -gt, -le, -lt, -ne
- Strings: =, !=
- Files: -nt, -ot
- Check for existence: -e (files), -d (directories)
BASH: conditionals
Taking action with conditionals:
  if [ CONDITION1 ]; then
      #do something
  elif [ CONDITION2 ]; then
      #do something else
  else
      #just do this
  fi
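Putting the test operators and the if/elif/else chain together, a small sketch (the `classify` helper is illustrative, not from the training material):

```shell
#!/bin/bash
# classify prints a label for an integer, exercising the integer
# operators from "man test" inside an if/elif/else chain
classify() {
    if [ "$1" -lt 0 ]; then
        echo "negative"
    elif [ "$1" -eq 0 ]; then
        echo "zero"
    else
        echo "positive"
    fi
}
classify -3    # negative
classify 0     # zero
classify 7     # positive
# File tests: -d is true for directories, -e for anything that exists
[ -d /tmp ] && echo "/tmp is a directory"
```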
Bash loops
ssh frank.sam.pitt.edu
/home/sam/training/loops
Bash functions
- Modularize your commands for reusability
- Functions must be defined before they are first called
- Same rules for input arguments ($1, $2, ...)
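A minimal function sketch showing definition-before-use and script-style arguments (the `greet` name and its default are illustrative):

```shell
#!/bin/bash
# Define the function before its first call; it reads arguments
# exactly as a script does ($1, $2, ...)
greet() {
    local name=$1             # "local" keeps name out of the global scope
    echo "Hello, ${name:-world}"
}
greet Frank                   # prints: Hello, Frank
greet                         # no argument: prints: Hello, world
```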
Bash functions
ssh frank.sam.pitt.edu
/home/sam/training/functions
Exit codes
Unix processes have "exit codes":
- Inform the user whether execution was successful
- Control the processing and collection of data
- 0 means "all is well"
Beware of false positives!
- The exit code is stored in $?
- set -o pipefail
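A sketch of `$?` and the pipeline false positive that `set -o pipefail` fixes:

```shell
#!/bin/bash
# $? holds the exit code of the previous command (0 = success)
true
echo "true exited with $?"     # 0
false
echo "false exited with $?"    # 1
# By default a pipeline reports only the LAST command's exit code:
false | true
echo "without pipefail: $?"    # 0 -- the failure is hidden
# pipefail propagates a failure from any stage of the pipeline
set -o pipefail
false | true
echo "with pipefail: $?"       # 1
```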
Exit codes
ssh frank.sam.pitt.edu
/home/sam/training/exits
Resource utilization
man pbs_resources
- -l mem=<size>gb : job will die if exceeded; defaults at http://core.sam.pitt.edu/frank/batch#The_Frank_Queues
- -l ddisk=<size>gb
checkjob shows:
- Parallel efficiency
- Memory usage
- Swap usage
- Scheduling details
Resource utilization
/home/sam/training/resources
Specialized Commands
- pcmd : run a program on all nodes
- prun : wrapper for mpirun/mpiexec/mpdrun/charmrun
  - pernode/npernode
  - $OMP_NUM_THREADS
- ssh n[0-9]* : direct access to a compute node
- /scr/.clusman0.localdomain
FILE SYSTEMS
File systems
Both MPI and Frank share:
- $HOME : 100 GB per user
- /mnt/mobydisk/groupshares
  - Request access in a ticket
  - More space available per group
  - Expected to be faster by end of year
- /pan
  - Data is already on /mnt/mobydisk
  - Will be retired soon
Array jobs
Array jobs
- -t x-y,z%n : %n means only allow n elements to run concurrently
- $PBS_ARRAYID : the array counter, e.g. file.$PBS_ARRAYID.input
- qstat -t : view array jobs (will not show in pbstop right now)
- qdel 'JOBID[]' deletes all array elements; qdel 'JOBID[x]' deletes element x
  - qdel also accepts -t to select a range of elements
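A sketch of a complete array job script; the walltime, executable path, and file naming are hypothetical, following the slide's file.$PBS_ARRAYID.input pattern:

```shell
#!/bin/bash
#PBS -t 1-10%3                 # elements 1..10, at most 3 running at once
#PBS -l walltime=00:10:00      # hypothetical walltime
cd $PBS_O_WORKDIR
# Each array element picks its own input file via the array counter
/path/to/executable file.$PBS_ARRAYID.input > file.$PBS_ARRAYID.output
```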
Array jobs
ssh frank.sam.pitt.edu
/home/sam/training/arrays
Job dependency
-W depend=type:jobid[,type:jobid...]
Dependency types:
- syncwith
- [before/after]
- [before/after]ok
- [before/after]notok
- [before/after]any
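A sketch of chaining jobs with dependencies; the script names are hypothetical:

```shell
# qsub prints the new job's ID; capture it to build the chain
JOB1=$(qsub preprocess.pbs)
# main.pbs starts only if preprocess.pbs exits successfully (afterok)
JOB2=$(qsub -W depend=afterok:$JOB1 main.pbs)
# cleanup.pbs runs once main.pbs ends, regardless of exit status (afterany)
qsub -W depend=afterany:$JOB2 cleanup.pbs
```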
Using local scratch
Every node has 1-2 TB of local disk:
- Speeds up file reads and writes
- No competition with other users
- Requires moving data before and after computation
- Data is deleted immediately after completion
- Easy access to this data using $LOCAL
$LOCAL
#!/bin/bash
cd $PBS_O_WORKDIR
#copy all input data to every node
pcmd rsync -aP * $LOCAL
#process data in $LOCAL
cd $LOCAL
/path/to/executable > $PBS_O_WORKDIR/output
#copy back important data from master
rsync -aP * $PBS_O_WORKDIR
Using traps
Premature termination of a job:
- Useful data may not get copied back
- The trap will execute after a termination signal
  - qdel from user
  - Walltime limit reached
  - Error in the program
- The trap will only have 5 seconds to execute
- Carefully plan data copies; don't rely on the trap
$LOCAL
#!/bin/bash
cd $PBS_O_WORKDIR
#copy all input data to every node
pcmd rsync -aP * $LOCAL
#set the trap before the computation so it can fire on early termination:
#copy back really important data from master only
trap "rsync -aP restart-file $PBS_O_WORKDIR" EXIT
#process data in $LOCAL
cd $LOCAL
/path/to/executable > $PBS_O_WORKDIR/output
#copy back important data from master
rsync -aP * $PBS_O_WORKDIR
$SCRATCH
ssh frank.sam.pitt.edu
/home/sam/training/scratch
THE MPI CLUSTER
MPI cluster
Intended for Distributed Memory Parallel codes
- Message Passing Interface (MPI)
http://core.sam.pitt.edu/MPIcluster
The MPI Cluster
New modules with Spack:
- CP2K
- VASP
- METIS / PARMETIS
- TRILINOS
- GAMESS
MPI: Slurm
sbatch, salloc, #SBATCH
- -N : total number of nodes (>= 2)
- --tasks-per-node : number of MPI ranks per node
- --cpus-per-task : controls threading expectations
  - $OMP_NUM_THREADS must be set manually; it defaults to 20!
MPI: Slurm
sbatch:
- Submit a batch script that contains #SBATCH declarations
- Arguments on the command line override #SBATCH
- srun is required for all compute tasks
salloc:
- Submit a job for interactive use
- Shell is returned on the LOGIN NODE
- srun is required to run a compute task
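A sketch of a batch script tying the options above together; the node/task counts and executable path are hypothetical:

```shell
#!/bin/bash
#SBATCH -N 2                   # total nodes; the MPI cluster expects >= 2
#SBATCH --tasks-per-node=20    # MPI ranks per node (hypothetical values)
#SBATCH --cpus-per-task=1
export OMP_NUM_THREADS=1       # set manually; otherwise it defaults to 20
# srun is required for all compute tasks; prun/mpirun will not work here
srun /path/to/mpi_executable
```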
MPI: Slurm
Use srun to launch all compute tasks:
- prun and mpirun will not work
- Nodes, tasks and cpus are imported from sbatch and salloc
  - Can be overridden with each srun command
  - Cannot change the number of nodes
MPI: Slurm
Scratch usage:
- sbcast : single file
- rsync : single node by default; use srun to rsync to all nodes
- srun --chdir=$LOCAL ...
http://core.sam.pitt.edu/MPIcluster#Local_Scratch_directory
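Inside a job script, the staging steps above might look like this sketch; the file and directory names are hypothetical:

```shell
# Broadcast a single file to every node's local scratch
sbcast input.dat $LOCAL/input.dat
# rsync alone only touches the node it runs on; launch one copy
# per node with srun to reach them all
srun --ntasks-per-node=1 rsync -aP data/ $LOCAL/
# Run the computation with local scratch as the working directory
srun --chdir=$LOCAL /path/to/executable
```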
MPI: Scratch
ssh mpi.sam.pitt.edu
/home/sam/training/mpi/scratch
MPI: Affinity
HPC applications run best when processes are bound to cores:
- Eliminates context switching
- Controls memory access
Not all cores have the same access to memory.
- srun --cpu_bind=cores is usually the best choice
http://core.sam.pitt.edu/MPIcluster#Process_affinity
http://blogs.cisco.com/performance/process-and-memory-affinity-why-do-you-care
MPI hands-on
ssh mpi.sam.pitt.edu
/home/sam/training/mpi/hybridPi
Pro tips
ssh keys for passwordless login:
- ssh-keygen
- Add the contents of ~/.ssh/id_dsa.pub to authorized_keys
Persistent sessions:
- tmux
- Leave interactive jobs running
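The key setup can be sketched as below; it runs against a scratch directory so nothing in the real ~/.ssh is touched (for actual use, generate into ~/.ssh and append to ~/.ssh/authorized_keys). The slide names a DSA key; rsa is used here since newer OpenSSH releases may refuse dsa:

```shell
# Generate a key pair with an empty passphrase in a scratch directory
tmp=$(mktemp -d)
ssh-keygen -t rsa -N '' -q -f "$tmp/id_rsa"
# Authorizing a key means appending the PUBLIC half to authorized_keys,
# which must not be readable by other users
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"
chmod 600 "$tmp/authorized_keys"
```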