
1 Advanced topics Cluster Training Center for Simulation and Modeling September 4, 2015

2 Advanced Cluster Usage Topics
– Advanced Linux usage: file editing, functions, exit codes
– Frank: resource monitoring
– Frank: array jobs
– Frank: dependencies
– Frank: local scratch
– MPI: job submission

3 ADVANCED BASH SCRIPTING

4 Files and editors
Many editors to choose from
– vim
– emacs
– nano
Nano is the easiest
– Do this once: echo "include /usr/share/nano/sh.nanorc" > ~/.nanorc

5 BASH: Input and Output
Simple scripts are commands in a file
– Run through the shell: sh script.sh
– Or include #!/bin/bash at the top and make the script executable
Input arguments
– ./script.sh arg1 arg2
– $1 is the first argument
– $2 is the second
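A minimal sketch of such a script (the file name greet.sh and its arguments are illustrative):

#!/bin/bash
# greet.sh - print the first two positional arguments
echo "Hello, $1 and $2"

# make it executable and run it:
#   chmod +x greet.sh
#   ./greet.sh alice bob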

6 BASH: loops
Iterate over collections with for
Numeric ranges
– for VARIABLE in 1 2 3 4 5
– for i in {1..5}
– for (( c=1; c<=5; c++ ))
File names
– for VARIABLE in file1 file2 file3
Command output
– for OUTPUT in $(Linux-Or-Unix-Command-Here)
http://swcarpentry.github.io/shell-novice/04-loop.html
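A short sketch combining the loop forms above (the *.txt glob is illustrative):

#!/bin/bash
# loop over a numeric range
for i in {1..5}; do
  echo "iteration $i"
done
# loop over file names matched by a glob
for f in *.txt; do
  wc -l "$f"
done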

7 BASH: conditionals
Type man test to see the available tests ( [ is the same command as test )
[ VAR1 OPERATOR VAR2 ]
– Integers: -eq, -ge, -gt, -le, -lt, -ne
– Strings: =, !=
– Files: -nt, -ot
– Check for existence: -e (files), -d (directories)

8 BASH: conditionals
Taking action with conditionals:
if [ CONDITION1 ]; then
  #do something
elif [ CONDITION2 ]; then
  #do something else
else
  #just do this
fi
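A runnable sketch using the test operators above (the file name results.dat is hypothetical):

#!/bin/bash
outfile="results.dat"
if [ ! -e "$outfile" ]; then
  echo "no results yet"
elif [ $(wc -l < "$outfile") -eq 0 ]; then
  echo "results file is empty"
else
  echo "results ready"
fi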

9 ssh frank.sam.pitt.edu /home/sam/training/loops Bash functions

10 Bash functions
– Modularize your commands for reusability
– Functions must be defined before they are called (typically at the top of the script)
– Input arguments follow the same rules as script arguments
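A minimal sketch of a bash function (the name and body are illustrative):

#!/bin/bash
# define the function before it is called
say_hello() {
  local name=$1        # function arguments are $1, $2, ... just like script arguments
  echo "Hello, $name"
}
say_hello "world"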

11 ssh frank.sam.pitt.edu /home/sam/training/functions Bash functions

12 Exit codes
Unix processes have "exit codes"
– Inform the user whether execution was successful
– Control the processing and collection of data
– 0 means "all is well"
Beware of false positives!
– The exit code is stored in $?
– set -o pipefail makes a pipeline fail if any command in it fails
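A short sketch of checking $? and using pipefail (the commands and file names are illustrative):

#!/bin/bash
set -o pipefail                       # the pipeline now reports failure if any stage fails
grep pattern input.txt | sort > sorted.txt
if [ $? -ne 0 ]; then
  echo "pipeline failed" >&2
  exit 1
fi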

13 ssh frank.sam.pitt.edu /home/sam/training/exits Exit codes

14 Resource utilization
man pbs_resources
– -l mem=<N>gb : the job will be killed if the memory limit is exceeded
  Defaults at http://core.sam.pitt.edu/frank/batch#The_Frank_Queues
– -l ddisk=<N>gb
checkjob
– Parallel efficiency
– Memory usage
– Swap usage
– Scheduling details
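For context, a PBS header requesting resources might look like the following sketch (the values and executable are illustrative, not site defaults):

#!/bin/bash
#PBS -l nodes=1:ppn=4
#PBS -l mem=8gb
#PBS -l ddisk=20gb
#PBS -l walltime=02:00:00
cd $PBS_O_WORKDIR
./my_program        # hypothetical executable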

15 /home/sam/training/resources Resource utilization

16 Specialized Commands
pcmd
– Run a program on all nodes
prun
– Wrapper for mpirun/mpiexec/mpdrun/charmrun
– pernode/npernode, $OMP_NUM_THREADS
ssh n[0-9]*
– Direct access to a compute node
– /scr/.clusman0.localdomain

17 FILE SYSTEMS

18 File systems
Both MPI and Frank share:
– $HOME : 100 GB per user
– /mnt/mobydisk/groupshares
  Request access in a ticket
  More space available per group
  Expected to be faster by end of year
– /pan
  Data is already on /mnt/mobydisk
  Will be retired soon

19 Array jobs

20 Array jobs
-t x-y,z%n
– %n means only allow n elements to run concurrently
$PBS_ARRAYID
– The array counter
– e.g. file.$PBS_ARRAYID.input
qstat -t
– To view array jobs
– (will not show in pbstop right now)
qdel JOBID[] / JOBID[x]
– Delete all array elements / a single element
– -t can be used to select a range of elements
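A sketch of an array job script (the program name analyze and the file naming are hypothetical):

#!/bin/bash
#PBS -t 1-10%3        # elements 1..10, at most 3 running at once
cd $PBS_O_WORKDIR
# each array element processes its own input file
./analyze file.$PBS_ARRAYID.input > file.$PBS_ARRAYID.output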

21 ssh frank.sam.pitt.edu /home/sam/training/arrays Array jobs

22 Job dependency
-W depend=<type>:<jobid>[,...]
– syncwith
– [before/after]
– [before/after]ok
– [before/after]notok
– [before/after]any
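A sketch of chaining two jobs with a dependency (the script names are hypothetical):

JOB1=$(qsub preprocess.pbs)
qsub -W depend=afterok:$JOB1 analyze.pbs   # starts only if the first job exits successfully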

23 Using local scratch
Every node has 1–2 TB of local disk
– Speeds up file reads and writes
– No competition with other users
– Requires moving data before and after the computation
– Data is deleted immediately after job completion
– Easy access to this data using $LOCAL

24 $LOCAL
#!/bin/bash
cd $PBS_O_WORKDIR
#copy all input data to every node
pcmd rsync -aP * $LOCAL
#process data in $LOCAL
cd $LOCAL
/path/to/executable > $PBS_O_WORKDIR/output
#copy back important data from master
rsync -aP * $PBS_O_WORKDIR

25 Using traps
Premature termination of a job
– Useful data may not get copied back
– The trap will execute after a termination signal:
  qdel from the user
  Walltime limit reached
  Error in the program
– The trap will only have 5 seconds to execute
  Carefully plan data copies
  Don't rely on the trap alone

26 $LOCAL
#!/bin/bash
cd $PBS_O_WORKDIR
#copy all input data to every node
pcmd rsync -aP * $LOCAL
#process data in $LOCAL
cd $LOCAL
#copy back really important data from master only, even if the job is
#terminated early (the trap is set before the long computation)
trap "rsync -aP restart-file $PBS_O_WORKDIR" EXIT
/path/to/executable > $PBS_O_WORKDIR/output
#copy back important data from master
rsync -aP * $PBS_O_WORKDIR

27 ssh frank.sam.pitt.edu /home/sam/training/scratch $SCRATCH

28 THE MPI CLUSTER

29 MPI cluster
Intended for distributed-memory parallel codes
– Message Passing Interface (MPI)
http://core.sam.pitt.edu/MPIcluster

30 The MPI Cluster
New modules built with Spack
– CP2K
– VASP
– METIS / PARMETIS
– TRILINOS
– GAMESS

31 MPI: Slurm
sbatch, salloc, #SBATCH
– -N : total number of nodes (>= 2)
– --tasks-per-node : number of MPI ranks per node
– --cpus-per-task : controls threading expectations
  $OMP_NUM_THREADS must be set manually (it defaults to 20!)
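A minimal #SBATCH sketch for a hybrid MPI/OpenMP job (the values and executable name are illustrative):

#!/bin/bash
#SBATCH -N 2
#SBATCH --tasks-per-node=10
#SBATCH --cpus-per-task=2
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # set explicitly; do not rely on the default
srun ./my_mpi_program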

32 MPI: Slurm
sbatch
– Submit a batch script that contains #SBATCH declarations
– Arguments on the command line override #SBATCH
– srun is required for all compute tasks
salloc
– Submit a job for interactive use
– Shell is returned on the LOGIN NODE
– srun is required to run a compute task

33 MPI: Slurm
Use srun to launch all compute tasks
– prun and mpirun will not work
– Nodes, tasks and cpus are imported from sbatch and salloc
  Can be overridden with each srun command
  Cannot change the number of nodes
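For example, inside an allocation the imported geometry can be overridden per step (the executables are hypothetical):

srun -n 4 ./preprocess        # override the task count for this step only
srun ./my_mpi_program         # use the values from sbatch/salloc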

34 MPI: Slurm
Scratch usage
– sbcast : single file
– rsync : single node by default
  Use srun to rsync to all nodes
– srun --chdir=$LOCAL ...
http://core.sam.pitt.edu/MPIcluster#Local_Scratch_directory
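A sketch of staging data to node-local scratch inside a batch script (paths and names are illustrative; $LOCAL is assumed to be defined by the site environment):

sbcast input.dat $LOCAL/input.dat                       # broadcast a single file to every node
srun --ntasks-per-node=1 rsync -aP data/ $LOCAL/data/   # or rsync once per node
srun --chdir=$LOCAL ./my_mpi_program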

35 ssh mpi.sam.pitt.edu /home/sam/training/mpi/scratch MPI: Scratch

36 MPI: Affinity
HPC applications run best when processes are bound to cores
– Eliminates context switching
– Controls memory access
  Not all cores have the same access to memory
– srun --cpu_bind=cores is usually the best choice
http://core.sam.pitt.edu/MPIcluster#Process_affinity
http://blogs.cisco.com/performance/process-and-memory-affinity-why-do-you-care
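For example (the executable name is hypothetical):

srun --cpu_bind=cores ./my_mpi_program    # bind each rank to a core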

37 ssh mpi.sam.pitt.edu /home/sam/training/mpi/hybridPi MPI hands-on

38 Pro tips
SSH keys for passwordless login
– ssh-keygen
– Add the contents of ~/.ssh/id_dsa.pub to ~/.ssh/authorized_keys
Persistent sessions
– tmux
– Leave interactive jobs running
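A sketch of the key setup (the DSA key type follows the slide; other key types also work):

ssh-keygen -t dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys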

