Using the BYU Supercomputers
Resources
Basic Usage
After your account is activated:
– ssh in; you will be logged in to an interactive node
– Jobs that run on the supercomputer are submitted to the batch queuing system
– You can develop code on the interactive nodes
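For example (the hostname here is a placeholder; use the address provided when your account was activated):

  ssh yournetid@login.example.edu   # hypothetical login host and username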
Running Jobs
The process:
– User creates a shell script (see the sketch below) that will:
  - tell the scheduler what is needed
  - run the user's job
– User submits the shell script to the batch scheduler queue
– Machines register with the scheduler, offering to run jobs
– Scheduler allocates jobs to machines and tracks them
– The shell script is run on the first node of the group of nodes assigned to the job
– When the job finishes, all stdout and stderr are collected and returned to the user in files
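A minimal sketch of such a script, assuming a program named ./myjob in the submission directory (a fuller example appears on the Job Submission Scripts slide):

  #!/bin/bash
  #SBATCH --time=00:10:00   # tell the scheduler what is needed: 10 minutes of walltime
  #SBATCH --ntasks=1        # ...and a single processor core
  ./myjob                   # run the user's job (hypothetical program name)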
Scheduling Jobs
Basic commands:
– sbatch scheduling_shell_script
– squeue [-u username]
– scancel jobnumber
– sacct [-l]
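A typical session with these commands might look like this (job.sh and the job number 12345 are made up for illustration):

  sbatch job.sh        # submit the script; the scheduler replies with a job number
  squeue -u $USER      # list your jobs in the queue
  scancel 12345        # cancel job 12345 if it is no longer needed
  sacct -l -j 12345    # detailed accounting for job 12345 after it finishes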
Job Submission Scripts

  #!/bin/bash
  #SBATCH --time=01:00:00       # walltime
  #SBATCH --ntasks=64           # number of processor cores (i.e. tasks)
  #SBATCH --nodes=1             # number of nodes
  #SBATCH --mem-per-cpu=1024M   # memory per CPU core
  #SBATCH -J "test"             # job name
  #SBATCH --mail-user=...       # email address
  #SBATCH --mail-type=END

  # Compatibility variables for PBS. Delete if not needed.
  export PBS_NODEFILE=`/fslapps/fslutils/generate_pbs_nodefile`
  export PBS_JOBID=$SLURM_JOB_ID
  export PBS_O_WORKDIR="$SLURM_SUBMIT_DIR"
  export PBS_QUEUE=batch

  # Set the max number of threads to use for programs using OpenMP.
  # Should be <= ppn. Does nothing if the program doesn't use OpenMP.
  export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE

  mpirun hello

Script generator: documentation/slurm/script-generator
Viewing Your Jobs

  -bash-4.1$ sbatch hello.pbs
  Submitted batch job
  -bash-4.1$ squeue -u mjc22
  JOBID  PARTITION  NAME     USER   ST  TIME  NODES  NODELIST(REASON)
         m6n        mpitest  mjc22  CG  0:05  2      m6-18-[6-7]
  -bash-4.1$ sacct -j
  JobID  JobName  Partition  Account  AllocCPUS  State      ExitCode
         mpitest  m6n        mjc22    30         COMPLETED  0:
  bat+   batch               mjc22    30         COMPLETED  0:
         orted               mjc22    2          COMPLETED  0:0
Developing Code
Normal Linux code development tools:
– gcc, g++, gdb, etc.
Intel compiler:
– icc, ifort
Editing:
– vi
– emacs
– edit on your own machine and transfer
Parallel code development:
– icc -openmp
– gcc -fopenmp
– mpicc
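As a sketch, building a serial, OpenMP, or MPI version of a program might look like this (hello.c is a placeholder source file):

  gcc hello.c -o hello              # ordinary serial build
  gcc -fopenmp hello.c -o hello     # GNU compiler with OpenMP enabled
  icc -openmp hello.c -o hello      # Intel compiler with OpenMP enabled
  mpicc hello.c -o hello            # MPI wrapper around the C compiler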
Output
stderr and stdout from each node are collected into files:
– Jobname.oJOBNUM
– Jobname.eJOBNUM

  -bash-3.2$ less slurm out
  I am proc 7 of 30 running on m
  Sending messages
  Receiving messages
  I am proc 8 of 30 running on m
  Sending messages
  I am proc 9 of 30 running on m
  Sending messages
  I am proc 10 of 30 running on m
  Sending messages
  I am proc 0 of 30 running on m
  Sending messages
  Receiving messages
  0: 1: Hello
  0: 2: Hello
  0: 3: Hello
  0: 4: Hello
  0: 5: Hello
  0: 6: Hello
  0: 7: Hello
  0: 8: Hello
  0: 9: Hello
  0: 10: Hello
  I am proc 1 of 30 running on m6-18-6
Backfill Scheduling
[Figure: timeline of Jobs A, B, C, and D on a 10-node system, illustrating how backfill scheduling fits smaller jobs into otherwise idle nodes]
Backfill Scheduling
– Requires a real time limit to be set on each job (see the example below)
– A more accurate (shorter) estimate gives your job a better chance of running earlier
– Short jobs can move through the system more quickly
– Uses the system better by avoiding wasted cycles while large jobs wait
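For example, a job known to finish within ten minutes benefits from requesting only that much walltime; the values below are illustrative:

  #!/bin/bash
  #SBATCH --time=00:10:00   # tight, realistic limit: makes the job a good backfill candidate
  #SBATCH --ntasks=1
  ./short_job               # hypothetical short-running program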