Published by Gerard Jones. Modified over 9 years ago.
1
Using the BYU Supercomputers
2
Resources
3
Basic Usage
After your account is activated:
– ssh yourid@ssh.fsl.byu.edu
You will be logged in to an interactive node
– Jobs that run on the supercomputer are submitted to the batch queuing system
You can develop code on the interactive nodes
4
Running Jobs
The process
– User creates a shell script that will:
    tell the scheduler what is needed
    run the user’s job
– User submits the shell script to the batch scheduler queue
– Machines register with the scheduler, offering to run jobs
– Scheduler allocates jobs to machines and tracks the jobs
– The shell script is run on the first node of the group of nodes assigned to a job
– When finished, all stdout and stderr are collected and given back to the user in files
5
Scheduling Jobs
Basic commands
– sbatch scheduling_shell_script
– squeue [-u username]
– scancel jobnumber
– sacct [-l]
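A minimal sketch of how the basic commands fit together in a script. The function name submit_job and the script name hello.sh are hypothetical. The --parsable option is a standard sbatch flag that prints only the job ID (optionally followed by ";cluster") instead of the usual "Submitted batch job ..." sentence.

```shell
#!/bin/bash
# Submit a job script and print only the numeric job ID.
# Assumes sbatch is on the PATH (true on a Slurm login node).
submit_job() {
    local script="$1" out
    out=$(sbatch --parsable "$script") || return 1
    # Strip an optional ";cluster" suffix from the --parsable output.
    printf '%s\n' "${out%%;*}"
}

# Typical use on an interactive node (hello.sh is a hypothetical script):
#   jobid=$(submit_job hello.sh)
#   squeue -j "$jobid"     # is the job pending or running?
#   scancel "$jobid"       # cancel it if something is wrong
#   sacct -j "$jobid"      # accounting record once it has ended
```

Capturing the job ID this way avoids parsing the human-readable sbatch message.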
6
Job Submission Scripts
#!/bin/bash
#SBATCH --time=01:00:00   # walltime
#SBATCH --ntasks=64   # number of processor cores (i.e. tasks)
#SBATCH --nodes=1   # number of nodes
#SBATCH --mem-per-cpu=1024M   # memory per CPU core
#SBATCH -J "test"   # job name
#SBATCH --mail-user=myemail@gmail.com   # email address
#SBATCH --mail-type=END

# Compatibility variables for PBS. Delete if not needed.
export PBS_NODEFILE=`/fslapps/fslutils/generate_pbs_nodefile`
export PBS_JOBID=$SLURM_JOB_ID
export PBS_O_WORKDIR="$SLURM_SUBMIT_DIR"
export PBS_QUEUE=batch

# Set the max number of threads to use for programs using OpenMP. Should be <= ppn. Does nothing if the program doesn't use OpenMP.
export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE

mpirun hello

https://marylou.byu.edu/documentation/slurm/script-generator
7
Viewing Your Jobs
-bash-4.1$ sbatch hello.pbs
Submitted batch job 7295257
-bash-4.1$ squeue -u mjc22
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
7295257       m6n  mpitest    mjc22 CG       0:05      2 m6-18-[6-7]
-bash-4.1$ sacct -j 7295257
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
7295257         mpitest        m6n      mjc22         30  COMPLETED      0:0
7295257.bat+      batch                 mjc22         30  COMPLETED      0:0
7295257.0         orted                 mjc22          2  COMPLETED      0:0
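When a script needs the job state rather than a table for a human to read, sacct can emit pipe-delimited output. The sketch below is one way to pull out the state of the top-level job record. The function name job_state is hypothetical; --parsable2, --noheader, and --format are standard sacct options.

```shell
#!/bin/bash
# Print the State field of the top-level record for a job.
# Reads pipe-delimited text on stdin, as produced by:
#   sacct -j JOBID --parsable2 --noheader --format=JobID,State,ExitCode
job_state() {
    local jobid="$1"
    # Match only the exact job ID, so step records such as
    # JOBID.batch and JOBID.0 are skipped.
    awk -F'|' -v id="$jobid" '$1 == id { print $2; exit }'
}

# Typical use on a login node:
#   sacct -j 7295257 --parsable2 --noheader --format=JobID,State,ExitCode \
#       | job_state 7295257
```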
8
Developing Code
Normal Linux code development tools
– gcc, g++, gdb, etc.
Intel compiler
– icc, ifort
Editing
– vi
– emacs
– edit on your own machine and transfer
Parallel code development
– icc -openmp
– gcc -fopenmp
– mpicc
9
Output
stderr and stdout from each node are collected into files
– Jobname.oJOBNUM
– Jobname.eJOBNUM
-bash-3.2$ less slurm-7295257.out
I am proc 7 of 30 running on m6-18-6
Sending messages
Receiving messages
I am proc 8 of 30 running on m6-18-6
Sending messages
I am proc 9 of 30 running on m6-18-6
Sending messages
I am proc 10 of 30 running on m6-18-6
Sending messages
I am proc 0 of 30 running on m6-18-6
Sending messages
Receiving messages
0: 1: Hello
0: 2: Hello
0: 3: Hello
0: 4: Hello
0: 5: Hello
0: 6: Hello
0: 7: Hello
0: 8: Hello
0: 9: Hello
0: 10: Hello
I am proc 1 of 30 running on m6-18-6
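Output from parallel processes lands in the file in whatever order it arrives, which is why proc 7 appears before proc 0 in the listing. A small sketch for reading such a file by rank, assuming the "I am proc N of M ..." line format of this particular example program (the function name ranks_in_order is hypothetical):

```shell
#!/bin/bash
# Keep only the "I am proc N of M ..." lines from stdin and sort them
# numerically by rank, which is the 4th whitespace-separated field.
ranks_in_order() {
    grep '^I am proc ' | sort -k4,4n
}

# Typical use:
#   ranks_in_order < slurm-7295257.out
```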
10
Backfill Scheduling
[Figure: timeline of a 10-node system showing where Jobs A, B, C, and D are placed over time]
11
Backfill Scheduling
Requires a real time limit to be set
A more accurate (shorter) estimate gives a job a better chance of running earlier
Short jobs can move through the system more quickly
Uses the system better by avoiding wasted cycles while jobs wait
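The core test behind backfill can be sketched in a few lines: a waiting job may start early only if it fits on the currently idle nodes and its requested time limit ends before the reserved start of the job ahead of it. This is a simplified illustration, not Slurm's actual scheduler logic, which also weighs priorities, partitions, and other resources. The function name can_backfill and the numbers are hypothetical.

```shell
#!/bin/bash
# can_backfill IDLE_NODES GAP_MINUTES JOB_NODES JOB_MINUTES
# Succeeds (exit status 0) if the job fits on the idle nodes AND its
# requested time limit is no longer than the gap before the next
# reserved job is due to start.
can_backfill() {
    local idle_nodes="$1" gap_minutes="$2" job_nodes="$3" job_minutes="$4"
    [ "$job_nodes" -le "$idle_nodes" ] && [ "$job_minutes" -le "$gap_minutes" ]
}

# Example: 4 nodes are idle for 90 minutes until a large job is due.
if can_backfill 4 90 2 60; then
    echo "2 nodes, 60 min requested: can be backfilled"
fi
if ! can_backfill 4 90 2 240; then
    echo "2 nodes, 240 min requested: must wait"
fi
```

The same 2-node job is accepted with a 60-minute limit and rejected with a 240-minute limit, which is why a tighter estimate improves a job's chance of running earlier.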