1
Queueing System Peter Wad Sackett
2
Using a Queueing System
Most supercomputers have some form of Queueing System in order to allocate the resources efficiently and fairly among the many users. Any Queueing System has ways of:
Putting a job in the execution queue (resource allocation)
Removing a job from the execution queue
Terminating a running job (user/system)
Supervising a running job (user/system)
Alerting the user to events (start, stop, error)
Acting on events
Accounting - it is not free, you know
The following slides go into more detail, using the Computerome Queueing System as an example.
3
Queueing System basics
Computerome has 540 thinnodes, each containing 2 CPUs with 14 cores per CPU and 128 GB RAM, and 27 fatnodes, each containing 4 CPUs with 8 cores per CPU and 1024 GB RAM. Furthermore, there are two login nodes, which technically correspond to fatnodes. All this compute power is served by a 9 PB non-distributed file system. A job (program) is submitted from one node (usually the login node), but executed on another. The Queueing System allocates the compute resources and decides when to run the job. The QS logs in on a node on your behalf and starts your job. This means that your job starts from scratch in your home directory without any environment/modules loaded, except what you have set up in the file .bashrc in your home. Jobs stay in the queue until their requirements (RAM, CPU, time) are met - even if they never can be met.
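Since a job starts with nothing but your .bashrc, that file is a natural place for the modules every job needs. A minimal sketch (the module names below are the ones used elsewhere in these slides; adjust to your own needs):

```shell
# ~/.bashrc (sketch) - loaded by every job the QS starts on your behalf
module load moab torque        # queueing-system commands
module load tools              # common tool collection
module load anaconda3/4.0.0    # example: a specific Python distribution
```

Anything not loaded here must be loaded explicitly in each job script instead.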
4
Job overhead
Any QS has a turn-around time for a job. That is the time it takes to put the job in the queue, to schedule the job and allocate resources, and to end the job and free up the resources again. It is about 3 minutes according to the Computerome operators. Any job you run should therefore take at least 3 times as long. On a practical level a QS can have difficulty handling all submissions if they come too quickly; 1000 quick jobs are worse than 10 long jobs. You can help the stability of the QS by not submitting jobs as fast as you (or your program) can. Insert a small delay between submissions; 1-2 seconds is fine.
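The advice above can be sketched as a small wrapper that pauses between submissions (the function name and script names are hypothetical):

```shell
#!/bin/sh
# Submit each given job script, pausing between submissions so the
# queueing system is not flooded (1-2 seconds per the advice above).
submit_batch() {
    for script in "$@"; do
        qsub "$script"
        sleep 1
    done
}

# Hypothetical usage:
# submit_batch jobs/step1.sh jobs/step2.sh jobs/step3.sh
```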
5
Practical advice
I asked a Computerome operator about the biggest challenges people face when using the system. The answer: RTFM (read the fucking manual), it is online. Learn basic Unix. Clean up your act: delete temporary files, compress big files. Avoid running large jobs/programs directly on the login node; make an interactive session on a compute node instead, see later. You can easily edit programs, navigate the file system, and do small jobs (1 core, 1 minute, 1 GB RAM) on the login node.
6
Activating the Queueing System
The Queueing System consists of two elements. Torque: the Terascale Open-source Resource and QUEue manager is a distributed resource manager providing control over batch jobs and distributed compute nodes. Moab: the Moab Cluster Suite is a cluster workload management package that integrates scheduling, managing, monitoring, and reporting of cluster workloads. There is a certain overlap of functionality between the two elements. To use the Queueing System (submit jobs, job control, etc.) you use the module system. Simply write on the command line:
module load moab torque
Now the relevant commands are available to you in your session.
7
Submitting a job
The recommended command is qsub.
qsub submits a job script. The job script can contain various commands and/or options to the queueing system. This is really the best way to control your jobs, but it requires a specially made script for each job, which can get cumbersome. qsub itself can accept many of the most used options on the command line. At minimum you need to specify:
How long the job will run
How much RAM the job needs
How many cores are required
Who shall pay for the compute
There are several other important options. Example submission (fill in your own mail address):
qsub -l nodes=1:ppn=1,mem=10gb,walltime=10:00 -A pr_course -W group_list=pr_course -M <email address> -m e -d /home/projects/pr_course /home/projects/pr_course/jobscript.sh
8
qsub parameters – job limitation
Using the -l (dash, lower-case L) option you specify the resources required for the job. These are system specific. On Computerome there are:
-l nodes=1:ppn=3
You use 1 node per job, unless you use MPI or know what you are doing. If in any doubt at all, 1 node is what you use. ppn is the number of cores on the node you use. Max is 28 on thinnodes and 32 on fatnodes. Most "normal" jobs use 1 core. By reserving all cores you also get all the RAM.
-l mem=20gb
How much RAM you reserve for the job on the node(s). Thinnodes have 128 GB, fatnodes 1024 GB. Don't request the max, as the system needs some for itself.
-l walltime=1:30:00
How much time is reserved for the job to run; here 1½ hours.
These are the most important resource allocations. There are others. Combined:
-l nodes=1:ppn=1,mem=10gb,walltime=30:00
9
qsub parameters - accounting
Accounting is done with -A <string>, which is a simple string associated with the job.
-A some_project
The job needs to execute as belonging to a group on the system, set with the -W option, which allows for additional attributes.
-W group_list=some_group
On Computerome this has been simplified so that the unix group is the same as the accounting string. Concretely this means that the accounting for this course is done like:
-A pr_course -W group_list=pr_course
Somewhat annoying to write it twice, but that is how it is. The accounting is REALLY important: if you do not do this, a random group/project (where you are a member) gets the bill for the compute.
10
qsub parameters - misc
The job's working directory defaults to your home. Specify another working directory with the -d option.
-d /home/projects/pr_course
Every job produces 2 files which contain the STDOUT and the STDERR output of the job. The files have standard names and locations, which you will discover very quickly, but this can be controlled with the -o (output) and -e (error) options.
-e /home/projects/pr_course/errorfile
-o /home/projects/pr_course/
As shown, you can decide the file name or just the directory. You can even suppress the files by using /dev/null as destination. Name your job with the -N option for easy recognition. Max 11 chars.
-N xyzjob.001
To give arguments to the job script when the job is launched, use -F and quotes. Only for qsub. See later example.
-F "myarg1 myarg2"
11
qsub parameters – getting information
Who should be mailed on events: the -M option.
-M <email address>
What should be mailed on a job event, option -m: the possibilities are a(bort), b(egin) and/or e(nd). Especially e is very valuable, as the mail will contain data about how much time, core usage and RAM usage the job had. Great for planning the next job.
-m abe
Log in interactively on a node and test various aspects of your job with the -I option. Just like being there yourself with a terminal. This is only relevant for qsub and there is no need for a job script.
-I
Jobs can have dependencies with the -W option. The most important is afterok, which starts a job after other job(s) have completed OK.
-W depend=afterok:<jobid>[:<jobid>...]
More about qsub can be found in the online manual.
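The afterok dependency relies on qsub printing the id of the submitted job on stdout, which you can capture. A minimal sketch (function and script names are hypothetical):

```shell
#!/bin/sh
# Submit two job scripts so the second runs only if the first exits OK.
# qsub prints the job id of the submitted job, which we capture in a
# variable and feed to the dependency option of the second submission.
submit_chain() {
    first=$(qsub "$1")
    qsub -W depend=afterok:"$first" "$2"
}

# Hypothetical usage:
# submit_chain step1.sh step2.sh
```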
12
job script example
#!/bin/sh
### Note: No commands may be executed until after the #PBS lines
### Notice how the options are used on the #PBS lines
### Account information
#PBS -W group_list=pr_course -A pr_course
### Output files (comment out the next 2 lines to get the job name used instead)
#PBS -e test.err
#PBS -o test.log
### Number of nodes
#PBS -l nodes=1:ppn=8:thinnode
### Requesting time
#PBS -l walltime=12:00:00
### Here follows the user commands:
# Load all required modules for the job
module load tools
module load anaconda3/4.0.0
cd /home/projects/pr_course
python myprogram.py
python myotherprogram.py
13
Real life example of qsub
First create a simple job script. This makes it much easier to rerun when you have to repeat it due to errors.
#!/bin/sh
#PBS -W group_list=pr_course -A pr_course
#PBS -l nodes=1:ppn=1,mem=2GB,walltime=1:00:00
### Below line eliminates the STDOUT/STDERR files, but use only when the program works
#PBS -e /dev/null -o /dev/null
### Send mail (or not)
#PBS -m ae -M <email address>
/home/people/pwsa/myotherprogram.py
Now simply use qsub to run it.
qsub myscript.sh
It is a good idea to use full paths to programs and data files. Then no mistakes are made.
14
Real life example of qsub with parameters
First create a simple job script. The bash variables $1, $2, $3 etc. will correspond to the parameters in the -F option.
#!/bin/sh
#PBS -W group_list=pr_course -A pr_course
#PBS -l nodes=1:ppn=1,mem=2GB,walltime=1:00:00
### Below line eliminates the STDOUT/STDERR files, but use only when the programs work
#PBS -e /dev/null -o /dev/null
### Send mail (or not)
#PBS -m ae -M <email address>
/home/people/pwsa/complementfasta.py $1 $2
/home/people/pwsa/translatetoprotein.py $2 $3
Now use qsub to run it with the -F option.
qsub -F "/home/people/pwsa/inputfastafile.fsa /home/people/pwsa/outputfastafile.fsa /home/people/pwsa/outputproteinfile.fsa" myscript.sh
This way you can make general job scripts that work on different files or parameters.
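The $1, $2, ... mechanism is ordinary shell positional-parameter handling: the quoted -F string is split on whitespace and handed to the job script as its arguments. A minimal illustration (names are hypothetical, not part of the slides' scripts):

```shell
#!/bin/sh
# Inside a job script submitted with: qsub -F "in.fsa out.fsa" myscript.sh
# the words of the -F string arrive as $1, $2, ...
describe_args() {
    echo "input fasta:  $1"
    echo "output fasta: $2"
}
describe_args "$@"
```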
15
xqsub extensions
xqsub is simply a wrapper around qsub which takes a few more options than qsub, creates a job script for you, and submits it. It is meant for easy submission of a single command. The most important and convenient extra option is -de (direct execution), which builds the submission script for you.
-de mynormalprogram myarguments
-de unzip bigfile.zip
There are also the options -re and -ro, which somewhat replace the -e and -o options in qsub. However, a file must be specified.
-re /home/projects/pr_course/errorfile
-ro /home/projects/pr_course/outputfile
If you want to avoid empty STDOUT/STDERR files, also add to xqsub:
-e /dev/null -o /dev/null
16
Real life example of xqsub
Since xqsub simply makes a direct execution on a node, all options, parameters, etc. must be on the command line. This is rather unwieldy. This example assumes you know what you are doing and don't make mistakes, because all informational error messages are eliminated.
xqsub -e /dev/null -o /dev/null -ro /dev/null -re /dev/null -A pr_course -W group_list=pr_course -l nodes=1:ppn=1,mem=2GB,walltime=0:10:00 -de /home/people/pwsa/complementfasta.py /home/projects/pr_course/human.fsa /home/people/pwsa/revdna.fsa
If you would like more error output, then it gets simpler:
xqsub -A pr_course -W group_list=pr_course -l nodes=1:ppn=1,mem=2GB,walltime=0:10:00 -de /home/people/pwsa/complementfasta.py /home/projects/pr_course/human.fsa /home/people/pwsa/revdna.fsa
Now you just have to investigate (and later delete) the STDOUT/STDERR files.
17
Good advice for submitting
The queueing system logs in on a semi-random node as you and starts executing your job. This means that you need to set up the correct modules needed for execution, either in the job script or in the .bashrc file. It also means that you start in a place (working directory) which is perhaps not the same place as where you submitted the job. Use absolute paths to avoid confusion. The job is scheduled according to a "fair share" principle: if you are a heavy user, your job can be delayed so other people also have a chance. The more precisely you specify the requirements for the job, the easier it is to fit it into an empty time slot and the better CPU usage you get. Don't make a job that takes 1 hour with 12 cores followed by 1 hour of only 1 core usage; you pay for 24 core hours when you only use 13 core hours. A job that seems aborted for no reason often lacks RAM.
18
Job control
A job that is waiting in the queue or currently being executed can be canceled/killed.
canceljob <jobid>            cancels the job with that job id
mjobctl -c <jobid>           cancels the job with that job id
mjobctl -c -w user=pws       cancels all jobs belonging to user pws
mjobctl is rather versatile and can be used for other forms of job control, e.g. requeueing a job or extending its time limit. Do: man mjobctl
Extending time limits has to be done by a Computerome operator.
19
Job supervision
What is happening in the queue?
qstat -r      List running jobs
qstat -a      List all jobs
showq -r      List running jobs
showq         List all jobs
showq -c      List completed jobs
What is happening with your job?
checkjob -v -v <jobid>      Yes, double -v for lots of info
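These commands can also be scripted. A sketch that polls until a job has left the queue, assuming checkjob exits nonzero once the job id is no longer known (verify this behaviour on your system; the function name is hypothetical):

```shell
#!/bin/sh
# Wait until the given job id is no longer reported by checkjob.
# Assumption: checkjob exits nonzero for an unknown/finished job.
wait_for_job() {
    while checkjob "$1" >/dev/null 2>&1; do
        sleep 60    # poll once a minute; don't hammer the QS
    done
}

# Hypothetical usage:
# wait_for_job 1234567.computerome
```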
20
Final words
Don't run on the login node, unless it is a very simple and quick job. Especially if you are timing jobs, you can expect varying results due to the disturbance from other people using the login node. Get an interactive session on your own node by typing: iqsub
Everything you do on the interactive node will be calm and peaceful. Easy peasy