Published by Oscar Long. Modified over 9 years ago.
1
Running Jobs on Jacquard
An overview of interactive and batch computing, with comparisons to Seaborg
David Turner
NUG Meeting, 3 Oct 2005
2
Topics
Interactive
  – Serial
  – Parallel
  – Limits
Batch
  – Serial
  – Parallel
  – Queues and Policies
Charging
Comparison with Seaborg
3
Execution Environment
Four login nodes
  – Serial jobs only
  – CPU limit: 60 minutes
  – Memory limit: 64 MB
320 compute nodes
  – “Interactive” parallel jobs
  – Batch serial and parallel jobs
  – Scheduled by PBSPro
Queue limits and policies established to meet system objectives
  – User input is critical!
4
Interactive Jobs
Serial jobs run on login nodes
  – cd, ls, pathf90, etc.
  – ./a.out
Parallel jobs run on compute nodes
  – Controlled by PBSPro
    mpirun -np 16 ./a.out
    qsub -I -q interactive -l nodes=8:ppn=2
    % cd $PBS_O_WORKDIR
    % mpirun -np 16 ./a.out
    qsub -I -q batch -l nodes=32:ppn=2,walltime=18:00:00
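In the interactive example, the process count given to mpirun matches the request: 8 nodes with 2 processors per node gives 16 processes. The sketch below spells out that arithmetic. The helper names `np_for` and `show_interactive` are hypothetical (they are not part of PBSPro or mpirun), and the script only prints the command lines; it submits nothing.

```shell
# Hypothetical helper: number of MPI processes for a nodes/ppn request.
# $1 = nodes, $2 = processors per node (ppn)
np_for() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: print (do not run) the matching qsub and mpirun lines.
show_interactive() {
    echo "qsub -I -q interactive -l nodes=$1:ppn=$2"
    echo "mpirun -np $(np_for "$1" "$2") ./a.out"
}

show_interactive 8 2
```

Keeping the two numbers derived from one place avoids requesting one node count and launching a different number of processes.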
5
PBSPro
Marketed by Altair Engineering
  – Based on open source Portable Batch System developed for NASA
  – Also installed on DaVinci
Batch scripts contain directives:
    #PBS -o myjob.out
Directives may also appear as command-line options:
    qsub -o myjob.out …
6
Simple Batch Script
    #PBS -l nodes=8:ppn=2,walltime=00:30:00
    #PBS -N myjob
    #PBS -o myjob.out
    #PBS -e myjob.err
    #PBS -A mp999
    #PBS -q debug
    #PBS -V
    cd $PBS_O_WORKDIR
    mpirun -np 16 ./a.out
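The script above hard-codes 16 processes to match nodes=8:ppn=2. As a rough sketch, the same script can be generated from its parameters so the directive and the mpirun line stay consistent. The function name `make_job` and its argument order are hypothetical; the job name, output files, and repository mp999 are simply the values from the slide.

```shell
# Hypothetical generator: print a PBS batch script for the given request.
# $1 = nodes, $2 = ppn, $3 = walltime (HH:MM:SS), $4 = queue, $5 = repository
make_job() {
    np=$(( $1 * $2 ))
    printf '%s\n' \
        "#PBS -l nodes=$1:ppn=$2,walltime=$3" \
        "#PBS -N myjob" \
        "#PBS -o myjob.out" \
        "#PBS -e myjob.err" \
        "#PBS -A $5" \
        "#PBS -q $4" \
        "#PBS -V" \
        'cd $PBS_O_WORKDIR' \
        "mpirun -np $np ./a.out"
}

# Reproduces the script shown on the slide.
make_job 8 2 00:30:00 debug mp999
```

Redirecting the output to a file (for example `make_job 8 2 00:30:00 debug mp999 > myjob`) would give a script suitable for qsub.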
7
Useful PBS Options (1)
-A repo
  – Charge this job to repository repo
  – Default: Your default repository
-N jobname
  – Provide name for job; up to 15 printable, non-whitespace characters
  – Default: Name of batch script
-q qname
  – Submit job to batch queue qname
  – Default: batch
8
Useful PBS Options (2)
-S shell
  – Specify shell as the scripting language
  – Default: Your login shell
-V
  – Export current environment variables into the batch job environment
  – Default: Do not export
9
Useful PBS Options (3)
-o outfile
  – Write STDOUT to outfile
  – Default: .o
-e errfile
  – Write STDERR to errfile
  – Default: .e
-j [ oe | eo ]
  – Join STDOUT and STDERR on STDOUT ( oe ) or STDERR ( eo )
  – Default: Do not join
10
Useful PBS Options (4)
-m [ a | b | e | n ]
  – E-mail notification
  – a = send mail when job aborted by system
  – b = send mail when job begins
  – e = send mail when job ends
  – n = do not send mail
  – Options a, b, and e may be combined
  – Default: a
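The rule for -m values (a, b, and e may be combined; n stands alone) can be checked before a script is submitted. This is a minimal sketch; the function name `valid_mail_opts` is hypothetical and the check is done locally, not by PBS.

```shell
# Hypothetical check for a "-m" argument: "n" alone, or any mix of a, b, e.
valid_mail_opts() {
    case "$1" in
        n)           return 0 ;;   # "do not send mail" must stand alone
        ""|*[!abe]*) return 1 ;;   # empty, or contains a letter other than a/b/e
        *)           return 0 ;;
    esac
}

for opts in abe n an; do
    if valid_mail_opts "$opts"; then
        echo "-m $opts: ok"
    else
        echo "-m $opts: rejected"
    fi
done
```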
11
Batch Queues
Submit        Execute     Nodes       Walltime
interactive               1 – 16      30 mins
debug                     1 – 32      30 mins
batch         batch16     1 – 16      48 hours
              batch32     17 – 32     24 hours
              batch64     33 – 64     12 hours
              batch128    65 – 128    6 hours
              batch256    129 – 256   6 hours
low                       1 – 64      6 hours
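The table suggests that a job submitted to batch lands in an execute queue chosen by its node count. The sketch below encodes the node ranges and walltime limits as listed, on that assumption. The function names `exec_queue` and `max_walltime_hours` are hypothetical, and the real routing is done by PBSPro on the system.

```shell
# Hypothetical lookup: execute queue for a job submitted to "batch".
# $1 = number of nodes requested
exec_queue() {
    if [ "$1" -lt 1 ] || [ "$1" -gt 256 ]; then
        echo "none"
        return 1
    elif [ "$1" -le 16 ];  then echo batch16
    elif [ "$1" -le 32 ];  then echo batch32
    elif [ "$1" -le 64 ];  then echo batch64
    elif [ "$1" -le 128 ]; then echo batch128
    else                        echo batch256
    fi
}

# Hypothetical lookup: walltime limit (hours) of an execute queue.
max_walltime_hours() {
    case "$1" in
        batch16)           echo 48 ;;
        batch32)           echo 24 ;;
        batch64)           echo 12 ;;
        batch128|batch256) echo 6 ;;
        *)                 return 1 ;;
    esac
}

q=$(exec_queue 32)
echo "32 nodes: $q, up to $(max_walltime_hours "$q") hours"
```

Under these assumptions, the 32-node, 18-hour request from the interactive example fits within the 24-hour batch32 limit.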
12
Batch Queue Policies
Each user may have:
  – One running interactive job
  – One running debug job
  – Four jobs running over entire system
Only one batch128 job is allowed to run at a time.
The batch256 queue usually has a run limit of zero. NERSC staff will arrange to run jobs of this size.
13
Submitting Batch Jobs
    % qsub myjob
    93935.jacin03
    %
Record jobid for tracking!
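One way to record the jobid is to capture what qsub prints. The sketch below uses the identifier from the example as a fixed string, so it runs without PBS, and extracts the numeric part. The helper name `job_seq` is hypothetical.

```shell
# Hypothetical helper: numeric part of a PBS job identifier
# (everything before the first dot).
job_seq() {
    echo "${1%%.*}"
}

# On the system the identifier would be captured at submission time:
#   jobid=$(qsub myjob)
jobid="93935.jacin03"   # value copied from the example, not live output

echo "submitted $jobid (sequence number $(job_seq "$jobid"))"
```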
14
Deleting Batch Jobs
    % qdel 93935.jacin03
    %
15
Monitoring Batch Jobs (1)
PBS command qstat
    % qstat
    Job id           Name             User             Time Use S Queue
    ---------------- ---------------- ---------------- -------- - -----
    93295.jacin03-ib job5             einstein         00:00:00 R batch16
    93894.jacin03    EV80fl02_3       legendre                0 H batch16
    93330.jacin03    test.script      laplace          00:00:23 R batch32
    93897.jacin03    runlu8x8         rasputin                0 Q batch32
    93334.jacin03-m  mtp_mg_3wat_o2a  fibonacci        00:00:11 R batch16
    ...
Use -u option for single-user output
    % qstat -u einstein
    Job id           Name             User             Time Use S Queue
    ---------------- ---------------- ---------------- -------- - -----
    93295.jacin03-ib job5             einstein         00:00:00 R batch16
    %
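The S column in the qstat listing holds the job state (R, Q, and H appear in the sample). As a sketch, a listing in this layout can be filtered by state with awk. The helper name `jobs_in_state` is hypothetical, the sample text is copied from the slide rather than taken from a live system, and the filter assumes a two-line header with the state in the second-to-last column.

```shell
# Hypothetical filter: read a qstat-style listing on stdin and print the
# ids of jobs whose state column equals $1. Skips the two header lines.
jobs_in_state() {
    awk -v st="$1" 'NR > 2 && $(NF-1) == st { print $1 }'
}

# Sample listing copied from the slide (not live output).
sample='Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
93295.jacin03-ib job5             einstein         00:00:00 R batch16
93894.jacin03    EV80fl02_3       legendre                0 H batch16
93330.jacin03    test.script      laplace          00:00:23 R batch32
93897.jacin03    runlu8x8         rasputin                0 Q batch32'

echo "Running jobs:"
printf '%s\n' "$sample" | jobs_in_state R
```

On the system, the same filter would be fed from `qstat` directly.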
16
Monitoring Batch Jobs (2)
NERSC command qs
    % qs
    JOBID ST USER     NAME       NDS REQ      USED     SUBMIT
    93939 R  gauss    STDIN        1 00:30:00 00:10:43 Oct 2 16:47:00
    93891 R  einstein runlu4x8    16 01:00:00 00:38:48 Oct 2 15:23:36
    93918 R  inewton  r4_16        8 01:00:00 00:10:37 Oct 2 15:36:35
    ...
    93785 Q  inewton  r4_64       32 01:00:00 -        Oct 2 08:42:36
    93828 Q  rasputin nodemove    64 00:05:00 -        Oct 2 12:00:11
    93897 Q  einstein runlu8x8    32 01:00:00 -        Oct 2 15:24:27
    ...
    93893 H  legendre EV80fl02_2   4 03:00:00 -        Oct 2 15:24:23
    93894 H  legendre EV80fl02_3   4 03:00:00 -        Oct 2 15:24:24
    93917 H  legendre EV80fl98_5   4 03:00:00 -        Oct 2 15:26:06
    ...
Also provides -u option
17
Monitoring Batch Jobs (3)
NERSC website has current queue look:
    http://www.nersc.gov/nusers/status/jacquard/qstat
Also has completed jobs list:
    http://www.nersc.gov/nusers/status/jacquard/pbs_summary
Numerous filtering options available
  – Owner
  – Account
  – Queue
  – Jobid
18
Charging
Machine charge factor (cf) = 4
  – Based on benchmarks and user applications
  – Currently under review
Serial interactive
  – Charge = cf × cputime
  – Always charged to default repository
All parallel
  – Charge = cf × 2 × nodes × walltime
  – Charged to default repo unless -A specified
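A worked example of the charging rules, as a sketch: with cf = 4, a parallel job is charged for two processors on every node for the full walltime. The function names `parallel_charge` and `serial_charge` are hypothetical, and times are assumed to be in hours; the slide does not state the unit.

```shell
CF=4   # machine charge factor from the slide (noted there as under review)

# Hypothetical helper: charge for a parallel job.
# $1 = nodes, $2 = walltime in hours (may be fractional)
parallel_charge() {
    awk -v cf="$CF" -v n="$1" -v h="$2" 'BEGIN { printf "%.2f\n", cf * 2 * n * h }'
}

# Hypothetical helper: charge for serial interactive work.
# $1 = CPU time in hours
serial_charge() {
    awk -v cf="$CF" -v h="$1" 'BEGIN { printf "%.2f\n", cf * h }'
}

# 8 nodes for 30 minutes: 4 * 2 * 8 * 0.5
parallel_charge 8 0.5    # prints 32.00
```

Note that the parallel charge depends on nodes and walltime only, so a job that leaves one processor per node idle is charged the same as one that uses both.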
19
Things To Look Out For (1)
Do not set group write permission for your home directory; it will prevent PBS from running your jobs.
Library modules must be loaded at runtime as well as linktime.
Propagation of environment variables to remote processes is incomplete; contact NERSC consulting for help.
20
Things To Look Out For (2)
Do not run more than one MPI program in a single batch script.
If your login shell is bash, you may see:
    accept: Resource temporarily unavailable
    done.
In this case, specify a different shell using the -S directive, such as:
    #PBS -S /usr/bin/ksh
21
Things To Look Out For (3)
Batch jobs always start in $HOME. To get to directory where job was submitted:
    cd $PBS_O_WORKDIR
For jobs that work with large files:
    cd $SCRATCH/some_subdirectory
PBS buffers output and error files until job completes. To view files (in home directory) while running:
    -k oe
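Because batch jobs start in $HOME, a script normally changes directory first. The sketch below wraps that step so the same script can also be tried by hand outside PBS. The fallback to the current directory is an assumption added here for illustration, and the helper name `job_dir` is hypothetical.

```shell
# Hypothetical helper: directory the job script should work in.
# Inside a PBS job, PBS_O_WORKDIR is the directory qsub was run from;
# outside PBS (assumed fallback) use the current directory.
job_dir() {
    echo "${PBS_O_WORKDIR:-$PWD}"
}

cd "$(job_dir)" || exit 1
echo "working in $(pwd)"
```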
22
Things To Look Out For (4)
The following is just a warning and can be ignored:
    Warning: no access to tty (Bad file descriptor).
    Thus no job control in this shell.
23
LoadLeveler vs. PBS
LL                    PBS                 LL                PBS
#@ node               #PBS -l nodes       #@ notification   #PBS -m
#@ tasks_per_node     #PBS -l ppn         #@ shell          #PBS -S
#@ wall_clock_limit   #PBS -l walltime    #@ output         #PBS -o
#@ class              #PBS -q             #@ error          #PBS -e
#@ job_name           #PBS -N             #@ environment    #PBS -V
#@ account_no         #PBS -A
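For users converting Seaborg scripts, the keyword correspondence can be written as a lookup. This is a sketch limited to the eleven keywords listed; the function name `ll_to_pbs` is hypothetical, and it maps keywords only, since argument syntax differs between the two systems and still has to be adapted by hand.

```shell
# Hypothetical lookup: LoadLeveler keyword -> PBS option, per the table.
ll_to_pbs() {
    case "$1" in
        node)             printf '%s\n' "-l nodes" ;;
        tasks_per_node)   printf '%s\n' "-l ppn" ;;
        wall_clock_limit) printf '%s\n' "-l walltime" ;;
        class)            printf '%s\n' "-q" ;;
        job_name)         printf '%s\n' "-N" ;;
        account_no)       printf '%s\n' "-A" ;;
        notification)     printf '%s\n' "-m" ;;
        shell)            printf '%s\n' "-S" ;;
        output)           printf '%s\n' "-o" ;;
        error)            printf '%s\n' "-e" ;;
        environment)      printf '%s\n' "-V" ;;
        *)                echo "unknown keyword: $1" >&2; return 1 ;;
    esac
}

for kw in class wall_clock_limit job_name; do
    echo "#@ $kw  ->  #PBS $(ll_to_pbs "$kw")"
done
```

The lookup uses printf rather than echo because some shells treat a bare `-e` argument to echo as an option.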
24
Resources
NERSC Website
  – http://www.nersc.gov/nusers/resources/jacquard/running_jobs.php
  – http://www.nersc.gov/vendor_docs/altair/PBSPro_7.0_User_Guide.pdf
NERSC Consulting
  – 1-800-66-NERSC, menu option 3, 8 am - 5 pm, Pacific time
  – (510) 486-8600, menu option 3, 8 am - 5 pm, Pacific time
  – consult@nersc.gov
  – http://help.nersc.gov/