Using the BYU SP-2

Our System
  Interactive nodes (2)
    – used for login, compilation & testing
    – marylou10.et.byu.edu
  I/O and scheduling nodes (7)
    – used for the batch scheduling system and the parallel file system
  Compute nodes (26)
    – 22 4-processor
    – 4 16-processor

Compilers
  xlc: C
  xlC: C++
  xlf: Fortran
  Parallel compilers
    – mpcc
    – mpCC
    – mpxlf
  Optimization
    – -O5 -qarch=pwr3 -qtune=pwr3 -qhot
  Libraries
    – -lblas, -lfftw, -llapack, -lessl
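
As a rough sketch, the parallel compiler, optimization flags and libraries above combine into a single command line. The file names solver.c and solver are hypothetical, and linking only -lessl is an assumption about what the program needs. The script only assembles and prints the command so it can be inspected before it is run on the SP-2.

```shell
#!/bin/sh
# Assemble an MPI C compile line from the flags listed on the slide.
# solver.c and solver are hypothetical names used only for illustration.
OPT_FLAGS="-O5 -qarch=pwr3 -qtune=pwr3 -qhot"
LIBS="-lessl"   # assumption: this program calls ESSL routines

build_cmd() {
    # $1 = source file, $2 = output executable
    echo "mpcc $OPT_FLAGS -o $2 $1 $LIBS"
}

# Print the command; nothing is compiled here.
build_cmd solver.c solver
```

For C++ or Fortran sources the same pattern would apply with mpCC or mpxlf in place of mpcc.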

Other Stuff
  Documentation
  Launching parallel jobs
    – done through the batch scheduler
    – your job is a shell script that you hand to the batch scheduler for execution
    – can look at xloadl for help creating the script

Batch job scheduler
  Batch schedulers
    – PBS (Portable Batch System): open source
    – LoadLeveler: descendant of Condor
  The process
    – user submits jobs to a queue
    – machines register with the scheduler, offering to run jobs of a certain class
    – scheduler allocates jobs to machines and tracks them
    – once started, jobs are scheduled by the kernel

Scheduling parallel jobs
  Jobs can ask for
    – number of nodes (1 CPU)
    – number of tasks per node (multiple CPUs)
    – non-shared nodes (multiple CPUs)
  Mixing jobs can be bad
    – two I/O-intensive processes on a 2-CPU node can ruin performance for both
    – same for two RAM-intensive processes
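
A sketch of how these requests might look in a job command file. node, tasks_per_node and node_usage are LoadLeveler keywords; the values, the class and the executable name are placeholders, not site recommendations. The script prints the file so it can be redirected and inspected before submission.

```shell
#!/bin/sh
# Print a LoadLeveler job command file that asks for whole, unshared nodes.
# All values below are illustrative placeholders.
make_job() {
cat <<'EOF'
#!/bin/ksh
# @ job_type       = parallel
# @ class          = short
# @ node           = 2
# @ tasks_per_node = 4
# @ node_usage     = not_shared
# @ queue
./your_exe_and_any_args
EOF
}

make_job
```

Asking for not_shared nodes keeps another user's I/O-heavy or RAM-heavy job off the same node, at the cost of possibly waiting longer for free nodes.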

Scheduling parallel jobs (2)
  All requested nodes, processors and other resources stay allocated for the entire duration of the job
  No dynamic adjustments, except by creating jobs with multiple steps
    – each step can have different requirements
    – each step can express a dependency on other steps
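
A minimal sketch of a two-step job, assuming a serial preparation step followed by a parallel step that runs only if the first exits with status 0. step_name and dependency are LoadLeveler keywords and LOADL_STEP_NAME is set by LoadLeveler at run time; the step names and ./prepare_input are hypothetical.

```shell
#!/bin/sh
# Print a two-step LoadLeveler job command file.
# Step names and executables are placeholders for illustration.
make_steps() {
cat <<'EOF'
#!/bin/ksh
# @ step_name = prepare
# @ job_type = serial
# @ class = short
# @ queue
# @ step_name = solve
# @ dependency = (prepare == 0)
# @ job_type = parallel
# @ node = 4
# @ tasks_per_node = 4
# @ class = short
# @ queue
# The same script body runs for every step, so branch on the step name.
case "$LOADL_STEP_NAME" in
    prepare) ./prepare_input ;;
    solve)   ./your_exe_and_any_args ;;
esac
EOF
}

make_steps
```

Each "# @ queue" line closes one step, so the small serial step does not hold the parallel step's nodes while it runs.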

Scheduling parallel jobs (3)
  Management must
    – allow some jobs to use the entire machine
    – allow short jobs to get started quickly
      (they should not have to wait weeks in the queue)
  Some very long jobs may be needed, but are to be avoided

Backfill scheduling
  [Figure: timeline of Jobs A, B, C and D scheduled on a 10-node system]

Backfill scheduling
  Requires a real (wall-clock) time limit to be set for each job
  A more accurate (shorter) estimate gives the job a better chance of starting earlier
  Short jobs can move through the system more quickly
  Uses the system better by avoiding wasted cycles while larger jobs wait
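
In LoadLeveler the time limit is given with the wall_clock_limit keyword in hh:mm:ss form. A small sketch, assuming you have measured a typical run in seconds; the 20% safety margin is an arbitrary illustrative choice, not a site rule.

```shell
#!/bin/sh
# Turn a measured run time (seconds) into a wall_clock_limit directive.
to_hms() {
    # $1 = seconds; prints hh:mm:ss
    printf '%02d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

limit_directive() {
    # $1 = measured run time in seconds; adds a 20% margin (assumption)
    padded=$(( $1 + $1 / 5 ))
    echo "# @ wall_clock_limit = $(to_hms "$padded")"
}

limit_directive 1500   # a run measured at 25 minutes
```

A limit close to the real run time lets the scheduler fit the job into gaps left in front of larger jobs, while a limit that is too short gets the job killed before it finishes.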

Using LoadLeveler
  Graphical user interface: xloadl
  Make a shell script with LoadLeveler keywords as shell comments
    = thing.log
    = thing.err
    = short
    = thingx
    = 6,10
    = 4
    = (Adapter==hps_us)

Sample LoadLeveler Script
    #!/bin/ksh
    # @ job_type = parallel
    # @ input = /dev/null
    # @ output = $(Executable).$(Cluster).$(Process).out
    # @ error = $(Executable).$(Cluster).$(Process).err
    # @ initialdir = /gstudent/student_rt_y/directory
    # @ notify_user =
    # @ class = short
    # @ notification = complete
    # @ checkpoint = no
    # @ restart = no
    # @ requirements = (Arch == "power3")
    # @ blocking = unlimited
    # @ total_tasks = 4
    # @ network.MPI = switch,shared,US
    # @ queue
    ./your_exe_and_any_args

Sample serial job
    #!/bin/ksh
    # @ job_type = serial
    # @ input = /dev/null
    # @ output = $(Executable).$(Cluster).$(Process).out
    # @ error = $(Executable).$(Cluster).$(Process).err
    # @ initialdir = /gstudent/student_rt_y
    # @ notify_user =
    # @ class = medium
    # @ notification = complete
    # @ checkpoint = no
    # @ restart = no
    # @ queue
    paupnew Hlav3ashort.paup

LoadLeveler commands
  llq: shows all jobs
    – can also use showq
  llq -s JobID: shows why a job is not running
  llclass: shows classes
  llstatus: shows machines
  llcancel JobID: cancels a job
  llhold JobID: puts a job in the hold state

Sample llq output
    bash-2.05a$ llq
    Id      Owner    Submitted   ST PRI Class   Running On
    m1015i  mdt36    8/7  12:41  R  50  long    m1009i
    m1015i  mdt36    8/7  12:41  R  50  long    m1019i
    m1015i  jl447    8/12 16:25  R  50  long    m1012i
    m1015i  to5      8/13 08:44  R  50  long    m1045i
    m1015i  to5      8/13 08:44  R  50  long    m1045i
    …
    m1015i  taskman  8/14 08:13  R  50  short   m1017i
    m1015i  taskman  8/14 08:13  R  50  short   m1014i
    m1015i  taskman  8/14 08:13  R  50  short   m1017i
    m1015i  taskman  8/14 08:13  R  50  short   m1014i
    m1015i  taskman  8/14 08:13  R  50  short   m1011i
    m1015i  mendez   8/14 13:07  I  50  long
    m1015i  cr66     8/14 12:40  I  50  medium
    m1015i  jl447    8/13 07:08  I  50  long
    m1015i  dvd      8/13 10:45  I  50  medium
    m1015i  dvd      8/13 11:22  I  50  medium
    m1015i  dvd      8/13 11:25  I  50  medium
    m1015i  mdt36    8/13 08:51  I  50  long
    m1015i  mdt36    8/13 08:50  I  50  long
    …
    m1015i  taskman  8/14 08:27  I  50  short
    m1015i  taskman  8/14 08:57  I  50  short
    m1015i  taskman  8/14 08:57  I  50  short
    58 job step(s) in queue, 23 waiting, 0 pending, 35 running, 0 held, 0 preempted
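
With a long queue it can help to condense the listing. The sketch below counts job steps per owner and state; it assumes the whitespace-separated column layout of the sample (Id, Owner, date, time, ST, ...), and the function name summarize_llq is made up for this example. The input rows are copied from the sample output.

```shell
#!/bin/sh
# Count running (R) and idle (I) job steps per owner from llq-style output.
# Assumes field 2 is the owner and field 5 is the state, as in the sample.
summarize_llq() {
    awk '$5 == "R" || $5 == "I" { n[$2 " " $5]++ }
         END { for (k in n) print k, n[k] }' | sort
}

# A few rows taken from the sample output above.
summarize_llq <<'EOF'
Id Owner Submitted ST PRI Class Running On
m1015i mdt36 8/7 12:41 R 50 long m1009i
m1015i jl447 8/12 16:25 R 50 long m1012i
m1015i taskman 8/14 08:13 R 50 short m1017i
m1015i taskman 8/14 08:13 R 50 short m1014i
m1015i mendez 8/14 13:07 I 50 long
m1015i taskman 8/14 08:27 I 50 short
EOF
```

On the machine itself the same function would read the live queue with "llq | summarize_llq"; the header and summary lines are skipped because their fifth field is neither R nor I.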

Sample showq output
    bash-2.05a$ showq
    ACTIVE JOBS
    JOBNAME  USERNAME  STATE    PROC  REMAINING   STARTTIME
    m1015i   taskman   Running  1     18:39:00    Wed Aug 14 08:06:24
    m1015i   taskman   Running  1     18:39:00    Wed Aug 14 08:06:24
    m1015i   taskman   Running  1     18:39:00    Wed Aug 14 08:06:24
    …
    m1015i   taskman   Running  1     21:33:42    Wed Aug 14 11:01:06
    m1015i   taskman   Running  1     23:43:05    Wed Aug 14 13:10:29
    m1015i   dvd       Running  4     2:15:10:38  Wed Aug 14 04:38:02
    m1015i   mdt36     Running  8     2:23:14:21  Wed Aug 7 12:41:45
    …
    m1015i   jar65     Running  4     9:04:07:44  Tue Aug 13 17:35:08
    m1015i   jar65     Running  4     9:08:28:16  Tue Aug 13 21:55:40
    m1015i   to5       Running  8     9:21:11:49  Wed Aug 14 10:39:13
    m1015i   to5       Running  8     9:21:11:49  Wed Aug 14 10:39:13
    35 Active Jobs   150 of 184 Processors Active (81.52%)
                     26 of 34 Nodes Active (76.47%)
    IDLE JOBS
    JOBNAME  USERNAME  STATE    PROC  WCLIMIT     QUEUETIME
    m1015i   jl447     Idle     2     5:00:00:00  Tue Aug 13 07:08:09
    m1015i   dvd       Idle     8     3:00:00:00  Tue Aug 13 10:45:18
    …
    23 Idle Jobs
    NON-QUEUED JOBS
    JOBNAME  USERNAME  STATE    PROC  WCLIMIT     QUEUETIME
    Total Jobs: 58   Active Jobs: 35   Idle Jobs: 23   Non-Queued Jobs: 0

LoadLeveler environment
  Normally the same as your login environment
  Limits are set; use llclass -l to see the values
    – ulimit -S -a
    – ulimit -H -a
  Big heap requirements
    – -bmaxdata:0x up to 2 GB data (heap)
    – -q64 -bmaxdata:0x…. up to 8 EB
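
The -bmaxdata linker option takes the heap size in bytes as a hexadecimal number. A sketch of the arithmetic, using 2 GB as the example value; the helper name maxdata_flag is made up for this illustration. The two ulimit calls print the soft and hard limits of whatever shell runs the script, which is a quick way to check the limits a job actually sees.

```shell
#!/bin/sh
# Convert a heap size in whole gigabytes to a -bmaxdata link option.
maxdata_flag() {
    # $1 = heap size in GB
    printf '%s0x%X\n' '-bmaxdata:' $(( $1 * 1024 * 1024 * 1024 ))
}

maxdata_flag 2    # 2 GB = 2^31 bytes

# Show the soft and hard resource limits of the current shell.
ulimit -S -a
ulimit -H -a
```

Placing the two ulimit lines at the top of a job script records the batch limits in the job's output file, where they can be compared with the login shell's values.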