Published by Hollie Jasmine Houston. Modified over 9 years ago.
1
Using the BYU SP-2
2
Our System
Interactive nodes (2)
– used for login, compilation & testing
– marylou10.et.byu.edu
I/O and scheduling nodes (7)
– used for the batch scheduling system and the parallel file system
Compute nodes (26)
– 22 four-processor nodes
– 4 sixteen-processor nodes
3
Compilers
– xlc: C
– xlC: C++
– xlf: Fortran
Parallel compilers
– mpcc (C)
– mpCC (C++)
– mpxlf (Fortran)
Optimization
– -O5 -qarch=pwr3 -qtune=pwr3 -qhot
Libraries
– -lblas, -lfftw, -llapack, -lessl
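The compiler choice above can be scripted. A minimal sketch, assuming source files end in .c, .C/.cpp or .f: the helper `compile_cmd` (a hypothetical name) picks mpcc, mpCC or mpxlf and prints the compile command with the pwr3 optimization flags. It only prints the line, so you can check it before running it.

```shell
# Hypothetical helper: pick the parallel compiler by source suffix and print
# the full compile command with the optimization flags from the slide.
OPTFLAGS="-O5 -qarch=pwr3 -qtune=pwr3 -qhot"

compile_cmd() {
  src=$1
  libs=${2:-}                      # optional libraries, e.g. "-lessl"
  case "$src" in
    *.c)       cc=mpcc ;;
    *.C|*.cpp) cc=mpCC ;;
    *.f)       cc=mpxlf ;;
    *) echo "unknown source type: $src" >&2; return 1 ;;
  esac
  out=${src%.*}                    # executable named after the source file
  # Libraries go after the source so the linker can resolve its symbols.
  echo "$cc $OPTFLAGS -o $out $src${libs:+ $libs}"
}
```

For example, `sh -c "$(compile_cmd solver.c -lessl)"` would build a hypothetical solver.c against ESSL on an interactive node.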
4
Other Stuff
Documentation
– http://www-1.ibm.com/servers/eserver/pseries/library/sp_books/
– http://marylou.byu.edu
Launching parallel jobs
– done through the batch scheduler
– your job is a shell script that you hand to the batch scheduler for execution
– xloadl can help you create the script
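A sketch of handing a job script to the scheduler, assuming the standard LoadLeveler submit command `llsubmit`; the file name myjob.cmd, the class and ./myprog are hypothetical placeholders. The here-document is quoted ('EOF') so the shell leaves LoadLeveler variables such as $(Cluster) alone instead of treating them as command substitutions.

```shell
# Hypothetical helper: write a minimal parallel job command file.
make_job() {
  jobfile=$1
  cat > "$jobfile" <<'EOF'
#!/bin/ksh
# @ job_type = parallel
# @ class = short
# @ total_tasks = 4
# @ output = $(Executable).$(Cluster).out
# @ error = $(Executable).$(Cluster).err
# @ queue
./myprog
EOF
}

make_job myjob.cmd
# Submit only where LoadLeveler is installed (e.g. on the interactive nodes).
if command -v llsubmit >/dev/null 2>&1; then
  llsubmit myjob.cmd
else
  echo "llsubmit not found; myjob.cmd written but not submitted" >&2
fi
```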
5
Batch job scheduler
Batch schedulers
– PBS (Portable Batch System): open source
– LoadLeveler: a descendant of Condor
The process
– the user submits jobs to a queue
– machines register with the scheduler, offering to run jobs of certain classes
– the scheduler allocates jobs to machines and tracks them
– once started, jobs are scheduled by the kernel
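The process above can be illustrated with a toy model. This is a hypothetical sketch, not how LoadLeveler or PBS actually decide: each machine lists the classes it offers, and each queued job takes the first idle machine that offers its class, or keeps waiting. `assign_jobs` is an illustrative name.

```shell
# Toy model (hypothetical): $1 is a file of "machine class1,class2,..." lines;
# stdin holds queued jobs as "jobid class" lines, in queue order.
assign_jobs() {
  awk '
    NR == FNR { offers[$1] = "," $2 ","; order[++n] = $1; next }  # machine list
    {
      for (i = 1; i <= n; i++) {
        m = order[i]
        if (!(m in busy) && index(offers[m], "," $2 ",") > 0) {
          busy[m] = 1          # one job per machine in this toy model
          print $1, m
          next
        }
      }
      print $1, "WAITING"      # no idle machine offers this class
    }' "$1" -
}
```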
6
Scheduling parallel jobs
Jobs can ask for
– a number of nodes (1 CPU each)
– a number of tasks per node (multiple CPUs)
– non-shared nodes (multiple CPUs)
Mixing jobs can be bad
– two I/O-intensive processes on a 2-CPU node can ruin performance for both
– the same goes for two RAM-intensive processes
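Those three kinds of request map onto LoadLeveler keywords. A sketch, assuming the `node`, `tasks_per_node` and `node_usage = not_shared` keywords; the helper `resource_keywords` is a hypothetical name that prints the lines to paste into a job script.

```shell
# Hypothetical helper: print resource-request keywords for NODES nodes with
# TASKS tasks each; pass "yes" as the third argument for non-shared nodes.
resource_keywords() {
  nodes=$1 tasks=$2 exclusive=${3:-no}
  echo "# @ node = $nodes"
  echo "# @ tasks_per_node = $tasks"
  if [ "$exclusive" = yes ]; then
    # Keep other jobs off these nodes, e.g. for I/O- or RAM-heavy codes.
    echo "# @ node_usage = not_shared"
  fi
  # Plain comment (no "@"), ignored by LoadLeveler; shows the total CPU count.
  echo "# total tasks: $((nodes * tasks))"
}
```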
7
Scheduling parallel jobs (2)
All allocated nodes, processors and other resources remain allocated for the duration of the entire job
No dynamic adjustments, except by creating jobs with multiple steps
– each step can have different requirements
– each step can express a dependency on other steps
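A multi-step job might look like the following sketch. The step names, classes, file names and commands are hypothetical; the `step_name` and `dependency` keywords and the `LOADL_STEP_NAME` environment variable are what let each step carry its own requirements and wait on an earlier one. Because LoadLeveler runs the same file for every step, the body dispatches on the step name.

```shell
#!/bin/ksh
# Hypothetical two-step job: "build" compiles serially; "solve" asks for
# different resources and runs only if "build" exited with status 0.
# @ step_name = build
# @ job_type = serial
# @ class = short
# @ output = build.$(Cluster).out
# @ error = build.$(Cluster).err
# @ queue
# @ step_name = solve
# @ dependency = (build == 0)
# @ job_type = parallel
# @ total_tasks = 8
# @ class = medium
# @ output = solve.$(Cluster).out
# @ error = solve.$(Cluster).err
# @ queue

# LoadLeveler sets LOADL_STEP_NAME for each step; outside LoadLeveler it is
# unset and this script does nothing.
if [ -n "${LOADL_STEP_NAME:-}" ]; then
  case "$LOADL_STEP_NAME" in
    build) mpcc -O5 -qarch=pwr3 -qtune=pwr3 -qhot -o myprog myprog.c ;;
    solve) ./myprog ;;
    *)     echo "unexpected step: $LOADL_STEP_NAME" >&2; exit 1 ;;
  esac
fi
```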
8
Scheduling parallel jobs (3)
Management must
– allow some jobs to use the entire machine
– allow short jobs to get started quickly; they should not have to wait weeks in the queue
Some very long jobs may be needed, but they are to be avoided
9
Backfill scheduling
[Figure: Jobs A, B, C and D placed over time on a 10-node system]
10
Backfill scheduling
Requires a real (wall-clock) time limit to be set for every job
A more accurate (shorter) estimate gives a job a better chance of starting earlier
Short jobs can move through the system more quickly
Uses the system better by filling nodes that would otherwise sit idle while a large job waits for enough nodes to free up
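The rule behind backfilling can be sketched as a single test. This is a simplified, conservative form (assumed here, not LoadLeveler's exact policy): a waiting job may start ahead of the blocked job at the head of the queue only if it fits on the nodes idle right now and its time limit runs out before the head job's reserved start. The function name `can_backfill` is hypothetical; times are minutes from now.

```shell
# Simplified backfill test (a sketch, not the scheduler's real code).
#   $1 idle nodes now         $2 reserved start of the head-of-queue job
#   $3 nodes the job requests $4 wall-clock limit the job requests
can_backfill() {
  idle=$1 reserved=$2 want_nodes=$3 limit=$4
  # Must fit on idle nodes AND finish before the reservation begins.
  [ "$want_nodes" -le "$idle" ] && [ "$limit" -le "$reserved" ]
}
```

With 4 idle nodes and the head job reserved to start in 60 minutes, a 3-node job asking for 45 minutes backfills, while the same job asking for 90 minutes must wait. That is why the limit (the `wall_clock_limit` keyword in LoadLeveler) must be set, and why an honest, shorter estimate starts sooner.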
11
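Using LoadLeveler
Graphical user interface: xloadl
Make a shell script with LoadLeveler keywords written as shell comments:
# @ output = thing.log
# @ error = thing.err
# @ class = short
# @ executable = thingx
# @ node = 6,10
# @ tasks_per_node = 4
# @ requirements = (Adapter==hps_us)
# @ queue
– node = 6,10 asks for at least 6 and at most 10 nodes
– # @ queue ends the keyword list for a job step, so it goes last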
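Because the keywords are plain comments, the shell skips them and only LoadLeveler reads them. A small sketch that pulls them out of a job file, assuming one keyword per line; `ll_keywords` is a hypothetical helper name.

```shell
# Hypothetical helper: print each "# @ name = value" line of a job file as
# "name=value", and "# @ queue" as "queue". Spaces around "@" are optional.
ll_keywords() {
  sed -n \
    -e 's/^#[[:space:]]*@[[:space:]]*\([A-Za-z_.]*\)[[:space:]]*=[[:space:]]*\(.*\)$/\1=\2/p' \
    -e 's/^#[[:space:]]*@[[:space:]]*queue[[:space:]]*$/queue/p' \
    "$1"
}
```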
12
Sample LoadLeveler Script
#!/bin/ksh
# @ job_type = parallel
# @ input = /dev/null
# @ output = $(Executable).$(Cluster).$(Process).out
# @ error = $(Executable).$(Cluster).$(Process).err
# @ initialdir = /gstudent/student_rt_y/directory
# @ notify_user = student_rt_y@byu.edu
# @ class = short
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ requirements = (Arch == "power3")
# @ blocking = unlimited
# @ total_tasks = 4
# @ network.MPI = switch,shared,US
# @ queue
./your_exe_and_any_args
13
Sample serial job
#!/bin/ksh
# @ job_type = serial
# @ input = /dev/null
# @ output = $(Executable).$(Cluster).$(Process).out
# @ error = $(Executable).$(Cluster).$(Process).err
# @ initialdir = /gstudent/student_rt_y
# @ notify_user = student_rt_y@byu.edu
# @ class = medium
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ queue
paupnew Hlav3ashort.paup
14
LoadLeveler commands
llq: shows all jobs
– showq can also be used
llq -s JobID: shows why a job is not running
llclass: shows the job classes
llstatus: shows the machines
llcancel JobID: cancels a job
llhold JobID: puts a job in the hold state
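The commands combine well in a shell loop. A sketch, assuming the default llq listing layout (Id, Owner, Submitted date and time, ST, PRI, Class, Running On), where ST is the 5th whitespace-separated field because the Submitted column holds two words; `idle_jobs` is a hypothetical helper, and `llq -u` restricts the listing to one user.

```shell
# Hypothetical helper: read llq output on stdin, print IDs of Idle (I) steps.
idle_jobs() {
  awk '$5 == "I" { print $1 }'
}

# Ask LoadLeveler why each of your idle jobs is not running yet.
if command -v llq >/dev/null 2>&1; then
  llq -u "$USER" | idle_jobs | while read -r id; do
    llq -s "$id"
  done
fi
```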
15
Sample llq output
bash-2.05a$ llq
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
m1015i.1127.0            mdt36      8/7 12:41   R  50  long         m1009i
m1015i.1128.0            mdt36      8/7 12:41   R  50  long         m1019i
m1015i.1497.0            jl447      8/12 16:25  R  50  long         m1012i
m1015i.1544.0            to5        8/13 08:44  R  50  long         m1045i
m1015i.1545.0            to5        8/13 08:44  R  50  long         m1045i
…
m1015i.1602.0            taskman    8/14 08:13  R  50  short        m1017i
m1015i.1598.0            taskman    8/14 08:13  R  50  short        m1014i
m1015i.1601.0            taskman    8/14 08:13  R  50  short        m1017i
m1015i.1599.0            taskman    8/14 08:13  R  50  short        m1014i
m1015i.1600.0            taskman    8/14 08:13  R  50  short        m1011i
m1015i.1626.0            mendez     8/14 13:07  I  50  long
m1015i.1625.0            cr66       8/14 12:40  I  50  medium
m1015i.1513.0            jl447      8/13 07:08  I  50  long
m1015i.1572.0            dvd        8/13 10:45  I  50  medium
m1015i.1576.0            dvd        8/13 11:22  I  50  medium
m1015i.1577.0            dvd        8/13 11:25  I  50  medium
m1015i.1566.0            mdt36      8/13 08:51  I  50  long
m1015i.1564.0            mdt36      8/13 08:50  I  50  long
…
m1015i.1612.0            taskman    8/14 08:27  I  50  short
m1015i.1624.0            taskman    8/14 08:57  I  50  short
m1015i.1623.0            taskman    8/14 08:57  I  50  short

58 job step(s) in queue, 23 waiting, 0 pending, 35 running, 0 held, 0 preempted
16
Sample showq output
bash-2.05a$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME   STATE    PROC  REMAINING   STARTTIME
m1015i.1581.0      taskman    Running  1     18:39:00    Wed Aug 14 08:06:24
m1015i.1582.0      taskman    Running  1     18:39:00    Wed Aug 14 08:06:24
m1015i.1580.0      taskman    Running  1     18:39:00    Wed Aug 14 08:06:24
…
m1015i.1615.0      taskman    Running  1     21:33:42    Wed Aug 14 11:01:06
m1015i.1613.0      taskman    Running  1     23:43:05    Wed Aug 14 13:10:29
m1015i.1575.0      dvd        Running  4     2:15:10:38  Wed Aug 14 04:38:02
m1015i.1127.0      mdt36      Running  8     2:23:14:21  Wed Aug 7 12:41:45
…
m1015i.1567.0      jar65      Running  4     9:04:07:44  Tue Aug 13 17:35:08
m1015i.1569.0      jar65      Running  4     9:08:28:16  Tue Aug 13 21:55:40
m1015i.1547.0      to5        Running  8     9:21:11:49  Wed Aug 14 10:39:13
m1015i.1546.0      to5        Running  8     9:21:11:49  Wed Aug 14 10:39:13

35 Active Jobs    150 of 184 Processors Active (81.52%)
                   26 of 34 Nodes Active (76.47%)

IDLE JOBS----------------------
JOBNAME            USERNAME   STATE    PROC  WCLIMIT     QUEUETIME
m1015i.1513.0      jl447      Idle     2     5:00:00:00  Tue Aug 13 07:08:09
m1015i.1572.0      dvd        Idle     8     3:00:00:00  Tue Aug 13 10:45:18
…
23 Idle Jobs

NON-QUEUED JOBS----------------
JOBNAME            USERNAME   STATE    PROC  WCLIMIT     QUEUETIME

Total Jobs: 58   Active Jobs: 35   Idle Jobs: 23   Non-Queued Jobs: 0
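showq prints times as [[DD:]HH:]MM:SS, so 2:15:10:38 is 2 days, 15 hours, 10 minutes and 38 seconds. A sketch of converting such REMAINING or WCLIMIT values to seconds for sorting or comparison; `to_seconds` is a hypothetical name.

```shell
# Hypothetical helper: convert [[DD:]HH:]MM:SS to a number of seconds.
to_seconds() {
  printf '%s\n' "$1" | awk -F: '{
    split("1 60 3600 86400", unit, " ")   # seconds, minutes, hours, days
    s = 0
    # Walk fields from the right so the last field is always seconds.
    for (i = NF; i >= 1; i--) s += $i * unit[NF - i + 1]
    print s
  }'
}
```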
17
LoadLeveler environment
Normally the same as your login environment
Limits are set; use llclass -l to see their values
– ulimit -S -a (soft limits)
– ulimit -H -a (hard limits)
Big heap requirements
– -bmaxdata:0x80000000 allows up to 2 GB of data (heap)
– -q64 -bmaxdata:0x…. allows up to 8 EB
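The -bmaxdata value is just the heap size written in hexadecimal. A sketch, assuming the 2 GB (0x80000000) ceiling for 32-bit programs given above; `maxdata_flag` is a hypothetical helper name.

```shell
# Hypothetical helper: turn a heap size in bytes into a -bmaxdata flag for
# a 32-bit link; requests above 2 GB are rejected in favor of -q64.
maxdata_flag() {
  bytes=$1
  if [ "$bytes" -gt 2147483648 ]; then
    echo "more than 2 GB: build a 64-bit program with -q64" >&2
    return 1
  fi
  # Format string starts with %, so printf does not mistake it for an option.
  printf '%s0x%X\n' -bmaxdata: "$bytes"
}
```

For example, `xlc -O3 -o bigprog bigprog.c $(maxdata_flag 1073741824)` would link a hypothetical bigprog with a 1 GB data limit.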