1
Research Computing Environment at the University of Alberta
Diego Novillo
Research Computing Support Group
University of Alberta
April 1999
2
29 April 1999
Computing Environment
SGI Origin 2000: 42 CPUs, 10 GB RAM
Mix of interactive and batch jobs
– 2 CPUs for interactive activity
– 40 CPUs used by batch jobs
Batch jobs managed by LSF (Platform Computing)
3
How is the system being used?
4
Monthly System Utilization (CPU days)
[Chart: monthly system utilization in CPU days, compared against the theoretical maximum]
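The chart compares delivered CPU days with a theoretical maximum. A minimal sketch of that arithmetic follows. The slide does not say whether the maximum counts all 42 CPUs or only the 40 batch CPUs, so the CPU count is a parameter here, and the 900 CPU days in the example is a made-up figure, not a value read from the chart.

```python
from calendar import monthrange

def theoretical_max_cpu_days(cpus: int, year: int, month: int) -> int:
    """Ceiling on CPU days in a month: every CPU busy every day of the month."""
    days_in_month = monthrange(year, month)[1]
    return cpus * days_in_month

def utilization(used_cpu_days: float, cpus: int, year: int, month: int) -> float:
    """Fraction of the monthly ceiling that was actually used."""
    return used_cpu_days / theoretical_max_cpu_days(cpus, year, month)

# April 1999 has 30 days, so 40 batch CPUs give a ceiling of 1200 CPU days.
print(theoretical_max_cpu_days(40, 1999, 4))  # 1200
# 900 CPU days is a hypothetical figure.
print(utilization(900.0, 40, 1999, 4))        # 0.75
```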
5
Average wait time in queue (hours)
[Chart: average queue wait time in hours, annotated "Started using load thresholds" and "Need to balance parallel jobs"]
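The wait time plotted here is the gap between a job's submission and its start. A small sketch of how such an average could be computed from (submit, start) timestamps; the two sample jobs are hypothetical, and jobs that have not started yet are assumed to be filtered out beforehand.

```python
from datetime import datetime

def average_wait_hours(jobs):
    """Mean time between submission and start, in hours.

    `jobs` is an iterable of (submit_time, start_time) datetime pairs.
    """
    waits = [(start - submit).total_seconds() / 3600.0 for submit, start in jobs]
    if not waits:
        return 0.0
    return sum(waits) / len(waits)

# Hypothetical jobs: one waited 2 hours, the other 6 hours.
sample = [
    (datetime(1999, 4, 1, 8, 0), datetime(1999, 4, 1, 10, 0)),
    (datetime(1999, 4, 1, 9, 0), datetime(1999, 4, 1, 15, 0)),
]
print(average_wait_hours(sample))  # 4.0
```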
6
System usage by job type
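A breakdown by job type amounts to summing CPU time per category and dividing by the total. The sketch below assumes accounting records of the form (job_type, cpu_hours); the category names and numbers are illustrative, not data from the chart.

```python
from collections import defaultdict

def usage_share_by_type(records):
    """Sum CPU hours per job type and return each type's fraction of the total.

    `records` is an iterable of (job_type, cpu_hours) pairs.
    """
    totals = defaultdict(float)
    for job_type, cpu_hours in records:
        totals[job_type] += cpu_hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {job_type: hours / grand_total for job_type, hours in totals.items()}

# Hypothetical accounting records.
records = [("sequential", 300.0), ("parallel", 100.0), ("sequential", 100.0)]
print(usage_share_by_type(records))  # {'sequential': 0.8, 'parallel': 0.2}
```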
7
Some thoughts on usage
Scalar (single-processor) jobs predominate, so far
We are starting to push the system
Jobs are waiting too long in the queue
Need to modify queue policies:
– Lower runtime limits
– Checkpoint/restart
– Limit on the number of jobs submitted
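One of the proposed policies is a cap on how many jobs a user can have in the system. A minimal sketch of such a check, assuming the cap counts pending plus running jobs; the limit of 10 is hypothetical, since the slide gives no number. In practice LSF administrators set limits like this in the batch system's configuration rather than in user code.

```python
from collections import Counter

# Hypothetical cap: the slide proposes a limit but does not give a number.
MAX_ACTIVE_JOBS_PER_USER = 10

def may_submit(active_job_owners, user, limit=MAX_ACTIVE_JOBS_PER_USER):
    """Return True if `user` is below the per-user limit on pending + running jobs.

    `active_job_owners` holds the owner name of every pending or running job.
    """
    return Counter(active_job_owners)[user] < limit

owners = ["alice"] * 10 + ["bob"] * 3
print(may_submit(owners, "alice"))  # False: already at the limit
print(may_submit(owners, "bob"))    # True
```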
8
Using LSF
9
Job queues
Parallel queue: par
– High priority
– Slot-based: up to 32 processors
– Jobs are never suspended
Sequential queue: nic
– Low priority
– Threshold-based: dispatches jobs while system utilization is below 95%
– Jobs can be preempted by parallel jobs
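The two admission styles differ in what they check: par counts processor slots, while nic looks at overall system load. A simplified model of both rules, plus preemption, is sketched below. The choice of which nic jobs to suspend (newest first) is a hypothetical policy for illustration; LSF's actual selection may differ.

```python
PAR_MAX_SLOTS = 32          # par: "up to 32 processors"
NIC_MAX_UTILIZATION = 0.95  # nic: dispatch only while the system is below 95% busy

def par_can_start(slots_in_use: int, requested: int) -> bool:
    """Slot-based admission: the job's processors must fit in the 32 par slots."""
    return slots_in_use + requested <= PAR_MAX_SLOTS

def nic_can_start(system_utilization: float) -> bool:
    """Threshold-based admission: start another job only below the threshold."""
    return system_utilization < NIC_MAX_UTILIZATION

def nic_jobs_to_suspend(running_nic_jobs, cpus_needed: int):
    """Choose nic jobs to suspend so a parallel job can get `cpus_needed` CPUs.

    Hypothetical victim order (newest first).
    `running_nic_jobs` is a list of (job_id, start_time), one CPU per job.
    """
    newest_first = sorted(running_nic_jobs, key=lambda job: job[1], reverse=True)
    return [job_id for job_id, _ in newest_first[:cpus_needed]]

print(par_can_start(24, 8))   # True: exactly fills the 32 slots
print(par_can_start(30, 4))   # False
print(nic_can_start(0.97))    # False
print(nic_jobs_to_suspend([(101, 10), (102, 30), (103, 20)], 2))  # [102, 103]
```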
10
Job queues II
Two special queues:
– npseq
   For sequential jobs that must not be preempted
   Very low priority
   Only 4 slots available
– special
   For jobs that need to run longer than the system runtime limit
   Only 1 slot available
   Must be approved by a committee
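Taken together, the four queues cover different kinds of jobs. A hypothetical helper that maps a job's needs onto the queue names from these slides is sketched below; it only encodes the rules stated on the slides, and the helper itself is not part of LSF.

```python
PAR_MAX_SLOTS = 32

def suggest_queue(processors: int, preemptible: bool = True,
                  exceeds_runtime_limit: bool = False,
                  committee_approved: bool = False) -> str:
    """Map a job's needs onto the queues described on the slides (hypothetical helper)."""
    if exceeds_runtime_limit:
        if not committee_approved:
            raise ValueError("jobs longer than the system limit need committee approval")
        return "special"                       # only 1 slot
    if processors > 1:
        if processors > PAR_MAX_SLOTS:
            raise ValueError(f"par allows at most {PAR_MAX_SLOTS} processors")
        return "par"
    return "nic" if preemptible else "npseq"   # npseq: very low priority, 4 slots

print(suggest_queue(1))                      # nic
print(suggest_queue(1, preemptible=False))   # npseq
print(suggest_queue(16))                     # par
```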
11
Fairshare system
Jobs are scheduled according to priorities
Priorities are dynamic and based on:
– Number of shares
– Past usage (currently 2 weeks of history)
– Type of job (parallel jobs get higher priority)
Resource availability is also important
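A simplified sketch of a dynamic fairshare priority: more shares raise a group's priority, while recent usage and currently running jobs lower it. It is loosely modeled on the general shape of LSF's fairshare formula (shares divided by a weighted sum of usage terms), but it is not LSF's exact formula. The factor values are hypothetical, and the 2-week history is treated as a hard window. Job type is left out here because it is handled by queue priority (par above nic).

```python
from dataclasses import dataclass, field

HISTORY_DAYS = 14        # "currently 2 weeks of history"
CPU_TIME_FACTOR = 0.7    # hypothetical weight on recent CPU hours
RUN_JOB_FACTOR = 3.0     # hypothetical weight on jobs currently running

@dataclass
class Group:
    name: str
    shares: int
    usage: list = field(default_factory=list)   # (days_ago, cpu_hours) records
    running_jobs: int = 0

def dynamic_priority(group: Group) -> float:
    """More shares raise priority; recent usage and running jobs lower it."""
    recent_cpu = sum(hours for days_ago, hours in group.usage if days_ago < HISTORY_DAYS)
    # The (1 + running_jobs) term keeps the denominator positive for idle groups.
    return group.shares / (recent_cpu * CPU_TIME_FACTOR
                           + (1 + group.running_jobs) * RUN_JOB_FACTOR)

groups = [
    Group("heavy", shares=10, usage=[(1, 100.0)], running_jobs=2),
    Group("light", shares=10, usage=[(20, 500.0)]),   # older than 2 weeks: ignored
]
for g in sorted(groups, key=dynamic_priority, reverse=True):
    print(g.name, round(dynamic_priority(g), 3))   # light 3.333, then heavy 0.127
```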
12
Getting started
Complete LSF documentation online: http://www.ualberta.ca/CNS/RESEARCH/LSF/
Man pages are also available
Add one line to your login files:
– C shell: source /usr/local/lsf/cshrc.lsf
– Bourne shell: . /usr/local/lsf/profile.lsf
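The line you need depends on which shell family you log in with. A small sketch that picks the right line for a given shell path. Treating tcsh as C-shell compatible and ksh and bash as Bourne-shell compatible is an assumption beyond what the slide lists; the installation path is the one given on the slide.

```python
import os

LSF_DIR = "/usr/local/lsf"   # installation path from the slide

def lsf_setup_line(shell_path: str) -> str:
    """Return the line to add to your login file for the given login shell."""
    shell = os.path.basename(shell_path)
    if shell in ("csh", "tcsh"):            # C shell family
        return f"source {LSF_DIR}/cshrc.lsf"
    if shell in ("sh", "ksh", "bash"):      # Bourne shell family
        return f". {LSF_DIR}/profile.lsf"
    raise ValueError(f"unrecognized login shell: {shell!r}")

print(lsf_setup_line("/bin/csh"))  # source /usr/local/lsf/cshrc.lsf
```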
13
Submitting jobs
% bsub [options] pgm args
   -q name   Which queue to use
   -n num    How many processors
   -o file   Output file
The queue defaults to 'nic'. If no output file is given, results are mailed to you.
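When jobs are submitted from scripts, it helps to build the bsub command line from only the options shown on the slide. A minimal sketch follows. The program and file names are hypothetical, and the actual submission is left as a comment because it needs a host with LSF configured.

```python
def bsub_command(program, args=(), queue=None, processors=None, output=None):
    """Build the argument list for `bsub [options] pgm args`.

    Options left as None are omitted, so this site's defaults apply: the 'nic'
    queue, and results mailed to you when no output file is given.
    """
    cmd = ["bsub"]
    if queue is not None:
        cmd += ["-q", queue]
    if processors is not None:
        cmd += ["-n", str(processors)]
    if output is not None:
        cmd += ["-o", output]
    cmd.append(program)
    cmd.extend(args)
    return cmd

# "./mysim", "input.dat", and "mysim.out" are hypothetical names.
cmd = bsub_command("./mysim", ["input.dat"], queue="par", processors=8, output="mysim.out")
print(" ".join(cmd))  # bsub -q par -n 8 -o mysim.out ./mysim input.dat
# To submit for real on a host with LSF configured:
#   import subprocess; subprocess.run(cmd, check=True)
```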
14
Watching jobs
% bjobs [options]
   -l        All the details
   -p        Only pending jobs (and why they are pending)
   -a        All jobs (including recently finished ones)
   -u all    Jobs from all users in the system
   jobid     Just the job with this ID
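To get a quick summary of your jobs, you can count bjobs output by status. A sketch that assumes the usual default column layout (JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME); check this against your LSF version. The sample output is illustrative, with made-up user, host, and job names.

```python
from collections import Counter

def count_by_status(bjobs_output: str) -> Counter:
    """Count jobs per STAT value in default-format `bjobs` output.

    Only the leading columns (JOBID USER STAT ...) are used: EXEC_HOST is blank
    for pending jobs, so later fields shift position. The header line and any
    continuation lines (extra hosts of a parallel job) do not start with a
    numeric job ID and are skipped.
    """
    counts = Counter()
    for line in bjobs_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit():
            counts[fields[2]] += 1
    return counts

# Illustrative output; user, host, and job names are made up.
sample = """\
JOBID USER  STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1201  user1 RUN  par   origin    8*origin  mysim    Apr 29 09:12
1202  user1 PEND nic   origin              sweep    Apr 29 09:15
1203  user2 PEND nic   origin              fit      Apr 29 09:20
"""
print(count_by_status(sample))  # Counter({'PEND': 2, 'RUN': 1})
```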
15
Manipulating jobs
% bkill jobid     Kills the job (can also send a signal with -s)
% bstop jobid     Suspends the job (even if it is not yet running)
% bresume jobid   Resumes a suspended job
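These commands move a job between the status values that bjobs reports. The sketch below is a simplified model of those transitions. The state names (PEND, RUN, PSUSP, USUSP, SSUSP, EXIT) are standard LSF job states, but the transition table is an approximation, not a complete description of LSF's behavior.

```python
# Simplified model of how the job-control commands change a job's STAT value.
TRANSITIONS = {
    ("bstop", "PEND"): "PSUSP",    # a pending job is held and will not be dispatched
    ("bstop", "RUN"): "USUSP",     # a running job is suspended in place
    ("bresume", "PSUSP"): "PEND",  # back in the queue, eligible for dispatch
    ("bresume", "USUSP"): "RUN",   # simplification: LSF may hold it as SSUSP if load is high
}
for state in ("PEND", "PSUSP", "RUN", "USUSP", "SSUSP"):
    TRANSITIONS[("bkill", state)] = "EXIT"   # killed jobs end in EXIT

def apply_command(command: str, state: str) -> str:
    """Return the job's new state, or raise if the command does not apply."""
    try:
        return TRANSITIONS[(command, state)]
    except KeyError:
        raise ValueError(f"{command} has no effect on a job in state {state}") from None

state = "PEND"
for command in ("bstop", "bresume"):
    state = apply_command(command, state)
    print(command, "->", state)   # bstop -> PSUSP, then bresume -> PEND
```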
16
Getting usage statistics
We keep monthly statistics on our web page: http://www.ualberta.ca/CNS/RESEARCH/
For current information:
% bacct [opts]                Total usage for your jobs; can be restricted to specific dates and jobs
% priorities (or bhpart -r)   Lists the current priorities of all groups
17
Monitoring load on the system
% bqueues   Shows the queues and how loaded they are
% lsload    Quick glance at the load on the system
GUI tools are also available (xlsbatch, xlsmon). Please use them sparingly, as they add to the interactive load on the system.
18
Contact Information
Visit our home page: http://www.ualberta.ca/CNS/RESEARCH/
Questions and comments: Research.Support@Ualberta.CA