Cluster Usage Session: NBCR clusters introduction August 3, 2007


1 Cluster Usage Session: NBCR clusters introduction August 3, 2007
Wes Goodman 9/18/2018 © 2007 UC Regents

2 Where to start
National Biomedical Computation Resource (NBCR)
How to get an account:
  Familiarize yourself with the account policy
  Subscribe to the NBCR-support mailing list
  Subscribe to the NBCR-announce mailing list

3 Where to get help
For support, email support@nbcr.net
User services web page:
  Access to training sessions on the Wiki
  Tools/downloads
  Documentation
  Cluster monitoring access
  User guides on the Wiki

4 Generate public/private keypair
For Linux:
    % ssh-keygen -t dsa
Generally, it’s best to accept the default locations
Enter a strong passphrase to encrypt your private key

5 Remote login
For login use ssh (not rsh or telnet)
    % ssh
  or
    % ssh -l accname puzzle.nbcr.net
You may have to specify your private key location
    % ssh -i /path/to/private/key puzzle.nbcr.net
On first login, passphrase-protect your private key
    % ssh-keygen -p
You may now ssh to either kryptonite or oolite
Available clusters:
  kryptonite.nbcr.net
  oolite.nbcr.net

6 Keys management with Agent
Login on a cluster
    % ssh
Start an agent
    % eval `ssh-agent`
Add identities (private keys) to your agent
    % ssh-add
  or
    % ssh-add ~/.ssh/mykeys/my_special_key
Verify that identities are added
    % ssh-add -l
    1024 e9:a6:59:89:f0:f1:87:8e:88:54 /Users/nadya/.ssh/id_dsa (DSA)    - OK
    Could not open connection to your authentication agent               - ERROR!
You can now execute any command on any node
    % cluster-fork ps -u$USER
    % ssh c0-0
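The OK/ERROR check on the agent can also be scripted. The sketch below relies on the exit statuses documented in the OpenSSH ssh-add manual (0 on success, 1 if the command fails, 2 if the agent cannot be contacted). The helper name describe_agent_status is made up for this illustration, and reading status 1 as "no identities loaded" is an assumption that holds for the usual `ssh-add -l` case.

```shell
#!/bin/sh
# describe_agent_status is a hypothetical helper (not part of OpenSSH).
# It turns the exit status of `ssh-add -l` into a readable message.
describe_agent_status() {
  case "$1" in
    0) echo "OK: agent reachable, identities loaded" ;;
    1) echo "WARNING: agent reachable, but no identities listed (run ssh-add)" ;;
    2) echo "ERROR: could not open a connection to the agent (start ssh-agent)" ;;
    *) echo "UNKNOWN: unexpected exit status $1" ;;
  esac
}

# Query the real agent only when ssh-add is installed on this machine.
if command -v ssh-add >/dev/null 2>&1; then
  status=0
  ssh-add -l >/dev/null 2>&1 || status=$?
  describe_agent_status "$status"
fi
```

Run it in the same shell session in which the agent was started, because the agent is found through environment variables set by `eval \`ssh-agent\``.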

7 Introduction to Sun Grid Engine
What is a grid?
  A collection of computing resources that perform tasks
  A grid node can be a compute server, data collector, visualisation terminal, ...
SGE is resource management software
  Accepts jobs submitted by users
  Schedules them for execution on appropriate systems based on resource management policies
  You can submit hundreds of jobs without worrying about where they will run

8 What is SGE?
Two versions of SGE:
Sun Grid Engine (on Rocks clusters)
  Distributed under an open source license
  From sunsource.net
Sun N1 Grid Engine
  N1 stack is available at no cost
  Paid support from Sun

9 Job Management
Not recommended to run jobs directly!
Use the installed load scheduler
  Sun Grid Engine
    Load management tool for heterogeneous distributed computing environments
  PBS/Torque
    More sophisticated scheduling
Why?
  You can submit multiple jobs and have them queued (and go home!)
  Fair share
  Allows other people to use the cluster too! (for Myrinet MPI jobs)

10 Host Roles
Master Host
  Controls overall cluster activity
  Frontend, head node
  Runs the master daemon, sge_qmaster, which controls queues, jobs, status, and user access permissions
  Also runs the scheduler: sge_schedd
Execution Host
  Executes SGE jobs
  Runs the execution daemon: sge_execd
  Runs jobs on its host
  Forwards system status/info to sge_qmaster

11 Host Roles continued
Submit Host
  Allowed only to submit and control batch jobs
  No daemon needs to run on this type of host
Administration Host
  Usually the SGE administrator's console

12 Job Management
Your administrator must set up a global default queue (all.q)
More fine-tuned queues can be set up depending on the cluster/user community
  short.q, long.q, weekend.q, fluent.0.q, fluent.1.q
As a user, you only need to know how to
  Submit your jobs (serial or MPI)
  Monitor your jobs
  Get the results

13 Some SGE Commands
Command   Description
qconf     SGE's cluster, queue, etc. configuration
qmod      Modify queue status: enabled or suspended
qacct     Extract accounting information from cluster
qalter    Change the attributes of submitted but pending jobs
qdel      Job deletion
qhold     Hold back submitted jobs from execution
qhost     Show status information about SGE hosts
qmon      X-windows Motif interface
qrsh      SGE queue-based rsh facility
qselect   List queues matching selection criteria
qsh       Open an interactive shell on a low-loaded host
qstat     Status listing of jobs and queues
qsub      Command-line interface to submit jobs to SGE
qtcsh     SGE queue-based tcsh facility
qtcsh, qsh: extended command shells that can transparently distribute execution of programs/applications to the least loaded hosts via SGE.

14 $ qhost
    $ qhost
    HOSTNAME ARCH NPROC LOAD MEMTOT MEMUSE SWAPTO SWAPUS
[Sample output: one "global" row followed by nine "compute" hosts of architecture lx26-amd, with memory shown in G and swap in M; the numeric values are not preserved in this transcript.]

15 QMON
GUI interface for SGE administration/submission
Requires you to run either Linux/Unix on your desktop or have an X emulator (Hummingbird) on your Windows PC.

16 Submitting Jobs
Command line (qsub) & graphical (qmon)
Standard, batch, array, interactive, parallel
SGE schedules jobs based on
  Job priorities
    User -> FIFO
    Admin -> can affect with priority settings
  Equal-share scheduling
    Scheduler -> user_sort setting
    Prevents a single user from hogging the queues
    Recommended!

17 $ qsub
Output/error by default in home directory
Look in /opt/gridengine/examples/jobs
    % qsub simple.sh
    your job 224 ("simple.sh") has been submitted
    % cd ~
    % more simple.sh.e224
    % more simple.sh.o224
    Wed Aug 9 14:56:16 PDT 2006
    Wed Aug 9 14:56:36 PDT 2006
Use qstat to check job status
Example job script:
    #!/bin/sh
    date
    sleep 10
    hostname
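The file names in the example follow the pattern <job name>.o<job id> for standard output and <job name>.e<job id> for standard error. A minimal sketch of that naming rule follows; job_output_files is a hypothetical helper written for this illustration, not an SGE command.

```shell
#!/bin/sh
# job_output_files is a hypothetical helper (not part of SGE).
# It prints the default stdout and stderr file names for a job,
# given the job name and the job id reported by qsub.
job_output_files() {
  job_name=$1
  job_id=$2
  echo "${job_name}.o${job_id}"   # standard output
  echo "${job_name}.e${job_id}"   # standard error
}

# For the job above this prints simple.sh.o224 and simple.sh.e224.
job_output_files simple.sh 224
```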

18 Submit autodock job
    % qsub adsub.sh
    your job 225 ("adsub.sh") has been submitted
Job script:
    #!/bin/sh
    # request Bourne shell as shell for job
    #$ -S /bin/sh
    # work from current dir and put stderr/stdout here
    #$ -cwd
    ulimit -s unlimited
    autodock3 -p test.dpf -l test.dlg
    status=$?
    if [ "$status" = "0" ] ; then
        echo "successful completion $status"
    else
        echo "error running autodock3"
    fi
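One thing the script above does not do is pass the failure on: its last command is an echo, so the job script itself ends with status 0 even when autodock3 fails. A possible variation, sketched here with a hypothetical wrapper named run_and_report and with `true` and `false` as stand-ins for the real program, returns the original status so it can be used with `|| exit`.

```shell
#!/bin/sh
# run_and_report is a hypothetical wrapper (not from the original slides).
# It runs the command given as its arguments, prints a success or failure
# line, and returns the command's own exit status.
run_and_report() {
  status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "successful completion $status"
  else
    echo "error running $1 (exit status $status)" >&2
  fi
  return "$status"
}

# Stand-in commands. In a job script the call might look like:
#   run_and_report autodock3 -p test.dpf -l test.dlg || exit $?
run_and_report true
run_and_report false || echo "the failure was propagated to the caller"
```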

19 GUI
Submit, Monitor, Control

20 $ qconf
Show all the queues
    % qconf -sql
Show the given queue
    % qconf -sq all.q
Show command usage
    % qconf -help
Show complex attributes
    % qconf -sc

21 Advanced Submit
Advanced or batch jobs == shell scripts
Can be as complicated as you want, or even an application!
    #!/bin/bash
    #
    # compiles my program every time, creates the executable, and runs it!
    # change to my working directory
    cd TEST
    # compile the job
    f77 flow.f -o flow -lm -latlas
    # run the job
    ./flow myinput.dat

22 Requestable Attributes
Users submit jobs by specifying a job requirement profile of the hosts or of the queues
SGE will match the job requirements and run the job on suitable hosts
Attributes
  Disk space
  CPU
  Memory
  Software (Fluent license)
  OS

23 Attributes continued
Relop
  Relational operation used to compute whether a queue meets a user request
Requestable
  Whether the attribute can be specified by the user (e.g. in qsub)
Consumable
  Manages limited resources, e.g. licence or CPU

    #name     shortcut  type    value  relop  requestable  consumable  defs
    arch      a         STRING  none   ==     YES          NO          none
    num_proc  p         INT            ==     YES          NO
    load_avg  la        DOUBLE         >=     NO           NO
    slots     s         INT            <=     YES          YES

Multiple resources in one -l request are separated by commas:
    % qsub -l arch=glinux,load_avg=0.01 myjob.sh
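A detail worth spelling out: several resources given to a single -l option must form one comma-separated argument, otherwise qsub reads the second resource as the script name. A small sketch of building that argument follows. join_resources is a hypothetical helper, and num_proc=2 is an illustrative value chosen here because the table marks num_proc as requestable.

```shell
#!/bin/sh
# join_resources is a hypothetical helper (not part of SGE).
# It joins its arguments with commas so they form ONE argument to `qsub -l`.
join_resources() {
  # "$*" joins the arguments using the first character of IFS;
  # the subshell keeps the IFS change local.
  ( IFS=,; echo "$*" )
}

resources=$(join_resources arch=glinux num_proc=2)

# Print the command instead of running it, so the sketch works without SGE.
echo "qsub -l $resources myjob.sh"
```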

24 Attributes continued
By default, all requests are hard
Hard requests are checked first, followed by soft
If a hard request is not satisfied, the job is not run
For soft requests, SGE attempts to run on the “best” fit
Important resources
  mt - memory total
  mf - memory free
  s  - processor slots
  st - total swap
How to request specific memory/swap space/CPU?
    % qsub -soft -l mt=250K,st=100K,mf=300G simple.sh
    % qsub -hard -l mt=250K,st=100K,mf=300G simple.sh

25 Array Jobs
Parameterized and repeated execution of the same program (in a script) is ideal for the array job facility
SGE provides an efficient implementation of array jobs
Handles computations as an array of independent tasks joined into a single job
Can be monitored and controlled as a whole, or by individual tasks or subsets of tasks

26 $ qsub
Submitting an array job from the command line
    % qsub -l h_cpu=0:45:0 -t 2-10:2 render.sh data.in
-l option requests a hard CPU time limit of 45 minutes
-t option defines the task index range
  2-10:2 specifies tasks 2, 4, 6, 8, 10
Each task uses $SGE_TASK_ID to find out whether it is task 2, 4, 6, 8 or 10
  To find its input record
  As a seed for a random number generator
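To make the -t range concrete, the sketch below expands a first-last:step range into the task ids it produces and shows how a task script might read its own id. expand_tasks is a hypothetical helper written for this illustration, and the fallback value of 1 for SGE_TASK_ID is an assumption that lets the script run outside SGE.

```shell
#!/bin/sh
# expand_tasks is a hypothetical helper (not part of SGE).
# It lists the task ids produced by a `-t first-last[:step]` range.
expand_tasks() {
  range=${1%%:*}                 # part before the colon, e.g. "2-10"
  step=1
  case "$1" in
    *:*) step=${1##*:} ;;        # part after the colon, e.g. "2"
  esac
  first=${range%%-*}
  last=${range##*-}
  seq "$first" "$step" "$last"
}

# Prints 2, 4, 6, 8 and 10, one per line.
expand_tasks 2-10:2

# Inside the job script each task reads its own id from SGE_TASK_ID.
# The default of 1 is only there so the script also runs outside SGE.
task_id=${SGE_TASK_ID:-1}
echo "this task would process record $task_id of data.in"
```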

27 Job cleanup
Use SGE command
    % qdel <job_id>
Use Rocks command
    % cluster-fork killall <your_executable_name>

28 SGE submit script
Script contents
    #!/bin/tcsh
    #$ -S /bin/tcsh
    setenv MPI /opt/mpich/gnu/bin
    $MPI/mpirun -machinefile machines -np $NSLOTS appname
Make it executable
    $ chmod +x runprog.sh

29 Submit file options
    # meet given resource request
    #$ -l h_rt=600
    # specify interpreting shell for the job
    #$ -S /bin/sh
    # use path for standard output of the job
    #$ -o /your/path
    # execute from current dir
    #$ -cwd
    # run on 32 processes in mpich PE
    #$ -pe mpich 32
    # Export all environment variables
    #$ -V
    # Export these environment variables
    #$ -v MPI_ROOT,FOOBAR=BAR
    # Merge stderr into stdout
    #$ -j y
See “man qsub” for more options
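Put together, these options sit at the top of the job script as `#$` comment lines, which the shell ignores and SGE reads at submission. The following is a sketch of such a file, not a script from the original slides. It assumes a parallel environment named mpich exists on the cluster, and the fallback values for NSLOTS and JOB_ID are there only so the file can also be run directly with sh for a quick check.

```shell
#!/bin/sh
# Options for SGE; to the shell these are ordinary comments.
#$ -S /bin/sh
#$ -cwd
#$ -j y
#$ -l h_rt=600
#$ -pe mpich 32
#$ -V

# NSLOTS and JOB_ID are set by SGE when the job runs.
# The defaults below apply only when the script is run outside SGE.
slots=${NSLOTS:-1}
job=${JOB_ID:-none}

echo "job $job running on $(uname -n) with $slots slot(s)"
```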

30 SGE Options
What other SGE options are available?
-o/-e: Redirect stdout and stderr
-l: Walltime
  e.g. -l h_rt=24:00:00   # 24-hour run
  There are resource limits for walltime
  Also, queues: -l short / -l medium / -l long
    16 / 24 / 48 hour walltime respectively
Notification: -M

31 SGE Options
Additional options:
-R y/n
  Resource reservation
  Up to 20 reservations supported
-N foo
  Sets the name of the job to foo
-hold_jid job_id
  Holds current job execution until job ‘job_id’ is done
  Useful for sequencing jobs
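Sequencing with -hold_jid needs the id of the first job, which qsub reports in its confirmation line. A rough sketch of capturing it follows. job_id_from_qsub is a hypothetical helper, prepare.sh and analyze.sh are placeholder script names, and the pattern assumes the single-job confirmation format shown in the qsub example (array jobs report a different line).

```shell
#!/bin/sh
# job_id_from_qsub is a hypothetical helper (not part of SGE).
# It reads qsub's confirmation line on stdin and prints the numeric job id.
job_id_from_qsub() {
  sed -n 's/^[Yy]our job \([0-9][0-9]*\) .*/\1/p'
}

# Demonstration on the confirmation text from the qsub example; prints 224.
echo 'your job 224 ("simple.sh") has been submitted' | job_id_from_qsub

# With SGE installed, two jobs could be sequenced like this.
# prepare.sh and analyze.sh are placeholder names, so the block is skipped
# unless qsub and both files are present.
if command -v qsub >/dev/null 2>&1 && [ -f prepare.sh ] && [ -f analyze.sh ]; then
  first_id=$(qsub -N prepare prepare.sh | job_id_from_qsub)
  qsub -N analyze -hold_jid "$first_id" analyze.sh
fi
```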

32 SGE Options
Array jobs
  -t start-end:inc
  This tells SGE the task index range for instances
  We can access the current index through $SGE_TASK_ID
Another cool trick:
  -m e: mail notification when the job ends (use -m a for mail when the job is aborted)

33 AutoDock: A Quick Example
Examining the submission file:
    #!/bin/bash
    #$ -N sim4520
    #$ -o sim4520_krel1-nowats.std.out
    #$ -e sim4520_krel1-nowats.std.err
    #$ -t 1-119
    #$ -S /bin/bash
    #$ -cwd
    #$ -m e
    export STACK_SIZE="unlimited"
    # pick the docking directory whose position in the listing matches this task's index
    TASK=`ls /share/apps/track1/autodock_handson/Sim_45208/Sim_42508_dockings | head -n $SGE_TASK_ID | tail -1`
    cd /share/apps/track1/autodock_handson/Sim_45208/Sim_42508_dockings/${TASK}
    # derive the run name from the .dpf file in that directory
    F=`ls *.dpf`
    NAME=`basename $F .dpf`
    /home/install/usr/apps/autodock4/bin/autodock4 -p ${NAME}.dpf -l ${NAME}.dlg
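The submission file maps each task index to one docking directory by taking the N-th entry of a directory listing. The same selection can be sketched with sed and tried out on a throwaway directory, without SGE or AutoDock. nth_entry and the dock_a, dock_b, dock_c names are made up for this illustration, and the default index of 2 is used only when SGE_TASK_ID is not set.

```shell
#!/bin/sh
# nth_entry is a hypothetical helper (not from the original slides).
# It prints the N-th name (1-based, in ls order) inside a directory,
# which is how the array job picks the directory for its task.
nth_entry() {
  ls "$1" | sed -n "${2}p"
}

# Self-contained demonstration using a temporary directory.
demo=$(mktemp -d)
mkdir "$demo/dock_a" "$demo/dock_b" "$demo/dock_c"

# Under SGE the index comes from SGE_TASK_ID; 2 is a stand-in default.
nth_entry "$demo" "${SGE_TASK_ID:-2}"

rm -rf "$demo"
```

Because the mapping depends on listing order, the directory contents should not change while the array job is running.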

34 AutoDock Hands-On
This is an array job
  Notice the -t flag. This tells SGE the task index range for instances
  We can access the current index through $SGE_TASK_ID
Some other cool tricks:
  -N: specifies the job name
  -m e: mail notification when the job ends
  Setting STACK_SIZE=“unlimited” is a fix for autodock4 segmentation faults

