Using hpc
  Instructor: Seung Hun An, DCS Lab, School of EECSE, Seoul National University
What is hpc
  System
    IBM RS/6000 SP, AIX nodes, 16 processors per node
    144 GByte memory, 3 TByte
  LoadLeveler & POE
    LoadLeveler is recommended
  hpc.snu.ac.kr
    Connect by telnet, ssh, or rsh
    Tera Term is available at
System Setting & Using
  Bourne-type shells: ksh (default), bash
    Use export instead of setenv
  General steps
    1. Edit the command (cmd) file
    2. Compile the source file
    3. Submit the compiled program to the machine as a job
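The export rule above can be sketched as follows. This is a minimal illustration, not part of the original slides; WORKDIR and DEMO_PRIVATE are hypothetical variable names chosen only for the example.

```shell
# csh/tcsh form, shown only for comparison (not valid in ksh or bash):
#   setenv WORKDIR /tmp/work

# ksh/bash form: assign, then export so child processes inherit the value.
WORKDIR=/tmp/work            # WORKDIR is a hypothetical name for illustration
export WORKDIR

# A plain assignment without export stays private to the current shell.
DEMO_PRIVATE=/tmp/scratch    # hypothetical, deliberately not exported

# Ask a child shell which of the two variables it can see.
child_workdir=$(sh -c 'echo "${WORKDIR:-unset}"')
child_private=$(sh -c 'echo "${DEMO_PRIVATE:-unset}"')
echo "child sees WORKDIR=$child_workdir DEMO_PRIVATE=$child_private"
```

Only the exported variable reaches the child shell, which matters because programs started from the login shell read their settings from the environment.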
Command file
  Each LoadLeveler keyword line starts with "# @".

  #!/bin/ksh
  # @ job_type = parallel
  # @ executable = ~/KISA/LLL/execution
  # @ input = /dev/null
  # @ output = $(Executable).$(Cluster).$(Process).out
  # @ error = $(Executable).$(Cluster).$(Process).err
  # @ initialdir = /u/dcslab
  # @ notify_user =
  # @ class = gold
  # @ step_name = LLL
  # @ notification = complete
  # @ checkpoint = no
  # @ restart = no
  # @ requirements = (Arch == "R6000") && (OpSys == "AIX43")
  # @ node = 4
  # @ total_tasks = 15
  # @ network.MPI = css0,shared,US,high
  # @ queue
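The command file asks for 4 nodes and 15 tasks, which does not divide evenly. The sketch below works out one plausible layout, assuming LoadLeveler spreads total_tasks as evenly as possible across the requested nodes. Which nodes receive the extra task is an assumption of this example, not something the slides state.

```shell
# Values taken from the command file on the slide
nodes=4
total_tasks=15

base=$((total_tasks / nodes))     # tasks that every node receives
extra=$((total_tasks % nodes))    # number of nodes that receive one more task

# Assumption: the first "extra" nodes take the additional task.
layout=""
i=1
while [ "$i" -le "$nodes" ]; do
    if [ "$i" -le "$extra" ]; then
        layout="$layout $((base + 1))"
    else
        layout="$layout $base"
    fi
    i=$((i + 1))
done
layout=${layout# }                # drop the leading space
echo "tasks per node: $layout"
```

With 16 processors per node, every node in such a layout has more processors than tasks.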
Running example
  [sp01: ~/KISA/LLL] $ mpcc parallel_allswap.c
  [sp01: ~/KISA/LLL] $ mv a.out execution
  [sp01: ~/KISA/LLL] $ llsubmit lll.cmd
  llsubmit: The job "sp " has been submitted.
  [sp01: ~/KISA/LLL] $ llstatus
  Name  Schedd  InQ  Act  Startd  Run  LdAvg  Idle  Arch   OpSys
  sp01  Avail             Idle                      R6000  AIX43
  sp02  Avail   1    1    Run                       R6000  AIX43
  sp03  Avail   0    0    Run                       R6000  AIX43
  sp04  Avail   0    0    Run                       R6000  AIX43
  sp05  Avail   0    0    Run                       R6000  AIX43
  sp06  Avail   0    0    Run                       R6000  AIX43
  sp07  Avail   0    0    Run                       R6000  AIX43
  sp08  Avail   0    0    Run                       R6000  AIX43
  sp09  Avail   0    0    Run                       R6000  AIX43

  R6000/AIX43     9 machines  13 jobs  123 running
  Total Machines  9 machines  13 jobs  123 running

  The Central Manager is defined on sp02
  All machines on the machine_list are present.
  [sp01: ~/KISA/LLL] $
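The compile, rename, and submit steps from this session can be wrapped in a small script. The following is a sketch only: run and build_and_submit are hypothetical helper names, and DRY_RUN is a hypothetical switch that prints each command without executing it, so the sequence can be checked on a machine without mpcc or LoadLeveler.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_and_submit SOURCE EXECUTABLE CMDFILE
# Each step runs only if the previous one succeeded.
build_and_submit() {
    src=$1
    exe=$2
    cmd=$3
    run mpcc "$src" &&
    run mv a.out "$exe" &&
    run llsubmit "$cmd"
}

# File names are the ones used in the session on the slide.
DRY_RUN=1
plan=$(build_and_submit parallel_allswap.c execution lll.cmd)
echo "$plan"
```

Chaining the steps with && stops the script at the first failure, so a job is never submitted for a program that did not compile. Set DRY_RUN=0 on the cluster to run the commands for real.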
llq
  [sp01: ~/KISA/LLL] $ llq
  Id  Owner     Submitted   ST  PRI  Class   Running On
  sp  mrdlab1   9/14 03:04  R   50   long    sp02
  sp  spscs     9/15 16:16  R   50   silver  sp05
  sp  spscs     9/15 16:16  R   50   silver  sp07
  sp  flowsys1  9/15 17:00  R   50   silver  sp04
  sp  seongkim  9/15 22:37  R   50   gold    sp06
  sp  shinkj    9/16 12:11  R   50   gold    sp04
  sp  janggrp   9/16 12:28  R   50   gold    sp09
  sp  janggrp   9/16 12:28  R   50   gold    sp03
  sp  biosys    9/16 15:26  R   50   silver  sp03
  sp  hpcb0011  9/16 16:53  R   50   silver  sp03
  sp  microsys  9/16 17:25  R   50   silver  sp08
  sp  microsys  9/16 17:25  R   50   silver  sp08
  sp  dcslab    9/16 19:06  ST  50   gold    sp08

  13 job steps in queue, 0 waiting, 1 pending, 12 running, 0 held
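The llq listing is plain text, so it can be summarized with awk. The sketch below counts running jobs per class, assuming the column layout shown on the slide: Submitted occupies two fields (date and time), which makes ST field 5 and Class field 7. The sample rows are copied from the slide, and llq_sample and count_running_by_class are hypothetical helper names.

```shell
# Sample llq rows copied from the slide (job Ids appear truncated there).
llq_sample() {
cat <<'EOF'
Id Owner Submitted ST PRI Class Running On
sp mrdlab1 9/14 03:04 R 50 long sp02
sp spscs 9/15 16:16 R 50 silver sp05
sp spscs 9/15 16:16 R 50 silver sp07
sp flowsys1 9/15 17:00 R 50 silver sp04
sp seongkim 9/15 22:37 R 50 gold sp06
sp shinkj 9/16 12:11 R 50 gold sp04
sp janggrp 9/16 12:28 R 50 gold sp09
sp janggrp 9/16 12:28 R 50 gold sp03
sp biosys 9/16 15:26 R 50 silver sp03
sp hpcb0011 9/16 16:53 R 50 silver sp03
sp microsys 9/16 17:25 R 50 silver sp08
sp microsys 9/16 17:25 R 50 silver sp08
sp dcslab 9/16 19:06 ST 50 gold sp08
EOF
}

# Count jobs in state R per class, sorted by class name.
# The header row is skipped automatically because its field 5 is "PRI".
count_running_by_class() {
    awk '$5 == "R" { n[$7]++ } END { for (c in n) print c, n[c] }' | sort
}

summary=$(llq_sample | count_running_by_class)
echo "$summary"
```

On the cluster the same filter could be fed from the live queue with `llq | count_running_by_class`, provided the real output keeps this column order.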
llclass & llcancel
  llclass
    [sp01: ~/KISA/LLL] $ llclass
    Name     MaxJobCPU   MaxProcCPU  Free   Max    Description
             d+hh:mm:ss  d+hh:mm:ss  Slots  Slots
    gold                                           Serial & parallel batch job
    silver                                         Serial & parallel batch job
    long                                           Long time job
    general                                        Test or Interactive job
  llcancel
    Cancels one or more jobs from the LoadLeveler queue
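A cautious way to use llcancel is to preview the command before running it. The sketch below assumes llcancel accepts one or more job Ids as arguments, as the slide describes. cancel_jobs and DRY_RUN are hypothetical names, and JOB_A and JOB_B are placeholders for real Ids taken from the Id column of llq.

```shell
# cancel_jobs JOBID [JOBID ...]
# With DRY_RUN=1 the llcancel command is printed instead of executed.
cancel_jobs() {
    if [ $# -eq 0 ]; then
        echo "usage: cancel_jobs JOBID [JOBID ...]" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "llcancel $*"
    else
        llcancel "$@"
    fi
}

# JOB_A and JOB_B are placeholders, not real job Ids.
DRY_RUN=1
preview=$(cancel_jobs JOB_A JOB_B)
echo "$preview"
```

Refusing to run with no arguments guards against an accidental empty call, and the preview makes it easy to confirm the Ids before anything is removed from the queue.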