1
CommLab PC Cluster (Ubuntu OS version)
PC Cluster Manager: Sandy Ardianto
2
Outline
Architecture
Specification
How to use
  Registration
  Connect using PuTTY
  Upload & Download files
Sending jobs to slaves
  Torque PBS
  Torque Features
  PBS file example
  PBS command
  Python Example
  Matlab Example
References
3
Architecture
[Diagram: Master (public IP, local IP); Slave01, Slave02, … Slave16, each with its own IP; NAS (Network Attached Storage)]
4
Specification
Node groups: Master, Slave01-14, Slave15-16
CPUs: 8 cores Xeon; 16 cores
Memory: 16GB; 4GB
The /home folder of the master and all slaves is synchronized through the NAS.
The OS on all nodes has been changed from CentOS 5.6 to Ubuntu 14.04.
5
How to Use the CommLab PC Cluster
6
Registration
Contact the cluster manager (sandyardianto@gmail.com) with your:
<name>
<username> (e.g. sardianto [Sandy Ardianto])
<password>
<advisor>
< >
7
Connect using SSH (PuTTY):
8
Upload & Download files
FileZilla: the default location is /home/<username>
9
Sending Jobs to Slaves
10
Torque PBS (Portable Batch System)
TORQUE (Terascale Open-source Resource and QUEue Manager) is a distributed resource manager that provides control over batch jobs and distributed compute nodes.
11
Torque Features (1/2)
Fault Tolerance
  Additional failure conditions checked/handled
  Node health check script support
Scheduling Interface
  Extended query interface providing the scheduler with additional and more accurate information
  Extended control interface allowing the scheduler increased control over job behavior and attributes
  Allows the collection of statistics for completed jobs
12
Torque Features (2/2)
Scalability
  Significantly improved server-to-MOM (pbs_mom) communication model
  Ability to handle larger clusters (over 15 TF/2,500 processors)
  Ability to handle larger jobs (over 2,000 processors)
  Ability to support larger server messages
Usability
  Extensive logging additions
  More human-readable logging (e.g. no more 'error on command 42')
13
PBS File Example
[Screenshot of an example job script, annotated with:]
Job name
Error output file
Output string on the terminal
Queue name (batch, batch1-batch16)
ppn: processors per node (compute units)
Assign a specific slave: nodes=slaveXX (XX = 01-16)
Some useful variables:
$PBS_JOBID: the job identifier
$PBS_JOBNAME: the job name
$PBS_O_WORKDIR: the absolute path of the directory from which the qsub command was sent
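The annotated directives can also be generated programmatically. The following is a minimal sketch in Python, assuming the standard Torque directive syntax (`#PBS -N`, `-o`, `-e`, `-q`, `-l nodes=...:ppn=...`). The helper name `make_pbs_script`, its defaults, and the generated file names are hypothetical and are not taken from the original pbs.sh.

```python
def make_pbs_script(job_name, command, queue="batch", node=None, ppn=1,
                    out_file=None, err_file=None):
    """Return the text of a Torque/PBS job script (hypothetical helper)."""
    # Request one node; optionally pin it to a specific slave, e.g. "slave03".
    nodes = f"{node}:ppn={ppn}" if node else f"1:ppn={ppn}"
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                       # job name
        f"#PBS -o {out_file or job_name + '.log'}",  # standard output file
        f"#PBS -e {err_file or job_name + '.err'}",  # error output file
        f"#PBS -q {queue}",                          # queue: batch, batch1-batch16
        f"#PBS -l nodes={nodes}",                    # node(s) and processors per node
        "",
        "cd $PBS_O_WORKDIR",                         # directory where qsub was run
        'echo "Job $PBS_JOBID ($PBS_JOBNAME) started on $(hostname)"',
        command,                                     # the actual work
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a script that pins the job to slave01 with 8 processors.
    print(make_pbs_script("job_name", "python hello.py",
                          queue="batch1", node="slave01", ppn=8))
```

Setting `node` to a slave name corresponds to the nodes=slaveXX annotation. Leaving it as `None` requests any one node.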
14
Check which computers are available to use
Open in a browser
15
PBS Commands
Submit a job: qsub <filename.sh>
Show job status: qstat
Run a job: qrun <job ID>
Stop (delete) a job: qdel <job ID>
Job states: Q - Queued, R - Running, E - Exiting, C - Completed
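To check job states from a script, you can parse the default `qstat` listing. This sketch assumes Torque's default column layout (Job ID, Name, User, Time Use, S, Queue), with one job per line. The function name `parse_qstat` is hypothetical. In Torque's S column, E marks a job that is exiting after having run, not an error.

```python
# Torque job state codes as they appear in the "S" column of qstat.
STATES = {
    "Q": "queued",
    "R": "running",
    "E": "exiting",
    "C": "completed",
    "H": "held",
    "W": "waiting",
    "T": "being moved",
    "S": "suspended",
}


def parse_qstat(text):
    """Map job ID -> state name from default `qstat` output (assumed layout)."""
    jobs = {}
    for line in text.splitlines():
        fields = line.split()
        # The header splits into 8 tokens; skip it, blank lines,
        # and the dashed separator row.
        if len(fields) != 6 or fields[0].startswith("-"):
            continue
        job_id, _name, _user, _time_use, state, _queue = fields
        jobs[job_id] = STATES.get(state, state)  # unknown codes pass through
    return jobs
```

On the master, the input text can be obtained with `subprocess.run(["qstat"], capture_output=True, text=True).stdout`.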
16
Python - Hello World Example
Files available at pbs.sh hello.py
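The slide does not reproduce hello.py itself. A minimal sketch of such a script, assuming it only needs to print a greeting, could also report the PBS variables listed earlier, which Torque sets in the job's environment. The function name `hello_message` and the message format are hypothetical.

```python
import os
import socket


def hello_message(env=os.environ):
    """Build a greeting showing which node and which PBS job this runs as."""
    # Outside a PBS job these variables are absent, so fall back gracefully.
    job_id = env.get("PBS_JOBID", "not under PBS")
    job_name = env.get("PBS_JOBNAME", "-")
    workdir = env.get("PBS_O_WORKDIR", os.getcwd())
    return (f"Hello World from {socket.gethostname()}\n"
            f"job id: {job_id}, job name: {job_name}, submitted from: {workdir}")


if __name__ == "__main__":
    print(hello_message())
```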
17
Running Hello World
qsub pbs.sh: send the job to the master
qrun <job ID>: run the job
qstat: check the job status
cat 3.master-job_name.log: show the job's log
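You can also script the qsub and qrun steps. This sketch assumes, as the slides do, that your account is allowed to call qrun. It also assumes that qsub prints the new job ID (such as 3.master) on standard output. The function `submit_and_run` is hypothetical. Its `run` parameter defaults to `subprocess.run` and exists only so the function can be exercised without a cluster.

```python
import subprocess


def submit_and_run(script, run=subprocess.run):
    """Submit a PBS script with qsub, start it with qrun, return the job ID."""
    # qsub writes the new job's identifier (e.g. "3.master") to stdout.
    result = run(["qsub", script], capture_output=True, text=True, check=True)
    job_id = result.stdout.strip()
    # Start the queued job explicitly, mirroring the manual qrun step.
    run(["qrun", job_id], check=True)
    return job_id
```

With check=True, a failing qsub or qrun raises `subprocess.CalledProcessError` instead of returning a bogus job ID.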
18
Matlab Example Files available at http://140.113.211.20
pbs_matlab.sh, mtest.m
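The content of pbs_matlab.sh is not shown here. A job script of this kind typically ends with a command that starts MATLAB without a display and runs the script. The sketch below builds such a line in Python. It assumes MATLAB is on the PATH of the slaves and that `-nodisplay`, `-nosplash`, and `-r` are the intended startup options. The function `matlab_batch_command` is hypothetical.

```python
import shlex


def matlab_batch_command(script_name):
    """Shell line that runs a MATLAB script without the desktop, then exits."""
    # -nodisplay / -nosplash avoid opening any GUI on the slave;
    # -r executes the given statements at startup.
    # The try/catch lets MATLAB still reach "exit" if the script raises an error.
    statements = f"try, {script_name}; catch err, disp(err.message); end; exit"
    return shlex.join(["matlab", "-nodisplay", "-nosplash", "-r", statements])
```

For mtest.m, the result is `matlab -nodisplay -nosplash -r 'try, mtest; catch err, disp(err.message); end; exit'`, which could serve as the last line of a script like pbs_matlab.sh.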
19
Running Matlab Example (1/2)
qsub pbs_matlab.sh: send the job to the master
qrun <job ID>: run the job
qstat: check the job status
20
Running Matlab Example (2/2)
head master-job_name.log: show the first lines of the log (10 by default; use head -n 20 master-job_name.log to see the first 20)
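If you inspect a log from Python instead of the shell, `itertools.islice` gives the same "first N lines" view. This is a minimal sketch. The function name `head` is hypothetical, and its default of 10 lines mirrors the `head` command.

```python
from itertools import islice


def head(lines, n=10):
    """Return the first n lines of any line iterable (e.g. an open file)."""
    # islice stops after n items, so a large log is not read in full.
    return [line.rstrip("\n") for line in islice(lines, n)]
```

For example: `with open("3.master-job_name.log") as log: print("\n".join(head(log, 20)))`.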
21
Any problems or questions? Contact me!