CommLab PC Cluster (Ubuntu OS version)


CommLab PC Cluster (Ubuntu OS version)
PC Cluster Manager: Sandy Ardianto (sandyardianto@gmail.com)

Outline
- Architecture
- Specification
- How to use: Registration, Connect using PuTTY, Upload & Download files
- Sending jobs to slaves: Torque PBS, Torque features, PBS file example, PBS commands, Python example, Matlab example
- References

Architecture
- Master: public IP 140.113.211.20, local IP 192.168.1.2
- Slave01: 192.168.1.3
- Slave02: 192.168.1.4
- …
- Slave16: 192.168.1.18
- NAS (Network Attached Storage): 192.168.1.35

Specification (Master / Slave01-14 / Slave15-16)
- CPUs: i7-2600 @3.4GHz (8 cores); Xeon E5620 @2.4GHz (16 cores)
- Memory: 16GB; 4GB
- The /home folder of the master and all slaves is synchronized through the NAS.
- All machines have been migrated from CentOS 5.6 to Ubuntu 14.04.

How to Use the CommLab PC Cluster

Registration
Contact the cluster manager (sandyardianto@gmail.com) with:
- <name>
- <username> (e.g., sardianto for Sandy Ardianto)
- <password>
- <advisor>
- <e-mail>

Connect using SSH
PuTTY: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
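PuTTY is the Windows client; from a Linux or macOS terminal a plain ssh client works as well. A minimal sketch, assuming the master's public IP from the Architecture slide, the default SSH port, and <username> as a placeholder for your registered account:

    ssh <username>@140.113.211.20    # log in to the master node with your cluster username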

Upload & Download Files
FileZilla: https://filezilla-project.org/download.php?type=client
Default location: /home/<username>
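Files can also be transferred from the command line with scp. A minimal sketch, assuming the same master IP; the file names here are placeholders:

    # upload a local file into your home directory on the cluster
    scp myscript.py <username>@140.113.211.20:/home/<username>/
    # download a log file from the cluster to the current local directory
    scp <username>@140.113.211.20:/home/<username>/output.log .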

Sending Jobs to Slaves

Torque PBS (Portable Batch System)
TORQUE (Terascale Open-source Resource and QUEue Manager) is a distributed resource manager providing control over batch jobs and distributed compute nodes.

Torque Features (1/2)
Fault tolerance
- Additional failure conditions checked/handled
- Node health check script support
Scheduling interface
- Extended query interface providing the scheduler with additional and more accurate information
- Extended control interface allowing the scheduler increased control over job behavior and attributes
- Allows the collection of statistics for completed jobs
https://en.wikipedia.org/wiki/TORQUE

Torque Features (2/2)
Scalability
- Significantly improved server-to-MOM communication model
- Ability to handle larger clusters (over 15 TF / 2,500 processors)
- Ability to handle larger jobs (over 2,000 processors)
- Ability to support larger server messages
Usability
- Extensive logging additions
- More human-readable logging (i.e., no more 'error 15038 on command 42')
https://en.wikipedia.org/wiki/TORQUE

PBS File Example
Key fields in the PBS file:
- Job name
- Error output file
- Output file (the strings printed on the terminal)
- Queue name (batch, batch1-batch16)
- nodes=slaveXX (XX=01-16): assign a specific slave
- ppn: processors per node (compute unit)
Some useful variables:
- $PBS_JOBID: the job identifier
- $PBS_JOBNAME: the job name
- $PBS_O_WORKDIR: the absolute path from which the qsub command was run
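The slide annotates these fields on an image of the script itself, which is not reproduced in the transcript. A minimal sketch of a PBS file along those lines, with illustrative job name, queue, resource request, and log file names:

    #!/bin/bash
    #PBS -N job_name               # job name
    #PBS -q batch                  # queue name (batch, batch1-batch16)
    #PBS -l nodes=slave01:ppn=4    # assign a specific slave; ppn = processors per node
    #PBS -e job_name.err           # error output file
    #PBS -o job_name.log           # output file (strings printed on the terminal)
    cd $PBS_O_WORKDIR              # move to the directory from which qsub was run
    echo "running $PBS_JOBNAME with ID $PBS_JOBID"   # the useful variables in action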

Check which computers are available to use
Open http://140.113.211.20/ganglia in a browser.

PBS Commands
- Send a job: qsub <filename.sh>
- Show job status: qstat
- Run a job: qrun <job ID>
- Stop a job: qdel <job ID>
Job status codes: Q = queued, R = running, E = exiting, C = completed

Python - Hello World Example
Files available at http://140.113.211.20: pbs.sh, hello.py
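The two files are only linked on the slide, so their exact contents are an assumption: presumably hello.py simply prints a greeting, and pbs.sh submits it to the queue. A minimal sketch of such a pbs.sh, with the job name chosen to match the 3.master-job_name.log file read on the next slide:

    #!/bin/bash
    #PBS -N job_name       # job name; the logs on later slides follow a <jobID>.master-job_name.log pattern
    #PBS -q batch          # default queue
    cd $PBS_O_WORKDIR      # run from the directory where qsub was invoked
    python hello.py        # execute the hello-world script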

Running Hello World
- qsub pbs.sh (send the job to the master)
- qrun <job ID> (run the job)
- qstat (check the job status)
- cat 3.master-job_name.log (view the job output)

Matlab Example
Files available at http://140.113.211.20: pbs_matlab.sh, mtest.m
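Again the files are only linked, not shown; a sketch of how pbs_matlab.sh could run mtest.m in batch mode. The script contents are an assumption; -nodisplay, -nosplash, and -r are standard Matlab command-line options for running without a GUI:

    #!/bin/bash
    #PBS -N job_name                               # job name, as in the Python example
    #PBS -q batch                                  # default queue
    cd $PBS_O_WORKDIR                              # run from the submission directory
    matlab -nodisplay -nosplash -r "mtest; exit"   # run mtest.m headlessly, then quit Matlab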

Running the Matlab Example (1/2)
- qsub pbs_matlab.sh (send the job to the master)
- qrun <job ID> (run the job)
- qstat (check the job status)

Running the Matlab Example (2/2)
- head -20 24.master-job_name.log (show the first 20 lines of the log)

Any problems or questions? Contact me! sandyardianto@gmail.com