INTRODUCTION TO VIPBG LINUX CLUSTER
By Helen Wang, October 12, 2016

What is HPC
High-Performance Computing (HPC) clusters are characterized by many cores and processors, large amounts of memory, high-speed networking, and large data stores, all shared across many rack-mounted servers. An HPC cluster is made of many separate servers, called nodes, mounted in racks. User programs that run on a cluster are called jobs; they are typically managed through a queueing system for optimal utilization of all available resources. HPC typically involves simulation of numerical models or analysis of data from scientific instrumentation. At the core of HPC is manageable hardware and systems software, either developed by systems programmers or provided by software vendors, which allows researchers to devote their energy to their own code.

Basic Beowulf Cluster Structure

A brief look at our cluster

What our cluster can do
Intensive computation: with a total of 600+ cores and large amounts of memory available, the Linux cluster system is configured so that multiple job types run efficiently.
Parallel computation:
- Linear processing: each section of the computation needs little communication between nodes (for example, image processing), so adding more nodes gives near-linear speedup.
- Explicit parallel computation: programs that have explicitly coded parallel sections that can run independently, such as OpenMx.
- Fully parallelizable programs: programs that contain one or more interdependent serial or atomic components that must be synchronized across all instances of the parallel application. This synchronization is usually done through libraries such as the R package snow, or MPI.

Why we use the cluster, and what to expect
Why use it:
- Use it when you have lots of independent programs to run to finish a project – think in parallel.
- Use it when your program takes a very long time to finish – give your laptop a break.
- Use it when your program uses a lot of memory – to avoid crashing your PC.
What to expect:
- A single command line is not going to run super fast, even with hundreds of CPU cores.
- It is not set up for one single user, even if it looks like one supercomputer to you.
- You are sharing resources with lots of hardcore users.

What is our cluster configuration
The cluster is a newly updated Linux Beowulf cluster, installed with applications supporting computationally intensive processing. The following description can be used for the "Computing Resources" section of grant applications.
- 32 Dell PE R630/R620 servers running CentOS/Red Hat 6.7 64-bit Linux
- 600+ cores using Intel Xeon 56XX processors (2.67 GHz to 3.4 GHz)
- 4 TB total RAM (80 GB-128 GB per node)
- 450 TB network-attached storage with 500 TB backup storage
- 10 TB internal disk storage (120 GB-900 GB per node)
- 40 Gb InfiniBand network connections to all nodes and storage
- Fail-over redundant master servers

Software available on the cluster
- R 3.2.3 with CRAN and Bioconductor packages
- C/C++ (gcc/g++) and Fortran compilers
- Java compiler
- Perl
- Python 3.4 with Biopython
- SAS 9.4 for Linux (64-bit)
- PLINK
- MPI and OpenMPI
- PBS Pro 12.2 (the portable batch system for the cluster)
- Additional open-source software upon user request

Who can use it, and when?
The cluster supports department faculty, research staff, and students working on intensive computation in both serial and parallel processing. All users can access the cluster through the VCU VPN for a secure connection. You can use it anytime and anywhere, most likely when you start your graduate thesis. Please plan to start your computational work on the cluster at least 6-8 months in advance, since the system can be busy and you may not get sufficient resources in the time you estimated. Come see me to prepare your code and parallel processing.

VIPBG Cluster Login Info
Server name: light.vipbg.vcu.edu (IP: 128.172.6.233)
2nd server as failover: group.vipbg.vcu.edu (IP: 128.172.6.234, invisible on mission)
Software recommended to access the servers:
PC users: 1. MobaXterm (http://mobaxterm.mobatek.net/) 2. ssh / OpenSSH / PuTTY / WinSCP
Mac users: Mac Terminal, Fetch

Access to the cluster server and nodes
Master node (master1 / master2): light.vipbg.vcu.edu
- Runs CentOS (Red Hat kernel) version 6.7, x86-64. When downloading open-source or other software, choose 64-bit CentOS or RHEL 6 builds if possible.
- Purpose: front-end user interface; slow; not for running any jobs. Jobs running on the master will be terminated without notice.
- Accessible from outside through the VPN.
Slave nodes (compute nodes): node1-node29
- Dual quad-core or 6-core Intel Xeon processors with 128-196 GB RAM.
- For computation; not intended for interactive user login. Accessible via the master and managed by the Portable Batch System (PBS); fast.
- Internal network (10.0.0.X), not accessible directly from outside.

Access "LIGHT" from your computer
How to use MobaXterm or ssh to access the server:
- Outside of VCU, please use the VCU VPN client through ramsvpn.vcu.edu (http://www.ts.vcu.edu/software-center/security/vpn/).
- Open a new session, then SSH, to add the server name.
- Open "SSH settings" to fill in the information: remote hostname: 128.172.6.233, username: YOUR_ACCOUNT_NAME, port number: 22.
- Open the session settings to give the session a name.
- Use ssh -X USERACCT@SERVER_IP for graphical access.
- Select the server to test the connection and exchange keys by giving your password.
- Create a profile or bookmark for easy access every time.
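From a Mac or Linux terminal (or the MobaXterm local terminal), a typical connection looks like the sketch below; substitute your own account name, and add -X only if you need graphical applications:
ssh -X YOUR_ACCOUNT_NAME@light.vipbg.vcu.edu
# or, equivalently, by IP address:
ssh -X YOUR_ACCOUNT_NAME@128.172.6.233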

UNIX Commands You Need to Know
Basic commands: pwd, clear, mkdir, cp, cd, mv, ls, head, more, less, wc, man, rm, chmod, grep (with head and tail), nano, sed, cut, top
Copying files between machines:
scp acct1@server1:/yourpath/yourfiles acct2@server2:/yourpath/
Checking CPUs on a node:
nproc --all
lscpu
cat /proc/cpuinfo
echo Cores = $(( $(lscpu | awk '/Socket/{ print $2 }') * $(lscpu | awk '/Core/{ print $4 }') ))
Checking memory on a node:
vmstat -s | grep memory
cat /proc/meminfo | grep MemTotal | awk '{ print $2 }'
awk '/MemTotal/ {print $2}' /proc/meminfo
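As a concrete illustration of scp (the account name and file path below are made up), this copies a results file from the cluster to the current directory on your own machine:
# run from your local machine, not from the cluster
scp YOUR_ACCOUNT_NAME@light.vipbg.vcu.edu:/home/YOUR_ACCOUNT_NAME/results.txt .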

How to use the cluster to submit jobs
IMPORTANT: The logon machine (LIGHT.VIPBG.VCU.EDU) is only used for login. Jobs running on the logon machine will be terminated without notice.
What do you need to submit a job via PBS?
- An executable script. The script can be program code (R, SAS, or another language), a shell script, or a collection of command lines.
- Test how much resources and time you may need before you submit multiple jobs.
- Know which queue to use to submit the job.
Node and queue configuration
- Nodes: nodes are the physical computer servers incorporated together to make the cluster.
- Queues: queues are used by the PBS scheduler to send jobs to different nodes; each node is assigned to a queue to handle a different type of jobs. Most queue limits can be checked by running the command qstat -q.
Note: if you need more job permissions, please send a request to the system admin and your supervisor to get a temporary expansion of your job submission limit.
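A minimal sketch of the overall workflow, assuming a job script named myjob.pbs (a hypothetical name) already sits in the current directory:
qstat -q            # check the queues and their limits
qsub myjob.pbs      # submit the script; PBS prints the job ID
qstat -u $USER      # monitor your own jobs
qdel JOBID          # delete a job by its ID if something went wrong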

Nodes and queues configuration

Queue name | Nodes assigned     | Job limit (proc/user) | Comments
Express    | node4              | 2                     | Run small urgent jobs
serial     | nodes 13-22        | 20                    | Run R and generic jobs, RAM < 2 GB
Openmx     | nodes 9-12, 23-25  | 15                    | OpenMx and parallel jobs
workq      | nodes 7-8, 26-28   |                       | Run large-memory jobs, RAM < 30 GB
Mxq        | node5              | 10                    | Run traditional Mx jobs

Submitting a Job
Jobs are submitted to a PBS queue so that PBS can dispatch them to run on one or more of the cluster's compute nodes. There are two main types of PBS jobs:
- Non-interactive batch jobs: This is the most common type of PBS job. A job script is created that contains PBS resource requests and the commands necessary to execute the job. The job script is then submitted to PBS to be run non-interactively.
- Interactive batch jobs: This is a way to get an interactive terminal on one or more of the compute nodes of the cluster. Commands can then be run interactively through that terminal directly on the compute nodes for the duration of the job. Interactive jobs are helpful for things like program debugging and running many short jobs.

PBS Directives
A PBS script is a standard Unix/Linux shell script that contains a few extra comments at the beginning that specify directives to PBS. These comments all begin with #PBS. The most important PBS directives are:
#PBS -l walltime=HH:MM:SS
  Specifies the maximum walltime (real time, not CPU time) that a job should take. If this limit is exceeded, PBS will stop the job. Keeping this limit close to the actual expected time of a job can allow the job to start more quickly than if the maximum walltime is always requested.
#PBS -l pmem=SIZEgb
  Specifies the maximum amount of physical memory used by any process in the job. For example, if the job would run four processes and each would use up to 2 GB (gigabytes) of memory, then the directive would read #PBS -l pmem=2gb.
#PBS -l nodes=N:ppn=M
  Specifies the number of nodes (nodes=N) and the number of processors per node (ppn=M) that the job should use. PBS treats a processor core as a processor, so a system with eight cores per compute node can have ppn=8 as its maximum ppn request. Note that unless a job has some inherent parallelism of its own through something like MPI or OpenMPI, requesting more than a single processor on a single node is usually wasteful and can impact the job start time.
#PBS -q queuename
  Specifies which PBS queue the job should be submitted to. This is only necessary if a user has access to a special queue; this option can and should be omitted for jobs being submitted to the system's default queue.
#PBS -j oe
  Normally when a command runs it prints its output to the screen. This output is often normal output and error output. This directive tells PBS to put both normal output and error output into the same output file.
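As a minimal sketch, these directives sit together at the top of a job script; the specific values here (serial queue, one processor, 4-hour walltime, 2 GB of memory) are only illustrative assumptions:
#!/bin/bash
#PBS -q serial
#PBS -l nodes=1:ppn=1
#PBS -l walltime=04:00:00
#PBS -l pmem=2gb
#PBS -j oe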

An example of a PBS script

#This is a sample PBS script. It will request 1 processor on 1 node for 10 hours.
#
#Request 1 processor on 1 node
#PBS -l nodes=1:ppn=1
#
#Request 10 hours of walltime
#PBS -l walltime=10:00:00
#
#Request 1 gigabyte of memory per process
#PBS -l mem=1gb
#
#Request that regular output and terminal output go to the same file
#PBS -j oe
#
#The following is the body of the script. By default, PBS scripts execute in your home directory,
#not the directory from which they were submitted. The following line places you in the directory
#from which the job was submitted.
cd $PBS_O_WORKDIR
#
#Now we want to run the program "hello". "hello" is in the directory that this script is being
#submitted from, $PBS_O_WORKDIR.
echo " "
echo " "
echo "Job started on `hostname` at `date`"
./hello
echo " "
echo "Job Ended at `date`"
echo " "

Template used on the cluster
Modify the template to create your own PBS script for running programs:

#!/bin/bash
#PBS -q serial
#PBS -N MYSCRIPT
#PBS -V
#
# cd to the directory from which I submitted the job.
# Otherwise it will execute in my home directory.
WORKDIR=~/YOURWORKDIR
cd $WORKDIR
#echo "PBS batch job id is $PBS_JOBID"
echo "Working directory of this job is: " $WORKDIR
echo "Beginning to run job"
# Command line you need to execute the job (e.g. /home/huan/bin/calculate -PARAMETERS)
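A quick usage sketch, assuming the template above was saved as myscript.pbs (a hypothetical file name):
qsub myscript.pbs
# output later appears as MYSCRIPT.oJOBID (and MYSCRIPT.eJOBID for errors),
# since the job name was set with #PBS -N MYSCRIPT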

Job Submission
Syntax: qsub SCRIPTFILE
Existing job submission scripts (/usr/local/bin/q*):
- R users: qR YOUR_R_SCRIPT
- Large-memory R jobs: qRL YOUR_R_SCRIPT
- SAS users: qsas YOUR_SAS_CODE
- Generic or other resources: qsub YOUR_OWN_SCRIPT
Interactive Batch Jobs
Interactive PBS jobs are similar to non-interactive PBS jobs in that they are submitted to PBS via the command qsub. When submitting an interactive PBS job, a PBS script is not necessary; all PBS directives can be specified on the command line. The syntax for submitting an interactive PBS job is:
qsub -I ...pbs directives...
The -I flag tells qsub that this is an interactive job. The following example shows using qsub to submit an interactive job using one processor on one node for four hours:
merlot:~$ qsub -I -l nodes=1:ppn=1 -l walltime=4:00:00
qsub: waiting for job 1064159.merlot.bis.vcu.edu start
qsub: job 1064159.merlot.bis.vcu.edu ready
node12:~$
There are two things of note here. First, the qsub command doesn't exit when run with the interactive -I flag; instead, it waits until the job has started and gives a prompt on the first compute node assigned to the job. Second, the prompt node12:~$ shows that commands are now being executed on the compute node node12.

Monitoring and Managing Jobs
Check job status using qstat:
qstat - Shows the status of all PBS jobs. The time displayed is the CPU time used by the job.
qstat -s, qstat -a - Shows the status of all PBS jobs. The time displayed is the walltime used by the job.
qstat -u USERID - Shows the status of all PBS jobs submitted by the user USERID. The time displayed is the walltime used by the job.
qstat -n - Shows the status of all PBS jobs along with a list of the compute nodes that each job is running on.
qstat -f JOBID - Shows detailed information about the job JOBID.
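A short hedged example of checking on your jobs; the job ID is the one from the interactive example above:
qstat -u YOUR_ACCOUNT_NAME     # your jobs, with walltime used
qstat -n                       # all jobs plus the nodes they run on
qstat -f 1064159               # full details for a single job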

Job Running Status
Q - The job is queued and is waiting to start.
R - The job is currently running.
E - The job is currently ending.
H - The job has a user or system hold on it and will not be eligible to run until the hold is removed.

Managing jobs
Deleting jobs:
- qdel JOBID - delete a job by job ID
- qdel $(qselect -u USERNAME) - delete all jobs owned by USERNAME
View job output:
If the PBS directive #PBS -j oe is used in a PBS script, the non-error and the error output are both written to the JobName.oJobID file.
- JobName.oJobID: contains the non-error output that would normally be written to the screen.
- JobName.eJobID: contains the error output that would normally be written to the screen.
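For instance, assuming a job named MYSCRIPT (as in the template slide) finished with the hypothetical job ID 1064159, its output files could be inspected like this:
less MYSCRIPT.o1064159     # normal output (plus errors, if #PBS -j oe was set)
less MYSCRIPT.e1064159     # error output (only present when -j oe was not used)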

More ways to monitor a node
To check a node's configuration: pbsnodes NODE#
To check a node's status: nodestatus NODE#
Limitations on the name of the script:
- no more than 10 characters
- no spaces
- no special characters
Use a temporary name if necessary and change it back when the job is done.
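A minimal example, using node12 from the interactive-job output shown earlier (nodestatus is the cluster's own helper mentioned above):
pbsnodes node12       # show the configuration of node12
nodestatus node12     # show the current status of node12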

At Last
Editing files:
- Edit files using nano or vi:
  http://www.ts.vcu.edu/askit/research/unix-for-researchers-at-vcu/unix-text-editors/the-pico-editor/
  http://www.ts.vcu.edu/askit/research/unix-for-researchers-at-vcu/unix-text-editors/the-vi-text-editor/
- Or use a Samba connection to map a network drive on your PC; "EditPad Lite" is recommended.
Useful links:
- UNIX survival guide: http://www.ts.vcu.edu/askit/research/unix-for-researchers-at-vcu/unix-survival-guide-a-user-manual/
- Wiki page for the VIPBG cluster (needs VCU eID to log in): https://wiki.vcu.edu/display/vipbgit/VIPBG+Cluster+System