
Batch Queuing Systems

The Portable Batch System (PBS) and the Load Sharing Facility (LSF) queuing systems share much common functionality in running batch jobs. However, they differ in their implementation of the batch environment and in their user commands. Table 1 below provides a comparative list of command options to help users migrating from LSF (used on halem) to PBS (used on palm and discover).

Option                          LSF (halem)                             PBS (palm and discover)
Resource directive sentinel     #BSUB                                   #PBS
Number of nodes/processors      -n (nodes)                              palm: -l ncpus= (processors); discover: -l select= (nodes)
Wall clock limit                -W hh:mm                                -l walltime=hh:mm:ss
Queue                           -q                                      -q
Notification                    -B (mail when job begins)               -m b (mail when job begins)
                                -N (job report when finished)           -m e (mail when job ends)
Email address                   -u                                      -M
Initial directory               (default = job submission directory)    (default = $HOME)
Job name                        -J                                      -N
STDOUT                          -o                                      -o
STDERR                          -e                                      -e
STDERR & STDOUT to same file    use -o without -e                       -j oe (both to STDOUT); -j eo (both to STDERR)
Project to charge               -P                                      -W group_list=

Table 1: Syntax for frequently used options
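As a quick illustration of Table 1, the directive headers below request comparable resources on each system; the queue names, the project k1234, and the four-nodes-of-four-CPUs layout are taken from the example scripts later on this page:

    ## LSF on halem: 4 nodes, 6-hour wall clock limit, queue special_b, project k1234
    #BSUB -n 4
    #BSUB -W 6:00
    #BSUB -q special_b
    #BSUB -P k1234

    ## PBS on discover: 4 nodes x 4 CPUs per node, 6-hour wall clock limit, queue general, project k1234
    #PBS -l select=4:ncpus=4
    #PBS -l walltime=6:00:00
    #PBS -q general
    #PBS -W group_list=k1234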

Batch Job Management

The following table compares commonly used LSF and PBS commands for controlling and monitoring jobs.

Task           LSF (halem)     PBS (palm or discover)
Submission     bsub            qsub
Deletion       bkill           qdel
Status         bjobs           qstat
Queue list     bqueues -l      qstat -Q
GUI monitor    (n/a)           xpbsmon

Table 2: Frequently used job management commands (see the man page of each command for more information)
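For example, a short PBS session on palm or discover might look like the following; the script name and job ID are hypothetical:

    % qsub myjob.csh              <--- submit the job script; PBS prints a job ID such as 12345.palm
    12345.palm
    % qstat -u $USER              <--- list your queued and running jobs
    % qstat -Q                    <--- show the available queues
    % qdel 12345                  <--- delete the job if it is no longer needed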

Environment Variables

Both LSF and PBS provide special environment variables that simplify scripting and configuration of batch jobs.

Purpose                   LSF (halem)      PBS (palm or discover)
Processor list            $LSB_HOSTS       cat $PBS_NODEFILE
Directory of submission   $LS_SUBCWD       $PBS_O_WORKDIR
Job ID                    $LSB_JOBID       $PBS_JOBID

Table 3: Useful environment variables
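For instance, a PBS job script fragment (csh, as in the examples below) can use these variables to return to the submission directory and size an MPI run from the node file; the executable name is just a placeholder:

    # change to the directory from which the job was submitted
    cd $PBS_O_WORKDIR
    # count the processors assigned to this job
    set nprocs = `wc -l < $PBS_NODEFILE`
    echo "Job $PBS_JOBID will run on $nprocs processors"
    mpirun -np $nprocs ./mpihello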

Example Batch Scripts

The following simple LSF and PBS submission scripts compare how the two batch systems request comparable resources and run the same parallel executable.

LSF example:

    #!/bin/csh
    #BSUB -n 4
    #BSUB -W 6:00
    #BSUB -q special_b
    #BSUB -J myJobName
    #BSUB -o out.o%J
    #BSUB -u
    #BSUB -P k1234

    echo "Master Host: `hostname`"
    echo "Node List: $LSB_HOSTS"
    cd $LS_SUBCWD
    prun -n 16 ./mpihello

To submit the job, type: bsub < script_name

PBS example:

    #!/bin/csh
    #PBS -l select=4:ncpus=4      <--- on discover, or ...
    #PBS -l ncpus=16              <--- on palm
    #PBS -l walltime=6:00:00
    #PBS -q general
    #PBS -N myJobName
    #PBS -j oe
    #PBS -m e
    #PBS -M
    #PBS -W group_list=k1234

    echo "Master Host: $PBS_O_HOST"
    echo "Nodes:"; cat -n $PBS_NODEFILE
    cd $PBS_O_WORKDIR
    mpirun -np 16 ./mpihello

To submit the job, type: qsub script_name
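Once the PBS job finishes, its STDOUT (and, because of -j oe, its STDERR) appears in the submission directory in a file named after the job; the job ID 12345 below is hypothetical:

    % qsub script_name
    12345.palm
    % qstat 12345                 <--- monitor the job while it is queued or running
    % cat myJobName.o12345        <--- combined STDOUT/STDERR after the job completes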

Interactive Batch

Both queuing systems can start an interactive batch session, commonly used for debugging, by using the -Is (LSF) or -I (PBS) option. The other options are the same as shown previously, but are entered on a single command line. The commands for the two queuing systems are compared below.

LSF example (halem):

    % bsub -Is -Pk1234 -qspecial_b -W6:00 -n4 /usr/dlocal/bin/tcsh

When the requested processors are available, the interactive prompt appears:

    bsub> cd $LS_SUBCWD
    bsub> prun -n 16 ./mpihello
    bsub> exit

PBS example (discover or palm):

On discover:

    % qsub -I -W group_list=k1234 -q general -l walltime=06:00:00,select=4:ncpus=4

or on palm:

    % qsub -I -W group_list=k1234 -q general -l walltime=06:00:00,ncpus=16

When the requested processors are available, the interactive prompt appears:

    % cd $PBS_O_WORKDIR
    % mpirun -np 16 ./mpihello
    % exit