Job Submission on WestGrid, February 15, 2005, on Access Grid.

Presentation transcript:

Job Submission on WestGrid, February 15, 2005, on Access Grid

Introduction  Simon Sharpe, one member of the WestGrid support team  The best way to contact us is to  This seminar tells you;  How to run, monitor, or cancel your jobs  How to select the best site for your job  How to adapt your job submission for different sites  How to get your jobs running as quickly as possible  Feel free to interrupt if you have questions  Simon Sharpe, one member of the WestGrid support team  The best way to contact us is to  This seminar tells you;  How to run, monitor, or cancel your jobs  How to select the best site for your job  How to adapt your job submission for different sites  How to get your jobs running as quickly as possible  Feel free to interrupt if you have questions

Getting into the Queue  HPC Resources are valuable research tools  A batch queuing system is needed to  Match jobs to resources  Deliver maximum bang for the research buck  Distribute jobs and collect output across parallel CPUs  Ensure a fair sharing of resources  HPC Resources are valuable research tools  A batch queuing system is needed to  Match jobs to resources  Deliver maximum bang for the research buck  Distribute jobs and collect output across parallel CPUs  Ensure a fair sharing of resources

Getting into the Queue  WestGrid compute sites use TORQUE/Moab  Based on PBS (Portable Batch System)  You need just a few commands common to WestGrid machines  There are important differences in job submission among sites you need to know about  With the diversity of WestGrid, it is possible that there is more than one machine suitable for your job  WestGrid compute sites use TORQUE/Moab  Based on PBS (Portable Batch System)  You need just a few commands common to WestGrid machines  There are important differences in job submission among sites you need to know about  With the diversity of WestGrid, it is possible that there is more than one machine suitable for your job

A Simple Sample  The script file serialhello.pbs tells TORQUE how to run the C program serialhello  This example show how to run a serial job on Glacier, which is a good choice for serial jobs  The qsub command tells TORQUE to run the job described in the script file serialhello.pbs  This example show how to run a serial job on Glacier, which is a good choice for serial jobs  The qsub command tells TORQUE to run the job described in the script file serialhello.pbs  When your job completes, TORQUE creates two new files in the current directory capturing;  error out from the job  standard out  When your job completes, TORQUE creates two new files in the current directory capturing;  error out from the job  standard out

End of Seminar  Thanks for coming  I wish it was that easy  Thanks for coming  I wish it was that easy

HPC: One Size Does Not Fit All  When the only tool you have is a hammer, every job looks like a nail  Things that affect system selection;  System dictated by executable or licensing  MPI or OpenMP  Availability: How busy is the system?  Amount of RAM required  Speed or number of processors  When the only tool you have is a hammer, every job looks like a nail  Things that affect system selection;  System dictated by executable or licensing  MPI or OpenMP  Availability: How busy is the system?  Amount of RAM required  Speed or number of processors

HPC: One Size Does Not Fit All  Things that affect system selection (continued);  Scalability of your application  Inter-processor communication requirements  Queue limits (walltime, number of CPUs)  Inertia: It is where we’ve always run it  Things that affect system selection (continued);  Scalability of your application  Inter-processor communication requirements  Queue limits (walltime, number of CPUs)  Inertia: It is where we’ve always run it

Uses of WestGrid Machines

Machine                   Use                             Interconnect                 CPUs
Glacier (IBM Xeon)        Serial, moderate parallel MPI   GigE, shared in node         1680; dual CPUs/node
Matrix (HP XC, Opteron)   MPI parallel                    Infiniband, shared in node   256; dual CPUs/node
Lattice (HP SC Alpha)     Moderate MPI parallel, serial   Quadrics, shared in node     144, 68 (G03); quad CPUs/node
Cortex (IBM Power5)       OpenMP, MPI parallel            Shared memory                64, 64, 4
Nexus (SGI Origin MIPS)   OpenMP, MPI parallel            Shared memory                256, 64, 64, 36, 32, 32, 8
Robson (IBM Power5)       Serial, moderate MPI parallel   GigE, shared in node         56; dual CPUs/node

TORQUE and Moab Commands

qsub script    Submit this job to the queue. Common options include:
                 -l mem=1GB
                 -l nodes=4:ppn=2 (or, on Nexus, -l ncpus=4)
                 -l walltime=06:00:00
                 -q queue-name
                 -m and -M for notifications
showq          Show the jobs in the queue
qstat jobid    Show the status of the job in the queue. Common options include -a and -an
qdel jobid     Delete this job from the queue
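
A brief example of how these commands fit together, using options from the table above; the job ID and script name are invented for illustration:

  qsub -l nodes=4:ppn=2 -l walltime=06:00:00 mpihello.pbs   # submit; prints a job ID such as 12345.headnode
  showq                                                     # list jobs in the queue
  qstat -a 12345                                            # check the status of job 12345
  qdel 12345                                                # remove job 12345 from the queue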

Sample MPI Job on Glacier
Parallel jobs have differing degrees of parallelism.
Glacier, which has a slower interconnect than other WestGrid machines, may not turn out to be the best place for your parallel job.
Latency: like the time it takes to dial and say "hello".
Bandwidth: how fast can you talk?
If your parallel job does not require intensive communication between processes, it may be worth testing on Glacier.
More information on Glacier submissions is on the WestGrid support pages.

MPI Submission on Glacier  We need to tell TORQUE how many processors we need  This asks for 2 nodes and 2 processors per node (4 CPUs)  Similar script to last time, but now calling program parallelized with MPI  Adding the walltime estimate helps TORQUE schedule the job  Note that we can pass directives;  on the command line or  in the script  Similar script to last time, but now calling program parallelized with MPI  Adding the walltime estimate helps TORQUE schedule the job  Note that we can pass directives;  on the command line or  in the script  This time we wait in the queue

Sample MPI Job on Matrix
Matrix is an HP XC cluster using AMD Opterons and an Infiniband interconnect.
64-bit Linux.
Not intended for serial work; a good home for parallel jobs.
More information on Matrix submissions is on the WestGrid support pages.

Running MPI Jobs on Matrix
For Matrix, use nodes and processors per node (ppn) to tell TORQUE how many CPUs your job needs.
Matrix machines have 2 CPUs per node.
A minimal TORQUE script to run a parallel MPI job on Matrix is sketched below.
Standard and error output are dropped into the directory we submitted from.
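
The script itself appeared only as a screenshot; a minimal sketch consistent with the slide's description (two dual-CPU nodes), with the walltime and launcher assumed:

  #!/bin/bash
  # matrixhello.pbs - illustrative minimal MPI script for Matrix (2 CPUs/node)
  #PBS -l nodes=2:ppn=2        # two dual-CPU nodes = 4 MPI processes
  #PBS -l walltime=01:00:00    # assumed walltime
  cd $PBS_O_WORKDIR
  mpirun -np 4 ./mpihello      # launcher name and flags depended on the local MPI stack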

Sample MPI Job on Lattice
Lattice is an HP Alpha cluster connected with Quadrics.
64-bit Tru64.
Intended for parallel work.
Four-processor shared-memory nodes; the Quadrics interconnect is used for more than 4 processors.
MPI communicates through the interconnect or through shared memory, as appropriate.
Also being used for some serial work.
More information on Lattice submissions is on the WestGrid support pages.

Running MPI Jobs on Lattice
For Lattice, use nodes and processors per node to set the number of processors. Lattice has 4 processors on each node.
In this case we ask for 2 CPUs on one box and 2 on another.
A minimal TORQUE script to run a parallel MPI job on Lattice is sketched below.
Standard and error output are dropped into the directory we submitted from.
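
Again the slide's script was an image; a sketch matching its description (2 CPUs on each of two nodes), with the walltime and the launcher command assumptions:

  #!/bin/bash
  # latticehello.pbs - illustrative MPI script for Lattice (4 CPUs/node)
  #PBS -l nodes=2:ppn=2        # 2 CPUs on one node plus 2 on another, as described
  #PBS -l walltime=01:00:00    # assumed walltime
  cd $PBS_O_WORKDIR
  prun -n 4 ./mpihello         # Quadrics/Tru64 systems commonly used prun; mpirun may also apply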

Sample Serial Job on Lattice  Lattice has a high-speed Quadrics interconnect  If your job is serial, it does not take advantage of the Quadrics interconnect  Glacier may be an alternative  Having said that, many serial jobs are run on Lattice  Lattice has a high-speed Quadrics interconnect  If your job is serial, it does not take advantage of the Quadrics interconnect  Glacier may be an alternative  Having said that, many serial jobs are run on Lattice

Running Serial Jobs on Lattice
On Lattice, we tell TORQUE to run the job described in the script file serialhello.pbs.
A minimal TORQUE script to run a serial job on Lattice is sketched below.
Standard and error output are dropped into the directory we submitted from.
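
This is essentially the same shape as the Glacier serial sketch earlier; a minimal version with assumed values:

  #!/bin/bash
  # serialhello.pbs - illustrative serial script for Lattice
  #PBS -l walltime=01:00:00    # assumed walltime
  cd $PBS_O_WORKDIR
  ./serialhello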

Sample Parallel Job on Cortex
Cortex is a machine with IBM Power5 SMP processors, running AIX.
Not for serial work.
A good home for large parallel applications needing shared memory and/or a fast interconnect.
Good for large-memory jobs.
More information on Cortex submissions is on the WestGrid support pages.

Running Parallel Jobs on Cortex
On Cortex, we tell TORQUE to run the job described in the script file mpihello.pbs.
The script describes how we want Cortex to run the parallel program mpihello; a rough sketch follows below.
The standard output file is dropped into our working directory.
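
The Cortex script was likewise shown only as an image; a rough sketch of the shape it might take, with the CPU-request syntax, CPU count, walltime, and launcher all assumptions:

  #!/bin/bash
  # mpihello.pbs - illustrative parallel script for Cortex (syntax and values assumed)
  #PBS -l ncpus=8              # assumed shared-memory-style CPU request
  #PBS -l walltime=01:00:00    # assumed walltime
  cd $PBS_O_WORKDIR
  mpiexec -n 8 ./mpihello      # launcher depended on the MPI installation on AIX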

Sample Parallel Job on Nexus  Nexus is a collection of SGI SMP machines  Several sizes serviced by different queues.  Test on smaller machines, heavy lifting on large ones  A good home for parallel jobs with intense communication requirements and/or large memory needs  More information at;  Nexus is a collection of SGI SMP machines  Several sizes serviced by different queues.  Test on smaller machines, heavy lifting on large ones  A good home for parallel jobs with intense communication requirements and/or large memory needs  More information at;

Running OpenMP Jobs on Nexus
For Nexus, match ncpus with OMP_NUM_THREADS (see the sketch after this slide).
In this case we ask for 8 CPUs on the Helios machine (8-32 CPUs).
You can try trivial OpenMP jobs from the command line; this job ran interactively on the head node. You should not use more than 2 processors for interactive jobs.
To run jobs requiring real processing, you must submit them to TORQUE.
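
A sketch of an OpenMP submission matching the slide's description; the queue name for the Helios machine, the walltime, and the executable name are assumptions:

  #!/bin/bash
  # omphello.pbs - illustrative OpenMP script for Nexus
  #PBS -l ncpus=8              # Nexus takes ncpus rather than nodes/ppn
  #PBS -l walltime=01:00:00    # assumed walltime
  #PBS -q helios               # assumed queue name for the Helios machine
  cd $PBS_O_WORKDIR
  export OMP_NUM_THREADS=8     # match the thread count to the ncpus request
  ./omphello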

Sample Serial Job on Robson  Robson is a new 56 processor Power5 system  64-bit Linux  Good for serial work, may be suitable for some parallel processing.  Message passing through MPI  More info at;  Robson is a new 56 processor Power5 system  64-bit Linux  Good for serial work, may be suitable for some parallel processing.  Message passing through MPI  More info at;

Running Serial Jobs on Robson
This is a minimal serial job submission script for Robson; it runs the executable "hello" (a sketch follows below).
A more elaborate script example is available on the WestGrid support pages.
Robson also runs MPI parallel jobs, as described on the same pages.
TORQUE drops the error output (zero-length in this case) and the standard output into the directory we submitted from.
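
A sketch of what that minimal script might contain; only the executable name "hello" comes from the slide, the rest is assumed:

  #!/bin/bash
  # hello.pbs - illustrative minimal serial script for Robson
  #PBS -l walltime=00:10:00    # assumed short walltime
  cd $PBS_O_WORKDIR
  ./hello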

Shortening HPC Cycle  Try your jobs at different sites  Test your process on small jobs  Give realistic walltimes, memory requirements  Apply for a larger Resource Allocation   Try your jobs at different sites  Test your process on small jobs  Give realistic walltimes, memory requirements  Apply for a larger Resource Allocation 

Summary  HPC jobs have differing requirements  WestGrid provides an increasing variety of tools  Use the system that is best for your job  Start off simple and small  Find out how well your job scales  Getting help  Because of implementation differences, “man qsub” might not be your best source of help  Support pages as listed throughout this presentation   HPC jobs have differing requirements  WestGrid provides an increasing variety of tools  Use the system that is best for your job  Start off simple and small  Find out how well your job scales  Getting help  Because of implementation differences, “man qsub” might not be your best source of help  Support pages as listed throughout this presentation 