Requesting Resources on an HPC Facility (Using the Sun Grid Engine Job Scheduler) Michael Griffiths and Norbert Gyenge Corporate Information and Computing Services The University of Sheffield www.sheffield.ac.uk/cics/research
Review: Objectives
Understand what High Performance Computing is
Be able to access remote HPC systems by different methods
Run applications on a remote HPC system
Manage files using the Linux operating system
Know how to use the different kinds of file storage systems
Run applications using a scheduling system
Know how to get more resources and how to get resources dedicated to your research
Know how to enhance your research through shell scripting
Know how to get help and training
Outline
Using the Job Scheduler: Interactive Jobs
Batch Jobs
Task Arrays
Running Parallel Jobs
GPUs and Remote Visualisation
Beyond ShARC: Accessing Tier 2 Resources
1. Using the Job Scheduler
Interactive Jobs: http://www.sheffield.ac.uk/cics/research/hpc/using/interactive
Batch Jobs: http://www.sheffield.ac.uk/cics/research/hpc/using/batch
Running Jobs: A Note on Interactive Jobs
Software that requires intensive computing should be run on the worker nodes, not the head node.
Run compute-intensive interactive jobs on the worker nodes by using the qsh or qrsh command.
The maximum (and also the default) time limit for interactive jobs is 8 hours.
Sun Grid Engine
The ShARC login nodes are gateways to the cluster of worker nodes. Their main purpose is to provide access to the worker nodes, NOT to run CPU-intensive programs.
All CPU-intensive computations must be performed on the worker nodes. This is achieved by the qsh command for interactive jobs and the qsub command for batch jobs.
Once you log into ShARC, take advantage of the power of a worker node for interactive work simply by typing qsh and working in the new shell window that opens.
The next set of slides assumes that you are already working on one of the worker nodes (a qsh session).
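As a minimal sketch, an interactive session on a worker node can be requested with either command below; the time and memory values are illustrative, and the rmem unit syntax is an assumption:
  # Open an interactive shell on a worker node in a new window
  qsh
  # qrsh behaves similarly but stays in the current terminal;
  # here requesting 4 hours of run time and 8 GB of real memory (illustrative values)
  qrsh -l h_rt=04:00:00 -l rmem=8G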
Practice Session 1: Running Applications on ShARC (Problem 1)
Case Study: Analysis of Patient Inflammation Data
Running an R application: how to submit jobs and run R interactively
  List available and loaded modules
  Load the module for the R package
  Start the R application and plot the inflammation data
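A sketch of the interactive R workflow, assuming a module named apps/R (the exact module name and version on ShARC may differ):
  module avail          # list the modules available on the worker node
  module list           # list the modules currently loaded
  module load apps/R    # load the R package (assumed module name)
  R                     # start R, then read and plot the inflammation data interactively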
Managing Your Jobs: Sun Grid Engine Overview
SGE is the resource management, job scheduling and batch control system. (Others are available, such as PBS, Torque/Maui and Platform LSF.)
It starts up interactive jobs on available workers.
It schedules all batch-oriented (i.e. non-interactive) jobs.
It attempts to create a fair-share environment.
It optimizes resource utilization.
Scheduling ‘qsub’ batch jobs on the cluster: diagram showing the SGE master node dispatching submitted jobs to slots on the worker nodes via queues (Queue-A, Queue-B, Queue-C), governed by policies, priorities, share/tickets, resources and users/projects.
Managing Jobs: Monitoring and Controlling Your Jobs
http://www.shef.ac.uk/cics/research/hpc/using/batch
There are a number of commands for querying and modifying the status of a job that is running or waiting to run. These are:
  qstat (query job status): qstat -u username, qstat -u "*"
  qdel (delete a job): qdel jobid
  qmon (a GUI interface for SGE)
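A brief monitoring sketch; the job ID is illustrative:
  qstat              # summary of your own queued and running jobs
  qstat -j 123456    # full details of one job, including why it may still be waiting
  qdel 123456        # delete the job with this ID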
Demonstration 1: Running Jobs, Batch Job Example
Using the R package to analyse patient data.
qsub example:
  qsub -l h_rt=10:00:00 -o myoutputfile -j y myjob
OR alternatively, the first few lines of the submit script myjob contain:
  #!/bin/bash
  #$ -l h_rt=10:00:00
  #$ -o myoutputfile
  #$ -j y
and you simply type:
  qsub myjob
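For completeness, a full submit script for the R batch example might look like the following sketch; the module name (apps/R) and the script file name (inflammation.R) are assumptions, not the actual course files:
  #!/bin/bash
  #$ -l h_rt=10:00:00            # request 10 hours of run time
  #$ -o myoutputfile             # write standard output to this file
  #$ -j y                        # merge standard error into the output file

  module load apps/R             # assumed module name; check with module avail
  R CMD BATCH inflammation.R     # run the R script non-interactively
Submit it with qsub myjob as above.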
Practice Session: Submitting Jobs to ShARC (Problems 2 and 3)
Patient Inflammation Study: run the R example as a batch job.
Case Study: Fish population simulation.
Submitting jobs to Sun Grid Engine: instructions are in the readme file in the sge folder of the course examples.
From an interactive session:
  Load the compiler module
  Compile the fish program
  Run test1, test2 and test3
Managing Jobs: Reasons for Job Failures
http://www.shef.ac.uk/cics/research/hpc/using/iceberg/requirements
SGE cannot find the binary file specified in the job script.
You ran out of file storage. It is possible to exceed your filestore allocation limits during a job that produces large output files. Use the quota command to check this.
Required input files are missing from the startup directory.
An environment variable is not set correctly (LM_LICENSE_FILE etc.).
Hardware failure (e.g. MPI ch_p4 or ch_gm errors).
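A quick post-mortem sketch for a failed or misbehaving job; the job ID is illustrative, and qacct only reports jobs that have already finished:
  quota              # check whether your filestore allocation has been exceeded
  qstat -j 123456    # for a waiting or running job: scheduling and error messages
  qacct -j 123456    # for a finished job: exit status and resource usage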
Finding Out the Memory Requirements of a Job
Real memory limits:
The default real memory allocation is 2 GB.
Real memory can be requested by using the -l rmem= parameter.
Jobs exceeding the real memory allocation will not be deleted but will run with reduced efficiency, and the user will be emailed about the memory deficiency.
When you get warnings of that kind, increase the real memory allocation for your job by using the -l rmem= parameter.
Determining the virtual memory requirements of a job:
  qstat -f -j jobid | grep mem
The reported figures indicate:
  the currently used memory (vmem)
  the maximum memory needed since startup (maxvmem)
  cumulative memory_usage*seconds (mem)
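A sketch of requesting extra real memory in a submit script and checking usage from within the job; the 8G unit syntax and the program name are assumptions:
  #!/bin/bash
  #$ -l h_rt=04:00:00               # run time request
  #$ -l rmem=8G                     # request 8 GB of real memory (assumed unit syntax)

  ./my_program                      # hypothetical compute-intensive executable
  qstat -f -j $JOB_ID | grep mem    # report vmem, maxvmem and mem for this job before it exits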
Running GPU Parallel Jobs
GPU parallel processing is supported on 8 Nvidia Tesla K80 GPU units attached to ShARC.
You can submit jobs that use the GPU facilities by passing the following parameter to the qsub command:
  -l gpu=nn  where 1 <= nn <= 8 is the number of GPU modules to be used by the job.
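A GPU batch job sketch; the CUDA module name and the executable are assumptions:
  #!/bin/bash
  #$ -l h_rt=01:00:00      # run time request
  #$ -l gpu=1              # request one GPU module (1 <= nn <= 8)

  module load libs/CUDA    # assumed module name; check with module avail
  ./my_gpu_program         # hypothetical CUDA executable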
Managing Jobs: Running CPU-Parallel Jobs
Many-processor tasks use either shared memory or distributed memory.
The parallel environment needed for a job is specified with the -pe <env> nn parameter of the qsub command, where <env> is:
  smp : shared memory (OpenMP) jobs, which must run on a single node using its multiple processors.
  mpi : MPI jobs running on multiple hosts, using the MPI libraries.
Compilers that support MPI: PGI, Intel, GNU.
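A shared-memory (smp) sketch; the core count and program name are illustrative (an MPI sketch follows the next slide):
  #!/bin/bash
  #$ -pe smp 4                       # request 4 cores on a single node
  #$ -l h_rt=02:00:00

  export OMP_NUM_THREADS=$NSLOTS     # SGE sets $NSLOTS to the number of slots granted
  ./my_openmp_program                # hypothetical OpenMP executable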
Demonstration 3: Running a Parallel Job
Test 6 provides an opportunity to practice submitting parallel jobs to the scheduler.
To run testmpi6, compile the MPI example:
  Load the OpenMPI compiler module: module load mpi/intel/openmpi/1.8.3
  Compile the diffuse program: mpicc diffuse.c -o diffuse
  Submit the job: qsub testmpi6
Use qstat to monitor the job and examine the output.
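A reconstruction of what a submit script such as testmpi6 might contain (not the actual course file; the slot count is illustrative):
  #!/bin/bash
  #$ -pe mpi 4                           # request 4 MPI slots (illustrative)
  #$ -l h_rt=00:30:00

  module load mpi/intel/openmpi/1.8.3    # the OpenMPI module used in the demonstration
  mpirun -np $NSLOTS ./diffuse           # run the compiled diffuse program on the granted slots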
Managing Jobs: Running Arrays of Jobs
https://www.sheffield.ac.uk/cics/research/hpc/iceberg/runbatch/examples2
Many processors run a copy of a task independently.
Add the -t parameter to the qsub command or script file (with #$ at the beginning of the line).
Example: -t 1-10 creates 10 tasks from one job.
Each task has its environment variable $SGE_TASK_ID set to a single unique value ranging from 1 to 10.
There is no guarantee that task number m will start before task number n, where m < n.
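A task array sketch; the program and file names are hypothetical:
  #!/bin/bash
  #$ -t 1-10               # create 10 tasks from this one job
  #$ -l h_rt=01:00:00

  # Each task selects its own input and output files via $SGE_TASK_ID
  ./my_program input_${SGE_TASK_ID}.dat > output_${SGE_TASK_ID}.txt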
Practice Session: Submitting a Task Array to ShARC (Problem 4)
Case Study: Fish population simulation.
Submitting jobs to Sun Grid Engine: instructions are in the readme file in the sge folder of the course examples.
From an interactive session, run the SGE task array example: run test4 and test5.
9. Remote Visualisation
See Specialist High Speed Visualization Access: http://www.sheffield.ac.uk/cics/research/hpc/using/access/intro
Undertake visualisation using thin clients that access remote high-quality visualisation hardware.
Remote visualisation removes the need to transfer data and allows researchers to visualise data sets on remote visualisation servers attached to the high performance computer and its storage facility.
Client Access to Visualisation Cluster: diagram showing a VirtualGL client connecting to a VirtualGL server (NVIDIA GPU) on Iceberg / the Campus Compute Cloud.
Remote Visualisation Using SGD
Start a browser, go to https://myapps.shef.ac.uk and log in to Sun Global Desktop.
Under ShARC Applications, start an interactive session or a login node.
From the terminal, type the command qsh-vis.
This opens a shell with instructions to either:
  open a browser and enter the address http://sharc.shef.ac.uk:XXXX, or
  start TigerVNC Viewer on your desktop and use the address sharc.shef.ac.uk:XXXX
XXXX is a port address provided on the ShARC terminal.
When requested, use your usual ShARC user credentials.
Remote Desktop Through VNC
Remote Visualisation Using TigerVNC and the PuTTY SSH Client
Log in to ShARC using PuTTY.
At the prompt, type qsh-vis.
This opens a shell with instructions to either:
  open a browser and enter the address http://sharc.shef.ac.uk:XXXX, or
  start TigerVNC Viewer on your desktop and use the address sharc.shef.ac.uk:XXXX
XXXX is a port address provided on the ShARC terminal.
When requested, use your usual ShARC user credentials.
Beyond ShARC
ShARC is adequate for many compute problems.
Purchasing dedicated resource.
National Tier 2 facilities for more demanding compute problems.
Archer: a larger facility for grand challenge problems (peer review process to gain access).
https://www.sheffield.ac.uk/cics/research/hpc/costs
High Performance Computing Tiers
Tier 1 computing: Archer
Tier 2 computing: Peta-5, JADE
Tier 3 computing: ShARC
Purchasing Resource
https://www.sheffield.ac.uk/cics/research/hpc/costs
Buying nodes using the framework: research groups purchase HPC equipment against their research grant, and this hardware is integrated with the Iceberg cluster.
Buying a slice of time: research groups can purchase servers for a length of time specified by the research group (the cost is 1.0p per core per hour).
Servers are reserved for dedicated usage by the research group using a provided project name.
When reserved nodes are idle they become available to the general short queues; they are quickly released for use by the research group when required.
For information, e-mail research-it@sheffield.ac.uk
National HPC Services
Tier-2 facilities: http://www.hpc-uk.ac.uk/ and https://goo.gl/j7UvBa
Archer, the UK National Supercomputing Service: http://www.archer.ac.uk/
  Hardware: Cray XC30
  2632 standard nodes, each containing two Intel E5-2697 v2 12-core processors, therefore 2632 x 2 x 12 = 63,168 cores
  64 GB of memory per node; 376 high-memory nodes with 128 GB of memory
  Nodes connected to each other via the Aries low-latency interconnect
  Research Data File System: 7.8 PB of disk
EPCC HPC facilities: http://www.epcc.ed.ac.uk/facilities/national-facilities
  Training and expertise in parallel computing
Links for Software Downloads
MobaXterm: https://mobaxterm.mobatek.net/
PuTTY: http://www.chiark.greenend.org.uk/~sgtatham/putty/
WinSCP: http://winscp.net/eng/download.php
TigerVNC: http://sourceforge.net/projects/tigervnc/