Requesting Resources on an HPC Facility. Michael Griffiths and Deniz Savas, Corporate Information and Computing Services, The University of Sheffield, www.sheffield.ac.uk/wrgrid

Presentation transcript:

Requesting Resources on an HPC Facility (Using the Sun Grid Engine Job Scheduler). Michael Griffiths and Deniz Savas, Corporate Information and Computing Services, The University of Sheffield.

Outline
1. Using the Job Scheduler: Interactive Jobs
2. Batch Jobs
3. Task Arrays
4. Running Parallel Jobs
5. GPUs and Remote Visualisation
6. Beyond Iceberg: Accessing the N8 Tier 2 Facility

Using the Job Scheduler
- Interactive Jobs
- Batch Jobs

Running Jobs: A Note on Interactive Jobs
Software that requires intensive computing should be run on the worker nodes, not the head node. Run compute-intensive interactive jobs on the worker nodes using the qsh or qrsh command. The maximum (and also the default) time limit for interactive jobs is 8 hours.
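As a minimal sketch (the requested run time here is purely illustrative), an interactive session on a worker node can be started like this:

    # open an interactive shell with an X11 window on a worker node
    qsh -l h_rt=4:00:00

    # or a plain interactive shell without a separate window
    qrsh -l h_rt=4:00:00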

Sun Grid Engine
Two iceberg head nodes are gateways to the cluster of worker nodes. The head nodes' main purpose is to allow access to the worker nodes, NOT to run CPU-intensive programs. All CPU-intensive computations must be performed on the worker nodes. This is achieved by:
- the qsh command for interactive jobs, and
- the qsub command for batch jobs.
Once you log into iceberg, taking advantage of the power of a worker node for interactive work is done simply by typing qsh and working in the new shell window that is opened. The next set of slides assumes that you are already working on one of the worker nodes (a qsh session).

Practice Session 1: Running Applications on Iceberg (Problem 1)
Case Study: Analysis of Patient Inflammation Data. Running an R application: how to submit jobs and run R interactively.
- List available and loaded modules
- Load the module for the R package
- Start the R application and plot the inflammation data
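A possible sequence of commands for this exercise is sketched below; the exact R module name is an assumption and should be checked with module avail:

    module avail            # list the modules available on the worker node
    module list             # show the modules currently loaded
    module load apps/R      # hypothetical module name for the R package
    R                       # start R interactively, then plot the inflammation data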

Managing Your Jobs: Sun Grid Engine Overview
SGE is the resource management, job scheduling and batch control system (others are available, such as PBS, Torque/Maui and Platform LSF). It:
- starts up interactive jobs on available workers
- schedules all batch-oriented (i.e. non-interactive) jobs
- attempts to create a fair-share environment
- optimizes resource utilization

Scheduling 'qsub' batch jobs on the cluster
[diagram] Submitted jobs (Job N, O, U, X, Y, Z) are dispatched by the SGE master node to slots in queues A, B and C on the worker nodes, taking account of queues, policies, priorities, share/tickets, resources and users/projects.

Demonstration 1: Running Jobs, a Batch Job Example
Using the R package to analyse patient data.
qsub example:
    qsub -l h_rt=10:00:00 -o myoutputfile -j y myjob
OR alternatively, the first few lines of the submit script myjob contain:
    #!/bin/bash
    #$ -l h_rt=10:00:00
    #$ -o myoutputfile
    #$ -j y
and you simply type:
    qsub myjob
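A minimal sketch of what the body of such a submit script might look like when running R in batch mode; the module name and the script name inflammation.R are assumptions for illustration only:

    #!/bin/bash
    #$ -l h_rt=10:00:00
    #$ -o myoutputfile
    #$ -j y
    module load apps/R              # hypothetical module name; check module avail
    R CMD BATCH inflammation.R      # run the R script non-interactively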

Submitting Your Job
There are two SGE commands for submitting jobs:
- qsh or qrsh: to start an interactive job
- qsub: to submit a batch job
There is also a set of locally produced commands for submitting some of the popular applications to the batch system. They all make use of the qsub command. These are: runfluent, runansys, runmatlab and runabaqus.

Managing Jobs: Monitoring and Controlling Your Jobs
There are a number of commands for querying and modifying the status of a job that is running or waiting to run. These are:
- qstat or Qstat (query job status), e.g. qstat -u username
- qdel (delete a job), e.g. qdel jobid
- qmon (a GUI interface for SGE)

Practice Session: Submitting Jobs to Iceberg (Problems 2 and 3)
Patient Inflammation Study: run the R example as a batch job.
Case Study: Fish population simulation, submitting jobs to Sun Grid Engine. Instructions are in the readme file in the sge folder of the course examples. From an interactive session:
- Load the compiler module
- Compile the fish program
- Run test1, test2 and test3

Managing Jobs: Reasons for Job Failures
- SGE cannot find the binary file specified in the job script.
- You ran out of file storage. It is possible to exceed your filestore allocation limits during a job that is producing large output files; use the quota command to check this.
- Required input files are missing from the startup directory.
- An environment variable is not set correctly (LM_LICENSE_FILE etc.).
- Hardware failure (e.g. MPI ch_p4 or ch_gm errors).

Finding Out the Memory Requirements of a Job
Virtual memory limits:
- The default virtual memory limit for each job is 6 GBytes.
- Jobs will be killed if the virtual memory used by the job exceeds the amount requested via the -l mem= parameter.
Real memory limits:
- The default real memory allocation is 2 GBytes.
- Real memory can be requested using -l rmem=.
- Jobs exceeding their real memory allocation will not be deleted but will run with reduced efficiency, and the user will be emailed about the memory deficiency.
- When you get warnings of that kind, increase the real memory allocation for your job using the -l rmem= parameter.
- rmem must always be less than mem.
Determining the virtual memory requirements of a job:
- qstat -f -j jobid | grep mem
- The reported figures indicate the currently used memory (vmem), the maximum memory needed since startup (maxvmem) and the cumulative memory usage in memory*seconds (mem).
- The next time you run the job, use the reported value of vmem to specify its memory requirement.
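For example, a hedged sketch of requesting memory limits at submission time and then checking what a running job actually uses (the figures and job id are illustrative):

    # request 12 GB of virtual memory and 8 GB of real memory for the job
    qsub -l mem=12G -l rmem=8G -l h_rt=10:00:00 myjob

    # inspect the memory usage of a running job (replace 123456 with the real job id)
    qstat -f -j 123456 | grep mem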

Managing Jobs: Running Arrays of Jobs
Many processors run a copy of a task independently.
- Add the -t parameter to the qsub command or to the script file (with #$ at the beginning of the line). Example: -t 1-10. This will create 10 tasks from one job.
- Each task will have its environment variable $SGE_TASK_ID set to a single unique value ranging from 1 to 10.
- There is no guarantee that task number m will start before task number n, where m < n.
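A minimal sketch of a task array submit script; the program name and input file naming scheme are hypothetical:

    #!/bin/bash
    #$ -t 1-10
    #$ -l h_rt=1:00:00
    # each task receives its own index in $SGE_TASK_ID
    echo "Running task $SGE_TASK_ID"
    ./myprogram input.$SGE_TASK_ID    # hypothetical executable and input files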

Managing Jobs: Running CPU-Parallel Jobs
More processors for a single task, either:
- sharing memory, or
- using distributed memory.
The parallel environment needed for a job is specified by the -pe <env> <nn> parameter of the qsub command, where <env> is one of:
- openmp: shared-memory OpenMP jobs, which must therefore run on a single node using its multiple processors.
- openmpi-ib: OpenMPI library with Infiniband. These are MPI jobs running on multiple hosts using the Infiniband connection (32 GBits/sec).
- mvapich2-ib: MVAPICH2 library with Infiniband. As above but using the MVAPICH MPI library.
Compilers that support MPI: PGI, Intel, GNU.
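As a sketch, the parallel environment and slot count are given on the qsub line; the core counts below are illustrative:

    # a shared-memory OpenMP job on 4 cores of a single node
    qsub -pe openmp 4 -l h_rt=8:00:00 myjob

    # an MPI job on 16 cores spread across nodes, using the Infiniband interconnect
    qsub -pe openmpi-ib 16 -l h_rt=8:00:00 myjob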

Running GPU-Parallel Jobs
GPU parallel processing is supported on 8 Nvidia Tesla Fermi M2070 GPU units attached to iceberg. In order to use the GPU hardware you will need to join the GPU project by sending an email request. You can then submit jobs that use the GPU facilities by adding the following three parameters to the qsub command:
    -P gpu -l arch=intel* -l gpu=nn
where 1 <= nn <= 8 is the number of GPU modules to be used by the job, and -P stands for the project that you belong to. See the next slide.
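A minimal sketch of a GPU submit script using these parameters; a single GPU is requested here for illustration, and the executable name is hypothetical:

    #!/bin/bash
    #$ -P gpu
    #$ -l arch=intel*
    #$ -l gpu=1
    #$ -l h_rt=4:00:00
    ./my_gpu_program      # hypothetical CUDA executable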

Demonstration 3: Running a Parallel Job
Test 6 provides an opportunity to practice submitting parallel jobs to the scheduler. To run testmpi6, compile the MPI example:
- Load the OpenMPI compiler module: module load mpi/intel/openmpi/1.8.3
- Compile the diffuse program: mpicc diffuse.c -o diffuse
- Submit the job: qsub testmpi6
- Use qstat to monitor the job and examine the output.
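The testmpi6 script is supplied with the course examples; as a rough sketch only (not the actual file), a submit script of this kind typically looks something like the following:

    #!/bin/bash
    #$ -pe openmpi-ib 4            # illustrative slot count
    #$ -l h_rt=1:00:00
    module load mpi/intel/openmpi/1.8.3
    mpirun -np $NSLOTS ./diffuse   # $NSLOTS is set by SGE to the number of granted slots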

Practice Session: Submitting a Task Array to Iceberg (Problem 4)
Case Study: Fish population simulation, submitting jobs to Sun Grid Engine. Instructions are in the readme file in the sge folder of the course examples. From an interactive session:
- Run the SGE task array example
- Run test4 and test5

Remote Visualisation
See: Specialist High Speed Visualization Access to Iceberg.
- Undertake visualisation using thin clients that access remote, high-quality visualisation hardware.
- Remote visualisation removes the need to transfer data and allows researchers to visualise data sets on remote visualisation servers attached to the high performance computer and its storage facility.

VirtualGL
VirtualGL is an open source package which gives any UNIX or Linux remote display software the ability to run 3D applications with full hardware acceleration. VirtualGL can be used in conjunction with remote display software such as VNC to provide 3D hardware-accelerated rendering for OpenGL applications. It is particularly useful for providing remote display to thin clients which lack 3D hardware acceleration.
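By way of illustration, applications are typically launched through VirtualGL's vglrun wrapper inside the remote session; glxgears is just a stand-in for a real OpenGL application:

    # run an OpenGL application with server-side 3D hardware acceleration
    vglrun glxgears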

Client Access to the Visualisation Cluster
[diagram] A VirtualGL client on the user's desktop connects to a VirtualGL server (NVIDIA GPU) on Iceberg and the Campus Compute Cloud.

Remote Visualisation Using SGD
- Start a browser, go to the Sun Global Desktop page and log in.
- Under Iceberg Applications, start the Remote Visualisation session.
- This opens a shell with instructions to either open a browser and enter the address given, or start TigerVNC Viewer on your desktop.
- Use the address iceberg.shef.ac.uk:XXXX, where XXXX is a port number provided in the iceberg terminal.
- When requested, use your usual iceberg user credentials.

Remote Desktop Through VNC

Remote Visualisation Using TigerVNC and the PuTTY SSH Client
- Log in to iceberg using PuTTY.
- At the prompt, type qsh-vis.
- This opens a shell with instructions to either open a browser and enter the address given, or start TigerVNC Viewer on your desktop.
- Use the address iceberg.shef.ac.uk:XXXX, where XXXX is a port number provided in the iceberg terminal.
- When requested, use your usual iceberg user credentials.

Beyond Iceberg
- Iceberg is adequate for many compute problems.
- Purchasing dedicated resource.
- The N8 Tier 2 facility for more demanding compute problems.
- HECToR/ARCHER: larger facilities for grand challenge problems (peer-review process to gain access).

High Performance Computing Tiers
- Tier 1 computing: HECToR, ARCHER
- Tier 2 computing: Polaris
- Tier 3 computing: Iceberg

Purchasing Resource
Buying nodes using the framework:
- Research groups purchase HPC equipment against their research grant; this hardware is integrated with the Iceberg cluster.
Buying a slice of time:
- Research groups can purchase servers for a length of time specified by the research group (the cost is 1.7p per core per hour).
- Servers are reserved for dedicated usage by the research group using a provided project name.
- When reserved nodes are idle they become available to the general short queues; they are quickly released for use by the research group when required.
Contact CiCS for further information.

The N8 Tier 2 Facility: Polaris
Note: the N8 is for users whose research problems require greater resource than that available through Iceberg.
Registration is through projects:
- Authorisation by a supervisor or project leader is needed to register the project with the N8.
- Users obtain a project code from their supervisor or project leader.
- Complete the online form, providing an outline of work explaining why N8 resources are required.

Polaris: Specifications
- 5,312 Intel Sandy Bridge cores
- Co-located with the 4,500-core Leeds HPC cluster
- Purchased through the Esteem framework agreement: SGI hardware
- #291 in the June 2012 Top500 list

National HPC Services
ARCHER, the UK National Supercomputing Service:
- Hardware: Cray XC30
- 2632 standard nodes, each containing two Intel E5 v2 12-core processors, therefore 2632 x 2 x 12 = 63,168 cores
- 64 GB of memory per node
- 376 high-memory nodes with 128 GB of memory
- Nodes connected to each other via the Aries low-latency interconnect
- Research Data File System: 7.8 PB of disk
EPCC: HPC facilities, training and expertise in parallel computing.

Links for Software Downloads
- PuTTY
- WinSCP
- TigerVNC