
1 Requesting Resources on an HPC Facility Michael Griffiths and Deniz Savas Corporate Information and Computing Services The University of Sheffield www.sheffield.ac.uk/wrgrid (Using the Sun Grid Engine Job Scheduler)

2 Outline
1. Using the Job Scheduler: Interactive Jobs
2. Batch Jobs
3. Task Arrays
4. Running Parallel Jobs
5. GPUs and Remote Visualisation
6. Beyond Iceberg: Accessing the N8 Tier 2 Facility

3 Using the Job Scheduler
Interactive Jobs: http://www.sheffield.ac.uk/cics/research/hpc/using/interactive
Batch Jobs: http://www.sheffield.ac.uk/cics/research/hpc/using/runbatch

4 Running Jobs: A note on interactive jobs
Software that requires intensive computing should be run on the worker nodes, not the head node. Run compute-intensive interactive jobs on the worker nodes by using the qsh or qrsh command. The maximum (and also the default) time limit for interactive jobs is 8 hours.
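As an illustrative sketch (the 4-hour limit shown is just an example value within the 8-hour maximum):
  qsh                      # opens a new shell window on a worker node for interactive work
  qrsh -l h_rt=04:00:00    # interactive session in the current terminal with a 4-hour run-time limit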

5 Sun Grid Engine
Two iceberg headnodes act as gateways to the cluster of worker nodes. The headnodes' main purpose is to allow access to the worker nodes, NOT to run cpu-intensive programs. All cpu-intensive computations must be performed on the worker nodes. This is achieved by:
– the qsh command for interactive jobs, and
– the qsub command for batch jobs.
Once you log into iceberg, taking advantage of the power of a worker node for interactive work is done simply by typing qsh and working in the new shell window that is opened. The next set of slides assumes that you are already working on one of the worker nodes (qsh session).

6 Practice Session 1: Running Applications on Iceberg (Problem 1)
Case study: analysis of patient inflammation data. Running an R application: how to submit jobs and run R interactively.
– List available and loaded modules
– Load the module for the R package
– Start the R application and plot the inflammation data
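A minimal sketch of these steps from an interactive (qsh) session follows; the exact R module name is an assumption here, so check module avail on iceberg for the real one:
  module avail        # list the modules available on the cluster
  module list         # list the modules currently loaded
  module load apps/R  # assumed module name for the R package
  R                   # start an interactive R session and plot the inflammation data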

7 Managing Your Jobs: Sun Grid Engine Overview
SGE is the resource management, job scheduling and batch control system. (Others are available, such as PBS, Torque/Maui and Platform LSF.) It:
– starts up interactive jobs on available workers
– schedules all batch-oriented (i.e. non-interactive) jobs
– attempts to create a fair-share environment
– optimizes resource utilization

8 Scheduling 'qsub' batch jobs on the cluster
[Diagram: submitted jobs (JOB N, O, U, X, Y, Z) are placed by the SGE master node into slots in queues A, B and C on the worker nodes, according to queues, policies, priorities, share/tickets, resources and users/projects.]

9 Demonstration 1: Using the R package to analyse patient data
Running Jobs: batch job example
qsub example: qsub -l h_rt=10:00:00 -o myoutputfile -j y myjob
OR, alternatively, the first few lines of the submit script myjob contain:
  #!/bin/bash
  #$ -l h_rt=10:00:00
  #$ -o myoutputfile
  #$ -j y
and you simply type: qsub myjob
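As a fuller sketch, the complete myjob script for the R demonstration might look like the following; the R module name and the script filename are hypothetical placeholders:
  #!/bin/bash
  #$ -l h_rt=10:00:00
  #$ -o myoutputfile
  #$ -j y
  module load apps/R                  # assumed module name for the R package
  Rscript inflammation_analysis.R     # hypothetical script that analyses the patient data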

10 Submitting your job
There are two SGE commands for submitting jobs:
– qsh or qrsh: to start an interactive job
– qsub: to submit a batch job
There is also a set of locally produced commands for submitting some of the popular applications to the batch system. They all make use of the qsub command. These are: runfluent, runansys, runmatlab, runabaqus.

11 Managing Jobs: monitoring and controlling your jobs
http://www.sheffield.ac.uk/cics/research/hpc/using/runbatch/sge
There are a number of commands for querying and modifying the status of a job that is running or waiting to run. These are:
– qstat or Qstat (query job status): qstat -u username
– qdel (delete a job): qdel jobid
– qmon (a GUI interface for SGE)
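For example (the job id below is hypothetical):
  qstat -u $USER    # show the status of all your queued and running jobs
  qdel 123456       # delete the job with id 123456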

12 Practice Session: Submitting Jobs To Iceberg (Problems 2 & 3)
Patient inflammation study: run the R example as a batch job.
Case study: fish population simulation. Submitting jobs to Sun Grid Engine; instructions are in the readme file in the sge folder of the course examples.
From an interactive session:
– Load the compiler module
– Compile the fish program
– Run test1, test2 and test3

13 Managing Jobs: Reasons for job failures
http://www.shef.ac.uk/cics/research/hpc/using/requirements
– SGE cannot find the binary file specified in the job script.
– You ran out of file storage. It is possible to exceed your filestore allocation limits during a job that is producing large output files. Use the quota command to check this.
– Required input files are missing from the startup directory.
– An environment variable is not set correctly (LM_LICENSE_FILE etc.).
– Hardware failure (e.g. mpi ch_p4 or ch_gm errors).

14 Finding out the memory requirements of a job
Virtual memory limits:
– The default virtual memory limit for each job is 6 GBytes.
– Jobs will be killed if the virtual memory used by the job exceeds the amount requested via the -l mem= parameter.
Real memory limits:
– The default real memory allocation is 2 GBytes.
– Real memory can be requested by using -l rmem=
– Jobs exceeding the real memory allocation will not be deleted but will run with reduced efficiency, and the user will be emailed about the memory deficiency.
– When you get warnings of that kind, increase the real memory allocation for your job by using the -l rmem= parameter.
– rmem must always be less than mem.
Determining the virtual memory requirements for a job:
– qstat -f -j jobid | grep mem
– The reported figures indicate the currently used memory (vmem), the maximum memory needed since startup (maxvmem), and cumulative memory_usage*seconds (mem).
– When you next run the job, use the reported value of vmem to specify the memory requirement.
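An illustrative sketch follows; the memory values and the job id are examples, not recommendations:
  qsub -l mem=16G -l rmem=8G myjob    # request 16 GB virtual and 8 GB real memory
  qstat -f -j 123456 | grep mem       # report vmem, maxvmem and mem for job 123456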

15 Managing Jobs: Running arrays of jobs
http://www.shef.ac.uk/cics/research/hpc/using/runbatch/examples
Many processors run a copy of a task independently. Add the -t parameter to the qsub command or to the script file (with #$ at the beginning of the line).
– Example: -t 1-10 creates 10 tasks from one job.
Each task has its environment variable $SGE_TASK_ID set to a single unique value ranging from 1 to 10. There is no guarantee that task number m will start before task number n, where m < n.
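A minimal task-array script might look like the sketch below; the program name and per-task input-file naming are hypothetical:
  #!/bin/bash
  #$ -t 1-10
  echo "Running task $SGE_TASK_ID"
  ./myprogram input_${SGE_TASK_ID}.dat    # hypothetical program and input file for this task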

16 Managing Jobs: Running cpu-parallel jobs
More processors for a task:
– shared memory
– distributed memory
The parallel environment needed for a job is specified by the -pe <env> <nn> parameter of the qsub command, where <env> is one of:
– openmp: shared-memory OpenMP jobs, which must therefore run on a single node using its multiple processors.
– openmpi-ib: OpenMPI library over Infiniband. These are MPI jobs running on multiple hosts using the Infiniband connection (32 GBits/sec).
– mvapich2-ib: MVAPICH library over Infiniband. As above but using the MVAPICH MPI library.
Compilers that support MPI: PGI, Intel, GNU.
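For example, an MPI submit script might look like this sketch; the slot count, run time and mpirun invocation are illustrative assumptions:
  #!/bin/bash
  #$ -pe openmpi-ib 16         # request 16 slots in the OpenMPI/Infiniband environment
  #$ -l h_rt=01:00:00
  module load mpi/intel/openmpi/1.8.3
  mpirun ./diffuse             # launch the MPI executable on the allocated slots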

17 Running GPU parallel jobs
GPU parallel processing is supported on 8 Nvidia Tesla (Fermi) M2070 GPU units attached to iceberg. In order to use the GPU hardware you will need to join the GPU project by emailing research-it@sheffield.ac.uk. You can then submit jobs that use the GPU facilities by passing the following three parameters to the qsub command:
  -P gpu -l arch=intel* -l gpu=nn
where 1 <= nn <= 8 is the number of gpu modules to be used by the job. -P stands for the project that you belong to. See the next slide.
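For instance, a single-GPU batch job could be submitted as in this sketch (the script name is a hypothetical placeholder):
  qsub -P gpu -l arch=intel* -l gpu=1 mygpujob.sh    # request one gpu module for the job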

18 Demonstration 3: Running a parallel job
Test 6 provides an opportunity to practice submitting parallel jobs to the scheduler. To run testmpi6, compile the mpi example:
– Load the openmpi compiler module: module load mpi/intel/openmpi/1.8.3
– Compile the diffuse program: mpicc diffuse.c -o diffuse
– Submit the job: qsub testmpi6
– Use qstat to monitor the job, then examine the output.
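Put together, the demonstration is the following command sequence from an interactive session (filtering qstat on your username is an illustrative addition):
  module load mpi/intel/openmpi/1.8.3
  mpicc diffuse.c -o diffuse
  qsub testmpi6
  qstat -u $USER    # monitor the job, then examine the output files when it finishes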

19 Practice Session: Submitting A Task Array To Iceberg (Problem 4)
Case study: fish population simulation. Submitting jobs to Sun Grid Engine; instructions are in the readme file in the sge folder of the course examples.
From an interactive session, run the SGE task array example:
– Run test4 and test5

20 Remote Visualisation
See Specialist High Speed Visualization Access to iceberg:
http://www.sheffield.ac.uk/cics/research/hpc/using/access/intro
Undertake visualisation using thin clients that access remote high-quality visualisation hardware. Remote visualisation removes the need to transfer data and allows researchers to visualise data sets on remote visualisation servers attached to the high performance computer and its storage facility.

21 VirtualGL
VirtualGL is an open source package which gives any UNIX or Linux remote display software the ability to run 3D applications with full hardware acceleration. VirtualGL can be used in conjunction with remote display software such as VNC to provide 3D hardware-accelerated rendering for OpenGL applications. It is particularly useful for providing remote display to thin clients which lack 3D hardware acceleration.

22 Client Access to Visualisation Cluster
[Diagram: a VirtualGL client connects to a VirtualGL server (with an NVIDIA GPU) on Iceberg / the Campus Compute Cloud.]

23 Remote Visualisation Using SGD
Start a browser, go to https://myapps.shef.ac.uk and log in to the Sun Global Desktop. Under Iceberg Applications, start the Remote Visualisation session. This opens a shell with instructions to either:
– open a browser and enter the address http://iceberg.shef.ac.uk:XXXX, or
– start TigerVNC Viewer on your desktop and use the address iceberg.shef.ac.uk:XXXX
XXXX is a port address provided in the iceberg terminal. When requested, use your usual iceberg user credentials.

24

25 Remote Desktop Through VNC

26 Remote Visualisation Using TigerVNC and the PuTTY SSH Client
Log in to iceberg using PuTTY and, at the prompt, type qsh-vis. This opens a shell with instructions to either:
– open a browser and enter the address http://iceberg.shef.ac.uk:XXXX, or
– start TigerVNC Viewer on your desktop and use the address iceberg.shef.ac.uk:XXXX
XXXX is a port address provided in the iceberg terminal. When requested, use your usual iceberg user credentials.
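In outline (XXXX stands for the port number reported by qsh-vis; vncviewer is the TigerVNC viewer binary on your desktop):
  qsh-vis                              # on iceberg: starts a visualisation session and reports a port
  vncviewer iceberg.shef.ac.uk:XXXX    # on your desktop: connect to that port with TigerVNC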

27 Beyond Iceberg
http://www.sheffield.ac.uk/cics/research/hpc/iceberg/costs
– Iceberg is adequate for many compute problems.
– Purchasing dedicated resource.
– The N8 tier 2 facility for more demanding compute problems.
– Hector/Archer: a larger facility for grand challenge problems (peer review process to access).

28 High Performance Computing Tiers
– Tier 1 computing: Hector, Archer
– Tier 2 computing: Polaris
– Tier 3 computing: Iceberg

29 Purchasing Resource
http://www.sheffield.ac.uk/cics/research/hpc/iceberg/costs
Buying nodes using the framework:
– Research groups purchase HPC equipment against their research grant; this hardware is integrated with the Iceberg cluster.
Buying a slice of time:
– Research groups can purchase servers for a length of time specified by the research group (cost is 1.7p per core per hour).
– Servers are reserved for dedicated usage by the research group using a provided project name.
– When reserved nodes are idle they become available to the general short queues; they are quickly released for use by the research group when required.
For information e-mail research-it@sheffield.ac.uk.

30 The N8 Tier 2 Facility: Polaris
http://www.shef.ac.uk/cics/research/hpc/polaris
Note: the N8 is for users whose research problems require greater resource than that available through Iceberg. Registration is through projects:
– A supervisor or project leader authorises registration of the project with the N8.
– Users obtain a project code from the supervisor or project leader.
– Complete the online form, providing an outline of work explaining why N8 resources are required.

31 Polaris: Specifications
– 5312 Intel Sandy Bridge cores
– Co-located with the 4500-core Leeds HPC
– Purchased through the Esteem framework agreement: SGI hardware
– Ranked #291 in the June 2012 Top500 list

32 National HPC Services
Archer: UK National Supercomputing Service. Hardware: Cray XC30.
– 2632 standard nodes; each node contains two Intel E5-2697 v2 12-core processors, giving 2632 × 2 × 12 = 63,168 cores.
– 64 GB of memory per node; 376 high-memory nodes with 128 GB of memory.
– Nodes connected to each other via the ARIES low-latency interconnect.
– Research Data File System: 7.8 PB of disk.
http://www.archer.ac.uk/
EPCC HPC Facilities: http://www.epcc.ed.ac.uk/facilities/national-facilities
Training and expertise in parallel computing.

33 Links for Software Downloads
Putty: http://www.chiark.greenend.org.uk/~sgtatham/putty/
WinSCP: http://winscp.net/eng/download.php
TigerVNC: http://sourceforge.net/projects/tigervnc/ and http://sourceforge.net/apps/mediawiki/tigervnc/index.php?title=Main_Page

