Helix - HPC/SLURM Tutorial

14/09/2017

Tutorials
- 10-15 minutes: presentation
- 20-25 minutes: examples
- 20-30 minutes: questions
- Open to suggestions on format, time and topics

High Performance Computing Cluster: a collection of nodes that can run computing jobs in parallel.
- Node: an individual server with RAM, CPUs and network interfaces; also known as a compute node.
- RAM: Random Access Memory.
- CPU: Central Processing Unit; executes the jobs.
- Core: a CPU can have multiple cores, so it can run multiple processes at once.
- Threads: processes that run at the same time.
- Storage: hard drive space to store data.
- Partition: similar to a queue.
- Helix: our cluster.
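To make these terms concrete, here is a minimal sketch of a SLURM batch script that requests resources in those units. The partition name main comes from the sinfo listing later in this tutorial; the job name, script name and my_program are placeholders.

#!/bin/bash
#SBATCH --job-name=example          # placeholder job name
#SBATCH --partition=main            # partition (queue) to submit to
#SBATCH --nodes=1                   # one compute node
#SBATCH --ntasks=1                  # one process
#SBATCH --cpus-per-task=4           # four cores on that node
#SBATCH --mem=8192                  # total memory for the job, in MB
#SBATCH --time=0-01:00:00           # wall time limit: 1 hour

srun my_program                     # 'my_program' stands in for your own command

Saved as example.slurm, it would be submitted with: sbatch example.slurm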

Helix: Our Cluster
Comprises:
- 8 x 32-core nodes with 128 GB RAM each
- 175 TB of disk storage

Project groups [Project ID, Name, Project leaders]:
- SG0001: Systems Biology Lab at Centre for Systems Genomics – Edmund Crampin
- SG0004: Centre for Systems Genomics Cluster - Leslie Group – Allan Motyer, Ashley Farlow, Damjan Vukcevic, Stephen Leslie
- SG0005: Statistical genomics, University of Melbourne – David Balding
- SG0007: Centre for Systems Genomics Cluster – Kim-Anh Le Cao
- SG0009: COGENT – Bobbie Shaban
- SGA0001: Systems Genomics Associate Member – Dr Sarah Dunstan, Andrew Siebel
- SGN0001: Project space for Oxford Nanopore data generated by Systems Genomics

List of partitions

[bshaban@snowy-sg1 ~]$ sinfo
PARTITION    AVAIL  TIMELIMIT   NODES  STATE  NODELIST
main*        up     30-00:00:0      4  mix    snowy[002-003,005,012]
main*        up     30-00:00:0     26  alloc  snowy[001,004,006,008-011,013-031]
main*        up     30-00:00:0      1  idle   snowy007
sysgen       up     1-00:00:00      1  mix    snowy035
sysgen       up     1-00:00:00      5  idle   snowy[032-034,040-041]
sysgen-long  up     30-00:00:0      5  mix    snowy[036-039,043]
sysgen-long  up     30-00:00:0      1  idle   snowy042
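As a hedged example (not from the slides), the partition is chosen at submission time with --partition; myjob.slurm is a placeholder script name.

# Short job (< 24 h) on the sysgen partition, which sinfo caps at 1-00:00:00:
sbatch --partition=sysgen --time=0-12:00:00 myjob.slurm

# Longer job on sysgen-long, which allows up to 30 days:
sbatch --partition=sysgen-long --time=5-00:00:00 myjob.slurm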

List of partitions (cont.)
- The systems genomics nodes are split into two sub-partitions (4 x 128 GB + 2 x 512 GB nodes each).
- Half of the sysgen nodes are always available for jobs shorter than 24 hours.
- Any job longer than 24 hours is only eligible for a sub-partition and the main partition.
- Any user (or project) can run at most 256 cores' worth of jobs at a time (the global default), but the maximum number of cores a single job can use on a systems genomics partition is 192 cores (4 x 32 + 2 x 32). Any larger job will only run on the main partition.
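SLURM also accepts a comma-separated list of partitions and starts the job in whichever eligible partition becomes available first. A hedged sketch of how the 24-hour rule above plays out at submission time (myjob.slurm is a placeholder):

# Under 24 hours: eligible for sysgen as well as main.
sbatch --partition=sysgen,main --time=0-20:00:00 --ntasks=64 myjob.slurm

# Over 24 hours: only sysgen-long and main are eligible.
sbatch --partition=sysgen-long,main --time=3-00:00:00 --ntasks=64 myjob.slurm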

Head node – Snowy-sg1
- For job submission only! You can use srun or sbatch.
- It only has 34 GB of memory, which a single job can easily consume.
- If the head node is oversubscribed, no new jobs can be submitted (and it is not being monitored).
- Submit via sbatch: the job is scheduled on a compute node and runs independently of the head node.
- Use srun, sbatch or sinteractive rather than running work on the head node directly; jobs running on the head node itself will die if it crashes.
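A sketch of the difference in practice; myjob.slurm is a placeholder:

# sbatch hands the script to the scheduler; it runs on a compute node and
# keeps running even if your login session on the head node ends.
sbatch myjob.slurm

# srun launches through the scheduler but stays attached to your terminal
# session, so it suits interactive work (here: a shell on a compute node);
# if your session ends, the interactive job ends with it.
srun --ntasks=1 --mem-per-cpu=4096 --time=0-00:30:00 --pty bash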

Resource limitations per node

[bshaban@snowy-sg1 ~]$ sinfo --format "%n %20E %12U %19H %6t %a %b %C %m %e"
HOSTNAMES REASON USER    TIMESTAMP STATE AVAIL ACTIVE_FEATURES CPUS(A/I/O/T) MEMORY FREE_MEM
snowy002  none   Unknown Unknown   mix   up    (null)          31/1/0/32     125000 117569
snowy003  none   Unknown Unknown   mix   up    (null)          8/24/0/32     125000 115214
snowy005  none   Unknown Unknown   mix   up    (null)          5/27/0/32     125000 118985
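Given roughly 32 cores and 125000 MB of usable memory per node (from the output above), a whole-node request might look like the following sketch; myjob.slurm is a placeholder:

# Ask for one full node, staying under the ~125000 MB reported by sinfo.
sbatch --nodes=1 --ntasks=32 --mem=120000 --time=1-00:00:00 myjob.slurm

# Re-check current load and free memory before submitting:
sinfo --format "%n %t %C %m %e"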

Resource limitations per user

[bshaban@snowy-sg1 ~]$ mylimits
-----------------------------------------------------------------
The following limits apply to your account on snowy:
 * Max number of idle jobs in queue (see note)        5
 * Default memory per core in MB                      2048
 * Max memory per core in MB (use --mem-per-cpu=N)    50000
 * Default wall time                                  10 mins
 * Max wall time (use --time=D-hh:mm:ss)              720 hours
 * Default number of CPUs                             1
 * Max number of CPUs per job (use --ntasks=N)        560
 * Max number of jobs running at one time             192
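A batch script that stays within these per-user limits might look like the following sketch; my_program is a placeholder:

#!/bin/bash
#SBATCH --ntasks=16            # limit: max 560 CPUs per job
#SBATCH --mem-per-cpu=4096     # default 2048 MB, max 50000 MB per core
#SBATCH --time=2-00:00:00      # default 10 mins, max 720 hours

srun my_program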

Resource limitations per user (cont.)

[bshaban@snowy-sg1 ~]$ mydisk
The following is how much disk space your current projects have:
Fileset     Size     Used    Avail  Use%
SG0009    20480G     125G   20355G    1%
SG0004    25900G   10285G   15615G   40%
SG0005    10240G    4278G    5962G   42%
SGN0001    9140G     132G    9008G    1%
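mydisk is a site-specific wrapper; standard tools give the same kind of information for an individual directory. The path below is a placeholder:

# Total size of a directory tree within your project space:
du -sh /path/to/your/project/data

# Free space on the filesystem that holds it:
df -h /path/to/your/project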

End / Start tutorial

Tutorial resources:
- https://slurm.schedmd.com/tutorials.html
- https://srcc.stanford.edu/sge-slurm-conversion (see the brief command mapping below)

Workshop: Introduction to High Performance Computing
September 26, 2017, 1.30-4.30 pm (also offered October 3rd)
Melbourne Bioinformatics Boardroom, 187 Grattan Street, Carlton, VIC 3053, Australia
Using High Performance Computing (HPC) resources such as Melbourne Bioinformatics in an effective and efficient manner is key to modern research. This workshop will introduce you to HPC environments and assist you to get on with your research.
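For users coming from SGE, the conversion guide linked above maps commands one-for-one; a few common equivalents as a hedged summary:

# SGE                SLURM
qsub job.sh      ->  sbatch job.sh       # submit a batch job
qstat            ->  squeue              # show the job queue
qdel <jobid>     ->  scancel <jobid>     # cancel a job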