Hyperion HPC@WSL


hyperion HPC@WSL Named after one of the sons of Uranus and Gaia from Greek mythology. Hyperion himself, together with his sister Theia, is the father of Helios (Sun), Selene (Moon) and Eos (Dawn).

Why use hyperion HPC: high performance computing; large computations; a gain in computing time through parallelization, i.e. a computational task that is broken down into several, often many, very similar, independently processed subtasks. Keep in mind that subtasks with loads of I/O might not be best suited, and that it is often better to have fewer subtasks with lots of work each than lots of subtasks with almost no work to do (you lose efficiency to communication).

General Information: Dalco cluster running the Linux operating system (CentOS 7), housed in the Plant Protection Lab. Dalco Switzerland is based in Volketswil and provides HPC environments for industry (Novartis, UBS, Sauber F1, …) and universities (ETH, Zurich, Basel, …).

General Information: 1 master node, 56 compute nodes, 2 storage servers; interconnects: Omni-Path at 100 GBit/s and Ethernet at 2 x 10 GBit/s.

Tech Specs, Master Node (1): 12 cores at 3.4 (3.7) GHz, 64 GB RAM, 1 TB local storage.

Tech Specs, Compute Nodes (56): 20 cores at 2.2 (3.1) GHz, 64 GB RAM and 240 GB local storage each. In total that is 1120 cores (56 x 20) with roughly 3.2 GB RAM per core (64 GB / 20 cores) and 3584 GB RAM overall (56 x 64 GB).

Tech Specs, Storage Server (2): 16 cores at 2.2 GHz, 32 GB RAM, 24 SSD disks (1.6 TB); 34 TB available; hard quota of 400 GB (the hard quota can be raised temporarily, contact IT); Lustre filesystem (a distributed filesystem); throughput up to 9 GB/s (expect ca. 5 GB/s when multiple users are active). No backup!

General Workflow: connect to the master node, copy your data to the storage server, tell hyperion what to do on the compute nodes, and copy your data back from the storage server. No backup!
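
For the two copy steps, scp from your local machine is one option (a minimal sketch using standard OpenSSH tools; the archive names and the target directory under lud11 are placeholders):

# copy input data to the storage server ...
scp mydata.tar.gz wueest@hyperion:/wsl/lud11/username/
# ... and copy the results back once the job has finished (remember: no backup on the cluster)
scp wueest@hyperion:/wsl/lud11/username/results.tar.gz .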

How To: most information can be found on http://hyperion.wsl.ch; contact us in case of questions. We are happy to help… up to a certain limit (both knowledge-wise and time-wise).

How To: CONNECT. Get an account from IT; see the details on http://hyperion.wsl.ch. Connect via ssh. Windows: download, install, and use PuTTY (http://hyperion.wsl.ch/download), or maybe better, the Windows Subsystem for Linux. UNIX-alikes (Linux, macOS, …): built-in ssh. Use your usual WSL username and password, e.g. ssh wueest@hyperion. Contact me in case you want a connection without a password!
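
A minimal sketch of connecting from a UNIX-like shell; the key-based setup below uses standard OpenSSH tools and is only one possible way to get passwordless access (check with IT for the supported procedure on hyperion):

# log in with your usual WSL username and password
ssh wueest@hyperion

# optional: passwordless login via an SSH key pair (assumption: key-based
# login is allowed on hyperion; ask IT if unsure)
ssh-keygen -t ed25519          # create a key pair on your local machine
ssh-copy-id wueest@hyperion    # copy the public key to the cluster
ssh wueest@hyperion            # later logins should no longer ask for a password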

How To: RUN A JOB. SLURM (Simple Linux Utility for Resource Management) is the job submission system => a suite of commands that "[…] is very easy to learn and use" (quote from IT on hyperion.wsl.ch/cluster). Use the sbatch command to call a batch script, e.g. testjob.slurm.
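
The everyday workflow boils down to a few standard SLURM commands (the script name testjob.slurm is the example from the slide; the job ID is whatever sbatch prints back):

sbatch testjob.slurm    # submit the batch script; prints the job ID
squeue -u wueest        # list your pending and running jobs
scancel <jobid>         # cancel a job by its ID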

How To: BASIC BATCH SCRIPT. Basic structure: #!/bin/bash, sbatch arguments, organize data, run job. Alternative to bash => csh.

How To: BASIC BATCH SCRIPT: shell definition; sbatch arguments => mail, ntasks, time; copying data => the mounts eco, lud11 and lud12; srun to run the serial job (ntasks=1).
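
A minimal sketch of such a batch script, assuming a hypothetical input file and R script (input.txt, myscript.r) and a placeholder path under the lud11 mount; adapt the #SBATCH values to your own job:

#!/bin/bash
#SBATCH --mail-type=END,FAIL            # sbatch arguments: mail notifications ...
#SBATCH --mail-user=username@wsl.ch     # (placeholder address)
#SBATCH --ntasks=1                      # ... number of tasks (1 = serial job) ...
#SBATCH --time=01:00:00                 # ... and wall-clock time limit

# organize data: the mounts eco, lud11 and lud12 are available;
# the concrete path and file names here are placeholders
cp /mnt/lud11/username/input.txt .

# run the serial job with srun
srun Rscript myscript.r input.txt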

How To: SET UP A BATCH SCRIPT. Varieties of jobs: serial job(s); array of jobs; parallel jobs; advanced configuration (several jobs, each with several CPUs).

How To: SET UP A BATCH SCRIPT. Varieties of jobs: serial job(s); array of jobs; parallel jobs; advanced configuration (several jobs, each with several CPUs). Examples:

How To: SET UP A BATCH SCRIPT. All examples produce a series of 16 files, each file named with an index i = 1..16; the files can be of any format (*.txt, *.rdata, *.tif, *.ncdf, *.*).

How To: SET UP A BATCH SCRIPT. Varieties of jobs: serial job(s); array of jobs; parallel jobs; advanced configuration (several jobs, each with several CPUs).

A simple R script (your «task») that creates one indexed *.txt file:

# array_test.r
# collect command arguments
args <- commandArgs(trailingOnly = TRUE)
args_contents <- strsplit(args, " ")
# set an output directory
OUTPUT <- '/wsl/lud11/username/'
# get the first argument (the index passed in from the batch script)
Index <- args_contents[[1]]
# write out a file containing the index value from the batch script
write.table(Index, file = paste(OUTPUT, Index, '.txt', sep = ''))

How To: ARRAY BATCH SCRIPT. Starts 16 instances of R, each of which gets one CPU and a unique SLURM_ARRAY_TASK_ID assigned (which can be supplied to the R script as an argument).
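
A sketch of such an array batch script; it uses standard SLURM array syntax and calls the array_test.r script shown above, with the time limit as a placeholder:

#!/bin/bash
#SBATCH --array=1-16     # 16 array tasks with indices 1..16
#SBATCH --ntasks=1       # each task is a serial job on one CPU
#SBATCH --time=00:10:00  # placeholder time limit

# pass the array index to the R script; it writes <index>.txt
srun Rscript array_test.r "$SLURM_ARRAY_TASK_ID"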

Diagram: 16 R script instances on Node 1, each using 1 CPU. Result: 16 files: 1.txt, 2.txt, …, 16.txt

How To: SET UP A BATCH SCRIPT. Varieties of jobs: serial job(s); array of jobs; parallel jobs; advanced configuration (several jobs, each with several CPUs).

A simple R script that uses foreach (and creates 16 *.txt files):

# test_mpi.r
# load packages
require(Rmpi)
require(foreach)
require(doMPI)
# set up the parallel backend: startMPIcluster() detects all available CPUs
cl <- startMPIcluster()
registerDoMPI(cl)
# foreach(...) in combination with %dopar% executes the for loop in parallel
foreach(n = 1:16) %dopar% {
  write.table(n, file = paste0('/home/wueest/mpitest/', n, '.txt'))
}
# close the parallel backend; needed to properly clean up the MPI cluster connection
closeCluster(cl)
mpi.quit()

How To: PARALLEL BATCH SCRIPT. Starts one R instance, which gets 16 CPUs assigned (mpirun needs 1 CPU). R needs to be called with mpirun so that it gets to use all available CPUs; this can be used in conjunction with, e.g., the foreach functionality.
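
A sketch of the corresponding batch script; the exact task count is an assumption (here 1 master rank plus 16 workers for the foreach loop), so check hyperion.wsl.ch for the recommended settings:

#!/bin/bash
#SBATCH --ntasks=17      # assumption: 1 master + 16 worker ranks
#SBATCH --time=00:10:00  # placeholder time limit

# R must be started through mpirun so that doMPI sees all ranks
mpirun Rscript test_mpi.r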

Diagram: one R script instance started via mpirun on Node 1, using 16 CPUs. Result: 16 files: 1.txt, 2.txt, …, 16.txt

How To: SET UP A BATCH SCRIPT. Varieties of jobs: serial job(s); array of jobs (serial or parallel); parallel jobs; advanced configuration (several jobs, each with several CPUs).

A simple parallel multicore R script (that creates one GeoTIFF raster); it reads its rasters based on the array index:

# testscript.R
# collect command arguments
args <- commandArgs(trailingOnly = TRUE)
args_contents <- strsplit(args, " ")
Index <- as.numeric(args_contents[[1]])
# load libraries
library(raster)
library(parallel)
# set up a cluster of 10 workers that can be used by raster
beginCluster(10)
# read in a raster stack from the ncdf file
st <- stack("/wsl/lud/username/netcdffile.nc", bands = Index:(Index + 1))
# get the mean of both rasters
# IMPORTANT: this function should be able to run in parallel,
# otherwise you won't use the 10 CPUs assigned to each R instance
rx <- clusterR(st, calc, args = list(fun = mean))
# export the raster to the output directory
writeRaster(rx, filename = paste0('/wsl/lud11/username/', Index, '.tif'), format = 'GTiff')
# stop the cluster (this only stops the 10 R worker processes you created, not hyperion)
endCluster()

How To: ADVANCED CONFIG. Starts 16 R instances, each of which gets 10 CPUs assigned. Example from: /mnt/lud11/karger/hyperion/run_testscript.slurm
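
A sketch of what such a batch script could look like (this is not the file referenced above, just a hedged reconstruction): an array of 16 jobs, each with 10 CPUs for the beginCluster(10) call inside testscript.R:

#!/bin/bash
#SBATCH --array=1-16        # 16 R instances, one per array index
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10  # 10 CPUs for the 10 raster workers in each instance
#SBATCH --time=01:00:00     # placeholder time limit

# pass the array index to testscript.R, which reads bands Index:(Index + 1)
srun Rscript testscript.R "$SLURM_ARRAY_TASK_ID"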

Diagram: 16 R script instances spread over Nodes 1–8, each using 10 CPUs (two instances per node). Result: 16 GeoTIFF rasters: 1.tif, 2.tif, …, 16.tif

How To: most information can be found on http://hyperion.wsl.ch; contact us in case of questions. We are happy to help… up to a certain limit (both knowledge-wise and time-wise).

Thank you! Dirk N. Karger (dirk.karger@wsl.ch), Rafael Wüest Karpati (rafael.wueest@wsl.ch)

How To: INTERACTIVE SESSION using srun
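
A typical way to get such an interactive session is to ask srun for a pseudo-terminal running a shell (the resource values below are placeholders):

# request an interactive shell on a compute node
srun --ntasks=1 --cpus-per-task=1 --time=01:00:00 --pty bash -i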

How To: ARRAY BATCH SCRIPT. Starts 16 instances of R, each of which gets one CPU and a unique SLURM_ARRAY_TASK_ID assigned (which can be supplied to the R script as an argument).

Diagram: an array of 16 jobs, each running 16 tasks (ntask=16, 1 CPU per task). Result: 256 files: 1_1.txt, 1_2.txt, …, 16_16.txt
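
One way such a combined layout could be scripted (a sketch only, not the batch file from the presentation): an array of 16 jobs, each packing 16 single-CPU job steps, with the file name built from both indices so that array_test.r writes 1_1.txt … 16_16.txt. Depending on the SLURM version the step option is --exclusive or --exact:

#!/bin/bash
#SBATCH --array=1-16     # outer index
#SBATCH --ntasks=16      # 16 single-CPU tasks per array job (inner index)
#SBATCH --time=00:10:00  # placeholder time limit

# start 16 job steps in parallel, one per inner index
for n in $(seq 1 16); do
    srun --ntasks=1 --exclusive Rscript array_test.r "${SLURM_ARRAY_TASK_ID}_${n}" &
done
wait    # wait for all job steps to finish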