High-Performance Computing System


1 High-Performance Computing System
CSU-CU Summit High-Performance Computing System

2 General Info
Websites: hpc.colostate.edu (this presentation will be posted under the "News" link) and rc.colorado.edu

3 Summit: Schematic Rack Layout
(Schematic shows one storage rack and seven compute racks, containing:)
1 PB scratch GPFS, DDN SFA14K
Intel Haswell nodes (376)
Nvidia K80 GPU nodes (10)
Intel Knights Landing Phi nodes (20)
HiMem nodes (5), 2 TB RAM / node
Ethernet management nodes
OmniPath (OPA) fabric and leaf nodes
Gateway nodes
Note: actual rack layout may differ from this schematic

4 CPU Nodes
380 CPU nodes: Dell PowerEdge C6320
9,120 Intel Haswell CPU cores
2X Intel Xeon E5-2680v3; 2.5 GHz
24 CPU cores / node
128 GB RAM / node (5.3 GB RAM / CPU core)
200 GB SATA SSD / node

5 GPU Nodes
10 GPU nodes: Dell PowerEdge C4130
99,840 GPU cores (2X Nvidia K80 GPU cards / node)
2X Intel Xeon E5-2680v3; 2.5 GHz
24 CPU cores / node
128 GB RAM / node (5.3 GB RAM / CPU core)
200 GB SATA SSD / node

6 Knights-Landing Nodes
20 KnL nodes; 1,440 KnL-F cores
72 Silvermont/Atom cores / node; 1.3 GHz
16 GB HBM (high-bandwidth memory): 3D-stacked MCDRAM (multi-channel DRAM)
384 GB DDR4 platform RAM
200 GB SATA SSD / node
Delivery: 3Q 2017

7 GPU / KnL-F GPU and KnL-F computing will be addressed in separate workshops

8 HiMem Nodes
5 HiMem nodes: Dell PowerEdge R930
4X Intel Xeon E7-4830v3; 2.1 GHz
48 CPU cores / node
2 TB RAM / node (DDR4); 42 GB RAM / CPU core
200 GB SAS SSD / node
12 TB SAS HDD / node

9 Scratch Storage
1 petabyte (PB) scratch storage
DDN SFA14K block storage appliance
21 GB/s sequential R/W; 6M random 4K IOPS
RAID6 array
GPFS (General Parallel File System), a.k.a. Spectrum Scale: high-speed parallel I/O
Quota: initially 10 TB / account; request an increase by e-mail (see the Support slide for contacts)

10 Interconnect
OmniPath: 100 Gb/s bandwidth; 1.5 µs latency
Fat-tree topology with 2:1 oversubscription (a cost/performance tradeoff)

11 Accounts
See hpc.colostate.edu, "Get Started"
1. Get a CSU eID (eid.colostate.edu) OR get a CSU Associate's eID
2. Fill in the Account Application Form on the "Get Started" page
3. Set up DUO two-factor authentication (PDF instructions online)
4. Request a CU-Boulder account (rcamp.rc.colorado.edu/accounts/account-request/create/general):
   Organization: Colorado State University
   Username: CSU eID
   Password: CSU password,DUO_key
   Role: choose anything
   Preferred login shell: leave it as "bash"
   Check "Summit supercomputing cluster"
5. Receive account confirmation
NOTE: the DUO_key cycles every 15 seconds

12 Allocations
1 Service Unit (SU) = 1 core-hour (i.e. full utilization of a single core on 1 compute node for 1 hour)
380 CPU nodes = ca. 80M SU/yr
10 GPU nodes = ca. 5M SU/yr
5 HiMem nodes = ca. 12M SU/yr
Total = ca. 97M SU/yr
RMACC gets 10% = 9.7M SU/yr
UCB gets 75% of the non-RMACC share = 65M SU/yr
CSU gets 25% of the non-RMACC share = 22M SU/yr
Two types of allocation:
1) Initial allocation: all new accounts get 50K SUs; expires after 1 year
2) Project allocation: if you need >50K SUs / yr, request a Project allocation; submit the request form (PDF online); reviewed by the Management & Allocations committee
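As a rough arithmetic check of the CPU figure above: 380 nodes x 24 cores/node x 8,760 hours/year = 79.9M core-hours, i.e. about 80M SU/yr.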

13 Remote Login
SSH: ssh to the Summit login address, then give csu_ePassword,push OR csu_ePassword,DUO_key as the password (see the login sketch below)
SSH client software:
Apple OSX: Terminal (en.wikipedia.org/wiki/Terminal_(macOS))
Windows: PuTTY (putty.org)
Linux: Terminal (introduction-to-the-linux-terminal)
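A minimal login sketch; the hostname is a placeholder because the actual login address is not shown in this transcript (get it from the "Get Started" page or your account-confirmation e-mail):

>ssh eName@colostate.edu@login_host      (login_host is a placeholder)
Password: csu_ePassword,push             (or csu_ePassword,DUO_key)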

14 Remote Login
After login, always enter: >ssh scompile
scompile nodes = Haswell CPU compute nodes set aside for compiling and testing code; users are load-balanced among multiple nodes
This step will not be required after the CU Janus system is decommissioned (later this year)

15 File Transfer: Slow
SFTP: sftp to the login address; the password is csu_ePassword,push OR csu_ePassword,DUO_key (a minimal session is sketched below)
Also: FileZilla (filezilla-project.org), PuTTY (putty.org)
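A minimal sftp session, again with a placeholder hostname (use the real transfer address from the user guide; local_file and remote_file are illustrative names):

>sftp eName@colostate.edu@login_host     (login_host is a placeholder)
sftp> put local_file                     (upload a file)
sftp> get remote_file                    (download a file)
sftp> quit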

16 File Transfer: Fast
GLOBUS (globus.org): fast parallel file transfer
CU Summit is ready with Globus
CSU is not quite ready with Globus; at CSU, use Globus Connect Personal to create a Globus "personal endpoint" on a workstation, server, etc.
Bandwidth is limited by the slowest link, usually 1 Gb/s Ethernet
Endpoints outside CSU can also be used

17 File Transfer: Fast Usage
Go to globus.org

18-20 File Transfer: Fast
(Screenshots of the globus.org web interface showing the transfer workflow; no transcript text.)

21 File Transfer: Fast Endpoints
Local endpoint: csu_cray
  Server: cray2.colostate.edu
  Username: cray_userid
  Password: cray_password
Remote endpoint: CU-Boulder Research Computing
  Server: dtn02.rc.colorado.edu:7512
  Username:
  Password: csu_ePassword,push

22 Directories - Files
/home/eName@colostate.edu: 2 GB, permanent, daily incremental backup
Project space: 250 GB, permanent, daily incremental backup
Scratch (the 1 PB GPFS file system): purged after 90 days, no backup; 20 TB default quota; request a quota increase by e-mail
/scratch/local: 200 GB SSD, local to individual nodes, no backup
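A quick way to see how much of the 2 GB home-directory quota you are using (plain Linux du, nothing Summit-specific; it can be slow on large directory trees):

>du -sh $HOME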

23 Modules
Linux environment modules simplify shell environment & software management
Lmod: hierarchical modules; ml is shorthand for "module" (a short example session follows below)
module list / ml list (show currently loaded modules)
ml avail (show available modules and their dependencies)
ml spider (show all available modules)
ml module_name (load the module)
ml unload module_name (unload the module)
ml module_name2 (loading a module that conflicts with a loaded one swaps them: module_name is unloaded, module_name2 is loaded)
ml help module_name (description of the module)
ml show module_name (show paths etc. for the module)
ml help (general help)
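A short example session (the module name is illustrative; check ml avail for what is actually installed):

>ml gcc          (load the GNU compiler module)
>ml list         (confirm it is loaded)
>ml unload gcc   (unload it again)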

24 Modules
>ml avail
Compilers: gcc, intel (m,D), intel (m), pgi/16.5
Independent Applications: R, cuda (g), gnu_parallel, matlab/R2016b, allinea (m), cuda (g,D), idl, ncl/6.3.0, autotools, cudnn/4.0 (g), jdk, papi/5.4.3, cmake, cudnn/5.1 (g,D), jdk (D), paraview/5.0.1, cube, expat, loadbalance, pdtoolkit/3.22, cube (D), git, mathematica, perl/5.24.0
Where: g = built for GPU; m = built for host and native MIC; D = default module

25 Modules
>ml gcc
>ml avail
MPI Implementations: impi, openmpi (D), openmpi/2.0.1
Compiler-Dependent Applications: antlr, fftw, geos, gsl, jasper, mkl, atlas, gdal, grib_api, hdf5, jpeg/9b, nco
Compilers: gcc/6.1.0 (L), intel (m,D), intel (m), pgi/16.5
Independent Applications: R, cuda (g), gnu_parallel, matlab/R2016b, allinea (m), cuda (g,D), idl, ncl/6.3.0, autotools, cudnn/4.0 (g), jdk, papi/5.4.3, cmake, cudnn/5.1 (g,D), jdk (D), paraview/5.0.1, cube, expat, loadbalance, pdtoolkit/3.22, cube (D), git, mathematica, perl/5.24.0
Where: g = built for GPU; L = module is loaded; m = built for host and native MIC; D = default module

26 Compilers
icc        Intel C compiler
icpc       Intel C++ compiler
ifort      Intel Fortran compiler
gcc        GNU C compiler
g++        GNU C++ compiler
gfortran   GNU Fortran compiler
pgcc       PGI C compiler
pgCC       PGI C++ compiler
pgfortran  PGI Fortran compiler
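A minimal serial build on a compile node, as a sketch (it assumes a hello.c source file; substitute icc or pgcc after loading the matching compiler module):

>ssh scompile
>ml gcc
>gcc -o hello hello.c
>./hello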

27 Compilers - Interpreters
ml intel; ml impi; mpicc     Intel MPI C compiler
ml intel; ml impi; mpicxx    Intel MPI C++ compiler
ml intel; ml impi; mpif90    Intel MPI Fortran compiler
ml openmpi; mpicc            OpenMPI C compiler
ml openmpi; mpicxx           OpenMPI C++ compiler
ml openmpi; mpif90           OpenMPI Fortran compiler
nvcc                         Nvidia CUDA compiler
python                       Python interpreter
perl                         Perl interpreter

28 Debug
GNU gdb debugger (gnu.org); search the web for tutorials, cheat sheets, etc.
>ml gcc
>gcc -o hello -g hello.c
>gdb hello
(gdb) break hello.c:1
(gdb) run
(gdb) step
(gdb) continue
(gdb) quit
Intel, PGI: material for these compilers is being developed - check the website
Valgrind (valgrind.org):
>ml valgrind
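Once the valgrind module is loaded, a typical memory-check run looks like this (a sketch, assuming the hello binary built above):

>valgrind --leak-check=yes ./hello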

29 Performance Analysis
Allinea (allinea.com): >ml allinea
PAPI (icl.cs.utk.edu/papi/overview/): >ml papi
Totalview (roguewave.com/products-services/totalview): >ml totalview
Perfsuite (perfsuite.ncsa.illinois.edu): >ml perfsuite
Tau (cs.uoregon.edu/research/tau/home.php): >ml tau
Material for these tools is being developed - check the website

30 Libraries
Intel MKL (Math Kernel Library): a library of optimized math routines for Intel architectures
BLAS, LAPACK, ScaLAPACK, FFT, vector math, etc.
>ssh scompile
>ml intel
>ml mkl
>icc -o hello hello.c -mkl

31 Interactive Jobs
>ssh scompile
>cd
>srun executable
Hello world!
>srun -n2 executable                      (-n = number of tasks; default is 1 task per node)
>srun -n2 --cpus-per-task=1 executable    (-n = number of tasks; here 2 tasks per node)
>srun -N2 executable                      (-N = number of nodes)
>srun -t 00:01:00 executable              (-t = runtime limit, HH:MM:SS)
>man srun                                 (man pages)

32 Batch Queues
Slurm batch scheduler: slurm.schedmd.com
Cheat sheet: slurm.schedmd.com/pdfs/summary.pdf
All users have access to all partitions / node types: Haswell CPU, Nvidia GPU, Intel KnL-F, HiMem, Condo

33 Batch Queues
Short name   Long name                    Compute node type   Default time   Max time       QoS
shas         summit-haswell (380 nodes)   Haswell CPU         4 hr           24 hr          N,D,C
sgpu         summit-gpu (10 nodes)        Nvidia GPU          4 hr           24 hr          N,D,C
sknl         summit-knl (20 nodes)        Intel KnL-F         4 hr           24 hr          N,D,C
smem         summit-himem (5 nodes)       HiMem               4 hr           168 hr (7 D)   N,D,L,C

34 Batch Queues
QoS          Description                                              Limits
normal (N)   Default                                                  Normal priority
debug (D)    Quick turnaround for testing                             Priority boost
long (L)     For jobs with long runtimes                              Normal priority
condo (C)    For users who purchased compute nodes ("condo model")    Priority boost (= 1 D wait in queue)

35 Batch Queues
Batch job files. Suppose we have a text batch file named "filename":
#!/bin/bash
#SBATCH -J job_name                        #job name
#SBATCH -p shas                            #partition name
#SBATCH --qos debug                        #QoS
#SBATCH -t 01:00:00                        #wall clock time
#SBATCH --nodes 1                          #number of nodes
#SBATCH --mail-type=END                    #send mail at job finish
#SBATCH --mail-user=eName@colostate.edu    #your e-mail address
module load intel                          #load intel module
module load impi                           #load impi libraries
mpicc -o mpic mpic.c                       #compile
mpirun -n 1 ./mpic                         #run

36 Batch Queues
>sbatch filename          (submit the job)
>squeue                   (show job status - all jobs)
>squeue -u username       (show job status for one user only)
>scancel jobid            (cancel a job; get jobid from squeue)
>sinfo                    (show partitions)
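Two more standard Slurm commands that are often useful here (generic Slurm, not Summit-specific):

>scontrol show job jobid   (detailed information about a single job)
>sacct -u username         (accounting history, including completed jobs)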

37 Fairshare Scheduler
Allocation is not a specific number of hours; it is a share of the computer, averaged over a 4-week period
Motivation: several universities are involved, so instead of allocating time, everyone has an equal share of the machine
Helps prevent the system from sitting idle
You cannot run out of allocation time; accounts are never "frozen" or shut down

38 Fairshare Scheduler
FS scheduling uses a complex formula to determine a job's priority in the queue
It examines the load for each user and balances utilization to share resources fairly
It considers each user's historical use, the QoS, and how long the job has been in the queue
Shares are averaged over a 4-week period; current use is compared to the FS target, and job queue priority is adjusted accordingly

39 Fairshare Scheduler
If you are under your target FS -> queue priority is increased
If you are over your target FS -> queue priority is decreased
Only impacts pending jobs in the queue: if there are no other pending jobs and enough resources are available, your job will run regardless of previous usage
Encourages consistent, steady usage; discourages sporadic, "burst-like" usage

40 HiMem Nodes
5 HiMem compute nodes, 2 TB RAM / node
In the Slurm batch script file add (a fuller sketch follows below):
#SBATCH -p smem
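A fuller HiMem job header, as a sketch that combines the line above with the generic batch-script fields from slide 35 (the job name, time limit, and program name are illustrative):

#!/bin/bash
#SBATCH -J himem_job        #illustrative job name
#SBATCH -p smem             #HiMem partition
#SBATCH -t 04:00:00         #illustrative wall clock time
./my_big_memory_program     #placeholder for your executable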

41 MPI
Intel MPI: optimized for Intel microprocessor architectures & the OmniPath interconnect; based on standard MPICH2; supports MPI-3.1; distributed-memory message-passing libraries
Usage:
>ssh scompile
>ml intel
>ml impi
Source code (hello.c):
#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
  int rank, numprocs;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("Hello from pe %d of %d\n", rank, numprocs);
  MPI_Finalize();
}
Compile and run:
>mpicc -o hello hello.c
>mpirun -n 2 ./hello

42 MPI
OpenMPI: an open-source MPI implementation; supports MPI-3.1; distributed-memory message-passing libraries
Usage:
>ssh scompile
>ml gcc
>ml openmpi
Source code (hello.c):
#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
  int rank, numprocs;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("Hello from pe %d of %d\n", rank, numprocs);
  MPI_Finalize();
}
Compile and run:
>mpicc -o hello hello.c
>mpirun -np 2 ./hello

43 OpenMP
OpenMP: an open multithreading API for shared-memory parallelism
Usage:
>ssh scompile
>ml gcc
Source code (hello.c):
#include <omp.h>
#include <stdio.h>
int main() {
  #pragma omp parallel
  {                                   /* each thread executes this block */
    int ID = omp_get_thread_num();
    printf("hello(%d)\n", ID);
  }
  return 0;
}
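To build and try this interactively on a compile node before wrapping it in a batch script (the thread count of 4 is just an example; the batch version on the next slide uses 8):

>gcc -fopenmp -o hello hello.c
>export OMP_NUM_THREADS=4
>./hello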

44 OpenMP
Batch script file ("fname"):
#!/bin/bash
#SBATCH -J openmp
#SBATCH -p shas
#SBATCH --qos debug
#SBATCH -t 00:01:00
export OMP_NUM_THREADS=8
gcc -fopenmp -o hello hello.c
./hello
Submit the job:
>sbatch fname
Output:
hello(1)
hello(0)
hello(2)
hello(3)
hello(4)
hello(5)
hello(6)
hello(7)

45 Condo Model
Researchers purchase:
• CPU compute nodes
• GPU accelerators (if applicable)
• KnL-F accelerators (if applicable; available Q3 2017)
• Memory
• Disk storage
Central IT provides:
• Data center facility
• Shared service nodes (i.e. login nodes)
• Shared OmniPath interconnect switches and cables
• Ethernet management switches and cables
• Shared scratch storage
• Server racks
• Power, cooling, security
• Purchasing, ordering & installing equipment
• OS installation
• System administration
• Assistance with software application installation

46 Condo Model
Condo jobs have the following privileges:
• request longer run times (up to 168 hrs. (7 D))
• get a queue priority boost (= 1 D wait in queue)
• access all nodes
To properly activate Condo shares, Condo users should e-mail the following info: full name, csu_eName, condo group ID
Your csu_eName will be added to the appropriate condo ID

47 Condo Model
PI                   Dept.                    Condo group ID
Michael Antolin      Biology                  bio
Wolfgang Bangerth    Mathematics              mat
Asa Ben-Hur          Computer Science         hal
Stephen Guzik        Mechanical Engineering   cfd
Tony Rappe           Chemistry                akr
Chris Weinberger                              crw
Ander Wilson         Statistics               fhw

48 Condo Model
Example: suppose the following text is in file "filename":
#SBATCH -p shas
#SBATCH --qos condo
#SBATCH -A csu-summit-xxx
#SBATCH -t 40:00:00
"shas" = Haswell CPU compute nodes
"condo" = charges usage to the condo account (required); note the double dash for "qos"
"xxx" = your 3-character condo group ID (required; see the table on the previous slide)
"40:00:00" = wall clock limit in HH:MM:SS
>sbatch filename          (submit the job)

49 Software Installation
Open source: github (github.com), sourceforge (sourceforge.net), other sites
Commercial: license fees (no $$ in CSU IT); license server (FlexLM)
No root account access and no sudo: redirect the install path to a directory you own (see the sketch below)
Package managers etc.: yum, git, pip, rpm, Makefile, cmake, curl
Dependency hell: libraries, compilers, applications, versions; major.minor.patch (semver.org)
Support
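A common user-space install pattern when you have no root access (a sketch; the package and prefix directory names are illustrative, and the steps assume an autotools-style package):

>./configure --prefix=$HOME/software/mypkg     (build into a directory you own)
>make && make install
>export PATH=$HOME/software/mypkg/bin:$PATH    (put the installed binaries on your PATH)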

50 Useful Commands
Check current SU usage:
>sreport -n -t hours -P cluster AccountUtilizationByUser start=<date> tree
Check fairshare usage:
>sshare -U
Output columns: Account, User, RawShares, NormShares, RawUsage, EffectvUsage, FairShare (e.g. for the csu-general account)

51 Support
Trouble tickets
• Submit support requests
System status
• To receive system updates and other announcements, send a message to the announcements mailing list
Documentation
• "Summit System User's Guide" on hpc.colostate.edu
Contacts
• Richard Casey, PhD • (970)
• Tobin Magle, PhD • Data management specialist • (970)
• See the hpc.colostate.edu site for more information

