Getting Started on Topsail
Charles Davis
ITS Research Computing
April 8, 2009
2 Outline
History of Topsail
Structure of Topsail
File Systems on Topsail
Compiling on Topsail
Topsail and LSF
3 Initial Topsail Cluster
Initially a 1040-CPU Dell Linux cluster: 520 dual-socket, single-core nodes.
Infiniband interconnect.
Intended for capability research.
Housed in the ITS Franklin machine room.
Fast and efficient for large computational jobs.
4 Topsail Upgrade 1
Topsail was upgraded to 4,160 CPUs: the blades were replaced with dual-socket, quad-core Intel Xeon 5345 (Clovertown) processors, giving 8 CPUs per node.
More processors, but a lower individual processor speed (was 3.6 GHz, now 2.33 GHz).
Lower energy use and fewer resources needed for the cooling system.
Summary: slower clock speed, better memory bandwidth, less heat.
Benchmarks tend to run at the same speed per core, so with four times as many cores Topsail shows a net ~4X improvement.
Of course, this number is VERY application dependent.
5 Topsail: Upgraded Blades
52 chassis, the basis of node names; each holds 10 blades, giving 520 blades in total.
Node names follow the pattern cmp-chassis#-blade#.
Old compute blades: Dell PowerEdge, single-core Intel Xeon EM64T 3.6 GHz processors, 800 MHz FSB, 2 MB L2 cache per socket, Intel NetBurst microarchitecture.
New compute blades: Dell PowerEdge, quad-core Intel 2.33 GHz processors, 1333 MHz FSB, 4 MB L2 cache per socket, Intel Core 2 microarchitecture.
6 Topsail Upgrade 2
The most recent Topsail upgrade refreshed much of the infrastructure:
Improved IBRIX file system.
Replaced and improved Infiniband cabling.
Moved the cluster to the ITS-Manning building, with better cooling and UPS.
7 Current Topsail Architecture
Login node: Intel EM64T, 12 GB memory.
Compute nodes: 4,160 CPUs, 2.33 GHz Intel EM64T, 12 GB memory.
Shared disk: 39 TB IBRIX parallel file system.
Interconnect: Infiniband 4x SDR.
Operating system: 64-bit Linux.
8 Multi-Core Computing
Processor structure on Topsail:
500+ nodes
2 sockets/node
1 processor/socket
4 cores/processor (quad-core)
8 cores/node
9 Multi-Core Computing
The trend in high performance computing is towards multi-core or many-core computing: more cores at slower clock speeds, for less heat.
Dual- and quad-core processors are now becoming common.
Soon 64+ core processors will be common, and these may be heterogeneous!
10 The Heat Problem Taken From: Jack Dongarra, UT
11 More Parallelism Taken From: Jack Dongarra, UT
12 Infiniband Connections
Connections come in single (SDR), double (DDR), and quad (QDR) data rates; Topsail is SDR.
Single data rate is 2.5 Gbit/s in each direction per link.
Links can be aggregated (1x, 4x, 12x); Topsail is 4x.
Links use 8B/10B encoding (10 bits carry 8 bits of data), so the useful data transmission rate is four-fifths of the raw rate.
Thus a single (1x) link carries 2, 4, or 8 Gbit/s of data at single, double, or quad data rate respectively.
The data rate for Topsail (4x SDR) is therefore 8 Gbit/s, i.e. 1 GB/s, in each direction.
13 Topsail Network Topology
14 Infiniband Benchmarks
Point-to-point (PTP) intranode communication on Topsail for various MPI send types.
Peak bandwidth: 1288 MB/s
Minimum latency (1-way): 3.6 µs
15 Infiniband Benchmarks
Scaled aggregate bandwidth for MPI Broadcast on Topsail.
Note the good scaling throughout the tested range of core counts.
16 Login to Topsail
Use ssh to connect: ssh topsail.unc.edu
From Windows, use SSH Secure Shell.
For interactive programs that use the X Window display:
ssh -X topsail.unc.edu
ssh -Y topsail.unc.edu
Off-campus users (i.e. connecting from domains outside unc.edu) must use a VPN connection.
17 Topsail File Systems
39 TB IBRIX parallel file system, split into home and scratch space:
Home: /ifs1/home/my_onyen
Scratch: /ifs1/scr/my_onyen
Mass storage
Only home is backed up.
18 File System Limits
500 GB total limit per user.
Home: 5 GB limit for backups.
Scratch: no limit except the 500 GB total; not backed up; periodically cleaned; no installed packages/programs.
19 Compiling on Topsail
Modules
Serial programming: Intel Compiler Suite for Fortran 77, Fortran 90, C and C++ (recommended by Research Computing); GNU.
Parallel programming: MPI; OpenMP.
OpenMP must use the Intel Compiler Suite, with the compiler flag -openmp.
OMP_NUM_THREADS must be set in the submission script.
20 Compiling Modules
Module commands:
module – list commands
module avail – list available modules
module add – add a module temporarily
module list – list modules being used
module clear – remove modules temporarily
Add modules permanently using startup files.
21 Available Compilers
Intel: ifort, icc, icpc
GNU: gcc, g++, gfortran
Libraries: BLAS/LAPACK
MPI: mpicc/mpiCC, mpif77/mpif90
The mpi* commands are just wrappers around the Intel or GNU compilers: they add the locations of the MPI libraries and include files, and are provided as a convenience.
22 Test MPI Compile
Copy cpi.c to your scratch directory:
cp /ifs1/scr/cdavis/Topsail/cpi.c /ifs1/scr/my_onyen/.
Add the Intel module:
module load hpc/mvapich-intel
Confirm the Intel module is in use:
which mpicc
Compile the code:
mpicc -o cpi cpi.c
23 MPI/OpenMP Training
Courses are taught throughout the year by Research Computing.
Next courses: MPI (Spring), OpenMP (Spring).
24 Running Programs on Topsail
Upon ssh to Topsail, you are on the login node.
Programs SHOULD NOT be run on the login node.
Submit programs to the compute nodes (4,160 CPUs in total).
Submit jobs using the Load Sharing Facility (LSF).
25 Job Scheduling Systems
A scheduler allocates compute nodes to job submissions based on user priority, requested resources, execution time, etc.
There are many types of schedulers:
Load Sharing Facility (LSF), used by Topsail
IBM LoadLeveler
Portable Batch System (PBS)
Sun Grid Engine (SGE)
26 Load Sharing Facility (LSF)
[Figure: LSF job flow. "bsub app" on the submission host (LIM, Batch API) sends the job to the queue on the master host (MLIM, MBD), which dispatches it to an execution host (SBD, child SBD, LIM, RES) where the user job runs; load information is exchanged with other hosts.]
LIM – Load Information Manager
MLIM – Master LIM
MBD – Master Batch Daemon
SBD – Slave Batch Daemon
RES – Remote Execution Server
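27 Submitting a Job to LSF
For a compiled MPI job:
bsub -n <num_cpus> -o out.%J -e err.%J -a mvapich mpirun ./mycode
(where <num_cpus> is the number of CPUs requested)
bsub: LSF command that submits a job to the compute nodes
bsub -o and bsub -e: job output and errors are saved to files in the submission directory (%J is replaced by the job ID)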
28 Queue System on Topsail
Topsail uses queues to distribute jobs.
Specify the queue with -q in bsub: bsub -q week …
No -q specified = default queue (week).
Queues vary depending on the size and required time of jobs.
See a listing of queues: bqueues
29 Topsail Queues
Queue | Time Limit | Jobs/User | CPU Range
int | 2 hrs | |
debug | 2 hrs | |
day | 24 hrs | 1024 | 4 – 1024
week | 1 week | 512 | 4 – 256
month | 1 month | 128 | 4 –
cpu | 4 days | | –
cpu | 4 days | 512 | 32 –
cpu | 2 days | 1024 | 4 – 32
chunk | 4 days | 512 | Batch Jobs
30 Submission Scripts
It is easier to write a submission script that can be edited for each job submission.
Example script file, run.hpl:
#BSUB -n <num_cpus>
#BSUB -e err.%J
#BSUB -o out.%J
#BSUB -a mvapich
mpirun ./mycode
Submit with: bsub < run.hpl
31 More bsub options
bsub -x: DO NOT USE ANY MORE!! (exclusive use of a node)
bsub -n 4 -R "span[ptile=4]": forces all 4 processors to be on the same node; similar to -x
bsub -J job_name: assigns a name to the job
See the man pages for a complete description: man bsub
32 Performance Test
Gromacs MD simulation of bulk water.
Simulation setups:
Case 1: -n 8 -R span[ptile=1] (8 processes spread over 8 nodes)
Case 2: -n 8 -R span[ptile=8] (all 8 processes on 1 node)
Simulation times (1 ns MD):
Case 1: 1445 sec
Case 2: 1255 sec
Using only 1 node reduced the run time by about 13%.
33 Following a Job After Submission
bjobs, bjobs -l JobID: shows the current status of a job
bhist, bhist -l JobID: more detailed information regarding the job history
bkill, bkill -r JobID: ends a job prematurely
34 Submit Test MPI Job
Submit the test MPI program on Topsail:
bsub -q week -n 4 -o out.%J -e err.%J -a mvapich mpirun ./cpi
Follow the submission: bjobs
Output is stored in the out.%J file (%J is the job ID).
35 Pre-Compiled Programs on Topsail
Some applications are precompiled for all users in /ifs1/apps: Amber, Gaussian, Gromacs, NetCDF, NWChem.
Add a module to your path using the module commands:
module avail – shows available applications
module add – adds a specific application
Once the module command is used, the executable is added to your path.
36 Test Gaussian Job on Topsail
Add the Gaussian application to your path:
module add apps/gaussian-03e01
module list
Copy the input .com file:
cp /ifs1/scr/cdavis/water.com .
Check that the executable has been added to the path:
echo $PATH
Submit the job:
bsub -q week -n 4 -e err.%J -o out.%J g03 water.com
37 Common Error 1
If the job dies immediately, check the err.%J file.
The err.%J file shows the error: Can't read MPIRUN_HOST
Problem: the MPI environment settings were not correctly applied on the compute node.
Solution: include mpirun in the bsub command.
38 Common Error 2
The job dies immediately after submission, and the err.%J file is blank.
Problem: ssh passwords and keys were not set up correctly at the initial login to Topsail.
Solution:
cd ~/.ssh/
mv id_rsa id_rsa-orig
mv id_rsa.pub id_rsa.pub-orig
Log out of Topsail, then log in to Topsail again and accept all defaults.
39 Interactive Jobs
To run long shell scripts on Topsail, use the int queue:
bsub -q int -Ip /bin/bash
This bsub command provides a prompt on a compute node, from which you can run a program or shell script interactively.
The TotalView debugger can also be run interactively on Topsail.
40 Further Help with Topsail
More details about using Topsail can be found in the Getting Started on Topsail help document.
For assistance with Topsail, please contact the ITS Research Computing group.
For immediate help, see the manual pages on Topsail: man <command>