Overview of CHPC
Anita M. Orendt, Center for High Performance Computing
Spring 2014
Overview
- CHPC Services
- HPC Clusters
- Access and Security
- Batch (PBS and Moab)
- Allocations
- Getting Help
CHPC Services
- HPC clusters (focus of this talk)
- Advanced network services
- Specialized research services (statistics, database support, ...)
- Research consulting services
- Campus infrastructure support
Downtown Data Center
- Came online spring 2012
- Shared with enterprise (academic/hospital) groups (wall between rooms)
- Low PUE / power costs
- 92 racks for research computing
- 1.2 MW of power, with an upgrade path to add capacity for research computing
- Metro optical ring connecting campus, data center, and Internet2
- 24/7/365 facility
CHPC HPC Infrastructure (diagram)
- Ember: 383 nodes / 4596 cores, InfiniBand and GigE; 12 GPU nodes
- Kingspeak: 159 nodes / 2688 cores, InfiniBand and GigE
- Other systems: Apex Arch, Turret Arch, Great Wall, Telluride cluster, meteoXX/atmosXX nodes
- Storage: /scratch/ibrix (PFS), /scratch/kingspeak/serial (NFS), NFS home and group directories
- Administrative nodes, all connected through a common switch
Ember
- 383 dual six-core nodes (4596 cores), Intel Xeon 2.8 GHz (Westmere)
  - 253 owned by Prof. Phil Smith; these will become ash.chpc.utah.edu at the end of the month
  - 67 general CHPC nodes
  - 63 other owner nodes
- 12 GPU nodes (6 general, 6 owner): dual six-core, each with 2 M2090 GPU cards; access upon request
- Mostly 2 GB memory per core
- Interactive node: ssh -X ember.chpc.utah.edu
Kingspeak
- 123 dual eight-core nodes, Intel Xeon 2.6 GHz (Sandy Bridge), and 36 dual ten-core nodes (Ivy Bridge)
  - 32 general CHPC 16-core nodes
  - 4 general CHPC 20-core nodes
  - Rest are owner nodes
- 64 GB memory/node (on CHPC nodes)
- Interactive node: ssh -X kingspeak.chpc.utah.edu
Special Purpose Clusters/Servers
- apexarch: HIPAA compliant; 15 nodes
- kachina: Windows node with 48 cores, 512 GB memory, 1 TB drive
- swasey: HIPAA version of kachina
- greatwall: 2 nodes (2.8 GHz / 48 GB) and 2 nodes (2.2 GHz / 32 GB)
- UCS cluster: 16 nodes (8 with 12 cores / 96 GB RAM and 8 with 20 cores / 256 GB RAM)
File Systems
- Home directory space, NFS mounted on all HPC platforms: /uufs/chpc.utah.edu/common/home/<uNID>
  - Either CHPC-provided (50 GB/user, not backed up) or in PI-purchased home space (backed up)
- Group file systems (NFS mounted)
  - Most are /uufs/chpc.utah.edu/common/home/<PI-groupX>
  - Periodic archive option available
- Scratch space for HPC systems
  - Files older than 30 days are eligible for removal
  - /scratch/parallel is going away; it may come back as something else
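A quick way to see where your home directory is mounted and how much space it uses (a minimal sketch using standard Linux commands; the exact mount name depends on whether you are in CHPC-provided or PI-purchased space):

df -h ~      # shows the NFS file system backing your home directory and its free space
du -sh ~     # totals your current usage (can be slow on large directory trees)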
Scratch File Systems
- Shared NFS mounted: /scratch/kingspeak/serial (175 TB)
- Shared PFS: /scratch/ibrix/chpc_gen (56 TB)
- Both available from ember and kingspeak
- Also /scratch/local - local hard drive on each node
  - Ember: 414+ GB
  - Kingspeak: 337+ GB
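A minimal sketch of staging work on the node-local /scratch/local disk inside a batch job; the directory layout, input/output names, and my_program are illustrative, not CHPC-prescribed:

# Stage input on the node-local drive for I/O-heavy serial work
mkdir -p /scratch/local/$USER/$PBS_JOBID
cp $HOME/work_dir/input.dat /scratch/local/$USER/$PBS_JOBID
cd /scratch/local/$USER/$PBS_JOBID
./my_program input.dat     # hypothetical executable
# Copy results home and clean up before the job ends
cp output.dat $HOME/work_dir/ && rm -rf /scratch/local/$USER/$PBS_JOBID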
Utilization & Performance Graphs
- Used to check the current and historical status of the clusters, file systems, and network
Getting an Account
- Application procedure
- CHPC uses the campus uNID and password
- Passwords are maintained by the campus information system
- This will be changing to web-based maintenance
Logins
- Access to interactive nodes:
  ssh (-X) ember.chpc.utah.edu
  ssh (-X) kingspeak.chpc.utah.edu
- The -X is needed for any X forwarding
- No (direct) access to compute nodes
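A quick check that X forwarding is working after logging in (xclock is just a convenient test client and may not be installed on every system):

ssh -X ember.chpc.utah.edu
echo $DISPLAY     # should print a forwarded display such as localhost:10.0
xclock &          # a small clock window should appear on your local screen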
Security Policies
- No clear-text passwords; use ssh and scp
- You may not share your account under any circumstances
- Don't leave your terminal unattended while logged into your account
- Do not introduce classified or sensitive work onto CHPC systems
- Use a good password and protect it
Security Policies (2)
- Do not try to break passwords, tamper with files, etc.
- Do not distribute or copy privileged data or software
- Report suspicions to CHPC
- Please see the CHPC website for more details
Getting Started
- Getting started guide
- Environment scripts
- New accounts now get these copied into the home directory automatically
Environment
- All new accounts are provided with login or "dot" files
- You will be asked to update these files on occasion
- Also available on the web and can be fetched with wget
Environment (2)
- Running the CHPC environment script sets your path correctly for that cluster using the $UUFSCELL environment variable
- If you are logged into one of the ember nodes:
  - The path /uufs/$UUFSCELL/sys/bin is added to your PATH environment variable
  - $UUFSCELL will be set to ember.arches
  - showq will show you the queues for ember
- If on kingspeak, $UUFSCELL is kingspeak.peaks
- Demo
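A quick way to see what the environment script has set up on the cluster you are logged into (a minimal check with standard shell commands; the exact PATH entries depend on the cluster):

echo $UUFSCELL                         # e.g. ember.arches or kingspeak.peaks
echo $PATH | tr ':' '\n' | grep uufs   # cluster-specific directories added to PATH
which showq                            # confirms the batch tools are on your path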
Batch System
- Two components:
  - Torque (OpenPBS) resource manager
  - Moab (Maui) scheduler
- Required for all significant work
- Interactive nodes are only for short compiles, editing, and very short test runs (no more than 15 minutes!)
Torque
- Build of OpenPBS; the resource manager
- Main commands:
  - qsub <script>
  - qstat
  - qdel <jobid>
- Used with a scheduler
- Demo
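A minimal example session with these commands (the script name and job ID are illustrative):

qsub myjob.pbs     # submit the script; prints a job ID such as 123456.<server>
qstat -u $USER     # list your jobs and their states (Q = queued, R = running)
qdel 123456        # delete the job if needed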
Simple Batch Script

#PBS -S /bin/bash
#PBS -l nodes=2:ppn=12,walltime=1:00:00
#PBS -A mygroup
#PBS -M <email address>
#PBS -N myjob
# Create scratch directory
mkdir -p /scratch/ibrix/chpc_gen/$USER/$PBS_JOBID
# Change to working directory
cd /scratch/ibrix/chpc_gen/$USER/$PBS_JOBID
# Copy data files to scratch directory
cp $HOME/work_dir/files /scratch/ibrix/chpc_gen/$USER/$PBS_JOBID
# Execute job
source /uufs/ember.arches/sys/pkg/mvapich2/1.9i/etc/mvapich2.sh
/uufs/ember.arches/sys/pkg/mvapich2/1.9i/bin/mpirun -np 24 -hostfile $PBS_NODEFILE ./hello_24
# Copy files back home and clean up
cp * $HOME/work_dir && rm -rf /scratch/ibrix/chpc_gen/$USER/$PBS_JOBID

One time, you must create the /scratch/serial/$USER directory.
Moab Scheduler
- Used on all HPC systems
- Enforces policies
- Sets priorities based on qos and on whether you are out of allocation ("freecycle")
- Optimizes throughput on the system
Moab Commands
- Main commands:
  - showq, with -i (idle jobs), -r (running), -b (blocked)
  - showstart <jobnumber>
  - checkjob <jobnumber>
- Demo
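For example (the job number is illustrative):

showq -r            # jobs currently running on the cluster
showq -i            # idle (eligible) jobs, in priority order
showstart 123456    # scheduler's estimate of when job 123456 will start
checkjob 123456     # detailed state, requested resources, and any holds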
Batch Policies - Ember
- QOS and time limits (CHPC nodes):
  - qos=general - 72 hours wallclock limit (preemptor)
  - qos=long - 14 days wallclock limit (preemptor); need permission; maxproc=24
  - qos=freecycle - 72 hours wallclock limit (preemptee)
  - qos=general-gpu - 72 hours wallclock limit; need permission
- Dr. Smith's 253 nodes are available to non-group members:
  - qos=smithp-guest - 24 hours wallclock time (preemptee)
  - Running on these nodes does not affect your ember allocation
  - Use #PBS -A smithp-guest
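With Torque and Moab a QOS is commonly requested through the resource list, but the exact directive CHPC expects is not shown on this slide, so treat the lines below as an illustrative sketch rather than documented syntax:

#PBS -l nodes=1:ppn=12,walltime=240:00:00
#PBS -l qos=long     # assumed Moab resource-list form for selecting a QOS
#PBS -A mygroup      # account, as in the sample batch script above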
Batch Policies - Kingspeak
- QOS and time limits (CHPC nodes):
  - qos=general - 72 hours wallclock limit (preemptor)
  - qos=freecycle - 72 hours wallclock limit (preemptee)
  - qos=long - 14 days wallclock limit (preemptor); need permission; maxproc=2
Troubleshooting
- Job won't start:
  - showq -i
  - mdiag -p
  - mdiag [-v] -j <jobnumber>
  - qstat -f <jobnumber>
  - qstat -n (on a running job, to get its nodes)
  - showstart <jobnumber>
  - mshow -a -w [qos=general]
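A typical sequence when a queued job does not start (the job number is illustrative):

showq -i | grep $USER      # is the job eligible, and where does it sit in priority order?
checkjob -v 123456         # look for holds, policy violations, or unsatisfiable requests
mdiag -p                   # per-job priority breakdown
mshow -a -w qos=general    # resources currently available to qos=general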
Software on the Clusters
- For use on ALL clusters: /uufs/chpc.utah.edu/sys/pkg/$package/$version
  e.g. /uufs/chpc.utah.edu/sys/pkg/matlab/R2012a/bin/matlab
- Cluster-specific builds: /uufs/<cluster cell>/sys/pkg/$package/$version
  e.g. /uufs/ember.arches/sys/pkg/openmpi/1.6.5i/bin/mpicc
       /uufs/kingspeak.peaks/sys/pkg/openmpi/1.6.5i/bin/mpicc
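Combined with $UUFSCELL from the Environment slide, a build can pick up the matching MPI wrapper on whichever cluster you are logged into (a sketch; hello.c is an illustrative source file):

# $UUFSCELL expands to ember.arches on ember and kingspeak.peaks on kingspeak
/uufs/$UUFSCELL/sys/pkg/openmpi/1.6.5i/bin/mpicc -o hello hello.c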
Software Support
- CHPC staff will either do software installations or help you do them
- For any questions, make a request (see the Getting Help slide)
- Licensing costs: CHPC pays for some widely used packages; other packages are paid for by PIs; sometimes costs are shared
- Usage can be locked down to a group if needed
Allocations
- General allocation pool
  - Ember: 1 SU/core-hour => 1 hour on one Ember node (12 cores) = 12 SUs
  - Kingspeak: 1.5 SU/core-hour => 1 hour on one 16-core Kingspeak node = 24 SUs
- Check current balance/usage by PI or user
- Allocation policies
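As a worked example (the job size is illustrative): a 2-node, 24-hour job on Ember uses 2 nodes x 12 cores x 24 hours x 1 SU/core-hour = 576 SUs, while the same 2-node, 24-hour job on 16-core Kingspeak nodes uses 2 x 16 x 24 x 1.5 = 1152 SUs.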
Regular Allocations
- One general allocation pool for ember & kingspeak
- Unused allocations are not carried forward to the next quarter
- Requests for up to 4 quarters at a time are due:
  - September 1st (for Oct-Dec period)
  - December 1st (for Jan-Mar period)
  - March 1st (for Apr-Jun period)
  - June 1st (for Jul-Sep period)
- Next allocations are due March 1st
- Allocation requests should be completed online
Quick Allocations
- Quick allocations of up to 5000 SUs may be provided, for the current quarter only, to PIs that do not have an allocation
- Can be requested any time during a quarter
- The review process has a 1-2 business day turnaround
Getting Help
- http://www.chpc.utah.edu
- Please do not contact staff members directly; you may cc them
- Help Desk: 405 INSCC (9-5, M-F)
- Demo: the CHPC web page and how to get around it; the Jira issue-tracking system
Upcoming Presentations
- Feb 11: Introductory Linux for HPC, Part 1
- Feb 13: Introductory Linux for HPC, Part 2
- Feb 18: Intro to Parallel Computing
- Feb 20: Protected Environment, AI and NLP Services (**HSEB 2908 at 2:00 PM)
- Feb 25: Using Gaussian09 and GaussView at CHPC
- Feb 27: Intro to GPU Programming
- Mar 4: Python for Scientific Computing
All presentations are at 1:00 PM in the INSCC Auditorium, with the exception noted above.