Advanced Computing Facility Introduction


1 Advanced Computing Facility Introduction

2 Overview
The Advanced Computing Facility (ACF) houses High Performance Computing (HPC) resources dedicated to scientific research:
458 nodes, 8,568 processing cores, and 49.78 TB of memory
20 nodes have over 500 GB of memory per node
13 nodes have 64 AMD cores per node and 109 nodes have 24 Intel cores per node
Coprocessors: Nvidia K80: 52, Nvidia K40C: 2, Nvidia K40m: 4, Nvidia K20m: 2, Nvidia M2070: 1
Virtual machine operating system: Linux

3 Cluster Usage Website

4 Useful Links
ACF Cluster computing resources
Advanced Computing Facility (ACF) documentation main page
Cluster Jobs Submission Guide
Advanced guide
ACF Portal Website
Cluster Usage Website

5 ACF Portal Website

6 ACF Portal Website
Monitor jobs
View cluster loads
Download files
Upload files
...

7 Access Cluster System via Linux Terminal
Access the cluster from Nichols Hall
1. Log in to a login server (login1 or login2).
2. Submit cluster jobs or start an interactive session from the login server. The cluster will create a virtual machine to run your job or to host your interactive session.
Access the cluster from off campus
Connect to the KU Anywhere VPN first, then log in to the login1 or login2 server.

8 Access Cluster System via Linux Terminal
Log in to a login server
Use "ssh" to connect directly to the cluster login servers login1 or login2. Examples:
  ssh login1                  # log in with your default Linux account
  ssh -X login1               # "-X" enables X11 forwarding
  ssh <username>@login1       # log in with a different Linux account
  ssh -X <username>@login1
The login servers are only an entry point to the cluster and cannot support computationally intensive tasks.
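If you connect often, an entry in your personal SSH configuration can shorten these commands. A minimal sketch, assuming the OpenSSH client; the alias "acf" and the username are placeholders, and the host name should be whatever the ACF documentation lists for login1:

  # ~/.ssh/config (hypothetical entry)
  Host acf
      HostName login1        # replace with the full login server host name if needed
      User <username>
      ForwardX11 yes         # same effect as "ssh -X"

With this in place, "ssh acf" logs in with X11 forwarding enabled.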

9 Access Cluster System via Linux Terminal
Submit a cluster job
Run "qsub" on a login server to submit your job script. A job script contains PBS parameters in the top portion and the commands to run in the bottom portion; PBS parameters (lines beginning with #PBS) describe the settings of the job.
Basic example, a "script.sh" file submitted with "qsub script.sh":
  #!/bin/bash
  #
  #PBS -N JobName
  #PBS -l nodes=2:ppn=4,mem=8000m,walltime=24:00:00
  #PBS -M <your email address>
  #PBS -m abe
  echo Hello World!
PBS parameters can also be passed as "qsub" arguments:
  qsub -l nodes=2:ppn=4,mem=8000m,walltime=24:00:00 <yourscript>
(Virtual machine with 2 nodes, 4 cores per node, and 8 GB of memory)
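After submission, "qsub" prints a job identifier that can be used to track or cancel the job. A short sketch (the job ID shown is made up):

  qsub script.sh        # prints something like 123456.<server>
  qdel 123456           # cancel the job if it is no longer needed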

10 Access Cluster System via Linux Terminal
Start an interactive session on the cluster
Basic command:
  qlogin    (equivalent to "qsub -I -q interactive -l nodes=1:ppn=1")
(Interactive session virtual machine with 1 node, 1 core per node, and 2 GB of memory)
Advanced command:
Run "qsub" to submit an interactive job. Example:
  qsub -I -q interactive -l nodes=3:ppn=4,mem=8000m
(Interactive session virtual machine with 3 nodes, 4 cores per node, and 8 GB of memory)
Further reading
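Interactive jobs accept the same resource options as batch jobs, so a wall-clock limit can be requested as well; the values below are only illustrative:

  qsub -I -q interactive -l nodes=1:ppn=2,mem=4000m,walltime=02:00:00
  # ... work interactively on the virtual machine ...
  exit          # ends the session and releases the resources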

11 Monitoring Job
Run the following commands from a login server:
  qstat -n1u <username>    or    qstat -nu <username>

12 Application Support
All installed applications can be found in /tools/cluster/6.2/
Manage software-specific environment variables in one of two ways:
Run the script "env-selector-menu" to select your combined environment variables. This creates a file in your home directory called ".env-selector" containing the selections; remove this file to clear the selections.
or
Run "module load {module_name}" to load the environment variables for a specific piece of software in the current shell.
Example: module load cuda/7.5 caffe/1.0rc3 (loads the environment variables for CUDA 7.5 and Caffe 1.0rc3)
Find available modules: run "module avail" or look in the folder /tools/cluster/6.2/modules
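A few other subcommands of the module tool are worth knowing; a short sketch, reusing the module names from the example above:

  module list                   # show the modules currently loaded in this shell
  module unload caffe/1.0rc3    # remove a single module from the environment
  module purge                  # remove everything loaded with "module load"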

13 Rules for Job-Related Cluster Folders
Folders writable without asking an administrator for permission:
~/ : the most heavily used filesystem on the cluster and throughout ITTC. When running cluster jobs, you may use ~/ for your compiled programs and cluster job organization, but it is important to store and access data on other filesystems.
/tmp : each node has local storage space that is freely accessible in /tmp. It is often useful to write output from cluster jobs to the local disk, archive the results, and copy the archive to another cluster filesystem (see the sketch after this list).
Folders writable only with an administrator's permission:
/data : best suited for storing large data sets. The intended use case for /data is files that are written once and read many times.
/work : best suited for recording output from cluster jobs. If a researcher has a batch of cluster jobs that will generate large amounts of output, space will be assigned in /work.
/projects : used for organizing group collaborations.
/scratch : the only cluster filesystem that is not backed up. It is used for storing data temporarily during processing on the cluster. Exceptionally large data sets, or large amounts of cluster job output, may pose difficulty for the storage backup system and are stored in /scratch during processing.
/library : read-only space for researchers who need copies of data on each node of the cluster. Contact the cluster administrators to ask for data sets to be copied to /library.
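A minimal sketch of the /tmp workflow mentioned above, placed at the end of a job script; the directory and destination names are placeholders:

  OUTDIR=/tmp/${PBS_JOBID}                     # per-job directory on the node-local disk
  mkdir -p "$OUTDIR"
  # ... the job writes its output into $OUTDIR ...
  tar czf /tmp/${PBS_JOBID}.tar.gz -C "$OUTDIR" .
  cp /tmp/${PBS_JOBID}.tar.gz ~/results/       # or a /work directory, if one has been assigned
  rm -rf "$OUTDIR" /tmp/${PBS_JOBID}.tar.gz    # clean up the local disk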

14 Useful GUI Software in the Cluster System
matlab : technical computing
nautilus : file explorer
gedit : text editor
nsight : IDE for debugging C++ and CUDA code. You must apply for a GPU virtual machine, and the CUDA module must be loaded before running nsight: module load cuda/7.5

15 Installed Deep Learning Software in Cluster
Caffe: GPU version only
  module load cuda/7.5 caffe/1.0rc3
  Input layer: only the 'hdf5' file format is supported
TensorFlow: both GPU and CPU versions
  Example: module load tensorflow/0.8_cpu
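After loading a module it can be useful to confirm that the package is visible to Python; a quick check, assuming the tensorflow module places a suitable python on the PATH:

  module load tensorflow/0.8_cpu
  python -c "import tensorflow as tf; print(tf.__version__)"    # should print 0.8.x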

16 Interactive GUI Example
Matlab
  ssh -X login1
  qsub -X -I -q interactive -l nodes=2:ppn=4,mem=8000m
  (Starting an interactive virtual machine with 2 nodes, 4 cores per node, and 8 GB of memory)
  matlab &
Nsight
  qsub -X -I -q gpu -l nodes=1:k40:ppn=4:gpus=2,mem=8000m
  (Starting an interactive virtual machine with 1 node, 4 cores per node, 2 K40 GPUs, and 8 GB of memory)
  module load cuda/7.5
  nsight &
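MATLAB can also run without a display inside an ordinary batch job; a minimal sketch, where the script name my_analysis.m and the resource values are placeholders:

  #!/bin/bash
  #PBS -N matlab_batch
  #PBS -l nodes=1:ppn=4,mem=8000m,walltime=04:00:00
  cd ${PBS_O_WORKDIR}
  matlab -nodisplay -nosplash -r "my_analysis; exit"    # run the script headless, then quit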

17 Example: Running Matlab

18 Example: Running Matlab

19 Example: Running Matlab

20 Example: Running Matlab

21 Example: Running Matlab

22 Caffe 'qsub' script Example

  #!/bin/bash
  #
  # This is an example script
  # These commands set up the cluster environment for your job:
  #PBS -S /bin/bash
  #PBS -N mnist_train_test1
  #PBS -q gpu
  #PBS -l nodes=1:ppn=1:k40,gpus=1
  #PBS -M <your email address>
  #PBS -m abe
  #PBS -d ~/mnist/scripts
  #PBS -e ~/mnist/logs/${PBS_JOBNAME}-${PBS_JOBID}.err
  #PBS -o ~/mnist/logs/${PBS_JOBNAME}-${PBS_JOBID}.out

  # Load modules
  module load cuda/7.5 caffe/1.0rc3

  # Save job-specific information for troubleshooting
  echo "Job ID is ${PBS_JOBID}"
  echo "Running on host $(hostname)"
  echo "Working directory is ${PBS_O_WORKDIR}"
  echo "The following processors are allocated to this job:"
  echo $(cat $PBS_NODEFILE)

  # Run the program
  echo "Start: $(date +%F_%T)"
  source ${PBS_O_WORKDIR}/train_lenet_hdf5.sh
  echo "Stop: $(date +%F_%T)"

Full example: mnist.tar.gz
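Submitting this script and following its output might look like the following; the file name caffe_mnist.pbs is a placeholder, and the log paths come from the #PBS -e/-o lines above:

  qsub caffe_mnist.pbs                             # returns the job ID
  qstat -n1u $USER                                 # confirm the job is queued or running
  tail -f ~/mnist/logs/mnist_train_test1-*.out     # follow the job's standard output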

23 ACF Virtual Machine vs. Desktop
ACF virtual machine: many software packages are installed in /tools/cluster/6.2. You must add the corresponding paths to the shell environment variables manually, or use "env-selector-menu" or the module loader to set them for you.
Desktop: software is installed in /usr/bin and /usr/lib, and these folders are included in the search path by default.
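Setting the variables by hand for a package under /tools/cluster/6.2 might look like the following; the directory name "somepkg" is made up, and in practice env-selector-menu or "module load" does this for you:

  export PATH=/tools/cluster/6.2/somepkg/bin:$PATH
  export LD_LIBRARY_PATH=/tools/cluster/6.2/somepkg/lib:$LD_LIBRARY_PATH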

24 Thank you !

