Advanced Computing Facility Introduction


Overview
The Advanced Computing Facility (ACF) houses High Performance Computing (HPC) resources dedicated to scientific research:
458 nodes, 8,568 processing cores, and 49.78 TB of memory
20 nodes have over 500 GB of memory each
13 nodes have 64 AMD cores per node; 109 nodes have 24 Intel cores per node
Coprocessors: Nvidia K80: 52, Nvidia K40c: 2, Nvidia K40m: 4, Nvidia K20m: 2, Nvidia M2070: 1
Virtual machine operating system: Linux
http://ittc.ku.edu/cluster/acf_cluster_hardware.html

Cluster Usage Website http://ganglia.acf.ku.edu/

Useful Links
ACF cluster computing resources: http://ittc.ku.edu/cluster/acf_cluster_hardware.html
Advanced Computing Facility (ACF) documentation main page: https://acf.ku.edu/wiki/index.php/Main_Page
Cluster Jobs Submission Guide: https://acf.ku.edu/wiki/index.php/Cluster_Jobs_Submission_Guide
Advanced guide (Torque resource manager documentation): http://www.adaptivecomputing.com/support/documentation-index/torque-resource-manager-documentation/
ACF portal website: http://portal.acf.ku.edu/
Cluster usage website: http://ganglia.acf.ku.edu/

ACF Portal Website http://portal.acf.ku.edu/

ACF Portal Website
From the portal you can monitor jobs, view cluster loads, download files, upload files, and more.

Access Cluster System via Linux Terminal
Access the cluster from Nichols Hall:
1. Log in to a login server (login1 or login2).
2. Submit cluster jobs or start an interactive session from the login server. The cluster will create a virtual machine to run your job or host your interactive session.
Access the cluster from off campus:
Connect to the KU Anywhere VPN first: http://technology.ku.edu/software/ku-anywhere-0

Access Cluster System via Linux Terminal
Log in to a login server
Use "ssh" to connect directly to the cluster login servers login1 or login2. Examples:
ssh login1                  # log in with your default Linux account
ssh -X login1               # "-X" connects with X11 forwarding enabled
ssh <username>@login1       # log in with a different Linux account
ssh -X <username>@login1
The login servers are only an entry point to the cluster and cannot support computationally intensive tasks.
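Optionally, you can set up key-based login so repeated connections do not prompt for a password. This is a minimal sketch using standard OpenSSH tools, assuming ssh-keygen and ssh-copy-id are available on your local machine and that password logins to login1 are permitted:
ssh-keygen -t rsa                 # create a key pair; accept the defaults, optionally set a passphrase
ssh-copy-id <username>@login1     # copy your public key to the login server (asks for your password once)
ssh login1                        # later logins use the key instead of a password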

Access Cluster System via Linux Terminal
Submit a cluster job
Run "qsub" on a login server to submit your job script.
A job script contains PBS parameters in the top portion and the commands to run in the bottom portion. PBS parameters (lines beginning with #PBS) describe the resources and settings of the job.
Basic example, a file "script.sh" submitted with: qsub script.sh
#!/bin/bash
#
#PBS -N JobName
#PBS -l nodes=2:ppn=4,mem=8000m,walltime=24:00:00
#PBS -M user1@ittc.ku.edu,user2@ittc.ku.edu
#PBS -m abe
echo Hello World!
PBS parameters can also be passed as "qsub" arguments:
qsub -l nodes=2:ppn=4,mem=8000m,walltime=24:00:00 <yourscript>
(Virtual machine with 2 nodes, 4 cores per node, and 8 GB memory)
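With Torque's default settings, the job's standard output and error are written to files named after the job in the directory where you ran qsub once the job finishes. A short sketch of the submit-and-check cycle; the job number 12345 is illustrative:
qsub script.sh                # prints the job ID, e.g. 12345.<server>
qstat -u <username>           # watch the job move from Q (queued) to R (running) to C (completed)
cat JobName.o12345            # standard output, here containing "Hello World!"
cat JobName.e12345            # standard error (empty if nothing went wrong)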

Access Cluster System via Linux Terminal
Start an interactive session on the cluster
Basic command:
qlogin   (equivalent to "qsub -I -q interactive -l nodes=1:ppn=1")
(Interactive-session virtual machine with 1 node, 1 core per node, and 2 GB memory)
Advanced command:
Run "qsub" to submit an interactive job. Example:
qsub -I -q interactive -l nodes=3:ppn=4,mem=8000m
(Interactive-session virtual machine with 3 nodes, 4 cores per node, and 8 GB memory)
Further reading:
https://acf.ku.edu/wiki/index.php/Cluster_Jobs_Submission_Guide
http://www.adaptivecomputing.com/support/documentation-index/torque-resource-manager-documentation/
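Once the interactive session starts you are placed on an allocated compute node. A quick sanity check of what was granted, assuming the standard Torque job environment variables:
hostname                 # which compute node the session is running on
echo $PBS_JOBID          # job ID of the interactive session
cat $PBS_NODEFILE        # list of node slots allocated to the session
exit                     # end the session and release the resources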

Monitoring Jobs
Run one of the following commands from a login server:
qstat -n1u <username>
qstat -nu <username>
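A few related standard Torque client commands can also help once a job is queued; the job ID 12345 below is illustrative:
qstat -f 12345        # full details of a specific job
qstat -q              # list the queues and their limits
qdel 12345            # cancel a queued or running job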

Application Support
All installed applications can be found in /tools/cluster/6.2/
Manage software-specific environment variables in one of two ways:
1. Run the script "env-selector-menu" to select your combined set of environment variables. This creates a file in your home directory called ".env-selector" containing the selections; remove this file to clear the selections.
2. Run "module load {module_name}" to load the environment variables for a specific package in the current shell.
Example: module load cuda/7.5 caffe/1.0rc3
(loads environment variables for CUDA 7.5 and Caffe 1.0rc3)
Find available modules: run "module avail" or look in the folder /tools/cluster/6.2/modules
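The other standard Environment Modules commands work the same way; a short sketch, where any module names not listed by "module avail" on the cluster are only examples:
module avail                  # list all modules the cluster provides
module list                   # show modules loaded in the current shell
module unload caffe/1.0rc3    # drop a single module
module purge                  # clear every loaded module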

Rules for Job-Related Cluster Folders
Folders writable without asking an administrator for permission:
~/ : the most heavily used filesystem on the cluster and throughout ITTC. When running cluster jobs, you may use ~/ for your compiled programs and cluster job organization, but it is important to store and access data on other filesystems.
/tmp : each node has local storage that is freely accessible in /tmp. It is often useful to write output from cluster jobs to this local disk, archive the results, and copy the archive to another cluster filesystem (a short sketch of this workflow follows the list).
Folders writable only with an administrator's permission:
/data : best suited for storing large data sets. The intended use case for /data is files that are written once and read many times.
/work : best suited for recording output from cluster jobs. If a researcher has a batch of cluster jobs that will generate large amounts of output, space will be assigned in /work.
/projects : used for organizing group collaborations.
/scratch : the only cluster filesystem that is not backed up. This space is used for storing data temporarily during processing on the cluster. Exceptionally large data sets or large amounts of cluster job output may pose difficulty for the storage backup system and are stored in /scratch during processing.
/library : read-only space for researchers who need copies of data on each node of the cluster. Email clusterhelp@acf.ku.edu to ask for data sets to be copied to /library.
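A minimal sketch of that /tmp workflow inside a job script. The directory layout and archive name are illustrative, and the archive is copied to ~/ here only for simplicity; large output belongs in /work or /scratch per the rules above:
# work in node-local storage while the job runs
WORKDIR=/tmp/${PBS_JOBID}
mkdir -p "$WORKDIR"
cd "$WORKDIR"
# ... run your program here, writing its output into $WORKDIR ...
# archive the results from outside the work directory, then copy the archive to a shared filesystem
cd /tmp
tar -czf "results-${PBS_JOBID}.tar.gz" "${PBS_JOBID}"
cp "results-${PBS_JOBID}.tar.gz" ~/
# clean up the node-local space before the job ends
rm -rf "$WORKDIR" "results-${PBS_JOBID}.tar.gz"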

Useful GUI Software in the Cluster System
matlab : technical computing
nautilus : file explorer
gedit : text editor
nsight : IDE for debugging C++ and CUDA code. You must request a GPU virtual machine, and before running nsight the CUDA module must be loaded: module load cuda/7.5

Installed Deep Learning Software on the Cluster
Caffe: GPU version only
module load cuda/7.5 caffe/1.0rc3
Input layer: only the 'hdf5' file format is supported
TensorFlow: both GPU and CPU versions
Example: module load tensorflow/0.8_cpu
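A minimal sketch of a batch script for a CPU TensorFlow run. Only the module name comes from the slide above; the script name train.py, the working directory, and the resource request are illustrative, and it assumes the tensorflow module puts a suitable python on the PATH:
#!/bin/bash
#PBS -N tf_cpu_test
#PBS -l nodes=1:ppn=4,mem=8000m,walltime=04:00:00
#PBS -d ~/tf_example            # start in this directory (illustrative path)
module load tensorflow/0.8_cpu  # CPU build listed above
python train.py                 # your own TensorFlow script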

Interactive GUI Example
Matlab:
ssh -X login1
qsub -X -I -q interactive -l nodes=2:ppn=4,mem=8000m
(starts an interactive virtual machine with 2 nodes, 4 cores per node, and 8 GB memory)
matlab &
Nsight:
qsub -X -I -q gpu -l nodes=1:k40:ppn=4:gpus=2,mem=8000m
(starts an interactive virtual machine with 1 node, 4 cores per node, 2 K40 GPUs, and 8 GB memory)
module load cuda/7.5
nsight &

Example: Running Matlab (a series of screenshots of the Matlab GUI session in the original slides)

Caffe 'qsub' Script Example
#!/bin/bash
#
# This is an example script
# These commands set up the cluster environment for your job:
#PBS -S /bin/bash
#PBS -N mnist_train_test1
#PBS -q gpu
#PBS -l nodes=1:ppn=1:k40,gpus=1
#PBS -M username@ittc.ku.edu
#PBS -m abe
#PBS -d ~/mnist/scripts
#PBS -e ~/mnist/logs/${PBS_JOBNAME}-${PBS_JOBID}.err
#PBS -o ~/mnist/logs/${PBS_JOBNAME}-${PBS_JOBID}.out

# Load modules
module load cuda/7.5 caffe/1.0rc3

# Save job-specific information for troubleshooting
echo "Job ID is ${PBS_JOBID}"
echo "Running on host $(hostname)"
echo "Working directory is ${PBS_O_WORKDIR}"
echo "The following processors are allocated to this job:"
echo $(cat $PBS_NODEFILE)

# Run the program
echo "Start: $(date +%F_%T)"
source ${PBS_O_WORKDIR}/train_lenet_hdf5.sh
echo "Stop: $(date +%F_%T)"

Full example: mnist.tar.gz

ACF Virtual Machine vs. Desktop
ACF virtual machine: most software packages are installed in /tools/cluster/6.2. You must add the corresponding paths to your shell environment variables manually, or use "env-selector-menu" or the module loader to set them.
Desktop: software is installed in /usr/bin and /usr/lib, and these folders are on the search path by default.
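A quick way to see the difference from a cluster shell; the exact paths that appear are cluster-specific and shown only as an illustration:
which caffe                         # typically not found before loading a module
module load cuda/7.5 caffe/1.0rc3
which caffe                         # should now resolve to a path under /tools/cluster/6.2/
echo $PATH                          # the module has prepended the package's bin directory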

Thank you!