How to use the HPCC to do stuff

How to use the HPCC to do stuff
Presentation to QuERG, March 28th, 2016
*Matt’s soothing voice is not a source of necessary nutrients

What is HPCC? And iCER?
HPCC: High Performance Computing Center
  A collection of computers with 600 nodes and 7000 computing cores, including large-memory nodes (6 TB)
  Has lots of available software: https://wiki.hpcc.msu.edu/display/hpccdocs/Installed+Software
iCER is a research unit that maintains MSU’s supercomputer system
  Provides 1-on-1 consulting

When would I use the HPCC?
The computation takes too long
It runs out of memory
It needs licensed software
It reads or writes lots of data

How do I connect to the HPCC?
You need to set up an account with HPCC
ssh to HPCC
  Log in to a gateway node
  ssh to a developer node to run test code and submit jobs
Transfer files using an SFTP connection

How to request a new account
Have your PI fill out the form at this link: http://www.hpcc.msu.edu/request

Once you have an account ...
Log in using an ssh (Secure Shell) program
Many program options to choose from:
  MobaXterm
  PuTTY
  Terminal on Macs
  The first two can be obtained on portable flash drives from iCER or downloaded
Host Name (or IP address) will be hpcc.msu.edu
  Might be better to use rsync.hpcc.msu.edu

Login
[Figure: login windows in PuTTY and MobaXterm]

Logged into gateway
The gateway does not have most of the programs that you want to use.
You need to switch to a developer node to access the programs.

Gateway nodes
Shared by everyone with an HPCC account
The only means of accessing the HPCC computing resources
****DO NOT RUN ANYTHING ON THESE NODES!!******

Switching to developer mode
Look at the developer node usage
Choose a node that has low or medium usage
Use ssh to switch to that developer node, e.g.:
  [user@gateway-00 ~]$ ssh dev-intel14
In this example:
  [user@gateway-00 ~]$ is the prompt, which appears automatically; it tells you which node you are logged into and the name of the current folder
  ~ means your home folder
  ssh dev-intel14 is the code you type to switch to dev-intel14
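
The two hops described above can be written out as commands. This is only a sketch: `yournetid` is a hypothetical placeholder, the host names are the ones given in this talk, and the commands are printed rather than executed so the example can be read anywhere. The one-step form relies on the standard `ssh -t` option, which requests a terminal for the second hop.

```shell
netid="yournetid"          # hypothetical placeholder: use your own user name
gateway="hpcc.msu.edu"     # gateway host name given earlier in the talk
devnode="dev-intel14"      # developer node used in the example above

# Step 1 (on your own computer): log in to the gateway.
echo "ssh ${netid}@${gateway}"
# Step 2 (at the gateway prompt): hop to the developer node.
echo "ssh ${devnode}"

# Both hops in one command; -t asks ssh for a terminal so the second
# ssh gets an interactive session.
one_step="ssh -t ${netid}@${gateway} ssh ${devnode}"
echo "$one_step"
```

Copy a printed command into a terminal to run it once you have an account.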

Begin working
The command line takes Unix commands to do work
Many different text editors can be used to edit files:
  emacs file.ext
  nano file.ext
  vi file.ext
  joe file.ext
cat file.ext only prints what is in the file; it cannot edit it

Basic Linux Commands
Command            Meaning
cd directory       Change directory
cd ..              Up one directory
cd ../..           Up two directories
cd -               Return to previous directory
cd ~               Go to home/username
mkdir directory    Make named directory
rmdir directory    Remove an empty directory

Basic Linux Commands
Command    Meaning
ls         Show contents of current folder
Some options for the ls command:
-a         list all files and directories, including hidden ones
-F         append indicator (one of */=@|) to entries
-h         print sizes in human-readable form (use with -l)
-l         list with a long listing format
-t         sort by modification time

More Linux commands
Command                     Meaning
cp source destination       Copy files
cp -r source destination    Copy recursively: files and directories
mv source destination       Move a file (can be used as a rename command!)
pwd                         Show current path
rm filename.ext             Remove file
rm -r folder/               Remove directory recursively (i.e. including all subdirs and files)
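
The commands listed above can be tried safely in a throwaway directory. The sketch below assumes a bash shell; the directory and file names (`project`, `notes.txt`, and so on) are made up for illustration.

```shell
# Work inside a scratch directory so nothing important is touched.
demo_dir="$(mktemp -d)"          # temporary directory created by the system
cd "$demo_dir"

mkdir project                    # make a named directory
cd project                       # change into it
pwd                              # show the current path
echo "first draft" > notes.txt   # create a small file to work with

cp notes.txt notes_backup.txt    # copy a file
mv notes.txt readme.txt          # move, used here as a rename
ls -l                            # long listing of the current folder

cd ..                            # up one directory, back to the scratch directory
cp -r project project_copy       # copy a directory and everything in it
rm project/notes_backup.txt      # remove one file
rm -r project_copy               # remove a directory recursively
ls -F                            # directories are marked with a trailing /
```

Note that `rm` and `rm -r` delete immediately; there is no trash folder to recover files from.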

How to find commands? How to find how to do something?
Google is usually the first place to look
An exhaustive list: http://ss64.com/bash/
A useful cheatsheet: http://fosswire.com/post/2007/08/unixlinux-command-cheatsheet/
Explain a command given to you: http://explainshell.com/

MSU HPCC specific commands
Command       Meaning
sj            show jobs
gmod          show the home screen with development node use levels
qsub          submit a job to the scheduler
getexample    show a list of available examples that can be loaded to the current directory

Module
This is how you access different software
module list                 print the list of loaded programs
module load OpenBUGS        load OpenBUGS
module unload modulename    unload a module
module spider keyword       search the modules for a keyword and list what can be loaded, including the different versions
module purge                unload all modules
R is loaded automatically, but module can be used to load a certain version of R if needed
module load JAGS            load the JAGS package
  JAGS loads, but I was unable to load rjags, so it is not used in this presentation. Someone could try loading the rjags library to their personal directory if needed; instructions should be somewhere at https://wiki.hpcc.msu.edu/display/hpccdocs/R

Working on the HPCC
Small tasks can be run on the developer nodes
  A task that runs for longer than 2 hours or uses too much memory can be canceled without warning
Use developer nodes to test your code and determine the resources you need to run jobs
When something takes a long time to run, submit it as a job to the scheduler:
  qsub myjob.sub

What do I mean by a Job?
A job is just a list of commands, written in a job script, that I tell the computers to run
The job script also contains the resource requirements for running the job
The script is submitted to the scheduler and runs when the requested resources become available
When the job is done running you can check your results by looking at the files (etc.) created

How the scheduler works
Ranks submitted jobs based on a priority system
  Priority is influenced by how long jobs have been in the queue and the resources they request
Large jobs may be in the queue for a long time before they begin running
Jobs shorter than 4 hours typically start running the fastest (sometimes immediately)

Creating job scripts
What is needed in a job script?
List of required resources
  Run time
  Memory
  Number of nodes
  Number of cores per node
All command line instructions needed to run the computations

Typical submission script
[Figure: annotated submission script. Labels: special system command (login to shell), instructions to scheduler (resource requests), shell commands, special environment variables]

Job Script details
# normally starts a comment, except in:
  #!      special system command
    #!/bin/bash -login    starts a login shell so you can run the job
  #PBS    instructions to the scheduler
    #PBS -l nodes=n:ppn=p
    #PBS -l walltime=hh:mm:ss
    #PBS -l mem=2GB    (!! total memory, not per core)
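
Putting these pieces together, a minimal job script might look like the sketch below. The resource values, the file name `myjob.sub`, and the R program name are placeholders borrowed from examples elsewhere in this talk, not recommendations. The `nodes=1:ppn=1` form, with a colon, is standard TORQUE/PBS syntax. The script is written with a here-document so the whole example can be pasted into a terminal.

```shell
# Write a minimal job script to myjob.sub.
cat > myjob.sub <<'EOF'
#!/bin/bash -login
# Resource requests: all #PBS lines come before the first command.
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -l mem=2gb

# Commands to run, starting in the directory the job was submitted from.
cd "${PBS_O_WORKDIR}"
Rscript myRprogram.r

# Report the resources the job actually used.
qstat -f "${PBS_JOBID}"
EOF

# Submit only where the scheduler is available (i.e. on the HPCC).
if command -v qsub >/dev/null 2>&1; then
    qsub myjob.sub
else
    echo "qsub not found; submit myjob.sub from an HPCC developer node"
fi
```

Start with small requests like these while testing, then adjust once you know what the job really needs.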

Instructions to scheduler
All lines starting with #PBS need to be above the first non-commented line in the script.
  If they are below the first non-commented line of code, the scheduler will not read them, leading to unexpected behavior.
More options at http://wiki.hpcc.msu.edu/x/Np-T
All jobs must request:
  Walltime
  Memory
  Number of nodes and processors per node (ppn)
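
Because misplaced #PBS lines are skipped without any warning, it can be worth checking a script before submitting it. The sketch below is one way to do that with awk. The function name `check_pbs_order` and the sample file `bad_order.sub` are hypothetical, and the check is a simplification: it treats any line that is neither blank nor starting with # as the first command.

```shell
# Print any #PBS line that appears after the first command line.
# Exit status is 0 when the script is fine, 1 when a late directive is found.
check_pbs_order() {
    awk '
        /^#PBS/ {
            if (seen_cmd) {
                printf "line %d ignored: %s\n", NR, $0
                bad = 1
            }
            next
        }
        /^[[:space:]]*$/ { next }   # blank lines do not count as commands
        /^#/ { next }               # comments and the #! line do not count
        { seen_cmd = 1 }            # first real command reached
        END { exit bad }
    ' "$1"
}

# A deliberately broken sample: the mem request comes after a command.
cat > bad_order.sub <<'EOF'
#!/bin/bash -login
#PBS -l walltime=00:10:00
cd "${PBS_O_WORKDIR}"
#PBS -l mem=2gb
EOF

check_pbs_order bad_order.sub || echo "fix bad_order.sub before submitting"
```

For the sample file, the function reports the `mem` request on line 4 as a directive the scheduler would not read.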

Advanced Environment Variables
PBS_JOBID        the job number of the current job
PBS_O_WORKDIR    the working directory from which the job was submitted
${} tells the computer that this is a variable, e.g.
  mkdir ${PBS_O_WORKDIR}/${PBS_JOBID}
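
A common use of these two variables is to give every job its own output directory. The sketch below assumes nothing beyond the variables named above. The fallback values and the file name `summary.txt` are stand-ins added only so the example also runs outside a job, where the scheduler has not set anything.

```shell
# Inside a job the scheduler sets these variables. Outside a job they are
# unset, so this sketch falls back to stand-in values (an assumption made
# only so the example runs anywhere).
PBS_O_WORKDIR="${PBS_O_WORKDIR:-$PWD}"
PBS_JOBID="${PBS_JOBID:-5945571.cmgr0}"

# One output directory per job keeps results from different runs apart.
outdir="${PBS_O_WORKDIR}/${PBS_JOBID}"
mkdir -p "$outdir"
cd "$outdir"
echo "results for job ${PBS_JOBID}" > summary.txt
```

Quoting the expansions, as in `"$outdir"`, keeps the commands working even if the submission directory has spaces in its name.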

Ways to run R
Rscript myRprogram.r
R < myRprogram.r --no-save      does not save the workspace
R < myRprogram.r --save         saves the workspace
R < myRprogram.r --vanilla      does not read any user or site profiles or restore data at start-up, and does not save data files at exit
R CMD BATCH myRprogram.r

Submitting jobs
qsub submission.script
  Submits a job to the queue and returns a job ID#
  The ID typically looks like 5945571.cmgr0
[Figure: timeline. Time to completion = queue time + run time]

Checking on your jobs
Command                    Meaning
qdel jobID#                delete a job from the queue
showq -u userid            show the current job queue of the user
sj                         show the status of jobs (running, eligible, or blocked)
checkjob jobID#            check the status of the job
showstart -e all jobID#    show the estimated start time of the job

Scheduling tips
Requesting more resources does not make a job run faster unless you are running a parallel program
  The more resources requested, the “harder” it is for the scheduler to reserve them
First time: over-estimate how many resources you need, and then modify appropriately
qstat -f ${PBS_JOBID}
  Put this line at the bottom of the script to show the resources used when the job is done

Advanced Scheduling Tips
A large portion of the cluster is buy-in nodes that can only run jobs shorter than 4 hours
Most nodes have at least 24 GB of memory
Half have at least 64 GB of memory
Few have more than 64 GB of memory (i.e. it is harder to schedule jobs that request lots of memory)

System Limitations
10 eligible jobs in the queue (others will be temporarily blocked until jobs start running)
520 running cores (nodes*ppn)
1000 submitted jobs
1 week of walltime
ppn=64
2 TB memory on a single core
~200 GB hard drive
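
The nodes*ppn arithmetic is easy to check before submitting. The sketch below tests a request against two of the limits listed above, 520 cores and 1 week (168 hours) of walltime. The function name `fits_limits` is hypothetical, and the other limits are left out to keep the example short.

```shell
max_cores=520      # limit on running cores (nodes * ppn)
max_hours=168      # 1 week of walltime, in hours

# Succeed (exit status 0) only if NODES x PPN cores for HOURS hours
# stays within both limits; otherwise print the reason and fail.
fits_limits() {
    nodes=$1
    ppn=$2
    hours=$3
    cores=$((nodes * ppn))
    if [ "$cores" -gt "$max_cores" ]; then
        echo "${cores} cores is over the ${max_cores}-core limit"
        return 1
    fi
    if [ "$hours" -gt "$max_hours" ]; then
        echo "${hours} hours is over the ${max_hours}-hour limit"
        return 1
    fi
    echo "ok: ${cores} cores for ${hours} hours"
}

fits_limits 4 16 24  || true     # 4 nodes x 16 ppn = 64 cores for one day
fits_limits 30 20 24 || true     # 30 nodes x 20 ppn = 600 cores: too many
```

A request that passes this check can still wait in the queue; the limits only say what is allowed, not what is available.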

Job Completion
By default the job automatically generates two files when it completes:
  Standard output, e.g. jobname.o5945571
  Standard error, e.g. jobname.e5945571
You can combine these files by adding the join option to your submission script:
  #PBS -j oe
You can change the output file name:
  #PBS -o /mnt/scratch/home/netid/myoutputfile.txt
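
The default file names can be worked out from the job ID that qsub returns. The sketch below uses the sample ID from this talk in place of a real submission. The variable names are made up, and `jobname` is a placeholder for whatever your job is called.

```shell
# In a real session this string is what qsub prints, e.g.
#   jobid="$(qsub submission.script)"
jobid="5945571.cmgr0"      # sample ID from the talk
jobname="jobname"          # placeholder job name

jobnum="${jobid%%.*}"      # drop everything from the first dot onward
stdout_file="${jobname}.o${jobnum}"
stderr_file="${jobname}.e${jobnum}"
echo "standard output: ${stdout_file}"
echo "standard error:  ${stderr_file}"
```

Saving the ID in a variable at submission time also makes it easy to pass to commands such as qdel or checkjob later.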

Transferring Files
Use an SFTP (Secure File Transfer) program; two options are:
  MobaXterm
  WinSCP
Gateway is: rsync.hpcc.msu.edu
Drag and drop files between your personal computer and the HPCC

Where to go for help
iCER office hours: Mondays and Thursdays, 1 to 2
  Biomedical & Physical Sciences Building, 567 Wilson Road, Room 1440
  Also by appointment
http://contact.icer.msu.edu : contact HPCC by submitting a ticket/contact form (MSU login)
wiki.hpcc.msu.edu : HPCC User Wiki
icer.msu.edu : iCER Home
hpcc.msu.edu : HPCC Home

How to convert Windows files to Unix
dos2unix converts special end-of-line characters from Windows format to Unix format
  dos2unix myRprogram.r
Only necessary if you use an editor that does not use the Unix end-of-line character
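
To see whether a file needs converting, count the lines that contain a carriage return, the extra character Windows adds at each line end. The sketch below builds a small test file, `crlf_demo.r` (a made-up name), checks it, and converts it. The `tr` fallback is a stand-in for machines where dos2unix is not installed and is not part of the original talk.

```shell
cr="$(printf '\r')"                      # the carriage-return character

# Make a two-line file with Windows (CRLF) line endings to experiment on.
printf 'x <- 1\r\nprint(x)\r\n' > crlf_demo.r

# Count lines that contain a carriage return before converting.
echo "lines with CR before: $(grep -c "$cr" crlf_demo.r || true)"

if command -v dos2unix >/dev/null 2>&1; then
    dos2unix crlf_demo.r                 # converts the file in place
else
    # Stand-in when dos2unix is not installed: delete every CR.
    tr -d '\r' < crlf_demo.r > crlf_demo.tmp && mv crlf_demo.tmp crlf_demo.r
fi

# Count again after converting.
echo "lines with CR after: $(grep -c "$cr" crlf_demo.r || true)"
```

If the first count is already zero for your own file, no conversion is needed.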