High-Performance Computing with Python for Geosimulation
Linux and HPC User Environment
Dr. Yi Qiang
Postdoctoral researcher, Department of Environmental Sciences, LSU
Outline
  Basic Linux
    Introduction to Linux
    Basic Commands in Linux
    File Editing
  HPC User Environment
    LSU HPC resource
    HPC user environment
    Job Management
  Lab exercise
Linux
  Linux is an operating system that interfaces between the user and the computer hardware.
  Linux is modeled on Unix and was first released by Linus Torvalds in 1991.
  Why Linux was created
    Personal computers (PCs) were becoming popular
    Compatibility with Unix was required
    Commercial Unix was too expensive
    DOS/Windows was too limited
  Linux is the most popular OS on supercomputers.
Linux
  Figure: market share of operating systems in supercomputers
Components of Linux
  Linux has a kernel, one or more shells, and many applications (software) running on it.
  Kernel: the core component of Linux, which allocates the machine's resources to the programs that you run.
  Shell: the command-line interface, which is the primary user interface to Linux operating systems.
Linux Distributions
  Different versions of the Linux OS are called “distributions”.
  A Linux distribution is a collection of software applications built on top of the Linux kernel.
  Linux distributions are tailored to different requirements, such as desktop, server, workstation…
Why Linux is popular
  Most Linux distributions are free
  Open-source (completely customizable)
  Portable to nearly any hardware platform
  Highly scalable to lots of cores or lots of memory
  Highly efficient, therefore useful for computation
  Robust and proven security model
User Interface of Linux
  Linux can be accessed through a user-friendly graphical user interface (GUI).
  To save resources, the command-line interface is the most common way to access HPC clusters.
Remote Access to a Linux Server
  Most Linux systems allow secure shell (SSH) connections from other systems.
  On Linux and Unix (e.g. Mac), use ssh in a terminal to connect:
    ssh username@mike.hpc.lsu.edu
  On Windows, use an SSH client:
    PuTTY (ftp://130.39.13.164/putty.exe)
    MobaXterm (http://mobaxterm.mobatek.net/)
    Cygwin
Remote Access to a Linux Server
  Figure: accessing the SuperMike2 cluster with PuTTY
Directory structure in Linux
  Directories and files in Linux are arranged in a hierarchical structure.
  The top of the hierarchy is called root (written as a slash, /).
  You use commands to browse this hierarchy.
  Figure: directory hierarchy in SuperMike2
Commands to navigate directories
  pwd – print your current working directory
  whoami – print the name of the current user
  ls [-al] – list the contents of the directory you are in
  cd <directory> – change directory
  cd .. – change to the directory above
  cd - – change to the previous directory
  cd ~ – change to the home directory
  cd / – change to the root directory
  <command> --help – show more information about a command, e.g. cd --help
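Because this course uses Python, the same navigation can also be done from a script. The following is a minimal sketch using only the standard library; the helper names (list_directory, change_directory, navigation_demo) and the file names are made up for illustration, and everything runs inside a temporary directory.

```python
import os
import tempfile
from pathlib import Path


def list_directory(path="."):
    """Sorted entry names in `path` (hidden files included), roughly `ls -a`."""
    return sorted(entry.name for entry in os.scandir(path))


def change_directory(path):
    """Change the working directory (like `cd`) and return it (like `pwd`)."""
    os.chdir(path)
    return Path.cwd()


def navigation_demo():
    """Walk through cd, cd .. and ls inside a throwaway directory."""
    start = Path.cwd()
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        (root / "data").mkdir()          # hypothetical sub-directory
        (root / "notes.txt").touch()     # hypothetical file
        try:
            inside = change_directory(root / "data")   # cd data
            above = change_directory("..")             # cd ..
            names = list_directory(".")                # ls
        finally:
            os.chdir(start)              # return before the directory is removed
    return inside.name, above == root, names
```

Calling navigation_demo() moves into the data directory, moves back up, and lists the two entries created at the start. Path.home() gives the directory that cd ~ would change to.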
Create, move and delete files
  > <file> or touch <file> – create an empty file
  mkdir – create a directory (folder)
    example: mkdir /home/yourusername/myfolder
  rm [-r] – remove files and directories
  mv <options> <source> <destination> – move or rename files and directories (like cut and paste in Windows)
    example: mv myfolder myfolder1
  cp <options> <source> <destination> – copy files and directories (like copy and paste in Windows)
    Options:
      -r: copy recursively, required when copying directories
      -i: prompt before overwriting a file that already exists at the destination
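The same file operations are available from Python through pathlib and shutil. The sketch below mirrors mkdir, touch, mv, cp -r and rm -r in a temporary directory; the function name file_operations_demo and the names a.txt and backup are illustrative assumptions, not part of the course material.

```python
import shutil
import tempfile
from pathlib import Path


def file_operations_demo():
    """Create, rename, copy and delete a folder; return what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        folder = root / "myfolder"
        folder.mkdir()                              # mkdir myfolder
        (folder / "a.txt").touch()                  # touch myfolder/a.txt
        renamed = root / "myfolder1"
        shutil.move(str(folder), str(renamed))      # mv myfolder myfolder1
        backup = root / "backup"
        shutil.copytree(renamed, backup)            # cp -r myfolder1 backup
        shutil.rmtree(renamed)                      # rm -r myfolder1
        # Collect the remaining paths relative to the temporary root.
        return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
```

After the rename, copy and delete, only the backup folder and the file inside it remain. Like rm -r, shutil.rmtree deletes without asking, so it is worth testing on throwaway directories first.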
Read files
  cat – display the whole content of a file on the screen
  more – display the content of a file one page at a time
    press SPACE to move to the next page
    press ENTER to move to the next line
    press q to quit the display
  less – display the content of a file one page at a time, with forward and backward scrolling
    press ↑ ↓ or use the mouse scroll wheel to move through the displayed content
    press q to quit the display
Edit file
  Command Line Interface (CLI) editors: less user-friendly, but more efficient in resource usage
    vi/vim (Vi IMproved)
    emacs
    pico
    nano
  GUI editors: user-friendly and intuitive, but require high network speed and occupy more resources
    gedit
    gvim
    Eclipse
    xemacs
File transfer
  scp is a command for copying files between two Unix/Linux hosts over the SSH protocol:
    scp <options> <user>@<host>:/path/to/source <user>@<host>:/path/to/destination
  For example:
    scp examples.desktop yqiang1@mike.hpc.lsu.edu:/home/yqiang1/examples.desktop
  Use the option -r to copy directories (folders).
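When transfers are scripted from Python, the scp call can be assembled as an argument list first. This is a sketch only: scp_command and remote_path are hypothetical helper names, and the code builds and prints the command without running it.

```python
import shlex


def remote_path(user, host, path):
    """Format a remote location as user@host:/path."""
    return f"{user}@{host}:{path}"


def scp_command(source, destination, recursive=False):
    """Build the argument list for an scp call; nothing is executed here."""
    args = ["scp"]
    if recursive:
        args.append("-r")      # needed when the source is a directory
    args += [source, destination]
    return args


# Rebuild the example from the slide.
cmd = scp_command(
    "examples.desktop",
    remote_path("yqiang1", "mike.hpc.lsu.edu", "/home/yqiang1/examples.desktop"),
)
print(shlex.join(cmd))
```

On a machine with scp installed and access to the cluster, the list could be passed to subprocess.run(cmd, check=True) to perform the copy.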
File transfer
  In Windows, you need a client that supports the scp protocol, for example WinSCP or FileZilla.
  Download WinSCP: ftp://130.39.13.164/winscp.exe
  Figure: logging in to a Linux server with WinSCP
File transfer
  WinSCP interface: you can drag files and directories between your local computer and the remote server.
  Figure: WinSCP window with two panes (your local directories, remote directories)
HPC resources at different levels
  Campus-wide: LSU HPC
  State-wide: Louisiana Optical Network Initiative (LONI)
  Nation-wide:
    Extreme Science and Engineering Discovery Environment (XSEDE)
    The National Center for Supercomputing Applications (NCSA)
Applying for an HPC account
  You need an HPC account to access and use LSU and LONI supercomputer resources.
  All faculty and research staff at Louisiana State University, as well as students pursuing sponsored research activities at LSU, are eligible for a LONI or LSU HPC account.
  Apply for an HPC account at www.hpc.lsu.edu
  Students and research staff need to provide the name of an LSU faculty member as their HPC Contact/Collaborator.
Applying for an allocation
  An allocation is a package of service units (SUs) that an HPC user can consume.
  One SU is one CPU-hour (one core used for one hour).
    For example, 40 SUs will be charged for a job that runs 10 hours on 4 cores.
  Start-up Allocation (up to 50K SUs)
    Can be requested at any time
    Needs a faculty member as your sponsor
    Easy to apply for
  Research Allocation (up to 4M SUs/project)
    Applications are accepted quarterly
    Must be applied for by a faculty member
    Must go through a selection process
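The SU arithmetic above can be checked with a few lines of Python. This sketch follows the slide's definition only (one SU per core per hour); the function names are hypothetical, and the actual accounting rules on a given cluster may differ.

```python
def service_units(wall_hours, cores):
    """SUs charged for a job: one SU per core per hour of wall-clock time."""
    if wall_hours <= 0 or cores < 1:
        raise ValueError("wall_hours must be positive and cores at least 1")
    return wall_hours * cores


def jobs_affordable(allocation_sus, wall_hours, cores):
    """Number of identical jobs that fit into an allocation."""
    return int(allocation_sus // service_units(wall_hours, cores))


# The slide's example: 10 hours on 4 cores.
example_cost = service_units(10, 4)
# How many such jobs a 50K start-up allocation would cover.
example_jobs = jobs_affordable(50_000, 10, 4)
print(example_cost, example_jobs)
```

Estimating the cost before submitting helps decide whether a start-up allocation is enough or a research allocation is needed.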
How to use HPC
  Multiple users can access HPC clusters over the Internet.
  Each user can run multiple jobs simultaneously in the cluster.
  Users need to specify resource requirements when submitting jobs (e.g. cores, memory, GPU…).
  Submitted jobs are managed in queues, where they wait for available computing resources (nodes).
Cluster Architecture
  Login (head) nodes
    Access to the cluster (via ssh)
    Edit, create, copy and delete files/directories
    Launch and monitor jobs
    Do not run your job on the head nodes!
  Compute nodes
    The place to run your job
    You need to specify the required resources when submitting your job
    Different types of nodes for different computing tasks
  Node specifications: http://www.hpc.lsu.edu/resources/hpc/index.php
Put your job in the right queue
  Jobs in different queues wait for different kinds of compute nodes.
  Putting your job in the right queue saves resources.
  Table: queue characteristics in LSU HPC
Check Queue Status
  qstat -q – print information about the queues in the cluster
  showq – print the eligible, blocked, and/or recently completed jobs
  qfree – print the number of free, busy and queued nodes in the current cluster
Two ways of submitting jobs: Interactive Job
  Requests a specific portion of the computing resources for a specific time period, for your use only
  Similar to a head node: you can use commands to control your program interactively
  You have to be present while the interactive job runs
  For testing and debugging with a small number of cores
  Command to request an interactive session:
    qsub -I -V -l walltime=<hh:mm:ss>,nodes=<num_nodes>:ppn=<num_cores> -A <Allocation> -q <queue name>
  For example, to request a node with 1 processor:
    qsub -I -V -l walltime=0:30:00,nodes=1:ppn=1 -A hpc_startup -q single
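To make the structure of the resource string explicit, the interactive request can be assembled from its parts in Python. This is an illustrative sketch: format_walltime and interactive_qsub are hypothetical helpers, only the flags shown on the slide are used, and the command is printed rather than executed.

```python
import shlex
from datetime import timedelta


def format_walltime(duration):
    """Format a timedelta as the hh:mm:ss string used for walltime."""
    total = int(duration.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def interactive_qsub(walltime, nodes, ppn, allocation, queue):
    """Build the qsub argument list for an interactive session."""
    # walltime and node/core counts go into one comma-separated -l value.
    resources = f"walltime={format_walltime(walltime)},nodes={nodes}:ppn={ppn}"
    return ["qsub", "-I", "-V", "-l", resources, "-A", allocation, "-q", queue]


# Rebuild the slide's example: 30 minutes, 1 node, 1 processor.
cmd = interactive_qsub(timedelta(minutes=30), 1, 1, "hpc_startup", "single")
print(shlex.join(cmd))
```

The printed line matches the example command above, which is a quick way to confirm that the walltime and nodes/ppn values ended up in the right place.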
Two ways of submitting jobs: Batch Job
  Executed without user intervention after submission
  The resource requirements are specified in a job script (e.g. a PBS job script)
  You do not need to be present; you will be notified when the job is done or the requested time has run out
  For production runs with more cores
  More efficient for system management and for your SU consumption
  Command that launches a batch job specified by a PBS job file:
    qsub job_script
PBS Job Script
#!/bin/bash
#PBS -l nodes=4:ppn=4          # Number of nodes and processors per node
#PBS -l walltime=24:00:00      # Maximum wall time
#PBS -N myjob                  # Job name
#PBS -o <file name>            # File name for standard output
#PBS -e <file name>            # File name for standard error
#PBS -q checkpt                # Queue name
#PBS -A <allocation>           # Allocation name
#PBS -m e                      # Send mail when job ends
#PBS -M <email address>        # Send mail to this address
<shell commands>
mpirun -machinefile $PBS_NODEFILE -np 16 <path_to_executable> <options>
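In the script above, the value passed to -np (16) is the product of nodes and ppn (4 x 4). When job scripts are generated from Python, that relationship can be enforced in code. The sketch below is one possible way to do it; render_pbs_script and the ./my_model executable are hypothetical, and only a subset of the directives from the slide is emitted.

```python
def render_pbs_script(nodes, ppn, walltime, job_name, queue, allocation, executable):
    """Return the text of a PBS job script with a consistent process count."""
    total_processes = nodes * ppn        # keeps -np in step with nodes and ppn
    lines = [
        "#!/bin/bash",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -l walltime={walltime}",
        f"#PBS -N {job_name}",
        f"#PBS -q {queue}",
        f"#PBS -A {allocation}",
        f"mpirun -machinefile $PBS_NODEFILE -np {total_processes} {executable}",
    ]
    return "\n".join(lines) + "\n"


# Same resources as the slide; the executable name is a placeholder.
script = render_pbs_script(4, 4, "24:00:00", "myjob", "checkpt",
                           "hpc_startup", "./my_model")
print(script)
```

The returned text can be written to a file and submitted with qsub in the usual way.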
Job Monitoring
  You can monitor a submitted job to see whether it is running well.
  qstat -f <jobid> – print details of your submitted batch job
  qstat -n -u <user> – print the nodes assigned to the jobs of <user>
  qdel <jobid> – delete a job
  showstart <jobid> – show the approximate start time of a job in the queue
  checkjob <jobid> – print the running status of your job
  qshow -j <jobid> – check the health of your running job, e.g. memory and CPU usage
Python programming in HPC
  Python is an interpreted language that supports multiple parallel programming models (e.g. multi-threading and MPI).
  Python has a large user base and an active development community.
  Python is installed on all LONI and LSU HPC clusters.
  Your programs need to be parallelized to take advantage of HPC clusters.
  My class on parallel programming with Python: ftp://130.39.13.164/CyberGIS_1.pdf
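As a small illustration of what parallelizing a program means, the sketch below splits a list of values into chunks and processes the chunks in a thread pool from the standard library. The function names and the sum-of-squares workload are invented for this example. In standard CPython, threads mainly help with I/O-bound work, so CPU-heavy simulations usually rely on processes or MPI instead; the split, map and combine pattern stays the same.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split `items` into `n_chunks` contiguous, nearly equal parts."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work done by one worker on its own chunk."""
    return sum(value * value for value in chunk)


def parallel_sum_of_squares(values, workers=4):
    """Split the data, process chunks concurrently, then combine the results."""
    chunks = split_into_chunks(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)
```

For example, parallel_sum_of_squares(list(range(10)), workers=3) splits the ten values into three chunks and adds up the three partial results.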
Thanks! My email: yqiang1@lsu.edu