BIOSTAT LINUX CLUSTER
By Helen Wang
October 11, 2012
Basic Beowulf Cluster Structure
A brief look at our cluster
Biostat Beowulf Cluster
  Server name: Merlot.bis.vcu.edu   IP:
  2nd server as failover: blanc.bis.vcu.edu   IP: (invisible on mission)
  Software recommended to access the servers:
    PC users:
      1. MobaXterm http://mobaxterm.mobatek.net/
      2. ssh / OpenSSH / PuTTY / WinSCP
      3. X-Windows http:// with license code http://
    Mac users:
      Mac Terminal
Access Merlot from your computer
How to use MobaXterm or ssh to access the server
  From outside VCU, connect through webvpn.vcu.edu first.
  Open a new session, then choose SSH to add the server name.
  Open "SSH settings" and fill in the information:
    remote hostname:
    username: YOUR_ACCOUNT_NAME
    port number: 22
  Open the session settings and enter merlot as the Session Name.
  Use ssh -X for graphical access.
  Select the server to test the connection, and exchange keys by entering your password.
  Create a profile or bookmark for easy access every time.
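For users who connect with OpenSSH from a terminal (for example Mac Terminal), the session settings above could be kept in a client configuration stanza. The sketch below is only an illustration: it writes the stanza to a scratch file rather than to ~/.ssh/config, the alias merlot mirrors the session name, the host name is taken from the server slide, and YOUR_ACCOUNT_NAME remains a placeholder.

```shell
# Sketch only: write an OpenSSH client stanza to a scratch file.
# The scratch path is an assumption; the usual location is ~/.ssh/config.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
Host merlot
    HostName merlot.bis.vcu.edu
    User YOUR_ACCOUNT_NAME
    Port 22
    ForwardX11 yes
EOF
cat "$cfg"
# To connect with these settings (not run here):
#   ssh -F "$cfg" merlot
```

ForwardX11 yes plays the same role as ssh -X, so graphical programs started on the server can display on the local machine.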
Access Cluster
What do you need to do to access the server?
  Get a username and password.
  Change your password to one that meets the password requirements:
    $ passwd
  Go to webvpn.vcu.edu to install the VCU WebVPN client on your PC so you can access the server from anywhere.
  Alternatively, send your home IP to the administrator in order to access the server from home (not preferred).
  Set up the necessary variables to customize your personal console. Templates:
    /home/huan/.cshrc
    /home/huan/.login
  Change the identity in the templates to your own name.
  Make tmp and bin directories under your home directory:
    $ mkdir tmp
    $ mkdir bin
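The directory and template steps above can be sketched as a small shell function. This is an illustration under assumptions: setup_home is a hypothetical helper name, and the demonstration runs against a scratch directory so that nothing in a real home directory is touched.

```shell
# Sketch: create the tmp and bin directories and copy the console templates.
# setup_home is a hypothetical helper; its argument defaults to $HOME.
setup_home() {
    target="${1:-$HOME}"
    mkdir -p "$target/tmp" "$target/bin"
    for tpl in /home/huan/.cshrc /home/huan/.login; do
        # Copy a template only if it is readable and no copy exists yet.
        if [ -r "$tpl" ] && [ ! -e "$target/$(basename "$tpl")" ]; then
            cp "$tpl" "$target/"
        fi
    done
    return 0
}

# Demonstration against a scratch directory instead of the real home.
demo_home="$(mktemp -d)"
setup_home "$demo_home"
ls -a "$demo_home"
```

On the cluster, calling setup_home with no argument would act on your own home directory; the copied templates still need the identity edited by hand.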
Access Cluster
Server and nodes
  Master node (master1 / master2): merlot.bis.vcu.edu
    Running CentOS (Red Hat kernel) version 5.5, x86-64
    When downloading open-source or other software, choose the 64-bit CentOS or RHEL 5 build if possible.
    Purposes: front-end user interface; slow; not for running jobs (test jobs are terminated by the system after 1 hour); accessible from outside by permission.
  Slave nodes (nodes): node1.biocl.vcu.edu to node22.biocl.vcu.edu
    Dual quad-core Xeon processors with 64 GB RAM
    Node15-22 have a larger memory capacity (96 GB) for running memory-hungry jobs.
    Purposes: computation; fast; not intended for interactive use; accessible via the master and managed by the Portable Batch System (PBS); internal network, not accessible directly from outside.
Accessing Cluster
Software available on master and nodes
  R 2.15 with CRAN libraries and Bioconductor libraries
  C++/G++ compiler, Fortran compiler (f77/f90)
  Perl
  Python/Biopython
  Common open-source packages needed by users (PLINK, MERLIN, IMPUTE, etc.), installed upon user request
  SAS 9.3 is on all nodes: /usr/local/bin/sas
Commands to be used on cluster
Submitting R jobs on the normal queue
    $ qR MYSCRIPT
  If the script file is named MYSCRIPT.R, submit it without the .R extension.
  Each user is allowed to run jobs simultaneously.
Submitting jobs on the large memory queue
  The large memory queue is on node1 for memory-intensive jobs (limited to 8 in total).
    $ qRL MYSCRIPT
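Because qR and qRL expect the script name without its .R extension, a small helper can derive that name from the file name. The sketch below is an assumption for illustration: job_name is a hypothetical function, not one of the cluster's tools, and the qR and qRL calls are shown only as comments.

```shell
# Sketch: derive the name to pass to qR / qRL from an R script file name.
# job_name is a hypothetical helper, not part of the cluster's tools.
job_name() {
    # Remove a trailing ".R" if present; leave other names unchanged.
    printf '%s\n' "${1%.R}"
}

name="$(job_name MYSCRIPT.R)"
echo "submit with: qR $name"
# On the cluster (not run here):
#   qR "$name"      # normal queue
#   qRL "$name"     # large memory queue
```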
Template used on cluster
Modify the template to create your own PBS script for running programs:

  #!/bin/bash
  #PBS -q serial
  #PBS -N MYSCRIPT
  #PBS -V
  #
  echo "******STARTING****************************"
  #
  # cd to the working directory. Otherwise the job will execute in my home directory.
  #
  WORKDIR=$HOME/YOURWORKDIR
  cd "$WORKDIR"
  #echo "PBS batch job id is $PBS_JOBID"
  echo "Working directory of this job is: " $WORKDIR
  #
  echo "Beginning to run job"
  (the command line that executes the job, e.g. /home/huan/bin/calculate -PARAMETERS)

Save it in a file named MYSCRIPT, then submit it:
  $ qsub MYSCRIPT
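As a rough illustration of filling in the template, the sketch below writes a job file named MYSCRIPT into a scratch directory and prints it. The working directory, the job name, and the command are the template's placeholders, not real values, and the qsub call is left as a comment so the sketch can run on a machine without PBS. The #PBS lines are kept together at the top, since PBS generally stops reading directives at the first executable line.

```shell
# Sketch: generate a PBS job file from the template.
# The job name, working directory and command are placeholders (assumptions).
jobdir="$(mktemp -d)"
jobfile="$jobdir/MYSCRIPT"
cat > "$jobfile" <<'EOF'
#!/bin/bash
#PBS -q serial
#PBS -N MYSCRIPT
#PBS -V
WORKDIR="$HOME/YOURWORKDIR"
cd "$WORKDIR" || exit 1
echo "PBS batch job id is $PBS_JOBID"
echo "Working directory of this job is: $WORKDIR"
echo "Beginning to run job"
/home/huan/bin/calculate
EOF
cat "$jobfile"
# To submit on the cluster (not run here):
#   cd "$jobdir" && qsub MYSCRIPT
```

The quoted 'EOF' keeps $HOME and $PBS_JOBID unexpanded in the file, so they are evaluated on the node when the job runs rather than when the file is written.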
Commands used on cluster
Submitting an interactive job
  Use this when there is no script command for submitting jobs, for example with a new application.
    $ qsub -I                        (to get on a node)
    NODE7$ plink --script PLKSCRIPT
Checking job status
  "R" Running; "E" Exiting; "H" Holding; "Q" Queued
    $ qstat
    $ qstat -n                       (shows which node your job is on)
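The status letters listed above can be turned into a small lookup, which may help when reading qstat output. The sketch below is illustrative only: describe_state is a hypothetical helper and covers just the four letters named on this slide.

```shell
# Sketch: translate a qstat state letter into the description listed above.
# describe_state is a hypothetical helper for illustration.
describe_state() {
    case "$1" in
        R) echo "Running" ;;
        E) echo "Exiting" ;;
        H) echo "Holding" ;;
        Q) echo "Queued" ;;
        *) echo "Unknown state: $1" ;;
    esac
}

for s in R E H Q; do
    printf '%s = %s\n' "$s" "$(describe_state "$s")"
done
```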
Commands to be used on cluster
Log in to a node to check its status
    $ ssh NODE#
    NODE#$ top
    NODE#$ exit
Quit or cancel a submitted job
    $ qstat                          (to get the job ID)
    $ qdel YOURJOBID
Limitations on the name of the SCRIPT
  No more than 10 characters.
  No spaces.
  No special characters.
  Use a temporary name if necessary and change it back when the job is done.
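The naming limits above can be checked before submitting. The sketch below is one possible reading of them: valid_script_name is a hypothetical helper, and treating "no special characters" as "letters, digits and underscore only" is an assumption, since the slide does not list the allowed characters.

```shell
# Sketch: check a script name against the limits listed above.
# Assumption: "no special characters" is read as letters, digits, underscore.
valid_script_name() {
    case "$1" in
        ""|*[!A-Za-z0-9_]*) return 1 ;;   # empty, or contains a disallowed character
    esac
    [ "${#1}" -le 10 ]                    # no more than 10 characters
}

for n in MYSCRIPT "my script" averyverylongname; do
    if valid_script_name "$n"; then
        echo "$n: ok"
    else
        echo "$n: rename before submitting"
    fi
done
```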
General commands used in Linux
List files
    $ ls             lists the files in the current directory.
    $ ls -F          marks directories so they can be told apart from ordinary files.
    $ ls -a          lists all files, even those that are normally invisible in UNIX (files whose names start with a period, e.g. .xstartup).
    $ ls -lt | more  lists files sorted by time; piping to more gives a page-by-page display.
Make directory and change directory
    $ mkdir DIR1
    $ cd DIR1
    $ cd ..          goes back to the upper-level directory.
    $ cd             goes back to your home directory.
Remotely copy between servers
    scp
General commands used in Linux
Copy file or directory
    $ cp PATH1/FILE1 PATH2/FILE2     copies the contents of FILE1 into the file FILE2.
    $ cp -r PATH1/DIR1 PATH2/DIR2    copies the contents of DIR1 into the directory DIR2.
  Use . instead of FILE2/DIR2 to keep the same file or directory name.
    $ cp PATH1/FILE1 ./              copies FILE1 from PATH1 to the current directory with the same file name.
Move file or directory
    $ mv file_name dir_name          moves the file file_name from the current directory into the directory dir_name, where dir_name is a subdirectory of the current directory.
    $ mv old_file new_file           renames old_file to new_file.
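The copy and move commands above can be tried safely in a scratch directory. The sketch below is illustrative; every file and directory name in it is a placeholder created just for the demonstration.

```shell
# Sketch: cp and mv in a scratch directory (all names are placeholders).
demo="$(mktemp -d)"
cd "$demo"
mkdir PATH1 PATH2 PATH1/DIR1
echo "hello" > PATH1/FILE1
echo "data" > PATH1/DIR1/a.txt

cp PATH1/FILE1 PATH2/FILE2      # copy under a new name
cp -r PATH1/DIR1 PATH2/DIR2     # copy a whole directory
cp PATH1/FILE1 ./               # copy into the current directory, same name
mv FILE1 PATH2                  # move FILE1 into the subdirectory PATH2
mv PATH2/FILE2 PATH2/renamed    # rename FILE2

ls -R "$demo"
```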
General commands used in Linux
Delete files and directories
    $ rm my_file       deletes my_file.
    $ rm -r my_dir     deletes my_dir.
  Use the wildcard * to delete multiple files or directories:
    $ rm PLK*          deletes all files whose names start with PLK.
System monitoring
    $ ps               lists the processes you own in the current terminal session.
    $ ps gux           lists only your processes.
    $ ps aux           lists all processes running on the machine.
    $ kill my_process  sends a terminate signal to the process whose process ID (PID) is my_process.
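A short, hedged walk-through of the delete and kill commands above, run in a scratch directory. The file names are placeholders, and sleep stands in for a long-running job; on the cluster you would find the PID with ps instead of taking it from $!.

```shell
# Sketch: wildcard delete and kill, using placeholder files and a stand-in job.
demo="$(mktemp -d)"
cd "$demo"
touch PLK1.log PLK2.log keep.txt
mkdir my_dir
rm PLK*                  # removes PLK1.log and PLK2.log, leaves keep.txt
rm -r my_dir             # removes the directory
ls

sleep 300 &              # stand-in for a long-running job
pid=$!                   # PID of the background process
kill "$pid"              # send the terminate signal (SIGTERM)
wait "$pid" 2>/dev/null || true
echo "process $pid terminated"
```

Because rm with a wildcard does not ask for confirmation by default, running ls PLK* first to preview the match is a cheap safety check.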
General commands used in Linux
Display a file
    $ more my_file             displays the text of my_file one page at a time. To see the next page, hit the space bar; to see the previous page, type b; to quit paging the file, type q.
    $ less my_file             similar to more, but faster and with more functions.
Retrieve a string from a file
    $ grep string filename     searches filename for string and outputs every line that contains string.
    $ grep -v string filename  outputs every line that does not contain string.
Display the manual of a command
    $ man COMMAND
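To make the difference between grep and grep -v concrete, the sketch below searches a three-line scratch file. The file contents are invented for the illustration.

```shell
# Sketch: grep and grep -v on a small scratch file (contents are made up).
demo_file="$(mktemp)"
printf '%s\n' "alpha PLK run" "beta run" "gamma PLK done" > "$demo_file"

matches="$(grep PLK "$demo_file")"      # lines that contain PLK
others="$(grep -v PLK "$demo_file")"    # lines that do not contain PLK

echo "with PLK:"
echo "$matches"
echo "without PLK:"
echo "$others"
```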
General commands used in Linux
Printing
    $ lpr print_file                   sends print_file to the default printer.
    $ lpr -Pother_printer print_file   sends print_file to other_printer.
  Aliases for printing jobs: a2psp7, a2psl7, a2psp7d, a2psl7d
    $ ALIAS print_file                 sends print_file to the default printer in font size 7, in portrait or landscape, single- or double-sided, depending on the alias.
Changing permissions for a file or directory
  The default permission is rwx for the owner only, which prevents other users from viewing or copying.
    $ chmod 750 my_file
    $ chmod 750 my_dir
  These change the permission so that your group members can read, copy, or execute my_file / my_dir.
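The effect of chmod 750 can be seen by comparing ls -l output before and after. The sketch below works on a placeholder file in a scratch directory; starting from mode 700 is an assumption that mirrors the owner-only default described above.

```shell
# Sketch: show what chmod 750 changes, using a placeholder file.
demo="$(mktemp -d)"
touch "$demo/my_file"
chmod 700 "$demo/my_file"     # owner only: rwx------
ls -l "$demo/my_file"
chmod 750 "$demo/my_file"     # owner rwx, group r-x, others none
ls -l "$demo/my_file"

# Keep just the 10-character mode string for inspection.
perms="$(ls -l "$demo/my_file" | cut -c1-10)"
echo "$perms"
```

In 750, each digit is the sum of read (4), write (2) and execute (1): 7 for the owner, 5 for the group, 0 for everyone else.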
At Last
Editing files
  Use nano or vi on the server.
  Or use a Samba connection to map a network drive on your PC and edit there; EditPad Lite is recommended.
Transferring files between Windows and the server
  Use an SFTP window to transfer.
  Use a Samba connection to transfer.
Useful links