Hackinars in Bioinformatics: Unix basics and usage of Computerome
Erland Hochheim, DTU Systems Biology, Center for Biological Sequence Analysis
Log on to the Danish National Life Sciences Supercomputer, a.k.a. Computerome
Prerequisites
  Some sort of personal computer
  An SSH client
    Linux and Mac have built-in terminal programs that support SSH (Terminal, Xterm, etc.)
    Windows does not...
      PuTTY is a good choice for most people: http://www.chiark.greenend.org.uk/~sgtatham/putty/
      MobaXterm is widely used, but 2-factor authentication can be a bit tricky
How to do it
  Use the user ID and password provided by DTU
  Log in to <user>@computerome.cbs.dtu.dk, for example:
    homesystem$ ssh lifesci@computerome.cbs.dtu.dk
    [lifesci@computerome02 ~]$
Login to Computerome
Be familiar with the strengths and weaknesses of each computer, including their own laptop.
  Laptop
    Easy to carry
    Provides access to resources elsewhere
  Computerome
    500+ nodes, 16000+ cores
    3 PB available storage
    Somewhat difficult to carry
    Provides massive resources from (almost) everywhere
Exercise: Open a terminal and log in to Computerome
  Find the terminal program on your computer
    Linux and Mac: it should be built in
    Windows: you might want to download PuTTY
  Log in to Computerome
Basic Linux/UNIX commands
  Manual pages: man
  Help flag: --help (some programs also accept -h, but for others it means something else, e.g. ls -h prints human-readable sizes)
  List files and directories: ls
    List, long form: ls -l
    List all, including hidden files: ls -a
  Change directory: cd
  Move (or rename) files and directories: mv
  Copy files and directories: cp
    Copy recursively: cp -r
    Copy, preserving permissions and timestamps: cp -p
  Remove files: rm
    Remove recursively: rm -r
    Force remove, without prompting: rm -f
  Create directory: mkdir
    Create a directory structure, including missing parents: mkdir -p
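A minimal sketch for trying the commands above without risk: it works in a temporary directory created with mktemp -d, so nothing in your home or project directories is touched. All file and directory names (notes.txt, project/raw/2019, ...) are made up for the example, and a bash shell is assumed.

```shell
#!/bin/bash
# Practice the basic file commands in a throwaway scratch directory.
set -euo pipefail

scratch=$(mktemp -d)                  # e.g. /tmp/tmp.XXXXXXXXXX
cd "$scratch"

mkdir -p project/raw/2019             # -p also creates the missing parents
echo "first note" > notes.txt         # a small file to work with

cp notes.txt project/raw/             # copy a file into a directory
cp -p notes.txt notes_backup.txt      # copy, keeping permissions and timestamps
cp -r project project_copy            # copy a whole directory tree
mv notes_backup.txt project/          # move (or rename) a file

ls -l                                 # long listing of the scratch directory
ls -a project                         # include hidden entries such as . and ..

rm notes.txt                          # remove a single file
rm -r project_copy                    # remove a directory and everything in it
ls                                    # only "project" is left
```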
man command – example ‘man ls’
--help flag
ls
cd, cp, mv, rm
Exercise: Experiment with commands
  man ls: see what the ls command can do
  Try some different flags to ls: ls -l, ls -a, others...
  Change directory: cd
  Get help on the mv command: mv --help (also try mv -h; GNU mv rejects it, because -h is not a universal help flag)
  Copy files and directories: cp
    Copy recursively: cp -r
    Copy, preserving permissions and timestamps: cp -p
  Remove files: rm
    Remove recursively: rm -r
    Force remove, without prompting: rm -f
  Create directory: mkdir
    Create a directory structure, including missing parents: mkdir -p
Commands for viewing files
  Concatenate files and print them: cat
  Output first part of file: head
  Output last part of file: tail
  View contents of file, one screen at a time: more
  "Opposite of more": less (a pager that can also scroll backwards and search)
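A small sketch of the viewing commands, assuming bash and the standard seq utility. It generates a hypothetical 20-line file so that the output of head and tail is predictable; more and less are interactive pagers, so they are only shown as comments.

```shell
#!/bin/bash
# Create a predictable example file and view parts of it.
set -euo pipefail

file=$(mktemp)                  # temporary file; the name is arbitrary
trap 'rm -f "$file"' EXIT
seq 1 20 > "$file"              # 20 lines: 1, 2, ..., 20

cat "$file"                     # print the whole file
head -n 3 "$file"               # first 3 lines: 1 2 3
tail -n 2 "$file"               # last 2 lines: 19 20
tail -n +18 "$file"             # from line 18 to the end: 18 19 20

# Interactive pagers (not run here): more "$file" or less "$file"
# In less: space = next page, b = back, /text = search, q = quit
```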
Viewing - cat
Viewing - head, tail
Viewing - more, less, example ‘less outfile’
Pipes and redirects
  | Pipe the output of one command into the next, example: date | awk '{print $3,$2,$6}'
  > [FILE] Overwrite file, example: echo Hello World > hello.txt
  >> [FILE] Append to file, example: echo Hello CBS >> hello.txt
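The three operators can be tried in a scratch directory. A sketch, assuming bash; LC_ALL=C is an addition not on the slide, so that date prints the English default format in which fields 3, 2 and 6 are day, month and year.

```shell
#!/bin/bash
# Demonstrate a pipe (|), an overwrite redirect (>) and an append redirect (>>).
set -euo pipefail

dir=$(mktemp -d)                # scratch directory so no real files are touched
cd "$dir"

# Pipe: the output of date becomes the input of awk, which prints fields 3, 2 and 6.
# In the C locale date prints e.g. "Tue Mar 12 09:00:00 CET 2019";
# for that date the result would be "12 Mar 2019".
LC_ALL=C date | awk '{print $3, $2, $6}'

echo Hello World > hello.txt    # > creates hello.txt, or overwrites it if it exists
echo Hello CBS >> hello.txt     # >> appends a line at the end of hello.txt
cat hello.txt                   # Hello World, then Hello CBS
lines=$(wc -l < hello.txt)      # 2

echo Goodbye > hello.txt        # > again: the two previous lines are replaced
cat hello.txt                   # Goodbye
```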
Pipes and redirects - |, >, >>
Editors on Linux/UNIX machines
  vi and vim: programmer's text editors
  nedit: NEdit is a standard GUI (Graphical User Interface) style text editor
  gedit: graphical text editor for GNOME
Editors
Editing - vim hello.txt
<Use awk/nawk/gawk, sed, sort and counting for formatting text>
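A hedged sketch of typical awk and sed one-liners for formatting text, on a hypothetical tab-separated file (counts.tsv, with made-up sample names and read counts). nawk and gawk are implementations of awk; the commands use only basic awk features, so they are expected to behave the same in each.

```shell
#!/bin/bash
# Reformat a small tab-separated table with awk and sed.
set -euo pipefail

cd "$(mktemp -d)"
# Hypothetical example data: sample name, organism, read count (tab-separated).
printf 'sampleA\tE.coli\t1200\nsampleB\tS.aureus\t800\nsampleC\tE.coli\t450\n' > counts.tsv

# awk: -F sets the field separator; print columns 2 and 1 in swapped order.
awk -F'\t' '{print $2, $1}' counts.tsv

# awk: keep rows whose 3rd column exceeds 500, count them and sum the column.
summary=$(awk -F'\t' '$3 > 500 {n++; total += $3} END {print n, "rows, total", total}' counts.tsv)
echo "$summary"                 # 2 rows, total 2000

# sed: s/old/new/g replaces every match on each line (the dot is escaped).
renamed=$(sed 's/E\.coli/Escherichia_coli/g' counts.tsv)
echo "$renamed"
```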
Basic file analysis
  Report lines, words and bytes: wc
  Sort contents: sort
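A sketch of wc and sort on a hypothetical list of gene names (genes.txt). uniq -c is not on the slide; it is added here because counting repeated lines is a common companion to sort, and since it only merges adjacent duplicates, the input has to be sorted first.

```shell
#!/bin/bash
# Count with wc, sort lines, and count duplicates with uniq -c.
set -euo pipefail

cd "$(mktemp -d)"
# Hypothetical example data: one gene name per line, some repeated.
printf 'gyrA\nrpoB\ngyrA\nkatG\nrpoB\ngyrA\n' > genes.txt

wc genes.txt                    # lines, words and bytes, then the file name
nlines=$(wc -l < genes.txt)     # only the number of lines: 6
echo "$nlines"

sort genes.txt                  # alphabetical order
# uniq -c merges adjacent identical lines, so sort first; then sort the counts
# numerically (n) in reverse (r) on field 1, so the most frequent comes first.
ranking=$(sort genes.txt | uniq -c | sort -k1,1nr)
echo "$ranking"
```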
Basic analysis - wc
Basic analysis - sort
File and directory permissions
  Use ls -l to see permissions on files:
    drwxrwxr-x 3 erhh erhh      291 Mar 12 09:00 course
    -rw------- 1 erhh erhh        0 Mar 10 12:48 course/STDIN.o3929
    -rw-rw-r-- 1 erhh erhh 23172310 Mar 10 12:55 course/tmpdir/Graph
  The first character is the type (d = directory, - = regular file); the next nine are three rwx groups for u, g and o:
    ugo: User (the owner), Group, Others
    rwx: Read, Write, eXecute
  Change owner: chown
  Change group: chgrp
  Change permissions: chmod
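A sketch of chmod in both notations, in a scratch directory with hypothetical names. In octal notation each digit is the sum of r = 4, w = 2 and x = 1 for user, group and others respectively, so 640 means rw- r-- ---. Reading the result back with stat -c '%A' assumes GNU coreutils, as on typical Linux systems; chown and chgrp are only shown as comments, because changing the owner normally requires root and chgrp only works for groups you belong to.

```shell
#!/bin/bash
# Set permissions with chmod (octal and symbolic) and read them back.
set -euo pipefail

dir=$(mktemp -d)                 # scratch directory; names below are hypothetical
cd "$dir"
touch results.txt
mkdir shared

chmod 640 results.txt            # octal: u = rw (4+2), g = r (4), o = none (0)
chmod u=rwx,g=rx,o= shared       # symbolic: the same as octal 750
ls -l

file_perms=$(stat -c '%A' results.txt)   # GNU stat: -rw-r-----
dir_perms=$(stat -c '%A' shared)         # GNU stat: drwxr-x---
echo "$file_perms $dir_perms"

# Not run here:
# chown <user> results.txt       # change owner (normally root only)
# chgrp <group> results.txt      # change group (to a group you are a member of)
```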
Permissions and their consequences
File transfer between computers
  Create a portable archive: tar
  Secure copy: scp
  Get stuff from the Web: wget, curl
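A sketch of packing results with tar and unpacking them again, using hypothetical file names in a scratch directory. The scp line is commented out because it needs a real account; <user> is the same placeholder as on the login slide, and the trailing colon means the archive lands in the remote home directory.

```shell
#!/bin/bash
# Pack a directory into a compressed archive, list it, and unpack it elsewhere.
set -euo pipefail

work=$(mktemp -d)                # scratch directory; all names are hypothetical
cd "$work"
mkdir -p results/plots
echo "contig stats" > results/summary.txt
echo "plot data" > results/plots/fig1.txt

tar -czvf results.tar.gz results       # c = create, z = gzip, v = verbose, f = archive name
tar -tzf results.tar.gz                # t = list the contents without extracting

mkdir unpacked
tar -xzf results.tar.gz -C unpacked    # x = extract, -C = into this directory

# Copy the archive to your Computerome home directory (not run here):
# scp results.tar.gz <user>@computerome.cbs.dtu.dk:
```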
Transfer - tar, scp
Transfer - wget, curl
<Knowing about good coding practices (Leon’s 3 papers)>
Computerome Wiki
  Main information source for all things Computerome: http://wiki.bio.dtu.dk/computerome/index.php/Main_Page
  A living document; corrections to: mailto:erhh@cbs.dtu.dk
  Special page for Tips and Tricks: http://wiki.bio.dtu.dk/computerome/index.php/Tips_and_Tricks
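Module environment
  module avail: list the available modules
  module load: load a module into the current shell
  module initadd: add a module to the ones loaded automatically at login
  Reference: http://wiki.bio.dtu.dk/computerome/index.php/Installed_Software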
module avail, module load, module initadd
Running jobs on Computerome
  Jobs are run in a batch environment:
    Moab scheduler
    Torque resource manager
  Reference: http://wiki.bio.dtu.dk/computerome/index.php/Batch_System
Job submission
  qsub (Torque), msub (Moab), xqsub, xmsub
  Reference: http://wiki.bio.dtu.dk/computerome/index.php/Batch_System#Submitting_batch_jobs
Monitoring jobs
  qstat, showq, checkjob, pestat
  Reference: http://wiki.bio.dtu.dk/computerome/index.php/Batch_System#Monitoring_batch_jobs
Simple job submission with qsub
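A minimal sketch of a job script for qsub, modelled on the Velvet job file in the exercise; the job name and resource values are hypothetical. The #PBS lines are ordinary comments to bash and are only interpreted by qsub. PBS_O_WORKDIR and PBS_JOBID are environment variables set by Torque; the fallbacks after :- are an addition so the script can also be run directly with bash for testing.

```shell
#!/bin/bash
### Hypothetical minimal job script; submit it with: qsub ./hello_job.sh
### Account information (same project as in the Velvet exercise)
#PBS -W group_list=pr_hackinars -A pr_hackinars
### Job name, resources and time limit (600 seconds)
#PBS -N hello_job
#PBS -l nodes=1:ppn=1
#PBS -l mem=100m
#PBS -l walltime=600

# Torque sets PBS_O_WORKDIR to the directory qsub was run from; fall back to
# the current directory so the script also runs with plain bash for testing.
cd "${PBS_O_WORKDIR:-$PWD}"

echo "Job ${PBS_JOBID:-<not in batch>} running on $(uname -n)"
echo "Working directory: $(pwd)"
date
```

After submitting, the job can be followed with qstat -u <userid> or showq -u <userid>.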
Monitoring - watch showq
Monitoring - showq -c
Monitoring - qstat
Monitoring - checkjob
Monitoring - pestat
Job control
  canceljob, showstart, tracejob
  Reference: http://wiki.bio.dtu.dk/computerome/index.php/Batch_System#Job_control
Job control – canceljob, showstart
Job control - tracejob
Job control – checkjob -v -v
Exercise with Velvet
  Log in to Computerome
  Go to your subdirectory in the hackinars project directory:
    /home/projects/pr_hackinars/people/<userdir>
  Copy data/Strain_H112240283.fastq to your own directory:
    $ cp /home/projects/pr_hackinars/data/Strain_H112240283.fastq .
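Exercise with Velvet – cont.
  Create a basic jobfile (script):
    $ vi velvet.sh
#!/bin/bash
# velveth: hash the reads with k-mer length 21 into the directory tmpdir
velveth tmpdir 21 -fastq Strain_H112240283.fastq
# velvetg: build the de Bruijn graph and assemble contigs in tmpdir
velvetg tmpdir
  Make it executable:
    $ chmod +x velvet.sh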
Exercise with Velvet – cont.
  Load modules:
    $ module load tools
    $ module load moab torque
    $ module load velvet/1.2.10
  Run the velvet job using xmsub:
    $ xmsub -W group_list=pr_hackinars -A pr_hackinars \
    > -l nodes=1:ppn=2,mem=100m,walltime=3600 \
    > -V -d $PWD -ro outfile -re errorfile -de ./velvet.sh
  Watch the job:
    $ showq -u <userid>    or    watch showq -u <userid>
    $ qstat -u <userid>    or    watch qstat -u <userid>
    $ checkjob <jobid>
    $ tracejob <jobid>
  Look at the output:
    $ cat outfile
    $ cat errorfile
    $ cat tmpdir/Log
Exercise with Velvet – cont.
  Extend the existing jobfile (script):
    $ vi velvet.sh
#!/bin/sh
### Account information
#PBS -W group_list=pr_hackinars -A pr_hackinars
### Job name
#PBS -N velvet_test
### Output files
#PBS -e errorfile
#PBS -o outfile
### Number of nodes and processors (cores) per node
#PBS -l nodes=1:ppn=2
### Memory
#PBS -l mem=100m
### Requesting time (format dd:hh:mm:ss or just number of seconds)
#PBS -l walltime=1:00:00
### Script goes below here
# Go to the directory from where the job was submitted (initial directory is $HOME)
echo Working directory is "$PBS_O_WORKDIR"
cd "$PBS_O_WORKDIR"
module load tools
module load velvet/1.2.10
velveth tmpdir 21 -fastq Strain_H112240283.fastq
velvetg tmpdir
  Run the job:
    $ qsub ./velvet.sh
BONUS: Estimating number of cores (examples: gzip, gunzip, a Perl script)
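One common way to estimate how many cores a program actually keeps busy is to compare CPU time with elapsed time: cores in use ≈ (user + sys) / real. The slide's exact method is not recorded here, so this is a sketch of that approach, using bash's built-in time keyword on gzip and assuming GNU/Linux utilities; the test data is generated on the fly. gzip is single-threaded, so a value close to 1 is expected, while a multithreaded program would give a higher ratio.

```shell
#!/bin/bash
# Estimate how many cores a command keeps busy: (user + sys) CPU time / real time.
set -euo pipefail
export LC_ALL=C                  # make sure decimals use a dot

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Test data: one million numbers, about 7 MB of compressible text.
seq 1 1000000 > "$workdir/numbers.txt"

# Bash's time keyword formats its report with TIMEFORMAT:
# %3R real, %3U user and %3S system seconds, each with 3 decimals.
TIMEFORMAT='%3R %3U %3S'
timing=$( { time gzip -c "$workdir/numbers.txt" > "$workdir/numbers.txt.gz"; } 2>&1 )
read -r real user sys <<< "$timing"

echo "real=${real}s user=${user}s sys=${sys}s"
awk -v r="$real" -v u="$user" -v s="$sys" 'BEGIN {
    if (r > 0) printf "estimated cores in use: %.2f\n", (u + s) / r
    else print "command too fast to measure"
}'
echo "cores available on this machine: $(nproc)"
```

The same wrapper can be put around any command, for example a Perl script, to compare how many of the requested ppn cores it really uses.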