Cluster Computing Applications for Bioinformatics
Thurs., Sept. 20, 2007
  process management
  shell scripting
  Sun Grid Engine
  running parallel programs

Accessing the Cluster
  ssh -X to enable X forwarding
  ssh compute-#-# to access a specific node
  qrsh to access the least busy node
  cluster-fork <command> to run a command on every node
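
To make the "run on every node" idea concrete, here is a minimal sketch of a loop that issues the same command to several nodes over ssh. It is not how cluster-fork is implemented. The node names in NODES are hypothetical examples of the compute-#-# pattern, and run_on_nodes and DRY_RUN are names invented for this illustration. With DRY_RUN=1 the function only prints the ssh commands, so it can be tried away from the cluster.

```shell
#!/bin/bash
# Hypothetical node names following the compute-#-# pattern (assumption).
NODES="compute-0-0 compute-0-1 compute-0-2"
# 1 = only print the ssh commands; set to 0 to really execute them.
DRY_RUN=1

# run_on_nodes: run one command on each listed node, one node at a time.
run_on_nodes() {
    local node
    for node in $NODES; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "ssh $node $*"
        else
            ssh "$node" "$@"
        fi
    done
}

run_on_nodes uptime
```

On the cluster itself, cluster-fork uptime serves the same purpose without a hand-written node list.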

Managing Processes
  ps – list your running processes
    – -f : full-format listing (owner, PID, parent PID, start time, command)
    – -e : list everyone's processes
  top – current top processes by CPU and memory use
  kill – terminate a process by process ID (PID)
    – killall to kill by program name
  command & – run in background
    – jobs : show background tasks
    – bg : resume a stopped job in the background
  nice / renice – set priority
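
The commands on this slide can be tried together in one short session. The sketch below uses sleep 300 as a stand-in for a long-running analysis; the nice values 10 and 15 are arbitrary choices for illustration.

```shell
#!/bin/bash
# Start a stand-in long-running job in the background at reduced priority.
nice -n 10 sleep 300 &
pid=$!                          # PID of the most recent background job

jobs                            # list this shell's background jobs
ps -f -p "$pid"                 # full-format listing for that one process

# Raise the nice value (lower the priority) of the running process.
renice -n 15 -p "$pid" >/dev/null 2>&1

kill "$pid"                     # send SIGTERM to the process by PID
status=0
wait "$pid" 2>/dev/null || status=$?
echo "sleep exited with status $status"   # 143 = 128 + 15 (SIGTERM)
```

Ordinary users can only lower the priority of their own processes; raising it again generally requires root.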

The Shell
  Unix command interpreter
  bash – Bourne Again Shell
  .bashrc and .bash_profile
    – settings for your shell environment
  cd ~
  ls -a
  vi .bash_profile
  echo $PATH
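
As an example of a shell environment setting, the lines below are one way to put a personal ~/bin directory on the search path. This is a sketch of something one might add to ~/.bash_profile, assuming personal scripts are kept in ~/bin; it is not the contents of any existing profile on the cluster.

```shell
# Prepend ~/bin to PATH only if it is not already listed.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;                  # already present, nothing to do
    *) PATH="$HOME/bin:$PATH" ;;
esac
export PATH

# Print the search path one directory per line for easier reading.
echo "$PATH" | tr ':' '\n'
```

Changes to ~/.bash_profile take effect at the next login, or immediately after running source ~/.bash_profile.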

Shell Scripting
  Automate common tasks
    – create directory structure required for sequence assembly
  mkdir ~/bin
  cd /share/bio/examples/
  cp makeseqdir ~/bin
  cd TFL
  makeseqdir
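
The contents of makeseqdir are not shown on the slide, so the following is only a guess at what such a script might look like. The function name make_seq_dirs is hypothetical, and the directory names chromat_dir, phd_dir and edit_dir are an assumption based on the layout used by phred/phrap/consed; the real script may create something different.

```shell
#!/bin/bash
# Hypothetical stand-in for makeseqdir: create assembly subdirectories
# in the current directory. Directory names are an assumption.
make_seq_dirs() {
    local d
    for d in chromat_dir phd_dir edit_dir; do
        mkdir -p "$d"           # -p: no error if the directory exists
    done
}

# Demonstrate in a throwaway project directory rather than a real one.
project=$(mktemp -d)
( cd "$project" && make_seq_dirs )
ls "$project"
```

Saved as an executable file in ~/bin, a script like this can be run from any project directory once ~/bin is on the PATH.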

Distributed Shell Scripts
  Preface CPU-intensive commands with qrsh -cwd
  qtcsh
    – shell that does this automatically based on ~/.qtask file
    – Does not work
  cd /share/bio/examples/
  cp assemble ~/bin
  assemble
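
The pattern of prefacing only the expensive step with qrsh -cwd might be wrapped in a small helper, sketched below. This is not the assemble script from the examples directory. The helper name run_heavy is invented here, sort stands in for a CPU-intensive program, and the fallback to local execution is an addition so the sketch also runs on a machine without SGE.

```shell
#!/bin/bash
# run_heavy: hand a command to SGE with "qrsh -cwd" when qrsh exists,
# otherwise run it locally (fallback added for off-cluster testing).
run_heavy() {
    if command -v qrsh >/dev/null 2>&1; then
        qrsh -cwd "$@"
    else
        "$@"
    fi
}

# Cheap setup runs directly; only the expensive step goes through run_heavy.
workdir=$(mktemp -d)
printf 'ACGT\nAACC\nTTGA\n' > "$workdir/reads.txt"
( cd "$workdir" && run_heavy sort reads.txt > sorted.txt )
cat "$workdir/sorted.txt"
```

The -cwd option matters because the remote command then starts in the same directory as the script, so relative file names such as reads.txt still resolve.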

Sun Grid Engine - SGE
  Job queue and load balancing
  commands:
    – qrsh / qtcsh
    – qstat -f : show status of jobs / queues
    – qdel : delete a job from the queue
    – qmon : graphical interface
    – qsub : submit job
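
A batch job for qsub is an ordinary shell script, optionally carrying SGE options on lines that begin with #$. The sketch below writes a minimal job script and submits it only if qsub is installed. The job name hello and the file name hello.sh are arbitrary choices for illustration.

```shell
#!/bin/bash
# Write a minimal job script. qsub reads lines starting with "#$" as if
# they were options given on its command line.
jobdir=$(mktemp -d)
cat > "$jobdir/hello.sh" <<'EOF'
#!/bin/bash
#$ -cwd
#$ -j y
#$ -N hello
echo "Running on $(uname -n)"
EOF

if command -v qsub >/dev/null 2>&1; then
    ( cd "$jobdir" && qsub hello.sh )   # prints the job ID assigned by SGE
    qstat -f                            # full listing of queues and jobs
else
    echo "qsub not found; job script left at $jobdir/hello.sh"
fi
```

Here -cwd runs the job in the submission directory, -j y merges error output into standard output, and -N sets the job name. A queued job can be removed with qdel followed by the job ID that qsub printed.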

Running Parallel Programs
  MPI – Message Passing Interface
  must be launched with mpirun or as a script with qsub
  mpiblast – parallel version of BLAST
    – modify ~/.ncbirc
    – first run mpiformatdb --nfrags=n
  cd /share/bio/examples
  cp .ncbirc ~
  cp mpiblast.sh ~
  cd ~
  qsub -pe mpich 8 mpiblast.sh
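
The slide does not show what mpiblast.sh contains, so the following is a guess at the general shape of such a job script, not a copy of the file in /share/bio/examples. The database and file names (mydb.fas, query.fas, results.txt) and the fragment count are hypothetical. The sketch also assumes the usual SGE mpich integration, in which SGE sets NSLOTS to the number of granted slots and writes the host list to $TMPDIR/machines. The script is only written and syntax-checked here, not submitted.

```shell
#!/bin/bash
# Write a sketch of an mpiblast job script (all file names hypothetical).
jobdir=$(mktemp -d)
cat > "$jobdir/mpiblast.sh" <<'EOF'
#!/bin/bash
#$ -cwd
#$ -j y
# One-time setup, run before submitting (splits the database into fragments):
#   mpiformatdb --nfrags=6 -i mydb.fas -p F
# NSLOTS and TMPDIR/machines are assumed to be provided by SGE's mpich setup.
mpirun -np "$NSLOTS" -machinefile "$TMPDIR/machines" \
    mpiblast -p blastn -d mydb.fas -i query.fas -o results.txt
EOF

bash -n "$jobdir/mpiblast.sh" && echo "syntax OK"
echo "submit with: qsub -pe mpich 8 mpiblast.sh"
```

The number after -pe mpich is the slot count requested from SGE; using "$NSLOTS" inside the script keeps mpirun in step with whatever was actually requested.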