Published by Polly Fowler. Modified over 9 years ago.
1
Cluster Computing Applications for Bioinformatics
Thurs., Aug. 9, 2007
Introduction to cluster computing
Working with Linux operating systems
Overview of bioinformatics applications
2
Introduction
Damian Christey
Professional Technologist
Departments of Mathematics and Biology
damian@math.wvu.edu
3
Cluster Computing
High Availability (HA)
High Performance (HPC)
  Specialized software
  Highly parallel
Beowulf
  Commodity hardware
  Open Source software
4
Biology Cluster Hardware
12 nodes
2 processors per node
Dual-core 1 GHz Opteron
8 GB RAM each
Gigabit Ethernet
2 TB RAID storage
5
GNU/Linux
Free, open source, Unix-like operating system
Rocks cluster management system: http://www.rocksclusters.org/
CentOS: http://centos.org/ (derived from Red Hat: http://www.redhat.com/)
6
Why Linux?
Cheap
Reliable and scalable
Customizable
Unix philosophy
Text processing
7
Accessing the Cluster
Monitoring: http://alba.as.wvu.edu/ganglia
Secure Shell: ssh -X username@alba.as.wvu.edu (on Mac OS or Linux)
  Windows users can download SSH and an X server from http://cygwin.com/
File transfer (SFTP):
  http://www.winscp.com/ for Windows
  http://cyberduck.ch/ for Mac
qrsh: command to get a shell on a node
8
Unix Filesystem
Tree with a single root: /
Folders may be physically stored on separate devices or on different machines
/home/bob : Bob’s files
/opt/Bio : bioinformatics programs
/share/bio : shared data, genome libraries
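To see how one directory tree can span several devices, a rough sketch like the following asks `df` which device or network share backs each directory. The paths are the slide's examples and are assumed to exist only on the cluster, so the loop skips any that are missing on the machine where it runs.

```shell
#!/bin/sh
# For each directory that exists on this machine, print the device or
# network share that holds it. /home, /opt/Bio and /share/bio are the slide's
# examples and may be absent elsewhere, so missing paths are skipped.
for dir in / /home /opt/Bio /share/bio; do
    if [ -d "$dir" ]; then
        # -P prints one portable line per filesystem; field 1 is the device.
        device=$(df -P "$dir" | tail -n 1 | awk '{print $1}')
        echo "$dir is stored on $device"
    fi
done

# The root directory always exists, so it always has a backing device.
root_device=$(df -P / | tail -n 1 | awk '{print $1}')
```

If two directories report different devices (or a host:/path entry), they live on separate storage even though they appear in the same tree.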
9
Unix Permissions
3x3 matrix: (owner, group, other) x (read, write, execute)
chgrp biouser file : change the group to which the file belongs (here, to biouser)
chmod g+w file : give the group write permission to your file
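The two commands can be tried safely in a scratch directory. This is a minimal sketch: the file name results.txt is made up for illustration, and because the biouser group from the slide may not exist on every machine, the chgrp step uses the current user's own primary group instead.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are modified.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Start with a file only the owner can read and write (rw- --- ---).
touch results.txt
chmod 600 results.txt

# Give the group write permission, as on the slide (rw- -w- ---).
chmod g+w results.txt

# Change the file's group. "biouser" is the slide's example; here we use
# our own primary group so the command works on any machine.
chgrp "$(id -gn)" results.txt

# The first 10 characters of ls -l are the file type plus the 3x3 matrix.
mode=$(ls -l results.txt | cut -c1-10)
echo "$mode"
```

Reading the mode string left to right after the file-type character gives the owner, group, and other triplets, each in read, write, execute order.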
10
Text Processing
cat file : print the contents of file to standard output
head, tail : output the first / last n lines of file
grep : return lines matching a pattern in standard input or a file
grep -v : invert the match (return lines that do not match)
| : pipe the output of one program into another
> : redirect output to a file, overwriting it
>> : append output to the end of a file
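These tools are usually combined. The sketch below runs them on a small FASTA file; the file name seqs.fasta and its three sequences are invented for illustration and are not part of the cluster's data.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are modified.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small, made-up FASTA file; > creates (or overwrites) it.
printf '%s\n' \
    '>seq1 Homo sapiens' 'ACGTACGT' \
    '>seq2 Mus musculus' 'GGCCTTAA' \
    '>seq3 Homo sapiens' 'TTTTAAAA' > seqs.fasta

cat seqs.fasta          # whole file
head -n 2 seqs.fasta    # first record
tail -n 2 seqs.fasta    # last record

# Header lines start with ">"; everything else is sequence data.
grep '^>' seqs.fasta > headers.txt
grep -v '^>' seqs.fasta > bases.txt

# Pipe one grep into another: count the headers that mention Homo sapiens.
n_human=$(grep '^>' seqs.fasta | grep -c 'Homo sapiens')
echo "human sequences: $n_human"

# >> appends to headers.txt instead of overwriting it.
echo '>seq4 added later' >> headers.txt
n_headers=$(wc -l < headers.txt | tr -d ' ')
```

Note the quotes around '^>' in the grep patterns: without them the shell would treat > as a redirection.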
11
Sequencing and Assembly Software
Phred - reads DNA sequencing trace files, calls bases, and assigns quality values
Phrap - assembles shotgun DNA sequence data
Consed - views, edits, and finishes sequence assemblies created with Phrap
Artemis - genome viewer and annotation tool
12
Sequence Analysis and Screening Software
(WU, NCBI, MPI) BLAST - finds regions of local similarity between sequences
ClustalW, T-Coffee, MUSCLE - multiple sequence alignment
RepeatMasker - screens for interspersed repeats and low-complexity sequences
RepeatScout, PILER - de novo repeat finders
EMBOSS - assorted analysis tools
13
Phylogenetics Software
PHYLIP, PAUP* - packages for inferring phylogenies (evolutionary trees)
MrBayes - Bayesian inference of phylogeny
Structure - model-based clustering method for inferring population structure