Multiple Sequence Alignment
An alignment of heads
Sequence Alignment
A way of arranging the primary sequences of DNA, RNA or protein (amino acids) to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships between the sequences.
Goals
-To establish a hypothesis of positional homology between bases/amino acids.
-To generate a concise, information-rich summary of sequence data.
-Sometimes, to illustrate the dissimilarity within a group of sequences.
-Alignments can be treated as models that can be used to test hypotheses.
Sequence Alignment
Aligned nucleotide or amino acid sequences are typically represented as rows of a matrix. Gaps (symbol "-") are inserted between residues so that identical or similar residues fall in the same column.
Taxon A  GGGAATCTAGGACTATACCGGATCTA
Taxon B  GGGAATCTA--ACTATA--GGATCTA
Taxon C  GGG--TCTAGGACTATACCGGAT--A
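The matrix view can be made concrete with a minimal Python sketch using the Taxon A, B and C rows from this slide. The function names `columns` and `conserved_columns` are illustrative only, not from any library: every row must have the same length, and a column counts as fully conserved here when it contains no gap and a single residue.

```python
# Rows of the example alignment; "-" marks a gap.
alignment = {
    "Taxon A": "GGGAATCTAGGACTATACCGGATCTA",
    "Taxon B": "GGGAATCTA--ACTATA--GGATCTA",
    "Taxon C": "GGG--TCTAGGACTATACCGGAT--A",
}


def columns(rows):
    """Return the alignment columns as tuples (one character per row)."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return list(zip(*rows))


def conserved_columns(rows):
    """Indices of columns where every row has the same residue and no gap."""
    return [i for i, col in enumerate(columns(rows))
            if "-" not in col and len(set(col)) == 1]


rows = list(alignment.values())
cons = conserved_columns(rows)
print(len(columns(rows)), len(cons))  # 26 18
```

The 8 non-conserved columns are exactly the ones containing gaps; every other column is identical across the three taxa.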
Alignment can be easy or difficult
[Figure: an easy alignment, and one made difficult by insertions or deletions (indels)]
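Indels are what make alignment hard, because the aligner has to decide where gaps go. A minimal sketch of how a program can place them is pairwise global alignment by dynamic programming (Needleman–Wunsch) with a linear gap penalty. The scores used here (match +1, mismatch -1, gap -2) are arbitrary assumptions for illustration, not the parameters of any particular tool.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment with a linear gap penalty.

    Returns (score, row_a, row_b), where the rows contain "-" for gaps.
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


score, row_a, row_b = needleman_wunsch(
    "GGGAATCTAGGACTATACCGGATCTA",  # Taxon A
    "GGGAATCTAACTATAGGATCTA",      # Taxon B with its gaps removed
)
print(score)  # 14
print(row_a)
print(row_b)
```

With these scores the best possible result is 22 matches and 4 gap columns (22 - 4 x 2 = 14), which is what the Taxon A/B rows on the previous slide achieve.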
Protein alignment may be guided by tertiary structure interactions
[Figure: Homo sapiens DjlA protein; Escherichia coli DjlA protein]
Multiple Sequence Alignment: Approaches
There are 3 main approaches to alignment:
-Manual
-Automatic
-Combined
Manual Alignment
Might be carried out because:
-The alignment is easy.
-There is some external information (e.g. structural) to draw on.
-Automated alignment methods have become trapped in a local minimum.
-An automated alignment can be "improved" by hand.
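Judging whether a hand edit really "improves" an alignment needs an objective score. One common choice is the sum-of-pairs score, sketched below under assumed, arbitrary values (match +1, mismatch -1, residue against gap -2, gap against gap 0). The edited Taxon C row is a hypothetical hand edit that moves one gap, not something from the slides.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one aligned pair of characters; a gap facing a gap scores 0."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(rows):
    """Add pair_score over every column and every pair of rows."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("rows must have equal length")
    return sum(pair_score(col[i], col[j])
               for col in zip(*rows)
               for i, j in combinations(range(len(rows)), 2))


original = [
    "GGGAATCTAGGACTATACCGGATCTA",  # Taxon A
    "GGGAATCTA--ACTATA--GGATCTA",  # Taxon B
    "GGG--TCTAGGACTATACCGGAT--A",  # Taxon C
]
# Hypothetical hand edit: shift Taxon C's first gap one column to the left.
edited = original[:2] + ["GG--GTCTAGGACTATACCGGAT--A"]
print(sum_of_pairs(original), sum_of_pairs(edited))  # 30 26
```

Under this scoring the edit lowers the score, so it would be rejected; a heuristic that only accepts score-raising moves can likewise stall at a local optimum.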
Automatic Alignment: Progressive Approach
Devised by Feng and Doolittle. Essentially a heuristic method, and as such it is not guaranteed to find the "optimal" alignment.
Requires $(n-1) + (n-2) + \cdots + 1 = n(n-1)/2$ pairwise alignments as a starting point, one for every pair of the $n$ sequences.
The most successful implementation is CLUSTAL.
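The pairwise starting cost can be checked with a short Python sketch. It enumerates every unordered pair of the five globin names used on the ClustalW slide and confirms the closed form; `n_pairwise` is a hypothetical helper name.

```python
from itertools import combinations


def n_pairwise(n):
    """Closed form for (n-1) + (n-2) + ... + 1."""
    return n * (n - 1) // 2


names = ["Hbb_Human", "Hbb_Horse", "Hba_Human", "Hba_Horse", "Myg_Whale"]
pairs = list(combinations(names, 2))  # every unordered pair, aligned once
assert len(pairs) == n_pairwise(len(names)) == sum(range(1, len(names)))
print(len(pairs))                                # 10
print([n_pairwise(n) for n in (10, 100, 1000)])  # [45, 4950, 499500]
```

The count grows quadratically, which is one reason the initial pairwise step is done quickly rather than exhaustively.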
Overview of ClustalW Procedure
1. Quick pairwise alignment: calculate distance matrix
2. Neighbor-joining tree (guide tree)
3. Progressive alignment following guide tree
[Figure: distance matrix and guide tree for Hbb_Human, Hbb_Horse, Hba_Human, Hba_Horse and Myg_Whale, with the resulting alignment (alpha-helices marked)]
Hbb_Human  PEEKSAVTALWGKVN--VDEVGG
Hbb_Horse  GEEKAAVLALWDKVN--EEEVGG
Hba_Human  PADKTNVKAAWGKVGAHAGEYGA
Hba_Horse  AADKTNVKAAWSKVGGHAGEYGA
Myg_Whale  EHEWQLVLHVWAKVEADVAGHGQ
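Steps 1 and 2 can be sketched in Python as follows. This is a simplified illustration, not ClustalW's actual code, and it rests on several assumptions. First, distances are taken as the fraction of mismatched sites between the already aligned globin rows above (gap columns skipped), whereas ClustalW derives its scores from its own pairwise alignments. Second, `fractions.Fraction` keeps the arithmetic exact so ties break deterministically. Third, branch lengths are omitted and step 3 is not shown. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def p_distance(x, y):
    """Fraction of mismatched sites, ignoring columns where either row has a gap."""
    sites = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    return Fraction(sum(a != b for a, b in sites), len(sites))


def distance_matrix(aligned):
    """Step 1 (sketch): symmetric distances as a dict of dicts keyed by name."""
    d = {name: {} for name in aligned}
    for a, b in combinations(aligned, 2):
        d[a][b] = d[b][a] = p_distance(aligned[a], aligned[b])
    return d


def neighbor_joining(names, dist):
    """Step 2 (sketch): neighbor-joining topology as nested tuples.

    Stops at three nodes and returns them as an unrooted star, since any
    pairing of the last three nodes gives the same unrooted tree.
    """
    nodes = list(names)
    d = {a: dict(row) for a, row in dist.items()}  # leave the caller's matrix alone
    while len(nodes) > 3:
        r = len(nodes)
        total = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for a, b in combinations(nodes, 2):
            q = (r - 2) * d[a][b] - total[a] - total[b]  # NJ Q-criterion
            if best is None or q < best[0]:
                best = (q, a, b)
        _, a, b = best
        u = (a, b)  # new internal node joining the chosen pair
        d[u] = {}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    return tuple(nodes)


globins = {
    "Hbb_Human": "PEEKSAVTALWGKVN--VDEVGG",
    "Hbb_Horse": "GEEKAAVLALWDKVN--EEEVGG",
    "Hba_Human": "PADKTNVKAAWGKVGAHAGEYGA",
    "Hba_Horse": "AADKTNVKAAWSKVGGHAGEYGA",
    "Myg_Whale": "EHEWQLVLHVWAKVEADVAGHGQ",
}
dist = distance_matrix(globins)
tree = neighbor_joining(list(globins), dist)
print(tree)
# ('Myg_Whale', ('Hba_Human', 'Hba_Horse'), ('Hbb_Human', 'Hbb_Horse'))
```

The alpha globins are joined first, then the beta globins, leaving myoglobin as the outgroup. The progressive step would then align sequences in that order.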