Published by Collin Owens. Modified over 9 years ago.
1
Multiple sequence alignment (MSA)
Petri Törönen
Help contributed by: Liisa Holm & Ari Löytynoja
2
What is MSA?
An MSA is an alignment of three or more sequences. It is usually a global alignment: the aim is to place homologous residues (nucleotides or amino acids) in the same columns across the whole length of the sequences.
    GA--GTACA
    CAC-GTATA
    CACGGTAT-
    G-CGGTCTA
3
What is MSA?
The picture shows a protein multiple sequence alignment (source: http://en.wikipedia.org/wiki/Multiple_sequence_alignment).
4
Why MSA?
“MSA emphasises signal observed in the pairwise alignment” (Liisa Holm)
Improved alignments
Distant sequences can be aligned with help from intermediate sequences
Conserved regions in the sequences are highlighted
Example: http://ekhidna.biocenter.helsinki.fi/users/petri/public/opetus_jutut/Bioinf_Per_Lects/urease_output.txt
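To make "highlight the conserved regions" concrete, here is a minimal Python sketch that scans the four-sequence example alignment from the "What is MSA?" slide and reports the columns in which every sequence has the same residue. The function name `conserved_columns` is hypothetical, and counting a column as conserved only when it is identical and gap-free is a simplifying assumption; real tools usually report graded conservation scores.

```python
def conserved_columns(alignment: list[str]) -> list[int]:
    """Return 0-based indices of columns where every row has the same non-gap residue."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i, column in enumerate(zip(*alignment)):
        # Fully conserved: no gap and only one residue type in the column
        if "-" not in column and len(set(column)) == 1:
            conserved.append(i)
    return conserved


example = ["GA--GTACA", "CAC-GTATA", "CACGGTAT-", "G-CGGTCTA"]
print(conserved_columns(example))  # [4, 5]: the G and T in columns 5 and 6 (1-based)
```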
5
Why MSA?
MSA is the input to many analysis tasks:
Detection of active sites
Generation of sequence profiles
Detection of protein domains and motifs
Phylogenetics
…
6
Remember
The first step of MSA is a good selection of sequences for the analysis:
The sequences need to be functionally/evolutionarily related
Some variation among the sequences is sometimes useful (depends on the analysis task)
Otherwise: rubbish in → rubbish out
7
MSA methods
Finding the optimal multiple sequence alignment is a computationally hard task.
The “correct” (optimal) answer could in principle always be obtained by extending the dynamic programming algorithm to multiple sequences.
In practice, dynamic programming cannot be applied to realistic MSA problems: its running time and memory grow exponentially with the number of sequences.
We therefore need approximate solutions (heuristics).
http://en.wikipedia.org/wiki/Multiple_sequence_alignment#Dynamic_programming_and_computational_complexity
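The exponential growth can be made concrete with a small calculation. This is a sketch assuming the straightforward extension of pairwise dynamic programming: for N sequences of length L the table has (L + 1)^N cells, and each cell is computed from up to 2^N - 1 neighbouring cells (one for every non-empty subset of sequences that advances by one residue). The function name `msa_dp_cost` is hypothetical, and the count ignores the cost of scoring each column.

```python
def msa_dp_cost(length: int, n_seqs: int) -> tuple[int, int]:
    """Number of cells and cell-to-predecessor steps in exact N-dimensional alignment DP."""
    cells = (length + 1) ** n_seqs   # one cell per combination of prefix lengths
    predecessors = 2 ** n_seqs - 1   # non-empty subsets of sequences that consume a residue
    return cells, cells * predecessors


for n in (2, 3, 5, 10):
    cells, steps = msa_dp_cost(300, n)
    print(f"{n:2d} sequences of length 300: {cells:.2e} cells, {steps:.2e} steps")
```

Two sequences of length 300 need about 10^5 cells, while ten such sequences already need more than 10^24, which is why heuristics are used instead.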
8
MSA methods: heuristics
Progressive alignment (not much used)
Iterative alignment (most popular)
Hidden Markov models
Pattern-based methods
9
Progressive alignment
Divide the unsolvable task into subtasks that can be solved
First align the most similar pairs of sequence sets
  – A sequence set can contain one or many sequences
  – At first, the sets contain only single sequences
Move progressively to bigger sets and to more difficult (less similar) pairs of sets
Always align only two sets at a time
10
Progressive alignment
Produce pairwise alignments between all the sequences you want to include in the MSA.
  – Dynamic programming, ktup (k-tuple) methods, …
Produce a “guide tree” from the pairwise distances calculated from the pairwise alignments.
  – UPGMA, neighbor joining
Produce the MSA using the guide tree.
  – Sequences are aligned in the order given by the guide tree.
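The first step can be sketched with the classic Needleman-Wunsch dynamic programming algorithm for global pairwise alignment, followed by a simple distance: the proportion of differing sites among gap-free aligned columns. This is a minimal Python sketch; the scoring values (match = 1, mismatch = -1, linear gap = -2) and the function names are illustrative assumptions, not the settings of any particular MSA program, which typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(i, j),  # (mis)match
                          F[i - 1][j] + gap,            # gap in b
                          F[i][j - 1] + gap)            # gap in a
    # Traceback from the bottom-right corner, preferring diagonal moves
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(x: str, y: str) -> float:
    """Proportion of differing sites among aligned columns that contain no gap."""
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    return sum(p != q for p, q in pairs) / len(pairs)


score, x, y = needleman_wunsch("ACGTACGTCC", "ACCTACGTCC")  # human vs chimp
print(score, x, y, p_distance(x, y))  # 8 ACGTACGTCC ACCTACGTCC 0.1
```

Running this for every pair of sequences gives the distance matrix from which the guide tree is built.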
11
Figure: progressive alignment workflow. Set of sequences → all-against-all pairwise alignment (demonstrated here for the first sequence) → pairwise similarities from the alignments → cluster tree from the similarities → sequences joined in the order obtained from the cluster tree.
12
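Guide tree construction: UPGMA
Unweighted Pair Group Method with Arithmetic mean
Repeatedly joins the two closest clusters; the distance from the new cluster to each remaining cluster is the average of the original pairwise distances between their members
One of the fastest tree construction methods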
13
An example: Pairwise alignments
14
Pairwise distances, based on the pairwise alignments
Table columns: number of nucleotide differences; absolute distances (used in Pileup/Clustal); JC distance
15
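UPGMA based on JC distances*
Figure: UPGMA tree; the first pair joined (human and chimp, JC distance 0.107) gets branch lengths of 0.107 / 2.
*JC distances = Jukes-Cantor distances. The observed distances, D, are corrected for multiple substitutions with the correction function
d = -(3/4) ln(1 - (4/3) D)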
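The distance measures above can be sketched in a few lines of Python. The sketch assumes ungapped, equal-length DNA sequences, as in the example on the "Constructing MSA" slide, and the function names are hypothetical. With the example sequences it reproduces the JC distances used in the UPGMA example (0.107, 0.383, and 0.2326, which the slides truncate to 0.232).

```python
import math


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def jukes_cantor(a: str, b: str) -> float:
    """JC distance d = -(3/4) ln(1 - (4/3) D), with D the observed proportion of differences."""
    D = count_differences(a, b) / len(a)
    if D >= 0.75:
        raise ValueError("the JC distance is undefined for D >= 0.75")
    return -0.75 * math.log(1 - 4 * D / 3)


human, chimp, gorilla = "ACGTACGTCC", "ACCTACGTCC", "ACCACCGTCC"
print(round(jukes_cantor(human, chimp), 3))    # 0.107 (D = 1/10)
print(round(jukes_cantor(human, gorilla), 3))  # 0.383 (D = 3/10)
print(round(jukes_cantor(chimp, gorilla), 3))  # 0.233 (D = 2/10)
```

The correction always makes d larger than D, and the gap widens as D grows, because more observed differences mean more hidden multiple substitutions at the same site.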
16
UPGMA, distance updates
d((human, chimp), gorilla) = [d(human, gorilla) + d(chimp, gorilla)] / 2 = [0.383 + 0.232] / 2 = 0.3075
17
UPGMA
19
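Node U joins the clusters (human & chimp) and (gorilla & orangutan), whose distance is 0.3923:
d((human & chimp), U) = 0.3923 / 2 ≈ 0.1962
d((gorilla & orangutan), U) = 0.3923 / 2 ≈ 0.1962
Branch from U to the (human & chimp) node: 0.1962 - 0.0537 ≈ 0.1426
Branch from U to the (gorilla & orangutan) node: 0.1962 - 0.116 ≈ 0.080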
20
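UPGMA
The last sequence (macaque) joins the tree at distance 0.7083, so the root lies at 0.7083 / 2 ≈ 0.3541.
Branch from the root to U: 0.3541 - 0.1426 - 0.0537, or equivalently 0.3541 - 0.080 - 0.116 (≈ 0.158 either way)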
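The UPGMA steps on these slides can be reproduced with a short Python sketch. It computes JC distances for the four great-ape sequences from the "Constructing MSA" slide, repeatedly joins the closest pair of clusters, and sets the distance from a new cluster to every other cluster to the size-weighted mean of the old distances (which equals the average over all leaf pairs). Node heights are half the joining distance. The function names are hypothetical, and ties are broken by taking the first pair found. The macaque is left out because, with these sequences, the orangutan-macaque distance exactly ties the gorilla-orangutan distance at the second step, so the result would depend on the tie-breaking rule.

```python
import math
from itertools import combinations


def jc_distance(a: str, b: str) -> float:
    """Jukes-Cantor distance for two equal-length, ungapped DNA sequences."""
    p = sum(x != y for x, y in zip(a, b)) / len(a)  # observed proportion of differences
    return -0.75 * math.log(1 - 4 * p / 3)


def upgma(seqs: dict[str, str]) -> tuple[str, dict[str, float]]:
    """Return (Newick-style topology, node height for every cluster label)."""
    names = list(seqs)
    dist = {frozenset(pair): jc_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(names, 2)}
    size = {name: 1 for name in names}      # active clusters -> number of leaves
    height = {name: 0.0 for name in names}  # leaves sit at height 0
    while len(size) > 1:
        # Join the closest pair of active clusters (the first pair wins on ties)
        a, b = min(combinations(size, 2), key=lambda pair: dist[frozenset(pair)])
        new = f"({a},{b})"
        height[new] = dist[frozenset((a, b))] / 2
        na, nb = size.pop(a), size.pop(b)
        # Distance update: size-weighted mean = average over all leaf pairs
        for c in size:
            dist[frozenset((new, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        size[new] = na + nb
    root = next(iter(size))
    return root, height


apes = {
    "human": "ACGTACGTCC",
    "chimp": "ACCTACGTCC",
    "gorilla": "ACCACCGTCC",
    "orangutan": "ACCCCCCTCC",
}
root, height = upgma(apes)
print(root)  # ((human,chimp),(gorilla,orangutan))
for node in ("(human,chimp)", "(gorilla,orangutan)", root):
    print(node, round(height[node], 4))
```

The node heights come out at about 0.0537, 0.116 and 0.196, matching the slides up to rounding. Branch lengths are differences of heights: for example, `height[root] - height["(human,chimp)"]` is about 0.143.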
21
Constructing MSA
human     ACGTACGTCC
chimp     ACCTACGTCC
gorilla   ACCACCGTCC
orangutan ACCCCCCTCC
macaque   CCCCCCCCCC
Following the guide tree, (human, chimp) and (gorilla, orangutan) are aligned first, then these two groups are aligned with each other, and finally the macaque is added.
22
Alignment score
Columns: 1234
         ACGT
         ACGA
         AGGA
Scoring: match = 1, mismatch = 0
Column 1: A-A + A-A + A-A = 1 + 1 + 1 = 3
Column 2: C-C + C-G + C-G = 1 + 0 + 0 = 1
Column 3: G-G + G-G + G-G = 1 + 1 + 1 = 3
Column 4: T-A + T-A + A-A = 0 + 0 + 1 = 1
S(alignment) = S(1) + S(2) + S(3) + S(4) = 3 + 1 + 3 + 1 = 8
The higher the score, the better the alignment.
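The score on this slide is a sum-of-pairs (SP) score: every pair of sequences is compared in every column, and the pair scores are added up. A minimal Python sketch using the slide's match = 1 and mismatch = 0 follows. Because the example has no gaps, the gap handling is an assumption: residue-gap pairs get a `gap` score (default 0) and gap-gap pairs are skipped. The function name is hypothetical.

```python
from itertools import combinations


def sum_of_pairs(alignment: list[str], match: int = 1, mismatch: int = 0, gap: int = 0) -> int:
    """Sum-of-pairs score: add up the score of every sequence pair in every column."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue          # gap-gap pairs are ignored (assumption)
            if x == "-" or y == "-":
                total += gap      # residue-gap pair (assumed gap score)
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


print(sum_of_pairs(["ACGT", "ACGA", "AGGA"]))  # 8, as in the worked example
```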
23
Progressive alignment: pros and cons
Pros:
  – Fast
Cons:
  – Once a gap has been inserted, it can never be removed
  – Errors in the alignment of the first few sequences can have catastrophic effects on the whole alignment
  – Not much used (to my knowledge)
24
Iterative alignment
Create a progressive alignment.
Calculate a quality score for the resulting alignment.
Repeat the following steps:
  – Redo the cluster tree
  – Realign the sequences using the new cluster tree
  – Calculate a quality score
The loop stops when a maximum number of iterations is reached or when the quality score no longer improves.
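The loop can be written as a generic Python skeleton. This is only a sketch of the control flow described on the slide: the four callables (`progressive_align`, `tree_from_alignment`, `align_with_tree`, `quality`) are hypothetical placeholders for a real aligner's components, and keeping the previous alignment when the score does not improve is an assumption.

```python
def iterative_refinement(seqs, progressive_align, tree_from_alignment,
                         align_with_tree, quality, max_iter=10):
    """Generic refinement loop; the four callables are supplied by the caller."""
    alignment = progressive_align(seqs)          # initial progressive alignment
    best = quality(alignment)                    # its quality score
    for _ in range(max_iter):                    # stop after max_iter rounds ...
        tree = tree_from_alignment(alignment)    # redo the cluster tree
        candidate = align_with_tree(seqs, tree)  # realign using the new tree
        score = quality(candidate)               # calculate a quality score
        if score <= best:                        # ... or when the score stops improving
            break
        alignment, best = candidate, score
    return alignment, best


# Toy stand-ins, purely illustrative: an "alignment" is just an integer that
# the fake realignment step improves until it reaches 3.
result = iterative_refinement(
    seqs=None,
    progressive_align=lambda s: 0,
    tree_from_alignment=lambda a: a,
    align_with_tree=lambda s, t: min(t + 1, 3),
    quality=lambda a: a,
)
print(result)  # (3, 3)
```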
25
Iterative alignment
Allows correction of errors that cannot be corrected in progressive alignment
Very popular among MSA methods
Increases the running time of the method
26
Iterative alignment
Figure: diagram of a typical iterative MSA program workflow, including the iteration loop. Figure from Do & Katoh 2008: http://ai.stanford.edu/~chuongdo/papers/alignment_review.pdf
27
Which MSA program(s) to use?
Depends on the application
  – Phylogenetic studies
  – Structure-based studies
Depends on the size of the data
  – Some programs cannot handle large datasets
Remember to evaluate the alignment by eye
28
Which MSA program(s) to use?
A collection of MSA programs is available at EBI: http://www.ebi.ac.uk/Tools/msa/
29
Summary of MSA
MSA is relevant for many analysis tasks
  – Improved signal from the alignment
Solving MSA requires heuristics
The choice of MSA method depends on the application
Results should be evaluated by eye
  – Errors should be corrected with MSA editors
30
Manual editing of MSAs?
Suppose you have computed an MSA, but biologically it contains some errors: it needs manual editing.
Editors: Jalview and Seaview
http://www.csc.fi/english/research/sciences/bioscience/programs/index_html
Input data can be in any of the most common MSA formats (Mase, Phylip, Clustal, MSF, Fasta, NEXUS, PIR and BCL).