Published by Tyler McMillan. Modified over 10 years ago.
1
Methods course: Multiple sequence alignment and reconstruction of phylogenetic trees. Burkhard Morgenstern, Fabian Schreiber. Göttingen, October/November 2007
2
Tools for multiple sequence alignment: Multiple alignment is the basis of (almost) all methods for sequence analysis in bioinformatics.
3
Tools for multiple sequence alignment
T Y I M R E A Q Y E
T C I V M R E A Y E
4
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
5
Tools for multiple sequence alignment
T Y I M R E A Q Y E
T C I V M R E A Y E
Y I M Q E V Q Q E
Y I A M R E Q Y E
6
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
Y - I - M Q E V Q Q E
Y - I A M R E - Q Y E
7
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
- Y I - M Q E V Q Q E
Y - I A M R E - Q Y E
Astronomical number of possible alignments!
8
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V - M R E A Y E
- Y I - M Q E V Q Q E
Y - I A M R E - Q Y E
Astronomical number of possible alignments!
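To give a feeling for "astronomical", here is a small sketch that counts the gapped alignments of just two sequences. It assumes the usual convention that a column may not consist of two gaps; the function name `count_alignments` is made up for this illustration and does not come from the slides.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Number of global alignments of two sequences with n and m residues."""
    if n == 0 or m == 0:
        return 1  # only one way: the remaining residues all face gaps
    # The last column is (residue, gap), (gap, residue) or (residue, residue).
    return (count_alignments(n - 1, m)
            + count_alignments(n, m - 1)
            + count_alignments(n - 1, m - 1))

# Two sequences of 10 residues each, as in the first two rows of the slide.
pairwise = count_alignments(10, 10)
print(pairwise)
```

The count already runs into the millions for two short sequences and grows much faster when more sequences are added, which is why alignments cannot simply be enumerated.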
9
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
- Y I - M Q E V Q Q E
Y - I A M R E - Q Y E
Which one is the best?
10
Tools for multiple sequence alignment
Questions in development of alignment programs:
(1) What is a good alignment? → objective function ("score")
(2) How to find a good alignment? → optimization algorithm
11
Tools for multiple sequence alignment: What is a biologically good alignment?
12
Tools for multiple sequence alignment Criteria for alignment quality: 1. 3D-Structure: align residues at corresponding positions in 3D structure of protein! 2. Evolution: align residues with common ancestors!
13
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M - R E A Y E
- Y I - M Q E V Q Q E
- Y I A M R E - Q Y E
An alignment is a hypothesis about sequence evolution. Search for the most plausible hypothesis!
14
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V - M R E A Y E
- Y I - M Q E V Q Q E
- Y I A M R E - Q Y E
An alignment is a hypothesis about sequence evolution. Search for the most plausible hypothesis!
15
Tools for multiple sequence alignment
Compute for amino acids a and b:
Probability $p_{a,b}$ of substitution $a \to b$ (or $b \to a$)
Frequency $q_a$ of a
Define similarity score s(a,b) based on $p_{a,b}$ and $q_a$.
Result: similarity matrix (substitution matrix), e.g. PAM (Dayhoff matrix), BLOSUM, …
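The slide does not say how s(a,b) is derived from the probabilities. A common choice, assumed here, is the log-odds ratio s(a,b) = log( p(a,b) / (q(a) q(b)) ). The numbers below are invented for illustration and are not taken from PAM or BLOSUM.

```python
import math

def log_odds_score(p_ab, q_a, q_b):
    """Log-odds similarity (base 2): positive if a and b are paired
    more often than expected by chance, negative if less often."""
    return math.log2(p_ab / (q_a * q_b))

# Hypothetical values, NOT real matrix data.
q = {"L": 0.10, "I": 0.06}   # assumed background frequencies
p_LI = 0.012                 # assumed probability of L paired with I

score = log_odds_score(p_LI, q["L"], q["I"])
print(round(score, 3))
```

With these made-up numbers the pair occurs twice as often as chance would predict, so the score is positive.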
17
Tools for multiple sequence alignment
18
Traditional objective functions: Define the score of an alignment as the sum of the individual similarity scores s(a,b) of aligned amino acid residues, minus a gap penalty g for each gap in the alignment. An optimal alignment can be calculated for two sequences, but in practice not for more than about 8 sequences.
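For two sequences the optimum can be found by dynamic programming. The slide does not name the algorithm. The sketch below assumes the standard recursion with a linear gap penalty (Needleman-Wunsch style) and a toy scoring function; `global_alignment_score` and `toy_score` are illustrative names only.

```python
def global_alignment_score(x, y, s, g):
    """Best global alignment score of x and y.

    s(a, b) is the similarity score, g the penalty per gap character.
    """
    n, m = len(x), len(y)
    # F[i][j] = best score for aligning x[:i] with y[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * g
    for j in range(1, m + 1):
        F[0][j] = -j * g
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # pair two residues
                F[i - 1][j] - g,                          # gap in y
                F[i][j - 1] - g,                          # gap in x
            )
    return F[n][m]

def toy_score(a, b):
    # Assumed toy scheme: +1 for identical residues, -1 otherwise.
    return 1 if a == b else -1

best = global_alignment_score("TYWIV", "TLV", toy_score, g=1)
print(best)
```

The table has (n+1)(m+1) cells. The same recursion for N sequences needs one table dimension per sequence, which is why exact optimization becomes impractical beyond a handful of sequences.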
19
Example:
T Y W I V
T - - L V
Score = s(T,T) + s(I,L) + s(V,V) - 2g
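The example score can be checked mechanically. This sketch sums s(a,b) over residue pairs and subtracts g for every column that contains a gap. The values in `s_table` and the value of `g` are made up for illustration and do not come from a real substitution matrix.

```python
def alignment_score(row1, row2, s, g):
    """Score of a given pairwise alignment; '-' marks a gap."""
    assert len(row1) == len(row2), "aligned rows must have equal length"
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total -= g          # gap penalty per gap character
        else:
            total += s(a, b)    # similarity of the aligned residues
    return total

# Hypothetical similarity values for the three residue pairs in the example.
s_table = {("T", "T"): 5, ("I", "L"): 2, ("V", "V"): 4}
g = 3

score = alignment_score("TYWIV", "T--LV", lambda a, b: s_table[(a, b)], g)
print(score)  # s(T,T) + s(I,L) + s(V,V) - 2g with the assumed values
```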
20
Tools for multiple sequence alignment
Most commonly used heuristic for multiple alignment: progressive alignment (mid 1980s).
Idea: calculate the multiple alignment as a series of pairwise alignments of sequences and profiles.
Use a guide tree to determine the order of the pairwise alignments.
21
"Progressive" alignment
WCEAQTKNGQGWVPSNYITPVN
WWRLNDKEGYVPRNLLGLYP
AVVIQDNSDIKVVPKAKIIRD
YAVESEAHPGSFQPVAALERIN
WLNYNETTGERGDFPGTYVEYIGRKKISP
22
"Progressive" alignment
WCEAQTKNGQGWVPSNYITPVN
WWRLNDKEGYVPRNLLGLYP
AVVIQDNSDIKVVPKAKIIRD
YAVESEAHPGSFQPVAALERIN
WLNYNETTGERGDFPGTYVEYIGRKKISP
Guide tree
23
"Progressive" alignment
WCEAQTKNGQGWVPSNYITPVN
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASFQPVAALERIN
WLNYNEERGDFPGTYVEYIGRKKISP
Profile alignment, once a gap - always a gap
24
"Progressive" alignment
WCEAQTKNGQGWVPSNYITPVN
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASVQ--PVAALERIN------
WLN-YNEERGDFPGTYVEYIGRKKISP
Profile alignment, once a gap - always a gap
25
"Progressive" alignment
WCEAQTKNGQGWVPSNYITPVN-
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASVQ--PVAALERIN------
WLN-YNEERGDFPGTYVEYIGRKKISP
Profile alignment, once a gap - always a gap
26
"Progressive" alignment
WCEAQTKNGQGWVPSNYITPVN--------
WW--RLNDKEGYVPRNLLGLYP--------
AVVIQDNSDIKVVP--KAKIIRD-------
YAVESEA---SVQ--PVAALERIN------
WLN-YNE---ERGDFPGTYVEYIGRKKISP
Profile alignment, once a gap - always a gap
27
CLUSTAL W: Most important software program: CLUSTAL W, J. Thompson, T. Gibson, D. Higgins (1994, Nuc. Acids Res.) (22,327 citations in the literature, Oct 2007)
28
Tools for multiple sequence alignment
Problems with traditional approach:
Results depend on gap penalty.
Heuristic guide tree determines alignment; alignment used for phylogeny reconstruction.
Algorithm produces global alignments.
29
Tools for multiple sequence alignment Problems with traditional approach: But: Many sequence families share only local similarity E.g. sequences share one conserved motif
30
Local sequence alignment: Find common motif in sequences; ignore the rest
EYENS
ERYENS
ERYAS
31
Local sequence alignment: Find common motif in sequences; ignore the rest
E-YENS
ERYENS
ERYA-S
32
Local sequence alignment: Find common motif in sequences; ignore the rest - local alignment
E-YENS
ERYENS
ERYA-S
33
Gibbs Motif Sampler: Local multiple alignment without gaps, e.g. Gibbs sampling, C.E. Lawrence et al. (1993, Science)
34
Traditional alignment approaches: Either global or local methods!
35
New question: sequence families with multiple local similarities. Neither local nor global methods applicable.
36
New question: sequence families with multiple local similarities. Alignment possible if order conserved.
37
The DIALIGN approach: Morgenstern, Dress, Werner (1996, Proc. Natl. Acad. Sci.). Combination of global and local methods. Assemble multiple alignment from gap-free local pairwise alignments ("fragments").
38
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
44
The DIALIGN approach
atc------taatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
45
The DIALIGN approach
atc------taatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaa--gagtatcacccctgaattgaataa
46
The DIALIGN approach
atc------taatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaa--gagtatcacc----------cctgaattgaataa
47
The DIALIGN approach
atc------taatagttaaactcccccgtgc-ttag
cagtgcgtgtattactaac----------gg-ttcaatcgcg
caaa--gagtatcacc----------cctgaattgaataa
48
The DIALIGN approach
atc------taatagttaaactcccccgtgc-ttag
cagtgcgtgtattactaac----------gg-ttcaatcgcg
caaa--gagtatcacc----------cctgaattgaataa
Consistency!
49
The DIALIGN approach
atc------TAATAGTTAaactccccCGTGC-TTag
cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg
caaa--GAGTATCAcc----------CCTGaaTTGAATaa
50
The DIALIGN approach Advantages of segment-based approach: Program can produce global and local alignments! Sequence families alignable that cannot be aligned with standard methods
51
T-COFFEE C. Notredame, D. Higgins, J. Heringa (2000, J. Mol. Biol.) Combination of global and local methods
52
T-COFFEE
SeqA GARFIELD THE LAST FAT CAT
SeqB GARFIELD THE FAST CAT
SeqC GARFIELD THE VERY FAST CAT
SeqD THE FAT CAT
54
T-COFFEE
55
Mixing Heterogeneous Data With T-Coffee [Figure labels: Local Alignment, Global Alignment, Multiple Alignment, Structural, Specialist; output: Multiple Sequence Alignment]
57
T-COFFEE idea: 1. Build a library of pairwise alignments. 2. Alignments of seq i with seq j and of seq j with seq k support the alignment of seq i with seq k.
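A simplified reading of step 2, not the program's actual code: if residue x is paired with y, and y with z, the pair (x, z) receives extra weight. The sketch assumes the extra weight is the smaller of the two supporting weights, and it represents residues as `(sequence_name, position)` tuples; the function name, the data layout and the weights are all hypothetical.

```python
from collections import defaultdict

def extend_library(library):
    """library maps (residue, residue) -> weight from pairwise alignments.

    Returns a dict frozenset({residue, residue}) -> extended weight.
    """
    # Symmetric lookup: neighbours[x][y] = weight of the pair (x, y)
    neighbours = defaultdict(dict)
    for (x, y), w in library.items():
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = defaultdict(float)
    for (x, y), w in library.items():
        extended[frozenset((x, y))] += w              # direct evidence
    for y, linked in neighbours.items():              # y is the intermediate
        residues = list(linked)
        for i, x in enumerate(residues):
            for z in residues[i + 1:]:
                if x[0] == z[0]:
                    continue                          # same sequence: skip
                extended[frozenset((x, z))] += min(linked[x], linked[z])
    return dict(extended)

# Made-up weights for one residue in each of three sequences A, B, C.
library = {
    (("A", 3), ("B", 2)): 80,
    (("B", 2), ("C", 5)): 60,
    (("A", 3), ("C", 5)): 20,
}
extended = extend_library(library)
print(extended[frozenset({("A", 3), ("C", 5)})])
```

In this toy case the weak direct pair A3/C5 is strengthened because both residues are well aligned with B2.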
58
T-COFFEE: Less sensitive to spurious pairwise similarities. Can handle local homologies better than CLUSTAL.
59
Evaluation of multi-alignment methods: Alignment evaluation by comparison to trusted benchmark alignments. "True" alignment known from information about structure or evolution.
60
Evaluation of multi-alignment methods
BAliBASE reference alignments
1aboA  1 .NLFVALYDfvasgdntlsitkGEKLRVLgynhn..............gE
1ycsB  1 kGVIYALWDyepqnddelpmkeGDCMTIIhrede............deiE
1pht   1 gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG
1ihvA  1 .NFRVYYRDsrd......pvwkGPAKLLWkg.................eG
1vie   1 .drvrkksga.........awqGQIVGWYctnlt.............peG

1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
1pht  51 WLNGYnettgerGDFPGTYVEYIGrkkisp
1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
1vie  28 YAVESeahpgsvQIYPVAALERIN......
Key: alpha helix RED, beta strand GREEN, core blocks UNDERSCORE
61
Result: DIALIGN best method for distantly related sequences, T-Coffee best for globally related proteins
62
Evaluation of multi-alignment methods Conclusion: no single best multi alignment program! Advice: try different methods!
63
Tools for phylogeny reconstruction
Two approaches covered in this course: distance methods (e.g. Neighbour-Joining) and Maximum Likelihood.
Other important methods (not covered in this course): Maximum parsimony, Bayesian approaches.
64
Tools for phylogeny reconstruction
Phylogenetic trees: rooted trees, unrooted trees.
Many methods produce unrooted trees: find the root using an outgroup!
65
Phylogenetic Reconstruction: An Example
Biological question: Are sponges mono-/paraphyletic?
Organisms of interest: sponges
66
Build dataset
Query sequence: DNA/protein sequence from a sponge gene
Search for homologs using e.g. BLAST
Hits from search: putative homologs → dataset
67
Sequence alignment
Input: dataset (hits from search: putative homologs)
Alignment tools: ClustalW, T-Coffee, DIALIGN, ... many more
Use the alignment to bring the sequences into relation.
68
Estimate phylogeny: alignment → phylogenetic tree
Phylogeny methods:
Distance-based: NJ, UPGMA
Parsimony: Max. Parsimony (Phylip/Paup)
Statistical: Max. Likelihood (Phyml), Bayesian Inf. (MrBayes)
69
Interpret results
Hypothesis: Sponges are monophyletic
70
Tools for phylogeny reconstruction
Distance methods: For N sequences $S_1, \dots, S_N$:
Calculate the distance d(i,j) for any two sequences $S_i$ and $S_j$.
Goal: find the tree that represents all distances d(i,j) as closely as possible.
To calculate the distances d(i,j): construct a multiple alignment of the input sequences and consider the substitutions implied by the alignment.
71
Matrix of pairwise distances d(i,j)
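One simple way to obtain such a matrix from an alignment, sketched under assumptions: count the fraction of mismatching columns for each pair (ignoring columns with gaps) and correct it for multiple substitutions with the Jukes-Cantor formula. The slides do not say which substitution model is used, so the Jukes-Cantor correction is an assumption, and the DNA sequences are invented.

```python
import math

def p_distance(row_i, row_j):
    """Fraction of mismatching columns; columns with a gap are ignored."""
    pairs = [(a, b) for a, b in zip(row_i, row_j) if a != "-" and b != "-"]
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor corrected distance for DNA (only defined for p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)

def distance_matrix(alignment):
    """alignment: dict name -> aligned sequence. Returns d[(name_i, name_j)]."""
    names = list(alignment)
    return {
        (m, n): jukes_cantor(p_distance(alignment[m], alignment[n]))
        for m in names for n in names
    }

# Toy alignment, invented for illustration.
alignment = {
    "S1": "ACGTACGTAC",
    "S2": "ACGTACGTTC",
    "S3": "AC-TACGATC",
}
d = distance_matrix(alignment)
print(round(d[("S1", "S2")], 4))
```

The corrected distance is slightly larger than the raw mismatch fraction because some substitutions hide earlier ones at the same position.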
72
Find tree that corresponds to distances d(i,j)
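Neighbour-Joining, named earlier as the distance method of this course, is one way to do this. The sketch below follows the textbook formulation of the algorithm and is not taken from the slides; internal node names such as `n1` and the toy distances are made up. The distances were chosen so that they fit a tree exactly.

```python
from itertools import combinations

def neighbour_joining(names, d):
    """names: list of taxa; d[(a, b)] given for every ordered pair a != b.

    Returns a list of edges (node, node, branch_length) of an unrooted tree.
    """
    D = {a: {b: d[(a, b)] for b in names if b != a} for a in names}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        counter += 1
        u = f"n{counter}"                      # new internal node
        li = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        edges += [(i, u, li), (j, u, lj)]
        # Distances from the new node to all remaining nodes
        D[u] = {}
        for k in nodes:
            if k not in (i, j):
                D[u][k] = D[k][u] = (D[i][k] + D[j][k] - D[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, D[a][b]))
    return edges

# Toy distances that fit the tree ((A:1, B:2):1, (C:3, D:4)) exactly.
pairs = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
         ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 7}
d = {**pairs, **{(b, a): v for (a, b), v in pairs.items()}}
tree = neighbour_joining(["A", "B", "C", "D"], d)
for edge in tree:
    print(edge)
```

For real data the distances rarely fit any tree exactly, and the method then returns a tree that approximates them.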
73
Tools for phylogeny reconstruction
Maximum likelihood: Consider the evolution of sequences as a random process. A stochastic model assigns probabilities to substitutions. Consider a tree T as a hypothesis about the observed sequence data D. Search for the tree with the highest likelihood P(D|T).
74
Tools for phylogeny reconstruction
Assumptions: Positions in sequences (columns in the alignment) are independent of each other. Events on different branches of the tree are independent of each other. Result: probabilities can be multiplied.
75
Probability P(D|T) for given residues at internal nodes
78
Consider all possible residues for internal nodes
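The two steps above, multiplying probabilities along branches and summing over all possible residues at internal nodes, can be sketched for DNA as follows. The sketch assumes the Jukes-Cantor substitution model and a rooted toy tree; neither the model nor the tree encoding (nested lists of `(child, branch_length)` pairs) comes from the slides, and gaps are not handled.

```python
import math

BASES = "ACGT"

def jc_prob(a, b, t):
    """Jukes-Cantor probability that base a is replaced by b over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def partial(node, column):
    """Return {base: P(leaves below node | node carries this base)}.

    A node is a leaf name (str) or a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {x: 1.0 if x == column[node] else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in node:
        below = partial(child, column)
        for x in BASES:
            # Sum over all possible residues y at the child node
            result[x] *= sum(jc_prob(x, y, t) * below[y] for y in BASES)
    return result

def column_likelihood(tree, column):
    root = partial(tree, column)
    return sum(0.25 * root[x] for x in BASES)  # uniform base frequencies

def likelihood(tree, alignment):
    """P(D|T) as a product over alignment columns (assumed independent)."""
    length = len(next(iter(alignment.values())))
    total = 1.0
    for c in range(length):
        column = {name: seq[c] for name, seq in alignment.items()}
        total *= column_likelihood(tree, column)
    return total

# Toy tree ((S1:0.1, S2:0.1):0.05, S3:0.2) and invented sequences.
tree = [([("S1", 0.1), ("S2", 0.1)], 0.05), ("S3", 0.2)]
alignment = {"S1": "ACGT", "S2": "ACGA", "S3": "ACTT"}
L = likelihood(tree, alignment)
print(L)
```

A maximum likelihood program would evaluate this quantity for many candidate trees and branch lengths and keep the best one. Real implementations work with log-likelihoods to avoid numerical underflow on long alignments.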
79
Testing the reliability of a tree (or parts of it): the bootstrap approach
Bootstrap in general: repeat a statistical analysis on data sets obtained by random re-sampling (with replacement) from the original sample.
In phylogeny:
1. Randomly draw columns from the alignment, with replacement, until the re-sampled alignment has as many columns as the original, and repeat tree reconstruction with the same method (e.g. 1000 times).
2. Calculate for every branch: how often is it observed in the newly constructed trees?
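A minimal sketch of the procedure, under stated assumptions: branches are represented as splits (sets of taxa on one side of the branch), and the tree method is passed in as a function. `closest_pair_split` is only a toy stand-in for a real tree reconstruction method, and all names and sequences are hypothetical.

```python
import random
from itertools import combinations

def resample_columns(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}

def closest_pair_split(alignment):
    """Toy 'tree method': separate the two most similar sequences from the rest."""
    names = sorted(alignment)
    def hamming(a, b):
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    a, b = min(combinations(names, 2), key=lambda p: hamming(*p))
    side = {a, b}
    if names[0] not in side:        # always name a split by the side
        side = set(names) - side    # that contains the first taxon
    return {frozenset(side)}

def bootstrap_support(alignment, build_splits, replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which each original split recurs."""
    rng = random.Random(seed)
    reference = build_splits(alignment)
    counts = {s: 0 for s in reference}
    for _ in range(replicates):
        found = build_splits(resample_columns(alignment, rng))
        for s in reference:
            if s in found:
                counts[s] += 1
    return {s: c / replicates for s, c in counts.items()}

# Invented alignment in which A/B and C/D are clearly separated.
alignment = {"A": "ACGTAC", "B": "ACGTAC", "C": "TGCATG", "D": "TGCATG"}
support = bootstrap_support(alignment, closest_pair_split, replicates=200)
print(support)
```

With a real method, `build_splits` would return every internal branch of the reconstructed tree, and branches with low support would be treated as unreliable.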