Published by Tyler Malone. Modified over 10 years ago.
1
Bioinformatics Methods Course
Multiple Sequence Alignment
Burkhard Morgenstern
University of Göttingen, Institute of Microbiology and Genetics, Department of Bioinformatics
Göttingen, October/November 2006
2
Tools for multiple sequence alignment
T Y I M R E A Q Y E
T C I V M R E A Y E
3
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
4
Tools for multiple sequence alignment
T Y I M R E A Q Y E
T C I V M R E A Y E
Y I M Q E V Q Q E
Y I A M R E Q Y E
5
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
Y - I - M Q E V Q Q E
Y - I A M R E - Q Y E
6
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
- Y I - M Q E V Q Q E
Y - I A M R E - Q Y E
Astronomical number of possible alignments!
7
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V - M R E A Y E
- Y I - M Q E V Q Q E
Y - I A M R E - Q Y E
Astronomical number of possible alignments!
8
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V M R E A - Y E
- Y I - M Q E V Q Q E
Y - I A M R E - Q Y E
Which one is the best?
9
Tools for multiple sequence alignment
Questions in the development of alignment programs:
(1) What is a good alignment? -> objective function ("score")
(2) How to find a good alignment? -> optimization algorithm
The first question is far more important!
10
Tools for multiple sequence alignment
Before defining an objective function (scoring scheme): what is a biologically good alignment?
11
Tools for multiple sequence alignment
Criteria for alignment quality:
1. 3D structure: align residues at corresponding positions in the 3D structure of the protein!
2. Evolution: align residues with common ancestors!
12
Tools for multiple sequence alignment
T Y I - M R E A Q Y E
T C I V - M R E A Y E
- Y I - M Q E V Q Q E
- Y I A M R E - Q Y E
An alignment is a hypothesis about sequence evolution.
Search for the most plausible hypothesis!
13
Tools for multiple sequence alignment
For amino acids $a$ and $b$, compute:
the probability $p_{a,b}$ of the substitution $a \to b$ (or $b \to a$),
the frequency $q_a$ of $a$.
Define $s(a,b) = \log\left( \frac{p_{a,b}}{q_a \, q_b} \right)$
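As a rough illustration of the log-odds definition above, the following sketch estimates s(a,b) from a list of aligned residue pairs. The function name `log_odds_scores` and the toy input are assumptions made for this example; real substitution matrices are estimated from large curated alignments with additional corrections.

```python
import math
from collections import Counter

def log_odds_scores(aligned_pairs):
    """Estimate s(a,b) = log(p_ab / (q_a * q_b)) from observed aligned pairs.

    Each observed pair is counted half in each order, so p is a symmetric
    joint distribution over ordered pairs and q is its marginal.
    """
    p = Counter()
    for a, b in aligned_pairs:
        p[(a, b)] += 0.5
        p[(b, a)] += 0.5
    total = sum(p.values())
    p = {pair: count / total for pair, count in p.items()}

    # Background frequency q_a is the marginal of the joint distribution.
    q = Counter()
    for (a, _), prob in p.items():
        q[a] += prob

    return {(a, b): math.log(prob / (q[a] * q[b])) for (a, b), prob in p.items()}

# Toy data (hypothetical): 6 A/A pairs, 2 B/B pairs, 2 A/B pairs.
toy_pairs = [("A", "A")] * 6 + [("B", "B")] * 2 + [("A", "B")] * 2
scores = log_odds_scores(toy_pairs)
```

In this toy data set identical residues are paired more often than chance would predict, so they receive positive scores, while the A/B substitution is rarer than chance and scores negative.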
14
Tools for multiple sequence alignment
16
Traditional objective functions:
Define the score of an alignment as the sum of the individual similarity scores s(a,b), minus a gap penalty g for each gap in the alignment.
Needleman-Wunsch scoring system (1970) for pairwise alignment (= alignment of two sequences)
17
T Y W I V
T - - L V
Example: Score = s(T,T) + s(I,L) + s(V,V) - 2g
18
T Y W I V
T - - L V
Idea: the alignment with the optimal (maximal) score is probably biologically meaningful.
A dynamic programming algorithm finds the optimal alignment of two sequences efficiently (Needleman and Wunsch, 1970).
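The dynamic programming idea can be sketched as follows, assuming a linear gap penalty g charged per gap position (as in the "- 2g" example above). The scoring values (match 2, mismatch -1, g = 1) are toy assumptions chosen for this illustration, not values from the slides.

```python
def needleman_wunsch(x, y, s, g):
    """Global pairwise alignment; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    # F[i][j] = best score for aligning x[:i] with y[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -g * i
    for j in range(1, m + 1):
        F[0][j] = -g * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # align two residues
                F[i - 1][j] - g,                          # gap in y
                F[i][j - 1] - g,                          # gap in x
            )
    # Trace back one optimal path (ties are broken in favour of the diagonal).
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - g:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))

def toy_score(a, b):
    # Hypothetical similarity function: +2 for identical residues, -1 otherwise.
    return 2 if a == b else -1

score, row1, row2 = needleman_wunsch("TYWIV", "TLV", toy_score, g=1)
```

With these toy parameters the optimum is not unique (L could equally be aligned with Y or W); the traceback shown returns one optimal alignment.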
19
Tools for multiple sequence alignment
Traditional objective functions can be generalized to multiple alignment (e.g. sum-of-pairs score, tree alignment).
The Needleman-Wunsch algorithm can also be generalized to find an optimal multiple alignment, but this is very time- and memory-consuming!
-> A heuristic algorithm is needed, i.e. a fast but sub-optimal solution.
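A minimal sketch of a sum-of-pairs score, assuming the same per-position gap penalty g as in the pairwise example and assuming that columns where both rows have a gap contribute nothing. The function name and the toy scoring values are illustrative assumptions.

```python
from itertools import combinations

def sum_of_pairs(rows, s, g):
    """Sum the pairwise scores of every pair of rows in a multiple alignment."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all alignment rows must have the same length")
    total = 0
    for r1, r2 in combinations(rows, 2):
        for a, b in zip(r1, r2):
            if a == "-" and b == "-":
                continue            # gap against gap: ignored
            if a == "-" or b == "-":
                total -= g          # residue against gap
            else:
                total += s(a, b)    # residue against residue
    return total

def toy_score(a, b):
    # Hypothetical similarity function: +2 for identical residues, -1 otherwise.
    return 2 if a == b else -1

sp = sum_of_pairs(["TYWIV", "T--LV", "T-WLV"], toy_score, g=1)
```

The number of row pairs grows quadratically with the number of sequences, but evaluating the score is cheap; the expensive part is optimizing it over all possible alignments.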
20
Tools for multiple sequence alignment Most commonly used heuristic for multiple alignment: Progressive alignment (mid 1980s)
21
"Progressive" alignment
WCEAQTKNGQGWVPSNYITPVN
WWRLNDKEGYVPRNLLGLYP
AVVIQDNSDIKVVPKAKIIRD
YAVESEAHPGSFQPVAALERIN
WLNYNETTGERGDFPGTYVEYIGRKKISP
22
"Progressive" alignment
WCEAQTKNGQGWVPSNYITPVN
WWRLNDKEGYVPRNLLGLYP
AVVIQDNSDIKVVPKAKIIRD
YAVESEAHPGSFQPVAALERIN
WLNYNETTGERGDFPGTYVEYIGRKKISP
Guide tree
23
"Progressive" alignment
WCEAQTKNGQGWVPSNYITPVN
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASFQPVAALERIN
WLNYNEERGDFPGTYVEYIGRKKISP
Profile alignment: once a gap, always a gap
24
"Progressive" alignment
WCEAQTKNGQGWVPSNYITPVN
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASVQ--PVAALERIN------
WLN-YNEERGDFPGTYVEYIGRKKISP
Profile alignment: once a gap, always a gap
25
"Progressive" alignment
WCEAQTKNGQGWVPSNYITPVN-
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASVQ--PVAALERIN------
WLN-YNEERGDFPGTYVEYIGRKKISP
Profile alignment: once a gap, always a gap
26
"Progressive" alignment
WCEAQTKNGQGWVPSNYITPVN--------
WW--RLNDKEGYVPRNLLGLYP--------
AVVIQDNSDIKVVP--KAKIIRD-------
YAVESEA---SVQ--PVAALERIN------
WLN-YNE---ERGDFPGTYVEYIGRKKISP
Profile alignment: once a gap, always a gap
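The "once a gap, always a gap" rule can be illustrated with a small sketch: when two profiles are aligned and a new gap column is needed, it is inserted into every row of the affected profile, and the gaps already present are never removed. The helper name `insert_gap_column` and the toy profile are assumptions for this illustration, not part of any particular program.

```python
def insert_gap_column(profile, col):
    """Insert a gap at position `col` into every row of an aligned group."""
    return [row[:col] + "-" + row[col:] for row in profile]

# Toy profile (hypothetical) of two already-aligned sequences.
profile = ["WCEAQ", "WW--R"]
padded_end = insert_gap_column(profile, 5)   # new column at the end
padded_mid = insert_gap_column(profile, 2)   # new column inside the profile
```

Because earlier gaps are frozen, an error made while aligning the first, most similar sequences cannot be corrected later in the progressive procedure.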
27
CLUSTAL W
Most important software program: CLUSTAL W
J. Thompson, T. Gibson, D. Higgins (1994), CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment … Nucleic Acids Res. 22, 4673-4680
(~20,000 citations in the literature)
28
Tools for multiple sequence alignment
Problems with the traditional approach:
Results depend on the gap penalty.
The heuristic guide tree determines the alignment; the alignment is then used for phylogeny reconstruction.
The algorithm produces global alignments.
29
Tools for multiple sequence alignment
Problems with the traditional approach:
But: many sequence families share only local similarity, e.g. the sequences share one conserved motif.
30
Local sequence alignment
Find the common motif in the sequences; ignore the rest
EYENS
ERYENS
ERYAS
31
Local sequence alignment
Find the common motif in the sequences; ignore the rest
E-YENS
ERYENS
ERYA-S
32
Local sequence alignment
Find the common motif in the sequences; ignore the rest: local alignment
E-YENS
ERYENS
ERYA-S
33
Gibbs Motif Sampler
Local multiple alignment without gaps:
C.E. Lawrence et al. (1993), Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment, Science 262, 208-214
34
Traditional alignment approaches: Either global or local methods!
35
New question: sequence families with multiple local similarities
Neither local nor global methods are applicable
36
New question: sequence families with multiple local similarities Alignment possible if order conserved
37
The DIALIGN approach
Morgenstern, Dress, Werner (1996), PNAS 93, 12098-12103
Combination of global and local methods
Assemble the multiple alignment from gap-free local pair-wise alignments ("fragments")
38
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
44
The DIALIGN approach
atc------taatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
45
The DIALIGN approach
atc------taatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaa--gagtatcacccctgaattgaataa
46
The DIALIGN approach
atc------taatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaa--gagtatcacc----------cctgaattgaataa
47
The DIALIGN approach
atc------taatagttaaactcccccgtgc-ttag
cagtgcgtgtattactaac----------gg-ttcaatcgcg
caaa--gagtatcacc----------cctgaattgaataa
48
The DIALIGN approach
atc------taatagttaaactcccccgtgc-ttag
cagtgcgtgtattactaac----------gg-ttcaatcgcg
caaa--gagtatcacc----------cctgaattgaataa
Consistency!
49
The DIALIGN approach
atc------TAATAGTTAaactccccCGTGC-TTag
cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg
caaa--GAGTATCAcc----------CCTGaaTTGAATaa
50
The DIALIGN approach
Multiple alignment:
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
51
The DIALIGN approach
Multiple alignment:
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
(1) Calculate all optimal pair-wise alignments
52
The DIALIGN approach
Multiple alignment:
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
(1) Calculate all optimal pair-wise alignments
54
The DIALIGN approach Fragments from optimal pair-wise alignments might be inconsistent
55
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
58
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaa--gagtatcacccctgaattgaataa
59
The DIALIGN approach
atc------taatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaa--gagtatcacccctgaattgaataa
60
The DIALIGN approach
atctaatagttaaactcccccgtgcttag
cagtgcgtgtattactaacggttcaatcgcg
caaagagtatcacccctgaattgaataa
61
The DIALIGN approach
Score of alignment:
Define a weight score for fragments based on their probability of random occurrence.
Score of alignment = sum of the weight scores of its fragments.
Goal: find a consistent set of fragments with maximum total weight.
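For two sequences, a set of gap-free fragments is consistent when the fragments can be ordered so that each one starts after the previous one ends in both sequences. The sketch below finds a maximum-weight chain of this kind by dynamic programming. It is an illustration of the pairwise case only, with hypothetical fragment weights; the multiple-sequence consistency problem that DIALIGN addresses is harder and is not implemented here.

```python
from typing import NamedTuple

class Fragment(NamedTuple):
    start1: int    # start position in sequence 1
    start2: int    # start position in sequence 2
    length: int    # fragment length (gap-free, so the same in both sequences)
    weight: float  # hypothetical weight score, assumed positive

def best_chain(fragments):
    """Return (total_weight, chain) for a maximum-weight consistent chain."""
    frags = sorted(fragments, key=lambda f: (f.start1 + f.length, f.start2 + f.length))
    if not frags:
        return 0.0, []
    best = [f.weight for f in frags]   # best chain weight ending at each fragment
    prev = [None] * len(frags)         # predecessor index for the traceback
    for i, f in enumerate(frags):
        for j in range(i):
            h = frags[j]
            # h may precede f only if it ends before f starts in both sequences.
            if h.start1 + h.length <= f.start1 and h.start2 + h.length <= f.start2:
                if best[j] + f.weight > best[i]:
                    best[i] = best[j] + f.weight
                    prev[i] = j
    i = max(range(len(frags)), key=best.__getitem__)
    total, chain = best[i], []
    while i is not None:
        chain.append(frags[i])
        i = prev[i]
    return total, chain[::-1]

# Toy fragments (hypothetical): B crosses A and D, so it cannot join their chain.
A = Fragment(0, 0, 5, 3.0)
B = Fragment(10, 2, 4, 4.0)
C = Fragment(6, 8, 5, 2.5)
D = Fragment(12, 14, 3, 1.0)
total, chain = best_chain([A, B, C, D])
```

Here the single heaviest fragment B is left out because the chain A, C, D has a larger total weight, which is the behaviour the "maximum total weight" goal asks for.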
62
The DIALIGN approach
Advantages of the segment-based approach:
The program can produce both global and local alignments!
Sequence families can be aligned that cannot be aligned with standard methods.
63
T-COFFEE
C. Notredame, D. Higgins, J. Heringa (2000), T-Coffee: A novel method for fast and accurate multiple sequence alignment, J. Mol. Biol. 302, 205-217
67
T-COFFEE
Less sensitive to spurious pairwise similarities
Can handle local homologies better than CLUSTAL
68
T-COFFEE
Idea:
1. Build a library of pairwise alignments.
2. Alignments of seq i, j and of seq j, k support the alignment of seq i, k.
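The support idea can be sketched as a library extension over residue pairs: the weight of a pair (x, z) is increased by min(W(x,y), W(y,z)) for every residue y of a third sequence that is paired with both. The data layout (residues as `(sequence_name, position)` tuples) and the weights are assumptions for this illustration and do not reproduce the T-COFFEE implementation.

```python
from collections import defaultdict

def extend_library(library):
    """Extend pairwise residue weights by support through a third sequence.

    `library` maps (residue_x, residue_z) -> weight, where each residue is a
    (sequence_name, position) tuple and the two residues come from different
    sequences. Returns a dict keyed by frozenset({residue_x, residue_z}).
    """
    neighbours = defaultdict(dict)
    extended = defaultdict(float)
    for (x, z), w in library.items():
        neighbours[x][z] = w
        neighbours[z][x] = w
        extended[frozenset((x, z))] += w      # start from the direct weight
    for y, around in neighbours.items():
        items = list(around.items())
        for a in range(len(items)):
            for b in range(a + 1, len(items)):
                (x, w_xy), (z, w_yz) = items[a], items[b]
                if x[0] != z[0]:              # x and z must be in different sequences
                    extended[frozenset((x, z))] += min(w_xy, w_yz)
    return dict(extended)

# Toy library (hypothetical weights): A0-B0 and B0-C0 are aligned, A0-C0 is not.
library = {
    (("A", 0), ("B", 0)): 88,
    (("B", 0), ("C", 0)): 77,
}
extended = extend_library(library)
```

In the toy library the pair A0-C0 never appears directly, yet it receives a weight of 77 because both residues are aligned with B0.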
69
Evaluation of multi-alignment methods
Alignment evaluation by comparison to trusted benchmark alignments.
The "true" alignment is known from information about structure or evolution.
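One common way to compare a program's output with a benchmark is to count how many residue pairs aligned in the reference are also aligned in the test alignment. The sketch below computes that fraction; the function names are assumptions, and published benchmark scores may differ in detail (for example by restricting the count to core blocks).

```python
def aligned_pairs(rows, gap_chars="-."):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (row_index, position_in_ungapped_sequence).
    """
    counters = [0] * len(rows)
    pairs = set()
    for column in zip(*rows):
        present = []
        for k, ch in enumerate(column):
            if ch not in gap_chars:
                present.append((k, counters[k]))
                counters[k] += 1
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                pairs.add((present[a], present[b]))
    return pairs

def pair_accuracy(test_rows, reference_rows):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    reference = aligned_pairs(reference_rows)
    return len(reference & aligned_pairs(test_rows)) / len(reference)

# Reference: the gapped motif alignment; test: a naive left-justified alignment.
reference = ["E-YENS", "ERYENS", "ERYA-S"]
naive = ["EYENS-", "ERYENS", "ERYAS-"]
accuracy = pair_accuracy(naive, reference)
```

Rows of the test and reference alignments must list the same sequences in the same order for the comparison to be meaningful.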
70
Evaluation of multi-alignment methods
For protein alignment:
M. McClure et al. (1994): 4 protein families, known functional sites
J. Thompson et al. (1999): benchmark database, 130 known 3D structures (BAliBASE)
T. Lassmann & E. Sonnhammer (2002): BAliBASE + simulated evolution (ROSE)
71
Evaluation of multi-alignment methods
72
Evaluation of multi-alignment methods
BAliBASE reference alignments
1aboA  1 .NLFVALYDfvasgdntlsitkGEKLRVLgynhn..............gE
1ycsB  1 kGVIYALWDyepqnddelpmkeGDCMTIIhrede............deiE
1pht   1 gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG
1ihvA  1 .NFRVYYRDsrd......pvwkGPAKLLWkg.................eG
1vie   1 .drvrkksga.........awqGQIVGWYctnlt.............peG

1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
1pht  51 WLNGYnettgerGDFPGTYVEYIGrkkisp
1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
1vie  28 YAVESeahpgsvQIYPVAALERIN......
Key: alpha helix RED, beta strand GREEN, core blocks UNDERSCORE
74
Result: DIALIGN best method for distantly related sequences, T-Coffee best for globally related proteins
75
Evaluation of multi-alignment methods
BAliBASE: 5 categories of benchmark sequences (globally related, internal gaps, end gaps)
CLUSTAL W, T-COFFEE, MAFFT and PROBCONS perform well on globally related sequences; DIALIGN is superior for local similarities.
76
Evaluation of multi-alignment methods
Conclusion: there is no single best multi-alignment program!
Advice: try different methods!
77
Anchored sequence alignment
Idea: semi-automatic alignment. Use expert knowledge to define constraints instead of relying on fully automated alignment.
Define the parts of the sequences where the biologically correct alignment is known as anchor points; align the rest of the sequences automatically.
78
Anchored sequence alignment
NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN
IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT
GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS
79
Anchored sequence alignment
NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN
IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT
GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS
Anchor points in multiple alignment
80
Anchored sequence alignment
NLFV ALYDFVASGDNTLSITKGEKLRVLGYNHN
IIHREDKGVIYALWDYEPQND DELPMKEGDCMT
GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS
Anchor points in multiple alignment
81
Anchored sequence alignment
-------NLF V-ALYDFVAS GD-------- NTLSITKGEk lrvLGYNhn
iihredkGVI Y-ALWDYEPQ ND-------- DELPMKEGDC MT-------
-------GYQ YrALYDYKKE REedidlhlg DILTVNKGSL VA-LGFS--
Anchored multiple alignment
82
Algorithmic questions
Goal: find the optimal alignment (= consistent set of fragments) under constraints given by user-specified anchor points!
83
Algorithmic questions
Additional input file with anchor points:
1 3 215 231 5 4.5
2 3 34 78 23 1.23
1 4 317 402 8 8.5
84
NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN
IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT
GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS
89
Algorithmic questions
Additional input file with anchor points:
Sequences   Start positions   Length   Score
1  3        215  231          5        4.5
2  3        34   78           23       1.23
1  4        317  402          8        8.5
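A small sketch of how such an anchor file could be read, assuming one anchor per line with six whitespace-separated fields in the column order given on the slide (two sequence numbers, two start positions, length, score). The names `Anchor` and `parse_anchors` are hypothetical and not taken from any program.

```python
from typing import List, NamedTuple

class Anchor(NamedTuple):
    seq1: int      # number of the first sequence
    seq2: int      # number of the second sequence
    start1: int    # start position in the first sequence
    start2: int    # start position in the second sequence
    length: int    # length of the anchored segment
    score: float   # user-assigned priority score

def parse_anchors(text: str) -> List[Anchor]:
    """Parse anchor points, one per non-empty line."""
    anchors = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 6:
            raise ValueError(f"line {line_no}: expected 6 fields, got {len(fields)}")
        seq1, seq2, start1, start2, length = map(int, fields[:5])
        anchors.append(Anchor(seq1, seq2, start1, start2, length, float(fields[5])))
    return anchors

example = """1 3 215 231 5 4.5
2 3 34 78 23 1.23
1 4 317 402 8 8.5
"""
anchors = parse_anchors(example)
```

Parsing into named fields makes later checks straightforward, for example verifying that an anchor does not run past the end of either sequence.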