Burkhard Morgenstern Institut für Mikrobiologie und Genetik Grundlagen der Bioinformatik Multiples Sequenzalignment Juni 2007
`Progressive´ Alignment Most popular approach to (global) multiple sequence alignment: Progressive Alignment Since mid-Eighties: Feng/Doolittle, Higgins/Sharp, Taylor, …
`Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WWRLNDKEGYVPRNLLGLYP AVVIQDNSDIKVVPKAKIIRD YAVESEAHPGSFQPVAALERIN WLNYNETTGERGDFPGTYVEYIGRKKISP
`Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WWRLNDKEGYVPRNLLGLYP AVVIQDNSDIKVVPKAKIIRD YAVESEAHPGSFQPVAALERIN WLNYNETTGERGDFPGTYVEYIGRKKISP Guide tree
`Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WW--RLNDKEGYVPRNLLGLYP- AVVIQDNSDIKVVP--KAKIIRD YAVESEASFQPVAALERIN WLNYNEERGDFPGTYVEYIGRKKISP Profile alignment, “once a gap - always a gap”
`Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WW--RLNDKEGYVPRNLLGLYP- AVVIQDNSDIKVVP--KAKIIRD YAVESEASVQ--PVAALERIN WLN-YNEERGDFPGTYVEYIGRKKISP Profile alignment, “once a gap - always a gap”
`Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN- WW--RLNDKEGYVPRNLLGLYP- AVVIQDNSDIKVVP--KAKIIRD YAVESEASVQ--PVAALERIN WLN-YNEERGDFPGTYVEYIGRKKISP Profile alignment, “once a gap - always a gap”
`Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WW--RLNDKEGYVPRNLLGLYP AVVIQDNSDIKVVP--KAKIIRD YAVESEA---SVQ--PVAALERIN WLN-YNE---ERGDFPGTYVEYIGRKKISP Profile alignment, “once a gap - always a gap”
`Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WW--RLNDKEGYVPRNLLGLYP AVVIQDNSDIKVVP--KAKIIRD YAVESEA---SVQ--PVAALERIN WLN-YNE---ERGDFPGTYVEYIGRKKISP Most important implementation: CLUSTAL W
`Progressive´ Alignment CLUSTAL W; Thompson et al., 1994 (~ citations) Pairwise distances as 1 - percentage of identity Calculate un-rooted tree with Neighbor Joining Define root as central position in tree Define sequence weights based on tree Gap penalties calculated based on various parameters
Tools for multiple sequence alignment Problems with traditional approach: Results depend on gap penalty Heuristic guide tree determines alignment; alignment used for phylogeny reconstruction Algorithm produces global alignments.
Tools for multiple sequence alignment Problems with traditional approach: But: Many sequence families share only local similarity E.g. sequences share one conserved motif
Local sequence alignment Find common motif in sequences; ignore the rest EYENS ERYENS ERYAS
Local sequence alignment Find common motif in sequences; ignore the rest E-YENS ERYENS ERYA-S
Local sequence alignment Find common motif in sequences; ignore the rest – Local alignment E-YENS ERYENS ERYA-S
Local sequence alignment Traditional alignment approaches: Either global or local methods!
New question: sequence families with multiple local similarities Neither local nor global methods appliccable
New question: sequence families with multiple local similarities Alignment possible if order conserved
The DIALIGN approach
Consistency!
The DIALIGN approach
T-COFFEE C. Notredame, D. Higgins, J. Heringa (2000), T-Coffee: A novel algorithm for multiple sequence alignment, J. Mol. Biol. Problem: progressive alignment can go wrong if mistakes are made at an early stage. Example …
T-COFFEE SeqA GARFIELD THE LAST FAT CAT SeqB GARFIELD THE FAST CAT SeqC GARFIELD THE VERY FAST CAT SeqD THE FAT CAT
T-COFFEE SeqA GARFIELD THE LAST FAT CAT SeqB GARFIELD THE FAST CAT SeqC GARFIELD THE VERY FAST CAT SeqD THE FAT CAT
T-COFFEE
Idea: consider different pairwise alignments (local and global) check how these alignments support each other
T-COFFEE
T-COFFEE Less sensitive to spurious pairwise similarities Can handle local homologies better than CLUSTAL
Evaluation of multi-alignment methods Alignment evaluation by comparison to trusted benchmark alignments. `True’ alignment known by information about structure or evolution.
Evaluation of multi-alignment methods For protein alignment: M. McClure et al. (1994): 4 protein families, known functional sites J. Thompson et al. (1999): Benchmark data base, 130 known 3D structures (BAliBASE) T. Lassmann & E. Sonnhammer (2002): BAliBASE + simulated evolution (ROSE)
Evaluation of multi-alignment methods
Alignment evaluation by comparison to trusted benchmark alignments. `True’ alignment known by information about structure or evolution.
Evaluation of multi-alignment methods
1aboA 1.NLFVALYDfvasgdntlsitkGEKLRVLgynhn gE 1ycsB 1 kGVIYALWDyepqnddelpmkeGDCMTIIhrede deiE 1pht 1 gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG 1ihvA 1.NFRVYYRDsrd......pvwkGPAKLLWkg eG 1vie 1.drvrkksga awqGQIVGWYctnlt peG 1aboA 36 WCEAQt..kngqGWVPSNYITPVN ycsB 39 WWWARl..ndkeGYVPRNLLGLYP pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd vie 28 YAVESeahpgsvQIYPVAALERIN Key alpha helix RED beta strand GREEN core blocks UNDERSCORE BAliBASE Reference alignments Evaluation of multi-alignment methods
5 categories of benchmark sequences (globally related, internal gaps, end gaps) CLUSTAL W, RPPR perform well on globally related sequences, DIALIGN superior for local similarities Conclusion: no single best multi alignment program!
Evaluation of multi-alignment methods T. Lassmann & E. Sonnhammer (2002): BAliBASE + simulated evolution (ROSE)
Result: DIALIGN best for distantly related sequences, TCOFFEE best for closely related sequences