Published by Amberlynn Stewart. Modified over 9 years ago.
1
Using the T-Coffee Multiple Sequence Alignment Package
I – Overview
Cédric Notredame
Comparative Bioinformatics Group, Bioinformatics and Genomics Program
2
What is T-Coffee?
Tree-based Consistency Objective Function For alignment Evaluation
– Progressive Alignment
– Consistency
3
Progressive Alignment (Feng and Doolittle, 1988; Taylor, 1989)
Clustering
4
Progressive Alignment
Dynamic Programming Using a Substitution Matrix
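To make the dynamic programming step concrete, here is a minimal sketch of a global (Needleman-Wunsch style) alignment score. It assumes a toy match/mismatch scheme in place of a real substitution matrix and a single linear gap penalty rather than separate opening and extension penalties; the function name and default values are illustrative only.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score under a toy scoring scheme.

    match/mismatch stand in for a substitution matrix and `gap` is a
    linear gap penalty; all three values are illustrative assumptions.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in seq_b
                score[i][j - 1] + gap,      # gap in seq_a
            )
    return score[-1][-1]


print(global_alignment_score("FAT", "FAST"))
```

In a progressive aligner the same recursion is applied to groups of already aligned sequences, following the order given by the guide tree.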
5
-Depends on the ORDER of the sequences (Tree).
-Depends on the CHOICE of the sequences.
-Depends on the PARAMETERS:
 Substitution Matrix.
 Penalties (Gop: gap opening, Gep: gap extension).
 Sequence Weight.
 Tree making Algorithm.
6
Consistency?
Consistency is an attempt to use alignment information at very early stages.
7
T-Coffee and Consistency…
SeqA GARFIELD THE LAST FAT CAT      Prim. Weight = 88
SeqB GARFIELD THE FAST CAT ---

SeqA GARFIELD THE LAST FA-T CAT     Prim. Weight = 77
SeqC GARFIELD THE VERY FAST CAT

SeqA GARFIELD THE LAST FAT CAT      Prim. Weight = 100
SeqD -------- THE ---- FAT CAT

SeqB GARFIELD THE ---- FAST CAT     Prim. Weight = 100
SeqC GARFIELD THE VERY FAST CAT

SeqC GARFIELD THE VERY FAST CAT     Prim. Weight = 100
SeqD -------- THE ---- FA-T CAT
8
T-Coffee and Consistency…
SeqA GARFIELD THE LAST FAT CAT      Prim. Weight = 88
SeqB GARFIELD THE FAST CAT ---

SeqA GARFIELD THE LAST FA-T CAT     Prim. Weight = 77
SeqC GARFIELD THE VERY FAST CAT

SeqA GARFIELD THE LAST FAT CAT      Prim. Weight = 100
SeqD -------- THE ---- FAT CAT

SeqB GARFIELD THE ---- FAST CAT     Prim. Weight = 100
SeqC GARFIELD THE VERY FAST CAT

SeqC GARFIELD THE VERY FAST CAT     Prim. Weight = 100
SeqD -------- THE ---- FA-T CAT

SeqA GARFIELD THE LAST FAT CAT      Weight = 88
SeqB GARFIELD THE FAST CAT ---

SeqA GARFIELD THE LAST FA-T CAT     Weight = 77
SeqC GARFIELD THE VERY FAST CAT
SeqB GARFIELD THE ---- FAST CAT

SeqA GARFIELD THE LAST FA-T CAT     Weight = 100
SeqD -------- THE ---- FA-T CAT
SeqB GARFIELD THE ---- FAST CAT
9
T-Coffee and Consistency…
SeqA GARFIELD THE LAST FAT CAT      Weight = 88
SeqB GARFIELD THE FAST CAT ---

SeqA GARFIELD THE LAST FA-T CAT     Weight = 77
SeqC GARFIELD THE VERY FAST CAT
SeqB GARFIELD THE ---- FAST CAT

SeqA GARFIELD THE LAST FA-T CAT     Weight = 100
SeqD -------- THE ---- FA-T CAT
SeqB GARFIELD THE ---- FAST CAT
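The triplets above suggest how a pair of sequences gains support from every third sequence. The sketch below is a simplification that works on whole-alignment weights rather than on individual residue pairs, so it only describes a residue pair that is aligned consistently in every triplet. The rule used (direct weight plus, for each intermediate sequence, the weaker of the two links) is an assumption consistent with the weights shown on the slide, and the SeqB/SeqD weight is a placeholder because the slide does not show that pair.

```python
def extended_weight(primary, seq_x, seq_y, all_seqs):
    """Direct weight plus, per third sequence, the weaker of its two links."""
    def weight(a, b):
        # Pairs missing from the library contribute nothing.
        return primary.get(frozenset((a, b)), 0)

    total = weight(seq_x, seq_y)
    for seq_z in all_seqs:
        if seq_z not in (seq_x, seq_y):
            total += min(weight(seq_x, seq_z), weight(seq_z, seq_y))
    return total


# Primary weights read off the slide.
primary = {
    frozenset(("SeqA", "SeqB")): 88,
    frozenset(("SeqA", "SeqC")): 77,
    frozenset(("SeqA", "SeqD")): 100,
    frozenset(("SeqB", "SeqC")): 100,
    frozenset(("SeqC", "SeqD")): 100,
    frozenset(("SeqB", "SeqD")): 100,  # assumption: not shown on the slide
}

# 88 (direct) + 77 (through SeqC) + 100 (through SeqD)
print(extended_weight(primary, "SeqA", "SeqB", ["SeqA", "SeqB", "SeqC", "SeqD"]))
```

The contributions 88, 77 and 100 match the three weights listed for the SeqA/SeqB pair on the slide.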
10
T-Coffee and Consistency…
11
Where Do the Primary Alignments Come From?
Primary Alignments – Primary Library
Source – Any valid third-party method
12
T-Coffee and Consistency…
14
Using the T-Coffee Multiple Sequence Alignment Package
II – M-Coffee
15
What is the Best MSA Method?
More than 50 MSA methods
Some methods are fast and inaccurate
– Mafft, muscle, kalign
Some methods are slow and accurate
– T-Coffee, ProbCons
Some methods are slow and inaccurate…
– ClustalW
16
Why Not Combine Them?
All methods give different alignments
Their agreement is an indication of accuracy
t_coffee -method mafft_msa,muscle_msa
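The idea that agreement between methods indicates accuracy can be sketched as follows. This is not M-Coffee's implementation; it is a minimal illustration that assumes each alignment is a list of equal-length gapped strings over the same sequences in the same order, and the function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of aligned residue pairs ((seq, pos), (seq, pos)) in an alignment.

    `alignment` is a list of equal-length gapped strings; `pos` is the
    index of the residue in the ungapped sequence.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((s, counters[s]))
                counters[s] += 1
        # Every two residues sharing a column form an aligned pair.
        pairs.update(combinations(residues, 2))
    return pairs


def pair_support(alignments):
    """Fraction of the input alignments that contain each aligned pair."""
    support = {}
    for aln in alignments:
        for pair in aligned_pairs(aln):
            support[pair] = support.get(pair, 0) + 1
    return {pair: n / len(alignments) for pair, n in support.items()}


method_1 = ["FA-T", "FAST"]
method_2 = ["FAT-", "FAST"]
for pair, fraction in sorted(pair_support([method_1, method_2]).items()):
    print(pair, fraction)
```

Pairs found by every method get a support of 1.0, while pairs on which the methods disagree get lower values.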
17
Combining Many MSAs into ONE
[Diagram: alignments from MUSCLE, MAFFT, ClustalW and other methods (???????) are combined by T-Coffee.]
19
Where to Trust Your Alignments
[Figure legend: Most Methods Agree / Most Methods Disagree]
20
What To Do Without Structures
21
Using the T-Coffee Multiple Sequence Alignment Package
III – Template Based Alignments
22
Sometimes Sequences Are Not Enough
Sequence-based alignments are limited in accuracy
– 30% for proteins
– 70% for DNA
It is hard to correctly align sequences whose similarity is below these values
– Twilight zone
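A quick way to see whether a pair of sequences falls below these thresholds is to compute percent identity on an existing pairwise alignment. The sketch below assumes identity is measured over columns where both sequences have a residue (other definitions exist) and treats the 30% and 70% values from the slide as hard cut-offs, which is a simplification.

```python
# Thresholds taken from the slide; using them as strict cut-offs is an assumption.
TWILIGHT_THRESHOLD = {"protein": 30.0, "dna": 70.0}


def percent_identity(row_a, row_b):
    """Percent identity over columns where neither sequence has a gap."""
    compared = identical = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0


def in_twilight_zone(row_a, row_b, kind):
    """True when the aligned pair falls below the threshold for its type."""
    return percent_identity(row_a, row_b) < TWILIGHT_THRESHOLD[kind]


print(percent_identity("AAAA", "AAAT"))
print(in_twilight_zone("ACGT", "AGCA", "dna"))
```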
23
One Solution: Template Based Alignment
Replace the sequence with something more informative
– PDB Structure: Expresso
– Profile: PSI-Coffee
– RNA Structure: R-Coffee
24
Template Based Multiple Sequence Alignments
[Diagram: Sources → Templates (structure, profile, …) → Template Aligner → Template Alignment → Source-Template Alignment → Remove Templates → Library]
25
Expresso: Finding the Right Structure
[Diagram: Sources → BLAST → Templates → SAP → Template Alignment → Source-Template Alignment → Remove Templates → Library]
26
PSI-Coffee: Homology Extension
[Diagram: Sources → BLAST → Templates → Profile Aligner → Template Alignment → Source-Template Alignment → Remove Templates → Library]
27
What is Homology Extension?
[Figure: L residues in two sequences, with a "?" marking an ambiguous match]
-Simple scoring schemes result in alignment ambiguities
28
What is Homology Extension?
[Figure: each sequence is replaced by a column of homologous residues (L, I, V), labelled Profile 1 and Profile 2]
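To illustrate why profiles help, here is a minimal sketch that scores two profile columns by their expected substitution score. The scoring function `toy_sub` is a hypothetical stand-in for a real substitution matrix, and this is not presented as the scoring used by PSI-Coffee.

```python
from collections import Counter


def column_profile(residues):
    """Relative frequency of each residue in one profile column."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {res: n / total for res, n in counts.items()}


def profile_score(col_a, col_b, sub):
    """Expected substitution score between two profile columns."""
    freq_a, freq_b = column_profile(col_a), column_profile(col_b)
    return sum(
        fa * fb * sub(a, b)
        for a, fa in freq_a.items()
        for b, fb in freq_b.items()
    )


def toy_sub(a, b):
    """Hypothetical scores: identical = 4, both in {L, I, V} = 2, else -1."""
    if a == b:
        return 4
    return 2 if {a, b} <= set("LIV") else -1


# A single L against a single L is as good as it gets and therefore ambiguous...
print(profile_score("L", "L", toy_sub))
# ...while columns built from homologs can tell candidate positions apart.
print(profile_score("LLLL", "LIVI", toy_sub))
print(profile_score("LLLL", "KKKK", toy_sub))
```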
30
Method       Type          Template   Score   Comment
-------------------------------------------------------------
ClustalW-2   Progressive   NO         22.74
PRANK        Gap           NO         26.18   Science 2008
MAFFT        Iterative     NO         26.18
Muscle       Iterative     NO         31.37
ProbCons     Consistency   NO         40.80
ProbCons     MonoPhasic    NO         37.53
T-Coffee     Consistency   NO         42.30
M-Coffee4    Consistency   NO         43.60
PSI-Coffee   Consistency   Profile    53.71
PROMALS      Consistency   Profile    55.08
PROMALS3D    Consistency   PDB        57.60
3D-Coffee    Consistency   PDB        61.00   Expresso
-------------------------------------------------------------
Score: percentage of correct columns when compared with a structure-based reference (BB11 of BAliBASE).
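The score in this table counts columns that match a reference alignment. A minimal sketch of such a column score is shown below. It assumes both alignments contain the same sequences in the same order and compares every reference column, whereas benchmark suites often restrict scoring to selected regions; the function names are illustrative.

```python
def columns(alignment):
    """Each column as a tuple of residue indices, with None for a gap."""
    counters = [0] * len(alignment)
    cols = []
    for c in range(len(alignment[0])):
        col = []
        for s, row in enumerate(alignment):
            if row[c] == "-":
                col.append(None)
            else:
                col.append(counters[s])
                counters[s] += 1
        cols.append(tuple(col))
    return cols


def column_score(test, reference):
    """Percentage of reference columns reproduced exactly in the test alignment."""
    test_cols = set(columns(test))
    ref_cols = columns(reference)
    return 100.0 * sum(col in test_cols for col in ref_cols) / len(ref_cols)


reference = ["FA-T", "FAST"]
print(column_score(["FA-T", "FAST"], reference))
print(column_score(["FAT-", "FAST"], reference))
```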
31
[Diagram: each TARGET sequence is associated with Experimental Data (its template). A Template Aligner produces the Template Alignment; the Template-Sequence Alignments map it back onto the targets, giving a Template based Alignment of the Sequences that feeds the Primary Library.]
32
Using the T-Coffee Multiple Sequence Alignment Package
IV – RNA Alignments
33
ncRNAs Comparison
And ENCODE said…
“nearly the entire genome may be represented in primary transcripts that extensively overlap and include many non-protein-coding regions”
Who Are They?
– tRNA, rRNA, snoRNAs
– microRNAs, siRNAs
– piRNAs
– long ncRNAs (Xist, Evf, Air, CTN, PINK…)
How Many of Them?
– Open question
– 30,000 is a common guess
– Harder to detect than protein-coding genes
34
ncRNAs Can Evolve Rapidly
CCAGGCAAGACGGGACGAGAGTTGCCTGG
CCTCCGTTCAGAGGTGCATAGAACGGAGG
**-------*--**---*-**------**
[Figure: hairpin secondary-structure drawings of the two sequences]
35
The Holy Grail of RNA Comparison: Sankoff's Algorithm
36
The Holy Grail of RNA Comparison: Sankoff's Algorithm
Simultaneous Folding and Alignment (L: sequence length, n: number of sequences)
– Time Complexity: O(L^(3n))
– Space Complexity: O(L^(2n))
In Practice, for Two Sequences:
– 50 nucleotides: 1 min., 6 M
– 100 nucleotides: 16 min., 256 M
– 200 nucleotides: 4 hours, 4 G
– 400 nucleotides: 3 days, 3 T
Forget about
– Multiple sequence alignments
– Database searches
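Bounds of the form O(L^(kn)) grow very quickly with sequence length. The short sketch below only does the arithmetic for a cost proportional to L^(k*n); the helper name is hypothetical and it makes no claim about the constants of any real implementation.

```python
def growth_factor(length_ratio, exponent_per_sequence, n_sequences=2):
    """Factor by which a cost proportional to L^(k*n) grows when L is scaled."""
    return length_ratio ** (exponent_per_sequence * n_sequences)


# Doubling the length of two sequences (n = 2):
for k in (2, 3):
    print(f"k={k}: cost multiplied by {growth_factor(2, k)}")

# Adding a third sequence raises the exponent by k for every k.
print(growth_factor(2, 2, n_sequences=3))
```

For two sequences, doubling the length multiplies an L^4 cost by 16 and an L^6 cost by 64, which is why multiple alignments and database searches are out of reach.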
37
R-Coffee
[Diagram: RNA Sequences → RNAplfold → Secondary Structures; RNA Sequences → Consan or Mafft / Muscle / ProbCons → Primary Library; Secondary Structures + Primary Library → R-Coffee Extension → R-Coffee Extended Primary Library → Progressive Alignment Using the R-Score]
38
R-Coffee Extension
[Figure: a C-G base pair in each of two sequences; the TC Library holds Score X for the G-G match and Score Y for the C-C match]
Goal: Embedding RNA Structures Within the T-Coffee Libraries
The R-extension can be added on top of any existing method.
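One plausible way to embed structure in the library is to let a residue match borrow support from the match between the two base-paired partners. The rule below (take the larger of the two library scores) is an assumption made for illustration, not a rule stated on the slide, and the data layout and names are hypothetical.

```python
def r_score(library, partner, x, y):
    """Score for matching residues x and y, boosted by their paired partners.

    library: dict mapping (residue, residue) to a library score
    partner: dict mapping a residue to its predicted base-pair partner
    Residues are (sequence_name, position) tuples.
    """
    direct = library.get((x, y), 0)
    px, py = partner.get(x), partner.get(y)
    if px is None or py is None:
        return direct  # at least one residue is unpaired: nothing to borrow
    # Assumed rule: keep the better of the direct and the partner score.
    return max(direct, library.get((px, py), 0))


library = {
    (("seq1", 0), ("seq2", 0)): 10,  # C matched with C
    (("seq1", 8), ("seq2", 8)): 40,  # their partners, G matched with G
}
partner = {("seq1", 0): ("seq1", 8), ("seq2", 0): ("seq2", 8)}
print(r_score(library, partner, ("seq1", 0), ("seq2", 0)))
```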
39
R-Coffee + Regular Aligners
Method          Avg Braliscore           Net Improv.
                direct   +T     +R       +T     +R
-----------------------------------------------------------
Poa             0.62     0.65   0.70      48    154
Pcma            0.62     0.64   0.67      34    120
Prrn            0.64     0.61   0.66     -63     45
ClustalW        0.65     0.65   0.69      -7     83
Mafft_fftnts    0.68     0.68   0.72      17     68
ProbConsRNA     0.69     0.67   0.71     -49     39
Muscle          0.69     0.69   0.73     -17     42
Mafft_ginsi     0.70     0.68   0.72     -49     39
-----------------------------------------------------------
Improvement = # R-Coffee wins - # R-Coffee losses
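The net improvement column is a wins-minus-losses count over test sets. A minimal sketch of that count is given below; the per-dataset scores in the example are made up for illustration and are not taken from the benchmark.

```python
def net_improvement(baseline_scores, rcoffee_scores):
    """Datasets where R-Coffee scores higher minus datasets where it scores lower."""
    pairs = list(zip(baseline_scores, rcoffee_scores))
    wins = sum(r > b for b, r in pairs)
    losses = sum(r < b for b, r in pairs)
    return wins - losses  # ties count for neither side


# Hypothetical per-dataset scores, for illustration only.
baseline = [0.60, 0.70, 0.80, 0.50]
with_rcoffee = [0.65, 0.70, 0.75, 0.60]
print(net_improvement(baseline, with_rcoffee))
```

A negative value means the combination lost on more datasets than it won, even if the average score barely moved.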
40
RM-Coffee + Regular Aligners
Method          Avg Braliscore           Net Improv.
                direct   +T     +R       +T     +R
-----------------------------------------------------------
Poa             0.62     0.65   0.70      48    154
Pcma            0.62     0.64   0.67      34    120
Prrn            0.64     0.61   0.66     -63     45
ClustalW        0.65     0.65   0.69      -7     83
Mafft_fftnts    0.68     0.68   0.72      17     68
ProbConsRNA     0.69     0.67   0.71     -49     39
Muscle          0.69     0.69   0.73     -17     42
Mafft_ginsi     0.70     0.68   0.72     -49     39
-----------------------------------------------------------
RM-Coffee4      0.71     /      0.74      /      84
41
R-Coffee + Structural Aligners
Method          Avg Braliscore           Net Improv.
                direct   +T     +R       +T     +R
-----------------------------------------------------------
Stemloc         0.62     0.75   0.76     104    113
Mlocarna        0.66     0.69   0.71     101    133
Murlet          0.73     0.70   0.72    -132    -73
Pmcomp          0.73     0.73   0.73     142    145
T-Lara          0.74     0.74   0.69     -36     -8
Foldalign       0.75     0.77   0.77      72     73
-----------------------------------------------------------
Dynalign        ---      0.63   0.62     ---    ---
Consan          ---      0.79   0.79     ---    ---
-----------------------------------------------------------
RM-Coffee4      0.71     /      0.74      /      84
42
Using the T-Coffee Multiple Sequence Alignment Package
V – DNA Alignments
43
Aligning Genomic DNA
Main problem
– Tell a good alignment from a bad one
Strategy:
– Tuning on Orthologous Promoter Detection
– Evaluation on ChIP-Seq Data
45
Aligning Genomic DNA
Tuning of Gap Penalties
Design of a dinucleotide substitution matrix
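A dinucleotide substitution matrix scores pairs of two-base symbols instead of single bases. The sketch below shows one possible encoding (overlapping dinucleotides) and a symmetric lookup. How Pro-Coffee actually encodes sequences is not stated on the slide, so both the encoding and the example score are assumptions.

```python
from itertools import product

# The 16 possible dinucleotide symbols.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def to_dinucleotides(seq):
    """Overlapping dinucleotides of a DNA sequence, e.g. ACGT -> AC, CG, GT."""
    seq = seq.upper()
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def dinucleotide_score(a, b, matrix, default=0):
    """Look up a score in a symmetric matrix stored as a dict of pairs."""
    return matrix.get((a, b), matrix.get((b, a), default))


toy_matrix = {("CG", "CG"): 5, ("CG", "AC"): -3}  # hypothetical values
print(to_dinucleotides("acgt"))
print(dinucleotide_score("AC", "CG", toy_matrix))
```

With 16 symbols the matrix has 136 distinct entries (including the diagonal) instead of 10 for single nucleotides, which gives more room to capture context such as CpG sites.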
46
Aligning Genomic DNA
47
gDNA is very heterogeneous
Each genomic feature requires its own aligner
Aligning non-orthologous regions with a global aligner is impossible
Pro-Coffee is designed to align orthologous promoter regions
48
Using the T-Coffee Multiple Sequence Alignment Package
VI – Wrap Up
49
Which Flavor?
Fast Alignments
– M-Coffee with Fast Aligners: mafft, muscle, kalign
Difficult Protein Alignments
– Expresso
– PSI-Coffee
RNA Alignments
– R-Coffee
Promoter Alignments
– Pro-Coffee
50
www.tcoffee.org