3D-COFFEE Mixing Sequences and Structures Cédric Notredame
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. :::.:... :.. *. *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM
mouse AKDDRIRYDNEMKSWEEQMAE
* :.*. :

Potential Uses of A Multiple Sequence Alignment
-Extrapolation
-Motifs/Patterns
-Phylogeny
-Profiles
-Struc. Prediction
Multiple Alignments Are CENTRAL to MOST Bioinformatics Techniques.
Why Is It Difficult To Compute A Multiple Sequence Alignment?
A CROSSROADS PROBLEM
-BIOLOGY: What is A Good Alignment?
-COMPUTATION: What is THE Good Alignment?

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. :::.:... :.. *. *: *
Why Is It Difficult To Compute A Multiple Sequence Alignment?
A CIRCULAR PROBLEM: Good Sequences <-> Good Alignment (BIOLOGY, COMPUTATION)
The T-Coffee Algorithm
Mixing Local and Global Alignments: Local Alignment + Global Alignment -> Extension -> Multiple Sequence Alignment
What is a library? Extension+T-Coffee Library Based Multiple Sequence Alignment 2 Seq1 MySeq Seq2 MyotherSeq # …. 3 Seq1 anotherseq Seq2 atsecondone Seq3 athirdone # # ….
The Triplet Assumption (figure: residues X, Y and Z; SEQ A and SEQ B; Consistency, Consensus)
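The triplet idea can be sketched as a library extension step. This is a minimal illustration, assuming a primary library stored as a dictionary of weighted residue pairs, where each residue is a hypothetical `(sequence_name, position)` tuple. The function name `extend_library` and the data layout are assumptions for illustration and are not T-Coffee's own code. Each pair keeps its direct weight and gains, for every third residue aligned to both members, the smaller of the two connecting weights.

```python
from collections import defaultdict


def extend_library(primary):
    """Return an extended library from a primary library.

    primary: dict mapping (residue_x, residue_y) -> weight, where a residue
    is a (sequence_name, position) tuple (hypothetical layout).
    """
    # neighbours[x][y] holds the primary weight of the pair, stored both ways.
    neighbours = defaultdict(dict)
    for (x, y), w in primary.items():
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = defaultdict(float)

    # Direct contribution: the pair's own primary weight.
    for (x, y), w in primary.items():
        extended[frozenset((x, y))] += w

    # Triplet contribution: x and y are both aligned to a third residue z.
    for z, partners in neighbours.items():
        items = sorted(partners.items())
        for i, (x, w_xz) in enumerate(items):
            for y, w_zy in items[i + 1:]:
                if x[0] == y[0]:
                    continue  # two residues of the same sequence cannot pair
                extended[frozenset((x, y))] += min(w_xz, w_zy)

    return dict(extended)


a0, b0, c0 = ("A", 0), ("B", 0), ("C", 0)
primary = {(a0, b0): 77, (a0, c0): 88, (c0, b0): 100}
extended = extend_library(primary)
```

In this toy library the pair A/B starts at 77 and gains min(88, 100) through C, so pairs that are supported by a third sequence end up with higher weights than isolated ones.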
ClustalW T-Coffee
Dynamic Programming Using An Extended Library Progressive Alignment
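A minimal sketch of a pairwise dynamic programming step driven by library weights instead of a substitution matrix. It assumes the weights are given as a dictionary keyed by `(position_in_a, position_in_b)` and that gaps carry no penalty. The function name `align_with_library` and the input layout are hypothetical, and the progressive (profile) part of the procedure is left out.

```python
def align_with_library(seq_a, seq_b, weight):
    """Align two sequences by maximising the sum of library weights.

    weight: dict mapping (i, j) -> weight for residue i of seq_a and
    residue j of seq_b (0-based). Missing pairs score 0. Gaps are free.
    """
    n, m = len(seq_a), len(seq_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]

    # Fill the matrix: match/mismatch uses the library weight, gaps cost 0.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + weight.get((i - 1, j - 1), 0.0),
                score[i - 1][j],
                score[i][j - 1],
            )

    # Trace back from the bottom-right corner to recover one optimal path.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + weight.get((i - 1, j - 1), 0.0)):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j]:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, row_a, row_b = align_with_library("ACD", "AD", {(0, 0): 10, (2, 1): 10})
```

With the two weighted pairs above, the sketch aligns A with A and D with D and leaves C facing a gap.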
How Good is T-Coffee? What Is BaliBase?
Best Performing Method on MSA benchmark Datasets:
-BaliBase: Notredame, Sonnhammer
-Ribosomal RNA: Katoh (Mafft)
-HOMSTRAD: Notredame
-OxBench: Barton
Mixing Heterogeneous Data With T-Coffee: Local Alignment, Global Alignment, Multiple Alignment, Structural, Specialist -> Multiple Sequence Alignment
Mixing Sequences and Structures
Why Do We Want To Mix Sequences and Structures? 1-Predicting Sequence Structures STRUCTURE FUNCTION
Why Do We Want To Mix Sequences and Structures? Sequences are Cheap and Common. Structures are Expensive and Rare.
Why Do We Want To Mix Sequences and Structures? Cheapest Structure determination: Sequence-Structure Alignment (THREAD or ALIGN)
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Why Do We Want To Mix Sequences and Structures? THREAD or ALIGN: Convincing Alignment -> Same Fold
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Why Do We Want To Mix Sequences and Structures? Convincing Alignment Same Fold Distant sequences are hard to align
Why Do We Want To Mix Sequences and Structures? Multiple Sequence Alignments Help Exploring the Twilight Zone
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. :::.:... :.. *. *: *
Why Do We Want To Mix Sequences and Structures? 1-Predicting Sequence Structures 2-Produce Better Alignments
Why Do We Want To Mix Sequences and Structures? ALIGN: Unreliable alignment if %ID < 30%
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Why Do We Want To Mix Sequences and Structures? Struc. Superposition: Alignment Insensitive to %ID. Folds evolve Slower than Sequences
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Structure Superposition
Why Do We Want To Mix Sequences and Structures? 1-Predicting Sequence Structures 2-Produce Better Alignments
How To Mix Sequences and Structures
Mixing Heterogeneous Data With T-Coffee: Local Alignment, Global Alignment, Multiple Alignment, Structural, Specialist -> Multiple Sequence Alignment
Mixing Sequences and Structures with T-Coffee
-Seq Vs Seq: Local, Global
-Seq Vs Struct: Thread
-Struct Vs Struct: Superpose
Evaluation on HOMSTRAD
The 3D-Coffee Libraries: Methods
-Global: Needleman and Wunsch
-Local: Sim (lalign)
-Threading: Fugue
-Superposition: SAP
Threading: Fugue
Threading: Fugue
1-Turn the Structure into a profile:
 -lower penalties in loops
 -Structure specific matrix
2-Align Profile with Sequence
Evaluating Fugue
1-Select 967 pairs of sequences in HOMSTRAD
2-Align each pair with T-Coffee and Fugue.
3-Compare the Two Alignments
TCdef: 58.81%, Fugue: 61.81% (plot regions: TCdef wins, Fugue wins)
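The comparison step can be sketched as follows. The slides do not state which measure produced the percentages, so this is only one plausible choice: the share of residue pairs aligned in a reference alignment that a test alignment recovers. The function names `aligned_pairs` and `pair_accuracy` are hypothetical, and alignments are assumed to be given as two gapped rows using `-` for gaps.

```python
def aligned_pairs(row_a, row_b):
    """Return the set of (i, j) residue index pairs aligned in two gapped rows."""
    pairs = set()
    i = j = 0
    for char_a, char_b in zip(row_a, row_b):
        if char_a != "-" and char_b != "-":
            pairs.add((i, j))
        # Advance each residue counter only on non-gap characters.
        if char_a != "-":
            i += 1
        if char_b != "-":
            j += 1
    return pairs


def pair_accuracy(test, reference):
    """Percentage of reference residue pairs also present in the test alignment."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    shared = ref_pairs & aligned_pairs(*test)
    return 100.0 * len(shared) / len(ref_pairs)


reference = ("ACD", "A-D")  # toy reference alignment
test = ("ACD", "AD-")       # toy alignment produced by some method
accuracy = pair_accuracy(test, reference)
```

Averaging such a score over all pairs would give one figure per method, which is the kind of number the slide reports.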
Superposition: SAP
Superposition: SAP
1-High Level Dynamic Programming (Substitution Matrix when doing regular Alignments)
2-Low Level DP: Forcing the alignment of two residues
Superposition: SAP
1-High Level Dynamic Programming
2-Low Level DP: Forcing the alignment of two residues
Rigid Body Superposition, RMSD
Superposition: SAP
1-High Level Dynamic Programming
2-Low Level DP: Evaluate Every Pair
3-Rigid Body Superposition
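The RMSD used to judge a rigid body superposition can be sketched as below. This assumes the two coordinate sets are already superposed and paired residue by residue (for example one C-alpha atom per aligned residue). Finding the optimal rotation and translation is outside this sketch, and the function name `rmsd` is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two equivalent atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(structure_a, structure_b)
```

A lower value means the equivalent atoms sit closer together after superposition.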
Superposition: SAP
1-High Level Dynamic Programming: Make a DP on the accumulated traces. Use Traces like a Substitution Matrix
Result: Structure Based Sequence Alignment
Superposition: SAP
1-Select 967 pairs of sequences in HOMSTRAD
2-Align each pair with T-Coffee and SAP.
3-Compare the Two Alignments
TCdef: 58.81%, SAP: 86.31%
SAP and Fugue
-Fugue: TCdef: 58.81%, Fugue: 61.81%
-SAP: TCdef: 58.81%, SAP: 86.31%
Sequences and Structures: How Good is The Mixture ???
Our Benchmark: HOM39
-HOMSTRAD: Structure based MSAs that can be used as References.
-COMPACT and DEMANDING
-HOM39: The 39 Most difficult datasets (percent ID lower than 25).
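The percent ID threshold can be illustrated with a short sketch. Several definitions of percent identity exist. The one assumed here counts identical residues over the columns where neither sequence has a gap, which may differ from the definition used to build HOM39. The function name `percent_identity` is hypothetical.

```python
def percent_identity(row_a, row_b):
    """Percent identity over columns where neither row has a gap ('-')."""
    aligned = 0
    identical = 0
    for char_a, char_b in zip(row_a, row_b):
        if char_a == "-" or char_b == "-":
            continue  # gapped columns are ignored in this definition
        aligned += 1
        if char_a == char_b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# Toy pair: 4 ungapped columns, 1 identical residue.
pid = percent_identity("AC-DE", "AGWFH")
is_difficult = pid < 25.0  # threshold quoted on the slide
```

A dataset-level figure would then need a rule for combining the pairwise values (for example an average), which the slides do not specify.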
Our Benchmark: Using HOM39
BENCHMARKING Strategy:
-re-align HOM39 without using ALL the structures
-Compare the result with the reference
Evaluating 3D-Coffee 1- Can a SINGLE structure Help ?
Using ONE structure with 3D-Coffee
-Seq Vs Seq: Local, Global
-Seq Vs Struct: Thread
Evaluation on HOM39: HOM39 with ONE Structure per MSA
Evaluating 3D-Coffee
1-Can a SINGLE structure Help?
2-Does it benefit ALL the Sequences? Is EVERYONE Happier if there is a STAR in the team?
BaliBase HOM39 TC-Fugue + Remove Provided Structure(s) Comparison
Evaluating 3D-Coffee
1-Can a SINGLE structure Help?
2-Does it benefit all the sequences?
3-Can We Use Two or More Structures?
Mixing Sequences and Structures with 3D-Coffee
-Seq Vs Seq: Local, Global
-Seq Vs Struct: Fugue
-Struct Vs Struct: SAP, LSQ
Evaluation on HOMSTRAD: HOM39 with TWO Structures/MSA
Indirect Improvement Direct Improvement
Evaluating 3D-Coffee
1-Can a SINGLE structure Help?
2-Does it benefit all the sequences?
3-Can we use Two Structures?
4-Relation Accuracy/N-structures?
Mixing Sequences and Structures with T-Coffee
-Seq Vs Seq: Local, Global
-Seq Vs Struct: Fugue
-Struct Vs Struct: SAP
Evaluation on HOMSTRAD: HOM39 with 1-N Structures per MSA
Induced Improvement
Conclusion
-Structures Help BUT NOT SO MUCH
The More Structures The Merrier
Credits
-Orla O’Sullivan: University College, Cork, Ireland
-Des Higgins: University College, Cork, Ireland
-Karsten Suhre: IGS-CNRS, Marseille, France
Conclusion The program is available on request from: