3D-Coffee: Mixing Sequences and Structures
Cédric Notredame

    chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
    wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
    trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
    mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

    chite  AATAKQNYIRALQEYERNGG-
    wheat  ANKLKGEYNKAIAAYNKGESA
    trybr  AEKDKERYKREM
    mouse  AKDDRIRYDNEMKSWEEQMAE

Potential Uses of a Multiple Sequence Alignment? Extrapolation, motifs/patterns, phylogeny, profiles, structure prediction. Multiple alignments are CENTRAL to MOST bioinformatics techniques.
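The conservation line printed under a ClustalW-style alignment can be partly recomputed from the rows themselves. A minimal sketch, assuming only the identity marker is wanted: a column gets `*` when every sequence carries the same residue and none has a gap. The `:` and `.` similarity classes are deliberately left out, and `identity_line` is a hypothetical helper name.

```python
def identity_line(rows, gap="-"):
    """ClustalW-style conservation line marking only fully conserved columns with '*'.

    The ':' (strongly similar) and '.' (weakly similar) classes are not computed here.
    """
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for col in range(width):
        residues = {row[col] for row in rows}
        # Conserved: a single residue type in the column and no gap.
        marks.append("*" if len(residues) == 1 and gap not in residues else " ")
    return "".join(marks)


# First block of the chite/wheat/trybr/mouse alignment.
block = {
    "chite": "---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD",
    "wheat": "--DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE",
    "trybr": "KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP",
    "mouse": "-----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP",
}

line = identity_line(list(block.values()))
for name, row in block.items():
    print(f"{name}  {row}")
print(" " * 7 + line)  # 7 = name width (5) + two spaces
```

For this block six columns are marked: the PKR motif near the start, and the later K, W and L columns.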

Why Is It Difficult To Compute A Multiple Sequence Alignment? A CROSSROAD PROBLEM. BIOLOGY: what is A good alignment? COMPUTATION: what is THE good alignment?

Why Is It Difficult To Compute A Multiple Sequence Alignment? BIOLOGY and COMPUTATION: a CIRCULAR PROBLEM... good sequences ↔ good alignment.

The T-Coffee Algorithm

Mixing Local and Global Alignments: local alignment + global alignment → extension → multiple sequence alignment.
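Local and global pairwise alignments can be pooled into one weighted list of residue pairs before extension. A minimal sketch, assuming each aligned pair is weighted by the percent identity of the alignment it came from, and that pairs found by several alignments (for example a global and a local one) simply add their weights; the sequence names and alignments are hypothetical.

```python
from collections import defaultdict


def aligned_pairs(row_a, row_b, start_a=0, start_b=0, gap="-"):
    """Yield (i, j) residue positions sharing a column; start_* offset local alignments."""
    i, j = start_a, start_b
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            yield i, j
        if a != gap:
            i += 1
        if b != gap:
            j += 1


def identity(row_a, row_b, gap="-"):
    """Percent identity over gap-free columns."""
    cols = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    return 100.0 * sum(a == b for a, b in cols) / len(cols) if cols else 0.0


def add_to_library(library, name_a, row_a, name_b, row_b, start_a=0, start_b=0):
    """Add every aligned residue pair, weighted by the alignment's percent identity.
    Pairs seen in several alignments accumulate weight."""
    weight = identity(row_a, row_b)
    for i, j in aligned_pairs(row_a, row_b, start_a, start_b):
        library[((name_a, i), (name_b, j))] += weight


library = defaultdict(float)
# Hypothetical global alignment of two short sequences (75% identity)...
add_to_library(library, "s1", "ACGT", "s2", "ACGA")
# ...and a hypothetical local alignment covering residues 1-2 of both (100% identity).
add_to_library(library, "s1", "CG", "s2", "CG", start_a=1, start_b=1)
print(dict(library))
```

Residue pairs supported by both the global and the local alignment end up with the largest weights (175 here), which is the intended effect of mixing the two sources.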

What Is a Library? Extension + T-Coffee: library-based multiple sequence alignment. Example libraries:

    2
    Seq1 MySeq
    Seq2 MyotherSeq
    #
    ….

    3
    Seq1 anotherseq
    Seq2 atsecondone
    Seq3 athirdone
    #
    #
    ….

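The Triplet Assumption (consistency, consensus): if residue X of SEQ A is aligned with residue Z of a third sequence, and Z is aligned with residue Y of SEQ B, then X and Y should be aligned.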
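The triplet rule is what T-Coffee's library extension computes: the extended weight of a residue pair (x, y) is its direct weight plus, for every residue z of a third sequence, the smaller of the weights linking x to z and z to y. A minimal sketch, assuming the library is a dict keyed by ((sequence, position), (sequence, position)) with each pair listed once; the weights are made up.

```python
from collections import defaultdict


def extend(library):
    """Extended library: direct weight of (x, y) plus, for every residue z of a
    third sequence aligned to both, min(W(x, z), W(z, y)).

    `library` maps ((seq, pos), (seq, pos)) to a weight, each pair listed once.
    The result is keyed in both orientations.
    """
    neighbours = defaultdict(dict)
    for (x, y), w in library.items():
        neighbours[x][y] = neighbours[x].get(y, 0.0) + w
        neighbours[y][x] = neighbours[y].get(x, 0.0) + w

    extended = defaultdict(float)
    for x, partners in neighbours.items():
        for z, w_xz in partners.items():
            extended[(x, z)] += w_xz  # direct evidence for x-z
            for y, w_zy in neighbours[z].items():
                # Indirect evidence: x-z and z-y support x-y when y is in a third sequence.
                if y[0] not in (x[0], z[0]):
                    extended[(x, y)] += min(w_xz, w_zy)
    return dict(extended)


# Hypothetical primary library for sequences A, B and C.
primary = {
    (("A", 0), ("B", 0)): 50.0,
    (("A", 0), ("C", 0)): 80.0,
    (("C", 0), ("B", 0)): 70.0,
    (("A", 0), ("B", 1)): 30.0,  # not supported by C
}
ext = extend(primary)
print(ext[(("A", 0), ("B", 0))], ext[(("A", 0), ("B", 1))])  # 120.0 30.0
```

The pair confirmed through C gains weight (50 → 120), while the unsupported A0-B1 pair stays at its direct weight of 30.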

ClustalW / T-Coffee

Dynamic Programming Using An Extended Library. Progressive Alignment.
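Once the library is extended, its weights can stand in for a substitution matrix in the dynamic programming step. The sketch below shows only the pairwise case with hypothetical weights, and assumes a gap penalty of 0 by default; in the progressive stage the same recursion would be applied to groups of already aligned sequences, for instance by averaging library weights over the residue pairs of two columns (not shown).

```python
def align_with_library(len_a, len_b, score, gap=0.0):
    """Global DP where score(i, j), e.g. an extended-library weight, replaces the
    substitution matrix. Returns the aligned (i, j) residue pairs."""
    # dp[i][j]: best score for the first i residues of A against the first j of B.
    dp = [[0.0] * (len_b + 1) for _ in range(len_a + 1)]
    for i in range(1, len_a + 1):
        dp[i][0] = dp[i - 1][0] - gap
    for j in range(1, len_b + 1):
        dp[0][j] = dp[0][j - 1] - gap
    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(i - 1, j - 1),
                           dp[i - 1][j] - gap,
                           dp[i][j - 1] - gap)
    # Trace back from the bottom-right cell, preferring matches on ties.
    pairs, i, j = [], len_a, len_b
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Hypothetical extended-library weights between a 3-residue and a 4-residue sequence.
weights = {(0, 0): 120.0, (1, 1): 40.0, (1, 2): 90.0, (2, 3): 60.0}
pairs = align_with_library(3, 4, lambda i, j: weights.get((i, j), 0.0))
print(pairs)  # [(0, 0), (1, 2), (2, 3)]
```

The weaker (1, 1) pair is skipped because (1, 2) and (2, 3) together carry more library support.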

What Is BaliBase? How Good Is T-Coffee??? Best-performing method on MSA benchmark datasets:

    BaliBase        Notredame; Sonnhammer
    Ribosomal RNA   Katoh (MAFFT)
    HOMSTRAD        Notredame
    OxBench         Barton

Mixing Heterogeneous Data With T-Coffee: local alignment, global alignment, multiple alignment, structural, specialist → multiple sequence alignment.

Mixing Sequences and Structures

Why Do We Want To Mix Sequences and Structures? 1- Predicting sequence structures: STRUCTURE → FUNCTION.

Why Do We Want To Mix Sequences and Structures? Sequences are Cheap and Common. Structures are Expensive and Rare.

Why Do We Want To Mix Sequences and Structures? Cheapest structure determination: sequence-structure alignment (THREAD or ALIGN).

    ADKPRRP---LS-YMLWLN
    ADKPKRPKPRLSAYMLWLN

Why Do We Want To Mix Sequences and Structures? THREAD or ALIGN: a convincing alignment → same fold.

Why Do We Want To Mix Sequences and Structures? A convincing alignment → same fold, but distant sequences are hard to align.

Why Do We Want To Mix Sequences and Structures? Multiple sequence alignments help explore the twilight zone.

Why Do We Want To Mix Sequences and Structures? 1-Predicting Sequence Structures 2-Produce Better Alignments

Why Do We Want To Mix Sequences and Structures? ALIGN: unreliable alignment if %ID < 30%.
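The %ID threshold depends on how identity is counted. A minimal sketch, assuming identity is measured over the columns where both sequences have a residue (gap columns ignored); other definitions divide by the shorter sequence or by the full alignment length and give different numbers. It is applied to the ADKPRRP.../ADKPKRP... pair used in these slides.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Identity over columns where both sequences have a residue (gap columns ignored)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


pid = percent_identity("ADKPRRP---LS-YMLWLN", "ADKPKRPKPRLSAYMLWLN")
print(f"{pid:.1f}% identity")  # 93.3% identity (14 of 15 gap-free columns)
```

This pair sits far above the 30% limit; below it, the slide's point is that a sequence-only alignment stops being reliable.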

Why Do We Want To Mix Sequences and Structures? Structure superposition: alignment insensitive to %ID. Folds evolve slower than sequences.

Why Do We Want To Mix Sequences and Structures?

Structure Superposition

How To Mix Sequences and Structures

Mixing Sequences and Structures with T-Coffee: Seq vs Seq (local, global), Seq vs Struct (thread), Struct vs Struct (superpose). Evaluation on HOMSTRAD.

The 3D-Coffee Libraries. Methods: global: Needleman and Wunsch; local: Sim (lalign); threading: Fugue; superposition: SAP.

Threading: Fugue

Threading: Fugue
1- Turn the sequence of known structure into a profile: lower gap penalties in loops, structure-specific substitution matrix.
2- Align the profile with the query sequence.
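The "lower penalties in loops" idea can be shown with a plain global alignment in which the gap cost depends on the secondary-structure label of the template position next to the gap. This is only a sketch of that one ingredient, not FUGUE: the labels (H, E, C), the gap costs and the identity-based match score are hypothetical stand-ins for FUGUE's structure-dependent penalties and environment-specific substitution tables.

```python
# Hypothetical gap costs: gaps are cheaper in loops (C) than in helices (H) or strands (E).
GAP_COST = {"H": 4.0, "E": 4.0, "C": 1.0}


def match_score(a, b):
    # Placeholder substitution score; FUGUE uses environment-specific tables instead.
    return 2.0 if a == b else -1.0


def align_to_structure(seq, struct_seq, struct_ss):
    """Global DP of a query sequence against a structure whose positions carry a
    secondary-structure label; gap costs depend on the label next to the gap.
    Returns (score, aligned (query_index, structure_index) pairs)."""
    n, m = len(seq), len(struct_seq)

    def gap_at(j):
        # Gap cost next to structure position j, clamped to the chain ends.
        return GAP_COST[struct_ss[min(max(j, 0), m - 1)]]

    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - gap_at(-1)
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gap_at(j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + match_score(seq[i - 1], struct_seq[j - 1]),
                dp[i - 1][j] - gap_at(j - 1),  # query residue i-1 faces a gap
                dp[i][j - 1] - gap_at(j - 1),  # structure position j-1 left unmatched
            )
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + match_score(seq[i - 1], struct_seq[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] - gap_at(j - 1):
            i -= 1
        else:
            j -= 1
    return dp[n][m], pairs[::-1]


score, pairs = align_to_structure("AAA", "AAAA", "HHCC")
print(score, pairs)  # 5.0 [(0, 0), (1, 1), (2, 3)]
```

With the template labelled HHCC, the unmatched template position (2) falls in the loop; labelling every position as helix (HHHH) lowers the best score to 2.0 because the deletion then costs the helix penalty.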

Evaluating Fugue (Threading: Fugue)
1- Select 967 pairs of sequences in HOMSTRAD.
2- Align each pair with T-Coffee and Fugue.
3- Compare the two alignments.
Scatter plot regions: TCdef wins / Fugue wins. TCdef (default T-Coffee): 58.81%, Fugue: 61.81%.
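One common way to compare a test alignment with a structure-based reference is the percentage of residue pairs aligned in the reference that the test alignment also aligns. The slides do not say how the percentages above were computed, so treat this as an illustrative metric only; the function names and the example alignments are hypothetical.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Set of (i, j) residue-index pairs placed in the same column of a pairwise alignment."""
    pairs, i, j = set(), 0, 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def pair_score(test, reference):
    """Percentage of the reference's aligned residue pairs recovered by the test alignment."""
    ref = aligned_pairs(*reference)
    if not ref:
        return 0.0
    return 100.0 * len(ref & aligned_pairs(*test)) / len(ref)


reference = ("ADKPRRP---LS-YMLWLN", "ADKPKRPKPRLSAYMLWLN")
test = ("ADKPRRPLS----YMLWLN", "ADKPKRPKPRLSAYMLWLN")  # same sequences, gaps placed differently
print(f"{pair_score(test, reference):.1f}%")  # 86.7% (13 of 15 reference pairs)
```

Shifting the LS residues loses two of the fifteen reference pairs; everything else is recovered.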

Superposition: SAP

1- High-level dynamic programming (the counterpart of the substitution matrix used when doing regular alignments). 2- Low-level DP: forcing the alignment of two residues.

Superposition: SAP. 1- High-level dynamic programming. 2- Low-level DP: forcing the alignment of two residues. Rigid-body superposition, RMSD.
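RMSD measures how far equivalent atoms of two structures sit from each other. A minimal standard-library sketch, assuming the equivalences (a[i] with b[i]) are already known: it removes the translation by centering both sets but does not fit the optimal rotation (a full rigid-body superposition would add a step such as the Kabsch algorithm), so the value is an upper bound on the rigid-body RMSD. The coordinates are made up.

```python
import math


def centroid(coords):
    n = len(coords)
    return tuple(sum(p[k] for p in coords) / n for k in range(3))


def rmsd_after_centering(a, b):
    """RMSD between equivalent atoms a[i] <-> b[i] once both centroids sit at the origin.

    Only the translation is fitted; the optimal rotation is not computed, so this is an
    upper bound on the rigid-body RMSD.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equivalent coordinates")
    ca, cb = centroid(a), centroid(b)
    total = 0.0
    for p, q in zip(a, b):
        total += sum(((p[k] - ca[k]) - (q[k] - cb[k])) ** 2 for k in range(3))
    return math.sqrt(total / len(a))


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in a]  # pure translation
scaled = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]  # a genuinely different shape
print(rmsd_after_centering(a, shifted), rmsd_after_centering(a, scaled))
```

A pure translation gives an RMSD of (numerically) zero, while the stretched triangle gives 2/3.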

Superposition: SAP. 1- High-level dynamic programming. 2- Low-level DP: evaluate every pair. 3- Rigid-body superposition.

Superposition: SAP. 1- High-level dynamic programming: structure-based sequence alignment. Make a DP on the accumulated traces → use the traces like a substitution matrix.
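The two-level scheme described here (force a residue pair, run a low-level DP, accumulate its trace, then run a high-level DP on the accumulated traces) can be sketched as follows. This is not SAP itself: as a hypothetical stand-in for SAP's structural environment comparison, the low-level score of residues (k, l) under a forced pair (i, j) is 1 / (1 + |d(i, k) - d(j, l)|), using intra-structure distances so that no superposition is needed. The gap value, coordinates and function names are illustrative assumptions, and the cost grows as O(n²m²).

```python
import math


def nw(score, rows, cols, gap):
    """Global DP over index lists rows x cols with a linear gap cost; returns aligned index pairs."""
    n, m = len(rows), len(cols)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for a in range(1, n + 1):
        dp[a][0] = -gap * a
    for b in range(1, m + 1):
        dp[0][b] = -gap * b
    for a in range(1, n + 1):
        for b in range(1, m + 1):
            dp[a][b] = max(dp[a - 1][b - 1] + score(rows[a - 1], cols[b - 1]),
                           dp[a - 1][b] - gap, dp[a][b - 1] - gap)
    pairs, a, b = [], n, m
    while a > 0 and b > 0:
        if dp[a][b] == dp[a - 1][b - 1] + score(rows[a - 1], cols[b - 1]):
            pairs.append((rows[a - 1], cols[b - 1]))
            a, b = a - 1, b - 1
        elif dp[a][b] == dp[a - 1][b] - gap:
            a -= 1
        else:
            b -= 1
    return pairs[::-1]


def low_level_path(A, B, i, j, gap):
    """Low-level DP with residue i of A forced onto residue j of B."""
    def env(k, l):
        # Stand-in environment score: similar distances i->k and j->l score close to 1.
        return 1.0 / (1.0 + abs(math.dist(A[i], A[k]) - math.dist(B[j], B[l])))

    before = nw(env, list(range(i)), list(range(j)), gap)
    after = nw(env, list(range(i + 1, len(A))), list(range(j + 1, len(B))), gap)
    return before + [(i, j)] + after, env


def double_dp(A, B, gap=0.5):
    """Accumulate every low-level trace into a summary matrix, then align on it."""
    summary = [[0.0] * len(B) for _ in A]
    for i in range(len(A)):
        for j in range(len(B)):  # evaluate every pair
            path, env = low_level_path(A, B, i, j, gap)
            for k, l in path:
                summary[k][l] += env(k, l)
    # High-level DP: the accumulated traces play the role of a substitution matrix.
    return nw(lambda k, l: summary[k][l], list(range(len(A))), list(range(len(B))), gap)


A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
B = [(x, y + 5.0, z) for x, y, z in A]  # same structure, translated
print(double_dp(A, B))  # [(0, 0), (1, 1), (2, 2)]
```

Because the score uses only internal distances, the translated copy is aligned residue for residue without any superposition step (`math.dist` needs Python 3.8 or later).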

Superposition: SAP
1- Select 967 pairs of sequences in HOMSTRAD.
2- Align each pair with T-Coffee and SAP.
3- Compare the two alignments.
TCdef: 58.81%, SAP: 86.31%.

Summary (each method compared with default T-Coffee, TCdef):

                TCdef     Method
    Fugue       58.81%    61.81%
    SAP         58.81%    86.31%

Sequences and Structures: How Good is The Mixture ???

Our Benchmark: HOM39
-HOMSTRAD: structure-based MSAs that can be used as references.
-COMPACT and DEMANDING.
-HOM39: the 39 most difficult datasets (percent identity lower than 25%).

Our Benchmark: Using HOM39. Benchmarking strategy:
-Re-align HOM39 without using ALL the structures.
-Compare the result with the reference.

Evaluating 3D-Coffee 1- Can a SINGLE structure Help ?

Using ONE Structure with 3D-Coffee: HOM39 with ONE structure per MSA. Seq vs Seq (local, global), Seq vs Struct (thread). Evaluation on HOM39.

Evaluating 3D-Coffee: 1- Can a SINGLE structure help? 2- Does it benefit ALL the sequences? Is EVERYONE happier if there is a STAR in the team…

BaliBase / HOM39 → TC-Fugue + remove the provided structure(s) → comparison.

Evaluating 3D-Coffee: 1- Can a SINGLE structure help? 2- Does it benefit all the sequences? 3- Can we use two or more structures?

Mixing Sequences and Structures with 3D-Coffee: HOM39 with TWO structures per MSA. Seq vs Seq (local, global), Seq vs Struct (Fugue), Struct vs Struct (SAP, LSQ). Evaluation on HOMSTRAD.

Indirect Improvement Direct Improvement

Evaluating 3D-Coffee: 1- Can a SINGLE structure help? 2- Does it benefit all the sequences? 3- Can we use two structures? 4- Relation between accuracy and N-structures???

Mixing Sequences and Structures with T-Coffee: HOM39 with 1 to N structures per MSA. Seq vs Seq (local, global), Seq vs Struct (Fugue), Struct vs Struct (SAP). Evaluation on HOMSTRAD.

Induced Improvement

Conclusion

-Structures Help BUT NOT SO MUCH

The More Structures The Merrier

Credits Orla O’Sullivan: University College, Cork, Ireland Des Higgins: University College, Cork, Ireland Karsten Suhre: IGS-CNRS, Marseille, France

Conclusion The program is available on request from: