Integrating Biological Information In Multiple Sequence Alignments Confronting Bits and Pieces of Information Cédric Notredame CNRS-Marseille, France www.tcoffee.org.

Presentation transcript:

Integrating Biological Information In Multiple Sequence Alignments Confronting Bits and Pieces of Information Cédric Notredame CNRS-Marseille, France

Building and Using Models Angstrom

Computing the Correct Alignment is a Complicated Problem

T-Coffee and Consistency…

Too Many Methods for ONE Alignment: M-Coffee

Combining Many MSAs into ONE
[Diagram labels: MUSCLE, MAFFT, ClustalW, ???????, T-Coffee]
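One way to picture combining many MSAs into one is to pool the residue pairs that each method aligns and weight every pair by the number of methods that propose it. The sketch below only illustrates that idea and is not the M-Coffee implementation; the function names (aligned_pairs, build_library), the dictionary-based alignment format, and the toy sequences are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def aligned_pairs(msa):
    """Return the set of residue pairs placed in the same column.

    `msa` maps sequence names to gapped strings of equal length.
    A residue is (name, index), where index counts residues
    (not columns) in the ungapped sequence.
    """
    names = sorted(msa)
    counters = dict.fromkeys(names, 0)
    pairs = set()
    for col in range(len(msa[names[0]])):
        present = []
        for name in names:
            if msa[name][col] != "-":
                present.append((name, counters[name]))
                counters[name] += 1
        pairs.update(combinations(present, 2))
    return pairs

def build_library(msas):
    """Weight each residue pair by the number of input alignments containing it."""
    library = Counter()
    for msa in msas:
        library.update(aligned_pairs(msa))
    return library

# Two hypothetical methods that disagree on where to put the gap.
method_a = {"s1": "AC-GT", "s2": "ACGGT"}
method_b = {"s1": "ACG-T", "s2": "ACGGT"}
library = build_library([method_a, method_b])
for pair, weight in sorted(library.items()):
    print(pair, weight)
```

Pairs proposed by both methods get weight 2, the two competing placements of the third residue of s1 get weight 1 each, and a consistency-based aligner can then favour the well-supported pairs.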

Comparing Methods MAFFT

Going Further

Place your Bets…

Where to Trust Your Alignments
[Figure labels: Most Methods Agree; Most Methods Disagree]
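The idea of trusting the regions where most methods agree can be sketched as a per-column score: the fraction of aligned residue pairs in a column that the individual methods also propose. This is a simplified illustration under stated assumptions, not the scoring scheme used by T-Coffee itself; column_pairs, column_agreement, and the toy alignments are hypothetical.

```python
from itertools import combinations

def column_pairs(msa):
    """List, for each column, the set of residue pairs aligned in it."""
    names = sorted(msa)
    counters = dict.fromkeys(names, 0)
    columns = []
    for col in range(len(msa[names[0]])):
        present = []
        for name in names:
            if msa[name][col] != "-":
                present.append((name, counters[name]))
                counters[name] += 1
        columns.append(set(combinations(present, 2)))
    return columns

def column_agreement(final_msa, method_msas):
    """Per-column fraction of aligned pairs supported by the input methods.

    Returns None for columns holding fewer than two residues.
    """
    supports = [set().union(*column_pairs(m)) for m in method_msas]
    scores = []
    for pairs in column_pairs(final_msa):
        if not pairs:
            scores.append(None)
            continue
        hits = sum(pair in support for pair in pairs for support in supports)
        scores.append(hits / (len(pairs) * len(supports)))
    return scores

# Hypothetical example: score method_a's alignment against both methods.
method_a = {"s1": "AC-GT", "s2": "ACGGT"}
method_b = {"s1": "ACG-T", "s2": "ACGGT"}
scores = column_agreement(method_a, [method_a, method_b])
print(scores)
```

Columns scoring close to 1 are those where most methods agree; low-scoring columns mark the regions to treat with caution.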

When Sequences Are Not Enough: 3D-Coffee and Expresso

3D-Coffee: Combining Sequences and Structures Within Multiple Sequence Alignments

Expresso: Finding the Right Structure
[Diagram labels: Template-Based Alignment of the Source Sequences; Template-Source Alignment]
Why Not Use Structure-Based Alignments?

Expresso: Finding the Right Structure
[Diagram labels: Sources; Templates; BLAST; SAP; Template Alignment; Source-Template Alignment; Remove Templates; Template-Based Alignment of the Source Sequences; Library]

3D-Coffee: Combining Sequences and Structures Within Multiple Sequence Alignments

Improving The Evaluation

More Than Structure-Based Alignments
Structural correctness is only the easy side of the coin. In practice, MSAs are intermediate models used to generate other models:

Data       Model Type    Benchmark
Homology   Profile       Yes
Evolution  Trees         No
Structure  3D-Structure  CASP
Function   Annotation    No

Conclusion
Template-Based Multiple Sequence Alignments
– Projecting any relevant information onto the sequences
– Using this information
Need for new evaluation procedures
– Functional analysis
– Phylogenetic analysis
– Homology search (profiles)
– Homology modelling
Integrating data -> making sure your bits of data can fight with one another

Fabrice Armougom (CNRS, FR), Sebastien Moretti (CNRS, FR), Olivier Poirot (CNRS, FR), Frederic Reinier (CRS4, IT), Karsten Suhre (CNRS, FR), Vladimir Saudek (Sanofi-Aventis, FR), Des Higgins (UCD, IE), Orla O’Sullivan (UCD, IE), Iain Wallace (UCD, IE), Victor Jongeneel (SIB/VitalIT, CH), Bruno Nyfler (VitalIT, CH), Roger Hersch (EPFL, CH), Pierre Dumas (EPFL, CH), Basile Schaeli (EPFL, CH)

Turning Data into Models
Data
– Columbus considered that the landmass occupied 225°, leaving only 135° of water (Marinus of Tyre, 70 AD).
– Columbus believed that 1° represented only 56 miles (Alfraganus, XIth century).
– He knew there was an island named Japan off the coast of China…
Model
– Circumference of the Earth: 25,255 km at most
– Canary Islands to Japan: 3,700 km (reality: 12,000 km)

T-Coffee Results Validation Using BaliBase

Structures Vs Sequences

ClustalW: The Progressive Algorithm

The More Structures The Merrier
[Plot: average improvement over T-Coffee against structure/sequence ratio]

T-Coffee and Consistency…
– Each library line is a soft constraint (a wish)
– You can’t satisfy them all
– You must satisfy as many as possible (the easy ones)
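The notion of library lines as soft constraints can be illustrated with a triplet extension: a pair of residues gains weight whenever a residue of a third sequence is linked to both of them. The sketch below is a minimal illustration of that principle, assuming a library stored as a plain dictionary; extend_library, the residue labels, and the weights are made up for the example and this is not the T-Coffee code.

```python
from collections import defaultdict
from itertools import combinations

def extend_library(primary):
    """Triplet extension of a pairwise library.

    `primary` maps (residue_a, residue_b) -> weight, where a residue is
    (sequence_name, index) and the two residues come from different sequences.
    The extended weight of a pair is its direct weight plus, for every
    residue c linked to both, min(weight(a, c), weight(c, b)).
    """
    neighbours = defaultdict(dict)
    for (a, b), weight in primary.items():
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    extended = defaultdict(float)
    for (a, b), weight in primary.items():
        extended[frozenset((a, b))] += weight
    for linked in neighbours.values():
        for a, b in combinations(sorted(linked), 2):
            if a[0] != b[0]:  # two residues of one sequence are never aligned together
                extended[frozenset((a, b))] += min(linked[a], linked[b])
    return dict(extended)

# Hypothetical library: three mutually supporting wishes and one conflicting wish.
primary = {
    (("A", 0), ("B", 0)): 88,
    (("A", 0), ("C", 0)): 77,
    (("C", 0), ("B", 0)): 100,
    (("A", 0), ("B", 1)): 20,  # conflicts with A0-B0: A0 can only go in one column
}
extended = extend_library(primary)
print(extended[frozenset((("A", 0), ("B", 0)))])  # supported through C0
print(extended[frozenset((("A", 0), ("B", 1)))])  # no third-sequence support
```

The two wishes for residue A0 cannot both be satisfied; after extension, the one backed by the third sequence carries far more weight, so it is the "easy" one to satisfy.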

Consistency-Based Algorithms: T-Coffee
Gotoh (1990)
– Iterative strategy using consistency
Martin Vingron (1991)
– Dot matrix multiplications
– Accurate but too stringent
Dialign (1996, Morgenstern)
– Consistency
– Agglomerative assembly
T-Coffee (2000, Notredame)
– Consistency
– Progressive algorithm
ProbCons (2004, Do)
– T-Coffee with a Bayesian treatment

The Right Mix of Methods

What’s in a Multiple Alignment?
Structural Criteria
– Residues are arranged so that those playing a similar role end up in the same column.
Evolutionary Criteria
– Residues are arranged so that those having the same ancestor end up in the same column.
Similarity Criteria
– As many similar residues as possible end up in the same column.
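Of the three criteria, only the similarity criterion can be computed directly from the sequences. A common way to express it is a sum-of-pairs score, sketched below. The scoring values (match, mismatch, gap) are arbitrary assumptions chosen for illustration; real aligners use substitution matrices and more elaborate gap penalties.

```python
from itertools import combinations

def sum_of_pairs(msa, match=1, mismatch=-1, gap=-2):
    """Score an alignment by summing a pairwise score over every column.

    `msa` is a list of gapped strings of equal length.
    Gap-versus-gap pairs are ignored.
    """
    total = 0
    for column in zip(*msa):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total

# Toy alignment of three hypothetical sequences.
score = sum_of_pairs(["ACGT", "AC-T", "AGGT"])
print(score)
```

A high score only says that similar residues share columns; it does not guarantee that the structural or evolutionary criteria are met.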

How Do We Perform In The Twilight Zone?
– Consistency-based methods have an edge
– Hard to tell methods apart
– Sequence alignment is NOT solved

T-Coffee and Consistency…

Three Types of Algorithms
– Progressive: ClustalW
– Iterative: MUSCLE
– Consistency-based: T-Coffee and ProbCons

3D-Coffee: Combining Sequences and Structures Within Multiple Sequence Alignments

What’s in a Multiple Alignment?
The MSA contains what you put inside:
– Structural similarity
– Evolutionary similarity
– Sequence similarity
You can view your MSA as:
– A record of evolution
– A summary of a protein family
– A collection of experiments made for you by Nature…