Nadia Léonard Unité de Recherche en Biologie Moléculaire F.U.N.D.P. Developing a reliable methodology to align a sequence of known structure and a sequence.

Presentation transcript:

Nadia Léonard Unité de Recherche en Biologie Moléculaire F.U.N.D.P. Developing a reliable methodology to align a sequence of known structure and a sequence with low homology, to model it

Introduction. 3D structure provides information to understand function and to plan directed mutagenesis. The number of known structures (8,000) is much smaller than the number of known sequences (500,000). Experimental techniques are long and expensive. Alternative: modeling. Homology modeling relies on the fact that two homologues adopt the same structure.

[Figure: scale of percentage of identity (% id.). Labels: homology modeling (reliable); pairwise alignment (most features well predicted); multiple alignment; consensus of alignments (some features well predicted); Twilight zone; Midnight zone; fold recognition (not very reliable); not homologous, BUT proteins of different sequences can adopt the same structure.]

Sequence alignment is the critical step for homology modeling. Below 30% identity, no automatic method allows reliable protein modeling.

Aim of our work: to propose a reliable alignment method for proteins that share a low percentage of identity with their template (<30%).
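The identity thresholds quoted in these slides can be made concrete with a short sketch. It is only an illustration: the slides measure identity with ALIGN, whereas the function below uses one common convention (identical residues divided by the number of ungapped columns), and the names `percent_identity` and `zone` are hypothetical. How the exact boundaries (20% and 30%) are assigned is also an assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns where both sequences have a residue.

    Assumption: gapped columns are excluded from the denominator. Other
    conventions exist and give different percentages.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def zone(pid: float) -> str:
    """Label a target-template pair with the zones used in the slides."""
    if pid < 20.0:
        return "midnight zone"
    if pid < 30.0:
        return "twilight zone"
    return "homology modeling"
```

For example, `zone(percent_identity("ACDE", "ACDF"))` falls in the "homology modeling" range, since three of the four compared columns are identical.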

General strategy for homology modeling. Steps: search databanks (PSI-BLAST); multiple alignment of sequences; target-template alignment; modeling; theoretical model evaluation; comparison of the model to the real structure. Other labels on the diagram: PDB template; critical step.

Our methodology
1. Target selection: PDB proteins whose template shares between 10 and 30% identity (ALIGN).
2. Improvement of the sequence-structure alignment. Three alignments are built: two from our method (consensus 1 and 2) and one pairwise alignment from PSI-BLAST (the best alignment method for Twilight Zone proteins).
3. Homology modeling from each target-template alignment.
4. Evaluation: geometrical features of the models.
5. Comparison of each model to the real structure.

Our approach consists in building a consensus of several alignment programs. Diagram (target, template): several programs each produce a multiple alignment; a pairwise alignment is derived from each multiple alignment; the several pairwise alignments are combined into a consensus (consensus building).

1) Alignment and modeling. Databank searching: PSI-BLAST. First route: multiple alignments (8 alignments); 8 pairwise alignments; consensus 1; model 1. Second route: multiple alignments (12 alignments); 13 pairwise alignments; consensus 2; model 2. Reference route: PSI-BLAST pairwise alignment; PSI-BLAST model.
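The slides do not say how the pairwise alignments are merged into a consensus. The sketch below shows one simple possibility, a per-residue majority vote, and should be read as an assumption rather than the authors' procedure. Each pairwise alignment is given as a pair of gapped strings (target row, template row); the function names are hypothetical.

```python
from collections import Counter


def residue_map(target_row, template_row, gap="-"):
    """For each target residue, the index of the template residue it is
    aligned to, or None when it faces a gap."""
    if len(target_row) != len(template_row):
        raise ValueError("alignment rows must have the same length")
    mapping = []
    template_index = 0
    for t, s in zip(target_row, template_row):
        partner = None
        if s != gap:
            partner = template_index
            template_index += 1
        if t != gap:
            mapping.append(partner)
    return mapping


def consensus_map(alignments, gap="-"):
    """Majority vote over several pairwise alignments of the same two sequences.

    Returns one (template_index_or_None, agreement_fraction) pair per target
    residue. The voted mapping is not guaranteed to preserve residue order;
    a real pipeline would have to repair such conflicts.
    """
    maps = [residue_map(t, s, gap) for t, s in alignments]
    if not maps or any(len(m) != len(maps[0]) for m in maps):
        raise ValueError("all alignments must contain the same target sequence")
    consensus = []
    for votes in zip(*maps):
        partner, count = Counter(votes).most_common(1)[0]
        consensus.append((partner, count / len(maps)))
    return consensus


# Three toy alignments of target ACDE against template ACDFE.
toy = [("ACD-E", "ACDFE"), ("ACDE-", "ACDFE"), ("ACD-E", "ACDFE")]
result = consensus_map(toy)
```

In the toy case, two of the three alignments pair the last target residue with template position 4, so that pairing wins with an agreement of 2/3. The agreement fraction gives a rough idea of how many programs support each aligned position.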

2) Comparison of models to the real structure. Global RMSD: computed between model and structure after superposition. Local RMSD: percentage of well-predicted residues. The lower the distance, the closer the model is to the real structure. A single wrongly modeled region can dramatically increase the global RMSD.
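The two measures can be illustrated with a minimal sketch. It assumes the model and the real structure are already superposed and reduced to one point per residue (for instance the C-alpha atom). The 2.0 Å cutoff for a "well-predicted" residue is a placeholder, since the slides give no threshold, and the local measure is simplified here to a per-residue distance test.

```python
import math


def residue_distances(model, reference):
    """Distance between equivalent residues of two superposed structures."""
    if len(model) != len(reference) or not model:
        raise ValueError("structures must be non-empty and of equal length")
    return [math.dist(p, q) for p, q in zip(model, reference)]


def global_rmsd(model, reference):
    """Root mean square deviation over all residues."""
    distances = residue_distances(model, reference)
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def percent_well_predicted(model, reference, cutoff=2.0):
    """Percentage of residues closer than `cutoff` (assumed value) to the
    real structure."""
    distances = residue_distances(model, reference)
    return 100.0 * sum(1 for d in distances if d <= cutoff) / len(distances)


# Ten residues: nine modeled exactly, one misplaced by 20 units.
reference = [(float(i), 0.0, 0.0) for i in range(10)]
model = reference[:9] + [(9.0, 20.0, 0.0)]
```

With this toy input, `percent_well_predicted(model, reference)` is 90.0 while `global_rmsd(model, reference)` is about 6.3, which shows how one badly modeled residue dominates the global value.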

3pte: D-alanyl-D-alanine carboxypeptidase from Streptomyces sp. R161. Panels: model 2; PSI-BLAST model; real structure.

Results. Nine proteins have been modeled: 3 proteins of the Midnight Zone (<20% id.) and 6 proteins of the Twilight Zone (20-30% id.).

Comparison of models to the real structure: Midnight Zone proteins (<20% id.). For all methods (models 1, 2 and PSI-BLAST), the results are very bad: most of the residues are badly modeled. Indeed, no reliable alignment method exists below 20% identity, and our method (models 1 and 2) cannot lower this threshold. Modeling of these 3 proteins confirms the limits of alignment methods below 20%.

Twilight Zone proteins (20-30% id.). Global and local RMS: the most accurate models (4/6 and 5/6) come from our method (consensus 1 and 2). In general, model 2 gives better results than model 1 and the PSI-BLAST model. It is better to use many alignment programs. Models built with our methodology seem to be better than PSI-BLAST models.

Comparison to CASP (Critical Assessment of techniques for protein Structure Prediction). Entrants model proteins whose structure is unknown to them (it is revealed after the competition). Models are compared to the real structure (global RMS). The best CASP models are taken as reference.

Conclusions. The limits of our method are reached below 20% identity. Our alignment method appears to be better than PSI-BLAST (above 20% id.). Our results are comparable to the best CASP performances (cf. graph). Consensus for sequence alignment has a future for homology modeling of Twilight Zone proteins.

Perspectives (1). Test our approach on a large set of proteins. Improve our method by: giving more weight to the better alignment programs; increasing the number of alignment programs; using several templates; using SSP and fold recognition.
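Giving more weight to the better alignment programs could look like the following sketch. It is a hypothetical illustration, not something the slides describe. Each program contributes a list that gives, for every target residue, the index of the template residue it is aligned to (or None for a gap), and each program has a weight. The weights are invented for the example.

```python
from collections import defaultdict


def weighted_consensus(maps, weights):
    """Weighted vote per target residue.

    maps:    one list per program, same length, entries are template indices
             or None (gap).
    weights: one non-negative weight per program (hypothetical values).
    Returns (winning_entry, share_of_total_weight) for each target residue.
    """
    if not maps or len(maps) != len(weights):
        raise ValueError("need one weight per program")
    if any(len(m) != len(maps[0]) for m in maps):
        raise ValueError("all maps must cover the same target residues")
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    consensus = []
    for position in range(len(maps[0])):
        score = defaultdict(float)
        for mapping, weight in zip(maps, weights):
            score[mapping[position]] += weight
        best = max(score, key=score.get)
        consensus.append((best, score[best] / total))
    return consensus


# Three programs disagree on the last two residues.
program_maps = [[0, 1, 2], [0, 1, 3], [0, 2, 3]]
equal = weighted_consensus(program_maps, [1.0, 1.0, 1.0])
trusted_first = weighted_consensus(program_maps, [3.0, 1.0, 1.0])
```

With equal weights the last residue follows the two programs that agree (template index 3). When the first program is trusted three times as much, its choice (index 2) wins instead.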

Perspectives (2). Evaluate the confidence of regions predicted by many programs. Take part in the CASP competition. Automate: expert system (PhD thesis).

61
1d2f       MHGVFGYSRW KNDE-FLAAI AHWFSTQHYT AIDSQTVVYG PSVIYMVSEL IRQWSETGEG
consensus1 AQGKTKYAPP AGIPELREAL AEKFRRENGL SVTEEETIVT VGGKQALFNL FQAILDPGDE
score
consensus2 AQGKTKYAPP AGIPELREAL AEKFRRENGL SVTPEETIVT VGGKQALFNL FQAILDPGDE
score

1d2f       VVIHTPAYDA FYKAIEGNQR TVMPVALEKQ ADGWFCDMGK LEAVLAKPEC KIMLLCSPQN
consensus1 VIVLSPYWVS YPEMVRFAGG VVVEVETL R----R-T KALVVNSPNN
score
consensus2 VIVLSPYWVS YPEMVRFAGG VVVEVETL-P EEGFVPD-PE RVRRAITPRT KALVVNSPNN
score

1d2f       PTGKVWTCDE LEIMADLCER HGVRVISDEI HMDMVWGEQP HIPWSNVARG DWALLTSGSK
consensus1 PTGAVYPKEV LEALARLAVE HDFYLVSDEI YEHLLYEG-E HFSPGRVAPE HTLTVNGAAK
score
consensus2 PTGAVYPKEV LEALARLAVE HDFYLVSDEI YEHLLYEGEH FSPGRVA-PE HTLTVNGAAK
score

1nec

1d2f model 2