Knowledge-based protocols for protein structure prediction: from protein threading to solvent accessibility prediction

Presentation transcript:

JM - Knowledge-based protocols for protein structure prediction: from protein threading to solvent accessibility prediction and back to protein structure prediction by threading
Jarek Meller, Division of Biomedical Informatics, Children’s Hospital Research Foundation & Department of Biomedical Engineering, UC

JM - Outline of the talk
- Protein structure and complexity of conformational search: from de novo structure prediction to similarity-based methods
- Protein structure prediction by sequence-to-structure matching (threading and fold recognition)
- Secondary structure and solvent accessibility prediction
- Improving fold recognition and de novo simulations with accurate solvent accessibility prediction
- A story from our backyard: predicting interaction between pVHL and RNA Pol II

JM - Polypeptide chains: backbone and side-chains [figure; labels: C-ter, N-ter]

JM - Distinct chemical nature of amino acid side-chains [figure; labels: ARG, PHE, GLU, VAL, CYS, C-ter, N-ter]

JM - Hydrogen bonds and secondary structures: α helix, β strand

JM - Tertiary structure and long range contacts: annexin

JM - Domains, interactions, complexes: VHL

JM - Multiple alignment and PSSM

JM - Protein folding problem
- The protein folding problem consists of predicting the three-dimensional structure of a protein from its amino acid sequence.
- The hierarchical organization of protein structures helps to break the problem into secondary structure, tertiary structure and protein-protein interaction predictions.
- Computational approaches to protein structure prediction: similarity-based and de novo methods.

JM - Ab initio (or de novo) folding simulations
- Ab initio folding simulations consist of a conformational search with an empirical scoring function (“force field”) to be maximized (or minimized).
- Computational bottleneck: exponential search space and sampling problem (global optimization!).
- Fundamental problem: inaccuracy of empirical force fields and scoring functions (folding potentials).
- Importance of mixed protocols, such as Rosetta by D. Baker and colleagues (Monte Carlo fragment assembly).
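
The conformational search described on this slide can be illustrated with a minimal Metropolis Monte Carlo sketch. Everything in it is an assumption made for illustration: the "conformation" is just a tuple of torsion-like angles, `toy_energy` is an invented rugged function rather than a real force field, and the names `metropolis_search` and `perturb_one_angle` are hypothetical. It is not the Rosetta protocol, which assembles fragments.

```python
import math
import random


def metropolis_search(energy, propose, start, steps=2000, temperature=1.0, seed=0):
    """Minimize `energy` by Metropolis Monte Carlo; return (best_state, best_energy)."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    for _ in range(steps):
        candidate = propose(current, rng)
        e_candidate = energy(candidate)
        delta = e_candidate - e_current
        # Downhill moves are always accepted; uphill moves are accepted
        # with Boltzmann probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


def toy_energy(angles):
    """Invented rugged energy (assumption): global minimum 0 when all angles are 0."""
    total = 0.0
    for a in angles:
        r = math.radians(a)
        total += (1.0 - math.cos(r)) + 0.3 * (1.0 - math.cos(3.0 * r))
    return total


def perturb_one_angle(angles, rng):
    """Propose a new state by changing one randomly chosen angle by up to 30 degrees."""
    i = rng.randrange(len(angles))
    new = list(angles)
    new[i] = (new[i] + rng.uniform(-30.0, 30.0)) % 360.0
    return tuple(new)


start = tuple([180.0] * 10)  # start far from the minimum
best, e_best = metropolis_search(toy_energy, perturb_one_angle, start)
print(f"start energy {toy_energy(start):.2f}, best energy found {e_best:.2f}")
```

Even in this toy setting the number of states grows exponentially with the number of angles, which is the sampling problem the slide refers to.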

JM - Similarity-based approaches to structure prediction: from sequence alignment to fold recognition
- High level of redundancy in biology: sequence similarity is often sufficient to use the “guilt by association” rule: if similar sequence, then similar structure and function.
- Multiple alignments and family profiles can detect evolutionary relatedness at much lower sequence similarity, which is hard to detect with pairwise sequence alignments: Psi-BLAST by S. Altschul et al.
- Many structures are already known (see PDB) and one can match sequences directly with structures to enhance structure recognition: fold recognition (not for new folds!).
- For both fold recognition and de novo simulation, prediction of intermediate attributes such as secondary structure or solvent accessibility helps to achieve better sensitivity and specificity.

JM - Why “fold recognition”?
- Divergent (common ancestor) vs. convergent (no common ancestor) evolution.
- PDB: virtually all proteins with 30% sequence identity have similar structures; however, most similar structures share only up to 10% sequence identity!

JM - Going beyond sequence similarity: threading and fold recognition
- When sequence similarity is not detectable, use a library of known structures and match your query against the target structures.
- One needs a scoring (“energy”) function that measures compatibility between sequences and structures.

JM - Scoring alternative conformations with empirical (knowledge-based) folding potentials [figure; labels: misfolded, native, E]
Ideally, each misfolded structure should have an energy higher than the native energy, i.e. $E_{\text{misfolded}} - E_{\text{native}} > 0$.

JM - Simple contact model for protein structure prediction
Each amino acid is represented by a point in 3D space, and two amino acids are said to be in contact if their distance is smaller than a cutoff distance, e.g. 7 Å.
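
The contact definition above can be sketched in a few lines. The function name `contact_map`, the `min_separation` parameter and the four example coordinates are assumptions made for illustration; only the one-point-per-residue representation and the 7 Å cutoff come from the slide.

```python
import math


def contact_map(coords, cutoff=7.0, min_separation=1):
    """Return the set of residue pairs (i, j), i < j, closer than `cutoff` angstroms.

    coords: one (x, y, z) point per residue.
    min_separation: smallest sequence separation j - i that is considered
    (hypothetical parameter; raise it to ignore chain neighbours).
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


# Four made-up residue positions (angstroms).
example_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (3.8, 5.0, 0.0),
]
print(sorted(contact_map(example_coords)))
```

Residues 0 and 2 are 7.6 Å apart and therefore are the only pair in this example that is not in contact.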

JM - Sequence-to-structure matching with contact models
- Generalized string matching problem: aligning a string of amino acids against a string of “structural sites” characterized by other residues in contact.
- Finding an optimal alignment with gaps using inter-residue pairwise models, $E = \sum_{k<l} \epsilon_{kl}$, is NP-hard because of the non-local character of scores at a given structural site (the identity of the interaction partners may change depending on the location of gaps in the alignment). R.H. Lathrop, Protein Eng. 7 (1994).
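
A small sketch may help show why pairwise scores are non-local. It evaluates the energy of a given alignment as a sum of pair terms over template contacts, and shows that moving a gap changes which residues interact. The names `threading_energy` and `hp_energy`, the contact list, the sequence and the energy values (-1 for a hydrophobic pair, 0 otherwise) are all assumptions made for illustration.

```python
def threading_energy(sequence, alignment, contacts, pair_energy):
    """Sum pair energies over template contacts whose two sites are both occupied.

    alignment: dict mapping template site index -> query residue index;
    sites absent from the dict are aligned to gaps.
    """
    total = 0.0
    for k, l in contacts:
        if k in alignment and l in alignment:
            total += pair_energy(sequence[alignment[k]], sequence[alignment[l]])
    return total


def hp_energy(a, b):
    """Toy hydrophobic contact energy (assumption): only H-H pairs are rewarded."""
    return -1.0 if a == "H" and b == "H" else 0.0


template_contacts = [(0, 3), (1, 4)]  # made-up contacts between template sites
query = "HPHPPH"

# Alignment 1: residue i placed on site i (site 5 has no contacts).
ungapped = {i: i for i in range(6)}

# Alignment 2: query residue 1 is skipped, so every later residue shifts by one site.
shifted = {0: 0, 1: 2, 2: 3, 3: 4, 4: 5}

e_ungapped = threading_energy(query, ungapped, template_contacts, hp_energy)
e_shifted = threading_energy(query, shifted, template_contacts, hp_energy)
print(e_ungapped, e_shifted)
```

The score contributed at site 1 depends on what ends up at site 4, so it cannot be computed from site 1 alone. This is the property that rules out a standard dynamic programming solution.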

JM - Hydrophobic contact model and sequence-to-structure alignment [figure; label: HPHPP]
Solutions to yet another instance of the global optimization problem:
a) Heuristic (e.g. frozen environment approximation)
b) “Profile” or local scoring functions (folding potentials)
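
Option b) can be sketched as follows: once the score at a structural site depends only on the residue placed there, optimal gapped alignment becomes an ordinary dynamic programming problem. The site labels (`B` for buried, `E` for exposed), the +1/-1 scores, the linear gap penalty and the function name `profile_threading_score` are assumptions made for illustration, not the potentials used by the author.

```python
def profile_threading_score(sequence, sites, gap=-1.0):
    """Best global alignment score of an H/P sequence against B/E structural sites.

    Uses Needleman-Wunsch dynamic programming with a local (site-only) score.
    """

    def site_score(residue, site):
        # Toy local potential (assumption): H prefers buried sites, P exposed ones.
        favorable = (residue == "H" and site == "B") or (residue == "P" and site == "E")
        return 1.0 if favorable else -1.0

    n, m = len(sequence), len(sites)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # leading residues aligned to gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # leading sites left empty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + site_score(sequence[i - 1], sites[j - 1]),
                dp[i - 1][j] + gap,  # residue i is not placed on any site
                dp[i][j - 1] + gap,  # site j is left empty
            )
    return dp[n][m]


print(profile_threading_score("HPHPP", "BEBEE"))   # every residue fits its site
print(profile_threading_score("HPHPP", "BEEBEE"))  # one site must be left empty
```

The cost is proportional to the product of the two lengths, in contrast to the NP-hard pairwise formulation.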

JM - Implementing threading protocols: LOOPP
LOOPP in CAFASP4:
- About average over all fold recognition targets (missing some easy targets that were recognized by Psi-BLAST).
- Third best server in the category of difficult targets.
- Best predictions among the servers for 3 difficult targets.
- Further improvements are necessary to make the predictions more robust.
Joint work with Ron Elber.

JM - Using sequence similarity, predicted secondary structures and contact potentials: fold recognition protocols
In practice, fold recognition methods are often mixtures of sequence matching and threading, with compatibility between a sequence and a structure measured by:
i) sequence alignment
ii) contact potentials
iii) predicted secondary structures (compared to the secondary structure of a template)

JM - Predicting 1D protein profiles from sequences: secondary structures and solvent accessibility [figure; labels: SABLE server, POLYVIEW server]
a) Multiple alignment and family profiles improve prediction of local structural propensities.
b) Use of advanced machine learning techniques, such as Neural Networks or Support Vector Machines, improves results as well.
B. Rost and C. Sander were the first to achieve more than 70% accuracy in three-state (H, E, C) classification, applying a) and b).

JM - Predicting 1D protein profiles from sequences: secondary structures and solvent accessibility [figure; labels: PDB, Sable, PsiPred, Prof]
Relative solvent accessibility prediction is typically cast as a classification problem.

JM - Variability in surface exposure for structurally equivalent residues does not support classification

JM - Neural Network-based regression for relative solvent accessibility (RSA) prediction

JM - Accuracy of predictions depends on the level of surface exposure: error measures and fine tuning

JM - Overall accuracy of different regression models
Each cell: cc / MAE / RMSE; "..." marks a value missing from this transcript.

Method     S163                S156               S135               S149
SABLE-a    0.65 / 15.6 / ...   ... / 15.9 / ...   ... / 15.3 / ...   ... / 16.0 / 21.0
SABLE-wa   0.66 / 15.5 / ...   ... / 15.7 / ...   ... / 15.3 / ...   ... / 15.8 / 21.4
LS         0.63 / 16.3 / ...   ... / 16.5 / ...   ... / 15.9 / ...   ... / 16.5 / 21.2
SVR1       0.62 / 15.9 / ...   ... / 16.1 / ...   ... / 15.6 / ...   ... / 16.2 / 21.5
SVR2       0.62 / 16.6 / ...   ... / 16.7 / ...   ... / 16.4 / ...   ... / 16.9 / 23.0

Non-linear models: Rafal Adamczak; Linear models: Michael Wagner; Datasets and servers: Aleksey Porollo and Rafal Adamczak
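
The three error measures in the table can be computed as in the sketch below. It assumes that cc is the Pearson correlation coefficient, MAE the mean absolute error and RMSE the root mean square error, all on relative solvent accessibility (RSA) expressed in percent; the slides do not spell these definitions out. The function name `regression_metrics` and the four example values are made up for illustration.

```python
import math


def regression_metrics(predicted, observed):
    """Return (cc, mae, rmse) for paired lists of predicted and observed RSA values."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    cc = cov / math.sqrt(var_p * var_o)  # Pearson correlation coefficient
    mae = sum(abs(p - o) for p, o in zip(predicted, observed)) / n
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
    return cc, mae, rmse


observed_rsa = [0.0, 20.0, 50.0, 80.0]    # made-up values, percent
predicted_rsa = [10.0, 20.0, 40.0, 70.0]  # made-up values, percent
cc, mae, rmse = regression_metrics(predicted_rsa, observed_rsa)
print(f"cc={cc:.2f} MAE={mae:.1f} RMSE={rmse:.1f}")
```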

JM - Regression vs. two-class classification
Columns: Method, S163, S156, S135, S149
ACCpro server 25%: 70.4% / % / % / % / 0.43
SABLE-wa BS62: 71.7% / % / % / 0.44
SABLE-wa binary: 71.4% / % / % / % / 0.44
SABLE-2c 25%: 76.7% / % / % / % / 0.53
SABLE-wa: 77.3% / % / % / % / 0.53
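
A regression model can be compared with two-class predictors by thresholding its real-valued output, as in the sketch below. The 25% threshold is taken from the method labels on this slide; treating values at or above the threshold as exposed, the function name `two_class_accuracy` and the example values are assumptions made for illustration.

```python
def two_class_accuracy(predicted_rsa, observed_rsa, threshold=25.0):
    """Fraction of residues whose buried/exposed class is predicted correctly.

    A residue is called exposed when its RSA (percent) is >= threshold
    (assumed convention), and buried otherwise.
    """
    correct = 0
    for p, o in zip(predicted_rsa, observed_rsa):
        if (p >= threshold) == (o >= threshold):
            correct += 1
    return correct / len(observed_rsa)


observed = [0.0, 20.0, 50.0, 80.0, 30.0]    # made-up values, percent
predicted = [10.0, 30.0, 40.0, 70.0, 20.0]  # made-up values, percent
accuracy = two_class_accuracy(predicted, observed)
print(f"two-class accuracy: {accuracy:.0%}")
```

In this example the two misclassified residues have errors of only 10 percentage points but lie close to the threshold, which illustrates why classification accuracy alone can hide how good a regression model is.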

JM - Predicting transmembrane domains


JM - Now back to threading and folding simulations
- Applications in filtering out incorrect models in both de novo simulations and fold recognition
- Domain structure prediction, protein-protein interactions
- Better sensitivity in finding correct matches in threading: one story as an example

JM - Modeling the RNA Polymerase II Interaction with the von Hippel-Lindau Protein: from experimental clues to structure prediction and back to experiment
Jarek Meller, Children’s Hospital Research Foundation. Joint work with M. Czyzyk-Krzeska and her group, College of Medicine, University of Cincinnati.

JM - A play of life (script and beyond):
- Stage: protein society or proteome.
- Rules of life: proteins are assembled and degraded: nursery (ribosome) vs. police and guillotine (ubiquitination and proteasome).
- Social order, one look at the equilibrium in the system: Holy scriptures (DNA); Army of scribes (middle class proteins); Temple priests (selected proteins); Transcription; Translation.
- “I think we need to adjust the interpretation of the script …” (regulation of replication and transcription).
- Law and oppression.

JM - Hypoxia-induced stabilization of Hif-1a. Graphics from R.K. Bruick and S.L. McKnight, Science 295.

JM - Experimental clues:
- Observation: correlation between pVHL levels and transcript elongation of the tyrosine hydroxylase gene (M. Czyzyk-Krzeska).
- Could pVHL influence transcription by interacting with elongation complex co-factors?
- Where to start? An experiment without a model is usually not a very good idea. Could an in silico study and bioinformatics help?

JM - Searching for pVHL interaction targets:
- Hif-1a ODD interacts with pVHL; other pVHL targets should have domains structurally resembling that of Hif-1a ODD.
- Use the Hif-1a ODD sequence as a query in order to find other structures that are compatible with it.
[figure; labels: Rpb1, Rpb6, Hif-1a ODD, Pro-OH, pVHL]

JM - RNA Polymerase II in the act of transcription. Gnatt, Kornberg et al., Science 292 (2001).

JM - The C-terminus of Rpb1 and Rpb6 form a pocket on the surface of the RNA Polymerase II complex. [figure: C-ter of Rpb1 and Rpb6 represented by cartoons; labels: C-ter, Rpb1, Rpb6]

JM - Could the Hif ODD fragment resemble the C-terminal fragment of RNA Polymerase II?
- A motif similar to that of ODD was found, but that could occur by chance. We used sequence alignments and threading to measure similarity between these fragments.
- Sequences are about 25% identical over a short fragment of about 50 aa: not significant.
- Predicted secondary structures are similar.
- Suggestive but still not significant similarity.
- However, a weak match between the adjacent Rpb6 and the consecutive part of the Hif-1a sequence was observed in threading (3D-PSSM, LOOPP).
- Prediction: the ODD shares 3D structure with the C-ter fragment of Rpb1 and Rpb6.
- Implication: VHL is likely to interact with Rpb1/Rpb6!

JM - Experimental results (MCK):
- RNA Pol II peptides suggested by computational analysis do bind to pVHL, and this binding is controlled by hydroxylation of the critical PRO residue.
- Co-immunoprecipitation of hyperphosphorylated RNA Pol II and pVHL observed: interaction confirmed.
- Ubiquitination of Rpb1 confirmed.
- Biological meaning?