Three-Body Delaunay Statistical Potentials of Protein Folding. Andrew Leaver-Fay, University of North Carolina at Chapel Hill; Bala Krishnamoorthy, Alex Tropsha

Presentation transcript:

1 Three-Body Delaunay Statistical Potentials of Protein Folding
Andrew Leaver-Fay, University of North Carolina at Chapel Hill
Bala Krishnamoorthy, Alex Tropsha

2 Protein Folding Problem
Find the 3-D structure of a protein in nature from its 1-D sequence.
– Holy grail of computational biology
Generic Solution
– Search Algorithm: takes a sequence, produces decoys
– Scoring Function: ranks decoys

3 Empirical Scoring Functions
Philosophy: compare structural properties of decoys to those of known proteins
“Two-Body” Potentials
– Distribution of distances between amino acids
– Frequency of amino-acid contacts (an arbitrary cutoff distance defines contact)
Delaunay-based statistical potentials
– “How do four amino acids pack together?”
– Alex Tropsha’s Lab: SNAPP Four-Body Potential

4 Delaunay Tessellation of Proteins
Describe each residue’s position by a single point
– C-α
– Side Chain Centroid
Delaunay tessellation gives a simplicial complex
– Geometric “nearest neighbor” criterion
– Captures a sense of “shielding” in residue interaction
Gather statistics on tetrahedra (3-simplices, four vertices each)
– Classify tetrahedra
– Convert observed frequencies to scores

5 Classification of Tetrahedra
8,855 ways to classify a tetrahedron by the four amino acids that define it
5 ways to classify a tetrahedron by gaps in primary sequence
– e.g., residues 1, 5, 6, & 10 in a tetrahedron share the same gap structure with residues 20, 22, 23, & 43
[Figure: residues labeled L, V, A, F, I]
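The two counts on this slide can be checked with a short sketch. It assumes that the 8,855 figure counts unordered selections of four amino acids with repetition allowed, and that a "gap structure" is the pattern of runs of sequence-adjacent residues among the sorted vertices. The second assumption is one reading that reproduces both the 5 classes and the slide's example; the function names and the tuple encoding are hypothetical, not taken from the talk.

```python
from itertools import combinations_with_replacement
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def composition_count(n_types: int, n_vertices: int) -> int:
    """Number of unordered selections with repetition (multisets)."""
    return comb(n_types + n_vertices - 1, n_vertices)


def gap_class(residue_numbers):
    """Run lengths of sequence-adjacent residues, largest first.

    Assumed encoding: (2, 1, 1) means one adjacent pair plus two
    isolated residues; (4,) means four consecutive residues.
    """
    nums = sorted(residue_numbers)
    runs, current = [], 1
    for prev, nxt in zip(nums, nums[1:]):
        if nxt - prev == 1:
            current += 1          # still inside a run of neighbors
        else:
            runs.append(current)  # a sequence gap closes the run
            current = 1
    runs.append(current)
    return tuple(sorted(runs, reverse=True))


print(composition_count(20, 4))  # multisets of 4 amino acids out of 20
print(len(list(combinations_with_replacement(AMINO_ACIDS, 4))))  # same count, enumerated
print(gap_class([1, 5, 6, 10]), gap_class([20, 22, 23, 43]))
```

Under this reading the five classes are the five ways to split four vertices into runs: (4,), (3, 1), (2, 2), (2, 1, 1) and (1, 1, 1, 1).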

6 From Statistics To Scores
Log-likelihood score for a particular tetrahedron type is $\log_{10}(f_{ijklp} / p_{ijklp})$
$p_{ijklp} = C_{ijkl} \, f(aa_i) \, f(aa_j) \, f(aa_k) \, f(aa_l) \, f(psg_p)$
The score for a decoy is the sum of the log-likelihood scores of its tetrahedra
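A minimal sketch of the scoring formula follows. The slide does not define C_ijkl; the code assumes it is the number of distinct orderings of the four residue types (4! divided by the factorials of the repeat counts), which makes the expected frequencies sum to one over all compositions. All function names, the frequency dictionaries and the zero score for unseen types are illustrative assumptions.

```python
from collections import Counter
from math import factorial, log10, prod


def permutation_factor(quad):
    """C_ijkl under the assumed definition: distinct orderings of the types."""
    repeats = Counter(quad).values()
    return factorial(len(quad)) // prod(factorial(c) for c in repeats)


def expected_frequency(quad, gap, aa_freq, gap_freq):
    """p_ijklp = C_ijkl * f(aa_i) * f(aa_j) * f(aa_k) * f(aa_l) * f(psg_p)."""
    return permutation_factor(quad) * prod(aa_freq[a] for a in quad) * gap_freq[gap]


def log_likelihood(observed_freq, quad, gap, aa_freq, gap_freq):
    """log10 of observed over expected frequency for one tetrahedron type."""
    return log10(observed_freq / expected_frequency(quad, gap, aa_freq, gap_freq))


def decoy_score(tetrahedron_types, score_table):
    """Sum of per-type scores; unseen types contribute 0 (an assumption)."""
    return sum(score_table.get(t, 0.0) for t in tetrahedron_types)


# Toy numbers, not data from the talk: uniform amino-acid frequencies.
aa_freq = {a: 0.05 for a in "ACDEFGHIKLMNPQRSTVWY"}
gap_freq = {0: 0.2}
print(log_likelihood(3.0e-4, ("A", "L", "V", "F"), 0, aa_freq, gap_freq))
```

A positive score means the tetrahedron type is seen more often in the training set than the independent-frequency model predicts.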

7 Desired Classification Features
Amino Acid Types
– Backbone and side-chain distinction, 2 points/residue
Primary Sequence Gaps
– Gaps of varying lengths: 0, 1, 2-4, 5+
Buriedness
– Are these residues exposed to solvent?
Edge Lengths, Tetrahedron Volume
2° Structure
Self-Imposed Sampling Requirement
Have 10 times as many tetrahedra in the training set as there are tetrahedron types.
Adding classification features to the existing two requires a larger training set.

8 Facet-Based Delaunay Potential
Sacrifice some higher-order information to gain insight into other structural features
– Simultaneously show that higher-order information is valuable
1,540 ways to classify a facet by the 3 defining amino acids
3 ways to classify a facet by gaps in the primary sequence
5 ways to classify a facet by its buriedness
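The trade-off can be made concrete with a little arithmetic. The sketch below multiplies the class counts quoted in the talk and applies the ten-samples-per-type rule of thumb. It assumes the features are combined as a simple product of classes; the helper name `n_types` is hypothetical.

```python
from math import comb


def n_types(n_vertices, n_gap_classes, n_burial_classes=1, n_amino_acids=20):
    """Amino-acid compositions times gap classes times burial classes."""
    compositions = comb(n_amino_acids + n_vertices - 1, n_vertices)
    return compositions * n_gap_classes * n_burial_classes


tetra_types = n_types(4, 5)     # 8,855 compositions * 5 gap classes
facet_types = n_types(3, 3, 5)  # 1,540 compositions * 3 gap classes * 5 burial classes

print(tetra_types, 10 * tetra_types)  # types, and samples wanted at 10 per type
print(facet_types, 10 * facet_types)
```

Dropping one vertex cuts the number of amino-acid compositions by more than a factor of five, which leaves room to add a buriedness feature without outgrowing the training set.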

9 Buried by Geometry
A facet in the Delaunay tessellation may be involved in two tetrahedra (AVL) or in only one (DSG).
Def: a facet that appears only once is a “surface facet.”
Vertices on any surface facet are “surface vertices.”
5 classes of facets by buriedness
– Surface facets
– Non-surface facets: number of surface vertices (3, 2, 1, or 0)
[Figure: tessellation with vertices labeled L, I, V, A, F, P, D, G, S. Figure courtesy Alex Tropsha]
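The definitions above translate directly into a counting procedure. The sketch below assumes the tessellation is already available as a list of tetrahedra, each a 4-tuple of vertex ids; the function name and the string labels ("surface", "interior-k") are hypothetical.

```python
from collections import Counter
from itertools import combinations


def facet_burial_classes(tetrahedra):
    """Map each facet (sorted vertex triple) to one of five burial labels."""
    # Each tetrahedron contributes its four triangular facets.
    facet_count = Counter(
        tuple(sorted(f)) for tet in tetrahedra for f in combinations(tet, 3)
    )
    # A facet that appears only once lies on the surface.
    surface_facets = {f for f, n in facet_count.items() if n == 1}
    surface_vertices = {v for f in surface_facets for v in f}

    classes = {}
    for facet in facet_count:
        if facet in surface_facets:
            classes[facet] = "surface"
        else:
            # Non-surface facets are split by how many vertices are exposed.
            k = sum(v in surface_vertices for v in facet)
            classes[facet] = f"interior-{k}"
    return classes


# Toy complex: vertex 0 sits inside the tetrahedron spanned by 1, 2, 3, 4.
toy = [(0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 3, 4), (0, 2, 3, 4)]
for facet, label in sorted(facet_burial_classes(toy).items()):
    print(facet, label)
```

In the toy complex the four outer triangles are surface facets, and the six facets touching vertex 0 are non-surface facets with two surface vertices each.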

10 Training Set
1,600 Structures
– High resolution
– Low sequence identity, < 25%
226K facets observed

11 Decoy Discrimination
Well-formed, non-native structures
– Standard sets available from Decoys’R’Us
– Many potentials have failed the discrimination task on these sets
Two Measures of Fitness for a Potential
– Rank of native structure
– Z-score of native structure: (NativeScore - μ) / σ
Compare 4 potentials:
– Latest 4-Body Potential
– 3-Body, no buriedness distinction
– 3-Body
– Combination of 3- and 4-Body Potentials (scores from 3-body come from only the fully buried facets)
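Both fitness measures are simple to compute once every structure in a decoy set has a score. The sketch below assumes that a higher log-likelihood score is better and that μ and σ are the mean and population standard deviation of the decoy scores alone; the talk does not say whether the native is included, so that choice and the function names are assumptions.

```python
from statistics import mean, pstdev


def native_rank(native_score, decoy_scores):
    """Rank 1 means no decoy scores higher than the native."""
    return 1 + sum(s > native_score for s in decoy_scores)


def native_z_score(native_score, decoy_scores):
    """(NativeScore - mu) / sigma, with mu and sigma taken over the decoys."""
    mu = mean(decoy_scores)
    sigma = pstdev(decoy_scores)
    return (native_score - mu) / sigma


# Toy scores, not data from the talk.
decoys = [1.0, 2.0, 3.0]
print(native_rank(5.0, decoys), native_z_score(5.0, decoys))
```

A rank of 1 with a large positive Z-score indicates that the potential separates the native structure clearly from the decoys.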

12 Four-State Reduced Decoy Sets
Columns: PDB-ID | #D’s | Rank and Z-Scr under each of 4-Body, 3bNBD, 3-body, 4b + 3b*
Rows (as they survive in the transcript): 1ctf, r, sn, cro, icb, pti, rxn
[Table values lost in extraction]
* fully buried facets only

13 Fisa Decoy Sets
Columns: PDB-ID | #D’s | Rank and Z-Scr under each of 4-Body, 3bNBD, 3-body, 4b + 3b*
Rows (as they survive in the transcript): 1fc, hdd-C, cro, icb
[Table values lost in extraction]
* fully buried facets only

14 Lattice SS Fit Decoy Sets
Columns: PDB-ID | #D’s | Rank and Z-Scr under each of 4-Body, 3bNBD, 3-body, 4b + 3b*
Rows (as they survive in the transcript): 1beo, ctf, dkt-A*, fca, nkl, pgb, trl-A*, icb
[Table values lost in extraction]
* fully buried facets only

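15 LMDS Decoy Sets
Columns: PDB-ID | #D’s | Rank and Z-Scr under each of 4-Body, 3bNBD, 3-body, 4b + 3b*
Rows (as they survive in the transcript): 1b0n-B*, bba, ctf, dtk, fc, igd, shf-A, cro, ovo, pti
[Table values lost in extraction]
* fully buried facets only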

16 Average Performance Across Sets
Columns: Rank and Z-scr (mean and median) under each of 4-Body, 3bNBD, 3-body, 4b + 3b*
Rows: 4state, Fisa, Lat, LMDS, All (Mean(Mean), Mean(Median))
[Table values lost in extraction]
* fully buried facets only

17 Dimer “Discrimination”
For 3 proteins, we could not effectively discriminate the native structure from the decoys with either the 3- or 4-body potential.
On closer examination, we discovered that the native structures were incomplete, leaving exposed residues that would be buried in their native multimeric forms.
1b0n-B, 1dkt-A, 1trl-A

18 Average Performance Across Sets
Columns: Rank and Z-scr under each of 4-Body, 3bNBD, 3-body, 4b + 3b*
Rows: All (Mean(Mean), Mean(Median))
[Table values lost in extraction]
* fully buried facets only

19 Conclusion
Buriedness distinctions capture valuable information about protein structure
[…]-Body potential is the strongest Delaunay potential to date.