1 Three-Body Delaunay Statistical Potentials of Protein Folding Andrew Leaver-Fay University of North Carolina at Chapel Hill Bala Krishnamoorthy, Alex Tropsha
2 Protein Folding Problem Find the 3-D structure of a protein in nature from its 1-D sequence. –Holy grail of computational biology Generic Solution –Search Algorithm Takes Sequence Produces Decoys –Scoring Function Ranks Decoys
3 Empirical Scoring Functions Philosophy: compare structural properties of decoys to those of known proteins “Two-Body” Potentials –Distribution of distances between amino acids –Frequency of amino-acid contacts Arbitrary cutoff distance defines contact Delaunay-based statistical potentials –“How do four amino acids pack together?” –Alex Tropsha’s Lab: SNAPP Four-Body Potential
4 Delaunay Tessellation Of Proteins Describe each residue’s position by a single point –Cα –Side Chain Centroid Delaunay tessellation gives a simplicial complex –Geometric “nearest neighbor” criterion –Captures a sense of “shielding” in residue interaction Gather statistics on tetrahedra (3-simplices) –Classify tetrahedra –Convert observed frequencies to scores
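A minimal sketch of the tessellation step, assuming one 3-D point per residue is already available. The coordinates below are random stand-ins, not protein data, and `tessellate` is a hypothetical helper name; only `scipy.spatial.Delaunay` is a real library call.

```python
import numpy as np
from scipy.spatial import Delaunay


def tessellate(points):
    """Return the Delaunay tetrahedra of a 3-D point set as sorted index tuples."""
    tri = Delaunay(np.asarray(points, dtype=float))
    # In 3-D each simplex is a tetrahedron given by 4 point indices.
    return sorted(tuple(sorted(int(v) for v in simplex)) for simplex in tri.simplices)


# Hypothetical input: 30 random points standing in for one point per residue.
rng = np.random.default_rng(0)
coords = rng.random((30, 3)) * 20.0
tetrahedra = tessellate(coords)
```

Each tuple in `tetrahedra` lists the four residue indices that define one tetrahedron, which is the unit the statistics are gathered on.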
5 Classification of Tetrahedra 8,855 ways to classify a tetrahedron by the four amino acids that define it 5 ways to classify a tetrahedron by gaps in primary sequence –e.g., residues 1, 5, 6, & 10 in a tetrahedron share the same gap structure with residues 20, 22, 23, & 43
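The two classifications can be sketched as follows. The composition class is the unordered set of four amino-acid letters; the gap class is taken here to be the lengths of runs of consecutive sequence positions, which is an assumption that reproduces both the count of 5 and the slide's example. The function names are illustrative.

```python
from itertools import combinations, combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def composition_class(residues):
    """Order-independent label for the amino acids at the vertices."""
    return "".join(sorted(residues))


def gap_class(positions):
    """Run lengths of consecutive sequence positions, longest first (assumed scheme)."""
    pos = sorted(positions)
    runs = [1]
    for a, b in zip(pos, pos[1:]):
        if b == a + 1:
            runs[-1] += 1   # b extends the current run of sequence neighbors
        else:
            runs.append(1)  # a gap starts a new run
    return tuple(sorted(runs, reverse=True))


# Count the composition classes by direct enumeration of unordered 4-tuples.
n_compositions = sum(1 for _ in combinations_with_replacement(AMINO_ACIDS, 4))
# Count the distinct gap classes over all 4-residue subsets of a short chain.
n_gap_classes = len({gap_class(c) for c in combinations(range(10), 4)})
```

Under this scheme residues 1, 5, 6, 10 and residues 20, 22, 23, 43 both fall in the class with one pair of sequence neighbors and two isolated residues.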
6 From Statistics To Scores Log-likelihood score for a particular tetrahedron type is $\log_{10}(f_{ijklp} / p_{ijklp})$, where $p_{ijklp} = C_{ijkl} \cdot f(\mathrm{aa}_i) \cdot f(\mathrm{aa}_j) \cdot f(\mathrm{aa}_k) \cdot f(\mathrm{aa}_l) \cdot f(\mathrm{psg}_p)$ The score for a decoy is the sum of the log-likelihood scores for each of its tetrahedra
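A small sketch of the scoring arithmetic. The slide does not define the factor C; the code assumes it is the number of distinct orderings of the four residues (4! divided by the factorials of repeated residue counts). Giving unseen types a score of 0 is also an assumption, and all names are illustrative.

```python
from collections import Counter
from math import factorial, log10


def permutation_factor(residues):
    """Assumed form of C: number of distinct orderings of the residue multiset."""
    c = factorial(len(residues))
    for count in Counter(residues).values():
        c //= factorial(count)
    return c


def log_likelihood(residues, gap, f_obs, aa_freq, gap_freq):
    """log10 of observed frequency over the frequency expected by chance."""
    expected = permutation_factor(residues) * gap_freq[gap]
    for aa in residues:
        expected *= aa_freq[aa]
    return log10(f_obs / expected)


def score_decoy(tetra_types, score_table):
    """Sum the per-type scores over a decoy's tetrahedra (unseen types count 0)."""
    return sum(score_table.get(t, 0.0) for t in tetra_types)
```

A type seen exactly as often as chance predicts scores 0; one seen ten times as often scores 1.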
7 Desired Classification Features Amino Acid Types –Backbone and Side-chain distinction, 2 points/residue Primary Sequence Gaps –Gaps of varying lengths, 0, 1, 2-4, 5+ Buriedness –Are these residues exposed to solvent? Edge Lengths, Tetrahedron Volume 2° Structure Self-Imposed Sampling Requirement Have 10 times as many tetrahedra in the training set as the number of tetrahedron types. Adding classification features to the existing two requires a larger training set
8 Facet-based Delaunay Potential Sacrifice some higher-order information to gain insight into other structural features –Simultaneously show that higher-order information is valuable 1,540 ways to classify a facet by the 3 defining amino acids 3 ways to classify a facet by gaps in the primary sequence 5 ways to classify a facet by its buriedness
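The counts on this slide follow from counting unordered selections with repetition. The sketch below also multiplies the three facet features together and applies the 10x sampling rule stated for tetrahedra; the slides do not say that every feature combination is treated as a separate type, so both steps are assumptions.

```python
from math import comb


def n_multisets(n_kinds, size):
    """Number of unordered selections of `size` items from `n_kinds`, repeats allowed."""
    return comb(n_kinds + size - 1, size)


facet_compositions = n_multisets(20, 3)   # 3 amino acids per facet
tetra_compositions = n_multisets(20, 4)   # 4 amino acids per tetrahedron

# Assumption: composition, sequence-gap and buriedness classes are fully crossed.
facet_types = facet_compositions * 3 * 5
# Assumption: the 10x sampling rule is applied to facet types as it is to tetrahedra.
target_observations = 10 * facet_types
```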
9 Buried by Geometry A facet in the Delaunay tessellation may be involved in two tetrahedra (AVL) or in only one (DSG). Def: a facet that appears only once is a “surface facet” Vertices on any surface facet are “surface vertices.” 5 classes of facets by buriedness –Surface facets –Non-surface facets: number of surface vertices (3, 2, 1, or 0) [Figure: tessellation with vertices labeled L, I, V, A, F, P, D, G, S; courtesy Alex Tropsha]
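The definitions above translate directly into a facet count. This is a sketch that takes the tetrahedra as tuples of vertex indices; the function name and the class labels are illustrative.

```python
from collections import Counter
from itertools import combinations


def facet_buriedness(tetrahedra):
    """Map each facet (sorted vertex triple) to one of the 5 buriedness classes.

    "surface" marks a facet found in only one tetrahedron; any other facet gets
    the number of its vertices (0 to 3) that lie on some surface facet.
    """
    counts = Counter(
        tuple(sorted(f)) for t in tetrahedra for f in combinations(t, 3)
    )
    surface_vertices = {v for f, n in counts.items() if n == 1 for v in f}
    return {
        f: "surface" if n == 1 else sum(v in surface_vertices for v in f)
        for f, n in counts.items()
    }


# Toy complex: vertex 0 sits inside the tetrahedron 1-2-3-4.
classes = facet_buriedness([(0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 3, 4), (0, 2, 3, 4)])
```

In the toy complex the four outer facets are surface facets, while every facet through vertex 0 is shared by two tetrahedra and has two surface vertices.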
10 Training Set 1,600 Structures –High Resolution –Low Sequence Identity, < 25% 226K facets observed
11 Decoy Discrimination Well formed, non-native structures –Standard sets available from Decoys’R’Us –Many potentials have failed the discrimination task on these sets Two Measures of Fitness for a Potential –Rank of Native Structure –Z-Score of Native Structure: (NativeScore - μ) / σ, with μ and σ the mean and standard deviation of the scores in the decoy set Compare 4 potentials: –Latest 4-Body Potential –3-Body, no buriedness distinction –3-Body –Combination of 3- and 4-Body Potentials Scores from 3-body come from only the fully buried facets
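Both fitness measures are simple to compute once every structure in a set has a score. The sketch assumes that a higher score is better (as with a log-likelihood), that the mean and standard deviation are taken over the decoys only, and that the population standard deviation is used; the slides do not settle any of these.

```python
from statistics import mean, pstdev


def native_rank(native_score, decoy_scores):
    """1-based rank of the native, assuming a higher score is better."""
    return 1 + sum(s > native_score for s in decoy_scores)


def native_z_score(native_score, decoy_scores):
    """(native - mean) / standard deviation, both taken over the decoy scores."""
    return (native_score - mean(decoy_scores)) / pstdev(decoy_scores)


# Hypothetical scores, for illustration only.
decoys = [1.0, 2.0, 3.0, 4.0, 5.0]
rank = native_rank(6.0, decoys)
z = native_z_score(6.0, decoys)
```

A rank of 1 means no decoy outscores the native, and a larger positive Z-score means the native stands further above the decoy distribution.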
12 Four-State Reduced Decoy Sets [Table: PDB-ID, #D’s, then Rank and Z-Scr under each of 4-Body, 3bNBD, 3-body, 4b + 3b*; rows: 1ctf, r, sn, cro, icb, pti, rxn] * fully buried facets only
13 Fisa Decoy Sets [Table: PDB-ID, #D’s, then Rank and Z-Scr under each of 4-Body, 3bNBD, 3-body, 4b + 3b*; rows: 1fc, hdd-C, cro, icb] * fully buried facets only
14 Lattice SS Fit Decoy Sets [Table: PDB-ID, #D’s, then Rank and Z-Scr under each of 4-Body, 3bNBD, 3-body, 4b + 3b*; rows: 1beo, ctf, 1dkt-A*, fca, nkl, pgb, 1trl-A*, icb] * fully buried facets only
15 LMDS Decoy Sets [Table: PDB-ID, #D’s, then Rank and Z-Scr under each of 4-Body, 3bNBD, 3-body, 4b + 3b*; rows: 1b0n-B*, bba, ctf, dtk, fc, igd, shf-A, cro, ovo, pti] * fully buried facets only
16 Average Performance Across Sets [Table: Rank and Z-scr under each of 4-Body, 3bNBD, 3-body, 4b + 3b*; rows: Mean and Median for 4state, Fisa, Lat, LMDS; Mean(Mean) and Mean(Median) for All] * fully buried facets only
17 Dimer “Discrimination” We could not effectively discriminate the native from decoys with either the 3- or 4-body potentials for 3 proteins. On closer examination, we discovered the native structures were incomplete, leaving exposed residues that would be buried in their native multimeric shapes. 1b0n-B, 1dkt-A, 1trl-A
18 Average Performance Across Sets [Table: Rank and Z-scr under each of 4-Body, 3bNBD, 3-body, 4b + 3b*; rows: Mean(Mean) and Mean(Median) for All] * fully buried facets only
19 Conclusion Buriedness distinctions capture valuable information about protein structure. Body potential is the strongest Delaunay potential to date.