Published by Shanna Dixon. Modified over 8 years ago.
1 Three-Body Delaunay Statistical Potentials of Protein Folding
Andrew Leaver-Fay
University of North Carolina at Chapel Hill
Bala Krishnamoorthy, Alex Tropsha
2 Protein Folding Problem
Find the 3-D structure of a protein in nature from its 1-D sequence.
–Holy grail of computational biology
Generic Solution
–Search Algorithm: takes a sequence, produces decoys
–Scoring Function: ranks decoys
3 Empirical Scoring Functions
Philosophy: compare structural properties of decoys to those of known proteins
“Two-Body” Potentials
–Distribution of distances between amino acids
–Frequency of amino-acid contacts; an arbitrary cutoff distance defines a contact
Delaunay-based statistical potentials
–“How do four amino acids pack together?”
–Alex Tropsha’s Lab: SNAPP Four-Body Potential
4 Delaunay Tessellation of Proteins
Describe each residue’s position by a single point
–Cα
–Side-chain centroid
Delaunay tessellation gives a simplicial complex
–Geometric “nearest neighbor” criterion
–Captures a sense of “shielding” in residue interaction
Gather statistics on tetrahedra (3-simplices, each defined by four residues)
–Classify tetrahedra
–Convert observed frequencies to scores
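The tessellation step can be illustrated with a brute-force sketch. It is not the authors' code: it tests every set of four points against the empty-circumsphere criterion, which is only practical for a handful of points and assumes the points are in general position (no five on a common sphere). The function names and the toy coordinates are made up for illustration; for real proteins a library routine such as SciPy's `scipy.spatial.Delaunay` would be the usual choice.

```python
from itertools import combinations


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def circumsphere(p0, p1, p2, p3):
    """Return (center, squared radius) of the sphere through four points,
    or None when the points are (nearly) coplanar."""
    # Equidistance from p0 and p_i gives: 2 (p_i - p0) . c = |p_i|^2 - |p0|^2
    rows = [[2.0 * (p[k] - p0[k]) for k in range(3)] for p in (p1, p2, p3)]
    rhs = [sum(p[k] ** 2 - p0[k] ** 2 for k in range(3)) for p in (p1, p2, p3)]
    det = det3(rows)
    if abs(det) < 1e-12:
        return None
    center = []
    for col in range(3):  # Cramer's rule, one coordinate at a time
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        center.append(det3(m) / det)
    radius_sq = sum((center[k] - p0[k]) ** 2 for k in range(3))
    return center, radius_sq


def delaunay_tetrahedra(points, eps=1e-9):
    """Keep every 4-tuple whose circumsphere contains no other point."""
    tetrahedra = []
    for quad in combinations(range(len(points)), 4):
        sphere = circumsphere(*(points[i] for i in quad))
        if sphere is None:
            continue
        center, radius_sq = sphere
        is_empty = all(
            sum((points[j][k] - center[k]) ** 2 for k in range(3)) >= radius_sq - eps
            for j in range(len(points))
            if j not in quad
        )
        if is_empty:
            tetrahedra.append(quad)
    return tetrahedra


# Toy "residue" positions (hypothetical coordinates, one point per residue).
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 2, 2)]
tets = delaunay_tetrahedra(points)
print(tets)
```

In this toy case the five points form two tetrahedra that share the facet (1, 2, 3).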
5 Classification of Tetrahedra
8,855 ways to classify a tetrahedron by the four amino acids that define it
5 ways to classify a tetrahedron by gaps in primary sequence
–e.g., residues 1, 5, 6, & 10 in a tetrahedron share the same gap structure with residues 20, 22, 23, & 43
[Figure: tessellation example with residues labeled L, V, A, F, I]
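The slide does not spell out the gap rule. One reading consistent with both the example and the count of five is to group the residue numbers into runs of consecutive positions and record the run lengths. The sketch below assumes that reading; `gap_class` is a hypothetical helper name.

```python
from itertools import combinations


def gap_class(residue_numbers):
    """Run lengths of consecutive residue numbers, largest first.

    ASSUMPTION: the gap classes are the ways of splitting the residues
    into runs of sequence neighbors; the slide does not state the rule.
    """
    ordered = sorted(residue_numbers)
    runs = [1]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur == prev + 1:
            runs[-1] += 1      # extends the current run
        else:
            runs.append(1)     # a gap starts a new run
    return tuple(sorted(runs, reverse=True))


# The two tetrahedra from the slide fall in the same class.
first = gap_class([1, 5, 6, 10])
second = gap_class([20, 22, 23, 43])

# Enumerate which classes can occur for four and for three residues.
four_residue_classes = {gap_class(c) for c in combinations(range(1, 12), 4)}
three_residue_classes = {gap_class(c) for c in combinations(range(1, 12), 3)}
print(first, second, len(four_residue_classes), len(three_residue_classes))
```

Under this assumption four residues give five classes, and three residues give three, which agrees with the facet count quoted on a later slide.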
6 From Statistics to Scores
Log-likelihood score for a particular tetrahedron type is
$\log_{10}\left( f_{ijklp} / p_{ijklp} \right)$
$p_{ijklp} = C_{ijkl} \cdot f(\mathrm{aa}_i) \cdot f(\mathrm{aa}_j) \cdot f(\mathrm{aa}_k) \cdot f(\mathrm{aa}_l) \cdot f(\mathrm{psg}_p)$
The score for a decoy is the sum of the log-likelihood scores of its tetrahedra
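A minimal sketch of the scoring formula follows. Two things are assumptions, not taken from the slide: that C_ijkl is the number of distinct orderings of the four residue types (4! divided by the factorials of the repeat counts), and that f_ijklp is a normalized observed frequency. All frequencies below are toy values and the function names are hypothetical.

```python
import math
from collections import Counter


def permutation_factor(residue_types):
    """Distinct orderings of the residue types, e.g. 4!/(2! 1! 1!) = 12 for L, L, V, A.

    ASSUMPTION: this is what C_ijkl stands for; the slide does not define it.
    """
    factor = math.factorial(len(residue_types))
    for multiplicity in Counter(residue_types).values():
        factor //= math.factorial(multiplicity)
    return factor


def log_likelihood(residue_types, gap_class, observed_freq, aa_freq, gap_freq):
    """log10(f / p), with p built from independent single-feature frequencies."""
    expected = permutation_factor(residue_types) * gap_freq[gap_class]
    for aa in residue_types:
        expected *= aa_freq[aa]
    return math.log10(observed_freq / expected)


def decoy_score(tetrahedron_types, score_table):
    """Sum the per-type scores over all tetrahedra found in one decoy."""
    return sum(score_table[t] for t in tetrahedron_types)


# Toy numbers, chosen only to make the arithmetic easy to follow.
aa_freq = {"A": 0.05, "F": 0.05, "L": 0.05, "V": 0.05}
gap_freq = {"2+1+1": 0.2}
quad = ("A", "F", "L", "V")

# Expected frequency: 24 * 0.2 * 0.05**4 = 3e-5; observed is ten times that.
score = log_likelihood(quad, "2+1+1", 3e-4, aa_freq, gap_freq)
score_table = {(quad, "2+1+1"): score}
total = decoy_score([(quad, "2+1+1")] * 3, score_table)
print(round(score, 6), round(total, 6))
```

A tetrahedron type seen ten times more often than expected scores about 1, and types seen less often than expected score below 0.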
7 Desired Classification Features
Amino Acid Types
–Backbone and side-chain distinction, 2 points/residue
Primary Sequence Gaps
–Gaps of varying lengths: 0, 1, 2-4, 5+
Buriedness
–Are these residues exposed to solvent?
Edge Lengths, Tetrahedron Volume
2° Structure
Self-Imposed Sampling Requirement
Have 10 times as many tetrahedra in the training set as there are tetrahedron types.
Adding classification features to the existing two requires a larger training set.
8 Facet-Based Delaunay Potential
Sacrifice some higher-order information to gain insight into other structural features
–Simultaneously show that higher-order information is valuable
1,540 ways to classify a facet by the 3 defining amino acids
3 ways to classify a facet by gaps in the primary sequence
5 ways to classify a facet by its buriedness
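The counts quoted in the slides follow from choosing amino-acid types with repetition, where order does not matter. The short check below reproduces them and applies the ten-fold sampling rule; the variable names are illustrative only.

```python
from math import comb

AMINO_ACID_TYPES = 20


def multiset_count(n_types, k):
    """Number of unordered selections of k items from n_types, repeats allowed."""
    return comb(n_types + k - 1, k)


# Compositions: four residues per tetrahedron, three per facet.
tetrahedron_compositions = multiset_count(AMINO_ACID_TYPES, 4)
facet_compositions = multiset_count(AMINO_ACID_TYPES, 3)

# Full type counts: composition x sequence-gap class (x buriedness for facets).
tetrahedron_types = tetrahedron_compositions * 5
facet_types = facet_compositions * 3 * 5

# Self-imposed rule: ten training observations per type.
tetrahedra_needed = 10 * tetrahedron_types
facets_needed = 10 * facet_types

print(tetrahedron_compositions, facet_compositions)
print(tetrahedron_types, facet_types)
print(tetrahedra_needed, facets_needed)
```

Dropping from four residues to three shrinks the composition count enough that a five-way buriedness split still leaves fewer types to sample than the four-body potential has.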
9 Buried by Geometry
A facet in the Delaunay tessellation may be involved in two tetrahedra (AVL) or in only one (DSG).
Def: a facet that appears only once is a “surface facet.”
Vertices on any surface facet are “surface vertices.”
5 classes of facets by buriedness
–Surface facets
–Non-surface facets: number of surface vertices (3, 2, 1, or 0)
[Figure: tessellation with residues labeled L, I, V, A, F, P, D, G, S. Figure courtesy of Alex Tropsha]
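The buriedness definitions translate directly into a counting pass over the tetrahedra. The sketch below is one way to do it; the class labels (`"surface"`, `"interior-3"` and so on) are hypothetical names, and "fully buried" is taken to mean a non-surface facet with zero surface vertices.

```python
from collections import Counter
from itertools import combinations


def facet_buriedness(tetrahedra):
    """Map each triangular facet to one of the five buriedness classes."""
    # Every tetrahedron contributes its four triangular facets.
    facet_counts = Counter(
        frozenset(facet)
        for tet in tetrahedra
        for facet in combinations(tet, 3)
    )
    # A facet seen exactly once lies on the surface of the tessellation.
    surface_facets = {f for f, n in facet_counts.items() if n == 1}
    # Any vertex of a surface facet is a surface vertex.
    surface_vertices = set().union(*surface_facets)

    classes = {}
    for facet in facet_counts:
        if facet in surface_facets:
            classes[facet] = "surface"
        else:
            # Non-surface facets are split by how many surface vertices they have.
            classes[facet] = f"interior-{len(facet & surface_vertices)}"
    return classes


# Two tetrahedra sharing the facet {1, 2, 3}.
classes = facet_buriedness([(0, 1, 2, 3), (1, 2, 3, 4)])
shared = classes[frozenset({1, 2, 3})]
n_surface = sum(1 for label in classes.values() if label == "surface")
print(shared, n_surface)
```

In this tiny example every vertex touches the surface, so the shared facet lands in the non-surface class with three surface vertices; facets with zero surface vertices only appear in larger tessellations.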
10 Training Set
1,600 Structures
–High Resolution
–Low Sequence Identity, < 25%
226K facets observed
11 Decoy Discrimination
Well-formed, non-native structures
–Standard sets available from Decoys’R’Us, http://dd.stanford.edu
–Many potentials have failed the discrimination task on these sets
Two Measures of Fitness for a Potential
–Rank of Native Structure
–Z-Score of Native Structure: (NativeScore - μ) / σ, with μ and σ the mean and standard deviation of the scores in the decoy set
Compare 4 potentials:
–Latest 4-Body Potential
–3-Body, no buriedness distinction (3bNBD in the tables)
–3-Body
–Combination of 3- and 4-Body Potentials; scores from the 3-body come from only the fully buried facets
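The two fitness measures can be sketched as below, reading the Z-score as the native score minus the mean decoy score, divided by the standard deviation of the decoy scores. Several details are assumptions: that a higher score is better (as with a summed log-likelihood), that the mean and deviation are taken over the decoys only, and that the population standard deviation is used. The scores are toy values.

```python
from statistics import mean, pstdev


def native_rank(native_score, decoy_scores):
    """Rank 1 means no decoy scores higher than the native structure.

    ASSUMPTION: higher scores are better.
    """
    return 1 + sum(1 for s in decoy_scores if s > native_score)


def native_z_score(native_score, decoy_scores):
    """Distance of the native score from the decoy mean, in standard deviations.

    ASSUMPTION: mean and (population) standard deviation over decoys only.
    """
    return (native_score - mean(decoy_scores)) / pstdev(decoy_scores)


# Toy decoy set: the native outscores every decoy.
decoy_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
native_score = 6.0

rank = native_rank(native_score, decoy_scores)
z = native_z_score(native_score, decoy_scores)
print(rank, round(z, 3))
```

Rank says only whether the native wins; the Z-score also says by how much, which is why the tables report both.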
12 Four-State Reduced Decoy Sets
PDB-ID   #D's   4-Body         3bNBD          3-body         4b + 3b*
                Rank   Z-Scr   Rank   Z-Scr   Rank   Z-Scr   Rank   Z-Scr
1ctf     630       1   3.089      3   2.280      7   2.530      9   2.942
1r69     675       2   2.741      3   2.617      2   2.668      2   3.572
1sn3     660      24   1.897    194   0.525     26   1.760     20   2.041
2cro     674      19   1.925     67   1.230    103   1.113     23   2.138
3icb     653      29   1.905     30   1.730      6   2.319     12   2.325
4pti     687       1   3.120    303   0.161    100   1.010      1   3.330
4rxn     677       3   2.930    284   0.227    186   0.620      5   2.702
* fully buried facets only
13 Fisa Decoy Sets
PDB-ID   #D's   4-Body         3bNBD          3-body         4b + 3b*
                Rank   Z-Scr   Rank   Z-Scr   Rank   Z-Scr   Rank   Z-Scr
1fc2     500       1   3.017    113   0.800     10   2.507      1   3.357
1hdd-C   500     153   0.619    113   0.712      1   3.085     71   1.021
2cro     500      17   2.008     32   1.602     16   2.207      2   3.511
4icb     500       1   4.556      1   3.198      1   6.972      1   6.367
* fully buried facets only
14 Lattice SS Fit Decoy Sets
PDB-ID   #D's   4-Body         3bNBD          3-body         4b + 3b*
                Rank   Z-Scr   Rank   Z-Scr   Rank   Z-Scr   Rank   Z-Scr
1beo     2000      1   7.828     16   2.317      7   3.142      1   5.564
1ctf     2000      1   4.654      6   2.815     11   2.924      4   3.947
1dkt-A*  2000    102   1.596   1047  -0.046    468   0.717     85   1.790
1fca     2000      1   6.255    524   0.592    122   1.659      1   5.986
1nkl     2000      1   7.667      1   4.181      1   4.755      1   7.769
1pgb     2000      1   5.434    100   1.596     14   2.668      1   6.003
1trl-A*  2000    962   0.071    832   0.222   1476  -0.683   1141  -0.254
4icb     2000      1   4.732      1   3.589      1   3.852      1   5.685
* fully buried facets only
15 LMDS Decoy Sets
PDB-ID   #D's   4-Body         3bNBD          3-body         4b + 3b*
                Rank   Z-Scr   Rank   Z-Scr   Rank   Z-Scr   Rank   Z-Scr
1b0n-B*  497     405  -0.916    466  -1.394    253  -0.059    405  -0.916
1bba     500       1   4.142    477  -1.822    179   0.304      1   4.142
1ctf     497       1   2.797      2   2.271      9   2.366      5   2.475
1dtk     215       8   1.903     50   0.764      5   2.180      5   1.952
1fc2     500     217   0.137    127   0.615      1   3.770    123   0.669
1igd     500       4   2.569     13   2.006     10   2.479      4   2.458
1shf-A   437      35   1.472     12   1.869      4   2.473     15   1.850
2cro     500       2   2.787      4   2.271      1   4.125      1   5.222
2ovo     348      61   0.853     14   1.721     13   1.853     55   0.917
4pti     343       9   1.900    152   0.147     30   1.484      8   2.064
* fully buried facets only
16 Average Performance Across Sets
Set     Statistic     4-Body           3bNBD            3-body           4b + 3b*
                        Rank   Z-scr     Rank   Z-scr     Rank   Z-scr     Rank   Z-scr
4state  Mean            11.3   2.516    126.3   1.253   61.428   1.718   10.286   2.722
4state  Median             3   2.742       67   1.231       26   1.761        9   2.702
Fisa    Mean              43   2.550     64.8   1.578        7   3.693    18.75   3.564
Fisa    Median             9   2.513     72.5   1.201      5.5   2.796      1.5   3.434
Lat     Mean           133.8   4.800    315.9   1.908    262.5   2.380    154.4   4.561
Lat     Median             1   5.084       58   1.957     12.5   2.797        1   5.625
LMDS    Mean            74.3   1.764    131.7   0.845     50.5   2.098     62.2   2.083
LMDS    Median           8.5   1.901       32   1.242      9.5   2.273      6.5   2.008
All     Mean(Mean)      65.6   2.902    159.7   1.396     95.4   2.472     61.4   3.232
All     Mean(Median)     5.4   3.060     57.4   1.408     13.4   2.406      4.5   3.442
* fully buried facets only
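The summary rows are plain means and medians. As a sketch, the snippet below recomputes the Fisa row for the 4-Body ranks from the per-protein ranks on the Fisa slide, then the two "All" entries from the per-set values in the summary table. Only numbers from the slides are used; the variable names are illustrative.

```python
from statistics import mean, median

# 4-Body native ranks on the Fisa sets: 1fc2, 1hdd-C, 2cro, 4icb.
fisa_4body_ranks = [1, 153, 17, 1]
fisa_mean_rank = mean(fisa_4body_ranks)
fisa_median_rank = median(fisa_4body_ranks)

# Per-set 4-Body rank summaries as listed in the table (4state, Fisa, Lat, LMDS).
per_set_mean_rank = [11.3, 43, 133.8, 74.3]
per_set_median_rank = [3, 9, 1, 8.5]

# "Mean(Mean)" and "Mean(Median)" average the four per-set summaries,
# so each decoy family counts equally regardless of how many proteins it has.
mean_of_means = round(mean(per_set_mean_rank), 1)
mean_of_medians = round(mean(per_set_median_rank), 1)

print(fisa_mean_rank, fisa_median_rank, mean_of_means, mean_of_medians)
```

The gap between the mean and the median rank shows how strongly a few badly ranked natives pull the mean.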
17 Dimer “Discrimination”
We could not effectively discriminate the native structure from the decoys with either the 3- or 4-body potential for 3 proteins: 1b0n-B, 1dkt-A, and 1trl-A.
On closer examination, we discovered the native structures were incomplete, leaving exposed residues that would be buried in their native multimeric forms.
18 Average Performance Across Sets
Set     Statistic     4-Body           3bNBD            3-body           4b + 3b*
                        Rank   Z-scr     Rank   Z-scr     Rank   Z-scr     Rank   Z-scr
All     Mean(Mean)      23.5   3.306     98.4   1.610     30.9   2.729   13.911   3.632
All     Mean(Median)     5.5   3.250     49.3   1.614     12.6   2.489      4.4   3.509
* fully buried facets only
19 Conclusion
Buriedness distinctions capture valuable information about protein structure.
The combined 3- + 4-Body potential is the strongest Delaunay potential to date.