Published by Marcus Norris. Modified over 9 years ago.
A MULTIBODY ATOMIC STATISTICAL POTENTIAL FOR PREDICTING ENZYME-INHIBITOR BINDING ENERGY

Majid Masso (mmasso@gmu.edu)
Laboratory for Structural Bioinformatics, School of Systems Biology, George Mason University, 10900 University Blvd. MS 5B3, Manassas, Virginia 20110, USA

I. Abstract
Accurate prediction of enzyme-inhibitor binding energy can speed drug design and chemical genomics efforts by helping to narrow the focus of experiments. Here a non-redundant set of three hundred high-resolution crystallographic enzyme-inhibitor structures was compiled for analysis; these are complexes with known binding energies (ΔG), derived from experimentally determined inhibition constants ($k_i$). Additionally, a separate set of over 1400 diverse high-resolution macromolecular crystal structures was collected to create an all-atom knowledge-based statistical potential by applying the Delaunay tessellation computational geometry technique. Next, two hundred of the enzyme-inhibitor complexes were randomly selected to develop a model for predicting binding energy, first by tessellating the structures of the complexes as well as the enzymes without their bound inhibitors, then by using the statistical potential to calculate a topological score for each structure tessellation. We derived as a predictor of binding energy an empirical linear function of the difference between the topological scores of a complex and its isolated enzyme. A correlation coefficient of r = 0.79 was obtained between the experimental and calculated ΔG values, with a standard error of 2.34 kcal/mol. Lastly, the model was evaluated on the held-out set of one hundred complexes, for which structure tessellations were performed to calculate topological score differences, and binding energy predictions were generated from the derived linear function. Calculated binding energies for the test data also compared well with their experimental counterparts, with a correlation coefficient of r = 0.77 and a standard error of 2.50 kcal/mol.

II. Protein Data Bank (http://www.rcsb.org/pdb)
- PDB: repository of solved (X-ray, NMR, ...) structures
- Each structure file contains atomic 3D coordinate data (atom, X, Y, Z)

III. Macromolecular Modeling
- The native structure is the conformation having the lowest energy
- Physics-based energy calculations using quantum mechanics are computationally impractical
- The same holds for molecular mechanics-based potential energy functions (i.e., force fields):
  E(total) = E(bond) + E(angle) + E(dihedral) + E(electrostatic) + E(van der Waals)
- Alternative (our approach): knowledge-based potentials of mean force (i.e., generated from known protein structures)

IV. Knowledge-Based Potentials of Mean Force
Assumptions:
- At equilibrium, the native state has the global free energy minimum
- Microscopic states (i.e., features) follow a Boltzmann distribution
Examples:
- Well documented in the literature: distance-dependent pairwise interactions at the atomic or amino acid level
- This study: inclusion of higher-order contributions by developing all-atom four-body statistical potentials
Motivation (our prior work):
- Four-body protein potential at the amino acid level

V. Motivational Example: Pairwise Amino Acid Potential
- A 20-letter protein alphabet yields 210 residue pairs
- Obtain a large, diverse PDB dataset of single protein chains
- For each residue pair (i, j), calculate the relative frequency $f_{ij}$ with which the two residues appear within a given distance (e.g., 12 angstroms) of each other in all the protein structures
- Calculate a rate $p_{ij}$ expected by chance alone from a background or reference distribution (more later)
- Apply the inverted Boltzmann principle: $s_{ij} = \log(f_{ij} / p_{ij})$ quantifies interaction propensity and is proportional to the energy of interaction (by a factor of $-RT$)

VI. All-Atom Four-Body Statistical Potential
- Obtain a diverse PDB dataset of 1417 single-chain and multimeric proteins, many complexed to ligands (see XV. References)
- Six-letter atomic alphabet: C, N, O, S, M (metals), X (other)
- Apply Delaunay tessellation to the atomic point coordinates of each PDB file; this objectively identifies all nearest-neighbor quadruplets of atoms in the structure (8 angstrom cutoff)

VII. All-Atom Four-Body Statistical Potential
- A six-letter atomic alphabet yields 126 distinct quadruplets
- For each quad (i, j, k, l), calculate the observed rate of occurrence $f_{ijkl}$ among all tetrahedra from the 1417 structure tessellations
- Compute the rate $p_{ijkl}$ expected by chance from a multinomial reference distribution:
  $$p_{ijkl} = \frac{4!}{\prod_{n=1}^{6} t_n!} \prod_{n=1}^{6} a_n^{t_n}, \qquad \sum_{n=1}^{6} a_n = 1, \qquad \sum_{n=1}^{6} t_n = 4$$
  where $a_n$ is the proportion of atoms from all structures that are of type n, and $t_n$ is the number of occurrences of atom type n in the quad
- Apply the inverted Boltzmann principle: $s_{ijkl} = \log(f_{ijkl} / p_{ijkl})$ quantifies the interaction propensity and is proportional to the energy of atomic quadruplet interaction

VIII. Summary Data for the 1417 Structure Files and their Delaunay Tessellations

IX. All-Atom Four-Body Statistical Potential

X. Topological Score (TS)
- Delaunay tessellation of any macromolecular structure yields an aggregate of tetrahedral simplices
- Each simplex can be assigned a score $s_{ijkl}$ from the all-atom four-body potential, based on the quad present at its four vertices
- Topological score (or 'total potential') of the structure: the sum of the scores of all constituent simplices in the tessellation,
  $$TS = \sum s_{ijkl}$$

XI. Topological Score Difference (ΔTS)

XII. Application of ΔTS: Predicting Enzyme–Inhibitor Binding Energy
- MOAD: repository of experimental inhibition constants ($k_i$) for protein–ligand complexes whose structures are in the PDB
- Collected $k_i$ values for 300 complexes reflecting diverse protein structures
- Obtained the experimental binding energy from $k_i$ via $\Delta G_{exp} = -RT\ln(k_i)$
- Calculated ΔTS for the complexes

XIII. Predicting Enzyme–Inhibitor Binding Energy
- Randomly selected 200 complexes to train a model
- Correlation coefficient r = 0.79 between ΔTS and $\Delta G_{exp}$
- Empirical linear transform of ΔTS to reflect energy values: $\Delta G_{calc} = (1 / 0.0003) \times \Delta TS - 6.24$
- Because the transform is linear, the same r = 0.79 holds between $\Delta G_{calc}$ and $\Delta G_{exp}$
- Standard error SE = 2.34 kcal/mol, and fitted regression line y = 0.98x – 0.41 (y = $\Delta G_{calc}$, x = $\Delta G_{exp}$)

XIV. Predicting Enzyme–Inhibitor Binding Energy
For the test set of the 100 remaining complexes:
- r = 0.77 between $\Delta G_{calc}$ and $\Delta G_{exp}$
- SE = 2.50 kcal/mol
- Fitted regression line is y = 1.07x + 0.46
- All training/test data are available online as a text file (see XV. References)

XV. References and Acknowledgments
- PDB dataset: http://proteins.gmu.edu/automute/tessellatable1417.txt
- Train/test dataset: http://proteins.gmu.edu/automute/MOAD300ki.txt
- PDB (structure DB): http://www.rcsb.org/pdb
- MOAD (ligand binding DB): http://bindingmoad.org/
- Qhull (Delaunay tessellation): http://www.qhull.org/
- UCSF Chimera (ribbon/ball-stick structure visualization): http://www.cgl.ucsf.edu/chimera/
- Matlab (tessellation visualization): http://www.mathworks.com/products/matlab/
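The construction of the four-body potential in section VII can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the toy atom frequencies, and the toy quadruplet list are all hypothetical, and the natural logarithm is an assumption, since the slides write only "log". The sketch enumerates the 126 unordered quadruplets of the six-letter alphabet, computes the multinomial chance rate from the atom-type proportions, and applies the inverted Boltzmann principle to observed quadruplet counts.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement

ALPHABET = "CNOSMX"  # six-letter atomic alphabet: C, N, O, S, M (metals), X (other)


def quad_key(atom_types):
    """Order-independent label for a quadruplet, e.g. ('N', 'C', 'O', 'C') -> 'CCNO'."""
    return "".join(sorted(atom_types))


# Every distinct unordered quadruplet over the alphabet.
ALL_QUADS = sorted({quad_key(q) for q in combinations_with_replacement(ALPHABET, 4)})


def expected_rate(key, atom_freq):
    """Multinomial chance rate: 4! / prod(t_n!) * prod(a_n ** t_n)."""
    counts = Counter(key)  # t_n: occurrences of each atom type in the quad
    denom = math.prod(math.factorial(t) for t in counts.values())
    prob = math.prod(atom_freq[atom] ** t for atom, t in counts.items())
    return math.factorial(4) / denom * prob


def four_body_potential(observed_quads, atom_freq):
    """Score s = ln(f / p) for each quadruplet type observed at least once.

    Assumption: natural log; the source does not state the base.
    """
    tally = Counter(quad_key(q) for q in observed_quads)
    total = sum(tally.values())
    return {
        key: math.log((n / total) / expected_rate(key, atom_freq))
        for key, n in tally.items()
    }


# Toy input (made-up numbers, not data from the 1417 structures).
toy_freq = {"C": 0.5, "N": 0.2, "O": 0.2, "S": 0.05, "M": 0.03, "X": 0.02}
toy_quads = [("C", "C", "N", "O")] * 3 + [("C", "N", "O", "O")]
toy_potential = four_body_potential(toy_quads, toy_freq)
```

Sorting the atom labels inside `quad_key` makes the label independent of vertex order, which is what reduces the 6^4 ordered tuples to 126 types. Quadruplet types that never occur have no finite log score, so the sketch leaves them out of the returned dictionary.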
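The topological score of section X and the linear transform of section XIII can be sketched in the same spirit. The tetrahedra are supplied by hand here; in the source they come from Delaunay tessellation with Qhull. The potential values, atom list, and function names are hypothetical. Two further assumptions are labeled in the code: ΔTS is taken as TS(complex) minus TS(enzyme), an ordering the slides do not state, and quadruplets absent from the potential contribute zero.

```python
def topological_score(atom_types, tetrahedra, potential):
    """Sum the four-body scores of all simplices in one tessellation.

    atom_types: one-letter atom type for each atom index
    tetrahedra: iterable of 4-tuples of atom indices (the simplices)
    potential:  dict mapping a sorted quadruplet label to its score
    """
    total = 0.0
    for simplex in tetrahedra:
        key = "".join(sorted(atom_types[i] for i in simplex))
        # Assumption: a quadruplet missing from the potential scores 0.
        total += potential.get(key, 0.0)
    return total


def predicted_binding_energy(ts_complex, ts_enzyme):
    """Apply the linear transform quoted in section XIII to a score difference.

    Assumption: delta TS = TS(complex) - TS(enzyme).
    """
    delta_ts = ts_complex - ts_enzyme
    return (1 / 0.0003) * delta_ts - 6.24


# Toy example with made-up values: atoms 0-3 belong to the enzyme,
# atom 4 to the inhibitor.
toy_atoms = ["C", "N", "O", "C", "S"]
toy_scores = {"CCNO": 0.0010, "CNOS": 0.0005}
enzyme_simplices = [(0, 1, 2, 3)]
complex_simplices = [(0, 1, 2, 3), (0, 1, 2, 4)]

ts_enzyme = topological_score(toy_atoms, enzyme_simplices, toy_scores)
ts_complex = topological_score(toy_atoms, complex_simplices, toy_scores)
dg_calc = predicted_binding_energy(ts_complex, ts_enzyme)
```

In this toy case the inhibitor atom adds one simplex, so the score difference is simply the score of that extra simplex. In a real tessellation the presence of the inhibitor can also change which enzyme atoms are grouped together, which is why both structures are tessellated separately.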
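The evaluation statistics reported in sections XIII and XIV (correlation coefficient, fitted regression line, standard error) can be reproduced for any pair of value lists with a few lines of Python. This is a sketch on made-up numbers, not the author's data. The slides do not define the standard error, so the residual standard error with n - 2 degrees of freedom used here is an assumption, as are all function and variable names.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def residual_standard_error(x, y, slope, intercept):
    """Assumed definition: sqrt(sum of squared residuals / (n - 2))."""
    ss = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    return math.sqrt(ss / (len(x) - 2))


# Made-up values for illustration only.
toy_dg_exp = [1.0, 2.0, 3.0, 4.0]
toy_delta_ts = [0.0021, 0.0027, 0.0024, 0.0030]
toy_dg_calc = [(1 / 0.0003) * d - 6.24 for d in toy_delta_ts]

r_raw = pearson_r(toy_delta_ts, toy_dg_exp)
r_calc = pearson_r(toy_dg_calc, toy_dg_exp)
slope, intercept = least_squares_line(toy_dg_exp, toy_dg_calc)
se = residual_standard_error(toy_dg_exp, toy_dg_calc, slope, intercept)
```

Here `r_raw` and `r_calc` agree up to rounding, which illustrates the remark in section XIII that a linear transform with positive slope leaves the correlation coefficient unchanged.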