Majid Masso School of Systems Biology, George Mason University


1 An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity
Majid Masso
School of Systems Biology, George Mason University
Manassas, Virginia 20110, USA
CSBW – BIBM 2012, Philadelphia, Pennsylvania

2 Knowledge-Based Potentials of Mean Force
Generated via statistical analysis of observed features in a diverse training set of structures selected from the PDB
Alternative to physics or molecular mechanics energy functions
Assumption: observed features follow a Boltzmann distribution
Examples:
  Well-documented in the literature: distance-dependent pairwise interactions at the atomic or amino acid level
  This study: inclusion of higher-order contributions by developing an all-atom four-body statistical potential
Motivation (our prior work): four-body protein potential at the amino acid level

3 Motivational Example: Pairwise Amino Acid Potential
The 20-letter protein alphabet yields 210 residue pairs
Obtain a diverse PDB training set of single protein chains; represent each protein as a set of amino acid points in 3D
For each residue pair (i, j), calculate the relative frequency f_ij with which the two residues appear within a given distance (e.g., 12 angstroms) of each other across all the protein structures
Calculate a rate p_ij expected by chance alone by using a background or reference distribution (more later…)
Apply the inverted Boltzmann principle: s_ij = log(f_ij / p_ij) quantifies interaction propensity and is proportional to the energy of interaction for the pair (by a factor of ‘–RT’)
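
The pairwise recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not specify the reference distribution or the base of the logarithm, so the sketch assumes a composition-based chance rate and the natural log, and the function name `pairwise_scores` and its input layout are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def pairwise_scores(structures, cutoff=12.0):
    """Return {(type_a, type_b): log(f_ij / p_ij)} for residue pairs in contact.

    `structures` is a list of proteins; each protein is a list of
    (residue_type, (x, y, z)) tuples, one point per amino acid.
    """
    pair_counts = Counter()     # observed contacts per unordered residue pair
    residue_counts = Counter()  # overall composition, used for the chance rate

    for protein in structures:
        for res_type, _ in protein:
            residue_counts[res_type] += 1
        for (type_a, xyz_a), (type_b, xyz_b) in combinations(protein, 2):
            if math.dist(xyz_a, xyz_b) <= cutoff:
                pair_counts[tuple(sorted((type_a, type_b)))] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())

    scores = {}
    for (type_a, type_b), count in pair_counts.items():
        f_ij = count / total_pairs
        a_i = residue_counts[type_a] / total_residues
        a_j = residue_counts[type_b] / total_residues
        # ASSUMPTION: chance rate of an unordered pair drawn from composition.
        p_ij = a_i * a_j if type_a == type_b else 2 * a_i * a_j
        scores[(type_a, type_b)] = math.log(f_ij / p_ij)  # natural log assumed
    return scores
```

Pairs that never occur within the cutoff get no score here; how such pairs are handled is a separate design choice that the slide does not address.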

4 All-Atom Four-Body Statistical Potential
Diverse PDB training set of 1417 single-chain and multimeric proteins, many complexed to ligands (see paper for text file)
Six-letter atomic alphabet: C, N, O, S, M (metals), X (other)
Apply Delaunay tessellation to the atomic point coordinates of each PDB file – objectively identifies all nearest-neighbor quadruplets of atoms in the structure (8 angstrom cutoff)
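
A minimal sketch of the step that follows tessellation: given tetrahedra already produced by a Delaunay routine (for example Qhull), keep those that satisfy the cutoff and label each by its atom-type quad. Two points are assumptions rather than statements from the slides: the 8 angstrom cutoff is interpreted as "drop any tetrahedron with an edge longer than 8 angstroms", and the metal set `ASSUMED_METALS` is an illustrative, incomplete list. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# ASSUMPTION: illustrative subset of metal elements mapped to the letter 'M'.
ASSUMED_METALS = {"NA", "MG", "K", "CA", "MN", "FE", "CO", "NI", "CU", "ZN"}

def classify_atom(element):
    """Map an element symbol onto the six-letter alphabet C, N, O, S, M, X."""
    element = element.upper()
    if element in {"C", "N", "O", "S"}:
        return element
    if element in ASSUMED_METALS:
        return "M"
    return "X"

def quad_counts(elements, coords, tetrahedra, cutoff=8.0):
    """Count atom-type quads over tetrahedra whose six edges are all <= cutoff.

    `elements[i]` and `coords[i]` describe atom i; each tetrahedron is a
    4-tuple of atom indices, as returned by a Delaunay tessellation.
    """
    counts = Counter()
    for tet in tetrahedra:
        edges_ok = all(
            math.dist(coords[a], coords[b]) <= cutoff
            for a, b in combinations(tet, 2)
        )
        if edges_ok:
            # Sort the letters so that vertex order does not matter.
            quad = "".join(sorted(classify_atom(elements[i]) for i in tet))
            counts[quad] += 1
    return counts
```

Sorting the four letters gives each unordered quad a single canonical key, which is what makes the counts comparable across structures.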

5 All-Atom Four-Body Statistical Potential
The six-letter atomic alphabet yields 126 distinct quadruplets
Calculate the observed rate f_ijkl of quad (i, j, k, l) occurrence among all tetrahedra from the 1417 structure tessellations
Compute the rate p_ijkl expected by chance from a multinomial reference distribution, where:
  a_n = proportion of atoms from all structures that are of type n
  t_n = number of occurrences of atom type n in the quad
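
The count of 126 and the chance rate can be checked with a short sketch. It assumes the standard multinomial probability, p_ijkl = (4! / Π t_n!) · Π a_n^(t_n), which matches the definitions of a_n and t_n above but is not written out in this transcript. The atom fractions in the usage lines are made-up illustrative values, not the training-set proportions, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod

ALPHABET = "CNOSMX"

def all_quads(alphabet=ALPHABET):
    """Enumerate every unordered quad (multiset of size 4) over the alphabet."""
    return ["".join(q) for q in combinations_with_replacement(sorted(alphabet), 4)]

def expected_rate(quad, atom_fractions):
    """Multinomial chance rate of a quad given per-type atom fractions a_n."""
    t = Counter(quad)  # t_n: occurrences of each atom type in the quad
    coefficient = factorial(4) // prod(factorial(k) for k in t.values())
    return coefficient * prod(atom_fractions[n] ** k for n, k in t.items())

# Usage with hypothetical atom fractions (they must sum to 1).
example_fractions = {"C": 0.63, "N": 0.17, "O": 0.19, "S": 0.005, "M": 0.001, "X": 0.004}
print(len(all_quads()))                          # number of distinct quads
print(expected_rate("CCNO", example_fractions))  # chance rate for one quad
```

Because the rates come from a multinomial distribution, they sum to 1 over all quads, which is a convenient sanity check on any set of fractions.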

6 Summary Data for the 1417 Structure Files and their Delaunay Tessellations

7 All-Atom Four-Body Statistical Potential

8 Topological Score (TS)
Delaunay tessellation of any macromolecular structure yields an aggregate of tetrahedral simplices
Each simplex can be scored using the all-atom four-body potential based on the quad present at the four vertices (score s_ijkl)
Topological score (or ‘total potential’) of the structure: sum the scores of all constituent simplices in the tessellation
TS = Σ s_ijkl
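
As a sketch, the topological score is a lookup-and-sum over the simplices of one tessellation. The potential values below are invented for illustration only, and `topological_score` is a hypothetical name.

```python
def topological_score(simplex_quads, potential):
    """Sum the four-body scores of all simplices in one tessellation.

    `simplex_quads` holds one 4-letter atom-type sequence per tetrahedron,
    in any vertex order; `potential` maps a sorted quad string to its score.
    """
    return sum(potential["".join(sorted(quad))] for quad in simplex_quads)

# Usage with hypothetical scores for two quads.
toy_potential = {"CCNO": 0.5, "CCCC": -0.2}
print(topological_score(["OCNC", "CCCC", "CNCO"], toy_potential))
```

A quad missing from the lookup raises a `KeyError` in this sketch; a real pipeline would need an explicit policy for quads never observed in the training set.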

9 Topological Score Difference (ΔTS)

10 Application of ΔTS: Predicting Protein – Ligand Binding Energy
MOAD – repository of experimental dissociation constants (kd) for protein–ligand complexes whose structures are in the PDB
Collected kd values for 300 complexes reflecting diverse protein structures
Obtained experimental binding energy from kd via ΔG_exp = –RT ln(kd)
Calculated ΔTS for the complexes
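
The conversion from kd to an energy value can be sketched as follows. The formula and its sign are taken exactly as written on the slide; the temperature (298.15 K) and the use of molar units for kd are assumptions, since neither is stated here. `delta_g_exp` is a hypothetical name.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def delta_g_exp(kd_molar, temperature_k=298.15):
    """Energy value from a dissociation constant via -R*T*ln(kd), as on the slide.

    ASSUMPTIONS: kd is in mol/L and T defaults to 298.15 K.
    """
    return -R_KCAL * temperature_k * math.log(kd_molar)

# Usage: a nanomolar and a micromolar binder.
print(delta_g_exp(1e-9))
print(delta_g_exp(1e-6))
```

Under this sign convention, a smaller kd (tighter binding) maps to a larger positive value.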

11 Predicting Protein – Ligand Binding Energy
Randomly selected 200 complexes to train a model
Correlation coefficient r = 0.79 between ΔTS and ΔG_exp
Empirical linear transformation of ΔTS to reflect energy values: ΔG_calc = L(ΔTS)
Because L is linear, the same r = 0.79 holds between ΔG_calc and ΔG_exp
Also, standard error SE = 1.98 kcal/mol and fitted regression line y = 1.18x (y = ΔG_calc and x = ΔG_exp)
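
The claim that a linear transformation leaves r unchanged can be checked numerically. This sketch assumes L is an ordinary least-squares fit of ΔG_exp on ΔTS, which the slide does not confirm; the data values are invented and the function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical training data: delta-TS values and experimental energies.
delta_ts = [1.0, 2.0, 3.0, 4.0, 5.0]
dg_exp = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(delta_ts, dg_exp)
dg_calc = [slope * t + intercept for t in delta_ts]  # assumed form of L

print(pearson_r(delta_ts, dg_exp))
print(pearson_r(dg_calc, dg_exp))
```

The two printed correlations agree whenever the slope is positive; a negative slope would flip the sign of r but not its magnitude.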

12 Predicting Protein – Ligand Binding Energy
For the test set of 100 remaining complexes:
  r = 0.79 between ΔG_calc and ΔG_exp
  SE = 1.93 kcal/mol
  Fitted regression line is y = 1.11x – 0.63
All training/test data are available online as a text file (see paper)

13 References and Acknowledgments
PDB (structure DB)
MOAD (ligand binding DB)
Qhull (Delaunay tessellation)
UCSF Chimera (ribbon/ball-stick structure visualization)
Matlab (tessellation visualization)

