Majid Masso School of Systems Biology, George Mason University

An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity
Majid Masso
School of Systems Biology, George Mason University, Manassas, Virginia 20110, USA
CSBW – BIBM 2012, Philadelphia, Pennsylvania

Knowledge-Based Potentials of Mean Force
Generated via statistical analysis of observed features in a diverse training set of structures selected from the PDB
Alternative to physics-based or molecular mechanics energy functions
Assumption: observed features follow a Boltzmann distribution
Examples:
  Well documented in the literature: distance-dependent pairwise interactions at the atomic or amino acid level
  This study: inclusion of higher-order contributions by developing an all-atom four-body statistical potential
Motivation (our prior work): four-body protein potential at the amino acid level

Motivational Example: Pairwise Amino Acid Potential
The 20-letter protein alphabet yields 210 unordered residue pairs (190 mixed pairs plus 20 identical pairs)
Obtain a diverse PDB training set of single protein chains; represent each protein as a set of amino acid points in 3D
For each residue pair (i, j), calculate the relative frequency f_ij with which the two residues appear within a given distance (e.g., 12 angstroms) of each other across all the protein structures
Calculate a rate p_ij expected by chance alone by using a background or reference distribution (more later…)
Apply the inverted Boltzmann principle: s_ij = log(f_ij / p_ij) quantifies interaction propensity and is proportional to the energy of interaction for the pair (by a factor of –RT)
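The pairwise recipe above can be sketched in a few lines. This is a minimal illustration, not the author's code: the function name `pair_scores`, the toy two-letter alphabet, and the counts are all made up, the base-10 logarithm is an assumption (the slide only says "log"), and the reference rate p_ij is assumed to be the two-body multinomial form (a_i² for identical pairs, 2·a_i·a_j for mixed pairs).

```python
import math

def pair_scores(pair_counts, type_freqs, base=10.0):
    """Inverted Boltzmann score s_ij = log(f_ij / p_ij) for unordered pairs.

    pair_counts: {(i, j): number of times the pair was observed in contact}
    type_freqs:  {i: overall proportion of residue type i}
    The reference rate p_ij below is an ASSUMED multinomial background.
    """
    total = sum(pair_counts.values())
    scores = {}
    for (i, j), count in pair_counts.items():
        if count == 0:
            continue  # log(0) is undefined; unobserved pairs are skipped here
        f_ij = count / total                      # observed relative frequency
        p_ij = type_freqs[i] * type_freqs[j] * (1 if i == j else 2)
        scores[(i, j)] = math.log(f_ij / p_ij, base)
    return scores

# Toy data (hypothetical): two residue types, 100 observed contacts.
toy_counts = {("A", "A"): 30, ("A", "B"): 50, ("B", "B"): 20}
toy_freqs = {"A": 0.5, "B": 0.5}
toy_scores = pair_scores(toy_counts, toy_freqs)
```

In the toy data the mixed pair occurs exactly as often as chance predicts, so its score is zero; the A-A pair is over-represented (positive score) and the B-B pair is under-represented (negative score).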

All-Atom Four-Body Statistical Potential
Diverse PDB training set of 1417 single-chain and multimeric proteins, many complexed to ligands (see paper for text file)
Six-letter atomic alphabet: C, N, O, S, M (metals), X (other)
Apply Delaunay tessellation to the atomic point coordinates of each PDB file; this objectively identifies all nearest-neighbor quadruplets of atoms in the structure (8 angstrom cutoff)
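As a rough sketch of the two bookkeeping steps on this slide, the snippet below maps element symbols to the six-letter alphabet and discards tetrahedra that violate the cutoff. Several things here are assumptions rather than details from the talk: the `METALS` set is a hypothetical, incomplete list; hydrogens are assumed to have been removed beforehand; the cutoff is interpreted as "drop any tetrahedron with an edge longer than 8 angstroms"; and the tetrahedra are assumed to come from an external tessellator such as Qhull as 4-tuples of atom indices.

```python
import math
from itertools import combinations

# Hypothetical, incomplete metal list; the slides do not enumerate the metals.
# Keys are element symbols (so "CA" means calcium, not an alpha-carbon name).
METALS = {"FE", "ZN", "MG", "CA", "MN", "CU", "NA", "K", "NI", "CO"}

def atom_letter(element):
    """Map an element symbol to the six-letter alphabet C, N, O, S, M, X."""
    el = element.strip().upper()
    if el in {"C", "N", "O", "S"}:
        return el
    if el in METALS:
        return "M"
    return "X"

def within_cutoff(coords, simplices, cutoff=8.0):
    """Keep tetrahedra whose six edges are all no longer than `cutoff`.

    coords:    list of (x, y, z) atomic coordinates in angstroms
    simplices: iterable of 4-tuples of indices into coords
    """
    kept = []
    for simplex in simplices:
        if all(math.dist(coords[a], coords[b]) <= cutoff
               for a, b in combinations(simplex, 2)):
            kept.append(tuple(simplex))
    return kept

# Toy example: the second tetrahedron includes a far-away atom and is dropped.
toy_coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (20, 0, 0)]
toy_simplices = [(0, 1, 2, 3), (1, 2, 3, 4)]
toy_kept = within_cutoff(toy_coords, toy_simplices)
```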

All-Atom Four-Body Statistical Potential
The six-letter atomic alphabet yields 126 distinct quadruplets (unordered, with repetition allowed)
Calculate the observed rate f_ijkl of occurrence of quad (i, j, k, l) among all tetrahedra from the 1417 structure tessellations
Compute the rate p_ijkl expected by chance from a multinomial reference distribution, where
  a_n = proportion of atoms from all structures that are of type n
  t_n = number of occurrences of atom type n in the quad
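The multinomial formula itself is not preserved in this transcript. Assuming the standard multinomial form, p_ijkl = (4! / Π t_n!) · Π a_n^(t_n), with both products taken over the six atom types, the count of 126 quads and the expected rates can be checked as follows. The atom-type proportions are made-up toy numbers, not the values from the 1417 structures.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement

ALPHABET = "CNOSMX"

def expected_rate(quad, proportions):
    """Multinomial chance rate for an unordered quad of atom types.

    ASSUMED form: 4! / prod(t_n!) * prod(a_n ** t_n), where t_n is the
    number of times type n appears in the quad and a_n its proportion.
    """
    coeff = math.factorial(4)
    prob = 1.0
    for letter, t_n in Counter(quad).items():
        coeff //= math.factorial(t_n)       # stays exact for multinomials
        prob *= proportions[letter] ** t_n
    return coeff * prob

# All unordered quadruplets with repetition over the six-letter alphabet.
quads = list(combinations_with_replacement(ALPHABET, 4))

# Hypothetical proportions for illustration only; they sum to 1.
toy_proportions = {"C": 0.60, "N": 0.17, "O": 0.20,
                   "S": 0.01, "M": 0.01, "X": 0.01}
total = sum(expected_rate(q, toy_proportions) for q in quads)
```

Because the quads enumerate every term of the multinomial expansion of (a_C + a_N + a_O + a_S + a_M + a_X)^4, the expected rates sum to 1, which is a convenient sanity check on any implementation. The score for a quad then follows the same inverted Boltzmann pattern as the pairwise case, log(f_ijkl / p_ijkl).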

Summary Data for the 1417 Structure Files and their Delaunay Tessellations

All-Atom Four-Body Statistical Potential

Topological Score (TS)
Delaunay tessellation of any macromolecular structure yields an aggregate of tetrahedral simplices
Each simplex can be scored using the all-atom four-body potential, based on the quad present at its four vertices (score s_ijkl)
Topological score (or ‘total potential’) of the structure: sum the scores of all constituent simplices in the tessellation
TS = Σ s_ijkl
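The summation can be sketched as a lookup over the tessellation. In this illustration the potential values are invented, the function name `topological_score` is hypothetical, and quads absent from the lookup table are assumed to contribute zero, which is a choice made here and not something stated in the slides.

```python
def topological_score(simplex_letters, potential):
    """Sum four-body scores over all tetrahedra of one tessellation.

    simplex_letters: iterable of 4-tuples of atom-type letters, one per simplex
    potential:       {sorted 4-tuple of letters: score s_ijkl}
    Vertex order is irrelevant, so each quad is sorted into a canonical key.
    """
    total = 0.0
    for letters in simplex_letters:
        key = tuple(sorted(letters))
        total += potential.get(key, 0.0)  # ASSUMPTION: unseen quads score 0
    return total

# Toy potential and tessellation (values are made up for illustration).
toy_potential = {("C", "C", "N", "O"): 0.25, ("C", "C", "C", "C"): -0.5}
toy_simplices = [("O", "C", "N", "C"), ("C", "C", "C", "C"), ("C", "N", "C", "O")]
toy_ts = topological_score(toy_simplices, toy_potential)
```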

Topological Score Difference (ΔTS)

Application of ΔTS: Predicting Protein–Ligand Binding Energy
MOAD: repository of experimental dissociation constants (kd) for protein–ligand complexes whose structures are in the PDB
Collected kd values for 300 complexes reflecting diverse protein structures
Obtained experimental binding energy from kd via ΔGexp = –RT ln(kd)
Calculated ΔTS for the complexes
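The conversion from kd to energy is a one-line calculation. The sketch below follows the sign convention written on the slide, under which tighter binders (smaller kd) receive larger positive values. The temperature of 298.15 K and the assumption that kd is expressed in molar units are illustrative choices; the slides do not state either.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def dg_exp(kd_molar, temperature_k=298.15):
    """Experimental binding energy, slide convention: dG = -R * T * ln(kd).

    kd_molar:      dissociation constant in mol/L (ASSUMED unit)
    temperature_k: absolute temperature in kelvin (ASSUMED 298.15 K)
    """
    if kd_molar <= 0:
        raise ValueError("kd must be positive")
    return -R_KCAL * temperature_k * math.log(kd_molar)

nanomolar = dg_exp(1e-9)   # a tight binder
micromolar = dg_exp(1e-6)  # a weaker binder
```

At this temperature each tenfold change in kd shifts the value by roughly 1.36 kcal/mol, so a nanomolar and a micromolar binder differ by about 4.1 kcal/mol.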

Predicting Protein–Ligand Binding Energy
Randomly selected 200 complexes to train a model
Correlation coefficient r = 0.79 between ΔTS and ΔGexp
Empirical linear transformation of ΔTS to reflect energy values: ΔGcalc = L(ΔTS)
Because L is linear, the same r = 0.79 holds between ΔGcalc and ΔGexp
Standard error SE = 1.98 kcal/mol; fitted regression line y = 1.18x (y = ΔGcalc, x = ΔGexp)
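The claim that a linear transformation leaves r unchanged can be checked numerically. The data and the coefficients of the transformation below are invented for illustration; they are not the study's values. Note that the invariance holds for a positive slope, while a negative slope flips the sign of r.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up numbers standing in for topological score differences and energies.
delta_ts = [1.0, 2.5, 3.0, 4.2, 5.1]
dg_measured = [2.0, 3.1, 5.0, 4.8, 7.2]

# Hypothetical linear map L(x) = slope * x + intercept with positive slope.
slope, intercept = 1.5, -0.3
dg_calc = [slope * x + intercept for x in delta_ts]

r_raw = pearson_r(delta_ts, dg_measured)
r_transformed = pearson_r(dg_calc, dg_measured)
r_flipped = pearson_r([-x for x in delta_ts], dg_measured)
```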

Predicting Protein–Ligand Binding Energy
For the test set of the 100 remaining complexes:
  r = 0.79 between ΔGcalc and ΔGexp
  SE = 1.93 kcal/mol
  Fitted regression line is y = 1.11x – 0.63
All training/test data are available online as a text file (see paper)

References and Acknowledgments
PDB (structure DB): http://www.rcsb.org/pdb
MOAD (ligand binding DB): http://bindingmoad.org/
Qhull (Delaunay tessellation): http://www.qhull.org/
UCSF Chimera (ribbon/ball-stick structure visualization): http://www.cgl.ucsf.edu/chimera/
Matlab (tessellation visualization): http://www.mathworks.com/products/matlab/