Development of Novel Geometrical Chemical Descriptors and Their Application to the Prediction of Ligand-Protein Binding Affinity Shuxing Zhang, Alexander.

Presentation transcript:

Development of Novel Geometrical Chemical Descriptors and Their Application to the Prediction of Ligand-Protein Binding Affinity Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling School of Pharmacy University of North Carolina at Chapel Hill October 14, 2015

Problem: Given a protein-ligand complex, predict the ligand binding affinity.

Knowledge-based (Statistical) Potentials
Two-body potentials:
PMF: Muegge, I.; Martin, Y.C.; J. Med. Chem. 1999, 42,
BLEEP: Mitchell, J.B.; Laskowski, R.A.; Alex, A.; Thornton, J.M.; J. Comp. Chem. 1999, 20,
DrugScore: Gohlke, H.; Hendlich, M.; Klebe, G.; J. Mol. Biol. 2000, 295,
SMoG: DeWitte, R.S.; Shakhnovich, E.I.; J. Am. Chem. Soc. 1996, 118,
SMoG2001: Ishchenko, A.V.; Shakhnovich, E.I.; J. Med. Chem. 2002, 45,
Four-body contact potential (by Jun Feng)

Full Atom-based Delaunay tessellation of Protein-ligand Interface (5HVP)

Three Types of Tetrahedra at the Protein-ligand Interface
RRRL: formed by 3 receptor atoms and 1 ligand atom
RRLL: formed by 2 receptor atoms and 2 ligand atoms
RLLL: formed by 1 receptor atom and 3 ligand atoms
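The three classes above depend only on how many vertices of a tetrahedron come from the receptor and how many from the ligand. A minimal Python sketch of that bookkeeping follows; the function name `tetrahedron_class` and the 'R'/'L' labels are illustrative assumptions, not part of the original slides.

```python
from collections import Counter

def tetrahedron_class(vertex_origins):
    """Return 'RRRL', 'RRLL' or 'RLLL' for an interfacial tetrahedron, else None.

    vertex_origins: four labels, each 'R' (receptor atom) or 'L' (ligand atom).
    """
    if len(vertex_origins) != 4:
        raise ValueError("a tetrahedron has exactly four vertices")
    n_receptor = sum(1 for origin in vertex_origins if origin == "R")
    n_ligand = sum(1 for origin in vertex_origins if origin == "L")
    if n_receptor + n_ligand != 4:
        raise ValueError("labels must be 'R' or 'L'")
    if n_receptor == 0 or n_ligand == 0:
        return None  # all-receptor or all-ligand: not part of the interface
    return "R" * n_receptor + "L" * n_ligand

# Hypothetical tessellation output: one tuple of vertex origins per tetrahedron.
tetrahedra = [
    ("R", "R", "L", "R"),
    ("L", "L", "R", "R"),
    ("L", "L", "L", "R"),
    ("R", "R", "R", "R"),  # buried in the receptor, ignored
]
classes = [tetrahedron_class(t) for t in tetrahedra]
class_counts = Counter(c for c in classes if c is not None)
```

Tetrahedra whose four vertices all belong to one molecule are dropped, since only mixed tetrahedra describe the interface.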

Earlier work: Four-Body Statistical Contact Scoring Function Based on Delaunay Tessellation

Correlation between experimental and calculated binding free energy for PMF dataset using four-body scoring function

Comparison of Current Scoring Functions (table columns: training set size, test set size, test set R²; rows: BLEEP, PMF, SMoG, SMoG, DT, DT)

Multiple CG descriptors of the protein-ligand interface and their correlation with ligand affinity
1. Define the ligand-receptor interface by means of Delaunay tessellation (DT).
2. Calculate chemical descriptors for nearest-neighbor atom quadruplets.
3. Use a statistical data modeling approach to correlate the descriptors with affinity.

Descriptors derived from atomic electronegativity
µ: electronegativity (chemical potential) of atoms
Q: partial charges on atoms
η: hardness kernel

Atom Type Definition Based on EN Values
Ligand atom types:
O: EN = 3.4
N: EN = 3.0
C: EN = 2.5
S: EN = 2.4
X (P and halogens): EN = 2.0 ~ 2.4, 4.0
M (metals and all other unexpected atom types): EN = 0.6 ~ 1.6
Receptor atom types:
O: EN = 3.4
N: EN = 3.0
C: EN = 2.5
S: EN = 2.4
There are 554 possible interfacial quadruplet composition types. After processing 517 complexes, 100 of them are found to occur with high frequency (at least 50 times).
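The figure of 554 composition types can be checked by enumeration. The sketch below assumes that a composition type is an unordered set (with repeats) of receptor atom types paired with an unordered set of ligand atom types, and that the six ligand types are O, N, C, S, X and M while the four receptor types are O, N, C and S. Under those assumptions the count comes out to 554.

```python
from itertools import combinations_with_replacement

RECEPTOR_TYPES = ["O", "N", "C", "S"]
LIGAND_TYPES = ["O", "N", "C", "S", "X", "M"]

def composition_types():
    """Enumerate (receptor atoms, ligand atoms) compositions for RRRL, RRLL, RLLL."""
    types = []
    for n_receptor in (3, 2, 1):
        n_ligand = 4 - n_receptor
        for rec in combinations_with_replacement(RECEPTOR_TYPES, n_receptor):
            for lig in combinations_with_replacement(LIGAND_TYPES, n_ligand):
                types.append((rec, lig))
    return types

ALL_TYPES = composition_types()
print(len(ALL_TYPES))  # 554 = 120 (RRRL) + 210 (RRLL) + 224 (RLLL)
```

The breakdown is 20 x 6 = 120 for RRRL, 10 x 21 = 210 for RRLL and 4 x 56 = 224 for RLLL.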

Descriptor Calculation
m: the m-th tetrahedral composition type
j: vertex of a tetrahedron
n: number of tetrahedra of the m-th composition type
Example tetrahedron: S_L C_R O_L N_R (subscript L = ligand atom, R = receptor atom)
Thus, there are 100 descriptors for each protein-ligand complex.
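The descriptor formula itself did not survive in this transcript. The Python sketch below is therefore only one plausible reading of the symbols m, j and n: it assumes that the descriptor for composition type m is the sum of the EN values over the four vertices j of all n tetrahedra of that type. The function names and the data layout are hypothetical.

```python
from collections import defaultdict

# EN values from the atom type slide (types with a single stated value only).
EN = {"O": 3.4, "N": 3.0, "C": 2.5, "S": 2.4}

def composition_key(tetrahedron):
    """tetrahedron: four (atom_type, origin) pairs, origin 'R' or 'L'."""
    receptor = tuple(sorted(t for t, origin in tetrahedron if origin == "R"))
    ligand = tuple(sorted(t for t, origin in tetrahedron if origin == "L"))
    return receptor, ligand

def en_descriptors(tetrahedra):
    """ASSUMED formula: sum of vertex EN values over all tetrahedra of each type."""
    totals = defaultdict(float)
    for tet in tetrahedra:
        totals[composition_key(tet)] += sum(EN[t] for t, _ in tet)
    return dict(totals)

# Two occurrences of the S_L C_R O_L N_R tetrahedron from the slide.
example = [("S", "L"), ("C", "R"), ("O", "L"), ("N", "R")]
descriptors = en_descriptors([example, example])
```

With this reading, each complex yields one number per composition type, so restricting the keys to the 100 frequent types gives the 100-element descriptor vector mentioned above.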

Flowchart of Novel Descriptor Generation
Input: 264 complexes
1. Process the files and assign atom types based on EN values.
2. Define the interaction interface with DT and record all interfacial tetrahedra.
3. Classify the interfacial tetrahedra into different composition types and calculate their EN values (descriptors).
4. Correlate with binding.

Data Modeling
Structure        Binding      CG Descriptors
Comp. 1          Value 1      D1  D2  D3  D4
Comp. 2          Value 2      "   "   "   "
Comp. 3          Value 3      "   "   "   "
Comp. N (264)    Value 264    "   "   "   "
Goal: Establish correlations between the descriptors and the binding affinity that are capable of predicting the binding of novel complexes.
{Binding affinity} = K{descriptor diversity}

Diversity of the dataset: 264 Complexes, 33 families

Data Modeling Workflow
1. Start with 264 complexes.
2. Randomly exclude 24 complexes as an external set.
3. Split the remaining 240 into multiple training sets and multiple test sets.
4. Use kNN with variable selection to build models.
5. Only accept models that have q² > 0.6, R² > 0.6, etc.
6. Apply Y-randomization.
7. Validate the predictive models with the randomly selected external sets (24).
8. Binding prediction.
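The data partitioning and the Y-randomization check in this workflow can be sketched in a few lines of Python. This is only an illustration: the purely random training/test split, the `test_fraction` value and both function names are assumptions, and the slides do not say how the 240 complexes were actually divided.

```python
import random

def split_complexes(complex_ids, n_external=24, test_fraction=0.2, seed=0):
    """Hold out an external set, then split the rest into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(complex_ids)
    rng.shuffle(shuffled)
    external = shuffled[:n_external]
    modeling = shuffled[n_external:]
    n_test = round(len(modeling) * test_fraction)  # assumed fraction
    test = modeling[:n_test]
    training = modeling[n_test:]
    return training, test, external

def y_randomize(affinities, seed=0):
    """Shuffle the affinities so they no longer match their descriptors."""
    rng = random.Random(seed)
    shuffled = list(affinities)
    rng.shuffle(shuffled)
    return shuffled

training, test, external = split_complexes(range(1, 265))
```

Calling `split_complexes` with different seeds gives the multiple training and test sets of step 3. A model rebuilt on `y_randomize` output should score much worse than the real one; if it does not, the original correlation is likely due to chance.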

k Nearest Neighbor (kNN) with Variable Selection
1. Randomly select a subset of descriptors (a hypothetical descriptor pharmacophore).
2. Leave out one complex from the training set and calculate the distance between the eliminated complex and all remaining complexes (in the original 100-descriptor space).
3. Find the k nearest neighbors in the training set.
4. Predict the binding affinity of the eliminated complex by weighted kNN using the identified k nearest neighbors.
5. Repeat steps 2 to 4 N times, once for each complex.
6. Calculate the predictive ability (q²) of the model.
7. Use simulated annealing (SA) to select a new descriptor subset and repeat.
8. Select the acceptable models (those with q² > 0.6).
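The leave-one-out part of this procedure can be sketched as follows. It is a minimal Python illustration that scores a single descriptor subset and leaves out the simulated annealing search. The inverse-distance weighting, the Euclidean distance over the columns passed in `columns`, and the function names are assumptions, since the slides do not give these details.

```python
import math

def weighted_knn_predict(query, others_x, others_y, k, columns):
    """Predict affinity as an inverse-distance weighted mean of the k nearest neighbors."""
    distances = []
    for x, y in zip(others_x, others_y):
        d = math.sqrt(sum((query[c] - x[c]) ** 2 for c in columns))
        distances.append((d, y))
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]  # assumed weighting scheme
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def loo_q2(X, y, k, columns):
    """Leave-one-out cross-validated q2 for one descriptor subset."""
    predictions = []
    for i in range(len(X)):
        rest_x = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        predictions.append(weighted_knn_predict(X[i], rest_x, rest_y, k, columns))
    mean_y = sum(y) / len(y)
    press = sum((obs - pred) ** 2 for obs, pred in zip(y, predictions))
    total = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - press / total

# Toy data: column 0 tracks the affinity, column 1 is unrelated noise.
noise = [5.0, 1.0, 9.0, 3.0, 7.0, 0.0, 8.0, 2.0, 6.0, 4.0]
X = [[float(i), noise[i]] for i in range(10)]
y = [2.0 * i for i in range(10)]
q2_informative = loo_q2(X, y, k=2, columns=[0])
q2_noise = loo_q2(X, y, k=2, columns=[1])
```

On the toy data the informative column passes the q² > 0.6 threshold and the noise column does not, which is the signal a variable selection search would use to prefer one subset over another.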

Correlation of Actual ~ Predicted Binding Affinity for 49 Test Set Complexes

Correlation of Actual ~ Predicted Binding Affinity for 24 Complexes with Best Model

Comparison of Current Scoring Functions (table columns: training set size, test set size, test set R²; rows: BLEEP, PMF, SMoG, SMoG, DT, DT, CG)

Conclusions
Novel geometrical chemical descriptors have been developed.
These simple yet fundamental descriptors can be used to predict binding affinity using correlation approaches, and they have high predictive power for diverse ligand-protein structures.
The statistical models can be used for fast and accurate scoring of complexes resulting from docking studies.