Published by Maria Watson. Modified over 9 years ago.
1
Development of Novel Geometrical Chemical Descriptors and Their Application to the Prediction of Ligand-Protein Binding Affinity
Shuxing Zhang, Alexander Golbraikh and Alex Tropsha
The Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina at Chapel Hill
October 14, 2015
2
Problem
Given a protein-ligand complex, predict ligand binding affinity.
3
Knowledge-based (Statistical) Potentials
Two-body potentials:
- PMF: Muegge, I.; Martin, Y. C. J. Med. Chem. 1999, 42, 791-804
- BLEEP: Mitchell, J. B.; Laskowski, R. A.; Alex, A.; Thornton, J. M. J. Comp. Chem. 1999, 20, 1165-1176
- DrugScore: Gohlke, H.; Hendlich, M.; Klebe, G. J. Mol. Biol. 2000, 295, 337-356
- SMoG: DeWitte, R. S.; Shakhnovich, E. I. J. Am. Chem. Soc. 1996, 118, 11733-11744
- SMoG2001: Ishchenko, A. V.; Shakhnovich, E. I. J. Med. Chem. 2002, 45, 2770-2780
Four-body contact potential (by Jun Feng)
4
Full Atom-based Delaunay tessellation of Protein-ligand Interface (5HVP)
5
Three Types of Tetrahedra at Protein-ligand Interface
RRRL: Formed by 3 receptor atoms and 1 ligand atom
RRLL: Formed by 2 receptor atoms and 2 ligand atoms
RLLL: Formed by 1 receptor atom and 3 ligand atoms
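The three interfacial classes can be illustrated with a minimal sketch. It assumes the tessellation has already been computed elsewhere (for example, the `simplices` array of `scipy.spatial.Delaunay`) and that each tetrahedron is given as four atom indices. The function name, the toy indices, and the ligand index set are hypothetical, not taken from the slides.

```python
from collections import Counter

def classify_tetrahedron(vertices, ligand_atoms):
    """Return 'RRRL', 'RRLL' or 'RLLL' for an interfacial tetrahedron.

    Tetrahedra made only of receptor atoms or only of ligand atoms are not
    part of the interface, so they yield None.
    """
    n_ligand = sum(1 for v in vertices if v in ligand_atoms)
    return {1: "RRRL", 2: "RRLL", 3: "RLLL"}.get(n_ligand)

# Hypothetical toy input: atoms 0-3 belong to the receptor, 4-7 to the ligand.
ligand_atoms = {4, 5, 6, 7}
tetrahedra = [(0, 1, 2, 5), (0, 1, 5, 6), (2, 5, 6, 7), (0, 1, 2, 3), (4, 5, 6, 7)]

counts = Counter(classify_tetrahedron(t, ligand_atoms) for t in tetrahedra)
print(counts)
```

Only the tetrahedra that mix receptor and ligand vertices are kept; the `None` bucket collects everything that lies entirely on one side.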
6
Earlier work: Four-Body Statistical Contact Scoring Function Based on Delaunay Tessellation
7
Correlation between experimental and calculated binding free energy for PMF dataset using four-body scoring function
8
Comparison of Current Scoring Functions
Method      Training set size   Test set size   Test set R²
BLEEP       351                 90              0.53
PMF         697                 77              0.61
SMoG96      120                 46              0.42
SMoG2001    725                 111             0.436
DT2001      319                 67              0.71
DT2002      319                 107             0.54
9
Multiple CG descriptors of protein-ligand interface and correlation with ligand affinity
- Define the ligand-receptor interface by means of Delaunay tessellation (DT).
- Calculate chemical descriptors for nearest-neighbor atom quadruplets.
- Use a statistical data modeling approach to correlate descriptors and affinity.
10
Descriptors derived from atomic electronegativity
µ: Electronegativity (chemical potential) of atoms
Q: Partial charges on atoms
η: Hardness kernel
11
Atom Type Definition based on EN values
Ligand atom types:
- O: EN = 3.4
- N: EN = 3.0
- C: EN = 2.5
- S: EN = 2.4
- X (P and halogens): EN = 2.0 ~ 2.4, 4.0
- M (metals and all other unexpected atom types): EN = 0.6 ~ 1.6
Receptor atom types:
- O: EN = 3.4
- N: EN = 3.0
- C: EN = 2.5
- S: EN = 2.4
There are 554 possible interfacial quadruplet composition types. After processing 517 complexes, 100 were found to occur with high frequency (at least 50 times).
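The figure of 554 composition types can be checked by counting. The sketch below assumes a composition type is an unordered choice of receptor atom types combined with an unordered choice of ligand atom types, for each of the three interfacial classes (RRRL, RRLL, RLLL). The function and constant names are illustrative only.

```python
from itertools import combinations_with_replacement

RECEPTOR_TYPES = ["O", "N", "C", "S"]           # 4 receptor atom types
LIGAND_TYPES = ["O", "N", "C", "S", "X", "M"]   # 6 ligand atom types

def count_composition_types():
    """Count unordered quadruplet compositions per interfacial class."""
    per_class = {}
    for n_receptor in (3, 2, 1):
        n_ligand = 4 - n_receptor
        # Multisets of receptor types times multisets of ligand types.
        n_r = sum(1 for _ in combinations_with_replacement(RECEPTOR_TYPES, n_receptor))
        n_l = sum(1 for _ in combinations_with_replacement(LIGAND_TYPES, n_ligand))
        per_class["R" * n_receptor + "L" * n_ligand] = n_r * n_l
    return per_class

per_class = count_composition_types()
print(per_class, sum(per_class.values()))
```

Under this assumption the classes contribute 120 (RRRL), 210 (RRLL) and 224 (RLLL) types, which sum to 554.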
12
Descriptor Calculation
m: m-th tetrahedral composition type
j: Vertex of a tetrahedron
n: Number of tetrahedra of the m-th composition type
[Figure: example tetrahedron with vertices S_L (EN 2.4), C_R (EN 2.5), O_L (EN 3.4), N_R (EN 3.0)]
Thus, there are 100 descriptors for each protein-ligand complex.
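The descriptor formula itself did not survive in this transcript. As a rough sketch only, the code below assumes the descriptor for composition type m is the sum of the vertex EN values (index j) over all n tetrahedra of that type. This reading is inferred from the symbol definitions and is not confirmed by the slides. All names are hypothetical, and the X and M types are omitted because their EN values are given only as ranges.

```python
from collections import defaultdict

# EN values for the atom types with a single listed value.
EN = {"O": 3.4, "N": 3.0, "C": 2.5, "S": 2.4}

def composition_key(tetrahedron):
    """Order-independent label, e.g. ('C_R', 'N_R', 'O_L', 'S_L')."""
    return tuple(sorted(f"{atom}_{origin}" for atom, origin in tetrahedron))

def en_descriptors(tetrahedra):
    """ASSUMPTION: descriptor = sum of vertex EN over all tetrahedra of a type."""
    values = defaultdict(float)
    for tetrahedron in tetrahedra:
        values[composition_key(tetrahedron)] += sum(EN[atom] for atom, _ in tetrahedron)
    return dict(values)

# Two copies of the example tetrahedron from the slide (vertex order differs).
example = [
    [("S", "L"), ("C", "R"), ("O", "L"), ("N", "R")],
    [("N", "R"), ("O", "L"), ("C", "R"), ("S", "L")],
]
descriptors = en_descriptors(example)
print(descriptors)
```

Sorting the vertex labels makes the key independent of vertex order, so both tetrahedra fall into the same composition type.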
13
Flowchart of Novel Descriptor Generation
1. Start with 264 complexes.
2. Process files and assign atom types based on EN value.
3. Define the interaction interface with DT and record all interfacial tetrahedra.
4. Classify interfacial tetrahedra into different composition types and calculate their EN values (descriptors).
5. Correlate with binding.
14
Data Modeling
Structure     Binding      CG Descriptors
Comp. 1       Value 1      D1  D2  D3  D4
Comp. 2       Value 2      "   "   "   "
Comp. 3       Value 3      "   "   "   "
...           ...          ...
Comp. N (264) Value 264    "   "   "   "
Goal: Establish correlations between descriptors and the binding affinity capable of predicting binding of novel complexes.
{Binding affinity} = K̂{descriptor diversity}
15
Diversity of the dataset: 264 Complexes, 33 families
16
Data Modeling Workflow
1. Start with 264 complexes.
2. Randomly exclude 24 complexes as an external set.
3. Split the remaining 240 into multiple training and test sets.
4. Build models with variable selection kNN.
5. Only accept models that have q² > 0.6, R² > 0.6, etc.
6. Apply Y-randomization.
7. Validate predictive models with the randomly selected external set (24).
8. Binding prediction.
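The partitioning step of this workflow can be sketched as follows. The sketch assumes a purely random split, whereas the slides mention multiple training and test sets without saying how they were chosen. The 191/49 sizes come from the CG row of the comparison table. The function name, the `seed` parameter, and the integer IDs are hypothetical.

```python
import random

def split_dataset(ids, n_external=24, n_test=49, seed=0):
    """Randomly hold out an external set, then split the rest into training/test."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    external = shuffled[:n_external]
    test = shuffled[n_external:n_external + n_test]
    training = shuffled[n_external + n_test:]
    return training, test, external

# Placeholder IDs standing in for the 264 complexes.
training, test, external = split_dataset(range(264))
print(len(training), len(test), len(external))
```

Calling the function with different `seed` values gives different training and test partitions, which is one simple way to obtain multiple splits.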
17
k Nearest Neighbor (kNN) with Variable Selection
1. Randomly select a subset of descriptors (a hypothetical descriptor pharmacophore).
2. Leave out one complex from the training set and calculate the distance between the eliminated complex and all remaining complexes (in the original 100-descriptor space).
3. Find the k nearest neighbors of the eliminated complex in the training set.
4. Predict the binding affinity of the eliminated complex by weighted kNN using the identified k nearest neighbors.
5. Repeat steps 2 to 4 N times and calculate the predictive ability (q²) of the model.
6. Repeat from step 1 under SA (simulated annealing) and select acceptable models (with q² > 0.6).
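The leave-one-out loop above can be sketched in a few lines. The sketch makes three assumptions that the slides do not state: Euclidean distance over the selected descriptors, inverse-distance weights for the weighted kNN prediction, and q² defined as 1 - PRESS/SS. It leaves out the simulated-annealing search over descriptor subsets. All names and the toy data are hypothetical.

```python
import math

def distance(a, b, selected):
    """Euclidean distance restricted to the selected descriptor indices."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in selected))

def loo_knn_predictions(X, y, selected, k):
    """Leave each complex out in turn and predict it from its k nearest neighbors."""
    predictions = []
    for i in range(len(X)):
        others = [(distance(X[i], X[j], selected), y[j])
                  for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda pair: pair[0])[:k]
        # ASSUMPTION: inverse-distance weighting; the epsilon avoids division by zero.
        weights = [1.0 / (d + 1e-9) for d, _ in nearest]
        weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
        predictions.append(weighted_sum / sum(weights))
    return predictions

def q2(y, predictions):
    """Cross-validated q2 = 1 - PRESS / total sum of squares."""
    mean_y = sum(y) / len(y)
    press = sum((obs - pred) ** 2 for obs, pred in zip(y, predictions))
    total = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - press / total

# Toy data: descriptor 0 tracks the affinity, descriptor 1 is noise.
X = [[0, 5], [1, 9], [2, 1], [3, 7], [4, 3]]
y = [0.0, 1.0, 2.0, 3.0, 4.0]
preds = loo_knn_predictions(X, y, selected=[0], k=2)
print(round(q2(y, preds), 3))
```

In a variable-selection run, `selected` would be perturbed repeatedly and the subset kept or rejected according to the resulting q².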
18
Correlation of Actual vs. Predicted Binding Affinity for 49 Test Set Complexes
19
Correlation of Actual vs. Predicted Binding Affinity for 24 Complexes with Best Model
20
Comparison of Current Scoring Functions
Method      Training set size   Test set size   Test set R²
BLEEP       351                 90              0.53
PMF         697                 77              0.61
SMoG96      120                 46              0.42
SMoG2001    725                 111             0.436
DT2001      319                 67              0.71
DT2002      319                 107             0.54
CG          191                 49              0.78
21
Conclusions
- Novel geometrical chemical descriptors have been developed.
- These simple yet fundamental descriptors can be used to predict binding affinity using correlation approaches, and they have high prediction power for diverse ligand-protein structures.
- The statistical models can be used for fast and accurate scoring of complexes resulting from docking studies.