Developing & Benchmarking Large-scale Docking (LSD) Pipeline Niu Huang, 02/17/2004.



LSD pipeline
- Binding Site Refinement (PLOP/Modeller)
- LigBase
- Model Building (ModBase/PDB)
- Ligand Docking (DOCK)
- Post-docking Refinement (PLOP)
- Central Database System

Where are we now?
- Applications (CK: enolase; Jenni: malaria related; Chris: sporulation; …)
- LSD modules (testing & debugging, benchmarking)
- Detailed investigation (enrichment, binding properties, performance, …)

Docking pipeline (diagram labels): Target Protein, SPHGEN, DATABASE, GRID, SCORING, DATA ANALYSIS

Test case (from J. Med. Chem., McGovern & Shoichet, 2003)
DHFR, GART, TS, Thrombin, PNP, SAHH, AChE, AR, PARP

Expert vs automated docking
- Enrichment plots comparing the performance of an expert (dark blue), the automated procedure (magenta, referred to as Test10), and random enrichment (black).
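The enrichment plots described above can be sketched in a few lines. This is a minimal illustration, not the pipeline's own analysis code: the function name `enrichment_curve` and the toy ranking are assumptions made for the example. It takes a score-ranked compound list with known ligands flagged and returns the curve points; the random-enrichment reference is simply the diagonal.

```python
def enrichment_curve(ranked_is_ligand):
    """Return (pct_db_screened, pct_ligands_found) points for a ranked list.

    ranked_is_ligand: booleans ordered best score first; True marks a known ligand.
    (Hypothetical helper, written for illustration only.)
    """
    n_total = len(ranked_is_ligand)
    n_ligands = sum(ranked_is_ligand)
    if n_total == 0 or n_ligands == 0:
        raise ValueError("need a non-empty ranking with at least one known ligand")
    points = [(0.0, 0.0)]
    found = 0
    for rank, is_ligand in enumerate(ranked_is_ligand, start=1):
        found += int(is_ligand)
        points.append((100.0 * rank / n_total, 100.0 * found / n_ligands))
    return points


# Toy ranking of 10 compounds with 2 known ligands (at ranks 1 and 4).
ranking = [True, False, False, True, False, False, False, False, False, False]
curve = enrichment_curve(ranking)
for pct_db, pct_found in curve:
    # Random selection finds ligands in proportion to the fraction screened.
    print(f"{pct_db:5.1f}% of db -> {pct_found:5.1f}% found (random: {pct_db:5.1f}%)")
```

A curve that rises well above the diagonal early on corresponds to good enrichment; a curve that hugs the diagonal is no better than random selection.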

Approach to “expert docking” limit?
Table: % of db to find 25% of known ligands, by enzyme/target
Columns: Susan (expert docking); John (auto); Niu (best); Niu (Test10)
Rows: AChE; Adometc (11, N/A); AR; DHFR; GART; L99A (N/A); PARP; PNP; SAHH; Thrombin; TS
Note: Missing atoms

Case analysis (DHFR)

DHFR cont. 1
Columns: docking parameters; enrichment plots; docking statistics (CPU 2.4 GHz)
- Test1. Parameters: INDOCK.3; .useligsph = on; 70 matching spheres. Enrichment: % of db to find 25% of known ligands = 31; max. enrichment factor < 5. Statistics: ~300 hrs, cmpds scored.
- Test?. Parameters: INDOCK.3; .useligsph = on; 50 matching spheres. Enrichment: % of db to find 25% of known ligands = 9.9; max. enrichment factor = 8.2. Statistics: ~31 hrs, cmpds scored.
- Test?. Parameters: INDOCK.2; .useligsph = on; 50 matching spheres. Enrichment: % of db to find 25% of known ligands = 47; max. enrichment factor < 5. Statistics: ~16 hrs, cmpds scored.
- Test9. Parameters: INDOCK.3; .usefragsph = on; 50 matching spheres. Enrichment: % of db to find 25% of known ligands = 4.3; max. enrichment factor = 9.2. Statistics: ~50 hrs, cmpds scored.
- Test10. Parameters: INDOCK.3; .usefragsph = on; 35 matching spheres. Enrichment (as extracted): * 2.9** 0.3*** (2.0); * 43** 128*** (29). Statistics: ~15 hrs, cmpds scored (~7.5 hrs, cmpds from Susan).
- Test11. Parameters: INDOCK.1; .usefragsph = on; 35 matching spheres. Enrichment: % of db to find 25% of known ligands = 3.5; max. enrichment factor = 76. Statistics: ~2 hrs, cmpds scored.
Footnotes: * without cofactor; ** without HOH; *** without HIP28; (*) from Susan.
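The two summary metrics quoted for each test can be computed from a ranked hit list as sketched below. The slides do not define the metrics precisely, so this rests on assumptions: the enrichment factor at rank k is taken as the ligand hit rate in the top k divided by the hit rate in the whole database, and the maximum is taken over all ranks. The function names and toy data are hypothetical.

```python
import math


def pct_db_to_find(ranked_is_ligand, fraction=0.25):
    """Percent of the ranked database screened when `fraction` of ligands is found."""
    n_total = len(ranked_is_ligand)
    n_ligands = sum(ranked_is_ligand)
    if n_ligands == 0:
        raise ValueError("ranking contains no known ligands")
    needed = max(1, math.ceil(fraction * n_ligands))
    found = 0
    for rank, is_ligand in enumerate(ranked_is_ligand, start=1):
        found += int(is_ligand)
        if found >= needed:
            return 100.0 * rank / n_total
    raise AssertionError("unreachable: all ligands are in the ranking")


def max_enrichment_factor(ranked_is_ligand):
    """Largest enrichment factor over all rank cutoffs (assumed definition)."""
    n_total = len(ranked_is_ligand)
    n_ligands = sum(ranked_is_ligand)
    if n_ligands == 0:
        raise ValueError("ranking contains no known ligands")
    best = 0.0
    found = 0
    for rank, is_ligand in enumerate(ranked_is_ligand, start=1):
        found += int(is_ligand)
        # (found / rank) is the hit rate so far; (n_ligands / n_total) is the base rate.
        best = max(best, (found * n_total) / (rank * n_ligands))
    return best


# Toy database: 20 compounds, known ligands at ranks 2, 3, 10 and 15.
ligand_ranks = {2, 3, 10, 15}
ranking = [rank in ligand_ranks for rank in range(1, 21)]
pct = pct_db_to_find(ranking)
max_ef = max_enrichment_factor(ranking)
print(f"% of db to find 25% of known ligands = {pct:.1f}")
print(f"Max. enrichment factor = {max_ef:.2f}")
```

Lower values of the first metric and higher values of the second both indicate better enrichment, which is how the Test1 through Test11 rows above can be compared.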

DHFR cont. 2
Using a focused set of spheres appears to be essential for reducing the noise caused by an inaccurate scoring function that favors wrong docking poses. Restricting matching to spheres in the hot-spot region alleviates this problem.

DHFR cont. 3 (figure panels)
- Test1: docked ligands; top-scored MDDR decoys
- Test10: docked ligands; top-scored MDDR decoys

Case analysis (Aldose Reductase)
* Structure, 1997, 5:
The conformational flexibility of the binding site, as implicated by crystal structures, appears to contribute to the poor enrichment. Other factors may also contribute, such as the lack of a protein desolvation penalty in the scoring function.

AR cont. 1
- Correlation coefficients between electrostatic energy and total energy, and between vdW energy and total energy, are 0.74 and 0.66, respectively, for docked ligands, and 0.62 and for docked top 500 decoys. Clearly, the electrostatic interaction is far too favorable and dominates the interaction energy score for docked decoys, which might be remedied by including a protein desolvation penalty.
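The energy component analysis above amounts to computing a Pearson correlation coefficient between each energy term and the total score over a set of docked poses. A minimal sketch follows; the energies are invented for illustration, and the total is assumed to be the plain sum of the electrostatic and vdW terms, which may not match the actual scoring function.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-pose energy components (kcal/mol) for four docked compounds.
elec = [-30.0, -12.0, -25.0, -5.0]
vdw = [-20.0, -22.0, -18.0, -21.0]
total = [e + v for e, v in zip(elec, vdw)]  # assumed: total = elec + vdw

r_elec = pearson(elec, total)
r_vdw = pearson(vdw, total)
print(f"r(elec, total) = {r_elec:.2f}")
print(f"r(vdw, total)  = {r_vdw:.2f}")
```

Running the same calculation separately on docked ligands and on top-scoring decoys shows whether one term dominates the ranking of decoys more than it does the ranking of true ligands.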

Case analysis (PARP)
Columns: docking parameters; enrichment plots (% of db to find 25% of known ligands; max. enrichment factor); docking statistics (cmpds)
- Test?. Parameters: INDOCK.1; 70 matching spheres. % of db to find 25% of known ligands = 3.0.
- Test?. Parameters: INDOCK.2; 70 matching spheres. % of db to find 25% of known ligands = 7.3.
- Test?. Parameters: INDOCK.3; 70 matching spheres. % of db to find 25% of known ligands = 10.5.
- Test9. Parameters: INDOCK.3; 50 matching spheres. % of db to find 25% of known ligands = 9.4.
- Test10. Parameters: INDOCK.3; 35 matching spheres. % of db to find 25% of known ligands = 4.5.
- Test11. Parameters: INDOCK.1; 35 matching spheres. % of db to find 25% of known ligands = 2.8.

PARP cont. 1 (figure panels): docked ligands; top-scored MDDR decoys

Case analysis (AChE)
- Poor enrichment (5.0% of db to find 25% of known ligands) appears to be caused by the large number of improbable docking poses. The AChE binding cavity is large, with many waters and more than one clear binding region in the pocket. No direct hydrogen bonds between the ligand and the protein have been observed, only water-bridged hydrogen bonds, which makes this a particularly hard case for docking. (Jacobsson, JMC, 2004)
- Can we do something to improve our docking for such cases?

Case analysis (Thrombin)
Multiple binding sub-sites? Does it have anything to do with how the dockable database is generated and how the spheres are matched?

Preliminary Conclusion
- A fully automated docking procedure and a consistent parameter set for grid generation, docking and scoring appear to perform well across all the tested systems.
- Cofactors, iron and structural waters involved in ligand binding must be carefully inspected, as must the protonation states of amino acid residues in the binding site.
- A larger binding pocket requires more extensive sampling (INDOCK.3), as validated by the DHFR, TS, thrombin and GART test sets.
- Docking spheres and DelPhi spheres can be generated using different schemes. A focused set of matching spheres was shown to be critical for systems like DHFR, TS and GART, which indicates that hot-spot information in the binding pocket will be important for directing docking.
- Careful interpretation of docking results (energy component analysis) should be carried out regularly to identify possible sources of error.

High quality test sets
Enrichment data sets (known ligands and decoys datasets)
i. Susan test set
ii. Enolase test set
iii. NCTR ER data set: 232 diverse compounds, covering a 10^6-fold range in a validated ER competitive binding assay; and NCTR AR data set: 202 diverse compounds (Tong et al., 2001)
iv. McMaster DHFR data set
v. Compumine ERalpha, MMP3, AChE and fXa data sets
Docking and scoring test sets (experimental structures and binding affinities)
i. CCDC/Astex validation test set: 308 crystal complexes
ii. X-CScore dock set: 100 crystal complexes and binding affinities (Wang et al., 2003)

Suggestion
- What is the first (and possibly second) putative principal factor that, if fixed, would improve the enrichment?
- For each improvement that could be made, give your estimate of what should be done, how much effort it would take, and the likelihood of improvement.
- Look closely at the active site residues (ionization and protonation states), and use the top decoy compounds to identify the residues that contribute to overestimation of the docking energy.
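The last suggestion, using top decoys to find residues that inflate the docking energy, could be approached as sketched here. This assumes a per-residue breakdown of the interaction energy is available for each pose, which the slides do not confirm; the input format, residue labels (`RES_A`, `RES_B`) and function names are all hypothetical.

```python
from collections import defaultdict


def mean_by_residue(poses):
    """Average per-residue energy over poses; each pose maps residue -> energy."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pose in poses:
        for residue, energy in pose.items():
            sums[residue] += energy
            counts[residue] += 1
    return {residue: sums[residue] / counts[residue] for residue in sums}


def residues_favoring_decoys(decoy_poses, ligand_poses):
    """Rank residues by (mean decoy energy - mean ligand energy), most negative first.

    A strongly negative gap marks a residue that scores decoys more favorably
    than known ligands, i.e. a candidate source of overestimated energy.
    """
    decoy_mean = mean_by_residue(decoy_poses)
    ligand_mean = mean_by_residue(ligand_poses)
    gaps = {r: decoy_mean[r] - ligand_mean.get(r, 0.0) for r in decoy_mean}
    return sorted(gaps.items(), key=lambda item: item[1])


# Invented per-residue energies (kcal/mol) for two top decoys and one known ligand.
decoys = [{"RES_A": -12.0, "RES_B": -2.0}, {"RES_A": -10.0, "RES_B": -1.0}]
ligands = [{"RES_A": -4.0, "RES_B": -2.0}]
ranked = residues_favoring_decoys(decoys, ligands)
for residue, gap in ranked:
    print(f"{residue}: decoy minus ligand mean energy = {gap:+.1f}")
```

Residues at the top of this ranking would be the first ones to inspect for questionable ionization or protonation states.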

Acknowledgement
- Shoichet
- Jacobson
- Ursula & Sali