Why multiple scoring functions can improve docking performance: Testing hypotheses for rescoring success


Noel M. O’Boyle, John W. Liebeschuetz and Jason C. Cole
Cambridge Crystallographic Data Centre, Cambridge, UK

Introduction

When using protein-ligand docking software for virtual screening, a different scoring function may be used to rank the docked poses than is used during the docking process itself. This is referred to as rescoring (Scheme 1). Rescoring can improve enrichment rates compared to docking alone, but the underlying reasons have not been studied to date. Here we propose two hypotheses and test them using the 85 protein-ligand complexes in the Astex Diverse Set [1] and 99 physicochemically similar decoys per ligand. The scoring functions used were ChemScore (CS), GoldScore (GS) and ASP in GOLD.

Scheme 1: A rescoring experiment. Protein structure + molecular library -> docking with scoring function A -> poses and associated scores -> rescoring with scoring function B -> same poses but with new scores.

Hypothesis 1: Rescoring success is driven by a consensus effect

Does rescoring work by eliminating false positives? That is, does it work because an active is likely to be ranked highly only if it is ranked highly by both scoring functions? This is the reason for success in consensus scoring (combining multiple rescore values), but does it also hold true for rescoring itself?

If true, then swapping the order of the scoring and rescoring functions should have little effect. However, this is not the case (compare CS rescored with GS and vice versa in Table 1). The scores from the initial scoring function serve only to filter out all but the top ten poses. For a pose to score highly in the end, it must score highly according to the rescoring function. Pairwise correlations support this: all of the correlations above 0.60 are associated with pairs of experiments that use the same function for the final scoring.

Hypothesis 2: Rescoring success is due to complementary strengths

This hypothesis proposes that rescoring works when the docking function is good at scoring different poses of the same molecule, and when the rescoring function is good at the relative scoring of different molecules.

Table 1 and Figure 1 show that CS, GS and ASP are all equally capable of pose prediction; however, CS performs much worse on average at ranking the active. According to this hypothesis, CS should not be used as the rescoring function, but any of the scoring functions could be used for the initial docking. This is consistent with the results in Table 1, where rescoring with CS reduces performance (on average), while the best performance overall is obtained when CS poses are rescored with GS.

Conclusions and Future Work

Overall, Hypothesis 2 appears to be the principal reason for success in rescoring. We are currently investigating the best scoring or rescoring protocols for a wide range of protein targets. These will be made available as template settings in GOLD.

Eliminating unfavorable interactions with ASP

A knowledge-based potential such as ASP incorporates information on the distance distribution of protein-ligand interactions. As a result, ASP can be used to score each atom in a docked pose (resulting from GS or CS) and mark it as favorable or unfavorable. Initial results show that this can be used to improve pose quality, but not virtual screening results (not enough unfavorable interactions were observed).

Reference

[1] Hartshorn, M. J.; Verdonk, M. L.; Chessari, G.; Brewerton, S. C.; Mooij, W. T. M.; Mortenson, P. N.; Murray, C. W. J. Med. Chem. 2007, 50.

Table 1: Scoring and rescoring performance. Standard deviation from 25 repetitions shown in parentheses. Median ranks for GS, CS and ASP are 2, 8 and 4, respectively.

Docking (10 poses)   Rescoring   Mean rank of actives (1-100)   No. of correct poses (out of 85)
GS                   -            8.9 (0.4)                     67.1 (1.3)
GS                   CS          15.8 (0.7)                     69.1 (2.1)
GS                   ASP          9.5 (0.6)                     67.7 (1.8)
CS                   -           20.5 (0.7)                     68.2 (2.0)
CS                   ASP         11.0 (0.8)                     67.1 (3.0)
CS                   GS           7.2 (0.8)                     68.2 (2.4)
ASP                  -           11.0 (0.5)                     65.4 (2.0)
ASP                  GS           7.9 (0.5)                     68.6 (2.6)
ASP                  CS          19.3 (0.9)                     65.0 (2.2)

Figure 1: (a) The number of actives placed in the top-ranked position. (b) Poses correctly predicted; that is, where the top-ranked pose is within 2.0 Å RMSD of the crystal structure.
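
The rescoring protocol of Scheme 1 and the "mean rank of actives" metric of Table 1 can be illustrated with a minimal Python sketch. It is not the authors' code and does not call GOLD. The function names, the (docking score, rescore) tuple layout, the choice to give each molecule the best rescore among its retained poses, and the convention that higher scores are better are all assumptions made for illustration.

```python
from statistics import mean


def final_score(poses, keep=10):
    """Score one molecule after rescoring.

    `poses` is a list of (docking_score, rescore) tuples (hypothetical layout).
    Assumption: higher is better for both scores. The docking score only
    decides which poses are retained; the rescore decides the final value.
    """
    kept = sorted(poses, key=lambda p: p[0], reverse=True)[:keep]
    return max(rescore for _, rescore in kept)


def rank_of_active(poses_by_molecule, active, keep=10):
    """Rank (1 = best) of the active among all molecules docked to one target."""
    final = {name: final_score(poses, keep)
             for name, poses in poses_by_molecule.items()}
    ordered = sorted(final, key=final.get, reverse=True)
    # Ties are broken by insertion order here; a real study would need a rule.
    return ordered.index(active) + 1


def mean_rank(ranks):
    """Average the active's rank over all targets (85 in the poster's set)."""
    return mean(ranks)


if __name__ == "__main__":
    # Toy data: one active and two decoys, invented purely for illustration.
    toy = {
        "active": [(60.0, 30.0), (55.0, 42.0), (10.0, 99.0)],
        "decoy1": [(70.0, 35.0), (65.0, 20.0)],
        "decoy2": [(50.0, 41.0)],
    }
    print(rank_of_active(toy, "active", keep=2))  # prints 1
    print(rank_of_active(toy, "active", keep=1))  # prints 3
```

In the toy data, the pose with the highest rescore (99.0) never contributes because its docking score is too low to survive the filter. This mirrors the observation under Hypothesis 1 that the initial function acts only as a pose filter while the final ranking is set by the rescoring function.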