Design of a Compound Screening Collection Gavin Harper Cheminformatics, Stevenage.

Slides:



Advertisements
Similar presentations
ChemAxon in 3D Gábor Imre, Adrián Kalászi and Miklós Vargyas Solutions for Cheminformatics.
Advertisements

Using the Optimizer to Generate an Effective Regression Suite: A First Step Murali M. Krishna Presented by Harumi Kuno HP.
Monte Carlo Methods and Statistical Physics
Data Mining Methodology 1. Why have a Methodology  Don’t want to learn things that aren’t true May not represent any underlying reality ○ Spurious correlation.
Copyright © 2009 Pearson Education, Inc. Chapter 29 Multiple Regression.
A Multiobjective Approach to Combinatorial Library Design Val Gillet University of Sheffield, UK.
Assessment. Schedule graph may be of help for selecting the best solution Best solution corresponds to a plateau before a high jump Solutions with very.
Chemical Diversity Qualify and/or quantify the extent of variety within a set of compounds. Try to define the extent of chemical space. In combinatorial.
The Rate of Convergence of AdaBoost Indraneel Mukherjee Cynthia Rudin Rob Schapire.
Jeffery Loo NLM Associate Fellow ’03 – ’05 chemicalinformaticsforlibraries.
Iterative Optimization of Hierarchical Clusterings Doug Fisher Department of Computer Science, Vanderbilt University Journal of Artificial Intelligence.
A Study on Feature Selection for Toxicity Prediction*
No Free Lunch (NFL) Theorem Many slides are based on a presentation of Y.C. Ho Presentation by Kristian Nolde.
Clustering Color/Intensity
Clustering.
Code and Decoder Design of LDPC Codes for Gbps Systems Jeremy Thorpe Presented to: Microsoft Research
Hardness-Aware Restart Policies Yongshao Ruan, Eric Horvitz, & Henry Kautz IJCAI 2003 Workshop on Stochastic Search.
Impact Evaluation Session VII Sampling and Power Jishnu Das November 2006.
So are how the computer determines the size of the intercept and the slope respectively in an OLS regression The OLS equations give a nice, clear intuitive.
Radial Basis Function Networks
Correlation & Regression
PARTICIPATING IN EX APPOINTMENT PROCESSES
COLLECTING QUANTITATIVE DATA: Sampling and Data collection
Combinatorial Chemistry and Library Design
Sampling Theory Determining the distribution of Sample statistics.
Genetic Regulatory Network Inference Russell Schwartz Department of Biological Sciences Carnegie Mellon University.
EXPLORING CHEMICAL SPACE FOR DRUG DISCOVERY Daniel Svozil Laboratory of Informatics and Chemistry.
Daniel Brown. D9.1 Discuss the use of a compound library in drug design. Traditionally, a large collection of related compounds are synthesized individually.
Discrete Mathematical Structures (Counting Principles)
EFFICIENT CHARACTERIZATION OF UNCERTAINTY IN CONTROL STRATEGY IMPACT PREDICTIONS EFFICIENT CHARACTERIZATION OF UNCERTAINTY IN CONTROL STRATEGY IMPACT PREDICTIONS.
CS CM124/224 & HG CM124/224 DISCUSSION SECTION (JUN 6, 2013) TA: Farhad Hormozdiari.
Variables, sampling, and sample size. Overview  Variables  Types of variables  Sampling  Types of samples  Why specific sampling methods are used.
Maximum Likelihood Estimation Methods of Economic Investigation Lecture 17.
SAR vs QSAR or “is QSAR different from SAR”
Chapter 7 Probability and Samples: The Distribution of Sample Means.
2005MEE Software Engineering Lecture 11 – Optimisation Techniques.
PHARMACOECONOMIC EVALUATIONS & METHODS MARKOV MODELING IN DECISION ANALYSIS FROM THE PHARMACOECONOMICS ON THE INTERNET ®SERIES ©Paul C Langley 2004 Maimon.
Clustering and Testing in High- Dimensional Data M. Radavičius, G. Jakimauskas, J. Sušinskas (Institute of Mathematics and Informatics, Vilnius, Lithuania)
Advanced Decision Architectures Collaborative Technology Alliance An Interactive Decision Support Architecture for Visualizing Robust Solutions in High-Risk.
Chapter 4: Variability. Variability Provides a quantitative measure of the degree to which scores in a distribution are spread out or clustered together.
The Case Against Test Cases
Chapter 7 Sampling Distributions. Sampling Distribution of the Mean Inferential statistics –conclusions about population Distributions –if you examined.
In memory of Rich Green An Outstanding Medicinal Chemist and Colleague.
Selecting Diverse Sets of Compounds C371 Fall 2004.
Flat clustering approaches
CZ5225: Modeling and Simulation in Biology Lecture 7, Microarray Class Classification by Machine learning Methods Prof. Chen Yu Zong Tel:
Jumpin’ Jack Flash It’s a gas gas gas! Solids, Liquids and Gases and Gas Laws Chapter 7.
Reliability a measure is reliable if it gives the same information every time it is used. reliability is assessed by a number – typically a correlation.
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
The Design of Statistical Specifications for a Test Mark D. Reckase Michigan State University.
Correlation Clustering Nikhil Bansal Joint Work with Avrim Blum and Shuchi Chawla.
Quick summary One-dimensional vertical (quality) differentiation model is extended to two dimensions Use to analyze product and price competition Two.
Artificial Intelligence in Game Design Lecture 20: Hill Climbing and N-Grams.
Identification of structurally diverse Growth Hormone Secretagogue (GHS) agonists by virtual screening and structure-activity relationship analysis of.
Sampling Theory Determining the distribution of Sample statistics.
Computational Approach for Combinatorial Library Design Journal club-1 Sushil Kumar Singh IBAB, Bangalore.
Balancing on Three Legs: The Tension Between Aligning to Standards, Predicting High-Stakes Outcomes, and Being Sensitive to Growth Julie Alonzo, Joe Nese,
Introduction to emulators Tony O’Hagan University of Sheffield.
Bayesian Optimization. Problem Formulation Goal  Discover the X that maximizes Y  Global optimization Active experimentation  We can choose which values.
Page 1 Computer-aided Drug Design —Profacgen. Page 2 The most fundamental goal in the drug design process is to determine whether a given compound will.
Toxicity vs CHEMICAL space
Differential Methylation Analysis
Combinatorial Library Design Using a Multiobjective Genetic Algorithm
Random Testing: Theoretical Results and Practical Implications IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 2012 Andrea Arcuri, Member, IEEE, Muhammad.
Selcia Fragment Library
Student Activity 1: Fair trials with two dice
Software Reliability Models.
Statistics 2 for Chemical Engineering lecture 5
Presentation transcript:

Design of a Compound Screening Collection Gavin Harper Cheminformatics, Stevenage

In the Past...  Scientists chose what molecules to make  They tested the molecules for relevant activity

Now...  We often screen a whole corporate collection – compounds  But we choose what’s in the collection  If the collection doesn’t have the right molecules in it –we fail

“Screen MORE”  Everything’ll be fine  We’ll find lots of hits  Not borne out by our experience

How do I design a collection? - 1  Pick the right kind of molecules –hits similar biological targets –computational (in-silico) model predicts activity at right kind of target for given class of molecules –exclude molecules that fail simple chemical or property filters known to be important for “drugs”  FOCUS!

How do I design a collection? - 2  Cover all the options  Pick as “diverse” a set of molecules as possible  If there’s an active region of chemical space, we should have it covered  DIVERSE SELECTION –opposite extreme to focused selection

Basic Idea of Our Model  Relate biological similarity to chemical similarity  Use a realistic objective –maximize number of lead series found in HTS  Build a mathematical model on minimal assumptions  How does our collection perform now in HTS? –relate this to our model  Learn what we need to make/purchase for HTS to find more leads

A “simple” model  Chemical space is clustered (partitioned) –there are various possible ways to do this  For a given screen, each cluster i has –a probability  i that it contains a lead  If we sample a random compound from a cluster containing a lead, the compound has –a probability  i that it shows up as a hit in the screen  If we find a hit in the cluster, that’s enough to get us to the lead

And in pictures... clusters containing leads  i = Pr(box i is orange)

 i = Pr(dot is green) Hit Non-Hit Lead

Constrained Optimization Problem  Suppose that we want to construct a screening collection of fixed size M  To maximize expected number of lead series found we have to

Solution  If we know very little (  i,  i equal for all i) –select the same number from each cluster - diversity solution  If e.g. we know some clusters are far more likely than others to contain leads for a target –select compounds only from these clusters - focused solution (filters)  But we also have a solution for all the situations in between, where there is a balance between diversity and focus

Immediate Impact  Improved “diversity” score  Use in assessing collections for acquisition  We have integrated this score into our Multi-Objective Library Design Package * Gillett et al., J. Chem. Inf. Comp. Sci. 2002, 42,

The Waldman Criteria (Sheffield, 1998) 1.Adding redundant molecules to a system does not change its diversity. 2.Adding non-redundant molecules to a system always increases its diversity. 3.Space-filling behaviour of diversity space should be preferred. 4. Perfect (i.e. infinite) filling of a finite descriptor space should result in a finite value for the diversity function. 5.If the dissimilarity or distance of one molecule to all others is increased, the diversity of the system should increase. However, as this distance increases to infinity, the diversity should asymptotically approach a constant value. * Waldman et al., J. Mol. Graphics Modell. 2000, 18,

What value should  take?  Determining a value of  is important. We can cluster molecules using a variety of methods.  Fortunately, there is a recent paper from Abbott which answers this question  In 115 HTS assays, with a TIGHT 2-D clustering –consistent: mostly varies between 0.2 and 0.4  This agrees well with our experience  In practice we will use this (Taylor-Butina) clustering with radius 0.85 and using Daylight fingerprints * Martin et al., J. Med. Chem. 2002, 45,

How Difficult Are Typical Screens? Probability of one-or-more leads Screen Difficulty

Improving compound acquisition  For a sample to be eligible to form part of the collection  It has to be of a minimum purity –determined by the QA project  It has to pass a set of agreed in silico filters –good starting points –developability  Multiple lead series per screen –Multiple chemotypes => 2D representation –Collection model provides rationale and design guidelines  Leads for all targets –3D Pharmacophore coverage

Probability of finding a lead for different acquisition strategies

Why don’t far more screens find leads as collections get bigger? performance with a collection of singletons

The model assists with a wide range of strategic questions  What contribution did compounds from different sources make last year?  How big should a focused set of compounds for a biological system be? –what should be in it?  How many compounds that fail the default filters should I include in the corporate collection?  A secure framework for investigating a wide range of strategic questions  Also easy to check sensitivity to changes in parameters

The model generates new challenges  There appear to be fundamental limits in the performance of screening  Can I find a better way of describing molecules? –can I increase  i but keep  i at a similar level?  Are there assumptions in the model that I could breach? –model assumes I can only find leads from “hits” in the same cluster –how good can I get at jumping BETWEEN clusters from original hit compounds?  Can I get better at identifying high  i clusters? –improve modelling and estimation of probability values

Acknowledgements  Stephen Pickett  Darren Green  Jameed Hussain  Andrew Leach  Andy Whittington * Harper et al., Combinatorial Chemistry and High Throughput Screening 2004, 7,