Faculté de Chimie, ULP, Strasbourg, FRANCE

Master Chemoinfo: Virtual Screening
Alexandre Varnek

Small Library of selected hits
[Diagram: two routes from large libraries of molecules to hits for a target protein. Experimental route: high-throughput screening. Computational route: virtual screening (filtering, QSAR, docking), which yields a small library of selected hits.]

Virtual screening must be fast and reliable
Chemical universe: 10^200 molecules; 10^60 drug-like molecules. Molecules are considered as vectors in a multidimensional chemical space defined by the descriptors.

Development candidate
Diagram labels: genomics; target; high-throughput screening (HTS); hits; data analysis; lead; optimization; development candidate.

Drug Discovery and ADME/Tox studies should be performed in parallel
idea → target → combichem/HTS → hit → lead → candidate → drug, with ADME/Tox studies running alongside

Methodologies of a virtual screening
from A.R. Leach, V.J. Gillet “An Introduction to Chemoinformatics”, Kluwer Academic Publisher, 2003

Platform for Ligand Based Virtual Screening
~10^6 – 10^9 molecules → filters → similarity search → ~10^3 – 10^4 molecules → QSAR models → candidates for docking or experimental tests

High-throughput screening (HTS)
Keywords: combinatorial chemistry; high-throughput screening (HTS); virtual screening; drug-likeness; training sets of up to ... compounds

Virtual Screening: molecules available for screening
(1) Real molecules: 1 - 2 million in in-house archives of large pharma and agrochemical companies; 3 - 4 million samples available commercially.
(2) Hypothetical molecules: virtual combinatorial libraries (up to 10^60 molecules).

Methods of virtual High-Throughput Screening
Filters; similarity search; classification and regression structure-property models; docking

Filters to estimate “drug-likeness”

Lipinski rules for intestinal absorption (« Rule of 5 »)
H-bond donors < 5 (the sum of OH and NH groups); MWT < 500; LogP < 5; H-bond acceptors < 10 (the sum of N and O atoms without H attached).

Lipinski rules for drug-like molecules (« Rules of 5 »)

Example of different filters:
Rules for absorbable compounds
It is instructive to compare our permeability model with Lipinski's and Veber's rules. All three models are described by similar parameters. The following table shows the maximum cut-off values for absorbable compounds in our data set, with the cut-off values of Lipinski's and Veber's rules in bold. Comparing the three columns shows that in most cases the cut-off values of Lipinski's and Veber's rules are exceeded by 100 percent. This observation has two explanations. First, the three models deal with quite different biological phenomena: Lipinski analyzed compounds that reached the second phase of clinical trials, Veber analyzed oral bioavailability in rats (which is affected by metabolism to a much greater extent than human intestinal absorption, HIA), whereas we analyzed HIA. Second, the models were derived with quite different analytical tools: we used C-SAR analysis, which automatically considers a large variety of possible causes of poor permeability, while Lipinski and Veber used conventional data mining techniques.

Remove compounds containing too many rings

Remove compounds with toxic groups

Remove compounds with reactive groups

Remove False-Positive Hits

Remove poorly soluble compounds

Filter on inorganic and heteroatom compounds

Remove compounds with multiple chiral centers

Paclitaxel (Taxol): violation of 2 rules
MW = 837; logP = 4.49; HD = 3; HA = 15
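The violation count for paclitaxel can be reproduced with a minimal sketch. It assumes the four descriptors are already computed (the values are those quoted on the slide), treats a value strictly above a cut-off as a violation, and uses the conventional cut-off of 10 for H-bond acceptors. The names `Descriptors` and `rule_of_five_violations` are illustrative and do not belong to any library.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (illustrative container)."""
    mw: float          # molecular weight
    logp: float        # octanol/water partition coefficient
    h_donors: int      # H-bond donor count
    h_acceptors: int   # H-bond acceptor count


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the names of the rule-of-five cut-offs that the molecule exceeds."""
    checks = {
        "H-bond donors > 5": d.h_donors > 5,
        "MW > 500": d.mw > 500,
        "logP > 5": d.logp > 5,
        "H-bond acceptors > 10": d.h_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


# Paclitaxel values as quoted on the slide.
taxol = Descriptors(mw=837, logp=4.49, h_donors=3, h_acceptors=15)
violations = rule_of_five_violations(taxol)
print(len(violations), violations)
```

With these inputs the two flagged cut-offs are the molecular weight and the H-bond acceptor count.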

logD vs logP
95% of all drugs are ionizable: 75% are bases and 20% acids. Using pH-dependent log D in place of log P as the lipophilicity descriptor significantly increases the number of compounds correctly identified as drug-like by the drug-likeness filter: log D (pH 5.5) < 5.
S. K. Bhal, K. Kassam, I. G. Peirson, and G. M. Pearl, "The Rule of Five Revisited: Applying Log D in Place of Log P in Drug-Likeness Filters", Molecular Pharmaceutics, v. 4 (2007)

Synthetic Accessibility
The fragment score is proportional to the fragment's occurrence in the PubChem database. Ertl and Schuffenhauer, Journal of Cheminformatics :8

Frequency distribution of fragments
Altogether 605,864 different fragment types have been obtained by fragmenting the PubChem structures. Most of them (51%), however, are singletons (present only once in the whole set). Only a relatively small number of fragments, namely 3759 (0.62%), are frequent (i.e. present more than 1000 times in the database). Ertl and Schuffenhauer, Journal of Cheminformatics :8

The most common fragments present in the million PubChem molecules. The "A" represents any non-hydrogen atom, "dashed" double bond indicates an aromatic bond and the yellow circle marks the central atom of the fragment. Ertl and Schuffenhauer Journal of Cheminformatics :8

Distribution of (-SAscore) for natural products, bioactive molecules and molecules from catalogues. Correlation of calculated (-SAscore) and average chemist estimation for 40 molecules (r2 = 0.890). Ertl and Schuffenhauer, Journal of Cheminformatics :8

Similarity Search: unsupervised and supervised approaches

2D (unsupervised) Similarity Search
Tanimoto coefficient; similarity search; comparison of structural keys; molecular fingerprints
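As a minimal sketch of an unsupervised similarity search, the Tanimoto coefficient between two binary fingerprints can be computed from the sets of bits that are switched on. The fingerprints below are made-up bit positions, not real structural keys, and returning 0.0 for two empty fingerprints is an assumed convention.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b - c) for two sets of 'on' bit positions."""
    a, b = set(fp_a), set(fp_b)
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def similarity_search(query, database, threshold=0.5):
    """Return (name, similarity) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [(name, s) for name, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each set lists the bit positions that are set.
query = {1, 4, 7, 9}
database = {
    "mol_A": {1, 4, 7, 9},    # identical to the query
    "mol_B": {1, 4, 7, 12},   # 3 common bits out of 5 distinct bits
    "mol_C": {2, 3, 5},       # nothing in common with the query
}
hits = similarity_search(query, database, threshold=0.5)
print(hits)
```

The threshold of 0.5 is arbitrary; in practice it would be tuned to the fingerprint type and the size of the selection wanted.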

Continuous and Discontinuous SAR

Structural Spectrum of Thrombin Inhibitors
[Figure: thrombin inhibitors arranged around the reference compounds, with structural similarity "fading away". Similarity values shown: 0.84, 0.82, 0.72, 0.67, 0.64, 0.56, 0.53, 0.52, 0.39.]

R. Guha et al. J.Chem.Inf.Mod., 2008, 48, 646
Continuous SARs: gradual changes in structure result in moderate changes in activity ("rolling hills", G. Maggiora).
Discontinuous SARs: small changes in structure have dramatic effects on activity ("cliffs" in activity landscapes).
Structure-Activity Landscape Index: SALI_ij = ΔA_ij / ΔS_ij, where ΔA_ij (ΔS_ij) is the difference between activities (similarities) of molecules i and j.
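A minimal sketch of the index follows. It assumes that the similarity difference ΔS_ij is taken as 1 - sim(i, j), with sim a Tanimoto-type similarity in [0, 1], and that activities are on a logarithmic scale such as pIC50. Both choices, and the example numbers, are illustrative assumptions.

```python
import math


def sali(activity_i, activity_j, similarity_ij):
    """SALI_ij = |A_i - A_j| / (1 - sim_ij), under the assumptions stated above."""
    delta_a = abs(activity_i - activity_j)
    delta_s = 1.0 - similarity_ij
    if delta_s == 0.0:
        # Identical fingerprints: an infinite cliff if the activities differ,
        # and 0.0 (an assumed convention) if they do not.
        return math.inf if delta_a > 0 else 0.0
    return delta_a / delta_s


# Hypothetical pairs: (pIC50 of i, pIC50 of j, similarity).
smooth_pair = sali(7.0, 6.5, 0.50)   # dissimilar compounds, similar activity
cliff_pair = sali(8.0, 6.0, 0.75)    # similar compounds, very different activity
print(smooth_pair, cliff_pair)
```

Large values flag activity cliffs; a pair with identical fingerprints and different activities gives an infinite index.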

Discontinuous SARs: VEGFR-2 tyrosine kinase inhibitors
Two analogs with MACCS Tc = 1.00 have activities of 6 nM and 2390 nM. Small changes in structure have dramatic effects on activity ("cliffs" in activity landscapes). Implications: lead optimization, QSAR; bad news for molecular similarity analysis.

Example of a “Classical” Discontinuous SAR
Adenosine deaminase inhibitors: any similarity method must recognize these compounds as being "similar" (MACCS Tanimoto similarity).

Supervised Molecular Similarity Analysis

Dynamic Mapping of Consensus Positions
Prototypic "mapping algorithm" for simplified binary-transformed* descriptor spaces.
Uses known active compounds to create activity-dependent consensus positions in chemical space.
Operates in descriptor spaces of step-wise increasing dimensionality ("dimension extension").
Selects preferred descriptors from large pools.
* Median-based: assign "1" to a descriptor if its value is greater than or equal to its screening database median; assign "0" if it is smaller.
Godden et al. & Bajorath, J Chem Inf Comput Sci 44, 21 (2004)

DMC Algorithm
Calculate and binary-transform descriptors (descriptor bit strings for reference molecules).
Compare descriptor bit strings of reference molecules and determine consensus bits.
Select DB compounds matching consensus bits.
Re-generate bit strings permitting bit variability.
Select DB compounds matching extended bit strings.
Repeat until a small selection set is obtained.
Consensus bit string: bit frequency = 1.0 or = 0.0 (no variability).
1st dimension extension: ≥ 0.9 or ≤ 0.1 (10% variability).
2nd dimension extension: ≥ 0.8 or ≤ 0.2 (20% variability).
In the figure: white "0", black "1", gray variably set bits.
E.g. 0%, 10%, 20% permitted bit variability: longer bit strings, fewer matching DB compounds.
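The steps above can be illustrated with a simplified sketch of one consensus step followed by one dimension extension. It is not the published implementation: the descriptor values are toy numbers, the function names are illustrative, and a permitted variability of 40% is used only because the toy reference set has three molecules.

```python
from statistics import median


def database_medians(db_rows):
    """Median of each descriptor (column) over the screening database."""
    return [median(column) for column in zip(*db_rows)]


def binarize(row, medians):
    """Median-based transform: 1 if value >= database median, else 0."""
    return [1 if value >= m else 0 for value, m in zip(row, medians)]


def consensus_bits(reference_bits, variability=0.0):
    """Map descriptor index -> consensus bit for descriptors whose bit is set
    (or unset) in at least a fraction (1 - variability) of the references."""
    n = len(reference_bits)
    consensus = {}
    for index, column in enumerate(zip(*reference_bits)):
        frequency = sum(column) / n
        if frequency >= 1.0 - variability:
            consensus[index] = 1
        elif frequency <= variability:
            consensus[index] = 0
    return consensus


def select_matching(db_bits, consensus):
    """Indices of database compounds that agree with every consensus bit."""
    return [i for i, bits in enumerate(db_bits)
            if all(bits[j] == bit for j, bit in consensus.items())]


# Toy data (hypothetical): 6 database compounds, 3 known actives, 3 descriptors.
database = [[1.0, 60.0, 0.1], [2.0, 50.0, 0.6], [3.0, 40.0, 0.2],
            [4.0, 30.0, 0.5], [5.0, 20.0, 0.3], [6.0, 10.0, 0.4]]
actives = [[4.5, 25.0, 0.55], [5.5, 15.0, 0.45], [4.2, 28.0, 0.30]]

medians = database_medians(database)
db_bits = [binarize(row, medians) for row in database]
ref_bits = [binarize(row, medians) for row in actives]

strict = select_matching(db_bits, consensus_bits(ref_bits, 0.0))
extended = select_matching(db_bits, consensus_bits(ref_bits, 0.4))
print(strict, extended)
```

Permitting variability adds the third descriptor to the consensus bit string, so the extended string is longer and matches fewer database compounds than the strict one.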

QSAR/QSPR models

Screening and hit selection
[Diagram: a QSPR model is used for virtual screening of a database, separating hits, which go on to experimental tests, from useless compounds.]

Libraries profiling: indexing a database by simultaneous assessment of various activities
Example: PASS software (Prediction of Activity Spectra for Substances)

For each fragment i

PASS: Naïve Bayes estimator. Calculation of « P(act) » and « P(inact) »
A molecule is considered active if P(act) > P(inact) and/or P(act) > 0.7
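The decision rule can be written as a small function. Because the slide leaves open whether the two conditions are combined with "and" or "or", the sketch exposes that choice as a hypothetical `require_both` flag. The probabilities themselves would come from the PASS estimator and are made up here.

```python
def is_predicted_active(p_act, p_inact, threshold=0.7, require_both=False):
    """Apply the two criteria from the slide to one activity type."""
    more_likely_active = p_act > p_inact
    above_threshold = p_act > threshold
    if require_both:
        return more_likely_active and above_threshold
    return more_likely_active or above_threshold


# Hypothetical activity spectrum: activity name -> (P(act), P(inact)).
spectrum = {
    "activity_1": (0.85, 0.01),
    "activity_2": (0.40, 0.10),
    "activity_3": (0.05, 0.60),
}
lenient = [name for name, (pa, pi) in spectrum.items()
           if is_predicted_active(pa, pi)]
strict_hits = [name for name, (pa, pi) in spectrum.items()
               if is_predicted_active(pa, pi, require_both=True)]
print(lenient, strict_hits)
```

Requiring both conditions keeps only the activities predicted with high confidence, at the cost of dropping borderline ones.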

Quantitative Structure-Property Relationships (QSPR)
Y = f(structure) = f(descriptors). QSPR predictions are reliable only for compounds similar to those used to build the models. Similarity and pharmacophore search approaches therefore remain necessary as complementary tools.

Combinatorial Library Design

Virtual Screening ... when target structure is unknown
Diagram labels: virtual library; screening library; diverse subset; parallel synthesis or synthesis of single compounds; design of focused library; screening (HTS); hits.

Generation of Virtual Combinatorial Libraries
Fragment marking approach. Markush structure: if R1, R2, R3 = [substituents shown in the figure], then [enumerated structures shown in the figure].

The types of variation in Markush structures:
Substituent variation (R1): R1 = Me, Et, Pr
Position variation (R2): R2 = NH2
Frequency variation: n = 1 – 3
Homology variation (R3), only for patent search: R3 = alkyl or heterocycle
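Enumerating a virtual library from such a Markush definition amounts to taking the Cartesian product of the variable parts. The sketch below uses the R1 list and the range of n from the slide. The attachment positions for R2 and the two concrete R3 groups are hypothetical placeholders, since the slide does not enumerate them.

```python
from itertools import product

r1_groups = ["Me", "Et", "Pr"]     # substituent variation (from the slide)
r2_positions = [2, 3, 4]           # position variation of NH2 (hypothetical positions)
n_repeats = [1, 2, 3]              # frequency variation, n = 1 - 3 (from the slide)
r3_groups = ["ethyl", "pyridyl"]   # hypothetical members of "alkyl or heterocycle"

# Each library member is one combination of the variable parts.
library = [
    {"R1": r1, "R2": f"NH2 at position {pos}", "n": n, "R3": r3}
    for r1, pos, n, r3 in product(r1_groups, r2_positions, n_repeats, r3_groups)
]
print(len(library))
print(library[0])
```

The library size is the product of the list lengths, which is why virtual combinatorial libraries grow so quickly as variation sites are added.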

Generation of Virtual Combinatorial Libraries
Reaction transform approach from A.R. Leach, V.J. Gillet “An Introduction to Chemoinformatics”, Kluwer Academic Publisher, 2003

Issues and Concepts in Combinatorial Library Design
Size of the library; coverage of properties ("chemical space"); diversity, similarity, redundancy; descriptor validation; subset selection from virtual libraries

Hot topics in chemoinformatics
Predictions vs interpretation
New approaches in structure-property modeling: descriptors, applicability domain, machine-learning methods (inductive learning transfer, semi-supervised learning, ...)
New techniques to mine chemical reactions
QSAR of complex systems: multi-component synergistic mixtures, new materials, metabolic pathways, ...
Public availability of chemoinformatics tools

Predictions vs interpretation
Nathan BROWN “Chemoinformatics—An Introduction for Computer Scientists” ACM Computing Surveys, Vol. 41, No. 2, Article 8, February 2009

Predictions vs interpretation
Problems: ensemble modeling; non-linear machine-learning methods (SVM, NN, …); descriptor correlations.
What do end users expect from QSAR models? Reliable estimation (prediction) of the given property.

Public accessibility of models:
Web-based platform for virtual screening

Some Screen Shots: Welcome Page…

ISIDA property prediction WEB server
infochim.u-strasbg.fr/webserv/VSEngine.html

ISIDA ScreenDB tools
Only an Internet browser is required. Different descriptors (ISIDA fragments, FPT, ChemAxon); similarity search with different metrics (Tanimoto, Dice, …); ensemble modeling approach (simultaneous application of several models); models' applicability domain (automatic detection of useless models).

"The most fundamental and lasting objective of synthesis is not production of new compounds but production of properties." George S. Hammond, Norris Award Lecture, 1968
