Faculté de Chimie, ULP, Strasbourg, FRANCE

Master Chemoinfo: Virtual Screening (Criblage virtuel). Alexandre Varnek, Faculté de Chimie, ULP, Strasbourg, FRANCE

Two routes from large libraries of molecules to a small library of selected hits for a target protein: experimental (High-Throughput Screening) and computational (Virtual Screening: filtering, QSAR, docking).

Virtual screening must be fast and reliable. Chemical universe: $10^{200}$ molecules; $10^{60}$ druglike molecules. Molecules are considered as vectors in a multidimensional chemical space defined by the descriptors.

Drug discovery pipeline: genomics, target, high-throughput screening (HTS), hits, data analysis, lead, optimisation, development candidate.

Drug discovery and ADME/Tox studies should be performed in parallel. Discovery stages: idea → target → combichem/HTS → hit → lead → candidate → drug, with ADME/Tox studies running alongside.

Methodologies of virtual screening (from A.R. Leach, V.J. Gillet, “An Introduction to Chemoinformatics”, Kluwer Academic Publishers, 2003)

Platform for ligand-based virtual screening: ~$10^6$–$10^9$ molecules → filters → similarity search → ~$10^3$–$10^4$ molecules → QSAR models → candidates for docking or experimental tests.

High-throughput screening (HTS). Keywords: combinatorial chemistry; high-throughput screening (HTS); virtual screening; drug-likeness; training sets of up to 1,000,000 compounds.

Virtual screening: molecules available for screening. (1) Real molecules: 1–2 million in the in-house archives of large pharma and agrochemical companies; 3–4 million samples available commercially. (2) Hypothetical molecules: virtual combinatorial libraries (up to $10^{60}$ molecules).

Methods of virtual high-throughput screening: filters; similarity search; classification and regression structure–property models; docking.

Filters to estimate “drug-likeness”

Lipinski rules for intestinal absorption (« Rule of 5 »): H-bond donors ≤ 5 (the sum of OH and NH groups); MWT ≤ 500; LogP ≤ 5; H-bond acceptors ≤ 10 (the sum of N and O atoms without H attached).
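
The four cut-offs above can be applied as a simple filter. The following is a minimal sketch, not a reference implementation: it assumes the descriptors have already been computed by some external tool, it treats a value exactly at a cut-off as acceptable, and the `Descriptors` class, the function names and the `max_violations` parameter are hypothetical names introduced for illustration. The example values are the ones quoted for paclitaxel elsewhere in this presentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (names are illustrative)."""
    mol_weight: float   # molecular weight (MWT)
    log_p: float        # octanol/water partition coefficient
    h_donors: int       # sum of OH and NH groups
    h_acceptors: int    # N and O atoms counted as acceptors


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return the names of the rule-of-five criteria that the molecule breaks."""
    violations = []
    if d.h_donors > 5:
        violations.append("H-bond donors")
    if d.mol_weight > 500:
        violations.append("MWT")
    if d.log_p > 5:
        violations.append("LogP")
    if d.h_acceptors > 10:
        violations.append("H-bond acceptors")
    return violations


def passes_filter(d: Descriptors, max_violations: int = 0) -> bool:
    """Keep a molecule if it breaks at most `max_violations` criteria.

    The tolerance is an assumption; the slide does not state how many
    violations are acceptable.
    """
    return len(lipinski_violations(d)) <= max_violations


# Values quoted in this presentation for paclitaxel (Taxol).
paclitaxel = Descriptors(mol_weight=837, log_p=4.49, h_donors=3, h_acceptors=15)
print(lipinski_violations(paclitaxel))   # two criteria are broken
print(passes_filter(paclitaxel))
```

Returning the list of broken criteria rather than a bare yes/no makes it easy to report why a compound was removed from the library.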

Lipinski rules for drug-like molecules (« Rules of 5 »)

Example of different filters: rules for absorbable compounds. It is instructive to compare our permeability model with Lipinski’s and Veber’s rules. All three models are described by similar parameters. The following table shows the maximum cut-off values for absorbable compounds in our data set; the cut-off values of Lipinski’s and Veber’s rules are shown in bold. Comparing the three columns shows that in most cases the cut-off values of Lipinski’s and Veber’s rules are exceeded by 100 percent. This observation has two explanations. First, the three models deal with quite different biological phenomena: Lipinski analyzed compounds that reached the second phase of clinical trials, Veber analyzed oral bioavailability in rats (which is affected by metabolism to a much greater extent than HIA), whereas we analyzed HIA (human intestinal absorption). Second, the models were derived using quite different analytical tools: we used C-SAR analysis, which automatically considers a large variety of possible causes of poor permeability, whereas Lipinski and Veber used conventional data-mining techniques.

Remove compounds containing too many rings

Remove compounds with toxic groups

Remove compounds with reactive groups

Remove False-Positive Hits

Remove poorly soluble compounds

Filter on inorganic and heteroatom compounds

Remove compounds with multiple chiral centers

Paclitaxel (Taxol): violation of 2 rules. MW = 837; logP = 4.49; HD = 3; HA = 15.

logD vs logP. 95% of all drugs are ionizable: 75% are bases and 20% are acids. Using the pH-dependent log D instead of log P as the lipophilicity descriptor significantly increases the number of compounds correctly identified as drug-like by the drug-likeness filter: log D at pH 5.5 below 5 ($\log D_{5.5} < 5$). S. K. Bhal, K. Kassam, I. G. Peirson, and G. M. Pearl, “The Rule of Five Revisited: Applying Log D in Place of Log P in Drug-Likeness Filters”, Molecular Pharmaceutics, 4, 556-560 (2007).

Synthetic accessibility is proportional to a fragment’s occurrence in the PubChem database. Ertl and Schuffenhauer, Journal of Cheminformatics 2009, 1:8

Synthetic Accessibility: frequency distribution of fragments. Altogether 605,864 different fragment types have been obtained by fragmenting the PubChem structures. Most of them (51%), however, are singletons (present only once in the whole set). Only a relatively small number of fragments, namely 3759 (0.62%), are frequent (i.e. present more than 1000 times in the database). Ertl and Schuffenhauer, Journal of Cheminformatics 2009, 1:8

Synthetic Accessibility The most common fragments present in the million PubChem molecules. The "A" represents any non-hydrogen atom, "dashed" double bond indicates an aromatic bond and the yellow circle marks the central atom of the fragment. Ertl and Schuffenhauer Journal of Cheminformatics 2009 1:8

Synthetic Accessibility. Distribution of −SAscore for natural products, bioactive molecules and molecules from catalogues. Correlation of the calculated −SAscore with the average chemist estimation for 40 molecules ($r^2 = 0.890$). Ertl and Schuffenhauer, Journal of Cheminformatics 2009, 1:8

Similarity Search: unsupervised and supervised approaches

2D (unsupervised) similarity search with the Tanimoto coefficient: similarity search by comparison of structural keys (molecular fingerprints). Example fingerprints: 1 0 1 0 0 0 1 0 0 1 1 1 0 1 1 0 1 0 1 and 0 0 1 0 0 0 1 0 0 1 1 1 0 1 1 0 1 0 1
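
As a minimal sketch of the comparison step, the Tanimoto coefficient of two binary fingerprints is the number of bits set in both divided by the number of bits set in at least one. The function name `tanimoto` is illustrative, and reading the bit row on the slide as two 19-bit fingerprints is an assumption about the original figure.

```python
from typing import Sequence


def tanimoto(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """Tanimoto coefficient c / (a + b - c) for two equal-length bit vectors."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    common = sum(1 for x, y in zip(fp_a, fp_b) if x and y)   # bits set in both
    either = sum(1 for x, y in zip(fp_a, fp_b) if x or y)    # bits set in at least one
    # Convention chosen here: two all-zero fingerprints get similarity 0.0.
    return common / either if either else 0.0


# The bit row from the slide, split into two 19-bit fingerprints (assumed split).
query     = [1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1]
candidate = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1]

print(tanimoto(query, candidate))  # 9 shared bits out of 10 set in either -> 0.9
```

In a database search the same function would be applied to the query against every stored fingerprint, and compounds would be ranked by decreasing coefficient.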

Continuous and Discontinuous SAR

Structural spectrum of thrombin inhibitors: structural similarity to the reference compounds “fading away” (similarity values shown in the figure: 0.56, 0.72, 0.53, 0.84, 0.67, 0.52, 0.82, 0.64, 0.39).

Continuous SARs: gradual changes in structure result in moderate changes in activity (“rolling hills”, G. Maggiora). Discontinuous SARs: small changes in structure have dramatic effects on activity (“cliffs” in activity landscapes). Structure-Activity Landscape Index: $\mathrm{SALI}_{ij} = \Delta A_{ij} / \Delta S_{ij}$, where $\Delta A_{ij}$ ($\Delta S_{ij}$) is the difference between the activities (similarities) of molecules i and j. R. Guha et al., J. Chem. Inf. Model., 2008, 48, 646
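
A worked sketch of the index may help. It assumes that the structural term in the denominator is the dissimilarity 1 − sim(i, j), with sim a Tanimoto similarity, and that activities are compared on a logarithmic (pIC50) scale; both choices, the function names, and the handling of identical fingerprints are assumptions made for illustration. The 6 nM / 2390 nM pair with a MACCS Tanimoto similarity of 1.00 is the VEGFR-2 example quoted in this presentation.

```python
import math


def pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nM to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_nm * 1e-9)


def sali(activity_i: float, activity_j: float, similarity: float) -> float:
    """Structure-Activity Landscape Index for one pair of molecules.

    Uses |A_i - A_j| / (1 - similarity); an identical-fingerprint pair with
    different activities is reported as an infinitely steep cliff.
    """
    delta_a = abs(activity_i - activity_j)
    delta_s = 1.0 - similarity
    if delta_s <= 0.0:
        return math.inf if delta_a > 0 else 0.0  # convention chosen here
    return delta_a / delta_s


# Smooth region: similar structures, similar activities (hypothetical values).
print(sali(7.0, 6.8, 0.8))

# Activity cliff: the analog pair with Tc = 1.00 and 6 nM vs 2390 nM.
print(sali(pic50(6), pic50(2390), 1.00))
```

Large SALI values flag pairs where a small structural change produces a large activity change, which is exactly where similarity-based predictions are least reliable.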

Discontinuous SARs: VEGFR-2 tyrosine kinase inhibitors. Two analogs with MACCS Tc = 1.00 have activities of 6 nM and 2390 nM. Small changes in structure have dramatic effects on activity (“cliffs” in activity landscapes; lead optimization, QSAR): bad news for molecular similarity analysis...

Example of a “classical” discontinuous SAR: adenosine deaminase inhibitors. Any similarity method must recognize these compounds as being “similar” ... (MACCS Tanimoto similarity)

Supervised Molecular Similarity Analysis

Dynamic Mapping of Consensus Positions (DMC). Prototypic “mapping algorithm” for simplified binary-transformed* descriptor spaces. Uses known active compounds to create activity-dependent consensus positions in chemical space. Operates in descriptor spaces of step-wise increasing dimensionality (“dimension extension”). Selects preferred descriptors from large pools. * Median-based, i.e. assign “1” to a descriptor if its value is greater than (or equal to) its screening database median; assign “0” if it is smaller. Godden et al. & Bajorath, J. Chem. Inf. Comput. Sci. 44, 21 (2004)

DMC algorithm: (1) calculate and binary transform descriptors; (2) compare descriptor bit strings of reference molecules and determine consensus bits; (3) select DB compounds matching consensus bits; (4) re-generate bit strings permitting bit variability; (5) select DB compounds matching extended bit strings; (6) repeat until a small selection set is obtained. Consensus bit string: = 1.0 or = 0.0 (no variability). 1st dimension extension: ≥ 0.9 or ≤ 0.1 (10% variability). 2nd dimension extension: ≥ 0.8 or ≤ 0.2 (20% variability). With e.g. 0%, 10%, 20% permitted bit variability: longer bit strings, fewer matching DB compounds. Figure legend: white “0”, black “1”, gray variably set bits.
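
The steps above can be sketched in a few lines. This is an illustrative reading of the slide, not the published implementation: the function names, the small tolerance used when comparing bit frequencies, the `target_size` stopping criterion and the toy bit strings are all assumptions.

```python
from typing import Dict, List, Sequence

EPS = 1e-9  # tolerance for floating-point comparison of bit frequencies


def binarize(values: Sequence[float], medians: Sequence[float]) -> List[int]:
    """Median-based transform: 1 if value >= database median, else 0."""
    return [1 if v >= m else 0 for v, m in zip(values, medians)]


def consensus_bits(ref_bits: Sequence[Sequence[int]], variability: float) -> Dict[int, int]:
    """Positions whose bit is (nearly) conserved among the reference molecules.

    A position is kept as 1 if its frequency of 1s is >= 1 - variability,
    and as 0 if that frequency is <= variability.
    """
    n_refs = len(ref_bits)
    consensus = {}
    for pos in range(len(ref_bits[0])):
        freq = sum(bits[pos] for bits in ref_bits) / n_refs
        if freq >= 1.0 - variability - EPS:
            consensus[pos] = 1
        elif freq <= variability + EPS:
            consensus[pos] = 0
    return consensus


def dmc_select(db_bits, ref_bits, variabilities=(0.0, 0.1, 0.2), target_size=1):
    """Iteratively shrink the database selection by dimension extension."""
    selected = list(range(len(db_bits)))
    for variability in variabilities:
        consensus = consensus_bits(ref_bits, variability)
        selected = [i for i in selected
                    if all(db_bits[i][p] == b for p, b in consensus.items())]
        if len(selected) <= target_size:
            break
    return selected


# Toy data: five reference (active) molecules and six database compounds.
references = [[1, 1, 0, 1], [1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 0, 1]]
database = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 1, 0, 1],
            [1, 1, 1, 1], [1, 0, 0, 0], [1, 1, 0, 0]]

print(consensus_bits(references, 0.0))   # only fully conserved positions
print(consensus_bits(references, 0.2))   # longer consensus string
print(dmc_select(database, references))  # indices of the remaining compounds
```

In the toy data the strict consensus fixes two positions and keeps four database compounds, while allowing 20% variability fixes all four positions and leaves a single match, which mirrors the “longer bit strings, fewer matching DB compounds” behaviour described above.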

QSAR/QSPR models

Screening and hit selection: the database is passed through virtual screening with a QSPR model, which separates hits (sent to experimental tests) from useless compounds.

Libraries profiling: indexing a database by simultaneous assessment of various activities Example: PASS software (Prediction of Activity Spectra for Substances)

For each fragment i

PASS: Naïve Bayes estimator. Calculation of P(act) and P(inact). A molecule is considered active if P(act) > P(inact) or/and P(act) > 0.7.

Quantitative Structure-Property Relationships (QSPR): Y = f(Structure) = f(descriptors). QSPR gives reliable predictions only for compounds that are similar to those used to obtain the models. Similarity and pharmacophore search approaches therefore remain indispensable as complementary tools.

Combinatorial Library Design

Virtual screening when the target structure is unknown. Workflow elements: virtual library; diverse subset; screening library; parallel synthesis or synthesis of single compounds; screening (HTS); hits; design of focused library.

Generation of virtual combinatorial libraries: fragment marking approach (Markush structure). If R1, R2, R3 = [substituents shown in the figure], then [enumerated structures shown in the figure].

The types of variation in Markush structures: substituent variation (R1 = Me, Et, Pr); position variation (R2 = NH2); frequency variation (n = 1 – 3); homology variation (R3 = alkyl or heterocycle; only for patent search).
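
Enumerating a virtual library from a Markush structure amounts to taking the Cartesian product of the allowed values at each variable site. The sketch below covers only substituent variation (R1 = Me, Et, Pr) and frequency variation (n = 1 – 3); the scaffold R1–O–(CH2)n–phenyl, the SMILES template and the function name are hypothetical and chosen purely for illustration.

```python
from itertools import product
from typing import Dict, List


def enumerate_markush(template: str, variations: Dict[str, List[str]]) -> List[str]:
    """Fill every combination of site values into a string template."""
    sites = list(variations)
    library = []
    for combo in product(*(variations[site] for site in sites)):
        library.append(template.format(**dict(zip(sites, combo))))
    return library


# Hypothetical scaffold written as SMILES: R1-O-(CH2)n-phenyl.
template = "{r1}O{chain}c1ccccc1"
variations = {
    "r1": ["C", "CC", "CCC"],                    # substituent variation: Me, Et, Pr
    "chain": ["C" * n for n in range(1, 4)],     # frequency variation: n = 1..3
}

library = enumerate_markush(template, variations)
print(len(library))   # 3 substituents x 3 chain lengths
print(library[0])     # methyl ether with a one-carbon linker
```

The library size is the product of the numbers of options per site, which is why virtual combinatorial libraries grow so quickly and usually require subset selection before synthesis.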

Generation of virtual combinatorial libraries: reaction transform approach (from A.R. Leach, V.J. Gillet, “An Introduction to Chemoinformatics”, Kluwer Academic Publishers, 2003)

Issues and concepts in combinatorial library design: size of the library; coverage of properties (“chemical space”); diversity, similarity, redundancy; descriptor validation; subset selection from virtual libraries.

Hot topics in chemoinformatics: predictions vs interpretation; new approaches in structure-property modeling (descriptors, applicability domain, machine-learning methods such as inductive learning transfer, semi-supervised learning, ...); new techniques to mine chemical reactions; QSAR of complex systems (multi-component synergistic mixtures, new materials, metabolic pathways, ...); public availability of chemoinformatics tools.

Predictions vs interpretation Nathan BROWN “Chemoinformatics—An Introduction for Computer Scientists” ACM Computing Surveys, Vol. 41, No. 2, Article 8, February 2009

Predictions vs interpretation. Problems: ensemble modeling; non-linear machine-learning methods (SVM, NN, …); descriptor correlations. What do end users expect from QSAR models? Reliable estimation (prediction) of the given property.

Public accessibility of models: web-based platform for virtual screening

Some Screen Shots: Welcome Page…

ISIDA property prediction WEB server infochim.u-strasbg.fr/webserv/VSEngine.html

ISIDA ScreenDB tools (http://infochim.u-strasbg.fr/webserv/VSEngine.html): only an Internet browser is required; different descriptors (ISIDA fragments, FPT, ChemAxon); similarity search with different metrics (Tanimoto, Dice, …); ensemble modeling approach (simultaneous application of several models); models’ applicability domain (automatic detection of useless models).

“The most fundamental and lasting objective of synthesis is not production of new compounds but production of properties.” George S. Hammond, Norris Award Lecture, 1968