Chemoinformatics approaches to virtual screening and in silico design Alexandre Varnek Laboratoire d’Infochimie, Université de Strasbourg

Presentation transcript:

Strasbourg Paris

Laboratory of Chemoinformatics. Master's programme in Chemoinformatics (since 2002)

Chemoinformatics: a new discipline combining several "old" fields: chemical databases, QSAR, virtual screening, in silico design, ...

OUTLOOK: - Needs for chemoinformatics; - Fundamentals of chemoinformatics; - Some applications

Chemoinformatics: why

Chemical Databases: storage, organization and search of experimental data. Amount of information: many millions of compounds and reactions; many millions of publications

Database growth between May 2009 and a later September (table garbled in the transcript; surviving entries include 62,105,511 records and increments of +2 M and +22 M)

Problem: Flood of Information. > 54 million compounds; > 5 million new compounds / year; 800,000 publications / year => can anyone read publications / day? Chemical information should be well organized and searchable.

Problem: Not Enough Information. > 54,000,000 chemical compounds, but only > 500,000 3D structures in the Cambridge Crystallographic File (about 1 % of all compounds) and 230,000 infrared spectra in the largest database (Bio-Rad; 0.4 % of all compounds). What about physico-chemical and biological properties? The goal of chemoinformatics is to develop predictive approaches and tools.

Chemoinformatics as a modeling discipline

Chemoinformatics as a modeling discipline. What structure do I need for a certain property? -> structure-activity relationships. How do I make this structure? -> synthesis design. What is the product of my reaction? -> reaction prediction, structure elucidation.

Theoretical chemistry: Quantum Chemistry; Force Field Molecular Modelling; Chemoinformatics. Compared by: - Molecular model; - Basic concepts; - Major applications; - Learning approaches

Molecular model. Quantum Chemistry: electrons and nuclei. Force Field Molecular Modelling: atoms and bonds. Chemoinformatics: molecular graph, descriptor vector.

Basic mathematical approaches. Quantum Chemistry: Schrödinger equation, HF, DFT, ... Force Field Molecular Modelling: classical mechanics, statistical mechanics. Chemoinformatics: graph theory, statistical learning theory.

Basic concepts. Quantum Chemistry: wave/particle dualism. Force Field Molecular Modelling: classical mechanics. Chemoinformatics: chemical space.

Chemical space = objects + metrics. Objects: - molecular graphs; - descriptor vectors {D_i} = f(structure). Metrics: - graph hierarchy; - similarity measures.
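A similarity measure turns a set of objects into a space with a metric. The sketch below assumes fingerprints are stored as Python sets of "on" bit positions and uses the Tanimoto coefficient as the similarity measure; the two fingerprints are hypothetical and the slide does not name a specific measure.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints (bit positions that are switched on).
mol_1 = {1, 4, 7, 9}
mol_2 = {1, 4, 8, 9, 12}

similarity = tanimoto(mol_1, mol_2)
distance = 1.0 - similarity  # a simple dissimilarity derived from the coefficient
print(similarity, distance)
```

Three bits are shared out of six distinct bits, so the similarity here is 0.5.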

Navigation in Chemical Space: topological space of chemical structures. Relationships between the objects: hierarchical scaffold-tree approach; structural mutation rules; network-like similarity graphs; combinatorial analog graphs; ... Purposes: rational organisation of structural data; exploration of the chemical space; identification of new objects (e.g., active scaffolds, R-group combinations, etc.).

Navigation in Chemical Space: vectorial space defined by molecular descriptors. Relationships between the objects: in this space, each molecule is represented as a vector, whereas the metric is defined by similarity measures. In properly selected spaces, neighboring molecules possess similar properties. Different databases can be compared, and compound subsets for screening can be rationally selected.
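As an illustration of neighbourhood in a descriptor space, the following sketch ranks library molecules by Euclidean distance to a query vector. The molecule names, the three-component descriptor vectors and the choice of Euclidean distance are all assumptions made for the example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbours(query, library, k=2):
    """Return the names of the k library molecules closest to the query vector."""
    ranked = sorted(library.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical library: molecule name -> descriptor vector.
library = {
    "mol_A": (1.0, 0.2, 3.1),
    "mol_B": (0.9, 0.3, 3.0),
    "mol_C": (5.0, 2.2, 0.4),
    "mol_D": (1.5, 0.1, 2.5),
}
query = (1.0, 0.25, 3.05)

print(nearest_neighbours(query, library, k=2))
```

In practice descriptors would usually be scaled before computing distances, so that no single descriptor dominates the metric.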

Example: Hansch analysis. Physicochemical parameters can be broadly classified into three general types: electronic (σ), steric (E_s) and hydrophobic (log P). Biological activity = f(physicochemical parameters) + constant: $\log(1/C) = a(\log P)^2 + b\log P + \rho\sigma + \delta E_s + \mathrm{const}$
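A minimal sketch of how a Hansch-type equation can be evaluated, assuming the usual form in which the electronic term is rho*sigma and the steric term is delta*E_s. All coefficient values below are hypothetical; real coefficients come from fitting to measured activities.

```python
def hansch_log_inv_c(log_p, sigma, e_s, a, b, rho, delta, const):
    """log(1/C) = a*(log P)^2 + b*log P + rho*sigma + delta*E_s + const."""
    return a * log_p ** 2 + b * log_p + rho * sigma + delta * e_s + const


def optimal_log_p(a, b):
    """Vertex of the parabola in log P; a maximum only exists when a < 0."""
    if a >= 0:
        raise ValueError("no activity maximum in log P unless a is negative")
    return -b / (2 * a)


# Hypothetical coefficients, for illustration only.
coeffs = dict(a=-0.5, b=2.0, rho=1.0, delta=0.5, const=3.0)

best_log_p = optimal_log_p(coeffs["a"], coeffs["b"])
activity = hansch_log_inv_c(log_p=best_log_p, sigma=0.2, e_s=-1.0, **coeffs)
print(best_log_p, activity)
```

With a negative quadratic coefficient the model predicts an optimal hydrophobicity, which is the main practical use of the parabolic log P term.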

Molecular Descriptors. More than 4000 types of descriptors are known: - Constitutional (mol. weight, the number of S, N or O atoms, ...); - Topological (Randic index, informational content, ...); - Geometrical (molecular size, distances between functional groups, ...); - Electrostatic (electrostatic potential, charges, ...); - Charged Partial Surface Area; - Quantum-chemical (energies of molecular orbitals, reactivity indices, ...); - Thermodynamical (heat of formation, logP, ...); - Fragments (sequences of atoms and bonds, augmented atoms, ...).
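Constitutional descriptors are the simplest to compute because they need only the atom counts. The sketch below assumes a molecule is given as a dictionary of element counts; the small atomic-mass table and the descriptor names are choices made for this example.

```python
# Standard atomic masses (g/mol) for a few common elements.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def constitutional_descriptors(atom_counts):
    """Compute a few constitutional descriptors from element counts."""
    mol_weight = sum(ATOMIC_MASS[el] * n for el, n in atom_counts.items())
    return {
        "mol_weight": round(mol_weight, 3),
        "n_S": atom_counts.get("S", 0),
        "n_N": atom_counts.get("N", 0),
        "n_O": atom_counts.get("O", 0),
        "n_heavy": sum(n for el, n in atom_counts.items() if el != "H"),
    }


# Ethanol, C2H6O.
ethanol = {"C": 2, "H": 6, "O": 1}
descriptors = constitutional_descriptors(ethanol)
print(descriptors)
```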

Learning approach. Quantum Chemistry: deductive >> inductive. Force Field Molecular Modelling: deductive ≈ inductive. Chemoinformatics: deductive << inductive.

Learning approach. In chemoinformatics the logic of learning is not based on existing physical theories. Chemoinformatics considers the world too complex to be described a priori by any set of rules. Thus, the rules (models) in chemoinformatics are not explicitly taken from rigorous physical models, but learned inductively from the data.

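Chemoinformatics: From Data to Knowledge. Measurement or calculation yields data; data placed in context become information; generalization of information yields knowledge. Inductive learning leads from data to knowledge; deductive learning leads from knowledge back to data.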

Models. In chemoinformatics, a model is an ensemble of rules or a mathematical equation linking a given property (activity) to the molecular structure: PROPERTY = f(structure). Two main types of models: - binary classification (SAR); - regression (QSAR).
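The simplest regression model of the form PROPERTY = f(structure) is a straight line fitted to one descriptor. The sketch below assumes ordinary least squares on a single descriptor and derives a binary class by thresholding the predicted value; the data and the threshold are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(descriptor, slope, intercept):
    """Regression (QSAR-style) prediction for one descriptor value."""
    return slope * descriptor + intercept


def classify(descriptor, slope, intercept, threshold):
    """Binary (SAR-style) label obtained by thresholding the predicted property."""
    return predict(descriptor, slope, intercept) >= threshold


# Hypothetical training data: one descriptor value and one measured property per molecule.
descriptor_values = [1.0, 2.0, 3.0, 4.0]
property_values = [2.1, 3.9, 6.1, 7.9]

slope, intercept = fit_line(descriptor_values, property_values)
print(slope, intercept, classify(3.5, slope, intercept, threshold=6.0))
```

Real QSAR models use many descriptors and need validation on molecules that were not used for fitting.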

Organic chemistry: an exercise in "intuitive" chemoinformatics

Extraction of rules from the data. The Markovnikov Rule: when a Brønsted acid, HX, adds to an unsymmetrically substituted double bond, the acidic hydrogen of the acid bonds to that carbon of the double bond that has the greater number of hydrogen atoms already attached to it.
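Once extracted, a rule of this kind can be written as a tiny decision function. The sketch below is only an encoding of the rule as stated, assuming the two carbons of the double bond are labelled 1 and 2 and described by their hydrogen counts; it makes no attempt to cover exceptions.

```python
def markovnikov_h_site(h_on_c1, h_on_c2):
    """Return 1 or 2 for the carbon that receives H of HX, or None if the rule cannot decide."""
    if h_on_c1 == h_on_c2:
        return None  # equal hydrogen counts: the rule makes no distinction
    return 1 if h_on_c1 > h_on_c2 else 2


# Propene, CH2=CH-CH3: carbon 1 carries two hydrogens, carbon 2 carries one.
site = markovnikov_h_site(2, 1)
print(site)  # H adds to carbon 1, so X ends up on carbon 2
```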

Major applications: chemical databases, virtual screening, in silico design. Structure-activity models rely on machine-learning approaches: - MLR; - decision trees; - artificial neural networks; - support vector machines; - ... Algorithms for organising and searching the data: - fingerprints; - graph theory; - similarity measures.

Chemoinformatics: some applications

Discoverer of the Periodic Table: an early "Chemoinformatician". Dmitry Mendeleev (1834 – 1907), Russian chemist who arranged the 63 known elements into a periodic table based on atomic mass, which he published in Principles of Chemistry. Mendeleev left space for new elements, and predicted three yet-to-be-discovered elements: Ga (1875), Sc (1879) and Ge (1886).

Periodic Table: chemical properties of elements vary gradually along the two axes

Target protein. Large libraries of molecules. High-Throughput Screening (experiment): hit. Virtual Screening (computations): small library of selected hits.

Chemical universe: > 50 M compounds are currently available; druglike molecules could be synthesised. Human proteome: peptides. Virtual screening is indispensable for analysing the huge number of protein-ligand combinations. Virtual screening must be very fast and efficient!

Virtual screening "funnel". CHEMICAL DATABASE (~10^6 – 10^9 molecules) -> VIRTUAL SCREENING (similarity search, filters, (Q)SAR, docking, pharmacophore models) -> HITS (~10^1 – 10^3 molecules); the remaining molecules are discarded as INACTIVES.
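The funnel can be sketched as a sequence of increasingly selective stages applied to a shrinking list of molecules. In the example below the stage order, the record fields (`mol_weight`, `similarity`, `predicted_activity`) and every cutoff value are hypothetical; the slide does not prescribe an order or thresholds.

```python
def run_funnel(molecules, stages):
    """Apply each (name, keep) stage in turn and record how many molecules survive."""
    remaining = list(molecules)
    report = [("database", len(remaining))]
    for name, keep in stages:
        remaining = [mol for mol in remaining if keep(mol)]
        report.append((name, len(remaining)))
    return remaining, report


# Hypothetical database records with precomputed values.
database = [
    {"name": "m1", "mol_weight": 320, "similarity": 0.80, "predicted_activity": 7.1},
    {"name": "m2", "mol_weight": 610, "similarity": 0.90, "predicted_activity": 8.0},
    {"name": "m3", "mol_weight": 450, "similarity": 0.40, "predicted_activity": 6.5},
    {"name": "m4", "mol_weight": 280, "similarity": 0.70, "predicted_activity": 5.2},
    {"name": "m5", "mol_weight": 390, "similarity": 0.65, "predicted_activity": 6.4},
]

# Cheap stages first, costlier ones later (cutoffs are illustrative assumptions).
stages = [
    ("filter", lambda mol: mol["mol_weight"] <= 500),
    ("similarity", lambda mol: mol["similarity"] >= 0.6),
    ("qsar", lambda mol: mol["predicted_activity"] >= 6.0),
]

hits, report = run_funnel(database, stages)
print([mol["name"] for mol in hits], report)
```

Ordering the stages from cheapest to most expensive keeps the costly steps, such as docking, for the few molecules that survive the early filters.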

REACH regulation. The European Union adopted the Regulation on the Registration, Evaluation, Authorisation and Restriction of Chemicals (the "REACH Regulation"), which entered into force on June 1. REACH requires information on physico-chemical, toxicological and eco-toxicological parameters for chemicals whose production exceeds 1 ton. More than compounds must be tested. Total cost estimated (EU Commission) over a year period is €2.8 - €5.2 bn. No Data, No Market!

Chemoinformatics tools in SciFinder: predictions of > 20 physico-chemical properties and NMR spectra for each individual compound

Drug design

Virtual screening: success stories & drugs. "Virtual screening - what does it give us?", Herbert Koppen (Boehringer, Germany), Current Opinion Drug Discovery & Dev. (2009) 12: ; "From virtuality to reality", Ulrich Rester (Bayer, Germany), Current Opinion Drug Discovery & Dev. (2008) 11: ; "What has virtual screening ever done for drug discovery?", David E Clark (Argenta Discovery Ltd, UK), Expert Opinion on Drug Discovery (2008) 8:

In silico screening: success stories & drugs. Market: tirofiban (1999). Aggrastat (trade name) from Merck, a GP IIb/IIIa antagonist (myocardial infarction; it is an anticoagulant). (2S)-2-(butylsulfonylamino)-3-[4-[4-(4-piperidyl)butoxy]phenyl]propanoic acid (Mol. Mass: g/mol). PK data: bioavailability: IV only (intravenous only); half-life: 2 hours. Combined with heparin and aspirin, but with numerous precautions.

Materials design

Ionic Liquids. Ionic liquids are composed of large organic cations (structures shown on the slide) and anions: PF6-, Cl-, BF4-, CF3SO3-, [(CF3SO2)2N]-

Ionic Liquids. There exist 10^18 combinations of ions that could lead to useful ionic liquids. Large organic cations (structures shown on the slide); anions: PF6-, Cl-, BF4-, CF3SO3-, [(CF3SO2)2N]-

Viscosity predictions on 23 new ILs (Solvionics company). None of these ionic liquids was used for model preparation.

Ionic liquids viscosity: experimental validation of the neural network models. Plot of experimental (exp) vs predicted (pred) viscosity, RMSE = 73 cP. The prediction error (~70 cP) is similar to the "noise" in the experimental data used for the training of the model. G. Marcou, I. Billard, A. Ouadi and A. Varnek, submitted
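The root-mean-square error quoted for the validation set is the square root of the mean squared difference between experimental and predicted values. A minimal sketch, with hypothetical viscosity values that are not taken from the study:

```python
import math


def rmse(experimental, predicted):
    """Root-mean-square error between paired experimental and predicted values."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - p) ** 2 for e, p in zip(experimental, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical viscosities in cP (illustration only).
exp_values = [100.0, 200.0, 300.0, 400.0]
pred_values = [130.0, 160.0, 300.0, 400.0]

error = rmse(exp_values, pred_values)
print(error)
```

Because the differences are squared before averaging, a few large errors weigh more heavily in the RMSE than many small ones.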

Metabolites prediction

Prediction of aromatic hydroxylation sites for human CYP1A2 substrates. Task: identify the potential hydroxylation sites of a substrate. Method: SVM + descriptors derived from condensed graphs of reaction. The obtained model correctly predicts the hydroxylation products with a probability of ≈80% (see poster of C. Muller).

Reaction conditions

Search for optimal reaction conditions. Reaction query: substrate + H2. Potential products of the reaction: A, B and C. Compound A is the target.

Experimental validation of the conditions suggested by the program. Reaction: Sub + H2 -> A

No. | Catalyst       | Solvent | Additive    | Yield (exp.)
1   | Pt/C (10%)     | THF     | None        | A: 98 %
2   | Pt/C (10%)     | DMF     | None        | A: 90 %, Sub: 2 %
3   | Ir/CaCO3 (5%)  | EtOH    | NEt3 (5 %)  | A: 100 %
4   | Ir/CaCO3 (5%)  | Hexane  | None        | INSOLUBLE
5   | Ir/CaCO3 (5%)  | DMF     | None        | A: 27 %, Sub: 69 %

A. Varnek, in "Chemoinformatics and Computational Chemical Biology", J. Bajorath, Ed., Springer, 2010
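The experimental outcomes listed above can be compared programmatically. The sketch below stores the five reported runs as records (the field names are hypothetical, and the insoluble run is encoded as a missing yield) and selects the conditions giving the highest yield of the target A.

```python
# The five validation runs; yield_a is the reported yield of A in percent.
results = [
    {"entry": 1, "catalyst": "Pt/C (10%)", "solvent": "THF", "additive": None, "yield_a": 98},
    {"entry": 2, "catalyst": "Pt/C (10%)", "solvent": "DMF", "additive": None, "yield_a": 90},
    {"entry": 3, "catalyst": "Ir/CaCO3 (5%)", "solvent": "EtOH", "additive": "NEt3 (5 %)", "yield_a": 100},
    {"entry": 4, "catalyst": "Ir/CaCO3 (5%)", "solvent": "Hexane", "additive": None, "yield_a": None},  # insoluble
    {"entry": 5, "catalyst": "Ir/CaCO3 (5%)", "solvent": "DMF", "additive": None, "yield_a": 27},
]


def best_conditions(rows):
    """Return the run with the highest yield of A, ignoring runs without a yield."""
    usable = [row for row in rows if row["yield_a"] is not None]
    return max(usable, key=lambda row: row["yield_a"])


best = best_conditions(results)
print(best["entry"], best["catalyst"], best["solvent"], best["yield_a"])
```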

Joseph Louis Gay-Lussac, Mémoires de la Société d’Arcueil 2:207 (1808): « We are perhaps not far removed from the time when we shall be able to submit the bulk of chemical phenomena to calculation »

Visit our website :