Chemoinformatics in Drug Design Irene Kouskoumvekaki, Associate Professor, Computational Chemical Biology, CBS, DTU-Systems Biology Biological Sequence Analysis, May 6, 2011
Computational Chemical Biology group Tudor Oprea Guest Professor Olivier Taboureau Associate Professor Irene Kouskoumvekaki Associate Professor Sonny Kim Nielsen PhD student Kasper Jensen PhD student Ulrik Plesner master student
Word cloud
Definition: Chemoinformatics. The gathering and systematic use of chemical information, and the application of this information to predict the behavior of unknown compounds in silico. (data → prediction)
Definition: A drug candidate is a compound (ligand) that binds to a biological target (protein, enzyme, receptor, ...) and in this way either initiates a process (agonist) or inhibits it (antagonist). The structure/conformation of the ligand is complementary to the space defined by the protein’s active site. The binding is caused by favorable interactions (electrostatic interactions, hydrogen bonds, hydrophobic contacts, ...) between the ligand and the side chains of the amino acids in the active site.
Drug Discovery (figure labels): Disease, Biological Target, Drug candidate, In vitro / In silico studies, Animal studies, Clinical studies
The Drug Discovery Process Chemoinformatics
The Drug Discovery Process. We know the structure of the biological target: MKTAALAPLFFLPSALATTVYLA GDSTMAKNGGGSGTNGWGEYL ASYLSATVVNDAVAGRSAR…(etc). We identify/predict the binding pocket. Challenge: to design an organic molecule that binds strongly enough to the biological target and modulates its activity. New drug candidate
Example: Alzheimer’s disease. What is it? Alzheimer's is a disease that causes failure of brain functions and dementia. It starts with poor memory and an inability to function in common everyday activities. How do you get it? Alzheimer's disease is the result of malfunctioning neurons in different parts of the brain. This, in turn, is due to an imbalance in the concentration of neurotransmitters.
Example: Alzheimer’s disease. How can we treat it? Acetylcholine (neurotransmitter). Drug against Alzheimer’s
Old School Drug discovery process: Screening collection (10^6 compounds) → HTS → Actives (10^3 actives) → Follow-up → Hits (1-10 hits) → Hit-to-lead → Lead series (0-3 lead series) → Lead-to-drug → Drug candidate (0-1) → Clinical trials. High rate of false positives!
Failures
Drug discovery in the 21st Century. In vitro: a diverse set of molecules is tested in the lab. In silico + in vitro: computational methods select subsets (to be tested in the lab) based on prediction of drug-likeness, solubility, binding, pharmacokinetics, toxicity, side effects, ... In silico results can be used to make in vivo methods more efficient. If there are several thousand compounds available for testing, in silico methods are used to identify those that are most likely to be active, and these take priority for screening.
The Lipinski ‘rule of five’ for drug-likeness prediction: Octanol-water partition coefficient (logP) ≤ 5; Molecular weight ≤ 500; Number of hydrogen bond acceptors (HBA) ≤ 10; Number of hydrogen bond donors (HBD) ≤ 5. If two or more of these rules are violated, the compound might have problems with oral bioavailability. (Lipinski et al., Adv. Drug Delivery Rev., 23, 1997, 3.)
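The four thresholds above can be applied as a simple filter. The following is a minimal sketch, assuming the four properties have already been computed or measured elsewhere; the `Properties` class, the function names, and the two example compounds (with made-up, illustrative values) are hypothetical and not part of the original slides.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed properties of one compound (hypothetical container)."""
    log_p: float       # octanol-water partition coefficient
    mol_weight: float  # molecular weight
    hba: int           # number of hydrogen bond acceptors
    hbd: int           # number of hydrogen bond donors


def lipinski_violations(p: Properties) -> list[str]:
    """Return the names of the rule-of-five criteria that the compound violates."""
    checks = {
        "logP <= 5": p.log_p <= 5,
        "MW <= 500": p.mol_weight <= 500,
        "HBA <= 10": p.hba <= 10,
        "HBD <= 5": p.hbd <= 5,
    }
    return [name for name, passed in checks.items() if not passed]


def may_have_bioavailability_problems(p: Properties) -> bool:
    """Flag a compound when two or more criteria are violated."""
    return len(lipinski_violations(p)) >= 2


# Illustrative values only, not measured data.
small_compound = Properties(log_p=1.2, mol_weight=180.2, hba=4, hbd=1)
large_lipophilic_compound = Properties(log_p=6.3, mol_weight=712.9, hba=9, hbd=3)

print(lipinski_violations(small_compound))
print(lipinski_violations(large_lipophilic_compound))
print(may_have_bioavailability_problems(large_lipophilic_compound))
```

In practice the property values would come from a descriptor calculator or a database; the filter itself is only the comparison step.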
Major Aspects of Chemoinformatics Experimental data Model generation Prediction for unknown compounds
Major Aspects of Chemoinformatics Information Acquisition and Management: Methods for collecting data (mainly experimental). Development of databases for storage and retrieval of information. Information Use: Data analysis, correlation and model building. Information Application: Prediction of molecular properties relevant to chemical and biochemical sciences.
Information Acquisition and Management
Small molecule databases One tricky thing when storing
Growth in PubChem Substances & Compounds. Recent count: Substance: 72,156,631; Compound: 28,807,320; Rule of 5: 20,692,980. The PubChem Compound database contains validated chemical depiction information that is provided to describe substances in PubChem Substance. The PubChem Substance database contains chemical structures, synonyms, registration IDs, descriptions, related URLs, database cross-reference links to PubMed, protein 3D structures, and biological screening results. If the contents of a chemical sample are known, the description includes links to PubChem Compound.
Searching in PubChem
Structural representation of molecules
Information Use
Beyond the Lipinski Rule of 5... Chemometrics: the application of mathematical or statistical methods to chemical data (simple, linear methods), e.g. Principal Component Analysis. Machine Learning: the design and development of algorithms and techniques that allow computers to learn (complex, non-linear algorithms), e.g. Artificial Neural Networks, K-means clustering.
Information Application
Prediction of Solubility, ADME & Toxicity. Solid drug → (dissolution) → Drug in solution → (membrane transfer) → Absorbed drug → (liver extraction) → Systemic circulation. Properties to predict along the way: Solubility, Absorption, Metabolism.
Prediction of biological activity/selectivity
Prediction models at CBS
Virtual screening Computational techniques for a rapid assessment of large libraries of chemical structures in order to guide the selection of likely drug candidates. Exploit knowledge of the active ligand molecule or the protein target.
Virtual Screening Flavors: 1D filters (e.g. Lipinski's Rule of Five); LIGAND-BASED; TARGET-BASED
Molecular similarity in Chemical Space. Similar Property Principle: molecules having similar structures and properties are expected to exhibit similar biological activity. (Not always true!) Thus, molecules that are located closely together in the chemical space are often considered to be functionally related.
Ligand-based VS: Fingerprints. A widely used similarity search tool. A fingerprint consists of descriptors encoded as bit strings. Bit strings of query and database are compared using a similarity metric such as the Tanimoto coefficient. MACCS fingerprints: 166 structural keys that answer questions of the type: Is there a ring of size 4? Is at least one F, Br, Cl, or I present? The answer is either TRUE (1) or FALSE (0).
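The idea of structural keys can be sketched with a toy fingerprint. This is an illustration only: the five keys below are hypothetical and operate on element counts from a molecular formula, whereas real MACCS keys are evaluated on the full molecular graph (a question such as "Is there a ring of size 4?" cannot be answered from a formula).

```python
# Each key is a (question, test) pair; the test returns True or False.
# These keys are made up for illustration and are NOT the MACCS key set.
HALOGENS = ("F", "Cl", "Br", "I")

STRUCTURAL_KEYS = [
    ("Is at least one F, Br, Cl, or I present?",
     lambda counts: any(counts.get(element, 0) > 0 for element in HALOGENS)),
    ("Is at least one N present?", lambda counts: counts.get("N", 0) > 0),
    ("Is at least one O present?", lambda counts: counts.get("O", 0) > 0),
    ("Is at least one S present?", lambda counts: counts.get("S", 0) > 0),
    ("Are there more than 10 C atoms?", lambda counts: counts.get("C", 0) > 10),
]


def fingerprint(element_counts: dict[str, int]) -> list[int]:
    """Encode a molecule as a bit list: 1 if the key's answer is TRUE, else 0."""
    return [1 if test(element_counts) else 0 for _, test in STRUCTURAL_KEYS]


chlorobenzene = {"C": 6, "H": 5, "Cl": 1}  # C6H5Cl
ethanol = {"C": 2, "H": 6, "O": 1}         # C2H6O

print(fingerprint(chlorobenzene))  # [1, 0, 0, 0, 0]
print(fingerprint(ethanol))        # [0, 0, 1, 0, 0]
```

Because every molecule is answered against the same ordered list of keys, the resulting bit strings have equal length and can be compared position by position.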
Tanimoto Similarity or 90% similarity
Tanimoto Similarity
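For two bit strings A and B, the Tanimoto coefficient is T = c / (a + b - c), where a and b are the numbers of bits set in A and B, and c is the number of bits set in both. A minimal sketch follows; the example fingerprints are made up, and returning 0.0 when both fingerprints are empty is a convention chosen here, not something stated in the slides.

```python
def tanimoto(fp_a: list[int], fp_b: list[int]) -> float:
    """Tanimoto coefficient of two equal-length bit lists."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(fp_a)                                # bits set in A
    b = sum(fp_b)                                # bits set in B
    c = sum(x & y for x, y in zip(fp_a, fp_b))   # bits set in both
    union = a + b - c
    if union == 0:
        return 0.0  # assumed convention for two all-zero fingerprints
    return c / union


# Made-up 8-bit fingerprints for illustration.
query = [1, 1, 0, 1, 0, 0, 1, 0]
candidate = [1, 1, 0, 0, 0, 0, 1, 1]

# a = 4, b = 4, c = 3, so T = 3 / (4 + 4 - 3) = 0.6
print(tanimoto(query, candidate))
```

A similarity search then amounts to computing this value between the query and every database fingerprint and keeping the entries above a chosen threshold.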
Ligand-based VS: Pharmacophore
Structure-based Virtual Screening: Docking Binding pocket of target Library of small compounds Given a protein and a database of ligands, docking scores determine which ligands are most likely to bind.
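Once every ligand in the library has a docking score, selecting likely binders is a ranking step. The sketch below assumes that scores are energy-like, so that lower (more negative) means better predicted binding; the ligand names and score values are hypothetical.

```python
def rank_by_docking_score(scores: dict[str, float], top_n: int) -> list[tuple[str, float]]:
    """Sort ligands from best (lowest) to worst score and keep the top_n."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[:top_n]


# Hypothetical docking scores (kcal/mol-like units) for a small library.
docking_scores = {
    "ligand_A": -9.2,
    "ligand_B": -4.1,
    "ligand_C": 1.3,
    "ligand_D": -7.8,
}

top_hits = rank_by_docking_score(docking_scores, top_n=2)
print(top_hits)  # [('ligand_A', -9.2), ('ligand_D', -7.8)]
```

Only the top-ranked ligands would then be passed on to experimental testing.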
Energy of binding. Binding pocket of target; library of small compounds, with binding energies on a scale of -10 kcal/mol, -1 kcal/mol, +1 kcal/mol, +10 kcal/mol. For any spontaneous change in a closed system, the change in [Gibbs] free energy equals the change in enthalpy minus the change in entropy times the temperature: ΔG = ΔH - TΔS. Contributing terms: torsional free energy, vdW, Hbond, desolvation energy, electrostatic energy.
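A short worked example of ΔG = ΔH - TΔS, together with the standard thermodynamic relation ΔG = RT ln(Kd) that links a binding free energy to a dissociation constant (relative to a 1 M standard state). The relation to Kd is not on the slide and is added here as background; the function names and the numerical inputs are illustrative assumptions.

```python
import math

GAS_CONSTANT = 1.987204e-3  # R in kcal/(mol*K)


def gibbs_free_energy(delta_h: float, delta_s: float, temperature: float = 298.15) -> float:
    """Delta G = Delta H - T * Delta S, with H in kcal/mol and S in kcal/(mol*K)."""
    return delta_h - temperature * delta_s


def dissociation_constant(delta_g: float, temperature: float = 298.15) -> float:
    """Kd in mol/L from a binding free energy, using Delta G = R * T * ln(Kd)."""
    return math.exp(delta_g / (GAS_CONSTANT * temperature))


# Illustrative numbers: favorable enthalpy, unfavorable entropy.
delta_g = gibbs_free_energy(delta_h=-15.0, delta_s=-0.0168)
print(f"Delta G = {delta_g:.2f} kcal/mol")

# Compare the two favorable ends of the slide's energy scale.
for energy in (-10.0, -1.0):
    print(f"Delta G = {energy:+.1f} kcal/mol -> Kd = {dissociation_constant(energy):.2e} M")
```

The exponential dependence is the point of the exercise: a difference of a few kcal/mol in binding free energy corresponds to orders of magnitude in affinity.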
“Docking” and “Scoring”. Docking involves the prediction of the binding mode of individual molecules. Goal: new ligand orientation closest in geometry to the observed X-ray structure. (Conformations of ligands in complexes often have very similar geometries to minimum-energy conformations of the isolated ligand.) Scoring ranks the ligands using some function related to the free energy of association of the two partners, looking at attractive and repulsive regions and taking into account steric and hydrogen bonding interactions. Goal: new ligand score closest in value to the docking score of the X-ray structure.
Docking algorithms. Most exhaustive algorithms: accurate prediction of a binding pose. Most efficient algorithms: docking of small ligand databases in reasonable time. Rapid algorithms: virtual high-throughput screening of millions of compounds.
Scoring functions. Molecular mechanics force field-based: the score is estimated by summing the strength of intermolecular van der Waals and electrostatic interactions between all atoms of the ligand-target complex (CHARMM, AMBER). Empirical-based: based on summing various types of interactions between the two binding partners (hydrogen bonds, hydrophobic, …) (ChemScore, GlideScore, AutoDock). Knowledge-based: based on statistical observations of intermolecular close contacts from large 3D databases, which are used to derive potentials of mean force (PMF, DrugScore).
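The force field-based idea can be sketched as a sum of pairwise terms over ligand-target atom pairs. This is a simplified illustration, not the CHARMM or AMBER implementation: it assumes a 12-6 Lennard-Jones term for van der Waals interactions, a Coulomb term with a constant dielectric, and simple combining rules; the `Atom` class and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass

COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mol*e^2)


@dataclass
class Atom:
    x: float
    y: float
    z: float
    charge: float   # partial charge in units of e
    epsilon: float  # Lennard-Jones well depth, kcal/mol
    sigma: float    # Lennard-Jones zero-crossing distance, Angstrom


def pair_energy(a: Atom, b: Atom, dielectric: float = 1.0) -> float:
    """Van der Waals (12-6 Lennard-Jones) plus Coulomb energy of one atom pair."""
    r = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
    epsilon = math.sqrt(a.epsilon * b.epsilon)  # assumed combining rule
    sigma = 0.5 * (a.sigma + b.sigma)           # assumed combining rule
    sr6 = (sigma / r) ** 6
    vdw = 4.0 * epsilon * (sr6 ** 2 - sr6)
    electrostatic = COULOMB_CONSTANT * a.charge * b.charge / (dielectric * r)
    return vdw + electrostatic


def force_field_score(ligand: list[Atom], target: list[Atom]) -> float:
    """Sum the intermolecular pair energies between ligand and target atoms."""
    return sum(pair_energy(l, t) for l in ligand for t in target)


# Hypothetical two-atom example: opposite partial charges 4 Angstrom apart.
ligand_atoms = [Atom(0.0, 0.0, 0.0, charge=-0.5, epsilon=0.2, sigma=3.0)]
target_atoms = [Atom(4.0, 0.0, 0.0, charge=0.5, epsilon=0.2, sigma=3.0)]
print(f"score = {force_field_score(ligand_atoms, target_atoms):.2f} kcal/mol")
```

Real force fields add distance cutoffs, solvent treatment, and carefully fitted parameters; the sketch only shows the additive pairwise structure that the slide describes.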
Combination of pharmacophore, docking and molecular dynamics (MD) screens. Ligand-based VS: good enrichment of candidate molecules from the screening of large databases with less computational effort; too coarse to pick up subtle differences induced by small structural variations in the ligands; many options for model refinement. Structure-based VS: a better fit for analyzing smaller sets of compounds, especially in retrospective analysis; includes all possible interactions, thus allowing the detection of unexpected binding modes; changing parameters for docking algorithms and scores is demanding. "Mutants" (hybrid methods) are being developed: pharmacophore methods with information about the target’s binding site; docking programs that incorporate pharmacophore constraints.
http://www.vcclab.org/lab/edragon/
Public Web Chemoinformatics Tools http://pasilla.health.unm.edu/
ChemSpider www.chemspider.com
Open Babel http://openbabel.org/wiki/Main_page
D. Vidal et al, Ligand-based Approaches to In Silico Pharmacology, Chemoinformatics and Computational Chemical Biology, Ed J. Bajorath, Springer, 2011
Questions?