De Novo design tools for the generation of synthetically accessible ligands
Peter Johnson, Krisztina Boda, Shane Weaver, Aniko Valko, Vilmos Valko



Receptor Structure Based Drug Design
Objective: to suggest potential leads that
• bind strongly to a given protein because of shape and electrostatic complementarity
• are easy to synthesise
Approaches:
• Docking methods (preferably flexible docking) identify new lead structures by rapidly screening a database of 3-D structures of known compounds
• De novo design methods (such as SPROUT) construct a diverse set of entirely novel potential leads from scratch

SPROUT Components
• Detects potential binding pockets of the protein structure
• Identifies favourable interaction sites (H-bonding, hydrophobic, covalent, metal, user defined)
• Docks structures to target interaction sites
• Generates 3D molecular structures of novel ligands by linking the docked starting fragments together in an incremental construction scheme
• Scores, sorts and clusters the solutions

Problem with Large Answer Sets
De novo design programs such as SPROUT can suggest large sets of entirely novel potential leads.
Powerful heuristics are necessary to evaluate (and reduce) the often large answer sets:
• Binding Affinity Score: eliminate candidates with poor estimated binding affinity
• Synthetic Feasibility: eliminate candidates with complex molecular structures

For de novo design, prediction of synthetic accessibility is equally important
Hypothetical ligands, including those predicted to bind very strongly, have no practical value unless they can be readily synthesised.
Our Attempts to Provide Solutions:
• CAESA (estimates synthetic accessibility)
• Complexity Analysis (estimates structural complexity and drug-likeness)
• SynSPROUT (avoids the problem by building constraints into the structure generation process)

CAESA: Computer Assisted Estimation of Synthetic Accessibility
Glenn Myatt, Jon Baber

Goals of CAESA Project
• Clear need for an automated method of ranking hypothetical compounds according to perceived ease of synthesis
• Good synthetic chemists can do this job themselves on a small number of compounds but are unwilling to do it for hundreds or thousands of compounds
• CAESA attempts to do the same job but never gets bored!

Estimation of Synthetic Accessibility: Criteria used by CAESA
CAESA scores the synthetic accessibility of structures using two main criteria:
a) An estimate of structural complexity:
• stereocentres
• complex topological features (fusions etc.)
• functional group complexity
b) Availability of good starting materials:
• rapid retrosynthetic analysis
• database of commercially available materials
• reaction rule base (editable)

CAESA Components

Automatic Selection of Starting Materials
Starting Materials and Synthetic Accessibility
• Availability of suitable starting materials is a very important factor: good starting materials can dramatically reduce the difficulty of synthesising a compound.
• Good starting materials for part of the target molecule mean that the analysis of structural synthetic difficulty or complexity can be directed to just those portions of the target molecule that cannot be made from available starting materials.
• Finding good starting materials through retrosynthetic analysis also provides possible synthetic routes as a byproduct.

Traditional Retrosynthetic Analysis

Bidirectional Search for Synthetic Routes

Example of Starting Material Selection

Summary of CAESA Features
• CAESA carries out a retrosynthetic analysis which terminates when a starting material from a database (such as ACD) is found
• Found starting materials are scored according to length and difficulty of reaction sequence and coverage of target compound
• All chemistry rules and transformations are described in editable text knowledge bases easily modified by chemists
• Quality of the analysis depends on the chemistry included in the knowledge bases and the comprehensiveness of the starting material libraries
• But CAESA is relatively slow, and speedier methods are needed for pruning of large data sets
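The termination rule described above (stop a branch as soon as a catalogued starting material is reached) can be illustrated with a toy search. This is only a sketch under stated assumptions: molecule names stand in for structures, and STARTING_MATERIALS, RETRO_RULES and route_length are hypothetical names invented for illustration, not CAESA's knowledge base format or scoring.

```python
# Hypothetical toy data: names stand in for real structures.
STARTING_MATERIALS = {"benzoic acid", "aniline", "methyl iodide"}

# product -> list of (reaction name, precursors); an illustrative rule base only
RETRO_RULES = {
    "benzanilide": [("amide formation", ["benzoic acid", "aniline"])],
    "N-methylbenzanilide": [("N-alkylation", ["benzanilide", "methyl iodide"])],
}


def route_length(target, depth=0, max_depth=3):
    """Return the fewest reaction steps back to starting materials, or None."""
    if target in STARTING_MATERIALS:
        return 0  # branch terminates: a catalogued starting material was found
    if depth >= max_depth:
        return None  # give up on overly long routes
    best = None
    for _reaction, precursors in RETRO_RULES.get(target, []):
        lengths = [route_length(p, depth + 1, max_depth) for p in precursors]
        if any(length is None for length in lengths):
            continue  # at least one precursor cannot be traced back
        total = 1 + sum(lengths)
        if best is None or total < best:
            best = total
    return best


print(route_length("benzanilide"))          # one step
print(route_length("N-methylbenzanilide"))  # two steps
print(route_length("unknown compound"))     # None: no route found
```

A real scorer would also weigh reaction difficulty and how much of the target the starting materials cover, as the slide notes; the step count here is the simplest possible proxy.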

Alternative Approach: Complexity Analysis
Based on statistical distribution of various substitution patterns found in databases of existing drugs and available starting materials.
Reference: Krisztina Boda and A. Peter Johnson, "Molecular Complexity Analysis of de Novo Designed Ligands", J. Med. Chem.; 2006; ASAP Web Release Date: 26-Jan-2006

Complexity analysis based on statistical distribution of various substitution patterns
Assumption: if a molecular structure contains ring and chain substitution patterns which are common in
• existing drugs, then the structure is likely to be “drug-like” as well as readily synthesisable
• available starting materials, then the structure is likely to be readily synthesisable

Building Complexity Database
Input structure
• Enumerate chain patterns (1-centred, 2-centred, 3-centred, 4-centred) → Database of chains
• Enumerate ring/ring substitution patterns → Database of rings/ring substitutions
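The database-building step can be sketched as pattern enumeration followed by frequency counting. The sketch below makes a strong simplifying assumption: an "n-centred chain pattern" is approximated by a simple path of n bonded atoms, described only by its element symbols. The real patterns also encode substituents and bonding, and the function names chain_patterns and build_database are hypothetical.

```python
from collections import Counter


def chain_patterns(atoms, bonds, max_len=4):
    """Count canonical element sequences for all simple paths of 1..max_len atoms.

    atoms: list of element symbols; bonds: list of (i, j) index pairs.
    """
    neighbours = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    stack = [(i,) for i in range(len(atoms))]
    while stack:
        path = stack.pop()
        # A path read from either end is the same path: keep one orientation.
        seen.add(min(path, path[::-1]))
        if len(path) < max_len:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    stack.append(path + (nxt,))

    counts = Counter()
    for path in seen:
        seq = tuple(atoms[i] for i in path)
        counts[min(seq, seq[::-1])] += 1  # canonical name of the pattern
    return counts


def build_database(structures):
    """Accumulate pattern frequencies over many (atoms, bonds) structures."""
    database = Counter()
    for atoms, bonds in structures:
        database.update(chain_patterns(atoms, bonds))
    return database


# Heavy atoms of ethanol: C-C-O
db = build_database([(["C", "C", "O"], [(0, 1), (1, 2)])])
print(db)
```

Taking the smaller of the two reading directions is one simple way to obtain a canonical name, so that the same pattern found in different structures lands on the same database key.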

Atom Substitution Hierarchy
Ring (and chain) substitutions are organised in hierarchies.
The hierarchy stores:
• Atom type sequence
• Number of occurrences
• Binding properties
Total occurrences of the topology: 11,801

Ligand Complexity Analysis
1. Enumerate ring and chain patterns
2. Generate canonical names for each atom pattern (Canonical name: A, Canonical name: B, Canonical name: C, [More Patterns])
3. Match canonical name against the hierarchy roots of the database (DATABASE of hierarchies + frequency of occurrences)
4. Retrieval of frequency of occurrences → Calculate score
5. Rank structures by complexity score
Speed of Complexity Analysis ~ structures / minute on Linux PC (3GHz)

Calculation of Complexity Score
CONCEPT: Penalise atom patterns which are infrequent or not present in the complexity database.
• In SPROUT the complexity analysis is followed by ranking the putative ligands according to their evaluated complexity score.
• Penalty values can be altered to tailor the system for different applications.
• The penalty values used in the examples presented here are 25, 20, 15, 10 for 1-, 2-, 3- and 4-centred chain patterns, and 40 and 30 for rings and ring substitutions.
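A minimal sketch of the scoring idea, using the penalty values quoted on the slide. How the published method scales a penalty with frequency is not stated here, so the flat "penalise if seen fewer than min_count times" rule, the min_count threshold and all function names are assumptions made for illustration.

```python
# Penalty values from the slide; the dictionary keys are hypothetical labels.
PENALTIES = {
    "chain1": 25, "chain2": 20, "chain3": 15, "chain4": 10,
    "ring": 40, "ring_subst": 30,
}


def complexity_score(patterns, database, min_count=1):
    """Sum penalties for patterns that are rare or absent in the database.

    patterns: iterable of (kind, canonical_name) found in one ligand.
    database: mapping canonical_name -> number of occurrences.
    min_count: assumed frequency threshold below which a pattern is penalised.
    """
    return sum(
        PENALTIES[kind]
        for kind, name in patterns
        if database.get(name, 0) < min_count
    )


def rank_by_complexity(ligands, database):
    """Return ligand names ordered from least to most complex."""
    return sorted(ligands, key=lambda name: complexity_score(ligands[name], database))


database = {"C-C": 500, "C-N": 120}  # toy frequencies
ligands = {
    "simple": [("chain2", "C-C"), ("chain2", "C-N")],
    "exotic": [("chain2", "C-C"), ("chain2", "C-B"), ("ring", "unusual-ring")],
}
print(complexity_score(ligands["exotic"], database))
print(rank_by_complexity(ligands, database))
```

Keeping the penalties in a plain dictionary mirrors the slide's point that the values can be altered to tailor the system for different applications.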

Validation Experiment: Comparison with CAESA
Both methods were used to estimate synthetic accessibility for the same set of 50 top selling drugs.

CAESA vs. Complexity Analysis
Elapsed time:
• CAESA: 703 sec
• Complexity Analysis: 8 sec
Complexity scores are calculated using the complexity database derived from available starting materials (SMs), plus a penalty for each identified stereo centre in the structures.

Complexity Analysis vs CAESA
• More suitable for prioritization of thousands of structures within a reasonable time frame.
• Provides an acceptable compromise between the speed of the analysis and the accuracy of calculated scores.
• Because this approach is based on characteristics of existing readily available compounds, simple but novel structural features may be wrongly identified as complex.

Yet another alternative approach
Build synthetic feasibility into the structure generation process

SynSPROUT Approach
Ease of synthesis is a key factor in drug development
Build synthetic constraints into the structure generation process
SynSPROUT Scheme: VIRTUAL SYNTHESIS IN RECEPTOR CAVITY
• Fragment Library: pool of readily available starting materials
• Synthetic Knowledge Base: reliable high yielding reactions
• Output: readily synthesisable putative ligand structures
Joining operations:
• Classic SPROUT: fuse, spiro, new bond
• SynSPROUT: built in / user defined reactions (amide formation, ether formation, ester formation, amine alkylation, reductive amination etc.)

Current Status
• Promising structures with estimated high binding affinity
• SynSPROUT provides the equivalent to screening a large number of combinatorial libraries
• Potential for suggesting starting points for new combinatorial libraries
• Combination of a large starting material library with a large reaction knowledge base causes a combinatorial problem, even with parallel processing
• Restricting either the size of the library or the number of synthetic reactions gives acceptable run times

De Novo Structure Generation vs. Lead Optimization
De Novo Structure Generation
• AIM: to generate diverse putative ligands from scratch
• No structural information from any existing bound ligand is utilised
Lead Optimization
• AIM: to suggest better ligands structurally similar to the bound one
• The structure of a good bound ligand provides a starting point (core)

Variations on the SynSPROUT Theme: SPROUT LeadOpt
Two modes for structure based lead optimisation
• Core Extension: extends the core structure (derived from the lead) by virtual synthetic chemistry
• Monomer Replacement: replaces monomers which have been identified by retrosynthetic analysis of a lead compound

Core Extension
• Import the modified bound ligand (core) + identify substitution points (functional groups)
• Generate core + monomer product by performing virtual synthetic reaction(s) at selected functional groups
• Estimate binding affinity for products

Core Extension Scheme (General Scheme)
• Core Structure
• Monomer Library: multiple low energy conformers + detected functional groups
• Synthetic Knowledge Base: list of reactions (between functional groups)
• Simulate synthetic reaction in the 3D context of the receptor site
• All possible core + monomer combinations are generated
[Figure: CORE with monomers R11, R12, R13, R21, R22, R23, R31, R32, R33 attached at its extension points]
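Generating "all possible core + monomer combinations" amounts to a Cartesian product over the extension points. The sketch below shows only that enumeration; the function name enumerate_products and the site and monomer labels are hypothetical, and no chemistry or 3D docking is performed.

```python
from itertools import product


def enumerate_products(core, monomers_by_site):
    """Yield every assignment of one monomer to each extension point of the core.

    monomers_by_site: mapping of extension point label -> list of monomers.
    """
    sites = sorted(monomers_by_site)
    for choice in product(*(monomers_by_site[site] for site in sites)):
        yield {"core": core, **dict(zip(sites, choice))}


# Three extension points with three candidate monomers each (labels as in the figure).
library = {
    "site1": ["R11", "R12", "R13"],
    "site2": ["R21", "R22", "R23"],
    "site3": ["R31", "R32", "R33"],
}
combinations = list(enumerate_products("CORE", library))
print(len(combinations))  # 3 * 3 * 3 = 27
print(combinations[0])
```

The count grows as the product of the library sizes, which is why the later slides turn to parallel processing for cores with several extension points.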

Automatic Monomer Library Generation
SDF file of 3D monomers
→ Atom & Ring Perception, driven by the Perception Knowledge Base:
  o Aromaticity
  o Normalisation
  o Hybridisation
  o H-bonding properties
→ Detect Functional Groups (joining points), driven by the Synthetic Knowledge Base (synthetic rules, functional groups)
→ Monomer Library: multiple low energy conformers + detected functional groups

Synthetic Knowledge Base
Functional group definitions:
  CHEMICAL-LABEL C[SPCENTRE=2](=O)-O[HS=1]
  CHEMICAL-LABEL C-N[HS=2];[CONNECTION=1]
Joining rules specify:
• Steps of formation
• Hybridization changes
• Bond type
• Bond length
• Dihedral penalty/angle
Steps of Joining Rules:
  EXPLANATION Amide Formation
  IF Carboxylic Acid INTER Primary Amine
  THEN
    delete-atom 3
    change-hybridization 5 to SP2
    form-bond - between 1 and 5
    DIHEDRAL-ATOMS
    DIHEDRAL 0 0
    BOND-LENGTH 1.35
  END-THEN
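One way to picture what such a rule carries is as a small record. The sketch below transcribes the amide formation rule from the slide into a Python dataclass; the class JoiningRule, its field names and the lookup helper are hypothetical and do not reflect how SPROUT stores or parses its knowledge base. The dihedral fields are omitted because the atom list after DIHEDRAL-ATOMS is not preserved in the transcript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoiningRule:
    name: str
    group_a: str                   # functional group on one partner
    group_b: str                   # functional group on the other partner
    delete_atoms: tuple            # atom labels removed during the reaction
    hybridization_changes: tuple   # (atom label, new hybridization) pairs
    new_bond: tuple                # (atom label, atom label, bond type)
    bond_length: float             # length of the formed bond, as given in the rule


AMIDE_FORMATION = JoiningRule(
    name="Amide Formation",
    group_a="Carboxylic Acid",
    group_b="Primary Amine",
    delete_atoms=(3,),
    hybridization_changes=((5, "SP2"),),
    new_bond=(1, 5, "-"),
    bond_length=1.35,
)


def matching_rules(rules, group_a, group_b):
    """Return the rules that join these two functional groups, in either order."""
    return [r for r in rules if {r.group_a, r.group_b} == {group_a, group_b}]


rules = [AMIDE_FORMATION]
print([r.name for r in matching_rules(rules, "Primary Amine", "Carboxylic Acid")])
```

Comparing the two group names as a set makes the lookup order-independent, so it does not matter whether the core or the monomer carries the acid.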

Importing the Core Structure (from MOL/PDB file in Elephant module)
• When importing from a PDB file, the pdb → mol converter is invoked.
• Functional group(s) are automatically detected when the core structure is imported into the system.
• Hydrogen donor/acceptor or spherical target sites anchor the imported core structure inside the receptor cavity, partially restricting the displacement of the core during lead optimization, but allowing slight movements in order to avoid boundary violations.

Product Generation I.
Step I. Generate products by mimicking synthetic reactions between core + monomers
[Figure: monomers R1 and R2 joined to the Core by sulphonamide formation and amide formation]

Product Generation II.
Step II. Ligand flexibility = generate multiple low energy conformers
• Primary monomer conformers generated by (a) CORINA + ROTATE (b) sampling discrete dihedral angles around formed bonds
• Secondary conformers generated by twisting about rotatable bonds of the low energy monomer conformers
• Rigid body docking
User defined parameters:
• Max deviation
• Sampling of dihedral angles
• Max penalty
[Figure: Core with monomers R1 and R2]
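Sampling discrete dihedral angles subject to a maximum penalty can be sketched as a filtered grid search. Everything specific below is an assumption for illustration: the penalty is taken to be the angular distance to the nearest of a set of preferred angles (60, 180, 300 degrees), and the names sample_conformers, step and max_penalty are hypothetical stand-ins for the user defined parameters listed on the slide.

```python
from itertools import product


def dihedral_penalty(angle, preferred=(60, 180, 300)):
    """Circular distance in degrees from angle to the nearest preferred angle."""
    return min(min(abs(angle - p), 360 - abs(angle - p)) for p in preferred)


def sample_conformers(n_bonds, step=60, max_penalty=30):
    """Enumerate dihedral settings for n_bonds bonds and keep the low-penalty ones.

    step: sampling interval of dihedral angles in degrees.
    max_penalty: highest total penalty a conformer may carry.
    """
    angles = range(0, 360, step)
    kept = []
    for combo in product(angles, repeat=n_bonds):
        if sum(dihedral_penalty(a) for a in combo) <= max_penalty:
            kept.append(combo)
    return kept


conformers = sample_conformers(n_bonds=2)
print(len(conformers))
print(conformers[:3])
```

A finer step or a looser max_penalty yields more conformers per product, so these two parameters trade coverage against run time.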

Product Generation III.
Step III. Docking + rejection of conformers with
• high internal energy
• boundary violation

Multiple Extension Points: Combinatorial Problem
• Clients-Master-Slaves architecture
• Mixed SGI/Linux cluster network (TCP/IP socket network communication)
• Each slave performs optimization on a different core + monomer combination
[Figure: Clients 1, 2, 3, … and Slaves 1, 2, 3, … (Linux and SGI machines) connected to the Master; each slave holds a CORE with R1, R2 and R3 extension points]
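The division of labour (a master handing each worker a different core + monomer combination) can be illustrated on a single machine. This sketch swaps in a local thread pool for the TCP/IP socket cluster described on the slide, and optimise is a hypothetical placeholder that returns a dummy score rather than performing any docking.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def optimise(combination):
    """Placeholder for optimising one core + monomer combination."""
    core, r1, r2 = combination
    dummy_score = len(r1) + len(r2)  # stands in for a real binding score
    return combination, dummy_score


def run_master(combinations, n_workers=4):
    """Distribute combinations over workers and collect results in input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(optimise, combinations))


r1_library = ["amine_a", "amine_bb"]
r2_library = ["acid_c", "halide_dd", "carbonyl_e"]
jobs = [("CORE", r1, r2) for r1, r2 in product(r1_library, r2_library)]
results = run_master(jobs)
print(len(results))
print(results[0])
```

Because every combination is independent, the work splits cleanly across workers; the only shared step is collecting and ranking the results at the master.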

Case Study (CDK2)
PDB: 1KE8
[Figure: CORE with extension points R1 and R2]

Case Study (CDK2): Monomer Reagent Library Generation
Maybridge & Aldrich (~ ) 2D structures → applied filters → CORINA → 3D structures → ROTATE → 3D conformers → Monomer Library
At least one of the following functional groups:
• Carboxylic Acid
• Primary Amine
• Primary Alkyl Halide
• Carbonyl
Applied filters:
• Number of heavy atoms ≥ 8
• Number of heavy atoms ≤ 16
• Number of acceptor atoms ≤ 5
• Number of donor atoms ≤ 3
• Number of rotatable bonds ≤ 2
• Max chain length ≤ 3
• Allowed atom types: H, B, C, N, O, F, S, Cl, Br
• Number of rings ≤ 3
• Stereo centres ≤ 1
• No 3-, 4-, 7-, 8- or 9-membered ring
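The filter list above translates directly into a predicate over precomputed descriptors. In this sketch the Monomer record and its field names are hypothetical, and the descriptor values are assumed to come from some upstream cheminformatics step that is not shown.

```python
from dataclasses import dataclass

ALLOWED_ELEMENTS = frozenset({"H", "B", "C", "N", "O", "F", "S", "Cl", "Br"})
FORBIDDEN_RING_SIZES = frozenset({3, 4, 7, 8, 9})
REQUIRED_GROUPS = frozenset(
    {"Carboxylic Acid", "Primary Amine", "Primary Alkyl Halide", "Carbonyl"}
)


@dataclass
class Monomer:
    heavy_atoms: int
    acceptors: int
    donors: int
    rotatable_bonds: int
    max_chain_length: int
    elements: frozenset
    ring_sizes: tuple           # one entry per ring
    stereo_centres: int
    functional_groups: frozenset


def passes_filters(m):
    """Apply the case-study filters to one monomer's descriptors."""
    return (
        8 <= m.heavy_atoms <= 16
        and m.acceptors <= 5
        and m.donors <= 3
        and m.rotatable_bonds <= 2
        and m.max_chain_length <= 3
        and m.elements <= ALLOWED_ELEMENTS           # subset test
        and len(m.ring_sizes) <= 3
        and not FORBIDDEN_RING_SIZES.intersection(m.ring_sizes)
        and m.stereo_centres <= 1
        and bool(m.functional_groups & REQUIRED_GROUPS)
    )


# Illustrative descriptor values, not taken from any real catalogue entry.
example = Monomer(
    heavy_atoms=10, acceptors=2, donors=1, rotatable_bonds=1,
    max_chain_length=2, elements=frozenset({"C", "H", "N", "O"}),
    ring_sizes=(6,), stereo_centres=0,
    functional_groups=frozenset({"Carboxylic Acid"}),
)
print(passes_filters(example))
```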

Case Study (CDK2)
Sulphonyl chloride reacts with:
• Primary amine in sulphonamide formation
Primary amine reacts with:
• Carboxylic acid in amide reaction
• Primary alkyl halide in amine alkylation reaction
• Carbonyl in reductive amination and imine formation
[Figure: CORE with extension points R1 and R2]

Case Study (CDK2)
R 1 Monomer Library:
• 523 Primary Amine
R 2 Monomer Library:
• 293 Carboxylic Acid
• 93 Primary Alkyl Halide
• 393 Carbonyl
R 1 + Core + R 2 combinations: 432,345
Results:
• Screened 81.23%
• Failed 4.87 %
• Accepted % (54,123)
Elapsed time ~ 5 hours (with 100 slave processors)

Case Study (Generated Products)

Monomer Replacement
• Many lead compounds are composed of readily available starting materials (monomers) linked by reliable high yielding reactions
• Retrosynthetic analysis can be used to identify the monomers
• Structurally related analogues could be generated by exhaustive monomer replacement
• Considerable efficiency gains if the monomer library is arranged in a hierarchy based on substructural relationships
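One plausible source of the efficiency gain is pruning: if a monomer does not fit, every monomer that contains it as a substructure can be skipped without testing. The sketch below assumes exactly that, which the slides do not spell out; it also assumes the hierarchy is a tree whose children are superstructures of their parent. The names search_hierarchy, children and fits are hypothetical.

```python
def search_hierarchy(roots, children, fits):
    """Walk a substructure hierarchy, skipping superstructures of rejected monomers.

    roots: top-level monomers; children: mapping monomer -> its superstructures;
    fits: predicate deciding whether a monomer is acceptable.
    Returns (accepted monomers, number of monomers actually tested).
    """
    accepted, tested = [], 0
    stack = list(roots)
    while stack:
        monomer = stack.pop()
        tested += 1
        if fits(monomer):
            accepted.append(monomer)
            stack.extend(children.get(monomer, []))
        # If the monomer is rejected, its superstructures are never visited.
    return accepted, tested


# Toy hierarchy: each child contains its parent as a substructure.
children = {
    "benzene": ["toluene", "naphthalene"],
    "toluene": ["xylene"],
    "naphthalene": ["anthracene"],
}
accepted, tested = search_hierarchy(
    ["benzene"], children, fits=lambda m: m != "naphthalene"
)
print(sorted(accepted), tested)  # anthracene is skipped without being tested
```

If the real hierarchy lets a monomer have several parents, a visited set would be needed to avoid testing it more than once.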

Hierarchy Construction
[Figure: amide monomer hierarchy; relationships between monomers are labelled Substructure, Superstructure or No overlap]

Amide Hierarchy Usage

Monomer Replacement
Retro-synthetic analysis → Do they exist in starting materials HIERARCHY?

CASE STUDY
Optimisation of SPROUT designed inhibitors of P. falciparum Dihydro-orotate Dehydrogenase using Monomer Replacement
• Initial lead compound MD-155 Sprout score
• Retrosynthetic analysis finds amide formation and Ullmann/Suzuki reaction for monomer formation
• Monomer library: aryl halides and p-halo-anilines
• 2D structures: 1923
• Conformations: 26916

High scoring monomer replacement results Monomer replacement gave 840 new structures (including multiple conformers of the same structure) Scores – 7.50 to 9.30.

Experimental Results for Some Ligands Suggested by SPROUT LeadOpt Monomer Replacement
• Starting Point MD-155: PfDHODH Ki 3.0 µM; HsDHODH Ki 11.0 nM
• MD-204: PfDHODH Ki 733 nM; HsDHODH Ki 21.0 nM; 4 fold enhancement in Ki for PfDHODH
• MD-213: PfDHODH Ki 478 nM; HsDHODH Ki 21.7 nM; 6 fold enhancement in Ki for PfDHODH
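The quoted fold enhancements can be checked with a one-line ratio. The calculation assumes the starting PfDHODH Ki of MD-155 is 3.0 µM (3000 nM), which is the value consistent with the stated 4 fold and 6 fold figures; the function name fold_enhancement is introduced here for illustration.

```python
def fold_enhancement(ki_start_nM, ki_new_nM):
    """Ratio of starting Ki to new Ki; values above 1 mean tighter binding."""
    return ki_start_nM / ki_new_nM


MD155_PF_KI_NM = 3000.0  # assumed: 3.0 µM expressed in nM

print(round(fold_enhancement(MD155_PF_KI_NM, 733.0), 1))  # MD-204: 4.1
print(round(fold_enhancement(MD155_PF_KI_NM, 478.0), 1))  # MD-213: 6.3
```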

Conclusions
• Scoring functions for assessment of binding affinity of the hypothetical compounds produced by de novo design are far from perfect
• Hence only readily synthesisable putative ligands will undergo experimental evaluation by medicinal chemists
• Assessment of synthetic feasibility is a tractable problem

Acknowledgements
• Matt Davies, Phil Bone and Timo Heikkala for experimental work
• Molecular Networks GmbH for providing CORINA & ROTATE
• MDL for providing MDDR, one of the databases used in the complexity analysis project
• for sponsoring the lead optimization project