A genetic algorithm for structure based de-novo design Scott C.-H. Pegg, Jose J. Haresco & Irwin D. Kuntz February 21, 2006.

Presentation transcript:


Method and applications
Goal
– Use the genetic algorithm in ADAPT to search for novel small molecules for combinatorial library generation
Method
– Initial generation
– Fitness function
– Breeding next generation
Applications
– Cathepsin D: small chemical space, ligand unknown
– Dihydrofolate reductase: larger chemical space, ligand known
– HIV-1 RT: reproduce known structures
Questions

Goal and algorithm choice
[Figure: basic genetic algorithm]
Goal is to “develop new ligands using information from the three dimensional (3D) structure of a protein target without the prior knowledge of other ligands”
Challenge is that the location of bioactive drugs in the complete chemical space is sparse, non-contiguous and difficult to predict a priori
Strategies already tried
- Find fragments that fit in some part of the active site and link multiple fragments together
- Find a fragment that fits in some part of the active site and grow it in a particular direction
Genetic algorithm is better
- Good for searching a large part of chemical space quickly
- Good for finding an adequate, not necessarily the best, solution
- Works even when fitness/scoring functions are not known exactly
- Works with “whole” molecule properties (ADME)
- Generates an ensemble of solutions as leads

Important steps
[Figure: basic genetic algorithm: initial generation, fitness pressure, breeding (more fit; equal/unequal and single/multiple crossover; mutation)]
Start with an acyclic graph of at most 16 fragments with at most 8 connections, in SMILES notation
- Generate a diverse set by picking a random fragment and adding random fragments at random positions
- Generate a user-defined set by randomly swapping at most 2 fragments of a user-defined graph
Evaluate the fitness value of each compound using the DOCK 4.0 program with 6-12 van der Waals and 1/r electrostatic terms, Daylight’s clogp program, molecular weight, number of rotatable bonds and number of hydrogen bond donors/acceptors
Select the best-scoring compounds as parents for the next generation, which may or may not include the parents
Crossover between parents happens by randomly swapping nodes of equal or unequal sizes generated from random walks
Mutation of a daughter occurs with a user-defined mutation probability, with respect to identity or connectivity
A new generation is created, diversity is optionally added, and the process is cycled until the fitness goal is reached.
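The generate, score, select, breed and mutate cycle described on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ADAPT implementation: compounds are simplified to flat lists of fragments instead of acyclic graphs, the fragment list and `TARGET` set are made up, and `toy_fitness` is a hypothetical stand-in for the DOCK 4.0, clogp and property terms. Only the limit of 16 fragments comes from the slide.

```python
import random

# Hypothetical fragment library and "ideal" fragments (not from the paper).
FRAGMENTS = ["c1ccccc1", "C(=O)O", "N", "CC", "O", "C(=O)N", "S", "CCl"]
TARGET = {"c1ccccc1", "C(=O)N", "N"}
MAX_FRAGMENTS = 16  # size limit quoted on the slide


def random_compound(rng, max_len=6):
    """Diverse start: a random number of randomly chosen fragments."""
    return [rng.choice(FRAGMENTS) for _ in range(rng.randint(1, max_len))]


def toy_fitness(compound):
    """Stand-in score (higher is better): target hits minus a size penalty."""
    return len(TARGET.intersection(compound)) - 0.05 * len(compound)


def crossover(a, b, rng):
    """Join a head of parent a to a tail of parent b; pieces may differ in size."""
    i = rng.randint(1, len(a))  # keep at least one fragment of a
    j = rng.randint(0, len(b))
    return (a[:i] + b[j:])[:MAX_FRAGMENTS]


def mutate(compound, rate, rng):
    """Identity mutation only: replace each fragment with probability `rate`."""
    return [rng.choice(FRAGMENTS) if rng.random() < rate else f for f in compound]


def evolve(pop_size=30, n_parents=10, generations=20, rate=0.1,
           keep_parents=True, seed=0):
    rng = random.Random(seed)
    population = [random_compound(rng) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=toy_fitness, reverse=True)
        history.append(toy_fitness(population[0]))
        parents = population[:n_parents]
        # The next generation may or may not include the parents.
        next_gen = [list(p) for p in parents] if keep_parents else []
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            next_gen.append(mutate(crossover(a, b, rng), rate, rng))
        population = next_gen
    best = max(population, key=toy_fitness)
    history.append(toy_fitness(best))
    return best, history


best, history = evolve()
print(best, history[0], history[-1])
```

With `keep_parents=True` the best compound is always carried over, so the best fitness cannot decrease between generations. Connectivity mutation, graph-based random-walk crossover and the periodic addition of diversity are left out of this sketch.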

Applications
Cathepsin D
Compare results of ADAPT applied to a combinatorial library with experimental binding constant data on the library
- Able to select fragments consistently present in the best inhibitors tested experimentally
- Unable to directly produce known inhibitors due to differences between the DOCK score function and the binding constant surface
Dihydrofolate reductase (DHFR)
Study the effect of seeding with a known ligand (methotrexate in this case) and of adding diversity to longer runs in a larger chemical space search (10^8 compounds)
- Able to evolve compounds with the motif of known ligands
- Able to do so faster when seeded with a known bioactive ligand
- Able to do so more efficiently in one long run with added diversity than in multiple short runs
HIV-1 reverse transcriptase
Rediscover specific structural themes of ligands that bind to this active site
- Able to reproduce four known inhibitors in “butterfly-like” shape (out of 26?)
- Able to reproduce a PETT variant inhibitor like MSC-127, which was experimentally discovered by testing 750 PETT variants

Cathepsin D Setup
Experimentally studied ligands 10x10x10 = 1000
Size of potential chemical space 25 frag 3 sites =
Performed 10 runs of 50 generations each

Cathepsin D Results
Experimentally studied ligands 10x10x10 = 1000
Size of potential chemical space 25 frag 3 sites =
Size of library generated by ADAPT 8x7x7 = 392
4/7 inhibitors with 100 nM and 0/23 inhibitors with 330 nM activity found in the ADAPT library
Experimental data only exists for 24/392 compounds in the ADAPT library
A DOCK-only fitness function does not accurately map the binding constant surface
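The library sizes quoted on this slide are products of the number of fragment choices at each of the three sites. The short check below is only an arithmetic sketch, and the helper name `library_size` is made up for illustration. It shows that 8 x 7 x 7 evaluates to 392, matching the denominator in 24/392, and that experimental data covers roughly 6% of the ADAPT library.

```python
from math import prod


def library_size(choices_per_site):
    """Number of compounds when each site independently takes one fragment."""
    return prod(choices_per_site)


experimental = library_size([10, 10, 10])  # experimentally studied library
adapt = library_size([8, 7, 7])            # library generated by ADAPT
tested_fraction = 24 / adapt               # share of ADAPT library with data

print(experimental, adapt, round(tested_fraction, 3))  # 1000 392 0.061
```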

DHFR Setup
17 fragments from methotrexate + other = 32 total fragments
3-13 fragments allowed per compound
Size of possible chemical space = 3.5 x 10^8 unique compounds
1 set of 10 runs to 30 generations – methotrexate seeded
1 set of 10 runs to 30 generations – unseeded
1 set of 10 runs to 100 generations – unseeded
1 run of 1000 generations with diversity every 200 generations
5 runs of 200 generations

DHFR Results
94% of solutions in seeded results better than seed
0% of solutions in 30 generation unseeded results better
28% of solutions in 100 generation unseeded results better
96/98 structures in seeded runs contained pteridine fragment
21/100 structures in 30 gen. and 56/100 structures in 100 gen. unseeded runs contained pteridine fragment
Fitness score for 1000 generation run was better than generation runs.

HIV-1 RT Setup
HIV-1 RT model bound to HEPT in butterfly shape, characterized by sphgen & GRID
65 fragments from 26 inhibitors with fragments per compound for 5 x compounds in potential chemical space
5250 compounds generated from 10 runs
5 inhibitors superimposed on top of each other fashioned the butterfly shape
Ligands with 50% of atoms in both wings count as butterfly-like

HIV-1 RT Results
4/26 known inhibitors found in butterfly-like shape in the ADAPT library
Efavirenz (Sustiva™), pyrrolobenzodiazepinone, PETT and diaryl sulfone-like scaffolds were found among the butterfly-like compounds.
Despite the lack of a structural motif in the initial, unseeded populations, the ADAPT program was able to reproduce a geometric constraint, the ‘butterfly’ motif of known NNIs, through the use of a molecular docking fitness function, which is not the best choice

Questions
What are the time gains/costs of using this technique instead of just some screening technique?
How do you decide what to set the parameters to?
How do you test the method / parameter set without a known set of ligands to form the fragment library from?