ABCD Flexsim-R: A new 3D descriptor for combinatorial library design and in-silico screening
2nd Joint Sheffield Conference on Chemoinformatics: Computational Tools for Lead Discovery

Outline
- Introduction
- The Flexsim-R Methodology
- Validation
- Conclusion and Outlook

Introduction
What is Flexsim-R? Flexsim-R calculates 3D descriptors for reagents, based on the virtual affinity fingerprint idea.

Motivation to develop Flexsim-R
- Reagent-based descriptors are important for:
  - combinatorial library design
  - virtual screening experiments
  - bioisosteric replacements
  - rational augmentation of the in-house reagent pool
- For large combinatorial libraries, product-based descriptor calculation is often not feasible -> possible solution: reagent-based product selection (e.g. by a genetic algorithm, GA)
- Descriptor calculation should be fast and automatable
- The descriptor should be related to experimental affinity data
- Encouragement from virtual affinity fingerprint methods

In-vitro Affinity Fingerprints
Terrapin's affinity fingerprint approach (Kauvar et al., Chemistry & Biology, 1995, 2): molecular similarity is defined by the in-vitro binding patterns ("affinity fingerprints") of a ligand set (L) in a panel of reference binding assays (A).
[Figure: affinity fingerprint matrix of ligands L1-L6 against reference assays A1-A8]

Virtual Affinity Fingerprints (VAF)
Terrapin's in-vitro screening in diverse reference assays is simulated:
- by computational docking into a reference panel of protein pockets (Docksim, Flexsim-X)
- by computational fitting onto a reference panel of small molecules (Flexsim-S)
(Briem and Lessel, Perspectives in Drug Discovery and Design, 20 (2000))
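
As an illustration of the fingerprint idea, the sketch below ranks a small library by similarity to a query ligand. It assumes each ligand is described by a vector of affinities (measured in vitro, or docking scores in the virtual variant) over the same ordered reference panel, and it uses Euclidean distance as the comparison metric. The metric, the function names and the toy values are assumptions for illustration, not the published Terrapin or Flexsim procedures.

```python
import math


def fingerprint_distance(fp_a, fp_b):
    """Euclidean distance between two affinity fingerprints (assumed metric).

    Each fingerprint holds one value per reference assay or pocket,
    in the same panel order for every ligand.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must cover the same reference panel")
    return math.dist(fp_a, fp_b)


def rank_by_similarity(query_fp, library):
    """Ligand names sorted from most to least similar to the query.

    `library` maps ligand name -> fingerprint (hypothetical data layout).
    """
    return sorted(library, key=lambda name: fingerprint_distance(query_fp, library[name]))


# Toy panel of four reference assays; values are invented for illustration.
library = {
    "L1": [7.2, 5.1, 6.0, 4.3],
    "L2": [7.0, 5.3, 5.8, 4.1],
    "L3": [4.1, 8.0, 3.9, 6.6],
}
query = [7.1, 5.0, 6.1, 4.2]
print(rank_by_similarity(query, library))  # ['L1', 'L2', 'L3']
```

Swapping in a docking score per pocket for each measured affinity turns the same comparison into a virtual affinity fingerprint search.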

The Flexsim-R Method

The Flexsim-R Method
[Figure: protein pocket]
Problems with R-groups in conventional VAF approaches:
- R-groups tend to be smaller than "drug-like" molecules
- the alignment rule given by the common core attachment point is lost
Solution: core-constrained multiple-site docking

The Flexsim-R Method
Components of core-constrained multiple-site docking:
1. R-group set
2. Common core
3. Protein binding pockets

The Flexsim-R Method
First step: docking of the common core group with FlexX
- multiple solutions (e.g. the 50 best) are stored
- an RMS threshold can be applied to prevent clustering of the stored poses
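
The pose-storage step can be pictured as a greedy diversity filter. The sketch below is an assumption about how such an RMS threshold is typically applied, not FlexX's own code: poses are visited from best to worst score, and a pose is kept only if its RMSD to every pose already kept is at least the threshold. Poses are lists of (x, y, z) coordinates in the same atom order, and RMSD is taken in the fixed pocket frame without superposition; `select_diverse_poses`, `shifted` and the toy coordinates are hypothetical.

```python
import math


def rmsd(pose_a, pose_b):
    """RMSD between two poses with identical atom order.

    No superposition: docked poses already share the pocket's frame.
    """
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must have the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose_a, pose_b))
    return math.sqrt(squared / len(pose_a))


def select_diverse_poses(poses, scores, max_keep=50, rms_threshold=2.0):
    """Hypothetical greedy filter: keep up to `max_keep` best-scoring poses
    that are all at least `rms_threshold` apart.

    Lower score is treated as better (as for FlexX scores).
    Returns indices of the kept poses, best first.
    """
    order = sorted(range(len(poses)), key=lambda i: scores[i])
    kept = []
    for i in order:
        if len(kept) == max_keep:
            break
        if all(rmsd(poses[i], poses[j]) >= rms_threshold for j in kept):
            kept.append(i)
    return kept


def shifted(pose, dx):
    """Copy of `pose` translated by dx along x (toy helper)."""
    return [(x + dx, y, z) for x, y, z in pose]


core = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
poses = [core, shifted(core, 0.5), shifted(core, 3.0)]
scores = [-20.0, -19.5, -18.0]
print(select_diverse_poses(poses, scores))  # [0, 2]
```

With the 2.0 threshold the second pose, only 0.5 apart from the best one, is discarded, which is the clustering the threshold is meant to prevent.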

The Flexsim-R Method
Example: thrombin active site with the 50 best FlexX solutions of hydantoin (RMS threshold = 2.0 Å)

The Flexsim-R Method
Second step: docking of core group + R-group with FlexX
- the pre-stored core positions serve as reference
- FlexX scores are stored in a descriptor matrix (rows: R-groups R1, R2, R3, ...; columns: core positions)
[Figure: protein pocket with stored core positions and the resulting descriptor matrix]
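
For one pocket and one attachment point, the second step fills a matrix of docking scores. A minimal sketch: `dock_with_fixed_core` is a hypothetical callable standing in for a core-constrained FlexX run (FlexX itself is not called here), and `toy_score` is an invented placeholder used only to show the shape of the result.

```python
from typing import Callable, Sequence, TypeVar

RGroup = TypeVar("RGroup")
CorePose = TypeVar("CorePose")


def descriptor_matrix(
    rgroups: Sequence[RGroup],
    core_poses: Sequence[CorePose],
    dock_with_fixed_core: Callable[[RGroup, CorePose], float],
) -> list[list[float]]:
    """Build one descriptor matrix: rows = R-groups, columns = stored core poses.

    `dock_with_fixed_core` is a hypothetical stand-in for a constrained
    docking run of core + R-group, with the core held at the given stored
    position; it returns that run's score.
    """
    return [[dock_with_fixed_core(r, pose) for pose in core_poses] for r in rgroups]


def toy_score(rgroup: str, pose_index: int) -> float:
    """Invented placeholder score, NOT a docking function."""
    return -float(len(rgroup) + pose_index)


matrix = descriptor_matrix(["CH3", "CH2Ph"], [0, 1, 2], toy_score)
print(matrix)  # [[-3.0, -4.0, -5.0], [-5.0, -6.0, -7.0]]
```

Because the core poses are computed once and reused, each new reagent only costs one constrained docking run per stored pose.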

The Flexsim-R Method
Multiple protein pockets -> concatenated descriptor matrix
[Figure: rows R1, R2, R3, ...; column blocks Pocket 1, Pocket 2, Pocket 3, each containing core positions C1, C2, C3, ...]

The Flexsim-R Method
Multiple core attachment points -> concatenated descriptor matrix
[Figure: rows R1, R2, R3, ...; column blocks X1, X2, X3, X4 (one per attachment point), each containing core positions C1, C2, C3, ...]

The Flexsim-R Method
Example: hydantoin core
4 attachment points × 7 protein pockets × 50 FlexX solutions -> descriptor vector length = 1,400
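
The concatenation over pockets and attachment points can be sketched as follows. It assumes the per-block score rows of one R-group are held in a dictionary keyed by (attachment point, pocket), and that the same block order is used for every R-group so that columns stay comparable; both the block order and the data layout are assumptions. Using the 4 × 7 × 50 set-up of the hydantoin example, the vector has 1,400 entries.

```python
from itertools import product


def concatenated_descriptor(blocks, attachment_points, pockets):
    """Concatenate the per-block score rows of one R-group into a single vector.

    `blocks[(attachment_point, pocket)]` is that R-group's row of scores over
    the stored core poses. The loop order (attachment point, then pocket)
    fixes the column layout and must be identical for all R-groups.
    """
    vector = []
    for ap, pocket in product(attachment_points, pockets):
        vector.extend(blocks[(ap, pocket)])
    return vector


attachment_points = ["X1", "X2", "X3", "X4"]
pockets = ["1dwc", "1eed", "1pop", "2tsc", "3cla", "3dfr", "5ht2"]
n_core_poses = 50

# Placeholder scores (all zero) just to check the resulting vector length.
blocks = {(ap, p): [0.0] * n_core_poses for ap, p in product(attachment_points, pockets)}
vector = concatenated_descriptor(blocks, attachment_points, pockets)
print(len(vector))  # 1400
```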

The Flexsim-R Method
Test set for method development and evaluation:
- R-groups: the 20 natural amino acids
- Core groups:
- 7 protein pockets: 1dwc, 1eed, 1pop, 2tsc, 3cla, 3dfr, 5ht2 (model)

Correlation Analysis
Analyses were performed to check the correlation between:
- different protein pockets
- different cores
- different attachment points
The analyses are based on Euclidean distance matrices over all 190 pairwise combinations of the amino acid descriptor vectors (20 × 19 / 2 = 190).
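
One way to read this analysis, sketched below as an assumption rather than the published procedure: for each descriptor block (for example, the columns belonging to one pocket), compute the Euclidean distance between every pair of the 20 amino-acid vectors, giving 190 distances, and then compare two blocks by the Pearson correlation of their distance lists. Blocks that correlate highly arrange the amino acids in a similar way and are therefore partly redundant. The block contents here are random placeholders, not real scores.

```python
import math
import random
from itertools import combinations


def pairwise_distances(vectors):
    """Euclidean distances over all unordered pairs (condensed form)."""
    return [math.dist(a, b) for a, b in combinations(vectors, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def block_correlation(block_a, block_b):
    """Compare two descriptor blocks through their pairwise-distance lists.

    block_a[i] and block_b[i] must describe the same amino acid i.
    """
    return pearson(pairwise_distances(block_a), pairwise_distances(block_b))


# 20 amino acids x 50 columns of random placeholder scores (not real data).
rng = random.Random(0)
block_a = [[rng.uniform(-40.0, 0.0) for _ in range(50)] for _ in range(20)]
block_b = [[2.0 * s for s in row] for row in block_a]  # same geometry, rescaled

print(len(pairwise_distances(block_a)))  # 190
print(round(block_correlation(block_a, block_b), 6))  # 1.0
```

Comparing distance lists rather than raw columns lets blocks of different width (different numbers of stored core poses) be compared directly.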

Correlation Analysis
Correlation matrix of protein pockets (hydantoin core, all 4 attachment points)

Correlation Analysis
Correlation matrix of core groups (all 7 protein pockets, all attachment points)

Correlation Analysis
Correlation matrix of attachment points (hydantoin core, all 7 protein pockets)

Correlation Analysis
Reduction of descriptor vector length (dimensionality):
- No PCA was performed: principal components are linear combinations of all descriptor columns, whereas the goal here is to identify the least correlated original columns.
- Instead, an elimination method was applied:
  - the complete pairwise correlation matrix is calculated
  - all pairs of columns with a correlation coefficient (r) above a user-defined threshold (e.g. 0.7) are considered for elimination
  - from each correlating pair, the column that is better described by multiple linear regression on the remaining descriptors is eliminated
  - the resulting matrix contains no pair of columns with a correlation coefficient above the threshold
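
The elimination procedure can be sketched in plain Python. Several details are assumptions where the slide is not specific: |r| is compared with the threshold (so strong negative correlations also count), the most strongly correlated remaining pair is handled first and correlations are recomputed after every removal, and "better described" is measured by the R² of a regression with intercept on all other remaining columns. Columns are lists of values over the same R-groups; the toy columns are invented.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson r; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def _centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def r_squared(y, predictors, tol=1e-10):
    """R^2 of a multiple linear regression (with intercept) of y on predictors.

    Projects the centred y onto an orthonormal basis of the centred
    predictors (modified Gram-Schmidt); near-collinear predictors are skipped.
    """
    yc = _centered(y)
    ssy = sum(a * a for a in yc)
    if ssy == 0:
        return 1.0  # a constant column is fully explained by the intercept
    basis = []
    for col in predictors:
        v = _centered(col)
        for b in basis:
            proj = sum(p * q for p, q in zip(v, b))
            v = [p - proj * q for p, q in zip(v, b)]
        norm = math.sqrt(sum(p * p for p in v))
        if norm > tol:
            basis.append([p / norm for p in v])
    explained = sum(sum(p * q for p, q in zip(yc, b)) ** 2 for b in basis)
    return explained / ssy


def _explained(k, kept, columns):
    """R^2 of column k regressed on all other kept columns."""
    return r_squared(columns[k], [columns[m] for m in kept if m != k])


def eliminate_correlated(columns, threshold=0.7):
    """Return indices of the columns kept after correlation-based elimination."""
    kept = list(range(len(columns)))
    while True:
        worst = None
        for i, j in combinations(kept, 2):
            r = abs(pearson(columns[i], columns[j]))
            if r > threshold and (worst is None or r > worst[0]):
                worst = (r, i, j)
        if worst is None:
            return kept
        _, i, j = worst
        # Drop the member of the pair that the other columns describe better.
        if _explained(i, kept, columns) >= _explained(j, kept, columns):
            kept.remove(i)
        else:
            kept.remove(j)


columns = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10.5], [5, 1, 4, 2, 3]]
print(eliminate_correlated(columns, threshold=0.7))  # [1, 2]
```

Recomputing after each removal means the loop stops exactly when the slide's end condition holds: no remaining pair exceeds the threshold.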

Correlation Analysis
Example: hydantoin core, all 7 proteins, all 4 attachment points
[Figure: descriptor set 1, descriptor set 2, descriptor set 3]

Correlation Analysis
Thrombin with the three most information-rich core positions

Descriptor Validation
- Five peptide datasets taken from the literature (refs. in Matter, H., J. Peptide Res. 52 (1998))
- Product descriptors are generated by concatenating the respective reagent descriptors
- Validation by PLS analysis with leave-one-out (LOO) and leave-random-groups-out (LRGO) cross-validation
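
A minimal sketch of this validation set-up, assuming one activity value per peptide and a single-response PLS model (NIPALS PLS1, written out with the standard library only). `product_descriptor` concatenates reagent vectors in a fixed position order; q² is computed as 1 - PRESS/SS, where PRESS comes from predicting each held-out group with a model fitted without it. The number of components, the grouping and all function names are illustrative assumptions. In practice LRGO is usually repeated over several random partitions and the q² values averaged, and a dedicated package (for example scikit-learn's `PLSRegression`) would normally replace hand-written PLS.

```python
import random


def product_descriptor(reagent_descriptors, reagents):
    """Concatenate reagent descriptor vectors in a fixed position order (e.g. R1, R2)."""
    return [value for name in reagents for value in reagent_descriptors[name]]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _column(rows, j):
    return [row[j] for row in rows]


def pls1_fit(X, y, n_components):
    """Single-response PLS (NIPALS). Returns (x_means, y_mean, [(w, p, q), ...])."""
    m = len(X[0])
    x_means = [sum(_column(X, j)) / len(X) for j in range(m)]
    y_mean = sum(y) / len(y)
    E = [[row[j] - x_means[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                  # centred y
    components = []
    for _ in range(n_components):
        w = [_dot(_column(E, j), f) for j in range(m)]           # X^T y
        w_norm = _dot(w, w) ** 0.5
        if w_norm < 1e-12:                                       # no X-y covariance left
            break
        w = [v / w_norm for v in w]
        t = [_dot(row, w) for row in E]                          # scores
        tt = _dot(t, t)
        p = [_dot(_column(E, j), t) / tt for j in range(m)]      # X loadings
        q = _dot(f, t) / tt                                      # y loading
        E = [[row[j] - ti * p[j] for j in range(m)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        components.append((w, p, q))
    return x_means, y_mean, components


def pls1_predict(model, x):
    """Predict one sample by deflating it component by component."""
    x_means, y_mean, components = model
    e = [xi - mi for xi, mi in zip(x, x_means)]
    y_hat = y_mean
    for w, p, q in components:
        t = _dot(e, w)
        y_hat += q * t
        e = [ei - t * pi for ei, pi in zip(e, p)]
    return y_hat


def cross_validated_q2(X, y, n_components, groups):
    """q2 = 1 - PRESS / SS; each group is predicted by a model fitted without it."""
    press = 0.0
    for group in groups:
        held_out = set(group)
        train = [i for i in range(len(y)) if i not in held_out]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train], n_components)
        press += sum((y[i] - pls1_predict(model, X[i])) ** 2 for i in group)
    y_mean = sum(y) / len(y)
    return 1.0 - press / sum((v - y_mean) ** 2 for v in y)


def loo_groups(n):
    """Leave-one-out: every sample is its own group."""
    return [[i] for i in range(n)]


def random_groups(n, n_groups, seed=0):
    """One random partition for leave-random-groups-out (LRGO)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::n_groups] for k in range(n_groups)]


X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 6.0], [6.0, 4.0]]
y = [a + 2.0 * b for a, b in X]  # exactly linear toy response
print(round(cross_validated_q2(X, y, 2, loo_groups(len(y))), 6))  # 1.0
```

The toy response is exactly linear in two descriptors, so a two-component model predicts every held-out sample and q² reaches 1; real peptide data would give lower values, and q² is then used to judge whether a descriptor set is predictive.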

Descriptor Validation
Datasets:

Descriptor Validation: Results
Leave-random-groups-out (LRGO) results:
[Figure: LRGO results for the datasets ACE, BIT, BRA, ENK and BR9]

Summary
- Flexsim-R comprises a novel virtual affinity fingerprint method, which calculates meaningful 3D descriptors for reagents
- High correlation between different cores and attachment points
- For 3 out of 5 validation sets, significant cross-validated q² values could be obtained
- The R-group alignment problem is tackled inherently
- Flexsim-R calculations are fast and can be automated easily:
  - only clipped reagent structures are required
  - core positions need to be calculated only once

Outlook
- More validation sets have to be tested (e.g. a "real-life" combichem dataset)
- Is there a set of descriptors that works well for different datasets?
- Integration into the Boehringer Ingelheim library design and virtual screening workflow

Acknowledgements
- Alexander Weber (Boehringer Ingelheim/University of Marburg)
- Andreas Teckentrup (Boehringer Ingelheim)
- Hans Matter (Aventis)
- BMBF for financial support