Time-Efficient Flexible Superposition of Medium-sized Molecules Presented by Tamar Sharir (Lemmen & Lengauer)


Outline
Definitions
Goals in superposition of molecules
Structure-activity relations
Problem definition
Assumptions and simplifications
Biological background for the algorithm
The main algorithm
Modifications and improvements
Results
Summary

[Figure: a receptor, a ligand and the receptor pocket] What does it “look like”?

Definitions
Receptor - a protein molecule that gives a biological response upon binding chemically complementary molecules.
Ligand - a small organic molecule that forms a complex with the receptor.
Receptor pocket - the binding area (site).

Definitions (cont.)
Pharmacophore model - can be considered the largest common denominator shared by a set of active molecules. It represents an abstract concept that accounts for the common molecular interaction capacities of a group of compounds towards their target structure.
[Figure: ligands L1 and L2 in a receptor, with their shared pharmacophore]

Areas of Interest
Pharmaceutical research - design molecules that interfere with specific biochemical pathways in living systems.
Drug design - develop small organic molecules with a high binding affinity towards a given receptor (competition).

So we have a receptor and we have a ligand. Where is the problem?

Structure-Activity Relationship
If the 3D structure of the receptor is known, it is enough.
But it is not always available!
In many cases, we only know a set of ligands together with their biological activities towards a receptor.
Structure-activity relationship studies (3D QSAR) aim to correlate measured activities with structure-based properties of the ligands.

What can we do with the results?
Extract the relevant chemical features of the ligands.
Create a pharmacophore model.
Search for ligands with the same activity.
Provide an estimate of the binding affinity of a novel ligand towards a given receptor.
Take the negative imprint of the set of superimposed ligands as a crude description of the binding pocket (receptor modeling).

The Problem, Visualized [figure]

Problem Definition
Input: two molecules:
The reference ligand - rigid, given in its conformation inside the receptor pocket.
The test ligand - flexible, given in an arbitrary conformation.
Output: the best structural alignment of the two molecules, computed within a short given time (best = highest score).

Overall Goal
Drastically reduce run time while limiting the inaccuracies of the model and the computation to a tolerable level.

Existing Approaches
Some methods need to be given the pharmacophore that displays the commonalities of both ligands.
Other methods treat both molecules as rigid.
Methods that handle molecular flexibility without extraneous knowledge of the commonalities of both ligands are rare, but in high demand.
This method takes into account the molecular flexibility of the test ligand and needs no predefined information on the pharmacophore shared by the reference and test ligands.

Assumptions & Simplifications
1. Reference and test ligands occupy maximally overlapping areas in space.
2. Reference and test ligands usually interact with the same functional groups of the amino acids in the binding pocket.
3. Only pairs of ligands are considered (no multiple superposition of several ligands).
4. The degrees of freedom are reduced to the torsional degrees of freedom of the test ligand.
5. Atoms of the reference ligand are kept fixed in space.
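Assumption 4 means that a conformation of the test ligand is described only by its torsion angles. The following is a minimal illustration, not the authors' code: it assumes the atoms on the moving side of a rotatable bond are already known (a hypothetical input) and rotates them about the bond axis with Rodrigues' rotation formula. The name `rotate_about_bond` is hypothetical.

```python
import math


def rotate_about_bond(points, axis_start, axis_end, angle_deg):
    """Rotate `points` by `angle_deg` about the axis axis_start -> axis_end.

    Uses Rodrigues' rotation formula. It is a rigid rotation, so distances
    within the moving group and to the bond axis are preserved.
    """
    axis = [e - s for s, e in zip(axis_start, axis_end)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]  # unit vector along the rotatable bond
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = []
    for p in points:
        v = [pc - sc for pc, sc in zip(p, axis_start)]  # position relative to the axis origin
        cross = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])
        dot = sum(kc * vc for kc, vc in zip(k, v))
        r = [v[i] * cos_t + cross[i] * sin_t + k[i] * dot * (1.0 - cos_t) for i in range(3)]
        rotated.append(tuple(r[i] + axis_start[i] for i in range(3)))
    return rotated
```

Applying such a rotation to the moving atoms of each rotatable bond explores exactly the conformations allowed under assumption 4, while bond lengths and bond angles stay fixed.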

Why do we allow these simplifications?
Strong binding requires optimal space-filling of the binding pocket.
The more rigid the molecules, the higher their binding affinity.
The run time is small enough to perform several runs:
with different conformations of the reference ligand;
pairwise comparisons among a larger set of ligands.
Runs can be performed independently and in parallel.
Existing methods can be used for refining the superposition.

How do we score?
We use physicochemical properties of the ligands not only for scoring, but also for generating the solutions.
The two main contributions to the score:
1. Paired intermolecular interactions
2. Overlap volumes: van der Waals volume, electrostatic potential, hydrophobicity, hydrogen-bonding donor and acceptor potentials

How do we score? (cont.)
The contributions to the scoring function are divided into two groups, called hard and soft criteria.
The hard criteria can be used to generate placements and to reject unsatisfactory ones (example: a minimum threshold for the overlap volume serves as a criterion to reject unlikely placements).
The soft criteria are used only for scoring and not for eliminating unlikely solutions (example: the scoring terms for the paired intermolecular interactions).
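A minimal sketch of how the two kinds of criteria could interact, assuming each placement has already been reduced to an overlap volume and a paired-interaction score. The threshold `MIN_OVERLAP`, the weights `W_MATCH` and `W_OVERLAP`, and the linear combination itself are hypothetical choices for illustration; the slides do not give the actual scoring function.

```python
from dataclasses import dataclass

MIN_OVERLAP = 50.0            # hypothetical hard threshold on overlap volume (cubic angstroms)
W_MATCH, W_OVERLAP = 1.0, 0.1  # hypothetical weights of the soft score


@dataclass
class Placement:
    overlap_volume: float  # used both as a hard and as a soft criterion
    match_score: float     # paired-interaction score: soft criterion only


def passes_hard_criteria(p):
    """Hard criterion: reject placements with too little overlap."""
    return p.overlap_volume >= MIN_OVERLAP


def soft_score(p):
    """Soft criteria only rank placements; they never reject one."""
    return W_MATCH * p.match_score + W_OVERLAP * p.overlap_volume


def rank_placements(placements):
    survivors = [p for p in placements if passes_hard_criteria(p)]
    return sorted(survivors, key=soft_score, reverse=True)
```

Keeping the hard filter separate from the soft score lets cheap rejections happen early, before any detailed scoring.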

Paired Intermolecular Interactions
Interaction surfaces are defined. They amount to sections of a spherical surface surrounding the functional group of interest.
To each such interaction center a particular interaction type is attributed.
Intermolecular interactions with a potential receptor atom that are plausible for both ligands are paired and contribute a term to the overall score.

Paired Intermolecular Interactions (cont.)
[Figure: reference ligand and test ligand with interaction surfaces around their O and N-H groups, facing a hypothetical receptor site]

Paired Intermolecular Interactions (cont.)
Sets of paired intermolecular interactions are called matches.
To quantify the weight of a match, a scoring function is defined.
Summing over the contributions of all matches yields the match score.
[Figure: ligands L1 and L2 paired through interactions with the receptor]
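To make "summing over the contributions of all matches" concrete, here is a hedged sketch. It assumes interaction points carry a type label, that only pairs listed in a hypothetical `COMPATIBLE` table can form matches, and that each match contributes a Gaussian of the distance between the paired points. `SIGMA`, `CUTOFF` and the Gaussian form are assumptions, not the published scoring function.

```python
import math

# Hypothetical table of interaction types that may be paired.
COMPATIBLE = {("donor", "donor"), ("acceptor", "acceptor"), ("hydrophobic", "hydrophobic")}
SIGMA = 1.0   # hypothetical width (angstroms) of the distance penalty
CUTOFF = 1.5  # hypothetical maximum distance for a pair to count as a match


def match_contribution(dist):
    """Weight of one match: 1.0 for coinciding points, decaying with distance."""
    return math.exp(-(dist / SIGMA) ** 2)


def match_score(ref_points, test_points):
    """Points are (interaction_type, (x, y, z)); returns the summed match score."""
    total = 0.0
    for t_ref, x_ref in ref_points:
        for t_test, x_test in test_points:
            if (t_ref, t_test) not in COMPATIBLE:
                continue
            d = math.dist(x_ref, x_test)
            if d <= CUTOFF:
                total += match_contribution(d)
    return total
```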

Overlap volumes of different chemical properties provide the major contributions to the binding affinity towards the receptor.
We assume that two ligands which achieve a similar binding affinity have similar chemical fingerprints inside the receptor pocket.
The scoring scheme also considers the physicochemical properties of both ligands.

The Algorithm
Fragmentation and determination of a base fragment
Placement of the base fragment (onto the reference ligand)
Iterative, incremental construction of the entire test ligand
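The fragmentation step can be illustrated with a small graph sketch. It assumes the molecule is given as atom indices plus a bond list, and that the set of rotatable bonds to cut is already known (how they are detected is not covered here). Fragments are then the connected components that remain. Picking the largest fragment as the base fragment is a hypothetical criterion, since the slides do not state how the base fragment is chosen.

```python
from collections import defaultdict


def fragment(num_atoms, bonds, rotatable):
    """Split a molecule into rigid fragments by cutting rotatable bonds.

    bonds: list of (i, j) atom index pairs.
    rotatable: set of frozenset({i, j}) bonds to cut.
    Returns a list of sorted atom-index lists, one per fragment.
    """
    adj = defaultdict(list)
    for i, j in bonds:
        if frozenset((i, j)) in rotatable:
            continue  # cut: this bond becomes a torsional degree of freedom
        adj[i].append(j)
        adj[j].append(i)
    seen, fragments = set(), []
    for start in range(num_atoms):
        if start in seen:
            continue
        comp, stack = [], [start]
        seen.add(start)
        while stack:  # depth-first search over the remaining bonds
            a = stack.pop()
            comp.append(a)
            for b in adj[a]:
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        fragments.append(sorted(comp))
    return fragments


def choose_base_fragment(fragments):
    # Hypothetical rule: take the largest fragment.
    return max(fragments, key=len)
```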

1. Placing the Base Fragment
1. Approximate the interaction surfaces by sets of points.
2. Search for nearly congruent triangles of such interaction points in both ligands.
3. Each pair of nearly congruent triangles determines a unique transformation that superimposes the triangle in one molecule onto the triangle in the other molecule.
Through this operation, a possible placement of the fragment under consideration is defined.
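Step 3 can be sketched with a frame-based construction: an ordered, non-degenerate triangle defines an orthonormal frame, and the rotation that maps the test-triangle frame onto the reference-triangle frame, plus the translation that maps first corner onto first corner, gives the placement. This is one standard way to obtain the transformation, assumed here for illustration; the authors' exact construction (for example, a least-squares fit over all three corners) may differ, and for triangles that are only nearly congruent it is exact only for the first corner, the first edge direction and the triangle plane. All function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def frame(p1, p2, p3):
    """Orthonormal frame (e1, e2, e3) of an ordered, non-degenerate triangle."""
    e1 = _unit(_sub(p2, p1))
    v = _sub(p3, p1)
    e2 = _unit(_sub(v, tuple(_dot(v, e1) * c for c in e1)))  # Gram-Schmidt step
    e3 = _cross(e1, e2)
    return e1, e2, e3


def transform_from_triangles(test_tri, ref_tri):
    """Rotation R (list of rows) and translation t that map test_tri onto ref_tri."""
    f_test, f_ref = frame(*test_tri), frame(*ref_tri)
    # R = sum_k ref_k (outer) test_k sends each test frame axis to the matching reference axis.
    R = [[sum(f_ref[k][i] * f_test[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    p, q = test_tri[0], ref_tri[0]
    t = tuple(q[i] - sum(R[i][j] * p[j] for j in range(3)) for i in range(3))
    return R, t


def apply_transform(R, t, x):
    return tuple(sum(R[i][j] * x[j] for j in range(3)) + t[i] for i in range(3))
```

Applying `apply_transform` to every atom of the base fragment yields one candidate placement.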

1. Placing the Base Fragment (cont.)
Data structures: the triangles of the reference ligand are stored in a triangle hash table (RL-table) in a preprocessing step.
A query to this table with a triangle from the test ligand (query triangle) returns a list of all triangles in the reference ligand that are nearly congruent to it.
Each pair consisting of the query triangle and a triangle in this list defines one placement of the base fragment over the reference ligand.
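A minimal sketch of an RL-table with a tolerance, assuming side lengths are binned with a hypothetical width `BIN` and that "nearly congruent" means identical corner types and every side length within `BIN` of the query. Because a difference of at most `BIN` shifts a length by at most one bin, checking the 27 neighboring bins is enough to find every candidate. Triangles are assumed to be stored in the canonical corner order of the labeling scheme; the hashing scheme used in the paper may differ.

```python
from collections import defaultdict
from itertools import product

BIN = 0.5  # hypothetical bin width in angstroms; also the congruence tolerance


def key_for(types, lengths):
    return (tuple(types), tuple(int(length // BIN) for length in lengths))


def build_rl_table(ref_triangles):
    """ref_triangles: list of (types, lengths, triangle_id) in canonical order."""
    table = defaultdict(list)
    for types, lengths, tri_id in ref_triangles:
        table[key_for(types, lengths)].append((lengths, tri_id))
    return table


def query(table, types, lengths):
    """Ids of reference triangles with equal types and every side within BIN."""
    base = key_for(types, lengths)[1]
    hits = []
    for offsets in product((-1, 0, 1), repeat=3):  # the 27 neighboring bins
        k = (tuple(types), tuple(b + o for b, o in zip(base, offsets)))
        for ref_lengths, tri_id in table.get(k, []):
            if all(abs(a - b) <= BIN for a, b in zip(lengths, ref_lengths)):
                hits.append(tri_id)
    return hits
```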

2. Clustering the Query Triangles
1. We label each query triangle by the types of its corners (t(p1), t(p2) and t(p3), the types of the interaction points p1, p2 and p3) and the lengths of its sides (l(p1,p2), l(p2,p3) and l(p3,p1)).
2. To make this label unique, the entries of the label [t(pi), t(pj), t(pk), l(pi,pj), l(pj,pk), l(pk,pi)] are ordered such that t(pi) <= t(pj) <= t(pk).
The number of query triangles is less than (number of interaction points)^3.

[Figure: triangle with corners p1, p2, p3 and their interaction types]
Example: if t(p3) < t(p1) = t(p2), there are two possible orderings by type: (p3, p1, p2) and (p3, p2, p1).
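The labeling rule can be sketched by enumerating the corner permutations whose types are non-decreasing. When types tie, as in the example, several orderings qualify; this sketch assumes the tie is broken by taking the lexicographically smallest label, which is an illustrative choice (with noisy side lengths such a tie-break can be unstable, so an implementation might instead store or query all qualifying orderings). `triangle_label` is a hypothetical name.

```python
import math
from itertools import permutations


def triangle_label(points):
    """points: three (type, (x, y, z)) interaction points.

    Returns [t(pi), t(pj), t(pk), l(pi,pj), l(pj,pk), l(pk,pi)] with
    t(pi) <= t(pj) <= t(pk). Type ties are broken by taking the
    lexicographically smallest label (an assumption of this sketch).
    """
    candidates = []
    for a, b, c in permutations(points):
        if a[0] <= b[0] <= c[0]:
            candidates.append((a[0], b[0], c[0],
                               math.dist(a[1], b[1]),
                               math.dist(b[1], c[1]),
                               math.dist(c[1], a[1])))
    return list(min(candidates))
```

For t(p3) < t(p1) = t(p2) with |p3p1| = 4, |p1p2| = 3 and |p2p3| = 5, both orderings from the example qualify, and (p3, p1, p2) is chosen because its first side length is smaller.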

2. Clustering the Query Triangles (cont.)
1. All query triangles are compiled in a list (called the TL-list), which is sorted lexicographically by the triangle labels. The reason for doing so is to obtain contiguous segments of triangles with identical labels (called L-segments).
2. Each triangle in the TL-list is queried against the RL-table. (In fact, we perform such queries only for the first triangle in each L-segment.)
3. The triangles retrieved from the RL-table are mapped onto each triangle in the L-segment.
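The TL-list idea maps directly onto sorting followed by grouping. In this hedged sketch, labels are assumed to be hashable tuples that have already been discretized (for example, by binning side lengths), so that identical labels actually occur, and `lookup` is a hypothetical stand-in for an RL-table query. It is called once per L-segment, and its result is reused for every triangle in that segment.

```python
from itertools import groupby


def match_query_triangles(query_triangles, lookup):
    """query_triangles: list of (label, query_id); lookup(label) -> reference ids.

    Returns (query_id, ref_id) pairs, querying the RL-table once per L-segment.
    """
    tl_list = sorted(query_triangles, key=lambda qt: qt[0])  # lexicographic by label
    matches = []
    for label, segment in groupby(tl_list, key=lambda qt: qt[0]):
        ref_ids = lookup(label)  # one query for the whole L-segment
        for _, query_id in segment:
            matches.extend((query_id, ref_id) for ref_id in ref_ids)
    return matches
```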

2. Clustering the Query Triangles (cont.)
Normally, we produce between several hundred thousand and millions of triangle matches and, consequently, as many possible placements for the base fragment.

2. Clustering the Query Triangles (cont.)
So how do we reduce the number of placements?
1. Reject matches for which the additional criterion for pairing interactions is not met.
2. Compute van der Waals overlap volumes to filter out unsatisfactory solutions.
3. Run an efficient on-line procedure to cluster similar placements.
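Step 2 needs an overlap volume. One common approximation, assumed here rather than taken from the paper, treats atoms as van der Waals spheres and sums the exact lens volumes of all reference/test sphere pairs. This overcounts regions where three or more spheres overlap, so it is only a first-order estimate. The functions and the `min_overlap` threshold are hypothetical.

```python
import math


def sphere_overlap(r1, r2, d):
    """Exact intersection volume of two spheres (radii r1, r2, center distance d)."""
    if d >= r1 + r2:
        return 0.0  # disjoint spheres
    if d <= abs(r1 - r2):
        r = min(r1, r2)
        return 4.0 / 3.0 * math.pi * r ** 3  # smaller sphere lies inside the larger
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2 * d * r2 - 3 * r2 * r2 + 2 * d * r1 + 6 * r1 * r2 - 3 * r1 * r1)
            / (12 * d))


def vdw_overlap(ref_atoms, test_atoms):
    """Atoms are (radius, (x, y, z)); sums pairwise overlaps (first-order estimate)."""
    return sum(sphere_overlap(ra, rb, math.dist(xa, xb))
               for ra, xa in ref_atoms for rb, xb in test_atoms)


def keep_placement(ref_atoms, placed_test_atoms, min_overlap):
    """Hard criterion: keep a placement only if its overlap volume is large enough."""
    return vdw_overlap(ref_atoms, placed_test_atoms) >= min_overlap
```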

3. On-Line Clustering of Placements
The first computed placement p0 is taken as a reference from then on.
For every new placement pnew, the RMS deviation dnew from p0 is determined.
Check whether there is a cluster represented by a placement p that is similar to pnew:
YES: we merge p and pnew.
NO: pnew is retained as the representative of a new cluster.

3. On-Line Clustering of Placements (cont.)
The search for p is restricted to clusters whose RMS distance d to the reference p0 falls in the range [dnew - delta, dnew + delta].
We sort all placements by their RMS distance d to p0. The sorted list is maintained as a leaf-chained search tree.
In this tree, placements within the range [dnew - delta, dnew + delta] form a contiguous segment of the leaf chain.
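The range restriction is safe because the RMS deviation between two placements of the same atoms is a metric (a scaled Euclidean distance), so by the triangle inequality any p with RMS(p, pnew) <= delta satisfies |d - dnew| <= delta. The sketch below assumes delta also serves as the similarity threshold and substitutes a Python sorted list with `bisect` for the leaf-chained search tree: range lookup stays logarithmic, but insertion costs O(n) instead of O(log n). Merging is modeled as keeping the existing representative and counting members, which is an assumption; the slides do not say which placement survives a merge. `OnlineClusterer` is a hypothetical name.

```python
import bisect
import math


def rms(a, b):
    """RMS deviation between two placements of the same atoms (same order)."""
    return math.sqrt(sum(math.dist(x, y) ** 2 for x, y in zip(a, b)) / len(a))


class OnlineClusterer:
    def __init__(self, delta):
        self.delta = delta  # similarity threshold, also the search half-width
        self.p0 = None      # first placement, used as the fixed reference
        self.dists = []     # sorted RMS distances of representatives to p0
        self.reps = []      # cluster representatives, aligned with self.dists
        self.sizes = []     # number of placements merged into each cluster

    def add(self, p_new):
        """Return True if p_new was merged into an existing cluster."""
        if self.p0 is None:
            self.p0 = p_new
        d_new = rms(p_new, self.p0)
        lo = bisect.bisect_left(self.dists, d_new - self.delta)
        hi = bisect.bisect_right(self.dists, d_new + self.delta)
        for i in range(lo, hi):  # only the contiguous candidate segment
            if rms(self.reps[i], p_new) <= self.delta:
                self.sizes[i] += 1
                return True
        pos = bisect.bisect_right(self.dists, d_new)
        self.dists.insert(pos, d_new)
        self.reps.insert(pos, p_new)
        self.sizes.insert(pos, 1)
        return False
```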

So how do we know whether the alignment we obtained is good?

Evaluation Method
Data sets: how do we use them?
Let's say we take a receptor R and ligands L1 and L2. From the data sets we know the bound structures (complexes) of some receptors with ligands.
Let's assume we know the complex of receptor R with ligand L1 and the complex of receptor R with ligand L2. We wish to find the relative placement of L1 and L2.
By superimposing the complexes R-L1 and R-L2 on the common receptor R, we obtain the real alignment of L1 and L2.

Evaluation Method (cont.)
The real alignment derived from the data sets:
[Figure: complexes R-L1 and R-L2 superimposed via the receptor R]

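Evaluation Method (cont.)
[Figure: our result for L1 and L2 compared with the real alignment]
The RMS deviation between our result and the real alignment measures the accuracy of the result.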

RMS Results
The quality of our results is measured in terms of the RMS deviation of the predicted from the measured orientation and conformation of the test ligand.
The mean RMS deviation is below 2 Å, and in many cases about 1 Å.
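For reference, the RMS deviation between predicted and observed test-ligand coordinates is usually defined as follows, assuming the N atoms of the test ligand are matched one to one and compared in the common frame of the reference ligand without any additional fitting:

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{x}_i^{\mathrm{pred}} - \mathbf{x}_i^{\mathrm{obs}} \right\rVert^{2}}
```

No extra superposition is applied because the reference ligand is held fixed, so any residual offset of the test ligand counts as error.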

Run Time Results
The mean run time over all test cases is below 4 minutes per instance.
Roughly equal parts of the run time are spent on the base placement and on the complex construction.
Only a minor fraction of the run time is spent on I/O and preprocessing.

Result Example
Black: reference ligand
White: test ligand (computed by our algorithm)
Gray: the real result (from the data set)

Result Example
[Table: receptor Carboxypeptidase A, reference ligand 7cpa, test ligands 1cbx, 2ctc, 3cpa and 6cpa; columns: run time (min) for settings (a)-(d) and accuracy (Å) for settings (a)-(c)]

Disadvantages of the Method
Inaccuracy of the solutions.
The requirement of a rigid reference ligand, whose bound conformation is not always known.
Limits the quality of results for large ligands.

Advantages of the Method
Reasonable accuracy
Fast superposition

Method Summary
Structural alignment of medium-sized organic molecules, for applications in 3D QSAR and in receptor modeling.
Ligand flexibility is modeled by decomposing the test ligand into molecular fragments.
A base fragment of the test ligand is superimposed onto the reference ligand, and the remaining fragments of the test ligand are then attached step by step.
The run time on a single problem instance is a few minutes on a standard workstation.