RAPID: Randomized Pharmacophore Identification for Drug Design PW Finn, LE Kavraki, JC Latombe, R Motwani, C Shelton, S Venkatasubramanian, A Yao Presented.

Slides:



Advertisements
Similar presentations
Overcoming Limitations of Sampling for Agrregation Queries Surajit ChaudhuriMicrosoft Research Gautam DasMicrosoft Research Mayur DatarStanford University.
Advertisements

Case Study: Dopamine D 3 Receptor Anthagonists Chapter 3 – Molecular Modeling 1.
LIBRA: Lightweight Data Skew Mitigation in MapReduce
Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Data Mining Feature Selection. Data reduction: Obtain a reduced representation of the data set that is much smaller in volume but yet produces the same.
Hidden Markov Models (1)  Brief review of discrete time finite Markov Chain  Hidden Markov Model  Examples of HMM in Bioinformatics  Estimations Basic.
A 3-D reference frame can be uniquely defined by the ordered vertices of a non- degenerate triangle p1p1 p2p2 p3p3.
Assessment. Schedule graph may be of help for selecting the best solution Best solution corresponds to a plateau before a high jump Solutions with very.
Instructor: Mircea Nicolescu Lecture 13 CS 485 / 685 Computer Vision.
Relevance Feedback Content-Based Image Retrieval Using Query Distribution Estimation Based on Maximum Entropy Principle Irwin King and Zhong Jin Nov
Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.
Clustering short time series gene expression data Jason Ernst, Gerard J. Nau and Ziv Bar-Joseph BIOINFORMATICS, vol
Two Examples of Docking Algorithms With thanks to Maria Teresa Gil Lucientes.
Docking Algorithm Scheme Part 1: Molecular shape representation Part 2: Matching of critical features Part 3: Filtering and scoring of candidate transformations.
Protein Docking and Interactions Modeling CS 374 Maria Teresa Gil Lucientes November 4, 2004.
Taking a Numeric Path Idan Szpektor. The Input A partial description of a molecule: The atoms The bonds The bonds lengths and angles Spatial constraints.
Summary Molecular surfaces QM properties presented on surface Compound screening Pattern matching on surfaces Martin Swain Critical features Dave Whitley.
Agenda A brief introduction The MASS algorithm The pairwise case Extension to the multiple case Experimental results.
Luddite: An Information Theoretic Library Design Tool Jennifer L. Miller, Erin K. Bradley, and Steven L. Teig July 18, 2002.
FLEX* - REVIEW.
BL5203: Molecular Recognition & Interaction Lecture 5: Drug Design Methods Ligand-Protein Docking (Part I) Prof. Chen Yu Zong Tel:
A unified statistical framework for sequence comparison and structure comparison Michael Levitt Mark Gerstein.
Pattern Recognition. Introduction. Definitions.. Recognition process. Recognition process relates input signal to the stored concepts about the object.
Experimental Evaluation
Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion Mehmet Serkan Apaydin, Douglas L. Brutlag, Carlos.
Model Database. Scene Recognition Lamdan, Schwartz, Wolfson, “Geometric Hashing”,1988.
How to Stall a Motor: Information-Based Optimization for Safety Refutation of Hybrid Systems Todd W. Neller Knowledge Systems Laboratory Stanford University.
Radial Basis Function Networks
Inverse Kinematics for Molecular World Sadia Malik April 18, 2002 CS 395T U.T. Austin.
Molecular Descriptors
NUS CS5247 A dimensionality reduction approach to modeling protein flexibility By, By Miguel L. Teodoro, George N. Phillips J* and Lydia E. Kavraki Rice.
Similarity Methods C371 Fall 2004.
Conformational Sampling
An Efficient Approach to Clustering in Large Multimedia Databases with Noise Alexander Hinneburg and Daniel A. Keim.
The X-Tree An Index Structure for High Dimensional Data Stefan Berchtold, Daniel A Keim, Hans Peter Kriegel Institute of Computer Science Munich, Germany.
Particle Filters for Shape Correspondence Presenter: Jingting Zeng.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
1 Multiple Classifier Based on Fuzzy C-Means for a Flower Image Retrieval Keita Fukuda, Tetsuya Takiguchi, Yasuo Ariki Graduate School of Engineering,
Protein Classification II CISC889: Bioinformatics Gang Situ 04/11/2002 Parts of this lecture borrowed from lecture given by Dr. Altman.
Optimal XOR Hashing for a Linearly Distributed Address Lookup in Computer Networks Christopher Martinez, Wei-Ming Lin, Parimal Patel The University of.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Computer Graphics and Image Processing (CIS-601).
SemiBoost : Boosting for Semi-supervised Learning Pavan Kumar Mallapragada, Student Member, IEEE, Rong Jin, Member, IEEE, Anil K. Jain, Fellow, IEEE, and.
Spectral Sequencing Based on Graph Distance Rong Liu, Hao Zhang, Oliver van Kaick {lrong, haoz, cs.sfu.ca {lrong, haoz, cs.sfu.ca.
1 Embedding and Similarity Search for Point Sets under Translation Minkyoung Cho and David M. Mount University of Maryland SoCG 2008.
Selecting Diverse Sets of Compounds C371 Fall 2004.
Chapter 5 Ranking with Indexes 1. 2 More Indexing Techniques n Indexing techniques:  Inverted files - best choice for most applications  Suffix trees.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Time-Efficient Flexible Superposition of Medium-sized Molecules Presented by Tamar Sharir (Lemmen & Lengauer)
Modeling Protein Flexibility with Spatial and Energetic Constraints Yi-Chieh Wu 1, Amarda Shehu 2, Lydia Kavraki 2,3  Provided an approach to generating.
Ranking of Database Query Results Nitesh Maan, Arujn Saraswat, Nishant Kapoor.
Data Mining and Decision Support
4. Molecular Similarity. 2 Similarity and Searching Historical Progression Similarity Measures Fingerprint Construction “Pathological” Cases MinMax- Counts.
Protein structure prediction Computer-aided pharmaceutical design: Modeling receptor flexibility Applications to molecular simulation Work on this paper.
Chapter 3 Brute Force Copyright © 2007 Pearson Addison-Wesley. All rights reserved.
Computational Approach for Combinatorial Library Design Journal club-1 Sushil Kumar Singh IBAB, Bangalore.
Clustering Data Streams A presentation by George Toderici.
Molecular mechanics Classical physics, treats atoms as spheres Calculations are rapid, even for large molecules Useful for studying conformations Cannot.
SemiBoost : Boosting for Semi-supervised Learning Pavan Kumar Mallapragada, Student Member, IEEE, Rong Jin, Member, IEEE, Anil K. Jain, Fellow, IEEE, and.
Deformation Modeling for Robust 3D Face Matching Xioguang Lu and Anil K. Jain Dept. of Computer Science & Engineering Michigan State University.
Page 1 Computer-aided Drug Design —Profacgen. Page 2 The most fundamental goal in the drug design process is to determine whether a given compound will.
Chapter 4 Basic Estimation Techniques
Simplified picture of the principles used for multiple copy simultaneous search (MCSS) and for computational combinatorial ligand design (CCLD). Simplified.
Geometric Hashing: An Overview
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Implementation of Relational Operations
Mr.Halavath Ramesh 16-MCH-001 Dept. of Chemistry Loyola College University of Madras-Chennai.
Mr.Halavath Ramesh 16-MCH-001 Dept. of Chemistry Loyola College University of Madras-Chennai.
Mr.Halavath Ramesh 16-MCH-001 Dept. of Chemistry Loyola College University of Madras-Chennai.
Mr.Halavath Ramesh 16-MCH-001 Dept. of Chemistry Loyola College University of Madras-Chennai.
Presentation transcript:

RAPID: Randomized Pharmacophore Identification for Drug Design PW Finn, LE Kavraki, JC Latombe, R Motwani, C Shelton, S Venkatasubramanian, A Yao Presented by Greg Goldgof

Key Terms Pharmacophore/Invariant - a specific, three dimensional map of biological properties common to all active conformations of a set of ligands which exhibit a particular activity. Conceptually, a pharmacophore is a distillation of the functional attributes of ligands which accomplish a specific task (Kavraki). Feature (from AI) – a property of elements in a search space that is relevant to evaluation. Ligand – a small molecule that binds to a site on a macromolecule’s surface by intermolecular forces (Wikipedia). Conformation – A specific structural arrangement of a molecule (Wikipedia).

Reason for Research “The identification of pharmacophores is crucial in drug design since frequently the structure of targeted receptor is unknown but a number of molecules that interact with it have been discovered by experiments.” “In these cases the pharmacophore is used as a template for building more effective drugs.” “It is expected that our techniques and results will prove useful in other applications such as molecular database screening and comparative molecular field analysis.”

Pharmacophore Identification Problem General: “Given a set of ligands that interact with the same receptor, find geometric invariants of these ligands.” CS Terms: “[Find] a set of features embedded in R 3 that is present in one or more valid conformations of each of the ligands”

What is RAPID’s Strategy (pg3) Problem 1 (Conformational Search) Given a collection of ligands M = {M 1, M 2, … M n }, the degrees of freedom for each of them, and an energy function E, find for each M i a set of conformations C(M i ) = {C i1, C i2, …, C ik i }, such that E(C ij ) TOLERANCE for l != j and 1 <= j, l <= k i, where THRESHOLD and TOLERANCE are pre-specified values and d(.,.) is a distance function Problem 2 (Invariant Identification ) Given a collection of ligands M = {M 1, M 2, … M n }, where each M i has a set of conformations C(M i ) = {C i1, C i2, …, C ik i }, determine a set of labeled points S in R 3 with the property that for all i E {1, …, N} there exists some Cij E C(M i ) such that S is congruent to some subset of Cij. A solution S, if it exists, is called an invariant of M. In RAPID the identification of geometric invariants in a collection of flexible ligands denoted by M = { M1, M2, … Mn }, is treated as a two-stage process addressing the two following problems: In practice the input may contain ligands that do not contain the pharmacophore This requires us to consider a relaxation of Problem 2 above where a geometric invariant need only be present in conformations of some K of the N molecules

Conformational Search (pg4) “In, practice, only the torsional degrees of freedom are considered since these are the ones that exhibit large variations in their values.” “We obtain a random conformation by selecting each degree of freedom from its allowed range according to a user-specified distribution.” “An efficient minimizer is then used to obtain conformations at local energy minima” (most time-consuming step).

“To obtain a representative set of conformations from our sample we partition it into sets that reflect geometric similarity as captured by the distance measure DRMS…This transformation is computed using a basis of three predefined atoms…The clustering algorithm…is an approximation algorithm that runs in time O(nk) where n are the conformations to be clustered and k is the number of clusters, and guarantees a solution within twice the optimal value.” “The centers of the clusters are returned as representatives of the possible conformations of the molecule.”

Why use a randomized technique? “A systematic procedure has a higher chance of missing the irregularly shaped basins of attraction of the energy landscape of the molecule.”

Identification of Invariants Pairwise Matching Multiple Matching

Pairwise Matching: MATCH (pg5) BASIC-SAMPLE For some constant c perform c log n/ α 3 iterations of the following process: sample a triplet of points randomly from P1; determine three points in P2 congruent to this set; compute the resulting induced transformation and determine the number of points in P1 matching corresponding points in P2; and if this number exceeds n declare SUCCESS. Theorem 1 Given a common subset S of size |S| >= αn, the probability that BASIC- SAMPLE fails to declare SUCCESS is O(1/n). Theorem 2 BASIC-SAMPLE runs in time O~(n^2.8/ α^3) using space O(n^2). Runtime profiling revealed that BASIC-SAMPLE examines many spurious triples, i.e. tuples that do not yield a large invariant We propose the following modification of the random sampling procedure to handle this problem PARTITION-SAMPLE For some constant c, perform c log n iterations of the following process: randomly select two subsets A and B of size 1/ α from P1; also select a subset C of size 1/ α from P2; store all distances d(p, q) for all p E C and q E P 2 - C in a hash table for every triangle (a, b, q) with a E A, b E B, and p E P1 – (A U B), probe for d(p,a) and d(p,b) in the hash table to determine all matching triplets (c, p1, p2) with c E C and p1, p2 E p2 – C; finally as before, if the resulting transformation induces a match of more than n points declare SUCCESS. Theorem 3 Given a common subset S of size |S| >= αn, the probability that PARTITION-SAMPLE fails to declare SUCCESS is O(1/n). Theorem 4 PARTITION-SAMPLE runs in time O~(n^3.4/ α^3) using space O(n/ α 2 ).

Multiple Matching (pg6) Perform multiple pair-wise MATCH calls so that each of the molecules are examined for the pharmacophore. “We use a marking scheme to keep track of the number of times an invariant fails to match against a molecule, and reject those invariants which exceed the maximum allowed number of failures.”

Results Demonstration of randomized technique for finding conformations. Partition-Sample works better and faster even though it has a worse big-O runtime, because BASIC-SAMPLE examines many useless triples. It found a 7-atom pharmacophore that existed in all 4 molecules.

Question?