GraphSig: Mining Significant Substructures in Compound Libraries

GraphSig
Input:
– Diverse background database
– Libraries of compounds with specific activity
Output:
– Prioritized list of significant substructures
Provides powerful insight into structure-activity relationships in the form of significant substructures.

Applications of GraphSig
What makes compounds active against a target?
– Develop pharmacophore models based on results
What substructures impart specific biological activity (e.g., BBB permeability)?
– Screen compounds for similar activity

Key Benefits
Only automatic tool that identifies significant substructures
– No other tools can mine structure-activity relationships based on topology
– Unique use of a background statistical model derived from diverse compound libraries
Scales to large databases
– Two orders of magnitude faster than alternatives

Validation Studies
Briem & Lessel dataset
– Find substructures specific to activity classes, along with their significance and support.
hERG dataset
– Find substructures that are significant for toxicity.
Permeability datasets (oral bioavailability, blood-brain barrier)
– Find substructures that are significant for permeability.

Briem and Lessel (BL) Dataset
A 957-compound subset of the MDL Drug Data Report (MDDR), classified by biological activity.
[Table: activity classes]

Interesting Substructures from the BL Dataset
[Figure: substructures, with the following captions]
– Found in HMG-CoA inhibitors only
– Found in PAF antagonists and 5HT3 inhibitors only
– Found in all ACE inhibitors
– Found in PAF antagonists and ACE inhibitors only
– Found in PAF and TXA antagonists and 5HT3 inhibitors only
– Found in PAF antagonists and TXA inhibitors only
– Found in PAF antagonists only

hERG Dataset
This dataset consists of compounds known to be active or inactive as hERG blockers. Applying GraphSig to it yielded the following interesting substructures.
[Figure: significant substructures contained only in hERG blockers]
[Figure: significant substructures contained in compounds that are not hERG blockers]

BBB Permeability
BBB dataset: 1593 compounds labeled as BBB permeable or not
– Significant substructures belong exclusively to either the permeable or the non-permeable set, not both. Hence, the substructures are representative of each set.
– The frequency of occurrence of these substructures is very low (0.2–5%), so they would not be found on the basis of frequency alone.
– Classification accuracy of 82% with 5-fold cross-validation
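
To illustrate why a rare substructure can still be significant, here is a minimal sketch that assumes a simple binomial background model: if a substructure occurs in a random background compound with probability p, the p-value of seeing it in at least k of n labeled compounds is the binomial upper tail. The function name and all numbers are hypothetical, and this is not presented as GraphSig's exact statistical model.

```python
from math import comb

def binomial_tail_pvalue(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing the substructure
    in at least k of n compounds if it occurred at the background rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers: 500 permeable compounds, substructure seen in 10 (2%),
# while the assumed background occurrence rate is 0.1%.
rare_but_enriched = binomial_tail_pvalue(n=500, k=10, p=0.001)

# Hypothetical frequent substructure: seen in 250 of 500 compounds (50%),
# but the assumed background rate is also 50%.
frequent_but_expected = binomial_tail_pvalue(n=500, k=250, p=0.5)

print(rare_but_enriched, frequent_but_expected)
```

Under these assumed rates the rare substructure receives a far smaller p-value than the frequent one, which is the sense in which ranking by frequency alone would miss it.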

Connections to Other Acelot Tools
– Once a substructure is identified, use SimFinder for similarity searching.
– Perform 3D alignment of compounds that contain the specific structure and find well-aligned pharmacophoric features.

Technical Details: Mining
[Figure: mining pipeline]
Graph DB -> (RWR on graphs) -> Feature vectors -> (Feature vector mining) -> Significant feature vectors -> Sets of subgraphs -> (Frequent subgraph mining, high frequency threshold) -> Significant subgraphs
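
The "RWR on graphs" step refers to random walk with restart. Below is a minimal sketch of that general technique, assuming an unweighted, undirected graph given as an adjacency dictionary and a restart probability of 0.25 (the function name, parameters, and toy graph are all assumptions). At each step the walker jumps back to the source node with probability `restart` and otherwise moves to a uniformly chosen neighbor. The slides do not show how GraphSig turns such walks into feature vectors, so this illustrates only the walk itself.

```python
def random_walk_with_restart(adj, source, restart=0.25, tol=1e-10, max_iter=1000):
    """Stationary visit probabilities of a walk that jumps back to `source`
    with probability `restart` and otherwise moves to a random neighbor."""
    nodes = list(adj)
    prob = {v: 0.0 for v in nodes}
    prob[source] = 1.0
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        for v in nodes:
            if not adj[v]:
                # Assumption: a node without neighbors sends its mass back to the source.
                nxt[source] += (1 - restart) * prob[v]
                continue
            share = (1 - restart) * prob[v] / len(adj[v])
            for u in adj[v]:
                nxt[u] += share
        # Probabilities sum to 1, so the total restart mass equals `restart`.
        nxt[source] += restart
        delta = sum(abs(nxt[v] - prob[v]) for v in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob

# Hypothetical toy molecule graph: a 4-atom chain a-b-c-d.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = random_walk_with_restart(chain, source="a")
print(scores)
```

The resulting scores decay with distance from the source, so they summarize the local neighborhood around a node.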

Technical Details: in silico Prediction

References
Huahai He and Ambuj K. Singh. GraphRank: Statistical Modeling and Mining of Significant Subgraphs in the Feature Space. Proceedings of the 6th IEEE International Conference on Data Mining (ICDM), December 2006. doi: /ICDM
Sayan Ranu and Ambuj Singh. GraphSig: A Scalable Approach to Mining Significant Subgraphs in Large Graph Databases. 25th International Conference on Data Engineering (ICDE), 2009.
Sayan Ranu and Ambuj Singh. Mining Statistically Significant Molecular Substructures for Efficient Molecular Classification. J. Chem. Inf. Model., 2009, 49(11), pp 2537–2550. doi: /ci900035z
Hans Briem and Uta F. Lessel. Perspect. Drug Discov. Dec (20),