FLAIRS '99: Applying the SUBDUE Substructure Discovery System to the Chemical Toxicity Domain. Ravindra N. Chittimoori, Diane J. Cook, Lawrence B. Holder.

Presentation transcript:

FLAIRS '99, Slide 1: Applying the SUBDUE Substructure Discovery System to the Chemical Toxicity Domain
Ravindra N. Chittimoori, Diane J. Cook, Lawrence B. Holder
Department of Computer Science and Engineering, University of Texas at Arlington

Slide 2: Motivation and Goal
- Ever-increasing number of chemical compounds in use today (~100,000).
- Need to identify relationships between the molecular structure and the toxicity of a chemical compound.
- Apply knowledge discovery to the U.S. National Toxicology Program (NTP) data to identify such relationships.

Slide 3: Knowledge Discovery in SUBDUE
- Structural discovery system
- Graph-based input representation
- Beam search through substructure (subgraph) space
- Graph compression heuristic based on minimum description length
- Inexact, polynomial graph match
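The compression heuristic on this slide can be illustrated with a minimal sketch. It rests on a loud assumption: a simple vertex-plus-edge count stands in for description length in bits, so this is not SUBDUE's actual MDL encoding. The names `GraphSize` and `compression_value` are hypothetical, and instances are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphSize:
    """Vertex and edge counts of a graph (a crude stand-in for description length)."""
    vertices: int
    edges: int

    @property
    def size(self) -> int:
        return self.vertices + self.edges


def compression_value(graph: GraphSize, sub: GraphSize, instances: int) -> float:
    """Score a substructure by how much it shrinks the graph.

    Assumption: each non-overlapping instance is replaced by a single
    vertex, so it removes (sub.vertices - 1) vertices and sub.edges edges.
    Values above 1.0 mean the substructure compresses the graph.
    """
    compressed = GraphSize(
        vertices=graph.vertices - instances * (sub.vertices - 1),
        edges=graph.edges - instances * sub.edges,
    )
    return graph.size / (sub.size + compressed.size)


# Made-up sizes for illustration only.
whole = GraphSize(vertices=20, edges=19)
candidate = GraphSize(vertices=4, edges=3)
value = compression_value(whole, candidate, instances=4)
```

Under this scoring, a beam search would keep the highest-valued candidates at each step and extend them by one edge; that search loop is omitted here.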

Slide 4: SUBDUE Example
[Figure: three panels labeled Input Database, Substructure S1 (graph form), and Compressed Database. Recoverable labels: object, triangle, square, on, shape; T1, T2, T3, T4; S1, S2, S3, S4; R1; C1.]

Slide 5: Chemical Toxicity Domain
- Database of 367 chemicals
- Levels of evidence assigned by NTP
  - CE: clear evidence of cancerous activity
  - SE: some evidence
  - E: equivocal evidence
  - NE: no evidence

Slide 6: Predictive Toxicology Evaluation
- Predictive Toxicology Evaluation (PTE) challenge
- PTE-2 ended November
- PTE-3 scheduled for July 2000

Slide 7: Chemical Toxicity Data
- Atoms (name, type, partial charge)
- Bonds (type)
- Chemical groups
  - Alcohol, amine, amino, benzene, ester, ether, ketone, methanol, methyl, nitro, phenol and sulfide

Slide 8: Chemical Toxicity Data
- Carcinogenicity-related tests
  - Ames
  - Chromex
  - Chromaberr
  - Drosophila
  - Mouse-Lymph
  - Salmonella Assay

Slide 9: Chemical Compound Representation

Slide 10: Input Representation
- Sample atomic structure: a C atom and an H atom joined by a bond of type 1
- SUBDUE graph input:
    v 1 atom
    v 2 C
    v 3 atom
    v 4 H
    d 1 2 name
    d 3 4 name
    u 1 3 1
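A minimal sketch of how the graph input on this slide could be read into memory. It assumes, from the slide alone, that `v` lines declare a vertex (id, label) and that `d` and `u` lines declare a directed or undirected edge (source id, target id, label). The function name `parse_subdue_graph` is hypothetical and is not part of SUBDUE itself.

```python
def parse_subdue_graph(text):
    """Parse 'v', 'd', and 'u' lines into a vertex dict and an edge list.

    Returns (vertices, edges), where vertices maps id -> label and each
    edge is a tuple (source, target, label, is_directed).
    """
    vertices = {}
    edges = []
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        kind = parts[0]
        if kind == "v":
            vertices[int(parts[1])] = parts[2]
        elif kind in ("d", "u"):
            edges.append((int(parts[1]), int(parts[2]), parts[3], kind == "d"))
        else:
            raise ValueError(f"unrecognized line: {raw!r}")
    return vertices, edges


# The C-H fragment from the slide.
SAMPLE = """\
v 1 atom
v 2 C
v 3 atom
v 4 H
d 1 2 name
d 3 4 name
u 1 3 1
"""

vertices, edges = parse_subdue_graph(SAMPLE)
```

In this encoding each atom is a generic `atom` vertex whose element hangs off a directed `name` edge, and the bond is an undirected edge between the two `atom` vertices labeled with the bond type.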

Slide 11: Methodology
- Training set further divided into learning and testing sets
- Find the best substructures in the learning-set positives that are not prevalent in the negatives
- Find occurrences of each substructure in the testing set
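The selection step in this methodology can be sketched as follows. This is an illustration under strong assumptions: each compound is reduced to a set of labels, a subset test stands in for SUBDUE's graph match, and the score (positive coverage rate minus negative coverage rate) is a hypothetical choice, not the measure used in the talk. The toy compounds are made up and are not NTP data.

```python
def coverage(pattern, compounds):
    """Return (hits, total): how many compounds contain every label in pattern."""
    hits = sum(1 for compound in compounds if pattern <= compound)
    return hits, len(compounds)


def best_pattern(candidates, positives, negatives):
    """Pick the candidate that is common in positives and rare in negatives."""
    def score(pattern):
        pos_hits, pos_total = coverage(pattern, positives)
        neg_hits, neg_total = coverage(pattern, negatives)
        return pos_hits / pos_total - neg_hits / neg_total
    return max(candidates, key=score)


# Toy learning set (made up for illustration).
positives = [{"amino", "benzene"}, {"amino", "methyl"}, {"benzene", "nitro"}]
negatives = [{"benzene", "methyl"}, {"ether"}]
candidates = [frozenset({"amino"}), frozenset({"benzene"})]

chosen = best_pattern(candidates, positives, negatives)
learning_coverage = coverage(chosen, positives)
```

The same `coverage` call, applied to held-out compounds, corresponds to the last bullet: counting occurrences of the chosen substructure in the testing set.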

Slide 12: Results
[Figure: discovered substructure; recoverable labels: atom, 10, c, n, tp; atom, br, n, tp]
- Learning set: 268
  - Positive compounds: 134/143
  - Negative compounds: 24/125
- Testing set: 30
  - Positive compounds: 15/19
  - Negative compounds: 4/11

Slide 13: Results
[Figure: discovered substructure; recoverable labels: atom, 10, c, n, tp; atom, 1, h, n, tp; 0.34; atom, 32, n, n, tp; atom, h, n, tp]
- Learning set: 268
  - Positive compounds: 60/143
  - Negative compounds: 0/125
- Testing set: 30
  - Positive compounds: 8/19
  - Negative compounds: 0/11
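The counts on the two Results slides can be converted to rates for easier comparison. The short sketch below assumes that a figure such as 134/143 means the substructure occurs in 134 of the 143 compounds in that group; the slides do not state this explicitly. The dictionary keys and the helper `rate` are hypothetical names.

```python
def rate(hits, total):
    """Fraction of compounds in a group that contain the substructure."""
    return hits / total


# (hits, total) pairs copied from the slides.
reported = {
    "first substructure": {
        "learning": {"positive": (134, 143), "negative": (24, 125)},
        "testing": {"positive": (15, 19), "negative": (4, 11)},
    },
    "second substructure": {
        "learning": {"positive": (60, 143), "negative": (0, 125)},
        "testing": {"positive": (8, 19), "negative": (0, 11)},
    },
}

# Same nesting, with each (hits, total) pair replaced by its rate.
rates = {
    name: {
        split: {label: rate(*counts) for label, counts in groups.items()}
        for split, groups in splits.items()
    }
    for name, splits in reported.items()
}

for name, splits in rates.items():
    for split, groups in splits.items():
        print(f"{name}, {split}: "
              f"positive {groups['positive']:.1%}, negative {groups['negative']:.1%}")
```

Read this way, the first substructure covers most positives but also some negatives, while the second covers fewer positives and no negatives in either set.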

Slide 14: Discussion
- Consistent with results obtained by ILP system PROGOL (Srinivasan et al., ILP-97).
- Groups discovered by SUBDUE (e.g., Amino) are unique substructures found only in compounds that test positive for carcinogenicity.

Slide 15: Conclusion
- SUBDUE has the ability to discover interesting patterns (substructures) that might be helpful in predicting carcinogenicity.
- SUBDUE is suitable for knowledge discovery in the chemical toxicity domain.

Slide 16: Future Research
- Applying concept-learning SUBDUE to the chemical toxicity database
  - Find substructures compressing positive graph, but not negative graph
- Incorporate more domain knowledge
- PTE-3 challenge (July 1999)