
Semantic Relations for Interpreting DNA Microarray Data and for Novel Hypotheses Generation
Dimitar Hristovski, PhD (1); Andrej Kastrin (2); Borut Peterlin, MD PhD (2); Thomas C Rindflesch, PhD (3)
(1) Institute of Biomedical Informatics, Medical Faculty, University of Ljubljana, Slovenia
(2) Institute of Medical Genetics, University Medical Centre, Ljubljana, Slovenia
(3) National Library of Medicine, National Institutes of Health, Bethesda, MD, U.S.A.

Introduction
Microarray experiments:
- have great potential to support progress in biomedical research
- produce results that are NOT EASY to interpret
- information about the functions and relations of the relevant genes needs to be extracted from the vast biomedical literature

Related Work
- Text mining and microarray analysis
- Literature-based discovery

Proposed Solution
- A computerized text analysis system
- Extract semantic relations from the literature: SemRep
- Integrate them with microarray experiments
- Develop tools for:
  - interpretation
  - novel hypotheses generation

Overall Design
[Diagram: MEDLINE → SemRep semantic relation extraction → semantic relations; GEO → R/Bioconductor scripts → microarrays; both feed an integrated database (semantic relations + microarrays) used by the interpretation and discovery tools]
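To make the integration step more concrete, here is a minimal sketch of an "integrated database" using Python's built-in sqlite3 module. The two-table schema (`relation`, `expression`, `log_fc`) and the helper names are hypothetical, not the authors' actual design; the point is only that once literature relations and microarray results sit side by side, one join retrieves every relation that involves a differentially expressed gene.

```python
import sqlite3
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical two-table schema, for illustration only (not the authors' schema):
# literature relations from SemRep and per-gene results from a microarray experiment.
SCHEMA = """
CREATE TABLE relation (subject TEXT, predicate TEXT, object TEXT);
CREATE TABLE expression (gene TEXT PRIMARY KEY, log_fc REAL);
"""


def build_db(relations: List[Triple], log_fc: Dict[str, float]) -> sqlite3.Connection:
    """Load relation triples and per-gene log fold changes into an in-memory database."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO relation VALUES (?, ?, ?)", relations)
    con.executemany("INSERT INTO expression VALUES (?, ?)", log_fc.items())
    return con


def relations_for_regulated_genes(con: sqlite3.Connection, min_abs_log_fc: float = 1.0):
    """Join the two sources: literature relations whose subject or object is a gene
    with absolute log fold change >= min_abs_log_fc, strongest change first."""
    query = """
        SELECT r.subject, r.predicate, r.object, e.gene, e.log_fc
        FROM relation AS r
        JOIN expression AS e ON e.gene IN (r.subject, r.object)
        WHERE ABS(e.log_fc) >= ?
        ORDER BY ABS(e.log_fc) DESC, r.subject
    """
    return con.execute(query, (min_abs_log_fc,)).fetchall()


if __name__ == "__main__":
    # Toy data for illustration only; not taken from GSE8397 or the SEMNET RELATION database.
    rels = [
        ("MBD1", "INTERACTS_WITH", "HTR2C"),
        ("Paclitaxel", "INHIBITS", "HSPB1"),
        ("NR4A2", "ASSOCIATED_WITH", "Parkinson Disease"),
    ]
    expr = {"HSPB1": 2.1, "NR4A2": -1.8, "MBD1": 0.2}
    for row in relations_for_regulated_genes(build_db(rels, expr)):
        print(row)
```

With the toy data, MBD1 (|log_fc| = 0.2) falls below the default threshold, so only the HSPB1 and NR4A2 relations are returned. A real system would index the gene columns; the sketch skips that for brevity.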

SemRep
- Extracts semantic relations from biomedical text (implemented in Prolog)
- Based on the UMLS Metathesaurus and Semantic Network
SEMNET RELATION: a database of relations extracted from MEDLINE
- 6.7M citations (01/01/1999 through 03/31/2009)
- 43M sentences
- 21M relation instances
- 7M relation types
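The gap between relation instances (21M) and relation types (7M) comes from aggregation: an instance is one predication extracted from one sentence, while a type is a distinct (subject, predicate, object) triple, so many sentences asserting the same fact collapse into a single type. A minimal sketch, assuming a hypothetical `Instance` record rather than SemRep's actual output format:

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Instance(NamedTuple):
    """One predication extracted from one sentence (hypothetical representation)."""
    sentence_id: str
    subject: str
    predicate: str
    obj: str


def relation_types(instances: Iterable[Instance]) -> Counter:
    """Collapse instances into relation types, i.e. distinct (subject, predicate, object)
    triples, counting how many extracted instances support each type."""
    return Counter((i.subject, i.predicate, i.obj) for i in instances)
```

The per-type count is also a natural measure of how often the literature repeats a fact.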

Semantic Relations Extracted
A wide range of relations in:
- clinical medicine
- molecular genetics
- pharmacogenomics
Genetic etiology: associated_with, predisposes, causes
Substance relations: interacts_with, inhibits, stimulates
Pharmacological effects: affects, disrupts, augments
Clinical actions: administered_to, manifestation_of, treats
Organism characteristics: location_of, part_of, process_of
Co-existence: co-exists_with

Examples
"… the loss of Mbd1 could lead to autism-like behavioral phenotypes …"
Relation: MBD1 causes Autistic Disorder
"… Mbd1 can directly regulate the expression of Htr2c, one of the serotonin receptors, …"
Relation: MBD1 interacts_with HTR2C

Interpretation of Microarrays
Find known facts in the literature:
- Disease-related:
  - associated genes
  - current treatments
  - …
- Microarray genes:
  - relations between genes (INHIBITS, STIMULATES, …)
  - relations between the genes and anything else

Relations with “Parkinson” as Argument?

What Treats Parkinson?

What (causes, associated_with) Parkinson?

Sentences from which Relations are Extracted

Genes from the Microarray Related to Anything?
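The interpretation queries on these slides (relations with "Parkinson" as an argument, what treats Parkinson, what causes or is associated with it, and which microarray genes are related to anything) can be sketched as simple filters over (subject, predicate, object) triples. The function names and the toy triples below are hypothetical; the actual tool runs such queries against its integrated database.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def with_argument(rels: Iterable[Triple], concept: str) -> List[Triple]:
    """Relations in which `concept` appears as subject or object."""
    return [r for r in rels if concept in (r[0], r[2])]


def subjects_of(rels: Iterable[Triple], predicates: Set[str], obj: str) -> List[str]:
    """Distinct subjects X such that 'X <predicate> obj' holds for one of the predicates,
    e.g. 'What treats Parkinson?' is subjects_of(rels, {"TREATS"}, "Parkinson Disease")."""
    return sorted({s for s, p, o in rels if p in predicates and o == obj})


def related_to_genes(rels: Iterable[Triple], genes: Set[str]) -> List[Triple]:
    """Relations that involve at least one microarray gene, whatever the other argument is."""
    return [r for r in rels if r[0] in genes or r[2] in genes]
```

For example, with the toy triples ("Levodopa", "TREATS", "Parkinson Disease") and ("NR4A2", "ASSOCIATED_WITH", "Parkinson Disease"), `subjects_of(rels, {"CAUSES", "ASSOCIATED_WITH"}, "Parkinson Disease")` returns `["NR4A2"]`.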

Novel Hypotheses Generation
- Based on discovery patterns
- Discovery patterns: search templates that have a higher likelihood of returning a new discovery
- Specific discovery patterns for specific discovery tasks

Discovery Patterns
Inhibit the upregulated:
- Search for substances, genes, … which, according to the literature, inhibit the top N (e.g. 300) genes that are upregulated on a given microarray
- Such substances, genes, … might be used to regulate the upregulated genes
Stimulate the downregulated:
- Search for substances, genes, … which, according to the literature, stimulate the top N (e.g. 300) genes that are downregulated on a given microarray
- Such substances, genes, … might be used to regulate the downregulated genes

Discovery Patterns – Graphical View
[Diagram: Disease X → upregulated genes Y1 and downregulated genes Y2 (from the microarray); drug (or substance) Z1 inhibits Y1 and drug (or substance) Z2 stimulates Y2 (from the literature); hypotheses: Z1 Maybe_Treats1? X and Z2 Maybe_Treats2? X]
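The two discovery patterns ("inhibit the upregulated" and "stimulate the downregulated") can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: genes are ranked by a hypothetical per-gene log fold change, literature relations are plain (subject, predicate, object) triples, and every substance that INHIBITS a top-N upregulated gene or STIMULATES a top-N downregulated gene becomes a candidate for a Maybe_Treats hypothesis.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) from the literature


def top_n(log_fc: Dict[str, float], n: int, up: bool) -> List[str]:
    """Top-n upregulated genes (up=True: positive log fold change) or downregulated
    genes (up=False: negative log fold change), strongest change first."""
    genes = [g for g, v in log_fc.items() if (v > 0 if up else v < 0)]
    return sorted(genes, key=lambda g: -abs(log_fc[g]))[:n]


def discovery_candidates(rels: Iterable[Triple], log_fc: Dict[str, float],
                         n: int = 300) -> Dict[str, Set[str]]:
    """Apply both patterns and map each candidate substance to its target genes:
    inhibit the upregulated:      Z INHIBITS Y1,   Y1 among the top-n upregulated
    stimulate the downregulated:  Z STIMULATES Y2, Y2 among the top-n downregulated"""
    up = set(top_n(log_fc, n, up=True))
    down = set(top_n(log_fc, n, up=False))
    candidates: Dict[str, Set[str]] = {}
    for s, p, o in rels:
        if (p == "INHIBITS" and o in up) or (p == "STIMULATES" and o in down):
            candidates.setdefault(s, set()).add(o)
    return candidates
```

Mirroring the results slides, a toy input where HSPB1 is upregulated and NR4A2 downregulated would yield Paclitaxel and Quercetin (via INHIBITS HSPB1) and Pramipexole (via STIMULATES NR4A2) as candidates. Note that the sign of the relation matters: a substance that stimulates an already upregulated gene is not proposed.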

Results – Inhibit the Upregulated
- Parkinson microarray GSE8397
- The HSP27 (HSPB1) gene is upregulated on the microarray
- We identified paclitaxel and quercetin as substances that inhibit the expression of this gene

Inhibit the Upregulated

Results – Stimulate the Downregulated
- NR4A2 is downregulated on the microarray
- We found that:
  - pramipexole stimulates the expression of NR4A2
  - NR4A2 is associated with Parkinson disease

Explaining a Relation - Closed Discovery

Closed Discovery – Aligned Relations
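In closed discovery, both ends of a hypothesised connection are known (for example a drug A and a disease C), and the task is to explain it through intermediate concepts B. One plausible way to produce "aligned relations" is sketched below: pair every relation "A p1 B" with every relation "B p2 C" that shares the same B. This is an assumption about the general technique, not the tool's documented algorithm, and for simplicity it only follows the A → B → C direction.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def closed_discovery(rels: Iterable[Triple], a: str, c: str) -> List[Tuple[Triple, Triple]]:
    """Explain a hypothesised A-C connection: return pairs of aligned relations
    (A p1 B, B p2 C) that share an intermediate concept B."""
    rels = list(rels)
    # Index relations leaving A by their object, i.e. by the candidate intermediate B.
    from_a: Dict[str, List[Triple]] = defaultdict(list)
    for r in rels:
        if r[0] == a:
            from_a[r[2]].append(r)
    # Align them with relations that lead from the same B into C.
    pairs: List[Tuple[Triple, Triple]] = []
    for r in rels:
        if r[2] == c and r[0] in from_a:
            pairs.extend((left, r) for left in from_a[r[0]])
    return pairs
```

Applied to the stimulate-the-downregulated result, A = Pramipexole and C = Parkinson Disease align through B = NR4A2: ("Pramipexole", "STIMULATES", "NR4A2") with ("NR4A2", "ASSOCIATED_WITH", "Parkinson Disease").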

Evaluation
Estimate based on [Masseroli, BMC Bioinformatics 2006]
Extracting known facts: baseline precision (P) on 2,042 extracted relations:
- Gene-Disease (causes, assoc_with, …): P = 74.2%
- Gene-Gene (inhibits, stimulates, …): P = 41.95%
Proposed argument-predicate distance for filtering (Gene-Gene), with R = recall:
- distance ≤ 1: P = 70.75%, R = 43.6%
- distance ≤ 2: P = 55.88%, R = 66.28%
We use the argument-predicate distance to rank semantic relations, so that relations more likely to be correct are shown first.
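The filtering and ranking step, together with how precision and recall are computed, can be sketched as below. Two assumptions, both hypothetical: each extracted relation already carries an integer argument-predicate distance (the slides do not define how it is measured), and recall is taken relative to a gold set of correct relations, which we assume here means the correct relations in the unfiltered output. The toy data in the usage note is invented and does not reproduce the reported figures.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class Extracted(NamedTuple):
    triple: Triple
    distance: int  # argument-predicate distance; assumed precomputed, definition not given here


def filter_by_distance(rels: Iterable[Extracted], max_distance: int) -> List[Extracted]:
    """Keep only relations whose arguments lie at most `max_distance` from the predicate."""
    return [r for r in rels if r.distance <= max_distance]


def rank_by_distance(rels: Iterable[Extracted]) -> List[Extracted]:
    """Show relations more likely to be correct (smaller distance) first; sort is stable."""
    return sorted(rels, key=lambda r: r.distance)


def precision_recall(predicted: Iterable[Extracted], gold: Set[Triple]) -> Tuple[float, float]:
    """Precision = correct / returned; recall = correct / all correct relations in `gold`."""
    returned = {r.triple for r in predicted}
    correct = len(returned & gold)
    precision = correct / len(returned) if returned else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

With four toy relations at distances 0 to 3, of which three are correct, filtering at distance ≤ 1 keeps two relations (one correct), giving P = 0.5 and R = 1/3; loosening to ≤ 2 raises recall to 2/3. The same trade-off appears in the Gene-Gene figures above.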

Conclusion
- A new bioinformatics tool for interpretation and novel hypotheses generation
- Based on the integration of semantic relations extracted from the literature with microarray data
- Available at:

Syntactic Processing
Input: "Mbd1 can directly regulate the expression of Htr2c"
MedPost tagger and shallow parser:
[ NP [head([… inputmatch(mbd1),tag(noun)])], ...
  [verb([inputmatch(regulate),lexmatch(regulate),tag(verb)])], ...
  NP [… head([… inputmatch(htr2c),tag(noun)])] ]

Semantic Processing
- Identify concepts: MetaMap and ABGene
[ NP [head([… semtype(gngm),entrez(MBD1,4152)])], ...
  [verb([inputmatch(regulate),lexmatch(regulate),tag(verb)])], ...
  NP [… head([… semtype(gngm),entrez(HTR2C,3358)])] ]
- Match semantic type patterns to the ontology: INTERACTS_WITH
- Apply indicator rule: Verb(regulate) → INTERACTS_WITH
- Substitute concepts for semantic types: MBD1 INTERACTS_WITH HTR2C
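The semantic processing steps for the MBD1/HTR2C example can be sketched in a few lines. This is a toy model under stated assumptions, not SemRep (which is written in Prolog and uses far larger rule sets): the `INDICATOR_RULES` and `ONTOLOGY_PATTERNS` tables and the `Concept` record are hypothetical, while `gngm` is the UMLS abbreviation for the semantic type Gene or Genome.

```python
from typing import Dict, NamedTuple, Optional, Set, Tuple


class Concept(NamedTuple):
    name: str     # normalized concept, e.g. an Entrez Gene symbol such as "MBD1"
    semtype: str  # UMLS semantic type abbreviation, e.g. "gngm" (Gene or Genome)


# Hypothetical, tiny rule tables for illustration; SemRep's real tables are much larger.
INDICATOR_RULES: Dict[Tuple[str, str], str] = {
    ("verb", "regulate"): "INTERACTS_WITH",
}
ONTOLOGY_PATTERNS: Set[Tuple[str, str, str]] = {
    ("gngm", "INTERACTS_WITH", "gngm"),
}


def interpret(subject: Concept, indicator: Tuple[str, str],
              obj: Concept) -> Optional[Tuple[str, str, str]]:
    """Turn (subject concept, indicator word, object concept) into a predication, or None."""
    # Step 1: predicates the ontology allows between these two semantic types.
    allowed = {p for s, p, o in ONTOLOGY_PATTERNS if s == subject.semtype and o == obj.semtype}
    # Step 2: the indicator rule maps the trigger word to a predicate (None if no rule).
    predicate = INDICATOR_RULES.get(indicator)
    if predicate not in allowed:
        return None
    # Step 3: substitute the concepts for the semantic types.
    return (subject.name, predicate, obj.name)
```

`interpret(Concept("MBD1", "gngm"), ("verb", "regulate"), Concept("HTR2C", "gngm"))` yields `("MBD1", "INTERACTS_WITH", "HTR2C")`. A predication is produced only when the indicator rule and the ontology agree, which is what keeps "regulate" from linking, say, a disease to a gene.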