Biomedical Text Mining and Its Applications By: Raul Rodriguez-Esteban Presented By: Ankita Tanwar
INTRODUCTION & MOTIVATION A tutorial for biologists and computational biologists. Main motivation: to spread awareness. Introduces the term BioNLP. Introduces the main concepts in text mining. Lists multiple tools such as Whatizit, GoPubMed, GoGene, …
TEXT MINING: MAIN CONCEPTS Terms & term recognition. Tools: Whatizit, Abner, GoPubMed/GoGene, … Relationships between terms. Tools: MedGene, BioGene, Endeavour, G2G, … Discovering relationships. Tool: Arrowsmith. Measure of output quality: the F-measure, i.e., the harmonic mean of precision and recall. Comprehensive text mining. Sources of information: Medline and beyond.
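To make the F-measure concrete: precision P is the fraction of reported items that are correct, recall R is the fraction of correct items that were reported, and F = 2PR / (P + R). The short Python sketch below computes all three for a set of predicted term mentions against a gold standard. The gene symbols are made-up illustration data, not output from any of the tools listed.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F-measure) for two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up example: mentions found by a hypothetical tagger vs. a hand-annotated gold set
gold = {"BRCA1", "TP53", "EGFR", "KRAS"}
predicted = {"BRCA1", "TP53", "MYC"}
p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")  # precision=0.667 recall=0.500 F=0.571
```

Because the harmonic mean is pulled toward the smaller value, a tool cannot score well by maximizing only one of precision or recall.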
EXAMPLES OF TEXT RECOGNITION
Focusing on the Tool GoPubMed By: Ralph Delfs, Andreas Doms, Alexander Kozlenkov, and Michael Schroeder
INTRODUCTION Allows users to find the information they need by using biomedical background knowledge. It doesn't rank results; the user does! It retrieves PubMed abstracts for the user's search query and sorts the relevant information into 4 top-level categories: What, Who, Where, When.
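The "the user ranks" idea can be illustrated by grouping results into categories instead of computing a relevance score. The Python sketch below is a hypothetical illustration, not GoPubMed's implementation: it assumes each abstract record carries ontology terms (What), authors (Who), an affiliation (Where), and a year (When). Those field names, and the record data, are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Abstract:
    pmid: str             # hypothetical identifier
    go_terms: list[str]   # "What": ontology terms found in the text (assumed field)
    authors: list[str]    # "Who" (assumed field)
    affiliation: str      # "Where" (assumed field)
    year: int             # "When" (assumed field)


def categorize(abstracts):
    """Group abstract IDs under each category value; no relevance score is computed."""
    facets = {name: defaultdict(list) for name in ("What", "Who", "Where", "When")}
    for a in abstracts:
        for term in a.go_terms:
            facets["What"][term].append(a.pmid)
        for author in a.authors:
            facets["Who"][author].append(a.pmid)
        facets["Where"][a.affiliation].append(a.pmid)
        facets["When"][a.year].append(a.pmid)
    return facets


# Made-up records for illustration only
EXAMPLE = [
    Abstract("pmid-1", ["kinase activity"], ["Author A"], "Institute X", 2004),
    Abstract("pmid-2", ["kinase activity", "protein binding"], ["Author B"], "Institute X", 2003),
]
facets = categorize(EXAMPLE)
for facet, groups in facets.items():
    for value, ids in groups.items():
        print(facet, value, ids)
```

The user then picks the category value of interest (for example, one ontology term) and reads only the abstracts grouped under it.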
MOTIVATION The biomedical literature grows at a tremendous rate: PubMed comprises over 14,000,000 abstracts. Approaches that exploit protein interactions, pathways, and microarray data aim to improve literature search. However, these approaches do not mimic human information foraging.
CONTRIBUTIONS Introduction and realization of ontology-based literature search. Derivation of a term-extraction algorithm. Derivation of an induced ontology from the extracted terms.
STRUCTURE OF THE GENE ONTOLOGY
GoPubMed: Main Idea The main idea is to use the Gene Ontology (GO) to search and browse PubMed. Problems to be solved: how to extract GO terms from PubMed abstracts, and how to construct the relevant sub-ontology of GO.
GoPubMed: Term Extraction Uses regular expressions: \w matches a single word character (so \w+ matches a whole word), \s matches a whitespace character, and the dot . matches any single character. To match an expression repeatedly there are three operators: ? requires the preceding pattern to appear once or not at all; + requires it to appear at least once; * allows it to appear any number of times (including 0).
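A quick way to check these operators is to run them through Python's standard re module. The sketch below tests each pattern against a whole candidate string with re.fullmatch; the example strings are chosen only for illustration.

```python
import re

# (pattern, candidate string, should the pattern match the whole string?)
EXAMPLES = [
    (r"\w", "k", True),                              # \w: exactly one word character
    (r"\w", "kinase", False),                        # a whole word needs \w+
    (r"\w+", "kinase", True),
    (r"kinase\sactivity", "kinase activity", True),  # \s: one whitespace character
    (r"c.MP", "cAMP", True),                         # . : any single character
    (r"c.MP", "cGMP", True),
    (r"protein ?kinase", "proteinkinase", True),     # ? : zero or one space
    (r"protein ?kinase", "protein  kinase", False),  # two spaces are too many for ?
    (r"protein +kinase", "proteinkinase", False),    # + : at least one space required
    (r"protein +kinase", "protein   kinase", True),
    (r"protein *kinase", "proteinkinase", True),     # * : zero or more spaces
    (r"protein *kinase", "protein   kinase", True),
]


def check(pattern, text):
    """True if `pattern` matches the entire `text`."""
    return re.fullmatch(pattern, text) is not None


for pattern, text, expected in EXAMPLES:
    result = check(pattern, text)
    print(f"{pattern:<20} {text!r:<20} -> {result}")
    assert result == expected
```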
Term Extraction: Example. Keyword searched: cAMP-dependent kinase. Seed term: kinase activity. Child of the seed term: cAMP-dependent protein kinase activity. Pattern used to search for such a term: kinase \w+ cAMP-dependent .* kinase activity
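The seed-then-child idea can be sketched in Python as follows. This is an assumed simplification, not the algorithm from the GoPubMed paper, and it does not reproduce the slide's pattern: an abstract is first checked for the seed term's head word, and each child term is then turned into a relaxed regular expression in which a hypothetical list of generic words ("protein", "activity") may be omitted. That is how the text "cAMP-dependent kinase" can still match the GO term "cAMP-dependent protein kinase activity".

```python
import re

# Hypothetical list of generic words that free text often leaves out
OPTIONAL_WORDS = {"protein", "activity"}


def term_to_regex(term):
    """Build a relaxed, case-insensitive pattern in which OPTIONAL_WORDS may be missing."""
    pattern = ""
    for i, word in enumerate(term.split()):
        token = re.escape(word)
        if word.lower() in OPTIONAL_WORDS:
            pattern += rf"(?:\s+{token})?"   # generic word: zero or one occurrence
        elif i == 0:
            pattern += token
        else:
            pattern += rf"\s+{token}"        # required word, separated by whitespace
    return re.compile(pattern, re.IGNORECASE)


def extract_terms(text, seed_term, children):
    """Return the child terms of `seed_term` whose relaxed pattern occurs in `text`."""
    head = seed_term.split()[0]  # e.g. "kinase" for "kinase activity" (assumed heuristic)
    if not re.search(rf"\b{re.escape(head)}\b", text, re.IGNORECASE):
        return []  # seed not present, so its children are not tested
    return [child for child in children if term_to_regex(child).search(text)]


# Illustrative input only
TEXT = "PKA, a cAMP-dependent kinase, phosphorylates CREB."
CHILDREN = ["cAMP-dependent protein kinase activity", "cGMP-dependent protein kinase activity"]
print(extract_terms(TEXT, "kinase activity", CHILDREN))
```

Testing the cheap seed match first means the more specific child patterns are only evaluated on abstracts that can plausibly contain them.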
GoPubMed: Induced Ontology Used to hide the parts of the ontology that are not relevant to the retrieved abstracts. Given an ontology 𝑂 and a set of terms 𝑇′′ extracted from the abstracts, construct a minimal sub-ontology of 𝑂 by finding all intermediate terms on the paths from the terms in 𝑇′′ to the root.
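Finding every term on the paths from 𝑇′′ to the root amounts to collecting all ancestors of the extracted terms in the ontology graph. The Python sketch below does this with a breadth-first walk over parent links and keeps only the edges among the collected terms. The hierarchy is a simplified toy fragment loosely modeled on GO's molecular-function branch, not the actual GO structure.

```python
from collections import deque


def induced_ontology(parents, extracted):
    """Return (nodes, edges) of the sub-ontology spanned by `extracted` and all their ancestors.

    `parents` maps each term to its list of parent terms (a DAG, so a term may have several).
    Edges are (child, parent) pairs.
    """
    nodes = set()
    queue = deque(extracted)
    while queue:
        term = queue.popleft()
        if term in nodes:
            continue  # already visited via another path
        nodes.add(term)
        queue.extend(parents.get(term, ()))
    # Every parent of a collected node was itself collected, so all edges stay inside `nodes`
    edges = {(child, parent) for child in nodes for parent in parents.get(child, ())}
    return nodes, edges


# Toy hierarchy for illustration only (not the real GO relations)
PARENTS = {
    "molecular_function": [],
    "catalytic activity": ["molecular_function"],
    "transferase activity": ["catalytic activity"],
    "kinase activity": ["transferase activity"],
    "catalytic activity, acting on a protein": ["catalytic activity"],
    "protein kinase activity": ["kinase activity", "catalytic activity, acting on a protein"],
    "cAMP-dependent protein kinase activity": ["protein kinase activity"],
    "binding": ["molecular_function"],
    "protein binding": ["binding"],
}
nodes, edges = induced_ontology(PARENTS, {"cAMP-dependent protein kinase activity"})
print(sorted(nodes))
```

Here the "binding" branch is dropped because no extracted term lies below it, which is exactly the pruning the induced ontology is meant to provide.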
Screenshot of Initial Prototype
Screenshot of Current Application
THANK YOU!!