Genomics research paper presentation

Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature
Hans-Michael Müller, Eimear E. Kenny, Paul W. Sternberg*
Division of Biology and Howard Hughes Medical Institute, California Institute of Technology, Pasadena, California, United States of America
Presented by: Saghan Mudbhari

Introduction
In this research, the authors build an ontology to support text mining of research papers published in the domain of genomics.
Definition of ontology: an ontology is a formal representation of knowledge as a set of concepts within a domain and the relationships between those concepts.
Source of diagram: http://slidewiki.org/print/deck/11936

Motivation
Suppose someone wants to know what role the gene "lin-12" plays in the anchor cell. They would type lin-12 anchor cell as the search query. If they instead want to know which genes are responsible for the functions of the anchor cell, they cannot be expected to type the name of every candidate gene. The query has to combine the generic concept 'gene' with the keyword 'anchor cell'. So an ontology is needed that records which terms belong to the concept 'gene', so that the search can return the relevant results.

Contributions
Creation of the ontology: The ontology was created from 3,800 papers on Caenorhabditis elegans. It uses the Gene Ontology (GO) as a reference for creating categories; 30 of the 33 categories the authors created are also present in GO. Natural-language expressions that researchers in the field use to describe relationships form additional categories (for example, "expressed," "lineage," "bound," "required for"). WormBase and PubMed/NCBI are also used to populate the ontology with lists of terms.
Searchability of full text: Recall for keyword search is ~94% on full text, compared to ~44% on abstracts.
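The recall figures above compare how many of the relevant papers a search finds. As a minimal sketch of how recall is computed, assuming made-up paper ids (none of the numbers below come from the paper):

```python
def recall(retrieved, relevant):
    """Fraction of the relevant items that the search actually returned."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    return len(retrieved & relevant) / len(relevant)

# Hypothetical paper ids, for illustration only.
relevant_papers = {1, 2, 3, 4, 5}
full_text_hits = {1, 2, 3, 4, 9}   # search over the full text
abstract_hits = {1, 2}             # search over abstracts only

full_text_recall = recall(full_text_hits, relevant_papers)
abstract_recall = recall(abstract_hits, relevant_papers)
```

A keyword that appears only in the body of a paper is invisible to an abstract search, which is why the full-text search recovers more of the relevant set.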

Main idea
Textpresso splits papers into sentences, and sentences into words or phrases. Each word or phrase is labelled with XML tags that assign it to one of the 33 categories. Regular expressions decide which category a word or phrase belongs to; 14,500 regular expressions were created. The keywords and tags in the corpus are indexed to make searching the database fast.
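The pipeline described on this slide can be sketched roughly as follows. This is an illustrative toy, not the Textpresso code: the three patterns, the naive sentence splitter, and the index layout are all assumptions, and tags are kept as (category, text) pairs rather than XML markup.

```python
import re
from collections import defaultdict

# Hypothetical category patterns; the real system uses thousands of expressions.
CATEGORY_PATTERNS = {
    "gene": re.compile(r"\b[a-z]{3,4}-\d+\b"),
    "regulation": re.compile(r"\b(?:regulat\w+|repress\w+|activat\w+)\b", re.IGNORECASE),
    "cell": re.compile(r"\banchor cell\b", re.IGNORECASE),
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tag_sentence(sentence):
    """Return (category, matched_text) pairs found in the sentence."""
    tags = []
    for category, pattern in CATEGORY_PATTERNS.items():
        for match in pattern.finditer(sentence):
            tags.append((category, match.group(0)))
    return tags

def build_index(sentences):
    """Map every category and every lowercase keyword to sentence ids."""
    index = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for category, _ in tag_sentence(sentence):
            index["category:" + category].add(i)
        for word in re.findall(r"[\w-]+", sentence.lower()):
            index["keyword:" + word].add(i)
    return index

text = "lin-12 is required in the anchor cell. The weather was fine. lag-2 activates lin-12."
sentences = split_sentences(text)
index = build_index(sentences)
```

Looking up `index["category:gene"]` then returns the ids of all sentences that mention any gene, without the user having to name one.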

33 categories

Query: (Gene) (Regulation/Association category) (Gene)
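A category query of this shape can be read as "find sentences that mention at least two genes and at least one regulation or association term". A minimal sketch under that reading, with two hypothetical patterns standing in for the real category definitions:

```python
import re

# Hypothetical stand-ins for the 'gene' and 'regulation' categories.
GENE = re.compile(r"\b[a-z]{3,4}-\d+\b")
REGULATION = re.compile(r"\b(?:regulat|repress|activat|inhibit)\w*\b", re.IGNORECASE)

def matches_query(sentence, min_genes=2, min_regulation=1):
    """True if the sentence has enough gene mentions and regulation terms."""
    genes = GENE.findall(sentence)
    relations = REGULATION.findall(sentence)
    return len(genes) >= min_genes and len(relations) >= min_regulation

sentences = [
    "lag-2 activates lin-12 in the anchor cell.",
    "lin-12 is expressed in the anchor cell.",
    "We found that egl-1 is repressed by ces-1.",
]
hits = [s for s in sentences if matches_query(s)]
```

The second sentence is skipped because it names only one gene and contains no regulation term.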

Thank you!