23-06-2015 DI FCUL: Gene Function Prediction by Mining Biomedical Literature. Pooja Jain, Master in Bioinformatics. Supervisor: Mário Jorge Costa Gaspar da Silva.

Slide 1. Gene Function Prediction by Mining Biomedical Literature
Pooja Jain, Master in Bioinformatics
Supervisor: Mário Jorge Costa Gaspar da Silva
Co-Supervisor: Jörg Dieter Becker

Slide 2. Introduction: Context
The central problems of the post-genomic era are:
- data management;
- annotation of the data.
Annotation is crucial, and the main source of annotations is the published literature.

Slide 3. Introduction: Motivation & Objective
The main motivations were:
- the lack of functional annotations;
- the need to decrease the time and effort spent on manual annotation.
Objective: predicting gene functions from the biomedical literature.

Slide 4. Outline of Presentation
- Introduction
- Concepts
- ProFAL
- APEG
- Results
- Conclusions & Future Directions

Slide 5. Some Concepts
- Text Mining
- Ontology
- Information Modeling

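Slide 7. Related Work: ProFAL
ProFAL pipeline:
- Retrieval: a Biological Database and a Literature Database yield the Relevant Documents.
- Extraction: FiGO finds GO Terms in the relevant documents, producing the ProFAL Annotations.
- Validation: the ProFAL Annotations are checked against GOA, producing the Validated Annotations.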

Slide 9. APEG (Arabidopsis Pollen Expressed Genes)
What is APEG?
- A repository of Arabidopsis pollen-expressed genes
- A web interface for different user types
What are its contents?
- Results from expression studies
- Cross-references to GenBank, SwissProt and TAIR
- Cross-references to relevant literature
- Automatically extracted knowledge

Slide 10. ProFAL APEG: Class Model
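The class diagram itself is not part of the transcript. As a purely hypothetical illustration, the APEG contents listed above could be modelled as Python dataclasses. These contents are expression results, cross-references to TAIR, SwissProt and GenBank, literature, and automatically extracted annotations. All class and field names here are assumptions and are not the actual APEG schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Publication:
    pubmed_id: str
    abstract: str = ""


@dataclass
class Annotation:
    go_id: str
    go_name: str
    evidence_text: str      # sentence the GO term was extracted from
    source_pubmed_id: str   # publication that supports the annotation


@dataclass
class Gene:
    probe_set_id: str
    tair_id: Optional[str] = None
    swissprot_id: Optional[str] = None
    genbank_id: Optional[str] = None
    pfam_family: Optional[str] = None
    # expression study results, e.g. sample name -> signal (assumed shape)
    expression: Dict[str, float] = field(default_factory=dict)
    publications: List[Publication] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def go_ids(self) -> Set[str]:
        """GO identifiers of all automatically extracted annotations."""
        return {a.go_id for a in self.annotations}
```

In this sketch, keeping the cross-references optional lets a gene be stored even when some database lookups fail.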

Slide 11. Population of APEG
1. From the genome chip, get the probe set identifier.
2. Search at TAIR and get the TAIR Id, the SwissProt Id and the GenBank Id.
3. Search at Pfam and get the protein family.
4. Use the collected data as input for APEG.
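The population chain above can be sketched as a sequence of lookups. The transcript does not say how TAIR and Pfam were queried, so in this sketch hypothetical in-memory dictionaries stand in for those searches. Keying the Pfam lookup on the SwissProt Id is also an assumption. All identifiers are placeholders.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-ins for the TAIR and Pfam searches.
# All identifiers are placeholders, not real database entries.
TAIR_BY_PROBE: Dict[str, str] = {"PROBE_1": "TAIR_A"}
XREFS_BY_TAIR: Dict[str, Dict[str, str]] = {
    "TAIR_A": {"swissprot": "SP_A", "genbank": "GB_A"},
}
PFAM_FAMILY_BY_SWISSPROT: Dict[str, str] = {"SP_A": "FAMILY_X"}


def populate_record(probe_set_id: str) -> Dict[str, Optional[str]]:
    """Follow probe set -> TAIR -> SwissProt/GenBank -> Pfam family.

    A missing link leaves the remaining fields as None instead of failing,
    so genes with incomplete cross-references can still be stored.
    """
    record: Dict[str, Optional[str]] = {"probe_set_id": probe_set_id}
    tair_id = TAIR_BY_PROBE.get(probe_set_id)
    record["tair_id"] = tair_id
    xrefs = XREFS_BY_TAIR.get(tair_id, {}) if tair_id else {}
    record["swissprot_id"] = xrefs.get("swissprot")
    record["genbank_id"] = xrefs.get("genbank")
    sp_id = record["swissprot_id"]
    record["pfam_family"] = PFAM_FAMILY_BY_SWISSPROT.get(sp_id) if sp_id else None
    return record
```

Replacing the dictionaries with real database queries would not change the shape of the chain.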

Slide 12. Document Retrieval
Starting from the TAIR Id, SwissProt Id and GenBank Id of each gene:
1. Search at PubMed.
2. Get the PubMed Id.
3. Get the abstract.
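The transcript does not say how PubMed was searched. One possible route, assumed here, is NCBI's E-utilities: ESearch returns PubMed Ids for a query, and EFetch returns the abstracts. The sketch only builds the request URLs and parses the ESearch JSON, so it runs without network access. How the query term would be derived from a gene's identifiers or citations is left open.

```python
import json
from typing import List
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 20) -> str:
    """ESearch request returning PubMed Ids for a query term, as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def parse_pmids(esearch_json: str) -> List[str]:
    """Extract the list of PubMed Ids from an ESearch JSON response."""
    return json.loads(esearch_json)["esearchresult"]["idlist"]


def efetch_abstracts_url(pmids: List[str]) -> str:
    """EFetch request returning the abstracts of the given Ids as plain text."""
    params = {"db": "pubmed", "id": ",".join(pmids),
              "rettype": "abstract", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"
```

To run a real query, the URLs could be passed to `urllib.request.urlopen`, with NCBI's usage limits respected.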

Slide 13. Annotation Extraction
Example GO term from GOA: "lateral root morphogenesis".
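To show what "finding a GO term in an abstract" means, here is a deliberately simplified stand-in. It reports a GO term when enough words of its name occur in one sentence, and it keeps that sentence as the evidence text. This is plain word overlap, not FiGO; FiGO's actual scoring is different and is not reproduced here. The GO identifier in the usage is a hypothetical placeholder.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def find_go_terms(abstract: str, go_terms: Dict[str, str],
                  min_overlap: float = 1.0) -> List[Tuple[str, str]]:
    """Return (go_id, evidence_sentence) pairs where at least `min_overlap`
    of the words in the GO term name occur in a single sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    hits = []
    for sentence in sentences:
        words = set(tokenize(sentence))
        for go_id, name in go_terms.items():
            term_words = set(tokenize(name))
            if term_words and len(term_words & words) / len(term_words) >= min_overlap:
                hits.append((go_id, sentence))
    return hits
```

Lowering `min_overlap` raises recall at the cost of more false positives.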

Slides 14-22: no transcribed text.

Slide 24. Results: Evaluation and Validation
- Document retrieval
- Automatic extraction (by ProFAL): observed annotations
- Manual extraction (by curator): expected annotations
- Inspection and comparison of observed vs. expected annotations

Slide 25. Results: Document Retrieval
55 distinct documents were retrieved for 71 of the 147 genes (48%), using 117 distinct citations.
Abbreviations: SP = SwissProt, GB = GenBank.

Slide 26. Results: Annotation Extraction

Slide 27. Results: Observations
- Documents were retrieved for 48% of the genes.
- Precision and recall were low.
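Precision and recall come from comparing the observed annotations (ProFAL) with the expected ones (curator). Precision is TP / (TP + FP), and recall is TP / (TP + FN). The sketch below assumes an annotation counts as a match only when the (gene, GO id) pair is identical. The slides do not state the matching rule, for example whether parent or child GO terms were accepted.

```python
from typing import Dict, Set, Tuple

Annotation = Tuple[str, str]  # (gene id, GO id)


def evaluate(observed: Set[Annotation], expected: Set[Annotation]) -> Dict[str, float]:
    """Exact-match comparison of automatic vs. curated annotations."""
    tp = len(observed & expected)   # extracted and confirmed by the curator
    fp = len(observed - expected)   # extracted but not expected
    fn = len(expected - observed)   # expected but missed
    precision = tp / (tp + fp) if observed else 0.0
    recall = tp / (tp + fn) if expected else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```

In this sketch, many false positives drive precision down even when recall is unchanged.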

Slide 28. Results Analysis
The main reason for the low precision was a high number of false positives (FP). FP annotations were derived from:
- GO terms from ontologies other than Molecular Function;
- obsolete and non-existing GO terms;
- incoherent evidence text;
- evidence texts containing numbers, abbreviations and negation.
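The improvements listed next address only the first two causes. As a hypothetical sketch of how the last cause might be detected, a crude heuristic could flag evidence sentences that contain negation cues, numbers or all-caps abbreviations. The cue list is an assumption. A real system would need a proper negation detector, which is closer to the NLP integration named under future directions.

```python
import re
from typing import List

# Hypothetical negation cues; not an exhaustive or validated list.
NEGATION_CUES = {"not", "no", "neither", "nor", "without", "absence", "lack", "lacks"}


def evidence_issues(sentence: str) -> List[str]:
    """Flag evidence-text traits that the analysis links to false positives."""
    issues = []
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    if any(t.lower() in NEGATION_CUES for t in tokens):
        issues.append("negation")
    if any(ch.isdigit() for ch in sentence):
        issues.append("number")
    # all-caps tokens of two or more letters, e.g. "DNA", as a proxy for abbreviations
    if any(len(t) >= 2 and t.isupper() for t in tokens):
        issues.append("abbreviation")
    return issues
```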

Slide 29. Improved Results
Improvements implemented:
- using only GO terms from the Molecular Function ontology;
- avoiding obsolete and non-existing GO terms.
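Both improvements amount to filtering candidate annotations against the ontology. Here is a minimal sketch, assuming the GO terms are available as an OBO file, such as a GO release. It parses the `[Term]` stanzas and keeps only annotations whose term exists, belongs to the `molecular_function` namespace and is not marked `is_obsolete: true`. This is not necessarily how ProFAL implemented the filter.

```python
from typing import Dict, List, Optional, Set, Tuple


def parse_obo_terms(obo_text: str) -> Dict[str, Dict[str, str]]:
    """Parse the [Term] stanzas of an OBO file into {term id: {tag: value}}.

    Repeated tags (e.g. several is_a lines) keep only the last value,
    which is enough for namespace and obsolescence checks.
    """
    terms: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Only [Term] stanzas are kept; [Typedef] and others are skipped.
            current = {} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        current[tag] = value
        if tag == "id":
            terms[value] = current
    return terms


def valid_molecular_function_ids(terms: Dict[str, Dict[str, str]]) -> Set[str]:
    return {
        term_id
        for term_id, tags in terms.items()
        if tags.get("namespace") == "molecular_function"
        and tags.get("is_obsolete", "false") != "true"
    }


def filter_annotations(
    annotations: List[Tuple[str, str]], terms: Dict[str, Dict[str, str]]
) -> List[Tuple[str, str]]:
    """Keep (gene, GO id) pairs whose term exists, is MF and is not obsolete."""
    keep = valid_molecular_function_ids(terms)
    return [a for a in annotations if a[1] in keep]
```

A GO id that is absent from the file is treated as non-existing and dropped.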

Slide 30. Discussion
- Specific annotations
- Existing vs. extracted annotations: true positive (TP) annotations for 20 of 31 genes
- Probable functions: 21 functions for 8 of 31 genes
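The per-gene view in this discussion can be sketched as follows. "Existing" annotations (for example from GOA) are compared with extracted ones gene by gene. Genes with at least one exact match count as having TP annotations. Extracted terms absent from the existing set are kept only as candidate, "probable" functions for a curator to review. The exact-match rule is an assumption.

```python
from typing import Dict, Set, Tuple


def compare_per_gene(
    existing: Dict[str, Set[str]], extracted: Dict[str, Set[str]]
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return genes with at least one TP, and unconfirmed candidate GO ids."""
    genes_with_tp: Set[str] = set()
    candidates: Dict[str, Set[str]] = {}
    for gene, go_ids in extracted.items():
        known = existing.get(gene, set())
        if go_ids & known:
            genes_with_tp.add(gene)
        new_ids = go_ids - known
        if new_ids:
            candidates[gene] = new_ids  # probable functions, not yet validated
    return genes_with_tp, candidates
```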

Slide 32. Conclusions
- APEG database system
- Improvements in ProFAL
- In my opinion, text mining is useful for biologists

Slide 33. Future Directions
- Improvement in document retrieval
- Integration of an NLP technique
- Usability study of the proposed approach
- Validation of the approach with a larger set of genes

Slide 34. Key References
- Couto, F., Silva, M. & Coutinho, P. (2004). FiGO: Finding GO terms in unstructured text. EMBO BioCreative Workshop - Handouts, Granada, Spain.
- Becker, J.D., Boavida, L., Carneiro, J., Haury, M. & Feijó, J.A. (2003). Transcriptional profiling of Arabidopsis tissues reveals the unique characteristics of the pollen transcriptome. Plant Physiology, 133.
- Couto, F., Silva, M. & Coutinho, P. (2003). ProFAL: PROtein Functional Annotation through Literature. In E. Pimentel, N.R. Brisaboa & J. Gomez (eds.), VIII Conference on Software Engineering and Databases (JISBD), Alicante, Spain.
- Shatkay, H. & Feldman, R. (2003). Mining the biomedical literature in the genomic era: an overview. Journal of Computational Biology, 10.
- Mack, R. & Hehenberger, M. (2002). Text-based knowledge discovery: search and mining of life-sciences documents. Drug Discovery Today, 7, S89-S98.
- The Gene Ontology Consortium (2001). Creating the Gene Ontology Resource: Design and Implementation. Genome Research, 11.

Slide 35. Thank you for your attention