
RLIMS-P: A Rule-Based Literature Mining System for Protein Phosphorylation

Hu ZZ 1, Yuan X 1, Torii M 2, Vijay-Shanker K 3, and Wu CH 1

1 Protein Information Resource, 2 Department of Biostatistics, Bioinformatics, and Biomathematics, 4 Department of Computational Linguistics, Georgetown University, Washington, DC 20007; 3 University of Delaware, DE

Contact:

Figure: Annotation-tagged literature sets for PTMs from the iProLINK literature mining resource. Panel labels: RLIMS-P; evidence attribution; 3 objects; manual tagging assisted with computational extraction; training and testing sets of positive and negative samples for RLIMS-P development.

Introduction: RLIMS-P is a rule-based text-mining program specifically designed to extract protein phosphorylation information (protein kinase, substrate, and phosphorylation sites) from abstracts (Hu et al., 2005). The program was originally developed by Narayanaswamy, Ravikumar, and Vijay-Shanker (2005), and was tested and benchmarked by PIR using iProLINK annotated datasets (Hu et al., 2004). RLIMS-P has now been adopted at PIR and is being developed into an online text-mining tool for extracting protein phosphorylation information from PubMed literature (Yuan et al., 2006). The online RLIMS-P currently provides the following functions: 1) determine whether a MEDLINE abstract contains protein phosphorylation information and extract the protein kinase, protein substrate, and phosphorylation site/residue when available; 2) tag the extracted phosphorylation objects in the abstract in different colors; 3) map the protein substrate to UniProtKB protein entries based on PMID; 4) map protein names to UniProtKB protein entries based on BioThesaurus. Coupled with BioThesaurus, RLIMS-P can facilitate UniProtKB protein phosphorylation feature annotation.

RLIMS-P System Design:
Pattern 1: (in/at )?
Example: ATR/FRP-1 also phosphorylated p53 in Ser 15
Training/benchmarking data sets and pattern rules can be downloaded.
Bioinformatics. 21: , 2005

Benchmarking of RLIMS-P: High recall for paper retrieval and high precision for information extraction.

Web-based RLIMS-P: Information retrieval and extraction; protein entity mapping.

Figure: The online RLIMS-P text-mining results. (A) The summary table lists PMIDs with top-ranking phosphorylation annotation. (B) The full report provides detailed annotation results with evidence tagging and automatic mapping to the UniProtKB entry containing the citation (e.g., KPB1_RABIT). (C, D) Name mapping of the phosphorylated protein in the RLIMS-P report (C) to a UniProtKB entry using BioThesaurus (D). Name mapping includes options to use names appearing in the abstract or user-specified names to search the online BioThesaurus. Here, “PBPA” retrieves 10 entries sharing the same name, including PBPA of Mycobacterium tuberculosis (P71586_MYCTU), the phosphorylated protein discussed in the abstract.

A preliminary case study (using RLIMS-P to facilitate UniProtKB feature annotation): Nuclear receptor (NR) phosphorylation was under-annotated in databases. Text mining of 2170 PubMed abstracts (retrieved with a query for NR phosphorylation) with RLIMS-P found significantly more phosphorylation sites to add to the UniProt feature annotation.

Future development of the RLIMS-P program: 1) extend to mine full-length articles; 2) mine in vivo protein phosphorylation and its cellular context, such as cell types and pathways.

References:
Hu ZZ, et al., Comp Biol Chem. 28:409-16, 2004
Hu ZZ, et al., Bioinformatics. 21: , 2005
Narayanaswamy M, et al., Bioinformatics, Suppl.1 21: i319-i327, 2005
Yuan X, et al., Bioinformatics, April 27, 2006

Acknowledgements: NIH (UniProt), NSF (Entity Tagging). PIR team: Wu HT, Fang C, Huang H, Arminski L. Collaborators: Liu H, Narayanaswamy M, Ravikumar KE.
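
Illustrative sketch of the Pattern 1 example: The system design example sentence ("ATR/FRP-1 also phosphorylated p53 in Ser 15") shows the three objects RLIMS-P extracts: kinase, substrate, and site. The Python sketch below is only a rough approximation of that single sentence shape using a regular expression. It is not the RLIMS-P rule set. The regular expression, the function name extract_phosphorylation, and the group names are all hypothetical, and protein names are simplified to single tokens.

```python
import re

# Hypothetical approximation of one sentence shape:
#   <kinase> [adverb] phosphorylate(s|d) <substrate> [in|at|on <site>]
# Assumption: protein names are single tokens made of word characters, "/" and "-".
PATTERN_1 = re.compile(
    r"(?<!\S)(?P<kinase>[\w/-]+)\s+"      # kinase name, starting at a token boundary
    r"(?:\w+\s+)?"                        # optional adverb such as "also"
    r"phosphorylate[sd]?\s+"              # active verb form
    r"(?P<substrate>[\w/-]+)"             # substrate name
    r"(?:\s+(?:in|at|on)\s+(?P<site>(?:Ser|Thr|Tyr)[\s-]?\d+))?"  # optional site
)


def extract_phosphorylation(sentence):
    """Return kinase, substrate and site for the first match, or None."""
    match = PATTERN_1.search(sentence)
    if match is None:
        return None
    return {
        "kinase": match.group("kinase"),
        "substrate": match.group("substrate"),
        "site": match.group("site"),  # None when no site is stated
    }


if __name__ == "__main__":
    print(extract_phosphorylation("ATR/FRP-1 also phosphorylated p53 in Ser 15"))
```

A single regular expression like this covers only simple active-voice sentences; passive constructions and multi-word protein names would need additional rules and are not covered by the sketch.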