
Moving beyond free text

Authors

Old Paradigm:
Scientist does research
Scientist publishes research results in a journal article

Want: All genes involved in seed development (name, species, protein sequence)

Read 3,404 articles???

Read 592,000 articles???

Old Paradigm - extended:
Scientist does research
Scientist publishes research results as free text
Results are extracted from the free text by manual curation (+ NLP…?) and converted to a structured format (ontology annotations)
Database: structured data combined with other data for queries and further analysis
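A minimal Python sketch of why the structured format pays off: once results are stored as ontology annotations, "all genes involved in seed development" becomes a query over the annotations plus the ontology hierarchy, not a reading list. The term hierarchy, gene IDs and annotations below are made up for illustration and are not copied from GO, PO or any real database.

```python
from dataclasses import dataclass

# Hypothetical, simplified ontology fragment: child term -> parent terms.
# Labels stand in for real ontology IDs.
PARENTS = {
    "seed coat development": {"seed development"},
    "seed maturation": {"seed development"},
    "seed development": {"reproductive structure development"},
    "leaf development": {"shoot system development"},
}

@dataclass(frozen=True)
class Annotation:
    gene: str
    species: str
    term: str  # ontology term the gene is annotated to

def ancestors(term: str) -> set[str]:
    """Return the term itself plus every ancestor reachable through PARENTS."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def genes_involved_in(term: str, annotations: list[Annotation]) -> set[tuple[str, str]]:
    """Genes annotated to `term` or to any term below it in the hierarchy."""
    return {(a.gene, a.species) for a in annotations if term in ancestors(a.term)}

# Made-up annotations, for illustration only.
ANNOTATIONS = [
    Annotation("GENE1", "Arabidopsis thaliana", "seed maturation"),
    Annotation("GENE2", "Oryza sativa", "seed coat development"),
    Annotation("GENE3", "Arabidopsis thaliana", "leaf development"),
]

print(sorted(genes_involved_in("seed development", ANNOTATIONS)))
```

Note that neither GENE1 nor GENE2 is annotated to "seed development" directly; the hierarchy is what pulls them in, which free-text keyword search would not do reliably.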

Example – Journal article about gene function

Example – Journal article about gene function
The goal: an annotation that captures the result

Example – Journal article about gene function
The goal: an annotation that captures the result
Manual curation: time-consuming, does not scale well
NLP: very challenging

Example – Phylogenetic treatment
Relatively high degree of structure compared to a journal article
May be more amenable to natural language processing, but still very challenging because the information is complex
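A minimal sketch of what "more amenable to NLP" can mean: treatment-style descriptions often follow a "structure, then character states" pattern, so even naive splitting recovers part of the structure. The sample text is hypothetical, and the splitting rules are simplifying assumptions, not a real treatment parser.

```python
import re

# Hypothetical treatment-style description (not taken from a real treatment).
TEXT = "Leaves alternate; blade ovate, 3-5 cm long. Petals 5, white."

def split_statements(text: str) -> list[tuple[str, list[str]]]:
    """Split a description into (structure, [character states]) pairs.

    Naive assumptions: clauses end with '.' or ';', the first word of a
    clause names the structure, and character states are comma-separated.
    """
    pairs = []
    for clause in re.split(r"[.;]", text):
        clause = clause.strip()
        if not clause:
            continue
        structure, _, rest = clause.partition(" ")
        states = [s.strip() for s in rest.split(",") if s.strip()]
        pairs.append((structure.lower(), states))
    return pairs

for structure, states in split_statements(TEXT):
    print(structure, states)
```

The hard part is still missing. Mapping "leaves" or "blade" to Plant Ontology terms and "ovate" or "3-5 cm long" to PATO qualities is not done here. The splitting itself is fragile: a decimal such as "3.5 cm" would be cut at the point.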

Can we get authors involved?
Scientist does research
Scientist publishes research results as free text
Results are extracted from the free text by manual curation (+ NLP) and converted to a structured format (ontology annotations)
Database: structured data combined with other data for queries and further analysis

Scientific publishers are interested in this problem…
Link to external resource

Scientific publishers are interested in this problem…
Example: ScienceDirect

Databases are interested in this problem…

What if we had a good general tool for authors to do this themselves?
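A rough idea of one building block such an author tool might have: dictionary-based term suggestion, where known term labels and synonyms are matched in the author's text and offered for confirmation. This Python sketch uses a tiny hypothetical lexicon. The word-to-term mappings, such as "narrow" to "decreased width", are illustrative assumptions. A real tool would load labels and synonyms from the ontology files.

```python
import re

# Hypothetical lexicon mapping surface words to ontology term labels.
LEXICON = {
    "leaf": "PO: leaf",
    "leaves": "PO: leaf",
    "ovule": "PO: ovule",
    "seed": "PO: seed",
    "narrow": "PATO: decreased width",
}

# Longest entries first so longer strings win over shorter ones at the same position.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in sorted(LEXICON, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def suggest_terms(text: str) -> list[tuple[str, int, int, str]]:
    """Return (matched text, start, end, suggested term) for each lexicon hit."""
    return [
        (m.group(0), m.start(), m.end(), LEXICON[m.group(0).lower()])
        for m in _PATTERN.finditer(text)
    ]

SENTENCE = "Mutant leaves are narrow and seed set is reduced."
for word, start, end, term in suggest_terms(SENTENCE):
    print(f"{word!r} [{start}:{end}] -> suggest {term}")
```

The example also shows why the author, not the tool, should have the last word. The matcher proposes "seed" inside "seed set", which is a different concept. It also says nothing about "reduced".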

Example: Morphological description of species

Example: Morphological description of species

Example: Mutant phenotype description
PO: (leaf), PATO: (decreased width)
PO: (ovule), PATO: (abnormal)
PO: (seed), PATO: (reduced)
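Each phenotype statement above pairs an entity from the Plant Ontology (PO) with a quality from PATO. A minimal Python sketch of storing such pairs and querying them across mutants follows. Term IDs are omitted, as on the slide. The mutant names are hypothetical, and the second mutant is invented only so the query has something to compare.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EQ:
    entity: str   # Plant Ontology (PO) term label, e.g. "leaf"
    quality: str  # PATO term label, e.g. "decreased width"

# The three statements from the slide, attached to a hypothetical "mutant-1";
# "mutant-2" is made up so the query below returns more than one result.
PHENOTYPES = {
    "mutant-1": [EQ("leaf", "decreased width"), EQ("ovule", "abnormal"), EQ("seed", "reduced")],
    "mutant-2": [EQ("leaf", "decreased width")],
}

def mutants_with(entity: str, quality: str, phenotypes: dict[str, list[EQ]]) -> list[str]:
    """Names of mutants annotated with exactly this entity-quality pair."""
    return sorted(m for m, eqs in phenotypes.items() if EQ(entity, quality) in eqs)

print(mutants_with("leaf", "decreased width", PHENOTYPES))
print(mutants_with("ovule", "abnormal", PHENOTYPES))
```

Because both parts are ontology terms rather than free text, "narrow leaves" and "leaf width reduced" written by different authors would land on the same pair and be found by the same query.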

New Paradigm:
Scientist does research
Scientist publishes research results as free text and as annotations using ontology terms
Benefit to scientists: wider exposure and reuse of results
Benefit to publishers: tagged text allows enhanced presentation for subscribers
Benefit to the research community: better access to data
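One way the "tagged text" benefit could look in practice is a minimal Python sketch that turns author-supplied annotations into HTML spans a publisher could style or link. The annotations are given as character offsets into the text. The `term` class name, the `data-term` attribute and the offset-based tag format are assumptions for illustration, not any publisher's actual markup.

```python
from html import escape

def render_tagged(text: str, tags: list[tuple[int, int, str]]) -> str:
    """Wrap each (start, end, term) span of `text` in an HTML <span>.

    Tags must not overlap; everything outside the tags is HTML-escaped as-is.
    """
    out, pos = [], 0
    for start, end, term in sorted(tags):
        if start < pos:
            raise ValueError("overlapping tags are not supported")
        out.append(escape(text[pos:start]))
        out.append(f'<span class="term" data-term="{escape(term)}">{escape(text[start:end])}</span>')
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)

TEXT = "Mutant leaves are narrow."
TAGS = [(7, 13, "PO: leaf"), (18, 24, "PATO: decreased width")]
print(render_tagged(TEXT, TAGS))
```

Keeping the annotations as offsets alongside the unchanged text is a design choice. It means the same author annotations can feed both the publisher's presentation layer and a database export.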