The Role of Ontologies in Improved Scholarly Communication Philip E. Bourne University of California San Diego

Presentation transcript:

The Role of Ontologies in Improved Scholarly Communication Philip E. Bourne University of California San Diego

My Perspective … Ontology Developer (years ago – mmCIF - Bioinformatics : ) Database Developer – RCSB PDB Supporter of open access (provided there is a business model) - editor in chief of PLoS Computational Biology Co-founder - SciVee Inc. I am becoming increasingly interested in scholarly communication I use ontologies to support this work

Objective Today Describe how we are using ontologies to try to improve scholarly communication Motivate you to think about ontologies that should be developed Learn from you where we might spend our efforts

First Consider What Motivates Us to Improve Scholarly Communication

We Cannot Possibly Read a Fraction of the Papers We Should Renear & Palmer 2009 Science 325: Drivers of Change

Hence We Are Scanning More, Reading Less Renear & Palmer 2009 Science 325: Drivers of Change

The Truth About the Scientific eLaboratory I have ?? mail folders! The intellectual memory of my laboratory is in those folders This is an unhealthy hub and spoke mentality Drivers of Change

The Truth About the Scientific eLaboratory I generate way more negative than positive data, but where is it? Content management is a mess – Slides, posters … – Data, lab notebooks … – Collaborations, Journal clubs … Software is open but where is it? Farewell is for the data too Drivers of Change Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 4(7): e

Data and the Publication Are Disjoint PubMed contains 18,792,257 entries ~100,000 papers indexed per month In Feb 2009: – 67,406,898 interactive searches were done – 92,216,786 entries were viewed 1078 databases reported in NAR 2008 MetaBase reports 2,651 entries edited 12,587 times Biosciences Data as of April 14, 2009 Drivers of Change

Publishing Limitations A paper is an artifact of a previous era It is not the logical end product of eScience, hence: – Work is omitted – Article vs supplement is a mess – Visualization may be limited – Interaction and enquiry are non-existent – Rich media can help, but are rarely used Drivers of Change

We Need to do Better & The Game is Afoot It is being driven from the top down and the bottom up

Ontologies & Semantic Tagging

BioLit Data Extraction/Storage Database IDs Ontology terms Text excerpts Other… BioLit MySQL database XML XML, Meta-data web external databases Semantic Tagging

Tagging of PubMed Central Ontologies read from OBO Files Words converted to tree structures Matched to every non-trivial word in the paper Matches tagged A long paper can be matched to GO in less than 30 seconds Semantic Tagging http://biolit.ucsd.edu
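The tagging steps on this slide can be illustrated with a minimal sketch. It is not the BioLit implementation: the sample OBO text, the stopword list, and the function names `parse_obo_terms`, `build_trie`, and `tag_text` are all assumptions made for illustration. The sketch reads term names from OBO-style `[Term]` stanzas, stores them word by word in a tree (a trie), and matches the longest term starting at each non-trivial word.

```python
import re

# Tiny OBO-style sample (illustrative only; real files come from the ontology providers).
OBO_SAMPLE = """\
[Term]
id: GO:0006915
name: apoptotic process

[Term]
id: GO:0005634
name: nucleus
"""

# Assumed list of "trivial" words that never start a match.
STOPWORDS = {"the", "of", "in", "a", "and", "is"}


def parse_obo_terms(text):
    """Return {lower-cased term name: term id} from the [Term] stanzas."""
    terms = {}
    for chunk in text.split("[Term]")[1:]:
        stanza = chunk.strip().split("\n\n")[0]  # stop at the next blank line
        fields = dict(
            line.split(": ", 1) for line in stanza.splitlines() if ": " in line
        )
        if "id" in fields and "name" in fields:
            terms[fields["name"].lower()] = fields["id"]
    return terms


def build_trie(terms):
    """Store each term as a path of words; '$id' marks the end of a term."""
    root = {}
    for name, term_id in terms.items():
        node = root
        for word in name.split():
            node = node.setdefault(word, {})
        node["$id"] = term_id
    return root


def tag_text(text, trie):
    """Return (matched phrase, term id) pairs, preferring the longest match."""
    words = re.findall(r"[a-z0-9-]+", text.lower())
    tags, i = [], 0
    while i < len(words):
        if words[i] in STOPWORDS:
            i += 1
            continue
        node, j, best = trie, i, None
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if "$id" in node:
                best = (" ".join(words[i:j]), node["$id"], j)
        if best:
            tags.append(best[:2])
            i = best[2]
        else:
            i += 1
    return tags


trie = build_trie(parse_obo_terms(OBO_SAMPLE))
tags = tag_text("Apoptotic process is regulated in the nucleus.", trie)
```

Because each word of the paper is followed down the tree at most as far as the longest term, the cost grows with the length of the paper rather than with the size of the ontology, which is one plausible reason a tree structure suits this kind of matching.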

Semantic Tagging http://biolit.ucsd.edu

ICTP Trieste, December 10, Tagging

Provision of Web services for this tagging may be the most valuable contribution. Semantic Tagging

Database & Literature Integration Context BMC Bioinformatics :220Semantic Tagging

Semantic Tagging of Database Content PLoS Comp. Biol. 6(2) e Semantic Tagging

Automatic Knowledge Discovery for Those with No Time to Read Immunology Literature Cardiac Disease Literature Shared Function Semantic Tagging

This is Literature Post-processing Better to Get the Authors Involved Authors are the absolute experts on the content More effective distribution of labor Add metadata before the article enters the publishing process BMC Bioinformatics :103 Semantic Tagging

Word 2007 Add-in for Authors Allows authors to add metadata as they write, before they submit the manuscript Authors are assisted by automated term recognition – OBO ontologies – Database IDs Metadata are embedded directly into the manuscript document via XML tags, OOXML format – Open – Machine-readable Open source, Microsoft Public License
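To make the idea of embedding metadata via XML tags concrete, here is a hedged sketch in Python. It does not reproduce the add-in's actual OOXML markup: the element names `p` and `term`, the attribute names `ontology` and `id`, and the function `mark_up` are hypothetical, chosen only to show a recognized term wrapped inline so that it stays machine-readable.

```python
import xml.etree.ElementTree as ET


def mark_up(sentence, term, ontology, term_id):
    """Wrap the first occurrence of `term` in a hypothetical inline <term> element."""
    before, found, after = sentence.partition(term)
    para = ET.Element("p")
    if not found:
        # Nothing recognized: keep the sentence as plain text.
        para.text = sentence
        return para
    para.text = before
    tag = ET.SubElement(para, "term", {"ontology": ontology, "id": term_id})
    tag.text = term
    tag.tail = after
    return para


sentence = "Loss of apoptotic process control is common in tumours."
para = mark_up(sentence, "apoptotic process", "GO", "GO:0006915")
xml_text = ET.tostring(para, encoding="unicode")
```

The readable text is unchanged when the tags are ignored, while a downstream tool can pull out every `term` element and its identifier without re-running term recognition.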

Word 2007 Add-in Example of What it Looks Like - Ontologies Inline Recognition, Highlighting, and Mark-up of Informative Terms – A recognized term will have a dotted, purple underline – Hovering generates a Smart Tag above the term add mark-up for this term ignore this term view the term in the ontology browser If a recognized term appears in more than one ontology, all instances of that term will be listed – Hovering over a marked-up term option to apply mark-up to all recognized instances of term stop recognizing a term – Pass ontology terms back to provider Semantic Tagging BMC Bioinformatics :103

Built-in Knowledge of Ontologies and Databases – Add-in provides a list of biomedical ontologies to download – and a list of databases for ID recognition (GenBank/RefSeq, UniProt, Protein Data Bank) – A user may also supply a URL to download other ontologies Ontology Browser – allows a user to select an ontology and then navigate through it to view terms and their relationships BMC Bioinformatics :103
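Database ID recognition of the kind listed on this slide can be sketched with regular expressions. The patterns below are assumptions for illustration, not the add-in's rules: the PDB pattern requires at least one letter so that years are not matched, the UniProt pattern follows the published accession format, and the RefSeq pattern covers only a subset of prefixes. The sample sentence is illustrative as well.

```python
import re

# Assumed, simplified patterns; a production recognizer would need more context checks.
ID_PATTERNS = {
    # Four characters, digit first, at least one letter among the remaining three.
    "PDB": re.compile(r"\b[0-9](?=[A-Za-z0-9]{0,2}[A-Za-z])[A-Za-z0-9]{3}\b"),
    # UniProt accession format (6 or 10 characters).
    "UniProt": re.compile(
        r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
    ),
    # Subset of RefSeq prefixes, with an optional version suffix.
    "RefSeq": re.compile(r"\b(?:NM|NP|NC|NR|NG|XM|XP|XR)_[0-9]+(?:\.[0-9]+)?\b"),
}


def find_database_ids(text):
    """Return (database, identifier) pairs in order of appearance."""
    hits = []
    for database, pattern in ID_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), database, match.group(0)))
    return [(database, identifier) for _, database, identifier in sorted(hits)]


sample = (
    "The structure 4HHB (hemoglobin, UniProt P69905) is encoded by "
    "NM_000558.5; it was re-examined in 2009."
)
ids = find_database_ids(sample)
```

Pattern matching alone produces false positives on short tokens, so a real recognizer would likely confirm each candidate against the database before tagging it.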

Custom Metadata Ontologies do not contain all usages of a concept Add-in allows user to assign custom metadata Human Disease Ontology term: Leukemia, T-Cell, HTLV- II-Associated Synonym: Atypical hairy cell leukemia (disorder) Actual use in literature: – hairy cell leukemia – hairy-cell leukemia – hairy T cell leukemia – T cell hairy leukemia BMC Bioinformatics :103

Synonym mapping, disambiguation Inclusion of an additional set of synonyms for a term that reflect its use in natural language – Automated finding of synonyms in extant literature – Gather synonyms from term-mapping databases Incorporate a more sophisticated term recognition approach into the add-in BMC Bioinformatics :103
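One simple way to map natural-language variants onto a single ontology term is to normalize phrases before lookup. The sketch below is an assumption about how such a mapping could work, not the approach used in the add-in: `normalize` and `SynonymIndex` are hypothetical names, and the normalization (lower-casing, splitting on hyphens, ignoring word order) is deliberately crude. The term and variants are the ones from the Custom Metadata slide.

```python
import re


def normalize(phrase):
    """Lower-case, split on non-alphanumerics, and ignore word order."""
    tokens = re.findall(r"[a-z0-9]+", phrase.lower())
    return " ".join(sorted(tokens))


class SynonymIndex:
    """Maps normalized phrases to a canonical ontology term name."""

    def __init__(self):
        self._index = {}

    def add(self, term, synonyms):
        # The canonical name and every synonym resolve to the same term.
        for phrase in [term, *synonyms]:
            self._index[normalize(phrase)] = term

    def lookup(self, phrase):
        # Returns None when the phrase is not a known variant.
        return self._index.get(normalize(phrase))


index = SynonymIndex()
index.add(
    "Leukemia, T-Cell, HTLV-II-Associated",
    ["hairy cell leukemia", "hairy T cell leukemia"],
)
```

With only two synonyms registered, `index.lookup("hairy-cell leukemia")` and `index.lookup("T cell hairy leukemia")` both resolve to the same term, because hyphenation and word order are normalized away. Ignoring word order can also merge phrases that mean different things, which is why the slide calls for a more sophisticated recognition approach.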

Challenges Author use – Familiarity with ontologies, terms – Agreement between co-authors End-use of semantically enriched manuscript Need to combine with NLM XML standard Semantic Tagging BMC Bioinformatics :103

Challenges: Author Use If one or more publishers fast-tracked papers that had semantic markup, I would argue it would catch on in no time Semantic Tagging BMC Bioinformatics :103

Where we Need {Better} Ontologies 1. To Support Mashups Between Different Types of Scholarly Output

Post-publication of Video and Paper Drivers of Change

Pubcast – Video Integrated with the Full Text of the Paper

Pubcasts - A Unique Technology Don’t understand what you are reading? Click and have the author pop-up and explain it! See the scientists and the experiments behind the research papers and textbooks Pubcasts - A Blend of Video, text, tables, figures, PowerPoints, comments, ratings… ALL SYNCHRONIZED FOR RAPID LEARNING Mashups –

Where we Need {Better} Ontologies 2. To Support Tagging of all Aspects of the Scholarly Product

Consider Today’s Academic Workflow Research [Grants] Journal Article Conference Paper Poster Session Feds Societies Publishers Reviews Blogs Community Service/Data Curation What Should be Done?

Consider Tomorrow’s Academic Workflow Research [Grants] Journal Article Conference Paper Poster Session Feds Societies Publishers Reviews Blogs Community Service/Data Curation Ideas, Data, Hypotheses What Should be Done?

Maybe The Line is Somewhere Else? Scientist Idea Experiment Data Conclusions Publish Laboratory Publisher

Maybe The Line is Somewhere Else? Scientist Idea Experiment Data Conclusions Publish What Should We Do? Laboratory Publisher Institution Lab Notebook

Crowd Sourcing the Electronic Printing Press (aka Workshop: Beyond the PDF) Proposal to the US National Science Foundation: Aims: – Define user requirements – Establish a specification document – Open source the development effort – Have a commitment from a publisher to publish a research object using the system – Act as an exemplar for what can be done

Question: What if Everyone Had An Electronic Printing Press? Peer review might change? Bibliometrics might change? Business models will likely change? What happens to the database/literature divide? Societies might do more self publishing? We might have improved the dissemination of science, but will we have improved the comprehension?

General References What Do I Want from the Publisher of the Future PLoS Comp Biol Fourth Paradigm: Data Intensive Scientific Discovery tion/fourthparadigm/

References to Exemplars Semantic Biochemical Journal : Using Utopia Article of the Future, Cell, 2009: Prospect, Royal Society of Chemistry, 2009: Adventures in Semantic Publishing, Oxford U, 2009: The Structured Digital Abstract, Seringhaus/Gerstein, 2008

Acknowledgements BioLit Team – Lynn Fink – Parker Williams – Marco Martinez – Rahul Chandran – Greg Quinn Microsoft Scholarly Communications – Pablo Fernicola – Lee Dirks – Savas Parastatidis – Alex Wade – Tony Hey wwPDB team SciVee Team – Apryl Bailey – Tim Beck – Leo Chalupa – Lynn Fink – Marc Friedman (CEO) – Ken Liu – Alex Ramos – Willy Suwanto http//

Questions?