PLOS Facilitating Text & Data Mining The Role the Publisher Can Play

Presentation transcript:

PLOS Facilitating Text & Data Mining: The Role the Publisher Can Play. Rosemary Dickin, Editorial Manager, PLOS Computational Biology and PLOS Genetics. July 2017.

Outline: PLOS Policies; How we support TDM.

Background & Policies

PLOS Mission: Founded in 2001, PLOS is a non-profit publisher of seven journals and an advocacy organization with a mission to accelerate progress in science and medicine by leading a transformation in research communication.

We publish a lot of content: over 25k articles each year, and over 190k articles in total up to 2016. We want people to read and reuse our content. Source: https://www.plos.org/annual-update

PLOS Core Principles: PLOS and its authors choose to make scientific and medical research articles openly available for the advancement of science and the greater public good.

PLOS Supports Text & Data Mining: We believe that TDM is an important research methodology that must be supported by the keepers of the scholarly literature, funders, academic institutions – all those involved in the research endeavour.

PLOS Supports Text & Data Mining: By making all of our published content open access, PLOS is facilitating TDM. We hope to offer better options for accessing that content to TDM researchers moving forward. PLOS participates in industry efforts to further facilitate TDM and encourages all publishers to open their content stores to enable TDM with minimal barriers or obstacles.

PLOS is a Signatory to The Hague Declaration PLOS is a signatory to and original participant in The Hague Declaration, which aims to foster agreement about how to best enable access to facts, data and ideas for knowledge discovery in the Digital Age. The declaration calls for intellectual property reform, policies to enable and reward TDM, and the development of technology and tools to allow TDM. Source: http://thehaguedeclaration.com/big-data-can-reshape-the-world-and-save-lives-infographic/ CC-BY

Text Mining Collection: PLOS also has a collection of 38 articles containing research, opinion and education relating to TDM from across the PLOS journals. http://collections.plos.org/textmining

Data Availability Policy: “PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. When submitting a manuscript online, authors must provide a Data Availability Statement describing compliance with PLOS's policy. If the article is accepted for publication, the data availability statement will be published as part of the final article.” This requirement has applied to all PLOS journals since 2014. It increases both the amount of data shared and the ease of locating data connected to an article. We believe that requiring data is the first step to ensuring greater reproducibility of research, something that TDM can also benefit. http://journals.plos.org/plosone/s/data-availability

Facilitating Text & Data Mining: What we offer. Info: http://api.plos.org/text-and-data-mining/

Facilitating Text & Data Mining: We provide unrestricted access to all of our articles and supplemental data in several different formats. We encourage TDM researchers to understand and use JATS XML because it provides data about the article as well as the article text in one standardized, structured text file. We direct researchers to data for PLOS journals as well as other open access journals. PLOS is sometimes contacted by researchers looking for assistance with TDM; this information is also all on our website. The preferred method of access depends on the use case. Info: http://api.plos.org/text-and-data-mining/

PLOS Search API: Every PLOS article is indexed by DOI in our Solr search API. The search API can be used to download PLOS article metadata, to identify a subset of articles of interest, or to get the DOI of every published PLOS article. Info: http://api.plos.org/text-and-data-mining/
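As an illustration of the search API uses listed above, here is a minimal sketch of how a Solr-style query URL might be assembled and how DOIs could be read from a response. The endpoint `http://api.plos.org/search`, the field names `id` (assumed to hold the DOI) and `title`, and both helper functions are assumptions for illustration; `q`, `fl`, `rows`, `start` and `wt` are standard Solr query parameters. The sample response and its DOIs are placeholders, and no network request is made.

```python
from urllib.parse import urlencode

# Assumed endpoint for the PLOS Solr search API; verify against the info page.
SEARCH_ENDPOINT = "http://api.plos.org/search"


def build_search_url(query, fields=("id", "title"), rows=100, start=0):
    """Build a Solr-style query URL (hypothetical helper).

    q, fl, rows, start and wt are standard Solr parameters; the field
    names 'id' and 'title' are assumptions about the index schema.
    """
    params = {
        "q": query,
        "fl": ",".join(fields),
        "rows": rows,
        "start": start,
        "wt": "json",
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def extract_dois(solr_response):
    """Return the 'id' value of each document in a parsed Solr JSON response."""
    docs = solr_response.get("response", {}).get("docs", [])
    return [doc["id"] for doc in docs if "id" in doc]


# Hand-written placeholder in the usual Solr JSON shape, not real API output.
sample_response = {
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
            {"id": "10.1371/journal.pone.0000000", "title": "Placeholder A"},
            {"id": "10.1371/journal.pcbi.0000000", "title": "Placeholder B"},
        ],
    }
}

url = build_search_url('title:"text mining"', rows=2)
dois = extract_dois(sample_response)
```

Collecting the DOI of every article would mean repeating the request with an increasing `start` value until the reported `numFound` is reached.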

Bulk Downloads: Text and data miners generally want a copy of the entire corpus and write specialized software to process the data. Bulk downloading is the most efficient method for obtaining a copy of the entire corpus. We encourage them to go via PubMed Central (PMC), which has made this extremely easy by packaging the Open Access Subset of research articles from multiple journals into single files and making them available via the PMC OA Bulk Download FTP site. Info: http://api.plos.org/text-and-data-mining/
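Once a bulk package has been downloaded, processing it might look like the sketch below, which walks a compressed tar archive and yields the XML of each article. The `.tar.gz` packaging, the `.nxml` file extension and the folder layout are assumptions about the bulk files, and `iter_article_xml` is a hypothetical helper. A tiny in-memory archive stands in for a real download.

```python
import io
import tarfile


def iter_article_xml(archive, suffixes=(".nxml", ".xml")):
    """Yield (member name, raw bytes) for each XML file in an open tar archive."""
    for member in archive:
        if member.isfile() and member.name.endswith(suffixes):
            handle = archive.extractfile(member)
            if handle is not None:
                yield member.name, handle.read()


def make_demo_archive():
    """Build a small in-memory .tar.gz standing in for a downloaded bulk package."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, payload in [
            ("Example_Journal/article1.nxml", b"<article>one</article>"),
            ("Example_Journal/article2.nxml", b"<article>two</article>"),
            ("Example_Journal/figure1.jpg", b"not xml"),
        ]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


# With a real package, open the downloaded file by path instead:
#     tarfile.open("path/to/package.tar.gz", mode="r:gz")
with tarfile.open(fileobj=make_demo_archive(), mode="r:gz") as demo:
    articles = dict(iter_article_xml(demo))
```

Reading members one at a time keeps memory use low, which matters when the archive holds a whole corpus rather than three demo files.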

Open Access & TDM: Writing specialized software takes time and effort, and writing software to download data from hundreds or thousands of journals is a huge barrier for TDM. Open Access (OA) journals remove this barrier in several important ways. First, OA article text and metadata are provided in a single XML file format, the Journal Article Tag Suite (JATS). Writing software to process JATS XML requires a larger upfront investment, but the reward is the ability to process articles from multiple journals in addition to PLOS. Second, OA articles are freely available to download and use for TDM under our standard CC-BY license, whereas closed access publishers often do not make their text available for TDM or only do so under certain restrictions. Third, individual publisher APIs change frequently or do not exist, but OA publishers syndicate articles to PMC, which provides this data as an ongoing service that is updated on a regular basis. Info: http://api.plos.org/text-and-data-mining/
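To show why a single structured format pays off, the sketch below pulls metadata and body text from one JATS-style file with Python's standard library. The sample document is a hand-written, heavily simplified placeholder. Real JATS files carry a DOCTYPE, namespaces and many more elements, so treat the element paths here as a starting point rather than a complete mapping. `parse_jats` and `text_of` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Simplified placeholder using common JATS element names; not a real article.
SAMPLE_JATS = """<article>
  <front>
    <journal-meta>
      <journal-title-group><journal-title>Example Journal</journal-title></journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1371/journal.example.0000000</article-id>
      <title-group><article-title>An <italic>example</italic> article</article-title></title-group>
      <abstract><p>Short abstract.</p></abstract>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title><p>First paragraph.</p><p>Second paragraph.</p></sec>
  </body>
</article>"""


def text_of(element):
    """Flatten an element and its children into one whitespace-normalized string."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def parse_jats(xml_text):
    """Extract a few metadata fields and the body paragraphs from JATS XML."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    return {
        "doi": text_of(meta.find("article-id[@pub-id-type='doi']")),
        "title": text_of(meta.find("title-group/article-title")),
        "journal": text_of(
            root.find("front/journal-meta/journal-title-group/journal-title")
        ),
        "abstract": text_of(meta.find("abstract")),
        "paragraphs": [text_of(p) for p in root.iterfind("body//p")],
    }


record = parse_jats(SAMPLE_JATS)
```

Because the same element names are used across publishers that deposit JATS, a parser like this could in principle be pointed at articles from other OA journals without per-journal changes.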

Open Access & TDM This slide is a few years old, and is taken from a talk on OA by a researcher called Ross Mounce, but I’ve included it because it demonstrates some of the possibilities of OA – and I like the idea of having all of PLOS on a single USB stick. Credit: Ross Mounce, “Open Access for Early Career Researchers”, University of Bath Open Access Week session; 23rd October 2013. CC-BY 4.0. Slideshare.net

PLOS API (Non-Bulk Downloads): PLOS also provides three ways to access data about PLOS articles or the articles themselves. These methods are not as useful for bulk downloads but do give anyone with a specific interest in PLOS articles and data a way to access them.
JATS XML (structured data; article text and metadata): The Journal Article Tag Suite (JATS) is the standard used to archive scientific articles. JATS XML is the most convenient format for TDM because the data is structured. Article text and metadata can be accessed in a single file and in a standard way. Downloading individual article XML from the PLOS website is simple if the DOI of the article is known.
Article PDF (limited TDM utility; useful for reading offline): Each PLOS article is also available as a PDF. Article PDFs have limited utility for TDM but are useful for printing or reading the article offline.
HTML Article Page (less useful for TDM): Article HTML is the primary method used to view PLOS articles online. Scraping the article HTML is a technique used by search engines to index articles and can be used for TDM. It is generally less useful for TDM because the article pages change over time, the data is not structured and metadata is not easily identified.
Info: http://api.plos.org/text-and-data-mining/
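As a rough sketch of the DOI-based XML download mentioned above, the helper below turns a DOI into a candidate URL for the article XML. The URL pattern (`/article/file?id=...&type=manuscript`), the base address and the mapping from DOI journal codes to URL keys are all assumptions for illustration and should be checked against the info page. `article_xml_url` is a hypothetical helper, and the DOI used is a placeholder.

```python
from urllib.parse import urlencode

# Assumed mapping from the journal code inside a PLOS DOI to the journal's
# URL key; illustrative only and not exhaustive.
JOURNAL_KEYS = {
    "pone": "plosone",
    "pcbi": "ploscompbiol",
    "pgen": "plosgenetics",
}


def article_xml_url(doi, base="http://journals.plos.org"):
    """Build a candidate download URL for one article's XML (assumed pattern)."""
    prefix, _, suffix = doi.partition("/")
    parts = suffix.split(".")
    if prefix != "10.1371" or len(parts) != 3 or parts[0] != "journal":
        raise ValueError(f"not a recognised PLOS article DOI: {doi}")
    journal_key = JOURNAL_KEYS.get(parts[1])
    if journal_key is None:
        raise ValueError(f"no URL key known for journal code: {parts[1]}")
    # Keep the slash in the DOI readable instead of percent-encoding it.
    query = urlencode({"id": doi, "type": "manuscript"}, safe="/")
    return f"{base}/{journal_key}/article/file?{query}"


example_url = article_xml_url("10.1371/journal.pone.0000000")
```

Rejecting unknown journal codes up front means an unsupported DOI fails with a clear error instead of producing a URL that silently points nowhere.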

Post-talk update: https://t.co/7tgynrc8cz

In conclusion: PLOS’ mission is to encourage & enable reuse of our content. We aim to make TDM easier through our technology and licensing. We’re open to suggestions. www.plos.org

Questions & Comments? rdickin@plos.org, api@plos.org