Preprints and literature provenance in Europe PMC
Christine Ferguson, PhD
RDA UK 2nd workshop, July 2019
User story
“As a data scientist (or researcher), I want to know whether any given preprint has subsequently been published. If yes, then for these to be linked reciprocally (from preprint to publication; from publication to preprint).”
These are real-world user stories: I work on the literature services team, and stories like this one are something we can tackle. We’ve been gathering user stories on the PID Forum, both those collected during FREYA project work and those contributed by the community that is part of the forum. I encourage anyone interested in PIDs to become a member; speak to any of the FREYA partners here today.
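The reciprocal linking this user story asks for amounts to a two-way index between a preprint and its published version. Below is a minimal Python sketch, assuming each record is identified by a DOI and each preprint has at most one published version; the class `PreprintLinks` and the DOIs in the example are made-up placeholders, not Europe PMC's implementation.

```python
from typing import Optional


class PreprintLinks:
    """Hypothetical two-way index between preprint DOIs and publication DOIs."""

    def __init__(self) -> None:
        self._published_as: dict[str, str] = {}  # preprint DOI -> publication DOI
        self._preprint_of: dict[str, str] = {}   # publication DOI -> preprint DOI

    def link(self, preprint_doi: str, publication_doi: str) -> None:
        # Record the relationship in both directions so either record can find the other.
        self._published_as[preprint_doi] = publication_doi
        self._preprint_of[publication_doi] = preprint_doi

    def is_published(self, preprint_doi: str) -> bool:
        # The user story's first question: has this preprint been published?
        return preprint_doi in self._published_as

    def publication_for(self, preprint_doi: str) -> Optional[str]:
        # Link from preprint to publication (None if not yet published).
        return self._published_as.get(preprint_doi)

    def preprint_for(self, publication_doi: str) -> Optional[str]:
        # Link from publication back to preprint (None if no preprint is known).
        return self._preprint_of.get(publication_doi)


# Placeholder DOIs for illustration only.
links = PreprintLinks()
links.link("10.1234/example-preprint", "10.5678/example-article")
print(links.is_published("10.1234/example-preprint"))
print(links.preprint_for("10.5678/example-article"))
```

Storing both directions makes a lookup from either record a single dictionary access, at the cost of keeping the two maps in sync whenever a link is added.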
Extending the literature-centric graph
Slide diagram: Journal Article and Preprint, with 35M abstracts, 5M full-text articles, 95M datasets, 1M ORCID iDs, 8M funding acknowledgements
(Focus on the circle first.) Europe PMC is a literature repository that indexes scholarly literature from the life sciences, including journal articles, theses, books, patents, Agricola records and, most recently, preprints. Take journal articles: Europe PMC currently indexes 35 million abstracts and 5 million full-text articles, and enriches the records by providing links to 95M datasets (point to the three data-related icons, top right quadrant), 1M ORCID iDs (point to the ORCID icon) and 8M funding acknowledgements (point to the funder and grant icons, bottom left quadrant). In July 2018, Europe PMC began indexing preprints. Preprints are manuscripts shared online BEFORE the completion of journal-organized peer review. They can be considered a precursor version of the formal journal article, but they have their own unique DOIs, so they can be cited and linked to ORCID iDs, data records and grant information. Notably, where a peer-reviewed journal article version is available, Europe PMC links the preprint to it. So by indexing these preprints, Europe PMC increases the number of entry points into its literature knowledgebase.
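The idea that indexing preprints adds entry points can be illustrated with a toy PID graph. This is a minimal Python sketch under stated assumptions: the graph, its identifiers (`doi:preprint-1`, `orcid:author-A` and so on) and the `reachable` helper are all hypothetical, and a plain breadth-first search stands in for however Europe PMC actually resolves links.

```python
from collections import deque

# Hypothetical PID graph: each key is a persistent identifier, each value lists
# the PIDs it links to. All identifiers are made-up placeholders.
PID_GRAPH: dict[str, list[str]] = {
    "doi:preprint-1": ["doi:article-1", "orcid:author-A", "grant:G-1"],
    "doi:article-1": ["doi:preprint-1", "dataset:D-1", "orcid:author-A"],
    "orcid:author-A": [],
    "dataset:D-1": [],
    "grant:G-1": [],
}


def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first search: every PID reachable from a given entry point."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Entering from the preprint or from the journal article reaches the same records.
print(sorted(reachable(PID_GRAPH, "doi:preprint-1")))
print(sorted(reachable(PID_GRAPH, "doi:article-1")))
```

Because the preprint is linked to the article, a user arriving at either record can reach the same dataset, ORCID iD and grant; in this made-up example, the grant is reachable from the article only through its preprint.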
Preprint versions
Slide diagram: Preprints (version 1 … version x) and Journal Article
Extending access to the literature PID graph by exposing versioning information for preprints: once a preprint appears on the Europe PMC platform, links are aggregated around it in the same way as they are around each journal article. What’s more, a preprint may be the first of a succession of preprint versions revised in preparation for the final peer-reviewed article (for example, a revision might alter the data presented and accordingly add or remove authors). Preprint version management would enable Europe PMC users, for example, to know whether a preprint has other versions, which version they are viewing, and which version they are linking to their ORCID profile or citing. So Europe PMC is building in functionality to reveal the preprint history and links to other versions of the same study. This is in addition to exposing the ‘precursor’ relationship of any indexed preprint to a subsequent publication.
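The version questions raised here (does this preprint have other versions, and which one am I viewing?) can be answered once versions are kept as an ordered history. A minimal Python sketch, assuming each posted version carries its own identifier and an integer version number; `PreprintVersion`, `version_summary` and the example identifiers are hypothetical, not part of Europe PMC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreprintVersion:
    identifier: str  # hypothetical placeholder ID for one posted version
    number: int      # 1 for the first posted version, increasing with each revision


def version_summary(history: list[PreprintVersion], viewing_id: str) -> str:
    """Say which version is being viewed and whether a newer one exists."""
    ordered = sorted(history, key=lambda v: v.number)
    matches = [v for v in ordered if v.identifier == viewing_id]
    if not matches:
        raise ValueError(f"{viewing_id!r} is not a version of this preprint")
    current, latest = matches[0], ordered[-1]
    if current == latest:
        status = "latest version"
    else:
        status = f"superseded by version {latest.number}"
    return f"Viewing version {current.number} of {len(ordered)} ({status})"


# Hypothetical history of one preprint with three posted versions.
history = [
    PreprintVersion("example-preprint/v1", 1),
    PreprintVersion("example-preprint/v2", 2),
    PreprintVersion("example-preprint/v3", 3),
]
print(version_summary(history, "example-preprint/v2"))
```

Sorting by version number rather than relying on insertion order keeps the summary correct even if versions are harvested out of sequence.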
Video demonstration
Views per month
Preprints are attracting a great deal of attention in the life sciences:
Preprints: 77,011 indexed on July 2nd, 2019; >76.7K views in June 2019
Abstracts: 35.7M indexed on July 2nd, 2019; 13.3M views in June 2019
Source: https://datastudio.google.com/reporting/1MLBIwKW2qGWBTfBfFvjDLanNW-46_jPg/page/9uSO
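To make the comparison concrete, a back-of-envelope calculation can normalise the June 2019 views by the number of indexed items. This is a rough sketch using only the figures on this slide; it treats ">76.7K" as exactly 76,700 (so the preprint rate is a lower bound) and assumes the two view counts are measured the same way.

```python
# Figures as stated on the slide (indexed counts on July 2nd, 2019; views in June 2019).
preprints_indexed = 77_011
preprint_views = 76_700        # slide says ">76.7K", so this is a lower bound
abstracts_indexed = 35_700_000
abstract_views = 13_300_000

# Views per indexed item for each collection.
preprint_rate = preprint_views / preprints_indexed
abstract_rate = abstract_views / abstracts_indexed

print(f"Views per indexed preprint: {preprint_rate:.2f}")
print(f"Views per indexed abstract: {abstract_rate:.2f}")
print(f"Ratio: {preprint_rate / abstract_rate:.1f}x")
```

On these assumptions, each indexed preprint drew about one view in June 2019, against about 0.37 views per indexed abstract, roughly 2.7 times as many.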
Summary
Slide diagram: Preprints (version 1 … version x) and Journal Article
- Increased transparency about how the article evolved
- Extended provenance
- Multiple entry points to the PID graph
What I’ve told you is that exposing the versioning information for an article to users extends the provenance information for that article and provides multiple entry points to the literature PID graph.
Questions?
Christine Ferguson, PhD
https://orcid.org/0000-0002-9317-6819
@ChrisVFerg
ferguson@ebi.ac.uk
The FREYA project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 777523.