Publishing Data Workflows RDA Plenary 5 -- March 11, 2015 Session Chairs: Amy Nurnberger and Mary Vardigan Please sign in:

1 Publishing Data Workflows RDA Plenary 5 -- March 11, 2015 Session Chairs: Amy Nurnberger and Mary Vardigan Please sign in: http://bit.ly/1Hju0LM

2 Agenda: Introduction (objectives, progress so far, workflow examples, get involved); Dataverse workflow presentation; SoftwareX workflow presentation; use case development. Group notes document: http://bit.ly/1MlXysR

3 The working group members (currently): Theodora Bloom (BMJ) [CO-CHAIR]; Sünje Dallmeier-Tiessen (Switzerland, CERN) [CO-CHAIR]; Elizabeth Newbold (BL) [CO-CHAIR]; Merce Crosas (US, Harvard University); Michael Diepenbroek (PANGAEA); Kim Finney (Australia, AADC); John Helly (US, UCSD); Brian Hole (Ubiquity Press, UK); Varsha Khodiyar (Nature Scientific Data); Hylke Koers (The Netherlands, Elsevier); Rebecca Lawrence (UK, F1000 Research Ltd.); Fiona Murphy (UK, Wiley-Blackwell); Amy Nurnberger (US, Columbia University Libraries); Lisa Raymond (US, Library Woods Hole Oceanographic Institution); Johanna Schwarz (Germany, Springer); Jonathan Tedds (UK, University of Leicester); Mary Vardigan (US, ICPSR); Ruth Wilson (UK, Nature); Eva Zanzerkia (US, NSF); Angus Whyte (UK, DCC). And growing… Others are very welcome ☺

4 Background and Motivation: Only a small fraction of research data is preserved and shared, often with a bare minimum of metadata. This is often due to the lack of “established” or “trusted” services and workflows. But there are established or emerging workflows! They are usually found in selected disciplines (e.g., Earth Sciences), and some provide credit via citation mechanisms.

5 Objectives: Provide an analysis of a representative range of existing and emerging workflows and standards for data publishing, including deposit and citation. Provide reference models, a “classification”. Test implementations of key components for application in new workflows. Illustrate the benefits of the reference models for researchers and organisations.

6 Relevance: Information about workflows is crucial for researchers and other stakeholders to understand the options available for practising open science. It helps to illustrate different possibilities for data sharing, leading to more efficient and reliable reuse of research data, and it shows those involved in research data where they fit in the overall scheme of things.

7 More detailed work programme: Identification of a smaller set of reference models covering a range of such workflows, to include, for example: when and where QA/QC and data peer review fit into the publishing process; who does what and when; automated vs. “manual” processes. Selection of key use cases and organizations in which components of a reference model can be implemented and tested for suitability, for example dedicated data peer review or metadata checks.

8 First results of workflow analysis

9 http://tinyurl.com/mvtbrek

10 Workflows in the current list:
- STFC: Data centre
- NSIDC: Data centre
- ENVRI reference model
- OJS/Dataverse
- INSPIRE: Digital library
- NPG (PubChem & Scientific Data): Publisher
- UK Data Archive/Service
- PREPARDE (NCAR CISL)
- Ocean Data Publication Cookbook (UNESCO IOC)
- PURR: Institutional repository
- ICPSR
- Edinburgh Datashare
- F1000 Research
- Ubiquity Press: Open Health Data Journal+...
- PANGAEA: Data Publisher for Earth and Environmental Sciences
- WDC Climate: Data Publisher for Climate Sciences
- CMIP / IPCC DDC: International project series in Climate Sciences
- GigaScience
- Dryad digital repository with integrated journals workflow
- Stanford Digital Repository
- Academic Commons: Columbia University Institutional Research Repository
- Elsevier: Data in Brief
- Integrated data publishing solution at Elsevier [through “traditional” journals]

11 Categories we are looking at:
- Discipline
- Function of workflow
- PID assignment to dataset
- PID type (e.g., DOI, ARK)
- Peer review of data (e.g., by researcher & editorial review)
- Curatorial review of metadata (e.g., by institutional or subject repository?)
- Technical review & checks (e.g., for data integrity at repository/data centre on ingest)
- Discoverability: indexing of the data (if yes, where?)
- Formats covered
- Persons/roles involved (e.g., editor, publisher, data repository manager)
- Link to data paper or “standalone” data
- Links to grants, usage of author PIDs
- Data citation facilitated
- Data life cycle referred to
- Standards compliance
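To make the categories concrete, here is a minimal sketch of one row of such an analysis as a Python record, one field per category. The class, field and function names are hypothetical, not the group's actual spreadsheet schema, and any records built from it here are placeholders rather than real workflows.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical record type: one field per analysis category.
@dataclass
class WorkflowRecord:
    name: str
    discipline: str
    function: str
    pid_assigned: bool
    pid_type: Optional[str] = None          # e.g. "DOI", "ARK"
    data_peer_review: bool = False          # e.g. by researcher & editorial review
    curatorial_metadata_review: bool = False
    technical_checks: bool = False          # e.g. integrity checks on ingest
    indexed_in: List[str] = field(default_factory=list)
    formats: List[str] = field(default_factory=list)
    roles: List[str] = field(default_factory=list)
    links_to_data_paper: bool = False       # False means "standalone" data
    links_to_grants_or_author_pids: bool = False
    data_citation_facilitated: bool = False
    life_cycle_model: Optional[str] = None
    standards: List[str] = field(default_factory=list)


def records_missing_pid_type(records: List[WorkflowRecord]) -> List[str]:
    """Names of workflows that assign PIDs but do not say which PID type."""
    return [r.name for r in records if r.pid_assigned and not r.pid_type]
```

A check like `records_missing_pid_type` illustrates the kind of consistency query a structured table makes possible: finding entries that report PID assignment without naming the PID type.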

12 Observations: The researcher/author generally initiates the workflow. Discipline-specific repositories have the most rigorous ingest and review processes; more general institutional repositories have a lighter touch. Journals vs. repositories: for the former, any peer review is conducted externally; for many of the latter, it is internal.

13 Repository view

14 Simplified generic repository workflow
[Diagram: Producer → Data Deposit → Ingest → Quality Assurance → Data Management → LT Archiving → Dissemination → Access → Consumer/Reuse.]
The researcher has a central role: submission/deposition. Review/QA is mainly internal.
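Read as a strictly linear sequence (a simplification; real repositories may loop back from QA to the producer), the generic workflow can be sketched as an ordered pipeline. The `Stage` values mirror the diagram labels; the `responsible_party` split is an assumption based on the slide's note that the researcher deposits and review/QA is mainly internal.

```python
from enum import Enum
from typing import Optional


# Hypothetical model of the generic repository workflow as a linear pipeline.
class Stage(Enum):
    DATA_DEPOSIT = "Data Deposit"
    INGEST = "Ingest"
    QUALITY_ASSURANCE = "Quality Assurance"
    DATA_MANAGEMENT = "Data Management"
    LT_ARCHIVING = "LT Archiving"
    DISSEMINATION = "Dissemination"
    ACCESS = "Access"


# Iterating an Enum yields its members in definition order.
PIPELINE = list(Stage)


def next_stage(stage: Stage) -> Optional[Stage]:
    """Stage that follows `stage`, or None once the data are accessible."""
    i = PIPELINE.index(stage)
    return PIPELINE[i + 1] if i + 1 < len(PIPELINE) else None


def responsible_party(stage: Stage) -> str:
    """Assumed split: the producer deposits; the repository runs every later
    stage, including review/QA, which the slide describes as mainly internal."""
    return "producer (researcher)" if stage is Stage.DATA_DEPOSIT else "repository"
```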

15 Detailed repository workflow (designed by M. Stockhause)
[Diagram: Producer → Project Repositories (Data Deposit, Ingest, Light Quality Assurance, Data Management, Dissemination, Access; serving disciplinary consumers) → Long-term Archive (Ingest, Detailed Quality Assurance, LT Archiving, Dissemination, Access; serving interdisciplinary consumers).]
Project Repositories: Data are published in a federated data infrastructure. Data are added and corrected. Poor documentation. Usually no data backup. Light-weight quality assurance against international and project standards. Project data tend never to become stable. Currently no PIDs assigned or reserved, but Handles planned.
Long-term Archive: Data are archived for the long term at a single location. Data are stable and curated. Detailed documentation. Data backup/redundancy. The quality assurance process is more detailed and includes a review. The data are a “snapshot” of the project data at a certain time. DOIs assigned to data collections.
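The key difference between the two tiers is that the long-term archive holds a stable snapshot while project data keep changing. A minimal sketch of taking such a snapshot, assuming the project files are held in memory as bytes; the SHA-256 checksum is an illustrative fixity value added for this sketch (the slide does not specify one), and DOI assignment is out of scope.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Tuple


# Hypothetical snapshot type for the long-term archive tier.
@dataclass(frozen=True)
class ArchiveSnapshot:
    """Immutable copy of project data taken at one point in time."""
    taken_at: datetime
    files: Tuple[Tuple[str, bytes], ...]  # sorted (name, content) pairs
    checksum: str                        # SHA-256 over names and contents


def take_snapshot(project_files: Dict[str, bytes]) -> ArchiveSnapshot:
    # Sort by file name so the checksum does not depend on dict order.
    items = tuple(sorted(project_files.items()))
    digest = hashlib.sha256()
    for name, content in items:
        encoded = name.encode("utf-8")
        # Length prefixes keep ("ab", "c") distinct from ("a", "bc").
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
        digest.update(len(content).to_bytes(8, "big"))
        digest.update(content)
    return ArchiveSnapshot(datetime.now(timezone.utc), items, digest.hexdigest())
```

Because the snapshot copies the (immutable) bytes into a tuple, later corrections to the project data do not alter what was archived.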

16 Lessons learnt and questions: Very diverse landscape; discipline-specific and cross-discipline actions; quality assurance is a big topic in discipline-specific repositories; widespread persistent identification; data citation awareness. Challenge: bidirectional data-publication linking. Challenge: versioning.

17 Publisher’s perspective

18 Simplified generic publisher workflow
[Diagram: Producer → Article preparation → Data submission and article submission → Peer Review Process → Editing → Publishing → Consumer/Reuse.]
The researcher potentially takes on several roles: submitter, reviewer, perhaps editor? Who takes on which role and responsibility? Packaging: an article/data container, or separate article and datasets?

19 Example: Dryad repository integrated with journals

20 Lessons learnt and questions: Recommended repositories for collaboration? Who decides, and how? External review models: open, plus invitation; closed, upon invitation; blind. Emerging data and software journal landscape: no information yet on uptake.

21 Current and future work

22 How to get involved: Contribute to the workflow analysis: http://bit.ly/1BBQQPW. Contribute your own workflow “walk-throughs” and use cases. Tell us what is needed for a “successful” workflow in your institute/discipline. Moving to implementation: Tell us if you are interested in learning from a specific example or are considering implementing data publishing workflows. Tell us if you have code/documentation to share.

23 Break for presentations: Dataverse (Eleni Castro); SoftwareX (Hylke Koers)

24 DATA PUBLISHING WORKFLOWS WITH DATAVERSE. Eleni Castro (ecastro@fas.harvard.edu), Institute for Quantitative Social Science (IQSS), Harvard University. RDA 5th Plenary, WG RDA/WDS Publishing Data Workflows, March 11, 2015

25 An Integrated & Automated Journal / Data Publishing Workflow
[Diagram: a cycle between Journal and Repository. Stages shown: Code; Submit; Review; Publish; Reuse, Validate & Extend; Prepare new submission. Annotated features: automatic data citation insertion into article; workflows + features for reviewing data before article publication; long-term preservation + persistent access to dataset; new versions of a dataset induce new research; automatic integration w/ data repositories (common repository API).]

26 Current Workflows in Dataverse to Connect Data to Journals: A. Journals include Dataverse as a Recommended Repository. B. Authors Contribute Directly to a Journal’s Dataverse. C. Automated Integration of Journal + Dataverse (e.g., OJS).

27 Example of Option C, Phase 1: OJS / Dataverse Integration (Project Details: 2012-2014). Integrating Open Journal Systems (OJS) with Dataverse. Reference implementation: automated via the SWORD API. Pilot with ~50 journals, then expand to 1000s of journals using OJS. The Dataverse plugin is automatically available with OJS. Future: embed Dataverse widgets into journal articles. http://projects.iq.harvard.edu/ojs-dvn

28 In the Backend: Technical Workflow. The client sends: (1) an XML file: an AtomPub “entry” with Dublin Core Terms (e.g., title, creator, isReferencedBy (article citation), …); (2) a Zip file: all data files associated with that dataset. The repository sends: an XML file, the “Deposit Receipt”, which returns the data citation from the repository to the client. Plus updates from client to server during the lifecycle (CRUD): in review, reject (delete), publish first version, update new versions.
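The two client-side payloads can be sketched with Python's standard library alone. This is a minimal, offline sketch: it builds the Atom entry with Dublin Core Terms and the zip of data files, and lists the headers assumed for a SWORD v2 binary deposit with SimpleZip packaging. Authentication, the repository's endpoint URLs, sending the requests and parsing the deposit receipt are deliberately omitted; the function names are hypothetical, and the headers should be checked against the SWORD v2 specification and the repository's API guide.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, List

# Sketch only: no HTTP request is sent; endpoint URLs and auth are omitted.
ATOM_NS = "http://www.w3.org/2005/Atom"
DCTERMS_NS = "http://purl.org/dc/terms/"
ET.register_namespace("", ATOM_NS)           # serialise Atom as the default namespace
ET.register_namespace("dcterms", DCTERMS_NS)


def build_atom_entry(title: str, creators: List[str], article_citation: str) -> str:
    """AtomPub <entry> carrying Dublin Core Terms metadata for one dataset."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}title").text = title
    for creator in creators:
        ET.SubElement(entry, f"{{{DCTERMS_NS}}}creator").text = creator
    # isReferencedBy holds the citation of the article that uses the data.
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}isReferencedBy").text = article_citation
    return ET.tostring(entry, encoding="unicode")


def build_data_zip(files: Dict[str, bytes]) -> bytes:
    """Zip all data files of the dataset into one in-memory package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buffer.getvalue()


def zip_deposit_headers(filename: str) -> Dict[str, str]:
    """Assumed HTTP headers for a SWORD v2 binary deposit (SimpleZip packaging)."""
    return {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        "Packaging": "http://purl.org/net/sword/package/SimpleZip",
    }
```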

29 On the Frontend: OJS Dataverse Plugin Walkthrough

30 Journal Manager Sets Up Plugin in OJS

31 Journal Manager Sets Up Data Policies. Read the full Data Policies / Guidelines Template: http://bit.ly/1xkLjoZ, including guidelines for: 1) authors (data citation), 2) reviewers, 3) copyeditors.

32 Author Submits Manuscript + Data (1)

33 Author Submits Manuscript + Data (2). Option to: (a) deposit into Dataverse; or (b) if the data are already in a repository, include the data citation (with persistent URL/identifier). To-Do: support for adding multiple datasets to a journal article.
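For option (b), a journal system would need to accept a data citation whose persistent identifier arrives in several forms. A minimal sketch that normalises DOIs only (Handles, ARKs and other PID types would need their own rules); the regular expression is a widely used pattern that matches most, but not all, DOIs, and `normalize_doi` is a hypothetical helper, not part of OJS or Dataverse.

```python
import re

# Hypothetical helper: widely used pattern that matches most (not all) DOIs.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Common ways a DOI is written in front of the bare "10.xxxx/..." string.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value: str) -> str:
    """Return an https://doi.org/ URL for a DOI given in any common form."""
    doi = value.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {value!r}")
    return "https://doi.org/" + doi
```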

34 Editor Reviews Article + Data

35 Approved = Data Published in Dataverse. When the issue is published: 1) the URL to the article displays in Dataverse; 2) the data citation shows up in the OJS article (see next slide).
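The CRUD lifecycle listed under the technical workflow (in review, reject/delete, publish first version, update new versions) can be read as a small state machine, with the editor's approval at this step triggering the first publication. A minimal sketch; the class and state names are hypothetical and do not reproduce the plugin's internals.

```python
from enum import Enum, auto


# Hypothetical states for one dataset deposited alongside an article.
class DatasetState(Enum):
    IN_REVIEW = auto()
    DELETED = auto()      # rejected during review
    PUBLISHED = auto()


class JournalDataset:
    """Hypothetical client-side view of one dataset deposited with an article."""

    def __init__(self) -> None:
        self.state = DatasetState.IN_REVIEW
        self.version = 0  # no public version until the first publish

    def reject(self) -> None:
        if self.state is not DatasetState.IN_REVIEW:
            raise ValueError("only a dataset under review can be rejected")
        self.state = DatasetState.DELETED

    def publish_first_version(self) -> None:
        # Triggered when the editor approves and the journal issue is published.
        if self.state is not DatasetState.IN_REVIEW:
            raise ValueError("only a dataset under review can be published")
        self.state = DatasetState.PUBLISHED
        self.version = 1

    def publish_new_version(self) -> None:
        if self.state is not DatasetState.PUBLISHED:
            raise ValueError("new versions require a published dataset")
        self.version += 1
```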

36 Article in OJS: Published w/ Data Citation

37 Video of OJS Dataverse Plugin Demo: http://bit.ly/1D1hphu

38 Phase 2: Expansion of API + Workflows, 2015-2016 (collaboration w/ Odum Institute)
[Diagram: the integrated journal/repository workflow cycle from slide 25, repeated.]
Project Goals: 1. Expand to more journals, publishing systems, & workflows. 2. Develop a community-based repository API standard: work w/ RDA, WDS, Data FAIRport, FORCE11, CODATA, etc.
Project Questions: Should we extend the Repository API beyond SWORD? Support for additional metadata schemas & fields (non-DC)? Support for more/which dataset review workflows?

39 How Do I Get Involved? Sign up to contribute: Repositories Workshop + Dataverse Community Meeting, June 9-11, 2015 @ Harvard: http://bit.ly/1A51atJ. Find out more: visit our Collaborations page (http://bit.ly/1Bg2nkw) and the Dataverse Project Site (http://dataverse.org). Contact the Project Coordinator: Eleni Castro (ecastro@fas.harvard.edu).

40 Thank You! Any Questions? Contact me: Eleni Castro (ecastro@fas.harvard.edu)

41 SoftwareX: a home for research software. Hylke Koers, Head of Content Innovation, Elsevier. RDA Plenary 5, San Diego

42 [Open Access] Software (like data) is high-value but hard to access. Researcher survey, 3824 respondents (Publishing Research Consortium, 2010).
[Chart: importance of access vs. ease of access, with quadrants labelled “High value & easy access” and “High value & difficult to access”.]

43 Why SoftwareX? Many scholars develop software, but the current paper-based system does not capture this “born digital” research output systematically: users (readers) can’t find this valuable content, and developers (authors) can’t claim credit. Software is a research method in its own right, and deserves to receive full academic recognition.

44 SoftwareX: a home for research software. “SoftwareX aims to acknowledge the impact of software on today's research practice, and on new scientific discoveries in almost all research domains. SoftwareX also aims to stress the importance of the software developers who are, in part, responsible for this impact. To this end, SoftwareX aims to support publication of research software in such a way that: the software is provided with a peer-reviewed recognition of scientific impact; the software developers are given the academic credit they deserve; the software is citable, allowing traditional metrics of scientific excellence to apply; the academic career paths of software developers are supported rather than hindered; the software is publicly available for inspection, validation, and re-use. Above all, SoftwareX aims to inform researchers about software applications, tools and libraries with a (proven) potential to impact the process of scientific discovery in various domains.” From “Aims & Scope”, see http://www.journals.elsevier.com/softwarex

45 SoftwareX: a home for research software. Publishing “Original Software Publications” (OSPs): the software and code can include post-publication updates; metadata is systematically captured. The article is Open Access under a CC-BY license. All software and code published is, and will remain, fully owned by its developers. Peer-reviewed, with dedicated software editors & reviewers. Multi-disciplinary. Submission in 3 easy steps. GitHub repository to store and expose all software and code. Launched at FORCE15. See http://www.journals.elsevier.com/softwarex/news/you-can-now-submit-your-software-to-softwarex/

46 How does it work? How to submit your software to SoftwareX in 3 easy steps: 1. Select a repository for your software, or pack your software into a zip file or archive. Remember to make your software public so that reviewers and readers can find it. 2. Download the template for the OSP manuscript, and write your article describing your software following this template. 3. Submit your OSP manuscript via the SoftwareX submission site. After review and acceptance, the software and/or code will be copied to the journal archive on GitHub and integrated with the online version of your Original Software Publication available on ScienceDirect. See http://www.journals.elsevier.com/softwarex
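Step 1 allows packing the software into a zip file. A minimal sketch using Python's standard `shutil.make_archive`; `pack_software` and the example paths are hypothetical, and SoftwareX does not prescribe any particular tool.

```python
import shutil


# Hypothetical helper for step 1: pack a source tree into a .zip archive.
def pack_software(source_dir: str, archive_base: str) -> str:
    """Zip the contents of source_dir; returns the path of the .zip file."""
    # make_archive appends ".zip" to archive_base and stores paths
    # relative to root_dir, so the archive unpacks without extra nesting.
    return shutil.make_archive(archive_base, "zip", root_dir=source_dir)
```

Writing the archive outside `source_dir`, as in `pack_software("mytool", "mytool-v1")`, avoids the archive trying to include itself.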

47 Template contains structured metadata: code metadata
Nr | Code metadata description | Please fill in this column
C1 | Current code version | For example: v42
C2 | Permanent link to code/repository used for this code version | For example: https://github.com/mozart/mozart2
C3 | Legal Code License | List one of the approved licenses
C4 | Code versioning system used | For example: svn, git, mercurial, etc.; put none if none
C5 | Software code languages, tools, and services used | For example: C++, python, r, MPI, OpenCL, etc.
C6 | Compilation requirements, operating environments & dependencies |
C7 | If available, link to developer documentation/manual | For example: http://mozart.github.io/documentation/
C8 | Support email for questions |

48 Template contains structured metadata: (executable) software metadata
Nr | (Executable) software metadata description | Please fill in this column
S1 | Current software version | For example: 1.1, 2.4, etc.
S2 | Permanent link to executables of this version | For example: https://github.com/combogenomics/DuctApe/releases/tag/DuctApe-0.16.4
S3 | Legal Software License | List one of the approved licenses
S4 | Computing platforms/Operating Systems | For example: Android, BSD, iOS, Linux, OS X, Microsoft Windows, Unix-like, IBM z/OS, distributed/web based, etc.
S5 | Installation requirements & dependencies |
S6 | If available, link to user manual; if formally published, include a reference to the publication in the reference list | For example: http://mozart.github.io/documentation/
S7 | Support email for questions |

49 Flexible range of open-source licenses for computer code:
- Apache License, 2.0 (Apache-2.0)
- BSD 3-Clause "New" or "Revised" license (BSD-3-Clause)
- BSD 2-Clause "Simplified" or "FreeBSD" license (BSD-2-Clause)
- GNU General Public License (GPL)
- GNU Library or "Lesser" General Public License (LGPL)
- MIT license (MIT)
- Mozilla Public License 2.0 (MPL-2.0)
- Common Development and Distribution License (CDDL-1.0)
- Eclipse Public License (EPL-1.0)
- Creative Commons Zero (CC0)
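The template tables and the license list lend themselves to an automatic completeness check before submission. A minimal sketch covering the code-metadata table (C1 to C8) only; the field and function names are hypothetical, treating every field except the "If available" one as required is an assumption, and the license identifiers are the ones given in the slide's list.

```python
from dataclasses import dataclass, fields
from typing import List

# Identifiers as given in the slide's list of approved code licenses.
APPROVED_CODE_LICENSES = {
    "Apache-2.0", "BSD-3-Clause", "BSD-2-Clause", "GPL", "LGPL",
    "MIT", "MPL-2.0", "CDDL-1.0", "EPL-1.0", "CC0",
}

# Assumption: only the field marked "If available" (C7) is optional.
OPTIONAL_FIELDS = {"c7_developer_docs"}


# Hypothetical record for the C1-C8 code-metadata table.
@dataclass
class CodeMetadata:
    c1_current_version: str
    c2_repository_link: str
    c3_license: str
    c4_versioning_system: str      # the template says: put "none" if none
    c5_languages_tools_services: str
    c6_requirements: str
    c8_support_email: str
    c7_developer_docs: str = ""    # declared last because it has a default


def validate(meta: CodeMetadata) -> List[str]:
    """Return a list of problems; an empty list means the table looks complete."""
    problems = []
    for f in fields(meta):
        if f.name not in OPTIONAL_FIELDS and not getattr(meta, f.name).strip():
            problems.append(f"{f.name} is empty")
    if meta.c3_license not in APPROVED_CODE_LICENSES:
        problems.append(f"license {meta.c3_license!r} is not on the approved list")
    return problems
```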

50 And now… the moment you have all been waiting for…

51 A workflow diagram
[Diagram: Researcher has code and paper → submits to journal as OSP + code (supp. mat.) → editorial + peer-review process → code made available on journal GitHub instance and OSP published on ScienceDirect, with bi-directional links between them.]

52 A workflow diagram
[Diagram: Code deposited to (or built on) a code repository → OSP submitted to journal, OSP linked with code → editorial + peer-review process → code made available on journal GitHub instance and OSP published on ScienceDirect, with bi-directional links between them.]
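Both diagrams end with bi-directional links between the published OSP and its code. A minimal sketch of what "bi-directional" means in data terms, assuming each published object keeps a set of the identifiers it links to; the class names and identifiers are hypothetical placeholders, not ScienceDirect or GitHub identifiers.

```python
from dataclasses import dataclass, field
from typing import Set


# Hypothetical model of a published object (article or code) and its links.
@dataclass
class PublishedObject:
    identifier: str                 # placeholder, e.g. a DOI or repository URL
    kind: str                       # e.g. "OSP article" or "code"
    links: Set[str] = field(default_factory=set)


def link_bidirectionally(a: PublishedObject, b: PublishedObject) -> None:
    """Record the relation on both sides so either object leads to the other."""
    a.links.add(b.identifier)
    b.links.add(a.identifier)


def is_bidirectional(a: PublishedObject, b: PublishedObject) -> bool:
    """True only if each object points at the other."""
    return b.identifier in a.links and a.identifier in b.links
```

Storing the link on both sides is what lets a reader reach the code from the article and vice versa; a one-sided link fails the `is_bidirectional` check.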

53 Thank you! Any questions?

54 Discussion

55 Use case development

56 Developing use cases for workflows
● The tools
○ Part A: http://goo.gl/forms/Wkc7KyxvX5
○ Part B: http://goo.gl/forms/ZFRrzG6krX
● The process
○ Walk through the tools
○ Form up in groups
○ Generate use cases

57 The tools: Part A http://goo.gl/forms/Wkc7KyxvX5

58 The tools: Part A http://goo.gl/forms/Wkc7KyxvX5

59 The tools: Part A http://goo.gl/forms/Wkc7KyxvX5

60 The tools: Part A http://goo.gl/forms/Wkc7KyxvX5

61 The tools: Part A http://goo.gl/forms/Wkc7KyxvX5 Thank you! You have completed Part A of this use case. For the next part, you will complete multiple copies of a form, one for each individual actor listed in this use case. Click here to go to Part B: http://goo.gl/forms/ZFRrzG6krX

62 The tools: Part B http://goo.gl/forms/ZFRrzG6krX

63 The tools: Part B http://goo.gl/forms/ZFRrzG6krX

64 The tools: Part B http://goo.gl/forms/ZFRrzG6krX

65 The tools: Part B http://goo.gl/forms/ZFRrzG6krX

66 Group up!
● The tools
○ Part A: http://goo.gl/forms/Wkc7KyxvX5
○ Part B: http://goo.gl/forms/ZFRrzG6krX

