Adoption of Data Citation Outcomes by BCO-DMO Cynthia Chandler, Adam Shepherd, David Bassendine Biological and Chemical Oceanography Data Management.

1 Adoption of Data Citation Outcomes by BCO-DMO
Cynthia Chandler, Adam Shepherd, David Bassendine
Biological and Chemical Oceanography Data Management Office, Woods Hole Oceanographic Institution, and Blue Dot Labs
Chandler, CL; A. Shepherd; D. Bassendine (2016) “Adoption of Data Citation Outcomes by BCO-DMO”. Presented at Research Data Alliance Plenary Meeting, 15-17 September 2016, Denver, CO.
This brief presentation was part of a series of adoption reports presented during an RDA P8 plenary session.

2 A story of success enabled by RDA
An existing repository
Marine research data curation since 2006
Faced with new challenges, but no new funding, e.g. data publication practices to support citation
Used the outcomes from the RDA Data Citation Working Group to improve data publication and citation services

3 BCO-DMO curated data are Served: (URLs, URIs)
BCO-DMO is a thematic, domain-specific repository funded by NSF Ocean Sciences and Polar Programs.
BCO-DMO curated data are:
Served: (URLs, URIs)
Published: at an Institutional Repository (CrossRef DOI). WHOAS: Woods Hole Open Access Server
Archived: at NCEI, a US National Data Center
Example: Linked Data URI: NCEI: DOI:
Also harvested by and discoverable from DataONE

4 BCO-DMO Dataset Landing Page (Mar ‘16)
The larval krill dataset with measurements from 2001 and
The full dataset has been archived at NCEI and assigned an accession number. Data used for a publication may include only a portion of the full dataset. Ideally, the appropriate subset would be published and assigned a DOI, enabling subsequent retrieval of the exact data used for the publication.

5 Initial Architecture Design Considerations (Jan 2016)

6 Modified Architecture (March 2016)
Opportunity to update our information model

7 BCO-DMO Data Publication System Components
NCEI is the appropriate National Data Center in the US for ocean research data.
WHOAS is the local OAIS-compliant institutional repository.
BCO-DMO publishes data to WHOAS and a DOI is assigned.
As of June 2016, the BCO-DMO architecture supports data versioning.
In the current BCO-DMO data management system architecture, data are served via the BCO-DMO website, with parallel options for publishing data at the WHOAS Institutional Repository and archiving data at the appropriate NCEI national data center. A Drupal content management system provides access to the metadata catalog and direct access to data via a URL. The Drupal MySQL content is published in a variety of forms, one of which is Linked Data.
When the data are declared ‘Final, no updates expected’, a package of metadata and data can be exported from the BCO-DMO system and submitted to the Woods Hole Open Access Server (WHOAS). WHOAS is an Institutional Repository (IR) hosted by the WHOI Data Library and Archives (DLA), which is part of the larger Marine Biological Laboratory and Woods Hole Oceanographic Institution (MBLWHOI) Library system located in Woods Hole, MA. A Digital Object Identifier (DOI) is assigned when the package is deposited in WHOAS, with reciprocal links entered at BCO-DMO and WHOAS to connect the package at WHOAS with the record in the BCO-DMO catalog. The DOI resolves to the WHOAS dataset landing page, and a Linked Data URI resolves to the BCO-DMO landing page for the same dataset.
Before the June 2016 update, the system supported only one version of a dataset.

8 BCO-DMO Data Citation System Components
Data managed by BCO-DMO are published at WHOAS (the Institutional Repository (IR) curated by the MBLWHOI Data Library and Archives (DLA)), archived at NCEI, and harvested by DataONE (an NSF-funded harvesting system for environmental data).
A new data version is assigned a new DOI (the handle is versioned if only the metadata changes).
New capability (implemented): procedure when a BCO-DMO data set is updated:
A copy of the previous version is preserved
Request a DOI for the new version of data
Publish data, and create a new landing page for the new version of data, with the new DOI assigned
BCO-DMO database has links to all versions of the data (archived and published)
Both archive and published dataset landing pages have links back to the best version of the full dataset at BCO-DMO
BCO-DMO data set landing page displays links to all archived and published versions
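The update procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BCO-DMO implementation: the class names, the `placeholder_minter` function, the `example` hostnames, and the `10.5555/example.*` identifiers are all hypothetical placeholders. The sketch shows the key property of the procedure: publishing an update appends a new version with its own DOI and landing page, and earlier versions remain listed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class PublishedVersion:
    label: str          # e.g. a version date
    doi: str            # DOI assigned to this version
    landing_page: str   # landing page created for this version


@dataclass
class DatasetRecord:
    dataset_uri: str    # URI of the best full version at the repository
    versions: List[PublishedVersion] = field(default_factory=list)

    def publish_new_version(self, label: str,
                            mint_doi: Callable[[str], str]) -> PublishedVersion:
        """Request a DOI, create a landing page, and keep all earlier versions."""
        doi = mint_doi(label)
        version = PublishedVersion(label, doi, f"https://repository.example/doi/{doi}")
        self.versions.append(version)   # previous versions are never overwritten
        return version

    def landing_page_links(self) -> List[str]:
        """Links shown on the dataset landing page, one per published version."""
        return [v.landing_page for v in self.versions]


def placeholder_minter(label: str) -> str:
    # Hypothetical stand-in for a real DOI registration request.
    return f"10.5555/example.{label}"


record = DatasetRecord("https://data.example/dataset/1")
record.publish_new_version("v1", placeholder_minter)
record.publish_new_version("v2", placeholder_minter)
print(record.landing_page_links())
```

Making `PublishedVersion` immutable is a design choice for the sketch: once a version has a DOI, its record should not change underneath the identifier.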

9 BCO-DMO Data Set Landing Page
Versioned dataset example: original indicates the metadata only changed
DOI: /1912/6421
Dataset URL:
Maas - Pteropod respiration rates
LOD URI:
Data published at WHOAS: doi: /1912/bco-dmo

10 Published dataset DOI; NSF award numbers; BCO-DMO dataset URI (published out as Linked Data)

11 BCO-DMO Data Set Landing Page
LOD URI: Data published at WHOAS: doi: /1912/bco-dmo Linked from BCO-DMO dataset landing page to:

12 Linked to Publication via DOI
Linking data sets with publications using DOIs

13 New Capabilities … BCO-DMO becoming a DataONE Member Node

14 New Capabilities … BCO-DMO Data Set Citation
LOD URI:
BCO-DMO Dataset landing page with new citation button
CC BY 4.0 license with suggested citation text based on DOI
Data published at WHOAS: doi: /1912/bco-dmo
Using the DOI citation formatter service:
This does not work yet for our DOIs (because we do not have enough DOI metadata), but when it does, it will return something like this:
Cite as: Twining, B. (2016). “Element Quotas of Individual Synechococcus Cells Collected During Bermuda Atlantic Time-Series Study (BATS) Cruises Aboard the R/V Atlantic Explorer Between Dates and ”. Version 05/06/2016. Biological and Chemical Oceanography Data Management Office (BCO-DMO) Dataset. doi: /1912/bco-dmo [access date]
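The shape of the suggested citation text can be illustrated with a small local formatter. This is a sketch under assumptions: it does not call the DOI citation formatter service mentioned above, the function name `format_citation` is hypothetical, and the sample values (author, title, repository, and the `10.5555/example.1` DOI) are placeholders rather than real BCO-DMO metadata. It only mirrors the field order of the "Cite as" example.

```python
def format_citation(author: str, year: int, title: str, version: str,
                    publisher: str, doi: str, access_date: str) -> str:
    """Assemble a dataset citation in the order: author, year, title,
    version, publisher, DOI, access date."""
    return (
        f'{author} ({year}). "{title}". Version {version}. '
        f"{publisher} Dataset. doi: {doi} [{access_date}]"
    )


# Placeholder metadata for illustration only.
citation = format_citation(
    author="Doe, J.",
    year=2016,
    title="Example dataset",
    version="05/06/2016",
    publisher="Example Repository",
    doi="10.5555/example.1",
    access_date="2016-09-15",
)
print(citation)
```

Including the version and the access date in the text matters for versioned data, since the DOI alone does not tell a reader when the data were retrieved.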

15 Thank you …
To the Data Citation Working Group for their efforts
RDA US and MacArthur Foundation for funding this adoption project
TIMELINE:
Redesign/prototype completed by 1 June 2016
New citation recommendation by 1 Sep 2016
Report out at RDA P8 (Denver, CO) September 2016
Final report by 1 December 2016
Cyndy Chandler @cynDC @bcodmo ORCID:

17 Adoption of Data Citation Outputs
Evaluation
Evaluate recommendations (done December 2015)
Try implementation in existing BCO-DMO architecture (work began 4 April 2016)
Trial
BCO-DMO: R1-11 fit well with current architecture; R12 doable; test as part of DataONE node membership; R13-14 are consistent with Linked Data approach to data publication and sharing
NOTE: adoption grant received from RDA US (April 2016)

18 RDA Data Citation (DC) of evolving data
DC goals: to create identification mechanisms that:
allow us to identify and cite arbitrary views of data, from a single record to an entire data set, in a precise, machine-actionable manner
allow us to cite and retrieve that data as it existed at a certain point in time, whether the database is static or highly dynamic
DC outcomes: 14 recommendations and associated documentation
ensuring that data are stored in a versioned and timestamped manner
identifying data sets by storing and assigning persistent identifiers (PIDs) to timestamped queries that can be re-executed against the timestamped data store
More information from RDA site URL:
In addition to BCO-DMO, there are already 8 other pilot studies that have expressed interest in adopting the DC WG recommendations:
ARGO
Austrian Centre for Digital Humanities
BCO-DMO
Center for Biomedical Informatics (CBMI), Washington University, St. Louis
Climate Change Centre Austria
ENVRIplus
Natural History Museum London
Ocean Networks Canada
Virtual Atomic and Molecular Data Centre
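The idea of assigning a PID to a timestamped query can be sketched as a small query store. This is a minimal illustration under assumptions, not part of the RDA recommendations text or of any BCO-DMO system: the `QueryStore` class, the `pid:` prefix, the hash-based identifier, and the sample query are all hypothetical. The sketch stores the normalized query text together with its execution timestamp, so that resolving the PID returns everything needed to run the same query again against a timestamped data store.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, Tuple


class QueryStore:
    """Maps a PID to a (normalized query, execution timestamp) pair."""

    def __init__(self) -> None:
        self._by_pid: Dict[str, Tuple[str, datetime]] = {}

    @staticmethod
    def _normalize(query: str) -> str:
        # Collapse whitespace so trivially different spellings share a PID.
        return " ".join(query.split())

    def register(self, query: str, executed_at: datetime) -> str:
        normalized = self._normalize(query)
        payload = f"{normalized}|{executed_at.isoformat()}".encode("utf-8")
        pid = "pid:" + hashlib.sha256(payload).hexdigest()[:16]
        self._by_pid[pid] = (normalized, executed_at)
        return pid

    def resolve(self, pid: str) -> Tuple[str, datetime]:
        """Return the query and timestamp needed to re-execute it."""
        return self._by_pid[pid]


store = QueryStore()
when = datetime(2016, 6, 1, tzinfo=timezone.utc)
pid = store.register("SELECT *  FROM samples\nWHERE year = 2001", when)
print(pid, store.resolve(pid))
```

Deriving the PID from the query and timestamp means the same query at the same point in time always maps to the same identifier; a production system would more likely mint identifiers through a registration service.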

19 RDA Data Citation WG Recommendations
»» Data Versioning: For retrieving earlier states of datasets the data need to be versioned. Markers shall indicate inserts, updates and deletes of data in the database.
»» Data Timestamping: Ensure that operations on data are timestamped, i.e. any additions, deletions are marked with a timestamp.
»» Data Identification: The data used shall be identified via a PID pointing to a time-stamped query, resolving to a landing page.
Oct 2015 version with 14 recommendations
DC WG chairs: Andreas Rauber, Ari Asmi, Dieter van Uytvanck
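Versioning and timestamping together make it possible to retrieve data as it existed at an earlier time. The following Python sketch illustrates one common way to do this, using validity intervals on each row. It is an assumption-laden example rather than the Working Group's reference design: the `VersionedTable` class and its method names are hypothetical, and a real system would typically implement this inside the database.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


class VersionedTable:
    """Keeps every state of a record, each marked with a validity interval."""

    def __init__(self) -> None:
        self._rows: List[Dict[str, Any]] = []

    def _close_current(self, key: str, ts: datetime) -> None:
        # Mark the currently valid row (if any) as superseded at ts.
        for row in self._rows:
            if row["key"] == key and row["valid_to"] is None:
                row["valid_to"] = ts

    def upsert(self, key: str, value: Any, ts: datetime) -> None:
        """Insert or update: the old state is closed, never overwritten."""
        self._close_current(key, ts)
        self._rows.append({"key": key, "value": value,
                           "valid_from": ts, "valid_to": None})

    def delete(self, key: str, ts: datetime) -> None:
        """Delete: the row is only marked as no longer valid after ts."""
        self._close_current(key, ts)

    def as_of(self, ts: datetime) -> Dict[str, Any]:
        """Return the table contents as they existed at time ts."""
        result: Dict[str, Any] = {}
        for row in self._rows:
            end: Optional[datetime] = row["valid_to"]
            if row["valid_from"] <= ts and (end is None or ts < end):
                result[row["key"]] = row["value"]
        return result


t1 = datetime(2016, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2016, 3, 1, tzinfo=timezone.utc)
t3 = datetime(2016, 6, 1, tzinfo=timezone.utc)

table = VersionedTable()
table.upsert("sample-1", 1.0, t1)   # insert
table.upsert("sample-1", 2.0, t2)   # update
table.delete("sample-1", t3)        # delete
print(table.as_of(t1), table.as_of(t2), table.as_of(t3))
```

Because no state is ever physically removed, a query that is re-executed with its original timestamp sees the same rows it saw the first time.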

20 New capability (implemented)
Procedure when a BCO-DMO data set is updated:
A copy of the previous version is preserved
Request a DOI for the new version of data
Publish data, and create a new landing page for the new version of data, with the new DOI assigned
BCO-DMO database has links to all versions of the data (archived and published)
Both archive and published dataset landing pages have links back to the best version of the full dataset at BCO-DMO
BCO-DMO data set landing page displays links to all archived and published versions


