Managing Dataset DOIs and Versions in a Changing Archive Steven Worley Bob Dattore Zaihua Ji National Center for Atmospheric Research Boulder, Colorado,

Presentation transcript:

Managing Dataset DOIs and Versions in a Changing Archive
Steven Worley, Bob Dattore, Zaihua Ji
National Center for Atmospheric Research, Boulder, Colorado, USA
The National Center for Atmospheric Research is operated by the University Corporation for Atmospheric Research under sponsorship of the National Science Foundation.

Topics
RDA Background
Use Cases
User DOI Services

Research Data Archive (RDA) at NCAR
1. Distinct datasets for climate and weather research, 8M files
2. Collections: ocean & atmosphere observations, analyses, reanalyses, operational NWP outputs
3. Free and open access

Technical Approach – MySQL DB
DB records for each file
  – DOI
  – Internal Version Control (IVC) setting
  – Date and Time Stamp (DTS) of file activities
Other features
  – Maintain file to dataset relationship
  – Maintain history of file activities
  – Track user access via registration and login
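The per-file records listed above could be laid out roughly as follows. This is a minimal sketch, not the RDA's actual schema, which the talk does not show. The table and column names (dataset_file, file_activity) are hypothetical, Python's built-in sqlite3 stands in for the MySQL database named on the slide, and the dataset ID and DOI are placeholders.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database mentioned on the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset_file (
    file_id    INTEGER PRIMARY KEY,
    dataset_id TEXT NOT NULL,      -- file-to-dataset relationship
    file_name  TEXT NOT NULL,
    doi        TEXT NOT NULL,      -- DOI the file belongs to
    ivc        INTEGER NOT NULL,   -- Internal Version Control setting
    dts        TEXT NOT NULL       -- Date and Time Stamp of latest activity
);
CREATE TABLE file_activity (       -- history of file activities
    activity_id INTEGER PRIMARY KEY,
    file_id     INTEGER NOT NULL REFERENCES dataset_file(file_id),
    action      TEXT NOT NULL,
    ivc         INTEGER NOT NULL,
    dts         TEXT NOT NULL
);
""")


def register_file(conn, dataset_id, file_name, doi, ivc, dts):
    """Insert a file record, log the activity, and return the new file_id."""
    cur = conn.execute(
        "INSERT INTO dataset_file (dataset_id, file_name, doi, ivc, dts) "
        "VALUES (?, ?, ?, ?, ?)",
        (dataset_id, file_name, doi, ivc, dts),
    )
    file_id = cur.lastrowid
    conn.execute(
        "INSERT INTO file_activity (file_id, action, ivc, dts) "
        "VALUES (?, 'add', ?, ?)",
        (file_id, ivc, dts),
    )
    return file_id


# Placeholder values: "ds000.0" and "10.0000/EXAMPLE" are not real identifiers.
fid = register_file(conn, "ds000.0", "example_1988.grb",
                    "10.0000/EXAMPLE", 1, "2014-01-01T00:00:00")
row = conn.execute(
    "SELECT doi, ivc, dts FROM dataset_file WHERE file_id = ?", (fid,)
).fetchone()
```

Keeping the activity log in its own table means the current DOI, IVC and DTS of a file can be read from one row while the full history stays queryable.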

Use Case 1 – Create a new DOI dataset
All files one-to-one match on tape (offline) and online storage
  – Exceptions: permit Endian byte swap, standard file packaging (tar, gzip, htar, etc.)
Mint a new DOI for DataCite through EZID service
[Timeline figure, T0 to T3: DOI (1), Use Case 1. Internal Version Control (IVC) = 1, Date and Time Stamp (DTS) = T0]

Use Case 2 – Complete dataset replacement (e.g. new data from the provider)
RDA dataset landing page (URL) is unchanged
  – Metadata (discovery, file content) updated
Assign new DOI
Old version
  – Files offline (tape archive)
  – File-set permanently frozen
  – Create new landing page (URL) for old DOI
Inform user of options
  – Go to new DOI
  – Initiate recovery of old DOI file-set
  – Update the URL in DataCite metadata via EZID

[Timeline figure, T0 to T3: Use Case 2. DOI (2): IVC = 1, DTS = T0. DOI (2a): IVC = 1, DTS = T2]
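The bookkeeping for Use Case 2 can be sketched as below. This is an illustrative model only: the DatasetVersion class, its field names, and the URLs and DOIs are all hypothetical, and the DataCite metadata update via EZID is deliberately left out because the talk gives no details of that call.

```python
from dataclasses import dataclass


@dataclass
class DatasetVersion:
    doi: str
    landing_url: str
    files: list
    ivc: int = 1
    dts: str = ""
    frozen: bool = False   # True once the file-set may no longer change
    online: bool = True    # False once files live only in the tape archive


def replace_dataset(old, new_doi, new_files, dts, old_landing_url):
    """Retire `old` and return the version that takes over its landing page."""
    # The new version keeps the existing landing page and starts at IVC = 1.
    new = DatasetVersion(doi=new_doi, landing_url=old.landing_url,
                         files=list(new_files), ivc=1, dts=dts)
    # The old version is frozen, moved offline, and given its own landing page.
    old.frozen = True
    old.online = False
    old.landing_url = old_landing_url
    return new


# Placeholder identifiers mirroring DOI (2) and DOI (2a) in the timeline.
v1 = DatasetVersion(doi="10.0000/EXAMPLE-2",
                    landing_url="https://example.org/ds000.0",
                    files=["a.grb"], dts="T0")
v2 = replace_dataset(v1, "10.0000/EXAMPLE-2a", ["b.grb"], "T2",
                     "https://example.org/ds000.0/retired")
```

In a real system the change to the old DOI's landing URL would also have to be pushed to the DOI registry so that the old DOI keeps resolving.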

Use Case 3 – Routine dataset extension in time
Add new files
  – Inherit existing DOI and IVC
  – Log DTS into DB
  – Allow adding data to the newest file
      E.g. adding monthly data to an annual file
      Update DTS
  – Data replacement is not permitted
Regularly update temporal coverage in DataCite metadata via EZID
  – Frequency: monthly or weekly (TBD)

[Timeline figure, T0 to T3: DOI (3), Use Case 3. IVC = 1; DTS = T0, T2, T3]
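The rules for Use Case 3 (new files inherit the DOI and IVC, only the newest file may grow, nothing is replaced) can be sketched as follows. The dictionary layout, the function name extend_dataset and the file names are assumptions for illustration, and "newest file" is taken here to mean the most recently added one.

```python
class ReplacementNotPermitted(Exception):
    """Raised when an update would touch data in an older file."""


def extend_dataset(dataset, file_name, dts):
    """Add a new file or append to the newest one; refuse anything else."""
    files = dataset["files"]  # insertion-ordered: the last key is the newest file
    if file_name not in files:
        # New file: inherit the dataset's DOI and IVC, log the DTS.
        files[file_name] = {"doi": dataset["doi"], "ivc": dataset["ivc"], "dts": dts}
        return "added"
    newest = list(files)[-1]
    if file_name == newest:
        # Appending data to the newest file only updates its DTS.
        files[file_name]["dts"] = dts
        return "appended"
    raise ReplacementNotPermitted(file_name)


# Placeholder DOI and file names; DTS labels follow the timeline (T0, T2, T3).
dataset = {
    "doi": "10.0000/EXAMPLE",
    "ivc": 1,
    "files": {"annual_2013.grb": {"doi": "10.0000/EXAMPLE", "ivc": 1, "dts": "T0"}},
}
r1 = extend_dataset(dataset, "annual_2014.grb", "T2")  # new annual file
r2 = extend_dataset(dataset, "annual_2014.grb", "T3")  # monthly data appended
```

Throughout, the DOI and IVC stay fixed; only the DTS values record that the dataset has grown.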

Use Case 4 – Removal of a DOI dataset
Update DataCite metadata so DOI resolves to a special dead dataset landing page (URL)
Landing page explains status and options
1. File set is preserved and can be restaged
   Use Case 2, recover from tape (offline) archive
2. File set has been deleted from the system
   Explanation required

[Timeline figure, T0 to T3: DOI (4), Use Case 4. IVC = 1, DTS = T0]

Use Case 5 – Small scale replacement (fixes) within a dataset
Erroneous files are removed from the file-set
  – Files permanently preserved
  – IVC and DTS are saved as history in DB
Actions to replace a file
  – Increment IVC, nn → nn+1
  – Re-assign IVC across complete file set
  – Add IVC notation to replacement file base-name
      » noaa_CFR_hourly_1988_2mTemp_IVC2.grb
DOI remains unchanged

[Timeline figure, T0 to T3: DOI (5), Use Case 5. IVC steps from 1 to 2 to 3, one step per file replacement; DTS labels shown: T0, T1, T2]
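The replacement steps in Use Case 5 can be sketched as below. The data layout and function names are hypothetical, and the sketch assumes the original file carried no IVC suffix; only the suffixed name noaa_CFR_hourly_1988_2mTemp_IVC2.grb comes from the slide.

```python
import os


def ivc_file_name(base_name, ivc):
    """Insert the IVC notation before the file extension."""
    stem, ext = os.path.splitext(base_name)
    return f"{stem}_IVC{ivc}{ext}"


def replace_file(dataset, bad_name, dts):
    """Swap an erroneous file for a corrected one and bump the dataset IVC."""
    old = dataset["files"].pop(bad_name)
    # The removed file's IVC and DTS are kept as history.
    dataset["history"].append({"name": bad_name, "ivc": old["ivc"], "dts": old["dts"]})
    # Increment the IVC and re-assign it across the complete file set.
    dataset["ivc"] += 1
    for rec in dataset["files"].values():
        rec["ivc"] = dataset["ivc"]
    # The replacement carries the new IVC in its name; the DOI is untouched.
    new_name = ivc_file_name(old["base_name"], dataset["ivc"])
    dataset["files"][new_name] = {
        "base_name": old["base_name"], "ivc": dataset["ivc"], "dts": dts,
    }
    return new_name


base = "noaa_CFR_hourly_1988_2mTemp.grb"  # assumed un-suffixed original name
dataset = {
    "doi": "10.0000/EXAMPLE",  # placeholder DOI
    "ivc": 1,
    "history": [],
    "files": {
        base: {"base_name": base, "ivc": 1, "dts": "T0"},
        "other_1988.grb": {"base_name": "other_1988.grb", "ivc": 1, "dts": "T0"},
    },
}
first = replace_file(dataset, base, "T1")
second = replace_file(dataset, first, "T2")
```

Storing the un-suffixed base name with each record keeps a second replacement from producing a doubled suffix such as _IVC2_IVC3.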

User DOI services
Citation design – ESIP Guidelines
Compo, G. P., et al International Surface Pressure Databank (ISPDv2) 1768 to Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory. Accessed § dd mmm yyyy.
§ Please fill in the Accessed date with the day, month and year (e.g. 5 Aug 2011) you last accessed the data from the RDA.
Also offer AMS, AGU and DataCite styles as options.
RIS: download standard metadata for citation management software, e.g. EndNote, Zotero, etc.
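A citation with the Accessed date filled in, plus a matching RIS record, could be generated along these lines. This is a sketch under assumptions: the field order only approximates the citation on the slide, the author, year, title and DOI are placeholders rather than the real ISPD metadata, and the choice of RIS tags (TY, AU, PY, TI, PB, DO, ER) is one common convention, not necessarily what the RDA emits.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_accessed(d):
    """Render a date as 'd mmm yyyy', e.g. 5 Aug 2011, independent of locale."""
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


def format_citation(meta, accessed):
    """Build a dataset citation string ending with the access date."""
    return (f"{meta['authors']} {meta['year']}. {meta['title']}. "
            f"{meta['publisher']}. https://doi.org/{meta['doi']}. "
            f"Accessed {format_accessed(accessed)}.")


def to_ris(meta):
    """Build a minimal RIS record for citation management software."""
    lines = [
        "TY  - DATA",
        f"AU  - {meta['authors']}",
        f"PY  - {meta['year']}",
        f"TI  - {meta['title']}",
        f"PB  - {meta['publisher']}",
        f"DO  - {meta['doi']}",
        "ER  - ",
    ]
    return "\n".join(lines) + "\n"


# Placeholder metadata; only the publisher string is taken from the slide.
meta = {
    "authors": "Doe, J., et al.",
    "year": 2010,
    "title": "Example Dataset",
    "publisher": ("Research Data Archive at the National Center for Atmospheric "
                  "Research, Computational and Information Systems Laboratory"),
    "doi": "10.0000/EXAMPLE",
}
citation = format_citation(meta, date(2011, 8, 5))
ris = to_ris(meta)
```

Because the access date is an argument, the same function serves a generic portal citation, a citation issued at download time, and one regenerated later from a logged access event.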

User DOI services
Three ways to get a citation
1. Generic dataset citation, from RDA portal
2. Download service (scripts, subsetting): provide complete dataset citation, including Accessed on date
3. Generate citations on demand at a later time:
   – Display user specific access activities
       Utilize registration information
   – Allow activity selection
   – Create the complete citation

Some Outstanding Challenges
No limit on data sharing after extraction from the RDA
  – Could lose ability to provide accurate citations
Have not designed a way to tag an access event with the software ID used to enable it
  – E.g. format conversion, regridding, server-side computations
Have not designed a systematic way to couple DOIs from the RDA with nearly identical or related datasets
  – Could be managed with metadata enhancements

Conclusions
Managing DOIs for a dynamic archive has complications
  – Full dataset replacements
  – Dataset retirements
  – Routine dataset extension
  – Stewardship improvements: data fixes, patches, etc.
Implementation: keep records for each file, including:
  – DOI
  – Internal Version Control
  – Date and Time Stamp
Provide users options to create citations, based on ESIP recommendations

Questions?
RDA:
DataCite:
EZID:
ESIP: wardship/Citations (Federation of Earth Science Information Partners)