Preservation Strategies: Framing the Approach
Nancy Hoebelheinrich, Knowledge Motifs LLC
Data Management Workshop, American Geophysical Union
San Francisco, CA
Tuesday, December 6, 2011
Overview
- Preservation strategies to pursue once the argument for data stewardship & data preservation is won
- Background of previous issues & discussions re: data stewardship & data management
- A framework of questions that a scientist can answer to facilitate the preservation of his/her data for the long term
Relevance to Data Management Why is this important???? As a metaphorical example, consider the following situation:
Relevance to Data Management Documentation for My Latest Research Project To Data Manager: Don’t worry, the connections are all there [– in my head!]
Relevance to Data Management Documentation for My Latest Research Project To Data Manager: See, here’s the primary algorithm I used…
Relevance to Data Management Documentation for My Latest Research Project To Data Manager: Here’s the schedule we used to gather the data – although some months it was a little different…
Relevance to Data Management Documentation for My Latest Research Project To Data Manager: Oh, and here’s the team – our PI wasn’t available for the photo, so we put a placeholder for him – see the guy with the mustache below on the stick? And the project manager – she’s the one with the long ears…what was her name?
Relevance to Data Management
So, what’s the Data Manager gonna do with all this stuff?
- Ensure the long-term integrity & viability of your data, including:
  - Various levels of processed data / data products, if desired
  - Metadata (MD) you have (in your head or in documentation)
  - Context & provenance: the “audit” trail of sources, processing, products
- Do so by ingesting, identifying, storing, locating & providing access, if desired, to all of the above
- Deploy preservation strategies such as:
  - Assigning checksums and/or identifiers to each “item” of a data set
  - Migrating to non-proprietary and/or new formats over time
  - Migrating to new storage media over time
  - Refreshing the data over time
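As an illustration of the first strategy, here is a minimal sketch of how checksums and identifiers might be assigned to each file of a data set and re-checked later, for example after a migration to new storage media. The function names, the `urn:uuid:` identifier scheme, and the choice of SHA-256 are assumptions made for this example; an actual archive will have its own conventions.

```python
import hashlib
import uuid
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir: Path) -> dict:
    """Map each file's relative path to an identifier and a checksum."""
    manifest = {}
    for path in sorted(dataset_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(dataset_dir).as_posix()
            manifest[rel] = {
                # Hypothetical identifier scheme: a random UUID per item.
                "id": f"urn:uuid:{uuid.uuid4()}",
                "sha256": sha256_of(path),
            }
    return manifest


def verify_manifest(dataset_dir: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose checksum changed."""
    problems = []
    for rel, entry in manifest.items():
        path = dataset_dir / rel
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            problems.append(rel)
    return problems
```

Running `build_manifest` once at ingest and `verify_manifest` after every copy or media migration gives a simple fixity check: an empty list means every item is still present and unchanged.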
How can I (the scientist) help?
Framework Questions:
- Besides me, who’s going to care?
  - Sponsor mandates to archive
  - Specific requirements from sponsor, e.g., NASA, NOAA, USGS
  - Data archive requirements & desirements, negotiated & documented in the Submission Information Package (OAIS SIP)
  - Future scientists who want to use/re-use your data!
- What kind of data should be kept?
  - Formulae for decision-making, e.g., NOAA National Climatic Data Center’s Climate Data Record Maturity Matrix; factors include software readiness, existence / state of metadata & (other) documentation, utility of data, validity of product (based on certainty estimates), desire for / restrictions upon public access
  - Documentation of specific disciplinary requirements, e.g., CDRs from Satellite Passive Microwave Sounders
  - Allowing for serendipity & the cyclical nature of scientific data
Example Data Maturity Index
How can I (the scientist) help?
Key Framework Question for future scientists who want to use/re-use my data: what will they need to know? (= MD that I probably know best)
- Documentation, including restrictions on access & use
- Assumptions, hypotheses, algorithms about data (who, what, when, where, why & how) = “provenance & context”
- Sequence of time, date, technical details of data creation / acquisition, and relationships among data units (or how to figure them out) = “preservation MD”
- Key people, roles & their organizations = “citation MD”
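The four kinds of metadata above could be captured in a simple structured record that travels with the data. The sketch below is only one possible shape: the class and field names are hypothetical and do not come from any particular metadata standard or archive schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Person:
    name: str
    role: str
    organization: str


@dataclass
class DatasetRecord:
    title: str
    access_restrictions: str   # documentation: restrictions on access & use
    assumptions: list          # provenance & context
    algorithms: list           # provenance & context
    acquisition_start: str     # preservation MD (e.g. an ISO 8601 date)
    acquisition_end: str       # preservation MD
    related_units: dict = field(default_factory=dict)  # preservation MD
    people: list = field(default_factory=list)         # citation MD

    def missing_fields(self) -> list:
        """Names of fields that are still empty, in declaration order."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialize the record so it can be stored next to the data."""
        return json.dumps(asdict(self), indent=2)
```

Calling `missing_fields()` before handing data to an archive is a quick way to spot what is still only "in my head"; `to_json()` writes the answers down in a form a data manager can ingest.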
What if I don’t have an existing archive for my data?
Some disciplines may not have a data center or archive set up for them. What resources are available?
- Institutions with experience: governmental agencies (UK Data Centers, UK Digital Curation Center; in the US: NASA, NOAA, USGS, NARA), research libraries, national & international libraries, archives and data centers
- Comprehensive information resources about preservation and archiving, e.g., CIESIN’s Geospatial Clearinghouse, US Library of Congress, etc., and DuraSpace
- DataONE: an NSF-funded consortium focused on preservation of and access to multi-scale, multi-discipline, and multi-national science data
- Data Conservancy: an NSF-funded consortium focused on scientific data curation as a means to collect, organize, validate and preserve data
References and Resources
- NASA Earth Science Data Preservation Content Specification (Nov 2011)
- NASA, 2011: Metadata Requirements – Base Reference for NASA Earth Science Data Products (Nov 2011), Requirements_V1_ _0.pdf
- Preliminary Principles and Guidelines for Archiving Environmental and Geospatial Data at NOAA: Interim Report
- Archiving Strategy for USGS EROS Center & Our Future Direction, March 29, 2010
- Example disciplinary requirements: NOAA Workshop on Climate Data Records from Satellite Passive Microwave Sounders, Report.pdf
- NOAA NCDC Climate Data Record (CDR) Maturity Matrix
- ESIP Data Stewardship & Preservation Cluster wiki
Other Relevant Modules
- The case for data stewardship
- Managing your data
- Creating documentation and metadata
- Working with your archive organization