The Data Management Plan (DMP) and your NSF proposal
Raleigh L. Martin, AAAS S&T Policy Fellow, NSF Geosciences
August 2, 2019, NSF Early Career Workshop
This presentation does not reflect an official position of AAAS or of the U.S. National Science Foundation.
Introduction: What is a Data Management Plan (DMP)?
- A plan for generating, sharing, and archiving project data
- A required 2-page supplementary document for all NSF proposals (a collaborative proposal shares a single DMP)
- The NSF PAPPG (Proposal & Award Policies & Procedures Guide) gives general policy; GEO Divisions and Programs define the specifics
Sample template for DMP:
  Data collection overview
  Dataset 1: data type; formats & standards; access & sharing; policies for reuse; archiving & preservation
  Dataset 2
  Dataset 3
  ...
Presentation Overview
I. Preparing the DMP for your proposal
II. Executing the DMP for your awarded project
III. Additional considerations for DMPs
I. Preparing the DMP for your proposal
- Outline your DMP – What datasets will be generated? Where and how will they be managed?
- Plan for each data object – Data standards, access timelines, preservation
- Integrate the DMP with your broader proposal – Feasibility of management, budget, etc.
A. Outline your DMP
- List the data to be generated – e.g., raw data, processed data, software, physical samples, model output, curricula, ...
- Determine the value of each data object – What is required to support publications? What is valuable for long-term reuse? What can be discarded?
- Identify resources for managing data – e.g., university server for the short term, disciplinary data repository for long-term preservation of specialized data, general-purpose repository for other data
Example data collection overview:
  Dataset 1: Raw sensor records – only needed for short-term analysis; store on university server
  Dataset 2: Analysis software – necessary for reproducibility; store in general-purpose repository
  Dataset 3: Processed dataset – valuable for long-term reuse; store on disciplinary data repository
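The outline step can be sketched as a small inventory in Python. This is only an illustration, not an NSF-prescribed format: the `DataObject` class, its field names, and the `long_term` flag are hypothetical, and the flag values are an assumption read from the example overview (software and processed data kept long term, raw records not).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """One entry in a DMP outline (hypothetical structure)."""
    name: str        # what the data object is
    value: str       # why it matters
    storage: str     # where it will be managed
    long_term: bool  # assumption: True if it should outlive the project


# Entries taken from the example data collection overview.
INVENTORY = [
    DataObject("Raw sensor records", "Only needed for short-term analysis",
               "University server", False),
    DataObject("Analysis software", "Necessary for reproducibility",
               "General-purpose repository", True),
    DataObject("Processed dataset", "Valuable for long-term reuse",
               "Disciplinary data repository", True),
]


def long_term_objects(inventory):
    """Return the names of objects flagged for long-term preservation."""
    return [obj.name for obj in inventory if obj.long_term]


def outline(inventory):
    """Render the inventory as one numbered line per data object."""
    return "\n".join(
        f"Dataset {i}: {obj.name} | {obj.value} | {obj.storage}"
        for i, obj in enumerate(inventory, start=1)
    )


if __name__ == "__main__":
    print(outline(INVENTORY))
    print("Preserve long term:", ", ".join(long_term_objects(INVENTORY)))
```

Writing the inventory down this way makes it easy to see which objects need a repository and which can stay on local storage.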
What “data” should be shared?
“Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants.”
- The policy does NOT require that all data be shared forever
- Exceptions/accommodations exist for “privileged or confidential information”
- Specific division data policies and/or program solicitations provide further details
Source: NSF 19-1 Proposal & Award Policies & Procedures Guide (Jan 2019), Chapter XI.D.4.b
Where can the data go?
- “Domain” repository (e.g., Arctic Data Center, Biological and Chemical Oceanography Data Management Office, HydroShare) – This is ideal if your data fit within the discipline of the repository, and it may be required for certain programs.
- Institutional repository (e.g., university, museum) – This may be required by your university or research institution.
- General repository (e.g., Figshare, Zenodo, Dryad) – A good backup option.
- Journal article:
  - Tables and figures – good for very high-level information, but not machine readable
  - Article supplements – not advised: very difficult to access, often behind a paywall
NSF/GEO Division-specific requirements
AGS: Atmospheric and Geospace Sciences (2018)
  Maximum allowable period for data release: 2 years after project completion
  “Data” definition: Primary data, samples, physical collections and other supporting materials
  Data repositories: None specified
EAR: Earth Sciences (2018)
  Maximum allowable period for data release: 2 years after data collection
  “Data” definition: Observational datasets, derived data products, software, physical collections
  Data repositories: EAR-wide suggested list
OCE: Ocean Sciences (NSF 17-037)
  “Data” definition: Metadata files, full data sets, derived data products, software, physical collections
  Data repositories: OCE-wide preferred list; program-specific guidelines
OPP: Office of Polar Programs (NSF 16-055)
  Maximum allowable period for data release: Default – earlier of 2 years or project end; AON – immediately; ASSP – 5 years
  “Data” definition: Not specified
  Data repositories: Arctic – Arctic Data Center (metadata), AON, ASSP; Antarctic – USAP Data Coordination Center
Source: Directorate for Geosciences: Data Policies (https://www.nsf.gov/geo/geo-data-policies/)
Note that many GEO programs specify additional data requirements through their solicitations.
B. Plan for each data object in your DMP
NSF PAPPG (Section II.C.2.j) – suggested elements for a DMP:
- Type of data to be produced (e.g., raw data, analyzed data, model outputs, software, physical samples, curricula)
- Data and metadata standards to be used (e.g., file formats, disciplinary standards, coding language)
- Policy for access and sharing (e.g., repository selection, protocol for access and citation, timeline of availability)
- Policy for reuse and distribution (e.g., licenses, reuse limitations)
- Plan for archiving and preservation (e.g., forever or for a finite period?)
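One way to use the five suggested elements is as a completeness check on each draft dataset entry. The sketch below is a hypothetical helper, not an NSF tool: the key names in `SUGGESTED_ELEMENTS` and the function `missing_elements` are made up for illustration.

```python
# Hypothetical short keys for the five suggested DMP elements.
SUGGESTED_ELEMENTS = ("type", "standards", "access", "reuse", "preservation")


def missing_elements(entry):
    """Return the suggested elements that are absent or blank in a draft entry."""
    missing = []
    for key in SUGGESTED_ELEMENTS:
        value = entry.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


if __name__ == "__main__":
    # A deliberately incomplete draft: "access" is blank, "preservation" is absent.
    draft = {
        "type": "Processed sediment flux time series",
        "standards": "Comma-delimited text file (.csv)",
        "access": "",
        "reuse": "CC-BY license",
    }
    print("Still to write:", missing_elements(draft))
```

Running the check on every dataset before submission helps catch entries that describe the data but skip access, reuse, or preservation.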
Example: Processed dataset
Project: Integrating sensor data to understand wind-driven sediment transport
[Figure: sensors measuring wind and sediment flux]
Data collection overview: Dataset 1: Raw sensor records; Dataset 2: Analysis software; Dataset 3: Calibrated sediment flux time series
Dataset 3: Calibrated sediment flux time series
- Type of data: Processed sediment flux time series data, derived from raw sensor measurements
- Data standard: Comma-delimited text file (.csv)
- Data access: Freely available on Zenodo.org upon publication of the accompanying article or within 2 years of collection (whichever is sooner), to be assigned a DOI for citation
- Data reuse: CC-BY license (anyone may reuse with attribution)
- Data preservation: Long-term (>10 years), responsibility of Zenodo.org
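The access rule in this example ("upon publication or within 2 years of collection, whichever is sooner") is simple date arithmetic. A minimal sketch follows, assuming hypothetical function names (`add_years`, `release_deadline`) and made-up dates; the applicable window should always be confirmed against the relevant Division policy and solicitation.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Raised only for Feb 29 when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def release_deadline(collected: date,
                     published: Optional[date] = None,
                     max_years: int = 2) -> date:
    """Latest acceptable release date: the article's publication date or
    max_years after data collection, whichever comes first."""
    limit = add_years(collected, max_years)
    if published is None:
        return limit
    return min(published, limit)


if __name__ == "__main__":
    collected = date(2019, 8, 2)  # made-up collection date
    print(release_deadline(collected))                    # no article yet
    print(release_deadline(collected, date(2020, 5, 1)))  # article published first
```

Recording the computed date next to each dataset gives the project team a concrete target rather than an open-ended "within a reasonable time."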
C. Integrate the DMP with your broader proposal
- Results of Prior NSF Support – If you have previously been supported on an NSF award, products of your past data sharing should be listed in this section of the Project Description (see PAPPG II.C.2.iii.(e))
- Biographical Sketch (“Biosketch”) – “Products” listed on your biosketch can include data sets and other digital products (see PAPPG II.C.2.f.(c))
- Budget Justification – This should indicate appropriate allocation of time and resources for data management. NOTE: The budget may include data deposit fees (though most NSF-supported repositories do not charge such fees)
- Letters of Collaboration – Not usually needed, but could be helpful if the DMP indicates use of data facility resources beyond typical capacity
II. Executing the DMP for your awarded project
- Initiating your project – Coordinate with your project team and stakeholders
- Award reporting – Document data progress and final data publication
- Things to consider after award completion – Delayed data release, future proposals
A. Initiating data management for your project
- Coordinate with your project team – Make sure that project roles for data management are clearly allocated to fulfill the objectives of your DMP
- Connect to target data facilities – Early on, discuss steps for submitting datasets, including fulfillment of data standards. Sharing your DMP may help.
- Get help from your institution – Your library likely offers dedicated research data services to support you on your project.
Example submission page: https://arcticdata.io/submit/
B. Award reporting for your DMP
- Where to report – Report on data activities in the “Products-Websites” section of your annual and final project reports
- Annual Project Report – Provide a status update on progress toward the goals of your DMP, including datasets that have been made publicly available within the last year. If your plans have changed, please explain.
- Final Project Report – List datasets that have been made publicly available.
When listing data, provide a citation with a link back to the repository (e.g., via DOI); this will streamline compliance checking by your Program Director.
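A dataset citation with a resolvable DOI link can be assembled mechanically. The sketch below is an assumption-laden illustration: the citation layout is a generic author-year style, not an NSF-prescribed format, the function names are hypothetical, and the DOI shown is a placeholder rather than a real record.

```python
def doi_url(doi: str) -> str:
    """Normalize a DOI given as a bare DOI, 'doi:' form, or URL into an https link."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return f"https://doi.org/{doi}"


def format_data_citation(authors: str, year: int, title: str,
                         repository: str, doi: str) -> str:
    """Build a one-line dataset citation ending in a resolvable DOI link."""
    return f"{authors} ({year}). {title} [Data set]. {repository}. {doi_url(doi)}"


if __name__ == "__main__":
    # Placeholder author and DOI, for illustration only.
    print(format_data_citation(
        "Doe, J.", 2019, "Calibrated sediment flux time series",
        "Zenodo", "doi:10.5281/zenodo.0000000",
    ))
```

Using the full https form of the DOI means a reviewer can click straight through from the report to the repository landing page.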
C. Things to consider after award completion
Certain Division data policies allow a finite period of time between final data collection and subsequent data publication, which may extend beyond the award completion date. In such cases:
- Final Project Report – State plans for future data sharing. It may be possible to pre-populate the repository with a metadata entry before full data release.
- When the dataset is published – Inform your managing Program Director
- In subsequent NSF proposals – Refer to published data products in “Results of Prior NSF Support” and in your Biosketch
III. Additional considerations for DMPs
- Think of the audience – Be explicit and keep it succinct
- Keep it organized – I suggest describing each dataset to be generated. DMP template tools can help (e.g., DMPTool at https://dmptool.org/, ezDMP)
- Get help – Data management experts at your institution’s library or at relevant data facilities can be great resources
Questions?
NOTE: All statements here are my own and do not necessarily reflect official NSF policy. Talk to your Program Director and examine the PAPPG (Proposal & Award Policies & Procedures Guide), Division data policies, and program solicitations for definitive statements on NSF policy.