Data Management 101 for Earth Scientists Data Management Plans

Slides:



Advertisements
Similar presentations
Depositing Data for Archiving Libby Bishop ESDS Qualidata, University of Essex Changing Families, Changing Food Meeting University of Sheffield 15 March.
Advertisements

Peter Griffith and Megan McGroddy 4 th NACP All Investigators Meeting February 3, 2013 Expectations and Opportunities for NACP Investigators to Share and.
OVERVIEW & LIBRARY SUPPORT FOR DATA MANAGEMENT/SHARING Jim Van Loon, MSME/MLIS Science Librarian.
OPEN ACCESS PUBLICATION ISSUES FOR NSF OPP Advisory Committee May 30, /24/111 |
Data Management Plans PAUL H. BERN, PH.D. APRIL 3, 2014.
Data Management What? Why? How?. 2 What do we mean by … Managing your Research (aka Data) … Ensuring physical integrity of files and helping to preserve.
NSF Data Management Plan Requirements Alex Kanous
Open Exeter Project Team
INTRODUCTION TO RESEARCH DATA MANAGEMENT Robin Desmeules Janice Kung J W Scott Health Sciences Library University of Alberta Libraries.
Elements of a Data Management Plan Alison Boyer Environmental Sciences Division Oak Ridge National Laboratory.
Elements of a Data Management Plan
GRAD 521, Research Data Management Winter 2014 – Lecture 2 Amanda L. Whitmire, Asst. Professor.
DMPTool Expert Resources and Support for Data Management Planning Tao Zhang Michael Witt Purdue University Libraries 1.
Africa RISING West Africa Mega Site M&E Activities Summary Africa RISING Project Steering Committee Meeting February 4, 2014; Bamako, Mali Beliyou Haile,
U.S. Department of the Interior U.S. Geological Survey Tutorials on Data Management Lesson 2: Data Management Planning CC image by Joe Hall on Flickr.
Data Management Plans Bill Michener University Libraries and Biology Dept. University of New Mexico.
U.S. Department of the Interior U.S. Geological Survey Planning for Data Management Creating data management plans for your project.
ACCESS for VALIDITY ACCESS for INNOVATION. Starting January 2011 for NEW proposals Not voluntary – “integral part” of proposal and FastLane Required for.
Elements of a Data Management Plan Bill Michener University Libraries University of New Mexico Data Management Practices for.
LTER Information Management Training Materials LTER Information Managers Committee Data Management Planning (adapted from DataONE training materials)
UVa Library Research Data Services
Data Management Planning Lesson 2: Data Management Planning CC image by Joe Hall on Flickr.
CC&E Best Data Management Practices, April 19, 2015 Please take the Workshop Survey 1.
Data Archiving and Networked Services DANS is an institute of KNAW en NWO Data Archiving and Networked Services Introduction to Data Management Planning.
Data Management Practices for Early Career Scientists: Closing Robert Cook Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN.
Data Management 101 for Earth Scientists Data Management Plans Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.
Agency Requirements: NSF Data Management Plans Ruth Duerr National Snow and Ice Data Center Version 1.0 October 2012 Section: The Case for Data Stewardship.
University Libraries/ITS Content Stewardship Program Mairéad Martin, Sr. Director, ITS Digital Library Technologies Presentation to FACAC March 1, 2011.
Changing Implementation of NSF Data Policy Dr. Jennifer M. Schopf, NSF OD/OIA/EPSCoR On behalf of the NSF Data Working Group March 17, 2011 CASC Spring.
U.S. Department of the Interior U.S. Geological Survey CDI Webinar Series 2013 Data Management at the National Climate Change and Wildlife Science Center.
Choosing Between Data Sharing Repositories for Engineering Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
Data Management Plans Module 2. Data Management Plans Data Management Plans  What is a data management plan (DMP)?  Why prepare a DMP?  Components.
Data Management & the Library. FACT #1 Research is increasingly digital and produces digital data.
Elements of a Data Management Plan Bill Michener University of New Mexico
Primer on Data Management Data Management Plans Robert Cook Environmental Sciences Division Oak Ridge National Laboratory American Meteorological Society.
Elements of a Data Management Plan Ruth Duerr National Snow and Ice Data Center Version 1.0 February 2013 Data Management Plans Copyright 2013 Ruth Duerr.
DOE Data Management Plan Requirements
Data Management Lesley A. Brown Director of Proposal Development.
11 Researcher practice in data management Margaret Henty.
Federal Funder open data and literature requirements January 15, 2016 RAWG Meeting.
Copyright and Data Matthew Mayernik National Center for Atmospheric Research Section: Responsible Data Use Version 1.0 October 2012 Copyright 2012 Matthew.
Data Management Plans PAUL H. BERN, PH.D. APRIL 3, 2014.
SEDAC Long-Term Archive Development Robert R. Downs Socioeconomic Data and Applications Center Center for International Earth Science Information Network.
Data Management Practices for Early Career Scientists: Closing Robert Cook Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN.
Digital Stewardship Lee Dotson Digital Initiatives Librarian University of Central Florida John C. Hitt Library Presentation available at
Using the DMPTool for data management plans Kathleen Fear February 27, 2014.
Training Course on Data Management for Information Professionals and In-Depth Digitization Practicum September 2011, Oostende, Belgium Concepts.
Writing a successful data management plan Kathleen Fear October 17, 2013.
Why ANDS? 16 May, 2011 Mathew Wyatt. Trends towards open data  Data science  Gov 2.0  Research 2.0  Open Science  Freedom of Information.
Research Data Management in the Humanities: an Introduction to the Basics Open Exeter Project Team.
ICPSR Data Fair November 8, 2010 Katherine McNeill, MIT Libraries
NRF Open Access Statement
Jeff Moon Data Librarian &
Lesson Topics What is a data management plan (DMP)? Why prepare a DMP?
Open Exeter Project Team
Data Management What? Why? How?.
Data Management 101 for Earth Scientists Data Management Plans
SowiDataNet - A User-Driven Repository for Data Sharing and Centralizing Research Data from the Social and Economic Sciences in Germany Monika Linne, 30.
W. Christopher Lenhardt
CFI John R Evans Leaders Fund Digital Data Management
General Finnish DMP Guidance
Getting Started with Data Management
Horizon 2020: Open data pilots and lessons learnt
e-Thesis Submission: What You Need to Know About Going Global
Introduction to Research Data Management
The Case for Data Management: Agency Requirements
Research Data Management
A Case Study for Synergistically Implementing the Management of Open Data Robert R. Downs NASA Socioeconomic Data and Applications.
Getting Started with Data Management & DMPTool
Fundamental Science Practices (FSP) of the U.S. Geological Survey
Presentation transcript:

Data Management 101 for Earth Scientists Data Management Plans Modules should be 3-7 minutes long Robert Cook Environmental Sciences Division Oak Ridge National Laboratory

Data Management Planning Topics What is a data management plan (DMP)? Why prepare a DMP? Components of a DMP Example of a NSF DMP Resources 2

“A goal without a plan is just a wish. ” “A goal without a plan is just a wish.” Antoine de Saint-Exupery (1900 – 1944) What is a DMP? A document that describes what you will do with your data during and after your research A DMP is a formal document that outlines what you will do with your data during and after your research. This ensures that your data will be safe now and in the future. 3

Value of a Data Management Plan Data Management 101 for Earth Scientists Value of a Data Management Plan National funding agencies have data sharing polices and requirements Scientific journals (Nature, Science, and PLoS) have sharing requirements. A shared, common data set may help researchers collaborate and accelerate discoveries (NY Times, 2010). For the researcher: helps organize data cultivate quality and efficiency help with preserving and sharing data 4 Thanks to Jeffery Loo, Chemical Informatics Librarian, UCB

From Grant Proposal Guidelines: NSF DMP Requirements From Grant Proposal Guidelines: Plans for data management and sharing of the products of research. Proposals must include a supplementary document of no more than two pages labeled “Data Management Plan”. This supplement should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results (in AAG), and may include: the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project  the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)  policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements  policies and provisions for re-use, re-distribution, and the production of derivatives  plans for archiving data, samples, and other research products, and for preservation of access to them As of January 2011 NSF is now requiring 2 page data management plan to be included in all submitted proposals. This is in addition to the 15 pages for the proposal. AAG = award and administration guide; see next slide 5

1. Information About Data & Data Format 1.1 Description of data to be produced Experimental, Observational, Raw or derived, Physical collections, Models, Images, etc. 1.2 How data will be acquired? When? Where? Methods? 1.3 How data will be processed? Software used, Algorithms, Workflows 1.4 File formats csv, tab-delimited, naming conventions 1.5 Quality assurance & quality control 1.6 Existing data If existing data are used, what are its origins? Will your data be combined with existing data? What is the relationship between your data and existing data? 1.7 How data will be managed in short-term Version control, Backing up, Security & protection, Who will be responsible? There are many different types of data that may be produced by a research study. Consider this list when describing data that will be produced by the project 6

2. Metadata Content & Format Metadata is the documentation describing all aspects of the data (e.g., who, why, what, when, where) 2.1 What metadata are needed Any details that make data understandable and usable 2.2 How metadata will be created and/or captured Lab notebooks? GPS units? Auto-saved on instrument? Manually entered? 2.3 What format will be used for the metadata Standards for community (EML, ISO 19115, etc.) Justification for format chosen Metadata Content and format Metadata provides important information about the dataset. WHO created the data? WHAT is the content of the data? WHEN were the data created? WHERE is it geographically? HOW were the data developed? WHY were the data developed? -Metadata records and captures critical information about a dataset --Metadata includes descriptions of temporal and spatial details, instruments, etc. -Metadata is the documentation and reporting of data   7

3. Policies for Access, Sharing, & Reuse 3.1 Obligations for sharing Funding agency, institution 3.2 Details of data sharing How long? When? How access can be gained? 3.3 Ethical/privacy issues with data sharing 3.4 Intellectual property & copyright issues Institutional policies Funding agency policies Embargos for political/commercial reasons 3.5 Intended future uses/users for data 3.6 Citation How should data be cited when used? Persistent citation? Obligations for sharing: Describe any requirements you might have for sharing as a result of your funding, your institution, your professional society, etc. Consider any legal obligations for sharing data. Check with your institution. They often have information available for researchers regarding sharing obligations within and beyond the institution. Details of sharing: How long will the data be available? When will that access be granted? Who will have access to those data? How will they gain access? Will the data collector/creator/PI have exclusive rights to data for a certain period of time? Ethical/privacy: - Aare data about humans? If so, address human subjects research protocols. Other example of ethical/privacy concerns include locations of sensitive habitats or organisms. Informed consent issues How privacy will be protected along with participant confidentiality arrangements. 8

4. Long-term Storage & Data Management 4.1 What data will be preserved 4.2 Where will it be preserved Most appropriate archive for data Community standards 4.3 Data transformations/formats needed Consider archive policies 4.4 Who will be responsible Contact person for archive Not all data should be preserved Focus on raw data Data derivatives can also be preserved but should be linked to raw data with workflows and/or in metadata. Where will it be preserved? What archives are commonly used in your community? Archives last longer than lab/personal websites. Check to see if institution has repository available to researchers. Are there any organizations or societies that might house your data? Include succession plans if the expected place for archiving goes out of existence. Data transformations/formats. Contact the data repository you will use early in the project to be sure you have data in correct format. This will save time later. Be sure there is a person assigned responsibility for maintaining contact information for the archive. This is especially important if restrictions on data use require that they contact the data collectors before reusing data. 9

Example Data Management Plan Mauna Loa CO2 Record Example, based on the work of CD Keeling & Colleagues Study the controls on the concentration of atmospheric CO2 high precision and accuracy measurements. http://www.esrl.noaa.gov/gmd/obop/mlo/pictures/sunsetmaunaloa1.jpg http://www.esrl.noaa.gov/gmd/obop/mlo/pictures/scripts.jpg Photos courtesy of NOAA GMD https://www.dataone.org/sites/all/documents/DMP_MaunaLoa_Formatted.pdf 10

Collected continuously at five towers Mauna Loa Example Data Management Plan 1. Information About Data & Data Format Collected continuously at five towers a central tower and four towers located at compass quadrants. Raw data files contain continuously measured CO2 concentrations, calibration standards, references standards, daily check standards, and blanks. Site conditions will also be noted and retained. Final data product will consist of 5-minute, 15-minute, hourly, daily, and monthly average atmospheric concentration of CO2, in mole fraction in water-vapor-free air measured Data in comma-separated-values in plain ASCII format, http://www.esrl.noaa.gov/gmd/obop/mlo/pictures/sunsetmaunaloa1.jpg http://www.esrl.noaa.gov/gmd/obop/mlo/pictures/scripts.jpg http://www.esrl.noaa.gov/gmd/obop/mlo/pictures/sunsensor.jpg Photos courtesy of NOAA GMD 11

Mauna Loa Example Data Management Plan 2. Metadata Content & Format contextual information about the data in a text based document ISO 19115 standard metadata in an xml file. Metadata formats provide a full explanation of the data (text format) and ensure compatibility with international standards (xml format). 12

No period of exclusive use by the data collectors. Mauna Loa Example Data Management Plan 3. Policies for Access, Sharing, & Reuse The final data product released when the recalibration of standard gasses has been completed and the data have been prepared (~six months). No period of exclusive use by the data collectors. Users can access documentation and final monthly CO2 data files via the Scripps CO2 Program website (http://scrippsco2.ucsd.edu ). Raw data will be maintained and made available on request at no charge 13

Data product citation, including DOI Mauna Loa Example Data Management Plan 4. Long-term Storage & Data Management High quality final data product will be available for use by the research and policy communities in perpetuity. Raw supporting data will be available for use by researchers to confirm the quality of the Mauna Loa Record. Long-term stewardship and curation at the Carbon Dioxide Information and Analysis Center (CDIAC), Oak Ridge National Laboratory. Standardized metadata record will be added to the metadata record database at CDIAC, Data product citation, including DOI Keeling, CD, 2004. Atmospheric CO2 Concentrations - Mauna Loa Observatory, Hawaii, 1958-2003. Numeric Data Package. Available on-line [http://cdiac.ornl.gov] Carbon Dioxide Information Analysis Center (CDIAC), Oak Ridge National Laboratory, Oak Ridge, TN, USA. doi: 10.3334/CDIAC/atg.ndp001 Keeling, RF, SC Piper, AF Bollenbacher and JS Walker, 2009. Atmospheric CO2 records from sites in the SIO air sampling network. In Trends: A Compendium of Data on Global Change. Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, Oak Ridge,TN, USA. doi: 10.3334/CDIAC/atg.035 including DOI, that indicates the version of the Mauna Loa Data Product and how to obtain a copy of that product. Data available at CDIAC so that interested users can discover the Mauna Loa CO2 record along with other related Earth science data. 14

Data Management Planning Tool As one example, a consortium that is developing a Data Management Planning Online Tool. The tool “walks” scientists through the process of developing a concise, but comprehensive data management plan that could enable good stewardship of data and meet requirements of sponsors and home institutions. Contains a number of resource documents including other example data management plans and other information on NSF’s requirements for a DMP 15

References and Resources For a detailed questionnaire with most of the issues you may need to address in an NSF data management plan see: http://dmp.data.jhu.edu/sites/default/files/Questionnaire.doc MIT Libraries have a fairly comprehensive site about data management and publishing in general at http://libraries.mit.edu/guides/subjects/data- management/index.html The Digital Curation Center in the UK has a variety of information on data management plans at http://www.dcc.ac.uk/resources/data-management-plans 16