Copyright 2012 Matthew Mayernik.

Presentation transcript:

Section: The Case for Data Stewardship
Preserving the Scientific Record: Establishing Relationships with Archives
Matthew Mayernik, National Center for Atmospheric Research
Version 1.0, October 2012

Introduction: This is the Federation of Earth Science Information Partners Data Management for Scientists Short Course, Section: The Case for Data Stewardship - Preserving the Scientific Record, Module: Establishing Relationships with Archives.

Slide 1: Title
The Case for Data Stewardship - Preserving the Scientific Record: Establishing Relationships with Archives. This training module is part of the Federation of Earth Science Information Partners (ESIP Federation) Data Management for Scientists Short Course. The subject of this module is “Establishing Relationships with Archives”. The module was authored by Matthew Mayernik of the National Center for Atmospheric Research. Besides the ESIP Federation, the sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).

Overview
- About data archives
  - Data must be formally archived in order to be understandable and usable over long time periods
  - Many data archives exist for particular kinds of data, or for particular institutions
- Building relationships with archives:
  - What data do you have?
  - Where will your data go?
  - How to get set up with an archive?

Slide 2: Overview
In this presentation we are going to talk about archives: what they do and how they work. We'll talk about why data must be formally archived in order to remain understandable and usable over a long time period. There are many data archives for Earth Science data, so one or more of them should be particularly appropriate for your data.

We will also discuss a few steps in building a relationship with an archive, including working out what data you have, where your data will go, and how you should go about getting set up with a data archive.

Image: http://www.flickr.com/photos/dullhunk/5436262905/ , http://www.sciencemag.org/content/331/6018.cover-expansion

About Data Archives
- Archives view data resources as having an extended temporal horizon
  - Past: Recovering legacy datasets
  - Current: Attending to ongoing data collection
  - Future: Designing for flexibility, compatibility, and sustainability of data
- Archives have extensive data description requirements
  - To enable routine and unforeseen data uses
  - To enable maintenance and preservation of data over time
- Archives continuously update technology
  - Small iterative updates more common than large-scale replacements
- Archives are highly collaborative
  - Work with scientists to ingest and document data
  - Work with other archives to share capacity and knowledge

Slide 3: About Data Archives
We'd like to make some points about data archives that are important to keep in mind when thinking about what they do and why they do it.

Data archives are institutions that collect and curate data, and make them available over time. The temporal aspect of an archive's responsibility for data is very important, since archives look at data with a perspective that extends from the present to the long term.
- Data archives often collect historical data and may elect to reconstruct or recover those data as part of their management and maintenance activities.
- Archives work with investigators in the present to collect and manage data as they come in.
- Finally, archives have a very important role to play for the future, in that they want and need their systems and their data to be interoperable with future systems. As a result, archives, their technologies, and their data systems need to be designed to be flexible, compatible, and sustainable so they can provide a long-term environment for their data.

One way in which archives provide a long-term environment for their data is by having extensive data description requirements. Data descriptions are important to enable routine use, but also to allow unforeseen data uses to happen. Comprehensive documentation can enable people to make use of data in ways that may not have been envisioned when the data were initially collected. Documentation is also very important for the maintenance and preservation of data over time, because it indicates the formats in which data are represented, what the variables are, how particular data sets have been processed, and so on.

As previously mentioned, one of the goals of an archive is to have interoperable and sustainable data systems over time. To achieve this goal, archives more commonly update their technological systems in small, iterative steps that preserve continuity, rather than making large-scale replacements of particular systems.

A final, important note about archives is the built-in necessity for collaboration. Archive staff work with scientists to bring data in and to make sure that the data are well documented. They work with the staff of other archives to share knowledge and capacity, tips and best practices, and sometimes even the data themselves.

Building Relationships With Archives
- What data do you have?
- Where will the data go?
- How to get set up with an archive?

Slide 4: Building Relationships with Archives
When it's time to do something about managing and preserving your data, how do you go about building relationships with archives? Key questions to ask yourself are: What data do I have? Where will they go? How do I get set up with an archive? We'll discuss each of these questions in more detail in the next few slides.

Image: http://www.flickr.com/photos/kevinkrejci/6259499293/

What data do you have?
- What type of data will be produced?
  - Ex. numerical data, image data, text sequences, or modeling data?
- How much data, and at what growth rate?
  - Are you gathering data by hand or using instrumentation that is able to capture a lot of data at once?
  - Will there be more data as time goes on? If so, you will need to plan for growth.
- Will the data change frequently?
  - Keeping track of rapidly changing datasets can be a challenge, so it's imperative you begin with a plan that will carry you through the data management process.
From MIT Libraries, “Evaluate Your Data Needs,” http://libraries.mit.edu/guides/subjects/data-management/evaluate.html

Slide 5: What data do you have?
The first question to ask is: what data do you have? Answering it is an important part of setting up a data management plan, because it means analyzing what kind of data you're going to collect. For example, are you going to be collecting numerical data, image data, text sequences, or modeling data? Will you be creating or collecting any other kind of data?

How much data are you going to collect, or have you already collected? Is that collection growing? Are you collecting your data by hand, or are you using instrumentation that will allow you to capture a lot of data at the same time? The answer matters for documentation, since the use of instrumentation may require more documentation to be kept, especially if you are using complex or unique tools.

If your data will grow in volume, are you planning for the changes that increased data volume will bring? Growth in volume is one kind of change, but data sets can change in other ways too, for example through changes in instrument calibration, reprocessing, filtering, or subsetting. Keeping track of how data sets change can be difficult, but it is very important to do so, and to know that history before you submit your data to an archive.

Where will your data go?
- Who is your audience for the data? How will they use them?
  - This will tell you how to structure the data and where to distribute the data.
- How long should the data be retained?
  - Ex. 3-5 years, 10-20 years, permanently?
  - Not all data needs to be retained indefinitely.
  - Look at archives' data policies for retention options and requirements.
From MIT Libraries, “Evaluate Your Data Needs,” http://libraries.mit.edu/guides/subjects/data-management/evaluate.html

Slide 6: Where will your data go?
The next category of questions is: where will your data go? Who is the audience for your data? Do you want to reach other researchers, students, or policy makers? Who is your intended audience, and how do you expect that audience to use your data? The answers will affect how you structure your data and where you will want to distribute them. Different audiences have different expectations about where to find data and about how to process them; some ways of processing data are more technology intensive than others, for instance.

A second component of this question is: how long do you expect your data to be retained in the archive? For many data sets, the goal is to maintain them permanently, but realistically not all data sets need to be retained indefinitely. Do you expect your data to be useful for three to five years, for 10 to 20 years, or essentially indefinitely? Data archives differ in their data policies, including the retention options, service level agreements, and requirements they specify.

Where will your data go, cont.?
- Who controls the data (PI, student, lab, institution, funder)?
- Do any open data mandates exist?
- Before you spend a lot of time figuring out where to submit your data, you need to know whether you have the authority to submit the data at all, and whether your funder has any specific open data mandates or requirements that must be met.

Slide 7: Where will your data go, cont.?
A third key aspect to think about is the control or ownership of your data, and whether any open data mandates apply to them. These facts are often not specifically discussed or well understood. Ownership or control could belong to any one of a number of different people or organizations. A principal investigator might be the owner or controller of a data set. For a student project, the student could be the official owner. It could also be the lab, the sponsoring institution, or the funding organization. An archive will need to know who has the authority to officially release the data. The archive will also need to know whether your funder or any other involved party has mandates or specific requirements to make your data open. Knowing who is responsible for authorizing the release of your data, and whether the data need to be open, is important before you submit your data to an archive.
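The questions raised so far (data type, volume, audience, retention, control, and open data mandates) can be collected into a simple checklist to fill in before contacting an archive. The following is a minimal sketch only: the `SubmissionProfile` class, its field names, and the `unanswered` helper are hypothetical names chosen for illustration, not part of any archive's actual submission form.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SubmissionProfile:
    """Hypothetical checklist of questions to answer before contacting an archive."""
    data_types: Optional[str] = None         # e.g. "numerical data", "image data"
    expected_volume: Optional[str] = None    # e.g. "100 GB, growing 50% per year"
    audience: Optional[str] = None           # e.g. "researchers", "policy makers"
    retention_period: Optional[str] = None   # e.g. "3-5 years", "permanently"
    data_controller: Optional[str] = None    # e.g. "PI", "institution", "funder"
    open_data_mandate: Optional[str] = None  # e.g. "none", or the funder's requirement


def unanswered(profile: SubmissionProfile) -> List[str]:
    """Return the names of checklist questions that are still blank."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


if __name__ == "__main__":
    profile = SubmissionProfile(data_types="image data", audience="researchers")
    print("Still to answer:", ", ".join(unanswered(profile)))
```

Any blank entries point to topics worth settling, or raising with the archive, before you begin a submission.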

How to get set up with an archive
- Look at an archive's data submission policies/procedures. Example procedures are:
  - Incorporated Research Institutions for Seismology: http://www.iris.edu/data/#submit_data
  - National Snow and Ice Data Center: http://nsidc.org/data/submit.html
  - Oak Ridge National Lab DAAC for Biogeochemical Dynamics: http://daac.ornl.gov/PI/pi_info.shtml
  - National Space Science Data Center: http://nssdc.gsfc.nasa.gov/nssdc/submitting_data.html
  - NASA Planetary Data System: http://pds.jpl.nasa.gov/tools/archiving.shtml
- Contact the data archive. Most archives will work with you directly to submit data, as long as the data are within their scope.

Slide 8: How to get set up with an archive
How do you actually get set up with an archive? Policies differ from archive to archive here too, so it is important to look at each archive's data submission policies and procedures. This slide includes links to a number of important data archives in the geosciences and to their procedures for submitting data. Each of the archives listed goes into detail about its submission process and how it works.

We urge you to contact likely data archives. Most data archives are very willing to work with you to get your data submitted, as long as the data fit within their scope. As an example of scope, consider an archive dedicated to seismology data, such as the one run by the Incorporated Research Institutions for Seismology. If you have biological data, you will probably not want to submit them to that archive, as its staff may not fully understand or know how to deal with your data. Look at the scope of the archive, look at the institution's policies and procedures, and contact the data archive if you have questions.

References
- Hook, L., S.K. Santhana Vannan, T. Beaty, B. Wilson, and B. Cook. 2010. Best Practices for Preparing Environmental Data Sets to Share and Archive. Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee. http://dx.doi.org/10.3334/ORNLDAAC/BestPractices-2010
- Karasti, H., K. Baker, and E. Halkola. 2006. “Enriching the notion of data curation in e-science: data managing and information infrastructuring in the long term ecological research (LTER) network.” Computer Supported Cooperative Work 15(4): 321-358. http://dx.doi.org/10.1007/s10606-006-9023-2
- MIT Libraries. 2011. Data Management and Publishing. http://libraries.mit.edu/guides/subjects/data-management/index.html

Slide 9: References
Within this module, we've referred to a number of published information sources that you may want to review when you need more in-depth information. The most relevant references are listed here.

Resources
- MIT Libraries - “Evaluate Your Data Needs”: http://libraries.mit.edu/guides/subjects/data-management/evaluate.html
- Data submission policies/procedures:
  - Incorporated Research Institutions for Seismology: http://www.iris.edu/data/#submit_data
  - National Snow and Ice Data Center: http://nsidc.org/data/submit.html
  - Oak Ridge National Lab DAAC for Biogeochemical Dynamics: http://daac.ornl.gov/PI/pi_info.shtml
  - National Space Science Data Center: http://nssdc.gsfc.nasa.gov/nssdc/submitting_data.html
  - NASA Planetary Data System: http://pds.jpl.nasa.gov/tools/archiving.shtml

Slide 10: Resources
On this slide, you will find a linked listing of additional resources you might find helpful should you need more information about some of the areas we've covered briefly in this module.

Other Relevant Modules
- Local Data Management:
  - Managing Your Data
  - Working with Your Archive
- Preservation Strategies:
  - Options for archiving your data
  - What data goes into a long-term archive?
- The Case for Data Stewardship - Preserving the Scientific Record:
  - Preserving a Record of Environmental Change
  - Case Study 1 - NSIDC Glacier Photos
  - Case Study 2 - Arctic Temperature Variability Data

Slide 11: Other Relevant Modules
The modules of the ESIP Data Management for Scientists Short Course have been designed to complement and supplement each other. With that in mind, we think you might find the following related modules relevant as you gain a better understanding of data management. For more information about how to manage your data during the entire data creation and production process, see Local Data Management: Managing Your Data, and Local Data Management: Working with Your Archive.

If you want to know more about preservation strategies for your data, see the following modules: Preservation Strategies: Options for Archiving Your Data, and Preservation Strategies: What Data Goes into a Long-Term Archive? Finally, the modules within the Case for Data Stewardship - Preserving the Scientific Record section discuss the concepts touched upon in this module more fully.

Recommended Citation
Mayernik, M. 2012. “The Case for Data Stewardship - Preserving the Scientific Record: Establishing Relationships with Archives.” In Data Management for Scientists Short Course, edited by Ruth Duerr and Nancy J. Hoebelheinrich, Federation of Earth Science Information Partners: ESIP Commons. doi:10.7269/P3D21VHX

Slide 12: Recommended Citation
This module is available under a Creative Commons Attribution 3.0 license that allows you to share and adapt the work as long as you cite the work according to the citation provided.

Thank you very much for your interest in the ESIP Federation's Data Management for Scientists Short Course.