Data Formats: Choosing and Adopting Community Accepted Standards

Presentation transcript:

Data Formats: Choosing and Adopting Community Accepted Standards Section: Local Data Management Curt Tilmes NASA Version 1.0 February 2013 Copyright 2013 Curt Tilmes Slide 1: Introduction This training module is part of the Federation of Earth Science Information Partners (ESIP Federation) Data Management for Scientists Short Course, Section: Local Data Management – Data Formats. The subject of this module is "Choosing and Adopting Community Accepted Standards". The module was authored by Curt Tilmes from the National Aeronautics and Space Administration (NASA). Besides the ESIP Federation, sponsors of this Data Management for Scientists Short Course are the Data Conservancy and the United States National Oceanic and Atmospheric Administration (NOAA).

Overview Some guidelines for choosing and adopting community accepted standards Slide 2: Overview In this module, we’re going to talk about formats for your data, and provide some guidelines for choosing and adopting community accepted standards for data formats.

Background Most projects (rightly so) focus on the content of their data files, but you need to consider the format as well. Since you captured or created the data and stored them in your own files, you know how the data are organized, how to read them, how to use them, and which characteristics of the data could constrain their use. The goal of a good data format is to make it easier for others to read the data too. Many hours have gone into developing standards for formats – try to learn from them. Slide 3: Background There are a number of reasons why you would want to adopt community accepted standards rather than develop your own for a new project. With most projects, the investigators are very concerned about the project’s data, and rightly so, for those data are the main focus and the project’s reason for being. Still, it is the technical format in which the data are represented that allows the data content to be conveyed to other people. As either a principal investigator or primary researcher, you’ve captured or created your own data and stored them in your own files. As a result, you know the characteristics of your data, how the data are organized, how to read them, and how to use them. Others who’d like to use your content would have an easier time taking and using your data if they are expressed in a standard data format. Many hours have gone into the development of standard formats that make data easier for others to read, use and understand. Using community developed standards allows you to leverage those efforts.

Why use community standards? If you try to develop your data format from scratch, you will forget something. Build on the experience and improvements built into the community standards over years of use. Tools and analysis software natively support reading community standard data. Reduce development effort and support reuse. Positive feedback – data in standard formats are more likely to be adopted by others. Slide 4: Why use community standards? (1 of 2) If you try to develop your own data format from scratch, you will always forget something important that is probably covered by a standard format. Standard formats are usually developed based on the roadblocks that a large number of people have encountered in representing their data in various formats. Using the standards allows you to build on that community-based experience and take advantage of the improvements that have been put into those standards. Another good reason for a standard format is that tools and third party analysis software often natively understand those formats. If you invent your own format, there is a much greater likelihood that existing tools and software will not support it. Using a community developed standard will allow your data to be natively supported by those third party tools out of the box. In addition, coming up with a good format is a real pain. You can save yourself a lot of time and effort by adopting a standard that already exists.

Why use community standards? Slide 5: Why use community standards? (2 of 2) Of course, you might ask why are there so many standards? While this is a very good question, this little cartoon from XKCD shows that even if you think that you can invent the one best way to do something, you are probably not going to come up with the one way that everyone will agree is so much better than all the others that they will drop the other standards and switch to yours. You are better off taking advantage of one that already exists. http://xkcd.com/927/

A few guidelines Consider your archive: Do they have any recommendations? Consider your users: Who wants this data? Why do they want it? What do they want to do with it? Will they be using your data in concert with other data? Consider heritage: What worked well for similar data in the past? What could be done better for newly created data? Consider tools: Try to use data formats supported by the software you intend to use them with. Slide 6: A few guidelines We’d like to offer you a few guidelines that should help you when choosing which of the standards to use and how you will use it. If you are planning to transfer your data to a long term archive, definitely check with archive staff for recommendations. They will be very familiar with their users and will know if there are certain data formats that are commonly used. Consider your user base. What type of person is going to get this data? With what other data will they already be familiar? What do they want to do with your data? Are they going to be using your data in concert with other content that already has a data format? If so, it might well benefit your project to represent your data in a similar data format so that it is more easily used with other data. Consider heritage. What has worked well in the past? In some cases, data creation may have been done poorly in the past and you know that you have a better way to do it in your own project. Still, it would behoove you to at least take a look at what has been done previously. Consider tools. Are there specific tools that visualize, analyze, or convert data like yours and that work with a particular data format? If so, your users may want to use your data with those tools, so the formats those tools support should be considered.
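The four guidelines above can be turned into a simple comparison of candidate formats. The sketch below is only one possible way to do that: the weights, the candidate list, and the yes/no answers are hypothetical placeholders that a project would replace with what its own archive, users, and tools actually say.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate format and how it fares against each guideline."""
    name: str
    archive_recommends: bool     # Consider your archive
    users_already_use: bool      # Consider your users
    used_for_similar_data: bool  # Consider heritage
    tools_support: bool          # Consider tools


# Hypothetical weights: how much each guideline counts for this project.
WEIGHTS = {
    "archive_recommends": 3,
    "users_already_use": 3,
    "used_for_similar_data": 2,
    "tools_support": 2,
}


def score(candidate: Candidate) -> int:
    """Add up the weights of every guideline the candidate satisfies."""
    return sum(w for field, w in WEIGHTS.items() if getattr(candidate, field))


def rank(candidates):
    """Return the candidates ordered from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)


# Hypothetical answers, for illustration only.
candidates = [
    Candidate("HDF5", True, False, True, True),
    Candidate("custom binary layout", False, False, False, False),
    Candidate("NetCDF with CF conventions", True, True, True, True),
]

ranked = rank(candidates)
for c in ranked:
    print(f"{score(c):2d}  {c.name}")
```

A table like this does not make the decision for you, but it gives the archive staff and users you consult something concrete to correct.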

Some examples HDF – Hierarchical Data Format HDF4 and HDF5 versions are in use today. A NASA variant called HDF-EOS is used within the Earth Observing System program. The Aura project developed a common approach across their instruments and released guidelines as a Technical Note. NetCDF – Network Common Data Form Widely used by agencies including NASA and NOAA. Climate and forecast (CF) metadata conventions help standardize how metadata are represented in NetCDF files. Slide 7: Some examples We’d like to illustrate these points with a couple of examples of formats from NASA, NOAA and the wider Earth science community that are converging to become the go-to formats for data. The first is HDF, the Hierarchical Data Format, which has several versions and variants in use today. The Aura project mentioned on this slide illustrates how this format has been used. Another example is the Network Common Data Form (NetCDF), which is widely used by many agencies including NASA and NOAA, and which is further supported by the climate and forecast (CF) metadata conventions.
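To give a feel for what a metadata convention buys you, here is a minimal sketch of a check for descriptive variable attributes. The attribute names `units`, `long_name`, and `standard_name` are defined by the CF conventions, but the check itself is a deliberate simplification and not the full CF rules, and the variables shown are hypothetical. Plain dictionaries stand in for a real NetCDF file so the sketch runs without extra libraries.

```python
# Attribute names come from the CF conventions; treating all three as
# expected on every variable is an illustrative simplification.
EXPECTED_VARIABLE_ATTRS = ("units", "long_name", "standard_name")


def missing_attributes(variables):
    """Map each variable name to the expected attributes it lacks.

    `variables` is a dict of {variable_name: {attribute_name: value}}.
    Variables that carry every expected attribute are left out of the result.
    """
    problems = {}
    for name, attrs in variables.items():
        missing = [a for a in EXPECTED_VARIABLE_ATTRS if a not in attrs]
        if missing:
            problems[name] = missing
    return problems


# Hypothetical variables, described the way a file's metadata might be.
sample = {
    "air_temp": {
        "units": "K",
        "long_name": "Air temperature",
        "standard_name": "air_temperature",
    },
    "t2": {"units": "K"},  # readable only by someone who knows what "t2" means
}

report = missing_attributes(sample)
print(report)
```

The second variable is the kind of thing a data creator understands and an outside user does not, which is exactly the gap the conventions are meant to close.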

Adopting standards The standard gives you a starting point, not a complete solution. Communicate early with a broad range of data users: archivists, software engineers, scientists. Consider how you will be writing the data and how you will be reading the data. Get feedback before making final decisions. Start sharing sample data in proposed format to nail down specifics and work out ambiguities. Document your use and application of the standard completely. Slide 8: Adopting standards Once you have considered a standard and chosen to use it, you should know that making this choice is just a starting point for deciding how to represent your data in that format. Even with a community based standard, you’ll find that there are good ways and poor ways of representing certain types of data with certain formats. It’s an excellent idea to communicate as early as possible with a broad range of the potential end users of your data. Think about who is going to use and archive your data. Will they be software engineers or scientists? Figure out how you will be writing the data and how your users will be reading the data so that you can come up with the best way to organize that data and optimize your software’s ability to deal with that format. Communicate with all the parties who have an interest in what the representation of your data will ultimately be rather than invent a format in isolation. We advise that you come up with a proposal for the format that you want to use and circulate it for feedback from your potential users. A plan of action is always better than just saying, “Well I have a bunch of data and I’m going to just start dumping it into this file” since the data dump could end up being your format by default. Chances are that this kind of data representation will not really be optimal for many uses.
Once the proposal has been vetted and accepted, it’s a good idea to share sample data in order to make sure that people understand the specific choices that you’ve made, so that you can work out any ambiguities. We also recommend that you very carefully document your choices and why you made them, so that someone else can understand not just that you chose a specific data format, but the manner in which you are implementing that format for your specific data and your specific project. The more documentation and user guides that you have, the better the chances are that your users will use your data in the way that you intend them to be used.
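One way to act on this advice is to keep the documentation of your choices next to the sample data and to check the two against each other before circulating them. The sketch below writes a small sample, reads it back, and reports any field that the documentation does not describe. JSON is used only as a stand-in so the sketch runs with the standard library alone; the field names, notes, and values are hypothetical, and a real project would do the same exercise with its chosen format and that format's own library.

```python
import json
import os
import tempfile

# Hypothetical notes documenting how the project applies its chosen format.
FORMAT_NOTES = {
    "format": "JSON (stand-in for the project's real format)",
    "fields": {
        "time": "ISO 8601 timestamp, UTC",
        "temperature": "air temperature in kelvin",
    },
}


def write_sample(path, records):
    """Write the records together with the notes that document them."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"notes": FORMAT_NOTES, "records": records}, f, indent=2)


def read_sample(path):
    """Read a sample file back, as a user of the data would."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def undocumented_fields(document):
    """List fields that appear in the records but not in the notes."""
    documented = set(document["notes"]["fields"])
    found = set()
    for record in document["records"]:
        found.update(record)
    return sorted(found - documented)


# Hypothetical sample records; the second carries a field nobody documented.
records = [
    {"time": "2013-02-01T00:00:00Z", "temperature": 271.4},
    {"time": "2013-02-01T01:00:00Z", "temperature": 271.9, "qc": 1},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    write_sample(path, records)
    loaded = read_sample(path)

round_trip_ok = loaded["records"] == records  # what was written is what is read
extra = undocumented_fields(loaded)           # ambiguities to resolve first
print(round_trip_ok, extra)
```

Exercising both the writing path and the reading path on a small sample tends to surface ambiguities, such as the undocumented field here, while they are still cheap to fix.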

Resources
HDF: http://www.hdfgroup.org
HDF-EOS: http://hdfeos.org
HDF-EOS Aura File Format Guidelines: http://disc.sci.gsfc.nasa.gov/Aura/additional/documentation/HDFEOS_Aura_File_Format_Guidelines.pdf
http://www.esdswg.org/spg/spgfolder/events/esdswg-meeting-october-25-27-2005/auraasabestpracticerev2.pdf
NetCDF: http://www.unidata.ucar.edu/software/netcdf
CF: http://cf-pcmdi.llnl.gov/
Slide 9: Resources On this slide, you will find a linked listing of some additional resources you might find helpful should you need more information about some of the data formats and guidelines for using them. Even if you don’t use these specific formats, it can be useful to review them because you will see some of the rationale for choosing a data format and why certain decisions were made as these formats were constructed.

Other Relevant Modules Local Data Management – Data Formats: Using Self-describing Data Formats Learn more about the advantages of using formats for your data that have important metadata and other information embedded within them. Slide 10: Other Relevant Modules The modules of the ESIP Data Management for Scientists Short Course have been designed to complement and supplement each other. In light of this plan, we think you may find the following module relevant to you as you seek to gain a better understanding of data formatting: Local Data Management – Data Formats: Using Self-describing Data Formats.

Recommended Citation Tilmes, C. 2013. “Local Data Management – Data Formats: Choosing and Adopting Community Accepted Standards.” In Data Management for Scientists Short Course, edited by Ruth Duerr and Nancy J. Hoebelheinrich, Federation of Earth Science Information Partners: ESIP Commons. doi:10.7269/P33N21B6 Slide 11: Recommended Citation This module is available under a Creative Commons Attribution 3.0 license that allows you to share and adapt the work as long as you cite the work according to the citation provided. Thank you very much for your interest in the ESIP Federation’s Data Management for Scientists Short Course. Copyright 2013 Curt Tilmes.