Data Formats: Using Self-describing Data Formats
Curt Tilmes, NASA
Version 1.0, February 2013
Section: Local Data Management
Copyright 2013 Curt Tilmes.



Local Data Management - Data Formats: Using Self-describing Data Formats; Version 1.0, February 2013

Overview

Self-describing data formats have become a well-accepted way of archiving and disseminating scientific data.

Background

Before self-describing data formats became widely used, projects often invented their own data formats, frequently raw binary or even ASCII. These approaches had a number of problems:

- Byte ordering and floating-point representation were machine dependent.
- A ‘key’ was required to open the file and read the right data.
- A new custom reader was needed for each different data organization.
- Working in a new language could be very difficult, since the reader had to be redeveloped from scratch.
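The byte-ordering problem and the need for an external ‘key’ can be illustrated with a minimal Python sketch. The record layout (two 32-bit floats and a 16-bit integer) and the field names are hypothetical, chosen only for illustration; they do not come from any real project.

```python
import struct

# Hypothetical record: temperature (float32), pressure (float32), quality flag (int16).
# The writer uses big-endian byte order; nothing in the bytes themselves says so.
raw = struct.pack(">ffh", 273.15, 1013.25, 1)

# A reader that knows the 'key' (byte order, types, field order) recovers the values.
temperature, pressure, flag = struct.unpack(">ffh", raw)

# A reader that assumes little-endian gets wrong numbers, and no error is raised.
wrong_temperature, wrong_pressure, wrong_flag = struct.unpack("<ffh", raw)

print(round(temperature, 2), round(pressure, 2), flag)
print(wrong_temperature, wrong_pressure, wrong_flag)
```

The second read succeeds silently, which is the real danger: without a description travelling with the data, a wrong guess about the layout produces plausible-looking nonsense rather than a failure.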

Self-describing data formats

Information describing the contents of the file is embedded within the data file itself:

- Names for the various fields
- Data types: standardized, portable, machine independent
- Pointers to the various fields, making it efficient to extract the particular fields you want without reading the entire file
- Attributes and flags related to the primary fields, carrying extra information such as units, fill values, etc.

These formats include a standard API and portable data access libraries in a variety of languages. There are tools that can open and work with arbitrary files, using the embedded descriptions to interpret the data.
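To make the idea concrete, here is a toy sketch of a self-describing container in Python. It is not HDF or NetCDF, and the layout (a length-prefixed JSON header followed by a binary payload) and the function names `write_file` and `read_field` are assumptions made purely for illustration. It shows the ingredients listed above: field names, explicit portable types, offsets that act as pointers, and attributes such as units.

```python
import json
import struct

def write_file(fields):
    """Pack fields into bytes: 4-byte header length, JSON header, binary payload."""
    header, payload = [], b""
    for name, (values, attrs) in fields.items():
        fmt = f"<{len(values)}d"  # explicit little-endian float64, machine independent
        header.append({"name": name, "struct_format": fmt,
                       "offset": len(payload), "attrs": attrs})
        payload += struct.pack(fmt, *values)
    head = json.dumps(header).encode("utf-8")
    return struct.pack("<I", len(head)) + head + payload

def read_field(blob, wanted):
    """Read one field by name, using only the description embedded in the file."""
    (head_len,) = struct.unpack_from("<I", blob, 0)
    header = json.loads(blob[4:4 + head_len].decode("utf-8"))
    for entry in header:
        if entry["name"] == wanted:
            # The stored offset lets us jump straight to this field's bytes.
            start = 4 + head_len + entry["offset"]
            values = struct.unpack_from(entry["struct_format"], blob, start)
            return list(values), entry["attrs"]
    raise KeyError(wanted)

blob = write_file({
    "temperature": ([271.3, 272.8], {"units": "K"}),
    "pressure": ([1013.2, 1009.7], {"units": "hPa"}),
})
values, attrs = read_field(blob, "pressure")
print(values, attrs)
```

The reader needs no outside knowledge of the layout: it finds the field by name, reads its type and position from the header, and gets the units along with the numbers. Real formats add much more (hierarchy, chunking, compression), but the principle is the same.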

Some example formats

- HDF – Hierarchical Data Format
  - HDF4 and HDF5 versions are in use today.
  - A NASA variant called HDF-EOS is used within the Earth Observing System program.
- NetCDF – Network Common Data Form
  - Widely used by agencies including NASA and NOAA.
  - The Climate and Forecast (CF) metadata conventions help standardize how metadata is encoded in NetCDF files.

Best practices

Choosing a self-describing format is a good first step, but it isn’t a panacea. You still have to decide how to encode your data into the format. Think carefully about how you use the format:

- Layout of data within the file
- Unambiguous names for fields; use standard names if possible
- Units
- Fill values

Keep the users/readers of your files in mind. Some formats support seamless internal compression that can help with file sizes.
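A short Python sketch shows why units and fill values matter to the people reading your files. The `field` dictionary is a hypothetical stand-in for what a format library might hand back (values plus attached attributes), and `valid_mean` is an illustrative helper, not part of any library. The attribute names `units`, `_FillValue`, and `standard_name` follow the NetCDF/CF conventions.

```python
import math

# Hypothetical field as a reader might receive it: values plus attached attributes.
field = {
    "name": "air_temperature",
    "values": [271.3, -9999.0, 272.8, 273.1, -9999.0],
    "attrs": {"units": "K", "_FillValue": -9999.0,
              "standard_name": "air_temperature"},
}

def valid_mean(field):
    """Mean of the values, skipping the fill value declared in the attributes."""
    fill = field["attrs"].get("_FillValue")
    valid = [v for v in field["values"] if v != fill]
    return math.fsum(valid) / len(valid) if valid else math.nan

# Ignoring the fill value silently corrupts the statistic.
naive = math.fsum(field["values"]) / len(field["values"])
print(f"naive mean: {naive:.2f} {field['attrs']['units']}")
print(f"valid mean: {valid_mean(field):.2f} {field['attrs']['units']}")
```

Because the fill value and units are recorded next to the data, a reader can mask missing samples and label results correctly without consulting separate documentation.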

Case Study: Format abuse (1 of 3)

A project had to distribute NORAD Two-Line Element (TLE) sets. This is a small amount of data in a well-defined ASCII format that is common and widely used.

ASCII isn’t the best format, but for a small amount of data like this, especially in a widely used and understood format, it would have been fine. People understand the TLE format and have standard ways to parse it. Nevertheless, it isn’t self-describing, and people unfamiliar with TLE wouldn’t have a clue what those numbers mean.

They chose to encode the data into HDF.

Case Study: Format abuse (2 of 3)

A straightforward encoding would be to parse the fields, create fields with the right types (floating point), and name them according to their actual content from the TLE spec.

They chose instead to maintain the ASCII text, encoding the individual characters of the file in their raw numerical form as an array of bytes. To read this data from the HDF file, you first have to extract the ASCII bytes, then parse them yourself according to the TLE spec.

Rather than attaching metadata to the data fields, they created a separate empty dataset just to hold the metadata.

This is just bizarre. Don’t do it like that.
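For contrast, the first step of the straightforward encoding, parsing the text into named and typed fields, might look like the following Python sketch. The sample line is illustrative only and is not the project's data; the dictionary keys are hypothetical labels rather than names taken from the TLE spec; the slice positions assume the standard fixed-column layout of TLE line 2.

```python
# Sample line 2 of an element set, laid out in the standard TLE columns.
LINE2 = "2 25544  51.6416 247.4627 0006703 130.5360 325.0288 15.72125391563537"

def parse_tle_line2(line):
    """Split TLE line 2 into named, typed fields (fixed-column layout assumed)."""
    return {
        "satellite_number": int(line[2:7]),
        "inclination_deg": float(line[8:16]),
        "raan_deg": float(line[17:25]),
        "eccentricity": float("0." + line[26:33]),  # decimal point is implied
        "arg_of_perigee_deg": float(line[34:42]),
        "mean_anomaly_deg": float(line[43:51]),
        "mean_motion_rev_per_day": float(line[52:63]),
        "rev_number_at_epoch": int(line[63:68]),
    }

fields = parse_tle_line2(LINE2)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Each entry could then be stored as its own typed field in the self-describing file, with units attached as attributes, so that a reader never needs to extract bytes and re-parse them against the TLE spec.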

Case Study: Format abuse (3 of 3)

Resources

- HDF:
- HDF-EOS:
- NetCDF:
- CF:

Other Relevant Modules

Local Data Management – Data Formats: Choosing and Adopting Community Accepted Standards

Learn more about how you can facilitate the use of your data for your own project and for re-use by others by using community accepted standards for data formats.

Recommended Citations

Copyright 2013 Curt Tilmes.

Tilmes, C. “Local Data Management – Data Formats: Using Self-describing Data Formats.” In Data Management for Scientists Short Course, edited by Ruth Duerr and Nancy J. Hoebelheinrich. Federation of Earth Science Information Partners: ESIP Commons. doi: /P3ZW1HVH