File Formats, Conventions, and Data Level Interoperability ESDSWG New Orleans, Oct 20, 2010 Joe Glassy, Chris Lynnes ESDSWG Tech Infusion.

Introduction & overview
Outline of objectives:
– Discuss the role of standard, self-describing “file formats” in data-level interoperability
– Summarize common file formats in use, their properties, and benefits (“data life cycle economics”)
– Discuss criteria for choosing a file format and matching it to the needs of consumers and producers
– Discuss the critical role of conventions: any file format needs good recipes to make it interoperable!
– Examples: NASA MEaSUREs F/T, SMAP, AIRS, Aura

Role(s) of File Formats in Interoperability
File formats represent versatile “packages” for multi-dimensional science data and metadata.
Offer self-describing “well-known structures” to codify desired, common conventions and practices.
Offer well-documented reference cases to encapsulate specific data models.
Standard file formats dock with format-aware tools to offer users a seamless end-to-end experience and platform portability.
Enhance mission-to-mission continuity.

…investment → life-cycle economics…

Why (and how) are file formats important?
Standard formats
– Come with thorough documentation
– Provide good reference implementations
Common formats
– More datasets in a format → more tools that read that format
Canonical structures and names → general-purpose handlers for coordinates, etc. → smarter tools

A generic work flow…
Consider user community needs and culture, fit within architecture, institutional policies & preferences
Choose a standard file format (or sub-variant)
Design a convention-enabled, specific internal layout with metadata interfaces
Prototype: implement a prototype and evaluate it
Implement in production context
Integrate within discovery and catalog environments (catalog interoperability…)

Examples of standard file formats
HDF5: a file format on its own, as well as a broad foundation for others
netCDF v4 (stable at v4.1.1, newest: v4.1.2-beta1)
– v4 Classic (widespread adoption, some limitations…)
– v4 Enhanced (supports groups, user-defined and variable-length types, and more)
netCDF v3 Classic (legacy+, tools+, but limited)
HDFEOS2, HDFEOS5: EOS Terra, Aqua, Aura…
HDF4: legacy, extensive use by MODIS Terra, Aqua
Many other domain-specific, less generic formats abound… (need transform tools to/from HDF?)

Some selection criteria…
Do the file format’s capabilities support the required functionality?
What is the breadth of acceptance and adoption within the larger community? (And/or, does institutional policy dictate a specific format?)
Presence and quality of documentation (reference, examples, and especially tutorials), API software, and community support?
Contribution to investment, data life-cycle economics?
What is the level of standardization?
Adaptability of the format to widely used conventions like CF 1.x, or other accepted convention(s)?

Internal Layout / Design (once format is chosen & adopted…)
Define & refine high-level organization/structure: /DATA, /METADATA
Distinguish ‘data’ from ‘metadata’, core structure vs. ‘attributes’
– Dimensions, coordinate variables, projection attributes
– missing_data, _FillValue vs. internal fill value
– Units, gain, offset, min, max, range, etc.
Prototype it!
– Leverage script environments (Python h5py, PyTables, etc.)
– Panoply and HDFView are also quick and useful for prototyping and feedback

Using “Groups”
HDF5 (and NetCDF v4 Enhanced) support full use of groups, e.g. /DATA vs. /METADATA, etc.
Groups are useful for partitioning functionally related sets of data or attributes; the hierarchical view mimics a file system.
Facilitates appropriate information hiding: highlights needed info, shields the rest (principle of least privilege…)
Well supported by modern tools (Panoply, HDFView, PyTables, h5py) and low-level APIs.
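A minimal prototyping sketch of the /DATA vs. /METADATA split, assuming h5py and NumPy are installed. The dataset name `status`, its values, and the attribute values are hypothetical placeholders, not the layout of any actual product.

```python
import h5py
import numpy as np


def build_prototype(path="prototype_layout.h5"):
    """Create a small in-memory HDF5 file with /DATA and /METADATA groups."""
    # driver="core" with backing_store=False keeps everything in memory,
    # so nothing is written to disk while experimenting.
    f = h5py.File(path, "w", driver="core", backing_store=False)

    data = f.create_group("DATA")
    meta = f.create_group("METADATA")

    # Hypothetical 2 x 3 grid; -1 marks cells with no retrieval.
    grid = np.array([[1, 0, -1], [1, 1, 0]], dtype="int8")
    var = data.create_dataset("status", data=grid)

    # Variable-level attributes travel with the dataset itself.
    var.attrs["long_name"] = "hypothetical status flag"
    var.attrs["units"] = "1"
    var.attrs["_FillValue"] = np.int8(-1)

    # File-level descriptive attributes live under /METADATA in this sketch.
    meta.attrs["title"] = "Prototype layout sketch"
    meta.attrs["Conventions"] = "CF-1.4"
    return f


f = build_prototype()

# Walk the hierarchy much like a file-system listing.
paths = []
f.visit(paths.append)
print(sorted(paths))
```

Opening the same layout in HDFView or Panoply (after writing it to disk instead of memory) is a quick way to get feedback on group and attribute naming before committing to a production design.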

Example(s) of File Formats in Action
HDF5
– NASA MEaSUREs
– NASA MEaSUREs Freeze/Thaw (soon available at NSIDC)
– Aqua AIRS Level 2 (from earlier talk):
– 0/285/AIRS L2.RetStd.v G hdf
Aura TES ( TES-Aura_L3-CH4_r _F01_05.he5 )

Example: NASA MEaSUREs Freeze/Thaw, Daily in HDF5. Metadata Block: Attributes

Example: NASA MEaSUREs Daily Freeze/Thaw in HDF5. Data Variable (FT_SSMI) and Attributes

Example: NASA Level 2 AIRS (Swath) in HDF4

Example: NetCDF (tos), sea surface temperatures collected by PCMDI for use by the IPCC, illustrating CF v1.0 layout

Example: TES (HDFEOS5) illustrating CF v1.0 layout

CF Conventions & file formats: how they contribute to interoperability
CF v1.4.x: the term “CF” is now broader than just climate and forecasting!
Standard Name Table: a step towards wider adoption of names, controlled vocabularies, units terminology
CF v1.4.x provides tool-makers with helpful “lingua franca” guidance.
Within a file format, adopting conventions like CF promotes common layout, names, and semantics for dataset-to-dataset compatibility, a key to wider data-level interoperability.
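One way to see why shared names help tool-makers is a small checker that works on any dataset's attributes without knowing the product. The sketch below uses plain dictionaries as stand-ins for a file's global and per-variable attributes. The attribute lists are a small subset chosen for illustration, not the full CF rule set, and `missing_attributes` is a hypothetical helper name.

```python
# Subset of commonly recommended attributes, chosen for illustration only.
RECOMMENDED_GLOBALS = ("Conventions", "title", "institution",
                       "source", "history", "references")
RECOMMENDED_VARIABLE = ("units", "long_name")


def missing_attributes(global_attrs, variables):
    """Map each scope name to the recommended attributes it lacks."""
    report = {}
    missing = [name for name in RECOMMENDED_GLOBALS if name not in global_attrs]
    if missing:
        report["(global)"] = missing
    for var_name, attrs in variables.items():
        missing = [name for name in RECOMMENDED_VARIABLE if name not in attrs]
        if missing:
            report[var_name] = missing
    return report


# Hypothetical attribute sets for a file holding one variable, "tos".
global_attrs = {"Conventions": "CF-1.4", "title": "Sea surface temperature"}
variables = {"tos": {"units": "K"}}

report = missing_attributes(global_attrs, variables)
for scope, names in report.items():
    print(scope, "is missing:", ", ".join(names))
```

Because the check depends only on agreed attribute names, the same function applies unchanged to attributes read from HDF5, HDF-EOS, or netCDF files.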

Attributes vs. Metadata? One man’s ceiling is another man’s floor…
Collection level vs. data set vs. granule level
Structural vs. science-content
Swath vs. grid vs. point
Commonly used attributes:
– Conventions attribute, communicates which convention was used
– Basic globals: title, history, institution, source, references
– Coordinate variables, axis, formula_terms
– units, _FillValue, missing_data, valid_range
– short_name, long_name, other provenance
– (gain, offset / scale_factor, add_offset), etc.
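The packing attributes can be illustrated with a short sketch. Under the CF packing rule, a reader recovers physical values as packed * scale_factor + add_offset and treats cells equal to _FillValue as missing. The packed values and attribute values below are hypothetical, and `unpack` is an illustrative helper, not a library function.

```python
def unpack(packed, scale_factor=1.0, add_offset=0.0, fill_value=None):
    """Apply unpacked = packed * scale_factor + add_offset, masking fill cells."""
    unpacked = []
    for value in packed:
        if fill_value is not None and value == fill_value:
            # Fill cells carry no measurement, so they are never scaled.
            unpacked.append(None)
        else:
            unpacked.append(value * scale_factor + add_offset)
    return unpacked


# Hypothetical variable attributes and packed 16-bit integers.
attrs = {"units": "K", "scale_factor": 0.5, "add_offset": 270.0,
         "_FillValue": -32768}
packed = [0, 10, -32768, 25]

values = unpack(packed, attrs["scale_factor"], attrs["add_offset"],
                attrs["_FillValue"])
print(values)
```

Checking for the fill value before scaling matters: scaling first would turn the sentinel into a plausible-looking number that downstream tools could no longer recognize as missing.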

Challenges? (just a few remain…)
Evolution, bifurcation, and asymmetric support can result in occasional user confusion:
– HDF5 v1.8.x vs. v1.6.x families?
– NetCDF v4 Enhanced vs. NetCDF v4 Classic vs. v3?
– HDFEOS5 vs. HDFEOS2?
Both GUI tool and API support tend to vary by platform (Linux, Mac, Win7) and sub-flavor…
Multi-library dependency stacks beg for a fully bundled, version-matched end-to-end install package!
Conventions community (CF v1.4.x) and metadata standards communities are also in motion (but that’s good too…)

Resources: URLs
Climate and Forecast (CF) Conventions (now at 1.4.x)
HDF
HDFEOS
NetCDF
General: Describing_Formats

Resources: File format related tools
Panoply
HDFView
OPeNDAP
IDV
McIDAS
Python:
– h5py
– PyTables
Perl: PDL-IO-HDF5, and Biohdf?
Many others: HEG, MTD, HDFEOS plug-in for HDFView, HDFLook, (ncdump, h5dump, and cousins), GrADS, MATLAB, binary APIs

A provisional DOI, UUID Strategy
What we used for NASA MEaSUREs Freeze/Thaw, daily (v2), just delivered:
– DOI: assigned to our reference paper, by IEEE Transactions on Geoscience and Remote Sensing
– UUID recipe:
  seedString =
  import uuid
  uuid = uuid.uuid5(seedString)
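A runnable sketch of a name-based UUID recipe, for illustration only. Python's `uuid.uuid5` takes a namespace UUID plus a name string, so the sketch assumes `uuid.NAMESPACE_URL` as the namespace. The slide does not give the namespace or the seed string that was actually used, so the seed below and the helper name `granule_uuid` are hypothetical.

```python
import uuid


def granule_uuid(seed_string, namespace=uuid.NAMESPACE_URL):
    """Derive a deterministic version-5 (SHA-1, name-based) UUID from a seed."""
    # The same namespace and seed always produce the same UUID, which lets
    # an identifier be regenerated later from the seed alone.
    return uuid.uuid5(namespace, seed_string)


# Hypothetical seed; a real one might combine product name, version, and date.
seed = "https://example.org/freeze-thaw/daily/v2/granule-0001"

first = granule_uuid(seed)
second = granule_uuid(seed)
print(first)
print(first == second)
```

Binding the result to a name other than `uuid` avoids shadowing the module, which would otherwise break any later call to `uuid.uuid5` in the same script.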