Developing Conventions for netCDF-4 Russ Rew, UCAR Unidata June 11, 2007 GO-ESSP.

Slides:



Advertisements
Similar presentations
Introduction to the BinX Library eDIKT project team Ted Wen Robert Carroll
Advertisements

A Draft Standard for the CF Metadata Conventions Cheryl Craig and Russ Rew UCAR.
Operators & Identifiers The Data Elements. Arithmetic Operators exponentiation multiplication division ( real ) division ( integer quotient ) division.
Recent Work in Progress
ESCI/CMIP5 Tools - Jeudi 2 octobre CMIP5 Tools Earth System Grid-NetCDF4- CMOR2.0-Gridspec-Hyrax …
The Future of NetCDF Russ Rew UCAR Unidata Program Center Acknowledgments: John Caron, Ed Hartnett, NASA’s Earth Science Technology Office, National Science.
NetCDF An Effective Way to Store and Retrieve Scientific Datasets Jianwei Li 02/11/2002.
HDF4 and HDF5 Performance Preliminary Results Elena Pourmal IV HDF-EOS Workshop September
NetCDF Ed Hartnett Unidata/UCAR
Introduction to NetCDF Russ Rew, UCAR Unidata ICTP Advanced School on High Performance and Grid Computing 13 April 2011.
1 CF Unleashed: Introduction to Cf/Radial Joe VanAndel National Center for Atmospheric Research 2013/1/8 The National Center for Atmospheric.
Status of netCDF-3, netCDF-4, and CF Conventions Russ Rew Community Standards for Unstructured Grids Workshop, Boulder
Developing a NetCDF-4 Interface to HDF5 Data
1 Writing NetCDF Files: Formats, Models, Conventions, and Best Practices Russ Rew, UCAR Unidata June 28, 2007.
Data Formats: Using Self-describing Data Formats Curt Tilmes NASA Version 1.0 February 2013 Section: Local Data Management Copyright 2013 Curt Tilmes.
Introduction to NetCDF4 MuQun Yang The HDF Group 11/6/2007HDF and HDF-EOS Workshop XI, Landover, MD.
Developing a NetCDF-4 Interface to HDF5 Data Russ Rew (PI), UCAR Unidata Mike Folk (Co-PI), NCSA/UIUC Ed Hartnett, UCAR Unidata Quincey Kozial, NCSA/UIUC.
1 Russ Rew, Ed Hartnett, John Caron UCAR Unidata Program Center Mike Folk, Robert McGrath, Quincey Kozial NCSA and The HDF Group, Inc. Final Project Review,
HDF5 A new file format & software for high performance scientific data management.
NPP/ NPOESS Product Data Format Richard E. Ullman NASA/GSFC/NPP NOAA/NESDIS/IPOAlgorithm / System EngineeringData / Information Architecture
© Crown Copyright Met Office Towards improved netCDF-GIS interoperability: Potential utility of the “Well-Known Model” concept Phil Bentley, Met Office.
Animation of DSM2 Outputs in ArcMap Siqing Liu Bay Delta Office Department of Water Resources 2/17/2015.
Slide 1 TIGGE phase1: Experience with exchanging large amount of NWP data in near real-time Baudouin Raoult Data and Services Section ECMWF.
The HDF Group HDF5 Datasets and I/O Dataset storage and its effect on performance May 30-31, 2012HDF5 Workshop at PSI 1.
N P O E S S I N T E G R A T E D P R O G R A M O F F I C E NPP/ NPOESS Product Data Format Richard E. Ullman NOAA/NESDIS/IPO NASA/GSFC/NPP Algorithm Division.
The netCDF-4 data model and format Russ Rew, UCAR Unidata NetCDF Workshop 25 October 2012.
1 © 2002, Cisco Systems, Inc. All rights reserved. Arrays Chapter 7.
October 15, 2008HDF and HDF-EOS Workshop XII1 What will be new in HDF5?
Integrating netCDF and OPeNDAP (The DrNO Project) Dr. Dennis Heimbigner Unidata Go-ESSP Workshop Seattle, WA, Sept
Advanced Utilities Extending ncgen to support the netCDF-4 Data Model Dr. Dennis Heimbigner Unidata netCDF Workshop August 3-4, 2009.
1 HDF5 Life cycle of data Boeing September 19, 2006.
NetCDF Data Model Issues Russ Rew, UCAR Unidata NetCDF 2010 Workshop
NetCDF file generated from ASDC CERES SSF Subsetter ATMOSPHERIC SCIENCE DATA CENTER Conversion of Archived HDF Satellite Level 2 Swath Data Products to.
1 Understanding Cataloging with DLESE Metadata Karon Kelly Katy Ginger Holly Devaul
The HDF Group Introduction to netCDF-4 Elena Pourmal The HDF Group 110/17/2015.
Lecture 3 Introduction to Computer Programming CUIT A.M. Gamundani Presentation Layout from Lecture 1 Background.
NetCDF-4: Software Implementing an Enhanced Data Model for the Geosciences Russ Rew, Ed Hartnett, and John Caron UCAR Unidata Program, Boulder
NetCDF and Scientific Data Durability Russ Rew, UCAR Unidata ESIP Federation Summer Meeting
00/XXXX 1 Data Processing in PRISM Introduction. COCO (CDMS Overloaded for CF Objects) What is it. Why is COCO written in Python. Implementation Data Operations.
Data File Formats: netCDF by Tom Whittaker University of Wisconsin-Madison SSEC/CIMSS 2009 MUG Meeting June, 2009.
Advances in the NetCDF Data Model, Format, and Software Russ Rew Coauthors: John Caron, Ed Hartnett, Dennis Heimbigner UCAR Unidata December 2010.
11/8/2007HDF and HDF-EOS Workshop XI, Landover, MD1 Software to access HDF5 Datasets via OPeNDAP MuQun Yang, Hyo-Kyung Lee The HDF Group.
QARTOD in Practice Luke Campbell, Software Engineer, RPS ASA.
Unidata Technologies Relevant to GO-ESSP: An Update Russ Rew
CF 2.0 Coming Soon? (Climate and Forecast Conventions for netCDF) Ethan Davis ESO Developing Standards - ESIP Summer Mtg 14 July 2015.
Development of a CF Conventions API Russ Rew GO-ESSP Workshop, LLNL
NetCDF: Data Model, Programming Interfaces, Conventions and Format Adapted from Presentations by Russ Rew Unidata Program Center University Corporation.
Update on Unidata Technologies for Data Access Russ Rew
The HDF Group Introduction to HDF5 Session 7 Datatypes 1 Copyright © 2010 The HDF Group. All Rights Reserved.
NcBrowse: A Graphical netCDF File Browser Donald Denbo NOAA-PMEL/UW-JISAO
Utilities for netCDF-4 Dr. Dennis Heimbigner Unidata Advanced netCDF Workshop July 25, 2011.
Unidata Infrastructure for Data Services Russ Rew GO-ESSP Workshop, LLNL
Libcf – A CF Convention Library for NetCDF Ed Hartnett Unidata Program Center Boulder Colorado June 11, 2007.
NetCDF Data Model Details Russ Rew, UCAR Unidata NetCDF 2009 Workshop
Copyright © 2010 The HDF Group. All Rights Reserved1 Data Storage and I/O in HDF5.
Other Projects Relevant (and Not So Relevant) to the SODA Ideal: NetCDF, HDF, OLE/COM/DCOM, OpenDoc, Zope Sheila Denn INLS April 16, 2001.
(Network Common Data Form)
Moving from HDF4 to HDF5/netCDF-4
SRNWP Interoperability Workshop
Plans for an Enhanced NetCDF-4 Interface to HDF5 Data
Requirements for GSICS Plotting Tool to support VIS/NIR products
What is FITS? FITS = Flexible Image Transport System
Unidata & NetCDF BoF Scientific File Formats
Julia Powell Coast Survey Development Laboratory
Unidata Advanced netCDF Workshop
Status for Endeavor 6: Improved Scientific Data Access Infrastructure
Libcf – A CF Convention Library for NetCDF
GSICS Baseline Review: Product meta-data and structures
Masaya Takahashi Japan Meteorological Agency
NCL variable based on a netCDF variable model
Presentation transcript:

Developing Conventions for netCDF-4 Russ Rew, UCAR Unidata June 11, 2007 GO-ESSP

Overview Two levels of conventions: NUG and CF “Classic” and extended netCDF-4 data models Data models and data formats Potential uses and examples of netCDF-4 data model features CF conventions issues Benefits of using netCDF-4 format but classic data model Recommendations and conclusions

Background: netCDF and Conventions Purpose of conventions –To capture meaning in data, intent of data provider –To foster interoperability NetCDF User Guide conventions –Concepts: simple coordinate variables, … –Attribute based: units, Conventions, … Climate and Forecast (CF) conventions –Concepts: generalized coordinates, … –Models relationships among variables –Standard names –Attribute based

Classic NetCDF Data Model Variables and attributes have one of six primitive data types. DataType char byte short int float double Dimension name: String length: int isUnlimited( ) Attribute name: String type: DataType values: 1D array Variable name: String shape: Dimension[ ] type: DataType array: read( ), … File location: Filename create( ), open( ), … A file has named variables, dimensions, and attributes. A variable may also have attributes. Variables may share dimensions, indicating a common grid. One dimension may be of unlimited length.

A file has a top-level unnamed group. Each group may contain one or more named subgroups, user-defined types, variables, dimensions, and attributes. Variables also have attributes. Variables may share dimensions, indicating a common grid. One or more dimensions may be of unlimited length. Dimension name: String length: int isUnlimited( ) Attribute name: String type: DataType values: 1D array Variable name: String shape: Dimension[ ] type: DataType array: read( ), … Group name: String File location: Filename create( ), open( ), … Variables and attributes have one of twelve primitive data types or one of four user-defined types. DataType PrimitiveType char byte short int int64 float double unsigned byte unsigned short unsigned int unsigned int64 string UserDefinedType typename: String Compound VariableLength Enum Opaque NetCDF-4 Data Model

Some Limitations of Classic NetCDF Data Model Little support for data structures, just multidimensional arrays and lists No “ragged arrays” or nested structures Only one shared unlimited dimension for appending new data efficiently Flat name space for dimensions and variables Character arrays rather than strings Small set of numeric types Variable size constraints, packing instead of compression, inefficient schema additions, …

NetCDF-4 Features for Data Providers Data model provides: Groups for nested scopes User-defined enumeration types User-defined compound types User-defined variable- length types Multiple unlimited dimensions String type Additional numeric types HDF5-based format provides: Per-variable compression Per-variable multidimensional tiling (chunking) Liberal variable size constraints Reader-makes-right conversion Efficient dynamic schema additions Parallel I/O

NetCDF Data Models and File Formats 1.Use simple classic data model and format 2.Use richer netCDF-4 data model and netCDF-4 format and a third less obvious choice: 3.Use classic data model with the netCDF-4 format Data providers writing new netCDF data have two obvious alternatives:

“Classic model” netCDF-4 files Supported by netCDF-4 library with file creation flag Ensures data can be read by netCDF-3 software (relinked to netCDF-4 library) Compatible with current conventions Writers get benefits of new format, but not data model Readers can –access compressed or chunked variables transparently –get performance benefits of reader-makes-right –use of HDF5 tools

Is it Time to Adopt NetCDF-4 Data Model? C-based netCDF-4 software still only in beta release Few netCDF utilities or applications adapted to full netCDF-4 model yet Little experience with netCDF-4 means useful conventions still in early stages Significant performance improvements available without netCDF-4 data model

Example Use of Groups Data for named geographical regions: group Europe { group France { dimensions: time = unlimited, stations = 47; variables: float temperature(time, stations); } group England{ dimensions: time = unlimited, stations = 61; variables: float temperature(time, stations); } group Germany { dimensions: time = unlimited, stations = 53; variables: float temperature(time, stations); } … dimensions: time = unlimited; variables: float average_temperature(time); }

Potential Uses for Groups Factoring out common information –Containers for data within regions regions –Model metadata Organizing a large number of variables Providing name spaces for multiple uses of same names for dims, vars, atts Modeling large hierarchies CF conventions issues –Ensembles –Shared structured grids –Other uses?

types: compound wind_vector_t { float eastward ; float northward ; } dimensions: lat = 18 ; lon = 36 ; pres = 15 ; time = 4 ; variables: wind_vector_t gwind(time, pres, lat, lon) ; wind:long_name = "geostrophic wind vector" ; wind:standard_name = "geostrophic_wind_vector" ; data: gwind = {1, -2.5}, {-1, 2}, {20, 10}, {1.5, 1.5},...; Example Use of Compound Type Vector quantity, such as wind:

Potential Uses for Compound Types Representing vector quantities like wind Modeling relational database tuples Representing objects with components Bundling multiple in situ observations together (profiles, soundings) Providing containers for related values of other user- defined types (strings, enums, …) Representing C structures portably CF Conventions issues: –should type definitions or names be in conventions? –should member names be part of convention? –should quantities associated with groups of compound standard names be represented by compound types?

Drawbacks with Compound Types Member fields have type and name, but are not netCDF variables Can’t directly assign attributes to compound type members –New proposed convention solves this problem, but requires new user-defined type for each attribute Compound type not as useful for Fortran developers, member values must be accessed individually

types: compound wind_vector_t { float eastward ; float northward ; } compound wv_units_t { string eastward ; string northward ; } dimensions: station = 5; variables: wind_vector_t wind(station) ; wv_units_t wind:units = {"m/s", "m/s"} ; wind_vector_t wind:_FillValue = {-9999, -9999} ; data: wind = {1, -2.5}, {-1, 2}, {20, 10},... ; Example Convention for Member Attributes

Example Use of Enums Named flag values for improving self-description: types: byte enum cloud_t { Clear = 0, Cumulonimbus = 1, Stratus = 2, Stratocumulus = 3, Cumulus = 4, Altostratus = 5, Nimbostratus = 6, Altocumulus = 7, Missing = 127 }; dimensions: time = unlimited; variables: cloud_t primary_cloud(time); cloud_t primary_cloud:_FillValue = Missing; data: primary_cloud = Clear, Stratus, Cumulus, Missing, …;

Potential Uses for Enumerations Alternative for using strings with flag_values and flag_meanings attributes for quantities such as soil_type, cloud_type, … Improving self-description while keeping data compact CF Conventions issues: –standardize on enum type definitions and enumeration symbols? –include enum symbol in standard name table? –standardize way to store descriptive string for each enumeration symbol?

Example Use of Variable-Length Type In situ observations: types: compound obs_t { float pressure ; float temperature ; float salinity ; } obs_t observations_t(*) ; // a variable number of observations compound sounding_t { float latitude ; float longitude ; int time; obs_t obs; } sounding_t soundings_t(*) ; // a variable number of soundings compound track_t { string id ; string description ; soundings_t soundings; } dimensions: tracks = 42; variables: track_t cruise(tracks);

Potential Uses for Variable-Length Type Ragged arrays In situ observational data (profiles, soundings, time series)

Notes on netCDF-4 Variable-Length Types Variable length value must be accessed all at once (e.g. whole row of a ragged array) Any base type may be used (including compound types and oter variable-length types) No associated shared dimension, unlike multiple unlimited dimensions Due to atomic access, using large base types may not be practical

Recommendations for Data Providers Continue using classic data model and format, if suitable CF Principle: Conventions should be developed only for known issues. Instead of trying to foresee the future, features are added as required Evaluate practicality and benefits of classic model with netCDF-4 format Test and explore uses of extended netCDF-4 data model features Help create new netCDF-4 conventions based on experience with what works

When is NetCDF-4 Data Model Needed? If non-classic primitive type is needed –64-bit integers for statistical applications –unsigned bytes, shorts, or ints for wider range –real strings instead of char arrays If making data self-descriptive requires new user-defined types –groups –compound –variable-length –enumerations –nested combinations of types

Three-Stage Chicken and Egg Problem Data providers –Won’t be first to use features not supported by applications or standardized by conventions Application developers –Won’t expend effort needed to support features not used by data providers and not standardized as published conventions Convention creators –Likely to wait until data providers identify needs for new conventions –Must consider issues applications developers will confront to support new conventions

Importance of CF Ray Pierrehumbert (University of Chicago) had this to say on realclimate.org:... I think one mustn't discount a breakthrough of a technological sort in AR4 though: The number of model runs exploring more of scenario and parameter space is vastly increased, and more importantly, it is available in a coherent archive to the full research community for the first time. The amount of good science that will be done with this archive in the next several years is likely to have a significant impact on our understanding of climate.