Unidata & NetCDF BoF Scientific File Formats


Unidata & NetCDF BoF Scientific File Formats
RDA Fourth Plenary Meeting
23 September 2014, Amsterdam, NL
Dr. Mohan Ramamurthy
Unidata Program Center
UCAR Community Programs

Unidata: A Geosciences Data Facility
Established in 1984
Funded primarily by NSF on a five-year proposal cycle
The collateral benefits have been far and wide

Core Activities
Acquire and distribute real-time meteorological data for education, research, and outreach
Develop software for accessing, managing, analyzing, visualizing, and effectively using geosciences data
Provide comprehensive training and support to users
Facilitate advancement of standards and conventions
Advocate on behalf of the community and negotiate data & software agreements
Assess and respond to community needs, engage the stakeholders, and promote sharing of data, tools, and ideas
Provide funds to universities to enable/enhance their participation

A Snapshot of Products & Services
Software:
  Data Distribution: LDM
  Remote Data Access: THREDDS, ADDE, and RAMADDA
  Data Management: netCDF, UDUNITS, and Rosetta
  Analysis and Visualization: GEMPAK, McIDAS, IDV, and AWIPS II
  GIS support via TDS (WCS, WMS) and KML and Shapefiles
Data:
  Over 30 data streams provided in real-time
  Data collection, cataloging, and distribution
  Both push and pull technologies are used
User Support & Training:
  Direct email support
  Community mailing lists
  Annual Training Workshops, Triennial Users Workshops, and Regional Workshops as needed
Community:
  Equipment Awards to universities; Seminars; Information Commons; Advocacy; Data standards

Real-time Data Distribution Model
[Diagram labels: Satellite, Radar]
About 30 different streams of real-time weather data from diverse sources are provided to the community, which collectively move over 30 terabytes of data per week.

Remote Data Access
Complements the IDD/LDM push data delivery system
Made available via THREDDS Data Server, RAMADDA, and ADDE data servers that support several protocols: OPeNDAP, ADDE, HTTP, FTP, WCS, and WMS

NetCDF: Experiences and Lessons Learned Slides courtesy of Russ Rew, Unidata

NetCDF: not just a format
A standard format for platform-independent data (NASA ESDS-RFC-011)
CF-netCDF has been adopted as a formal OGC binary encoding standard
But netCDF is also
  A data model for multidimensional and structured scientific data
  A set of application programming interfaces (C, Java, Fortran, C++, …) for data access
  A reference implementation for the APIs
David Arctur of OGC suggested CF-netCDF as an OGC standard in May 2009 (at an IOOS meeting). Ben Domenico presented this at the June 2009 OGC meeting, with general agreement that it would be appropriate. Use of netCDF reference implementation software in many analysis and visualization packages is at least as important as the standard format for interoperability and preservation. The software interfaces insulate programs from format changes. netCDF is also a foundation for other standards, e.g. the CF Conventions.

NetCDF History

Classic netCDF data model
A file has variables, dimensions, and attributes. Variables also have attributes. Variables may share dimensions, indicating a common grid. One dimension may be of unlimited length.
Variables and attributes have one of six primitive data types.
Diagram elements:
  File: location: Filename; create( ), open( ), …
  Dimension: name: String; length: int; isUnlimited( )
  Attribute: name: String; type: DataType; values: 1D array
  Variable: name: String; shape: Dimension[ ]; type: DataType; array: read( ), …
  DataType: PrimitiveType (char, byte, short, int, float, double)
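The classic model described on this slide can be sketched as a handful of plain Python classes. This is only an illustration of the model's structure, not the netCDF library's API: the class names `ClassicFile`, `Dimension`, `Attribute`, `Variable`, the `unlimited` flag, and the helper `share_grid` are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class PrimitiveType(Enum):
    """The six primitive data types of the classic model."""
    CHAR = "char"
    BYTE = "byte"
    SHORT = "short"
    INT = "int"
    FLOAT = "float"
    DOUBLE = "double"


@dataclass(frozen=True)
class Dimension:
    name: str
    length: int
    unlimited: bool = False  # assumption: a flag stands in for isUnlimited( )

    def is_unlimited(self) -> bool:
        return self.unlimited


@dataclass
class Attribute:
    name: str
    type: PrimitiveType
    values: list  # 1D array of values


@dataclass
class Variable:
    name: str
    shape: tuple  # tuple of Dimension objects
    type: PrimitiveType
    attributes: list = field(default_factory=list)


@dataclass
class ClassicFile:
    location: str
    dimensions: list = field(default_factory=list)
    variables: list = field(default_factory=list)
    attributes: list = field(default_factory=list)

    def add_dimension(self, dim: Dimension) -> Dimension:
        # The classic model permits at most one unlimited dimension.
        if dim.is_unlimited() and any(d.is_unlimited() for d in self.dimensions):
            raise ValueError("the classic model allows only one unlimited dimension")
        self.dimensions.append(dim)
        return dim

    def add_variable(self, var: Variable) -> Variable:
        # A variable's shape may only use dimensions defined in the file.
        missing = [d.name for d in var.shape if d not in self.dimensions]
        if missing:
            raise ValueError(f"undefined dimensions: {missing}")
        self.variables.append(var)
        return var


def share_grid(a: Variable, b: Variable) -> bool:
    """Variables with the same dimensions are on a common grid."""
    return a.shape == b.shape


f = ClassicFile("example.nc")
time = f.add_dimension(Dimension("time", 0, unlimited=True))
lat = f.add_dimension(Dimension("lat", 3))
lon = f.add_dimension(Dimension("lon", 4))
units = Attribute("units", PrimitiveType.CHAR, ["K"])
temp = f.add_variable(Variable("temp", (time, lat, lon), PrimitiveType.FLOAT, [units]))
pres = f.add_variable(Variable("pres", (time, lat, lon), PrimitiveType.FLOAT))
print(share_grid(temp, pres))  # True
```

Here `temp` and `pres` share the same three dimensions, which is how the model expresses that they sit on a common grid; attempting to add a second unlimited dimension raises an error.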

Enhanced netCDF data model, for netCDF-4
A file has a top-level unnamed group. Each group may contain one or more named subgroups, user-defined types, variables, dimensions, and attributes. Variables also have attributes. Variables may share dimensions, indicating a common grid. One or more dimensions may be of unlimited length.
Variables and attributes have one of twelve primitive data types or one of four user-defined types.
Compatible evolution of data model, by extensions, permits compatible evolution of format and APIs.
Diagram elements:
  File: location: Filename; create( ), open( ), …
  Group
  Dimension: name: String; length: int; isUnlimited( )
  Attribute: type: DataType; values: 1D array
  Variable: shape: Dimension[ ]; type: DataType; array: read( ), …
  DataType: PrimitiveType (char, byte, short, int, int64, float, double, unsigned byte, unsigned short, unsigned int, unsigned int64, string) or UserDefinedType (typename: String; Compound, VariableLength, Enum, Opaque)
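A minimal sketch of why the enhanced model is an extension rather than a replacement: anything expressible in the classic model is still a valid instance of the enhanced one. The classes below and the function `is_classic_compatible` are hypothetical names for illustration, not part of any netCDF API, and the check ignores attribute types for brevity.

```python
from dataclasses import dataclass, field

CLASSIC_TYPES = ("char", "byte", "short", "int", "float", "double")
ENHANCED_TYPES = CLASSIC_TYPES + (
    "int64", "unsigned byte", "unsigned short",
    "unsigned int", "unsigned int64", "string",
)
USER_TYPE_KINDS = ("Compound", "VariableLength", "Enum", "Opaque")


@dataclass
class Dimension:
    name: str
    length: int
    unlimited: bool = False


@dataclass
class UserDefinedType:
    typename: str
    kind: str  # one of USER_TYPE_KINDS


@dataclass
class Variable:
    name: str
    shape: list   # names of the dimensions the variable uses
    type: object  # a primitive type name (str) or a UserDefinedType
    attributes: dict = field(default_factory=dict)


@dataclass
class Group:
    name: str = ""  # the top-level group is unnamed
    dimensions: list = field(default_factory=list)
    variables: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)
    user_types: list = field(default_factory=list)
    subgroups: list = field(default_factory=list)


def is_classic_compatible(root: Group) -> bool:
    """True if the content uses only features of the classic model."""
    if root.subgroups or root.user_types:
        return False
    if sum(d.unlimited for d in root.dimensions) > 1:
        return False
    # Attribute types are not checked in this sketch.
    return all(
        isinstance(v.type, str) and v.type in CLASSIC_TYPES
        for v in root.variables
    )


classic_like = Group(
    dimensions=[Dimension("time", 0, unlimited=True), Dimension("lat", 3)],
    variables=[Variable("temp", ["time", "lat"], "float")],
)
enhanced = Group(
    dimensions=[Dimension("time", 0, unlimited=True),
                Dimension("station", 0, unlimited=True)],
    variables=[Variable("station_id", ["station"], "string")],
    subgroups=[Group(name="model_run_1")],
)
print(is_classic_compatible(classic_like))  # True
print(is_classic_compatible(enhanced))      # False
```

The first group uses only classic features, so it is representable in both models; the second uses subgroups, a `string` variable, and two unlimited dimensions, which exist only in the enhanced model.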

Infrastructure for sharing scientific data
Applications depend on lower layers
Sharing requires agreements: formats, protocols, conventions
Data need metadata
Remember this?

NetCDF infrastructure
Provides format and library for netCDF data model
Endorsed by several standards bodies
Active conventions communities
Several servers and protocols for remote data access
Many open source and commercial utilities and applications
netCDF plays a part in all levels of the infrastructure

How do formats change?
Simple formats don’t change; they’re defined once and frozen forever (e.g. ASCII)
Some formats change infrequently and usually incompatibly (e.g. GRIB 1, GRIB 2)
Complex formats (and their software) may evolve in lots of small increments (e.g. netCDF 1.0, 2.4.3, 3.6.3, 4.0.1)

Compatibility commitment
For scientific data, preserving access to data for future generations should be sacrosanct
Strong commitment is needed to ensure practical access to old data by new programs
Careful library evolution can ensure data and API compatibility

Declaration of Compatibility
For future access to archives, netCDF development will continue to ensure the compatibility of:
Data access: netCDF software will provide both read and write access to all earlier forms of netCDF data.
Programming interfaces: C and Fortran programs using documented netCDF APIs from previous versions will continue to work after recompiling and relinking (if needed).
Future versions: netCDF will continue to support both data access compatibility and API compatibility in future releases.
Along with this commitment was an admission that keeping netCDF releases backward compatible might seem difficult, but it would really be easy compared to the amount of effort required to use Microsoft PowerPoint drawing tools to create this slide’s border.

Aspects of compatibility
Costs
  Effort to support older interfaces and formats
  Comprehensive compatibility testing with every software release
Benefits
  Data in archives don’t have to change
  Client program sources don’t have to change
  Software can access archived data without being aware of format version
Implemented compatibly by evolving data model
  Add or grow abstractions, instead of replacing them
  Ensure previous data model is included in enhanced data model
Software and formats can evolve without affecting access to archived data
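One way software can read archived data "without being aware of format version" is for the access layer to inspect the first bytes of a file and choose a reader itself. The sketch below is an illustration under stated assumptions, not the netCDF library's own code. It relies on the published file signatures: classic-family files begin with the bytes `CDF` followed by a version byte (1, 2, or 5), and netCDF-4 files are HDF5 files, which normally begin with the 8-byte HDF5 signature. The function names `detect_variant` and `detect_file` are hypothetical, and an HDF5 signature alone does not prove that a file was written through netCDF-4.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Version byte that follows the b"CDF" magic in classic-family files.
CLASSIC_VERSIONS = {
    1: "classic",
    2: "64-bit offset",
    5: "64-bit data (CDF-5)",
}


def detect_variant(header: bytes) -> str:
    """Guess the netCDF format variant from the first bytes of a file."""
    if header.startswith(HDF5_SIGNATURE):
        return "netCDF-4 (HDF5-based)"
    if len(header) >= 4 and header[:3] == b"CDF":
        return CLASSIC_VERSIONS.get(header[3], "unknown")
    return "unknown"


def detect_file(path: str) -> str:
    """Read just enough of a file to identify its variant."""
    with open(path, "rb") as fh:
        return detect_variant(fh.read(8))


print(detect_variant(b"CDF\x01\x00\x00\x00\x00"))  # classic
print(detect_variant(b"CDF\x02\x00\x00\x00\x00"))  # 64-bit offset
print(detect_variant(HDF5_SIGNATURE))              # netCDF-4 (HDF5-based)
```

Because the dispatch happens below the programming interface, a client calling a single open routine never needs to know which variant it was handed.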

NetCDF Metrics
NetCDF-C downloads (last 12 months): 101,703 (116 “countries” = Top-level Internet Domains)
NetCDF-java downloads (last 12 months): 14,716
Defects per 1000 lines of code (Coverity estimate): 0.36
Google hits in April 2014 for "netcdf-3": 828,000
Google hits in April 2014 for "netcdf-4": 759,000
Google scholar entries in April 2014 for "netcdf": 11,000
Free software packages that can access netCDF data: 83
Commercial packages that can access netCDF data: 23
Number of license plates with NETCDF: 1

Impact on Climate Science

NetCDF: Lessons Learned
Giving developers freedom to explore a solution space can lead to great software.
Reimplementing someone else's weak implementation of a good idea is a good strategy for developing useful software.
Porting software to a large variety of platforms leads to higher quality software, because some platforms reveal bugs that others don’t.
Employing a test-driven development and agile development process
Comprehensive testing and tools help, but a large user community will find innovative ways to use the software to discover bugs that would be difficult to anticipate.
A serious bug can go undetected through many releases if it is rare and hard to detect (the "dreaded nofill bug").

Summary
NetCDF has become a de facto standard in the atmospheric sciences and oceanography, providing a key component of cyberinfrastructure for the geosciences.
NetCDF has made it possible for data providers, scientists, software developers, and data archives to make use of standard interfaces and formats to share and reuse data and benefit from the resulting interoperability.
It has a large, active, and diverse collection of users, willing to contribute to its continued use.
Over 100 commercial and free applications can access netCDF data.
Many projects and archives, including CMIP-5 and World Data Centers, are storing large volumes of data in netCDF form, for future use.
Think of the children!

Concluding Remarks
Format obsolescence need not be an issue for data durability and preservation
  Evolve data models by extension, not by incompatible modification
  Preserve previous programming interfaces
  Support previous format variants transparently
  Avoid gratuitous invention of new formats
Data preservation and stewardship requires much more than dealing with format standards and their evolution
Think of the children!

Questions?
Contact: mohan@ucar.edu
http://www.unidata.ucar.edu/
Unidata is funded primarily by the National Science Foundation (Grant NSF-1344155)