What is NetCDF ? And what are its plans for world domination?

Slides:



Advertisements
Similar presentations
1 NASA CEOP Status & Demo CEOS WGISS-25 Sanya, China February 27, 2008 Yonsook Enloe.
Advertisements

Complex Scientific Analytics in Earth Science at Extreme Scale John Caron University Corporation for Atmospheric Research Boulder, CO Oct 6, 2010.
The HDF Group ESIP Summer Meeting Easy access HDF files via Hyrax Kent Yang The HDF Group 1 July 8 – 11, 2014.
Reading HDF family of formats via NetCDF-Java / CDM
Recent Work in Progress
The Model Output Interoperability Experiment in the Gulf of Maine: A Success Story Made Possible By CF, NcML, NetCDF-Java and THREDDS Rich Signell (USGS,
A Common Data Model In the Middle Tier Enabling Data Access in Workflows … HDF/HDF-EOS Workshop XIV September 29, 2010 Doug Lindholm Laboratory for Atmospheric.
Streaming NetCDF John Caron July What does NetCDF do for you? Data Storage: machine-, OS-, compiler-independent Standard API (Application Programming.
® OGC Web Services Initiative, Phase 9 (OWS-9): Innovations Thread - OPeNDAP James Gallagher and Nathan Potter, OPeNDAP © 2012 Open Geospatial Consortium.
THREDDS, CDM, OPeNDAP, netCDF and Related Conventions John Caron Unidata/UCAR Sep 2007.
7 +/- 2 Maybe Good Ideas John Caron June (1) NetCDF-Java (aka CDM) has lots of functionality, but only available in Java – NcML Aggregation – Access.
The Future of NetCDF Russ Rew UCAR Unidata Program Center Acknowledgments: John Caron, Ed Hartnett, NASA’s Earth Science Technology Office, National Science.
NextGen Network-Enabled Weather (NNEW) Concepts Aaron Braeckel.
Unidata TDS Workshop THREDDS Data Server Overview October 2014.
Status of netCDF-3, netCDF-4, and CF Conventions Russ Rew Community Standards for Unstructured Grids Workshop, Boulder
John Caron Unidata October 2012
Quick Unidata Overview NetCDF Workshop 25 October 2012 Russ Rew.
Implementation of Model Data Interoperability for IOOS: Successes and Lessons Learned Rich Signell USGS Woods Hole, MA / NOAA Silver Spring USA Model Data.
NetCDF for Developers and Data Providers Russ Rew, UCAR Unidata ICTP Advanced School on High Performance and Grid Computing 14 April 2011.
Unidata’s TDS Workshop TDS Overview – Part II October 2012.
HDF5 A new file format & software for high performance scientific data management.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
Unidata TDS Workshop TDS Overview – Part I XX-XX October 2014.
Unidata’s Common Data Model John Caron Unidata/UCAR Nov 2006.
THREDDS Data Server Ethan Davis GEOSS Climate Workshop 23 September 2011.
Coverages and the DAP2 Data Model James Gallagher.
Weathertop Consulting, LLC Wednesday, January 14, 2009 IIPS 11A.2 1 A General Purpose System for Server-side Analysis of Earth Science Data Roland Schweitzer.
NetCDF-Java Overview John Caron Oct 29, Contents Data Models / Shared Dimensions Coordinate Systems Feature Types NetCDF Markup Language (NcML)
NcML Aggregation vs Feature Collections. NcML functionality 1.Modify the objects found in CDM files – Especially Attributes – Don’t have to rewrite the.
Unidata’s TDS Workshop TDS Overview – Part II Unidata July 2011.
Mid-Course Review: NetCDF in the Current Proposal Period Russ Rew
Accomplishments and Remaining Challenges: THREDDS Data Server and Common Data Model Ethan Davis Unidata Policy Committee Meeting May 2011.
The netCDF-4 data model and format Russ Rew, UCAR Unidata NetCDF Workshop 25 October 2012.
THREDDS Data Server Unidata’s Common Data Model Background / Summary John Caron Unidata/UCAR Mar 2007.
Integrating netCDF and OPeNDAP (The DrNO Project) Dr. Dennis Heimbigner Unidata Go-ESSP Workshop Seattle, WA, Sept
DAP4 James Gallagher & Ethan Davis OPeNDAP and Unidata.
IOOS Modeling Testbed Cyberinfrastructure Rich Signell, USGS, Woods Hole, MA IOOS-RA-Briefing, Feb 14, 2012.
Unidata TDS Workshop THREDDS Data Server Overview
Accessing Remote Datasets using the DAP protocol through the netCDF interface. Dr. Dennis Heimbigner Unidata netCDF Workshop August 3-4, 2009.
Recent developments with the THREDDS Data Server (TDS) and related Tools: covering TDS, NCML, WCS, forecast aggregation and not including stuff covered.
NetCDF Data Model Issues Russ Rew, UCAR Unidata NetCDF 2010 Workshop
Unidata’s Common Data Model and the THREDDS Data Server John Caron Unidata/UCAR, Boulder CO Jan 6, 2006 ESIP Winter 2006.
IOOS Data Services with the THREDDS Data Server Rich Signell USGS, Woods Hole IOOS DMAC Workshop Silver Spring Sep 10, 2013 Rich Signell USGS, Woods Hole.
Unidata’s TDS Workshop TDS Overview – Part I July 2011.
The HDF Group Data Interoperability The HDF Group Staff Sep , 2010HDF/HDF-EOS Workshop XIV1.
Unidata’s Common Data Model and NetCDF Java Library API Overview John Caron Unidata/UCAR Nov 2008.
The HDF Group Introduction to netCDF-4 Elena Pourmal The HDF Group 110/17/2015.
Unidata's Involvement in Developing and Supporting Climate Science Infrastructure Russ Rew UCAR Unidata April 2010.
NetCDF-4: Software Implementing an Enhanced Data Model for the Geosciences Russ Rew, Ed Hartnett, and John Caron UCAR Unidata Program, Boulder
NetCDF and Scientific Data Durability Russ Rew, UCAR Unidata ESIP Federation Summer Meeting
Data File Formats: netCDF by Tom Whittaker University of Wisconsin-Madison SSEC/CIMSS 2009 MUG Meeting June, 2009.
Advances in the NetCDF Data Model, Format, and Software Russ Rew Coauthors: John Caron, Ed Hartnett, Dennis Heimbigner UCAR Unidata December 2010.
GIS for Atmospheric Sciences and Hydrology By David R. Maidment University of Texas at Austin National Center for Atmospheric Research, 6 July 2005.
Weathertop Consulting, LLC Server-side OPeNDAP Analysis – Concrete steps toward a generalized framework via a reference implementation using F-TDS Roland.
Common Data Model Scientific Feature Types John Caron UCAR/Unidata July 8, 2008.
Unidata Technologies Relevant to GO-ESSP: An Update Russ Rew
Rich Signell Roland Viger Curtis Price USGS Community for Data Integration Feb 15, 2012.
NetCDF: Data Model, Programming Interfaces, Conventions and Format Adapted from Presentations by Russ Rew Unidata Program Center University Corporation.
Update on Unidata Technologies for Data Access Russ Rew
Unidata Infrastructure for Data Services Russ Rew GO-ESSP Workshop, LLNL
NetCDF-Java version 2.2 Common Data Model John Caron Unidata/UCAR Dec 10, 2004.
The CUAHSI Hydrologic Information System Spatial Data Publication Platform David Tarboton, Jeff Horsburgh, David Maidment, Dan Ames, Jon Goodall, Richard.
Data Are from Mars, Tools Are from Venus
Moving from HDF4 to HDF5/netCDF-4
Efficiently serving HDF5 via OPeNDAP
Recent Work in Progress
HDF-EOS Workshop XXI / The 2018 ESIP Summer Meeting
NCL variable based on a netCDF variable model
OPeNDAP/Hyrax Interfaces
Adapting an existing web server to S3
Presentation transcript:

What is NetCDF ? And what are its plans for world domination? John Caron Unidata August 2009

NetCDF is…. A file format A library An Application Programmer’s Interface (API) A data model A dessert topping A floor wax

In the beginning Application IDL Ruby Matlab C++ Fortran Perl C Python API Ruby API Matlab API C++ API Fortran API Perl API C Python API netCDF C library / API netCDF file

Things got more complicated Application Java Python API Ruby API Matlab API C++ API Fortran API Perl API C netCDF C library / API netCDF Java library / API netCDF-4 file OPeNDAP data netCDF-3 file

Scientific Feature Types But wait, there’s more! Scientific Feature Types Application Datatype Adapter NetCDF-Java/ CDM architecture NetcdfDataset CoordSystem Builder OPeNDAP NetCDF-3 HDF4 I/O service provider GRIB GINI NIDS NetcdfFile NetCDF-4 … Nexrad DMSP THREDDS Catalog.xml NcML

Netcdf-Java 4.0 File Formats General: NetCDF-3, NetCDF-4, HDF5, HDF4, OPeNDAP Gridded: GRIB-1, GRIB-2, GEMPAK, McIDAS, UAMIV CAMx Point: BUFR, GEMPAK Radar: NEXRAD 2&3, DORADE, CINRAD, UF Satellite: DMSP, GINI, McIDAS, FYSAT Misc: GTOPO, NLDN, USPLN, etc Diversity of formats:

What is netCDF ? Application Java Python Ruby Matlab C++ Fortran Perl API Ruby API Matlab API C++ API Fortran API Perl API C netCDF C library netCDF Java library netCDF-4 file OPeNDAP NetCDF-3 HDF4 I/O service provider GINI NetcdfFile NetCDF-4 … Nexrad DMSP GRIB NIDS netCDF-3 file OPeNDAP data

NetCDF is a… API File format Software library Store data model objects Persistence layer NetCDF-3, netCDF-4 Software library Implements the API C, Java, others API An API is the interface to the Data Model for a specific programming language An Abstract Data Model describes data objects and what methods you can use on them

NetCDF is a… File format Stores the objects in the data model Persistence layer NetCDF-3, netCDF-4 9

What you should know about Storage Formats Locality, locality, locality I/O cost is measured in # disk accesses Entire block is read at once Sequential access is 100x faster than random Many factors that affect this Local disk, NFS mounted (shared), server RAID The disk is caching sectors The File System / OS is caching pages Library may be caching data Applications can try to optimize file layout write, read, common access patterns Only matters for large I/O-bound apps

NetCDF-3 file format Header Variable 1 float var1(z, y, x) Non-record Variable Variable 1 float var1(z, y, x) Row-major order Variable 2 Variable 3 … Record Variables float rvar2(0, z, y, x) float rvar3(0, z, y, x) float rvar1(0, z, y, x) Record 0 Record 1 float rvar2(1, z, y, x) float rvar3(1, z, y, x) float rvar1(1, z, y, x) unlimited…

NetCDF-4 file format Built on HDF-5 Much more complicated than netCDF-3 Storage efficiency Compression : can optimize chunking for common I/O pattern Compound types

Row vs Column storage Netcdf-3 is a column store All data for one variable is stored together Traditional RDBMS is a row store All fields for one row in a table are stored together Netcdf-4 allows both row and column store Row: compound type Column: regular variable Recent RDBMS research focusing on possible advantages with column oriented storage

NetCDF is a… Software library Implements the API C, Java, third-party 14

NetCDF Libraries NetCDF C library – reference implementation Read/write netCDF-3 and netCDF-4 Read OPeNDAP (alpha) NetCDF Java Library – exploratory 100% Java == portable Read netCDF-3, netCDF-4, OPeNDAP, many others Only writes netCDF-3 (considering a JNI interface to C library for writing netCDF-4) Thread safe, good for servers, used by the THREDDS Data Server (TDS)

What you should know about Multicore CPUs Commodity CPU’s wont get faster – too hot! Lifecycle cost dominated by electricity $$$ Moores Law -> multiple CPUs on chip Multithreaded programs can take advantage of new multicore computer architecture Good for servers, harder for client programs to take advantage of this New languages (eventually)

NetCDF is a… API An API is the interface to the Data Model for a specific programming language An Abstract Data Model describes data objects and what methods you can use on them 17

NetCDF APIs Application Programmers Interface Its what you have to deal with Changing this breaks your code Lots of language bindings, same data model An API is the interface to the Data Model for a specific programming language

NetCDF-3 data model Multidimensional arrays of primitive values byte, char, short, int, float, double Key/value attributes Shared dimensions Fortran77

NetCDF-4 Data Model

NetCDF, HDF5, OPeNDAP Data Models Shared dimensions NetCDF (classic) NetCDF (extended) NetCDF (classic) OPeNDAP HDF5

Gridded Data Cartesian coordinates Data is 2,3,4D All dimensions have 1D coordinate variables (separable) float gridData(t,z,y,x); float t(t); float y(y); float x(x); float z(z); netCDF: coordinate variables OPeNDAP: grid map variables HDF: dimension scales

Swath track and cross-track not separate time dimension two dimensional track and cross-track not separate time dimension aka curvilinear coordinates float swathData( track, xtrack) float lat(track, xtrack) float lon(track, xtrack) float alt(track, xtrack) float time(track)

Point Observation Data Set of measurements at the same point in space and time = obs Collection of obs = dataset Sample dimension not connected float obs1(sample); float obs2(sample); float lat(sample); float lon(sample); float z(sample); float time(sample);

Shared Dimensions Status netCDF Shared dimension plus conventions is general solution for coordinates :coordinates = “lat lon alt time” OPeNDAP No shared dimensions in current data model Shared dimensions will be added to DAP-4 HDF5 HDF-EOS added shared dimensions in metadata NetCDF-4 adds a workaround NetCDF-4 not a subset of HDF-5 NetCDF-4 does not (yet) read all HDF-5 objects HDF-5 not a subset of NetCDF-4

Back to API / Data Models Scientific Feature Types Application Datatype Adapter NetCDF-Java/ CDM architecture NetcdfDataset CoordSystem Builder Data Access OPeNDAP NetCDF-3 HDF4 I/O service provider GRIB GINI NIDS NetcdfFile NetCDF-4 … Nexrad DMSP THREDDS Catalog.xml NcML

NetCDF “Index Space” Data Access: OPeNDAP URL: http://motherlode.ucar.edu:8080/thredds/dodsC/ NAM_CONUS_80km_20081028_1200.grib1.ascii? Precipitable_water[5][5:1:30][0:1:77] “Coordinate Space” Data Access: NCSS URL: http://motherlode.ucar.edu:8080/thredds/ncss/grid/ NAM_CONUS_80km_20081028_1200.grib1? var=Precipitable_water& time=2008-10-28T12:00:00Z& north=40&south=22&west=-110&east=-80

Scientific Feature Types Coordinate Space Access Scientific Feature Types Application Datatype Adapter NcML NetCDF-Java/ CDM architecture NetcdfDataset CoordSystem Builder Index Space Access OPeNDAP NetCDF-3 HDF4 I/O service provider GRIB GINI NIDS NetcdfFile NetCDF-4 … Nexrad DMSP NcML 28

Coordinate System UML

Netcdf-Java Library parses these Conventions CF Conventions (preferred) COARDS, NCAR-CSM, ATD-Radar, Zebra, GEIF, IRIDL, NUWG, AWIPS, WRF, M3IO, IFPS, ADAS/ARPS, MADIS, Epic, RAF-Nimbus, NSSL National Reflectivity Mosaic, FslWindProfiler, Modis Satellite, Avhrr Satellite, Cosmic, …. Write your own CoordSysBuilder Java class

Projections (CF) albers_conical_equal_area lambert_azimuthal_equal_area lambert_conformal_conic mcidas_area mercator orthographic rotated_pole stereographic (including polar) transverse_mercator UTM (ellipsoidal) vertical_perspective

Vertical Transforms (CF) atmosphere_sigma atmosphere_hybrid_sigma_pressure atmosphere_hybrid_height ocean_s ocean_sigma existing3DField

Add your own Transform Pluggable framework Add at runtime CoordTransBuilder.registerTransform() Implement CoordTransBuilderIF

Coordinate Systems Summary How? Write your own Java code, plug into CDM Write your files using CF Conventions Why? Standard visualization, debugging, and data manipulation tools Standard servers to make your data remotely accessible

Payoff

NetCDF-Java library Used as a component in other software (partial) Integrated Data Viewer, ToolsUI (Unidata) Panoply (NASA) ncBrowse (EPIC/NOAA) Java NEXRAD Viewer (NCDC/NOAA) MyWorld GIS (Northwestern) EDC for ArcGIS, ERRDAP (SFSC/NOAA) Live Access Server (PMEL/NOAA) ncWMS (Reading) Matlab plug-in (USGS)

THREDDS Data Server catalog.xml configCatalog.xml THREDDS Server Servlet Container catalog.xml Remote Access Client THREDDS Server WCS OPeNDAP HTTPServer WMS NetCDF-Java library configCatalog.xml Datasets IDD Data motherlode.ucar.edu

THREDDS Data Server (TDS) Web server for scientific data From Unidata Provides remote data access OPeNDAP Open Geospatial Consortium (OGC) WMS and WCS HTTP file transfer Experimental data access protocols.

OGC Web Map Service Jon Blower’s (Reading, UK) ncWMS integrated with TDS Coordinate Space subsetting Produces JPEG output Fast generation of images Reproject images into large number of coordinate systems

WMS Clients NASA World Wind Cadcorp SIS Google Earth 3rd-party clients can’t use the custom WMS extensions

Web Coverage Service Coordinate Space subsetting Return formats GeoTIFF floating point, grayscale NetCDF/CF No reprojections, resamplings Restricted to CDM files that have Grid coordinate system evenly spaced x,y

NetCDF Markup Language (NcML) XML representation of netCDF metadata (like ncdump -h) Create new netCDF files (like ncgen) Modify (“fix”) existing datasets without rewriting them Create virtual datasets as aggregations of multiple existing files. Integrated with the TDS Another component of the netCDF java library is support for the netCDF Markup Language. NcML is an XML representation of netCDF metadata. It can be used to create new netCDF files as well as modify an existing dataset and even aggregate a set of existing datasets.

NcML Modify and serve through TDS <dataset name=“Polar Orbiter Data" urlPath =“idd/sat/PolarData“ > <netcdf location="/data/sat/P02393.hdf”> <attribute name="Conventions" value="CF-1.4"/> <variable name="Reflectivity" orgName=“R34768”> <attribute name="units" value=“dBZ" /> <attribute name=“coordinates" value=“time lat lon" /> </variable> </netcdf> </dataset>

TDS / NcML Modify all files in datasetScan <datasetScan name=“Polar Orbiter" path="/data/sat/" location= "/data/hdf/polar/"> <netcdf> <attribute name="Conventions" value="CF-1.4"/> <variable name="Reflectivity" orgName=“R34768”> <attribute name="units" value=“dBZ" /> <attribute name=“coordinates" value=“time lat lon" /> </variable> </netcdf> </datasetScan>

TDS / NcML aggregation <dataset name="WEST-CONUS_4km Aggregation" urlPath="satellite/3.9/WEST-CONUS_4km"> <netcdf> <aggregation dimName="time" type="joinExisting"> <scan location="/data/satellite/WEST-CONUS_4km/" suffix=".gini" /> </aggregation> </netcdf> </dataset>

Conclusions NetCDF is a floor wax and a dessert topping A data model is a good way to see the forest through the trees We now have a useable merger of netCDF, OPeNDAP, HDF5 technologies Add Coordinate information to allow “coordinate space subsetting” NcML/TDS can help But the right way to do this is….

Conclusion I will use CF Conventions