Unidata’s Common Data Model and the THREDDS Data Server
  John Caron
  Unidata/UCAR, Boulder CO
  Jan 6, 2006
  ESIP Winter 2006
Outline
  Definitions
  Creating a Common Data (Access) Model from NetCDF, HDF5, OPeNDAP
  CDM Coordinate Systems, Data Types
  CDM implementation
  NetCDF Markup Language (NcML)
  The THREDDS Data Server
NetCDF-3
  Machine- and OS-independent file format for “self-describing” scientific data
  C library (Fortran, C++, Perl, IDL, MATLAB, Python, Ruby), Java library
  Efficient subsetting of multidimensional arrays
  > 20,000 downloads last year
HDF5
  Machine- and OS-independent file format for “self-describing” scientific data
  C library (Fortran, Java, PyTables)
  Evolution from HDF4, but different
  HDF-EOS, HDF5-EOS, standard formats for EOSDIS, ASCI, NPOESS
  Parallel I/O, chunked storage, compression filters, many data types
  Developed at NCSA, now independent
NetCDF-4
  Project funded by NASA to create a new version of netCDF using the HDF5 file format
  “Extend and merge” netCDF and HDF5
    –Widespread use and simplicity of netCDF
    –Generality and performance of HDF5
NetCDF-Java 2.2 (nj22)
  100% Java library
  Prototype implementation of CDM
  File formats:
    –General: NetCDF, HDF5, OPeNDAP
    –Grids: GRIB1, GRIB2
    –Radar: NEXRAD, NIDS, DORADE
    –Satellite: DMSP, GINI
  Access to THREDDS catalogs
OPeNDAP
  Client-server protocol for scientific data access
  C++ client and server, Java client and server libraries
  Current version 2.0; NASA ESE standard
  Working on new 4.0 protocol spec
THREDDS
  Originally funded by NSDL
    –“discovery and use of scientific data”
    –Middleware between data providers and users
    –Dataset Inventory Catalogs (XML)
  Now part of Unidata core funding
    –Data Serving (pull)
What’s a Data Model?
  It’s about scientific data: storing, accessing
  It’s an abstraction
  Equivalent to an abstract object model in OOP
  An Abstract Data Model describes data objects and what methods you can use on them
What’s a Data Model?
  An API is the interface to the Data Model for a specific programming language
  A file format is a way to persist the objects in the Data Model
  A data access protocol plays the role of a file format
  The Abstract Data Model removes the details of any particular API and the persistence format
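The split between abstract data model, API, and persistence format can be pictured with a small Java sketch. Every name in it (DataModelSketch, Variable, Store, InMemoryStore, TextStore) is hypothetical and chosen only for illustration; none of them comes from netCDF-Java or the CDM. The point is only that client code written against the interface does not change when the persistence layer does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DataModelSketch {

    /** Abstract data object: a named 1-D array of doubles (hypothetical). */
    static final class Variable {
        final String name;
        final double[] values;

        Variable(String name, double[] values) {
            this.name = name;
            this.values = values.clone();
        }
    }

    /** The "API": what a client may do, independent of how data is persisted. */
    interface Store {
        void write(Variable v);
        Variable read(String name);
    }

    /** Persistence choice #1: keep the objects in memory. */
    static final class InMemoryStore implements Store {
        private final Map<String, Variable> vars = new LinkedHashMap<>();

        public void write(Variable v) { vars.put(v.name, v); }
        public Variable read(String name) { return vars.get(name); }
    }

    /** Persistence choice #2: a text encoding, standing in for a file format or protocol. */
    static final class TextStore implements Store {
        private final Map<String, String> encoded = new LinkedHashMap<>();

        public void write(Variable v) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < v.values.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(v.values[i]);
            }
            encoded.put(v.name, sb.toString());
        }

        public Variable read(String name) {
            String s = encoded.get(name);
            if (s == null) return null;
            if (s.isEmpty()) return new Variable(name, new double[0]);
            String[] parts = s.split(",");
            double[] out = new double[parts.length];
            for (int i = 0; i < parts.length; i++) out[i] = Double.parseDouble(parts[i]);
            return new Variable(name, out);
        }
    }

    /** Client code written against the model only; it never sees the encoding. */
    static double sum(Store store, String name) {
        double total = 0;
        for (double d : store.read(name).values) total += d;
        return total;
    }

    public static void main(String[] args) {
        Variable t = new Variable("temperature", new double[] {1.5, 2.5, 3.0});
        Store a = new InMemoryStore();
        Store b = new TextStore();
        a.write(t);
        b.write(t);
        // Same client call, two different persistence mechanisms.
        System.out.println(sum(a, "temperature") + " " + sum(b, "temperature"));
    }
}
```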
Creating a Common Data Access Model from NetCDF, HDF5, OPeNDAP
NetCDF-3 Data Model
OPeNDAP Data Model (DAP-2)
HDF5 Data Model
Common Data (Access) Model
Coordinate Systems and Scientific Data Types
Common Data Model Layers
  Data Access
  Coordinate Systems
  Scientific Datatypes: Grid, Point, Radial, Trajectory, Swath, Station
Coordinate Systems needed
  NetCDF, OPeNDAP, HDF data models do not have integrated coordinate systems
    –so georeferencing is not part of the API
    –Need conventions to specify (e.g. CF-1, COARDS, etc.)
  Contrast GRIB, HDF-EOS, other specialized formats
  Must be done in a general way
Same underlying mathematics as VisAD, ASCII Coordinate Systems
Scientific DataTypes
  Based on datasets Unidata is familiar with
    –APIs are evolving
  How are data points connected?
  Intended to scale to large, multifile collections
  Intended to support “specialized queries”
    –Space, Time
  Corresponding “standard” NetCDF file conventions
Point Observation Data
PointObsDataset Methods
  // Collection of StructureData
  Collection getData(LatLonRect boundingBox, Date start, Date end);
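A space/time query of this shape amounts to filtering observations by a bounding box and a time window. The sketch below is a minimal illustration under stated assumptions: Obs and Box are hypothetical stand-ins for StructureData and LatLonRect, the box is assumed not to cross the dateline, and both time bounds are treated as inclusive. It is not the netCDF-Java implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

final class PointQuerySketch {

    /** Hypothetical stand-in for one observation record (not the CDM StructureData class). */
    static final class Obs {
        final double lat;
        final double lon;
        final Date time;
        final double value;

        Obs(double lat, double lon, Date time, double value) {
            this.lat = lat;
            this.lon = lon;
            this.time = time;
            this.value = value;
        }
    }

    /** Hypothetical lat/lon box; assumes the box does not cross the dateline. */
    static final class Box {
        final double latMin;
        final double latMax;
        final double lonMin;
        final double lonMax;

        Box(double latMin, double latMax, double lonMin, double lonMax) {
            this.latMin = latMin;
            this.latMax = latMax;
            this.lonMin = lonMin;
            this.lonMax = lonMax;
        }

        boolean contains(double lat, double lon) {
            return lat >= latMin && lat <= latMax && lon >= lonMin && lon <= lonMax;
        }
    }

    /** Returns the observations inside the box with start <= time <= end. */
    static List<Obs> getData(List<Obs> all, Box box, Date start, Date end) {
        List<Obs> hits = new ArrayList<>();
        for (Obs o : all) {
            boolean inTime = !o.time.before(start) && !o.time.after(end);
            if (inTime && box.contains(o.lat, o.lon)) {
                hits.add(o);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Obs> all = Arrays.asList(
            new Obs(40.0, -105.3, new Date(1000L), 271.5),  // inside box and time window
            new Obs(40.0, -105.3, new Date(9000L), 272.0),  // inside box, after the window
            new Obs(10.0, 20.0, new Date(1000L), 300.1));   // in the window, outside box
        Box box = new Box(37.0, 41.0, -109.0, -102.0);
        List<Obs> hits = getData(all, box, new Date(0L), new Date(5000L));
        System.out.println(hits.size() + " observation(s) matched");
    }
}
```

A linear scan is enough to show the semantics. A collection meant to scale to large, multifile datasets would presumably need a spatial or temporal index, which this sketch does not attempt.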
Trajectory Data
TrajectoryObs Methods
  int getNumPoints();
  StructureData getData(int point);
Station Data
StationObs Methods
  // return List of Station
  List getStations();
  // return List of StructureData
  List getData(Station s, Date start, Date end);
Radial Data
Radial methods
  interface Radial {
    int getNumGates();
    float getData(int gate);
    float getStartingGate();
    float getGateSize();
    float getElevation();
    float getAzimuth();
    double getTime();
  }
Gridded Data
Grid methods
  interface GridCoordSys {
    CoordinateAxis getTaxis();
    CoordinateAxis getXaxis();
    CoordinateAxis getYaxis();
    CoordinateAxis getZaxis();
    Projection getProjection();
  }
  Array getDataCube(Range time, Range z, Range y, Range x);
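Index-space subsetting of a 4-D grid, in the spirit of getDataCube, can be sketched as follows. The assumptions are loud ones: the Range class here is a hypothetical inclusive first..last range with stride 1 (not the library's Range class), the data is a flat array in row-major (time, z, y, x) order, and the result is returned flat in the same order. The real method returns an Array and its details may differ.

```java
import java.util.Arrays;

final class GridSubsetSketch {

    /** Hypothetical inclusive index range with stride 1. */
    static final class Range {
        final int first;
        final int last;

        Range(int first, int last) {
            if (first < 0 || last < first) {
                throw new IllegalArgumentException("need 0 <= first <= last");
            }
            this.first = first;
            this.last = last;
        }

        int length() { return last - first + 1; }
    }

    /**
     * Copies the requested hyperslab out of a flat row-major array whose
     * shape is {nt, nz, ny, nx}. The result is row-major as well.
     */
    static double[] getDataCube(double[] data, int[] shape,
                                Range time, Range z, Range y, Range x) {
        if (time.last >= shape[0] || z.last >= shape[1]
                || y.last >= shape[2] || x.last >= shape[3]) {
            throw new IllegalArgumentException("range exceeds grid shape");
        }
        double[] out = new double[time.length() * z.length() * y.length() * x.length()];
        int k = 0;
        for (int it = time.first; it <= time.last; it++) {
            for (int iz = z.first; iz <= z.last; iz++) {
                for (int iy = y.first; iy <= y.last; iy++) {
                    for (int ix = x.first; ix <= x.last; ix++) {
                        // Row-major offset of element (it, iz, iy, ix).
                        int idx = ((it * shape[1] + iz) * shape[2] + iy) * shape[3] + ix;
                        out[k++] = data[idx];
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] shape = {2, 2, 3, 4};                   // nt, nz, ny, nx
        double[] data = new double[2 * 2 * 3 * 4];
        for (int i = 0; i < data.length; i++) data[i] = i;  // value == flat index

        double[] cube = getDataCube(data, shape,
            new Range(1, 1), new Range(0, 0), new Range(1, 2), new Range(2, 3));
        System.out.println(Arrays.toString(cube));    // prints [30.0, 31.0, 34.0, 35.0]
    }
}
```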
Image/Swath
Standardizing NetCDF Formats
  Grid: CF-1 Convention
    –Need improvements for regional models (WRF), GIS info
  Radar: “Radar Exchange Format”
    –With radar community (led by NCAR ATD)
  Point Observations
    –Unidata Observation Dataset Conventions
CDM implementations: NetCDF-4 and NetCDF-Java 2.2
NetCDF-4 C Library
  Diagram labels: netCDF-3 Interface; netCDF-4 Library; HDF5 Library
NetCDF-4 Status
  4.0 Beta implements CDM access layer
    –complete, but waiting for HDF5 release 1.8 to finalize file format
  4.1: adding Coordinate Systems
  4.?: merge OPeNDAP access (pending funding)
NetCDF-Java 2.2 (nj22)
  Prototype implementation of CDM
  File formats:
    –General: NetCDF, HDF5, OPeNDAP
    –Grids: GRIB1, GRIB2
    –Radar: NEXRAD, NIDS, DORADE
    –Satellite: DMSP, GINI
  Access to THREDDS catalogs
  Implements NcML
Common Data Model
  Data Access
  Coordinate Systems
  Scientific Datatypes: Grid, Point, Radial, Trajectory, Swath, Station
NetCDF-Java version 2.2 architecture
  Diagram labels: Application; Scientific Datatypes; Datatype Adapter; NetcdfDataset; CoordSystem Builder; NetcdfFile; I/O service provider; NetCDF-3; NetCDF-4; HDF5; GRIB; GINI; NIDS; Nexrad; DMSP; …; OPeNDAP; ADDE; THREDDS; Catalog.xml
NetCDF-Java 2.2 Status
  Data Access layer: Beta quality
    –also waiting for HDF5 release to finish NetCDF-4, commit to API
  Coordinate Systems: early Beta
    –Finishing docs, runtime pluggability
  Data Types: Alpha, still experimenting with APIs
NetCDF Markup Language (NcML)
  XML representation of netCDF metadata (like ncdump -h)
  Create new netCDF files (like ncgen)
  Modify existing datasets
    –Add/delete/rename
    –Create logical sections of existing variables
  Create unions and aggregations of multiple existing datasets
NcML example
  <netcdf xmlns=" location="/data/nids/N0R_ _2147">
NcML Aggregation
  Union
  Join Existing
  Join New
  Forecast Model Run
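The difference between two of these aggregation types can be shown on plain arrays. The sketch below assumes the usual reading of the names: Join Existing concatenates datasets along a dimension they already share (time, say), while Join New stacks same-shaped datasets along a new outer dimension. The class and method names are hypothetical, and Union and Forecast Model Run aggregation are not covered.

```java
import java.util.Arrays;

final class AggregationSketch {

    /** Join Existing: concatenate along the existing outer dimension (e.g. time). */
    static double[][] joinExisting(double[][]... parts) {
        int rows = 0;
        for (double[][] p : parts) rows += p.length;
        double[][] out = new double[rows][];
        int r = 0;
        for (double[][] p : parts) {
            for (double[] row : p) out[r++] = row.clone();
        }
        return out;
    }

    /** Join New: stack same-shaped arrays along a new outer dimension. */
    static double[][] joinNew(double[]... parts) {
        double[][] out = new double[parts.length][];
        for (int i = 0; i < parts.length; i++) out[i] = parts[i].clone();
        return out;
    }

    public static void main(String[] args) {
        double[][] file1 = {{1, 2}, {3, 4}};  // shape (time=2, x=2)
        double[][] file2 = {{5, 6}};          // shape (time=1, x=2)

        // Result has shape (time=3, x=2): the time dimension grows.
        System.out.println(Arrays.deepToString(joinExisting(file1, file2)));

        // Two arrays of shape (x=2) become one of shape (new=2, x=2).
        System.out.println(Arrays.deepToString(
            joinNew(new double[] {1, 2}, new double[] {3, 4})));
    }
}
```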
NcML Aggregation Example
THREDDS Data Server
  Integrates data access with THREDDS catalogs and services
  Tomcat/Servlet, 100% Java, single war file
  Data input is the NetCDF-Java 2.2 library
  Data output:
    –OPeNDAP
    –HTTP Server
    –OGC Web Coverage Server (gridded)
THREDDS Data Server
  Diagram labels: HTTP Tomcat Server; hostname.edu; THREDDS Server; Catalog.xml; Datasets; IDD Data; NetCDF-Java library; OPeNDAP; HTTPServer; WCS; Application
TDS as WCS Gateway
  Diagram labels: HTTP Tomcat Server; hostname.edu; THREDDS Server; Catalog.xml; NetCDF-Java library; OPeNDAP; HTTPServer; WCS; Application; OPeNDAP Server; anotherHost.org
TDS and NcML
  Diagram labels: HTTP Tomcat Server; hostname.edu; THREDDS Server; Catalog.xml; Netcdf-Java; Datasets; NcML; OPeNDAP; WCS; Application
TDS and NcML
  Server serves the dataset “wrapped” by the NcML
    –Client sees OPeNDAP or WCS, not NcML
  Can “fix” metadata problems
  Can augment metadata
  Use NcML aggregation on the TDS
    –replaces the old “Aggregation Server”
TDS and Digital Libraries
  Diagram labels: HTTP Tomcat Server; otherhost.gov; THREDDS Server; Catalog.xml; Datasets; NetCDF-Java library; OPeNDAP; HTTPServer; WCS; Application; OPeNDAP Server; hostname.edu; OAI Harvester; DL Records
TDS and Digital Libraries
  Framework to add metadata
    –By hand (collection level)
    –Automatic extraction from datasets
  Send records to existing DLs
    –No search
  Both collection and inventory level
Future Plans
  NetCDF-Java
    –Get APIs stable, docs, runtime pluggability
    –NetCDF-4 (!)
    –HDF4, HDF-EOS, BUFR (need funding)
  NetCDF-4 C Library
    –DataTypes too immature to port
    –NcML?
    –Java on the server
TDS Future Plans
  Aggregation
    –Driven by IDD data (motherlode)
  Pluggable Authorization
    –access control by dataset
  Performance
  Services
    –Coordinate System Verifier (e.g. CF-1)
    –Data access
    –Subset and get netCDF file
Conclusion
  N + M instead of N * M things on your TODO List!
  Diagram labels: File Format #1; File Format #2; File Format #N; CDM; Visualization & Analysis; NetCDF file; OPeNDAP Server; WCS Service
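The N + M versus N * M claim is simple arithmetic: with N input formats and M consumers, wiring each pair directly takes N * M adapters, whereas going through one common model takes N readers plus M writers. The sketch below works that out. The class and method names are hypothetical, and the counts in main (10 formats, 4 consumers) are only an illustration taken from the ten formats listed for nj22 and the four outputs named on this slide.

```java
final class AdapterCountSketch {

    /** Every format wired directly to every consumer. */
    static int pairwise(int formats, int consumers) {
        return formats * consumers;
    }

    /** Every format read into one common model, every consumer fed from it. */
    static int viaCommonModel(int formats, int consumers) {
        return formats + consumers;
    }

    public static void main(String[] args) {
        int n = 10;  // illustrative: file formats and protocols listed for nj22
        int m = 4;   // illustrative: visualization/analysis, netCDF file, OPeNDAP, WCS
        System.out.println("pairwise adapters: " + pairwise(n, m));        // 40
        System.out.println("via common model:  " + viaCommonModel(n, m));  // 14
    }
}
```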