THREDDS Data Server
Unidata's Common Data Model
Background / Summary
John Caron, Unidata/UCAR
Mar 2007
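Figure: THREDDS Data Server on motherlode.ucar.edu. A THREDDS server application, built on the NetCDF-Java library and running in a Tomcat HTTP server, is configured by catalog.xml, reads datasets (including IDD data), and exposes them through the HTTPServer, NetcdfSubset, WCS and OPeNDAP services.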
THREDDS Catalogs
XML over HTTP
Hierarchical listing of online resources (datasets)
Container for arbitrary search metadata
– Standard set maps to Dublin Core (DC), GCMD, ADN
– Unidata/CDP
Metadata can be inherited
Design goal: make it easy for data providers
The TDS uses catalogs for its configuration
– Client view vs. server view
Data access URLs
– "Crossing the protocol boundary"
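Metadata inheritance is what lets a data provider state something once on a parent dataset instead of repeating it on every nested dataset. The Java sketch below models that lookup rule under stated assumptions: `CatalogNode`, `putInherited`, `putLocal` and `lookup` are hypothetical names, not the THREDDS catalog API, and the rule assumed is "local value first, then the nearest inheritable value up the tree".

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of a catalog dataset node (not the THREDDS API).
 * Inheritable metadata on a node applies to that node and all of its
 * descendants unless a descendant overrides it locally.
 */
final class CatalogNode {
    private final CatalogNode parent; // null for the top-level dataset
    private final Map<String, String> inheritable = new HashMap<>();
    private final Map<String, String> local = new HashMap<>();

    CatalogNode(CatalogNode parent) {
        this.parent = parent;
    }

    /** Metadata that is passed down to nested datasets. */
    CatalogNode putInherited(String key, String value) {
        inheritable.put(key, value);
        return this;
    }

    /** Metadata that applies to this node only. */
    CatalogNode putLocal(String key, String value) {
        local.put(key, value);
        return this;
    }

    /** Local value first, then the nearest inheritable value up the tree. */
    String lookup(String key) {
        if (local.containsKey(key)) return local.get(key);
        for (CatalogNode n = this; n != null; n = n.parent) {
            if (n.inheritable.containsKey(key)) return n.inheritable.get(key);
        }
        return null; // not defined anywhere on the path to the root
    }
}
```

For example, a top-level dataset can declare an inheritable `dataType` once; every nested dataset resolves it without repeating it, while a single nested dataset can still override it locally.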
catalog.xml
Motherlode catalog example
THREDDS WCS 1.0 Server
Each (gridded) dataset is served as a WCS
Each grid is a coverage
Return formats:
– GeoTIFF: floating point, greyscale
– NetCDF / CF-1.0 (same as the NetcdfSubset service)
No reprojection or resampling
GALEON 2:
– Upgrade to WCS 1.1
– Try returning point datasets
THREDDS OPeNDAP Server
Current protocol version is 2.0, a NASA ESE standard
– Working on the new 4.0 protocol spec
Based on the Java-OPeNDAP library
– Shared development by Unidata and opendap.org
Any CDM dataset can be served
Server4 (Hyrax):
– Latest version of the opendap.org C++ library
– Uses the THREDDS catalog generation code
– THREDDS catalogs replace dods_dir
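Figure: the same THREDDS Data Server architecture (on hostname.edu), now labeled with the Common Data Model: Tomcat HTTP server, THREDDS server application, NetCDF-Java library, catalog.xml, datasets and IDD data, and the HTTPServer, NetcdfSubset, WCS and OPeNDAP services. Caption: "Then a miracle happens".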
Figure: NetCDF-Java version 2.2 architecture. Components shown: Application; Scientific Datatypes; Datatype Adapter; NetcdfDataset; CoordSystem Builder; NcML; NetcdfFile; I/O service provider; file formats NetCDF-3, NetCDF-4, HDF5, GRIB, GINI, NIDS, Nexrad, DMSP, …; other sources OPeNDAP, THREDDS catalog.xml, ADDE.
I/O Service Provider Implementations
General: NetCDF, HDF5, OPeNDAP
Gridded: GRIB-1, GRIB-2
Radar: NEXRAD level 2 and 3, DORADE, Chinese NEXRAD
Point: BUFR, ASCII
Satellite: DMSP, GINI, McIDAS AREA
In development / tentative:
– NOAA CLASS legacy files
– Barrodale DataBlade
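Each I/O service provider (IOSP) first has to decide whether it can read a given file. The sketch below shows that dispatch idea in Java, with a registry that asks each provider in turn. `FormatProvider`, `MagicPrefixProvider` and `ProviderRegistry` are hypothetical simplifications; the real NetCDF-Java IOSP interface is richer, since it also opens files and reads data sections. The magic numbers are the standard leading bytes of netCDF-3 classic files (`CDF` + version byte 1) and HDF5 files (an 8-byte signature); netCDF-4 files are HDF5 files, so they match the HDF5 signature.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for an I/O service provider. */
interface FormatProvider {
    String formatName();

    /** Decide from the leading bytes of a file whether this provider can read it. */
    boolean canRead(byte[] header);
}

/** Provider that recognizes a file by a fixed magic-number prefix at byte 0. */
final class MagicPrefixProvider implements FormatProvider {
    private final String name;
    private final byte[] magic;

    MagicPrefixProvider(String name, byte[] magic) {
        this.name = name;
        this.magic = magic.clone();
    }

    public String formatName() { return name; }

    public boolean canRead(byte[] header) {
        return header.length >= magic.length
            && Arrays.equals(Arrays.copyOf(header, magic.length), magic);
    }
}

/** Registry: the first provider that claims the file wins. */
final class ProviderRegistry {
    private final List<FormatProvider> providers = new ArrayList<>();

    void register(FormatProvider p) { providers.add(p); }

    /** Returns the format name, or null if no provider recognizes the header. */
    String identify(byte[] header) {
        for (FormatProvider p : providers) {
            if (p.canRead(header)) return p.formatName();
        }
        return null;
    }

    static ProviderRegistry withDefaults() {
        ProviderRegistry r = new ProviderRegistry();
        // netCDF-3 classic; the 64-bit offset variant uses version byte 2 instead of 1
        r.register(new MagicPrefixProvider("NetCDF-3 classic", new byte[] {'C', 'D', 'F', 1}));
        // HDF5 signature: \211 H D F \r \n \032 \n
        r.register(new MagicPrefixProvider("HDF5",
            new byte[] {(byte) 0x89, 'H', 'D', 'F', '\r', '\n', 0x1A, '\n'}));
        r.register(new MagicPrefixProvider("GRIB", "GRIB".getBytes(StandardCharsets.US_ASCII)));
        return r;
    }
}
```

This only checks byte 0. HDF5 also allows the signature at later offsets (512, 1024, ...), and GRIB readers typically search for the `GRIB` marker rather than require it at the start, so a real provider's check is more forgiving.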
Common Data Model Layers
Scientific Datatypes: Grid, Point, Radial, Trajectory, Swath, StationProfile
Coordinate Systems
Data Access
NetCDF-4 and Common Data Model (Data Access Layer)
NetCDF-4 C library
4.0 Beta implements the CDM access layer
– Complete, but waiting for HDF5 release 1.8 to finalize the file format (maybe this month, 1.5 years late!)
– Persistence format for the complete CDM
4.1: adding coordinate systems
– Optional layer, focus on CF-1 (libcf)
4.?: merge OPeNDAP access (pending funding)
Coordinate Systems UML
NcML: NetCDF Markup Language
XML representation of netCDF metadata
Core: netCDF data access model
Coordinate System: general and georeferencing coordinate systems
Dataset: redefine, aggregate, subset
Contributors: Luca Cinquini (NCAR/SCD/ESG), John Caron, Ethan Davis, Bob Drach (LLNL), Stefano Nativi (Florence), Russ Rew
NcML
NcML Coordinate Systems further developed into NcML-G by Stefano Nativi et al.
NcML Core and Dataset combined into a single schema to allow dataset modification
Aggregation:
– Union
– Syntactic join on an (existing or new) outer dimension
– Semantic aggregation of (run time, forecast time) = Forecast Model Run Collection
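A join on an existing outer dimension makes several files look like one variable whose outer (usually time) dimension is the concatenation of each file's. The core bookkeeping is mapping a global index to a file and a local index. The Java sketch below shows that mapping; `OuterDimensionJoin` and its methods are hypothetical, not NcML or NetCDF-Java code, and it assumes every file contributes at least one index.

```java
import java.util.Arrays;

/**
 * Index bookkeeping for a join on an existing outer dimension: each file
 * contributes a consecutive run of outer-dimension indices.
 * Hypothetical helper, not NcML or NetCDF-Java code.
 */
final class OuterDimensionJoin {
    private final String[] files;
    private final int[] start; // start[i] = first global index stored in files[i]
    private final int total;

    OuterDimensionJoin(String[] files, int[] lengths) {
        if (files.length != lengths.length) {
            throw new IllegalArgumentException("one length per file is required");
        }
        this.files = files.clone();
        this.start = new int[files.length];
        int sum = 0;
        for (int i = 0; i < lengths.length; i++) {
            if (lengths[i] <= 0) throw new IllegalArgumentException("empty file " + files[i]);
            start[i] = sum;
            sum += lengths[i];
        }
        this.total = sum;
    }

    /** Length of the aggregated outer dimension. */
    int totalLength() { return total; }

    /** File that holds global index g. */
    String fileFor(int g) { return files[fileIndex(g)]; }

    /** Position of global index g inside its file. */
    int localIndex(int g) { return g - start[fileIndex(g)]; }

    private int fileIndex(int g) {
        if (g < 0 || g >= total) throw new IndexOutOfBoundsException("index " + g);
        int pos = Arrays.binarySearch(start, g);
        // Exact hit: g is the first index of that file. Otherwise binarySearch returns
        // -(insertionPoint) - 1, and g belongs to the file just before insertionPoint.
        return pos >= 0 ? pos : -pos - 2;
    }
}
```

For three files holding 3, 4 and 5 time steps, the aggregated dimension has length 12, and global index 5 maps to the second file at local index 2.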
NcML example
<netcdf xmlns=" location="/data/nids/N0R_ _2147">
TDS / NcML example
TDS / NcML aggregation
Datasets vs. Files
Must hide the actual location of data files on your server
Would like to hide the actual file format
Must encapsulate collections of files into logical datasets
– Homogeneous metadata
– Hide arbitrary storage decisions
– Minimize the number of datasets
Forecast Model Run Collection (FMRC)
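Data Model: Sampled Functions
Our phenomena are continuous functions $F: \text{Domain} \to \text{Range}$, where
– Domain = a subset of space-time (3 spatial dimensions plus time), $\text{Domain} \subseteq \mathbb{E}^4$
– Range = $\mathbb{R}^n$ (product set of real numbers)
Our measurements are sampled functions:
– The domain is a discrete point set $\{p_i\} \subset \mathbb{E}^4$
– $M: \mathbb{E}^4 \to \mathbb{R}^n$, known only at those points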
Variables
A variable is a container for an array of values:
  dimensions:
    lat = 64;
    lon = 128;
  variables:
    float temperature(lat, lon);
Its domain is a set of points in index space:
– $\text{temperature}: \{0,\dots,63\} \times \{0,\dots,127\} \to \mathbb{R}$
– $\text{temperature}: I^2 \to \mathbb{R}$
– In general, $\text{Variable}: I^m \to \mathbb{R}^n$
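Viewing a variable as a function on index space $I^m$ also describes how it is stored: netCDF lays arrays out in row-major order, with the last dimension varying fastest. The Java sketch below makes that index mapping explicit for an m-dimensional variable. `IndexSpaceVariable` is a hypothetical class for illustration, not a NetCDF-Java type.

```java
/**
 * A variable as a function from index space I^m to R, stored in a flat array
 * in row-major order (last dimension varies fastest, as in netCDF).
 * Hypothetical class for illustration.
 */
final class IndexSpaceVariable {
    private final int[] shape;
    private final float[] values;

    IndexSpaceVariable(int... shape) {
        this.shape = shape.clone();
        int size = 1;
        for (int n : shape) size *= n;
        this.values = new float[size];
    }

    /** Map an m-dimensional index to its position in the flat array. */
    int flatIndex(int... idx) {
        if (idx.length != shape.length) throw new IllegalArgumentException("rank mismatch");
        int pos = 0;
        for (int d = 0; d < shape.length; d++) {
            if (idx[d] < 0 || idx[d] >= shape[d]) throw new IndexOutOfBoundsException("dim " + d);
            pos = pos * shape[d] + idx[d];
        }
        return pos;
    }

    float get(int... idx) { return values[flatIndex(idx)]; }

    void set(float v, int... idx) { values[flatIndex(idx)] = v; }
}
```

For `temperature(lat, lon)` with lat = 64 and lon = 128, `new IndexSpaceVariable(64, 128).flatIndex(10, 20)` gives 10 * 128 + 20 = 1300, and the last element (63, 127) lands at 8191.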
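Coordinate Systems
A coordinate axis is a function $I^m \to \mathbb{R}$
A set of axes {Axis} forms a coordinate system $I^m \to \mathbb{E}^4$
Given a variable $V: I^m \to \mathbb{R}^n$ and its coordinate system $CS: I^m \to \mathbb{E}^4$:
– $V \circ CS^{-1}: \mathbb{E}^4 \to \mathbb{R}^n$
– $CS^{-1}$ is defined only on the points $CS$ actually reaches, and only where $CS$ is one-to-one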
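The composition $V \circ CS^{-1}$ can be made concrete in one dimension: the axis maps index to coordinate, its inverse maps a coordinate back to an index, and the variable maps that index to a value. The Java sketch below assumes a strictly increasing axis and resolves $CS^{-1}$ by picking the nearest grid point (interpolation would be another valid choice). `CoordinateLookup` and its methods are hypothetical.

```java
/**
 * One-dimensional sketch of V o CS^-1: a coordinate axis maps index to
 * coordinate, its inverse (nearest grid point) maps coordinate to index,
 * so a value can be looked up by coordinate.
 * Assumes a strictly increasing axis; all names are hypothetical.
 */
final class CoordinateLookup {
    private final double[] axis;   // CS: index -> coordinate
    private final double[] values; // V: index -> value

    CoordinateLookup(double[] axis, double[] values) {
        if (axis.length != values.length || axis.length == 0) {
            throw new IllegalArgumentException("axis and values need the same nonzero length");
        }
        for (int i = 1; i < axis.length; i++) {
            if (!(axis[i] > axis[i - 1])) throw new IllegalArgumentException("axis must increase");
        }
        this.axis = axis.clone();
        this.values = values.clone();
    }

    /** CS^-1: index of the grid point nearest to coordinate x (ties go to the lower index). */
    int nearestIndex(double x) {
        int lo = 0, hi = axis.length - 1;
        if (x <= axis[lo]) return lo;
        if (x >= axis[hi]) return hi;
        // Invariant: axis[lo] <= x < axis[hi]
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (axis[mid] <= x) lo = mid; else hi = mid;
        }
        return (x - axis[lo] <= axis[hi] - x) ? lo : hi;
    }

    /** V o CS^-1: value at the grid point nearest to coordinate x. */
    double valueAt(double x) { return values[nearestIndex(x)]; }
}
```

With axis {0, 10, 20, 30} and values {1, 2, 3, 4}, `valueAt(14)` returns 2.0 and `valueAt(16)` returns 3.0; coordinates outside the axis clamp to the end points.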
Scientific Data Types
Trying to go beyond index-space subsetting
Trying to satisfy $V \circ CS^{-1}: \mathbb{E}^4 \to \mathbb{R}^n$
– i.e. support subsetting using space and time "queries"
Based on datasets Unidata is familiar with
– APIs are evolving
Intended to scale to large, multi-file collections
Corresponding "standard" NetCDF file format conventions
Implementations
Datatypes: Grid, PointObs, RadialSweep, Swath
Datasets: GridDataset, FMRCDataset, CollectionOfPointObs, StationCollectionOfPointObs, StationCollectionOfRadialSweep
Conclusions
The CDM is our implementation data model
Map it to data access models such as OGC
Current work is to serve collections instead of individual files
– The dataset is the desired level of granularity
Scientific data types are implementations with specialized access
Datatype Collection
GridDataset: a collection of GridDatatype
Gridded Datatype
  float gridData(t,z,y,x);
  float time(t);
  float y(y);
  float x(x);
  float lat(y,x);
  float lon(y,x);
  float z(z);
  float height(t,z,y,x);
Cartesian coordinates
All dimensions are connected
Horizontal: lat, lon or projection x, y
Time: time(t), 1D and orthogonal
Separable: (x, y) × time × z
GridDatatype methods
  CoordinateAxis getTaxis();
  CoordinateAxis getXaxis();
  CoordinateAxis getYaxis();
  CoordinateAxis getZaxis();
  Projection getProjection();
  int[] findXYindexFromCoord(double x_coord, double y_coord);
  LatLonRect getLatLonBoundingBox();
  Array getDataSlice(Range[] …);
  GridDatatype makeSubset(Range[] …);
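Subsetting by a coordinate box ultimately means translating coordinate intervals into index ranges, which index-based methods like `makeSubset` can then use. Because the grid's x and y axes are 1D and separable, each axis can be handled independently. The Java sketch below does this for one strictly increasing axis using binary search. `AxisSubset` and `indexRange` are hypothetical helpers, not part of the GridDatatype API, and they do not apply to 2D `lat(y,x)` / `lon(y,x)` coordinates.

```java
/**
 * Convert a coordinate interval [min, max] on a strictly increasing 1D
 * coordinate axis into an inclusive index range. Hypothetical helper,
 * not part of the GridDatatype API.
 */
final class AxisSubset {
    private AxisSubset() {}

    /** Returns {first, last} (inclusive), or null if no grid point lies in [min, max]. */
    static int[] indexRange(double[] axis, double min, double max) {
        int first = firstIndexAtLeast(axis, min);
        int last = firstIndexAbove(axis, max) - 1;
        return first <= last ? new int[] {first, last} : null;
    }

    /** Smallest i with axis[i] >= x, or axis.length if there is none. */
    private static int firstIndexAtLeast(double[] axis, double x) {
        int lo = 0, hi = axis.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (axis[mid] < x) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Smallest i with axis[i] > x, or axis.length if there is none. */
    private static int firstIndexAbove(double[] axis, double x) {
        int lo = 0, hi = axis.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (axis[mid] <= x) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```

On an axis {0, 2.5, 5, 7.5, 10}, the interval [3, 8] becomes indices 2..3; an interval that falls entirely between or beyond grid points returns null.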
Radial Data
  radialData(radial, gate):
    distance(gate)
    azimuth(radial)
    elevation(radial)
    time(radial)
Polar coordinates
All dimensions are connected
No separate time dimension
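Georeferencing a radial gate means converting the polar coordinates (distance along the beam, azimuth, elevation) into a position relative to the radar. The Java sketch below uses a flat-earth, straight-beam approximation and assumes azimuth in degrees clockwise from north and elevation in degrees above horizontal; real radar georeferencing also accounts for earth curvature and beam refraction. `RadialGeometry.gateOffset` is a hypothetical helper.

```java
/**
 * Locate a radar gate relative to the radar from its distance along the beam,
 * the beam azimuth (degrees clockwise from north) and elevation (degrees above
 * horizontal). Flat-earth, straight-beam approximation; hypothetical helper.
 */
final class RadialGeometry {
    private RadialGeometry() {}

    /** Returns {east, north, up}, in the same units as distance. */
    static double[] gateOffset(double distance, double azimuthDeg, double elevationDeg) {
        double az = Math.toRadians(azimuthDeg);
        double el = Math.toRadians(elevationDeg);
        double ground = distance * Math.cos(el); // horizontal projection of the beam
        return new double[] {
            ground * Math.sin(az),  // east
            ground * Math.cos(az),  // north
            distance * Math.sin(el) // height above the radar
        };
    }
}
```

Because distance depends only on gate and azimuth/elevation only on radial, every (radial, gate) pair gets its own position; this is why all dimensions are connected.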
Swath
  swathData(line, cell)
  lat(line, cell)
  lon(line, cell)
  time(line)
  z(line, cell) ??
Lat/lon coordinates
No separate time dimension
All dimensions are connected
Unstructured Grid
  float unstructGrid(t,z,pt);
  float lat(pt);
  float lon(pt);
  float time(t);
  float height(z);
The pt dimension is not connected
Looks the same as point data
Need to specify the connectivity explicitly
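Explicit connectivity can be as simple as a table of triangles whose entries are indices into the `pt` dimension. From that table a client can recover which nodes are neighbors, which `lat(pt)` and `lon(pt)` alone cannot tell it. The Java sketch below assumes a triangle mesh; `TriangleMesh` and `neighbors` are hypothetical, and real unstructured-grid conventions may use other cell shapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Explicit connectivity for an unstructured grid: nodes are indexed by the
 * pt dimension and each triangle lists three node indices. Two nodes are
 * neighbors if they share a triangle edge. Hypothetical representation.
 */
final class TriangleMesh {
    private final int nodeCount;
    private final int[][] triangles;

    TriangleMesh(int nodeCount, int[][] triangles) {
        for (int[] t : triangles) {
            if (t.length != 3) throw new IllegalArgumentException("a triangle needs 3 nodes");
            for (int n : t) {
                if (n < 0 || n >= nodeCount) throw new IllegalArgumentException("bad node " + n);
            }
        }
        this.nodeCount = nodeCount;
        this.triangles = triangles.clone();
    }

    /** For each node, the sorted set of nodes it shares an edge with. */
    List<Set<Integer>> neighbors() {
        List<Set<Integer>> result = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) result.add(new TreeSet<>());
        for (int[] t : triangles) {
            for (int a = 0; a < 3; a++) {
                int b = (a + 1) % 3; // edge from t[a] to the next vertex
                result.get(t[a]).add(t[b]);
                result.get(t[b]).add(t[a]);
            }
        }
        return result;
    }
}
```

Two triangles {0, 1, 2} and {1, 3, 2} sharing an edge give node 1 the neighbors {0, 2, 3} and node 0 only {1, 2}.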
Point Observation Data
  Structure {
    lat, lon, z, time;
    v1, v2, ...
  } obs(pt);
Set of measurements at the same point in space and time
The point dimension is not connected
Or, as separate variables:
  float obs1(pt);
  float obs2(pt);
  float lat(pt);
  float lon(pt);
  float z(pt);
  float time(pt);
PointObsDataset Methods
  // Iterator
  Iterator getData(LatLonRect boundingBox, Date start, Date end);
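Since the point dimension is not connected, a space/time query is just a filter over independent observations, and the order of the observations does not matter. The Java sketch below shows that filter behind a `getData(box, start, end)`-style signature. `PointObQuery`, `PointOb` and `LatLonBox` are hypothetical stand-ins for the real observation and `LatLonRect` types, and the simple box test does not handle boxes that cross the dateline.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.List;

/**
 * getData(boundingBox, start, end)-style query over point observations.
 * PointOb and LatLonBox are hypothetical stand-ins; the box test ignores
 * boxes that cross the dateline.
 */
final class PointObQuery {
    private PointObQuery() {}

    /** One observation: location, time and the measured values v1, v2, ... */
    static final class PointOb {
        final double lat, lon, z;
        final Date time;
        final double[] values;

        PointOb(double lat, double lon, double z, Date time, double... values) {
            this.lat = lat;
            this.lon = lon;
            this.z = z;
            this.time = time;
            this.values = values.clone();
        }
    }

    /** Simple lat/lon rectangle with inclusive edges. */
    static final class LatLonBox {
        final double latMin, latMax, lonMin, lonMax;

        LatLonBox(double latMin, double latMax, double lonMin, double lonMax) {
            this.latMin = latMin;
            this.latMax = latMax;
            this.lonMin = lonMin;
            this.lonMax = lonMax;
        }

        boolean contains(double lat, double lon) {
            return lat >= latMin && lat <= latMax && lon >= lonMin && lon <= lonMax;
        }
    }

    /** Observations inside the box with start <= time <= end. */
    static Iterator<PointOb> getData(List<PointOb> obs, LatLonBox box, Date start, Date end) {
        List<PointOb> hits = new ArrayList<>();
        for (PointOb ob : obs) {
            boolean inTime = !ob.time.before(start) && !ob.time.after(end);
            if (inTime && box.contains(ob.lat, ob.lon)) hits.add(ob);
        }
        return hits.iterator();
    }
}
```

Returning an `Iterator` rather than a list leaves a real implementation free to stream through large files without holding every match in memory; this sketch simply filters an in-memory list.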
Time Series Station Data
  Structure {
    name;
    lat, lon, z;
    Structure {
      time;
      v1, v2, ...
    } obs(*); // connected
  } stn(stn); // not connected
StationObs Methods
  // List
  List getStations(LatLonRect boundingBox);
  // Iterator
  Iterator getData(Station s, Date start, Date end);
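The two-step access pattern mirrors the nested structure: the outer, unconnected station dimension is searched spatially, then one station's inner, connected time series is filtered by time. The Java sketch below illustrates this pattern. `StationCollection`, `Station` and `Obs` are hypothetical stand-ins rather than NetCDF-Java classes, and the bounding box is passed as four plain numbers instead of a `LatLonRect`.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.List;

/**
 * Station access pattern: select stations spatially, then iterate over one
 * station's time series. All types here are hypothetical stand-ins.
 */
final class StationCollection {
    /** One time step of a station's series (inner, connected structure). */
    static final class Obs {
        final Date time;
        final double value;

        Obs(Date time, double value) {
            this.time = time;
            this.value = value;
        }
    }

    /** A station (outer, unconnected structure) with its own time series. */
    static final class Station {
        final String name;
        final double lat, lon, z;
        final List<Obs> series = new ArrayList<>();

        Station(String name, double lat, double lon, double z) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
            this.z = z;
        }
    }

    private final List<Station> stations = new ArrayList<>();

    void add(Station s) { stations.add(s); }

    /** Stations located inside [latMin, latMax] x [lonMin, lonMax]. */
    List<Station> getStations(double latMin, double latMax, double lonMin, double lonMax) {
        List<Station> hits = new ArrayList<>();
        for (Station s : stations) {
            if (s.lat >= latMin && s.lat <= latMax && s.lon >= lonMin && s.lon <= lonMax) hits.add(s);
        }
        return hits;
    }

    /** Observations of one station with start <= time <= end. */
    Iterator<Obs> getData(Station s, Date start, Date end) {
        List<Obs> hits = new ArrayList<>();
        for (Obs o : s.series) {
            if (!o.time.before(start) && !o.time.after(end)) hits.add(o);
        }
        return hits.iterator();
    }
}
```

A client would typically call `getStations` once to choose stations on a map, then call `getData` per station for the time window of interest.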
Trajectory Data
  Structure {
    lat, lon, z, time;
    v1, v2, ...
  } obs(pt); // connected
The pt dimension is connected
Collection of trajectories:
  Structure {
    name;
    Structure {
      lat, lon, z, time;
      v1, v2, ...
    } obs(*); // connected
  } traj(traj); // not connected
The collection dimension is not connected
Profiler/Sounding Station Data
  Structure {
    name;
    lat, lon, time;
    Structure {
      z;
      v1, v2, ...
    } obs(*); // connected
  } loc(nloc); // not connected
Or, as a time series of profiles at each station:
  Structure {
    name;
    lat, lon;
    Structure {
      time;
      Structure {
        z;
        v1, v2, ...
      } obs(*); // connected
    } time(*); // connected
  } stn(stn); // not connected
Data Types Summary
Data access through a standard API
Convenient georeferencing
Specialized subsetting methods
– Efficiency for large datasets
Payoff
Figure: File Format #1, File Format #2, … File Format #N all feed the CDM, which in turn feeds Visualization & Analysis, a NetCDF file, an OPeNDAP server, a WCS service and a web service.
N + M instead of N * M things on your TODO list!
Next: DataType Aggregation
Work at the CDM DataType level, where we know (some of) the data semantics
Forecast Model Collection
– Combine multiple model forecasts into a single dataset with two time dimensions
– With NOAA/IOOS (Steve Hankin)
Point/Station/Trajectory/Profile data
– Allow space/time queries, return nested sequences
– Start from / standardize "Dapper conventions"
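The two time dimensions of a forecast model collection are the run (reference) time and the forecast offset, with valid time = run time + offset. Different runs forecast the same valid time, so a client often wants one of several ways to reduce the 2D time grid to a single series. The Java sketch below computes the valid-time grid and shows one such reduction, keeping the most recent run for each valid time. All names are hypothetical. Times are plain hours, and run times are assumed to be listed in increasing order.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Two time dimensions of a forecast model run collection: run time and
 * forecast offset, with valid time = run time + offset. Times are plain
 * hours; all names are hypothetical.
 */
final class ForecastRunCollection {
    private ForecastRunCollection() {}

    /** validTime[r][f] = runHours[r] + offsetHours[f]. */
    static double[][] validTimes(double[] runHours, double[] offsetHours) {
        double[][] valid = new double[runHours.length][offsetHours.length];
        for (int r = 0; r < runHours.length; r++) {
            for (int f = 0; f < offsetHours.length; f++) {
                valid[r][f] = runHours[r] + offsetHours[f];
            }
        }
        return valid;
    }

    /**
     * One way to collapse both dimensions into a single series: for each valid
     * time, keep the (run, offset) pair from the most recent run.
     * Assumes runHours is in increasing order. Returns validTime -> {runIndex, offsetIndex}.
     */
    static Map<Double, int[]> latestRunPerValidTime(double[] runHours, double[] offsetHours) {
        Map<Double, int[]> best = new TreeMap<>();
        for (int r = 0; r < runHours.length; r++) {
            for (int f = 0; f < offsetHours.length; f++) {
                // Later runs overwrite earlier ones for the same valid time.
                best.put(runHours[r] + offsetHours[f], new int[] {r, f});
            }
        }
        return best;
    }
}
```

With runs at hours 0 and 6 and offsets {0, 3, 6, 9}, valid time 6 is forecast by both runs; this reduction keeps the hour-6 run at offset 0, and the merged series covers valid times 0 through 15.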
Forecast Model Collections
Coordinate Systems: implicit/explicit
The NetCDF, OPeNDAP and HDF data models do not have explicit coordinate systems, so georeferencing is not part of the API
– Need conventions to specify them (e.g. CF-1, COARDS, etc.)
GRIB and HDF-EOS (for example) are explicit
– But there is no uniform API
NetCDF-4 C Library
Figure: the netCDF-3 interface sits on top of the netCDF-4 library, which is built on the HDF5 library.
Conclusion
Standardized data access is in good shape
– HDF5, NetCDF, OPeNDAP
– Write an IOSP for proprietary formats (Java)
But that's not good enough!
To do:
– Standard representations of coordinate systems
– Classifications of data types, and standard services for them