Recent Work in Progress John Caron, June 3, 2003 THREDDS development Dynamic Catalogs: DQC, Resolvers IDD Data Server ADDE Cataloger NetCDF development NetCDF Markup Language (NcML) More efficient Java I/O (NIO) NetCDF/DODS/HDF5 Data Models 90% done – last 10% take another 90%.
THREDDS Catalogs Catalog Generator Data Server Client Application HTTP Server CatalogRef.xml Catalog.xml Catalog Generator Data Server DODS, ADDE, FTP, HTTP Client Application Catalogs are lists of dataset URLs. Static case. Catgen talks to server or file system. Catalog might be static or dynamically generated. CatalogRef allows dynamic partial generation (benno) Datasets hostname.edu
Dynamic Catalogs = Services HTTP Tomcat Server Catalog.xml Catalog Service Catalog Generator CatalogRef.xml Query Resolver Service DQC.xml Client Application Resolver Service URI URL Need more dynamic system for real time and very large datasets. Catalog is a file, but these are services, that is, code. Show IDD Server catalog – show sattellite DQC, then show radar DQC Data Server DODS, ADDE, FTP, HTTP Datasets hostname.edu
Dataset Query Capability (DQC) XML document. Describes what the user can ask for as a set of orthogonal “selections”. On the client, a “query URL” is formed based on the user’s choices, and sent to the server. The “query resolver” server finds which datasets satisfy the query and returns a list of real dataset URLs. The DQC describes the queries that the server is capable of responding to. In response to Level 3 radar data Generality of it was Ethan’s idea. Robb implemented the radar server. Note its an indirection: a dataset which you use to find other datasets.
Resolver Services Logical Dataset, eg “latest ETA model run” Dataset with Service type “Resolver” On the client, the URI of the logical dataset is sent to the server The server finds what is available and returns a list of real dataset URLs. DQC as a fancy kind of Resolver Service. URI to URL Credit Jeff and Don.
ADDE Cataloger Catalog.xml Client Application ADDE Data Server HTTP Tomcat Server Catalog.xml Catalog Service ADDE Cataloger CatalogRef.xml Query Resolver Service DQC.xml Xxxxx Xxxx Client Application Try to improve response to ADDE Servers. Poll / cache results Add backdoor to ADDE servers. Will try on gozer. If time, show ADDE Cataloger – client. ADDE Data Server hostname.edu Datasets IDD
Summary IDD Data Server Get as much of the IDD Data feeds available via THREDDS as possible. NCEP model data (catgen) (DODS) Level 3 NEXRAD (custom server/DQC) (ADDE) SSEC/Unidata Satellite data (ADDE Cataloger) (ADDE) Text Data: Metars, Surface Obs, etc (DQC/custom server), returns text or XML. Profiler Data (custom server/DQC) (ADDE) Credit Robb
NetCDF 3 NetCDF-3 library API Client Application NetCDF OpenDAP File Dataset protocol NcML Dataset XML Virtual dataset Local file HTTP protocol NetCDF-3 library API HTTP, Dataset Java-only Client Application
NetCDF Markup Language XML representation of netCDF metadata, uses XML Schema Core: existing netCDF data model Coordinate System: general and georeferencing coordinate system Dataset: redefine, aggregate, subset Luca Cinquini (NCAR/SCD/ESG), John Caron, Ethan Davis, Bob Drach (LLNL), Stefano Nativi (Florence), Russ Rew
NcML Coordinate Systems Convention Parser ATDRadar AWIPS COARDS CF CSM GDV NUWG WRF Zebra NetCDF File OpenDAP Dataset Netcdf Dataset NcML Dataset XML
GeoGrids, GeoTiffs, Geowhiz! NetCDF File OpenDAP Dataset Convention Parser Netcdf Dataset GeoGrid factory VisAD / IDV GeoGrid Dataset WCS Server Have sent GeoTIFFS to ESRI using this (credit Roland) using ccm2 files. Limited to lat/lon, want to add projections. Yuan will continue work on this. Stefano work on WCS. GeoTiff Writer Strange land of GIS OpenGIS WCS GeoTiff File
NcML Dataset : “virtual view” NetCDF File OpenDAP Dataset NcML Dataset XML Dataset XML Parser Java-netCDF 2.1 CS output format of Convention parsing Dataset is an input format. Client Application NetCDF Dataset
NcML Dataset Use NcML like CDL, to declare the contents of a netCDF file. Add, delete or rename Variables, Attributes, and Dimensions Subset Variables Reorder a Variable’s dimensions Aggregate multiple netCDF files, a la DODS Aggregation Server NcML Dataset is a “virtual view” or can make copy to a local netCDF file. Declarative vs procedural Subset AWIPS Investigate Integrate with thredds catalogs? Demo copy dods file from Ken Keiser’s UAH “passive microwave” data.
2: NcML Datasets on a Server Catalog.xml DODS Agg/Netcdf Server DODS, ADDE, FTP, HTTP Dataset XML Parser Client Application NcML Dataset XML Datasets hostname.edu
3: NcML Datasets via Catalogs Catalog.xml NcML Dataset XML NetCDF File OpenDAP Dataset Catalog/Dataset XML Parser Java-netCDF v 2.1.1 Third party metadata. Client Application
NIO Rewrite ucar.nc2 I/O layer using java.nio package (currently using ucar.netcdf) Uses memory mapping, bulk I/O transfer Prototype has 7x speedup on large files. Requires JDK 1.4+ HTTP access must be rewritten
NIO vs current Java NIO Current old/new First access small (3.9 Mb) 281 671 2.4 large (240 Mb) 3334 28221 8.5 Average next 5 accesses small 54 290 5.4 large 2239 16367 7.3 Time in millisecs to sequentially read entire file Wintel 2GHz, 1 GB main memory Java 1.4.2 -client
NIO vs optimized C NIO C C/NIO First access small (3.9 Mb) 281 370 1.3 large (240 Mb) 3334 19348 5.8 Average next 5 accesses small 54 24 .44 large 2239 953 .43 Java 1.4.2 –client vs. VC 6.0 /O2 Probably OS / JDK dependent.
NetCDF Data Model NetcdfFile Dimension Variable Attribute Attribute DataType byte char short int float double
OpenDAP Data Model Dataset BaseType primitive (8) string array grid structure sequence Dataset array Dimension BaseType BaseType Attribute Attribute structure / sequence Allow DB like queries on sequences, eg all records with x < 3.0 Attribute BaseType Attribute
HDF5 Data Model Datatype Dataset Groups Datatype DataSpace Attribute Fixed point floating point date/time string bit field Opaque Compound Reference Enumeration Variable length Array Dataset DataSpace Attribute Datatype Groups File directory structure inside HDF file. Data storage Compact External Layout Indexed Striped Neither HDF5 or OpenDAP has shared global dimensions.
Possible Extensions to netCDF data model Add new data types: Strings: variable length arrays of bytes, plus an encoding attribute. Structures: collections of any other element types, allow nested structures. Vector: a variable length 1D array of any type. Allow reusable structure definition = user defined data type. Allow unnamed, undeclared dimensions = anonymous dimensions. Allow multiple unlimited dimensions (outer dimension only) Compression. Push scale/offset into library, allow variable bit sizes. Explicit support for coordinate variables/axes. All probably feasible on top of HDF5.
New NetCDF Data Model NetcdfFile Variable Dimension Attribute Structure DataType byte short int long float double String Structure Vector Dimension DataType DataType DataType Deprecate char, replace with String. Vector Length Attribute DataType Attribute
NetCDF 4 NetCDF 4 library API Client Application NetCDF V.1 and 2 File OpenDAP Dataset HDF5 File OpenDAP 4.0 protocol Local file or HTTP protocol NcML Dataset XML NetCDF 4 library API Influence both OpenDAP and HDF5 Ambitious, but useful for THREDDS clients Unidata Priorities ?? The most compelling reason…. Virtual dataset Client Application