THREDDS Data Server (TDS) and Data Discovery
  John Caron
  Unidata/UCAR
  May 15, 2006
[Diagram. Labels: HTTP Tomcat Server; THREDDS Data Server; Datasets; Catalog.xml; otherhost.gov; THREDDS Server; Application; NetCDF-Java (CDM) library; OPeNDAP; HTTPServer; WCS; OPeNDAP Server; hostname.edu; OAI Harvester; DL Records; OAI Provider]
Collection vs Inventory Datasets
  [Diagram. Labels: Catalog; Dataset; /model/NCEP/DGEX/CONUS_12km/file.grib2; DatasetScan; /models/ncep/NAM/; File1.grib; File2.grib; File3.grib]
DL Harvesting
  [Diagram. Labels: Catalog; Dataset; DatasetScan; /models/ncep/NAM/; File1.grib; File2.grib; File3.grib; Metadata Record; isHarvest = true; inherit = true]
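The harvesting idea on this slide can be sketched as follows: a harvester walks a catalog, picks out only the datasets flagged for harvesting, and leaves the inventory files beneath them alone. The catalog fragment, the `harvest` and `inherited` attribute spellings, and the `harvest_records` helper are all assumptions made for illustration (the slide itself only shows "isHarvest = true" and "inherit = true"), and the XML namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog fragment, loosely modeled on the slide's diagram.
CATALOG = """
<catalog name="Example">
  <dataset name="NCEP NAM collection" ID="models/ncep/NAM" harvest="true">
    <metadata inherited="true">
      <publisher>Example publisher</publisher>
    </metadata>
    <dataset name="File1.grib" urlPath="models/ncep/NAM/File1.grib"/>
    <dataset name="File2.grib" urlPath="models/ncep/NAM/File2.grib"/>
    <dataset name="File3.grib" urlPath="models/ncep/NAM/File3.grib"/>
  </dataset>
</catalog>
"""


def harvest_records(catalog_xml):
    """Return one record per dataset flagged harvest="true".

    Inventory children are only counted, never turned into records.
    """
    root = ET.fromstring(catalog_xml)
    records = []
    for ds in root.iter("dataset"):
        if ds.get("harvest") != "true":
            continue  # inventory datasets are skipped
        children = ds.findall("dataset")
        records.append({
            "name": ds.get("name"),
            "id": ds.get("ID"),
            "inventory_count": len(children),
        })
    return records


records = harvest_records(CATALOG)
```

With this fragment the harvester produces a single collection-level record, while the three GRIB files stay browsable on the server only.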
Metadata Information
  Title / Summary
  Publisher / Creator / Rights
  Lat/Lon bounding box
  Time range
    –Relative time: “latest 7 days”
  Variable names
    –DLESE: no (not dataset oriented)
    –GCMD: controlled list, required
  Unique ID / Resource URL
Why not harvest Inventory?
  Too many of them, e.g. in IDD:
    –NCEP models: 28 collections, 6000 files
    –NEXRAD level 3 files: ~8M files
  Real-time datasets are never current
  DLs (GCMD, DLESE) don’t want them
    –Collection search in DL, browse inventory on server.
Current Work: Aggregation
  Make many files into a single logical dataset: make Collection Dataset = Inventory
  Uses NcML to read into CDM, works at the “syntactic” level.
  Replaces older “Aggregation Server”
    –Union
    –Join on existing dimension
    –Join on new dimension
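The three aggregation types listed above can be illustrated with a small conceptual sketch. This is not the NcML or NetCDF-Java implementation: each "file" is modeled as a plain dict mapping a variable name to a list of values along its outer dimension, and the function names are hypothetical. The rule that the first file wins on a name clash in `union` is an assumption of this sketch.

```python
def union(files):
    """Union: combine the variables of several files into one dataset.

    Assumption of this sketch: on a name clash the first file wins.
    """
    merged = {}
    for f in files:
        for name, values in f.items():
            merged.setdefault(name, values)
    return merged


def join_existing(files, var):
    """Join on existing dimension: concatenate var along its outer dimension."""
    joined = []
    for f in files:
        joined.extend(f[var])
    return joined


def join_new(files, var):
    """Join on new dimension: stack var, adding one outer index per file."""
    return [list(f[var]) for f in files]


file1 = {"temp": [1.0, 2.0]}
file2 = {"temp": [3.0, 4.0]}

concatenated = join_existing([file1, file2], "temp")  # one longer axis
stacked = join_new([file1, file2], "temp")            # new outer axis of length 2
combined = union([file1, {"wind": [5.0, 6.0]}])       # both variables side by side
```

The point of the sketch is that all three operations work purely on shapes and names, which is what the slide means by the "syntactic" level.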
TDS/NcML Aggregation
Next: DataType Aggregation
  Work at the CDM DataType level, know (some) data semantics
  Forecast Model Collection
    –Combine multiple model forecasts into single dataset with two time dimensions
    –With NOAA/IOOS (Steve Hankin)
  Point/Station/Trajectory/Profile Data
    –Allow space/time queries, return nested sequences
    –Start from / standardize “Dapper conventions”
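One way to picture a dataset with two time dimensions is a grid indexed by model run time and forecast offset, where each cell has a valid time equal to run time plus offset. The sketch below assumes that reading of the slide; the names `build_collection` and `forecasts_valid_at`, and the sample run times and offsets, are hypothetical.

```python
from datetime import datetime, timedelta


def build_collection(runs):
    """Map (run_time, forecast_hours) -> valid_time for every forecast step.

    runs: dict of run_time -> list of forecast offsets in hours.
    """
    grid = {}
    for run_time, offsets in runs.items():
        for hours in offsets:
            grid[(run_time, hours)] = run_time + timedelta(hours=hours)
    return grid


def forecasts_valid_at(grid, valid_time):
    """Return every (run_time, forecast_hours) pair valid at valid_time."""
    return sorted(key for key, valid in grid.items() if valid == valid_time)


# Two hypothetical model runs, six hours apart, each with three steps.
runs = {
    datetime(2006, 5, 15, 0): [0, 6, 12],
    datetime(2006, 5, 15, 6): [0, 6, 12],
}
grid = build_collection(runs)
hits = forecasts_valid_at(grid, datetime(2006, 5, 15, 12))
```

Here `hits` contains the 12-hour forecast from the 00:00 run and the 6-hour forecast from the 06:00 run, which is the kind of cross-run query a single-time-axis aggregation cannot express.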
Forecast Model Collections
Web services for discovery
  “Latest dataset” Resolver service
  Dataset Query Capability (DQC): accept query, return results as a collection of datasets in a catalog
  Future: Dynamic dataset creation based on user query ??
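A "latest dataset" resolver can be sketched as a function that looks at the inventory of a collection and returns the most recent entry. The file-naming pattern and the `resolve_latest` name below are assumptions for illustration only; a real server may determine "latest" differently.

```python
from datetime import datetime


def resolve_latest(names, pattern="NAM_%Y%m%d_%H%M.grib2"):
    """Return the name with the most recent timestamp, or None.

    Names that do not match the (hypothetical) pattern are ignored.
    """
    dated = []
    for name in names:
        try:
            dated.append((datetime.strptime(name, pattern), name))
        except ValueError:
            continue  # not part of this collection's inventory
    if not dated:
        return None
    return max(dated)[1]


inventory = [
    "NAM_20060514_1800.grib2",
    "NAM_20060515_0600.grib2",
    "README.txt",
    "NAM_20060515_0000.grib2",
]
latest = resolve_latest(inventory)
```

A client can then keep one stable reference to the collection and let the resolver decide which inventory file it points at on any given day.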
Summary
  Expect discovery to be 2 phased:
    1. Search for collections in DL with browser
    2. Use an application like the IDV (OPeNDAP) or GIS client (WCS) to drill down to the actual data.
  Expect aggregation / query will (eventually) tame the “inventory problem”
Dataset Query Capability Document
  XML document that describes the set of valid queries for a dataset.
  Queries are URLs
  Selectors:
    –List of choices
    –List of stations
    –Numeric range (point or subrange)
    –DateRange
    –Latitude/Longitude Bounding Box
  Orthogonal selections (except Lists can be nested)
  Returns a catalog containing inventory datasets.
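The selector model described above can be sketched as a client that checks each selection independently (the selections are orthogonal) and then encodes the result as a query URL. The selector table, the parameter names, the second station code, and the base URL are all hypothetical; a real client would read the valid selectors from the DQC document itself.

```python
from urllib.parse import urlencode

# Hypothetical selectors, standing in for what a DQC document would declare.
SELECTORS = {
    "station": {"kind": "list", "choices": {"ABR", "FTG"}},
    "elevation": {"kind": "range", "min": 0.0, "max": 20.0},
}


def validate(name, value):
    """Check one selection against its selector; raise ValueError if invalid."""
    spec = SELECTORS.get(name)
    if spec is None:
        raise ValueError(f"unknown selector: {name}")
    if spec["kind"] == "list" and value not in spec["choices"]:
        raise ValueError(f"{value!r} is not a valid choice for {name}")
    if spec["kind"] == "range" and not spec["min"] <= value <= spec["max"]:
        raise ValueError(f"{value!r} is outside the range for {name}")


def build_query(base_url, selections):
    """Validate each selection on its own, then encode them as param=value."""
    for name, value in selections.items():
        validate(name, value)
    return base_url + "?" + urlencode(selections)


url = build_query(
    "http://example.invalid/dqc/radar",
    {"station": "ABR", "elevation": 2.5},
)
```

Because each selector is validated without reference to the others, adding a new selector type only requires a new branch in `validate`.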
Example DQC
  <station name="SD" value="ABR"
  <selectFromDateRange id="datePnt" title="Date" selectType="point" start=" T00:00" end=" T12:00" />
  .5u reflectivity
  .5u storm rel. velocity
Issues
  DQC itself doesn’t deal with the query
  Queries are expressible as param=value
    –Extend to arbitrary URLs (token substitution), e.g. dods
    –SOAP RPC?
  Returns a catalog, might be the data itself.
  Prototype/non-standard, need buy-in from clients to bother continuing.
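The token-substitution idea mentioned above could look roughly like this: instead of appending param=value pairs, the client fills named tokens in a URL template. The template syntax (Python's `${name}` form), the URL, and the token names are assumptions for illustration and are not part of any DQC specification.

```python
from string import Template
from urllib.parse import quote


def fill_url(template, selections):
    """Substitute percent-encoded selection values into a URL template.

    Raises KeyError if the template names a token with no selection.
    """
    quoted = {k: quote(str(v), safe="") for k, v in selections.items()}
    return Template(template).substitute(quoted)


# Hypothetical template with two tokens.
template = "http://example.invalid/dods/${station}/${product}.dods"
filled = fill_url(template, {"station": "ABR", "product": "reflectivity"})
```

Percent-encoding each value before substitution keeps a selection such as a name with a space from breaking the path structure of the resulting URL.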