CWIC Developers Meeting
January 28th, 2014
Calin Duma
CSW and OpenSearch from the CWIC Start client perspective

Agenda
– CSW standard
– CWIC Start CSW dependencies
– CWIC Start CSW implementation
– OpenSearch challenges and opportunities
– CWIC Start CSW vs. CWIC Start OpenSearch

CSW Standard
Mature but complex standard:
– CSW provides a significant number of artifacts (documentation, schemas and XML files) for implementers
– The complexity involved discourages implementers from approaching more advanced topics such as:
    – GetDomain for obtaining value ranges of metadata record elements or request parameters
    – Catalog Transactions (Insert, Update, Delete)
    – Synchronous and Asynchronous Client Harvesting of catalog holdings
    – Mechanisms to extend the standard to best accommodate the implementer's needs
    – Application Profiles (Earth Observation Profile is not finalized*)
    – Sophisticated query language with support for complex logical, temporal and spatial operands
As a result, implementers end up with standard extensions and very basic query capabilities

CWIC Start CSW Dependencies
GCMD CSW:
– Used for CSW dataset searches tagged with project = cwic
– Provides 49 queryable properties available to CSW clients
– Rich set of logical operators (11), spatial operators (8) and geometry operands (4)
– Client can specify one or more datasets of interest for further examination via GetRecordById
GCMD KMS:
– Provides valid values for the ScienceKeyword queryable property
CWIC CSW:
– Used for CSW granule searches for a specified GCMD dataset / entry ID
– Provides 4 queryable properties (datasetId, AOI, 2*TOI) available to CSW clients
– Provides a set of logical operators (7), spatial operators (1) and geometry operands (1)
– Client must specify a single dataset of interest for further examination via GetRecordById
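
The difference between the two GetRecordById usages above can be illustrated with a minimal sketch of a CSW 2.0.2 request body builder. This is not the CWIC Start code (which the deck describes as jRuby); the function name and the `single_only` guard are hypothetical and exist only to model a catalog that accepts one identifier per request.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
ET.register_namespace("csw", CSW_NS)


def get_record_by_id_body(record_ids, element_set="full", single_only=False):
    """Return a CSW 2.0.2 GetRecordById request body as an XML string.

    single_only=True is a client-side guard (an assumption of this sketch,
    not part of the CSW standard) for catalogs that take one ID per request.
    """
    ids = list(record_ids)
    if not ids:
        raise ValueError("at least one record ID is required")
    if single_only and len(ids) != 1:
        raise ValueError("this catalog accepts exactly one ID per request")

    root = ET.Element(
        f"{{{CSW_NS}}}GetRecordById",
        {"service": "CSW", "version": "2.0.2"},
    )
    # One csw:Id element per requested record, followed by the element set.
    for record_id in ids:
        ET.SubElement(root, f"{{{CSW_NS}}}Id").text = record_id
    ET.SubElement(root, f"{{{CSW_NS}}}ElementSetName").text = element_set
    return ET.tostring(root, encoding="unicode")
```

A GCMD-style call would pass several entry IDs at once, while a CWIC-style call would use `single_only=True`; the resulting string would be sent as the body of an HTTP POST to the catalog endpoint.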

CWIC Start CSW Implementation
We ended up with a very simple common denominator:
– Basic BBOX AOI, basic TOI, AnyText and ScienceKeyword used for GCMD dataset searches
– Basic BBOX AOI and basic TOI used for CWIC granule searches for a given dataset
– Logical operators and more complex logical statements are in theory supported by both CWIC and GCMD, but in practice we only exercised AND and OR
    Examples:
    – Get GCMD datasets where (AOI and TOI and searchTerms [and (keyword1 OR keyword2)])
    – Get CWIC granules where (datasetId 1 and AOI and TOI)
– Implemented distributed granule searches in CWIC Start to supplement the CWIC support for granule searches within a single dataset
Issues left to resolve:
– Retrieval of valid request parameter values
– GCMD correctness of responses based on AnyText (no clear understanding of how individual metadata fields are indexed and why there are sometimes anomalies in AnyText query responses)
    Examples:
    – When CWIC added Radarsat-1, the AnyText search for Radarsat-1 failed but NRCAN worked
    – Similar occurrences happen from time to time; a good explanation from GCMD of how the AnyText queries work would be very helpful
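
The dataset query shape "AOI and TOI and searchTerms [and (keyword1 OR keyword2)]" can be sketched as an OGC Filter 1.1.0 fragment. This is a minimal illustration, not the CWIC Start implementation: the queryable names (`ows:BoundingBox`, `TempExtent_begin`, `TempExtent_end`, `AnyText`, `ScienceKeyword`), the lon/lat corner order and the temporal comparison are all assumptions that a real client would need to check against the catalog's GetCapabilities response.

```python
import xml.etree.ElementTree as ET

OGC = "http://www.opengis.net/ogc"
GML = "http://www.opengis.net/gml"
ET.register_namespace("ogc", OGC)
ET.register_namespace("gml", GML)


def _el(parent, ns, name, text=None, **attrs):
    """Create a namespaced child element, optionally with text content."""
    child = ET.SubElement(parent, f"{{{ns}}}{name}", attrs)
    if text is not None:
        child.text = text
    return child


def _compare(parent, op, prop, value):
    """Add a binary comparison such as ogc:PropertyIsEqualTo."""
    node = _el(parent, OGC, op)
    _el(node, OGC, "PropertyName", prop)
    _el(node, OGC, "Literal", value)
    return node


def dataset_filter(bbox, start, end, search_terms, keywords=()):
    """Build: AOI and TOI and searchTerms [and (keyword1 OR keyword2 ...)]."""
    west, south, east, north = bbox
    flt = ET.Element(f"{{{OGC}}}Filter")
    conj = _el(flt, OGC, "And")

    # AOI: basic bounding box (queryable name and axis order are assumptions).
    box = _el(conj, OGC, "BBOX")
    _el(box, OGC, "PropertyName", "ows:BoundingBox")
    env = _el(box, GML, "Envelope")
    _el(env, GML, "lowerCorner", f"{west} {south}")
    _el(env, GML, "upperCorner", f"{east} {north}")

    # TOI: basic start/end constraint (queryable names are assumptions).
    _compare(conj, "PropertyIsGreaterThanOrEqualTo", "TempExtent_begin", start)
    _compare(conj, "PropertyIsLessThanOrEqualTo", "TempExtent_end", end)

    # Free-text search terms against AnyText.
    like = _el(conj, OGC, "PropertyIsLike",
               wildCard="*", singleChar="?", escapeChar="\\")
    _el(like, OGC, "PropertyName", "AnyText")
    _el(like, OGC, "Literal", f"*{search_terms}*")

    # Optional keywords: ogc:Or needs at least two operands, so a single
    # keyword is attached to the And directly.
    keywords = list(keywords)
    target = _el(conj, OGC, "Or") if len(keywords) > 1 else conj
    for keyword in keywords:
        _compare(target, "PropertyIsEqualTo", "ScienceKeyword", keyword)

    return ET.tostring(flt, encoding="unicode")
```

The granule query "datasetId and AOI and TOI" would follow the same pattern with a `PropertyIsEqualTo` on the dataset identifier in place of the text and keyword clauses.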

OpenSearch challenges and opportunities
Standard is not mature
– Documentation is spread over OpenSearch, Extensions, ESIP extensions etc., which is very confusing
– Still no schema to validate an OSDD
– Language is ambiguous in many parts of the document
– Flexibility in specifying common parameter names (searchTerms, AOI, TOI, pagination support) can lead to confusion
– No consistency between request and response shape specification (spatial constraint vs. spatial coverage)
– No consistency between request and response TOI specification (OpenSearch in request and Dublin Core in response)
Simplicity compensates for lack of clarity and maturity
– I will simply deny making this statement if it is ever brought to my attention
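
Because there is no schema to validate an OSDD against and parameter names are flexible, a client typically has to inspect the URL template itself. The sketch below pulls the template parameters out of an OSDD and flags which ones are optional (marked with a trailing `?` in OpenSearch 1.1 template syntax). The sample document, its endpoint and its query parameter names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

OS_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical OSDD: endpoint and parameter choices are illustrative only.
SAMPLE_OSDD = """<OpenSearchDescription
    xmlns="http://a9.com/-/spec/opensearch/1.1/"
    xmlns:geo="http://a9.com/-/opensearch/extensions/geo/1.0/"
    xmlns:time="http://a9.com/-/opensearch/extensions/time/1.0/">
  <ShortName>Example</ShortName>
  <Description>Hypothetical granule search</Description>
  <Url type="application/atom+xml" template="https://example.org/search?q={searchTerms}&amp;bbox={geo:box?}&amp;start={time:start?}&amp;end={time:end?}&amp;page={startPage?}"/>
</OpenSearchDescription>
"""

# Matches {name} and {name?}; group 2 is "?" when the parameter is optional.
TEMPLATE_PARAM = re.compile(r"\{([^{}?]+)(\?)?\}")


def template_parameters(osdd_xml):
    """Return {url_type: [(parameter, required), ...]} for each Url element."""
    root = ET.fromstring(osdd_xml)
    result = {}
    for url in root.findall(f"{{{OS_NS}}}Url"):
        template = url.get("template", "")
        result[url.get("type")] = [
            (name, marker != "?")
            for name, marker in TEMPLATE_PARAM.findall(template)
        ]
    return result


if __name__ == "__main__":
    for url_type, params in template_parameters(SAMPLE_OSDD).items():
        print(url_type, params)
```

A client can map the template parameter names (`searchTerms`, `geo:box`, `time:start` and so on) to its own UI fields regardless of which query-string names a given provider chose.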

CWIC Start CSW vs. CWIC Start OpenSearch
CWIC Start CSW implements a very basic CSW client based on the GCMD and CWIC implementations
The current CWIC Start CSW implementation can be replaced with an OpenSearch implementation with no loss of functionality* if:
– OpenSearch would allow combining more than one searchTerms in a request
    Example: searchTerms[]=MODIS&searchTerms[]=ALBEDO (provide results for MODIS and ALBEDO)
– OpenSearch would allow usage of wildcards in the searchTerms
There is potential for dynamic UI generation in CWIC Start OpenSearch:
– We considered it for CWIC Start CSW dataset searches, but the lack of valid values and of a good description of each GCMD queryable, together with concerns about CSW response correctness and UI programming complexity, deterred us
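
The repeated-parameter form from the example above (searchTerms[]=MODIS&searchTerms[]=ALBEDO) can be produced and read back with the Python standard library. This is a sketch of the proposed convention only; the slide presents it as something OpenSearch would need to allow, and the endpoint below is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(endpoint, terms):
    """Build a URL that repeats searchTerms[] once per term."""
    # doseq=True expands the list into repeated keys; safe="[]" keeps the
    # brackets in the key readable instead of percent-encoding them.
    query = urlencode({"searchTerms[]": list(terms)}, doseq=True, safe="[]")
    return f"{endpoint}?{query}"


def terms_from_url(url):
    """Recover the list of search terms on the server side."""
    return parse_qs(urlsplit(url).query).get("searchTerms[]", [])


if __name__ == "__main__":
    url = search_url("https://example.org/opensearch", ["MODIS", "ALBEDO"])
    print(url)
    print(terms_from_url(url))
```

Wildcard terms such as `MOD*` pass through the same round trip, since the encoder percent-escapes the asterisk and `parse_qs` decodes it again; whether a server interprets it as a wildcard is a separate matter.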

CWIC-Start System Architecture and High Level Interaction with GCMD and CWIC
Components:
– CWIC-Start UI: sends HTTP GET requests to the CWIC-Start Web Application
– CWIC-Start Web Application (jRuby): CWIC Data Sets and CWIC Data Granules
– GCMD CSW Web Application (Java): Rich Data Set Metadata; GetCapabilities, GetRecords, GetRecordById via HTTP POST (CSW XML), CSW XML Response
– GCMD KMS Web Service: GCMD Controlled Vocabulary; REST API, SKOS Vocabulary (1a HTTP GET, 1b KMS XML Response)
– CWIC CSW Catalog Web Application: Data Granules Metadata and Access URLs; GetRecords, GetRecordById via HTTP POST (CSW XML), CSW XML Response
– CWIC Data Provider Connectors: INPE, AOE, NOAA, NASA, USGS, GHRSST, JAXA
– CWIC Data Providers: NOAA, NASA, USGS, AOE China, INPE Brazil, GHRSST, JAXA Japan (INPE APIs, AOE APIs, NOAA APIs, ECHO APIs, USGS APIs, GHRSST APIs, JAXA APIs)
Interaction:
1 IDENTIFY DESIRED DATA SETS by specifying AOI, TOI, Location, Science Keywords, Platform, Instrument, Free Text Search
2 CWIC-Start Web Application translates HTTP GET requests to GCMD CSW GetCapabilities, GetRecords and GetRecordById (Project = CWIC)
3 GCMD responds with data set metadata displayed for user inspection and selection of data sets of interest
4 IDENTIFY DESIRED GRANULES for the data sets of interest by specifying AOI and TOI
5 CWIC-Start Web Application translates HTTP GET requests to CWIC GetRecords and GetRecordById for granules in the data sets of interest
6 CWIC responds with granule metadata for the desired data sets; end-user selects desired granule or browse data for download
7 DOWNLOAD DESIRED GRANULE / BROWSE (HTTP / FTP) based on the DigitalTransferOptions or BrowseGraphic online access URIs in the CWIC GetRecordById response