NERC DataGrid Vocabulary Workshop, RAL, February 25, 2009
NERC DataGrid Vocabulary Server Description

Outline
Vocabulary Server:
- Data model
- Implementation
- Content
- Usage
- Development path

Vocabulary Server Data Model
The fundamental building block of the data model is a term, which is equivalent to a SKOS “concept”.
Each term has:
- Key: a semantically neutral string that forms the basis of a URN
- Label: a human-readable name for the concept
- Alternative label: used for abbreviations
- Definition: a more verbose explanation of the concept
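
The four term attributes above can be sketched as a small record type. This is only an illustration: the class name, the `urn` helper, the `urn:example` namespace and the sample identifiers are all hypothetical, and the server's real URN syntax is not given on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """One vocabulary term, roughly a SKOS concept."""
    key: str                         # semantically neutral string
    label: str                       # human-readable name
    definition: str                  # more verbose explanation
    alt_label: Optional[str] = None  # abbreviation, if any

    def urn(self, list_id: str, namespace: str = "urn:example") -> str:
        # Hypothetical URN layout: the key is combined with the
        # identifier of the list that holds the term.
        return f"{namespace}:{list_id}:{self.key}"


# Made-up sample values, not taken from the real server content.
term = Term(key="TEMP", label="Temperature",
            definition="Hypothetical demo term", alt_label="Temp")
print(term.urn("X011"))
```

Making the record immutable reflects the idea that a key, once issued, should keep identifying the same concept.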

Vocabulary Server Data Model
- The terms are aggregated into lists, equivalent to SKOS ‘collections’
- Each list is given a semantically neutral identifier (4-byte string)
- Lists may be aggregated into ‘Superlists’
- Each ‘Superlist’ is given a semantically opaque identifier (bytes 1-3 of the component list identifiers)

Vocabulary Server Data Model
- The ‘Superlist’ concept was inherited from 1980s BODC infrastructure
- It has no parallel in any knowledge representation standard
- It has the unpleasant side effect of giving terms alternative possible URNs
- Its deprecation is becoming a priority
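
The relationship between list and superlist identifiers, and the way it leads to alternative URNs, can be sketched as follows. The function names, the `urn:example` namespace and the sample identifiers are hypothetical; only the "first three bytes of a 4-byte list identifier" rule comes from the slides, and the sketch assumes single-byte (ASCII) identifiers.

```python
from typing import List


def superlist_id(list_id: str) -> str:
    """Derive the superlist identifier from a 4-byte list identifier."""
    if len(list_id) != 4:
        raise ValueError("list identifiers are 4-byte strings")
    return list_id[:3]  # bytes 1-3 of the list identifier


def candidate_urns(list_id: str, key: str,
                   namespace: str = "urn:example") -> List[str]:
    """Return both URNs a term could plausibly be addressed by.

    The URN layout is an assumption made for illustration.
    """
    return [
        f"{namespace}:{list_id}:{key}",                # via its list
        f"{namespace}:{superlist_id(list_id)}:{key}",  # via its superlist
    ]


print(candidate_urns("X011", "TEMP"))
```

Two addresses for one concept is the side effect that motivates deprecating superlists.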

Vocabulary Server Implementation
- Server back end is an Oracle relational database
- All terms are stored in a single table
- List and superlist aggregations implemented as a 2-level indexing table hierarchy
- Heavily defended by constraints and triggers
- Fully automated timestamps and update ‘fingerprints’
- Fully automated audit trails
- Fully automated list and superlist versioning

Vocabulary Server Implementation
Term URLs, list URLs and API calls invoke Java applications that submit SQL queries and wrap up the output as XML documents.
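
The query-then-wrap pattern can be sketched in a few lines. This is not the server's code: the real system uses Java against Oracle, whereas this sketch uses Python with an in-memory SQLite database, and the table names, column names, XML element names and sample row are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_demo_db() -> sqlite3.Connection:
    """Create a toy schema: one term table plus a list-membership index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE term (
            term_key   TEXT PRIMARY KEY,
            label      TEXT NOT NULL,
            alt_label  TEXT,
            definition TEXT
        );
        CREATE TABLE list_member (
            list_id  TEXT NOT NULL,
            term_key TEXT NOT NULL REFERENCES term(term_key),
            PRIMARY KEY (list_id, term_key)
        );
    """)
    conn.execute("INSERT INTO term VALUES (?, ?, ?, ?)",
                 ("TEMP", "Temperature", "Temp", "Hypothetical demo term"))
    conn.execute("INSERT INTO list_member VALUES (?, ?)", ("X011", "TEMP"))
    return conn


def list_as_xml(conn: sqlite3.Connection, list_id: str) -> str:
    """Run one SQL query and wrap the rows as an XML document."""
    rows = conn.execute(
        "SELECT t.term_key, t.label, t.alt_label, t.definition "
        "FROM term t JOIN list_member m ON m.term_key = t.term_key "
        "WHERE m.list_id = ? ORDER BY t.term_key",
        (list_id,),
    ).fetchall()
    root = ET.Element("list", {"id": list_id})
    for key, label, alt_label, definition in rows:
        term = ET.SubElement(root, "term", {"key": key})
        ET.SubElement(term, "label").text = label
        if alt_label is not None:
            ET.SubElement(term, "altLabel").text = alt_label
        ET.SubElement(term, "definition").text = definition
    return ET.tostring(root, encoding="unicode")


print(list_as_xml(build_demo_db(), "X011"))
```

Building the document with an XML library rather than string concatenation means special characters in labels and definitions are escaped automatically.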

Vocabulary Server Implementation
Why not XML?
- Grew out of an integral part of the BODC Oracle infrastructure
- Experiments with XML technology, particularly OWL, did not go well
  - Maintenance tools seem less effective
  - Navigation difficulties through very large XML documents
  - Performance issues with lists containing terms
- XML has benefits, such as access to inference engines, so it is worth persevering
- Answer might be to have operational XML builds from a relational back end

Vocabulary Server Content
Server Contents ( )
- 76 public superlists
- 125 public lists
- public terms
- public mappings (RDF triples)
Some of the subject areas covered:
- Parameters
- Platforms
- Instruments
- Coverage terms
- Geographic keywords

Vocabulary Server Usage
Server Usage for 2008 (2009 to in brackets)
- (607172) total hits
- (7134) vocabulary catalogue downloads
- (10233) vocabulary term/list downloads
- 1367 (433) vocabulary map downloads
- 2479 (73) term searches
- 1501 (74) term verifications
- Rest of total is robots mining semantic links (getRelatedRecordByTerm method)

VS Development Path
- Version 1.1: current operational version
- Version 1.2: currently under development
  - Transparent upgrade (no change to WSDL)
  - Bug fix and activation of versioned list serving
  - Additional service API providing list content upgrade functionality to authenticated, authorised external users

VS Development Path
- Version 2.0: currently being designed
  - Revisit back end design
    - Governance labelling
    - Deprecation support
    - Introduce more XML technology?
  - Introduce formally-registered, truly permanent URNs
  - Single RESTful API giving both read and write access through appropriate HTTP methods
  - Output document revision to SKOS 2008
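
The idea of one API serving reads and writes through different HTTP methods can be sketched with a toy dispatcher. Version 2.0 was still being designed, so everything here is hypothetical: the `/list/key` path layout, the choice of GET for reads and PUT for writes, the status codes and the in-memory store. Rejecting DELETE is also an assumption, chosen to fit the stated preference for deprecation support.

```python
from typing import Dict, Optional, Tuple

Store = Dict[Tuple[str, str], dict]


def handle(store: Store, method: str, path: str,
           body: Optional[dict] = None) -> Tuple[int, Optional[dict]]:
    """Dispatch one request on a '/<list_id>/<term_key>' resource."""
    parts = [p for p in path.split("/") if p]
    if len(parts) != 2:
        return 404, None
    ident = (parts[0], parts[1])

    if method == "GET":                      # read access
        if ident in store:
            return 200, store[ident]
        return 404, None

    if method == "PUT":                      # write access
        if body is None:
            return 400, None
        created = ident not in store
        store[ident] = body
        return (201 if created else 200), body

    return 405, None                         # any other method is refused


store: Store = {}
print(handle(store, "PUT", "/X011/TEMP", {"label": "Temperature"}))
print(handle(store, "GET", "/X011/TEMP"))
```

A real service would put authentication and authorisation in front of the write branch, in line with the authenticated, authorised access planned for external users.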

VS Development Path
- Whatever happens with V2.0, we will not annoy a large and very active user base through change
- Both versions will therefore run in parallel until V1.2 calls are no longer logged