
1 Parameter Vocabularies in the NERC DataGrid Project
Presented by Roy Lowry, British Oceanographic Data Centre, on behalf of the NERC DataGrid Team and the MMI Community
Workshop on Grid Middleware and Geospatial Standards for Earth System Science Data, Edinburgh, September 2005

2 Players
- NERC DataGrid Team
  - BADC: Bryan Lawrence, Sue Latham, Marta Gutierrez
  - CCLRC e-Science Centre: Andrew Woolf, Kevin O’Neill, Dominic Lowe, Kirsten Kleese Van Dam
  - BODC: Roy Lowry, Michael Hughes, Siva Kondapalli, Laura Bird, Ray Cramer
- MMI Community
  - MBARI core team: John Graybeal, Luis Bermudez, Stephanie Watson (now at Texas A&M)
  - Workshop domain leads: Cyndy Chandler, Bob Arko, Julie Thomas, Roy Lowry, Karen Stocks/Mark Costello, Jerome King
  - Many, many more who are too numerous to catalogue

3 Presentation Overview
- Parameter vocabulary types
- Parameter vocabulary issues in NERC DataGrid
- MMI approach to vocabulary harmonisation
- VINE tool demonstration

4 Parameter Vocabulary Types
- There are two types of parameter vocabulary:
  - Parameter Usage Vocabularies (PUVs): contain terms used to describe individual measurements in a dataset (GML phenomena)
  - Parameter Discovery Vocabularies (PDVs): contain terms used to help locate datasets through common, well-known concepts that usually map to groups of phenomena

5 Parameter Usage Vocabularies
- Terms that may be linked to a data value to describe what was measured and how it was measured
- PUVs usually have a key, known as a parameter code, which a parameter dictionary maps to the descriptive terms plus other metadata items
- Parameter dictionary entries must include (or map one-to-one to) a specification of units of measurement
- Specific, narrow and unambiguous terms are best
- Required for ‘use’ metadata
- Examples include CF Standard Names (nearly, for reasons coming up), the SISMER Parameter Dictionary, the MEDS Parameter Dictionary and the BODC Parameter Usage Vocabulary
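
A minimal sketch of the parameter dictionary idea, assuming a simple in-memory table in which each parameter code maps to its descriptive terms and a mandatory unit of measurement. The codes (`TEMPXX01`, `PSALXX01`) and the `ParameterEntry` and `describe` names are hypothetical and are not taken from any real dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterEntry:
    """One row of a hypothetical parameter dictionary."""
    code: str         # short key attached to each data value
    description: str  # descriptive terms: what was measured and how
    units: str        # units of measurement, mandatory for a PUV entry


# Hypothetical dictionary; codes and wording are illustrative only.
PARAMETER_DICTIONARY = {
    entry.code: entry
    for entry in [
        ParameterEntry("TEMPXX01", "Temperature of the water body by CTD", "degC"),
        ParameterEntry("PSALXX01", "Practical salinity of the water body by CTD", "dimensionless"),
    ]
}


def describe(code: str) -> str:
    """Resolve a parameter code into a readable label that includes its units."""
    entry = PARAMETER_DICTIONARY[code]
    return f"{entry.description} [{entry.units}]"
```

Keeping units inside each entry, rather than alongside the data, is what lets a single code stand for one fully specified measurement.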

6 Parameter Usage Vocabularies
- The ultimate goal for PUVs is to provide the means for software agents to decide whether two phenomena are ‘the same’
- The information requirements for this are very high
- A PUV must be able to describe unambiguously:
  - what the phenomenon is
  - the sphere or medium to which it relates
  - the units of measurement
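
One way to picture the ‘same phenomenon’ test, assuming the three required pieces of information have already been extracted into separate fields. The `Phenomenon` type, the `same_phenomenon` function and the unit alias table are hypothetical; a real agent would also need proper unit conversion rather than an alias lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    quantity: str  # what the phenomenon is, e.g. "temperature"
    medium: str    # the sphere or medium it relates to, e.g. "sea water"
    units: str     # units of measurement, e.g. "degC"


# Hypothetical alias table so that trivially different spellings compare equal.
UNIT_ALIASES = {"deg C": "degC", "Celsius": "degC", "kelvin": "K"}


def normalise_units(units: str) -> str:
    """Map a unit spelling onto a canonical form, if an alias is known."""
    return UNIT_ALIASES.get(units, units)


def same_phenomenon(a: Phenomenon, b: Phenomenon) -> bool:
    """True only if all three required pieces of information agree.

    Convertible but different units (degC vs K) are deliberately
    treated as different here.
    """
    return (
        a.quantity == b.quantity
        and a.medium == b.medium
        and normalise_units(a.units) == normalise_units(b.units)
    )
```

If any one field is missing or buried in plaintext, the comparison cannot be made, which is the interoperability problem the next slide describes.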

7 Parameter Usage Vocabularies
- Problems for software agent interoperability between existing PUVs arise because the required information may:
  - be spread over any number of metadata fields
  - be embedded in non-standardised plaintext
  - be missing, so that it has to be generated by an ‘implied semantics agent’ (a human brain)

8 Parameter Usage Vocabularies
- Why the CF Standard Name list is ‘nearly’ a PUV:
  - Many phenomena are fully described by the Standard Name and its associated canonical units
  - However, others require qualification by additional attributes within the CF conventions (usually additional variable attributes), so that a single Standard Name describes a group of different phenomena (e.g. spectral irradiance)
  - One term describing many phenomena falls within the definition of a Parameter Discovery Vocabulary, not a Parameter Usage Vocabulary
- Divorcing the Standard Names from the NetCDF format and the full CF conventions model to produce a stand-alone PUV is therefore an accident waiting to happen
- There is evidence that this divorce is already under way, e.g. the MarineXML GML phenomenon dictionary
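
A sketch of why a Standard Name alone is not a usage key: only the Standard Name combined with its qualifying attributes identifies a single phenomenon. The name `spectral_irradiance_placeholder` and the `wavelength` qualifier are illustrative stand-ins, not real CF Standard Names or attribute names.

```python
def puv_key(standard_name: str, qualifiers: dict[str, str]) -> tuple:
    """Combine a Standard Name with the attributes that qualify it.

    Qualifiers are sorted so that attribute order does not matter.
    """
    return (standard_name, tuple(sorted(qualifiers.items())))


# Hypothetical placeholder, not a real CF Standard Name.
STD = "spectral_irradiance_placeholder"

a = puv_key(STD, {"wavelength": "412 nm"})
b = puv_key(STD, {"wavelength": "443 nm"})
# Same Standard Name, but two distinct phenomena once qualified.
```

Stripping the qualifiers (as a stand-alone Standard Name list would) collapses `a` and `b` into one term, which is exactly the PDV-like behaviour described above.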

9 Parameter Discovery Vocabularies
- Terms, often referred to as keywords, that may be linked to collections of related data values (e.g. ‘winds’ for wind speed and direction)
- Broad terms of even granularity, aligned with concepts familiar to users, work best
- Required for ‘discovery’ metadata
- Examples include GCMD Parameter Valids, the AGU Glossary, SEA-SEARCH Agreed Parameter Groups and the BODC Parameter Discovery Vocabulary

10 Parameter Discovery Vocabularies
- The granularity of PDV terms is a contentious issue
- One camp takes the view that their discovery process comprises ‘search and browse’, requiring coarse-grained PDV terms underpinned by fine-grained PUV terms in use metadata
- The second camp recognises only ‘search’, requiring very fine-grained PDV terms (i.e. using a PUV instead of a PDV in the discovery metadata)
- Camp one argues passionately that camp two’s vocabularies bury them in unwanted information
- Camp two argues with equal passion that camp one’s vocabularies bury them in unwanted search hits

11 Parameter Discovery Vocabularies
- Can these positions be resolved?
- Significant energy could be expended converting camp two followers into happy browsers, but experience indicates such a venture is on a hiding to nothing
- We therefore need to find ways to provide variable granularity for the parameter vocabulary terms used by our discovery search engines
- Could adding ‘broaden’ and ‘refine’ controls to parameter search interfaces be the way forward?
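
‘Broaden’ and ‘refine’ controls could be as simple as walking one step up or down a term hierarchy. This sketch assumes a hypothetical `BROADER` table linking each term to its parent; the terms are illustrative and not drawn from any of the vocabularies named above.

```python
# Hypothetical child -> parent links; a real system would load these from a vocabulary.
BROADER = {
    "chlorophyll-a": "chlorophylls",
    "chlorophyll-b": "chlorophylls",
    "chlorophylls": "pigments",
    "carotenoids": "pigments",
}


def broaden(term: str) -> str:
    """Move one step up the hierarchy; a top-level term broadens to itself."""
    return BROADER.get(term, term)


def refine(term: str) -> list[str]:
    """List the terms one step narrower than the given term."""
    return sorted(child for child, parent in BROADER.items() if parent == term)
```

A ‘search and browse’ user could start at `pigments` and refine, while a ‘search only’ user could start at `chlorophyll-a` and broaden when too few hits come back.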

12 Parameter Vocabularies in NDG
- ‘NDG-enabling’ a dataset entails the following:
  - Choose the tranche of data that is to be the dataset
  - Generate MOLES records for the dataset discovery metadata
  - Generate a CSML document for the dataset use metadata

13 Parameter Vocabularies in NDG
- For parameters, this includes:
  - Populating MOLES data entity parameter elements with terms from one or more PDVs
  - Populating CSML phenomenon definitions with entries from a GML phenomenon dictionary
- This is a prime candidate for automation:
  - Any dataset may be resolved to a parameter set
  - The parameter set may be expressed in terms from the required vocabularies through mappings
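
A minimal sketch of the automation step, assuming the dataset has already been resolved to a list of parameter codes and that a mapping table from those codes to discovery terms exists. The codes, terms and the `discovery_terms` helper are hypothetical; the same helper could be reused with a second table mapping codes to phenomenon dictionary entries.

```python
# Hypothetical mapping from usage (PUV) codes to discovery (PDV) terms.
PUV_TO_PDV: dict[str, set[str]] = {
    "TEMPXX01": {"Water temperature"},
    "PSALXX01": {"Salinity"},
    "CPHLXX01": {"Chlorophyll pigment concentrations"},
}


def discovery_terms(
    parameter_set: list[str], mapping: dict[str, set[str]]
) -> tuple[list[str], list[str]]:
    """Express a dataset's parameter set in a target vocabulary.

    Returns the mapped terms plus any codes that have no mapping yet,
    so gaps can be passed to a human rather than silently dropped.
    """
    terms: set[str] = set()
    unmapped: list[str] = []
    for code in parameter_set:
        if code in mapping:
            terms |= mapping[code]
        else:
            unmapped.append(code)
    return sorted(terms), unmapped
```

Reporting unmapped codes separately reflects the slides' point that automation only works where the mappings have actually been built.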

14 Parameter Vocabularies in NDG
- The reality for BODC:
  - All BODC data are marked up using parameter codes from the BODC Parameter Usage Vocabulary
  - Mappings exist to the BODC Parameter Discovery Vocabulary and GCMD Parameter Valids
  - A Web service is required to translate parameter codes into GML phenomenon dictionary elements

15 Parameter Vocabularies in NDG
- The reality for BADC:
  - Data are marked up using originator plaintext or CF Standard Names
  - Mappings need to be built between these and at least one PDV (manual conversion to GCMD has been used to date)
  - The GML phenomenon dictionary for CF Standard Names needs to be developed from prototype to operational status to handle data marked up in CF, including working out how to handle cases where other CF fields are significant
  - How to generate phenomenon dictionary entries from originator plaintext is under investigation, but not solved

16 Parameter Vocabularies in NDG
- Both BODC and BADC can speak GCMD to some extent, giving us PDV interoperability
- The issues of achieving PUV interoperability through a common GML phenomenon dictionary are recognised, but not addressed
- Extending NDG to further data hosts whilst maintaining interoperability will not be possible without a common, well-managed and maintained PUV that can be extended quickly with QA

17 The MMI Approach
- Parameter vocabulary interoperability has been one of the primary areas of interest for the Marine Metadata Interoperability (MMI) project
- The approach is based on building an ontology containing both discovery and usage terms interlinked by relationships
- Primarily developed by John Graybeal and Luis Bermudez at MBARI

18 The MMI Approach
- Term lists are converted into proto-ontologies (OWL files) using a bespoke tool (voc2OWL), incorporating definitions where available
- The OWL files are loaded into VINE, an extremely powerful term-mapping tool, and relationships (equal, narrower, broader, user-defined) are defined
- Explicit relationships are created by an ergonomic point-and-click sequence; VINE adds inferred relationships
- Metadata indicating mapping confidence level may be linked to individual relationships
- Protégé may then be used to formalise the result into a full-blown ontology
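
The slides do not say which rules VINE uses to infer relationships, so the following is only a hypothetical rule set showing the general idea: `narrower` and `broader` are treated as inverses, `equal` as symmetric, and `broader` as transitive. A triple `(a, "narrower", b)` reads ‘a is narrower than b’; the `infer` function and the example terms are illustrative.

```python
Triple = tuple[str, str, str]  # (subject, relation, object)


def infer(explicit: set[Triple]) -> set[Triple]:
    """Return the explicit triples plus those implied by simple rules.

    Hypothetical rule set (not VINE's documented behaviour):
      - equal is symmetric
      - narrower and broader are inverses of each other
      - broader is transitive
    Rules are applied repeatedly until no new triples appear.
    """
    rels = set(explicit)
    while True:
        new: set[Triple] = set()
        for s, p, o in rels:
            if p == "equal":
                new.add((o, "equal", s))
            elif p == "narrower":
                new.add((o, "broader", s))
            elif p == "broader":
                new.add((o, "narrower", s))
        # Transitivity: a broader b and b broader c gives a broader c.
        for s1, p1, o1 in rels:
            if p1 != "broader":
                continue
            for s2, p2, o2 in rels:
                if p2 == "broader" and s2 == o1 and s1 != o2:
                    new.add((s1, "broader", o2))
        if new <= rels:
            return rels
        rels |= new


explicit = {
    ("CTD temperature", "narrower", "temperature"),
    ("temperature", "narrower", "physical oceanography"),
}
# Two explicit links yield four inferred ones under these rules.
```

Even this small rule set shows why inferred relationships can outnumber explicit ones, as they did at the workshop described on the next slide.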

19 The MMI Approach
- The approach was trialled at a workshop in Boulder, Colorado in August 2005
- Mapping teams comprising domain experts, ontology specialists, a VINE tool specialist and a leader/facilitator were put together to map terms for a specified domain from the available vocabularies
- Altogether, some 3000 relationships (2200 of them inferred) were established in a 2.5-day workshop that included presentations as well as mapping sessions
- Worked extremely well for some domains with well-understood terminology, such as CTD parameters and pigments
- Didn’t work quite so well for the less well-defined ‘benthic habitats’ domain

20 The MMI Approach
- The question many ask of ontology developers is ‘Now you’ve got it, what do you do with it?’
- Luis’s answer is to interface the ontology to a Web Service layer to build a term server
- This can be placed between the discovery portal interface and the metadata repository to provide ‘intelligent’ searching (e.g. find datasets labelled ‘chlorophyll-a’ for the search term ‘pigments’)
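
A sketch of the term server behaviour described above, assuming the ontology has been reduced to a list of ‘is narrower than’ links and that datasets are labelled with individual terms. All term links, dataset names and functions here are hypothetical; a real term server would sit behind a Web Service interface rather than being called directly.

```python
from collections import defaultdict

# Hypothetical (narrower term, broader term) links, as a term server might hold them.
NARROWER_THAN = [
    ("chlorophyll-a", "chlorophylls"),
    ("chlorophylls", "pigments"),
    ("carotenoids", "pigments"),
]

# Hypothetical repository: dataset name -> parameter labels.
DATASETS = {
    "cruise_001": {"chlorophyll-a", "temperature"},
    "cruise_002": {"salinity"},
}


def expand(term: str) -> set[str]:
    """Return the term plus every term narrower than it, at any depth."""
    children: dict[str, set[str]] = defaultdict(set)
    for child, parent in NARROWER_THAN:
        children[parent].add(child)
    result, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def search(term: str) -> list[str]:
    """Find datasets labelled with the search term or any narrower term."""
    wanted = expand(term)
    return sorted(name for name, labels in DATASETS.items() if labels & wanted)
```

With these assumptions, `search("pigments")` finds `cruise_001` even though that dataset is labelled only with `chlorophyll-a`.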

21 The MMI Approach

22 VINE Demonstration
- VINE tool downloaded from the MMI site (http://marinemetadata.org)
- Many vocabularies are available on the site as proto-ontologies (OWL encodings)
- The voc2OWL tool is available to convert further resources (MMI would appreciate a copy of any OWL files produced, to post on the site)
- Demonstration set up for 3 vocabularies (GCMD parameter keywords, BODC Parameter Discovery Vocabulary and CORIS dictionary)

23 VINE Demonstration
- Driver file to do this:
- A similar-sized RDF file is required to redirect resource URLs to files