Parameter Vocabularies in the NERC DataGrid Project. Presented by Roy Lowry, British Oceanographic Data Centre.


Parameter Vocabularies in the NERC DataGrid Project
Presented by Roy Lowry, British Oceanographic Data Centre, on behalf of the NERC DataGrid Team and MMI Community
Workshop on Grid Middleware and Geospatial Standards for Earth System Science Data, Edinburgh, September 2005

Players
• NERC DataGrid Team
  • BADC: Bryan Lawrence, Sue Latham, Marta Gutierrez
  • CCLRC e-Science Centre: Andrew Woolf, Kevin O’Neill, Dominic Lowe, Kerstin Kleese van Dam
  • BODC: Roy Lowry, Michael Hughes, Siva Kondapalli, Laura Bird, Ray Cramer
• MMI Community
  • MBARI core team: John Graybeal, Luis Bermudez, Stephanie Watson (now at Texas A&M)
  • Workshop domain leads: Cyndy Chandler, Bob Arko, Julie Thomas, Roy Lowry, Karen Stocks/Mark Costello, Jerome King
  • Many, many more who are too numerous to catalogue

Presentation Overview
• Parameter vocabulary types
• Parameter vocabulary issues in NERC DataGrid
• MMI approach to vocabulary harmonisation
• VINE tool demonstration

Parameter Vocabulary Types
• There are two types of parameter vocabulary:
  • Parameter Usage Vocabularies
    • Contain terms used to describe individual measurements in a dataset (GML phenomena)
  • Parameter Discovery Vocabularies
    • Contain terms used to help locate datasets through commonly known concepts that usually map to groups of phenomena

Parameter Usage Vocabularies
• Terms that may be linked to a data value, describing what was measured and how it was measured
• PUVs usually have a key, known as a parameter code, mapped to the descriptive terms plus other metadata items through a parameter dictionary
• Parameter dictionary entries must include (or map one-to-one to) a specification of units of measurement
• Specific, narrow and unambiguous terms are best
• Required for ‘use’ metadata
• Examples include CF Standard Names (nearly, for reasons covered later), SISMER Parameter Dictionary, MEDS Parameter Dictionary and BODC Parameter Usage Vocabulary

Parameter Usage Vocabularies
• The ultimate goal for PUVs is to provide the means for software agents to decide whether two phenomena are ‘the same’
• Information requirements for this are very high
• Must be able to unambiguously describe:
  • What the phenomenon is
  • The sphere or medium to which it relates
  • The units of measurement
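The three requirements above can be illustrated with a minimal sketch. It assumes each dictionary entry has already been reduced to three clean fields, which is exactly the condition existing PUVs often fail to meet. The class, the function names and the sample entries are hypothetical and are not taken from any real parameter dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    """A parameter dictionary entry reduced to the three facets listed above."""
    entity: str  # what the phenomenon is
    medium: str  # the sphere or medium to which it relates
    units: str   # the units of measurement


def normalise(text: str) -> str:
    """Case-fold and collapse whitespace so trivial spelling differences are ignored."""
    return " ".join(text.lower().split())


def same_phenomenon(a: Phenomenon, b: Phenomenon) -> bool:
    """Treat two entries as 'the same' only if all three facets agree."""
    return all(
        normalise(getattr(a, field)) == normalise(getattr(b, field))
        for field in ("entity", "medium", "units")
    )


# Hypothetical entries, invented for illustration only.
sea_temp = Phenomenon("temperature", "water body", "degC")
sea_temp_alt = Phenomenon("Temperature", "Water  body", "degC")
air_temp = Phenomenon("temperature", "atmosphere", "degC")
```

Here `same_phenomenon(sea_temp, sea_temp_alt)` holds while `same_phenomenon(sea_temp, air_temp)` does not, because the medium differs. A real agent would also need unit conversion and synonym handling, which this sketch leaves out.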

Parameter Usage Vocabularies
• Problems for software agent interoperability between existing PUVs are:
  • The required information may:
    • Be spread over any number of metadata fields
    • Be embedded in non-standardised plaintext
    • Be missing, needing to be generated by an ‘implied semantics agent’ (human brain)

Parameter Usage Vocabularies
• Why the CF Standard Name list is ‘nearly’ a PUV
  • Many phenomena are fully described by the Standard Name and associated canonical units
  • However, others require qualification by additional attributes within the CF conventions (usually additional variable attributes), resulting in a single Standard Name describing a group of different phenomena (e.g. spectral irradiance)
  • One term describing many phenomena falls into the definition of a Parameter Discovery Vocabulary, not a Parameter Usage Vocabulary
  • Divorcing the Standard Names from the NetCDF format and the full CF convention model to produce a stand-alone PUV is therefore an accident waiting to happen
  • There is evidence that this divorce is underway, e.g. the MarineXML GML phenomenon dictionary

Parameter Discovery Vocabularies
• Terms that may be linked to collections of related data values (e.g. winds for wind speed and direction), which are often referred to as keywords
• Broad terms with even granularity, aligned with concepts familiar to users, work best
• Required for ‘discovery’ metadata
• Examples include GCMD Parameter Valids, AGU Glossary, SEA-SEARCH Agreed Parameter Groups and BODC Parameter Discovery Vocabulary

Parameter Discovery Vocabularies
• Granularity of PDV terms is a contentious issue
  • One camp takes the view that their discovery process comprises ‘search and browse’, requiring coarse-grained PDV terms underpinned by fine-grained PUV terms in use metadata
  • The second camp recognises only ‘search’, requiring very fine-grained PDV terms (i.e. using a PUV instead of a PDV in the discovery metadata)
  • Camp one argues passionately that camp two’s vocabularies bury them in unwanted information
  • Camp two argues with equal passion that camp one’s vocabularies bury them in unwanted search hits

Parameter Discovery Vocabularies
• Can these positions be resolved?
  • Significant energy could be expended converting camp two followers into happy browsers, but experience indicates such a venture is a hiding to nothing
  • We therefore need to find ways to provide variable granularity for the parameter vocabulary terms used by our discovery search engines
  • Could addition of ‘broaden’ and ‘refine’ controls to parameter search interfaces be the way forward?

Parameter Vocabularies in NDG
• ‘NDG-enabling’ a dataset entails the following:
  • Choose the tranche of data that is to be the dataset
  • Generate MOLES records for the dataset discovery metadata
  • Generate a CSML document for the dataset use metadata

Parameter Vocabularies in NDG
• For parameters, this includes:
  • Populating MOLES data entity parameter elements with terms from one or more PDVs
  • Populating CSML phenomenon definitions with entries from a GML phenomenon dictionary
• This is a prime candidate for automation
  • Any dataset may be resolved to a parameter set
  • The parameter set may be expressed in the terms from the required vocabularies through mappings
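The automation idea can be sketched as a lookup over mapping tables. This is only an illustration under stated assumptions: the parameter codes, vocabulary names and terms below are invented placeholders, not real BODC codes or GCMD terms, and the function `express` is hypothetical rather than part of any NDG software.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping tables: usage code -> term in a target vocabulary.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "discovery_vocab_a": {
        "TEMP0001": "water temperature",
        "PSAL0001": "salinity",
        "CPHL0001": "pigments",
        "CPHL0002": "pigments",
    },
    "discovery_vocab_b": {
        "TEMP0001": "ocean temperature",
        "PSAL0001": "salinity",
    },
}


def express(parameter_set: Iterable[str], vocabulary: str) -> Tuple[List[str], List[str]]:
    """Translate a dataset's parameter codes into terms from one vocabulary.

    Returns the sorted, de-duplicated target terms plus the sorted codes
    that have no mapping and therefore still need manual attention.
    """
    table = MAPPINGS[vocabulary]
    terms = set()
    unmapped = set()
    for code in parameter_set:
        if code in table:
            terms.add(table[code])
        else:
            unmapped.add(code)
    return sorted(terms), sorted(unmapped)


dataset_codes = {"TEMP0001", "CPHL0001", "CPHL0002"}
terms_a, missing_a = express(dataset_codes, "discovery_vocab_a")
terms_b, missing_b = express(dataset_codes, "discovery_vocab_b")
```

Several usage codes collapse onto one discovery term in the first vocabulary, while the second vocabulary reports the codes it cannot translate. Returning the unmapped codes explicitly keeps gaps in the mappings visible instead of silently dropping parameters.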

Parameter Vocabularies in NDG
• The reality for BODC
  • All BODC data are marked up using parameter codes from the BODC Parameter Usage Vocabulary
  • Mappings exist to BODC Parameter Discovery Vocabulary and GCMD Parameter Valids
  • Web service required to translate parameter codes into GML phenomenon dictionary elements

Parameter Vocabularies in NDG
• The reality for BADC
  • Data marked up using originator plaintext or CF Standard Names
  • Mappings need to be built between these and at least one PDV (manual conversion to GCMD used to date)
  • GML phenomenon dictionary for CF Standard Names needs to be developed from prototype to operational status to handle data marked up in CF, including sorting out how to handle cases where other CF fields are significant
  • How to generate phenomenon dictionary entries from originator plaintext is under investigation, but not solved

Parameter Vocabularies in NDG
• Both BODC and BADC can speak GCMD to some extent, giving us PDV interoperability
• The issues of achieving PUV interoperability through a common GML phenomenon dictionary are recognised, but not addressed
• Extending NDG to further data hosts whilst maintaining interoperability will not be possible without a common, well-managed and well-maintained PUV (one that is extended quickly, with QA)

The MMI Approach
• Parameter vocabulary interoperability has been one of the primary areas of interest for the Marine Metadata Interoperability project
• Approach is based on building an ontology containing both discovery and usage terms interlinked by relationships
• Primarily developed by John Graybeal and Luis Bermudez at MBARI

The MMI Approach
• Term lists converted into proto-ontologies (OWL files) using a bespoke tool (voc2OWL), incorporating definitions where available
• OWL files loaded into VINE, an extremely powerful term-mapping tool, and relationships (equal, narrower, broader, user-defined) defined
• Explicit relationships created by an ergonomic point-and-click sequence; VINE adds inferred relationships
• Metadata indicating mapping confidence level may be linked to individual relationships
• Protégé may then be used to formalise the result into a full-blown ontology
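The slides do not say how VINE derives its inferred relationships. One plausible rule, shown here purely as an assumption, is transitivity of ‘narrower’: if A is narrower than B and B is narrower than C, then A is narrower than C. The function and the sample terms below are hypothetical and do not reproduce VINE's actual algorithm.

```python
from typing import Set, Tuple

Relation = Tuple[str, str]  # (narrower term, broader term)


def infer_narrower(explicit: Set[Relation]) -> Set[Relation]:
    """Return the 'narrower' pairs implied by transitivity but not stated explicitly."""
    closure = set(explicit)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # Chain a -> b and b -> d into a -> d, skipping self-relations.
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure - set(explicit)


# Hypothetical explicit mappings, as a mapping team might enter them by hand.
explicit_relations = {
    ("chlorophyll-a", "chlorophyll"),
    ("chlorophyll", "pigments"),
    ("pigments", "biological parameters"),
}
inferred_relations = infer_narrower(explicit_relations)
```

Three explicit relationships yield three further inferred ones in this example, which gives a feel for how a workshop can end up with more inferred than hand-entered mappings.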

The MMI Approach
• Approach trialled at a workshop in Boulder, Colorado in August 2005
• Mapping teams comprising domain experts, ontology specialists, a VINE tool specialist and a leader/facilitator were put together to map terms for a specified domain from available vocabularies
• Altogether some 3000 relationships (2200 inferred) established in a 2.5-day workshop, including presentations as well as mapping sessions
• Worked extremely well for some domains with well understood terminology, such as CTD parameters and pigments
• Didn’t work quite so well for the less well-defined ‘benthic habitats’ domain

The MMI Approach
• The question many people ask of ontology developers is ‘Now you’ve got it, what do you do with it?’
• Luis’s answer is to interface the ontology to a Web Service layer to build a term server
• Can be placed between discovery portal interface and metadata repository to provide ‘intelligent’ searching (e.g. find datasets labelled ‘chlorophyll-a’ for search term ‘pigments’)
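The ‘intelligent’ searching example can be sketched as query expansion: the term server widens the user's search term to include every narrower term before the metadata repository is queried. The term hierarchy, dataset names and labels below are invented for illustration, and the in-memory dictionaries stand in for the ontology and repository that a real term server would sit between.

```python
from typing import Dict, List, Set

# Hypothetical fragment of an ontology: term -> directly narrower terms.
NARROWER: Dict[str, List[str]] = {
    "pigments": ["chlorophyll-a", "carotenoids"],
    "chlorophyll-a": [],
    "carotenoids": [],
}

# Hypothetical metadata repository: dataset name -> parameter labels.
DATASETS: Dict[str, Set[str]] = {
    "cruise_001": {"chlorophyll-a", "temperature"},
    "cruise_002": {"salinity"},
    "cruise_003": {"pigments"},
}


def expand(term: str) -> Set[str]:
    """Return the term together with all of its narrower terms, at any depth."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(NARROWER.get(current, []))
    return seen


def search(term: str) -> List[str]:
    """Find datasets labelled with the term or with any narrower term."""
    wanted = expand(term)
    return sorted(name for name, labels in DATASETS.items() if labels & wanted)


pigment_hits = search("pigments")
chlorophyll_hits = search("chlorophyll-a")
```

A search for ‘pigments’ finds the dataset labelled only ‘chlorophyll-a’, while a search for ‘chlorophyll-a’ does not return the dataset labelled with the broader term, so the expansion runs in one direction only.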

The MMI Approach

VINE Demonstration
• VINE tool downloaded from MMI site (
• Many vocabularies available on the site as proto-ontologies (OWL encodings)
• voc2OWL tool available to convert further resources (MMI would appreciate a copy of OWL files produced to post on the site)
• Demonstration set up for 3 vocabularies (GCMD parameter keywords, BODC Parameter Discovery Vocabulary and CORIS dictionary)

VINE Demonstration
• Driver file to do this:
• A similarly sized RDF file is required to redirect resource URLs to files