M. Benno Blumenthal International Research Institute for Climate and Society Connecting netcdf/CF to a semantic framework

RDF/OWL and earth science metadata
The standards underlying the Semantic Web, Resource Description Framework (RDF) and Web Ontology Language (OWL) among others, show great promise in addressing some of the basic problems in earth science metadata. In particular, they provide a single framework that allows us to describe datasets according to multiple standards, creating a more complete description than any single standard can support, and avoiding the difficult problem of creating a super-standard that can describe everything about everything.

RDF is not a killer app
- Resource Description Framework (RDF) is a framework for writing down relationships in a reusable way: a semantic framework.
- A lot of CF is not written down in a usable way, e.g. relationships between different values of standard_name, or how data representations in CF correspond to concepts or other standards.
- It is past time to fix that.
- Some would prefer Java, some prefer English.

Different Representations of the CF Standard
CF: English, Java, RDF/OWL
- English: gloriously vague and flexible.
- Java: a complete implementation makes a great black box; an API becomes yet another standard.
- RDF/OWL: can facilitate creating/enhancing both the English and Java versions (as well as versions in other programming languages).

CF metadata in a semantic framework
- A literal level, which explains which attributes are available to be attached to datasets/variables.
- A more semantic level, which gives explicit expression to concepts like Coordinate and Non-Coordinate variables, and to how a Non-Coordinate Variable can be geo-located.

Semantic interoperability
Writing down the CF standard on a semantic level then allows interoperability with other standards, e.g. other ways of marking geolocated Non-Coordinate variables.

Additional issues
- How to tag metadata in netcdf files less ambiguously, so that software can more easily determine which attribute belongs to which metadata standard.
- How to better register netcdf metadata standards in general.
- How to better register CF concepts so that interoperability (or indeed operability) can occur.

Why RDF?
- Makes implicit semantics explicit.
- Web-based system for interoperating semantics.
- Decontextualizes the information, facilitating reuse.
- RDF/OWL is an emerging technology, so tools are being built that help solve the semantic problems in handling data.

Standard Metadata (diagram: Users, Datasets, Tools, Standard Metadata Schema/Data Services)

Many Data Communities (diagram: five communities, each with its own Tools, Users, Datasets, and Standard Metadata Schema)

Super Schema (diagram: five communities, each with its own Tools, Users, Datasets, and Standard Metadata Schema, plus one shared standard metadata schema)

One take on semantic interoperability
`When I use a word,' Humpty Dumpty said in rather a scornful tone, `it means just what I choose it to mean--neither more nor less.'
`The question is,' said Alice, `whether you CAN make words mean so many different things.'
`The question is,' said Humpty Dumpty, `which is to be master--that's all.'
Through the Looking Glass (And What Alice Found There). Carroll, Lewis. Published: 1871. Type(s): Novels, Young Readers, Fantasy. Source: Wikisource.

Super Schema: direct (diagram: five communities, each with its own Tools, Users, Datasets, and Standard Metadata Schema, plus one shared standard metadata schema/data service)

Flaws
- A lot of work.
- The Super Schema/Service is the lowest common denominator, so you end up saying less and less about more and more.
- Science keeps evolving, so standards either fall behind or constantly change.

RDF Standard Data Model Exchange (diagram: five communities, each with its own Tools, Users, Datasets, and Standard Metadata Schema, plus one shared standard metadata schema and RDF)

RDF Data Model Exchange (diagram: five communities, each with its own Tools, Users, Datasets, Standard Metadata Schema, and RDF, plus one shared standard metadata schema)

Why is this better?
- Maps the original dataset metadata into a standard format that can be transported and manipulated.
- There is still the same impedance mismatch when mapping to the least-common-denominator standard metadata, but when a better standard comes along, the original complete-but-nonstandard metadata is already there to be remapped, and “late semantic binding” means everyone can use the new semantic mapping.
- Can use enhanced mappings between models that have common concepts beyond the least common denominator.
- EASIER: tools to enhance the mapping process, and mappings build on other mappings.

RDF Architecture (diagram: RDF; virtual (derived) RDF; queries)

RDF: framework for writing connections
- Triples of Subject, Property (or Predicate), Object.
- URIs identify things, i.e. most of the above.
- Namespaces are used as a convenient shorthand for the URIs.
- URIs do not need to resolve.
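A minimal sketch of the ideas on this slide, in plain Python rather than an RDF library: a triple is a (subject, predicate, object) record, and a namespace prefix is shorthand for a URI. The `dc` URI is the Dublin Core elements namespace; the `ex` prefix, its URI, and the helper name `expand` are assumptions made up for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Prefix table. "dc" is the Dublin Core elements namespace;
# "ex" is a placeholder namespace invented for this sketch.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/data#",
}


def expand(qname: str) -> str:
    """Expand a prefixed name such as 'dc:title' into a full URI."""
    prefix, _, local = qname.partition(":")
    if prefix not in NAMESPACES:
        raise KeyError(f"unknown prefix: {prefix}")
    return NAMESPACES[prefix] + local


# The URIs only need to identify things; nothing here tries to fetch them.
woa_title = Triple(expand("ex:WOA"), expand("dc:title"), "NOAA NODC WOA01")
print(woa_title)
```

The object of this triple is a plain literal; an object could equally be another expanded URI.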

Datatype Properties
{WOA} dc:title “NOAA NODC WOA01”
{WOA} dc:description “NOAA NODC WOA01: World Ocean Atlas 2001, an atlas of objectively analyzed fields of major ocean parameters at monthly, seasonal, and annual time scales. Resolution: 1x1; Longitude: global; Latitude: global; Depth: [0 m,5500 m]; Time: [Jan,Dec]; monthly”

Object Properties
{WOA} iridl:isContainerOf {Grid-1x1},
{Grid-1x1} iridl:isContainerOf {Monthly}

WOA01 diagram

Standard Properties
{WOA} dcterm:hasPart {Grid-1x1},
{Grid-1x1} dcterm:hasPart {MONTHLY}
Alternatively:
{WOA} iridl:isContainerOf {Grid-1x1},
{iridl:isContainerOf} rdfs:subPropertyOf {dcterm:hasPart}
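The "alternatively" form works because an rdfs:subPropertyOf statement lets a reasoner derive the dcterm:hasPart triples from the iridl:isContainerOf ones. The following is a small sketch of that single RDFS rule over tuples, not a full reasoner; the function name `infer_subproperties` is an assumption for illustration.

```python
SUBPROPERTY = "rdfs:subPropertyOf"


def infer_subproperties(triples):
    """Return the input triples plus those implied by rdfs:subPropertyOf."""
    closed = set(triples)
    changed = True
    while changed:  # repeat until no new triple is derived
        changed = False
        # All (sub-property, super-property) pairs known so far.
        pairs = [(s, o) for (s, p, o) in closed if p == SUBPROPERTY]
        for s, p, o in list(closed):
            for sub, sup in pairs:
                if p == sub and (s, sup, o) not in closed:
                    closed.add((s, sup, o))
                    changed = True
    return closed


stated = {
    ("WOA", "iridl:isContainerOf", "Grid-1x1"),
    ("Grid-1x1", "iridl:isContainerOf", "MONTHLY"),
    ("iridl:isContainerOf", SUBPROPERTY, "dcterm:hasPart"),
}
derived = infer_subproperties(stated) - stated
for triple in sorted(derived):
    print(triple)
```

Only the local iridl statements are written down; the standard dcterm statements appear as derived triples.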

Data Structures in RDF
{SST} rdf:type {cfatt:non_coordinate_variable},
{SST} cfobj:standard_name {cf:sea_surface_temperature},
{SST} netcdf:hasDimension {longitude}
- Object properties provide a framework for explicitly writing down relationships between data objects/components, e.g. the vague meaning of nesting is made explicit.
- Properties can also be related, since they are objects too.

Virtual Triples
- Use conventions to connect concepts to established sets of concepts.
- Generate additional “virtual” triples from the original set and semantics.
- RDFS: some property/class semantics.
- OWL: additional property/class semantics, i.e. more sophisticated (ontological) relationships.
- SWRL: rules for constructing virtual triples.

Define terms
- Attribute Ontology
- Object Ontology
- Term Ontology
These are different ways RDF can be used.

Attribute Ontology
- Subjects are the only object-type resources.
- Predicates are “attributes”.
- Objects are datatype values.
- Isomorphic to simple data tables.
- Isomorphic to netcdf attributes of datasets.
- Some faceted browsers: predicate = facet, e.g. Longwell from MIT.

cf-att – CF transcribed

cf-att with some attributes

RDF helps decontextualize
{sst variable} cfatt:standard_name “sea_surface_temperature”
where cfatt = the cfatt URI prefix, temporarily http://iridl.ldeo.columbia.edu/ontologies/cf-att.owl#
- Put data in a netcdf file.
- Set the Conventions attribute to “CF-1.0”.
- Set standard_name of variable “sst” to “sea_surface_temperature”.
The current system requires data in a netcdf file for CF to be understood.
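One way to picture the decontextualizing step: once attributes have been read out of a file, each one can be restated as a triple that no longer depends on the file. This is a hedged sketch only. The nested dictionary stands in for attributes read by a real netcdf reader, the file URI is a placeholder, the function name `attribute_triples` is invented here, and the cfatt prefix is the temporary URI given on the slide.

```python
# Temporary cfatt prefix as given on the slide.
CFATT = "http://iridl.ldeo.columbia.edu/ontologies/cf-att.owl#"

# Stand-in for attributes read from a netcdf file (assumed structure).
dataset = {
    "global": {"Conventions": "CF-1.0"},
    "variables": {
        "sst": {"standard_name": "sea_surface_temperature"},
    },
}


def attribute_triples(dataset, base_uri):
    """Restate each variable attribute as a (subject, predicate, literal) triple."""
    conventions = dataset.get("global", {}).get("Conventions", "")
    if not conventions.startswith("CF-"):
        return []  # without the CF declaration, the attributes are not read as CF
    triples = []
    for var_name, attrs in sorted(dataset["variables"].items()):
        subject = f"{base_uri}#{var_name}"
        for attr_name, value in sorted(attrs.items()):
            triples.append((subject, CFATT + attr_name, value))
    return triples


# "http://example.org/sst.nc" is a made-up identifier for the file.
for triple in attribute_triples(dataset, "http://example.org/sst.nc"):
    print(triple)
```

The resulting statement carries its own convention in the predicate URI, so it can be understood outside the file it came from.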

Object Ontology
- Objects are object-type.
- Isomorphic to “belongs to”.
- Isomorphic to multiple data tables connected by keys.
- Expresses the concept behind netcdf attributes which name variables.
- Concepts as objects can be cross-walked.
- Concepts as objects can be interrelated.

Example: controlled vocabulary
{variable} cfatt:standard_name {“string”}
where string has to belong to a list of possibilities.
{variable} cfobj:standard_name {stdnam}
where stdnam is an individual of the class cfobj:StandardName.

Example: controlled vocabulary
A bidirectional crosswalk between the two is somewhat trivial, which means all my objects will have both cfatt:standard_name and cfobj:standard_name.
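A sketch of such a crosswalk, under the assumption that the object form of a standard name is simply the string with a `cf:` prefix (the shape used in {cf:sea_surface_temperature}). The function name `crosswalk` and the two-entry vocabulary are illustrative; the real controlled vocabulary is the CF standard name table.

```python
CFATT_NAME = "cfatt:standard_name"   # literal (string) form
CFOBJ_NAME = "cfobj:standard_name"   # object (individual) form


def crosswalk(triples, vocabulary):
    """Add the missing partner triple for every standard_name statement."""
    result = set(triples)
    for s, p, o in triples:
        if p == CFATT_NAME:
            # The string form must come from the controlled list.
            if o not in vocabulary:
                raise ValueError(f"{o!r} is not in the controlled vocabulary")
            result.add((s, CFOBJ_NAME, "cf:" + o))
        elif p == CFOBJ_NAME and o.startswith("cf:"):
            result.add((s, CFATT_NAME, o[len("cf:"):]))
    return result


# Tiny excerpt standing in for the full list of standard names.
VOCABULARY = {"sea_surface_temperature", "air_temperature"}

stated = {
    ("sst", CFATT_NAME, "sea_surface_temperature"),
    ("tas", CFOBJ_NAME, "cf:air_temperature"),
}
for triple in sorted(crosswalk(stated, VOCABULARY)):
    print(triple)
```

After the crosswalk, each variable carries both forms, so file-oriented software and concept-oriented software can each use the one they need.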

Example: controlled vocabulary
- If I am writing software to read/write netcdf files, I use the cfatt ontology and in particular cfatt:standard_name.
- If I am making connections/cross-walks to other variable naming standards, I use cfobj:standard_name.

Some cf-obj classes

Term Ontology
- Concepts as individuals.
- Simple Knowledge Organization System (SKOS) is a prime example.
- standard_name as object would be such an ontology.

Nuanced tagging
- Concepts as objects can be interrelated: specific terms imply broader terms.
- An object ends up being tagged with terms ranging from general to specific.
- Search can then be nuanced.
- Tagging can proceed in the absence of perfect information.
- Partial information can be written down.
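The idea that specific terms imply broader ones can be sketched as a reachability walk over a "directly implies" relation. The terms, the relation table, and the item names below are invented for illustration; they are not taken from any real ontology.

```python
def implied_terms(term, directly_implies):
    """Return the term plus every broader term reachable from it."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(directly_implies.get(current, ()))
    return seen


def search(items, query_term, directly_implies):
    """Find items whose tags imply the query term, however specific the tags are."""
    hits = []
    for name, tags in items.items():
        expanded = set()
        for tag in tags:
            expanded |= implied_terms(tag, directly_implies)
        if query_term in expanded:
            hits.append(name)
    return sorted(hits)


# Hypothetical relation: each term lists the broader terms it directly implies.
DIRECTLY_IMPLIES = {
    "sea_surface_temperature": ["temperature", "ocean"],
    "temperature": ["physical_quantity"],
}

# Items may be tagged at whatever level of detail is known.
ITEMS = {
    "dataset A": ["sea_surface_temperature"],
    "dataset B": ["temperature"],
    "dataset C": ["ocean"],
}

print(search(ITEMS, "temperature", DIRECTLY_IMPLIES))
print(search(ITEMS, "ocean", DIRECTLY_IMPLIES))
```

An item tagged only with a broad term is still found by broad searches, which is how tagging can proceed with partial information.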

CF standard names
.. I would add that standard names alone (in the cases where a standard name is sufficient) have the same kind of role as common concepts. The definitions of standard names allow some vagueness, though some are more precise than others, because their role is to indicate which things should validly be regarded as the same thing by visualisation and processing software.
Jonathan Gregory, 04/22/08 23:22:01, pcmdi.llnl.gov/trac/ticket/24

CF standard regions
I don't think the regions can be exactly standardised, because part of the reason for having names is in order to be somewhat "vague". Just as a common standard name is given to quantities from different data sources when those quantities are regarded as comparable, the same standard region name would be given to data which represent the same region in a way which is regarded as comparable. For instance, different GCMs do not have exactly the same shape for the Atlantic Ocean, but Atlantic meridional overturning streamfunctions are calculated from each model, and these are regarded as comparable.
Jonathan Gregory, Wed, Aug 27, 2008 at 5:11 AM

I.e.
In other words, the broader the standard, the more vague. On the other hand, we can say something. In fact, we can say quite a lot about how these terms interrelate, and how they relate to less broad, less vague systems.

What we can do easily
- Establish URIs for the concepts in CF (standard_names, standard_regions, the attributes themselves) so that statements can be written about them in XML and RDF.
- Establish a machine-readable version of so that we can write code to extract (decontextualize) metadata from netcdf files.
- Start writing down the relationships between the concepts.
- Agree on a cf-att ontology, and work on cf-obj so that we can connect with other conventions.
- Set a convention for explicitly labeling netcdf attributes with their convention, i.e. namespace labels, so that the process of figuring out which convention covers which attribute is purely grammatical.
- Set a convention for referring to a URI-identified concept in a netcdf file.
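To illustrate what "purely grammatical" could mean for namespace labels, here is a sketch in which an attribute name carries its convention as a prefix. The `__` separator, the prefixes, the attribute values, and both function names are assumptions invented for this example; the slides do not specify any particular labeling syntax.

```python
def split_labelled_attribute(name, known_prefixes, separator="__"):
    """Split 'prefix__attribute' into (prefix, attribute).

    The prefix is None when the name carries no recognized label.
    """
    prefix, sep, local = name.partition(separator)
    if sep and prefix in known_prefixes and local:
        return prefix, local
    return None, name


def group_by_convention(attrs, known_prefixes):
    """Group attribute values by the convention named in each label."""
    grouped = {}
    for name, value in attrs.items():
        prefix, local = split_labelled_attribute(name, known_prefixes)
        grouped.setdefault(prefix, {})[local] = value
    return grouped


# Hypothetical attributes of one variable, labelled by convention.
attrs = {
    "cf__standard_name": "sea_surface_temperature",
    "iridl__icon": "sst.png",
    "history": "created 2008",
}
grouped = group_by_convention(attrs, {"cf", "iridl"})
print(grouped)
```

With labels of this kind, software decides which convention covers an attribute by parsing the name alone, with no knowledge of what any convention defines.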

Search Interface
- Items (datasets/maps)
- Terms
- Facets
- Taxa

Search Interface Semantic API
{item} dc:title dc:description rss:link iridl:icon dcterm:isPartOf {item2} dcterm:isReplacedBy {item2}
{item} trm:isDescribedBy {term}
{term} a {facet} of {taxa} of {trm:Term},
{facet} a {trm:Facet},
{taxa} a {trm:Taxa},
{term} trm:directlyImplies {term2}

Faceted Search w/Queries

RDF Architecture (diagram: RDF; virtual (derived) RDF; queries)

IRI RDF Architecture (diagram: Data Servers; Ontologies; MMI; JPL; Standards Organizations; Start Point; RDF Crawler; RDFS Semantics; OWL Semantics; SWRL Rules; SeRQL CONSTRUCT; Search Queries; Location Canonicalizer; Time Canonicalizer; Sesame; Search Interface; bibliography)

Cast of Characters
- NC: netcdf data file format
- CF: Climate and Forecast metadata convention for netcdf
- SWEET: Semantic Web for Earth and Environmental Terminology (OWL ontology)
- IRIDL: IRI Data Library

(diagram: CF attributes; SWEET Ontologies (OWL); Search Terms; CF Standard Names (RDF object); IRIDL Terms; NC basic attributes; IRIDL attributes/objects; SWEET as Terms; CF Standard Names as Terms; Gazetteer Terms; CF data objects; Location)

Thoughts
- A pure RDF framework currently seems viable for a moderate collection of data.
- Potential for making a lot of implicit data conventions explicit.
- Explicit conventions can improve interoperability.
- Simple RDF concepts can greatly impact searches.

Some Thoughts
- Reproducibility implies complete metadata.
- Non-standard complete metadata just needs to be mapped to more standard schemes.
- A multiple-scheme system like RDF retains reproducibility even with partial mapping to standards.
- Should be able to measure the misfit, i.e. find the space of the “unexplained”, as guidance for developing standards.
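One simple reading of "measuring the misfit" is to list which parts of the complete metadata a given mapping to a standard leaves uncovered. The sketch below takes that reading; the attribute names, the mapping, and the functions `unexplained` and `coverage` are all hypothetical.

```python
def unexplained(metadata, mapping):
    """Return the attributes that the mapping to a standard does not cover."""
    return {key: value for key, value in metadata.items() if key not in mapping}


def coverage(metadata, mapping):
    """Fraction of attributes that the mapping does cover (1.0 if there are none)."""
    if not metadata:
        return 1.0
    covered = sum(1 for key in metadata if key in mapping)
    return covered / len(metadata)


# Complete but non-standard metadata for one dataset (made-up example).
metadata = {
    "title": "World Ocean Atlas 2001",
    "grid": "1x1",
    "interpolation_scheme": "objective analysis",
    "lab_notebook_ref": "p. 42",
}

# Hypothetical mapping from local attribute names to a standard's properties.
mapping = {
    "title": "dc:title",
    "grid": "dc:description",
    "interpolation_scheme": "dc:description",
}

print(unexplained(metadata, mapping))
print(coverage(metadata, mapping))
```

Because the original metadata is kept alongside the mapped form, the uncovered remainder stays available and shows where a standard might need to grow.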

Stovepipe Conventions
- Fixed schema
- Agreed-upon metadata domain
- Agreed-upon data domain
- Designed to be a partial solution
- General server software needs to decide whether data legitimately fits the standard
- User contemplates bash-to-fit