Connecting netcdf/CF to a semantic framework


1 Connecting netcdf/CF to a semantic framework
M. Benno Blumenthal, International Research Institute for Climate and Society
The standards underlying the Semantic Web -- Resource Description Framework (RDF) and Web Ontology Language (OWL) -- show great promise in addressing some of the basic problems in earth science metadata. They provide a framework for explicitly describing the data models implicit in programs that display and manipulate data. They also provide a framework where multiple metadata standards can be described. Most importantly, these data models and metadata standards can be interrelated, a key step in creating interoperability.
As an exercise in understanding how this framework might be used, we have created an RDF expression of the datasets and some of the metadata in the IRI/LDEO Climate Data Library. This includes concepts like datasets, units, dependent variables, and independent variables. We have also created an RDF expression of a taxonomy that forms the basis of an earth data search interface. These concepts include location, time, quantity, author, and institution. A series of inference engines is then used to infer the connections between the data-oriented concepts of the data library and the distinctly different conceptual framework of the data search.
We would also like to use this RDF framework to gather and operate on dataset metadata. The goal is to interoperate between metadata conventions that are attached to data as they travel in different formats and are processed by different software. One could also envision a processing framework that records the connections between processed data, their source data, and their processing filters, and that could be used both to reapply the processing and to document the results.

2 RDF/OWL and earth science metadata
The standards underlying the Semantic Web -- Resource Description Framework (RDF) and Web Ontology Language (OWL), among others -- show great promise in addressing some of the basic problems in earth science metadata. In particular they provide a single framework that allows us to describe datasets according to multiple standards, creating a more complete description than any single standard can support, and avoiding the difficult problem of creating a super-standard that can describe everything about everything.

3 RDF is not a killer app
Resource Description Framework (RDF) is a framework to write down relationships in a reusable way: a semantic framework
A lot of CF is not written down in a usable way, e.g. relationships between different values of standard_names, or how data representations in CF correspond to concepts or other standards.
It is past time to fix that
Some would prefer Java, some prefer English

4 Different Representations of The CF Standard
English: gloriously vague and flexible
Java: complete implementation makes a great black-box, an API becomes yet-another-standard
RDF/OWL: can facilitate creating/enhancing both the English and Java versions (as well as other programming languages)

5 CF metadata in a semantic framework
A literal level, which explains which attributes are available to be attached to datasets/variables
A more semantic level, which gives explicit expression to concepts like Coordinate and Non-Coordinate variables, and how a Non-Coordinate Variable can be geo-located.

6 Semantic interoperability
Writing down the CF standard on a semantic level then allows interoperability with other standards, e.g. other ways of marking geolocated Non-Coordinate variables.

7 Additional issues
How to less-ambiguously tag metadata in netcdf files so that software can more easily determine which attribute belongs to which metadata standard
How to better register netcdf metadata standards in general
How to better register CF concepts so that interoperability (or indeed operability) can occur.

8 Why RDF?
Make implicit semantics explicit
Web-based system for interoperating semantics
Decontextualizes the information, facilitating reuse
RDF/OWL is an emerging technology, so tools are being built that help solve the semantic problems in handling data

9 Standard Metadata Schema/Data Services
[Diagram labels: Datasets, Tools, Users]

10 Many Data Communities
[Diagram labels: Tools, Users, Datasets, Standard Metadata Schema]

11 Super Schema
[Diagram labels: Standard metadata schema, Tools, Users, Datasets]

12 One take on semantic interoperability
“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean -- neither more nor less.”
“The question is,” said Alice, “whether you CAN make words mean so many different things.”
“The question is,” said Humpty Dumpty, “which is to be master -- that's all.”
Through the Looking Glass (And What Alice Found There), Lewis Carroll. Published: 1871. Type(s): Novels, Young Readers, Fantasy. Source: Wikisource

13 Super Schema: direct
[Diagram labels: a central Standard metadata schema/data service; five communities, each with Tools, Users, Datasets, and a Standard Metadata Schema]

14 Flaws
A lot of work
Super Schema/Service is the Lowest-Common-Denominator, so you end up saying less-and-less about more-and-more.
Science keeps evolving, so that standards either fall behind or constantly change

15 RDF Standard Data Model Exchange
[Diagram labels: a central Standard metadata schema; RDF links; five communities, each with Tools, Users, Datasets, and a Standard Metadata Schema]

16 RDF Data Model Exchange
[Diagram labels: a central Standard metadata schema with RDF; five communities, each with Tools, Users, Datasets, a Standard Metadata Schema, and RDF]

17 Why is this better?
Maps the original dataset metadata into a standard format that can be transported and manipulated
Still the same impedance mismatch when mapped to the least-common-denominator standard metadata, but
When a better standard comes along, the original complete-but-nonstandard metadata is already there to be remapped, and “late semantic binding” means everyone can use the new semantic mapping
Can use enhanced mappings between models that have common concepts beyond the least-common-denominator
EASIER: tools to enhance the mapping process, mappings build on other mappings

18 RDF Architecture
[Diagram labels: queries; Virtual (derived) RDF; RDF]

19 RDF: framework for writing connections
Triplets of Subject, Property (or Predicate), Object
URIs identify things, i.e. most of the above
Namespaces are used as a convenient shorthand for the URIs
URIs do not need to resolve
Going on, I want to briefly go over how RDF handles semantics
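
As a rough illustration of triples and namespace shorthand, the sketch below stores triples as plain Python tuples and expands prefixed names into full URIs. The dc URI is the Dublin Core elements namespace; the iridl and data URIs are placeholders invented here, not the namespaces used by the Data Library.

```python
# Prefix table. The "iridl" and "data" entries are placeholders for this sketch.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "iridl": "http://example.org/iridl#",
    "data": "http://example.org/data/",
}


def expand(name):
    """Expand a prefixed name such as 'dc:title' into a full URI string."""
    prefix, local = name.split(":", 1)
    return NAMESPACES[prefix] + local


# Each triple is (subject, property, object); the object is a URI or a literal.
triples = [
    (expand("data:WOA"), expand("dc:title"), "NOAA NODC WOA01"),
    (expand("data:WOA"), expand("iridl:isContainerOf"), expand("data:Grid-1x1")),
]

for subject, prop, obj in triples:
    print(subject, prop, obj)
```

Nothing in the sketch fetches the URIs: they act purely as identifiers, which is the sense in which they do not need to resolve.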

20 Datatype Properties
{WOA} dc:title “NOAA NODC WOA01”
{WOA} dc:description “NOAA NODC WOA01: World Ocean Atlas 2001, an atlas of objectively analyzed fields of major ocean parameters at monthly, seasonal, and annual time scales. Resolution: 1x1; Longitude: global; Latitude: global; Depth: [0 m,5500 m]; Time: [Jan,Dec]; monthly”
This is what we are used to thinking about as attributes, except that the property is a URI rather than a prearranged agreement. Already this is progress.

21 Object Properties
{WOA} iridl:isContainerOf {Grid-1x1},
{Grid-1x1} iridl:isContainerOf {Monthly}

22 WOA01 diagram

23 Standard Properties
{WOA} dcterm:hasPart {Grid-1x1},
{Grid-1x1} dcterm:hasPart {Monthly}
Alternatively
{WOA} iridl:isContainerOf {Grid-1x1},
{iridl:isContainerOf} rdfs:subPropertyOf {dcterm:hasPart}
This is important: I have locally defined a specific relationship while mapping it to a convention. When a more specific convention comes along, I can add that mapping as well.
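
The effect of the rdfs:subPropertyOf statement can be sketched in a few lines: any triple that uses the local property also holds for the broader one. This is a minimal stand-in for what an RDFS reasoner does, using prefixed names as plain strings; it is not the inference engine used in the Data Library.

```python
SUBPROP = "rdfs:subPropertyOf"


def infer_subproperties(triples):
    """Return the triples plus those entailed by rdfs:subPropertyOf."""
    closed = set(triples)
    changed = True
    while changed:  # repeat until no new triple appears (fixed point)
        changed = False
        supers = [(s, o) for s, p, o in closed if p == SUBPROP]
        for s, p, o in list(closed):
            for sub, sup in supers:
                if p == sub and (s, sup, o) not in closed:
                    closed.add((s, sup, o))
                    changed = True
    return closed


asserted = {
    ("WOA", "iridl:isContainerOf", "Grid-1x1"),
    ("Grid-1x1", "iridl:isContainerOf", "Monthly"),
    ("iridl:isContainerOf", SUBPROP, "dcterm:hasPart"),
}

# The derived triples are the dcterm:hasPart statements nobody wrote by hand.
derived = infer_subproperties(asserted) - asserted
for triple in sorted(derived):
    print(triple)
```

Adding a mapping to another convention later is one more subPropertyOf triple; the asserted data stay untouched.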

24 Data Structures in RDF
{SST} rdf:type {cfatt:non_coordinate_variable},
{SST} cfobj:standard_name {cf:sea_surface_temperature},
{SST} netcdf:hasDimension {longitude}
Object properties provide a framework for explicitly writing down relationships between data objects/components, e.g. the vague meaning of nesting is made explicit
Properties also can be related, since they are objects too
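
Once the structure is written as triples, questions about it become pattern matches. The sketch below is a hypothetical helper, not a real query language such as SeRQL; the cfatt:coordinate_variable triple is an assumption added for illustration.

```python
triples = [
    ("SST", "rdf:type", "cfatt:non_coordinate_variable"),
    ("SST", "cfobj:standard_name", "cf:sea_surface_temperature"),
    ("SST", "netcdf:hasDimension", "longitude"),
    # Assumed for this sketch; the slide does not type the dimension.
    ("longitude", "rdf:type", "cfatt:coordinate_variable"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Which dimensions does SST depend on?
dimensions = [o for _, _, o in match(triples, s="SST", p="netcdf:hasDimension")]
print(dimensions)
```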

25 Virtual Triples
Use Conventions to connect concepts to established sets of concepts
Generate additional “virtual” triples from the original set and semantics
RDFS: some property/class semantics
OWL: additional property/class semantics, more sophisticated (ontological) relationships
SWRL: rules for constructing virtual triples

26 Define terms
Attribute Ontology
Object Ontology
Term Ontology
These are different ways RDF can be used

27 Attribute Ontology
Subjects are the only type-object
Predicates are “attributes”
Objects are datatype
Isomorphic to simple data tables
Isomorphic to netcdf attributes of datasets
Some faceted browsers: predicate = facet, e.g. Longwell from MIT
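
The isomorphism with a simple data table can be sketched as a round trip: a table of attributes becomes triples with literal objects, and the triples rebuild the same table. The cfatt prefix follows the later slides; the attribute values are illustrative.

```python
def attributes_to_triples(subject, attributes, prefix="cfatt"):
    """Turn one row of an attribute table into (subject, predicate, literal) triples."""
    return [(subject, f"{prefix}:{name}", value) for name, value in attributes.items()]


def triples_to_attributes(triples):
    """Rebuild the attribute table: one row per subject, one column per predicate."""
    table = {}
    for subject, predicate, value in triples:
        local_name = predicate.split(":", 1)[1]
        table.setdefault(subject, {})[local_name] = value
    return table


attrs = {"standard_name": "sea_surface_temperature", "units": "degree_Celsius"}
triples = attributes_to_triples("sst", attrs)
print(triples)
print(triples_to_attributes(triples))
```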

28 cf-att – CF transcribed

29 cf-att with some attributes

30 RDF helps decontextualize
{sst variable} cfatt:standard_name “sea_surface_temperature”
Where cfatt = the cfatt URI prefix, temporarily
Put data in netcdf file
Set Conventions attribute to “CF-1.0”
Set standard_name of variable “sst” to “sea_surface_temperature”
Current system requires data in a netcdf file for CF to be understood
Once you decontextualize, you can do other things with the information, e.g. aggregate it into a site-wide metadata store, or operate on it in an inference network.
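
A minimal sketch of the aggregation idea, assuming the attributes have already been read out of the files: nested dictionaries stand in for netcdf files here, and the file names, variable names, and the "file#variable" subject scheme are all invented for illustration.

```python
# Stand-in for attributes already extracted from two netcdf files.
FILES = {
    "a.nc": {"sst": {"standard_name": "sea_surface_temperature"}},
    "b.nc": {
        "temp": {"standard_name": "sea_surface_temperature"},
        "u": {"standard_name": "eastward_wind"},
    },
}


def decontextualize(files):
    """Flatten per-file variable attributes into one site-wide list of triples."""
    store = []
    for path, variables in files.items():
        for variable, attributes in variables.items():
            subject = f"{path}#{variable}"  # hypothetical subject naming scheme
            for name, value in attributes.items():
                store.append((subject, "cfatt:" + name, value))
    return store


store = decontextualize(FILES)

# With the metadata outside the files, one query spans the whole site.
sst_variables = sorted(
    s for s, p, o in store
    if p == "cfatt:standard_name" and o == "sea_surface_temperature"
)
print(sst_variables)
```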

31 Object Ontology
Objects are object-type
Isomorphic to “belongs to”
Isomorphic to multiple data tables connected by keys
Express the concept behind netcdf attributes which name variables
Concepts as objects can be cross-walked
Concepts as objects can be interrelated

32 Example: controlled vocabulary
{variable} cfatt:standard_name {“string”}
Where string has to belong to a list of possibilities.
{variable} cfobj:standard_name {stdnam}
Where stdnam is an individual of the class cfobj:StandardName

33 Example: controlled vocabulary
A bidirectional crosswalk between the two is somewhat trivial, which means all my objects will have both cfatt:standard_name and cfobj:standard_name

34 Example: controlled vocabulary
If I am writing software to read/write netcdf files, I use the cfatt ontology and in particular cfatt:standard_name
If I am making connections/cross-walks to other variable naming standards, I use cfobj:standard_name
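
The crosswalk between the string form and the object form might look like the sketch below. The two-entry vocabulary is a tiny illustrative subset of the CF standard name table, and writing the object as a "cf:" prefixed name is an assumption modeled on the {cf:sea_surface_temperature} example.

```python
# Tiny illustrative subset of the controlled vocabulary.
STANDARD_NAMES = {"sea_surface_temperature", "air_temperature"}


def att_to_obj(triple):
    """cfatt:standard_name (string literal) -> cfobj:standard_name (individual)."""
    subject, predicate, value = triple
    if predicate != "cfatt:standard_name":
        raise ValueError("not a cfatt:standard_name triple")
    if value not in STANDARD_NAMES:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return (subject, "cfobj:standard_name", "cf:" + value)


def obj_to_att(triple):
    """cfobj:standard_name (individual) -> cfatt:standard_name (string literal)."""
    subject, predicate, individual = triple
    if predicate != "cfobj:standard_name":
        raise ValueError("not a cfobj:standard_name triple")
    return (subject, "cfatt:standard_name", individual.split(":", 1)[1])


literal_form = ("sst", "cfatt:standard_name", "sea_surface_temperature")
object_form = att_to_obj(literal_form)
print(object_form)
print(obj_to_att(object_form))
```

The string side is what file-reading software sees; the object side is something other naming standards can point at.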

35 Some cf-obj classes

36 Some cf-obj classes

37 Term Ontology
Concepts as individuals
Simple Knowledge Organization System (SKOS) is a prime example
standard_name as object would be such

38 Nuanced tagging
Concepts as objects can be interrelated: specific terms imply broader terms
Object ends up being tagged with terms ranging from general to specific.
Search can then be nuanced
Tagging can proceed in absence of perfect information
Partial information can be written down
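
How specific terms imply broader ones can be sketched as a transitive walk over an "implies" table. The terms and their relationships below are invented for illustration and are not taken from CF, SWEET, or the Data Library taxonomy.

```python
# Hypothetical term relationships: each term lists the broader terms it implies.
IMPLIES = {
    "sea_surface_temperature": ["temperature", "ocean"],
    "temperature": ["physical_quantity"],
}


def implied_terms(term, implies):
    """Return every broader term reachable from `term`, directly or indirectly."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for broader in implies.get(current, ()):
            if broader not in seen:
                seen.add(broader)
                stack.append(broader)
    return seen


# Tagging a dataset with one specific term yields the general ones as well.
tags = {"sea_surface_temperature"} | implied_terms("sea_surface_temperature", IMPLIES)
print(sorted(tags))
```

A dataset known only to be about "temperature" can still be tagged with that term and its broader terms, which is how partial information gets written down.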

39 CF standard names
.. I would add that standard names alone (in the cases where a standard name is sufficient) have the same kind of role as common concepts. The definitions of standard names allow some vagueness, though some are more precise than others, because their role is to indicate which things should validly be regarded as the same thing by visualisation and processing software.
Jonathan Gregory, 04/22/08 23:22:01

40 CF standard regions
I don't think the regions can be exactly standardised, because part of the reason for having names is in order to be somewhat "vague". Just as a common standard name is given to quantities from different data sources when those quantities are regarded as comparable, the same standard region name would be given to data which represent the same region in a way which is regarded as comparable. For instance, different GCMs do not have exactly the same shape for the Atlantic Ocean, but Atlantic meridional overturning streamfunctions are calculated from each model, and these are regarded as comparable.
Jonathan Gregory, Wed, Aug 27, 2008 at 5:11 AM

41 I.E. In other words, the broader the standard, the more vague.
On the other hand, we can say something. In fact, we can say quite a lot about how these terms interrelate. And how they relate to less broad, less vague systems.

42 What we can do easily
Establish URIs for the concepts in CF (standard_names, standard_regions, the attributes themselves) so that statements can be written about them in XML and RDF.
Establish a machine-readable version of so that we can write code to extract (decontextualize) metadata from netcdf files
Start writing down the relationships between the concepts.
Agree on a cf-att ontology, and work on cf-obj so that we can connect with other conventions.
Set a convention for explicitly labeling netcdf attributes with their convention, i.e. namespace labels, so that the process of figuring out which convention covers which attribute is purely grammatical
Set a convention for referring to a URI-identified concept in a netcdf file
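
To show what "purely grammatical" could mean in practice, the sketch below assumes a hypothetical "prefix:name" spelling for attribute names. No such labeling convention is defined in the slides or in CF; the prefixes and attribute values are illustrative.

```python
def split_attribute(name, default="unlabeled"):
    """Split 'prefix:name' into (convention, local name) by syntax alone."""
    prefix, separator, local = name.partition(":")
    if separator:
        return prefix, local
    return default, name


def group_by_convention(attributes):
    """Sort a variable's attributes into one dictionary per convention label."""
    groups = {}
    for name, value in attributes.items():
        convention, local = split_attribute(name)
        groups.setdefault(convention, {})[local] = value
    return groups


attributes = {
    "cf:standard_name": "sea_surface_temperature",
    "iridl:icon": "sst.png",
    "units": "degree_Celsius",
}
print(group_by_convention(attributes))
```

No lookup table of known attribute names is needed: the label alone says which convention an attribute belongs to, and unlabeled attributes are easy to spot.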

43 Search Interface
Items (datasets/maps)
Terms
Facets
Taxa

44 Search Interface Semantic API
{item} dc:title dc:description rss:link iridl:icon
dcterm:isPartOf {item2}
dcterm:isReplacedBy {item2}
{item} trm:isDescribedBy {term}
{term} a {facet} of {taxa} of {trm:Term},
{facet} a {trm:Facet},
{taxa} a {trm:Taxa},
{term} trm:directlyImplies {term2}
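
A faceted search over this kind of description can be sketched as set operations: items are described by terms, terms belong to facets, and a search keeps the items described by every selected term. The items, terms, and facets below are invented for illustration and do not reflect the actual search interface.

```python
# term -> facet, and item -> the terms that describe it (all hypothetical).
TERM_FACET = {
    "ocean": "location",
    "atlantic": "location",
    "monthly": "time",
    "temperature": "quantity",
}
ITEMS = {
    "item1": {"ocean", "monthly", "temperature"},
    "item2": {"atlantic", "ocean", "temperature"},
    "item3": {"ocean", "monthly"},
}


def search(items, selected):
    """Return the names of items described by every selected term."""
    return sorted(name for name, terms in items.items() if selected <= terms)


def facet_counts(items, names, term_facet):
    """Count, per facet, how many of the named items carry each term."""
    counts = {}
    for name in names:
        for term in items[name]:
            by_term = counts.setdefault(term_facet[term], {})
            by_term[term] = by_term.get(term, 0) + 1
    return counts


hits = search(ITEMS, {"ocean", "temperature"})
print(hits)
print(facet_counts(ITEMS, hits, TERM_FACET))
```

The counts are what a faceted browser would show next to each term to guide the next refinement.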

45 Faceted Search w/Queries

46 RDF Architecture
[Diagram labels: queries; Virtual (derived) RDF; RDF]

47 IRI RDF Architecture
[Diagram labels: MMI, Data Servers, Ontologies, JPL, bibliography, Start Point, Standards Organizations, RDF Crawler, Location Canonicalizer, RDFS Semantics, OWL Semantics, SWRL Rules, SeRQL CONSTRUCT, Time Canonicalizer, Sesame, Search Queries, Search Interface]

48 Cast of Characters
NC: netcdf data file format
CF: Climate and Forecast metadata convention for netcdf
SWEET: Semantic Web for Earth and Environmental Terminology (OWL Ontology)
IRIDL: IRI Data Library

49 CF attributes NC basic attributes IRIDL attributes/objects CF data objects CF Standard Names (RDF object) SWEET Ontologies (OWL) Location CF Standard Names As Terms IRIDL Terms SWEET as Terms Search Terms Gazetteer Terms

50 Thoughts
Pure RDF framework seems currently viable for a moderate collection of data
Potential for making a lot of implicit data conventions explicit
Explicit conventions can improve interoperability
Simple RDF concepts can greatly impact searches

51 Some Thoughts
Reproducibility implies complete metadata
Non-standard complete metadata just needs to be mapped to more standard schemes
A multiple-scheme system like RDF retains reproducibility even with partial mapping to standards
Should be able to measure the misfit, i.e. find the space of the “unexplained”, as guidance for developing standards.
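
One simple way to put a number on the misfit, offered only as a sketch: list the properties in use that have no mapping to a standard scheme, and report the fraction of triples they account for. The mapping table and the local:gridding_method property are invented for illustration; the slides do not define a particular misfit measure.

```python
# Local properties that have been mapped to a more standard scheme.
MAPPING = {
    "iridl:isContainerOf": "dcterm:hasPart",
    "cfatt:standard_name": "cfobj:standard_name",
}

triples = [
    ("WOA", "iridl:isContainerOf", "Grid-1x1"),
    ("SST", "cfatt:standard_name", "sea_surface_temperature"),
    ("SST", "local:gridding_method", "objective analysis"),  # hypothetical property
]


def unexplained(triples, mapping):
    """Return the properties in use that no standard scheme accounts for."""
    return sorted({p for _, p, _ in triples if p not in mapping})


def misfit(triples, mapping):
    """Return the fraction of triples whose property has no standard mapping."""
    unmapped = sum(1 for _, p, _ in triples if p not in mapping)
    return unmapped / len(triples)


print(unexplained(triples, MAPPING))
print(misfit(triples, MAPPING))
```

The unmapped triples stay in the store, so the complete metadata remain available even while the standards lag behind.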

52 Stovepipe Conventions
Fixed Schema
Agreed upon metadata domain
Agreed upon data domain
Designed to be a partial solution
General server software needs to decide whether data legitimately fits the standard
User contemplates bash-to-fit

