Semantically Enabling the Global Geodynamics Project: Incorporating Feature-Based Annotations via XML Pointer Language (XPointer) I. Lumb, J. Lederman, J. Freemantle & K. Aldridge HPCS 2007
2 Representing GGP Data via ESML and RDF Lumb & Aldridge (2005, 2006)
3 How is GGP Log Data Handled? Filename ST LOG Station Strasbourg, France Instrument GWR C026 Author yyyymmdd hhmmss comment C******************************************** microgal offset of unknown origin power loss due to lightening strike... Involved usage of XSLT –Even more complicated when RDF representations are taken into account Features are difficult to describe –Especially those that cross-cut ESML element boundaries Features are difficult to correlate to primary and auxiliary data
4 Can GGP Log Data be Better Represented? Options –Re-purpose existing ESML elements –Extend the ESML Schema Consequences –Not vanilla ESML anymore An ‘enhanced ESML’ –ESML gets more complicated RDF representations are also more complicated –Features aren't necessarily nested Features cross-cut ESML element boundaries –This is a showstopper!!
5 Consider Annotation ESML has a limited ability to represent features –Features (especially complex ones) don’t necessarily obey XML element boundaries –Likely true for all XML dialects - including DFDL “Annotations are comments, notes, explanations, or other types of external remarks that can be attached to a Web document or a selected part of the document. As they are external, it is possible to annotate any Web document independently, without needing to edit that document. From the technical point of view, annotations are usually seen as [editorial] metadata, as they give additional information about an existing piece of data.” –Amaya 9.52, W3C
See also Annozilla (Annotea on Mozilla)
8 <r:RDF xmlns:r=" xmlns:a=" xmlns:t=" xmlns:http=" xmlns:d=" Annotating a Complex Selection (1) This is RDF-based!!
9 Annotating a Complex Selection (2) #xpointer(start-point(string-range(/html[1]/body[1]/table[3]/tr[1]/td[1]/pre[1],"",658,1))/range-to(end-point(string-range(/html[1]/body[1]/table[3]/tr[1]/td[1]/pre[2],"",65,1)))) Annotation of Agreements and Standards T10:31: : T10:32: :43
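The range expression on this slide has a regular shape: a start point and an end point, each given as a one-character string-range at an offset inside a positional path, joined by range-to. A minimal sketch of assembling such a fragment identifier follows. The function names `string_range` and `range_pointer` are hypothetical helpers introduced for illustration; they only format text in the form shown on the slide and do not validate it against the XPointer draft.

```python
def string_range(path: str, offset: int, length: int = 1) -> str:
    """Format one string-range() term: `length` characters at `offset` in `path`.

    The empty match string ("") follows the form used on the slide.
    """
    return f'string-range({path},"",{offset},{length})'


def range_pointer(start_path: str, start_offset: int,
                  end_path: str, end_offset: int) -> str:
    """Join a start point and an end point into one xpointer() range (hypothetical helper)."""
    start = f"start-point({string_range(start_path, start_offset)})"
    end = f"end-point({string_range(end_path, end_offset)})"
    return f"#xpointer({start}/range-to({end}))"


# Paths and offsets copied from the slide's example.
pre = "/html[1]/body[1]/table[3]/tr[1]/td[1]/pre"
pointer = range_pointer(f"{pre}[1]", 658, f"{pre}[2]", 65)
print(pointer)
```

Because the start and end points sit in different pre elements, the resulting range crosses element boundaries, which is the property that plain element nesting cannot express.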
10 XPointer - XML Pointer Language An extension of XPath –XPath is used by XLink to locate remote link resources Relative addressing –Allows links to places with no anchors Flexible and robust –XPointer/XPath expressions often survive changes in the target document Can point to substrings in character data and to whole tree fragments Status –The key specification is a Working Draft in the W3C’s Recommendation Track
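To make "can point to substrings in character data" concrete, here is a rough sketch that emulates a single string-range lookup with Python's standard xml.etree.ElementTree. It is not an XPointer processor: `resolve_string_range` is a hypothetical helper, the path is given relative to the root element because ElementTree supports only a subset of XPath, and the sample document is invented for illustration (its text reuses two log comments from the earlier slide).

```python
import xml.etree.ElementTree as ET

# Invented sample document; only the two comment strings come from the slides.
DOC = (
    "<html><body>"
    "<pre>microgal offset of unknown origin</pre>"
    "<pre>power loss</pre>"
    "</body></html>"
)


def resolve_string_range(root: ET.Element, path: str, offset: int, length: int) -> str:
    """Return `length` characters starting at 1-based `offset` in the element at `path`.

    `path` is relative to `root` and uses positional steps such as "body[1]/pre[2]".
    """
    node = root.find(path)
    if node is None:
        raise LookupError(f"no element matches {path!r}")
    text = "".join(node.itertext())  # all character data inside the element
    return text[offset - 1 : offset - 1 + length]


root = ET.fromstring(DOC)
print(resolve_string_range(root, "body[1]/pre[1]", 10, 6))
print(resolve_string_range(root, "body[1]/pre[2]", 1, 5))
```

The target document needs no anchors or ids for this to work, which is the sense of "relative addressing" on the slide.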
11 Representing GGP Data via ESML and RDF Lumb & Aldridge (2005, 2006)
12 Representing GGP Data via ESML and RDF with Annotation
13 Self-Contained Annotated Informal Ontology … BUT The representation will likely require use of OWL Full –Computationally incomplete May not be able to infer valid conclusions –Undecidable May not be able to make inferences in a finite amount of time To ensure OWL Description Logic representation –Ontologies and their external annotations may need to remain separate Lumb et al., submitted to Computers & Geosciences (2007)
14 Summary Automate the introduction of a self-describing representation –Use an XML-based approach Automate the extraction of relationships –Use RDF to represent relationships –Use GRDDL to extract relationships Describe and relate features via annotation –XPointer is a standards-based vehicle –Use annotation tools (like Amaya or Annozilla) to automate wherever possible –Integrate annotations into ontology (?) Transform data into information into knowledge
15 Future Work Replace ESML by DFDL (?) Develop single schema for annotation types/properties –XPointer and OWL each have their own Semantically base annotations Automate annotation Transform RDF to OWL –Extract OWL classes, properties and individuals from RDF-based representations –Develop tools W3C strategy specified Ontology/annotation integration
Questions?
Additional Slides
18 Earth Science Markup Language (ESML) Makes use of XML Schema Supports semi-structured ASCII format files Includes Earth-Science affinities Being used in various projects –GGP to LEAD On track for standards compliance –Data Format Description Language (DFDL) An Open Grid Forum (OGF) Working Group and emerging recommendation
19 ESML Handles GGP Data via a Template Filename ST GGP Station Strasbourg, France Instrument GWR C026 Phase Lag (deg/cpd) nominal N Latitude (deg) estimated E Longitude (deg) estimated Height (m) estimated Gravity Cal (mgal/v) measured Pressure Cal(mbar/v) nominal Author yyyymmdd hhmmss gravity(V) pressure(V) C********************************************
20 Consider an External Scheme via Annotation ESML has a limited ability to represent features –Features don’t necessarily obey XML element boundaries –Likely true for all XML dialects Including DFDL (!) “Annotation is the linking of a new commentary node to someone else's existing node. It is the essence of a collaborative hypertext.” –TBL, W3 Archive, c "... the addition of information to existing documents without changing the originals.” –Passin, Explorer’s Guide to the Semantic Web, 2004 Editorial metadata –Current work
‘describe’ ‘relate’ ‘compare’ ‘infer’ The stack of expressive power After
22 Automating Annotation Quick-and-dirty solution –Perl script CPAN offers a number of XML-targeted Perl modules … More-appropriate solution –Leverage the XML family XPath/XQuery –To help ‘place’ the selection in the document to be annotated »XPointer may also be useful here XPointer –To annotate the selection
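The two steps named on this slide, placing the selection and then pointing at it, can be sketched in a few lines. The slide suggests Perl for the quick solution; the sketch below uses Python's standard library instead and is only an illustration under stated assumptions: `positional_paths` and `annotate_phrase` are hypothetical helpers, the sample document is invented, only the first occurrence of the phrase in each element's leading text is reported, and the emitted string-range terms follow the form used earlier in the deck rather than a validated XPointer.

```python
import xml.etree.ElementTree as ET

# Invented sample document; only the two comment strings come from the slides.
DOC = (
    "<html><body>"
    "<pre>microgal offset of unknown origin</pre>"
    "<pre>power loss</pre>"
    "</body></html>"
)


def positional_paths(elem, path):
    """Yield (element, positional path) for `elem` and every descendant."""
    yield elem, path
    counts = {}  # per-tag sibling counters, so steps look like pre[2]
    for child in elem:
        counts[child.tag] = counts.get(child.tag, 0) + 1
        yield from positional_paths(child, f"{path}/{child.tag}[{counts[child.tag]}]")


def annotate_phrase(root, phrase):
    """Return one pointer per element whose leading text contains `phrase`."""
    pointers = []
    for elem, path in positional_paths(root, f"/{root.tag}[1]"):
        index = (elem.text or "").find(phrase)  # step 1: place the selection
        if index >= 0:
            # step 2: point at it with a 1-based offset and the phrase length
            pointers.append(
                f'#xpointer(string-range({path},"",{index + 1},{len(phrase)}))'
            )
    return pointers


root = ET.fromstring(DOC)
for pointer in annotate_phrase(root, "power loss"):
    print(pointer)
```

Each pointer could then serve as the context of an externally stored annotation, leaving the annotated document itself unchanged.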
3C454.3
Courtesy Ross Baker, York University
25 Annotations are Everywhere! Analog –Post-its Digital –Productivity software Office Comments –Microsoft Word (Live), Open Office, Google Docs … Web –Browser-based mouse-overs –Google Notebook, Google Earth –Amaya –Source code OpenMP directives
Lumb et al. (2007)
28 Mozilla DOM Inspector
29 Modeling with Formal Ontologies Seek to make use of OWL Description Logic –Maximally expressive –Computationally complete All valid conclusions can be inferred –Decidable The inferences take a finite amount of time OWL DL constrains annotation properties –Annotations are well-behaved comments Caution –XPointer-based annotations are highly likely to violate OWL DL constraints on integration into ontologies Results in OWL Full Lumb et al. (2007)
30 Origin/Destination: ESML vs. XPointer