AUKEGGS Canberra, 2006-11-29 Exposing legacy file-based data (interop-for-files) Andrew Woolf CCLRC Rutherford Appleton Laboratory

Outline
Introduction
The feature model as integration key
An interoperability approach for files
xlink review and proposed profile for legacy data
Examples
Issues

Introduction
Much ‘earth-science’ data exists as large legacy file-stores
  – e.g. ECMWF: 2 PB of file-based data
  – e.g. British Atmospheric Data Centre: 40 TB of file-based data
Interoperability demands common approaches
BUT, multitude of formats masks commonality
  – netCDF, HDF4, HDF5, GRIB, NASA Ames, PP, ...

Introduction
File-centred data management focusses on the container rather than content
File API is fundamental point of reference
  – binary format details not always exposed or guaranteed
  – public API may be only supported access mechanism
  – often implemented as performant optimised native library
Conclusion: can’t/shouldn’t migrate data out of its native file formats

Introduction
Want to expose information, not format...

Introduction
Information structures may be composed across files

The feature model
Common pattern with file-data:
  – need to integrate information structures across multiple files
  – (relational tables provide this implicitly)
Semantics provide an integration key
  – e.g. an oceanographer and meteorologist can share a conversation about data despite format differences

The feature model [figure]

A model for file-based interoperability
Retain file-based persistence format
Supplement with feature-based conceptual model
‘Cast’ legacy data onto conceptual model
  – interoperableData = (featureModel) legacyData
Legacy file data + GML-encoded conceptual ‘metadata’ = ‘interoperable view’
  – may be exposed through W*S (OGC web services such as WMS, WFS, WCS)
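
The ‘cast’ idea can be sketched in Python. This is only an illustration under stated assumptions: `GridSeriesFeature`, `cast_to_feature` and the `legacy_file` dictionary are hypothetical names, the dictionary stands in for whatever file API would really be called, and the data values are made-up placeholders. The point is that the feature object carries the conceptual description while the values stay in the legacy store until they are asked for.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class GridSeriesFeature:
    """Hypothetical feature type: a named quantity with lazily loaded values."""
    name: str
    units: str
    load_values: Callable[[], Sequence[float]]  # defers the read from the file

    @property
    def values(self) -> Sequence[float]:
        # The legacy store is only touched when the values are requested.
        return self.load_values()


def cast_to_feature(legacy: dict, variable: str) -> GridSeriesFeature:
    """'Cast' one variable of a legacy record onto the feature model."""
    var = legacy[variable]
    return GridSeriesFeature(
        name=var["long_name"],
        units=var["units"],
        load_values=lambda: var["data"],
    )


# Stand-in for a legacy file (assumption: a real system would use the file API).
# The numbers are placeholders, not real measurements.
legacy_file = {
    "temp": {"long_name": "temperature", "units": "degC", "data": [11.5, 12.0, 12.4]},
}

feature = cast_to_feature(legacy_file, "temp")
print(feature.name, feature.units, list(feature.values))
```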

A model for file-based interoperability
GML provides conceptual feature ‘skeleton’
File provides ‘flesh’
GML ‘by-reference’ pattern for property values
  – uses simple xlink
  – “The value of a GML property that carries an xlink:href attribute is the resource returned by traversing the link”

xlink review: extended xlink
extended xlink: [role] [title]
  – local resource D: [role] [title] [label]
  – remote resource C: [href] [role] [title] [label]
  – remote resource B: [href] [role] [title] [label]
  – local resource A: [role] [title] [label]
  – arc 1: [arcrole] [title] [show] [actuate]
  – arc 2
  – arc 3

xlink review: simple xlink
simple xlink: [role] [title]
  – local resource: [role] [title] [label]
  – remote resource: [href] [role] [title] [label]
  – arc: [arcrole] [title] [show] [actuate]

xlink review
‘role’ (URI):
  – indicates a property of the remote resource
  – must be a URI reference that “identifies some resource that describes the intended property”
‘arcrole’ (URI):
  – describes the “meaning of the arc’s ending resource relative to its starting resource”
  – corresponds to RDF notion of a property
      starting-resource HAS arc-role ending-resource
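
The RDF reading of ‘arcrole’ can be written down as a subject/predicate/object triple. The sketch below is illustrative only: `Triple` and `arc_to_triple` are hypothetical helpers, and the three identifiers passed in are example values, not URIs defined by the talk.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the arc's starting resource
    predicate: str  # the arcrole URI
    obj: str        # the arc's ending resource


def arc_to_triple(start: str, arcrole: str, end: str) -> Triple:
    """Read an xlink arc as: starting-resource HAS arc-role ending-resource."""
    return Triple(subject=start, predicate=arcrole, obj=end)


# Example identifiers (assumptions chosen for illustration only).
t = arc_to_triple(
    "#coordAxisValues",
    "http://example.org/arcrole/hasRemoteContent",
    "myfile.nc#lon",
)
print(f"{t.subject} HAS {t.predicate} {t.obj}")
```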

xlink patterns for files: extended xlink
GML feature instance [figure]
Aggregation semantics determined by xlink arc traversal rules

xlink patterns for files: simple xlink
GML feature instance [figure]
Aggregation semantics determined by storage descriptor

xlink proposal
href examples:
  – netCDF#variable
  – RDBMS#SQLQuery
  – GRIBFile#recordNumber
  – CSMLStorageDescriptor#arrayID

  <someGMLElement
      xlink:arcrole="hasRemoteContentEmbeddedAt#localXpath"
      xlink:href="storageDescriptor#portion"
      xlink:role="storageSchemaIdentifier"
      xlink:show="embed"
      xlink:actuate="onRequest | onLoad"/>
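
A rough sketch of how a client might read the proposed attributes, using only the Python standard library. The function `read_file_link` is a hypothetical helper, not part of any GML toolkit. The attribute values are the placeholders from the slide, except `xlink:href`, which borrows `myfile.nc#lon` from the worked example, and `xlink:actuate`, where one of the two listed alternatives is picked.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

XLINK_NS = "http://www.w3.org/1999/xlink"  # standard XLink namespace

SNIPPET = f"""
<someGMLElement xmlns:xlink="{XLINK_NS}"
    xlink:arcrole="hasRemoteContentEmbeddedAt#localXpath"
    xlink:href="myfile.nc#lon"
    xlink:role="storageSchemaIdentifier"
    xlink:show="embed"
    xlink:actuate="onRequest"/>
""".strip()


def read_file_link(element: ET.Element) -> dict:
    """Split the proposed xlink attributes into their logical parts."""
    def attr(name: str) -> str:
        return element.get(f"{{{XLINK_NS}}}{name}", "")

    # href = storage descriptor + '#' + portion of it (format-specific fragment)
    resource, portion = urldefrag(attr("href"))
    # arcrole fragment = where, relative to this element, the content is embedded
    _, local_xpath = urldefrag(attr("arcrole"))
    return {
        "resource": resource,
        "portion": portion,
        "storage_schema": attr("role"),
        "embed_at": local_xpath,
        "show": attr("show"),
        "actuate": attr("actuate"),
    }


link = read_file_link(ET.fromstring(SNIPPET))
print(link)
```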

Example
GML CR
  – ISO CV_ReferenceableGrid
x y
Geodetic longitude
x
Linear
Geodetic latitude
x y
Linear

Example
netCDF ASCII dump:

  netcdf myfile {
  dimensions:
      x = 8 ;
      y = 5 ;
  variables:
      float lon(x) ;
          lon:long_name = "longitude" ;
          lon:units = "degrees_east" ;
      float lat(x,y) ;
          lat:long_name = "latitude" ;
          lat:units = "degrees_north" ;
      float temp(x,y) ;
          temp:coordinates = "lon lat" ;
          temp:long_name = "temperature" ;
          temp:units = "degC" ;
  data:
      lon = 13.5, 24.9, 32.4, 37.7, 41.5, 46.8, 54.4, 65.7 ;
      lat = 53.1, 48.7, 46.2, 44.7, 43.9, 43.3, 43.1, 44.0, 46.2, 43.2, 41.5,...

Example
Geodetic longitude
x
Linear

  <gml:coordAxisValues
      xlink:arcrole="…"
      xlink:href="myfile.nc#lon"
      xlink:role="…"
      xlink:show="embed">
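
One way to picture ‘embed’ resolution for this example is the Python sketch below. It is not the NDG or CSML implementation: `embed_remote_content` is a hypothetical helper, and `FILE_STORE` is an in-memory stand-in for a real netCDF read, filled with the `lon` values from the ASCII dump. The arcrole and role attributes are left out because their values are not given.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
GML = "http://www.opengis.net/gml"

# Stand-in for the file API: filename -> variable -> values.
# The lon values are those listed in the netCDF dump.
FILE_STORE = {
    "myfile.nc": {"lon": [13.5, 24.9, 32.4, 37.7, 41.5, 46.8, 54.4, 65.7]},
}


def embed_remote_content(element: ET.Element) -> ET.Element:
    """If the element links to file content with show='embed', inline the values."""
    href = element.get(f"{{{XLINK}}}href")
    if href is None or element.get(f"{{{XLINK}}}show") != "embed":
        return element  # nothing to resolve
    filename, _, variable = href.partition("#")
    values = FILE_STORE[filename][variable]
    # Whitespace-separated values as the element's text content.
    element.text = " ".join(str(v) for v in values)
    return element


axis = ET.fromstring(
    f'<gml:coordAxisValues xmlns:gml="{GML}" xmlns:xlink="{XLINK}" '
    'xlink:href="myfile.nc#lon" xlink:show="embed"/>'
)
embed_remote_content(axis)
print(axis.text)
```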

Issues
Need to ‘get as close as possible’ to target
  – ‘merge’ semantics consistent with GML? (Opportunity: no best practice for GML yet!)
“If both a link and content are present in an instance of a property element, then the object found by traversing the xlink:href link shall be the normative value of the property. The object included as content shall be used by the data recipient only if the remote instance cannot be resolved; this may be considered to be a "cached" version of the object.” [GML ]
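
The precedence rule in the quoted GML text can be sketched as a small function. The names `property_value`, `resolve` and `remote_store` are hypothetical; the only behaviour taken from the quotation is that the linked object wins and the inline content is a fallback ‘cache’ used when the link cannot be resolved.

```python
from typing import Callable, Optional


def property_value(
    href: Optional[str],
    inline_content: Optional[str],
    resolve: Callable[[str], str],
) -> Optional[str]:
    """Return the normative value of a property that may carry link and content."""
    if href is not None:
        try:
            return resolve(href)  # the linked object is normative
        except LookupError:
            pass  # remote instance could not be resolved
    return inline_content  # fall back to the 'cached' inline copy


# Stand-in resolver (assumption): a dict lookup that raises KeyError when missing.
remote_store = {"myfile.nc#lon": "13.5 24.9 32.4"}

resolved = property_value("myfile.nc#lon", "stale copy", remote_store.__getitem__)
fallback = property_value("missing.nc#lon", "stale copy", remote_store.__getitem__)
print(resolved, "|", fallback)
```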

Issues
xlink:href (URI) for remote resource fragment (format-specific)
  – e.g. RDBMS#SQLQuery, netCDF#variable, etc.
xlink:role (URI) for resource format
  – e.g. reference PRONOM-type format repository? implied conversion to GML target content type
xlink:arcrole (URI) for ‘embed remote content’ semantics
  – ‘insert at relative XPath’ essential
simple xlink can’t handle multiple resources
  – application-specific ‘storage descriptor’ schemas for file aggregation semantics

Conclusion
Presented a profile for xlink with files in absence of current best practice
Meets key practical requirements
  – retain file-based persistence formats
  – provide interoperability ‘wrapper’
  – focus on logical content, not container (feature model)
Semantic governance at appropriate points
Enables powerful, scalable mechanism for real data
  – e.g. large meteorological datasets