The role of registries within a spatial data infrastructure
Simon Cox (Research Scientist), Rob Atkinson (Spatial Architect)
16 April 2008
CSIRO EGU2008-A Cox SOA
Outline
- Spatial Data Infrastructure ~ Cyberinfrastructure
- Brief comment on state of SDI deployment
- Analysis examples
  - Metadata
  - Concept identifiers
- Expanded role for registries
SDIs
Spatial Data Infrastructure ideal
Goal: automated workflow/service-chain composition; on-demand matching of clients to services
Matching services and clients requires components to be described to a high level of detail:
- service type
- content that it exposes
  - schema
  - vocabularies
- queries that it supports
- response formats
- quality of service
- …
These are the “service classification axes”
SDI reality
- Some service instances: OGC-WFS, WMS, WCS, OPeNDAP
- Dataset metadata directories: clearinghouses, GEON, ESIP, ASDD, Go-Geo …
Is this enough to achieve the goal? No
Is the number of registered resources growing? Not enough
Are the right resources being registered? No
Why not?
- Governance patterns not resolved
- Metadata is insufficient, but creating it is too hard
- Semantic interoperability requires community agreements
See Markup/Standards-based methodology paper
Metadata
Metadata capture
Everyone agrees that metadata is a good idea
But researchers are reluctant to provide it
Why?
- Not integrated with workflow
- No perceived reward
- Researchers don’t themselves rely on metadata-based discovery systems
- Tedious to create
- …
Is it the metadata models?
- Standards are complex …
- … but they are also highly normalized
It’s the implementation!
Records are usually de-normalized
- Each record reproduces every element
- Each repository assumes governance of all the elements
Example of an element reproduced in full inside a record (responsible party, role: custodian):
  Geoscience Australia (GA); Director, Sales and Distribution, CIMA; GPO Box 378, Canberra ACT 2601, Australia
A better way
- Records refer to externally governed elements
Normalized records, distributed governance
Metadata records should primarily consist of a set of references
- Use keyboard only for title/label & abstract/description!
- Drop-down lists for everything else
- List == (online) register
Separate registers for key classes, e.g.
- Responsible party
- Access conditions
- Feature types
- …
These registers are under independent governance
[Diagram labels: Access Federation; data standard licenses; published community schemas; Infrastructure]
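The idea of a record that is mostly a set of references can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the register keys (`party:GA`, `licence:open`), the register contents, the `MetadataRecord` class and the `resolve` helper are all hypothetical, and the registers are plain in-memory dictionaries standing in for independently governed online registers.

```python
from dataclasses import dataclass

# Hypothetical registers. In a real infrastructure each would be an online
# register under its own governance; here they are in-memory stand-ins.
RESPONSIBLE_PARTIES = {
    "party:GA": {  # hypothetical key
        "organisation": "Geoscience Australia (GA)",
        "position": "Director, Sales and Distribution, CIMA",
        "address": "GPO Box 378, Canberra ACT 2601, Australia",
    },
}
ACCESS_CONDITIONS = {
    "licence:open": {"label": "Open access"},  # hypothetical entry
}
FEATURE_TYPES = {
    "urn:cgi:featuretype:CGI:GeoSciML:2.0:GeologicUnit": {"label": "GeologicUnit"},
}

# Maps each reference field of a record to the register that governs it.
REGISTERS = {
    "responsible_party": RESPONSIBLE_PARTIES,
    "access_conditions": ACCESS_CONDITIONS,
    "feature_type": FEATURE_TYPES,
}


@dataclass(frozen=True)
class MetadataRecord:
    # Only these two fields are typed free text.
    title: str
    abstract: str
    # Everything else is a reference into a register.
    responsible_party: str
    access_conditions: str
    feature_type: str


def resolve(record: MetadataRecord) -> dict:
    """Expand a normalized record by looking up each reference."""
    resolved = {"title": record.title, "abstract": record.abstract}
    for field_name, register in REGISTERS.items():
        key = getattr(record, field_name)
        if key not in register:
            raise KeyError(f"{key!r} is not in the {field_name} register")
        resolved[field_name] = register[key]
    return resolved


record = MetadataRecord(
    title="Example dataset",            # hypothetical
    abstract="Illustrative record.",    # hypothetical
    responsible_party="party:GA",
    access_conditions="licence:open",
    feature_type="urn:cgi:featuretype:CGI:GeoSciML:2.0:GeologicUnit",
)
resolved = resolve(record)
```

In this sketch the contact details live in one place only, so a change of address is made once in the responsible-party register rather than in every record that cites it.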
Identifiers
CGI persistent identifiers
IUGS Commission for Geoscience Information
GeoSciML Testbed III
- Interoperable WFS from 10 geological surveys: USGS, GSC, BGS, BGR, GA, GSV, SGU, APAT, GSJ, AzGS
Interoperability levels:
- Schematic/model: common XML Schema, GeoSciML v2.0 (see other paper in this conference)
- Semantic: common vocabularies
GeoSciML Example
Values from an XML instance (element markup not shown):
…
- GSNSW
- Mafic volcaniclastic sandstone, siltstone, shale, chert; minor limestone, conglomerate
- Kabadah Formation
- Ojck
- urn:cgi:feature:GA:Stratno:29570
- published description
- typicalNorm
- urn:cgi:classifier:ICS:StratChart:2004:Ordovician
- unspecified
…
Most property values are references to registers
Common values enable interoperability
Concept identifiers
Concepts are denoted by language-neutral identifiers
Identifiers must be universal and persistent
- urn:ogc:def:crs:EPSG:6.14:4326
- urn:cgi:classifier:ICS:StratChart:2008:ediacaran
- urn:cgi:classifierscheme:ICS:StratChart:2008
- urn:cgi:schema:CGI:GeoSciML:2.0
- urn:cgi:featuretype:CGI:GeoSciML:2.0:GeologicUnit
- urn:cgi:feature:USGS_NGDM:Id56jn23
“Controlled vocabularies”
Concepts exist in context
- urn:ogc:def:crs:EPSG:6.14:4326
- urn:cgi:classifier:ICS:StratChart:2008:ediacaran
- urn:cgi:classifierscheme:ICS:StratChart:2008
- urn:cgi:schema:CGI:GeoSciML:2.0
- urn:cgi:featuretype:CGI:GeoSciML:2.0:GeologicUnit
- urn:cgi:feature:USGS_NGDM:Id56jn23
Term from versioned vocabulary owned by an organization, e.g. urn:cgi:classifier:ICS:StratChart:2008:ediacaran
Feature type defined in a schema owned by an organization, e.g. urn:cgi:featuretype:CGI:GeoSciML:2.0:GeologicUnit
Governance
The governance arrangements require separate registers of:
- Classifiers, classifier-schemes
- Resource classes: def, classifier, schema, featuretype, feature
- Concept owners: EPSG, ICS, CGI, USGS_NGDM
and are reflected in/enforced by the structure of the persistent identifier
- urn:ogc:def:crs:EPSG:6.14:4326
- urn:cgi:classifier:ICS:StratChart:2008:ediacaran
- urn:cgi:classifierscheme:ICS:StratChart:2008
- urn:cgi:schema:CGI:GeoSciML:2.0
- urn:cgi:featuretype:CGI:GeoSciML:2.0:GeologicUnit
- urn:cgi:feature:USGS_NGDM:Id56jn23
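How the identifier structure mirrors the registers can be illustrated with a small parser. This is a sketch, not a normative grammar: the field names (`namespace`, `resource_class`, `object_type`, `owner`, `local_parts`), the function names and the positional layout are assumptions read off the example URNs, and the two sets of known values simply copy the resource classes and concept owners that appear in the examples.

```python
from typing import NamedTuple, Optional, Tuple


class ParsedUrn(NamedTuple):
    namespace: str              # e.g. "cgi" or "ogc"
    resource_class: str         # e.g. "classifier", "featuretype", "def"
    object_type: Optional[str]  # only for urn:ogc:def:..., e.g. "crs"
    owner: str                  # e.g. "ICS", "EPSG"
    local_parts: Tuple[str, ...]


# Stand-ins for the separate registers of resource classes and concept owners.
KNOWN_RESOURCE_CLASSES = {
    "def", "classifier", "classifierscheme", "schema", "featuretype", "feature",
}
KNOWN_OWNERS = {"EPSG", "ICS", "CGI", "USGS_NGDM"}


def parse_urn(urn: str) -> ParsedUrn:
    """Split a structured URN into its governance-related components."""
    parts = urn.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn":
        raise ValueError(f"not a structured URN of the expected shape: {urn!r}")
    namespace, resource_class, rest = parts[1], parts[2], parts[3:]
    object_type = None
    if namespace == "ogc" and resource_class == "def":
        # Assumption: OGC definition URNs put an object type before the owner.
        object_type, rest = rest[0], rest[1:]
    return ParsedUrn(namespace, resource_class, object_type, rest[0], tuple(rest[1:]))


def is_governed(urn: str) -> bool:
    """True if both the resource class and the owner are registered."""
    parsed = parse_urn(urn)
    return (parsed.resource_class in KNOWN_RESOURCE_CLASSES
            and parsed.owner in KNOWN_OWNERS)


term = parse_urn("urn:cgi:classifier:ICS:StratChart:2008:ediacaran")
crs = parse_urn("urn:ogc:def:crs:EPSG:6.14:4326")
```

Here `term.owner` is `"ICS"` and `term.local_parts` holds the vocabulary, version and term, so a resolver could route the request to the register maintained by that owner.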
Structured vs. opaque identifiers? URN vs DOI?
Use structured identifiers for strongly governed concepts and system resources
- Slow rate of change, many references
- Identifiers must be stable
- Resolution often not needed
- Useful if they are memorable
Use opaque identifiers for weakly governed data resources
- Frequent update, few references
- Data & identifiers may be transient
- Should be easily resolvable
- Don’t need to be memorable
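The contrast between the two styles can be made concrete with a short sketch. Both function names are hypothetical, the default `cgi` namespace is an assumption, and a random UUID is used here only as a convenient stand-in for an opaque identifier; it is not a DOI and says nothing about how DOIs are minted.

```python
import uuid


def structured_id(resource_class: str, owner: str, *local_parts: str,
                  namespace: str = "cgi") -> str:
    """Build a memorable, governance-revealing URN from its components."""
    parts = ["urn", namespace, resource_class, owner, *local_parts]
    for part in parts:
        # A colon inside a component would make the structure ambiguous.
        if not part or ":" in part:
            raise ValueError(f"invalid URN component: {part!r}")
    return ":".join(parts)


def opaque_id() -> str:
    """Mint an identifier that carries no meaning and must be resolved."""
    return uuid.uuid4().hex


concept = structured_id("classifier", "ICS", "StratChart", "2008", "ediacaran")
dataset = opaque_id()
```

The structured form can be read and checked by a person without resolving it, whereas the opaque form is only useful together with a resolver that maps it to the current location of the data.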
Summary
Key points
- Many controlled vocabularies and other lists are required for an infrastructure
  - Each is typically under independent governance
- Almost all “lists” (and ontologies) should be managed as “registers”
  - Semantic web (AI for C21?) hopes to do this automatically?
- Agreements (standards) are possible in the context of coherent technical communities
- To enable an infrastructure, we need a lot of registers. These must:
  - Use persistent identifiers for both registers and contents
  - Be resolvable
  - Have transparent governance arrangements
ISO Register Organization model
Thank you
Exploration & Mining: Simon Cox, Research Scientist
Land & Water: Rob Atkinson, Spatial Architect