Metadata Schema Registries in the Partially Semantic Web: the CORES experience
Rachel Heery, Pete Johnston, UKOLN, University of Bath
András Micsik, Csaba Fülöp, MTA SZTAKI, Budapest
DC-2003, Seattle, Washington, USA, 28 September – 2 October 2003
The CORES experience
The CORES project
The CORES registry
– The registry & the schema creation tool
– The registry data model
The CORES registry in practice
The CORES project
The CORES project
Funded within the European Commission FP5 IST Programme
Partners
– PricewaterhouseCoopers Luxembourg
– Fraunhofer Gesellschaft
– UKOLN, University of Bath
– MTA SZTAKI, Budapest
To encourage the sharing of metadata semantics
– Standards Interoperability Forum
– Metadata Schema Registry
The CORES registry
What is the CORES metadata schema registry?
Software application that provides access to information on metadata element sets, their constituent elements, and their use
Primarily to support disclosure and discovery, but also reuse
Information provided to registry in the form of machine-readable schemas
Interfaces for human readers and software applications
What is the CORES metadata schema registry? (continued)
[Diagram: CORES Registry, Schema]
The registry data model
Builds on earlier work in the DESIRE and SCHEMAS projects
Dublin Core "grammatical principles"
– Element refinement
– Encoding Scheme
– Resource – Property (DC Element or DC Element refinement) – Value
– Value may be associated with an Encoding Scheme
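The Dublin Core "grammatical principles" can be made concrete with a small sketch. It assumes nothing about how CORES stored statements internally: a statement is a plain Python record, and the refinement table is a tiny hand-picked subset of real DCMI terms (dcterms:created and dcterms:modified refine dc:date; dcterms:alternative refines dc:title). The hypothetical `dumb_down` function shows why refinement works: replacing a refinement by the Element it refines leaves a true, if less specific, statement.

```python
from dataclasses import dataclass
from typing import Dict, Optional

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Element refinement -> the DC Element it refines (small illustrative subset).
REFINES: Dict[str, str] = {
    DCTERMS + "created": DC + "date",
    DCTERMS + "modified": DC + "date",
    DCTERMS + "alternative": DC + "title",
}


@dataclass(frozen=True)
class Statement:
    """One Resource / Property / Value statement."""
    resource: str
    prop: str  # a DC Element or a DC Element refinement
    value: str
    encoding_scheme: Optional[str] = None  # optional Encoding Scheme for the value


def dumb_down(stmt: Statement) -> Statement:
    """Replace an Element refinement by the Element it refines.

    The statement stays true, only less specific. In this sketch the
    Encoding Scheme is kept, because the value string is unchanged.
    """
    parent = REFINES.get(stmt.prop)
    if parent is None:
        return stmt  # already an unrefined Element (or unknown): leave as-is
    return Statement(stmt.resource, parent, stmt.value, stmt.encoding_scheme)
```

For example, a `dcterms:created` statement with a W3CDTF-encoded date becomes a plain `dc:date` statement with the same value.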
The registry data model
The concept of the "application profile"
– Metadata elements are defined/managed as members of metadata element sets
– Implementers “optimise” their use of metadata elements
  – may constrain usage of elements in context
  – may narrow "standard" element semantics
  – may draw elements from multiple element sets
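[Figure: registry data model, showing the entities Agency, Element Set, Element, Encoding Scheme, (Controlled Vocabulary) Value, Application Profile and Element Usage, linked by one-to-many (1:m) and many-to-many (m:m) relationships]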
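The application profile part of the model can be sketched as a few classes. The class and field names (`obligation`, `max_occurs`, `local_definition`) are hypothetical, not the CORES RDF vocabulary, and the sketch omits Agency and Encoding Scheme details. An Element Usage points at an Element from any Element Set and adds profile-specific constraints; `check` tests a record against those constraints.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Element:
    uri: str
    element_set: str  # URI of the Element Set the Element is a member of
    label: str


@dataclass
class ElementUsage:
    """How one Application Profile uses one Element (hypothetical fields)."""
    element: Element
    obligation: str = "optional"  # "mandatory" or "optional"
    max_occurs: Optional[int] = None  # None means unbounded
    local_definition: Optional[str] = None  # narrowed, profile-specific semantics
    encoding_schemes: List[str] = field(default_factory=list)


@dataclass
class ApplicationProfile:
    uri: str
    usages: List[ElementUsage] = field(default_factory=list)

    def element_sets(self) -> Set[str]:
        """A profile may draw Elements from several Element Sets."""
        return {u.element.element_set for u in self.usages}

    def check(self, record: Dict[str, List[str]]) -> List[str]:
        """Check a record (property URI -> values) against the profile's constraints."""
        problems: List[str] = []
        by_uri = {u.element.uri: u for u in self.usages}
        for prop in record:
            if prop not in by_uri:
                problems.append(f"not in profile: {prop}")
        for uri, usage in by_uri.items():
            n = len(record.get(uri, []))
            if usage.obligation == "mandatory" and n == 0:
                problems.append(f"missing mandatory: {uri}")
            if usage.max_occurs is not None and n > usage.max_occurs:
                problems.append(f"too many values: {uri}")
        return problems
```

A profile that makes `dc:title` mandatory and non-repeatable, and adds an Element from a second (hypothetical) Element Set, would report a record with two titles as having "too many values".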
The registry software
Software development by
– András Micsik & Csaba Fülöp (MTA SZTAKI)
Enhancing MEG Registry project software
– Dave Beckett & Damian Steer (ILRT)
Schema creation tool
– Java Swing, Jena RDF toolkit
– Forms-based authoring
– Save/reload as RDF/XML
– "Use" existing Elements by "drag-and-drop" from results of search of registry database
– Submit data to registry (HTTP POST)
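The tool's last two steps, saving RDF/XML and submitting it with HTTP POST, might look roughly like this in Python (the actual tool was written in Java with Jena). The sketch describes Elements only with standard RDF/RDFS terms, not the CORES-specific vocabulary; the function names and the registry URL in the usage note are hypothetical, and the request is built but not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)


def to_rdfxml(elements: List[Tuple[str, str, str]]) -> bytes:
    """Serialise (uri, label, definition) tuples as rdf:Property descriptions."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for uri, label, definition in elements:
        prop = ET.SubElement(root, f"{{{RDF}}}Property", {f"{{{RDF}}}about": uri})
        ET.SubElement(prop, f"{{{RDFS}}}label").text = label
        ET.SubElement(prop, f"{{{RDFS}}}comment").text = definition
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def submission_request(registry_url: str, rdfxml: bytes) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying the RDF/XML document."""
    return urllib.request.Request(
        registry_url,
        data=rdfxml,
        method="POST",
        headers={"Content-Type": "application/rdf+xml"},
    )
```

Sending would then be `urllib.request.urlopen(submission_request("http://registry.example.org/submit", body))`, against whatever upload endpoint a registry actually exposes.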
CORES Schema Creation Tool
[Screenshot callouts: search registry for “title” Elements; drag Element to create Element Usage in App Profile]
The registry software
Registry server
– Perl/CGI, Redland RDF application framework
– Upload/publication API (HTTP POST)
– Simple query API (HTTP GET)
  – very basic, designed to support Schema Creation Tool
– HTML interface
  – browse, query, annotate, administer
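The simple query API might be exercised as follows. The parameter names `q` and `type`, the endpoint URL and the in-memory index are hypothetical stand-ins (the CORES server itself was Perl/CGI on Redland); the sketch shows only the shape of a GET query for Elements whose label contains a term, as in the “title” search used by the Schema Creation Tool.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse

# Stand-in for the server's RDF store: Element URI -> label.
INDEX: Dict[str, str] = {
    "http://purl.org/dc/elements/1.1/title": "Title",
    "http://purl.org/dc/terms/alternative": "Alternative Title",
    "http://purl.org/dc/elements/1.1/creator": "Creator",
}


def query_url(base: str, term: str) -> str:
    """Client side: build the GET URL (parameter names are hypothetical)."""
    return base + "?" + urlencode({"q": term, "type": "element"})


def handle_query(url: str, index: Dict[str, str]) -> List[str]:
    """Server side: URIs of Elements whose label contains the term, case-insensitively."""
    params = parse_qs(urlparse(url).query)
    term = params.get("q", [""])[0].lower()
    return sorted(uri for uri, label in index.items() if term and term in label.lower())
```

Searching for "title" matches both `dc:title` and the refinement `dcterms:alternative`, which is the kind of result list a user would drag Elements from.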
CORES Registry HTML interface
The CORES registry in practice
The CORES registry, August 2003
Registered
– 11 Element Sets (152 Elements)
– 86 Encoding Schemes
– 9 Application Profiles (254 Element Usages)
Mostly DC-based
Sources
– Schemas created using CORES tool
– Existing RDFS data published on Web
Using existing RDFS data
CORES tools built on registry data model
– uses RDFS
– but also application-specific RDF vocabulary
Schema Creation Tool
– does not load "pure" RDFS documents
Registry server
– does read/index RDFS data
– but requires supplementary data to describe application-specific attributes, relations
Using existing RDFS data
Use of existing RDFS (RDF Vocabulary Description Language) data is possible, but not straightforward, using the CORES registry tools
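One way to read "pure" RDFS and see where supplementary data is needed is sketched below. It is an illustration, not the CORES indexer: it recognises properties written either as rdf:Property nodes or as rdf:Description nodes with rdf:type rdf:Property, reads rdfs:subPropertyOf as Element refinement and, purely as an assumed example of an application-specific attribute, treats a missing rdfs:isDefinedBy as a missing Element Set assignment.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def _resource(node: ET.Element, tag: str) -> Optional[str]:
    """rdf:resource of the first rdfs:<tag> child, if any."""
    child = node.find(f"{{{RDFS}}}{tag}")
    return None if child is None else child.get(f"{{{RDF}}}resource")


def index_rdfs(rdfxml: str) -> Dict[str, dict]:
    """Index properties in an RDFS document (typed-node or rdf:type form)."""
    root = ET.fromstring(rdfxml)
    found: Dict[str, dict] = {}
    for node in root:  # top-level node descriptions only
        types = {t.get(f"{{{RDF}}}resource") for t in node.findall(f"{{{RDF}}}type")}
        if node.tag != f"{{{RDF}}}Property" and RDF + "Property" not in types:
            continue  # e.g. rdfs:Class: not an Element candidate
        label = node.find(f"{{{RDFS}}}label")
        found[node.get(f"{{{RDF}}}about")] = {
            "label": None if label is None else label.text,
            "refines": _resource(node, "subPropertyOf"),  # read as Element refinement
            "element_set": _resource(node, "isDefinedBy"),  # often absent in pure RDFS
        }
    return found


def needs_supplementary_data(index: Dict[str, dict]) -> List[str]:
    """Properties lacking the (assumed) registry-specific attribute: Element Set."""
    return sorted(uri for uri, info in index.items() if info["element_set"] is None)
```

Anything returned by `needs_supplementary_data` would have to be described separately before the registry could treat it as a fully specified Element.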
Providing access to source Schemas
Schema creation tool
– creates descriptions of entities in model
– stores descriptions as Schema (RDF/XML)
– does not assume one-to-one relations
  – between Element Set and Schema
  – between Application Profile and Schema
– does not create description of Schema itself
Registry server
– has no metadata about Schema
– does not maintain record of source of data
Providing access to source Schemas
Registry provides access to descriptions of resources submitted, but does not provide direct access to source Schemas
Suggest adding Schema to model as entity, amend tools to generate/use rdfs:isDefinedBy statements
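The suggested change could work along these lines. The sketch models RDF as (subject, predicate, object) tuples rather than using an RDF library, and `SCHEMA_CLASS` is a hypothetical URI for the new Schema entity. Because there is no one-to-one relation between Element Set (or Application Profile) and Schema, a resource may be described in several Schemas, so the lookup returns a set.

```python
from typing import Iterable, List, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"
SCHEMA_CLASS = "http://example.org/cores-sketch#Schema"  # hypothetical class URI

Triple = Tuple[str, str, str]


def with_provenance(schema_url: str, triples: Iterable[Triple]) -> List[Triple]:
    """Describe the Schema itself, and link every subject it describes via rdfs:isDefinedBy."""
    triples = list(triples)
    subjects = sorted({s for s, _, _ in triples})
    return (
        triples
        + [(schema_url, RDF_TYPE, SCHEMA_CLASS)]
        + [(s, IS_DEFINED_BY, schema_url) for s in subjects]
    )


def source_schemas(graph: Iterable[Triple], subject: str) -> Set[str]:
    """Registry side: every Schema that contributed statements about a resource."""
    return {o for s, p, o in graph if s == subject and p == IS_DEFINED_BY}
```

With these statements in its store, a registry could link from any Element or Application Profile back to the Schema document(s) it was loaded from.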
Availability of CORES registry server(s)
CORES as fixed-term project
– Unable to guarantee long-term availability of current CORES registry service
– Schema owners reluctant to invest effort in creating data for registry
Distinction between
– Schemas created using the CORES tools
– Service currently provided by MTA SZTAKI
  – guaranteed by "Persistence Policy" until mid-2004
– The CORES registry software
  – available from SourceForge
Availability of CORES registry server(s)
Continued availability of Schemas is independent of availability of SZTAKI registry
Another service provider can run a registry server using CORES registry software
– Re-index existing Schemas from Web
Data also available to other RDF/RDFS applications
– But N.B. Schema Creation Tool uses CORES RDF vocabulary
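Re-indexing published Schemas is, at its simplest, a loop over Schema URLs. This sketch is not the CORES registry software's code: the fetch function is injectable so the loop can be tested offline, the index maps each described resource (any top-level node with rdf:about) to the Schema URLs it came from, and unreachable or malformed Schemas are reported rather than aborting the run.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, List, Set, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def fetch(url: str) -> bytes:
    """Default fetcher: plain HTTP GET of a published Schema."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def reindex(
    schema_urls: Iterable[str], get: Callable[[str], bytes] = fetch
) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Rebuild a resource index (URI -> Schema URLs) from Schemas on the Web.

    Returns the index and the URLs that could not be fetched or parsed.
    """
    index: Dict[str, Set[str]] = {}
    failed: List[str] = []
    for url in schema_urls:
        try:
            root = ET.fromstring(get(url))
        except (OSError, ET.ParseError):  # URLError is a subclass of OSError
            failed.append(url)
            continue
        for node in root:
            about = node.get(f"{{{RDF}}}about")
            if about is not None:
                index.setdefault(about, set()).add(url)
    return index, failed
```

Keeping the failure list separate lets a new service provider see which Schema owners have moved or withdrawn their documents.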
Shared models for metadata
Registry based on DC metadata model
But metadata vocabularies may have own (unrelated) metadata model
– "Element" != CORES Element/RDF Property
Some metadata vocabularies defined primarily in terms of XML tree structure
– XML element != CORES Element/RDF Property
Even where vocabulary has RDF expression, additional effort is required
– e.g. RDFS Class != CORES Encoding Scheme
Shared models for metadata
May be possible to map from "native" model to CORES model, but
– requires element-by-element analysis
– different process for every vocabulary
– arguably, result of limited value to implementer working only with "native" model
Conversely, registry cannot "predict" structural conventions of arbitrary XML encodings
– Application Profile metadata includes optional pointer to XML Schema
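The "element-by-element analysis" amounts to building and maintaining a mapping such as the one below for each vocabulary. Both the XML record structure and the path-to-property table are hypothetical; paths with no mapping are returned so the analyst can see which parts of the native structure the CORES model does not capture.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical hand-built mapping for one vocabulary: XML path -> property URI.
MAPPING: Dict[str, str] = {
    "head/title": "http://purl.org/dc/elements/1.1/title",
    "people/author": "http://purl.org/dc/elements/1.1/creator",
}


def _leaf_paths(node: ET.Element, prefix: str = "") -> List[Tuple[str, str]]:
    """Return (path, text) for every leaf element below node, paths relative to node."""
    out: List[Tuple[str, str]] = []
    for child in node:
        path = f"{prefix}/{child.tag}" if prefix else child.tag
        if len(child):  # has child elements: descend
            out.extend(_leaf_paths(child, path))
        else:
            out.append((path, (child.text or "").strip()))
    return out


def map_record(xml: str, mapping: Dict[str, str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (property, value) pairs for mapped paths, plus the unmapped paths."""
    mapped: List[Tuple[str, str]] = []
    unmapped: List[str] = []
    for path, text in _leaf_paths(ET.fromstring(xml)):
        if path in mapping:
            mapped.append((mapping[path], text))
        else:
            unmapped.append(path)
    return mapped, unmapped
```

The mapping is keyed on tree positions, not on Elements, which is one concrete way in which an XML element differs from a CORES Element/RDF Property.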
Integrity and trust
Basis of Application Profile is reuse of existing Elements, Encoding Schemes
– resources defined/published by others in a global space
Expectation that the URI will continue to denote what it denoted at the time of reuse
Requires level of trust
– in source/owner of URI
– in mediating service (registry) that exposes metadata about that resource
Conclusions
Registry data model is simplification of complex reality
– Good "fit" for DC Application Profiles
– More problematic where models diverge
Application-specific RDF vocabulary does limit interoperability
– Review in light of recent RDF specs, OWL
Trust issues require work
Shared model is critical, especially where reuse encouraged
Acknowledgements
UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.