Thesauri, Terminologies and the Semantic Web
CCLRC Council for the Central Laboratory of the Research Councils (CCLRC) Big Science Synchrotron Radiation Sources Lasers Pulsed Neutron Source Large-scale IT demands: tera-scale data, computation Strong IT R&D programme BITD: Business and Information Technology Department
SWAD-E Semantic Web Advanced Development for Europe (SWAD-E) EU Project W3C Semantic Web Activity R&D Demos & Apps Guidelines & Best Practices Partners: HP Labs, ILRT, Stilo, ERCIM (INRIA), CCLRC
Semantic Web Current Web: Web of information for humans Semantic Web: Web of data for computers Why? Automation, organisation, search Enabling technologies: RDF: Resource Description Framework (data linking, low-level semantics) OWL: Web Ontology Language (high-level semantics, inference)
SWAD-E Thesaurus Activity Why is SWAD-E interested in thesauri? Large body of well-engineered knowledge Enrich & bootstrap semantic web SWAD-E Thesaurus Activity Design Schemas for thesaurus data Guidelines for use and migration Supporting technologies and demos
Exploiting the Semantic Web What can you get out of the semantic web? Integration & Connectivity Data Interoperability Application Interoperability N.B. The Semantic Web is a Tool.
Integration & Connectivity Recurring use-case: high/low-level thesauri E.g. GCL & Cultural Heritage thesauri E.g. Gerry’s macro/micro thesauri RDF => Data linking Can create large linked thesaurus structures – super-thesauri Linking solves problem of concurrent versions
Data Interoperability Interoperability is a major goal Move to XML technologies is a step in the right direction But … XML does not equal Interoperability [N.B. plethora of XML formats for thesauri]
SKOS-Core: RDF Schema for Thesauri Introducing: SKOS-Core 1.0 RDF Schema for Thesauri SKOS: Simple Knowledge Organisation Systems The First Challenge: coping with variety in thesaurus design and structure Allow unique features Support interoperability The Second Challenge: interoperating with/migrating to ontologies, taxonomies, classification schemes etc. N.B. The semantic web is where data collide
SKOS Meta-Model: Concept-orientation Concepts (given URIs) Labels (pref, alt) (symbols) Concept Schemes Semantic Relations (extensible set) Broader, narrower, related … Semantic Mappings (extensible set) Exact, inexact … Scope notes, definitions, depictions Infer meaning of concept from labels, scope notes, definitions, depictions & neighbours. Property hierarchy: skos:broaderInstantive is a sub-property of skos:broader, which is a sub-property of skos:semanticRelation
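The meta-model above can be illustrated with a small sketch in plain Python that stores a concept as (subject, predicate, object) triples. This is only an illustration, not the SKOS-Core schema itself: the `ex:` identifiers, the label and note texts, and the property spellings `skos:prefLabel`, `skos:altLabel`, `skos:scopeNote` and `skos:inScheme` are assumptions inferred from the slide's wording ("pref, alt", "scope notes", "Concept Schemes").

```python
# Hypothetical concepts from an imaginary thesaurus, written as triples.
# All ex: names and literal values are made up for illustration.
triples = [
    ("ex:mammals", "rdf:type", "skos:Concept"),
    ("ex:mammals", "skos:inScheme", "ex:animalScheme"),
    ("ex:mammals", "skos:prefLabel", "mammals"),
    ("ex:cats", "rdf:type", "skos:Concept"),
    ("ex:cats", "skos:inScheme", "ex:animalScheme"),
    ("ex:cats", "skos:prefLabel", "cats"),
    ("ex:cats", "skos:altLabel", "domestic cats"),
    ("ex:cats", "skos:scopeNote", "Use for the domesticated species only."),
    ("ex:cats", "skos:broader", "ex:mammals"),
]


def describe(concept, triples):
    """Group everything stated about one concept by property."""
    description = {}
    for subject, predicate, obj in triples:
        if subject == concept:
            description.setdefault(predicate, []).append(obj)
    return description


cats = describe("ex:cats", triples)
for predicate, values in cats.items():
    print(predicate, values)
```

The concept is identified by its URI-like name, while labels, notes and neighbours are separate statements about it, which is what "concept-orientation" amounts to in practice.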
Example: GEMET Non-Standard Features: Groups & Super-Groups, Themes Solution: Extend SKOS-Core: Class gemet:Group Sub-class of skos:Concept Class gemet:Theme Property gemet:broaderTheme Sub-property of skos:broader Property gemet:broaderGroup Sub-property of skos:broader Property hierarchy: gemet:broaderGroup is a sub-property of skos:broader, which is a sub-property of skos:semanticRelation
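Why the extension approach keeps GEMET interoperable can be sketched as follows: a tool that only knows skos:broader can still find statements made with the GEMET-specific properties by walking the sub-property hierarchy. The sketch below is plain Python, not an RDF toolkit, and the `gemet:concept-a`, `gemet:group-x` and `gemet:theme-y` identifiers are hypothetical placeholders rather than real GEMET entries.

```python
# Sub-property declarations taken from the slide.
SUB_PROPERTY_OF = {
    "skos:broader": "skos:semanticRelation",
    "gemet:broaderGroup": "skos:broader",
    "gemet:broaderTheme": "skos:broader",
}


def super_properties(prop):
    """Return prop followed by all of its ancestors in the hierarchy."""
    chain = [prop]
    while chain[-1] in SUB_PROPERTY_OF:
        chain.append(SUB_PROPERTY_OF[chain[-1]])
    return chain


def query(triples, prop):
    """Return (subject, object) pairs stated with prop or a sub-property of it."""
    return [(s, o) for s, p, o in triples if prop in super_properties(p)]


# Hypothetical GEMET-style data using the extension properties.
gemet_triples = [
    ("gemet:concept-a", "gemet:broaderGroup", "gemet:group-x"),
    ("gemet:concept-a", "gemet:broaderTheme", "gemet:theme-y"),
    ("gemet:concept-a", "skos:related", "gemet:concept-b"),
]

broader_pairs = query(gemet_triples, "skos:broader")
print(broader_pairs)
```

A generic SKOS-aware application asking for skos:broader sees both the group and the theme link, while a GEMET-aware application can still query gemet:broaderGroup alone and keep the distinction.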
Approach to Non-Standard Thesauri Design schema as extension to SKOS-Core Preserve unique features Can interoperate with anything based on SKOS-Core Have your cake and eat it!
Example: MeSH Medical Subject Headings Thesaurus features: Semantically ambiguous concept hierarchy Ontology features: Semantic typing of concepts, e.g. Calcimycin type Antibiotic Use combination of SKOS and OWL to represent this hybrid structure RDF, SKOS & OWL means you can migrate a thesaurus to ontology merely by adding statements (No re-engineering or transformation is required)
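The "migrate merely by adding statements" point can be sketched with sets of triples in plain Python. The `mesh:` identifiers below are hypothetical stand-ins, not real MeSH URIs, and the choice of `owl:Class` for the Antibiotic type is an assumption about how the typing might be expressed.

```python
# Existing thesaurus statements (hypothetical identifiers and label).
thesaurus = {
    ("mesh:Calcimycin", "rdf:type", "skos:Concept"),
    ("mesh:Calcimycin", "skos:prefLabel", "Calcimycin"),
}

# Ontology-style statements added later; nothing above is rewritten.
ontology_statements = {
    ("mesh:Antibiotic", "rdf:type", "owl:Class"),
    ("mesh:Calcimycin", "rdf:type", "mesh:Antibiotic"),
}

# Merging is a set union: the thesaurus view survives unchanged.
merged = thesaurus | ontology_statements

types_of_calcimycin = sorted(
    o for s, p, o in merged if s == "mesh:Calcimycin" and p == "rdf:type"
)
print(types_of_calcimycin)
```

Because the original triples remain a subset of the merged data, thesaurus-only applications keep working while ontology-aware ones can use the added typing.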
Interoperability: Thesaurus Mapping Common use-case: overlapping thesauri Mappings support interchangeable use of overlapping thesauri Introducing: SKOS-Mapping RDF Schema for inter-thesaurus mapping
SKOS-Mapping Sets of Resources in which Concept Occurs Exact Inexact Major Minor Partial Broad Narrow Ordered Combinators: AND, OR, NOT
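One way to read this slide is that a mapping between two concepts is characterised by comparing the sets of resources in which each concept occurs, and that AND, OR and NOT combine target concepts before the comparison. The sketch below follows that reading in plain Python. It is an interpretation, not the SKOS-Mapping schema: the set-based definitions of "exact", "broad", "narrow" and "partial" are assumptions, the major/minor distinction is not modelled, and all concept names and resource identifiers are hypothetical.

```python
def classify(source, target):
    """Classify a mapping by how the two resource sets relate (assumed reading)."""
    if not source & target:
        return "none"
    if source == target:
        return "exact"
    if source < target:
        return "broad"    # target covers more resources than source
    if source > target:
        return "narrow"   # target covers fewer resources than source
    return "partial"


# Resources indexed by one concept in thesaurus 1 (hypothetical).
medieval_castles = {"r1", "r2"}

# Resources indexed by two concepts in thesaurus 2 (hypothetical).
castles = {"r1", "r2", "r3"}
medieval = {"r1", "r2", "r4"}
all_resources = {"r1", "r2", "r3", "r4", "r5"}

# Combinators expressed as set operations.
castles_and_medieval = castles & medieval        # AND
castles_or_medieval = castles | medieval         # OR
not_medieval = all_resources - medieval          # NOT

print(classify(medieval_castles, castles))               # single target concept
print(classify(medieval_castles, castles_and_medieval))  # combined target
```

Under these assumptions the single-concept mapping is only a broad match, whereas the AND combination gives an exact match, which is the kind of case the combinators appear intended for.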
Data Interop Summary SKOS-Core supports interoperability of KOS with variable structures SKOS-Mapping supports interoperability of overlapping KOS
Multilinguality Analyse multilingual thesaurus into language components Multilingual Labelling Use SKOS-Core E.g. GEMET Inter-lingual Mapping Use SKOS-Core + SKOS-Mapping E.g. AAT, Merimee
Multilinguality (2) Analyse each language component Multilingual Labelling Interlingual Mapping
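Multilingual labelling can be sketched as one concept carrying one preferred label per language. The example is plain Python with a hypothetical concept `ex:water`; the (text, language tag) pairs mimic language-tagged literals, and the fallback-to-English behaviour is an illustrative design choice, not something the slides specify.

```python
# One concept, several language-tagged preferred labels (hypothetical data).
labels = [
    ("ex:water", "skos:prefLabel", ("water", "en")),
    ("ex:water", "skos:prefLabel", ("eau", "fr")),
    ("ex:water", "skos:prefLabel", ("Wasser", "de")),
]


def pref_label(concept, lang, triples, fallback="en"):
    """Return the preferred label in lang, or in the fallback language."""
    by_lang = {
        tag: text
        for s, p, (text, tag) in triples
        if s == concept and p == "skos:prefLabel"
    }
    return by_lang.get(lang, by_lang.get(fallback))


print(pref_label("ex:water", "fr", labels))
print(pref_label("ex:water", "es", labels))
```

Inter-lingual mapping differs in that each language component has its own concepts, linked by mapping statements rather than sharing a single concept with several labels.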
Application Interoperability Move to modularisation and distribution via web services is a step in the right direction But … Web services do not equal application interoperability Web service API Community driven design Endorsed by wider community
SKOS API Interface to terminology web service Under development (pre-release is online) Participate on public mailing list: public-esw-thes@w3.org
Web Service Implementation Reference Implementation of SKOS API Leverage power of sem-web tools E.g. RDF Query E.g. transitive closure of broader concepts SOAP Service Back end: Sesame RDF Store Modular design Semantic Web Services
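The "transitive closure of broader concepts" example can be sketched in plain Python over in-memory triples. This is not the Sesame query interface or the SKOS API; it only shows the computation such a query performs, and the `ex:` concepts are hypothetical.

```python
def broader_closure(concept, triples):
    """Return every concept reachable from concept via skos:broader links."""
    direct = {}
    for s, p, o in triples:
        if p == "skos:broader":
            direct.setdefault(s, set()).add(o)

    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in direct.get(current, ()):
            if parent not in seen:   # guards against cycles in the data
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical three-level hierarchy.
hierarchy = [
    ("ex:cats", "skos:broader", "ex:mammals"),
    ("ex:mammals", "skos:broader", "ex:animals"),
    ("ex:dogs", "skos:broader", "ex:mammals"),
]

ancestors = broader_closure("ex:cats", hierarchy)
print(sorted(ancestors))
```

A terminology service can use such a closure for query expansion, for instance returning resources indexed under any ancestor of the requested concept.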
Future Issues Social aspects Semantic web technology supports community driven development of thesauri But … validation? Change Management Versioning Evolution
Summary Semantic Web Technologies are tools SKOS-Core & SKOS-Mapping support data interoperability SKOS API supports application interoperability W3C Semantic Web Best Practices Working Group Thesaurus Task Force
SWAD-E & Eco-informatics Work with thesaurus developers and managers Publish RDF encodings of current thesauri Test design and coverage of SKOS-Core Community building for development of web service API
Thank You Links: SWAD-Europe Thesaurus Activity http://www.w3.org/2001/sw/Europe/reports/thes/ SKOS-Core 1.0 Guide http://www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/ SKOS API http://www.w3.org/2001/sw/Europe/reports/thes/skosapi.html SKOS-Core 1.0 Guidelines for Migration http://www.w3.org/2001/sw/Europe/reports/thes/1.0/migrate/ Public Thesaurus Mailing list public-esw-thes@w3.org W3C Semantic Web Activity http://www.w3.org/2001/sw/ SWAD-E Project http://www.w3.org/2001/sw/Europe/