STITCH project CATCH User Group January 30th 2007.


Summary Global presentation of project and past work Cases the project is currently focusing on Scientific problems STITCH between scientific research and CH domain

Summary Global presentation of project and past work General motivations Pilot project Cases the project is currently focusing on Scientific problems STITCH between scientific research and CH domain

Motivation Current CH trend: portals that build on heterogeneous collections Different databases Documents described/accessed according to different points of view (controlled vocabularies/MD schemes)

CH Interoperability Problems Current CH trend: portals that build on heterogeneous collections Different databases/vocabularies/MD schemes Syntactic interoperability problem is being solved Access can be granted, cf. deployed portals Semantic interoperability still to be addressed Links with original vocabularies/MD structures are lost

STITCH General Goals [SemanTic Interoperability To access Cultural Heritage] Allow heterogeneous CH collections to be accessed In a seamless way Still benefiting from specific collection commitments Keeping original metadata schemes and vocabularies

STITCH General Goals (2) Allow heterogeneous CH collections to be accessed In a seamless way Still benefiting from specific collection commitments Keeping original metadata schemes and vocabularies Using Semantic Web means for Representation of the different points of view in one system Creation and use of the alignment knowledge 2 methodological concerns Generalize as much as possible Automatize as much as possible

Summary Global presentation of project and past work General motivations Pilot project Cases the project is currently focusing on Scientific problems STITCH between scientific research and CH domain

Experiment
- On a reduced scale: 2 collections and associated vocabularies
- Desired output: insights on
  - Use of off-the-shelf SW techniques with CH-specific resources
  - Impact of turning to standard proposals (SW-linked tools and methods) in a context of natural semantics (thesauri)
  - Added value of this effort
- Quantitative and qualitative evaluation
- Simple prototype for accessing documents

1st Collection: KB Illustrated Manuscripts

2nd Collection: Rijksmuseum ARIA collection

Experiment Steps

Steps

Gathering vocabulary and collection data Analyzing it Transforming it using SW standards All record/vocabulary information in one repository

SKOS Simple Knowledge Organisation Systems Model to represent traditional vocabularies (thesauri, classification schemes) on the Semantic Web Classes and properties to create XML/RDF data Concepts and Concept schemes Lexical properties (prefLabel, altLabel) Semantic relations (broader, related) Notes (scopeNote, definition)
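As an illustration of the SKOS building blocks listed above, here is a minimal sketch in Python (standard library only) that models one concept with labels, a semantic relation and an optional note, then prints it as Turtle-style statements. The `Concept` class, the `ex:` prefix and the sample concept are illustrative assumptions and are not taken from the STITCH data.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A SKOS concept; this helper class is hypothetical."""
    uri: str
    scheme: str
    pref_label: dict = field(default_factory=dict)   # language -> label
    alt_labels: dict = field(default_factory=dict)   # language -> list of labels
    broader: list = field(default_factory=list)      # URIs of broader concepts
    scope_note: str = ""

    def to_turtle(self) -> str:
        # Note: labels are written as-is, without escaping quotes.
        lines = [f"{self.uri} a skos:Concept ;",
                 f"    skos:inScheme {self.scheme} ;"]
        for lang, label in self.pref_label.items():
            lines.append(f'    skos:prefLabel "{label}"@{lang} ;')
        for lang, labels in self.alt_labels.items():
            for label in labels:
                lines.append(f'    skos:altLabel "{label}"@{lang} ;')
        for parent in self.broader:
            lines.append(f"    skos:broader {parent} ;")
        if self.scope_note:
            lines.append(f'    skos:scopeNote "{self.scope_note}" ;')
        # Replace the trailing " ;" of the last statement with " ."
        lines[-1] = lines[-1][:-2] + " ."
        return "\n".join(lines)


# Invented sample concept.
painting = Concept(
    uri="ex:painting",
    scheme="ex:exampleScheme",
    pref_label={"en": "painting"},
    alt_labels={"en": ["picture"]},
    broader=["ex:artwork"],
)
print(painting.to_turtle())
```

A real conversion would more likely rely on an RDF library and proper URIs; the point here is only the shape of the data: one concept, one scheme, lexical properties, semantic relations and notes.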

Vocabulary Formalisation: ARIA in SKOS

Steps

Provide mappers with vocabulary data Proceed to evaluation/selection of their results Put the alignment in the repository

Automatic Ontology Matching Techniques
Generally aiming at recognizing equivalence or subsumption links between ontology elements
- Lexical: labels of entities, textual definitions
- Structural: structure of the formal definitions of entities, position in the hierarchy
- Statistical: objects, instantiation of the concepts
- Shared background knowledge ("oracles"): using conceptual references to deduce correspondences
Most mapping tools use a mix of such approaches
- E.g. lexical string matching can trigger a structural alignment process
[Figure: example with the node labels "brain", "Long" and "tumor"]
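The lexical technique can be sketched as follows: a minimal exact-match comparison over normalized labels, assuming each vocabulary is available as a mapping from concept identifier to its list of labels. The function names and the toy vocabularies are invented for the example; actual matchers such as those tested in the project combine this kind of step with structural and other evidence.

```python
import unicodedata


def normalize(label: str) -> str:
    """Lower-case, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def lexical_matches(vocab_a: dict, vocab_b: dict) -> list:
    """Return (concept_a, concept_b) pairs sharing at least one normalized label.

    Each vocabulary maps a concept identifier to its list of labels
    (preferred and alternative labels together).
    """
    index = {}
    for concept, labels in vocab_b.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(concept)
    pairs = set()
    for concept, labels in vocab_a.items():
        for label in labels:
            for other in index.get(normalize(label), ()):
                pairs.add((concept, other))
    return sorted(pairs)


# Toy vocabularies, invented for the example.
vocab_a = {"a:1": ["Église", "church building"], "a:2": ["tree"]}
vocab_b = {"b:7": ["eglise"], "b:8": ["Tree ", "trees"], "b:9": ["river"]}
print(lexical_matches(vocab_a, vocab_b))
```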

Collection Integration: Ontology Mapping Tools
Tests with 2 mapping tools
- S-Match, Trento: mapper for tree-like structures
- Falcon-AO, Nanjing: standard OWL ontology mapper
Using
- Lexical comparisons
- Structural comparisons
- Third resource (WordNet as 'oracle')

Mappings

Steps
Adapted faceted browsing paradigm (Flamenco)
- Search by navigating through several dimensions
- Adaptation of the paradigm: from facets corresponding to orthogonal dimensions of object description ('material', 'location') to facets corresponding to different conceptual schemes (ARIA, IconClass)
3 views (sets of facet definitions) on integrated collections
- Single view
- Combined view
- Merged view

Collections Access: Single View
- Facets based on 1 concept scheme
- Access to objects indexed against concepts from other schemes, if there is a mapping between their index and the selected concepts
- A single point of view on the integrated data set
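One way to read the single view is as query expansion through the alignment: selecting a concept of the chosen scheme returns the objects indexed with it plus the objects indexed with any concept mapped to it. The sketch below assumes plain dictionaries for the index and the mappings; all identifiers are invented and the prototype's actual implementation is not described in the slides.

```python
def objects_for_concept(concept, index, mappings):
    """Objects reachable from `concept` in a single-view facet.

    index:    concept -> set of object identifiers indexed with it
    mappings: concept of the selected scheme -> set of mapped concepts
              from the other scheme
    """
    found = set(index.get(concept, set()))
    for mapped in mappings.get(concept, set()):
        found |= index.get(mapped, set())
    return found


# Hypothetical data: one scheme prefixed "s1:", the other "s2:".
index = {
    "s1:A": {"obj-1"},
    "s2:X": {"obj-3", "obj-9"},
    "s2:Y": {"obj-4"},
}
mappings = {"s1:A": {"s2:X"}}
print(sorted(objects_for_concept("s1:A", index, mappings)))
```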

Collections Access: Combined View
- Search based on 2 concept schemes
- Facets attached to the different vocabularies are presented
- Simultaneous access from different points of view on the same data

Collections Access: Merged View Facets using a merged concept scheme with hierarchical links coming from schemes and alignment Making the links between vocabularies more visible during search A way to ‘enrich’ weakly structured vocabularies
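The merged view can be pictured as one graph that pools the hierarchical links of both schemes with the hierarchical links of the alignment. The following sketch assumes every link is given as a (narrower, broader) pair; the function names and concept identifiers are hypothetical.

```python
from collections import defaultdict


def merge_hierarchies(broader_a, broader_b, alignment):
    """Pool (narrower, broader) pairs from both schemes and the alignment
    into a single broader -> narrower adjacency map."""
    narrower = defaultdict(set)
    for child, parent in list(broader_a) + list(broader_b) + list(alignment):
        narrower[parent].add(child)
    return narrower


def descendants(narrower, concept):
    """All concepts reachable by following narrower links from `concept`."""
    seen = set()
    stack = [concept]
    while stack:
        for child in narrower.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Invented example: the alignment says a:tree is narrower than b:plants.
broader_a = [("a:oak", "a:tree")]
broader_b = [("b:plants", "b:nature")]
alignment = [("a:tree", "b:plants")]
merged = merge_hierarchies(broader_a, broader_b, alignment)
print(sorted(descendants(merged, "b:nature")))
```

In this toy case a weakly structured scheme gains depth from the other one: browsing from `b:nature` now reaches `a:oak`.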

Collection Access: demo

Summary Global presentation of project and past work Cases the project is currently focusing on KB internal case Illuminated Manuscripts from KB and BNF Iconclass Scientific problems STITCH between scientific research and CH domain

KB books, collections and vocabularies

KB Vocabularies
- Brinkman
  - Large (5200 terms)
  - Weakly structured: 1-level deep hierarchy
- GTT general subjects
  - Huge (35000 terms)
  - Very weakly structured: 0.5-level deep hierarchy
- [NBC]
  - Large (2000 classes)
  - Weakly but regularly structured (balanced 2-level deep classification)
- Common point: almost standard thesaurus information
  - Associative relationships (RT)
  - Synonyms/non-preferred terms
  - Scope notes

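KB Vocabularies: example record
- Thv zzz
- "het zoeken van patronen, regelmatigheden of zelfs kennis in databases. De inductie van begrijpelijke modellen en patronen uit databases" [English: "searching for patterns, regularities or even knowledge in databases. The induction of understandable models and patterns from databases"]
- databanken [databases]
- kunstmatige intelligentie [artificial intelligence]
- data mining
- knowledge discovery in databases
- KDD
- ICT-zakboekje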

KB Aim
- Integration of GTT and Brinkman
  - One single subject vocabulary instead of two
  - Requirement: keeping links to old indexing subjects
- Thesaurus refinement
  - Focusing on KB scientific interest (humanities)
  - Re-structuring thesaurus: more hierarchical links, top terms for GTT!

Summary Global presentation of project and past work Cases the project is currently focusing on KB internal case Illuminated Manuscripts from KB and BNF Iconclass Scientific problems STITCH between scientific research and CH domain

Manuscripts, 1st Collection: KB Illustrated Manuscripts

Manuscripts, 2nd Collection: BNF Mandragore

Manuscripts vocabularies
- Mandragore
  - Huge (16000 terms)
  - Weakly structured (2-level deep, multi-inheritance)
  - Alternative lexical forms
  - Definitions
- IconClass
  - Huge (>24000 subjects)
  - Richly structured: 10-level hierarchy, cross-references
  - Compound concepts: keys, structural digits…
  - Keywords

Manuscripts, Aim Integrated access All illuminations via Mandragore vocabulary All illuminations via Iconclass vocabulary

Summary Global presentation of project and past work Cases the project is currently focusing on KB internal case Illuminated Manuscripts from KB and BNF Iconclass Scientific problems STITCH between scientific research and CH domain

Iconclass
- Iconclass contains complex information
  - Links between normal subjects and possible qualifiers
  - Compound concepts
  - Local extensions
- Existing representation is mostly text-based
- Aims
  - Building a complete Semantic Web-enabled representation
    - Dedicated ontology
    - Conversion process implemented -> 1.2 M RDF triples
  - Providing this representation as a (web) service
  - As well as a standard SKOS version

Summary Global presentation of project and past work Cases the project is currently focusing on Scientific problems Solving representation heterogeneity Solving conceptual heterogeneity Evaluation STITCH between scientific research and CH domain

Steps Gathering vocabulary and collection data Analyzing it Transforming it using SW standards All record/vocabulary information in one repository

Conversion into RDF specific model and SKOS

BNF export tag | Interpretation | Conversion to record-specific RDFS model | SKOS interpretation
 | Descriptor | inScheme #descripteursScheme | inScheme #mandragoreScheme
libelle | label | md:hasPreferredLabel | prefLabel xml:lang="fr"
description | Definition note | md:hasDefinition | definition
formes-rejetees/forme-rejetee/libelle (+ optional /description) | Rejected label (and optional description) | md:hasRejectedForm, which points at an anonymous [RejectedForm] resource with hasRejectedLabel and hasRejectedLabelDefinition, with textual values respectively set to the content of the libelle and description elements (using rdf:parseType=Resource) | altLabel xml:lang="fr" + definition
notes | Complementary definition | md:hasNote | note
codes-dewey/code-dewey | Thematic classification (given by DDC code which is attached to a classification element) | md:hasThematicClassification | broader
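The SKOS column of the mapping above can be sketched as a small conversion routine. This is a minimal illustration using Python's standard `xml.etree.ElementTree`. The root element name `descripteur`, the sample values and the output dictionary layout are assumptions made for the example; only the tag paths and the target SKOS properties come from the table. The optional description of a rejected form is ignored here for brevity.

```python
import xml.etree.ElementTree as ET

# Invented record; the root element name is an assumption.
SAMPLE = """
<descripteur>
  <libelle>exemple</libelle>
  <description>definition text</description>
  <formes-rejetees>
    <forme-rejetee><libelle>variante</libelle></forme-rejetee>
  </formes-rejetees>
  <notes>complementary note</notes>
  <codes-dewey><code-dewey>700</code-dewey></codes-dewey>
</descripteur>
"""


def to_skos(record: ET.Element) -> dict:
    """Map one exported record to SKOS properties, following the table."""
    rejected = record.findall("formes-rejetees/forme-rejetee/libelle")
    codes = record.findall("codes-dewey/code-dewey")
    return {
        "skos:inScheme": "#mandragoreScheme",
        "skos:prefLabel": [(record.findtext("libelle"), "fr")],
        "skos:definition": record.findtext("description"),
        "skos:altLabel": [(e.text, "fr") for e in rejected],
        "skos:note": record.findtext("notes"),
        # The thematic classification code is interpreted as a broader link.
        "skos:broader": [e.text for e in codes],
    }


concept = to_skos(ET.fromstring(SAMPLE.strip()))
for prop, value in concept.items():
    print(prop, value)
```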

Transformation into SKOS
Example: grégoire 11 pierre roger de beaufort cardinal diacre de sainte-marie-nouvelle, pape [English: Gregory XI, Pierre Roger de Beaufort, cardinal deacon of Santa Maria Nuova, pope]
Conversion of thesauri main features
- Preferred and alternative labels
- Semantic relationships (BT, RT)
- Notes (scope notes, definitions)

Collection Formalization Problems
- Interpreting and representing vocabularies using formal standards is hindered by expressivity variation
  - Complex models
  - Non-standard features
  - Fuzzy or weak structures
- Some information is lost when converting to SKOS
  - Qualifiers
  - Compound concepts
  - Relations between terms (not only concepts)
- We kept complete models
  - Ad hoc ontologies, cf. Iconclass

Collection Formalization Problems System-specific conversions were done Depending on application environment Standard RDFS expressivity and implemented tools Depending on the mapping tools, which might make different hypotheses on the nature of knowledge to align OWL classes vs. nodes in trees

Summary Global presentation of project and past work Cases the project is currently focusing on Scientific problems Solving representation heterogeneity Solving conceptual heterogeneity Evaluation STITCH between scientific research and CH domain

Steps Provide mappers with vocabulary data Proceed to evaluation/selection of their results Put the alignment in the repository

Lessons learned: Collection Integration
We have ontology mappers, not thesaurus mappers
- Input: pre-processing to pure RDFS/OWL ontologies
- Mapping process
  - Using resources that may be absent from CH vocabularies (rich formal/structural information)
  - Not (properly) using all information found in CH vocabularies (e.g. rich lexical information)
- Output: needs re-interpretation of mapping relations

Alignment: lessons learned from previous experiments Lexical approaches Should use everything in a thesaurus Structural approaches Useless (even harmful) in a context where hierarchical information is weak Using background knowledge Needs to find proper resource/dictionary (Wordnet) Statistical approaches Needs dually classified data

Alignment: here Lexical approaches Should use everything in a thesaurus Structural approaches Useless (even harmful) in a context where hierarchical information is weak Using background knowledge Needs to find proper resource/dictionary (Wordnet) Statistical approaches Needs dually classified data

Lexical alignment: Manuscripts case
[Monolingual case, since IC comes in French]
- Basic label comparison
  - Preferred labels
  - Alternative labels
- Going beyond labels
  - Labels and definitions
  - IC keywords and Mandragore labels
- Lexical information as bags-of-words
  - Words in IC (glossy labels) found in Mandragore labels, and vice versa
  - Words in Mandragore definitions found in IC labels
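The bag-of-words comparison can be illustrated with a simple containment score: the share of the words of one text that also occur in the other. This is a sketch under assumptions; the slides do not say which measure was used, and the function names and sample strings are invented.

```python
import re


def bag(text: str) -> set:
    """Lower-cased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def containment(source_text: str, target_text: str) -> float:
    """Share of the words of `source_text` that occur in `target_text`."""
    source, target = bag(source_text), bag(target_text)
    if not source:
        return 0.0
    return len(source & target) / len(source)


# Invented labels for the example.
short_label = "saint georges"
long_label = "saint georges terrassant le dragon"
print(containment(short_label, long_label))   # all words of the short label are found
print(containment(long_label, short_label))   # only part of the long label is covered
```

The score is asymmetric, which is one possible input when deciding what kind of link (equivalent, broader, narrower) a lexical match should produce.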

Lexical alignment: Manuscripts case
- From 430 to found matches
  - Some redundant
  - Some comparisons bringing quite some noise
  - Interestingly, we have a gradation
- Interesting coverage for the application
  - Mandragore terms accessible from an IC term
  - IC terms accessible from a MG term
  - Fuzziness of original hierarchies allows for (associative) noise
- Problems
  - Better NLP treatments (e.g. lemmatization)
  - Choice of proper alignment link depending on the features compared

Lexical alignment: Manuscripts case [Figure: mapping links labelled "broader" and "equivalent"]

Demo Corn

Statistical approach: KB case

Comparing documents indexed with BK concepts and documents indexed with GTT concepts Overlap measure
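A minimal sketch of such a comparison, assuming the overlap measure is the Jaccard coefficient between the sets of documents indexed with each concept (the slides do not name the measure actually used). Concept identifiers, document numbers and the `threshold` parameter are invented for the example.

```python
def jaccard(docs_a: set, docs_b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def overlap_scores(index_a: dict, index_b: dict, threshold: float = 0.0) -> list:
    """Score every concept pair and keep those above the threshold.

    index_a, index_b: concept -> set of documents indexed with that concept.
    Returns (concept_a, concept_b, score) tuples, best score first.
    """
    scored = []
    for concept_a, docs_a in index_a.items():
        for concept_b, docs_b in index_b.items():
            score = jaccard(docs_a, docs_b)
            if score > threshold:
                scored.append((concept_a, concept_b, score))
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Toy dually indexed collection: documents 1-7, two vocabularies.
index_v1 = {"v1:history": {1, 2, 3, 4}, "v1:art": {5, 6}}
index_v2 = {"v2:geschiedenis": {2, 3, 4, 7}, "v2:kunst": {5}}
for row in overlap_scores(index_v1, index_v2, threshold=0.55):
    print(row)
```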

Statistical approach: problems
- Finding a threshold to filter results
- Taking into account thesaurus use
  - Levels of indexing are different
- Statistical significance is not guaranteed
  - Overlap measure is less significant when concepts are used only a few times

Using background knowledge Interesting research BK brings additional structural semantics to concepts BK brings more lexical knowledge (synonyms) in the loop Problem: needs to find proper resource/dictionary Domain-specific vocabularies Language-specific vocabularies

First experiments on anchoring GTT to WordNet [with Véronique Malaisé, CHOICE]
- Setting
  - Using an online Dutch-English dictionary
  - Comparing translations (and definitions) found with WordNet content
  - Nice feature: many GTT terms are already manually translated
- Results
  - Poor recall: 9% of concepts for which there was a manual translation were anchored to WN
- Problems
  - Encoding
  - Domain-specificity
  - Complex terms
- Results are better with another vocabulary
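The anchoring setting can be pictured with a toy version: translate each term with a bilingual dictionary, keep the translations that exist as entries in the background resource, and report the share of terms that could be anchored. Everything below (the dictionary, the set of lemmas, the function names) is an invented stand-in; it does not call WordNet or any online dictionary.

```python
def anchor(term: str, dictionary: dict, lemmas: set) -> list:
    """Translations of `term` that are present in the background resource."""
    return [t for t in dictionary.get(term, []) if t.lower() in lemmas]


def recall(terms: list, dictionary: dict, lemmas: set) -> float:
    """Share of terms with at least one anchored translation."""
    if not terms:
        return 0.0
    anchored = sum(1 for term in terms if anchor(term, dictionary, lemmas))
    return anchored / len(terms)


# Toy stand-ins for the Dutch-English dictionary and the background resource.
toy_dictionary = {
    "hond": ["dog"],
    "kunstmatige intelligentie": ["artificial intelligence"],
    "databanken": ["databanks"],
}
toy_lemmas = {"dog", "artificial intelligence", "database"}
terms = list(toy_dictionary)
print(recall(terms, toy_dictionary, toy_lemmas))
```

The third term illustrates a typical miss: the translation exists, but not in the form the background resource uses.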

Other Dutch vocabularies hanging around

Summary Global presentation of project and past work Cases the project is currently focusing on Scientific problems Solving representation heterogeneity Solving conceptual heterogeneity Evaluation STITCH between scientific research and CH domain

A transversal problem: evaluation Assessing quality of mapping In a specific context Taking into account Use of thesauri (indexing levels) Integration aim (hierarchical browsing) Designing evaluation tools Methods to evaluate samples to guide mapping process at a low cost And yet have statistical relevance
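Evaluating a sample at low cost while keeping some statistical grounding could look like the sketch below: draw a random sample of mappings, have each one judged, and report the sample precision with a confidence interval. The Wilson score interval is a choice made for this example, not a method named in the slides; the `judge` callback, the sample size and the toy data are likewise assumptions.

```python
import math
import random


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a proportion (z=1.96 for about 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half, centre + half)


def estimate_precision(mappings: list, judge, sample_size: int, seed: int = 0):
    """Judge a random sample of mappings; return (correct, n, interval)."""
    rng = random.Random(seed)
    sample = rng.sample(mappings, min(sample_size, len(mappings)))
    correct = sum(1 for mapping in sample if judge(mapping))
    return correct, len(sample), wilson_interval(correct, len(sample))


# Toy data: 500 mappings, each carrying a pre-recorded judgement.
toy_mappings = [(i, i % 5 != 0) for i in range(500)]
correct, n, (low, high) = estimate_precision(
    toy_mappings, judge=lambda m: m[1], sample_size=100)
print(f"{correct}/{n} correct, interval {low:.2f}-{high:.2f}")
```

In practice `judge` would be a human assessment in the context of a given application (e.g. hierarchical browsing), which is where most of the cost lies.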

Summary Global presentation of project and past work Cases the project is currently focusing on Scientific problems STITCH between scientific research and CH domain Pushing research results to the CH world Concrete collaborations Bringing domain problems to the research community

Pushing Research Results to the CH World Paper publications European Conference on Digital Libraries 2006 Informatie Professional 2006 Dissemination papers on SKOS and OWL Talks, demonstrations done and planned Digital Erfgoed Conference BNF KB RNA demo middag UDC seminar Lecture for Masters on Book & Digital Media (Leiden) SKOS CATCH day

Summary Global presentation of project and past work Cases the project is currently focusing on Scientific problems STITCH between scientific research and CH domain Pushing research results to the CH world Concrete collaborations Bringing domain problems to the research community

Concrete Collaborations Collaborations with CH institutes Digitaal Erfgoed Nederland Creation of a thesaurus inventory questionnaire KB experts Illuminated Manuscripts Operational departments BNF Illuminated Manuscripts Rijksbureau Kunsthistorische Documentatie Iconclass [Illuminare (Leuven)]

Pushing Research Results to the CH World CH-oriented research projects The European Library Research proposal on multilingual thesaurus alignment CATCH Rijksmuseum collections (CHIP) Anchoring GTT to Wordnet (CHOICE) Metadata Recommendation (CHOICE, MITCH) Iconclass and GTAA Service (CHOICE) Mapping and vocabulary repository (CHOICE) RNA

Summary Global presentation of project and past work Cases the project is currently focusing on Scientific problems STITCH between scientific research and CH domain Pushing research results to the CH world Concrete collaborations Bringing domain problems to the research community

Confrontation of existing SW tools to real CH data VU talks and collaborations External collaboration (Trento) Papers BNAIC SWI Prolog and the Web (Theories and Practices for logic programming) Participation in W3C Semantic Web Deployment working group Editor of SKOS use cases and requirements document Contribution of Manuscript and Iconclass use cases

Free discussion