Controlled Vocabularies in TELPlus
  Antoine ISAAC, Vrije Universiteit Amsterdam
  EDLProject Workshop, November 2007
Agenda
  TELPlus Context
  Improving subject access
    –3 sub-tasks
  Services for TEL
TELPlus Context
  Started October 2007
  Running 27 months
  Content WPs
    –OCRing previously digitised material
    –Improving the usability of TEL through OAI-PMH compliance
    –Improving Access
    –Integrating services with TEL portal
    –User personalisation services
    –Extending TEL to Bulgaria & Romania
WP3 – Improving Access
  Task 1: Indexing for usability
    –Review/test state-of-the-art semantic search engines
      On content of documents
  Task 2: Improving subject access
  Task 3: FRBR aggregation, search and browsing
    –Create/exploit FRBR metadata repositories
  Task 4: Focus on users
    –Focus groups on prototypes
WP 3 Task 2 – Improving Subject Access
  Improving subject access via semantic alignment between subjects
  Search through collections
    –Using metadata
    –In a controlled setting
  Paving the way for enhanced usages
    –Advanced treatments mentioned in TELplus need conceptual structures and links between these structures
      E.g. clustering
WP 3 Task 2 – Improving Subject Access
  Improving subject access via semantic alignment between subjects
  Reference: MACS project
    –Manually-built semantic equivalences between Rameau, SWD & LCSH headings
MACS: Querying Collections
MACS: Query Reformulation Options
WP 3 Task 2 – Improving Subject Access
  Improving subject access via semantic alignment between subjects
  Reference: MACS project
    –Manual equivalences between Rameau, SWD, LCSH headings
  Here: an experiment on deploying automatic alignment techniques
    –Determining possible strategies
    –Assessing feasibility and usefulness
    –MACS context
WP3.2 Sub-tasks
  Converting the subjects to standard representation language
    –Semantic web format (SKOS)
  Aligning the vocabularies
    –Semantic correspondences between subjects
  Deploying the alignment knowledge obtained into TEL framework
    –E.g. using links to reformulate queries from one subject list to the other
Converting subjects to standard representation language
  Goal: solving syntactic heterogeneity between vocabularies
  Enabling the use of standard tools
    –E.g. for query (re)formulation
  Paving the way for dealing with semantic heterogeneity
    –Definitions of concepts expressed according to a common model
Converting subjects to standard representation language
  Approach: Semantic Web and SKOS
  Semantic Web
    –Knowledge objects as web resources (URIs)
    –Description by linking resources (RDF)
    –Description using shared formal vocabularies (ontologies)
  SKOS
    –A standard Semantic Web model (ontology)
    –For knowledge organization systems (thesauri, subject heading lists…)
SKOS: Example
  [Figure: RDF graph of a SKOS concept. Labels recovered: skos:Concept, rdf:type, skos:broader, skos:prefLabel "the Virgin", skos:prefLabel "la Vierge", skos:inScheme, skos:ConceptScheme]
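The SKOS example can be written down as RDF triples. The sketch below is illustrative only: the `http://example.org/vocab/` namespace, the local names `virgin`, `broaderConcept` and `scheme`, and the language tags are assumptions, not identifiers from the slide. It serialises the triples in an N-Triples-like form using only the Python standard library.

```python
# Namespaces: SKOS and RDF are the real W3C namespaces; EX is hypothetical.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/vocab/"


def uri(namespace, local_name):
    """Return a URI reference in N-Triples syntax."""
    return f"<{namespace}{local_name}>"


def literal(text, lang):
    """Return a language-tagged literal in N-Triples syntax."""
    return f'"{text}"@{lang}'


# One concept with two preferred labels, a broader concept and a scheme.
# The broader concept and the scheme are placeholders (assumed names).
triples = [
    (uri(EX, "scheme"), uri(RDF, "type"), uri(SKOS, "ConceptScheme")),
    (uri(EX, "virgin"), uri(RDF, "type"), uri(SKOS, "Concept")),
    (uri(EX, "virgin"), uri(SKOS, "prefLabel"), literal("the Virgin", "en")),
    (uri(EX, "virgin"), uri(SKOS, "prefLabel"), literal("la Vierge", "fr")),
    (uri(EX, "virgin"), uri(SKOS, "inScheme"), uri(EX, "scheme")),
    (uri(EX, "virgin"), uri(SKOS, "broader"), uri(EX, "broaderConcept")),
]


def to_ntriples(rows):
    """Serialise (subject, predicate, object) rows, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


print(to_ntriples(triples))
```

A dedicated RDF library would normally handle parsing and serialisation; plain strings are used here only to keep the sketch self-contained.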
Converting subjects to standard representation language - Process
  Getting processable versions from owners
    –E.g. XML
  Analyzing the models
  Converting to SKOS
Vocabulary Alignment
  Specifying required alignment format (links)
    –Type of mapping links: equivalence, broader
    –Cardinality: one-to-one, one-to-many
    –Taking application context (TEL) into account
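One possible way to make the alignment format concrete is a small record type for mapping links plus a cardinality check. This is a sketch under assumptions: the `Mapping` class, the relation names as strings, and the `shl1:`/`shl2:` identifiers are all hypothetical, and cardinality is judged from the source concept's side only.

```python
from collections import defaultdict
from dataclasses import dataclass

# The two link types named on the slide.
ALLOWED_RELATIONS = {"equivalence", "broader"}


@dataclass(frozen=True)
class Mapping:
    """A single link from a concept in one vocabulary to a concept in another."""
    source: str
    target: str
    relation: str


def cardinality_by_source(mappings):
    """Classify each source concept by how many distinct targets it maps to."""
    targets = defaultdict(set)
    for m in mappings:
        if m.relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unsupported relation: {m.relation}")
        targets[m.source].add(m.target)
    return {
        source: "one-to-one" if len(found) == 1 else "one-to-many"
        for source, found in targets.items()
    }


# Hypothetical identifiers, for illustration only.
links = [
    Mapping("shl1:A", "shl2:X", "equivalence"),
    Mapping("shl1:B", "shl2:Y", "equivalence"),
    Mapping("shl1:B", "shl2:Z", "broader"),
]
print(cardinality_by_source(links))
```

Whether one-to-many links are acceptable depends on the application context, for example whether a query may be reformulated into several target subjects.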
Vocabulary Alignment
  Specifying required alignment format (links)
  Selecting (& running) alignment techniques/tools
    –Inspired by semantic web approaches
Vocabulary Alignment Techniques
  Similar to ontology alignment problem
  Existing approaches for (semi-)automatic ontology alignment
    –Using techniques from linguistics, computer science, statistics
  Problem: performance does not allow 100% automatic alignment
  Problem: multilingual case
    –Some techniques cannot be used
Potential Technique: Using Background Knowledge
  Using a shared conceptual reference to find links
  [Figure: SHL 1 and SHL 2 connected through background knowledge. Labels recovered: Calendar, Publication]
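A minimal sketch of the background-knowledge idea, assuming the shared reference is a multilingual label index: each subject heading is anchored to a background entry by its label, and two headings are linked when they reach the same entry. The `bk:` identifiers, the French labels and the `shl1:`/`shl2:` identifiers are invented for illustration; real approaches may also follow relations inside the background resource rather than requiring an identical anchor.

```python
def anchor(labels_by_concept, background_index):
    """Map each concept to a background entry via its lower-cased label."""
    anchored = {}
    for concept, label in labels_by_concept.items():
        entry = background_index.get(label.lower())
        if entry is not None:
            anchored[concept] = entry
    return anchored


def align_via_background(shl1, shl2, background_index):
    """Link concepts of two vocabularies that share a background entry."""
    anchors1 = anchor(shl1, background_index)
    anchors2 = anchor(shl2, background_index)
    concepts_by_entry = {}
    for concept, entry in anchors2.items():
        concepts_by_entry.setdefault(entry, []).append(concept)
    return sorted(
        (c1, c2)
        for c1, entry in anchors1.items()
        for c2 in concepts_by_entry.get(entry, [])
    )


# Hypothetical background resource with labels in two languages.
background = {
    "calendar": "bk:calendar",
    "calendrier": "bk:calendar",
    "publication": "bk:publication",
}
shl1 = {"shl1:1": "Calendar", "shl1:2": "Publication"}
shl2 = {"shl2:a": "Calendrier", "shl2:b": "Publication", "shl2:c": "Almanach"}

print(align_via_background(shl1, shl2, background))
```

Headings with no entry in the background resource, such as the made-up `shl2:c` here, simply remain unaligned.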
Potential Technique: Statistical Alignment
  Object information (book indexing)
  [Figure: SHL 1 and SHL 2 connected through dually-indexed books. Labels recovered: Dutch Literature, Dutch]
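The statistical idea can be sketched as an overlap measure on dually-indexed books: two subjects are candidate matches when they tend to index the same books. The Jaccard coefficient and the 0.5 threshold below are assumptions chosen for illustration, not the measure used in the project; the book identifiers and the "History"/"Geschichte" pair are made up.

```python
def jaccard(books_a, books_b):
    """Share of books indexed by both subjects among books indexed by either."""
    union = books_a | books_b
    return len(books_a & books_b) / len(union) if union else 0.0


def statistical_alignment(index1, index2, threshold=0.5):
    """Return (subject1, subject2, score) candidates, best score first.

    Both indexes are assumed to cover only the dually-indexed books.
    """
    candidates = []
    for subject1, books1 in index1.items():
        for subject2, books2 in index2.items():
            score = jaccard(books1, books2)
            if score >= threshold:
                candidates.append((subject1, subject2, score))
    return sorted(candidates, key=lambda item: -item[2])


# Hypothetical indexing data: subject -> set of book identifiers.
index_shl1 = {"Dutch Literature": {"b1", "b2", "b3"}, "History": {"b4"}}
index_shl2 = {"Dutch": {"b1", "b2", "b3", "b5"}, "Geschichte": {"b4", "b6"}}

for candidate in statistical_alignment(index_shl1, index_shl2):
    print(candidate)
```

Because this relies only on co-occurrence, it does not depend on the language of the labels, which is relevant to the multilingual case.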
Vocabulary Alignment
  Specifying required alignment format (links)
  Selection (& running) of tool/method
  Evaluation (& cleaning)
    –Considering application
Evaluation of Alignments
  MACS has produced mappings!
    –Possible gold standard
  But: has MACS produced all mappings?
    –Which proportion of the SHLs is covered?
    –Taking into account all indexing strings?
  Are MACS mappings the only interesting ones?
    –Serendipity mappings
      Concepts that are not equivalent but could bring useful results when added to queries
    –Compensating for indexing variability
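If the manual mappings are used as a gold standard, a first evaluation could compute precision and recall while keeping unmatched candidates aside for review, since they may be serendipity mappings rather than errors. The sketch below assumes mappings are plain (source, target) pairs; the identifiers are hypothetical.

```python
def evaluate(candidate_mappings, gold_mappings):
    """Compare automatic mappings with a manually built reference set."""
    candidates = set(candidate_mappings)
    gold = set(gold_mappings)
    correct = candidates & gold
    return {
        "precision": len(correct) / len(candidates) if candidates else 0.0,
        "recall": len(correct) / len(gold) if gold else 0.0,
        # Not in the gold standard: wrong, or possibly useful but unrecorded.
        "unjudged": sorted(candidates - gold),
    }


# Hypothetical (source, target) pairs.
gold = {("a", "x"), ("b", "y"), ("c", "z")}
automatic = {("a", "x"), ("b", "y"), ("d", "w")}

print(evaluate(automatic, gold))
```

Since the gold standard may not cover all headings, low precision on this measure alone would not show that the unjudged mappings are useless.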
Evaluation of Alignments
  Several scenarios for using and evaluating alignments
    –Concept-based search
    –Re-indexing
    –Integration of one SHL into the other
    –SHL Merging
    –Free-text search
    –Navigation
Evaluation of Alignments
  Several scenarios for using and evaluating alignments
    –Concept-based search
      Retrieving books indexed by SHL1 using SHL2 concepts
    –Re-indexing
    –Integration of one SHL into the other
    –SHL Merging
    –Free-text search
      Matching user search terms to both SHL1 and SHL2 concepts
    –Navigation
      Browsing several collections using one SHL structure
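The concept-based search scenario can be sketched as query reformulation: a query expressed with an SHL2 concept is expanded with its mapped SHL1 concepts before the SHL1-indexed collection is searched. The mapping table, concept identifiers and book identifiers below are hypothetical.

```python
def reformulate(concept, mappings):
    """Expand a query concept with the concepts it is mapped to."""
    return {concept} | mappings.get(concept, set())


def search(concept, mappings, collection_index):
    """Retrieve books indexed with the query concept or any mapped concept."""
    hits = set()
    for expanded in reformulate(concept, mappings):
        hits |= collection_index.get(expanded, set())
    return sorted(hits)


# Hypothetical alignment: SHL2 concept -> set of SHL1 concepts.
mappings = {"shl2:Dutch": {"shl1:Dutch Literature"}}

# Hypothetical collection indexed with SHL1 only.
collection_index = {
    "shl1:Dutch Literature": {"book-1", "book-2"},
    "shl1:History": {"book-3"},
}

print(search("shl2:Dutch", mappings, collection_index))
```

An assisted variant would show the output of `reformulate` to the user as candidates instead of applying it automatically.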
Evaluation of Alignments
  Several settings for a single scenario
    –Fully automatic reformulation vs assisted reformulation (candidates)
  Different evaluation measures
    –Good mappings vs acceptable ones
    –Number of candidates for reformulation
    –Semantic closeness to original query
Vocabulary Alignment
  Specifying required alignment format (links)
  Selection (& running) of tool/method
  Evaluation (& cleaning)
  Assessment of the approach
    –Efforts required, quality, extensibility
Deploying the alignment knowledge obtained into TEL framework
  Observing integration of MACS data into TEL
    –Conceptual input for alignment requirements
  Integration of the obtained alignment in TEL
  Assessment of the alignment integration
    –Technical aspects, usage aspects
Reminder
  Alignment is a difficult problem
  Application-specific alignment pretty much unexplored in Semantic Web research
  More a feasibility study than a complete solution to the problem
  Practical goal: investigate how automatic techniques could help MACS-like initiatives
  Manual mapping is labour-intensive
WP4 – Integrating services with the European Library portal
  Theo van Veen (KB)
  Tasks:
  Identifying services that are going to give the user the greatest return
  Creating new services
  Integrating services within TEL
  …
WP4 – Some Services Mentioned
  Preliminary inventory: no official commitment!
  Services based on controlled vocabularies:
  Thesaurus and name authority service
    –Providing terms linked to query terms
  Semantic enrichment service
    –Users can annotate search results with terms
  Distance between terms and related terms
WP4 – Some Services Mentioned
  Preliminary inventory: no official commitment!
  Services based on controlled vocabularies:
  Thesaurus and name authority service
  Semantic enrichment service
  Distance between terms and related terms
  Adding more value from controlled vocabularies and alignments between them
Thanks!