Thesaurus Mapping. Martin Doerr, ICS-FORTH (Foundation for Research and Technology - Hellas, Institute of Computer Science). Bath, UK, January 11, 2000.

Presentation transcript:

Thesaurus Mapping
Martin Doerr
ICS-FORTH: Foundation for Research and Technology - Hellas, Institute of Computer Science
Centre for Cultural Informatics and Documentation Systems
Bath, UK, January 11, 2000

The Problem
- Logical aspects
  - Semantics of involved entities
  - Notions of translation
  - Objectives and logics of mapping
- Production of mappings
  - Human
  - Language engineering, cluster analysis
- Architecture
  - Mapping management
  - Mapping service
  - Integration in IT environment

Why do we need mapping?
- Thesauri for information retrieval depend on:
  - Viewpoint (e.g. functional, morphological, social, special database fields, etc.)
  - Language or social group (experts, common people, etc.)
  - Size and distribution of target material (effective partitioning)
- Therefore:
  - Concepts differ
  - Use of concepts differs
  - Semantic embedding differs
- Even if we agree on the same world
  - Research topic: formalisation of views and context

Semantics of entities
- Concepts are defined by agreement, e.g. orange (colour)
- Concepts identify sets of real-world objects
- Concepts are identified by
  - scope notes, literature references, examples, images
- Concepts should not be changed
  - they should be created or abandoned
  - they should be understood, accepted or rejected
- A descriptor is a concept identifier

Semantics of entities
- Links should express opinions and differences
  - about set relations between concepts: subsumption, disjointness, etc.
  - about derived concepts
  - about term usage
  - opinions may be human or computational!
- Terms (noun phrases) should be used
  - by social groups to refer to (multiple) concepts
  - without direct linguistic meaning
  - one term is selected as concept identifier

Semantics of entities
- concept - concept relations:
  - set semantics: BT, between thesauri/version (for query expansion, users)
  - associative: RTs, BTP, etc. (for user guidance)
- concept - term relations:
  - authoritative: preferred, used for (for cataloguers, users)
  - statistical, possible synonyms (for information retrieval)
- term - term relations:
  - dictionary entries (limited precision, within LE tools)

What is a Multilingual Thesaurus?
- A translated thesaurus: for comprehension
  - Established concepts and terms from one user group
  - Optimally interpreted in words of one or more other languages
  - Translations are not established terms
- Mapped thesauri (ISO 5964): for transition
  - Independent thesauri, each one from another user group
  - Established concepts and terms
  - Links declare "overlap" between concepts
- Interlingua: for communication and knowledge sharing
  - Compromise to share concepts between many user groups
  - Optimally interpreted in words of another language

Functionality of Mapping
- Transparent query transformation (Z39.50!)
  - Replace a Boolean term combination from thesaurus A with the optimal term combination from thesaurus B to retrieve equivalent results
  - Guaranteed transition needed (possibly to higher concepts)
  - Need controlled loss of precision or recall (research!)
  - Combinatorial explosion: need cascading Thes A => Thes B => Thes C
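The cascading idea (Thes A => Thes B => Thes C) can be sketched as a composition of mapping tables. This is only an illustration under stated assumptions: the tables, the descriptors in them and the function names `compose` and `transform_query` are invented for the example, and only two relation types ("exact" and "broader") are modelled.

```python
# Hypothetical mapping tables: source descriptor -> (target descriptor, relation).
# "exact" keeps the result set; "broader" keeps recall but loses precision.
A_TO_B = {"chapel": ("church", "broader"), "mill": ("mill", "exact")}
B_TO_C = {"church": ("eglise", "exact"), "mill": ("moulin", "exact")}


def compose(first, second):
    """Cascade two mapping tables (Thes A => Thes B => Thes C)."""
    result = {}
    for source, (middle, rel1) in first.items():
        if middle not in second:
            continue  # no guaranteed transition for this descriptor
        target, rel2 = second[middle]
        # The chain is exact only if every step in it is exact.
        relation = "exact" if rel1 == rel2 == "exact" else "broader"
        result[source] = (target, relation)
    return result


def transform_query(terms, table):
    """Translate an OR-query (a list of descriptors) term by term.

    Returns the translated terms and a flag telling the caller whether
    precision may have been lost along the way.
    """
    translated, lossy = [], False
    for term in terms:
        target, relation = table[term]
        translated.append(target)
        lossy = lossy or relation != "exact"
    return translated, lossy


a_to_c = compose(A_TO_B, B_TO_C)
print(transform_query(["chapel", "mill"], a_to_c))
```

With n thesauri, cascading needs on the order of n tables instead of one table per pair, at the price of accumulating loss along the chain; the `lossy` flag is a crude stand-in for the controlled loss of precision or recall that the slide marks as a research question.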

Logics of Mapping
Interthesaurus relations (ISO 5964), from a descriptor of Thes. A to a descriptor of Thes. B:
- partial equivalence
  - Better: broader equivalence, narrower equivalence
- exact equivalence
- inexact equivalence ("+/-"): good for FTR only
- single to multiple equivalence
  - Better: exact equivalence to a BOOLEAN combination of target terms: "AND" (intersection), "OR" (union), "NOT" (complement)
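One possible way to represent such links in software is a small record holding a relation type and a Boolean target expression. This is a sketch, not a format defined by ISO 5964 or by the talk: the class names, the tuple encoding of expressions and the example descriptors are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    EXACT = "exact equivalence"
    BROADER = "broader equivalence"
    NARROWER = "narrower equivalence"
    INEXACT = "inexact equivalence"


@dataclass(frozen=True)
class Link:
    source: str         # descriptor in thesaurus A
    relation: Relation  # how the source relates to the target expression
    target: object      # descriptor of thesaurus B, or a nested tuple such
                        # as ("AND", x, y), ("OR", x, y) or ("NOT", x)


def render(expr):
    """Render a target expression as a Boolean query string."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    if op == "NOT":
        return f"NOT {render(args[0])}"
    return "(" + f" {op} ".join(render(a) for a in args) + ")"


# Hypothetical link: one source descriptor, exactly equivalent to an
# AND-combination of two target descriptors (invented terms).
link = Link("watermill", Relation.EXACT, ("AND", "mill", "water-powered site"))
print(link.relation.value, "->", render(link.target))
```

Encoding a single-to-multiple equivalence as one exact link to a Boolean compound keeps the link readable in one direction without guessing how the individual targets are meant to combine.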

Translation and Mapping
[Diagram labels: English Heritage Thesaurus, Merimee Thesaurus, interthesaurus relations (AND); English Vocabulary, French Vocabulary, linguistic translation (+/-); Interlingua (+/-).]

Boolean OR-Combinations
[Diagram labels: A, B, C, BT, "B OR C" (Boolean compound), exact equivalence.]
The compound "B OR C":
- combines instances of B and C
- uses properties of either B or C
- is BT of B and C, and NT of their common broader terms

Boolean AND-Combinations
[Diagram labels: A, B, C, BT, "B AND C" (Boolean compound), exact equivalence.]
The compound "B AND C":
- uses instances of both B and C
- combines properties of B and C
- is NT of B and C, and BT of their common narrower terms
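Read with set semantics, the OR and AND compounds correspond to union and intersection of instance sets, and BT/NT to set inclusion. A minimal sketch with invented instance data (the sets `B`, `C`, `D` and the helper `is_broader` are assumptions made for illustration):

```python
# Hypothetical instance sets for two target concepts B and C (invented data).
B = {"obj1", "obj2", "obj3"}
C = {"obj3", "obj4"}

b_or_c = B | C    # OR-compound: union of the instances of B and C
b_and_c = B & C   # AND-compound: only instances of both B and C


def is_broader(x, y):
    """Under set semantics, x is a BT of y if x contains every instance of y."""
    return x >= y


# A hypothetical common broader term D of B and C.
D = B | C | {"obj5"}

print(is_broader(b_or_c, B), is_broader(D, b_or_c), is_broader(B, b_and_c))
```

The checks mirror the slide statements: "B OR C" is broader than B and C yet narrower than any common broader term such as D, while "B AND C" is narrower than both.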

Approximation by Inclusion
[Diagram labels: A, B, C, BT, broader equivalence, narrower equivalences.]
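When a source concept has no exact equivalent, substituting a broader or a narrower target trades precision against recall. The following sketch makes that trade-off concrete; the instance sets are invented, and the interpretation of the diagram (A approximated by a broader B or by narrower targets) is an assumption.

```python
def recall_precision(wanted, retrieved):
    """Return (recall, precision) of a retrieved set against the wanted set."""
    hits = len(wanted & retrieved)
    return hits / len(wanted), hits / len(retrieved)


# Hypothetical instance sets (invented data): source concept A has no exact
# equivalent; B is a broader target, C1 and C2 are narrower targets.
A = {1, 2, 3, 4}
B = {1, 2, 3, 4, 5, 6}
C1, C2 = {1, 2}, {3}

via_broader = recall_precision(A, B)         # full recall, reduced precision
via_narrower = recall_precision(A, C1 | C2)  # full precision, reduced recall

print(via_broader, via_narrower)
```

A mapping service could expose this choice to the caller, so that a query is broadened when recall matters and narrowed when precision matters.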

Avoid redundant linking!
[Diagram labels: A, B, BT, broader equivalence, narrower equivalences, exact equivalence.]

Problems of Mapping
- Consistency and reasoning (Description Logics!)
- Optimal substitution of combined query terms
- Protocol to propagate recall/precision control
- Inverse reading of one-to-many links
- Postcoordination: unclear semantics! E.g. "grinding & factories"; solution by DL?

Production of Mappings
- Human assessment needs (see Term-IT):
  - CSCW, workflow, decentralised management tools
  - Excellent comparative presentation of thesaurus contents
- Language engineering (see Term-IT):
  - Termhood recognition, automatic translation by parallel texts, filtering by occurrence in target indexing language
  - Excellent for preprocessing!
- Analysis of use:
  - Cluster analysis with doubly indexed entries
  - Libraries: problem to identify the same "work"!
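As a rough illustration of the "analysis of use" idea, candidate links can be proposed from records that were indexed with both thesauri. The sketch below uses a plain co-occurrence (Jaccard) score as a stand-in for cluster analysis; the records, the threshold and the function name `candidate_links` are all assumptions, and the output would only be a proposal for human assessment.

```python
from collections import Counter

# Hypothetical records, each indexed with descriptors from thesaurus A
# (first set) and thesaurus B (second set). Invented data.
records = [
    ({"mill"}, {"moulin"}),
    ({"mill"}, {"moulin", "usine"}),
    ({"factory"}, {"usine"}),
    ({"factory", "mill"}, {"usine", "moulin"}),
]


def candidate_links(records, min_score=0.6):
    """Score descriptor pairs by how often they index the same records."""
    pair_counts, a_counts, b_counts = Counter(), Counter(), Counter()
    for a_terms, b_terms in records:
        a_counts.update(a_terms)
        b_counts.update(b_terms)
        pair_counts.update((a, b) for a in a_terms for b in b_terms)
    scored = {}
    for (a, b), both in pair_counts.items():
        # Jaccard overlap of the record sets indexed by a and by b.
        score = both / (a_counts[a] + b_counts[b] - both)
        if score >= min_score:
            scored[(a, b)] = score
    return scored


print(sorted(candidate_links(records)))
```

The approach depends on recognising that two entries describe the same item, which is exactly the difficulty the slide notes for libraries.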

SIS - Thesaurus Management System: Co-operative linking
[Diagram labels: BT; Version 0, Version 1, Version 2; New Workspace; Group 1, Group 2; obsolete term; links of group 1; links of group 2.]

Users Environment

Three-level Architecture
[Diagram labels: Search Aid Tool; CMS; CMS Maintainer; National Authority Providers; Local TMS; End User; cascaded mapping service; concept proposal; thesaurus initialization; update term use.]

Architectural Considerations
- We propose to distinguish:
  - Collection Management Systems with local term management
  - National authority providers
  - Mapping service
- Mapping service:
  - Co-operative mapping production environment and system, for few languages (3?), domain specific?
  - Large-scale mapping tables detached from the production system, accessible as a replicated Web resource
- Integration:
  - Access engines connect to mapping resources on demand
  - Provision of suitable metadata for CMS capabilities

Conclusions
- Thesaurus mapping is feasible and the best means to coherently access multiple CMS with controlled vocabulary
- Thesaurus mapping is a major investment in human resources and IT environment
- Targeted research can much improve the currently feasible
  - quality of mapping
  - quality of service
  - production cost