Crosslingual Ontology-Based Document Retrieval (Search) in an eLearning Environment
RANLP, Borovets, 2007
Eelco Mossel, University of Hamburg

Framework
EU-Project LT4eL: Language Technology for eLearning ( )
– Goal: use Language Technology to improve the effectiveness of Learning Management Systems
– Multilingual setting: 8 languages, 12 European partner universities/institutes
– Crosslingual search: joint work with
  – Cristina Vertan, Stefanie Reimers (University of Hamburg)
  – Kiril Simov and his team (Bulgarian Academy of Sciences, Sofia)
  – Alex Killing (ETH Zürich (Eidgenössische Technische Hochschule))

Overview
– Goals of semantic search
– Resources for the search function
– Functionality and architecture
– Further work

Crosslingual semantic search
Goals of the approach:
1. Improved retrieval of documents
   – Find documents that would not be found by simple text search (where the exact search word must occur in the text)
   – Example: a search for “screen” retrieves a document that contains “monitor” but not “screen”
2. Multilinguality
   – One implementation for all languages in the project
3. Crosslinguality
   – Find documents in languages other than the search/interface language
   – No need to translate the search query
   – Search is possible with only passive knowledge of the foreign language

Overview of resources
– A multilingual document collection
– An ontology, including a domain ontology for the domain of the documents
– Concept lexicalisations in different languages
– Annotation of concepts in the documents

Overview of resources (graphical)
[Figure: lexicons (term → concept) and learning objects (LOs) for BG, CS, DE, EN, MT, NL, PL, PT and RO, together with the ontology]

Search procedure
[Figure components: search terms (multiple languages); lexicons, which contain term-concept mappings; search concepts; ontology, which contains the concepts; visualisation, where the user selects concepts; document database; retrieved documents]

Search with ILIAS

Internal components
The search functionality comprises:
1. Find terms in the lexicons that reflect the search query.
2. Find the corresponding concepts for the derived terms.
3. Find relevant documents for the concepts.
4. Create a ranking for the set of found documents.
5. Create an ontology fragment containing the information needed to present the concept neighbourhood.
6. Find “shared concepts”.
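
Steps 1 to 4 above can be sketched as a small pipeline (steps 5 and 6, the ontology fragment and the shared concepts, are left out). This is a minimal sketch under stated assumptions, not the LT4eL implementation: the lexicon, the concept names such as `Monitor` and `TextEditor`, the annotation counts and the ranking key are hypothetical toy data, and exact lexicon matching stands in for the query normalisation discussed for step 1.

```python
# Hypothetical toy resources; all entries and concept names are illustrative.
LEXICON = {  # (language, term) -> concepts the term lexicalises
    ("en", "text editor"): {"TextEditor"},
    ("en", "screen"): {"Monitor"},
    ("de", "bildschirm"): {"Monitor"},
}
ANNOTATIONS = {  # document id -> (language, {concept: annotation count})
    "d1": ("en", {"Monitor": 3, "TextEditor": 1}),
    "d2": ("de", {"Monitor": 2}),
    "d3": ("en", {"TextEditor": 4}),
}

def find_terms(query, lang):
    """Step 1: lexicon terms that reflect the query (exact match only here)."""
    key = (lang, query.strip().lower())
    return [key] if key in LEXICON else []

def find_concepts(terms):
    """Step 2: concepts lexicalised by the derived terms."""
    return set().union(*(LEXICON[t] for t in terms))

def find_documents(concepts):
    """Step 3: documents annotated with at least one search concept."""
    return [d for d, (_, ann) in ANNOTATIONS.items() if concepts & set(ann)]

def rank(docs, concepts):
    """Step 4: more distinct search concepts first, then more annotations."""
    def key(d):
        ann = ANNOTATIONS[d][1]
        return (len(concepts & set(ann)), sum(ann.get(c, 0) for c in concepts))
    return sorted(docs, key=key, reverse=True)

def search(query, lang):
    concepts = find_concepts(find_terms(query, lang))
    return rank(find_documents(concepts), concepts)

print(search("Screen", "en"))  # ['d1', 'd2']
```

Because the English and German lexicon entries point to the same concept, the English query also returns the German document `d2`, without translating the query.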

Architecture
[Figure components: LMS / ILIAS / other system using the search functionality; Crosslingual Search; Lexicon Lookup Component; Ontology Management System; Ontology Search Engine; Lexicon; Ontology; Lucene; Database]

1: Query → Terms
Why start with a free-text query?
– Users want results fast (as in Google)
– Compete with full-text search and keyword search
– Find a starting point for ontology browsing
Query → lexicon: adopted/implemented strategies:
– Case- and diacritic-insensitive matching
– Create combinations for multiword terms
  Example: Text Editor → text-editor, texteditor, text editor
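
The two lookup strategies above can be illustrated with the Python standard library. A minimal sketch, assuming that diacritics are removed via Unicode NFKD decomposition and that hyphens and whitespace are the only word separators; the helper names and the Portuguese example terms are hypothetical.

```python
import re
import unicodedata

def normalise(text):
    """Case- and diacritic-insensitive form, e.g. 'Página' -> 'pagina'."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def multiword_variants(query):
    """Hyphenated, joined and space-separated spellings of a multiword query."""
    words = [w for w in re.split(r"[\s-]+", normalise(query)) if w]
    return {"-".join(words), "".join(words), " ".join(words)}

def lookup(query, lexicon_terms):
    """Lexicon terms that match one of the query variants.
    Simplification: terms with the same normalised form collapse to one entry."""
    index = {normalise(term): term for term in lexicon_terms}
    return sorted(index[v] for v in multiword_variants(query) if v in index)

print(sorted(multiword_variants("Text Editor")))
# ['text editor', 'text-editor', 'texteditor']
print(lookup("pagina web", ["página web", "navegador"]))
# ['página web']
```

Normalising both the lexicon and the query to the same form keeps the lexicon itself unchanged, while the variant set covers the three spellings listed for “Text Editor”.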

1: Query → Terms (continued)
Other ideas to improve recognition of the query:
– Lemmatisation of search terms
– Expansion of the lexicon with word forms
– Match substrings
– Match similar strings
– Insertion of function words, e.g. Portuguese: “provedor acesso” → “provedor de acesso”
– Dynamic list of available terms that contain the input so far (involves a change of the GUI)
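
Two of these ideas, function-word insertion and similar-string matching, could be combined as below. This is a hypothetical sketch: the list of Portuguese function words is an assumption, and `difflib.get_close_matches` with a 0.8 cutoff is just one possible similarity measure, not the one used in the project.

```python
import difflib
from itertools import product

# Hypothetical list of Portuguese function words to try between query words.
PT_FUNCTION_WORDS = ["de", "do", "da", "dos", "das"]

def with_function_words(query, function_words):
    """All variants with at most one function word inserted into each gap,
    e.g. 'provedor acesso' -> 'provedor de acesso', 'provedor do acesso', ..."""
    words = query.split()
    if not words:
        return set()
    variants = set()
    for fillers in product([None, *function_words], repeat=len(words) - 1):
        parts = [words[0]]
        for filler, word in zip(fillers, words[1:]):
            if filler:
                parts.append(filler)
            parts.append(word)
        variants.add(" ".join(parts))
    return variants

def match_query(query, lexicon_terms, function_words=PT_FUNCTION_WORDS):
    """Try function-word insertion first; if nothing matches, fall back to
    similar-string matching with difflib (similarity ratio >= 0.8)."""
    exact = sorted(with_function_words(query, function_words) & set(lexicon_terms))
    if exact:
        return exact
    return difflib.get_close_matches(query, lexicon_terms, n=3, cutoff=0.8)

lexicon = ["provedor de acesso", "servidor"]
print(match_query("provedor acesso", lexicon))    # ['provedor de acesso']
print(match_query("provedr de acesso", lexicon))  # ['provedor de acesso']
```

The number of variants grows as (k + 1)^(n − 1) for k function words and n query words, so this brute-force approach only suits short multiword queries.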

2: Term → Concept
The mapping is not always 1:1:
– The corresponding concept is missing from the ontology
  – LT4eL: the term is not in the lexicon
– Unique result: the term is a lexicalisation of one concept
– Multiple concepts from one domain, e.g.:
  – Key (on a keyboard)
  – Key (in a database)
– Concepts from several domains:
  – Window (graphical representation on a monitor)
  – Window (part of a building)
– Different concepts for different languages:
  – “Kind” (English: sort/type)
  – “Kind” (German: child)
→ Let the user choose: present multiple browsing units
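
The cases above can be made explicit by a lookup that returns every candidate concept together with the language that produced it. A minimal sketch, assuming per-language lexicons stored as dictionaries; the concept identifiers are hypothetical and do not come from the LT4eL ontology.

```python
# Hypothetical per-language lexicons: term -> candidate concepts.
LEXICON = {
    "en": {"key": ["KeyboardKey", "DatabaseKey"],
           "window": ["GuiWindow", "BuildingWindow"],
           "kind": ["Type"]},
    "de": {"kind": ["Child"], "taste": ["KeyboardKey"]},
}

def concepts_for(term, languages):
    """Return {concept: [languages in which the term lexicalises it]}.
    Empty result: the term is not in the lexicon.
    One entry: unique result.
    Several entries: ambiguous term; the user chooses, e.g. one browsing
    unit per candidate concept."""
    result = {}
    for lang in languages:
        for concept in LEXICON.get(lang, {}).get(term.lower(), []):
            result.setdefault(concept, []).append(lang)
    return result

print(concepts_for("Kind", ["en", "de"]))
# {'Type': ['en'], 'Child': ['de']}
```

Keeping the language alongside each concept lets the interface label the browsing units, so that an English and a German reading of “Kind” are shown separately.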

3: Concept → Documents
Simplest:
– Disjunctive search with ranking
  – For each concept, every document annotated with it is returned
  – Documents with more search concepts are ranked higher
– Disadvantages: (too) many results; slower
Use super/subconcepts
Further possibilities:
– Conjunctive search: the combination of concepts must occur in a document
  – Already taken into account by the current ranking
– Disadvantages: for automatic concept search, the concept set might be larger than expected, restricting the search results too much
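
The difference between the two strategies is easy to see on a toy concept index. A minimal sketch, assuming an index from concept to the set of annotated document ids; the index contents are hypothetical.

```python
# Hypothetical annotation index: concept -> ids of documents annotated with it.
INDEX = {
    "Monitor": {"d1", "d2", "d3"},
    "Keyboard": {"d2", "d3", "d4"},
    "Driver": {"d3"},
}

def disjunctive(concepts):
    """Documents annotated with any search concept; more matched concepts rank higher."""
    matched = {}
    for concept in concepts:
        for doc in INDEX.get(concept, set()):
            matched[doc] = matched.get(doc, 0) + 1
    return sorted(matched, key=lambda doc: (-matched[doc], doc))

def conjunctive(concepts):
    """Only documents annotated with every search concept."""
    doc_sets = [INDEX.get(concept, set()) for concept in concepts]
    return sorted(set.intersection(*doc_sets)) if doc_sets else []

query = ["Monitor", "Keyboard", "Driver"]
print(disjunctive(query))              # ['d3', 'd2', 'd1', 'd4']
print(conjunctive(query))              # ['d3']
print(conjunctive(query + ["Unzip"]))  # []
```

The top of the disjunctive ranking (`d3`) is exactly the conjunctive result, which is why the current ranking already accounts for conjunctive search. The last call shows the stated disadvantage: one extra, automatically derived concept without annotations empties the conjunctive result.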

3: Concept → Documents (continued)
How useful is it to find documents that treat a superconcept?
– Negative example: lt4el:Subroutine → lt4el:Software
– Positive example: lt4el:WebPortal → lt4el:Website
How useful is it to find documents that treat a subconcept?
– lt4el:Program has 93 subconcepts, e.g. ApplicationProgram, Computervirus, Driver, Unzip
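
Expanding a search concept to its super- and subconcepts amounts to walking the subclass hierarchy. A minimal sketch, assuming the hierarchy is a tree stored as a subconcept → superconcept map; the `lt4el:` prefixes are dropped, the slide's pairs are read as subconcept → superconcept (an assumption), and `PrinterDriver` is invented to show a second level. The `max_depth` parameter is one hypothetical way to avoid pulling in all 93 subconcepts of Program.

```python
# Hypothetical fragment of the concept hierarchy: subconcept -> superconcept.
SUPER = {
    "Subroutine": "Software",
    "WebPortal": "Website",
    "ApplicationProgram": "Program",
    "Computervirus": "Program",
    "Driver": "Program",
    "Unzip": "Program",
    "PrinterDriver": "Driver",  # invented, to illustrate a deeper level
}

def superconcepts(concept):
    """Ancestors of a concept, nearest first."""
    chain = []
    while concept in SUPER:
        concept = SUPER[concept]
        chain.append(concept)
    return chain

def subconcepts(concept, max_depth=None):
    """Descendants level by level; max_depth limits how far down to go."""
    children = {}
    for child, parent in SUPER.items():
        children.setdefault(parent, []).append(child)
    result, frontier, depth = [], [concept], 0
    while frontier and (max_depth is None or depth < max_depth):
        frontier = [c for parent in frontier for c in children.get(parent, [])]
        result.extend(frontier)
        depth += 1
    return result

print(superconcepts("PrinterDriver"))     # ['Driver', 'Program']
print(subconcepts("Program", max_depth=1))
# ['ApplicationProgram', 'Computervirus', 'Driver', 'Unzip']
```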

4: Ranking
– Number of different search concepts
– Annotation frequency: number of times the search concepts are annotated in the document
  – Normalise: divide by document length
– Superconcepts and subconcepts of the search concepts have lower weight
  – A factor determines their weight
– Language of the document:
  – Sort per language? (current approach)
  – Sort by ranking across all languages, independent of language?
  – Make language a factor in the ranking?
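
The slide lists the ranking factors but not how they are combined. The sketch below is one hypothetical combination: sort by the number of distinct search concepts first and then by length-normalised annotation frequency, where annotations of related super/subconcepts count with an assumed factor of 0.5, with results grouped per language as in the current approach. The documents and concept names are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    language: str
    length: int        # document length, e.g. in tokens
    annotations: dict  # concept -> number of annotations in the document

def score(doc, search_concepts, related_concepts, related_weight=0.5):
    """(number of distinct search concepts, weighted annotation frequency / length).
    Super/subconcepts in related_concepts count with related_weight."""
    distinct = sum(1 for c in search_concepts if c in doc.annotations)
    freq = sum(doc.annotations.get(c, 0) for c in search_concepts)
    freq += related_weight * sum(doc.annotations.get(c, 0) for c in related_concepts)
    return (distinct, freq / doc.length)

def rank_per_language(docs, search_concepts, related_concepts, related_weight=0.5):
    """Group hits by document language, then sort each group by score."""
    groups = {}
    for doc in docs:
        s = score(doc, search_concepts, related_concepts, related_weight)
        if s[1] > 0:  # at least one (weighted) annotation matched
            groups.setdefault(doc.language, []).append((s, doc.doc_id))
    return {lang: [doc_id for _, doc_id in sorted(hits, key=lambda h: h[0], reverse=True)]
            for lang, hits in sorted(groups.items())}

docs = [
    Doc("en1", "en", 100, {"Monitor": 2}),
    Doc("en2", "en", 50, {"Monitor": 1, "Keyboard": 1}),
    Doc("de1", "de", 200, {"Hardware": 4}),
    Doc("en3", "en", 80, {"File": 3}),
]
print(rank_per_language(docs, {"Monitor", "Keyboard"}, {"Hardware"}))
# {'de': ['de1'], 'en': ['en2', 'en1']}
```

Ranking across languages instead would only require sorting all `(score, doc_id)` pairs in one list rather than per group.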

Evaluation
– Does semantic search return correct results (appropriate documents)?
– How easy is semantic search to use?
– Are the results better (precision/recall) than with keyword search or full-text search (also available in ILIAS)?
  – Relevant for the monolingual scenario
– Is the learning process improved?
  – Depends on the quality of the ontology and the annotation
  – In the multilingual case: depends on the domain knowledge and language knowledge of the multilingual test persons
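
The precision/recall comparison mentioned above uses the standard set-based definitions: precision is the share of retrieved documents that are relevant, recall the share of relevant documents that were retrieved. A minimal sketch; the document ids and relevance judgements are hypothetical inputs, not evaluation results.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = ["d1", "d2", "d3", "d6"]  # hypothetical relevance judgements
print(precision_recall(["d1", "d2", "d3", "d4"], relevant))  # (0.75, 0.75)
print(precision_recall(["d1", "d5"], relevant))              # (0.5, 0.25)
```

Running the same queries through semantic search and through keyword or full-text search, and averaging these values over all queries, would give the comparison asked for on the slide.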

Future work
– Display a document fragment for search results, in addition to the title
  – Choose contexts where the search concepts occur close together
  – More on this on Thursday at 18:30 at the BIS-21++ information session
– Integrate a faster document lookup component
– Improve the mapping from search term to lexicon entry
– Make use of more relations than super/subconcepts
– Possibly other changes, such as:
  – Sort differently than per language
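
One possible way to choose a context where the search concepts occur close together is a minimum covering window over the concept annotations of a document. This is a hypothetical sketch of that technique, not the planned implementation: it assumes annotations are available as `(token_position, concept)` pairs sorted by position, and returns the shortest span containing every search concept that occurs in the document.

```python
from collections import Counter

def best_fragment(annotations, search_concepts):
    """annotations: (token_position, concept) pairs sorted by position.
    Returns (start, end) of the shortest span covering every search concept
    that occurs in the document, or None if none occurs."""
    hits = [(pos, c) for pos, c in annotations if c in search_concepts]
    needed = {c for _, c in hits}
    if not needed:
        return None
    counts, best, left = Counter(), None, 0
    for pos_r, c_r in hits:
        counts[c_r] += 1
        while len(counts) == len(needed):  # window covers all needed concepts
            pos_l, c_l = hits[left]
            if best is None or pos_r - pos_l < best[1] - best[0]:
                best = (pos_l, pos_r)
            counts[c_l] -= 1
            if counts[c_l] == 0:
                del counts[c_l]
            left += 1
    return best

annotations = [(3, "Monitor"), (10, "Keyboard"), (40, "Driver"),
               (55, "Monitor"), (58, "Keyboard"), (60, "Driver")]
print(best_fragment(annotations, {"Monitor", "Keyboard", "Driver"}))  # (55, 60)
```

The fragment shown to the user would then be the text between these positions, possibly widened to sentence boundaries.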

Thank you