Crosslingual Ontology-Based Document Retrieval (Search) in an eLearning Environment Eelco Mossel LSP 2007, Hamburg.

Presentation transcript:

Crosslingual Ontology-Based Document Retrieval (Search) in an eLearning Environment Eelco Mossel LSP 2007, Hamburg

2 Framework
EU project LT4eL: Language Technology for eLearning
Goal: use Language Technology to improve the effectiveness of Learning Management Systems
Multilingual setting: 8 languages
12 European partner universities/institutes
Crosslingual search: joint work with:
– Cristina Vertan (University of Hamburg)
– Kiril Simov (Bulgarian Academy of Sciences, Sofia)
– Alex Killing (ETH Zürich, Eidgenössische Technische Hochschule)

3 Overview
Project framework
Learning Management System
Goals of semantic search
Resources for search function
Features (user side)
Internal components (developer side)
Evaluation

4 Learning Management System (LMS)
Learning platform on the internet (interactive website)
– Users can log in, have a profile, chat, …
Fundamental: store and access learning material
Units: Learning Objects (LOs) = documents
Test system: open-source platform ILIAS
Test material: text-based documents in 8 languages (PDF, HTML, MS Word)
Domain of test material: Computer Science for non-CS specialists

5 Crosslingual semantic search
Goals of the approach
1. Improved retrieval of documents
– Find documents that a simple text search would miss, because such a search requires the exact search word to occur in the text
– Example: a search for "screen" retrieves a document that contains "monitor" but not "screen"
2. Multilinguality
– One implementation for all languages in the project
3. Crosslinguality
– Find documents in languages different from the search/interface language
– No need to translate the search query
– Search is possible with passive foreign-language knowledge

6 Resource: domain ontology
Approach: use a domain ontology
Creation:
– Select keywords from LOs (also used for another project goal/task)
– Choose keywords relevant for the domain (Computer Science for Non-Computer Scientists)
– Derive a set of concepts from the set of keywords; concepts have English names/labels
– Provide a definition in English for each concept
– Create an OWL taxonomy/ontology from the concepts by specifying relations between domain concepts and mapping them to DOLCE and WordNet
– One ontology for language-independent use, with English as the common language for labels and definitions
– Currently 707 concepts

7 Resource: term-concept lexicons
Connection between terms (words in a certain language) and concepts
Create term-concept lexicons:
– For each language and each concept, specify the terms (with synonyms where relevant) that denote the concept
– At least one term for each concept
– German: currently 939 terms for 707 concepts
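A term-concept lexicon of this kind can be pictured as one mapping per language from terms to concept identifiers. The following is a minimal sketch, not the LT4eL implementation: the entries, the concept identifiers and the helper names `build_lexicons` and `uncovered_concepts` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical lexicon entries: (language, term, concept).
# The concept identifiers are illustrative, not taken from the LT4eL ontology.
ENTRIES = [
    ("en", "screen", "lt4el:Screen"),
    ("en", "monitor", "lt4el:Screen"),
    ("de", "Bildschirm", "lt4el:Screen"),
    ("de", "Monitor", "lt4el:Screen"),
    ("en", "keyboard", "lt4el:Keyboard"),
    ("de", "Tastatur", "lt4el:Keyboard"),
]


def build_lexicons(entries):
    """Return {language: {term: set of concepts}}."""
    lexicons = defaultdict(lambda: defaultdict(set))
    for lang, term, concept in entries:
        lexicons[lang][term].add(concept)
    return lexicons


def uncovered_concepts(lexicons, lang, all_concepts):
    """Concepts without any term in `lang` (each concept should have at least one)."""
    covered = set()
    for concepts in lexicons[lang].values():
        covered |= concepts
    return set(all_concepts) - covered


lexicons = build_lexicons(ENTRIES)
```

Synonyms such as "screen" and "monitor" simply point to the same concept, and a coverage check like `uncovered_concepts` is one way to enforce the "at least one term for each concept" requirement per language.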

8 Resource: concept annotations
Concept-document relations
Annotate concepts in documents semi-automatically, using the lexicons:
– Occurrences of terms are annotated with the corresponding concepts
– A human annotator decides whether to annotate a given occurrence and chooses between concepts for ambiguous terms

9 Crosslingual semantic search
Starting points:
– A multilingual document collection
– An ontology that includes a domain ontology for the domain of the documents
– Concept lexicalisations in different languages
– Annotation of concepts in the documents

10 Connecting the components
[Diagram. Labels: ontology concepts CommunicationsProtocol, HTTP, FTP, SSL, IP; Docs L1, Docs L2, Docs L3; Terms L1, Terms L2, Terms L3 (lexicons in different languages); user actions "search terms" and "browse and select".]
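One way to read the connection between the components is as a lookup chain: a query term is mapped to concepts through the lexicon of the query language, and every document annotated with one of those concepts is returned, whatever its language. The sketch below illustrates that chain under stated assumptions: the lexicons, document identifiers, concept names and the function `crosslingual_search` are hypothetical.

```python
# Hypothetical data: lexicons map lowercased terms to concept sets,
# documents carry a language code and a set of annotated concepts.
LEXICONS = {
    "en": {"screen": {"Screen"}, "monitor": {"Screen"}},
    "de": {"bildschirm": {"Screen"}},
}

DOCS = {
    "en-01": {"lang": "en", "concepts": {"Screen"}},
    "de-07": {"lang": "de", "concepts": {"Screen", "Keyboard"}},
    "nl-03": {"lang": "nl", "concepts": {"Keyboard"}},
}


def crosslingual_search(query, query_lang, lexicons, docs, doc_langs=None):
    """Return ids of documents annotated with a concept denoted by `query`.

    The query is never translated: the language-independent concept
    is the only link between the query term and the documents.
    """
    concepts = lexicons.get(query_lang, {}).get(query.lower(), set())
    return sorted(
        doc_id
        for doc_id, doc in docs.items()
        if concepts & doc["concepts"]
        and (doc_langs is None or doc["lang"] in doc_langs)
    )
```

With this data, a German query for "Bildschirm" reaches the English document as well as the German one, and `doc_langs` lets a user restrict results to languages they can read.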

11 Search with ILIAS

12

13 Features of the search functionality
Search using search words, concepts from the ontology, or a combination of both
– In case of a combination: results for directly selected concepts come first
Search for documents with super- or subconcepts
– For documents in which the desired concept is not found
Ranking
– Number of different searched concepts in the document
– Normalised annotation frequency
– Super-/subconcepts have lower weight
Shared concepts (occurring in e.g. 50% of the found documents)
– Example: Concept Report → some documents about academic writing → Concept Publication
Navigate through the ontology (get related concepts)
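The three ranking criteria can be combined in many ways, and the slides do not give a formula. The sketch below is one possible combination, purely as an illustration: normalising the annotation frequency by document length, the additive form of the score, and the weight 0.5 for super-/subconcepts are all assumptions, as are the names `score` and `rank`.

```python
def score(doc, direct, related=(), related_weight=0.5):
    """Score one document.

    doc: {"length": number of tokens (> 0), "annotations": {concept: count}}
    direct: concepts the user searched for
    related: their super-/subconcepts, which count with a lower weight
    """
    annotations = doc["annotations"]
    total = 0.0
    for concept in set(direct) | set(related):
        count = annotations.get(concept, 0)
        if count == 0:
            continue
        weight = 1.0 if concept in direct else related_weight
        # One point per distinct concept found, plus its
        # length-normalised annotation frequency (assumed normalisation).
        total += weight * (1.0 + count / doc["length"])
    return total


def rank(docs, direct, related=()):
    """Return ids of matching documents, best score first."""
    scored = [(score(doc, direct, related), doc_id) for doc_id, doc in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for value, doc_id in scored if value > 0]
```

Because each distinct concept contributes at least one full point, the number of different searched concepts dominates the score and the normalised frequency mainly breaks ties.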

14 Internal components
Search functionality comprises:
1. Find terms in the lexicons that reflect the search query.
2. Find the corresponding concepts for the derived terms.
3. Find relevant documents for the concepts.
4. Create a ranking for the set of found documents.
5. Create an ontology fragment containing the information needed to present the concept neighbourhood.
6. Find shared concepts.
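Step 6 can be sketched as a simple count over the result set: a concept is "shared" when it annotates at least a given fraction of the found documents. In the sketch below the threshold of 50% follows the example figure from the slides; the function name `shared_concepts`, the data layout and the decision to exclude the query concepts themselves are assumptions.

```python
from collections import Counter


def shared_concepts(found_docs, annotations, query_concepts, threshold=0.5):
    """Concepts that occur in at least `threshold` of the found documents.

    found_docs: list of document ids returned by the search
    annotations: {doc_id: set of concepts annotated in that document}
    query_concepts: concepts searched for (excluded from the result)
    """
    if not found_docs:
        return []
    searched = set(query_concepts)
    counts = Counter()
    for doc_id in found_docs:
        counts.update(annotations[doc_id] - searched)
    minimum = threshold * len(found_docs)
    return sorted(concept for concept, n in counts.items() if n >= minimum)
```

For a search on Report whose results are mostly about academic writing, a concept such as Publication would surface this way and could be offered to the user as a next step.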

15 1: Query → Terms
Why start with a free-text query?
– The user wants results fast (as in Google)
– Compete with full-text search and keyword search
– Find a starting point for ontology browsing
Query → lexicon: adopted/implemented strategies:
– Tokenise and create combinations for multiword terms (e.g. "space bar")
– Loose match of diacritics and uppercase letters (é → e; E → e)
Other ideas to improve recognition of the query:
– Lemmatisation of search terms
– Expansion of the lexicon with word forms
– Match similar strings: insertion of function words, e.g. provedor acesso → provedor de acesso
– Automatic substring match: dynamic list of available terms that contain the input so far
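The two implemented strategies, token combinations for multiword terms and loose matching of diacritics and case, can be sketched with the standard library. This is an illustrative sketch only: the function names, the limit of three tokens per candidate and the longest-first ordering are assumptions, not details given in the slides.

```python
import unicodedata


def normalise(text):
    """Lowercase and strip diacritics (é -> e, E -> e)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def candidate_terms(query, max_len=3):
    """All contiguous token sequences of up to `max_len` tokens, longest first."""
    tokens = normalise(query).split()
    candidates = []
    for length in range(min(max_len, len(tokens)), 0, -1):
        for start in range(len(tokens) - length + 1):
            candidates.append(" ".join(tokens[start:start + length]))
    return candidates


def match_query(query, lexicon_terms):
    """Return the lexicon terms whose normalised form equals a candidate."""
    index = {normalise(term): term for term in lexicon_terms}
    return [index[c] for c in candidate_terms(query) if c in index]
```

Generating the longer combinations first means that a multiword term such as "space bar" is found before its single-word parts, so a caller can prefer the most specific match.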

16 2: Term → Concept
Not always a 1:1 mapping.
Corresponding concept is missing from the ontology
– LT4eL: not in lexicon
Unique result: the term is a lexicalisation of one concept
Multiple concepts from one domain, e.g.:
– Key (on a keyboard)
– Key (in a database)
Concepts from several domains:
– Window (graphical representation on a monitor)
– Window (part of a building)
Different concepts for different languages:
– Kind (English: sort/type)
– Kind (German: child)
Let the user choose: present multiple browsing units
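Since a term may map to no concept, to exactly one, or to several, a lookup has to return a list and leave the choice to the user. The sketch below shows this for the Key and Kind examples; the concept identifiers, the lexicon layout and the function `concepts_for_term` are hypothetical.

```python
# Hypothetical lexicons keyed by lowercased term; concept names are illustrative.
LEXICONS = {
    "en": {"key": ["KeyboardKey", "DatabaseKey"], "kind": ["Type"]},
    "de": {"kind": ["Child"]},
}


def concepts_for_term(term, lexicons, languages=None):
    """Return sorted (language, concept) pairs that `term` may denote.

    An empty list means the term is not in any lexicon; more than one
    pair means the user should be asked to choose.
    """
    hits = []
    for lang, lexicon in lexicons.items():
        if languages is not None and lang not in languages:
            continue
        for concept in lexicon.get(term.lower(), ()):
            hits.append((lang, concept))
    return sorted(hits)
```

Restricting `languages` to the interface language is one way to avoid the crosslingual Kind ambiguity, at the cost of missing a user who types a term in another language.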

17 3: Concept → Documents
Simplest:
– Disjunctive search with ranking
  For each concept, every document annotated with it is returned
  Documents with more of the desired concepts are ranked higher
  Use super-/subconcepts
Further possibilities:
– Conjunctive search: the combination of concepts must occur in a document (this is taken into account in the current ranking)
– Context search: the combination of concepts must occur in a paragraph or sentence
– Word & concept search combined: the document must contain the concepts as well as certain words
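The difference between disjunctive and conjunctive search comes down to set intersection versus subset testing on the concept annotations. The following sketch assumes annotations are available as a mapping from document id to a set of concepts; the function names and the tie-breaking by document id are illustrative choices.

```python
def disjunctive_search(query_concepts, annotations):
    """Documents annotated with ANY query concept, most matches first.

    annotations: {doc_id: set of concepts annotated in that document}
    """
    wanted = set(query_concepts)
    scored = []
    for doc_id, concepts in annotations.items():
        matched = wanted & concepts
        if matched:
            scored.append((len(matched), doc_id))
    # More matched concepts first; ties broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def conjunctive_search(query_concepts, annotations):
    """Documents annotated with ALL query concepts."""
    wanted = set(query_concepts)
    return sorted(doc_id for doc_id, concepts in annotations.items() if wanted <= concepts)
```

Ranking the disjunctive results by number of matched concepts puts every conjunctive hit at the top of the list, which is one reading of the remark that conjunctive search is already reflected in the ranking.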

18 3: Concept → Documents (continued)
How useful is it to find documents that treat a superconcept?
– Negative example: lt4el:Subroutine → lt4el:Software. Other children of Software include Shareware and AuthoringLanguage
– Positive example: GraphicalUserInterface → UserInterface
How useful is it to find documents that treat a subconcept?
– lt4el:Program has 93 subconcepts, e.g. ApplicationProgram, Computervirus, Driver, Unzip
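Expanding a query to super- or subconcepts is a walk over the taxonomy. The sketch below uses only the concept pairs named on this slide and assumes, for simplicity, that every concept has at most one parent and that the taxonomy has no cycles; an OWL ontology may allow several parents. The function names are hypothetical.

```python
# Child -> parent pairs taken from the examples on the slide.
PARENT = {
    "Subroutine": "Software",
    "Shareware": "Software",
    "AuthoringLanguage": "Software",
    "GraphicalUserInterface": "UserInterface",
}


def superconcepts(concept, parent):
    """Chain of ancestors, nearest first (single-parent assumption)."""
    chain = []
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain


def subconcepts(concept, parent):
    """All direct and indirect descendants of `concept`."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    found, stack = [], [concept]
    while stack:
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return sorted(found)


def siblings(concept, parent):
    """Other children of the same parent."""
    par = parent.get(concept)
    if par is None:
        return []
    return sorted(c for c, p in parent.items() if p == par and c != concept)
```

The siblings show why the Subroutine example is a negative one: broadening to Software also admits documents about Shareware or AuthoringLanguage, which say little about subroutines.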

19 Evaluation
Does semantic search return correct results (appropriate documents)?
How easy is it to use semantic search?
Are the results better (precision/recall) than with keyword search or full-text search (also available in ILIAS)?
– Relevant for the monolingual scenario
Is the learning process improved?
– Depends on the quality of the ontology and annotation
– In the multilingual case: depends on the domain knowledge and language knowledge of the multilingual test persons
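Precision and recall for a single query can be computed from the set of retrieved documents and a set of documents judged relevant. The sketch below uses the standard definitions; the function name and the convention of returning 0.0 for an empty set are choices made here, and the document ids in the usage note are invented.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For instance, if a search returns d1, d2, d3 and d4 while d1, d2 and d5 are relevant, two of the four results are correct (precision 0.5) and two of the three relevant documents are found (recall 2/3). Running the same queries through semantic, keyword and full-text search would give comparable figures for the three methods.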

20 Thank you