Crosslingual Ontology-Based Document Retrieval (Search) in an eLearning Environment
RANLP, Borovets, 2007
Eelco Mossel, University of Hamburg

Framework
EU project LT4eL: Language Technology for eLearning (www.lt4el.eu)
- Goal: use of language technology to improve the effectiveness of Learning Management Systems
- Multilingual setting: 8 languages
- 12 European partner universities/institutes
- Crosslingual search: joint work with
  - Cristina Vertan, Stefanie Reimers (University of Hamburg)
  - Kiril Simov and his team (Bulgarian Academy of Sciences, Sofia)
  - Alex Killing (ETH Zürich, Eidgenössische Technische Hochschule)

Overview
- Goals of semantic search
- Resources for search function
- Functionality and architecture
- Further work

Crosslingual semantic search
Goals of the approach
1. Improved retrieval of documents
  - Find documents that would not be found by simple text search (which requires the exact search word to occur in the text)
  - Example: a search for “screen” retrieves a document that contains “monitor” but not “screen”.
2. Multilinguality
  - One implementation for all languages in the project
3. Crosslinguality
  - Find documents in languages different from the search/interface language
  - No need to translate the search query
  - Search is possible with passive foreign language knowledge

Overview of resources
- A multilingual document collection
- An ontology including a domain ontology on the domain of the documents
- Concept lexicalisations in different languages
- Annotation of concepts in the documents

Overview of resources (graphical)
[Diagram: lexicons (term → concept) for PL, PT, RO, EN, MT, NL, BG, CS, DE; the ontology; LOs in BG, CS, DE, EN, MT, NL, PL, PT, RO]

Search procedure
[Diagram: search terms (multiple languages) → search concepts → retrieved documents. Components: lexicons (contain term-concept mappings), ontology (contains concepts), visualisation (select concepts), document database]

Search with ILIAS

Internal components
Search functionality comprises:
1. Find terms in lexicons that reflect the search query.
2. Find corresponding concepts for the derived terms.
3. Find relevant documents for the concepts.
4. Create a ranking for the set of found documents.
5. Create an ontology fragment containing the information needed to present the concept neighbourhood.
6. Find “shared concepts”.
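
Steps 1 to 3 can be sketched as two dictionary lookups. This is a minimal illustration, not the LT4eL implementation: the concept IDs (`ex:...`), the lexicon entries and the document names are all hypothetical toy data.

```python
# Hypothetical toy data: concept IDs, terms and documents are illustrative only.
LEXICONS = {
    "en": {"screen": {"ex:Display"}, "monitor": {"ex:Display"}},
    "de": {"bildschirm": {"ex:Display"}, "monitor": {"ex:Display"}},
}

# Concept annotations per document (document id -> set of concept ids).
ANNOTATIONS = {
    "en_doc1": {"ex:Display"},   # text says "monitor", never "screen"
    "de_doc1": {"ex:Display"},   # German text
    "en_doc2": {"ex:Keyboard"},
}

def terms_to_concepts(query, lang):
    """Steps 1-2: look the query up in one lexicon and collect its concepts."""
    return set(LEXICONS[lang].get(query.lower(), set()))

def concepts_to_documents(concepts):
    """Step 3: every document annotated with at least one of the concepts."""
    return sorted(doc for doc, annotated in ANNOTATIONS.items()
                  if annotated & concepts)

def search(query, lang):
    return concepts_to_documents(terms_to_concepts(query, lang))
```

Because documents are matched on concepts rather than on words, `search("screen", "en")` also returns the German document and the English one that only mentions “monitor”.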

Architecture
[Diagram components: Crosslingual Search; LMS / ILIAS / other system using the search functionality; Lexicon Lookup Component; Ontology Management System; Ontology Search Engine; Lexicon; Ontology; Lucene; Database]

1: Query → Terms
Why start with a free text query?
- User wants results fast (as in Google)
- Compete with fulltext search and keyword search
- Find a starting point for ontology browsing
Query → lexicon: adopted/implemented strategies
- Case- and diacritic-insensitive matching
- Create combinations for multiword terms
  - Example: “Text Editor” → “text-editor”, “texteditor”, “text editor”
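
The two adopted strategies could look roughly like this. It is a sketch under assumptions: the function names are made up for illustration, and joining all words with one separator is a simplification for terms of three or more words.

```python
import unicodedata

def normalise(term):
    """Case-fold a term and strip diacritics (e.g. 'Zürich' -> 'zurich')."""
    decomposed = unicodedata.normalize("NFKD", term)
    without_marks = "".join(ch for ch in decomposed
                            if not unicodedata.combining(ch))
    return without_marks.casefold()

def multiword_variants(query):
    """Join the words of a query with a space, a hyphen, and nothing."""
    words = normalise(query).split()
    if len(words) < 2:
        return [" ".join(words)]
    return [sep.join(words) for sep in (" ", "-", "")]
```

Each variant would then be looked up in the lexicon, whose entries are assumed to be normalised in the same way.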

1: Query → Terms (continued)
Other ideas to improve recognition of the query:
- Lemmatisation of search terms
- Expansion of the lexicon with word forms
- Match substrings
- Match similar strings
- Insertion of function words, e.g. Portuguese: “provedor acesso” → “provedor de acesso”
- Dynamic list of available terms that contain the input so far (involves a change of the GUI)
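
Two of these ideas, similar-string matching and function-word insertion, can be tried out with the standard library. The sketch below is only an illustration: the function-word list and the lexicon entries are hypothetical, and the 0.8 cutoff is an arbitrary choice.

```python
import difflib

FUNCTION_WORDS = {"de", "do", "da", "dos", "das"}  # hypothetical Portuguese list
LEXICON_TERMS = ["provedor de acesso", "editor de texto", "teclado"]  # toy entries

def content_words(term):
    """Words of a term with function words removed."""
    return [w for w in term.lower().split() if w not in FUNCTION_WORDS]

def match_ignoring_function_words(query):
    """Lexicon terms whose content words equal those of the query."""
    wanted = content_words(query)
    return [t for t in LEXICON_TERMS if content_words(t) == wanted]

def match_similar(query, cutoff=0.8):
    """Lexicon terms whose spelling is close to the query."""
    return difflib.get_close_matches(query.lower(), LEXICON_TERMS,
                                     n=3, cutoff=cutoff)
```

Comparing content words is equivalent to inserting the missing function word: “provedor acesso” and “provedor de acesso” reduce to the same word list.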

2: Term → Concept
Not always a 1:1 mapping.
- Corresponding concept is missing from the ontology
  - LT4eL: not in lexicon
- Unique result: term is a lexicalisation of one concept
- Multiple concepts from one domain, e.g.:
  - Key (on a keyboard)
  - Key (in a database)
- Concepts from several domains:
  - Window (graphical representation on a monitor)
  - Window (part of a building)
- Different concepts for different languages:
  - “Kind” (English: sort/type)
  - “Kind” (German: child)
Let the user choose: present multiple browsing units
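
The cases above can be made concrete with a lookup that returns every (language, concept) pair for a term and leaves the choice to the user. The lexicon contents and concept IDs below are hypothetical.

```python
# Hypothetical lexicons: language -> term -> list of concept ids.
LEXICONS = {
    "en": {"kind": ["ex:Type"], "key": ["ex:KeyboardKey", "ex:DatabaseKey"]},
    "de": {"kind": ["ex:Child"]},
}

def lookup(term, languages=None):
    """Return (language, concept) pairs for a term.

    An empty list means the term is in none of the lexicons searched.
    """
    key = term.lower()
    hits = []
    for lang in sorted(languages or LEXICONS):
        for concept in LEXICONS[lang].get(key, []):
            hits.append((lang, concept))
    return hits
```

A result with more than one pair, as for “Kind” or “key”, is the situation in which several browsing units would be offered.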

3: Concept → Documents
Simplest:
- Disjunctive search with ranking
  - For each concept, each document that is annotated with it is returned
  - Documents with more search concepts are ranked higher
- Disadvantages:
  - (Too) many results
  - Slower
Use super/subconcepts
Further possibilities
- Conjunctive search:
  - The combination of concepts must occur in a document
  - Is taken into account by the current ranking
- Disadvantages:
  - For automatic concept search: the concept set might be larger than expected, thus restricting the search results too much
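
The difference between the two modes is easy to see on toy data. In this sketch the annotations and concept IDs are hypothetical, and ranking is reduced to counting matched concepts.

```python
# Hypothetical annotations: document id -> set of concept ids.
ANNOTATIONS = {
    "doc_a": {"ex:Editor", "ex:Keyboard"},
    "doc_b": {"ex:Editor"},
    "doc_c": {"ex:Monitor"},
}

def disjunctive_search(concepts):
    """Documents with at least one search concept, most matches first."""
    scored = [(len(annotated & concepts), doc)
              for doc, annotated in ANNOTATIONS.items()]
    hits = [(n, doc) for n, doc in scored if n > 0]
    return [doc for n, doc in sorted(hits, key=lambda pair: (-pair[0], pair[1]))]

def conjunctive_search(concepts):
    """Documents annotated with every search concept."""
    return sorted(doc for doc, annotated in ANNOTATIONS.items()
                  if concepts <= annotated)
```

Adding one unexpected concept to the query empties the conjunctive result, while the disjunctive result only changes its order. This is the restriction problem noted for automatic concept search.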

3: Concept → Documents (continued)
How useful is it to find documents that treat a superconcept?
- Negative example: lt4el:Subroutine → lt4el:Software
- Positive example: lt4el:WebPortal → lt4el:Website
How useful is it to find documents that treat a subconcept?
- lt4el:Program has 93 subconcepts, e.g.:
  - ApplicationProgram
  - Computervirus
  - Driver
  - Unzip
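
Expanding a search concept with its super- and subconcepts amounts to walking the hierarchy. The sketch below uses the concept names from this slide, but it assumes that each link is a direct one and that every concept has a single superconcept; neither is confirmed by the slides.

```python
# Toy hierarchy (child -> parent); direct links and single parents are assumptions.
SUPERCONCEPT = {
    "lt4el:Subroutine": "lt4el:Software",
    "lt4el:WebPortal": "lt4el:Website",
    "lt4el:ApplicationProgram": "lt4el:Program",
    "lt4el:Computervirus": "lt4el:Program",
    "lt4el:Driver": "lt4el:Program",
    "lt4el:Unzip": "lt4el:Program",
}

def superconcepts(concept):
    """All ancestors of a concept, nearest first."""
    chain = []
    while concept in SUPERCONCEPT:
        concept = SUPERCONCEPT[concept]
        chain.append(concept)
    return chain

def subconcepts(concept):
    """All direct and indirect descendants of a concept, sorted."""
    direct = [c for c, parent in SUPERCONCEPT.items() if parent == concept]
    found = set(direct)
    for child in direct:
        found.update(subconcepts(child))
    return sorted(found)
```

With 93 subconcepts under lt4el:Program, a full expansion like `subconcepts("lt4el:Program")` quickly enlarges the result set, which is why the usefulness question matters.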

4: Ranking
- Number of different search concepts
- Annotation frequency: number of times search concepts are annotated in the document
  - Normalise: divide by document length
- Superconcepts and subconcepts of search concepts have lower weight
  - A factor determines their weight
- Language of document:
  - Sort per language? (currently)
  - Sort by ranking across all languages (independent of language)?
  - Make language a factor in ranking?
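
The slide names the ranking factors but not how they are combined. The sketch below is one possible combination, chosen only for illustration: the distinct-concept count is added to the length-normalised, weighted annotation frequency, and the factor 0.5 for super/subconcepts is a hypothetical value.

```python
RELATED_WEIGHT = 0.5  # hypothetical factor for super/subconcepts

def score(doc_counts, doc_length, search_concepts, related_concepts):
    """Illustrative score for one document.

    doc_counts maps concept id -> number of annotations in the document.
    The way the factors are combined here is an assumption.
    """
    distinct = sum(1 for c in search_concepts if doc_counts.get(c, 0) > 0)
    weighted = sum(doc_counts.get(c, 0) for c in search_concepts)
    weighted += RELATED_WEIGHT * sum(doc_counts.get(c, 0) for c in related_concepts)
    return distinct + weighted / doc_length
```

For a 100-token document with four annotations of a search concept and two of a related concept, this gives 1 + (4 + 0.5 · 2) / 100 = 1.05.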

Evaluation
- Does semantic search return correct results (appropriate documents)?
- How easy is it to use semantic search?
- Are the results better (precision/recall) than with keyword search or fulltext search (also available in ILIAS)?
  - Relevant for the monolingual scenario
- Is the learning process improved?
  - Depends on the quality of the ontology and annotation
  - In the multilingual case: depends on the domain knowledge and language knowledge of the multilingual test persons
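
For the comparison with keyword and fulltext search, precision and recall can be computed per query from the retrieved documents and a set of documents judged relevant. This is the standard definition; the function name and the document IDs in the usage note are illustrative.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of one result list against relevance judgements."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

If a search returns d1 to d4 and the relevant documents are d1, d2 and d5, precision is 2/4 and recall is 2/3.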

Future work
- Display a document fragment for search results, in addition to the title.
  - Choose contexts where search concepts occur close together
  - More on this Thursday 18:30 at the BIS-21++ information session.
- Integrate a faster document lookup component
- Improve: search term → lexicon entry
- Make use of more relations than super/subconcepts
- Possibly other changes, like:
  - Sort differently than per language
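
One way to choose a context where the search concepts occur close together is to find the shortest span of the document that contains an annotation of each of them. The sliding-window sketch below is an assumption about how this might be done, not the planned implementation; annotations are taken to be (token position, concept) pairs sorted by position.

```python
from collections import Counter

def smallest_window(annotations, concepts):
    """Shortest (start, end) token span containing every search concept.

    annotations: list of (token_position, concept), sorted by position.
    Returns None if some concept is not annotated in the document.
    """
    needed = set(concepts)
    relevant = [(pos, c) for pos, c in annotations if c in needed]
    counts = Counter()   # concepts inside the current window
    best = None
    left = 0
    for pos, concept in relevant:
        counts[concept] += 1
        # Shrink from the left while the window still covers all concepts.
        while len(counts) == len(needed):
            start = relevant[left][0]
            if best is None or pos - start < best[1] - best[0]:
                best = (start, pos)
            left_concept = relevant[left][1]
            counts[left_concept] -= 1
            if counts[left_concept] == 0:
                del counts[left_concept]
            left += 1
    return best
```

The text around the returned span could then be shown as the fragment next to the document title.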

Thank you