Clarity: Cross-Lingual Document Retrieval, Categorisation and Navigation Based on Distributed Services

Presentation transcript:

CLARITY Project
Main objectives:
- To develop CLIR techniques for English -> Finnish, Swedish, Latvian and Lithuanian, i.e. low-density languages with minimal translation resources
- To investigate techniques of document organisation and presentation:
  - concept hierarchies
  - document genres and filters

Project Partners
- The University of Sheffield, UK: project coordinator and developer of the architecture, interface and concept hierarchies
- The University of Tampere (Information Studies), Finland: developer of the information retrieval engine and linguistic tools for the Finnish language
- Swedish Institute of Computer Science: developer of document styles and filtering software
- Tilde SIA, Latvia: developer of tools and resources for Baltic languages
- AlmaMedia, Finland: Finnish and Swedish text collections
- BBC Monitoring, UK
- CIIR, Univ. of Massachusetts, USA: research collaborator

Document Presentation: Text View
[Figure: text view showing the source search terms, the target search terms (highlighted) and the translated title]

Document Presentation: Concept Hierarchies
- An effective method of organising a set of documents without prior knowledge or training data
- Task: organise target-language documents into clusters of source-language concepts (this requires translating the target-language terms)
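The task above can be illustrated with a minimal sketch. It is not the Clarity implementation: it only groups documents under translated concepts, in a flat structure rather than a true hierarchy. The function name, the document ids and the toy Finnish-to-English lexicon are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set

def group_by_source_concept(
    docs: Dict[str, List[str]],
    tgt_to_src: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Group target-language documents under source-language concepts.

    docs maps a document id to its target-language terms;
    tgt_to_src maps a target-language term to its source-language translations.
    """
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            # Terms missing from the lexicon are skipped.
            for concept in tgt_to_src.get(term, set()):
                clusters[concept].add(doc_id)
    return {concept: sorted(ids) for concept, ids in clusters.items()}

# Toy data (hypothetical, not taken from the project).
FI_EN = {"kala": {"fish"}, "järvi": {"lake"}}
DOCS = {"d1": ["kala", "järvi"], "d2": ["kala"]}

clusters = group_by_source_concept(DOCS, FI_EN)
print(clusters)
```

An English-speaking user could then browse the Finnish documents under the labels "fish" and "lake" without reading the Finnish terms themselves.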

CLIR and Concept Hierarchies

Translation Routes
- Direct: 10 routes (all routes between Finnish, Swedish and English; English <-> Latvian and English <-> Lithuanian)
- Transitive: Finnish->English->Latvian; Latvian->English->Lithuanian
- Triangulated: Finnish->Latvian via two pivots, Finnish->English->Latvian and Finnish->German->Latvian
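The difference between the transitive and triangulated routes above can be sketched as follows. This assumes that triangulation means intersecting the candidate translations obtained through each pivot, which is one common reading and is not confirmed by the slides. The function names and the toy lexicons are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Lexicon = Dict[str, Set[str]]

def transitive(term: str, src_to_pivot: Lexicon, pivot_to_tgt: Lexicon) -> Set[str]:
    """Translate a term through one pivot language, keeping every candidate."""
    candidates: Set[str] = set()
    for pivot_word in src_to_pivot.get(term, set()):
        candidates |= pivot_to_tgt.get(pivot_word, set())
    return candidates

def triangulate(term: str, routes: List[Tuple[Lexicon, Lexicon]]) -> Set[str]:
    """Keep only the candidates that every productive pivot route agrees on."""
    per_route = [transitive(term, first, second) for first, second in routes]
    # Assumption: a route that yields nothing is ignored rather than emptying the result.
    per_route = [candidates for candidates in per_route if candidates]
    return set.intersection(*per_route) if per_route else set()

# Toy lexicons (hypothetical): English "spring" is ambiguous, German "Frühling" is not.
FI_EN = {"kevät": {"spring"}}
EN_LV = {"spring": {"pavasaris", "atspere", "avots"}}
FI_DE = {"kevät": {"Frühling"}}
DE_LV = {"Frühling": {"pavasaris"}}

via_english = transitive("kevät", FI_EN, EN_LV)
triangulated = triangulate("kevät", [(FI_EN, EN_LV), (FI_DE, DE_LV)])
print(sorted(via_english))
print(sorted(triangulated))
```

In this toy case the English pivot alone returns three Latvian candidates, because "spring" is ambiguous, while the intersection with the German route leaves only "pavasaris". Ignoring empty routes is a design choice of the sketch, not something stated in the slides.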

Results for Baltic Languages
- A monolingual, cross-lingual and triangular cross-lingual IR system
- Triangular CLIR is an efficient method for IR between low-density languages
- Concept hierarchies allow cross-language documents to be organised more effectively
- Headline translations allow the user to evaluate the relevance of a foreign document

Conclusions
- Clarity is, to our knowledge, the only CLIR system that supports Baltic languages
- The web services architecture allowed us to use local linguistic expertise, to avoid re-installing and maintaining software versions on different platforms, and to deal with data licensing issues
- The results show that CLIR can be performed using dictionaries, without the need for 'translation-rich' methods
- Triangulated translation via pivot languages can be a solution when there is no translation dictionary between the source and target languages