“SINAI at CLEF 2005: The evolution of the CLEF2003 system.” Fernando Martínez-Santiago, Miguel Ángel García-Cumbreras. University of Jaén.

Presentation transcript:


University of Jaén - SINAI at CLEF 2005

Content
- CLEF 2003 Experiment Framework
- CLEF 2005 Experiment Framework
  – Machine Translators and Alignment
  – Aligned and non-aligned scores
  – IR systems and Pseudo-Relevance Feedback
- Results and comparison
- Conclusions and Future Work

CLEF 2003 Experiment Framework
- Pre-processing of the monolingual collections: stopword lists and stemming algorithms.
- Decompounding algorithm for Dutch, Finnish, German and Swedish.
- IR system: Zprise, using the OKAPI probabilistic model.
- Machine-Readable Dictionary (MDR): FinnPlace for Finnish, Babylon for the rest.
- With and without pseudo-relevance feedback (PRF): Robertson-Croft's approach (no more than 15 search keywords, extracted from the 10 best-ranked documents).
- Fusion algorithms: 2-step RSV and raw mixed 2-step RSV.
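The decompounding step mentioned above can be pictured as a greedy dictionary lookup. This is only an illustrative toy under assumed inputs (the lexicon, the minimum part length and the function name are mine, not the algorithm actually used in the system):

```python
def decompound(word, lexicon, min_len=3):
    """Toy greedy decompounder: split a compound into known lexicon parts,
    trying the longest known prefix first. Returns [] if no full split exists."""
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try prefixes from longest to shortest, leaving at least min_len chars for the tail.
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            rest = decompound(tail, lexicon, min_len)
            if rest:
                return [head] + rest
    return []
```

Real decompounders for German or Swedish also have to handle linking elements (e.g. the German "Fugen-s"), which this sketch ignores.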

CLEF 2005 Experiment Framework - Summary
- Pre-processing of the monolingual collections: stopword lists and stemming algorithms.
- Decompounding algorithm for Dutch, Finnish, German and Swedish.
- IR systems:
  – Document retrieval: Zprise IR system
  – Passage retrieval: IR-n
  – Several lists of relevant documents available from the Multi-8 Merging-only task (DataFusion, OKAPI and Prosit)
- Machine Translators (MT) vs. MDR.
- Alignment algorithm.
- With and without pseudo-relevance feedback (PRF): Robertson-Croft's approach (10-15 search keywords, extracted from the 10 best-ranked documents).
- Fusion algorithms: raw mixed 2-step RSV, with and without machine learning.

CLEF 2005 Experiment Framework – MTs and Alignment
Since 2-step RSV requires grouping together the document frequencies for each concept (a term and its translations), and MTs translate better when given the whole phrase, an alignment algorithm is needed [SINAI at CLEF 2004: Using Machine Translation Resources with 2-step RSV Merging Algorithm].

Percent of aligned non-empty words (CLEF 2005, Title+Desc):

Language   Translation resource   Alignment percent
Dutch      Prompt (MT)            85.4%
Finnish    FinnPlace (MDR)        100%
French     Reverso (MT)           85.6%
German     Prompt (MT)            82.9%
Italian    FreeTrans (MT)         83.8%
Spanish    Reverso (MT)           81.5%
Swedish    Babylon (MDR)          100%
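The "alignment percent" figures above can be read as the share of non-empty translated terms that the alignment algorithm managed to map back to a source-language concept. A minimal sketch of that measurement (function and argument names are my assumptions, not the deck's):

```python
def alignment_percent(translated_terms, aligned_terms):
    """Percentage of non-empty translated terms found in the set of terms the
    alignment algorithm could map back to a source concept. MDR output aligns
    trivially (word-for-word), which is why the dictionary rows show 100%."""
    terms = [t for t in translated_terms if t]  # drop empty translations
    if not terms:
        return 100.0
    aligned = sum(1 for t in terms if t in aligned_terms)
    return 100.0 * aligned / len(terms)
```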

CLEF 2005 Experiment Framework – Aligned and non-aligned scores
We use two subqueries: Q1 (aligned terms) and Q2 (non-aligned terms). For each subquery we obtain an RSV score. Several ways to combine both scores:
1. Raw mixed 2-step RSV. Combines the RSV of the aligned words and of the non-aligned words with the formula: 0.6 * RSV_aligned + 0.4 * RSV_non-aligned.
2. Mixed 2-step RSV using logistic regression. It applies the formula: e^(alpha * RSV_aligned + beta * RSV_non-aligned).
3. Mixed 2-step RSV using logistic regression and local score. Also uses logistic regression, but includes a new component, the local ranking of the document: e^(alpha * RSV_aligned + beta * RSV_non-aligned + gamma * rank).
4. Mixed 2-step RSV using Bayesian logistic regression and local score. Similar to the previous approach, but based on Bayesian logistic regression instead of logistic regression.
Approaches 2, 3 and 4 use machine learning methods. CLEF 2005 provides 60 queries: 20 for training, 40 for evaluation.
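The combination schemes above can be sketched as follows. The variable names (rsv_aligned, rsv_non_aligned, rank) are my reconstruction of the blanks in the slide's formulas, and the coefficients alpha, beta, gamma stand for values the system would learn from the 20 training queries:

```python
import math

ALPHA_RAW = 0.6  # fixed weight for the aligned-term score (0.4 for the rest)

def raw_mixed_rsv(rsv_aligned, rsv_non_aligned):
    """1. Raw mixed 2-step RSV: fixed linear combination of both subquery scores."""
    return ALPHA_RAW * rsv_aligned + (1 - ALPHA_RAW) * rsv_non_aligned

def lr_mixed_rsv(rsv_aligned, rsv_non_aligned, alpha, beta):
    """2. Mixed 2-step RSV via logistic regression, written e^(...) as on the
    slide; for ranking purposes exp is monotonic in the linear combination."""
    return math.exp(alpha * rsv_aligned + beta * rsv_non_aligned)

def lr_mixed_rsv_local(rsv_aligned, rsv_non_aligned, rank, alpha, beta, gamma):
    """3. As above, plus the document's local ranking as an extra feature.
    (Variant 4 swaps in coefficients fitted by Bayesian logistic regression.)"""
    return math.exp(alpha * rsv_aligned + beta * rsv_non_aligned + gamma * rank)
```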

CLEF 2005 Experiment Framework – IR Systems and PRF
Three sources of relevant-document lists:
- Document retrieval: Zprise IR system, using the OKAPI probabilistic model.
- Passage retrieval: IR-n [F. Llopis, R. Muñoz, R. M. Terol and E. Noguera. IR-n r2: Using Normalized Passages. CLEF. University of Alicante], modified to return lists of relevant documents (those containing relevant passages).
- Several lists of relevant documents available from the Multi-8 Merging-only task (DataFusion, OKAPI and Prosit, thanks to Jacques Savoy).

PRF:
- Some experiments based on Zprise.
- Robertson-Croft's approach in the first step: expanded search keywords, extracted from the 10 best-ranked documents.
- The second step does not use automatic query expansion techniques such as relevance feedback (RF) or pseudo-relevance feedback (PRF) applied to the monolingual queries.
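The first-step expansion can be sketched as a naive pseudo-relevance feedback loop. This is a toy stand-in: it ranks candidate terms by raw frequency, whereas the actual Robertson-Croft approach scores them with a term-selection value; names and defaults are my assumptions:

```python
from collections import Counter

def prf_expand(query_terms, ranked_docs, n_docs=10, max_terms=15):
    """Naive PRF sketch: assume the n_docs top-ranked documents are relevant,
    count their terms, and append up to max_terms new keywords to the query."""
    counts = Counter()
    for doc in ranked_docs[:n_docs]:
        counts.update(doc.split())
    # Candidate expansion terms, most frequent first, excluding original terms.
    expansion = [t for t, _ in counts.most_common() if t not in query_terms]
    return list(query_terms) + expansion[:max_terms]
```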

CLEF 2005 Comparative Results: CLEF 2005 vs. CLEF 2003

Year   Main features
2003   Base case 2003 (OKAPI Zprise IR, MDR, 2-step RSV)
2005   Base case 2005 (OKAPI Zprise IR, no PRF, MT, raw mixed 2-step RSV)
2005   Base case 2005 + PRF
2005   Base case 2005 (IR-n, no PRF, MT, raw mixed 2-step RSV)
2005   OKAPI Zprise IR, no PRF, MT, mixed 2-step RSV using logistic regression and local score
2005   OKAPI Zprise IR, PRF, MT, mixed 2-step RSV using logistic regression and local score

Improvement over the 2003 base case: 19.02%

CLEF 2005 Multi-8 Merging-only Results
The relevant-document lists for the task are available from Neuchatel (bilingual runs from CLEF).

Year   Main features
2003   Base case 2003 (OKAPI Zprise IR, MDR, 2-step RSV)
2005   OKAPI Zprise IR, PRF, MT, mixed 2-step RSV using logistic regression and local score
2005   OKAPI Zprise IR, no PRF, MT, raw mixed 2-step RSV, PROSIT documents
2005   OKAPI Zprise IR, no PRF, MT, raw mixed 2-step RSV, OKAPI documents
2005   OKAPI Zprise IR, PRF, MT, mixed 2-step RSV using logistic regression and local score, DataFusion documents

Improvement over the 2003 base case: 22.29%

CLEF 2005 Conclusions and Future Work
Conclusions:
- Our 2003 CLIR system has been improved by using Machine Translators instead of Machine-Readable Dictionaries: around 20% in terms of Average Precision.
- This year we have tested more IR systems, based on document retrieval and on passage retrieval. The latter obtains slightly better results.
- Machine learning methods do not produce an important improvement, because the monolingual CLEF collections are quite comparable.
Future work:
- Improve the translation and the alignment algorithm.
- Test and improve the IR system used in the second phase.

Thanks
“SINAI at CLEF 2005: The evolution of the CLEF2003 system.”