CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists
Luo Si & Jamie Callan
Language Technologies Institute, School of Computer Science, Carnegie Mellon University

Slide 2: CMU, Two-Years-On Task: Basic Idea
Combining different bilingual CLIR methods is helpful
– What to translate: queries or documents
– How to translate: parallel corpus or MT software
Two ways to combine results from different methods and languages:
– Combine all results for language l into a language-specific result, then combine the results from all languages into a final result
  » Starts by combining scores from multiple methods, which may be incompatible
– Combine all results from method r into a multilingual result, then combine the results from all methods into a final result
  » Easier, because it starts by combining scores from a single method, which may be more compatible
(The slide's diagram shows a 2x2 grid of methods: what to translate, queries (Q) vs. documents (D), crossed with how to translate, parallel corpus vs. MT software.)
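The two combination orders can be made concrete with a minimal sketch. This is not the authors' code; it assumes a hypothetical data layout where scores[method][language] maps document IDs to raw scores, and uses a plain score sum as the combination step.

```python
from collections import defaultdict

def merge(score_dicts):
    """Sum scores for each document across an iterable of {doc_id: score} dicts."""
    out = defaultdict(float)
    for scores in score_dicts:
        for doc, s in scores.items():
            out[doc] += s
    return dict(out)

def combine_by_method_first(scores):
    """Build one multilingual list per method, then combine across methods
    (the order the slide argues is easier, since each per-method list mixes
    scores produced by a single retrieval method)."""
    per_method = [merge(lang_lists.values()) for lang_lists in scores.values()]
    return merge(per_method)

def combine_by_language_first(scores):
    """Build one combined list per language (mixing methods), then combine
    across languages (the order the slide warns may mix incompatible scores)."""
    languages = {lang for per_lang in scores.values() for lang in per_lang}
    per_language = [merge(per_lang[lang] for per_lang in scores.values() if lang in per_lang)
                    for lang in languages]
    return merge(per_language)
```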

Slide 3: CMU, Two-Years-On Task: Combination Method
Our strategy: combine multiple simple multilingual ranked lists
(Diagram: each retrieval method 1..M produces a raw-score multilingual list over languages 1..N; the lists are normalized and then merged into the final multilingual list by either (a) equal-weight combination or (b) learned-weight combination, with weights learned by maximizing regularized MAP.)
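A rough sketch of the normalization and combination step. The slide does not say which normalization was used, so simple min-max normalization is assumed here, and the learned weights (which the slide says are fit by maximizing regularized MAP on training topics) are simply passed in as a parameter.

```python
from collections import defaultdict

def min_max_normalize(scores):
    """Rescale one method's raw multilingual scores to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def combine_lists(method_lists, weights=None):
    """method_lists: list of {doc_id: raw_score}, one multilingual list per method.
    weights=None gives the equal-weight combination (a); a learned weight vector
    gives combination (b)."""
    if weights is None:
        weights = [1.0 / len(method_lists)] * len(method_lists)
    combined = defaultdict(float)
    for w, scores in zip(weights, method_lists):
        for doc, s in min_max_normalize(scores).items():
            combined[doc] += w * s
    # Final multilingual list, best-scoring documents first.
    return sorted(combined.items(), key=lambda x: x[1], reverse=True)
```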

Slide 4: CMU, Two-Years-On Task: Evaluation Results
Mean average precision (MAP) of each multilingual retrieval method
  Runs: Qry_fb (1) | Doc_nofb (2) | Qry_nofb (3) | Doc_fb (4) | UniNE (5)
  MAP: (values missing from the transcript)
– Qry/Doc: whether queries or documents were translated
– fb/nofb: with or without pseudo-relevance feedback
– UniNE: the UniNE system
Mean average precision obtained by combining multiple multilingual retrieval results
  Runs: M4_W1 | M4_Trn | M5_W1 | M5_Trn
  MAP: (values missing from the transcript)
– M4/M5: combine models (1)-(4) / models (1)-(5)
– W1/Trn: equal weights or learned weights
Combining the results of multiple methods provides a large improvement.

Slide 5: CMU, Result Merging Task: Basic Idea
We treat the task as a multilingual federated search problem
– All documents in language l are stored in resource (search engine) s
  » So there are several independent search engines
– Downloading and indexing documents takes time
  » Do as little of this as possible
  » Do it as rarely as possible at retrieval (query) time
Goals:
– A high-quality merged list
– Low communication and computation costs

Slide 6: CMU, Result Merging Task: Basic Idea
Offline: sample documents from each resource
– To estimate corpus statistics (e.g., IDF)
Online:
– Calculate comparable scores for the top-ranked documents in each language
  » Combine scores of the query-based and document-based translation methods
  » Build language-specific, query-specific logistic models to transform language-specific scores into comparable scores [Si & Callan, SIGIR 2002]
– Estimate comparable scores for all retrieved documents in each language
  » Combine them with exact comparable scores where available
– Use the comparable scores to create a merged multilingual result list
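The logistic mapping can be sketched as follows. This is an illustrative reconstruction, not the CLEF 2005 code: it assumes the exact comparable scores for the downloaded top-ranked documents are already normalized to roughly [0, 1], and fits one (a, b) pair per language and query.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, a, b):
    """Logistic curve mapping a raw engine score to a comparable score in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(a * x + b)))

def fit_score_mapping(raw_scores, comparable_scores):
    """Fit a language-specific, query-specific mapping from one engine's raw
    scores to comparable scores, using the few top-ranked documents for which
    exact comparable scores were computed (assumed to lie in [0, 1])."""
    params, _ = curve_fit(logistic, np.asarray(raw_scores, dtype=float),
                          np.asarray(comparable_scores, dtype=float),
                          p0=[1.0, 0.0], maxfev=5000)
    return params

def estimate_comparable(raw_scores, params):
    """Apply the fitted mapping to all retrieved documents of that language."""
    return logistic(np.asarray(raw_scores, dtype=float), *params)
```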

Slide 7: CMU, Result Merging Task: Language-Specific, Query-Specific Model
Offline: sample 3,000 documents from each language to estimate corpus statistics
At retrieval time:
– Calculate comparable scores for the top-ranked documents
– Estimate a language-specific, query-specific model that maps source-specific scores to comparable scores
– Merge documents by combining the estimated comparable scores with the exact comparable scores
(Diagram: each language's source/search engine returns source-specific scores; a sampled document database supplies corpus statistics; estimated and exact comparable scores are combined to produce the final results.)
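A hedged sketch of this merging step, under the assumption (consistent with the Top_X_C05 runs on the next slide) that documents with an exact comparable score combine it with the estimated score at equal weight, while the remaining documents use the estimated score alone:

```python
def merge_languages(per_language_results):
    """per_language_results: one dict per language of the form
    {doc_id: {"estimated": float, "exact": float or None}}.
    Returns a single multilingual ranking over all languages' documents."""
    merged = []
    for results in per_language_results:
        for doc, s in results.items():
            if s["exact"] is not None:
                # Equal-weight combination of exact and estimated comparable scores.
                score = 0.5 * s["exact"] + 0.5 * s["estimated"]
            else:
                score = s["estimated"]
            merged.append((doc, score))
    return sorted(merged, key=lambda x: x[1], reverse=True)
```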

Slide 8: CMU, Result Merging Task: Evaluation Results
Language-specific logistic models are used to map resource-specific scores to comparable (resource-independent) scores.
Should the models be query-specific or query-independent?
Mean average precision of language-specific, query-independent models (UniNE):
– TrainLog_MLE (logistic model fit by maximum likelihood estimation):
– TrainLog_MAP (logistic model fit by maximizing MAP):
Mean average precision of language-specific, query-specific models (UniNE):
  Runs: C_1000 | C_500 | Top_150_C05 | Top_10_C05 | Top_5_C05
  MAP: (values missing from the transcript)
– C_X: the top X documents from each list are merged by exact comparable scores
– Top_X_C05: the top X documents from each list are downloaded so the logistic model can estimate comparable scores, which are then combined with the exact scores at equal weight (0.5)
Language-specific, query-specific merging methods have a large advantage.