Cheshire at GeoCLEF 2008: Text and Fusion Approaches for GIR

Ray R. Larson
School of Information, University of California, Berkeley

Motivation
- In previous GeoCLEF evaluations we found very mixed results using various methods of query expansion, attempts at explicit geographic constraints, etc.
- Last year we decided to try just our "basic" retrieval method, i.e., logistic regression with blind feedback.
- The goal was to establish baseline data that we could use to test selective additions in later experiments.
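As background, blind feedback (pseudo-relevance feedback) re-runs a query after expanding it with terms drawn from the top-ranked documents of a first pass. The sketch below illustrates the general idea only; it is not the Cheshire implementation, and the function names and cutoffs (top 10 documents, 20 expansion terms) are illustrative assumptions.

```python
# Minimal sketch of blind (pseudo-relevance) feedback -- NOT the Cheshire
# implementation; search(), doc.terms, and the cutoffs are assumptions.
from collections import Counter

def blind_feedback(query_terms, search, top_docs=10, n_expansion=20):
    """Expand a query with frequent terms from the top-ranked documents."""
    first_pass = search(query_terms)              # initial retrieval
    counts = Counter()
    for doc in first_pass[:top_docs]:             # assume the top docs are relevant
        counts.update(doc.terms)
    expansion = [t for t, _ in counts.most_common(n_expansion)
                 if t not in query_terms]
    return search(list(query_terms) + expansion)  # second-pass retrieval
```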

Motivation (continued)
- Because the "baselines" worked well last year, we decided to continue with them and begin testing "fusion" approaches for combining the results of different retrieval algorithms.
- This was due in part to Neuchatel's use of fusion approaches with good results, and to our own use of fusion approaches in earlier CLEF tasks.

Experiments
- TD, TDN, and TDN Fusion runs for Monolingual English, German, and Portuguese (9 runs)
- TD, TDN, and TDN Fusion runs for Bilingual X to English, German, and Portuguese (18 runs)

Monolingual

Run Name       | Task                   | Characteristics | MAP
BERKGCMODETD   | Monolingual German     | TD auto         | 0.2295 *
BERKGCMODETDN  | Monolingual German     | TDN auto        | 0.2050
BERKMODETDNPIV | Monolingual German     | TDN auto fusion | 0.2292
BERKGCMOENTD   | Monolingual English    | TD auto         | 0.2652
BERKGCMOENTDN  | Monolingual English    | TDN auto        | 0.2001
BERKMOENTDNPIV | Monolingual English    | TDN auto fusion | 0.2685 *
BERKGCMOPTTD   | Monolingual Portuguese | TD auto         | 0.2170
BERKGCMOPTTDN  | Monolingual Portuguese | TDN auto        | 0.1741
BERKMOPTTDNPIV | Monolingual Portuguese | TDN auto fusion | 0.2310 *

(* marks the best run for each language)

Bilingual
(bilingual results table/figure not captured in the transcript)

TDN Fusion
- A: TD run, logistic regression with blind feedback
- B: TDN run, Okapi BM-25
- A and B normalized using MinMax to [0, 1]
- NewWt = (B * piv) + (A * (1 - piv)), with piv = 0.29
- Documents re-ranked by NewWt give the final result
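A minimal sketch of this fusion step, assuming each run is represented as a dict mapping document IDs to scores (the helper names and data layout are assumptions, not taken from the Cheshire code):

```python
# Pivot-weighted fusion of two runs, as described above.
# run_a: TD logistic regression + blind feedback; run_b: TDN Okapi BM-25.

def minmax_normalize(scores):
    """Rescale a {doc_id: score} dict to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0                       # guard against a constant run
    return {d: (s - lo) / span for d, s in scores.items()}

def fuse(run_a, run_b, piv=0.29):
    """NewWt = B*piv + A*(1 - piv) over MinMax-normalized scores."""
    a, b = minmax_normalize(run_a), minmax_normalize(run_b)
    fused = {d: b.get(d, 0.0) * piv + a.get(d, 0.0) * (1 - piv)
             for d in set(a) | set(b)}            # docs missing from a run score 0
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

With piv = 0.29, the TD logistic-regression run contributes the larger share (0.71) of the fused weight.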

Results
- Fusion of logistic regression with blind feedback and Okapi BM-25 produced most of our best-performing runs, though the improvement was not always dramatic.
- With a single algorithm, use of the Narrative is counter-productive: using only the Title and Description provides better results with these algorithms.
- Does blind feedback accomplish some of the geographic expansion that is explicit in the Narrative?

Comparison of Berkeley Results, 2006-2008 (MAP)

Task                            | 2006  | 2007  | 2008   | Pct. Diff '07-'08
Monolingual English             | 0.250 | 0.264 | 0.268* | 1.493
Monolingual German              | 0.215 | 0.139 | 0.230  | 39.565
Monolingual Portuguese          | 0.162 | 0.174 | 0.231* | 24.675
Bilingual English -> German     | 0.156 | 0.090 | 0.225* | 60.000
Bilingual English -> Portuguese | 0.126 | 0.201 | 0.207* | 2.899

* using fusion
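(The percent-difference column appears to be computed relative to the 2008 score, i.e. (MAP_2008 - MAP_2007) / MAP_2008 x 100; for example, Monolingual German: (0.230 - 0.139) / 0.230 x 100 ≈ 39.565.)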

What happened in 2007 German?
We speculated last year that the cause was:
- No decompounding? 2006 used Aitao Chen's decompounding. (No.)
- Worse translation? Possibly, since different MT systems were used; but the same system was used for 2007 and 2008, so no.
- An incomplete stoplist? Was it really the same? (Yes.)
- Was stemming the same? (Yes.)

Why did German work better for us in 2008?
- That was all speculation, but... it REALLY helps if you include the entire database.
- Our 2007 German runs did not include any documents from the SDA collection!

What Next?
- Finally start adding back true geographic processing, and test where and why (and if) results are improved.
- Get decompounding working with German.