INAOE at GeoCLEF 2008: A Ranking Approach based on Sample Documents. Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda. Language Technologies Laboratory.

Similar presentations
Even More TopX: Relevance Feedback Ralf Schenkel Joint work with Osama Samodi, Martin Theobald.
Pseudo-Relevance Feedback for Multimedia Retrieval by Rong Yan, Alexander G., and Rong Jin. Mwangi S. Kariuki.
SINAI-GIR A Multilingual Geographical IR System University of Jaén (Spain) José Manuel Perea Ortega CLEF 2008, 18 September, Aarhus (Denmark) Computer.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Vikas Bhardwaj, Columbia University, NLP for the Web – Spring 2010. Improving QA Accuracy by Question Inversion, Prager et al., IBM T.J. Watson Res. Ctr. 02/18/2010.
The XLDB Group at GeoCLEF 2005 Nuno Cardoso, Bruno Martins, Marcírio Chaves, Leonardo Andrade, Mário J. Silva
How to Make Manual Conjunctive Normal Form Queries Work in Patent Search Le Zhao and Jamie Callan Language Technologies Institute School of Computer Science.
Search Engines and Information Retrieval
Information Retrieval Concerned with the representation of, storage of, organization of, and access to information items.
A novel log-based relevance feedback technique in content- based image retrieval Reporter: Francis 2005/6/2.
Modern Information Retrieval Chapter 5 Query Operations.
Re-ranking Documents Segments To Improve Access To Relevant Content in Information Retrieval Gary Madden Applied Computational Linguistics Dublin City.
MANISHA VERMA, VASUDEVA VARMA PATENT SEARCH USING IPC CLASSIFICATION VECTORS.
SIMS 202 Information Organization and Retrieval Prof. Marti Hearst and Prof. Ray Larson UC Berkeley SIMS Tues/Thurs 9:30-11:00am Fall 2000.
Evaluating the Performance of IR Systems
Web Search – Summer Term 2006 II. Information Retrieval (Basics Cont.) (c) Wolfgang Hürst, Albert-Ludwigs-University.
Xiaomeng Su & Jon Atle Gulla Dept. of Computer and Information Science Norwegian University of Science and Technology Trondheim Norway June 2004 Semantic.
WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION.
Chapter 5: Information Retrieval and Web Search
Improving web image search results using query-relative classifiers Josip Krapac, Moray Allan, Jakob Verbeek, Frédéric Jurie.
Search Engines and Information Retrieval Chapter 1.
CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists Luo Si & Jamie Callan Language Technology Institute School of Computer.
Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa
“SINAI at CLEF 2005: The evolution of the CLEF2003 system.” Fernando Martínez-Santiago, Miguel Ángel García-Cumbreras, University of Jaén.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
A Unified Relevance Model for Opinion Retrieval (CIKM ’09) Xuanjing Huang, W. Bruce Croft Date: 2010/02/08 Speaker: Yu-Wen, Hsu.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Applying the KISS Principle with Prior-Art Patent Search Walid Magdy Gareth Jones Dublin City University CLEF-IP, 22 Sep 2010.
1 Query Operations Relevance Feedback & Query Expansion.
MIRACLE Multilingual Information RetrievAl for the CLEF campaign DAEDALUS – Data, Decisions and Language, S.A. Universidad Carlos III de.
Chapter 6: Information Retrieval and Web Search
Multilingual Retrieval Experiments with MIMOR at the University of Hildesheim René Hackl, Ralph Kölle, Thomas Mandl, Alexandra Ploedt, Jan-Hendrik Scheufen,
INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Overview of the INFILE track at CLEF 2009 multilingual INformation FILtering Evaluation.
Opinion Retrieval from Blogs Wei Zhang, Clement Yu, and Weiyi Meng (2007 CIKM)
FollowMyLink Individual APT Presentation Third Talk February 2006.
Personalization with user’s local data Personalizing Search via Automated Analysis of Interests and Activities 1 Sungjick Lee Department of Electrical.
CLEF 2008 Workshop September 17-19, 2008 Aarhus, Denmark.
Information Retrieval at NLC Jianfeng Gao NLC Group, Microsoft Research China.
Iterative Translation Disambiguation for Cross Language Information Retrieval Christof Monz and Bonnie J. Dorr Institute for Advanced Computer Studies.
Chapter 8 Evaluating Search Engines. Evaluation is key to building effective and efficient search engines; measurement usually carried out.
© 2004 Chris Staff CSAW’04 University of Malta Expanding Query Terms in Context Chris Staff and Robert Muscat Department of.
Language Model in Turkish IR Melih Kandemir F. Melih Özbekoğlu Can Şardan Ömer S. Uğurlu.
Information Retrieval Chapter 01: Introduction. 21 February 1999. Berthier Ribeiro-Neto.
Performance Measurement. Testing Environment.
Information Retrieval using Word Senses: Root Sense Tagging Approach Sang-Bum Kim, Hee-Cheol Seo and Hae-Chang Rim Natural Language Processing Lab., Department.
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
Evaluating High Accuracy Retrieval Techniques Chirag Shah, W. Bruce Croft Center for Intelligent Information Retrieval Department of Computer Science.
NTNU Speech Lab Dirichlet Mixtures for Query Estimation in Information Retrieval Mark D. Smucker, David Kulp, James Allan Center for Intelligent Information.
Active Feedback in Ad Hoc IR Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Generating Query Substitutions Alicia Wood. What is the problem to be solved?
WIDIT at TREC-2005 HARD Track Kiduk Yang, Ning Yu, Hui Zhang, Ivan Record, Shahrier Akram WIDIT Laboratory School of Library & Information Science Indiana.
LIST – DTSI – Interfaces, Cognitics and Virtual Reality Unit The INFILE project: a crosslingual filtering systems evaluation campaign Romaric.
The Loquacious (愛說話) User: A Document-Independent Source of Terms for Query Expansion Diane Kelly et al. University of North Carolina at Chapel Hill.
Learning in a Pairwise Term-Term Proximity Framework for Information Retrieval Ronan Cummins, Colm O’Riordan Digital Enterprise Research Institute SIGIR.
Combining Text and Image Queries at ImageCLEF2005: A Corpus-Based Relevance-Feedback Approach Yih-Cheng Chang Department of Computer Science and Information.
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
UIC at TREC 2006: Blog Track Wei Zhang Clement Yu Department of Computer Science University of Illinois at Chicago.
Relevant Document Distribution Estimation Method for Resource Selection Luo Si and Jamie Callan School of Computer Science Carnegie Mellon University
Enterprise Track: Thread-based Retrieval Yejun Wu and Douglas W. Oard. Goal: Explore document expansion.
Information Retrieval and Extraction 2009 Term Project – Modern Web Search Advisor: 陳信希 TA: 蔡銘峰&許名宏.
University Of Seoul Ubiquitous Sensor Network Lab Query Dependent Pseudo-Relevance Feedback based on Wikipedia Department of Electrical and Computer Engineering, USN Lab G
Bayesian Query-Focused Summarization Slide 1 Hal Daumé III Bayesian Query-Focused Summarization Hal Daumé III and Daniel Marcu Information.
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance.
An Empirical Study of Learning to Rank for Entity Search
6 ~ GIR.
CSE 635 Multimedia Information Retrieval
Relevance and Reinforcement in Interactive Browsing
Retrieval Utilities Relevance feedback Clustering
Information Retrieval and Web Design
Cheshire at GeoCLEF 2008: Text and Fusion Approaches for GIR
Presentation transcript:

INAOE at GeoCLEF 2008: A Ranking Approach based on Sample Documents. Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda. Language Technologies Laboratory, National Institute of Astrophysics, Optics and Electronics, Tonantzintla, México.

General ideas
Our system focuses on the ranking process. It is based on the following hypotheses:
– Current IR machines are able to retrieve relevant documents for geographic queries
– Complete documents provide more and better elements for ranking than isolated query terms
We aimed to show that, by using a few query-related sample texts, it is possible to improve the final ranking of the retrieved documents.

General architecture of our system
[Architecture diagram. First stage (retrieval): the query is submitted to the IR machine over the document collection; a feedback process selects sample texts from the small set of retrieved documents and drives a query expansion, which produces a larger set of retrieved documents. Second stage (ranking): the re-ranking process orders this larger set using the selected sample texts, yielding the re-ranked documents.]

Re-ranking process
[Diagram: the |S| sample texts go through a geo-expansion process backed by the Geonames DB; a similarity calculation compares each sample text against the |R| retrieved documents, producing different ranking proposals (one per sample text); an information fusion step merges these proposals into a single re-ranked list of documents.]

System configuration: traditional modules
IR Machine:
– Based on LEMUR
– Retrieves 1000 documents (original/expanded queries)
Feedback module:
– Based on blind relevance feedback
– Selects the top 5 retrieved documents (sample texts)
Query Expansion:
– Adds to the original query the five most frequent terms from the sample texts (see the sketch below)
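The feedback and expansion modules amount to classic blind relevance feedback. A minimal Python sketch, not the authors' implementation: the whitespace tokenization, lowercasing, and absence of stop-word filtering are assumptions here.

```python
from collections import Counter

def blind_feedback_expansion(query_terms, ranked_docs, n_docs=5, n_terms=5):
    """Treat the top-ranked documents as sample texts and add their
    most frequent terms to the original query (blind relevance feedback)."""
    sample_texts = ranked_docs[:n_docs]              # top-5 retrieved documents
    counts = Counter()
    for doc in sample_texts:
        # count terms not already part of the (lowercased) query
        counts.update(t for t in doc.lower().split() if t not in query_terms)
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return query_terms + expansion, sample_texts
```

The expanded query is then resubmitted to the IR machine to obtain the larger retrieved set used in the second stage.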

System Configuration: re-ranking module
Geo-Expansion:
– Geo-terms are identified using the LingPipe NER
– Expands the geo-terms of the sample texts by adding their two nearest ancestors (Paris → France, Europe)
Similarity Calculation:
– Considers thematic and geographic similarities; it is based on the cosine formula
Information Fusion:
– Merges all the different ranking proposals into one single list, using the round-robin technique (see the sketches below)
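A sketch of the geo-expansion and of the two-part similarity. The `ancestors` lookup (standing in for the Geonames DB), the bag-of-words representation, and the equal-weight averaging of the two similarities are all assumptions, not details taken from the slides.

```python
import math
from collections import Counter

def geo_expand(geo_terms, ancestors):
    """Add the two nearest ancestors of each geo-term,
    e.g. Paris -> France, Europe."""
    expanded = list(geo_terms)
    for term in geo_terms:
        expanded.extend(ancestors.get(term, [])[:2])
    return expanded

def cosine(u, v):
    """Cosine similarity between two term-frequency dictionaries."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) \
         * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def similarity(sample, doc, ancestors):
    """Thematic cosine over non-geo terms plus geographic cosine over
    the (expanded) geo-terms; averaging them equally is an assumption."""
    thematic = cosine(Counter(sample["terms"]), Counter(doc["terms"]))
    geographic = cosine(Counter(geo_expand(sample["geo_terms"], ancestors)),
                        Counter(doc["geo_terms"]))
    return (thematic + geographic) / 2
```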

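Each sample text induces its own ranking of the retrieved documents; the fusion step interleaves them. A minimal round-robin merge (the duplicate handling is an assumption):

```python
def round_robin_merge(rankings):
    """Take one document from each ranking in turn, skipping documents
    already placed in the merged list."""
    merged, seen = [], set()
    for position in range(max(len(r) for r in rankings)):
        for ranking in rankings:
            if position < len(ranking) and ranking[position] not in seen:
                merged.append(ranking[position])
                seen.add(ranking[position])
    return merged
```

For example, round_robin_merge([["d1", "d2"], ["d2", "d3"]]) yields ["d1", "d2", "d3"].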
Evaluation points
[Same architecture diagram, annotated with three evaluation points: the 1st EP after the first-stage retrieval, the 2nd EP after retrieval with the expanded query, and the 3rd EP after the re-ranking process.]

Experimental results: submitted runs
Eval. point | Run | Description
1st | inaoe-BASELINE1 | Title + Description
1st | inaoe-BASELINE2 | Title + Description + Narrative
2nd | inaoe-BRF | Baseline1 + 5 terms (from 5 docs)
3rd | inaoe-RRBF | re-rank of BRF-5-5, without any distinction
3rd | inaoe-RRGeo | re-rank of BRF-5-5, with distinction (thematic, geographic)
3rd | inaoe-RRGeoExp | re-rank of BRF-5-5, with distinction (thematic, geographic + expansion)
Relative improvements as given on the slide: +4.87%, +3.33%, +0%, +3.24%

Experimental results: additional runs
Sample texts were manually selected (from inaoe-BASELINE1); two documents were selected on average for each topic.
Eval. point | Run | Description
1st | inaoe-BASELINE1 | Title + Description
2nd | inaoe-BRF | Baseline1 + 5 terms (from 2* docs)
3rd | inaoe-RRBF | re-rank of BRF-5-2*, without any distinction
3rd | inaoe-RRGeo | re-rank of BRF-5-2*, with distinction (thematic, geographic)
3rd | inaoe-RRGeoExp | re-rank of BRF-5-2*, with distinction (thematic, geographic + expansion)
Relative improvements as given on the slide: +26.4%, +15.8%, +28.3%, +3.24%

Final remarks
– Results showed that query-related sample texts make it possible to improve the original ranking of the retrieved documents
– Our experiments also showed that the proposed method is very sensitive to the presence of incorrect sample texts
– Since our geo-expansion process is still very simple, we believe it is hurting the performance of the method
Ongoing work
– A new sample-text selection method
– A new strategy for geographic expansion that incorporates more precise disambiguation

Thank you! Manuel Montes y Gómez, Language Technologies Laboratory, National Institute of Astrophysics, Optics and Electronics, Tonantzintla, México.