
1 INAOE at GeoCLEF 2008: A Ranking Approach based on Sample Documents
Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda
Language Technologies Laboratory
National Institute of Astrophysics, Optics and Electronics
Tonantzintla, México
mmontesg@inaoep.mx
http://ccc.inaoep.mx/~mmontesg

2 General ideas
Our system focuses on the ranking process. It is based on the following hypotheses:
– Current IR machines are able to retrieve relevant documents for geographic queries.
– Complete documents provide more and better elements for the ranking than isolated query terms.
We aimed to show that:
– Using some query-related sample texts, it is possible to improve the final ranking of the retrieved documents.

3 General architecture of our system
[Architecture diagram. First stage (retrieval stage): the query is submitted to the IR machine, which retrieves a small set of documents from the document collection; a feedback process selects the sample texts, and a query expansion step leads to a larger set of retrieved documents. Second stage (ranking stage): a re-ranking process uses the selected sample texts to reorder the large set into the final re-ranked documents.]
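As a rough illustration of this flow, the sketch below wires the stages together in Python. The two-stage function and the callables it takes (retrieve, expand_query, rerank) are hypothetical placeholders for illustration, not the actual INAOE implementation.

```python
# Hypothetical sketch of the two-stage architecture, assuming simple callables
# for the IR machine, the query expansion and the re-ranking steps.

def two_stage_search(query, retrieve, expand_query, rerank, n_samples=5):
    """Run retrieval, feedback, query expansion and re-ranking in sequence."""
    # First stage (retrieval): initial list of documents for the original query
    small_list = retrieve(query)

    # Feedback process: take the top documents as query-related sample texts
    sample_texts = small_list[:n_samples]

    # Query expansion: build an enriched query from the sample texts
    expanded_query = expand_query(query, sample_texts)

    # Retrieve again with the expanded query (the "large" list)
    large_list = retrieve(expanded_query)

    # Second stage (ranking): re-order the large list using the sample texts
    return rerank(large_list, sample_texts)
```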

4 Re-ranking process
[Diagram: the |S| sample texts and the |R| retrieved documents pass through a geo-expansion process backed by the Geonames DB; a similarity calculation between each sample text and the retrieved documents produces different ranking proposals, and an information fusion step merges them into the re-ranked list of documents.]
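The information fusion step, which the configuration slide implements with the Round-Robin technique, can be sketched as follows. The representation of each ranking proposal as a list of document ids is an assumption for illustration only.

```python
# Hypothetical Round-Robin fusion of several ranking proposals (one per sample
# text). Each proposal is a list of document ids ordered by similarity.

def round_robin_fusion(ranking_proposals):
    """Interleave the ranked lists, keeping the first occurrence of each document."""
    if not ranking_proposals:
        return []
    fused, seen = [], set()
    max_len = max(len(ranking) for ranking in ranking_proposals)
    for position in range(max_len):
        for ranking in ranking_proposals:
            if position < len(ranking) and ranking[position] not in seen:
                fused.append(ranking[position])
                seen.add(ranking[position])
    return fused

# Example: three sample texts propose slightly different orderings
print(round_robin_fusion([["d2", "d1", "d3"],
                          ["d1", "d3", "d2"],
                          ["d3", "d2", "d1"]]))
# -> ['d2', 'd1', 'd3']
```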

5 System configuration: Traditional modules
IR Machine:
– Based on LEMUR.
– Retrieves 1000 documents (original/expanded queries).
Feedback module:
– Based on blind relevance feedback.
– Selects the top 5 retrieved documents (sample texts).
Query Expansion:
– Adds to the original query the five most frequent terms from the sample texts.
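A minimal sketch of the expansion step (adding the five most frequent terms of the sample texts) is shown below. The whitespace tokenization and the small ad-hoc stopword list are simplifying assumptions; this is not the LEMUR-based implementation.

```python
# Hypothetical sketch of the query expansion step: append the five most
# frequent non-stopword terms of the sample texts to the original query.
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for", "is"}

def expand_query(query, sample_texts, n_terms=5):
    """Return the query enriched with the n_terms most frequent sample terms."""
    counts = Counter()
    for text in sample_texts:
        counts.update(tok for tok in text.lower().split()
                      if tok.isalpha() and tok not in STOPWORDS)
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return query + " " + " ".join(expansion)

# Example with two toy sample texts
samples = ["Flooding along the Rhine damaged cities in Germany",
           "German cities near the Rhine reported flooding"]
print(expand_query("floods in European cities", samples))
```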

6 System configuration: Re-ranking module
Geo-Expansion:
– Geo-terms are identified using the LingPipe NER.
– Expands the geo-terms of the sample texts by adding their two nearest ancestors (Paris → France, Europe).
Similarity Calculation:
– Considers thematic and geographic similarities; it is based on the cosine formula.
Information Fusion:
– Merges all the different ranking proposals into one single list, using the Round-Robin technique.
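A minimal sketch of the geo-expansion and similarity calculation follows. The tiny ancestor table stands in for the Geonames DB, and the equal-weight average of the thematic and geographic cosine similarities is an assumption, since the slide does not specify the combination scheme.

```python
# Hypothetical sketch: cosine similarity computed separately over thematic
# terms and geo-expanded geographic terms, then combined with equal weights.
import math
from collections import Counter

ANCESTORS = {"paris": ["france", "europe"]}   # toy stand-in for the Geonames DB

def geo_expand(geo_terms):
    """Add the two nearest ancestors of each geo-term (e.g. Paris -> France, Europe)."""
    expanded = list(geo_terms)
    for term in geo_terms:
        expanded.extend(ANCESTORS.get(term, []))
    return expanded

def cosine(terms_a, terms_b):
    """Cosine similarity between two bags of terms."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def combined_similarity(sample, document):
    """Average of thematic and geographic cosine similarities (assumed weighting)."""
    thematic = cosine(sample["thematic"], document["thematic"])
    geographic = cosine(geo_expand(sample["geo"]), geo_expand(document["geo"]))
    return (thematic + geographic) / 2.0

sample = {"thematic": ["riots", "suburbs"], "geo": ["paris"]}
doc = {"thematic": ["riots", "protest", "suburbs"], "geo": ["france"]}
print(round(combined_similarity(sample, doc), 3))
```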

7 Evaluation points
[Architecture diagram annotated with three evaluation points: the 1st EP at the output of the initial retrieval (baseline runs), the 2nd EP at the output of the feedback and query-expansion stage, and the 3rd EP at the output of the re-ranking process.]

8 Experimental results: Submitted runs

Eval. Point  Experiment ID        MAP    R-Prec  P@5    Recall  Experiment Description
1st          inaoe-BASELINE1      0.234  0.261   0.384  0.835   Title + Description
1st          inaoe-BASELINE2      0.201  0.226   0.272  0.815   Title + Description + Narrative
2nd          inaoe-BRF-5-5        0.246  0.264   0.328  0.863   Baseline1 + 5 terms (from 5 docs)
3rd          inaoe-RRBF-5-5       0.241  0.268   0.384  0.863   Re-rank of BRF-5-5, without any distinction
3rd          inaoe-RRGeo-5-5      0.244  0.266   0.384  0.863   Re-rank of BRF-5-5, distinction (thematic, geographic)
3rd          inaoe-RRGeoExp-5-5   0.246  0.270   0.384  0.863   Re-rank of BRF-5-5, distinction (thematic, geographic + expansion)

Relative improvement: MAP +4.87%, R-Prec +3.33%, P@5 +0%, Recall +3.24%.

9 Experimental results: Additional runs

The sample texts were manually selected (from inaoe-BASELINE1); on average, two documents were selected for each topic.

Eval. Point  Experiment ID         MAP    R-Prec  P@5    Recall  Experiment Description
1st          inaoe-BASELINE1       0.234  0.261   0.384  0.835   Title + Description
2nd          inaoe-BRF-5-2*        0.291  0.283   0.392  0.863   Baseline1 + 5 terms (from 2* docs)
3rd          inaoe-RRBF-5-2*       0.306  0.304   0.496  0.863   Re-rank of BRF-5-2*, without any distinction
3rd          inaoe-RRGeo-5-2*      0.315  0.307   0.520  0.863   Re-rank of BRF-5-2*, distinction (thematic, geographic)
3rd          inaoe-RRGeoExp-5-2*   0.318  0.310   0.536  0.863   Re-rank of BRF-5-2*, distinction (thematic, geographic + expansion)

Relative improvement: MAP +26.4%, R-Prec +15.8%, P@5 +28.3%, Recall +3.24%.

10 Final remarks
Results showed that query-related sample texts allow improving the original ranking of the retrieved documents.
Our experiments also showed that the proposed method is very sensitive to the presence of incorrect sample texts.
Since our geo-expansion process is still very simple, we believe it is damaging the performance of the method.
Ongoing work:
– A new sample-text selection method.
– A new strategy for geographic expansion that applies a more precise disambiguation.

11 Thank you!
Manuel Montes y Gómez
Language Technologies Laboratory
National Institute of Astrophysics, Optics and Electronics
Tonantzintla, México
mmontesg@inaoep.mx
http://ccc.inaoep.mx/~mmontesg

