Published by Zakary Burbridge. Modified over 10 years ago.
1
SINAI-GIR: A Multilingual Geographical IR System
José Manuel Perea Ortega
Computer Science Department, University of Jaén (Spain)
CLEF 2008, 18 September, Aarhus (Denmark)
2
Introduction
Preliminary work of SINAI in GeoCLEF:
–2006: query expansion using gazetteers and a thesaurus [García-Vega et al., 2007]
–2007: filtering documents based on manual rules [Perea-Ortega et al., 2007]
GeoCLEF 2008:
–Filtering documents using new manual rules and new approaches (query reformulation, keyword and hyponym extraction, query geo-expansion)
GeoCLEF 2008, Aarhus
3
SINAI-GIR System overview
[Diagram. Components shown: Multilingual Query, TRANSLATOR, English Query (Q), QUERY ANALYZER, Q1, Q2, Q3, English collection, Collection Preprocessing subsystem, GeoNames, IR Subsystem, Documents retrieved, VALIDATOR, Keywords and geo-information extracted, Final Re-Ranked Documents retrieved]
4
SINAI-GIR System overview: Translator
–Translates the queries from other languages into English
–Uses SINTRAM (SINai TRAnslation Module) [García-Cumbreras et al., 2007]
–SINTRAM works with different online machine translators
5
SINAI-GIR System overview: Collection Preprocessing subsystem
–Preprocessing: stemming, stopword removal, POS tagging
–The toponyms are extracted (NER)
–Two indexes are generated: Locations and Keywords
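The two-index idea on this slide can be sketched as follows. This is only an illustration under stated assumptions: the stopword list and the toy gazetteer are hypothetical stand-ins for the NER step and GeoNames, and stemming and POS tagging are left out.

```python
import re

# Hypothetical toy resources; the real system relies on NER and GeoNames.
STOPWORDS = {"the", "in", "of", "a", "and", "to"}
TOY_GAZETTEER = {"spain", "denmark", "aarhus"}

def preprocess(text):
    """Lowercase, tokenize, and drop stopwords (stemming and POS tagging omitted)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def build_indexes(docs):
    """Return (locations, keywords), each mapping a term to the set of doc ids containing it."""
    locations, keywords = {}, {}
    for doc_id, text in docs.items():
        for token in preprocess(text):
            # Tokens found in the toy gazetteer go to the locations index.
            target = locations if token in TOY_GAZETTEER else keywords
            target.setdefault(token, set()).add(doc_id)
    return locations, keywords

docs = {"d1": "Floods in Spain", "d2": "The conference in Aarhus and Spain"}
locations, keywords = build_indexes(docs)
```

Keeping toponyms apart from the other terms lets a later step match geographical evidence separately from topical evidence.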
6
SINAI-GIR System overview: Query Analyzer
–Query preprocessing: stemming, stopword removal, removal of irrelevant information
–The toponyms are extracted (NER)
–Spatial relations finder based on manual rules
–Query reformulation based on POS tagging and the query parsing subtask
–Geo-expansion using a gazetteer
–Keywords/hyponyms detection
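A rough sketch of two of the steps listed above, the rule-based spatial relations finder and the gazetteer-based geo-expansion. The rule phrases, the gazetteer entries, and both function names are hypothetical; the slides do not give the actual rules or the GeoNames lookups used.

```python
import re

# Hypothetical rule phrases and gazetteer data, for illustration only.
SPATIAL_PATTERNS = ["north of", "south of", "near", "in"]
TOY_GAZETTEER = {"andalusia": ["jaen", "granada", "seville"]}

def find_spatial_relations(query):
    """Return (relation, toponym) pairs where a rule phrase directly precedes a known toponym."""
    found = []
    text = query.lower()
    for relation in SPATIAL_PATTERNS:
        for place in TOY_GAZETTEER:
            pattern = rf"\b{re.escape(relation)} {re.escape(place)}\b"
            if re.search(pattern, text):
                found.append((relation, place))
    return found

def geo_expand(query):
    """Append the places that the toy gazetteer lists under each toponym found in the query."""
    extra = []
    for _, place in find_spatial_relations(query):
        extra.extend(TOY_GAZETTEER[place])
    return query if not extra else query + " " + " ".join(extra)
```

For example, `geo_expand("Wine production in Andalusia")` appends the three places listed under the toy entry for Andalusia, while a query without a known toponym is returned unchanged.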
7
SINAI-GIR System overview: IR Subsystem
–Lemur as the index and search engine
–Okapi with PRF (pseudo-relevance feedback) as the weighting function
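For readers unfamiliar with Okapi weighting, the sketch below scores one document with a common textbook form of BM25. It is not Lemur's implementation: the parameter values `k1=1.2` and `b=0.75` are conventional defaults assumed here, the function name is hypothetical, and the pseudo-relevance feedback step is omitted.

```python
import math

def bm25_score(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against the query using a textbook BM25 form."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0:
            continue  # term absent from the collection
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        tf = doc_tokens.count(term)  # term frequency in this document
        length_norm = k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * tf * (k1 + 1) / (tf + length_norm)
    return score

corpus = [["flood", "spain"], ["wine", "spain"], ["flood", "flood", "denmark"]]
scores = [bm25_score(["flood"], doc, corpus) for doc in corpus]
```

In this toy corpus the third document mentions the query term twice and scores highest, and the second document, which lacks the term, scores zero.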
8
SINAI-GIR System overview: Validator
–Filters the list of documents recovered by the IR subsystem, applying different manual rules and using the geographical data detected in the query
–Re-ranks the documents using predefined weights for each rule and the keywords/hyponyms detected in the query
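The filter-then-re-rank behaviour described above could look roughly like this. The single rule used here (keep a document only if it mentions a query location) and the weight values are assumptions made for illustration; the slides do not list the actual manual rules or their weights.

```python
def validate(retrieved, query_locations, query_keywords,
             location_weight=0.5, keyword_weight=0.1):
    """Filter and re-rank (doc_id, score, terms) triples; the rule and weights are hypothetical."""
    reranked = []
    for doc_id, score, doc_terms in retrieved:
        matched_locations = query_locations & doc_terms
        if not matched_locations:
            continue  # filtering step: no query location found in the document
        bonus = location_weight * len(matched_locations)
        bonus += keyword_weight * len(query_keywords & doc_terms)
        reranked.append((doc_id, score + bonus))
    # Re-ranking step: highest adjusted score first.
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)

retrieved = [
    ("d1", 2.0, {"flood", "denmark"}),
    ("d2", 1.8, {"flood", "spain", "damage"}),
    ("d3", 1.0, {"spain"}),
]
result = validate(retrieved, {"spain"}, {"flood", "damage"})
```

Here `d1` is dropped despite having the best retrieval score, which illustrates how a filter of this kind can discard documents the baseline would have returned.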
9
Experiments description
SINAI has participated in the monolingual and bilingual tasks with a total of 15 experiments:
–MONO-EN: 9 experiments
–BILI-X2EN: 6 experiments
Combining the content of topic labels: TD (title and description) or TDN (title, description and narrative)
Baseline: Q1 without applying any filtering or re-ranking process
Other experiments:
–Filtering and re-ranking of the fusion list of the documents recovered by Q1, Q2 and Q3
–Using keywords and/or hyponyms in the re-ranking process
10
MONO-EN results
[Results figure; the baseline run is labelled]
–Best result: baseline (no filtering and no re-ranking)
–In some filtering experiments the use of keywords improves the results
–Best results using only the TD topic labels
11
BILI-X2EN results
[Results figure; the baseline run is labelled]
–Best result: baseline (no filtering and no re-ranking) with Portuguese topics
–Best results using only the TD topic labels
12
Conclusions
–The baseline experiment seems to work well because we include the geo-information in the retrieval process
–The filtering of documents does not seem to work well because we include the geo-information in the query and we are re-ranking documents which may not be relevant with respect to their content
–The use of keywords for re-ranking the retrieved documents could be interesting because in some experiments it improves on the results obtained without them
–Query reformulation could also be interesting because for some topics it retrieves valid documents which are not retrieved with the default query
13
TextMESS at GeoCLEF 2008
–Spanish TextMESS project (Intelligent, Interactive and Multilingual Text Mining based on Human Language Technologies): joint participation by the Polytechnic University of Valencia and the University of Jaén (SINAI)
–Method employed: merging algorithm based on a fuzzy Borda voting scheme, taking as input the two document lists returned by both systems
–Second best result in the monolingual English task
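To give an idea of how a Borda-style merge of two result lists works, here is a sketch of the classic (crisp) Borda count. It is a simplified stand-in, not the fuzzy Borda scheme used in the joint runs, whose details are not given in these slides; the function name and the tie-breaking by document id are assumptions.

```python
def borda_merge(ranked_lists):
    """Merge ranked lists of doc ids with the classic Borda count (not the fuzzy variant)."""
    points = {}
    for ranking in ranked_lists:
        n = len(ranking)
        for position, doc_id in enumerate(ranking):
            # The top document in a list of n gets n points, the last gets 1.
            points[doc_id] = points.get(doc_id, 0) + (n - position)
    # Highest total first; ties broken alphabetically by doc id.
    return sorted(points, key=lambda d: (-points[d], d))

merged = borda_merge([["a", "b", "c"], ["b", "c", "d"]])
```

Document `b` ends up first because it ranks highly in both input lists, even though it tops only one of them.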
14
Thank you
sinai.ujaen.es
15
References
–García-Vega, Manuel and García-Cumbreras, Miguel A. and Ureña-López, L. Alfonso and Perea-Ortega, José M. GEOUJA System. The first participation of the University of Jaén at GEOCLEF 2006. In LNCS, volume 4730, pages 913-917. Springer-Verlag, 2007.
–Perea-Ortega, José M. and García-Cumbreras, Miguel A. and García-Vega, Manuel and Montejo-Ráez, Arturo. GEOUJA System. University of Jaén at GEOCLEF 2007. In Proceedings of the Cross Language Evaluation Forum (CLEF 2007), page 52, 2007.
–García-Cumbreras, Miguel A. and Ureña-López, L. Alfonso and Martínez-Santiago, Fernando and Perea-Ortega, José M. BRUJA System. The University of Jaén at the Spanish task of QA@CLEF 2006. In LNCS, volume 4730, pages 328-338. Springer-Verlag, 2007.
http://sinai.ujaen.es