Cheshire at GeoCLEF 2008: Text and Fusion Approaches for GIR
Ray R. Larson, School of Information, University of California, Berkeley
Motivation
In previous GeoCLEF evaluations we found very mixed results using various methods of query expansion, attempts at explicit geographic constraints, etc. Last year we decided to try just our "basic" retrieval method, i.e., logistic regression with blind feedback. The goal was to establish baseline data that we can use to test selective additions in later experiments.
GeoCLEF Aarhus
Motivation
Because the "baselines" worked well last year, we decided to continue with them and begin testing "fusion" approaches for combining the results of different retrieval algorithms. This was due in part to Neuchâtel's use of fusion approaches with good results, and to our own use of fusion approaches in earlier CLEF tasks.
Experiments
- TD, TDN, and TDN Fusion for Monolingual English, German, and Portuguese (9 runs)
- TD, TDN, and TDN Fusion for Bilingual X to English, German, and Portuguese (18 runs)
Monolingual
Monolingual Results

Run Name        Task                    Characteristics   MAP
BERKGCMODETD    Monolingual German      TD auto           *
BERKGCMODETDN   Monolingual German      TDN auto          0.205
BERKMODETDNPIV  Monolingual German      TDN auto fusion   0.2292
BERKGCMOENTD    Monolingual English     TD auto           0.2652
BERKGCMOENTDN   Monolingual English     TDN auto          0.2001
BERKMOENTDNPIV  Monolingual English     TDN auto fusion   *
BERKGCMOPTTD    Monolingual Portuguese  TD auto           0.217
BERKGCMOPTTDN   Monolingual Portuguese  TDN auto          0.1741
BERKMOPTTDNPIV  Monolingual Portuguese  TDN auto fusion   *
Bilingual
TDN Fusion
- A: TD logistic regression with blind feedback result
- B: TDN Okapi BM-25 result
- A and B normalized using MinMax to [0, 1]
- NewWt = (B * piv) + (A * (1 - piv)), with piv = 0.29
- Final result: documents ranked by NewWt
GeoCLEF Aarhus
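The fusion step above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the Cheshire implementation: the function names and the {doc_id: score} dictionaries are assumptions, and documents missing from one result list are assumed to score 0 there.

```python
def minmax_normalize(scores):
    """Rescale a {doc_id: score} mapping to the [0, 1] range (MinMax)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {doc: (s - lo) / span for doc, s in scores.items()}

def fuse(lr_scores, bm25_scores, piv=0.29):
    """Combine two result lists with the pivot formula:
    NewWt = (B * piv) + (A * (1 - piv)).

    A: TD logistic regression with blind feedback scores.
    B: TDN Okapi BM-25 scores.
    """
    a = minmax_normalize(lr_scores)
    b = minmax_normalize(bm25_scores)
    docs = set(a) | set(b)  # assumption: a doc absent from one list scores 0
    fused = {d: b.get(d, 0.0) * piv + a.get(d, 0.0) * (1 - piv) for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

With piv = 0.29 the TD logistic-regression result dominates (weight 0.71), and the TDN BM-25 result contributes the remaining 0.29 of each fused score.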
Results
Fusion of logistic regression with blind feedback and Okapi BM-25 produced most of our best-performing runs, though the improvement was not always dramatic. With a single algorithm, use of the Narrative is counter-productive; using only the Title and Description gives better results with these algorithms. Does blind feedback accomplish some of the geographic expansion made explicit in the narrative?
Comparison of Berkeley Results, 2006-2008

Task                             MAP 2006  MAP 2007  MAP 2008  Pct. Diff '07-'08
Monolingual English              0.250     0.264     0.268*     1.493
Monolingual German               0.215     0.139     0.230     39.565
Monolingual Portuguese           0.162     0.174     0.231*    24.675
Bilingual English -> German      0.156     0.090     0.225*    60.000
Bilingual English -> Portuguese  0.126     0.201     0.207*     2.899

* using fusion
What happened in 2007 German?
We speculated last year that it was:
- No decompounding? 2006 used Aitao Chen's decompounding. (no)
- Worse translation? Possibly; different MT systems were used, but the same one was used for 2007 and 2008, so no.
- Incomplete stoplist? Was it really the same? (yes)
- Was stemming the same? (yes)
Why did German work better for us in 2008?
That was all speculation, but... it REALLY helps if you include the entire database: our 2007 German runs did not include any documents from the SDA collection!
What Next?
- Finally start adding back true geographic processing, and test where, why (and if) results are improved
- Get decompounding working with German