Robust Task - Result Overview and Lessons Learned from Robustness Evaluation
Thomas Mandl, Information Science, Universität Hildesheim
8th Workshop of the Cross-Language Evaluation Forum (CLEF), Budapest, 19 September 2007
Robust?
Robustness, Metaphorically
A robust tool works under a variety of conditions.
Robustness?
"Robust … means … capable of functioning correctly (or, at the very minimum, not failing catastrophically) under a great many conditions."
Robust IR means the capability of an IR system to work well (and to reach at least a minimal performance) under a variety of conditions (topics, difficulty, collections, users, languages …).
Variety of Conditions …
Variance between topics
System Variance
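To make the contrast on these two slides concrete, here is a minimal sketch (with entirely hypothetical AP values) comparing how much per-topic means and per-system means spread; in typical evaluations the variance between topics dwarfs the variance between systems.

```python
from statistics import mean, pvariance

# Hypothetical average-precision (AP) matrix: rows = systems, columns = topics.
ap = [
    [0.40, 0.05, 0.60, 0.30],  # system A
    [0.35, 0.10, 0.55, 0.25],  # system B
    [0.45, 0.02, 0.65, 0.20],  # system C
]

topic_means = [mean(col) for col in zip(*ap)]   # how hard each topic is
system_means = [mean(row) for row in ap]        # how good each system is

print("variance across topics: ", pvariance(topic_means))   # ~0.040
print("variance across systems:", pvariance(system_means))  # ~0.0001
```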
History of Robust IR Evaluation
– TREC: mono-lingual retrieval
– CLEF: mono-, bi- and multilingual retrieval
  – 2006: six languages
  – 2007: three languages
Robust Task 2007
Again …
– use topics and relevance assessments from previous CLEF campaigns
– take a different perspective and use a robust evaluation measure (GMAP)
– emphasize the difficult (= low-performing) topics
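GMAP, the geometric mean of per-topic average precision, is the robust measure meant here: because a geometric mean is pulled down sharply by near-zero scores, it rewards improvements on difficult topics much more than MAP does. A minimal sketch, assuming per-topic AP scores are already available and clamping zeros to a small epsilon (a common convention, e.g. in trec_eval):

```python
import math

def mean_average_precision(ap_scores):
    """MAP: arithmetic mean of per-topic average precision."""
    return sum(ap_scores) / len(ap_scores)

def geometric_map(ap_scores, epsilon=1e-5):
    """GMAP: geometric mean of per-topic average precision.

    Zero scores are clamped to a small epsilon so that a single
    failed topic does not drive the whole product to zero.
    """
    logs = [math.log(max(ap, epsilon)) for ap in ap_scores]
    return math.exp(sum(logs) / len(logs))

# One near-failing topic barely moves MAP but drags GMAP down sharply.
aps = [0.45, 0.38, 0.52, 0.01]
print(mean_average_precision(aps))  # ~0.34
print(geometric_map(aps))           # ~0.17
```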
Training and Test
– CLEF 2001, 2002 and 2003 for training
– CLEF 2004, 2005 and 2006 for testing
Collections

Language     Target Collection                  Training Topics   Test Topics
English      Los Angeles Times                  …                 …
French       Le Monde 1994; Swiss News Agency   …                 …
Portuguese   Público                            …                 …
Robust Task 2007
– 3 languages (collections and topics)
– 3 mono-lingual tasks
– 1 bi-lingual task (English to French)
– some 300,000 documents, about 1 gigabyte of text
Participation
– 63 runs submitted by 7 groups
– 2006: 133 runs by 8 groups
Results
Results: Mono-lingual English
Results: Mono-lingual Portuguese
Results
Results: Mono-lingual French
Results: Bi-lingual X -> French
Approaches
Adoption of traditional and "advanced" CLIR methods:
– BM25 (Miracle)
– n-gram translation (CoLesIR)
– … (Uni NE)
Adoption of "robust" heuristics:
– expansion with an external resource (SINAI)
Percentage of Bad Topics
[Table: percentage of topics with an MAP below 0.1, for the best system and the average, per task: Mono PT, Mono EN, Mono FR, Bi -> FR]
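The statistic behind this slide is easy to reproduce from a run's per-topic scores; a minimal sketch, with hypothetical topic IDs and AP values:

```python
def bad_topic_rate(ap_by_topic, threshold=0.1):
    """Fraction of topics whose average precision falls below the threshold."""
    bad = sum(1 for ap in ap_by_topic.values() if ap < threshold)
    return bad / len(ap_by_topic)

# Hypothetical per-topic AP scores for one run.
run = {"C041": 0.32, "C042": 0.05, "C043": 0.00, "C044": 0.61}
print(f"{bad_topic_rate(run):.0%} of topics below 0.1 AP")  # 50%
```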
Topics
– Large improvements are still possible.
– Difficult topics can be solved better.
[Table: Task | Topic | Average | Best System | System No. 1, with rows for Mono PT, Mono EN, Mono FR, Bi -> FR]
Correlation between Measures?
– IR measures often correlate highly.
– For a larger topic set, as used in the Robust Task, the correlation might be even higher: more topics make a test more reliable.
– If the correlation is high, it makes no sense to use alternative measures.
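One way to check this empirically is to rank the submitted runs by each measure and compute a rank correlation; a minimal sketch of Kendall's tau (no tie handling), over hypothetical MAP and GMAP scores:

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall's tau between two score lists over the same systems."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if s > 0:
            concordant += 1   # pair ranked the same way by both measures
        elif s < 0:
            discordant += 1   # pair ranked in opposite order
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical per-system MAP and GMAP values.
map_scores = [0.41, 0.38, 0.35, 0.29, 0.22]
gmap_scores = [0.19, 0.21, 0.14, 0.12, 0.08]
print(kendall_tau(map_scores, gmap_scores))  # 0.8
```

A tau close to 1 would suggest GMAP adds little information beyond MAP; the reduced-topic-set analyses on the following slides probe exactly this question.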
Analysis with Reduced Topic Sets: Mono-lingual English
Analysis with Reduced Topic Sets: Bi-lingual -> FR
Analysis with Reduced Topic Sets: Mono-lingual Portuguese
Analysis with Reduced Topic Sets: Mono-lingual French
Analysis with Reduced Topic Sets: Multi-lingual 2006
Robust 2006: MAP vs. GMAP
Thanks for your Attention