Examining the Robustness of Evaluation Metrics for Patent Retrieval with Incomplete Relevance Judgements
Walid Magdy, Gareth Jones
Dublin City University
CLEF, 21 Sep 2010
Patent Retrieval
- Search: a collection of patents is searched for the relevant ones
- Objective: find all possible relevant documents
- Search sessions: take much longer
- Users: professionals, and more patient
- IR campaigns: NTCIR, TREC, CLEF
- Evaluation:
  - MAP: focuses on finding relevant documents earlier
  - Recall: focuses on finding more relevant documents
  - PRES: focuses on finding more relevant documents at relatively good ranks
W. Magdy and G. Jones. PRES: a score metric for evaluating recall-oriented information retrieval applications. SIGIR 2010
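The three metrics can be sketched in Python for a single topic. This is a minimal sketch, not PRESeval: the function names (average_precision, recall_at, pres) are hypothetical, rankings are assumed to contain no duplicate document IDs, and PRES follows our reading of the cited SIGIR 2010 definition, PRES = 1 - (mean rank of relevant docs - (n+1)/2) / N_max, where relevant documents not found in the top N_max are assumed to sit at ranks just after N_max. Use PRESeval for official scores.

```python
def average_precision(ranking, relevant):
    """AP for one topic: sum of precision@k at each rank k holding a relevant doc,
    divided by the total number of relevant docs (unretrieved ones contribute 0)."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def recall_at(ranking, relevant, cutoff):
    """Fraction of the relevant docs that appear in the first `cutoff` results."""
    found = sum(1 for doc in ranking[:cutoff] if doc in relevant)
    return found / len(relevant) if relevant else 0.0


def pres(ranking, relevant, n_max):
    """Sketch of PRES (assumed definition, see lead-in):
    1 - (mean rank of relevant docs - (n + 1) / 2) / n_max."""
    n = len(relevant)
    if n == 0:
        return 0.0
    # Ranks of the relevant docs found within the first n_max results.
    found = [k for k, doc in enumerate(ranking[:n_max], start=1) if doc in relevant]
    n_found = len(found)
    # Assumption: the missing relevant docs occupy ranks n_max + n_found + 1 .. n_max + n.
    missing = [n_max + i for i in range(n_found + 1, n + 1)]
    mean_rank = (sum(found) + sum(missing)) / n
    return 1.0 - (mean_rank - (n + 1) / 2) / n_max
```

MAP is then the mean of average_precision over all topics. For the ranking [d1, d2, d3, d4] with d1 and d3 relevant and N_max = 4, AP is about 0.833 and PRES is 0.875; a run that retrieves no relevant document in its top N_max gets PRES = 0, and a perfect ranking gets PRES = 1. The recall cutoff is left as a parameter because the slides do not state it.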
What's up?
- Missing a relevant document in patent search is harmful
- What about missing it in the relevance judgements?
- How will evaluation metrics be affected?
- Are the metrics robust in evaluating systems?
Bompada et al. On the robustness of relevance measures with incomplete judgements. SIGIR 2007
Data Used
- CLEF-IP 2009 qrels for 400 topics
- Average number of relevant documents per topic: 6
- 48 runs submitted by 15 participants
- Runs ranked according to MAP, recall, and PRES
Experimental Setup
- Create versions of incomplete judgements (20%, 40%, 60%, 80% of the qrels)
- Re-compute scores with the new judgements
- Re-rank runs according to the new scores
- Monitor the change in ranking
- Measure the correlation between rankings using Kendall's tau
- The higher the correlation, the more robust the metric
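The experimental setup can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' code: the helpers (subsample_qrels, kendall_tau, robustness) are hypothetical; the reduced qrels are assumed to keep a random fraction of each topic's relevant documents (at least one per topic); a run's score is its mean per-topic metric; and Kendall's tau is computed as tau-a on the run scores, which equals the correlation between the two induced rankings when there are no ties.

```python
import random
from itertools import combinations


def subsample_qrels(qrels, fraction, seed=0):
    """Assumption: keep a random `fraction` of each topic's relevant docs, at least one."""
    rng = random.Random(seed)
    reduced = {}
    for topic, rel in qrels.items():
        docs = sorted(rel)  # sort so sampling is reproducible for a given seed
        k = max(1, round(len(docs) * fraction))
        reduced[topic] = set(rng.sample(docs, k))
    return reduced


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists over the same runs (needs >= 2 runs).
    Pairs tied in either list count as neither concordant nor discordant."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(scores_a) * (len(scores_a) - 1) / 2
    return (concordant - discordant) / n_pairs


def robustness(runs, qrels, metric, fraction, seed=0):
    """Correlate run scores under full vs. reduced qrels.
    runs: {run_id: {topic: ranked doc list}}; metric(ranking, relevant) -> float."""
    reduced = subsample_qrels(qrels, fraction, seed)

    def mean_score(run, judgements):
        return sum(metric(run.get(t, []), judgements[t]) for t in judgements) / len(judgements)

    run_ids = sorted(runs)
    full = [mean_score(runs[r], qrels) for r in run_ids]
    partial = [mean_score(runs[r], reduced) for r in run_ids]
    return kendall_tau(full, partial)
```

Repeating robustness for each fraction (0.2, 0.4, 0.6, 0.8), each metric, and several seeds gives one correlation curve per metric; a tau below 0.8 would indicate a noticeable change in run ranking (Voorhees, SIGIR 2001).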
Results
- Kendall's tau > 0.9: nearly equivalent rankings
- Kendall's tau < 0.8: noticeable change in ranking
Voorhees, E. M. Evaluation by highly relevant documents. SIGIR 2001
Conclusion
- MAP is not a robust score for evaluating patent search when relevance judgements are incomplete
- PRES and recall are more robust
Recommendation
Based on metric robustness and performance for patent search evaluation:
- Stop using MAP
  - does not reflect system recall
  - not robust with incomplete judgements
- Start using PRES
  - reflects system recall and quality of ranking
  - highly robust with incomplete judgements
Get PRESeval from: www.computing.dcu.ie/~wmagdy/PRES.htm
Thank you
[Figure: Number of relevant documents per topic]