PRES: A Score Metric for Evaluating Recall-Oriented IR Applications
Walid Magdy, Gareth Jones
Dublin City University
SIGIR, 22 July 2010

Recall-Oriented IR
Examples: patent search and legal search
Objective: find all possible relevant documents
Search: takes much longer
Users: professionals and more patient
IR Campaigns: NTCIR, TREC, CLEF
Evaluation: mainly MAP!!!

Current Evaluation Metrics
For a topic with 4 relevant docs, where the first 100 docs are to be checked:
System1: relevant ranks = {1}
System2: relevant ranks = {50, 51, 53, 54}
System3: relevant ranks = {1, 2, 3, 4}
System4: relevant ranks = {1, 98, 99, 100}

          AP      R       F1      F4
System1   0.25    0.25    0.25    0.25
System2           1
System3   1       1       1       1
System4           1               0.864

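The scores for the four example systems can be recomputed with a short sketch. It assumes that AP is normalized by the total number of relevant docs (4), and that F_beta combines AP with recall rather than set precision with recall; that second assumption is not stated on the slide, but it reproduces the F4 = 0.864 shown for System4. The function names are illustrative only.

```python
def average_precision(ranks, n_relevant):
    """AP from the 1-based ranks of the relevant docs that were retrieved."""
    found = sorted(ranks)
    # Precision at the i-th relevant doc is i / rank; missing docs contribute 0.
    return sum((i + 1) / r for i, r in enumerate(found)) / n_relevant


def recall(ranks, n_relevant):
    """Fraction of the relevant docs that were retrieved."""
    return len(ranks) / n_relevant


def f_beta(p, r, beta):
    """Weighted harmonic mean of p and r (assumption: p is AP, r is recall)."""
    if p == 0 and r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


N_RELEVANT = 4
SYSTEMS = {
    "System1": [1],
    "System2": [50, 51, 53, 54],
    "System3": [1, 2, 3, 4],
    "System4": [1, 98, 99, 100],
}

if __name__ == "__main__":
    for name, ranks in SYSTEMS.items():
        ap = average_precision(ranks, N_RELEVANT)
        r = recall(ranks, N_RELEVANT)
        print(name, round(ap, 3), round(r, 3),
              round(f_beta(ap, r, 1), 3), round(f_beta(ap, r, 4), 3))
```

With these definitions System2 and System4 both reach recall 1 even though System2 finds every relevant doc about halfway down the list and System4 finds three of them only at the very end, which is the behaviour the slide is pointing at.
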
Normalized Recall (R_norm)
R_norm is the area between the actual case and the worst case, as a proportion of the area between the best case and the worst case.
N: collection size
n: number of relevant docs
r_i: the rank at which the i-th relevant document is retrieved

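Written out with the symbols defined on this slide, the area ratio takes the standard closed form for normalized recall. This is the textbook expression, given here as a reading aid rather than copied from the slide:

```latex
R_{norm} = 1 - \frac{\sum_{i=1}^{n} r_i - \sum_{i=1}^{n} i}{n\,(N - n)}
% Best case:  r_i = i          =>  R_norm = 1
% Worst case: r_i = N - n + i  =>  numerator = n (N - n)  =>  R_norm = 0
```
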
Applicability of R_norm
R_norm requires the following:
1. Known collection size (N)
2. Number of relevant documents (qrels) (n)
3. Retrieving documents until reaching 100% recall (r_i)
Workaround:
– Un-retrieved relevant docs are considered as the worst case
– For a large-scale document collection: R_norm ≈ Recall

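A minimal sketch of the workaround, assuming the standard closed form R_norm = 1 - (sum of r_i - sum of i) / (n (N - n)) and placing every un-retrieved relevant doc at the very end of the collection. The function name and the collection size of one million are illustrative choices, not values from the slides.

```python
def normalized_recall(ranks, n_relevant, collection_size):
    """R_norm with un-retrieved relevant docs placed at the worst-case ranks."""
    n, big_n = n_relevant, collection_size
    found = sorted(ranks)
    missing = n - len(found)
    # Worst case: the missing relevant docs occupy the last ranks of the collection.
    all_ranks = found + list(range(big_n - missing + 1, big_n + 1))
    ideal = n * (n + 1) // 2  # sum of ranks 1..n, i.e. the best case
    return 1 - (sum(all_ranks) - ideal) / (n * (big_n - n))


if __name__ == "__main__":
    N = 1_000_000  # hypothetical large collection
    for ranks in ([1], [50, 51, 53, 54], [1, 2, 3, 4], [1, 98, 99, 100]):
        print(ranks, round(normalized_recall(ranks, 4, N), 4))
```

For a collection this large, the three systems with full recall all score within a rounding error of 1 and the system that finds one of four relevant docs scores 0.25, so the ranking information is lost and R_norm collapses to recall.
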
R_norm Modification
PRES: Patent Retrieval Evaluation Score
N_worst_case = N_max + n
For recall = 1: n/N_max ≤ PRES ≤ 1
For recall = R: nR^2/N_max ≤ PRES ≤ R
[Figure labels: PRES, R_norm | M]

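A sketch of the score these bounds refer to, assuming the definition published by the authors: R_norm with the collection size N replaced by N_worst_case = N_max + n, so that relevant docs missing from the top N_max are ranked after position N_max. The shorthand nR for the number of relevant docs found is introduced here for illustration.

```latex
\mathrm{PRES} = 1 - \frac{\dfrac{\sum_{i=1}^{n} r_i}{n} - \dfrac{n+1}{2}}{N_{max}},
\qquad
\sum_{i=1}^{n} r_i = \sum_{i=1}^{nR} r_i + \sum_{i=nR+1}^{n} \left( N_{max} + i \right)

% Upper bound at recall R: the nR found docs sit at ranks 1..nR, so
%   \sum r_i = n(n+1)/2 + n(1-R) N_max   =>   PRES = 1 - (1-R) = R
% Lower bound at recall R: the nR found docs sit at ranks N_max-nR+1..N_max, so
%   \sum r_i / n - (n+1)/2 = N_max - nR^2   =>   PRES = nR^2 / N_max
```

Setting R = 1 in the two cases recovers the first pair of bounds, n/N_max and 1.
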
PRES Performance
For a topic with 4 relevant docs, where the first 100 docs are to be checked (n = 4, N_max = 100):
System1: relevant ranks = {1}
System2: relevant ranks = {50, 51, 53, 54}
System3: relevant ranks = {1, 2, 3, 4}
System4: relevant ranks = {1, 98, 99, 100}

          AP      R/R_norm   F4      PRES
System1   0.25
System2
System3   1       1          1       1
System4

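The PRES values for this example can be recomputed with a minimal sketch. It assumes the published definition PRES = 1 - (sum of r_i / n - (n + 1) / 2) / N_max, with each relevant doc not found in the top N_max assigned rank N_max + i (i being its index among the n relevant docs). The function name is hypothetical and this is not the PRESeval tool.

```python
def pres(ranks, n_relevant, n_max):
    """Patent Retrieval Evaluation Score for one topic (illustrative sketch)."""
    n = n_relevant
    # Only relevant docs found within the first n_max results count as retrieved.
    found = sorted(r for r in ranks if r <= n_max)
    # Assumption: the i-th relevant doc, if missing, is ranked at n_max + i.
    missing = [n_max + i for i in range(len(found) + 1, n + 1)]
    mean_rank = (sum(found) + sum(missing)) / n
    return 1 - (mean_rank - (n + 1) / 2) / n_max


if __name__ == "__main__":
    systems = {
        "System1": [1],
        "System2": [50, 51, 53, 54],
        "System3": [1, 2, 3, 4],
        "System4": [1, 98, 99, 100],
    }
    for name, ranks in systems.items():
        print(name, round(pres(ranks, n_relevant=4, n_max=100), 3))
```

Under these assumptions the four systems score 0.25, 0.505, 1.0 and 0.28, so PRES separates System2 from System4 even though both have recall 1, and it still ranks System1 lowest because it misses three of the four relevant docs.
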
Average Performance
48 runs in CLEF-IP 2009, N_max = 1000
PRES vs MAP vs Recall: change in scores, change in ranking
[Table: Run ID, MAP, Recall, PRES]
[Table: correlation between PRES, MAP, and Recall]

PRES
Designed for recall-oriented applications
Gives a higher score to systems achieving higher recall and better average relative ranking
Designed for laboratory testing
Dependent on the user's potential/effort (N_max)
Going to be applied in CLEF-IP 2010
PRESeval
Get PRESeval from:

What should I say next? Let me check.
What should I say?
MAP system: Thank bla
Recall system: bla Thank you
PRES system: bla Thank you bla Thank you