1 Tie-Breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation. Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment. CLEF'10: Conference on Multilingual and Multimodal Information Access Evaluation, September 20-23, Padua, Italy.
2 Outline: 1. Motivation - a tale about two TREC participants; 2. Context - IRS effectiveness evaluation; issue - tie-breaking bias effects; 3. Contribution - reordering strategies; 4. Experiments - impact of the tie-breaking bias; 5. Conclusion and future work.
3 Outline: 1. Motivation - a tale about two TREC participants; 2. Context - IRS effectiveness evaluation; issue - tie-breaking bias effects; 3. Contribution - reordering strategies; 4. Experiments - impact of the tie-breaking bias; 5. Conclusion and future work.
4 A tale about two TREC participants (1/2). Topic 031, "satellite launch contracts", has 5 relevant documents. Chris and Ellen submit runs with one single difference, yet Chris turns out unlucky and Ellen lucky. Why such a huge difference?
5 A tale about two TREC participants (2/2). After 15 days of hard work, the only difference between Chris's and Ellen's runs is the name of one document.
6 Outline: 1. Motivation - a tale about two TREC participants; 2. Context - IRS effectiveness evaluation; issue - tie-breaking bias effects; 3. Contribution - reordering strategies; 4. Experiments - impact of the tie-breaking bias; 5. Conclusion and future work.
7 Measuring the effectiveness of IRSs. User-centered vs. system-focused evaluation [Spärck Jones & Willett, 1997]. Evaluation campaigns: 1958 Cranfield (UK); 1992 TREC, Text Retrieval Conference (USA); 1999 NTCIR, NII Test Collection for IR Systems (Japan); 2001 CLEF, Cross-Language Evaluation Forum (Europe); and others. The "Cranfield" methodology: a task, a test collection (corpus, topics, qrels), and measures (MAP, P@X, ...) computed using trec_eval [Voorhees, 2007].
8 Runs are reordered prior to their evaluation. Qrels = (qid, iter, docno, rel); Run = (qid, iter, docno, rank, sim, run_id). Reordering by trec_eval: qid asc, sim desc, docno desc. The effectiveness measure (MAP, P@X, MRR, ...) is thus f(intrinsic_quality, luck).
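To make the reordering concrete, here is a minimal Python sketch (not the official trec_eval code) of the conventional sort; the run is assumed to be a list of dicts whose keys match the run fields listed above.

```python
# Hedged sketch of trec_eval's conventional reordering: qid ascending,
# sim descending, docno descending. Python's sort is stable, so sorting
# from the least to the most significant key yields the combined order.
# Field names (qid, sim, docno) follow the run format given on this slide.
def conventional_reorder(run):
    run = sorted(run, key=lambda r: r["docno"], reverse=True)  # docno desc (breaks ties on sim)
    run = sorted(run, key=lambda r: r["sim"], reverse=True)    # sim desc
    run = sorted(run, key=lambda r: r["qid"])                  # qid asc
    return run
```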
9 Outline: 1. Motivation - a tale about two TREC participants; 2. Context - IRS effectiveness evaluation; issue - tie-breaking bias effects; 3. Contribution - reordering strategies; 4. Experiments - impact of the tie-breaking bias; 5. Conclusion and future work.
10 Consequences of run reordering. Measures of effectiveness for an IRS s, all sensitive to document rank: RR(s,t) = 1 / rank of the first relevant document for topic t; P(s,t,d) = precision at document d for topic t; AP(s,t) = average precision for topic t; MAP(s) = mean average precision. Tie-breaking bias: is the Wall Street Journal collection more relevant than Associated Press? Problem 1, comparing two systems: AP(s1, t) vs. AP(s2, t). Problem 2, comparing two topics: AP(s, t1) vs. AP(s, t2).
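As a reference for these definitions, a small Python sketch of the per-topic measures follows; ranked_docnos is an already-reordered result list for one topic and relevant is the set of docnos judged relevant (illustrative names, not trec_eval's API).

```python
# Hedged sketch of RR(s,t) and AP(s,t) for a single topic.
def reciprocal_rank(ranked_docnos, relevant):
    for rank, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant:
            return 1.0 / rank          # 1 / rank of the first relevant document
    return 0.0

def average_precision(ranked_docnos, relevant):
    hits, precisions = 0, []
    for rank, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant:
            hits += 1
            precisions.append(hits / rank)   # P(s,t,d) at each relevant document
    return sum(precisions) / len(relevant) if relevant else 0.0

# MAP(s) is the mean of average_precision over all topics of the task.
```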
11 Alternative unbiased reordering strategies. Conventional reordering (TREC): ex aequo (tied) documents sorted from Z to A, i.e. qid asc, sim desc, docno desc. Realistic reordering: relevant documents last, i.e. qid asc, sim desc, rel asc, docno desc. Optimistic reordering: relevant documents first, i.e. qid asc, sim desc, rel desc, docno desc.
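A sketch of the three strategies, extending the stable-sort idea from the previous snippet; each run entry is assumed to also carry its qrels judgment rel (1 = relevant, 0 = not), which is only needed for the realistic and optimistic variants.

```python
# Hedged sketch of the three reorderings. Stable sorts are applied from the
# least to the most significant key, so the final order is qid asc, sim desc,
# then (for realistic/optimistic) rel, then docno desc inside each tie.
def reorder(run, strategy="conventional"):
    run = sorted(run, key=lambda r: r["docno"], reverse=True)      # docno desc
    if strategy == "realistic":
        run = sorted(run, key=lambda r: r["rel"])                  # rel asc: relevant docs last in a tie
    elif strategy == "optimistic":
        run = sorted(run, key=lambda r: r["rel"], reverse=True)    # rel desc: relevant docs first in a tie
    run = sorted(run, key=lambda r: r["sim"], reverse=True)        # sim desc
    run = sorted(run, key=lambda r: r["qid"])                      # qid asc
    return run
```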
12 Outline: 1. Motivation - a tale about two TREC participants; 2. Context - IRS effectiveness evaluation; issue - tie-breaking bias effects; 3. Contribution - reordering strategies; 4. Experiments - impact of the tie-breaking bias; 5. Conclusion and future work.
13 Effect of the tie-breaking bias. Study of 4 TREC tasks (adhoc, routing, filtering, web) over 22 editions between 1993 and 2009: 1,360 runs, 3 GB of data from trec.nist.gov. Assessing the effect of tie-breaking: proportion of document ties (how frequent is the bias?); effect on measure values (observed difference in %, top 3 observed differences); significance of the observed difference with Student's t-test (paired, one-tailed).
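For the significance test mentioned here, one way to reproduce it (an assumption about the exact pairing, not the authors' script) is a paired one-tailed t-test on per-topic AP values under two reorderings, for example with SciPy:

```python
# Hedged sketch: paired, one-tailed Student's t-test on per-topic AP values.
# Requires SciPy >= 1.6 for the 'alternative' argument.
from scipy.stats import ttest_rel

def conventional_exceeds_realistic(ap_conventional, ap_realistic, alpha=0.05):
    # Both lists hold AP values for the same topics, in the same order.
    t_stat, p_value = ttest_rel(ap_conventional, ap_realistic, alternative="greater")
    return p_value < alpha
```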
14 Ties demographics. 89.6% of the runs comprise ties, and ties are present all along the runs.
15 Proportion of tied documents in submitted runs. On average, 25.2% of a result list consists of tied documents, and a tie group contains 10.6 documents on average.
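The two statistics on this slide can be computed by grouping a run's documents on (qid, sim); a minimal sketch, with field names taken from the run format of slide 8:

```python
# Hedged sketch: proportion of tied documents and mean tie-group size in a run.
from collections import Counter

def tie_statistics(run):
    group_sizes = Counter((r["qid"], r["sim"]) for r in run)        # documents sharing a score
    tie_groups = [size for size in group_sizes.values() if size > 1]
    tied_docs = sum(tie_groups)
    return {
        "proportion_tied": tied_docs / len(run) if run else 0.0,                    # ~25.2 % on average in the study
        "mean_tie_group_size": tied_docs / len(tie_groups) if tie_groups else 0.0,  # ~10.6 docs on average
    }
```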
16 Effect on Reciprocal Rank (RR).
17 Effect on Average Precision (AP).
18 Effect on Mean Average Precision (MAP). The difference between system rankings computed on MAP is not significant (Kendall's τ).
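The ranking comparison can be reproduced with Kendall's τ over the systems' MAP values under two reorderings; a sketch using scipy.stats.kendalltau (which pair of strategies was compared is an assumption here):

```python
# Hedged sketch: correlation between the system rankings induced by MAP under
# two reordering strategies. Inputs are lists of MAP values, one per system,
# given in the same system order.
from scipy.stats import kendalltau

def ranking_correlation(map_conventional, map_realistic):
    tau, p_value = kendalltau(map_conventional, map_realistic)
    return tau, p_value
```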
19 What we learnt: beware of tie-breaking for AP. Little effect on MAP, larger effect on AP. Measure bounds: AP_realistic ≤ AP_conventional ≤ AP_optimistic. Useful for failure analysis of the ranking process: the error bar reflects the element of chance and the potential for improvement (example: padre1, adhoc'94).
20 Related works in IR evaluation [Voorhees, 2007].
Topics reliability: [Buckley & Voorhees, 2000] 25 topics; [Voorhees & Buckley, 2002] error rate; [Voorhees, 2009] n collections.
Qrels reliability: [Voorhees, 1998] quality; [Al-Maskari et al., 2008] TREC vs. TREC.
Measures reliability: [Buckley & Voorhees, 2000] MAP; [Sakai, 2008] 'system bias'; [Moffat & Zobel, 2008] new measures; [Raghavan et al., 1989] Precall; [McSherry & Najork, 2008] tied scores.
Pooling reliability: [Zobel, 1998] approximation; [Sanderson & Joho, 2004] manual; [Buckley et al., 2007] size adaptation; [Cabanac et al., 2010] tie-breaking bias.
21 Outline: 1. Motivation - a tale about two TREC participants; 2. Context - IRS effectiveness evaluation; issue - tie-breaking bias effects; 3. Contribution - reordering strategies; 4. Experiments - impact of the tie-breaking bias; 5. Conclusion and future work.
22 Conclusions and future work. Context: IR evaluation, i.e. TREC and other campaigns based on trec_eval. Contributions: the measure value is f(intrinsic_quality, luck) because of the tie-breaking bias; measure bounds (realistic ≤ conventional ≤ optimistic); a study of the tie-breaking bias effect (conventional vs. realistic) for RR, AP and MAP, showing strong correlation yet significant differences, and no difference in system rankings (based on MAP). Future work: study of other / more recent evaluation campaigns; reordering-free measures; finer-grained analyses (finding vs. ranking).
23 Thank you. CLEF'10: Conference on Multilingual and Multimodal Information Access Evaluation, September 20-23, Padua, Italy.
24 'Stuffing' phenomenon (example: gecrd2@adhoc-1993). What is the rationale behind retrieving documents that are non-relevant for the IRS itself (sim = 0)? Does it unduly increase scores? The effect of this issue is minimized with the realistic reordering strategy, where relevant documents are queued at the bottom of a tie group.