
1 Tie-Breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation. Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment. CLEF’10: Conference on Multilingual and Multimodal Information Access Evaluation, September 20-23, Padua, Italy.

2 Outline. 1. Motivation: a tale about two TREC participants. 2. Context: IRS effectiveness evaluation; issue: tie-breaking bias effects. 3. Contribution: reordering strategies. 4. Experiments: impact of the tie-breaking bias. 5. Conclusion and future works.

3 Outline (transition slide introducing Part 1: Motivation).

4 A tale about two TREC participants (1/2). Topic 031, “satellite launch contracts”, with 5 relevant documents. Chris and Ellen submit runs that differ in one single document, yet their scores differ widely: Chris is unlucky, Ellen is lucky. Why such a huge difference?

5 A tale about two TREC participants (2/2). After 15 days of hard work, the only difference between Chris’s and Ellen’s runs is the name of one document.

6 Outline (transition slide introducing Part 2: Context and issue).

7 Measuring the effectiveness of IRSs. User-centered vs. system-focused evaluation [Spärck Jones & Willett, 1997]. Evaluation campaigns: 1958 Cranfield (UK); 1992 TREC, Text Retrieval Conference (USA); 1999 NTCIR, NII Test Collection for IR Systems (Japan); 2001 CLEF, Cross-Language Evaluation Forum (Europe); and others. The “Cranfield” methodology [Voorhees, 2007]: a task, a test collection (corpus, topics, qrels), and measures (MAP, P@X, ...) computed with trec_eval.

8 Runs are reordered prior to their evaluation. Qrels = ⟨qid, iter, docno, rel⟩; Run = ⟨qid, iter, docno, rank, sim, run_id⟩. trec_eval reorders each run by qid asc, sim desc, docno desc before computing MAP, P@X, MRR, etc. As a consequence, an effectiveness measure is f(intrinsic_quality, luck), not a function of intrinsic quality alone.
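The reordering rule can be sketched directly. Below is a minimal Python illustration (my own code, not the trec_eval C source), assuming each run line has been parsed into a dict whose keys follow the run format above.

```python
# Minimal sketch of the conventional reordering applied before scoring:
# qid ascending, sim descending, docno descending. Not the trec_eval source;
# field names follow the run format <qid, iter, docno, rank, sim, run_id>.

def conventional_reorder(run_lines):
    """run_lines: list of dicts with at least 'qid', 'sim', 'docno' keys."""
    # Python's sort is stable, so apply keys from least to most significant.
    lines = sorted(run_lines, key=lambda d: d["docno"], reverse=True)  # docno desc
    lines.sort(key=lambda d: d["sim"], reverse=True)                   # sim desc
    lines.sort(key=lambda d: d["qid"])                                 # qid asc
    return lines
```

Tied documents (equal sim for the same qid) therefore end up ordered by docno alone, which is exactly where the bias enters.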

9 Outline (transition slide introducing Part 3: Contribution).

10 Consequences of run reordering. Measures of effectiveness for an IRS s, all sensitive to document rank: RR(s, t), 1/rank of the first relevant document for topic t; P(s, t, d), precision at document d for topic t; AP(s, t), average precision for topic t; MAP(s), mean average precision. Tie-breaking bias: is the Wall Street Journal collection more relevant than Associated Press? Problem 1, comparing two systems: AP(s1, t) vs. AP(s2, t). Problem 2, comparing two topics: AP(s, t1) vs. AP(s, t2).
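To make the rank sensitivity concrete, here is an illustrative Python sketch of RR and AP computed from a ranked list of binary relevance labels; the helper names are mine, not trec_eval's.

```python
# Rank-based measures computed from a ranked result list of 0/1 relevance
# labels (1 = relevant). Illustrative helpers, not trec_eval code.

def reciprocal_rank(rels):
    """RR(s, t): 1 / rank of the first relevant document (0 if none retrieved)."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def average_precision(rels, num_relevant):
    """AP(s, t): mean of the precisions P(s, t, d) at each relevant document d."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_relevant if num_relevant else 0.0

# Swapping two tied documents changes their ranks, hence AP:
print(average_precision([1, 0, 1, 0, 0], num_relevant=5))  # ~0.333
print(average_precision([0, 1, 1, 0, 0], num_relevant=5))  # ~0.233
```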

11 Alternative unbiased reordering strategies. Conventional reordering (TREC): ex aequo (tied) documents sorted from Z to A, i.e. qid asc, sim desc, docno desc. Realistic reordering: relevant documents last within a tie, i.e. qid asc, sim desc, rel asc, docno desc. Optimistic reordering: relevant documents first within a tie, i.e. qid asc, sim desc, rel desc, docno desc.
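A sketch of how the three strategies could be implemented, assuming each run line carries a 0/1 rel field looked up in the qrels (my own field layout and function name, not the paper's code):

```python
# Three reordering strategies for tied documents. Each run line is assumed to
# be a dict with 'qid', 'sim', 'docno' and a 0/1 'rel' value from the qrels.
# Stable sorts are applied from the least to the most significant key.

def reorder(run_lines, strategy="conventional"):
    lines = sorted(run_lines, key=lambda d: d["docno"], reverse=True)  # docno desc
    if strategy == "realistic":      # relevant documents last within a tie
        lines.sort(key=lambda d: d["rel"])                             # rel asc
    elif strategy == "optimistic":   # relevant documents first within a tie
        lines.sort(key=lambda d: d["rel"], reverse=True)               # rel desc
    lines.sort(key=lambda d: d["sim"], reverse=True)                   # sim desc
    lines.sort(key=lambda d: d["qid"])                                 # qid asc
    return lines
```

Scoring the same run under the realistic and optimistic orderings bounds the score it can obtain under any tie-breaking rule.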

12 Outline (transition slide introducing Part 4: Experiments).

13 Effect of the tie-breaking bias. Study of 4 TREC tasks (adhoc, routing, filtering, web), 22 editions between 1993 and 2009, 1,360 runs; 3 GB of data from trec.nist.gov. Assessing the effect of tie-breaking: proportion of document ties (how frequent is the bias?) and effect on measure values (top 3 observed differences, observed difference in %, significance of the observed difference assessed with a paired, one-tailed Student’s t-test).
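For the significance test, a hedged SciPy sketch (placeholder numbers, not the TREC results; the one-sided alternative argument assumes a reasonably recent SciPy release):

```python
# Paired, one-tailed Student's t-test between per-topic AP values obtained
# under two reordering strategies. The data below are placeholders.
from scipy import stats

ap_conventional = [0.31, 0.12, 0.45, 0.27, 0.08]
ap_realistic    = [0.29, 0.12, 0.40, 0.25, 0.08]

t_stat, p_value = stats.ttest_rel(ap_conventional, ap_realistic,
                                  alternative="greater")
print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.4f}")
```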

14 Tie demographics. 89.6% of the runs contain ties, and ties occur throughout the result lists.

15 Proportion of tied documents in submitted runs. On average, a tied group contains 10.6 documents, and 25.2% of a result list consists of tied documents.
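One way such tie statistics could be computed (my own sketch: documents sharing a (qid, sim) pair form a tied group):

```python
# Tie demographics for one run: mean size of tied groups (2+ documents sharing
# the same qid and sim) and the fraction of the result list inside such groups.
from collections import defaultdict

def tie_stats(run_lines):
    groups = defaultdict(list)
    for line in run_lines:
        groups[(line["qid"], line["sim"])].append(line["docno"])
    tied = [g for g in groups.values() if len(g) > 1]
    avg_group_size = sum(map(len, tied)) / len(tied) if tied else 0.0
    tied_fraction = sum(map(len, tied)) / len(run_lines) if run_lines else 0.0
    return avg_group_size, tied_fraction
```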

16 Effect on Reciprocal Rank (RR).

17 Effect on Average Precision (AP).

18 Effect on Mean Average Precision (MAP). The difference between system rankings computed on MAP is not significant (Kendall’s τ).

19 What we learnt: beware of tie-breaking for AP. Small effect on MAP, larger effect on AP. Measure bounds: AP_realistic ≤ AP_conventional ≤ AP_optimistic. Failure analysis for the ranking process: the error bar represents an element of chance, i.e. potential for improvement (example: padre1, adhoc 1994).

20 Related works in IR evaluation [Voorhees, 2007]. Topic reliability: [Buckley & Voorhees, 2000] 25 topics; [Voorhees & Buckley, 2002] error rate; [Voorhees, 2009] n collections. Qrels reliability: [Voorhees, 1998] quality; [Al-Maskari et al., 2008] TREC vs. non-TREC. Measure reliability: [Buckley & Voorhees, 2000] MAP; [Sakai, 2008] ‘system bias’; [Moffat & Zobel, 2008] new measures; [Raghavan et al., 1989] Precall; [McSherry & Najork, 2008] tied scores. Pooling reliability: [Zobel, 1998] approximation; [Sanderson & Joho, 2004] manual; [Buckley et al., 2007] size adaptation; [Cabanac et al., 2010] tie-breaking bias.

21 Outline (transition slide introducing Part 5: Conclusion and future works).

22 Conclusions and future works. Context: IR evaluation, i.e. TREC and other campaigns based on trec_eval. Contributions: a measure is f(intrinsic_quality, luck) because of the tie-breaking bias; measure bounds (realistic ≤ conventional ≤ optimistic); a study of the tie-breaking bias effect comparing the conventional and realistic strategies for RR, AP and MAP, showing strong correlation yet significant differences, and no difference in system rankings (based on MAP). Future works: study of other or more recent evaluation campaigns; reordering-free measures; finer-grained analyses (finding vs. ranking).

23 Thank you. CLEF’10: Conference on Multilingual and Multimodal Information Access Evaluation, September 20-23, Padua, Italy.

24 The ‘stuffing’ phenomenon (gecrd2@adhoc-1993). What is the rationale behind retrieving documents the IRS itself scores as non-relevant (sim = 0)? An undue score increase? The effect of this issue is minimized by the realistic reordering strategy, which queues relevant documents at the bottom of a tied group.

