Published by Jaroslava Urbanová. Modified over 5 years ago.
1
Cumulated Gain-Based Evaluation of IR Techniques
Liu bingbing
2
Motivation
There are many different kinds of IR techniques, but which one is better?
How should these techniques be evaluated?
3
Outline
Introduction
Cumulated gain-based measurements
Case study: comparison of some TREC-7 results at different relevance levels
Discussion
5
Background
Highly relevant documents should be identified and ranked first
It’s necessary to develop measures to evaluate different IR techniques
6
Old measures
Highly and marginally relevant documents are given equal credit
IR documents are judged relevant or irrelevant
Graded relevance judgments
7
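New measures
CG (cumulated gain)
DCG (discounted cumulated gain)
nCG (normalized cumulated gain)
nDCG (normalized discounted cumulated gain)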
8
Outline
Introduction
Cumulated gain-based measurements
Case study: comparison of some TREC-7 results at different relevance levels
Discussion
9
Principles
Highly relevant documents are more important than marginally relevant ones
Documents found late are less important
10
Relationship
[Diagram: G, CG, DCG, BV, n(D)CG]
11
Direct Cumulated Gain (CG)
For example:
G' = <3, 2, 3, 0, 0, 1, 2, 2, 3, 0, ...>
CG' = <3, 5, 8, 8, 8, 9, 11, 13, 16, 16, ...>
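The formula on this slide did not survive extraction. The example is consistent with the usual recursive definition, CG[1] = G[1] and CG[i] = CG[i-1] + G[i] for i > 1. A minimal Python sketch under that assumption (the function name `cumulated_gain` is ours, not from the slides):

```python
from itertools import accumulate

def cumulated_gain(gains):
    """Return the CG vector: element i is the sum of the gains at ranks 1..i."""
    return list(accumulate(gains))

# Gain vector from the slide's example (first ten ranks only).
G = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
print(cumulated_gain(G))  # [3, 5, 8, 8, 8, 9, 11, 13, 16, 16]
```

The printed vector matches the CG values given in the example.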
12
Discounted Cumulated Gain (DCG)
For example:
G' = <3, 2, 3, 0, 0, 1, 2, 2, 3, 0, ...>
DCG' = <3, 5, 6.89, 6.89, 6.89, 7.28, 7.99, 8.66, 9.61, 9.61, ...>
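The DCG formula is also missing from the extracted slide. The example values are reproduced by the following rule, which we assume is the one intended: ranks below the logarithm base b are not discounted, and from rank b onward each gain is divided by log_b(rank) before it is added. A sketch with b = 2 (the function name and the `base` parameter name are our own):

```python
import math

def discounted_cumulated_gain(gains, base=2):
    """Return the DCG vector for a gain vector, with ranks starting at 1.

    Gains at ranks below `base` are added undiscounted; from rank `base`
    onward each gain is divided by log_base(rank) before being added.
    """
    dcg = []
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        if rank < base:
            total += gain
        else:
            total += gain / math.log(rank, base)
        dcg.append(total)
    return dcg

G = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
print([round(x, 2) for x in discounted_cumulated_gain(G)])
# [3.0, 5.0, 6.89, 6.89, 6.89, 7.28, 7.99, 8.66, 9.61, 9.61]
```

Leaving rank 1 undiscounted also avoids dividing by log(1) = 0.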
13
Best possible Vectors Theoretically
14
A sample ideal gain vector (BV)
Cumulated and discounted cumulated gain of the ideal vector (base = 2):
CG' = <3, 6, 9, 11, 13, 15, 16, 17, 18, 19, 19, 19, 19, ...>
DCG' = <3, 6, 7.89, 8.89, 9.75, 10.52, 10.88, 11.21, 11.53, 11.83, 11.83, 11.83, ...>
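The slide lists only the cumulated values, not the ideal gain vector itself. Differencing the CG values recovers it, and the result is what one gets by sorting the gains of all relevant documents in descending order and padding with zeros. A sketch of both steps; the helper names and the sample `recall_base` list are hypothetical, with the counts per gain level inferred from the CG values rather than stated on the slide:

```python
def gains_from_cumulated(cg):
    """Recover a gain vector from its CG vector by differencing."""
    if not cg:
        return []
    return [cg[0]] + [b - a for a, b in zip(cg, cg[1:])]

def ideal_gain_vector(judged_gains, length):
    """Sort all known gains in descending order, then pad with zeros."""
    ranked = sorted(judged_gains, reverse=True)[:length]
    return ranked + [0] * (length - len(ranked))

# Ideal CG values as listed on the slide.
CG_I = [3, 6, 9, 11, 13, 15, 16, 17, 18, 19, 19, 19, 19]
I = gains_from_cumulated(CG_I)
print(I)  # [3, 3, 3, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0]

# Hypothetical recall base: three documents at gain 3, three at 2, four at 1.
recall_base = [1, 3, 2, 1, 3, 2, 1, 3, 2, 1]
print(ideal_gain_vector(recall_base, 13) == I)  # True
```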
15
Relative to the Ideal Measure—the Normalized (D)CG Measure
norm-vect(V, I) = <v_1/i_1, v_2/i_2, ..., v_k/i_k>
For example:
nCG = norm-vect(CG, CG_I)
nDCG = norm-vect(DCG, DCG_I)
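As an illustration, norm-vect can be sketched as an element-wise division. The gain vector G is the one from the CG example; the ideal vector I is derived from the ideal CG values on the previous slide and is an assumption in that sense. Returning 0.0 where the ideal value is zero is our own convention, not something the slides specify.

```python
from itertools import accumulate

def norm_vect(v, ideal):
    """Divide v by ideal position by position, as in norm-vect(V, I)."""
    return [x / y if y else 0.0 for x, y in zip(v, ideal)]

G = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
I = [3, 3, 3, 2, 2, 2, 1, 1, 1, 1]  # first ten ranks of the ideal gain vector

CG = list(accumulate(G))
CG_I = list(accumulate(I))
nCG = norm_vect(CG, CG_I)
print([f"{x:.3f}" for x in nCG])
```

At rank 1 the result is 1.0 (3/3), and at rank 2 it is 5/6, roughly 0.83: the ranking has collected about 83% of the gain an ideal ranking would have collected by that point.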
16
Comparison to Earlier Measures
Average search length (ASL): estimates the average position of a relevant document
Expected search length (ESL): the average number of documents that must be examined to retrieve a given number of relevant documents
...
These measures either don’t take the degree of document relevance into account or depend on the retrieved list size or …
17
The strengths of the new measures: CG, DCG, nCG, nDCG
Take the degree of document relevance into account
Don’t depend on the size of the recall base
Don’t depend on outliers
Are straightforward to interpret
18
In addition, DCG has further advantages
Weights down gain found later in the ranking
Models user persistence
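One way to see the link between the discount and user persistence is to look at how much of a document's gain survives at each rank for different logarithm bases. The sketch below assumes the log-based discount described for DCG; the function name `discount` and the chosen ranks are ours.

```python
import math

def discount(rank, base):
    """Divisor applied to the gain at a 1-based rank: 1 below `base`, else log_base(rank)."""
    return 1.0 if rank < base else math.log(rank, base)

for base in (2, 10):
    # Fraction of the gain that is kept at ranks 1, 2, 5, 10 and 50.
    shares = [round(1 / discount(rank, base), 2) for rank in (1, 2, 5, 10, 50)]
    print(base, shares)
# 2 [1.0, 1.0, 0.43, 0.3, 0.18]
# 10 [1.0, 1.0, 1.0, 1.0, 0.59]
```

A small base discounts steeply and so models an impatient user who rarely looks far down the list; a larger base models a more persistent one.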
19
Outline
Introduction
Cumulated gain-based measurements
Case study: comparison of some TREC-7 results at different relevance levels
Discussion
20
Data source
TREC-7
50 queries from topic statements
51800 documents, or 1.9 GB of data
We used result lists for 20 topics by five participants from the TREC-7 ad hoc manual track
21
Relevance judgments
The new judgments are reliable
The new judgments are stricter
22
Cumulated gain
[Figure: (a) binary weighting, (b) nonbinary weighting]
23
Discounting gain
24
Normalized (D)CG Vectors and Statistical Testing
25
Normalized (D)CG Vectors and Statistical Testing
26
About the case study
For example:
Ideal = <3, 3, 3, 2, 2, 1, 1, 1, 0, 0>
A = <2, 3, 2, 1, 3, …>
[Tables: gain G by document rank D, ranks 1 to 10]
27
Outline
Introduction
Cumulated gain-based measurements
Case study: comparison of some TREC-7 results at different relevance levels
Discussion
28
Several parameters
Last rank considered
Gain values
Discounting factor
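The three parameters can be made explicit as function arguments. The sketch below is illustrative only: the function name `dcg_at`, the two weighting dictionaries, and the sample relevance levels are assumptions, not values taken from the case study.

```python
import math

def dcg_at(relevance_levels, gain_values, base=2, last_rank=None):
    """DCG at the last rank considered, for graded relevance levels.

    relevance_levels: judged level (here 0..3) of each ranked document
    gain_values: mapping from relevance level to gain (the weighting scheme)
    base: logarithm base of the discount (the discounting factor)
    last_rank: cutoff rank; None means the whole list is used
    """
    total = 0.0
    for rank, level in enumerate(relevance_levels[:last_rank], start=1):
        gain = gain_values[level]
        total += gain if rank < base else gain / math.log(rank, base)
    return total

levels = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
binary = {0: 0, 1: 1, 2: 1, 3: 1}  # hypothetical binary weighting
graded = {0: 0, 1: 1, 2: 2, 3: 3}  # hypothetical nonbinary weighting

print(round(dcg_at(levels, graded, base=2, last_rank=10), 2))  # 9.61
print(round(dcg_at(levels, binary, base=2, last_rank=10), 2))
print(round(dcg_at(levels, graded, base=10, last_rank=5), 2))  # 8.0
```

Changing any one of the three arguments changes the score, so comparisons between techniques are only meaningful when all three are held fixed.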
29
Limitations
Don’t take order effects on relevance judgments or document overlap into account
Deal with a single dimension only
Are unable to handle dynamic changes
30
Benefits
Take the degree of document relevance into account
Model user persistence