Assessing The Retrieval
A.I. Lab, 2007.01.20
박동훈
Contents
4.1 Personal Assessment of Relevance
4.2 Extending the Dialog with RelFbk
4.3 Aggregated Assessment: Search Engine Performance
4.4 RAVE: A Relevance Assessment Vehicle
4.5 Summary
4.1 Personal Assessment of Relevance
4.1.1 Cognitive Assumptions
– Users trying to do 'object recognition'
– Comparison with respect to a prototypic document
– Reliability of user opinions?
– Relevance scale
– RelFbk is nonmetric
Relevance Scale
Users naturally provide only preference information, not a (metric) measurement of how relevant a retrieved document is. RelFbk is nonmetric.
4.2 Extending the Dialog with RelFbk
RelFbk labeling of the Retr set
Query Session, Linked by RelFbk
4.2.1 Using RelFbk for Query Refinement
4.2.2 Document Modifications due to RelFbk (Fig 4.7)
Change the documents!? Move each document toward the queries that successfully match it, and away from those that do not.
4.3 Aggregated Assessment: Search Engine Performance
4.3.1 Underlying Assumptions
– RelFbk(q, d_i) assessments are independent
– Users' opinions will all agree with a single 'omniscient' expert's
4.3.2 Consensual Relevance
Consensually relevant
4.3.4 Basic Measures
Relevant versus Retrieved Sets
Contingency table
– NRel: the number of relevant documents
– NNRel: the number of irrelevant documents
– NDoc: the total number of documents
– NRet: the number of retrieved documents
– NNRet: the number of documents not retrieved
4.3.4 Basic Measures (cont)
4.3.4 Basic Measures (cont)
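The formulas on these two slides were figures and are not preserved in the transcript. Assuming they present the standard precision and recall definitions in terms of the contingency-table counts above (a hedged reconstruction, writing |Rel ∩ Ret| for the number of documents that are both relevant and retrieved):

\mathrm{Precision} = \frac{|Rel \cap Ret|}{N_{Ret}}, \qquad \mathrm{Recall} = \frac{|Rel \cap Ret|}{N_{Rel}}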
4.3.5 Ordering the Retr Set
– Each document is assigned a hitlist rank Rank(d_i), in descending order of Match(q, d_i): Rank(d_i) < Rank(d_j) implies Match(q, d_i) >= Match(q, d_j)
– Ideally, Rank(d_i) < Rank(d_j) should also imply Pr(Rel(d_i)) >= Pr(Rel(d_j))
– Coordination level: a document's rank in Retr is based on the number of keywords shared by the document and the query (see the sketch below)
– Goal: the Probability Ranking Principle
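A minimal sketch of coordination-level ranking as described above; the function names and the toy documents are illustrative assumptions, not from the original slides.

# Coordination-level ranking: order documents by the number of
# query keywords they share with the query (illustrative sketch).

def coordination_level(query_terms, doc_terms):
    """Number of query keywords that also appear in the document."""
    return len(set(query_terms) & set(doc_terms))

def rank_retrieved(query_terms, docs):
    """Return (doc_id, score) pairs in descending Match(q, d_i) order,
    so Rank(d_i) < Rank(d_j) implies Match(q, d_i) >= Match(q, d_j)."""
    scored = [(doc_id, coordination_level(query_terms, terms))
              for doc_id, terms in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

docs = {"d1": ["retrieval", "relevance", "feedback"],
        "d2": ["indexing", "retrieval"],
        "d3": ["user", "interface"]}
print(rank_retrieved(["relevance", "retrieval"], docs))
# [('d1', 2), ('d2', 1), ('d3', 0)]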
A tale of two retrievals: Query 1 vs. Query 2
Recall/precision curve: Query 1
Recall/precision curve: Query 1
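The curve itself is a figure, but the points behind it can be computed by walking down the ranked hitlist. A hedged sketch, assuming a set of consensually relevant documents is available (variable names are illustrative):

def recall_precision_points(hitlist, relevant):
    """Record (recall, precision) after each document in the ranked hitlist."""
    relevant = set(relevant)
    points, hits = [], 0
    for i, doc_id in enumerate(hitlist, start=1):
        if doc_id in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / i))
    return points

# Toy example: 3 relevant documents, a 5-document hitlist.
print(recall_precision_points(["d2", "d7", "d1", "d9", "d5"],
                              relevant={"d1", "d2", "d5"}))
# [(0.33, 1.0), (0.33, 0.5), (0.67, 0.67), (0.67, 0.5), (1.0, 0.6)]  (rounded)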
Retrieval envelope
4.3.6 Normalized recall
– r_i: the hitlist rank of the i-th relevant document
– Compared against the worst-case and best-case rankings
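The formula was shown as a figure; the standard definition of normalized recall, consistent with the worst/best comparison above (a reconstruction, with n relevant documents in a hitlist of N documents), is:

R_{norm} = 1 - \frac{\sum_{i=1}^{n} r_i - \sum_{i=1}^{n} i}{n\,(N - n)}

R_norm = 1 when all relevant documents appear first (best case) and 0 when they all appear last (worst case).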
4.3.8 One-Parameter Criteria
– Combining recall and precision
– Classification accuracy
– Sliding ratio
– Point alienation
Combining recall and precision
F-measure
– [Jardine & van Rijsbergen, 1971]
– [Lewis & Gale, 1994]
Effectiveness
– [van Rijsbergen, 1979]
E = 1 - F, α = 1/(β² + 1)
α = 0.5 => harmonic mean of precision & recall
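Spelling out the relationship quoted above, using the standard form of van Rijsbergen's effectiveness measure (P = precision, R = recall):

E = 1 - \frac{1}{\alpha / P + (1 - \alpha) / R}, \qquad F = 1 - E, \qquad \alpha = \frac{1}{\beta^{2} + 1}

\alpha = 0.5 \;\Rightarrow\; F = \frac{2PR}{P + R}, \text{ the harmonic mean of precision and recall.}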
Classification accuracy
Correct identification of both relevant and irrelevant documents
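In terms of the contingency-table counts introduced earlier, classification accuracy is presumably the fraction of documents whose relevant/irrelevant status the system gets right (a hedged reconstruction):

\mathrm{Accuracy} = \frac{|Rel \cap Ret| + |NRel \cap NRet|}{N_{Doc}}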
Sliding ratio
– Imagine a nonbinary, metric Rel(d_i) measure
– Rank_1 and Rank_2 are rankings computed by two separate systems
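The slide's formula is not preserved; one common formulation of the sliding ratio (an assumption here) compares the cumulated relevance of the two rankings at each cutoff i:

SR(i) = \frac{\sum_{j=1}^{i} Rel\big(d_{Rank_1(j)}\big)}{\sum_{j=1}^{i} Rel\big(d_{Rank_2(j)}\big)}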
Point alienation
– Developed to measure human preference data
– Captures the fundamentally nonmetric nature of RelFbk
4.3.9 Test corpora
– More data required for a "test corpus"
– Standard test corpora
– TREC: Text REtrieval Conference
– TREC's refined queries
– TREC constantly expanding and refining its tasks
More data required for a "test corpus"
– Documents
– Queries
– Relevance assessments Rel(q, d)
– Perhaps other data too
– Classification data (Reuters)
– Hypertext graph structure (EB5)
Standard test corpora
TREC constantly expanding and refining its tasks
– Ad hoc query task
– Routing/filtering task
– Interactive task
Other measures
Expected search length (ESL)
– Length of the "path" as the user walks down the HitList
– ESL = number of irrelevant documents seen before each relevant document (see the sketch below)
– ESL for random retrieval
– ESL reduction factor
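A minimal sketch of the search-length computation described above for a single ranking: count the irrelevant documents a user passes before reaching the desired number of relevant ones. The function name, the `needed` parameter, and the toy data are illustrative assumptions; the expectation over tied ranks in the full ESL definition is omitted.

def expected_search_length(hitlist, relevant, needed=1):
    """Irrelevant documents seen while walking down the hitlist
    until `needed` relevant documents have been found."""
    relevant = set(relevant)
    irrelevant_seen, found = 0, 0
    for doc_id in hitlist:
        if doc_id in relevant:
            found += 1
            if found == needed:
                return irrelevant_seen
        else:
            irrelevant_seen += 1
    return irrelevant_seen  # fewer than `needed` relevant docs in the hitlist

# Toy example: two irrelevant documents precede the second relevant one.
print(expected_search_length(["d4", "d1", "d6", "d2", "d3"],
                             relevant={"d1", "d2"}, needed=2))  # -> 2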
4.5 Summary
– Discussed both metric and nonmetric relevance feedback
– The difficulties in getting users to provide relevance judgments for documents in the retrieved set
– Quantified several measures of system performance