Published by Kristian Gregory. Modified over 10 years ago.
1
Section Based Relevance Feedback
Student: Nat Young
Supervisor: Prof. Mark Sanderson
2
Relevance Feedback
A search engine (SE) user marks document(s) as relevant
– E.g. “find more like this”
– Terms are extracted from the full document
– The whole document may not be relevant
Could marking a sub-section as relevant work better?
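As a rough illustration of the question on this slide, the sketch below expands a query with terms taken either from a whole document or from one marked section. The function names (`top_terms`, `expand_query`), the tiny stop-word list, and the raw-frequency term selection are all assumptions made for the example; the slides do not say how feedback terms were chosen.

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "we", "for", "on"}

def top_terms(text, k=5):
    """Return the k most frequent non-stop-word terms in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def expand_query(query, feedback_text, k=5):
    """Append up to k feedback terms that are not already in the query."""
    query_terms = query.lower().split()
    extra = [t for t in top_terms(feedback_text, k) if t not in query_terms]
    return query_terms + extra

# Hypothetical document split into named sections.
document = {
    "abstract": "We study relevance feedback for section queries.",
    "method": "The index is built with stemming and the index stores stemmed terms.",
}

# Standard RF: feedback terms come from the full document.
rf_query = expand_query("relevance feedback", " ".join(document.values()), k=3)

# Section-based RF: feedback terms come only from the section marked relevant.
srf_query = expand_query("relevance feedback", document["method"], k=3)
```

With the full document, terms from irrelevant sections can compete with the ones the user cared about; restricting feedback to one section keeps the expansion focused on that section's vocabulary.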
3
Test Collections
Simulate a real user’s search process
– Submit queries in batch mode
– Evaluate the result sets
Relevance Judgments
– QREL: pairs (1 … n)
– Traditionally produced by human assessors
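The batch evaluation described on this slide can be sketched as follows. This is a minimal example under assumptions: QRELs are modelled as a mapping from topic id to a set of relevant document ids, and the measure is simply the number of relevant documents in the top k results. The names `relevant_at_k` and `evaluate_batch` and the sample data are hypothetical.

```python
def relevant_at_k(ranked_doc_ids, relevant_ids, k=20):
    """Count how many of the top-k results are judged relevant."""
    return sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant_ids)

def evaluate_batch(runs, qrels, k=20):
    """Score every topic's result list and return (per-topic scores, mean)."""
    scores = {
        topic: relevant_at_k(ranking, qrels.get(topic, set()), k)
        for topic, ranking in runs.items()
    }
    return scores, sum(scores.values()) / len(scores)

# Hypothetical QRELs: topic id -> set of relevant document ids.
qrels = {"t1": {"d2", "d5"}, "t2": {"d9"}}
# Hypothetical ranked result sets returned for each topic's query.
runs = {"t1": ["d1", "d2", "d3", "d5"], "t2": ["d4", "d6"]}

per_topic, mean_score = evaluate_batch(runs, qrels, k=20)
```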
4
Building a Test Collection
Documents
– 1,388,939 research papers
– Stop words removed
– Porter Stemmer applied
Topics
– 100 random documents
– Their sub-sections (6 per document)
5
Building a Test Collection
In-edges
– Documents that cite paper X
– Found 943 using the CiteSeerX database
Out-edges
– Documents cited by paper X
– Found 397 using pattern matching on titles
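One way to picture how citation links become relevance judgments is the sketch below. It assumes citations are available as (citing, cited) id pairs and that both in-edges and out-edges of a topic paper count as relevant, which is how the slide totals appear to be used. The function name `build_qrels` and the sample ids are hypothetical; the actual CiteSeerX lookup and title pattern matching are not reproduced here.

```python
from collections import defaultdict

def build_qrels(citations, topics):
    """Derive QRELs for each topic paper from (citing_id, cited_id) pairs."""
    in_edges = defaultdict(set)   # paper -> documents that cite it
    out_edges = defaultdict(set)  # paper -> documents it cites
    for citing, cited in citations:
        out_edges[citing].add(cited)
        in_edges[cited].add(citing)
    # A topic's QRELs are its in-edges plus out-edges, excluding itself.
    return {t: (in_edges[t] | out_edges[t]) - {t} for t in topics}

# Hypothetical citation pairs around a topic paper "X".
citations = [("A", "X"), ("B", "X"), ("X", "C"), ("X", "D"), ("A", "C")]
qrels = build_qrels(citations, topics=["X"])
```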
6
QRELs
Total
– 1,340 QRELs (943 in-edges + 397 out-edges)
– Avg. 13.4 QRELs per document
Previous work:
– Anna Richie et al. (2006)
  82 Topics, Avg. 11.4 QRELs
  196 Topics, Avg. 4.5 QRELs
– Last year
  71 Topics, Avg. 2.9 QRELs
7
Section Queries
RQ1: Do the sections return different results?

Pearson’s r   All    Abstract  Intro  Method  Results  Conclusion  References
All           1.00   0.06      0.14   0.09    0.05     0.11        0.14
Abstract      0.06   1.00      0.09   0.01    0.07     0.08        0.04
Intro         0.14   0.09      1.00   0.06    0.10     0.12        0.11
Method        0.09   0.01      0.06   1.00    0.09     0.08        0.07
Results       0.05   0.07      0.10   0.09    1.00     0.13        0.09
Conclusion    0.11   0.08      0.12   0.08    0.13     1.00        0.08
References    0.14   0.04      0.11   0.07    0.09     0.08        1.00
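The slides do not state how the result lists were turned into vectors before Pearson’s r was computed. The sketch below shows one plausible setup, purely as an assumption: each result list is a mapping from document id to retrieval score, and a document missing from a list gets a score of 0. The names `pearson_r` and `result_list_correlation` and the sample scores are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("Pearson's r is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)

def result_list_correlation(scores_a, scores_b):
    """Correlate two result lists over the union of their documents.

    Assumption: a document absent from a list is given a score of 0.
    """
    docs = sorted(set(scores_a) | set(scores_b))
    xs = [scores_a.get(d, 0.0) for d in docs]
    ys = [scores_b.get(d, 0.0) for d in docs]
    return pearson_r(xs, ys)

# Hypothetical scores returned for the same topic by two section queries.
abstract_scores = {"d1": 3.0, "d2": 1.0}
method_scores = {"d2": 2.0, "d3": 4.0}
r = result_list_correlation(abstract_scores, method_scores)
```

Values near 0, as in the table above, indicate that two section queries rank largely different documents; values near 1 would indicate near-identical result lists.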
8
Section Queries
RQ2: Do the sections return different relevant results?
Avg. = the average number of relevant results returned @ 20.
E.g. Abstract queries returned an average of 2 QRELs
9
Section Queries
Average intersection sizes of relevant results

             Abstract  Intro  Method  Results  Conclusion  References
All          0.63      0.64   0.46    0.34     0.5         0.64
Abstract               0.6    0.44    0.43     0.62        0.53
Intro                         0.43    0.39     0.45        0.53
Method                                0.32     0.41        0.38
Results                                        0.39 (only one value survives for this row; its column, Conclusion or References, is unclear)
Conclusion                                                 0.42
References

E.g.
Avg(|Abstract ∩ All|) = 0.63
Avg(|Abstract \ All|) = 1.37
100 - ((0.63 / 2) * 100) = 68.5% difference
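The worked example on this slide can be reproduced with a short sketch. It assumes the relevant results for each section are stored per topic as sets of document ids; the function names and the sample data are hypothetical, and only the 0.63 and 2 figures come from the slides.

```python
def avg_intersection_size(relevant_a, relevant_b):
    """Average |A ∩ B| over the topics that both sections have results for.

    relevant_a / relevant_b: topic id -> set of relevant docs returned.
    """
    topics = relevant_a.keys() & relevant_b.keys()
    return sum(len(relevant_a[t] & relevant_b[t]) for t in topics) / len(topics)

def percent_difference(avg_intersection, avg_returned):
    """Share of a section's relevant results not shared with the other section."""
    return 100 - (avg_intersection / avg_returned) * 100

# Figures from the slide: Abstract queries return 2 relevant results on
# average, 0.63 of which are shared with the All (whole document) queries.
abstract_vs_all = percent_difference(0.63, 2)  # approximately 68.5

# Hypothetical per-topic data to show the intersection average itself.
abstract = {"t1": {"d1", "d2"}, "t2": {"d3"}}
whole = {"t1": {"d2"}, "t2": {"d4"}}
overlap = avg_intersection_size(abstract, whole)
```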
10
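Section Queries
Average set complement % of relevant results
(-- marks a cell that is missing from the source)

             Abstract  Intro  Method  Results  Conclusion  References
All          71        --     79      84       77          71
Abstract               70     78      --       69          73
Intro                         79      81       78          74
Method                                79       73          75
Results                                        73 (only one value survives for this row; its column, Conclusion or References, is unclear)
Conclusion                                                 75
References

E.g. Section X returned n% different relevant results than section Y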
11
Next
Practical Significance
– Does section-based relevance feedback (SRF) provide benefits over standard relevance feedback (RF)?