1
Affinity Rank. Yi Liu, Benyu Zhang, Zheng Chen (MSRA)
2
Outline: Motivation; Related Work; Model & Algorithm; Evaluation; Conclusion & Future Work
3
Search for Useful Information: full-text search, importance judgment, manual compilation. Failures still exist.
4
Example – “Spielberg” Search
5
Example – “Spielberg” Search (Cont.)
6
Motivation. An existing problem in IR applications: similar search results dominate the top one or two pages; users grow tired of near-duplicate results on the same topic and cannot find what they need among them. Situations where the problem is or will be intensified: highly repetitive corpora, e.g. newsgroups, news archives, specialized websites; general or short queries.
7
Diversity & Informativeness Diversity The coverage of different topics of a group of documents Informativeness To what extent a document can represent its topic locality ( high informativeness: inclusive)
8
Why? Traditional IR evaluation measures maximize the relevance between the query and the results and favor the most important results. To end users, relevant + important ≠ desirable. A way out: increase the diversity of the top results and increase the informativeness of each single result.
9
Basic Idea. Build a similarity-based link map. Link analysis: Affinity Rank indicates the informativeness of each document. Rank adjustment: only the most informative document of each topic can rank high. Re-rank with Affinity Rank: more diversified and more informative top results.
10
Related Work – Link Analysis. Explicit links: PageRank (Page et al., 1998) and HITS (Kleinberg, 1998); these take the web author's perspective and are subjective. Implicit links: DirectHit (http://www.directhit.com) and Small Web Search (Xue et al., 2003); these take the end-user's perspective and are objective.
11
Related Work – Clustering

Algorithm       | Complexity | Naming
Scatter/Gather* | O(kn)      | Centroid + ranked words
TopCat          | High       | Set of named entities
WBSC*           | O(m^2 + n) | Ranked words
STC*            | O(n)       | Sets of N-grams
IF              | O(kn)      | -
PRSA            | O(knm)     | Ranked words
Bipartite       | O(nm) ?    | Ranked words

(n: #docs, k: #clusters, m: #words; * applied to clustering search results)
12
Our proposed IR framework
13
Link Construction. Similarity is turned into directed links, giving a directed graph. A threshold on link weight saves storage space and reduces the noise brought by the overwhelmingly large number of weak-similarity links.
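As a hedged illustration of this link-construction step, the sketch below (Python with numpy) turns a pairwise document-similarity matrix into a thresholded directed link map; the threshold value 0.2 is hypothetical, since the talk does not fix it.

```python
import numpy as np

def build_link_map(sim, threshold=0.2):
    """Turn a pairwise similarity matrix into a directed, thresholded link map.

    Links weaker than the (hypothetical) threshold are dropped to save storage
    and to reduce the noise from very weak similarity links.
    """
    adj = np.asarray(sim, dtype=float).copy()
    np.fill_diagonal(adj, 0.0)      # no self-links
    adj[adj < threshold] = 0.0      # prune weak-similarity links
    return adj
```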
14
Assumptions. Observation: relations among documents vary; some documents are similar and others are not, and the similarity itself varies in strength. Assumption 1: the more relatives a document has, the more informative it is. Assumption 2: the more informative a document's relatives are, the more informative the document itself is.
15
Link Analysis. The link map gives an adjacency matrix, which is row-normalized. Based on the two assumptions, the principal eigenvector yields the rank score. Implementation: the power method.
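The power-method implementation named here can be sketched as follows: row-normalize the adjacency matrix and iterate until the principal (left) eigenvector converges; the tolerance and iteration cap are hypothetical choices.

```python
import numpy as np

def power_method(adj, tol=1e-8, max_iter=100):
    """Principal left eigenvector of the row-normalized link map,
    used as the informativeness (rank) score of each document."""
    row_sums = adj.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0          # avoid division by zero for isolated docs
    M = adj / row_sums                      # row-stochastic transition matrix
    n = adj.shape[0]
    score = np.full(n, 1.0 / n)             # uniform starting vector
    for _ in range(max_iter):
        new_score = score @ M               # one power-method step
        total = new_score.sum()
        if total == 0:                       # degenerate graph with no links at all
            break
        new_score /= total
        if np.abs(new_score - score).sum() < tol:
            score = new_score
            break
        score = new_score
    return score
```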
16
“Random Transform” Model. A transforming document jumps from document to document at each time step, forming a Markov chain; the stationary distribution of the transition probabilities, i.e. the principal eigenvector, gives the informativeness. At each step the walk moves from the current document either to a “relative” document or to a randomly picked document.
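One way to read the “random transform” model is as a PageRank-style random walk: with probability 1 - eps the transforming document follows a link to a “relative” document, and with probability eps it jumps to a randomly picked document. The sketch below builds that transition matrix; the jump probability eps = 0.15 is an assumption, not a value given in the talk.

```python
import numpy as np

def random_transform_matrix(adj, eps=0.15):
    """Transition matrix of the "random transform" Markov chain.

    With probability 1 - eps move to a "relative" (linked) document, with
    probability eps jump to a randomly picked document (eps is hypothetical).
    """
    n = adj.shape[0]
    row_sums = adj.sum(axis=1, keepdims=True)
    safe = np.where(row_sums == 0, 1.0, row_sums)
    follow = np.where(row_sums > 0, adj / safe, 1.0 / n)   # dangling docs jump uniformly
    return (1 - eps) * follow + eps / n                    # add the random-jump part

# Its stationary distribution (the principal eigenvector) is the informativeness
# score, e.g. power_method(random_transform_matrix(adj)) with the sketch above.
```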
17
Rank Adjustment. A greedy-like algorithm: decrease the score of document j by the part conveyed from document i, the most informative one in the same topic. (Illustration: documents T1-1 through T1-6 of topic 1, and T2-1, T2-3 of topic 2.)
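The slide only names a greedy-like adjustment, so the following is a hedged sketch of one plausible reading: repeatedly pick the highest-scoring remaining document, then lower every unpicked document's score by the part conveyed from the picked one through their link; the exact penalty form is an assumption.

```python
import numpy as np

def greedy_rank_adjustment(adj, score):
    """Hedged sketch of the greedy-like adjustment.

    Repeatedly select the most informative remaining document i, then decrease
    the score of every other document j by the part conveyed from i over the
    link i -> j (the exact penalty form is an assumption).
    """
    score = np.asarray(score, dtype=float).copy()
    remaining = set(range(len(score)))
    order = []
    while remaining:
        i = max(remaining, key=lambda d: score[d])   # most informative document left
        order.append(i)
        remaining.remove(i)
        for j in remaining:
            score[j] = max(score[j] - adj[i, j] * score[i], 0.0)  # subtract conveyed part
    return order  # adjusted ranking: one representative per topic rises to the top
```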
18
Re-rank. Two combination schemes: a score-combine scheme and a rank-combine scheme.
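The combination formulas from the original slide are not reproduced in this transcript, so the snippet below is only a hedged sketch of a generic score-combine scheme: a weighted sum of the normalized relevance score and the normalized Affinity Rank score, with the weight alpha as a hypothetical parameter.

```python
import numpy as np

def score_combine(relevance, affinity, alpha=0.5):
    """Hedged sketch of a score-combine re-ranking scheme.

    relevance: relevance scores from the base ranker (e.g. Okapi).
    affinity:  Affinity Rank (informativeness) scores.
    alpha:     weight on the relevance part; hypothetical parameter.
    """
    r = np.asarray(relevance, dtype=float)
    a = np.asarray(affinity, dtype=float)
    if r.max() > 0:
        r = r / r.max()                    # bring both scores to comparable scales
    if a.max() > 0:
        a = a / a.max()
    combined = alpha * r + (1 - alpha) * a
    return np.argsort(-combined)           # document indices in re-ranked order
```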
19
Advantages of Affinity Rank: it attends to both diversity and informativeness, implicitly expands the query towards multiple topics, automatically picks representative documents for each chosen topic, and most of its computation can be done offline.
20
Experiment Setup. Dataset: Microsoft newsgroups, 117 Office-product-related newsgroups, 256,449 posts (mainly within 4 months), about 400 MB. Preprocessing: title and text body (citations, signatures, etc. stripped); stemming, stop-word removal, tf-idf weighting. Queries: 20 randomly picked query scenarios with query words. Search results: Okapi, with the top 50 results as the answer set.
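A minimal sketch of this preprocessing, using scikit-learn and NLTK as stand-in tools (the talk does not name its implementation): stop-word removal, stemming, and tf-idf weighting over title plus body, followed by the pairwise cosine similarities that feed the link map.

```python
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer, ENGLISH_STOP_WORDS
from sklearn.metrics.pairwise import cosine_similarity

stemmer = PorterStemmer()

def stem_tokens(text):
    """Lower-case, drop stop words, and stem the remaining tokens."""
    return [stemmer.stem(t) for t in text.lower().split() if t not in ENGLISH_STOP_WORDS]

def preprocess(posts):
    """posts: list of strings (title + body, citations/signatures already stripped)."""
    vectorizer = TfidfVectorizer(tokenizer=stem_tokens, token_pattern=None)
    tfidf = vectorizer.fit_transform(posts)   # tf-idf document vectors
    sim = cosine_similarity(tfidf)            # pairwise similarities for the link map
    return tfidf, sim
```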
21
Evaluation – Ground Truth. User study: 4 users independently evaluated all results. For each query: first, manually cluster all results into different topics; then score each result for its informativeness within its topic; finally, score each result for its relevance to the query. Evaluation: compare the original ranking with the new ranking (re-ranked by Affinity Rank), considering three aspects of the ranking: diversity, informativeness, and relevance in the top n results.
22
Definitions. Diversity: the number of different topics in a document group. Informativeness: 3 = very informative, 2 = informative, 1 = somewhat informative, 0 = not informative. Relevance: 1 = relevant, 0 = hard to tell, -1 = irrelevant.
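Given per-document topic, informativeness (0-3), and relevance (-1/0/1) labels from the user study, the metrics for a top-n list can be computed as in this sketch; averaging informativeness and relevance over the top n is an assumption, since the slide only defines the per-document scales.

```python
def topn_metrics(ranking, topic, informativeness, relevance, n=10):
    """Diversity, mean informativeness, and mean relevance of the top-n results.

    ranking:         document ids in ranked order.
    topic:           dict doc_id -> topic label (from the manual clustering).
    informativeness: dict doc_id -> score in 0..3.
    relevance:       dict doc_id -> score in -1/0/1.
    """
    top = ranking[:n]
    diversity = len({topic[d] for d in top})                  # number of distinct topics
    avg_info = sum(informativeness[d] for d in top) / len(top)
    avg_rel = sum(relevance[d] for d in top) / len(top)
    return diversity, avg_info, avg_rel
```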
23
Experiment Result (1): top 10 search results, compared to traditional IR results.

                  | Diversity | Informativeness | Relevance
Relative change   | +31.02%   | +11.97%         | +0.72%
p-value (t-test)  | 0.004632  | 0.002225        | 0.067255

Significant improvement in diversity and informativeness without loss in relevance.
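As a hedged sketch of the significance test behind this table, the snippet below computes the relative change and a paired t-test over per-query metric values (one value per query for the baseline ranking and for the Affinity Rank re-ranking); whether the original test was paired is an assumption.

```python
import numpy as np
from scipy.stats import ttest_rel

def compare_metric(baseline, reranked):
    """baseline / reranked: one metric value per query (e.g. top-10 diversity)
    for the original ranking and for the Affinity Rank re-ranking."""
    baseline = np.asarray(baseline, dtype=float)
    reranked = np.asarray(reranked, dtype=float)
    relative_change = (reranked.mean() - baseline.mean()) / baseline.mean()
    t_stat, p_value = ttest_rel(reranked, baseline)   # paired t-test over queries
    return relative_change, p_value
```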
24
Experiment Result (2). (Charts: diversity improvement and informativeness improvement.) Affinity Rank effectively improves both the diversity and the informativeness of the top search results (e.g., when all of the top 50 results are re-ranked by Affinity Rank).
25
Experiment Result (3) - Parameter Tuning. Top 10 search results. Affinity Rank is robust: (1) the parameter does not matter much as long as enough weight is given to Affinity Rank; (2) there is no over-tuning problem: simply re-ranking everything by Affinity Rank is nearly optimal.
26
Experiment Result (4) - Parameter Tuning. Overview of the improvement under weight adjustment: Affinity Rank stably exerts a positive influence on both diversity and informativeness.
27
Conclusion. A new IR framework: Affinity Rank helps improve the diversity and informativeness of search results, especially the top ones; since Affinity Rank is computed offline, it adds little burden to online retrieval. Future work: metrics for measuring information quantity; scaling to large collections.
28
Thanks