
1 Affinity Rank Yi Liu, Benyu Zhang, Zheng Chen MSRA

2 Outline  Motivation  Related Work  Model & Algorithm  Evaluation  Conclusion & Future work

3 Search for Useful Information  Full-text search  Importance judgment  Manual compilation Yet failures still exist

4 Example – “Spielberg” Search

5 Example – “Spielberg” Search (Cont.)

6 Motivation  Existing problems in IR applications Similar search results dominate the top one or two pages Users grow tired of similar results on the same topic Users cannot find what they need among those similar results  Situations where the problem is or will be intensified Highly repetitive corpora, e.g.  Newsgroups  News archives  Specialized websites Generalized or short queries

7 Diversity & Informativeness  Diversity The coverage of different topics by a group of documents  Informativeness To what extent a document can represent its topic locality (high informativeness: inclusive)

8 Why?  Traditional IR evaluation measures Maximize relevance between query & results Return the most important results  To end-users relevant + important ≠ desirable  A way out Increase diversity in top results Increase the informativeness of each single result

9 Basic Idea  Build a similarity-based link map  Link analysis: Affinity Rank indicating the informativeness of each document  Rank adjustment Only the most informative document of each topic can rank high  Re-rank with Affinity Rank More diversified top results More informative top results

10 Related Work – link analysis  Explicit links: PageRank (Page et al. 1998), HITS (Kleinberg, 1998); the Web author’s perspective (subjective)  Implicit links: DirectHit (http://www.directhit.com), Small Web Search (Xue et al. 2003); the end-user’s perspective (objective)

11 Related Work – Clustering
Algorithm       | Complexity | Naming
Scatter/Gather* | O(kn)      | Centroid + ranked words
TopCat          | High       | Set of named entities
WBSC*           | O(m²+n)    | Ranked words
STC*            | O(n)       | Sets of N-grams
IF              | O(kn)      | -
PRSA            | O(knm)     | Ranked words
Bipartite       | O(nm)?     | Ranked words
(n: #docs, k: #clusters, m: #words; * applied to clustering search results)

12 Our proposed IR framework

13 Link Construction  Similarity to directed link  Directed graph  Threshold Saves storage space Reduces the noise brought by an overwhelmingly large number of weak-similarity links
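A minimal sketch of this link-construction step, under assumptions of my own (tf-idf vectors with cosine similarity, an illustrative threshold of 0.2, and symmetric link weights; the slides do not specify the similarity measure, the threshold value, or how link direction is assigned):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_link_graph(docs, threshold=0.2):
    """Build a thresholded similarity graph as a weighted adjacency matrix.

    Assumptions (not from the slides): tf-idf + cosine similarity as the
    similarity measure and an illustrative threshold of 0.2. Cosine is
    symmetric, so links are kept in both directions; the authors' exact,
    possibly asymmetric, weighting may differ.
    """
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    sim = cosine_similarity(tfidf)
    np.fill_diagonal(sim, 0.0)                    # no self-links
    adj = np.where(sim >= threshold, sim, 0.0)    # drop weak-similarity links
    return adj
```

Thresholding both prunes noisy weak links and keeps the adjacency matrix sparse, which is what makes the offline link-analysis step tractable.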

14 Assumption Observation: relations among documents vary Some are similar, others are not Similarity varies  The more relatives a document has, the more informative it is  The more informative a document’s relatives are, the more informative it is

15 Link Analysis  Link map → adjacency matrix → row normalization  Based on the two assumptions  Principal eigenvector → rank score  Implementation: power method
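A small power-method sketch for this step, with my own assumptions called out in the comments (row normalization of the adjacency matrix and a PageRank-style random-jump factor; the slides do not give the exact iteration):

```python
import numpy as np

def affinity_rank_scores(adj, damping=0.85, tol=1e-8, max_iter=100):
    """Power-method sketch for the link-analysis step.

    Assumptions (mine, not the authors'): rows of the adjacency matrix are
    normalized to sum to 1, and a PageRank-style random-jump factor
    `damping` keeps the chain well behaved. Returns the principal
    eigenvector, interpreted as per-document informativeness.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    row_sums = adj.sum(axis=1, keepdims=True)
    # Row-normalize; documents with no out-links spread uniformly.
    P = np.divide(adj, row_sums, out=np.full_like(adj, 1.0 / n), where=row_sums > 0)
    score = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = damping * P.T @ score + (1.0 - damping) / n
        if np.abs(new - score).sum() < tol:
            break
        score = new
    return score
```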

16 “Random Transform” Model  A transforming document jumps from doc. to doc. at each time step  Markov chain stationary transition probability  Principal eigenvector gives informativeness (figure: transitions among the current doc., a “relative” doc., and a randomly picked doc.)
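A hedged formalization of the model above, where P is the row-normalized link matrix, n the number of documents, and α the probability of following a similarity link rather than jumping to a randomly picked document (this split is an assumption, not stated on the slide); the stationary distribution π of the chain is the principal eigenvector used as the informativeness score:

```latex
\pi_j^{(t+1)} = \alpha \sum_{i} P_{ij}\,\pi_i^{(t)} + (1-\alpha)\,\frac{1}{n},
\qquad
\pi = \lim_{t \to \infty} \pi^{(t)}
```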

17 Rank Adjustment  Greedy-like algorithm Decrease the score of j by the part conveyed from i (the most informative one in the same topic) (figure: documents grouped into topics T1 and T2)
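A greedy sketch of the adjustment under my reading of the slide: visit documents from most to least informative, and for each visited document penalize its similar, lower-ranked neighbours; the exact amount subtracted (here the link weight times the visited document's adjusted score) is an assumption:

```python
import numpy as np

def adjust_scores(adj, scores):
    """Greedy rank adjustment (a sketch, not the authors' exact formula).

    Visit documents from most to least informative; for each visited
    document i, reduce the score of every document j it links to by the
    part of j's score assumed to be conveyed from i. As a result, only
    the most informative document of a topic keeps a high score.
    """
    adj = np.asarray(adj, dtype=float)
    adjusted = np.asarray(scores, dtype=float).copy()
    for i in np.argsort(-np.asarray(scores, dtype=float)):  # most informative first
        for j in np.nonzero(adj[i])[0]:
            if j != i:
                adjusted[j] -= adj[i, j] * adjusted[i]
    return np.maximum(adjusted, 0.0)  # keep scores non-negative
```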

18 Re-rank  Score-combine scheme  Rank-combine scheme
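The combination formulas themselves did not survive the transcript. Purely as an assumed illustration, a convex combination of normalized scores for the score-combine scheme and a weighted sum of rank positions for the rank-combine scheme might look like this (the weight beta and the min-max normalization are mine, not the authors'):

```python
import numpy as np

def score_combine(ir_scores, ar_scores, beta=0.5):
    """Convex combination of normalized IR relevance and Affinity Rank scores."""
    def norm(x):
        x = np.asarray(x, dtype=float)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)
    return beta * norm(ir_scores) + (1.0 - beta) * norm(ar_scores)

def rank_combine(ir_scores, ar_scores, beta=0.5):
    """Combine the two rankings by rank position (lower combined value = better)."""
    ir_rank = np.argsort(np.argsort(-np.asarray(ir_scores, dtype=float)))
    ar_rank = np.argsort(np.argsort(-np.asarray(ar_scores, dtype=float)))
    return beta * ir_rank + (1.0 - beta) * ar_rank
```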

19 Advantages of Affinity Rank  Gives attention to both diversity and informativeness Implicitly expands the query towards multiple topics Automatically picks the representative documents for each chosen topic  Most of the computation can be done OFFLINE

20 Experiment Setup  Dataset Microsoft Newsgroup 117 Office-product-related newsgroups  256,449 posts (mainly within 4 months), about 400 MB  Preprocessing Title & text body (citations, signatures, etc. stripped) Stemming, stop-word removal, tf-idf weighting  Queries 20 randomly picked query scenarios with query words  Search results Okapi Top 50 results as the answer set
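An illustrative sketch of the retrieval step only; the rank_bm25 package and the naive whitespace tokenization stand in for whatever Okapi implementation and preprocessing pipeline the authors actually used:

```python
from rank_bm25 import BM25Okapi

def top_k_okapi(corpus, query, k=50):
    """Return the indices of the top-k Okapi BM25 results for a query.

    Illustrative only: naive lowercase whitespace tokenization, and the
    rank_bm25 package in place of the authors' actual Okapi system.
    """
    tokenized = [doc.lower().split() for doc in corpus]
    bm25 = BM25Okapi(tokenized)
    scores = bm25.get_scores(query.lower().split())
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]
```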

21 Evaluation – ground truth  User study 4 users independently evaluate all results For each query  First manually cluster all results into different topics  Then score each result in terms of its informativeness within its topic  Finally score each result in terms of its relevance to the query  Evaluation Compare the original ranking with the new ranking (re-ranked by Affinity Rank) 3 aspects of ranking are considered: diversity, informativeness & relevance in the top n results

22 Definitions  Diversity diversity = no. of different topics in a document group  Informativeness 3 - very informative 2 - informative 1 - somewhat informative 0 - not informative  Relevance 1 - relevant 0 - hard to tell -1 - irrelevant
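A tiny sketch of how these definitions translate into top-n metrics; the function and argument names are hypothetical, and the inputs are the per-result topic labels and 0-3 informativeness judgments from the user study above, listed in ranked order:

```python
def diversity_at_n(topic_labels, n=10):
    """Diversity of the top-n results: number of distinct topics covered."""
    return len(set(topic_labels[:n]))

def mean_informativeness_at_n(informativeness, n=10):
    """Average 0-3 informativeness label over the top-n results."""
    top = informativeness[:n]
    return sum(top) / len(top)
```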

23 Experiment Result (1)  Top 10 search results, compared to traditional IR results
                 | Diversity | Informativeness | Relevance
Relative change  | +31.02%   | +11.97%         | +0.72%
p value (t-test) | 0.004632  | 0.002225        | 0.067255
Significant improvement in diversity & informativeness without loss in relevance

24 Experiment Result (2)  Diversity Improvement  Informativeness Improvement Affinity Rank efficiently improves both the diversity & informativeness of top search results (e.g., re-ranking all top 50 results by Affinity Rank)

25 Experiment Result (3) - Parameter Tuning  Top 10 search results Affinity Rank is robust 1. The parameter does not affect results much once enough weight is given 2. No over-tuning problem: simply re-ranking all results by Affinity Rank is nearly optimal

26 Experiment Result (4) - Parameter Tuning  Improvement overview subject to weight adjustment Affinity Rank STABLY exerts a positive influence on diversity & informativeness enhancement

27 Conclusion  A new IR framework  Affinity Rank helps to improve the diversity & informativeness of search results, especially the TOP ones  Affinity Rank is computed offline, and therefore adds little burden to online retrieval Future work  Metrics for measuring information quantity  Scaling to large collections

28 Thanks

