Diversified Ranking on Large Graphs: An Optimization Viewpoint
Hanghang Tong, Jingrui He, Zhen Wen, Ching-Yung Lin, Ravi Konuru
KDD 2011, August 21-24, San Diego, CA
© 2010 IBM Corporation
Background: Why Diversity?
A1: Uncertainty & ambiguity in an information need
- Case 1: Uncertainty from the query
- Case 2: Uncertainty from the user
Background: Why Diversity? (cont.)
A2: Uncertainty & ambiguity of an information need
- C1: Product search → want different reviews
- C2: Political issue debate → desire different opinions
- C3: Legal search → get an overview of a topic
- C4: Team assembling → find a set of relevant & diversified experts
A3: Become a better and safer employee
- Better: a 1% increase in diversity → an additional $886 of monthly revenue
- Safer: a 1% increase in diversity → an increase of 11.8% in job retention
Problem Definitions & Challenges
Problem 1 (Evaluate/measure a given top-k ranking list)
- Given: a large graph A, the query vector p, the damping factor c, and a subset of k nodes S;
- Measure: the goodness of the subset S by a single number, in terms of (a) the relevance of each node in S w.r.t. the query vector p, and (b) the diversity among all the nodes in S.
Problem 2 (Find a near-optimal top-k ranking list)
- Given: a large graph A, the query vector p, the damping factor c, and the budget k;
- Find: a subset of k nodes S that maximizes the goodness measure f(S).
Challenges
- (for Problem 1) No existing measure encodes both relevance and diversity
- (for Problem 2) Subset-level optimization
Our Solutions (10-second introduction!)
Problem 1 (Evaluate/measure a given top-k ranking list)
- A1: A weighted sum between relevance and similarity
Problem 2 (Find a near-optimal top-k ranking list)
- A2: A greedy algorithm (near-optimal, linear scalability)
(Figure: the goodness formula annotated with its weight, relevance, and diversity terms.)
Measure Relevance (r) by RWR (a.k.a. Personalized PageRank)
r = c·A·r + (1−c)·e
- A: n×n adjacency matrix
- r: n×1 ranking vector
- e: n×1 starting vector (restart according to the query p)
- c: damping factor
(Figure: a 12-node example graph with a restart node.)
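A minimal sketch of how the fixed-point equation above can be solved by power iteration. The column normalization of A, the tolerance, and the iteration limit are assumptions not stated on the slides, and the restart vector is written here as p, standing in for the slides' starting vector e set by the query.

```python
import numpy as np

def rwr(A, p, c=0.85, tol=1e-9, max_iter=1000):
    """Random walk with restart: iterate r = c * W * r + (1 - c) * p.

    A : (n, n) non-negative adjacency matrix.
    p : (n,) restart / query vector, assumed to sum to 1.
    c : damping factor, as in the slides.

    Column-normalizing A into W is an assumption; the slides only say
    "adjacency matrix".
    """
    A = np.asarray(A, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for dangling nodes
    W = A / col_sums                       # column-stochastic transition matrix
    n = A.shape[0]
    r = np.full(n, 1.0 / n)                # uniform initialization
    for _ in range(max_iter):
        r_new = c * (W @ r) + (1 - c) * p
        if np.abs(r_new - r).sum() < tol:  # L1 convergence check
            return r_new
        r = r_new
    return r
```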
Diversity ~ reverse of weighted similarity on the personalized graph
r = [c·A + (1−c)·e·1′]·r = B·r
- B: personalized graph (a.k.a. the 'Google matrix')
- B(i,j): how node i and node j are connected in the personalized graph
Goodness measure:
g(S) = w·Σ_{i∈S} r(i) − Σ_{i,j∈S} B(i,j)·r(j)
(Figure: the 12-node example graph and its personalized counterpart.)
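The goodness measure g(S) can be evaluated directly once B and r are available. The sketch below is a plain transcription of the two formulas on this slide; the dense construction of B is only for illustration (in practice B would not be materialized), and A is assumed to already be the normalized adjacency matrix used in the RWR equation.

```python
import numpy as np

def personalized_graph(A, e, c=0.85):
    """B = c*A + (1-c)*e*1', the personalized ('Google') matrix from the slide.
    Dense construction is for illustration only."""
    n = A.shape[0]
    return c * A + (1 - c) * np.outer(e, np.ones(n))

def goodness(S, r, B, w=2.0):
    """g(S) = w * sum_{i in S} r(i) - sum_{i,j in S} B(i,j) * r(j)."""
    S = list(S)
    relevance = r[S].sum()
    penalty = B[np.ix_(S, S)] @ r[S]       # for each i in S: sum_j B(i,j) * r(j)
    return w * relevance - penalty.sum()
```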
Properties of g(S): Why Is It a Good Measure?
- P1: g(S) = 0 for an empty set S
- P2: g(S) is submodular for any w > 0
- P3: g(S) is monotonically non-decreasing for any w ≥ 2
A greedy algorithm (DRAGON) leads to a near-optimal solution (for any w ≥ 2):
- Quality: g(S) ≥ (1 − 1/e)·g(S*), where S* is the optimal subset maximizing g(S)
- Complexity: O(m) for both time and space
Footnote: DRAGON stands for Diversified Ranking on Graph: An Optimization Viewpoint
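A hedged sketch of the greedy selection the slide refers to: repeatedly add the node with the largest marginal gain in g(S). It reuses the goodness() helper sketched above, recomputes g(S) from scratch at every step, and is therefore a naive illustration of the submodular-maximization argument, not the linear-time DRAGON implementation.

```python
def greedy_diversified_ranking(r, B, k, w=2.0):
    """Greedy top-k selection maximizing g(S).
    Naive version for illustration; not the O(m) DRAGON algorithm."""
    n = len(r)
    S, current = [], 0.0
    for _ in range(k):
        best_node, best_gain = None, float("-inf")
        for v in range(n):
            if v in S:
                continue
            gain = goodness(S + [v], r, B, w) - current   # marginal gain of adding v
            if gain > best_gain:
                best_node, best_gain = v, gain
        S.append(best_node)
        current += best_gain
    return S
```

Under the same assumptions, the three sketches chain together: rwr yields r, personalized_graph yields B, and the greedy loop returns a diversified top-k list.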
Experimental Results
(Figures: quality-time balance; scalability; comparison with alternative choices; an illustrative example. Axes include quality, budget, time, and optimal quality.)
Conclusion
Problem 1 (Evaluate/measure a given top-k ranking list)
- A1: A weighted sum between relevance and similarity
Problem 2 (Find a near-optimal top-k ranking list)
- A2: A greedy algorithm (near-optimal, linear scalability)
Contact: Hanghang Tong (htong@us.ibm.com)
Academic Literature: More Detailed Comparison
Compared with prior work [6], [7], this work proposes:
- For Problem 1: the first measure that combines both relevance & diversity
- For Problem 2: the first method that (a) leads to a near-optimal solution with (b) linear complexity