1
Measuring Proximity on Graphs with Side Information
Joint work by Hanghang Tong, Huiming Qu, Hani Jamjoom
Speaker: Mary McGlohon
ICDM 2008, Pisa, Italy, 15-19 December 2008
2
Cyano: Process Collaboration Wiki
Q: How to enable social recommendation in Cyano?
3
Scoop: current recommendation system [Qu+ SCC 2008]
Given a node in a graph (e.g., a user node in a user-to-process graph), find
– 1. [Ranking list] a list of recommended nodes that are most related to the query node (what to recommend);
– 2. [Connection subgraph] a connection subgraph that best explains the relationship between the query node and the recommended node(s) (why to recommend).
Proximity is the core of Scoop!
4
Challenges in Scoop
How to incorporate users' feedback (like/dislike)?
– Feedback on the ranking list: how to automatically adjust the ranking for query node 1?
– Feedback on the connection subgraph: given the current subgraph between nodes 1 and 10, how to modify it to weaken the links between 1 and 10 that involve node 5?
Q: How to incorporate such side information when measuring node proximity on graphs?
5
Isomorphic Settings of Scoop
Proximity is the main tool for
– Neighborhood search
– Anomaly detection
– Pattern matching
– Image captioning
– …
Sources of side information are rich
– Ratings in recommendation systems
– Opinion/sentiment in blog analysis
– Clickthrough data
– …
6
Roadmap
– Motivations
– Proximity w/o Side Information
– Proximity w/ Side Information
  – ProSIN: Method
  – Fast-ProSIN: Fast Solution
– Experimental Results
– Conclusion
7
Proximity on Graphs: What?
a.k.a. relevance, closeness, 'similarity'…
8
What is a "good" proximity?
– Multiple connections
– Quality of connection: direct & indirect connections; length, degree, weight, …
9
Sol: Random walk with restart (RWR) [Pan+ KDD 2004]
[Figure: ranking vector for query node 4 on a 12-node example graph. More red = more relevant; nearby nodes get higher scores.]
10
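Why is RWR a good score?
$$\mathbf{r}_i = (1-c)\,(I - c\tilde{W})^{-1}\mathbf{e}_i = (1-c)\left(I + c\tilde{W} + c^2\tilde{W}^2 + c^3\tilde{W}^3 + \cdots\right)\mathbf{e}_i$$
The terms $c\tilde{W}$, $c^2\tilde{W}^2$, $c^3\tilde{W}^3$, … aggregate all paths from i to j of length 1, 2, 3, …, so the score of node j reflects multiple, direct and indirect connections, with longer paths discounted by powers of c. $\tilde{W}$: (normalized) adjacency matrix; c: damping factor.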
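The path expansion can be checked numerically. The following is a minimal NumPy sketch on a hypothetical 4-node toy graph; it assumes A[i, j] holds the weight of link i -> j and column-normalizes A^T so that W[j, i] is the probability of stepping from i to j. It compares the closed form (1 - c)(I - cW)^-1 e_i with the sum over path lengths, truncated at an arbitrary cutoff of 200.

```python
import numpy as np


def transition_matrix(A):
    """Column-normalize A^T so that W[j, i] = Pr(step from i to j).
    A[i, j] is the weight of link i -> j; nodes without out-links give zero columns."""
    out = A.sum(axis=1)
    W = np.zeros(A.shape)
    nz = out > 0
    W[:, nz] = (A[nz, :] / out[nz, None]).T
    return W


def rwr_closed_form(W, i, c=0.9):
    """r_i = (1 - c) (I - c W)^{-1} e_i, solved as a linear system."""
    n = W.shape[0]
    e = np.zeros(n)
    e[i] = 1.0
    return (1 - c) * np.linalg.solve(np.eye(n) - c * W, e)


def rwr_path_sum(W, i, c=0.9, max_len=200):
    """(1 - c) * sum_k c^k W^k e_i over path lengths k = 0..max_len."""
    term = np.zeros(W.shape[0])
    term[i] = 1.0                    # k = 0: the query node itself
    total = term.copy()
    for _ in range(max_len):
        term = c * (W @ term)        # now holds c^k W^k e_i: paths of length k
        total += term
    return (1 - c) * total


# Hypothetical undirected toy graph (symmetric adjacency matrix).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = transition_matrix(A)
print(np.allclose(rwr_closed_form(W, 0), rwr_path_sum(W, 0)))  # True
```

With c = 0.9 the neglected tail of the series has total mass c^201, below 1e-9, so the two vectors agree within NumPy's default tolerance.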
11
Proximity in Current Scoop
[Figure: user-process bipartite graph (users U1-U4, processes P1-P5) with its ranking list (initial result: P2, P3, P1) and connection subgraph.]
13
ProSIN: Challenges
Given the query, we want to
– boost the neighbors of node 4;
– penalize the neighbors of node 6.
14
ProSIN: How?
Use the side information to refine the graph!
15
ProSIN: Detailed Algorithm
Input:
– A weighted directed graph A
– Source node s and target node t
– Side information: the positive set P and the negative set N
Output:
– Proximity score from the source to the target
Method:
1. Add a link from the source node to each positive node x.
2. Introduce a sink node into the graph.
3. For each negative node y, find its neighboring nodes; add a link from y to the sink, and add a link from each neighboring node of y to the sink.
4. Perform random walk with restart for the source node s on the refined graph.
5. Output the proximity score as the steady-state probability that the random particle finally stays at the target node t.
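The five steps can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the added links get unit weight (w_pos and w_neg are hypothetical parameters, since the slide gives no weights), "neighboring nodes" is read as in- or out-neighbors, and the sink has no out-links, so probability that reaches it simply leaves the walk. Step 4 is solved in closed form rather than by iteration.

```python
import numpy as np


def transition_matrix(A):
    """W[j, i] = Pr(step i -> j); A[i, j] is the weight of link i -> j."""
    out = A.sum(axis=1)
    W = np.zeros(A.shape)
    nz = out > 0
    W[:, nz] = (A[nz, :] / out[nz, None]).T
    return W


def refine_graph(A, s, positive, negative, w_pos=1.0, w_neg=1.0):
    """Steps 1-3: return the refined adjacency matrix with a sink node at index n.
    w_pos and w_neg are hypothetical weights for the added links."""
    n = A.shape[0]
    sink = n                                  # step 2: one extra node
    R = np.zeros((n + 1, n + 1))
    R[:n, :n] = A
    for x in positive:                        # step 1: source -> positive node
        R[s, x] += w_pos
    for y in negative:                        # step 3: y -> sink, neighbor -> sink
        R[y, sink] += w_neg
        neighbors = np.flatnonzero((A[y, :] > 0) | (A[:, y] > 0))
        for z in neighbors:
            R[z, sink] += w_neg
    return R                                  # sink row stays empty (assumption)


def prosin(A, s, t, positive=(), negative=(), c=0.9):
    """Steps 4-5: RWR from s on the refined graph; return the score of t."""
    R = refine_graph(A, s, positive, negative)
    W = transition_matrix(R)
    n = W.shape[0]
    e = np.zeros(n)
    e[s] = 1.0
    r = (1 - c) * np.linalg.solve(np.eye(n) - c * W, e)
    return r[t]


# Hypothetical undirected toy graph: 3 - 1 - 0 - 2 - 4
A = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (1, 3), (2, 4)]:
    A[i, j] = A[j, i] = 1.0

print(prosin(A, 0, 3), prosin(A, 0, 4))      # both about 0.1066, equal by symmetry
print(prosin(A, 0, 3, negative=[1]) < prosin(A, 0, 4, negative=[1]))  # True
```

In the toy graph, nodes 3 and 4 start out equally close to node 0; saying "no" to node 1 drains probability from the 0-1-3 branch into the sink, so node 3 falls below node 4. Positive feedback on node 4 instead adds a direct link from node 0 and raises node 4's score.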
16
Process management
Given a user-process graph with 'U2' as the query, which are the top 3 most related processes?
– Initial result (no feedback): P2, P3, P1
– Updated result ('no' to 'P2'): P3, P4, P5
[Figure: user-process bipartite graph; users U1-U4, processes P1-P5.]
18
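Computing RWR
$$\mathbf{r} = c\,\tilde{W}\,\mathbf{r} + (1-c)\,\mathbf{e}$$
where $\mathbf{r}$ (n × 1) is the ranking vector, $\tilde{W}$ (n × n) is the normalized adjacency matrix, $\mathbf{e}$ (n × 1) is the starting vector (1 at the query node, 0 elsewhere), and $1-c$ is the restart probability.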
19
Q: Given query i, how do we solve for its ranking vector from the adjacency matrix and the starting vector?
20
OnTheFly:
[Figure: the 12-node example graph; the query's ranking vector is still unknown.]
21
OnTheFly: iterate the RWR equation on the example graph until it converges
– No pre-computation / light storage
– Slow on-line response: O(mE) for m iterations over E edges
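OnTheFly needs nothing beyond the graph itself. A pure-Python sketch, assuming an edge-list input of (i, j, weight) triples for directed links i -> j (a hypothetical representation chosen for illustration), makes the O(mE) cost visible: every iteration is a single pass over the E edges.

```python
def on_the_fly_rwr(edges, n, query, c=0.9, tol=1e-10, max_iter=1000):
    """Iterate r <- c * W r + (1 - c) * e on an edge list.

    edges: (i, j, weight) triples for directed links i -> j.
    Returns (r, m): the ranking vector and the number of iterations used."""
    out_weight = [0.0] * n
    for i, _, w in edges:
        out_weight[i] += w
    r = [0.0] * n
    r[query] = 1.0
    for m in range(1, max_iter + 1):
        new = [0.0] * n
        new[query] = 1.0 - c                   # restart with probability 1 - c
        for i, j, w in edges:                  # one O(E) pass per iteration
            new[j] += c * r[i] * w / out_weight[i]
        diff = sum(abs(a - b) for a, b in zip(new, r))
        r = new
        if diff < tol:                         # L1 change small enough: stop
            return r, m
    return r, max_iter


edges = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 1.0), (2, 1, 1.0)]  # path 0 - 1 - 2
r, m = on_the_fly_rwr(edges, 3, query=0)
print([round(x, 4) for x in r])  # [0.3132, 0.4737, 0.2132]
```

Nodes without out-links never appear as a source in the edge list, so no division by zero occurs; their probability mass is simply not propagated.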
22
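NB_Lin [Tong+ ICDM06]
Pre-compute stage
– Step 1: partition the graph into clusters (C1, C2, C3 in the example)
– Step 2: approximate the cross-cluster links by a low-rank product, $\tilde{W}_2 \approx U S V$
On-line stage
– 2 matrix-vector multiplications
Fast on-line response if …, but with side information the desired (refined) graph is unknown before the feedback arrives.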
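A dense NumPy sketch of the NB_Lin idea, as we read [Tong+ ICDM06]: split W into within-cluster links W1 and cross-cluster links W2, approximate W2 by a truncated SVD U S V (using SVD here is our assumption), pre-compute Q1 = (I - cW1)^-1 and Λ = (S^-1 - c V Q1 U)^-1, and answer a query with r_i = (1 - c)(Q1 e_i + c Q1 U Λ V Q1 e_i), which follows from the Sherman-Morrison-Woodbury identity. The cluster labels are assumed to be given; a real implementation would use a graph partitioner and sparse, per-cluster inverses.

```python
import numpy as np


def transition_matrix(A):
    """W[j, i] = Pr(step i -> j); A[i, j] is the weight of link i -> j."""
    out = A.sum(axis=1)
    W = np.zeros(A.shape)
    nz = out > 0
    W[:, nz] = (A[nz, :] / out[nz, None]).T
    return W


def nb_lin_precompute(W, labels, c=0.9, rank=None, tol=1e-12):
    """Pre-compute stage. labels[i] is node i's cluster (assumed to be given)."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    W1 = np.where(same, W, 0.0)               # within-cluster links
    W2 = W - W1                               # cross-cluster links
    U, s, Vt = np.linalg.svd(W2)              # W2 ~= U diag(s) V after truncation
    k = int((s > tol).sum())                  # drop numerically zero singular values
    if rank is not None:
        k = min(k, rank)                      # optional low-rank approximation
    U, S, V = U[:, :k], np.diag(s[:k]), Vt[:k, :]
    n = W.shape[0]
    Q1 = np.linalg.inv(np.eye(n) - c * W1)    # block-diagonal (up to node order)
    Lam = np.linalg.inv(np.linalg.inv(S) - c * V @ Q1 @ U)   # k x k
    return Q1, U, V, Lam


def nb_lin_query(Q1, U, V, Lam, i, c=0.9):
    """On-line stage: only matrix-vector products with pre-computed factors."""
    q = Q1[:, i]                              # Q1 e_i
    return (1 - c) * (q + c * (Q1 @ (U @ (Lam @ (V @ q)))))


# Hypothetical graph: two triangles {0, 1, 2} and {3, 4, 5} joined by edge 2 - 3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
W = transition_matrix(A)
factors = nb_lin_precompute(W, labels=[0, 0, 0, 1, 1, 1])
r = nb_lin_query(*factors, i=0)
exact = 0.1 * np.linalg.solve(np.eye(6) - 0.9 * W, np.eye(6)[0])
print(np.allclose(r, exact))  # True (no rank truncation here)
```

Without rank truncation the result matches the exact RWR vector; passing a small `rank` trades accuracy for smaller pre-computed factors.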
23
How to rescue: Fast-ProSIN
[Figure: the graph before and after refinement.]
There is a lot of overlap between the two!
– Pre-compute on the original graph
– Update in the on-line stage
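The slides only say to pre-compute on the original graph and update on-line. One way to do that, assumed here and not necessarily the paper's exact update, is to note that ProSIN changes the out-links of only a few nodes (the source, the negative nodes and their neighbors), so the refined transition matrix is W_new = W + X Y^T, where Y selects those k columns. With Q = (I - cW)^-1 pre-computed on the original graph (padded with an isolated sink node), the Sherman-Morrison-Woodbury identity gives (I - cW_new)^-1 = Q + c Q X (I - c Y^T Q X)^-1 Y^T Q, so a query needs one product Q X and a k x k solve instead of a new n x n inversion. For clarity the sketch stores Q densely, which is only practical for small graphs.

```python
import numpy as np


def transition_matrix(A):
    """W[j, i] = Pr(step i -> j); A[i, j] is the weight of link i -> j."""
    out = A.sum(axis=1)
    W = np.zeros(A.shape)
    nz = out > 0
    W[:, nz] = (A[nz, :] / out[nz, None]).T
    return W


def precompute(A, c=0.9):
    """Off-line: Q = (I - c W)^{-1} on the original graph, with an isolated
    sink node appended so the refined graph has the same size."""
    n = A.shape[0]
    A_pad = np.zeros((n + 1, n + 1))
    A_pad[:n, :n] = A
    W = transition_matrix(A_pad)
    return W, np.linalg.inv(np.eye(n + 1) - c * W)


def fast_query(W, Q, W_new, s, c=0.9):
    """On-line: RWR vector for source s on the refined graph W_new = W + X Y^T,
    where Y selects the k columns (nodes) whose out-links changed."""
    changed = np.flatnonzero(np.any(W_new != W, axis=0))
    X = (W_new - W)[:, changed]                          # n x k
    q = Q[:, s]                                          # Q e_s
    QX = Q @ X                                           # n x k
    small = np.eye(len(changed)) - c * QX[changed, :]    # I - c Y^T Q X (k x k)
    corr = QX @ np.linalg.solve(small, q[changed])       # Q X (...)^{-1} Y^T Q e_s
    return (1 - c) * (q + c * corr)


# Hypothetical undirected toy graph: 3 - 1 - 0 - 2 - 4
A = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (1, 3), (2, 4)]:
    A[i, j] = A[j, i] = 1.0
W, Q = precompute(A)

# Refined graph built by hand for illustration: negative feedback on node 1,
# so node 1 and its neighbors 0 and 3 each get a unit link to the sink (index 5).
R = np.zeros((6, 6))
R[:5, :5] = A
R[[1, 0, 3], 5] = 1.0
W_new = transition_matrix(R)

r_fast = fast_query(W, Q, W_new, s=0)
r_exact = 0.1 * np.linalg.solve(np.eye(6) - 0.9 * W_new, np.eye(6)[0])
print(np.allclose(r_fast, r_exact))  # True
```

The on-line cost grows with k, the number of nodes whose out-links change, which stays small when the user gives feedback on only a few nodes.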
25
Experimental Setup
Data sets
– DBLP-AC: author-conference bipartite graph; 400K authors, 3.5K conferences, 1M edges
– DBLP-ML: co-authorship graph from ICML and NIPS; 4.5K nodes, 20K edges
– Corel: image-region-keyword graph; 52K nodes, 350K edges
We want to check
– the effectiveness of ProSIN
– the efficiency of Fast-ProSIN
26
Interactive Neighborhood Search
What are the most related conferences w.r.t. KDD? (DBLP author-conference bipartite graph)
– Initial results: ICDM, ICML, SDM, VLDB, ICDE, SIGMOD, NIPS, PKDD, IJCAI, PAKDD
– 'No' to ICML: ICDM, SDM, PKDD, ICDE, VLDB, SIGMOD, PAKDD, CIKM, SIGIR, WWW
– 'Yes' to SIGIR: SIGIR, TREC, CIKM, ECIR, CLEF, ICDM, JCDL, VLDB, ACL, ICDE
There are two main sub-communities in KDD: databases vs. statistics. Negative feedback on ICML excludes the other statistics conferences (NIPS, IJCAI). Positive feedback on SIGIR brings in more IR conferences.
29
Connection Subgraph: Initial Result (between "Andrew McCallum" and "Yiming Yang")
[Figure: connection subgraph with nodes Andrew McCallum, Yiming Yang, Tom M. Mitchell, Seán Slattery, Rayid Ghani, Xuerui Wang, Rebecca Hutchinson, Jian Zhang, Zoubin Ghahramani, John D. Lafferty; areas: Text Mining, Information Retrieval, Statistics.]
There are two main connections between McCallum and Yang.
30
Connection Subgraph: After Feedback (between "Andrew McCallum" and "Yiming Yang", but avoiding "Tom M. Mitchell")
[Figure: connection subgraph with nodes Andrew McCallum, Yiming Yang, Michael I. Jordan, Xiaojin Zhu, Rong Jin, Andrew Ng, Jian Zhang, Zoubin Ghahramani, John D. Lafferty, Fernando C.N. Pereira.]
The feedback steers the subgraph away from the entire 'Text' connection and brings in more connections through 'Statistics'.
31
Automatic Image Caption
[Figure: image-region-keyword graph with a test image; keyword nodes include Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass.]
Q: How to assign keywords to the test image?
32
Semi-automatic Image Caption (precision)
The 5 keywords most relevant to the test image are returned for the user's yes/no confirmation.
[Figure: precision vs. predict length for our method, the baseline, linear combination, and removing negative nodes.]
33
Semi-automatic Image Caption (recall)
[Figure: recall vs. predict length for our method, the baseline, linear combination, and removing negative nodes.]
34
Fast-ProSIN: Quality-Speed Trade-off
[Figure: precision, recall, and time.]
– 93.0%+ quality preserved
– Up to 49x speed-up
35
Conclusion
Goal: incorporate users' feedback (like/dislike) into proximity measurement on graphs
Q: How to customize Tom's applications?
A: ProSIN
– Basic idea: bias the random walk
– Wide applicability, easy to use
Q: How to reflect Tom's real-time interest?
A: Fast-ProSIN
– Basic idea: exploit smoothness
– Significant speed-up (minutes to seconds)
36
Q & A
Thank you!
htong@cs.cmu.edu hqu@us.ibm.com jamjoom@us.ibm.com