Proximity in Graphs by Using Random Walks. Many of the slides are borrowed from Dr. Hanghang Tong's talk slides and Dr. Jure Leskovec's lecture notes.
Proximity on Graphs: What is the proximity between A and B? For example, ‘how close is Smith to Johnson?’
Proximity on Graphs: Why? Applications include link prediction, ranking, email management, image captioning, neighborhood formulation, connection subgraphs, pattern matching, collaborative filtering, and many more.
Link Prediction: How can we predict the existence of a link? Use proximity [Liben-Nowell+ 2003].
Center-Piece Subgraph (CEPS): Given Q query nodes, find the center-piece ( ). Input of CEPS: Q query nodes; a budget b; and k, the SoftAnd coefficient. Applications: social networks, law enforcement, gene networks, …
Example of CEPS
CEPS: Overview. (1) Individual score calculation: measure importance w.r.t. each individual query node. (2) Combine individual scores: measure importance w.r.t. the whole query set. (3) “Extract” Alg.: … the connection subgraphs.
Issue: the ‘degree-1 node’ effect [Faloutsos+] [Koren+]. In both example graphs Esc_Prob(a->b) = 1: the degree-1 nodes (E, F) have no influence! This is known as the ‘pizza delivery guy’ problem in undirected graphs.
RWR: Individual Score Calculation. Goal: an individual importance score r(i, j) = r_{i,j} for each node j w.r.t. each query node i. How: random walk with restart (RWR); the score is the steady-state probability.
An Illustrating Example: starting from node 1, the walk moves to a random neighbor at each step and, with some probability p, returns to node 1. The score of node j is Prob(RW will finally stay at j). [Figure: example graph with nodes 1 to 13]
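A minimal sketch of the RWR score computation described above, in pure Python. Assumptions (not from the slides): an unweighted, undirected graph stored as an adjacency dict with no isolated nodes, uniform moves to neighbors, a hypothetical restart probability `restart` = 0.1, and power iteration until the L1 change drops below `tol`. The name `rwr` is hypothetical, and the table of individual scores was not necessarily produced with these settings.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]


def rwr(graph: Graph, query: int, restart: float = 0.1,
        tol: float = 1e-12, max_iter: int = 10_000) -> Dict[int, float]:
    """Steady-state probabilities of a random walk with restart to `query`.

    Each step the walker jumps back to `query` with probability `restart`;
    otherwise it moves to a uniformly random neighbor of its current node.
    """
    r = {v: 0.0 for v in graph}
    r[query] = 1.0                      # all mass starts at the query node
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in graph}
        for u, nbrs in graph.items():
            share = (1.0 - restart) * r[u] / len(nbrs)
            for v in nbrs:              # spread u's remaining mass over neighbors
                nxt[v] += share
        nxt[query] += restart           # restarted mass returns to the query
        change = sum(abs(nxt[v] - r[v]) for v in graph)
        r = nxt
        if change < tol:                # L1 change small enough: converged
            break
    return r


path = {0: [1], 1: [0, 2], 2: [1]}      # hypothetical example: path 0 - 1 - 2
scores = rwr(path, query=0)
print(scores)
```

On this path with query 0 the fixed point is r = (119/380, 9/19, 81/380): node 1 outranks the query node itself, because every step away from node 0 or node 2 lands on it. Raw steady-state probabilities therefore mix proximity with connectivity.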
Individual Score Calculation (RWR score r(i, j) of each node j for query nodes Q1, Q2, Q3)
Node   Q1      Q2      Q3
1      0.5767  0.0088  0.0088
2      0.1235  0.0076  0.0076
3      0.0283  0.0283  0.0283
4      0.0076  0.1235  0.0076
5      0.0088  0.5767  0.0088
6      0.0076  0.0076  0.1235
7      0.0088  0.0088  0.5767
8      0.0333  0.0024  0.1260
9      0.1260  0.0024  0.0333
10     0.1260  0.0333  0.0024
11     0.0333  0.1260  0.0024
12     0.0024  0.1260  0.0333
13     0.0024  0.0333  0.1260
Variant: escape probability. Define a random walk (RW) on the graph. Esc_Prob(A->B) = Prob(starting at A, the walk reaches B before returning to A). [Figure: A and B within the remaining graph] In short, Esc_Prob = Pr(smile before cry).
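The escape probability can be computed from absorption probabilities: let h(x) be the probability that a walk from x hits B before A, with h(A) = 0 and h(B) = 1. Esc_Prob(A->B) is then the average of h over A's neighbors. A minimal sketch, assuming an unweighted, undirected, connected graph in adjacency-dict form, with a simple Gauss-Seidel iteration in place of an exact linear solve; `escape_probability` and `undirected` are hypothetical helpers. It also illustrates one reading of the degree-1 remark above: hanging degree-1 nodes E and F off an intermediate node leaves Esc_Prob unchanged.

```python
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, List[str]]


def undirected(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Adjacency lists for an unweighted, undirected edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, []).append(v)
        g.setdefault(v, []).append(u)
    return g


def escape_probability(graph: Graph, a: str, b: str,
                       tol: float = 1e-12, max_iter: int = 100_000) -> float:
    """Esc_Prob(a->b): Pr(a walk from a reaches b before returning to a)."""
    # h[x] = Pr(walk from x hits b before a), with h[a] = 0 and h[b] = 1.
    h = {x: 0.0 for x in graph}
    h[b] = 1.0
    for _ in range(max_iter):
        delta = 0.0
        for x, nbrs in graph.items():
            if x in (a, b):
                continue                # boundary values stay fixed
            new = sum(h[y] for y in nbrs) / len(nbrs)
            delta = max(delta, abs(new - h[x]))
            h[x] = new                  # Gauss-Seidel: reuse fresh values
        if delta < tol:
            break
    # Leave a by one uniform step, then hit b before coming back to a.
    return sum(h[y] for y in graph[a]) / len(graph[a])


# Hanging degree-1 nodes E, F off the intermediate node C.
plain = undirected([("A", "C"), ("C", "B")])
with_leaves = undirected([("A", "C"), ("C", "B"), ("C", "E"), ("C", "F")])
print(escape_probability(plain, "A", "B"), escape_probability(with_leaves, "A", "B"))
```

Both calls give 0.5 (the second up to the iteration tolerance): a step from C into E or F bounces straight back to C, which acts like a self-loop and does not change which of A or B is hit first.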
AND: Combine Scores. Q: How do we combine the individual scores? A: Multiply them: … = prob. that the 3 random particles coincide on node j.
K_SoftAnd: Combine Scores. Generalization, SoftAND: we want nodes close to k of the Q query nodes (k < Q). Q: How do we do that? A: Prob(at least k out of the Q particles meet each other at j).
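K_SoftAnd can be computed with a short dynamic program over how many particles land on node j. A minimal sketch, assuming (the slides do not spell this out) that the Q particles move independently, so particle i is at j with probability r(i, j); `k_softand` is a hypothetical name. With k = Q it reduces to the AND product, and with k = 1 to OR.

```python
from typing import Sequence


def k_softand(scores: Sequence[float], k: int) -> float:
    """Pr(at least k of Q independent particles are at node j), where
    scores[i] = r(i, j) is node j's RWR score for query node i."""
    if not 1 <= k <= len(scores):
        raise ValueError("need 1 <= k <= Q")
    dist = [1.0]          # dist[m] = Pr(exactly m particles so far are at j)
    for p in scores:
        new = [0.0] * (len(dist) + 1)
        for m, prob in enumerate(dist):
            new[m] += prob * (1.0 - p)   # this particle is elsewhere
            new[m + 1] += prob * p       # this particle is at j
        dist = new
    return sum(dist[k:])


node3 = [0.0283, 0.0283, 0.0283]         # node 3's row of the individual-score table
print(k_softand(node3, 3), k_softand(node3, 2), k_softand(node3, 1))
```

The table `dist` is the distribution of the number of particles at j, so one pass over the Q scores serves every k at once; only the final tail sum depends on k.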
K_SoftAnd: Relaxation of AND. With disconnected communities or noise among the query nodes, what happens if we ask an AND query? No answer!
AND query vs. K_SoftAnd query. [Figure: And Query (scores ×1e-4) vs. 2_SoftAnd Query]
1_SoftAnd query = OR query
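Reading the meeting probability as if the Q particles moved independently (an assumption, not stated on the slides), the two extremes of K_SoftAnd have closed forms. OR is the complement of "no particle is at j":

$$
\text{AND } (k = Q):\; r(Q, j) = \prod_{i=1}^{Q} r(i, j),
\qquad
\text{OR } (k = 1):\; r(Q, j) = 1 - \prod_{i=1}^{Q} \bigl(1 - r(i, j)\bigr)
$$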
Measuring Importance: Individual Scores → Combining Scores. The individual scores are the RWR steady-state probabilities from the table above; combining them per node gives the K_SoftAnd meeting probability (And, 2_SoftAnd, or OR). Combined scores, one row per group of nodes that share the same value by symmetry:
Nodes      And (×1e-4)   2_SoftAnd
1, 5, 7    0.4505        0.0103
2, 4, 6    0.0710        0.0019
3          0.2267        0.0024
8 to 13    0.1010        0.0046
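The combined values can be recomputed from the individual-score table with the closed forms for three query nodes: AND = p1·p2·p3, and, by summing the exactly-two and all-three cases, P(at least 2 of 3) = p1p2 + p1p3 + p2p3 − 2·p1p2p3. A minimal sketch, assuming independent particles and one representative node per symmetric group (nodes 1, 5, 7 share a score, and so on); the inputs are the rounded four-digit table entries, so the output need not match the slide digit for digit.

```python
# Individual RWR scores (Q1, Q2, Q3) of one representative node per
# symmetric group, copied from the four-digit individual-score table.
rows = {
    "1, 5, 7": (0.5767, 0.0088, 0.0088),
    "2, 4, 6": (0.1235, 0.0076, 0.0076),
    "3": (0.0283, 0.0283, 0.0283),
    "8 to 13": (0.0333, 0.0024, 0.1260),
}


def and_score(p1: float, p2: float, p3: float) -> float:
    """All three particles meet at the node."""
    return p1 * p2 * p3


def two_softand(p1: float, p2: float, p3: float) -> float:
    """At least two of three independent particles meet at the node:
    the exactly-two terms plus the all-three term simplify to this."""
    return p1 * p2 + p1 * p3 + p2 * p3 - 2 * p1 * p2 * p3


combined = {g: (and_score(*p), two_softand(*p)) for g, p in rows.items()}
for group, (a, b) in combined.items():
    print(f"nodes {group}: And = {a * 1e4:.4f} x 1e-4, 2_SoftAnd = {b:.4f}")
```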
“Extract” Alg. Goal: maximize the total score while keeping ‘appropriate’ connections. How: a greedy algorithm that repeatedly (1) picks up the most promising node and (2) finds the ‘best’ path to it, using dynamic programming. [Figure: example graphs with numbered nodes]
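A minimal sketch of the greedy "Extract" loop outlined above, under simplifying assumptions that are not taken from the slides: the graph is unweighted, undirected, and connected; `score[v]` is node v's combined (e.g. K_SoftAnd) score; and the ‘best’ path is searched only among shortest paths, with a dynamic program that maximizes the total score of nodes not yet in the subgraph. The original algorithm's path DP may differ. `best_path` and `extract` are hypothetical names, and the loop can overshoot the budget b by the length of the last paths added.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, List[int]]


def best_path(graph: Graph, src: int, dst: int,
              score: Dict[int, float], chosen: Set[int]) -> List[int]:
    """Among shortest src->dst paths, return one that maximizes the total
    score of nodes not already in `chosen` (DP over the BFS layers)."""
    dist = {src: 0}
    order = [src]
    queue = deque([src])
    while queue:                              # BFS: hop distances from src
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                order.append(v)
                queue.append(v)
    if dst not in dist:
        return []

    def gain(v: int) -> float:                # only new nodes add score
        return 0.0 if v in chosen else score[v]

    best = {src: gain(src)}
    parent: Dict[int, int] = {}
    for u in order:                           # BFS order is a topological order
        for v in graph[u]:
            if dist[v] == dist[u] + 1:        # edge lies on a shortest path
                cand = best[u] + gain(v)
                if v not in best or cand > best[v]:
                    best[v], parent[v] = cand, u
    path = [dst]
    while path[-1] != src:                    # walk parents back to src
        path.append(parent[path[-1]])
    return path[::-1]


def extract(graph: Graph, queries: List[int],
            score: Dict[int, float], budget: int) -> Set[int]:
    """Greedily grow a subgraph: pick the best-scoring new node, then
    connect it to every query node by its 'best' path."""
    chosen: Set[int] = set(queries)
    while len(chosen) < budget:
        candidates = [v for v in graph if v not in chosen]
        if not candidates:
            break
        dest = max(candidates, key=lambda v: score[v])
        for q in queries:
            path = best_path(graph, q, dest, score, chosen)
            if not path:
                raise ValueError("graph must be connected")
            chosen.update(path)
    return chosen
```

Restricting the search to shortest paths keeps every candidate path simple, so no node's score is counted twice; a longer but higher-scoring detour would need a different DP.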
Case Study: AND query
2_SoftAnd query [Figure labels: Database, Statistic]