Nonparametric Link Prediction in Dynamic Graphs


1 Nonparametric Link Prediction in Dynamic Graphs
Deepayan Chakrabarti (UT Austin)

2 Collaborators Purnamrita Sarkar (UT Austin)
Michael Jordan (UC Berkeley)

3 Link Prediction
Who is most likely to interact with a given node?
Friend suggestion in Facebook: should Facebook suggest Alice as a friend for Bob?
Here is your current friend list; I want to suggest friends for you. This is a link prediction problem. Here are the movies you have liked; I want to suggest new movies to you. This is also a link prediction problem. There are a variety of such problems.

4 Link Prediction
Movie recommendation in Netflix: should Netflix suggest this movie to Alice?

5 Link Prediction
Prediction using features: degree of a node, number of common neighbors, number of short paths. These are features of the latest “snapshot” of the network, but the network is dynamic, and this can be problematic.
In general this is answered using heuristics. For example, predict the pair connected via the minimum number of hops, or predict the pair with the maximum number of common neighbors. In fact, Facebook mentions the number of common neighbors in its friend suggestions. Often it is important to look at the features of the common neighbors. For example, a very prolific common neighbor gives much less information about the similarity between two nodes, whereas a less prolific common neighbor indicates that the nodes are likely to be part of a tight niche. The Adamic-Adar score weights the more popular common neighbors less.
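The heuristics above can be sketched in a few lines. This is a minimal illustration on a hypothetical toy graph, not code from the paper:

```python
import math

# Toy undirected graph as an adjacency dict (illustrative data only).
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

def common_neighbors(i, j):
    """Number of common neighbors of i and j."""
    return len(adj[i] & adj[j])

def adamic_adar(i, j):
    """Adamic-Adar score: sum of 1/log(degree) over common neighbors,
    so popular (high-degree) common neighbors count for less."""
    return sum(1.0 / math.log(len(adj[z])) for z in adj[i] & adj[j])
```

For the pair (A, D), the single common neighbor C has degree 3, so the Adamic-Adar score is 1/ln(3) ≈ 0.91.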

6 Link Prediction
Historical pattern of network evolution: given the time series of snapshots up to the latest one, predict tomorrow’s network.

7 Link Prediction
Prediction using simple features: degree of a node, number of common neighbors, number of short paths. What if the network is dynamic? Can we use the time series of networks?

8 Related Work
Generative models: exponential-family random graph models [Hanneke+/06]; dynamics in latent space [Sarkar+/05]; extensions of mixed-membership block models [Fu+/10].
Other approaches: autoregressive models for links [Huang+/09]; extensions of static features [Tylenda+/09].

9 Goal
Link prediction incorporating graph dynamics, requiring weak modeling assumptions, allowing fast predictions, and offering consistency guarantees.

10 Outline Model Estimator Consistency Scalability Experiments

11 The Link Prediction Problem in Dynamic Graphs
Given snapshots G1, G2, …, GT with observed edge indicators Y1(i,j) = 1, Y2(i,j) = 0, …, predict YT+1(i,j) = ?
Y_{T+1}(i,j) | G_1, G_2, …, G_T ~ Bernoulli( g_{G_1,…,G_T}(i,j) )
The edge indicator at time T+1 depends on features of the previous graphs and this pair of nodes.

12 Including graph-based features
Example set of features for pair (i,j): cn(i,j) (common neighbors), ℓℓ(i,j) (last time a link was formed), deg(j).
Represent dynamics using “datacubes” of these features ≈ a multi-dimensional histogram on binned feature values, e.g. the cell 1 ≤ cn ≤ 3, 3 ≤ deg ≤ 6, 1 ≤ ℓℓ ≤ 2.
ηt = #pairs of nodes in Gt with these features
ηt+ = #pairs in Gt with these features which had an edge in Gt+1
A high ηt+/ηt means this feature combination is more likely to create a new edge at time t+1.
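A datacube is just a histogram over binned feature cells, with two counts per cell. A minimal sketch (the bin edges and feature tuples are illustrative, not the paper’s):

```python
from collections import defaultdict

def bin_feature(value, edges):
    """Map a raw feature value to a bin index given sorted bin edges."""
    for k, e in enumerate(edges):
        if value <= e:
            return k
    return len(edges)

def build_datacube(pairs_t, edges_next):
    """pairs_t: dict mapping a node pair to its feature tuple (cn, ll, deg) at time t.
    edges_next: set of pairs that have an edge at time t+1.
    Returns cell -> [eta, eta_plus]."""
    cube = defaultdict(lambda: [0, 0])
    for pair, feats in pairs_t.items():
        cell = tuple(bin_feature(v, [2, 5]) for v in feats)  # illustrative bin edges
        cube[cell][0] += 1          # eta: pairs with these features at time t
        if pair in edges_next:
            cube[cell][1] += 1      # eta+: ...which formed an edge at time t+1
    return cube
```

The ratio eta_plus/eta per cell is then the empirical probability that this feature combination produces a new edge.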

13 Including graph-based features
How do we form these datacubes? Vanilla idea: one datacube per transition Gt→Gt+1, aggregated over all pairs (i,j). This does not allow for differently evolving communities.

14 Our Model
How do we form these datacubes? Our model: one datacube for each neighborhood, which captures local evolution.

15 Our Model
Neighborhood Nt(i) = nodes within 2 hops of i at time t. Features are extracted from (Nt-p, …, Nt) to form the datacube:
ηt(s) = number of node pairs with feature s in the neighborhood of i at time t
ηt+(s) = number of those pairs which got connected at time t+1
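Extracting a 2-hop neighborhood is a two-step breadth-first expansion; a small sketch (adjacency-dict representation is an assumption for illustration):

```python
def two_hop_neighborhood(adj, i):
    """Nodes within 2 hops of i (including i itself), via two rounds of
    breadth-first expansion over an adjacency dict."""
    frontier = {i}
    seen = {i}
    for _ in range(2):
        frontier = {v for u in frontier for v in adj.get(u, ())} - seen
        seen |= frontier
    return seen
```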

16 Our Model
Datacube dt(i) captures graph evolution in the local neighborhood of a node in the recent past.
Model: Y_{T+1}(i,j) | G_1, G_2, …, G_T ~ Bernoulli( g(d_t(i), s_t(i,j)) ), where the datacube d_t(i) encodes local evolution patterns and s_t(i,j) the features of the pair.
How can we estimate g(·)?

17 Outline Model Estimator Consistency Scalability Experiments

18 Kernel Estimator for g
Collect (datacube, feature) pairs at each timestep t = 1, 2, 3, … from G1, G2, …, GT-2, GT-1, GT.
Query: the datacube at T−1 and the feature vector at time T; find similar historical situations.

19 Kernel Estimator for g
Find similar historical situations, then see what happened next.

20 Kernel Estimator for g
Factorize the similarity function: similarity((d, s), (d′, s′)) = K(d, d′) · I{s == s′}. This allows computation of the estimate via simple lookups.

21 Kernel Estimator for g
Compute similarities only between datacubes: weights w1, w2, w3, w4 for the historical datacubes at t = 1, 2, ….

22 Kernel Estimator for g
Look at the datacubes from the next timestep.

23 Kernel Estimator for g
Look up the probability of edge formation.

24 Kernel Estimator for g
Look up the probability of edge formation: the matching cells contribute counts (η1, η1+), (η2, η2+), (η3, η3+), (η4, η4+), weighted by the datacube similarities w1, w2, w3, w4.
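Once the weights and counts are looked up, the estimate is a weighted ratio of counts, in the spirit of a Nadaraya-Watson estimator. A hedged sketch of this aggregation step (the exact weighting in the paper may differ in detail):

```python
def kernel_estimate(weights, counts):
    """weights: kernel similarities w_i between the query datacube and each
    historical datacube; counts: the matching (eta_i, eta_plus_i) lookups.
    Returns g_hat = sum_i w_i * eta_plus_i / sum_i w_i * eta_i."""
    num = sum(w * ep for w, (e, ep) in zip(weights, counts))
    den = sum(w * e for w, (e, ep) in zip(weights, counts))
    return num / den if den > 0 else 0.0
```

With weights [1.0, 0.5] and counts [(10, 5), (4, 1)], the estimate is (5 + 0.5) / (10 + 2) ≈ 0.458.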

25 Kernel Estimator for g
Factorize the similarity function: similarity((d, s), (d′, s′)) = K(d, d′) · I{s == s′}, allowing computation via simple lookups. But what is K(d, d′)?

26 Similarity between two datacubes
Idea 1: for each cell s, take (η1+/η1 − η2+/η2)² and sum over cells.
Problem: the magnitude of η is ignored; 5/10 and 50/100 are treated equally. Instead, consider the distribution.

27 Similarity between two datacubes
Idea 2: for each cell s, compute the posterior distribution of the edge-creation probability; dist = total variation distance between the two posteriors, summed over all cells; K = b^dist with 0 < b < 1.
As b → 0, K(d, d′) → 0 unless dist(d, d′) = 0.
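This idea can be sketched concretely: treat each cell’s posterior as Beta(η+ + 1, η − η+ + 1), discretize it, and compare by total variation. The discretization grain and kernel base are illustrative choices, not the paper’s settings:

```python
def beta_hist(eta, eta_plus, bins=20):
    """Discretized posterior Beta(eta_plus+1, eta-eta_plus+1) over [0, 1]."""
    a, b = eta_plus + 1, eta - eta_plus + 1
    centers = [(k + 0.5) / bins for k in range(bins)]
    dens = [p ** (a - 1) * (1 - p) ** (b - 1) for p in centers]
    z = sum(dens)
    return [d / z for d in dens]

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))

def cube_kernel(cube1, cube2, base=0.3):
    """K = base ** (sum over cells of TV distance between posteriors)."""
    cells = set(cube1) | set(cube2)
    d = sum(tv_distance(beta_hist(*cube1.get(c, (0, 0))),
                        beta_hist(*cube2.get(c, (0, 0)))) for c in cells)
    return base ** d
```

Note that 5/10 and 50/100 now differ: Beta(6, 6) and Beta(51, 51) share a mean but not a spread, so their total variation distance is strictly positive, which is exactly the point of Idea 2.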

28 Kernel Estimator for g
Want to show: [consistency statement shown as an equation on the slide]

29 Outline Model Estimator Consistency Scalability Experiments

30 Consistency of Estimator
Lemma 1: As T→∞, for some R>0, [bound shown as an equation on the slide]. Proof using: as T→∞, [limit statements shown on the slide].

31 Consistency of Estimator
Lemma 2: As T→∞, [result shown as an equation on the slide].

32 Consistency of Estimator
Assumption: finite graph. Proof sketch: the dynamics are Markovian with finite state space → the chain must eventually enter a closed, irreducible communication class → geometric ergodicity if the class is aperiodic (if not, more complicated…) → strong mixing with exponential decay → variances decay as O(1/T).

33 Consistency of Estimator
Theorem: [stated as an equation on the slide]. Proof sketch: for some R>0, [bounds shown on the slide], and the result follows.

34 Outline Model Estimator Consistency Scalability Experiments

35 Scalability
Full solution: summing over all n datacubes for all T timesteps is infeasible.
Approximate solution: sum over the nearest neighbors of the query datacube. How do we find nearest neighbors? Locality Sensitive Hashing (LSH) [Indyk+/98, Broder+/98].

36 Using LSH
Devise a hashing function for datacubes such that “similar” datacubes tend to be hashed to the same bucket, where “similar” means a small total variation distance between the cells of the datacubes.

37 Using LSH
Step 1: map datacubes to bit vectors. Use B1 buckets to discretize [0,1], and B2 bits for each bucket: for probability mass p, roughly the first p·B2 bits are set to 1.
Total M·B1·B2 bits, where M = max number of occupied cells << total number of cells.

38 Using LSH
Total variation distance ∝ L1 distance between distributions ≈ Hamming distance between the bit vectors.
Step 2: hash function = choose k out of the M·B1·B2 bits.
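The unary encoding behind Step 1 can be sketched as follows; the rounding rule and B2 value are illustrative assumptions:

```python
def unary_encode(hist, B2=4):
    """Encode each bucket's probability mass p as B2 bits with roughly the
    first round(p * B2) bits set to 1 (a unary/thermometer code)."""
    bits = []
    for p in hist:
        n = round(p * B2)
        bits.extend([1] * n + [0] * (B2 - n))
    return bits

def hamming(u, v):
    """Hamming distance between two equal-length bit vectors."""
    return sum(x != y for x, y in zip(u, v))
```

For example, the distributions [0.5, 0.5] and [1.0, 0.0] have L1 distance 1; with B2 = 4 their encodings differ in 4 = 1·B2 bit positions, so Hamming distance tracks L1 distance up to the B2 scale factor.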

39 Fast Search Using LSH
[Figure: datacube bit vectors hashed into buckets 0000, 0001, …, 1111]

40 Outline Model Estimator Consistency Scalability Experiments

41 Experiments
Baselines (static, on G_T):
LL: last link (time of last occurrence of a pair)
CN: rank by number of common neighbors in G_T
AA: Adamic-Adar; more weight to low-degree common neighbors
Katz: accounts for longer paths
Baselines (static, on the union ∪G_t):
CN-all: apply CN to G_1 ∪ ⋯ ∪ G_T
AA-all, Katz-all: similar
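The Katz baseline scores a pair by a damped count of paths of all lengths. A minimal truncated version on plain nested lists (the damping factor and truncation length are illustrative):

```python
def katz_scores(adj_matrix, beta=0.05, max_len=4):
    """Truncated Katz index: S = sum_{l=1..max_len} beta^l * A^l,
    computed by repeated matrix multiplication."""
    n = len(adj_matrix)

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    power = [row[:] for row in adj_matrix]          # A^1
    score = [[beta * power[i][j] for j in range(n)] for i in range(n)]
    coef = beta
    for _ in range(max_len - 1):
        power = matmul(power, adj_matrix)           # A^(l+1)
        coef *= beta                                # beta^(l+1)
        for i in range(n):
            for j in range(n):
                score[i][j] += coef * power[i][j]
    return score
```

On the path graph 0-1-2, the direct neighbor pair (0,1) scores higher than the distance-2 pair (0,2), as the heavy damping of longer paths suggests.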

42 Setup
Pick a random subset S of nodes with degree > 0 in G_{T+1}. For each s ∈ S, predict a ranked list of nodes likely to link to s. Training data: G_1, …, G_T; test data: G_{T+1}. Report mean AUC (higher is better).
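The per-node AUC being averaged is the probability that a true future neighbor is ranked above a non-neighbor; a small sketch (ties counted as half):

```python
def auc(scores, labels):
    """AUC for one query node's ranked list: fraction of (positive, negative)
    pairs where the positive outscores the negative, ties counting 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The reported metric is then the mean of this quantity over all query nodes in S.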

43 Simulations
Social network model of Hoff et al.: each node has an independently drawn latent feature vector; edge (i,j) depends on the latent features of i and j.
Seasonality effect: feature importance varies with season → different communities in each season. Feature vectors evolve smoothly over time → evolving community structures.

44 Simulations
NonParam is much better than the others in the presence of seasonality; CN, AA, and Katz implicitly assume smooth evolution.

45 Sensor Network
[Results figure shown on slide]

46 Summary
Link formation is assumed to depend on the neighborhood’s evolution over a time window. This admits a kernel-based estimator, with consistency guarantees and scalability via LSH. It works particularly well for seasonal effects and differently evolving communities.

