Published by Brent Cox. Modified over 9 years ago.
1
Link Prediction. Topics in Data Mining, Fall 2015. Bruno Ribeiro
2
Overview
Deterministic Heuristic Methods
Matrix / Probabilistic Methods
Supervised Learning Approaches
© 2015 Bruno Ribeiro
3
Goal
Input: Snapshot of social network at time t
Output: Predict edges that will be added to the network at time t’ > t
4
Product Recommendation
Example of a link prediction application
[Figure: matrix with rows indexed by i and columns indexed by j, with example entries]
5
Twitter’s Who to Follow
Another example of a link prediction application
6
Other Applications
A few more examples:
Fraud detection (Beutel, Akoglu, Faloutsos, KDD’15): focuses on discovering surprising link patterns
Anomaly detection (Rattigan et al., 2005): focuses on finding unlikely existing links
7
Deterministic Heuristics
8
A Very Naïve Approach
Every entity is assigned a score
Score(v) = number of friends person v already has
9
General Approach
Assign a connection weight score(u, v) to each non-existing edge (u, v).
Rank the edges and recommend the ones with the highest score.
The score can be viewed as a measure of proximity or “similarity” between nodes u and v.
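The score-then-rank recipe can be sketched as below. This is a minimal illustration, not the author's code: the adjacency-dict representation, the toy graph, and the placeholder score `degree_sum` are all assumptions made for the example.

```python
from itertools import combinations


def rank_candidate_edges(adj, score):
    """Return all non-existing edges (u, v), highest score first."""
    candidates = [
        (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
    ]
    return sorted(candidates, key=lambda e: score(adj, e[0], e[1]), reverse=True)


def degree_sum(adj, u, v):
    # Hypothetical placeholder score; any proximity measure can be plugged in.
    return len(adj[u]) + len(adj[v])


# Toy undirected graph as an adjacency dict (illustrative data only).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

ranking = rank_candidate_edges(adj, degree_sum)
print(ranking[:2])  # the two highest-scoring candidate edges
```

Any scoring function with the same `(adj, u, v)` signature can replace `degree_sum` without touching the ranking code.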
10
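Shortest Path
score(u, v) = negated length of the shortest path between u and v.
All pairs of nodes that share one neighbor are at distance 2, so they tie for the top score and all of them will be linked.
Problem: the network diameter is often very small, so the score takes only a few distinct values.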
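A small sketch of the shortest-path score using breadth-first search. The toy graph and function name are assumptions for illustration. Note how several candidate pairs tie at -2.

```python
from collections import deque


def shortest_path_score(adj, u, v):
    """Negated BFS distance from u to v (-inf if v is unreachable)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return -dist[x]
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return float("-inf")


# Toy undirected graph (illustrative data only).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

for pair in [("a", "d"), ("b", "e"), ("c", "e"), ("a", "e")]:
    print(pair, shortest_path_score(adj, *pair))
```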
11
Common Neighbors
score(u, v) = |N(u) ∩ N(v)|, the number of common neighbors of vertices u and v, where N(x) is the set of neighbors of x.
The score can be high simply because the vertices have many neighbors.
12
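Jaccard Similarity
score(u, v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, where N(x) is the set of neighbors of x.
Fixes the “too many neighbors” problem of common neighbors by dividing the size of the intersection by the size of the union.
The score is between 0 and 1.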
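To make the contrast between common neighbors and Jaccard concrete, here is a minimal sketch. The edge list is made up: nodes x and y are high-degree, p and q are low-degree, and both pairs share exactly two neighbors.

```python
def build_adj(edges):
    """Build an undirected adjacency dict from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


# Illustrative data: x and y each have 6 neighbors, p and q each have 2.
edges = (
    [("x", f"n{i}") for i in range(1, 7)]
    + [("y", f"n{i}") for i in (1, 2, 7, 8, 9, 10)]
    + [("p", "n1"), ("p", "n2"), ("q", "n1"), ("q", "n2")]
)
adj = build_adj(edges)

print(common_neighbors(adj, "x", "y"), common_neighbors(adj, "p", "q"))
print(jaccard(adj, "x", "y"), jaccard(adj, "p", "q"))
```

Both pairs have the same common-neighbor count, but Jaccard ranks (p, q) far above (x, y) because the shared neighbors make up all of p's and q's neighborhoods.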
13
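Adamic / Adar
score(u, v) = sum over z ∈ N(u) ∩ N(v) of 1 / log |N(z)|, where N(x) is the set of neighbors of x.
This score gives more weight to common neighbors that are not shared with many others.
[Figure: Alice, Bob, and Carol. The common neighbor of Alice and Bob has 50 neighbors (too active): less evidence of a missing link between Alice and Bob. The common neighbor of Bob and Carol has 5 neighbors (less active): more evidence of a missing link between Bob and Carol.]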
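The Alice, Bob, and Carol situation can be reproduced in a few lines. This is a sketch under assumptions: the node names `hub` and `quiet` and the filler neighbors are invented so that one shared neighbor has 50 neighbors and the other has 5.

```python
import math


def build_adj(edges):
    """Build an undirected adjacency dict from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def adamic_adar(adj, u, v):
    # Each common neighbor z contributes 1 / log(degree of z).
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


# Hypothetical data: "hub" has 50 neighbors, "quiet" has 5.
edges = (
    [("hub", "alice"), ("hub", "bob")]
    + [("hub", f"h{i}") for i in range(48)]
    + [("quiet", "bob"), ("quiet", "carol")]
    + [("quiet", f"q{i}") for i in range(3)]
)
adj = build_adj(edges)

print(adamic_adar(adj, "alice", "bob"))   # shared neighbor is very active
print(adamic_adar(adj, "bob", "carol"))   # shared neighbor is less active
```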
14
Preferential Attachment
The probability of co-authorship of u and v is proportional to the product of the number of collaborators of u and v:
score(u, v) = |N(u)| · |N(v)|, where N(x) is the set of neighbors (collaborators) of x.
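As a quick illustration, the preferential attachment score only needs the two degrees. The toy graph below is made up.

```python
def preferential_attachment(adj, u, v):
    # Product of the two degrees; no shared structure is required.
    return len(adj[u]) * len(adj[v])


# Toy undirected graph (illustrative data only).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

print(preferential_attachment(adj, "a", "d"))
print(preferential_attachment(adj, "a", "e"))
```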
15
Katz (1953)
Exponentially weighted sum of the number of paths of length l between u and v:
score(u, v) = sum over l ≥ 1 of α^l · (number of paths of length l between u and v)
For small α << 1: predictions ≈ common neighbors
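A sketch of a truncated Katz score. Two assumptions are made here: "paths" are counted as walks (nodes may repeat, as in the matrix-power formulation), and the infinite sum is cut off at a hypothetical `max_len`. The toy graph is illustrative.

```python
def katz_score(adj, u, v, alpha=0.05, max_len=6):
    """Sum of alpha**l * (number of length-l walks from u to v), l = 1..max_len."""
    counts = {x: 0 for x in adj}  # counts[x] = walks of current length from u to x
    counts[u] = 1
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {x: 0 for x in adj}
        for x, c in counts.items():
            if c:
                for y in adj[x]:
                    nxt[y] += c
        counts = nxt
        score += alpha ** length * counts[v]
    return score


# Toy undirected graph (illustrative data only).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

# (a, d) has two common neighbors and (b, e) has one.
print(katz_score(adj, "a", "d", alpha=0.001))
print(katz_score(adj, "b", "e", alpha=0.001))
```

With a very small α the length-2 term dominates for non-adjacent pairs, so the ratio of the two scores is close to the ratio of their common-neighbor counts.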
16
Hitting Time
score(u, v) = −H_{u,v}, where H_{u,v} is the random walk hitting time between u and v (the expected number of steps for a random walk starting at u to first reach v)
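One way to compute the hitting time, sketched here under the assumption of a connected undirected graph with no isolated nodes, is to iterate the equations H(v, v) = 0 and H(x, v) = 1 + average of H(y, v) over neighbors y of x. The function names and the path graph are illustrative.

```python
def hitting_time(adj, u, v, tol=1e-12, max_iter=100_000):
    """Expected number of steps for a simple random walk from u to first reach v."""
    h = {x: 0.0 for x in adj}
    for _ in range(max_iter):
        new = {
            x: 0.0 if x == v else 1.0 + sum(h[y] for y in adj[x]) / len(adj[x])
            for x in adj
        }
        delta = max(abs(new[x] - h[x]) for x in adj)
        h = new
        if delta < tol:
            break
    return h[u]


def hitting_time_score(adj, u, v):
    # Negated so that closer pairs (short hitting time) rank higher.
    return -hitting_time(adj, u, v)


# Path graph a - b - c (illustrative data only).
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(hitting_time(path, "a", "c"))
print(hitting_time(path, "b", "c"))
```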
17
Personalized PageRank
Stationary probability that the walker is at v under the following random walk:
With probability α, jump back to u
With probability 1 – α, go to a random neighbor of the current node
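The random walk described above can be approximated by power iteration. This is a minimal sketch: the default `alpha`, the fixed iteration count, and the toy graph are assumptions, and every node is assumed to have at least one neighbor.

```python
def personalized_pagerank(adj, u, alpha=0.15, iterations=200):
    """Approximate stationary distribution of the walk that restarts at u."""
    p = {x: 1.0 if x == u else 0.0 for x in adj}
    for _ in range(iterations):
        nxt = {x: 0.0 for x in adj}
        nxt[u] += alpha * sum(p.values())  # probability mass that jumps back to u
        for x, mass in p.items():
            share = (1.0 - alpha) * mass / len(adj[x])
            for y in adj[x]:
                nxt[y] += share  # move to a uniformly random neighbor
        p = nxt
    return p


# Toy undirected graph (illustrative data only).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

ppr = personalized_pagerank(adj, "a")
# Candidate links for "a" are the non-neighbors, ranked by their probability.
print(sorted(((ppr[v], v) for v in ("d", "e")), reverse=True))
```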
18
SimRank
N(u) and N(v) are the sets of in-neighbors of nodes u and v; |N(u)| and |N(v)| are their in-degrees
Only directed graphs
score(u, v) ∊ [0, 1]. If u = v, then score(u, v) = 1
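A sketch of the usual iterative SimRank computation, in which two nodes are similar if their in-neighbors are similar. The decay constant (0.8 here), the iteration count, and the three-node graph are assumptions not given on the slide.

```python
def simrank(in_nbrs, decay=0.8, iterations=10):
    """Return a dict mapping (a, b) to the SimRank similarity of a and b."""
    nodes = list(in_nbrs)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    new[(a, b)] = 0.0  # no in-neighbors: no evidence of similarity
                else:
                    total = sum(
                        sim[(x, y)] for x in in_nbrs[a] for y in in_nbrs[b]
                    )
                    new[(a, b)] = decay * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return sim


# Directed toy graph: w -> u and w -> v (illustrative data only).
in_nbrs = {"w": set(), "u": {"w"}, "v": {"w"}}
sim = simrank(in_nbrs)
print(sim[("u", "v")])
```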
19
Latent Space Models
20
Low Rank Reconstruction
Represent the adjacency matrix A with a lower-rank matrix Ak: A ≈ Ak = V_{n⨉k} U_{k⨉n}.
If Ak(u,v) has a large value for a missing edge, A(u,v) = 0, then recommend link (u,v).
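One common way to obtain a rank-k matrix, assumed here because the slide does not name the factorization, is the truncated singular value decomposition. The symbols σ_i, x_i, and y_i are introduced only for this illustration.

```latex
A = \sum_{i=1}^{n} \sigma_i \, x_i y_i^{\top},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0,
\\[4pt]
A_k = \sum_{i=1}^{k} \sigma_i \, x_i y_i^{\top}
    = \underbrace{\begin{bmatrix} \sigma_1 x_1 & \cdots & \sigma_k x_k \end{bmatrix}}_{V_{n \times k}}
      \underbrace{\begin{bmatrix} y_1^{\top} \\ \vdots \\ y_k^{\top} \end{bmatrix}}_{U_{k \times n}},
\\[4pt]
\mathrm{score}(u, v) = A_k(u, v) \quad \text{for pairs with } A(u, v) = 0 .
```

Under this choice, A_k is the best rank-k approximation of A in the Frobenius norm, so large entries of A_k at positions where A is zero are the edges the low-rank structure "expects" but does not observe.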
21
Product Recommendations
Users have a propensity rate to buy certain categories of product (W)
Within a category, some products have a propensity rate to be bought (H)
(Non-negative matrix factorization)
[Figure: matrix with rows indexed by i and columns indexed by j, with example entries]
22
Example Euclidean Generative Model
Vertices are uniformly distributed in a latent unit hyperplane.
Vertices close in the latent space are more likely to be connected.
Probability of edge = f(distance in latent space)
The problem of link prediction is to infer distances in the latent space, i.e., find the nearest neighbor in latent space that is not currently linked.
[Figure: unit plane]
Hoff, P. D., Raftery, A. E., & Handcock, M. S. (2002). Latent space approaches to social network analysis. J. Amer. Stat. Assoc., 97(460), 1090–1098.
23
Bayesian Networks
Directed Graphical Models
Bayesian networks and Prob. Relational Models (Getoor et al., 2001; Neville & Jensen, 2003)
Capture the dependence of link existence on attributes of entities
Constrain probabilistic dependency to a directed acyclic graph
Undirected Graphical Models
Markov Networks
24
Example of Today’s Approaches
Whom To Follow and Why
Credit: N. Barbieri, F. Bonchi, G. Manco, KDD’14
25
Relational Markov Networks
A Relational Markov Network (RMN) specifies cliques and the potentials between attributes of related entities at the template level
A single model provides the distribution for the entire graph
Train the model to maximize the probability of the observed edges and labels
Use the trained model to predict edges and unknown attributes
26
Link Prediction Heuristic
Latent Space Rationale
[Figure labels: observed edges; generative model; model parameters θ (latent space); link prediction heuristic; node u; node v; question: most likely neighbor of node v?]
27
Relationship between Deterministic & Probabilistic methods?
Yes (Sarkar, Chakrabarti, Moore, COLT’10)
28
How do we justify these observations?
Empirically, especially if the graph is sparse
[Figure: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths]
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007
Credit: Sarkar, Chakrabarti, Moore
29
Raftery’s Model (cont)
Two sources of randomness:
Point positions: uniform in D-dimensional space
Linkage probability: logistic with parameters α, r
α, r and D are known
[Figure: logistic linkage probability versus distance, with maximum 1 and radius r; labels: higher probability of linking; α determines the steepness]
Credit: Sarkar, Chakrabarti, Moore
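The two sources of randomness can be simulated directly. The slide only says the linkage probability is "logistic with parameters α, r", so the exact form used below, 1 / (1 + exp(α (d − r))), is an assumption. The function names, the unit hypercube, and the parameter values are illustrative.

```python
import math
import random


def link_probability(d, alpha, r):
    """Assumed logistic form: 1/2 at distance r, steeper for larger alpha."""
    return 1.0 / (1.0 + math.exp(alpha * (d - r)))


def sample_latent_graph(n, dim, alpha, r, seed=0):
    """Sample n points uniformly in the unit hypercube, then sample the edges."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if rng.random() < link_probability(d, alpha, r):
                edges.add((i, j))
    return points, edges


points, edges = sample_latent_graph(n=30, dim=2, alpha=20.0, r=0.2, seed=1)
print(len(edges), "edges sampled")
```

Link prediction in this model amounts to guessing which unlinked pairs are close in the latent space, given only `edges`.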
30
Raftery’s Model: Probability of New Edge
Logistic function integrated over the volume determined by d_ij, the distance between points i and j
[Figure: points i and j]
Credit: Sarkar, Chakrabarti, Moore
31
Connection to Common Neighbors
Distinct radius per node gives Adamic/Adar
Empirical Bernstein bounds (Maurer & Pontil, 2009)
[Figure and equation labels: nodes i, j, k; radius r; number of common neighbors; network size N; a term that decreases with N]
32
Supervised Learning Approaches
Create a binary classifier that predicts whether an edge exists between two nodes
33
Types of Features and Classifiers
Types of features:
Label features: keywords, frequency of an item
Topological features: shortest distance, degree centrality, PageRank index
Classifiers:
SVM, Decision Trees, Deep Belief Network (DBN), K-Nearest Neighbors (KNN), Naive Bayes, Radial Basis Function (RBF), Logistic Regression
[Figure label: Link Prediction Heuristic]
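A sketch of how a training set for such a classifier might be assembled from a snapshot at time t and the edges that appear later. Only two of the topological features listed above are computed (shortest distance and the two node degrees). The function names, the toy graph, and the single "future" edge are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distance(adj, u, v):
    """Shortest-path length from u to v (inf if unreachable)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x]
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return float("inf")


def build_training_set(adj_t, future_edges):
    """One row per pair not linked at time t: (pair, features, label)."""
    rows = []
    for u, v in combinations(sorted(adj_t), 2):
        if v in adj_t[u]:
            continue  # already linked at time t, nothing to predict
        features = [bfs_distance(adj_t, u, v), len(adj_t[u]), len(adj_t[v])]
        label = 1 if frozenset((u, v)) in future_edges else 0
        rows.append(((u, v), features, label))
    return rows


# Snapshot at time t and one edge observed later (illustrative data only).
adj_t = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
future_edges = {frozenset(("a", "d"))}

rows = build_training_set(adj_t, future_edges)
for row in rows:
    print(row)
```

The feature vectors and labels can then be fed to any of the classifiers listed above. Heuristic scores such as common neighbors or Adamic/Adar are natural additional features.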