Nonparametric Link Prediction in Dynamic Graphs

Presentation transcript:

Nonparametric Link Prediction in Dynamic Graphs Deepayan Chakrabarti (UT Austin)

Collaborators Purnamrita Sarkar (UT Austin) Michael Jordan (UC Berkeley)

Link Prediction Who is most likely to interact with a given node? Friend suggestion in Facebook: should Facebook suggest Alice as a friend for Bob? Here is your current friend list; I want to suggest friends for you. This is a link prediction problem. Here are the movies you have liked; I want to suggest new movies to you. This is also a link prediction problem. There are a variety of such problems.

Link Prediction Movie recommendation in Netflix: should Netflix suggest this movie to Alice?

Link Prediction Prediction using features: degree of a node, number of common neighbors, number of short paths, … These are features of the latest "snapshot" of the network, but the network is dynamic, and this can be problematic. In general link prediction is answered using heuristics: for example, predict the pair connected via the minimum number of hops, or predict the pair with the maximum number of common neighbors. In fact, Facebook mentions the number of common neighbors in its friend suggestions. Often it is important to look at the features of the common neighbors: a very prolific common neighbor gives much less information about the similarity between two nodes, whereas a less prolific common neighbor indicates that the nodes are likely to be part of a tight niche. The Adamic/Adar score weights the more popular common neighbors less.
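As a concrete illustration of the heuristics named in these notes, here is a minimal sketch of common-neighbor and Adamic/Adar scoring on an adjacency-set snapshot; the toy graph and all names are illustrative, not from the talk.

```python
import math

# Toy snapshot as an adjacency-set dict; in practice this would be the
# latest graph snapshot G_T.
adj = {
    "alice": {"bob", "carol"},
    "bob":   {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave":  {"bob", "carol"},
}

def common_neighbors(i, j):
    """CN score: number of shared neighbors of i and j."""
    return len(adj[i] & adj[j])

def adamic_adar(i, j):
    """AA score: like CN, but down-weights prolific (high-degree)
    common neighbors, which carry less information about similarity."""
    return sum(1.0 / math.log(len(adj[z]))
               for z in adj[i] & adj[j] if len(adj[z]) > 1)

print(common_neighbors("alice", "dave"))  # 2 shared neighbors
print(adamic_adar("alice", "dave"))       # both shared neighbors are low-degree
```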

Link Prediction Historical pattern of network evolution: a time series of snapshots, ending in the latest snapshot. Goal: predict tomorrow's network.

Link Prediction Prediction using simple features: degree of a node, number of common neighbors, number of short paths, … What if the network is dynamic? Can we use the time series of networks?

Related Work Generative models: exponential family random graph models [Hanneke+/06], dynamics in latent space [Sarkar+/05], extensions of mixed membership block models [Fu+/10]. Other approaches: autoregressive models for links [Huang+/09], extensions of static features [Tylenda+/09].

Goal Link Prediction incorporating graph dynamics, requiring weak modeling assumptions, allowing fast predictions, and offering consistency guarantees.

Outline Model Estimator Consistency Scalability Experiments

The Link Prediction Problem in Dynamic Graphs Given snapshots G_1, G_2, …, G_T, G_{T+1} with observed edge indicators Y_1(i,j) = 1, Y_2(i,j) = 0, …, predict Y_{T+1}(i,j) = ?, where Y_{T+1}(i,j) | G_1, G_2, …, G_T ~ Bernoulli(g_{G_1,G_2,…,G_T}(i,j)): the edge at time T+1 is governed by a function g of the previous graphs and this pair of nodes.

Including graph-based features Example set of features for pair (i,j): cn(i,j) (number of common neighbors), ℓℓ(i,j) (last time a link was formed), deg(j) (degree of j). Represent dynamics using "datacubes" of these features, i.e., multi-dimensional histograms on binned feature values; a cell is a feature combination such as 1 ≤ cn ≤ 3, 3 ≤ deg ≤ 6, 1 ≤ ℓℓ ≤ 2. For each cell, η_t = #pairs of nodes in G_t with these features, and η_t+ = #pairs in G_t with these features which had an edge in G_{t+1}. A high ratio η_t+/η_t means this feature combination is more likely to create a new edge at time t+1.
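The following sketch shows one plausible way to build such a datacube; the bin edges, feature tuple, and helper names are assumptions for illustration.

```python
from collections import defaultdict

# Illustrative bin edges for (cn, ll, deg); real bins would be tuned.
BINS = ((0, 1, 4, 10), (0, 1, 3, 10), (0, 3, 7, 20))

def feature_cell(cn, ll, deg):
    """Map raw feature values to a binned cell id (one bucket per feature)."""
    def bucket(v, edges):
        for k in range(len(edges) - 1):
            if edges[k] <= v < edges[k + 1]:
                return k
        return len(edges) - 2                # clamp out-of-range values
    return (bucket(cn, BINS[0]), bucket(ll, BINS[1]), bucket(deg, BINS[2]))

def build_datacube(pairs_t, edges_t1):
    """eta[s] = #pairs with binned features s at time t;
    eta_plus[s] = those of them that formed an edge at time t+1."""
    eta, eta_plus = defaultdict(int), defaultdict(int)
    for (i, j), feats in pairs_t.items():
        s = feature_cell(*feats)
        eta[s] += 1
        if (i, j) in edges_t1:
            eta_plus[s] += 1
    return eta, eta_plus

# pairs_t: node pair -> its (cn, ll, deg) features at time t;
# edges_t1: pairs connected at time t+1.
pairs_t = {("a", "b"): (2, 1, 4), ("a", "c"): (0, 2, 5), ("b", "c"): (2, 1, 4)}
edges_t1 = {("a", "b")}
eta, eta_plus = build_datacube(pairs_t, edges_t1)
# eta_plus[s] / eta[s] estimates the edge-formation probability for cell s.
```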

Including graph-based features How do we form these datacubes? Vanilla idea: one datacube per transition G_t → G_{t+1}, aggregated over all pairs (i,j). Problem: this does not allow for differently evolving communities.

Our Model How do we form these datacubes? Our model: one datacube for each neighborhood, which captures local evolution.

Our Model Neighborhood N_t(i) = nodes within 2 hops of i. Features are extracted from (N_{t-p}(i), …, N_t(i)). The datacube stores, for each feature combination s: the number of node pairs with feature s in the neighborhood of i at time t, and the number of such pairs which got connected at time t+1.

Our Model The datacube d_t(i) captures graph evolution in the local neighborhood of a node in the recent past. Model: Y_{T+1}(i,j) | G_1, G_2, …, G_T ~ Bernoulli(g(d_T(i), s_T(i,j))), where d_T(i) encodes the local evolution patterns and s_T(i,j) the features of the pair. How can we estimate g(·)?

Outline Model Estimator Consistency Scalability Experiments

Kernel Estimator for g From the history G_1, G_2, …, G_{T-2}, G_{T-1}, G_T, collect a (datacube, feature) pair for each neighborhood at each timestep t = 1, 2, 3, …. Given the query datacube at T-1 and feature vector at time T, find similar historical situations.

Kernel Estimator for g Find similar historical situations, then see what happened next.

Kernel Estimator for g Factorize the similarity between (datacube, feature) pairs as K(d, d′) · I{s = s′}: a kernel on datacubes times an indicator that the feature cells match exactly. This factorization allows the estimate to be computed via simple lookups.

Kernel Estimator for g Compute similarities only between datacubes, giving the historical datacubes weights w_1, w_2, w_3, w_4, …

Kernel Estimator for g Then look at the datacubes from the next timestep.

Kernel Estimator for g Look up the probability of edge formation in each matched datacube.

Kernel Estimator for g Each matched historical datacube contributes its cell counts (η_1, η_1+), (η_2, η_2+), (η_3, η_3+), (η_4, η_4+), … with weights w_1, w_2, w_3, w_4, …; the estimate of the edge-formation probability is the weighted fraction ĝ = Σ_k w_k η_k+ / Σ_k w_k η_k.
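In code, this final lookup step might look like the following minimal sketch; the example triples are illustrative.

```python
def kernel_estimate(matches):
    """matches: list of (w, eta, eta_plus) triples, one per historical
    datacube similar to the query; returns the kernel-weighted estimate
    of the edge-formation probability for the queried feature cell."""
    num = sum(w * eta_plus for w, eta, eta_plus in matches)
    den = sum(w * eta for w, eta, eta_plus in matches)
    return num / den if den > 0 else 0.0

# Four similar historical situations with weights w1..w4 and their
# (eta, eta+) counts in the queried cell.
matches = [(0.9, 100, 12), (0.6, 40, 3), (0.3, 10, 2), (0.1, 5, 0)]
print(kernel_estimate(matches))  # weighted fraction of pairs that linked
```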

Kernel Estimator for g Recall the factorized similarity K(d, d′) · I{s = s′}, which allows computation via simple lookups. But what is K(d, d′), the similarity between two datacubes?

Similarity between two datacubes Idea 1: for each cell s, take (η_1+/η_1 − η_2+/η_2)² and sum over cells. Problem: the magnitude of η is ignored, so 5/10 and 50/100 are treated equally. Better: consider the full distribution.

Similarity between two datacubes Idea 2: for each cell s, compute the posterior distribution of the edge-creation probability from the counts (η, η+). Let dist(d_1, d_2) be the total variation distance between these posteriors, summed over all cells, and set K(d_1, d_2) = b^{dist(d_1, d_2)} with 0 < b < 1. As b → 0, K(d_1, d_2) → 0 unless dist(d_1, d_2) = 0.
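A sketch of this similarity, assuming each cell's posterior is Beta(η+ + 1, η − η+ + 1) (a uniform-prior assumption on my part) and approximating the total variation distance on a grid:

```python
import numpy as np
from scipy.stats import beta

def cell_tv_distance(n1, k1, n2, k2, grid=2000):
    """Total variation distance between the Beta posteriors of the
    edge-creation probability for two cells with counts (n, k)."""
    x = np.linspace(0.0, 1.0, grid)
    p1 = beta.pdf(x, k1 + 1, n1 - k1 + 1)
    p2 = beta.pdf(x, k2 + 1, n2 - k2 + 1)
    return 0.5 * np.trapz(np.abs(p1 - p2), x)

def datacube_kernel(cube1, cube2, b=0.3):
    """K(d1, d2) = b ** dist, dist = sum of per-cell TV distances.
    Cubes map cell -> (eta, eta_plus)."""
    cells = set(cube1) | set(cube2)
    dist = sum(cell_tv_distance(*cube1.get(s, (0, 0)), *cube2.get(s, (0, 0)))
               for s in cells)
    return b ** dist

# 5/10 vs 50/100: same ratio, but the posteriors differ in concentration,
# so the distance is nonzero (unlike the naive squared-ratio idea).
print(cell_tv_distance(10, 5, 100, 50))
```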

Kernel Estimator for g Want to show: the estimator ĝ converges to the true g as T → ∞.

Outline Model Estimator Consistency Scalability Experiments

Consistency of Estimator Lemma 1: as T → ∞, … for some R > 0 (the statement appeared as an equation on the slide). Proof using: as T → ∞, … (equation on slide).

Consistency of Estimator Lemma 2: as T → ∞, … (the statement appeared as an equation on the slide).

Consistency of Estimator Assumption: finite graph. Proof sketch: the dynamics are Markovian with a finite state space, so the chain must eventually enter a closed, irreducible communicating class; this gives geometric ergodicity if the class is aperiodic (if not, the argument is more complicated…), hence strong mixing with exponential decay, and so variances decay as O(1/T).

Consistency of Estimator Theorem: the estimator is consistent (the statement appeared as an equation on the slide). Proof sketch: combining Lemmas 1 and 2, for some R > 0, … so the result follows.

Outline Model Estimator Consistency Scalability Experiments

Scalability Full solution: sum over all n datacubes for all T timesteps. Infeasible. Approximate solution: sum only over the nearest neighbors of the query datacube. How do we find nearest neighbors? Locality Sensitive Hashing (LSH) [Indyk+/98, Broder+/98].

Using LSH Devise a hash function for datacubes such that "similar" datacubes tend to be hashed to the same bucket, where "similar" means small total variation distance between the cells of the datacubes.

Using LSH Step 1: map datacubes to bit vectors. Use B1 buckets to discretize [0,1]; use B2 bits for each bucket; for a bucket with probability mass p, the first ⌈p·B2⌉ bits are set to 1. Total: M·B1·B2 bits, where M = max number of occupied cells ≪ total number of cells.

Using LSH Step 1 works because total variation distance ∝ L1 distance between distributions ≈ Hamming distance between the bit vectors. Step 2: hash function = a random choice of k out of the M·B1·B2 bits.
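A sketch of both steps under the same Beta-posterior assumption as before; B1, B2, k, and the helper names are my own illustrative choices.

```python
import random
import numpy as np
from scipy.stats import beta

def cube_to_bits(cube, cells, B1=8, B2=16):
    """Step 1: unary embedding.  For each cell, discretize the posterior
    over [0,1] into B1 probability-mass buckets; each bucket gets B2 bits,
    with the first ceil(p * B2) bits set to 1.  Hamming distance between
    two such vectors then tracks the L1 distance between distributions."""
    edges = np.linspace(0.0, 1.0, B1 + 1)
    bits = []
    for s in cells:                          # fixed cell order across cubes
        n, k = cube.get(s, (0, 0))
        mass = np.diff(beta.cdf(edges, k + 1, n - k + 1))
        for p in mass:
            ones = int(np.ceil(p * B2))
            bits.extend([1] * ones + [0] * (B2 - ones))
    return bits

def make_hash(nbits, k=24, seed=0):
    """Step 2: one LSH hash = the bit vector restricted to k random positions."""
    rng = random.Random(seed)
    idx = rng.sample(range(nbits), k)
    return lambda bits: tuple(bits[i] for i in idx)

# Similar datacubes differ in few bits, so they collide in the same
# hash bucket with high probability.
cells = [(1, 1, 1), (0, 2, 1)]
h = make_hash(nbits=len(cells) * 8 * 16)
key = h(cube_to_bits({(1, 1, 1): (10, 5)}, cells))
```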

Fast Search Using LSH (figure: bit-vector signatures of datacubes hashed into the buckets of a hash table; a query is compared only against the datacubes in its own bucket)

Outline Model Estimator Consistency Scalability Experiments

Experiments Baselines: LL: last link (time of last occurrence of a pair). Static on G_T: CN (rank by number of common neighbors in G_T), AA (more weight to low-degree common neighbors), Katz (accounts for longer paths). Static on G_1 ∪ ⋯ ∪ G_T: CN-all (apply CN to G_1 ∪ ⋯ ∪ G_T), AA-all, Katz-all (analogous).

Setup Pick a random subset S of nodes with degree > 0 in G_{T+1}. For each s ∈ S, predict a ranked list of nodes likely to link to s. Report mean AUC (higher is better). G_1, …, G_T serve as training data; G_{T+1} is the test data.
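The evaluation loop might be sketched as follows; score_fn, candidates, and the data structures are assumed for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mean_auc(score_fn, test_nodes, true_links, candidates):
    """For each test node s, rank its candidate nodes by score_fn(s, v)
    and score the ranking against the links actually present in G_{T+1};
    report the mean AUC over test nodes."""
    aucs = []
    for s in test_nodes:
        y_true = [1 if (s, v) in true_links else 0 for v in candidates[s]]
        y_score = [score_fn(s, v) for v in candidates[s]]
        if 0 < sum(y_true) < len(y_true):    # AUC needs both classes present
            aucs.append(roc_auc_score(y_true, y_score))
    return float(np.mean(aucs))
```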

Simulations Social network model of Hoff et al.: each node has an independently drawn latent feature vector, and edge (i,j) depends on the latent features of i and j. Seasonality effect: feature importance varies with season, giving different communities in each season; feature vectors evolve smoothly over time, giving evolving community structures.
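A toy generator in this spirit (all parameters are my own illustrative choices, not the talk's exact simulation):

```python
import numpy as np

def simulate_snapshots(n=50, d=4, T=20, season_len=5, seed=0):
    """Toy Hoff-style latent-space simulation with seasonality: each node
    has a latent feature vector that drifts smoothly, and edge
    probabilities depend on a seasonally reweighted inner product."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))              # latent feature vectors
    graphs = []
    for t in range(T):
        season = (t // season_len) % d
        w = np.full(d, 0.1)
        w[season] = 2.0                      # this season's dominant feature
        logits = (X * w) @ X.T - 2.0         # seasonally weighted similarity
        P = 1.0 / (1.0 + np.exp(-logits))
        A = rng.random((n, n)) < P
        A = np.triu(A, 1)                    # keep upper triangle only
        graphs.append(A | A.T)               # symmetric, no self-loops
        X += 0.05 * rng.normal(size=(n, d))  # smooth drift -> evolving communities
    return graphs
```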

Simulations NonParam is much better than the others in the presence of seasonality; CN, AA, and Katz implicitly assume smooth evolution.

Sensor Network (data: www.select.cs.cmu.edu/data)

Summary Link formation is assumed to depend on the neighborhood's evolution over a time window. This admits a kernel-based estimator, with consistency guarantees and scalability via LSH. It works particularly well for seasonal effects and differently evolving communities.