Purnamrita Sarkar (UC Berkeley), Deepayan Chakrabarti (Yahoo! Research), Andrew W. Moore (Google, Inc.)

Presentation transcript:

• Which pair of nodes {i, j} should be connected?
• Goal: recommend a movie
[Figure labels: Alice, Bob, Charlie]

• Which pair of nodes {i, j} should be connected?
• Goal: suggest friends

• Predict a link between nodes
  - connected by the shortest path
  - with the most common neighbors (length-2 paths)
  - giving more weight to low-degree common neighbors (Adamic/Adar)
• Example (Alice, Bob, Charlie): a common friend with 1000 followers is prolific → less evidence; a common friend with 8 followers is less prolific → much more evidence
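The contrast between a prolific and a less prolific common friend can be sketched in a few lines. This is a minimal illustration, not the authors' code: the toy graph builder `build_star_pair` and the node names are hypothetical, and Adamic/Adar is taken in its usual form, summing 1/log(degree) over common neighbors.

```python
import math

def common_neighbors(adj, i, j):
    """Set of nodes adjacent to both i and j."""
    return adj[i] & adj[j]

def adamic_adar(adj, i, j):
    """Sum of 1/log(degree) over common neighbors (usual Adamic/Adar form)."""
    return sum(1.0 / math.log(len(adj[k])) for k in adj[i] & adj[j])

def build_star_pair(n_followers):
    """Hypothetical toy graph: i and j share one friend k of degree n_followers."""
    adj = {"i": {"k"}, "j": {"k"}, "k": {"i", "j"}}
    for t in range(n_followers - 2):
        name = f"x{t}"
        adj[name] = {"k"}
        adj["k"].add(name)
    return adj

small = build_star_pair(8)      # common friend with 8 followers
large = build_star_pair(1000)   # common friend with 1000 followers

score_small = adamic_adar(small, "i", "j")
score_large = adamic_adar(large, "i", "j")
print(score_small, score_large)
```

Both pairs have exactly one common neighbor, so plain common-neighbor counting cannot tell them apart, while the degree-weighted score is larger for the 8-follower friend.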

• Predict a link between nodes
  - connected by the shortest path
  - with the most common neighbors (length-2 paths)
  - giving more weight to low-degree common neighbors (Adamic/Adar)
  - with more short paths (e.g. length-3 paths) → exponentially decaying weights for longer paths (Katz measure)
  - …
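A truncated Katz-style score can be sketched as follows. This is an illustrative assumption rather than the presenters' implementation: the function name `katz_score`, the decay parameter `beta`, and the cutoff `max_len` are hypothetical, and the code counts walks (which may revisit nodes), as the matrix-power form of the Katz measure does.

```python
def katz_score(adj, i, j, beta=0.5, max_len=4):
    """Sum over ell = 1..max_len of beta**ell * (number of length-ell walks i -> j)."""
    counts = {i: 1}          # counts[u] = number of walks of the current length from i to u
    score = 0.0
    for ell in range(1, max_len + 1):
        nxt = {}
        for u, c in counts.items():
            for v in adj[u]:
                nxt[v] = nxt.get(v, 0) + c
        counts = nxt
        score += (beta ** ell) * counts.get(j, 0)
    return score

# Hypothetical toy graph: the 4-cycle 0 - 1 - 3 - 2 - 0.
cycle = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
print(katz_score(cycle, 0, 3, beta=0.5, max_len=2))
print(katz_score(cycle, 0, 3, beta=0.5, max_len=4))
```

Because `beta` is below 1, each extra hop multiplies a walk's contribution by a constant factor, which is the exponential decay the slide refers to.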

[Figure: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths]
• How do we justify these observations?
• Especially if the graph is sparse
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

Model (unit-volume universe):
1. Nodes are uniformly distributed points in a latent space
2. This space has a distance metric
3. Points close to each other are likely to be connected in the graph → logistic distance function (Raftery et al., 2002)

[Figure: logistic link probability versus distance, with ticks at 1 and ½ and the radius r marked; shorter distance means higher probability of linking; α determines the steepness]
• The problem of link prediction is to find the nearest neighbor who is not currently linked to the node → equivalent to inferring distances in the latent space
Model:
1. Nodes are uniformly distributed points in a latent space
2. This space has a distance metric
3. Points close to each other are likely to be connected in the graph → logistic distance function (Raftery et al., 2002)
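The generative model on this slide can be sketched under stated assumptions. The exact logistic form 1 / (1 + exp(α(d − r))) is an assumption chosen to match the figure (probability ½ at d = r, steeper for larger α); the unit hypercube stands in for the unit-volume universe, boundary effects are ignored, and the function names are hypothetical.

```python
import math
import random

def link_probability(d, r, alpha):
    """Assumed logistic link function: equals 1/2 at d == r, steeper as alpha grows."""
    z = alpha * (d - r)
    if z > 700.0:            # avoid overflow in exp for very large alpha
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def sample_graph(n, dim, r, alpha, seed=0):
    """Draw n uniform points in the unit hypercube and link pairs independently."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if rng.random() < link_probability(d, r, alpha):
                edges.add((i, j))
    return points, edges

points, edges = sample_graph(n=20, dim=2, r=0.3, alpha=50.0)
print(len(edges), "edges among", len(points), "nodes")
```

With a very large `alpha` the function behaves like a step at distance `r`, so nodes link almost exactly when they are within `r` of each other.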

[Figure: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths]
• Especially if the graph is sparse
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

• Pr_2(i, j) = Pr(common neighbor | d_ij)
• Product of two logistic probabilities, integrated over a volume determined by d_ij
• As α → ∞, logistic → step function: much easier to analyze!
[Figure labels: i, j]

• Everyone has the same radius r
• The number of common neighbors gives a bound on the distance
• η = number of common neighbors
• V(r) = volume of a ball of radius r in D dimensions
• Unit-volume universe
[Figure labels: i, j]

• OPT = node closest to i
• MAX = node with the most common neighbors with i
• Theorem: w.h.p., d_OPT ≤ d_MAX ≤ d_OPT + 2[ε/V(1)]^(1/D)
• Link prediction by common neighbors is asymptotically optimal
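To get a feel for the slack term 2[ε/V(1)]^(1/D), it can be evaluated numerically. This sketch assumes V(1) is the volume of a Euclidean unit ball in D dimensions, computed with the standard formula π^(D/2) / Γ(D/2 + 1); the slide does not define ε, so it is treated here as a free input, and the function names are hypothetical.

```python
import math

def ball_volume(r, dim):
    """Volume of a Euclidean ball of radius r in dim dimensions."""
    return math.pi ** (dim / 2.0) * r ** dim / math.gamma(dim / 2.0 + 1.0)

def slack(eps, dim):
    """Additive gap 2 * (eps / V(1)) ** (1 / D) between d_MAX and d_OPT in the bound."""
    return 2.0 * (eps / ball_volume(1.0, dim)) ** (1.0 / dim)

for eps in (1e-2, 1e-4, 1e-6):
    print(eps, slack(eps, dim=2), slack(eps, dim=3))
```

The gap shrinks as ε shrinks, but more slowly in higher dimensions because of the 1/D exponent.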

• Node k has radius r_k
• i → k if d_ik ≤ r_k (directed graph)
• r_k captures the popularity of node k
• "Weighted" common neighbors: predict the (i, j) pairs with the highest Σ_r w(r) η(r), where w(r) = weight for nodes of radius r and η(r) = number of common neighbors of radius r
[Figure labels: i, k, j, m, r_k]

[Figure: weight versus radius for a common neighbor k of i and j]
• Presence of a common neighbor is very informative
• When r is close to the max radius, absence is very informative
• Adamic/Adar: 1/r
• Real-world graphs generally fall in this range

[Figure: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths]
• Especially if the graph is sparse
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

• Common neighbors = 2-hop paths
• For longer paths:
  - bounds are weaker
  - for ℓ' ≥ ℓ we need η_ℓ' >> η_ℓ to obtain similar bounds → justifies the exponentially decaying weight given to longer paths by the Katz measure

• Three key ingredients
1. Closer points are likelier to be linked (small-world model: Watts & Strogatz, 1998; Kleinberg)
2. Triangle inequality holds → necessary to extend to ℓ-hop paths
3. Points are spread uniformly at random → otherwise properties will depend on location as well as distance

[Figure: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths]
• The number of paths matters, not the length
• For large dense graphs, common neighbors are enough
• Differentiating between different degrees is important
• In sparse graphs, paths of length 3 or more help in prediction
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

[Figure: logistic link probability versus distance, with ticks at 1 and ½ and the radius r marked; shorter distance means higher probability of linking; α determines the steepness]
• Two sources of randomness
  - point positions: uniform in D-dimensional space
  - linkage probability: logistic with parameters α, r
• α, r and D are known
• The problem of link prediction is to find the nearest neighbor who is not currently linked to the node → equivalent to inferring distances in the latent space

[Figure: logistic link function with ticks at 1 and ½]
• Factor ¼: weak bound for the logistic
• Can be made tighter as the logistic approaches the step function

[Diagram: generative model compared with link prediction heuristics; which of node a and node b is the most likely neighbor of node i?]
• A few properties → can justify the empirical observations
• We also offer some new prediction algorithms

• Combine bounds from different radii
• But there might not be enough data to obtain individual bounds from each radius
• New sweep estimator
• Q_r = fraction of nodes with radius ≤ r that are common neighbors
• Higher Q_r → smaller d_ij w.h.p.

• Q_r = fraction of nodes with radius ≤ r that are common neighbors
• Larger Q_r → smaller d_ij w.h.p.
• T_R = fraction of nodes with radius ≥ R that are common neighbors
• Smaller T_R → larger d_ij w.h.p.

[Figure: number of common neighbors of a given radius, plotted against r]
• Q_r = fraction of nodes with radius ≤ r that are common neighbors
• T_R = fraction of nodes with radius ≥ R that are common neighbors
• Large Q_r → small d_ij
• Small T_R → large d_ij
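The two sweep statistics are simple fractions, which the sketch below computes for one candidate pair. It is a hedged illustration only: the function name `sweep_fractions`, the dictionary layout, and the toy radii are hypothetical, and the slides do not say how the thresholds r and R are chosen.

```python
def sweep_fractions(radii, common, r, R):
    """Return (Q_r, T_R) for one pair (i, j).

    radii  : dict mapping each candidate node k to its radius r_k
    common : set of nodes that are common neighbors of i and j
    Q_r    : fraction of nodes with radius <= r that are common neighbors
    T_R    : fraction of nodes with radius >= R that are common neighbors
    A fraction is None when no node falls in its radius range.
    """
    small = [k for k, rk in radii.items() if rk <= r]
    large = [k for k, rk in radii.items() if rk >= R]
    q_r = sum(k in common for k in small) / len(small) if small else None
    t_R = sum(k in common for k in large) / len(large) if large else None
    return q_r, t_R

# Hypothetical toy data: five candidate nodes and their radii.
radii = {"a": 0.1, "b": 0.1, "c": 0.2, "d": 0.5, "e": 0.6}
common = {"a", "d", "e"}
q, t = sweep_fractions(radii, common, r=0.2, R=0.5)
print(q, t)
```

Pooling all nodes below (or above) a threshold is what lets the estimator work when there are too few nodes at any single radius.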

• Which pair of nodes {i, j} should be connected?
• Variant: node i is given
• Examples: friend suggestion in Facebook; movie recommendation in Netflix
[Figure labels: Alice, Bob, Charlie]

Raftery et al.'s model (unit-volume universe):
• Nodes are uniformly distributed in a latent space
• Points close in this space are more likely to be connected
• The problem of link prediction is to find the nearest neighbor who is not currently linked to the node → equivalent to inferring distances in the latent space

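[Figure: logistic link probability versus distance, with ticks at 1 and ½ and the radius r marked; shorter distance means higher probability of linking; α determines the steepness]
• Two sources of randomness
  - point positions: uniform in D-dimensional space
  - linkage probability: logistic with parameters α, r
• α, r and D are known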

Example graph: N_1 nodes of radius r_1 and N_2 nodes of radius r_2, with r_1 << r_2
• η_1 ~ Bin[N_1, A(r_1, r_1, d_ij)]
• η_2 ~ Bin[N_2, A(r_2, r_2, d_ij)]
• Maximize Pr[η_1, η_2 | d_ij] = product of two binomials
• w(r_1) E[η_1 | d*] + w(r_2) E[η_2 | d*] = w(r_1) η_1 + w(r_2) η_2
• RHS ↑ → LHS ↑ → d* ↓
[Figure labels: i, j, k]
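The maximization on this slide can be explored numerically. The sketch below makes several assumptions that are not in the slides: D = 2, A(r, r, d) taken as the area of the intersection of two discs of radius r whose centers are d apart, boundary effects ignored, and a plain grid search instead of the closed-form weighted equation. All function names are hypothetical.

```python
import math

def lens_area(r, d):
    """Area of the intersection of two discs of radius r with centers d apart."""
    if d >= 2.0 * r:
        return 0.0
    return (2.0 * r * r * math.acos(d / (2.0 * r))
            - 0.5 * d * math.sqrt(max(0.0, 4.0 * r * r - d * d)))

def log_likelihood(d, groups):
    """Log of the product of binomials, dropping terms that do not depend on d.

    groups: list of (N, r, eta) = (nodes of radius r, radius, common neighbors seen).
    """
    total = 0.0
    for n, r, eta in groups:
        p = lens_area(r, d)
        if eta > 0:
            if p <= 0.0:
                return float("-inf")
            total += eta * math.log(p)
        if n - eta > 0:
            if p >= 1.0:
                return float("-inf")
            total += (n - eta) * math.log1p(-p)
    return total

def mle_distance(groups, steps=2000):
    """Grid-search estimate of d* over (0, 2 * max radius)."""
    d_max = 2.0 * max(r for _, r, _ in groups)
    grid = [d_max * (s + 0.5) / steps for s in range(steps)]
    return max(grid, key=lambda d: log_likelihood(d, groups))

few = [(200, 0.05, 0), (200, 0.2, 5)]     # few common neighbors
many = [(200, 0.05, 1), (200, 0.2, 20)]   # more common neighbors
d_few, d_many = mle_distance(few), mle_distance(many)
print(d_few, d_many)
```

In this toy setting, observing more common neighbors pulls the estimated distance down, which is the direction of the RHS ↑ → d* ↓ argument.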

[Figure: weight versus radius, annotated with "Variance" and "Jacobian"]
• Small variance → presence is more surprising
• When r is close to the max radius: small variance → absence is more surprising
• Adamic/Adar: 1/r
• Real-world graphs generally fall in this range

• Common neighbors = 2-hop paths
• Analysis of longer paths: two components
1. Bounding E(η_ℓ | d_ij), where η_ℓ = number of ℓ-hop paths → bounds Pr_ℓ(i, j) by using the triangle inequality on a series of common-neighbor probabilities (triangulation)
2. η_ℓ ≈ E(η_ℓ | d_ij)

• Common neighbors = 2-hop paths
• Analysis of longer paths: two components
1. Bounding E(η_ℓ | d_ij), where η_ℓ = number of ℓ-hop paths → bounds Pr_ℓ(i, j) by using the triangle inequality on a series of common-neighbor probabilities
2. η_ℓ ≈ E(η_ℓ | d_ij): bounded dependence of η_ℓ on the position of each node → can use McDiarmid's inequality to bound |η_ℓ − E(η_ℓ | d_ij)|
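For reference, the standard bounded-differences form of McDiarmid's inequality is given here in generic notation. The symbols f, X_k, c_k and t are textbook notation, not taken from the slides; in this setting f would play the role of η_ℓ and the X_k the node positions.

```latex
% If X_1, ..., X_n are independent and changing only the k-th argument
% changes f by at most c_k, then for every t > 0:
\[
  \Pr\Bigl( \bigl| f(X_1,\dots,X_n) - \mathbb{E}\, f(X_1,\dots,X_n) \bigr| \ge t \Bigr)
  \;\le\; 2 \exp\!\left( - \frac{2 t^{2}}{\sum_{k=1}^{n} c_k^{2}} \right).
\]
```

The bound is useful only when each c_k is small, which is why the slide stresses that η_ℓ depends on each node's position in a bounded way.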

Example graph: N_1 nodes of radius r_1 and N_2 nodes of radius r_2
• η_1 ~ Bin[N_1, A(r_1, r_1, d_ij)]
• η_2 ~ Bin[N_2, A(r_2, r_2, d_ij)]
• w(r_1) E[η_1 | d*] + w(r_2) E[η_2 | d*] = w(r_1) η_1 + w(r_2) η_2, where d* is the MLE and w(r_1), w(r_2) are the weights
  - left-hand side: a decreasing function of d*
  - right-hand side: "weighted" common neighbors
• RHS ↑ → d* ↓
• Link prediction by weighted common neighbors is justified
[Figure labels: i, j, k]

• Node k has radius r_k
• i → k if d_ik ≤ r_k (directed graph)
• r_k captures the popularity of node k
[Figure labels: i, k, j, m, r_k]

• Node k has radius r_k
• i → k if d_ik ≤ r_k (directed graph)
• r_k captures the popularity of node k
• Type 1: i ← k → j, governed by radii r_i and r_j: A(r_i, r_j, d_ij)
• Type 2: i → k ← j, governed by radius r_k on both edges: A(r_k, r_k, d_ij)

Example graph: N_1 nodes of radius r_1 and N_2 nodes of radius r_2
• η_1 and η_2 are the numbers of common neighbors with these radii
• w(r_1) E[η_1 | d*] + w(r_2) E[η_2 | d*] = w(r_1) η_1 + w(r_2) η_2, where d* is the MLE and w(r_1), w(r_2) are the weights
  - left-hand side: a decreasing function of d*
  - right-hand side: "weighted" common neighbors
• More "weighted" common neighbors → points are closer → useful for link prediction
[Figure labels: i, j, k]

• Common neighbors = 2-hop paths
• Analysis of longer paths:
1. Triangulation: an ℓ-hop path as a sequence of common neighbors
2. "Metric" property: intermediate distances are linked to d_ij

• Bound d_ij as a function of η_ℓ
• For ℓ' ≥ ℓ we need η_ℓ' >> η_ℓ to obtain similar bounds → justifies the exponentially decaying weight given to longer paths by the Katz measure
• Also, we can obtain much tighter bounds for long paths if shorter paths are known to exist