Inferring Networks of Diffusion and Influence. Manuel Gomez Rodriguez, Jure Leskovec, Andreas Krause (Stanford University, MPI for Biological Cybernetics, California Institute of Technology).

Presentation transcript:

1 Inferring Networks of Diffusion and Influence
Manuel Gomez Rodriguez (1,2), Jure Leskovec (1), Andreas Krause (3)
(1) Stanford University, (2) MPI for Biological Cybernetics, (3) California Institute of Technology

2 Hidden and implicit networks
- Many social or information networks are implicit or hard to observe:
  - Hidden or hard-to-reach populations: the network of needle sharing between injection drug users
  - Implicit connections: the network of information propagation in online news media
- But we can observe the results of the processes taking place on such (invisible) networks:
  - Virus propagation: drug users get sick, and we observe when they see the doctor
  - Information networks: we observe when media sites mention information

3 Information Diffusion Network
- Information diffuses through the network
- We only see who mentions the information, but not where they got it from
- Question: can we infer the hidden network?
[Figure: mentions of the information laid out along a time axis]

4 Examples and Applications
- Virus propagation
  - Process: viruses propagate through the network
  - We observe: only when people get sick
  - It's hidden: who infected whom
- Word of mouth & viral marketing
  - Process: recommendations and influence propagate
  - We observe: only when people buy products
  - It's hidden: who influenced whom
- Can we infer the underlying network?

5 Inferring the Network
- There is a directed social network over which diffusions take place
  [Figure: example network on nodes a to e, with two cascades spreading over it]
- But we do not observe the edges of the network
- We only see the time when a node gets infected:
  - Cascade c_1: (a, 1), (c, 2), (b, 6), (e, 9)
  - Cascade c_2: (c, 1), (a, 4), (b, 5), (d, 8)
- Task: infer the underlying network
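The observed data on this slide can be written down directly. The following is a minimal sketch, assuming cascades are stored as dictionaries from node to infection time; the helper name `candidate_edges` is hypothetical. It lists every directed pair (u, v) that is consistent with the timestamps, that is, every edge that could have carried the cascade because u was infected before v.

```python
from itertools import combinations

# The two cascades from the slide: node -> infection time.
cascades = {
    "c1": {"a": 1, "c": 2, "b": 6, "e": 9},
    "c2": {"c": 1, "a": 4, "b": 5, "d": 8},
}

def candidate_edges(cascade):
    """Return every directed pair (u, v) with t_u < t_v in one cascade."""
    edges = set()
    for u, v in combinations(cascade, 2):
        if cascade[u] < cascade[v]:
            edges.add((u, v))
        elif cascade[v] < cascade[u]:
            edges.add((v, u))
    return edges

# Edges that at least one cascade could have used.
all_candidates = set().union(*(candidate_edges(c) for c in cascades.values()))
```

The timestamps alone rule out edges that point backward in time, but they leave many candidates; the rest of the talk is about scoring them.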

6 Our Problem Formulation
- Plan for the talk:
  1. Define a continuous time model of diffusion
  2. Define the likelihood of the observed cascades given a network
  3. Show how to efficiently compute the likelihood of cascades
  4. Show how to efficiently find a graph G that maximizes the likelihood
- Note:
  - There is a super-exponential number of graphs, O(N^(N*N))
  - Our method finds a near-optimal graph in O(N^2)!

7 Cascade Generation Model
- Continuous time cascade diffusion model:
  - Cascade c reaches node u at time t_u and spreads to u's neighbors
  - With probability β the cascade propagates along edge (u, v), and we determine the infection time of node v as t_v = t_u + Δ, e.g. Δ ~ Exponential or Power-law
- We assume each node v has only one parent!
[Figure: a cascade spreading over nodes a to f, with infection times t_a, t_b, t_c, t_e, t_f and delays Δ_1 to Δ_4]
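The generative model above can be sketched as a small simulation. This is an illustration under stated assumptions, not the authors' code: the graph is an adjacency dictionary, delays are exponential with a hypothetical `rate` parameter, and the function name `simulate_cascade` is made up for this example. The "one parent" assumption is implemented by keeping only the earliest arrival at each node.

```python
import heapq
import itertools
import random

def simulate_cascade(graph, source, beta=0.5, rate=1.0, seed=None):
    """Simulate one cascade; return (infection times, parent of each node)."""
    rng = random.Random(seed)
    counter = itertools.count()  # tie-breaker so heap entries always compare
    times, parents = {}, {}
    queue = [(0.0, next(counter), source, None)]
    while queue:
        t, _, v, u = heapq.heappop(queue)
        if v in times:
            continue  # v was already infected earlier: it keeps its single parent
        times[v] = t
        if u is not None:
            parents[v] = u
        for nbr in graph.get(v, ()):
            # With probability beta the cascade propagates along edge (v, nbr).
            if nbr not in times and rng.random() < beta:
                delay = rng.expovariate(rate)  # Δ ~ Exponential (assumed)
                heapq.heappush(queue, (t + delay, next(counter), nbr, v))
    return times, parents
```

Only `times` corresponds to what is observed in practice; `parents` is exactly the hidden information the inference task tries to recover.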

8 Likelihood of a Single Cascade
- Probability that cascade c propagates from node u to node v: P_c(u, v) ∝ P(t_v - t_u), with t_v > t_u
- Probability that cascade c propagates in a tree pattern T:
- Since not all nodes get infected by the diffusion process, we introduce the external influence node m: P_c(m, v) = ε
[Figure: tree pattern T on cascade c: (a, 1), (b, 2), (c, 4), (e, 8), with the external node m linked to every node with weight ε]
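The tree-pattern formula did not survive in the transcript. The sketch below assumes the natural reading, consistent with the later slides, that the likelihood of a tree is the product of P_c(u, v) over its edges. The exponential form of the edge weight, the value of ε, and the function names are assumptions made for illustration.

```python
import math

EPSILON = 1e-8  # assumed value for the external-influence probability ε

def edge_weight(t_u, t_v, alpha=1.0):
    """Unnormalised P_c(u, v) for an exponential delay; 0 unless t_v > t_u."""
    delta = t_v - t_u
    return math.exp(-delta / alpha) if delta > 0 else 0.0

def tree_log_likelihood(times, parent, alpha=1.0):
    """Log of the product of edge weights; parent None means external node m."""
    total = 0.0
    for v, u in parent.items():
        w = EPSILON if u is None else edge_weight(times[u], times[v], alpha)
        if w == 0.0:
            return -math.inf  # the tree uses an edge that points backward in time
        total += math.log(w)
    return total

# Cascade from the slide, with one possible tree: m -> a -> b -> c -> e.
times = {"a": 1, "b": 2, "c": 4, "e": 8}
chain = {"a": None, "b": "a", "c": "b", "e": "c"}
chain_ll = tree_log_likelihood(times, chain)
```

Working in log space keeps the product numerically stable and makes the per-edge contributions additive, which the later slides rely on.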

9 Finding the Diffusion Network
- There are many possible propagation trees that are consistent with the observed data:
  - c: (a, 1), (c, 2), (b, 3), (e, 4)
  [Figure: three different propagation trees over nodes a to e for the same cascade]
- Need to consider all possible propagation trees T supported by the graph G:
- Likelihood of a set of cascades C:
- Want to find a graph:
- Good news: computing P(c|G) is tractable. Even though there are O(n^n) possible propagation trees, the Matrix Tree Theorem can compute this in O(n^3)!
- Bad news: we actually want to search over graphs, and there is a super-exponential number of graphs!
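The three formulas on this slide were images and are missing from the transcript. A plausible reconstruction from the surrounding text is given below; the notation $\mathcal{T}_c(G)$ for the set of propagation trees of cascade $c$ supported by $G$, the edge budget $k$ (taken from the "For i = 1…k" loop on the greedy slide), and the product form of $P(c \mid T)$ are assumptions.

```latex
\begin{align}
P(c \mid G) &= \sum_{T \in \mathcal{T}_c(G)} P(c \mid T)
             = \sum_{T \in \mathcal{T}_c(G)} \prod_{(u,v) \in T} P_c(u, v), \\
P(C \mid G) &= \prod_{c \in C} P(c \mid G), \\
\hat{G} &= \operatorname*{argmax}_{|G| \le k} P(C \mid G).
\end{align}
```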

10 An Alternative Formulation
- We consider only the most likely tree
- Maximum log-likelihood for a cascade c under a graph G:
- Log-likelihood of G given a set of cascades C:
- The problem is still intractable (NP-hard)
- But we present an algorithm that finds near-optimal networks in O(N^2)
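The two definitions on this slide are missing from the transcript. Read together with the submodularity slides, which use $F_c(G)$ for a single cascade and $F_C(G)$ for a set of cascades, they are presumably of the following form; treat this as a hedged reconstruction rather than the slide's exact notation.

```latex
\[
F_c(G) = \max_{T \in \mathcal{T}_c(G)} \log P(c \mid T),
\qquad
F_C(G) = \sum_{c \in C} F_c(G).
\]
```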

11 Max Directed Spanning Tree
- Given a cascade c and a network G, what is the most likely propagation tree?
- A maximum directed spanning tree (MDST):
  - The sub-graph of G induced by the nodes in the cascade c is a DAG, because edges point forward in time
  - For each node, just pick an in-edge of maximum weight:
- Greedy parent selection for each node gives the globally optimal tree!
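Greedy parent selection can be sketched in a few lines. The example below is illustrative only: it assumes log-weights of the form -(t_v - t_u) (an exponential delay with unit scale), an assumed ε for the external node m, and a hypothetical function name `most_likely_tree`. Because every usable edge points forward in time, choosing parents independently can never create a cycle, so the result is a tree.

```python
import math

EPS_LOG = math.log(1e-8)  # assumed log-weight of the external-influence edge (m, v)

def most_likely_tree(times, edges):
    """Pick, for each infected node, the max-weight in-edge from an earlier node.

    times: node -> infection time for one cascade
    edges: set of directed edges (u, v) of the graph G
    Returns (parent map, log-likelihood of the chosen tree).
    """
    parent, total = {}, 0.0
    for v, t_v in times.items():
        best_u, best_w = None, EPS_LOG  # None stands for the external node m
        for u, t_u in times.items():
            if t_u < t_v and (u, v) in edges:
                w = -(t_v - t_u)  # assumed log-weight: exponential delay
                if w > best_w:
                    best_u, best_w = u, w
        parent[v] = best_u
        total += best_w
    return parent, total

# Cascade from the "Finding the Diffusion Network" slide and a hypothetical graph G.
times = {"a": 1, "c": 2, "b": 3, "e": 4}
G = {("a", "c"), ("a", "b"), ("c", "b"), ("b", "e"), ("a", "e")}
tree, score = most_likely_tree(times, G)
```

Here b prefers parent c over a because the shorter delay has the larger weight, and e prefers b over a for the same reason.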

12 Objective function is Submodular
- Given a set of cascades C, how do we find the network G that maximizes F_C(G)?
- Theorem: the log-likelihood F_C(G) of a set of cascades C is monotonic and submodular in the edges of the graph G:
  F_C(A ∪ {e}) - F_C(A) ≥ F_C(B ∪ {e}) - F_C(B), for A ⊆ B ⊆ V×V
  - Left-hand side: gain of adding an edge to a "small" graph
  - Right-hand side: gain of adding an edge to a "large" graph
- Proof: F_c(G) of a single cascade c is monotonic and submodular ⇒ F_C(G) of a set of cascades C is monotonic and submodular

13 Objective function is Submodular
- Proof, for a single cascade c and an edge e into node s with weight x:
  - Let w be the maximum weight of an in-edge of s in A
  - Let w' be the maximum weight of an in-edge of s in B
  - We know w ≤ w', because A ⊆ B ⊆ V×V
  - Now: F_c(A ∪ {e}) - F_c(A) = max(w, x) - w ≥ max(w', x) - w' = F_c(B ∪ {e}) - F_c(B)
- So the gain of adding an edge to a "small" graph is at least the gain of adding it to a "large" graph
[Figure: node s with its in-edges in the graphs A and B]
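The key inequality in the proof is easy to check numerically. The following brute-force sketch (the helper name `gain` and the grid of test values are arbitrary choices) confirms that max(w, x) - w never falls below max(w', x) - w' whenever w ≤ w'.

```python
import itertools

def gain(best_in_weight, x):
    """Improvement from offering node s a new in-edge of weight x."""
    return max(best_in_weight, x) - best_in_weight

# Quarter steps from 0 to 5 are exactly representable as floats.
values = [i / 4 for i in range(21)]

# Collect any triple (w, w', x) with w <= w' that would violate the inequality.
violations = [
    (w, w_prime, x)
    for w, w_prime, x in itertools.product(values, repeat=3)
    if w <= w_prime and gain(w, x) < gain(w_prime, x)
]
```

The check passes because the gain equals max(0, x - w), which can only shrink as the best existing in-edge weight grows.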

14 Finding the Diffusion Graph
- Use greedy hill-climbing to maximize F_C(G):
  - For i = 1…k: at every step, pick the edge that maximizes the marginal improvement
- Benefits:
  1. Approximation guarantee (≈ 0.63 of OPT)
  2. Tight on-line bounds on the solution quality
  3. Speed-ups: lazy evaluation (by submodularity) and localized updates (by the structure of the problem)
[Figure: example network on nodes a to e, with the marginal gain of each candidate edge at successive greedy steps]
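Greedy hill-climbing with lazy evaluation can be sketched as follows. This is a simplified illustration, not the authors' implementation: the objective uses the assumed log-weight -(t_v - t_u) and an assumed ε, the names `objective` and `lazy_greedy` are hypothetical, and the localized-update speed-up is omitted. Stale gains kept in the heap remain valid upper bounds because of submodularity, so an edge whose gain is fresh and still on top can be accepted without re-evaluating the others.

```python
import heapq
import math

EPS_LOG = math.log(1e-8)  # assumed log-weight of the external-influence edge

def objective(edges, cascades):
    """F_C(G): sum over cascades and nodes of the best available in-edge weight."""
    total = 0.0
    for times in cascades:
        for v, t_v in times.items():
            best = EPS_LOG
            for u, t_u in times.items():
                if t_u < t_v and (u, v) in edges:
                    best = max(best, -(t_v - t_u))
            total += best
    return total

def lazy_greedy(cascades, k):
    """Select up to k edges greedily, re-evaluating gains only when needed."""
    candidates = {(u, v) for times in cascades
                  for u in times for v in times if times[u] < times[v]}
    chosen, current = set(), objective(set(), cascades)
    # Heap entries: (negated upper bound on gain, edge, step when it was computed).
    heap = [(-math.inf, e, -1) for e in sorted(candidates)]
    heapq.heapify(heap)
    for step in range(k):
        while heap:
            neg_gain, e, stamp = heapq.heappop(heap)
            if stamp == step:  # gain is up to date for the current graph
                chosen.add(e)
                current -= neg_gain
                break
            fresh = objective(chosen | {e}, cascades) - current
            heapq.heappush(heap, (-fresh, e, step))
    return chosen

# The two cascades from the "Inferring the Network" slide.
cascades = [{"a": 1, "c": 2, "b": 6, "e": 9}, {"c": 1, "a": 4, "b": 5, "d": 8}]
top2 = lazy_greedy(cascades, 2)
```

On this toy input the first pick is (a, b), the only short-delay edge supported by both cascades; once it is in the graph, the gain of (c, b) collapses, which is the diminishing-returns effect the lazy heap exploits.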

15 Experimental Setup
- We validate our method on:
  - Synthetic data: generate a graph G on k edges, generate cascades, record node infection times, reconstruct G
  - Real data:
    - MemeTracker: 172m news articles (Aug '08 – Sept '09), 343m textual phrases (quotes)
    - Flickr:
- Questions:
  - How many edges of G can we find? (precision-recall, break-even point)
  - How many cascades do we need?
  - How fast is the algorithm?
  - How well do we optimize the likelihood F_C(G)?
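The evaluation metrics named above can be computed as in the sketch below, which assumes the inferred edges come as a ranked list (for example, in the order the greedy algorithm added them). The function names are hypothetical, and the break-even point is taken as the point on the curve where precision and recall are closest.

```python
def precision_recall(ranked_edges, true_edges):
    """Precision and recall after each prefix of a ranked list of inferred edges."""
    hits, curve = 0, []
    for i, edge in enumerate(ranked_edges, start=1):
        hits += edge in true_edges
        curve.append((hits / i, hits / len(true_edges)))
    return curve

def break_even(curve):
    """Point on the curve where precision and recall are closest to equal."""
    return min(curve, key=lambda pr: abs(pr[0] - pr[1]))

# Toy example: three true edges, four inferred edges with one false positive.
true_edges = {("a", "b"), ("b", "c"), ("c", "d")}
ranked = [("a", "b"), ("x", "y"), ("b", "c"), ("c", "d")]
curve = precision_recall(ranked, true_edges)
p, r = break_even(curve)
```

Precision and recall coincide exactly when the number of inferred edges equals the number of true edges, which is why the break-even point summarizes the whole curve in a single number.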

16 Small Synthetic Example
- Small synthetic network
- Pick k strongest edges:
[Figure panels: true network, baseline network, our method]

17 Synthetic Networks
- Performance does not depend on the network structure:
  - Synthetic networks: Forest Fire, Kronecker, etc.
  - Transmission time distribution: exponential, power law
- Break-even point of > 90%
[Figure panels: 1024-node hierarchical Kronecker network, exponential transmission model; 1000-node Forest Fire network (α = 1.1), power-law transmission model]

18 How good is our graph?
- We achieve ≈ 90% of the best possible network!

19 How many cascades do we need?
- With 2x as many infections as edges, the break-even point is already !

20 Running Time
- Lazy evaluation and localized updates speed up the algorithm by 2 orders of magnitude!
- Can infer a network of 10k nodes in several hours

21 Real Data: Information diffusion
- MemeTracker dataset:
  - 172m news articles from Aug '08 – Sept '09
  - 343m textual phrases (quotes)
- Want to infer the network of information diffusion
- We use the hyperlinks between sites to generate the edges of a ground truth G
- From the MemeTracker dataset, we have the timestamps of:
  1. Cascades of hyperlinks: the time when a site creates a link
  2. Cascades of (MemeTracker) textual phrases: the time when a site mentions the information

22 Real Network: Performance
- Break-even points of 50% for hyperlink cascades and 30% for MemeTracker cascades!
[Figure panels: 500-node hyperlink network using hyperlink cascades; 500-node hyperlink network using MemeTracker cascades]

23 Information Diffusion Network
- 5,000 news sites
[Figure legend: blogs, mainstream media]

24 Information Diffusion Network (small part)
[Figure legend: blogs, mainstream media]

25 Real Data: Trips reconstruction
- Flickr dataset:
  - 60k Flickr users
  - 6M time-stamped geo-localized photos
- For every user we have the time and place where each photo was taken (places such as: de Buenos Aires; Cafayate; Pietro Mosezzo; …)
- Want to infer the network of frequent trips

26 Trips Network

27 Conclusions
- We infer hidden networks based on diffusion data (timestamps)
- Problem formulation in a maximum likelihood framework:
  - NP-hard problem to solve exactly
  - We develop an approximation algorithm that:
    - Is efficient: it runs in O(N^2)
    - Is invariant to the structure of the underlying network
    - Gives a sub-optimal network with a tight bound
- Future work:
  - Learn both the network and the diffusion model
  - Applications to other domains: biology, neuroscience, etc.

28 Thanks! For more (Code & Data):