Jure Leskovec Computer Science Department Cornell University / Stanford University.

Slides:



Advertisements
Similar presentations
Cost-effective Outbreak Detection in Networks Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie Glance.
Advertisements

1 Dynamics of Real-world Networks Jure Leskovec Machine Learning Department Carnegie Mellon University
Spread of Influence through a Social Network Adapted from :
Cost-effective Outbreak Detection in Networks Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie Glance.
Cost-effective Outbreak Detection in Networks Jure Leskovec Machine Learning Department, CMU Joint work with Andreas Krause, Carlos Guestrin, Christos.
Analysis and Modeling of Social Networks Foudalis Ilias.
Jure Leskovec, CMU Lars Backstrom, Cornell Ravi Kumar, Yahoo! Research Andrew Tomkins, Yahoo! Research.
Lecture 21 Network evolution Slides are modified from Jurij Leskovec, Jon Kleinberg and Christos Faloutsos.
Kronecker Graphs: An Approach to Modeling Networks Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, Zoubin Ghahramani Presented.
SILVIO LATTANZI, D. SIVAKUMAR Affiliation Networks Presented By: Aditi Bhatnagar Under the guidance of: Augustin Chaintreau.
Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta, Michael Mahoney Yahoo! Research.
Advanced Topics in Data Mining Special focus: Social Networks.
CS 599: Social Media Analysis University of Southern California1 The Basics of Network Analysis Kristina Lerman University of Southern California.
Hierarchy in networks Peter Náther, Mária Markošová, Boris Rudolf Vyjde : Physica A, dec
CS728 Lecture 5 Generative Graph Models and the Web.
Masters Thesis Defense Amit Karandikar Advisor: Dr. Anupam Joshi Committee: Dr. Finin, Dr. Yesha, Dr. Oates Date: 1 st May 2007 Time: 9:30 am Place: ITE.
Small Worlds Presented by Geetha Akula For the Faculty of Department of Computer Science, CALSTATE LA. On 8 th June 07.
Modeling Real Graphs using Kronecker Multiplication
Social Networks and Graph Mining Christos Faloutsos CMU - MLD.
Cascading Behavior in Large Blog Graphs Patterns and a Model Leskovec et al. (SDM 2007)
Web as Graph – Empirical Studies The Structure and Dynamics of Networks.
INFERRING NETWORKS OF DIFFUSION AND INFLUENCE Presented by Alicia Frame Paper by Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Kraus.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 7 May 14, 2006
Summary from Previous Lecture Real networks: –AS-level N= 12709, M=27384 (Jan 02 data) route-views.oregon-ix.net, hhtp://abroude.ripe.net/ris/rawdata –
Measurement and Evolution of Online Social Networks Review of paper by Ophir Gaathon Analysis of Social Information Networks COMS , Spring 2011,
Leveraging Big Data: Lecture 11 Instructors: Edith Cohen Amos Fiat Haim Kaplan Tova Milo.
Models of Influence in Online Social Networks
Optimization Based Modeling of Social Network Yong-Yeol Ahn, Hawoong Jeong.
(Social) Networks Analysis III Prof. Dr. Daning Hu Department of Informatics University of Zurich Oct 16th, 2012.
Jure Leskovec Machine Learning Department Carnegie Mellon University.
Topic 13 Network Models Credits: C. Faloutsos and J. Leskovec Tutorial
Log Dimension Hypothesis1 The Logarithmic Dimension Hypothesis Anthony Bonato Ryerson University MITACS International Problem Solving Workshop July 2012.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Eric Horvitz, Michael Mahoney,
Dynamics of networks Jure Leskovec Computer Science Department
Jure Leskovec PhD: Machine Learning Department, CMU Now: Computer Science Department, Stanford University.
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
Influence Maximization in Dynamic Social Networks Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang, Xiaoming Sun.
COM1721: Freshman Honors Seminar A Random Walk Through Computing Lecture 2: Structure of the Web October 1, 2002.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.
December 7-10, 2013, Dallas, Texas
Maximizing the Spread of Influence through a Social Network Authors: David Kempe, Jon Kleinberg, É va Tardos KDD 2003.
Jure Leskovec Machine Learning Department Carnegie Mellon University.
On-line Social Networks - Anthony Bonato 1 Dynamic Models of On-Line Social Networks Anthony Bonato Ryerson University WAW’2009 February 13, 2009 nt.
How Do “Real” Networks Look?
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School.
Cost-effective Outbreak Detection in Networks Presented by Amlan Pradhan, Yining Zhou, Yingfei Xiang, Abhinav Rungta -Group 1.
1 1 MPI for Intelligent Systems 2 Stanford University Manuel Gomez Rodriguez 1,2 Bernhard Schölkopf 1 S UBMODULAR I NFERENCE OF D IFFUSION NETWORKS FROM.
Jure Leskovec, CMU Lars Backstrom, Cornell Ravi Kumar, Yahoo! Research Andrew Tomkins, Yahoo! Research.
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
Netlogo demo. Complexity and Networks Melanie Mitchell Portland State University and Santa Fe Institute.
GRAPH AND LINK MINING 1. Graphs - Basics 2 Undirected Graphs Undirected Graph: The edges are undirected pairs – they can be traversed in any direction.
Dynamics of Real-world Networks
Inferring Networks of Diffusion and Influence
Topics In Social Computing (67810)
Modeling networks using Kronecker multiplication
Who are the most influential bloggers?
How Do “Real” Networks Look?
Social and Information Network Analysis: Review of Key Concepts
How Do “Real” Networks Look?
How Do “Real” Networks Look?
Lecture 13 Network evolution
How Do “Real” Networks Look?
Dynamics of Real-world Networks
Graph and Tensor Mining for fun and profit
Cost-effective Outbreak Detection in Networks
Lecture 21 Network evolution
Modelling and Searching Networks Lecture 2 – Complex Networks
Navigation and Propagation in Networks
Advanced Topics in Data Mining Special focus: Social Networks
Presentation transcript:

Jure Leskovec Computer Science Department Cornell University / Stanford University

 Large on-line systems have detailed records of human activity  On-line communities: ▪ Facebook (64 million users, billion dollar business) ▪ MySpace (300 million users)  Communication: ▪ Instant Messenger (~1 billion users)  News and Social media: ▪ Blogging (250 million blogs world-wide, presidential candidates run blogs)  On-line worlds: ▪ World of Warcraft (internal economy 1 billion USD) ▪ Second Life (GDP of 700 million USD in ‘07) 2

b) Internet (AS) c) Social networks a) World wide web d) Communicatione) Citationsf) Protein interactions 3

 We know lots about the network structure:  Properties: Scale free [Barabasi-Albert ’99, Faloutsos 3 ‘99], 6-degrees of separation [Milgram ’67], Small- diameter [Albert-Jeong-Barabasi ‘99], Bipartite cores [Kumar et al. ’99], Network motifs [Milo et al. ‘02], Communities [Newman ‘99], Hubs and authorities [Page et al. ’98, Kleinberg ‘99]  Models: Preferential attachment [Barabasi ’99], Small- world [Watts-Strogatz ‘98], Copying model [Kleinberg el al. ’99], Heuristically optimized tradeoffs [Fabrikant et al. ‘02], Latent Space Models [Raftery et al. ‘02], Searchability [Kleinberg ‘00], Bowtie [Broder et al. ‘00], Exponential Random Graphs [Frank-Strauss ‘86], Transit- stub [Zegura ‘97], Jellyfish [Tauro et al. ‘01] We know much less about processes and dynamics of networks 4

 Network Dynamics:  Network evolution ▪ How network structure changes as the network grows and evolves?  Diffusion and cascading behavior ▪ How do rumors and diseases spread over networks? 5

 We need massive network data for the patterns to emerge:  MSN Messenger network [WWW ’08] (the largest social network ever analyzed) ▪ 240M people, 255B messages, 4.5 TB data  Product recommendations [EC ‘06] ▪ 4M people, 16M recommendations  Blogosphere [work in progress] ▪ 60M posts, 120M links 6

Network Evolution Diffusion & Cascades Patterns & Observations Q1: How does the network structure change as the network evolves/grows? Q2: What is edge attachment at the microscopic level? Q4: What do cascades look like? Q5: How likely are people to get influenced? Models & Algorithms Q3: How do we generate realistically looking and evolving networks? Q6: How do we find influential nodes and quickly detect epidemics? 7

Network Evolution Diffusion & Cascades Patterns & Observations Q1: How does the network structure change as the network evolves/grows? Q2: What is edge attachment at the microscopic level? Q4: What do cascades look like? Q5: How likely are people to get influenced? Models & Algorithms Q3: How do we generate realistically looking and evolving networks? Q6: How do we find influential nodes and quickly detect epidemics? 8

 Empirical findings on real graphs led to new network models  Such models make assumptions/predictions about other network properties  What about network evolution? 9 log degree log prob. Model Power-law degree distributionPreferential attachment Explains

 Prior models and intuition say that the network diameter slowly grows (like log N, log log N) time diameter size of the graph Internet Citations  Diameter shrinks over time  as the network grows the distances between the nodes slowly decrease 10 [w/ Kleinberg-Faloutsos, KDD ’05]

 Networks are denser over time  Densification Power Law: a … densification exponent (1 ≤ a ≤ 2)  What is the relation between the number of nodes and the edges over time?  Prior work assumes: constant average degree over time Internet Citations a=1.2 a=1.6 N(t) E(t) N(t) E(t) 11 [w/ Kleinberg-Faloutsos, KDD ’05] Is shrinking diameter just a consequence of densification?

12 diameter size of the graph Erdos-Renyi random graph Densification exponent a =1.3  There is more to shrinking diameter than just densification Densifying random graph has increasing diameter

 Compare diameter of a:  True network (red)  Random network with the same degree distribution (blue) 13 diameter size of the graph diameter Citations Densification + scale free structure give shrinking diameter

 We directly observe atomic events of network evolution (and not only network snapshots) 14 We observe evolution at finest scale  Test individual edge attachment  Directly observe events leading to network properties  Compare network models by likelihood (and not by just summary network statistics) and so on for millions… [w/ Backstrom-Kumar-Tomkins, KDD ’08]

 Network datasets  Full temporal information from the first edge onwards  LinkedIn (N=7m, E=30m), Flickr (N=600k, E=3m), Delicious (N=200k, E=430k), Answers (N=600k, E=2m)  How are edge destinations selected? Where do new edges attach?  Are edges more likely to attach to high degree nodes?  Are edges more likely to attach to nodes that are close? [w/ Backstrom-Kumar-Tomkins, KDD ’08] 15

 Are edges more likely to connect to higher degree nodes? G np PA Flickr Networkτ G np 0 PA1 Flickr1 Delicious1 Answers0.9 LinkedIn0.6 [w/ Backstrom-Kumar-Tomkins, KDD ’08] 16

u u w w v v  Just before the edge (u,w) is placed how many hops is between u and w? Network% Δ% Δ Flickr66%66% Delicious28% Answers23% LinkedIn50% Fraction of triad closing edges Real edges are local. Most of them close triangles! [w/ Backstrom-Kumar-Tomkins, KDD ’08] G np PA Flickr 17

 Want to generate realistic networks:  Why synthetic graphs?  Anomaly detection, Simulations, Predictions, Null-model, Sharing privacy sensitive graphs, …  Q: Which network properties do we care about?  Q: What is a good model and how do we fit it? 18 Compare graphs properties, e.g., degree distribution Given a real network Generate a synthetic network

 We prove Kronecker graphs mimic real graphs:  Power-law degree distribution, Densification, Shrinking/stabilizing diameter, Spectral properties Initiator (9x9) (3x3) (27x27) 19 [w/ Chakrabarti-Kleinberg-Faloutsos, PKDD ’05] p ij Edge probability Kronecker product of graph adjacency matrices (actually, there is also a nice social interpretation of the model) Given a real graph. How to estimate the initiator G 1 ?

 Initiator matrix is a similarity matrix  Node u is described with k binary attributes u 1, u 2,…, u k  Probability of link between nodes u,v: P(u,v) = ∏ G 1 [u i, v i ] 20 ab cd ab cd ab cd This is exactly Kronecker product v u P(u,v) = b∙d∙c∙b

 Maximum likelihood estimation  Naïve estimation takes O(N!N 2 ):  N! for different node labelings: ▪ Our solution: Metropolis sampling: N!  (big) const  N 2 for traversing graph adjacency matrix ▪ Our solution: Kronecker product (E << N 2 ): N 2  E  Do stochastic gradient descent ab cd P(|) Kronecker arg max We estimate the model in O(E) 21 [w/ Faloutsos, ICML ’07]

 We search the space of ~10 1,000,000 permutations  Fitting takes 2 hours  Real and Kronecker are very close 22 Degree distribution node degree probability Path lengths number of hops # reachable pairs “Network” values rank network value [w/ Faloutsos, ICML ’07]

 Fitting Epinions we obtained G 1 =  What does this tell about the network structure? Core 0.9 edges Periphery 0.1 edges 0.5 edges Core-periphery (jellyfish, octopus) [w/ Dasgupta-Lang-Mahoney, WWW ’08]

 Small and large networks are fundamentally different 24 Collaboration network (N=4,158, E=13,422) Scientific collaborations (N=397, E=914) G 1 =

Network Evolution Diffusion & Cascades Patterns & Observations A1: Densification, shrinking diameters A2: Local edge attachment Q4: What do cascades look like? Q5: How likely are people to get influenced? Models & Algorithms A3: Kronecker graphs Q6: How do we find influential nodes and quickly detect epidemics? 25

 Behavior that cascades from node to node like an epidemic  News, opinions, rumors  Word-of-mouth in marketing  Infectious diseases  As activations spread through the network they leave a trace – a cascade Cascade (propagation graph) Network 26

 People send and receive product recommendations, purchase products  Data: Large online retailer: 4 million people, 16 million recommendations, 500k products 10% credit10% off 27 [w/ Adamic-Huberman, EC ’06]

 Bloggers write posts and refer (link) to other posts and the information propagates  Data: 10.5 million posts, 16 million links 28 [w/ Glance-Hurst et al., SDM ’07]

 Viral marketing cascades are more social:  Collisions (no summarizers)  Richer non-tree structures  Are they stars? Chains? Trees?  Information cascades (blogosphere):  Viral marketing (DVD recommendations): (ordered by frequency) 29 propagation [w/ Kleinberg-Singh, PAKDD ’06]

 Prob. of adoption depends on the number of friends who have adopted [Bass ‘69, Shelling ’78]  What is the shape?  Distinction has consequences for models and algorithms k = number of friends adopting Prob. of adoption k = number of friends adopting Prob. of adoption Diminishing returns? Critical mass? To find the answer we need lots of data 30

Later similar findings were made for group membership [Backstrom-Huttenlocher- Kleinberg ‘06], and probability of communication [Kossinets-Watts ’06] Probability of purchasing DVD recommendations (8.2 million observations) # recommendations received Adoption curve follows the diminishing returns. Can we exploit this? 31 [w/ Adamic-Huberman, EC ’06]

 Blogs – information epidemics  Which are the influential/infectious blogs?  Viral marketing  Who are the trendsetters?  Influential people?  Disease spreading  Where to place monitoring stations to detect epidemics? 32

How to quickly detect epidemics as they spread? 33 c1c1 c2c2 c3c3 [w/ Krause-Guestrin et al., KDD ’07]

 Cost:  Cost of monitoring is node dependent  Reward:  Minimize the number of affected nodes: ▪ If A are the monitored nodes, let R(A) denote the number of nodes we save We also consider other rewards: ▪ Minimize time to detection ▪ Maximize number of detected outbreaks R(A) A 34 ( ) [w/ Krause-Guestrin et al., KDD ’07]

Reward for detecting cascade i  Given:  Graph G(V,E), budget M  Data on how cascades C 1, …, C i, …,C K spread over time  Select a set of nodes A maximizing the reward subject to cost(A) ≤ M  Solving the problem exactly is NP-hard  Max-cover [Khuller et al. ’99] 35

 Theorem: Reward function R is submodular (diminishing returns, think of it as “concavity”) 36 [w/ Krause-Guestrin et al., KDD ’07] S1S1 S2S2 Placement A={S 1, S 2 } S’ New monitored node: Adding S’ helps a lot S2S2 S4S4 S1S1 S3S3 Placement B={S 1, S 2, S 3, S 4 } S’ Adding S’ helps very little Gain of adding a node to a small setGain of adding a node to a large set R(A  {u}) – R(A) ≥ R(B  {u}) – R(B) A  B

 We develop CELF algorithm:  Two independent runs of a modified greedy ▪ Solution set A’: ignore cost, greedily optimize reward ▪ Solution set A’’: greedily optimize reward/cost ratio  Pick best of the two: arg max(R(A’), R(A’’)) 37 [w/ Krause-Guestrin et al., KDD ’07] a b c a b c d d Marginal reward e Current solution: {a}Current solution: {}Current solution: {a, c}  Theorem: If R is submodular then CELF is near optimal: CELF achieves ½(1-1/e) factor approximation

38  Question: Which blogs should one read to be most up to date?  Idea: Select blogs to cover the blogosphere. Each dot is a blog Proximity is based on the number of common cascades

For more info see our website:  Which blogs should one read to catch big stories? CELF In-links Random # posts Out-links Number of selected blogs (sensors) Reward (higher is better) 39 (used by Technorati) [w/ Krause-Guestrin et al., KDD ’07]

 Given:  a real city water distribution network  data on how contaminants spread over time  Place sensors (to save lives)  Problem posed by the US Environmental Protection Agency SS 40 c1c1 c2c2 [w/ Krause et al., J. of Water Resource Planning]

 Our approach performed best at the Battle of Water Sensor Networks competition CELF Population Random Flow Degree AuthorScore CMU (CELF) 26 Sandia21 U Exter20 Bentley systems19 Technion (1)14 Bordeaux12 U Cyprus11 U Guelph7 U Michigan4 Michigan Tech U3 Malcolm2 Proteo2 Technion (2)1 Number of placed sensors Population saved (higher is better) 41 [w/ Ostfeld et al., J. of Water Resource Planning]

 Why are networks the way they are?  Only recently have basic properties been observed on a large scale  Confirms social science intuitions; calls others into question.  Benefits of working with large data  What patterns do we observe in massive networks?  What microscopic mechanisms cause them? Social network of the whole world? 42

Milgram’s small world experiment (i.e., hops + 1) 43  Small-world experiment [Milgram ‘67]  People send letters from Nebraska to Boston  How many steps does it take?  Messenger social network – largest network analyzed  240M people, 30B conversations, 4.5TB data [w/ Horvitz, WWW ’08, Nature ‘08] MSN Messenger network Number of steps between pairs of people

 Predictive modeling of information diffusion  When, where and what information will create a cascade?  Where should one tap the network to get the effect they want? Social Media Marketing  How do news and information spread  New ranking and influence measures  Sentiment analysis from cascade structure  How to introduce incentives? 44

Observations: Data analysis Models: Predictions Algorithms: Applications Actively influencing the network 45