Dynamics of Real-world Networks

Slides:



Advertisements
Similar presentations
1 Dynamics of Real-world Networks Jure Leskovec Machine Learning Department Carnegie Mellon University
Advertisements

Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU
1 Realistic Graph Generation and Evolution Using Kronecker Multiplication Jurij Leskovec, CMU Deepay Chakrabarti, CMU/Yahoo Jon Kleinberg, Cornell Christos.
CMU SCS I2.2 Large Scale Information Network Processing INARC 1 Overview Goal: scalable algorithms to find patterns and anomalies on graphs 1. Mining Large.
Modeling Blog Dynamics Speaker: Michaela Götz Joint work with: Jure Leskovec, Mary McGlohon, Christos Faloutsos Cornell University Carnegie Mellon University.
Spread of Influence through a Social Network Adapted from :
Cost-effective Outbreak Detection in Networks Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie Glance.
Analysis and Modeling of Social Networks Foudalis Ilias.
Maximizing the Spread of Influence through a Social Network
Lecture 21 Network evolution Slides are modified from Jurij Leskovec, Jon Kleinberg and Christos Faloutsos.
Kronecker Graphs: An Approach to Modeling Networks Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, Zoubin Ghahramani Presented.
Patterns of Influence in a Recommendation Network Jure Leskovec, CMU Ajit Singh, CMU Jon Kleinberg, Cornell School of Computer Science Carnegie Mellon.
What did we see in the last lecture?. What are we going to talk about today? Generative models for graphs with power-law degree distribution Generative.
SILVIO LATTANZI, D. SIVAKUMAR Affiliation Networks Presented By: Aditi Bhatnagar Under the guidance of: Augustin Chaintreau.
Advanced Topics in Data Mining Special focus: Social Networks.
1 A Random-Surfer Web-Graph Model (Joint work with Avrim Blum & Hubert Chan) Mugizi Rwebangira.
CS728 Lecture 5 Generative Graph Models and the Web.
Masters Thesis Defense Amit Karandikar Advisor: Dr. Anupam Joshi Committee: Dr. Finin, Dr. Yesha, Dr. Oates Date: 1 st May 2007 Time: 9:30 am Place: ITE.
Jure Leskovec and Christos Faloutsos Machine Learning Department
Modeling Real Graphs using Kronecker Multiplication
Social Networks and Graph Mining Christos Faloutsos CMU - MLD.
Cascading Behavior in Large Blog Graphs Patterns and a Model Leskovec et al. (SDM 2007)
CS Lecture 6 Generative Graph Models Part II.
Sampling from Large Graphs. Motivation Our purpose is to analyze and model social networks –An online social network graph is composed of millions of.
INFERRING NETWORKS OF DIFFUSION AND INFLUENCE Presented by Alicia Frame Paper by Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Kraus.
Advanced Topics in Data Mining Special focus: Social Networks.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 7 May 14, 2006
Measurement and Evolution of Online Social Networks Review of paper by Ophir Gaathon Analysis of Social Information Networks COMS , Spring 2011,
Models of Influence in Online Social Networks
Jure Leskovec Machine Learning Department Carnegie Mellon University.
Topic 13 Network Models Credits: C. Faloutsos and J. Leskovec Tutorial
Survey on Evolving Graphs Research Speaker: Chenghui Ren Supervisors: Prof. Ben Kao, Prof. David Cheung 1.
Weighted Graphs and Disconnected Components Patterns and a Generator IDB Lab 현근수 In KDD 08. Mary McGlohon, Leman Akoglu, Christos Faloutsos.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Eric Horvitz, Michael Mahoney,
Dynamics of networks Jure Leskovec Computer Science Department
Jure Leskovec PhD: Machine Learning Department, CMU Now: Computer Science Department, Stanford University.
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
Influence Maximization in Dynamic Social Networks Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang, Xiaoming Sun.
School of Computer Science Carnegie Mellon University 1 The dynamics of viral marketing Jure Leskovec, Carnegie Mellon University Lada Adamic, University.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.
Maximizing the Spread of Influence through a Social Network David Kempe, Jon Kleinberg, Eva Tardos Cornell University KDD 2003.
Jure Leskovec Computer Science Department Cornell University / Stanford University.
Jure Leskovec Machine Learning Department Carnegie Mellon University.
On-line Social Networks - Anthony Bonato 1 Dynamic Models of On-Line Social Networks Anthony Bonato Ryerson University WAW’2009 February 13, 2009 nt.
Du, Faloutsos, Wang, Akoglu Large Human Communication Networks Patterns and a Utility-Driven Generator Nan Du 1,2, Christos Faloutsos 2, Bai Wang 1, Leman.
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School.
1 Patterns of Cascading Behavior in Large Blog Graphs Jure Leskoves, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst SDM 2007 Date:2008/8/21.
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
Dynamics of Real-world Networks
Social Networks Some content from Ding-Zhu Du, Lada Adamic, and Eytan Adar.
Graph Models Class Algorithmic Methods of Data Mining
Inferring Networks of Diffusion and Influence
Cohesive Subgraph Computation over Large Graphs
Modeling networks using Kronecker multiplication
NetMine: Mining Tools for Large Graphs
Social and Information Network Analysis: Review of Key Concepts
Generative Model To Construct Blog and Post Networks In Blogosphere
Part 1: Graph Mining – patterns
Tools for large graph mining WWW 2008 tutorial
Lecture 13 Network evolution
R-MAT: A Recursive Model for Graph Mining
The likelihood of linking to a popular website is higher
Graph and Tensor Mining for fun and profit
Cost-effective Outbreak Detection in Networks
Lecture 21 Network evolution
Modelling and Searching Networks Lecture 2 – Complex Networks
Advanced Topics in Data Mining Special focus: Social Networks
What did we see in the last lecture?
Analysis of Large Graphs: Overlapping Communities
Presentation transcript:

Dynamics of Real-world Networks Jure Leskovec Machine Learning Department Carnegie Mellon University jure@cs.cmu.edu http://www.cs.cmu.edu/~jure

Committee members Christos Faloutsos Avrim Blum Jon Kleinberg John Lafferty

Network dynamics Web & citations Sexual network Friendship network Food-web (who-eats-whom) Yeast protein interactions Internet

Large real world networks Instant messenger network N = 180 million nodes E = 1.3 billion edges Blog network N = 2.5 million nodes E = 5 million edges Autonomous systems N = 6,500 nodes E = 26,500 edges Citation network of physics papers N = 31,000 nodes E = 350,000 edges Recommendation network N = 3 million nodes E = 16 million edges

Questions we ask Do networks follow patterns as they grow? How to generate realistic graphs? How does influence spread over the network (chains, stars)? How to find/select nodes to detect cascades?

Our work: Network dynamics Our research focuses on analyzing and modeling the structure, evolution and dynamics of large real-world networks Evolution Growth and evolution of networks Cascades Processes taking place on networks Observe static and temporal properties Design network generative models Information propagation and cascades Find common sub-network properties

Our work: Goals 3 parts / goals G1: What are interesting statistical properties of network structure? e.g., 6-degrees G2: What is a good tractable model? e.g., preferential attachment G3: Use models and findings to predict future behavior e.g., node immunization

Our work: Overview S2: Dynamics of processes on networks S1: Dynamics of network evolution S2: Dynamics of processes on networks G1: Patterns G2: Models G3: Predictions

Our work: Overview S2: Dynamics of processes on networks S1: Dynamics of network evolution S2: Dynamics of processes on networks G1: Patterns KDD ‘05 TKDD ’07 PKDD ‘06 ACM EC ’06 G2: Models PAKDD ’05 SDM ‘07 TWEB ’07 G3: Predictions KDD ‘06 ICML ’07 WWW ‘07 submission to KDD

Our work: Impact and applications Structural properties Abnormality detection Graph models Graph generation Graph sampling and extrapolations Anonymization Cascades Node selection and targeting Outbreak detection

Outline Introduction Completed work Proposed work Conclusion S1: Network structure and evolution S2: Network cascades Proposed work Kronecker time evolving graphs Large online communication networks Links and information cascades Conclusion

Completed work: Overview S1: Dynamics of network evolution S2: Dynamics of processes on networks G1: Patterns Densification Shrinking diameters Cascade shape and size G2: Models Forest Fire Kronecker graphs Cascade generation model G3: Predictions Estimating Kronecker parameters Selecting nodes for detecting cascades

Completed work: Overview S1: Dynamics of network evolution S2: Dynamics of processes on networks G1: Patterns Densification Shrinking diameters Cascade shape and size G2: Models Forest Fire Kronecker graphs Cascade generation model G3: Predictions Estimating Kronecker parameters Selecting nodes for detecting cascades

G1 - Patterns: Densification What is the relation between the number of nodes and the edges over time? Networks are denser over time Densification Power Law a … densification exponent: 1 ≤ a ≤ 2: a=1: linear growth – constant degree a=2: quadratic growth – clique Internet log E(t) a=1.2 log N(t) Citations log E(t) a=1.7 log N(t)

G1 - Patterns: Shrinking diameters Internet Intuition and prior work say that distances between the nodes slowly grow as the network grows (like log N) Diameter Shrinks or Stabilizes over time as the network grows the distances between nodes slowly decrease diameter size of the graph Citations diameter time

G2 - Models: Kronecker graphs Want to have a model that can generate a realistic graph with realistic growth Patterns for static networks Patterns for evolving networks The model should be analytically tractable We can prove properties of graphs the model generates computationally tractable We can estimate parameters

Idea: Recursive graph generation Try to mimic recursive graph/community growth because self-similarity leads to power-laws There are many obvious (but wrong) ways: Does not densify, has increasing diameter Kronecker Product is a way of generating self-similar matrices Initial graph Recursive expansion

Kronecker product: Graph Intermediate stage (3x3) (9x9) Adjacency matrix Adjacency matrix

Kronecker product: Graph Continuing multiplying with G1 we obtain G4 and so on … G4 adjacency matrix

Properties of Kronecker graphs We show that Kronecker multiplication generates graphs that have: Properties of static networks Power Law Degree Distribution Power Law eigenvalue and eigenvector distribution Small Diameter Properties of dynamic networks Densification Power Law Shrinking / Stabilizing Diameter This means “shapes” of the distributions match but the properties are not independent How do we set the initiator to match the real graph?     

G3 - Predictions: The problem We want to generate realistic networks: G1) What are the relevant properties? G2) What is a good tractable model? G3) How can we fit the model (find parameters)? Given a real network Generate a synthetic network Compare some property, e.g., degree distribution  

Model estimation: approach Maximum likelihood estimation Given real graph G Estimate the Kronecker initiator graph Θ (e.g., 3x3 ) which We need to (efficiently) calculate And maximize over Θ

Model estimation: solution Naïvely estimating the Kronecker initiator takes O(N!N2) time: N! for graph isomorphism Metropolis sampling: N!  (big) const N2 for traversing the graph adjacency matrix Properties of Kronecker product and sparsity (E << N2): N2 E We can estimate the parameters in linear time O(E)

Model estimation: experiments Autonomous systems (internet): N=6500, E=26500 Fitting takes 20 minutes AS graph is undirected and estimated parameters correspond to that Degree distribution Hop plot log count diameter=4 log # of reachable pairs log degree number of hops

Model estimation: experiments Scree plot Network value log eigenvalue log 1st eigenvector log rank log rank

Completed work: Overview S1: Dynamics of network evolution S2: Dynamics of processes on networks G1: Patterns Densification Shrinking diameters Cascade shape and size G2: Models Forest Fire Kronecker graphs Cascade generation model G3: Predictions Estimating Kronecker parameters Selecting nodes for detecting cascades

Information cascades Cascades are phenomena in which an idea becomes adopted due to influence by others We investigate cascade formation in Viral marketing (Word of mouth) Blogs Cascade (propagation graph) Social network

Cascades: Questions What kinds of cascades arise frequently in real life? Are they like trees, stars, or something else? What is the distribution of cascade sizes (exponential tail / heavy-tailed)? When is a person going to follow a recommendation?

Cascades in viral marketing Senders and followers of recommendations receive discounts on products 10% credit 10% off Recommendations are made at time of purchase Data: 3 million people, 16 million recommendations, 500k products (books, DVDs, videos, music)

Product recommendation network purchase following a recommendation customer recommending a product customer not buying a recommended product

G1- Viral cascade shapes Stars (“no propagation”) Bipartite cores (“common friends”) Nodes having same friends

very few large cascades G1- Viral cascade sizes Count how many people are in a single cascade We observe a heavy tailed distribution which can not be explained by a simple branching process steep drop-off books log count very few large cascades log cascade size

Does receiving more recommendations increase the likelihood of buying? BOOKS DVDs

Cascades in the blogosphere blogs + posts Post network links among posts Extracted cascades Posts are time stamped We can identify cascades – graphs induced by a time ordered propagation of information 34

G1- Blog cascade shapes Cascade shapes (ordered by frequency) Cascades are mainly stars Interesting relation between the cascade frequency and structure

G1- Blog cascade size Count how many posts participate in cascades Blog cascades tend to be larger than Viral Marketing cascades shallow drop-off log count some large cascades log cascade size

G2- Blog cascades: model Simple virus propagation type of model (SIS) generates similar cascades as found in real life Count Count Cascade size Cascade node in-degree B1 B2 Count Count B3 B4 Size of star cascade Size of chain cascade

G3- Node selection for cascade detection Observing cascades we want to select a set of nodes to quickly detect cascades Given a limited budget of attention/sensors Which blogs should one read to be most up to date? Where should we position monitoring stations to quickly detect disease outbreaks?

Node selection: algorithm Node selection is NP hard We exploit submodularity of objective functions to develop scalable node selection algorithms give performance guarantees In practice our solution is at most 5-15% from optimal Worst case bound Our solution Solution quality Number of blogs

Outline Introduction Completed work Proposed work Conclusion Network structure and evolution Network cascades Proposed work Large communication networks Links and information cascades Kronecker time evolving graphs Conclusion

Proposed work: Overview S1: Dynamics of network evolution S2: Dynamics of processes on networks G1: Patterns Dynamics in communication networks G2: Models Models of link and cascade creation G3: Predictions Kronecker time evolving graphs 1 2 3

Proposed work: Communication networks 1 Large communication network 1 billion conversations per day, 3TB of data! How communication and network properties change with user demographics (age, location, sex, distance) Test 6 degrees of separation Examine transitivity in the network

Proposed work: Communication networks 1 Preliminary experiment Distribution of shortest path lengths Microsoft Messenger network 200 million people 1.3 billion edges Edge if two people exchanged at least one message in one month period MSN Messenger network Pick a random node, count how many nodes are at distance 1,2,3... hops log number of nodes 7 distance (Hops)

Proposed work: Links & cascades 2 Given labeled nodes, how do links and cascades form? Propagation of information Do blogs have particular cascading properties? Propagation of trust Social network of professional acquaintances 7 million people, 50 million edges Rich temporal and network information How do various factors (profession, education, location) influence link creation? How do invitations propagate?

Proposed work: Kronecker graphs 3 Graphs with weighted edges Move beyond Bernoulli edge generation model Algorithms for estimating parameters of time evolving networks Allow parameters to slowly evolve over time Θt Θt+1 Θt+2

Timeline May ‘07 Jun – Aug ‘07 Sept– Dec ‘07 Jan – Apr ’08 communication network Jun – Aug ‘07 research on on-line time evolving networks Sept– Dec ‘07 Cascade formation and link prediction Jan – Apr ’08 Kronecker time evolving graphs Apr – May ‘08 Write the thesis Jun ‘08 Thesis defense 1 2 3

References Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, by Jure Leskovec, Jon Kleinberg, Christos Faloutsos, ACM KDD 2005 Graph Evolution: Densification and Shrinking Diameters, by Jure Leskovec, Jon Kleinberg and Christos Faloutsos, ACM TKDD 2007 Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, by Jure Leskovec, Deepay Chakrabarti, Jon Kleinberg and Christos Faloutsos, PKDD 2005 Scalable Modeling of Real Graphs using Kronecker Multiplication, by Jure Leskovec and Christos Faloutsos, ICML 2007 The Dynamics of Viral Marketing, by Jure Leskovec, Lada Adamic, Bernado Huberman, ACM EC 2006 Cost-effective outbreak detection in networks, by Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie Glance, in submission to KDD 2007 Cascading behavior in large blog graphs, by Jure Leskovec, Marry McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst, SIAM DM 2007 Acknowledgements: Christos Faloutsos, Mary McGlohon, Jon Kleinberg, Zoubin Gharamani, Pall Melsted, Andreas Krause, Carlos Guestrin, Deepay Chakrabarti, Marko Grobelnik, Dunja Mladenic, Natasa Milic-Frayling, Lada Adamic, Bernardo Huberman, Eric Horvitz, Susan Dumais

Backup slides

Proposed work: Kronecker graphs 1 Further analysis of Kronecker graphs Prove properties of the diameter of Stochastic Kronecker Graphs Extend Kronecker to generate graphs with any number of nodes Currently Kronecker can generate graphs with Nk nodes Idea: expand only one row/column of current adjacency matrix

Proposed work: GraphGarden 5 Publicly release a library for mining large graphs Developed during our research 40,000 lines of C++ code Components Properties of static and evolving networks Graph generation and model fitting Graph sampling Analysis of cascades Node placement/selection

Proposed work Dynamics of processes on networks Dynamics of network evolution Dynamics of processes on networks Structural properties Dynamics in communication networks Models Analysis and extensions of Kronecker graphs Models of link and cascade creation Predictions Kronecker time evolving graphs 3 1 4 2 Release the graph mining toolkit 5

The model: Forest Fire Model Want to model graphs that density and have shrinking diameters Intuition: How do we meet friends at a party? How do we identify references when writing papers? End the red line

Properties of the Forest Fire Heavy-tailed in-degrees: “rich get richer” Highly linked nodes can easily be reached Communities Newcomer copies several of neighbors’ links Heavy-tailed out-degrees Recursive nature provides chance for node to burn many edges Densification Power Law Like in Community Guided Attachment Shrinking diameter Densification helps but is not enough

Forest Fire Model Forest Fire generates graphs that densify and have shrinking diameter E(t) densification diameter 1.32 diameter N(t) N(t)

Forest Fire: Parameter Space Fix backward probability r and vary forward burning probability p We observe a sharp transition between sparse and clique-like graphs Sweet spot is very narrow Clique-like graph Increasing diameter Constant diameter Sparse graph Decreasing diameter

Kronecker product: Definition The Kronecker product of matrices A and B is given by We define a Kronecker product of two graphs as a Kronecker product of their adjacency matrices N x M K x L N*K x M*L

Kronecker graphs We propose a growing sequence of graphs by iterating the Kronecker product Each Kronecker multiplication exponentially increases the size of the graph Gk has N1k nodes and E1k edges, so we get densification

Stochastic Kronecker graphs Create N1N1 probability matrix P1 Compute the kth Kronecker power Pk For each entry puv of Pk include an edge (u,v) with probability puv Probability of edge pij Kronecker multiplication 0.25 0.10 0.04 0.05 0.15 0.02 0.06 0.01 0.03 0.09 0.5 0.2 0.1 0.3 Instance matrix K2 P1 flip biased coins P2=P1P1

Cascade formation process Viral marketing People purchase and send recommendations legend received recommendation and propagated it forward Friendship network received a recommendation but didn’t propagate

Node selection: example Water distribution network: Different objective functions give different placements Population affected Detection likelihood