RTG: A Recursive Realistic Graph Generator using Random Typing
Leman Akoglu and Christos Faloutsos, Carnegie Mellon University

Outline
- Motivation
- Problem Definition
- Related Work
- A Little History
- Proposed Model
- Experimental Results
- Conclusion
6/18/2015, Akoglu, Faloutsos, ECML PKDD 2009

Motivation - 1
Complex graphs (WWW, computer, biological, social networks, etc.) exhibit many common properties:
- power laws
- small and shrinking diameter
- community structure
- …
How can we produce synthetic but realistic graphs?

Motivation - 2
Why do we need synthetic graphs?
- Simulation
- Sampling/Extrapolation
- Summarization/Compression
- Motivation to understand pattern generating processes

Problem Definition
Discover a graph generator that is:
G1. simple: the more intuitive the better!
G2. realistic: outputs graphs that obey all “laws”
G3. parsimonious: requires few parameters
G4. flexible: able to produce the cross-product of un/weighted, un/directed, uni/bipartite graphs
G5. fast: generation should take time linear in the size of the output graph

Related Work
1. Graph Properties → What we want to match
2. Graph Generators → What has been proposed earlier

Related Work 1: Graph Properties

Related Work 2: Graph Generators
- Erdős-Rényi (ER) model [Erdős, Rényi '60]
- Small-world model [Watts, Strogatz '98]
- Preferential Attachment [Barabási, Albert '99]
- Winners don’t take all [Pennock et al. '02]
- Forest Fire model [Leskovec, Faloutsos '05]
- Butterfly model [McGlohon et al. '08]
These models:
- model some static graph property
- neglect dynamic properties
- cannot produce weighted graphs

Related Work 2: Graph Generators
- Random dot-product graphs [Kraetzl, Nickel '05] [Young, Scheinerman '07]: produce only undirected graphs; cannot produce weighted graphs; require quadratic time
- Utility-based models [Fabrikant et al. '02] [Even-Dar et al. '07] [Laoutaris '08]: hard to analyze
- Kronecker graphs [Leskovec et al. '07] [Akoglu et al. '08]: multinomial/lognormal distributions; fixed number of nodes

A Little History - 1
[Zipf, 1932] In many natural languages, the rank r and the frequency f_r of words follow a power law: f_r ∝ 1/r
[Figure, axes: rank and count]

A Little History - 2
[Mandelbrot, 1953] “Humans optimize avg. information per unit transmission cost.”

A Little History - 2
[Miller, 1957] “A monkey types randomly on a keyboard → the distribution of words follows a power law.”
[Figure: keyboard with k equiprobable keys and a space key]

A Little History - 2
[Conrad and Mitzenmacher, 2004] “Same relation still holds when keys have unequal probabilities.”
[Figure: keyboard with keys of unequal probability and a space key]
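
The random-typing experiment behind these two results can be tried in a few lines. The sketch below is illustrative only: the function name random_typing, the three-key alphabet, and q = 0.2 are assumptions, not values from the slides. It types characters at random, splits the stream at spaces, and ranks the resulting words by count, so the rank-frequency relation can be inspected or plotted on log-log axes.

```python
import random
from collections import Counter

def random_typing(num_chars, keys="ab", q=0.2, key_probs=None, seed=0):
    """Type num_chars random characters and return the space-separated words.

    q is the probability of hitting the space key. If key_probs is None the
    letter keys are equiprobable (Miller's setting); otherwise key_probs
    gives one weight per letter key (the unequal-probability setting).
    """
    rng = random.Random(seed)
    if key_probs is None:
        key_probs = [(1.0 - q) / len(keys)] * len(keys)
    symbols = list(keys) + [" "]
    weights = list(key_probs) + [q]
    typed = rng.choices(symbols, weights=weights, k=num_chars)
    # split() drops the empty words produced by consecutive spaces
    return "".join(typed).split()

words = random_typing(200_000, keys="abc", q=0.2)
ranked = Counter(words).most_common()  # sorted by decreasing count
for rank in (1, 10, 100):
    if rank <= len(ranked):
        print(rank, ranked[rank - 1])
```

Passing, for example, key_probs=[0.5, 0.2, 0.1] with q=0.2 switches to the unequal-key variant.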

Preliminary Model 1
RTG-IE: RTG with Independent Equiprobable keys
[Figure: keyboard with equiprobable keys and a space key]

Preliminary Model 1
RTG-IE: RTG with Independent Equiprobable keys
Lemma 1. W is super-linear on N (power law).
Lemma 2. W is super-linear on E (power law).
Lemma 3. In(out)-weight W_n of node n is super-linear on in(out)-degree d_n (power law).
[Formulas not preserved in the transcript]
Please find the proofs in the paper.
Related laws: L11. Weight PL; L05. Densification PL; L10. Snapshot PL

Advantages of the Preliminary Model 1
- G1: Intuitive
- G1: Easy to implement
- G2: Realistic: provably follows several rules
- G3: Handful of parameters: k, q, W
- G5: Fast: generating a random sequence of characters

Problems of the Preliminary Model 1
1. Multinomial degree distributions
[Figures, axes: rank and count; in-degree and count]

Problems of the Preliminary Model 1
2. No homophily, no community structure
→ Node i connects to any node j with prob. d_i * d_j independently, rather than connecting to ‘similar’ nodes.

Preliminary Model 2
RTG-IU: RTG with Independent Un-equiprobable keys
Solution to Problem 1: [Conrad and Mitzenmacher, 2004]
[Figures: rank vs. count and in-degree vs. count plots, shown for a keyboard with equiprobable keys and for one with unequal key probabilities]

Proposed Model
RTG: Random Typing Graphs
Solution to Problem 2: “2D keyboard”
- Generate source-destination labels in one shot.
- Pick one of the nine keys randomly.

Proposed Model
RTG: Random Typing Graphs
Solution to Problem 2: “2D keyboard”
- Repeat recursively.
- Terminate each label when the space key is typed on each dimension (dark blue in the figure).

Proposed Model
RTG: Random Typing Graphs
Solution to Problem 2: “2D keyboard”
How do we choose the keys? Independent model does not yield community structure!
Key probabilities under the independent model (one axis per label; S is the space key):
      a         b         S
a   p_a*p_a   p_a*p_b   p_a*q
b   p_b*p_a   p_b*p_b   p_b*q
S   q*p_a     q*p_b     q*q

Proposed Model
RTG: Random Typing Graphs
Solution to Problem 2: “2D keyboard”
- Boost probability of diagonal keys and decrease probability of off-diagonal ones (0 < β < 1: imbalance factor).
- Favoring of diagonal keys creates homophily.

Proposed Model
Parameters
- k: Number of keys
- q: Probability of hitting the space key S
- W: Number of multi-edges in output graph G
- β: imbalance factor
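
A minimal sketch of the 2D keyboard driven by these four parameters follows. Several details are assumptions made for illustration rather than the authors' implementation: off-diagonal key probabilities are the independent products scaled by β, and each diagonal key absorbs the remainder of its row so that the per-dimension key probabilities are unchanged; keystrokes on a dimension whose label has already ended are ignored; edges with an empty label are discarded; keys are equiprobable unless p is given. The names key_matrix and rtg_edges are hypothetical.

```python
import random

def key_matrix(p, q, beta):
    """Return the (k+1) x (k+1) table of 2D key probabilities.

    p holds the probabilities of the k letter keys and q that of the space
    key (sum(p) + q should be 1). Index k stands for the space key.
    Assumption: off-diagonal cells are the products scaled by beta, and each
    diagonal cell takes the rest of its row, so row totals stay p and q.
    """
    probs = list(p) + [q]
    n = len(probs)
    m = [[probs[i] * probs[j] * beta for j in range(n)] for i in range(n)]
    for i in range(n):
        off_diagonal = sum(m[i][j] for j in range(n) if j != i)
        m[i][i] = probs[i] - off_diagonal
    return m

def rtg_edges(k, q, W, beta, p=None, seed=0):
    """Generate W multi-edges as (source_label, destination_label) pairs."""
    rng = random.Random(seed)
    if p is None:
        p = [(1.0 - q) / k] * k  # equiprobable letter keys
    m = key_matrix(p, q, beta)
    cells = [(i, j) for i in range(k + 1) for j in range(k + 1)]
    weights = [m[i][j] for i, j in cells]
    edges = []
    while len(edges) < W:
        src, dst = [], []
        src_done = dst_done = False
        # Keep typing 2D keys until the space key was hit on both dimensions.
        while not (src_done and dst_done):
            i, j = rng.choices(cells, weights=weights)[0]
            if not src_done:
                if i == k:
                    src_done = True
                else:
                    src.append(i)
            if not dst_done:
                if j == k:
                    dst_done = True
                else:
                    dst.append(j)
        if src and dst:  # assumption: drop edges with an empty label
            edges.append((tuple(src), tuple(dst)))
    return edges

edges = rtg_edges(k=5, q=0.2, W=1000, beta=0.6)
print(len(edges), edges[:3])
```

With beta = 1 the table reduces to the independent products p_a*p_b; smaller values of beta move probability mass onto the diagonal keys.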

Proposed Model
Up to this point, we discussed directed, weighted and unipartite graphs.
Generalizations
- Undirected graphs: Ignore edge directions; edge generation is symmetric.
- Unweighted graphs: Ignore duplicate edges.
- Bipartite graphs: Different key sets on source and destination; labels are different.
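
These generalizations can be read as post-processing of the generated multi-edge list. The helpers below are a sketch under that reading; their names are hypothetical, and tagging each label with its side is only a stand-in for using different key sets on the source and destination dimensions.

```python
from collections import Counter

def to_weighted(multi_edges):
    """Directed, weighted: the weight of an edge is its number of repeats."""
    return Counter(multi_edges)

def to_unweighted(multi_edges):
    """Ignore duplicate edges."""
    return set(multi_edges)

def to_undirected(multi_edges):
    """Ignore edge directions by storing every pair in sorted order."""
    return Counter(tuple(sorted(edge)) for edge in multi_edges)

def to_bipartite(multi_edges):
    """Keep source and destination labels apart by tagging their side."""
    return [(("src", s), ("dst", d)) for s, d in multi_edges]

multi_edges = [("a", "b"), ("b", "a"), ("a", "b"), ("ab", "a")]
print(to_weighted(multi_edges))
print(to_unweighted(multi_edges))
print(to_undirected(multi_edges))
print(to_bipartite(multi_edges)[:2])
```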

Experimental Results
How does RTG model real graphs?
- Blognet: a social network of blogs based on citations
  → undirected, unweighted and unipartite
  → N = 27,726; E = 126,227; over 80 time ticks.
- Com2Cand: the U.S. electoral campaign donations network from organizations to candidates
  → directed, weighted ($ amounts) and bipartite
  → N = 23,191; E = 877,721; W = 4,383,105,580 over 29 time ticks.

Experimental Results
L01. Power-law degree distribution [Faloutsos et al. '99, Kleinberg et al. '99, Chakrabarti et al. '04, Newman '04]
[Figure: Blognet and RTG panels; axes: degree and count]

Experimental Results
L02. Triangle Power Law (TPL) [Tsourakakis '08]
[Figure: Blognet and RTG panels; axes: triangles and count]

Experimental Results 1
L03. Eigenvalue Power Law (EPL) [Siganos et al. '03]
[Figure: Blognet and RTG panels; axes: rank and λ]

Experimental Results 1
L05. Densification Power Law (DPL) [Leskovec et al. '05]
[Figure: Blognet and RTG panels; axes: #nodes and #edges]

Experimental Results
L06. Small and shrinking diameter [Albert and Barabási '99, Leskovec et al. '05]
[Figure: Blognet and RTG panels; axes: time and diameter]

Experimental Results
L07. Constant size 2nd and 3rd connected components [McGlohon et al. '08]
[Figure: Blognet and RTG panels; axes: time and size]

Experimental Results 1
L08. Principal Eigenvalue Power Law (λ1 PL) [Akoglu et al. '08]
[Figure: Blognet and RTG panels; axes: #edges and λ1]

Experimental Results 1
L09. Bursty/self-similar edge/weight additions [Gomez and Santonja '98, Gribble et al. '98, Crovella and Bestavros '99, McGlohon et al. '08]
[Figure: Blognet and RTG panels; axes: resolution and entropy]

Experimental Results 2
[Figures: Com2Cand and RTG panels; axes: time and diameter; time and size]

Experimental Results 2
[Figures: Com2Cand and RTG panels; axes: #edges and λ1; rank and λ]

Experimental Results 2
[Figures: Com2Cand and RTG panels; axes: in-degree and count; resolution and entropy]

Experimental Results 2
L10. Snapshot Power Law (SPL) [McGlohon et al. '08]
[Figure: Com2Cand and RTG panels; axes: in-degree (#checks) and in-weight ($ amount)]

Experimental Results 2
L11. Weight Power Law (WPL) [McGlohon et al. '08]
[Figure: Com2Cand and RTG panels; axes: #edges and total weight]

Experimental Results
On “modularity” [Girvan and Newman '02]
- RTG-IE: no significant modularity
- “Modularity” decreases with increasing β; that is, smaller β yields more community structure

Experimental Results
On complexity
- Computation time grows linearly with increasing W
- 2M multi-edges in 7 seconds
[Figure, axes: #multi-edges and time (ms)]

Conclusion 1
Our model is:
G1. simple and intuitive: a few lines of code
G2. realistic: generates graphs that obey all eleven properties of real graphs
G3. parsimonious: only a handful of parameters
G4. flexible: can generate weighted/unweighted, directed/undirected, unipartite/bipartite graphs and any combination of those
G5. fast: linear in the size of the output graph

Conclusion 2
We showed that: RTG mimics real graphs well.

Contact
Leman Akoglu
Christos Faloutsos

A Little History - 3
The infinite monkey theorem: A monkey typing randomly on a keyboard for an infinite amount of time will almost surely type a given text, such as the complete works of William Shakespeare.

Proposed Model: Burstiness and Self-similarity
- If each step is a time tick, weight additions are uniform!
- Start with a uniform interval.
- Recursively subdivide weight additions to each half, quarter, and so on, according to the bias b > 0.5.
- A b-fraction of the additions happens in one “half” and the remainder in the other.
[Figure, axes: time and total weight]
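
The recursive subdivision can be sketched as follows. The function name bursty_ticks, the value b = 0.7, and the coin flip that decides which half receives the b-fraction are assumptions for illustration; the slide only states that a b-fraction of the additions goes to one half.

```python
import random

def bursty_ticks(total, num_levels, b=0.7, seed=0):
    """Spread `total` additions over 2**num_levels time ticks with bias b.

    At every level each interval is cut in half; a b-fraction of its
    additions goes to one half (chosen at random here, an assumption)
    and the remainder to the other half.
    """
    rng = random.Random(seed)
    counts = [total]
    for _ in range(num_levels):
        next_counts = []
        for c in counts:
            big = round(c * b)
            small = c - big  # keeps the overall total unchanged
            if rng.random() < 0.5:
                next_counts.extend([big, small])
            else:
                next_counts.extend([small, big])
        counts = next_counts
    return counts

ticks = bursty_ticks(total=1000, num_levels=4, b=0.7)
print(ticks)       # additions per time tick
print(sum(ticks))  # equals the requested total
```

With b = 0.5 the additions are spread evenly; values closer to 1 concentrate them in fewer ticks.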

Related Work: Graph Properties
Static, unweighted:
- L01. Power-law degree distribution [Faloutsos et al. '99, Kleinberg et al. '99, Chakrabarti et al. '04, Newman '04]
- L02. Triangle Power Law (TPL) [Tsourakakis '08]
- L03. Eigenvalue Power Law (EPL) [Siganos et al. '03]
- L04. Community structure [Flake et al. '02, Girvan and Newman '02]
Static, weighted:
- L10. Snapshot Power Law (SPL) [McGlohon et al. '08]
Dynamic, unweighted:
- L05. Densification Power Law (DPL) [Leskovec et al. '05]
- L06. Small and shrinking diameter [Albert and Barabási '99, Leskovec et al. '05]
- L07. Constant size 2nd and 3rd connected components [McGlohon et al. '08]
- L08. Principal Eigenvalue Power Law (λ1 PL) [Akoglu et al. '08]
- L09. Bursty/self-similar edge/weight additions [Gomez and Santonja '98, Gribble et al. '98, Crovella and Bestavros '99, McGlohon et al. '08]
Dynamic, weighted:
- L11. Weight Power Law (WPL) [McGlohon et al. '08]