RTG: A Recursive Realistic Graph Generator using Random Typing
Leman Akoglu and Christos Faloutsos
Carnegie Mellon University
Outline: Motivation, Problem Definition, Related Work, A Little History, Proposed Model, Experimental Results, Conclusion
6/18/2015 | Akoglu, Faloutsos | ECML PKDD 2009
Motivation - 1
Complex graphs (WWW, computer, biological, and social networks, etc.) exhibit many common properties:
- power laws
- small and shrinking diameter
- community structure
- …
How can we produce synthetic but realistic graphs?
Motivation - 2
Why do we need synthetic graphs?
- Simulation
- Sampling/extrapolation
- Summarization/compression
- Understanding the processes that generate these patterns
Problem Definition
Discover a graph generator that is:
G1. simple: the more intuitive, the better!
G2. realistic: outputs graphs that obey all the "laws"
G3. parsimonious: requires few parameters
G4. flexible: can produce any combination of un/weighted, un/directed, and uni/bipartite graphs
G5. fast: generation takes time linear in the size of the output graph
Outline: Motivation, Problem Definition, Related Work, A Little History, Proposed Model, Experimental Results, Conclusion
Related Work
1. Graph properties: what we want to match
2. Graph generators: what has been proposed earlier
Related Work 1: Graph Properties
Related Work 2: Graph Generators
- Erdős-Rényi (ER) model [Erdős, Rényi '60]
- Small-world model [Watts, Strogatz '98]
- Preferential Attachment [Barabási, Albert '99]
- Winners don't take all [Pennock et al. '02]
- Forest Fire model [Leskovec, Faloutsos '05]
- Butterfly model [McGlohon et al. '08]
Limitations: these models capture some static graph properties but neglect dynamic ones, and they cannot produce weighted graphs.
Related Work 2: Graph Generators
- Random dot-product graphs [Kraetzl, Nickel '05] [Young, Scheinerman '07]
- Utility-based models [Fabrikant et al. '02] [Even-Dar et al. '07] [Laoutaris et al. '08]
- Kronecker graphs [Leskovec et al. '07] [Akoglu et al. '08]
Limitations: multinomial/lognormal degree distributions; fixed number of nodes; hard to analyze; produce only undirected graphs; cannot produce weighted graphs; require quadratic time.
Outline: Motivation, Problem Definition, Related Work, A Little History, Proposed Model, Experimental Results, Conclusion
A Little History - 1
[Zipf, 1932] In many natural languages, the rank $r$ and the frequency $f_r$ of words follow a power law: $f_r \propto 1/r$.
[Plot: count vs. rank]
A Little History - 2
[Mandelbrot, 1953] "Humans optimize the average information per unit transmission cost."
A Little History - 2
[Miller, 1957] "A monkey types randomly on a keyboard: the distribution of words follows a power law."
[Figure: a keyboard with k equiprobable keys (a, b, …, λ, $, +) and a Space key]
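Miller's experiment is easy to simulate. The sketch below assumes k equiprobable letter keys, each hit with probability (1 - q)/k, and a space key hit with probability q; empty words (two consecutive spaces) are simply skipped, which is an assumption of this sketch. The names `type_words` and `rank_frequency` are hypothetical.

```python
import random
from collections import Counter

def type_words(num_words, k=5, q=0.2, seed=0):
    """Miller's monkey: each keystroke is the space key with probability q,
    otherwise one of k equiprobable letters. Returns num_words non-empty words."""
    assert 1 <= k <= 26
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"[:k]
    words, current = [], []
    while len(words) < num_words:
        if rng.random() < q:            # space key: finish the current word
            if current:                 # consecutive spaces give no word (assumption)
                words.append("".join(current))
                current = []
        else:
            current.append(rng.choice(letters))
    return words

def rank_frequency(words):
    """Word counts in decreasing order, so element r-1 is the frequency f_r of rank r."""
    return sorted(Counter(words).values(), reverse=True)

if __name__ == "__main__":
    freqs = rank_frequency(type_words(100_000))
    for r in (1, 10, 100, 1000):
        if r <= len(freqs):
            print(f"rank {r}: frequency {freqs[r - 1]}")
```

With equiprobable keys the rank-frequency curve is a staircase, because all words of the same length are equally likely; it is the envelope of that staircase that follows a power law when plotted on log-log axes.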
A Little History - 2
[Conrad and Mitzenmacher, 2004] "The same relation still holds when keys have unequal probabilities."
[Figure: a keyboard whose keys (a, b, λ, $, +, Space) have unequal probabilities]
Outline: Motivation, Problem Definition, Related Work, A Little History, Proposed Model, Experimental Results, Conclusion
Preliminary Model 1
RTG-IE: RTG with Independent Equiprobable keys
[Figure: keyboard with equiprobable keys and a Space key]
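A minimal sketch of RTG-IE, assuming (consistently with its parameters k, q and W) that each of the W multi-edges is produced by typing a source label and then a destination label, each ended by the space key. Empty labels are allowed here and simply form one more node; this choice and the helper names `type_label` and `rtg_ie` are illustrative assumptions, not the paper's code.

```python
import random
from collections import Counter

def type_label(rng, letters, q):
    """Type letter keys until the space key (probability q) is hit;
    each letter has probability (1 - q) / len(letters)."""
    chars = []
    while rng.random() >= q:
        chars.append(rng.choice(letters))
    return "".join(chars)

def rtg_ie(k, q, W, seed=0):
    """Return a Counter mapping (source, destination) -> multiplicity (edge weight)."""
    rng = random.Random(seed)
    letters = [chr(ord("a") + i) for i in range(k)]
    edges = Counter()
    for _ in range(W):
        src = type_label(rng, letters, q)   # source and destination are typed
        dst = type_label(rng, letters, q)   # independently of each other
        edges[(src, dst)] += 1
    return edges

if __name__ == "__main__":
    g = rtg_ie(k=3, q=0.3, W=10_000)
    nodes = {u for edge in g for u in edge}
    print("N =", len(nodes), "E =", len(g), "W =", sum(g.values()))
```

Because the two labels are typed independently, a frequent label tends to link to every other frequent label, which is exactly the missing homophily discussed for this model.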
Graph Properties
Preliminary Model 1
RTG-IE: RTG with Independent Equiprobable keys
Lemma 1. The total weight W is super-linear in the number of nodes N (power law).
Lemma 2. W is super-linear in the number of edges E (power law).
Lemma 3. The in(out)-weight W_n of node n is super-linear in its in(out)-degree d_n (power law).
Together, these lemmas cover L05 (Densification PL), L10 (Snapshot PL), and L11 (Weight PL). The proofs are in the paper.
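For intuition about Lemma 1, here is a back-of-the-envelope argument; it is a heuristic sketch, not the paper's proof. It assumes labels are typed as in RTG-IE, writes $p = (1-q)/k$ for the probability of each letter key, and counts $M$ typed labels ($M = 2W$ if every multi-edge contributes one source and one destination label; constant factors do not change the exponent).

```latex
\begin{aligned}
\Pr[\text{a given label of length } \ell] &= p^{\ell} q, \qquad \text{with } k^{\ell} \text{ such labels},\\
M\,p^{\ell^*} q \approx 1 \;&\Rightarrow\; \ell^* \approx \frac{\log(qM)}{\log(1/p)},\\
N \approx \sum_{\ell \le \ell^*} k^{\ell} \;\propto\; k^{\ell^*} &= (qM)^{\log k / \log(1/p)}
\;\Rightarrow\; W \propto N^{\log(1/p)/\log k}.
\end{aligned}
```

Labels shorter than $\ell^*$ almost all appear, while the expected number of distinct longer labels forms a geometric series with ratio $kp$, so it only changes the constant. Since $kp = 1 - q < 1$, we have $\log(1/p) > \log k$, so the exponent exceeds 1 and W grows super-linearly in N.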
Advantages of the Preliminary Model 1
G1. Intuitive
G1. Easy to implement
G2. Realistic: provably follows several of the laws
G3. Handful of parameters: k, q, W
G5. Fast: it only generates a random sequence of characters
Problems of the Preliminary Model 1
1. Multinomial degree distributions
[Plots: count vs. rank; count vs. in-degree]
Problems of the Preliminary Model 1
2. No homophily, no community structure: node i connects to any node j with probability proportional to $d_i \cdot d_j$, independently, rather than preferring "similar" nodes.
Preliminary Model 2
RTG-IU: RTG with Independent Un-equiprobable keys
Solution to Problem 1 [Conrad and Mitzenmacher, 2004]: give the keys unequal probabilities.
[Figures: count vs. rank and count vs. in-degree; keyboards with keys a, b, λ, $, + and Space]
Proposed Model
RTG: Random Typing Graphs
Solution to Problem 2: a "2D keyboard". Generate the source and destination labels in one shot: pick one of the nine keys (two letters plus space on each dimension) at random.
Proposed Model
RTG: Random Typing Graphs
Solution to Problem 2: "2D keyboard". Repeat recursively. Each label terminates when the space key is typed on its dimension (dark blue keys).
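Proposed Model
RTG: Random Typing Graphs
Solution to Problem 2: "2D keyboard". How do we choose the key probabilities? With independent keys, each 2D key gets the product of the 1-D probabilities (rows: source key a, b, space; columns: destination key a, b, space):
$$
P = \begin{pmatrix}
p_a p_a & p_a p_b & p_a q \\
p_b p_a & p_b p_b & p_b q \\
q\, p_a & q\, p_b & q\, q
\end{pmatrix}
$$
The independent model does not yield community structure!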
Proposed Model
RTG: Random Typing Graphs
Solution to Problem 2: "2D keyboard". Boost the probability of the diagonal keys and decrease the probability of the off-diagonal ones (0 < β < 1: imbalance factor). Favoring the diagonal keys creates homophily: source and destination labels tend to be typed with the same characters, so similar nodes link to each other.
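One way to write such a keyboard, assumed here because the slide does not spell out the formula: let $p_i$ be the 1-D probability of key $i$ ($p_a, p_b, \dots$ for letters, $q$ for space). Scale every off-diagonal key by β and give the removed mass to the diagonal key of the same row:

```latex
P_{ij} = \beta\, p_i\, p_j \quad (i \neq j), \qquad
P_{ii} = p_i^2 + (1 - \beta)\, p_i\,(1 - p_i).
```

Each row then sums to $\beta p_i (1 - p_i) + p_i^2 + (1 - \beta) p_i (1 - p_i) = p_i$, and by symmetry so does each column, so each label still ends with probability q per keystroke. β = 1 recovers the independent keyboard; smaller β concentrates probability on the diagonal.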
Proposed Model
Parameters:
- k: number of keys
- q: probability of hitting the space key S
- W: number of multi-edges in the output graph G
- β: imbalance factor
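The four parameters are enough to drive a small generator. The sketch below is an illustration under stated assumptions, not the authors' implementation: letters are equiprobable (each with 1-D probability (1 - q)/k), off-diagonal 2D keys are scaled by β with the removed mass moved to the diagonal key of the same row, and once one dimension has typed the space key, further keystrokes on that dimension are ignored. `rtg_keyboard` and `rtg` are hypothetical names.

```python
import itertools
import random
from collections import Counter

def rtg_keyboard(k, q, beta):
    """(k+1) x (k+1) matrix of 2D key probabilities; index k is the space key.
    Assumes k equiprobable letters, each with 1-D probability (1 - q) / k."""
    p = [(1 - q) / k] * k + [q]
    n = k + 1
    # Off-diagonal keys are scaled down by beta ...
    M = [[beta * p[i] * p[j] for j in range(n)] for i in range(n)]
    # ... and the removed mass goes to the diagonal, so row i still sums to p[i].
    for i in range(n):
        M[i][i] = p[i] * p[i] + (1 - beta) * p[i] * (1 - p[i])
    return M

def rtg(k, q, W, beta, seed=0):
    """Generate W multi-edges; returns Counter {(source, destination): weight}."""
    rng = random.Random(seed)
    n = k + 1
    keys = [(i, j) for i in range(n) for j in range(n)]
    M = rtg_keyboard(k, q, beta)
    cum = list(itertools.accumulate(M[i][j] for i, j in keys))
    letters = [chr(ord("a") + i) for i in range(k)]
    edges = Counter()
    for _ in range(W):
        src, dst = [], []
        src_done = dst_done = False
        while not (src_done and dst_done):
            i, j = rng.choices(keys, cum_weights=cum)[0]
            # Once a dimension has typed the space key, its label is finished
            # and further keystrokes on that dimension are ignored.
            if not src_done:
                if i == k:
                    src_done = True
                else:
                    src.append(letters[i])
            if not dst_done:
                if j == k:
                    dst_done = True
                else:
                    dst.append(letters[j])
        edges[("".join(src), "".join(dst))] += 1
    return edges
```

Because every row and column of the keyboard keeps its 1-D marginal, ignoring a finished dimension is the same as continuing that label on a 1-D keyboard. Lowering β (for example `rtg(2, 0.3, 3000, beta=0.1)` versus `beta=1.0`) sharply increases the share of edges whose source and destination labels coincide.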
Proposed Model
Up to this point, we discussed directed, weighted and unipartite graphs.
Generalizations:
- Undirected graphs: ignore edge directions; edge generation is symmetric.
- Unweighted graphs: ignore duplicate edges.
- Bipartite graphs: use different key sets for sources and destinations, so the two sides get different labels.
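Assuming a generator's output is stored as a Counter that maps (source, destination) pairs to multiplicities (a representation chosen for this sketch), the first two generalizations are simple post-processing steps. The helper names are hypothetical.

```python
from collections import Counter

def to_undirected(edges):
    """Ignore edge directions: (u, v) and (v, u) become one pair; weights add up."""
    out = Counter()
    for (u, v), w in edges.items():
        out[tuple(sorted((u, v)))] += w
    return out

def to_unweighted(edges):
    """Ignore duplicate edges: keep each distinct (source, destination) pair once."""
    return set(edges)
```

The two steps compose, so `to_unweighted(to_undirected(edges))` gives a simple undirected graph. For the bipartite case, typing source labels from one alphabet (say a, b) and destination labels from another (say x, y) guarantees that no label appears on both sides.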
Outline: Motivation, Problem Definition, Related Work, A Little History, Proposed Model, Experimental Results, Conclusion
Experimental Results
How does RTG model real graphs?
- Blognet: a social network of blogs based on citations; undirected, unweighted and unipartite; N = 27,726; E = 126,227; over 80 time ticks.
- Com2Cand: the U.S. electoral campaign donations network from organizations to candidates; directed, weighted ($ amounts) and bipartite; N = 23,191; E = 877,721; W = 4,383,105,580 over 29 time ticks.
Experimental Results 1: Blognet and RTG
L01. Power-law degree distribution [Faloutsos et al. '99, Kleinberg et al. '99, Chakrabarti et al. '04, Newman '04]
[Plots: count vs. degree]
Experimental Results 1: Blognet and RTG
L02. Triangle Power Law (TPL) [Tsourakakis '08]
[Plots: count vs. #triangles]
Experimental Results 1: Blognet and RTG
L03. Eigenvalue Power Law (EPL) [Siganos et al. '03]
[Plots: eigenvalue λ vs. rank]
Graph Properties
Experimental Results 1: Blognet and RTG
L05. Densification Power Law (DPL) [Leskovec et al. '05]
[Plots: #edges vs. #nodes]
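Power laws such as DPL, $E(t) \propto N(t)^a$, are usually checked by fitting a straight line in log-log space. The slides do not say which fitting method was used, so the following is only a minimal least-squares sketch; `loglog_slope` is a hypothetical name.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); for y ∝ x**a this estimates a."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var
```

For example, `loglog_slope([100, 200, 400, 800], [n ** 1.2 for n in [100, 200, 400, 800]])` returns 1.2 up to floating-point rounding; applied to per-snapshot (#nodes, #edges) pairs, a slope above 1 indicates densification. The same fit applies to WPL (total weight vs. #edges).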
Experimental Results 1: Blognet and RTG
L06. Small and shrinking diameter [Albert and Barabási '99, Leskovec et al. '05]
[Plots: diameter vs. time]
Experimental Results 1: Blognet and RTG
L07. Constant size of the 2nd and 3rd connected components [McGlohon et al. '08]
[Plots: component size vs. time]
Experimental Results 1: Blognet and RTG
L08. Principal Eigenvalue Power Law (λ1 PL) [Akoglu et al. '08]
[Plots: λ1 vs. #edges]
Experimental Results 1: Blognet and RTG
L09. Bursty/self-similar edge/weight additions [Gomez and Santonja '98, Gribble et al. '98, Crovella and Bestavros '99, McGlohon et al. '08]
[Plots: entropy vs. resolution]
Graph Properties
Experimental Results 2: Com2Cand and RTG
[Plots: diameter vs. time; component size vs. time]
Experimental Results 2: Com2Cand and RTG
[Plots: λ1 vs. #edges; eigenvalue λ vs. rank]
Experimental Results 2: Com2Cand and RTG
[Plots: count vs. in-degree; entropy vs. resolution]
Experimental Results 2: Com2Cand and RTG
L10. Snapshot Power Law (SPL) [McGlohon et al. '08]
[Plots: in-weight ($ amount) vs. in-degree (#checks)]
Experimental Results 2: Com2Cand and RTG
L11. Weight Power Law (WPL) [McGlohon et al. '08]
[Plots: total weight vs. #edges]
Graph Properties
Experimental Results
On "modularity" [Girvan and Newman '02]:
- RTG-IE: no significant modularity.
- RTG: modularity decreases with increasing β; smaller β yields more community structure.
Graph Properties
Experimental Results
On complexity: computation time grows linearly with W; 2M multi-edges are generated in 7 seconds.
[Plot: time (ms) vs. #multi-edges]
Outline: Motivation, Problem Definition, Related Work, A Little History, Proposed Model, Experimental Results, Conclusion
Conclusion 1
Our model is:
G1. simple and intuitive: a few lines of code
G2. realistic: generates graphs that obey all eleven properties observed in real graphs
G3. parsimonious: only a handful of parameters
G4. flexible: can generate weighted/unweighted, directed/undirected, and unipartite/bipartite graphs, and any combination of those
G5. fast: linear in the size of the output graph
Conclusion 2
We showed that RTG mimics real graphs well.
Contact
Leman Akoglu
Christos Faloutsos
A Little History - 3
The infinite monkey theorem: a monkey typing randomly on a keyboard for an infinite amount of time will almost surely type any given text, such as the complete works of William Shakespeare.
Proposed Model: Burstiness and Self-similarity
If each step is a time tick, weight additions are uniform! Instead:
- Start with a uniform interval.
- Recursively subdivide the weight additions into each half, quarter, and so on, according to a bias b > 0.5: a b-fraction of the additions happens in one "half" and the remaining (1 - b)-fraction in the other.
[Plot: total weight vs. time]
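A minimal sketch of this recursive subdivision, assuming the total is spread over 2**levels time ticks and that the half receiving the b-fraction is chosen at random at every split (the slide only says "one half"). `bursty_additions` is a hypothetical name, and the counts it returns are fractional, so real additions would need rounding.

```python
import random

def bursty_additions(total, levels, b, seed=0):
    """Spread `total` additions over 2**levels ticks by recursive halving:
    at each split a b-fraction goes to one randomly chosen half and the
    remaining (1 - b)-fraction to the other."""
    rng = random.Random(seed)
    counts = [float(total)]
    for _ in range(levels):
        nxt = []
        for c in counts:
            big, small = b * c, (1 - b) * c
            if rng.random() < 0.5:
                nxt.extend((big, small))
            else:
                nxt.extend((small, big))
        counts = nxt
    return counts
```

With b = 0.5 every tick receives the same amount (the uniform case); values of b above 0.5 produce bursts at every time scale, which is what makes the series self-similar.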
Related Work: Graph Properties
Static, unweighted:
- L01. Power-law degree distribution [Faloutsos et al. '99, Kleinberg et al. '99, Chakrabarti et al. '04, Newman '04]
- L02. Triangle Power Law (TPL) [Tsourakakis '08]
- L03. Eigenvalue Power Law (EPL) [Siganos et al. '03]
- L04. Community structure [Flake et al. '02, Girvan and Newman '02]
Static, weighted:
- L10. Snapshot Power Law (SPL) [McGlohon et al. '08]
Dynamic, unweighted:
- L05. Densification Power Law (DPL) [Leskovec et al. '05]
- L06. Small and shrinking diameter [Albert and Barabási '99, Leskovec et al. '05]
- L07. Constant size of the 2nd and 3rd connected components [McGlohon et al. '08]
- L08. Principal Eigenvalue Power Law (λ1 PL) [Akoglu et al. '08]
- L09. Bursty/self-similar edge/weight additions [Gomez and Santonja '98, Gribble et al. '98, Crovella and Bestavros '99, McGlohon et al. '08]
Dynamic, weighted:
- L11. Weight Power Law (WPL) [McGlohon et al. '08]