RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos Carnegie Mellon University.


1 RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos Carnegie Mellon University

2 Outline: Motivation; Problem Definition; Related Work; A Little History; Proposed Model; Experimental Results; Conclusion. 6/18/2015, Akoglu, Faloutsos, ECML PKDD 2009

3 Motivation - 1: Complex graphs (WWW, computer, biological, social networks, etc.) exhibit many common properties: power laws; small and shrinking diameter; community structure; … How can we produce synthetic but realistic graphs? http://www.aharef.info/static/htmlgraph/

4 Motivation - 2: Why do we need synthetic graphs? Simulation; sampling/extrapolation; summarization/compression; motivation to understand pattern-generating processes.

5 Problem Definition: Discover a graph generator that is: G1. simple: the more intuitive the better! G2. realistic: outputs graphs that obey all “laws”. G3. parsimonious: requires few parameters. G4. flexible: able to produce the cross-product of un/weighted, un/directed, uni/bipartite graphs. G5. fast: generation should take time linear in the size of the output graph.

6 Outline: Motivation; Problem Definition; Related Work; A Little History; Proposed Model; Experimental Results; Conclusion.

7 Related Work: 1. Graph Properties: what we want to match. 2. Graph Generators: what has been proposed earlier.

8 Related Work 1: Graph Properties

10 Related Work 2: Graph Generators: Erdős-Rényi (ER) model [Erdős, Rényi `60]; Small-world model [Watts, Strogatz `98]; Preferential Attachment [Barabási, Albert `99]; Winners don’t take all [Pennock et al. `02]; Forest Fire model [Leskovec, Faloutsos `05]; Butterfly model [McGlohon et al. `08]. Limitations noted on the slide: model some static graph property; neglect dynamic properties; cannot produce weighted graphs.

14 Related Work 2: Graph Generators: Random dot-product graphs [Kraetzl, Nickel `05] [Young, Scheinerman `07]: produce only undirected graphs, cannot produce weighted graphs, require quadratic time. Utility-based models [Fabrikant et al. `02] [Even-Dar et al. `07] [Laoutaris `08]: hard to analyze. Kronecker graphs [Leskovec et al. `07] [Akoglu et al. `08]: multinomial/lognormal distributions, fixed number of nodes.

15 Outline: Motivation; Problem Definition; Related Work; A Little History; Proposed Model; Experimental Results; Conclusion.

16 A Little History - 1 [Zipf, 1932]: In many natural languages, the rank r and the frequency f_r of words follow a power law: f_r ∝ 1/r. [Plot axes: rank, count]

17 A Little History - 2 [Mandelbrot, 1953]: “Humans optimize avg. information per unit transmission cost.”

18 A Little History - 2 [Miller, 1957]: “A monkey types randomly on a keyboard: the distribution of words follows a power law.” [Figure: keyboard with k equiprobable keys (a, b, λ, $, +, ...) and a Space key]
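Miller's random-typing experiment can be tried out in a few lines. The following is a minimal sketch, not code from the presentation: the function names, the default values of k and q, and the choice to ignore consecutive spaces (so that no empty words are produced) are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_typing_words(num_words, k=4, q=0.2, seed=0):
    """Type until num_words non-empty words exist.

    The keyboard has k equiprobable letter keys (k <= 26 here) and one
    space key that is hit with probability q.
    """
    rng = random.Random(seed)
    letters = [chr(ord("a") + i) for i in range(k)]
    words, current = [], []
    while len(words) < num_words:
        if rng.random() < q:
            # Space key: close the current word.
            # Assumption: a space typed right after a space is ignored.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(rng.choice(letters))
    return words


def rank_frequency(words):
    """Return word counts sorted from most to least frequent."""
    return [count for _, count in Counter(words).most_common()]


counts = rank_frequency(random_typing_words(50_000))
```

Plotting `counts` against rank on log-log axes is one way to inspect the rank-frequency relation the slide refers to.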

19 A Little History - 2 [Conrad and Mitzenmacher, 2004]: “Same relation still holds when keys have unequal probabilities.” [Figure: keyboard with keys of unequal probability (a, b, λ, $, +, ...) and a Space key]

20 Outline: Motivation; Problem Definition; Related Work; A Little History; Proposed Model; Experimental Results; Conclusion.

21 Preliminary Model 1: RTG-IE, RTG with Independent Equiprobable keys. [Figure: keyboard with a Space key]

23 Graph Properties

24 Preliminary Model 1: RTG-IE, RTG with Independent Equiprobable keys. Lemma 1: W is super-linear on N (power law). Lemma 2: W is super-linear on E (power law). Lemma 3: the in(out)-weight W_n of node n is super-linear on its in(out)-degree d_n (power law). [The lemma equations were not captured in the transcript.] Please find the proofs in the paper. Related graph properties: L11. Weight PL; L05. Densification PL; L10. Snapshot PL.

25 Advantages of Preliminary Model 1: G1 - intuitive; G1 - easy to implement; G2 - realistic: provably follows several rules; G3 - handful of parameters: k, q, W; G5 - fast: generates a random sequence of characters.
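To illustrate why the preliminary model is easy to implement, here is a minimal sketch of RTG-IE. It assumes that typed words are paired up, the first of each pair being the source label and the second the destination label, and that a space typed before any letter is ignored; the names `type_label` and `rtg_ie` and the default parameter values are hypothetical, chosen only for this example.

```python
import random
from collections import Counter


def type_label(rng, letters, q):
    """Type one node label: letters until the space key is hit."""
    chars = []
    while True:
        if rng.random() < q:
            # Assumption: a space before any letter is ignored,
            # so labels are never empty.
            if chars:
                return "".join(chars)
        else:
            chars.append(rng.choice(letters))


def rtg_ie(num_multi_edges, k=5, q=0.3, seed=0):
    """Generate W = num_multi_edges multi-edges with k equiprobable keys.

    Returns a Counter mapping (source, destination) to edge weight,
    where the weight is the number of times the pair was typed.
    """
    rng = random.Random(seed)
    letters = [chr(ord("a") + i) for i in range(k)]
    edges = Counter()
    for _ in range(num_multi_edges):
        src = type_label(rng, letters, q)
        dst = type_label(rng, letters, q)
        edges[(src, dst)] += 1
    return edges


graph = rtg_ie(10_000)
num_nodes = len({label for edge in graph for label in edge})
```

The work done is proportional to the number of characters typed, which is in line with the G5 claim of a fast generator.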

26 Problems of Preliminary Model 1: 1 - Multinomial degree distributions. [Plot axes: rank, count; in-degree, count]

27 Problems of Preliminary Model 1: 2 - No homophily, no community structure. Node i connects to any node j with prob. d_i*d_j independently, rather than connecting to ‘similar’ nodes.

28 Preliminary Model 2: RTG-IU, RTG with Independent Un-equiprobable keys. Solution to Problem 1: [Conrad and Mitzenmacher, 2004]. [Figure: keyboards with equiprobable and un-equiprobable keys (a, b, λ, $, +, ..., Space), each with plots of rank vs. count and in-degree vs. count]

29 Proposed Model: RTG (Random Typing Graphs). Solution to Problem 2: “2D keyboard”. Generate source-destination labels in one shot. Pick one of the nine keys randomly.

30 Proposed Model: RTG (Random Typing Graphs). Solution to Problem 2: “2D keyboard”. Repeat recursively. Terminate each label when the space key is typed on each dimension (dark blue).

31 Proposed Model: RTG (Random Typing Graphs). Solution to Problem 2: “2D keyboard”. How do we choose the keys? The independent model does not yield community structure! Key probabilities under the independent model (a, b: letter keys; S: space key):
      a          b          S
a     p_a*p_a    p_a*p_b    p_a*q
b     p_b*p_a    p_b*p_b    p_b*q
S     q*p_a      q*p_b      q*q

33 Proposed Model: RTG (Random Typing Graphs). Solution to Problem 2: “2D keyboard”. Boost the probability of diagonal keys and decrease the probability of off-diagonal ones (0<β<1: imbalance factor). Favoring diagonal keys creates homophily.

34 Proposed Model: Parameters. k: number of keys; q: probability of hitting the space key S; W: number of multi-edges in the output graph G; β: imbalance factor.
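The four parameters are enough to sketch a generator. The code below is one possible reading of the 2D keyboard, not the authors' implementation. It makes three assumptions: off-diagonal key probabilities are the independent products multiplied by β, with each diagonal key boosted so that the marginal key probabilities stay unchanged; once one label has terminated, further keystrokes in that dimension are ignored; and pairs with an empty label are discarded. The function names are hypothetical.

```python
import random
from collections import Counter


def keyboard_2d(letter_probs, q, beta):
    """Build the (k+1) x (k+1) key matrix; the last index is the space key.

    Assumption: off-diagonal cells are p_i * p_j * beta and each diagonal
    cell absorbs the remainder, so row i still sums to p_i.
    """
    marginals = list(letter_probs) + [q]
    n = len(marginals)
    P = [[marginals[i] * marginals[j] * beta if i != j else 0.0
          for j in range(n)] for i in range(n)]
    for i in range(n):
        P[i][i] = marginals[i] - sum(P[i])
    return P


def rtg(W, k=3, q=0.25, beta=0.3, seed=0):
    """Generate W multi-edges; returns Counter {(source, destination): weight}."""
    rng = random.Random(seed)
    letters = [chr(ord("a") + i) for i in range(k)]
    P = keyboard_2d([(1.0 - q) / k] * k, q, beta)  # equiprobable letters
    cells = [(i, j) for i in range(k + 1) for j in range(k + 1)]
    weights = [P[i][j] for i, j in cells]
    edges = Counter()
    typed = 0
    while typed < W:
        src, dst = [], []
        src_done = dst_done = False
        while not (src_done and dst_done):
            i, j = rng.choices(cells, weights=weights)[0]
            if not src_done:
                if i == k:
                    src_done = True
                else:
                    src.append(letters[i])
            if not dst_done:
                if j == k:
                    dst_done = True
                else:
                    dst.append(letters[j])
        if src and dst:  # assumption: skip pairs with an empty label
            edges[("".join(src), "".join(dst))] += 1
            typed += 1
    return edges


graph = rtg(10_000)
```

With β = 1 the matrix reduces to the independent products, and smaller β makes source and destination labels more likely to share a prefix.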

35 Proposed Model: Generalizations. Up to this point, we discussed directed, weighted and unipartite graphs. - Undirected graphs: ignore edge directions; edge generation is symmetric. - Unweighted graphs: ignore duplicate edges. - Bipartite graphs: use different key sets on source and destination, so the labels are different.
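These generalizations can be viewed as simple transformations of a multi-edge list. The sketch below assumes the generator's output is a Counter mapping (source, destination) pairs to weights; that representation and the helper names are illustrative choices, not part of the presentation. For the bipartite case, prefixing the labels stands in for typing on two different key sets.

```python
from collections import Counter


def to_undirected(edges):
    """Ignore edge directions by merging (u, v) and (v, u)."""
    merged = Counter()
    for (u, v), w in edges.items():
        merged[tuple(sorted((u, v)))] += w
    return merged


def to_unweighted(edges):
    """Ignore duplicate edges: keep each distinct pair once."""
    return set(edges)


def to_bipartite(edges):
    """Keep source and destination labels in disjoint name spaces.

    Assumption: a prefix emulates using different key sets per side.
    """
    return Counter({("s:" + u, "d:" + v): w for (u, v), w in edges.items()})


example = Counter({("a", "b"): 2, ("b", "a"): 1, ("a", "a"): 3})
undirected = to_undirected(example)
unweighted = to_unweighted(example)
bipartite = to_bipartite(example)
```

The transformations compose, so for instance an undirected, unweighted graph is `to_unweighted(to_undirected(example))`.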

36 Outline: Motivation; Problem Definition; Related Work; A Little History; Proposed Model; Experimental Results; Conclusion.

37 Experimental Results: How does RTG model real graphs? Blognet: a social network of blogs based on citations; undirected, unweighted and unipartite; N = 27,726; E = 126,227; over 80 time ticks. Com2Cand: the U.S. electoral campaign donations network from organizations to candidates; directed, weighted ($ amounts) and bipartite; N = 23,191; E = 877,721; W = 4,383,105,580; over 29 time ticks.

38 Experimental Results: L01. Power-law degree distribution [Faloutsos et al. `99, Kleinberg et al. `99, Chakrabarti et al. `04, Newman `04]. [Panels: Blognet, RTG; axes: degree, count]

39 Experimental Results: L02. Triangle Power Law (TPL) [Tsourakakis `08]. [Panels: Blognet, RTG; axes: triangles, count]

40 Experimental Results 1: L03. Eigenvalue Power Law (EPL) [Siganos et al. `03]. [Panels: Blognet, RTG; axes: rank, λ_rank]

41 Graph Properties

42 Experimental Results 1: L05. Densification Power Law (DPL) [Leskovec et al. `05]. [Panels: Blognet, RTG; axes: #nodes, #edges]

43 Experimental Results: L06. Small and shrinking diameter [Albert and Barabási `99, Leskovec et al. `05]. [Panels: Blognet, RTG; axes: time, diameter]

44 Experimental Results: L07. Constant size 2nd and 3rd connected components [McGlohon et al. `08]. [Panels: Blognet, RTG; axes: time, size]

45 Experimental Results 1: L08. Principal Eigenvalue Power Law (λ_1 PL) [Akoglu et al. `08]. [Panels: Blognet, RTG; axes: #edges, λ_1]

46 Experimental Results 1: L09. Bursty/self-similar edge/weight additions [Gomez and Santonja `98, Gribble et al. `98, Crovella and Bestavros `99, McGlohon et al. `08]. [Panels: Blognet, RTG; axes: resolution, entropy]

47 Graph Properties

48 Experimental Results 2. [Panels: Com2Cand, RTG; axes: time, diameter, size]

49 Experimental Results 2. [Panels: Com2Cand, RTG; axes: #edges, λ_1; rank, λ_rank]

50 Experimental Results 2. [Panels: Com2Cand, RTG; axes: in-degree, count; resolution, entropy]

51 Experimental Results 2: L10. Snapshot Power Law (SPL) [McGlohon et al. `08]. [Panels: Com2Cand, RTG; axes: in-degree (#checks), in-weight ($ amount)]

52 Experimental Results 2: L11. Weight Power Law (WPL) [McGlohon et al. `08]. [Panels: Com2Cand, RTG; axes: #edges, total weight]

53 Graph Properties

54 Experimental Results: On “modularity” [Girvan and Newman `02]. RTG-IE: no significant modularity. “Modularity” decreases with increasing β (smaller β gives more community structure).

55 Graph Properties

56 Experimental Results: On complexity. Computation time grows linearly with increasing W: 2M multi-edges in 7 seconds. [Plot axes: #multi-edges, time (ms)]

57 Outline: Motivation; Problem Definition; Related Work; A Little History; Proposed Model; Experimental Results; Conclusion.

58 Conclusion 1: Our model is: G1. simple and intuitive: few lines of code. G2. realistic: generates graphs that obey all eleven properties found in real graphs. G3. parsimonious: only a handful of parameters. G4. flexible: can generate weighted/unweighted, directed/undirected, unipartite/bipartite graphs and any combination of those. G5. fast: linear in the size of the output graph.

59 Conclusion 2: We showed that RTG mimics real graphs well.

60 Contact: Leman Akoglu, www.cs.cmu.edu/~lakoglu, lakoglu@cs.cmu.edu; Christos Faloutsos, www.cs.cmu.edu/~christos, christos@cs.cmu.edu

61 A Little History - 3: The infinite monkey theorem: A monkey typing randomly on a keyboard for an infinite amount of time will almost surely type a given text, such as the complete works of William Shakespeare.

62 Proposed Model: Burstiness and Self-similarity. If each step is a time tick, weight additions are uniform! Instead, start with a uniform interval and recursively subdivide the weight additions into each half, quarter, and so on, according to the bias b > 0.5: a b-fraction of the additions happen in one “half” and the remainder in the other. [Plot axes: time, total weight]
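The recursive subdivision described on this slide can be sketched as follows. This is an illustrative reading only: the function name, the choice of `levels` (giving 2**levels time ticks), and the random choice of which half receives the b-fraction are assumptions.

```python
import random


def b_model(total, levels, b=0.7, seed=0):
    """Split `total` weight over 2**levels time ticks with bias b.

    At every level each interval is halved; one half (picked at random,
    an assumption) receives a b-fraction of the interval's weight and
    the other half receives the rest.
    """
    rng = random.Random(seed)
    series = [float(total)]
    for _ in range(levels):
        finer = []
        for weight in series:
            heavy, light = weight * b, weight * (1.0 - b)
            if rng.random() < 0.5:
                finer.extend((heavy, light))
            else:
                finer.extend((light, heavy))
        series = finer
    return series


bursty = b_model(total=10_000, levels=6, b=0.7)
uniform = b_model(total=10_000, levels=6, b=0.5)
```

With b = 0.5 every tick receives the same weight, which matches the uniform case the slide warns about; larger b concentrates the additions into bursts.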

63 Related Work: Graph Properties
Static, unweighted:
  L01. Power-law degree distribution [Faloutsos et al. `99, Kleinberg et al. `99, Chakrabarti et al. `04, Newman `04]
  L02. Triangle Power Law (TPL) [Tsourakakis `08]
  L03. Eigenvalue Power Law (EPL) [Siganos et al. `03]
  L04. Community structure [Flake et al. `02, Girvan and Newman `02]
Static, weighted:
  L10. Snapshot Power Law (SPL) [McGlohon et al. `08]
Dynamic, unweighted:
  L05. Densification Power Law (DPL) [Leskovec et al. `05]
  L06. Small and shrinking diameter [Albert and Barabási `99, Leskovec et al. `05]
  L07. Constant size 2nd and 3rd connected components [McGlohon et al. `08]
  L08. Principal Eigenvalue Power Law (λ_1 PL) [Akoglu et al. `08]
  L09. Bursty/self-similar edge/weight additions [Gomez and Santonja `98, Gribble et al. `98, Crovella and Bestavros `99, McGlohon et al. `08]
Dynamic, weighted:
  L11. Weight Power Law (WPL) [McGlohon et al. `08]

