Published by Alexandra Kennedy. Modified over 9 years ago.
1
Weighted Graphs and Disconnected Components: Patterns and a Generator
Mary McGlohon, Leman Akoglu, Christos Faloutsos
Carnegie Mellon University, School of Computer Science
2
2 McGlohon, Akoglu, Faloutsos KDD08
3
“Disconnected” components
● In graphs, a largest connected component emerges.
● What about the smaller-size components?
● How do they emerge, and join with the large one?
4
Weighted edges
● Graphs have heavy-tailed degree distributions.
● What can we also say about these edges?
● How are they repeated, or otherwise weighted?
5
Our goals
● Observe “next-largest connected components” (NLCCs)
– Q1: How does the giant connected component (GCC) emerge?
– Q2: How do NLCCs emerge and join with the GCC?
● Find properties that govern edge weights
– Q3: How does the total weight of the graph relate to the number of edges?
– Q4: How do the weights of nodes relate to degree?
– Q5: Does this relation change over time?
● Q6: Can we produce an emergent, generative model?
6
Outline
● Motivation
● Related work
● Preliminaries
● Data
● Observations
● Model
● Summary
7
Properties of networks
● Small diameter (“small world” phenomenon)
– [Milgram 67] [Leskovec, Horvitz 07]
● Heavy-tailed degree distribution
– [Barabasi, Albert 99] [Faloutsos, Faloutsos, Faloutsos 99]
● Densification
– [Leskovec, Kleinberg, Faloutsos 05]
● “Middle region” components as well as GCC and singletons
– [Kumar, Novak, Tomkins 06]
8
Generative Models
● Erdos-Renyi model [Erdos, Renyi 60]
● Preferential Attachment [Barabasi, Albert 99]
● Forest Fire model [Leskovec, Kleinberg, Faloutsos 05]
● Kronecker multiplication [Leskovec, Chakrabarti, Kleinberg, Faloutsos 07]
● Edge Copying model [Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins, Upfal 00]
● “Winners don’t take all” [Pennock, Flake, Lawrence, Glover, Giles 02]
12
Diameter
● The diameter of a graph is the “longest shortest path”: the largest shortest-path distance between any two connected nodes.
● The effective diameter is the distance within which 90% of connected node pairs can reach each other.
[Figure: example graph on nodes n1 to n7, diameter = 3]
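The two definitions above can be sketched as follows. This is a minimal illustration, assuming an undirected, unweighted graph stored as an adjacency dict; the function names and the toy path graph are hypothetical and not from the slides. It runs a breadth-first search from every node, so it only suits small graphs, and the slides do not say how diameters were computed on the large datasets.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameters(adj, fraction=0.9):
    """Return (diameter, effective diameter) over connected node pairs.

    Assumption: the effective diameter is the smallest distance d such that
    at least `fraction` of connected ordered pairs are within d hops.
    """
    pair_dists = []
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                pair_dists.append(d)
    if not pair_dists:
        return 0, 0
    pair_dists.sort()
    k = math.ceil(fraction * len(pair_dists))
    return pair_dists[-1], pair_dists[k - 1]


if __name__ == "__main__":
    # Hypothetical toy graph: a path 0 - 1 - 2 - 3.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(diameters(path))
```

On real graphs the effective diameter is usually estimated by sampling source nodes rather than by exhaustive search.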
16
Unipartite Networks
● Postnet: Posts in blogs, hyperlinks between them
● Blognet: Aggregated Postnet, repeated edges
● Patent: Patent citations
● NIPS: Academic citations
● Arxiv: Academic citations
● NetTraffic: Packets, repeated edges
● Autonomous Systems (AS): Packets, repeated edges
[Figure: example unipartite graph on nodes n1 to n7 with edge weights]
17
Unipartite Networks
● (Nodes, Edges, Timestamps)
● Postnet: 250K, 218K, 80 days
● Blognet: 60K, 125K, 80 days
● Patent: 4M, 8M, 17 yrs
● NIPS: 2K, 3K, 13 yrs
● Arxiv: 30K, 60K, 13 yrs
● NetTraffic: 21K, 3M, 52 mo
● AS: 12K, 38K, 6 mo
20
Bipartite Networks
● IMDB: Actor-movie network
● Netflix: User-movie ratings
● DBLP: repeated edges
– Author-Keyword
– Keyword-Conference
– Author-Conference
● US Election Donations: $ weights, repeated edges
– Orgs-Candidates
– Individuals-Orgs
[Figure: example bipartite graph on nodes n1 to n4 and m1 to m3 with edge weights]
21
Bipartite Networks
● (Nodes, Edges, Timestamps)
● IMDB: 757K, 2M, 114 yr
● Netflix: 125K, 14M, 72 mo
● DBLP: 25 yr
– Author-Keyword: 27K, 189K
– Keyword-Conference: 10K, 23K
– Author-Conference: 17K, 22K
● US Election Donations: 22 yr
– Orgs-Candidates: 23K, 877K
– Individuals-Orgs: 6M, 10M
23
Observation 1: Gelling Point
Q1: How does the GCC emerge?
24
Observation 1: Gelling Point
● Most real graphs display a gelling point, or burning-off period.
● The gelling point is marked by a spike in diameter; after it, graphs exhibit typical behavior.
[Figure: diameter vs. time for IMDB; gelling point at t=1914]
25
Observation 2: NLCC behavior
Q2: How do NLCCs emerge and join with the GCC? Do they continue to grow in size? Do they shrink? Stabilize?
26
Observation 2: NLCC behavior
● After the gelling point, the GCC takes off, but NLCCs remain constant or oscillate.
[Figure: connected-component size vs. time for IMDB]
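One way to track the GCC and the next-largest components as edges arrive is a union-find structure over the edge stream. The sketch below is illustrative only: the function name, the choice of reporting the three largest components, and the toy edge list are assumptions, not the authors' measurement code.

```python
def component_sizes_over_time(edges, top=3):
    """After each edge arrives, record the sizes of the `top` largest components.

    `edges` is an iterable of (u, v) pairs in arrival order. Missing
    components are reported as size 0.
    """
    parent = {}
    size = {}  # maps each component root to its component size

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    history = []
    for u, v in edges:
        for node in (u, v):
            if node not in parent:
                parent[node] = node
                size[node] = 1
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size.pop(rv)
        # Sorting on every step is simple but slow; fine for a small sketch.
        largest = sorted(size.values(), reverse=True)[:top]
        history.append(tuple(largest + [0] * (top - len(largest))))
    return history


if __name__ == "__main__":
    # Hypothetical stream: three pairs form, then two of them merge.
    stream = [(1, 2), (3, 4), (5, 6), (2, 3)]
    for step, sizes in enumerate(component_sizes_over_time(stream), start=1):
        print(step, sizes)
```

The first entry of each tuple plays the role of the GCC size and the remaining entries the NLCC sizes.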
28
Observation 3
Q3: How does the total weight of the graph relate to the number of edges?
29
Observation 3: Fortification Effect
● Is the total $ amount proportional to the number of checks?
[Figure: total $ vs. number of checks, Orgs-Candidates, 1980 to 2004]
30
Observation 3: Fortification Effect
● Weight additions follow a power law with respect to the number of edges: W(t) = E(t)^w
– W(t): total weight of graph at time t
– E(t): total edges of graph at time t
– w: power-law (PL) exponent
– 1.01 < w < 1.5: super-linear!
– (more checks, even more $)
[Figure: total $ vs. number of checks, Orgs-Candidates, 1980 to 2004]
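A power-law exponent of this kind is the slope of a straight line on log-log axes, so it can be estimated with an ordinary least-squares fit of log W(t) against log E(t). The sketch below is a hedged illustration: the record layout, the function names, and the choice to count E(t) as the number of distinct (source, target) pairs are assumptions, not details given on the slides.

```python
import math


def cumulative_edges_and_weight(records):
    """Return [(t, E(t), W(t))] for records sorted by timestamp.

    Each record is (timestamp, source, target, weight). Assumption: E(t)
    counts distinct (source, target) pairs and W(t) sums all weights so far.
    """
    seen = set()
    total_weight = 0.0
    series = []
    for t, u, v, w in records:
        seen.add((u, v))
        total_weight += w
        series.append((t, len(seen), total_weight))
    return series


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); inputs must be positive."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic check: data generated with exponent 1.2 should recover it.
    edges = [10, 100, 1000, 10000]
    weights = [e ** 1.2 for e in edges]
    print(loglog_slope(edges, weights))
```

A fitted slope above 1 corresponds to the super-linear case on the slide: each doubling of the edge count more than doubles the total weight.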
31
Observations 4 and 5
Q4: How do the weights of nodes relate to degree?
Q5: Does this relation change over time?
32
Observation 4: Snapshot Power Law
● At any time, the total incoming weight of a node follows a power law in its in-degree, with PL exponent iw.
– 1.01 < iw < 1.26: super-linear
● More donors, even more $
– e.g. John Kerry: $10M received from 1K donors
[Figure: in-weights ($) vs. edges (# donors), Orgs-Candidates]
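For a single snapshot, the exponent iw can be estimated by aggregating each node's in-degree and in-weight and fitting a line in log-log space. This is a sketch under stated assumptions: the edge layout (source, target, weight), the function names, and counting in-degree as the number of distinct sources are illustrative choices, not taken from the slides.

```python
import math
from collections import defaultdict


def in_degree_and_weight(edges):
    """Map each target node to (in_degree, in_weight).

    `edges` holds (source, target, weight) triples. Assumption: in-degree is
    the number of distinct sources, so repeated edges add weight only.
    """
    sources = defaultdict(set)
    weight = defaultdict(float)
    for u, v, w in edges:
        sources[v].add(u)
        weight[v] += w
    return {v: (len(sources[v]), weight[v]) for v in sources}


def snapshot_exponent(edges):
    """Least-squares slope of log(in-weight) against log(in-degree)."""
    stats = in_degree_and_weight(edges)
    points = [(math.log(d), math.log(w)) for d, w in stats.values() if w > 0]
    if not points:
        raise ValueError("need nodes with positive in-weight")
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct in-degrees")
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic snapshot: a node with k donors receives k**1.2 in total.
    edges = [(f"donor{i}", f"cand{k}", k ** 0.2)
             for k in (1, 2, 4, 8) for i in range(k)]
    print(snapshot_exponent(edges))
```

Repeating the fit on successive snapshots gives a series of exponents that can be checked for stability over time.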
33
Observation 5: Snapshot Power Law
● For a given graph, this exponent is constant over time.
[Figure: exponent vs. time, Orgs-Candidates]
34
Outline
● Motivation
● Related work
● Preliminaries
● Data
● Observations
● Q6: Is there a generative, “emergent” model?
● Summary
36
Goals of model
● a) Emergent, intuitive behavior
● b) Shrinking diameter
● c) Constant NLCCs
● d) Densification power law
● e) Power-law degree distribution
= “Butterfly” Model
37
Butterfly model in action
● A node joins the network with its own parameter, p_step (“curiosity”).
● With (global) probability p_host (“cross-disciplinarity”), it chooses a random host.
– With (global) probability p_link (“friendliness”), it creates a link.
– With probability p_step, it travels to a random neighbor. Repeat.
[Figure: example graph on nodes n1 to n8, with the steps labeled p_host, p_link, p_step]
43
Butterfly model in action
● Once there are no more “steps”, repeat the “host” procedure:
– With probability p_host, choose a new host, possibly link, etc.
– Continue until there are no more steps and no more hosts.
[Figure: example graph on nodes n1 to n8, with the steps labeled p_host, p_link, p_step]
47
a) Emergent, intuitive behavior
Novelties of model:
● Nodes link with probability
– May choose host, but not link (start new component)
● Incoming nodes are “social butterflies”
– May have several hosts (merges components)
● Some nodes are friendlier than others
– p_step different for each node
– This creates power-law degree distribution (theorem)
48
Validation of Butterfly
● Chose the following parameters:
– p_host = 0.3
– p_link = 0.5
– p_step ~ U(0,1)
● Ran 10 simulations
● 100,000 nodes per simulation
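The generator described on the preceding slides, with the parameter values listed above as defaults, can be sketched roughly as follows. This is a reading of the slide text rather than the authors' implementation. Several details are assumptions: a link is attempted at every node the walk visits, the walk ignores the newcomer itself when picking a neighbor, hosts are drawn uniformly from existing nodes, and the function name and seed handling are hypothetical.

```python
import random


def butterfly(n_nodes, p_host=0.3, p_link=0.5, seed=0):
    """Grow an undirected graph node by node; returns {node: set(neighbors)}."""
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n_nodes):
        p_step = rng.random()  # per-node parameter, drawn from U(0, 1)
        adj[new] = set()
        # "Host" procedure: keep picking hosts while the p_host coin succeeds.
        while rng.random() < p_host:
            current = rng.randrange(new)  # uniform over existing nodes
            while True:
                # Assumption: try to link to every node the walk visits.
                if current not in adj[new] and rng.random() < p_link:
                    adj[new].add(current)
                    adj[current].add(new)
                # Step to a random neighbor with probability p_step.
                neighbors = sorted(adj[current] - {new})
                if not neighbors or rng.random() >= p_step:
                    break
                current = rng.choice(neighbors)
    return adj


if __name__ == "__main__":
    graph = butterfly(1000, seed=1)
    isolated = sum(1 for nbrs in graph.values() if not nbrs)
    edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    print(len(graph), "nodes,", edges, "edges,", isolated, "isolated nodes")
```

Because a node may pick a host without linking, or pick no host at all, new components keep appearing; a node that links from several hosts can merge components.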
49
b) Shrinking diameter
● Shrinking diameter
– In the model, gelling usually occurred around N=20,000
[Figure: diameter vs. number of nodes; N=20,000 marked]
50
c) Oscillating NLCCs
● Constant / oscillating NLCCs
[Figure: NLCC size vs. number of nodes; N=20,000 marked]
51
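d) Densification power law
● Densification: the number of edges grows as a power of the number of nodes, with exponent a.
– Our datasets had a in the range (1.03, 1.7)
– In [Leskovec+05-KDD], a was in the range (1.1, 1.7)
– Simulation produced a in the range (1.1, 1.2)
[Figure: edges vs. nodes; N=20,000 marked]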
52
e) Power-law degree distribution
● Power-law degree distribution
– Exponents approx. -2
[Figure: count vs. degree]
53
Summary
● Studied several diverse public graphs
– Measured at many timestamps
– Unipartite and bipartite
– Blogs, citations, real-world, network traffic
– Largest was 6 million nodes, 10 million edges
54
Summary
● Observations on unweighted graphs:
– A1: The GCC emerges at the “gelling point”
– A2: NLCCs are of constant / oscillating size
● Observations on weighted graphs:
– A3: Total weight increases super-linearly with edges
– A4: Nodes’ weights increase super-linearly with degree, with power-law exponent iw
– A5: iw remains constant over time
● A6: Intuitive, emergent generative “butterfly” model that matches these properties
55
References
[Barabasi+99] Barabasi, A. L. & Albert, R. (1999), 'Emergence of scaling in random networks', Science 286(5439), 509-512.
[Erdos+60] Erdos, P. & Renyi, A. (1960), 'On the evolution of random graphs', Publ. Math. Inst. Hungary. Acad. Sci. 5, 17-61.
[Faloutsos+99] Faloutsos, M.; Faloutsos, P. & Faloutsos, C. (1999), 'On Power-law Relationships of the Internet Topology', SIGCOMM, 251-262.
[Kumar+99] Kumar, R.; Raghavan, P.; Rajagopalan, S.; Sivakumar, D.; Tomkins, A. & Upfal, E. (2000), 'Stochastic models for the Web graph', Proceedings of the 41st FOCS, pp. 57-65.
[Kumar+06] Kumar, R.; Novak, J. & Tomkins, A. (2006), 'Structure and evolution of online social networks', in 'KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining', pp. 611-617.
[Leskovec+05-KDD] Leskovec, J.; Kleinberg, J. & Faloutsos, C. (2005), 'Graphs over time: densification laws, shrinking diameters and possible explanations', in 'KDD '05'.
[Leskovec+07] Leskovec, J. & Faloutsos, C. (2007), 'Scalable modeling of real graphs using Kronecker Multiplication', ICML.
[Milgram+67] Milgram, S. (1967), 'The small-world problem', Psychology Today 2, 60-67.
[Pennock+02] Pennock, Flake, Lawrence, Glover & Giles (2002), 'Winners don’t take all: Characterizing the competition for links on the web', PNAS.
[Wang+2002] Wang, M.; Madhyastha, T.; Chang, N. H.; Papadimitriou, S. & Faloutsos, C. (2002), 'Data Mining Meets Performance Evaluation: Fast Algorithms for Modeling Bursty Traffic', ICDE.
56
Contact us
Leman Akoglu: www.andrew.cmu.edu/~lakoglu, lakoglu@cs.cmu.edu
Christos Faloutsos: www.cs.cmu.edu/~christos, christos@cs.cmu.edu
Mary McGlohon: www.cs.cmu.edu/~mmcgloho, mmcgloho@cs.cmu.edu
57
Entropy plots [Wang+2002]
● From time series data, begin with resolution r = T/2.
● Record entropy H_R.
● Recursively take finer resolutions.
[Figure: weights vs. time; entropy vs. resolution]
61
Entropy Plots
● Self-similarity gives a linear entropy plot.
● Uniform: slope of plot s = 1. Point mass: s = 0.
● Bursty: 0.2 < s < 0.9
[Figure: entropy vs. resolution, with slope s = 0.59]
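The entropy-plot procedure can be sketched as follows: split the time axis into 2, 4, 8, ... equal intervals, compute the entropy of the weight distribution across intervals at each resolution, and fit the slope. This is an illustrative reading of the slides, assuming entropy in bits and a series whose length is divisible by the finest number of intervals; the function names are hypothetical.

```python
import math


def entropy_plot(series, levels):
    """Return [(level, entropy_in_bits)] for level = 1..levels.

    At level k the series is split into 2**k equal-width intervals, so
    level 1 corresponds to interval length T/2.
    """
    n = len(series)
    if n % (2 ** levels) != 0:
        raise ValueError("series length must be divisible by 2**levels")
    total = sum(series)
    if total <= 0:
        raise ValueError("series must have positive total weight")
    points = []
    for level in range(1, levels + 1):
        bins = 2 ** level
        width = n // bins
        entropy = 0.0
        for b in range(bins):
            p = sum(series[b * width:(b + 1) * width]) / total
            if p > 0:
                entropy -= p * math.log2(p)
        points.append((level, entropy))
    return points


def plot_slope(points):
    """Least-squares slope of entropy against level."""
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx


if __name__ == "__main__":
    uniform = [1.0] * 16
    point_mass = [0.0] * 16
    point_mass[5] = 3.0
    print(plot_slope(entropy_plot(uniform, 4)))
    print(plot_slope(entropy_plot(point_mass, 4)))
```

The two synthetic series correspond to the uniform and point-mass cases on the slide; bursty series fall in between.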