Published by Joleen Sullivan. Modified over 9 years ago.
1
The Very Small World of the Well-Connected. Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert
2
Outline. VIGS: Vertex-Importance Graph Synopsis. Testing VIGS with different datasets and importance measures. Analytical expectations. Making guarantees about VIGS: connectedness (KeepOne, KeepAll). Related work: graph sampling, rich club, k-cores, web measure.
3
Network or Hairball? Huge networks are difficult to study, store, and share. Can we shrink or summarize a network? Starting point: the important vertices. Vertex-Importance Graph Synopsis.
4
Vertex-Importance Graph Synopsis. Create a subgraph of the important vertices to study both the key nodes and the entire graph. Which vertices are important? High-traffic routers? The most quoted blog? Use standard, well-defined measures: degree, betweenness, closeness, PageRank.
5
VIGS In Action. Starting point: a random graph with 100 vertices. Select an importance measure (here, degree), pick the 9 highest-degree vertices, and keep only the edges between these 9 vertices. Average degree = 4 in the original graph; average degree = 0.9 in the subgraph.
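The steps on this slide can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: the function names (`random_graph`, `vigs_by_degree`, `average_degree`), the seed, and the tie-breaking rule (lower vertex id first) are assumptions made for the example.

```python
import random
from itertools import combinations

def random_graph(n, avg_degree, seed=0):
    """Erdos-Renyi G(n, p) as an adjacency dict; p is set so the expected degree is avg_degree."""
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def vigs_by_degree(adj, k):
    """Keep the k highest-degree vertices and only the edges among them."""
    # Assumption: ties in degree are broken by the smaller vertex id.
    top = sorted(adj, key=lambda v: (-len(adj[v]), v))[:k]
    keep = set(top)
    return {v: adj[v] & keep for v in top}

def average_degree(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

graph = random_graph(100, 4)          # 100 vertices, expected degree 4
synopsis = vigs_by_degree(graph, 9)   # 9 most connected vertices
print(average_degree(graph), average_degree(synopsis))
```

Any other importance measure can be swapped in by changing the sort key; the induced-subgraph step stays the same.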
6
Motivating example: citations among ACM papers. Two panels: 500 random papers; 500 most cited papers.
7
Datasets: an Erdos-Renyi random graph and three real networks. BuddyZoo: a collection of buddy lists. TREC: links between blogs. Web: an older web crawl from PARC.
          Erdos-Renyi  BuddyZoo  TREC     Web
Vertices  10,000       135,131   29,690   152,171
Edges     49,935       803,200   195,940  1,686,541
ASP       4.26         5.96      3.72     3.48
Directed  false true
8
Importance measures. Degree (number of connections): denoted by size. Betweenness (number of shortest paths a vertex lies on): denoted by color.
9
Importance measures. Degree (number of connections): denoted by size. Closeness (based on the length of the shortest paths to all other vertices): denoted by color.
10
Correlation among measures. High correlation between different importance measures. Undirected graphs: higher correlation. Closeness has the lowest correlation in all datasets.
11
Correlation among measures. High correlation between different importance measures. Undirected graphs: higher correlation. Closeness has the lowest correlation in all datasets.
12
Assortativity. In an assortative graph, high-value nodes tend to connect to other high-value nodes. Example: degree. Two panels: assortative; disassortative.
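One common way to quantify degree assortativity is the Pearson correlation between the degrees found at the two ends of each edge. The sketch below assumes an undirected graph given as an edge list; the slides do not say which formula was used, so treat the function `degree_assortativity` as an illustrative assumption.

```python
def degree_assortativity(edges):
    """Pearson correlation of endpoint degrees over all edges (undirected)."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    mean = sum(xs) / len(xs)          # xs and ys share the same mean and variance
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0                    # regular graph: coefficient undefined, 0.0 chosen here
    return cov / var

star = [(0, 1), (0, 2), (0, 3)]                   # hub linked only to leaves
like_with_like = [(0, 1), (1, 2), (0, 2), (3, 4)] # a triangle plus a separate edge
print(degree_assortativity(star), degree_assortativity(like_with_like))
```

The star is the extreme disassortative case (the high-degree hub links only to degree-1 leaves), while the second graph only joins vertices of equal degree.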
13
Assortativity - Degree. ER: neutral. BZ: assortative. TREC and Web: disassortative.
14
Assortativity
15
Degree distributions
16
Subgraphs. Apply VIGS! Example: select degree, top 100 nodes. Substantial difference between datasets!
17
Subgraphs The selection of an importance measure may have an impact, even in the same dataset
18
Connectivity: size of largest component Proportion of nodes that are connected either directly or indirectly
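A minimal sketch of this connectivity measure: the fraction of vertices that fall in the largest connected component, found by breadth-first search. It assumes an undirected adjacency dict; for the directed datasets one would first have to decide between weak and strong connectivity, which the slide does not specify.

```python
from collections import deque

def largest_component_fraction(adj):
    """Share of vertices that belong to the largest connected component."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best / len(adj) if adj else 0.0

# Components: {0, 1, 2}, {3, 4}, and the isolated vertex 5.
example = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
print(largest_component_fraction(example))  # 0.5
```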
19
Subgraph Connectivity - ER. Highly connected, even with only a few vertices. For all importance measures, the subgraph is almost completely connected by 2000 nodes. Better connectivity than a random selection of vertices.
20
Subgraph Connectivity
21
Subgraphs: density. What is the proportion of edges to nodes in the original graphs vs. the subgraphs? Example: average degree = 4 in the original graph; average degree = 0.9 in the subgraph.
22
Subgraph Density - ER. Black line: slope = edges/vertices in the entire network. Lower dotted line: subgraph of random vertices. VIGS subgraphs: lower than the total density, higher than the random-subgraph density.
23
Subgraph Density
24
Average Shortest Path (ASP)
25
Subgraph Average Shortest Path (ASP) for Erdos-Renyi. Curves: whole-network ASP; ASP between important vertices (IVs) in the subgraph; ASP between IVs in the whole graph. ER: ASP is shorter between IVs, but higher in the subgraph.
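The two IV curves differ only in which graph the paths are allowed to use. The sketch below computes both for a toy undirected graph; the helper names and the choice to skip unreachable pairs are assumptions for illustration, not details taken from the slides.

```python
from collections import deque

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def average_shortest_path(adj, vertices):
    """Mean distance over pairs of `vertices`; pairs with no path in `adj` are skipped."""
    vertices = list(vertices)
    total = pairs = 0
    for i, u in enumerate(vertices):
        dist = bfs_distances(adj, u)
        for v in vertices[i + 1:]:
            if v in dist:
                total += dist[v]
                pairs += 1
    return total / pairs if pairs else float("inf")

def induced_subgraph(adj, keep):
    keep = set(keep)
    return {v: adj[v] & keep for v in keep}

# Path 0-1-2-3 of important vertices, plus a shortcut 0-4-3 through unimportant vertex 4.
whole = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {0, 3}}
important = [0, 1, 2, 3]
asp_whole = average_shortest_path(whole, important)
asp_sub = average_shortest_path(induced_subgraph(whole, important), important)
print(asp_whole, asp_sub)
```

Dropping vertex 4 removes the shortcut between 0 and 3, so the same pairs are farther apart inside the subgraph than in the whole graph.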
26
Subgraph ASPs
27
Relative Rank of Vertices in Subgraph - ER. Do IVs maintain their relative rank in subgraphs that contain the IVs and their edges only? ER: little correlation, steadily increasing until all vertices are included.
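One way to test whether important vertices keep their relative rank is to correlate each vertex's importance in the full graph with its importance inside the subgraph. The sketch below uses degree as the measure and Spearman rank correlation with average ranks for ties; both choices are assumptions, since the slides do not name the correlation statistic.

```python
def average_ranks(values):
    """Ranks 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(xs, ys):
    # Undefined (division by zero) if either list is constant.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    return pearson(average_ranks(xs), average_ranks(ys))

def degree_rank_preservation(adj, keep):
    """Spearman correlation between full-graph degree and subgraph degree."""
    keep = sorted(keep)
    keep_set = set(keep)
    full = [len(adj[v]) for v in keep]
    sub = [len(adj[v] & keep_set) for v in keep]
    return spearman(full, sub)

# Vertex 0 owes most of its degree to leaves 3 and 4, which the subgraph drops.
toy = {0: {1, 3, 4}, 1: {0, 2}, 2: {1}, 3: {0}, 4: {0}}
print(degree_rank_preservation(toy, [0, 1, 2]))
```

In the toy graph the top-ranked vertex loses its lead once its leaf neighbors are removed, so the correlation drops to zero.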
28
Relative Rank in Subgraph
29
TREC anomaly - closeness
30
Four Regions. Four regions, highlighted in the density plot. Two panels: original; closeness only, with regions highlighted.
31
Cause: Blog Aggregator. One node has connections to 99% of the nodes between 1 and 7961 (regions 1, 2, 3)! This same node has only 1 connection to a node beyond 7961 (region 4). Nodes between 5828 and 7961 (region 3) have only 1 connection: to the aggregator. Spam blogs? New blogs? Private blogs?
32
Examining Density. The first 3 regions feature nodes connected to the aggregator. R1: well-connected blogs; average increase in total edges per node added: 12.93. R2: far less connected, but not quite barren; average increase per node: 3.2. R3: isolated spam/new blogs; 1 edge per node added.
33
Examining Density. R4: well connected, but not linked to the aggregator; average increase even higher than in region 1: 17.8. The aggregator inflated the closeness scores of the nodes connected to it (R1, R2, R3) above those in region 4.
34
Examining Avg Shortest Paths (ASP). R1: ASP slightly below 2; some nodes are directly connected, and 99%+ are within 2 hops via the aggregator. R2 and R3: ASP levels off at ~2; fewer and fewer direct links, but all nodes are accessible via the aggregator. R4: ASPs begin to increase, but ASP doesn't explode: ~70% of R4 links are to R1 or R2 nodes. R3 is only reachable from R4 via the aggregator, and R4 reaches the aggregator through connected R1/R2 nodes, which adds a hop to the path.
35
Examining Relative Ranking Correlation. R1-3: correlation steadily decreases. R4: rapid increase in correlation! The importance of spam blogs in the subgraph is initially inflated. It realigns when the blogs in region 4 connect with the real blogs in regions 1-2.
36
Localized to closeness. Region 1, 2 and 3 nodes have high closeness thanks to the aggregator. Recall the ASP graph: short distance to many, many nodes via the aggregator. A connection to the aggregator doesn't confer high degree, PageRank or betweenness; nodes must 'fend for themselves'. Degree: a link to the aggregator is just 1 link. PageRank: the aggregator's 'vote' is diluted by its high degree. Betweenness: the aggregator is the gateway to its children, and any child could be used to reach the aggregator.
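The effect can be reproduced on a toy graph. The sketch below is an invented example, not the TREC data: an aggregator `agg` links to two real blogs (`r0`, `r1`) and six single-link blogs (`s0`..`s5`), while a well-connected triangle (`q0`, `q1`, `q2`) reaches the rest only through `r0`. Closeness is taken here as (n - 1) divided by the sum of distances, which is one standard definition and an assumption about what the authors computed.

```python
from collections import deque

def build_graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def closeness(adj, source):
    """(reachable vertices) / (sum of distances); assumes source has at least one neighbor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return (len(dist) - 1) / sum(dist.values())

edges = [("agg", "r0"), ("agg", "r1"), ("r0", "r1")]
edges += [("agg", f"s{i}") for i in range(6)]                   # single-link blogs
edges += [("q0", "q1"), ("q1", "q2"), ("q0", "q2"), ("q0", "r0")]
toy = build_graph(edges)

for name in ("s0", "q1"):
    print(name, "degree:", len(toy[name]), "closeness:", round(closeness(toy, name), 3))
```

The single-link blog `s0` ends up with higher closeness than `q1` even though `q1` has the higher degree, because every path from `s0` gets a short route through the aggregator.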
37
Empirical Analysis Summary. VIGS results vary by graph and importance measure. Still, subgraphs tended towards: –High connectivity –Average or higher density –Shorter ASPs –Maintaining the relative importance rank of vertices –“Spam” affects closeness primarily
38
Preserving Properties. So far, just studying subgraphs. Applying VIGS: may need guarantees. Hard to make a guarantee? Example property: the subgraph is connected.
39
Is it difficult to guarantee the connectedness of a VIGS subgraph? NP-complete: connecting the important vertices with as few extra vertices/edges as possible corresponds to the Steiner minimum tree problem. Resort to heuristics: KeepOne and KeepAll from Gilbert and Levchenko (2004).
40
KeepOne and KeepAll. KeepOne: build a minimum spanning tree (MST), dropping as many vertices/edges as possible while maintaining connectivity. Problem: ASP/diameter could increase. Solution: KeepAll, which builds the MST but adds all vertices/edges on a shortest path.
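The sketch below is one possible reading of the two heuristics, not the algorithm as published by Gilbert and Levchenko (2004). It assumes an undirected, unweighted graph, grows a spanning tree over the important vertices using hop distances in the full graph as weights, and then keeps either one shortest path per tree link (KeepOne-style) or every vertex lying on any shortest path for that link (KeepAll-style). The function `connect_important` and its `keep_all` flag are hypothetical names.

```python
from collections import deque

def bfs(adj, source):
    dist, parent = {source: 0}, {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                parent[w] = v
                queue.append(w)
    return dist, parent

def connect_important(adj, important, keep_all=False):
    """Return a vertex set containing `important` (non-empty) that is connected in `adj`."""
    important = list(important)
    search = {v: bfs(adj, v) for v in important}
    in_tree = [important[0]]
    kept = set(important)
    while len(in_tree) < len(important):
        # Prim-style step: closest important vertex not yet attached to the tree.
        candidates = [(search[u][0][v], u, v)
                      for u in in_tree for v in important
                      if v not in in_tree and v in search[u][0]]
        if not candidates:
            raise ValueError("important vertices are not connected in the full graph")
        d, u, v = min(candidates, key=lambda c: c[0])
        dist_u, parent_u = search[u]
        if keep_all:
            # Every vertex w with d(u, w) + d(w, v) == d(u, v) lies on a shortest path.
            dist_v = search[v][0]
            kept.update(w for w in dist_u if w in dist_v and dist_u[w] + dist_v[w] == d)
        else:
            # Walk one shortest path back from v to u.
            w = v
            while w is not None:
                kept.add(w)
                w = parent_u[w]
        in_tree.append(v)
    return kept

# Two shortest routes 0-1-3 and 0-2-3, plus a longer route 0-4-5-3.
demo = {0: {1, 2, 4}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2, 5}, 4: {0, 5}, 5: {3, 4}}
keep_one = connect_important(demo, [0, 3])
keep_all = connect_important(demo, [0, 3], keep_all=True)
print(sorted(keep_one), sorted(keep_all))
```

The KeepOne-style result adds a single intermediate vertex, while the KeepAll-style result adds both vertices that sit on a shortest path, which mirrors the trade-off between few added vertices and preserved path structure.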
41
Heuristic Performance - ER. KO (KeepOne): did not have to add many vertices, but the shortest paths are rather long (the ER ASP was 4.26). KA (KeepAll): good improvement in path length, but a huge increase in vertices.
42
Heuristic Performance - BZ. Similar performance to ER: KO results in significantly longer shortest paths, but KA adds many vertices. Is 4000 too many vertices to add? Small compared to the total graph, but huge compared to the number of important vertices.
43
Heuristic Performance - TREC. Almost completely connected from the start. KA adds only a few vertices and doesn't change much. Results for the Web dataset are similar.
44
Related Work. Graph sampling: similar objective (a synopsis), but concerned only with the original graph. Random sampling, snowball sampling… Lee, Kim, Jeong (2006); Leskovec, Faloutsos (2006); Li, Church, Hastie (2006). Rich-club: concerned only with high-degree nodes. Zhou, Mondragon (2004); Colizza, Flammini, Serrano, Vespignani (2006).
45
Related Work. K-cores: subgraphs where each vertex has at least k connections within the subgraph. Dorogovtsev, Goltsev, Mendes (2006). Core connectivity: the smallest number of important vertices to remove before destroying the largest component. Mislove, Marcon, Gummadi, Druschel, Bhattacharjee (2007).
46
VIGS wrap-up. Vertex-importance graph synopsis: create a subgraph of important vertices to study both the full graph and these vertices in particular. Properties of VIGS depend on the entire network and the importance measure. Real-world networks have dense, closely knit VIGS. In some cases it is easy to meet connectivity and ASP guarantees.
47
Thanks to Xiaolin Shi, Matthew Bonner, Lada Adamic. NSF DMS 0547744