The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert.


Outline  VIGS: Vertex-Importance Graph Synopsis  Testing VIGS with different datasets and importance measures  Analytical expectations  Making guarantees about VIGS  Connectedness: KeepOne, KeepAll  Related Work  Graph Sampling, Rich Club, K-cores, Web Measure

Network or Hairball?  Huge networks are difficult to study, store, share...  Can we shrink or summarize a network?  Starting point: important vertices  Vertex-Importance Graph Synopsis

Vertex-Importance Graph Synopsis  Create subgraph of important vertices  Study both key nodes and entire graph  Which vertices are important?  High-traffic routers? The most quoted blog?  Standard, well-defined measures  Degree, Betweenness, Closeness, PageRank

VIGS In Action  Starting point: random graph with 100 vertices (average degree = 4)  Select an importance measure: degree  Pick the 9 highest-degree vertices  Keep only edges between these 9 vertices (average degree = 0.9)
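The selection step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `vigs_by_degree` and `average_degree` and the toy graph are hypothetical, and the graph is assumed to be undirected and stored as adjacency sets.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def vigs_by_degree(adj: Graph, k: int) -> Graph:
    """Return the subgraph induced by the k highest-degree vertices."""
    # Rank by degree; ties are broken by repr() so the result is deterministic.
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), repr(v)))
    keep = set(ranked[:k])
    # Keep only edges whose two endpoints are both important.
    return {v: adj[v] & keep for v in keep}


def average_degree(adj: Graph) -> float:
    """Mean number of neighbours per vertex (0.0 for an empty graph)."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj) if adj else 0.0


# Toy graph (assumption): a hub 0 joined to 1..4, plus the edge 1-2.
toy: Graph = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
synopsis = vigs_by_degree(toy, 3)
print(sorted(synopsis), average_degree(synopsis))
```

Any other importance measure could be swapped in by changing the sort key; the induced-subgraph step stays the same.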

Motivating example: citations among ACM papers  Panels: 500 random papers; 500 most cited papers

Datasets  Erdos-Renyi random graph and three real networks  BuddyZoo - collection of buddy lists  TREC - links between blogs  Web - an older web crawl from PARC

            Erdos-Renyi   BuddyZoo   TREC      Web
Vertices    10,000        135,131    29,690    152,171
Edges       49,935        803,200    195,940   1,686,541
ASP
Directed    false true

Importance measures  degree (number of connections) denoted by size  betweenness (number of shortest paths a vertex lies on) denoted by color

Importance measures  degree (number of connections) denoted by size  closeness (length of shortest path to all others) denoted by color

Correlation among measures  High correlation between different importance measures  Undirected graphs: higher correlation  Closeness has the lowest correlation in all datasets

Assortativity  In an assortative graph, high-value nodes tend to connect to other high-value nodes  Example: degree assortativedisassortative

Assortativity - Degree  ER: neutral  BZ: assortative  TREC and Web: disassortative

Assortativity

Degree distributions

Subgraphs  Apply VIGS! Select Degree, top 100 nodes  Example: degree  Substantial difference between datasets!

Subgraphs  The selection of an importance measure may have an impact, even in the same dataset

Connectivity: size of largest component Proportion of nodes that are connected either directly or indirectly
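The connectivity measure defined here can be sketched as follows. It assumes an undirected adjacency-set graph (a directed graph would first need to be symmetrised, which is an assumption about how weak connectivity would be handled); `largest_component_fraction` is a hypothetical name.

```python
from collections import deque


def largest_component_fraction(adj):
    """Fraction of vertices that lie in the largest connected component."""
    if not adj:
        return 0.0
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over one component, counting its vertices.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best / len(adj)


# Toy graph (assumption): a 3-vertex path, a separate edge, and an isolated vertex.
toy = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
fraction = largest_component_fraction(toy)
```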

Subgraph Connectivity - ER Highly connected, even with only a few vertices All importance measures almost completely connected by 2000 nodes Better performance than random

Subgraph Connectivity

Subgraphs: density  Original graph: average degree = 4; subgraph: average degree = 0.9  What is the proportion of edges to nodes in the original graphs vs. subgraphs?
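For an undirected graph the edges-to-nodes ratio is half the average degree, so average degrees of 4 and 0.9 correspond to 2 and 0.45 edges per vertex. A minimal sketch of the comparison, with hypothetical helper names and a toy graph chosen only for illustration:

```python
def edges_per_vertex(adj):
    """Undirected edge count divided by vertex count (half the average degree)."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is stored twice
    return m / n if n else 0.0


def induced(adj, keep):
    """Subgraph containing only the vertices in keep and the edges among them."""
    keep = set(keep)
    return {v: adj[v] & keep for v in keep}


# Toy graph (assumption): triangle 0-1-2 with a pendant path 2-3-4.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
whole = edges_per_vertex(toy)                         # 5 edges / 5 vertices
triangle = edges_per_vertex(induced(toy, {0, 1, 2}))  # 3 edges / 3 vertices
sparse = edges_per_vertex(induced(toy, {0, 3, 4}))    # 1 edge / 3 vertices
```

Which vertices are kept decides whether the subgraph is as dense as the original or much sparser.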

Subgraph Density - ER Black line slope = Edges/Vertices in entire network Lower dotted line = subgraph of random vertices VIGS subgraphs: lower than total density, higher than random subgraph density

Subgraph Density

Average Shortest Path ‘ASP’

Subgraph Average Shortest Path ('ASP') for Erdos-Renyi  Plotted curves: whole-network ASP; ASP between important vertices (IVs) in the subgraph; ASP between IVs in the whole graph  ER: ASP is shorter between IVs than in the whole network, but higher when measured inside the subgraph
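The two ways of measuring ASP between important vertices can be compared with a small breadth-first-search sketch. The function names and the toy graph are hypothetical; unreachable pairs are simply skipped, which is an assumption, since the slides do not say how disconnected pairs were treated.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def average_shortest_path(adj, nodes):
    """Mean distance over reachable ordered pairs of distinct vertices in nodes."""
    total = pairs = 0
    for s in nodes:
        dist = bfs_distances(adj, s)
        for t in nodes:
            if t != s and t in dist:
                total += dist[t]
                pairs += 1
    return total / pairs if pairs else None


def induced(adj, keep):
    """Subgraph containing only the vertices in keep and the edges among them."""
    keep = set(keep)
    return {v: adj[v] & keep for v in keep}


# Toy graph (assumption): path 0-1-2-3 of important vertices, plus an
# unimportant vertex 4 that links the two ends.
toy = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {0, 3}}
important = {0, 1, 2, 3}
asp_whole = average_shortest_path(toy, important)                    # paths may use 4
asp_sub = average_shortest_path(induced(toy, important), important)  # 4 removed
```

Dropping vertex 4 lengthens the 0-3 route from two hops to three, so the subgraph ASP comes out higher than the whole-graph ASP between the same vertices.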

Subgraph ASP’s

Relative Rank of Vertices in Subgraph - ER  Do IVs maintain their relative rank in subgraphs (IVs and their edges only)?  ER: little correlation, steadily increasing until all vertices are included

Relative Rank in Subgraph

TREC anomaly - closeness

Four Regions  Four regions, highlighted in density plot: Original Closeness only, Regions highlighted

Cause: Blog Aggregator  One node has connections to 99% of the nodes between 1 and 7961! (regions 1, 2, 3)  This same node has only 1 connection to a node beyond 7961 (region 4)  Nodes between 5828 and 7961 (region 3) have only 1 connection: to the aggregator  Spam blogs? New blogs? Private blogs?

Examining Density  The first 3 regions feature nodes connected to the aggregator  R1: well connected blogs  Average increase in total edges per node added:  R2: far less connected, but not quite barren  Average increase per node: 3.2  R3: isolated spam/new blogs  1 edge per node increase

Examining Density  R4: well connected, but not linked to aggregator  Average increase even higher than region 1: 17.8  Aggregator inflated the closeness scores of connected nodes (R1, 2, 3) above those in region 4

Examining Avg Shortest Paths (ASP)  R1: ASP slightly below 2  Some nodes directly connected, 99%+ within 2 hops via aggregator  R2 and R3: ASP levels off at ~2  Fewer and fewer direct links, but all accessible via aggregator  R4: ASPs begin to increase  ASP doesn't explode: ~70% of R4 links are to R1 or R2 nodes  R3 only reachable from R4 via aggregator  Access to aggregator through connected R1/R2 nodes: adds a hop to path

Examining Relative Ranking Correlation  R1-R3: correlation steadily decreases  R4: rapid increase in correlation!  Spam blogs' importance in subgraph initially inflated  Realigns when blogs in R4 connect with real blogs in R1-R2

Localized to closeness  Region 1, 2 and 3 nodes have high closeness thanks to the aggregator  Recall ASP graph - short distance to many, many nodes via aggregator  Connection to aggregator doesn't confer high degree, PageRank or betweenness - nodes must 'fend for themselves'  Degree: link to aggregator is just 1 link  PageRank: aggregator's 'vote' diluted by its high degree  Betweenness: aggregator is gateway to its children, could use any child to reach aggregator

Empirical Analysis Summary  VIGS results vary by graph and importance measure  Still, subgraphs tended towards –High connectivity –Average or higher density –Shorter ASPs –Maintaining relative importance rank of vertices –"Spam" affects closeness primarily

Preserving Properties  So far, just studying subgraphs  Applying VIGS - may need guarantees  Hard to make a guarantee?  Example property: subgraph is connected Preserving Properties

 Is it difficult to guarantee the connectedness of a VIGS subgraph?  NP-complete: connecting the important vertices with the fewest added vertices/edges is the Steiner tree problem  Resort to heuristics  KeepOne, KeepAll from Gilbert and Levchenko (2004)

KeepOne and KeepAll  KeepOne - build an MST: drop as many vertices/edges as possible while maintaining connectivity.  Problem! ASP/diameter could increase  Solution: KeepAll - MST, but add all vertices/edges on a shortest path
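The trade-off between the two heuristics can be illustrated with a simplified stand-in. The sketch below is not Gilbert and Levchenko's algorithm and does not build an MST: it swaps in a greedy shortest-path heuristic that repeatedly attaches the nearest unconnected important vertex, adding either one shortest path (KeepOne-like) or every vertex on any shortest path (KeepAll-like). The function names, the `keep_all` flag and the toy graph are hypothetical, and the graph is assumed undirected with sortable vertex labels.

```python
from collections import deque


def multi_source_bfs(adj, sources):
    """Hop distance from the nearest source to every reachable vertex."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def connect_important(adj, important, keep_all=False):
    """Return a vertex set containing the important vertices, grown until connected."""
    order = sorted(important)
    if not order:
        return set()
    kept = {order[0]}
    remaining = set(order[1:])
    while remaining:
        from_kept = multi_source_bfs(adj, kept)
        reachable = [t for t in remaining if t in from_kept]
        if not reachable:
            break  # the rest lie in other components of the full graph
        target = min(reachable, key=lambda t: (from_kept[t], t))
        if keep_all:
            # Add every vertex lying on some shortest path from kept to target.
            from_target = multi_source_bfs(adj, [target])
            d = from_kept[target]
            kept |= {x for x in from_kept
                     if x in from_target and from_kept[x] + from_target[x] == d}
        else:
            # Walk a single shortest path back from target to the kept set.
            v = target
            while from_kept[v] > 0:
                kept.add(v)
                v = min(w for w in adj[v] if from_kept.get(w) == from_kept[v] - 1)
        remaining -= kept
    return kept


# Toy graph (assumption): important vertices 0 and 3 joined by two 2-hop routes.
toy = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
keep_one_set = connect_important(toy, {0, 3})
keep_all_set = connect_important(toy, {0, 3}, keep_all=True)
```

On this toy graph the KeepOne-like variant adds a single intermediate vertex, while the KeepAll-like variant adds both, mirroring the slide's trade-off between few added vertices and preserved shortest paths.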

Heuristic Performance - ER  KO (KeepOne): did not have to add many vertices, but shortest path rather large (ER ASP was 4.26)  KA (KeepAll): good improvement in path length, but huge increase in vertices  Plot axis: ASP

Heuristic Performance - BZ  Similar performance to ER: KO results in significantly longer shortest paths, but KA adds many vertices  Is 4000 too many vertices to add? Small compared to total graph, but huge compared to number of important vertices  Plot axis: ASP

Heuristic Performance - TREC  Almost completely connected from the start  KA adds only a few vertices, doesn't change much  Results for Web dataset similar  Plot axis: ASP

Related Work  Graph sampling - Similar objective: synopsis  Concerned only with original graph  Random sampling, snowball sampling…  Lee, Kim, Jeong (2006),  Leskovec, Faloutsos (2006),  Li, Church, Hastie (2006)  Rich-club  Concerned only with high degree nodes  Zhou, Mondragon (2004),  Colizza, Flammini, Serrano, Vespignani (2006)

Related Work  K-cores  Subgraphs where each vertex has at least k-connections within the subgraph  Dorogovstev, Goltsev, Mendes (2006)  Core connectivity  Smallest number of important vertices to remove before destroying largest component  Mislove, Marcon, Gummadi, Druschel, Bhattacharjee (2007)

VIGS wrap up  vertex-importance graph synopsis  create a subgraph of important vertices to study both the full graph and these vertices in particular  properties of VIGS depend on entire network and importance measure  real world networks have dense, closely knit VIGS  in some cases easy to meet connectivity & ASP guarantees

Thanks to  Xiaolin Shi  Matthew Bonner  Lada Adamic NSF DMS