Published by Easter Washington. Modified over 9 years ago.
1
Traceroute-like exploration of unknown networks: a statistical analysis
A. Barrat, LPT, Université Paris-Sud, France
I. Alvarez-Hamelin (LPT, France)
L. Dall’Asta (LPT, France)
A. Vázquez (Notre-Dame University, USA)
A. Vespignani (LPT, France)
cond-mat/0406404
http://www.th.u-psud.fr/page_perso/Barrat
2
Plan of the talk
● Apparent complexity of the Internet’s structure
● Problem of sampling biases
● Model for traceroute
● Theoretical approach to the traceroute mapping process
● Numerical results
● Conclusions
3
Internet representation
● Multi-probe reconstruction (router level)
● Use of BGP tables for the Autonomous System level (domains)
● Graph representation at different granularities
● Many projects (CAIDA, NLANR, RIPE, IPM, PingER...)
4
=> Large-scale visualizations
TOPOLOGICAL ANALYSIS by STATISTICAL TOOLS and GRAPH THEORY!
5
Main topological characteristics
● Small world
Distribution of shortest-path lengths (# hops) between two nodes
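The hop-count distribution mentioned on this slide can be measured with a breadth-first search from every node. The following is a minimal sketch, assuming an unweighted, undirected graph stored as an adjacency dictionary; the function names and the toy ring are illustrative and not taken from the talk.

```python
from collections import Counter, deque

def hop_counts(adj, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_length_histogram(adj):
    """Histogram of shortest-path lengths over all unordered node pairs."""
    hist = Counter()
    for s in adj:
        for t, d in hop_counts(adj, s).items():
            if s < t:  # count each unordered pair once
                hist[d] += 1
    return hist

# Toy example (hypothetical): a ring of 6 nodes.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
hist = path_length_histogram(ring)
# 6 pairs are 1 hop apart, 6 pairs are 2 hops apart, 3 pairs are 3 hops apart
```

On a small-world graph the same histogram stays concentrated on a few hops even when the number of nodes is large.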
6
Main topological characteristics
● Small world: captured by Erdős-Rényi graphs
With probability p an edge is established between each pair of vertices
Poisson degree distribution with $\langle k \rangle = pN$
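As a quick check of the relation between p, N and the average degree, one can generate such a graph and measure its mean degree. This is a sketch under the assumption of the standard G(N, p) construction; the function name, the seed and the values of N and p are arbitrary choices for illustration.

```python
import random

def erdos_renyi(n, p, seed=0):
    """Return G(n, p) as an adjacency dict: each pair is linked with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

n, p = 500, 0.02  # hypothetical values
graph = erdos_renyi(n, p)
mean_degree = sum(len(neighbours) for neighbours in graph.values()) / n
print(f"measured <k> = {mean_degree:.2f}, expected p*N = {p * n:.2f}")
```

The measured value fluctuates around pN from one realization to the next, and the degree distribution is peaked around it.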
7
Main topological characteristics
● Small world
● Large clustering: different neighbours of a node are likely to know each other (higher probability of being connected)
=> graph models with large clustering, e.g. Watts-Strogatz 1998
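Clustering can be quantified per node as the fraction of pairs of its neighbours that are themselves connected. The sketch below assumes this usual local definition; the toy graph and the convention of returning 0 for nodes with fewer than two neighbours are illustrative choices.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = list(adj[node])
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention used here: no neighbour pair, report 0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

# Toy graph (hypothetical): a triangle 0-1-2 plus a pendant node 3 attached to 0.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c0 = local_clustering(toy, 0)  # one linked pair (1, 2) out of three pairs
c1 = local_clustering(toy, 1)  # its two neighbours 0 and 2 are linked
```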
8
Main topological characteristics
● Small world
● Large clustering
● Dynamical network
● Broad connectivity distributions, also observed in many other contexts (from biological to social networks)
=> huge modeling activity
9
Main topological characteristics
Broad connectivity distributions: obtained from mapping projects (Govindan et al. 2002)
Result of a sampling: is this reliable?
10
Sampling biases
Internet mapping:
● Sampling is incomplete
● Lateral connectivity is missed (edges are underestimated)
● Finite-size sample => spanning tree
11
Sampling biases
● Vertices and edges are best sampled in the proximity of sources
● Bad estimation of some topological properties
Statistical properties of the sampled graph may sharply differ from those of the original one (Lakhina et al. 2002; Clauset & Moore 2004; De Los Rios & Petermann 2004; Guillaume & Latapy 2004)
Bad sampling?
12
Evaluating sampling biases
● Real graph G=(V,E) (known, with given properties)
● Simulated sampling => sampled graph G’=(V’,E’)
● Analysis of G’, comparison with G
13
Model for traceroute
G=(V, E), with sources and targets
First approximation: union of shortest paths
NB: Unique Shortest Path
14
Model for traceroute
G’=(V’, E’)
First approximation: union of shortest paths
Very simple model, but it allows for some analytical and numerical understanding
15
More formally...
G = (V, E): sparse undirected graph of N nodes with a set of $N_S$ sources $S = \{i_1, i_2, \dots, i_{N_S}\}$ and $N_T$ targets $T = \{j_1, j_2, \dots, j_{N_T}\}$, randomly placed.
The sampled graph G’=(V’,E’) is obtained by considering the union of all the traceroute-like paths connecting source-target pairs.
PARAMETERS: densities $\rho_S = N_S/N$ and $\rho_T = N_T/N$; $\varepsilon = \rho_S \rho_T N = N_S N_T / N$ (probing effort)
Usually $N_S = O(1)$, $\rho_T = O(1)$
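The sampling model can be sketched in a few lines: run a breadth-first search from each source and keep the edges of one shortest path to each target. This is a minimal sketch of the unique-shortest-path approximation, not the authors' code; the function names and the toy square graph are illustrative assumptions, and ties between equally short paths are broken by adjacency order.

```python
from collections import deque

def bfs_parents(adj, source):
    """BFS tree from source: parent[v] is the previous hop on one shortest path."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def traceroute_sample(adj, sources, targets):
    """Union of one shortest path per source-target pair: returns (V', E')."""
    nodes, edges = set(sources), set()
    for s in sources:
        parent = bfs_parents(adj, s)
        for t in targets:
            if t not in parent:
                continue  # target unreachable from this source
            v = t
            while parent[v] is not None:  # walk back from target to source
                u = parent[v]
                edges.add((min(u, v), max(u, v)))
                nodes.update((u, v))
                v = u
    return nodes, edges

# Toy graph (hypothetical): a square 0-1-2-3-0, one source and two targets.
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
nodes, edges = traceroute_sample(square, sources=[0], targets=[1, 2])
# Edges (0, 3) and (2, 3) are never traversed, so they are missing from E'.
```

Even in this tiny case the sampled graph is a tree and the lateral edges are lost, which is the bias discussed on the earlier slides.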
16
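Analysis of the mapping process
For each set $\Omega = \{S, T\}$, the indicator function that a given edge (i, j) belongs to the sampled graph is
$\pi_{ij} = 1 - \prod_{l \neq m} \Bigl(1 - \sum_{s=1}^{N_S} \delta_{l,i_s} \sum_{t=1}^{N_T} \delta_{m,j_t}\, \sigma^{(l,m)}_{ij}\Bigr)$
with $\sigma^{(l,m)}_{ij} = 1$ if $(i,j) \in$ path between l and m, and 0 otherwise.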
17
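Mean-field statistical analysis
Averaging over all the possible realizations of the set $\Omega = \{S, T\}$, and since usually $\rho_S \rho_T \ll 1$:
$\langle \pi_{ij} \rangle \simeq 1 - \exp(-\rho_S \rho_T\, b_{ij})$, where $b_{ij}$ is the betweenness of edge (i, j)
WE HAVE NEGLECTED CORRELATIONS!!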
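The exponential form can be recovered in a few steps. The sketch below assumes the densities $\rho_S = N_S/N$ and $\rho_T = N_T/N$, a unique shortest path per pair, and $\sigma^{(l,m)}_{ij} \in \{0, 1\}$ indicating whether edge (i, j) lies on the path from l to m. An edge is missed only if no source-target pair routes through it, and a given ordered pair (l, m) is a source-target pair with probability $\rho_S \rho_T$.

```latex
\begin{align}
\langle \pi_{ij} \rangle
  &\simeq 1 - \prod_{l \neq m} \bigl(1 - \rho_S \rho_T\, \sigma^{(l,m)}_{ij}\bigr)
  && \text{(correlations between pairs neglected)} \\
  &\simeq 1 - \exp\Bigl(-\rho_S \rho_T \sum_{l \neq m} \sigma^{(l,m)}_{ij}\Bigr)
  && (\rho_S \rho_T \ll 1) \\
  &= 1 - \exp\bigl(-\rho_S \rho_T\, b_{ij}\bigr)
  && \Bigl(b_{ij} = \sum_{l \neq m} \sigma^{(l,m)}_{ij}\Bigr)
\end{align}
```

The second step uses $1 - x \simeq e^{-x}$ for each factor, which is where the condition $\rho_S \rho_T \ll 1$ enters.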
18
Betweenness centrality b
For each pair of nodes (l,m) in the graph, there are $\sigma^{(l,m)}$ shortest paths between l and m, of which $\sigma^{(l,m)}_{ij}$ go through the edge ij.
$b_{ij}$ is the sum of $\sigma^{(l,m)}_{ij} / \sigma^{(l,m)}$ over all pairs (l,m).
Similar concept: node betweenness $b_i$
Example (figure): edge ij has large betweenness, edge jk has small betweenness
Also: flow of information if each individual sends a message to all other individuals
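The definition above can be evaluated with a Brandes-style accumulation: one breadth-first search per node counts shortest paths, and a backward pass distributes each pair's contribution over the edges. This is a sketch with a hypothetical function name, not code from the talk. It sums over ordered pairs (l, m), an assumption chosen so that an edge used only by its own endpoints gets betweenness 2, as in the talk.

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """Edge betweenness summed over ordered pairs (l, m), unweighted graph."""
    b = defaultdict(float)
    for s in adj:
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}   # predecessors on shortest paths
        order = []                     # nodes in order of discovery
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):      # farthest nodes first
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                b[(min(u, w), max(u, w))] += share
                delta[u] += share
    return dict(b)

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
b_triangle = edge_betweenness(triangle)  # every edge serves only its endpoints
b_path = edge_betweenness(path)          # every edge also carries the 0-2 traffic
```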
19
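Consequences of the analysis
$\langle \pi_{ij} \rangle \simeq 1 - \exp(-\rho_S \rho_T\, b_{ij})$
● Smallest betweenness $b_{ij} = 2$ => $\langle \pi_{ij} \rangle \simeq 2 \rho_S \rho_T$ (i.e. i and j have to be source and target to discover i-j)
● Largest betweenness $b_{ij} = O(N^2)$ => $\langle \pi_{ij} \rangle \simeq 1$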
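The two limits can be checked numerically by plugging values into the mean-field expression. In this sketch the network size, the number of sources and the target density are hypothetical values chosen only to illustrate the orders of magnitude.

```python
import math

def edge_discovery_probability(rho_s, rho_t, b_ij):
    """Mean-field estimate 1 - exp(-rho_S * rho_T * b_ij)."""
    return -math.expm1(-rho_s * rho_t * b_ij)

n = 10_000       # hypothetical network size N
rho_s = 2 / n    # two sources
rho_t = 0.1      # 10% of the nodes are targets

p_low = edge_discovery_probability(rho_s, rho_t, 2)        # least central edge
p_high = edge_discovery_probability(rho_s, rho_t, n ** 2)  # b_ij of order N^2
print(f"b=2: {p_low:.2e} (2*rho_S*rho_T = {2 * rho_s * rho_t:.2e}), b=N^2: {p_high:.6f}")
```

With these numbers the least central edges are found with a probability of order $10^{-5}$, while the most central ones are found almost surely.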
20
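Results for the vertices
● Edges: $\langle \pi_{ij} \rangle \simeq 1 - \exp(-\varepsilon\, b_{ij}/N)$
● $\langle \pi_i \rangle \simeq 1 - (1 - \rho_T) \exp(-\varepsilon\, b_i/N)$ (discovery probability)
● $\langle N^*_k \rangle / N_k \simeq 1 - \exp(-\varepsilon\, b(k)/N)$ (discovery frequency)
● $\langle k^* \rangle / k \simeq \varepsilon\,(1 + b(k)/N)/k$ (discovered connectivity)
Summary: discovery probability is strongly related to centrality; vertex discovery is favored by a finite density of targets; accuracy is increased by increasing the probing effort $\varepsilon$.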
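A short numerical illustration of the vertex discovery probability shows the three points of the summary. It is a sketch assuming the form $1 - (1 - \rho_T)\exp(-\varepsilon b_i/N)$, with $\varepsilon$ the probing effort; all parameter values are hypothetical.

```python
import math

def vertex_discovery_probability(eps, rho_t, b_i, n):
    """Mean-field estimate 1 - (1 - rho_T) * exp(-eps * b_i / N)."""
    return 1.0 - (1.0 - rho_t) * math.exp(-eps * b_i / n)

n = 10_000     # hypothetical network size N
rho_t = 0.1    # target density
eps = 2.0      # probing effort

peripheral = vertex_discovery_probability(eps, rho_t, 0, n)      # no path goes through it
hub = vertex_discovery_probability(eps, rho_t, 50 * n, n)        # very central vertex
mid_low = vertex_discovery_probability(eps, rho_t, n // 2, n)
mid_high = vertex_discovery_probability(2 * eps, rho_t, n // 2, n)  # doubled effort
```

A vertex with zero betweenness is still found with probability $\rho_T$, simply because it may itself be a target, whereas a hub is found almost surely.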
21
Numerical simulations
1. Homogeneous graphs (ex: ER random graphs): peaked distributions of k and b, narrow range of betweenness
=> Good sampling expected only for high probing effort
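A small end-to-end experiment along these lines can be sketched as follows: build a homogeneous random graph, sample it with shortest paths from a few sources to a fixed fraction of targets, and measure the fraction of edges recovered. The graph size, average degree, target density and seed are hypothetical, the function names are illustrative, and the probing effort is taken as N_S N_T / N.

```python
import random
from collections import deque

def erdos_renyi(n, p, rng):
    """G(n, p) as adjacency lists."""
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def sampled_edges(adj, sources, targets):
    """Edges on one BFS shortest path per source-target pair."""
    found = set()
    for s in sources:
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
        for t in targets:
            v = t
            while parent.get(v) is not None:  # skips unreachable targets
                found.add((min(v, parent[v]), max(v, parent[v])))
                v = parent[v]
    return found

rng = random.Random(1)
n, mean_k = 400, 10
graph = erdos_renyi(n, mean_k / n, rng)
true_edges = {(i, j) for i in graph for j in graph[i] if i < j}

for n_sources in (1, 2, 5):
    sources = rng.sample(range(n), n_sources)
    targets = rng.sample(range(n), n // 10)  # rho_T = 0.1
    found = sampled_edges(graph, sources, targets)
    eps = n_sources * len(targets) / n       # probing effort
    print(f"N_S={n_sources} eps={eps:.1f} edge coverage={len(found) / len(true_edges):.3f}")
```

Each source contributes at most a tree of paths, so with a handful of sources most edges of a graph with average degree 10 remain undiscovered.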
22
Numerical simulations
1. Homogeneous graphs
2. Heavy-tailed graphs (ex: scale-free BA model): broad distributions of k and b, $P(k) \sim k^{-3}$, large range of available values
=> Expected: hubs well sampled independently of the probing effort $\varepsilon$ ($b(k) \sim k^{\beta}$)
23
Numerical simulations: homogeneous graphs
Homogeneously pretty badly sampled
24
Numerical simulations: scale-free graphs
Hubs are well discovered
25
Numerical simulations: homogeneous graphs
No heavy-tail behaviour, except... heavy-tailed P*(k) only for $N_S = 1$ (cf. Clauset and Moore 2004), with a cut-off around $\langle k \rangle$ => large, unrealistic $\langle k \rangle$ needed
=> bad sampling of P(k)
26
Numerical simulations: scale-free graphs
● Good sampling, especially of the heavy tail
● Almost independent of $N_S$
● Slight bending for low-degree (less central) nodes => bad evaluation of the exponents
27
Summary
● Analytical approach to a traceroute-like sampling process
● Link with the topological properties, in particular the betweenness
● Usual random graphs are more “difficult” to sample than heavy-tailed ones
● Heavy tails are well sampled
● Bias yielding a sampled scale-free network from a homogeneous network: only in a few cases, and $\langle k \rangle$ has to be unrealistically large
● Heavy-tail properties are a genuine feature of the Internet
● However, quantitative analysis might be strongly biased (wrong exponents...)
28
Perspectives
● Optimized strategies: separate influence of $\rho_T$, $\rho_S$; location of sources and targets
● Investigation of other networks
● Results on redundancy issues
● Massive deployment: traceroute@home
● The Internet is a weighted network: bandwidth, traffic, efficiency, router capacity
● Data are scarce and on a limited scale
● Interaction between topology and traffic and...
cf. also Guillaume and Latapy 2004