Presentation is loading. Please wait.

Presentation is loading. Please wait.

Traceroute-like exploration of unknown networks: a statistical analysis A. Barrat, LPT, Université Paris-Sud, France I. Alvarez-Hamelin (LPT, France) L.

Similar presentations


Presentation on theme: "Traceroute-like exploration of unknown networks: a statistical analysis A. Barrat, LPT, Université Paris-Sud, France I. Alvarez-Hamelin (LPT, France) L."— Presentation transcript:

1 Traceroute-like exploration of unknown networks: a statistical analysis A. Barrat, LPT, Université Paris-Sud, France I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France) A. Vázquez (Notre-Dame University, USA) A. Vespignani (LPT, France) cond-mat/0406404 http://www.th.u-psud.fr/page_perso/Barrat

2 Plan of the talk Apparent complexity of Internet’s structure Problem of sampling biases Model for traceroute Theoretical approach of the traceroute mapping process Numerical results Conclusions

3 Multi-probe reconstruction (router-level) Use of BGP tables for the Autonomous System level (domains) Graph representation different granularities Internet representation Many projects (CAIDA, NLANR, RIPE, IPM, PingER...)

4 =>Large-scale visualizations TOPOLOGICAL ANALYSIS by STATISTICAL TOOLS and GRAPH THEORY !

5 Main topological characteristics Small world Distribution of Shortest paths (# hops) between two nodes

6 Main topological characteristics Small world : captured by Erdös-Renyi graphs Poisson distribution = p N With probability p an edge is established among couple of vertices

7 Main topological characteristics Small world Large clustering: different neighbours of a node will likely know each other 1 2 3 n Higher probability to be connected =>graph models with large clustering, e.g. Watts-Strogatz 1998

8 Main topological characteristics Small world Large clustering Dynamical network Broad connectivity distributions also observed in many other contexts (from biological to social networks) huge activity of modeling

9 Main topological characteristics Broad connectivity distributions: obtained from mapping projects Govindan et al 2002 Result of a sampling: is this reliable ?

10 Sampling biases Internet mapping: Sampling is incomplete Lateral connectivity is missed (edges are underestimated) Finite size sample => spanning tree

11 Sampling biases ● Vertices and edges best sampled in the proximity of sources ● Bad estimation of some topological properties Statistical properties of the sampled graph may sharply differ from the original one Lakhina et al. 2002 Clauset & Moore 2004 De Los Rios & Petermann 2004 Guillaume & Latapy 2004 Bad sampling ?

12 Evaluating sampling biases Real graph G=(V,E) (known, with given properties ) Sampled graph G’=(V’,E’) simulated sampling Analysis of G’, comparison with G

13 Model for traceroute G=(V, E) Sources Targets First approximation: union of shortest paths NB: Unique Shortest Path

14 Model for traceroute G’=(V’, E’) First approximation: union of shortest paths Very simple model, but: allows for some analytical and numerical understanding

15 More formally... G = (V, E): sparse undirected graph with a set of N S sources S = { i 1, i 2, …, i N S } N T targets T = {j 1, j 2, …, j N T } randomly placed. The sampled graph G’=(V’,E’) is obtained by considering the union of all the traceroute -like paths connecting source- target pairs. PARAMETERS: (probing effort) Usually N S =O(1),  T =O(1)

16 For each set  = { S, T }, the indicator function that a given edge (i, j) belongs to the sampled graph is with if (i,j)  path between l,m otherwise, Analysis of the mapping process

17 Averaging over all the possible realizations of the set  = { S, T }, WE HAVE NEGLECTED CORRELATIONS !! usually  S  T << 1 Mean-field statistical analysis  ij > > 1 – exp(-  S  T b ij ) betweenness

18 Betweenness centrality b for each pair of nodes (l,m) in the graph, there are lm shortest paths between l and m ij lm shortest paths going through ij b ij is the sum of ij lm / lm over all pairs (l,m) Similar concept: node betweenness b i i j k ij: large betweenness jk: small betweenness Also: flow of information if each individual sends a message to all other individuals

19 Consequences of the analysis  ij > > 1 – exp(-  S  T b ij ) Smallest betweenness b ij =2 => > 2  S  T i j Largest betweenness b ij =O(N 2 ) => > 1 (i.e. i and j have to be source and target to discover i-j)

20 Results for the vertices  ij > > 1 – exp(-  b ij /N) > 1 – (1-  T ) exp( -  b i /N ) (discovery probability) N * k /N k > 1 – exp( -  b(k)/N ) (discovery frequency) /k >  ( 1 + b(k)/N )/k (discovered connectivity) Summary: discovery probability strongly related with the centrality ; vertex discovery favored by a finite density of targets ; accuracy increased by increasing probing effort .

21 Numerical simulations 1. Homogeneous graphs: peaked distributions of k and b narrow range of betweenness (ex: ER random graphs) => Good sampling expected only for high probing effort

22 Numerical simulations broad distributions of k and b P(k) ~ k -3 large range of available values 1.Homogeneous graphs 2.Heavy-tailed graphs (ex: Scale-free BA model) => Expected: Hubs well-sampled independently of  (b(k) ~k  

23 Homogeneous graphs Homogeneously pretty badly sampled Numerical simulations

24 Hubs are well discovered Numerical simulations Scale-free graphs

25 No heavy-tail behaviour except.... heavy-tailed P*(k) only for N S = 1 (cf Clauset and Moore 2004) cut-off around => large, unrealistic needed bad sampling of P(k) Numerical simulations Homogeneous graphs N S =1

26 good sampling, especially of the heavy-tail; almost independent of N S ; slight bending for low degree (less central) nodes => bad evaluation of the exponents. Scale-free graphs Numerical simulations

27 Summary  Analytical approach to a traceroute-like sampling process  Link with the topological properties, in particular the betweenness  Usual random graphs more “difficult” to sample than heavy-tails  Heavy-tails well sampled  Bias yielding a sampled scale-free network from a homogeneous network: only in few cases has to be unrealistically large Heavy tails properties are a genuine feature of the Internet however Quantitative analysis might be strongly biased (wrong exponents...)

28 Optimized strategies: separate influence of  T,  S location of sources, targets investigation of other networks Results on redundancy issues Massive deployment traceroute@home Perspectives The internet is a weighted networks bandwidth, traffic, efficiency, routers capacity Data are scarse and on limited scale Interaction among topology and traffic and... cf also Guillaume and Latapy 2004


Download ppt "Traceroute-like exploration of unknown networks: a statistical analysis A. Barrat, LPT, Université Paris-Sud, France I. Alvarez-Hamelin (LPT, France) L."

Similar presentations


Ads by Google