Traceroute-like exploration of unknown networks: a statistical analysis A. Barrat, LPT, Université Paris-Sud, France I. Alvarez-Hamelin (LPT, France) L.


Traceroute-like exploration of unknown networks: a statistical analysis. A. Barrat (LPT, Université Paris-Sud, France), I. Alvarez-Hamelin (LPT, France), L. Dall’Asta (LPT, France), A. Vázquez (Notre-Dame University, USA), A. Vespignani (LPT, France). cond-mat/

Plan of the talk: apparent complexity of the Internet’s structure; problem of sampling biases; model for traceroute; theoretical approach to the traceroute mapping process; numerical results; conclusions.

Internet representation. Graph representation at different granularities. Router level: multi-probe reconstruction. Autonomous System level (domains): use of BGP tables. Many projects (CAIDA, NLANR, RIPE, IPM, PingER...).

=> Large-scale visualizations. Topological analysis by statistical tools and graph theory!

Main topological characteristics. Small world: distribution of shortest paths (# hops) between two nodes.

Main topological characteristics. Small world: captured by Erdös-Rényi graphs. An edge is established between each pair of vertices with probability p; Poisson degree distribution with $\langle k \rangle = pN$.
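A minimal sketch of the Erdös-Rényi construction mentioned on this slide, assuming a plain G(N, p) model with no self-loops. The function names, the seed and the values of N and p are illustrative choices, not taken from the talk.

```python
import random

def erdos_renyi(n, p, seed=0):
    """Return adjacency sets of a G(n, p) random graph (illustrative helper)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each pair is linked independently with probability p
                adj[u].add(v)
                adj[v].add(u)
    return adj

def mean_degree(adj):
    """Average number of neighbours per vertex."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

if __name__ == "__main__":
    graph = erdos_renyi(n=1000, p=0.01)
    # Expected to be close to p * N = 10 (the exact mean is p * (N - 1)).
    print(mean_degree(graph))
```

The degree of every vertex is a sum of N - 1 independent Bernoulli trials, which is why the degree distribution is peaked around pN and approaches a Poisson law for large N.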

Main topological characteristics. Small world. Large clustering: different neighbours of a node will likely know each other, i.e. they have a higher probability of being connected => graph models with large clustering, e.g. Watts-Strogatz 1998.
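To make "large clustering" concrete, here is a small sketch that computes the local clustering coefficient, i.e. the fraction of pairs of neighbours of a node that are themselves connected. The helper names and the toy graph are assumptions made for illustration.

```python
def local_clustering(adj, v):
    """Fraction of pairs of neighbours of v that are linked to each other."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: undefined cases count as zero
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if nbrs[b] in adj[nbrs[a]]
    )
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficient over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

if __name__ == "__main__":
    # Toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
    triangle_with_tail = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(average_clustering(triangle_with_tail))  # about 0.583 (= 7/12)
```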

Main topological characteristics. Small world. Large clustering. Dynamical network. Broad connectivity distributions: also observed in many other contexts (from biological to social networks) => huge modeling activity.

Main topological characteristics. Broad connectivity distributions: obtained from mapping projects (Govindan et al 2002). Result of a sampling: is this reliable?

Sampling biases. Internet mapping: sampling is incomplete. Lateral connectivity is missed (edges are underestimated). Finite size sample => spanning tree.

Sampling biases. ● Vertices and edges best sampled in the proximity of sources ● Bad estimation of some topological properties. Statistical properties of the sampled graph may sharply differ from those of the original one (Lakhina et al; Clauset & Moore 2004; De Los Rios & Petermann 2004; Guillaume & Latapy 2004). Bad sampling?

Evaluating sampling biases. Real graph G=(V,E) (known, with given properties) => simulated sampling => sampled graph G’=(V’,E’) => analysis of G’, comparison with G.

Model for traceroute. Graph G=(V, E) with sources and targets. First approximation: union of shortest paths. NB: Unique Shortest Path.

Model for traceroute. Sampled graph G’=(V’, E’). First approximation: union of shortest paths. Very simple model, but it allows for some analytical and numerical understanding.
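The "union of shortest paths" model can be sketched as follows, assuming an unweighted undirected graph stored as adjacency sets and a breadth-first search that keeps a single shortest path per source-target pair (the "Unique Shortest Path" convention). The function names and the tie-breaking rule are illustrative assumptions.

```python
from collections import deque

def bfs_parents(adj, source):
    """BFS tree from source: parent[v] precedes v on one shortest path."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):  # deterministic tie-breaking between equal paths
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def traceroute_sample(adj, sources, targets):
    """Sampled graph G' = union of one shortest path per (source, target) pair."""
    nodes, edges = set(), set()
    for s in sources:
        parent = bfs_parents(adj, s)
        for t in targets:
            if t == s or t not in parent:
                continue  # skip trivial or unreachable pairs
            v = t
            while parent[v] is not None:  # walk back from target to source
                edges.add(frozenset((v, parent[v])))
                nodes.update((v, parent[v]))
                v = parent[v]
    return nodes, edges

if __name__ == "__main__":
    # Square 0-1-2-3-0: one probe from 0 to 2 sees only one side of the cycle.
    square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    nodes, edges = traceroute_sample(square, sources=[0], targets=[2])
    print(sorted(nodes), len(edges))  # 3 of 4 vertices, 2 of 4 edges
```

In the square example the edges 2-3 and 3-0 are never observed, which is the kind of missed lateral connectivity discussed in the sampling-bias slides.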

More formally... G = (V, E): sparse undirected graph with a set of $N_S$ sources $S = \{i_1, i_2, \dots, i_{N_S}\}$ and $N_T$ targets $T = \{j_1, j_2, \dots, j_{N_T}\}$, randomly placed. The sampled graph G’=(V’,E’) is obtained by considering the union of all the traceroute-like paths connecting source-target pairs. PARAMETERS: $\rho_S = N_S/N$, $\rho_T = N_T/N$, $\epsilon = \rho_S \rho_T N = N_S N_T / N$ (probing effort). Usually $N_S = O(1)$, $\rho_T = O(1)$.

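Analysis of the mapping process. For each set $\Omega = \{S, T\}$, the indicator function that a given edge (i, j) belongs to the sampled graph is $\pi_{ij} = 1 - \prod_{l \neq m} \left( 1 - \sum_{s=1}^{N_S} \delta_{l,i_s} \sum_{t=1}^{N_T} \delta_{m,j_t} \, \sigma_{ij}^{(l,m)} \right)$, with $\sigma_{ij}^{(l,m)} = 1$ if (i,j) $\in$ path between l and m, and 0 otherwise.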

Mean-field statistical analysis. Averaging over all the possible realizations of the set $\Omega = \{S, T\}$: $\langle \pi_{ij} \rangle \simeq 1 - \exp(-\rho_S \rho_T b_{ij})$, where $b_{ij}$ is the betweenness of the edge (i, j); usually $\rho_S \rho_T \ll 1$. WE HAVE NEGLECTED CORRELATIONS !!
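The step from the indicator function to the exponential form can be written out as below. This is a sketch under the assumptions stated in the comments (independent placement of sources and targets, a unique shortest path per pair); it is not necessarily the exact route taken in the talk.

```latex
% Assumption: each ordered pair (l, m) is a (source, target) pair
% independently with probability \rho_S \rho_T (correlations neglected).
\begin{align*}
\langle \pi_{ij} \rangle
  &\simeq 1 - \prod_{l \neq m} \left( 1 - \rho_S \rho_T \, \sigma_{ij}^{(l,m)} \right) \\
  &= 1 - \left( 1 - \rho_S \rho_T \right)^{b_{ij}}
     \qquad \text{(unique shortest paths: } \textstyle\sum_{l \neq m} \sigma_{ij}^{(l,m)} = b_{ij}\text{)} \\
  &\simeq 1 - \exp\left( -\rho_S \rho_T \, b_{ij} \right)
     \qquad \text{for } \rho_S \rho_T \ll 1 .
\end{align*}
```

The second line uses the fact that each factor equals $1 - \rho_S \rho_T$ when the pair's path crosses the edge and 1 otherwise, so only the $b_{ij}$ crossing pairs contribute.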

Betweenness centrality b. For each pair of nodes (l,m) in the graph, there are $\sigma^{(l,m)}$ shortest paths between l and m, of which $\sigma_{ij}^{(l,m)}$ go through the edge ij; $b_{ij}$ is the sum of $\sigma_{ij}^{(l,m)} / \sigma^{(l,m)}$ over all pairs (l,m). Similar concept: node betweenness $b_i$. Example with nodes i, j, k: edge ij has large betweenness, edge jk has small betweenness. Also: flow of information if each individual sends a message to all other individuals.
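A sketch of the edge betweenness defined on this slide, using a Brandes-style accumulation over breadth-first searches. It sums over ordered pairs (l, m), so the smallest possible value for an edge is 2. The function name and the toy graph (two triangles joined by a bridge) are illustrative assumptions.

```python
from collections import deque

def edge_betweenness(adj):
    """b_ij = sum over ordered pairs (l, m) of sigma_ij^(l,m) / sigma^(l,m)."""
    b = {}
    for s in adj:
        dist = {s: 0}
        sigma = {s: 1.0}   # number of shortest paths from s to each vertex
        preds = {s: []}    # predecessors on shortest paths from s
        order = []         # vertices in order of non-decreasing distance
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    sigma[v] = 0.0
                    preds[v] = []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):  # accumulate dependencies from the leaves up
            for u in preds[w]:
                c = sigma[u] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((u, w))
                b[edge] = b.get(edge, 0.0) + c
                delta[u] += c
    return b

if __name__ == "__main__":
    # Two triangles (0,1,2) and (3,4,5) joined by the bridge 2-3.
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    b = edge_betweenness(g)
    print(b[frozenset((2, 3))], b[frozenset((0, 1))])  # bridge: large, side edge: small
```

The bridge carries every path between the two triangles, while the edge 0-1 is used only by the pair (0, 1) itself, which mirrors the large/small contrast drawn on the slide.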

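Consequences of the analysis. $\langle \pi_{ij} \rangle \simeq 1 - \exp(-\rho_S \rho_T b_{ij})$. Smallest betweenness $b_{ij} = 2$ => $\langle \pi_{ij} \rangle \simeq 2 \rho_S \rho_T$ (i.e. i and j have to be source and target to discover i-j). Largest betweenness $b_{ij} = O(N^2)$ => $\langle \pi_{ij} \rangle \simeq 1$.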

Results for the vertices. Edges: $\langle \pi_{ij} \rangle \simeq 1 - \exp(-\epsilon b_{ij}/N)$. Vertices: $\langle \pi_i \rangle \simeq 1 - (1 - \rho_T) \exp(-\epsilon b_i/N)$ (discovery probability); $\langle N^*_k \rangle / N_k \simeq 1 - \exp(-\epsilon b(k)/N)$ (discovery frequency); $\langle k^* \rangle / k \simeq \epsilon (1 + b(k)/N)/k$ (discovered connectivity). Summary: discovery probability strongly related with the centrality; vertex discovery favored by a finite density of targets; accuracy increased by increasing probing effort $\epsilon$.
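The mean-field expressions on this slide are easy to evaluate numerically. The sketch below assumes the probing effort is N_S N_T / N and uses made-up values of N, N_S and N_T purely for illustration; the function names are hypothetical.

```python
import math

def edge_discovery_prob(b_ij, eps, n):
    """Mean-field probability that an edge of betweenness b_ij is discovered."""
    return 1.0 - math.exp(-eps * b_ij / n)

def vertex_discovery_prob(b_i, eps, n, rho_t):
    """Mean-field probability that a vertex of betweenness b_i is discovered."""
    return 1.0 - (1.0 - rho_t) * math.exp(-eps * b_i / n)

if __name__ == "__main__":
    # Hypothetical probing setup (not from the talk).
    n, n_s, n_t = 10_000, 2, 1_000
    rho_s, rho_t = n_s / n, n_t / n
    eps = n_s * n_t / n  # assumed definition of the probing effort

    # Smallest edge betweenness: probability close to 2 * rho_S * rho_T.
    print(edge_discovery_prob(2, eps, n), 2 * rho_s * rho_t)
    # Very central edge, b_ij of order N^2: probability close to 1.
    print(edge_discovery_prob(n * n, eps, n))
    # A vertex with zero betweenness is still found when it is a target.
    print(vertex_discovery_prob(0, eps, n, rho_t))
```

The last line illustrates why a finite density of targets favors vertex discovery: even a peripheral vertex is seen with probability at least equal to the target density.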

Numerical simulations. 1. Homogeneous graphs (ex: ER random graphs): peaked distributions of k and b, narrow range of betweenness => good sampling expected only for high probing effort.

Numerical simulations. 1. Homogeneous graphs. 2. Heavy-tailed graphs (ex: scale-free BA model): broad distributions of k and b, $P(k) \sim k^{-3}$, large range of available values => expected: hubs well sampled independently of $\epsilon$ ($b(k) \sim k^{\beta}$).
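For readers who want to reproduce the heavy-tailed test case, here is a minimal sketch of a Barabási-Albert style growth rule, assuming a complete seed graph on m + 1 vertices and m distinct preferential-attachment links per new vertex. The function name, seed and parameter values are illustrative and may differ from the generator used for the talk.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a graph by preferential attachment (illustrative BA-style sketch)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    stubs = []  # each vertex appears once per incident edge
    # Seed: complete graph on the first m + 1 vertices.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            stubs += [u, v]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            # Picking a random stub selects a vertex with probability
            # proportional to its current degree.
            chosen.add(rng.choice(stubs))
        for old in chosen:
            adj[new].add(old)
            adj[old].add(new)
            stubs += [new, old]
    return adj

if __name__ == "__main__":
    g = barabasi_albert(n=2000, m=2)
    degrees = [len(nbrs) for nbrs in g.values()]
    # The mean degree stays near 2m while a few hubs grow far beyond it.
    print(sum(degrees) / len(degrees), max(degrees))
```

The gap between the mean and the maximum degree is what "large range of available values" refers to; a homogeneous graph of the same size and mean degree has no comparable hubs.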

Numerical simulations. Homogeneous graphs: homogeneously pretty badly sampled.

Numerical simulations. Scale-free graphs: hubs are well discovered.

Numerical simulations. Homogeneous graphs: bad sampling of P(k). No heavy-tail behaviour, except that a heavy-tailed P*(k) appears only for $N_S = 1$ (cf Clauset and Moore 2004), with a cut-off around $\langle k \rangle$ => a large, unrealistic $\langle k \rangle$ would be needed.

Numerical simulations. Scale-free graphs: good sampling, especially of the heavy tail; almost independent of $N_S$; slight bending for low-degree (less central) nodes => bad evaluation of the exponents.

Summary. ● Analytical approach to a traceroute-like sampling process ● Link with the topological properties, in particular the betweenness ● Usual random graphs more “difficult” to sample than heavy-tailed ones ● Heavy tails well sampled ● Bias yielding a sampled scale-free network from a homogeneous network: only in a few cases, and $\langle k \rangle$ has to be unrealistically large. Heavy-tail properties are a genuine feature of the Internet; however, quantitative analysis might be strongly biased (wrong exponents...).

Perspectives. Optimized strategies: separate influence of $\rho_T$, $\rho_S$; location of sources, targets; investigation of other networks. Results on redundancy issues. Massive deployment. The Internet is a weighted network: bandwidth, traffic, efficiency, router capacity. Data are scarce and on a limited scale. Interaction between topology and traffic... cf also Guillaume and Latapy 2004.