Traffic-driven model of the World-Wide-Web graph
A. Barrat, LPT, Orsay, France
M. Barthélemy, CEA, France
A. Vespignani, LPT, Orsay, France
Outline
- The WebGraph
- Some empirical characteristics
- Various models
- Weights and strengths
- Our model: definition
- Analysis: analytics + numerics
- Conclusions
The Web as a directed graph
- Nodes i: web pages
- Directed links: hyperlinks
- In- and out-degrees of node i: $k^{in}_i$ and $k^{out}_i$
Empirical facts
- Small world: captured by Erdős-Rényi graphs
- With probability p an edge is established between each pair of vertices
- Poisson degree distribution, with mean degree $\langle k \rangle = pN$
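The relation between p, N and the mean degree can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function name `er_mean_degree` and the parameter values are illustrative assumptions.

```python
import random


def er_mean_degree(n, p, seed=0):
    """Build an undirected Erdos-Renyi graph G(n, p) and return its mean degree."""
    rng = random.Random(seed)
    degree = [0] * n
    # Each of the n(n-1)/2 pairs of vertices is linked independently with probability p.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return sum(degree) / n


if __name__ == "__main__":
    n, p = 400, 0.02
    # The exact expectation is p * (n - 1), which is close to p * n for large n.
    print(er_mean_degree(n, p), "vs p*N =", p * n)
```

For large N the measured value fluctuates around pN, with relative fluctuations that shrink as the graph grows.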
Empirical facts
- Small world
- Large clustering: different neighbours of a node are likely to know each other, i.e. they have a higher probability of being connected
  => graph models with large clustering, e.g. Watts-Strogatz 1998
Empirical facts
- Small world
- Large clustering
- Dynamical network
- Broad connectivity distributions (Barabási-Albert 1999; Broder et al. 2000; Kumar et al. 2000; Adamic-Huberman 2001; Laura et al. 2003)
  also observed in many other contexts (from biological to social networks) => huge modeling activity
Various growing network models
- Barabási-Albert (1999): preferential attachment
- Many variations on the BA model: rewiring (Tadic 2001, Krapivsky et al. 2001), addition of edges, directed model (Dorogovtsev-Mendes 2000, Cooper-Frieze 2001), fitness (Bianconi-Barabási 2001), ...
- Kumar et al. (2000): copying mechanism
- Pandurangan et al. (2002): PageRank + preferential attachment
- Laura et al. (2002): multi-layer model
- Menczer (2002): textual content of web pages
The Web as a directed graph
- Nodes i: web pages
- Directed links: hyperlinks
- Broad $P(k^{in})$; cut-off for $P(k^{out})$ (Broder et al. 2000; Kumar et al. 2000; Adamic-Huberman 2001; Laura et al. 2003)
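Additional level of complexity: weights and strengths
- Links carry weights/traffic: $w_{ij}$
- In- and out-strengths $s^{in}_i$ and $s^{out}_i$: total weight of the links pointing to node i and of the links leaving node i
- Adamic-Huberman 2001: broad distribution of $s^{in}$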
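To make the bookkeeping concrete, here is a minimal Python sketch that computes in- and out-degrees and in- and out-strengths from a weighted edge list. The function name `degrees_and_strengths`, the `(source, target, weight)` tuple layout and the toy data are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict


def degrees_and_strengths(edges):
    """Return (k_in, k_out, s_in, s_out) for a list of (i, j, w_ij) directed links."""
    k_in, k_out = defaultdict(int), defaultdict(int)
    s_in, s_out = defaultdict(float), defaultdict(float)
    for i, j, w in edges:
        k_out[i] += 1   # i has one more outgoing hyperlink
        k_in[j] += 1    # j has one more incoming hyperlink
        s_out[i] += w   # traffic leaving i
        s_in[j] += w    # traffic arriving at j
    return k_in, k_out, s_in, s_out


if __name__ == "__main__":
    # Toy example: three pages and three weighted hyperlinks.
    toy = [("a", "b", 2.0), ("c", "b", 0.5), ("b", "a", 1.0)]
    k_in, k_out, s_in, s_out = degrees_and_strengths(toy)
    print(dict(k_in), dict(s_in))
```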
Model: directed network
(i) Growth
(ii) Strength-driven preferential attachment (new node n: $k^{out} = m$ outlinks)
AND...
“Busy gets busier”
Weights reinforcement mechanism
The new traffic n -> i increases the traffic i -> j
“Busy gets busier”
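The growth, attachment and reinforcement steps can be sketched as a short simulation. This is a minimal Python sketch under stated assumptions, not the authors' implementation: new links carry weight 1; a target i is chosen with probability proportional to its in-strength plus an offset `a0` (added here only so that pages with no in-links yet can be selected); and each new link n -> i adds a total extra traffic `delta` to the out-links of i, shared in proportion to their current weights. The names `grow_web`, `delta` and `a0`, and the fully connected seed core, are all hypothetical choices.

```python
import random


def grow_web(n_nodes, m=2, delta=0.5, a0=1.0, seed=0):
    """Grow a weighted directed graph; returns (w, s_in) with w[i][j] the weight of i -> j."""
    rng = random.Random(seed)
    # Assumed seed core: m + 1 pages, each linking to the m others with unit weight.
    w = {i: {j: 1.0 for j in range(m + 1) if j != i} for i in range(m + 1)}
    s_in = {i: float(m) for i in range(m + 1)}
    for n in range(m + 1, n_nodes):
        nodes = list(s_in)
        # Assumption: attachment probability proportional to in-strength plus offset a0.
        probs = [s_in[i] + a0 for i in nodes]
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(nodes, weights=probs)[0])
        w[n] = {}
        s_in[n] = 0.0
        for i in sorted(targets):
            # Reinforcement (assumed rule): share delta over the out-links of i,
            # proportionally to their current weights.
            s_out_i = sum(w[i].values())
            for j, w_ij in list(w[i].items()):
                inc = delta * w_ij / s_out_i
                w[i][j] = w_ij + inc
                s_in[j] += inc
            # New hyperlink n -> i with unit weight.
            w[n][i] = 1.0
            s_in[i] += 1.0
    return w, s_in


if __name__ == "__main__":
    w, s_in = grow_web(2000, m=2, delta=0.5)
    s_out = {i: sum(out.values()) for i, out in w.items()}
    print("largest s_in:", max(s_in.values()), "largest s_out:", max(s_out.values()))
```

In this sketch every page keeps exactly m out-links, while its out-strength grows by `delta` each time it receives a new in-link, so out-strengths can vary widely even at fixed out-degree.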
Evolution equations (Continuous approximation) Coupling term
Resolution Ansatz supported by numerics:
Results
Approximation
Total in-weight $\sum_i s^{in}_i$: approximately proportional to the total number of in-links $\sum_i k^{in}_i$, times the average weight $\langle w \rangle = 1 + \delta$
Then: $A = 1 + \delta$ and $\gamma_{s^{in}} \in [2; 2 + 1/m]$
Numerical simulations
Measure of A; prediction of, and approximation to, the exponent $\gamma$
Numerical simulations
NB: broad $P(s^{out})$ even if $k^{out} = m$
Clustering spectrum
Local clustering of node i: fraction of connected pairs of neighbours of node i
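The definition above can be written down directly. The following minimal Python sketch assumes that link direction is ignored when identifying neighbours and that the graph is stored as a dict of neighbour sets; the names `local_clustering` and `clustering_spectrum` are illustrative, and the averaging over nodes of equal degree is the usual convention for a clustering spectrum rather than something stated on the slide.

```python
from collections import defaultdict
from itertools import combinations


def local_clustering(adj, i):
    """Fraction of pairs of neighbours of i that are themselves connected."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # no pair of neighbours to test
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)


def clustering_spectrum(adj):
    """Average local clustering over all nodes with the same degree k."""
    by_degree = defaultdict(list)
    for i in adj:
        by_degree[len(adj[i])].append(local_clustering(adj, i))
    return {k: sum(cs) / len(cs) for k, cs in by_degree.items()}


if __name__ == "__main__":
    # Toy graph: triangle 0-1-2 plus a pendant node 3 attached to 0.
    adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    print(clustering_spectrum(adj))
```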
Clustering spectrum
- $\delta$ increases => clustering increases
- New pages: point to various well-known pages, often connected together => large clustering for small nodes
- Old, popular pages with large k: many in-links from many less popular pages which are not connected together => smaller clustering for large nodes
Clustering and weighted clustering
The weighted clustering takes into account the relevance of triangles in the global traffic
Clustering and weighted clustering
Weighted clustering larger than topological clustering: triangles carry a large part of the traffic
Assortativity
$k_{nn}(i)$: average connectivity of the nearest neighbours of i
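As an illustration of this quantity, here is a minimal Python sketch that computes the average connectivity of the nearest neighbours of a node. It assumes an undirected view of the graph stored as a dict of neighbour sets; the function name `average_neighbour_degree` and the star-graph example are illustrative assumptions.

```python
def average_neighbour_degree(adj, i):
    """Average degree of the nearest neighbours of node i."""
    nbrs = adj[i]
    if not nbrs:
        return 0.0  # isolated node: no neighbours to average over
    return sum(len(adj[j]) for j in nbrs) / len(nbrs)


if __name__ == "__main__":
    # Star graph: hub 0 connected to four leaves.
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    print(average_neighbour_degree(star, 0), average_neighbour_degree(star, 1))
```

In the star example the hub has low-degree neighbours while each leaf has a high-degree neighbour, which is the disassortative pattern: the average neighbour degree decreases as the node's own degree grows.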
Assortativity
- $k_{nn}$: disassortative behaviour, as usual in growing network models, and typical of technological networks
- Lack of correlations in popularity as measured by the in-degree
Summary
- Web: heterogeneous topology and traffic
- Mechanism taking into account the interplay between topology and traffic
- Simple mechanism => complex behaviour, scale-free distributions for connectivity and traffic
- Analytical study possible
- Study of correlations: non-trivial hierarchical behaviour
- Possibility to add features (fitnesses, rewiring, addition of edges, etc.), to modify the redistribution rule...
- Empirical studies of traffic and correlations?