1 A Random-Surfer Web-Graph Model Avrim Blum, Hubert Chan, Mugizi Rwebangira Carnegie Mellon University
2 The Web as a Graph Consider the World Wide Web as a graph, with web pages as nodes and links between pages as edges. Experiments suggest that the degree distribution of the Web-Graph follows a power law [FFF99]. [Figure: example pages links.html, resume.html, and index.html drawn as nodes]
3 Power Law The distribution of a quantity X follows a power law if $\Pr(X = k) = C k^{-\alpha}$. Taking the logarithm of both sides: $\log \Pr(X = k) = \log C - \alpha \log k$. Thus a log-log plot of a power-law distribution is a straight line with slope $-\alpha$.
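The straight-line claim can be checked numerically. The sketch below fits a least-squares line to log Pr(X=k) against log k; the values of `alpha` and `C` are illustrative assumptions (not taken from the slides, and `C` is not normalized), and `loglog_slope` is a hypothetical helper name.

```python
import math

def loglog_slope(ks, probs):
    """Least-squares slope of log(probs) plotted against log(ks)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(q) for q in probs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Illustrative values only (assumptions, not from the slides).
alpha, C = 2.5, 0.7
ks = list(range(1, 101))
probs = [C * k ** (-alpha) for k in ks]

slope = loglog_slope(ks, probs)
print(slope)  # should be very close to -alpha
```

On empirical degree counts the points only approximately follow a line, so the fitted slope is an estimate of the exponent rather than an exact value.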
4 Previous Work Barabási and Albert proposed the Preferential Attachment model [BA99]: each new node connects to the existing nodes with probability proportional to their degree. It is known that Preferential Attachment gives a power-law distribution [Mitzenmacher, Cooper & Frieze 03, KRRSTU00]. Other models proposed include the “copying model” [KRRSTU00].
5 Motivating Questions Why would a new node connect to nodes of high degree? - Are high-degree nodes more attractive? - Or are there other explanations? How does a new node find out which nodes have high degree? Motivating Observation: Suppose each page has a small probability p of being interesting, and a user does an (undirected) random walk until they find an interesting page. If p is small, the walk is long, so it stops at a node with probability roughly proportional to that node's degree (the stationary distribution of an undirected random walk). This is essentially the same as preferential attachment. What about other processes and directed graphs?
6 Directed 1-Step Random Surfer At time 1, we start with a single node with a self-loop. At each time t > 1, a node is chosen uniformly at random; with probability p the new node connects to this node, and with probability 1-p it connects to a random out-neighbor of that node. (Extension: repeat the process k times for each new node to get out-degree k.) Note: This model is just another way of stating the directed preferential attachment model.
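A minimal sketch of this process for out-degree 1, assuming the graph is stored as a list `out_nbr` where `out_nbr[v]` is the single out-neighbor of node `v`. The function names and the list representation are illustrative choices, not something the slides specify.

```python
import random

def attach_probs(out_nbr, p):
    """Exact probability that the next node links to each existing node.

    out_nbr[v] is the single out-neighbor of node v (out-degree 1).
    """
    n = len(out_nbr)
    probs = [0.0] * n
    for v in range(n):
        probs[v] += p / n                   # picked v, linked to v itself
        probs[out_nbr[v]] += (1 - p) / n    # picked v, followed its out-link
    return probs

def one_step_surfer(n, p, rng):
    """Grow an n-node graph; node 0 starts with a self-loop."""
    out_nbr = [0]
    for _ in range(1, n):
        v = rng.randrange(len(out_nbr))     # uniform random existing node
        target = v if rng.random() < p else out_nbr[v]
        out_nbr.append(target)
    return out_nbr

# Graph after time 2: node 0 has a self-loop, node 1 points to node 0.
print(attach_probs([0, 0], 0.5))  # [0.75, 0.25]
graph = one_step_surfer(1000, 0.5, random.Random(0))
```

In `attach_probs`, node u receives the new link with probability (p + (1-p)·indeg(u))/n, where indeg(u) here counts the root's self-loop. That is linear in the in-degree, which is one way to see the connection to directed preferential attachment noted on the slide.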
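7 Directed 1-step Random Surfer, p=.5 [Figure: example growth of the graph over T=1, T=2, T=3, T=4, with candidate edges labeled by attachment probabilities such as ¾ and ¼, written as sums of (½)(½) and (½)(⅓) terms]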
8 Directed Coin Flipping model
1. At time 1, we start with a single node with a self-loop.
2. At time t, we choose a node uniformly at random.
3. We then flip a coin of bias p.
4. If the coin comes up heads, we connect to the current node.
5. Else we walk to a random neighbor and go to step 3.
“each page has equal probability p of being interesting to us”
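The five steps above can be sketched as follows. This assumes out-degree 1, so "a random neighbor" is simply the current node's single out-neighbor, and it requires p > 0 so that the walk terminates. The names `coin_flip_target` and `coin_flipping_graph` are hypothetical.

```python
import random

def coin_flip_target(out_nbr, p, rng):
    """Start at a uniform random node, then walk until the coin shows heads."""
    v = rng.randrange(len(out_nbr))     # step 2: uniform random start
    while rng.random() >= p:            # steps 3 and 5: tails, keep walking
        v = out_nbr[v]
    return v                            # step 4: heads, connect here

def coin_flipping_graph(n, p, rng):
    """Grow an n-node graph; out_nbr[v] is the out-neighbor of node v."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1] for the walk to terminate")
    out_nbr = [0]                       # step 1: single node with a self-loop
    for _ in range(1, n):
        out_nbr.append(coin_flip_target(out_nbr, p, rng))
    return out_nbr

g = coin_flipping_graph(2000, 0.5, random.Random(1))

# In-degrees, ignoring the root's self-loop (an assumption of this sketch).
in_deg = [0] * len(g)
for src, dst in enumerate(g):
    if src != dst:
        in_deg[dst] += 1
```

With p = 1 the coin always shows heads, so the new node links to the uniformly chosen start node and the model reduces to uniform random attachment.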
9 [Figure: a new node picks a random starting node and walks from it: 1. coin toss: tail, 2. coin toss: tail, 3. coin toss: head]
10 Is Directed Coin-Flipping Power-lawed? We don’t know … but we do have some partial results... Note: unlike for undirected graphs, the case p → 0 is not so interesting since then you just get a star.
11 Virtual Degree Definition: Let $l_i(u)$ be the number of level-$i$ descendants of $u$. Let $\beta_i$ ($i \ge 1$) be a sequence of real numbers with $\beta_1 = 1$. Then $v(u) = 1 + \sum_{i \ge 1} \beta_i \, l_i(u)$.
12 Virtual Degree $v(u) = 1 + \beta_1 l_1(u) + \beta_2 l_2(u) + \beta_3 l_3(u) + \beta_4 l_4(u) + \dots$ Example: [Figure: node u with 2 level-1 descendants and 4 level-2 descendants] $v(u) = 1 + \beta_1(2) + \beta_2(4) + \beta_3(0) + \beta_4(0) + \dots$ Easy observation: If we set $\beta_i = (1-p)^i$ then the expected increase in degree(u) is proportional to v(u).
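As a rough illustration, the virtual degree can be computed from an out-degree-1 graph by counting descendants level by level. The sketch assumes the graph is a list `out_nbr` of out-neighbors, treats a level-i descendant of u as a node whose i-step walk along out-links ends at u, and ignores the root's self-loop. The helper names and the choice to pass β as a function are assumptions of this example.

```python
def level_counts(out_nbr, u):
    """Return [l_1(u), l_2(u), ...] for the nonempty levels below u."""
    children = [[] for _ in out_nbr]
    for v, parent in enumerate(out_nbr):
        if v != parent:                  # ignore the root's self-loop
            children[parent].append(v)
    counts, frontier = [], children[u]
    while frontier:
        counts.append(len(frontier))
        frontier = [c for v in frontier for c in children[v]]
    return counts

def virtual_degree(out_nbr, u, beta):
    """v(u) = 1 + sum over i >= 1 of beta(i) * l_i(u)."""
    levels = level_counts(out_nbr, u)
    return 1 + sum(beta(i) * l for i, l in enumerate(levels, start=1))

# Node 0 has 2 level-1 descendants (1, 2) and 4 level-2 descendants (3..6),
# matching the counts in the example formula on the slide.
tree = [0, 0, 0, 1, 1, 2, 2]
p = 0.5
print(level_counts(tree, 0))                             # [2, 4]
print(virtual_degree(tree, 0, lambda i: (1 - p) ** i))   # 3.0
```

The weights $(1-p)^i$ have a natural reading in the coin-flipping model with out-degree 1: a walk that starts at a level-i descendant of u reaches u only after i consecutive tails.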
13 Virtual Degree Let $v_t(u)$ be the virtual degree of node $u$ at time $t$ and $t_u$ be the time when node $u$ first appears.
Theorem: There always exist $\beta_i$ such that
1. For $i \ge 1$, $|\beta_i| \le 1$.
2. As $i \to \infty$, $\beta_i \to 0$ exponentially.
3. The expected increase in $v(u)$ is proportional to $v(u)$.
Recurrence: $\beta_1 = 1$, $\beta_2 = p$, $\beta_{i+1} = \beta_i - (1-p)\beta_{i-1}$.
E.g., for $p = 3/4$, $\beta_i = 1, 3/4, 1/2, 5/16, 3/16, 7/64, \dots$; for $p = 1/2$, $\beta_i = 1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16, \dots$
Theorem: For any node $u$ and time $t \ge t_u$, $E[v_t(u)] = \Theta((t/t_u)^p)$.
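The recurrence is easy to evaluate exactly. The sketch below uses Python's `fractions.Fraction` to reproduce the two example sequences; `beta_sequence` is a hypothetical helper name.

```python
from fractions import Fraction

def beta_sequence(p, count):
    """First `count` terms of beta_1 = 1, beta_2 = p,
    beta_{i+1} = beta_i - (1 - p) * beta_{i-1}."""
    p = Fraction(p)
    betas = [Fraction(1), p]
    while len(betas) < count:
        betas.append(betas[-1] - (1 - p) * betas[-2])
    return betas[:count]

print([str(b) for b in beta_sequence(Fraction(3, 4), 6)])
# ['1', '3/4', '1/2', '5/16', '3/16', '7/64']
print([str(b) for b in beta_sequence(Fraction(1, 2), 8)])
# ['1', '1/2', '0', '-1/4', '-1/4', '-1/8', '0', '1/16']
```

Exact rationals avoid floating-point noise around the zero terms that appear for p = 1/2.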
14 Virtual Degree, contd Let $v_t(u)$ be the virtual degree of node $u$ at time $t$ and $t_u$ be the time when node $u$ first appears. Theorem: For any node $u$ and time $t \ge t_u$, $E[v_t(u)] = \Theta((t/t_u)^p)$. We also have some weak concentration bounds. Unfortunately they are not strong enough: if they could be strengthened, we would have a proof that virtual degrees (not just their expectations) follow a power law.
15 Actual Degree We can also obtain lower bounds on the actual degrees. Theorem: For any node $u$ and time $t \ge t_u$, $E[l_1(u)] \ge \Omega((t/t_u)^{p(1-p)})$.
16 Experiments Random graphs of n = 100,000 nodes. Statistics are averaged over 100 runs. k = 1 (every node has out-degree 1).
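A small harness in the spirit of this setup, shown with a uniform-attachment baseline as the graph generator. The slides do not say how the statistics were computed, so everything here is an assumption: the helper names, the choice of in-degree histogram as the statistic, the exclusion of the root's self-loop, and the much smaller n and run count used to keep the example fast.

```python
import random
from collections import Counter

def uniform_graph(n, rng):
    """Baseline: each new node links to a uniformly random existing node."""
    out_nbr = [0]                         # node 0 with a self-loop
    for t in range(1, n):
        out_nbr.append(rng.randrange(t))
    return out_nbr

def mean_in_degree_histogram(make_graph, n, runs, seed=0):
    """Average over `runs` graphs of the number of nodes with each in-degree."""
    totals = Counter()
    for r in range(runs):
        out_nbr = make_graph(n, random.Random(seed + r))
        # Count incoming links, skipping the root's self-loop.
        in_deg = Counter(dst for src, dst in enumerate(out_nbr) if src != dst)
        totals.update(Counter(in_deg[v] for v in range(n)))
    return {d: c / runs for d, c in sorted(totals.items())}

hist = mean_in_degree_histogram(uniform_graph, n=2000, runs=10)
for degree, count in list(hist.items())[:5]:
    print(degree, count)
```

Any generator with the same `(n, rng)` signature can be passed as `make_graph`, so the same harness could be reused for the random-surfer and coin-flipping models.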
17 Uniform random connections
18 Directed 1-Step Random Surfer, p=3/4
19 Directed 1-Step Random Surfer, p=1/2
20 Directed 1-Step Random Surfer, p=1/4
21 Directed Coin Flipping, p=1/2
22 Directed Coin Flipping, p=1/4
23 Undirected Coin Flipping, p=1/2
24 Undirected Coin Flipping, p=0.05
25 Conclusions Directed random walk models appear to generate power laws, and we have partial theoretical results supporting this. Power laws can naturally emerge even if all nodes have the same intrinsic “attractiveness” (and even in the absence of a “role model” as in the copying model).
26 Open questions Can we prove that the degrees in the directed coin-flipping model indeed follow a power law? Can we analyze the degree distribution for the undirected coin-flipping model with p=1/2? Suppose page i has “interestingness” $p_i$. Can we analyze the degree as a function of t, i and $p_i$?
27 Questions?