
1 Generative Models for the Web Graph José Rolim

2 Aim
Reproduce emergent properties:
– Distribution of site sizes
– Connectivity of the Web
– Power-law distributions
– Small-world properties

3 Classical Model: Random Graphs
Erdős–Rényi graph G(n,p):
– n: number of nodes
– p: probability of connection between any two nodes
p_c: threshold probability
– p < p_c: many disconnected components
– p = p_c: a large connected component
– p = 1: a complete graph
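The G(n,p) construction above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `gnp_graph` and `component_sizes` are hypothetical and do not come from the slides. It lets one observe the regimes listed above by varying p and looking at the component sizes.

```python
import random
from itertools import combinations


def gnp_graph(n, p, seed=None):
    """Return adjacency sets of an Erdős–Rényi graph G(n, p)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        # Every possible edge is included independently with probability p.
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def component_sizes(adj):
    """Sizes of the connected components, largest first (iterative DFS)."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```

Calling `component_sizes(gnp_graph(n, p))` for increasing p shows the number of components shrinking until a single complete graph remains at p = 1.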

4 Limitations
To model the web graph:
– Constant number of nodes
– Same connection probability for all vertices
– etc.

5 Web Page Growth Model
Sites have short-term (daily) size fluctuations proportional to their size. Assume an overall growth rate a such that:
– $S(t+1) = a(1 + vb)\,S(t)$
– S(t) = number of pages of site s at time t
– v = ±1, a Bernoulli variable with probability 0.5
– b = absolute rate of daily fluctuations

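6 Web Page Growth
Hence:
$S(T) = a^{T} S(0) \prod_{i=0}^{T-1} (1 + v_i b)$
or:
$\log S(T) = T \log a + \log S(0) + \sum_{i=0}^{T-1} \log(1 + v_i b)$, where $\sum_{i=0}^{T-1} \log(1 + v_i b) = l \log(1+b) + (T-l) \log(1-b)$
– l = number of positive fluctuations
Since l is binomially distributed, log S(T) is approximately normal for large T. Therefore S(T) has a lognormal distribution or follows a power law.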
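As a quick check of the derivation, the following Python sketch simulates the daily growth rule and compares the result with the closed form for log S(T). The function names and the parameter values in the usage note are illustrative assumptions, not taken from the slides.

```python
import math
import random


def simulate_site(T, a, b, s0=1.0, rng=None):
    """Apply S(t+1) = a (1 + v b) S(t) for T days; return (S(T), l)."""
    rng = rng or random.Random()
    size, ups = s0, 0
    for _ in range(T):
        v = 1 if rng.random() < 0.5 else -1  # v = +1 or -1 with probability 0.5
        if v == 1:
            ups += 1                         # l counts the positive fluctuations
        size *= a * (1 + v * b)
    return size, ups


def log_size_closed_form(T, a, b, s0, ups):
    """T log a + log S(0) + l log(1+b) + (T-l) log(1-b)."""
    return (T * math.log(a) + math.log(s0)
            + ups * math.log(1 + b) + (T - ups) * math.log(1 - b))
```

For example, `simulate_site(200, 1.01, 0.1, 2.0, random.Random(0))` returns a size whose logarithm should agree, up to rounding, with `log_size_closed_form` evaluated at the returned count l.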

7 Web page growth
Probability P(s) of a site having s pages:
– $P(s) = \sum_i P(s \mid b_i)\,P(b_i) = \sum_i c_i / s^{g_i} = c / s^{g}$
– Power law: g has been experimentally evaluated for the web as between 1.6 and 2.0

8 Small world models
Properties:
– Sparse
– Cliquishness
– Small diameter
Two models:
– Edge-reassigning small world network
– Edge-addition small world network

9 Edge reassigning model
Evolution starts with a ring of n nodes, each connected to its d nearest neighbors. Then each edge is randomly reassigned to a distant node with probability p, in round-robin fashion.
See example page 10 with n=10 and d=4
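A minimal Python sketch of this edge-reassigning procedure follows. It assumes d is even (d/2 neighbors on each side of the ring) and that a reassigned edge keeps one endpoint and picks a new, currently non-adjacent endpoint uniformly at random; the slides do not specify these details, and the function names are hypothetical.

```python
import random


def ring_lattice(n, d):
    """Ring of n nodes, each joined to its d nearest neighbours (d even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for k in range(1, d // 2 + 1):
            w = (v + k) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def rewire(adj, d, p, seed=None):
    """Reassign each original ring edge with probability p (modifies adj)."""
    rng = random.Random(seed)
    n = len(adj)
    for k in range(1, d // 2 + 1):      # round robin: nearest neighbours first
        for v in range(n):
            w = (v + k) % n
            if w in adj[v] and rng.random() < p:
                # Assumption: the new endpoint is any node not yet adjacent to v.
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj
```

With the example values n=10 and d=4, `rewire(ring_lattice(10, 4), 4, p)` keeps the number of edges fixed at 20 while moving a fraction of roughly p of them.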

10 Edge addition model
Additional edges are added at random to the original ring, giving an expected number of
– p·d·n/2 new edges
– p: probability of adding an edge
– See example page 13
Criticism of small-world models:
– No new pages and no deletion of pages
– No deletion of links
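The edge addition variant can be sketched as follows in Python. The sketch assumes one Bernoulli(p) trial per original ring edge, which gives p·d·n/2 attempted additions in expectation; attempts that would duplicate an existing edge are skipped, so the realized count can be slightly lower. The function name is hypothetical.

```python
import random


def edge_addition_small_world(n, d, p, seed=None):
    """Ring lattice (d even) plus random shortcut edges; returns (adj, added)."""
    rng = random.Random(seed)
    half = d // 2
    # Original ring: each node is linked to d/2 neighbours on each side.
    adj = {v: {(v + k) % n for k in range(-half, half + 1) if k != 0}
           for v in range(n)}
    added = 0
    for _ in range(n * d // 2):          # one trial per original ring edge
        if rng.random() < p:
            u, w = rng.sample(range(n), 2)
            if w not in adj[u]:          # skip pairs that are already linked
                adj[u].add(w)
                adj[w].add(u)
                added += 1
    return adj, added
```

Unlike the edge-reassigning model, no ring edge is ever removed here, so the original lattice stays intact.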

11 Rich get richer
Preferential attachment model:
Start with a null graph with no nodes. At each time step, add a new node and connect it to m nodes selected randomly with probability proportional to their degree.
See ex. page 16
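A minimal Python sketch of preferential attachment is given below. Because all degrees are zero in an empty graph, it assumes a bootstrap phase in which each of the first m+1 nodes links to every earlier node; that choice, the rejection of repeated targets, and the function name are illustrative assumptions rather than part of the slides.

```python
import random


def preferential_attachment(steps, m, seed=None):
    """Grow a graph one node per step; return the list of (new, target) edges."""
    rng = random.Random(seed)
    edges = []
    endpoints = []  # every node appears here once per incident edge
    for new in range(steps):
        if new <= m:
            # Bootstrap assumption: link to all earlier nodes (none for node 0).
            targets = set(range(new))
        else:
            targets = set()
            while len(targets) < m:
                # Drawing from `endpoints` picks a node with probability
                # proportional to its current degree.
                targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges
```

Counting how often each node appears in the returned edges gives its degree, which can then be tabulated to inspect the degree distribution.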

12 Important measures
Average diameter
Cliquishness (measures the average density of local connections):
– Take a node v with degree d
– Its d neighbors have at most $d(d-1)/2$ links among them
– Let $c_v$ = actual number of links / maximum
– $C = \sum_v c_v / V$, where V is the number of nodes
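The cliquishness measure can be computed directly from adjacency sets, as in the Python sketch below. It assumes that nodes with fewer than two neighbors contribute c_v = 0, a convention the slides do not state; the function names are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, v):
    """c_v: fraction of the d(d-1)/2 possible links among v's neighbours."""
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0  # assumption: undefined cases are counted as 0
    links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    return links / (d * (d - 1) / 2)


def cliquishness(adj):
    """C: the average of c_v over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)
```

For a triangle every c_v equals 1, so C = 1; for a path of three nodes no neighbors are linked to each other, so C = 0.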

13 Remarks on rich get richer
Reproduces the power law of the number of links. E.g., the probability that a page i has degree $d_i$ is $A / d_i^{c}$
– A is proportional to the square of the network's average degree
– c is a constant
– c was found empirically to be 2.9 and theoretically 3

14 Criticism of Rich Get Richer
– Does not allow reconnection of existing edges
– Addition of new edges takes place only when new nodes are added

15 Copy models
At each time step a node is added:
– With prob. p, a new edge is created between this node and a randomly chosen node
– With prob. 1-p, we choose a node at random and, uniformly, one of its out-edges, and we link the new node to the node that the chosen edge enters.
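The two cases above can be sketched in Python as follows. The sketch assumes the graph starts from a single node without edges, and that when the randomly chosen node has no out-edge to copy, the new node links to that node directly; neither detail is given in the slides, and the function name is hypothetical.

```python
import random


def copy_model(steps, p, seed=None):
    """Directed copy model; returns {node: list of out-neighbours}."""
    rng = random.Random(seed)
    out = {0: []}  # assumption: start from one node with no edges
    for new in range(1, steps):
        if rng.random() < p:
            # With probability p: link to a uniformly chosen existing node.
            target = rng.randrange(new)
        else:
            # With probability 1-p: copy the head of a random out-edge
            # of a uniformly chosen existing node.
            proto = rng.randrange(new)
            # Fallback assumption: nothing to copy, so link to proto itself.
            target = rng.choice(out[proto]) if out[proto] else proto
        out[new] = [target]
    return out
```

Nodes that already receive many links are more likely to be the head of a copied edge, which is how this process favors well-linked pages.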

16 Remarks
Why is it called "copy"?
There are more elaborate models which allow the addition of more than one edge at each step.
It is also a sort of "rich get richer" model.

17 Applications
– Distributed search algorithms
– Subgraph patterns and communities
– Robustness and vulnerability
– PageRank algorithms
