Generative Models for the Web Graph José Rolim
Aim
Reproduce emergent properties of the Web:
– Distribution of site sizes
– Connectivity of the Web
– Power-law distributions
– Small-world properties
Classical Model: Random Graphs
Erdős-Rényi graph G(n,p)
– n: number of nodes
– p: probability of a connection between any two nodes
– pc: threshold probability
– p < pc: many disconnected components
– p = pc: a large connected component emerges
– p = 1: a complete graph
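The behaviour around the threshold can be explored with a minimal sketch of G(n,p). The function names `gnp` and `largest_component` and the sample values of p are illustrative assumptions, not part of the slides; the values are picked around 1/n, the classical threshold for the emergence of a giant component (the slides do not say which threshold pc denotes).

```python
import random
from collections import deque

def gnp(n, p, rng):
    """Adjacency lists of a random graph G(n, p): each pair is linked with probability p."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def largest_component(adj):
    """Size of the largest connected component, found by breadth-first search."""
    seen = [False] * len(adj)
    best = 0
    for start in range(len(adj)):
        if seen[start]:
            continue
        seen[start] = True
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        best = max(best, size)
    return best

rng = random.Random(0)
n = 500
# Illustrative values below, near and above 1/n, plus the complete graph.
for p in (0.0005, 0.002, 0.01, 1.0):
    print(p, largest_component(gnp(n, p, rng)))
```

With p = 1 every pair is linked, so the largest component contains all n nodes; for very small p most nodes remain isolated or in tiny components.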
Limitations
As a model of the web graph:
– Constant number of nodes
– Same connection probability for all vertices
– etc.
Web Page Growth Model
Sites have short-term (daily) size fluctuations proportional to their size.
Assume an overall growth rate a such that:
– S(t+1) = a(1 + vb)S(t)
– S(t) = number of pages of site s at time t
– v = ±1, a Bernoulli variable taking each value with probability 0.5
– b = absolute rate of daily fluctuations
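Web Page Growth
Hence:
$S(T) = a^{T} S(0) \prod_{i=0}^{T-1} (1 + v_i b)$
or:
$\log S(T) = T \log a + \log S(0) + \sum_{i=0}^{T-1} \log(1 + v_i b)$, where $\sum_{i=0}^{T-1} \log(1 + v_i b) = l \log(1+b) + (T-l) \log(1-b)$
– l = number of positive fluctuations (v_i = +1)
Therefore: log S(T) is a sum of independent random terms, so S(T) has a lognormal distribution or, once averaged over sites with different fluctuation rates b, follows a power law: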
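The closed form for log S(T) can be checked numerically with a minimal sketch. The function names and the parameter values (a = 1.01, b = 0.1, T = 200, S(0) = 10) are illustrative assumptions, not values from the slides.

```python
import math
import random

def simulate_site(s0, a, b, T, rng):
    """Iterate S(t+1) = a * (1 + v*b) * S(t), with v = +1 or -1, each with probability 0.5.

    Returns the final size S(T) and the number l of positive fluctuations.
    """
    s, ups = s0, 0
    for _ in range(T):
        v = rng.choice((1, -1))
        if v == 1:
            ups += 1
        s *= a * (1 + v * b)
    return s, ups

def log_size_closed_form(s0, a, b, T, ups):
    """log S(T) = T log a + log S(0) + l log(1+b) + (T-l) log(1-b)."""
    return (T * math.log(a) + math.log(s0)
            + ups * math.log(1 + b) + (T - ups) * math.log(1 - b))

# Illustrative parameters (assumptions, not taken from the slides).
size, ups = simulate_site(10.0, 1.01, 0.1, 200, random.Random(0))
print(math.log(size), log_size_closed_form(10.0, 1.01, 0.1, 200, ups))
```

The two printed values agree up to floating-point rounding, because the closed form only regroups the T factors by the sign of each fluctuation.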
Web Page Growth
Probability P(s) of a site having s pages:
– $P(s) = \sum_i P(s \mid b_i) P(b_i) = \sum_i c_i / s^{g_i} = c / s^{g}$
– Power law: the exponent g has been experimentally evaluated for the Web as between 1.6 and 2.0
Small World Models
Properties:
– Sparse
– Cliquishness
– Small diameter
Two models:
– Edge-reassigning small-world network
– Edge-addition small-world network
Edge Reassigning Model
Evolution starts with a ring of n nodes, each node connected to its d nearest neighbors.
Then each edge is randomly reassigned to a distant node with probability p, in round-robin fashion.
See the example on page 10 with n=10 and d=4.
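The reassignment procedure can be sketched as follows. This is one possible reading, close to the Watts-Strogatz construction. It assumes that d is even and smaller than n, that the new endpoint is drawn uniformly among nodes not already linked to the source node, and that "round robin" means one lap around the ring per neighbour distance. The names `ring_lattice` and `reassign_edges` are hypothetical.

```python
import random

def ring_lattice(n, d):
    """Ring of n nodes, each linked to its d nearest neighbours (d even, d < n)."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for j in range(1, d // 2 + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj

def reassign_edges(n, d, p, rng):
    """Reassign each ring edge (u, u+j) with probability p to a new endpoint."""
    adj = ring_lattice(n, d)
    for j in range(1, d // 2 + 1):        # one lap around the ring per distance j
        for u in range(n):
            v = (u + j) % n
            if v not in adj[u] or rng.random() >= p:
                continue                   # edge kept as it is
            # Assumption: new endpoint chosen uniformly among non-neighbours of u.
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if not candidates:
                continue
            w = rng.choice(candidates)
            adj[u].discard(v)
            adj[v].discard(u)
            adj[u].add(w)
            adj[w].add(u)
    return adj

# Same sizes as the slide example (n=10, d=4); p=0.2 is an illustrative value.
graph = reassign_edges(10, 4, 0.2, random.Random(0))
print(sorted((u, v) for u in range(10) for v in graph[u] if u < v))
```

Each reassignment removes one edge and adds one, so the number of edges stays at n·d/2 and the graph remains sparse.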
Edge Addition Model
Additional edges are added at random to the original ring, giving an expected number of:
– p·d·n/2 new edges
– p: probability of adding an edge
– See the example on page 13
Criticism of small-world models:
– No addition or deletion of pages
– No deletion of links
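The edge-addition construction above can be sketched as follows. It assumes one random trial per original ring edge, with the endpoints of each new edge drawn uniformly at random; the slides do not specify these details. The function name `edge_addition` and the parameter values are hypothetical.

```python
import random

def edge_addition(n, d, p, rng):
    """Ring of n nodes with d nearest-neighbour links each (d even, d < n), plus random extra edges.

    Returns the adjacency sets and the number of edges actually added.
    """
    adj = [set() for _ in range(n)]
    for u in range(n):
        for j in range(1, d // 2 + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    added = 0
    for _ in range(n * d // 2):               # one trial per original ring edge
        if rng.random() < p:
            u, v = rng.sample(range(n), 2)    # two distinct endpoints
            if v not in adj[u]:               # assumption: duplicate edges are skipped
                adj[u].add(v)
                adj[v].add(u)
                added += 1
    return adj, added

# Illustrative parameters (assumptions, not taken from the slides).
n, d, p = 1000, 4, 0.1
graph, added = edge_addition(n, d, p, random.Random(0))
print(added, "edges added; expected about", p * d * n / 2)
```

Because duplicates are skipped, the average number of added edges is slightly below p·d·n/2; for a sparse graph the difference is small.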
Rich Get Richer
Preferential attachment model
Start with a null graph (no nodes).
At each time step, add a new node and connect it to m existing nodes selected at random with probability proportional to their degree.
See the example on page 16.
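A minimal sketch of preferential attachment follows. Since degree-proportional selection is undefined on an empty graph, the sketch assumes a seed: a small complete graph on m + 1 nodes. This seed is an assumption, not something the slides specify. Distinct targets are obtained by redrawing, which only approximates exact degree-proportional sampling without replacement. The function name is hypothetical.

```python
import random
from collections import Counter

def preferential_attachment(n, m, rng):
    """Grow a graph to n nodes; each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    # Assumption: seed with a complete graph on m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Each node appears in this list once per unit of degree, so a uniform
    # draw from it is a degree-proportional draw over nodes.
    endpoints = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints += (new, t)
    return edges

# Illustrative sizes (assumptions, not taken from the slides).
edges = preferential_attachment(2000, 3, random.Random(0))
degree = Counter(x for e in edges for x in e)
print("max degree:", max(degree.values()), "min degree:", min(degree.values()))
```

Early nodes tend to accumulate far more links than late ones, which is the "rich get richer" effect.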
Important Measures
Average diameter
Cliquishness (measures the average density of local connections):
– Take a node v with degree d
– Its d neighbors can have at most max = d(d-1)/2 links among themselves
– Let c_v = actual number of links among the neighbors / max
– C = (Σ_v c_v) / |V|, where |V| is the number of nodes
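The cliquishness measure above can be computed directly from adjacency sets. The sketch assumes an undirected graph stored as a dict from node to neighbour set, with comparable node labels, and it sets c_v = 0 for nodes with fewer than two neighbours; that convention is an assumption, since the slides do not cover the case.

```python
def cliquishness(adj):
    """Average over all nodes v of c_v = (links among v's neighbours) / (d(d-1)/2)."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue  # assumption: c_v = 0 when v has fewer than two neighbours
        # Count each linked pair of neighbours once (a < b).
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)

# A triangle 0-1-2 with one extra node 3 attached to node 0.
example = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(cliquishness(example))
```

In this example c_0 = 1/3, c_1 = c_2 = 1 and c_3 = 0, so C = (1/3 + 1 + 1 + 0)/4 = 7/12.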
Remarks on Rich Get Richer
Reproduces the power law of the number of links. E.g., the probability that a page i has degree d_i is A/d_i^c
– A is proportional to the square of the average degree of the network
– c is a constant
– c was found empirically to be 2.9 and theoretically to be 3
Criticism of Rich Get Richer
Does not allow reconnection of existing edges.
Addition of new edges takes place only when new nodes are added.
Copy Models
At each time step a node is added:
– With probability p, a new edge is created between this node and a randomly chosen node
– With probability 1-p, we choose a node at random, then choose one of its out-edges uniformly, and link the new node to the node that this edge enters
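The two cases above can be sketched for a directed graph as follows. The seed, a single node with a self-loop so that every node has an out-edge to copy, is an assumption made to get the process started; the slides do not describe the initial graph. The function name `copy_model` and the parameter values are hypothetical.

```python
import random
from collections import Counter

def copy_model(n, p, rng):
    """Grow a directed graph to n nodes; returns a dict node -> list of out-neighbours."""
    out = {0: [0]}  # assumption: seed node with a self-loop
    for new in range(1, n):
        u = rng.randrange(new)            # uniformly chosen existing node
        if rng.random() < p:
            target = u                    # link directly to the chosen node
        else:
            target = rng.choice(out[u])   # copy the head of one of u's out-edges
        out[new] = [target]
    return out

# Illustrative parameters (assumptions, not taken from the slides).
graph = copy_model(5000, 0.3, random.Random(0))
indegree = Counter(t for targets in graph.values() for t in targets)
print(indegree.most_common(5))
```

A node that already has many incoming links is the head of many out-edges, so it is more likely to be copied again.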
Remarks
Why is it called "copy"?
There are more elaborate models which allow the addition of more than one edge at each step.
It is also a sort of "rich get richer".
Applications
Distributed search algorithms
Subgraph patterns and communities
Robustness and vulnerability
PageRank algorithms