Lecture #11 PageRank (II) CS492 Special Topics in Computer Science: Distributed Algorithms and Systems
Reminder: PageRank Algorithm
PR(A) = (1-d) + d( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) ) = $(1-d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$
PR(A): PageRank of page A
PR(Ti): PageRank of each page Ti that links to page A
C(Ti): number of outbound links on page Ti
d: damping factor (between 0 and 1)
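As a quick illustration of the formula, the sketch below evaluates PR(A) for a single page from the PageRank and outbound-link count of each page that links to it. The function name `pagerank_of` and the input numbers are made up for the example; they do not come from the lecture.

```python
def pagerank_of(in_links, d=0.85):
    """Evaluate PR(A) = (1 - d) + d * sum(PR(Ti) / C(Ti)).

    in_links: list of (PR(Ti), C(Ti)) pairs, one per page Ti linking to A.
    """
    return (1 - d) + d * sum(pr / c for pr, c in in_links)


# Hypothetical inputs: T1 has PageRank 1.0 and 2 outbound links,
# T2 has PageRank 0.5 and 1 outbound link.
pr_a = pagerank_of([(1.0, 2), (0.5, 1)])
print(pr_a)
```

Each linking page passes on its own PageRank split evenly across its outbound links, which is why T1 contributes only half of its value here.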
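Simple Example
$PR(A) = (1-d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$, with d = 0.85
[Figure: link graph over three pages A, B, C; the equations on the next slide imply the links A->B, A->C, B->C, C->A]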
How to calculate PageRank
PR(A) = 0.15 + 0.85 PR(C)
PR(B) = 0.15 + 0.85 (PR(A) / 2)
PR(C) = 0.15 + 0.85 (PR(A) / 2 + PR(B))
Method 1: Solve the equations directly
  Do the math
Method 2: Iterative computation of PageRank
  The Web is huge, so solving the equations directly is hard
  Compute the PageRank values iteratively instead
Solve the equations
PR(A) = 0.15 + 0.85 PR(C)
PR(B) = 0.15 + 0.85 (PR(A) / 2)
PR(C) = 0.15 + 0.85 (PR(A) / 2 + PR(B))
Answers
PR(A) = 1.16336913510458
PR(B) = 0.64443188241945
PR(C) = 1.19219898247598
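The answers above can be checked by moving every PR term to the left-hand side and solving the resulting 3x3 linear system. The sketch below does this with a small hand-written Gaussian elimination; the helper `solve_linear` is written for this example and is not part of the lecture.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix, so row operations update both sides together.
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Use the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution, from the last unknown to the first.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


d = 0.85
matrix = [
    [1.0, 0.0, -d],      # PR(A) - d*PR(C)             = 1 - d
    [-d / 2, 1.0, 0.0],  # PR(B) - d*PR(A)/2           = 1 - d
    [-d / 2, -d, 1.0],   # PR(C) - d*PR(A)/2 - d*PR(B) = 1 - d
]
rhs = [1 - d] * 3
pr_a, pr_b, pr_c = solve_linear(matrix, rhs)
print(pr_a, pr_b, pr_c)
```

The three values agree with the answers on the slide to the printed precision, and they sum to 3, the number of pages.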
Iterative Computation of PageRank
Assign an initial PageRank value to every page
Recalculate the PageRank of every page over several iterations
Stop iterating when the PageRank values converge
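The three steps can be sketched as a short loop. This is a minimal illustration under stated assumptions rather than a reference implementation. The initial value of 1.0, the tolerance `tol`, and the function name `iterate_pagerank` are choices made for the example. The graph is the three-page link structure inferred from the example equations earlier in the lecture.

```python
def iterate_pagerank(out_links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) until values settle."""
    pages = list(out_links)
    pr = {p: 1.0 for p in pages}  # assumed initial value for every page
    for iteration in range(1, max_iter + 1):
        new_pr = {}
        for p in pages:
            # Sum contributions from every page q that links to p.
            incoming = sum(
                pr[q] / len(out_links[q]) for q in pages if p in out_links[q]
            )
            new_pr[p] = (1 - d) + d * incoming
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:  # converged: no value moved by more than tol
            break
    return pr, iteration


# Links inferred from the example equations: A->B, A->C, B->C, C->A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks, iterations = iterate_pagerank(graph)
print(ranks, iterations)
```

Every value is computed from the previous iteration's values (`pr`), not from values already updated in the current pass, so the result does not depend on the order in which pages are visited.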
What does PageRank mean?
Random surfer
  is given a web page at random and keeps clicking on links (never hits the back button)
  eventually gets bored and starts on another random page
PageRank
  the probability that the random surfer visits a page
  the proportion of time that the random surfer spends on each page
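The random-surfer reading can be checked with a small simulation. The sketch below assumes d = 0.85 and the three-page graph inferred from the lecture's example equations; the step count, seed, and function name `simulate_surfer` are arbitrary choices for the example. Because the (1-d) form of the formula gives values that sum to the number of pages in this graph, the visit proportions are compared against PR divided by 3.

```python
import random


def simulate_surfer(out_links, d=0.85, steps=200_000, seed=0):
    """Return the fraction of steps a simulated random surfer spends on each page."""
    rng = random.Random(seed)
    pages = list(out_links)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)  # the surfer starts on a random page
    for _ in range(steps):
        visits[current] += 1
        if rng.random() < d and out_links[current]:
            # With probability d, follow one of the current page's links.
            current = rng.choice(out_links[current])
        else:
            # Otherwise the surfer gets bored and restarts on a random page.
            current = rng.choice(pages)
    return {p: visits[p] / steps for p in pages}


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
proportions = simulate_surfer(graph)
print(proportions)
```

The proportions should land near 1.163/3, 0.644/3, and 1.192/3 for A, B, and C. The match is approximate because the simulation is a finite random sample.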
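What is the damping factor?
$PR(A) = (1-d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$
d (the damping factor): the probability that, at each page, the random surfer keeps clicking links
(1-d): the probability that, at each page, the random surfer gets bored and requests another random page
The higher d is, the more likely the random surfer is to keep clicking links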
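To make the effect of d concrete, here is a short worked calculation. It assumes the surfer decides independently at every page and that every page has at least one link to follow. Under those assumptions the number of consecutive link clicks before a random jump is geometrically distributed:

```latex
P(\text{exactly } k \text{ clicks before a jump}) = d^{k}(1-d), \qquad
E[\text{clicks}] = \sum_{k=0}^{\infty} k\, d^{k}(1-d) = \frac{d}{1-d}, \qquad
\left.\frac{d}{1-d}\right|_{d=0.85} = \frac{0.85}{0.15} \approx 5.67
```

With d = 0.85 the surfer follows between five and six links on average before jumping to a random page. With d = 0.5 the average drops to one.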
Rank Sink Problem
What if we don’t have the damping factor?
There is no way to escape the loop (A-B-C), so the loop acts as a rank sink
[Figure: pages A, B, C linked in a loop]
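The effect can be reproduced on a small graph. The original figure did not survive, so the graph below is a hypothetical one: an extra page X links into the loop A->B->C->A, and nothing links back out. The sketch runs a fixed number of iterations with d = 1 (no damping) and with d = 0.85; the function name `run_pagerank` and the iteration count are choices made for the example.

```python
def run_pagerank(out_links, d, iterations=50):
    """Apply the PageRank update a fixed number of times, starting from 1.0."""
    pages = list(out_links)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # The comprehension reads the previous iteration's values in pr.
        pr = {
            p: (1 - d)
            + d * sum(pr[q] / len(out_links[q]) for q in pages if p in out_links[q])
            for p in pages
        }
    return pr


# Hypothetical graph: X feeds the loop A -> B -> C -> A; the loop has no exit.
graph = {"X": ["A"], "A": ["B"], "B": ["C"], "C": ["A"]}

no_damping = run_pagerank(graph, d=1.0)
with_damping = run_pagerank(graph, d=0.85)
loop_total = no_damping["A"] + no_damping["B"] + no_damping["C"]
print(no_damping, loop_total)
print(with_damping)
```

Without damping, X drops to zero after the first iteration and the loop holds all of the rank. The values inside the loop also keep rotating, so the iteration never settles. With d = 0.85 every page keeps at least 1-d = 0.15.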
Dangling Link (Dead End)
A dangling link points to a page with no outgoing links
C->A and B->A are dangling links
A cannot distribute its weight to the network
How to fix
Method 1: Remove dangling links until all the PageRanks are calculated, then add them back
Method 2: Make a random jump from the dead end to any other page
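Method 2 can be sketched by treating a dead end as if it linked to every other page. This is one possible reading of "random jump to any other page", so treat it as an assumption rather than the lecture's exact rule. The graph follows the slide (B->A and C->A, with A a dead end); the function name `pagerank_with_dead_ends` is made up for the example.

```python
def pagerank_with_dead_ends(out_links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterative PageRank where a dead end jumps uniformly to any other page."""
    pages = list(out_links)
    # Assumed fix (Method 2): a page with no outgoing links behaves as if
    # it linked to every page except itself.
    patched = {
        p: targets if targets else [q for q in pages if q != p]
        for p, targets in out_links.items()
    }
    pr = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        new_pr = {
            p: (1 - d)
            + d * sum(pr[q] / len(patched[q]) for q in pages if p in patched[q])
            for p in pages
        }
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:
            break
    return pr


# Graph from the slide: B -> A and C -> A are dangling links; A is a dead end.
graph = {"A": [], "B": ["A"], "C": ["A"]}
ranks = pagerank_with_dead_ends(graph)
print(ranks)
```

With the patch in place A passes half of its weight to each of B and C, so the values again sum to the number of pages instead of leaking out of the network at A.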