Methods of Computing the PageRank Vector Tom Mangan
Brief History of Web Search Boolean term matching
Brief History of Web Search Boolean term matching Sergey Brin and Larry Page Reputation based ranking PageRank
Reputation Count links to a page Weight links by how many come from a page Further weight links by the reputation of the linker
Link Matrix
Calculating Rank Where: = the set of all pages linking to P = # of links from page Q
Calculating Rank Where: = the set of all pages linking to P = # of links from page Q
The PageRank Vector Define:
where
From our earlier mini-web:
Taken one row at a time: where
Iterating this equation is called the Power Method where
Iterating this equation is called the Power Method and we define the PageRank vector: where
Convergence requires: Power Method irreducibility (Perron-Frobenius Thm)
Definitions Markov chain The conditional probability of each future state depends only on the present state Markov matrix Transition matrix of a Markov chain
Transition Matrix From our earlier mini-web:
Markov Matrix Properties Row-stochastic Stationary vector gives long-term probability of each state All eigenvalues λ ≤ 1
not row-stochastic
Define a vector a such that: Then we obtain a row-stochastic matrix:
or
S may or may not be reducible, so we make one more fix: The Google Matrix: Now G is a positive, irreducible, row-stochastic matrix, and the power method will converge, but we’ve lost sparsity.
Note that:
so now the power method looks like:
Power method converges at the same rate as thus
Link Matrix
A Linear System Formulation Amy Langville and Carl Meyer Exploit dangling nodes Solve a system instead of iterating
By Langville and Meyer, solving the system and letting produces the PageRank vector (proof omitted)
Exploiting Dangling Nodes: Re-order the rows and columns of H such that
Exploiting Dangling Nodes: Re-order the rows and columns of H such that then
has some nice properties that simplify solving the linear system. Non-singular
Source: L&M, A Reordering for the PageRank Problem
Langville and Meyer Algorithm 1 Re-order rows and columns so that dangling nodes are lumped at bottom Solve Compute Normalize
Improvement In testing, Algorithm 1 reduces the time necessary to find the PageRank vector by a factor of 1-6 This time is data-dependent
Further Improvement? First improvement came from finding zero rows in Now find zero rows in
Source: L&M, A Reordering for the PageRank Problem
Langville and Meyer Algorithm 2 Reorder rows and columns so that all submatrices have zero rows at bottom Solve For i = 2 to b, compute Normalize
Problem with Algorithm 2 Finding submatrices of zero rows takes longer than time saved in solve step L & M wait until all submatrices are reordered to solve primary
Proposal As each submatrix is isolated, send it out for parallel solving
Source: L&M, A Reordering for the PageRank Problem
Sources DeGroot, M. and Schervish, M., Probability and Statistics, 3rd Ed., Addison Wesley, 2002 Langville, A. and Meyer, C., A Reordering for the PageRank Problem, Journal of Scientific Computing, Vol. 27 No. 6, 2006 Langville, A. and Meyer, C., Deeper Inside PageRank, 2004 Lee, C., Golub, G. and Zenios, S., A Fast Two-Stage Algorithm for Computing PageRank, undated Rebaza, J., Lecture Notes