Methods of Computing the PageRank Vector
Tom Mangan
Brief History of Web Search
- Boolean term matching
- Sergey Brin and Larry Page: reputation-based ranking (PageRank)
Reputation
- Count the links to a page
- Weight each link by how many links come from the linking page
- Further weight each link by the reputation of the linker
[Figure: an example mini-web of six pages, numbered 1-6]
[Figure: the six-page mini-web and its link matrix H]
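The exact link structure of the slide's mini-web is not recoverable from this transcript, but a link matrix like the one pictured can be sketched in Python. The six-page link structure below is an invented stand-in, not the original figure:

```python
# Build the hyperlink matrix H for a small web.
# links[i] lists the pages that page i links to (0-indexed).
# NOTE: this link structure is an invented example, not the
# mini-web from the original slide.
links = {
    0: [1, 2],
    1: [3],
    2: [0, 3, 4],
    3: [4, 5],
    4: [5],
    5: [],        # a dangling node: no outlinks
}
n = len(links)

# H[i][j] = 1/|P_i| if page i links to page j, else 0.
H = [[0.0] * n for _ in range(n)]
for i, outlinks in links.items():
    for j in outlinks:
        H[i][j] = 1.0 / len(outlinks)
```

Note that row 5 is all zeros because page 5 has no outlinks; this is exactly the dangling-node problem addressed later in the deck.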
Calculating Rank

    r(P) = Σ_{Q ∈ B_P} r(Q) / |Q|

where
- B_P = the set of all pages linking to P
- |Q| = the number of links from page Q
The PageRank Vector

Define the hyperlink matrix H by

    H_ij = 1/|P_i|   if page P_i links to page P_j
    H_ij = 0         otherwise

where |P_i| is the number of links from page P_i.
From our earlier mini-web, H is the 6x6 matrix whose row i contains 1/|P_i| in each column that page i links to.
Taken one row at a time, this system of rank equations can be written compactly as

    π^(k+1)T = π^(k)T H

where π^(k)T is the k-th iterate of the rank vector.
Iterating the equation

    π^(k+1)T = π^(k)T H

is called the power method, and we define the PageRank vector as its limit:

    π^T = lim_{k→∞} π^(k)T
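The power-method iteration can be sketched in Python. The 3-page web below is invented for illustration (every page here has outlinks, so the raw iteration converges without the fixes introduced later in the deck):

```python
# Power method: pi_{k+1}^T = pi_k^T H, iterated to a fixed point.
# Example 3-page web (invented): each row of H sums to 1.
H = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
n = len(H)
pi = [1.0 / n] * n          # start from the uniform distribution

for _ in range(100):
    # One step of the vector-matrix product pi^T H.
    pi = [sum(pi[i] * H[i][j] for i in range(n)) for j in range(n)]

# pi now approximates the rank vector of this tiny web.
```

For this particular web the iterates settle near (0.4, 0.4, 0.2): pages 0 and 1 each receive the most link weight.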
Power Method Convergence
Convergence requires irreducibility (Perron-Frobenius theorem).
Definitions
- Markov chain: the conditional probability of each future state depends only on the present state
- Markov matrix: the transition matrix of a Markov chain
Transition Matrix
From our earlier mini-web, H can be read as a transition matrix: entry H_ij is the probability that a random surfer on page i follows a link to page j.
Markov Matrix Properties
- Row-stochastic (each row sums to 1)
- The stationary vector gives the long-term probability of each state
- All eigenvalues satisfy |λ| ≤ 1
But H is not row-stochastic: a dangling node (a page with no outlinks) produces a row of all zeros.
Define a vector a such that

    a_i = 1 if page i has no outlinks, 0 otherwise.

Then we obtain a row-stochastic matrix:

    S = H + a (1/n) e^T

where e is the column vector of all ones.
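This dangling-node fix can be sketched in Python; the 3-page H below, with one dangling row, is an invented example:

```python
# Fix dangling nodes: a_i = 1 where row i of H is all zeros,
# then S = H + a * (1/n) * e^T, which is row-stochastic.
# This 3-page H is an invented example with one dangling row.
H = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],   # page 2 is dangling
]
n = len(H)

# Dangling-node indicator vector a.
a = [1.0 if all(x == 0.0 for x in row) else 0.0 for row in H]

# S[i][j] = H[i][j] + a[i] * (1/n): dangling rows become uniform.
S = [[H[i][j] + a[i] / n for j in range(n)] for i in range(n)]
```

The dangling row of S is now the uniform distribution (1/n, ..., 1/n), so every row of S sums to 1.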
S may or may not be reducible, so we make one more fix, the Google matrix:

    G = α S + (1 - α) (1/n) e e^T,   0 < α < 1

Now G is a positive, irreducible, row-stochastic matrix, and the power method will converge, but we've lost sparsity.
Note that:

    G = α H + (α a + (1 - α) e) (1/n) e^T

so now the power method looks like:

    π^(k+1)T = α π^(k)T H + (α π^(k)T a + 1 - α) (1/n) e^T

(using π^(k)T e = 1), which requires only the sparse product π^(k)T H; G is never formed explicitly.
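A minimal sketch of this sparsity-preserving iteration in Python, assuming α = 0.85 and the same kind of invented 3-page web used above:

```python
# Sparse power method for the Google matrix:
#   pi_{k+1}^T = alpha * pi_k^T H + (alpha * pi_k^T a + 1 - alpha) * (1/n) e^T
# Only the sparse product pi^T H is formed; G is never built.
alpha = 0.85               # damping factor (a commonly cited choice)
H = [
    [0.0, 0.5, 0.5],       # invented 3-page example
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],       # dangling row
]
n = len(H)
a = [1.0 if all(x == 0.0 for x in row) else 0.0 for row in H]

pi = [1.0 / n] * n
for _ in range(200):
    dangling = sum(pi[i] * a[i] for i in range(n))
    # Rank redistributed from dangling nodes plus teleportation.
    correction = (alpha * dangling + 1.0 - alpha) / n
    pi = [alpha * sum(pi[i] * H[i][j] for i in range(n)) + correction
          for j in range(n)]
```

Each step preserves the total rank (the iterate stays a probability vector), and every entry stays strictly positive thanks to the teleportation term.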
The power method converges at the same rate as α^k → 0, since |λ2(G)| ≤ α; thus the smaller α is, the faster the convergence.
A Linear System Formulation (Amy Langville and Carl Meyer)
- Exploit dangling nodes
- Solve a linear system instead of iterating
By Langville and Meyer, solving the system

    x^T (I - α H) = v^T

and letting

    π^T = x^T / (x^T e)

produces the PageRank vector (proof omitted).
Exploiting Dangling Nodes: re-order the rows and columns of H so that the zero (dangling) rows sit at the bottom:

    H = [ H11  H12 ]
        [  0    0  ]

then

    I - αH = [ I - αH11   -αH12 ]
             [     0         I  ]
The block I - αH11 has some nice properties that simplify solving the linear system:
- Non-singular
Source: L&M, A Reordering for the PageRank Problem
Langville and Meyer, Algorithm 1
1. Re-order the rows and columns of H so that dangling nodes are lumped at the bottom
2. Solve x1^T (I - αH11) = v1^T
3. Compute x2^T = α x1^T H12 + v2^T
4. Normalize: π^T = [x1^T  x2^T] / ||[x1^T  x2^T]||_1
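Algorithm 1 can be sketched with NumPy. The reordered 4-page matrix below (two non-dangling pages on top, two dangling rows of zeros omitted) and the uniform choice of v^T are invented for illustration:

```python
import numpy as np

alpha = 0.85
# Reordered hyperlink matrix: H = [[H11, H12], [0, 0]].
# This 4-page example (2 non-dangling + 2 dangling pages) is invented.
H11 = np.array([[0.0, 0.5],
                [1.0, 0.0]])
H12 = np.array([[0.25, 0.25],
                [0.0,  0.0]])
n = 4
v = np.full(n, 1.0 / n)            # uniform personalization vector
v1, v2 = v[:2], v[2:]

# Step 2: solve x1^T (I - alpha*H11) = v1^T
# (equivalently, solve the transposed system for the column vector x1).
x1 = np.linalg.solve((np.eye(2) - alpha * H11).T, v1)

# Step 3: x2^T = alpha * x1^T H12 + v2^T.
x2 = alpha * x1 @ H12 + v2

# Step 4: normalize to obtain the PageRank vector.
x = np.concatenate([x1, x2])
pi = x / x.sum()
```

The only system actually solved is the small non-dangling block, which is where the method's speedup comes from.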
Improvement
In testing, Algorithm 1 reduces the time needed to find the PageRank vector by a factor of 1 to 6; the speedup is data-dependent.
Further Improvement?
The first improvement came from finding the zero rows in H. Now find the zero rows in H11 as well, and repeat the reordering.
Source: L&M, A Reordering for the PageRank Problem
Langville and Meyer, Algorithm 2
1. Re-order rows and columns so that every submatrix has its zero rows at the bottom
2. Solve x1^T (I - αH11) = v1^T
3. For i = 2 to b, compute xi^T = α (x1^T H1i + ... + x(i-1)^T H(i-1)i) + vi^T
4. Normalize
Problem with Algorithm 2
Finding the submatrices of zero rows takes longer than the time saved in the solve step. L&M wait until all submatrices are reordered before solving the primary system.
Proposal
As each submatrix is isolated, send it out for parallel solving.
Source: L&M, A Reordering for the PageRank Problem
Sources
- DeGroot, M. and Schervish, M., Probability and Statistics, 3rd ed., Addison Wesley, 2002
- Langville, A. and Meyer, C., "A Reordering for the PageRank Problem," Journal of Scientific Computing, Vol. 27, No. 6, 2006
- Langville, A. and Meyer, C., "Deeper Inside PageRank," 2004
- Lee, C., Golub, G., and Zenios, S., "A Fast Two-Stage Algorithm for Computing PageRank," undated
- Rebaza, J., Lecture Notes