Using Adaptive Methods for Updating/Downdating PageRank Gene H. Golub Stanford University SCCM Joint Work With Sep Kamvar, Taher Haveliwala
2 Motivation Problem: Compute PageRank after the Web has changed slightly Motivation: “Freshness” Note: Since the web is growing, PageRank Computations don’t get faster as computers do.
3 Outline Definition of PageRank Computation of PageRank Convergence Properties Outline of Our Approach Empirical Results
4 Link Counts [Figure: one home page is linked by 2 important pages, another is linked by 2 unimportant pages. Labels: Martin’s Home Page, Gene’s Home Page, Yahoo!, Iain Duff’s Home Page, George W. Bush, Donald Rumsfeld.]
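5 Definition of PageRank The importance of a page is given by the importance of the pages that link to it: $x_i = \sum_{j \in B_i} \frac{1}{N_j} x_j$, where $x_i$ is the importance of page i, $B_i$ is the set of pages j that link to page i, $N_j$ is the number of outlinks from page j, and $x_j$ is the importance of page j.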
6 Definition of PageRank [Figure: example link graph over the pages Duff, Yahoo!, SCCM, Martin, and Gene; the only surviving edge labels are “1/” and “0.25”.]
7 PageRank Diagram Initialize all nodes to rank 0.333
8 PageRank Diagram Propagate ranks across links (multiplying by link weights)
9-11 PageRank Diagram [Figures: successive propagation steps.]
12 PageRank Diagram After a while…
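13 Matrix Notation $x_i = \sum_{j \in B_i} \frac{1}{N_j} x_j$ becomes $x = P^T x$, where $P_{ji} = 1/N_j$ if page j links to page i and 0 otherwise.
14 Matrix Notation Find x that satisfies: $x = P^T x$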
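A small check can make the step from the component-wise definition to the matrix form concrete. The sketch below assumes a made-up 4-page web (the `links` dictionary is hypothetical, not from the talk) and confirms that applying the definition page by page gives the same vector as multiplying by $P^T$.

```python
import numpy as np

# Hypothetical 4-page web: links[j] lists the pages that page j links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = len(links)

# Row-stochastic P: P[j, i] = 1 / N_j if page j links to page i.
P = np.zeros((n, n))
for j, outs in links.items():
    for i in outs:
        P[j, i] = 1.0 / len(outs)

x = np.full(n, 1.0 / n)  # any importance vector works for this check

# Component-wise definition: sum over pages j that link to page i of x_j / N_j.
by_definition = np.array([
    sum(x[j] / len(outs) for j, outs in links.items() if i in outs)
    for i in range(n)
])

# Matrix form of the same update.
by_matrix = P.T @ x

print(by_definition)
print(by_matrix)
```

The two printed vectors agree, which is all that the matrix notation claims.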
15 Eigenvalue Distribution The matrix $P^T$ has several eigenvalues on the unit circle. This will make power method-like algorithms less effective.
16 Rank-1 Correction PageRank doesn’t actually use $P^T$. Instead, it uses $A = cP^T + (1-c)E^T$. E is a rank-1 matrix, and in general, c = 0.85. This ensures a unique solution and fast convergence. For matrix A, $\lambda_2 = c$ [1]. [1] From “The Second Eigenvalue of the Google Matrix”
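The effect of the rank-1 correction on the spectrum can be seen on a toy example. The sketch below assumes a hypothetical web made of two disconnected 2-cycles and takes E to be the uniform rank-1 matrix (both are illustrative choices, not taken from the talk). For this graph $P^T$ has several eigenvalues of modulus 1, while the corrected matrix A has a single eigenvalue 1 and second-largest modulus c.

```python
import numpy as np

c = 0.85  # damping value quoted on the slide

# Hypothetical web with two disconnected 2-cycles: 0 <-> 1 and 2 <-> 3.
P = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
n = P.shape[0]

# Rank-1 matrix E: every row is the uniform distribution (an assumed choice).
E = np.ones((n, n)) / n

A = c * P.T + (1.0 - c) * E.T

# Eigenvalue moduli, largest first.
moduli_P = np.sort(np.abs(np.linalg.eigvals(P.T)))[::-1]
moduli_A = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]

print("moduli for P^T:", moduli_P)
print("moduli for A:  ", moduli_A)
```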
17 Outline Definition of PageRank Computation of PageRank Convergence Properties Outline of Our Approach Empirical Results
18 Power Method Initialize: $x^{(0)} = (1/n, \dots, 1/n)^T$ Repeat until convergence: $x^{(k+1)} = A x^{(k)}$
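A minimal sketch of the power method, assuming a uniform start vector, a hypothetical 4-page link matrix `P`, a uniform rank-1 correction, and an L1 stopping test with tolerance `tol` (all of these are illustrative choices, not specifics from the talk):

```python
import numpy as np

def power_method(A, tol=1e-10, max_iter=1000):
    """Repeat x <- A x from a uniform start until the L1 change drops below tol."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    for iteration in range(1, max_iter + 1):
        x_next = A @ x
        change = np.abs(x_next - x).sum()
        x = x_next
        if change < tol:
            break
    return x, iteration

# Hypothetical row-stochastic link matrix: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {0, 2}.
P = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
])
c = 0.85
n = P.shape[0]
A = c * P.T + (1.0 - c) * np.ones((n, n)) / n

pagerank, iterations = power_method(A)
print(pagerank, iterations)
```

Because the columns of A sum to 1, the iterates keep summing to 1 and no renormalization step is needed in this sketch.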
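19 Power Method Express $x^{(0)}$ in terms of eigenvectors of A: $x^{(0)} = \alpha_1 u_1 + \alpha_2 u_2 + \alpha_3 u_3 + \alpha_4 u_4 + \alpha_5 u_5$
20 Power Method $x^{(1)} = \alpha_1 u_1 + \alpha_2 \lambda_2 u_2 + \alpha_3 \lambda_3 u_3 + \alpha_4 \lambda_4 u_4 + \alpha_5 \lambda_5 u_5$
21 Power Method $x^{(2)} = \alpha_1 u_1 + \alpha_2 \lambda_2^2 u_2 + \alpha_3 \lambda_3^2 u_3 + \alpha_4 \lambda_4^2 u_4 + \alpha_5 \lambda_5^2 u_5$
22 Power Method $x^{(k)} = \alpha_1 u_1 + \alpha_2 \lambda_2^k u_2 + \alpha_3 \lambda_3^k u_3 + \alpha_4 \lambda_4^k u_4 + \alpha_5 \lambda_5^k u_5$
23 Power Method [Figure: only the component along $u_1$ remains; the components along $u_2, \dots, u_5$ have decayed to zero.]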
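24 Why does it work? Imagine our n x n matrix A has n linearly independent eigenvectors $u_i$. Then, you can write any n-dimensional vector as a linear combination of the eigenvectors of A: $x^{(0)} = \alpha_1 u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$
25 Why does it work? From the last slide: $x^{(0)} = \alpha_1 u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$. To get the first iterate, multiply $x^{(0)}$ by A: $x^{(1)} = A x^{(0)} = \alpha_1 \lambda_1 u_1 + \alpha_2 \lambda_2 u_2 + \dots + \alpha_n \lambda_n u_n$. First eigenvalue is 1. Therefore: $x^{(1)} = \alpha_1 u_1 + \alpha_2 \lambda_2 u_2 + \dots + \alpha_n \lambda_n u_n$, where $|\lambda_2|, \dots, |\lambda_n|$ are all less than 1.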
26 Power Method [Figure: eigenvector components of $x^{(0)}$, $x^{(1)}$, and $x^{(2)}$ side by side; the component along $u_i$ is scaled by $\lambda_i$ at each step.]
27 Outline Definition of PageRank Computation of PageRank Convergence Properties Outline of Our Approach Empirical Results
28 Convergence The smaller $|\lambda_2|$, the faster the convergence of the Power Method: $x^{(k)} = \alpha_1 u_1 + \alpha_2 \lambda_2^k u_2 + \alpha_3 \lambda_3^k u_3 + \alpha_4 \lambda_4^k u_4 + \alpha_5 \lambda_5^k u_5$
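The dependence on $|\lambda_2|$ can be observed by counting power iterations for several damping values. This is a sketch on a hypothetical 4-page link matrix with a uniform rank-1 correction; the function names, the tolerance, and the chosen values of c are illustrative assumptions.

```python
import numpy as np

def google_matrix(P, c):
    """A = c P^T + (1 - c) E^T with a uniform rank-1 E (an assumed choice)."""
    n = P.shape[0]
    return c * P.T + (1.0 - c) * np.ones((n, n)) / n

def iterations_to_converge(A, tol=1e-10, max_iter=10_000):
    """Number of power iterations until the L1 change drops below tol."""
    x = np.full(A.shape[0], 1.0 / A.shape[0])
    for k in range(1, max_iter + 1):
        x_next = A @ x
        if np.abs(x_next - x).sum() < tol:
            return k
        x = x_next
    return max_iter

# Hypothetical link matrix: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {0, 2}.
P = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
])

counts = {}
second_moduli = {}
for c in (0.5, 0.85, 0.99):
    A = google_matrix(P, c)
    counts[c] = iterations_to_converge(A)
    # Second-largest eigenvalue modulus of A.
    second_moduli[c] = np.sort(np.abs(np.linalg.eigvals(A)))[-2]

for c in counts:
    print(f"c={c}: |lambda_2|={second_moduli[c]:.3f}, iterations={counts[c]}")
```

On this example a larger c gives a larger $|\lambda_2|$ and a larger iteration count, in line with the slide.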
29 Quadratic Extrapolation (Joint work with Kamvar and Haveliwala) Estimate the components of the current iterate in the directions of the second and third eigenvectors, and eliminate them.
30 Facts that work in our favor For traditional problems: A is smaller, often dense. $\lambda_2$ is often close to $\lambda_1$, making the power method slow. In our problem, A is huge and sparse. More importantly, $\lambda_2$ is small [1]. [1] “The Second Eigenvalue of the Google Matrix” (dbpubs.stanford.edu/pub/ )
31 How do we do this? Assume $x^{(k)}$ can be written as a linear combination of the first three eigenvectors ($u_1$, $u_2$, $u_3$) of A. Compute an approximation to the components along {$u_2$, $u_3$}, and subtract it from $x^{(k)}$ to get $x^{(k)\prime}$.
32 Sequence Extrapolation A classical and important field in numerical analysis: techniques for accelerating the convergence of slowly convergent infinite series and integrals.
33 Example: Aitken Δ²-Process Suppose $A_n = A + a\lambda^n + r_n$, where $r_n = b\mu^n + o(\min\{1, |\mu|^n\})$, with a, b, λ, μ all nonzero and |λ| > |μ|. It can be shown that $S_n = \frac{A_n A_{n+2} - A_{n+1}^2}{A_n - 2A_{n+1} + A_{n+2}}$ satisfies, as n goes to infinity, $\frac{|S_n - A|}{|A_n - A|} = O\left((|\mu|/|\lambda|)^n\right) = o(1)$.
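The Aitken formula is easy to try on a synthetic sequence of exactly the assumed form. In the sketch below the limit, the constants a, b, λ, μ, and the function name `aitken` are illustrative assumptions; the remainder is taken to be exactly $b\mu^n$.

```python
def aitken(seq):
    """Aitken delta-squared transform: S_n from A_n, A_{n+1}, A_{n+2}."""
    out = []
    for n in range(len(seq) - 2):
        a0, a1, a2 = seq[n], seq[n + 1], seq[n + 2]
        out.append((a0 * a2 - a1 * a1) / (a0 - 2.0 * a1 + a2))
    return out

# Synthetic sequence A_n = A + a*lam**n + b*mu**n with |lam| > |mu|.
A_limit, a, lam, b, mu = 2.0, 1.0, 0.9, 0.5, 0.5
seq = [A_limit + a * lam**n + b * mu**n for n in range(20)]
acc = aitken(seq)

err_plain = abs(seq[17] - A_limit)
err_aitken = abs(acc[17] - A_limit)
print(err_plain, err_aitken)

# With a purely geometric error (b = 0) the transform recovers the limit
# up to rounding.
geometric = [A_limit + a * lam**n for n in range(5)]
exact = aitken(geometric)[0]
print(exact)
```

The accelerated error is several orders of magnitude below the plain error at the same index, consistent with the $(|\mu|/|\lambda|)^n$ ratio on the slide.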
34 In other words… Assuming a certain pattern for the series is helpful in accelerating convergence. We can apply this component-wise in order to get a better estimate of the eigenvector.
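35 Another approach Assume that $x^{(k)}$ can be represented by three eigenvectors of A (with $u_1$ scaled so that its coefficient is 1): $x^{(k)} = u_1 + \alpha_2 u_2 + \alpha_3 u_3$, $x^{(k+1)} = A x^{(k)} = u_1 + \alpha_2 \lambda_2 u_2 + \alpha_3 \lambda_3 u_3$, $x^{(k+2)} = A x^{(k+1)} = u_1 + \alpha_2 \lambda_2^2 u_2 + \alpha_3 \lambda_3^2 u_3$
36 Linear Combination We take some linear combination of these 3 iterates: $\beta_1 x^{(k)} + \beta_2 x^{(k+1)} + \beta_3 x^{(k+2)}$
37 Rearranging Terms We can rearrange the terms to get: $(\beta_1 + \beta_2 + \beta_3) u_1 + (\beta_1 + \beta_2 \lambda_2 + \beta_3 \lambda_2^2) \alpha_2 u_2 + (\beta_1 + \beta_2 \lambda_3 + \beta_3 \lambda_3^2) \alpha_3 u_3$ Goal: Find $\beta_1, \beta_2, \beta_3$ so that the coefficients of $u_2$ and $u_3$ are 0, and the coefficient of $u_1$ is 1.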
39 Results Quadratic Extrapolation speeds up convergence. Extrapolation was only used 5 times.
40 Estimating the coefficients Procedure 1: Set $\beta_1 = 1$ and solve the least squares problem. Procedure 2: Use the SVD for computing the coefficients of the characteristic polynomial.
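Procedure 1 can be sketched as follows. The slide does not spell out the least squares problem, so the formulation here is an assumption: because $u_1$ has eigenvalue 1, the successive differences $d_j = x^{(k+j+1)} - x^{(k+j)}$ contain no $u_1$ component, and $\beta_1 d_0 + \beta_2 d_1 + \beta_3 d_2 = 0$ then forces the $u_2$ and $u_3$ coefficients to vanish. With $\beta_1 = 1$ this is a least squares problem for $\beta_2, \beta_3$; dividing by $\beta_1 + \beta_2 + \beta_3$ afterwards makes the $u_1$ coefficient 1. The 4-page graph and all names are hypothetical. The graph is chosen so that after one power step the iterate lies in the span of three eigenvectors, which means the assumption on the slides holds exactly here.

```python
import numpy as np

def quadratic_extrapolation(xk, xk1, xk2, xk3):
    """Combine x(k), x(k+1), x(k+2) so the u_2 and u_3 components cancel.

    Assumed formulation: beta_1 = 1, and (beta_2, beta_3) solve the least
    squares problem [d1 d2] beta = -d0 built from successive differences.
    """
    d0, d1, d2 = xk1 - xk, xk2 - xk1, xk3 - xk2
    D = np.column_stack([d1, d2])
    (beta2, beta3), *_ = np.linalg.lstsq(D, -d0, rcond=None)
    beta1 = 1.0
    combo = beta1 * xk + beta2 * xk1 + beta3 * xk2
    return combo / (beta1 + beta2 + beta3)  # make the u_1 coefficient 1

# Hypothetical link matrix: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {0, 2}.
P = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
])
c, n = 0.85, 4
A = c * P.T + (1.0 - c) * np.ones((n, n)) / n

xk = A @ np.full(n, 1.0 / n)   # one warm-up power step from the uniform vector
xk1 = A @ xk
xk2 = A @ xk1
xk3 = A @ xk2

x_star = quadratic_extrapolation(xk, xk1, xk2, xk3)

residual_plain = np.abs(A @ xk3 - xk3).sum()
residual_extrap = np.abs(A @ x_star - x_star).sum()
print(residual_plain, residual_extrap)
```

On a real web graph the three-eigenvector assumption only holds approximately, so the extrapolated vector would be an improved guess that is then cleaned up by further power iterations rather than an exact answer.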
41 Results Extrapolation dramatically speeds up convergence, for high values of c (c=.99)
42 Take-home message Quadratic Extrapolation estimates the components of the current iterate in the directions of the second and third eigenvectors, and subtracts them off. It achieves significant speedup, and the ideas are useful for further speedup algorithms.
43 Summary of this part We make an assumption about the current iterate. Solve for dominant eigenvector as a linear combination of the next three iterates. We use a few iterations of the Power Method to “clean it up”.
44 Outline Definition of PageRank Computation of PageRank Convergence Properties Outline of Our Approach Empirical Results
45 Most Pages Converge Quickly
46 Most Pages Converge Quickly
47 Basic Idea When the PageRank of a page has converged, stop recomputing it.
48 Adaptive PageRank Algorithm
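The basic idea can be sketched as a power iteration that only recomputes the rows of pages not yet marked as converged. The convergence test (relative change below `page_tol`), the policy of never re-activating a frozen page, and the 4-page graph are all assumptions made for illustration; the slide does not give these details.

```python
import numpy as np

def adaptive_pagerank(A, page_tol=1e-9, max_iter=1000):
    """Power iteration that stops recomputing pages whose value has converged."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    active = np.ones(n, dtype=bool)       # pages still being recomputed
    row_updates = 0                       # rows of A actually multiplied
    for iteration in range(1, max_iter + 1):
        idx = np.flatnonzero(active)
        new_vals = A[idx] @ x             # only rows of non-converged pages
        row_updates += idx.size
        rel_change = np.abs(new_vals - x[idx]) / new_vals
        x[idx] = new_vals
        active[idx[rel_change < page_tol]] = False   # freeze converged pages
        if not active.any():
            break
    return x, iteration, row_updates

# Hypothetical link matrix: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {0, 2}.
P = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
])
c, n = 0.85, 4
A = c * P.T + (1.0 - c) * np.ones((n, n)) / n

x_adaptive, iterations, row_updates = adaptive_pagerank(A)

# Reference: plain power method run well past convergence.
x_ref = np.full(n, 1.0 / n)
for _ in range(500):
    x_ref = A @ x_ref

print(x_adaptive, x_ref)
print("row updates:", row_updates, "vs. full:", iterations * n)
```

Freezing a page trades a small error (on the order of the freezing tolerance) for skipped work; here page 3 has no inlinks and is frozen after the second iteration.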
49 Updates Use the previous vector as a start vector. The speedup is not that great. Why? The old pages converge quickly, but the new pages still take a long time to converge. But if you use Adaptive PageRank, you save the computation on the old pages.
50 Outline Definition of PageRank Computation of PageRank Convergence Properties Outline of Our Approach Empirical Results
51 Empirical Results 3 Update Algorithms on Stanford Web (n=700,000)
52 Take-home message Simply not recomputing PageRank of pages that have converged after an update speeds up PageRank by a factor of 2.
53 An Arnoldi/SVD approach (joint work with C. Greif) Perform k steps of Arnoldi (k << n) on A. Subtract the augmented identity matrix from the (k+1)-by-k unreduced Hessenberg matrix H, then compute the SVD of the result. Form the linear combination of the Arnoldi vectors (the columns of Q) whose coefficients are the null vector of the shifted H, i.e. the right singular vector for its smallest singular value. Use the resulting vector as the new guess for the Arnoldi procedure. Repeat until satisfied.
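The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the breakdown handling, the stopping tolerance, and the 4-page graph are all hypothetical. On this tiny example k = 3 already spans the whole relevant subspace, so a single cycle is expected to suffice; on a real web graph k would be much smaller than n and several restarts would be needed.

```python
import numpy as np

def arnoldi(A, q0, k, breakdown_tol=1e-12):
    """Up to k Arnoldi steps. Returns Q (n x (m+1)) and H ((m+1) x m), m <= k."""
    n = A.shape[0]
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = q0 / np.linalg.norm(q0)
    for j in range(k):
        w = A @ Q[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < breakdown_tol:   # Krylov space is invariant: stop early
            return Q[:, : j + 2], H[: j + 2, : j + 1]
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

def arnoldi_svd_pagerank(A, q0, k, tol=1e-10, max_cycles=50):
    q = q0
    for cycle in range(1, max_cycles + 1):
        Q, H = arnoldi(A, q, k)
        m = H.shape[1]
        H_shifted = H.copy()
        H_shifted[:m, :m] -= np.eye(m)    # subtract the augmented identity [I; 0]
        _, s, Vt = np.linalg.svd(H_shifted)
        q = Q[:, :m] @ Vt[-1]             # combine Arnoldi vectors with the null vector
        if s[-1] < tol:                   # s[-1] approximates ||A q - q||_2
            break
    return q / q.sum(), cycle             # normalize so the entries sum to 1

# Hypothetical link matrix: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {0, 2}.
P = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
])
c, n = 0.85, 4
A = c * P.T + (1.0 - c) * np.ones((n, n)) / n

q0 = A @ np.full(n, 1.0 / n)              # one power step as the initial guess
x, cycles = arnoldi_svd_pagerank(A, q0, k=3)
print(x, cycles)
```

Note that the shift by exactly 1 is where the known largest eigenvalue enters: no eigenvalue estimate is computed, only the vector minimizing the residual of $Aq = q$ over the Krylov subspace.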
54 Advantages We *do* take advantage of knowing the largest eigenvalue (as opposed to most general-purpose eigensolver packages). Computing the corresponding eigenvector does not rely on prohibitive inversions or decompositions. (Matrix is BBBBBIIIIIGGGG!!) Orthogonalizing ‘feels right’ from a numerical linear algebra point of view. Smooth convergence behavior. Overhead is minimal.