Motivation
Modern search engines for the World Wide Web use methods that require solving huge problems. Our aim: to develop multiscale techniques that will work much faster than existing methods.
Web Search
Web Search in a Nutshell
[Diagram: Crawlers, Keyword Search, Link Matrix, PageRank, Results, Ranked Results]
Interpretation - Random Walk
A monkey is clicking randomly on links in its browser. What is the probability of finding it on each page after a long time?
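The monkey's random walk can be simulated directly. This sketch assumes a small hypothetical 4-page web (the `OUT_LINKS` graph and the `random_surfer` helper are illustrative, not from the original). It counts how often the monkey lands on each page, and the long-run fractions estimate the probabilities asked about above.

```python
import random
from collections import Counter

# Hypothetical 4-page web (0-based page ids): page -> pages it links to.
# Every page has at least one out-link, so the monkey never gets stuck.
OUT_LINKS = {0: [1, 2, 3], 1: [0], 2: [1, 3], 3: [0, 1, 2]}


def random_surfer(out_links, steps, seed=0):
    """Simulate the monkey clicking a uniformly random link on each page.

    Returns the fraction of steps spent on each page, an estimate of the
    probability of finding the monkey there after a long time.
    """
    rng = random.Random(seed)              # seeded for reproducibility
    page = rng.choice(sorted(out_links))   # arbitrary starting page
    visits = Counter()
    for _ in range(steps):
        page = rng.choice(out_links[page])  # follow one of the page's links
        visits[page] += 1
    return {p: visits[p] / steps for p in out_links}


if __name__ == "__main__":
    for p, f in sorted(random_surfer(OUT_LINKS, steps=200_000).items()):
        print(f"page {p}: {f:.3f}")
```

For this particular graph, the fractions should settle near 15/44, 12/44, 8/44 and 9/44 (about 0.34, 0.27, 0.18, 0.20), which are the exact long-run probabilities we computed by hand. Simulation is easy to understand but converges slowly, so it only serves to illustrate the interpretation.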
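Problem Definition
The rank of a page is its importance relative to other pages (its probability). Each page “distributes” its own pagerank equally among the pages it points to. In the figure, a page with two out-links passes 1/2 of its rank along each link, a page with three out-links passes 1/3, and a page with a single out-link passes all of it.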
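The distribution rule can be written as a single update step. Here is a minimal sketch on a hypothetical 4-page graph. `distribute` is an illustrative name, and the sketch assumes every page has at least one out-link, because pages without any are a separate issue.

```python
# Hypothetical 4-page web (0-based page ids): page -> pages it links to.
OUT_LINKS = {0: [1, 2, 3], 1: [0], 2: [1, 3], 3: [0, 1, 2]}


def distribute(ranks, out_links):
    """One round of the rule: every page splits its rank equally among its out-links."""
    new_ranks = {p: 0.0 for p in out_links}
    for page, targets in out_links.items():
        share = ranks[page] / len(targets)  # 1/2, 1/3, 1, ... of the page's rank
        for t in targets:
            new_ranks[t] += share
    return new_ranks


start = {p: 0.25 for p in OUT_LINKS}  # begin with equal ranks
print(distribute(start, OUT_LINKS))
```

Starting from equal ranks of 0.25, one step gives page 0 the amount 0.25 + 0.25/3 = 1/3. The problem definition asks for a rank vector that this step leaves unchanged. For this graph, (15, 12, 8, 9)/44 is such a vector.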
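Problem Definition
Collect the ranks into a pagerank vector $x$. The rule above then reads $x_i = \sum_{j \to i} x_j / d_j$, or in matrix form $x = Bx$. Here the link matrix $B$ has entries $B_{ij} = 1/d_j$ if page $j$ links to page $i$ and $B_{ij} = 0$ otherwise, where $d_j$ is the number of out-links of page $j$. The fractions 1/2, 1/3 and 1 from the figure are the nonzero entries of B.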
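In matrix form, column j of B holds the fractions that page j passes on. The following sketch builds B as a dense list of lists for a hypothetical 4-page graph and checks the condition x = Bx for a known solution. `link_matrix` and `mat_vec` are illustrative helpers. A web-scale B would instead be stored as a sparse matrix.

```python
def link_matrix(out_links, n):
    """Build B with B[i][j] = 1/d_j if page j links to page i, else 0.

    d_j is the number of out-links of page j. A page with no out-links
    yields a zero column (a "dangling page").
    """
    B = [[0.0] * n for _ in range(n)]
    for j, targets in out_links.items():
        for i in targets:
            B[i][j] = 1.0 / len(targets)
    return B


def mat_vec(B, x):
    """Return the product B x for a dense list-of-lists matrix."""
    return [sum(b_ij * x_j for b_ij, x_j in zip(row, x)) for row in B]


# Hypothetical 4-page web (0-based page ids): page -> pages it links to.
OUT_LINKS = {0: [1, 2, 3], 1: [0], 2: [1, 3], 3: [0, 1, 2]}
B = link_matrix(OUT_LINKS, 4)
x = [15 / 44, 12 / 44, 8 / 44, 9 / 44]  # satisfies x = B x for this graph
print(mat_vec(B, x))
```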
Problem Definition (Cont.)
The matrix B may have zero columns, which correspond to pages with no out-links. We call these troublesome pages “dangling pages”.
Interpretation: if the monkey finds no links on a page, it leaps to some random page on the web.
[Figure: dangling page]
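In matrix terms, the monkey's leap replaces each zero column of B with a column whose entries all equal 1/n, where n is the number of pages. Below is a minimal sketch that assumes a dense list-of-lists matrix and uses a hypothetical 3-page example. `fix_dangling` is an illustrative name.

```python
def fix_dangling(B):
    """Replace every zero column of B by the uniform column (1/n, ..., 1/n).

    This models the monkey leaping to a random page when it finds no links.
    Returns a new matrix; the input is left unchanged.
    """
    n = len(B)
    S = [row[:] for row in B]  # copy so that B itself is not modified
    for j in range(n):
        if all(B[i][j] == 0.0 for i in range(n)):  # zero column: page j is dangling
            for i in range(n):
                S[i][j] = 1.0 / n
    return S


# Hypothetical 3-page example: page 0 links to pages 1 and 2, page 1 links
# to page 0, and page 2 has no out-links, so column 2 of B is zero.
B = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
S = fix_dangling(B)
print([sum(S[i][j] for i in range(3)) for j in range(3)])  # column sums of S
```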
Problem Definition (Cont.)
Still, there might be a group of pages with no links leading out of the group, and the monkey would be trapped there! We therefore introduce a “fudge factor” 0 < α < 1.
Interpretation: with probability 1 − α, the monkey leaps to some random page on the web.
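The slide gives no formula for the fudge factor. A standard way to write it, which we assume here, is $G = \alpha S + (1-\alpha)\frac{1}{n} e e^T$, where S is B with its dangling columns fixed and e is the all-ones vector. The sketch below builds G for a hypothetical 3-page web in which pages 0 and 1 link only to each other, so they form a group the monkey could never leave. `google_matrix` is an assumed name, and so is its default α = 0.85, a value often quoted in the PageRank literature.

```python
def google_matrix(S, alpha=0.85):
    """Return G = alpha * S + (1 - alpha) * (1/n) * e e^T as a dense matrix.

    S: column-stochastic n-by-n matrix (dangling columns already fixed).
    alpha: the fudge factor, 0 < alpha < 1 (0.85 is an assumed default).
    With probability alpha the monkey follows a link of S; with probability
    1 - alpha it jumps to a uniformly random page.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    n = len(S)
    teleport = (1.0 - alpha) / n
    return [[alpha * S[i][j] + teleport for j in range(n)] for i in range(n)]


# Hypothetical example: pages 0 and 1 link only to each other, page 2 links
# to both. Without the fudge factor, the walk is trapped in the group {0, 1}.
S = [
    [0.0, 1.0, 0.5],
    [1.0, 0.0, 0.5],
    [0.0, 0.0, 0.0],
]
G = google_matrix(S)
print(G[2])  # probabilities of moving to page 2: now positive from every page
```

Every column of G still sums to α + (1 − α) = 1, and every entry is positive. As a result, no group of pages can trap the walk.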
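Problem Definition (Cont.)
B is now a (column-)stochastic matrix: its entries are nonnegative and each column sums to 1. We seek its eigenvector with eigenvalue 1, which is the largest eigenvalue of B in magnitude. This eigenvector is called the principal eigenvector.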
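For a tiny matrix, the eigenvector with eigenvalue 1 can be found directly by solving $(B - I)x = 0$ together with the normalization $\sum_i x_i = 1$. The columns of B sum to 1, so the rows of B − I sum to the zero vector. One of the equations is therefore redundant and can be replaced by the normalization. This sketch uses plain Gaussian elimination with partial pivoting on a hypothetical 4-page link matrix, and the helper names are illustrative. At O(n^3) cost this direct solve is hopeless at web scale.

```python
def is_column_stochastic(B, tol=1e-12):
    """True if every entry of B is nonnegative and every column sums to 1."""
    n = len(B)
    nonnegative = all(B[i][j] >= 0.0 for i in range(n) for j in range(n))
    columns_ok = all(abs(sum(B[i][j] for i in range(n)) - 1.0) < tol for j in range(n))
    return nonnegative and columns_ok


def principal_eigenvector(B):
    """Solve B x = x with sum(x) = 1 by Gaussian elimination (tiny n only).

    The last row of (B - I) x = 0 is redundant for a column-stochastic B,
    so it is replaced by the normalization row sum(x) = 1. Assumes the
    eigenvalue 1 is simple, so that the resulting system is nonsingular.
    """
    n = len(B)
    # Augmented rows [A | rhs] with A = B - I and rhs = 0.
    A = [[B[i][j] - (1.0 if i == j else 0.0) for j in range(n)] + [0.0] for i in range(n)]
    A[-1] = [1.0] * n + [1.0]  # normalization: x_1 + ... + x_n = 1
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x


# Hypothetical 4-page link matrix: column j spreads page j's rank over its out-links.
B = [
    [0.0, 1.0, 0.0, 1 / 3],
    [1 / 3, 0.0, 0.5, 1 / 3],
    [1 / 3, 0.0, 0.0, 1 / 3],
    [1 / 3, 0.0, 0.5, 0.0],
]
print(is_column_stochastic(B))
print(principal_eigenvector(B))
```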
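Computing the principal eigenvector
The Power Method (equivalent to Jacobi’s): starting with a random vector $x^{(0)}$, multiply it repeatedly by B, that is, iterate $x^{(k+1)} = B x^{(k)}$ for $k = 0, 1, 2, \dots$
This process converges to the principal eigenvector. Each iteration is cheap and simple: one matrix-vector product. However, the error shrinks only by a factor of roughly $|\lambda_2|/|\lambda_1|$ per iteration, where $\lambda_1 = 1$ and $\lambda_2$ are the largest and second-largest eigenvalues of B in magnitude. Convergence may therefore be very slow!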
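Here is a minimal power-method sketch in plain Python. It assumes a dense matrix and starts from the uniform vector rather than a random one. It stops when successive iterates differ by less than `tol` in the 1-norm. `power_method`, `tol`, `max_iter` and the 4-page matrix are all hypothetical. B is column-stochastic, so each product keeps the entries summing to 1 and no renormalization is needed.

```python
def power_method(B, tol=1e-10, max_iter=10_000):
    """Iterate x <- B x until successive iterates differ by less than tol.

    Starts from the uniform vector (an assumption; the slide starts from
    a random vector). Returns (x, number_of_iterations).
    """
    n = len(B)
    x = [1.0 / n] * n
    for k in range(1, max_iter + 1):
        x_new = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]  # one product B x
        if sum(abs(a - b) for a, b in zip(x_new, x)) < tol:  # 1-norm of the change
            return x_new, k
        x = x_new
    raise RuntimeError("power method did not converge within max_iter iterations")


# Hypothetical 4-page column-stochastic link matrix.
B = [
    [0.0, 1.0, 0.0, 1 / 3],
    [1 / 3, 0.0, 0.5, 1 / 3],
    [1 / 3, 0.0, 0.0, 1 / 3],
    [1 / 3, 0.0, 0.5, 0.0],
]
x, iterations = power_method(B)
print(iterations, [round(v, 4) for v in x])
```

The iteration count grows as |λ2| approaches 1. For the matrix with the fudge factor α, it is known that |λ2| ≤ α. Choosing α close to 1 therefore tends to slow convergence.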
Power Method (Jacobi’s Method)
7 iterations for a 4-variable problem, and only 3 accurate digits! What will happen with 1M variables?
www.wikipedia.org: ~1.2 million pages, ~3 million links

Iteration  x4      x3      x2      x1
0          0.2500  0.2500  0.2500  0.2500
1          0.3333  0.2917  0.1667  0.2083
2          0.3611  0.2639  0.1896  0.1944
3          0.3287  0.2755  0.1852  0.2106
4          0.3457  0.2724  0.1798  0.2022
5          0.3398  0.2725  0.1826  0.2051
6          0.3409  0.2729  0.1816  0.2046
7          0.3409  0.2727  0.1818  0.2045
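The table can be regenerated in a few lines. The slide does not show its graph, so this sketch assumes a hypothetical 4-page link matrix that we reconstructed to fit the numbers: page 1 links to pages 2, 3, 4; page 2 to page 1; page 3 to pages 2, 4; and page 4 to pages 1, 2, 3. `power_iterates` is an illustrative name.

```python
# Hypothetical 4-page link matrix (reconstructed, not taken from the slide):
# page 1 -> 2, 3, 4; page 2 -> 1; page 3 -> 2, 4; page 4 -> 1, 2, 3.
# Row/column k of the matrix corresponds to page k + 1.
B = [
    [0.0, 1.0, 0.0, 1 / 3],
    [1 / 3, 0.0, 0.5, 1 / 3],
    [1 / 3, 0.0, 0.0, 1 / 3],
    [1 / 3, 0.0, 0.5, 0.0],
]


def power_iterates(B, x0, iterations):
    """Return [x0, B x0, B^2 x0, ...], performing `iterations` multiplications."""
    history = [list(x0)]
    for _ in range(iterations):
        x = history[-1]
        history.append([sum(b * xj for b, xj in zip(row, x)) for row in B])
    return history


for k, x in enumerate(power_iterates(B, [0.25] * 4, 7)):
    print(k, " ".join(f"{v:.4f}" for v in x))
```

By our hand calculation, iterations 1 to 4 of this graph agree with the table's columns read left to right. So does its limit (15, 12, 8, 9)/44 ≈ (0.3409, 0.2727, 0.1818, 0.2045). The one exception is the iteration-2 entry 0.1896, where this graph gives 0.1806. The table's iteration-2 row sums to 1.009 rather than 1, so 0.1806 may be the intended value.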