Roshnika Fernando
PAGERANK
WHY PAGERANK?
The internet is a global system of networks that link to smaller networks.
This system keeps growing, so there must be a way to sort through all the information available.
PageRank is the algorithm the search engine Google uses to sort through webpages.
A webpage's rank determines where it appears in the results when a keyword search is performed on Google.
Fun fact: PageRank is named after Larry Page, one of the founders of Google, not after webpages.
POPULARITY CONTEST
Rank, at its simplest, is the probability that a webpage will be visited.
The ranks of all pages sum to 1.
A page's rank is affected by the ranks of the pages that link to it.
Initially, every page has rank = 1/(total # of pages available), which is ≈ 0 for the internet.
DETERMINING RANK
Let P be an n × n stochastic matrix, where n is the total number of webpages and $p_{i,j}$ is the probability of going to webpage i from webpage j, so that each column of P sums to 1.
$p_{i,j} = \dfrac{\text{number of links to page } i \text{ from page } j}{\text{number of links on page } j}$
Note: i and j are positive integers.
Note: There are around 25 billion $p_{i,j}$ combinations on the internet.
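As a minimal sketch of how such a matrix could be assembled, the snippet below builds P from a link list. It assumes the column convention used on the later slides (each column sums to 1, so entry `P[i][j]` is the probability of moving from page j to page i). The function name `build_transition_matrix`, the 0-based page numbering, and the 3-page web are hypothetical illustrations, not part of the original deck.

```python
from fractions import Fraction

def build_transition_matrix(links, n):
    """Return an n x n matrix P (list of rows) with
    P[i][j] = (# of links from page j to page i) / (# of links on page j).

    links maps each source page j to the list of pages it links to.
    Pages are numbered from 0 here (an assumption of this sketch).
    """
    P = [[Fraction(0)] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            # Each link on page j carries an equal share of page j's probability.
            P[i][j] += Fraction(1, len(targets))
    return P

# Hypothetical 3-page web: page 0 links to 1 and 2, page 1 links to 2,
# and page 2 links back to 0.
links = {0: [1, 2], 1: [2], 2: [0]}
P = build_transition_matrix(links, 3)

# Every column should sum to 1 because every page here has at least one link.
column_sums = [sum(P[i][j] for i in range(3)) for j in range(3)]
```

Exact fractions are used instead of floats so that the column sums can be checked without rounding error.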
LONG TERM PROBABILITY
After a very long time, what is the probability that web surfers will be at a certain website?
Let $x$ be the stationary distribution vector, where $x_k$ is the probability of being at state k.
Since stochastic matrices have eigenvalue λ = 1, $Px = x$.
Solve for $x$ to determine the long-term probability of being at each webpage (aka the rank).
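The claim that a stochastic matrix always has eigenvalue 1 can be justified in a few lines. The derivation below is a sketch that assumes the column convention (each column of P sums to 1) and writes the stationary vector as $x$ and the all-ones vector as $\mathbf{1}$; both symbols are notation chosen here for illustration.

```latex
\begin{align*}
\sum_{i=1}^{n} p_{i,j} = 1 \ \text{for every } j
  &\iff \mathbf{1}^{\mathsf T} P = \mathbf{1}^{\mathsf T}
  \iff P^{\mathsf T}\mathbf{1} = \mathbf{1}, \\
\det(P - \lambda I) = \det(P^{\mathsf T} - \lambda I)
  &\implies \lambda = 1 \text{ is also an eigenvalue of } P, \\
  &\implies \text{some } x \neq 0 \text{ satisfies } Px = x .
\end{align*}
```

The first line says the all-ones vector is an eigenvector of $P^{\mathsf T}$ for eigenvalue 1. The second uses the fact that a matrix and its transpose share the same characteristic polynomial.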
SMALL SCALE EXAMPLE
7 pages linked to one another
LINEAR PROGRAM
Solve (P - I)x = 0 for the vector x to obtain the PageRank.
x is the eigenvector of P for eigenvalue λ = 1, scaled so that its entries sum to 1.
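One possible way to carry out this solve by machine is Gauss-Jordan elimination on (P - I)x = 0, with one redundant equation replaced by the normalisation that the ranks sum to 1. The sketch below assumes P is column-stochastic and that the stationary vector is unique. The function name `stationary_vector` and the 3-page matrix are hypothetical, and this is not necessarily how the deck's own numbers were produced.

```python
from fractions import Fraction

def stationary_vector(P):
    """Solve (P - I)x = 0 together with sum(x) = 1 by Gauss-Jordan elimination.

    Assumes P is column-stochastic and the solution is unique.
    """
    n = len(P)
    # Augmented matrix [P - I | 0].
    A = [[Fraction(P[i][j]) - (1 if i == j else 0) for j in range(n)] + [Fraction(0)]
         for i in range(n)]
    # The rows of P - I add up to the zero row (columns of P sum to 1), so one
    # equation is redundant; replace the last with x_1 + ... + x_n = 1.
    A[n - 1] = [Fraction(1)] * n + [Fraction(1)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

# Hypothetical 3-page web: page 0 links to 1 and 2, page 1 links to 2,
# page 2 links to 0 (column j lists where page j sends its visitors).
P = [[0, 0, 1],
     [Fraction(1, 2), 0, 0],
     [Fraction(1, 2), 1, 0]]
x = stationary_vector(P)  # exact ranks as fractions: 2/5, 1/5, 2/5
```

For a web the size of the real internet a dense elimination like this would not be practical; it is only meant to make the linear system concrete on a toy example.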
SMALL SCALE SOLUTION
As t → ∞, the given $p_{i,j}$ yield the PageRank:
$x_1 = 0.304$, $x_2 = 0.166$, $x_3 = 0.141$, $x_4 = 0.105$, $x_5 = 0.179$, $x_6 = 0.045$, $x_7 = 0.061$
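The "t → ∞" limit can be illustrated by power iteration: start every page at rank 1/n and repeatedly multiply by P. This is a sketch on a hypothetical 3-page web, not the 7-page example, whose link structure is not reproduced in the text. The function name `power_iteration` and the step count are assumptions, and convergence is only expected when the web is connected and not periodic.

```python
def power_iteration(P, steps=100):
    """Approximate the stationary vector by applying P repeatedly.

    P is assumed column-stochastic: P[i][j] is the probability of moving
    from page j to page i.
    """
    n = len(P)
    x = [1.0 / n] * n  # initial rank 1/(total # of pages)
    for _ in range(steps):
        # One time step: x <- P x
        x = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x

# Hypothetical 3-page web: page 0 links to 1 and 2, page 1 links to 2,
# page 2 links to 0.
P = [[0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]
x = power_iteration(P)  # approaches [0.4, 0.2, 0.4]
```

Because each step preserves the sum of the entries, the result stays a probability vector without any renormalisation.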
SENSITIVITY ANALYSIS
What if a page has no links? What happens to the probability matrix P?
P is stochastic, meaning each of its columns must sum to 1.
If page j has no links leading out, its probability is distributed evenly over all n rows of column j (n being the total number of pages), i.e. $p_{i,j} = 1/n$ for every i, so that $\sum_{i} p_{i,j} = 1$.
This assumes that when someone reaches a dead end, the next page he/she visits is chosen entirely at random.
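The dead-end fix described above can be sketched as a small helper that replaces any all-zero column with 1/n in every row. The function name `fix_dangling_columns` and the 3-page matrix are hypothetical, and the column convention (column j describes where page j sends its visitors) is assumed.

```python
from fractions import Fraction

def fix_dangling_columns(P):
    """Return a copy of P in which every all-zero column (a page with no
    outgoing links) is replaced by the uniform value 1/n in each row."""
    n = len(P)
    fixed = [list(row) for row in P]
    for j in range(n):
        if all(P[i][j] == 0 for i in range(n)):
            for i in range(n):
                fixed[i][j] = Fraction(1, n)
    return fixed

# Hypothetical 3-page web in which page 2 (the third column) is a dead end.
P = [[0, 0, 0],
     [Fraction(1, 2), 0, 0],
     [Fraction(1, 2), 1, 0]]
P_fixed = fix_dangling_columns(P)

# After the fix every column sums to 1 again, so P_fixed is stochastic.
column_sums = [sum(P_fixed[i][j] for i in range(3)) for j in range(3)]
```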
PROBABILITY AND RANK
The stationary distribution vector contains the rank of each webpage, which determines the order in which pages appear when a keyword search is performed.
Each rank is the probability that a person will be at that page, out of the billions of pages available online.
Computing this takes several powerful computers.
QUESTIONS?
CITATIONS
Austin, David. "How Google Finds Your Needle in the Web's Haystack." AMS.org. American Mathematical Society. Web. 09 Nov
"PageRank." Wikipedia, the free encyclopedia. Web. 09 Nov
Photograph. PageRanks-Example. Wikipedia, 8 July Web. 9 Nov
"Stochastic matrix." Wikipedia, the free encyclopedia. Web. 09 Nov