PageRank and Markov Chains Tolga Çekiç 9.5.2014
- Introduction
- PageRank Overview
- Markov Chains
- PageRank Continuation
- Conclusion
Introduction
- PageRank is named after Larry Page, one of the co-founders of Google
- One of the algorithms used by the Google search engine
- Ranks web pages according to importance
- Based on previous work on citation counting
- Uses Markov chain structures
Citation Count
- Academic papers receive and give citations
- Every citation made to a paper counts as a vote
- Papers with high numbers of votes are considered important
- There are some problems with this basic scheme of vote counting
- PageRank tries to address them by treating web pages as papers and links as citations
Problems
- If rank is determined only by the total number of links directed to a web page, a link from an important web site counts no more than a link from an obscure one
- If a web page has very many outlinks, that page has a disproportionately high influence in determining the ranks of other pages
PageRank
- The importance score of a web page is the sum of the importance scores passed on by the pages that link to it
- The importance score of a page is divided evenly amongst all its outgoing links
- Uses Markov chains
- PageRank calculation formula
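The formula itself is not reproduced in this text. The two statements above correspond to the usual form of the simple PageRank formula, written here as a sketch with assumed notation: $r(P)$ is the importance score of page $P$, $B_{P_i}$ is the set of pages that link to $P_i$, and $|P_j|$ is the number of outgoing links of $P_j$.

```latex
\[
  r(P_i) = \sum_{P_j \in B_{P_i}} \frac{r(P_j)}{|P_j|}
\]
```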
Simple PageRank Calculation
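The figure for this slide is not reproduced here. As a minimal sketch of the rule from the previous slide (a page's score is the sum of the shares it receives, and each page splits its score evenly among its outgoing links), the following Python applies that rule repeatedly to a hypothetical three-page web. The pages A, B, C, their links, and the names `links` and `simple_pagerank` are made up for illustration. The sketch assumes every page has at least one outgoing link.

```python
def simple_pagerank(links, iterations=100):
    """Repeatedly apply: score(P) = sum of score(Q) / outlinks(Q) over pages Q linking to P."""
    pages = list(links)
    # Start with equal importance for every page.
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for page, outlinks in links.items():
            # A page splits its score evenly among its outgoing links.
            share = rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = simple_pagerank(links)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

For this graph the scores settle near 0.4 for A, 0.2 for B and 0.4 for C: C collects votes from both A and B, and passes its whole score on to A.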
Markov Chains
- Named after Andrey Markov
- A mathematical system that transitions between states in a state space
- States have the Markov property, or ‘memorylessness’
- The transition from one state to the next depends only on the current state
- Used as statistical models in real-world applications
Markov Chain Examples
- Drunkard’s walk, a random walk process
- Board games played with dice
- A simple weather model
Probability Vector
- At each time, there are n states the system could be in
- At time k the system is modeled as a vector $x_k \in \mathbb{R}^n$ whose i-th entry is the probability of being in state i
- A probability vector is a vector in $\mathbb{R}^n$ whose entries are nonnegative and sum to 1
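The definition can be checked mechanically. A small sketch, with the function name `is_probability_vector` and the tolerance `tol` chosen here for illustration (the tolerance allows for floating-point rounding):

```python
def is_probability_vector(x, tol=1e-9):
    """True if every entry is nonnegative and the entries sum to 1 (within tol)."""
    return all(v >= 0 for v in x) and abs(sum(x) - 1.0) <= tol


print(is_probability_vector([0.5, 0.3, 0.2]))   # True
print(is_probability_vector([0.7, 0.5, -0.2]))  # False: one entry is negative
print(is_probability_vector([0.5, 0.3]))        # False: entries sum to 0.8
```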
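Markov Chains
- A Markov matrix (or stochastic matrix) is a square matrix M whose rows or columns are probability vectors
- A Markov chain is a sequence of probability vectors $x_0, x_1, x_2, \ldots$ such that $x_{k+1} = M x_k$ for some Markov matrix M
- Written this way, the columns of M are the probability vectors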
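As a sketch of how such a sequence is generated, the code below multiplies a probability vector by a Markov matrix a few times. It assumes the column convention (each column of `M` sums to 1 and the next vector is `M` times the current one). The 2-state matrix and the helper name `step` are hypothetical.

```python
def step(M, x):
    """One step of the chain: the matrix-vector product M x."""
    n = len(x)
    return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]


# Hypothetical 2-state Markov matrix; each column sums to 1.
M = [[0.9, 0.5],
     [0.1, 0.5]]

x = [1.0, 0.0]  # start in state 0 with certainty
chain = [x]
for _ in range(3):
    chain.append(step(M, chain[-1]))

for k, vec in enumerate(chain):
    print(k, [round(v, 3) for v in vec])
```

Each vector in `chain` is again a probability vector, because the columns of `M` sum to 1.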
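Weather Model Example
- Initial state: $x_0$
- Day 1: $x_1 = P x_0$, where P is the Markov matrix of the weather model
- Day 2: $x_2 = P x_1 = P^2 x_0$
- Day n: $x_n = P^n x_0$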
Steady State Vector
- A steady-state vector q represents the probabilities for all days in the long run, independent of the initial weather
- Since it does not depend on the initial state, it is unchanged by P: $Pq = q$
- That makes q an eigenvector of P (with eigenvalue 1)
Weather Example Steady State Calculation
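The numbers from the original slide are not available, so the sketch below uses a hypothetical two-state weather matrix (sunny, rainy) with assumed values. For a 2-state column-stochastic matrix, requiring that q is left unchanged by P and that its entries sum to 1 gives a closed form: if a is the probability of going from state 0 to state 1 and b the probability of going from state 1 to state 0, then q = (b/(a+b), a/(a+b)). The function names are chosen here for illustration.

```python
def mat_vec(P, x):
    """Matrix-vector product P x."""
    n = len(x)
    return [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]


def steady_state_2x2(P):
    """Steady state of a 2-state column-stochastic P: solves P q = q with q[0] + q[1] = 1."""
    a = P[1][0]  # probability of moving from state 0 to state 1
    b = P[0][1]  # probability of moving from state 1 to state 0
    # Assumes a + b > 0, i.e. the chain is not stuck in both states.
    return [b / (a + b), a / (a + b)]


# Hypothetical weather matrix (assumed values, not from the slides).
# Column 0: today sunny, column 1: today rainy.
# Row 0: tomorrow sunny, row 1: tomorrow rainy.
P = [[0.9, 0.5],
     [0.1, 0.5]]

q = steady_state_2x2(P)
print([round(v, 4) for v in q])              # long-run probabilities
print([round(v, 4) for v in mat_vec(P, q)])  # applying P leaves q unchanged
```

With these assumed values the long-run probabilities are 5/6 for sunny and 1/6 for rainy, whatever the weather on the first day.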
Existence of Steady State Vector
- Given a Markov matrix M, does there exist a steady-state vector?
- If M is a Markov matrix with all positive entries, then M has a unique steady-state vector (Perron-Frobenius theorem)
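To see why the positivity condition matters, consider a Markov matrix that has zero entries. The sketch below uses the 2-by-2 identity matrix as a deliberately extreme example (chosen here for illustration): it is a Markov matrix, but every probability vector is left unchanged by it, so the steady-state vector is not unique.

```python
def mat_vec(M, x):
    """Matrix-vector product M x."""
    n = len(x)
    return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]


def all_entries_positive(M):
    """True if every entry of M is strictly positive."""
    return all(v > 0 for row in M for v in row)


identity = [[1.0, 0.0],
            [0.0, 1.0]]

# Two different probability vectors, both unchanged by the identity matrix.
candidates = [[1.0, 0.0], [0.3, 0.7]]
unchanged = [mat_vec(identity, x) == x for x in candidates]

print(all_entries_positive(identity))  # False
print(unchanged)                       # [True, True]
```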
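PageRank cont.
- PageRank creates a square matrix A whose rows and columns refer to web pages
- In the column convention, entry (i, j) of A is 1/(number of outgoing links of page j) if page j links to page i, and 0 otherwise
- A is a Markov matrix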
Problems
- Random surfer model: a real surfer might randomly go to another URL, different from the ones linked on the current page
- The model so far does not ensure a unique steady-state vector
PageRank
- To satisfy the Perron-Frobenius theorem and realize the random surfer model, a damping factor is introduced (generally taken as 0.85)
- Or simply: B = 0.85A + 0.15(matrix with every entry 1/n)
- B is a Markov matrix with all positive entries
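A short sketch of this construction, assuming A is stored as a list of rows with columns summing to 1. The function name `damped_matrix` and the 3-page matrix are hypothetical.

```python
def damped_matrix(A, damping=0.85):
    """Return B = damping * A + (1 - damping) * (matrix with every entry 1/n)."""
    n = len(A)
    return [[damping * A[i][j] + (1.0 - damping) / n for j in range(n)]
            for i in range(n)]


# Hypothetical 3-page link matrix (column j describes the outlinks of page j).
A = [[0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]

B = damped_matrix(A)
column_sums = [sum(B[i][j] for i in range(3)) for j in range(3)]

print(all(v > 0 for row in B for v in row))  # every entry of B is positive
print([round(s, 6) for s in column_sums])    # each column still sums to 1
```

Every zero in A becomes 0.15/n in B, which is what makes the Perron-Frobenius theorem applicable.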
PageRank Computation
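The computation shown on this slide is not reproduced here. One common way to find the steady-state vector of B is power iteration: start from a uniform vector and apply B until the vector stops changing. The sketch below assumes that approach. It works directly from a dictionary of links instead of building B, which gives the same result as long as the rank vector sums to 1, because the term 0.15(matrix with every entry 1/n) then adds 0.15/n to every page. The graph, the function name `pagerank` and the parameters `tol` and `max_iter` are hypothetical, and every page is assumed to have at least one outgoing link.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for the steady-state vector of B = damping*A + (1-damping)/n."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Random-jump part: every page receives (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # Link-following part: each page passes on its damped score evenly.
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this example C ends up ranked first, slightly ahead of A, with B last: C receives links from both other pages, while B receives only half of A's vote.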
Conclusion
- Larry Page: “PageRank can be thought of as a model of user behavior. We assume there is a random surfer who is given a web page at random and keeps clicking on links, never hitting back but eventually gets bored and starts on another random page.”
- The PageRank of a site is the probability that a user will end up on that site, or equivalently the fraction of time spent on that site in the long run