Mathematical Foundations for Search and Surf Engines
Jean-Louis Lassez
Ryan Rossi
Lecture 1 & 2 Introduction Ergodic Theorem Perron-Frobenius Theorem Power Method Foundations of PageRank
Search Engines Search engine: deterministic Ranking of sites Mathematical foundations: Markov and Perron Frobenius Surf Engines Non deterministic: window shopping Ranking of links Mathematical foundations: Singular Value Decomposition
Initial approach to ranking sites: the in-degree heuristic
Google’s PageRank approach Problem: rank the sites in order of most visited
Another Approach: Hubs and Authorities Authorities: sites that contain the most important information Hubs: sites that provide directions to the authorities A H
Ranking Hyperlinks Local Global Update
Local ….. ? ? ?
Global ?? …..
2005 A C DF E B G H J I K 2006 L Updated
Part I: Ranking Sites The Web as a Markov Chain The Ergodic Theorem Perron-Frobenius Theorem Algorithms: PageRank, HITS, SALSA
Markov Chains Text Analysis Speech Recognition Statistical Mechanics Decision Science: Medicine, Commerce ….more recently…..
Bioinformatics M1M1 M2M2 M3M3 d1d1 I1I1 d2d2 I2I2 d1d1 I1I1
Internet
Markov’s Ergodic Theorem (1906) Any irreducible, finite, aperiodic Markov Chain has all states Ergodic (reachable at any time in the future) and has a unique stationary distribution, which is a probability vector.
s1s s2s2 s3s3 s4s Probabilities after 15 steps X1 =.33 X2 =.26 X3 =.26 X4 = steps X1 =.33 X2 =.26 X3 =.23 X4 = steps X1 =.37 X2 =.29 X3 =.21 X4 = steps X1 =.38 X2 =.28 X3 =.21 X4 = steps X1 =.38 X2 =.28 X3 =.21 X4 =.13
s1s s2s2 s3s3 s4s Probabilities after 15 steps X1 =.46 X2 =.20 X3 =.26 X4 = steps X1 =.36 X2 =.26 X3 =.23 X4 = steps X1 =.38 X2 =.28 X3 =.21 X4 = steps X1 =.38 X2 =.28 X3 =.21 X4 = steps X1 =.38 X2 =.28 X3 =.21 X4 =.13
p 11 x 1 + p 21 x 2 + p 31 x 3 = x 1 p 12 x 1 + p 22 x 2 + p 32 x 3 = x 2 p 13 x 1 + p 23 x 2 + p 33 x 3 = x 3 ∑x i = 1 x i ≥ 0 p 11 x 1 + p 21 x 2 + p 31 x 3 = x 1 p 12 x 1 + p 22 x 2 + p 32 x 3 = x 2 p 13 x 1 + p 23 x 2 + p 33 x 3 = x 3 ∑x i = 1 x i ≥ 0 X1, X2, X3 are probabilities p 11 x 1 + p 21 x 2 + p 31 x 3 = x 1 p 12 x 1 + p 22 x 2 + p 32 x 3 = x 2 p 13 x 1 + p 23 x 2 + p 33 x 3 = x 3 ∑x i = 1 x i ≥ 0 Probability of going to S1 from S2 Steady State Constraints
Solved by Power Iteration Method Based on Perron-Frobenius Theorem Where e is an initial vector and M is the stochastic matrix associated to the system.
Power Method Example M = /2 0 1/ / e =
Power Method Example M = /2 0 1/ / e =
Difficulties Proof, Design and Implementation Notion of convergence Deal with complex numbers Unique solution ……… Furthermore, it does not always work
The process does not converge, even though the solution is obvious
Perron-Frobenius Theorem: Is it really necessary?
Secret Weapon: Fourier Gauss Tarski Robinson The Power of Elimination
The elimination of the proof is an ideal seldom reached Elimination gives us the heart of the proof
Symbolic Gaussian Elimination Three variables: p 11 x 1 + p 21 x 2 + p 31 x 3 = x 1 p 12 x 1 + p 22 x 2 + p 32 x 3 = x 2 p 13 x 1 + p 23 x 2 + p 33 x 3 = x 3 ∑x i = 1 with Maple we get: x 1 = (p 31 p 21 + p 31 p 23 + p 32 p 21 ) / Σ x 2 = (p 13 p 32 + p 12 p 31 + p 12 p 32 ) / Σ x 3 = (p 13 p 21 + p 12 p 23 + p 13 p 23 ) / Σ Σ = (p 31 p 21 + p 31 p 23 + p 32 p 21 + p 13 p 32 + p 12 p 31 + p 12 p 32 + p 13 p 21 + p 12 p 23 + p 13 p 23 )
s1s1 p 12 p 21 p 11 s2s2 p 22 s3s3 p 33 p 13 p 23 p 31 p 32 x 1 = (p 31 p 21 + p 23 p 31 + p 32 p 21 ) / Σ x 2 = (p 13 p 32 + p 31 p 12 + p 12 p 32 ) / Σ x 3 = (p 21 p 13 + p 12 p 23 + p 13 p 23 ) / Σ
s1s1 p 12 p 21 p 11 s2s2 p 22 s3s3 p 33 p 13 p 23 p 31 p 32 x 1 = (p 31 p 21 + p 23 p 31 + p 32 p 21 ) / Σ x 2 = (p 13 p 32 + p 31 p 12 + p 12 p 32 ) / Σ x 3 = (p 21 p 13 + p 12 p 23 + p 13 p 23 ) / Σ
s1s1 p 12 p 21 p 11 s2s2 p 22 s3s3 p 33 p 13 p 23 p 31 p 32 x 1 = (p 31 p 21 + p 23 p 31 + p 32 p 21 ) / Σ x 2 = (p 13 p 32 + p 31 p 12 + p 12 p 32 ) / Σ x 3 = (p 21 p 13 + p 12 p 23 + p 13 p 23 ) / Σ
s1s1 p 12 p 21 p 11 s2s2 p 22 s3s3 p 33 p 13 p 23 p 31 p 32 x 1 = (p 31 p 21 + p 23 p 31 + p 32 p 21 ) / Σ x 2 = (p 13 p 32 + p 31 p 12 + p 12 p 32 ) / Σ x 3 = (p 21 p 13 + p 12 p 23 + p 13 p 23 ) / Σ
s1s1 p 12 p 21 p 11 s2s2 p 22 s3s3 p 33 p 13 p 23 p 31 p 32 x 1 = (p 31 p 21 + p 23 p 31 + p 32 p 21 ) / Σ x 2 = (p 13 p 32 + p 31 p 12 + p 12 p 32 ) / Σ x 3 = (p 21 p 13 + p 12 p 23 + p 13 p 23 ) / Σ
s1s1 p 12 p 21 p 11 s2s2 p 22 s3s3 p 33 p 13 p 23 p 31 p 32 x 1 = (p 31 p 21 + p 23 p 31 + p 32 p 21 ) / Σ x 2 = (p 13 p 32 + p 31 p 12 + p 12 p 32 ) / Σ x 3 = (p 21 p 13 + p 12 p 23 + p 13 p 23 ) / Σ
Symbolic Gaussian Elimination System with four variables: p 21 x 2 + p 31 x 3 + p 41 x 4 = x 1 p 12 x 1 + p 42 x 4 = x 2 p 13 x 1 = x 3 p 34 x 3 = x 4 ∑x i = 1
s1s1 p 12 p 21 s2s2 s3s3 p 31 s4s4 p 41 p 34 p 42 p 13 x 1 = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 / Σ x 2 = p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 / Σ x 3 = p 41 p 21 p 13 + p 42 p 21 p 13 / Σ x 4 = p 21 p 13 p 34 / Σ Σ = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 + p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 + p 41 p 21 p 13 + p 42 p 21 p 13 + p 21 p 13 p 34 with Maple we get:
s1s1 p 12 p 21 s2s2 s3s3 p 31 s4s4 p 41 p 34 p 42 p 13 x 1 = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 / Σ x 2 = p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 / Σ x 3 = p 41 p 21 p 13 + p 42 p 21 p 13 / Σ x 4 = p 21 p 13 p 34 / Σ Σ = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 + p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 + p 41 p 21 p 13 + p 42 p 21 p 13 + p 21 p 13 p 34
s1s1 p 12 p 21 s2s2 s3s3 p 31 s4s4 p 41 p 34 p 42 p 13 x 1 = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 / Σ x 2 = p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 / Σ x 3 = p 41 p 21 p 13 + p 42 p 21 p 13 / Σ x 4 = p 21 p 13 p 34 / Σ Σ = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 + p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 + p 41 p 21 p 13 + p 42 p 21 p 13 + p 21 p 13 p 34
s1s1 p 12 p 21 s2s2 s3s3 p 31 s4s4 p 41 p 34 p 42 p 13 x 1 = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 / Σ x 2 = p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 / Σ x 3 = p 41 p 21 p 13 + p 42 p 21 p 13 / Σ x 4 = p 21 p 13 p 34 / Σ Σ = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 + p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 + p 41 p 21 p 13 + p 42 p 21 p 13 + p 21 p 13 p 34
s1s1 p 12 p 21 s2s2 s3s3 p 31 s4s4 p 41 p 34 p 42 p 13 x 1 = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 / Σ x 2 = p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 / Σ x 3 = p 41 p 21 p 13 + p 42 p 21 p 13 / Σ x 4 = p 21 p 13 p 34 / Σ Σ = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 + p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 + p 41 p 21 p 13 + p 42 p 21 p 13 + p 21 p 13 p 34
Do you see the LIGHT? James Brown (The Blues Brothers)
s1s1 p 12 p 21 s2s2 s3s3 p 31 s4s4 p 41 p 34 p 42 p 13 x 1 = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 / Σ x 2 = p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 / Σ x 3 = p 41 p 21 p 13 + p 42 p 21 p 13 / Σ x 4 = p 21 p 13 p 34 / Σ Σ = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 + p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 + p 41 p 21 p 13 + p 42 p 21 p 13 + p 21 p 13 p 34
Ergodic Theorem Revisited If there exists a reverse spanning tree in a graph of the Markov chain associated to a stochastic system, then: (a)the stochastic system admits the following probability vector as a solution: (b) the solution is unique. (c) the conditions {x i ≥ 0} i=1,n are redundant and the solution can be computed by Gaussian elimination.
Cycle This system is now covered by the proof
Markov Chain as a Conservation System s1s1 p 12 p 21 p 11 s2s2 p 22 s3s3 p 33 p 13 p 23 p 31 p 32 p 11 x 1 + p 21 x 2 + p 31 x 3 = x 1 (p 11 + p 12 + p 13 ) p 12 x 1 + p 22 x 2 + p 32 x 3 = x 2 (p 21 + p 22 + p 23 ) p 13 x 1 + p 23 x 2 + p 33 x 3 = x 3 (p 31 + p 32 + p 33 )
Kirchoff’s Current Law (1847) The sum of currents flowing towards a node is equal to the sum of currents flowing away from the node. i 31 + i 21 = i 12 + i 13 i 32 + i 12 = i 21 i 13 = i 31 + i i 13 i 31 i 32 i 12 i 21 i 3 + i 2 = i 1 + i 4
Kirchhoff’s Matrix Tree Theorem (1847) The theorem allows us to calculate the number of spanning trees of a connected graph
Two theorems for the price of one!! Differences Ergodic Theorem: The symbolic proof is simpler and more appropriate than using Perron-Frobenius Kirchhoff Theorem: Our version calculates the spanning trees and not their total sum.
Internet Sites Kleinberg Google SALSA Indegree Heuristic
Google PageRank Patent “The rank of a page can be interpreted as the probability that a surfer will be at the page after following a large number of forward links.” The Ergodic Theorem
Google PageRank Patent “The iteration circulates the probability through the linked nodes like energy flows through a circuit and accumulates in important places.” Kirchoff (1847)
Rank Sinks No Spanning Tree
References Kirchhoff, G. "Über die Auflösung der Gleichungen, auf welche man bei der untersuchung der linearen verteilung galvanischer Ströme geführt wird." Ann. Phys. Chem. 72, , А. А. Марков. "Распространение закона больших чисел на величины, зависящие друг от друга". "Известия Физико-математического общества при Казанском университете", 2-я серия, том 15, ст , H. Minkowski, Geometrie der Zahlen (Leipzig, 1896). J. Farkas, "Theorie der einfachen Ungleichungen," J. F. Reine u. Ang. Mat., 124, 1-27, H. Weyl, "Elementare Theorie der konvexen Polyeder," Comm. Helvet., 7, , A. Charnes, W. W. Cooper, “The Strong Minkowski Farkas-Weyl Theorem for Vector Spaces Over Ordered Fields,” Proceedings of the National Academy of Sciences, pp , 1958.
References E. Steinitz. Bedingt Konvergente Reihen und Konvexe Systeme. J. reine angew. Math., 146:1-52, K. Jeev, J-L. Lassez: Symbolic Stochastic Systems. MSV/AMCS 2004: V. Chandru, J-L. Lassez: Qualitative Theorem Proving in Linear Constraints. Verification: Theory and Practice 2003: Jean-Louis Lassez: From LP to LP: Programming with Constraints. DBPL 1991 :