Mathematical Foundations for Search and Surf Engines.


Jean-Louis Lassez

Ryan Rossi

Lecture 1 & 2: Introduction, Ergodic Theorem, Perron-Frobenius Theorem, Power Method, Foundations of PageRank

Search Engines. Search engine: deterministic. Ranking of sites. Mathematical foundations: Markov and Perron-Frobenius. Surf Engines. Non-deterministic: window shopping. Ranking of links. Mathematical foundations: Singular Value Decomposition.

Initial approach to ranking sites: the in-degree heuristic

Google’s PageRank approach. Problem: rank the sites in order of most visited.

Another Approach: Hubs and Authorities. Authorities: sites that contain the most important information. Hubs: sites that provide directions to the authorities. [Figure: nodes labeled A and H]

Ranking Hyperlinks: Local, Global, Update

Local [figure not recoverable]

Global [figure not recoverable]

Update [Figure: graph with nodes A to K labeled 2005 and node L labeled 2006, marked Updated]

Part I: Ranking Sites. The Web as a Markov Chain. The Ergodic Theorem. Perron-Frobenius Theorem. Algorithms: PageRank, HITS, SALSA.

Markov Chains: Text Analysis, Speech Recognition, Statistical Mechanics, Decision Science (Medicine, Commerce) ... more recently ...

Bioinformatics [Figure: state diagram with states M1, M2, M3, d1, d2, I1, I2]

Internet

Markov’s Ergodic Theorem (1906) Any irreducible, finite, aperiodic Markov Chain has all states Ergodic (reachable at any time in the future) and has a unique stationary distribution, which is a probability vector.

[Figure: four-state chain s1, s2, s3, s4] Probabilities after
15 steps: X1 = .33, X2 = .26, X3 = .26, X4 = …
… steps: X1 = .33, X2 = .26, X3 = .23, X4 = …
… steps: X1 = .37, X2 = .29, X3 = .21, X4 = …
… steps: X1 = .38, X2 = .28, X3 = .21, X4 = …
… steps: X1 = .38, X2 = .28, X3 = .21, X4 = .13

[Figure: the same four-state chain, different starting point] Probabilities after
15 steps: X1 = .46, X2 = .20, X3 = .26, X4 = …
… steps: X1 = .36, X2 = .26, X3 = .23, X4 = …
… steps: X1 = .38, X2 = .28, X3 = .21, X4 = …
… steps: X1 = .38, X2 = .28, X3 = .21, X4 = …
… steps: X1 = .38, X2 = .28, X3 = .21, X4 = .13

Steady State Constraints
$$\begin{aligned}
p_{11}x_1 + p_{21}x_2 + p_{31}x_3 &= x_1 \\
p_{12}x_1 + p_{22}x_2 + p_{32}x_3 &= x_2 \\
p_{13}x_1 + p_{23}x_2 + p_{33}x_3 &= x_3 \\
\sum_i x_i = 1, \quad x_i &\ge 0
\end{aligned}$$
$x_1, x_2, x_3$ are probabilities; $p_{21}$ is the probability of going to $s_1$ from $s_2$.

Solved by the Power Iteration Method, based on the Perron-Frobenius Theorem: $x = \lim_{k \to \infty} M^k e$, where $e$ is an initial vector and $M$ is the stochastic matrix associated with the system.
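The iteration can be sketched in a few lines. This is a minimal illustration, not the deck's own example: the matrix `M`, the starting vector `e`, the function name `power_iteration` and the tolerance are all assumptions. `M[i][j]` is taken to be the probability of going from state j to state i, so each column sums to 1 and the steady state satisfies x = M x.

```python
def power_iteration(M, e, steps=200, tol=1e-12):
    """Repeatedly apply x <- M x until successive vectors differ by less than tol."""
    x = list(e)
    n = len(x)
    for _ in range(steps):
        nxt = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x

# Hypothetical 3-state chain; column j holds the probabilities of leaving state j.
M = [
    [0.5, 0.25, 0.0],
    [0.5, 0.5, 0.5],
    [0.0, 0.25, 0.5],
]
e = [1.0, 0.0, 0.0]  # start with all probability on the first state
x = power_iteration(M, e)
print(x)  # close to [0.25, 0.5, 0.25]
```

Every state of this assumed chain has a self-loop, so the chain is aperiodic and the iteration settles down.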

Power Method Example [matrix M and initial vector e: entries not recoverable]

Difficulties: Proof, Design and Implementation. Notion of convergence. Deal with complex numbers. Unique solution. ……… Furthermore, it does not always work.

The process does not converge, even though the solution is obvious
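The slide's figure is not in the transcript, so the following is only an assumed illustration of the failure: a two-state cycle in which each state moves to the other with probability 1. The steady state is obviously (1/2, 1/2), yet the iterates started from one state keep swapping.

```python
def step(M, x):
    """One power-iteration step: x <- M x."""
    n = len(x)
    return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]

# Hypothetical periodic chain: s1 -> s2 and s2 -> s1 with probability 1.
M = [[0.0, 1.0],
     [1.0, 0.0]]

x = [1.0, 0.0]
history = [x]
for _ in range(4):
    x = step(M, x)
    history.append(x)
print(history)  # alternates between [1.0, 0.0] and [0.0, 1.0]

# The obvious steady state is a fixed point of the same step.
stationary = [0.5, 0.5]
print(step(M, stationary) == stationary)
```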

Perron-Frobenius Theorem: Is it really necessary?

Secret Weapon: Fourier, Gauss, Tarski, Robinson. The Power of Elimination.

The elimination of the proof is an ideal seldom reached. Elimination gives us the heart of the proof.

Symbolic Gaussian Elimination. Three variables:
$$\begin{aligned}
p_{11}x_1 + p_{21}x_2 + p_{31}x_3 &= x_1 \\
p_{12}x_1 + p_{22}x_2 + p_{32}x_3 &= x_2 \\
p_{13}x_1 + p_{23}x_2 + p_{33}x_3 &= x_3 \\
\sum_i x_i &= 1
\end{aligned}$$
With Maple we get:
$$\begin{aligned}
x_1 &= (p_{31}p_{21} + p_{31}p_{23} + p_{32}p_{21}) / \Sigma \\
x_2 &= (p_{13}p_{32} + p_{12}p_{31} + p_{12}p_{32}) / \Sigma \\
x_3 &= (p_{13}p_{21} + p_{12}p_{23} + p_{13}p_{23}) / \Sigma \\
\Sigma &= p_{31}p_{21} + p_{31}p_{23} + p_{32}p_{21} + p_{13}p_{32} + p_{12}p_{31} + p_{12}p_{32} + p_{13}p_{21} + p_{12}p_{23} + p_{13}p_{23}
\end{aligned}$$
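The closed form can be checked numerically. The sketch below plugs hypothetical transition probabilities (not from the slides) into the three formulas and confirms, with exact fractions, that the result satisfies the steady-state equations. The function name `stationary_three_states` and the dictionary layout `p[i, j]` are illustrative choices.

```python
from fractions import Fraction as F

def stationary_three_states(p):
    """Closed form for three states; p[i, j] is the probability of going from s_i to s_j."""
    n1 = p[3, 1] * p[2, 1] + p[3, 1] * p[2, 3] + p[3, 2] * p[2, 1]
    n2 = p[1, 3] * p[3, 2] + p[1, 2] * p[3, 1] + p[1, 2] * p[3, 2]
    n3 = p[1, 3] * p[2, 1] + p[1, 2] * p[2, 3] + p[1, 3] * p[2, 3]
    total = n1 + n2 + n3  # the denominator called Sigma
    return [n1 / total, n2 / total, n3 / total]

# Hypothetical transition probabilities; each row sums to 1.
rows = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 2), F(0), F(1, 2)],
]
p = {(i + 1, j + 1): rows[i][j] for i in range(3) for j in range(3)}
x = stationary_three_states(p)

# Steady-state equations: sum over j of p[j, i] * x_j equals x_i.
for i in (1, 2, 3):
    assert sum(p[j, i] * x[j - 1] for j in (1, 2, 3)) == x[i - 1]
assert sum(x) == 1
print(x)  # [Fraction(8, 17), Fraction(3, 17), Fraction(6, 17)]
```

Note that the self-loop probabilities p11, p22, p33 never appear in the numerators, only in the check.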

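[Figure: three-state chain s1, s2, s3 with transitions p11, p12, p13, p21, p22, p23, p31, p32, p33]
$$\begin{aligned}
x_1 &= (p_{31}p_{21} + p_{23}p_{31} + p_{32}p_{21}) / \Sigma \\
x_2 &= (p_{13}p_{32} + p_{31}p_{12} + p_{12}p_{32}) / \Sigma \\
x_3 &= (p_{21}p_{13} + p_{12}p_{23} + p_{13}p_{23}) / \Sigma
\end{aligned}$$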


Symbolic Gaussian Elimination. System with four variables:
$$\begin{aligned}
p_{21}x_2 + p_{31}x_3 + p_{41}x_4 &= x_1 \\
p_{12}x_1 + p_{42}x_4 &= x_2 \\
p_{13}x_1 &= x_3 \\
p_{34}x_3 &= x_4 \\
\sum_i x_i &= 1
\end{aligned}$$

[Figure: four-state chain s1, s2, s3, s4 with transitions p12, p13, p21, p31, p34, p41, p42] With Maple we get:
$$\begin{aligned}
x_1 &= (p_{21}p_{34}p_{41} + p_{34}p_{42}p_{21} + p_{21}p_{31}p_{41} + p_{31}p_{42}p_{21}) / \Sigma \\
x_2 &= (p_{31}p_{41}p_{12} + p_{31}p_{42}p_{12} + p_{34}p_{41}p_{12} + p_{34}p_{42}p_{12} + p_{13}p_{34}p_{42}) / \Sigma \\
x_3 &= (p_{41}p_{21}p_{13} + p_{42}p_{21}p_{13}) / \Sigma \\
x_4 &= p_{21}p_{13}p_{34} / \Sigma \\
\Sigma &= p_{21}p_{34}p_{41} + p_{34}p_{42}p_{21} + p_{21}p_{31}p_{41} + p_{31}p_{42}p_{21} + p_{31}p_{41}p_{12} + p_{31}p_{42}p_{12} + p_{34}p_{41}p_{12} + p_{34}p_{42}p_{12} + p_{13}p_{34}p_{42} + p_{41}p_{21}p_{13} + p_{42}p_{21}p_{13} + p_{21}p_{13}p_{34}
\end{aligned}$$
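Without a computer algebra system, the same system can be solved for particular numbers by ordinary Gaussian elimination over exact fractions. The sketch below uses hypothetical values for the seven transition probabilities and compares the result with the four-variable closed form. The helper `solve` is an illustrative Gauss-Jordan routine, not Maple's method.

```python
from fractions import Fraction as F

def solve(A, b):
    """Gauss-Jordan elimination over exact fractions; assumes A is square and nonsingular."""
    n = len(A)
    rows = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [v / pv for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * c for a, c in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

# Hypothetical values; the probabilities leaving each state sum to 1.
p12, p13 = F(1, 2), F(1, 2)
p21 = F(1)
p31, p34 = F(1, 3), F(2, 3)
p41, p42 = F(1, 4), F(3, 4)

# First three steady-state equations rewritten as (...) = 0, plus sum of x_i = 1.
A = [
    [F(-1), p21, p31, p41],
    [p12, F(-1), F(0), p42],
    [p13, F(0), F(-1), F(0)],
    [F(1), F(1), F(1), F(1)],
]
b = [F(0), F(0), F(0), F(1)]
x = solve(A, b)

# Closed form for comparison.
n1 = p21 * p34 * p41 + p34 * p42 * p21 + p21 * p31 * p41 + p31 * p42 * p21
n2 = (p31 * p41 * p12 + p31 * p42 * p12 + p34 * p41 * p12
      + p34 * p42 * p12 + p13 * p34 * p42)
n3 = p41 * p21 * p13 + p42 * p21 * p13
n4 = p21 * p13 * p34
sigma = n1 + n2 + n3 + n4
closed = [n / sigma for n in (n1, n2, n3, n4)]
print(x == closed)
```

The fourth equation, p34 x3 = x4, is left out of `A` because it is implied by the other three together with the row sums; the normalization takes its place.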


Do you see the LIGHT? James Brown (The Blues Brothers)


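Ergodic Theorem Revisited. If there exists a reverse spanning tree in a graph of the Markov chain associated to a stochastic system, then:
(a) the stochastic system admits the following probability vector as a solution: $x_i = \frac{1}{\Sigma} \sum_{T \in \mathcal{T}_i} \prod_{(j,k) \in T} p_{jk}$, where $\mathcal{T}_i$ is the set of reverse spanning trees rooted at $s_i$ and $\Sigma$ is the sum of all the numerators;
(b) the solution is unique;
(c) the conditions $\{x_i \ge 0\}_{i=1,\dots,n}$ are redundant and the solution can be computed by Gaussian elimination.

Cycle: this system is now covered by the proof.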
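The tree reading of the solution can be tried out directly. The sketch below is a brute-force illustration under assumed names (`reaches`, `tree_weights`, `stationary`) and hypothetical probabilities: for each root it lets every other state pick one outgoing edge, keeps the picks in which every state reaches the root (a reverse spanning tree), and sums the products of the chosen edge probabilities.

```python
from fractions import Fraction as F
from itertools import product

def reaches(root, start, succ, n):
    """Follow the chosen edges from start; True if the root is hit within n moves."""
    node = start
    for _ in range(n):
        if node == root:
            return True
        node = succ[node]
    return node == root

def tree_weights(P):
    """For each root, sum the edge products over all reverse spanning trees.

    P[i][j] is the probability of going from state i to state j (0-based).
    """
    n = len(P)
    weights = []
    for root in range(n):
        others = [i for i in range(n) if i != root]
        # Each non-root state picks one outgoing edge of nonzero probability.
        choices = [[j for j in range(n) if j != i and P[i][j] != 0] for i in others]
        total = F(0)
        for pick in product(*choices):
            succ = dict(zip(others, pick))
            if all(reaches(root, i, succ, n) for i in others):
                w = F(1)
                for i in others:
                    w *= P[i][succ[i]]
                total += w
        weights.append(total)
    return weights

def stationary(P):
    """Normalize the tree weights so that they sum to 1."""
    w = tree_weights(P)
    s = sum(w)
    return [v / s for v in w]

# Hypothetical values for the four-state chain with edges
# s1->s2, s1->s3, s2->s1, s3->s1, s3->s4, s4->s1, s4->s2.
P = [
    [F(0), F(1, 2), F(1, 2), F(0)],
    [F(1), F(0), F(0), F(0)],
    [F(1, 3), F(0), F(0), F(2, 3)],
    [F(1, 4), F(3, 4), F(0), F(0)],
]
x = stationary(P)
print(x)  # [Fraction(12, 31), Fraction(9, 31), Fraction(6, 31), Fraction(4, 31)]
```

The enumeration is exponential in the number of states, so it is only meant to make the statement concrete on small chains; Gaussian elimination remains the practical route.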

Markov Chain as a Conservation System
[Figure: three-state chain s1, s2, s3 with transitions p11, p12, p13, p21, p22, p23, p31, p32, p33]
$$\begin{aligned}
p_{11}x_1 + p_{21}x_2 + p_{31}x_3 &= x_1 (p_{11} + p_{12} + p_{13}) \\
p_{12}x_1 + p_{22}x_2 + p_{32}x_3 &= x_2 (p_{21} + p_{22} + p_{23}) \\
p_{13}x_1 + p_{23}x_2 + p_{33}x_3 &= x_3 (p_{31} + p_{32} + p_{33})
\end{aligned}$$
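Read this way, the left side of each equation is the probability flowing into a state and the right side is the probability flowing out of it. A small check with hypothetical numbers (the matrix `P` and its stationary vector `x` are assumed for illustration; `x` satisfies the steady-state equations for this `P`):

```python
from fractions import Fraction as F

# Hypothetical chain; P[i][j] is the probability of going from state i to state j.
P = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 2), F(0), F(1, 2)],
]
x = [F(8, 17), F(3, 17), F(6, 17)]  # stationary vector of this P

# flow[i][j]: probability carried along the edge i -> j at steady state.
flow = [[x[i] * P[i][j] for j in range(3)] for i in range(3)]
inflow = [sum(flow[j][i] for j in range(3)) for i in range(3)]
outflow = [sum(flow[i]) for i in range(3)]
print(inflow == outflow)  # flow is conserved at every state
```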

Kirchhoff’s Current Law (1847). The sum of currents flowing towards a node is equal to the sum of currents flowing away from the node.
[Figure: circuit with currents i_12, i_13, i_21, i_31, i_32]
$$\begin{aligned}
i_{31} + i_{21} &= i_{12} + i_{13} \\
i_{32} + i_{12} &= i_{21} \\
i_{13} &= i_{31} + i_{32}
\end{aligned}$$
$$i_3 + i_2 = i_1 + i_4$$

Kirchhoff’s Matrix Tree Theorem (1847) The theorem allows us to calculate the number of spanning trees of a connected graph
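For an undirected graph the count comes from the graph Laplacian: delete one row and the matching column, and take the determinant. The sketch below assumes that standard formulation; the function names `det` and `count_spanning_trees` and the two test graphs are illustrative.

```python
from fractions import Fraction as F

def det(A):
    """Determinant by Gaussian elimination over exact fractions."""
    A = [[F(v) for v in row] for row in A]
    n = len(A)
    d = F(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return F(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            d = -d  # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return d

def count_spanning_trees(n, edges):
    """Build the Laplacian (degrees minus adjacency), drop row and column 0, take the determinant."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    minor = [row[1:] for row in L[1:]]
    return int(det(minor))

triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
print(count_spanning_trees(3, triangle))  # 3
print(count_spanning_trees(4, k4))        # 16
```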

Two theorems for the price of one!! Differences. Ergodic Theorem: the symbolic proof is simpler and more appropriate than using Perron-Frobenius. Kirchhoff Theorem: our version calculates the spanning trees and not their total sum.

Internet Sites Kleinberg Google SALSA Indegree Heuristic

Google PageRank Patent “The rank of a page can be interpreted as the probability that a surfer will be at the page after following a large number of forward links.” The Ergodic Theorem

Google PageRank Patent “The iteration circulates the probability through the linked nodes like energy flows through a circuit and accumulates in important places.” Kirchhoff (1847)

Rank Sinks: No Spanning Tree

References
Kirchhoff, G. "Über die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Verteilung galvanischer Ströme geführt wird." Ann. Phys. Chem. 72.
А. А. Марков. "Распространение закона больших чисел на величины, зависящие друг от друга." Известия Физико-математического общества при Казанском университете, 2-я серия, том 15.
H. Minkowski, Geometrie der Zahlen (Leipzig, 1896).
J. Farkas, "Theorie der einfachen Ungleichungen," J. F. Reine u. Ang. Mat., 124, 1-27.
H. Weyl, "Elementare Theorie der konvexen Polyeder," Comm. Helvet., 7.
A. Charnes, W. W. Cooper, "The Strong Minkowski Farkas-Weyl Theorem for Vector Spaces Over Ordered Fields," Proceedings of the National Academy of Sciences, 1958.
E. Steinitz. Bedingt Konvergente Reihen und Konvexe Systeme. J. reine angew. Math., 146:1-52.
K. Jeev, J-L. Lassez: Symbolic Stochastic Systems. MSV/AMCS 2004.
V. Chandru, J-L. Lassez: Qualitative Theorem Proving in Linear Constraints. Verification: Theory and Practice 2003.
Jean-Louis Lassez: From LP to LP: Programming with Constraints. DBPL 1991.