1
Inverted Index
Allows quick lookup of the document ids that contain a particular word
– Lexicon/dictionary (DIC): Stanford, UCLA, MIT, …
– Posting lists (PL):
PL(Stanford) = 3, 8, 10, 13, 16, 20
PL(UCLA) = 1, 2, 3, 9, 16, 18
PL(MIT) = 4, 5, 8, 10, 13, 19, 20, 22
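A minimal sketch of the lookup structure described above. The toy documents and the helper names `build_inverted_index` and `intersect` are hypothetical, chosen for illustration, and do not come from the slides.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to a sorted posting list of the document ids containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):                  # ascending ids keep each list sorted
        for word in set(docs[doc_id].lower().split()):
            index[word].append(doc_id)
    return dict(index)

def intersect(pl1, pl2):
    """Merge two sorted posting lists, keeping only ids present in both."""
    i = j = 0
    out = []
    while i < len(pl1) and j < len(pl2):
        if pl1[i] == pl2[j]:
            out.append(pl1[i])
            i += 1
            j += 1
        elif pl1[i] < pl2[j]:
            i += 1
        else:
            j += 1
    return out

# Hypothetical toy collection (not from the slides)
docs = {1: "UCLA computer science", 2: "Stanford and MIT", 3: "Stanford UCLA"}
index = build_inverted_index(docs)
both = intersect(index["stanford"], index["ucla"])  # documents mentioning both words
```

Keeping each posting list sorted lets a two-word query be answered with a single linear merge of the two lists.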
2
Junghoo "John" Cho (UCLA Computer Science)
PageRank
A page is important if it is pointed to by many important pages
$PR(p) = PR(p_1)/c_1 + \dots + PR(p_k)/c_k$
– $p_i$: page pointing to $p$; $c_i$: number of links in $p_i$
PageRank of p is the sum of its parents' PageRanks, each divided by that parent's number of links
One equation for every page – N equations, N unknown variables
3
Example: Web of 1842
Pages: Netscape (n), Microsoft (m) and Amazon (a)
$PR(n) = PR(n)/2 + PR(a)/2$
$PR(m) = PR(a)/2$
$PR(a) = PR(n)/2 + PR(m)$
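The three equations above fix only the ratios between the PageRanks. As a worked sketch, assume the total importance is 3 (one unit per page, a normalization chosen here for illustration); the system then solves as follows.

```latex
\begin{align*}
PR(n) &= \tfrac{1}{2}PR(n) + \tfrac{1}{2}PR(a) &&\Rightarrow\; PR(n) = PR(a) \\
PR(m) &= \tfrac{1}{2}PR(a) \\
PR(n) + PR(m) + PR(a) &= 3 &&\Rightarrow\; \tfrac{5}{2}\,PR(a) = 3 \\
PR(a) &= \tfrac{6}{5}, \qquad PR(n) = \tfrac{6}{5}, \qquad PR(m) = \tfrac{3}{5}
\end{align*}
```

The remaining equation checks out: $PR(n)/2 + PR(m) = 3/5 + 3/5 = 6/5 = PR(a)$.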
4
PageRank: Matrix Notation
Web graph matrix $M = \{m_{ij}\}$
– Each page i corresponds to row i and column i of the matrix M
– $m_{ij} = 1/c$ if page i is one of the c children of page j
– $m_{ij} = 0$ otherwise
PageRank vector, PageRank equation [formulas not preserved]
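The matrix definition can be sketched in code as follows. The function name `build_matrix`, the `children` dictionary, and the page order (n, m, a) are assumptions made for illustration; the link structure is read off the Netscape/Microsoft/Amazon example equations. Writing r for the vector of PageRank values (a symbol assumed here), the per-page equations then take the form r = M r.

```python
def build_matrix(pages, children):
    """Return M with M[i][j] = 1/c if page i is one of the c children of page j, else 0."""
    idx = {p: k for k, p in enumerate(pages)}
    n = len(pages)
    M = [[0.0] * n for _ in range(n)]
    for j in pages:
        c = len(children[j])                 # number of links in page j
        for i in children[j]:
            M[idx[i]][idx[j]] = 1.0 / c      # column j spreads page j's importance
    return M

# Links inferred from the example equations (assumed page order: n, m, a)
pages = ["n", "m", "a"]
children = {"n": ["n", "a"], "m": ["a"], "a": ["n", "m"]}
M = build_matrix(pages, children)

# Each column sums to 1 because every page shares all of its importance
column_sums = [sum(M[i][j] for i in range(len(pages))) for j in range(len(pages))]
```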
5
PageRank: Iterative Computation
Initially every page has a unit of importance
At each round, each page shares its importance among its children and receives new importance from its parents
Eventually the importance of each page reaches a limit
– Stochastic matrix
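A minimal sketch of the rounds described above, using the three-page Netscape/Microsoft/Amazon example. The function name `pagerank_iterate`, the page order (n, m, a), and the fixed number of rounds are assumptions for illustration, not part of the slides.

```python
def pagerank_iterate(M, rounds=100):
    """Repeatedly apply r <- M r, starting from one unit of importance per page."""
    n = len(M)
    r = [1.0] * n
    for _ in range(rounds):
        # Page i receives M[i][j] * r[j] from each parent j
        r = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
    return r

# Column j holds the shares page j gives to its children (assumed order: n, m, a)
M = [[0.5, 0.0, 0.5],
     [0.0, 0.0, 0.5],
     [0.5, 1.0, 0.0]]

r = pagerank_iterate(M)
print([round(x, 3) for x in r])   # approaches PR(n) = 6/5, PR(m) = 3/5, PR(a) = 6/5
```

Because every column of M sums to 1, the total importance stays at 3 in every round; only its distribution among the pages changes.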
6
Example: Web of 1842
[Figure: link graph of Netscape (Ne), Amazon (Am) and Microsoft (MS)]
7
PageRank: Random Surfer Model
The PageRank of a page is the probability that a Web surfer reaches the page after many clicks, following random links
[Figure: random click]
8
Problems on the Real Web
Dead end
– A page with no links through which to send its importance
– All importance “leaks out of” the Web
Crawler trap
– A group of one or more pages that have no links out of the group
– The group accumulates all the importance of the Web
9
Example: Dead End
No link from Microsoft, so Microsoft is a dead end
[Figure: link graph of Netscape (Ne), Amazon (Am) and Microsoft (MS)]
10
Example: Dead End
[Figure: link graph of Netscape (Ne), Amazon (Am) and Microsoft (MS)]
11
Solution to Dead End
Assume a surfer jumps to a random page at a dead end
[Figure: link graph of Netscape (Ne), Amazon (Am) and Microsoft (MS)]
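A sketch of both the leak and the fix, under stated assumptions: the matrix below is the three-page example with Microsoft's outgoing link removed, the page order is (n, m, a), and the helper names `iterate` and `fix_dead_ends` are hypothetical. The random jump is modeled by giving a dead-end page a link to every page, itself included.

```python
def iterate(M, rounds=100):
    """Apply r <- M r repeatedly, starting from one unit of importance per page."""
    n = len(M)
    r = [1.0] * n
    for _ in range(rounds):
        r = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
    return r

def fix_dead_ends(M):
    """Replace each all-zero column (a dead end) with a uniform jump to any page."""
    n = len(M)
    fixed = [row[:] for row in M]
    for j in range(n):
        if all(M[i][j] == 0 for i in range(n)):
            for i in range(n):
                fixed[i][j] = 1.0 / n
    return fixed

# Assumed order: n, m, a. Column m is all zeros: Microsoft has no links.
M_dead = [[0.5, 0.0, 0.5],
          [0.0, 0.0, 0.5],
          [0.5, 0.0, 0.0]]

leaked = iterate(M_dead)                   # total importance drains toward 0
repaired = iterate(fix_dead_ends(M_dead))  # total importance stays at 3
```

With the repair, the values settle near PR(n) = 18/13, PR(m) = 9/13 and PR(a) = 12/13 for this assumed graph.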
12
Example: Crawler Trap
Microsoft links only to itself, so Microsoft is a crawler trap
[Figure: link graph of Netscape (Ne), Amazon (Am) and Microsoft (MS)]
13
Example: Crawler Trap
[Figure: link graph of Netscape (Ne), Amazon (Am) and Microsoft (MS)]
14
Crawler Trap: Damping Factor
“Tax” each page some fraction of its importance and distribute it equally among all pages
– The tax rate is the probability of jumping to a random page
Example assuming a 20% tax
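A sketch of the taxed iteration with a 20% tax. The matrix is the three-page example with Microsoft linking only to itself; the page order (n, m, a) and the function name `iterate_with_tax` are assumptions for illustration.

```python
def iterate_with_tax(M, tax=0.2, rounds=100):
    """Each round: keep (1 - tax) of the link-based flow, share the tax equally."""
    n = len(M)
    r = [1.0] * n
    for _ in range(rounds):
        share = tax * sum(r) / n            # equal redistribution of the collected tax
        r = [(1 - tax) * sum(M[i][j] * r[j] for j in range(n)) + share
             for i in range(n)]
    return r

# Assumed order: n, m, a. Microsoft (column m) links only to itself.
M_trap = [[0.5, 0.0, 0.5],
          [0.0, 1.0, 0.5],
          [0.5, 0.0, 0.0]]

untaxed = iterate_with_tax(M_trap, tax=0.0)  # the trap absorbs nearly everything
taxed = iterate_with_tax(M_trap, tax=0.2)    # every page keeps some importance
```

For this graph the taxed values settle at PR(n) = 7/11, PR(m) = 21/11 and PR(a) = 5/11: Microsoft still ranks highest, but it no longer holds all the importance.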
15
Algorithm KMP
Search for word W in text D, using partial-match table T
let m = 0, i = 0
while (m + i) < |D| do:
    if W[i] = D[m + i], let i = i + 1
        if i = |W|, return m
    otherwise, let m = m + i - T[i],
        if i > 0, let i = T[i]
return no-match
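A runnable sketch of the search loop above. The slide does not show how T is built, so `kmp_table` is an assumption: it uses the common convention T[0] = -1, which makes `m + i - T[i]` advance m by one position when the mismatch occurs at i = 0. Returning -1 for no-match is also a choice made here.

```python
def kmp_table(W):
    """Partial-match table with T[0] = -1 (assumed convention, not shown on the slide)."""
    T = [0] * len(W)
    if W:
        T[0] = -1
    pos, cnd = 2, 0
    while pos < len(W):
        if W[pos - 1] == W[cnd]:      # the current prefix match extends by one
            cnd += 1
            T[pos] = cnd
            pos += 1
        elif cnd > 0:                 # fall back to a shorter prefix
            cnd = T[cnd]
        else:                         # no prefix matches here
            T[pos] = 0
            pos += 1
    return T

def kmp_search(D, W):
    """Return the index of the first occurrence of W in D, or -1 for no-match."""
    if not W:
        return 0
    T = kmp_table(W)
    m = i = 0                         # m: start of the candidate match, i: offset in W
    while m + i < len(D):
        if W[i] == D[m + i]:
            i += 1
            if i == len(W):
                return m
        else:
            m = m + i - T[i]          # shift the candidate using the table
            if i > 0:
                i = T[i]
    return -1

position = kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD")
```

The search never moves backwards in D, so it runs in time linear in |D| once the table is built.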