Published by Antonia Richards. Modified over 9 years ago.
1
Ljiljana Rajačić
2
Page Rank
Web as a directed graph
- Nodes: web pages
- Edges: hyperlinks
3
Page Rank
Two challenges of web search
1. The web contains many sources of information. Whom to trust?
2. What is the “best” answer to a query? There is no single right answer, and not all web pages are equally “important”.
4
Page Rank
Link analysis approaches
- Rank pages (nodes) by analyzing the topology of the web graph
- Idea: links as votes
- A page is more important if it has more links. Incoming links? Outgoing links?
- Links from important pages carry higher weight => recursive problem!
6
Page Rank
- Link weight is proportional to the importance of its source page
- If page $j$ with importance $r_j$ has $n$ out-links, each link gets $r_j / n$ votes
- Page $j$'s own importance is the sum of the votes on its in-links
7
Page Rank
A page is important if it is pointed to by other important pages.
Rank $r_j$ of page $j$:
$$r_j = \sum_{i \to j} \frac{r_i}{d_i}$$
where $d_i$ is the out-degree of node $i$.
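As a minimal sketch of this voting rule, the Python snippet below applies one round of votes on a hypothetical three-page graph. The page names `y`, `a`, `m`, the starting ranks, and the function name `flow_update` are assumptions chosen for illustration and are not taken from the slides. Each page splits its rank evenly over its out-links, and a page's new rank is the sum of the votes arriving on its in-links.

```python
def flow_update(out_links, r):
    """Apply the rule r_j = sum over in-links (i -> j) of r_i / d_i once."""
    new_r = {page: 0.0 for page in out_links}
    for i, targets in out_links.items():
        d_i = len(targets)              # out-degree of page i
        for j in targets:
            new_r[j] += r[i] / d_i      # page i gives each out-link r_i / d_i votes
    return new_r


# Hypothetical graph: y -> {y, a}, a -> {y, m}, m -> {a}
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}
r = {"y": 0.4, "a": 0.4, "m": 0.2}
new_r = flow_update(out_links, r)
```

For this particular graph the ranks 0.4, 0.4, 0.2 come back unchanged from the update, which means they satisfy the flow equations.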
10
Page Rank
Flow equation in matrix form: M ∙ r = r
Example: page i links to 3 pages, including j, so the entry $M_{ji}$ is 1/3
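The matrix form can be checked on a small case. The sketch below assumes the usual column-stochastic convention, where entry `M[j][i]` is `1/d_i` if page i links to page j and 0 otherwise. The four-page graph and the helper names `build_matrix` and `mat_vec` are hypothetical, chosen so that page `i` links to exactly 3 pages.

```python
def build_matrix(pages, out_links):
    """Column-stochastic matrix: M[j][i] = 1/d_i if page i links to page j, else 0."""
    index = {page: k for k, page in enumerate(pages)}
    n = len(pages)
    M = [[0.0] * n for _ in range(n)]
    for source, targets in out_links.items():
        for target in targets:
            M[index[target]][index[source]] = 1.0 / len(targets)
    return M


def mat_vec(M, r):
    """Plain matrix-vector product M . r."""
    return [sum(m * x for m, x in zip(row, r)) for row in M]


# Hypothetical graph: page "i" links to 3 pages, which all link back to "i".
pages = ["i", "j", "k", "l"]
out_links = {"i": ["j", "k", "l"], "j": ["i"], "k": ["i"], "l": ["i"]}
M = build_matrix(pages, out_links)

r = [0.5, 1 / 6, 1 / 6, 1 / 6]   # candidate rank vector for this graph
Mr = mat_vec(M, r)               # reproduces r up to rounding
```

Here the column for page `i` holds three entries equal to 1/3, and multiplying M by the candidate vector returns the same vector, so it solves M ∙ r = r for this graph.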
11
Page Rank
- x is an eigenvector of M with the corresponding eigenvalue λ if M x = λ x
- Since M ∙ r = r, the rank vector r is an eigenvector of the web matrix M with corresponding eigenvalue 1
- We can now find r efficiently: the power iteration method
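A minimal sketch of power iteration, assuming a small column-stochastic matrix stored as nested Python lists. The function name `power_iteration`, the uniform starting vector, the tolerance `tol`, the cap `max_iter`, and the 3-page example matrix are all illustrative assumptions rather than details from the slides.

```python
def power_iteration(M, tol=1e-10, max_iter=1000):
    """Repeat r <- M . r from a uniform start until r stops changing."""
    n = len(M)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        new_r = [sum(M[j][i] * r[i] for i in range(n)) for j in range(n)]
        # Stop when the L1 distance between successive vectors is tiny.
        if sum(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


# Hypothetical graph y -> {y, a}, a -> {y, m}, m -> {a}; columns are source pages.
M = [
    [0.5, 0.5, 0.0],   # votes received by y
    [0.5, 0.0, 1.0],   # votes received by a
    [0.0, 0.5, 0.0],   # votes received by m
]
r = power_iteration(M)
```

On this graph the iteration settles near (0.4, 0.4, 0.2), the eigenvector of M with eigenvalue 1 whose entries sum to 1.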
12
Page Rank
$d_i$: out-degree of node $i$
13
Page Rank
Page rank simulates a random web surfer:
- At any time t, the surfer is on some page i
- At time t + 1, the surfer follows an out-link from i uniformly at random
- The surfer ends up on some page j linked from i
The rank vector r is the stationary distribution of this walk: $r_i$ is the probability that the random walker is on page i at an arbitrary time t
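The random-surfer picture can be tried directly with a short simulation. The sketch below is an illustration under assumptions: the three-page graph, the function name `simulate_surfer`, the step count, and the fixed seed are all hypothetical. It counts how often the surfer lands on each page and reports the visit frequencies, which should approximate the stationary distribution for a connected, aperiodic graph like this one.

```python
import random


def simulate_surfer(out_links, steps, seed=0):
    """Follow uniformly random out-links and return each page's visit frequency."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    pages = sorted(out_links)
    visits = {page: 0 for page in pages}
    current = pages[0]
    for _ in range(steps):
        current = rng.choice(out_links[current])   # pick an out-link uniformly
        visits[current] += 1
    return {page: count / steps for page, count in visits.items()}


# Hypothetical graph: y -> {y, a}, a -> {y, m}, m -> {a}
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}
freq = simulate_surfer(out_links, steps=200_000)
```

For this graph the flow equations give ranks 0.4, 0.4, 0.2 for y, a, m, and the simulated frequencies should land close to those values.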
14
Page Rank
- Does this converge?
- Does it converge to what we want?
- Are the results reasonable?
15
Page Rank
- Spider trap: all out-links are within an isolated group
- Spider traps eventually absorb all rank
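A small sketch of the absorption effect, assuming a hypothetical three-page graph in which page `m` links only to itself and so forms a one-page spider trap. The helper name `iterate` and the step count are illustrative assumptions. It runs the plain rank update with no random jumps.

```python
def iterate(out_links, steps):
    """Run the plain update r_j = sum of r_i / d_i over in-links for a fixed number of steps."""
    pages = list(out_links)
    r = {page: 1.0 / len(pages) for page in pages}
    for _ in range(steps):
        new_r = {page: 0.0 for page in pages}
        for i, targets in out_links.items():
            for j in targets:
                new_r[j] += r[i] / len(targets)
        r = new_r
    return r


# Hypothetical graph: y -> {y, a}, a -> {y, m}, m -> {m}  (m is a spider trap)
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["m"]}
r = iterate(out_links, steps=100)
```

After 100 steps nearly all rank sits on `m`, while `y` and `a` are left with almost nothing.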
16
Page Rank
At each step, the random surfer has 2 options:
- Follow a random link with probability β
- Jump to a random page with probability 1 – β
β is usually in the range 0.8 to 0.9
17
Page Rank
- A dead end is a page with no out-links
- Dead ends cause rank to “leak out”
- In the example, b is a dead end, so b's column of M is all 0
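The leak can be observed numerically. The sketch below assumes a hypothetical three-page graph in which page `m` has no out-links. The function name `total_rank_after` is an illustrative choice. It runs the plain rank update and returns the total rank that is left.

```python
def total_rank_after(out_links, steps):
    """Run the plain rank update and return the sum of all ranks afterwards."""
    pages = list(out_links)
    r = {page: 1.0 / len(pages) for page in pages}
    for _ in range(steps):
        new_r = {page: 0.0 for page in pages}
        for i, targets in out_links.items():
            for j in targets:               # a dead end has no targets, so its rank is dropped
                new_r[j] += r[i] / len(targets)
        r = new_r
    return sum(r.values())


# Hypothetical graph: y -> {y, a}, a -> {y, m}, m has no out-links (dead end)
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": []}
after_one = total_rank_after(out_links, steps=1)
after_many = total_rank_after(out_links, steps=100)
```

The total starts at 1, drops after the first step because the rank held by `m` is not passed on, and is close to 0 after many steps.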
18
Page Rank
Always jump to a random page from a dead end
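One way to sketch this fix is to treat every dead end as if it linked to all pages, so its rank is spread uniformly instead of being lost. The helper names `patch_dead_ends` and `iterate` and the three-page graph are hypothetical, and linking a dead end to every page (itself included) is one common reading of "jump to a random page".

```python
def patch_dead_ends(out_links):
    """Return a copy in which every dead end links to all pages, itself included."""
    pages = list(out_links)
    return {page: (list(targets) if targets else list(pages))
            for page, targets in out_links.items()}


def iterate(out_links, steps):
    """Plain rank update; assumes every page has at least one out-link."""
    pages = list(out_links)
    r = {page: 1.0 / len(pages) for page in pages}
    for _ in range(steps):
        new_r = {page: 0.0 for page in pages}
        for i, targets in out_links.items():
            for j in targets:
                new_r[j] += r[i] / len(targets)
        r = new_r
    return r


# Hypothetical graph: y -> {y, a}, a -> {y, m}, m has no out-links (dead end)
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": []}
patched = patch_dead_ends(out_links)
r = iterate(patched, steps=100)
```

With the patch the ranks keep summing to 1, and for this graph they settle near 6/13, 4/13 and 3/13 for y, a and m.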
19
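Page Rank
PageRank equation [Brin – Page, 1998]:
$$r_j = \sum_{i \to j} \beta \frac{r_i}{d_i} + (1 - \beta) \frac{1}{n}$$
Google matrix A:
$$A = \beta M + (1 - \beta) \frac{1}{n} e e^T$$
where e is the vector of all 1s and n is the number of pages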
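A minimal sketch of the Google matrix, assuming the commonly cited form A = βM + (1 − β)(1/n) e eᵀ, where M is column-stochastic with no dead ends and n is the number of pages. The function names, the choice β = 0.8, the fixed iteration count, and the three-page graph with a spider trap at `m` are illustrative assumptions.

```python
def google_matrix(M, beta):
    """Dense matrix A = beta * M + (1 - beta) / n in every entry."""
    n = len(M)
    return [[beta * M[j][i] + (1.0 - beta) / n for i in range(n)] for j in range(n)]


def power_iteration(A, iterations=100):
    """Repeat r <- A . r from a uniform start for a fixed number of steps."""
    n = len(A)
    r = [1.0 / n] * n
    for _ in range(iterations):
        r = [sum(A[j][i] * r[i] for i in range(n)) for j in range(n)]
    return r


# Hypothetical graph y -> {y, a}, a -> {y, m}, m -> {m}; columns are source pages.
M = [
    [0.5, 0.5, 0.0],   # votes received by y
    [0.5, 0.0, 0.0],   # votes received by a
    [0.0, 0.5, 1.0],   # votes received by m (spider trap)
]
A = google_matrix(M, beta=0.8)
r = power_iteration(A)
```

With random jumps the trap no longer absorbs everything: for this graph and β = 0.8 the ranks settle near 7/33, 5/33 and 21/33 for y, a and m.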
20
Page Rank
- Key step is matrix-vector multiplication
- A is dense: it has no 0 elements
- M was sparse: only ~10 to 100 non-zero elements per column
- We want to work with M. It's possible!
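One way to see why M is enough: when the entries of r sum to 1, the product A ∙ r equals β M ∙ r plus the constant (1 − β)/n in every entry, so the dense matrix never has to be built. The sketch below assumes that formulation, an adjacency-list graph with no dead ends, and β = 0.8. The function name `pagerank_sparse` and the three-page graph are hypothetical.

```python
def pagerank_sparse(out_links, beta=0.8, iterations=100):
    """Iterate r <- beta * M . r + (1 - beta) / n using only the adjacency list."""
    pages = list(out_links)
    n = len(pages)
    r = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts with the random-jump share (1 - beta) / n.
        new_r = {page: (1.0 - beta) / n for page in pages}
        for i, targets in out_links.items():
            share = beta * r[i] / len(targets)   # assumes page i is not a dead end
            for j in targets:
                new_r[j] += share
        r = new_r
    return r


# Hypothetical graph: y -> {y, a}, a -> {y, m}, m -> {m}
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["m"]}
r = pagerank_sparse(out_links)
```

Each iteration touches every edge once plus every page once, and for this graph the result is close to 7/33, 5/33 and 21/33 for y, a and m, which are the values that solve the equations with β = 0.8.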
23
Page Rank
CPU
- Graph representation: adjacency list
- O(m) per iteration, where m is the number of edges
- m = O(n) => O(n) per iteration
CUDA
- Graph representation: adjacency matrix
- O(n^2) per iteration
24
Page Rank
Number of pages    CPU       CUDA
300                290 ms    340 ms
400                570 ms    380 ms
500                860 ms    550 ms
>850000            ~6.5 s    Memory overflow
25
Page Rank
Thanks for your attention!