Download presentation
Presentation is loading. Please wait.
Published byDwain Gray Modified over 9 years ago
1
Lecture #10 PageRank CS492 Special Topics in Computer Science: Distributed Algorithms and Systems
2
Origin of “Google” Googol – 10^100 Motivation behind – Human maintained indices such as Yahoo! – Explosive growth http://news.netcraft.com Hostnames Active
3
Design Goals of Google Improved search quality – In 1997, 1 out of 4 top search engines found itself – High precision in finding relevant document was necessary Academic search engine research – Search engine technology went commercial: an black art – To build systems that a good number of people could use – To build an architecture to support novel research on large-scale Web data
4
Weakness of Existing Approaches Calculate similarities – Based on flat, vector-space model of each page – Prone to cheating (Web spamming or search engine persuasion)
5
Basic Idea of PageRank Exploit the topological structure of hypertextual systems
6
Simple Example A C B 0.2 0.4
7
Related Work Academic citation analysis – Similarities Graph structure; paper = node, web page = node citation = link, URL = link “node” authority independent of “node” content – Differences Uniform unit of info (paper) versus great variability in quality, usage, citations, and length Equal link weight vs variable importance A backlink from Yahoo! vs. from a friend
8
Which Page Should Be Ranked Higher? A B John Doe
9
Simple Expression page rank of set of pages pointing at out-degree of Question: role of c? Answer: total rank of all web pages constant
10
Dangling links Pages without outgoing pointers – Example: Pages not yet downloaded Do not affect the calculation much – Remove them, calculate ranks, and add them back
11
Loop A C B Question: ranks of A, B, and C? Answer: infinite! (rank sink)
12
Basic Algorithm page rank of set of pages pointing at out-degree of dumping factor
13
Matrix Representation Question: Where to start? 13 whereand
14
Iterative Algorithm whereand Question: Will it converge?
15
Example [LM04]
16
Turn the Problem into a Markov Process [LM04]
17
Evenly Split Rank of Dangling Links 17 [LM04]
18
Final Solution Eigenvector of P = steady state rank 18
19
Spam Rank [BGS05]
20
Questions Where to start? – Find a nondegenerate start vector What if there are two pages that point to each other and no one else and there is a page that points to one of them? – Role of dumping factor guarantees no rank sink
21
References [PBMW] L. Page, S. Brin, R. Motwani, T. Winograd, “The PageRank citation ranking: bringing order to the web,” WWW 1998 [BP98] Sergey Brin, Lawrence Page, “The anatomy of a large-scale hypertextual Web search engine,” Computer Networks and ISDN Systems, Vol. 30, 1998. [BGS05] Monica Bianchini, Marco Gori, Franco Scarselli, “Inside PageRank,” ACM Transactions on Internet Technology, Vol. 5, No. 1, Feb. 2005. [LM04] Amy N. Langville, Carl Meyer, “Deeper inside PageRank,” Internet Mathematics, Vol. I, No. 3, 2004. [K99] Jon Kleinberg, “Authoritative sources in a Hyperlinked Environment,” Journal of the ACM 46:5 (1999).
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.