1
Social Networks 101. Prof. Jason Hartline and Prof. Nicole Immorlica
2
Lecture Ten: The web and PageRank.
3
The internet vs. the web. The internet: nodes = machines, edges = wires. The world wide web: nodes = webpages, edges = hyperlinks.
4
The web is a directed graph. (Figure: example pages and their links. Cows links to Dairy and Meat; Dairy links to Cheese and Milk; Meat links to Cow and Lamb.)
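A directed graph is often stored as an adjacency list: each page maps to the list of pages it links to. The sketch below assumes one reading of the slide's figure (page names as drawn, with Cheese, Milk, Cow and Lamb given no out-going links) and uses hypothetical helpers `out_links` and `in_links`. It shows that an edge points one way only: Cows links to Dairy, but Dairy does not link back.

```python
# Adjacency-list sketch of the slide's example (an assumed reading of the figure).
# Each key is a page; its list holds the pages it links to.
web = {
    "Cows": ["Dairy", "Meat"],
    "Dairy": ["Cheese", "Milk"],
    "Meat": ["Cow", "Lamb"],
    "Cheese": [],
    "Milk": [],
    "Cow": [],
    "Lamb": [],
}


def out_links(graph, page):
    """Pages that `page` links to (edges leaving `page`)."""
    return graph.get(page, [])


def in_links(graph, page):
    """Pages that link to `page` (edges entering `page`)."""
    return [src for src, targets in graph.items() if page in targets]


print(out_links(web, "Cows"))   # ['Dairy', 'Meat']
print(in_links(web, "Dairy"))   # ['Cows']
print(out_links(web, "Dairy"))  # ['Cheese', 'Milk']: no edge back to Cows
```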
5
Directed graphs. Edge (a, b) = edge from a to b.
6
Directed paths. Definition: A directed path from $v_1$ to $v_k$ is a sequence of nodes $(v_1, \ldots, v_k)$ such that for every adjacent pair $v_i$ and $v_{i+1}$, there is an edge from $v_i$ to $v_{i+1}$. Example (figure): the path $(v_1, v_2, v_3, v_4)$.
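The definition translates directly into a check over consecutive pairs. A minimal sketch, assuming the graph is stored as a dict from each node to its list of out-neighbors; `is_directed_path` and the 4-node example graph are hypothetical.

```python
def is_directed_path(graph, nodes):
    """Return True if `nodes` = (v1, ..., vk) is a directed path in `graph`:
    for every adjacent pair (v_i, v_{i+1}) there is an edge v_i -> v_{i+1}."""
    return all(b in graph.get(a, ()) for a, b in zip(nodes, nodes[1:]))


# Hypothetical graph with edges v1 -> v2, v2 -> v3, v3 -> v4.
g = {"v1": ["v2"], "v2": ["v3"], "v3": ["v4"], "v4": []}

print(is_directed_path(g, ["v1", "v2", "v3", "v4"]))  # True
print(is_directed_path(g, ["v4", "v3", "v2", "v1"]))  # False: edges point the other way
```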
7
Strongly connected components. Definition: A strongly connected component is a maximal subset of nodes $\{v_1, \ldots, v_k\}$ such that for any pair $v_i$ and $v_j$ in the set, there is a path from $v_i$ to $v_j$. (Figure: one graph that is strongly connected and one that is not.)
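One way to compute strongly connected components straight from the definition is to group nodes that can reach each other. The sketch below runs a breadth-first search from every node; it takes quadratic time and is meant only as an illustration (linear-time algorithms such as Tarjan's or Kosaraju's are the usual choice for large graphs). Function names and example graphs are hypothetical.

```python
from collections import deque


def reachable(graph, start):
    """All nodes reachable from `start` by a directed path (including start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def strongly_connected_components(graph):
    """Group nodes by mutual reachability: u and v share a component iff
    there is a path u -> v and a path v -> u. Quadratic-time sketch."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    reach = {u: reachable(graph, u) for u in nodes}
    components, assigned = [], set()
    for u in sorted(nodes):
        if u in assigned:
            continue
        comp = {v for v in reach[u] if u in reach[v]}
        components.append(comp)
        assigned |= comp
    return components


cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}  # strongly connected
chain = {"a": ["b"], "b": ["c"], "c": []}     # not strongly connected

print(len(strongly_connected_components(cycle)))  # 1
print(len(strongly_connected_components(chain)))  # 3
```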
8
What does the web look like? (Figure: a strongly connected component with 56 million nodes.)
9
What does the web look like? (Figure: the strongly connected component together with the In and Out sets, tendrils, tubes, and disconnected components.)
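In this picture, the In set consists of pages that can reach the central strongly connected component but cannot be reached from it, and the Out set consists of pages reachable from it that cannot reach back. A rough sketch of that split, assuming the caller names one node (`core_node`, a hypothetical parameter) inside the central component. It does not separate tendrils, tubes, and disconnected components; it lumps them together as `Other`. The toy graph and all names are hypothetical.

```python
from collections import deque


def reachable(graph, start):
    """All nodes reachable from `start` by a directed path (including start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def reverse(graph):
    """Flip every edge u -> v into v -> u (every node appears as a key)."""
    rev = {}
    for u, targets in graph.items():
        rev.setdefault(u, [])
        for v in targets:
            rev.setdefault(v, []).append(u)
    return rev


def bow_tie(graph, core_node):
    """Split nodes relative to the strongly connected component of `core_node`."""
    rev = reverse(graph)
    forward = reachable(graph, core_node)   # reachable from the core
    backward = reachable(rev, core_node)    # can reach the core
    scc = forward & backward
    nodes = set(rev)
    return {
        "SCC": scc,
        "In": backward - scc,
        "Out": forward - scc,
        "Other": nodes - forward - backward,  # tendrils, tubes, disconnected
    }


# Hypothetical toy web: s1 <-> s2 is the core, i feeds in, o hangs off the
# core, t hangs off i, and x is disconnected.
toy = {"i": ["s1", "t"], "s1": ["s2"], "s2": ["s1", "o"], "o": [], "t": [], "x": []}
parts = bow_tie(toy, "s1")
print(sorted(parts["SCC"]), sorted(parts["In"]), sorted(parts["Out"]), sorted(parts["Other"]))
```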
10
Searching the web. Q. How can Google answer your questions without understanding them? A. It uses the hyperlink structure.
11
Basic ideas. 1. A link to a page is an endorsement of that page’s quality. 2. Links from high-quality pages count more than links from low-quality pages.
12
First attempt. Initialize: each page has equal rank (“tokens”). Repeat: each page divides its tokens equally among all its out-going links.
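One round of the first attempt, sketched with exact fractions so the token counts stay readable. Assumptions: every page has at least one out-going link, and the 3-page graph is hypothetical (it is not the slide's 5-page example).

```python
from fractions import Fraction


def first_attempt_round(graph, tokens):
    """One round: each page splits its tokens equally among its out-going
    links. Assumes every page has at least one out-link."""
    new = {page: Fraction(0) for page in graph}
    for page, targets in graph.items():
        share = tokens[page] / len(targets)
        for t in targets:
            new[t] += share
    return new


# Hypothetical 3-page web.
g = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
tokens = {p: Fraction(1, len(g)) for p in g}  # Initialize: equal rank
tokens = first_attempt_round(g, tokens)
print({p: str(t) for p, t in tokens.items()})  # {'A': '1/3', 'B': '1/6', 'C': '1/2'}
```

Each round only moves tokens between pages, so the total stays 1.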
13
Initialization. (Figure: five pages, each starting with 1/5 of the tokens.)
14
First round. (Figure: token counts after one round, with labels 3/15, 1/15, 4/15 and 3/15.)
15
What could go wrong? Some node eventually collects all tokens.
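One way this can happen, assuming pages may link to themselves: a page whose only link is a self-loop keeps every token that reaches it. In this hypothetical 3-page web, page C holds all the tokens after two rounds of the first attempt.

```python
from fractions import Fraction


def first_attempt_round(graph, tokens):
    """Each page splits its tokens equally among its out-going links."""
    new = {page: Fraction(0) for page in graph}
    for page, targets in graph.items():
        share = tokens[page] / len(targets)
        for t in targets:
            new[t] += share
    return new


# Hypothetical web: C links only to itself, so tokens that reach C never leave.
g = {"A": ["B", "C"], "B": ["C"], "C": ["C"]}
tokens = {p: Fraction(1, 3) for p in g}
for _ in range(2):
    tokens = first_attempt_round(g, tokens)
print(tokens["C"])  # 1
```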
17
PageRank. Initialize: each page has equal rank (“tokens”). Repeat: each page divides (1) an s fraction of its tokens equally among all its out-going links, and (2) the remaining (1-s) fraction equally among all nodes.
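A sketch of PageRank rounds. When every page has out-going links and the tokens total 1, one round computes $r'(v) = \frac{1-s}{n} + s \sum_{u \to v} \frac{r(u)}{\mathrm{out}(u)}$, where $n$ is the number of pages and $\mathrm{out}(u)$ is the number of links leaving $u$. The value s = 0.85, the 100 rounds, and the rule for pages with no out-going links (spread the s fraction over all pages) are assumptions not taken from the slides.

```python
def pagerank_round(graph, tokens, s):
    """One round: each page sends an s fraction of its tokens equally along
    its out-going links and a (1 - s) fraction equally to all n pages."""
    n = len(graph)
    new = {page: 0.0 for page in graph}
    for page, targets in graph.items():
        t = tokens[page]
        # (1 - s) fraction: split equally among all n pages.
        for v in graph:
            new[v] += (1 - s) * t / n
        # s fraction: split among out-going links. Assumption: a page with
        # no out-links spreads its s fraction over all pages instead.
        receivers = targets if targets else list(graph)
        for v in receivers:
            new[v] += s * t / len(receivers)
    return new


def pagerank(graph, s=0.85, rounds=100):
    tokens = {page: 1 / len(graph) for page in graph}  # equal initial rank
    for _ in range(rounds):
        tokens = pagerank_round(graph, tokens, s)
    return tokens


# Hypothetical web in which C links only to itself.
g = {"A": ["B", "C"], "B": ["C"], "C": ["C"]}
ranks = pagerank(g)
print({p: round(r, 3) for p, r in ranks.items()})
```

On this web, C no longer absorbs every token: A, which has no in-links, keeps exactly the random-jump share (1-s)/n = 0.05.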
18
Important properties of PageRank (for s < 1): 1. It converges (the PageRank of a page is the number of tokens it owns in the limit). 2. The initialization doesn’t matter.
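Both properties can be checked numerically. For s < 1, each round shrinks the total absolute difference between any two token distributions by at least a factor of s, so runs from different starting points approach the same limit. A minimal sketch, assuming s = 0.85, 200 rounds, and a hypothetical 4-page web in which every page has an out-going link.

```python
def pagerank_round(graph, tokens, s):
    """One PageRank round; assumes every page has at least one out-link."""
    n = len(graph)
    new = {page: 0.0 for page in graph}
    for page, targets in graph.items():
        t = tokens[page]
        for v in graph:                       # (1 - s) fraction to all pages
            new[v] += (1 - s) * t / n
        for v in targets:                     # s fraction along out-links
            new[v] += s * t / len(targets)
    return new


def run(graph, tokens, s=0.85, rounds=200):
    for _ in range(rounds):
        tokens = pagerank_round(graph, tokens, s)
    return tokens


g = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A", "C"]}
equal = {p: 0.25 for p in g}                          # equal initialization
lopsided = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 1.0}   # all tokens start on D
r1, r2 = run(g, equal), run(g, lopsided)
gap = max(abs(r1[p] - r2[p]) for p in g)
print(gap)  # tiny: both runs reach the same limit
```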
19
Random walks and PageRank. Randy browses the web randomly:
20
Start at an arbitrary node. With prob. s, travel along a random out-going link; with prob. (1-s), travel to a random node. Repeat forever and ever.
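A simulation of Randy's walk under stated assumptions: s = 0.85, 100,000 steps, a fixed random seed, and a hypothetical 3-page web in which C links only to itself. It returns the fraction of steps spent on each page. If Randy lands on a page with no out-going links, the sketch assumes he jumps to a random page.

```python
import random
from collections import Counter


def randy_walk(graph, s=0.85, steps=100_000, seed=0):
    """With prob. s follow a random out-going link; with prob. 1 - s jump
    to a uniformly random page. Returns visit frequencies per page."""
    rng = random.Random(seed)
    pages = list(graph)
    page = rng.choice(pages)  # arbitrary starting node
    visits = Counter()
    for _ in range(steps):
        if rng.random() < s and graph[page]:
            page = rng.choice(graph[page])   # follow a random out-link
        else:
            page = rng.choice(pages)         # jump to a random page
        visits[page] += 1
    return {p: visits[p] / steps for p in pages}


g = {"A": ["B", "C"], "B": ["C"], "C": ["C"]}
freq = randy_walk(g)
print({p: round(f, 2) for p, f in freq.items()})
```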
21
Important properties of Randy’s walk: 1. It converges: the probability Randy is on any given page approaches a fixed number in the limit. 2. It doesn’t matter where he starts.
22
Randy’s walk = PageRank. In the limit, the probability Randy is on a given page is proportional to that page’s PageRank.
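A quick numerical check of this claim, under assumptions: s = 0.85, a hypothetical 4-page web where every page has out-going links, and tokens that total 1 (so "proportional" becomes "equal"). The token-passing computation and Randy's long-run visit frequencies should agree up to simulation noise.

```python
import random
from collections import Counter


def pagerank(graph, s=0.85, rounds=100):
    """Token-passing PageRank; assumes every page has an out-going link."""
    n = len(graph)
    tokens = {p: 1 / n for p in graph}
    for _ in range(rounds):
        # Tokens total 1, so the (1 - s) share gives each page (1 - s) / n.
        new = {p: (1 - s) / n for p in graph}
        for page, targets in graph.items():
            for v in targets:
                new[v] += s * tokens[page] / len(targets)
        tokens = new
    return tokens


def randy_frequencies(graph, s=0.85, steps=200_000, seed=1):
    """Fraction of steps Randy spends on each page."""
    rng = random.Random(seed)
    pages = list(graph)
    page = rng.choice(pages)
    visits = Counter()
    for _ in range(steps):
        if rng.random() < s:
            page = rng.choice(graph[page])  # follow a random out-link
        else:
            page = rng.choice(pages)        # jump to a random page
        visits[page] += 1
    return {p: visits[p] / steps for p in pages}


g = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A", "C"]}
pr = pagerank(g)
walk = randy_frequencies(g)
for p in g:
    print(p, round(pr[p], 3), round(walk[p], 3))
```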
23
Extensions: anchor text, click probabilities, link/click spam.
24
Next time: TBA.