Social Networks 101. Prof. Jason Hartline and Prof. Nicole Immorlica.


1 Social Networks 101. Prof. Jason Hartline and Prof. Nicole Immorlica

2 Lecture Ten: The web and PageRank.

3 The internet vs the web. The internet: nodes = machines, edges = wires. The world wide web: nodes = webpages, edges = hyperlinks.

4 The web is a directed graph. Example pages and the pages they link to: Cows: Dairy, Meat. Dairy: Cheese, Milk. Meat: Cow, Lamb.

5 Directed graphs. Edge (a,b) = edge from a to b.

6 Directed paths. Definition: A directed path from v_1 to v_k is a sequence of nodes (v_1, …, v_k) such that for every adjacent pair v_i and v_{i+1}, there’s an edge from v_i to v_{i+1}. Example: path (v_1, v_2, v_3, v_4).
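
The definition can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a list of (from, to) pairs; the function name `is_directed_path` and the node labels are illustrative and not from the lecture.

```python
def is_directed_path(edges, nodes):
    """Return True if every adjacent pair (v_i, v_{i+1}) in `nodes` is an edge."""
    edge_set = set(edges)
    # zip pairs each node with its successor in the sequence
    return all((u, v) in edge_set for u, v in zip(nodes, nodes[1:]))


# Hypothetical four-node example: v1 -> v2 -> v3 -> v4
edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v4")]
print(is_directed_path(edges, ["v1", "v2", "v3", "v4"]))  # True
print(is_directed_path(edges, ["v4", "v3"]))              # False: edges have a direction
```

Because edges are directed, reversing a valid path generally does not give another valid path.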

7 Strongly connected components. Definition: A strongly connected component is a subset of nodes {v_1, …, v_k} such that for every ordered pair v_i and v_j in the set, there’s a directed path from v_i to v_j. Figure examples: one graph that is strongly connected, one that is not.
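
One way to test the definition on a small graph is to check that every node in the set can reach every other node. This is a sketch under assumptions: the graph is a dictionary from each node to its list of out-going links, and the names `reachable` and `is_strongly_connected` are illustrative. It is a brute-force check, not an efficient component-finding algorithm.

```python
def reachable(graph, start):
    """Return the set of nodes reachable from `start` by following directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_strongly_connected(graph, nodes):
    """True if every node in `nodes` has a directed path to every other node in `nodes`."""
    nodes = set(nodes)
    # Paths are searched in the whole graph, one search per node.
    return all(nodes <= reachable(graph, v) for v in nodes)


# Hypothetical examples: a directed cycle versus a one-way chain
cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
chain = {"a": ["b"], "b": ["c"], "c": []}
print(is_strongly_connected(cycle, ["a", "b", "c"]))  # True
print(is_strongly_connected(chain, ["a", "b", "c"]))  # False: no path from c back to a
```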

8 What does the web look like? Strongly connected component: 56 million nodes

9 What does the web look like? Strongly connected component, In, Out, Tendrils, Tubes, Disconnected components

10 Searching the web Q. How can Google answer your questions without understanding them? A. It uses the hyperlink structure.

11 Basic ideas 1. A link to a page is an endorsement of that page’s quality. 2. Links from high-quality pages are better than links from low-quality pages.

12 First attempt Initialize: Each page has equal rank (“tokens”). Repeat: Each page divides its tokens equally among all out-going links.

13 Initialization 1/5

14 First round 3/15 1/15 4/15 3/15

15 What could go wrong? Some node eventually collects all tokens.

16 What could go wrong? Some node eventually collects all tokens.
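
The first attempt from slides 12 to 16 can be sketched as follows. This is an illustration under assumptions, not the lecture's own example: the three-page graph is made up, and the rule that a page with no out-going links keeps its tokens is an assumption, since the slides do not say what such a page does.

```python
from fractions import Fraction


def redistribute(graph, tokens):
    """One round: each page divides its tokens equally among its out-going links."""
    new = {page: Fraction(0) for page in graph}
    for page, links in graph.items():
        if not links:
            # Assumption: a page with no out-going links keeps its tokens.
            new[page] += tokens[page]
            continue
        share = tokens[page] / len(links)
        for target in links:
            new[target] += share
    return new


# Hypothetical graph: a -> b, a -> c, b -> c, and c links nowhere
graph = {"a": ["b", "c"], "b": ["c"], "c": []}
tokens = {page: Fraction(1, len(graph)) for page in graph}  # equal initial rank
for _ in range(2):
    tokens = redistribute(graph, tokens)
print(tokens)  # after two rounds, page c holds every token
```

In this graph page c receives tokens but never passes any on, so it ends up with all of them regardless of how the other pages link to each other.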

17 PageRank Initialize: Each page has equal rank (“tokens”). Repeat: Each page divides (1) an s fraction of its tokens equally among all out-going links, and (2) a (1-s) fraction equally among all nodes.
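
The update rule above can be sketched in a few lines. Several details here are assumptions rather than part of the lecture: the value s = 0.85, the fixed number of rounds, the example graph, and the choice to let a page with no out-going links spread its s fraction over all pages.

```python
def pagerank(graph, s=0.85, rounds=100, start=None):
    """Iterate the token-passing rule; `graph` maps each page to its out-going links."""
    pages = list(graph)
    n = len(pages)
    # Equal initial rank unless a different starting distribution is supplied.
    tokens = dict(start) if start else {p: 1.0 / n for p in pages}
    for _ in range(rounds):
        # (2) every page gives a (1-s) fraction of its tokens equally to all nodes
        base = (1 - s) * sum(tokens.values()) / n
        new = {p: base for p in pages}
        # (1) every page gives an s fraction equally to its out-going links
        for page, links in graph.items():
            targets = links if links else pages  # assumption for pages without links
            for t in targets:
                new[t] += s * tokens[page] / len(targets)
        tokens = new
    return tokens


# Hypothetical graph: a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
for page, rank in sorted(ranks.items()):
    print(page, round(rank, 3))
```

Passing a different `start` distribution, for example all tokens on page a, should give nearly the same result after enough rounds.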

18 Important properties of PageRank 1. It converges (the PageRank of a page is the number of tokens it owns in the limit). 2. The initialization doesn’t matter.

19 Random walks and PageRank Randy browses the web randomly.

20 Start at an arbitrary node. With prob. s, travel to a random out-going link. With prob. (1-s), travel to a random node. Repeat forever and ever.
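
Randy's walk can be simulated directly. This is a rough sketch: the value s = 0.85, the step count, the fixed random seed, the example graph, and the rule that Randy jumps to a random page when the current page has no out-going links are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_surfer(graph, s=0.85, steps=100_000, seed=0):
    """Simulate the walk and return the fraction of steps spent on each page."""
    rng = random.Random(seed)
    pages = list(graph)
    current = rng.choice(pages)  # start at an arbitrary node
    visits = Counter()
    for _ in range(steps):
        links = graph[current]
        if links and rng.random() < s:
            current = rng.choice(links)   # with prob. s, follow a random out-going link
        else:
            current = rng.choice(pages)   # otherwise, jump to a random node
        visits[current] += 1
    return {p: visits[p] / steps for p in pages}


# Hypothetical graph: a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
freq = random_surfer(graph)
for page, share in sorted(freq.items()):
    print(page, round(share, 3))
```

The walk never stops in principle; the simulation cuts it off after a fixed number of steps, so the visit fractions are only estimates.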

21 Important properties of Randy’s walk: 1. It converges: the probability that Randy is on any given page approaches a fixed number in the limit. 2. It doesn’t matter where he starts.

22 Randy’s walk = PageRank The probability Randy is on a given page is proportional to that page’s PageRank.

23 Extensions: Anchor text, Click probabilities, Link/click spam

24 Next time TBA

