The PageRank Citation Ranking: Bringing Order to the Web


1 The PageRank Citation Ranking: Bringing Order to the Web
Presented By: Noy Hadar

2 Introduction and Motivation
When the PageRank paper was written (1998), there were over 150 million web pages. A later estimate: at least 4.62 billion web pages.
With such a huge number of pages, the average web page quality experienced by a user >= the quality of the average web page.

3 Let’s Count
Simply counting links doesn't work: useless websites can still have many links.
Show Me Your Friends, I’ll Tell You Who You Are

4 Related Work
HITS algorithm (Jon Kleinberg). Two scores for each page: authority and hub.
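As a rough illustration of the two HITS scores, here is a minimal sketch. The four-page graph, the page names, and the function `hits` are hypothetical and not taken from the slides; the update rule is the usual mutual reinforcement (a good authority is linked to by good hubs, and a good hub links to good authorities).

```python
import math

def hits(links, iterations=50):
    """Return (authority, hub) scores for a dict page -> list of linked pages."""
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for page, targets in links.items():
            for target in targets:
                auth[target] += hub[page]
        # Hub score of a page: sum of the authority scores of pages it links to.
        hub = {p: sum(auth[t] for t in links[p]) for p in pages}
        # Normalize both score vectors to unit length so they stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return auth, hub

# Hypothetical toy graph: two linking pages and two linked pages.
toy_links = {"portal": ["paper", "news"], "blog": ["paper"], "paper": [], "news": []}
authority, hub_score = hits(toy_links)
```

In this toy graph "paper" ends up with the highest authority score and "portal" with the highest hub score.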

5 Link Structure of the Web
Forward links = outedges. Backlinks = inedges.
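A minimal sketch of the two views of the link structure, assuming a hypothetical three-page graph stored as a dictionary of forward links. The page names and the helper `backlinks_from` are illustrative only.

```python
from collections import defaultdict

# Hypothetical toy web: each page maps to the pages it links to (forward links).
forward_links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

def backlinks_from(forward):
    """Invert forward links (outedges) to obtain backlinks (inedges)."""
    back = defaultdict(list)
    for source, targets in forward.items():
        back[source]  # touch the key so pages without backlinks still appear
        for target in targets:
            back[target].append(source)
    return dict(back)

backlinks = backlinks_from(forward_links)
print(backlinks)
```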

6 Importance of Links
Most pages have just a few backlinks.
Highly linked pages are more “important”.
One important link vs. many average-ranked links.

7 Definition of PageRank
A method for computing a ranking for every web page.
Based on the graph of the web.
High rank requires many backlinks or highly ranked backlinks.
A page is important if important pages refer to it.

8 Simple Ranking Function
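u: a web page
Bu: the set of backlinks of u (pages that link to u)
Fu: the set of forward links of u (pages that u links to)
Nu = |Fu|: the number of links from u
c: a factor used for normalization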
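The formula itself did not survive in this transcript. The following is a reconstruction from the symbols defined on the slide, matching the simplified ranking in the PageRank paper; writing the rank of page u as R(u) is an assumption about the slide's notation.

```latex
R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}
```

Each page v divides its rank evenly among its N_v forward links, and u collects one share from every page in B_u.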

9-27 Simplified PageRank Calculation
Worked example on a graph of eight pages, A to H (the link diagram is not reproduced in this transcript). Every page starts with rank 1/8, and each iteration recomputes all ranks from the values of the previous iteration.

Page  Initial  Iteration 1  Iteration 2
A     1/8      1/2          5/16
B     1/8      1/16         1/4
C     1/8      1/16         1/4
D     1/8      1/16         1/32
E     1/8      1/16         1/32
F     1/8      1/16         1/32
G     1/8      1/16         1/32
H     1/8      1/8          1/16
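The link diagram for this example did not survive extraction, so the sketch below rests on an assumption: the `links` dictionary is one link structure that reproduces the rank values listed on the slides (A = 1/2 and then 5/16, H = 1/8 and then 1/16), not necessarily the graph the presenter drew. The function name `simple_rank_step` is illustrative. Exact fractions are used so the output can be compared with the slides directly.

```python
from fractions import Fraction

# ASSUMPTION: this graph is inferred, not copied from the slides. It is one
# link structure that yields the rank values the slides list.
links = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "D": ["A"],
    "E": ["A"],
    "F": ["A"],
    "G": ["H"],
    "H": ["A"],
}

def simple_rank_step(ranks, links):
    """One iteration: every page splits its rank evenly over its forward links."""
    new_ranks = {page: Fraction(0) for page in links}
    for page, targets in links.items():
        share = ranks[page] / len(targets)
        for target in targets:
            new_ranks[target] += share
    return new_ranks

ranks0 = {page: Fraction(1, 8) for page in links}  # every page starts at 1/8
ranks1 = simple_rank_step(ranks0, links)
ranks2 = simple_rank_step(ranks1, links)

for page in links:
    print(page, ranks0[page], ranks1[page], ranks2[page])
```

Because every page in this assumed graph has at least one forward link, the total rank stays at 1 and no extra normalization is needed.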

28 Rank Sink
F and G form a loop with no links leading out of it. The loop keeps accumulating rank on every iteration but never distributes any to the other pages.
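A small sketch of the effect, assuming a hypothetical four-page graph (not the one on the slide) in which F and G link only to each other. Under the simplified update, rank flows into the loop and never comes back out.

```python
# Hypothetical graph: A and B link to each other, B also links into the
# F <-> G loop, and nothing links out of that loop.
sink_links = {"A": ["B"], "B": ["A", "F"], "F": ["G"], "G": ["F"]}

def step(ranks, links):
    """Simplified update: each page splits its rank over its forward links."""
    new_ranks = {page: 0.0 for page in links}
    for page, targets in links.items():
        for target in targets:
            new_ranks[target] += ranks[page] / len(targets)
    return new_ranks

sink_ranks = {page: 0.25 for page in sink_links}
for _ in range(50):
    sink_ranks = step(sink_ranks, sink_links)

loop_share = sink_ranks["F"] + sink_ranks["G"]
print(f"rank held by the F-G loop after 50 iterations: {loop_share:.6f}")
```

The total rank stays at 1, so the loop does not grow without bound; it simply absorbs nearly all of the rank that the other pages started with.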

29 Random Surfer Model
The “random surfer” simply keeps clicking on successive links at random. If stuck in a loop of web pages, the surfer jumps to some other page. We model this behavior with the additional factor E.

30 PageRank Expression
Let E(u) be some vector over the Web pages that corresponds to a source of rank.
The water intuition.
d: damping factor. Usually d = 0.85.
Nv: number of forward links from v.
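The expression itself is missing from this transcript. The following reconstruction is consistent with the labels on this slide and with the arithmetic on the calculation slides (0.85*1/16 + 0.15*1/8). Writing the rank of page u as R(u) and its backlinks as B_u is an assumption about the slide's notation.

```latex
R(u) = d \sum_{v \in B_u} \frac{R(v)}{N_v} + (1 - d)\, E(u)
```

Here d is the damping factor, N_v is the number of forward links from v, and E(u) = 1/8 when the source of rank is uniform over eight pages. The original paper writes the same idea with a normalization constant c instead of d and 1 - d.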

31-39 PageRank Calculation
Worked example on eight pages, A to H, using the damped expression with d = 0.85 and E(u) = 1/8. Every page starts with rank 1/8. The values that survive in this transcript are A = 0.231 and B = 0.071, together with the worked expression 0.85*1/16 + 0.15*1/8 (about 0.071). The remaining ranks and the link diagram are not recoverable.
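The arithmetic on these slides can be checked with a few lines. This is a sketch under assumptions: the helper `damped_rank` is hypothetical, and the incoming shares passed to it are guesses that happen to reproduce the two values listed (the graph itself is not recoverable).

```python
def damped_rank(incoming_shares, d=0.85, source=1 / 8):
    """Rank of one page: d * (sum of R(v)/N_v over its backlinks) + (1 - d) * E(u)."""
    return d * sum(incoming_shares) + (1 - d) * source

# ASSUMPTION: B has a single backlink of rank 1/8 that has two forward links,
# so B receives a share of 1/16. This matches 0.85*1/16 + 0.15*1/8 on the slides.
b_rank = damped_rank([(1 / 8) / 2])

# ASSUMPTION: A receives a total share of 1/4 from its backlinks.
a_rank = damped_rank([1 / 4])

print(round(b_rank, 6), round(a_rank, 6))
```

The slides show these values cut to three decimals (0.071 and 0.231).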

40 Dangling Links Links that point to any page with no outgoing links.
Where should their weight be distributed?

41 PageRank Implementation
Convert each URL into a unique integer ID.
Sort the link structure by ID.
Remove the dangling links.
Make an initial assignment of ranks.
Iteratively compute PageRank until convergence.
Add the dangling links back.
Recompute the rankings.
After adding the dangling links back, we need to iterate as many times as was required to remove the dangling links.
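The first five steps above can be sketched as follows. This is a minimal in-memory illustration, not the implementation described in the talk: the function names, the example URLs, the uniform source of rank, and the tolerance are all assumptions, and the final steps (adding the dangling links back and recomputing) are left out.

```python
def assign_ids(edges):
    """Map each URL to a unique integer ID; return the map and sorted ID edges."""
    ids = {}
    for src, dst in edges:
        for url in (src, dst):
            ids.setdefault(url, len(ids))
    return ids, sorted((ids[s], ids[d]) for s, d in edges)

def remove_dangling(edges):
    """Repeatedly drop links to pages without outgoing links; count the passes."""
    passes = 0
    while True:
        sources = {s for s, _ in edges}
        kept = [(s, d) for s, d in edges if d in sources]
        if len(kept) == len(edges):
            return kept, passes
        edges, passes = kept, passes + 1

def pagerank(edges, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate the damped update until the total (L1) change falls below tol."""
    pages = sorted({p for edge in edges for p in edge})
    out_degree = {p: 0 for p in pages}
    for s, _ in edges:
        out_degree[s] += 1
    ranks = {p: 1 / len(pages) for p in pages}  # initial assignment of ranks
    iteration = 0
    for iteration in range(1, max_iter + 1):
        new = {p: (1 - d) / len(pages) for p in pages}  # uniform source of rank
        for s, t in edges:
            new[t] += d * ranks[s] / out_degree[s]
        change = sum(abs(new[p] - ranks[p]) for p in pages)
        ranks = new
        if change < tol:
            break
    return ranks, iteration

# Hypothetical link list: c.example and d.example only lead to a dead end.
url_edges = [("a.example", "b.example"), ("b.example", "a.example"),
             ("b.example", "c.example"), ("c.example", "d.example")]
ids, id_edges = assign_ids(url_edges)
core_edges, passes = remove_dangling(id_edges)
core_ranks, iterations = pagerank(core_edges)
print(core_edges, passes, core_ranks)
```

Removing one dangling link can expose another, which is why `remove_dangling` loops; the number of passes it reports is the count the last step on the slide refers to.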

42 Convergence of PageRank Computations
PageRank on a 322 million link database converges in 52 iterations.
PageRank on half of that database (161 million links) converges in 45 iterations.
The scaling factor is roughly linear in log n.

43 Personalized PageRank
An important component of the PageRank calculation is E.
The E vector corresponds to the distribution of web pages that a random surfer periodically jumps to.
In Personalized PageRank, E consists of a single web page.
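A minimal sketch of the idea, assuming a hypothetical four-page graph and the damped update with d = 0.85. The function `personalized_pagerank` and the graph are illustrative only. E puts all of its weight on one chosen page, so every random jump lands there.

```python
def personalized_pagerank(links, source_page, d=0.85, iterations=100):
    """Damped PageRank where E is 1 on source_page and 0 everywhere else."""
    pages = list(links)
    e = {p: 1.0 if p == source_page else 0.0 for p in pages}
    ranks = {p: 1 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - d) * e[p] for p in pages}
        for page, targets in links.items():
            for target in targets:
                new[target] += d * ranks[page] / len(targets)
        ranks = new
    return ranks

# Hypothetical graph in which every page has at least one forward link.
web = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": ["A"]}
ranks_for_a = personalized_pagerank(web, "A")
ranks_for_d = personalized_pagerank(web, "D")
print(ranks_for_a)
print(ranks_for_d)
```

Page D ranks noticeably higher when the surfer's jumps are centered on D than when they are centered on A, which is the personalization effect.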

44 Conclusions
PageRank is based solely on page location in the Web’s graph structure.
More important and central Web pages are given preference.
The structure of the Web graph is very useful for information retrieval tasks.


