Google PageRank Algorithm


1 Google PageRank Algorithm
By: Danny Lin

2 Table of Contents
- Google Search
- History / What is PageRank?
- PageRank Algorithm
- Inbound/Outbound Links
- Dangling Nodes
- Constraints
- Calculating your PageRank
- How to maximize your PageRank score
- Loopholes
- Neat stuff

3 Google Search
Google search using PageRank:
1) Crawl the web and locate all publicly accessible webpages.
2) Index the data from step 1 to allow efficient searches for keywords or phrases.
3) Rate the importance of each page in the database using PageRank.
4) Return results in descending order of importance with respect to the search query.
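A toy sketch of steps 2 to 4 in Python, assuming step 1 (the crawl) has already produced a small set of page texts and step 3 has already produced PageRank scores. The page names, texts, and scores are hypothetical, and the ranking here uses PageRank alone for illustration; a production search engine also weighs how relevant each page is to the query.

```python
from collections import defaultdict

# Step 1 (assumed done): hypothetical crawled pages, page name -> page text.
pages = {
    "A": "pagerank ranks web pages",
    "B": "web search engines crawl the web",
    "C": "pagerank and web search",
}

# Step 3 (assumed done): hypothetical precomputed PageRank scores.
pagerank = {"A": 0.2, "B": 0.5, "C": 0.3}


def build_index(corpus):
    """Step 2: build an inverted index mapping each keyword to the pages containing it."""
    index = defaultdict(set)
    for name, text in corpus.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, scores, query):
    """Step 4: return pages containing every query word, highest PageRank first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches, key=lambda p: scores[p], reverse=True)


index = build_index(pages)
print(search(index, pagerank, "web search"))  # ['B', 'C']
```

Both B and C contain "web" and "search"; B is listed first only because its (made-up) PageRank is higher.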

4 Google’s Original Architectural Design
Source:

5 History
PageRank was conceptualized by Sergey Brin and Lawrence Page and discussed in their paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine". It is used to rank the importance of web pages.
Source:

6 PageRank Algorithm
PR(A) = (1-d) + d(PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
where T1, …, Tn are the pages that link to page A:
- PR(Ti): the importance (PageRank) of page Ti.
- C(Ti): the number of outgoing links on page Ti.
- PR(Ti)/C(Ti): the share of page Ti's importance passed to page A.
- d: damping factor, usually set to 0.85.
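A minimal sketch that applies the formula above repeatedly until the values stop changing, on a hypothetical four-page web (A links to B and C, B to C, C to A, D to C). It assumes every page has at least one outbound link; dangling nodes are covered on slide 8. In this (1-d) form of the formula the scores add up to the number of pages rather than to 1.

```python
def pagerank_original(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over the pages T that link to A.

    links maps each page to the list of pages it links to (its outbound links).
    Assumes every page has at least one outbound link (no dangling nodes).
    """
    pages = list(links)
    out_count = {p: len(links[p]) for p in pages}  # C(T)
    inbound = {p: [] for p in pages}                 # T1 ... Tn for each page
    for src, targets in links.items():
        for dst in targets:
            inbound[dst].append(src)

    pr = {p: 1.0 for p in pages}  # initial guess: every page starts at 1
    for _ in range(max_iter):
        new = {
            a: (1 - d) + d * sum(pr[t] / out_count[t] for t in inbound[a])
            for a in pages
        }
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr


# Hypothetical four-page web: A -> B, C;  B -> C;  C -> A;  D -> C
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank_original(links)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

Page D has no inbound links, so it keeps only the baseline 1 - d = 0.15, while C, with three inbound links, ends up with the highest score.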

7 Inbound/Outbound Links
With respect to page A:
- Inbound links: links that point towards page A
- Outbound links: links within page A pointing towards other pages
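A small sketch of how inbound and outbound links can be derived from a link graph stored as a dictionary of outbound links (a hypothetical representation). Duplicate links to the same target are counted once here, which is an assumption; len(outbound[T]) then plays the role of C(T) in the slide 6 formula.

```python
def link_sets(links):
    """Return (inbound, outbound): each maps a page to the set of pages linking in / out.

    links maps each page to the list of pages it links to. Pages that only appear
    as link targets get an empty outbound set (they are dangling nodes).
    """
    outbound = {page: set(targets) for page, targets in links.items()}
    inbound = {page: set() for page in links}
    for src, targets in links.items():
        for dst in targets:
            inbound.setdefault(dst, set()).add(src)
            outbound.setdefault(dst, set())
    return inbound, outbound


# Hypothetical three-page web: A -> B, C;  B -> C;  C -> A
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
inbound, outbound = link_sets(links)
print(sorted(inbound["A"]), sorted(outbound["A"]))  # ['C'] ['B', 'C']
print(sorted(inbound["C"]))                          # ['A', 'B']
```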

8 Dangling Nodes
A dangling node is a page that does not have any outbound links.
Issue: dangling nodes act as sinks; importance flows into them but never flows back out to the rest of the web.
Solution: assume that the dangling node links to every other page, so that from a dangling node the next page is selected at random.
This makes the link matrix stochastic: all entries are nonnegative and each column sums to 1.
Source:
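A sketch of building the column-stochastic matrix S for a hypothetical three-page web in which C is a dangling node. S[i][j] is the probability of moving from page j to page i. For the dangling column this uses the common convention of spreading the probability evenly over all n pages, the page itself included; the "every other page" variant would instead give each of the other n - 1 pages 1/(n - 1).

```python
def stochastic_matrix(links, pages):
    """Build S with S[i][j] = probability of moving from page j to page i.

    A dangling page (no outbound links) gets a column of 1/n: the surfer jumps to
    a page chosen uniformly at random (here including the dangling page itself).
    """
    n = len(pages)
    pos = {p: k for k, p in enumerate(pages)}
    S = [[0.0] * n for _ in range(n)]
    for j, page in enumerate(pages):
        targets = links.get(page, [])
        if targets:
            for t in targets:
                S[pos[t]][j] += 1.0 / len(targets)
        else:  # dangling node: spread its importance evenly over all pages
            for i in range(n):
                S[i][j] = 1.0 / n
    return S


pages = ["A", "B", "C"]
links = {"A": ["B", "C"], "B": ["C"], "C": []}  # C is a dangling node
S = stochastic_matrix(links, pages)
for row in S:
    print([round(x, 3) for x in row])
```

Without the dangling-node rule, column C would be all zeros and S would not be stochastic.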

9 Constraints
The link matrix S must be:
- Primitive, i.e. for some power k, S^k has all positive entries; then λ1 = 1 is the dominant eigenvalue and every other eigenvalue satisfies |λ| < 1.
- Stochastic, i.e. all entries are nonnegative and each column sums to 1.
- Irreducible, i.e. no simultaneous permutation of rows and columns can bring S into block upper-triangular form; equivalently, the nodes must be strongly connected.
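A sketch of checks for the three constraints on a small matrix stored as a list of lists (column j holds the outbound probabilities of page j). The primitivity test relies on Wielandt's bound: a nonnegative n × n matrix is primitive exactly when its ((n-1)^2 + 1)-th power is entrywise positive. The example is a hypothetical three-page cycle A → B → C → A, which is stochastic and irreducible but not primitive; mixing in the damping factor as G = d·S + (1-d)/n (the usual "random surfer" matrix, not shown on the slide) makes every entry positive and so restores primitivity.

```python
def is_stochastic(S, tol=1e-12):
    """All entries nonnegative and every column sums to 1."""
    n = len(S)
    nonnegative = all(x >= 0 for row in S for x in row)
    columns_ok = all(abs(sum(S[i][j] for i in range(n)) - 1.0) < tol for j in range(n))
    return nonnegative and columns_ok


def is_irreducible(S):
    """True if the graph with an edge j -> i whenever S[i][j] > 0 is strongly connected."""
    n = len(S)

    def reaches_all(forward):
        # Depth-first search from page 0 along edges (forward) or reversed edges.
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in range(n):
                edge = S[v][u] > 0 if forward else S[u][v] > 0
                if edge and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n

    # Strongly connected iff page 0 reaches every page and every page reaches page 0.
    return reaches_all(True) and reaches_all(False)


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def is_primitive(S):
    """Wielandt's bound: nonnegative S is primitive iff S^((n-1)^2 + 1) > 0 entrywise."""
    n = len(S)
    power = S
    for _ in range((n - 1) ** 2):
        power = matmul(power, S)
    return all(x > 0 for row in power for x in row)


# Hypothetical cycle A -> B -> C -> A (column j = outbound probabilities of page j).
cycle = [[0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]
print(is_stochastic(cycle), is_irreducible(cycle), is_primitive(cycle))  # True True False

# Mixing in the damping factor makes every entry positive.
d, n = 0.85, len(cycle)
G = [[d * cycle[i][j] + (1 - d) / n for j in range(n)] for i in range(n)]
print(is_stochastic(G), is_irreducible(G), is_primitive(G))  # True True True
```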

10 Calculating your PageRank
“Page Rank can be calculated using a simple iterative algorithm and corresponds to the principal eigenvector of the normalized link matrix (probability distribution) of the web”
Algorithm to calculate the normalized probability distribution (power iteration):
1) Multiply the stochastic matrix S by an initial vector i1 (random or uniform, with nonnegative entries summing to 1) to get a new vector i2 = S·i1.
2) Repeat step 1, computing i(k+1) = S·i(k), until i(k-1) ≈ i(k).
The vector this converges to is the principal eigenvector of S (eigenvalue 1), i.e. the PageRank vector.
LINEAR ALGEBRA TIME!!! PageRank calculation time!
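A sketch of the iteration in plain Python, using the stochastic matrix of a hypothetical three-page web (A links to B and C, B links to C, and C is a dangling node handled by the uniform-jump rule). This particular matrix happens to be primitive, so the iteration converges without a damping factor; the uniform starting vector and the tolerance are arbitrary choices.

```python
def power_iteration(S, tol=1e-10, max_iter=1000):
    """Repeat i_(k+1) = S * i_k, starting from a uniform vector, until successive vectors agree."""
    n = len(S)
    i_k = [1.0 / n] * n  # initial probability vector i1: nonnegative, sums to 1
    for _ in range(max_iter):
        i_next = [sum(S[r][c] * i_k[c] for c in range(n)) for r in range(n)]
        if max(abs(a - b) for a, b in zip(i_next, i_k)) < tol:
            return i_next
        i_k = i_next
    return i_k


# Stochastic matrix of a hypothetical web: A -> B, C;  B -> C;  C dangling (uniform jump).
S = [[0.0, 0.0, 1 / 3],
     [0.5, 0.0, 1 / 3],
     [0.5, 1.0, 1 / 3]]
ranks = power_iteration(S)
print([round(x, 3) for x in ranks])  # [0.182, 0.273, 0.545]
```

For this graph the limit is (2/11, 3/11, 6/11), which satisfies S·i = i and sums to 1.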

11 How to maximize your PageRank score
- Internal linking: having links to other pages within your own website
  - Hierarchical
  - Fully meshed
- Good and plentiful content, e.g. a news website
- Provide a useful service or product, e.g. phpBB (an online bulletin board system)
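A sketch comparing the two internal-linking structures on a hypothetical, isolated four-page site (a home page plus three sub-pages, with no links to or from the rest of the web), reusing the slide 6 formula for a fixed number of iterations. Real sites exchange links with the wider web, so the numbers only illustrate how each structure distributes PageRank within the site.

```python
def pagerank(links, d=0.85, iterations=100):
    """Apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) a fixed number of times.

    Assumes every page has at least one outbound link.
    """
    pr = {p: 1.0 for p in links}
    for _ in range(iterations):
        pr = {
            a: (1 - d) + d * sum(pr[t] / len(links[t]) for t in links if a in links[t])
            for a in links
        }
    return pr


pages = ["home", "p1", "p2", "p3"]
# Hierarchical: the home page links to every sub-page; each sub-page links back home.
hierarchical = {"home": ["p1", "p2", "p3"], "p1": ["home"], "p2": ["home"], "p3": ["home"]}
# Fully meshed: every page links to every other page.
meshed = {p: [q for q in pages if q != p] for p in pages}

for name, site in [("hierarchical", hierarchical), ("fully meshed", meshed)]:
    print(name, {p: round(s, 3) for p, s in pagerank(site).items()})
```

In the hierarchical layout the home page collects most of the site's PageRank (about 1.92 versus about 0.69 per sub-page), while the fully meshed layout gives every page the same score of 1.0; in both cases the site's total stays at 4, the number of pages.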

12 Loopholes
SEO (Search Engine Optimization): optimizing webpages to increase traffic flow → conversions → $$
An issue that arose from this: the selling of links from high-PR pages.
Source:

13 Neat stuff
Overview of a Google search (1-2 minutes):
How search has evolved (6 minutes):
Changes to Google’s search algorithm:

14 References
Content
Images

15 Questions? Source:

