1
Google PageRank Algorithm
By: Danny Lin
2
Table of Contents
Google Search
History / What is Page Rank?
Page Rank Algorithm
Inbound/Outbound Links
Dangling Nodes
Constraints
Calculating your page rank
How to maximize your page rank score
Loopholes
Neat stuff
3
Google Search
Google search using PageRank:
1) Crawl the web and locate all publicly accessible webpages
2) Index the data from step 1 to allow for efficient searches for keywords or phrases
3) Rate the importance of each page in the database – using PageRank
4) Return results in descending order of importance with respect to search
4
Google’s Original Architectural Design
Source:
5
History
Page Rank was conceptualized by Sergey Brin and Lawrence Page and discussed in their paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine".
It is used to rank the importance of web pages.
Source:
6
Page Rank Algorithm
PR(A) = (1-d) + d(PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
PR(A) - The Page Rank of page A.
T1 … Tn - The pages that link to page A.
PR(Tn) - The importance of page Tn.
C(Tn) - The number of outgoing links for page Tn.
PR(Tn)/C(Tn) - The calculated importance passed to page A from page Tn.
d - damping factor (usually set to 0.85).
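The formula can be tried out on a tiny link graph. The following is a minimal sketch, not Google's implementation: the function name pagerank_formula, the dictionary input format, the three-page example graph, and the fixed iteration count are all assumptions made for illustration. It also assumes every page has at least one outgoing link, so C(T) is never zero.

```python
def pagerank_formula(links, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to (hypothetical format).
    Assumes every page has at least one outgoing link, so C(T) > 0.
    """
    pr = {page: 1.0 for page in links}  # arbitrary starting guess
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # T1..Tn: the pages that link to `page`
            inbound = [t for t in links if page in links[t]]
            new_pr[page] = (1 - d) + d * sum(pr[t] / len(links[t]) for t in inbound)
        pr = new_pr
    return pr


# Hypothetical example: A links to B and C, B links to C, C links to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank_formula(example_links)
```

In this example C ends up ranked above B, because C receives importance from both A and B while B only receives half of A's.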
7
Inbound/Outbound Links
With respect to page A:
Inbound links – links that point towards page A
Outbound links – links within page A pointing towards other pages
8
Dangling Nodes
A dangling node is a page that does not have any outbound links.
Issue: Dangling nodes act as sinks that drain importance from the web.
Solution: Assume that the dangling node has a link to every other page, so the next page is selected at random.
This creates a stochastic matrix: all entries are nonnegative and the sum of each column is equal to 1.
Source:
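The dangling-node fix can be sketched as a small matrix builder. This is an illustrative sketch only: build_stochastic_matrix, the list-of-lists matrix layout, and the example graph are assumptions, and it follows the slide's wording (link to every other page) rather than the also common variant that links to every page including itself. It assumes at least two pages.

```python
def build_stochastic_matrix(links, pages):
    """Return S where S[i][j] is the probability of moving from pages[j] to pages[i].

    A dangling page (no outbound links) is treated as linking to every other page.
    Assumes there are at least two pages and no duplicate links.
    """
    n = len(pages)
    index = {page: i for i, page in enumerate(pages)}
    S = [[0.0] * n for _ in range(n)]
    for j, page in enumerate(pages):
        targets = links.get(page, [])
        if not targets:  # dangling node: pretend it links to every other page
            targets = [p for p in pages if p != page]
        for target in targets:
            S[index[target]][j] += 1.0 / len(targets)
    return S


# Hypothetical example: C has no outbound links, so it is a dangling node.
pages = ["A", "B", "C"]
links = {"A": ["B", "C"], "B": ["C"], "C": []}
S = build_stochastic_matrix(links, pages)
column_sums = [sum(S[i][j] for i in range(len(pages))) for j in range(len(pages))]
```

Every column of S sums to 1 after the fix, including the column for the dangling page C.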
9
Constraints
The link matrix S:
Must be primitive, i.e. for some n, S^n has all positive entries; its eigenvalues then satisfy λ1 = 1 and |λ2| < 1.
Must be stochastic, i.e. all entries are nonnegative and the sum of each column is equal to 1.
Must be irreducible, i.e. no simultaneous row/column permutation can put it into block upper-triangular form. Equivalently, the nodes must be strongly connected.
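Two of these constraints can be checked directly on a small matrix. The sketch below is an illustration under assumptions: the helper names and both example matrices are made up, and the primitivity test relies on Wielandt's bound (a primitive n×n matrix has all positive entries by power (n-1)^2 + 1), which is not mentioned in the slides. A primitive matrix is always irreducible, so a passing primitivity check covers the third constraint too.

```python
def is_column_stochastic(S, tol=1e-9):
    """True if all entries are nonnegative and every column sums to 1."""
    n = len(S)
    nonnegative = all(x >= 0 for row in S for x in row)
    sums_ok = all(abs(sum(S[i][j] for i in range(n)) - 1.0) < tol for j in range(n))
    return nonnegative and sums_ok


def matmul(A, B):
    """Plain square matrix product on lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def is_primitive(S):
    """True if some power of S has all positive entries."""
    n = len(S)
    # Only the zero/non-zero pattern matters, so work with 0/1 entries.
    pattern = [[1 if x > 0 else 0 for x in row] for row in S]
    power = pattern
    max_power = (n - 1) ** 2 + 1  # Wielandt's bound
    for _ in range(max_power):
        if all(x > 0 for row in power for x in row):
            return True
        product = matmul(power, pattern)
        power = [[1 if x > 0 else 0 for x in row] for row in product]
    return False


# Hypothetical examples: `mixed` is primitive; `cycle` (A -> B -> C -> A) is
# stochastic and irreducible but not primitive, because it is periodic.
mixed = [[0.0, 0.0, 0.5],
         [0.5, 0.0, 0.5],
         [0.5, 1.0, 0.0]]
cycle = [[0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]
```

The cycle example shows why strong connectivity alone is not enough: every page is reachable from every other page, yet no power of the matrix has all positive entries.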
10
Calculating your page rank
“Page Rank can be calculated using a simple iterative algorithm and corresponds to the principal eigenvector of the normalized link matrix (probability distribution) of the web”
Algorithm to calculate the normalized probability distribution:
1) Multiply the stochastic matrix, S, with a random initial vector, i1, to get a new vector, i2
2) Repeat step 1 with each new vector until i(n-1) = i(n) (approx.); the result approximates the principal eigenvector
LINEAR ALGEBRA TIME!!! Page Rank calculation time!
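The iteration above is commonly known as power iteration, and can be sketched in a few lines. This is a minimal sketch under assumptions: the function name, the tolerance, the iteration cap, and the 3×3 example matrix are illustrative, and a uniform starting vector replaces the random one so that the run is reproducible.

```python
def power_iteration(S, tol=1e-10, max_iterations=1000):
    """Multiply S by a vector repeatedly until the vector stops changing."""
    n = len(S)
    rank = [1.0 / n] * n  # uniform start instead of a random vector
    for _ in range(max_iterations):
        new_rank = [sum(S[i][j] * rank[j] for j in range(n)) for i in range(n)]
        # Stop once i(n-1) and i(n) agree to within the tolerance.
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical column-stochastic matrix for pages A, B, C:
# A links to B and C, B links to C, C links to A and B.
S = [[0.0, 0.0, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 1.0, 0.0]]
rank = power_iteration(S)
```

For this matrix the vector settles at roughly (2/9, 3/9, 4/9) for pages A, B, C, which can be confirmed by hand by solving S x = x with the entries of x summing to 1.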
11
How to maximize your page rank score
Internal Linking – having links to other pages within your website
  Hierarchical
  Fully meshed
Good and plentiful content
  E.g. news website
Provide a useful service or product
  E.g. phpbb – online bulletin board system
12
Loopholes
SEO (Search Engine Optimization): optimizing webpages to increase traffic flow, which leads to conversions and $$
An issue that arose from this: the selling of links from high PR pages
Source:
13
Neat stuff
Overview of a Google search (1-2 minutes):
How search has evolved (6 minutes):
Changes to Google’s search algorithm:
15
Questions? Source: