Published by Sibyl Palmer. Modified over 9 years ago.
1
9 Algorithms: PageRank
2
Ranking After matching, have to rank:
3
Index Based Ranking Strategies we could (do) use: – Frequency – Position – Metadata
4
Missing Ingredient Index lacks inter-page information (how pages link to each other)
5
Link Quality Not all links are equal Who do you trust? – CS Prof – World Famous Chef
6
Identifying Authority Links into a page give it authority Page value = sum of authorities of pages linking to it
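A minimal sketch of the "sum of authorities" rule above, assuming a tiny hypothetical link graph with no cycles. The page names and the starting authority values are made up for illustration; they are not from the slides.

```python
# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "prof": ["recipe_a"],
    "chef": ["recipe_a", "recipe_b"],
    "recipe_a": [],
    "recipe_b": [],
}

# Assumed starting authority for pages that nobody links to.
base_authority = {"prof": 1.0, "chef": 3.0}


def authority(page, links, base):
    """Page value = sum of the authorities of the pages linking to it.

    Only safe on graphs without cycles: a cycle would recurse forever.
    """
    inbound = [src for src, targets in links.items() if page in targets]
    if not inbound:
        return base.get(page, 0.0)
    return sum(authority(src, links, base) for src in inbound)


for page in ("recipe_a", "recipe_b"):
    print(page, authority(page, links, base_authority))
```

Here recipe_a is linked from both the professor and the chef, so it collects both authorities, while recipe_b only collects the chef's.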
7
Link Quality Rewarding more links is easy to abuse – Spam Link Pages
8
Issues Spam Links – Discourage with negative weight Spam Link Pages
9
Issues Cycles:
10
Issues Cycles:
11
Issues Cycles: …
12
Random Surfer Simulating a web surfing session – Start at a random page – At each page, either pick a random link to follow or, with some chance, jump to a completely random page
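The session described above can be sketched as a short simulation. This is an illustration under stated assumptions, not the slides' own code: the three-page graph, the 15% jump chance, and the rule that a page with no links always triggers a random jump are all choices made here.

```python
import random


def random_surfer(links, steps=100_000, jump_chance=0.15, seed=0):
    """Simulate one long surfing session; return each page's share of visits."""
    rng = random.Random(seed)      # fixed seed so runs are repeatable
    pages = list(links)
    visits = {page: 0 for page in pages}
    page = rng.choice(pages)       # start at a random page
    for _ in range(steps):
        visits[page] += 1
        outgoing = links[page]
        if not outgoing or rng.random() < jump_chance:
            page = rng.choice(pages)       # jump to a completely random page
        else:
            page = rng.choice(outgoing)    # follow a random link on this page
    return {p: count / steps for p, count in visits.items()}


# Hypothetical graph: a and b link to each other (a cycle), c links to a.
links = {"a": ["b"], "b": ["a"], "c": ["a"]}
shares = random_surfer(links)
for page, share in shares.items():
    print(f"{page}: {share:.1%}")
```

Because nothing links to c, the surfer only reaches it through random jumps, so its share of visits stays small; the cycle between a and b does not trap the surfer, since the random jump always offers a way out.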
13
Results Results of many random sessions:
14
Results Expressed as percentages, results stabilize – Law of large numbers
15
Cycle Buster Random surfer not fazed by cycles:
16
Random Surfer In Use The recipe pages visited by random surfers:
17
Simulator PageRank Simulator: http://caccio.blogdns.net/software/pagerank-simulator
18
The Real Math Markov Chains – Set of states – Each state has probability of leading to other states – Represent as matrix
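As a rough sketch of the matrix view, each page can be treated as a state and each row of the matrix as the probabilities of moving from that page to every other page. The graph, the 15% jump chance, and the fixed number of rounds below are assumptions for illustration; repeatedly multiplying a starting distribution by the matrix (power iteration) is one common way to find where the chain settles.

```python
def transition_matrix(links, pages, jump_chance=0.15):
    """Row i holds the probabilities of moving from pages[i] to each page."""
    n = len(pages)
    index = {page: i for i, page in enumerate(pages)}
    matrix = []
    for page in pages:
        outgoing = links[page]
        if outgoing:
            # Random jump spreads jump_chance evenly over all pages...
            row = [jump_chance / n] * n
            # ...and the rest is split evenly among the page's links.
            for target in outgoing:
                row[index[target]] += (1 - jump_chance) / len(outgoing)
        else:
            # Assumed rule: a page with no links always jumps at random.
            row = [1.0 / n] * n
        matrix.append(row)
    return matrix


def stationary(matrix, rounds=100):
    """Power iteration: push a uniform distribution through the chain."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(rounds):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


# Hypothetical graph: a and b link to each other, c links to a.
pages = ["a", "b", "c"]
links = {"a": ["b"], "b": ["a"], "c": ["a"]}
matrix = transition_matrix(links, pages)
ranks = dict(zip(pages, stationary(matrix)))
print(ranks)
```

Every row sums to 1, since the surfer always goes somewhere, and the resulting distribution plays the same role as the visit percentages from many random sessions.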
19
Excel Simulation Three pages:
20
Limitations Still have issues/room for growth – Link spam – Context of link: where the link is on the page; "Bob's recipe is terrible" vs "Bob's recipe is great" – Lack of semantic knowledge: a page's authority should not be the same for all domains
21
Power Controlling search is power: http://www.bitsbook.com/ "If you're not paying for the product, you are the product."