9 Algorithms: PageRank
Ranking After matching, have to rank:
Index Based Ranking Strategies we could (do) use: – Frequency – Position – Metadata
Missing Ingredient Index lacks inter-page information (how pages link to each other)
Link Quality Not all links are equal Who do you trust? – CS Prof – World Famous Chef
Identifying Authority Links into a page give it authority Page value = sum of authorities of pages linking to it
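A minimal sketch of the summing rule above, not taken from the slides. The four-page web, the starting authority of 1.0 per page, and the helper name `authority_pass` are all assumptions made for illustration.

```python
# Hypothetical four-page web: each page maps to the pages it links to.
links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}


def authority_pass(links, authority):
    """One pass: a page's new value is the sum of the authorities of its linkers."""
    new = {page: 0.0 for page in links}
    for source, targets in links.items():
        for target in targets:
            new[target] += authority[source]
    return new


# Assumption: every page starts with the same authority.
start = {page: 1.0 for page in links}
first = authority_pass(links, start)
print(first)  # {'A': 0.0, 'B': 0.0, 'C': 2.0, 'D': 1.0}
```

With equal starting values, the first pass is just a count of incoming links: C is linked from two pages, D from one.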
Link Quality Counting links is easy to abuse – Spam Link Pages
Issues Spam Links – Discourage with negative weight Spam Link Pages
Issues Cycles:
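One way to see why cycles are a problem for the summing rule, sketched under an assumption: each page has a base value of 1 plus the values of the pages linking to it. The two-page cycle and the name `naive_pass` are hypothetical.

```python
# Hypothetical two-page cycle: A links to B and B links back to A.
cycle = {"A": ["B"], "B": ["A"]}


def naive_pass(links, score):
    """Assumed rule: value = base of 1 + sum of the values of linking pages."""
    new = {page: 1.0 for page in links}
    for source, targets in links.items():
        for target in targets:
            new[target] += score[source]
    return new


score = {page: 1.0 for page in cycle}
history = []
for _ in range(5):
    score = naive_pass(cycle, score)
    history.append(score["A"])

print(history)  # [2.0, 3.0, 4.0, 5.0, 6.0]
```

Each pass feeds A's value into B and B's value back into A, so the numbers keep growing and never settle.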
Random Surfer Simulating a web surfing session – Start at random page – At each page, either: pick a random link to follow, or jump to a completely random page
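The session described above could be simulated roughly as follows. This is a sketch, not the simulator from the slides: the four-page web, the function name `random_surfer`, and the 15% jump chance are assumptions (the slide does not give a value).

```python
import random
from collections import Counter

# Hypothetical web: each page maps to the pages it links to.
web = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}


def random_surfer(links, steps, jump_chance=0.15, seed=0):
    """Return the share of visits each page receives in one long session."""
    rng = random.Random(seed)          # seeded so runs are repeatable
    pages = sorted(links)
    visits = Counter()
    current = rng.choice(pages)        # start at a random page
    for _ in range(steps):
        visits[current] += 1
        outgoing = links[current]
        if not outgoing or rng.random() < jump_chance:
            current = rng.choice(pages)      # jump to a completely random page
        else:
            current = rng.choice(outgoing)   # follow a random link
    return {page: visits[page] / steps for page in pages}


shares = random_surfer(web, steps=20000)
for page, share in shares.items():
    print(f"{page}: {share:.1%}")
```

Pages with no outgoing links are treated as a forced jump here, which is one common way to keep the surfer from getting stuck.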
Results Results of many random sessions:
Results Expressed as percentages, results stabilize – Law of large numbers
Cycle Buster Random surfer not fazed by cycles:
Random Surfer In Use The recipe pages visited by random surfers:
Simulator PageRank Simulator:
The Real Math Markov Chains – Set of states – Each state has probability of leading to other states – Represent as matrix
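A sketch of the matrix view, assuming the same surfer rules as before: each row holds one page's probabilities of leading to every other page, and repeatedly pushing a distribution through the matrix approximates the long-run visit shares. The three-page web, the 15% jump chance, and the names `transition_matrix` and `stationary` are assumptions for illustration.

```python
# Hypothetical three-page web.
web3 = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}


def transition_matrix(links, jump_chance=0.15):
    """Row i, column j: probability that a surfer on page i goes to page j next."""
    pages = sorted(links)
    n = len(pages)
    index = {page: i for i, page in enumerate(pages)}
    matrix = [[0.0] * n for _ in range(n)]
    for page in pages:
        row = matrix[index[page]]
        targets = links[page]
        # Random-jump share; a page with no links always jumps.
        for j in range(n):
            row[j] = (jump_chance if targets else 1.0) / n
        # Remaining probability is split evenly over the page's links.
        for target in targets:
            row[index[target]] += (1.0 - jump_chance) / len(targets)
    return pages, matrix


def stationary(matrix, rounds=100):
    """Push a uniform distribution through the chain until it settles."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(rounds):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


pages, matrix = transition_matrix(web3)
ranks = dict(zip(pages, stationary(matrix)))
for page, rank in ranks.items():
    print(f"{page}: {rank:.1%}")
```

Unlike the simulation, this involves no randomness: the same matrix always gives the same percentages.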
Excel Simulation Three pages:
Limitations Still have issues/room for growth – Link Spam – Context of link: where the link is on the page; "Bob's recipe is terrible" vs "Bob's recipe is great" – Lack of semantic knowledge: a page's authority should not be the same for all domains
Power Controlling search is power: "If you're not paying for the product, you are the product."