9 Algorithms: PageRank
Ranking: After matching, have to rank:
Ranking with Links: Links are a sign of importance:
Identifying Authority: Links into a page give it authority. Page value = sum of authorities of pages linking to it.
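A minimal sketch of the "page value = sum of authorities" rule, assuming a made-up three-page link graph (the pages A, B, C and the function name update_authority are illustrative and not from the slides). Every page starts with authority 1.0 and one round of the rule is applied:

```python
# Hypothetical link graph: page -> pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

def update_authority(links, authority):
    """One round: a page's new value is the sum of the current
    authorities of the pages that link to it."""
    new = {page: 0.0 for page in links}
    for source, targets in links.items():
        for target in targets:
            new[target] += authority[source]
    return new

# Assumption: every page starts with the same authority.
authority = {page: 1.0 for page in links}
authority = update_authority(links, authority)
print(authority)
```

In this sketch C ends up with the highest value after one round because two pages link to it. Running more rounds on this graph makes the values keep growing, since A and C feed each other through a cycle.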
Issues: Cycles: …
Random Surfing: Instead of counting links, randomly surf them.
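Random surfing can be sketched as a short simulation. This is only an illustration under stated assumptions: the three-page graph is made up, every page has at least one outgoing link, and the surfer always follows a link (no random jumps). The function name random_surf and its parameters are hypothetical.

```python
import random

# Hypothetical link graph: page -> pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

def random_surf(links, steps, seed=0):
    """Follow random links for `steps` clicks and return the share
    of visits each page received."""
    rng = random.Random(seed)          # fixed seed so runs repeat
    visits = {page: 0 for page in links}
    page = rng.choice(sorted(links))   # start on a random page
    for _ in range(steps):
        visits[page] += 1
        page = rng.choice(links[page]) # click a random outgoing link
    return {p: count / steps for p, count in visits.items()}

shares = random_surf(links, steps=10_000)
print(shares)
```

Pages that the surfer lands on more often are treated as more important. In this made-up graph B is visited least, because only one of A's two links leads to it.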
Link Quality: Not all links are equal. Who do you trust: a CS prof or a world-famous chef?
Random Surfer In Use: We assume that more important pages will have more incoming links.
The Real Math: Markov chains. A set of states; each state has a probability of leading to other states; represent as a matrix.
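The matrix view can be sketched as follows, assuming a made-up three-page chain (the matrix P and the function names step and stationary are illustrative, not taken from the slides). Row i holds the probabilities of moving from state i to each state, and repeatedly applying the matrix to a starting distribution shows where a surfer ends up in the long run:

```python
# Hypothetical transition matrix for pages A, B, C (in that order).
# Each row sums to 1.
P = [
    [0.0, 0.5, 0.5],  # A links to B and C
    [0.0, 0.0, 1.0],  # B links to C
    [1.0, 0.0, 0.0],  # C links to A
]

def step(dist, P):
    """Advance the probability distribution by one click."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, rounds=100):
    """Start from a uniform distribution and apply P repeatedly."""
    dist = [1.0 / len(P)] * len(P)
    for _ in range(rounds):
        dist = step(dist, P)
    return dist

ranks = stationary(P)
print(ranks)
```

For this particular matrix the distribution settles near 0.4 for A, 0.2 for B and 0.4 for C. Not every chain settles this way; the sketch assumes one that does.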
Excel Simulation: Three pages:
Link Quality: Rewarding more links is easy to abuse. Spam link pages.
Issues: Spam links. Discourage with negative weight: spam link pages count as -1.
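One way to picture the negative weight is a link count where links from known spam link pages subtract instead of add. This is a rough sketch only: how spam pages get identified is not covered here, so the spam_pages set, the page names, and the function link_score are all hypothetical.

```python
def link_score(inbound_pages, spam_pages, spam_weight=-1.0):
    """Add 1 for each ordinary inbound link and spam_weight for each
    inbound link that comes from a known spam link page."""
    return sum(spam_weight if page in spam_pages else 1.0
               for page in inbound_pages)

# Assumption: these pages were already flagged as spam link pages.
spam_pages = {"spam1", "spam2"}

honest = link_score(["blog", "news"], spam_pages)
spammed = link_score(["blog", "spam1", "spam2"], spam_pages)
print(honest, spammed)
```

With a weight of -1, buying links from spam link pages lowers a page's score instead of raising it.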
Limitations: Still have issues and room for growth. Link spam. Context of link: where the link is on the page; "Bob's recipe is terrible" vs "Bob's recipe is great". Lack of semantic knowledge: a page's authority should not be the same for all domains.
Power: Controlling search is power. See http://www.bitsbook.com/ "If you're not paying for the product, you are the product."