Published by Kellie Ramsey. Modified over 8 years ago.
1
Searching
Google: PageRank and anchor text
HITS: hubs and authorities
MSN's Ranknet: learning to rank
Today's web dragons
2
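How to search: Google's PageRank
Pagerank | Anchor text
[Figure: pages p1, p2, p3 linking to ~me]
The rank of ~me is the sum, over the pages p that link to it, of each page's rank divided by its number of outlinks:
$$\mathrm{rank}(\text{\textasciitilde me}) = \sum_{p} \frac{\mathrm{rank}(p)}{\#\mathrm{outlinks}(p)}$$
In general, with C(p,q) = 1 when p links to q and o(p) the number of outlinks of p:
$$r(q) = \sum_{p} \frac{C(p,q)}{o(p)}\, r(p)$$
In matrix form, $r = C r$, so $r$ is an eigenvector of $C$ (with $C$ here understood as the link matrix after weighting by $1/o(p)$ and transposing).
Random surfer model
Broken links (hence )
Trapping states (adjust C)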
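The rank equation on this slide can be solved by repeated substitution (power iteration). The following is a minimal sketch, not Google's actual implementation: the function name `pagerank`, the damping value 0.85, and the toy link graph are assumptions made for illustration. The damping term is one common way to "adjust C" for trapping states, and rank held by pages with no outlinks is spread evenly over all pages.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for r(q) = sum_p C(p,q)/o(p) * r(p).

    links maps each page to the list of pages it links to.
    damping=0.85 is an assumed value, not taken from the slides.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages without outlinks would leak rank; share theirs evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                # Each page passes rank(p) / #outlinks(p) along every link.
                new_rank[q] += damping * rank[p] / len(outs)
        rank = new_rank
    return rank


# Hypothetical toy web: p1, p2 and p3 all link to "me".
links = {"p1": ["me"], "p2": ["me", "p1"], "p3": ["me"], "me": ["p2", "p3"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph "me" ends up with the highest rank because every other page links to it, and the ranks always sum to 1.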
3
Chart of the web
Terra incognita: 30% of nodes
Milgram's continent: 30% of nodes
Corporate continent: 20% of nodes
New archipelago: 20% of nodes
Random surfer vs random searcher
4
Google search: anchor text
Pagerank | Anchor text
~me: this is the best page ever
you (linking to ~me): that is the best page ever
Google uses: … and weights them according to a secret recipe
Secret ingredients:
  In anchor text?
  In URL?
  Title
  Meta tags
  level
  Rel font size
  Capitalization
  Word pos in doc
5
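HITS: hubs and authorities
[Figure: hub pages x linking to authority pages p]
A page's hub score is the sum of the authority scores of the pages it links to, and a page's authority score is the sum of the hub scores of the pages linking to it:
$$\mathrm{hub}(x) = \sum_{p} C(x,p)\, \mathrm{auth}(p)$$
$$\mathrm{authority}(p) = \sum_{x} C(x,p)\, \mathrm{hub}(x)$$
In matrix form:
$$\mathrm{hub} = C\, \mathrm{auth}, \qquad \mathrm{auth} = C^{T}\, \mathrm{hub}$$
$$\mathrm{hub} = C\, C^{T}\, \mathrm{hub}$$
so hub is an eigenvector of $C\, C^{T}$ (up to normalization).
Principal eigenvector: strongest community
Other eigenvectors: other communities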
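The two update rules on this slide can be iterated until the scores settle on the principal eigenvector. The sketch below assumes a tiny hypothetical link graph and a function name `hits` chosen for illustration; it is not Teoma's implementation. Scores are rescaled to unit length after each round so they stay bounded.

```python
import math


def hits(links, iterations=50):
    """Iterate auth = C^T hub and hub = C auth, normalizing each round.

    links maps each page x to the list of pages p it links to.
    """
    pages = sorted(set(links) | {p for outs in links.values() for p in outs})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # authority(p) = sum of hub(x) over pages x that link to p
        auth = {p: 0.0 for p in pages}
        for x, outs in links.items():
            for p in outs:
                auth[p] += hub[x]
        # hub(x) = sum of authority(p) over pages p that x links to
        hub = {x: sum(auth[p] for p in links.get(x, [])) for x in pages}
        # Rescale both vectors to unit Euclidean length.
        for scores in (hub, auth):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for page in scores:
                scores[page] /= norm
    return hub, auth


# Hypothetical graph: three hub-like pages pointing at two authorities.
links = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a2"]}
hub, auth = hits(links)
print({p: round(s, 3) for p, s in hub.items()})
print({p: round(s, 3) for p, s in auth.items()})
```

Here a2 scores as the stronger authority because all three hubs point to it, and h3 is the weaker hub because it points to only one authority.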
6
Using HITS: Ask's Teoma
Web communities
[Figure: web communities for the query "jaguar"]
7
Using HITS: Ask's Teoma
Query neighborhood graph (search hits + neighbors)
Web communities
[Figure: web communities for the query "jaguar"]
Hub scores (lists of resources)
Authority scores (target pages)
  helps to deal with synonyms
  pulls in other relevant pages (e.g. Toyota is an authority for "auto manufacturers" yet doesn't contain the term)
8
Learning to rank: MSN's Ranknet
Training set: queries with matching documents from human judges
  17,000 queries
  10 documents/query
  human judgement (1–5)
Discriminant function: e.g. weighted sum of features, plus threshold
  600 features
Machine learning: learn the weights
  pairs of docs with same query: which is more highly ranked?
  train a neural net (1-layer, 2-layer)
Apply to real queries
Results? Pretty good
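The pairwise idea on this slide (take two documents judged for the same query and learn which should rank higher) can be sketched with a plain weighted sum of features trained on a logistic pairwise loss. This is an illustrative simplification, not MSN's system: the two-feature vectors, the judgements, the learning rate, and the helper names `make_pairs`, `score`, and `train_pairwise` are all hypothetical, and the threshold term is omitted because it cancels when two scores are compared.

```python
import math


def make_pairs(judged):
    """Build (better, worse) feature pairs from docs judged for the same query."""
    pairs = []
    for docs in judged.values():
        for feats_a, grade_a in docs:
            for feats_b, grade_b in docs:
                if grade_a > grade_b:
                    pairs.append((feats_a, feats_b))
    return pairs


def score(weights, features):
    """Discriminant function: weighted sum of features."""
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise(pairs, n_features, learning_rate=0.1, epochs=200):
    """Gradient descent on the loss log(1 + exp(-(s_better - s_worse)))."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = score(weights, better) - score(weights, worse)
            # Derivative of the loss with respect to diff.
            grad = -1.0 / (1.0 + math.exp(diff))
            for k in range(n_features):
                weights[k] -= learning_rate * grad * (better[k] - worse[k])
    return weights


# Hypothetical data: query -> list of (feature vector, human judgement 1-5).
judged = {
    "q1": [([0.9, 0.1], 5), ([0.4, 0.3], 3), ([0.1, 0.8], 1)],
    "q2": [([0.8, 0.5], 4), ([0.2, 0.6], 2)],
}
pairs = make_pairs(judged)
weights = train_pairwise(pairs, n_features=2)
print([round(w, 2) for w in weights])
```

Once trained, the weights are applied to real queries by scoring each candidate document and sorting by score. A multi-layer net would replace `score` with a nonlinear function while keeping the same pairwise training signal.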
9
Today's web dragons
49% Google (Sergey Brin, Larry Page): 1998, 2004
23% Yahoo: 1994, 1996; Inktomi 2002; AltaVista 2003
10% MSN: 2005
7% AOL: Excite since 1997, Google since 2002
2% Ask (Jeeves): Teoma 2001