Published by Ulysses Finnie. Modified over 10 years ago.
1
Hypersearching the Web
Soumen Chakrabarti, Byron Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins
Jacob Kalakal Joseph
CS 572 (Spring 2011) | Class Presentation | June 21, 2011
2
Outline
– Characteristics of the WWW
– Motivation for building search engines
– Traditional SEs and the challenges
– Improvements and the associated problems
– CLEVER
– Power of hyperlinks
– Hubs and Authorities
– Algorithm
– Evaluate CLEVER
– Future scope
– Answer questions and class discussion
3
WWW ~ Universe
4
Motivation for search engines
5
Initial Attempts
Ranking functions based on simple heuristics
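As a rough illustration of what a "simple heuristic" ranking function might look like, the sketch below scores each page by how often the query words occur in its text. The function names, page names and page texts are hypothetical and chosen only for this example; the slides do not say which heuristics the early engines actually used.

```python
from collections import Counter


def term_frequency_score(query: str, text: str) -> int:
    """Count how often the query's words occur in the text (case-insensitive)."""
    counts = Counter(text.lower().split())
    return sum(counts[word] for word in query.lower().split())


def rank(query: str, pages: dict[str, str]) -> list[str]:
    """Return page names ordered from highest to lowest score."""
    return sorted(
        pages,
        key=lambda name: term_frequency_score(query, pages[name]),
        reverse=True,
    )


# Hypothetical mini-corpus, for illustration only.
pages = {
    "a.html": "jaguar cars and jaguar dealers",
    "b.html": "the jaguar is a large cat",
    "c.html": "recipes for dinner",
}
ranking = rank("jaguar", pages)
print(ranking)
```

A scorer like this looks only at the words on the page, which is why it is exposed to the synonymy, polysemy and spamming problems on the following slides.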
6
Challenges: Synonymy
7
Challenges: Polysemy
8
Challenges: Spamming
Example: the phrase "Cheap airtickets" repeated over and over on a page (Cheap airtickets Cheap airtickets Cheap airtickets ...), hidden from readers as white font on a white background
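To see why hidden repeated text is effective against purely text-based ranking, consider this minimal sketch. It assumes the engine scores a page by counting occurrences of the query phrase; the two page texts and the `phrase_count` helper are made up for illustration.

```python
def phrase_count(phrase: str, text: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of a phrase."""
    return text.lower().count(phrase.lower())


# Hypothetical pages: one honest, one stuffed with hidden repeated keywords.
honest_page = "Compare fares and book cheap airtickets with a licensed travel agent."
stuffed_page = "Buy watches online. " + "Cheap airtickets " * 5  # invisible to readers

scores = {
    "honest": phrase_count("cheap airtickets", honest_page),
    "stuffed": phrase_count("cheap airtickets", stuffed_page),
}
winner = max(scores, key=scores.get)
print(scores, winner)
```

Under this assumed scoring rule the stuffed page wins even though it is not about air tickets at all, because the score depends only on text the page author controls.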
9
Improvements
Semantic networks: help with synonymy but worsen polysemy
Human selectors: impractical
10
Hyperlinks - What a CLEVER idea!
11
Hubs & Authorities
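For reference, the mutually reinforcing relationship between hubs and authorities is usually written as the pair of update rules below, following Kleinberg's paper listed in the references. The symbols a(p) for the authority score of page p, h(p) for its hub score, and q → p for "q links to p" are notation assumed here; they do not appear on the slide.

```latex
a(p) \leftarrow \sum_{q \,:\, q \to p} h(q),
\qquad
h(p) \leftarrow \sum_{q \,:\, p \to q} a(q)
```

After each round both score vectors are rescaled (for example to unit length) so that the values stay bounded. A good authority is a page pointed to by good hubs, and a good hub is a page that points to good authorities.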
12
How it works
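The slide itself gives no detail, so the following is only a minimal sketch of the hub and authority iteration as described in Kleinberg's paper, not the CLEVER implementation. The tiny link graph, the page names and the `hits` function are hypothetical, and the step that builds the root set from a text query and expands it with neighbouring pages is assumed to have happened already.

```python
from math import sqrt


def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links: dict[str, list[str]], iterations: int = 5):
    """Run the hub/authority updates; links maps a page to the pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: add up the hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        auth = normalize(auth)
        # Hub update: add up the authority scores of the pages linked to.
        hub = normalize({p: sum(auth[q] for q in links.get(p, [])) for p in pages})
    return hub, auth


# Hypothetical graph: three link-list pages pointing at two content sites.
links = {
    "hub1": ["siteA", "siteB"],
    "hub2": ["siteA", "siteB"],
    "hub3": ["siteA"],
}
hub, auth = hits(links)
best_authority = max(auth, key=auth.get)
print(best_authority, hub, auth)
```

In this toy graph siteA ends up as the strongest authority because every hub points to it, and hub1 and hub2 score higher than hub3 because they point to both authorities.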
13
Clever vs. Google
Google is faster!
Clever also looks backward, at the pages that point to a page
14
Pros
– Rapid convergence (5 iterations for a root set of 3000 pages)
– Independent of the initial H (hub) and A (authority) scores
– Get information even before we actually crawl
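The claim about initial scores can be checked on a toy example. The sketch below runs the hub and authority iteration twice on the same hypothetical graph, once from uniform starting hub scores and once from deliberately skewed ones, and compares the results. It assumes positive starting scores and a graph whose dominant hub/authority pattern is unique; the graph, names and iteration count are illustrative only.

```python
from math import sqrt

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "hub1": ["siteA", "siteB"],
    "hub2": ["siteA", "siteB"],
    "hub3": ["siteA"],
}


def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def authority_scores(links, initial_hub, iterations=20):
    """Return authority scores, starting from the given hub scores (default 1.0)."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: initial_hub.get(p, 1.0) for p in pages}
    auth = {}
    for _ in range(iterations):
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        auth = normalize(auth)
        hub = normalize({p: sum(auth[q] for q in links.get(p, [])) for p in pages})
    return auth


uniform = authority_scores(LINKS, {})
skewed = authority_scores(LINKS, {"hub1": 10.0, "hub2": 0.1, "hub3": 3.0})
max_gap = max(abs(uniform[p] - skewed[p]) for p in uniform)
print(max_gap)
```

Both runs settle on essentially the same authority scores, which is the behaviour the slide describes. How many iterations are needed in practice depends on the graph.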
15
Segregation of web into clusters
16
Cons
The underlying assumption, that Web links confer authority, could be incorrect! Links may also be created for:
– Navigation
– Advertisement
– Disapproval
17
Cons
– Ignores the anchor text
– Not every page needs to be either a hub or an authority
– Universally popular websites like Wikipedia will be an authority on almost everything
– May return a general result for a narrow topic search
18
What's next?
19
References
– S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins. Hypersearching the Web. Scientific American, June 1999.
– CLEVER project (http://www.almaden.ibm.com/projects/clever.shtml)
– J. Kleinberg. Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.
– S. Brin, L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, Vol. 30, No. 1-7, pp. 107-117, 1998.
– WordNet Project (http://wordnet.princeton.edu/)
20
Group Discussion