Web Searching Everything, now.


1 Web Searching Everything, now.

2 History of Search
Archie (archives) - 1990: database of FTP filenames with regex query searching. Created by Alan Emtage, McGill University in Montreal.
WWW Wanderer: the Web's first robot; caused a high bandwidth load. Created by Matthew Gray.
ALIWEB (Archie-Like Indexing of the WEB): pages submitted with descriptions. Created by Martijn Koster as a low-bandwidth response to WWW Wanderer.
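The Archie idea (a flat list of FTP filenames queried with regular expressions) can be sketched in a few lines. This is only an illustration: the filenames and the `search_filenames` helper are made up for the example and do not reflect Archie's actual code or data.

```python
import re

# Hypothetical sample listing; the real Archie database held filenames
# gathered from public FTP sites.
FILENAMES = [
    "pub/gnu/emacs-18.59.tar.Z",
    "pub/gnu/gcc-1.40.tar.Z",
    "pub/x11/xv-2.21.tar.Z",
    "pub/docs/README",
]


def search_filenames(pattern, filenames):
    """Return the filenames that match the regular expression `pattern`."""
    regex = re.compile(pattern)
    return [name for name in filenames if regex.search(name)]


# Find compressed tarballs under the gnu directory.
matches = search_filenames(r"gnu/.*\.tar\.Z$", FILENAMES)
print(matches)
```

Note that only names are searched, not file contents, which is the main limit of this style of index.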

3 History of Search
Architext - 1993: first to use statistical analysis of word relationships to generate results. Built by Stanford undergrads; became Excite †.
Yahoo! - 1994: searchable directory of pages with descriptions. Started by David Filo and Jerry Yang as a collection of links.
Webcrawler - 1994 †: indexed entire web pages. Built by Brian Pinkerton, University of Washington; bought by AOL.
Lycos †: 60 million documents by 1996. Built by Michael Mauldin, Carnegie Mellon; went public with 54,000 documents.
† = RIP

4 History of Search
Infoseek - 1994 †
Altavista - 1995 †
Looksmart - 1996
Inktomi
Ask Jeeves - 1997
Google - 1998
Teoma

5 Web Search Today
Search algorithms are highly secret
Use off-page criteria for ranking
Constant tweaking
Things to look for: Boolean nesting, fields, clustering, stop words
Fields: specific structural units of a document (like "head" or "title")
Clustering: whether the results will be clustered or not
Stop words: words that occur all the time, such as "the"
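Three of the features above (Boolean nesting, fields, and stop words) can be shown with a toy index. This is a minimal sketch under stated assumptions: the documents, the stop-word list, and the tuple-based query format are all invented for the example, and no real engine is claimed to work this way.

```python
# Illustrative stop-word list only; real engines use their own lists.
STOP_WORDS = {"the", "a", "of", "and", "to", "in"}

# Hypothetical documents, each split into two fields.
DOCS = {
    1: {"title": "History of the Web", "body": "Archie indexed FTP archives"},
    2: {"title": "Search engines", "body": "Google and Teoma rank the web"},
    3: {"title": "Web directories", "body": "Editors select pages for Yahoo"},
}


def tokens(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_index(docs):
    """Map each (field, word) pair to the set of document ids containing it."""
    index = {}
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for word in tokens(text):
                index.setdefault((field, word), set()).add(doc_id)
    return index


def evaluate(query, index, all_ids):
    """Evaluate a nested query such as ("and", q1, ("or", q2, q3))."""
    op = query[0]
    if op == "term":
        _, field, word = query
        return set(index.get((field, word.lower()), set()))
    if op == "not":
        return all_ids - evaluate(query[1], index, all_ids)
    results = [evaluate(q, index, all_ids) for q in query[1:]]
    if op == "and":
        return set.intersection(*results)
    if op == "or":
        return set.union(*results)
    raise ValueError("unknown operator: " + op)


INDEX = build_index(DOCS)
ALL_IDS = set(DOCS)

# title:web AND (body:archie OR body:editors)
QUERY = ("and",
         ("term", "title", "web"),
         ("or", ("term", "body", "archie"), ("term", "body", "editors")))
print(sorted(evaluate(QUERY, INDEX, ALL_IDS)))
```

An engine with "no nesting" would accept only a flat list of terms, so the parenthesized OR inside the AND could not be expressed.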

6 Web Search Today
Google
PageRank system: "important" sites given artificially high rank
Strengths: largest database; relevance based on external linkage
Weaknesses: limited search features (no nesting); may search for synonyms / grammatical variants (automatic stemming)
Google does not make it clear which terms it stems and which it does not.
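Ranking by external linkage can be illustrated with the simplified, publicly described form of PageRank. This is a sketch, not Google's production algorithm (which is secret): the damping factor of 0.85, the iteration count, and the four-page link graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    `links` maps every page to the list of pages it links to; each
    linked page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "c", and "c" links to "a".
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
RANKS = pagerank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

Page "c" ends up on top because most pages link to it, and "a" ranks well only because "c" links to it, which shows how a link from an "important" site lifts its target.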

7 Web Search Today
Yahoo!
Brand new search database (as of Feb '04)
Strengths: full Boolean searching; very fresh; directory links
Weaknesses: includes pay-for-inclusion results (!)

8 Web Search Today
MSN Search (Inktomi)
Large Inktomi database
Strengths: page depth limit; full Boolean searching
Depth limit: how far down a subdirectory hierarchy to look
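A depth limit of this kind can be read as a cap on how many directories deep a URL sits. The sketch below is one possible interpretation: the `path_depth` and `within_depth_limit` helpers, the example URLs, and the rule for counting depth are assumptions, since the slides do not say how any engine measures it.

```python
from urllib.parse import urlparse


def path_depth(url):
    """Count the directories between the site root and the document."""
    path = urlparse(url).path
    # Drop the final segment (the document name, or "" after a trailing slash).
    directory_part = path.rsplit("/", 1)[0]
    return len([segment for segment in directory_part.split("/") if segment])


def within_depth_limit(url, max_depth):
    """Return True if a crawler with this limit would still visit the URL."""
    return path_depth(url) <= max_depth


for url in ("http://example.com/index.html",
            "http://example.com/docs/guide.html",
            "http://example.com/docs/api/v1/reference.html"):
    print(url, path_depth(url), within_depth_limit(url, 2))
```

Under this reading, pages buried in deep subdirectories are never indexed, however relevant they are.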

9 Web Search Today
Teoma
Subject-specific popularity: the number of same-SUBJECT pages that reference a page (pickup truck example)
Strengths: Refine; Related
Refine: Teoma's best guess to narrow your search (uses clustering)
Related: "expert" sites
Weaknesses: small database; no Boolean nesting
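Subject-specific popularity can be contrasted with plain link counting using a small made-up graph. This is a sketch of the idea as the notes describe it (counting only references from pages on the same subject), not Teoma's actual method; the page names, subject labels, and the `subject_popularity` helper are hypothetical.

```python
def subject_popularity(links, subjects, subject):
    """For each page on `subject`, count inbound links from other pages
    on that same subject."""
    on_topic = {page for page, s in subjects.items() if s == subject}
    scores = {page: 0 for page in on_topic}
    for source, targets in links.items():
        if source not in on_topic:
            continue  # links from off-topic pages do not count
        for target in set(targets):
            if target in on_topic and target != source:
                scores[target] += 1
    return scores


# Hypothetical pages, labeled by subject.
SUBJECTS = {
    "truck_reviews": "pickup trucks",
    "truck_forum": "pickup trucks",
    "truck_dealer": "pickup trucks",
    "celebrity_news": "entertainment",
    "portal": "general",
}

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "truck_forum": ["truck_reviews", "truck_dealer"],
    "truck_dealer": ["truck_reviews"],
    "truck_reviews": [],
    "celebrity_news": ["truck_dealer"],
    "portal": ["truck_dealer", "celebrity_news"],
}

SCORES = subject_popularity(LINKS, SUBJECTS, "pickup trucks")
for page in sorted(SCORES, key=SCORES.get, reverse=True):
    print(page, SCORES[page])
```

Here "truck_dealer" has the most inbound links overall (three), but "truck_reviews" scores highest because both of its links come from other pickup truck pages.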

10 Web Search Tomorrow
Kartoo: visual meta search engine
Nutch: open source web search; Java (but that could change)
Dipsie: "2 clicks"
Singingfish: multimedia (audio / video) search

11 Internet Directories
Directory           Selection                   Size
Yahoo!              User submission / editors   3 million
Open Directory      Editors (62,562!)           3.8 million
LookSmart           Selected                    2.3 million
CiteSeer            Submission                  ???
Librarians' Index   Public librarians           10 thousand
InfoMine            Academic librarians         120 thousand
RDN                 Academic selections         30 thousand
RDN: Resource Discovery Network

12 Conclusion Which search engine is the best?

13 References http://searchengineshowdown.com/

