Hyper-Searching the Web
Search Engines
- Basic search (index)
- Cluster search (themes)
- Meta-search (outsource)
- “Smarter” meta-search (themes + outsource)
Basic search engine
- Examples: AltaVista, InfoSeek, HotBot, Lycos, Excite, Google, etc.
- Maintains an index of every word found
- Works in three stages: crawling, indexing, and returning results
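A minimal sketch of the "index for every word" idea, assuming a tiny in-memory collection instead of crawled pages. The page ids, page texts, and the function names `build_index` and `search` are made up for illustration; real engines use far more elaborate tokenizing and storage.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(index, query):
    """Return the ids of pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical stand-ins for crawled pages.
pages = {
    "a": "jaguar is a big cat",
    "b": "the jaguar car is fast",
    "c": "a cat scan is a medical test",
}
index = build_index(pages)
print(sorted(search(index, "jaguar cat")))
```

The index is built once (the "indexing" stage) and then reused for every query, which is why lookups stay cheap even when the page collection is large.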
Basic search engine
- Different ranking systems are used
  - Most use heuristics (the easiest solution), such as counting the number of times the keywords appear
  - Google uses PageRank
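The keyword-counting heuristic can be sketched as below. This is only an illustration under simple assumptions (whitespace tokenizing, ties broken by page id); the sample pages and the names `keyword_score` and `rank` are hypothetical and do not describe any particular engine.

```python
def keyword_score(text, query):
    """Count how many times the query terms occur in the text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

def rank(pages, query):
    """Return ids of matching pages, highest keyword count first."""
    scored = [(keyword_score(text, query), page_id)
              for page_id, text in pages.items()]
    matching = [(score, page_id) for score, page_id in scored if score > 0]
    # Sort by descending score; break ties alphabetically by page id.
    matching.sort(key=lambda pair: (-pair[0], pair[1]))
    return [page_id for _, page_id in matching]

# Hypothetical pages for illustration.
pages = {
    "a": "car car car for sale",
    "b": "used car dealer",
    "c": "gardening tips",
}
print(rank(pages, "car"))
```

A page that simply repeats a keyword wins under this scheme, which is one reason link-based rankings such as PageRank were attractive.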
Basic search engine
- No idea of the searcher’s intent, so the “best” result is hard to achieve
- Problems with synonymy and polysemy
  - Synonymy ex.: car and automobile
  - Polysemy ex.: jaguar
- One solution: store semantic relations
  - Can only help with synonymy
- Can’t identify concepts or author intent
  - ex. IBM site does not say “computer”
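A small sketch of what "store semantic relations" could look like: a hand-built synonym table used to widen the query. The table `SYNONYMS` and the function `expand_query` are assumptions for illustration, not a description of how any engine stores such relations.

```python
# Hypothetical hand-built table of semantic relations (illustration only).
SYNONYMS = {
    "car": {"automobile"},
    "automobile": {"car"},
}

def expand_query(query):
    """Return the query words plus any stored synonyms."""
    terms = set()
    for word in query.lower().split():
        terms.add(word)
        terms |= SYNONYMS.get(word, set())
    return terms

print(sorted(expand_query("car")))     # synonymy: "automobile" is added
print(sorted(expand_query("jaguar")))  # polysemy: still one ambiguous term
```

Expansion lets a search for "car" also match pages that only say "automobile", but it does nothing to tell the animal "jaguar" from the car.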
Cluster search engine
- Example: Clusty
- Clusters results into categories/themes
- Can show results that would be ranked lower in another search engine
  - Because words have different meanings, it can surface the less searched-for ones
Meta-search engine
- Examples: Dogpile, Surfwax, Copernic, etc.
- Sends the searcher’s query to a database of search engines
- Claimed to be no better than its database; the referenced search engines are often small, free, or commercial
- Users can create their own on Google, with up to 5,000 URLs as the “database”
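A rough sketch of the meta-search idea: pass one query to several engines and merge the ranked lists. The two engines here are hypothetical stub functions with fixed results, and the merge rule (summing 1/position) is just one simple choice made for this example, not the method used by Dogpile or any other named service.

```python
from collections import defaultdict

# Hypothetical stand-ins for real search engines; each returns a ranked list.
def engine_a(query):
    return ["x.example", "y.example", "z.example"]

def engine_b(query):
    return ["y.example", "w.example"]

def meta_search(query, engines):
    """Send the query to every engine and merge the ranked results."""
    scores = defaultdict(float)
    for engine in engines:
        for position, url in enumerate(engine(query), start=1):
            # A result near the top of any list earns a larger share.
            scores[url] += 1.0 / position
    # Highest combined score first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

print(meta_search("jaguar", [engine_a, engine_b]))
```

The merged list can only contain what the underlying engines return, which matches the point that a meta-search engine is no better than its database.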
“Smarter” meta-search engine
- Example: Clever project (not yet available online)
- Includes clustering and linguistic analysis
- ex. “cat”
  - Cat - feline
  - Cat - power
  - Cat - equipment
  - Cat - scans
  - etc.
The Clever Project
- Uses hyperlinks to locate hubs and authorities
- “a respected authority is a page that is referred to by many good hubs; a useful hub is a location that points to many valuable authorities”
The Clever Project
- Obtains a list of webpages from a standard index and follows hyperlinks to enlarge its own database
  - Resulting collection = “root set”
  - Each page gets a numerical hub score and authority score
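The hub and authority scores can be sketched with the usual mutual-reinforcement loop: a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to. This is a simplified illustration, not the Clever project's actual code; the link graph, the function name `hub_authority_scores`, and the fixed 50 iterations are assumptions.

```python
import math

def hub_authority_scores(links, iterations=50):
    """Iteratively compute hub and authority scores for a small link graph.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}  # initial guess: every page is equal
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that link to this page.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub: sum of authority scores of the pages this page links to.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        # Normalize so the scores do not grow without bound.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical root set: two good hubs and one page with a single link.
links = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1", "auth2"],
    "lonely": ["auth2"],
}
hub, auth = hub_authority_scores(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In this toy graph the page linked from the most hubs ends up with the highest authority score, and the pages that point to both authorities outrank the page that points to only one.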
The Clever Project
- Similar to PageRank in how scores are determined: initial guesses refined by repeated calculation
  - Useful by-product: clusters sites
- Can bring competing sites together, because competitors don’t have to acknowledge each other through hyperlinks
Clever vs. Google
GOOGLE
- Gives initial rankings
- Keeps pages independent of queries
- Faster
- Looks forward “link to link”
CLEVER
- Root sets per keyword
- Page priority through query context
- Looks forwards & backwards “hub and authority”
- Sometimes too broad, ex. Fallingwater
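For comparison, a simplified PageRank-style calculation is sketched below. It scores pages once from the link graph alone, with no query involved, which is the "independent of queries" point above. The damping factor of 0.85, the iteration count, the even spreading of rank from pages without outgoing links, and the tiny graph are all assumptions for this illustration, not details taken from the slides or from Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated recalculation from an equal start.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # initial guess: equal rank
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass rank forward along each outgoing link.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical three-page web.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
rank = pagerank(links)
print(sorted(rank, key=rank.get, reverse=True))
```

Rank only flows forward along links here, whereas the hub and authority calculation uses links in both directions and is run on a root set built for each query.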