Search Engine Interfaces
Search engine modus operandi
The basics: what’s a search engine?
Search engines are special websites designed to find information stored on other sites.
Most have the following capabilities:
  Search the Internet based on important words
  Keep an index of the words they find and where they were found
  Allow users to look for words or combinations of words in that index
There’s a lot of sites out there….
Indeed (thousands upon thousands nowadays)
The first search engine was Archie (“archive” without the “v”), which indexed FTP archives.
Later, after the rise of Gopher, came:
  Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives)
  Jughead (Jonzy’s Universal Gopher Hierarchy Excavation And Display)
There’s a lot of sites out there….
Wandex: first search engine for the Web
WebCrawler: let users search for any word in any page (revolutionary then, standard now)
Lycos (Carnegie Mellon University)
Many others came after: Excite, Infoseek, Inktomi, Northern Light, AltaVista, Yahoo!…
Google launched in 1998 and rose to popularity around 2000 because of its innovative PageRank system
How does it work? The pieces of a search engine
A ‘spider’ or ‘crawler’
  Software “robots” that go out and visit pages on the web and build lists of the words they find on each page
An index
  The data (words) gathered are indexed (by a method determined by the particular search engine)
A search
  Usually accompanied by Boolean logic
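The three pieces above can be sketched in a few lines. This is a minimal illustration, not how any particular engine is built: the `PAGES` dictionary is a hypothetical stand-in for what a spider would have fetched, and the function names are made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for pages a spider has already fetched (URL -> text).
PAGES = {
    "http://example.com/a": "Search engines index the web",
    "http://example.com/b": "A spider crawls the web and collects words",
    "http://example.com/c": "Boolean logic combines search terms",
}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """The index: map each word to the set of URLs where it was found."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search_and(index, *words):
    """Pages that contain every one of the words (Boolean AND)."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

def search_or(index, *words):
    """Pages that contain at least one of the words (Boolean OR)."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.union(*sets) if sets else set()

def search_not(index, include, exclude):
    """Pages that contain `include` but not `exclude` (Boolean NOT)."""
    return index.get(include.lower(), set()) - index.get(exclude.lower(), set())

index = build_index(PAGES)
print(search_and(index, "search", "web"))
print(search_or(index, "spider", "boolean"))
print(search_not(index, "web", "spider"))
```

Mapping words to pages (rather than pages to words) is what makes the Boolean operators cheap: each one becomes a set operation on the lists of pages for the query words.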
Example: Google
Claim to fame: the PageRank system
Uses multiple spiders (initially 3 at once)
Spiders take note of:
  Words on the page
  Where they were found
The index consists of every “significant” word on each page
  Google excludes the articles ‘a’, ‘an’, and ‘the’
Each indexed page is weighted according to the PageRank system (a link analysis algorithm that assigns a numerical weight)
Searching
  When a user performs a search, Google retrieves from its index all of the pages that contain those keywords AND sorts them according to their assigned ‘PageRank’
  Ideally the first several sites listed will match your search criteria
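To make "a link analysis algorithm that assigns a numerical weight" concrete, here is a small sketch of the commonly published form of PageRank, computed by repeated iteration. It is not Google's production code: the `LINKS` graph is made up, and the damping factor of 0.85 and the iteration count are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Assign each page a weight based on the pages that link to it.

    `links` maps a page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline weight.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its weight evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its weight over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(LINKS)

# Sort the pages that matched a query by their weight, highest first.
hits = ["d", "a", "c"]
print(sorted(hits, key=ranks.get, reverse=True))
```

In this graph, page "c" receives links from three other pages and ends up with the largest weight, while "d", which nothing links to, keeps only the baseline.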
Example: Ask (formerly AskJeeves)
Claim to fame: the ExpertRank algorithm (formerly Teoma)
Uses multiple spiders
Spiders take note of words on the page and where they were found (same as Google)
The index consists of every “significant” word on each page
Uses link analysis like Google
Each page is then also analyzed to determine its popularity among pages that are considered “experts” on the topic of the search. This is called subject-specific popularity.
Searching: natural language search (or subject-specific search)
  When a user performs a search, Ask finds the keywords in its index, figures out the topics (known as ‘clusters’) and the experts on those topics, and then finds the most popular results among those experts
  This leads to a unique “editorial flavor” to searching
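The idea of subject-specific popularity can be illustrated with a toy example. This is only a sketch of the concept as described on this slide, not Ask's actual ExpertRank algorithm: it assumes the topic cluster and its "expert" pages have already been identified, and all page names and links below are hypothetical.

```python
def subject_specific_popularity(candidates, links, experts):
    """Order candidate pages by how many expert pages link to them.

    Links from pages outside the expert set are ignored entirely.
    """
    scores = {page: 0 for page in candidates}
    for expert in experts:
        for target in set(links.get(expert, [])):
            if target in scores:
                scores[target] += 1
    # Highest score first; ties are broken alphabetically.
    return sorted(candidates, key=lambda page: (-scores[page], page))

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "expert1": ["p1", "p2"],
    "expert2": ["p2"],
    "random1": ["p3"],
    "random2": ["p3"],
    "random3": ["p3"],
}

# Pages that matched the keywords, and the assumed experts on the topic.
candidates = ["p1", "p2", "p3"]
experts = {"expert1", "expert2"}

print(subject_specific_popularity(candidates, LINKS, experts))
```

Page "p3" has the most incoming links overall, but none of them come from experts on the topic, so it is listed last.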
Notable others: AltaVista and Lycos
The AltaVista search engine indexes every word on the page, even insignificant articles such as ‘a’, ‘an’, and ‘the’.
The Lycos search engine “is said to” index around 100 of the most frequently used words on the page as well as each word in the first 20 lines of text.
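The difference between the two indexing strategies can be sketched as two ways of choosing which words of a page go into the index. The function names are made up, and the selective version only follows the rough description above ("around 100 of the most frequently used words" plus "the first 20 lines"); it is not either engine's real code.

```python
import re
from collections import Counter

def words(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def index_terms_all(text):
    """AltaVista-style: every word on the page is indexed."""
    return set(words(text))

def index_terms_selective(text, top_n=100, first_lines=20):
    """Lycos-style, as described: the most frequent words plus
    every word in the first few lines of text."""
    frequent = {word for word, _ in Counter(words(text)).most_common(top_n)}
    head = "\n".join(text.splitlines()[:first_lines])
    return frequent | set(words(head))

page = "the cat sat\nthe dog ran\nthe bird flew"

print(sorted(index_terms_all(page)))
# Small limits are used here only so the effect is visible on a tiny page.
print(sorted(index_terms_selective(page, top_n=1, first_lines=1)))
```

The selective strategy keeps the index smaller, at the cost of missing words that appear only once, deep in the page.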
So many options…
Google is the most used search engine on the Internet today (around 50% of queries go through it).
However, there are more efficient ways to search…
Ask.com’s subject-specific searching much better reflects the way the Web is set up (in subject-specific clusters).
However, because of the complexity of its algorithm, the search results it produced were inferior to those of competitors such as Google with its PageRank system.
Only recently has Ask begun to cut into the search engine market share (way behind Google, Yahoo, and MSN) by reducing how closely the results must match the keywords (reduced from 100% to about 95%).
This yields more search results and puts Ask in a better position to compete for market share.
By the numbers….
Below: Popularity (as of 12/07)
Right: Timeline of major launches
Search engines of the future….
Two types of searching: navigational search and research search
  Navigational search: the user uses the search engine as a tool to navigate to a particular intended document
  Research search: the user provides the search engine with a phrase intended to denote an object about which the user is trying to gather information
Rather than use ranking algorithms such as Google's PageRank to predict relevancy, semantic search uses semantics, the science of meaning in language, to produce highly relevant search results.
The goal is to deliver the information queried by a user rather than have a user sort through a list of loosely related keyword results.
Semantic Searching
Contingent upon correct semantic markup and upon searching over richly structured data (e.g. XML and RDF)
The goal is to deliver the information queried by the user rather than have a user sort through a list of loosely related keyword results.
Examples: and
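To show what "searching over richly structured data" might look like, here is a toy sketch that stores facts as RDF-style (subject, predicate, object) triples and answers a question by pattern matching. It is an illustration only: real systems would use an RDF store and a query language, and the predicate names and the `match` helper are invented for this example.

```python
# Facts as (subject, predicate, object) triples, in the spirit of RDF.
# The predicate names are made up for this sketch.
TRIPLES = [
    ("Google", "type", "SearchEngine"),
    ("Google", "rankingAlgorithm", "PageRank"),
    ("Ask", "type", "SearchEngine"),
    ("Ask", "rankingAlgorithm", "ExpertRank"),
    ("PageRank", "type", "LinkAnalysisAlgorithm"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# "Which ranking algorithm does Ask use?" returns the fact itself,
# not a list of pages that happen to mention the keywords.
answer = [o for _, _, o in match(TRIPLES, subject="Ask", predicate="rankingAlgorithm")]
print(answer)

# "Which things are search engines?"
engines = [s for s, _, _ in match(TRIPLES, predicate="type", obj="SearchEngine")]
print(engines)
```

Because the data says what each value means, the query can ask for a relationship directly, which is only possible when the markup is correct in the first place.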