Unit 3: Web Search Engines
Can You Find the Answers?
- Connect to Google
- Search for items on Iran. Records: ________
- Combine Iran with nuclear weapons. Records: ________
- Combine Iran with the phrase “nuclear weapons”. Records: ________
- Use Advanced Search:
  - Combine Iran with the phrase “nuclear weapons” so that all the words appear in the title of documents. Records: ________
Unit 3: Web Search Engines
- How People Search on the Web
- What Are Search Engines?
- How Search Engines Work
- What’s in Search Engines?
- How to Find Search Engines
- Search Basics
Three Ways People Search
- Surf
  - No clear direction, idea, or issue
  - Consult people, news, magazines, and the Web for ideas
- Browse
  - Have some idea, but it is vague, flexible, or unclear
  - Consult reference sources and Web directories for a direction, topic, or theme
- Search in depth
  - Have a defined topic with a narrow focus
  - Consult databases, search engines, and metasearch engines
What Are Search Engines?
- Software
  - Captures web sites and pages
  - Indexes the full text of web pages
  - Provides an interface for searching web pages
- Database
  - Large: billions of pages (unlike directories)
  - Built by computer programs (robots, spiders)
  - No selectivity, no evaluation
How Search Engines Work
- Spiders comb and “capture” web pages
- Software builds a database
- Words from web pages are “indexed”
- A search interface finds words on pages
- The engine ranks and describes results
- How engines and directories differ
Spiders Comb, Capture Web Pages
- Software decides which web pages to collect
- Spiders check for updated pages
- Spiders remove dead sites
Spider Software Builds Database
- Current web size: over 15 billion pages
- No engine’s database covers it all
  - Google covers 22% (3.3 billion+)
  - AlltheWeb covers 21% (3.2 billion+)
  - HotBot (Inktomi) covers 20% (3 billion+)
  - Teoma covers 10% (1.5 billion)
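The percentages above are each engine's page count divided by the estimated 15 billion pages on the web. A short Python check of that arithmetic, using only the figures on the slide:

```python
# Figures from the slide: pages indexed per engine, in billions,
# against an estimated web size of 15 billion pages.
WEB_SIZE_BILLIONS = 15.0

INDEXED_BILLIONS = {
    "Google": 3.3,
    "AlltheWeb": 3.2,
    "HotBot (Inktomi)": 3.0,
    "Teoma": 1.5,
}


def coverage_percent(indexed: float, web_size: float = WEB_SIZE_BILLIONS) -> float:
    """Return the share of the web an engine's database covers, as a percentage."""
    return 100.0 * indexed / web_size


for engine, size in INDEXED_BILLIONS.items():
    # Round to whole percentages, as on the slide.
    print(f"{engine}: {coverage_percent(size):.0f}%")
```

Engine databases can overlap, so these percentages cannot simply be added to estimate how much of the web all engines cover together.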
Words from Web Pages “Indexed”
- “Index” means creating lists of words for the database and linking each word to the web pages that contain it
- Some engines index the full text of a document
- Some index only part of the text
  - The first 100 words of the document
  - Words in the abstract or title of the document
- How an engine indexes pages affects search results
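A minimal Python sketch of the word lists described above (commonly called an inverted index). The page URLs and texts are invented, and the `max_words` parameter only imitates the "first 100 words" style of partial indexing; it is not taken from any real engine.

```python
import re
from collections import defaultdict
from typing import Dict, Optional, Set


def build_index(pages: Dict[str, str], max_words: Optional[int] = None) -> Dict[str, Set[str]]:
    """Map each word to the set of page URLs that contain it.

    max_words=None indexes the full text; a number such as 100 indexes only
    the first max_words words of each page.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        if max_words is not None:
            words = words[:max_words]
        for word in words:
            index[word].add(url)
    return dict(index)


# Hypothetical pages, used only for illustration.
pages = {
    "page1.example": "Russian mafia activity in Europe and the role of organized crime",
    "page2.example": "Travel guide to Russia with notes on crime and safety",
}

full = build_index(pages)
partial = build_index(pages, max_words=3)
print(sorted(full["crime"]))             # ['page1.example', 'page2.example']
print(sorted(partial.get("crime", [])))  # [] : "crime" falls after the first 3 words
```

A word that lies beyond the indexed portion of a page cannot retrieve that page, which is one way indexing choices affect search results.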
Search Interface Finds Web Pages
- Provides a keyword search box
- Offers simple or advanced searching
- Offers search options that affect results:
  - Most assume AND between words: Russian mafia
  - Most accept quotation marks to search a PHRASE: “Russian mafia”
  - Most allow FIELD searches: ti:Russian mafia
- AlltheWeb
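To act on these options, an engine must first split the query into plain words, phrases, and field restrictions. The sketch below assumes a simple, hypothetical grammar (quoted phrases, `field:` prefixes, implied AND); the name `parse_query` is illustrative, and real engines each define their own syntax.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical query grammar for this sketch; each engine documents its own:
#   "two words"   -> phrase search
#   field:word    -> one word restricted to a field (e.g. ti:, url:)
#   field:"a b"   -> a phrase restricted to a field
#   word          -> plain keyword; keywords are combined with an implied AND
TOKEN = re.compile(r'(?:(\w+):)?(?:"([^"]+)"|(\S+))')


def parse_query(query: str) -> List[Tuple[Optional[str], str, bool]]:
    """Split a query into (field, text, is_phrase) parts."""
    parts = []
    for field, phrase, word in TOKEN.findall(query):
        if phrase:
            parts.append((field or None, phrase.lower(), True))
        else:
            parts.append((field or None, word.lower(), False))
    return parts


print(parse_query('ti:Russian mafia "drug abuse"'))
# [('ti', 'russian', False), (None, 'mafia', False), (None, 'drug abuse', True)]
```

In this grammar, ti:Russian mafia limits only "Russian" to the title, while ti:"Russian mafia" limits the whole phrase. Engines differ on such details, which is one reason to read each engine's help pages.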
Engine Ranks, Describes Results
- Software lists the most “relevant” items first
  - Word popularity: how often words repeat, and where they appear
  - Site popularity: how often the web site is visited
  - Link popularity: how often other pages link to it
- Results described
  - From a few words to a paragraph
  - Sometimes stars or other indicators of relevancy
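One way to combine the three kinds of popularity into a single score is sketched below. The weights, the `Page` fields, and the use of a logarithm are all assumptions made for illustration; engines do not publish their ranking formulas, and this is not any engine's actual method.

```python
import math
import re
from dataclasses import dataclass
from typing import List

# Hypothetical weights: real engines do not publish their ranking formulas.
TITLE_WEIGHT = 3.0   # a word in the title counts more than one in the body
VISIT_WEIGHT = 0.5   # site popularity
LINK_WEIGHT = 1.0    # link popularity


@dataclass
class Page:
    url: str
    title: str
    body: str
    monthly_visits: int  # stand-in for "site popularity"
    inbound_links: int   # stand-in for "link popularity"


def words(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(page: Page, terms: List[str]) -> float:
    title_words, body_words = words(page.title), words(page.body)
    # Word popularity: repetitions of each term, with extra credit for the title.
    word_score = sum(body_words.count(t) + TITLE_WEIGHT * title_words.count(t) for t in terms)
    # Site and link popularity, damped by log1p so very popular pages
    # do not completely outweigh the word score.
    popularity = (VISIT_WEIGHT * math.log1p(page.monthly_visits)
                  + LINK_WEIGHT * math.log1p(page.inbound_links))
    return word_score + popularity


def rank(pages: List[Page], query: str) -> List[Page]:
    """Return pages with the most 'relevant' first."""
    terms = words(query)
    return sorted(pages, key=lambda p: relevance(p, terms), reverse=True)


pages = [
    Page("b.example", "Travel in Moscow", "A short note on the Russian mafia.", 1000, 50),
    Page("a.example", "Russian mafia history", "The Russian mafia grew after 1991.", 100, 5),
]
print([p.url for p in rank(pages, "Russian mafia")])  # ['a.example', 'b.example']
```

Here the page with both words in its title outranks a more popular page that mentions them only once, because the word score dominates with these particular weights.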
How Engines and Directories Differ
- Computers vs. people
  - Engine spiders, not editors, select documents
- Quantity vs. quality
  - Engines are big: they want everything and accept anything
  - Directories are small: they want the “best,” most “important” sites
- Technology vs. human judgment
  - Engine software ranks results; there is no human evaluation
Top Search Engines (pages indexed)
- Google: 3.3 billion+
- AlltheWeb: 3.2 billion+
- HotBot (Inktomi): 3 billion+
- Teoma: 1.5 billion+
Directories, Search Engines and Defaults
- If directories find little, they default to engines
  - Yahoo defaults to Google
  - Open Directory defaults to Google
  - LookSmart defaults to Inktomi (HotBot)
- Some search engines borrow directories
  - Google uses the “Open Directory”
- Learn the source of the information when using a directory’s search box or a search engine’s directory
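The default behavior can be pictured as a simple fallback. In this hypothetical sketch, the directory and engine are stand-in functions, the threshold of 5 hits is an assumption, and engine results replace directory results; real services may mix them. Returning the source's name alongside the hits mirrors the advice to learn where results come from.

```python
from typing import Callable, List, Tuple

SearchFn = Callable[[str], List[str]]

# Hypothetical threshold: the slide does not say how few hits trigger the fallback.
MIN_DIRECTORY_HITS = 5


def directory_search(query: str, directory: SearchFn, engine: SearchFn,
                     min_hits: int = MIN_DIRECTORY_HITS) -> Tuple[str, List[str]]:
    """Search the directory first; if it finds too little, default to the engine.

    Returns the name of the source actually used along with its results.
    """
    hits = directory(query)
    if len(hits) >= min_hits:
        return "directory", hits
    return "engine", engine(query)


# Stand-in search functions, used only for illustration.
def tiny_directory(query: str) -> List[str]:
    return ["dir-result-1"]


def big_engine(query: str) -> List[str]:
    return [f"engine-result-{i}" for i in range(1, 4)]


print(directory_search("russian mafia", tiny_directory, big_engine))
# ('engine', ['engine-result-1', 'engine-result-2', 'engine-result-3'])
```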
Metasearch Engines
- Technologies that search several search engines at the same time
Pros
- Increase results when a search engine produces little
- Save time by searching several engines at once
- Show results of several engines on one page
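Showing several engines' results on one page requires merging their lists and removing duplicates. A minimal Python sketch, assuming round-robin interleaving (one simple strategy, not the documented method of any metasearch engine) and hypothetical engine names and URLs:

```python
from itertools import zip_longest
from typing import Dict, List, Tuple


def merge_results(results_by_engine: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Interleave each engine's ranked URLs round-robin, dropping duplicates.

    Returns (url, engines_that_found_it) pairs in merged order.
    """
    merged: Dict[str, List[str]] = {}  # dicts keep insertion order (Python 3.7+)
    engines = list(results_by_engine)
    for same_rank in zip_longest(*results_by_engine.values()):
        for engine, url in zip(engines, same_rank):
            if url is not None:
                merged.setdefault(url, []).append(engine)
    return list(merged.items())


# Hypothetical result lists from two engines.
results = {
    "EngineA": ["u1", "u2", "u3"],
    "EngineB": ["u2", "u4"],
}
for url, found_by in merge_results(results):
    print(url, found_by)
```

Recording which engines found each URL also shows overlap between engines: a URL returned by several engines appears only once in the merged list.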
Cons
- Retrieve too many hits
- Retrieve less relevant results
  - Do not individualize search syntax for each engine
    - Do not know which form to use (and, AND, +, OR, or); cannot interpret phrases, etc.
- Exclude certain large engines, like Google
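The syntax problem arises because each engine may expect a different form for the same request. The sketch below shows what individualizing a query per engine could look like; the engines `EngineA`, `EngineB`, and `EngineC` and their rules are hypothetical, chosen to match the AND, +, and implied-AND styles listed on the slide.

```python
from typing import Dict, List

# Hypothetical engines and rules; real engines document their own syntax.
ENGINE_SYNTAX: Dict[str, Dict[str, str]] = {
    "EngineA": {"join": " AND ", "prefix": ""},  # needs upper-case AND
    "EngineB": {"join": " ", "prefix": "+"},     # marks required words with +
    "EngineC": {"join": " ", "prefix": ""},      # assumes AND between words
}


def translate(terms: List[str], engine: str) -> str:
    """Express 'all of these terms' in one engine's syntax; multiword terms become quoted phrases."""
    rules = ENGINE_SYNTAX[engine]
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return rules["join"].join(rules["prefix"] + t for t in quoted)


for name in ENGINE_SYNTAX:
    print(name, translate(["Iran", "nuclear weapons"], name))
```

A metasearch engine that instead sent one unchanged query string to all three would be read differently by each, which is one source of less relevant results.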
Top Metasearch Engines
- Vivisimo
  - Categorizes results, narrows topics
- Ez2find
  - Includes most major engines
- Dogpile
  - Refines results, covers major engines
What’s in Search Engines?
- Business and commercial information
- Organizational publications
- Government resources
- Magazine and newspaper excerpts
- Some scholarly information
  - Teaching materials, unpublished articles
- Books and articles whose copyright has expired
What’s Not in Search Engines?
- Books under copyright
  - Most fiction and nonfiction in existence
- Journal, magazine, and newspaper articles
  - Most current and past research
- Reference books
  - Most recent, quality publications
- In short
  - The bulk of human knowledge and research
How to Find Search Engines
- Word of mouth, hearsay
- Newspaper and magazine articles
- Library web pages
  - Guides to search engines
    - HyperResearch
Search Basics
- Identify and select keywords
  - Topic: Effects of internet use on children
    - Keywords: Internet, children, effects
- Combine keywords to focus results
  - Use OR, AND
  - Use phrase searching
  - Limit the search to a field such as title or URL
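Picking keywords from a topic sentence largely means dropping words that carry little meaning. A small Python sketch; the stopword list is an assumption for this example (it includes "use" because that word is too general here) and is not any engine's actual list.

```python
import re
from typing import List

# Hypothetical list of words that add little to a search; "use" is included
# because it is too general to help with this topic.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "use", "with"}


def keywords(topic: str) -> List[str]:
    """Turn a topic sentence into a list of search keywords."""
    words = re.findall(r"[a-z0-9]+", topic.lower())
    return [w for w in words if w not in STOPWORDS]


print(keywords("Effects of internet use on children"))  # ['effects', 'internet', 'children']
```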
OR Broadens
- Retrieves an article if it contains either keyword
- Use to connect similar words
- Use to increase results
OR Expands Results
- Internet: 15
- Internet OR Web: 50
- Internet OR Web OR digital: 90
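OR corresponds to a set union: a record is retrieved if it appears in the list for any keyword. The sketch below uses small, made-up postings lists rather than the slide's counts.

```python
from typing import Dict, Set

# Hypothetical postings: the IDs of the records that contain each word.
POSTINGS: Dict[str, Set[int]] = {
    "internet": {1, 2, 3},
    "web": {3, 4, 5},
    "digital": {5, 6},
}


def search_or(*terms: str) -> Set[int]:
    """Return records containing ANY of the terms (set union)."""
    result: Set[int] = set()
    for term in terms:
        result |= POSTINGS.get(term.lower(), set())
    return result


print(len(search_or("Internet")))                    # 3
print(len(search_or("Internet", "Web")))             # 5
print(len(search_or("Internet", "Web", "digital")))  # 6
```

Record 3 contains both "internet" and "web" but is counted once, so OR results grow by less than the sum of the individual counts whenever records overlap.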
AND Narrows
- Use to connect two different ideas
- AND between keywords means both terms must be in the record
- Use to decrease results
AND Reduces Results
- Children: 2,956,000
- Children AND Internet: 1,756
- Children AND Internet AND Homework: 26
  - Or simply: Children internet homework (where the engine assumes AND)
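AND corresponds to a set intersection: a record is retrieved only if it appears in the list for every keyword. Again the postings are made up; the illustrative function `search_default_and` imitates an engine that assumes AND between words, so typing the words alone gives the same result as joining them with AND.

```python
from typing import Dict, Set

# Hypothetical postings: the IDs of the records that contain each word.
POSTINGS: Dict[str, Set[int]] = {
    "children": {1, 2, 3, 4, 5, 6},
    "internet": {2, 4, 6, 8},
    "homework": {4, 9},
}


def search_and(*terms: str) -> Set[int]:
    """Return records containing ALL of the terms (set intersection)."""
    if not terms:
        return set()
    return set.intersection(*(POSTINGS.get(t.lower(), set()) for t in terms))


def search_default_and(query: str) -> Set[int]:
    """Imitate an engine that assumes AND between words typed without operators."""
    return search_and(*query.split())


print(len(search_and("Children")))                          # 6
print(len(search_and("Children", "Internet")))              # 3
print(len(search_and("Children", "Internet", "Homework")))  # 1
print(search_default_and("children internet homework"))     # {4}
```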
AND, OR and Search Engine Syntax
- Use Help or Tips
  - Which forms work: AND, and, OR, or, “+”, “-”?
  - Does the engine default to AND or OR?
  - Do AND and OR have to be upper case?
  - Use ADVANCED SEARCH to learn the options
- Is there a pull-down menu box?
  - AND can mean “All the words”
  - OR can mean “Any of the words”
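Why case can matter: the hypothetical engine sketched below ANDs words by default and recognizes only upper-case OR as an operator, so a lower-case "or" is searched as an ordinary word. This behavior is an assumption for illustration; check each engine's help pages for its actual rules.

```python
from typing import Dict, List, Set

# Hypothetical postings; note that the word "or" itself occurs in record 2.
POSTINGS: Dict[str, Set[int]] = {
    "cats": {1, 2},
    "dogs": {2, 3},
    "or": {2},
}


def evaluate(query: str) -> Set[int]:
    """Hypothetical engine: words are ANDed by default, and only upper-case OR
    is treated as an operator; lower-case "or" is searched as an ordinary word."""
    groups: List[Set[int]] = []  # each group holds OR-ed alternatives
    pending_or = False
    for token in query.split():
        if token == "OR":
            pending_or = True
            continue
        docs = POSTINGS.get(token.lower(), set())
        if pending_or and groups:
            groups[-1] = groups[-1] | docs  # widen the previous group
        else:
            groups.append(set(docs))        # start a new ANDed group
        pending_or = False
    return set.intersection(*groups) if groups else set()


print(evaluate("cats OR dogs"))  # {1, 2, 3}
print(evaluate("cats or dogs"))  # {2}
```

Typed in lower case, the same three words narrow the search to the one record that contains all of them.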
Phrase Searching
- Two words in consecutive order
  - Examples: juvenile delinquency, Russian mafia
- How does the computer recognize a “phrase”?
  - Pull-down menu: EXACT PHRASE
  - Quotation marks: “drug abuse”
- Phrase searches reduce hits and improve relevancy
  - Russian mafia: 23,234
  - “Russian mafia”: 789
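A phrase search checks that the words appear next to each other and in order, while the default search only checks that each word appears somewhere. A Python sketch with two invented pages; the function names are illustrative:

```python
import re
from typing import List


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def has_all_words(text: str, query: str) -> bool:
    """Default search: every word appears somewhere, in any order."""
    page_words = set(tokens(text))
    return all(w in page_words for w in tokens(query))


def has_phrase(text: str, phrase: str) -> bool:
    """Phrase search: the words appear next to each other, in this order."""
    page_words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(page_words[i:i + n] == target for i in range(len(page_words) - n + 1))


# Hypothetical pages, used only for illustration.
docs = [
    "The Russian mafia expanded in the 1990s.",
    "A Russian official spoke about the Italian mafia.",
]
print([has_all_words(d, "Russian mafia") for d in docs])  # [True, True]
print([has_phrase(d, "Russian mafia") for d in docs])     # [True, False]
```

The second page contains both words but not the phrase, which is why phrase searching reduces hits and tends to improve relevancy.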
Field Searching
- Common document “fields”
  - Author, Title, Subject, Abstract, Text, URL
- Limits the search to words in particular fields
  - Learn the syntax: title: ti: url:
    - ti:Russian mafia
    - url:russianmafia
  - Use ADVANCED SEARCH
  - Use the pull-down menu (in the title, in the URL)
- Reduces hits, improves relevancy
  - Russian mafia (all the words) = 23,234
  - Russian mafia (in title) = 254
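Field searching restricts the match to one part of each document. In the sketch below, documents are dictionaries with hypothetical `title`, `url`, and `text` fields, and `search` is an illustrative function, not an engine's API; with `field=None` all fields are searched together.

```python
import re
from typing import Dict, List, Optional, Set

Doc = Dict[str, str]


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(docs: List[Doc], query: str, field: Optional[str] = None) -> List[Doc]:
    """Return docs containing all query words, optionally only within one field."""
    wanted = tokens(query)
    hits = []
    for doc in docs:
        haystack = " ".join(doc.values()) if field is None else doc.get(field, "")
        if wanted <= tokens(haystack):  # subset test: every word is present
            hits.append(doc)
    return hits


# Hypothetical documents with title, URL, and text fields.
docs = [
    {"title": "Russian mafia in Europe",
     "url": "http://example.org/russianmafia",
     "text": "A history of organized crime."},
    {"title": "Organized crime report",
     "url": "http://example.org/report",
     "text": "The Russian mafia is one of several groups studied."},
]
print(len(search(docs, "Russian mafia")))                 # 2 (all the words)
print(len(search(docs, "Russian mafia", field="title")))  # 1 (in title)
print([d["url"] for d in search(docs, "russianmafia", field="url")])
```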
Can You Find the Answers?
- Connect to Google
- Search for items on Iran. Records: 11,00,000
- Combine Iran with nuclear weapons. Records: 790,000
- Combine Iran with the phrase “nuclear weapons”. Records: 428,000
- Use Advanced Search:
  - Combine Iran with the phrase “nuclear weapons” so that all the words appear in the title of documents. Records: 274
Homework
- Use major search engines
  - AlltheWeb, Google, Teoma
- Use a metasearch engine: Vivisimo
- Practice using AND, OR, phrase, and field searching