Published by Coleen Powell. Modified over 9 years ago.
1
Lecture Number Two Web Searching and How it Evolved
2
The web is basically a jungle!
3
What do you mean “jungle”? More than 2 billion pages of accessible info No consistent ID system (like _______ for books) No cataloguing principle (like __________ or _________________)
4
And furthermore… Many documents don’t name the author and you can’t tell how current the information is Most engines index every word so too much comes back And remember: You’re NOT searching live, you’re looking at a fixed database compiled before you searched
5
Two Ways to Find Things Search Engines (all electronic) Subject Directories (electronic search of human-maintained content)
6
Search Engines: electronic; index every page of a Web site
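As a rough illustration of what "indexing every page" can mean, the sketch below builds a word-to-page lookup table ahead of time and then answers queries from that table rather than from the live pages. The page addresses, page texts, and the names `build_index` and `lookup` are invented for this example; real engines are far more elaborate.

```python
from collections import defaultdict

# Hypothetical pages; addresses and texts are made up for illustration.
SAMPLE_PAGES = {
    "example.org/a": "the web is a jungle",
    "example.org/b": "search engines index the web",
}

def build_index(pages):
    """Map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

# The index is compiled once, before anyone searches.
INDEX = build_index(SAMPLE_PAGES)

def lookup(word):
    """Answer a query from the stored index, not from the live pages."""
    return sorted(INDEX.get(word.lower(), set()))
```

Because every word is indexed, a common word such as "web" returns every page that mentions it, which is one reason so much comes back from a search.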
7
3 Types of Search Engines -- global : search _______________ --subject-specific: search only within a defined area --meta-engines (aka _________) : - Combine results from several search engines, ranked by relevance
8
Examples of Search Engines Google Altavista Alltheweb
9
A History of Search Engines 1990 to the present
10
Precursors to Search Engines Before the internet there was…. 1969 New York Times Project
11
1990 ARCHIE- the first search engine _________ University, Alan Emtage called Archie because _______________ search term had to match exactly
12
1993 - EXCITE -__________ undergraduates -different from Archie because____________
13
1993 (cont) WORLD WIDE WEB WANDERER- first “bot” (robot) counted ________ and eventually _______ - bots evolved into Spiders catalogued links to make searchable index
14
Some Spiders JumpStation World Wide Web Worm RSBE ( __________________________) first to rank results by relevance
15
But spiders caused trouble Because they _______________________ Jump Station famous for that
16
1994 - Web Crawler - first to index TEXT on webpages (rather than just url/page title) - Yahoo! 2 Stanford undergrads favorite pages - the first __________ - Infoseek and Lycos (Lycos reputedly best for technical searches)
17
1995 ____________: (DEC)– December - got big fast -lots of firsts: natural language queries boolean techniques search tips
18
1996 Hot Bot -indexes up to _________ pages/day - until _________, the most powerful engine - has boasted it can index the entire web Metacrawler -- 1st meta-engine
19
1999 Google! first engine to pass a billion pages reports pages ranked by number of hits
20
Of these - Google is commercially dominant (about 75% of most websites’ external referrals) - Microsoft wants to buy it, but can’t because of antitrust laws - As of 2003, Yahoo owns Overture, Alltheweb, Alta Vista, and Inktomi
21
Privacy and Google - every time you access a page, you get a cookie on your hard drive, recording your IP address, the date/time, your search terms, your browser configuration - the cookies are basically “immortal” (expire ________)
22
How is this info used? Google customizes your search results using your IP number
23
The latest on google and privacy Google changed its privacy policy in July 2004 now they - pool the information they collect on you from all their various services. -may keep this information indefinitely -may give this information to whomever they wish.
24
if they "have a good faith belief that access, preservation or disclosure of such information is reasonably necessary to protect the rights, property or safety of Google, its users or the public."
25
Focus - Before 9/11, privacy issues turned on consumer protection - but now the government is thinking about looking at your information in the name of national security
26
TIA (Total Information Awareness) Goal: To anticipate terrorist activities What: credit card, travel, email, telephone records 2002 - Google chief declined comment when asked by the NY Times if Google had been subpoenaed to turn over the information it gathered
27
Subject Directories human-compiled and maintained (review: search engines are ______) index only home pages (review: search engines index______)
28
(Dis)advantages of Subject Directories Use hierarchies Smaller Content may be annotated But quality control varies
29
Virtual Libraries (some SDs) Created and maintained by info professionals Internet Public Library Resource Discovery Network (from Britain)
30
Subject Directory Approaches General - searching from one site Clearinghouses – searching from multiple sites
31
Examples of Subject Directories General - www.yahoo.com - www.looksmart.com Clearinghouses - Argus (www.clearinghouse.net) - About.com - Virtual Library (www.Vlib.org)
32
Search Tips Get specific by using Boolean Logic AND OR NOT (often ___ and ____)
33
A Boolean Example 1. Tupac Amaru AND Peru 2. Tupac Amaru OR MRTA (Movimiento Revolucionario Tupac Amaru) 3. Tupac Amaru NOT Shakur (the rap singer killed in 1996) To be exact, use quotes: “Tupac Amaru”
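The three queries on this slide can be pictured as set operations over the pages that match each term. The sketch below is only an illustration: the three page texts and the function names are invented, and each term is matched as an exact phrase, which is roughly what putting it in quotes asks for.

```python
# Hypothetical mini-collection; the page texts are invented for illustration.
BOOLEAN_PAGES = {
    "page1": "Tupac Amaru led a rebellion in Peru",
    "page2": "Tupac Amaru Shakur was a rap singer",
    "page3": "MRTA is a movement in Peru",
}

def matching(term):
    """Return ids of pages containing the term as an exact phrase (case-insensitive)."""
    term = term.lower()
    return {pid for pid, text in BOOLEAN_PAGES.items() if term in text.lower()}

def boolean_and(a, b):
    """Pages that contain both terms: narrows the search."""
    return matching(a) & matching(b)

def boolean_or(a, b):
    """Pages that contain either term: widens the search."""
    return matching(a) | matching(b)

def boolean_not(a, b):
    """Pages that contain the first term but not the second."""
    return matching(a) - matching(b)
```

With this sample data, `boolean_and("Tupac Amaru", "Peru")` keeps only page1, `boolean_or("Tupac Amaru", "MRTA")` returns all three pages, and `boolean_not("Tupac Amaru", "Shakur")` drops the page about the rap singer.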
34
More Search Tips Use Wildcards like * # ? for roots like psychol* for variant spellings like color colo*r
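To see how a wildcard pattern picks up roots and variant spellings, here is a small sketch using Python's standard `fnmatch` module. The word list and the name `wildcard_matches` are made up for the example. In `fnmatch`, `*` stands for any run of characters (including none) and `?` for exactly one character; `#` is not a wildcard there, and individual search engines may assign these symbols different meanings.

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary, invented for illustration.
WORDS = ["psychology", "psychologist", "psychic", "color", "colour", "colder"]

def wildcard_matches(pattern, words=WORDS):
    """Return the words that fit the wildcard pattern, in list order."""
    return [w for w in words if fnmatchcase(w, pattern)]
```

Here `wildcard_matches("psychol*")` finds both "psychology" and "psychologist", and `wildcard_matches("colo*r")` finds both "color" and "colour".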
35
More Search Tips Many urls are predictable- so guess first utampa.ed Don’t look at every returned page
36
Use your Tools Pay attention to the relevance rankings some engines give you Organize your bookmarks
37
The Invisible Web: What Most Search Engines don’t find Specialized databases (7,000+)
38
What’s a Specialized Database? Searchable indexes of subjects like email addresses, magazine archives, government data files, census info, medical info, etc. 2 types: full text and bibliographic
39
How is that different from a subject directory? Subject dir are collections of urls Specialized dbs are collections of actual data/information
40
Why they aren’t found - search engines are databases themselves; programming one database to search another is difficult - specialized databases often require search forms - databases don’t rely on fixed urls - text in databases is in a form not usable by search engines (like Adobe PDF)
41
What can you do? pick your search engine carefully google for instance lets you use the keyword database plus the subject you want
42
Some helpful sites Beaucoup Librarian's Index to the Internet Gary Price
43
Two kinds of web databases full text -- FindLaw (Yahoo) bibliographic -- Medline (Librarian's Index to the Internet)