Published by Grace Barker. Modified over 9 years ago.
1
“The Computer as an Educational Tool: Productivity and Problem Solving” ©Richard C. Forcier and Don E. Descy
2
Today
3
Why is this important?
You are going to want information…
*Reports, papers, presentations
*Medical, family, jobs, personal
Your students are going to want information…
*Reports, papers, presentations, personal
Most of what you want can’t be found using regular search techniques!
4
The Question
5
The Invisible Web Web sites that are hidden or are unable to be found or cataloged by regular search engines.
6
“Public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web.” (BrightPlanet, 2003)
7
“A full ninety-five per cent of the deep Web is publicly accessible information, not subject to fees or subscriptions.” (BrightPlanet, 2003)
8
The Invisible Web Facts
200,000+ Web sites
550 billion individual documents, compared to the three billion of the surface Web
Contains 7,500 terabytes of information, compared to nineteen terabytes in the surface Web
Total quality content is 1,000 to 2,000 times greater than that of the surface Web.
9
The Invisible Web Facts (2)
Sixty of the largest sites collectively contain over 750 terabytes of information; together they exceed the size of the surface Web forty times over.
Fastest growing category of new information on the Internet
Fifty percent greater monthly traffic than surface sites
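The size comparisons on the two "Facts" slides can be checked with simple division. The sketch below uses only the figures quoted on the slides (BrightPlanet, 2003); the variable names are illustrative and are not part of the original presentation.

```python
# Figures as quoted on the slides (BrightPlanet, 2003).
deep_docs = 550_000_000_000      # individual documents in the invisible Web
surface_docs = 3_000_000_000     # individual documents in the surface Web
deep_tb = 7_500                  # terabytes in the invisible Web
surface_tb = 19                  # terabytes in the surface Web
top60_tb = 750                   # terabytes held by the sixty largest sites

doc_ratio = deep_docs / surface_docs   # about 183 times as many documents
size_ratio = deep_tb / surface_tb      # about 395 times as much data
top60_ratio = top60_tb / surface_tb    # about 39.5, i.e. roughly "forty times"

print(f"documents: {doc_ratio:.0f}x, data: {size_ratio:.0f}x, "
      f"top 60 sites: {top60_ratio:.1f}x")
```

The data ratio of roughly 395 sits at the low end of the "400 to 550 times larger" range quoted earlier, and the sixty largest sites alone come to roughly forty times the surface Web.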
10
Invisible Web Facts (3)
More highly linked to than surface sites
Narrower, with deeper content, than conventional surface sites
More than half of the content resides in topic-specific databases
Content is highly relevant to every information need, market, and domain.
11
Invisible Web Facts (4)
Not well known to the Internet-searching public
12
Searching, Searching, Searching
Usually carried out using a “directory” or “search engine”
Fast and efficient
Misses most of what is out there
70% of searchers start from three sites (Nielsen, 2003): Google, Yahoo, and MSN.
13
Searching Tools Directories Search engines
14
Directories Hand selected, evaluated, annotated Broad topics work best. Quality over quantity Location on list: May be paid
15
How Directories Work
[Diagram: directory staff find a site on the Web/Internet, evaluate it, then catalog and add it to the directory server; the user reaches the directory index/information by browsing or searching.]
16
Directory Problems
Done by humans
Takes time
No universal categories or cataloging system
Misses most information/sites
17
General Subject Directories “Yahoo” Biggest and most famous Often useful Information… jobs… travel… shopping… to… Yahoo.com
19
Search Engines Computer generated Narrower topics Quantity over quality Uses newer retrieval technologies Location on list: May be paid Google, Hotbot, Northern Light, AltaVista, etc.
20
How Search Engines Work
[Diagram: spiders/robots comb the Web/Internet; a database stores each URL and its content; the user inputs a request; the search engine matches the request to the content.]
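The flow on this slide (spiders comb the Web, a database stores URL and content, the engine matches a request to the content) can be sketched in a few lines. This is a toy illustration, not how any real search engine is built: the in-memory TOY_WEB dictionary, the example URLs, and the function names crawl and search are all assumptions made for the example.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("deep web databases", ["b.example"]),
    "b.example": ("surface web search engines", ["a.example", "c.example"]),
    "c.example": ("library databases and catalogs", []),
    "d.example": ("unlinked page about databases", []),  # nothing links here
}

def crawl(seed):
    """Spider: follow links outward from the seed, storing URL -> content."""
    database = {}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url in database or url not in TOY_WEB:
            continue  # already stored, or the page does not exist
        text, links = TOY_WEB[url]
        database[url] = text
        queue.extend(links)
    return database

def search(database, request):
    """Return URLs whose stored content contains every word of the request."""
    words = request.lower().split()
    return sorted(
        url for url, text in database.items()
        if all(word in text.lower().split() for word in words)
    )

db = crawl("a.example")
results = search(db, "databases")
print(results)  # ['a.example', 'c.example']
```

The page at d.example also mentions databases, but no other page links to it, so the spider never stores it and the search cannot return it.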
21
Search Engine Problems
Spiders/robots don’t think.
More likely to index sites with more links to them (popularity)
More likely to index U.S. sites
More likely to index commercial sites
Sites pay for indexing/position.
22
At one time showed actual bid!
24
Finding Good Search Engines
UC-Berkeley: Recommended Search Engines: http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
UC-Berkeley: The Best Search Engines (9/2003):
#1 Google
#2 Teoma
#3 Vivisimo
#4 AllTheWeb
25
What do we miss? Library of Congress: 30 million+ documents ERIC databases Most daily newspapers Health and medical databases Museum and library collections The information you need?
26
Why are pages invisible? (1)
1. Searchable databases:
Typing is required.
Selection of option combination is required.
**Pages are not available until asked for (e.g., Library of Congress).
**Pages are not static but dynamic (may not exist until requested).
27
Why are pages invisible? (2)
Search engines can’t handle “dynamic pages.”
Search engines can’t handle “input boxes.”
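One way to picture the "input box" problem: a simple spider only collects ordinary links, so anything that sits behind a search form is never requested. The sketch below uses Python's standard html.parser module; the LinkSpider class and the sample page are made up for illustration and do not describe any particular search engine's crawler.

```python
from html.parser import HTMLParser

class LinkSpider(HTMLParser):
    """Collects href targets of <a> tags and counts forms it cannot fill in."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms_skipped = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "form":
            # The spider has nothing to type into the form, so the
            # result pages behind it are never fetched.
            self.forms_skipped += 1

# Hypothetical page: one ordinary link and one database search form.
PAGE = """
<a href="/about.html">About</a>
<form action="/catalog/search" method="get">
  <input type="text" name="title">
  <input type="submit" value="Search">
</form>
"""

spider = LinkSpider()
spider.feed(PAGE)
print(spider.links, spider.forms_skipped)  # ['/about.html'] 1
```

The spider finds the static "About" page, but the catalog results exist only after someone types a title into the form, so they stay invisible to it.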
32
Why are pages invisible? (3)
2. Password or login required:
(Spiders do not know passwords or login IDs.)
3. Non-HTML pages:
–PDF, Word, Shockwave, Flash...
–Some search engines may find them: e.g., Google, AltaVista
33
Why are pages invisible? (4)
4. Script-based (computer generated) pages:
–Create all or part of a Web page
–Contain “?” in URL
–Spiders programmed to back off
–http://calver.org/search/file/ship (yes!)
–http://calver.org/search?title=plane (no)
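The "?" rule on this slide can be approximated with a short check. This is a sketch of the idea only: the function name looks_script_generated is hypothetical, and testing for a non-empty query string is an assumption about how a spider might apply the rule, not a description of any real crawler.

```python
from urllib.parse import urlsplit

def looks_script_generated(url):
    """True if the URL carries a query string (the part after '?')."""
    return bool(urlsplit(url).query)

# The two example URLs from the slide.
for url in ("http://calver.org/search/file/ship",
            "http://calver.org/search?title=plane"):
    verdict = "back off" if looks_script_generated(url) else "index"
    print(f"{url} -> {verdict}")
```

The first URL has no query string, so a spider following this rule would index it; the second contains "?title=plane", so the spider would back off.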
34
Sites to Check
35
Finding Invisible Information (1) “Librarians’ Index” Compiled by librarians in the “information supply business” Highest-quality sites only Reliable, annotated www.lii.org
36
Finding Invisible Information (2) “About” 2,400,000+ resources Wide variety of subjects: Teens, religion, spirituality, shopping About.com
37
Finding Invisible Information (3) “direct search” “Data not easily or entirely searchable/accessible from general search tools.” www.freepint.com/gary/direct.htm
39
Finding Invisible Information (4) “The Invisible Web Catalog” 10,000+ searchable databases Quick search, “Hot List” Sort alphabetically or by score (relevance) www.profusion.com
41
Finding Invisible Information (5) www.invisible-web.net
42
Finding Invisible Information (6) “IncyWincy” Over 100,000 databases Many links to other search engines www.incywincy.com
43
Finding Invisible Information (7) “CompletePlanet” 103,000+ databases and specialty search engines Some “surface” searching www.completeplanet.com
44
Finding Invisible Information (8) Some are research oriented. “Infomine” Infomine.ucr.edu/ “Academic Info” www.academicinfo.net
47
So… What To Do...
48
Questions? PowerPoint available at descy.net