Published by Raphael Humphrys. Modified over 10 years ago.
1
When Google Isn’t Enough! Finding Information on the Invisible Web
Yaacov Taube
yankee@infoserve.co.il
2
What is the Visible (Surface) Web?
“It’s made up of HTML Web pages that the search engines have chosen to include in their indices. It’s no more complicated than that.” (Sherman and Price)
3
What is the Visible (Surface) Web?
A collection of webpages
Searchable with “search engines”
What you and I think of as the “Internet” is actually only a small portion of the Internet
4
What is the Visible (Surface) Web?
High volume
Mass appeal
High value
Small percentage of web content
  – Exception: Google Books and Google Scholar
5
What is the Invisible Web?
What search engines do not search
Searchable Databases
  – Tens of thousands
  – Accessible and searchable via the Internet
  – Results often dynamically generated in specific response to your request (eBay, MapQuest, etc.)
6
What is the Invisible Web?
Excluded Pages
  – Excluded per search engine
  – Excluded per webpage by the owner of the site
Typically databases
  – Businesses
  – Governments
  – Schools
  – Libraries
  – Associations
7
What is the Invisible Web?
Academic
Never been indexed or linked
Uniquely generated pages
Proprietary
Confidential
Protected by username & password
Constitutes the majority of the webpages on the Internet
8
Visible vs. Invisible Web
The Invisible Web is about 550 times larger than the visible web and is growing much faster
The deep Web consists of about 91,000 terabytes
The surface Web is only about 167 terabytes
The Library of Congress contains about 11 terabytes
Quality content is 1,000 to 2,000 times greater than that of the surface web
95% of the Deep Web is accessible to the public (no fees or subscription required)
Based on extrapolations from a study done at the University of California, Berkeley
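The "about 550 times" figure can be checked against the terabyte numbers quoted on the same slide. The sketch below simply divides those quoted figures; the constant and function names are illustrative and not part of the original presentation.

```python
# Figures quoted on the slide, in terabytes.
DEEP_WEB_TB = 91_000
SURFACE_WEB_TB = 167
LIBRARY_OF_CONGRESS_TB = 11


def ratio(larger_tb: float, smaller_tb: float) -> float:
    """Return how many times larger the first figure is than the second."""
    return larger_tb / smaller_tb


deep_vs_surface = ratio(DEEP_WEB_TB, SURFACE_WEB_TB)
surface_vs_loc = ratio(SURFACE_WEB_TB, LIBRARY_OF_CONGRESS_TB)

print(f"Deep web vs. surface web: about {deep_vs_surface:.0f}x")
print(f"Surface web vs. Library of Congress: about {surface_vs_loc:.0f}x")
```

The first ratio comes out at roughly 545, which is consistent with the rounded "about 550 times" claim.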
9
What is on the Invisible Web
Opaque Web
Private Web
Proprietary Web
Pay per click
10
Why can’t Google find it?
Requires payment
Requires registration
Dynamically generated
Very new
Website specifically stops spiders
11
Opaque Web
Fixed, or could be indexed, but is not
Deemed not important enough
Too new and therefore not linked
Never makes max results cutoff
No one ever linked or submitted URL
12
Private Web
Deliberately excluded
  – Password
  – Special coding in website stops spiders
Only for select individuals
  – Employees
  – Students
  – Researchers
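One common form of the "special coding" that stops spiders is a robots.txt file, which well-behaved crawlers consult before fetching a page. The sketch below uses Python's standard urllib.robotparser to show how a crawler would interpret such a file. The robots.txt content, the "ExampleBot" user agent, and the example.org URLs are all made up for illustration; the slides do not say which mechanism any particular site uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to stay out of
# the staff area and the site's search-results pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staff/
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an assumed crawler name, not a real search engine.
public_ok = parser.can_fetch("ExampleBot", "https://www.example.org/about.html")
staff_ok = parser.can_fetch("ExampleBot", "https://www.example.org/staff/payroll.html")

print("about.html may be crawled:", public_ok)
print("staff/payroll.html may be crawled:", staff_ok)
```

Note that robots.txt is only a request to crawlers; pages that must stay private still need password protection, as the slide lists.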
13
Proprietary Web
Protected
  – Password
  – Registration (N.Y. Times, eBay, banks, etc.)
  – Terms of Use
Anyone can access if you
  – Pay
  – Register
  – Agree to terms
14
Pay per click
Search Engine Marketing tools
Ex: overture.com, FindWhat.com
15
When do I use …
Portal or Directory?
Search Engine?
Invisible Web?
16
Portal or Directory
You have a general topic
You know little about the subject
You do not know keywords
You want someone or something to have sorted out the junk
You need an exploratory overview
17
Search Engine
You are looking for something specific
You have keywords
You are pretty sure the information is
  – advertised or
  – otherwise generally disseminated
18
Tips for search engines
Use a toolbar
Determine the key words/phrases most likely to be in your document and nowhere else
Learn and use Boolean Operators
Scan results
Question the results
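As a rough illustration of the Boolean operators tip, the sketch below assembles a query from required terms, alternatives, and excluded terms. The helper names are hypothetical, and the exact operator syntax (AND, OR, NOT, quoted phrases) is an assumption that varies between search engines and databases, so check the help page of the tool you are using.

```python
def phrase(text: str) -> str:
    """Wrap multi-word terms in quotes so they are searched as a phrase."""
    return f'"{text}"' if " " in text else text


def boolean_query(all_of=(), any_of=(), none_of=()) -> str:
    """Build a query string using AND, OR and NOT (syntax is an assumption)."""
    parts = [phrase(term) for term in all_of]
    if any_of:
        # Alternatives are grouped in parentheses and joined with OR.
        parts.append("(" + " OR ".join(phrase(term) for term in any_of) + ")")
    query = " AND ".join(parts)
    for term in none_of:
        query += f" NOT {phrase(term)}"
    return query.strip()


query = boolean_query(
    all_of=["invisible web", "databases"],
    any_of=["library", "archive"],
    none_of=["shopping"],
)
print(query)
```

Quoting the phrase narrows the search to documents containing those words together, which fits the tip about choosing phrases likely to appear in your document and nowhere else.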
19
Invisible Web
You are pretty sure the information is in a specific database
Need something authoritative
Speed
The information is dynamically generated
You are familiar with the database
  – Search techniques
  – Protocols
  – Access requirements
20
Searching the Invisible Web
Directories – subject guides compiled by human editors
Specialized Search Engines
  – http://library.albany.edu/internet/choose.html
Special Databases
  – Library of Congress: http://catalog.loc.gov
  – LookSmart’s Find Articles (over 900 publications): http://www.findarticles.com
  – National Science Digital Library: http://www.nsdl.org
  – Singing Fish (audio and video): http://www.singingfish.com
21
Special Databases
Library of Congress
  – http://catalog.loc.gov
LookSmart’s Find Articles (over 900 publications)
  – http://www.findarticles.com
National Science Digital Library
  – http://www.nsdl.org
Singing Fish – audio and video
  – http://www.singingfish.com
22
Types of Databases
Information stored in tables (Access, Oracle, SQL Server, DB2) and accessible only by query.
Examples:
  – Phone books, people finders, patents, laws
  – Items for sale in a Web store or Web-based auctions
  – Digital exhibits
  – Multimedia and graphical files
  – Stock and bond prices
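To see why table-based information is reachable only by query, consider this minimal sketch built on Python's standard sqlite3 module. The phone-book table, the names, and the numbers are invented for illustration. The results page exists only at the moment someone asks for a city, so there is no fixed page for a spider to follow a link to.

```python
import sqlite3


def make_phone_book() -> sqlite3.Connection:
    """Create an in-memory phone book with made-up listings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE listings (name TEXT, city TEXT, phone TEXT)")
    conn.executemany(
        "INSERT INTO listings VALUES (?, ?, ?)",
        [
            ("A. Cohen", "Haifa", "555-0101"),
            ("B. Levi", "Haifa", "555-0102"),
            ("C. Mizrahi", "Eilat", "555-0103"),
        ],
    )
    conn.commit()
    return conn


def results_page(conn: sqlite3.Connection, city: str) -> str:
    """Generate a results page on demand for one specific request."""
    rows = conn.execute(
        "SELECT name, phone FROM listings WHERE city = ? ORDER BY name",
        (city,),
    ).fetchall()
    lines = [f"Listings for {city}: {len(rows)} found"]
    lines += [f"{name}: {phone}" for name, phone in rows]
    return "\n".join(lines)


conn = make_phone_book()
print(results_page(conn, "Haifa"))
```

A crawler that never submits a city sees none of these rows, which is the sense in which such content is "invisible".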
23
Types of Hidden Info
Pages in searchable databases: medical (WebMD.com), patent, scientific, legal (Lexis and Westlaw), reference
Pages requiring login or registration: social sites, New York Times, web-based applications, calendars, Google Docs, etc.
Government publications or databases: ERIC, usa.gov
Online databases: Gale Research
PDF files, audio, video, any new format
24
More hidden stuff
Dictionaries and thesauri
Sites that require forms to be filled out (ex: travel directions, job hunting)
Product catalogs and library catalogs
Newspaper and magazine archives
Dynamic web pages (ex: airline flight checkers, MapQuest)
Interactive tools (ex: calculators & measurement converters)
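Form-driven sites such as flight checkers usually turn the values a visitor types into request parameters. The sketch below, using Python's standard urllib.parse, shows how such a request address might be assembled. The flights.example.com address and the field names are hypothetical. Because a spider does not know which values to type into the form, it never arrives at addresses like this one.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def form_url(base_url: str, fields: dict) -> str:
    """Encode form fields into a query string appended to the base URL."""
    return f"{base_url}?{urlencode(fields)}"


# Hypothetical flight-status form with two fields filled in by a visitor.
url = form_url(
    "https://flights.example.com/status",
    {"flight": "XY 123", "date": "2024-05-01"},
)
print(url)

# Decoding the query string recovers what the visitor entered.
submitted = parse_qs(urlsplit(url).query)
print(submitted)
```

Some forms send their fields in the body of the request rather than in the address, in which case the result has no linkable URL at all.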
25
Access to the invisible web is improving …
Google Books: http://books.google.com/
Google Scholar: http://scholar.google.co.il/
26
Maybe Consider …
Specialized databases such as Dialog, LexisNexis, Factiva, etc. (not cheap)
Use an Information Professional: www.aiip.org
27
To Conclude …
Focus on what you do best and what you have been trained for, and let an Information Professional find the information you need. Such a professional is trained to do it faster, more effectively, and more efficiently than you or one of your employees. (www.aiip.org)