1
The Invisible Web
Chris Sherman
Editor, SearchDay
SearchEngineWatch.com
Information Online 2003
Sydney, Australia
January 23, 2003
3
Overview
How Search Engines Work
What is the Invisible Web?
Tactics for Searching the Invisible Web
Future Trends
4
The Parts of a Search Engine
Three main parts of every search engine:
–The Crawler (aka spider)
–The Indexer
–The Search Engine Database
5
How Search Engines Work
[Diagram: The Web (URL1, URL2, URL3, URL4), Crawler, Indexer, Search Engine Database, Your Browser. Sample query "Eggs?" with ranked matches: Eggs - 90%, Eggo - 81%, Ego - 40%, Huh? - 10%. Sample result: "All About Eggs" by S. I. Am]
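The indexer and database steps in the diagram can be illustrated with a toy inverted index. This is only a sketch under stated assumptions: the page texts, the function names `build_index` and `search`, and the simple "all words must match" rule are hypothetical, and no real engine's ranking (such as the percentages in the diagram) is reproduced here.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical page texts standing in for downloaded pages.
pages = {
    "URL1": "All About Eggs",
    "URL2": "Eggo waffles",
    "URL3": "Green eggs and ham",
}
index = build_index(pages)
print(sorted(search(index, "Eggs?")))
```

The query is answered from the index alone, never from the live Web, so a page the crawler never delivered cannot appear in the results.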
6
How Crawlers Work
Crawlers are like hyper-caffeinated browsers
Seeded with a set of URLs
Download Web pages, then:
–Extract all links on every page for further crawling
–Hand the page off to the indexer
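The crawl loop described on this slide can be sketched in a few lines. This is a minimal illustration, not any search engine's actual crawler: the in-memory `TOY_WEB`, the `fetch` and `index_page` callbacks, and the `max_pages` limit are all assumptions made for the example, and relative-URL resolution, politeness delays, and robots.txt handling are left out.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, index_page, max_pages=100):
    """Breadth-first crawl starting from the seed URLs.

    fetch(url) returns the page's HTML, or None if it cannot be downloaded.
    index_page(url, html) stands in for the hand-off to the indexer.
    """
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        index_page(url, html)          # hand the page off to the indexer
        parser = LinkExtractor()
        parser.feed(html)              # extract all links for further crawling
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory "Web"; nothing links to URL4.
TOY_WEB = {
    "URL1": '<a href="URL2">two</a> <a href="URL3">three</a>',
    "URL2": '<a href="URL1">one</a>',
    "URL3": "no links here",
    "URL4": '<a href="URL1">one</a>',
}

indexed = {}
visited = crawl(["URL1"], TOY_WEB.get, indexed.__setitem__)
print(visited)
```

In this toy Web, URL4 is never visited because no crawled page links to it, which is the sense in which unlinked pages stay out of a search engine's database.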
7
The Bow Tie Model
30% in the core
24% origination pages
24% termination pages
22% disconnected pages; these are effectively invisible to search engines
Source: IBM
8
What is the Invisible Web?
“Stuff” that search engine crawlers (spiders) cannot, or will not, add to their databases
2 to 50 times larger than the visible Web
Resources often much higher quality than the visible Web
9
What is the Invisible Web?
Certain file formats (PDF, Flash, Office files, streaming media)
–Why? They aren’t HTML text
Most real-time data (stock quotes, weather, airline flight info)
–Why? Ephemeral & storage intensive
10
What is the Invisible Web?
Dynamically generated pages (CGI, JavaScript, ASP, or most pages with “?” in the URL)
–Why? Spider traps
Web-accessible databases
–Why? Spiders can’t type
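The "?" rule of thumb on this slide can be expressed as a rough check. This is a hedged sketch only: the function name `looks_dynamic` and the extension list are assumptions for illustration, and the check does not describe how any particular crawler decided what to skip.

```python
from urllib.parse import urlsplit

# Assumed list of script extensions, taken from the slide's examples.
DYNAMIC_EXTENSIONS = (".cgi", ".asp")


def looks_dynamic(url):
    """Guess whether a URL points at a dynamically generated page."""
    if "?" in url:
        return True
    path = urlsplit(url).path.lower()
    return path.endswith(DYNAMIC_EXTENSIONS)


print(looks_dynamic("http://example.com/search?q=eggs"))
print(looks_dynamic("http://example.com/about.html"))
```

A crawler applying a rule like this avoids endless generated URLs (spider traps) at the cost of missing real content behind them.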
11
The Opaque Web
Visible pages “hidden” behind dynamic navigation codes
Mostly graphic, non-text pages
“Disconnected” pages
12
The URL Test
18
Invisible Web Searching: Core Tactics
The first step in determining the best approach for searching the Invisible Web is to have a clear idea of what you’re seeking.
Limit your search to appropriate tools for the particular type of information you’re looking for.
19
Use Invisible Web Pathfinders
Intelliseek
–http://www.invisibleweb.com
Invisible-web.net
–http://www.invisible-web.net/
Librarians’ Index to the Internet
–http://www.lii.org
20
Finding Non-HTML File Formats
Google & AlltheWeb: use the filetype operator
–filetype:pdf
–filetype:doc
Use specialized engines
–searchpdf.adobe.com
–Research Index
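As a small illustration of the filetype operator, the helper below appends it to a set of search terms. The function name `filetype_query` is hypothetical, and the sketch only builds the query text to type into a search box; it does not call any search engine.

```python
def filetype_query(terms, extension):
    """Append a filetype: restriction to plain search terms."""
    return f"{terms.strip()} filetype:{extension.lstrip('.').lower()}"


print(filetype_query("invisible web", "pdf"))
print(filetype_query("annual report", ".DOC"))
```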
21
Finding Real Time Information
Underground Weather
Google News Search
Yahoo Finance
J-Track Spacecraft Tracker
22
Finding Images
Google/FAST/AltaVista Image Search
Google Catalogs
Visoo
Webseek @ Columbia
23
Finding Streaming Media Files
Speechbot
Singingfish
MSN Music
British Pathe
WindowsMedia.com v.9 player
24
Future Trends: The Invisible Web Revealed
More “difficult” content indexed
–Flash, dynamic content
“Data centric” search engines
–ResearchIndex
Agent-brokered database search
Form crawlers
25
Conclusion
Searching the Invisible Web isn’t hard. It just takes a different mindset.
It’s crucial to develop your own, personal collection.
Expect the unexpected: the boundary between visible and invisible is changing as we speak.
26
More Info
CyberAge Books
ISBN 0-910965-51-X
http://www.invisible-web.net
27
More Ranting
SearchDay Newsletter
–http://searchenginewatch.com/searchday/
Searchwise
–http://www.searchwise.net
csherman@searchwise.net