Sydney, Australia January 23, 2003 The Invisible Web Chris Sherman Editor, SearchDay SearchEngineWatch.com Information Online 2003.


1 The Invisible Web
Chris Sherman
Editor, SearchDay, SearchEngineWatch.com
Information Online 2003
Sydney, Australia, January 23, 2003

3 Overview
How Search Engines Work
What is the Invisible Web?
Tactics for Searching the Invisible Web
Future Trends

4 The Parts of a Search Engine
Three main parts of every search engine:
– The Crawler (aka spider)
– The Indexer
– The Search Engine Database

5 How Search Engines Work
[Diagram labels: The Web (URL1, URL2, URL3, URL4), Crawler, Indexer, Search Engine Database, Your Browser. The browser asks "Eggs?" and the engine answers "Eggs." with ranked matches: Eggs - 90%, Eggo - 81%, Ego - 40%, Huh? - 10%. Result shown: "All About Eggs" by S. I. Am]
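The Indexer and Search Engine Database steps in this diagram can be illustrated with a small inverted index. This is only a sketch under stated assumptions: the page texts, the `tokenize`, `build_index`, and `search` names, and the all-words-must-match rule are hypothetical, and the percentage relevance scores shown on the slide are not reproduced.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every query word, sorted for stable output."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical page texts standing in for crawled documents.
PAGES = {
    "URL1": "All About Eggs",
    "URL2": "Eggo waffles",
    "URL3": "Green eggs and ham",
}
index = build_index(PAGES)
print(search(index, "Eggs?"))  # prints ['URL1', 'URL3']
```

The search never touches the live Web: it only consults what the indexer stored, which is why anything the crawler never fetched cannot appear in the results.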

6 How Crawlers Work
Crawlers are like hyper-caffeinated browsers
Seeded with a set of URLs
Download Web pages, then:
– Extract all links on every page for further crawling
– Hand the page off to the indexer
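The crawl loop described on this slide can be sketched as follows. This is a minimal illustration, not how any particular search engine does it: the four-page `TOY_WEB`, the `fetch` and `index_page` callables, and the breadth-first order are all assumptions, and no network access is involved.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, index_page):
    """Breadth-first crawl: fetch each page once, index it, queue its links."""
    queue = deque(seeds)
    seen = set(seeds)
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be downloaded
            continue
        index_page(url, html)  # hand the page off to the indexer
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:  # extract links for further crawling
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


# Hypothetical four-page web; no page links to "URL4".
TOY_WEB = {
    "URL1": '<a href="URL2">two</a> <a href="URL3">three</a>',
    "URL2": '<a href="URL1">one</a>',
    "URL3": "no links here",
    "URL4": '<a href="URL1">one</a>',
}

indexed = []
crawl(["URL1"], TOY_WEB.get, lambda url, html: indexed.append(url))
print(indexed)  # prints ['URL1', 'URL2', 'URL3']
```

URL4 exists but is never indexed, because the crawler only discovers pages by following links from pages it has already downloaded.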

7 The Bow Tie Model
30% in the core
24% origination pages
24% termination pages
22% disconnected pages (these are effectively invisible to search engines)
Source: IBM

8 What is the Invisible Web?
“Stuff” that search engine crawlers (spiders) cannot, or will not, add to their databases
2 to 50 times larger than the visible Web
Resources often of much higher quality than those on the visible Web

9 What is the Invisible Web?
Certain file formats (PDF, Flash, Office files, streaming media)
– Why? They aren’t HTML text
Most real-time data (stock quotes, weather, airline flight info)
– Why? Ephemeral & storage intensive

10 What is the Invisible Web?
Dynamically generated pages (CGI, JavaScript, ASP, or most pages with “?” in the URL)
– Why? Spider traps
Web accessible databases
– Why? Spiders can’t type
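The rule of thumb on this slide (a “?” in the URL, or a cgi/asp address) can be sketched as a small check. The function name `looks_dynamic`, the exact matching rules, and the example.com URLs are assumptions made for illustration; the slide's JavaScript case is not covered, and a crawler of the period would likely have applied its own, different rules.

```python
from urllib.parse import urlparse

# Markers taken from the slide; how they are matched is an assumption.
DYNAMIC_EXTENSIONS = (".cgi", ".asp")


def looks_dynamic(url):
    """Guess whether a URL points at a dynamically generated page."""
    parts = urlparse(url)
    path = parts.path.lower()
    if parts.query:  # something follows "?" in the URL
        return True
    if "/cgi-bin/" in path:
        return True
    return path.endswith(DYNAMIC_EXTENSIONS)


# Hypothetical URLs for illustration only.
for url in (
    "http://example.com/about.html",
    "http://example.com/results?q=eggs",
    "http://example.com/cgi-bin/lookup",
    "http://example.com/search.asp",
):
    print(url, looks_dynamic(url))
```

Only the first URL is reported as static; the other three match one of the slide's markers.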

11 The Opaque Web
Visible pages “hidden” behind dynamic navigation codes
Mostly graphic, non-text pages
“Disconnected” pages

12-17 The URL Test
[Six slides; screenshots not preserved in the transcript]

18 Invisible Web Searching: Core Tactics
The first step in determining the best approach for searching the Invisible Web is to have a clear idea of what you’re seeking.
Limit your search to appropriate tools for the particular type of information you’re looking for.

19 Use Invisible Web Pathfinders
Intelliseek
– http://www.invisibleweb.com
Invisible-web.net
– http://www.invisible-web.net/
Librarians’ Index to the Internet
– http://www.lii.org

20 Finding Non-HTML File Formats
Google & AlltheWeb: use the filetype operator
– filetype:pdf
– filetype:doc
Use specialized engines
– searchpdf.adobe.com
– Research Index
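As a small illustration of the filetype operator above, the sketch below appends the restriction to a keyword query and URL-encodes it. The helper names and the `q` parameter name are assumptions; no specific engine's URL format is implied.

```python
from urllib.parse import urlencode


def filetype_query(terms, extension):
    """Append a filetype: restriction to a plain keyword query."""
    return f"{terms} filetype:{extension.lstrip('.').lower()}"


def as_query_string(terms, extension, param="q"):
    """URL-encode the query; the parameter name "q" is an assumption."""
    return urlencode({param: filetype_query(terms, extension)})


print(filetype_query("invisible web", "PDF"))   # prints invisible web filetype:pdf
print(as_query_string("invisible web", "pdf"))  # prints q=invisible+web+filetype%3Apdf
```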

21 Finding Real Time Information
Underground Weather
Google News Search
Yahoo Finance
J-Track Spacecraft Tracker

22 Finding Images
Google/FAST/AltaVista Image Search
Google Catalogs
Visoo
Webseek @ Columbia

23 Finding Streaming Media Files
Speechbot
Singingfish
MSN Music
British Pathe
WindowsMedia.com v.9 player

24 Future Trends: The Invisible Web Revealed
More “difficult” content indexed
– Flash, dynamic content
“Data centric” search engines
– ResearchIndex
Agent-brokered database search
Form crawlers

25 Conclusion
Searching the Invisible Web isn’t hard. It just takes a different mindset.
It’s crucial to develop your own personal collection.
Expect the unexpected: the boundary between visible and invisible is changing as we speak.

26 More Info
CyberAge Books, ISBN 0-910965-51-X
http://www.invisible-web.net

27 More Ranting
SearchDay Newsletter
– http://searchenginewatch.com/searchday/
Searchwise
– http://www.searchwise.net
csherman@searchwise.net

