The Invisible Web Taly Sharon

1 www.sharon-it.com1 The Invisible Web Taly Sharon

2 Contents
What is the Invisible Web?
How big is the Invisible Web?
Why is there an Invisible Web (and what’s in it)?
Case study – patent search.
How to find Invisible Web resources?

3 What is the Invisible Web?
Also called the “Deep Web”, in contrast to the “Surface Web”, which is the Visible Web.
The term “Invisible Web” refers to pages that are available and accessible on the Web but are not indexed by the general search engines (SEs). It consists mostly of:
–searchable databases
–excluded pages
These pages do not appear in SE search results.
Information on the Invisible Web can be found through direct access or through specialized SEs.
The Invisible Web is larger than the Visible Web.

4 How Big is the Invisible Web?
According to the BrightPlanet study (2000):
The deep web (Invisible Web) is 500 times larger than the surface web.
–Number of search utilities: 45,000 search engines on the surface web; 200,000 searchable databases within the deep web.
–Number of documents: 1 billion documents on the surface web; 550 billion documents within the deep web.
Deep web quality is 1,000 times greater.
95% of deep web information is publicly available.
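
The figures above can be checked with a little arithmetic. The sketch below uses only the numbers quoted from the BrightPlanet study; the variable names are illustrative, and the per-database average is a derived value, not a figure from the study. The document counts give a ratio of 550, which is consistent with the rounded "500 times larger" headline.

```python
# Figures as quoted on the slide from the BrightPlanet (2000) study.
surface_docs = 1_000_000_000        # documents on the surface web
deep_docs = 550_000_000_000         # documents within the deep web
deep_databases = 200_000            # searchable databases within the deep web

# How many deep-web documents exist per surface-web document.
doc_ratio = deep_docs / surface_docs

# Derived value (not from the study): average documents per deep-web database.
avg_docs_per_deep_db = deep_docs / deep_databases

print(f"Deep/surface document ratio: {doc_ratio:.0f}x")
print(f"Average documents per deep-web database: {avg_docs_per_deep_db:,.0f}")
```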

5 Invisible Web!?
[Figure labels: The Invisible Web; Google]

6 Is it Really that Big?
Some argue that the Invisible Web is actually only 50-80 times bigger than the Visible Web.

7 Why is there an Invisible Web?
1. Specialized searchable databases:
–Dynamic pages
–Require parameters and user judgment
–Require a username and password
2. Script-based pages:
–Include “?” in their URL
–A hazard to SEs (crawler traps)
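
As a rough illustration of item 2, a cautious crawler might simply skip any URL that carries a query string. This is a minimal sketch of that heuristic, not a description of how any particular search engine works; the function names and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit


def looks_script_generated(url: str) -> bool:
    """Return True if the URL has a non-empty query string after "?"."""
    return bool(urlsplit(url).query)


def crawlable(urls):
    """Keep only the URLs that the heuristic above does not flag."""
    return [u for u in urls if not looks_script_generated(u)]


# Hypothetical URLs, for illustration only.
candidates = [
    "http://example.com/about.html",
    "http://example.com/search?q=patent",
    "http://example.com/calendar?month=13&year=2099",
]
print(crawlable(candidates))
```

A crawler that applies this rule avoids endless generated pages (the "trap"), at the cost of never seeing the database content behind those URLs.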

8 Why is there an Invisible Web? (2)
3. Real-time, constantly changing content
4. Very large websites are only partially indexed
5. Private/secret websites:
–Internal company portals
–Excluded from SEs (using robots.txt or similar)
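
The robots.txt exclusion in item 5 can be sketched with Python's standard `urllib.robotparser` module. The rules, the `/intranet/` path, the `ExampleBot` user agent, and the example.com URLs below are all made up for illustration; a real site publishes its own rules at `/robots.txt`.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that hides an internal portal from all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /intranet/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(agent, url)


print(allowed("http://example.com/products.html"))
print(allowed("http://example.com/intranet/report.html"))
```

A well-behaved crawler never fetches the disallowed pages, so they stay out of the index even though they are reachable on the Web.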

9 Why is there an Invisible Web? (3)
6. Multimedia (and other formats) files
–Special formats
–Examples: PDF, DOC, PPT, GIF, Flash
–Some SEs can’t, or won’t, index them.

10 Why is there an Invisible Web? (4)
7. Additional reasons (mostly spam-related or resource-saving):
–File size
–Number of words in the page
–Pages requiring cookies
–Other spam characteristics
–Multimedia files
–Files and URLs with special characters.

11 Why is there an Invisible Web? (5)
8. Non-linked pages (no incoming/inbound links)
9. Pages on servers with dynamic IP addresses
For example, 5% of the Internet is not connected! The Internet is partitioned, and there exists “Dark Address Space”: prefixes that are not reachable from one provider but are available from other providers for long periods of time. This amounts to 5% of the total number of prefixes in the Internet, or tens of millions of end hosts.
Source: Arbor Networks
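
Item 8 follows from how crawlers discover pages: they start from known pages and follow links, so a page with no inbound links is never reached. The sketch below models this as a breadth-first traversal over a tiny, invented link graph; the page names are hypothetical.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "home": ["products", "about"],
    "products": ["home"],
    "about": [],
    "orphan": ["home"],  # links out, but no page links to it
}


def reachable(seeds, links):
    """Return every page a link-following crawler can reach from the seeds."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


found = reachable(["home"], links)
invisible = sorted(set(links) - found)
print(invisible)
```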

12 Example – Invisible Flash Website
Example: a search for “Leo baby Pot” fails in all search engines.
Exists in Website:
Reason: Flash Website!

13 Leo baby Pot

15 The Invisible Web is Mostly Topic Databases
The SEs know of many databases, but not their content.
SEs are blocked from entering many of them.
Searching these databases requires entry via the site’s user interface, and often also registration, a password, or cookies.

16 The Invisible Web is Mostly Topic Databases

17 Specialized interface VS.
The database’s user interface is specialized and designed to get the best results.

18 Will the Invisible Web become Visible?
Definitely! Using intelligent SEs.
Over time, SEs that can access the Invisible Web pick up its new information and make it visible.

19 Example: Patent/Trademark Search
Searching Google for “Google patents or trademarks” gives various answers:
–Patent DBs, patent disputes, adware, etc.
The USPTO or a patent DB gives an organized list.

20 Is the Invisible Web Invisible?
[Figure labels: Google; Yahoo!; Google X]

21 How to Find Invisible Web Resources?
Search for sources – not information!
1. Two-step searching in general search engines.
–Learn how to phrase queries well (vocabulary, anchor words, etc.)
2. Invisible Web search engines
3. Pathfinders: directories and guides
4. Expand from one website to others
–Using link:, related:, directories, pearl culturing

22 1. Two-Step Searching
Use a general search engine (such as Google) to find a good database, then search for the information with that database’s or website’s own search engine.
Example anchor words to add to the query (see next slide):
–Database
–Association
–Portal
–Encyclopedia
–Product review

23 Anchor Words
Use anchor words to find key websites and directories:
–Directory of
–Center of
–Industry portal
–Guide
–Database
–Resource
–Bibliography
–Reference
–Working group
Examples:
–“professional publications/journals”
–“Industry portal” (or just portal)
–“metasite resource”
–pathfinder
–allintitle: “directory of”
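
The first step of two-step searching can be sketched as a small query builder that pairs a topic with anchor words. This is only an illustration: the function name and the sample topic are hypothetical, the anchor list is a subset of the words above, and quoting multi-word anchors as phrases is an assumption about how the queries would be typed.

```python
# A subset of the anchor words listed on the slide.
ANCHOR_WORDS = ["directory of", "database", "industry portal", "guide", "bibliography"]


def source_queries(topic, anchors=ANCHOR_WORDS):
    """Build one search query per anchor word for the given topic.

    Multi-word anchors are wrapped in double quotes so the search engine
    treats them as a phrase; single words are appended as-is.
    """
    queries = []
    for anchor in anchors:
        if " " in anchor:
            queries.append(f'{topic} "{anchor}"')
        else:
            queries.append(f"{topic} {anchor}")
    return queries


for query in source_queries("patents"):
    print(query)
```

Each query is meant to surface a source (a directory, portal, or database) rather than the final answer; the second step is to search inside that source.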

24 Two-Step Searching: Example 1
Find videos of a man blowing a shofar.
–Google search: Video Search
–Finds a list of video search engines: AltaVista Video Search, Yahoo Video Search, Singingfish, Google Video, Blinkx Video Search, etc.
–Search for Shofar

25 AltaVista Video: Shofar

26 Two-Step Searching: Example 2
Who is the Itzik addressed in the song line “Itzik, beware the shadow of your songs”?
–Google search for “songs” (in Hebrew): the Shironet website (second option)
–Search from the site menu, using words from the song
–Result: “Itzik Manger”
–Quote: “Itzik, beware the shadow of your songs, don’t be a fool, and listen: put a drop of wine in the song, but keep it from a tear”

28 2. Invisible Web Search Engines
Sherman-Price Invisible-Web Directory * temporarily out of service
CompletePlanet http://www.completeplanet.com
Beaucoup http://www.beaucoup.com
Turbo10

29 Sherman-Price Invisible-Web directory

30 Invisible-Web directory People Search – cont.

31 Complete Planet

32 Beaucoup
Over 2,500 engines
The engines listed on the main site are "free information" sites -- a *lot* of information.
Subject Directory/Annotated

33 Beaucoup

34 Turbo10

35 Turbo10

37 Turbo10 – Edit Collections

39 3. Pathfinders
Librarians’ Index to the Internet
MeL Michigan eLibrary
Internet Scout Project
Infomine
More:

41 Quiz
Can you find any information on the World Wide Web if you use a big enough search engine?

42 Exercises
1. Search for a keyword in a big website (example: "acid rain").
–Use several search engines.
–How many results do you get?
2. What was the exchange rate of the Canadian dollar (in US dollars) on 20 Sep 1991?
3. What was the value of Berkshire stock (BRK.A) on Nov. 12, 1996?
4. Who wrote this (hint: book or paper): "Israeli Arabs will undoubtedly benefit from peace agreements between Israel and its neighbors"
5. What is the zip code of 22 Hamaayan in Givataim?
6. What did Binyamin Ze'ev Herzl say at a meeting preceding the First Zionist Congress, according to the newspaper "HaMagid"?
7. Search for armchairs in website (try using site:).
8. Where would you search for people?
