Web Search Architecture & The Deep Web


1 Web Search Architecture & The Deep Web

2 Standard Web Search Engine Architecture
[Diagram: crawl the web; check for duplicates, store the documents (DocIds); create an inverted index; search engine servers use the inverted index to answer the user query; show results to user]
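The pipeline on this slide can be sketched in a few lines. This is a minimal illustration, not how any particular search engine is built: the function names `build_index` and `search`, the use of a content hash for duplicate detection, and whitespace tokenization are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def build_index(pages):
    """Build a toy inverted index from (url, text) pairs.

    Returns (docs, index): docs maps DocId -> url,
    index maps term -> set of DocIds containing that term.
    """
    docs = {}
    seen_hashes = set()        # used to check for duplicate documents
    index = defaultdict(set)
    for url, text in pages:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue           # exact duplicate: do not store it again
        seen_hashes.add(digest)
        doc_id = len(docs)     # assign the next DocId
        docs[doc_id] = url
        for term in text.lower().split():
            index[term].add(doc_id)
    return docs, index


def search(index, docs, query):
    """Return URLs of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [docs[d] for d in sorted(hits)]


if __name__ == "__main__":
    pages = [
        ("a", "deep web search"),
        ("b", "deep web search"),   # duplicate content, skipped
        ("c", "web crawler"),
    ]
    docs, index = build_index(pages)
    print(search(index, docs, "web"))
```

The inverted index maps each word to the documents that contain it, so answering a query is a set intersection rather than a scan of every stored document.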

3 Web Crawlers
How do the web search engines get all of the items they index?
Main idea:
  Start with known sites
  Record information for these sites
  Follow the links from each site
  Record information found at new sites
  Repeat
Automated programs called spiders do this continuously
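The crawl loop described on this slide can be sketched as a breadth-first traversal. To keep the example self-contained it walks a small in-memory link graph instead of fetching real pages; `crawl`, `get_links`, `max_pages`, and the `TOY_WEB` dictionary are hypothetical names chosen for illustration.

```python
from collections import deque


def crawl(seeds, get_links, max_pages=100):
    """Visit pages breadth-first, starting from known sites.

    get_links(url) must return the list of URLs linked from url.
    Returns the URLs in the order they were recorded.
    """
    queue = deque(dict.fromkeys(seeds))   # start with known sites
    seen = set(queue)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)               # record information for this site
        for link in get_links(url):       # follow the links from the site
            if link not in seen:
                seen.add(link)
                queue.append(link)        # new site: record it on a later pass
    return visited


# A toy "web": each page maps to the pages it links to (illustrative data).
TOY_WEB = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}

if __name__ == "__main__":
    print(crawl(["a"], lambda url: TOY_WEB.get(url, [])))
```

The `seen` set is what keeps the spider from looping forever when pages link back to each other, as "c" links back to "a" here.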

4 Searches per Day

5 Where the US goes for search

6 How to Search the World Wide Web
Search Engines
Good for finding individual pages
Some examples of Search Engines:
  Google
  Windows Live (beta)
  Yahoo

7 How to Search the World Wide Web
Subject or Web Site Directories
Web directories are organized Web site listings put together by human reviewers
Good if you wish to look up a topic and find the Web sites related to that topic
Some examples of Subject or Web Site Directories:
  Yahoo
  LookSmart
  Open Directory

8 How to Search the World Wide Web
Meta Search Engines
Also known as multi-threaded search engines
Allow the user to search multiple databases simultaneously, via a single interface
Present a summary of the collected results from other search engines and directories
Some examples of Meta Search Engines:
  Metacrawler
  Dogpile
  Mamma
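One way to picture a meta search engine is as a function that forwards the same query to several engines and merges their ranked lists. The sketch below is an assumption-laden illustration: the engines are stand-in callables rather than real services, they are called one after another rather than simultaneously, and round-robin merging with duplicate removal is just one possible way to combine results.

```python
from itertools import zip_longest


def meta_search(query, engines):
    """Send one query to several engines and merge the ranked results.

    engines: dict mapping an engine name to a callable that takes the
    query and returns a ranked list of URLs (hypothetical stand-ins).
    Returns a list of (url, engine_name) pairs, interleaved by rank,
    keeping only the first occurrence of each URL.
    """
    ranked = [(name, fn(query)) for name, fn in engines.items()]
    merged = []
    seen = set()
    # Walk rank 1 of every engine, then rank 2, and so on.
    for row in zip_longest(*(results for _, results in ranked)):
        for (name, _), url in zip(ranked, row):
            if url is not None and url not in seen:
                seen.add(url)
                merged.append((url, name))
    return merged


if __name__ == "__main__":
    engines = {
        "engine_one": lambda q: ["x", "y"],
        "engine_two": lambda q: ["y", "z", "w"],
    }
    for url, source in meta_search("deep web", engines):
        print(url, "from", source)
```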

9 Spiders Can’t Find Everything
Spiders can follow static links and catalog words and word relationships in a text document
They can't form and submit queries to databases
Dynamically generated pages are invisible to spiders
They can't discern the subject matter of a picture or song
Files that spiders can't interpret are said to belong to the Deep Web
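The difference between a static link and a query form can be seen in how a simple spider reads a page. The sketch below uses Python's standard `html.parser` module; the class name `LinkAndFormParser` and the sample HTML are made up for illustration. It collects `href` targets it could follow and merely notes any forms, since it has no way of knowing what to type into them.

```python
from html.parser import HTMLParser


class LinkAndFormParser(HTMLParser):
    """Collect static links; note forms that would need a typed query."""

    def __init__(self):
        super().__init__()
        self.links = []   # URLs a spider can follow directly
        self.forms = []   # form targets the spider cannot query by itself

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action", ""))


if __name__ == "__main__":
    page = (
        '<a href="/about.html">About</a>'
        '<form action="/search"><input name="q"></form>'
    )
    parser = LinkAndFormParser()
    parser.feed(page)
    print("can follow:", parser.links)
    print("cannot query:", parser.forms)
```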

10 The Deep Web
The content of databases accessible on the Web
  Databases contain information stored in tables
  Information stored in databases is accessible only by query
  This is distinct from static, fixed Web pages, which are documents that can be accessed directly
  A significant amount of valuable information on the Web is generated from databases
  The deep Web may be 500 times larger than the fixed Web
Non-textual files
  Multimedia files, graphical files, software
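To make "accessible only by query" concrete, here is a small sketch using Python's built-in `sqlite3` module. The table, its rows, and the function names `make_db` and `results_page` are invented for the example. The point is that the HTML page does not exist anywhere until someone supplies a query value, so there is no fixed document for a spider to find.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doctors (name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO doctors VALUES (?, ?)",
        [("Dr. Adams", "Boston"), ("Dr. Baker", "Denver"), ("Dr. Clark", "Boston")],
    )
    return conn


def results_page(conn, city):
    """Generate a page on demand from a query; nothing is stored as a file."""
    rows = conn.execute(
        "SELECT name FROM doctors WHERE city = ? ORDER BY name", (city,)
    ).fetchall()
    items = "".join(f"<li>{name}</li>" for (name,) in rows)
    return f"<html><body><ul>{items}</ul></body></html>"


if __name__ == "__main__":
    conn = make_db()
    print(results_page(conn, "Boston"))
```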

11 What is in the Deep Web?
Information that is likely to be stored only in a database is part of the deep Web
This can include large listings of things with a common theme
All directories are part of the deep Web
  phone books
  "people finders" such as lists of doctors or lawyers
  patents
  dictionary definitions
  items for sale in a Web store or on Web-based auctions
  multimedia and graphical files

12 What is in the Deep Web?
Information that is new and dynamically changing in content will appear on the deep Web
Look to the deep Web for late-breaking items, such as:
  news
  job postings
  available airline flights, hotel rooms, etc.
  stock and bond prices, market averages, etc.

13 Finding Sources of Deep Web Content
CompletePlanet.com
  Offers searchable access to thousands of databases, with results that include summaries from the retrieved sites; also offers the LexiBot software for accessing deep Web content
Invisible-web.net
  Directory of high-quality deep Web databases maintained by Gary Price and Chris Sherman
ProFusion.com
  Meta-engine that also offers searches of multiple "vertical search sources" on the deep Web, organized into topical categories
lii.org
  Librarians' Index to the Internet (LII) is a searchable, annotated subject directory of more than 14,000 Internet resources selected and evaluated by librarians for their usefulness to users of public libraries. LII is used by both librarians and the general public as a reliable and efficient guide to Internet resources.

14 Search Strategy
Can you think of distinctive, unique words or phrases for your search?
  Use a search engine, with the phrase in quotes
Is your focus broad or narrow?
  Directories do well on either
  Search engines don't do well on overviews
If hits are few, could synonyms help?
Do you not know enough about a topic to even begin your search?
  Drill down in a subject directory

