Basic Web Applications 2. Search Engine


1 Basic Web Applications 2

2 Search Engine
Why do we need search engines?
– Because there are hundreds of millions of pages available on the web.
– Most of them are titled however their author saw fit.
– Almost all of them sit on servers with obscure names.
– We use search engines to get information from those pages.

3 What Is an Internet Search Engine?
Special sites on the Web that are designed to help people find information stored on other sites.
Different search engines work in different ways, but they all perform three basic tasks:
– Search selected pieces of the Internet, based on important words.
– Keep an index of the words they find, and where they find them.
– Allow users to look for words or combinations of words found in that index.

4 Search Engine
1- Search engines use software called spiders, which comb the Internet looking for documents and their web addresses.
2- The spiders spread out across the most widely used portions of the Web. This process is called Web crawling.
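The crawling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TOY_WEB` is a made-up, in-memory stand-in for the Web (no network access), and the function name `crawl` and its `max_pages` limit are hypothetical, not taken from any real search engine.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://a.example/", "http://d.example/"],
    "http://c.example/": ["http://d.example/"],
    "http://d.example/": [],
}


def crawl(seed_urls, web, max_pages=100):
    """Visit pages breadth-first, following links, never visiting a URL twice."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)              # a real spider would fetch the page here
        for link in web.get(url, []):    # links discovered on the page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


order = crawl(["http://a.example/"], TOY_WEB)
print(order)
```

Starting from one seed address, the spider reaches every page that is linked, directly or indirectly, from that seed.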

5 Search Engine
The documents and web addresses are collected and sent to the search engine's indexing software.

6 Search Engine
The indexing software extracts information from the documents (every word or title) and stores it in a database.

7 When you perform a search by entering keywords, the database is searched for documents that match.
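A minimal sketch of indexing and keyword search, assuming a tiny made-up document set (`DOCS`) and hypothetical helper names (`tokenize`, `build_index`, `search`). The "database" here is just a Python dictionary that maps each word to the set of addresses where it was found.

```python
import re
from collections import defaultdict

# Hypothetical documents collected by the spiders: address -> page text.
DOCS = {
    "http://a.example/": "Search engines use spiders",
    "http://b.example/": "Spiders crawl the web",
    "http://c.example/": "The index maps words to pages",
}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every word to the set of addresses whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the addresses that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(DOCS)
print(search(index, "spiders web"))
```

Because the lookup goes through the index, a search never has to re-read the documents themselves.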

8 Search Engine

9 In Google:
– Multiple spiders run at one time.
– Each spider keeps about 300 connections to Web pages open at a time.
– The system crawls over 100 pages per second, around 600 kilobytes of data each second.
– To minimize delays, it uses its own DNS.

10 Search Engine
The Google spider takes note of two things:
– The words within the page
– Where the words were found
Other factors taken into account include:
– The frequency and location of keywords within the Web page
– How long the Web page has existed
– The number of other Web pages that link to the page in question
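The last factor, the number of other pages that link to a page, can be counted directly from a link graph. The sketch below assumes a made-up graph (`LINKS`) and a hypothetical function name; it only counts links and is not Google's actual ranking formula.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c", "d"],   # "d" links to itself, which is not counted
}


def inbound_link_counts(links):
    """Count, for each page, how many OTHER pages link to it."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):      # count each linking page once
            if target != source:         # ignore self-links
                counts[target] += 1
    return counts


counts = inbound_link_counts(LINKS)
print(counts.most_common())
```

In this toy graph, page "c" has the most inbound links, so it would score highest on this one factor.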

11 Search Engine
Lycos keeps track of:
– the words in the title, subheadings, and links
– the 100 most frequently used words on the page
– each word in the first 20 lines of text.
Each commercial search engine uses a different formula for assigning weight to the words in its index.
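One way to picture "assigning weight to the words" is to let a word count for more when it appears in the title or a subheading than in the body. The weights in `FIELD_WEIGHTS` below are invented for illustration; real engines keep their formulas to themselves.

```python
import re
from collections import Counter

# Assumed weights, chosen only for this example.
FIELD_WEIGHTS = {"title": 3.0, "subheading": 2.0, "body": 1.0}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_weights(page):
    """Sum a per-field weight for every occurrence of every word on the page."""
    weights = Counter()
    for field, text in page.items():
        field_weight = FIELD_WEIGHTS.get(field, 1.0)
        for word in tokenize(text):
            weights[word] += field_weight
    return weights


# Hypothetical page, already split into fields.
page = {
    "title": "Search Engines",
    "subheading": "How spiders work",
    "body": "Spiders crawl pages so search engines can index them",
}

weights = word_weights(page)
print(weights.most_common(3))
```

A different choice of weights gives a different ordering of the same words, which is one reason two engines can rank the same page differently.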

12 Meta Tags
Meta tags specify the key words and concepts under which the page will be indexed.
Meta tags can guide the search engine.
Of course, a careless page owner might add irrelevant meta tags.

13 Meta Tags
To protect against this:
– Spiders correlate meta tags with page content, rejecting the meta tags that do not match.
The owner of a page may or may not want the page to be included in a search engine's results:
– The robot exclusion protocol was developed for this. Implemented in the meta-tag section at the beginning of a Web page, it tells a spider to leave the page alone.
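Both checks can be sketched with Python's standard `html.parser` module. The class and function names (`MetaTagParser`, `inspect_page`) and the two sample pages are hypothetical. The sketch assumes single-word keywords separated by commas, and treats a `robots` meta tag containing `noindex` as the request to leave the page alone.

```python
import re
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta name=... content=...> tags and the text of a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name:
                self.meta[name] = attributes.get("content") or ""

    def handle_data(self, data):
        self.text_parts.append(data)


def inspect_page(html):
    """Return None if the page opts out, else (accepted, rejected) keywords."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    if "noindex" in parser.meta.get("robots", "").lower():
        return None                      # the owner asked spiders to stay away
    page_words = set(re.findall(r"[a-z0-9]+", " ".join(parser.text_parts).lower()))
    raw_keywords = parser.meta.get("keywords", "").split(",")
    keywords = [k.strip().lower() for k in raw_keywords if k.strip()]
    accepted = [k for k in keywords if k in page_words]
    rejected = [k for k in keywords if k not in page_words]
    return accepted, rejected


PUBLIC_PAGE = (
    '<html><head><title>How search engines work</title>'
    '<meta name="keywords" content="spiders, index, lottery"></head>'
    '<body><p>Spiders build an index of the web.</p></body></html>'
)
PRIVATE_PAGE = (
    '<html><head><meta name="robots" content="noindex"></head>'
    '<body><p>Not for search engines.</p></body></html>'
)

print(inspect_page(PUBLIC_PAGE))
print(inspect_page(PRIVATE_PAGE))
```

The keyword "lottery" is rejected because it never appears in the page text, and the second page is skipped entirely.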

14 Building the Index
Once the spiders finish finding information on Web pages, the search engine must store the information in a useful way. Two things matter:
– The information stored with the data (for simplicity, word + URL)
– The method by which the information is indexed

15 Building the Index
Different search engines
– will produce different lists
– with pages presented in different orders.

16 Building the Index
The indexing process allows information to be found as quickly as possible.
One way to build an index is to build a hash table.
In hashing, a formula is applied to attach a numerical value to each word.

17 Building the Index
In English, the "M" section of the dictionary is much thicker than the "X" section, so finding a word beginning with a very "popular" letter takes more time.
Hashing evens out the difference, and reduces the average time it takes to find an entry.
It also separates the index from the actual entry.

18 Building the Index
The hash table contains the hashed numbers, which point to the actual data, which is sorted in an efficient way.
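A small hash table makes the idea concrete. Everything here is an assumption made for illustration: the formula in `hash_word` (a simple polynomial hash), the class name `HashIndex`, the bucket count, and the sample addresses. The table holds only hashed numbers and positions, while the actual entries live in a separate list.

```python
def hash_word(word, num_buckets):
    """Apply a formula that turns a word into a number from 0 to num_buckets - 1."""
    value = 0
    for ch in word:
        value = (value * 31 + ord(ch)) % num_buckets
    return value


class HashIndex:
    def __init__(self, num_buckets=64):
        self.num_buckets = num_buckets
        # Table: hashed number -> list of (word, position in self.entries).
        self.buckets = [[] for _ in range(num_buckets)]
        # Actual data, kept apart from the table: one list of URLs per word.
        self.entries = []

    def add(self, word, url):
        bucket = self.buckets[hash_word(word, self.num_buckets)]
        for stored_word, position in bucket:
            if stored_word == word:
                self.entries[position].append(url)
                return
        self.entries.append([url])
        bucket.append((word, len(self.entries) - 1))

    def lookup(self, word):
        bucket = self.buckets[hash_word(word, self.num_buckets)]
        for stored_word, position in bucket:
            if stored_word == word:
                return self.entries[position]
        return []


idx = HashIndex()
idx.add("mango", "http://a.example/")
idx.add("xylophone", "http://b.example/")
idx.add("mango", "http://c.example/")
print(idx.lookup("mango"))
```

A lookup for an "M" word and an "X" word costs about the same: compute the number, jump to that bucket, and follow the pointer to the entry.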

19 Building a Search
Searching through an index involves a user building a query and submitting it through the search engine.
Boolean operators:
– AND - All the terms joined by "AND" must appear in the pages or documents. Some search engines substitute the operator "+" for the word AND.
– OR - At least one of the terms joined by "OR" must appear in the pages or documents.

20 Building a Search
– NOT - The term or terms following "NOT" must not appear in the pages or documents. Some search engines substitute the operator "-" for the word NOT.
– FOLLOWED BY - One of the terms must be directly followed by the other.
– NEAR - One of the terms must be within a specified number of words of the other.
– Quotation Marks - The words between the quotation marks are treated as a phrase, and that phrase must be found within the document or file.
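The operators can be sketched over an index that also records where each word occurs. The three documents in `DOCS` and all function names are made up for this example; a real engine would parse a query string rather than expose one function per operator.

```python
import re
from collections import defaultdict

# Hypothetical documents: id -> text.
DOCS = {
    1: "web search engines index the web",
    2: "spiders crawl the web",
    3: "search the index with boolean operators",
}


def words_of(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs):
    """Map word -> document id -> list of word positions in that document."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(words_of(text)):
            index[word][doc_id].append(position)
    return index


INDEX = build_positional_index(DOCS)
ALL_DOCS = set(DOCS)


def docs_with(word):
    return set(INDEX[word]) if word in INDEX else set()


def op_and(a, b):
    return docs_with(a) & docs_with(b)


def op_or(a, b):
    return docs_with(a) | docs_with(b)


def op_not(a):
    return ALL_DOCS - docs_with(a)


def followed_by(a, b):
    """Documents where b appears directly after a."""
    return {d for d in op_and(a, b)
            if any(p + 1 in INDEX[b][d] for p in INDEX[a][d])}


def near(a, b, distance):
    """Documents where a and b are within `distance` words, in either order."""
    return {d for d in op_and(a, b)
            if any(abs(pa - pb) <= distance
                   for pa in INDEX[a][d] for pb in INDEX[b][d])}


def phrase(text):
    """Documents containing the words of `text` consecutively, in order."""
    words = words_of(text)
    if not words:
        return set()
    candidates = set.intersection(*(docs_with(w) for w in words))
    return {d for d in candidates
            if any(all(start + i in INDEX[w][d] for i, w in enumerate(words))
                   for start in INDEX[words[0]][d])}


print(op_and("search", "index"), op_not("web"), phrase("the web"))
```

AND, OR, and NOT only need to know which documents contain a word. FOLLOWED BY, NEAR, and quoted phrases also need the word positions, which is why the index stores them.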

21 Overall view

