Search Engines

2 What Are They?  Four Components  A database of references to webpages  An indexing robot that crawls the WWW  An interface  Enables users to submit queries  Displays results  An information retrieval system  Each search engine is unique, but most work in much the same way

3 Database  Where the user's query is matched  Contains only essential parts of pages  Only includes pages that were indexed  Search engines are always somewhat out of date, because pages change after they are indexed
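
To make the "database of references" idea concrete, here is a minimal sketch of an inverted index, the kind of structure a query can be matched against. The page names and texts are made up for illustration, and a real engine stores far more than this.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercased word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

# Hypothetical pages; only pages passed in here are "indexed".
pages = {
    "a.html": "search engines crawl the web",
    "b.html": "the web is big",
}
index = build_index(pages)
print(sorted(index["web"]))
```

A page that was never passed to `build_index` cannot be found, which is the point of "only includes pages that were indexed".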

4 Web Crawler  A robot that follows links  Records data it finds  Words in the webpage  Metadata  ALT attributes in IMG tags  Robot Exclusion Protocol
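
As a rough sketch of how a crawler might follow links while honoring the Robot Exclusion Protocol, the example below uses Python's standard `html.parser` and `urllib.robotparser`. The bot name `ExampleBot`, the `example.com` URLs, and the robots.txt rules are assumptions for illustration; nothing is fetched over the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

class LinkCollector(HTMLParser):
    """Collects absolute URLs from the href of every <a> tag."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

# Hypothetical robots.txt rules, parsed from text instead of downloaded.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

# Hypothetical page content a crawler has already downloaded.
page = '<a href="/about.html">About</a> <a href="/private/notes.html">Notes</a>'
collector = LinkCollector("http://example.com/")
collector.feed(page)

# Keep only the links the site allows robots to visit.
allowed = [u for u in collector.links if robots.can_fetch("ExampleBot", u)]
print(allowed)
```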

5 Search Engine Interfaces  Gathers input from users  Presents results from the IR system  Often in ranked order

6 Search Engine Interfaces  Input  User requirements  Search expression, search limits  Presentation style  Presentation format, search type

7 Search Engine Interfaces  Output  Results  Descriptions  Clusters

8 Example: Visual Clustering Interface

9 Large Example: Clustering Visual Interface  Grokker

10 Search Term Matching  Trying to find a match in the database  Two main methods  Keyword searching  Matching single terms, computing cosine similarity  Concept-based searching  Examining clusters of words  Attempts to determine the meaning of the query and find records related to that meaning
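
A minimal sketch of the keyword-matching idea: treat the query and each document as word-count vectors and compute the cosine of the angle between them. The function name `cosine` and the sample strings are illustrative, and real engines usually weight terms (for example by rarity) rather than using raw counts.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two texts, using raw word counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)

print(cosine("web search", "web crawler"))      # one of two terms shared
print(cosine("web search", "cooking recipes"))  # no terms shared
```

A score near 1 means the two texts use the same words in the same proportions; 0 means they share no words.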

11 Basic IR Features  Boolean operators  AND, OR, NOT, grouping  Extended operators  NEAR, ADJACENT, quoted phrases (" ")  Stop word deletion  Stemming  Searching in fields (e.g. host)
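
The features above can be sketched with Python sets: Boolean AND, OR, and NOT map onto set intersection, union, and difference. The stop-word list, the documents, and the `stem` rule (strip a trailing "s") are simplifying assumptions; real stemmers such as Porter's do much more.

```python
STOP_WORDS = {"the", "a", "an", "of", "is", "and"}

def stem(word):
    """Crude stemming: drop a trailing 's' from words longer than 3 letters."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def terms(text):
    """Lowercase, delete stop words, and stem what remains."""
    return {stem(w) for w in text.lower().split() if w not in STOP_WORDS}

# Hypothetical documents keyed by id.
docs = {
    1: "the engines of search",
    2: "a search engine is a robot",
    3: "robots crawl the web",
}
index = {doc_id: terms(text) for doc_id, text in docs.items()}

def matching(term):
    """Ids of the documents whose term set contains the stemmed term."""
    return {d for d, ts in index.items() if stem(term) in ts}

print(matching("search") & matching("robot"))  # search AND robot
print(matching("search") | matching("web"))    # search OR web
print(matching("robot") - matching("search"))  # robot NOT search
```

Because of stemming, the query term "robot" also finds the document that only says "robots".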

12 Ranked Output  Most SEs produce ranked lists by applying simple rules:  Early words are more important  Title is very important  Frequency of occurrence matters for some  Infrequent words matter more  Modification date  Google is different  PageRank™ method based on popularity  Links as money
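
The "links as money" idea can be illustrated with a simplified PageRank iteration: each page splits its current rank evenly among the pages it links to. This is a textbook-style sketch, not Google's production method; the damping factor of 0.85 is the conventionally quoted value, and the three-page link graph is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to;
    every link target is assumed to also appear as a key."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page "pays" an equal share of its rank along each link.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical graph: A and B both link to C, and C links back to A.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
print(sorted(ranks, key=ranks.get, reverse=True))
```

C collects rank from two pages, A from one, and B from none, so the popularity ordering is C, then A, then B.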

13 Googlebombing  Google spoofed from the lecture list  First hit from 1992  Official GoogleBlog explanation

14 What about the Invisible Web?  Also known as the Deep Web  Documents that are on the WWW but not indexed by Search Engines  Some are available only by submitting forms  Some are not generally accessible (in subnets)  Some are not in (X)HTML format

15 The Invisible Web Isn't So Invisible Anymore…  More search engines parse non-(X)HTML now than before  Because of awareness of the problem, companies are making more content available using  Stable URLs  Robot-friendly sitemaps  But much content is still not indexed

16 But there are still plenty of important yet invisible docs  How to find them?  Many of them are in databases  No one search engine covers everything  Use database tools from the U.'s library  Especially for research articles  Use multiple search engines or a meta-crawler  Dogpile is the most famous

Search Engines: A Summary of Practical Advice

18 How To Succeed With SEs  As a surfer:  If you don't know what you are looking for  Use multiple SEs, or a meta-crawler  Search within results  If you know what you are looking for  Use multiple SEs, or a meta-crawler  Use Boolean expressions or search within results  Consider specialized engines

19 How To Succeed With SEs  As a creator:  HTML level  Always use ALT attributes with IMG, etc.  Avoid frames  Make it easier to index  Don't expect SEs to find your pages  Make links between your pages  Use metadata  Informal:  Formal: Dublin Core and others  Increase your pages' popularity  Don't use systematic reciprocal linking: rings, exchanges, lists  The PageRank™ passed along each link is inversely proportional to the linking page's outdegree
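
As one way to act on the HTML-level advice, the sketch below scans a page for images without ALT text and lists the metadata it declares. The class name `IndexabilityAudit` and the sample markup are illustrative assumptions; the `DC.title` name follows the usual convention for embedding Dublin Core in HTML meta tags. An empty `alt=""` is counted as missing here, although it can be legitimate for purely decorative images.

```python
from html.parser import HTMLParser

class IndexabilityAudit(HTMLParser):
    """Records <img> tags lacking ALT text and all named <meta> tags."""
    def __init__(self):
        super().__init__()
        self.images_missing_alt = []
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and not attrs.get("alt"):
            self.images_missing_alt.append(attrs.get("src"))
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"].lower()] = attrs.get("content", "")

# Hypothetical page fragment.
html = """
<meta name="description" content="Lecture notes on search engines">
<meta name="DC.title" content="Search Engines">
<img src="logo.png" alt="Course logo">
<img src="chart.png">
"""
audit = IndexabilityAudit()
audit.feed(html)
print(audit.images_missing_alt)
print(sorted(audit.meta))
```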

20 How To Succeed With SEs  As a creator (cont.)  For surfers:  Use  Don't expect surfers to start at top of your hierarchy  Don't rely on a hierarchy  Include a context map near the top of each page  Don't use frames  Think through dynamic content implications  Stickiness… is for another day