Hotbot A Search Engine Case Study

Introduction
- Owned by Terra/Lycos.
- One of the largest web search engines.
- Uses the Inktomi database combined with Direct Hit and the DMOZ Open Directory.
- The basic search screen is simple, but the advanced search allows for a full range of search features.

Databases
- Open Directory
- Direct Hit
- Inktomi

Direct Hit results display if the option for 10 results at a time is selected and there are 10 results available from Direct Hit. If an option for more than 10 results at a time is selected, the Direct Hit results are available via a link. Other content comes from various advertisers, the Lycos Network, and GoTo. The GoTo and other advertiser results may show up above and/or below the other results but are under a separate heading such as "feature listings."
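
The display rule for Direct Hit results can be written as a small decision function. This is only a sketch: the function name and return labels are hypothetical, and the case of a 10-result page with fewer than 10 Direct Hit results is not covered by the slide, so the "none" branch is an assumption.

```python
def direct_hit_display(page_size, direct_hit_count):
    """Return how Direct Hit results would be shown (hypothetical labels)."""
    # Rule from the slide: 10 results per page and 10 Direct Hit results available.
    if page_size == 10 and direct_hit_count >= 10:
        return "inline"
    # Rule from the slide: more than 10 results per page means a link is offered.
    if page_size > 10:
        return "link"
    # Assumption: the slide does not say what happens in any other case.
    return "none"


print(direct_hit_display(10, 10))  # inline
print(direct_hit_display(25, 10))  # link
```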

Strengths
- Advanced searching capabilities
- Page depth limit
- Advanced search help
- Truncation

Weaknesses
- Link searches must be exact
- Database size shrank for a while
- Advanced features have not always worked right

Features
- Default operation: processed as an AND
- Full Boolean searching: AND, OR, and NOT
- Proximity searching
- Truncation with the * symbol
- Case sensitive
- Extensive, dynamic stop word list
- Word stemming: search for grammatical word variants including plural, singular, and tense
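
To make the default AND behaviour and * truncation concrete, here is a minimal sketch over a toy in-memory collection. It is not HotBot's implementation; the function names and sample documents are invented for illustration, and matching is kept case sensitive to mirror the feature list above.

```python
def term_matches(term, words):
    """True if a query term matches any word; a trailing * means prefix match."""
    if term.endswith("*"):
        prefix = term[:-1]
        return any(word.startswith(prefix) for word in words)
    return term in words


def search_and(query, documents):
    """Return ids of documents containing every query term (implicit AND)."""
    terms = query.split()
    hits = []
    for doc_id, text in documents.items():
        words = set(text.split())  # no lowercasing: matching stays case sensitive
        if all(term_matches(term, words) for term in terms):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical sample data.
documents = {
    "a": "castle tours in Germany",
    "b": "castles and kings",
    "c": "Germany travel",
}
print(search_and("castle* Germany", documents))
```

With this sample data, "castle*" matches both "castle" and "castles", while adding "Germany" narrows the result to the one document containing both terms.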

Field Searches
- Field searching: searching title words and links to a specific URL
- acrobat/applet/activex/audio/embed/flash/form/frame/image/script/shockwave/table/video/vrml

Limits
- linkdomain: limits to pages containing links to the specified domain
- outgoingurlext: limits to pages containing embedded files with the specified extension
- scriptlanguage: limits to pages containing only javascript or vbscript
- after: [day]/[month]/[year]
- before: [day]/[month]/[year]
- within:[number/unit]
- Language limit
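
The after: and before: limits take dates in day/month/year order. The sketch below shows one way such limits could be applied to a page's date. It assumes four-digit years and treats both bounds as exclusive; neither detail is stated on the slide, and the function names are hypothetical.

```python
from datetime import date, datetime


def parse_limit_date(text):
    """Parse a [day]/[month]/[year] value, assuming a four-digit year."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def within_limits(page_date, after=None, before=None):
    """True if page_date satisfies the optional after/before limits."""
    # Assumption: both bounds are exclusive.
    if after is not None and page_date <= parse_limit_date(after):
        return False
    if before is not None and page_date >= parse_limit_date(before):
        return False
    return True


print(within_limits(date(2001, 5, 10), after="1/1/2001", before="31/12/2001"))
```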

Unique for Hotbot
- Page Type
  - Default is Any (any pages)
  - Top Page (the root page of a URL)
- Page Depth: limits how far down a subdirectory hierarchy Hotbot searches
- These are useful for finding the primary sites for organizations or information
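
One way to picture the Top Page and Page Depth limits is to count the path segments of a URL. How HotBot actually measured depth is not described here, so the counting rule below (every non-empty path segment, including a file name, adds one level) is an assumption, and the function names and example URLs are hypothetical.

```python
from urllib.parse import urlparse


def page_depth(url):
    """Count non-empty path segments; the root page has depth 0."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def is_top_page(url):
    """A top page is taken here to be the root page of a site."""
    return page_depth(url) == 0


def within_depth(url, max_depth):
    """True if the page sits no deeper than max_depth levels."""
    return page_depth(url) <= max_depth


print(page_depth("http://example.org/a/b/page.html"))  # 3
print(is_top_page("http://example.org/"))              # True
```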

Sorting
- Results are sorted by relevance with groupings by site available at the end of each brief record.
- The display includes the relevance score, title, URL, a brief extract, and date. HotBot displays 10 records at a time, by default.

Architecture: Direct Hit
- Provides the breadth of a conventional search engine, with the relevancy of an index which is edited by humans
- References the searching activity of millions of users
- Adjusts rankings based on the popularity of the retrieved documents
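
The idea of adjusting rankings by popularity can be illustrated with a toy re-ranking function. This is not Direct Hit's algorithm, which is not described here. The scoring formula, the weight parameter, and the sample data are all assumptions chosen only to show how click activity could move a result up the list.

```python
def rerank_by_popularity(results, clicks, weight=0.5):
    """Re-sort (url, relevance) pairs after adding a click-share bonus.

    Hypothetical formula: score = relevance + weight * share of all clicks.
    """
    total = sum(clicks.get(url, 0) for url, _ in results)
    reranked = []
    for url, relevance in results:
        popularity = clicks.get(url, 0) / total if total else 0.0
        reranked.append((url, relevance + weight * popularity))
    # Highest adjusted score first; Python's sort keeps ties in original order.
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked


# Hypothetical data: "b" is less relevant by text match but clicked far more often.
results = [("a", 0.6), ("b", 0.5)]
clicks = {"a": 1, "b": 9}
print([url for url, _ in rerank_by_popularity(results, clicks)])
```

In this sample, the heavily clicked page "b" overtakes "a" even though its base relevance is lower.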

Architecture: Inktomi
- Hosts Web searches for its clients on coupled-cluster, parallel-computing multiple workstations
- On receiving a search query from a user, the client Web server's interface translates the query from HTTP into Inktomi Data Protocol (IDP) and sends it to the Inktomi Master Cluster
- The cluster sends the results in IDP to the client Web server, which translates the information into HTTP and sends it to the user
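
The request flow above (HTTP in, backend protocol to the cluster, backend protocol back, HTTP out) can be sketched as three small functions. The IDP wire format is not described in this presentation, so plain Python dicts stand in for IDP messages. The `q` query parameter, the function names, and the toy index are likewise hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def http_to_backend(url):
    """Client Web server: turn an HTTP query string into a backend message."""
    query = parse_qs(urlparse(url).query)
    return {"terms": query.get("q", [""])[0].split()}  # dict stands in for IDP


def master_cluster(message, index):
    """Stand-in for the cluster: return pages containing every term."""
    terms = message["terms"]
    hits = [url for url, text in index.items()
            if all(term in text.split() for term in terms)]
    return {"hits": sorted(hits)}


def backend_to_http(message):
    """Client Web server: turn the backend reply into an HTTP response."""
    body = "\n".join(message["hits"])
    return "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n" + body


def handle_request(url, index):
    """Full round trip: translate, query the cluster, translate back."""
    return backend_to_http(master_cluster(http_to_backend(url), index))


# Hypothetical toy index.
index = {
    "http://a.example/": "Kykuit Rockefeller home",
    "http://b.example/": "Neuschwanstein castle",
}
print(handle_request("http://search.example/?q=Kykuit+home", index))
```

The point of the sketch is the separation of roles: only the client Web server speaks HTTP to the user, while the cluster sees and produces backend messages only.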

Results
- Query 1: Information on Home of the Rockefellers Kykuit - To test the engines on a very specific bit of Americana - Kykuit, the baronial home of the Rockefellers on the Hudson River in New York.
- Query 2: Information on Neuschwanstein Castle - To test the engines on a fairly well-known tourist attraction in Germany - Neuschwanstein Castle.
- Query 3: Information on Francis Pilkington Madrigals - To test the engines on retrieval of an obscure musical reference - the Elizabethan madrigals of Francis Pilkington.

Query 1: Information on Home of the Rockefellers Kykuit
- Hotbot - 72 Matches
  - FPL:
  - Relevance rating: Page 14: County Histories
- Google - 91 Matches
  - FPL:
  - Relevance: Page 30: A book where Kykuit is mentioned
- UNCA Library - 5 Matches
  - FPL: wncln.appstate.edu/search/...information+on+how+to+use+the+dietary+guidelines&1,1
  - Relevance: Page 1: Information on how to use dietary guidelines

Query 2: Information on Neuschwanstein Castle
- Hotbot - 2,700 Matches
  - FPL:
  - Relevance: Page 10: Castles of the US
- Google - 4,060 Matches
  - FPL:
  - Relevance: Page 33: A page on King Ludwig II - no mention of Neuschwanstein Castle
- UNCA Library - 5 Matches
  - FPL: wncln.appstate.edu/search/…6,0,0,B/frameset&FF=tinformation+on+self+employment+tax&1,1
  - Relevance: Page 1: Information On Self Employment Tax

Query 3: Information on Francis Pilkington Madrigals
- Hotbot - 53 Matches
  - FPL:
  - Relevance: Page 5: A page about the lute - no mention of madrigals
- Google - 33 Matches
  - FPL:
  - Relevance: Page 3: No mention of Pilkington madrigals
- UNCA Library - 5 Matches
  - FPL: wncln.appstate.edu/search/…6,0,0,B/frameset&FF=tinformation+on+the+red+notice+system&1,1
  - Relevance: Page 1: Information On The Red Notice System

Conclusion
- HotBot is an interface to advanced web searches, and it presents a dynamically changing backend. Both the Inktomi and Direct Hit technologies serve, in different ways, to provide a relevant list of results through advanced queries, and both seek to minimize the commercial influence over search results. All of these technologies are subject to technological developments and to changes in the business environment.
- Its weaknesses are that it still doesn't seem to produce the depth and breadth of some other engines, and that its advanced features have not always worked correctly. As this engine's index and searching features continue to grow, these weaknesses should be overcome.