Introduction to Search Engines

Slides:



Advertisements
Similar presentations
Chapter 5: Introduction to Information Retrieval
Advertisements

INFO624 - Week 2 Models of Information Retrieval Dr. Xia Lin Associate Professor College of Information Science and Technology Drexel University.
Introduction to Information Retrieval
Multimedia Database Systems
Query Languages. Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
UCLA : GSE&IS : Department of Information StudiesJF : 276lec1.ppt : 5/2/2015 : 1 I N F S I N F O R M A T I O N R E T R I E V A L S Y S T E M S Week.
IS530 Lesson 12 Boolean vs. Statistical Retrieval Systems.
Computer Information Technology – Section 3-2. The Internet Objectives: The Student will: 1. Understand Search Engines and how they work 2. Understand.
Search Engines. 2 What Are They?  Four Components  A database of references to webpages  An indexing robot that crawls the WWW  An interface  Enables.
Information Retrieval in Practice
Search Engines and Information Retrieval
Best Web Directories and Search Engines Order Out of Chaos on the World Wide Web.
Parametric search and zone weighting Lecture 6. Recap of lecture 4 Query expansion Index construction.
Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
Evaluating the Performance of IR Sytems
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
1 BrainWave Biosolutions Limited Accelerating Life Science Research through Technology.
1 CS 430: Information Discovery Lecture 2 Introduction to Text Based Information Retrieval.
WMES3103 : INFORMATION RETRIEVAL INDEXING AND SEARCHING.
Chapter 5: Information Retrieval and Web Search
Overview of Search Engines
Search Engines and Information Retrieval Chapter 1.
©2008 Srikanth Kallurkar, Quantum Leap Innovations, Inc. All rights reserved. Apollo – Automated Content Management System Srikanth Kallurkar Quantum Leap.
Chapter 2 Architecture of a Search Engine. Search Engine Architecture n A software architecture consists of software components, the interfaces provided.
Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009.
Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.
1 Information Retrieval Acknowledgements: Dr Mounia Lalmas (QMW) Dr Joemon Jose (Glasgow)
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
IL Step 3: Using Bibliographic Databases Information Literacy 1.
Chapter 6: Information Retrieval and Web Search
Search Engines. Search Strategies Define the search topic(s) and break it down into its component parts What terms, words or phrases do you use to describe.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Information Retrieval Model Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
Search Engine Architecture
IR Theory: Relevance Feedback. Relevance Feedback: Example  Initial Results Search Engine2.
Information in the Digital Environment Information Seeking Models Dr. Dania Bilal IS 530 Spring 2005.
How Do We Find Information?. Key Questions  What are we looking for?  How do we find it?  Why is it difficult? “A prudent question is one-half of wisdom”
Information Resources Libraries & the Open Web Frederic Murray Assistant Professor MLIS, University of British Columbia BA, Political Science, University.
Information Retrieval
Information Retrieval Transfer Cycle Dania Bilal IS 530 Fall 2007.
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
Chapter 20 Asking Questions, Finding Sources. Characteristics of a Good Research Paper Poses an interesting question and significant problem Responds.
IR Theory: Web Information Retrieval. Web IRFusion IR Search Engine 2.
Text Similarity: an Alternative Way to Search MEDLINE James Lewis, Stephan Ossowski, Justin Hicks, Mounir Errami and Harold R. Garner Translational Research.
Introduction to Information Retrieval. What is IR? Sit down before fact as a little child, be prepared to give up every conceived notion, follow humbly.
INFORMATION RETRIEVAL Pabitra Mitra Computer Science and Engineering IIT Kharagpur
Information Retrieval in Practice
Searching for Information
Information Storage and Retrieval Fall Lecture 1: Introduction and History.
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
Search Engine Architecture
Summon discovers contents from one search box!
Search Engine Architecture
CS 430: Information Discovery
Prepared by Rao Umar Anwar For Detail information Visit my blog:
Boolean Retrieval Term Vocabulary and Posting Lists Web Search Basics
Search Techniques and Advanced tools for Researchers
Thanks to Bill Arms, Marti Hearst
Information Retrieval
Searching EIT, Author Gay Robertson, 2017.
Data Mining Chapter 6 Search Engines
IL Step 3: Using Bibliographic Databases
Introduction to Information Retrieval
Introduction to Information Retrieval
Chapter 5: Information Retrieval and Web Search
Search Engine Architecture
Cell Biology and Genetics
Information Retrieval and Web Design
Information Retrieval and Web Design
Introduction to Search Engines
Presentation transcript:

Introduction to Search Engines

Search Engine Overview Query (질의) 1 Searchable Index (색인) Search Results 2 3 Search Data (0) (1) Query Indexing (2) Document Ranking (3) Result Display 1. Document Collection - e.g., spider/crawler 2. Document Indexing - term indexing (tokenizing, stop & stem) - term weighting USER: Has information need 1.     Identify the Information Need: - Think (Reflection), Talk (Discussion), Learn about it (Info. Processing) 2.     Communicate the Information Need - Say it in my own words and hope for the best - Express it in the system’s query language 3.     Give Feedback to refine and update #1 & #2 - This isn’t what I am looking for. - This is it. Give me more like it. - This could be it. I’m not sure. - This is what I asked for, but now I would like this. INTERMEDIARY: Knows how to find Information  1.     Discover User’s Information Need - Questions to guide (e.g. expand, focus, clarify), - Dialogues to discover (e.g. motivation, background) - Provide, or suggest potentially useful information 2.     Query the Database for appropriate Information - Formulate a query (Information in the form of question), Translate into system’s language 3.     Process Feedback to refine and update #1 & #2 - You wanted to find out about “…,” right? (NO  redo #1; YES  redo #2) - Reformulate the query to emphasize, de-emphasize, fuzzy-emphasize the “importance” of information contents relative to the user. - Update the query to accommodate the change in Information Need  User Intermediary Information What am I looking for? - Identification of info. need What question do I ask? - Query formulation What is the searcher looking for? - Discovery of user’s info. need How should the question be posed? - Query representation Where is the relevant information? - Query-document matching What data to collect? - Collection development What information to index? - Indexing/Representation How to represent it? - Data structure Search Engines

Search Engine: Data Document Collection Document Indexing Select target data sources – e.g., domain, corpus, WWW Harvest data – e.g., data entry, data import, spider/crawler Document Indexing Select indexing sources (색인어) – e.g., metadata, keywords, content Extract indexing terms – e.g., tokenization, stop & stem Assign term weights – e.g., tf-idf, okapi “The frequency of word occurrence in an article furnishes a useful measurement of word significance.” 문헌에 출현한 던어들은 문헌의 내용 분석을 위해 사용될 수 있으며, 단어의 출현빈도가 이 단어의 주제어로서의 중요성을 측정하는 기준이 된다 . Luhn, H.P. (1958). The automatic creation of literature abstracts. IBM Journal of Research and Development, 2, 159-165. Search Engines

Search Engine: Indexing Process Documents (Text) INVERTED INDEX Term Weighting Tokenization Tokens Tokens SEQUENTIAL INDEX Tokens Token Selection Tokens Tokens Tokens Tokens Token Normalization Select Tokens D1 D2 D3 wd1 (information) 1 wd2 (model) wd3 (retrieval) 2 wd4 (seminar) D1 information 1, retrieval 1, seminar 1 D2 information 1, model 1, retrieval 2 D3 information 1, model 1 D1: Information retrieval seminars D2: Retrieval Models and Information Retrieval D3: Information Model D1: information, retrieval, seminar(s) D2: retrieval, model(s), and, information, retrieval D3: information, model Search Engines

Search Engine: Search Query Indexing Document Ranking Result Display Tokenization Stop & Stem Term Weighting Document Ranking Query-Document matching Document Score computation Result Display Content - e.g., title & snippets Layout - e.g., grouped by category Toppings - e.g., related searches Query: What is information retrieval? Q: Information 1, retrieval 1 Index Term D1 D2 D3 wd1 (information) 1 wd2 (model) wd3 (retrieval) 2 wd4 (seminar) Rank docID score 1 D2 3 2 D1 D3 Search Engines

2015 8 1 9 2 10 11 3 4 12 5 13 6 14 7 Search Engines

Result Categories 2015 15 16 17 Proprietary (Naver-specific) content Encyclopedia Naver Books Q&A DB (지식iN) Magazine Café Blog Book Map Website Advertisement (파워링크) Image Webpage Naver News Library Video Naver AppStore Naver Scholar Naver Post Naver Shopping News Naver Dictionary 15 16 17 Proprietary (Naver-specific) content Dynamic category order Toppings Search by Category Related Searches Popular Searches (by category) 18 Query: 정보검색 (Information Retrieval) Query: 검색엔진 (Search Engine) 19 20 Search Engines

Result Categories 2015 1 Webpage-centric content Advertisement 1 Webpage-centric content Dynamic category order Toppings Search by Category Related Searches 2 Query: Information Retrieval Query: Search Engine Search Engines

Search Engine vs. Database vs. Directories Corpus Type General Specific General/Specific Data Collection Automatic - crawler/spider Manual - data entry/import - classification Data Quality Not controlled Controlled Data Organization None (bag-of-words) Structured - Relational - Hierarchical Query Input Text box Field-specific - Boolean Search Result Ranked - documents Not ranked - records - categories Search Index Document text Database Tables Category Tree e.g. Google Library Search dmoz.org USER: Has information need 1.     Identify the Information Need: - Think (Reflection), Talk (Discussion), Learn about it (Info. Processing) 2.     Communicate the Information Need - Say it in my own words and hope for the best - Express it in the system’s query language 3.     Give Feedback to refine and update #1 & #2 - This isn’t what I am looking for. - This is it. Give me more like it. - This could be it. I’m not sure. - This is what I asked for, but now I would like this. INTERMEDIARY: Knows how to find Information  1.     Discover User’s Information Need - Questions to guide (e.g. expand, focus, clarify), - Dialogues to discover (e.g. motivation, background) - Provide, or suggest potentially useful information 2.     Query the Database for appropriate Information - Formulate a query (Information in the form of question), Translate into system’s language 3.     Process Feedback to refine and update #1 & #2 - You wanted to find out about “…,” right? (NO  redo #1; YES  redo #2) - Reformulate the query to emphasize, de-emphasize, fuzzy-emphasize the “importance” of information contents relative to the user. - Update the query to accommodate the change in Information Need  Search Engines