Introduction to Information Retrieval

What is IR?

"Sit down before fact as a little child, be prepared to give up every preconceived notion, follow humbly wherever and to whatever abysses nature leads, or you will learn nothing." -- Thomas Huxley

Try the question on the major search engines (Google, Ask.com, Yahoo!, Google Korea, Naver, Daum) with queries such as "What is IR?" and "What is information retrieval?". For the query "information retrieval", the first of roughly 1.07 million Google results is "AJ: Survey of Information Retrieval": "My research focuses on the domain of Information Retrieval. So I needed to do a lot of reading and Web searching to get up to speed. To help me make sense of it all, I've made up a hypertext survey of the field. I hope it is useful to the general......"

"In the context of information retrieval (IR), information, in the technical meaning given in Shannon's theory of communication, is not readily measured (Shannon and Weaver [1]). In fact, in many cases one can adequately describe the kind of retrieval by simply substituting 'document' for 'information'." -- C. J. van Rijsbergen

"An information retrieval system does not inform (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request." -- Lancaster

IR: Key Questions

- What are we looking for?
- How do we find it?
- Why is it difficult?

"A prudent question is one-half of wisdom." -- Francis Bacon

IR: What are we looking for?

1. Looking for X (exact match)
   - Q&A: population of China
   - Known-item search: "Catcher in the Rye"
2. Looking for something like/about X (approximate/criteria match)
   - General/background information: what's up with the Taliban?
   - Collection development: IR literature
   - Similar to (known) X: like "Catcher in the Rye"
   - WhatyoumacallX: "the rye-boy story"
3. Looking for something (criteria/dynamic match)
   - Problem resolution: how can we fight terrorism?
   - Knowledge development: what is IR?
4. Looking (anything-goes match)
   - Question resolution: why did the chicken cross the road?
   - Serendipity: Web surfing
   - Need something, but don't know what
   - The ultimate questions: what's it all about? what's the meaning of life?

IR: How do we find it?

1. Brute-force search (e.g., rummaging through a toy box)
   - Easy to build, maintain, and use
   - Searcher does all the work; hard to get satisfaction
2. Organize/structure the data (classification vs. database approach)
   - Intuitive to use
   - Hard to build and maintain
   - Knowledge of the builder's language and organization structure is crucial
3. Use a search tool (token-based search, query-data similarity ranking; see the sketch after this list)
   - Easier to build and maintain: less manipulation of data
   - Sometimes works, sometimes not (it helps to know the language of the data)
4. Ask the experts (expert system; caveat: you must believe in the "expertise" of the expert)
   - Easy and satisfying to use (by definition)
   - "Expert" knowledge is transitory and hard to encapsulate
5. Go with the crowd (popularity ranking)
   - Relatively easy to build and maintain
   - Limited utility: doesn't work with "unpopular" X
   - Collective "wisdom"? Tell that to Galileo.
6. Zen search
   - Works all the time (by definition)
   - Hard to build: transcends logic and rationality
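To make the contrast between a brute-force scan (approach 1) and a token-based search tool (approach 3) concrete, here is a minimal sketch in Python. It reuses the three example documents that appear on the Information Retrieval slide below; the scoring scheme (count of matching query tokens, ties broken by document ID) is an illustrative assumption, not something the slides prescribe.

    # Contrast of approach 1 (brute force) and approach 3 (token-based search tool).
    docs = {
        "D1": "information retrieval seminars",
        "D2": "retrieval models and information retrieval",
        "D3": "information model",
    }

    def brute_force(query, docs):
        """Approach 1: scan every document for the literal query string."""
        return [doc_id for doc_id, text in docs.items() if query in text]

    def build_index(docs):
        """Approach 3, step 1: map each token to the set of documents containing it."""
        index = {}
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index.setdefault(term, set()).add(doc_id)
        return index

    def search(query, index):
        """Approach 3, step 2: rank documents by how many query tokens they contain."""
        scores = {}
        for term in query.lower().split():
            for doc_id in index.get(term, set()):
                scores[doc_id] = scores.get(doc_id, 0) + 1
        return sorted(scores, key=lambda d: (-scores[d], d))

    print(brute_force("information retrieval", docs))          # ['D1', 'D2']
    print(search("information retrieval", build_index(docs)))  # ['D1', 'D2', 'D3']

The search tool also finds D3 because it matches individual tokens rather than the exact phrase, which illustrates why it "sometimes works, sometimes not".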

Information Seeking Process: Dynamic, Interactive, Iterative

User:
- What am I looking for? (identification of information need)
- What question do I ask? (query formulation)

Intermediary:
- What is the searcher looking for? (discovery of the user's information need)
- How should the question be posed? (query representation)
- Where is the relevant information? (query-document matching)

Information:
- What data to collect? (collection development)
- What information to index? (indexing/representation)
- How to represent it? (data structure)

USER: has an information need
1. Identify the information need
   - Think (reflection), talk (discussion), learn about it (information processing)
2. Communicate the information need
   - Say it in my own words and hope for the best
   - Express it in the system's query language
3. Give feedback to refine and update #1 and #2
   - "This isn't what I am looking for."
   - "This is it. Give me more like it."
   - "This could be it. I'm not sure."
   - "This is what I asked for, but now I would like this."

INTERMEDIARY: knows how to find information
1. Discover the user's information need
   - Questions to guide (e.g., expand, focus, clarify)
   - Dialogues to discover (e.g., motivation, background)
   - Provide or suggest potentially useful information
2. Query the database for appropriate information
   - Formulate a query (information in the form of a question), translate it into the system's language
3. Process feedback to refine and update #1 and #2
   - "You wanted to find out about '…,' right?" (No: redo #1; Yes: redo #2)
   - Reformulate the query to emphasize, de-emphasize, or fuzzy-emphasize the "importance" of information contents relative to the user
   - Update the query to accommodate the change in information need

Information Seeking Models

Traditional model (Broder, 2002)
- Linear process: problem identification, identification of information need, query formulation, result evaluation
- Static information need
- The goal is to retrieve a perfect match for the information need

Berry-picking model (Bates, 1989)
- Interesting information is scattered like berries among bushes.
- Information seeking is a dynamic, non-linear process in which the information need, and therefore the queries, continually shift.
- Information needs are not satisfied by a single, final retrieved set of documents, but by a series of selections and bits of information found along the way.

IR Research: Overview

Information access research spans several related areas:
- Information Retrieval: retrieve information; create a searchable index
- Information Organization: add structure and annotation
- Data Mining: discover knowledge

IR Research: Information Retrieval

The retrieval pipeline turns raw data into a searchable index, turns the user's question into a query, and matches the two to produce search results.

- Query formulation: "What is information retrieval?"
- Representation: indexing, term weighting
- Searchable index built from the raw data
- Search results: a (ranked) document list

Example collection:
  D1: information retrieval seminars
  D2: retrieval models and information retrieval
  D3: information model

Term-document index (term frequencies):

  Index term          D1  D2  D3
  wd1 (information)    1   1   1
  wd2 (model)          0   1   1
  wd3 (retrieval)      1   2   0
  wd4 (seminar)        1   0   0

Ranked results for the query "information retrieval":

  Rank  docID  Score
  1     D2     3
  2     D1     2
  3     D3     1
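Here is a minimal sketch of this pipeline in Python, using the three example documents above. It weights terms by raw frequency and scores a document as the sum of its weights for the query terms, which reproduces the ranking in the table; the slides do not specify an exact weighting scheme, so treat this as an illustrative assumption.

    from collections import Counter

    # Example collection from the slide.
    docs = {
        "D1": "information retrieval seminars",
        "D2": "retrieval models and information retrieval",
        "D3": "information model",
    }

    def tokenize(text):
        """Lowercase, split on whitespace, strip punctuation, and crudely drop a
        trailing 's' so that 'seminars'/'models' match 'seminar'/'model'
        (illustrative stemming only)."""
        tokens = []
        for raw in text.lower().split():
            t = raw.strip("?.,!\"'")
            if t.endswith("s"):
                t = t[:-1]
            if t:
                tokens.append(t)
        return tokens

    # Representation: term-frequency index, term -> {docID: frequency}.
    index = {}
    for doc_id, text in docs.items():
        for term, freq in Counter(tokenize(text)).items():
            index.setdefault(term, {})[doc_id] = freq

    def rank(query):
        """Score each document by the sum of its term frequencies for the query terms."""
        scores = Counter()
        for term in tokenize(query):
            for doc_id, freq in index.get(term, {}).items():
                scores[doc_id] += freq
        return scores.most_common()

    print(rank("What is information retrieval?"))
    # -> [('D2', 3), ('D1', 2), ('D3', 1)]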

IR Research: Information Organization

The organization pipeline parallels retrieval: raw data is turned into organized data, and a query is answered with groups of documents rather than a single ranked list.

- Query formulation: "What is IR?"
- Representation: natural language processing and machine learning
- Organized data built from the raw data
- Search results: document groups

IR Research: Natural Language Processing

Goal: understanding and effective processing of natural language, not just pattern matching.
- Lexical analysis using part-of-speech (POS) tagging
- Sentence parsing

NLP is a research area, technique, and tool for data mining and knowledge discovery.
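As an illustration of the lexical-analysis step, here is a small sketch using the NLTK library (an assumed toolkit choice; the slides do not name one) that tokenizes a sentence and tags each token with its part of speech.

    import nltk

    # One-time resource downloads; exact resource names vary by NLTK version
    # (older releases use "punkt" / "averaged_perceptron_tagger",
    #  newer ones may require "punkt_tab" / "averaged_perceptron_tagger_eng").
    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    sentence = "An information retrieval system merely informs on the whereabouts of documents."

    tokens = nltk.word_tokenize(sentence)   # lexical analysis: split into word tokens
    tagged = nltk.pos_tag(tokens)           # POS tagging: (token, Penn Treebank tag) pairs

    print(tagged)
    # e.g. [('An', 'DT'), ('information', 'NN'), ('retrieval', 'NN'), ('system', 'NN'), ...]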

IR Research: Machine Learning

Machine learning is a research area, technique, and tool for information organization, data mining, and knowledge discovery.

Information organization via machine learning:
- Supervised learning (automatic classification): assign documents to predefined classes (e.g., Class 1 vs. Class 2)
- Unsupervised learning (clustering): discover groups of similar documents without predefined classes
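Here is a minimal sketch of both settings in Python using scikit-learn (an assumed library choice; the slides do not prescribe one): a Naive Bayes classifier trained on labeled toy documents, and k-means clustering of the same documents without labels. The tiny corpus and the class names are invented for illustration.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.cluster import KMeans

    # Toy corpus, invented for illustration.
    train_docs = [
        "ranked retrieval of documents from an index",   # label: IR
        "query formulation and relevance feedback",      # label: IR
        "part of speech tagging of sentences",           # label: NLP
        "parsing natural language sentences",            # label: NLP
    ]
    train_labels = ["IR", "IR", "NLP", "NLP"]

    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_docs)

    # Supervised learning: automatic classification into predefined classes.
    classifier = MultinomialNB()
    classifier.fit(X_train, train_labels)
    print(classifier.predict(vectorizer.transform(["term weighting for document retrieval"])))

    # Unsupervised learning: clustering without predefined classes.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
    print(kmeans.fit_predict(X_train))   # cluster id assigned to each training document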