Presentation is loading. Please wait.

Presentation is loading. Please wait.

Introduction to Information Retrieval

Similar presentations


Presentation on theme: "Introduction to Information Retrieval"— Presentation transcript:

1 Introduction to Information Retrieval

2 What is IR? Google Ask.com Yahoo! Google Korea Naver Daum
Sit down before fact as a little child, be prepared to give up every conceived notion, follow humbly wherever and whatever abysses nature leads, or you will learn nothing. -- Thomas Huxley -- Google Query = What is IR? Query = What is information retrieval? Ask.com Yahoo! Google Korea Naver Daum Qry= what is information retrieval? First of 1.07 million Google results Qry= “information retrieval” AJ: Survey of Information Retrieval My research focuses on the domain of Information Retrieval. So I needed to do a lot of reading and Web searching to get up to speed. To help me make sense of it all, I've made up a hypertext survey of the field. I hope it is useful to the general......  In the context of information retrieval (IR), information, in the technical meaning given in Shannon's theory of communication, is not readily measured (Shannon and Weaver[1]). In fact, in many cases one can adequately describe the kind of retrieval by simply substituting 'document' for 'information'. --C. J. van RIJSBERGEN -- An information retrieval system does not inform (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request. – Lancaster - Search Engines

3 IR: Key Questions What are we looking for? How do we find it?
Why is it difficult? Key questions of IR “A prudent question is one-half of wisdom” Francis Bacon Search Engines

4 IR: What are we looking for?
We are Looking for X. Q&A: population of China Known-item Search: “Cather in the Rye” Looking for something like/about X. General/background info: Taliban Collection Development: IR Literature Similar to (known) X: like “Cather in the Rye” WhatyoumacallX: “the rye-boy story” Looking for something Problem Resoultion: how can we fight terrorism? Knowledge Development: what is IR? Looking Need something, but don’t know what what’s it all about? Serendipity: Web surfing looking for X (exact match) -- Q&A: population of China -- Known item search: “Catcher in the Rye” 2. looking something like/about X (approx./criteria match) General/background info: what’s up with Taliban? -- Collection development: IR Literature -- Similar to (known) X: like “Catcher in the Rye” -- WhatyoumacallX: “the rye-boy story” 3. looking for something: (criteria/dynamic match) Problem resolution: how can we fight terrorism? -- Knowledge development: what is IR? 4. looking: anything goes match Question resolution: why did chicken cross the road? -- Serendipity: Web surfing -- Need something, but don’t know what. -- the ultimate questions: what’s the meaning of life? Search Engines

5 IR: How do we find it? Brute force search Organize/structure the data
Easy to build, maintain, and use Searcher does all the work; Hard to get satisfaction Organize/structure the data Intuitive to use Hard to build and maintain Knowledge of builder’s language & organization structure is crucial Use a search tool Easier to build and maintain: Less manipulation of data Sometimes works, sometimes not (Helps to know the language of the data) Ask the experts Easy and satisfying to use (by definition) “Expert” knowledge is transitory, hard to encapsulate Go with the crowd Relatively easy to build and maintain Limited utility: doesn’t work with “unpopular” X Zen-Fusion search. 1. Brute force: e.g. toy box -- Easy to build, maintain, and use -- Searcher does all the work, Hard to get satisfaction. 2. Organize/structuralize: -- Classification vs. Database approach -- Intuitive to use -- Hard to build and maintain -- knowledge of builder’s language & organization structure is crucial. 3. Search tool: token-based search, query-data similarity ranking -- Easier to build and maintain: Less manipulation of data -- Sometimes works, sometimes not (Helps to know the language of the data) 4. Expert system: caveat - must believe in the “expertise” of the expert. -- Easy and satisfying to use (by definition) -- “expert” knowledge is transitory, hard to encapsulate. 5. Go w/ crowd: i.e. Popularity Ranking -- relatively Easy to build and maintain -- Limited utility: doesn’t work with “unpopular” X. -- Collective “wisdom”? tell that to Galileo 6. Zen search -- Works all the time. (by definition) -- Hard to build: transcends logic, rationality. Search Engines

6 Information Seeking Process: Dynamic, Interactive, Iterative
User Intermediary Information What am I looking for? - Identification of info. need What question do I ask? - Query formulation What is the searcher looking for? - Discovery of user’s info. need How should the question be posed? - Query representation Where is the relevant information? - Query-document matching What data to collect? - Collection development What information to index? - Indexing/Representation How to represent it? - Data structure USER: Has information need 1.     Identify the Information Need: - Think (Reflection), Talk (Discussion), Learn about it (Info. Processing) 2.     Communicate the Information Need - Say it in my own words and hope for the best - Express it in the system’s query language 3.     Give Feedback to refine and update #1 & #2 - This isn’t what I am looking for. - This is it. Give me more like it. - This could be it. I’m not sure. - This is what I asked for, but now I would like this. INTERMEDIARY: Knows how to find Information  1.     Discover User’s Information Need - Questions to guide (e.g. expand, focus, clarify), - Dialogues to discover (e.g. motivation, background) - Provide, or suggest potentially useful information 2.     Query the Database for appropriate Information - Formulate a query (Information in the form of question), Translate into system’s language 3.     Process Feedback to refine and update #1 & #2 - You wanted to find out about “…,” right? (NO  redo #1; YES  redo #2) - Reformulate the query to emphasize, de-emphasize, fuzzy-emphasize the “importance” of information contents relative to the user. - Update the query to accommodate the change in Information Need  Search Engines

7 Information Seeking Models
Traditional Model Linear process: Problem identification Identification of information need Query formulation Result evaluation Static information need The goal is to retrieve a perfect match of the information need Berry-picking Model (딸기따기 모델) Interesting information is scattered like berries among bushes. Information seeking is a dynamic, non-linear process, where information need/queries continually shift. Information needs are not satisfied by a single, final retrieved set of documents, but rather by a series of selections and bits of information found along the way. Broader, 2002 Bates, 1989 Search Engines

8 IR Research: Overview Information Access Information Retrieval
- Retrieve information Information Retrieval - Create a searchable index USER: Has information need 1.     Identify the Information Need: - Think (Reflection), Talk (Discussion), Learn about it (Info. Processing) 2.     Communicate the Information Need - Say it in my own words and hope for the best - Express it in the system’s query language 3.     Give Feedback to refine and update #1 & #2 - This isn’t what I am looking for. - This is it. Give me more like it. - This could be it. I’m not sure. - This is what I asked for, but now I would like this. INTERMEDIARY: Knows how to find Information  1.     Discover User’s Information Need - Questions to guide (e.g. expand, focus, clarify), - Dialogues to discover (e.g. motivation, background) - Provide, or suggest potentially useful information 2.     Query the Database for appropriate Information - Formulate a query (Information in the form of question), Translate into system’s language 3.     Process Feedback to refine and update #1 & #2 - You wanted to find out about “…,” right? (NO  redo #1; YES  redo #2) - Reformulate the query to emphasize, de-emphasize, fuzzy-emphasize the “importance” of information contents relative to the user. - Update the query to accommodate the change in Information Need  Data Mining - Discover Knowledge Information Organization: - Add structure & annotation Search Engines

9 IR Research: Information Retrieval
Query Formulation - “What is information retrieval?” Representation - indexing, term weighting D1: information retrieval seminars D2: retrieval models and information retrieval D3: information model Search Results - (ranked) document list Searchable Index Raw Data USER: Has information need 1.     Identify the Information Need: - Think (Reflection), Talk (Discussion), Learn about it (Info. Processing) 2.     Communicate the Information Need - Say it in my own words and hope for the best - Express it in the system’s query language 3.     Give Feedback to refine and update #1 & #2 - This isn’t what I am looking for. - This is it. Give me more like it. - This could be it. I’m not sure. - This is what I asked for, but now I would like this. INTERMEDIARY: Knows how to find Information  1.     Discover User’s Information Need - Questions to guide (e.g. expand, focus, clarify), - Dialogues to discover (e.g. motivation, background) - Provide, or suggest potentially useful information 2.     Query the Database for appropriate Information - Formulate a query (Information in the form of question), Translate into system’s language 3.     Process Feedback to refine and update #1 & #2 - You wanted to find out about “…,” right? (NO  redo #1; YES  redo #2) - Reformulate the query to emphasize, de-emphasize, fuzzy-emphasize the “importance” of information contents relative to the user. - Update the query to accommodate the change in Information Need  Rank docID score 1 D2 3 2 D1 D3 Index Term D1 D2 D3 wd1 (information) 1 wd2 (model) wd3 (retrieval) 2 wd4 (seminar) D1 wd1 wd2 wd3 D2 wd3 wd2 wd1 wd3 D3 wd1 wd2 Search Engines

10 IR Research: Information Organization
Query Formulation - “What is IR?” Representation - NLP & Machine Learning Search Results - document groups Organized Data Raw Data USER: Has information need 1.     Identify the Information Need: - Think (Reflection), Talk (Discussion), Learn about it (Info. Processing) 2.     Communicate the Information Need - Say it in my own words and hope for the best - Express it in the system’s query language 3.     Give Feedback to refine and update #1 & #2 - This isn’t what I am looking for. - This is it. Give me more like it. - This could be it. I’m not sure. - This is what I asked for, but now I would like this. INTERMEDIARY: Knows how to find Information  1.     Discover User’s Information Need - Questions to guide (e.g. expand, focus, clarify), - Dialogues to discover (e.g. motivation, background) - Provide, or suggest potentially useful information 2.     Query the Database for appropriate Information - Formulate a query (Information in the form of question), Translate into system’s language 3.     Process Feedback to refine and update #1 & #2 - You wanted to find out about “…,” right? (NO  redo #1; YES  redo #2) - Reformulate the query to emphasize, de-emphasize, fuzzy-emphasize the “importance” of information contents relative to the user. - Update the query to accommodate the change in Information Need  Search Engines

11 IR Research: Natural Language Processing
Goal Understanding/effective processing of natural language Not just pattern matching Lexical Analysis using Part-of-Speech (POS) tagging Sentence Parsing Research area, technique, tool for Data Mining, Knowledge Discovery USER: Has information need 1.     Identify the Information Need: - Think (Reflection), Talk (Discussion), Learn about it (Info. Processing) 2.     Communicate the Information Need - Say it in my own words and hope for the best - Express it in the system’s query language 3.     Give Feedback to refine and update #1 & #2 - This isn’t what I am looking for. - This is it. Give me more like it. - This could be it. I’m not sure. - This is what I asked for, but now I would like this. INTERMEDIARY: Knows how to find Information  1.     Discover User’s Information Need - Questions to guide (e.g. expand, focus, clarify), - Dialogues to discover (e.g. motivation, background) - Provide, or suggest potentially useful information 2.     Query the Database for appropriate Information - Formulate a query (Information in the form of question), Translate into system’s language 3.     Process Feedback to refine and update #1 & #2 - You wanted to find out about “…,” right? (NO  redo #1; YES  redo #2) - Reformulate the query to emphasize, de-emphasize, fuzzy-emphasize the “importance” of information contents relative to the user. - Update the query to accommodate the change in Information Need  Search Engines

12 IR Research: Machine Learning
Research Area, technique, tool for Information Organization, Data Mining, Knowledge Discovery Information Organization via Supervised Learning (Automatic Classification) Unsupervised Learning (Clustering) Class 1 Class 2 Class 1 Class 2 Classification USER: Has information need 1.     Identify the Information Need: - Think (Reflection), Talk (Discussion), Learn about it (Info. Processing) 2.     Communicate the Information Need - Say it in my own words and hope for the best - Express it in the system’s query language 3.     Give Feedback to refine and update #1 & #2 - This isn’t what I am looking for. - This is it. Give me more like it. - This could be it. I’m not sure. - This is what I asked for, but now I would like this. INTERMEDIARY: Knows how to find Information  1.     Discover User’s Information Need - Questions to guide (e.g. expand, focus, clarify), - Dialogues to discover (e.g. motivation, background) - Provide, or suggest potentially useful information 2.     Query the Database for appropriate Information - Formulate a query (Information in the form of question), Translate into system’s language 3.     Process Feedback to refine and update #1 & #2 - You wanted to find out about “…,” right? (NO  redo #1; YES  redo #2) - Reformulate the query to emphasize, de-emphasize, fuzzy-emphasize the “importance” of information contents relative to the user. - Update the query to accommodate the change in Information Need  Clustering Search Engines


Download ppt "Introduction to Information Retrieval"

Similar presentations


Ads by Google