
jhu-hlt-2004 © n.j. belkin 1 Information Retrieval: A Quick Overview Nicholas J. Belkin

jhu-hlt-2004 © n.j. belkin 2 The IR Situation
A person (the user) recognizes that her/his knowledge is inadequate for resolving some problem / achieving some goal (a problematic situation)
In order to resolve the problematic situation, the user has recourse to some knowledge resource external to her/himself

jhu-hlt-2004 © n.j. belkin 3 The IR Situation (2)
The user engages with the knowledge resource through some intermediary
The three components, user, knowledge resource, intermediary, and their interactions with one another, together constitute the information retrieval system

jhu-hlt-2004 © n.j. belkin 4 IR Systems
The goal of an IR system is that the user’s problematic situation is appropriately resolved
This goal is accomplished by facilitating effective interaction of the user with appropriate information objects (elements of the knowledge resource)

jhu-hlt-2004 © n.j. belkin 5 Relevance
An indicator, or measure, of the appropriateness of an information object to a user’s problematic situation
Topical relevance - The information object is about the same topic as the problematic situation
Situational relevance - The information object is useful in resolving the problematic situation

jhu-hlt-2004 © n.j. belkin 6 What IR Systems Try to Do Predict, on the basis of some information about the user, and information about the knowledge resource, what information objects are likely to be the most appropriate for the user to interact with, at any particular time

jhu-hlt-2004 © n.j. belkin 7 How IR Systems Try to Do This
Represent the user’s information problem (the query)
Represent (surrogate) and organize (classify) the contents of the knowledge resource
Compare query to surrogates (predict relevance)
Present results to the user for interaction/judgment

jhu-hlt-2004 © n.j. belkin 8 How IR Differs from DBMS
No “right” answer
Probabilistic (predictive), not determinative
Unstructured, or only partially structured information (e.g. text, images)

jhu-hlt-2004 © n.j. belkin 9 Why IR is Difficult
People cannot specify what they don’t know (Anomalous State of Knowledge), so representation of the information problem is inherently uncertain
Information objects can be about many things, so representation of aboutness is inherently incomplete

jhu-hlt-2004 © n.j. belkin 10 Why IR is Difficult (2) Relevance is a relation between the person and the information object(s), and is dependent upon user’s interpretation, so prediction of relevance (or appropriateness) is inherently uncertain

jhu-hlt-2004 © n.j. belkin 11 Evaluation of IR Systems
Traditional goal of IR is to retrieve all and only the relevant IOs in response to a query
All is measured by recall: the proportion of relevant IOs in the collection which are retrieved
Only is measured by precision: the proportion of retrieved IOs which are relevant
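As a minimal illustration of these two measures (not part of the original slides), the following Python sketch computes recall and precision for a single query, assuming we have the identifiers of the retrieved IOs and of the IOs judged relevant:

```python
def recall_precision(retrieved, relevant):
    """Recall = fraction of the relevant IOs that were retrieved.
    Precision = fraction of the retrieved IOs that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision

# 3 of the 4 relevant IOs were retrieved, among 6 retrieved in total:
print(recall_precision({"d1", "d2", "d3", "d5", "d8", "d9"}, {"d1", "d2", "d3", "d4"}))
# (0.75, 0.5)
```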

jhu-hlt-2004 © n.j. belkin 12 Other Functions of IR Systems
IR is concerned not only with supporting “specified searching”
People engage in many kinds of interactions with IR systems, e.g. “browsing”, “evaluating”, “comparing”, “extracting”
People have many different IR-related tasks, e.g. question-answering, finding one or a few “good” IOs, constructing a “useful” portal

jhu-hlt-2004 © n.j. belkin 13 Other Evaluation Measures
To evaluate IR support for different tasks, different measures are required
Relevance may not be the only criterion according to which measures are constructed
Support for different kinds of behaviors may require different kinds of measures

jhu-hlt-2004 © n.j. belkin 14 Evaluation of What?
Effectiveness
– recall, precision, accuracy of answer, “satisfaction”
Usability
– learnability, error rates
Performance
– time, cognitive effort

jhu-hlt-2004 © n.j. belkin 15 Evaluation Problems
Realistic IR is interactive; traditional IR methods and measures are based on non-interactive situations
Evaluating interactive IR requires human subjects; the normal mode of evaluation is comparison between two systems (no gold standard or benchmarks); cannot compare a subject’s searching on the same task in two systems
Major tradeoffs between number of subjects and number of tasks; realism and control

jhu-hlt-2004 © n.j. belkin 16 A Traditional View of IR (you’ll see this again)

jhu-hlt-2004 © n.j. belkin 17 IR as Support for Interaction with Information
[Diagram, repeated over time within overall goals, environment and situation: the USER (goals, tasks, knowledge, problem, uses) engages in INTERACTION (judgment, use, search, interpretation, modification) with INFORMATION (type, medium, mode, level), supported by representation, comparison, presentation, visualization and navigation]

jhu-hlt-2004 © n.j. belkin 18 The User as the Central Actor in the IR System
The goal of IR is to help the user resolve the problematic situation
This is done by supporting interaction with appropriate IOs
The user in the system is the only actor that can judge appropriateness
The user’s interactions determine the type of support provided

jhu-hlt-2004 © n.j. belkin 19 Interaction as the Central Process of IR
Accepting the user as the central actor implies accepting the user’s interactions with information as the central process
All other IR processes can be interpreted as being in support of the user’s current (or future) interactions with information
This suggests specific IR system design choices and problems

jhu-hlt-2004 © n.j. belkin 20 How Interaction Has Been Accounted For
Relevance feedback
– Automatically moving the initial query toward the “ideal” query
– Term reweighting and query expansion
Support for query modification
– Display of “good” and “bad” terms
– Thesauri
– Inter-document relations
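The slide does not name a specific algorithm; one common way the "move the query toward the ideal query" idea has been realized is Rocchio-style feedback over term-weight vectors. A minimal Python sketch, with all parameter values purely illustrative:

```python
from collections import defaultdict

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Reweight and expand the query: add terms from judged-relevant documents,
    subtract terms from judged-nonrelevant ones. All vectors are {term: weight} dicts."""
    new_query = defaultdict(float)
    for t, w in query.items():
        new_query[t] += alpha * w
    for doc in relevant:
        for t, w in doc.items():
            new_query[t] += beta * w / len(relevant)
    for doc in nonrelevant:
        for t, w in doc.items():
            new_query[t] -= gamma * w / len(nonrelevant)
    # Keep only positively weighted terms; negative weights rarely help retrieval.
    return {t: w for t, w in new_query.items() if w > 0}
```

Term reweighting and query expansion both fall out of this: existing query terms receive new weights, and terms that occur only in the judged-relevant documents enter the query.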

jhu-hlt-2004 © n.j. belkin 21 Personalization in IR
Taking account of user goals, situation, context for
– tailoring the interaction
– tailoring the retrieval results
TREC HARD track is a first attempt at evaluating use of context

jhu-hlt-2004 © n.j. belkin 22 IR Models
Exact match models
– String matching
– Boolean
Best (partial match) models
– Vector space
– Probabilistic
– Logic (Plausible inference)
– Language modeling

jhu-hlt-2004 © n.j. belkin 23 Exact Match IR
Goal of EM IR is to retrieve the set of information objects which match the user’s query specification
Assumptions of EM IR
– IOs are completely representable
– Information problems are specifiable
– Relevance is determinable and binary

jhu-hlt-2004 © n.j. belkin 24 Exact Match IR
Retrieves IOs that contain specified string or Boolean combination of strings
Supported by inverted file organization (or signatures)
Enhanced by wild-cards, proximity searching
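A minimal sketch (not from the slides) of an inverted file and an exact-match Boolean AND over it; the tokenization and data layout are simplified assumptions:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: {doc_id: text}. Maps each term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, terms):
    """Exact match: return exactly the documents containing all of the query terms."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

docs = {1: "information retrieval systems", 2: "database systems", 3: "retrieval of information"}
index = build_inverted_index(docs)
print(boolean_and(index, ["information", "retrieval"]))   # {1, 3}
```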

jhu-hlt-2004 © n.j. belkin 25 Exact Match IR
Advantages
– Efficient
– Boolean queries capture some aspects of information problem structure
Disadvantages
– Not effective
– Difficult to write effective queries
– No inherent document ranking

jhu-hlt-2004 © n.j. belkin 26 Best Match IR
All types based on the assumption that IR is an uncertain process
Models differ by what they ascribe the uncertainty to, and by how they respond to that uncertainty

jhu-hlt-2004 © n.j. belkin 27 Vector Space IR
Words represent concepts or topics
These can be construed as dimensions of a “concept space”
IOs are about the topics represented by their words
IOs can be represented as vectors in the concept space
Queries can be specified and represented as are IOs

jhu-hlt-2004 © n.j. belkin 28 Vector Space IR
Goal of IR is to present the user with IOs most similar to the query, in order of similarity
Similarity is defined as closeness in the concept (vector) space
Uncertainty in IR is in the degree of match between IO and query, and arises from uncertainty in the representation of each
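As a concrete sketch of these ideas (not taken from the slides), the following Python code represents documents and a query as tf·idf vectors and ranks by cosine similarity; the toy collection, tokenizer, and weighting scheme are illustrative assumptions, not the only possible choices:

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def doc_frequencies(docs):
    """Number of documents in which each term occurs."""
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return df

def tfidf_vector(text, df, n_docs):
    """A point in the 'concept space': term frequency x inverse document frequency."""
    tf = Counter(tokenize(text))
    return {t: f * math.log(n_docs / df[t]) for t, f in tf.items() if t in df}

def cosine(u, v):
    """Similarity = closeness in the vector space (cosine of the angle between vectors)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

docs = {"d1": "user interaction with information objects",
        "d2": "probabilistic models of retrieval",
        "d3": "interaction and retrieval of information"}
df, n = doc_frequencies(docs), len(docs)
doc_vecs = {d: tfidf_vector(text, df, n) for d, text in docs.items()}
query = tfidf_vector("information retrieval interaction", df, n)
print(sorted(doc_vecs, key=lambda d: cosine(query, doc_vecs[d]), reverse=True))
# ['d3', 'd1', 'd2'] -- d3 contains all three query terms
```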

jhu-hlt-2004 © n.j. belkin 29 Vector Space Model
Advantages
– Straightforward ranking
– Simple query formulation (bag of words)
– Intuitively appealing
– Effective
Disadvantages
– Unstructured queries
– Effective calculations and parameters must be empirically determined

jhu-hlt-2004 © n.j. belkin 30 Probabilistic Model
Uncertainty in IR arises from uncertainty in the relevance relationship, in the representation of the information problem, and in the representation of IOs
Result of these uncertainties can be represented as probabilities of relevance of an IO to an information problem, given the available evidence

jhu-hlt-2004 © n.j. belkin 31 Probabilistic IR Goal of IR is to present to the user the IOs in order of their probability of relevance to the information problem (the Probability Ranking Principle)
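The slides do not commit to a particular estimation method; a classic instance is the Robertson/Sparck Jones term weight from the binary independence model, sketched here (as an assumption, not the slide's prescription) to show ranking by estimated probability of relevance:

```python
import math

def rsj_weight(n_t, N, r_t=0, R=0):
    """Robertson/Sparck Jones term weight under binary-independence assumptions.
    N: documents in collection, n_t: documents containing term t,
    R: known relevant documents, r_t: relevant documents containing t.
    With no relevance information (R = r_t = 0) it behaves like an idf weight."""
    return math.log(((r_t + 0.5) * (N - n_t - R + r_t + 0.5)) /
                    ((n_t - r_t + 0.5) * (R - r_t + 0.5)))

def relevance_score(doc_terms, query_terms, df, N):
    """Rank documents by summing the weights of the query terms they contain."""
    return sum(rsj_weight(df.get(t, 0), N) for t in query_terms if t in doc_terms)
```

Ranking documents by this score is one way of approximating the Probability Ranking Principle; the 0.5 terms are a standard smoothing of the probability estimates.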

jhu-hlt-2004 © n.j. belkin 32 Probabilistic IR
Advantages
– Straightforward relevance ranking
– Simple query formulation
– Sound mathematical/theoretical model
– Effective
Disadvantages
– Unrealistic assumptions (term independence)
– Probabilities difficult to estimate

jhu-hlt-2004 © n.j. belkin 33 Plausible Inference IR
Uncertainty in IR arises from uncertainty in the relevance relationship, uncertainty in the representation of the information problem, and uncertainty in the representation of IOs
This implies that IR can be no more than a process of plausible inference of the relevance of an IO to an information problem

jhu-hlt-2004 © n.j. belkin 34 Plausible Inference IR
In the logical implicature version, IO and information problem should be represented in a logical formalism which allows plausible inference
In the multiple sources of evidence version, as much evidence as possible about the relationship between IO and information problem should be used to estimate probability of relevance (induction)

jhu-hlt-2004 © n.j. belkin 35 Plausible Inference IR
In the logic version, the goal of IR is to present to the user those IOs from which the query is most plausibly inferred, in order of plausibility
In the sources of evidence version, the goal of IR is to present to the user those IOs which are believed most likely to be relevant, in the order of strength of belief

jhu-hlt-2004 © n.j. belkin 36 Plausible Inference IR
Advantages
– Relevance ranking
– Strong formalisms
– Structured queries possible
– Effective (multiple sources of evidence)
Disadvantages
– Complex, difficult to implement
– Weights for evidence empirically determined

jhu-hlt-2004 © n.j. belkin 37 Language Modeling for IR
Assumes that IOs and expressions of information problems are of the same type
Uncertainty in IR is due to uncertainty in representations of IOs and information problems
Goal is to present to the user IOs in order of the probability of the IO being generated by the language model of the information problem (or vice versa), or by the similarity of the language model of the IO to that of the information problem

jhu-hlt-2004 © n.j. belkin 38 Language Modeling for IR
Most common type is the statistical unigram model, based on observed word frequencies, smoothed in various ways
The Kullback-Leibler distance is a measure of the distance between two probability distributions:
KL({p_i}, {q_i}) = Σ_i p_i log2(p_i / q_i)
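A sketch of the unigram query-likelihood idea and of the KL distance above; Jelinek-Mercer (mixture) smoothing is used here only as one example of "smoothed in various ways", and the mixing weight lam is an arbitrary illustrative value:

```python
import math
from collections import Counter

def unigram_lm(tokens):
    """Maximum-likelihood unigram model: relative frequencies of the observed words."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def query_log_likelihood(query_tokens, doc_tokens, collection_tokens, lam=0.5):
    """log P(query | document model), smoothing the document model with the collection model."""
    p_doc, p_coll = unigram_lm(doc_tokens), unigram_lm(collection_tokens)
    return sum(math.log(lam * p_doc.get(t, 0.0) + (1 - lam) * p_coll.get(t, 1e-9))
               for t in query_tokens)

def kl_distance(p, q, eps=1e-9):
    """KL({p_i},{q_i}) = sum_i p_i * log2(p_i / q_i); 0 when the two models are identical."""
    return sum(p_i * math.log2(p_i / q.get(t, eps)) for t, p_i in p.items())
```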

jhu-hlt-2004 © n.j. belkin 39 Advantages of Language Modeling
Attempts to do away with the concept of relevance
Computationally tractable, intuitively appealing

jhu-hlt-2004 © n.j. belkin 40 Problems with Language Modeling
Assumption of equivalence between IO and information problem representation is unrealistic
Very simple models of language
Choosing a method of smoothing is difficult, and in general, ad hoc

jhu-hlt-2004 © n.j. belkin 41 Problems in Best Match IR
For most best match IR models to work well, queries should be long
– bag of words approach depends upon many words in order to disambiguate meaning
Reasons for retrieval and ranking are not easily understood

jhu-hlt-2004 © n.j. belkin 42 Overcoming Problems in Best Match IR
Enhance short queries through query expansion based on pseudo-relevance feedback or other methods
Default to exact match searching for short queries
Encourage longer queries/problem statements through interface design
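A minimal sketch of pseudo-relevance feedback as one way to expand a short query: treat the top-k initially retrieved documents as if they were relevant and append their most frequent new terms. The parameters k and n_new are illustrative, and real systems typically weight candidate expansion terms more carefully:

```python
from collections import Counter

def expand_query(query_terms, ranked_doc_tokens, k=5, n_new=3):
    """ranked_doc_tokens: token lists of the initially retrieved documents, best first.
    Assume the top k are relevant and append their most frequent unseen terms."""
    counts = Counter()
    for tokens in ranked_doc_tokens[:k]:
        counts.update(tokens)
    new_terms = [t for t, _ in counts.most_common() if t not in query_terms][:n_new]
    return list(query_terms) + new_terms
```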

jhu-hlt-2004 © n.j. belkin 43 Some Takeaway Messages
IR supports a human activity
IR is inherently interactive, and the IR system inevitably involves the user as the central actor
Representation and comparison techniques for text-based IR seem to have plateaued
Improved IR will come from improved support for all types of interactions with information, and especially with personalization
Big research issue: how to represent and use situation and context