Probabilistic Retrieval LBSC 708A/CMSC 838L Session 4, October 2, 2001 Philip Resnik

Agenda
- Questions
- Adjustments to syllabus
- Probability basics
- Probabilistic retrieval
- Comparison with vector space model

Muddiest Points
- The math!
- Two views of an idea: formulae and matrices
- Pivoted document length normalization
- Latent Semantic Indexing

Why Similarity-Based Ranking?
- Similarity is a useful predictor of relevance (13)
  – Users can then recognize documents with utility
- Ranked lists avoid all-or-nothing retrieval (3)
  – More nuanced than presence or absence of words
- Easy to implement (2)

Probability Basics
- What is probability?
  – Statistical: relative frequency as n → ∞
  – Subjective: degree of belief
- Notion of a probability "space"
  – Elementary outcomes, Ω
  – Events, F
  – Probability measure, p
- Every probabilistic model has an algebraic foundation underlying it

Notion of "probability mass"
- Imagine a finite amount of "stuff"
- Associate the number 1 with the total amount
- Distribute that mass over the possible outcomes

Independence
- A and B are independent iff P(A and B) = P(A) × P(B)
- Example:
  – P("being brown eyed") = 85/100
  – P("being a doctor") = 1/1000
  – P("being a brown-eyed doctor") = 85/100 × 1/1000 = 85/100,000

More on Independence
- Suppose
  – P("having a B.S. degree") = 2/10
  – P("being a doctor") = 1/1000
- Would you expect
  – P("having a B.S. degree and being a doctor") = 2/10,000 ???
  – Probably not: nearly all doctors have a B.S. degree, so the events are not independent
- Extreme example:
  – P("being a doctor") = 1/1000
  – P("having studied anatomy") = 12/1000

Conditional Probability
- P(A | B) ≡ P(A and B) / P(B)
- (Venn diagram: overlapping regions A, B, and "A and B")
- P(A) = probability of A relative to the whole space
- P(A|B) = probability of A considering only the cases where B is known to be true

More on Conditional Probability
- Suppose
  – P("having studied anatomy") = 12/1000
  – P("being a doctor and having studied anatomy") = 1/1000
- Consider
  – P("being a doctor" | "having studied anatomy") = 1/12
- But if you assume all doctors have studied anatomy
  – P("having studied anatomy" | "being a doctor") = 1
- Useful restatement of the definition: P(A and B) = P(A|B) × P(B)

Bayes's Theorem: Notation
- Consider a set of hypotheses: H1, H2, H3
- Consider some observable evidence, O
- P(O|H1) = probability of O being observed if we knew H1 were true
- P(O|H2) = probability of O being observed if we knew H2 were true
- P(O|H3) = probability of O being observed if we knew H3 were true

Bayes's Theorem: Example
- Let
  – O = "Joe earns more than $70,000/year"
  – H1 = "Joe is a doctor"
  – H2 = "Joe is a college professor"
  – H3 = "Joe works in food services"
- Suppose we do a survey and we find out
  – P(O|H1) = 0.6
  – P(O|H2) = 0.07
  – P(O|H3) = 0.001
- What should be our guess about Joe's profession?

Bayes's Theorem (finally!)
- What's P(H1|O)? P(H2|O)? P(H3|O)?
- Theorem: P(H|O) = P(O|H) × P(H) / P(O)
  – P(H|O) is the posterior probability; P(H) is the prior probability
- Notice that the prior is very important!

Example, cont'd
- Suppose we also have good data about the priors P(H1), P(H2), and P(H3) for doctor, professor, and food services (e.g., P(H3) = 0.2 for food services)
- Given
  – P(O|H1) = 0.6
  – P(O|H2) = 0.07
  – P(O|H3) = 0.001
- We can calculate
  – P(H1|O) = 0.6 × P(H1) / P("earning > $70K/year")
  – P(H2|O) = 0.07 × P(H2) / P("earning > $70K/year")
  – P(H3|O) = 0.001 × 0.2 / P("earning > $70K/year")
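
As a worked illustration of this calculation (not part of the original slides), here is a minimal Python sketch. The likelihoods come from the slide; the prior values for "doctor" and "professor" are assumed purely to make the arithmetic concrete, since only the food-services prior of 0.2 appears above.

```python
# Hypothetical worked example of Bayes's theorem for the Joe scenario.
# Likelihoods P(O|H) are from the slide; the first two priors are ASSUMED values.
likelihood = {"doctor": 0.6, "professor": 0.07, "food services": 0.001}
prior = {"doctor": 0.001, "professor": 0.002, "food services": 0.2}  # first two assumed

# P(O) = sum over hypotheses of P(O|H) * P(H), assuming these hypotheses cover the cases of interest
p_o = sum(likelihood[h] * prior[h] for h in likelihood)

# Posterior P(H|O) = P(O|H) * P(H) / P(O)
posterior = {h: likelihood[h] * prior[h] / p_o for h in likelihood}
for h, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"P({h} | O) = {p:.3f}")
```

The point of the exercise: even though P(O|doctor) is by far the largest likelihood, a small enough prior for "doctor" can still leave another hypothesis with the highest posterior.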

Summary of Probability Concepts
- Interpretations of probability
- Independence
- Conditional probability
- Bayes's theorem

Agenda
- Questions
- Probability basics
- Probabilistic retrieval
  – Language modeling
  – Inference networks
- Comparison with vector space model

Probability Ranking Principle
- A useful ranking criterion
  – Maximize the probability that relevant docs precede the others
- Binary relevance & independence assumptions
  – Each document is either relevant or it is not
  – Relevance of one doc reveals nothing about another
- Theorem (provable from the assumptions):
  – Documents should be ranked in order of decreasing probability of relevance to the query, P(d relevant-to q)

Probabilistic Retrieval Strategy
- Estimate how terms contribute to relevance
  – How do TF, DF, and document length influence your judgments about document relevance? (Okapi; a sketch follows below)
- Combine the estimates to find the document's relevance probability
- Order documents by decreasing probability
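
To make the TF/DF/length intuition concrete, here is a minimal sketch of an Okapi BM25-style term weight. This is an illustration rather than the exact formula used in the lecture, and the parameter settings k1 and b are conventional defaults, not values from the slides.

```python
import math

def bm25_weight(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    """Okapi BM25-style contribution of one query term to a document's score.

    tf: term frequency in the document; df: documents containing the term;
    doc_len / avg_doc_len: length normalization; n_docs: collection size.
    """
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)  # rarer terms count more
    norm_tf = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm_tf

# A document's score for a query is the sum of these weights over the query terms.
query_tfs = {"probabilistic": 3, "retrieval": 1}   # toy term frequencies in one document
dfs = {"probabilistic": 50, "retrieval": 400}      # toy document frequencies
score = sum(bm25_weight(query_tfs[t], dfs[t], doc_len=120, avg_doc_len=100, n_docs=10_000)
            for t in query_tfs)
print(score)
```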

Binary Independence Model
- Basis for computing the probability of relevance
  – Simple computation based on term weights
- Depends on two new assumptions
  – Presence of one term tells nothing about another ("term independence")
  – No prior knowledge about any document ("uniform prior": P(d) is the same for all d)
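
For concreteness, a minimal sketch of the kind of term weight the binary independence model leads to, in its Robertson/Sparck Jones form. This is illustrative rather than taken from the slides; with no relevance information available, it reduces to an IDF-like weight.

```python
import math

def rsj_weight(df, n_docs, rel_with_term=0, n_rel=0):
    """Robertson/Sparck Jones term weight under the binary independence model.

    df: documents containing the term; n_docs: collection size;
    rel_with_term / n_rel: relevance counts if any are known (default: none).
    With no relevance information this is approximately an IDF weight.
    """
    numerator = (rel_with_term + 0.5) * (n_docs - df - n_rel + rel_with_term + 0.5)
    denominator = (df - rel_with_term + 0.5) * (n_rel - rel_with_term + 0.5)
    return math.log(numerator / denominator)

# A document is scored by summing the weights of the query terms it contains.
print(rsj_weight(df=50, n_docs=10_000))     # rare term: large weight
print(rsj_weight(df=5_000, n_docs=10_000))  # common term: weight near zero
```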

Where do the probabilities fit?
(Diagram of the retrieval process: an information need is turned, via query formulation and a representation function, into a query representation; documents are turned, via document processing and a representation function, into document representations; a comparison function produces a retrieval status value, which the user judges for utility. The probabilistic approach replaces the similarity score sim(d,q) with P(d is relevant | q).)

Language Modeling
- Traditional generative model: generates strings
- Example: a simple model that generates "I wish", "I wish I wish", "I wish I wish I wish", …
  – but cannot generate the ill-formed "*wish I wish"

Stochastic Language Models
- A model assigns a probability to generating any string
- Example model M: P(the) = 0.2, P(a) = 0.1, P(man) = 0.01, P(woman) = 0.01, P(said) = 0.03, P(likes) = 0.02, …
- For s = "the man likes the woman", multiply the word probabilities:
  P(s | M) = 0.2 × 0.01 × 0.02 × 0.2 × 0.01
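
A minimal sketch of this computation, using the toy probabilities from the slide (illustrative only; the treatment of unseen words is a placeholder until smoothing comes up):

```python
from functools import reduce

# Toy unigram model M from the slide: word -> probability of generating that word
model_m = {"the": 0.2, "a": 0.1, "man": 0.01, "woman": 0.01, "said": 0.03, "likes": 0.02}

def string_probability(s, model):
    """P(s | M) for a unigram model: multiply the probabilities of the words in s.
    Words the model has never seen get probability 0 here (smoothing is discussed later)."""
    return reduce(lambda p, w: p * model.get(w, 0.0), s.split(), 1.0)

print(string_probability("the man likes the woman", model_m))  # 0.2*0.01*0.02*0.2*0.01 = 8e-08
```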

Language Models, cont'd
- Different models assign different probabilities to the same string
- Model M1: P(the) = 0.2, P(a) = 0.1, P(man) = 0.01, P(woman) = 0.01, P(said) = 0.03, P(likes) = 0.02, …
- Model M2: P(the) = 0.2, P(yon) = 0.1, P(class) = 0.001, P(maiden) = 0.01, P(sayst) = 0.03, P(pleaseth) = 0.02, …
- For a string such as "the class pleaseth yon maiden", P(s | M2) > P(s | M1)

Using Language Models in IR
- Treat each document as the basis for a model
- Rank document d based on P(d | q)
- P(d | q) = P(q | d) × P(d) / P(q)
  – P(q) is the same for all documents, so ignore it
  – P(d) [the prior] is often treated as the same for all d, but we could use criteria like authority, length, or genre
  – P(q | d) is the probability of q given d's model
- A very general formal approach, based on HMMs (a query-likelihood sketch follows below)
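
A minimal sketch of query-likelihood scoring with simple Jelinek-Mercer smoothing. This illustrates the idea of ranking by P(q | d) rather than the specific HMM formulation from the lecture, and the mixing weight lam is an assumed setting.

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, lam=0.5):
    """Score a document by log P(q | d) under a smoothed unigram document model.

    P(w | d) is mixed with a collection model so that unseen query words do not
    zero out the score (Jelinek-Mercer smoothing; lam is an assumed setting).
    """
    doc_counts, coll_counts = Counter(doc), Counter(collection)
    doc_len, coll_len = len(doc), len(collection)
    log_p = 0.0
    for w in query:
        p_doc = doc_counts[w] / doc_len
        p_coll = coll_counts[w] / coll_len
        log_p += math.log(lam * p_doc + (1 - lam) * p_coll)
    return log_p  # rank documents by decreasing log P(q | d)

# Toy usage: rank two "documents" for the query "probabilistic retrieval"
d1 = "probabilistic retrieval ranks documents by probability of relevance".split()
d2 = "the vector space model ranks documents by cosine similarity".split()
collection = d1 + d2
q = "probabilistic retrieval".split()
print(sorted([("d1", query_likelihood(q, d1, collection)),
              ("d2", query_likelihood(q, d2, collection))], key=lambda x: -x[1]))
```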

Inference Networks
- A flexible way of combining term weights
  – Boolean model
  – Binary independence model
  – Probabilistic models with weaker assumptions
- Key concept: rank based on P(d | q)
  – P(d | q) = P(q | d) × P(d) / P(q)
- Efficient large-scale implementation
  – InQuery text retrieval system from U. Mass.

A Boolean Inference Net
(Diagram: document nodes d1–d4 linked to term nodes bat, cat, fat, hat, mat, pat, rat, sat, vat; AND and OR operator nodes combine the terms into the information need node I.)

A Binary Independence Network
(Diagram: the same document nodes d1–d4 and term nodes bat, cat, fat, hat, mat, pat, rat, sat, vat, now linked directly to a query node with no Boolean operators.)

Probability Computation
- Turn on exactly one document at a time
  – Boolean: every connected term turns on
  – Binary independence: connected terms gain their weight
- Compute the query value
  – Boolean: AND and OR nodes use truth tables
  – Binary independence: fraction of the possible weight
(A small illustrative sketch follows below.)
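
A toy sketch of both computations on a tiny made-up collection. The documents, the query structure, and the term weights are all assumptions for illustration; the slides do not specify them.

```python
# Toy illustration of evaluating an inference net one document at a time.
# Documents are sets of terms; query terms carry ASSUMED weights.
docs = {"d1": {"bat", "cat", "sat"}, "d2": {"hat", "mat"}, "d3": {"cat", "rat", "sat"}}
weights = {"cat": 2.0, "sat": 1.0, "rat": 0.5}  # hypothetical term weights

def boolean_value(doc_terms):
    # Boolean net: evaluate (cat AND sat) OR rat with truth tables
    return ("cat" in doc_terms and "sat" in doc_terms) or ("rat" in doc_terms)

def binary_independence_value(doc_terms):
    # Binary independence net: fraction of the possible query-term weight present
    present = sum(w for t, w in weights.items() if t in doc_terms)
    return present / sum(weights.values())

# "Turn on" one document at a time and compute the query value
for d, terms in docs.items():
    print(d, boolean_value(terms), round(binary_independence_value(terms), 2))
```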

A Critique
- Most of the assumptions are not satisfied!
  – Searchers want utility, not relevance
  – Relevance is not binary
  – Terms are clearly not independent
  – Documents are often not independent
- The best-known term weights are quite ad hoc
  – Unless some relevant documents are known

But It Works!
- The ranked retrieval paradigm is powerful
  – Well suited to human search strategies
- Probability theory has explanatory power
  – At least we know where the weak spots are
  – Probabilities are good for combining evidence
- Inference networks are extremely flexible
  – Easily accommodate newly developed models
- Good implementations exist
  – Effective, efficient, and large-scale

Comparison With Vector Space
- Similar in some ways
  – Term weights can be based on frequency
  – Terms are often used as if they were independent
- Different in others
  – Based on probability rather than similarity
  – Intuitions are probabilistic rather than geometric

Two Minute Paper
- Which assumption underlying the probabilistic retrieval model causes you the most concern, and why?
- What was the muddiest point in today's lecture?
- Have you started Homework 2?