Probabilistic Information Retrieval Part II: In Depth Alexander Dekhtyar Department of Computer Science University of Maryland.

Slides:



Advertisements
Similar presentations
Information Retrieval and Organisation Chapter 11 Probabilistic Information Retrieval Dell Zhang Birkbeck, University of London.
Advertisements

Naïve Bayes. Bayesian Reasoning Bayesian reasoning provides a probabilistic approach to inference. It is based on the assumption that the quantities of.
Modern information retrieval Modelling. Introduction IR systems usually adopt index terms to process queries IR systems usually adopt index terms to process.
Probabilistic Information Retrieval Part I: Survey Alexander Dekhtyar department of Computer Science University of Maryland.
The Probabilistic Model. Probabilistic Model n Objective: to capture the IR problem using a probabilistic framework; n Given a user query, there is an.
Probabilistic Information Retrieval Chris Manning, Pandu Nayak and
Probabilistic Information Retrieval
Introduction to Information Retrieval (Part 2) By Evren Ermis.
Introduction to Information Retrieval Information Retrieval and Data Mining (AT71.07) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor:
CpSc 881: Information Retrieval
Modern Information Retrieval by R. Baeza-Yates and B. Ribeiro-Neto
Lecture 11: Probabilistic Information Retrieval
Probabilistic Ranking Principle
Information Retrieval Models: Probabilistic Models
IR Models: Overview, Boolean, and Vector
A Probabilistic Framework for Information Integration and Retrieval on the Semantic Web by Livia Predoiu, Heiner Stuckenschmidt Institute of Computer Science,
Hinrich Schütze and Christina Lioma
ISP 433/533 Week 2 IR Models.
Introduction to Information Retrieval Introduction to Information Retrieval Hinrich Schütze and Christina Lioma Lecture 11: Probabilistic Information Retrieval.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Probabilistic Information Retrieval.
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) IR Queries.
Modern Information Retrieval Chapter 2 Modeling. Probabilistic model the appearance or absent of an index term in a document is interpreted either as.
Probabilistic IR Models Based on probability theory Basic idea : Given a document d and a query q, Estimate the likelihood of d being relevant for the.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Probabilistic Information Retrieval.
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
Modeling Modern Information Retrieval
ITCS 6010 Natural Language Understanding. Natural Language Processing What is it? Studies the problems inherent in the processing and manipulation of.
Vector Space Model CS 652 Information Extraction and Integration.
1 CS 430 / INFO 430 Information Retrieval Lecture 10 Probabilistic Information Retrieval.
Retrieval Models II Vector Space, Probabilistic.  Allan, Ballesteros, Croft, and/or Turtle Properties of Inner Product The inner product is unbounded.
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
Automatic Indexing (Term Selection) Automatic Text Processing by G. Salton, Chap 9, Addison-Wesley, 1989.
CS276A Text Retrieval and Mining Lecture 10. Recap of the last lecture Improving search results Especially for high recall. E.g., searching for aircraft.
IR Models: Review Vector Model and Probabilistic.
Other IR Models Non-Overlapping Lists Proximal Nodes Structured Models Retrieval: Adhoc Filtering Browsing U s e r T a s k Classic Models boolean vector.
Chapter 7 Retrieval Models.
Probabilistic Models in IR Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata Using majority of the slides from.
Modeling (Chap. 2) Modern Information Retrieval Spring 2000.
Bayesian Networks. Male brain wiring Female brain wiring.
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Presented by Chen Yi-Ting.
Term Frequency. Term frequency Two factors: – A term that appears just once in a document is probably not as significant as a term that appears a number.
IR Theory: Relevance Feedback. Relevance Feedback: Example  Initial Results Search Engine2.
Probabilistic Ranking Principle Hongning Wang
Latent Semantic Indexing and Probabilistic (Bayesian) Information Retrieval.
1 Patrick Lambrix Department of Computer and Information Science Linköpings universitet Information Retrieval.
C.Watterscsci64031 Probabilistic Retrieval Model.
Web-Mining Agents Probabilistic Information Retrieval Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme Karsten Martiny (Übungen)
CS276A Text Information Retrieval, Mining, and Exploitation Lecture 6 22 Oct 2002.
Probabilistic Model n Objective: to capture the IR problem using a probabilistic framework n Given a user query, there is an ideal answer set n Querying.
Introduction to Information Retrieval Introduction to Information Retrieval Lecture Probabilistic Information Retrieval.
Introduction to Information Retrieval Probabilistic Information Retrieval Chapter 11 1.
1 Boolean and Vector Space Retrieval Models Many slides in this section are adapted from Prof. Joydeep Ghosh (UT ECE) who in turn adapted them from Prof.
VECTOR SPACE INFORMATION RETRIEVAL 1Adrienn Skrop.
Probabilistic Information Retrieval
CS276 Information Retrieval and Web Search
Probabilistic Retrieval Models
THE LOGIT AND PROBIT MODELS
Information Retrieval and Data Mining (AT71. 07) Comp. Sc. and Inf
Web-Mining Agents Probabilistic Information Retrieval
Lecture 15: Text Classification & Naive Bayes
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Information Retrieval Models: Probabilistic Models
THE LOGIT AND PROBIT MODELS
Text Retrieval and Mining
Recuperação de Informação B
CS 430: Information Discovery
Information Retrieval and Data Mining (AT71. 07) Comp. Sc. and Inf
Recuperação de Informação B
Switching Lemmas and Proof Complexity
ADVANCED TOPICS IN INFORMATION RETRIEVAL AND WEB SEARCH
Presentation transcript:

Probabilistic Information Retrieval Part II: In Depth Alexander Dekhtyar Department of Computer Science University of Maryland

In this part 4 Probability Ranking Principle –simple case –case with retrieval costs 4 Binary Independence Retrieval (BIR) –Estimating the probabilities 4 Binary Independence Indexing (BII) –dual to BIR

The Basics 4 Bayesian probability formulas 4 Odds:

The Basics Document Relevance: Note:

Probability Ranking Principle 4 Simple case: no selection costs. 4 x is relevant iff p(R|x) > p(NR|x) 4 (Bayes’ Decision Rule) 4 PRP in action: Rank all documents by p(R|x).

Probability Ranking Principle 4 More complex case: retrieval costs. – C - cost of retrieval of relevant document –C’ - cost of retrieval of non-relevant document –let d, be a document 4 Probability Ranking Principle: if for all d’ not yet retrieved, then d is the next document to be retrieved

Next: Binary Independence Model

Binary Independence Model 4 Traditionally used in conjunction with PRP 4 “Binary” = Boolean: documents are represented as binary vectors of terms: – – iff term i is present in document x. 4 “Independence”: terms occur in documents independently 4 Different documents can be modeled as same vector.

Binary Independence Model 4 Queries: binary vectors of terms 4 Given query q, –for each document d need to compute p(R|q,d). –replace with computing p(R|q,x) where x is vector representing d 4 Interested only in ranking 4 Will use odds:

Binary Independence Model Using Independence Assumption: Constant for each query Needs estimation So :

Binary Independence Model Since x i is either 0 or 1: Let Assume, for all terms not occuring in the query (q i =0) Then...

All matching terms Non-matching query terms Binary Independence Model All matching terms All query terms

Binary Independence Model Constant for each query Only quantity to be estimated for rankings Retrieval Status Value:

Binary Independence Model All boils down to computing RSV. So, how do we compute c i ’s from our data ?

Binary Independence Model Estimating RSV coefficients. For each term i look at the following table: Estimates: Add 0.5 to every expression

PRP and BIR: The lessons 4 Getting reasonable approximations of probabilities is possible. 4 Simple methods work only with restrictive assumptions: –term independence –terms not in query do not affect the outcome –boolean representation of documents/queries –document relevance values are independent 4 Some of these assumptions can be removed

Next: Binary Independence Indexing

Binary Independence Indexing vs. Binary Independence Retrieval BIRBIR BIIBII 4 Many Documents, One Query 4 Bayesian Probability: 4 Varies: document representation 4 Constant: query (representation) 4 One Document, Many Queries 4 Bayesian Probability 4 Varies: query 4 Constant: document

Binary Independence Indexing 4 “Learnng” from queries –More queries: better results 4 p(q|x,R) - probability that if document x had been deemed relevant, query q had been asked 4 The rest of the framework is similar to BIR

Binary Independence Indexing: Key Assumptions 4 Term occurrence in queries is conditionally independent: depends only 4 Relevance of document representation x w.r.t. query q depends only on the terms present in the query (q i =1) 4 For each term i not used in representation x of document d (x i =0): –only positive occurrences of terms count

Binary Independence Indexing constant Equal to 1