1
Probabilistic Information Retrieval
Part II: In Depth
Alexander Dekhtyar
Department of Computer Science
University of Maryland
2
In this part
• Probability Ranking Principle
  – simple case
  – case with retrieval costs
• Binary Independence Retrieval (BIR)
  – Estimating the probabilities
• Binary Independence Indexing (BII)
  – dual to BIR
3
The Basics
• Bayesian probability formulas
• Odds:
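For reference, the two items named on this slide have the following standard forms. The event names a and b are an assumption chosen for illustration, not taken from the slide.

```latex
\[
p(a \mid b) = \frac{p(b \mid a)\, p(a)}{p(b)},
\qquad
O(a) = \frac{p(a)}{p(\bar{a})} = \frac{p(a)}{1 - p(a)}
\]
```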
4
The Basics
Document Relevance:
Note:
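Applying Bayes' formula to relevance gives the expressions below, a sketch in the p(R|x), p(NR|x) notation used on the following slides. The last identity is the usual observation that a document representation is either relevant or non-relevant; reading it as the slide's "Note" is an assumption.

```latex
\[
p(R \mid x) = \frac{p(x \mid R)\, p(R)}{p(x)},
\qquad
p(NR \mid x) = \frac{p(x \mid NR)\, p(NR)}{p(x)},
\qquad
p(R \mid x) + p(NR \mid x) = 1
\]
```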
5
Probability Ranking Principle
• Simple case: no selection costs.
• x is relevant iff p(R|x) > p(NR|x) (Bayes’ Decision Rule)
• PRP in action: Rank all documents by p(R|x).
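The simple case can be sketched in a few lines of Python. The function names and the dictionary of probabilities are hypothetical, and the sketch assumes the values of p(R|x) have already been estimated by some other means.

```python
def is_relevant(p_rel):
    """Bayes' decision rule: relevant iff p(R|x) > p(NR|x) = 1 - p(R|x)."""
    return p_rel > 1.0 - p_rel


def rank_documents(p_rel_by_doc):
    """Return document ids ordered by decreasing p(R|x)."""
    return sorted(p_rel_by_doc, key=p_rel_by_doc.get, reverse=True)


# Hypothetical, already-estimated probabilities of relevance.
estimates = {"d1": 0.2, "d2": 0.9, "d3": 0.55}
ranking = rank_documents(estimates)
```

Only the order of the probabilities matters for the ranking, which is why later slides can work with odds instead of p(R|x) itself.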
6
Probability Ranking Principle
• More complex case: retrieval costs.
  – C - cost of retrieval of relevant document
  – C’ - cost of retrieval of non-relevant document
  – let d be a document
• Probability Ranking Principle: if for all d’ not yet retrieved, then d is the next document to be retrieved
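The condition in this principle is usually stated as a comparison of expected retrieval costs. The form below follows the standard textbook statement and assumes p(R|d) denotes the probability that d is relevant; it is a reconstruction, not a copy of the slide.

```latex
\[
C \cdot p(R \mid d) + C' \cdot \bigl(1 - p(R \mid d)\bigr)
\;\le\;
C \cdot p(R \mid d') + C' \cdot \bigl(1 - p(R \mid d')\bigr)
\]
```

When C < C’, the expected cost decreases as p(R|d) grows, so this rule yields the same order as ranking by p(R|d).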
7
Next: Binary Independence Model
8
Binary Independence Model
• Traditionally used in conjunction with PRP
• “Binary” = Boolean: documents are represented as binary vectors of terms:
  – x = (x_1, ..., x_n)
  – x_i = 1 iff term i is present in document x.
• “Independence”: terms occur in documents independently
• Different documents can be modeled as the same vector.
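A small sketch of the binary representation, assuming a fixed, ordered vocabulary and documents given as lists of tokens (the vocabulary, the documents, and the function name are made up for illustration). It also shows how two different documents collapse to the same vector once term counts and order are discarded.

```python
def to_binary_vector(document_terms, vocabulary):
    """Map a token list to a 0/1 tuple: 1 iff the vocabulary term occurs."""
    present = set(document_terms)
    return tuple(1 if term in present else 0 for term in vocabulary)


# Hypothetical vocabulary and documents.
vocabulary = ["retrieval", "probability", "ranking"]
doc_a = ["probability", "ranking", "ranking"]
doc_b = ["ranking", "probability"]

vector_a = to_binary_vector(doc_a, vocabulary)
vector_b = to_binary_vector(doc_b, vocabulary)
```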
9
Binary Independence Model
• Queries: binary vectors of terms
• Given query q,
  – for each document d need to compute p(R|q,d).
  – replace with computing p(R|q,x) where x is the vector representing d
• Interested only in ranking
• Will use odds:
10
Binary Independence Model
Using Independence Assumption:
Constant for each query
Needs estimation
So:
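The labels on this slide fit the usual odds decomposition, sketched below under the assumption that the slide follows the standard BIR derivation. The first line applies Bayes' formula to both p(R|q,x) and p(NR|q,x); the second applies term independence.

```latex
\[
\begin{aligned}
O(R \mid q, x) &= \frac{p(R \mid q, x)}{p(NR \mid q, x)}
 = \underbrace{\frac{p(R \mid q)}{p(NR \mid q)}}_{\text{constant for each query}}
   \cdot
   \underbrace{\frac{p(x \mid R, q)}{p(x \mid NR, q)}}_{\text{needs estimation}} \\[4pt]
\frac{p(x \mid R, q)}{p(x \mid NR, q)} &= \prod_{i} \frac{p(x_i \mid R, q)}{p(x_i \mid NR, q)}
\quad\Longrightarrow\quad
O(R \mid q, x) = O(R \mid q) \cdot \prod_{i} \frac{p(x_i \mid R, q)}{p(x_i \mid NR, q)}
\end{aligned}
\]
```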
11
Binary Independence Model
Since x_i is either 0 or 1:
Let
Assume, for all terms not occurring in the query (q_i = 0)
Then...
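A sketch of the step described here, assuming the conventional names p_i and r_i for the two term probabilities (the names are an assumption; the standard BIR derivation uses them). The product over all terms is split by the value of x_i, and terms outside the query are assumed to be equally likely in relevant and non-relevant documents.

```latex
\[
\begin{aligned}
O(R \mid q, x) &= O(R \mid q)
 \cdot \prod_{x_i = 1} \frac{p(x_i = 1 \mid R, q)}{p(x_i = 1 \mid NR, q)}
 \cdot \prod_{x_i = 0} \frac{p(x_i = 0 \mid R, q)}{p(x_i = 0 \mid NR, q)} \\[4pt]
p_i &= p(x_i = 1 \mid R, q), \qquad
r_i = p(x_i = 1 \mid NR, q), \qquad
p_i = r_i \ \text{ whenever } q_i = 0
\end{aligned}
\]
```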
12
Binary Independence Model
All matching terms / Non-matching query terms
All matching terms / All query terms
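The four labels fit the two lines of the standard derivation, sketched here with p_i = p(x_i = 1 | R, q) and r_i = p(x_i = 1 | NR, q) (assumed names). Terms with q_i = 0 drop out because p_i = r_i for them; the second line multiplies and divides by the factor (1 − p_i)/(1 − r_i) for every matching term, so that the last product runs over all query terms.

```latex
\[
\begin{aligned}
O(R \mid q, x) &= O(R \mid q)
 \cdot \underbrace{\prod_{x_i = q_i = 1} \frac{p_i}{r_i}}_{\text{all matching terms}}
 \cdot \underbrace{\prod_{x_i = 0,\; q_i = 1} \frac{1 - p_i}{1 - r_i}}_{\text{non-matching query terms}} \\[4pt]
&= O(R \mid q)
 \cdot \underbrace{\prod_{x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)}}_{\text{all matching terms}}
 \cdot \underbrace{\prod_{q_i = 1} \frac{1 - p_i}{1 - r_i}}_{\text{all query terms}}
\end{aligned}
\]
```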
13
Binary Independence Model
Constant for each query
Only quantity to be estimated for rankings
Retrieval Status Value:
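A sketch of how the labels attach to the odds formula and of the resulting Retrieval Status Value, with p_i = p(x_i = 1 | R, q) and r_i = p(x_i = 1 | NR, q) as assumed names. Taking the logarithm does not change the ranking and turns the product into a sum of per-term coefficients c_i.

```latex
\[
\begin{aligned}
O(R \mid q, x) &=
 \underbrace{O(R \mid q) \cdot \prod_{q_i = 1} \frac{1 - p_i}{1 - r_i}}_{\text{constant for each query}}
 \cdot
 \underbrace{\prod_{x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)}}_{\text{only quantity to be estimated for rankings}} \\[4pt]
RSV &= \log \prod_{x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
 = \sum_{x_i = q_i = 1} c_i,
\qquad
c_i = \log \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
\end{aligned}
\]
```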
14
Binary Independence Model
It all boils down to computing RSV.
So, how do we compute the c_i’s from our data?
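Once the coefficients are known, the score is a sum over the terms present in both the query and the document. A minimal sketch, assuming documents and queries are equal-length 0/1 lists and `weights` holds one c_i per term (the function name, parameter names, and numbers are hypothetical):

```python
def rsv(doc_vector, query_vector, weights):
    """Sum c_i over terms with x_i = q_i = 1 (matching terms only)."""
    return sum(
        c
        for x, q, c in zip(doc_vector, query_vector, weights)
        if x == 1 and q == 1
    )


# Made-up weights for a three-term vocabulary.
weights = [0.5, 2.0, 3.0]
score = rsv([1, 0, 1], [1, 1, 0], weights)  # only the first term matches
```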
15
Binary Independence Model
Estimating RSV coefficients.
For each term i look at the following table:
Estimates:
Add 0.5 to every expression
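A sketch of the estimate in Python, assuming the usual contingency counts: N documents in total, S of them relevant, n containing term i, and s both relevant and containing term i. These names are an assumption, since the slide's table is not reproduced here. With those counts, p_i ≈ s/S and r_i ≈ (n − s)/(N − S). One common reading of "add 0.5 to every expression" is to add 0.5 to each of the four cell counts, which keeps the logarithm finite when a count is zero.

```python
import math


def term_weight(s, S, n, N, k=0.5):
    """Estimate c_i = log[p_i (1 - r_i) / (r_i (1 - p_i))] from counts.

    k is added to each of the four cells of the contingency table.
    """
    rel_with = s + k                    # relevant, term present
    rel_without = S - s + k             # relevant, term absent
    nonrel_with = n - s + k             # non-relevant, term present
    nonrel_without = N - n - S + s + k  # non-relevant, term absent
    return math.log((rel_with / rel_without) / (nonrel_with / nonrel_without))


# Hypothetical counts: 100 documents, 10 relevant, term in 20, 9 of them relevant.
c_i = term_weight(s=9, S=10, n=20, N=100)
```

A term that is more frequent among relevant documents than among non-relevant ones gets a positive weight; a term that is less frequent there gets a negative one.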
16
PRP and BIR: The lessons
• Getting reasonable approximations of probabilities is possible.
• Simple methods work only with restrictive assumptions:
  – term independence
  – terms not in query do not affect the outcome
  – boolean representation of documents/queries
  – document relevance values are independent
• Some of these assumptions can be removed
17
Next: Binary Independence Indexing
18
Binary Independence Indexing vs. Binary Independence Retrieval
BIR
• Many Documents, One Query
• Bayesian Probability:
• Varies: document representation
• Constant: query (representation)
BII
• One Document, Many Queries
• Bayesian Probability
• Varies: query
• Constant: document
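The duality can be seen by applying Bayes' formula to the same quantity p(R|q,x) in two directions. The two lines below are a sketch under that reading, not the slide's own formulas: BIR inverts over the document representation x, BII inverts over the query q.

```latex
\[
\begin{aligned}
\text{BIR:}\quad p(R \mid q, x) &= \frac{p(x \mid q, R)\, p(R \mid q)}{p(x \mid q)} \\[4pt]
\text{BII:}\quad p(R \mid q, x) &= \frac{p(q \mid x, R)\, p(R \mid x)}{p(q \mid x)}
\end{aligned}
\]
```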
19
Binary Independence Indexing
• “Learning” from queries
  – More queries: better results
• p(q|x,R) - probability that if document x had been deemed relevant, query q had been asked
• The rest of the framework is similar to BIR
20
Binary Independence Indexing: Key Assumptions
• Term occurrence in queries is conditionally independent: depends only
• Relevance of document representation x w.r.t. query q depends only on the terms present in the query (q_i = 1)
• For each term i not used in representation x of document d (x_i = 0):
  – only positive occurrences of terms count
21
Binary Independence Indexing
constant
Equal to 1