Modern Information Retrieval, Chapter 2: Modeling
Probabilistic model: the appearance or absence of an index term in a document is interpreted as evidence either that the document is relevant or that it is irrelevant to a query. The model establishes a weight for each term.
A collection of $N$ documents, $R$ of which are relevant. $R_t$ of the relevant documents contain term $t$, and $f_t$ documents in total contain $t$. These values can be obtained from a training set with relevance judgments.
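The sketch below shows one way to tally the four counts from a training set. It assumes each judged document is a set of index terms paired with a boolean relevance label. The function name `count_statistics` and the toy data are hypothetical. Only the presence or absence of the term is recorded, not its frequency.

```python
from typing import Iterable, Set, Tuple

def count_statistics(judged: Iterable[Tuple[Set[str], bool]], term: str) -> Tuple[int, int, int, int]:
    """Return (N, R, R_t, f_t) for `term` over relevance-judged documents.

    judged: pairs of (set of index terms in the document, True if relevant).
    """
    n = r = r_t = f_t = 0
    for terms, relevant in judged:
        n += 1                      # every judged document counts toward N
        has_term = term in terms
        if relevant:
            r += 1                  # relevant documents: R
            if has_term:
                r_t += 1            # relevant documents containing t: R_t
        if has_term:
            f_t += 1                # all documents containing t: f_t
    return n, r, r_t, f_t

# Hypothetical toy training set: (index terms, relevance judgment)
training = [
    ({"retrieval", "model"}, True),
    ({"retrieval"}, True),
    ({"cooking"}, False),
    ({"retrieval", "cooking"}, False),
]
print(count_statistics(training, "retrieval"))  # (4, 2, 2, 3)
```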
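Computing probabilities, where $\bar{t}$ denotes the absence of $t$:
$$\Pr[\text{relevant} \mid t] = \frac{R_t}{f_t}, \qquad \Pr[\text{irrelevant} \mid t] = \frac{f_t - R_t}{f_t}$$
$$\Pr[\text{relevant} \mid \bar{t}] = \frac{R - R_t}{N - f_t}, \qquad \Pr[\text{irrelevant} \mid \bar{t}] = \frac{N - f_t - (R - R_t)}{N - f_t}$$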
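The sketch below computes the four conditional probabilities with exact fractions. The function name `term_probabilities` is hypothetical. It assumes $0 < f_t < N$; otherwise one of the denominators is zero and `Fraction` raises `ZeroDivisionError`.

```python
from fractions import Fraction

def term_probabilities(n: int, r: int, r_t: int, f_t: int) -> dict:
    """Pr[(ir)relevant | t present] and Pr[(ir)relevant | t absent]."""
    # f_t documents contain t; R_t of them are relevant.
    p_rel_t = Fraction(r_t, f_t)
    p_irr_t = Fraction(f_t - r_t, f_t)
    # N - f_t documents lack t; R - R_t of them are relevant.
    p_rel_not_t = Fraction(r - r_t, n - f_t)
    p_irr_not_t = Fraction(n - f_t - (r - r_t), n - f_t)
    return {"rel|t": p_rel_t, "irr|t": p_irr_t,
            "rel|not t": p_rel_not_t, "irr|not t": p_irr_not_t}

probs = term_probabilities(n=20, r=13, r_t=11, f_t=12)
# rel|t = 11/12, irr|t = 1/12, rel|not t = 1/4, irr|not t = 3/4
```

Each pair that shares the same condition sums to 1, which gives a quick sanity check on the counts.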
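Computing the weight $W_t$ for $t$:
$$W_t = \frac{\Pr[\text{relevant} \mid t]\,\Pr[\text{irrelevant} \mid \bar{t}]}{\Pr[\text{irrelevant} \mid t]\,\Pr[\text{relevant} \mid \bar{t}]} = \frac{\dfrac{R_t}{f_t} \cdot \dfrac{N - f_t - (R - R_t)}{N - f_t}}{\dfrac{f_t - R_t}{f_t} \cdot \dfrac{R - R_t}{N - f_t}} = \frac{R_t / (R - R_t)}{(f_t - R_t) / (N - f_t - (R - R_t))}$$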
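The final simplification works because the $f_t$ and $N - f_t$ denominators cancel. The sketch below checks this by computing $W_t$ both ways with exact fractions. The function names are hypothetical, and no smoothing is applied. It assumes the counts keep every denominator nonzero.

```python
from fractions import Fraction

def weight_from_probabilities(n: int, r: int, r_t: int, f_t: int) -> Fraction:
    # Odds of relevance when t is present, divided by the odds when t is absent.
    p_rel_t = Fraction(r_t, f_t)
    p_irr_t = Fraction(f_t - r_t, f_t)
    p_rel_not_t = Fraction(r - r_t, n - f_t)
    p_irr_not_t = Fraction(n - f_t - (r - r_t), n - f_t)
    return (p_rel_t * p_irr_not_t) / (p_irr_t * p_rel_not_t)

def weight_from_counts(n: int, r: int, r_t: int, f_t: int) -> Fraction:
    # Simplified form once the f_t and (N - f_t) denominators have cancelled.
    return Fraction(r_t, r - r_t) / Fraction(f_t - r_t, n - f_t - (r - r_t))

for counts in [(20, 13, 11, 12), (20, 13, 4, 7)]:
    assert weight_from_probabilities(*counts) == weight_from_counts(*counts)
```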
$W_t > 1$ indicates that the appearance of $t$ supports the document being relevant. $W_t < 1$ indicates that the appearance of $t$ suggests the document is irrelevant. $W_t = 1$ indicates that $t$ is neutral.
For example, with $N = 20$, $R = 13$, $R_t = 11$, $f_t = 12$, the weight is $W_t = 33$. With $N = 20$, $R = 13$, $R_t = 4$, $f_t = 7$, the weight is $W_t \approx 0.59$.
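The two example weights follow from substituting the counts into the simplified form of $W_t$:

```latex
\begin{aligned}
N=20,\ R=13,\ R_t=11,\ f_t=12:&\quad
W_t = \frac{11/(13-11)}{(12-11)/\bigl(20-12-(13-11)\bigr)} = \frac{11/2}{1/6} = 33 \\
N=20,\ R=13,\ R_t=4,\ f_t=7:&\quad
W_t = \frac{4/(13-4)}{(7-4)/\bigl(20-7-(13-4)\bigr)} = \frac{4/9}{3/4} = \frac{16}{27} \approx 0.59
\end{aligned}
```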
Taking the logarithm of $W_t$ as the weight, a negative weight indicates that the document is predicted to be irrelevant, and a zero weight indicates that it is neutral.
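This sketch reads these signs as the logarithm of $W_t$. That reading is an assumption, but it fits $W_t = 1$ being neutral, since $\log 1 = 0$. The sketch uses natural logarithms; any base only rescales the weights and leaves their signs unchanged. The function name `log_weight` is hypothetical, and the third set of counts is a made-up neutral case.

```python
import math

def log_weight(n: int, r: int, r_t: int, f_t: int) -> float:
    # Natural log of W_t: > 0 favours relevance, < 0 favours irrelevance, 0 is neutral.
    w = (r_t / (r - r_t)) / ((f_t - r_t) / (n - f_t - (r - r_t)))
    return math.log(w)

# Slide examples plus a hypothetical neutral case (W_t = 1).
for counts in [(20, 13, 11, 12), (20, 13, 4, 7), (20, 10, 5, 10)]:
    print(counts, round(log_weight(*counts), 2))
# (20, 13, 11, 12) 3.5
# (20, 13, 4, 7) -0.52
# (20, 10, 5, 10) 0.0
```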
Comparison: the Boolean model is the weakest model, since it offers no partial matching. The vector model and the probabilistic model are comparable, although the vector model is more popular. Term frequency is not considered in the probabilistic model.