Formal Multinomial and Multiple-Bernoulli Language Models
Don Metzler
Overview
Two formal estimation techniques
– MAP estimates [Zaragoza, Hiemstra, Tipping, SIGIR'03]
– Posterior expectations
Language models considered
– Multinomial
– Multiple-Bernoulli (2 models)
Bayesian Framework (MAP Estimation)
Assume textual data X (document, query, etc.) is generated by sampling from some distribution P(X | θ) parameterized by θ
Assume some prior P(θ) over θ
For each X, we want to find the maximum a posteriori (MAP) estimate θ_X (see below)
θ_X is our (language) model for the data X
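A compact statement of the MAP estimate (this is standard Bayes' rule, included here for reference):

```latex
\theta_X
= \arg\max_{\theta} P(\theta \mid X)
= \arg\max_{\theta} \frac{P(X \mid \theta)\, P(\theta)}{P(X)}
= \arg\max_{\theta} P(X \mid \theta)\, P(\theta)
```

The evidence P(X) does not depend on θ, so it can be dropped from the maximization.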
Multinomial
Modeling assumptions:
– Each X is a sequence of word tokens sampled from a multinomial distribution parameterized by θ
– Use the conjugate prior (Dirichlet); the standard forms are sketched below
Why Dirichlet?
– Conjugate prior to the multinomial
– Easy to work with
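A sketch of these assumptions in their usual form (with c(w, X) the count of word w in X); this is the standard multinomial likelihood and Dirichlet prior rather than a formula copied from this deck:

```latex
P(X \mid \theta) \propto \prod_{w \in V} \theta_w^{\,c(w,X)},
\qquad
P(\theta) = \mathrm{Dirichlet}(\theta \mid \alpha) \propto \prod_{w \in V} \theta_w^{\,\alpha_w - 1}
```

Conjugacy means the posterior P(θ | X) is again a Dirichlet, with parameters α_w + c(w, X).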
Multinomial
How do we set α?
– α = 1 => uniform prior => ML estimate
– α = 2 => Laplacian smoothing
– Dirichlet-like smoothing (one form is sketched below)
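For the Dirichlet-multinomial pair above, the MAP estimate has the standard closed form (with N = |X| the length of X):

```latex
\hat{\theta}^{\mathrm{MAP}}_w = \frac{c(w,X) + \alpha_w - 1}{N + \sum_{w'} (\alpha_{w'} - 1)}
```

With α_w = 1 this is the ML estimate c(w,X)/N; with α_w = 2 it is Laplace smoothing. Taking α_w = 1 + μ P(w | C), presumably the "Dirichlet-like" choice intended here, gives the familiar Dirichlet-smoothed estimate (c(w,X) + μ P(w | C)) / (N + μ).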
[Figure: X = A B B B, P(A | C) = 0.45, P(B | C) = 0.55. Left: ML estimate (α = 1); center: Laplace (α = 2); right: α = μP(w | C) with μ = 10.]
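A small Python sketch reproducing these three estimates for the example above; the variable names are illustrative and a two-word vocabulary {A, B} is assumed:

```python
from collections import Counter

X = ["A", "B", "B", "B"]
vocab = ["A", "B"]                 # two-word vocabulary assumed for this example
p_coll = {"A": 0.45, "B": 0.55}    # collection model P(w | C)
mu = 10

counts = Counter(X)
N = len(X)

ml        = {w: counts[w] / N for w in vocab}                            # alpha = 1
laplace   = {w: (counts[w] + 1) / (N + len(vocab)) for w in vocab}       # alpha = 2
dirichlet = {w: (counts[w] + mu * p_coll[w]) / (N + mu) for w in vocab}  # alpha_w = 1 + mu * P(w | C)

print(ml)         # {'A': 0.25, 'B': 0.75}
print(laplace)    # {'A': 0.333..., 'B': 0.666...}
print(dirichlet)  # {'A': 0.392..., 'B': 0.607...}
```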
Multiple-Bernoulli
Assume vocabulary V = A B C D. How do we model text X = D B B D?
– In the multinomial model, we represent X as the sequence D B B D
– In multiple-Bernoulli, we represent X as the vector [0 1 0 1], denoting that terms B and D occur in X
– Each X is represented by a single binary vector
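A minimal Python sketch of the two representations for this example (the variable names are illustrative):

```python
V = ["A", "B", "C", "D"]
X = ["D", "B", "B", "D"]

multinomial_repr = X                        # the word sequence itself: D B B D
bernoulli_repr = [int(v in X) for v in V]   # [0, 1, 0, 1]: terms B and D occur in X
```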
Multiple-Bernoulli (Model A)
Modeling assumptions:
– Each X is a single sample from a multiple-Bernoulli distribution parameterized by θ
– Use conjugate prior (multiple-Beta)
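These assumptions presumably correspond to the standard multiple-Bernoulli likelihood with an independent Beta prior per term (a sketch, with x_w the indicator that term w occurs in X):

```latex
P(X \mid \theta) = \prod_{w \in V} \theta_w^{\,x_w} (1 - \theta_w)^{\,1 - x_w},
\qquad
P(\theta) = \prod_{w \in V} \mathrm{Beta}(\theta_w \mid \alpha_w, \beta_w)
```

By conjugacy, the posterior over each θ_w is again a Beta distribution.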
Multiple-Bernoulli (Model A)
Problems with Model A
Ignores document length
– This may be desirable in some applications
Ignores term frequencies
How to solve this?
– Model X as a collection of samples (one per word occurrence) from an underlying multiple-Bernoulli distribution
– Example: V = A B C D, X = B D D B
– Representation: { [0 1 0 0], [0 0 0 1], [0 0 0 1], [0 1 0 0] }
Multiple-Bernoulli (Model B)
Modeling assumptions:
– Each X is a collection (multiset) of indicator vectors sampled from a multiple-Bernoulli distribution parameterized by θ
– Use conjugate prior (multiple-Beta)
Multiple-Bernoulli (Model B)
How do we set α, β?
– α = β = 1 => uniform prior => ML estimate
But we want smoothed probabilities…
– One possibility (sketched below)
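One natural choice, by analogy with Dirichlet smoothing for the multinomial (an assumption; the exact formula is not shown here): set α_w = 1 + μ P(w | C) and β_w = 1 + μ (1 − P(w | C)). With n samples in the collection and c(w, X) occurrences of term w, the MAP estimate then becomes

```latex
\hat{\theta}_w = \frac{c(w,X) + \mu\, P(w \mid C)}{n + \mu}
```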
Multiple-Bernoulli Model B
[Figure: X = A B B B, P(A | C) = 0.45, P(B | C) = 0.55. Left: ML estimate (α = β = 1); center: smoothed (μ = 1); right: smoothed (μ = 10).]
Another approach…
Another way to formally estimate language models is the expectation over the posterior, θ_X = E[θ | X] = ∫ θ P(θ | X) dθ
– Takes more uncertainty into account than the MAP estimate
– Because we chose to use conjugate priors, the integral can be evaluated analytically
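For the Dirichlet-multinomial case this posterior expectation has a standard closed form:

```latex
\hat{\theta}_w = \mathbb{E}[\theta_w \mid X] = \frac{c(w,X) + \alpha_w}{N + \sum_{w'} \alpha_{w'}}
```

The multiple-Bernoulli case is analogous: E[θ_w | X] = (c(w,X) + α_w) / (n + α_w + β_w).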
Multinomial / Multiple-Bernoulli Connection
[Table: multinomial estimate vs. multiple-Bernoulli estimate, and Dirichlet smoothing.]
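A plausible reconstruction of the connection (it follows from the hyperparameter choices sketched earlier, but the exact content of this comparison is not shown): with α_w tied to μ P(w | C), both the multinomial and the Model B multiple-Bernoulli estimates reduce to the same Dirichlet-smoothed form

```latex
\hat{\theta}_w = \frac{c(w, D) + \mu\, P(w \mid C)}{|D| + \mu}
```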
Bayesian Framework (Ranking)
Query likelihood
– Estimate a model θ_D for each document D
– Score document D by P(Q | θ_D)
– Measures the likelihood of observing query Q given model θ_D
KL-divergence
– Estimate a model for both the query and the document
– Score document D by KL(θ_Q || θ_D)
– Measures the "distance" between the two models
Predictive density
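A short Python sketch of query-likelihood scoring with Dirichlet-smoothed estimates; the function and parameter names are illustrative, not taken from the talk:

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, coll_prob, mu=1000):
    """Return log P(Q | theta_D) under a Dirichlet-smoothed multinomial
    document model; coll_prob maps each word w to P(w | C) and is assumed
    to give nonzero probability to every query word."""
    counts = Counter(doc)
    n = len(doc)
    score = 0.0
    for w in query:
        p = (counts[w] + mu * coll_prob[w]) / (n + mu)
        score += math.log(p)
    return score

# Rank documents by descending query log-likelihood:
# ranked = sorted(docs, key=lambda d: query_log_likelihood(q, d, coll_prob), reverse=True)
```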
Results
Conclusions
– Both estimation and smoothing can be achieved using Bayesian estimation techniques
– Little difference between MAP and posterior-expectation estimates; mostly depends on μ
– Not much difference between the multinomial and multiple-Bernoulli language models
– Scoring the multinomial is cheaper
– No good reason to choose multiple-Bernoulli over multinomial in general