Formal Multinomial and Multiple-Bernoulli Language Models
Don Metzler
Overview
Two formal estimation techniques
– MAP estimates [Zaragoza, Hiemstra, Tipping, SIGIR'03]
– Posterior expectations
Language models considered
– Multinomial
– Multiple-Bernoulli (2 models)
Bayesian Framework (MAP Estimation)
Assume textual data X (document, query, etc.) is generated by sampling from some distribution P(X | θ) parameterized by θ
Assume some prior P(θ) over θ
For each X, we want to find the maximum a posteriori (MAP) estimate θ_X (see below)
θ_X is our (language) model for the data X
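A compact statement of the MAP estimate (this is standard Bayes' rule, included here for reference):

```latex
\theta_X
= \arg\max_{\theta} P(\theta \mid X)
= \arg\max_{\theta} \frac{P(X \mid \theta)\, P(\theta)}{P(X)}
= \arg\max_{\theta} P(X \mid \theta)\, P(\theta)
```

The evidence P(X) does not depend on θ, so it can be dropped from the maximization.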
Multinomial
Modeling assumptions:
– Each X is a sequence of word tokens sampled from a multinomial distribution parameterized by θ
– Use the conjugate prior (Dirichlet); the standard forms are sketched below
Why Dirichlet?
– Conjugate prior to the multinomial
– Easy to work with
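A sketch of these assumptions in their usual form (with c(w, X) the count of word w in X); this is the standard multinomial likelihood and Dirichlet prior rather than a formula copied from this deck:

```latex
P(X \mid \theta) \propto \prod_{w \in V} \theta_w^{\,c(w,X)},
\qquad
P(\theta) = \mathrm{Dirichlet}(\theta \mid \alpha) \propto \prod_{w \in V} \theta_w^{\,\alpha_w - 1}
```

Conjugacy means the posterior P(θ | X) is again a Dirichlet, with parameters α_w + c(w, X).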
Multinomial
How do we set α?
– α = 1 => uniform prior => ML estimate
– α = 2 => Laplacian smoothing
– Dirichlet-like smoothing (one form is sketched below)
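For the Dirichlet-multinomial pair above, the MAP estimate has the standard closed form (with N = |X| the length of X):

```latex
\hat{\theta}^{\mathrm{MAP}}_w = \frac{c(w,X) + \alpha_w - 1}{N + \sum_{w'} (\alpha_{w'} - 1)}
```

With α_w = 1 this is the ML estimate c(w,X)/N; with α_w = 2 it is Laplace smoothing. Taking α_w = 1 + μ P(w | C), presumably the "Dirichlet-like" choice intended here, gives the familiar Dirichlet-smoothed estimate (c(w,X) + μ P(w | C)) / (N + μ).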
[Figure: X = A B B B, P(A | C) = 0.45, P(B | C) = 0.55. Left: ML estimate (α = 1); center: Laplace (α = 2); right: α = μP(w | C) with μ = 10.]
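A small Python sketch reproducing these three estimates for the example above; the variable names are illustrative and a two-word vocabulary {A, B} is assumed:

```python
from collections import Counter

X = ["A", "B", "B", "B"]
vocab = ["A", "B"]                 # two-word vocabulary assumed for this example
p_coll = {"A": 0.45, "B": 0.55}    # collection model P(w | C)
mu = 10

counts = Counter(X)
N = len(X)

ml        = {w: counts[w] / N for w in vocab}                            # alpha = 1
laplace   = {w: (counts[w] + 1) / (N + len(vocab)) for w in vocab}       # alpha = 2
dirichlet = {w: (counts[w] + mu * p_coll[w]) / (N + mu) for w in vocab}  # alpha_w = 1 + mu * P(w | C)

print(ml)         # {'A': 0.25, 'B': 0.75}
print(laplace)    # {'A': 0.333..., 'B': 0.666...}
print(dirichlet)  # {'A': 0.392..., 'B': 0.607...}
```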
Multiple-Bernoulli
Assume vocabulary V = A B C D. How do we model text X = D B B D?
– In the multinomial model, we represent X as the sequence D B B D
– In multiple-Bernoulli, we represent X as the vector [0 1 0 1], denoting that terms B and D occur in X
– Each X is represented by a single binary vector
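A minimal Python sketch of the two representations for this example (the variable names are illustrative):

```python
V = ["A", "B", "C", "D"]
X = ["D", "B", "B", "D"]

multinomial_repr = X                        # the word sequence itself: D B B D
bernoulli_repr = [int(v in X) for v in V]   # [0, 1, 0, 1]: terms B and D occur in X
```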
Multiple-Bernoulli (Model A)
Modeling assumptions:
– Each X is a single sample from a multiple-Bernoulli distribution parameterized by θ
– Use conjugate prior (multiple-Beta)
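These assumptions presumably correspond to the standard multiple-Bernoulli likelihood with an independent Beta prior per term (a sketch, with x_w the indicator that term w occurs in X):

```latex
P(X \mid \theta) = \prod_{w \in V} \theta_w^{\,x_w} (1 - \theta_w)^{\,1 - x_w},
\qquad
P(\theta) = \prod_{w \in V} \mathrm{Beta}(\theta_w \mid \alpha_w, \beta_w)
```

By conjugacy, the posterior over each θ_w is again a Beta distribution.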
Multiple-Bernoulli (Model A)
Problems with Model A
Ignores document length
– This may be desirable in some applications
Ignores term frequencies
How to solve this?
– Model X as a collection of samples (one per word occurrence) from an underlying multiple-Bernoulli distribution
– Example: V = A B C D, X = B D D B
– Representation: { [0 1 0 0], [0 0 0 1], [0 0 0 1], [0 1 0 0] }
Multiple-Bernoulli (Model B)
Modeling assumptions:
– Each X is a collection (multiset) of indicator vectors sampled from a multiple-Bernoulli distribution parameterized by θ
– Use conjugate prior (multiple-Beta)
Multiple-Bernoulli (Model B)
How do we set α, β?
– α = β = 1 => uniform prior => ML estimate
But we want smoothed probabilities…
– One possibility (sketched below)
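One natural choice, by analogy with Dirichlet smoothing for the multinomial (an assumption; the exact formula is not shown here): set α_w = 1 + μ P(w | C) and β_w = 1 + μ (1 − P(w | C)). With n samples in the collection and c(w, X) occurrences of term w, the MAP estimate then becomes

```latex
\hat{\theta}_w = \frac{c(w,X) + \mu\, P(w \mid C)}{n + \mu}
```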
Multiple-Bernoulli Model B
[Figure: X = A B B B, P(A | C) = 0.45, P(B | C) = 0.55. Left: ML estimate (α = β = 1); center: smoothed (μ = 1); right: smoothed (μ = 10).]
Another approach…
Another way to formally estimate language models is the expectation over the posterior, θ_X = E[θ | X] = ∫ θ P(θ | X) dθ
– Takes more uncertainty into account than the MAP estimate
– Because we chose to use conjugate priors, the integral can be evaluated analytically
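For the Dirichlet-multinomial case this posterior expectation has a standard closed form:

```latex
\hat{\theta}_w = \mathbb{E}[\theta_w \mid X] = \frac{c(w,X) + \alpha_w}{N + \sum_{w'} \alpha_{w'}}
```

The multiple-Bernoulli case is analogous: E[θ_w | X] = (c(w,X) + α_w) / (n + α_w + β_w).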
Multinomial / Multiple-Bernoulli Connection
[Table: multinomial estimate vs. multiple-Bernoulli estimate, and Dirichlet smoothing.]
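A plausible reconstruction of the connection (it follows from the hyperparameter choices sketched earlier, but the exact content of this comparison is not shown): with α_w tied to μ P(w | C), both the multinomial and the Model B multiple-Bernoulli estimates reduce to the same Dirichlet-smoothed form

```latex
\hat{\theta}_w = \frac{c(w, D) + \mu\, P(w \mid C)}{|D| + \mu}
```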
Bayesian Framework (Ranking)
Query likelihood
– Estimate a model θ_D for each document D
– Score document D by P(Q | θ_D)
– Measures the likelihood of observing query Q given model θ_D
KL-divergence
– Estimate a model for both the query and the document
– Score document D by KL(θ_Q || θ_D)
– Measures the "distance" between the two models
Predictive density
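A short Python sketch of query-likelihood scoring with Dirichlet-smoothed estimates; the function and parameter names are illustrative, not taken from the talk:

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, coll_prob, mu=1000):
    """Return log P(Q | theta_D) under a Dirichlet-smoothed multinomial
    document model; coll_prob maps each word w to P(w | C) and is assumed
    to give nonzero probability to every query word."""
    counts = Counter(doc)
    n = len(doc)
    score = 0.0
    for w in query:
        p = (counts[w] + mu * coll_prob[w]) / (n + mu)
        score += math.log(p)
    return score

# Rank documents by descending query log-likelihood:
# ranked = sorted(docs, key=lambda d: query_log_likelihood(q, d, coll_prob), reverse=True)
```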
Results
Conclusions
– Both estimation and smoothing can be achieved using Bayesian estimation techniques
– Little difference between MAP and posterior-expectation estimates; mostly depends on μ
– Not much difference between the multinomial and multiple-Bernoulli language models
– Scoring the multinomial is cheaper
– No good reason to choose multiple-Bernoulli over multinomial in general