Part 5: Language Model
CSE717, Spring 2008, CUBS, University at Buffalo
Examples of Good & Bad Language Models
Excerpt from Herman, a comic strip by Jim Unger (comic panels not reproduced in this transcript)
What’s a Language Model?
A language model is a probability distribution over word sequences, e.g.:
P(“And nothing but the truth”) ≈ 0.001
P(“And nuts sing on the roof”) ≈ 0
What’s a language model for?
Speech recognition, handwriting recognition, spelling correction, optical character recognition, machine translation (and anyone else doing statistical modeling over word sequences).
The Equation
The observation can be image features (handwriting recognition), acoustics (speech recognition), a word sequence in another language (MT), etc.
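The equation itself appears on the slide only as an image and does not survive in the extracted text; the following is a sketch of the standard noisy-channel decision rule such slides usually present, with O the observation, W the word sequence, and P(W) the language model.

```latex
% Reconstructed sketch of the standard decision rule (not from the extracted
% text): pick the word sequence W that best explains the observation O.
\begin{align*}
\hat{W} \;=\; \arg\max_{W} P(W \mid O)
        \;=\; \arg\max_{W} \frac{P(O \mid W)\,P(W)}{P(O)}
        \;=\; \arg\max_{W} P(O \mid W)\,P(W)
\end{align*}
```

The denominator P(O) does not depend on W, which is why it can be dropped from the maximization.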
How Language Models Work
P(“And nothing but the truth”) is hard to compute directly, so decompose the probability with the chain rule:
P(“and nothing but the truth”) = P(“and”) P(“nothing” | “and”) P(“but” | “and nothing”) P(“the” | “and nothing but”) P(“truth” | “and nothing but the”)
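A minimal Python sketch of this decomposition; cond_prob is a hypothetical stand-in (not from the slides) for any estimator of P(word | history).

```python
# Chain rule: P(w1..wn) = prod_i P(w_i | w_1..w_{i-1}).
def sentence_prob(words, cond_prob):
    prob = 1.0
    for i, word in enumerate(words):
        history = tuple(words[:i])        # everything seen so far
        prob *= cond_prob(word, history)
    return prob

# P("and nothing but the truth") = P(and) * P(nothing|and) * P(but|and nothing)
#   * P(the|and nothing but) * P(truth|and nothing but the)
sentence = "and nothing but the truth".split()
```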
The Trigram Approximation
Assume each word depends only on the previous two words:
P(“the” | “and nothing but”) ≈ P(“the” | “nothing but”)
P(“truth” | “and nothing but the”) ≈ P(“truth” | “but the”)
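In code, the approximation is just a truncation of the history to its last two words; trigram_prob is again a hypothetical estimator, not part of the original slides.

```python
# Trigram (second-order Markov) assumption: condition on at most the
# two previous words instead of the full history.
def trigram_sentence_prob(words, trigram_prob):
    prob = 1.0
    for i, word in enumerate(words):
        history = tuple(words[max(0, i - 2):i])   # last two words only
        prob *= trigram_prob(word, history)
    return prob
```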
How to Find the Probabilities?
Count from real text:
Pr(“the” | “nothing but”) ≈ c(“nothing but the”) / c(“nothing but”)
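A small sketch of this counting estimate, using the example sentence from the perplexity slide below as a toy corpus (a real model would of course be trained on far more text).

```python
from collections import Counter

# Maximum-likelihood estimate from counts:
# Pr("the" | "nothing but") ~ c("nothing but the") / c("nothing but")
corpus = "the whole truth and nothing but the truth".split()   # toy corpus

trigram_counts = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigram_counts = Counter(zip(corpus, corpus[1:]))

def mle_trigram_prob(z, x, y):
    """Estimate Pr(z | x y) by relative frequency."""
    if bigram_counts[(x, y)] == 0:
        return 0.0                        # history never seen
    return trigram_counts[(x, y, z)] / bigram_counts[(x, y)]

print(mle_trigram_prob("the", "nothing", "but"))   # 1.0 in this tiny corpus
```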
Evaluation
How can you tell a good language model from a bad one? Run a speech recognizer (or your application of choice) and calculate the word error rate. Drawbacks: it is slow and specific to your recognizer.
Perplexity: An Example
Data: “the whole truth and nothing but the truth”
Lexicon: L = {the, whole, truth, and, nothing, but}
Model 1: unigram, Pr(L_1) = … = Pr(L_6) = 1/6
Model 2: unigram, Pr(“the”) = Pr(“truth”) = 1/4, Pr(“whole”) = Pr(“and”) = Pr(“nothing”) = Pr(“but”) = 1/8
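A worked computation of the two models’ perplexities on this 8-word sample; the probability assignments come straight from the slide, and PP = P(data)^(-1/N) is the standard perplexity definition.

```python
import math

data = "the whole truth and nothing but the truth".split()   # N = 8 words

model1 = {w: 1 / 6 for w in ["the", "whole", "truth", "and", "nothing", "but"]}
model2 = {"the": 1 / 4, "truth": 1 / 4,
          "whole": 1 / 8, "and": 1 / 8, "nothing": 1 / 8, "but": 1 / 8}

def perplexity(model, words):
    # PP = exp( -(1/N) * sum_i log P(w_i) ) for a unigram model
    log_prob = sum(math.log(model[w]) for w in words)
    return math.exp(-log_prob / len(words))

print(perplexity(model1, data))   # 6.0
print(perplexity(model2, data))   # 2**2.5 ~ 5.66
```

Model 2 gives the lower perplexity because it matches the empirical word frequencies of the sample (“the” and “truth” each occur 2/8 of the time, the rest 1/8).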
Perplexity: Is Lower Better?
Remarkable fact: the “true” model for the data has the lowest possible perplexity, so the lower the perplexity, the closer we are to the true model. Perplexity correlates well with the error rate of the recognition task; it correlates better when both models are trained on the same data, and poorly when the training data changes.
Smoothing
The count-based estimates are terrible on test data: if there are no occurrences of xyz, i.e. C(xyz) = 0, the estimated probability is 0, and a single zero such as P(“sing” | “nuts”) = 0 leads to infinite perplexity!
Smoothing: Add-One
Add-one smoothing and add-delta smoothing; the formulas on the slide do not survive in this transcript (standard forms are reconstructed below). Simple add-one smoothing does not perform well: the probability of rarely seen events is over-estimated.
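Assuming a vocabulary of size V (a symbol not in the original text), the standard forms of the two smoothing rules are:

```latex
% Reconstructed standard add-one and add-delta smoothing of the trigram
% estimate; V is the vocabulary size (assumed notation).
\begin{align*}
P_{\text{add-one}}(z \mid x\,y)    &= \frac{C(xyz) + 1}{C(xy) + V} \\[4pt]
P_{\text{add-}\delta}(z \mid x\,y) &= \frac{C(xyz) + \delta}{C(xy) + \delta V}
\end{align*}
```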
Smoothing: Simple Interpolation
Interpolate the trigram, bigram, and unigram estimates for the best combination. Almost good enough.
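A sketch of what interpolating the three estimates means; the weights λ_i (notation assumed, not from the slide) sum to one and are typically tuned on held-out data.

```latex
% Linear interpolation of trigram, bigram, and unigram estimates
% (sketch; the lambda_i are tuning weights, assumed notation).
\[
P_{\text{interp}}(z \mid x\,y)
  = \lambda_3\, P(z \mid x\,y) + \lambda_2\, P(z \mid y) + \lambda_1\, P(z),
\qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
\]
```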
Smoothing: Redistribution of Probability Mass (Backing Off) [Katz87]
Discount the observed n-gram counts, then redistribute the discounted probability mass over the (n-1)-gram model.
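A hedged reconstruction of the back-off scheme the slide diagrams: a discounted estimate P* for seen trigrams, with the freed mass redistributed over the bigram model via a normalizer α(xy) (the diagram and equations themselves are not in the extracted text).

```latex
% Katz-style back-off (reconstructed sketch): use the discounted estimate
% P* when the trigram was observed, otherwise back off to the bigram model
% with weight alpha(xy) chosen so the probabilities sum to one.
\[
P_{\text{katz}}(z \mid x\,y) =
\begin{cases}
  P^{*}(z \mid x\,y) & \text{if } C(xyz) > 0,\\[2pt]
  \alpha(x\,y)\, P_{\text{katz}}(z \mid y) & \text{otherwise.}
\end{cases}
\]
```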
Linear Discount
The discount factor can be determined by the relative frequency of singletons, i.e., events observed exactly once in the data [Ney95].
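A sketch of the linear discount with the singleton-based factor described here; n_1 (the number of events observed exactly once) and N (the total number of observed events) are assumed notation, not from the slide.

```latex
% Linear discounting (sketch): shrink every observed relative frequency by
% a constant factor lambda, estimated from the relative frequency of
% singletons as suggested by [Ney95].
\[
P_{\text{lin}}(z \mid x\,y) = (1 - \lambda)\,\frac{C(xyz)}{C(xy)},
\qquad \lambda \approx \frac{n_1}{N}
\]
```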
More General Formulation
Drawback of the linear discount: the counts of frequently observed events are modified the most, which goes against the “law of large numbers.”
Generalization: make the discount a function of y, determined by cross-validation. This requires more data, and the computation is expensive.
Absolute Discounting
The discount is an absolute value subtracted from each count. It works pretty well and is easier than linear discounting.
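A sketch of absolute discounting in the same notation: a fixed amount D (typically between 0 and 1) is subtracted from every nonzero count, and the freed mass again goes to the lower-order model via a normalizer α(xy); the exact formula is reconstructed, not taken from the slide.

```latex
% Absolute discounting (sketch): subtract a constant D from each nonzero
% count instead of scaling by a factor; the leftover mass is redistributed
% over the lower-order (bigram) model.
\[
P_{\text{abs}}(z \mid x\,y)
  = \frac{\max\!\bigl(C(xyz) - D,\; 0\bigr)}{C(xy)}
  + \alpha(x\,y)\, P(z \mid y)
\]
```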
References
[1] Katz, S., “Estimation of probabilities from sparse data for the language model component of a speech recognizer,” IEEE Trans. on Acoustics, Speech, and Signal Processing 35(3):400-401, 1987.
[2] Ney, H., Essen, U., Kneser, R., “On the estimation of ‘small’ probabilities by leaving-one-out,” IEEE Trans. on PAMI 17(12):1202-1212, 1995.
[3] Goodman, J., “A tutorial on language modeling: The State of the Art in Language Modeling,” research.microsoft.com/~joshuago/lm-tutorial-public.ppt