Lecture 7: HMMs – the Three Problems; the Forward Algorithm
CSCE Natural Language Processing. Topics: Overview. Readings: Chapter 6. February 6, 2013
Overview
Last Time: tagging; Markov chains; Hidden Markov Models; NLTK book, Chapter 5 (tagging)
Today: Viterbi dynamic-programming calculation; Noam Chomsky on YouTube; smoothing revisited – dealing with zeroes (Laplace, Good-Turing)
Katz Backoff
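As a refresher on the smoothing items above, here is a minimal add-one (Laplace) sketch for bigram probabilities. The toy corpus and the `laplace_bigram_prob` helper are illustrative, not from the lecture; Good-Turing and Katz backoff replace the simple +1 adjustment with count re-estimation and backing off to lower-order models.

```python
from collections import Counter

def laplace_bigram_prob(bigrams, unigrams, vocab_size, w_prev, w):
    """Add-one (Laplace) estimate: (C(w_prev, w) + 1) / (C(w_prev) + V)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

# Toy corpus, for illustration only.
tokens = "the cat sat on the mat".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(unigrams)

print(laplace_bigram_prob(bigrams, unigrams, V, "the", "cat"))  # seen bigram
print(laplace_bigram_prob(bigrams, unigrams, V, "the", "dog"))  # unseen bigram still gets a small nonzero probability
```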
Back to Tagging – the Brown Tagset
In 1967, Kučera and Francis published their classic work Computational Analysis of Present-Day American English; the tags were added later, around 1979. The corpus contains 500 texts of roughly 2,000 words each. Zipf's Law: "the frequency of the n-th most frequent word is roughly proportional to 1/n". Newer, larger corpora run to about 100 million words, e.g. the Corpus of Contemporary American English, the British National Corpus, and the International Corpus of English.
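Zipf's law can be checked empirically on any corpus by ranking words by frequency: under the law, frequency × rank stays roughly constant. A minimal sketch using NLTK's copy of the Brown corpus (assumes the `nltk` package and the `brown` corpus data are installed; not code from the lecture):

```python
import nltk
from collections import Counter
from nltk.corpus import brown

# nltk.download("brown")  # uncomment on first run to fetch the corpus data
counts = Counter(w.lower() for w in brown.words())
ranked = counts.most_common()

# Under Zipf's law, freq * rank should be roughly constant across ranks.
for rank in (1, 10, 100, 1000):
    word, freq = ranked[rank - 1]
    print(f"rank {rank:>5}  {word:<10}  freq {freq:>7}  freq*rank {freq * rank}")
```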
Figure 5.4 – pronouns in CELEX; counts from the COBUILD 16-million-word corpus
Figure 5.6 Penn Treebank Tagset
Figure 5.7
Figure 5.7 continued
Figure 5.8
Figure 5.10
5.5.4 Extending HMM to Trigrams
Find the best tag sequence via Bayes' rule and the Markov assumption, then extend the tag-transition model to trigrams (reconstruction below).
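The slide's equations did not survive in this transcript; the following reconstructs the standard HMM tagging derivation from the textbook (Bayes' rule, then the Markov and word-independence assumptions, then the trigram extension of the tag-transition model), not a verbatim copy of the slide:

```latex
\begin{align*}
\hat{t}_1^{n} &= \operatorname*{argmax}_{t_1^{n}} P(t_1^{n} \mid w_1^{n})
               = \operatorname*{argmax}_{t_1^{n}} P(w_1^{n} \mid t_1^{n})\, P(t_1^{n})
               && \text{(Bayes' rule; denominator dropped)} \\
              &\approx \operatorname*{argmax}_{t_1^{n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
               && \text{(bigram Markov assumption)} \\
              &\approx \operatorname*{argmax}_{t_1^{n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1}, t_{i-2})
               && \text{(extended to trigrams)}
\end{align*}
```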
Chapter 6 – HMM formalism revisited
Markov – Output Independence
Markov Assumption: P(qi | q1 … qi−1) = P(qi | qi−1)
Output Independence: P(ot | q1 … qT, o1 … ot−1, ot+1 … oT) = P(ot | qt)   (Eq 6.7)
Figure 6.2 initial probabilities
Figure 6.3 – example Markov chain; probability of a sequence
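Figure 6.3's sequence probability is just the initial probability times a product of transition probabilities. A minimal sketch, assuming a two-state HOT/COLD weather chain with placeholder numbers (the actual values are in the figure, not in this transcript):

```python
# Transition probabilities P(next | current); placeholder values, not the ones in Figure 6.3.
trans = {
    "HOT":  {"HOT": 0.6, "COLD": 0.4},
    "COLD": {"HOT": 0.3, "COLD": 0.7},
}
initial = {"HOT": 0.5, "COLD": 0.5}   # pi: initial state distribution

def sequence_prob(states):
    """P(q1, q2, ..., qT) = pi(q1) * product over t of P(q_t | q_{t-1})."""
    p = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev][cur]
    return p

print(sequence_prob(["HOT", "HOT", "COLD"]))  # 0.5 * 0.6 * 0.4 = 0.12
```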
Figure 6.4 – zero-probability links (Bakis left-to-right model for temporal problems)
HMMs – The Three Problems
Problem 1 (Likelihood): given an HMM λ = (A, B) and an observation sequence O, compute P(O | λ) – the Forward algorithm.
Problem 2 (Decoding): given λ and O, find the best hidden state sequence Q – the Viterbi algorithm.
Problem 3 (Learning): given O and the set of states, learn the HMM parameters A and B – the Forward-Backward (Baum-Welch) algorithm.
Likelihood Computation – The Forward Algorithm
Computing Likelihood: given an HMM λ = (A, B) and an observation sequence O = o1, o2, …, oT, determine the likelihood P(O | λ).
Figure 6.5 – B: observation probabilities for the 3 1 3 ice-cream sequence
Figure 6.6 – transitions for the 3 1 3 ice-cream sequence
Likelihood computation
Likelihood Probability – P(Q | λ)
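For one particular state sequence Q, the joint probability factors as P(O, Q | λ) = P(O | Q, λ) · P(Q | λ), and the likelihood P(O | λ) is the sum of this quantity over every possible Q. A brute-force sketch for the 3 1 3 ice-cream observations that makes the exponential cost explicit; the π, A, and B values are placeholders, not the numbers in the figures:

```python
from itertools import product

states = ["HOT", "COLD"]
# Placeholder parameters (not the textbook's exact numbers).
pi = {"HOT": 0.8, "COLD": 0.2}
A  = {"HOT": {"HOT": 0.7, "COLD": 0.3}, "COLD": {"HOT": 0.4, "COLD": 0.6}}
B  = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

def joint_prob(obs, path):
    """P(O, Q | lambda) for one hidden state sequence Q."""
    p = pi[path[0]] * B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    return p

obs = [3, 1, 3]
# Likelihood: sum over all N^T state sequences (2^3 = 8 here).
likelihood = sum(joint_prob(obs, path) for path in product(states, repeat=len(obs)))
print(likelihood)
```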
Figure 6.7 – forward computation example
Notations for the Forward Algorithm
αt−1(i) = the previous forward probability from step t−1 for state i
aij = the transition probability from state qi to state qj
bj(ot) = the observation likelihood P(ot | qj)
Note: output independence means the observation likelihood bj(ot) = P(ot | qj) does not depend on the previous states or previous observations.
Figure 6.8 Forward computation α1(j)
Figure 6.9 Forward Algorithm
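The forward algorithm of Figure 6.9 computes the same likelihood in O(N²T) time using the recurrence αt(j) = [Σi αt−1(i) · aij] · bj(ot). A minimal sketch, reusing the placeholder parameters from the brute-force example above (not the figures' actual numbers):

```python
states = ["HOT", "COLD"]
# Same placeholder parameters as the brute-force sketch above.
pi = {"HOT": 0.8, "COLD": 0.2}
A  = {"HOT": {"HOT": 0.7, "COLD": 0.3}, "COLD": {"HOT": 0.4, "COLD": 0.6}}
B  = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

def forward(obs):
    """P(O | lambda) via the forward algorithm, O(N^2 * T)."""
    # Initialization: alpha_1(j) = pi_j * b_j(o_1)
    alpha = {s: pi[s] * B[s][obs[0]] for s in states}
    # Recursion: alpha_t(j) = [ sum_i alpha_{t-1}(i) * a_ij ] * b_j(o_t)
    for o in obs[1:]:
        alpha = {j: sum(alpha[i] * A[i][j] for i in states) * B[j][o] for j in states}
    # Termination: P(O | lambda) = sum_j alpha_T(j)
    return sum(alpha.values())

print(forward([3, 1, 3]))  # agrees with the brute-force enumeration above
```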
Figure 6.10 – Viterbi for Problem 2 (Decoding): finding the tag sequence that gives the maximum probability
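Decoding replaces the sum in the forward recurrence with a max and keeps backpointers so the best state sequence can be read off at the end. A minimal Viterbi sketch with the same placeholder parameters as above (the real numbers are in the figures):

```python
states = ["HOT", "COLD"]
pi = {"HOT": 0.8, "COLD": 0.2}   # placeholder parameters, as in the sketches above
A  = {"HOT": {"HOT": 0.7, "COLD": 0.3}, "COLD": {"HOT": 0.4, "COLD": 0.6}}
B  = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

def viterbi(obs):
    """Return (best hidden state sequence, its probability)."""
    # Initialization: v_1(j) = pi_j * b_j(o_1)
    v = {s: pi[s] * B[s][obs[0]] for s in states}
    backpointers = []
    # Recursion: v_t(j) = max_i v_{t-1}(i) * a_ij * b_j(o_t)
    for o in obs[1:]:
        bp, nv = {}, {}
        for j in states:
            best_i = max(states, key=lambda i: v[i] * A[i][j])
            bp[j] = best_i
            nv[j] = v[best_i] * A[best_i][j] * B[j][o]
        backpointers.append(bp)
        v = nv
    # Termination: pick the best final state, then follow backpointers.
    last = max(states, key=lambda s: v[s])
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return list(reversed(path)), v[last]

print(viterbi([3, 1, 3]))  # best state sequence for the 3 1 3 observations
```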
Figure 6.11 Viterbi again
Figure 6.12 Viterbi Example
Figure 6.13 – Upcoming attractions: next time, learning the model (A, B)