Ling 570 Day 6: HMM POS Taggers
Overview
– Open Questions
– HMM POS Tagging
– Review: Viterbi algorithm
– Training and Smoothing
– HMM Implementation Details
HMM POS TAGGING
HMM Tagger
[Slides 4-7: build-up of the HMM tagging model; the equations were not captured in this transcript.]
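The model these slides build up is, presumably, the standard bigram HMM decomposition (stated here as background, since the slides' own equations were not captured): choose the tag sequence

  \hat{t}_1^n = \arg\max_{t_1^n} P(t_1^n \mid w_1^n)
              \approx \arg\max_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})

i.e. a transition (tag-tag) model times an emission (tag-word) model, which is exactly what the 'race' example on the next slide plugs numbers into.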
The good HMM Tagger
From the Brown/Switchboard corpus:
– P(VB|TO) = .34
– P(NN|TO) = .021
– P(race|VB) = .00003
– P(race|NN) = .00041
a. P(VB|TO) x P(race|VB) = .34 x .00003 = .00001
b. P(NN|TO) x P(race|NN) = .021 x .00041 ≈ .0000086
So TO followed by VB is the more probable analysis here; the word 'race' itself contributes very little to the decision.
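The same comparison as a few lines of Python, with the numbers taken from the slide above:

p_vb = 0.34 * 0.00003    # P(VB|TO) * P(race|VB)
p_nn = 0.021 * 0.00041   # P(NN|TO) * P(race|NN)
print(p_vb, p_nn)                      # roughly 1.0e-05 vs 8.6e-06
print("VB" if p_vb > p_nn else "NN")   # -> VB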
HMM Philosophy
Imagine that the author, when creating this sentence, also had in mind the part of speech of each of these words. After the fact, we are trying to recover those parts of speech: they are the hidden part of the Markov model.
What happens when we do it the wrong way?
Invert word and tag, i.e. use P(t|w) instead of P(w|t):
1. P(VB|race) = .02
2. P(NN|race) = .98
The second probability would drown out virtually any other probability: we would always tag 'race' as NN.
N-gram POS tagging
colorless/JJ green/JJ ideas/NNS sleep/VB furiously/RB
– Predict the current tag conditioned on the prior n-1 tags
– Predict each word conditioned on its current tag
HMM bigram tagger
colorless/JJ green/JJ ideas/NNS sleep/VB furiously/RB
(each tag is conditioned on the single previous tag)
HMM trigram tagger
colorless/JJ green/JJ ideas/NNS sleep/VB furiously/RB
(each tag is conditioned on the previous two tags)
Training
An HMM needs estimates of the following:
1. The initial state probabilities
2. The state transition probabilities (the tag-tag matrix)
3. The emission probabilities (the tag-word matrix)
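The slides that follow cover the implementation; for reference, the usual relative-frequency (maximum-likelihood) estimates for these three tables are the standard ones below (not copied from the slides):

  \pi_i = \frac{C(\text{sentence starts with } t_i)}{\#\text{sentences}}, \qquad
  a_{ij} = P(t_j \mid t_i) = \frac{C(t_i, t_j)}{C(t_i)}, \qquad
  b_i(w) = P(w \mid t_i) = \frac{C(t_i, w)}{C(t_i)}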
Implementation
[Slides 20-24: how the transition distribution and the emission distribution are estimated in practice; the equations were not captured in this transcript.]
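A sketch of that estimation step in Python, assuming the training data is a list of sentences, each a list of (word, tag) pairs; the function and variable names are illustrative, not from the course:

from collections import defaultdict

def train_hmm(tagged_sentences):
    """Relative-frequency estimates for a bigram HMM POS tagger."""
    init_c = defaultdict(int)                        # C(sentence starts with tag)
    trans_c = defaultdict(lambda: defaultdict(int))  # C(prev_tag, tag)
    emit_c = defaultdict(lambda: defaultdict(int))   # C(tag, word)
    tag_c = defaultdict(int)                         # C(tag)

    for sent in tagged_sentences:
        prev = None
        for word, tag in sent:
            tag_c[tag] += 1
            emit_c[tag][word] += 1
            if prev is None:
                init_c[tag] += 1
            else:
                trans_c[prev][tag] += 1
            prev = tag

    n_sents = len(tagged_sentences)
    pi = {t: c / n_sents for t, c in init_c.items()}
    a = {t1: {t2: c / sum(row.values()) for t2, c in row.items()}
         for t1, row in trans_c.items()}
    b = {t: {w: c / tag_c[t] for w, c in row.items()}
         for t, row in emit_c.items()}
    return pi, a, b

Here pi, a and b correspond to the initial, tag-tag and tag-word tables from the Training slide.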
REVIEW: VITERBI ALGORITHM
Consider two examples
– Mariners hit a home run
– Mariners hit made the news
Consider two examples
– Mariners/N hit/V a/DT home/N run/N
– Mariners/N hit/N made/V the/DT news/N
('hit' is a verb in the first sentence and a noun in the second)
Parameters
As probabilities, the parameter values get very small; as base-2 log probabilities they won't underflow, and we can simply add them instead of multiplying.
[Slides 28-29 showed a toy transition table over the tags {N, V, DT} and an emission table over the words {a, hit, home, made, Mariners, news, run, the}, first as probabilities (values such as 0.25, 0.031, 0.001, 6.1e-05) and then as their base-2 logs (-2, -5, -10, -14, ...); the row/column alignment of the individual cells was not preserved in this transcript.]
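A minimal sketch of the conversion in Python (base-2 logs, matching the slide's values; any log base works):

import math

def log2_table(probs):
    """Map a {key: probability} table to base-2 log probabilities."""
    return {k: math.log2(p) for k, p in probs.items() if p > 0.0}

# In log space a path score becomes a sum instead of a product:
#   log P(tags, words) = log pi[t1] + sum_i log a[t_{i-1}][t_i] + sum_i log b[t_i][w_i]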
[Slides 30-31: worked Viterbi trellises for 'Mariners hit a home run' and 'Mariners hit made the news', with one row per tag (N, V, DT) and one column per word, filled in using the log-probability tables above.]
Viterbi
Pseudocode
[Slides 32-34: the Viterbi recurrence and its pseudocode were not captured in this transcript.]
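Since the pseudocode itself was not captured, here is a sketch of the standard Viterbi algorithm in Python, written against the dictionary tables from the training sketch earlier and working in log space to avoid underflow. The names (viterbi, pi, a, b) are illustrative, and it assumes smoothed parameters so that at least one path has non-zero probability.

import math

NEG_INF = float("-inf")

def viterbi(words, tags, pi, a, b):
    """Best tag sequence for `words` under a bigram HMM.

    pi[t]   : initial probability of tag t
    a[s][t] : transition probability P(t | s)
    b[t][w] : emission probability  P(w | t)
    """
    def lp(p):                                   # safe log: log(0) -> -inf
        return math.log2(p) if p > 0.0 else NEG_INF

    n = len(words)
    delta = [{} for _ in range(n)]               # delta[i][t]: best log score ending in t at i
    backp = [{} for _ in range(n)]               # backpointers

    for t in tags:                               # initialization (first word)
        delta[0][t] = lp(pi.get(t, 0.0)) + lp(b.get(t, {}).get(words[0], 0.0))

    for i in range(1, n):                        # recursion:
        for t in tags:                           #   delta[i][t] = max_s delta[i-1][s] + log a[s][t] + log b[t][w_i]
            emit = lp(b.get(t, {}).get(words[i], 0.0))
            best_s, best_score = None, NEG_INF
            for s in tags:
                score = delta[i - 1][s] + lp(a.get(s, {}).get(t, 0.0)) + emit
                if best_s is None or score > best_score:
                    best_s, best_score = s, score
            delta[i][t] = best_score
            backp[i][t] = best_s

    last = max(delta[n - 1], key=delta[n - 1].get)   # termination
    path = [last]
    for i in range(n - 1, 0, -1):                    # follow backpointers
        path.append(backp[i][path[-1]])
    return list(reversed(path))

For example, viterbi("Mariners hit a home run".split(), ["N", "V", "DT"], pi, a, b) should recover N V DT N N under suitable parameters.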
SMOOTHING
Training
[Slide 36: content not captured in this transcript.]
Why Smoothing?
Zero counts:
– Handle missing tag sequences: smooth the transition probabilities
– Handle unseen words: smooth the observation (emission) probabilities
– Handle unseen (word, tag) pairs where both the word and the tag are known
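The reason zero counts matter (a one-line reminder, not from the slides): with unsmoothed MLE estimates a single zero factor kills every path that contains it, since

  P(t_1^n, w_1^n) = \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1}) = 0
  \quad \text{whenever any single factor is } 0.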
Smoothing Tag Sequences
[Slides 41-44: smoothing of the transition (tag-tag) distribution; the equations were not captured in this transcript.]
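One common way to smooth the transition distribution is add-λ smoothing over the tagset T; this is shown as an illustration, not necessarily the method the slides used:

  P_{\text{add-}\lambda}(t_i \mid t_{i-1}) =
    \frac{C(t_{i-1}, t_i) + \lambda}{C(t_{i-1}) + \lambda\, |T|}

With λ = 1 this is Laplace (add-one) smoothing; smaller values of λ usually work better.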
Smoothing Emission Probabilities
Preprocess the training corpus:
– Count occurrences of all words
– Replace singleton words with a magic token (written here as <unk>)
– Gather counts on the modified data and estimate the parameters
Preprocess the test set:
– For each test-set word: if it was seen at least twice in the training set, leave it alone
– Otherwise replace it with <unk>
– Run Viterbi on this modified input
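A sketch of that preprocessing in Python; the token spelling <unk> and the function names are assumptions for illustration:

from collections import Counter

UNK = "<unk>"   # assumed spelling of the magic token

def unk_training_corpus(tagged_sentences):
    """Replace training words that occur only once with the UNK token."""
    counts = Counter(w for sent in tagged_sentences for w, _ in sent)
    new_corpus = [[(w if counts[w] > 1 else UNK, t) for w, t in sent]
                  for sent in tagged_sentences]
    return new_corpus, counts

def unk_test_words(words, train_counts):
    """Replace test words seen fewer than twice in training with UNK."""
    return [w if train_counts[w] >= 2 else UNK for w in words]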
Unknown Words
Is there other information we could use for P(w|t)?
– Information in the words themselves?
Morphology:
– -able → JJ
– -tion → NN
– -ly → RB
– Case: John → NP, etc.
Augment the models:
– Add to the 'context' of tags
– Include as features in classifier models
– We'll come back to this idea!
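A toy illustration of the morphological cues listed above; the suffix-to-tag pairs come from the slide, but the function itself is just a sketch, not the course's method:

def guess_tag(word):
    """Guess a POS tag for an unknown word from surface cues."""
    if word[:1].isupper():
        return "NP"        # capitalized: proper noun (e.g. John)
    if word.endswith("able"):
        return "JJ"
    if word.endswith("tion"):
        return "NN"
    if word.endswith("ly"):
        return "RB"
    return "NN"            # default guess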
HMM IMPLEMENTATION
HMM Implementation: Storing an HMM
Approach #1: hash tables (direct):
– π_i = pi{state_str}
– a_ij = a{from_state_str}{to_state_str}
– b_i(o_t) = b{state_str}{symbol}
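The slide's curly-brace notation reads like Perl hashes; the same direct layout in Python dicts would look roughly like this (toy values, purely illustrative):

# Approach #1: index everything directly by strings.
pi = {"N": 0.4, "V": 0.1, "DT": 0.5}             # pi{state_str}
a  = {"DT": {"N": 0.9, "V": 0.05, "DT": 0.05}}   # a{from_state_str}{to_state_str}
b  = {"DT": {"the": 0.6, "a": 0.4}}              # b{state_str}{symbol}

# Lookups are by string key:
print(pi["DT"], a["DT"]["N"], b["DT"]["the"])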
HMM Implementation: Storing an HMM
Approach #2: hash tables + arrays:
– state2idx{state_str} = state_idx
– symbol2idx{symbol} = symbol_idx
– idx2symbol[symbol_idx] = symbol
– idx2state[state_idx] = state_str
– π_i = pi[state_idx]
– a_ij = a[from_state_idx][to_state_idx]
– b_i(o_t) = b[state_idx][symbol_idx]
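Approach #2 in Python, with dicts for the string-to-index maps and nested lists for the numeric tables (a sketch; the states, symbols and sizes are illustrative):

states  = ["N", "V", "DT"]
symbols = ["the", "a", "dog", "runs"]

state2idx  = {s: i for i, s in enumerate(states)}    # state_str -> state_idx
symbol2idx = {w: i for i, w in enumerate(symbols)}   # symbol    -> symbol_idx
idx2state, idx2symbol = states, symbols              # index -> string

S, V = len(states), len(symbols)
pi = [0.0] * S                        # pi[state_idx]
a  = [[0.0] * S for _ in range(S)]    # a[from_state_idx][to_state_idx]
b  = [[0.0] * V for _ in range(S)]    # b[state_idx][symbol_idx]

# Example lookup: P(dog | N)
p = b[state2idx["N"]][symbol2idx["dog"]]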
HMM Matrix Representations
Issue:
– Many matrix entries are 0, especially in b[i][o]
Approach #3: sparse matrix representation:
– a[i] = "j1 p1 j2 p2 …"
– a[j] = "i1 p1 i2 p2 …"
– b[i] = "o1 p1 o2 p2 …"
– b[o] = "i1 p1 i2 p2 …"
These could be stored as:
– an array of hashes, or
– an array of lists of non-empty values
The latter is often quite fast, because the lists are short and fit into cache lines.
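One way to realize the sparse representation in Python: for each row, keep only the non-zero entries as a short list of (index, probability) pairs. This matches the "array of lists of non-empty values" option from the slide; the data below is illustrative.

# a_sparse[i] : non-zero transitions out of state i, as (to_state_idx, prob)
# b_sparse[i] : non-zero emissions from state i, as (symbol_idx, prob)
a_sparse = [
    [(0, 0.5), (2, 0.5)],       # state 0
    [(0, 1.0)],                 # state 1
    [(1, 0.25), (2, 0.75)],     # state 2
]
b_sparse = [
    [(3, 0.9)],                 # state 0 emits symbol 3 with p = 0.9
    [(0, 0.6), (1, 0.4)],
    [(2, 1.0)],
]

def lookup(sparse_row, idx):
    """Linear scan; rows are short, so this stays within a few cache lines."""
    for j, p in sparse_row:
        if j == idx:
            return p
    return 0.0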