Ling 570 Day 6: HMM POS Taggers
Overview
– Open Questions
– HMM POS Tagging
– Review: Viterbi algorithm
– Training and Smoothing
– HMM Implementation Details
HMM POS TAGGING
HMM Tagger
[Slides 4-7: build-up of the HMM tagging model; the equations were not captured in this transcript.]
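The model these slides build up is, presumably, the standard bigram HMM decomposition (stated here as background, since the slides' own equations were not captured): choose the tag sequence

  \hat{t}_1^n = \arg\max_{t_1^n} P(t_1^n \mid w_1^n)
              \approx \arg\max_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})

i.e. a transition (tag-tag) model times an emission (tag-word) model, which is exactly what the 'race' example on the next slide plugs numbers into.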
The good HMM Tagger
From the Brown/Switchboard corpus:
– P(VB|TO) = .34
– P(NN|TO) = .021
– P(race|VB) = .00003
– P(race|NN) = .00041
a. P(VB|TO) x P(race|VB) = .34 x .00003 = .00001
b. P(NN|TO) x P(race|NN) = .021 x .00041 ≈ .0000086
So TO followed by VB is the more probable analysis here; the word 'race' itself contributes very little to the decision.
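The same comparison as a few lines of Python, with the numbers taken from the slide above:

p_vb = 0.34 * 0.00003    # P(VB|TO) * P(race|VB)
p_nn = 0.021 * 0.00041   # P(NN|TO) * P(race|NN)
print(p_vb, p_nn)                      # roughly 1.0e-05 vs 8.6e-06
print("VB" if p_vb > p_nn else "NN")   # -> VB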
HMM Philosophy
Imagine that the author, when creating this sentence, also had in mind the part of speech of each of these words. After the fact, we are trying to recover those parts of speech: they are the hidden part of the Markov model.
What happens when we do it the wrong way?
Invert word and tag, i.e. use P(t|w) instead of P(w|t):
1. P(VB|race) = .02
2. P(NN|race) = .98
The second probability would drown out virtually any other probability: we would always tag 'race' as NN.
N-gram POS tagging
colorless/JJ green/JJ ideas/NNS sleep/VB furiously/RB
– Predict the current tag conditioned on the prior n-1 tags
– Predict each word conditioned on its current tag
HMM bigram tagger
colorless/JJ green/JJ ideas/NNS sleep/VB furiously/RB
(each tag is conditioned on the single previous tag)
HMM trigram tagger
colorless/JJ green/JJ ideas/NNS sleep/VB furiously/RB
(each tag is conditioned on the previous two tags)
Training
An HMM needs estimates of the following:
1. The initial state probabilities
2. The state transition probabilities (the tag-tag matrix)
3. The emission probabilities (the tag-word matrix)
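The slides that follow cover the implementation; for reference, the usual relative-frequency (maximum-likelihood) estimates for these three tables are the standard ones below (not copied from the slides):

  \pi_i = \frac{C(\text{sentence starts with } t_i)}{\#\text{sentences}}, \qquad
  a_{ij} = P(t_j \mid t_i) = \frac{C(t_i, t_j)}{C(t_i)}, \qquad
  b_i(w) = P(w \mid t_i) = \frac{C(t_i, w)}{C(t_i)}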
Implementation
[Slides 20-24: how the transition distribution and the emission distribution are estimated in practice; the equations were not captured in this transcript.]
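A sketch of that estimation step in Python, assuming the training data is a list of sentences, each a list of (word, tag) pairs; the function and variable names are illustrative, not from the course:

from collections import defaultdict

def train_hmm(tagged_sentences):
    """Relative-frequency estimates for a bigram HMM POS tagger."""
    init_c = defaultdict(int)                        # C(sentence starts with tag)
    trans_c = defaultdict(lambda: defaultdict(int))  # C(prev_tag, tag)
    emit_c = defaultdict(lambda: defaultdict(int))   # C(tag, word)
    tag_c = defaultdict(int)                         # C(tag)

    for sent in tagged_sentences:
        prev = None
        for word, tag in sent:
            tag_c[tag] += 1
            emit_c[tag][word] += 1
            if prev is None:
                init_c[tag] += 1
            else:
                trans_c[prev][tag] += 1
            prev = tag

    n_sents = len(tagged_sentences)
    pi = {t: c / n_sents for t, c in init_c.items()}
    a = {t1: {t2: c / sum(row.values()) for t2, c in row.items()}
         for t1, row in trans_c.items()}
    b = {t: {w: c / tag_c[t] for w, c in row.items()}
         for t, row in emit_c.items()}
    return pi, a, b

Here pi, a and b correspond to the initial, tag-tag and tag-word tables from the Training slide.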
REVIEW: VITERBI ALGORITHM
Consider two examples
– Mariners hit a home run
– Mariners hit made the news
Consider two examples
– Mariners/N hit/V a/DT home/N run/N
– Mariners/N hit/N made/V the/DT news/N
('hit' is a verb in the first sentence and a noun in the second)
Parameters
As probabilities, the parameter values get very small; as base-2 log probabilities they won't underflow, and we can simply add them instead of multiplying.
[Slides 28-29 showed a toy transition table over the tags {N, V, DT} and an emission table over the words {a, hit, home, made, Mariners, news, run, the}, first as probabilities (values such as 0.25, 0.031, 0.001, 6.1e-05) and then as their base-2 logs (-2, -5, -10, -14, ...); the row/column alignment of the individual cells was not preserved in this transcript.]
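A minimal sketch of the conversion in Python (base-2 logs, matching the slide's values; any log base works):

import math

def log2_table(probs):
    """Map a {key: probability} table to base-2 log probabilities."""
    return {k: math.log2(p) for k, p in probs.items() if p > 0.0}

# In log space a path score becomes a sum instead of a product:
#   log P(tags, words) = log pi[t1] + sum_i log a[t_{i-1}][t_i] + sum_i log b[t_i][w_i]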
[Slides 30-31: worked Viterbi trellises for 'Mariners hit a home run' and 'Mariners hit made the news', with one row per tag (N, V, DT) and one column per word, filled in using the log-probability tables above.]
Viterbi
Pseudocode
[Slides 32-34: the Viterbi recurrence and its pseudocode were not captured in this transcript.]
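Since the pseudocode itself was not captured, here is a sketch of the standard Viterbi algorithm in Python, written against the dictionary tables from the training sketch earlier and working in log space to avoid underflow. The names (viterbi, pi, a, b) are illustrative, and it assumes smoothed parameters so that at least one path has non-zero probability.

import math

NEG_INF = float("-inf")

def viterbi(words, tags, pi, a, b):
    """Best tag sequence for `words` under a bigram HMM.

    pi[t]   : initial probability of tag t
    a[s][t] : transition probability P(t | s)
    b[t][w] : emission probability  P(w | t)
    """
    def lp(p):                                   # safe log: log(0) -> -inf
        return math.log2(p) if p > 0.0 else NEG_INF

    n = len(words)
    delta = [{} for _ in range(n)]               # delta[i][t]: best log score ending in t at i
    backp = [{} for _ in range(n)]               # backpointers

    for t in tags:                               # initialization (first word)
        delta[0][t] = lp(pi.get(t, 0.0)) + lp(b.get(t, {}).get(words[0], 0.0))

    for i in range(1, n):                        # recursion:
        for t in tags:                           #   delta[i][t] = max_s delta[i-1][s] + log a[s][t] + log b[t][w_i]
            emit = lp(b.get(t, {}).get(words[i], 0.0))
            best_s, best_score = None, NEG_INF
            for s in tags:
                score = delta[i - 1][s] + lp(a.get(s, {}).get(t, 0.0)) + emit
                if best_s is None or score > best_score:
                    best_s, best_score = s, score
            delta[i][t] = best_score
            backp[i][t] = best_s

    last = max(delta[n - 1], key=delta[n - 1].get)   # termination
    path = [last]
    for i in range(n - 1, 0, -1):                    # follow backpointers
        path.append(backp[i][path[-1]])
    return list(reversed(path))

For example, viterbi("Mariners hit a home run".split(), ["N", "V", "DT"], pi, a, b) should recover N V DT N N under suitable parameters.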
SMOOTHING
Training
[Slide 36: content not captured in this transcript.]
Why Smoothing?
Zero counts:
– Handle missing tag sequences: smooth the transition probabilities
– Handle unseen words: smooth the observation (emission) probabilities
– Handle unseen (word, tag) pairs where both the word and the tag are known
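The reason zero counts matter (a one-line reminder, not from the slides): with unsmoothed MLE estimates a single zero factor kills every path that contains it, since

  P(t_1^n, w_1^n) = \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1}) = 0
  \quad \text{whenever any single factor is } 0.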
Smoothing Tag Sequences
[Slides 41-44: smoothing of the transition (tag-tag) distribution; the equations were not captured in this transcript.]
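One common way to smooth the transition distribution is add-λ smoothing over the tagset T; this is shown as an illustration, not necessarily the method the slides used:

  P_{\text{add-}\lambda}(t_i \mid t_{i-1}) =
    \frac{C(t_{i-1}, t_i) + \lambda}{C(t_{i-1}) + \lambda\, |T|}

With λ = 1 this is Laplace (add-one) smoothing; smaller values of λ usually work better.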
Smoothing Emission Probabilities
Preprocess the training corpus:
– Count occurrences of all words
– Replace singleton words with a magic token (written here as <unk>)
– Gather counts on the modified data and estimate the parameters
Preprocess the test set:
– For each test-set word: if it was seen at least twice in the training set, leave it alone
– Otherwise replace it with <unk>
– Run Viterbi on this modified input
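A sketch of that preprocessing in Python; the token spelling <unk> and the function names are assumptions for illustration:

from collections import Counter

UNK = "<unk>"   # assumed spelling of the magic token

def unk_training_corpus(tagged_sentences):
    """Replace training words that occur only once with the UNK token."""
    counts = Counter(w for sent in tagged_sentences for w, _ in sent)
    new_corpus = [[(w if counts[w] > 1 else UNK, t) for w, t in sent]
                  for sent in tagged_sentences]
    return new_corpus, counts

def unk_test_words(words, train_counts):
    """Replace test words seen fewer than twice in training with UNK."""
    return [w if train_counts[w] >= 2 else UNK for w in words]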
Unknown Words
Is there other information we could use for P(w|t)?
– Information in the words themselves?
Morphology:
– -able → JJ
– -tion → NN
– -ly → RB
– Case: John → NP, etc.
Augment the models:
– Add to the 'context' of tags
– Include as features in classifier models
– We'll come back to this idea!
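A toy illustration of the morphological cues listed above; the suffix-to-tag pairs come from the slide, but the function itself is just a sketch, not the course's method:

def guess_tag(word):
    """Guess a POS tag for an unknown word from surface cues."""
    if word[:1].isupper():
        return "NP"        # capitalized: proper noun (e.g. John)
    if word.endswith("able"):
        return "JJ"
    if word.endswith("tion"):
        return "NN"
    if word.endswith("ly"):
        return "RB"
    return "NN"            # default guess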
HMM IMPLEMENTATION
HMM Implementation: Storing an HMM
Approach #1: hash tables (direct):
– π_i = pi{state_str}
– a_ij = a{from_state_str}{to_state_str}
– b_i(o_t) = b{state_str}{symbol}
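The slide's curly-brace notation reads like Perl hashes; the same direct layout in Python dicts would look roughly like this (toy values, purely illustrative):

# Approach #1: index everything directly by strings.
pi = {"N": 0.4, "V": 0.1, "DT": 0.5}             # pi{state_str}
a  = {"DT": {"N": 0.9, "V": 0.05, "DT": 0.05}}   # a{from_state_str}{to_state_str}
b  = {"DT": {"the": 0.6, "a": 0.4}}              # b{state_str}{symbol}

# Lookups are by string key:
print(pi["DT"], a["DT"]["N"], b["DT"]["the"])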
HMM Implementation: Storing an HMM
Approach #2: hash tables + arrays:
– state2idx{state_str} = state_idx
– symbol2idx{symbol} = symbol_idx
– idx2symbol[symbol_idx] = symbol
– idx2state[state_idx] = state_str
– π_i = pi[state_idx]
– a_ij = a[from_state_idx][to_state_idx]
– b_i(o_t) = b[state_idx][symbol_idx]
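Approach #2 in Python, with dicts for the string-to-index maps and nested lists for the numeric tables (a sketch; the states, symbols and sizes are illustrative):

states  = ["N", "V", "DT"]
symbols = ["the", "a", "dog", "runs"]

state2idx  = {s: i for i, s in enumerate(states)}    # state_str -> state_idx
symbol2idx = {w: i for i, w in enumerate(symbols)}   # symbol    -> symbol_idx
idx2state, idx2symbol = states, symbols              # index -> string

S, V = len(states), len(symbols)
pi = [0.0] * S                        # pi[state_idx]
a  = [[0.0] * S for _ in range(S)]    # a[from_state_idx][to_state_idx]
b  = [[0.0] * V for _ in range(S)]    # b[state_idx][symbol_idx]

# Example lookup: P(dog | N)
p = b[state2idx["N"]][symbol2idx["dog"]]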
HMM Matrix Representations
Issue:
– Many matrix entries are 0, especially in b[i][o]
Approach #3: sparse matrix representation:
– a[i] = "j1 p1 j2 p2 …"
– a[j] = "i1 p1 i2 p2 …"
– b[i] = "o1 p1 o2 p2 …"
– b[o] = "i1 p1 i2 p2 …"
These could be stored as:
– an array of hashes, or
– an array of lists of non-empty values
The latter is often quite fast, because the lists are short and fit into cache lines.
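One way to realize the sparse representation in Python: for each row, keep only the non-zero entries as a short list of (index, probability) pairs. This matches the "array of lists of non-empty values" option from the slide; the data below is illustrative.

# a_sparse[i] : non-zero transitions out of state i, as (to_state_idx, prob)
# b_sparse[i] : non-zero emissions from state i, as (symbol_idx, prob)
a_sparse = [
    [(0, 0.5), (2, 0.5)],       # state 0
    [(0, 1.0)],                 # state 1
    [(1, 0.25), (2, 0.75)],     # state 2
]
b_sparse = [
    [(3, 0.9)],                 # state 0 emits symbol 3 with p = 0.9
    [(0, 0.6), (1, 0.4)],
    [(2, 1.0)],
]

def lookup(sparse_row, idx):
    """Linear scan; rows are short, so this stays within a few cache lines."""
    for j, p in sparse_row:
        if j == idx:
            return p
    return 0.0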