1 Part-of-Speech Tagging
Foundations of Statistical NLP, Chapter 10
2 Contents
- Markov Model Taggers
- Hidden Markov Model Taggers
- Transformation-Based Learning of Tags
- Tagging Accuracy and Uses of Taggers
3 Markov Model Taggers
Markov properties:
- Limited horizon
- Time invariant
cf. Wh-extraction (Chomsky), a long-distance dependency that a limited-horizon model cannot capture:
a. Should Peter buy a book?
b. Which book should Peter buy?
4 Markov Model Taggers
The probabilistic model: find the best tagging t_{1,n} for a sentence w_{1,n}.
Example: P(AT NN BEZ IN AT VB | The bear is on the move)
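Written out (a reconstruction of the formula the slide presumably displayed), the tagger picks the tag sequence with the highest conditional probability; by Bayes' rule this reduces to a joint maximization, since P(w_{1,n}) is constant over taggings:

\[ \hat{t}_{1,n} = \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n}) = \arg\max_{t_{1,n}} P(w_{1,n} \mid t_{1,n})\, P(t_{1,n}) \]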
5 Markov Model Taggers
Assumptions:
- words are independent of each other
- a word's identity depends only on its tag
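Under these assumptions the objective factorizes into lexical and tag-transition probabilities (a reconstruction of the standard bigram derivation):

\[ \hat{t}_{1,n} = \arg\max_{t_{1,n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1}) \]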
6 Markov Model Taggers
Training (maximum likelihood estimates from a tagged corpus):
for all tags t^j do
   for all tags t^k do
      P(t^k | t^j) = C(t^j, t^k) / C(t^j)
   end
end
for all tags t^j do
   for all words w^l do
      P(w^l | t^j) = C(w^l : t^j) / C(t^j)
   end
end
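A minimal Python sketch of this MLE training, assuming the standard bigram model; names such as train_bigram_tagger are illustrative, not from the slides:

```python
from collections import Counter

def train_bigram_tagger(tagged_sentences):
    """MLE training for a bigram Markov model tagger.

    tagged_sentences: iterable of [(word, tag), ...] lists.
    Returns (trans, emit) where
      trans[(t_prev, t)] = C(t_prev, t) / C(t_prev)   # tag transitions
      emit[(word, t)]    = C(word : t)  / C(t)        # lexical probabilities
    A PERIOD-like boundary tag for sentence starts is omitted for brevity.
    """
    tag_count = Counter()
    tag_bigram = Counter()
    word_tag = Counter()

    for sent in tagged_sentences:
        tags = [t for _, t in sent]
        tag_count.update(tags)
        tag_bigram.update(zip(tags, tags[1:]))  # adjacent tag pairs
        word_tag.update(sent)                   # (word, tag) pairs

    trans = {(tp, t): c / tag_count[tp] for (tp, t), c in tag_bigram.items()}
    emit = {(w, t): c / tag_count[t] for (w, t), c in word_tag.items()}
    return trans, emit
```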
7 Markov Model Taggers
Tag bigram counts C(t^j, t^k) (row = first tag, column = second tag):

          AT      BEZ     IN      NN      VB      PERIOD
AT        0       0       0       48636   0       19
BEZ       1973    0       426     187     0       38
IN        43322   0       1325    17314   0       185
NN        1067    3720    42470   11773   614     21392
VB        6072    42      4758    1476    129     1522
PERIOD    8016    75      4656    1329    954     0

Word counts C(w^l : t^j) by tag:

           AT      BEZ     IN      NN      VB      PERIOD
bear       0       0       0       10      43      0
is         0       10065   0       0       0       0
move       0       0       0       36      133     0
on         0       0       5484    0       0       0
president  0       0       0       382     0       0
progress   0       0       0       108     4       0
the        69016   0       0       0       0       0
.          0       0       0       0       0       48809
8 Markov Model Taggers
Tagging: the Viterbi algorithm
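The slide's equations did not survive transcription, so here is a minimal Python sketch of bigram Viterbi decoding for the model above; trans and emit correspond to the transition and lexical probabilities estimated earlier, and all names are illustrative assumptions:

```python
from collections import defaultdict

def viterbi(words, tags, trans, emit, start_tag="PERIOD"):
    """Most probable tag sequence under a bigram Markov model.

    trans[(t_prev, t)] -> P(t | t_prev), emit[(word, t)] -> P(word | t).
    The sentence is assumed to start after a PERIOD, matching the
    bigram tables above.
    """
    n = len(words)
    # delta[i][t]: probability of the best tag path ending in t at position i
    delta = [defaultdict(float) for _ in range(n)]
    back = [{} for _ in range(n)]

    for t in tags:
        delta[0][t] = trans.get((start_tag, t), 0.0) * emit.get((words[0], t), 0.0)

    for i in range(1, n):
        for t in tags:
            best_prev, best_p = None, 0.0
            for tp in tags:
                p = delta[i - 1][tp] * trans.get((tp, t), 0.0)
                if p > best_p:
                    best_prev, best_p = tp, p
            delta[i][t] = best_p * emit.get((words[i], t), 0.0)
            back[i][t] = best_prev  # remember the best predecessor tag

    # follow backpointers from the best final tag
    last = max(tags, key=lambda t: delta[n - 1][t])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))
```

With the counts from the tables turned into probabilities, viterbi("The bear is on the move .".split(), ...) would recover a tagging like AT NN BEZ IN AT NN PERIOD.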
9 Variations
Models for unknown words:
1. assume an unknown word can be any part of speech
2. use morphological cues to infer the word's possible parts of speech
10 Variations
Unknown-word model; Z: normalization constant.
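The equation that Z annotates is missing from the transcript. A common form of the unknown-word model in this chapter (following Weischedel et al. 1993) multiplies several morphological feature probabilities, with Z normalizing the product; treat the exact feature set as an assumption:

\[ P(w^l \mid t^j) = \frac{1}{Z}\, P(\text{unknown word} \mid t^j)\; P(\text{capitalized} \mid t^j)\; P(\text{ending} \mid t^j) \]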
11 Variations
- Trigram taggers
- Interpolation (formula below)
- Variable Memory Markov Model (VMMM)
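For the interpolation bullet, the standard linear interpolation of unigram, bigram, and trigram estimates (a reconstruction; the lambda weights are learned, e.g. by deleted interpolation):

\[ P(t_i \mid t_{i-2}, t_{i-1}) = \lambda_1 \hat{P}(t_i) + \lambda_2 \hat{P}(t_i \mid t_{i-1}) + \lambda_3 \hat{P}(t_i \mid t_{i-2}, t_{i-1}), \qquad \lambda_1 + \lambda_2 + \lambda_3 = 1 \]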
12 Variations
- Smoothing (K_l: the number of possible parts of speech of w_l; see the sketch below)
- Reversibility
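The smoothing formula itself did not survive transcription. One add-one style estimate consistent with the K_l definition above, distributing extra mass over a word's admissible tags (an assumption, not necessarily the slide's exact formula):

\[ P(t^j \mid w^l) = \frac{C(w^l : t^j) + 1}{C(w^l) + K_l} \]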
13 Variations
Sequence vs. tag-by-tag maximization:
Time flies like an arrow.
a. NN VBZ RB AT NN .   P = 0.01
b. NN NNS VB AT NN .   P = 0.01
In practice there is no large difference in accuracy between maximizing the whole sequence and maximizing tag by tag.
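The two objectives being compared, written out (a reconstruction):

\[ \text{sequence:}\quad \hat{t}_{1,n} = \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n}) \qquad \text{tag by tag:}\quad \hat{t}_i = \arg\max_{t_i} P(t_i \mid w_{1,n}) \ \text{for each } i \]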
14 Hidden Markov Model Taggers
When we have no tagged training data, initialize all parameters from dictionary information:
- Jelinek's method
- Kupiec's method
15 Hidden Markov Model Taggers
Jelinek's method: initialize the HMM with MLE-style estimates for P(w^k | t^i), assuming that a word occurs equally likely with each of its possible tags.
T(w^j): the number of tags allowed for w^j
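A reconstruction of Jelinek's initialization as usually stated (b denotes the lexical probabilities; treat the exact notation as an assumption):

\[ b_{j,l} = \frac{b^{*}_{j,l}\, C(w^l)}{\sum_{m} b^{*}_{j,m}\, C(w^m)} \qquad b^{*}_{j,l} = \begin{cases} 0 & \text{if } t^j \text{ is not allowed for } w^l \\ 1 / T(w^l) & \text{otherwise} \end{cases} \]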
16 Hidden Markov Model Taggers
Kupiec's method: group all words with the same set of possible parts of speech into 'metawords' u_L, so that parameters are not fine-tuned for each individual word.
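A minimal Python sketch of the metaword grouping, assuming a dictionary mapping each word to its set of admissible tags; all names are illustrative:

```python
from collections import defaultdict

def build_metawords(dictionary):
    """Group words by their set of admissible tags (Kupiec-style metawords).

    dictionary: maps word -> set of possible tags, e.g. {"move": {"NN", "VB"}}.
    Returns a map from each tag set L (as a frozenset) to the metaword u_L,
    i.e. the list of words sharing exactly that set of possible tags.
    """
    metawords = defaultdict(list)
    for word, tags in dictionary.items():
        metawords[frozenset(tags)].append(word)
    return metawords

# Example: "move" and "progress" can both be NN or VB, so they share a metaword
# and their HMM emission parameters are estimated jointly.
groups = build_metawords({"move": {"NN", "VB"}, "progress": {"NN", "VB"}, "the": {"AT"}})
```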
17 Hidden Markov Model Taggers
Training: after initialization, the HMM is trained with the Forward-Backward algorithm.
Tagging: identical to the VMM. The difference between VMM tagging and HMM tagging is in how we train the model, not in how we tag.
18 Hidden Markov Model Taggers
The effect of initialization on HMM training (the overtraining problem):
D0: maximum likelihood estimates from a tagged training corpus
D1: correct ordering only of lexical probabilities
D2: lexical probabilities proportional to overall tag probabilities
D3: equal lexical probabilities for all tags admissible for a word
T0: maximum likelihood estimates from a tagged training corpus
T1: equal probabilities for all transitions
19 Hidden Markov Model Taggers
Choosing a training regime:
- A sufficiently large training text, similar to the intended text of application: use the Visible Markov Model.
- No training text, or training and test text are very different, but at least some lexical information: run Forward-Backward for a few iterations.
- No lexical information: run Forward-Backward for a larger number of iterations.