Published by Cornelius Lynch, modified over 8 years ago
13-1 Chapter 13 Part-of-Speech Tagging
13-2 POS Tagging + HMMs
– Part-of-Speech Tagging: What and Why?
– What Information Is Available?
– Visible Markov Models
– Hidden Markov Models
– Training and Initialization
– Other Methods
13-3 Part-of-Speech Tagging
Assign syntactic categories (tags) to the words in a text:
– The-AT representative-NN put-VBD chairs-NNS on-IN the-AT table-NN.
– The-AT representative-JJ put-NN chairs-VBZ on-IN the-AT table-NN. (a possible but far less likely tagging)
– The tag set is shown on the next slide.
Usefulness
– Lexical acquisition, shallow/partial parsing
– Information extraction
– Question answering
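To make the word-TAG notation concrete, here is a minimal Python sketch that splits the two example taggings into (word, tag) pairs and lists the words whose tag differs between them. The helper name parse_tagged is hypothetical; it assumes each tag follows the last hyphen of a token, and the sentence-final period is left out.

```python
# Hypothetical helper for illustration: split "word-TAG" tokens into pairs.
def parse_tagged(sentence):
    pairs = []
    for token in sentence.split():
        word, tag = token.rsplit("-", 1)  # the tag follows the last hyphen
        pairs.append((word, tag))
    return pairs

likely = parse_tagged("The-AT representative-NN put-VBD chairs-NNS on-IN the-AT table-NN")
unlikely = parse_tagged("The-AT representative-JJ put-NN chairs-VBZ on-IN the-AT table-NN")

# Words tagged differently in the two readings are the ambiguous ones.
ambiguous = [w for (w, t1), (_, t2) in zip(likely, unlikely) if t1 != t2]
print(ambiguous)  # ['representative', 'put', 'chairs']
```

Three of the seven words can carry either tag, which is why a tagger has to choose among readings rather than look each word up.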
13-4 Brown/Penn tag sets
13-5 Sources of Information
Syntagmatic (structural) information
– looks at information about tag sequences
– e.g., AT JJ NN is a common sequence, whereas AT JJ VBP is very unlikely
– using this source alone gives 77% accuracy (Greene and Rubin, 1971)
Lexical information
– predicts a tag from the word concerned
– e.g., the word flour is much more likely to be a noun than a verb
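Lexical information alone already yields a simple baseline: tag every word with its most frequent tag in the training data. A minimal Python sketch; the function names, the tiny corpus, and the fallback tag NN for unseen words are all hypothetical choices for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical helpers for illustration only.
def train_most_frequent_tag(tagged_corpus):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag_lexical(words, lexicon, default="NN"):
    """Tag each word independently, ignoring neighbouring tags entirely."""
    return [lexicon.get(w.lower(), default) for w in words]

# Invented toy corpus: 'flour' is a noun twice and a verb once.
corpus = [
    [("the", "AT"), ("flour", "NN"), ("is", "BEZ"), ("fresh", "JJ")],
    [("buy", "VB"), ("flour", "NN")],
    [("flour", "VB"), ("the", "AT"), ("pan", "NN")],
]
lexicon = train_most_frequent_tag(corpus)
print(tag_lexical(["flour", "the", "pan"], lexicon))  # ['NN', 'AT', 'NN']
```

In the last example flour is really a verb, which this baseline cannot detect; that is the gap syntagmatic (tag-sequence) information is meant to fill.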
13-6 Visible Markov Models
View tagging as a Markov chain:
– Limited horizon: a word’s tag depends only on the previous tag
– Time invariance: this dependency does not change over time (it is the same at every position in the sentence)
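Both properties show up directly when transition probabilities are estimated by maximum likelihood: only the previous tag is counted, and one table serves every position. A minimal sketch; the function name and the sentence-start symbol <s> are assumptions for illustration.

```python
from collections import Counter

# Hypothetical helper for illustration only.
def estimate_transitions(tag_sequences, start="<s>"):
    """Maximum-likelihood estimates of P(t_i | t_{i-1}) from tagged sentences.

    Limited horizon: only the previous tag is conditioned on.
    Time invariance: a single table is shared by every position i.
    """
    bigrams, history = Counter(), Counter()
    for tags in tag_sequences:
        prev = start
        for tag in tags:
            bigrams[(prev, tag)] += 1
            history[prev] += 1
            prev = tag
    return {(p, t): n / history[p] for (p, t), n in bigrams.items()}

probs = estimate_transitions([["AT", "NN", "VBD"], ["AT", "JJ", "NN"], ["NN", "VBD"]])
print(probs[("AT", "NN")])   # 0.5
print(probs[("<s>", "AT")])  # 0.6666666666666666 (2 of 3 sentences start with AT)
```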
13-7
13-8 Find the best tagging $t_{1,n}$ for a sentence $w_{1,n}$, assuming that:
– words are independent of each other
– a word’s identity depends only on its own tag
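Under these assumptions a tagging can be scored as a product of per-word emission and transition probabilities, and the "best tagging" is simply the highest-scoring one. The sketch below scores in log space and searches exhaustively, which is only feasible for toy inputs (there are |tagset|^n candidates). The dictionary layout (trans[(prev, tag)], emit[(tag, word)]), the start symbol <s>, the function names, and the probabilities are all hypothetical.

```python
import itertools
import math

# Hypothetical helpers and toy model for illustration only.
def sequence_log_prob(words, tags, trans, emit, start="<s>"):
    """log of prod_i P(w_i | t_i) * P(t_i | t_{i-1}); missing entries count as 0."""
    total, prev = 0.0, start
    for w, t in zip(words, tags):
        p = emit.get((t, w), 0.0) * trans.get((prev, t), 0.0)
        if p == 0.0:
            return -math.inf  # an unseen event makes the whole tagging impossible
        total += math.log(p)
        prev = t
    return total

def best_tagging_brute_force(words, tagset, trans, emit):
    """Score all |tagset|**n taggings and keep the best (exponential in n)."""
    return max(itertools.product(tagset, repeat=len(words)),
               key=lambda tags: sequence_log_prob(words, tags, trans, emit))

trans = {("<s>", "AT"): 1.0, ("AT", "NN"): 0.7, ("AT", "JJ"): 0.3,
         ("JJ", "NN"): 1.0, ("NN", "VBZ"): 0.6, ("NN", "NN"): 0.4}
emit = {("AT", "the"): 1.0, ("NN", "dog"): 0.5, ("NN", "runs"): 0.1,
        ("VBZ", "runs"): 0.8, ("JJ", "dog"): 0.01}
print(best_tagging_brute_force(["the", "dog", "runs"], ["AT", "NN", "JJ", "VBZ"], trans, emit))
# ('AT', 'NN', 'VBZ')
```

The exponential search space is what makes a dynamic-programming decoder necessary for real sentences.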
13-9 The final equation for determining the optimal tags for a sentence:
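In the standard bigram formulation, the optimal tagging follows from Bayes' rule, the two assumptions of slide 13-8, and the limited-horizon assumption of slide 13-6. A LaTeX sketch of the derivation, where $t_0$ is an assumed sentence-start tag:

```latex
\begin{aligned}
\hat{t}_{1,n} &= \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n})
               = \arg\max_{t_{1,n}} \frac{P(w_{1,n} \mid t_{1,n})\, P(t_{1,n})}{P(w_{1,n})} \\
              &= \arg\max_{t_{1,n}} P(w_{1,n} \mid t_{1,n})\, P(t_{1,n}) \\
              &= \arg\max_{t_{1,n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
\end{aligned}
```

The denominator is dropped because it does not depend on the tags; the word independence assumptions turn $P(w_{1,n} \mid t_{1,n})$ into $\prod_i P(w_i \mid t_i)$, and the limited horizon turns $P(t_{1,n})$ into $\prod_i P(t_i \mid t_{i-1})$.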
13-10 Viterbi Algorithm
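A compact Python sketch of Viterbi decoding for the bigram model of slide 13-9, working in log space to avoid underflow. The dictionary layout of trans and emit, the start symbol <s>, the function name, and the toy probabilities are assumptions for illustration; the input sentence is assumed to be non-empty.

```python
import math

# Hypothetical implementation sketch with a toy model.
def viterbi(words, tagset, trans, emit, start="<s>"):
    """Most probable tag sequence under a bigram HMM in O(n * |tagset|^2) time.

    trans[(prev, tag)] = P(tag | prev); emit[(tag, word)] = P(word | tag).
    Missing entries are treated as probability 0 (log-probability -inf).
    """
    def lp(table, key):
        p = table.get(key, 0.0)
        return math.log(p) if p > 0 else -math.inf

    # delta[t]: best log-probability of any tagging of the prefix that ends in tag t
    delta = {t: lp(trans, (start, t)) + lp(emit, (t, words[0])) for t in tagset}
    backpointers = []
    for w in words[1:]:
        new_delta, back = {}, {}
        for t in tagset:
            # best predecessor tag for t at this position
            prev = max(tagset, key=lambda s: delta[s] + lp(trans, (s, t)))
            new_delta[t] = delta[prev] + lp(trans, (prev, t)) + lp(emit, (t, w))
            back[t] = prev
        delta = new_delta
        backpointers.append(back)

    # follow the backpointers from the best final tag
    best = max(tagset, key=lambda t: delta[t])
    path = [best]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    return list(reversed(path)), delta[best]

trans = {("<s>", "AT"): 1.0, ("AT", "NN"): 0.7, ("AT", "JJ"): 0.3,
         ("JJ", "NN"): 1.0, ("NN", "VBZ"): 0.6, ("NN", "NN"): 0.4}
emit = {("AT", "the"): 1.0, ("NN", "dog"): 0.5, ("NN", "runs"): 0.1,
        ("VBZ", "runs"): 0.8, ("JJ", "dog"): 0.01}
path, score = viterbi(["the", "dog", "runs"], ["AT", "NN", "JJ", "VBZ"], trans, emit)
print(path)  # ['AT', 'NN', 'VBZ']
```

Instead of enumerating every tagging, each position keeps only the best path into each tag; the limited-horizon assumption is what makes discarding the other paths safe.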
13-11 Unknown Words
Simplest model
– unknown words can be of any part of speech
– or only of an open-class part of speech, i.e., nouns, verbs, and so on
Morphological and other cues
– the suffix -ed suggests past tense forms or past participles
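A minimal sketch of restricting the candidate tags for a word, assuming a dictionary lexicon that maps known words to their possible tags. The OPEN_CLASS subset (Penn Treebank names) and the function name are hypothetical choices; returning the full tag set instead of OPEN_CLASS would give the simplest model above.

```python
# Hypothetical open-class subset; a real system would derive it from its own tag set.
OPEN_CLASS = {"NN", "NNS", "NNP", "VB", "VBD", "VBN", "VBG", "VBZ", "JJ", "RB"}

def candidate_tags(word, lexicon):
    """Possible tags: from the lexicon if the word is known, otherwise guessed."""
    if word in lexicon:
        return set(lexicon[word])
    if word.endswith("ed"):
        return {"VBD", "VBN"}  # -ed cue: past tense form or past participle
    return set(OPEN_CLASS)     # otherwise: any open-class tag

lexicon = {"the": ["AT"], "put": ["VB", "VBD", "VBN", "NN"]}  # toy lexicon
print(sorted(candidate_tags("blorfed", lexicon)))  # ['VBD', 'VBN']
```

Treating the -ed cue as a hard restriction is a simplification; a probabilistic tagger would more typically use such cues to reweight emission probabilities.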
13-12 Transformation-Based Learning of Tags
Two components:
– a specification of which “error-correcting” transformations are admissible
– the learning algorithm:
  – tag each word in the training corpus with its most frequent tag
  – construct a ranked list of transformations that transform the initial tagging into a tagging that is close to correct
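The learning algorithm can be sketched greedily: start from the most-frequent-tag tagging, then repeatedly add the transformation that removes the most errors. This is a simplified sketch with a single hypothetical template ("change src to dst if the previous tag is prev"); the function names and toy corpus are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical simplified TBL learner with one transformation template.
def most_frequent_tags(corpus):
    """Initial-state lexicon: each word's most frequent training tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def apply_rule(tags, rule):
    """Apply (src, dst, prev); triggers are read from the input tags."""
    src, dst, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == src and tags[i - 1] == prev:
            out[i] = dst
    return out

def error_count(current, gold):
    """Number of mistagged words over the whole corpus."""
    return sum(t != g for ts, gs in zip(current, gold) for t, g in zip(ts, gs))

def learn_tbl(corpus, max_rules=10):
    gold = [[tag for _, tag in sentence] for sentence in corpus]
    lexicon = most_frequent_tags(corpus)
    current = [[lexicon[w] for w, _ in sentence] for sentence in corpus]
    rules = []  # ranked list, in the order the rules were learned
    for _ in range(max_rules):
        # A rule can only reduce errors if it fixes a current error,
        # so candidates are instantiated from the errors themselves.
        candidates = {(cur[i], g[i], cur[i - 1])
                      for cur, g in zip(current, gold)
                      for i in range(1, len(cur)) if cur[i] != g[i]}
        base = error_count(current, gold)
        best, best_gain = None, 0
        for rule in sorted(candidates):
            gain = base - error_count([apply_rule(c, rule) for c in current], gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:
            break  # no admissible transformation lowers the error count
        rules.append(best)
        current = [apply_rule(c, best) for c in current]
    return lexicon, rules

# Invented toy corpus: "run" is usually NN, but VB after "to".
corpus = [
    [("to", "TO"), ("run", "VB")],
    [("a", "AT"), ("run", "NN")],
    [("the", "AT"), ("run", "NN")],
]
lexicon, rules = learn_tbl(corpus)
print(rules)  # [('NN', 'VB', 'TO')]
```

Reading triggers from the input tags means one rewrite cannot trigger another within the same pass; left-to-right application with immediate effect is an equally valid design choice.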
13-13 Transformations
A transformation consists of a rewrite rule and a triggering environment.
– Rewrite rule $t_1 \rightarrow t_2$: replace tag $t_1$ by tag $t_2$.
– Triggering environment: the potential rewriting locations where a trigger will be sought, e.g.:
  – tag $t_j$ occurs in one of the three previous positions
  – tag $t_j$ occurs two positions earlier and tag $t_k$ occurs in the following position
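The two example triggering environments can be written as predicates over a tag sequence, and a rewrite rule then fires wherever its predicate holds. A minimal sketch; all function names are hypothetical, and the tag sequences in the usage lines are made up purely to exercise the predicates.

```python
# Hypothetical helpers for illustration only.
def prev_three_has(tags, i, tj):
    """Trigger: tag tj occurs in one of the three previous positions."""
    return tj in tags[max(0, i - 3):i]

def two_before_and_next(tags, i, tj, tk):
    """Trigger: tag tj occurs two positions earlier and tag tk in the following position."""
    return i >= 2 and i + 1 < len(tags) and tags[i - 2] == tj and tags[i + 1] == tk

def apply_transformation(tags, t1, t2, trigger, *trigger_args):
    """Rewrite rule t1 -> t2 at every position whose environment satisfies the trigger.

    Triggers are checked against the input tags, so one rewrite cannot
    enable another within the same pass.
    """
    return [t2 if tag == t1 and trigger(tags, i, *trigger_args) else tag
            for i, tag in enumerate(tags)]

print(apply_transformation(["MD", "RB", "VBP"], "VBP", "VB", prev_three_has, "MD"))
# ['MD', 'RB', 'VB']
```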
13-14 Types of Triggers
(1) Triggered by tags: e.g., to work in a school (but compare go to school)
(2) Triggered by words: e.g., for cut, put; more valuable player; don’t, shouldn’t
(3) Triggered by morphology: e.g., unknown words are initially tagged as proper nouns (NNP) if capitalized and as common nouns (NN) otherwise; a transformation then replaces NN by NNS if the unknown word’s suffix is -s.
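The morphology example amounts to an initial guess followed by one transformation. A minimal sketch; the function names are hypothetical, and sentence-initial capitalization is not treated specially here.

```python
# Hypothetical helpers for illustration only.
def initial_unknown_tag(word):
    """Initial guess for an unknown word: NNP if capitalized, NN otherwise."""
    return "NNP" if word[:1].isupper() else "NN"

def morphology_rule(word, tag):
    """Transformation NN -> NNS when the unknown word ends in -s."""
    return "NNS" if tag == "NN" and word.endswith("s") else tag

def tag_unknown(word):
    return morphology_rule(word, initial_unknown_tag(word))

print([tag_unknown(w) for w in ["Zorblat", "zorblat", "zorblats", "Zorblats"]])
# ['NNP', 'NN', 'NNS', 'NNP']
```

Because the rule only rewrites NN, a capitalized word ending in -s stays NNP.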
13-15 Goal: reduce the errors. $E(C_k)$ is the number of words that are mistagged in the tagged corpus $C_k$.
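Computing $E(C_k)$ and choosing the transformation that lowers it the most is the core greedy step. A minimal sketch over a single flat tag sequence; the function names, the two named candidate transformations, and the tags are hypothetical.

```python
# Hypothetical helpers for illustration; tag sequences are flat lists.
def error_count(tagged, gold):
    """E(C): number of words whose assigned tag differs from the gold tag."""
    return sum(t != g for t, g in zip(tagged, gold))

def best_transformation(tagged, gold, transformations):
    """Transformation whose output has the smallest E; None if none improves."""
    best, best_err = None, error_count(tagged, gold)
    for name, transform in transformations.items():
        err = error_count(transform(tagged), gold)
        if err < best_err:
            best, best_err = name, err
    return best, best_err

gold = ["TO", "VB", "AT", "NN"]
tagged = ["TO", "NN", "AT", "NN"]  # E = 1
transformations = {
    "NN -> VB after TO": lambda ts: ["VB" if t == "NN" and i > 0 and ts[i - 1] == "TO" else t
                                     for i, t in enumerate(ts)],
    "NN -> VB everywhere": lambda ts: ["VB" if t == "NN" else t for t in ts],
}
print(best_transformation(tagged, gold, transformations))  # ('NN -> VB after TO', 0)
```

The unconditional rule fixes one error but introduces another, so it leaves $E$ unchanged and is not selected.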
13-16 Tagging Accuracy
Typically 95% to 97%, depending on:
– the amount of training data available
– the tag set
– the difference between the training corpus and dictionary on the one hand and the corpus of application on the other
– unknown words
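Since unknown words are one of the factors, it can help to report accuracy on unknown words separately from the overall figure. A minimal sketch; the function name and the four-word example are invented for illustration.

```python
# Hypothetical helper for illustration only.
def accuracy_report(words, predicted, gold, known_words):
    """Overall accuracy, plus accuracy on words unseen in training."""
    total = len(gold)
    correct = sum(p == g for p, g in zip(predicted, gold))
    unknown = [i for i, w in enumerate(words) if w not in known_words]
    unknown_correct = sum(predicted[i] == gold[i] for i in unknown)
    return {
        "accuracy": correct / total,
        "unknown_accuracy": unknown_correct / len(unknown) if unknown else None,
        "unknown_share": len(unknown) / total,
    }

report = accuracy_report(
    words=["the", "zorblat", "dog", "runs"],
    predicted=["AT", "NN", "NN", "VBZ"],
    gold=["AT", "JJ", "NN", "VBZ"],
    known_words={"the", "dog", "runs"},
)
print(report)  # {'accuracy': 0.75, 'unknown_accuracy': 0.0, 'unknown_share': 0.25}
```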