Published by David Henry. Modified over 9 years ago.
1
September 2003
PART-OF-SPEECH TAGGING
Università di Venezia, 1 October 2003
2
This lecture
- Tagsets
- Rule-based tagging
- Brill tagger
- Tagging with Markov models
- The Viterbi algorithm
3
POS tagging: the problem
Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
Problem: assign a tag to "race"
Requires: a tagged corpus
4
Ambiguity in POS tagging
Word     Possible tags
The      AT
man      NN VB
still    NN VB RB
saw      NN VBD
her      PPO PP$
5
How hard is POS tagging?
Number of tags    Number of word types
1                 35340
2                 3760
3                 264
4                 61
5                 12
6                 2
7                 1
In the Brown corpus:
- 11.5% of word types are ambiguous
- 40% of word TOKENS are ambiguous
6
Why is POS tagging useful?
- Makes searching a corpus for patterns of interest to linguists much easier (the original motivation!)
- Useful as a basis for parsing
- For applications such as IR, provides some degree of meaning distinction
- In ASR, helps selection of the next word
7
Choosing a tagset
The choice of tagset greatly affects the difficulty of the problem.
Need to strike a balance between
- getting better information about context (best: introduce more distinctions)
- making it possible for classifiers to do their job (need to minimize distinctions)
8
Some of the best-known tagsets
- Brown corpus: 87 tags
- Penn Treebank: 45 tags
- Lancaster UCREL C5 (used to tag the BNC): 61 tags
- Lancaster C7: 145 tags
9
Important Penn Treebank tags
10
Verb inflection tags
11
The entire Penn Treebank tagset
12
UCREL C5
13
Tagsets for Italian
- Si-TAL (Pisa, Venezia, IRST, ...)
- PAROLE
- ???
14
The SI-TAL tagset
15
POS tags in the Brown corpus
Television/NN has/HVZ yet/RB to/TO work/VB out/RP a/AT living/VBG arrangement/NN with/IN jazz/NN ,/, which/WDT comes/VBZ to/IN the/AT medium/NN more/QL as/CS an/AT uneasy/JJ guest/NN than/CS as/CS a/AT relaxed/VBN member/NN of/IN the/AT family/NN ./.
16
SGML-based POS in the BNC
TROUSERS SUIT
There is nothing masculine about these new trouser suits in summer's soft pastels. Smart and acceptable for city wear but soft enough for relaxed days
17
Exercises
Tag the following Italian sentences:
- Abbonati al minimo ma la squadra piace
- Si sta bene in B …
18
Tagging methods
- Hand-coded
- Brill tagger
- Statistical (Markov) taggers
19
Hand-coded POS tagging: the two-stage architecture
Early POS taggers were all hand-coded.
Most of these (Harris, 1962; Greene and Rubin, 1971), and the best of the recent ones, ENGTWOL (Voutilainen, 1995), are based on a two-stage architecture.
20
Hand-coded rules (ENGTWOL)
STEP 1: assign to each word a list of potential parts of speech
- in ENGTWOL, this is done by a two-level morphological analyzer (a finite state transducer)
STEP 2: use about 1000 hand-coded CONSTRAINTS (if-then rules) to choose a tag using contextual information
- the constraints act as FILTERS
21
Example
Pavlov had shown that salivation …
Pavlov      PAVLOV N NOM SG PROPER
had         HAVE V PAST VFIN SVO
            HAVE PCP2 SVOO
shown       SHOW PCP2 SVOO SVO SG
that        ADV
            PRON DEM SG
            DET CENTRAL DEM SG
            CS
salivation  N NOM SG
22
A constraint
ADVERBIAL-THAT RULE
Given input: "that"
if
    (+1 A/ADV/QUANT);   /* next word is adj, adv, or quant */
    (+2 SENT-LIM);      /* and following that there is a sentence boundary */
    (NOT -1 SVOC/A);    /* and previous word is not a verb like `consider' */
then eliminate non-ADV tags
else eliminate ADV tag.
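The constraint can be read as a filter over each word's candidate tags. Below is a minimal Python sketch of that reading. The data layout (a list of `(word, set_of_tags)` pairs), the function name `adverbial_that_rule`, and the guards that avoid emptying a tag set are assumptions made for illustration; they are not ENGTWOL's actual format.

```python
# Hypothetical layout: each token is (word, set of candidate tags).
# Tag names (ADV, A, QUANT, SENT-LIM, SVOC/A) follow the slide.

def adverbial_that_rule(tokens, i):
    """Apply the ADVERBIAL-THAT constraint to tokens[i], filtering its tags in place."""
    word, tags = tokens[i]
    if word.lower() != "that":
        return

    def tags_at(j):
        # Candidate tags at position j, or an empty set outside the sentence.
        return tokens[j][1] if 0 <= j < len(tokens) else set()

    next_is_modifier = bool(tags_at(i + 1) & {"A", "ADV", "QUANT"})
    then_sentence_end = "SENT-LIM" in tags_at(i + 2)
    prev_not_svoc = "SVOC/A" not in tags_at(i - 1)

    if next_is_modifier and then_sentence_end and prev_not_svoc:
        if "ADV" in tags:                      # assumed guard: keep at least one reading
            tags.intersection_update({"ADV"})  # eliminate non-ADV tags
    elif len(tags) > 1:                        # assumed guard: never remove the last reading
        tags.discard("ADV")                    # eliminate ADV tag


sentence = [
    ("it", {"PRON"}),
    ("isn't", {"V"}),
    ("that", {"ADV", "PRON", "DET", "CS"}),
    ("odd", {"A"}),
    (".", {"SENT-LIM"}),
]
adverbial_that_rule(sentence, 2)
print(sentence[2])
```

In "it isn't that odd." the rule keeps only ADV; in "Pavlov had shown that salivation …" the next word is a noun, so ADV is removed and the other readings remain for later constraints.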
23
Tagging with lexical frequencies
Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
Problem: assign a tag to "race" given its lexical frequency
Solution: choose the tag with the greater lexical probability, comparing
- P(race|VB)
- P(race|NN)
Actual estimates from the Switchboard corpus:
- P(race|NN) = .00041
- P(race|VB) = .00003
On frequency alone, "race" would be tagged NN in both sentences.
24
Factors that play a role in POS tagging
Both the Brill tagger and HMM-based taggers achieve good results by combining
- FREQUENCY
  I poured FLOUR/NN into the bowl.
  Peter should FLOUR/VB the baking tray
- information about CONTEXT
  I saw the new/JJ PLAY/NN in the theater.
  The boy will/MD PLAY/VB in the garden.
25
The Brill tagger
- An example of TRANSFORMATION-BASED LEARNING
- Very popular (freely available, works fairly well)
- A SUPERVISED method: requires a tagged corpus
- Basic idea: do a quick job first (using frequency), then revise it using contextual rules
26
An example
Examples:
- It is expected to race tomorrow.
- The race for outer space.
Tagging algorithm:
1. Tag all uses of "race" as NN (most likely tag in the Brown corpus)
   It is expected to race/NN tomorrow
   the race/NN for outer space
2. Use a transformation rule to replace the tag NN with VB for all uses of "race" preceded by the tag TO:
   It is expected to race/VB tomorrow
   the race/NN for outer space
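The two steps above can be sketched in a few lines of Python. The toy lexicon `MOST_LIKELY_TAG`, its Penn-style tags, and the helper names `initial_tag` and `apply_rule` are assumptions made for this illustration; a real tagger would derive the lexicon from corpus counts.

```python
# Toy most-likely-tag lexicon (assumed values standing in for corpus counts).
MOST_LIKELY_TAG = {
    "it": "PRP", "is": "VBZ", "expected": "VBN", "to": "TO", "race": "NN",
    "tomorrow": "NN", "the": "DT", "for": "IN", "outer": "JJ", "space": "NN",
}

def initial_tag(words):
    """Step 1: give every word its most likely tag, ignoring context."""
    return [MOST_LIKELY_TAG[w.lower()] for w in words]

def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Step 2: change from_tag to to_tag wherever the preceding tag is prev_tag."""
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out

for sentence in ("It is expected to race tomorrow", "The race for outer space"):
    words = sentence.split()
    tags = apply_rule(initial_tag(words), "NN", "VB", "TO")
    print(" ".join(f"{w}/{t}" for w, t in zip(words, tags)))
```

Only the occurrence of "race" that follows TO is retagged as VB; "the race" keeps NN because the rule's context does not match.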
27
Transformation-based learning in the Brill tagger
1. Tag the corpus with the most likely tag for each word
2. Choose a TRANSFORMATION that deterministically replaces an existing tag with a new one such that the resulting tagged corpus has the lowest error rate
3. Apply that transformation to the training corpus
4. Repeat
5. Return a tagger that
   a. first tags using unigrams
   b. then applies the learned transformations in order
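A minimal sketch of this learning loop in Python follows. It assumes a single rule template ("change tag A to B when the previous tag is C"), whereas the Brill tagger uses a larger set of templates; the function names `learn_tbl`, `apply_rule`, `count_errors`, and `tag` are hypothetical.

```python
from collections import Counter, defaultdict

def apply_rule(tags, rule):
    """Rewrite from_tag as to_tag wherever the previous tag is prev_tag."""
    from_tag, to_tag, prev_tag = rule
    return [to_tag if i > 0 and t == from_tag and tags[i - 1] == prev_tag else t
            for i, t in enumerate(tags)]

def count_errors(tagged, gold):
    return sum(t != g for ts, gs in zip(tagged, gold) for t, g in zip(ts, gs))

def learn_tbl(corpus, max_rules=10):
    """corpus: list of sentences, each a list of (word, gold_tag) pairs."""
    # Step 1: tag every word with its most frequent tag in the training data.
    counts = defaultdict(Counter)
    for sent in corpus:
        for word, gold_tag in sent:
            counts[word][gold_tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    current = [[lexicon[w] for w, _ in sent] for sent in corpus]
    gold = [[t for _, t in sent] for sent in corpus]

    rules = []
    for _ in range(max_rules):                      # Step 4: repeat
        # Candidate rules are proposed only where the current tagging is wrong.
        candidates = {(cur[i], ref[i], cur[i - 1])
                      for cur, ref in zip(current, gold)
                      for i in range(1, len(cur)) if cur[i] != ref[i]}
        if not candidates:
            break
        # Step 2: pick the rule whose application leaves the fewest errors.
        best = min(sorted(candidates),
                   key=lambda r: count_errors([apply_rule(s, r) for s in current], gold))
        revised = [apply_rule(s, best) for s in current]
        if count_errors(revised, gold) >= count_errors(current, gold):
            break                                   # no rule helps any more
        rules.append(best)                          # Step 3: apply it and keep it
        current = revised
    return lexicon, rules

def tag(words, lexicon, rules, default="NN"):
    """Step 5: unigram tagging first, then the learned rules in order."""
    tags = [lexicon.get(w, default) for w in words]
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags

corpus = [
    [("the", "DT"), ("race", "NN")],
    [("a", "DT"), ("race", "NN")],
    [("to", "TO"), ("race", "VB")],
]
lexicon, rules = learn_tbl(corpus)
print(rules)
print(tag(["to", "race"], lexicon, rules))
```

Rescoring every candidate against the whole corpus is slow but keeps the sketch close to the description: each iteration greedily selects the transformation with the lowest resulting error count.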
28
The algorithm
29
Examples of learned transformations
30
Templates
31
An example
32
Markov Model POS tagging
Again, the problem is to find an "explanation" (here, a tag sequence) with the highest probability.
As in yesterday's case, this can be "turned around" using Bayes' Rule.
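In the standard formulation, and with notation assumed here rather than taken from the slide ($W = w_1 \dots w_n$ for the words and $T = t_1 \dots t_n$ for the tags), the two statements can be written as:

```latex
\hat{T} \;=\; \operatorname*{argmax}_{T} \, P(T \mid W)
        \;=\; \operatorname*{argmax}_{T} \, \frac{P(W \mid T)\, P(T)}{P(W)}
```

The first factor, $P(W \mid T)$, relates words to tags; the second, $P(T)$, is a prior over tag sequences.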
33
Combining frequency and contextual information
As in the case of spelling, this equation can be simplified.
As we will see, once further simplifications are applied, this equation will encode both FREQUENCY and CONTEXT INFORMATION.
34
Three further assumptions
- MARKOV assumption: a tag depends only on a FIXED NUMBER of previous tags (here, assume bigrams)
  Simplifies the second factor
- INDEPENDENCE assumptions: words are independent of each other, and a word's identity depends only on its own tag
  Simplify the first factor
35
The final equations
The two factors of the final equation are labelled FREQUENCY and CONTEXT.
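A sketch of how the simplifications chain together in the standard bigram formulation. The notation is assumed here: $W = w_1 \dots w_n$ are the words, $T = t_1 \dots t_n$ the tags, and $t_0$ a sentence-start marker.

```latex
% P(W) is the same for every candidate T, so it can be dropped:
\hat{T} = \operatorname*{argmax}_{T} \, P(W \mid T)\, P(T)

% Independence assumptions simplify the first factor:
P(W \mid T) \approx \prod_{i=1}^{n} P(w_i \mid t_i)

% The Markov (bigram) assumption simplifies the second factor:
P(T) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})

% Final form:
\hat{T} = \operatorname*{argmax}_{t_1 \dots t_n} \prod_{i=1}^{n}
          \underbrace{P(w_i \mid t_i)}_{\text{FREQUENCY}} \;
          \underbrace{P(t_i \mid t_{i-1})}_{\text{CONTEXT}}
```

The lexical term captures how often a word occurs with a tag, and the transition term captures which tags tend to follow which.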
36
Estimating the probabilities
Can be done using Maximum Likelihood Estimation as usual, for BOTH probabilities.
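For a bigram tagger, the usual maximum likelihood estimates are relative frequencies: P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1}) and P(w_i | t_i) = C(t_i, w_i) / C(t_i). The Python sketch below computes both from a tagged corpus. The function name `estimate`, the `<s>` start marker, and the dictionary layout are assumptions, and no smoothing is applied.

```python
from collections import Counter

def estimate(corpus, start="<s>"):
    """Relative-frequency estimates from a list of [(word, tag), ...] sentences.

    Returns trans_p[(prev_tag, tag)] and emit_p[(tag, word)].
    """
    tag_count = Counter()   # C(t)
    bigram = Counter()      # C(t_prev, t)
    emit = Counter()        # C(t, w)
    for sent in corpus:
        prev = start
        tag_count[start] += 1
        for word, tag in sent:
            tag_count[tag] += 1
            bigram[(prev, tag)] += 1
            emit[(tag, word)] += 1
            prev = tag
    trans_p = {(p, t): c / tag_count[p] for (p, t), c in bigram.items()}
    emit_p = {(t, w): c / tag_count[t] for (t, w), c in emit.items()}
    return trans_p, emit_p

corpus = [
    [("to", "TO"), ("race", "VB")],
    [("the", "DT"), ("race", "NN")],
    [("to", "TO"), ("win", "VB")],
]
trans_p, emit_p = estimate(corpus)
print(trans_p[("TO", "VB")], emit_p[("VB", "race")])
```

Unseen word and tag combinations get no entry at all, which is why practical taggers add smoothing and a separate treatment of unknown words.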
37
An example of tagging with Markov Models
Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
Problem: assign a tag to "race" given the subsequences
- to/TO race/???
- the/DT race/???
Solution: for "to race", choose the tag with the greater of these probabilities:
- P(VB|TO) P(race|VB)
- P(NN|TO) P(race|NN)
38
Tagging with MMs (2)
Actual estimates from the Switchboard corpus:
LEXICAL FREQUENCIES:
- P(race|NN) = .00041
- P(race|VB) = .00003
CONTEXT:
- P(NN|TO) = .021
- P(VB|TO) = .34
The probabilities:
- P(VB|TO) P(race|VB) ≈ .00001
- P(NN|TO) P(race|NN) ≈ .000009
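The comparison can be checked directly by multiplying the estimates listed on the slide. The short Python sketch below does so; the dictionary layout and the `score` helper are assumptions for illustration.

```python
# Estimates quoted on the slide (Switchboard corpus).
emit_p = {("NN", "race"): 0.00041, ("VB", "race"): 0.00003}   # P(word | tag)
trans_p = {("TO", "NN"): 0.021, ("TO", "VB"): 0.34}           # P(tag | previous tag)

def score(prev_tag, word, tag):
    """CONTEXT term times FREQUENCY term for one candidate tag."""
    return trans_p[(prev_tag, tag)] * emit_p[(tag, word)]

scores = {tag: score("TO", "race", tag) for tag in ("NN", "VB")}
best = max(scores, key=scores.get)
print(scores, best)
```

Multiplying the slide's figures gives roughly 1.0e-5 for VB and 8.6e-6 for NN, so VB wins after TO even though NN is by far the more frequent tag for "race" on its own.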
39
A graphical interpretation of the POS tagging equations
40
Hidden Markov Models
41
An example
42
Computing the most likely sequence of tags
In general, computing the most likely tag sequence t_1 … t_n by enumerating every candidate sequence has exponential complexity.
It can, however, be solved in polynomial time using DYNAMIC PROGRAMMING: the VITERBI ALGORITHM (Viterbi, 1967).
(Algorithms of this kind are also called TRELLIS ALGORITHMS.)
43
Trellis algorithms
44
The Viterbi algorithm
45
Viterbi (pseudo-code format)
46
Viterbi: an example
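A minimal Python sketch of Viterbi decoding for a bigram HMM tagger, offered as one possible rendering rather than the slide's own pseudo-code. The function signature, the `<s>` start marker, the `(prev_tag, tag)` and `(tag, word)` key conventions, and the small `floor` probability for unseen events are all assumptions.

```python
import math

def viterbi(words, tags, trans_p, emit_p, start="<s>", floor=1e-12):
    """Most probable tag sequence for words under a bigram HMM.

    trans_p[(prev_tag, tag)] = P(tag | prev_tag); emit_p[(tag, word)] = P(word | tag).
    Missing entries fall back to floor (a crude stand-in for smoothing).
    """
    if not words:
        return []

    def lp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t]: log-probability of the best path that ends in tag t at position i.
    best = [{t: lp(trans_p, (start, t)) + lp(emit_p, (t, words[0])) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Best predecessor for tag t: one column of the trellis.
            prev = max(tags, key=lambda p: best[i - 1][p] + lp(trans_p, (p, t)))
            best[i][t] = (best[i - 1][prev] + lp(trans_p, (prev, t))
                          + lp(emit_p, (t, words[i])))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

tags = ["TO", "NN", "VB"]
trans_p = {("<s>", "TO"): 1.0, ("TO", "NN"): 0.021, ("TO", "VB"): 0.34}
emit_p = {("TO", "to"): 1.0, ("NN", "race"): 0.00041, ("VB", "race"): 0.00003}
print(viterbi(["to", "race"], tags, trans_p, emit_p))
```

Each trellis column needs one pass over all pairs of tags, so the running time grows as n·|tags|² instead of |tags|ⁿ. Working in log space avoids underflow on long sentences.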
47
Markov chains and Hidden Markov Models
- Markov chain: only transition probabilities. Each node is associated with a single OUTPUT.
- Hidden Markov Models: nodes may have more than one output; there is a probability P(w|t) of outputting word w from state t.
48
Training HMMs
HMMs are so popular because they come with a LEARNING ALGORITHM: the FORWARD-BACKWARD algorithm (an instance of a class of algorithms called EM algorithms).
Basic idea of the forward-backward algorithm: start by assigning random transition and emission probabilities, then iterate.
49
Evaluation of POS taggers
Taggers can reach up to 96.7% correct on the Penn Treebank (see Brants, 2000).
(But see next lecture.)
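The "% correct" figure is per-token accuracy: the share of tokens whose predicted tag matches the gold-standard tag. A minimal Python sketch follows; the function name `accuracy` and the toy data are assumptions.

```python
def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold tag.

    Both arguments are lists of tag sequences, one per sentence, aligned token by token.
    """
    pairs = [(p, g) for ps, gs in zip(predicted, gold) for p, g in zip(ps, gs)]
    return sum(p == g for p, g in pairs) / len(pairs)

predicted = [["DT", "NN"], ["TO", "NN"]]
gold = [["DT", "NN"], ["TO", "VB"]]
print(accuracy(predicted, gold))
```

Three of the four tokens match here, so the score is 0.75. Per-sentence accuracy is much lower than per-token accuracy, because a single wrong tag makes the whole sentence count as wrong.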
50
Additional issues
- Most of the difference in performance between POS algorithms depends on their treatment of UNKNOWN WORDS
- Multiple token words ("Penn Treebank")
- Class-based N-grams
51
Other techniques
- There is a move away from HMMs for this task and towards techniques that make it easier to use multiple features
- MAXIMUM ENTROPY taggers are among the highest performing at the moment
52
Freely available POS taggers
Quite a few taggers are freely available:
- Brill (TBL)
- QTAG (HMM; can be trained for other languages)
- LT POS (part of the Edinburgh LTG suite of tools)
- See Chris Manning's Statistical NLP resources web page (from the course web page)
53
POS tagging for Italian
- Xerox Grenoble
- IMMORTALE (Università di Venezia)
- Pi-Tagger (Università di Pisa)
54
Other kinds of tagging
- Sense tagging (SEMCOR, SENSEVAL)
- Syntactic tagging ("supertagging")
- Dialogue act tagging
- Semantic tagging (animacy, etc.)
55
Readings
Jurafsky and Martin, chapter 8