Published by Bertina Jefferson. Modified over 9 years ago.
1
Word classes and part of speech tagging Chapter 5
2
Slide 1 Outline Why part of speech tagging? Word classes Tag sets and problem definition Automatic approaches 1: rule-based tagging Automatic approaches 2: stochastic tagging On Part 2: finish stochastic tagging, and continue on to: Automatic approaches 3: transformation-based tagging Other issues: tagging unknown words, evaluation
3
Slide 2 Definition “The process of assigning a part-of-speech or other lexical class marker to each word in a corpus” (Jurafsky and Martin)
WORDS: the girl kissed the boy on the cheek
TAGS: N V P DET
4
Slide 3 An Example
WORD: the girl kissed the boy on the cheek
LEMMA: the girl kiss the boy on the cheek
TAG: +DET +NOUN +VPAST +DET +NOUN +PREP +DET +NOUN
5
Slide 4 Motivation Speech synthesis: pronunciation Speech recognition: class-based N-grams Information retrieval: stemming, selection of high-content words Word-sense disambiguation Corpus analysis of language & lexicography
6
Slide 5 Outline Why part of speech tagging? Word classes Tag sets and problem definition Automatic approaches 1: rule-based tagging Automatic approaches 2: stochastic tagging Automatic approaches 3: transformation-based tagging Other issues: tagging unknown words, evaluation
7
Slide 6 Word Classes Basic word classes: Noun, Verb, Adjective, Adverb, Preposition, … Open vs. Closed classes Open: Nouns, Verbs, Adjectives, Adverbs. Why “open”? Closed: determiners: a, an, the pronouns: she, he, I prepositions: on, under, over, near, by, …
8
Slide 7 Open Class Words Every known human language has nouns and verbs Nouns: people, places, things Classes of nouns proper vs. common count vs. mass Verbs: actions and processes Adjectives: properties, qualities Adverbs: hodgepodge! Unfortunately, John walked home extremely slowly yesterday Numerals: one, two, three, third, …
9
Slide 8 Closed Class Words Differ more from language to language than open class words Examples: prepositions: on, under, over, … particles: up, down, on, off, … determiners: a, an, the, … pronouns: she, who, I, … conjunctions: and, but, or, … auxiliary verbs: can, may, should, …
10
Slide 9 Prepositions from CELEX
11
Slide 10 English Single-Word Particles
12
Slide 11 Pronouns in CELEX
13
Slide 12 Conjunctions
14
Slide 13 Auxiliaries
15
Slide 14 Outline Why part of speech tagging? Word classes Tag sets and problem definition Automatic approaches 1: rule-based tagging Automatic approaches 2: stochastic tagging Automatic approaches 3: transformation-based tagging Other issues: tagging unknown words, evaluation
16
Slide 15 Word Classes: Tag Sets Vary in number of tags: a dozen to over 200 Size of tag sets depends on language, objectives and purpose –Some tagging approaches (e.g., constraint grammar based) make fewer distinctions e.g., conflating prepositions, conjunctions, particles –Simple morphology = more ambiguity = fewer tags
17
Slide 16 Word Classes: Tag set example PRP PRP$
18
Slide 17 Example of Penn Treebank Tagging of Brown Corpus Sentence
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
Tags: VB DT NN .
Words: Book that flight .
Tags: VBZ DT NN VB NN ?
Words: Does that flight serve dinner ?
See http://www.infogistics.com/posdemo.htm
Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo
19
Slide 18 The Problem Words often have more than one word class, e.g. “this”:
This is a nice day = PRP
This day is nice = DT
You can go this far = RB
20
Slide 19 Word Class Ambiguity (in the Brown Corpus)
Unambiguous (1 tag): 35,340
Ambiguous (2-7 tags): 4,100
  2 tags: 3,760
  3 tags: 264
  4 tags: 61
  5 tags: 12
  6 tags: 2
  7 tags: 1
(DeRose, 1988)
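To put the counts on this slide in proportion, the share of ambiguous word types can be computed directly from them. This is only a small arithmetic sketch; the variable names are illustrative and the figures are copied from the slide, not recomputed from the Brown Corpus.

```python
# Word-type counts by number of possible tags, copied from the slide (DeRose, 1988).
types_by_tag_count = {1: 35340, 2: 3760, 3: 264, 4: 61, 5: 12, 6: 2, 7: 1}

total_types = sum(types_by_tag_count.values())
ambiguous_types = sum(n for tags, n in types_by_tag_count.items() if tags > 1)
ambiguous_share = ambiguous_types / total_types

print(f"{ambiguous_types} of {total_types} word types are ambiguous "
      f"({ambiguous_share:.1%})")
```

Note that this is a share of word types, not of running tokens; the slide does not give token-level figures.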
21
Slide 20 Part-of-Speech Tagging Rule-Based Tagger: ENGTWOL (ENGlish TWO Level analysis) Stochastic Tagger: HMM-based Transformation-Based Tagger (Brill)
22
Slide 21 Outline Why part of speech tagging? Word classes Tag sets and problem definition Automatic approaches 1: rule-based tagging Automatic approaches 2: stochastic tagging Automatic approaches 3: transformation-based tagging Other issues: tagging unknown words, evaluation
23
Slide 22 Rule-Based Tagging Basic Idea: –Assign all possible tags to words –Remove tags according to a set of rules of the type: if word+1 is an adj, adv, or quantifier, and the word after that is a sentence boundary, and word-1 is not a verb like “consider”, then eliminate non-adv tags, else eliminate adv. –Typically more than 1000 hand-written rules
24
Slide 23 Sample ENGTWOL Lexicon Demo: http://www2.lingsoft.fi/cgi-bin/engtwol
25
Slide 24 Stage 1 of ENGTWOL Tagging First Stage: Run words through a morphological analyzer to get all parts of speech. Example: Pavlov had shown that salivation …
Pavlov      PAVLOV N NOM SG PROPER
had         HAVE V PAST VFIN SVO
            HAVE PCP2 SVO
shown       SHOW PCP2 SVOO SVO SV
that        ADV
            PRON DEM SG
            DET CENTRAL DEM SG
            CS
salivation  N NOM SG
26
Slide 25 Stage 2 of ENGTWOL Tagging Second Stage: Apply constraints. Constraints are used in a negative way, i.e., to eliminate tags. Example: Adverbial “that” rule
Given input: “that”
If (+1 A/ADV/QUANT) (+2 SENT-LIM) (NOT -1 SVOC/A)
Then eliminate non-ADV tags
Else eliminate ADV
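The adverbial “that” rule can be sketched in a few lines of Python. This is not ENGTWOL's actual rule formalism or implementation: the function name, the representation of readings as sets of tag strings, the choice to treat a context test as satisfied when any candidate reading matches, and the guard that keeps at least one reading are all assumptions made for illustration.

```python
# Hypothetical stand-ins for the categories named in the rule.
ADJ_ADV_QUANT = {"A", "ADV", "QUANT"}
SENT_LIM = {"SENT-LIM"}   # sentence boundary
SVOC_A = {"SVOC/A"}       # verbs like "consider" that take an adjective complement


def apply_adverbial_that_rule(readings, i):
    """Prune the candidate tags of the token at index i (the word "that").

    readings is a list with one set of candidate tags per token
    (the output of stage 1). The list is modified in place.
    """
    def has(pos, tags):
        # Assumption: a context test holds if any candidate reading matches.
        return 0 <= pos < len(readings) and bool(readings[pos] & tags)

    condition = (has(i + 1, ADJ_ADV_QUANT)
                 and has(i + 2, SENT_LIM)
                 and not has(i - 1, SVOC_A))
    if condition:
        kept = readings[i] & {"ADV"}   # eliminate non-ADV tags
    else:
        kept = readings[i] - {"ADV"}   # eliminate ADV
    if kept:                           # assumed guard: never remove the last reading
        readings[i] = kept
    return readings[i]


that = {"ADV", "PRON", "DET", "CS"}
# "it isn't that odd ."
s1 = [{"PRON"}, {"V"}, set(that), {"A"}, {"SENT-LIM"}]
# "I consider that odd ."
s2 = [{"PRON"}, {"V", "SVOC/A"}, set(that), {"A"}, {"SENT-LIM"}]

print(sorted(apply_adverbial_that_rule(s1, 2)))
print(sorted(apply_adverbial_that_rule(s2, 2)))
```

In the first sentence only the ADV reading survives; in the second, the preceding “consider”-type verb blocks the rule, so ADV is the reading that is removed.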
27
Slide 26 Outline Why part of speech tagging? Word classes Tag sets and problem definition Automatic approaches 1: rule-based tagging Automatic approaches 2: stochastic tagging Automatic approaches 3: transformation-based tagging Other issues: tagging unknown words, evaluation
28
Slide 27 Stochastic Tagging Based on probability of certain tag occurring given various possibilities Requires a training corpus No probabilities for words not in corpus. Training corpus may be different from test corpus.
29
Slide 28 Stochastic Tagging (cont.) Simple Method: Choose most frequent tag in training text for each word! –Result: 90% accuracy –Baseline –Others will do better –HMM is an example
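The most-frequent-tag baseline is simple enough to sketch directly. The function names, the lowercasing of words, and the `default_tag="NN"` fallback for unseen words are assumptions for illustration; the toy corpus is adapted from the example sentences used later in the slides and is far too small to reproduce the 90% figure.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Map each word to the tag it carries most often in the training text."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_baseline(words, model, default_tag="NN"):
    """Tag each word independently; unseen words get default_tag (an assumption)."""
    return [(w, model.get(w.lower(), default_tag)) for w in words]


toy_corpus = [
    [("Secretariat", "NNP"), ("is", "VBZ"), ("expected", "VBN"),
     ("to", "TO"), ("race", "VB"), ("tomorrow", "NN")],
    [("the", "DT"), ("race", "NN"), ("for", "IN"),
     ("outer", "JJ"), ("space", "NN")],
    [("the", "DT"), ("reason", "NN"), ("for", "IN"),
     ("the", "DT"), ("race", "NN")],
]

model = train_most_frequent_tag(toy_corpus)
print(tag_baseline(["to", "race"], model))
```

Because each word is tagged without looking at its neighbours, “race” receives its majority tag NN even after “to”. This is the kind of error that context-sensitive taggers such as HMMs are meant to fix.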
30
Slide 29 HMM Tagger Intuition: Pick the most likely tag for this word. HMM Taggers choose the tag sequence that maximizes this formula: –P(word|tag) × P(tag|previous n tags) Let $T = t_1, t_2, \ldots, t_n$ Let $W = w_1, w_2, \ldots, w_n$ Find POS tags that generate a sequence of words, i.e., look for the most probable sequence of tags T underlying the observed words W.
31
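Slide 30 Start with Bigram-HMM Tagger
$\arg\max_T P(T \mid W)$
$= \arg\max_T P(T)\,P(W \mid T)$ (Bayes' rule; $P(W)$ is the same for every $T$ and is dropped)
$= \arg\max_{t_1 \ldots t_n} P(t_1 \ldots t_n)\,P(w_1 \ldots w_n \mid t_1 \ldots t_n)$
$\approx \arg\max_{t_1 \ldots t_n} [P(t_1)\,P(t_2 \mid t_1) \cdots P(t_n \mid t_{n-1})]\,[P(w_1 \mid t_1)\,P(w_2 \mid t_2) \cdots P(w_n \mid t_n)]$
To tag a single word: $t_i = \arg\max_j P(t_j \mid t_{i-1})\,P(w_i \mid t_j)$
How do we compute $P(t_i \mid t_{i-1})$? $c(t_{i-1}\,t_i) / c(t_{i-1})$
How do we compute $P(w_i \mid t_i)$? $c(w_i, t_i) / c(t_i)$
How do we compute the most probable tag sequence? Viterbi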
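The count-based estimates and the Viterbi search on this slide can be sketched as follows. This is a minimal illustration rather than a production tagger: the function names and the `<s>` start symbol (so that P(t_1) is treated as P(t_1 | <s>)) are assumptions, there is no smoothing, and the input sentence is assumed to be non-empty. The toy corpus reuses the two “race” sentences from the next slide.

```python
from collections import Counter


def train_bigram_hmm(tagged_sentences, start="<s>"):
    """Collect the counts c(t_{i-1} t_i), c(w_i, t_i) and c(t)."""
    transition, emission, tag_count = Counter(), Counter(), Counter()
    for sentence in tagged_sentences:
        prev = start
        tag_count[start] += 1
        for word, tag in sentence:
            transition[(prev, tag)] += 1
            emission[(word, tag)] += 1
            tag_count[tag] += 1
            prev = tag
    return transition, emission, tag_count


def viterbi(words, transition, emission, tag_count, start="<s>"):
    """Return the most probable tag sequence for a non-empty word list."""
    tags = [t for t in tag_count if t != start]

    def p_trans(prev, tag):   # P(t_i | t_{i-1}) = c(t_{i-1} t_i) / c(t_{i-1})
        return transition[(prev, tag)] / tag_count[prev]

    def p_emit(word, tag):    # P(w_i | t_i) = c(w_i, t_i) / c(t_i)
        return emission[(word, tag)] / tag_count[tag]

    # best[i][t] = (probability of the best path ending in t at i, backpointer)
    best = [{t: (p_trans(start, t) * p_emit(words[0], t), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            prob, prev = max(((best[-1][p][0] * p_trans(p, t), p) for p in tags),
                             key=lambda pair: pair[0])
            column[t] = (prob * p_emit(word, t), prev)
        best.append(column)

    # Follow the backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))


toy_corpus = [
    [("Secretariat", "NNP"), ("is", "VBZ"), ("expected", "VBN"),
     ("to", "TO"), ("race", "VB"), ("tomorrow", "NN")],
    [("People", "NNS"), ("continue", "VBP"), ("to", "TO"), ("inquire", "VB"),
     ("the", "DT"), ("reason", "NN"), ("for", "IN"), ("the", "DT"),
     ("race", "NN"), ("for", "IN"), ("outer", "JJ"), ("space", "NN")],
]

model = train_bigram_hmm(toy_corpus)
print(viterbi("People continue to race".split(), *model))
print(viterbi("People continue to inquire the race".split(), *model))
```

On this toy data “race” comes out as VB after “to” and as NN after “the”, because the transition counts differ even though the word is the same. Without smoothing, any unseen word or tag bigram drives every path probability to zero, so a real tagger would need smoothing and an unknown-word model.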
32
Slide 31 An Example
Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
to/TO race/???