Word Classes and Part-of-Speech (POS) Tagging

Slides:



Advertisements
Similar presentations
Three Basic Problems Compute the probability of a text: P m (W 1,N ) Compute maximum probability tag sequence: arg max T 1,N P m (T 1,N | W 1,N ) Compute.
Advertisements

School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING PoS-Tagging theory and terminology COMP3310 Natural Language Processing.
Three Basic Problems 1.Compute the probability of a text (observation) language modeling – evaluate alternative texts and models P m (W 1,N ) 2.Compute.
Introduction to Syntax, with Part-of-Speech Tagging Owen Rambow September 17 & 19.
Outline Why part of speech tagging? Word classes
1 Part of Speech tagging Lecture 9 Slides adapted from: Dan Jurafsky, Julia Hirschberg, Jim Martin.
Chapter 8. Word Classes and Part-of-Speech Tagging From: Chapter 8 of An Introduction to Natural Language Processing, Computational Linguistics, and Speech.
BİL711 Natural Language Processing
Part-of-speech tagging. Parts of Speech Perhaps starting with Aristotle in the West (384–322 BCE) the idea of having parts of speech lexical categories,
Part of Speech Tagging Importance Resolving ambiguities by assigning lower probabilities to words that don’t fit Applying to language grammatical rules.
February 2007CSA3050: Tagging II1 CSA2050: Natural Language Processing Tagging 2 Rule-Based Tagging Stochastic Tagging Hidden Markov Models (HMMs) N-Grams.
Natural Language Processing Lecture 8—9/24/2013 Jim Martin.
LING 388 Language and Computers Lecture 22 11/25/03 Sandiway FONG.
1 Part of Speech Tagging (Chapter 5) September 2009 Lecture #6.
Part II. Statistical NLP Advanced Artificial Intelligence Part of Speech Tagging Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme Most.
1 CSC 594 Topics in AI – Applied Natural Language Processing Fall 2009/ Part-Of-Speech (POS) Tagging.
 Christel Kemke 2007/08 COMP 4060 Natural Language Processing Word Classes and English Grammar.
CS 4705 Lecture 8 Word Classes and Part-of- Speech (POS) Tagging.
Learning Bit by Bit Hidden Markov Models. Weighted FSA weather The is outside
Ch 10 Part-of-Speech Tagging Edited from: L. Venkata Subramaniam February 28, 2002.
CS Catching Up CS Porter Stemmer Porter Stemmer (1980) Used for tasks in which you only care about the stem –IR, modeling given/new distinction,
POS based on Jurafsky and Martin Ch. 8 Miriam Butt October 2003.
NLP and Speech 2004 English Grammar
POS Tagging and Context-Free Grammars
Syllabus Text Books Classes Reading Material Assignments Grades Links Forum Text Books עיבוד שפות טבעיות - שיעור חמישי POS Tagging Algorithms עידו.
Part of speech (POS) tagging
CMSC 723 / LING 645: Intro to Computational Linguistics November 3, 2004 Lecture 9 (Dorr): Word Classes, POS Tagging (Chapter 8) Intro to Syntax (Start.
Word classes and part of speech tagging Chapter 5.
Albert Gatt Corpora and Statistical Methods Lecture 9.
8. Word Classes and Part-of-Speech Tagging 2007 년 5 월 26 일 인공지능 연구실 이경택 Text: Speech and Language Processing Page.287 ~ 303.
1 POS Tagging: Introduction Heng Ji Feb 2, 2008 Acknowledgement: some slides from Ralph Grishman, Nicolas Nicolov, J&M.
Lemmatization Tagging LELA /20 Lemmatization Basic form of annotation involving identification of underlying lemmas (lexemes) of the words in.
Part II. Statistical NLP Advanced Artificial Intelligence Applications of HMMs and PCFGs in NLP Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme.
Parts of Speech Sudeshna Sarkar 7 Aug 2008.
Distributional Part-of-Speech Tagging Hinrich Schütze CSLI, Ventura Hall Stanford, CA , USA NLP Applications.
Lecture 6 POS Tagging Methods Topics Taggers Rule Based Taggers Probabilistic Taggers Transformation Based Taggers - Brill Supervised learning Readings:
Fall 2005 Lecture Notes #8 EECS 595 / LING 541 / SI 661 Natural Language Processing.
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging II Transformation Based Tagging Brill (1995)
10/24/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 6 Giuseppe Carenini.
11 Chapter 14 Part 1 Statistical Parsing Based on slides by Ray Mooney.
10/30/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 7 Giuseppe Carenini.
Word classes and part of speech tagging Chapter 5.
Speech and Language Processing Ch8. WORD CLASSES AND PART-OF- SPEECH TAGGING.
Tokenization & POS-Tagging
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging I Introduction Tagsets Approaches.
Word classes and part of speech tagging 09/28/2004 Reading: Chap 8, Jurafsky & Martin Instructor: Rada Mihalcea Note: Some of the material in this slide.
Natural Language Processing
CSA3202 Human Language Technology HMMs for POS Tagging.
Part-of-speech tagging
Human Language Technology Part of Speech (POS) Tagging II Rule-based Tagging.
1 Computational Lexicology, Morphology and Syntax Course 6 Diana Trandab ă ț Academic year:
Word classes and part of speech tagging. Slide 1 Outline Why part of speech tagging? Word classes Tag sets and problem definition Automatic approaches.
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging II Transformation Based Tagging Brill (1995)
Part-of-Speech Tagging & Sequence Labeling Hongning Wang
Word classes and part of speech tagging Chapter 5.
Part-of-Speech Tagging CSCI-GA.2590 – Lecture 4 Ralph Grishman NYU.
POS TAGGING AND HMM Tim Teks Mining Adapted from Heng Ji.
6/18/2016CPSC503 Winter CPSC 503 Computational Linguistics Lecture 6 Giuseppe Carenini.
Speech and Language Processing SLP Chapter 5. 10/31/1 2 Speech and Language Processing - Jurafsky and Martin 2 Today  Parts of speech (POS)  Tagsets.
Lecture 5 POS Tagging Methods
Lecture 9: Part of Speech
CSCI 5832 Natural Language Processing
CS4705 Part of Speech tagging
CSC 594 Topics in AI – Natural Language Processing
CSCI 5832 Natural Language Processing
Natural Language Processing
Lecture 6: Part of Speech Tagging (II): October 14, 2004 Neal Snider
Natural Language Processing
CPSC 503 Computational Linguistics
Natural Language Processing (NLP)
Presentation transcript:

Word Classes and Part-of-Speech (POS) Tagging CS4705 Julia Hirschberg CS 4705

Garden Path Sentences The old dog …………the footsteps of the young. The cotton clothing …………is made of grows in Mississippi. The horse raced past the barn …………fell. 2 2

Word Classes Words that somehow ‘behave’ alike: Appear in similar contexts Perform similar functions in sentences Undergo similar transformations ~9 traditional word classes of parts of speech Noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction 3 3

Some Examples N noun chair, bandwidth, pacing V verb study, debate, munch ADJ adjective purple, tall, ridiculous ADV adverb unfortunately, slowly P preposition of, by, to PRO pronoun I, me, mine DET determiner the, a, that, those 4 4

Defining POS Tagging The process of assigning a part-of-speech or lexical class marker to each word in a corpus: the koala put keys on table WORDS TAGS N V P DET 5 5

Applications for POS Tagging Speech synthesis pronunciation Lead Lead INsult inSULT OBject obJECT OVERflow overFLOW DIScount disCOUNT CONtent conTENT Parsing: e.g. Time flies like an arrow Is flies an N or V? Word prediction in speech recognition Possessive pronouns (my, your, her) are likely to be followed by nouns Personal pronouns (I, you, he) are likely to be followed by verbs Machine Translation 6 6

Closed vs. Open Class Words Closed class: relatively fixed set Prepositions: of, in, by, … Auxiliaries: may, can, will, had, been, … Pronouns: I, you, she, mine, his, them, … Usually function words (short common words which play a role in grammar) Open class: productive English has 4: Nouns, Verbs, Adjectives, Adverbs Many languages have all 4, but not all! In Lakhota and possibly Chinese, what English treats as adjectives act more like verbs. 7 7

Open Class Words Nouns Proper nouns Columbia University, New York City, Arthi Ramachandran, Metropolitan Transit Center English capitalizes these Many have abbreviations Common nouns All the rest German capitalizes these. 8 8

Adjectives: identify properties or qualities of nouns Count nouns vs. mass nouns Count: Have plurals, countable: goat/goats, one goat, two goats Mass: Not countable (fish, salt, communism) (?two fishes) Adjectives: identify properties or qualities of nouns Color, size, age, … Adjective ordering restrictions in English: Old blue book, not Blue old book In Korean, adjectives are realized as verbs Adverbs: also modify things (verbs, adjectives, adverbs) The very happy man walked home extremely slowly yesterday. Mass: is this singular or plural? 9 9

Verbs: Directional/locative adverbs (here, home, downhill) Degree adverbs (extremely, very, somewhat) Manner adverbs (slowly, slinkily, delicately) Temporal adverbs (Monday, tomorrow) Verbs: In English, take morphological affixes (eat/eats/eaten) Represent actions (walk, ate), processes (provide, see), and states (be, seem) Many subclasses, e.g. eats/V  eat/VB, eat/VBP, eats/VBZ, ate/VBD, eaten/VBN, eating/VBG, ... Reflect morphological form & syntactic function

How Do We Assign Words to Open or Closed? Nouns denote people, places and things and can be preceded by articles? But… My typing is very bad. *The Mary loves John. Verbs are used to refer to actions, processes, states But some are closed class and some are open I will have emailed everyone by noon. Adverbs modify actions Is Monday a temporal adverbial or a noun? 11 11

Closed Class Words Idiosyncratic Closed class words (Prep, Det, Pron, Conj, Aux, Part, Num) are generally easy to process, since we can enumerate them….but Is it a Particles or a Preposition? George eats up his dinner/George eats his dinner up. George eats up the street/*George eats the street up. Articles come in 2 flavors: definite (the) and indefinite (a, an) What is this in ‘this guy…’? 12 12

Choosing a POS Tagset To do POS tagging, first need to choose a set of tags Could pick very coarse (small) tagsets N, V, Adj, Adv. More commonly used: Brown Corpus (Francis & Kucera ‘82), 1M words, 87 tags – more informative but more difficult to tag Most commonly used: Penn Treebank: hand-annotated corpus of Wall Street Journal, 1M words, 45-46 subset We’ll use for HW1 13 13

Penn Treebank Tagset 14 14

Using the Penn Treebank Tags The/DT grand/JJ jury/NN commmented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./. Prepositions and subordinating conjunctions marked IN (“although/IN I/PRP..”) Except the preposition/complementizer “to” is just marked “TO” NB: PRP$ (possessive pronoun) vs. $ 15 15

Tag Ambiguity Words often have more than one POS: back The back door = JJ On my back = NN Win the voters back = RB Promised to back the bill = VB The POS tagging problem is to determine the POS tag for a particular instance of a word 16 16

Tagging Whole Sentences with POS is Hard Ambiguous POS contexts E.g., Time flies like an arrow. Possible POS assignments Time/[V,N] flies/[V,N] like/[V,Prep] an/Det arrow/N Time/N flies/V like/Prep an/Det arrow/N Time/V flies/N like/Prep an/Det arrow/N Time/N flies/N like/V an/Det arrow/N ….. 17 17

How Big is this Ambiguity Problem? Types in Brown Corpus: many of the ambiguous words are very common 18 18

How Do We Disambiguate POS? Many words have only one POS tag (e.g. is, Mary, very, smallest) Others have a single most likely tag (e.g. a, dog) Tags also tend to co-occur regularly with other tags (e.g. Det, N) In addition to conditional probabilities of words P(w1|wn-1), we can look at POS likelihoods (P(t1|tn-1)) to disambiguate sentences and to assess sentence likelihoods 19 19

Some Ways to do POS Tagging Rule-based tagging E.g. EnCG ENGTWOL tagger Transformation-based tagging Learned rules (statistic and linguistic) E.g., Brill tagger Stochastic, or, Probabilistic tagging HMM (Hidden Markov Model) tagging 20 20

Rule-Based Tagging Typically…start with a dictionary of words and possible tags Assign all possible tags to words using the dictionary Write rules by hand to selectively remove tags Stop when each word has exactly one (presumably correct) tag 21 21

Start with a POS Dictionary she: PRP promised: VBN,VBD to: TO back: VB, JJ, RB, NN the: DT bill: NN, VB Etc… for the ~100,000 words of English 22 22

Assign All Possible POS to Each Word NN RB VBN JJ VB PRP VBD TO VB DT NN She promised to back the bill 23 23

Apply Rules Eliminating Some POS E.g., Eliminate VBN if VBD is an option when VBN|VBD follows “<start> PRP” NN RB VBN JJ VB PRP VBD TO VB DT NN She promised to back the bill VBN: past participle VBD: past tense 24 24

Apply Rules Eliminating Some POS E.g., Eliminate VBN if VBD is an option when VBN|VBD follows “<start> PRP” NN RB JJ VB PRP VBD TO VB DT NN She promised to back the bill VBN: part participle VBD: past tense 25 25

EngCG ENGTWOL Tagger Richer dictionary includes morphological and syntactic features (e.g. subcategorization frames) as well as possible POS Uses two-level morphological analysis on input and returns all possible POS Apply negative constraints (> 3744) to rule out incorrect POS

Sample ENGTWOL Dictionary e.g. subcat frame: SVOO: subject verb double object VFIN: finite verbs, inflected for e.g. tense, number, gender (not infinitives, gerunds, participles serving as adj) 27 27

ENGTWOL Tagging: Stage 1 First Stage: Run words through FST morphological analyzer to get POS info from morph E.g.: Pavlov had shown that salivation … Pavlov PAVLOV N NOM SG PROPER had HAVE V PAST VFIN SVO HAVE PCP2 SVO shown SHOW PCP2 SVOO SVO SV that ADV PRON DEM SG DET CENTRAL DEM SG CS salivation N NOM SG 28 28

ENGTWOL Tagging: Stage 2 Second Stage: Apply NEGATIVE constraints E.g., Adverbial that rule Eliminate all readings of that except the one in It isn’t that odd. Given input: that If (+1 A/ADV/QUANT) ; if next word is adj/adv/quantifier (+2 SENT-LIM) ; followed by E-O-S (NOT -1 SVOC/A) ; and the previous word is not a verb like consider which allows adjective complements (e.g. I consider that odd) Then eliminate non-ADV tags Else eliminate ADV Is that an adverb or not? 29 29

Transformation-Based (Brill) Tagging Combines Rule-based and Stochastic Tagging Like rule-based because rules are used to specify tags in a certain environment Like stochastic approach because we use a tagged corpus to find the best performing rules Rules are learned from data Input: Tagged corpus Dictionary (with most frequent tags) 4/15/2017 30 30

Transformation-Based Tagging Basic Idea: Strip tags from tagged corpus and try to learn them by rule application For untagged, first initialize with most probable tag for each word Change tags according to best rewrite rule, e.g. “if word-1 is a determiner and word is a verb then change the tag to noun” Compare to gold standard Iterate Rules created via rule templates, e.g.of the form if word-1 is an X and word is a Y then change the tag to Z” Find rule that applies correctly to most tags and apply Iterate on newly tagged corpus until threshold reached Return ordered set of rules NB: Rules may make errors that are corrected by later rules 4/15/2017 31 31

Templates for TBL 4/15/2017 32 32

Sample TBL Rule Application Labels every word with its most-likely tag E.g. race occurences in the Brown corpus: P(NN|race) = .98 P(VB|race)= .02 is/VBZ expected/VBN to/TO race/NN tomorrow/NN Then TBL applies the following rule “Change NN to VB when previous tag is TO” … is/VBZ expected/VBN to/TO race/NN tomorrow/NN becomes … is/VBZ expected/VBN to/TO race/VB tomorrow/NN 4/15/2017 33 33

Learning Rules 2 parts to a rule Triggering environment Rewrite rule The range of triggering environments of templates (from Manning & Schutze 1999:363) Schema ti-3 ti-2 ti-1 ti ti+1 ti+2 ti+3 1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9 * 34

TBL Tagging Algorithm Step 1: Label every word with most likely tag (from dictionary) Step 2: Check every possible transformation & select one which most improves tag accuracy (cf Gold) Step 3: Re-tag corpus applying this rule, and add rule to end of rule set Repeat 2-3 until some stopping criterion is reached, e.g., X% correct with respect to training corpus RESULT: Ordered set of transformation rules to use on new data tagged only with most likely POS tags 4/15/2017 35 35

TBL Issues Problem: Could keep applying (new) transformations ad infinitum Problem: Rules are learned in ordered sequence Problem: Rules may interact But: Rules are compact and can be inspected by humans 4/15/2017 36 36

Evaluating Tagging Approaches For any NLP problem, we need to know how to evaluate our solutions Possible Gold Standards -- ceiling: Annotated naturally occurring corpus Human task performance (96-7%) How well do humans agree? Kappa statistic: avg pairwise agreement corrected for chance agreement Can be hard to obtain for some tasks: sometimes humans don’t agree

Baseline: how well does simple method do? For tagging, most common tag for each word (91%) How much improvement do we get over baseline?

Methodology: Error Analysis Confusion matrix: E.g. which tags did we most often confuse with which other tags? How much of the overall error does each confusion account for? VB TO NN

More Complex Issues Tag indeterminacy: when ‘truth’ isn’t clear Caribbean cooking, child seat Tagging multipart words wouldn’t --> would/MD n’t/RB How to handle unknown words Assume all tags equally likely Assume same tag distribution as all other singletons in corpus Use morphology, word length,….

Summary We can develop statistical methods for identifying the POS of word sequences which come close to human performance – high 90s But not completely “solved” despite published statistics Especially for spontaneous speech Next Class: Read Chapter 6:1-5 on Hidden Markov Models