
Presentation transcript:

Part-of-speech tagging

Parts of Speech Perhaps starting with Aristotle in the West (384–322 BCE), the idea of having parts of speech: lexical categories, word classes, “tags”, POS. Dionysius Thrax of Alexandria (c. 100 BCE): 8 parts of speech. Still with us! But his 8 aren’t exactly the ones we are taught today. Thrax: noun, verb, article, adverb, preposition, conjunction, participle, pronoun. School grammar: noun, verb, adjective, adverb, preposition, conjunction, pronoun, interjection.

Open class (lexical) words: Nouns, proper (IBM, Italy) and common (cat / cats, snow); Verbs, main (see, registered) and modals (can, had); Adjectives (old, older, oldest); Adverbs (slowly). Closed class (functional) words: Prepositions (to, with), Particles (off, up), Determiners (the, some), Conjunctions (and, or), Pronouns (he, its), … more. Also: Numbers (122,312, one), Interjections (Ow, Eh).

Open vs. Closed classes Closed: determiners (a, an, the), pronouns (she, he, I), prepositions (on, under, over, near, by, …). Why “closed”? Open: Nouns, Verbs, Adjectives, Adverbs.

POS Tagging Words often have more than one POS: back The back door = JJ On my back = NN Win the voters back = RB Promised to back the bill = VB The POS tagging problem is to determine the POS tag for a particular instance of a word.

POS Tagging Input: Plays well with others Ambiguity: NNS/VBZ UH/JJ/NN/RB IN NNS Output: Plays/VBZ well/RB with/IN others/NNS Uses: MT: reordering of adjectives and nouns (say from Spanish to English); text-to-speech (how do we pronounce “lead”?); can write regexps like (Det) Adj* N+ over the output for phrases, etc.; input to a syntactic parser. Penn Treebank POS tags
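To see what such tagged output looks like in practice, here is a minimal sketch using NLTK's off-the-shelf English tagger (an illustration, not part of the original slides; it assumes the punkt and averaged_perceptron_tagger resources have been downloaded, and the tags it assigns may differ slightly from the example above):

```python
import nltk

# Assumes: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
tokens = nltk.word_tokenize("Plays well with others")
tagged = nltk.pos_tag(tokens)  # Penn Treebank tags
print(tagged)
# expected output along the lines of:
# [('Plays', 'VBZ'), ('well', 'RB'), ('with', 'IN'), ('others', 'NNS')]
```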

The Penn TreeBank Tagset

Penn Treebank tags

POS tagging performance How many tags are correct? (Tag accuracy) About 97% currently, but the baseline is already 90%. The baseline is the performance of the stupidest possible method: tag every word with its most frequent tag; tag unknown words as nouns. Partly easy because many words are unambiguous, and you get points for them (the, a, etc.) and for punctuation marks!
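A minimal sketch of that most-frequent-tag baseline (illustrative code, not from the slides; the training-data format and names such as train_baseline are assumptions):

```python
from collections import Counter, defaultdict

def train_baseline(train_pairs):
    """Learn each word's most frequent tag from (word, tag) training pairs."""
    counts = defaultdict(Counter)
    for word, tag in train_pairs:
        counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def baseline_tag(words, most_freq_tag):
    """Tag each word with its most frequent training tag; unknown words get NN."""
    return [(w, most_freq_tag.get(w.lower(), "NN")) for w in words]

# Toy usage:
model = train_baseline([("the", "DT"), ("back", "NN"), ("back", "RB"), ("back", "NN")])
print(baseline_tag(["The", "back", "door"], model))
# [('The', 'DT'), ('back', 'NN'), ('door', 'NN')]
```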

Deciding on the correct part of speech can be difficult even for people Mrs/NNP Shaefer/NNP never/RB got/VBD around/RP to/TO joining/VBG All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT corner/NN Chateau/NNP Petrus/NNP costs/VBZ around/RB 250/CD

How difficult is POS tagging? About 11% of the word types in the Brown corpus are ambiguous with regard to part of speech But they tend to be very common words. E.g., that I know that he is honest = IN Yes, that play was nice = DT You can’t go that far = RB 40% of the word tokens are ambiguous

Sources of information What are the main sources of information for POS tagging? Knowledge of neighboring words: Bill saw that man yesterday, where most of the words have more than one possible tag (Bill: NNP or VB; saw: NN or VB(D); that: DT or IN; man: NN or VB; yesterday: NN). Knowledge of word probabilities: man is rarely used as a verb. The latter proves the most useful, but the former also helps.

More and Better Features → Feature-based tagger Can do surprisingly well just looking at a word by itself:
Word: the → DT
Lowercased word: Importantly → importantly → RB
Prefixes: unfathomable → un- → JJ
Suffixes: Importantly → -ly → RB
Capitalization: Meridian → CAP → NNP
Word shapes: 35-year → d-x → JJ
Then build a classifier to predict the tag. Maxent P(t|w): 93.7% overall / 82.6% unknown
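A minimal sketch of such per-word features (illustrative; the exact feature set and the word_shape helper are assumptions, not the features of any particular published tagger):

```python
import re

def word_shape(word):
    """Collapse a word to a coarse shape, e.g. '35-year' -> 'd-x', 'Meridian' -> 'Xx'."""
    shape = re.sub(r"[A-Z]+", "X", word)
    shape = re.sub(r"[a-z]+", "x", shape)
    return re.sub(r"[0-9]+", "d", shape)

def word_features(word):
    """Features computed from a single word, ignoring its context."""
    return {
        "word": word,
        "lower": word.lower(),
        "prefix2": word[:2],
        "suffix2": word[-2:],
        "suffix3": word[-3:],
        "is_capitalized": word[:1].isupper(),
        "shape": word_shape(word),
    }

print(word_features("Importantly"))  # the 'ly' suffix and capitalization are informative
```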

Overview: POS Tagging Accuracies Rough accuracies (all words / unknown words):
Most freq tag: ~90% / ~50%
Trigram HMM: ~95% / ~55%
Maxent P(t|w): 93.7% / 82.6%
TnT (HMM++): 96.2% / 86.0%
MEMM tagger: 96.9% / 86.9%
Bidirectional dependencies: 97.2% / 90.0%
Upper bound: ~98% (human agreement)
Most errors are on unknown words.

POS tagging as a sequence classification task We are given a sentence (an “observation” or “sequence of observations”) Secretariat is expected to race tomorrow She promised to back the bill What is the best sequence of tags which corresponds to this sequence of observations? Probabilistic view: Consider all possible sequences of tags Out of this universe of sequences, choose the tag sequence which is most probable given the observation sequence of n words w1…wn.
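In symbols (this formula is supplied here for concreteness; it is the standard Bayesian formulation of the tagging goal rather than text from the original slide):

```latex
\hat{t}_{1}^{\,n} = \operatorname*{argmax}_{t_{1}^{\,n}} P(t_{1}^{\,n} \mid w_{1}^{\,n})
                  = \operatorname*{argmax}_{t_{1}^{\,n}} \frac{P(w_{1}^{\,n} \mid t_{1}^{\,n})\,P(t_{1}^{\,n})}{P(w_{1}^{\,n})}
                  = \operatorname*{argmax}_{t_{1}^{\,n}} P(w_{1}^{\,n} \mid t_{1}^{\,n})\,P(t_{1}^{\,n})
```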

How do we apply classification to sequences?

Sequence Labeling as Classification Classify each token independently, but use as input features information about the surrounding tokens (a sliding window). Slide from Ray Mooney. The classifier is slid across the sentence “John saw the saw and decided to take it to the table.”, emitting one tag per token: NNP, VBD, DT, NN, CC, VBD, TO, VB, PRP, IN, DT, NN, i.e. John/NNP saw/VBD the/DT saw/NN and/CC decided/VBD to/TO take/VB it/PRP to/IN the/DT table/NN.
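A minimal sketch of this sliding-window approach (illustrative, using scikit-learn's logistic regression as the classifier; the feature names and the training-data format are assumptions):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def window_features(tokens, i):
    """Features for token i drawn from a +/-1 window of surrounding words."""
    return {
        "word": tokens[i],
        "prev_word": tokens[i - 1] if i > 0 else "<S>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "</S>",
        "suffix2": tokens[i][-2:],
    }

def train(tagged_sentences):
    """tagged_sentences: list of sentences, each a list of (word, tag) pairs."""
    X, y = [], []
    for sent in tagged_sentences:
        words = [w for w, _ in sent]
        for i, (_, tag) in enumerate(sent):
            X.append(window_features(words, i))
            y.append(tag)
    vec = DictVectorizer()
    clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)
    return vec, clf

def tag(words, vec, clf):
    """Classify each token independently from its window features."""
    feats = [window_features(words, i) for i in range(len(words))]
    return list(zip(words, clf.predict(vec.transform(feats))))
```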

Sequence Labeling as Classification Using Outputs as Inputs Better input features are usually the categories of the surrounding tokens, but these are not available yet. We can use the categories of either the preceding or the succeeding tokens by going forward or backward and re-using the previous outputs. Slide from Ray Mooney

Forward Classification Slide from Ray Mooney. Moving left to right over “John saw the saw and decided to take it to the table.”, each token is classified with the already-predicted tags to its left available as features; the classifier emits, in order: NNP (John), VBD (saw), DT (the), NN (saw), CC (and), VBD (decided), TO (to), VB (take), and so on for the rest of the sentence.
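A sketch of greedy forward classification, building on the sliding-window code above (illustrative; it assumes the vectorizer and classifier were trained with these forward_features, using gold previous tags):

```python
def forward_features(tokens, i, prev_tags):
    """Features for token i, including the tag already predicted for token i-1."""
    return {
        "word": tokens[i],
        "prev_word": tokens[i - 1] if i > 0 else "<S>",
        "prev_tag": prev_tags[i - 1] if i > 0 else "<S>",
        "suffix2": tokens[i][-2:],
    }

def greedy_forward_tag(words, vec, clf):
    """Tag left to right, feeding each prediction back in as a feature."""
    tags = []
    for i in range(len(words)):
        x = vec.transform([forward_features(words, i, tags)])
        tags.append(clf.predict(x)[0])
    return list(zip(words, tags))
```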

Backward Classification Disambiguating “to” in this case would be even easier backward. Slide from Ray Mooney. Moving right to left over the same sentence, each token is classified with the already-predicted tags to its right available as features; the classifier emits, in order: IN (to), PRP (it), VB (take), TO (to), VBD (decided), CC (and), VBD (saw), DT (the), VBD (saw), NNP (John), i.e. John/NNP saw/VBD the/DT saw/VBD and/CC decided/VBD to/TO take/VB it/PRP to/IN the/DT table/NN.

The Maximum Entropy Markov Model (MEMM) A sequence version of the logistic regression (also called maximum entropy) classifier. Find the best series of tags:
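The equation on the original slide did not survive the transcript; a standard way to write the MEMM objective (supplied here for reference, in the usual notation with features f_j and weights theta_j) is:

```latex
\hat{T} = \operatorname*{argmax}_{T} P(T \mid W)
        = \operatorname*{argmax}_{T} \prod_{i} P(t_i \mid w_i, t_{i-1})
        = \operatorname*{argmax}_{T} \prod_{i}
          \frac{\exp\big(\sum_{j} \theta_j\, f_j(t_i, w_i, t_{i-1})\big)}
               {\sum_{t'} \exp\big(\sum_{j} \theta_j\, f_j(t', w_i, t_{i-1})\big)}
```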

The Maximum Entropy Markov Model (MEMM)

Features for the classifier at each tag

More features

MEMM computes the best tag sequence

MEMM Decoding Simplest algorithm: greedy left-to-right decoding, as in forward classification. What we use in practice: the Viterbi algorithm, a version of the same dynamic programming algorithm we used to compute minimum edit distance.
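A compact sketch of Viterbi decoding for such a local model (illustrative; log_prob stands in for the MEMM's per-position classifier P(tag | word, previous tag) and is an assumed interface, not a particular library's API):

```python
def viterbi(words, tagset, log_prob):
    """Return the highest-scoring tag sequence under a local model.

    log_prob(tag, word, prev_tag) gives log P(tag | word, prev_tag).
    """
    # best[i][t]: best log score of any tag sequence for words[:i+1] ending in tag t
    best = [{t: log_prob(t, words[0], "<S>") for t in tagset}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tagset:
            scores = {p: best[i - 1][p] + log_prob(t, words[i], p) for p in tagset}
            prev = max(scores, key=scores.get)
            best[i][t] = scores[prev]
            back[i][t] = prev
    # Trace the best path backwards from the final position.
    last = max(best[-1], key=best[-1].get)
    tags = [last]
    for i in range(len(words) - 1, 0, -1):
        tags.append(back[i][tags[-1]])
    return list(reversed(tags))
```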

The Stanford Tagger Is a bidirectional version of the MEMM, called a cyclic dependency network.