POS based on Jurafsky and Martin Ch. 8 Miriam Butt October 2003.



Parts of Speech
"There are ten parts of speech and they are all troublesome." (Mark Twain, The Awful German Language)
"The definitions [of the parts of speech] are very far from having attained the degree of exactitude found in Euclidean Geometry." (Otto Jespersen, The Philosophy of Grammar)

Parts of Speech
Go back to early Greek grammar (techne by Thrax).
8 POS: noun, verb, pronoun, preposition, adverb, conjunction, participle, article.
Tagsets in CL applications are much larger: 45 tags (Penn Treebank), 61 (CLAWS, for the BNC), 54 (STTS, German standard).

POS Tags
Compare the Penn Tagset with STTS in detail.
Why so many? Machines (and humans) need to be as accurate as possible. (Though ADV tends to be a garbage category.)
Why the differences? Different languages have different requirements.

Word Classes
Open Class: Nouns, Verbs, Adjectives, Adverbs
vs.
Closed Class: Auxiliaries, Articles, Conjunctions, Prepositions/Particles
Because languages have open word classes, one cannot simply list word+tag associations. What to do?

POS Tagging
Methods:
1. Manual Tagging
2. Machine Tagging
3. A Combination of Both

Manual Tagging
Methods:
1. Agree on a Tagset after much discussion.
2. Choose a corpus and have two or more people annotate it manually.
3. Check the inter-annotator agreement.
4. Fix any problems with the Tagset (if still possible).
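Step 3 can be sketched as a simple observed-agreement score: the share of tokens on which two annotators chose the same tag. This is only a minimal illustration; the function name and the sample annotations are hypothetical, and real projects often use chance-corrected measures instead.

```python
def observed_agreement(tags_a, tags_b):
    """Fraction of tokens on which two annotators chose the same tag."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same tokens")
    if not tags_a:
        raise ValueError("no tokens to compare")
    matches = sum(1 for a, b in zip(tags_a, tags_b) if a == b)
    return matches / len(tags_a)


# Hypothetical annotations of the same five tokens by two people.
annotator_1 = ["DT", "NN", "TO", "VB", "NN"]
annotator_2 = ["DT", "NN", "TO", "NN", "NN"]

# The annotators disagree on one token out of five.
agreement = observed_agreement(annotator_1, annotator_2)
```

Tokens where the annotators disagree are the natural place to look for problems with the Tagset in step 4.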

Machine Tagging
Methods:
1. Rule based tagging.
2. Stochastic tagging.
3. A combination of both.

Rule Based Tagging
Methods:
1. Use a lexicon to assign each word potential POS.
2. Disambiguate POS (mostly open classes) via rules: to race/VB vs. the race/NN
This entails some knowledge of syntax (patterns of word combination).
Mostly used by early applications (1960s-1970s).
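The two steps can be sketched as follows. This is a toy illustration, not a real tagger: the lexicon entries, the two context rules, and the NN default for unknown words are all assumptions made for the example.

```python
# Toy lexicon: each word maps to its set of potential tags (hypothetical entries).
LEXICON = {
    "to": {"TO"},
    "the": {"DT"},
    "race": {"VB", "NN"},
}


def rule_based_tag(words):
    """Assign potential tags from the lexicon, then disambiguate by context rules."""
    tagged = []
    prev_tag = None
    for word in words:
        # Assumption: words missing from the lexicon default to NN.
        candidates = LEXICON.get(word.lower(), {"NN"})
        if len(candidates) == 1:
            tag = next(iter(candidates))
        elif prev_tag == "TO" and "VB" in candidates:
            tag = "VB"  # rule: after "to", prefer the verb reading
        elif prev_tag == "DT" and "NN" in candidates:
            tag = "NN"  # rule: after a determiner, prefer the noun reading
        else:
            tag = sorted(candidates)[0]  # arbitrary but deterministic fallback
        tagged.append((word, tag))
        prev_tag = tag
    return tagged


after_to = rule_based_tag(["to", "race"])
after_the = rule_based_tag(["the", "race"])
```

Both rules look only at the previous tag, which is the "knowledge of syntax" the slide refers to in its simplest form.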

Rule Based Tagging: ENGTWOL
ENGTWOL (Voutilainen 1995)
Methods:
1. Morphology for lemmatization
2. Entries for English word stems (first pass)
3. Handwritten constraints to eliminate tags (second pass)

Rule Based Tagging: ENGTWOL
Example: First Pass
Pavlov      PAVLOV N NOM SG PROPER
had         HAVE V PAST VFIN SVO
            HAVE PCP2 SVO
shown       SHOW PCP2 SVOO SVO SV
that        ADV
            PRON DEM SG
            DET CENTRAL DEM SG
            CS
salivation  N NOM SG

Rule Based Tagging: ENGTWOL
Example: Second Pass
Adverbial-that rule
Given input "that"
if
    (+1 A/ADV/QUANT);  /* if next word is one of these */
    (+2 SENT-LIM);     /* and following is a sentence boundary */
    (NOT -1 SVO/A);    /* and previous word is not a verb like "consider" */
                       /* (object complements), e.g. "I consider that odd." */
then eliminate non-ADV tags
else eliminate ADV tag
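The constraint above might be expressed in code roughly as follows. This is a simplified sketch, not ENGTWOL's actual implementation: each token is reduced to a flat set of tag labels, the sentence boundary is assumed to be an explicit token carrying the label SENT-LIM, and the function name is hypothetical.

```python
def adverbial_that_rule(readings, i):
    """Apply the adverbial-that constraint to the token at index i.

    readings: list of sets of tag labels, one set per token (mutated in place).
    The token at index i is assumed to be the word "that".
    """
    def has(j, wanted):
        # True if token j exists and carries at least one of the wanted labels.
        return 0 <= j < len(readings) and bool(readings[j] & wanted)

    next_ok = has(i + 1, {"A", "ADV", "QUANT"})   # (+1 A/ADV/QUANT)
    boundary_follows = has(i + 2, {"SENT-LIM"})   # (+2 SENT-LIM)
    prev_is_svoa = has(i - 1, {"SVO/A"})          # (NOT -1 SVO/A)

    if next_ok and boundary_follows and not prev_is_svoa:
        readings[i] &= {"ADV"}   # eliminate non-ADV tags
    else:
        readings[i] -= {"ADV"}   # eliminate ADV tag
    return readings[i]


THAT = {"ADV", "PRON", "DET", "CS"}

# "It isn't that odd ." : adverbial reading survives.
sent_1 = [{"PRON"}, {"V"}, set(THAT), {"A"}, {"SENT-LIM"}]
result_1 = adverbial_that_rule(sent_1, 2)

# "I consider that odd ." : previous verb is SVO/A, so ADV is removed.
sent_2 = [{"PRON"}, {"V", "SVO/A"}, set(THAT), {"A"}, {"SENT-LIM"}]
result_2 = adverbial_that_rule(sent_2, 2)
```

Note that the rule never picks a tag directly; it only removes readings, which is how the second pass narrows down the output of the first.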

Machine Tagging
Methods:
1. Use a lexicon to assign each word potential POS.
2. Disambiguate POS (mostly open classes) via learned patterns: what type of word is most likely to follow a given POS? to race/VB vs. the race/NN
This entails machine learning.
Widespread today.

Machine Learning
Methods:
1. Take a hand tagged corpus.
2. Have the machine learn the patterns in the corpus.
3. Give the machine a lexicon of word+tag associations.
4. Give the machine a new corpus to tag.
5. The machine uses the initial information in the lexicon and the patterns it has learned to tag the new corpus.
6. Examine the result and correct the output.
7. Give the corrected output back to the machine for a new round.
8. Keep going until the machine is not learning any more.
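Steps 1 and 2 can be illustrated by counting over a hand-tagged corpus. The sketch below assumes a bigram model with plain relative-frequency estimates and no smoothing; the function names, the `<s>` start marker, and the three-sentence corpus are hypothetical.

```python
from collections import Counter


def train_bigram_model(tagged_sentences):
    """Estimate P(word|tag) and P(tag|previous tag) by relative frequency."""
    emission = Counter()    # (tag, word) counts
    transition = Counter()  # (previous tag, tag) counts
    tag_count = Counter()   # how often each tag occurs
    prev_count = Counter()  # how often each tag (or <s>) precedes another tag

    for sentence in tagged_sentences:
        prev = "<s>"  # assumed sentence-start marker
        for word, tag in sentence:
            emission[(tag, word.lower())] += 1
            transition[(prev, tag)] += 1
            tag_count[tag] += 1
            prev_count[prev] += 1
            prev = tag

    def p_word_given_tag(word, tag):
        total = tag_count[tag]
        return emission[(tag, word.lower())] / total if total else 0.0

    def p_tag_given_prev(tag, prev):
        total = prev_count[prev]
        return transition[(prev, tag)] / total if total else 0.0

    return p_word_given_tag, p_tag_given_prev


# Tiny hypothetical hand-tagged corpus.
corpus = [
    [("to", "TO"), ("race", "VB")],
    [("the", "DT"), ("race", "NN")],
    [("to", "TO"), ("run", "VB")],
]
p_word_given_tag, p_tag_given_prev = train_bigram_model(corpus)
```

Feeding corrected output back in (steps 6 and 7) would amount to calling the training function again on the enlarged corpus.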

Machine Tagging
Probability of Tag Assignment: P(word|tag) * P(tag|previous n tags)
P(word|tag): if we are expecting a tag (e.g., V), how likely is it that this word would appear (e.g., race)?
Bigram or Trigram Strategy is commonly used.
Example in J+M: HMM (Hidden Markov Models)
Others also possible, e.g. Neural Nets
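Written out for the bigram case, where only one previous tag is considered, the tag choice for the i-th word can be stated as below. The symbols w_i (the i-th word) and t_{i-1} (the tag already assigned to the previous word) are notation introduced here for illustration.

```latex
\hat{t}_i = \operatorname*{arg\,max}_{t} \; P(w_i \mid t) \, P(t \mid t_{i-1})
```

A trigram strategy would condition the second factor on the two previous tags instead of one.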

Machine Tagging
Example from J+M
race: VB or NN?
(1) Secretariat/NNP is/VBZ expected/VBN to/TO race/?? tomorrow/NN
(2) People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/?? for/IN outer/JJ space/NN
Bigram Analysis
(1) P(race|VB)*P(VB|TO) vs. P(race|NN)*P(NN|TO)
(2) P(race|VB)*P(VB|DT) vs. P(race|NN)*P(NN|DT)

Machine Tagging
Example from J+M
Likelihoods from Brown+Switchboard Corpora
P(race|VB) = .00003    P(VB|TO) = .34
P(race|NN) = .00041    P(NN|TO) = .021
P(race|VB)*P(VB|TO) = .0000102
P(race|NN)*P(NN|TO) = .00000861
Result for first sentence: race/VB
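The comparison for the first sentence can be reproduced with a few lines of arithmetic. The four likelihoods are the ones quoted on the slide; the dictionary layout and the function name are just one hypothetical way to organise them.

```python
# Likelihoods as quoted on the slide (Brown + Switchboard corpora).
p_word_given_tag = {("race", "VB"): 0.00003, ("race", "NN"): 0.00041}
p_tag_given_prev = {("VB", "TO"): 0.34, ("NN", "TO"): 0.021}


def score(word, tag, prev_tag):
    """Bigram score: P(word|tag) * P(tag|previous tag)."""
    return p_word_given_tag[(word, tag)] * p_tag_given_prev[(tag, prev_tag)]


# Score both candidate tags for "race" after "to/TO" and keep the higher one.
scores = {tag: score("race", tag, "TO") for tag in ("VB", "NN")}
best_tag = max(scores, key=scores.get)
```

Although "race" is far more likely as a noun in isolation, the strong preference for a verb after TO outweighs it, so `best_tag` comes out as VB.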

Combination Tagging
Most taggers today use a combination of some rules plus learned patterns.
The famous Brill Tagger uses a lexicon and handwritten rules, plus rules learned on the basis of a corpus (from previous errors in tagging).
Accuracy of today's taggers: 93%-97%. So, they are accurate enough to be a useful first step in many applications.

Common Tagging Problems
Multiple Words
Unknown Words

Treebanks
Machine learning can only be done on the basis of a huge corpus.
Treebanks store these types of corpora (mostly initially tagged by hand).
Examples: Penn Treebank, BNC, COSMAS, TIGER

Some Online Taggers