Part-of-Speech (POS) tagging

See:
– Eric Brill, "Part-of-speech tagging", Chapter 17 of R Dale, H Moisl & H Somers (eds), Handbook of Natural Language Processing. New York: Marcel Dekker (2000)
– D Jurafsky & JH Martin, Speech and Language Processing. Upper Saddle River, NJ: Prentice Hall (2000), Chapter 8
– CD Manning & H Schütze, Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press (1999), Chapter 10 [skip the maths bits if too daunting]

2/24 Word categories
A.k.a. parts of speech (POSs)
It is important and useful to identify words by their POS:
– to distinguish homonyms
– to enable more general word searches
POS categories may be familiar (?) from school and/or language learning (noun, verb, adjective, etc.)

3/24 Word categories
Recall that we distinguished:
– open-class categories (noun, verb, adjective, adverb)
– closed-class categories (preposition, determiner, pronoun, conjunction, …)
While the big four are fairly clear-cut, it is less obvious exactly what and how many closed-class categories there may be

4/24 POS tagging
Labelling words for POS can be done by:
– dictionary lookup
– morphological analysis
– "tagging"
Identifying POS can be seen as a prerequisite to parsing, and/or as a process in its own right
However, there are some differences:
– parsers often work with the simplest set of word categories, subcategorized by feature (or attribute-value) schemes
– indeed, the parsing procedure may itself contribute to the disambiguation of homonyms

5/24 POS tagging
POS tagging, per se, aims to identify word-category information somewhat independently of sentence structure …
… and typically uses rather different means
POS tags are generally shown as labels on words:
John/NPN saw/VB the/AT book/NCN on/PRP the/AT table/NCN ./PNC

6/24 What is a tagger?
There is a lack of distinction between …
– software which allows you to create something you can then use to tag input text, e.g. "Brill's tagger"
– the result of running such software, e.g. a tagger for English (based on such-and-such a corpus)
Taggers (even rule-based ones) are almost invariably trained on a given corpus
"Tagging" is usually understood to mean "POS tagging", but you can have other types of tags (e.g. semantic tags)

7/24 Tagging vs. parsing
Once a tagger is "trained", the process consists of straightforward look-up, plus local context (and sometimes morphology)
A tagger will attempt to assign a tag to unknown words, and to disambiguate homographs
The "tagset" (list of categories) is usually larger, with more distinctions, than the set of categories used in parsing

8/24 Tagset
Parsing usually works with basic word-categories, whereas tagging makes more subtle distinctions
E.g. noun: singular vs plural vs genitive, common vs proper, +is, +has, … and all combinations
A parser may use only a handful of categories; a tagger may use many more (the Penn Treebank tagset, for example, has 45 tags)

9/24 Simple taggers
A default tagger has one tag per word, assigned on the basis of dictionary lookup
– tags may indicate ambiguity but not resolve it, e.g. NVB for noun-or-verb
Words may be assigned different tags with associated probabilities
– the tagger will assign the most probable tag unless there is some way to identify when a less probable tag is in fact correct
Tag sequences may be defined by regular expressions, and assigned probabilities (including 0 for illegal sequences – negative rules)
A minimal sketch of such a dictionary-lookup tagger follows.
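The sketch below builds a most-frequent-tag lexicon and falls back to a default tag for unknown words. The toy corpus, the tag names, and the NN fallback are assumptions for illustration only; a real tagger would at least use morphology for unknowns.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training corpus: sentences of (word, tag) pairs.
tagged_corpus = [
    [("the", "AT"), ("run", "NN"), ("was", "VB"), ("long", "JJ")],
    [("they", "PN"), ("run", "VB"), ("home", "RB")],
]

# Count how often each word carries each tag.
counts = defaultdict(Counter)
for sentence in tagged_corpus:
    for word, tag in sentence:
        counts[word][tag] += 1

# The lexicon maps each word to its single most frequent tag.
lexicon = {word: c.most_common(1)[0][0] for word, c in counts.items()}

def default_tag(word):
    # Unknown words get NN here; a real tagger would use morphology instead.
    return lexicon.get(word, "NN")

print([(w, default_tag(w)) for w in ["the", "run", "zebra"]])
# [('the', 'AT'), ('run', 'NN'), ('zebra', 'NN')]
```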

10/24 Rule-based taggers
The earliest type of tagging worked in two stages:
– Stage 1: look the word up in a lexicon to give a list of potential POSs
– Stage 2: apply rules which certify or disallow tag sequences
Rules were originally handwritten; more recently, machine-learning methods have been used
– cf. transformation-based tagging, below

11/24 How do they work?
The tagger must be "trained"
There are many different techniques, but typically …
– a small "training corpus" is hand-tagged
– tagging rules are learned automatically
– the rules define the most likely sequence of tags
Rules are based on:
– internal evidence (morphology)
– external evidence (context)
– probabilities

12/24 What probabilities do we have to learn?
(a) Individual word probabilities: the probability that a given tag t is appropriate for a given word w
– Easy (in principle) to learn from the training corpus:
P(t|w) = count(w tagged t) / count(w)
– Example: run occurs 4800 times in the training corpus, 3600 times as a verb and 1200 times as a noun, so P(verb|run) = 3600/4800 = 0.75
– Problem of "sparse data": add a small amount to each count, so we get no zeros
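A minimal sketch of this estimate with add-one (Laplace) smoothing, using the slide's run counts; the tagset and the count table are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical tag counts per word, as gathered from a training corpus.
word_tag_counts = defaultdict(Counter)
word_tag_counts["run"].update({"VB": 3600, "NN": 1200})

TAGSET = ["VB", "NN", "AT", "JJ"]  # toy tagset

def p_tag_given_word(tag, word, alpha=1.0):
    """P(t|w) with add-alpha smoothing: unseen (word, tag) pairs get P > 0."""
    counts = word_tag_counts[word]
    return (counts[tag] + alpha) / (sum(counts.values()) + alpha * len(TAGSET))

print(p_tag_given_word("VB", "run"))  # ~0.75, as in the slide's example
print(p_tag_given_word("AT", "run"))  # small but non-zero thanks to smoothing
```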

13/24 What probabilities do we have to learn?
(b) Tag sequence probability: the probability that a given tag sequence t_1,t_2,…,t_n is appropriate for a given word sequence w_1,w_2,…,w_n
– P(t_1,t_2,…,t_n | w_1,w_2,…,w_n) = ???
– Too hard to calculate for the entire sequence, since the chain rule expands it into ever longer histories:
P(t_1,t_2,…,t_n) = P(t_1) × P(t_2|t_1) × P(t_3|t_1,t_2) × P(t_4|t_1,t_2,t_3) × …
– A short subsequence is more tractable, and a history of 1 or 2 tags should be enough:
Bigram model: P(t_i|t_1,…,t_{i-1}) ≈ P(t_i|t_{i-1})
Trigram model: P(t_i|t_1,…,t_{i-1}) ≈ P(t_i|t_{i-2},t_{i-1})
N-gram model: P(t_i|t_1,…,t_{i-1}) ≈ P(t_i|t_{i-N+1},…,t_{i-1})
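A small sketch of the bigram approximation, estimating P(t_i|t_{i-1}) from tag-bigram counts; the toy corpus and the <s> start marker are assumptions, and there is no smoothing, so unseen bigrams get probability zero.

```python
from collections import Counter

# Hypothetical tagged corpus: sentences of (word, tag) pairs.
corpus = [
    [("the", "AT"), ("dog", "NN"), ("runs", "VB")],
    [("the", "AT"), ("run", "NN"), ("ended", "VB")],
]

# Count tag unigrams and bigrams, with a <s> pseudo-tag at sentence start.
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tags = ["<s>"] + [t for _, t in sentence]
    unigrams.update(tags[:-1])          # only tags that can start a bigram
    bigrams.update(zip(tags, tags[1:]))

def bigram_sequence_prob(tags):
    """P(t_1..t_n) approximated as the product of P(t_i | t_{i-1})."""
    p, prev = 1.0, "<s>"
    for t in tags:
        p *= bigrams[(prev, t)] / unigrams[prev]
        prev = t
    return p

print(bigram_sequence_prob(["AT", "NN", "VB"]))  # 1.0 on this toy corpus
```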

14/24 More complex taggers
Bigram taggers assign tags on the basis of sequences of two words (usually assigning a tag to word_n on the basis of word_{n-1})
An nth-order tagger assigns tags on the basis of sequences of n words
As the value of n increases, so does the complexity of the statistical calculations involved in comparing probability combinations

15/24 Stochastic taggers
Nowadays pretty much all taggers are statistics-based, and have been since the 1980s (or even earlier: some primitive algorithms were already published in the 1960s and 70s)
The most common approach is based on Hidden Markov Models (also found in speech processing, etc.)

16/24 (Hidden) Markov Models
The probability calculations imply Markov models: we assume that each tag depends only on the previous tag (or on a short sequence of previous tags), not on the whole history
(Informally) Markov models are the class of probabilistic models that assume we can predict the future without taking too much account of the past
Markov chains can be modelled by finite-state automata: the next state in a Markov chain always depends on some finite history of previous states
The model is "hidden" because the states (the tags) are not directly observed: we see only the words they emit
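To find the best tag sequence under such a model, HMM taggers typically use the Viterbi algorithm. Below is a minimal sketch under a bigram HMM; the dict-based probability tables and the toy example are assumptions, and a real tagger would smooth the estimates and work in log space.

```python
def viterbi(words, tags, p_init, p_trans, p_emit):
    """Most probable tag sequence for words under a bigram HMM.
    p_init[t]: P(t at sentence start); p_trans[s][t]: P(t | s);
    p_emit[t][w]: P(w | t)."""
    # V[i][t] = (probability of the best tag sequence ending in t at word i,
    #            back-pointer to the best previous tag)
    V = [{t: (p_init.get(t, 0.0) * p_emit[t].get(words[0], 0.0), None)
          for t in tags}]
    for w in words[1:]:
        column = {}
        for t in tags:
            best = max(tags, key=lambda s: V[-1][s][0] * p_trans[s].get(t, 0.0))
            column[t] = (V[-1][best][0] * p_trans[best].get(t, 0.0)
                         * p_emit[t].get(w, 0.0), best)
        V.append(column)
    # Recover the best path by following back-pointers from the best end tag.
    last = max(tags, key=lambda t: V[-1][t][0])
    path = [last]
    for column in reversed(V[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))

# Toy, hand-set probabilities (illustrative assumptions only):
tags = ["AT", "NN", "VB"]
p_init = {"AT": 0.8, "NN": 0.1, "VB": 0.1}
p_trans = {"AT": {"NN": 0.9, "VB": 0.1},
           "NN": {"NN": 0.2, "VB": 0.8},
           "VB": {"AT": 0.5, "NN": 0.5}}
p_emit = {"AT": {"the": 1.0},
          "NN": {"run": 0.6, "dog": 0.4},
          "VB": {"run": 0.7, "barks": 0.3}}
print(viterbi(["the", "run"], tags, p_init, p_trans, p_emit))  # ['AT', 'NN']
```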

17/24 Supervised vs unsupervised training
Learning tagging rules from a marked-up corpus (supervised learning) gives very good results (98% accuracy)
– though simply assigning each word its most probable tag, and tagging unknown words as proper nouns, already gives about 90%
But this depends on having a corpus that is already marked up to a high quality
If such a corpus is not available, we have to try something else:
– the "forward-backward" algorithm
– a kind of "bootstrapping" approach

18/24 Forward-backward (Baum-Welch) algorithm
Start with initial probabilities
– if nothing is known, assume all Ps are equal
Adjust the individual probabilities so as to increase the overall probability
Re-estimate the probabilities on the basis of the last iteration
Continue until convergence
– i.e. there is no improvement, or the improvement is below a threshold
All this can be done automatically (see the sketch below)
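A compact numpy sketch of the re-estimation loop, assuming an HMM with transition matrix A (states × states), emission matrix B (states × vocabulary), initial vector pi, and an integer-coded observation sequence; the shapes, names, and convergence test are illustrative assumptions.

```python
import numpy as np

def forward(A, B, pi, obs):
    # alpha[t, i] = P(o_1..o_t, state_t = i)
    T, S = len(obs), len(pi)
    alpha = np.zeros((T, S))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    # beta[t, i] = P(o_{t+1}..o_T | state_t = i)
    T, S = len(obs), A.shape[0]
    beta = np.zeros((T, S))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(A, B, pi, obs):
    """One re-estimation step; also returns the likelihood of obs
    under the parameters *before* the update."""
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    lik = alpha[-1].sum()
    gamma = alpha * beta / lik                  # P(state_t = i | obs)
    xi = (alpha[:-1, :, None] * A[None, :, :]   # P(state_t=i, state_{t+1}=j | obs)
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / lik
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for t, o in enumerate(obs):
        new_B[:, o] += gamma[t]
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi, lik

def train(A, B, pi, obs, tol=1e-6, max_iter=100):
    """Iterate until the log-likelihood improvement falls below tol."""
    prev_ll = -np.inf
    for _ in range(max_iter):
        A, B, pi, lik = baum_welch_step(A, B, pi, obs)
        ll = np.log(lik)
        if ll - prev_ll < tol:  # converged, or no further improvement
            break
        prev_ll = ll
    return A, B, pi
```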

19/24 Transformation-based tagging
Due to Eric Brill (1993)
Start from an initial tagging, and apply a series of transformations
The transformations themselves are also learned, from the training data
This captures the tagging data in far fewer parameters than stochastic models do
The transformations learned (often) have linguistic "reality"

20/24 Transformation-based tagging
Three stages:
– lexical look-up
– lexical rule application for unknown words
– contextual rule application to correct mis-tags
(Painting analogy)

21/24 Transformation-based learning
Change tag a to b when justified by:
– internal evidence (morphology)
– contextual evidence:
  – one or more of the preceding/following words has a specific tag
  – one or more of the preceding/following words is a specific word
  – one or more of the preceding/following words has a certain form
The order of rules is important
– rules can change a correct tag into an incorrect tag, so another rule might correct that "mistake"

22/24 Transformation-based tagging: examples
– If a word is currently tagged NN, and has a suffix of length 1 consisting of the letter 's', change its tag to NNS
– If a word has a suffix of length 2 consisting of the letter sequence 'ly', change its tag to RB (regardless of the initial tag)
– Change VBN to VBD if the previous word is tagged NN
– Change VBD to VBN if the previous word is 'by'

23/24 Transformation-based tagging: example
After lexical look-up:
Booth/NP killed/VBN Abraham/NP Lincoln/NP
Abraham/NP Lincoln/NP was/BEDZ shot/VBD by/BY Booth/NP
He/PPS witnessed/VBD Lincoln/NP killed/VBN by/BY Booth/NP

After application of contextual rule 'vbn vbd PREVTAG np':
Booth/NP killed/VBD Abraham/NP Lincoln/NP
Abraham/NP Lincoln/NP was/BEDZ shot/VBD by/BY Booth/NP
He/PPS witnessed/VBD Lincoln/NP killed/VBD by/BY Booth/NP

After application of contextual rule 'vbd vbn NEXTWORD by':
Booth/NP killed/VBD Abraham/NP Lincoln/NP
Abraham/NP Lincoln/NP was/BEDZ shot/VBN by/BY Booth/NP
He/PPS witnessed/VBD Lincoln/NP killed/VBN by/BY Booth/NP

Note how the first contextual rule wrongly changes killed/VBN to VBD in the third sentence, and the second rule then corrects that mistake.
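A minimal sketch of ordered contextual-rule application that reproduces the third sentence's progression above; the (from_tag, to_tag, trigger) rule encoding is an assumption for illustration, not Brill's actual rule format.

```python
def apply_rules(tagged, rules):
    """Apply each rule in order, scanning the tagged text left to right."""
    for from_tag, to_tag, trigger in rules:
        for i, (word, tag) in enumerate(tagged):
            if tag == from_tag and trigger(tagged, i):
                tagged[i] = (word, to_tag)
    return tagged

rules = [
    # vbn -> vbd PREVTAG np: change VBN to VBD when the previous tag is NP
    ("VBN", "VBD", lambda s, i: i > 0 and s[i - 1][1] == "NP"),
    # vbd -> vbn NEXTWORD by: change VBD to VBN when the next word is 'by'
    ("VBD", "VBN", lambda s, i: i + 1 < len(s) and s[i + 1][0] == "by"),
]

# Output of lexical look-up for the third sentence:
sentence = [("He", "PPS"), ("witnessed", "VBD"), ("Lincoln", "NP"),
            ("killed", "VBN"), ("by", "BY"), ("Booth", "NP")]
print(apply_rules(sentence, rules))
# The first rule mis-tags killed as VBD; the second repairs it to VBN.
```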

24/24 Tagging – final word
Many taggers are now available for download
It is sometimes not clear whether "tagger" means:
– software enabling you to build a tagger, given a corpus
– an already-built tagger for a given language
Because a given tagger (in the 2nd sense) will have been trained on some corpus, it will be biased towards that (kind of) corpus
– there is then a question of how well the original training corpus matches the material you want to use the tagger on