1 PART-OF-SPEECH TAGGING

2 Topics of the next three lectures
- Tagsets
- Rule-based tagging
- Brill tagger
- Tagging with Markov models
- The Viterbi algorithm

3 POS tagging: the problem
People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
Problem: assign a tag to race
Requires: tagged corpus

4 Why is POS tagging useful?
- Makes searching a corpus for patterns of interest to linguists much easier (the original motivation!)
- Useful as a basis for parsing
- For applications such as IR, provides some degree of meaning distinction
- In ASR, helps selection of the next word

5 Ambiguity in POS tagging
The:   AT
man:   NN VB
still: NN VB RB
saw:   NN VBD
her:   PPO PP$

6 How hard is POS tagging? Difficulty depends on:
- the number of tags
- the number of word types
In the Brown corpus, 11.5% of word types are ambiguous, but these account for about 40% of word TOKENS.
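A quick way to reproduce these figures yourself (a minimal sketch, assuming NLTK is installed and the Brown corpus data has been fetched via nltk.download('brown'); exact percentages vary slightly with case normalization):

# Measure POS ambiguity in the Brown corpus.
from collections import defaultdict
from nltk.corpus import brown

tags_of = defaultdict(set)            # word type -> set of tags observed
tagged = brown.tagged_words()
for word, tag in tagged:
    tags_of[word.lower()].add(tag)

ambiguous_types = {w for w, ts in tags_of.items() if len(ts) > 1}
ambiguous_tokens = sum(1 for word, _ in tagged if word.lower() in ambiguous_types)

print("ambiguous types:  %.1f%%" % (100.0 * len(ambiguous_types) / len(tags_of)))
print("ambiguous tokens: %.1f%%" % (100.0 * ambiguous_tokens / len(tagged)))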

7 Frequency + Context Both the Brill tagger and HMM-based taggers achieve good results by combining:
- FREQUENCY: I poured FLOUR/NN into the bowl. Peter should FLOUR/VB the baking tray.
- Information about CONTEXT: I saw the new/JJ PLAY/NN in the theater. The boy will/MD PLAY/VB in the garden.

8 The importance of context
Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN

9 Choosing a tagset The choice of tagset greatly affects the difficulty of the problem. We need to strike a balance between:
- getting better information about context (best: introduce more distinctions)
- making it possible for classifiers to do their job (need to minimize distinctions)

10 Some of the best-known tagsets
- Brown corpus: 87 tags
- Penn Treebank: 45 tags
- Lancaster UCREL C5 (used to tag the BNC): 61 tags
- Lancaster C7: 145 tags

11 Important Penn Treebank tags

12 Verb inflection tags

13 The entire Penn Treebank tagset

14 UCREL C5

15 Tagsets for Italian
- Si-TAL (Pisa, Venezia, IRST, ...)
- PAROLE
- ???

16 The SI-TAL tagset

17 POS tags in the Brown corpus Television/NN has/HVZ yet/RB to/TO work/VB out/RP a/AT living/VBG arrangement/NN with/IN jazz/NN ,/, which/WDT comes/VBZ to/IN the/AT medium/NN more/QL as/CS an/AT uneasy/JJ guest/NN than/CS as/CS a/AT relaxed/VBN member/NN of/IN the/AT family/NN ./.

18 SGML-based POS in the BNC TROUSERS SUIT There is nothing masculine about these new trouser suits in summer 's soft pastels. Smart and acceptable for city wear but soft enough for relaxed days

19 Exercises (Italian newspaper headlines to tag)
- Abbonati al minimo ma la squadra piace ('Season-ticket sales at a minimum, but people like the team')
- Si sta bene in B ('Life is good in Serie B')
- …

20 Quick test DoCoMo and Sony are to develop a chip that would let people pay for goods through their mobiles.

21 Tagging methods
- Hand-coded
- Brill tagger
- Statistical (Markov) taggers

22 Hand-coded POS tagging: the two-stage architecture Early POS taggers were all hand-coded. Most of these (Harris, 1962; Greene and Rubin, 1971), and the best of the recent ones, ENGTWOL (Voutilainen, 1995), are based on a two-stage architecture.

23 Hand-coded rules (ENGTWOL)
STEP 1: assign to each word a list of potential parts of speech - in ENGTWOL, this is done by a two-level morphological analyzer (a finite-state transducer)
STEP 2: use about 1,000 hand-coded CONSTRAINTS (if-then rules) to choose a tag using contextual information - the constraints act as FILTERS

24 Example: Pavlov had shown that salivation ….
Pavlov      PAVLOV N NOM SG PROPER
had         HAVE V PAST VFIN SVO / HAVE PCP2 SVOO
shown       SHOW PCP2 SVOO SVO SG
that        ADV / PRON DEM SG / DET CENTRAL DEM SG / CS
salivation  N NOM SG

25 A constraint
ADVERBIAL-THAT RULE
Given input: "that"
if   (+1 A/ADV/QUANT);   /* next word is adj, adv, or quantifier */
     (+2 SENT-LIM);      /* and following that there is a sentence boundary */
     (NOT -1 SVOC/A);    /* and the previous word is not a verb like 'consider' */
then eliminate non-ADV tags
else eliminate ADV tag
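A minimal sketch of how such a constraint acts as a filter over the candidate tag sets left by morphological analysis (the data structures and the simplified tag names are hypothetical, not ENGTWOL's actual machinery):

# ENGTWOL-style constraint acting as a filter on candidate tags.
SENT_LIM = {".", "?", "!", ";"}            # rough sentence-boundary markers

def adverbial_that_rule(tokens, tags, i):
    # Apply the ADVERBIAL-THAT rule at position i; tags[i] is a set of candidates.
    if tokens[i].lower() != "that":
        return
    next_ok  = i + 1 < len(tokens) and tags[i + 1] & {"A", "ADV", "QUANT"}
    after_ok = i + 2 < len(tokens) and tokens[i + 2] in SENT_LIM
    prev_ok  = i == 0 or "SVOC/A" not in tags[i - 1]   # previous word not a 'consider'-type verb
    if next_ok and after_ok and prev_ok:
        tags[i] = {"ADV"}                  # eliminate non-ADV tags
    else:
        tags[i].discard("ADV")             # eliminate the ADV tag

tokens = ["it", "isn't", "that", "odd", "."]
tags = [{"PRON"}, {"V"}, {"ADV", "CS", "DET", "PRON"}, {"A"}, {"SENT"}]
adverbial_that_rule(tokens, tags, 2)
print(tags[2])                             # {'ADV'}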

26 Tagging with lexical frequencies
Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
Problem: assign a tag to race given its lexical frequency
Solution: we choose the tag that has the greater probability:
- P(race|VB)
- P(race|NN)
Actual estimates from the Switchboard corpus:
- P(race|NN) = .00041
- P(race|VB) = .00003
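In code, tagging by lexical frequency alone is just an argmax over these estimates (a sketch using the probabilities quoted above):

# Pick the tag t maximizing P(word|t), using the Switchboard estimates above.
p_word_given_tag = {("race", "NN"): 0.00041, ("race", "VB"): 0.00003}

def best_tag_by_lexical_frequency(word, candidate_tags):
    return max(candidate_tags, key=lambda t: p_word_given_tag.get((word, t), 0.0))

print(best_tag_by_lexical_frequency("race", ["NN", "VB"]))   # NN - wrong for "to race"!

By frequency alone, race always comes out NN, which is exactly why context information is needed.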

27 The Brill tagger
- An example of TRANSFORMATION-BASED LEARNING
- Very popular (freely available, works fairly well)
- A SUPERVISED method: requires a tagged corpus
- Basic idea: do a quick job first (using frequency), then revise it using contextual rules

28 An example
Examples:
- It is expected to race tomorrow.
- The race for outer space.
Tagging algorithm:
1. Tag all uses of "race" as NN (the most likely tag in the Brown corpus):
   It is expected to race/NN tomorrow
   the race/NN for outer space
2. Use a transformation rule to replace the tag NN with VB for all uses of "race" preceded by the tag TO:
   It is expected to race/VB tomorrow
   the race/NN for outer space

29 Transformation-based learning in the Brill tagger
1. Tag the corpus with the most likely tag for each word
2. Choose a TRANSFORMATION that deterministically replaces an existing tag with a new one such that the resulting tagged corpus has the lowest error rate
3. Apply that transformation to the training corpus
4. Repeat
5. Return a tagger that
   a. first tags using unigrams
   b. then applies the learned transformations in order
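A compressed sketch of that loop (the helpers candidate_rules and apply_rule are hypothetical; real implementations generate rules from a fixed set of templates and score them far more efficiently):

# Greedy transformation-based learning, Brill-style.
def errors(tags, gold):
    return sum(1 for t, g in zip(tags, gold) if t != g)

def tbl_train(words, gold_tags, unigram_tag, candidate_rules, apply_rule, max_rules=20):
    tags = [unigram_tag[w] for w in words]            # 1. most-likely-tag baseline
    learned = []
    for _ in range(max_rules):
        best_rule, best_err = None, errors(tags, gold_tags)
        for rule in candidate_rules(words, tags, gold_tags):
            err = errors(apply_rule(rule, words, tags), gold_tags)
            if err < best_err:                        # 2. rule with lowest resulting error
                best_rule, best_err = rule, err
        if best_rule is None:                         # no transformation helps any more
            break
        tags = apply_rule(best_rule, words, tags)     # 3. apply it to the training corpus
        learned.append(best_rule)                     # 4. repeat
    return learned                                    # 5. apply these in order at tagging time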

30 The algorithm

31 Examples of learned transformations

32 Templates

33 An example

34 Markov Model POS tagging Again, the problem is to find an 'explanation' (a tag sequence) with the highest probability. As in yesterday's case, this can be 'turned around' using Bayes' Rule:
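The equations on this slide were images in the original; following Jurafsky and Martin's presentation, they presumably read:

\hat{t}_{1,n} = \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n})
              = \arg\max_{t_{1,n}} \frac{P(w_{1,n} \mid t_{1,n})\, P(t_{1,n})}{P(w_{1,n})}
              = \arg\max_{t_{1,n}} P(w_{1,n} \mid t_{1,n})\, P(t_{1,n})

(the denominator P(w_{1,n}) is constant across tag sequences and can be dropped).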

35 Combining frequency and contextual information As in the case of spelling, this equation can be simplified. As we will see, once further simplifications are applied, it will encode both FREQUENCY and CONTEXT INFORMATION.

36 Three further assumptions
- MARKOV assumption: a tag only depends on a FIXED NUMBER of previous tags (here, assume bigrams) - simplifies the second factor
- INDEPENDENCE assumption: words are independent of each other; a word's identity only depends on its own tag - simplifies the first factor
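In equation form (reconstructed; the formulas were images on the original slide):

P(t_{1,n}) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})             (Markov assumption, bigram case)

P(w_{1,n} \mid t_{1,n}) \approx \prod_{i=1}^{n} P(w_i \mid t_i)    (independence assumption)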

37 The final equations: a FREQUENCY factor and a CONTEXT factor
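Reconstructed from the standard presentation (the original slide showed this as an image):

\hat{t}_{1,n} = \arg\max_{t_{1,n}} \prod_{i=1}^{n} \underbrace{P(w_i \mid t_i)}_{\text{FREQUENCY}} \; \underbrace{P(t_i \mid t_{i-1})}_{\text{CONTEXT}}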

38 Estimating the probabilities Can be done using Maximum Likelihood Estimation as usual, for BOTH probabilities:
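The MLE formulas (an image on the original slide) are presumably the usual count ratios, where C(·) counts occurrences in the training corpus:

P(t_i \mid t_{i-1}) = \frac{C(t_{i-1}, t_i)}{C(t_{i-1})} \qquad P(w_i \mid t_i) = \frac{C(t_i, w_i)}{C(t_i)}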

39 An example of tagging with Markov Models:
Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
Problem: assign a tag to race given the subsequences
- to/TO race/???
- the/DT race/???
Solution: we choose the tag that has the greater of these probabilities:
- P(VB|TO) P(race|VB)
- P(NN|TO) P(race|NN)

40 Tagging with MMs (2) Actual estimates from the Switchboard corpus:
LEXICAL FREQUENCIES:
- P(race|NN) = .00041
- P(race|VB) = .00003
CONTEXT:
- P(NN|TO) = .021
- P(VB|TO) = .34
The probabilities:
- P(VB|TO) P(race|VB) = .34 × .00003 ≈ .00001
- P(NN|TO) P(race|NN) = .021 × .00041 ≈ .0000086
so in the context to/TO, race is tagged VB.

41 A graphical interpretation of the POS tagging equations

42 Hidden Markov Models

43 An example

44 Computing the most likely sequence of tags In general, the problem of computing the most likely sequence t1 .. tn could have exponential complexity. It can however be solved in polynomial time using an instance of DYNAMIC PROGRAMMING: the VITERBI ALGORITHM (Viterbi, 1967), one of a family also known as TRELLIS ALGORITHMS.

45 Trellis algorithms

46 The Viterbi algorithm

47 Viterbi (pseudo-code format)
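The pseudo-code on this slide was an image in the original; below is a minimal runnable sketch of Viterbi for a bigram HMM tagger (the probabilities come from the race example above; everything else, including the start-state convention, is an assumption of this sketch):

# Viterbi for a bigram HMM tagger.
# trans[prev][t] = P(t|prev), emit[t][w] = P(w|t); "<s>" is an assumed start state.
def viterbi(words, tags, trans, emit):
    v = [{}]      # v[i][t] = best probability of any tag sequence ending in t at word i
    back = [{}]   # backpointers
    for t in tags:
        v[0][t] = trans["<s>"].get(t, 0.0) * emit[t].get(words[0], 0.0)
        back[0][t] = None
    for i in range(1, len(words)):
        v.append({}); back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: v[i-1][p] * trans[p].get(t, 0.0))
            v[i][t] = v[i-1][prev] * trans[prev].get(t, 0.0) * emit[t].get(words[i], 0.0)
            back[i][t] = prev
    last = max(tags, key=lambda t: v[-1][t])     # termination, then backtrace
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

tags = ["TO", "VB", "NN"]
trans = {"<s>": {"TO": 1.0}, "TO": {"VB": 0.34, "NN": 0.021}, "VB": {}, "NN": {}}
emit = {"TO": {"to": 1.0}, "VB": {"race": 0.00003}, "NN": {"race": 0.00041}}
print(viterbi(["to", "race"], tags, trans, emit))   # ['TO', 'VB']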

48 Viterbi: an example

49 Markov chains and Hidden Markov Models
- Markov chain: only transition probabilities; each node is associated with a single OUTPUT
- Hidden Markov Models: nodes may have more than one output; probability P(w|t) of outputting word w from state t

50 Training HMMs The reason HMMs are so popular is that they come with a LEARNING ALGORITHM: the FORWARD-BACKWARD algorithm (an instance of the class of algorithms called EM algorithms). Basic idea of the forward-backward algorithm: start by assigning random transition and emission probabilities, then iterate.
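The iteration's re-estimation step can be summarized in the standard Baum-Welch form (not shown on the original slide):

\hat{a}_{ij} = \frac{\text{expected no. of transitions from state } i \text{ to state } j}{\text{expected no. of transitions out of state } i} \qquad \hat{b}_j(w) = \frac{\text{expected no. of times in state } j \text{ emitting } w}{\text{expected no. of times in state } j}

where the expectations are computed with the forward and backward probabilities under the current model.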

51 Evaluation of POS taggers Can reach up to 96.7% correct on Penn Treebank (see Brants, 2000) (But see next lecture)

52 Additional issues
- Most of the difference in performance between POS algorithms depends on their treatment of UNKNOWN WORDS
- Multiple-token words ('Penn Treebank')
- Class-based N-grams

53 Other techniques There is a move away from HMMs for this task and towards techniques that make it easier to use multiple features. MAXIMUM ENTROPY taggers are among the highest performing at the moment.

54 Freely available POS taggers Quite a few taggers are freely available:
- Brill (TBL)
- QTAG (HMM; can be trained for other languages)
- LT POS (part of the Edinburgh LTG suite of tools)
- See Chris Manning's Statistical NLP resources web page (from the course web page)

55 POS tagging for Italian
- Xerox Grenoble
- IMMORTALE (Università di Venezia)
- Pi-Tagger (Università di Pisa)

56 Other kinds of tagging
- Sense tagging (SEMCOR, SENSEVAL)
- Syntactic tagging ('supertagging')
- Dialogue act tagging
- Semantic tagging (animacy, etc.)

57 Readings Jurafsky and Martin, chapter 8