Slide 1: CSA2050 Natural Language Processing, Tagging 2 (CSA3050: Tagging II, February 2007)
Rule-Based Tagging
Stochastic Tagging
Hidden Markov Models (HMMs)
N-Grams

Slide 2: Tagging 2
Lecture slides based on Mike Rosner's and Marti Hearst's notes, with additions from the NLTK tutorials.

Slide 3: Rule-Based Tagger
Basic idea:
- Assign all possible tags to each word.
- Remove tags according to a set of rules, e.g.: if word+1 is an adjective, adverb, or quantifier and is followed by a sentence boundary, and word-1 is not a verb like "consider", then eliminate the non-adverb readings; otherwise eliminate the adverb reading.
- Typically more than 1,000 hand-written rules are used, but rules may also be machine-learned.

Slide 4: ENGTWOL
- Based on two-level morphology.
- 56,000 entries for English word stems.
- Each entry is annotated with morphological and syntactic features.

Slide 5: Sample ENGTWOL Lexicon
(table of sample lexicon entries shown as an image on the slide)

Slide 6: ENGTWOL Tagger
Stage 1: Run words through the morphological analyzer to get all possible parts of speech. For the phrase "the tables", we get the following output:

    "<the>"
        "the" DET CENTRAL ART SG/PL
    "<tables>"
        "table" N NOM PL
        "table" V PRES SG3 VFIN

Stage 2: Apply constraints to rule out incorrect POS readings.
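
For concreteness, Stage 1 can be pictured as a lookup that returns every reading listed for a word. The toy LEXICON in this Python sketch is a hypothetical two-entry stand-in for the full two-level analyzer, with readings copied from the output above.

    # A minimal sketch of Stage 1 as dictionary lookup. The real ENGTWOL
    # analyzer uses two-level morphology; this two-entry LEXICON is a
    # hypothetical stand-in, with readings taken from the slide's output.
    LEXICON = {
        "the":    [("the", "DET CENTRAL ART SG/PL")],
        "tables": [("table", "N NOM PL"),
                   ("table", "V PRES SG3 VFIN")],
    }

    def analyze(word):
        """Return every (stem, reading) pair for a word (Stage 1 output)."""
        return LEXICON.get(word.lower(), [])

    for w in "the tables".split():
        print(w, "->", analyze(w))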

Slide 7: Examples of Constraints
- Discard all verb readings if there is an unambiguous determiner to the left, and between that determiner and the ambiguous word itself there are no nominals (nouns, abbreviations, etc.).
- Discard all finite verb readings if the immediately preceding word is "to".
- Discard all subjunctive readings if there are no instances of the subordinating conjunctions "that" or "lest" to the left.
The first constraint would discard the verb reading of "tables" in the previous representation. There are about 1,100 constraints in all.

Slide 8: Example

    Pavlov      PAVLOV N NOM SG PROPER
    had         HAVE V PAST VFIN SVO
                HAVE PCP2 SVO
    shown       SHOW PCP2 SVOO SVO SV
    that        ADV
                PRON DEM SG
                DET CENTRAL DEM SG
                CS
    salivation  N NOM SG

Slide 9: Actual Constraint Syntax
Given input: "that"

    If   (+1 A/ADV/QUANT)    ; next word is an adjective, adverb, or quantifier
         (+2 SENT-LIM)       ; and the word after that is a sentence boundary
         (NOT -1 SVOC/A)     ; and the previous word is not a verb like "consider"
    Then eliminate non-ADV tags
    Else eliminate the ADV tag

This rule selects the adverbial reading of "that", as in "it isn't that odd", and eliminates the ADV tag everywhere else.
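
As a rough illustration of how such a constraint might be executed, here is a hedged Python sketch; the representation of a sentence as a list of candidate-tag sets, the tag names, and the end-of-list handling are assumptions based on the slide, not ENGTWOL's actual machinery.

    # Hedged sketch of the "that" constraint over a list of candidate-tag
    # sets, one set per token. SVOC_A stands in for verbs like "consider";
    # running off the end of the list is treated as a sentence boundary.
    def apply_that_rule(readings, i):
        """Mutate readings[i], the candidate tags of an occurrence of 'that'."""
        nxt = readings[i + 1] if i + 1 < len(readings) else set()
        nxt2 = readings[i + 2] if i + 2 < len(readings) else {"SENT_LIM"}
        prev = readings[i - 1] if i > 0 else set()
        if (nxt & {"A", "ADV", "QUANT"}) and ("SENT_LIM" in nxt2) \
                and ("SVOC_A" not in prev):
            readings[i] = {"ADV"}          # eliminate the non-ADV tags
        else:
            readings[i] -= {"ADV"}         # eliminate the ADV tag

    # "it isn't that odd": the If-branch fires, keeping only ADV for "that".
    sent = [{"PRON"}, {"V"}, {"ADV", "DET", "PRON", "CS"}, {"A"}, {"SENT_LIM"}]
    apply_that_rule(sent, 2)
    print(sent[2])   # {'ADV'}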

Slide 10: Three Approaches to Tagging
1. Rule-Based Tagger: the ENGTWOL tagger (Voutilainen 1995)
2. Stochastic Tagger: HMM-based taggers
3. Transformation-Based Tagger: the Brill tagger (Brill 1995)

Slide 11: Stochastic Tagging
Based on the probability of a certain tag given various contextual possibilities. Requires a training corpus.
Difficulties:
- There are no probabilities for words that do not occur in the training corpus; this is addressed by smoothing.
- The training corpus may be too different from the test corpus.

Slide 12: Stochastic Tagging
Simple method: choose the most frequent tag seen in the training text for each word (a minimal sketch follows below).
- Result: about 90% accuracy.
- But we can do better than that by employing a more elaborate statistical model.
- Hidden Markov Models (HMMs) are one class of such models.
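
Here is a minimal Python sketch of that baseline, assuming the training corpus is available as a list of (word, tag) pairs; the tiny corpus and the NN fallback for unknown words are illustrative assumptions.

    # Most-frequent-tag baseline: memorize each word's commonest training tag.
    from collections import Counter, defaultdict

    def train_baseline(tagged_words):
        """Map each word to its most frequent tag in training data."""
        counts = defaultdict(Counter)          # word -> Counter of its tags
        for word, tag in tagged_words:
            counts[word][tag] += 1
        return {w: c.most_common(1)[0][0] for w, c in counts.items()}

    def tag(words, model, default="NN"):
        """Tag a sentence; unknown words fall back to a default tag."""
        return [(w, model.get(w, default)) for w in words]

    train = [("the", "DT"), ("race", "NN"), ("to", "TO"),
             ("race", "VB"), ("race", "NN")]
    model = train_baseline(train)
    print(tag(["the", "race"], model))   # [('the', 'DT'), ('race', 'NN')]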

Slide 13: Hidden Markov Model (for pronunciation)
Observation sequences (phone strings):
    [start ax b aw end]
    [start ix b aw dx end]
    [start ax b ae t end]
(HMM state diagram shown on the slide)

Slide 14: Three Fundamental Questions for HMMs
1. Given an HMM, how likely is a given observation sequence?
2. Given an observation sequence, how do we choose the state sequence that best explains the observations?
3. Given an observation sequence and a space of possible HMMs, how do we find the HMM that best fits the observed data?
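
In the usual notation, with observation sequence O, state sequence X, and model parameters μ, the three problems can be stated as below; they are solved by the forward algorithm, the Viterbi algorithm, and Baum-Welch re-estimation, respectively.

    \begin{align*}
    \text{1. Likelihood:} \quad & P(O \mid \mu) \\
    \text{2. Decoding:}   \quad & \hat{X} = \arg\max_{X} P(X \mid O, \mu) \\
    \text{3. Learning:}   \quad & \hat{\mu} = \arg\max_{\mu} P(O \mid \mu)
    \end{align*}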

Slide 15: Two Observation Sequences for Tagging

    Words:       Time  flies  like  an  arrow
    Sequence 1:  N     N      P     D   N
    Sequence 2:  V     V      P     D   N

Slide 16: Two Kinds of Probability Involved in Generating a Sequence
- Transition probabilities: P(tag | previous n tags)
- Output probabilities: P(w | t)
(The slide diagrams a tag sequence t1 ... t6 emitting a word sequence w1 ... w5.)
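
For a bigram tagger (one previous tag), these two probabilities combine into the standard HMM score of a tag sequence for a word sequence, with t_0 a start symbol:

    P(t_{1..n}, w_{1..n}) \;=\; \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)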

Slide 17: Simplifying Assumptions (cannot handle all phenomena)
- Limited horizon: a given tag depends only on the N previous tags, usually with N = 2. This misses, for example:
  - central embedding: "The cat the dog the bird saw bark meowed."
  - long-distance dependencies: "It is easy to consider it impossible for anyone but a genius to try to talk to Chris."
- Time (sentence-position) invariance: a pair such as (P, V) may not in fact be equally likely at the beginning and at the end of a sentence.

Slide 18: Estimating N-gram Probabilities
To estimate the probability that "Z" appears after "XY":
- count how many times "XYZ" appears, call it A;
- count how many times "XY" appears, call it B;
- the estimate is A/B.
The same principle applies to tags. We can use these estimates to rank alternative tags for a given word.
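
Written as a maximum-likelihood estimate, with C(.) denoting corpus counts:

    \hat{P}(Z \mid X\,Y) \;=\; \frac{C(X\,Y\,Z)}{C(X\,Y)} \;=\; \frac{A}{B}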

Slide 19: Data Used for Training a Hidden Markov Model
Estimate the probabilities from relative frequencies:
- Transition probabilities: the probability that a sequence of tags t1, ..., tn is followed by a tag t:
    P(t | t1..tn) = count(t1..tn followed by t) / count(t1..tn)
- Output probabilities: the probability that a given tag t is realised as a word w:
    P(w | t) = count(w tagged as t) / count(t)
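
A Python sketch of these relative-frequency estimates for the bigram case; the corpus format (a list of sentences, each a list of (word, tag) pairs) and the <s> start symbol are assumptions, and a real tagger would smooth the resulting tables.

    # Relative-frequency estimation of transition and output probabilities.
    from collections import Counter

    def estimate(corpus):
        """corpus: list of sentences, each a list of (word, tag) pairs."""
        bigram, context = Counter(), Counter()    # for P(t | prev)
        pair, tagcount = Counter(), Counter()     # for P(w | t)
        for sent in corpus:
            prev = "<s>"
            for word, tag in sent:
                bigram[(prev, tag)] += 1
                context[prev] += 1
                pair[(word, tag)] += 1
                tagcount[tag] += 1
                prev = tag
        P_trans = {bt: c / context[bt[0]] for bt, c in bigram.items()}
        P_emit = {wt: c / tagcount[wt[1]] for wt, c in pair.items()}
        return P_trans, P_emit

    corpus = [[("the", "DT"), ("race", "NN")],
              [("to", "TO"), ("race", "VB")]]
    P_trans, P_emit = estimate(corpus)
    print(P_trans[("TO", "VB")], P_emit[("race", "VB")])   # 1.0 1.0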

Slide 20: An Example
Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
Consider the first sentence and choose between:
    A = to/TO race/VB
    B = to/TO race/NN
We need to choose the tagging with maximum probability:
    P(A) = P(VB|TO) × P(race|VB)
    P(B) = P(NN|TO) × P(race|NN)

Slide 21: Calculating the Maximum

    P(VB|TO)    P(race|VB)    P(VB|TO) × P(race|VB)
    P(NN|TO)    P(race|NN)    P(NN|TO) × P(race|NN)

(The numeric entries of the slide's table did not survive extraction.)
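
For a sense of scale, the figures reported for this example in Jurafsky & Martin's textbook treatment are approximately:

    \begin{aligned}
    P(VB \mid TO)\,P(race \mid VB) &\approx 0.83 \times 0.00012 \approx 1.0 \times 10^{-4} \\
    P(NN \mid TO)\,P(race \mid NN) &\approx 0.00047 \times 0.00057 \approx 2.7 \times 10^{-7}
    \end{aligned}

so the verb tagging to/TO race/VB wins by a factor of several hundred.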

Slide 22: Remarks
We have shown how to calculate the most probable tag for one word. Normally we are interested in the most probable sequence of tags for the entire sentence. The Viterbi algorithm is used to compute that sequence efficiently; have a look at the quick introduction (PDF on the website), and see the sketch below.
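
A compact Viterbi sketch for bigram HMM tagging, reusing the hypothetical P_trans / P_emit dictionaries from the estimation sketch above; missing entries default to probability 0, which a real tagger would smooth away.

    # Viterbi decoding: best tag sequence under a bigram HMM.
    def viterbi(words, tags, P_trans, P_emit):
        # Probability of the best path ending in each tag at the first word.
        V = [{t: P_trans.get(("<s>", t), 0) * P_emit.get((words[0], t), 0)
              for t in tags}]
        back = []                       # back-pointers, one dict per word
        for w in words[1:]:
            col, ptr = {}, {}
            for t in tags:
                best = max(tags,
                           key=lambda p: V[-1][p] * P_trans.get((p, t), 0))
                col[t] = (V[-1][best] * P_trans.get((best, t), 0)
                          * P_emit.get((w, t), 0))
                ptr[t] = best
            V.append(col)
            back.append(ptr)
        last = max(tags, key=lambda t: V[-1][t])   # best final tag
        seq = [last]
        for ptr in reversed(back):                 # follow back-pointers
            seq.append(ptr[seq[-1]])
        return list(reversed(seq))

    tags = ["DT", "NN", "TO", "VB"]
    print(viterbi(["to", "race"], tags, P_trans, P_emit))   # ['TO', 'VB']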

Slide 23: Next Sessions
- Transformation-Based Tagging
- Chunking