Part-of-speech Tagging cs224n Final project Spring, 2008 Tim Lai.

POS Tagging – 3 general techniques
1. Rule-based systems
   - Rely on a hand-picked set of rules
   - Performance is not very good
2. Stochastic methods
   - HMM with the Viterbi algorithm to determine the best tagging
   - Uses emission probabilities, i.e. P(word | tag), and transition probabilities, i.e. P(currentTag | prevTag)
   - Maximum Entropy models are also useful
3. Hybrid of the two
   - Rule-based system to do POS tagging
   - Uses rule templates and learns useful rules during training
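The stochastic approach above can be sketched with a minimal Viterbi decoder. The dictionary-of-dictionaries probability tables below (`trans[prev][cur]` = P(currentTag | prevTag), `emit[tag][word]` = P(word | tag), plus a start-tag table) are illustrative assumptions, not the project's actual implementation:

```python
def viterbi(words, tags, trans, emit, start):
    """Return the most probable tag sequence for `words` under an HMM.

    trans[prev][cur] = P(cur tag | prev tag)
    emit[tag][word]  = P(word | tag)
    start[tag]       = P(tag at sentence start)
    """
    # best[t][tag] = probability of the best sequence ending in `tag` at position t
    best = [{tag: start.get(tag, 0.0) * emit[tag].get(words[0], 0.0) for tag in tags}]
    back = [{}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in tags:
            # Pick the predecessor tag that maximizes the path probability
            prev, score = max(
                ((p, best[t - 1][p] * trans[p].get(tag, 0.0)) for p in tags),
                key=lambda x: x[1],
            )
            best[t][tag] = score * emit[tag].get(words[t], 0.0)
            back[t][tag] = prev
    # Follow back-pointers from the best final tag
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

With a toy three-tag model, `viterbi(["the", "dog", "walks"], ...)` recovers the expected DT/NN/VB sequence.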

Simple HMM vs Max-Ent
- HMM using bigrams for transition probabilities
- Max-Ent using simple features such as the previous tag and the current word
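A sketch of the "simple features" a baseline Max-Ent tagger might extract: just the previous tag and the current word. The feature-name strings are illustrative assumptions, not taken from the project:

```python
def basic_features(words, i, prev_tag):
    """Binary features for tagging position i, given the previous tag."""
    return {
        "word=" + words[i]: 1.0,
        "prev_tag=" + prev_tag: 1.0,
        # Conjunction of the two, a common Max-Ent feature
        "prev_tag+word=" + prev_tag + "_" + words[i]: 1.0,
    }
```

A Max-Ent model then learns one weight per feature string and scores each candidate tag from the weighted sum of its active features.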

Error Analysis
- HMM and Max-Ent both perform well when tested on data from the same domain
- Only 6.6% of words were ambiguous, making known words easy to tag
- Accuracy drops when using test data from another domain
- Most errors are caused by unknown words, or the POS tagging of words near unknown words; in sentences without unknown words, accuracy is ~99%
- The most common mistake is mis-tagging JJ as NN
- Both taggers need to be enhanced to deal with unknown words

Enhancement ideas
- For HMM: transition probabilities can be modeled using trigrams, taking more context into account when a word is unknown
- For Max-Ent: word shapes, word features, and more context can help
Results:
- HMM: switching from unigram to bigram transitions helps a lot, but using trigrams doesn't help much
- Max-Ent: hand-picked features did not help much, but adding prefix and suffix features was the most helpful
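The Max-Ent enhancements above can be sketched as an unknown-word feature extractor combining a coarse word shape with prefix/suffix strings. The shape encoding and the affix-length cutoff are illustrative assumptions:

```python
import re

def shape(word):
    """Collapse a word into a coarse shape, e.g. 'McKay' -> 'XxXxx' -> 'XxXx'."""
    s = re.sub(r"[A-Z]", "X", word)
    s = re.sub(r"[a-z]", "x", s)
    s = re.sub(r"[0-9]", "d", s)
    return re.sub(r"(.)\1+", r"\1", s)  # squeeze runs of the same symbol

def unknown_word_features(word, max_affix=4):
    """Shape plus all prefixes/suffixes up to max_affix characters."""
    feats = {"shape=" + shape(word): 1.0}
    for k in range(1, min(max_affix, len(word)) + 1):
        feats["prefix=" + word[:k]] = 1.0
        feats["suffix=" + word[-k:]] = 1.0
    return feats
```

Suffix features like `suffix=ing` or `suffix=ly` let the model guess tags for words it never saw in training, which is exactly where the baseline taggers lost accuracy.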

Transformation-based tagging
One more idea to try: using rule templates to learn POS tagging rules.
Sample rule templates:
- Change tag A to tag B when the [preceding | following] word is tagged Z.
- Change tag A to tag B when the tag Z appears within [N] positions of the current word.
Result: using a very restricted set of rule templates, accuracy went up 0.5%.
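Applying one learned rule of the first template above ("change tag A to tag B when the preceding word is tagged Z") can be sketched as below. The rule instance in the usage note (NN → VB after TO) is a classic textbook example, not necessarily a rule the project learned:

```python
def apply_rule(tags, a, b, z):
    """Rewrite tag `a` to `b` wherever the preceding tag (in the input) is `z`."""
    out = list(tags)
    for i in range(1, len(tags)):
        # Check the original tags so one rule application doesn't feed itself
        if tags[i] == a and tags[i - 1] == z:
            out[i] = b
    return out
```

For example, `apply_rule(["PRP", "TO", "NN"], "NN", "VB", "TO")` corrects "to walk" mis-tagged as a noun, yielding `["PRP", "TO", "VB"]`. During training, a Brill-style learner instantiates each template against the corpus and greedily keeps the rule that fixes the most errors.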

Final results
- HMM with bigram transitions and rule-based adjustments
- Max-Ent with prefix/suffix and word-shape features, plus rule-based adjustments
- Max-Ent performs better, achieving 97% accuracy