Part-of-Speech Tagging Using Hidden Markov Models


Part-of-Speech Tagging Using Hidden Markov Models EE 638: Estimation and Identification Alankar Kotwal (12d070010), Sharad Mirani (12d070003) Hi, I'm Alankar, and this is Sharad. Our project is about a crucial part of Natural Language Processing called POS tagging.

Teach a computer to recognize the role of words in a sentence. That, in general, is the problem.

‘Roles of words’? Why do we care? [Diagram: Text → ? → Understanding] ‘Why do we care?’ is a natural question to ask. The entire field of Natural Language Processing is built on the hope that computers can, one day, ‘understand’ text as well as humans can. However, there is an inherent gap between words, their meanings, and the concepts they convey in a sentence. All three are important in understanding the sentence; our job here is to work at the highest level of the three, the concept level. Each word expresses a concept: a noun is an entity, an adjective is a quality. Parts of speech thus arise naturally when we try to understand the ‘concept’ carried by each word in a sentence.

Why is this non-trivial? Are you understanding this? She was very understanding. Verb (participle)! Adjective! One might say this could easily be achieved: build a set of rules that dictate which word is which part of speech. But there exist words that act as multiple parts of speech depending on context. Even the word ‘dogs’ can be a verb, and some words, like ‘still’, can represent as many as seven distinct parts of speech. Now take the example on the screen: what is the difference between ‘understanding’ in the two sentences? It is a verb in the first and an adjective in the second. Note that given the ‘context’ of the word, it is possible to predict the POS definitively! This turns out to be true in general, and it motivates why Hidden Markov Models are such a good fit for the problem.

The Markov Model: Transitions [State diagram with states Start, Pronoun, Noun, Verb, Article, Adjective, End, connected by transition arrows.] Thus, in general, a small sentence model would look like this. For instance, the sentence ‘She was very understanding’ would traverse pronoun, verb, adverb, adjective.

The Markov Model: Emissions [Same state diagram, with example emissions per state: Pronoun: He, She, It; Verb: Is, Was, Presenting; Noun: Estimation, Identification; Adjective: Beautiful, Interesting; Article: A, An, The.] Each state emits words, and we model the distribution of words it emits. Unknown words? Define a word __UNK__ that has equal emission probability under all states.
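A minimal sketch of the __UNK__ trick in Python (the table name emit_probs and the function below are our own illustration, not the project's actual code): unseen words fall back to a constant that is the same for every state, so they contribute no evidence about the hidden tag.

UNK = "__UNK__"

def emission_prob(emit_probs, tag, word, n_tags):
    # emit_probs[tag] maps each word seen with `tag` in training to P(word | tag).
    row = emit_probs.get(tag, {})
    if word not in row:
        word = UNK  # unseen word: fall back to the reserved __UNK__ token
    # __UNK__ gets the same probability under every tag (1/n_tags is one such
    # constant), so it carries no information about which state emitted it.
    return row.get(word, 1.0 / n_tags)

emit_probs = {"NN": {"pound": 0.3}, "VB": {"take": 0.2}}     # toy table
print(emission_prob(emit_probs, "NN", "pound", n_tags=2))    # 0.3
print(emission_prob(emit_probs, "NN", "zyzzyva", n_tags=2))  # 0.5, the __UNK__ case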

Sentence Representation [Diagram: input words and POS tags are mapped through a HashMap to numbers.] Computers work with numbers, not words. How do we represent sentences? HashMaps: each distinct word and each POS tag is assigned an integer id.
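A minimal sketch of this indexing step, using Python dicts in place of the HashMaps on the slide (the names word2id and tag2id are ours):

word2id, tag2id = {}, {}

def get_id(table, key):
    # Assign the next free integer the first time a key is seen.
    if key not in table:
        table[key] = len(table)
    return table[key]

sentence = [("Confidence", "NN"), ("in", "IN"), ("the", "DT"), ("pound", "NN")]
encoded = [(get_id(word2id, w), get_id(tag2id, t)) for w, t in sentence]
print(encoded)  # [(0, 0), (1, 1), (2, 2), (3, 0)] -- "pound" reuses tag id 0 (NN)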

Our dataset: WSJ excerpts. Each token is annotated as Word/POS: Confidence/NN in/IN the/DT pound/NN is/VBZ widely/RB expected/VBN to/TO take/VB another/DT sharp/JJ dive/NN ./. We use a dataset of sentences taken from the Wall Street Journal; an example is shown on the screen.
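Lines in this Word/POS format split cleanly into (word, tag) pairs; a small sketch (rsplit on the last '/' protects words that themselves contain a slash):

line = "Confidence/NN in/IN the/DT pound/NN is/VBZ widely/RB expected/VBN to/TO take/VB another/DT sharp/JJ dive/NN ./."

# Split on whitespace, then on the last '/' so a word containing '/' keeps its tag.
pairs = [tuple(tok.rsplit("/", 1)) for tok in line.split()]
print(pairs[:3])  # [('Confidence', 'NN'), ('in', 'IN'), ('the', 'DT')]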

Training Estimating transition probabilities: increment the "next POS" count for each adjacent pair of POS tags. Estimating emission probabilities: increment the "POS type" count for each word. Normalizing these counts then gives maximum-likelihood transition and emission probabilities. How do we train?
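A hedged sketch of this counting scheme in Python (the names and the START/END padding states are our own rendering of the idea, not the project's code):

from collections import defaultdict

trans_count = defaultdict(lambda: defaultdict(int))  # trans_count[prev][cur]
emit_count = defaultdict(lambda: defaultdict(int))   # emit_count[tag][word]

def train(tagged_sentences):
    # Each sentence is a list of (word, tag) pairs.
    for sent in tagged_sentences:
        prev = "START"
        for word, tag in sent:
            trans_count[prev][tag] += 1  # the "next POS" count for this POS pair
            emit_count[tag][word] += 1   # the "POS type" count for this word
            prev = tag
        trans_count[prev]["END"] += 1

def normalize(counts):
    # Turn raw counts into maximum-likelihood probabilities.
    return {a: {b: n / sum(row.values()) for b, n in row.items()}
            for a, row in counts.items()}

train([[("the", "DT"), ("pound", "NN"), ("fell", "VBD")]])
trans_probs, emit_probs = normalize(trans_count), normalize(emit_count)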

Testing Hash the input test cases. Run the Viterbi algorithm. How do we test? We map each test word to its id (unseen words become __UNK__) and decode with Viterbi, which finds the most probable tag sequence given the observed words.
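A compact log-space Viterbi sketch under the same assumptions (it reuses the trans_probs/emit_probs tables and the uniform __UNK__ fallback from the earlier sketches; all names are ours):

import math

def log(p):
    return math.log(p) if p > 0 else float("-inf")

def viterbi(words, tags, trans, emit, n_tags):
    # trans[a][b] ~ P(b | a); emit[t][w] ~ P(w | t); unseen words fall back
    # to the uniform __UNK__ probability 1/n_tags.
    def e(t, w):
        return emit.get(t, {}).get(w, 1.0 / n_tags)
    # best[i][t]: log-prob of the best path over words[:i+1] ending in tag t;
    # back[i][t]: the previous tag on that path.
    best = [{t: log(trans.get("START", {}).get(t, 0)) + log(e(t, words[0]))
             for t in tags}]
    back = [{}]
    for w in words[1:]:
        best_i, back_i = {}, {}
        for t in tags:
            prev, score = max(
                ((s, best[-1][s] + log(trans.get(s, {}).get(t, 0))) for s in tags),
                key=lambda x: x[1])
            best_i[t], back_i[t] = score + log(e(t, w)), prev
        best.append(best_i)
        back.append(back_i)
    # Fold in the transition to END, then follow the back-pointers.
    last = max(tags, key=lambda t: best[-1][t] + log(trans.get(t, {}).get("END", 0)))
    path = [last]
    for bp in reversed(back[1:]):
        path.append(bp[path[-1]])
    return list(reversed(path))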

Our results
A set is a collection. => {ART, NN, VBZ, ART, NN, .}
Did you set my alarm? => {VBD, PRP, VB, PRP$, NN, .}
When bank financing for the buy-out collapsed last week, so did UAL's stock. => {WRB, NN, NN, IN, DT, NN, VBD, JJ, NN, ,, RB, VBD, NNP, POS, NN, .}
This is my awesome* project for Estimation*. => {DT, VBZ, PRP$, JJ, NN, IN, NN, .}
A few results demonstrating the HMM's ability to model context. The stars denote words not present in the training set. Training set of 8637 sentences, test set of 1952 sentences, 45 POS tags => error rate 5.17%.

Where do we fail?
Unknown words in some cases => not much we can do without ‘meaning’.
Confusing usage: how do you tell whether ‘set’ is present or past tense? => global context?
Difficulty disambiguating POS from just the previous tag => trigram HMMs (see the sketch below)!
So here are our failure cases. Compare these to what a human would do in the same situation!
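For the trigram fix, the only structural change is that transitions condition on the two previous tags instead of one; a minimal sketch of the counting step (names are ours):

from collections import defaultdict

tri_count = defaultdict(int)  # tri_count[(t1, t2, t3)]: tag-trigram count

def count_trigrams(tag_sequences):
    # Condition each tag on its two predecessors instead of just one.
    for tags in tag_sequences:
        padded = ["START", "START"] + list(tags) + ["END"]
        for t1, t2, t3 in zip(padded, padded[1:], padded[2:]):
            tri_count[(t1, t2, t3)] += 1

count_trigrams([["DT", "NN", "VBD"]])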

Thank You It has been a real pleasure exploring NLP from a statistical perspective! Thank you!