CS 4705 Lecture 8: Word Classes and Part-of-Speech (POS) Tagging



What is a word class? Words that somehow 'behave' alike:
– Appear in similar contexts
– Perform similar functions in sentences
– Undergo similar transformations
Why do we want to identify them?
– Pronunciation (desert/desert)
– Stemming
– Semantics
– More accurate N-grams
– Simple syntactic information

How many word classes are there? A basic set:
– N, V, Adj, Adv, Prep, Det, Aux, Part, Conj, Num
A simple division: open/content vs. closed/function
– Open: N, V, Adj, Adv
– Closed: Prep, Det, Aux, Part, Conj, Num
Many subclasses, e.g.
– eat/V → eat/VB, eat/VBP, eats/VBZ, ate/VBD, eaten/VBN, eating/VBG, ...
– Subclasses reflect morphological form & syntactic function

How do we decide which words go in which classes? Nouns denote people, places, and things, and can be preceded by articles? But...
– My typing is very bad.
– *The Mary loves John.
Verbs are used to refer to actions and processes
– But some are closed class and some are open
– I will have e-mailed everyone by noon.
Adjectives describe properties or qualities, but: a cat sitter, a child seat

Adverbs include locatives (here), degree modifiers (very), manner adverbs (gingerly), and temporals (today)
– Is Monday a temporal adverb or a noun?
Closed class items (Prep, Det, Pron, Conj, Aux, Part, Num) are easier, since we can enumerate them... but
– Part vs. Prep:
– George eats up his dinner. / George eats his dinner up.
– George eats up the street. / *George eats the street up.
– Articles come in 2 flavors: definite (the) and indefinite (a, an)

– Conjunctions also have 2 varieties: coordinate (and, but) and subordinate/complementizers (that, because, unless, ...)
– Pronouns may be personal (I, he, ...), possessive (my, his), or wh (who, whom, ...)
– Auxiliary verbs include the copula (be), do, have and their variants, plus the modals (can, will, shall, ...)
And more...
– Interjections/discourse markers
– Existential there
– Greetings, politeness terms

So how do we choose a tagset?
– Brown Corpus (Francis & Kucera '82): 1M words, 87 tags
– Penn Treebank: hand-annotated corpus of Wall Street Journal text, 1M words, 45 tags
How do tagsets differ?
– Degree of granularity
– Idiosyncratic decisions, e.g. the Penn Treebank doesn't distinguish to/Prep from to/Inf:
– I/PP want/VBP to/TO go/VB to/TO Zanzibar/NNP ./.
– Don't tag it if you can recover it from the word (e.g. do forms)
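One consequence of tagset granularity can be sketched in code: collapsing fine-grained Penn Treebank tags into coarse classes by prefix. The mapping below is a small illustrative assumption, not an official one.

```python
# Tagset granularity: collapse fine-grained Penn Treebank tags into a
# coarse class by prefix. The mapping here is illustrative only.
COARSE = {"VB": "V", "NN": "N", "JJ": "Adj", "RB": "Adv", "DT": "Det"}

def coarsen(tag):
    """Map a Penn tag like VBZ or NNS to a coarse class; keep others as-is."""
    for prefix, cls in COARSE.items():
        if tag.startswith(prefix):
            return cls
    return tag

tagged = [("I", "PRP"), ("want", "VBP"), ("to", "TO"), ("go", "VB"),
          ("to", "TO"), ("Zanzibar", "NNP")]
print([(w, coarsen(t)) for w, t in tagged])
```

Going the other direction (coarse to fine) is not possible without re-annotation, which is one reason corpora tend to be tagged with the finer-grained set.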

Part-of-Speech Tagging: how do we assign POS tags to words in a sentence?
– Get/V the/Det bass/N
– Time flies like an arrow.
– Time/[V,N] flies/[V,N] like/[V,Prep] an/Det arrow/N
– Time/N flies/V like/Prep an/Det arrow/N
– Fruit/N flies/N like/V a/Det banana/N
– Fruit/N flies/V like/V a/Det banana/N
– The/Det flies/N like/V a/Det banana/N
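The ambiguity on this slide can be made concrete: given a tag dictionary (a toy, assumed one below), enumerate every candidate tag sequence for "Time flies like an arrow".

```python
# Enumerate every tag sequence licensed by a toy tag dictionary --
# this is the ambiguity a tagger must resolve.
from itertools import product

tag_dict = {"Time": ["N", "V"], "flies": ["N", "V"],
            "like": ["Prep", "V"], "an": ["Det"], "arrow": ["N"]}
words = ["Time", "flies", "like", "an", "arrow"]

sequences = list(product(*(tag_dict[w] for w in words)))
print(len(sequences))  # 2 * 2 * 2 * 1 * 1 = 8 candidate sequences
for seq in sequences[:2]:
    print(list(zip(words, seq)))
```

Even this five-word sentence has 8 candidate analyses; the number grows exponentially with sentence length, which is why tagging needs either rules or probabilities rather than enumeration.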

Potential Sources of Disambiguation
– Many words have only one POS tag (e.g. is, Mary, very, smallest)
– Others have a single most likely tag (e.g. a, dog)
– But tags also tend to co-occur regularly with other tags (e.g. Det, N)
– In addition to conditional probabilities of words P(w_i | w_{i-1}), we can look at POS likelihoods P(t_i | t_{i-1}) to disambiguate sentences and to assess sentence likelihoods

Approaches to POS Tagging
– Hand-written rules
– Statistical approaches
– Hybrid systems (e.g. Brill's transformation-based learning)

Statistical POS Tagging
Goal: choose the best sequence of tags T for a sequence of words W in a sentence:
– T* = argmax_T P(T | W)
– By Bayes' Rule: P(T | W) = P(W | T) P(T) / P(W)
– Since P(W) is constant for a given sentence, we can ignore it, and we have:
– T* = argmax_T P(W | T) P(T)

Statistical POS Tagging: the Prior
P(T) = P(t_1, t_2, ..., t_{n-1}, t_n)
By the Chain Rule:
= P(t_n | t_1, ..., t_{n-1}) P(t_1, ..., t_{n-1})
= Π_{i=1..n} P(t_i | t_1, ..., t_{i-1})
Making the Markov assumption, e.g. for bigrams:
P(T) ≈ Π_{i=1..n} P(t_i | t_{i-1})

Statistical POS Tagging: the (Lexical) Likelihood
P(W | T) = P(w_1, w_2, ..., w_n | t_1, t_2, ..., t_n)
From the Chain Rule:
= Π_{i=1..n} P(w_i | w_1, ..., w_{i-1}, t_1, ..., t_n)
Simplifying assumption: the probability of a word depends only on its own tag, P(w_i | t_i). So:
P(W | T) ≈ Π_{i=1..n} P(w_i | t_i)
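Putting the prior and the likelihood together, the argmax can be computed by brute force over all candidate tag sequences for a short sentence. All probabilities below are invented toy numbers, not corpus estimates.

```python
# Brute-force argmax_T P(T) * P(W|T) for a toy bigram model.
# The transition and emission probabilities are made-up illustrative
# numbers; missing entries are treated as probability 0.
from itertools import product

trans = {("<s>", "N"): 0.5, ("<s>", "V"): 0.2,
         ("N", "V"): 0.4, ("N", "N"): 0.2,
         ("V", "Prep"): 0.3, ("V", "N"): 0.2,
         ("Prep", "Det"): 0.6, ("Det", "N"): 0.7}
emit = {("Time", "N"): 0.01, ("Time", "V"): 0.002,
        ("flies", "N"): 0.005, ("flies", "V"): 0.01,
        ("like", "Prep"): 0.04, ("like", "V"): 0.01,
        ("an", "Det"): 0.2, ("arrow", "N"): 0.003}
words = ["Time", "flies", "like", "an", "arrow"]
tag_dict = {"Time": ["N", "V"], "flies": ["N", "V"],
            "like": ["Prep", "V"], "an": ["Det"], "arrow": ["N"]}

def score(tags):
    """P(T) * P(W|T) under the bigram tag model and per-tag emissions."""
    p, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        p *= trans.get((prev, t), 0.0) * emit.get((w, t), 0.0)
        prev = t
    return p

best = max(product(*(tag_dict[w] for w in words)), key=score)
print(best)  # ('N', 'V', 'Prep', 'Det', 'N')
```

Brute force is exponential in sentence length; the Viterbi algorithm computes the same argmax in time linear in the sentence length (times the square of the tagset size).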

Estimate the Tag Priors and the Lexical Likelihoods from a Corpus
Maximum-Likelihood Estimation, for bigrams:
– P(t_i | t_{i-1}) = c(t_{i-1}, t_i) / c(t_{i-1})
– P(w_i | t_i) = c(w_i, t_i) / c(t_i)
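These MLE formulas are just ratios of counts; a minimal sketch over an invented two-sentence tagged corpus:

```python
# MLE of tag-bigram priors and lexical likelihoods from a tiny
# hand-tagged toy corpus (the sentences are invented for illustration).
from collections import Counter

corpus = [[("the", "Det"), ("dog", "N"), ("barks", "V")],
          [("the", "Det"), ("cat", "N"), ("sleeps", "V")]]

tag_count, bigram_count, word_tag_count = Counter(), Counter(), Counter()
for sent in corpus:
    prev = "<s>"
    tag_count[prev] += 1  # count <s> once per sentence so it can serve as a denominator
    for w, t in sent:
        bigram_count[(prev, t)] += 1
        word_tag_count[(w, t)] += 1
        tag_count[t] += 1
        prev = t

def p_trans(t, prev):  # P(t_i | t_{i-1}) = c(t_{i-1}, t_i) / c(t_{i-1})
    return bigram_count[(prev, t)] / tag_count[prev]

def p_emit(w, t):      # P(w_i | t_i) = c(w_i, t_i) / c(t_i)
    return word_tag_count[(w, t)] / tag_count[t]

print(p_trans("N", "Det"))   # 2/2 = 1.0
print(p_emit("dog", "N"))    # 1/2 = 0.5
```

On real corpora these raw ratios would be smoothed, since many legitimate bigrams and word/tag pairs never occur in training data.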

Brill Tagging: TBL
Start with simple (less accurate) rules... learn better ones from a tagged corpus
– Tag each word initially with its most likely POS
– Examine a set of transformations to see which most improves tagging decisions compared to the tagged corpus
– Re-tag the corpus
– Repeat until, e.g., performance doesn't improve
– Result: a tagging procedure which can be applied to new, untagged text

An Example
The horse raced past the barn fell.
Gold: The/DT horse/NN raced/VBN past/IN the/DT barn/NN fell/VBD ./.
1) Tag every word with its most likely tag and score:
The/DT horse/NN raced/VBD past/NN the/DT barn/NN fell/VBD ./.
2) For each template, try every instantiation (e.g. "Change VBD to VBN when the preceding word is tagged NN"), add the best rule to the ruleset, re-tag the corpus, and score

3) Stop when no transformation improves the score
4) Result: a set of transformation rules which can be applied to new, untagged data (after initializing with the most common tag)
... What problems will this process run into?
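One learning round of the loop above can be sketched with a single assumed template ("change tag A to B when the previous tag is C"); the one-sentence toy corpus is the horse-raced-past-the-barn example, and the template inventory is deliberately tiny.

```python
# A minimal sketch of one round of transformation-based learning,
# with one assumed template. Not Brill's actual template set.
gold    = [("The", "DT"), ("horse", "NN"), ("raced", "VBN"), ("past", "IN"),
           ("the", "DT"), ("barn", "NN"), ("fell", "VBD")]
current = ["DT", "NN", "VBD", "NN", "DT", "NN", "VBD"]  # most-likely tags

def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Change from_tag to to_tag wherever the previous tag is prev_tag."""
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == from_tag and out[i - 1] == prev_tag:
            out[i] = to_tag
    return out

def score(tags):
    """Number of tags agreeing with the gold standard."""
    return sum(t == g for t, (_, g) in zip(tags, gold))

# Try every instantiation of the template "change A to B after tag C"
tagset = {"DT", "NN", "VBD", "VBN", "IN"}
best_rule, best_gain = None, 0
for a in tagset:
    for b in tagset:
        for c in tagset:
            gain = score(apply_rule(current, a, b, c)) - score(current)
            if gain > best_gain:
                best_rule, best_gain = (a, b, c), gain
print(best_rule, best_gain)
```

Note a subtlety this toy run exposes: with only a previous-tag template, fixing raced/VBD would also mis-tag fell/VBD (both follow NN), for zero net gain, so the learner instead picks the rule correcting past. Brill's real templates also condition on surrounding words, which is what lets the raced/fell cases be distinguished.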

Methodology: Evaluation
For any NLP problem, we need to know how to evaluate our solutions. Possible gold standards (ceiling):
– Annotated naturally occurring corpus
– Human task performance (96-97%)
How well do humans agree? Kappa statistic: average pairwise agreement corrected for chance agreement
– Can be hard to obtain for some tasks
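The kappa statistic can be computed directly: observed pairwise agreement corrected for the agreement two annotators would reach by chance. The annotations below are invented toy data.

```python
# Cohen's kappa for two annotators' tag sequences: observed agreement
# corrected for chance agreement, estimated from each annotator's
# marginal tag distribution. Toy annotations for illustration.
from collections import Counter

a = ["N", "V", "N", "Det", "N", "V"]
b = ["N", "V", "V", "Det", "N", "N"]

n = len(a)
p_obs = sum(x == y for x, y in zip(a, b)) / n          # raw agreement: 4/6
ca, cb = Counter(a), Counter(b)
p_chance = sum(ca[t] * cb[t] for t in set(a) | set(b)) / (n * n)
kappa = (p_obs - p_chance) / (1 - p_chance)
print(round(kappa, 3))  # ~0.455
```

Kappa of 1 means perfect agreement and 0 means chance-level agreement, so it is a fairer ceiling estimate than raw agreement when the tag distribution is skewed.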

Baseline: how well does a simple method do?
– For tagging: assign each word its most common tag (91%)
– How much improvement do we get over the baseline?
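The most-common-tag baseline is a few lines of code; the train and test sets below are toy assumptions.

```python
# Most-frequent-tag baseline: learn each word's most common tag from a
# toy training set, then measure accuracy on a toy test set.
from collections import Counter, defaultdict

train = [("time", "N"), ("flies", "V"), ("time", "N"), ("flies", "N"),
         ("like", "Prep"), ("flies", "V"), ("an", "Det"), ("arrow", "N")]
test = [("time", "N"), ("flies", "N"), ("like", "Prep")]

counts = defaultdict(Counter)
for w, t in train:
    counts[w][t] += 1
most_common = {w: c.most_common(1)[0][0] for w, c in counts.items()}

correct = sum(most_common.get(w) == t for w, t in test)
print(correct / len(test))  # 2/3: the baseline misses the ambiguous "flies"
```

Any proposed tagger should be reported against this baseline, since 91% accuracy sounds impressive only until you know the trivial method achieves it.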

Methodology: Error Analysis
Confusion matrix:
– E.g. which tags did we most often confuse with which other tags?
– How much of the overall error does each confusion account for?
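A tag confusion matrix is just a count over (gold, predicted) pairs; a minimal sketch on invented sequences:

```python
# Build a tag confusion matrix: count (gold, predicted) disagreements
# and report the most frequent confusions. Sequences are toy data.
from collections import Counter

gold = ["NN", "VBD", "NN", "IN", "NN", "VBN", "VBD"]
pred = ["NN", "VBD", "NN", "NN", "NN", "VBD", "VBD"]

confusion = Counter((g, p) for g, p in zip(gold, pred) if g != p)
for (g, p), c in confusion.most_common():
    print(f"gold {g} tagged as {p}: {c}")
```

Sorting the confusions by count tells you which distinction (here IN/NN and VBN/VBD) to target first, either with better features or with a targeted transformation rule.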

More Complex Issues
Tag indeterminacy: when 'truth' isn't clear
– Caribbean cooking, child seat
Tagging multipart words
– wouldn't --> would/MD n't/RB
Unknown words
– Assume all tags equally likely
– Assume the same tag distribution as all other singletons in the corpus
– Use morphology, word length, ...
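The morphology idea for unknown words can be sketched with a suffix lookup; the suffix-to-tag list below is a rough illustrative assumption, not a trained model.

```python
# A simple morphology-based guess at an unknown word's tag, using
# common English suffixes. The suffix list is a rough illustration.
SUFFIX_TAGS = [("ing", "VBG"), ("ed", "VBD"), ("ly", "RB"),
               ("tion", "NN"), ("s", "NNS")]

def guess_tag(word, default="NN"):
    """Return a tag guess for an out-of-vocabulary word."""
    if word[0].isupper():
        return "NNP"  # capitalized unknowns are often proper nouns
    for suffix, tag in SUFFIX_TAGS:
        if word.endswith(suffix):
            return tag
    return default

print(guess_tag("blorking"))   # VBG
print(guess_tag("Zanzibar"))   # NNP
print(guess_tag("flurp"))      # NN
```

Statistical taggers fold the same signal in probabilistically, estimating P(tag | suffix, capitalization, ...) from the rare words in the training corpus rather than from a hand-written list.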

Summary
We can develop statistical methods for identifying the POS of word sequences which come close to human performance
– But the problem is not completely "solved"
Next Class: Guest Lecture by Owen Rambow
– Read Chapter 9
– Homework 1 due