Lecture 9: Part of Speech

Slides:



Advertisements
Similar presentations
School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING PoS-Tagging theory and terminology COMP3310 Natural Language Processing.
Advertisements

 adj (adjectif)  adv (adverbe)  det (déterminant)  nom  prep (préposition)  pron (pronom)  verbe.
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 2 (06/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Part of Speech (PoS)
Outline Why part of speech tagging? Word classes
In your notebooks make a list of the following for your MadLib. The number next to the part of speech indicates how many of each word you need. Plural.
Word Classes and Part-of-Speech (POS) Tagging
Ch 9 Part of Speech Tagging (slides adapted from Dan Jurafsky, Jim Martin, Dekang Lin, Rada Mihalcea, and Bonnie Dorr and Mitch Marcus.) ‏
1 Part of Speech tagging Lecture 9 Slides adapted from: Dan Jurafsky, Julia Hirschberg, Jim Martin.
Chapter 8. Word Classes and Part-of-Speech Tagging From: Chapter 8 of An Introduction to Natural Language Processing, Computational Linguistics, and Speech.
BİL711 Natural Language Processing
Part-of-speech tagging. Parts of Speech Perhaps starting with Aristotle in the West (384–322 BCE) the idea of having parts of speech lexical categories,
Part of Speech Tagging Importance Resolving ambiguities by assigning lower probabilities to words that don’t fit Applying to language grammatical rules.
Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance July 27 EMNLP 2011 Shay B. Cohen Dipanjan Das Noah A. Smith Carnegie Mellon University.
LING 388 Language and Computers Lecture 22 11/25/03 Sandiway FONG.
1 Words and the Lexicon September 10th 2009 Lecture #3.
1 CSC 594 Topics in AI – Applied Natural Language Processing Fall 2009/ Part-Of-Speech (POS) Tagging.
Learning Bit by Bit Hidden Markov Models. Weighted FSA weather The is outside
POS based on Jurafsky and Martin Ch. 8 Miriam Butt October 2003.
NLP and Speech 2004 English Grammar
1 CSC 594 Topics in AI – Applied Natural Language Processing Fall 2009/ Outline of English Syntax.
Word classes and part of speech tagging Chapter 5.
PARTS OF SPEECH 1 The principles of the traditional classification of the English vocabulary 2 Notional and functional parts of speech. 3 The field structure.
CS224N Interactive Session Competitive Grammar Writing Chris Manning Sida, Rush, Ankur, Frank, Kai Sheng.
8. Word Classes and Part-of-Speech Tagging 2007 년 5 월 26 일 인공지능 연구실 이경택 Text: Speech and Language Processing Page.287 ~ 303.
1 POS Tagging: Introduction Heng Ji Feb 2, 2008 Acknowledgement: some slides from Ralph Grishman, Nicolas Nicolov, J&M.
Parts of Speech Sudeshna Sarkar 7 Aug 2008.
Czech-English Word Alignment Ondřej Bojar Magdalena Prokopová
10/30/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 7 Giuseppe Carenini.
Word classes and part of speech tagging Chapter 5.
Linguistics The eleventh week. Chapter 4 Syntax  4.1 Introduction  4.2 Word Classes.
Speech and Language Processing Ch8. WORD CLASSES AND PART-OF- SPEECH TAGGING.
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging I Introduction Tagsets Approaches.
Word classes and part of speech tagging 09/28/2004 Reading: Chap 8, Jurafsky & Martin Instructor: Rada Mihalcea Note: Some of the material in this slide.
Part-of-speech tagging
CPSC 422, Lecture 27Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27 Nov, 16, 2015.
LING/C SC/PSYC 438/538 Lecture 18 Sandiway Fong. Adminstrivia Homework 7 out today – due Saturday by midnight.
Word classes and part of speech tagging. Slide 1 Outline Why part of speech tagging? Word classes Tag sets and problem definition Automatic approaches.
Word classes and part of speech tagging Chapter 5.
Parts of Speech By: Miaya Nischelle Sample. NOUN A noun is a person place or thing.
SI485i : NLP Set 7 Syntax and Parsing. 2 Syntax Grammar, or syntax: The kind of implicit knowledge of your native language that you had mastered by the.
POS TAGGING AND HMM Tim Teks Mining Adapted from Heng Ji.
Speech and Language Processing SLP Chapter 5. 10/31/1 2 Speech and Language Processing - Jurafsky and Martin 2 Today  Parts of speech (POS)  Tagsets.
CSC 594 Topics in AI – Natural Language Processing
CSC 594 Topics in AI – Natural Language Processing
Nouns and Personal Pronouns
English Basics Mrs.Azzah.
Annotating Urdu Corpus
Parts of Speech Review.
wow the hungry cat chase the mouse under the table and quickly ate it
CSCI 5832 Natural Language Processing
CS4705 Part of Speech tagging
CSC 594 Topics in AI – Natural Language Processing
LING/C SC/PSYC 438/538 Lecture 20 Sandiway Fong.
CSC 594 Topics in AI – Natural Language Processing
CSC 594 Topics in AI – Natural Language Processing
KEY CHALLENGES Overview POS Tagging
CSCI 5832 Natural Language Processing
Improving an Open Source Question Answering System
Core Concepts Lecture 1 Lexical Frequency.
Parts of the speech and abbreviations
FIRST SEMESTER GRAMMAR
Natural Language Processing
PART OF SPEECH TAGGING (POS)
LING/C SC/PSYC 438/538 Lecture 23 Sandiway Fong.
Universal Dependencies
Statistical n-gram David ling.
Natural Language Processing
CPSC 503 Computational Linguistics
Dr. Bill Vicars Lifeprint.com
Natural Language Processing (NLP)
Presentation transcript:

Lecture 9: Part of Speech Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Couse webpage: http://kwchang.net/teaching/NLP16 CS6501 Natural Language Processing

CS6501 Natural Language Processing This lecture Parts of speech (POS) POS Tagsets CS6501 Natural Language Processing

CS6501 Natural Language Processing Parts of Speech Traditional parts of speech ~ 8 of them CS6501 Natural Language Processing

CS6501 Natural Language Processing POS examples N noun chair, bandwidth, pacing V verb study, debate, munch ADJ adjective purple, tall, ridiculous ADV adverb unfortunately, slowly P preposition of, by, to PRO pronoun I, me, mine DET determiner the, a, that, those CS6501 Natural Language Processing

CS6501 Natural Language Processing Parts of Speech A.k.a. parts-of-speech, lexical categories, word classes, morphological classes, lexical tags... Lots of debate within linguistics about the number, nature, and universality of these CS6501 Natural Language Processing

CS6501 Natural Language Processing POS Tagging The process of assigning a part-of-speech to each word in a collection (sentence). WORD tag the DET koala N put V the DET keys N on P table N CS6501 Natural Language Processing

Why is POS Tagging Useful? First step of a vast number of practical tasks Parsing Need to know if a word is an N or V before you can parse Information extraction Finding names, relations, etc. Speech synthesis/recognition OBject obJECT OVERflow overFLOW DIScount disCOUNT CONtent conTENT Machine Translation CS6501 Natural Language Processing

Open and Closed Classes Closed class: a small fixed membership Prepositions: of, in, by, … Pronouns: I, you, she, mine, his, them, … Usually function words (short common words which play a role in grammar) Open class: new ones can be created English has 4: Nouns, Verbs, Adjectives, Adverbs Many languages have these 4, but not all! CS6501 Natural Language Processing

CS6501 Natural Language Processing Open Class Words Nouns Proper nouns (Boulder, Granby, Eli Manning) Common nouns (the rest). Count nouns and mass nouns Count: have plurals, get counted: goat/goats, one goat, two goats Mass: don’t get counted (snow, salt, communism) (*two snows) Verbs In English, have morphological affixes (eat/eats/eaten) CS6501 Natural Language Processing

CS6501 Natural Language Processing Closed Class Words Examples: prepositions: on, under, over, … particles: up, down, on, off, … determiners: a, an, the, … pronouns: she, who, I, .. conjunctions: and, but, or, … auxiliary verbs: can, may should, … numerals: one, two, three, third, … CS6501 Natural Language Processing

Prepositions from CELEX CELEX: online dictionary Frequency counts are from COBUILD 16-billion-word corpus CS6501 Natural Language Processing

CS6501 Natural Language Processing English Particles CS6501 Natural Language Processing

CS6501 Natural Language Processing Conjunctions CS6501 Natural Language Processing

CS6501 Natural Language Processing Choosing a Tagset Could pick very coarse tagsets N, V, Adj, Adv, Other More commonly used set is finer grained E.g., “Penn TreeBank tagset”, 45 tags: PRP$, WRB, WP$, VBG Brown cropus, 87 tags. Prague Dependency Treebank (Czech) 4452 tags AAFP3----3N----: (nejnezajímavějším) Adj Regular Feminine Plural….Superlative [Hajic 2006, VMC tutorial] CS6501 Natural Language Processing

Penn TreeBank POS Tagset CS6501 Natural Language Processing

CS6501 Natural Language Processing Using the Penn Tagset The/DT grand/JJ jury/NN commmented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./. CS6501 Natural Language Processing

CS6501 Natural Language Processing Universal Tag set ~ 12 different tags NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, “.”, X CS6501 Natural Language Processing

POS Tagging v.s. Word clustering Words often have more than one POS: back The back door = JJ On my back = NN Win the voters back = RB Promised to back the bill = VB These examples from Dekang Lin CS6501 Natural Language Processing

CS6501 Natural Language Processing How Hard is POS Tagging? CS6501 Natural Language Processing

CS6501 Natural Language Processing POS tag sequences Some tag sequences more likely occur than others POS Ngram view https://books.google.com/ngrams/graph?co ntent=_ADJ_+_NOUN_%2C_ADV_+_NOU N_%2C+_ADV_+_VERB_ Existing methods often model POS tagging as a sequence tagging problem CS6501 Natural Language Processing

CS6501 Natural Language Processing Evaluation How many words in the unseen test data can be tagged correctly? Usually evaluated on Penn Treebank State of the art ~97% Trivial baseline (most likely tag) ~94% Human performance ~97% CS6501 Natural Language Processing