Speech and Language Processing Ch8. WORD CLASSES AND PART-OF-SPEECH TAGGING


Artificial Intelligence Laboratory

Agenda
• What are they?
• Distribution
• Tagsets
• Tagging
  • Rules
  • Probabilities
  • Transformation-Based (Brill)

Parts of Speech
• Start with eight basic categories
  • Noun, verb, pronoun, preposition, adjective, adverb, article, conjunction
• These categories are based on morphological and distributional properties (not semantics)
• Some cases are easy, others are murky

Parts of Speech
• Two kinds of category
  • Closed class: prepositions, articles, conjunctions, pronouns
  • Open class: nouns, verbs, adjectives, adverbs

Fig 8.1 Prepositions (and particles) of English from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.

Fig 8.2 English single-word particles from Quirk et al. (1985).

Fig 8.3 Coordinating and subordinating conjunctions of English from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.

Fig 8.4 Pronouns of English from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.

Fig 8.5 English modal verbs from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.

Sets of Parts of Speech: Tagsets
• There are various standard tagsets to choose from; some have a lot more tags than others
• The choice of tagset is based on the application
• Accurate tagging can be done with even large tagsets

Fig 8.6 Penn Treebank part-of-speech tags (including punctuation).

Tagging
• Part-of-speech tagging is the process of assigning a part of speech to each word in a sentence …
• Assume we have
  • A tagset
  • A dictionary that gives you the possible set of tags for each entry
  • A text to be tagged
  • A reason?
• Example: The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
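The word/TAG notation in the example sentence can be read back into (word, tag) pairs with a few lines of Python. This is only a minimal sketch: the function name `parse_tagged` is made up for illustration, and it assumes tokens are separated by whitespace and that the tag follows the last slash in each token.

```python
def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the LAST slash so a word that itself contains '/' stays intact.
        word, tag = token.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


# Example sentence from the slide, with the final punctuation token separated.
sentence = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT "
            "number/NN of/IN other/JJ topics/NNS ./.")

tagged = parse_tagged(sentence)
words = [word for word, _ in tagged]
tags = [tag for _, tag in tagged]
```

Here `words` is the text to be tagged and `tags` is the output a tagger is expected to produce for it.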

Figure 8.7 The number of word types in the Brown corpus by degree of ambiguity (after DeRose (1988)).
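The kind of count reported in Figure 8.7 can be sketched as follows: collect the set of tags seen for each word type, then count how many word types have 1 tag, 2 tags, and so on. The toy corpus below is invented for illustration and is not the Brown corpus; lowercasing word forms is also an assumption of this sketch.

```python
from collections import Counter, defaultdict


def ambiguity_histogram(tagged_tokens):
    """Map 'number of distinct tags' -> 'number of word types with that many tags'."""
    tags_per_word = defaultdict(set)
    for word, tag in tagged_tokens:
        # Word types are compared case-insensitively (an assumption of this sketch).
        tags_per_word[word.lower()].add(tag)
    return Counter(len(tags) for tags in tags_per_word.values())


# Invented toy corpus; a real count would run over a full tagged corpus.
toy_corpus = [
    ("the", "DT"), ("race", "NN"),
    ("to", "TO"), ("race", "VB"),
    ("the", "DT"), ("jury", "NN"),
]

hist = ambiguity_histogram(toy_corpus)
```

In this toy corpus three word types are unambiguous and one ("race") has two tags.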

Tagging - Rules
• Hand-crafted rules for ambiguous words that test the context to make appropriate choices
• Early attempts fairly error-prone
• Extremely labor-intensive
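As a rough illustration of a hand-crafted rule that tests the context, the sketch below looks up each word's candidate tags in a dictionary and then applies one constraint that removes a reading. Everything here is hypothetical: the mini-lexicon, the single rule, and the function names are invented for this example and are not taken from any actual rule-based tagger.

```python
# Hypothetical mini-lexicon: each word maps to its set of possible tags.
LEXICON = {
    "the": {"DT"},
    "to": {"TO"},
    "race": {"NN", "VB"},
}


def no_verb_after_determiner(prev_candidates, candidates):
    """Toy constraint: drop the VB reading right after an unambiguous determiner."""
    if prev_candidates == {"DT"} and len(candidates) > 1:
        return candidates - {"VB"}
    return candidates


def tag_with_rules(words):
    """Return (word, remaining candidate tags) for each word, left to right."""
    result = []
    prev = set()
    for word in words:
        # Unknown words default to NN (an assumption of this sketch).
        candidates = set(LEXICON.get(word.lower(), {"NN"}))
        candidates = no_verb_after_determiner(prev, candidates)
        result.append((word, candidates))
        prev = candidates
    return result


after_det = tag_with_rules(["the", "race"])
after_to = tag_with_rules(["to", "race"])
```

With only this one rule, "race" is disambiguated after "the" but stays ambiguous after "to", which hints at why a usable rule set takes so much manual effort to write.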

Figure 8.8 Sample lexical entries from the ENGTWOL lexicon described in Voutilainen (1995) and Heikkila (1995).

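Tagging - Probabilities
• Advantages
  • Given only a sufficiently large tagged corpus, the statistics needed for tagging are easy to extract, so the approach scales well, applies broadly, and achieves relatively high overall accuracy
• Disadvantages
  • Dependent on the corpus
  • Extracting meaningful statistics requires a tagged corpus above a certain size
  • Building the corpus takes a great deal of time and effort
  • When the corpus is skewed or too small, reliability drops because of data sparseness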

Tagging - Probabilities
• We want the best set of tags for a sequence of words (a sentence)
  • W is a sequence of words
  • T is a sequence of tags
• The probability of the word sequence P(W) will be the same for each tag sequence
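The slide's formula did not survive in this transcript. The remark about P(W) suggests the usual Bayes-rule argument, which can be written out as below. This is a reconstruction under that assumption, and the symbol \hat{T} for the best tag sequence is introduced here for illustration.

```latex
\begin{align*}
\hat{T} &= \operatorname*{arg\,max}_{T} P(T \mid W) \\
        &= \operatorname*{arg\,max}_{T} \frac{P(W \mid T)\,P(T)}{P(W)} \\
        &= \operatorname*{arg\,max}_{T} P(W \mid T)\,P(T)
\end{align*}
```

The last step drops P(W) because it is the same for every candidate tag sequence T and so cannot change which T attains the maximum.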

Tagging - Transformation-Based (Brill tagging)
• Combine rules and statistics …
• TBL (Transformation-Based Learning) is based on rules
• Rules are automatically induced from the data (ML)

Brill tagging - Examples
• Race
  • "race" as NN: .98
  • "race" as VB: .02
• So you'll be wrong 2% of the time, which really isn't bad
• Patch the cases where you know it has to be a verb
  • Change NN to VB when previous tag is TO
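The two steps in this example (tag every word with its most likely tag, then patch with a transformation) can be sketched in Python. The probabilities for "race" come from the slide. The entries for "to" and "the" and the function names are assumptions added for illustration.

```python
# Tag probabilities per word. Only the "race" figures are from the slide.
TAG_PROBS = {
    "race": {"NN": 0.98, "VB": 0.02},
    "to": {"TO": 1.0},    # assumed for illustration
    "the": {"DT": 1.0},   # assumed for illustration
}


def baseline_tags(words):
    """Assign each word its single most probable tag, ignoring context."""
    return [max(TAG_PROBS[w], key=TAG_PROBS[w].get) for w in words]


def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Change from_tag to to_tag wherever the previous tag is prev_tag.

    The context is tested against the input tags, so a change made at one
    position does not affect the test at the next position.
    """
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


initial = baseline_tags(["to", "race"])           # "race" starts out as NN
patched = apply_rule(initial, "NN", "VB", "TO")   # the slide's transformation
untouched = apply_rule(baseline_tags(["the", "race"]), "NN", "VB", "TO")
```

The rule fires only after TO, so "the race" keeps its noun reading while "to race" becomes a verb.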

Brill tagging - Rules
• Where did that transformational rule come from?
• Define a hypothesis space of rules that might help decrease an error rate
• Search that space (exhaustively?) to find rules that most reduce an error rate
• Continue to add rules until some stopping criterion is reached

Figure 8.9 Brill's (1995) templates. Each begins with "Change tag a to tag b when: …". The variables a, b, z and w range over parts-of-speech.
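The search described above can be sketched as a greedy loop. This is a simplified illustration and not Brill's implementation. It assumes a single template ("change tag a to tag b when the previous tag is z"), an exhaustive search over all (a, b, z) triples, and a stopping criterion of "no rule reduces the error count, or a maximum number of rules is reached". The training data is invented.

```python
from itertools import product


def apply_rule(tags, rule):
    """Apply (a, b, z): change a to b wherever the previous tag is z."""
    a, b, z = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == a and tags[i - 1] == z:
            out[i] = b
    return out


def count_errors(tags, gold):
    """Number of positions where the current tags disagree with the gold tags."""
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(initial, gold, tagset, max_rules=10):
    """Greedily pick the rule with the largest error reduction, then repeat."""
    current = list(initial)
    learned = []
    for _ in range(max_rules):
        base = count_errors(current, gold)
        best_rule, best_gain = None, 0
        # Hypothesis space: every instantiation of the single template.
        for a, b, z in product(tagset, repeat=3):
            if a == b:
                continue
            gain = base - count_errors(apply_rule(current, (a, b, z)), gold)
            if gain > best_gain:
                best_rule, best_gain = (a, b, z), gain
        if best_rule is None:   # stopping criterion: no rule helps any more
            break
        current = apply_rule(current, best_rule)
        learned.append(best_rule)
    return learned, current


# Invented toy data for "to race the race": baseline tags vs. gold tags.
initial = ["TO", "NN", "DT", "NN"]
gold = ["TO", "VB", "DT", "NN"]
rules, final = learn_rules(initial, gold, ["DT", "NN", "TO", "VB"])
```

On this toy data the learner recovers the rule from the previous slide, (NN, VB, TO), and then stops because no remaining rule reduces the error count.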