Word Classes and POS Tagging Read J & M Chapter 8. You may also want to look at: view.html.

Why Do We Care about Parts of Speech?
- Pronunciation: Hand me the lead pipe.
- Predicting what words can be expected next: Personal pronoun (e.g., I, she) ____________
- Stemming: -s means singular for verbs, plural for nouns.
- As the basis for syntactic parsing and then meaning extraction: I will lead the group into the lead smelter.
- Machine translation: (E) content +N → (F) contenu +N, but (E) content +Adj → (F) content +Adj or satisfait +Adj.

Remember the Mapping Problem
We’ve sort of ignored this issue as we’ve looked at:
- dealing with a noisy channel,
- probabilistic techniques we can use for various subproblems,
- corpora we can analyze to collect our facts.
We need to return to it now. POS tagging is the first step.

Understanding – the Big Picture
Morphology → POS Tagging → Syntax → Semantics → Discourse Integration
Generation goes backwards. For this reason, we generally want declarative representations of the facts. POS tagging is an exception to this.

Two Kinds of Issues
- Linguistic: what are the facts about language?
- Algorithmic: what are effective computational procedures for dealing with those facts?

What is a Part of Speech?
Is this a semantic distinction? For example, maybe Noun is the class of words for people, places and things, and Adjective is the class of words for properties of nouns. Consider:
- green book: book is a Noun, green is an Adjective.
Now consider:
- book worm
- This green is very soothing.

Morphological and Syntactic Definition of POS
- An Adjective is a word that can fill the blank in: It’s so __________.
- A Noun is a word that can be marked as plural.
- A Noun is a word that can fill the blank in: the __________ is
What is green?
- It’s so green.
- Both greens could work for the walls.
- The green is a little much given the red rug.

How Many Parts of Speech Are There?
A first cut at the easy distinctions:
- Open classes: nouns, verbs, adjectives, adverbs
- Closed classes (function words):
  - conjunctions: and, or, but
  - pronouns: I, she, him
  - prepositions: with, on
  - determiners: the, a, an

But It Gets Harder
- provided, as in “I’ll go provided John does.”
- there, as in “There aren’t any cookies.”
- might, as in “I might go.” or “I might could go.”
- no, as in “No, I won’t go.”

What’s a Preposition?
From the CELEX online dictionary. Frequencies are from the COBUILD 16-million-word corpus. (The table of prepositions is not reproduced here.)

What’s a Pronoun?
CELEX dictionary list of pronouns. (The list is not reproduced here.)

Tagsets
- Brown corpus tagset (87 tags)
- Penn Treebank tagset (45 tags) (8.6)
- C7 tagset (146 tags)

Algorithms for POS Tagging
Why can’t we just look each word up in a dictionary?
- Ambiguity: in the Brown corpus, 11.5% of the word types are ambiguous (using 87 tags).
- Worse, 40% of the tokens are ambiguous.
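The type/token ambiguity distinction above can be made concrete with a short sketch that counts, over a toy hand-tagged corpus, how many word types and tokens carry more than one tag. The corpus and its Penn-style tags are invented for illustration:

```python
from collections import defaultdict

# A toy hand-tagged corpus (invented for illustration; Penn-style tags).
tagged = [("the", "DT"), ("back", "NN"), ("door", "NN"),
          ("on", "IN"), ("my", "PRP$"), ("back", "NN"),
          ("promised", "VBD"), ("to", "TO"), ("back", "VB"),
          ("the", "DT"), ("bill", "NN")]

# Collect the set of tags observed for each word type.
tags_for = defaultdict(set)
for word, tag in tagged:
    tags_for[word].add(tag)

# A type is ambiguous if it has been seen with more than one tag.
ambiguous_types = {w for w, ts in tags_for.items() if len(ts) > 1}

# Token ambiguity is typically much higher than type ambiguity,
# because the ambiguous words tend to be frequent ones.
ambiguous_tokens = sum(1 for w, _ in tagged if w in ambiguous_types)
```

Here only one of eight types (back) is ambiguous, but it accounts for three of the eleven tokens, mirroring the type-vs-token gap in the Brown figures.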

Algorithms for POS Tagging
Why can’t we just look them up in a dictionary? Words that aren’t in the dictionary.
- One idea: P(t_i | w_i) = the probability that a random hapax legomenon in the corpus has tag t_i. Nouns are more likely than verbs, which are more likely than pronouns.
- Another idea: use morphology.

Algorithms for POS Tagging – Knowledge
- Dictionary
- Morphological rules, e.g., _____-tion, _____-ly, capitalization
- N-gram frequencies: to _____, DET _____ N. But what about rare words, e.g., smelt (two verb forms, melt and past tense of smell, and one noun form, a small fish)?
- Combining these: V _____-ing. I was gracking vs. Gracking is fun.
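The morphological and contextual cues above can be combined into a simple guesser for words that aren’t in the dictionary. This is a sketch only: the suffix list, the tag names, and the auxiliary-tag set are illustrative assumptions, not a real tagger’s rule inventory:

```python
def guess_tag(word, prev_tag=None):
    """Guess a tag for an out-of-dictionary word from simple
    morphological and contextual cues (an illustrative sketch)."""
    if word[0].isupper():
        return "NNP"                      # capitalization: likely a proper noun
    if word.endswith("tion") or word.endswith("ment"):
        return "NN"                       # noun-forming derivational suffixes
    if word.endswith("ly"):
        return "RB"                       # -ly words are usually adverbs
    if word.endswith("ing") and prev_tag in ("VBD", "VBZ", "VBP", "MD"):
        return "VBG"                      # "I was gracking": verbal after an auxiliary
    return "NN"                           # nouns are the safest default for rare words
```

Note how the same unknown word gets different tags by context: gracking after was is guessed verbal, while with no verbal context it falls back to the noun default.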

Algorithms for POS Tagging – Approaches
Basic approaches:
- Rule-based
- Stochastic
Do we return one best answer, or several answers and let later steps decide? How does the requisite knowledge get entered?

Training/Teaching an NLP Component
Each step of NLP analysis requires a module that knows what to do. How do such modules get created?
- By hand
- By training
Advantages of hand creation: based on sound linguistic principles, sensible to people, explainable.
Advantages of training from a corpus: less work, extensible to new languages, customizable for specific domains.

Training/Teaching a POS Tagger
The problem is tractable. We can do a very good job with just:
- a dictionary
- a tagset
- a large corpus, usually tagged by hand
There are only somewhere between 50 and 150 possibilities for each word, and 3 or 4 words of context is almost always enough. The task:
____ _ __ ______ __ _ _____
What is the weather like in Austin?

Contrast with Training Other NLP Parts
The task:
____ _ __ ______ __ _ _____
What is the weather like in Austin? The weather in Austin is like what?
Mapping to a database schema such as:
- Months(month, days)
- RainfallByStation(year, month, station, rainfall)
- Stations(station, city)

Rule-Based POS Tagging
Step 1: Using a dictionary, assign to each word a list of possible tags.
Step 2: Figure out what to do about words that are unknown or ambiguous. Two approaches:
- Rules that specify what to do.
- Rules that specify what not to do. Example: the adverbial “that” rule (from ENGTWOL):
  Given input: “that”
  If (+1 A/ADV/QUANT) (+2 SENT-LIM) (NOT -1 SVOC/A)
  then eliminate non-ADV tags
  else eliminate ADV
  It isn’t that odd vs. I consider that odd vs. I believe that he is right.
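The adverbial-“that” rule can be sketched as a constraint that prunes candidate tag sets produced by dictionary lookup. The tag names (ADV, DET, COMP, …) and the SVOC/A verb list below are simplified stand-ins for the real ENGTWOL machinery:

```python
# Verbs taking an object plus adjective complement ("I consider that odd").
# This list, like the tag names below, is an illustrative simplification.
SVOC_A_VERBS = {"consider", "deem", "find"}

def adverbial_that_rule(candidates, i, words):
    """Prune the candidate tag set for 'that' at position i.
    candidates: one set of possible tags per token, from dictionary lookup."""
    if words[i].lower() != "that":
        return
    nxt = candidates[i + 1] if i + 1 < len(candidates) else set()
    sent_end = i + 2 >= len(words) or words[i + 2] in {".", "!", "?"}
    prev_svoca = i > 0 and words[i - 1].lower() in SVOC_A_VERBS
    if nxt & {"ADJ", "ADV", "QUANT"} and sent_end and not prev_svoca:
        candidates[i] &= {"ADV"}          # keep only the adverbial reading
    else:
        candidates[i] -= {"ADV"}          # eliminate the adverbial reading

# "It isn't that odd." -- here 'that' is adverbial.
words = ["it", "isn't", "that", "odd", "."]
cands = [{"PRON"}, {"VERB"}, {"ADV", "DET", "COMP"}, {"ADJ"}, {"PUNC"}]
adverbial_that_rule(cands, 2, words)
```

For I consider that odd, the (NOT -1 SVOC/A) condition fails, so the adverbial reading is eliminated instead of kept.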

Stochastic POS Tagging
- First approximation: choose the tag that is most likely for the given word.
- Next try: consider N-gram frequencies and choose the tag that is most likely in the current context. Should the context be the last N words or the last N classes?
- Next try: combine the two.
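“Combining the two” means scoring a candidate tag sequence by both the word likelihood P(word | tag) and the tag-sequence probability P(tag | previous tag), then taking the best-scoring path. A minimal bigram Viterbi sketch, with probabilities invented purely for illustration:

```python
# Toy probabilities, invented for illustration only.
emit = {("race", "NN"): 0.01, ("race", "VB"): 0.001,
        ("to", "TO"): 1.0, ("the", "DT"): 0.5}          # P(word | tag)
trans = {("<s>", "TO"): 0.1, ("<s>", "DT"): 0.2,
         ("TO", "VB"): 0.8, ("TO", "NN"): 0.002,
         ("DT", "NN"): 0.5, ("DT", "VB"): 0.001}        # P(tag | previous tag)

def viterbi(words, tags):
    # best maps each tag to (probability of the best path ending in it, the path).
    best = {"<s>": (1.0, [])}
    for w in words:
        new = {}
        for t in tags:
            p_emit = emit.get((w, t), 0.0)
            if p_emit == 0.0:
                continue                                 # tag t cannot emit w
            prob, path = max(
                ((p * trans.get((pt, t), 0.0) * p_emit, pth + [t])
                 for pt, (p, pth) in best.items()),
                key=lambda x: x[0])
            if prob > 0.0:
                new[t] = (prob, path)
        best = new
    return max(best.values(), key=lambda x: x[0])[1]
```

With these toy numbers the tagger picks VB for race after to (the transition probability outweighs the lexical preference for NN) and NN for race after the.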

Hybrids – the Brill Tagger
Learning rules stochastically: Transformation-Based Learning.
Step 1: Assign each word the tag that is most likely given no contextual information.
  Race example: P(NN|race) = .98, P(VB|race) = .02
Step 2: Apply transformation rules that use the context that was just established.
  Race example: Change NN to VB when the previous tag is TO.
  Secretariat is expected to race tomorrow. The race is already over.
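The two Brill steps can be sketched directly. The single rule is the race example from the slide; the unigram dictionary is toy data and the rule format (from_tag, to_tag, prev_tag) is a deliberate simplification of Brill’s templates:

```python
def most_likely_tags(words, unigram):
    # Step 1: tag each word with its most frequent tag, ignoring context.
    return [unigram[w] for w in words]

def apply_transformations(tags, rules):
    # Step 2: apply each rule in order. A rule (from_tag, to_tag, prev_tag)
    # changes from_tag to to_tag when the previous word carries prev_tag.
    for from_t, to_t, prev_t in rules:
        for i in range(1, len(tags)):
            if tags[i] == from_t and tags[i - 1] == prev_t:
                tags[i] = to_t
    return tags

# Most-likely tags reflect P(NN|race) = .98 > P(VB|race) = .02.
unigram = {"secretariat": "NNP", "is": "VBZ", "expected": "VBN",
           "to": "TO", "race": "NN", "tomorrow": "NN", "the": "DT",
           "already": "RB", "over": "RB"}
rules = [("NN", "VB", "TO")]   # change NN to VB when the previous tag is TO

words = "secretariat is expected to race tomorrow".split()
tags = apply_transformations(most_likely_tags(words, unigram), rules)
```

The rule fires only in the first sentence: race after to becomes VB, while race after the keeps its most-likely tag NN.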

Learning Brill Tagger Transformations
Three major stages:
1. Label every word with its most-likely tag.
2. Examine every possible transformation and select the one that most improves the tagging.
3. Retag the data according to this rule.
These three stages are repeated until some stopping point is reached. The output of TBL is an ordered list of transformations, which constitutes a tagging procedure that can be applied to a new corpus.
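The three-stage loop can be sketched as a greedy search. For brevity this sketch uses a single rule template (condition on the previous tag only), where a real TBL learner enumerates many templates:

```python
def errors(pred, gold):
    # Number of tagging mistakes against the hand-tagged (gold) corpus.
    return sum(p != g for p, g in zip(pred, gold))

def apply_rule(tags, rule):
    from_t, to_t, prev_t = rule
    out = list(tags)
    for i in range(1, len(out)):
        if tags[i] == from_t and tags[i - 1] == prev_t:
            out[i] = to_t
    return out

def learn_tbl(pred, gold, max_rules=10):
    # Greedily pick the rule (from_tag, to_tag, prev_tag) that most reduces
    # errors, retag the data, and repeat until no rule improves the tagging.
    learned = []
    tagset = sorted(set(gold) | set(pred))
    for _ in range(max_rules):
        best_rule, best_err = None, errors(pred, gold)
        for from_t in tagset:
            for to_t in tagset:
                if to_t == from_t:
                    continue
                for prev_t in tagset:
                    e = errors(apply_rule(pred, (from_t, to_t, prev_t)), gold)
                    if e < best_err:
                        best_rule, best_err = (from_t, to_t, prev_t), e
        if best_rule is None:          # stopping point: nothing helps
            break
        learned.append(best_rule)
        pred = apply_rule(pred, best_rule)
    return learned, pred
```

Run on the most-likely tagging of the Secretariat sentence against its gold tags, the learner recovers exactly the slide’s rule: change NN to VB after TO.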

The Universe of Possible Transformations?

One or Many Answers
Example: I’m going to water ski. I’m going to water the lawn.
The architecture issue:
- If we just give one answer, we can follow a single path.
- If we don’t decide yet, we’ll need to manage search.

Search
Managing search:
- Depth-first
- Breadth-first – chart parsing
(The slide shows alternative parse trees for the ambiguous sentence “I hit the boy with a bat.”)

Evaluation Given an algorithm, how good is it? What is causing the errors? Can anything be done about them?

How Good Is an Algorithm?
- How good is the algorithm?
- What’s the maximum performance we have any reason to believe is achievable? (How well can people do?)
- How good is good enough? Is 97% good enough?
  - Example 1: A speech dialogue system correctly assigns a meaning to a user’s input 97% of the time.
  - Example 2: An OCR system correctly determines letters 97% of the time.
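To see why 97% per word may not be good enough, consider whole-sentence accuracy. The 20-word average sentence length and the independence of errors are both simplifying assumptions for this back-of-the-envelope estimate (real tagging errors are correlated):

```python
# Per-word accuracy of 97% with 20-word sentences and independent
# errors (both assumptions) gives the chance a sentence is perfect.
per_word = 0.97
sentence_ok = per_word ** 20
# sentence_ok is about 0.54: nearly half of all sentences contain
# at least one tagging error, which matters for downstream parsing.
```

The same arithmetic explains why a 97%-accurate OCR system or dialogue system can still feel unreliable: errors accumulate over the units users actually care about.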