8. Word Classes and Part-of-Speech Tagging. May 26, 2007. Artificial Intelligence Lab, 이경택. Text: Speech and Language Processing, pp. 287-303.

Origin of POS
  - Techne: a grammatical sketch of Greek written by Dionysius Thrax of Alexandria (c. 100 B.C.), or possibly by someone else.
  - Eight parts of speech: noun, verb, pronoun, preposition, adverb, conjunction, participle, article
  - The basis for practically all subsequent part-of-speech descriptions of Greek, Latin, and most European languages for the next 2000 years.

Recent Lists of POS
  - Recent POS lists (tagsets) are much larger than the original eight
    - Penn Treebank (Marcus et al., 1993): 45
    - Brown corpus (Francis, 1979; Francis and Kučera, 1982): 87
    - C7 tagset (Garside et al., 1997): 146
  - Synonyms for POS
    - word classes
    - morphological classes
    - lexical tags

POS can be used in
  - Recognizing or producing the pronunciation of words
    - CONtent (noun), conTENT (adjective)
    - OBject (noun), obJECT (verb)
    - ……
  - Information retrieval
    - Stemming
    - Selecting nouns or other important words
  - ASR language models, such as class-based N-grams
  - Partial parsing
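The pronunciation use case can be sketched as a lookup keyed on both the word and its part of speech. This is a minimal illustration only: the `STRESS_BY_POS` table and the `pronounce` helper are hypothetical names, and capital letters stand in for the stressed syllable.

```python
# Hypothetical mini-lexicon: (word, coarse POS) -> spelling with the stressed syllable in capitals.
STRESS_BY_POS = {
    ("content", "noun"): "CONtent",
    ("content", "adjective"): "conTENT",
    ("object", "noun"): "OBject",
    ("object", "verb"): "obJECT",
}


def pronounce(word: str, pos: str) -> str:
    """Return the stress pattern for a word given its POS; fall back to the word itself."""
    return STRESS_BY_POS.get((word.lower(), pos), word)


print(pronounce("object", "noun"))  # stress on the first syllable
print(pronounce("object", "verb"))  # stress on the second syllable
```

Without the POS tag the lookup key would be ambiguous, which is why a tagger is useful upstream of speech synthesis.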

8.1 (Mostly) English Word Classes
  - Closed class types: have relatively fixed membership
    - Ex. Prepositions: new prepositions are rarely coined.
    - Generally function words (ex. of, it, and, or, ……)
      - Very short
      - Occur frequently
      - Play an important role in grammar
  - Open class types: membership is continually updated
    - Ex. Nouns and verbs: new words are continually coined or borrowed from other languages.
    - Four major open classes (not every human language has all of them)
      - Nouns
      - Verbs
      - Adjectives
      - Adverbs

Open Classes
  - Noun
  - Verb
  - Adjective
  - Adverb

Definition of Noun
  - A semantic definition is not good enough
    - Noun is the name given to the lexical class in which the words for most people, places, or things occur
    - But what about bandwidth? relationship? pacing?
  - Functional definition of noun
    - A noun is defined by things like its ability to occur with determiners (a goat, its bandwidth, Plato’s Republic), to take possessives (IBM’s annual revenue), and, for most but not all nouns, to occur in the plural form (goats, abaci).

Grouping Nouns
  - Uniqueness
    - Proper nouns: Regina, Colorado, IBM, ……
    - Common nouns: book, stair, apple, ……
  - Countability
    - Count nouns
      - Can occur in both the singular and plural: goat(s), relationship(s), ……
      - Can be counted: (one, two, ……) goat(s)
    - Mass nouns
      - Cannot be counted: two snows (x), two communisms (x)
      - Can appear without articles where singular count nouns cannot: Snow is white (o), Goat is white (x)

Verbs
  - Verbs have a number of morphological forms
    - Non-3rd-person-sg: eat
    - 3rd-person-sg: eats
    - Progressive: eating
    - Past participle: eaten
  - Auxiliaries: a subclass of English verbs

Adjectives
  - Terms that describe properties or qualities
    - Concepts of color, age, value, ……
  - There are languages without adjectives (ex. Chinese).

Adverbs
  - Directional adverbs, locative adverbs: specify the direction or location of some action
    - Ex. home, here, downhill
  - Degree adverbs: specify the extent of some action, process, or property
    - Ex. extremely, very, somewhat
  - Manner adverbs: describe the manner of some action or process
    - Ex. slowly, delicately
  - Temporal adverbs: describe the time that some action or event took place
    - Ex. yesterday, Monday
  - Some adverbs (ex. Monday) are tagged as nouns in some tagging schemes

Closed Classes
  - Prepositions: on, under, over, near, by, at, from, to, with
  - Determiners: a, an, the
  - Pronouns: she, who, I, others
  - Conjunctions: and, but, or, as, if, when
  - Auxiliary verbs: can, may, should, are
  - Particles: up, down, on, off, in, out, at, by
  - Numerals: one, two, three, first, second, third

Prepositions
  - Occur before noun phrases
  - Often indicate spatial or temporal relations
    - Literal (ex. on it, before then, by the house)
    - Metaphorical (ex. on time, with gusto, beside herself)
  - Often indicate other relations as well
    - Ex. Hamlet was written by Shakespeare, and [from Shakespeare] “And I did laugh sans intermission an hour by his dial”
Figure 8.1 Prepositions (and particles) of English from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.

Particle
  - Often combines with a verb to form a larger unit called a phrasal verb
    - Come in: adjective
    - Come with: preposition
    - Come on: particle
Figure 8.2 English single-word particles from Quirk et al. (1985).

Determiners (articles)
  - a, an: mark a noun phrase as indefinite
  - the: marks it as definite
  - this?, that?
  - COBUILD statistics out of 16 million words
    - the: 1,071,676
    - a: 413,887
    - an: 59,359

Conjunctions
  - Used to join two phrases, clauses, or sentences
  - Coordinating conjunctions: join elements of equal status
    - and, or, but
  - Subordinating conjunctions: used when one element has embedded status
    - that (ex. I thought that you might like some milk)
    - Complementizers: subordinating conjunctions like that, which link a verb to its argument in this way (more: Chapters 9, 11)
Figure 8.3 Coordinating and subordinating conjunctions of English from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.

Pronouns
  - A kind of shorthand for referring to some noun phrase, entity, or event
  - Personal pronouns: you, she, I, it, me, ……
  - Possessive pronouns: my, your, his, her, its, one’s, our, their, ……
  - Wh-pronouns: what, who, whom, whoever
Figure 8.4 Pronouns of English from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.

Auxiliary Verbs
  - Words that mark certain semantic features of a main verb
  - be: copula verb
  - do
  - have: perfect tenses
  - Modal verbs
    - can: ability, possibility
    - may: permission, possibility
    - ……
Figure 8.5 English modal verbs from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.

Other Closed Classes
  - Interjections: oh, ah, hey, man, alas, ……
  - Negatives: no, not, ……
  - Politeness markers: please, thank you, ……
  - Greetings: hello, goodbye, ……
  - Existential there: There are two on the table

8.2 Tagsets for English
  - There are various tagsets for English.
    - Brown corpus (Francis, 1979; Francis and Kučera, 1982): 87 tags
    - Penn Treebank (Marcus et al., 1993): 45 tags
    - British National Corpus (Garside et al., 1997): 61 tags (C5 tagset)
    - C7 tagset: 146 tags
  - Which tagset to use for a particular application depends on how much information the application needs.
Figure 8.6 Penn Treebank part-of-speech tags (including punctuation)

8.3 Part-of-Speech Tagging
  - Definition: the process of assigning a POS or other lexical class marker to each word in a corpus
  - Input: a string of words and a tagset (ex. Book that flight, Penn Treebank tagset)
  - Output: a single best tag for each word (ex. Book/VB that/DT flight/NN ./.)
  - Problem: resolving ambiguity → disambiguation
    - Ex. book (Hand me that book, Book that flight)
Figure 8.7 The number of word types in the Brown corpus by degree of ambiguity (after DeRose (1988))
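The word/TAG output format and the notion of an ambiguous word type can be sketched in a few lines. This is an illustrative sketch, not part of the original slides: the helper name `parse_tagged` is hypothetical, and the tags on the first example sentence are assumed Penn Treebank tags.

```python
from collections import defaultdict


def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]


# Two tagged sentences; the tagging of the first one is an assumption.
tagged = (
    parse_tagged("Hand/VB me/PRP that/DT book/NN ./.")
    + parse_tagged("Book/VB that/DT flight/NN ./.")
)

# Collect every tag observed for each (lowercased) word type.
tags_by_word = defaultdict(set)
for word, tag in tagged:
    tags_by_word[word.lower()].add(tag)

# A word type is ambiguous if it was seen with more than one tag.
ambiguous = {w: sorted(t) for w, t in tags_by_word.items() if len(t) > 1}
print(ambiguous)
```

Counting word types by the number of tags they carry, over a full corpus, is the kind of tabulation Figure 8.7 reports.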

Taggers
  - Rule-based taggers
    - Generally involve a large database of hand-written disambiguation rules
    - Ex. ENGTWOL (based on the Constraint Grammar architecture of Karlsson et al. (1995))
  - Stochastic taggers
    - Generally resolve tagging ambiguities by using a training corpus to compute the probability of a given word having a given tag in a given context
    - Ex. HMM tagger (= Maximum Likelihood Tagger = Markov model tagger, based on the Hidden Markov Model)
  - Transformation-based tagger, Brill tagger (after Brill (1995))
    - Shares features of rule-based and stochastic taggers
    - The rules are automatically induced from a previously tagged training corpus
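As a rough sketch of the stochastic idea, the following estimates probabilities by relative frequency from a tiny hand-tagged corpus and picks the tag that maximizes P(tag | previous tag) * P(word | tag). The corpus is invented for illustration, the names `score` and `best_tag` are hypothetical, and a real HMM tagger would search over whole tag sequences and smooth its estimates rather than decide one word at a time.

```python
from collections import Counter

# Toy hand-tagged corpus (invented for illustration).
corpus = [
    [("book", "VB"), ("that", "DT"), ("flight", "NN")],
    [("hand", "VB"), ("me", "PRP"), ("that", "DT"), ("book", "NN")],
    [("the", "DT"), ("book", "NN"), ("arrived", "VBD")],
]

emit = Counter()       # (tag, word) counts
trans = Counter()      # (previous tag, tag) counts
tag_count = Counter()  # how often each tag occurs
context = Counter()    # how often each tag occurs as a left context
for sentence in corpus:
    prev = "<s>"  # sentence-start marker
    for word, tag in sentence:
        emit[(tag, word)] += 1
        trans[(prev, tag)] += 1
        tag_count[tag] += 1
        context[prev] += 1
        prev = tag


def score(word, tag, prev_tag):
    """Relative-frequency estimate of P(tag | prev_tag) * P(word | tag)."""
    if context[prev_tag] == 0 or tag_count[tag] == 0:
        return 0.0
    p_trans = trans[(prev_tag, tag)] / context[prev_tag]
    p_emit = emit[(tag, word)] / tag_count[tag]
    return p_trans * p_emit


def best_tag(word, prev_tag, candidates=("NN", "VB")):
    """Pick the candidate tag with the highest score in this context."""
    return max(candidates, key=lambda t: score(word, t, prev_tag))


print(best_tag("book", "<s>"))  # sentence-initial "book"
print(best_tag("book", "DT"))   # "book" after a determiner
```

In this toy corpus the same word receives different tags depending on the preceding tag, which is the context sensitivity the stochastic approach relies on.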

8.4 Rule-Based Part-of-Speech Tagging
  - The earliest tagging algorithms
  - Based on a two-stage architecture
    - First stage: assign each word a list of potential POS tags using a dictionary
    - Second stage: winnow down the lists to a single tag per word using hand-written disambiguation rules
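The two-stage architecture can be sketched as a dictionary lookup followed by a pass of elimination rules. Everything here is a toy assumption: the `LEXICON` entries, the default tag for unknown words, and the single rule `no_verb_after_determiner` are invented for illustration and are not taken from any real rule-based tagger.

```python
# Hypothetical toy lexicon: each word maps to its possible tags.
LEXICON = {
    "book": {"NN", "VB"},
    "that": {"DT", "IN", "WDT", "RB"},
    "flight": {"NN"},
    "the": {"DT"},
}


def stage_one(words):
    """First stage: give every word its full set of candidate tags.

    Unknown words default to NN here, which is an assumption of this sketch.
    """
    return [set(LEXICON.get(w.lower(), {"NN"})) for w in words]


def no_verb_after_determiner(candidates, i):
    """Toy constraint: drop VB when the previous word can only be a determiner."""
    if i > 0 and candidates[i - 1] == {"DT"} and len(candidates[i]) > 1:
        candidates[i].discard("VB")


RULES = [no_verb_after_determiner]


def stage_two(candidates):
    """Second stage: winnow the candidate sets by applying each rule at each position."""
    for i in range(len(candidates)):
        for rule in RULES:
            rule(candidates, i)
    return candidates


def tag(words):
    return stage_two(stage_one(words))


print(tag(["the", "book"]))
print(tag(["book", "that", "flight"]))
```

With only one rule, the second example stays ambiguous; a practical system needs many such constraints before every word is reduced to a single tag.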

ENGTWOL (Voutilainen, 1995)
  - Lexicon
    - Based on two-level morphology
    - About 56,000 entries for English word stems (Heikkilä, 1995)
    - A word with multiple POS is counted as separate entries
Figure 8.8 Sample lexical entries from the ENGTWOL lexicon described in Voutilainen (1995) and Heikkilä (1995)

ENGTWOL – Process 1
  - First stage: each word is run through the two-level lexicon transducer, and all of its possible POS entries are returned.
  - Ex.
      Pavlov      PAVLOV N NOM SG PROPER
      had         HAVE V PAST VFIN SVO
                  HAVE PCP2 SVO
      shown       SHOW PCP2 SVOO SVO SV
      that        ADV
                  PRON DEM SG
                  DET CENTRAL DEM SG
                  CS
      salivation  N NOM SG

ENGTWOL – Process 2
  - Second stage: eliminate tags that are inconsistent with the context, using a set of about 1,100 constraints applied in a negative way (each constraint rules tags out)
  - Ex. Adverbial-that rule
      Given input: “that”
      if
        (+1 A/ADV/QUANT);   /* if next word is adj, adverb, or quantifier */
        (+2 SENT-LIM);      /* and following which is a sentence boundary, */
        (NOT -1 SVOC/A);    /* and the previous word is not a verb like */
                            /* ‘consider’ which allows adjs as object complements */
      then eliminate non-ADV tags
      else eliminate ADV tag
  - Also uses
    - Probabilistic constraints
    - Other syntactic information
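The adverbial-that rule above can be sketched as a function over per-token sets of readings. This is a simplified illustration, not ENGTWOL's actual implementation: the function name `adverbial_that`, the flat tag names, and the way the example sentences are encoded are all assumptions.

```python
def adverbial_that(readings, i):
    """Apply the adverbial-that rule to the token at index i (assumed to be "that").

    `readings` is a list with one set of tags per token. The set at index i is
    narrowed in place and returned.
    """
    def has(offset, *tags):
        """True if the token at i + offset exists and carries any of the given tags."""
        j = i + offset
        return 0 <= j < len(readings) and bool(readings[j] & set(tags))

    next_is_modifier = has(+1, "A", "ADV", "QUANT")   # (+1 A/ADV/QUANT)
    then_sentence_end = has(+2, "SENT-LIM")            # (+2 SENT-LIM)
    prev_takes_adj_complement = has(-1, "SVOC/A")      # negated in the rule

    if next_is_modifier and then_sentence_end and not prev_takes_adj_complement:
        readings[i] &= {"ADV"}   # eliminate non-ADV tags
    else:
        readings[i] -= {"ADV"}   # eliminate the ADV tag
    return readings[i]


THAT = {"ADV", "PRON", "DET", "CS"}

# "It isn't that odd ." : "that" is a degree adverb here.
isnt_that_odd = [{"PRON"}, {"V"}, set(THAT), {"A"}, {"SENT-LIM"}]
print(adverbial_that(isnt_that_odd, 2))

# "I consider that odd ." : "consider" allows adjective object complements.
consider_that_odd = [{"PRON"}, {"V", "SVOC/A"}, set(THAT), {"A"}, {"SENT-LIM"}]
print(adverbial_that(consider_that_odd, 2))
```

The rule only removes readings, which matches the negative style of the constraints: each one narrows the candidate set rather than choosing a tag outright.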