CMSC 723 / LING 645: Intro to Computational Linguistics November 3, 2004 Lecture 9 (Dorr): Word Classes, POS Tagging (Chapter 8) Intro to Syntax (Start chapter 9) Prof. Bonnie J. Dorr Dr. Christof Monz TA: Adam Lee

Administrivia  Assignment 2 extension: now due one week later, on November 17, 2004

Word Classes and Part-of-Speech Tagging  Definition and Example  Motivation  Word Classes  Rule-based Tagging  Stochastic Tagging  Transformation-Based Tagging  Tagging Unknown Words

Definition  “The process of assigning a part-of-speech or other lexical class marker to each word in a corpus” (Jurafsky and Martin)
WORDS: the girl kissed the boy on the cheek
TAGS:  {N, V, P, DET}

An Example
WORD:   the   girl   kissed  the   boy   on    the   cheek
LEMMA:  the   girl   kiss    the   boy   on    the   cheek
TAG:    +DET  +NOUN  +VPAST  +DET  +NOUN  +PREP +DET  +NOUN

Motivation  Speech synthesis — pronunciation  Speech recognition — class-based N-grams  Information retrieval — stemming, selection of high-content words  Word-sense disambiguation  Corpus analysis of language & lexicography

Word Classes  Basic word classes: Noun, Verb, Adjective, Adverb, Preposition, …  POS based on morphology and syntax  Open vs. Closed classes –Open: Nouns, Verbs, Adjectives, Adverbs. –Closed: determiners: a, an, the pronouns: she, he, I prepositions: on, under, over, near, by, …

Open Class Words  Every known human language has nouns and verbs  Nouns: people, places, things –Classes of nouns proper vs. common count vs. mass  Verbs: actions and processes  Adjectives: properties, qualities  Adverbs: hodgepodge! –Unfortunately, John walked home extremely slowly yesterday

Closed Class Words  Idiosyncratic  Examples: –prepositions: on, under, over, … –particles: up, down, on, off, … –determiners: a, an, the, … –pronouns: she, who, I, … –conjunctions: and, but, or, … –auxiliary verbs: can, may, should, … –numerals: one, two, three, third, …

Prepositions from CELEX

English Single-Word Particles

Pronouns in CELEX

Conjunctions

Auxiliaries

Word Classes: Tag Sets  Vary in number of tags: a dozen to over 200  Size of tag sets depends on language, objectives and purpose –Some tagging approaches (e.g., constraint grammar based) make fewer distinctions e.g., conflating prepositions, conjunctions, particles –Simple morphology = more ambiguity = fewer tags

Word Classes: Tag set example (e.g., Penn Treebank tags such as PRP, PRP$)

Example of Penn Treebank Tagging of Brown Corpus Sentence
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
Book/VB that/DT flight/NN ./.
Does/VBZ that/DT flight/NN serve/VB dinner/NN ?/.

The Problem  Words often have more than one word class: this –This is a nice day = PRP –This day is nice = DT –You can go this far = RB

Word Class Ambiguity (in the Brown Corpus)
Unambiguous (1 tag): 35,340
Ambiguous (2–7 tags): 4,100
–2 tags: 3,760
–3 tags: 264
–4 tags: 61
–5 tags: 12
–6 tags: 2
–7 tags: 1
(DeRose, 1988)

Part-of-Speech Tagging  Rule-Based Tagger: ENGTWOL  Stochastic Tagger: HMM-based  Transformation-Based Tagger: Brill

Rule-Based Tagging  Basic Idea: –Assign all possible tags to words –Remove tags according to a set of rules of the type: if word+1 is an adj, adv, or quantifier and the following is a sentence boundary and word-1 is not a verb like “consider” then eliminate non-adv else eliminate adv. –Typically more than 1000 hand-written rules, but they may also be machine-learned.

Sample ENGTWOL Lexicon

Stage 1 of ENGTWOL Tagging  First Stage: Run words through a Kimmo-style morphological analyzer to get all parts of speech.  Example: Pavlov had shown that salivation …
Pavlov      PAVLOV N NOM SG PROPER
had         HAVE V PAST VFIN SVO / HAVE PCP2 SVO
shown       SHOW PCP2 SVOO SVO SV
that        ADV / PRON DEM SG / DET CENTRAL DEM SG / CS
salivation  N NOM SG

Stage 2 of ENGTWOL Tagging  Second Stage: Apply constraints.  Constraints are used in a negative way: they eliminate readings.  Example: Adverbial “that” rule
Given input: “that”
If (+1 A/ADV/QUANT)   ; the next word is an adjective, adverb, or quantifier
   (+2 SENT-LIM)      ; and the word after that is a sentence boundary
   (NOT -1 SVOC/A)    ; and the previous word is not a verb like “consider”
Then eliminate non-ADV tags
Else eliminate ADV tag
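A minimal Python sketch of how such a negative constraint could be applied to a set of candidate tags; the function signature and tag encoding are our own illustration, not ENGTWOL's actual rule format:

```python
# Toy illustration of constraint-based tag elimination (not ENGTWOL's format).
def apply_adverbial_that_rule(candidate_tags, next_tag, next_next, prev_tag):
    """Prune the candidate tags for 'that' using the adverbial-that rule."""
    if (next_tag in {"A", "ADV", "QUANT"}   # +1: adjective, adverb, or quantifier
            and next_next == "SENT-LIM"     # +2: a sentence boundary
            and prev_tag != "SVOC/A"):      # -1: not a verb like 'consider'
        return {t for t in candidate_tags if t == "ADV"}  # eliminate non-ADV
    return {t for t in candidate_tags if t != "ADV"}      # eliminate ADV

# "It isn't that odd." -- here 'that' survives only as an adverb.
print(apply_adverbial_that_rule({"ADV", "DET", "PRON", "CS"}, "A", "SENT-LIM", "V"))
```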

Stochastic Tagging  Based on the probability of a certain tag occurring, given various possibilities  Necessitates a training corpus  No probabilities for words not in the corpus  Training corpus may be too different from the test corpus

Stochastic Tagging (cont.) Simple Method: Choose the most frequent tag in the training text for each word! –Result: ~90% accuracy –Why? Many words are unambiguous, and raw frequency gets most of the rest right –This is a baseline: other methods should do better –HMM taggers are an example of one that does
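A minimal sketch of this most-frequent-tag baseline in Python; the corpus format (a list of word/tag pairs) and the NN default for unseen words are our assumptions:

```python
from collections import Counter, defaultdict

def train_baseline(tagged_corpus):
    """tagged_corpus: list of (word, tag) pairs from training text."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    # Keep the single most frequent tag seen for each word.
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag_words(words, best_tag, default="NN"):
    # Back off to a default tag (assumed here: NN) for unseen words.
    return [(w, best_tag.get(w, default)) for w in words]

best = train_baseline([("the", "DT"), ("race", "NN"), ("race", "VB"), ("race", "NN")])
print(tag_words(["the", "race"], best))  # [('the', 'DT'), ('race', 'NN')]
```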

HMM Tagger  Intuition: Pick the most likely tag for this word.  HMM taggers choose the tag sequence that maximizes this formula: –P(word|tag) × P(tag|previous n tags)  Let T = t_1, t_2, …, t_n and W = w_1, w_2, …, w_n. Find the POS tags that generate the sequence of words, i.e., look for the most probable sequence of tags T underlying the observed words W.

Start with Bigram-HMM Tagger
argmax_T P(T|W)
= argmax_T P(T) P(W|T)   (Bayes’ rule, dropping the constant P(W))
= argmax_T P(t_1 … t_n) P(w_1 … w_n | t_1 … t_n)
≈ argmax_T [P(t_1) P(t_2|t_1) … P(t_n|t_{n-1})] [P(w_1|t_1) P(w_2|t_2) … P(w_n|t_n)]
To tag a single word: t_i = argmax_j P(t_j|t_{i-1}) P(w_i|t_j)
How do we compute P(t_i|t_{i-1})? –c(t_{i-1} t_i) / c(t_{i-1})
How do we compute P(w_i|t_i)? –c(w_i, t_i) / c(t_i)
How do we compute the most probable tag sequence? –Viterbi
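A compact Viterbi sketch for a bigram HMM tagger. The probability tables are assumed inputs (in practice, the relative-frequency estimates above), and the 1e-12 floor for unseen events is a simplification:

```python
import math

def viterbi(words, tags, trans, emit, start="<s>"):
    """trans[(t_prev, t)] = P(t|t_prev); emit[(w, t)] = P(w|t).
    Returns the most probable tag sequence for the words."""
    def lp(table, key):
        return math.log(table.get(key, 1e-12))  # floor for unseen events
    # best[t] = (log-prob of the best path ending in tag t, that path)
    best = {t: (lp(trans, (start, t)) + lp(emit, (words[0], t)), [t])
            for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            prob, path = max((best[tp][0] + lp(trans, (tp, t)), best[tp][1])
                             for tp in tags)
            new[t] = (prob + lp(emit, (w, t)), path + [t])
        best = new
    return max(best.values())[1]  # highest-probability final path

# Tiny toy run on the to/race example (start transition is made up):
trans = {("<s>", "TO"): 1.0, ("TO", "VB"): 0.34, ("TO", "NN"): 0.021}
emit = {("to", "TO"): 1.0, ("race", "VB"): 0.00003, ("race", "NN"): 0.00041}
print(viterbi(["to", "race"], ["TO", "VB", "NN"], trans, emit))  # ['TO', 'VB']
```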

An Example  Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN  People/NNS continue/VBP to/TO inquire/VB the DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN  to/TOrace/??? the/DTrace/???  t i = argmax j P(t j |t i-1 )P(w i |t j ) –max[P(VB|TO)P(race|VB), P(NN|TO)P(race|NN)]  Brown: –P(NN|TO) =.021 × P(race|NN) =.00041= –P(VB|TO) =.34 × P(race|VB) =.00003=.00001

An Early Approach to Statistical POS Tagging  PARTS tagger (Church, 1988): Stores probability of tag given word instead of word given tag.  P(tag|word) × P(tag|previous n tags)  Compare to: P(word|tag) × P(tag|previous n tags)  Consider this alternative (on your own).

Transformation-Based Tagging (Brill Tagging)  Combination of rule-based and stochastic tagging methodologies –Like rule-based tagging because rules are used to specify tags in a certain environment –Like the stochastic approach because machine learning is used, with a tagged corpus as input  Input: –tagged corpus –dictionary (with most frequent tags)

Transformation-Based Tagging (cont.)  Basic Idea: –Set the most probable tag for each word as a start value –Change tags according to rules of the type “if word-1 is a determiner and word is a verb then change the tag to noun”, applied in a specific order  Training is done on a tagged corpus: 1. Write a set of rule templates 2. Among the instantiated rules, find the one with the highest score 3. Repeat step 2 until the best score falls below a threshold 4. Keep the ordered set of rules  Rules make errors that are corrected by later rules

TBL Rule Application  Tagger labels every word with its most-likely tag –For example, race has the following probabilities in the Brown corpus: P(NN|race) = .98, P(VB|race) = .02  Transformation rules make changes to tags –“Change NN to VB when previous tag is TO”: … is/VBZ expected/VBN to/TO race/NN tomorrow/NN becomes … is/VBZ expected/VBN to/TO race/VB tomorrow/NN
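A minimal sketch of applying one such transformation to a tagged sequence; the rule encoding is our own illustration:

```python
# Toy transformation: "Change NN to VB when the previous tag is TO".
def apply_transformation(tagged, from_tag="NN", to_tag="VB", prev_trigger="TO"):
    out = list(tagged)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == from_tag and out[i - 1][1] == prev_trigger:
            out[i] = (word, to_tag)
    return out

sent = [("to", "TO"), ("race", "NN"), ("tomorrow", "NN")]
print(apply_transformation(sent))
# [('to', 'TO'), ('race', 'VB'), ('tomorrow', 'NN')]
```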

TBL: Rule Learning  2 parts to a rule –Triggering environment –Rewrite rule  The triggering environments come from a small set of templates (from Manning & Schütze 1999:363): nine schemas, each marking which of the surrounding tags t_{i-3} … t_{i-1}, t_{i+1} … t_{i+3} is examined when rewriting t_i (schema table omitted)

TBL: The Algorithm  Step 1: Label every word with most likely tag (from dictionary)  Step 2: Check every possible transformation & select one which most improves tagging  Step 3: Re-tag corpus applying the rules  Repeat 2-3 until some criterion is reached, e.g., X% correct with respect to training corpus  RESULT: Sequence of transformation rules
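A schematic sketch of this greedy loop, under the simplifying assumption that each candidate rule is a function over tag sequences and is scored by its net error reduction against the gold tags:

```python
def tbl_learn(initial_tags, gold_tags, candidate_rules, min_gain=1):
    """Greedy TBL training loop (schematic, not Brill's implementation)."""
    def n_correct(tags):
        return sum(t == g for t, g in zip(tags, gold_tags))
    tags = list(initial_tags)           # Step 1 output: most-likely tags
    learned = []
    while True:
        # Step 2: score every candidate transformation by its net gain.
        gain, best = max(((n_correct(rule(tags)) - n_correct(tags), rule)
                          for rule in candidate_rules), key=lambda x: x[0])
        if gain < min_gain:             # stopping criterion reached
            break
        tags = best(tags)               # Step 3: re-tag the corpus
        learned.append(best)
    return learned                      # RESULT: ordered transformation rules
```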

TBL: Rule Learning (cont.)  Problem: Could apply transformations ad infinitum!  Constrain the set of transformations with “templates”: –Replace tag X with tag Y, provided tag Z or word Z’ appears in some position  Rules are learned in ordered sequence  Rules may interact.  Rules are compact and can be inspected by humans

Templates for TBL

TBL: Problems  First 100 rules achieve 96.8% accuracy; first 200 rules achieve 97.0% accuracy  Execution Speed: TBL tagger is slower than HMM approach  Learning Speed: Brill’s implementation can take over a day (600k tokens) BUT … (1) Learns a small number of simple, non-stochastic rules (2) Can be made to work faster with FSTs (3) Best performing algorithm on unknown words

Tagging Unknown Words  New words are added to (newspaper) language at a rate of 20+ per month  Plus many proper names …  Unknown words increase error rates by 1-2%  Method 1: assume they are nouns  Method 2: assume unknown words have a probability distribution similar to words occurring only once in the training set  Method 3: use morphological information, e.g., words ending in –ed tend to be tagged VBN
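A minimal sketch of Method 3 (with Method 1 as a fallback); the particular suffix-to-tag mapping is a toy assumption:

```python
# Toy suffix heuristics for unknown words (illustrative only).
SUFFIX_TAGS = [("ed", "VBN"), ("ing", "VBG"), ("ly", "RB"), ("s", "NNS")]

def guess_tag(word, default="NN"):
    if word[0].isupper():
        return "NNP"                  # capitalized unknown word: proper noun
    for suffix, tag in SUFFIX_TAGS:
        if word.endswith(suffix):
            return tag
    return default                    # Method 1 as fallback: assume noun

print(guess_tag("misunderestimated"))  # VBN
```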

Evaluation  The result is compared with a manually coded “Gold Standard” –Typically accuracy reaches 96-97% –This may be compared with the result for a baseline tagger (one that uses no context)  Important: 100% is impossible even for human annotators.
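Accuracy against the gold standard is just the fraction of matching tags; a one-function sketch:

```python
def accuracy(predicted_tags, gold_tags):
    """Fraction of tokens whose predicted tag matches the gold standard."""
    assert len(predicted_tags) == len(gold_tags)
    return sum(p == g for p, g in zip(predicted_tags, gold_tags)) / len(gold_tags)

print(accuracy(["DT", "NN", "VBD"], ["DT", "NN", "VB"]))  # 0.666...
```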

Why do we need to learn POS?  Grammar induction for different languages  Parser induction for different languages Note: Grammars and Parsing are the next two topics.

What is Grammar?  A vogue in the 19th century to describe the structures of an area of knowledge (e.g., Busby’s “A Grammar of Music”, Field’s “A Grammar of Colouring”)  The meaning of grammar has changed: –Used to be: “a listing of principles or structures” –Now: “principles or structures as a field of inquiry” –The grammar of syntax  Children’s syntax  Not meaning

Who cares?  Grammar checkers  Question answering / database access  Information extraction  Generation  Translation

Syntax  We just covered: POS categories  To be covered: –Constituency –Grammatical relations –Subcategorization and dependencies

What is a constituent?  The idea: groups of words may behave as a single unit or phrase  Examples: –“she” –“Michael” –“well-weathered three-story structure”  How can we model the constituency? –With context-free grammars

Context Free Grammars  Chomsky (1956) Backus (1959)  Capture constituents and ordering –Need something else for grammatical relations and dependency relations  Consists of –Set of terminals (either lexical items or POS) –Set of non-terminal (the constituent of language) –Sets of rules of the form A → , where  is a string of zero or more terminals and non-terminals  They are equivalent to Backus-Naur form grammars

Context-Free Grammars  Set of rules (productions) – expressing the way symbols of the language can be grouped together –NP → Det Nominal –NP → Proper Noun –Nominal → Noun | Noun Nominal  Lexicon –Det → a –Det → the –Noun → flight

Another Example S → NP VP NP → Det Nominal Nominal → Noun VP → V Det → a Noun → flight V → left
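As a sketch, this toy grammar can be run directly; the use of NLTK here is our choice for illustration, not part of the lecture:

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Nominal
Nominal -> Noun
VP -> V
Det -> 'a'
Noun -> 'flight'
V -> 'left'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("a flight left".split()):
    print(tree)  # (S (NP (Det a) (Nominal (Noun flight))) (VP (V left)))
```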

Grammar L 0

Lexicon L 0

Derivations and Trees  Parse of “I prefer a morning flight”: [S [NP [Pro I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]  Rules used: S → NP VP, NP → Pro, Pro → I, VP → V NP, V → prefer, NP → Det Nom, Det → a, Nom → Noun Nom, Noun → morning, Noun → flight

Formal Definition of CFG  Set of non-terminal symbols N  Set of terminals Σ  Set of productions A → α, where A ∈ N and α is a string in (Σ ∪ N)*  A designated start symbol S

As Opposed to What?  Regular expressions  Context-sensitive grammars  Turing Machines

Chomsky Hierarchy

Rule Skeletons for Chomsky Hierarchy

What does Context-Free Mean? The left-hand side of a rule sits there all by itself: a single non-terminal that can be rewritten regardless of the surrounding context!

Grammar Development

Key Constituents  Sentences  Noun phrases  Verb phrases  Prepositional phrases

Sentence Types  Declaratives: S → NP VP (e.g., John left)  Imperatives: S → VP (e.g., Leave!)  Yes-No Questions: S → Aux NP VP (e.g., Did John leave?)  WH Questions: S → Wh-word VP (e.g., Who left?) S → Wh-word Aux NP VP (e.g., When did John leave?)

Noun Phrase  Head  Pre-nominal modifiers  Post-nominal modifiers

Before the Noun  Determiner: a, the, that, this, those, any, some  No determiner (e.g., in plural, mass nouns “dinner”)  Predeterminers: all (e.g., all the flights)  Postdeterminers: cardinals, ordinals, quantifiers: one, two; first, second, next, last, past, other, another; many, (a) few, several, much, a little  Adjectives: a first-class fare, a nonstop flight, the longest layover  AP (Adjective phrase): the least expensive fare  NP → (Det) (Card) (Ord) (Quant) (AP) Nominal

After the Noun  Prepositional phrases any stopovers [for Delta seven fifty one] all flights [from Cleveland] [to Newark] Nominal → Nominal PP (PP) (PP)  Non-finite clauses: gerundive, -ed, infinitive Any flights arriving after 11:00 am.  Relative clauses A flight that serves breakfast

Gerunds  any flights [arriving after ten p.m]  General Form: Nominal → Nominal GerundVP  GerundVP → GerundV NP | GerundV PP | GerundV | GerundV NP PP  GerundV → being | preferring | arriving …

Infinitives and –ed forms  the last flight to arrive in Boston  I need to have dinner served  which is the aircraft used by this flight?

Postnominal relative clauses  Restrictive relative clauses: –A flight that serves breakfast –Flights that leave in the morning –The United flight that arrives in San Jose at ten p.m.  Rules: –Nominal → Nominal RelClause –RelClause → (who | that) VP

Combining post-modifiers  A flight from Phoenix to Detroit leaving Monday evening  Evening flights from Nashville to Houston that serve dinner  The earliest American Airlines flight that I can get

Recursive Structure  Rules where the non-terminal on the left- hand side also appears on the right-hand side. –NP → NP PP (The flight to Boston) –VP → VP PP (departed Miami at noon)

More Recursive Structures  Allow us to do the following: –Flights to Miami –Flights to Miami from Boston –Flights to Miami from Boston in April –Flights to Miami from Boston in April on Friday –Flights to Miami from Boston in April on Friday under $300 –….
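A sketch of that unbounded recursion with a toy grammar; the category name Place and the NLTK setup are our own illustration:

```python
import nltk

# NP -> NP PP lets prepositional phrases stack without bound.
grammar = nltk.CFG.fromstring("""
NP -> NP PP | 'flights'
PP -> P Place
P -> 'to' | 'from' | 'in'
Place -> 'Miami' | 'Boston' | 'April'
""")

parser = nltk.ChartParser(grammar)
sent = "flights to Miami from Boston in April".split()
for tree in parser.parse(sent):
    print(tree)
```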

Conjunctions  Any phrasal constituent can be conjoined with a constituent of the same type to form a new constituent of that type –NP → NP and NP –S → S and S  X → X and X

Next Time  Finish Chapters 8, 9  Read Chapter 10.