POS Tagging and Context-Free Grammars

CS 4705

From Words to Syntactic Structures
Words → morphological structures: cows, cowed, reformulation
Ngrams → statistical co-occurrences: "The man/unicorn said", "The man the"
POS → word classes: DET N V, DET N DET
Syntactic constituents → word relationships: S → NP VP, S → S Conj S

POS Review
Words of the same class behave similarly. Are these words nouns or adjectives?
a blue seat / a child seat
a very blue seat / *a very child seat
this seat is blue / *this seat is child
What are the word classes? How do we identify a word's POS?

How Many Word Classes Are There?
A basic set:
Open class/content words: Noun, Verb, Adjective, Adverb
Closed class/function words: Preposition, Pronoun, Determiner, Auxiliary and copular verbs, Particle, Conjunction

Nouns
Words that describe people, places, things, events, abstractions, activities, …: Hector, mirror, New York, cat, love, government
Can take the possessive: Hector's, the cat's
Can (usually) occur as plurals: governments, many New Yorks
Can occur with determiners: the cat, Homer's Hector
Subclasses:
Proper nouns: Hector, New York
Common nouns: cat, dog, football
Count vs. mass nouns: enumerable (cat) or not (sand)

Verbs
Refer to actions, events, conditions, processes: go, kicked, think, manage, trying
Tense: when did the action take place?
Present: I kick (simple), I am kicking (progressive), I have kicked (perfect), I have been kicking (perfect progressive)
Past: I kicked, I was kicking, I had kicked, I had been kicking
Future: I will kick, I will be kicking, I will have kicked, I will have been kicking
Aspect: the nature of the action: simple/indefinite, complete, continuing

Adjectives
Describe properties or qualities: pretty, red, careful, cat-like, wishful, silly

Adverbs
Modify verbs, adjectives, or other adverbs
Directional or locative: here, upward
Degree modifiers: very, too
Manner: slowly
Temporals: today, now (are they adverbs or nouns?)

Prepositions and Particles
Prepositions: indicate spatial or temporal relations
to Boston, from Boston; in, for, with, toward, into, by
Particles: act like prepositions or adverbs but behave like semantic units with their verbs
Test: can you move the preposition/particle and what follows to the front of the sentence?
Prep: We ran up the hill. → Up the hill we ran.
Part: We ran up the bill. → *Up the bill we ran.

Some particles with their verbs:
run into (*Into Bill we ran)
find out (*Out the truth we found)
turn on (*On the light we turned)
throw up (*Up his dinner he threw)

Determiners
Articles: the cat, a cat, an idiot
Possessive nouns/pronouns: her cat, Sally's cat
Numbers: five cats
Indefinite pronouns: each cat, some cats
Demonstrative pronouns: that cat, those cats

Conjunctions
Coordinate: and, but
Subordinate/complementizers: …that the war is over, …because I love you, …unless you change your ways

Pronouns
Personal: I, he, …
Possessive: my, his, …
Indefinite: someone, everyone, anybody, nothing
Interrogative (wh-): who, whom, …
And many more…

Auxiliary Verbs
Indicate features of a main verb, such as tense and aspect
be (copula), have, do, can/will/shall/may (modals)
He is silent. She has done that. We can help.

And More…
Interjections/discourse markers
Existential there: There is a unicorn in the garden
Greetings, politeness terms

Part-of-Speech Tagging
It's useful to know the POS of words in a sentence:
Time/N flies/V like/Prep an/DET arrow/N
Fruit/N flies/N like/V a/DET banana/N

POS Can Disambiguate
Some words have only one POS tag: is, Mary, very, smallest
Others have a single most likely tag: a, dog
Many are more ambiguous: likes, bass
But luckily, tags tend to co-occur regularly with other tags (e.g., DET N is more likely than N DET)
We can learn POS ngram probabilities P(ti | ti-1) from a tagged corpus, just as we learn word ngram probabilities

Approaches to POS Tagging
Hand-written rules
Statistical approaches (e.g., HMM-based taggers)
Hybrid systems (e.g., Brill's TBL: transformation-based learning)

Statistical POS Tagging
Goal: choose the best sequence of tags T for a sequence of words W in a sentence:
T* = argmax_T P(T | W)
By Bayes' rule:
P(T | W) = P(W | T) P(T) / P(W)
Since P(W) is the same for every candidate tag sequence, we can ignore it:
T* = argmax_T P(W | T) P(T)

Statistical POS Tagging: the Prior
P(T) = P(t1, t2, …, tn)
By the chain rule:
P(T) = P(tn | t1, …, tn-1) P(t1, …, tn-1) = Π_i P(ti | t1, …, ti-1)
Making the Markov assumption, e.g. for bigrams:
P(T) ≈ Π_i P(ti | ti-1)

Statistical POS Tagging: the (Lexical) Likelihood
P(W | T) = P(w1, w2, …, wn | t1, t2, …, tn)
From the chain rule:
P(W | T) = Π_i P(wi | w1, …, wi-1, t1, …, tn)
Simplifying assumption: the probability of a word depends only on its own tag, P(wi | ti):
P(W | T) ≈ Π_i P(wi | ti)
So:
T* = argmax_T Π_i P(ti | ti-1) P(wi | ti)
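Putting the prior and the likelihood together, the decision rule can be sketched as a brute-force search over all tag sequences. The probability tables below are hand-set toy numbers, not estimates from any real corpus, and a practical tagger would use the Viterbi algorithm rather than enumerate all |tags|^n sequences:

```python
from itertools import product

TAGS = ["N", "V", "Prep", "DET"]

# Toy probabilities for illustration only (not estimated from a real corpus).
P_TRANS = {  # P(tag | previous tag); "<s>" marks the sentence start
    ("<s>", "N"): 0.5, ("N", "V"): 0.4, ("V", "Prep"): 0.3,
    ("Prep", "DET"): 0.6, ("DET", "N"): 0.7, ("N", "N"): 0.1,
    ("V", "DET"): 0.3, ("N", "Prep"): 0.2,
}
P_LEX = {    # P(word | tag)
    ("time", "N"): 0.1, ("flies", "V"): 0.2, ("flies", "N"): 0.01,
    ("like", "Prep"): 0.3, ("like", "V"): 0.1,
    ("an", "DET"): 0.2, ("arrow", "N"): 0.05,
}

def score(words, tags):
    """P(T) * P(W|T) under the bigram + word-given-tag assumptions."""
    p, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        p *= P_TRANS.get((prev, t), 0.0) * P_LEX.get((w, t), 0.0)
        prev = t
    return p

def tag(words):
    # argmax over all tag sequences of prod_i P(ti|ti-1) * P(wi|ti)
    return max(product(TAGS, repeat=len(words)), key=lambda ts: score(words, ts))

print(tag("time flies like an arrow".split()))  # ('N', 'V', 'Prep', 'DET', 'N')
```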

Estimate the Tag Priors and the Lexical Likelihoods from a Corpus
Maximum-likelihood estimation, for bigrams:
P(ti | ti-1) = c(ti-1, ti) / c(ti-1)
P(wi | ti) = c(ti, wi) / c(ti)
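The MLE counts above can be computed directly; a minimal sketch over a tiny hand-tagged toy corpus (illustrative data only):

```python
from collections import Counter

# Tiny hand-tagged toy corpus; "<s>" marks sentence starts.
corpus = [
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("barks", "V")],
    [("the", "DET"), ("dog", "N"), ("sleeps", "V")],
]

tag_bigrams, tag_counts, word_tag = Counter(), Counter(), Counter()
for sent in corpus:
    prev = "<s>"
    tag_counts["<s>"] += 1
    for word, t in sent:
        tag_bigrams[prev, t] += 1
        tag_counts[t] += 1
        word_tag[word, t] += 1
        prev = t

def p_trans(t, prev):   # P(ti | ti-1) = c(ti-1, ti) / c(ti-1)
    return tag_bigrams[prev, t] / tag_counts[prev]

def p_lex(w, t):        # P(wi | ti) = c(ti, wi) / c(ti)
    return word_tag[w, t] / tag_counts[t]

print(p_trans("N", "DET"))  # 3/3 = 1.0
print(p_lex("dog", "N"))    # 2/3
```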

Brill Tagging: TBL
Start with simple rules, then learn better ones from a tagged corpus:
Init: start with a (hand-)tagged corpus and remove the tags from a copy
Tag each word in the copy with its most likely POS (obtained from the original or another tagged corpus)
Select the transformation that most improves tagging accuracy (compared to the original)
Re-tag the whole corpus applying just this transformation, and put it on the list of transformations
Compare the new tags of the copy to the original
Again, select the transformation that most improves the accuracy of the (better) tags on the copy compared to the original
Iterate until performance doesn't improve (no transformation improves tagging accuracy)
Result: a tagging procedure (a set of transformations) which can be applied to new, untagged text

Transformations Change tag a to tag b when….

An Example
Time flies like an arrow.
1) Tag every word with its most likely tag and score:
Time/N flies/V like/V an/DET arrow/N
2) For each template, try every instantiation, apply it to the tagged corpus, and score:
e.g. change V to N when the preceding word is tagged V:
Time/N flies/V like/N an/DET arrow/N
e.g. change V to Prep when the preceding word is tagged V:
Time/N flies/V like/Prep an/DET arrow/N
3) Select the transformation rule that most improves the overall accuracy of POS assignments on the training corpus
4) Add the new rule to the tagging procedure list
5) Iterate from (2) until no transformation improves the score
Result: an ordered list of transformation rules which can be applied sequentially to new, untagged data (after initializing with the most common tag)
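One greedy TBL iteration can be sketched as follows, assuming a single template ("change tag a to b when the preceding tag is z") and, purely for illustration, the five-word example sentence as the entire training corpus:

```python
# Toy gold corpus and the current tags after initial most-likely tagging.
gold = [("time", "N"), ("flies", "V"), ("like", "Prep"), ("an", "DET"), ("arrow", "N")]
current = ["N", "V", "V", "DET", "N"]

TAGS = ["N", "V", "Prep", "DET"]

def apply_rule(tags, a, b, z):
    """Change tag a to b wherever the preceding tag is z."""
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == a and out[i - 1] == z:
            out[i] = b
    return out

def accuracy(tags):
    return sum(t == g for t, (_, g) in zip(tags, gold)) / len(gold)

# Try every instantiation of the template, keep the best-scoring rule.
best = max(
    ((a, b, z) for a in TAGS for b in TAGS for z in TAGS if a != b),
    key=lambda rule: accuracy(apply_rule(current, *rule)),
)
print(best)  # ('V', 'Prep', 'V'): change V to Prep after a V
```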

Methodology: Evaluation
For any NLP problem, we need to know how to evaluate our solutions
Possible gold standards (ceiling):
An annotated, naturally occurring corpus
Human task performance (96–97%)
How well do humans agree?
Kappa statistic: average pairwise agreement, corrected for chance agreement
Can be hard to obtain for some tasks
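The kappa statistic above (for two annotators, Cohen's kappa) is observed agreement corrected for the agreement expected by chance; a minimal sketch with made-up annotations:

```python
from collections import Counter

def kappa(ann1, ann2):
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    n = len(ann1)
    po = sum(a == b for a, b in zip(ann1, ann2)) / n
    c1, c2 = Counter(ann1), Counter(ann2)
    # Chance agreement: probability both annotators pick the same tag at random.
    pe = sum(c1[t] * c2[t] for t in set(c1) | set(c2)) / (n * n)
    return (po - pe) / (1 - pe)

# Two annotators' tags for the same eight tokens (toy data).
a1 = ["N", "V", "N", "DET", "N", "V", "Prep", "N"]
a2 = ["N", "V", "N", "DET", "V", "V", "Prep", "N"]
print(round(kappa(a1, a2), 3))  # 0.818
```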

Baseline: how well does a simple method do?
For tagging: the most common tag for each word (91%)
How much improvement do we get over the baseline?
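The most-common-tag baseline can be sketched directly (toy training counts, illustrative only; the fallback tag for unseen words is an assumption):

```python
from collections import Counter, defaultdict

# Toy tagged training data: (word, tag) pairs.
train = [("the", "DET"), ("flies", "N"), ("flies", "N"), ("flies", "V"),
         ("like", "V"), ("like", "Prep"), ("like", "Prep"), ("an", "DET")]

counts = defaultdict(Counter)
for word, tag in train:
    counts[word][tag] += 1

def baseline_tag(word, default="N"):
    """Most frequent tag for a known word; a fixed default otherwise."""
    return counts[word].most_common(1)[0][0] if word in counts else default

print([baseline_tag(w) for w in ["the", "flies", "like", "blorf"]])
```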

Methodology: Error Analysis
Confusion matrix: e.g., which tags did we most often confuse with which other tags?
How much of the overall error does each confusion account for?
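A tag confusion matrix can be tallied directly from gold and predicted sequences; a minimal sketch with made-up data:

```python
from collections import Counter

# Gold and predicted tags for the same seven tokens (toy data).
gold = ["N", "V", "N", "DET", "N", "V", "Prep"]
pred = ["N", "N", "N", "DET", "V", "V", "Prep"]

# Confusion matrix: (gold tag, predicted tag) -> count.
confusion = Counter(zip(gold, pred))
errors = {pair: c for pair, c in confusion.items() if pair[0] != pair[1]}
total_errors = sum(errors.values())

# Report each confusion and its share of the overall error.
for (g, p), c in sorted(errors.items(), key=lambda kv: -kv[1]):
    print(f"gold {g} tagged as {p}: {c} ({c / total_errors:.0%} of errors)")
```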

More Complex Issues
Tag indeterminacy: when 'truth' isn't clear
Caribbean cooking, child seat
Tagging multipart words: wouldn't → would/MD n't/RB
Unknown words:
Assume all tags are equally likely
Assume the same tag distribution as all other singletons in the corpus
Use morphology, word length, …
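The morphology cue for unknown words can be sketched as a suffix-and-shape heuristic; the suffix-to-tag table below is illustrative, not a trained model:

```python
# Guess a tag for an out-of-vocabulary word from its shape and suffix.
# The suffix list is checked in order; tags follow Penn Treebank conventions.
SUFFIX_TAGS = [("ing", "VBG"), ("ed", "VBD"), ("ly", "RB"),
               ("tion", "NN"), ("s", "NNS")]

def guess_unknown(word, default="NN"):
    if word[:1].isupper():
        return "NNP"  # capitalized: guess proper noun
    for suffix, tag in SUFFIX_TAGS:
        if word.endswith(suffix):
            return tag
    return default

print([guess_unknown(w) for w in ["blorfing", "Klaxton", "glorped", "flurbs"]])
```

A real tagger would interpolate this guess with the singleton tag distribution rather than commit to a single tag.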

How Do We Build Larger Constituents?
Phrases of the same category behave similarly
What are the syntactic constituents? How do we identify them?

Basic Constituents and Rewrite Rules
S → NP VP
NP → DET NOM
NP → PropN
NOM → N | NOM N
DET → a | an | the
PropN → George | Morocco
N → cat | box
VP → V NP
VP → V
V → exploded

More Constituents and Rules
VP → V PP
PP → Prep NP
Prep → at | over | under | in | by
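Since every rule in the toy grammar of these two slides is lexical, unary, or binary, sentences can be recognized with a small CKY-style chart; a minimal sketch:

```python
from itertools import product

# Toy grammar from the slides, split into lexical, unary, and binary rules.
LEXICAL = {
    "a": {"DET"}, "an": {"DET"}, "the": {"DET"},
    "George": {"PropN"}, "Morocco": {"PropN"},
    "cat": {"N"}, "box": {"N"}, "exploded": {"V"},
    "at": {"Prep"}, "over": {"Prep"}, "under": {"Prep"},
    "in": {"Prep"}, "by": {"Prep"},
}
UNARY = {"N": {"NOM"}, "PropN": {"NP"}, "V": {"VP"}}   # child -> parents
BINARY = {                                              # (left, right) -> parents
    ("NP", "VP"): {"S"}, ("DET", "NOM"): {"NP"}, ("NOM", "N"): {"NOM"},
    ("V", "NP"): {"VP"}, ("V", "PP"): {"VP"}, ("Prep", "NP"): {"PP"},
}

def closure(symbols):
    """Add every symbol reachable via unary rules."""
    todo, out = list(symbols), set(symbols)
    while todo:
        for parent in UNARY.get(todo.pop(), ()):
            if parent not in out:
                out.add(parent)
                todo.append(parent)
    return out

def recognize(words, start="S"):
    n = len(words)
    chart = {(i, i + 1): closure(LEXICAL.get(w, set())) for i, w in enumerate(words)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j, cell = i + span, set()
            for k in range(i + 1, j):
                for left, right in product(chart[i, k], chart[k, j]):
                    cell |= BINARY.get((left, right), set())
            chart[i, j] = closure(cell)
    return start in chart[0, n]

print(recognize("the cat exploded".split()))  # True
print(recognize("exploded the cat".split()))  # False
```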

How to Write a Grammar
Scenario: You are a lowly programmer in IT at a major financial institution in NYC. Your boss tells you the department needs to port data from an old database, in which the person-name field was not divided into multiple fields (title, first name, middle name, surname, suffix), to a new, modern database.
Your task: separate these names into their proper fields for the new database.
What do you do?

Solutions
1) Go through the old database names one at a time and type them into the new database
2) Create a script with regular expressions to search for names with different components and write each out into a standard set of fields
3) Build an FST to process the names and output field-separated components
4) Write a context-free grammar to parse the names into their constituents
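Option (2) can be sketched as a regular expression with named groups; the title and suffix inventories below are illustrative and far from complete, and real name data would need many more cases:

```python
import re

# Named groups mirror the target database fields. Title/suffix lists are
# illustrative only.
NAME_RE = re.compile(
    r"^(?:(?P<title>Mr\.|Mrs\.|Ms\.|Miss|Dr\.|Gen\.)\s+)?"
    r"(?P<first>[A-Z][a-z]+)\s+"
    r"(?:(?P<middle>[A-Z][a-z]+|[A-Z]\.)\s+)?"
    r"(?P<last>[A-Z][a-z]+)"
    r"(?:,?\s+(?P<suffix>Jr\.|Sr\.|Esq\.|DDS))?$"
)

def split_name(name):
    """Return a dict of name fields, or None if the name doesn't match."""
    m = NAME_RE.match(name)
    return m.groupdict() if m else None

print(split_name("Dr. Jane Q. Public Jr."))
```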

A Name Grammar
Name → Title Firstname Middlename Surname Honorific
Name → Firstname Middlename Surname Honorific
Name → Firstname Middlename Surname
Name → Title Firstname MiddleInitial Surname Honorific
…

A Better Name Grammar
Name → Title BaseName Suffix
Name → BaseName Suffix
BaseName → Firstname Middle Surname
Middle → Middlename
Middle → MiddleInitial
Title → Mr. | Mrs. | Ms. | Miss | Dr. | Gen. | …
Suffix → Jr. | Sr. | Esq. | DDS | …
…

Next Class
How do we use CFGs for parsing?
Read Chapter 11