POS Tagging and Context-Free Grammars


1 POS Tagging and Context-Free Grammars
CS 4705

2 From Words to Syntactic Structures
Words → morphological structures: cows, cowed, reformulation
Ngrams → statistical co-occurrences: The man/unicorn said, The man the
POS → word classes: DET N V, DET N DET
Syntactic constituents → word relationships: S → NP VP, S → S conj S

3 POS Review: Words of the same class behave similarly
Are these words nouns or adjectives?
a blue seat / a child seat
a very blue seat / *a very child seat
this seat is blue / *this seat is child
What are the word classes? How do we identify a word's POS?

4 How many word classes are there?
A basic set:
Open class/content words: noun, verb, adjective, adverb
Closed class/function words: preposition, pronoun, determiner, auxiliary and copular verbs, particle, conjunction

5 Nouns
Words that describe people, places, things, events, abstractions, activities, …: Hector, mirror, New York, cat, love, government
Can take the possessive: Hector's, the cat's
Can (usually) occur as plurals: governments, many New Yorks
Can occur with determiners: the cat, Homer's Hector
Subclasses:
  Proper nouns: Hector, New York
  Common nouns: cat, dog, football
  Mass vs. count nouns: not enumerable (sand) vs. enumerable (cat)

6 Verbs
Refer to actions, events, conditions, processes: go, kicked, think, manage, trying
Tense: when did the action take place?
  Present: I kick (simple), I am kicking (progressive), I have kicked (perfect), I have been kicking (perfect progressive)
  Past: I kicked, I was kicking, I had kicked, I had been kicking
  Future: I will kick, I will be kicking, I will have kicked, I will have been kicking
Aspect: the nature of the action (simple/indefinite, complete, continuing)

7 Adjectives
Describe properties or qualities: pretty, red, careful, cat-like, wishful, silly

8 Adverbs
Modify verbs, adjectives, or other adverbs
  Directional or locative: here, upward
  Degree modifiers: very, too
  Manner: slowly
  Temporals: today, now (are these adverbs or nouns?)

9 Prepositions and Particles
Prepositions: indicate spatial or temporal relations
  to Boston, from Boston; in, for, with, toward, into, by
Particles: look like prepositions or adverbs but form semantic units with their verbs
Test: can you move the prep/particle and what follows to the front of the sentence?
  Prep: We ran up the hill. → Up the hill we ran.
  Part: We ran up the bill. → *Up the bill we ran.

10 Some particles with their verbs
run into (*Into Bill we ran)
find out (*Out the truth we found)
turn on (*On the light we turned)
throw up (*Up his dinner he threw)

11 Determiners
Articles: the cat, a cat, an idiot
Possessive nouns/pronouns: her cat, Sally's cat
Numbers: five cats
Indefinite pronouns: each cat, some cats
Demonstrative pronouns: that cat, those cats

12 Conjunctions
Coordinate: and, but
Subordinate/complementizers: …that the war is over, …because I love you, …unless you change your ways

13 Pronouns
Personal: I, he, …
Possessive: my, his, …
Indefinite: someone, everyone, anybody, nothing
Interrogative or wh-: who, whom, …
And many more…

14 Auxiliary Verbs
Indicate features of a main verb, such as tense and aspect
Be (copula), have, do, can/will/shall/may (modal)
He is silent, She has done that, We can help

15 And more…
Interjections/discourse markers
Existential there: There is a unicorn in the garden
Greetings, politeness terms

16 Part-of-Speech Tagging
It's useful to know the POS of words in a sentence:
Time/N flies/V like/Prep an/Det arrow/N
Fruit/N flies/N like/V a/Det banana/N

17 POS can disambiguate
Some words have only one POS tag: is, Mary, very, smallest
Others have a single most likely tag: a, dog
Many are more ambiguous: likes, bass
But luckily, tags tend to co-occur regularly with other tags (e.g., DET N is more likely than N DET)
We can learn POS ngram probabilities P(ti | ti-1) from a tagged corpus just as we learn word ngram probabilities
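To make the last point concrete, here is a minimal sketch of estimating tag-bigram probabilities by counting; the toy corpus and the "<s>" start marker are illustrative assumptions, not from the lecture:

```python
from collections import Counter

# Hypothetical toy corpus: each sentence is a list of (word, tag) pairs.
corpus = [
    [("the", "Det"), ("cat", "N"), ("exploded", "V")],
    [("the", "Det"), ("man", "N"), ("said", "V")],
]

bigram_counts, prev_counts = Counter(), Counter()
for sent in corpus:
    tags = ["<s>"] + [tag for _, tag in sent]   # "<s>" marks sentence start
    for prev, cur in zip(tags, tags[1:]):
        bigram_counts[(prev, cur)] += 1
        prev_counts[prev] += 1

# MLE estimate: P(ti | ti-1) = c(ti-1, ti) / c(ti-1)
trans = {bg: c / prev_counts[bg[0]] for bg, c in bigram_counts.items()}
print(trans[("Det", "N")])   # 1.0: in this toy corpus Det is always followed by N
```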

18 Approaches to POS Tagging
Hand-written rules
Statistical approaches (e.g., HMM-based taggers)
Hybrid systems (e.g., Brill's TBL: transformation-based learning)

19 Statistical POS Tagging
Goal: choose the best sequence of tags T for a sequence of words W in a sentence:
  T* = argmax_T P(T | W)
By Bayes' Rule:
  P(T | W) = P(W | T) P(T) / P(W)
Since P(W) is the same for every candidate T, we can ignore it:
  T* = argmax_T P(W | T) P(T)

20 Statistical POS Tagging: the Prior
P(T) = P(t1, t2, …, tn-1, tn)
By the Chain Rule:
  = P(tn | t1, …, tn-1) P(t1, …, tn-1)
  = P(t1) P(t2 | t1) … P(tn | t1, …, tn-1)
Making the Markov assumption, e.g., for bigrams:
  P(T) ≈ P(t1) P(t2 | t1) P(t3 | t2) … P(tn | tn-1)

21 Statistical POS Tagging: the (Lexical) Likelihood
P(W|T) = P(w1, w2, …, wn | t1, t2, …, tn)
From the Chain Rule:
  = P(w1 | t1, …, tn) P(w2 | w1, t1, …, tn) … P(wn | w1, …, wn-1, t1, …, tn)
Simplifying assumption: the probability of a word depends only on its own tag, P(wi | ti)
So:
  P(W|T) ≈ P(w1 | t1) P(w2 | t2) … P(wn | tn)

22 Estimate the Tag Priors and the Lexical Likelihoods from Corpus
Maximum-Likelihood Estimation, for bigrams:
  P(ti | ti-1) = c(ti-1, ti) / c(ti-1)
  P(wi | ti) = c(ti, wi) / c(ti)
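Putting slides 19-22 together, a minimal Viterbi decoding sketch; it assumes trans[(t_prev, t)] and emit[(t, w)] hold the MLE estimates above (e.g., trans from the counting sketch on slide 17, with emit built analogously), and "<s>" as a start-of-sentence pseudo-tag. All names here are illustrative:

```python
import math

def lp(p):
    """Safe log: zero probabilities become -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(words, tagset, trans, emit):
    """Return the tag sequence maximizing prod_i P(wi|ti) * P(ti|ti-1)."""
    # score[t]: best log-probability of any tag path ending in tag t;
    # back[t]: that path itself.
    score = {t: lp(trans.get(("<s>", t), 0.0)) + lp(emit.get((t, words[0]), 0.0))
             for t in tagset}
    back = {t: [t] for t in tagset}
    for w in words[1:]:
        new_score, new_back = {}, {}
        for t in tagset:
            best_prev = max(tagset,
                            key=lambda p: score[p] + lp(trans.get((p, t), 0.0)))
            new_score[t] = (score[best_prev]
                            + lp(trans.get((best_prev, t), 0.0))
                            + lp(emit.get((t, w), 0.0)))
            new_back[t] = back[best_prev] + [t]
        score, back = new_score, new_back
    return back[max(score, key=score.get)]
```

Given suitable counts, viterbi("time flies like an arrow".split(), {"N", "V", "Prep", "Det"}, trans, emit) would recover the tagging on slide 16.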

23 Brill Tagging: TBL
Start with simple rules… learn better ones from a tagged corpus
Init: start with a (hand-)tagged corpus and remove the tags from a copy
Tag each word in the copy with its most likely POS (obtained from the original or another tagged corpus)
Select the transformation that most improves tagging accuracy (compared to the original)
Re-tag the whole corpus applying just this transformation and put it on the list of transformations

24 Brill Tagging: TBL, continued
Compare the new tags of the copy to the original
Again, select the transformation that most improves the accuracy of the (better) tags on the copy compared to the original
Iterate until performance doesn't improve (no transformation improves tagging accuracy)
Result: a tagging procedure (an ordered list of transformations) which can be applied to new, untagged text

25 Transformations
Change tag a to tag b when….

26 An Example
Time flies like an arrow.
1) Tag every word with its most likely tag and score:
   Time/N flies/V like/V an/Det arrow/N
2) For each template, try every instantiation, apply it to the tagged corpus, and score:
   e.g., Change V to N when the preceding word is tagged V:
   Time/N flies/V like/N an/Det arrow/N
   e.g., Change V to Prep when the preceding word is tagged V:

27 An Example, continued
   Time/N flies/V like/Prep an/Det arrow/N
3) Select the transformation rule that most improves the overall accuracy of POS assignments on the training corpus
4) Add the new rule to the tagging-procedure list
5) Iterate from (2) until no transformation improves the score
Result: an ordered list of transformation rules which can be applied sequentially to new, untagged data (after initializing with the most common tag)
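A minimal sketch of this TBL loop over a single toy template, "change tag a to tag b when the preceding tag is z"; the gold and initial tag lists mirror the slide's example, and everything else is illustrative:

```python
from itertools import product

def apply_rule(tags, rule):
    """Apply one transformation: change a -> b when the preceding tag is z."""
    a, b, z = rule
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == a and out[i - 1] == z:
            out[i] = b
    return out

def tbl(gold, current, tagset):
    """Greedy TBL: repeatedly pick the rule that most improves accuracy."""
    def errors(tags):
        return sum(t != g for t, g in zip(tags, gold))
    rules = []
    while True:
        candidates = ((a, b, z) for a, b, z in product(tagset, repeat=3) if a != b)
        best = min(candidates, key=lambda r: errors(apply_rule(current, r)))
        if errors(apply_rule(current, best)) >= errors(current):
            break                       # no rule improves accuracy: stop
        current = apply_rule(current, best)
        rules.append(best)
    return rules                        # ordered list of transformations

# The slide's example: initial most-likely tags vs. the correct tagging.
gold    = ["N", "V", "Prep", "Det", "N"]    # Time flies like an arrow
initial = ["N", "V", "V",    "Det", "N"]
print(tbl(gold, initial, {"N", "V", "Prep", "Det"}))
# -> [('V', 'Prep', 'V')]: change V to Prep when the preceding tag is V
```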

28 Methodology: Evaluation
For any NLP problem, we need to know how to evaluate our solutions
Possible gold standards (a ceiling):
  Annotated naturally occurring corpus
  Human task performance (96-97%)
How well do humans agree?
  Kappa statistic: average pairwise agreement corrected for chance agreement
  Can be hard to obtain for some tasks
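As an illustration of the kappa statistic for two annotators (Cohen's kappa; the multi-annotator case averages pairwise values), with hypothetical tag sequences:

```python
from collections import Counter

def cohens_kappa(ann1, ann2):
    """Kappa = (P_o - P_e) / (1 - P_e): observed agreement corrected for
    the agreement expected by chance from each annotator's tag rates."""
    n = len(ann1)
    p_o = sum(a == b for a, b in zip(ann1, ann2)) / n
    c1, c2 = Counter(ann1), Counter(ann2)
    p_e = sum(c1[t] * c2[t] for t in c1) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical annotators tagging the same six tokens:
a1 = ["N", "V", "Prep", "Det", "N", "V"]
a2 = ["N", "V", "V",    "Det", "N", "V"]
print(round(cohens_kappa(a1, a2), 2))   # 0.76: agreement well above chance
```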

29 Baseline: how well does a simple method do?
For tagging: assign each word its most common tag (91%)
How much improvement do we get over the baseline?
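A minimal sketch of that baseline, estimated from a hypothetical tagged training set:

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sents):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

train = [[("time", "N"), ("flies", "V")],
         [("fruit", "N"), ("flies", "N")],
         [("flies", "N")]]
most_likely = train_baseline(train)
print(most_likely["flies"])   # "N": it occurred twice as N, once as V
```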

30 Methodology: Error Analysis
Confusion matrix: e.g., which tags did we most often confuse with which other tags?
How much of the overall error does each confusion account for?
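For instance, a tiny sketch of tallying a confusion matrix from gold vs. predicted tag sequences (both hypothetical):

```python
from collections import Counter

def confusion(gold, pred):
    """Count (gold_tag, predicted_tag) pairs for the mis-tagged tokens."""
    return Counter((g, p) for g, p in zip(gold, pred) if g != p)

gold = ["N", "V", "Prep", "Det", "N", "V"]
pred = ["N", "V", "V",    "Det", "N", "N"]
errs = confusion(gold, pred)
total = sum(errs.values())
for (g, p), c in errs.most_common():
    print(f"gold {g} tagged as {p}: {c} ({c / total:.0%} of all errors)")
```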

31 More Complex Issues
Tag indeterminacy: when 'truth' isn't clear
  Caribbean cooking, child seat
Tagging multipart words: wouldn't → would/MD n't/RB
Unknown words:
  Assume all tags equally likely
  Assume the same tag distribution as all other singletons in the corpus
  Use morphology, word length, …
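To illustrate the morphology idea, a hedged sketch of guessing an unknown word's tag from surface cues; the suffix table is a toy assumption, not a real tagger's rule set:

```python
# Toy suffix heuristics for unknown words; real taggers learn such cues
# (suffixes, capitalization, word length) from the training data.
SUFFIX_GUESSES = [("ing", "VBG"), ("ed", "VBD"), ("ly", "RB"), ("s", "NNS")]

def guess_tag(word, default="NN"):
    if word[0].isupper():
        return "NNP"                    # capitalized: guess proper noun
    for suffix, tag in SUFFIX_GUESSES:
        if word.endswith(suffix):
            return tag
    return default

print(guess_tag("blargling"))   # VBG
print(guess_tag("Klorbia"))     # NNP
```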

32 How do we build larger constituents?
Phrases of the same category behave similarly
What are the syntactic constituents? How do we identify them?

33 Basic Constituents and Rewrite Rules
S → NP VP
NP → DET NOM
NP → PropN
NOM → N | NOM N
DET → a | an | the
PropN → George | Morocco
N → cat | box
VP → V NP
VP → V
V → exploded

34 More Constituents and Rules
VP → V PP
PP → Prep NP
Prep → at | over | under | in | by
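To try these rules out, a sketch using NLTK's CFG tools (assuming the nltk package is installed); the grammar string simply combines slides 33 and 34:

```python
import nltk

grammar = nltk.CFG.fromstring("""
  S -> NP VP
  NP -> DET NOM | PropN
  NOM -> N | NOM N
  VP -> V NP | V | V PP
  PP -> Prep NP
  DET -> 'a' | 'an' | 'the'
  PropN -> 'George' | 'Morocco'
  N -> 'cat' | 'box'
  V -> 'exploded'
  Prep -> 'at' | 'over' | 'under' | 'in' | 'by'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the cat exploded in Morocco".split()):
    tree.pretty_print()
```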

35 How to write a grammar
Scenario: You are a lowly programmer in IT at a major financial institution in NYC. Your boss tells you the department needs to port data from an old database, in which the person-name field was not divided into multiple fields (title, first name, middle name, surname, suffix), to a new, modern database.
Your task: separate these names into their proper fields for the new database.
What do you do?

36 Solutions
Go through the old database names one at a time and type them into the new db
Create a script with regular expressions to search for names with different components and write each out into a standard set of fields
Build an FST to process the names and output field-separated components
Write a context-free grammar to parse the names into their constituents

37 A Name Grammar
Name → Title Firstname Middlename Surname Honorific
Name → Firstname Middlename Surname Honorific
Name → Firstname Middlename Surname
Name → Title Firstname MiddleInitial Surname Honorific
…

38 A Better Name Grammar
Name → Title Basename Suffix
Name → Basename Suffix
Basename → Firstname Middle Surname
Middle → Middlename
Middle → MiddleInitial
Title → Mr. | Mrs. | Ms. | Miss | Dr. | Gen. | …
Suffix → Jr. | Sr. | Esq. | DDS | …
…
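The same grammar in NLTK form; the extra Name alternatives and the lexical entries for first/middle/surnames are hypothetical fillers standing in for the slide's "…":

```python
import nltk

name_grammar = nltk.CFG.fromstring("""
  Name -> Title Basename Suffix | Basename Suffix | Title Basename | Basename
  Basename -> Firstname Middle Surname | Firstname Surname
  Middle -> Middlename | MiddleInitial
  Title -> 'Mr.' | 'Mrs.' | 'Ms.' | 'Miss' | 'Dr.' | 'Gen.'
  Suffix -> 'Jr.' | 'Sr.' | 'Esq.' | 'DDS'
  Firstname -> 'John' | 'Mary'
  Middlename -> 'Quincy'
  MiddleInitial -> 'Q.'
  Surname -> 'Public' | 'Smith'
""")

parser = nltk.ChartParser(name_grammar)
for tree in parser.parse(["Dr.", "John", "Q.", "Public", "Jr."]):
    print(tree)   # (Name (Title Dr.) (Basename ...) (Suffix Jr.))
```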

39 Next Class
How do we use CFGs for parsing?
Read Chapter 11

