Some Advances in Transformation-Based Part of Speech Tagging


Some Advances in Transformation-Based Part of Speech Tagging
Eric Brill

A Maximum Entropy Approach to Identifying Sentence Boundaries
Jeffrey C. Reynar and Adwait Ratnaparkhi

Presenter: Sawood Alam <salam@cs.odu.edu>

Some Advances in Transformation-Based Part of Speech Tagging
Spoken Language Systems Group, Laboratory for Computer Science
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139
brill@goldilocks.lcs.mit.edu

Introduction
- Stochastic tagging
- Trainable rule-based tagger
- Relevant linguistic information captured with simple non-stochastic rules
- Lexical relationships in tagging
- Rule-based approach to tagging unknown words
- Extended into a k-best tagger

Markov-Model Based Taggers
Choose the tag sequence that maximizes:
Prob(word|tag) * Prob(tag|previous n tags)
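The product above can be illustrated with a minimal sketch. The probability tables below are invented for illustration (a real tagger estimates them from a corpus, and searches over all tag sequences with Viterbi rather than scoring two by hand):

```python
# Toy sketch of the Markov-model tagging objective: score a candidate tag
# sequence as the product of P(word|tag) * P(tag|previous tag) (bigram case).
# All probabilities here are hypothetical, not estimated from any corpus.
emit = {("the", "DET"): 0.6, ("can", "NOUN"): 0.001, ("can", "MD"): 0.3,
        ("rusts", "VBZ"): 0.2}
trans = {("<s>", "DET"): 0.5, ("DET", "NOUN"): 0.6, ("DET", "MD"): 0.01,
         ("NOUN", "VBZ"): 0.4, ("MD", "VBZ"): 0.05}

def score(words, tags):
    p, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        # multiply emission and transition probabilities along the sequence
        p *= emit.get((w, t), 1e-6) * trans.get((prev, t), 1e-6)
        prev = t
    return p

words = ["the", "can", "rusts"]
noun_p = score(words, ["DET", "NOUN", "VBZ"])  # noun reading of "can"
md_p = score(words, ["DET", "MD", "VBZ"])      # modal reading of "can"
```

With these toy tables, the transition probabilities favor the noun reading of "can" after a determiner even though its emission probability is low.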

Stochastic Tagging
- Avoids laborious manual rule construction
- Linguistic information is only captured indirectly

Transformation-Based Error-Driven Learning
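The error-driven learning loop can be sketched as follows. This is a minimal illustration over a toy corpus with one hypothetical rule template (change a tag when the previous tag matches), not Brill's full template set:

```python
# Minimal sketch of transformation-based error-driven learning: greedily
# pick, from a small candidate rule set, the rule that most reduces tagging
# errors against the gold standard, until no rule improves the tagging.
def apply_rule(tags, rule):
    frm, to, prev_tag = rule  # template: change frm -> to if previous tag is prev_tag
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == frm and tags[i - 1] == prev_tag:
            out[i] = to
    return out

def errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))

def learn(tags, gold, candidate_rules):
    learned = []
    while True:
        best = min(candidate_rules, key=lambda r: errors(apply_rule(tags, r), gold))
        if errors(apply_rule(tags, best), gold) >= errors(tags, gold):
            return learned  # no rule reduces the error count any further
        tags = apply_rule(tags, best)
        learned.append(best)

initial = ["PRP", "MD", "NN"]   # "she can swim", with "swim" mistagged as noun
gold = ["PRP", "MD", "VB"]
rules = [("NN", "VB", "MD"), ("NN", "JJ", "DET")]  # hypothetical candidates
learned = learn(initial, gold, rules)
```

The learner selects the noun-to-verb-after-modal rule and then stops, since the corpus is tagged perfectly.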

An Earlier Transformation-Based Tagger
- Initially assign each word its most likely tag based on the training corpus
- Unknown words are tagged based on some features
- Change tag a to b when:
  - The preceding/following word is tagged z
  - The word two before/after is tagged z
  - One of the two/three preceding/following words is tagged z
  - The preceding word is tagged z and the following word is tagged w
  - The preceding/following word is tagged z and the word two before/after is tagged w
- Example: change from noun to verb if the previous word is a modal

Lexicalizing the Tagger
- Change tag a to tag b when:
  - The preceding/following word is w
  - The word two before/after is w
  - One of the two preceding/following words is w
  - The current word is w and the preceding/following word is x
  - The current word is w and the preceding/following word is tagged z
- Examples:
  - Change from preposition to adverb if the word two positions to the right is "as"
  - Change from non-3rd-person singular present verb to base-form verb if one of the previous two words is "n't"
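The first example rule can be sketched directly. The Penn-style tag names (IN for preposition, RB for adverb) are assumptions for illustration:

```python
# Sketch of one lexicalized transformation from the slide: change a
# preposition (IN) to an adverb (RB) when the word two positions to the
# right is "as" -- capturing constructions like "as tall as ...".
def lexical_rule(words, tags):
    out = list(tags)
    for i in range(len(words) - 2):
        if tags[i] == "IN" and words[i + 2] == "as":
            out[i] = "RB"
    return out

words = ["as", "tall", "as", "her"]
tags = ["IN", "JJ", "IN", "PRP"]  # initial tagger calls both "as" tokens IN
fixed = lexical_rule(words, tags)
```

Only the first "as" is retagged, because only it has "as" two positions to its right.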

Comparison of Tagging Accuracy With No Unknown Words

Method                      Training Corpus Size  # of Rules or Context. Probs.  Acc. (%)
Stochastic                  64 K                  6,170                          96.3
Stochastic                  1 Million             10,000                         96.7
Rule-Based w/o Lex. Rules   600 K                 219                            96.9
Rule-Based With Lex. Rules  600 K                 267                            97.2

Unknown Words
Change the tag of an unknown word (from X) to Y if:
- Deleting the prefix x, |x| <= 4, results in a word (x is any character string of length 1 to 4)
- The first (1, 2, 3, or 4) characters of the word are x
- Deleting the suffix x, |x| <= 4, results in a word
- The last (1, 2, 3, or 4) characters of the word are x
- Adding the character string x as a suffix results in a word (|x| <= 4)
- Adding the character string x as a prefix results in a word (|x| <= 4)
- Word W ever appears immediately to the left/right of the word
- Character Z appears in the word
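Two of these templates can be sketched as predicates. The tiny lexicon standing in for the training vocabulary is hypothetical:

```python
# Sketch of two unknown-word templates from the slide: "deleting the
# suffix x results in a word" and "the last characters of the word are x".
# The lexicon is a tiny stand-in for the known training vocabulary.
lexicon = {"walk", "play", "dog"}

def deleting_suffix_gives_word(word, suffix):
    # fires when stripping a suffix of length 1..4 leaves a known word
    return (1 <= len(suffix) <= 4 and word.endswith(suffix)
            and word[:-len(suffix)] in lexicon)

def has_suffix(word, suffix):
    # fires on the surface suffix alone, with no lexicon check
    return word.endswith(suffix)

fires_ed = deleting_suffix_gives_word("walked", "ed")   # "walk" is known
fires_red = deleting_suffix_gives_word("red", "ed")     # "r" is not known
fires_ing = has_suffix("playing", "ing")
```

The two templates differ in precision: the first requires the stem to be a known word, while the second fires on any matching character string.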

Unknown Words: Learned Rules
Change tag:
- From common noun to plural common noun if the word has suffix "-s"
- From common noun to number if the word has character "."
- From common noun to adjective if the word has character "-"
- From common noun to past participle verb if the word has suffix "-ed"
- From common noun to gerund or present participle verb if the word has suffix "-ing"
- To adjective if adding the suffix "-ly" results in a word
- To adverb if the word has suffix "-ly"
- From common noun to number if the word "$" ever appears immediately to the left
- From common noun to adjective if the word has suffix "-al"
- From noun to base-form verb if the word "would" ever appears immediately to the left

K-Best Tags
Modify "change" to "add" in the transformation templates, so a word can carry several candidate tags.
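The "add" semantics can be sketched by letting each word carry a set of tags. The rule template (add a tag when the previous word matches) is a hypothetical instance for illustration:

```python
# Sketch of the k-best variant: transformations *add* a candidate tag
# instead of replacing the current one, widening each word's tag set.
def add_rule(tag_sets, words, rule):
    frm, to, prev_word = rule  # hypothetical template: add `to` where `frm`
    out = [set(s) for s in tag_sets]  # is present and previous word matches
    for i in range(1, len(words)):
        if frm in tag_sets[i] and words[i - 1] == prev_word:
            out[i].add(to)
    return out

words = ["would", "run"]
tag_sets = [{"MD"}, {"NN"}]
widened = add_rule(tag_sets, words, ("NN", "VB", "would"))
```

After the rule fires, "run" keeps both candidate tags, which is what trades accuracy against the average number of tags per word in the results table.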

k-Best Tagging Results

# of Rules  Accuracy (%)  Avg. # of Tags per Word
            96.5          1.00
50          96.9          1.02
100         97.4          1.04
150         97.9          1.10
200         98.4          1.19
250         99.1          1.50

Future Work
Apply these techniques to other problems:
- Learning pronunciation networks for speech recognition
- Learning mappings between sentences and semantic representations

A Maximum Entropy Approach to Identifying Sentence Boundaries
Jeffrey C. Reynar and Adwait Ratnaparkhi
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, Pennsylvania, USA
{jcreynar, adwait}@unagi.cis.upenn.edu

Introduction
- Many freely available natural language processing tools require their input to be divided into sentences, but make no mention of how to accomplish this.
- Punctuation marks such as ".", "?", and "!" can be ambiguous.
- Abbreviations are an issue, e.g., "The president lives in Washington, D.C."

Previous Work
To disambiguate sentence boundaries, earlier systems used a decision tree (99.8% accuracy on the Brown corpus) or a neural network (98.5% accuracy on the WSJ corpus).

Approach
- Potential sentence boundaries: ".", "?", and "!"
- Contextual information:
  - The Prefix
  - The Suffix
  - The presence of particular characters in the Prefix or Suffix
  - Whether the Candidate is an honorific (e.g., Ms., Dr., Gen.)
  - Whether the Candidate is a corporate designator (e.g., Corp., S.p.A., L.L.C.)
  - Features of the word to the left/right of the Candidate
- List of abbreviations
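Feature extraction for one candidate can be sketched as follows. The tokenization into separate punctuation tokens and the small honorific list are assumptions for illustration:

```python
# Sketch of contextual feature extraction for a candidate punctuation mark:
# the Prefix is the token before it, the Suffix the token after it, plus a
# few indicator features. HONORIFICS is a small hypothetical stand-in.
HONORIFICS = {"Ms.", "Dr.", "Gen.", "Mr."}

def features(tokens, i):
    prefix = tokens[i - 1] if i > 0 else ""
    suffix = tokens[i + 1] if i + 1 < len(tokens) else ""
    return {
        "candidate": tokens[i],
        "prefix": prefix,
        "suffix": suffix,
        # does prefix + candidate form an honorific like "Dr."?
        "prefix_is_honorific": prefix + tokens[i] in HONORIFICS,
        "suffix_capitalized": suffix[:1].isupper(),
    }

tokens = ["He", "met", "Dr", ".", "Smith", "."]
feats = features(tokens, 3)  # candidate "." after "Dr"
```

Here the honorific feature fires, evidence that this period is an abbreviation rather than a sentence boundary, even though the following word is capitalized.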

Maximum Entropy
Maximize the entropy:
H(p) = -Σ_{b,c} p(b,c) log p(b,c)
under the constraints:
Σ_{b,c} p(b,c) f_j(b,c) = Σ_{b,c} p'(b,c) f_j(b,c), 1 <= j <= k
A potential boundary is labeled a sentence boundary when p(yes|c) > 0.5, where:
p(yes|c) = p(yes,c) / (p(yes,c) + p(no,c))
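The decision rule and the entropy being maximized can be illustrated numerically. The joint probabilities below are invented for illustration; a real system derives them from the fitted exponential model:

```python
import math

# Sketch of the decision rule: the model assigns joint probabilities
# p(b, c) for b in {yes, no}; classification renormalizes over the two
# outcomes for the observed context c.
def conditional_yes(p_yes_c, p_no_c):
    # p(yes|c) = p(yes,c) / (p(yes,c) + p(no,c))
    return p_yes_c / (p_yes_c + p_no_c)

def entropy(joint):
    # H(p) = -sum over (b, c) of p(b,c) log p(b,c)
    return -sum(p * math.log(p) for p in joint.values() if p > 0)

p_yes = conditional_yes(0.03, 0.01)  # hypothetical joint probabilities
is_boundary = p_yes > 0.5
```

With these numbers p(yes|c) = 0.75, so the candidate is labeled a sentence boundary.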

System Performance

                    WSJ     Brown
Sentences           20,478  51,672
Candidate P. Marks  32,173  61,282
Accuracy            98.8%   97.9%
False Positives     201     750
False Negatives     171     506

Conclusions
Achieved accuracy comparable to state-of-the-art systems with far fewer resources.