Ch 10 Part-of-Speech Tagging Edited from: L. Venkata Subramaniam February 28, 2002.


Part-of-Speech Tagging Tagging is the task of labeling (or tagging) each word in a sentence with its appropriate part of speech. The[AT] representative[NN] put[VBD] chairs[NNS] on[IN] the[AT] table[NN]. Tagging is a case of limited syntactic disambiguation. Many words have more than one syntactic category. Tagging has limited scope: we just fix the syntactic categories of words and do not do a complete parse.

Information Sources in Tagging How do we decide the correct POS for a word? Syntagmatic information: look at the tags of other words in the context of the word we are interested in. Lexical information: predict a tag from the word itself. Even words with several possible parts of speech usually occur predominantly as one particular POS.

Markov Model Taggers We look at the sequence of tags in a text as a Markov chain. Limited horizon: P(X_{i+1} = t^j | X_1, …, X_i) = P(X_{i+1} = t^j | X_i). Time invariant (stationary): P(X_{i+1} = t^j | X_i) = P(X_2 = t^j | X_1).

The Visible Markov Model Tagging Algorithm The MLE of tag t^k following tag t^j is obtained from the training corpus: a_{jk} = P(t^k | t^j) = C(t^j, t^k) / C(t^j). The probability of a word being emitted by a tag: b_{jl} = P(w^l | t^j) = C(w^l, t^j) / C(t^j). The best tagging sequence t_{1,n} for a sentence w_{1,n}:

argmax_{t_{1,n}} P(t_{1,n} | w_{1,n}) = argmax_{t_{1,n}} P(w_{1,n} | t_{1,n}) P(t_{1,n}) / P(w_{1,n})
= argmax_{t_{1,n}} P(w_{1,n} | t_{1,n}) P(t_{1,n})
= argmax_{t_{1,n}} ∏_{i=1}^{n} P(w_i | t_i) P(t_i | t_{i-1})
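The MLE counts above can be sketched in a few lines of Python. The two-sentence tagged corpus below is invented purely for illustration; a real tagger would estimate from a large treebank.

```python
from collections import Counter

# A tiny hand-made tagged corpus (hypothetical sentences), used only to
# illustrate the counting; the tags follow the Brown-style tagset above.
corpus = [
    [("the", "AT"), ("representative", "NN"), ("put", "VBD"),
     ("chairs", "NNS"), ("on", "IN"), ("the", "AT"), ("table", "NN")],
    [("the", "AT"), ("chairs", "NNS"), ("put", "VBD"),
     ("the", "AT"), ("representative", "NN"), ("on", "IN"), ("edge", "NN")],
]

tag_count = Counter()          # C(t^j)
transition_count = Counter()   # C(t^j, t^k): tag t^k follows tag t^j
emission_count = Counter()     # C(w^l, t^j): word w^l tagged t^j

for sentence in corpus:
    prev_tag = "PERIOD"        # sentence boundary acts as a start tag
    tag_count[prev_tag] += 1
    for word, tag in sentence:
        transition_count[(prev_tag, tag)] += 1
        emission_count[(word, tag)] += 1
        tag_count[tag] += 1
        prev_tag = tag

def p_transition(t_k, t_j):
    """MLE of P(t^k | t^j) = C(t^j, t^k) / C(t^j)."""
    return transition_count[(t_j, t_k)] / tag_count[t_j]

def p_emission(w_l, t_j):
    """MLE of P(w^l | t^j) = C(w^l, t^j) / C(t^j)."""
    return emission_count[(w_l, t_j)] / tag_count[t_j]
```

For instance, AT occurs four times and is followed by NN three times, so p_transition("NN", "AT") comes out to 0.75.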

The Viterbi Algorithm We determine the optimal tags for a sentence:
1. comment: Given a sentence of length n
2. comment: Initialization
3. δ_1(PERIOD) = 1.0
4. δ_1(t) = 0.0 for t ≠ PERIOD
5. comment: Induction
6. for i := 1 to n step 1 do
7.   for all tags t^j do
8.     δ_{i+1}(t^j) := max_{1≤k≤T} [δ_i(t^k) P(w_{i+1} | t^j) P(t^j | t^k)]
9.     ψ_{i+1}(t^j) := argmax_{1≤k≤T} [δ_i(t^k) P(w_{i+1} | t^j) P(t^j | t^k)]
10.   end
11. end
12. comment: Termination and path read-out (backwards via ψ)
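The pseudocode above can be sketched directly in Python, assuming the transition and emission models are supplied as functions. This is a minimal illustration: real implementations work in log space to avoid numerical underflow on long sentences.

```python
def viterbi(words, tags, p_trans, p_emit):
    """Viterbi decoding for a Markov model tagger.

    words : observed words w_1..w_n
    tags  : the tag set (must include the PERIOD boundary tag)
    p_trans(t_j, t_k) : P(t^j | t^k), probability tag t^j follows t^k
    p_emit(w, t)      : P(w | t), probability tag t emits word w
    Returns the highest-probability tag sequence for the sentence.
    """
    n = len(words)
    # delta[i][t]: best probability of any tag sequence ending in t at i
    delta = [{t: 0.0 for t in tags} for _ in range(n + 1)]
    psi = [{} for _ in range(n + 1)]
    # Initialization: a sentence boundary (PERIOD) precedes the sentence
    delta[0] = {t: (1.0 if t == "PERIOD" else 0.0) for t in tags}
    # Induction
    for i in range(n):
        for t_j in tags:
            best_k, best_p = None, -1.0
            for t_k in tags:
                p = delta[i][t_k] * p_emit(words[i], t_j) * p_trans(t_j, t_k)
                if p > best_p:
                    best_k, best_p = t_k, p
            delta[i + 1][t_j] = best_p
            psi[i + 1][t_j] = best_k
    # Termination and path read-out (backwards via psi)
    last = max(tags, key=lambda t: delta[n][t])
    path = [last]
    for i in range(n, 1, -1):
        path.append(psi[i][path[-1]])
    return list(reversed(path))
```

The model functions can be the MLE estimates from the previous slide, or any other estimates; only their signatures matter here.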

Unknown Words Some tags are more common than others (a new word is most likely to be a verb or a noun, for example, but not a preposition or an article). Use features of the word (morphological and other cues; for example, words ending in -ed are likely to be past tense forms or past participles). Use context.
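These morphological cues can be sketched as a simple suffix-and-capitalization guesser. The suffix-to-tag table below is illustrative only, not estimated from a corpus.

```python
# Illustrative suffix -> tag-distribution table (not corpus-derived).
SUFFIX_TAGS = {
    "ed":  {"VBD": 0.6, "VBN": 0.4},   # walked: past tense / past participle
    "ing": {"VBG": 1.0},               # running
    "ly":  {"RB": 1.0},                # quickly
    "s":   {"NNS": 0.7, "VBZ": 0.3},   # tables / runs
}

def guess_tags(word):
    """Return a guessed tag distribution for an unknown word."""
    if word[0].isupper():
        return {"NP": 1.0}             # capitalized -> proper noun
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix):
            return SUFFIX_TAGS[suffix]
    return {"NN": 0.7, "VB": 0.3}      # open-class fallback
```

In a full tagger these guesses would replace the missing emission probabilities for out-of-vocabulary words, and would then be combined with context by the usual Viterbi search.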

Hidden Markov Model Taggers Often a large tagged training set is not available. We can use an HMM to learn the regularities of tag sequences in this case. The states of the HMM correspond to the tags, and the output alphabet consists of words in the dictionary or of classes of words.

Initialization of HMM Tagger (Jelinek, 1985) Output alphabet consists of words. Emission probabilities are given by:

b_{j.l} = b*_{j.l} C(w^l) / Σ_{w^m} b*_{j.m} C(w^m)

where the sum is over all words w^m in the dictionary, and

b*_{j.l} = 0 if t^j is not a part of speech allowed for w^l, and 1/T(w^l) otherwise, where T(w^l) is the number of tags allowed for w^l.
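The formula can be sketched directly. The tiny dictionary and word counts below are invented for illustration; the point is only how the dictionary constraint b* and the raw word frequencies combine.

```python
# Hypothetical dictionary (word -> allowed tags) and raw counts C(w^l).
dictionary = {
    "the":    {"AT"},
    "chairs": {"NNS", "VBZ"},
    "put":    {"VBD", "VBN", "VB"},
    "table":  {"NN", "VB"},
}
word_count = {"the": 100, "chairs": 5, "put": 8, "table": 10}

def b_star(tag, word):
    """b*_{j.l}: 1/T(w^l) if the tag is allowed for the word, else 0."""
    allowed = dictionary[word]
    return 1.0 / len(allowed) if tag in allowed else 0.0

def emission_init(tag, word):
    """b_{j.l} = b*_{j.l} C(w^l) / sum_m b*_{j.m} C(w^m)."""
    denom = sum(b_star(tag, w) * c for w, c in word_count.items())
    return b_star(tag, word) * word_count[word] / denom
```

For example, "the" is the only word the dictionary allows for AT, so emission_init("AT", "the") is 1.0, while the mass for VB is split between "put" and "table" in proportion to b* times frequency.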

Initialization (Cont.) (Kupiec, 1992) Output alphabet consists of word equivalence classes, i.e., metawords u_L, where L is a subset of the integers from 1 to T (T is the number of different tags in the tag set):

b_{j.L} = b*_{j.L} C(u_L) / Σ_{u_{L'}} b*_{j.L'} C(u_{L'})

where the sum in the denominator is over all metawords u_{L'}, and

b*_{j.L} = 0 if j is not in L, and 1/|L| otherwise, where |L| is the number of indices in L.

Training the HMM Once the initialization is completed, the HMM is trained using the Forward-Backward algorithm. Tagging using the HMM: use the Viterbi algorithm.

Transformation-Based Tagging Exploits a wider range of lexical and syntactic regularities: condition the tags on preceding words, not just preceding tags, and use more context than a bigram or trigram model.

Transformations A transformation consists of two parts: a triggering environment and a rewrite rule. Examples of some transformations learned in transformation-based tagging:

Source tag  Target tag  Triggering environment
NN          VB          previous tag is TO
VBP         VB          one of the previous three tags is MD
JJR         RBR         next tag is JJ
VBP         VB          one of the previous two words is n't

Learning Algorithm The learning algorithm selects the best transformations and determines their order of application. Initially, tag each word with its most frequent tag. Iteratively choose the transformation that reduces the error rate most. Stop when no remaining transformation reduces the error rate by more than a prespecified threshold.
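The loop above can be sketched as follows, restricted for brevity to rules triggered by the previous tag only; this is a simplification of Brill's full template set, which also includes triggers on following tags and on nearby words.

```python
def tbl_learn(words, gold_tags, baseline_tags, tagset, threshold=1):
    """Greedy transformation-based learning over prev-tag-triggered rules.

    Starts from the most-frequent-tag baseline and repeatedly picks the
    rule (from_tag, to_tag, prev_tag) that most reduces tagging errors,
    stopping when no rule gains at least `threshold`.
    """
    current = list(baseline_tags)
    rules = []
    while True:
        best_rule, best_gain = None, threshold - 1
        for frm in tagset:
            for to in tagset:
                if frm == to:
                    continue           # identity rules change nothing
                for prev in tagset:
                    gain = 0
                    for i in range(1, len(words)):
                        if current[i] == frm and current[i - 1] == prev:
                            if gold_tags[i] == to:
                                gain += 1      # this rewrite fixes an error
                            elif gold_tags[i] == frm:
                                gain -= 1      # this rewrite introduces one
                    if gain > best_gain:
                        best_rule, best_gain = (frm, to, prev), gain
        if best_rule is None or best_gain < threshold:
            break                      # no rule clears the threshold
        frm, to, prev = best_rule
        rules.append(best_rule)
        for i in range(1, len(words)):  # apply the winning rule
            if current[i] == frm and current[i - 1] == prev:
                current[i] = to
    return rules, current
```

On a toy input where the baseline mistags "run" as NN after TO, the learner recovers exactly the first rule from the table above (NN → VB when the previous tag is TO).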

Tagging Accuracy Ranges from 95% to 97%. Depends on: the amount of training data available; the tag set; the difference between the training corpus and dictionary and the corpus of application; unknown words in the corpus of application.

Applications of Tagging Partial parsing: syntactic analysis. Information extraction: tagging and partial parsing help identify useful terms and the relationships between them. Information retrieval: noun phrase recognition and query-document matching based on meaningful units rather than individual terms. Question answering: analyzing a query to determine what type of entity the user is looking for and how it relates to other noun phrases mentioned in the question.