Part-of-Speech Tagging (Artificial Intelligence Lab, Jeong Seong-won)

2 The beginning
The task of labeling (or tagging) each word in a sentence with its appropriate part of speech.
– The representative put chairs on the table
  AT NN VBD NNS IN AT NN
– Using the Brown/Penn tag sets
A problem of limited scope
– Instead of constructing a complete parse, fix the syntactic categories of the words in a sentence
Tagging is a limited but useful application.
– Information extraction
– Question answering
– Shallow parsing

3 The Information Sources in Tagging
Syntagmatic: look at the tags assigned to nearby words; some combinations are highly likely while others are highly unlikely or impossible
– ex) a new play
  AT JJ NN (likely)
  AT JJ VBP (unlikely)
Lexical: look at the word itself (90% accuracy just by picking the most likely tag for each word)
– ex) play is more likely to be a noun than a verb

4 Notation
– w_i : the word at position i in the corpus
– t_i : the tag of w_i
– w_{i,i+m} : the words occurring at positions i through i+m
– t_{i,i+m} : the tags t_i ... t_{i+m} for w_i ... w_{i+m}
– w^l : the l-th word in the lexicon
– t^j : the j-th tag in the tag set
– C(w^l) : the number of occurrences of w^l in the training set
– C(t^j) : the number of occurrences of t^j in the training set
– C(t^j, t^k) : the number of occurrences of t^j followed by t^k
– C(w^l, t^j) : the number of occurrences of w^l that are tagged as t^j
– T : the number of tags in the tag set
– W : the number of words in the lexicon
– n : sentence length

5 The Probabilistic Model (I)
The sequence of tags in a text is treated as a Markov chain.
– A word's tag only depends on the previous tag (Limited horizon)
– The dependency does not change over time (Time invariance)
In compact notation, the Limited Horizon property: P(t_{i+1} | t_{1,i}) = P(t_{i+1} | t_i)

6 The Probabilistic Model (II)
Maximum likelihood estimates from a tagged training corpus:
– probability of tag t^k following tag t^j: P(t^k | t^j) = C(t^j, t^k) / C(t^j)
– probability of word w^l given tag t^j: P(w^l | t^j) = C(w^l, t^j) / C(t^j)

7 The Probabilistic Model (III)
The final equation: choose the tag sequence that maximizes P(t_{1,n} | w_{1,n}), i.e.
argmax_{t_{1,n}} P(t_{1,n} | w_{1,n}) = argmax_{t_{1,n}} Π_{i=1}^{n} P(w_i | t_i) P(t_i | t_{i-1})
(We define P(t_1 | t_0) = 1.0 to simplify our notation)

8 The Probabilistic Model (III)
Algorithm for training a Visible Markov Model tagger:
Syntagmatic probabilities:
for all tags t^j do
  for all tags t^k do
    P(t^k | t^j) = C(t^j, t^k) / C(t^j)
  end
end
Lexical probabilities:
for all tags t^j do
  for all words w^l do
    P(w^l | t^j) = C(w^l, t^j) / C(t^j)
  end
end
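A minimal Python sketch of this training step, assuming the tagged corpus is a list of sentences of (word, tag) pairs; the function name and the use of PERIOD as the sentence-boundary tag follow the slides' convention but are otherwise illustrative, not the original implementation.

from collections import defaultdict

def train_vmm_tagger(tagged_sentences):
    # Estimate P(t^k | t^j) and P(w^l | t^j) by maximum likelihood.
    # tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    tag_count = defaultdict(int)                          # C(t^j)
    tag_bigram = defaultdict(lambda: defaultdict(int))    # C(t^j, t^k)
    word_tag = defaultdict(lambda: defaultdict(int))      # C(w^l, t^j)

    for sentence in tagged_sentences:
        prev = 'PERIOD'                  # treat the sentence boundary as a PERIOD tag
        for word, tag in sentence:
            tag_count[tag] += 1
            tag_bigram[prev][tag] += 1
            word_tag[tag][word] += 1
            prev = tag

    # Syntagmatic probabilities: P(t^k | t^j) = C(t^j, t^k) / C(t^j)
    transition = {
        t_j: {t_k: c / sum(nexts.values()) for t_k, c in nexts.items()}
        for t_j, nexts in tag_bigram.items()
    }
    # Lexical probabilities: P(w^l | t^j) = C(w^l, t^j) / C(t^j)
    emission = {
        t_j: {w: c / tag_count[t_j] for w, c in words.items()}
        for t_j, words in word_tag.items()
    }
    return transition, emission

For example, train_vmm_tagger([[('the', 'AT'), ('representative', 'NN'), ('put', 'VBD'), ('chairs', 'NNS'), ('on', 'IN'), ('the', 'AT'), ('table', 'NN'), ('.', 'PERIOD')]]) returns the two probability tables used by the tagger.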

9 The Probabilistic Model (IV)
[Table of tag transition probabilities P(second tag | first tag) over the tags AT, BEZ, IN, NN, VB, PERIOD; rows give the first tag, columns the second tag; the numerical values were not preserved in the transcript.]

10 The Probabilistic Model (V)
[Table of lexical probabilities P(word | tag) for the words bear, is, move, on, president, progress, the under the tags AT, BEZ, IN, NN, VB, PERIOD; the numerical values were not preserved in the transcript.]

11 The Viterbi algorithm
comment: Given: a sentence of length n
comment: Initialization
δ_1(PERIOD) = 1.0
δ_1(t) = 0.0 for t ≠ PERIOD
comment: Induction
for i := 1 to n step 1 do
  for all tags t^j do
    δ_{i+1}(t^j) := max_{1≤k≤T} [δ_i(t^k) · P(w_{i+1} | t^j) · P(t^j | t^k)]
    ψ_{i+1}(t^j) := argmax_{1≤k≤T} [δ_i(t^k) · P(w_{i+1} | t^j) · P(t^j | t^k)]
  end
end
comment: Termination and path readout
X_{n+1} = argmax_{1≤j≤T} δ_{n+1}(t^j)
for j := n to 1 step -1 do
  X_j = ψ_{j+1}(X_{j+1})
end
P(X_1, ..., X_n) = max_{1≤j≤T} δ_{n+1}(t^j)
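A compact Python sketch of this decoder, reusing the transition/emission dictionaries from the training sketch above; the function name and the PERIOD boundary tag are illustrative assumptions, not part of the original slides.

def viterbi_tag(words, tags, transition, emission, boundary='PERIOD'):
    # delta[i][t]: probability of the best tag sequence ending in tag t at position i
    # psi[i][t]:   the predecessor tag achieving that best probability
    delta = [{boundary: 1.0}]        # initialization: start in the boundary tag
    psi = [{}]

    for word in words:               # induction over the sentence
        delta_i, psi_i = {}, {}
        for t in tags:
            best_prev, best_score = None, 0.0
            for prev, prev_score in delta[-1].items():
                score = (prev_score
                         * transition.get(prev, {}).get(t, 0.0)
                         * emission.get(t, {}).get(word, 0.0))
                if score > best_score:
                    best_prev, best_score = prev, score
            if best_prev is not None:
                delta_i[t], psi_i[t] = best_score, best_prev
        delta.append(delta_i)
        psi.append(psi_i)

    # termination and path readout
    if not delta[-1]:
        return []                    # no surviving path (e.g. unseen word, no smoothing)
    last = max(delta[-1], key=delta[-1].get)
    path = [last]
    for i in range(len(words), 1, -1):
        path.append(psi[i][path[-1]])
    return list(reversed(path))

For example, viterbi_tag(['the', 'representative', 'put', 'chairs', 'on', 'the', 'table', '.'], ['AT', 'NN', 'VBD', 'NNS', 'IN', 'PERIOD'], transition, emission) returns the highest-probability tag sequence under the bigram model.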

12 Variations (I)
Unknown words
– Unknown words are a major problem for taggers
– The simplest model for unknown words: assume that they can be of any part of speech
– Use morphological and other surface information (see the sketch after this slide)
  Past tense form: words ending in -ed
  Capitalization
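A small illustrative sketch of such suffix and capitalization heuristics; the specific candidate tags returned are assumptions for illustration, not rules taken from the slides.

def guess_unknown_tags(word):
    # Restrict the candidate tag set for an unseen word using surface clues,
    # instead of allowing every part of speech.
    if word[0].isupper():
        return ['NP']                # capitalized: likely a proper noun
    if word.endswith('ed'):
        return ['VBD', 'VBN']        # -ed: past tense or past participle
    if word.endswith('ing'):
        return ['VBG']               # -ing: gerund / present participle
    if word.endswith('s'):
        return ['NNS', 'VBZ']        # -s: plural noun or 3rd-person verb
    return ['NN', 'VB', 'JJ']        # otherwise fall back to common open-class tags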

13 Variations (II)
Trigram taggers
– The basic Markov Model tagger is a bigram tagger
– A trigram tagger has a two-tag memory and can disambiguate more cases
Interpolation and variable memory
– A trigram tagger may make worse predictions than a bigram tagger because of sparse data
– Linear interpolation of unigram, bigram, and trigram estimates (see the sketch below)
– Variable Memory Markov Model
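A minimal sketch of linear interpolation of tag n-gram estimates; the lambda weights shown are arbitrary placeholders (in practice they would be estimated, for example on held-out data), and the dictionary layout is an assumption for illustration.

def interpolated_tag_prob(t, prev2, prev1, p_uni, p_bi, p_tri,
                          lambdas=(0.1, 0.3, 0.6)):
    # P(t | prev2, prev1) as a weighted mix of unigram, bigram and trigram MLE estimates.
    # p_uni[t], p_bi[(prev1, t)], p_tri[(prev2, prev1, t)] hold the raw estimates;
    # the lambda weights must sum to 1.
    l1, l2, l3 = lambdas
    return (l1 * p_uni.get(t, 0.0)
            + l2 * p_bi.get((prev1, t), 0.0)
            + l3 * p_tri.get((prev2, prev1, t), 0.0))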

14 Variations (III)
Smoothing
– e.g. add-one style smoothing of the lexical estimates: P(t^j | w^l) = (C(w^l, t^j) + 1) / (C(w^l) + K^l), where K^l is the number of possible parts of speech of w^l
Reversibility
– A Markov model tagger gives the same result whether it decodes from left to right or from right to left

15 Variations (IV)
Maximum likelihood: sequence vs. tag by tag
– Viterbi algorithm: maximize P(t_{1,n} | w_{1,n})
– Alternative: maximize P(t_i | w_{1,n}) for each i, which amounts to summing over the different tag sequences
– ex) Time flies like an arrow.
  a. NN VBZ RB AT NN .  P(.) = 0.01
  b. NN NNS VB AT NN .  P(.) = 0.01
  c. NN NNS RB AT NN .  P(.) =
  d. NN VBZ VB AT NN .  P(.) = 0
– With tag-by-tag maximization, one error does not affect the tagging of other words

16 Applying HMMs to POS tagging (I)
If we have no tagged training data, we can use an HMM to learn the regularities of tag sequences.
An HMM consists of the following elements:
– a set of states (= tags)
– an output alphabet (words or classes of words)
– initial state probabilities
– state transition probabilities
– symbol emission probabilities

17 Applying HMMs to POS tagging (II)
Jelinek's method: initialize the output (emission) probabilities from a dictionary
– b_{j.l} : probability that word (or word class) l is emitted by tag j
– b_{j.l} = b*_{j.l} C(w^l) / Σ_m b*_{j.m} C(w^m), where b*_{j.l} = 0 if t^j is not an allowed tag for w^l and 1/T(w^l) otherwise (T(w^l) is the number of tags allowed for w^l)

18 Applying HMMs to POS tagging (III)
Kupiec's method: group words into equivalence (ambiguity) classes
– all words that can take the same set of tags L are pooled into one metaword u_L, and emission parameters are estimated per class rather than per word (see the sketch below)
– |L| is the number of indices (tags) in L
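A small sketch of grouping words into ambiguity classes in the spirit of Kupiec's method; the dictionary format and names are assumptions for illustration only.

from collections import defaultdict

def ambiguity_classes(lexicon):
    # lexicon: dict mapping word -> set of possible tags, e.g. {'play': {'NN', 'VB'}}.
    # Returns a dict mapping a frozenset of tags L -> list of words in that class,
    # so emission parameters can be estimated per class instead of per word.
    classes = defaultdict(list)
    for word, tags in lexicon.items():
        classes[frozenset(tags)].append(word)
    return classes

# example
lexicon = {'play': {'NN', 'VB'}, 'work': {'NN', 'VB'}, 'the': {'AT'}, 'table': {'NN'}}
print(ambiguity_classes(lexicon))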

19 Transformation-Based Learning of Tags
Markov assumptions are too crude → transformation-based tagging
– exploits a wider range of lexical and syntactic regularities
– an order of magnitude fewer decisions to learn
Two key components
– a specification of which 'error-correcting' transformations are admissible
– the learning algorithm

20 Transformations (I)
A transformation consists of two parts:
– a rewrite rule of the form t1 → t2 : replace tag t1 by t2
– a triggering environment

21 Transformations (II)
Triggering environments can be conditioned on a combination of words and tags
Morphology-triggered transformations
– ex) Replace NN by NNS if the unknown word's suffix is -s
Example transformations:
Source tag | Target tag | Triggering environment
NN  | VB  | previous tag is TO
VBP | VB  | one of the previous three tags is MD
JJR | RBR | next tag is JJ
VBP | VB  | one of the previous two words is n't

22 The learning algorithm
C_0 := corpus with each word tagged with its most frequent tag
for k := 0 step 1 do
  ν := the transformation u_i that minimizes E(u_i(C_k))
  if (E(C_k) - E(ν(C_k))) < ε then break fi
  C_{k+1} := ν(C_k)
  τ_{k+1} := ν
end
Output sequence: τ_1, ..., τ_k
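A simplified Python sketch of this greedy learning loop; the transformation representation, error function, and threshold are illustrative assumptions rather than Brill's exact implementation.

def tbl_learn(initial_tags, gold_tags, candidate_transformations, epsilon=1):
    # initial_tags: tag sequence with each word given its most frequent tag (C_0).
    # gold_tags:    reference tags used to compute the error E(C).
    # candidate_transformations: list of functions mapping a tag sequence to a new one.
    # Returns the ordered list of learned transformations τ_1, ..., τ_k.
    def error(tags):
        return sum(1 for t, g in zip(tags, gold_tags) if t != g)

    learned = []
    current = list(initial_tags)
    while True:
        # pick the transformation that yields the lowest error on the current corpus
        best = min(candidate_transformations, key=lambda u: error(u(current)))
        if error(current) - error(best(current)) < epsilon:
            break                       # stop when no transformation helps enough
        current = best(current)
        learned.append(best)
    return learned

At tagging time the learned transformations are replayed in order on new text that has first been tagged with each word's most frequent tag.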

23 Relation to other models
Decision trees
– similarity to transformation-based learning: a series of relabelings
– difference from transformation-based learning: a decision tree splits the data at each node, so a different sequence of transformations is applied at each node
Probabilistic models in general

24 Automata
Transformation-based tagging has a rule component, but it also has a quantitative component (used during learning).
Once learning is complete, transformation-based tagging is purely symbolic.
A transformation-based tagger can therefore be converted into another symbolic object
– Roche and Schabes (1995): a finite-state transducer
– Advantage: speed

25 Other Methods, Other Languages
Other approaches to tagging
– covered in chapter 16
Languages other than English
– In many other languages, word order is much freer
– The rich inflections of a word contribute more information about its part of speech

26 Tagging accuracy
95%~97% when calculated over all words
Factors that influence accuracy
– The amount of training data available
– The tag set
– The difference between training set and test set
– Unknown words
A 'dumb' tagger
– Always chooses a word's most frequent tag
– Accuracy of about 90%
EngCG (a constraint-grammar-based tagger) is reported to reach higher accuracy

27 Applications of tagging
Applications that benefit from syntactically disambiguated text
Partial parsing
– Finding noun phrases in a sentence
Information extraction
– Finding values for the predefined slots of a template
– Finding good indexing terms in information retrieval
Question answering
– Returning an appropriate noun phrase such as a location, a person, or a date