88-6801 Natural Language Processing, Lesson 5: POS Tagging Algorithms


Ido Dagan, Department of Computer Science, Bar-Ilan University

Supervised Learning Scheme
(Diagram: "labeled" examples feed a training algorithm, which produces a classification model; a classification algorithm applies the model to new examples and outputs classifications.)

Transformation-Based Learning (TBL) for Tagging
– Introduced by Brill (1995)
– Can exploit a wider range of lexical and syntactic regularities via transformation rules, each made of a triggering environment and a rewrite rule
– Tagger:
  – Construct an initial tag sequence for the input: the most frequent tag for each word
  – Iteratively refine the tag sequence by applying "transformation rules" in rank order
– Learner:
  – Construct an initial tag sequence for the training corpus
  – Loop until done: try all possible rules, compare the results to the known tags, apply the best rule r* to the sequence and add it to the rule ranking

Some Examples
1. Change NN to VB if the previous tag is TO: to/TO conflict/NN → VB
2. Change VBP to VB if MD is among the previous three tags: might/MD vanish/VBP → VB
3. Change NN to VB if MD is among the previous two tags: might/MD reply/NN → VB
4. Change VB to NN if DT is among the previous two tags: the/DT reply/VB → NN
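The tagger side of TBL can be sketched as follows: assign each word its most frequent tag, then apply the ranked rules in order. This is a minimal sketch, assuming a hypothetical `most_frequent` word-to-tag dictionary and a default tag NN for unseen words; the helper names (`prev_tag_is`, `tag_in_previous`, `tbl_tag`) are illustrative, not from the lecture. Here a rule's changes are visible to later positions in the same pass, which is one of the possible application conventions.

```python
from typing import Callable, Dict, List, Tuple

# A trigger inspects the current tag sequence at position i.
Trigger = Callable[[List[str], int], bool]
Rule = Tuple[str, str, Trigger]  # (from_tag, to_tag, trigger)


def prev_tag_is(z: str) -> Trigger:
    """Template: the preceding tag is z."""
    return lambda tags, i: i >= 1 and tags[i - 1] == z


def tag_in_previous(z: str, n: int) -> Trigger:
    """Template: one of the n previous tags is z."""
    return lambda tags, i: z in tags[max(0, i - n):i]


# The four example rules, in rank order.
EXAMPLE_RULES: List[Rule] = [
    ("NN", "VB", prev_tag_is("TO")),
    ("VBP", "VB", tag_in_previous("MD", 3)),
    ("NN", "VB", tag_in_previous("MD", 2)),
    ("VB", "NN", tag_in_previous("DT", 2)),
]


def tbl_tag(words: List[str], most_frequent: Dict[str, str],
            rules: List[Rule], default: str = "NN") -> List[str]:
    # Initial state: most frequent tag per word (hypothetical default for unseen words).
    tags = [most_frequent.get(w, default) for w in words]
    # Apply each rule in rank order, scanning the sentence left to right.
    for from_tag, to_tag, trigger in rules:
        for i in range(len(tags)):
            if tags[i] == from_tag and trigger(tags, i):
                tags[i] = to_tag
    return tags
```

For instance, with "to" mapped to TO and "conflict" to NN, `tbl_tag(["to", "conflict"], ...)` starts from TO NN and rule 1 rewrites the second tag to VB.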

Transformation Templates
Templates specify which transformations are possible. For example, change tag A to tag B when:
1. The preceding (following) tag is Z
2. The tag two before (after) is Z
3. One of the two preceding (following) tags is Z
4. One of the three preceding (following) tags is Z
5. The preceding tag is Z and the following tag is W
6. The preceding (following) tag is Z and the tag two before (after) is W

Lexicalization
New templates add dependence on the surrounding words (not just tags). Change tag A to tag B when:
1. The preceding (following) word is w
2. The word two before (after) is w
3. One of the two preceding (following) words is w
4. The current word is w
5. The current word is w and the preceding (following) word is v
6. The current word is w and the preceding (following) tag is X (note: a word-tag combination)
7. etc.
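As a sketch of a lexicalized template, the function below instantiates template 6 (current word w and preceding tag X). It assumes sentences are given as parallel word and tag lists; the helper name and the example rule (NN → VB for "reply" after MD) are hypothetical illustrations.

```python
from typing import Callable, List

RuleFn = Callable[[List[str], List[str]], List[str]]


def current_word_and_prev_tag(from_tag: str, to_tag: str,
                              word: str, prev_tag: str) -> RuleFn:
    """Hypothetical instance of template 6: change from_tag to to_tag when the
    current word is `word` and the preceding tag is `prev_tag`."""
    def apply(words: List[str], tags: List[str]) -> List[str]:
        out = list(tags)
        for i in range(1, len(words)):
            # The trigger reads the input tags, not the partially updated output.
            if tags[i] == from_tag and words[i] == word and tags[i - 1] == prev_tag:
                out[i] = to_tag
        return out
    return apply
```

Unlike the purely tag-based rules, this rule fires only for the word "reply", so it cannot disturb other nouns that follow a modal.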

Initializing Unseen Words
How do we choose the most likely tag for unseen words? Transformation-based approach:
– Start with NP for capitalized words and NN for all others
– Learn "morphological" transformations: change the tag from X to Y if:
  1. Deleting the prefix (suffix) x results in a known word
  2. The first (last) characters of the word are x
  3. Adding x as a prefix (suffix) results in a known word
  4. Word W ever appears immediately before (after) the word
  5. Character Z appears in the word
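A minimal sketch of the initial guess for unseen words and two of the morphological templates (1 and 2). The two "learned" rules inside `tag_unknown` (NN → NNS when deleting "s" yields a known word, NN → JJ for words ending in "able") are hypothetical examples of what the learner might produce, not rules from the lecture; `lexicon` stands for the set of words seen in training.

```python
from typing import Set


def initial_unknown_tag(word: str) -> str:
    """Initial guess: NP for capitalized words, NN otherwise."""
    return "NP" if word[:1].isupper() else "NN"


def deleting_suffix_gives_known_word(word: str, suffix: str, lexicon: Set[str]) -> bool:
    """Template 1: deleting suffix x results in a known word."""
    return bool(suffix) and word.endswith(suffix) and word[: -len(suffix)] in lexicon


def last_chars_are(word: str, suffix: str) -> bool:
    """Template 2: the last characters of the word are x."""
    return word.endswith(suffix)


def tag_unknown(word: str, lexicon: Set[str]) -> str:
    tag = initial_unknown_tag(word)
    # Hypothetical learned rule: NN -> NNS if deleting "s" gives a known word.
    if tag == "NN" and deleting_suffix_gives_known_word(word, "s", lexicon):
        tag = "NNS"
    # Hypothetical learned rule: NN -> JJ if the word ends in "able".
    if tag == "NN" and last_chars_are(word, "able"):
        tag = "JJ"
    return tag
```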

TBL Learning Scheme
(Diagram: unannotated input text passes through the initial-state setter to produce annotated text; the learning algorithm compares the annotated text with the ground truth for the input text and outputs rules.)

Greedy Learning Algorithm
– Initial tagging of the training corpus: the most frequent tag per word
– At each iteration:
  – Identify rules that fix errors and compute the "error reduction" of each transformation rule: #errors fixed − #errors introduced
  – Find the best rule; if its error reduction is greater than a threshold (to avoid overfitting):
    – Apply the best rule to the training corpus
    – Append the best rule to the ordered list of transformations
  – Otherwise, stop
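The greedy loop can be sketched with a single template, "change A to B if the preceding tag is Z". Assumptions: the training corpus is treated as one flat tag sequence, candidate rules are instantiated only at current error positions, and a rule's triggers are read from the tags before the rule is applied. The threshold must be non-negative so that every accepted rule strictly reduces the error count and the loop terminates. All names are illustrative.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, str, str]  # (A, B, Z): change A to B if the preceding tag is Z


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    a, b, z = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Triggers are read from the input tags (delayed application).
        if tags[i] == a and tags[i - 1] == z:
            out[i] = b
    return out


def error_reduction(tags: List[str], gold: List[str], rule: Rule) -> int:
    """#errors fixed - #errors introduced."""
    new = apply_rule(tags, rule)
    fixed = sum(1 for t, n, g in zip(tags, new, gold) if t != g and n == g)
    introduced = sum(1 for t, n, g in zip(tags, new, gold) if t == g and n != g)
    return fixed - introduced


def learn_tbl(tags: List[str], gold: List[str],
              threshold: int = 0) -> Tuple[List[Rule], List[str]]:
    tags = list(tags)
    learned: List[Rule] = []
    while True:
        # Instantiate the template at every current error.
        candidates: Set[Rule] = {
            (tags[i], gold[i], tags[i - 1])
            for i in range(1, len(tags)) if tags[i] != gold[i]
        }
        if not candidates:
            break
        best = max(sorted(candidates), key=lambda r: error_reduction(tags, gold, r))
        if error_reduction(tags, gold, best) <= threshold:
            break  # no rule clears the overfitting threshold
        tags = apply_rule(tags, best)
        learned.append(best)
    return learned, tags
```

With initial tags TO NN DT NN TO NN against gold TO VB DT NN TO VB, the learner finds the single rule (NN, VB, TO), which fixes two errors and introduces none, and then stops.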

Stochastic POS Tagging
POS tagging: for a given sentence $W = w_1 \dots w_n$, find the matching POS tags $T = t_1 \dots t_n$.
In a statistical framework: $T' = \arg\max_T P(T \mid W)$

Bayes' Rule
Derivation steps (the equations themselves are not preserved in this transcript):
– Bayes' rule
– Words are independent of each other
– A word's identity depends only on its own tag
– Markovian assumptions
– The denominator doesn't depend on the tags
– Chain rule
– Notation: $P(t_1) = P(t_1 \mid t_0)$
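The steps listed above fit the standard bigram-HMM derivation, reconstructed below as a sketch (it is the textbook form consistent with these annotations, not a copy of the original slide). Here $t_0$ is the start tag from the notation above.

```latex
\begin{align*}
T' &= \arg\max_{T} P(T \mid W) \\
   &= \arg\max_{T} \frac{P(W \mid T)\, P(T)}{P(W)}
      && \text{Bayes' rule} \\
   &= \arg\max_{T} P(W \mid T)\, P(T)
      && \text{denominator doesn't depend on tags} \\
P(W \mid T) &\approx \prod_{i=1}^{n} P(w_i \mid t_i)
      && \text{words independent; each depends only on its own tag} \\
P(T) &= \prod_{i=1}^{n} P(t_i \mid t_1, \dots, t_{i-1})
      && \text{chain rule} \\
     &\approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})
      && \text{Markov assumption, } P(t_1) = P(t_1 \mid t_0) \\
\Rightarrow\; T' &\approx \arg\max_{T} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
\end{align*}
```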

The Markovian Assumptions
– Limited horizon: $P(X_{i+1} = t^k \mid X_1, \dots, X_i) = P(X_{i+1} = t^k \mid X_i)$
– Time invariance: $P(X_{i+1} = t^k \mid X_i) = P(X_{j+1} = t^k \mid X_j)$ for all positions $i, j$

Maximum Likelihood Estimation
To estimate $P(w_i \mid t_i)$ and $P(t_i \mid t_{i-1})$ we can use maximum likelihood estimation:
– $P(w_i \mid t_i) = c(w_i, t_i) / c(t_i)$
– $P(t_i \mid t_{i-1}) = c(t_{i-1} t_i) / c(t_{i-1})$
Notice the estimation for $i = 1$, which relies on the start tag $t_0$.
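The two relative-frequency estimates can be computed in one pass over a tagged corpus. This sketch assumes the corpus is a list of sentences of (word, tag) pairs and uses hypothetical boundary tags `<s>` (playing the role of $t_0$) and `</s>`; adding the end transition makes every tag occurrence count as a left context, so the transition denominator equals $c(t_{i-1})$.

```python
from collections import Counter
from typing import Dict, List, Tuple

START, END = "<s>", "</s>"  # hypothetical boundary tags; START acts as t_0

Probs = Dict[Tuple[str, str], float]


def mle_estimates(corpus: List[List[Tuple[str, str]]]) -> Tuple[Probs, Probs]:
    """Return (P(w|t) keyed by (w, t), P(t|prev) keyed by (prev, t))."""
    emit: Counter = Counter()        # c(w_i, t_i)
    tag_count: Counter = Counter()   # c(t_i)
    trans: Counter = Counter()       # c(t_{i-1} t_i)
    prev_count: Counter = Counter()  # c(t_{i-1}) as a left context
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            emit[(word, tag)] += 1
            tag_count[tag] += 1
            trans[(prev, tag)] += 1
            prev_count[prev] += 1
            prev = tag
        # Close the sentence so the last tag is also counted as a left context.
        trans[(prev, END)] += 1
        prev_count[prev] += 1
    p_emit = {(w, t): c / tag_count[t] for (w, t), c in emit.items()}
    p_trans = {(p, t): c / prev_count[p] for (p, t), c in trans.items()}
    return p_emit, p_trans
```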

Unknown Words
– Many words will not appear in the training corpus; unknown words are a major problem for taggers (!)
– Solutions:
  – Incorporate morphological analysis
  – Treat words appearing once in the training data as UNKNOWNs
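The second solution can be sketched as a preprocessing step: words seen only once in training are replaced by a placeholder, so the estimated $P(\text{UNK} \mid t)$ reflects which tags rare words tend to take. The token `<UNK>` and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

UNK = "<UNK>"  # hypothetical placeholder token for unknown words

Sentence = List[Tuple[str, str]]


def replace_rare_words(corpus: List[Sentence],
                       min_count: int = 2) -> Tuple[List[Sentence], Set[str]]:
    """Map words seen fewer than min_count times (min_count=2: seen once) to UNK."""
    counts = Counter(w for sent in corpus for w, _ in sent)
    vocab = {w for w, c in counts.items() if c >= min_count}
    new_corpus = [[(w if w in vocab else UNK, t) for w, t in sent] for sent in corpus]
    return new_corpus, vocab


def map_unknown(words: List[str], vocab: Set[str]) -> List[str]:
    """At tagging time, out-of-vocabulary words share the UNK statistics."""
    return [w if w in vocab else UNK for w in words]
```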

"Add-1/Add-Constant" Smoothing

Smoothing for Tagging
– For $P(t_i \mid t_{i-1})$
– Optionally, for $P(w_i \mid t_i)$
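A sketch of add-constant (add-λ) smoothing, here applied to the transition probabilities: $P(t_i \mid t_{i-1}) = (c(t_{i-1} t_i) + \lambda) / (c(t_{i-1}) + \lambda |T|)$, where $|T|$ is the number of possible next tags and $\lambda = 1$ gives add-1. The same function works for emissions by passing (tag, word) counts and the vocabulary as the outcome set. It assumes the context counts equal the sum of the joint counts over the outcomes, which is what makes the smoothed values sum to 1.

```python
from collections import Counter
from typing import Hashable, Sequence


def add_lambda_prob(joint: Counter, context: Counter, outcomes: Sequence[Hashable],
                    ctx: Hashable, outcome: Hashable, lam: float = 1.0) -> float:
    """Add-lambda estimate of P(outcome | ctx); lam=1.0 is add-1 (Laplace).

    (c(ctx, outcome) + lam) / (c(ctx) + lam * |outcomes|)
    """
    return (joint[(ctx, outcome)] + lam) / (context[ctx] + lam * len(outcomes))
```

For example, with $c(\text{DT NN}) = 3$, $c(\text{DT JJ}) = 1$, $c(\text{DT}) = 4$ and three possible tags, add-1 gives $P(\text{NN} \mid \text{DT}) = 4/7$ and the unseen $P(\text{VB} \mid \text{DT}) = 1/7$ instead of 0.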

Viterbi
The most probable tag sequence can be found with the Viterbi algorithm, a dynamic program that keeps only the best path ending in each tag at each position; there is no need to calculate every single possible tag sequence (!)
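A minimal Viterbi sketch for a bigram HMM, working in log space to avoid underflow. It assumes probability dictionaries keyed as `p_trans[(prev, t)]` and `p_emit[(w, t)]`, a hypothetical start tag `<s>`, and, for simplicity, ignores the transition into an end state. Each step looks only at the previous column, so the cost is $O(n \cdot |T|^2)$ rather than $|T|^n$.

```python
import math
from typing import Dict, List, Tuple

Probs = Dict[Tuple[str, str], float]


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words: List[str], tags: List[str], p_trans: Probs, p_emit: Probs,
            start: str = "<s>") -> List[str]:
    """Most probable tag sequence under a bigram HMM (end state ignored)."""
    if not words:
        return []
    # delta[t]: best log-probability of a tag path for the words so far, ending in t
    delta = {t: _log(p_trans.get((start, t), 0.0)) + _log(p_emit.get((words[0], t), 0.0))
             for t in tags}
    backptrs: List[Dict[str, str]] = []
    for w in words[1:]:
        new_delta: Dict[str, float] = {}
        bp: Dict[str, str] = {}
        for t in tags:
            # Best predecessor for tag t, using only the previous column.
            s = max(tags, key=lambda p: delta[p] + _log(p_trans.get((p, t), 0.0)))
            new_delta[t] = (delta[s] + _log(p_trans.get((s, t), 0.0))
                            + _log(p_emit.get((w, t), 0.0)))
            bp[t] = s
        delta = new_delta
        backptrs.append(bp)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: delta[t])]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    return path[::-1]
```

In a toy model where "time" is NN, $P(\text{VBZ} \mid \text{NN}) = 0.6$, $P(\text{NNS} \mid \text{NN}) = 0.4$, $P(\text{flies} \mid \text{VBZ}) = 0.1$ and $P(\text{flies} \mid \text{NNS}) = 0.2$, Viterbi prefers NN NNS, since $0.4 \cdot 0.2 > 0.6 \cdot 0.1$.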

HMMs
Assume a state machine with:
– Nodes that correspond to tags
– A start and an end state
– Arcs corresponding to transition probabilities, $P(t_i \mid t_{i-1})$
– A set of observation likelihoods for each state, $P(w_i \mid t_i)$

(Figure: an HMM state diagram over the tags NN, VBZ, NNS, AT, VB, RB, with emission probabilities attached to states, e.g. P(like)=0.2, P(fly)=0.3, …, P(eat)=…; P(likes)=0.3, P(flies)=0.1, …, P(eats)=0.5; P(the)=0.4, P(a)=0.3, P(an)=0.2, ….)

HMMs
An HMM is similar to an automaton augmented with probabilities. Note that the states in an HMM do not correspond to the input symbols: the input symbols don't uniquely determine the next state.

HMM Definition
HMM = (S, K, A, B)
– Set of states $S = \{s_1, \dots, s_n\}$
– Output alphabet $K = \{k_1, \dots, k_m\}$
– State transition probabilities $A = \{a_{ij}\}$, $i, j \in S$
– Symbol emission probabilities $B = b(i, k)$, $i \in S$, $k \in K$
– Start and end states (non-emitting); alternatively, initial state probabilities
Note: for a given $i$, $\sum_j a_{ij} = 1$ and $\sum_k b(i, k) = 1$
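The definition maps naturally onto a small data structure. This sketch takes the "initial state probabilities" alternative instead of non-emitting start and end states, and its `is_valid` method checks the normalization constraints from the note: the initial distribution, each row of A, and each row of B must be probability distributions. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class HMM:
    """HMM = (S, K, A, B), using initial state probabilities instead of a start state."""
    states: Set[str]                # S
    alphabet: Set[str]              # K
    initial: Dict[str, float]       # pi(i) = P(first state is i)
    a: Dict[str, Dict[str, float]]  # a[i][j] = P(next state j | state i)
    b: Dict[str, Dict[str, float]]  # b[i][k] = P(emit symbol k | state i)

    def is_valid(self, tol: float = 1e-9) -> bool:
        """Check sum_j a_ij = 1 and sum_k b(i, k) = 1 for every state i, and that pi sums to 1."""
        def is_dist(d: Dict[str, float], support: Set[str]) -> bool:
            return (set(d) <= support
                    and all(p >= 0 for p in d.values())
                    and abs(sum(d.values()) - 1.0) <= tol)

        return (is_dist(self.initial, self.states)
                and all(is_dist(self.a.get(i, {}), self.states) for i in self.states)
                and all(is_dist(self.b.get(i, {}), self.alphabet) for i in self.states))
```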

Why Hidden?
Because we only observe the input; the underlying states are hidden.
Decoding: part-of-speech tagging can be viewed as a decoding problem. Given an observation sequence $W = w_1, \dots, w_n$, find a state sequence $T = t_1, \dots, t_n$ that best explains the observation.
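To make the decoding problem concrete, the sketch below solves it by exhaustive search: it scores every tag sequence by $P(W, T) = \prod_i P(t_i \mid t_{i-1}) P(w_i \mid t_i)$ and returns the best one. Maximizing $P(W, T)$ is equivalent to maximizing $P(T \mid W)$ because $P(W)$ is fixed for the given sentence. It assumes the same hypothetical dictionary layout and `<s>` start tag as above, and it enumerates $|T|^n$ sequences, which is exactly the cost that Viterbi avoids.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Probs = Dict[Tuple[str, str], float]


def sequence_prob(words: Sequence[str], tag_seq: Sequence[str], p_trans: Probs,
                  p_emit: Probs, start: str = "<s>") -> float:
    """P(W, T) = prod_i P(t_i | t_{i-1}) * P(w_i | t_i), with t_0 = start."""
    prob, prev = 1.0, start
    for w, t in zip(words, tag_seq):
        prob *= p_trans.get((prev, t), 0.0) * p_emit.get((w, t), 0.0)
        prev = t
    return prob


def brute_force_decode(words: Sequence[str], tags: Sequence[str],
                       p_trans: Probs, p_emit: Probs) -> List[str]:
    """Decoding by exhaustive search over all len(tags) ** len(words) sequences."""
    best = max(product(tags, repeat=len(words)),
               key=lambda seq: sequence_prob(words, seq, p_trans, p_emit))
    return list(best)
```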

Homework