
1 Part of Speech Tagging
Some examples, with the candidate tags for each word:
– The/DT students/NN went/VB to/P class/NN
– Plays/{VB,NN} well/{ADV,NN} with/P others/NN
– Fruit/NN flies/{NN,VB} like/{VB,P} a/DT banana/NN

2 Probabilistic POS Tagging
Addresses the ambiguity problem
– Use probabilities to find the more likely tag sequence
Some popular approaches:
– Transformational tagger
– Maximum Entropy
– Hidden Markov Model

3 Problem Setup
There are M types of POS tags
– Tag set: {t_1,..,t_M}
The word vocabulary size is V
– Vocabulary set: {w_1,..,w_V}
We have a word sequence of length n: W = w1,w2,…,wn
Want to find the best sequence of POS tags: T = t1,t2,…,tn
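Stated as an objective (a restatement of the slide's setup, using the W and T notation above for the word and tag sequences):

```latex
\hat{T} \;=\; \arg\max_{T = t_1 \ldots t_n} \Pr(T \mid W),
\qquad W = w_1 \ldots w_n .
```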

4 Noisy Channel Framework
P(T|W) is awkward to estimate directly, but Bayes' rule lets us rewrite it in terms of P(W|T) and P(T), as shown below.
Can cast the problem in terms of the noisy channel model
– The POS tag sequence T is the source
– Through the “noisy channel,” the tag sequence is transformed into the observed English words W.
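The Bayes-rule step written out; since Pr(W) is the same for every candidate tag sequence, it drops out of the argmax:

```latex
\hat{T} = \arg\max_{T} \Pr(T \mid W)
        = \arg\max_{T} \frac{\Pr(W \mid T)\,\Pr(T)}{\Pr(W)}
        = \arg\max_{T} \Pr(W \mid T)\,\Pr(T) .
```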

5 Model for POS Tagging
Need to compute Pr(W|T) and Pr(T)
Make Markov assumptions to simplify:
– Generation of each word wi depends only on its tag ti, not on the previous words
– Generation of each tag ti depends only on its immediate predecessor ti-1
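Under these two assumptions the model factors as follows (writing t0 = *start* for the dummy predecessor of the first tag):

```latex
\Pr(W \mid T) \approx \prod_{i=1}^{n} \Pr(w_i \mid t_i),
\qquad
\Pr(T) \approx \prod_{i=1}^{n} \Pr(t_i \mid t_{i-1}) .
```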

6 POS Model in Terms of HMM
The states of the HMM represent POS tags
The output alphabet corresponds to the English vocabulary
[notation: ti is the ith tag in a tag sequence, t_i represents the ith tag in the tag set {t_1,..,t_M}]
π_i: [p(t_i|*start*)] prob of starting on state t_i
a_ij: [p(t_j|t_i)] prob of going from t_i to t_j
b_jk: [p(w_k|t_j)] prob of output vocab w_k at state t_j
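A minimal Python sketch of how these parameters might be stored and used to score a tagged sentence under the factorization from slide 5; the dictionary layout and the toy probabilities are illustrative assumptions, not values from the slides:

```python
# HMM parameters for a toy two-tag model (illustrative values only).
pi = {"NN": 0.6, "VB": 0.4}                      # pi_i  = p(t_i | *start*)
A  = {"NN": {"NN": 0.3, "VB": 0.7},              # a_ij  = p(t_j | t_i)
      "VB": {"NN": 0.8, "VB": 0.2}}
B  = {"NN": {"fruit": 0.5, "flies": 0.5},        # b_jk  = p(w_k | t_j)
      "VB": {"flies": 0.6, "like": 0.4}}

def joint_prob(words, tags):
    """Pr(W, T) = Pr(T) * Pr(W|T) under the Markov assumptions of slide 5."""
    prob = pi[tags[0]] * B[tags[0]].get(words[0], 0.0)
    for s in range(1, len(words)):
        prob *= A[tags[s - 1]][tags[s]] * B[tags[s]].get(words[s], 0.0)
    return prob

print(joint_prob(["fruit", "flies"], ["NN", "NN"]))   # 0.045
print(joint_prob(["fruit", "flies"], ["NN", "VB"]))   # 0.126
```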

7 Learning the Parameters with Annotated Corpora
Values for the model parameters are unknown
– Suppose we have pairs of sequences W = w1,w2,…,wn and T = t1,t2,…,tn such that T are the correct tags for W
How to estimate the parameters? Maximum Likelihood Estimate
– Just count co-occurrences (a counting sketch follows below)
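A minimal sketch of the "just count co-occurrences" estimate in Python, assuming a tiny hand-made tagged corpus (the sentences reuse the slide-1 examples, with one possible tagging); the relative frequencies correspond to π_i, a_ij, and b_jk from slide 6:

```python
from collections import Counter

# Tiny illustrative tagged corpus: list of (word, tag) sentences.
corpus = [
    [("the", "DT"), ("students", "NN"), ("went", "VB"), ("to", "P"), ("class", "NN")],
    [("fruit", "NN"), ("flies", "VB"), ("like", "P"), ("a", "DT"), ("banana", "NN")],
]

start, trans, emit, tag_count = Counter(), Counter(), Counter(), Counter()
for sent in corpus:
    start[sent[0][1]] += 1
    for s, (word, tag) in enumerate(sent):
        tag_count[tag] += 1
        emit[(tag, word)] += 1
        if s > 0:
            trans[(sent[s - 1][1], tag)] += 1

# Maximum likelihood estimates = relative frequencies of the counts.
pi = {t: c / len(corpus) for t, c in start.items()}                  # p(t | *start*)
A  = {(ti, tj): c / tag_count[ti] for (ti, tj), c in trans.items()}  # p(tj | ti)
B  = {(t, w): c / tag_count[t] for (t, w), c in emit.items()}        # p(w | t)

print(pi["DT"], A[("DT", "NN")], B[("NN", "class")])
```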

8 Learning the Parameters without Annotated Corpora
Values for the model parameters are still unknown, and now we have no annotated tags for the word sequences
Need to search through the space of all possible parameters to find good values.
Expectation Maximization
– Learn the parameters through iterative refinement
– A form of greedy heuristic
– Guaranteed to find a locally optimal solution, but it may not be *the best* solution

9 The EM Algorithm (sketch)
Given (as training data): a collection of word sequences
Initialize all the parameters of the model to some random values
Repeat until convergence
– E-Step: compute the expected likelihood of generating all training sequences using the current model
– M-Step: update the parameters of the model to maximize the likelihood of generating the training sequences

10 An Inefficient HMM Training Algorithm
Initialize all parameters (π, A, B) to some random values
Repeat until convergence
E-Step:
  clear all count table entries
  for every training sequence W:
    Pr(W) := 0
    for all possible tag sequences T: compute Pr(W|T) and Pr(T); Pr(W) += Pr(W|T)Pr(T)
    for all possible tag sequences T:
      compute Pr(T|W)  /* Pr(T|W) = Pr(W|T)Pr(T)/Pr(W) */
      Count(t1|*start*) += Pr(T|W)
      for each position s = 1..n:  /* update all expected counts */
        Count(ws|ts) += Pr(T|W)
        if s < n: Count(ts+1|ts) += Pr(T|W)
M-Step:
  for all tags t_i: π_i := Count(t_i|*start*)/Count(*start*)
  for all pairs of tags t_i and t_j: a_ij := Count(t_j|t_i)/Count(t_i)  /* use expected counts collected */
  for all tag–word pairs t_j, w_k: b_jk := Count(w_k|t_j)/Count(t_j)
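A minimal runnable sketch of this brute-force loop in Python; the toy tag set, vocabulary, and corpus are made up for illustration. The inner itertools.product enumerates every possible tag sequence, which is what makes the algorithm exponential in sentence length:

```python
import itertools
import random
from collections import defaultdict

random.seed(0)
tags = ["NN", "VB"]
vocab = ["fruit", "flies", "like"]
corpus = [["fruit", "flies"], ["flies", "like"]]            # untagged training sentences

def normalise(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

# Initialize all parameters (pi, A, B) to some random values.
pi = normalise({t: random.random() for t in tags})
A = {ti: normalise({tj: random.random() for tj in tags}) for ti in tags}
B = {t: normalise({w: random.random() for w in vocab}) for t in tags}

def joint(words, T):
    """Pr(W, T) under the current parameters."""
    p = pi[T[0]] * B[T[0]][words[0]]
    for s in range(1, len(words)):
        p *= A[T[s - 1]][T[s]] * B[T[s]][words[s]]
    return p

for _ in range(20):                                         # "repeat until convergence"
    c_start, c_trans, c_emit = defaultdict(float), defaultdict(float), defaultdict(float)
    for words in corpus:                                    # E-step
        seqs = list(itertools.product(tags, repeat=len(words)))
        pr_w = sum(joint(words, T) for T in seqs)           # Pr(W)
        for T in seqs:
            post = joint(words, T) / pr_w                   # Pr(T | W)
            c_start[T[0]] += post
            for s in range(len(words)):
                c_emit[(T[s], words[s])] += post
                if s + 1 < len(words):
                    c_trans[(T[s], T[s + 1])] += post
    # M-step: turn the expected counts back into probabilities.
    pi = {t: c_start[t] / len(corpus) for t in tags}
    A = {ti: normalise({tj: c_trans[(ti, tj)] for tj in tags}) for ti in tags}
    B = {t: normalise({w: c_emit[(t, w)] for w in vocab}) for t in tags}

print(pi)
print(A)
print(B)
```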

11 Forward & Backward Equations
Forward: α_i(s)
– Pr(w1,w2,…,ws, t_i)
– Prob of outputting prefix w1..ws (through all possible paths) and landing on state (tag) t_i at time (position) s.
– Base case: α_i(1) = π_i b_i[w1]
– Inductive step: α_j(s+1) = Σ_i α_i(s) a_ij b_j[w(s+1)]
Backward: β_i(s)
– Pr(ws+1,…,wn | t_i)
– Prob of outputting suffix ws+1..wn (through all possible paths) knowing that we must be on state t_i at time (position) s.
– Base case: β_i(n) = 1
– Inductive step: β_i(s) = Σ_j a_ij b_j[w(s+1)] β_j(s+1)
Note: [ws] denotes the index k such that ws = w_k
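A minimal sketch of the two recursions in Python/numpy, assuming parameters stored as arrays (pi[i], A[i, j], B[j, k]) and a sentence given as a list of vocabulary indices; these names are illustrative, not from the slides:

```python
import numpy as np

def forward(pi, A, B, obs):
    """alpha[s, i] = Pr(obs[0..s], state i at step s); steps are 0-based."""
    n, M = len(obs), len(pi)
    alpha = np.zeros((n, M))
    alpha[0] = pi * B[:, obs[0]]                       # base case
    for s in range(1, n):                              # inductive step
        alpha[s] = (alpha[s - 1] @ A) * B[:, obs[s]]
    return alpha

def backward(A, B, obs):
    """beta[s, i] = Pr(obs[s+1..n-1] | state i at step s); steps are 0-based."""
    n, M = len(obs), A.shape[0]
    beta = np.ones((n, M))                             # base case: beta[n-1] = 1
    for s in range(n - 2, -1, -1):                     # inductive step
        beta[s] = A @ (B[:, obs[s + 1]] * beta[s + 1])
    return beta

# Toy check: Pr(W) computed two ways should agree.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
obs = [0, 1, 1]
alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
print(alpha[-1].sum(), (alpha[0] * beta[0]).sum())     # both equal Pr(W)
```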

12 More Fun with Forward & Backward Equations
Can use α and β to compute the prob of the word sequence Pr(W) at any time/position step s
Can also compute the prob of leaving state t_i at time step s
Can compute the prob of going from state t_i to t_j at time s
(The standard formulas for these three quantities are reproduced below.)
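These are the standard quantities used in Baum-Welch training; the formulas below are a reconstruction from the descriptions on the slide, writing γ_i(s) for the probability of being in (and hence leaving) state t_i at position s given W, ξ_ij(s) for the probability of going from t_i to t_j at position s given W, and b_j[w_{s+1}] for the output probability of the word at position s+1:

```latex
\Pr(W) = \sum_{i=1}^{M} \alpha_i(s)\,\beta_i(s) \quad \text{for any position } s,
\qquad
\gamma_i(s) = \frac{\alpha_i(s)\,\beta_i(s)}{\Pr(W)},
\qquad
\xi_{ij}(s) = \frac{\alpha_i(s)\,a_{ij}\,b_j[w_{s+1}]\,\beta_j(s+1)}{\Pr(W)} .
```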

13 Update Rules for Parameter Re-Estimation
Using the probability quantities defined in the previous slide (based on the forward and backward functions), we can get new values for the HMM parameters:
– π_i: prob of leaving state t_i at time step 1
– a_ij: total expected count of going from t_i to t_j, divided by the total expected count of leaving t_i
– b_ik: total expected count of t_i generating w_k, divided by the total expected count of leaving t_i
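In terms of γ and ξ from the previous slide, these descriptions correspond to the standard Baum-Welch re-estimation formulas (a reconstruction, not copied from the slide):

```latex
\hat{\pi}_i = \gamma_i(1),
\qquad
\hat{a}_{ij} = \frac{\sum_{s=1}^{n-1} \xi_{ij}(s)}{\sum_{s=1}^{n-1} \gamma_i(s)},
\qquad
\hat{b}_{ik} = \frac{\sum_{s:\, w_s = w_k} \gamma_i(s)}{\sum_{s=1}^{n} \gamma_i(s)} .
```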

14 Efficient Training of HMM
Initialization: same as before
Repeat until convergence
– E-Step: compute all forward and backward values α_i(s), β_i(s) /* where i = 1..M, s = 1..n */
– M-Step: update all parameters using the update rules on the previous slide
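For contrast with slide 10, a compact self-contained sketch of this efficient (forward-backward / Baum-Welch) loop in Python/numpy; all names and the toy data are assumptions for illustration, and the recursions repeat the ones sketched under slide 11:

```python
import numpy as np

rng = np.random.default_rng(0)
M, V = 2, 3                                   # number of tags, vocabulary size
corpus = [[0, 1, 1], [1, 2, 0, 1]]            # sentences as vocabulary indices

# Random initial parameters, rows normalised to sum to 1.
pi = rng.random(M); pi /= pi.sum()
A = rng.random((M, M)); A /= A.sum(axis=1, keepdims=True)
B = rng.random((M, V)); B /= B.sum(axis=1, keepdims=True)

for _ in range(30):                           # repeat until convergence (fixed count here)
    pi_num = np.zeros(M)
    A_num, A_den = np.zeros((M, M)), np.zeros(M)
    B_num, B_den = np.zeros((M, V)), np.zeros(M)
    for obs in corpus:
        n = len(obs)
        # E-step: forward and backward values for this sentence.
        alpha, beta = np.zeros((n, M)), np.ones((n, M))
        alpha[0] = pi * B[:, obs[0]]
        for s in range(1, n):
            alpha[s] = (alpha[s - 1] @ A) * B[:, obs[s]]
        for s in range(n - 2, -1, -1):
            beta[s] = A @ (B[:, obs[s + 1]] * beta[s + 1])
        pr_w = alpha[-1].sum()
        gamma = alpha * beta / pr_w           # gamma[s, i] = Pr(state i at s | W)
        pi_num += gamma[0]
        for s in range(n - 1):                # xi[i, j] = Pr(i at s, j at s+1 | W)
            xi = alpha[s][:, None] * A * (B[:, obs[s + 1]] * beta[s + 1])[None, :] / pr_w
            A_num += xi
            A_den += gamma[s]
        for s in range(n):
            B_num[:, obs[s]] += gamma[s]
            B_den += gamma[s]
    # M-step: re-estimate the parameters from the expected counts.
    pi = pi_num / len(corpus)
    A = A_num / A_den[:, None]
    B = B_num / B_den[:, None]

print(pi, A, B, sep="\n")
```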

