1
Pushpak Bhattacharyya CSE Dept., IIT Bombay
CS626/449: Natural Language Processing, Speech and the Web / Topics in AI
Lecture 31: POS Tagging (discussion to assist the CMU pronunciation dictionary assignment)
2
Lexicon Example
^_ Some_ People_ Jump_ High_ ._

Lexicon/Dictionary   Lexical Tag      Example
Some                 A (Adjective)    {Quantifier}
People               N (Noun)         lot of people
                     V (Verb)         peopled the city with soldiers
Jump                 V (Verb)         he jumped high
                     N (Noun)         This was a good jump
High                 R (Adverb)       He jumped high
                     A (Adjective)    high mountain
                     N (Noun)         Bombay high; on a high
3
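Bigram Assumption
Best tag sequence:

$$T^* = \arg\max_T P(T \mid W) = \arg\max_T P(T)\,P(W \mid T) \quad \text{(by Bayes' theorem; } P(W) \text{ is constant over } T\text{)}$$

Here t_0 = ^ is the sentence-start tag and t_{n+1} = . is the sentence-end tag.

$$
\begin{aligned}
P(T) &= P(t_0, t_1, t_2, \ldots, t_{n+1}) \\
     &= P(t_0)\,P(t_1 \mid t_0)\,P(t_2 \mid t_1 t_0)\,P(t_3 \mid t_2 t_1 t_0) \cdots P(t_n \mid t_{n-1} t_{n-2} \ldots t_0)\,P(t_{n+1} \mid t_n t_{n-1} \ldots t_0) \\
     &= P(t_0)\,P(t_1 \mid t_0)\,P(t_2 \mid t_1) \cdots P(t_n \mid t_{n-1})\,P(t_{n+1} \mid t_n) \quad \text{(Bigram Assumption)} \\
     &= \prod_{i=0}^{n+1} P(t_i \mid t_{i-1})
\end{aligned}
$$

In the product, the i = 0 factor P(t_0 | t_{-1}) stands for P(t_0).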
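The bigram factorisation of P(T) can be sketched in a few lines of Python. This is only an illustration: the table `BIGRAM` and its numbers are hypothetical, the function name `tag_sequence_probability` is not from the lecture, and P(t_0 = ^) is taken to be 1 on the assumption that every sentence starts with ^.

```python
from fractions import Fraction

# Hypothetical bigram table P(t_i | t_{i-1}); the numbers are illustrative only.
BIGRAM = {
    ("^", "N"): Fraction(1, 1),
    ("N", "V"): Fraction(2, 5),
    ("V", "R"): Fraction(1, 2),
    ("R", "."): Fraction(1, 3),
}


def tag_sequence_probability(tags, bigram, start="^"):
    """Return P(T) under the bigram assumption.

    Assumes tags[0] is the start marker and that P(t_0 = start) = 1.
    Bigrams missing from the table are treated as probability 0.
    """
    if not tags or tags[0] != start:
        raise ValueError("tag sequence must begin with the start marker")
    prob = Fraction(1)  # assumed P(t_0 = ^) = 1
    for prev, cur in zip(tags, tags[1:]):
        prob *= bigram.get((prev, cur), Fraction(0))
    return prob


p = tag_sequence_probability(["^", "N", "V", "R", "."], BIGRAM)
print(p)  # 1 * 2/5 * 1/2 * 1/3 = 1/15
```

Exact fractions are used so the product can be checked by hand; a real tagger would more likely work with log-probabilities to avoid underflow on long sentences.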
4
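Lexical Probability Assumption

$$
P(W \mid T) = P(w_0 \mid t_0 \ldots t_{n+1})\,P(w_1 \mid w_0,\, t_0 \ldots t_{n+1})\,P(w_2 \mid w_1 w_0,\, t_0 \ldots t_{n+1}) \cdots P(w_n \mid w_0 \ldots w_{n-1},\, t_0 \ldots t_{n+1})\,P(w_{n+1} \mid w_0 \ldots w_n,\, t_0 \ldots t_{n+1})
$$

Assumption: a word is determined completely by its tag. This is inspired by speech recognition.

$$
\begin{aligned}
P(W \mid T) &= P(w_0 \mid t_0)\,P(w_1 \mid t_1) \cdots P(w_{n+1} \mid t_{n+1}) \\
            &= \prod_{i=0}^{n+1} P(w_i \mid t_i) \\
            &= \prod_{i=1}^{n+1} P(w_i \mid t_i) \quad \text{(Lexical Probability Assumption)}
\end{aligned}
$$

The last step drops the i = 0 factor, since w_0 = ^ is the only word produced by t_0 = ^ and so P(w_0 | t_0) = 1.

Thus,

$$\arg\max_T P(T)\,P(W \mid T) = \arg\max_T \prod_{i=0}^{n+1} P(t_i \mid t_{i-1})\,P(w_i \mid t_i)$$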
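As a rough sketch of how the two assumptions combine, the snippet below scores one tagged sentence as the product of bigram and lexical probabilities. The tables `TRANSITION` and `EMISSION`, their values, and the function name `joint_score` are all hypothetical and chosen only so the arithmetic is easy to follow; P(t_0 = ^) is assumed to be 1.

```python
from fractions import Fraction

# Hypothetical P(t_i | t_{i-1}) values, for illustration only.
TRANSITION = {
    ("^", "N"): Fraction(1, 2),
    ("N", "V"): Fraction(1, 2),
    ("V", "R"): Fraction(1, 4),
    ("R", "."): Fraction(1, 2),
}

# Hypothetical P(w_i | t_i) values, keyed as (word, tag).
EMISSION = {
    ("^", "^"): Fraction(1),
    ("people", "N"): Fraction(1, 10),
    ("jump", "V"): Fraction(1, 5),
    ("high", "R"): Fraction(1, 4),
    (".", "."): Fraction(1),
}


def joint_score(words, tags, transition, emission):
    """Return prod_i P(t_i | t_{i-1}) * P(w_i | t_i), assuming P(t_0) = 1."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    # i = 0 term: only the lexical factor, because P(t_0) is assumed to be 1.
    score = emission.get((words[0], tags[0]), Fraction(0))
    for i in range(1, len(tags)):
        score *= transition.get((tags[i - 1], tags[i]), Fraction(0))
        score *= emission.get((words[i], tags[i]), Fraction(0))
    return score


words = ["^", "people", "jump", "high", "."]
tags = ["^", "N", "V", "R", "."]
s = joint_score(words, tags, TRANSITION, EMISSION)
print(s)  # (1/2 * 1/2 * 1/4 * 1/2) * (1/10 * 1/5 * 1/4) = 1/6400
```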
5
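Generative Model
^_^ People_N Jump_V High_R ._.
[Figure: tag lattice for the sentence, annotated with lexical probabilities and bigram probabilities; alternative tags shown include N and A.]
This model is called a generative model: the tags are the states, and the words are the observations generated from them. This is similar to an HMM.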
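To make the argmax over tag sequences concrete, here is a minimal brute-force sketch: it enumerates every tag sequence the lexicon allows for "People Jump High" and keeps the highest-scoring one. The candidate tags come from the lexicon slide, but every probability in `TRANSITION` and `EMISSION` is invented for illustration, and the function names are hypothetical. Exhaustive enumeration is used here only because the example is tiny; it is not presented as the lecture's decoding method.

```python
from itertools import product

# Candidate tags per word, taken from the lexicon example.
LEXICON = {"people": ["N", "V"], "jump": ["V", "N"], "high": ["R", "A", "N"]}

# Hypothetical P(t_i | t_{i-1}); missing pairs count as 0.
TRANSITION = {
    ("^", "N"): 0.6, ("^", "V"): 0.1,
    ("N", "V"): 0.5, ("N", "N"): 0.2, ("N", "R"): 0.05, ("N", "A"): 0.05,
    ("V", "R"): 0.3, ("V", "A"): 0.1, ("V", "N"): 0.3, ("V", "V"): 0.05,
    ("R", "."): 0.4, ("A", "."): 0.1, ("N", "."): 0.3,
}

# Hypothetical P(w_i | t_i), keyed as (word, tag); missing pairs count as 0.
EMISSION = {
    ("people", "N"): 0.01, ("people", "V"): 0.001,
    ("jump", "V"): 0.02, ("jump", "N"): 0.005,
    ("high", "R"): 0.01, ("high", "A"): 0.02, ("high", "N"): 0.001,
    (".", "."): 1.0,
}


def score(words, tags):
    """Product of bigram and lexical probabilities; words[0] and tags[0] are ^."""
    total = 1.0  # assumed P(t_0 = ^) * P(^ | ^) = 1
    for i in range(1, len(tags)):
        total *= TRANSITION.get((tags[i - 1], tags[i]), 0.0)
        total *= EMISSION.get((words[i], tags[i]), 0.0)
    return total


def best_tagging(words):
    """Return the tag sequence with the highest score, by exhaustive search."""
    framed_words = ["^", *words, "."]
    candidates = product(*(LEXICON[w] for w in words))
    best = max(candidates, key=lambda t: score(framed_words, ["^", *t, "."]))
    return list(best)


print(best_tagging(["people", "jump", "high"]))  # ['N', 'V', 'R'] with these made-up numbers
```

The search space grows exponentially with sentence length (here 2 x 2 x 3 = 12 sequences), which is why dynamic programming is normally preferred for HMM-style models.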
6
Bigram probabilities
7
Lexical Probability
8
Calculation from actual data
Corpus:
^ Ram got many NLP books. He found them all very interesting.

POS tagged:
^ N V A N N .
^ N V N A R A .
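The counting step for this corpus might look like the sketch below. It assumes each sentence is counted on its own, so no bigram is recorded from the final "." of one sentence to the "^" of the next; the names `TAGGED` and `count_tag_bigrams` are illustrative.

```python
from collections import Counter

# The two tagged sentences from the corpus, one tag list per sentence.
TAGGED = [
    ["^", "N", "V", "A", "N", "N", "."],
    ["^", "N", "V", "N", "A", "R", "A", "."],
]


def count_tag_bigrams(sentences):
    """Count adjacent tag pairs (previous tag, current tag) within each sentence."""
    counts = Counter()
    for tags in sentences:
        counts.update(zip(tags, tags[1:]))
    return counts


counts = count_tag_bigrams(TAGGED)
print(counts[("^", "N")])  # 2: both sentences start with a noun
print(counts[("N", "V")])  # 2
print(counts[("N", "N")])  # 1
```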
9
Recording numbers
[Table: tag-bigram counts over the tags ^, N, V, A, R, .; recoverable entries: 2, 1]
10
Probabilities
[Table: tag-bigram probabilities over the tags ^, N, V, A, R, .; recoverable entries: 1, 1/5, 2/5, 1/2, 1/3]
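One way to turn bigram counts into probabilities is the relative-frequency estimate P(b | a) = count(a, b) / count(a as the previous tag). The sketch below applies it to the two tagged sentences of the corpus; the function name `bigram_probabilities` is hypothetical, and counting each sentence separately is an assumption. The values 1, 1/5, 2/5, 1/2 and 1/3 all arise from this estimate, though which table cells they occupied on the slide is not stated in the text.

```python
from collections import Counter
from fractions import Fraction

# Tagged corpus from the earlier slide, one tag list per sentence.
TAGGED = [
    ["^", "N", "V", "A", "N", "N", "."],
    ["^", "N", "V", "N", "A", "R", "A", "."],
]


def bigram_probabilities(sentences):
    """Relative-frequency estimate of P(current tag | previous tag)."""
    pair_counts = Counter()
    prev_counts = Counter()
    for tags in sentences:
        for prev, cur in zip(tags, tags[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1
    return {
        (prev, cur): Fraction(n, prev_counts[prev])
        for (prev, cur), n in pair_counts.items()
    }


probs = bigram_probabilities(TAGGED)
print(probs[("^", "N")])  # 1
print(probs[("N", "N")])  # 1/5
print(probs[("N", "V")])  # 2/5
print(probs[("V", "A")])  # 1/2
print(probs[("A", "N")])  # 1/3
```

With only two sentences most tag pairs never occur, so their estimated probability is 0; larger corpora or smoothing would be needed in practice.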
11
Compare with the Pronunciation Dictionary Assignment

Phoneme   Example   Translation
AE        at        AE T
AH        hut       HH AH T
AO        ought     AO T
AW        cow       K AW
AY        hide      HH AY D
B         be        B IY

In POS tagging, the labels are already given on the words: the "alignment" of words with labels is already given. In the assignment, the most likely alignment has to be discovered first, followed by the best possible mapping.