Dept. of Computer Science & Engg., Indian Institute of Technology Kharagpur
Part-of-Speech Tagging for Bengali with Hidden Markov Model
Sandipan Dandapat, Sudeshna Sarkar
Department of Computer Science & Engineering, Indian Institute of Technology Kharagpur
Machine Learning to Resolve POS Tagging
  HMM
    Supervised (DeRose, 88; Meteer, 91; Brants, 2000; etc.)
    Semi-supervised (Cutting, 92; Merialdo, 94; Kupiec, 92; etc.)
  Maximum Entropy (Ratnaparkhi, 96; etc.)
  TB(ED)L (Brill, 92, 94, 95; etc.)
  Decision Tree (Black, 92; Marquez, 97; etc.)
Our Approach
  HMM based
  Simplicity of the model
  Language independence
  Reasonably good accuracy
  Data intensive
  Sparseness problem when extending the order
  We adopt a first-order HMM
POS Tagging Schema
  [Diagram: raw text -> POS tagging (language model; disambiguation algorithm; possible POS class restriction; …) -> tagged text]
POS Tagging: Our Approach
  [Diagram: raw text -> POS tagging (first-order HMM; disambiguation algorithm; possible POS class restriction; …) -> tagged text]
  First-order HMM: the current state depends on the previous state
POS Tagging: Our Approach
  [Diagram: raw text -> POS tagging (first-order HMM; disambiguation algorithm; possible POS class restriction; …) -> tagged text]
  Model parameters of the first-order HMM: µ = (π,A,B)
POS Tagging: Our Approach
  [Diagram: raw text -> POS tagging (first-order HMM µ = (π,A,B); disambiguation algorithm; tag restriction t_i ∈ {T} or t_i ∈ T_MA(w_i); …) -> tagged text]
  {T}: set of all tags
  T_MA(w_i): set of tags computed by the Morphological Analyzer
POS Tagging: Our Approach
  [Diagram: raw text -> POS tagging (first-order HMM µ = (π,A,B); Viterbi algorithm; tag restriction t_i ∈ {T} or t_i ∈ T_MA(w_i); …) -> tagged text]
  {T}: set of all tags
  T_MA(w_i): set of tags computed by the Morphological Analyzer
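The model in the diagram can be written out as a worked equation. This is a sketch in the standard first-order HMM notation, which is an assumption here: only µ = (π,A,B), t_i, w_i, {T} and T_MA(w_i) appear on the slides, and the component names π_t, a_{t't} and b_t(w) are introduced for illustration.

```latex
% Assumed standard notation for mu = (pi, A, B); requires amsmath.
\begin{align*}
\pi_t &= P(t_1 = t), \qquad
a_{t't} = P(t_i = t \mid t_{i-1} = t'), \qquad
b_t(w) = P(w_i = w \mid t_i = t) \\
P(w_1 \dots w_n,\, t_1 \dots t_n \mid \mu)
  &= \pi_{t_1}\, b_{t_1}(w_1) \prod_{i=2}^{n} a_{t_{i-1} t_i}\, b_{t_i}(w_i) \\
\hat{t}_1 \dots \hat{t}_n
  &= \operatorname*{arg\,max}_{t_i \in T_{MA}(w_i)}
     P(w_1 \dots w_n,\, t_1 \dots t_n \mid \mu)
\end{align*}
```

Replacing T_{MA}(w_i) with {T} in the last line gives the unrestricted variant, in which every tag is a candidate for every word.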
Disambiguation Algorithm
  Text: w_1 w_2 … w_n
  Tags: t_1 t_2 … t_n
  where t_i ∈ {T} for every w_i; {T} = set of all tags
Disambiguation Algorithm
  Text: w_1 w_2 … w_n
  Tags: t_1 t_2 … t_n
  where t_i ∈ T_MA(w_i) for every w_i; {T} = set of all tags
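The two variants of the disambiguation step differ only in which tags are candidates for each word. A minimal Viterbi sketch under stated assumptions: the dictionary layout of the log-probabilities, the `allowed` callback standing in for the Morphological Analyzer, and the two-tag toy model are all hypothetical and not taken from the slides.

```python
import math

NEG_INF = float("-inf")

def viterbi(words, tags, log_pi, log_a, log_b, allowed=None):
    """Most likely tag sequence under a first-order HMM.

    log_pi[t], log_a[(prev, t)] and log_b[(t, w)] hold log-probabilities
    (hypothetical layout). allowed(w), if given, returns the tags permitted
    for w; an empty or missing answer falls back to the full tag set.
    """
    if not words:
        return []

    def candidates(w):
        cand = allowed(w) if allowed else None
        return list(cand) if cand else list(tags)

    # best[i][t]: best log-score of any path that ends in tag t at position i
    best = [{t: log_pi.get(t, NEG_INF) + log_b.get((t, words[0]), NEG_INF)
             for t in candidates(words[0])}]
    back = [{t: None for t in best[0]}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in candidates(words[i]):
            prev, score = max(
                ((p, best[i - 1][p] + log_a.get((p, t), NEG_INF))
                 for p in best[i - 1]),
                key=lambda pair: pair[1])
            best[i][t] = score + log_b.get((t, words[i]), NEG_INF)
            back[i][t] = prev
    # Follow the back-pointers from the best final tag
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy model (illustrative numbers only)
TAGS = ["N", "V"]
LOG_PI = {"N": math.log(0.6), "V": math.log(0.4)}
LOG_A = {("N", "N"): math.log(0.3), ("N", "V"): math.log(0.7),
         ("V", "N"): math.log(0.6), ("V", "V"): math.log(0.4)}
LOG_B = {("N", "fish"): math.log(0.6), ("N", "swim"): math.log(0.4),
         ("V", "fish"): math.log(0.3), ("V", "swim"): math.log(0.7)}
```

Calling `viterbi(["fish", "swim"], TAGS, LOG_PI, LOG_A, LOG_B)` searches over {T}; passing `allowed=lambda w: {"N"} if w == "swim" else None` restricts the second word the way T_MA(w_i) would.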
Learning HMM Parameters
  Supervised learning (HMM-S)
  Estimates the three model parameters (π, A, B) directly from the tagged corpus
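Direct estimation from a tagged corpus can be sketched as relative-frequency counting. This is an illustration, not the authors' code: the function name, the input layout (sentences as lists of `(word, tag)` pairs) and the dictionary outputs are assumptions, and no smoothing is applied here.

```python
from collections import Counter

def estimate_hmm(tagged_sentences):
    """Unsmoothed relative-frequency estimates of pi, A and B.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    Returns dicts pi[tag], A[(prev_tag, tag)] and B[(tag, word)].
    """
    init, trans, emit = Counter(), Counter(), Counter()
    tag_count, from_count = Counter(), Counter()
    for sent in tagged_sentences:
        if not sent:
            continue
        init[sent[0][1]] += 1              # tag of the first word
        for i, (word, tag) in enumerate(sent):
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            if i > 0:
                prev = sent[i - 1][1]
                trans[(prev, tag)] += 1
                from_count[prev] += 1      # times prev had a successor
    n_sent = sum(init.values())
    pi = {t: c / n_sent for t, c in init.items()}
    A = {(p, t): c / from_count[p] for (p, t), c in trans.items()}
    B = {(t, w): c / tag_count[t] for (t, w), c in emit.items()}
    return pi, A, B
```

Events that never occur in the tagged corpus are simply absent from the returned dictionaries, which is the gap that smoothing has to fill.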
Learning HMM Parameters
  Semi-supervised learning (HMM-SS)
  Untagged data (observations) are used to find the model most likely to produce the observation sequence
  An initial model is created from the tagged training data
  The model parameters are then updated using the initial model and the untagged data
  New model parameters are estimated with the Baum-Welch algorithm
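One Baum-Welch re-estimation step can be sketched with a scaled forward-backward pass. The sketch makes several assumptions beyond the slides: states and symbols are integer-coded, the update uses a single observation sequence of length at least 2 (a full run would accumulate the expected counts over every untagged sentence), and all input probabilities are strictly positive, for example after smoothing.

```python
import math

def baum_welch_step(obs, pi, A, B):
    """One EM update of (pi, A, B) from one observation sequence.

    obs: list of symbol indices; pi[i], A[i][j], B[i][k] are probabilities.
    Returns (new_pi, new_A, new_B, log_lik), where log_lik is the
    log-likelihood of obs under the input model.
    """
    n, T, k = len(pi), len(obs), len(B[0])
    # Forward pass, rescaled at every step to avoid underflow
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [pi[i] * B[i][obs[0]] for i in range(n)]
        else:
            row = [sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                   for j in range(n)]
        c = sum(row)
        scale.append(c)
        alpha.append([x / c for x in row])
    # Backward pass, using the same scaling factors
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    # gamma[t][i]: posterior probability of state i at time t
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    # xi_sum[i][j]: expected number of i -> j transitions
    xi_sum = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                   / scale[t + 1] for t in range(T - 1))
               for j in range(n)] for i in range(n)]
    new_pi = gamma[0][:]
    new_A = [[xi_sum[i][j] / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == s)
              / sum(gamma[t][i] for t in range(T))
              for s in range(k)] for i in range(n)]
    log_lik = sum(math.log(c) for c in scale)
    return new_pi, new_A, new_B, log_lik
```

Starting from the supervised estimates and repeating this step on the untagged corpus matches the procedure on the slide; each step cannot decrease the likelihood of the untagged data, although that does not guarantee higher tagging accuracy.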
Smoothing and Unknown Word Hypothesis
  Not all emissions and transitions are observed in the training data
  Add-one smoothing is used to estimate both emission and transition probabilities
  Not all words are known to the Morphological Analyzer
  Unknown words are assumed to belong to the open-class grammatical categories
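Both ideas on this slide fit in a few lines. A sketch under stated assumptions: the helper names are hypothetical, the analyzer is modelled as a plain dictionary from word to tag set, and the open-class set below is an illustrative guess, since the slides do not list which categories count as open class.

```python
def add_one(count, context_total, num_outcomes):
    """Add-one (Laplace) estimate of P(outcome | context).

    count: times the outcome was seen in this context
    context_total: times the context was seen at all
    num_outcomes: number of possible outcomes (tags for transitions,
                  vocabulary size for emissions)
    """
    return (count + 1) / (context_total + num_outcomes)

# Hypothetical open-class set, for illustration only
OPEN_CLASS_TAGS = {"NN", "JJ", "VFM"}

def candidate_tags(word, analyzer, open_class_tags=OPEN_CLASS_TAGS):
    """Tags to consider for a word: the analyzer's answer if it has one,
    otherwise every open-class tag."""
    tags = analyzer.get(word)
    return set(tags) if tags else set(open_class_tags)
```

With two possible outcomes and a context seen three times, an unseen outcome gets `add_one(0, 3, 2)`, which is 1/5, instead of zero, and the estimates over both outcomes still sum to 1.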
Experiments
  Baseline model
  Supervised bigram HMM (HMM-S): HMM-S, HMM-S + IMA, HMM-S + CMA
  Semi-supervised bigram HMM (HMM-SS): HMM-SS, HMM-SS + IMA, HMM-SS + CMA
Data Used
  Tagged data: 3085 sentences (~41,000 words)
    Includes the data for both the non-privileged and privileged modes
  Untagged corpus from CIIL: 11,000 sentences (100,000 words), unclean
    Used to re-estimate the model parameters with the Baum-Welch algorithm
Tagset and Corpus Ambiguity
  Tagset consists of 27 grammatical classes
  Corpus ambiguity
    Mean number of possible tags for each word
    Measured in the tagged training data
  Languages compared: Dutch, Spanish, German, English, French, Bengali (Dermatas et al., 1995)
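The ambiguity measure can be computed directly from the tagged data. A sketch that assumes the mean is taken over word tokens, so frequent ambiguous words weigh more; the slide does not say whether tokens or distinct word types are averaged, and the function name is hypothetical.

```python
from collections import defaultdict

def corpus_ambiguity(tagged_sentences):
    """Mean number of distinct tags observed for the word at each token.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    """
    tags_of = defaultdict(set)
    tokens = []
    for sent in tagged_sentences:
        for word, tag in sent:
            tags_of[word].add(tag)
            tokens.append(word)
    if not tokens:
        return 0.0
    return sum(len(tags_of[w]) for w in tokens) / len(tokens)
```

Averaging `len(tags_of[w])` over `tags_of` instead of over `tokens` would give the type-level variant.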
Results on Development Set
  Method          Accuracy
  Baseline        69.11
  ACOPOST         83.45
  HMM-S           74.53
  HMM-S + IMA     78.65
  HMM-S + CMA     88.83
  HMM-SS          73.77
  HMM-SS + IMA    77.98
  HMM-SS + CMA    89.65
Error Analysis
  Actual class   Predicted class   % of total error   % of class error
  NNC            NN
  VRB            VFM
  JJ             NN
  QF             JJ
  RB             JJ
  NLOC           NN
  VNN            VFM               3.7                4.5
Results on Test Set
  Tested on 458 sentences (5127 words)
  Precision: 84.32%   Recall: 84.36%   F_{β=1}: 84.34%
  Top 4 classes in terms of F-measure:
  Type     Precision (%)   Recall (%)   F_{β=1}   Frequency
  SYM
  NEG
  PRP
  QFNUM
Results on Test Set
  Bottom 4 classes in terms of F-measure:
  Type     Precision (%)   Recall (%)   F_{β=1}   Frequency
  VJJ      0               0            0         0
  NVB      0               0            0         28
  JVB      0               0            0         12
  INF
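The per-class figures in these tables follow the usual precision, recall and F-measure definitions. A sketch under stated assumptions: the function names are hypothetical, "Frequency" is taken to mean the number of gold-standard tokens carrying the tag, and an undefined ratio is reported as 0.

```python
from collections import Counter

def f_beta1(precision, recall):
    """Harmonic mean of precision and recall (F-measure with beta = 1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def per_class_scores(gold, predicted):
    """Map each tag to (precision, recall, F, gold frequency).

    gold, predicted: equal-length lists of tags, one per token.
    """
    correct, pred_n, gold_n = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        gold_n[g] += 1
        pred_n[p] += 1
        if g == p:
            correct[g] += 1
    scores = {}
    for tag in set(gold_n) | set(pred_n):
        prec = correct[tag] / pred_n[tag] if pred_n[tag] else 0.0
        rec = correct[tag] / gold_n[tag] if gold_n[tag] else 0.0
        scores[tag] = (prec, rec, f_beta1(prec, rec), gold_n[tag])
    return scores
```

As a consistency check, `f_beta1(84.32, 84.36)` rounds to 84.34, the overall F-measure reported for the test set.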
Further Improvement
  Uses suffix information to handle unknown words
  Calculates the probability of a tag given the last m letters (suffix) of a word
  Each symbol emission probability of an unknown word is normalized
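A suffix model of this kind can be sketched as follows. The slides do not give the estimation details, so the specifics here are assumptions: the function names are hypothetical, the longest suffix seen in training is used, the counts are add-one smoothed and normalized over the tagset, and the transliterated example words are made up for illustration.

```python
from collections import Counter, defaultdict

def train_suffix_model(tagged_words, max_len=3):
    """Count tags for every word-final suffix of 1..max_len letters."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        for m in range(1, min(max_len, len(word)) + 1):
            counts[word[-m:]][tag] += 1
    return counts

def tag_given_suffix(word, counts, tagset, max_len=3):
    """P(tag | suffix), using the longest suffix of `word` seen in training.

    Add-one smoothed so every tag keeps some mass; the result sums to 1
    as long as every training tag is in `tagset`. Falls back to a uniform
    distribution when no suffix of the word was seen.
    """
    for m in range(min(max_len, len(word)), 0, -1):
        seen = counts.get(word[-m:])
        if seen:
            total = sum(seen.values()) + len(tagset)
            return {t: (seen[t] + 1) / total for t in tagset}
    return {t: 1 / len(tagset) for t in tagset}

# Made-up training pairs, for illustration only
TOY_WORDS = [("khelche", "VFM"), ("korche", "VFM"), ("bhalo", "JJ")]
TOY_TAGSET = ["VFM", "JJ", "NN"]
```

For the unseen word "dekhche", the suffix "che" was observed twice with VFM in the toy data, so VFM receives (2 + 1) / (2 + 3) of the probability mass and the other two tags share the rest equally.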
Further Improvement
  Accuracy reflected on the development set
Conclusion and Future Scope
  Morphological restriction on tags gives an efficient tagging model even when only a small amount of labeled text is available
  Semi-supervised learning performs better than supervised learning with CMA (HMM-SS + CMA 89.65 vs. HMM-S + CMA 88.83 on the development set)
  Better adjustment of emission probabilities can be adopted for both unknown words and less frequent words
  A higher-order Markov model can be adopted
Thank You