Hindi POS Tagger
By Naveen Sharma (02005010), Prabhu Sachin H. (05305901), Prateek Choudhary (02005016), Gaurav Meena (00005020)
Problem Definition and Challenges
POS tagging: identifying the lexical category of a word on the basis of its context in the sentence.
    e.g. "Shyam Khana Khayega" is tagged as Shyam[NN] Khana[NN] Khayega[V]
Challenges:
    Resolving ambiguities
        Multiple suffixes
        Multiple categories
    Handling unknown words
Approach
Possible approaches: rule based, stochastic, hybrid
Rule based
    Take the possible tags for each word
    Use disambiguation rules, e.g. If (+1 A/ADV) eliminate non-ADV tags
Stochastic
    Probability based
Hybrid
    Uses features of both the rule-based and stochastic approaches
    Improved accuracy
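The disambiguation rule on this slide can be illustrated with a minimal Python sketch. It assumes one reading of the rule: "+1 A/ADV" means the word at offset +1 (the next word) can only be an adjective (A) or adverb (ADV), in which case every non-ADV tag of the current word is eliminated. The function name `apply_next_word_rule`, the tag names, and the fallback that keeps the original tags when nothing would survive are all assumptions made for illustration, not part of the original system.

```python
def apply_next_word_rule(candidates, position):
    """Apply the rule "If (+1 A/ADV) eliminate non-ADV tags" to one word.

    candidates: list of sets, one set of possible tags per word.
    position:   index of the word whose tag set is being reduced.
    Returns the (possibly reduced) tag set for that word.
    """
    current = candidates[position]
    # The last word has no following word, so the rule cannot fire.
    if position + 1 >= len(candidates):
        return current
    next_tags = candidates[position + 1]
    # Assumed reading: the rule fires only when the next word is
    # unambiguously an adjective or adverb.
    if next_tags and next_tags <= {"A", "ADV"}:
        reduced = current & {"ADV"}
        # Assumption: never eliminate every tag; keep the original set instead.
        return reduced or current
    return current


# Illustrative input: "bahut" (ambiguous) followed by the adjective "accha".
sentence_tags = [{"A", "ADV"}, {"A"}]
print(apply_next_word_rule(sentence_tags, 0))
```

In this illustrative input the first word keeps only the ADV tag because the following word is unambiguously an adjective.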
Rule Based Approach
Basic setup
    Components
        Rule-based morphological analyzer: takes a word as input and generates all possible tags
        Stemmer
    Rule generation
        Mainly from the literature
        Transformation based: attempted
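A rough Python sketch of how a stemmer and a rule-based morphological analyzer could fit together, assuming romanized input and a suffix-stripping design. The suffix table `SUFFIX_TAGS`, the lexicon `ROOT_LEXICON`, the `UNK` tag, and both function names are hypothetical and were chosen only so that the example sentence from the problem definition can be processed; the actual components of the project are not described in the slides.

```python
# Hypothetical suffix rules: suffix -> tags the suffix can signal.
SUFFIX_TAGS = {
    "ega": {"V"},        # assumed verb ending
    "na": {"NN", "V"},   # assumed ambiguous ending (multiple categories)
}

# Hypothetical lexicon of words with known tags.
ROOT_LEXICON = {
    "shyam": {"NN"},
}


def stem(word):
    """Strip the longest matching suffix; return (root, suffix)."""
    w = word.lower()
    for suffix in sorted(SUFFIX_TAGS, key=len, reverse=True):
        if w.endswith(suffix) and len(w) > len(suffix):
            return w[: -len(suffix)], suffix
    return w, ""


def possible_tags(word):
    """Generate all possible tags for a word from lexicon and suffix rules."""
    w = word.lower()
    tags = set(ROOT_LEXICON.get(w, set()))
    _, suffix = stem(w)
    if suffix:
        tags |= SUFFIX_TAGS[suffix]
    # Unknown words get a placeholder tag instead of an empty set.
    return tags or {"UNK"}


for token in ["Shyam", "Khana", "Khayega"]:
    print(token, sorted(possible_tags(token)))
```

Under these assumed tables "Khana" stays ambiguous between NN and V, which is the kind of ambiguity the disambiguation rules are meant to resolve.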
Algorithm for POS Tagging
POS_TAGGER ( sentence s ) {
    if ( some word is untagged in s ) {
        w <- the first untagged word from the right in the sentence s
        X <- PPOS(w)    /* X is the set of possible lexical categories that w can take */
        if ( w is not the last word of the sentence )
            X <- X ∩ PREV(word immediately following w)
        for each element e in X {
            if ( w tagged as e obeys the semantic constraint set ) {
                tag w as e and call POS_TAGGER( s )
            }
        }
    }
    else
        output the tagged sentence s
}
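The recursive procedure above can be sketched in Python as a right-to-left backtracking search. Several pieces are assumptions, since the slides do not define them: `PPOS` is modelled as a small hypothetical dictionary, `PREV` is read as "the tags allowed immediately before a word carrying a given tag", and `obeys_constraints` is a placeholder that accepts everything. Unlike the pseudocode, which outputs each complete tagging it reaches, this sketch returns only the first complete tagging it finds, or `None` if there is none.

```python
# Hypothetical possible-tag table (PPOS) for the example sentence.
PPOS = {
    "shyam": {"NN"},
    "khana": {"NN", "V"},
    "khayega": {"V"},
}

# Assumed reading of PREV: tag of the following word -> tags allowed before it.
PREV = {
    "V": {"NN"},
    "NN": {"NN"},
}


def obeys_constraints(words, tags, index, tag):
    """Placeholder for the semantic constraint set (accepts every tag)."""
    return True


def pos_tagger(words, tags=None):
    """Tag words right to left, backtracking when a choice leads nowhere."""
    if tags is None:
        tags = [None] * len(words)
    untagged = [i for i, t in enumerate(tags) if t is None]
    if not untagged:
        return list(tags)          # every word is tagged: done
    i = untagged[-1]               # first untagged word from the right
    candidates = set(PPOS.get(words[i].lower(), set()))
    if i < len(words) - 1:
        # The word to the right is already tagged at this point.
        candidates &= PREV.get(tags[i + 1], set())
    for tag in sorted(candidates):
        if obeys_constraints(words, tags, i, tag):
            tags[i] = tag
            result = pos_tagger(words, tags)
            if result is not None:
                return result
            tags[i] = None         # undo and try the next candidate
    return None


print(pos_tagger(["Shyam", "Khana", "Khayega"]))
```

With these assumed tables the sketch reproduces the tagging from the problem definition, NN NN V, because the V reading of "Khana" is removed by the intersection with the tags allowed before a verb.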
Current Work
Work done
    Literature: available algorithms, corpora
    Transformation-based rule generation
    Rule-based tagging
Corpora
    Limited, still searching
Future Schedule
Stage                         Timeline
Groundwork                    10th - 25th March
Implementation / 1st run      1st April
Discussions / improvements    1st - 10th April
Final run                     Demo
References
Brill, Eric. Transformation Based Error Driven Learning and Natural Language Processing: A Case Study in Part of Speech Tagging. Computational Linguistics, 21(4): 543-564, 1995.
Ray P. R., Harish V., Sarkar S. and Basu A. Part of Speech Tagging and Local Word Grouping Techniques for Natural Language Parsing in Hindi. Proceedings of International Conference on Natural Language Processing (ICON 2003), Mysore, 2003, pp. 9-19.