1
PoS tagging and Chunking with HMM and CRF
Pranjal Awasthi, Delip Rao, Ravindran Balaraman
Dept. of CSE, IIT Madras
2
Outline
Overview of the system
PoS tagging with HMM
Chunking with CRF
Results
Summary
3
Overview of the system
Aim: To leverage existing tools and algorithms (for English) for the NLPAI task
Tools used: TnT tagger, TBL, MALLET
4
Overview of the system
PoS Tagging: TnT + TBL
Chunking: CRF (MALLET)
5
The TnT tagger (Brants, 2000)
A second-order Hidden Markov Model based tagger
Used for English and other languages
On the NLPAI dataset, TnT alone gave F1 = 78.9
Why TnT?
  PoS tagging is a sequence labeling task
  HMMs and CRFs are good candidates
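To make "second-order HMM" concrete, here is a minimal Python sketch of Viterbi decoding in which each tag is conditioned on the two preceding tags. It is not TnT's implementation: the function name, the toy probability tables, the `<s>` start symbol, and the constant `floor` used in place of real smoothing are all assumptions made for illustration.

```python
import math

def viterbi_trigram(words, tags, trans, emit, floor=1e-6):
    """Most likely tag sequence under a second-order (trigram) HMM.

    trans[(t2, t1, t)] = P(t | t2, t1), emit[(t, w)] = P(w | t).
    Missing entries fall back to `floor`, a crude stand-in for smoothing.
    """
    start = "<s>"  # assumed padding symbol for the sentence start

    def log_trans(t2, t1, t):
        return math.log(trans.get((t2, t1, t), floor))

    def log_emit(t, w):
        return math.log(emit.get((t, w), floor))

    # best[(t1, t)] = best log-probability of a path ending in tags t1, t
    best = {(start, t): log_trans(start, start, t) + log_emit(t, words[0])
            for t in tags}
    backptrs = []
    for w in words[1:]:
        new_best, bp = {}, {}
        for (t2, t1), score in best.items():
            for t in tags:
                s = score + log_trans(t2, t1, t) + log_emit(t, w)
                if s > new_best.get((t1, t), -math.inf):
                    new_best[(t1, t)] = s
                    bp[(t1, t)] = t2
        best = new_best
        backptrs.append(bp)

    # Trace back from the best final tag pair.
    prev, cur = max(best, key=best.get)
    path = [cur]
    for bp in reversed(backptrs):
        path.append(prev)
        prev, cur = bp[(prev, cur)], prev
    path.reverse()
    return path

# Toy model (made-up numbers, for illustration only).
TAGS = ["DT", "NN", "VB"]
TRANS = {("<s>", "<s>", "DT"): 0.8, ("<s>", "DT", "NN"): 0.9, ("DT", "NN", "VB"): 0.8}
EMIT = {("DT", "the"): 0.9, ("NN", "dog"): 0.8, ("VB", "barks"): 0.7, ("NN", "barks"): 0.1}

print(viterbi_trigram(["the", "dog", "barks"], TAGS, TRANS, EMIT))
# → ['DT', 'NN', 'VB']
```

The search state is a pair of tags rather than a single tag, which is what distinguishes this from a first-order tagger and makes decoding cost grow with the cube of the tagset size.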
6
Poor performance of CRFs in PoS tagging
For the NLPAI dataset, F1 = 69.4
Features used: w_{i-1}, w_{i-1}w_i, w_{i+1}, w_i w_{i+1}
A linear-chain CRF was used (MALLET)
Reasons for poor performance:
  Large number of PoS tags (26) compared to chunking
  Selection of features
  Type of CRF?
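The four context features listed above can be generated per token roughly as follows. This Python sketch is illustrative only: the feature-name prefixes, the `<s>` and `</s>` boundary symbols, and both function names are assumptions. The output follows the one-token-per-line layout that MALLET's SimpleTagger reads (space-separated features with the label last), which is assumed here to be how the data was prepared.

```python
def context_features(words, i):
    """Features for token i: w[i-1], w[i-1]w[i], w[i+1], w[i]w[i+1]."""
    prev_w = words[i - 1] if i > 0 else "<s>"               # assumed boundary symbol
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"  # assumed boundary symbol
    w = words[i]
    return [
        f"PREV={prev_w}",
        f"PREV_CUR={prev_w}_{w}",
        f"NEXT={next_w}",
        f"CUR_NEXT={w}_{next_w}",
    ]

def to_simpletagger_lines(words, tags):
    """One line per token: features separated by spaces, label last."""
    return [" ".join(context_features(words, i) + [tag])
            for i, tag in enumerate(tags)]

for line in to_simpletagger_lines(["the", "dog", "barks"], ["DT", "NN", "VB"]):
    print(line)
```

Sentences would be separated by a blank line in the training file. Note that this feature set, as listed on the slide, does not include the current word w_i on its own.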
7
Transformation Based Learning (Brill, 1995)
Added as a post-processing step to “correct” TnT output
Idea: Derive correction rules during training by observing what has gone wrong
Apply these rules at test time
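The correction-rule idea can be sketched in Python as a greedy loop over a single rule template, "change tag A to B when the previous tag is C". This is a toy illustration, not the authors' TBL code: the template, the function names, the flat tag lists (sentence boundaries ignored), and the example tags are all assumptions.

```python
from collections import Counter

def apply_rule(tags, rule):
    """Apply (from_tag, to_tag, prev_tag) using the tags as they stood before the pass."""
    from_tag, to_tag, prev_tag = rule
    return [
        to_tag if i > 0 and t == from_tag and tags[i - 1] == prev_tag else t
        for i, t in enumerate(tags)
    ]

def learn_rules(predicted, gold, max_rules=10):
    """Greedily pick rules that fix the most errors, net of the errors they introduce."""
    current, rules = list(predicted), []
    for _ in range(max_rules):
        scores = Counter()
        # Propose one candidate rule per observed error.
        for i in range(1, len(current)):
            if current[i] != gold[i]:
                scores[(current[i], gold[i], current[i - 1])] += 1
        # Penalise candidates that would break currently correct tags.
        for i in range(1, len(current)):
            if current[i] == gold[i]:
                for rule in scores:
                    if rule[0] == current[i] and rule[2] == current[i - 1]:
                        scores[rule] -= 1
        if not scores:
            break
        best, gain = scores.most_common(1)[0]
        if gain <= 0:
            break
        rules.append(best)
        current = apply_rule(current, best)
    return rules

# Toy data: the baseline tagger mislabels nouns after determiners as verbs.
gold      = ["DT", "NN", "VB", "DT", "NN", "TO", "VB"]
predicted = ["DT", "VB", "VB", "DT", "VB", "TO", "VB"]
rules = learn_rules(predicted, gold)
print(rules)  # → [('VB', 'NN', 'DT')]
```

At test time the learned rules are applied in order to the baseline tagger's output with `apply_rule`.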
8
Transformation Based Learning (contd …)
Use of TBL on top of TnT improved F1 by over 1 point (78.94 to 80.74)
TBL is sensitive to the templates used
Possible improvements in template selection
Training time can be long unless indexing is used
9
Summary of PoS tagging Results
Model      Precision   Recall   F1
CRF                             69.40
TnT                             78.94
TnT+TBL                         80.74
10
Chunking with CRF, based on (Sha & Pereira, 2003)
Using SimpleTagger provided with MALLET
Chunking accuracies:
Chunking with          F1
Reference PoS tags     89.69
Generated PoS tags     79.58
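For reference, chunk-level F1 is commonly computed over whole chunks rather than individual tokens. The Python sketch below assumes BIO-style chunk labels (for example B-NP, I-NP, O) and exact-match scoring. The slides do not state the label scheme or the scoring script that was used, so both are assumptions, as are the function names and the toy labels.

```python
def extract_chunks(labels):
    """Return the set of (start, end, type) spans in a BIO label sequence (end exclusive)."""
    chunks, start, ctype = set(), None, None
    for i, lab in enumerate(list(labels) + ["O"]):  # sentinel closes a trailing chunk
        begins = lab.startswith("B-") or (lab.startswith("I-") and lab[2:] != ctype)
        if start is not None and (lab == "O" or begins):
            chunks.add((start, i, ctype))
            start, ctype = None, None
        if begins:
            start, ctype = i, lab[2:]
    return chunks

def chunk_f1(gold, pred):
    """Precision, recall and F1 over exactly matching chunks."""
    g, p = extract_chunks(gold), extract_chunks(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["B-NP", "I-NP", "B-VP", "B-NP"]
pred = ["B-NP", "I-NP", "B-VP", "O"]
print(chunk_f1(gold, pred))
```

In this toy case the prediction recovers two of the three gold chunks and proposes no wrong ones, so precision is 1.0 and recall is 2/3.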
11
Summary
Demonstrated the use of off-the-shelf software for tagging and chunking
Only code written: TBL + glue scripts
Overall PoS F1 = and Chunk F1 = 79.58
Have we “hit the wall” with pure ML-based tools?
Not sure yet!
12
Thanks!