1
Classifying Parts of Speech Based on Sparse Data
Katherine Brainard
2
The Problem
- Sparse data has little contextual information
- Many words fall into this category
- Automatic PoS taggers and finders are useful
3
Approach
- Relatively easy to learn categories from frequent words
- Infrequent words often more “regular” than their common counterparts
- Learn frequent words, then use these to classify infrequent
- Uses clustering for the frequent words
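The two-stage idea on this slide can be sketched roughly as follows. This is a minimal illustration, not the method used in the presentation: the slides do not say which features or clustering algorithm were used, so the left/right neighbour counts, the cosine similarity, the greedy single-pass clustering, the `threshold` value, and all function names here are assumptions chosen for brevity.

```python
from collections import Counter, defaultdict
import math

def context_vectors(tokens, feature_words):
    """Count left/right neighbours of every word, keeping only neighbours in feature_words."""
    vecs = defaultdict(Counter)
    for i, w in enumerate(tokens):
        if i > 0 and tokens[i - 1] in feature_words:
            vecs[w][("L", tokens[i - 1])] += 1
        if i + 1 < len(tokens) and tokens[i + 1] in feature_words:
            vecs[w][("R", tokens[i + 1])] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_frequent(words, vecs, threshold=0.5):
    """Greedy clustering (an assumed stand-in): join the most similar cluster
    whose similarity reaches the threshold, otherwise start a new cluster."""
    clusters = []  # each entry: (summed context vector, list of member words)
    for w in words:
        best, best_sim = None, threshold
        for c in clusters:
            s = cosine(vecs[w], c[0])
            if s >= best_sim:
                best, best_sim = c, s
        if best is None:
            clusters.append((Counter(vecs[w]), [w]))
        else:
            best[0].update(vecs[w])  # add the word's counts to the cluster vector
            best[1].append(w)
    return clusters

def classify_infrequent(word, vecs, clusters):
    """Return the index of the cluster whose vector is most similar to the word's contexts."""
    sims = [cosine(vecs[word], c[0]) for c in clusters]
    return max(range(len(clusters)), key=sims.__getitem__)

# Toy corpus: "wug" occurs once, every other word at least twice.
tokens = "the dog runs . the cat runs . a dog sleeps . a cat sleeps . the wug runs .".split()
counts = Counter(tokens)
frequent = [w for w, c in counts.most_common() if c >= 2]
vecs = context_vectors(tokens, set(frequent))
clusters = cluster_frequent(frequent, vecs)
wug_cluster = classify_infrequent("wug", vecs, clusters)
print(clusters[wug_cluster][1])
```

On this toy corpus the single occurrence of "wug" shares its contexts ("the" on the left, "runs" on the right) with "dog" and "cat", which is the kind of regularity the slide relies on when classifying infrequent words from categories learned on frequent ones.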
4
Evaluating the Model
- Somewhat tricky: want an evaluation function that doesn’t encourage degenerate behavior
- Evaluation separated from clustering
- Used both a bigram probability model and comparison with already-tagged data
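One way a bigram probability model can score a clustering independently of how the clusters were built is sketched below. The slides do not give the model's form, so this assumes a class-based bigram model, P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), with additive smoothing; the function name `class_bigram_logprob`, the `alpha` parameter, and the toy data are all hypothetical. Scoring on text other than the training text is a common way to keep such a measure from simply rewarding more classes; whether the original work did this is not stated.

```python
import math
from collections import Counter

def class_bigram_logprob(train, test, word2class, alpha=1.0):
    """Average log-probability per bigram of `test` under a class-based
    bigram model estimated from `train` (additive smoothing with `alpha`).
    `word2class` must map every word in train and test to a class label."""
    classes = set(word2class.values())
    vocab = set(train) | set(test)
    # Class-to-class transition counts and per-class totals from training text.
    trans = Counter((word2class[a], word2class[b]) for a, b in zip(train, train[1:]))
    prev_total = Counter(word2class[a] for a in train[:-1])
    # Word counts and the total count of each class, for P(word | class).
    emit = Counter(train)
    class_total = Counter(word2class[w] for w in train)
    class_size = Counter(word2class[w] for w in vocab)  # distinct words per class
    total = 0.0
    for a, b in zip(test, test[1:]):
        ca, cb = word2class[a], word2class[b]
        p_trans = (trans[(ca, cb)] + alpha) / (prev_total[ca] + alpha * len(classes))
        p_emit = (emit[b] + alpha) / (class_total[cb] + alpha * class_size[cb])
        total += math.log(p_trans * p_emit)
    return total / (len(test) - 1)

# Toy comparison: two classes versus lumping every word into one class.
text = "a b a b a b".split()
separate = class_bigram_logprob(text, text, {"a": 0, "b": 1})
lumped = class_bigram_logprob(text, text, {"a": 0, "b": 0})
print(separate > lumped)
```

In this toy case the two-class assignment captures the strict alternation of "a" and "b", so it scores higher than the single-lump assignment, which can only fall back on word frequencies within the one class.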
5
Results
- Improvement of ~36% from delaying processing of data
- About 2.5 times better than classifying infrequent words into one lump
- Using just contextual data produced the best performance