Classifying Parts of Speech Based on Sparse Data
Katherine Brainard

The Problem
Sparse data has little contextual information
Many words fall into this category
Automatic PoS taggers and finders are useful

Approach
Relatively easy to learn categories from frequent words
Infrequent words are often more "regular" than their common counterparts
Learn frequent words, then use these to classify infrequent ones
Uses clustering for the frequent words
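The two-stage idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's implementation: the slides do not name the features or the clustering algorithm, so the neighbor-count context vectors, the cosine similarity, the greedy single-pass clustering, and the `min_count` and `threshold` parameters are all hypothetical choices.

```python
from collections import Counter, defaultdict
import math


def context_vectors(tokens):
    """Map each word to counts of its immediate left and right neighbors."""
    vecs = defaultdict(Counter)
    for i, word in enumerate(tokens):
        if i > 0:
            vecs[word][("L", tokens[i - 1])] += 1
        if i + 1 < len(tokens):
            vecs[word][("R", tokens[i + 1])] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_frequent(words, vecs, threshold=0.5):
    """Stage 1 (assumed algorithm): greedy single-pass clustering.

    Each word joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each entry: (member list, summed context Counter)
    for word in words:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vecs[word], cluster[1])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append(([word], Counter(vecs[word])))
        else:
            best[0].append(word)
            best[1].update(vecs[word])
    return clusters


def classify_infrequent(word, vecs, clusters):
    """Stage 2: index of the cluster whose context profile is most similar."""
    sims = [cosine(vecs[word], cluster[1]) for cluster in clusters]
    return max(range(len(clusters)), key=sims.__getitem__)


# Toy corpus, invented for illustration only.
tokens = ("the cat sat . the dog sat . the cat ran . the dog ran . "
          "a cat sat . a dog ran . the bird sat .").split()
counts = Counter(tokens)
min_count = 3  # hypothetical cutoff between frequent and infrequent words
frequent = [w for w, c in counts.most_common() if c >= min_count]
infrequent = [w for w in counts if counts[w] < min_count]

vecs = context_vectors(tokens)
clusters = cluster_frequent(frequent, vecs)
assignment = {w: clusters[classify_infrequent(w, vecs, clusters)][0]
              for w in infrequent}
```

On this toy corpus the rare words "a" and "bird" end up with the clusters built from "the" and from "cat"/"dog" respectively, which is the behavior the slide describes: categories are learned from frequent words first and then reused for infrequent ones.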

Evaluating the Model
Somewhat tricky: want an evaluation function that doesn't encourage degenerate behavior
Evaluation separated from clustering
Used both a bigram probability model and comparison with already-tagged data
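One common way to score a word-to-class assignment with a bigram probability model is the class-based bigram factorization P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i). The sketch below assumes that form; the slides do not give the exact model used, and the function name, the unsmoothed maximum-likelihood estimates, and the toy data are all hypothetical.

```python
from collections import Counter
import math


def class_bigram_log_likelihood(tokens, word_class):
    """Log-likelihood of tokens[1:] under a class-based bigram model.

    Assumes every word belongs to exactly one class and uses unsmoothed
    maximum-likelihood estimates taken from the same token sequence.
    """
    classes = [word_class[w] for w in tokens]
    class_count = Counter(classes)
    word_count = Counter(tokens)
    prev_count = Counter(classes[:-1])               # classes seen as a left context
    pair_count = Counter(zip(classes, classes[1:]))  # adjacent class pairs

    log_likelihood = 0.0
    for prev_c, c, w in zip(classes, classes[1:], tokens[1:]):
        p_transition = pair_count[(prev_c, c)] / prev_count[prev_c]
        p_emission = word_count[w] / class_count[c]
        log_likelihood += math.log(p_transition * p_emission)
    return log_likelihood


# Toy data, invented for illustration only.
tokens = "the cat sat the dog sat".split()
one_lump = {w: "X" for w in tokens}
grouped = {"the": "D", "cat": "N", "dog": "N", "sat": "V"}

lump_score = class_bigram_log_likelihood(tokens, one_lump)
grouped_score = class_bigram_log_likelihood(tokens, grouped)
```

Here the grouped assignment scores higher than the single-lump one. Note that on the data used for estimation, giving every word its own class scores at least as well as any coarser grouping, which is one example of the degenerate behavior an evaluation function has to avoid rewarding; scoring on separate data or against already-tagged text are two ways around it.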

Results
Improvement of ~36% from delaying processing of data
About 2.5 times better than classifying infrequent words into one lump
Using just contextual data produced the best performance
