1
Taking the Kitchen Sink Seriously: An Ensemble Approach to Word Sense Disambiguation, by Christopher Manning et al.
2
Overview
● 23 student WSD projects combined in a two-layer voting scheme (an ensemble of ensemble classifiers).
● Performed well at SENSEVAL-2: 4th place out of 21 supervised systems on the English Lexical Sample task.
● Offers valuable lessons both for WSD and for ensemble methods in general.
3
System Overview
● 23 different 1st-order classifiers.
  – Independently developed WSD systems.
  – Use a variety of algorithms (naïve Bayes, n-gram models, etc.).
● These 1st-order classifiers are combined into a variety of 2nd-order classifiers/voting mechanisms.
  – 2nd-order classifiers vary with respect to:
    ● the algorithm used to combine the 1st-order classifiers;
    ● the number of voters: each takes the top k 1st-order classifiers, where k is one of {1, 3, 5, 7, 9, 11, 13, 15}.
4
Voting Algorithms
● Majority vote (each vote has weight 1).
● Weighted voting, with weights determined by EM.
  – Tries to choose weights that maximize the likelihood of the 2nd-order training instances, where the probability of a sense (given the votes) is defined as the sum of weighted votes for that sense.
● Maximum entropy classification, using features derived from the votes of the 1st-order classifiers.
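The first two voting schemes can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the weighted scheme simply normalizes the summed vote weights into a distribution over senses, as the slide describes (the EM procedure for fitting the weights themselves is omitted).

```python
from collections import Counter

def majority_vote(votes):
    """Each 1st-order classifier casts one vote of weight 1;
    return the sense with the most votes."""
    return Counter(votes).most_common(1)[0][0]

def weighted_vote(votes, weights):
    """Score each sense by the sum of the weights of the classifiers
    voting for it, then normalize into a probability distribution."""
    scores = {}
    for sense, w in zip(votes, weights):
        scores[sense] = scores.get(sense, 0.0) + w
    total = sum(scores.values())
    probs = {s: v / total for s, v in scores.items()}
    return max(probs, key=probs.get), probs
```

With weights [0.2, 0.5, 0.3], two low-weight voters can still outvote one strong voter, which is exactly what EM-fitted weights are meant to calibrate.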
5
Classifier Construction Process
● For each word:
  – Train each 1st-order classifier on ¾ of the training data.
  – Use the remaining ¼ of the data to rank the 1st-order classifiers by performance.
  – For each 2nd-order classifier:
    ● Take the top k 1st-order classifiers for this word.
    ● Train the 2nd-order classifier on ¾ of the training data using this ensemble.
  – Rank the 2nd-order classifiers by performance on the remaining ¼ of the training data.
  – Take the top 2nd-order classifier as the classifier for this word, then retrain it on all of the training data.
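The per-word selection procedure above can be sketched as follows. This is a sketch under assumed interfaces: `fit` and `accuracy` methods on each classifier, and builder functions that wrap a list of 1st-order voters into a 2nd-order classifier, none of which come from the original system.

```python
import random

def build_word_classifier(examples, first_orders, second_order_builders,
                          ks=(1, 3, 5, 7, 9, 11, 13, 15)):
    """Select the best 2nd-order classifier for a single word
    (hypothetical fit/accuracy interfaces)."""
    # Split: 3/4 for training, 1/4 held out for ranking.
    examples = list(examples)
    random.shuffle(examples)
    cut = 3 * len(examples) // 4
    train, held = examples[:cut], examples[cut:]

    # Train and rank the 1st-order classifiers.
    for clf in first_orders:
        clf.fit(train)
    ranked = sorted(first_orders, key=lambda c: c.accuracy(held), reverse=True)

    # Build 2nd-order classifiers over the top-k voters, train, and rank them.
    seconds = [build(ranked[:k])
               for build in second_order_builders
               for k in ks if k <= len(ranked)]
    for s in seconds:
        s.fit(train)
    best = max(seconds, key=lambda s: s.accuracy(held))

    # Retrain the winner on all of the training data.
    best.fit(examples)
    return best
```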
6
Results
● 61.7% accuracy in the SENSEVAL-2 competition (4th place).
● After the competition, performance improved:
  – Global performance (i.e., over all words) was used as a tie-breaker when ranking both the 1st- and 2nd-order classifiers.
  – Accuracy improved to 63.9% (which would have placed 2nd).
7
Results for 2nd-Order Classifiers
● Results are averaged over all words.
● Note MaxEnt's ability to resist dilution (its accuracy degrades least as weaker voters are added to the ensemble).
8
Evaluating Effects of Combination
● We want different classifiers to make different mistakes.
● We can measure this differentiation as the average, over all pairs of 1st-order classifiers, of the fraction of errors that are shared: the lower the shared fraction, the greater the error independence.
● As error independence and word difficulty grow, the advantage of combination grows.
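One way to compute this measure is sketched below. The slide does not pin down how "fraction of errors that are shared" is normalized, so this sketch takes one plausible reading: the overlap of two classifiers' error sets divided by their union, averaged over all pairs.

```python
from itertools import combinations

def avg_shared_errors(error_sets):
    """Average, over all classifier pairs, of the fraction of errors shared.
    Each element of error_sets is the set of instance ids a classifier got
    wrong; 'fraction shared' is read here as |A & B| / |A | B|."""
    fracs = []
    for a, b in combinations(error_sets, 2):
        union = a | b
        fracs.append(len(a & b) / len(union) if union else 0.0)
    return sum(fracs) / len(fracs)
```

A value near 1 means the classifiers fail on the same instances (little to gain from combination); a value near 0 means their errors are largely independent.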
9
Lessons for WSD
● Every word is a separate problem.
  – Every 1st- and 2nd-order classifier had some words on which it performed best.
● Implementation details:
  – Large or small context windows work better than medium-sized windows.
  – This suggests that senses are determined both at a very local, collocational level and at a very general, topical level.
  – Smoothing is very important.
10
Lessons for Ensemble Methods
● Variety within the ensemble is desirable.
  – Qualitatively different approaches are better than minor perturbations of similar approaches.
  – We can measure the extent to which this ideal is achieved.
● Variety among combination algorithms helps as well.
  – In particular, it can mitigate overfitting, because different algorithms begin to overtrain at different points.