Distributional Clustering of Words for Text Classification
L. Douglas Baker and Andrew Kachites McCallum
SIGIR '98
Distributional Clustering
- Word similarity based on class label distribution
- Example: are 'puck' and 'goalie' similar? What about 'team'?
Distributional Clustering
- Clustering words based on their class distributions (supervised)
- Similarity between w_t and w_s = similarity between P(C|w_t) and P(C|w_s) (see the note after this list)
- Information-theoretic measure for comparing the distributions: Kullback-Leibler divergence to the mean
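The slides do not spell out how P(C|w_t) is obtained; the usual empirical estimate from labelled training data would be

$$
P(c_j \mid w_t) \;=\; \frac{N(w_t, c_j)}{\sum_{j'} N(w_t, c_{j'})},
$$

where N(w_t, c_j) is the number of occurrences of word w_t in training documents of class c_j (in practice smoothed, e.g. with Laplace counts).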
Distributional Clustering
(Figures: example class distributions of words from Class 8: Autos and Class 9: Motorcycles)
Kullback-Leibler Divergence (definition below)
- D is asymmetric
- D becomes infinite when the second distribution assigns zero probability where the first does not (P(y) = 0 while P(x) ≠ 0)
- D ≥ 0
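For reference, the standard definition of the KL divergence between the class distributions of two words is

$$
D\big(P(C \mid w_t)\,\|\,P(C \mid w_s)\big) \;=\; \sum_{j} P(c_j \mid w_t)\,\log \frac{P(c_j \mid w_t)}{P(c_j \mid w_s)},
$$

which is asymmetric, non-negative, and infinite whenever some P(c_j | w_s) = 0 while P(c_j | w_t) ≠ 0.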
Kullback-Leibler Divergence to the Mean
- The similarity measure is the KL divergence to the mean, with the word priors P(w_t) and P(w_s) as weights (written out below)
- Jensen-Shannon divergence is the special case of this symmetrised KL divergence with P(w_t) = P(w_s) = 0.5
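This is presumably the "KL divergence to the mean" named earlier, with the two words' prior probabilities as mixture weights; the exact notation on the slide may differ, but in that form it reads

$$
\bar{P}(C) \;=\; \frac{P(w_t)}{P(w_t)+P(w_s)}\,P(C \mid w_t) \;+\; \frac{P(w_s)}{P(w_t)+P(w_s)}\,P(C \mid w_s),
$$
$$
D_{\mathrm{mean}}(w_t, w_s) \;=\; \frac{P(w_t)}{P(w_t)+P(w_s)}\,D\big(P(C \mid w_t)\,\|\,\bar{P}(C)\big) \;+\; \frac{P(w_s)}{P(w_t)+P(w_s)}\,D\big(P(C \mid w_s)\,\|\,\bar{P}(C)\big).
$$

Setting both weights to 0.5 gives exactly the Jensen-Shannon divergence mentioned above.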
Clustering Algorithm
Characteristics (a sketch of the procedure follows this list):
- Greedy, aggressive
- Only locally optimal
- Hard clustering
- Agglomerative
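A minimal Python sketch of this style of greedy agglomerative hard clustering, assuming each word is represented by its class distribution P(C|w). It illustrates the listed characteristics rather than reproducing the authors' implementation; the helper names (kl, kl_to_the_mean, cluster_words) and the simple unweighted merge are my own simplifications.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def kl_to_the_mean(p, q, wp=0.5, wq=0.5):
    """Weighted KL divergence to the mean; equal weights give Jensen-Shannon."""
    wp, wq = wp / (wp + wq), wq / (wp + wq)
    m = wp * np.asarray(p, dtype=float) + wq * np.asarray(q, dtype=float)
    return wp * kl(p, m) + wq * kl(q, m)

def cluster_words(class_dists, num_clusters):
    """Greedy, aggressive, hard agglomerative clustering of words.

    class_dists: dict mapping word -> P(C|word) as a 1-D array over classes.
    At most `num_clusters` clusters are kept: each step admits one word as a
    singleton, then merges the two most similar clusters (no backtracking,
    so the result is only locally optimal).
    """
    clusters = []  # list of (set_of_words, representative_distribution)
    for word, dist in class_dists.items():
        clusters.append(({word}, np.asarray(dist, dtype=float)))
        if len(clusters) > num_clusters:
            best = None
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    d = kl_to_the_mean(clusters[i][1], clusters[j][1])
                    if best is None or d < best[0]:
                        best = (d, i, j)
            _, i, j = best
            merged = (clusters[i][0] | clusters[j][0],
                      # simple average; a frequency-weighted mean would be closer to the paper
                      (clusters[i][1] + clusters[j][1]) / 2.0)
            clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
            clusters.append(merged)
    return [words for words, _ in clusters]
```

For example, cluster_words({'puck': [0.9, 0.1], 'goalie': [0.85, 0.15], 'car': [0.1, 0.9]}, 2) groups 'puck' with 'goalie' and leaves 'car' in its own cluster, since the first two have nearly identical class distributions.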
Experiments
Datasets:
- 20 Newsgroups
- Reuters-21578
- Yahoo Science hierarchy
Compared with:
- Supervised Latent Semantic Indexing
- Class-based clustering
- Feature selection by mutual information with the class variable
- Feature selection by the Markov-blanket method
Classifier: naive Bayes (NBC); a sketch of classifying over word clusters follows.
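To make the classifier setup concrete, here is a small sketch of feeding cluster-level counts to a naive Bayes classifier. The use of scikit-learn's MultinomialNB and the variable names (train_counts, word_to_cluster, and so on) are assumptions for illustration, not details from the paper.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB  # one common naive Bayes (NBC) implementation

def to_cluster_counts(doc_word_counts, word_to_cluster, num_clusters):
    """Collapse a documents-by-words count matrix into documents-by-clusters counts."""
    X = np.zeros((doc_word_counts.shape[0], num_clusters))
    for word_idx, cluster_idx in word_to_cluster.items():
        X[:, cluster_idx] += doc_word_counts[:, word_idx]
    return X

# Hypothetical usage, assuming count matrices and labels are already built:
# X_train = to_cluster_counts(train_counts, word_to_cluster, num_clusters)
# X_test = to_cluster_counts(test_counts, word_to_cluster, num_clusters)
# model = MultinomialNB().fit(X_train, y_train)
# print("accuracy:", model.score(X_test, y_test))
```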
Results
Conclusion
- Useful semantic word clusterings
- Higher classification accuracy
- Smaller classification models
Open questions:
- Word clustering vs. feature selection?
- What if the data is noisy? What if it is sparse?