Distributional Clustering of Words for Text Classification
L. Douglas Baker and Andrew Kachites McCallum
SIGIR '98
Distributional Clustering
- Word similarity based on class label distribution
- Example: are 'puck' and 'goalie' similar? What about 'team'?
Distributional Clustering
- Clustering words based on their class distributions (supervised)
- Similarity between w_t and w_s = similarity between P(C|w_t) and P(C|w_s) (see the note after this list)
- Information-theoretic measure for comparing the distributions: Kullback-Leibler divergence to the mean
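The slides do not spell out how P(C|w_t) is obtained; the usual empirical estimate from labelled training data would be

$$
P(c_j \mid w_t) \;=\; \frac{N(w_t, c_j)}{\sum_{j'} N(w_t, c_{j'})},
$$

where N(w_t, c_j) is the number of occurrences of word w_t in training documents of class c_j (in practice smoothed, e.g. with Laplace counts).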
Distributional Clustering
(Figures: example class distributions of words from Class 8: Autos and Class 9: Motorcycles)
Kullback-Leibler Divergence (definition below)
- D is asymmetric
- D becomes infinite when the second distribution assigns zero probability where the first does not (P(y) = 0 while P(x) ≠ 0)
- D ≥ 0
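For reference, the standard definition of the KL divergence between the class distributions of two words is

$$
D\big(P(C \mid w_t)\,\|\,P(C \mid w_s)\big) \;=\; \sum_{j} P(c_j \mid w_t)\,\log \frac{P(c_j \mid w_t)}{P(c_j \mid w_s)},
$$

which is asymmetric, non-negative, and infinite whenever some P(c_j | w_s) = 0 while P(c_j | w_t) ≠ 0.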
Kullback-Leibler Divergence to the Mean
- The similarity measure is the KL divergence to the mean, with the word priors P(w_t) and P(w_s) as weights (written out below)
- Jensen-Shannon divergence is the special case of this symmetrised KL divergence with P(w_t) = P(w_s) = 0.5
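This is presumably the "KL divergence to the mean" named earlier, with the two words' prior probabilities as mixture weights; the exact notation on the slide may differ, but in that form it reads

$$
\bar{P}(C) \;=\; \frac{P(w_t)}{P(w_t)+P(w_s)}\,P(C \mid w_t) \;+\; \frac{P(w_s)}{P(w_t)+P(w_s)}\,P(C \mid w_s),
$$
$$
D_{\mathrm{mean}}(w_t, w_s) \;=\; \frac{P(w_t)}{P(w_t)+P(w_s)}\,D\big(P(C \mid w_t)\,\|\,\bar{P}(C)\big) \;+\; \frac{P(w_s)}{P(w_t)+P(w_s)}\,D\big(P(C \mid w_s)\,\|\,\bar{P}(C)\big).
$$

Setting both weights to 0.5 gives exactly the Jensen-Shannon divergence mentioned above.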
Clustering Algorithm
Characteristics (a sketch of the procedure follows this list):
- Greedy, aggressive
- Only locally optimal
- Hard clustering
- Agglomerative
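A minimal Python sketch of this style of greedy agglomerative hard clustering, assuming each word is represented by its class distribution P(C|w). It illustrates the listed characteristics rather than reproducing the authors' implementation; the helper names (kl, kl_to_the_mean, cluster_words) and the simple unweighted merge are my own simplifications.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def kl_to_the_mean(p, q, wp=0.5, wq=0.5):
    """Weighted KL divergence to the mean; equal weights give Jensen-Shannon."""
    wp, wq = wp / (wp + wq), wq / (wp + wq)
    m = wp * np.asarray(p, dtype=float) + wq * np.asarray(q, dtype=float)
    return wp * kl(p, m) + wq * kl(q, m)

def cluster_words(class_dists, num_clusters):
    """Greedy, aggressive, hard agglomerative clustering of words.

    class_dists: dict mapping word -> P(C|word) as a 1-D array over classes.
    At most `num_clusters` clusters are kept: each step admits one word as a
    singleton, then merges the two most similar clusters (no backtracking,
    so the result is only locally optimal).
    """
    clusters = []  # list of (set_of_words, representative_distribution)
    for word, dist in class_dists.items():
        clusters.append(({word}, np.asarray(dist, dtype=float)))
        if len(clusters) > num_clusters:
            best = None
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    d = kl_to_the_mean(clusters[i][1], clusters[j][1])
                    if best is None or d < best[0]:
                        best = (d, i, j)
            _, i, j = best
            merged = (clusters[i][0] | clusters[j][0],
                      # simple average; a frequency-weighted mean would be closer to the paper
                      (clusters[i][1] + clusters[j][1]) / 2.0)
            clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
            clusters.append(merged)
    return [words for words, _ in clusters]
```

For example, cluster_words({'puck': [0.9, 0.1], 'goalie': [0.85, 0.15], 'car': [0.1, 0.9]}, 2) groups 'puck' with 'goalie' and leaves 'car' in its own cluster, since the first two have nearly identical class distributions.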
Experiments
Datasets:
- 20 Newsgroups
- Reuters-21578
- Yahoo Science hierarchy
Compared with:
- Supervised Latent Semantic Indexing
- Class-based clustering
- Feature selection by mutual information with the class variable
- Feature selection by the Markov-blanket method
Classifier: naive Bayes (NBC); a sketch of classifying over word clusters follows.
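To make the classifier setup concrete, here is a small sketch of feeding cluster-level counts to a naive Bayes classifier. The use of scikit-learn's MultinomialNB and the variable names (train_counts, word_to_cluster, and so on) are assumptions for illustration, not details from the paper.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB  # one common naive Bayes (NBC) implementation

def to_cluster_counts(doc_word_counts, word_to_cluster, num_clusters):
    """Collapse a documents-by-words count matrix into documents-by-clusters counts."""
    X = np.zeros((doc_word_counts.shape[0], num_clusters))
    for word_idx, cluster_idx in word_to_cluster.items():
        X[:, cluster_idx] += doc_word_counts[:, word_idx]
    return X

# Hypothetical usage, assuming count matrices and labels are already built:
# X_train = to_cluster_counts(train_counts, word_to_cluster, num_clusters)
# X_test = to_cluster_counts(test_counts, word_to_cluster, num_clusters)
# model = MultinomialNB().fit(X_train, y_train)
# print("accuracy:", model.score(X_test, y_test))
```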
Results
Conclusion
- Useful semantic word clusterings
- Higher classification accuracy
- Smaller classification models
Open questions:
- Word clustering vs. feature selection?
- What if the data is noisy? What if it is sparse?