Distributional Clustering of Words for Text Classification
L. Douglas Baker, Andrew Kachites McCallum
SIGIR’98


Distributional Clustering
- Word similarity based on class label distribution
- ‘puck’ and ‘goalie’
- ‘team’

Distributional Clustering
- Clustering words based on class distribution (supervised)
- Similarity between w_t and w_s is the similarity between P(C|w_t) and P(C|w_s)
- Information-theoretic measure to calculate similarity between distributions
- Kullback-Leibler divergence to the mean
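The class distributions P(C|w) that the slide compares can be illustrated with a small sketch. It is not the authors' code: the function name `class_distributions`, the toy documents, and the unsmoothed count-based estimate are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def class_distributions(docs):
    """Estimate P(C|w) for every word from (class_label, tokens) pairs.

    Assumption: a plain relative-frequency estimate with no smoothing.
    """
    counts = defaultdict(Counter)  # word -> Counter of class labels
    classes = set()
    for label, tokens in docs:
        classes.add(label)
        for tok in tokens:
            counts[tok][label] += 1
    dists = {}
    for word, by_class in counts.items():
        total = sum(by_class.values())
        # Counter returns 0 for classes the word never occurred in.
        dists[word] = {c: by_class[c] / total for c in sorted(classes)}
    return dists

# Hypothetical toy corpus echoing the 'puck' / 'goalie' / 'team' example.
toy_docs = [
    ("hockey", ["puck", "goalie", "team"]),
    ("hockey", ["puck", "team"]),
    ("baseball", ["pitcher", "team"]),
]
dists = class_distributions(toy_docs)
```

In this toy corpus 'puck' and 'goalie' occur only in the hockey class, so their distributions coincide, while 'team' is spread over both classes.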

Distributional Clustering
Class 8: Autos and Class 9: Motorcycles

Distributional Clustering

Kullback-Leibler Divergence
- Here, D is asymmetric, and D → ∞ when P(y) = 0 and P(x) ≠ 0
- Also, D ≥ 0
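The slide's equation did not survive in the transcript, so the sketch below assumes the standard textbook definition, D(P || Q) = sum over i of P_i log(P_i / Q_i), with the natural logarithm. The function name `kl_divergence` and the example distributions are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL divergence D(p || q) for two discrete distributions given as lists.

    Assumption: standard definition with natural log; terms where p_i == 0
    contribute nothing, and the result is infinite if q_i == 0 while p_i > 0.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
forward = kl_divergence(p, q)   # D(p || q)
backward = kl_divergence(q, p)  # D(q || p), generally a different value
unbounded = kl_divergence([0.5, 0.5], [1.0, 0.0])  # zero in q where p is nonzero
```

The three values illustrate the properties listed on the slide: the two directions differ, the divergence is never negative, and it becomes infinite when the second distribution assigns zero probability to an outcome the first one allows.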

Kullback-Leibler Divergence
- Jensen-Shannon divergence is the special case of the symmetrised KL divergence where P(w_t) = P(w_s) = 0.5
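A weighted "KL divergence to the mean" can be sketched as follows. The slide's formula is missing from the transcript, so this assumes the usual form: each distribution is compared with the weighted mean of the two, and the two divergences are averaged with the same weights. The names `kl` and `kl_to_mean` are hypothetical.

```python
import math

def kl(p, q):
    """D(p || q) with natural log; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def kl_to_mean(p_t, p_s, w_t=0.5, w_s=0.5):
    """Weighted average of the KL divergences of p_t and p_s to their mean.

    Assumption: w_t and w_s are positive and stand in for P(w_t) and P(w_s);
    they are normalised to sum to 1. Equal weights give the Jensen-Shannon
    divergence.
    """
    z = w_t + w_s
    a, b = w_t / z, w_s / z
    mean = [a * x + b * y for x, y in zip(p_t, p_s)]
    return a * kl(p_t, mean) + b * kl(p_s, mean)

# Two words with disjoint class distributions: the equal-weight value is log 2.
disjoint = kl_to_mean([1.0, 0.0], [0.0, 1.0])
```

Unlike plain KL divergence, this quantity stays finite, because the mean is nonzero wherever either input distribution is nonzero.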

Clustering Algorithm
Characteristics:
- Greedy, aggressive
- Locally optimal
- Hard clustering
- Agglomerative
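The listed characteristics can be illustrated with a simplified sketch. This is not the authors' exact procedure, which the transcript does not preserve: it simply starts with one cluster per word and repeatedly merges the pair with the smallest weighted KL divergence to the mean until k clusters remain. The names `merge` and `cluster_words` and the toy numbers are hypothetical.

```python
import math
from itertools import combinations

def kl(p, q):
    """D(p || q) with natural log; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def merge(a, b):
    """Return (divergence to the mean, merged cluster) for two clusters.

    A cluster is a tuple (words, class distribution, weight).
    """
    words_a, dist_a, w_a = a
    words_b, dist_b, w_b = b
    total = w_a + w_b
    mean = [(w_a * x + w_b * y) / total for x, y in zip(dist_a, dist_b)]
    div = (w_a * kl(dist_a, mean) + w_b * kl(dist_b, mean)) / total
    return div, (words_a + words_b, mean, total)

def cluster_words(word_dists, word_priors, k):
    """Greedy, hard, agglomerative clustering down to k clusters."""
    clusters = [([w], list(word_dists[w]), word_priors[w])
                for w in sorted(word_dists)]
    while len(clusters) > max(k, 1):
        # Greedy step: pick the single most similar pair right now.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: merge(clusters[ij[0]], clusters[ij[1]])[0])
        merged = merge(clusters[i], clusters[j])[1]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(words) for words, _, _ in clusters)

# Hypothetical P(C|w) over two classes (hockey, baseball) and word priors.
toy_dists = {
    "puck": [0.95, 0.05], "goalie": [0.90, 0.10],
    "pitcher": [0.05, 0.95], "inning": [0.10, 0.90],
}
toy_priors = {w: 0.25 for w in toy_dists}
result = cluster_words(toy_dists, toy_priors, 2)
```

Each merge is final (hard clustering) and chosen only by the current best pair, so the result is locally rather than globally optimal. This sketch re-scores all pairs on every pass, which is fine for a toy vocabulary but would be slow on a real one.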

Experiments
Datasets:
- 20 Newsgroups
- Reuters
- Yahoo Science Hierarchy
Compared with:
- Supervised Latent Semantic Indexing
- Class-based clustering
- Feature selection by mutual information with the class variable
- Feature selection by Markov-blanket method
Classifier: NBC (naive Bayes classifier)

Results

Conclusion
- Useful semantic word clusterings
- Higher classification accuracy
- Smaller classification models
- Word clustering vs. feature selection?
- What if the data is noisy? Sparse?