Document Clustering Based on Non-negative Matrix Factorization. Wei Xu et al., SIGIR’03. Presented by Weilin Xu, 11/06/2014.
Motivation. Document clustering serves a variety of information needs, from narrow to broad. Applications: news clustering, extractive summarization, … Why are traditional clustering methods not applicable? Agglomerative clustering (bottom-up hierarchical) is computationally expensive, $O(n^2 \log n)$. Document partitioning (flat clustering), e.g. K-means, makes harsh simplifying assumptions about the data distribution: compact cluster shapes in K-means, independent dimensions in Naïve Bayes, and Gaussian distributions in the Gaussian mixture model.
Motivation. Existing document clustering methods: Latent semantic indexing (LSI) uses singular value decomposition (SVD) to project each document into the singular vector space, where coefficients can be negative, and then clusters the documents with a traditional clustering method. Graph-based spectral clustering (presented by Jinghe) leads to computing singular vectors or eigenvectors of certain graph affinity matrices. The two approaches are equivalent to each other under certain conditions, and both need an additional clustering method.
Proposed method: Non-negative Matrix Factorization-based clustering. Factorization (approximation): $X \approx U V^T$, where U is the term-concept matrix and V is the document-concept matrix. The coefficients are non-negative, and the latent semantic space is not necessarily orthogonal.
Roadmap: Representations; Method; NMF vs. SVD; Experiment; Summary
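Representations. $X_{m \times n} = \begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{pmatrix}$ is the term-document matrix; column i is, e.g., the TF-IDF vector of document i. $X_{m \times n} \approx U_{m \times k} V_{n \times k}^T$, where $U_{m \times k}$ is the term-topic matrix and $V_{n \times k}$ is the document-topic matrix.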
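To make the shapes concrete, here is a minimal sketch of building an m×n term-document matrix whose column i is a TF-IDF vector of document i. The toy documents, the function name `tfidf_term_document_matrix`, the weighting tf × log(n/df), and the unit-length column normalization are all assumptions made for illustration; the slides do not specify a particular TF-IDF variant.

```python
import math
from collections import Counter

import numpy as np


def tfidf_term_document_matrix(docs):
    """Return (X, vocab): X has one row per term and one column per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    row_of = {term: r for r, term in enumerate(vocab)}
    n = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    X = np.zeros((len(vocab), n))
    for j, tokens in enumerate(tokenized):
        for term, tf in Counter(tokens).items():
            # Assumed weighting: raw term frequency times log(n / df).
            X[row_of[term], j] = tf * math.log(n / df[term])

    # Assumed normalization: scale each document column to unit length.
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # leave all-zero columns untouched
    return X / norms, vocab


# Hypothetical toy corpus with two obvious topics.
docs = [
    "stock market trading",
    "market prices fall",
    "football match score",
    "football team wins match",
]
X, vocab = tfidf_term_document_matrix(docs)
print(X.shape)  # (number of terms m, number of documents n)
```

All entries of X are non-negative by construction, which is the property the factorization relies on.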
Method. $X \approx U V^T$. Minimize the objective function, the Frobenius norm of the approximation error $X - U V^T$ (derivation omitted). Apply the updating equations iteratively, and normalize U so that the solution for U and V is unique. Convergence is guaranteed.
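The slide's own equations did not survive in this text, so the following is only a sketch of how the method is commonly implemented. It assumes the standard multiplicative update rules for the squared Frobenius objective, $u_{ij} \leftarrow u_{ij} (XV)_{ij} / (U V^T V)_{ij}$ and $v_{ij} \leftarrow v_{ij} (X^T U)_{ij} / (V U^T U)_{ij}$, followed by scaling each column of U to unit length and rescaling V so that $U V^T$ is unchanged. The function name `nmf_cluster`, the parameters `n_iter`, `eps`, and `seed`, the random initialization, and the final step of assigning each document to its largest entry in V are assumptions for illustration.

```python
import numpy as np


def nmf_cluster(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factor X (m x n, non-negative) as U V^T and label each document.

    Returns U (m x k), V (n x k), and one cluster label per document.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Assumed initialization: small positive random values.
    U = rng.random((m, k)) + eps
    V = rng.random((n, k)) + eps

    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)

    # Normalize columns of U to unit length and rescale V accordingly,
    # so the product U V^T is unchanged.
    norms = np.linalg.norm(U, axis=0)
    norms[norms == 0] = 1.0
    U /= norms
    V *= norms

    # Assumed assignment rule: document i goes to the topic with the
    # largest coefficient in row i of V.
    labels = V.argmax(axis=1)
    return U, V, labels


# Hypothetical toy matrix: 4 terms x 4 documents with two blocks.
X_demo = np.array([
    [2.0, 4.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 1.0, 1.0],
])
U, V, labels = nmf_cluster(X_demo, k=2)
print(labels)
```

Because the labels are read directly from V, no separate K-means step is run on the factorization. The result depends on the random initialization, so different seeds can give different label numberings.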
NMF vs. SVD
Experiment - Corpora (table columns: number of documents, number of clusters)
Experiment – Compared methods: spectral clustering with Average Association (AA), which is equivalent to [LSI + K-means] if the similarity between documents is the inner product $\langle X_i, X_j \rangle$; spectral clustering with Normalized Cut (NC); the standard form of NMF (NMF); and the NC-weighted form of NMF (NMF-NCW).
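Experiment - Evaluations: accuracy, computed after mapping each cluster label to a class label with Map(), where the best mapping is found by the Kuhn-Munkres algorithm; and mutual information, in normalized form.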
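The two metrics can be sketched as follows. The formulas on the slide are not in this text, so the definitions here are assumptions: accuracy is the fraction of documents whose mapped cluster label equals the true class under the best one-to-one mapping, and mutual information is normalized by the larger of the two label entropies. To stay dependency-free, the sketch searches all mappings by brute force instead of running the Kuhn-Munkres algorithm; both find the same optimal mapping, but brute force is only practical for a handful of clusters. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, pred_labels):
    """Accuracy under the best one-to-one mapping of clusters to classes."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    # Pad with dummy classes so every cluster can be mapped somewhere.
    if len(clusters) > len(classes):
        classes += [None] * (len(clusters) - len(classes))
    overlap = Counter(zip(pred_labels, true_labels))

    best_hits = 0
    # Brute-force stand-in for Kuhn-Munkres: try every mapping.
    for mapping in permutations(classes, len(clusters)):
        hits = sum(overlap[(c, t)] for c, t in zip(clusters, mapping))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


def entropy(labels):
    """Shannon entropy (base 2) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def normalized_mutual_information(true_labels, pred_labels):
    """Mutual information divided by max(H(true), H(pred))."""
    n = len(true_labels)
    count_true = Counter(true_labels)
    count_pred = Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))
    mi = sum(
        (c / n) * math.log2(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    denom = max(entropy(true_labels), entropy(pred_labels))
    # Assumed convention: two trivial single-cluster labelings score 1.
    return mi / denom if denom > 0 else 1.0


truth = [0, 0, 1, 1, 2, 2]
relabelled = [1, 1, 0, 0, 2, 2]  # same partition, different label names
one_error = [1, 1, 0, 2, 2, 2]   # one document placed in the wrong cluster
print(clustering_accuracy(truth, relabelled))
print(clustering_accuracy(truth, one_error))
print(normalized_mutual_information(truth, one_error))
```

For larger numbers of clusters, the permutation loop would be replaced by a Hungarian-algorithm implementation such as SciPy's `linear_sum_assignment`.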
Experiment - Results: NMF-NCW > NC > NMF > AA (ranked 1 to 4, best first)
Summary: An NMF-based document partitioning method. It differs from SVD-based LSI and spectral clustering: the latent semantic space need not be orthogonal, values in the latent semantic directions are non-negative, and no additional clustering step is needed. In the experiments, the NC-weighted form (NMF-NCW) outperforms the best of the compared methods.
Thanks