Concept Decomposition for Large Sparse Text Data Using Clustering

Similar presentations
Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.

The 5th annual UK Workshop on Computational Intelligence London, 5-7 September 2005 Department of Electronic & Electrical Engineering University College.
Dimensionality reduction. Outline From distances to points : – MultiDimensional Scaling (MDS) Dimensionality Reductions or data projections Random projections.
Dimensionality Reduction PCA -- SVD
Self Organization of a Massive Document Collection
Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 6 Scoring term weighting and the vector space model.
Comparison of information retrieval techniques: Latent semantic indexing (LSI) and Concept indexing (CI) Jasminka Dobša Faculty of organization and informatics,
Lecture 19 Singular Value Decomposition
1cs542g-term High Dimensional Data  So far we’ve considered scalar data values f i (or interpolated/approximated each component of vector values.
DIMENSIONALITY REDUCTION BY RANDOM PROJECTION AND LATENT SEMANTIC INDEXING Jessica Lin and Dimitrios Gunopulos Ângelo Cardoso IST/UTL December
Vector Space Information Retrieval Using Concept Projection Presented by Zhiguo Li
Optimization of Sparse Matrix Kernels for Data Mining Eun-Jin Im and Katherine Yelick U.C.Berkeley.
Information Retrieval in Text Part III Reference: Michael W. Berry and Murray Browne. Understanding Search Engines: Mathematical Modeling and Text Retrieval.
Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman.
Singular Value Decomposition in Text Mining Ram Akella University of California Berkeley Silicon Valley Center/SC Lecture 4b February 9, 2011.
TFIDF-space  An obvious way to combine TF-IDF: the coordinate of document in axis is given by  General form of consists of three parts: Local weight.
Minimum Spanning Trees Displaying Semantic Similarity Włodzisław Duch & Paweł Matykiewicz Department of Informatics, UMK Toruń School of Computer Engineering,
Lecture 21 SVD and Latent Semantic Indexing and Dimensional Reduction
1/ 30. Problems for classical IR models Introduction & Background(LSI,SVD,..etc) Example Standard query method Analysis standard query method Seeking.
Lecture 4 Unsupervised Learning Clustering & Dimensionality Reduction
The Terms that You Have to Know! Basis, Linear independent, Orthogonal Column space, Row space, Rank Linear combination Linear transformation Inner product.
Clustering In Large Graphs And Matrices Petros Drineas, Alan Frieze, Ravi Kannan, Santosh Vempala, V. Vinay Presented by Eric Anderson.
Dimension reduction : PCA and Clustering Christopher Workman Center for Biological Sequence Analysis DTU.
Unsupervised Learning
3D Geometry for Computer Graphics
Dimension of Meaning Author: Hinrich Schutze Presenter: Marian Olteanu.
Presented by Arun Qamra
1cs542g-term Notes  Extra class next week (Oct 12, not this Friday)  To submit your assignment: me the URL of a page containing (links to)
Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka Virginia de Sa (UCSD) Cogsci 108F Linear.
Introduction The central problems of Linear Algebra are to study the properties of matrices and to investigate the solutions of systems of linear equations.
Length and Dot Product in R n Notes: is called a unit vector. Notes: The length of a vector is also called its norm. Chapter 5 Inner Product Spaces.
Summarized by Soo-Jin Kim
Chapter 2 Dimensionality Reduction. Linear Methods
Presented By Wanchen Lu 2/25/2013
1 Vector Space Model Rong Jin. 2 Basic Issues in A Retrieval Model How to represent text objects What similarity function should be used? How to refine.
Latent Semantic Indexing Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata.
Non Negative Matrix Factorization
EMIS 8381 – Spring Netflix and Your Next Movie Night Nonlinear Programming Ron Andrews EMIS 8381.
Co-clustering Documents and Words Using Bipartite Spectral Graph Partitioning Jinghe Zhang 10/28/2014 CS 6501 Information Retrieval.
Expressing Implicit Semantic Relations without Supervision ACL 2006.
Pseudo-supervised Clustering for Text Documents Marco Maggini, Leonardo Rigutini, Marco Turchi Dipartimento di Ingegneria dell’Informazione Università.
Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman modified by Hanne Jarmer.
A Clustering Method Based on Nonnegative Matrix Factorization for Text Mining Farial Shahnaz.
Generic Summarization and Keyphrase Extraction Using Mutual Reinforcement Principle and Sentence Clustering Hongyuan Zha Department of Computer Science.
A Convergent Solution to Tensor Subspace Learning.
Information-Theoretic Co- Clustering Inderjit S. Dhillon et al. University of Texas, Austin presented by Xuanhui Wang.
1. Systems of Linear Equations and Matrices (8 Lectures) 1.1 Introduction to Systems of Linear Equations 1.2 Gaussian Elimination 1.3 Matrices and Matrix.
Natural Language Processing Topics in Information Retrieval August, 2002.
Matrix Factorization & Singular Value Decomposition Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
CS654: Digital Image Analysis Lecture 11: Image Transforms.
ECE 530 – Analysis Techniques for Large-Scale Electrical Systems Prof. Hao Zhu Dept. of Electrical and Computer Engineering University of Illinois at Urbana-Champaign.
Chapter 61 Chapter 7 Review of Matrix Methods Including: Eigen Vectors, Eigen Values, Principle Components, Singular Value Decomposition.
Lecture 11 Inner Product Spaces Last Time Change of Basis (Cont.) Length and Dot Product in R n Inner Product Spaces Elementary Linear Algebra R. Larsen.
IR 6 Scoring, term weighting and the vector space model.
Jinbo Bi Joint work with Jiangwen Sun, Jin Lu, and Tingyang Xu
Document Clustering Based on Non-negative Matrix Factorization
Multiplicative updates for L1-regularized regression
Parallelism in High-Performance Computing Applications
Singular Value Decomposition
Principal Component Analysis
Spectral Clustering Eric Xing Lecture 8, August 13, 2010
Dimension reduction : PCA and Clustering
Outline Singular Value Decomposition Example of PCA: Eigenfaces.
Large scale multilingual and multimodal integration
Lecture 13: Singular Value Decomposition (SVD)
K-Medoid May 5, 2019.
Latent semantic space: Iterative scaling improves precision of inter-document similarity measurement Rie Kubota Ando. Latent semantic space: Iterative.
Presentation transcript:

Concept Decomposition for Large Sparse Text Data Using Clustering. Dhillon, I. S. and Modha, D. S., Machine Learning, 42(1), 2001. Nov. 9, 2001. Summarized by Jeong-Ho Chang

Introduction. Study a spherical k-means algorithm for clustering document vectors. Empirically demonstrate that the clusters produced have a certain "fractal-like" and "self-similar" behavior. Matrix approximation by concept decomposition: explore the intimate connections between clustering with the spherical k-means algorithm and the problem of matrix approximation for word-by-document matrices.

Vector Space Model for Text. Each document i is encoded as a vector whose j-th entry combines three parts: a term weighting component that depends on the number of occurrences of word j in document i, a global weighting component that depends on the number of documents that contain word j, and a normalization component that scales each document vector to unit length.
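
A minimal sketch of this weighting scheme under the assumptions above (log inverse-document-frequency as the global component, unit L2 normalization); the function name and dense-matrix input are illustrative, not from the paper:

```python
import numpy as np

def build_tfidf(counts: np.ndarray) -> np.ndarray:
    """counts: (n_docs, n_words) raw term frequencies; returns unit-norm rows."""
    n_docs = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)        # documents containing word j
    idf = np.log(n_docs / np.maximum(df, 1))     # global weighting component
    weighted = counts * idf                      # term weight times global weight
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    return weighted / np.maximum(norms, 1e-12)   # normalization component
```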

The Spherical k-means Algorithm

Concept Vectors. Cosine similarity: for unit vectors x and y, the similarity is the inner product x^T y. Concept vector: the mean vector of a cluster, normalized to unit length, c_j = m_j / ||m_j||, where m_j = (1/n_j) * sum of the x_i in cluster pi_j.
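
A hedged sketch of these two definitions; since the document vectors are unit length, cosine similarity reduces to a plain dot product:

```python
import numpy as np

def concept_vector(cluster_docs: np.ndarray) -> np.ndarray:
    """cluster_docs: (n_j, d) unit document vectors in one cluster."""
    m = cluster_docs.mean(axis=0)      # mean vector m_j
    return m / np.linalg.norm(m)       # c_j = m_j / ||m_j||

def cosine(x: np.ndarray, y: np.ndarray) -> float:
    return float(x @ y)                # unit vectors: cosine equals the dot product
```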

Spherical k-means (1/4). Objective function: Q({pi_j}) = sum_j sum_{x_i in pi_j} x_i^T c_j, to be maximized over partitionings {pi_1, ..., pi_k}. Optimal partitioning: assign each x_i to the cluster whose concept vector is most similar in cosine. The per-cluster sum, sum_{x_i in pi_j} x_i^T c_j, serves as a measurement of the "coherence" or "quality" of each cluster.
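
An illustrative computation of this objective, reconstructed from the definitions above (the function name and array layout are assumptions):

```python
import numpy as np

def objective(X: np.ndarray, labels: np.ndarray, C: np.ndarray) -> float:
    """X: (n, d) unit doc vectors; labels: (n,) cluster ids; C: (k, d) concept vectors."""
    # sum_j sum_{x_i in pi_j} x_i^T c_j, computed as one vectorized dot product per doc
    return float(np.sum(X * C[labels], axis=1).sum())
```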

Spherical k-means (2/4). The algorithm alternates two steps until the objective stops improving: re-partition the documents by assigning each x_i to the concept vector it is most similar to in cosine, then recompute each concept vector as the normalized mean of its new cluster.
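
A minimal sketch of this alternation under the paper's setup (unit-norm documents, cosine similarity); the random initialization and empty-cluster reseeding are simplifying assumptions, not the authors' exact procedure:

```python
import numpy as np

def spherical_kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """X: (n, d) with rows normalized to unit L2 norm. Returns (labels, concepts)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]   # initial concept vectors
    for _ in range(n_iter):
        labels = (X @ C.T).argmax(axis=1)              # assign by highest cosine
        newC = np.zeros_like(C)
        for j in range(k):
            members = X[labels == j]
            # reseed an empty cluster from a random document (assumed fallback)
            m = members.mean(axis=0) if len(members) else X[rng.integers(len(X))]
            newC[j] = m / np.linalg.norm(m)            # normalized mean -> concept vector
        if np.allclose(newC, C):                       # no change: objective has converged
            break
        C = newC
    return labels, C
```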

Spherical k-means (3/4): Convergence. Monotone: each iteration (re-partitioning, then recomputing concept vectors) can only increase the objective, so Q^(t+1) >= Q^(t).

Spherical k-means (4/4): Convergence. Bounded: each term x_i^T c_j is at most 1, so the objective is bounded above; a monotone, bounded sequence converges, hence the limit of objective values exists. This does not imply that the underlying partitioning converges.

Experimental Results (1/4). Data sets. CLASSIC3: 3,893 documents (MEDLINE 1,033; CISI 1,460; CRANFIELD 1,400), 4,099 words after preprocessing, using only term frequency. NSF: 13,297 abstracts of grants awarded by the NSF, 5,298 words after preprocessing, using term frequency and inverse document frequency.

Experimental Results (2/4) Confusion matrix for CLASSIC3 data Objective function plot

Experimental Results (3/4) Intra-cluster structure

Experimental Results (4/4) Inter-cluster structure

Relation with the Euclidean k-means algorithm. Like the spherical variant, Euclidean k-means can be thought of as a matrix approximation problem.

Matrix Approximation using Clustering

Clustering as Matrix Approximation. Formulation: let X be the word-by-document matrix, and let X_hat be the matrix approximation whose i-th column is the concept vector closest to x_i. How effective is the approximation? Measure the error in the Frobenius norm, ||X - X_hat||_F.
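
A sketch of this approximation error; for convenience it stores documents as rows rather than columns, and the function name is hypothetical:

```python
import numpy as np

def clustering_approx_error(X: np.ndarray, labels: np.ndarray, C: np.ndarray) -> float:
    """X: (n, d) documents as rows; labels: (n,) cluster ids; C: (k, d) concept vectors."""
    X_hat = C[labels]                    # replace each document by its concept vector
    return float(np.linalg.norm(X - X_hat))   # Frobenius norm of the residual
```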

Concept Decomposition (1/2). Formulation: the concept decomposition D_k is the least-squares approximation of X onto the column space of the concept matrix C_k = [c_1 ... c_k], i.e., D_k = C_k Z*, where Z* minimizes ||X - C_k Z||_F.
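
A hedged sketch of this least-squares step, solved column-by-column with numpy's lstsq (the closed form Z* = (C_k^T C_k)^{-1} C_k^T X would be equivalent when C_k has full column rank):

```python
import numpy as np

def concept_decomposition(X: np.ndarray, C_k: np.ndarray):
    """X: (d, n) word-by-document matrix; C_k: (d, k) concept matrix."""
    Z_star, *_ = np.linalg.lstsq(C_k, X, rcond=None)   # (k, n) least-squares coefficients
    D_k = C_k @ Z_star                                 # rank-k approximation of X
    return D_k, Z_star
```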

Concept Decomposition (2/2)

Concept Vectors and Singular Vectors: A Comparison

Concept vectors are local and sparse (1/6) Locality Three concept vectors for CLASSIC3 data

Concept vectors are local and sparse (2/6) Three singular vectors for CLASSIC3 data

Concept vectors are local and sparse (3/6) Four (among 10) concept vectors for NSF data

Concept vectors are local and sparse (4/6) Four (among 10) singular vectors for NSF data

Concept vectors are local and sparse (5/6) Sparsity: with an increasing number of clusters, the concept vectors become progressively sparser.

Concept vectors are local and sparse (6/6) Orthonormality: with an increasing number of clusters, the concept vectors tend towards "orthonormality."
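
Two simple diagnostics for the observations above, written as an illustrative sketch: the fraction of nonzero entries in the concept vectors, and how far C C^T is from the identity (0 would mean exactly orthonormal):

```python
import numpy as np

def sparsity(C: np.ndarray, tol: float = 1e-12) -> float:
    return float(np.mean(np.abs(C) > tol))       # fraction of nonzero entries

def orthonormality_gap(C: np.ndarray) -> float:
    k = C.shape[0]                               # C: (k, d) unit concept rows
    return float(np.linalg.norm(C @ C.T - np.eye(k)))   # ||C C^T - I||_F
```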

Principal Angles: Comparing Concept and Singular Subspaces (1/4). Principal angles generalize the notion of an angle between two lines to higher-dimensional subspaces of R^d. Formulation: F and G are subspaces of R^d; the cosines of the principal angles are the singular values of Q_F^T Q_G, where Q_F and Q_G are orthonormal bases for F and G. The comparison statistic is the average cosine of the principal angles.
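
A sketch of this computation via QR and SVD (a standard construction; the function name is an assumption):

```python
import numpy as np

def avg_principal_cosine(F: np.ndarray, G: np.ndarray) -> float:
    """F: (d, p), G: (d, q); columns span the two subspaces being compared."""
    Qf, _ = np.linalg.qr(F)                      # orthonormal basis for span(F)
    Qg, _ = np.linalg.qr(G)                      # orthonormal basis for span(G)
    cosines = np.linalg.svd(Qf.T @ Qg, compute_uv=False)  # cosines of principal angles
    return float(cosines.mean())
```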

Principal Angles: Comparing Concept and Singular subspaces (2/4) CLASSIC3 data set With singular subspace S3 With singular subspace S10

Principal Angles: Comparing Concept and Singular subspaces (3/4) NSF data set (1/2) With singular subspace S64

Principal Angles: Comparing Concept and Singular subspaces (4/4) NSF data set (2/2) With singular subspace S235

Conclusions. Presented a spherical k-means algorithm for text documents, which are high-dimensional and sparse. Average cluster coherence tends to be quite low: there is a large void surrounding each concept vector, which is uncommon for low-dimensional, dense data sets. The concept decompositions derived from concept vectors can be used for matrix approximation, with accuracy comparable to that of truncated SVDs. The concept vectors constitute a powerful, sparse, and localized "basis" for text data sets.