RANDOM PROJECTIONS IN DIMENSIONALITY REDUCTION: APPLICATIONS TO IMAGE AND TEXT DATA
Ella Bingham and Heikki Mannila
Presented by Ângelo Cardoso (IST/UTL), November 2009

Outline
1. Dimensionality Reduction – Motivation
2. Methods for Dimensionality Reduction
   1. PCA
   2. DCT
   3. Random Projection
3. Results on Image Data
4. Results on Text Data
5. Conclusions

Dimensionality Reduction – Motivation
- Many applications have high-dimensional data
  - Market basket analysis: wealth of alternative products
  - Text: large vocabulary
  - Images: large image windows
- We want to process the data
- High dimensionality of the data restricts the choice of data processing methods
  - The time needed to run the methods is too long
  - Memory requirements make it impossible to use some methods

Dimensionality Reduction – Motivation
- We want to visualize high-dimensional data
- Some features may be irrelevant
- Some dimensions may be highly correlated with others, e.g. height and foot size
- The "intrinsic" dimensionality may be smaller than the number of features
  - The data can best be described and understood by a smaller number of dimensions

Methods for Dimensionality Reduction
- The main idea is to project the high-dimensional (d) space into a lower-dimensional (k) space
- A statistically optimal way is to project onto a lower-dimensional orthogonal subspace that captures as much of the variation of the data as possible for the chosen k
- The best (in terms of mean squared error) and most widely used way to do this is PCA
- How to compare different methods?
  - Amount of distortion caused
  - Computational complexity

Principal Component Analysis (PCA) – Intuition
- Given data in an original 2-d space
- How can we represent the points in a k-dimensional space (k <= d) while preserving as much information as possible?
- [Figure: 2-d data points plotted on the original axes, with the first and second principal components indicated]

Principal Component Analysis (PCA) – Algorithm
- Eigenvalues
  - A measure of how much of the data variance is explained by each eigenvector
- Singular Value Decomposition (SVD)
  - Can be used to find the eigenvectors and eigenvalues of the covariance matrix
- To project into the lower-dimensional space
  - Subtract the mean of X in each dimension, then multiply by the principal components (PCs)
- To restore to the original space
  - Multiply the projection by the principal components (transposed) and add the mean of X in each dimension
- Algorithm (a NumPy sketch follows below):
  1. X ← create the N x d data matrix, with one row vector x_n per data point
  2. X ← subtract the mean x̄ from each dimension of X
  3. Σ ← covariance matrix of X
  4. Find the eigenvectors and eigenvalues of Σ
  5. PCs ← the k eigenvectors with the largest eigenvalues
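A minimal NumPy sketch of the algorithm just described; this is not the authors' code, and the function names and array shapes are my own assumptions.

```python
import numpy as np

def pca_fit(X, k):
    """PCA on an N x d data matrix X, keeping the k leading principal components."""
    mean = X.mean(axis=0)                    # mean of each dimension
    Xc = X - mean                            # subtract the mean (step 2)
    cov = np.cov(Xc, rowvar=False)           # d x d covariance matrix (step 3)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues/eigenvectors of a symmetric matrix (step 4)
    top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues (step 5)
    pcs = eigvecs[:, top]                    # d x k matrix of principal components
    return pcs, mean

def pca_project(X, pcs, mean):
    """Project into the lower-dimensional space: center, then multiply by the PCs."""
    return (X - mean) @ pcs                  # N x k

def pca_restore(Y, pcs, mean):
    """Restore to the original space: multiply by the transposed PCs and add the mean back."""
    return Y @ pcs.T + mean                  # N x d

# Toy usage
X = np.random.default_rng(0).random((200, 50))
pcs, mean = pca_fit(X, k=10)
Y = pca_project(X, pcs, mean)
X_hat = pca_restore(Y, pcs, mean)
```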

Random Projection (RP) – Idea
- PCA, even when computed using SVD, is computationally expensive
  - The complexity is O(dcN), where d is the number of dimensions, c is the average number of non-zero entries per column and N is the number of points
- Idea: what if we constructed the "principal component" vectors randomly?
- Johnson–Lindenstrauss lemma: if points in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points are approximately preserved (a small demonstration follows below)
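As a toy illustration of the Johnson–Lindenstrauss statement (my own example, not from the paper): project points with a matrix whose columns are random unit-length directions and compare pairwise Euclidean distances before and after, rescaled by sqrt(d/k) as discussed on the next slide.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 2500, 50, 20                      # original dim, reduced dim, number of points

X = rng.random((n, d))                      # toy data, one point per row

R = rng.normal(size=(d, k))                 # random Gaussian directions
R /= np.linalg.norm(R, axis=0)              # normalize each column to unit length
Y = X @ R                                   # projected points, n x k

# Pairwise distances are approximately preserved up to the scale factor sqrt(d/k)
scale = np.sqrt(d / k)
for i, j in [(0, 1), (2, 3), (4, 5)]:
    orig = np.linalg.norm(X[i] - X[j])
    proj = scale * np.linalg.norm(Y[i] - Y[j])
    print(f"pair ({i},{j}): original {orig:.2f}, projected (rescaled) {proj:.2f}")
```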

Random Projection (RP) – Idea
- Use a random matrix R in place of the matrix of principal components
- R is usually Gaussian distributed
- The complexity is O(kcN)
- The generated random matrix R is usually not orthogonal
  - Making R orthogonal is computationally expensive
  - However, we can rely on a result by Hecht-Nielsen: in a high-dimensional space there exist far more almost-orthogonal directions than orthogonal ones, so vectors with random directions are close enough to orthogonal (a quick numerical check follows below)
- Euclidean distances in the projected space can be scaled back to the original space by a factor of sqrt(d/k)
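A quick numerical check of the Hecht-Nielsen observation used above (my own sketch): in a high-dimensional space, random directions are nearly orthogonal, so an unorthogonalized R is usually good enough.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2500                                    # dimensionality of the original space

# Draw a few random directions and normalize them to unit length
V = rng.normal(size=(d, 10))
V /= np.linalg.norm(V, axis=0)

# Off-diagonal entries of V^T V are the cosines between pairs of random directions;
# in high dimensions they are close to zero, i.e. the directions are almost orthogonal.
G = V.T @ V
off_diag = G[~np.eye(10, dtype=bool)]
print("max |cosine| between random directions:", np.abs(off_diag).max())
```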

Random Projection – Simplified Random Projection (SRP)
- The random matrix is usually Gaussian distributed
  - mean 0, standard deviation 1
- Achlioptas showed that a much simpler distribution can be used (see the sketch below)
- This brings further computational savings, since the matrix is sparse and the computations can be performed using integer arithmetic
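A minimal sketch of a simplified random projection matrix, assuming the sparse distribution proposed by Achlioptas (entries sqrt(3) · {+1, 0, −1} with probabilities 1/6, 2/3, 1/6); this is my own reading of the construction, not code from the paper.

```python
import numpy as np

def srp_matrix(d, k, seed=None):
    """Sparse Achlioptas-style random projection matrix of shape d x k."""
    rng = np.random.default_rng(seed)
    # +1 with prob 1/6, 0 with prob 2/3, -1 with prob 1/6, scaled by sqrt(3)
    entries = rng.choice([1, 0, -1], size=(d, k), p=[1 / 6, 2 / 3, 1 / 6])
    # the sqrt(3) scaling could be deferred, so X @ entries needs only additions/subtractions
    return np.sqrt(3) * entries

rng = np.random.default_rng(2)
X = rng.random((100, 2500))                 # toy data
R = srp_matrix(2500, 50, seed=2)
Y = X @ R                                   # projected data, 100 x 50
```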

Discrete Cosine Transform (DCT)
- A widely used method for image compression
- Optimal for the human eye
  - Distortions are introduced at the highest frequencies, which humans tend to neglect as noise
- DCT is not data-dependent, in contrast to PCA, which needs the eigenvalue decomposition
  - This makes DCT orders of magnitude cheaper to compute (a rough sketch follows below)
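A rough sketch of DCT-based reduction for a single image, assuming SciPy is available (scipy.fft.dctn / idctn); keeping only a low-frequency block of coefficients plays the role of the k reduced dimensions. This is my own illustration, and the paper's exact coefficient-selection scheme may differ.

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(3)
img = rng.random((50, 50))                  # toy 50x50 "image"

coeffs = dctn(img, norm="ortho")            # 2-D DCT of the image

# Keep only a low-frequency block of coefficients (here 10x10, i.e. k = 100 dimensions)
kept = np.zeros_like(coeffs)
kept[:10, :10] = coeffs[:10, :10]

recon = idctn(kept, norm="ortho")           # reconstruction from the kept coefficients
print("mean absolute reconstruction error:", np.abs(img - recon).mean())
```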

Results – Noiseless Images
[Figure slides: result plots not reproduced in the transcript]

Results – Noiseless Images
- Original space: 2500-d (100 image pairs of 50x50 pixels)
- Error measurement
  - Average error in the Euclidean distance between the 100 pairs of images in the original and the reduced space (a sketch follows below)
- Amount of distortion
  - RP and SRP give accurate results for very small k (k > 10)
    - Distance scaling might be an explanation for this success
  - PCA gives accurate results for k > 600
    - In PCA such scaling is not straightforward
  - DCT still has a significant error even for k > 600
- Computational complexity
  - The number of floating point operations for RP and SRP is on the order of 100 times smaller than for PCA
  - RP and SRP clearly outperform PCA and DCT at the smallest dimensions
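One plausible way to compute the error measure described above, sketched in NumPy; the paper's exact error definition may differ, and the pairing and averaging here are my own assumptions.

```python
import numpy as np

def avg_distance_error(X, Y, pairs, scale=1.0):
    """Average absolute difference between pairwise Euclidean distances
    in the original space X and the (optionally rescaled) reduced space Y."""
    errs = []
    for i, j in pairs:
        d_orig = np.linalg.norm(X[i] - X[j])
        d_red = scale * np.linalg.norm(Y[i] - Y[j])
        errs.append(abs(d_orig - d_red))
    return float(np.mean(errs))

# Toy usage with a random projection of 100 image pairs (2500-d, i.e. 50x50 pixels)
rng = np.random.default_rng(4)
d, k = 2500, 50
X = rng.random((200, d))                          # 200 images = 100 pairs
R = rng.normal(size=(d, k))
R /= np.linalg.norm(R, axis=0)
Y = X @ R
pairs = [(2 * p, 2 * p + 1) for p in range(100)]  # fixed pairing of consecutive images
print(avg_distance_error(X, Y, pairs, scale=np.sqrt(d / k)))
```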

Results – Noisy Images
- The images were corrupted by salt-and-pepper impulse noise with probability 0.2 (a small sketch of this corruption follows below)
- The error is computed in the high-dimensional noiseless space
- RP, SRP, PCA and DCT perform quite similarly to the noiseless case
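A small sketch of the kind of corruption described above (salt-and-pepper impulse noise with probability 0.2); the authors' exact noise model may differ in detail.

```python
import numpy as np

def salt_and_pepper(img, p=0.2, seed=None):
    """With probability p, replace a pixel by 0 or 1 (equal chance); otherwise keep it."""
    rng = np.random.default_rng(seed)
    noisy = img.copy()
    mask = rng.random(img.shape) < p                       # pixels to corrupt
    noisy[mask] = rng.integers(0, 2, size=mask.sum()).astype(img.dtype)
    return noisy

img = np.random.default_rng(5).random((50, 50))
noisy = salt_and_pepper(img, p=0.2, seed=5)
```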

Results – Text Data
- Data set
  - Newsgroups corpus: sci.crypt, sci.med, sci.space, soc.religion
- Pre-processing (a sketch follows below)
  - Term frequency vectors
  - Some common terms were removed, but no stemming was used
  - Document vectors were normalized to unit length
  - The data was not made zero mean
- Size
  - 5000 terms
  - 2262 newsgroup documents
- Error measurement
  - 100 pairs of documents were randomly selected and the error between their cosine similarity before and after the dimensionality reduction was computed
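A minimal sketch of the preprocessing described above (term-frequency vectors, unit-length normalization, no mean removal); the toy documents, vocabulary handling and stop-word list are my own simplifications.

```python
import numpy as np
from collections import Counter

docs = ["the key is encrypted with a public key",
        "the patient was given a new treatment",
        "the probe entered orbit around the planet"]
stop_words = {"the", "a", "is", "was", "with"}        # a few common terms to remove

# Build the vocabulary (no stemming), then term-frequency vectors
tokens = [[w for w in doc.lower().split() if w not in stop_words] for doc in docs]
vocab = sorted({w for doc in tokens for w in doc})
index = {w: i for i, w in enumerate(vocab)}

X = np.zeros((len(docs), len(vocab)))
for row, doc in enumerate(tokens):
    for word, count in Counter(doc).items():
        X[row, index[word]] = count

# Normalize each document vector to unit length; the data is not made zero mean
X /= np.linalg.norm(X, axis=1, keepdims=True)
```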

Results – Text Data
[Figure slide: result plot not reproduced in the transcript]

Results – Text Data
- The cosine was used as the similarity measure, since it is more common for this task
- RP is not as accurate as SVD
  - The Johnson–Lindenstrauss result states that Euclidean distances, not cosines, are preserved well under random projection (a small check follows below)
- The RP error can be neglected in most applications
- RP can be used on large document collections with less computational complexity than SVD
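A quick check of the point above, comparing the cosine similarity of two unit-length "document" vectors before and after a random projection (my own toy example):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(6)
d, k = 5000, 100
a = rng.random(d); a /= np.linalg.norm(a)      # two unit-length term-frequency-like vectors
b = rng.random(d); b /= np.linalg.norm(b)

R = rng.normal(size=(d, k))
R /= np.linalg.norm(R, axis=0)

print("cosine before projection:", cosine(a, b))
print("cosine after projection: ", cosine(a @ R, b @ R))
```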

Conclusions
- Random Projection is an effective dimensionality reduction method for high-dimensional real-world data sets
- RP preserves the similarities well even if the data is projected into a moderate number of dimensions
- RP is beneficial in applications where the distances of the original space are meaningful
- RP is a good alternative to traditional dimensionality reduction methods, which become infeasible for high-dimensional data, since it does not suffer from the curse of dimensionality

Questions