Random Projections in Dimensionality Reduction: Applications to Image and Text Data
Ella Bingham and Heikki Mannila
Presented by Ângelo Cardoso, IST/UTL, November 2009
Outline
1. Dimensionality Reduction: Motivation
2. Methods for Dimensionality Reduction
   1. PCA
   2. DCT
   3. Random Projection
3. Results on Image Data
4. Results on Text Data
5. Conclusions
Dimensionality Reduction: Motivation
Many applications have high-dimensional data:
- Market basket analysis: wealth of alternative products
- Text: large vocabulary
- Image: large image window
We want to process the data, but high dimensionality restricts the choice of data processing methods:
- The time needed to run the methods becomes too long
- Memory requirements make it impossible to use some methods
Dimensionality Reduction: Motivation
- We want to visualize high-dimensional data
- Some features may be irrelevant
- Some dimensions may be highly correlated with others, e.g. height and foot size
- The "intrinsic" dimensionality may be smaller than the number of features: the data can be best described and understood with a smaller number of dimensions
Methods for Dimensionality Reduction
- The main idea is to project the high-dimensional (d) space onto a lower-dimensional (k) space
- A statistically optimal way is to project onto a lower-dimensional orthogonal subspace that captures as much of the variation of the data as possible for the chosen k
- The best (in terms of mean squared error) and most widely used way to do this is PCA
How to compare different methods?
- Amount of distortion caused
- Computational complexity
Principal Components Analysis (PCA): Intuition
- Given an original space in 2-d, how can we represent the points in a k-dimensional space (k <= d) while preserving as much information as possible?
- (Figure: data points plotted on the original axes, with the first and second principal components drawn through them.)
Principal Components Analysis (PCA): Algorithm
- Eigenvalues: a measure of how much of the data variance is explained by each eigenvector
- Singular Value Decomposition (SVD) can be used to find the eigenvectors and eigenvalues of the covariance matrix
- To project onto the lower-dimensional space: subtract the mean of X in each dimension and multiply by the principal components (PCs)
- To restore to the original space: multiply the projection by the transposed PCs and add the mean of X in each dimension
Algorithm (a code sketch follows the steps):
1. X <- N x d data matrix, with one row vector x_n per data point
2. X <- subtract the mean of each dimension from X
3. Σ <- covariance matrix of X
4. Find the eigenvectors and eigenvalues of Σ
5. PCs <- the k eigenvectors with the largest eigenvalues
Random Projection (RP): Idea
- PCA, even when calculated using SVD, is computationally expensive
- Its complexity is O(dcN), where d is the number of dimensions, c is the average number of non-zero entries per column, and N is the number of points
- Idea: what if we constructed the "principal component" vectors randomly?
- Johnson-Lindenstrauss lemma: if points in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points are approximately preserved
Random Projection (RP): Idea
- Use a random matrix R in place of the principal components matrix
- R is usually Gaussian distributed
- Complexity is O(kcN)
- The generated random matrix R is usually not orthogonal
- Making R orthogonal is computationally expensive
- However, we can rely on a result by Hecht-Nielsen: in a high-dimensional space there exist many more almost orthogonal directions than orthogonal ones, so vectors with random directions are close enough to orthogonal
- Euclidean distances in the projected space can be scaled back to the original space by the factor sqrt(d/k) (see the sketch below)
Random Projection: Simplified Random Projection (SRP)
- The random matrix is usually Gaussian distributed (mean 0, standard deviation 1)
- Achlioptas showed that a much simpler distribution can be used (see the sketch below)
- This implies further computational savings, since the matrix is sparse and the computations can be performed using integer arithmetic
Discrete Cosine Transform (DCT)
- Widely used method for image compression
- "Optimal" for the human eye: distortions are introduced at the highest frequencies, which humans tend to neglect as noise
- DCT is not data-dependent, in contrast to PCA, which needs the eigenvalue decomposition
- This makes DCT orders of magnitude cheaper to compute
Results: Noiseless Images (figure slides; plots not reproduced in text)
Results: Noiseless Images
- Original space: 2500-d (100 image pairs of 50x50 pixels)
- Error measurement: average error in the Euclidean distance between 100 pairs of images in the original and the reduced space (a sketch of this measure follows)
Amount of distortion
- RP and SRP give accurate results for very small k (k > 10); the distance scaling might be an explanation for this success
- PCA gives accurate results for k > 600; in PCA such scaling is not straightforward
- DCT still has a significant error even for k > 600
Computational complexity
- The number of floating point operations for RP and SRP is on the order of 100 times smaller than for PCA
- RP and SRP clearly outperform PCA and DCT at the smallest dimensions
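A sketch of the distance-based error measure described above, comparing Euclidean distances between image pairs before and after the reduction. The exact error formula is not given on the slide, so the relative-error form here is an assumption.

```python
import numpy as np

def distance_error(X, Z, pairs):
    """Average relative error of pairwise Euclidean distances between original (X) and reduced (Z) data."""
    errs = []
    for i, j in pairs:                          # e.g. 100 randomly chosen index pairs
        d_orig = np.linalg.norm(X[i] - X[j])    # distance in the original 2500-d space
        d_red = np.linalg.norm(Z[i] - Z[j])     # distance in the reduced k-d space
        errs.append(abs(d_red - d_orig) / d_orig)
    return float(np.mean(errs))
```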
Results: Noisy Images
- Images were corrupted by salt-and-pepper impulse noise with probability 0.2
- The error is computed in the high-dimensional noiseless space
- RP, SRP, PCA, and DCT perform quite similarly to the noiseless case
Results: Text Data
Data set
- Newsgroups corpus: sci.crypt, sci.med, sci.space, soc.religion
Pre-processing
- Term frequency vectors
- Some common terms were removed, but no stemming was used
- Document vectors were normalized to unit length
- The data was not made zero mean
Size
- 5000 terms, 2262 newsgroup documents
Error measurement
- 100 pairs of documents were randomly selected and the error between their cosine similarity before and after the dimensionality reduction was calculated (a sketch of this measure follows)
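A sketch of the cosine-based error measure for the text experiments, assuming documents are rows of the term-frequency matrix. As with the distance error above, the exact form used in the paper is not on the slide, so the absolute difference of cosines is an assumption.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_error(X, Z, pairs):
    """Average absolute change in cosine similarity between original (X) and reduced (Z) documents."""
    diffs = [abs(cosine(X[i], X[j]) - cosine(Z[i], Z[j])) for i, j in pairs]
    return float(np.mean(diffs))
```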
Results: Text Data (figure slide; plot not reproduced in text)
Results: Text Data
- The cosine was used as the similarity measure since it is more common for this task
- RP is not as accurate as SVD: the Johnson-Lindenstrauss result states that Euclidean distances, not the cosine, are well preserved under random projection
- The RP error may be neglected in most applications
- RP can be used on large document collections with less computational complexity than SVD
Conclusion
- Random Projection is an effective dimensionality reduction method for high-dimensional real-world data sets
- RP preserves the similarities even if the data is projected onto a moderate number of dimensions
- RP is beneficial in applications where the distances of the original space are meaningful
- RP is a good alternative to traditional dimensionality reduction methods that are infeasible for high-dimensional data, since it does not suffer from the curse of dimensionality
Questions