Chapter 2 Dimensionality Reduction: Linear Methods
2.1 Introduction
Dimensionality reduction: the process of finding a suitable lower-dimensional space in which to represent the original data.
Goals:
Explore high-dimensional data.
Visualize the data in 2-D or 3-D.
Analyze the data using statistical methods, such as clustering and smoothing.
Two possible approaches:
One is to simply select a subset of the original variables for processing.
An alternative is to create new variables that are functions of the original variables.
The methods described in this book are of the second type.
Example 2.1
A projection takes the form of a matrix that maps the data from the original space to a lower-dimensional one.
To project 2-D data onto a line that is θ radians from the horizontal (x) axis, we use the projection matrix
P = [cos²θ, cosθ·sinθ; cosθ·sinθ, sin²θ].
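A minimal Python sketch of this projection (the data, angle, and variable names below are illustrative assumptions, not taken from the example):

```python
import numpy as np

theta = np.pi / 6                       # line at 30 degrees from the x axis (assumed)
u = np.array([np.cos(theta), np.sin(theta)])
P = np.outer(u, u)                      # projection matrix P = u u^T

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))            # 10 illustrative observations in 2-D (rows)

X_proj = X @ P                          # each row projected onto the line
```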
Example 2.1
2.2 Principal Component Analysis (PCA)
Aim: PCA reduces the dimensionality from p to d, where d < p, while accounting for as much of the variation in the original data set as possible.
PCA produces a new set of coordinates or variables that are linear combinations of the original variables.
The observations in the new principal component space are uncorrelated.
2.2.1 PCA Using the Sample Covariance Matrix
We start with the centered data matrix Xc of dimension n × p, where n is the number of observations and p is the number of variables; Xc is obtained by subtracting each variable's mean from the observations.
The sample covariance matrix is then
S = (1 / (n - 1)) Xc^T Xc.
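A minimal Python sketch of this step, assuming a generic n × p array X (the helper name is mine, not from the text):

```python
import numpy as np

def center_and_covariance(X):
    """Return the centered data matrix Xc and the sample covariance matrix S."""
    Xc = X - X.mean(axis=0)             # subtract each variable's (column's) mean
    S = (Xc.T @ Xc) / (X.shape[0] - 1)  # p x p sample covariance matrix
    return Xc, S
```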
2.2.1 PCA Using the Sample Covariance Matrix
The next step is to calculate the eigenvectors a_j and eigenvalues l_j of the matrix S,
S a_j = l_j a_j,   j = 1, ..., p,
subject to the condition that the set of eigenvectors is orthonormal:
a_j^T a_j = 1  and  a_i^T a_j = 0 for i ≠ j.
2.2.1 PCA Using the Sample Covariance Matrix
A major result in matrix algebra shows that any square, symmetric, nonsingular matrix can be transformed to a diagonal matrix via
L = A^T S A,
where the columns of A contain the eigenvectors of S and L is a diagonal matrix with the eigenvalues along the diagonal.
By convention, the eigenvalues are ordered in descending order: l_1 ≥ l_2 ≥ ... ≥ l_p.
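A minimal sketch of this eigendecomposition step (assumed helper, not code from the source):

```python
import numpy as np

def covariance_eig(S):
    """Eigenvalues (descending) and matching eigenvectors of a symmetric matrix S."""
    evals, evecs = np.linalg.eigh(S)     # eigh: S is symmetric
    order = np.argsort(evals)[::-1]      # descending order, l_1 >= ... >= l_p
    return evals[order], evecs[:, order]
```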
2.2.1 PCA Using the Sample Covariance Matrix
We use the eigenvectors of S to obtain new variables called principal components (PCs):
z_j = a_j^T x_c,   j = 1, ..., p.   (Equation 2.2)
Equation 2.2 shows that the PCs are linear combinations of the original variables.
The eigenvectors can also be scaled as w_j = a_j / sqrt(l_j); using w_j in the transformation yields PCs that are uncorrelated with unit variance.
2.2.1 PCA Using the Sample Covariance Matrix
We then transform the observations to the PC coordinate system via
Z = Xc A.
The matrix Z contains the principal component scores.
To summarize: the transformed variables are the PCs, and the individual transformed data values are the PC scores.
2.2.1 PCA Using the Sample Covariance Matrix
A result from linear algebra states that the sum of the variances of the original variables is equal to the sum of the eigenvalues.
The idea of dimensionality reduction with PCA is to include in the analysis only those PCs that have the highest eigenvalues.
We reduce the dimensionality to d via
Zd = Xc Ad,
where Ad contains the first d eigenvectors, i.e., the first d columns of A.
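Putting these steps together, a minimal Python sketch (illustrative data and variable names, not taken from the text) that computes the PC scores and the reduced d-dimensional representation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # illustrative n x p data matrix

Xc = X - X.mean(axis=0)                 # center the data
S = (Xc.T @ Xc) / (X.shape[0] - 1)      # sample covariance matrix
evals, A = np.linalg.eigh(S)
order = np.argsort(evals)[::-1]         # eigenvalues in descending order
evals, A = evals[order], A[:, order]

Z = Xc @ A                              # PC scores in all p dimensions

d = 2                                   # number of PCs to keep (assumed)
Ad = A[:, :d]                           # first d eigenvectors of S
Zd = Xc @ Ad                            # n x d reduced representation
```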
2.2.2 PCA Using the Sample Correlation Matrix
We can first scale the data to have standard units:
x*_ij = (x_ij - x̄_j) / sqrt(s_jj).
The standardized data x* are then treated as the observations in the PCA process; this is equivalent to performing PCA on the sample correlation matrix R.
The correlation matrix should be used for PCA when the variances along the original dimensions are very different.
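A minimal sketch of correlation-based PCA (the helper name is an assumption; the standardization follows the formula above):

```python
import numpy as np

def correlation_pca(X):
    """PCA based on the sample correlation matrix R (standardized data)."""
    Xstd = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # data in standard units
    R = np.corrcoef(X, rowvar=False)                     # sample correlation matrix
    evals, evecs = np.linalg.eigh(R)
    order = np.argsort(evals)[::-1]
    scores = Xstd @ evecs[:, order]                      # PC scores
    return evals[order], evecs[:, order], scores
```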
2.2.2 PCA Using the Sample Correlation Matrix
Two points should be noted:
Methods for statistical inference based on the sample PCs from covariance matrices are easier and are available in the literature.
The PCs obtained from the correlation matrix and from the covariance matrix do not provide equivalent information.
2.2.3 How Many Dimensions Should We Keep?
There are several possible ways to address this question:
Cumulative Percentage of Variance Explained
Scree Plot
The Broken Stick
Size of Variance
Example 2.2 shows how to perform PCA using the yeast cell cycle data set.
2.2.3 How Many Dimensions Should We Keep?
Cumulative Percentage of Variance Explained
The idea is to select those d PCs that contribute a specified cumulative percentage (e.g., 95%) of the total variation in the data:
t_d = 100 · (Σ_{j=1}^{d} l_j) / (Σ_{j=1}^{p} l_j).
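A minimal sketch of this criterion (assumed helper; the eigenvalues are assumed to be sorted in descending order):

```python
import numpy as np

def d_from_cumulative_variance(evals, target_pct=95.0):
    """Smallest d whose leading PCs explain at least target_pct of the total variance."""
    pct = 100.0 * np.cumsum(evals) / np.sum(evals)
    return int(np.searchsorted(pct, target_pct) + 1)
```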
2.2.3 How Many Dimensions Should We Keep?
Scree Plot
This is a graphical way to decide the number of PCs.
The original idea: plot l_k (the eigenvalue) versus k (the index of the eigenvalue).
In some cases, we might plot the log of the eigenvalues when the first eigenvalues are very large.
We look for the elbow in the curve, i.e., the place where the curve levels off and becomes almost flat; the value of k at this elbow is an estimate of how many PCs to retain.
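A minimal sketch of a scree plot using matplotlib (an assumption; the source does not specify a plotting tool):

```python
import numpy as np
import matplotlib.pyplot as plt

def scree_plot(evals, use_log=False):
    """Plot l_k versus k and look for the elbow."""
    k = np.arange(1, len(evals) + 1)
    y = np.log(evals) if use_log else evals      # log scale when the first eigenvalues are huge
    plt.plot(k, y, "o-")
    plt.xlabel("k (index of eigenvalue)")
    plt.ylabel("log(l_k)" if use_log else "l_k")
    plt.show()
```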
2.2.3 How Many Dimensions Should We Keep?
The Broken Stick
Here we choose the number of PCs based on the size of the eigenvalues, i.e., the proportion of the variance explained by each individual PC.
If we take a line segment and randomly divide it into p segments, the expected length of the k-th longest segment is
g_k = (1/p) Σ_{i=k}^{p} (1/i).
If the proportion of the variance explained by the k-th PC is greater than g_k, that PC is kept.
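A minimal sketch of the broken-stick rule (assumed helper name):

```python
import numpy as np

def broken_stick_keep(evals):
    """Boolean mask: keep PC k if its proportion of variance exceeds g_k."""
    p = len(evals)
    g = np.array([np.sum(1.0 / np.arange(k, p + 1)) / p for k in range(1, p + 1)])
    proportion = evals / np.sum(evals)     # proportion of variance per PC
    return proportion > g
```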
2.2.3 How Many Dimensions Should We Keep?
Size of Variance
We keep the k-th PC when its eigenvalue is large relative to the average eigenvalue; a common rule of thumb is
l_k ≥ 0.7 · l̄,   where  l̄ = (1/p) Σ_{j=1}^{p} l_j.
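A minimal sketch of this rule (the 0.7 fraction is exposed as a parameter, since the exact threshold is a judgment call):

```python
import numpy as np

def size_of_variance_keep(evals, fraction=0.7):
    """Boolean mask: keep PCs whose eigenvalue exceeds fraction * average eigenvalue."""
    return evals >= fraction * np.mean(evals)
```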
2.2.3 How Many Dimensions Should We Keep?
Example 2.2
We show how to perform PCA using the yeast cell cycle data set, which contains 384 genes corresponding to five phases of the cell cycle, measured at 17 time points.
2.3 Singular Value Decomposition (SVD)
The SVD provides a way to find the PCs without explicitly calculating the covariance matrix, and the technique is valid for an arbitrary matrix.
We use the noncentered form of the data matrix X in the explanation that follows.
The SVD of X is given by
X = U D V^T,
where U is an n × n orthogonal matrix, D is a diagonal matrix with n rows and p columns containing the singular values, and V is a p × p orthogonal matrix.
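A minimal NumPy sketch of the full SVD and the dimensions involved (the data matrix is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                 # n = 6 observations, p = 4 variables, noncentered

U, s, Vt = np.linalg.svd(X, full_matrices=True)
D = np.zeros((6, 4))
np.fill_diagonal(D, s)                      # D is n x p with the singular values on its diagonal

assert np.allclose(X, U @ D @ Vt)           # X = U D V^T
```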
2.3 Singular Value Decomposition (SVD)
2.3 Singular Value Decomposition (SVD)
If X has rank r, then the first r columns of U form an orthonormal basis for the column space of X, and the first r columns of V form an orthonormal basis for the row space of X.
As with PCA, we order the singular values in decreasing order and impose the same order on the columns of U and V.
A rank-k approximation to the original matrix X is obtained by keeping only the first k singular values and the corresponding columns of U and V:
X_k = U_k D_k V_k^T.
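A minimal sketch of the rank-k approximation (assumed helper; a thin SVD is sufficient since only the first k components are kept):

```python
import numpy as np

def rank_k_approx(X, k):
    """Rank-k approximation X_k = U_k D_k V_k^T from the k largest singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```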
2.3 Singular Value Decomposition . SVD
Example 2.3
The SVD is applied to information retrieval (IR).
We start with a data matrix where each row corresponds to a term and each column corresponds to a document in the corpus.
A query is given by a column vector indicating which terms it contains.
Example 2.3
Example 2.3: Method to find the most relevant documents
We compute the cosine of the angle between the query vector and each column (document) of the term-document matrix, and use a cutoff value of 0.5.
With the full term-document matrix, the second query matches the first book but misses the fourth one.
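A minimal sketch of this matching step (assumed helper; X is a term-by-document matrix and q a query vector over the same terms):

```python
import numpy as np

def cosine_scores(X, q):
    """Cosine of the angle between query q and each column (document) of X."""
    return (X.T @ q) / (np.linalg.norm(X, axis=0) * np.linalg.norm(q))

# Documents whose score exceeds the cutoff (0.5 in the example) are returned as relevant.
```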
Example 2.3
The idea is that some of the dimensions represented by the full term-document matrix are noise, and that documents will have a closer semantic structure after dimensionality reduction using the SVD.
We therefore find the representation of the query vector in the reduced space given by the first k columns of U.
Example 2.3
Why does this work? Consider the approximation in Equation 2.6, X_k = U_k D_k V_k^T.
Note that the columns of U and V are orthonormal, so left-multiplying Equation 2.6 by U_k^T gives
U_k^T X_k = D_k V_k^T,
which expresses each document (column of X) in the reduced k-dimensional space; the query vector is mapped to the same space by U_k^T q.
Example 2.3
Using a cutoff value of 0.5, we now correctly identify documents 1 and 4 as being relevant to our queries on baking bread and baking.
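A minimal end-to-end sketch of the reduced-space retrieval idea (the tiny term-document matrix and query below are illustrative assumptions, not the example's actual data):

```python
import numpy as np

X = np.array([                      # rows: terms, columns: documents (illustrative)
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
], dtype=float)
q = np.array([1.0, 0.0, 1.0, 0.0])  # query vector over the same four terms

k = 2                               # number of SVD dimensions to keep
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Uk, Dk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]

docs_k = Dk @ Vtk                   # documents represented in the reduced space (U_k^T X_k)
q_k = Uk.T @ q                      # query mapped into the same reduced space

# Cosine between the reduced query and each reduced document column.
cos = (docs_k.T @ q_k) / (np.linalg.norm(docs_k, axis=0) * np.linalg.norm(q_k))
relevant = cos > 0.5                # cutoff value used in the example
```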
Thanks