Principal Component Analysis (PCA) Group F: Minh Bao Nguyen-Khoa – Eldor Ibragimov – Huynjun woo
What is PCA? Principal component analysis (PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. It converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. It is often used to reduce the dimensionality of data and make it easier to explore and visualize.
What is PCA? An excerpt of the Iris dataset, with 15 samples (5 per species) and 4 features:

ID | Sepal Length | Sepal Width | Petal Length | Petal Width | Species
 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa
 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa
 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa
 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa
 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa
 6 | 7.0 | 3.2 | 4.7 | 1.4 | Iris-versicolor
 7 | 6.4 | 3.2 | 4.5 | 1.5 | Iris-versicolor
 8 | 6.9 | 3.1 | 4.9 | 1.5 | Iris-versicolor
 9 | 5.5 | 2.3 | 4.0 | 1.3 | Iris-versicolor
10 | 6.5 | 2.8 | 4.6 | 1.5 | Iris-versicolor
11 | 6.3 | 3.3 | 6.0 | 2.5 | Iris-virginica
12 | 5.8 | 2.7 | 5.1 | 1.9 | Iris-virginica
13 | 7.1 | 3.0 | 5.9 | 2.1 | Iris-virginica
14 | 6.3 | 2.9 | 5.6 | 1.8 | Iris-virginica
15 | 6.5 | 3.0 | 5.8 | 2.2 | Iris-virginica
Why do we need PCA? Real-world data often consists of numerous features, many of which can be redundant. Working with all of those features can lead to several problems: model overfitting and time-consuming training. To deal with these problems, we can apply dimensionality reduction to the data. There are two types of dimensionality reduction: feature elimination and feature extraction. PCA is a feature-extraction method.
When should we use PCA? To decide whether we should use PCA, we can ask the following three questions:
1. Do you want to reduce the number of variables, but aren't able to identify variables to completely remove from consideration?
2. Do you want to ensure your variables are independent of one another?
3. Are you comfortable making your independent variables less interpretable?
If you answered "yes" to all three questions, then PCA is a good method to use.
How does PCA work? Let's define our data as X and Y, where X is the matrix of feature values and Y is the vector of labels. From the Iris excerpt above:

$$X = \begin{bmatrix}
5.1 & 3.5 & 1.4 & 0.2 \\
4.9 & 3.0 & 1.4 & 0.2 \\
4.7 & 3.2 & 1.3 & 0.2 \\
4.6 & 3.1 & 1.5 & 0.2 \\
5.0 & 3.6 & 1.4 & 0.2 \\
7.0 & 3.2 & 4.7 & 1.4 \\
6.4 & 3.2 & 4.5 & 1.5 \\
6.9 & 3.1 & 4.9 & 1.5 \\
5.5 & 2.3 & 4.0 & 1.3 \\
6.5 & 2.8 & 4.6 & 1.5 \\
6.3 & 3.3 & 6.0 & 2.5 \\
5.8 & 2.7 & 5.1 & 1.9 \\
7.1 & 3.0 & 5.9 & 2.1 \\
6.3 & 2.9 & 5.6 & 1.8 \\
6.5 & 3.0 & 5.8 & 2.2
\end{bmatrix}$$
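As a minimal sketch (assuming NumPy; variable names are ours), the 15-sample excerpt above can be written as a feature matrix X and a label vector y:

```python
import numpy as np

# The 15-sample Iris excerpt: columns are sepal length, sepal width,
# petal length, petal width.
X = np.array([
    [5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2], [5.0, 3.6, 1.4, 0.2],   # Iris-setosa
    [7.0, 3.2, 4.7, 1.4], [6.4, 3.2, 4.5, 1.5], [6.9, 3.1, 4.9, 1.5],
    [5.5, 2.3, 4.0, 1.3], [6.5, 2.8, 4.6, 1.5],   # Iris-versicolor
    [6.3, 3.3, 6.0, 2.5], [5.8, 2.7, 5.1, 1.9], [7.1, 3.0, 5.9, 2.1],
    [6.3, 2.9, 5.6, 1.8], [6.5, 3.0, 5.8, 2.2],   # Iris-virginica
])
# The labels (called Y on the slide).
y = np.array(["Iris-setosa"] * 5 + ["Iris-versicolor"] * 5 + ["Iris-virginica"] * 5)
```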
How does PCA work? Next, we standardize X to create a new matrix Z: subtract the mean of each column from each entry in that column, then divide each entry in a column by that column's standard deviation. Applied to the matrix X above (values rounded to one decimal):

$$Z = \begin{bmatrix}
-1.0 & 1.4 & -1.4 & -1.3 \\
-1.2 & -0.2 & -1.4 & -1.3 \\
-1.4 & 0.5 & -1.4 & -1.3 \\
-1.6 & 0.1 & -1.3 & -1.3 \\
-1.1 & 1.7 & -1.4 & -1.3 \\
1.3 & 0.5 & 0.5 & 0.2 \\
0.6 & 0.5 & 0.3 & 0.3 \\
1.2 & 0.1 & 0.6 & 0.3 \\
-0.5 & -2.5 & 0.1 & 0.1 \\
0.7 & -0.8 & 0.4 & 0.3 \\
0.5 & 0.8 & 1.2 & 1.6 \\
-0.1 & -1.2 & 0.7 & 0.8 \\
1.4 & -0.2 & 1.1 & 1.1 \\
0.5 & -0.5 & 0.9 & 0.7 \\
0.7 & -0.2 & 1.1 & 1.2
\end{bmatrix}$$
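A minimal sketch of this step, continuing from the snippet above (we assume the population standard deviation, which is consistent with most of the rounded values on this slide):

```python
# Standardize each column of X: subtract the column mean, then divide by
# the column standard deviation (ddof=0, i.e. the population std).
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(Z.round(1))   # approximately the Z matrix above (rounding may differ slightly)
```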
How does PCA work? Take the matrix Z, transpose it, and multiply the transposed matrix by Z. Then calculate the eigenvectors and their corresponding eigenvalues of ZᵀZ. We can decompose ZᵀZ into PDP⁻¹, where P is the matrix of eigenvectors v and D is the diagonal matrix with the eigenvalues λ on the diagonal and zeros everywhere else:

$$Z^T Z = A = P D P^{-1}$$
How does PCA work? To calculate the eigenvalues and eigenvectors, we start from the definition:

$$Av = \lambda v \quad \text{(definition)}$$
$$\Rightarrow Av - \lambda v = 0$$
$$\Rightarrow Av - \lambda I v = 0$$
$$\Rightarrow (A - \lambda I)v = 0 \quad \text{(Equation 1)}$$

Since an eigenvector $v$ must be nonzero, $(A - \lambda I)$ must be singular, so:

$$\det(A - \lambda I) = 0 \quad \text{(Equation 2)}$$
How does PCA work? For example, take:

$$A = \begin{bmatrix} 7 & 3 \\ 3 & -1 \end{bmatrix}
\Rightarrow \lambda I = \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix}
\Rightarrow A - \lambda I = \begin{bmatrix} 7-\lambda & 3 \\ 3 & -1-\lambda \end{bmatrix}$$

$$\Rightarrow \det(A - \lambda I) = (7-\lambda)(-1-\lambda) - 9 = \lambda^2 - 6\lambda - 16 = 0 \quad \text{(Equation 2)}$$
$$\Rightarrow \lambda = 8 \text{ and } \lambda = -2$$
How does PCA work? With $\lambda = 8$:

$$(A - \lambda I)v = \begin{bmatrix} 7-8 & 3 \\ 3 & -1-8 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} -1 & 3 \\ 3 & -9 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \text{(Equation 1)}$$
$$\Rightarrow v_1 = 3v_2 \Rightarrow v_2 = 1,\; v_1 = 3$$

With $\lambda = -2$:

$$(A - \lambda I)v = \begin{bmatrix} 9 & 3 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \Rightarrow v_2 = -3v_1 \Rightarrow v_1 = 1,\; v_2 = -3$$
How does PCA work? Finally, we validate the result using the definition $AP = PD$:

$$\begin{bmatrix} 7 & 3 \\ 3 & -1 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 1 & -3 \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 1 & -3 \end{bmatrix} \begin{bmatrix} 8 & 0 \\ 0 & -2 \end{bmatrix}
\Leftrightarrow \begin{bmatrix} 24 & -2 \\ 8 & 6 \end{bmatrix} = \begin{bmatrix} 24 & -2 \\ 8 & 6 \end{bmatrix} \quad \text{(true)}$$

$$\Rightarrow P = \begin{bmatrix} 3 & 1 \\ 1 & -3 \end{bmatrix}, \quad D = \begin{bmatrix} 8 & 0 \\ 0 & -2 \end{bmatrix}$$
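This worked example can also be checked numerically. A minimal sketch with NumPy (variable names are ours; note that NumPy returns unit-length eigenvectors, which are scalar multiples of the hand-derived ones):

```python
import numpy as np

# The 2x2 example matrix from the slides.
A = np.array([[7.0, 3.0], [3.0, -1.0]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)   # 8 and -2 (the order is not guaranteed)

# Validate A P = P D with the hand-derived, unnormalized P and D.
P = np.array([[3.0, 1.0], [1.0, -3.0]])
D = np.diag([8.0, -2.0])
print(np.allclose(A @ P, P @ D))   # True
```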
How does PCA work? There are numerous libraries that can help us calculate eigenvalues and eigenvectors automatically (see the sketch below). The eigenvalues on the diagonal of D are associated with the corresponding columns in P — that is, the first element of D is λ₁ and the corresponding eigenvector is the first column of P. This holds for all elements in D and their corresponding eigenvectors in P. Take the eigenvalues λ₁, λ₂, …, λp and sort them from largest to smallest. In doing so, sort the eigenvectors in P accordingly.
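For instance, a sketch with NumPy, continuing from the earlier snippets (np.linalg.eigh handles symmetric matrices such as ZᵀZ and returns eigenvalues in ascending order, so we reverse them):

```python
# Eigendecompose A = Z^T Z, then sort eigenpairs from largest to smallest.
A = Z.T @ Z
eigvals, eigvecs = np.linalg.eigh(A)   # ascending eigenvalues for symmetric A
order = np.argsort(eigvals)[::-1]      # indices of eigenvalues, descending
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]            # keep eigenvector columns aligned with D
```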
How does PCA work? From the Iris example:

$$Z^T Z = A = \begin{bmatrix} 15.4 & -2.3 & 13.7 & 12.6 \\ -2.3 & 15.0 & -5.8 & -5.2 \\ 13.7 & -5.8 & 15.6 & 15.1 \\ 12.6 & -5.2 & 15.1 & 15.1 \end{bmatrix}$$

$$\Rightarrow D = \begin{bmatrix} 45 & 0 & 0 & 0 \\ 0 & 13.5 & 0 & 0 \\ 0 & 0 & 2.5 & 0 \\ 0 & 0 & 0 & 0.1 \end{bmatrix}, \quad P = \begin{bmatrix} 0.9 & 4.7 & -1.3 & 0.3 \\ -0.4 & 14.0 & 0.4 & -0.1 \\ 1.0 & 1.0 & 0.4 & 1.0 \\ 0.8 & 1.0 & -1.3 & 1.0 \end{bmatrix}$$
How does PCA work? Calculate $Z^* = ZP^*$, where $P^*$ is $P$ with only the columns corresponding to the largest eigenvalues kept (here, the first two). The new matrix $Z^*$ is a standardized version of $X$, but now each observation is a combination of the original variables, where the weights are determined by the eigenvectors:

$$P = \begin{bmatrix} 0.9 & 4.7 & -1.3 & 0.3 \\ -0.4 & 14.0 & 0.4 & -0.1 \\ 1.0 & 1.0 & 0.4 & 1.0 \\ 0.8 & 1.0 & -1.3 & 1.0 \end{bmatrix} \Rightarrow P^* = \begin{bmatrix} 0.9 & 4.7 \\ -0.4 & 14.0 \\ 1.0 & 1.0 \\ 0.8 & 1.0 \end{bmatrix}$$
How does PCA work? For the first sample (the first row of Z):

$$Z_1 = \begin{bmatrix} -1.0 & 1.4 & -1.4 & -1.3 \end{bmatrix}, \quad P^* = \begin{bmatrix} 0.9 & 4.7 \\ -0.4 & 14.0 \\ 1.0 & 1.0 \\ 0.8 & 1.0 \end{bmatrix} \Rightarrow Z_1^* = Z_1 P^* = \begin{bmatrix} -4.2 & 12.5 \end{bmatrix}$$

We can do the same with all remaining samples to get their new attribute values.
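The projection step as a sketch, continuing from the sorted decomposition above (NumPy's eigenvectors are unit length, so the resulting coordinates differ from the slide's unnormalized numbers by a per-column scale factor):

```python
# Keep the eigenvectors of the k largest eigenvalues and project Z onto them.
k = 2
P_star = eigvecs[:, :k]   # first k columns of the sorted P
Z_star = Z @ P_star       # each row: one sample in the new k-dimensional space
```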
How does PCA work? In some cases, we need to determine how many features to keep and how many to drop. There are two common methods (the second is sketched below):
1. Arbitrarily select how many dimensions to keep.
2. Calculate the proportion of variance explained by each feature, pick a threshold, and add features until that threshold is reached. The proportion of variance explained by the kept features is the sum of their eigenvalues divided by the sum of the eigenvalues of all features.
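A sketch of the second method, continuing from the sorted eigenvalues above (the 95% threshold is just an example choice):

```python
# Proportion of variance explained per component, then the cumulative share.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)

# Smallest k whose components together explain at least 95% of the variance.
k = int(np.searchsorted(cumulative, 0.95)) + 1
# With eigenvalues (45, 13.5, 2.5, 0.1), the first two components already
# explain (45 + 13.5) / 61.1 ≈ 96% of the variance, so k = 2.
```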
THANK YOU FOR LISTENING