Principal Component Analysis (PCA)

Presentation transcript:

Principal Component Analysis (PCA) Group F: Minh Bao Nguyen-Khoa – Eldor Ibragimov – Huynjun woo

What is PCA? Principal component analysis (PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. It converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. It is often used to reduce the dimensionality of the data and make it easier to explore and visualize.
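As a quick illustration of that idea (a minimal sketch, assuming numpy, scikit-learn, and matplotlib are available; load_iris, StandardScaler, and PCA are scikit-learn's stock tools, not something defined in this presentation), the full Iris dataset can be compressed from four measurements to two principal components for plotting:

```python
# A minimal sketch: standardize the full Iris data, reduce it from
# 4 features to 2 principal components, and plot the result.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X, y = iris.data, iris.target                # 150 samples, 4 features

Z = StandardScaler().fit_transform(X)        # mean 0, standard deviation 1
Z_star = PCA(n_components=2).fit_transform(Z)

plt.scatter(Z_star[:, 0], Z_star[:, 1], c=y)
plt.xlabel("Principal component 1")
plt.ylabel("Principal component 2")
plt.show()
```

The remaining slides walk through the same steps by hand on a small sample of this data.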

What is PCA? A sample of the Iris dataset (15 observations, four measurements each) that will be used throughout this presentation:

ID   Sepal Length   Sepal Width   Petal Length   Petal Width   Species
1    5.1            3.5           1.4            0.2           Iris-setosa
2    4.9            3.0           1.4            0.2           Iris-setosa
3    4.7            3.2           1.3            0.2           Iris-setosa
4    4.6            3.1           1.5            0.2           Iris-setosa
5    5.0            3.6           1.4            0.2           Iris-setosa
6    7.0            3.2           4.7            1.4           Iris-versicolor
7    6.4            3.2           4.5            1.5           Iris-versicolor
8    6.9            3.1           4.9            1.5           Iris-versicolor
9    5.5            2.3           4.0            1.3           Iris-versicolor
10   6.5            2.8           4.6            1.5           Iris-versicolor
11   6.3            3.3           6.0            2.5           Iris-virginica
12   5.8            2.7           5.1            1.9           Iris-virginica
13   7.1            3.0           5.9            2.1           Iris-virginica
14   6.3            2.9           5.6            1.8           Iris-virginica
15   6.5            3.0           5.8            2.2           Iris-virginica

Why do we need PCA? Real-world data often has numerous features, many of which can be redundant. Working with all of those features can cause several problems: models overfit, and computation becomes time-consuming. To deal with these problems, we can apply dimensionality reduction to the data. There are two types of dimensionality reduction: feature elimination and feature extraction; PCA is a feature-extraction method.

When should we use PCA? To decide whether to use PCA, we can ask the following three questions: Do you want to reduce the number of variables, but aren't able to identify variables to remove from consideration entirely? Do you want to ensure your variables are independent of one another? Are you comfortable making your independent variables less interpretable? If you answered "yes" to all three questions, then PCA is a good method to use.

How does PCA work? Let's define our data as X and Y, where X is the matrix of feature values from the table above (one row per observation) and Y is the vector of species labels.

X = \begin{bmatrix}
5.1 & 3.5 & 1.4 & 0.2 \\
4.9 & 3.0 & 1.4 & 0.2 \\
4.7 & 3.2 & 1.3 & 0.2 \\
4.6 & 3.1 & 1.5 & 0.2 \\
5.0 & 3.6 & 1.4 & 0.2 \\
7.0 & 3.2 & 4.7 & 1.4 \\
6.4 & 3.2 & 4.5 & 1.5 \\
6.9 & 3.1 & 4.9 & 1.5 \\
5.5 & 2.3 & 4.0 & 1.3 \\
6.5 & 2.8 & 4.6 & 1.5 \\
6.3 & 3.3 & 6.0 & 2.5 \\
5.8 & 2.7 & 5.1 & 1.9 \\
7.1 & 3.0 & 5.9 & 2.1 \\
6.3 & 2.9 & 5.6 & 1.8 \\
6.5 & 3.0 & 5.8 & 2.2
\end{bmatrix}
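As a sketch (assuming numpy; the values are simply the 15 rows of the table above, hard-coded for illustration), X and Y could be built as arrays like this:

```python
import numpy as np

# X: the 15 observations from the table above, one row per sample,
# columns = sepal length, sepal width, petal length, petal width.
X = np.array([
    [5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2], [5.0, 3.6, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4], [6.4, 3.2, 4.5, 1.5], [6.9, 3.1, 4.9, 1.5],
    [5.5, 2.3, 4.0, 1.3], [6.5, 2.8, 4.6, 1.5],
    [6.3, 3.3, 6.0, 2.5], [5.8, 2.7, 5.1, 1.9], [7.1, 3.0, 5.9, 2.1],
    [6.3, 2.9, 5.6, 1.8], [6.5, 3.0, 5.8, 2.2],
])

# Y: the species label for each row (5 samples per species).
Y = np.array(["Iris-setosa"] * 5 + ["Iris-versicolor"] * 5 + ["Iris-virginica"] * 5)
```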

How does PCA work? Next, we standardize X to create a new matrix Z: subtract the mean of each column from every entry in that column, then divide each entry by that column's standard deviation.

Z = \begin{bmatrix}
-1.0 &  1.4 & -1.4 & -1.3 \\
-1.2 & -0.2 & -1.4 & -1.3 \\
-1.4 &  0.5 & -1.4 & -1.3 \\
-1.6 &  0.1 & -1.3 & -1.3 \\
-1.1 &  1.7 & -1.4 & -1.3 \\
 1.3 &  0.5 &  0.5 &  0.2 \\
 0.6 &  0.5 &  0.3 &  0.3 \\
 1.2 &  0.1 &  0.6 &  0.3 \\
-0.5 & -2.5 &  0.1 &  0.1 \\
 0.7 & -0.8 &  0.4 &  0.3 \\
 0.5 &  0.8 &  1.2 &  1.6 \\
-0.1 & -1.2 &  0.7 &  0.8 \\
 1.4 & -0.2 &  1.1 &  1.1 \\
 0.5 & -0.5 &  0.9 &  0.7 \\
 0.7 & -0.2 &  1.1 &  1.2
\end{bmatrix}
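A minimal sketch of this standardization step (assuming numpy and scikit-learn; the row indices pick out the same 15 samples as the table, and np.std defaults to the population standard deviation):

```python
import numpy as np
from sklearn.datasets import load_iris

# The same 15 samples as the table: the first five of each species.
X = load_iris().data[[0, 1, 2, 3, 4, 50, 51, 52, 53, 54,
                      100, 101, 102, 103, 104]]

# Standardize: subtract each column's mean, then divide by that
# column's standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.round(Z, 1))   # agrees with the Z matrix above up to rounding
```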

How does PCA work? Take the matrix Z, transpose it, and multiply the transposed matrix by Z. Then calculate the eigenvectors and corresponding eigenvalues of ZᵀZ. We can decompose ZᵀZ into PDP⁻¹, where P is the matrix of eigenvectors v and D is the diagonal matrix with the eigenvalues λ on the diagonal and zeros everywhere else.

Z^T Z = A = P D P^{-1}

How does PCA work? To calculate the eigenvalues and eigenvectors, start from the definition:

A v = \lambda v (definition)
\Rightarrow A v - \lambda v = 0
\Rightarrow A v - \lambda I v = 0
\Rightarrow (A - \lambda I) v = 0 (Equation 1)

Equation 1 has a nonzero solution v only when A - \lambda I is singular, which gives

\det(A - \lambda I) = 0 (Equation 2)

How does PCA work? For example, take:

A = \begin{bmatrix} 7 & 3 \\ 3 & -1 \end{bmatrix}

\lambda I = \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix}

A - \lambda I = \begin{bmatrix} 7 - \lambda & 3 \\ 3 & -1 - \lambda \end{bmatrix}

\det(A - \lambda I) = (7 - \lambda)(-1 - \lambda) - 3 \cdot 3 = \lambda^2 - 6\lambda - 16 = 0 (Equation 2)

\Rightarrow \lambda = 8 and \lambda = -2

How does PCA work? With λ = 8:

(A - \lambda I) v = \begin{bmatrix} 7 - 8 & 3 \\ 3 & -1 - 8 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} -1 & 3 \\ 3 & -9 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} (Equation 1)

\Rightarrow v_1 = 3 v_2, so we can take v_2 = 1, v_1 = 3.

With λ = -2, the same steps give v_2 = -3, v_1 = 1.

How does PCA work? Finally, we validate that AP = PD (definition):

\begin{bmatrix} 7 & 3 \\ 3 & -1 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 1 & -3 \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 1 & -3 \end{bmatrix} \begin{bmatrix} 8 & 0 \\ 0 & -2 \end{bmatrix}
\Leftrightarrow \begin{bmatrix} 24 & -2 \\ 8 & 6 \end{bmatrix} = \begin{bmatrix} 24 & -2 \\ 8 & 6 \end{bmatrix} (true)

\Rightarrow P = \begin{bmatrix} 3 & 1 \\ 1 & -3 \end{bmatrix}, \quad D = \begin{bmatrix} 8 & 0 \\ 0 & -2 \end{bmatrix}
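The same small example can be checked numerically (a minimal sketch using numpy's eig; the eigenvectors numpy returns are unit length, so they are scaled versions of (3, 1) and (1, -3)):

```python
import numpy as np

A = np.array([[7.0, 3.0],
              [3.0, -1.0]])

# np.linalg.eig returns the eigenvalues and unit-length eigenvectors
# (one eigenvector per column); here the eigenvalues are 8 and -2.
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)
print(eigenvectors)   # columns proportional to (3, 1) and (1, -3)

# Validate A P = P D with the unnormalized eigenvectors from the slide.
P = np.array([[3.0, 1.0],
              [1.0, -3.0]])
D = np.diag([8.0, -2.0])
print(np.allclose(A @ P, P @ D))   # True
```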

How does PCA work? There are numerous libraries that can help us calculate eigenvalues and eigenvectors automatically. The eigenvalues on the diagonal of D are associated with the corresponding columns of P: the first element of D is λ₁ and its eigenvector is the first column of P, and the same holds for every element of D and its eigenvector in P. Take the eigenvalues λ₁, λ₂, …, λp, sort them from largest to smallest, and sort the eigenvector columns of P in the same order.
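For instance, with numpy (a minimal sketch reusing the 15-sample Z from above; eigh is used because Z^T Z is symmetric, and since it returns eigenvalues in ascending order they are flipped to sort largest-first):

```python
import numpy as np
from sklearn.datasets import load_iris

# Rebuild Z for the same 15 samples used above.
X = load_iris().data[[0, 1, 2, 3, 4, 50, 51, 52, 53, 54,
                      100, 101, 102, 103, 104]]
Z = (X - X.mean(axis=0)) / X.std(axis=0)

A = Z.T @ Z

# eigh handles symmetric matrices and returns eigenvalues in ascending
# order; re-sort them (and the matching eigenvector columns) largest-first.
eigenvalues, eigenvectors = np.linalg.eigh(A)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Comparable to the diagonal of D on the next slide; small differences
# come from the slide's one-decimal rounding of Z.
print(np.round(eigenvalues, 1))
```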

How does PCA work? From the Iris example:

Z^T Z = A = \begin{bmatrix}
15.4 & -2.3 & 13.7 & 12.6 \\
-2.3 & 15.0 & -5.8 & -5.2 \\
13.7 & -5.8 & 15.6 & 15.1 \\
12.6 & -5.2 & 15.1 & 15.1
\end{bmatrix}

\Rightarrow D = \begin{bmatrix}
45 & 0 & 0 & 0 \\
0 & 13.5 & 0 & 0 \\
0 & 0 & 2.5 & 0 \\
0 & 0 & 0 & 0.1
\end{bmatrix}, \quad
P = \begin{bmatrix}
0.9 & 4.7 & -1.3 & 0.3 \\
-0.4 & 14.0 & 0.4 & -0.1 \\
1.0 & 1.0 & 0.4 & 1.0 \\
0.8 & 1.0 & -1.3 & 1.0
\end{bmatrix}

How does PCA work? Calculate Z* = ZP*. This new matrix, Z*, is a standardized version of X in which each observation is a linear combination of the original variables, with weights given by the eigenvectors. P* keeps only the first columns of P, here the two with the largest eigenvalues:

P = \begin{bmatrix}
0.9 & 4.7 & -1.3 & 0.3 \\
-0.4 & 14.0 & 0.4 & -0.1 \\
1.0 & 1.0 & 0.4 & 1.0 \\
0.8 & 1.0 & -1.3 & 1.0
\end{bmatrix}
\Rightarrow P^* = \begin{bmatrix}
0.9 & 4.7 \\
-0.4 & 14.0 \\
1.0 & 1.0 \\
0.8 & 1.0
\end{bmatrix}
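A sketch of this projection step with numpy (reusing the eigendecomposition above; note that numpy's eigenvectors are unit length and sign-ambiguous, so the resulting coordinates differ from the slide's unnormalized P* by a per-column scale):

```python
import numpy as np
from sklearn.datasets import load_iris

# Same Z and eigendecomposition as in the earlier sketches.
X = load_iris().data[[0, 1, 2, 3, 4, 50, 51, 52, 53, 54,
                      100, 101, 102, 103, 104]]
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(Z.T @ Z)
order = np.argsort(eigenvalues)[::-1]
eigenvectors = eigenvectors[:, order]

# P* keeps only the first k eigenvector columns; Z* = Z P* gives each
# sample's coordinates along the top k principal components.
k = 2
P_star = eigenvectors[:, :k]
Z_star = Z @ P_star
print(Z_star.shape)             # (15, 2)
print(np.round(Z_star[0], 2))   # first sample in the new coordinates
```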

How does PCA work? For the first sample:

Z_1 = \begin{bmatrix} -1.0 & 1.4 & -1.4 & -1.3 \end{bmatrix}, \quad
P^* = \begin{bmatrix} 0.9 & 4.7 \\ -0.4 & 14.0 \\ 1.0 & 1.0 \\ 0.8 & 1.0 \end{bmatrix}
\Rightarrow Z_1^* = Z_1 P^* = \begin{bmatrix} -4.2 & 12.5 \end{bmatrix}

We can do the same with all remaining samples to get the new attribute values.

How does PCA work? In some cases we need to decide how many features to keep and how many to drop. There are two common ways to determine this: arbitrarily choose how many dimensions to keep, or calculate the proportion of variance explained by each feature, pick a threshold, and add features until that threshold is reached (see the sketch below). The proportion of variance explained by the kept features is the sum of their eigenvalues divided by the sum of the eigenvalues of all features.
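A sketch of the threshold approach (assuming numpy and scikit-learn, reusing the 15-sample example; the 95% threshold is only an illustration):

```python
import numpy as np
from sklearn.datasets import load_iris

# Eigenvalues of Z^T Z for the 15-sample example, sorted largest-first.
X = load_iris().data[[0, 1, 2, 3, 4, 50, 51, 52, 53, 54,
                      100, 101, 102, 103, 104]]
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigenvalues = np.sort(np.linalg.eigvalsh(Z.T @ Z))[::-1]

# Proportion of variance explained by each component and its running total.
explained = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained)
print(np.round(explained, 3))
print(np.round(cumulative, 3))

# Keep the smallest number of components whose cumulative share reaches
# a chosen threshold, e.g. 95%.
threshold = 0.95
k = int(np.searchsorted(cumulative, threshold) + 1)
print(k)
```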

THANK YOU FOR LISTENING