Dimension Reduction via PCA (Principal Component Analysis)

Outline: Motivation; Derivation/calculation of PCA; PCA in practice; Things to note; Examples in R and Python

Motivation In “big data” discussions we often focus on how to handle a large n (number of rows). However, there are often also issues with a large p (number of variables/parameters). Additionally, variables are often correlated, which makes some of them partly redundant; e.g., web traffic data has both visits and page views, which are often correlated. We would like a way to scale down p so that we can analyze a smaller set of uncorrelated variables. We can then use this smaller data set in other algorithms such as linear regression, clustering, etc.

Geometric Introduction We begin with a sample dataset of 20 observations. We have two dimensions, or variables, called x and y, and x and y are correlated. We’d like to somehow project the data onto one dimension in an intelligent way. Note: y is not a dependent variable here. Think of x and y as something like height and weight, where we want to predict basketball performance.

Geometric Introduction Projecting the raw data onto the x-axis

Geometric Introduction Projecting the raw data onto the y-axis

Geometric Introduction We’d like to rotate and stretch our data with a linear combination so that the resulting variables are uncorrelated and “stretched” out as much as possible. Rotating the data gives the new points shown in red; the red points have more separation among them.

Geometric Introduction Our new X and Y are now uncorrelated.

Geometric Introduction Now if we project the rotated data onto one dimension, we still retain a lot of the variation among our points. Doing a similar projection onto one dimension with the raw data does not preserve as much variation.
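To make this geometric point concrete, here is a rough numerical sketch (not from the original slides; the simulated data, seed, and variable names are invented for illustration). It generates correlated two-dimensional data, projects it onto the x-axis and onto the first principal direction, and compares how much variance each projection keeps.

import numpy as np

# Simulated, illustrative data: y is correlated with x (assumption, not the slide's dataset).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.4, size=200)
data = np.column_stack([x, y])
data = data - data.mean(axis=0)          # center the columns

# Variance kept by projecting onto the x-axis (just the first coordinate).
var_x_axis = data[:, 0].var()

# First principal direction: eigenvector of the covariance matrix with the largest eigenvalue.
cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
first_pc = eigvecs[:, np.argmax(eigvals)]
var_first_pc = (data @ first_pc).var()

print(var_x_axis, var_first_pc)          # the principal-direction projection keeps more variance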

Algebraic motivation/explanation We want to find a linear combination, defined by a vector $\mathbf{a}$, that maximizes the variation of $\mathbf{x}$:

Maximize $\mathrm{var}(\mathbf{a}'\mathbf{x}) = \mathbf{a}'\mathbf{S}\mathbf{a}$, where $\mathbf{S}$ is the covariance matrix of $\mathbf{x}$.

Since $\mathbf{a}'\mathbf{S}\mathbf{a}$ can be made arbitrarily large by scaling $\mathbf{a}$, we normalize by $\mathbf{a}'\mathbf{a}$ and maximize
$$\lambda = \frac{\mathbf{a}'\mathbf{S}\mathbf{a}}{\mathbf{a}'\mathbf{a}}$$
Rearranging gives $\lambda\,\mathbf{a}'\mathbf{a} = \mathbf{a}'\mathbf{S}\mathbf{a}$, i.e. $\mathbf{a}'\mathbf{S}\mathbf{a} - \lambda\,\mathbf{a}'\mathbf{a} = 0$. Differentiating with respect to $\mathbf{a}$ and setting the derivative to zero yields
$$\mathbf{S}\mathbf{a} - \lambda\mathbf{a} = \mathbf{0} \quad\Longrightarrow\quad (\mathbf{S} - \lambda\mathbf{I})\,\mathbf{a} = \mathbf{0},$$
which implies $\mathbf{a}$ is an eigenvector of $\mathbf{S}$ with eigenvalue $\lambda$.
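As a minimal sketch of this result (the matrix and seed below are illustrative assumptions, not from the slides): the direction that maximizes the variance is the eigenvector of the sample covariance matrix with the largest eigenvalue, and the variance along that direction equals the eigenvalue itself.

import numpy as np

# Illustrative correlated data (assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) @ np.array([[1.0, 0.5, 0.2],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 1.0]])
X = X - X.mean(axis=0)

S = np.cov(X, rowvar=False)              # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)     # solves S a = lambda a
order = np.argsort(eigvals)[::-1]        # sort eigenvalues from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

a1 = eigvecs[:, 0]                       # first principal direction, with a1'a1 = 1
print(np.var(X @ a1, ddof=1), eigvals[0])   # variance along a1 matches lambda_1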

Algebraic motivation/explanation (cont) For each subsequent component $k = 2, \dots, p$, we continue to maximize the variation subject to $\mathbf{a}_k$ being orthogonal to the previous directions $\mathbf{a}_1, \dots, \mathbf{a}_{k-1}$:
$$y_{i1} = \mathbf{a}_1'\mathbf{x}_i, \quad y_{i2} = \mathbf{a}_2'\mathbf{x}_i, \quad \dots, \quad y_{ip} = \mathbf{a}_p'\mathbf{x}_i$$
This gives us $Y_{n \times p} = X_{n \times p}\,A_{p \times p}$, where the columns of $Y$ are the transformed variables, or “scores”. We can use just a few columns of $Y$ to approximate $X$.

Algebraic motivation/explanation (cont) If we take a subset of the columns of $A$, we can work in a lower dimension, e.g. keeping only the first column:
$$Y_{n \times 1} = X_{n \times p}\,A_{p \times 1}$$
And since $A$ is orthogonal, $A' = A^{-1}$, so the retained scores reconstruct an approximation of the original data:
$$Y_{n \times 1}\,A'_{1 \times p} \approx X_{n \times p}$$
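A short sketch of this approximation (the data below is simulated for illustration): keep k < p columns of A, form the scores Y = X A_k, and reconstruct an approximation of X as Y A_k'.

import numpy as np

# Illustrative data with one nearly redundant column (assumption).
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=50)
X = X - X.mean(axis=0)

S = np.cov(X, rowvar=False)
eigvals, A = np.linalg.eigh(S)
A = A[:, np.argsort(eigvals)[::-1]]      # columns of A ordered by decreasing eigenvalue

k = 2
A_k = A[:, :k]                           # p x k subset of the eigenvectors
Y = X @ A_k                              # n x k scores
X_hat = Y @ A_k.T                        # n x p rank-k approximation of X
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))   # relative reconstruction error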

PCA in Practice Run the principal component analysis algorithm, then determine the number of components to use. Three common ways to do this:
1. Retain enough components to account for k% of the total variability (where k is 80 or 90 or ...). The variability accounted for by the $i$th component is $\lambda_i / \sum_i \lambda_i$.
2. Retain components whose variance (eigenvalue) is greater than the average eigenvalue, $\sum_i \lambda_i / p$.
3. Use a scree plot to find the natural break between “large/important” and “small/unimportant” components.
Note: none of these three methods has a theoretical justification, so use whichever makes the most sense for your data. Finally, use the “scores” with the chosen number of dimensions in your new algorithm. A code sketch of the first two rules follows below.
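Here is a hedged sketch of the first two rules using scikit-learn (the dataset, the 80% threshold, and the variable names are assumptions, not part of the slides):

import numpy as np
from sklearn.decomposition import PCA

# Illustrative data with correlated columns (assumption).
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
X[:, 3:] = X[:, :3] + 0.2 * rng.normal(size=(200, 3))

pca = PCA().fit(X)
explained = pca.explained_variance_ratio_        # lambda_i / sum of lambda_i, per component

# Rule 1: keep enough components to reach ~80% of total variability.
k_80pct = int(np.argmax(np.cumsum(explained) >= 0.80)) + 1

# Rule 2: keep components whose eigenvalue exceeds the average eigenvalue.
k_avg = int(np.sum(pca.explained_variance_ > pca.explained_variance_.mean()))

print(k_80pct, k_avg)    # the two rules can suggest different numbers of components
# For a scree plot (rule 3), plot `explained` against the component index and look for the elbow.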

Things to Note PCA is NOT scale invariant! Changing the units of a column will change the results of PCA, and variables with high variance will influence the principal components more. A good practice is to center and scale your variables (prcomp in R does not scale automatically). Because PCA is built on orthogonal eigenvectors, the sign of each vector is arbitrary: you could run PCA on the same data set twice and get all the same numbers except with the signs flipped. This doesn’t really change anything, but it is good to know. One downside of PCA is that we usually lose the interpretability of the original variables.
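A small sketch of the scaling point (the data and scales below are invented for illustration; this mirrors standardizing before prcomp in R, but is written in Python):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Two illustrative columns on wildly different scales (assumption).
rng = np.random.default_rng(4)
X = np.column_stack([rng.normal(scale=1.0, size=100),
                     rng.normal(scale=1000.0, size=100)])

raw = PCA().fit(X)
scaled = PCA().fit(StandardScaler().fit_transform(X))

print(raw.explained_variance_ratio_)     # the large-scale column dominates the first component
print(scaled.explained_variance_ratio_)  # after centering and scaling, variance is shared more evenly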

Examples See the accompanying code for worked examples in R and Python.