Principal Component Analysis

Philosophy of PCA
Introduced by Pearson (1901) and Hotelling (1933) to describe the variation in a set of multivariate data in terms of a set of uncorrelated variables.
We typically have a data matrix of n observations on p correlated variables x_1, x_2, ..., x_p.
PCA looks for a transformation of the x_i into p new variables y_i that are uncorrelated.

The data matrix
Rows: cases 1, 2, ..., n.
Columns: the measured variables ht (x_1), wt (x_2), age (x_3), sbp (x_4), heart rate (x_5).

Reduce dimension
The simplest way is to keep one variable and discard all others: not reasonable!
Weight all variables equally: not reasonable (unless they have the same variance).
Weighted average based on some criterion. Which criterion?

Let us write it first
We look for a transformation of the data matrix X (n x p) such that
Y = a^T X = a_1 X_1 + a_2 X_2 + ... + a_p X_p,
where a = (a_1, a_2, ..., a_p)^T is a column vector of weights with a_1² + a_2² + ... + a_p² = 1.

One good criterion
Maximize the variance of the projection of the observations on the Y variable.
Find a so that Var(a^T X) = a^T Var(X) a is maximal.
The matrix C = Var(X) is the covariance matrix of the X_i variables.
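A minimal numerical sketch of this criterion (Python/NumPy; the synthetic two-variable data and the candidate direction a are assumptions for illustration): the sample variance of the projection a^T X equals a^T C a.

```python
# Minimal sketch: variance of a projection equals a^T C a (synthetic data assumed).
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[1.0, 0.8], [0.8, 1.0]], size=500)  # n x p

a = np.array([1.0, 1.0])
a = a / np.linalg.norm(a)        # unit-norm weight vector, a_1^2 + a_2^2 = 1

C = np.cov(X, rowvar=False)      # p x p covariance matrix of the variables
proj = X @ a                     # projection of each observation on direction a

print(proj.var(ddof=1))          # sample variance of the projection...
print(a @ C @ a)                 # ...equals a^T C a
```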

Let us see it on a figure
[Figure: the same data projected onto two candidate directions, labeled "Good" and "Better".]

Covariance matrix
C = Var(X) is the p x p matrix whose diagonal entries are the variances Var(X_i) and whose off-diagonal entries are the covariances Cov(X_i, X_j).

And so...
We find that the direction of a is given by the eigenvector A_1 corresponding to the largest eigenvalue of the matrix C.
The second vector, orthogonal (uncorrelated) to the first, is the one with the second highest variance, which turns out to be the eigenvector corresponding to the second eigenvalue.
And so on ...

So PCA gives
New variables Y_i that are linear combinations of the original variables x_i:
Y_i = a_i1 x_1 + a_i2 x_2 + ... + a_ip x_p, for i = 1..p.
The new variables Y_i are derived in decreasing order of importance; they are called 'principal components'.
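A from-scratch sketch of these steps in Python/NumPy on assumed synthetic data: center the variables, form the covariance matrix, eigen-decompose it, and project. The columns of Y below are the principal components, uncorrelated and ordered by decreasing variance.

```python
# PCA from scratch on hypothetical data X (n observations x p variables).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X[:, 2] = 0.9 * X[:, 0] + 0.1 * X[:, 2]      # introduce some correlation

Xc = X - X.mean(axis=0)                      # center each variable
C = np.cov(Xc, rowvar=False)                 # covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)         # symmetric matrix, ascending eigenvalues
order = np.argsort(eigvals)[::-1]            # reorder by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Y = Xc @ eigvecs                             # principal components Y_1, ..., Y_p (columns)
print(np.round(np.cov(Y, rowvar=False), 3))  # ~diagonal: PCs are uncorrelated, with
                                             # variances equal to the eigenvalues
```

In practice one would usually call a library routine such as sklearn.decomposition.PCA, which computes the same quantities (via an SVD of the centered data).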

Calculating eigenvalues and eigenvectors
The eigenvalues λ_i are found by solving the equation det(C - λI) = 0.
The eigenvectors are the columns of the matrix A such that C = A D A^T, where D = diag(λ_1, ..., λ_p) is the diagonal matrix of the eigenvalues.
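As a small check (the symmetric matrix below is an arbitrary illustration), np.linalg.eigh returns exactly these quantities: the eigenvalues (the diagonal of D) and the eigenvector matrix A, with C = A D A^T and det(C - λI) = 0 for each eigenvalue.

```python
# Sketch: eigen-decomposition of a symmetric (covariance-like) matrix.
import numpy as np

C = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.0, 0.4],
              [0.3, 0.4, 1.5]])              # assumed example covariance matrix

eigvals, A = np.linalg.eigh(C)               # eigenvalues and eigenvectors (columns of A)
D = np.diag(eigvals)

print(np.allclose(A @ D @ A.T, C))                                   # True: C = A D A^T
print(np.allclose(np.linalg.det(C - eigvals[0] * np.eye(3)), 0.0))   # det(C - lambda I) = 0
```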

An example
Let us take two variables, each with variance 1, and covariance c > 0:
C = [[1, c], [c, 1]],  C - λI = [[1-λ, c], [c, 1-λ]],  det(C - λI) = (1-λ)² - c².
Solving this we find λ_1 = 1 + c and λ_2 = 1 - c < λ_1.

and eigenvectors
Any eigenvector A satisfies the condition C A = λ A.
Solving C A = λ A for the two eigenvalues, we find
A_1 = (1/√2) (1, 1)^T for λ_1 = 1 + c, and A_2 = (1/√2) (1, -1)^T for λ_2 = 1 - c.
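A quick numerical check of this worked example (the value of c is arbitrary):

```python
# Verify: for C = [[1, c], [c, 1]] the eigenvalues are 1 + c and 1 - c,
# with eigenvectors proportional to (1, 1) and (1, -1).
import numpy as np

c = 0.6                                      # any covariance 0 < c < 1
C = np.array([[1.0, c], [c, 1.0]])

eigvals, eigvecs = np.linalg.eigh(C)         # ascending order: [1 - c, 1 + c]
print(eigvals)                               # [0.4, 1.6] for c = 0.6
print(eigvecs)                               # columns proportional to (1, -1) and (1, 1)
```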

PCA is sensitive to scale
If you multiply one variable by a scalar you get different results (can you show it?).
This is because PCA uses the covariance matrix (and not the correlation matrix).
PCA should be applied to data that have approximately the same scale in each variable.
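A small demonstration of the scale problem on assumed synthetic data: re-expressing one variable in different units (multiplying it by a scalar) changes the direction of the first principal component.

```python
# Sketch: PCA on raw vs. rescaled data gives different leading directions.
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=1000)

def first_pc(data):
    vals, vecs = np.linalg.eigh(np.cov(data, rowvar=False))
    return vecs[:, -1]                       # eigenvector of the largest eigenvalue

X_scaled = X.copy()
X_scaled[:, 0] *= 100.0                      # variable 1 expressed in different units

print(first_pc(X))                           # roughly +/-(0.71, 0.71)
print(first_pc(X_scaled))                    # almost entirely along variable 1
```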

Interpretation of PCA
The new variables (PCs) have a variance equal to their corresponding eigenvalue: Var(Y_i) = λ_i for all i = 1...p.
A small λ_i means a small variance, i.e. the data change little in the direction of component Y_i.
The relative variance explained by each PC is given by λ_i / Σ_j λ_j.

How many components to keep?
Enough PCs to have a cumulative variance explained by the PCs that is >50-70%.
Kaiser criterion: keep PCs with eigenvalues > 1.
Scree plot: represents the ability of the PCs to explain the variation in the data.
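A small sketch of these criteria for an assumed set of eigenvalues (the numbers are purely illustrative):

```python
# Variance explained, cumulative variance, and the Kaiser criterion.
import numpy as np

eigvals = np.array([2.9, 1.2, 0.5, 0.3, 0.1])   # assumed eigenvalues, decreasing order

explained = eigvals / eigvals.sum()              # relative variance of each PC
cumulative = np.cumsum(explained)

print(np.round(explained, 2))                    # [0.58 0.24 0.1  0.06 0.02]
print(np.round(cumulative, 2))                   # keep enough PCs to exceed ~0.5-0.7
print(int(np.sum(eigvals > 1)))                  # Kaiser criterion: 2 PCs here
# A scree plot is simply eigvals plotted against the component number 1..p.
```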

Do it graphically

Interpretation of components
Look at the weights of the variables in each component.
If, for example, Y_1 = 0.89 X_1 + a_12 X_2 + a_13 X_3 + a_14 X_4 and the weights on X_1 and X_3 are the largest, then X_1 and X_3 are the most important variables in the first PC.
Also look at the correlation between the variables X_i and the PCs: the circle of correlation.
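A sketch of the quantities behind the circle of correlation, on assumed synthetic data: the correlation between each original variable X_j and each PC Y_i.

```python
# Correlations between the original variables and the PCs (circle of correlation).
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0, 0],
                            [[1.0, 0.7, 0.2],
                             [0.7, 1.0, 0.1],
                             [0.2, 0.1, 1.0]], size=500)

Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
Y = Xc @ eigvecs[:, np.argsort(eigvals)[::-1]]          # PCs, decreasing variance

p = X.shape[1]
corr = np.corrcoef(Xc, Y, rowvar=False)[:p, p:]         # rows: X_j, columns: Y_i
print(np.round(corr, 2))
# The squared correlations in each row sum to 1, so plotting each variable at
# (corr with PC1, corr with PC2) places it on or inside the unit circle.
```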

Circle of correlation

Normalized (standardized) PCA
If variables have very heterogeneous variances, we standardize them.
The standardized variables are X_i* = (X_i - mean(X_i)) / sqrt(Var(X_i)).
The new variables all have the same variance (1), so each variable has the same weight.
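A minimal sketch of the standardization step on assumed synthetic data; standardizing the variables and then taking the covariance matrix is the same as running PCA on the correlation matrix.

```python
# Standardized (normalized) PCA: z-score each variable before the analysis.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 3)) * np.array([1.0, 10.0, 100.0])  # very different scales

X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)     # X_i* = (X_i - mean) / sqrt(var)

C_std = np.cov(X_std, rowvar=False)                      # covariance of standardized data
print(np.allclose(C_std, np.corrcoef(X, rowvar=False)))  # True: it is the correlation matrix

eigvals, eigvecs = np.linalg.eigh(C_std)                 # PCA then proceeds exactly as before
print(round(eigvals.sum(), 6))                           # total variance = p = 3
```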

Application of PCA in Genomics
PCA is useful for finding new, more informative, uncorrelated features; it reduces dimensionality by rejecting low-variance features.
Analysis of expression data.
Analysis of metabolomics data (Ward et al., 2003).

However
PCA is only powerful if the biological question is related to the highest variance in the dataset.
If not, other techniques are more useful: Independent Component Analysis (ICA), introduced by Jutten in 1987.

What is ICA?

That looks like that

The idea behind ICA

How does it work?

Rationale of ICA
Find the components S_i that are as independent as possible, in the sense of maximizing some function F(s_1, s_2, ..., s_k) that measures independence.
All ICs (except possibly one) should be non-Normal.
The variance of all ICs is 1.
There is no hierarchy between ICs.

How to find the ICs?
There are many choices of the objective function F, e.g. mutual information.
We use the kurtosis of the variables to approximate the distribution function.
The number of ICs is chosen by the user.
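A tiny illustration of kurtosis as a measure of non-Normality (SciPy's kurtosis reports excess kurtosis, so a Normal variable gives roughly 0; the sampled distributions are just examples):

```python
# Excess kurtosis: ~0 for Normal data, positive for peaked (super-Gaussian)
# distributions, negative for flat (sub-Gaussian) ones.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(5)
print(kurtosis(rng.normal(size=100_000)))    # ~0    (Gaussian)
print(kurtosis(rng.laplace(size=100_000)))   # ~3    (super-Gaussian)
print(kurtosis(rng.uniform(size=100_000)))   # ~-1.2 (sub-Gaussian)
```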

Difference with PCA
It is not a dimensionality reduction technique.
There is no single (exact) solution for the components; different algorithms are used (in R: FastICA, PearsonICA, MLICA).
ICs are of course uncorrelated, but also as independent as possible.
It is uninteresting for Normally distributed variables.
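A sketch of ICA in Python using scikit-learn's FastICA (a Python counterpart of the R packages named above, not the original authors' code); the sources, the mixing matrix and all parameters are illustrative assumptions:

```python
# Classic "unmixing" example: recover independent sources from linear mixtures.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t))]   # two independent, non-Normal sources
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                          # mixing matrix
X = S @ A.T                                         # observed mixed signals

ica = FastICA(n_components=2, random_state=0)       # the number of ICs is chosen by the user
S_hat = ica.fit_transform(X)                        # estimated independent components
print(S_hat.shape)                                  # (2000, 2); order and sign are arbitrary
```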

Example: Lee and Batzoglou (2003)
Microarray expression data on 7070 genes in 59 normal human tissue samples (19 types).
We are not interested in reducing dimension but rather in looking for genes that show a tissue-specific expression profile (what makes tissue types different).

PCA vs ICA
Hsiao et al. (2002) applied PCA and, by visual inspection, observed three gene clusters of 425 genes: liver-specific, brain-specific and muscle-specific.
ICA identified more tissue-specific genes than PCA.