Robust PCA in Stata
Vincenzo Verardi, FUNDP (Namur) and ULB (Brussels), Belgium; FNRS Associate Researcher

Outline: Introduction | Robust Covariance Matrix | Robust PCA | Application | Conclusion

PCA transforms a set of correlated variables into a smaller set of uncorrelated variables (principal components). For p random variables X1, …, Xp, the goal of PCA is to construct a new set of p axes in the directions of greatest variability.

[Figure slides: scatter plots with axes X1 and X2 illustrating the construction of the principal axes]

Hence, for the first principal component, the goal is to find a linear combination Y = α1X1 + α2X2 + … + αpXp (= αᵀX) such that the variance of Y (= Var(αᵀX) = αᵀΣα) is maximal. The direction of α is given by the eigenvector corresponding to the largest eigenvalue of the covariance matrix Σ.
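As a quick illustration of this eigen-decomposition view, the first principal direction can be extracted from the covariance matrix in Stata. This is a minimal sketch that assumes variables x1-x3 are already in memory:

quietly correlate x1-x3, covariance
matrix Sigma = r(C)
* eigenvectors (columns of V) and eigenvalues (row vector L), in decreasing order
matrix symeigen V L = Sigma
* direction of the first principal component: eigenvector of the largest eigenvalue
matrix alpha = V[1..., 1]
matrix list alpha
matrix list L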

The second vector (orthogonal to the first) is the one with the second highest variance; it corresponds to the eigenvector associated with the second largest eigenvalue. And so on.

The new variables (PCs) have a variance equal to their corresponding eigenvalue: Var(Yi) = λi for all i = 1, …, p. The relative variance explained by each PC is given by λi / (λ1 + … + λp).

How many PCs should be considered? Keep enough PCs so that the cumulative variance explained is at least 60-70% of the total. Kaiser criterion: keep the PCs with eigenvalues greater than 1.
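In Stata this choice is usually made by inspecting the eigenvalues after pca; a minimal sketch, with variable names x1-x5 assumed:

pca x1-x5
* scree plot with a reference line at 1 for the Kaiser criterion
screeplot, yline(1)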

PCA is based on the classical covariance matrix, which is sensitive to outliers. Illustration:

. set obs …
. drawnorm x1-x3, corr(C)
. matrix list C

        c1   c2   c3
  r1     1
  r2    .7    1
  r3     …
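The transcript truncates the commands above; a hypothetical, runnable reconstruction of the illustration (the number of observations, the off-diagonal correlations of C, and the contamination scheme are all assumptions added here) could look like this:

* observation count, correlation matrix and contamination are assumed values
clear
set obs 1000
matrix C = (1, .7, .7 \ .7, 1, .7 \ .7, .7, 1)
drawnorm x1-x3, corr(C)
* shift part of x1 to create multivariate outliers
replace x1 = x1 + 10 in 1/100
* classical PCA: the loadings are pulled toward the contaminated direction
pca x1-x3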

This drawback can be solved by basing the PCA on a robust estimate of the covariance (correlation) matrix. A well-suited method for this is the MCD (Minimum Covariance Determinant), which considers all subsets containing h% of the observations (generally 50%) and estimates Σ and µ on the data of the subset associated with the smallest covariance matrix determinant. Intuition:

The generalized variance proposed by Wilks (1932) is a one-dimensional measure of multidimensional scatter. It is defined as the determinant of the covariance matrix, GV = det(Σ). In the 2×2 case it is easy to see the underlying idea: det(Σ) = σ1² σ2² − σ12², i.e. the raw bivariate spread minus the spread due to covariation.

Remember, MCD considers all subsets containing 50% of the observations. However, if N = 200, the number of subsets to consider would be C(200, 100) ≈ 9 × 10^58, which is computationally infeasible. Solution: use a subsampling algorithm.

The implemented algorithm (Rousseeuw and Van Driessen, 1999):
1. p-subsets
2. Concentration (sorting distances)
3. Estimation of the robust Σ_MCD
4. Estimation of the robust PCA

Consider a number of subsets containing (p+1) points (where p is the number of variables), sufficiently large to be sure that at least one of the subsets contains no outliers. Calculate the covariance matrix on each subset and keep the one with the smallest determinant. Do some fine tuning to get closer to the global solution.
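A minimal Stata sketch of this subset search, assuming variables x1-x3 (so p = 3) are in memory; the number of trial subsets, 500, is an arbitrary illustration value and no fine-tuning step is included:

local p = 3
local bestdet = .
forvalues r = 1/500 {
    preserve
    * draw a random (p+1)-subset of the observations
    generate double u = runiform()
    sort u
    keep in 1/`=`p'+1'
    quietly correlate x1-x3, covariance
    matrix S = r(C)
    * keep the subset with the smallest covariance determinant
    if det(S) < `bestdet' {
        local bestdet = det(S)
        matrix bestC = S
    }
    restore
}
matrix list bestC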

The minimal number N of p-subsets needed to have a probability Pr of drawing at least one clean subset, when a fraction α of the observations are outliers, can be derived as follows:
(1 − α) is the probability that one random point in the dataset is not an outlier.
(1 − α)^p is the probability that none of the p random points in a p-subset is an outlier.
1 − (1 − α)^p is the probability that at least one of the p random points in a p-subset is an outlier.
[1 − (1 − α)^p]^N is the probability that there is at least one outlier in each of the N p-subsets considered (i.e. that all p-subsets are corrupt).
1 − [1 − (1 − α)^p]^N is the probability that there is at least one clean p-subset among the N considered.
Setting this last expression equal to Pr and rearranging, we have N = log(1 − Pr) / log(1 − (1 − α)^p).
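As a small numerical check, the required number of subsets can be computed directly; the contamination level, dimension, and target probability below are example values, not values from the slides:

* example values: 20% contamination, p = 5 variables, 99% success probability
local alpha = 0.20
local p = 5
local Pr = 0.99
display ceil(log(1 - `Pr') / log(1 - (1 - `alpha')^`p'))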

The preliminary p-subset step yields preliminary estimates Σ* and μ*. Calculate the Mahalanobis distances of all individuals using Σ* and μ*. The Mahalanobis distance is defined as MDi = sqrt((xi − μ)ᵀ Σ⁻¹ (xi − μ)); for Gaussian data, MD² is distributed as χ²_p.

Concentration step:
1. Calculate the Mahalanobis distances of all individuals using the preliminary Σ* and μ*.
2. Sort individuals according to their Mahalanobis distance and re-estimate Σ* and μ* using the first 50% of observations.
3. Repeat the previous step until convergence.
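Both the distance computation and one concentration iteration can be sketched compactly in Mata; variable names x1-x3 are assumed, and the classical mean and covariance stand in for the preliminary p-subset estimates Σ* and μ*:

mata:
    X  = st_data(., ("x1", "x2", "x3"))
    // preliminary estimates (here simply the classical ones, for illustration)
    mu = mean(X)
    S  = variance(X)
    // squared Mahalanobis distances under the current estimates
    d2 = rowsum(((X :- mu) * invsym(S)) :* (X :- mu))
    // keep the 50% of observations with the smallest distances
    h   = ceil(rows(X)/2)
    idx = order(d2, 1)[|1 \ h|]
    // re-estimate location and scatter on this half sample (one iteration;
    // the algorithm repeats this until the estimates stop changing)
    mu = mean(X[idx, .])
    S  = variance(X[idx, .])
end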

In Stata, Hadi's method is available to estimate a robust covariance matrix. Unfortunately, it is not very robust. The reason is simple: it relies on a non-robust preliminary estimation of the covariance matrix.

Hadi's algorithm:
1. Compute a variant of the Mahalanobis distance.
2. Sort individuals according to this distance and use the subset with the first p+1 points to re-estimate μ and Σ.
3. Compute the Mahalanobis distances and sort the data.
4. Check whether the first point outside the subset is an outlier. If not, add this point to the subset and repeat steps 3 and 4; otherwise stop.

clear
set obs 1000
* outlier cutoff: square root of the 95th percentile of a chi2(5)
local b = sqrt(invchi2(5, 0.95))
drawnorm x1-x5 e
* contaminate the first 100 observations of x1
replace x1 = invnorm(uniform()) + 5 in 1/100
* Fast-MCD (user-written mcd command): generates Robust_distance
mcd x*, outlier
gen RD = Robust_distance
* Hadi's method: a flags outliers, b holds Hadi's distances
hadimvo x*, gen(a b) p(0.5)
* compare the two sets of distances against the chi2 cutoff
scatter RD b, xline(`b') yline(`b')

[Figure: outlier detection results compared, Hadi vs. Fast-MCD]

QUESTION: Can a single indicator accurately sum up research excellence? GOAL: Determine the underlying factors measured by the variables used in the Shanghai ranking, via principal component analysis.

Alumni: alumni recipients of the Nobel Prize or the Fields Medal
Award: current faculty Nobel laureates and Fields Medal winners
HiCi: highly cited researchers
N&S: articles published in Nature and Science
PUB: articles in the Science Citation Index-Expanded and the Social Science Citation Index

The first component accounts for 68% of the inertia and is given by:
Φ1 = 0.42 Alumni + 0.44 Award + 0.48 HiCi + 0.50 N&S + 0.38 PUB

Variable       Corr.(Φ1, Xi)
Alumni         0.78
Awards         0.81
HiCi           0.89
N&S            0.92
PUB            0.70
Total score    0.99
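The classical analysis behind this table could be reproduced along the following lines; the variable names and the presence of a total-score variable are assumptions about how the ranking data are stored:

* classical PCA on the five Shanghai-ranking indicators (variable names assumed)
pca alumni award hici ns pub
* scores on the first component and their correlation with the original variables
predict phi1, score
correlate phi1 alumni award hici ns pub total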

Two underlying factors are uncovered: Φ1 explains 38% of the inertia and Φ2 explains 28% of the inertia.
[Table: correlations Corr.(Φ1, ·) and Corr.(Φ2, ·) for Alumni, Awards, HiCi, N&S, PUB, and Total score]

Classical PCA can be heavily distorted by the presence of outliers. A robustified version of PCA can be obtained either by relying on a robust covariance matrix or by removing multivariate outliers identified through a robust identification method.
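In practice, the second route can be sketched in Stata by combining the user-written mcd command used earlier with classical pca; the variable names, the cutoff, and the reliance on the Robust_distance variable generated by mcd are assumptions carried over from the example above:

* flag multivariate outliers with robust (MCD-based) distances
mcd x1-x5, outlier
* usual cutoff: sqrt of the 97.5th percentile of a chi2 with p = 5 d.o.f.
local cut = sqrt(invchi2(5, 0.975))
* classical PCA on the observations not flagged as outliers
pca x1-x5 if Robust_distance <= `cut'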