Techniques for Studying Correlation and Covariance Structure: Principal Components Analysis (PCA) and Factor Analysis

Principal Component Analysis

Let $\vec{x}$ have a $p$-variate Normal distribution with mean vector $\vec{\mu}$ and covariance matrix $\Sigma$. Then $\Sigma = \lambda_1 \vec{a}_1\vec{a}_1' + \cdots + \lambda_p \vec{a}_p\vec{a}_p'$, where $\vec{a}_1, \ldots, \vec{a}_p$ are eigenvectors of $\Sigma$ of length 1 and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ are eigenvalues of $\Sigma$.

The Principal Components are defined by $C_j = \vec{a}_j'\vec{x}$, $j = 1, \ldots, p$. The components $C_1, \ldots, C_p$ are independent with $\mathrm{Var}(C_j) = \lambda_j$.

Often, for large values of $j$, $\mathrm{Var}(C_j) = \lambda_j$ is small and contributes little to the total variance. In this case the number of variables can be reduced to a small number of principal components. In regression analysis it is sometimes useful to transform the independent variables into their principal components.
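As a sketch of the computation (with a made-up data matrix, since the slides' numbers are not reproduced here), the principal components, their variances, and the variance proportions used in a scree plot can all be obtained from the sample covariance matrix:

```python
import numpy as np

# Made-up data: n = 200 observations on p = 3 correlated variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.5, 0.3]])

S = np.cov(X, rowvar=False)        # sample covariance matrix
lam, A = np.linalg.eigh(S)         # eigenvalues (ascending) and unit eigenvectors
lam, A = lam[::-1], A[:, ::-1]     # reorder so the largest eigenvalue comes first
C = (X - X.mean(axis=0)) @ A       # principal component scores C_j = a_j'(x - mean)
prop_var = lam / lam.sum()         # proportion of variance (scree plot heights)
cum_prop = np.cumsum(prop_var)     # cumulative proportion of variance
```

The sample variances of the score columns reproduce the eigenvalues, which is the property $\mathrm{Var}(C_j) = \lambda_j$ above.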

[Scree plot: proportion of variance explained by each principal component]

[Scree plot: cumulative proportion of variance explained by the principal components]

Example: wildlife (moose) population density was measured over time (once a year) in three areas.

[Figure: map showing Area 1, Area 2, and Area 3]

The Sample Statistics: the mean vector, the covariance matrix, and the correlation matrix.

Principal Component Analysis: the eigenvalues of S, the eigenvectors of S, and the principal components.

The Example

Scree Plots

More Examples

Computation of the eigenvalues and eigenvectors of $\Sigma$. Recall: $\Sigma = \lambda_1 \vec{a}_1\vec{a}_1' + \cdots + \lambda_p \vec{a}_p\vec{a}_p'$, so $\Sigma^2 = \lambda_1^2 \vec{a}_1\vec{a}_1' + \cdots + \lambda_p^2 \vec{a}_p\vec{a}_p'$ since the eigenvectors $\vec{a}_i$ are orthonormal.

Continuing, we see that $\Sigma^n = \lambda_1^n \vec{a}_1\vec{a}_1' + \cdots + \lambda_p^n \vec{a}_p\vec{a}_p'$. For large values of $n$, $\Sigma^n \approx \lambda_1^n \vec{a}_1\vec{a}_1'$, since $\lambda_1$ is the largest eigenvalue.

The algorithm for computing the eigenvector $\vec{a}_1$:
1. Compute $\Sigma^n$, rescaling so that the elements do not become too large in value, i.e. rescale so that the largest element is 1.
2. Compute $\vec{a}_1$ using the fact that $\Sigma^n \approx \lambda_1^n \vec{a}_1\vec{a}_1'$: any column of $\Sigma^n$, rescaled to length 1, approximates $\vec{a}_1$.
3. Compute $\lambda_1$ using $\Sigma\vec{a}_1 = \lambda_1\vec{a}_1$.

4. Repeat, using the matrix $\Sigma - \lambda_1\vec{a}_1\vec{a}_1'$, to obtain $\lambda_2$ and $\vec{a}_2$.
5. Continue with $i = 2, \ldots, p - 1$, using the matrix $\Sigma - \lambda_1\vec{a}_1\vec{a}_1' - \cdots - \lambda_{i-1}\vec{a}_{i-1}\vec{a}_{i-1}'$.
Example: using Excel (Eigen).
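The steps above can be sketched in code. This is a minimal NumPy illustration of the power method with deflation; the 2 x 2 matrix is a made-up example rather than the slides' data, and the sketch iterates $\Sigma$ on a vector instead of forming $\Sigma^n$ explicitly:

```python
import numpy as np

def largest_eigenpair(S, n_iter=100):
    """Power method (steps 1-3): repeatedly multiply by S, rescaling
    so the entries stay bounded, then normalize to length 1."""
    a = np.random.default_rng(0).normal(size=S.shape[0])  # random start
    for _ in range(n_iter):
        a = S @ a
        a = a / np.max(np.abs(a))   # rescale so the largest element is 1
    a = a / np.linalg.norm(a)       # rescale to length 1
    lam = a @ S @ a                 # since Sigma a = lambda a and a'a = 1
    return lam, a

# Made-up 2 x 2 covariance matrix; its eigenvalues are 3 and 1.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam1, a1 = largest_eigenpair(S)
# Steps 4-5: deflate and repeat for the next eigenpair.
S2 = S - lam1 * np.outer(a1, a1)
lam2, a2 = largest_eigenpair(S2)
```

The random starting vector avoids starting exactly orthogonal to $\vec{a}_1$, in which case the iteration could not converge to it.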

Factor Analysis: an alternative technique for studying correlation and covariance structure.

The Factor Analysis Model: Let $\vec{x}$ have a $p$-variate Normal distribution with mean vector $\vec{\mu}$. Let $F_1, F_2, \ldots, F_k$ denote independent standard normal variables (the Factors), and let $\delta_1, \delta_2, \ldots, \delta_p$ denote independent normal random variables with mean 0 and $\mathrm{var}(\delta_i) = \psi_i$. Suppose that there exist constants $\lambda_{ij}$ (the loadings) such that:
$x_1 = \mu_1 + \lambda_{11} F_1 + \lambda_{12} F_2 + \cdots + \lambda_{1k} F_k + \delta_1$
$x_2 = \mu_2 + \lambda_{21} F_1 + \lambda_{22} F_2 + \cdots + \lambda_{2k} F_k + \delta_2$
$\vdots$
$x_p = \mu_p + \lambda_{p1} F_1 + \lambda_{p2} F_2 + \cdots + \lambda_{pk} F_k + \delta_p$

Using matrix notation: $\vec{x} = \vec{\mu} + \Lambda\vec{F} + \vec{\delta}$, where $\Lambda = (\lambda_{ij})$ is the $p \times k$ matrix of loadings, $\vec{F} = (F_1, \ldots, F_k)'$, and $\vec{\delta} = (\delta_1, \ldots, \delta_p)'$ with $\mathrm{cov}(\vec{\delta}) = \Psi = \mathrm{diag}(\psi_1, \ldots, \psi_p)$.

Note: $\Sigma = \mathrm{cov}(\vec{x}) = \Lambda\Lambda' + \Psi$, hence $\sigma_{ii} = \mathrm{Var}(x_i) = \sum_{j=1}^k \lambda_{ij}^2 + \psi_i$. The quantity $h_i^2 = \sum_{j=1}^k \lambda_{ij}^2$ (the communality) is the component of the variance of $x_i$ that is due to the common factors $F_1, F_2, \ldots, F_k$, and $\psi_i$ (the specific variance) is the component of variance that is specific only to that variable.
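The decomposition of $\sigma_{ii}$ can be checked numerically; the loading matrix and specific variances below are hypothetical values chosen for illustration, not taken from the slides:

```python
import numpy as np

# Hypothetical loadings (p = 3 variables, k = 2 factors) and
# specific variances -- illustrative values only.
Lam = np.array([[0.9, 0.1],
                [0.8, 0.3],
                [0.2, 0.7]])
psi = np.array([0.18, 0.27, 0.47])

Sigma = Lam @ Lam.T + np.diag(psi)   # implied covariance: Sigma = Lambda Lambda' + Psi
communality = (Lam**2).sum(axis=1)   # h_i^2: variance due to the common factors
# each diagonal entry sigma_ii splits into communality plus specific variance
```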

Determine $\mathrm{cov}(x_i, F_j)$. Recall $x_i = \mu_i + \lambda_{i1} F_1 + \cdots + \lambda_{ik} F_k + \delta_i$, where the factors are independent with $\mathrm{Var}(F_j) = 1$, so $\mathrm{cov}(x_i, F_j) = \lambda_{ij}$.

Also $\mathrm{Var}(x_i) = \sigma_{ii}$, where $\sigma_{ii} = \sum_{j=1}^k \lambda_{ij}^2 + \psi_i$. Thus $\rho(x_i, F_j) = \lambda_{ij}/\sqrt{\sigma_{ii}}$. Now, if also $\sigma_{ii} = 1$, then $\lambda_{ij}$ is the correlation between $x_i$ and $F_j$.

Rotating Factors. Recall the Factor Analysis model $\vec{x} = \vec{\mu} + \Lambda\vec{F} + \vec{\delta}$. This gives rise to the vector $\vec{x}$ having covariance matrix $\Sigma = \Lambda\Lambda' + \Psi$. Let $P$ be any orthogonal matrix ($PP' = I$); then $\Lambda\Lambda' = \Lambda P P'\Lambda' = (\Lambda P)(\Lambda P)'$ and $\Sigma = (\Lambda P)(\Lambda P)' + \Psi$.

Hence if $\vec{x} = \vec{\mu} + \Lambda\vec{F} + \vec{\delta}$ is a Factor Analysis model, then so also is $\vec{x} = \vec{\mu} + \Lambda^*\vec{F}^* + \vec{\delta}$ with $\Lambda^* = \Lambda P$ and $\vec{F}^* = P'\vec{F}$, where $P$ is any orthogonal matrix.

The process of exploring other models through orthogonal transformations of the factors is called rotating the factors. There are many techniques for rotating the factors: VARIMAX, Quartimax, and Equimax. VARIMAX rotation attempts to make each individual variable load highly on only a small subset of the factors.
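A minimal sketch of a VARIMAX rotation, using the standard SVD-based iteration (the unrotated loading matrix here is hypothetical, not from the slides):

```python
import numpy as np

def varimax(Lam, gamma=1.0, max_iter=100, tol=1e-8):
    """Iteratively find an orthogonal P that (approximately) maximizes
    the varimax criterion for the rotated loadings Lam @ P."""
    p, k = Lam.shape
    P = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = Lam @ P
        u, s, vt = np.linalg.svd(
            Lam.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0))))
        P = u @ vt                    # orthogonal by construction
        d_new = s.sum()
        if d_new < d * (1 + tol):     # stop when the criterion stalls
            break
        d = d_new
    return Lam @ P, P

# Hypothetical unrotated loadings for p = 4 variables, k = 2 factors.
Lam = np.array([[0.7,  0.6],
                [0.8,  0.5],
                [0.6, -0.7],
                [0.5, -0.6]])
L_rot, P = varimax(Lam)
```

Because $P$ is orthogonal, the communalities $h_i^2$ and the fitted covariance $\Sigma = \Lambda\Lambda' + \Psi$ are unchanged by the rotation; only the interpretation of the factors changes.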

Example: Olympic decathlon scores. Data were collected for n = 160 starts (139 athletes) for the ten decathlon events (100-m run, long jump, shot put, high jump, 400-m run, 110-m hurdles, discus, pole vault, javelin, 1500-m run). The sample correlation matrix is given on the next slide.

Correlation Matrix

Identification of the factors