Descriptive Statistics vs. Factor Analysis Descriptive statistics will inform on the prevalence of a phenomenon, among a given population, captured by.


1 Descriptive Statistics vs. Factor Analysis Descriptive statistics inform on the prevalence of a phenomenon, among a given population, as captured by specific indicators. This provides valuable insight, but it is not sufficient to capture underlying phenomena or to unfold the problem in a comprehensive manner.

2 Factor Analysis Many statistical methods are used to study the relation between independent and dependent variables. Factor analysis belongs to the domain of statistics called multivariate analysis, which studies the relationships among many variables at once.

3 Factor Analysis The purpose of factor analysis is to discover simple patterns in the network of relationships among variables. In particular, it seeks to discover if the observed variables can be explained largely or entirely in terms of a much smaller number of variables called factors.

4 Factor Analysis Multivariate statistical analysis provides several techniques to analyze continuous and categorical variables, to capture the essence of their relationships, and to build new indicators (i.e. factors or principal components) that convey these relationships.

5 Principal Component Analysis The objectives of a PCA are: To discover or reduce the dimensionality of the data set; To identify new meaningful underlying variables.

6 Principal Component Analysis Assume we have two correlated variables plotted on a scatter plot. A regression line can be fitted to represent the linear relationship between the two variables.

7 Principal Component Analysis If we could define a variable that approximates the regression line on the plot, then the new variable would capture most of the essence of the two variables. Each observation's score on the new factor can be used to represent the essence of the two variables. The new factor is thus a linear combination of the two variables. We have reduced the two variables to one factor. The principal components are calculated from the correlation matrix.
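
The two-variable case can be worked through by hand, which may make the idea more concrete. The sketch below is only an illustration: the data values are invented, and it relies on the fact that a 2x2 correlation matrix [[1, r], [r, 1]] with r > 0 has first eigenvector (1, 1)/sqrt(2) and first eigenvalue 1 + r.

```python
import math
import statistics

# Hypothetical data: two positively correlated variables (invented for illustration).
x = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0]
y = [1.5, 3.0, 4.5, 5.0, 8.0, 8.5]


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


zx, zy = standardize(x), standardize(y)

# Pearson correlation: the mean product of the standardized values.
r = sum(a * b for a, b in zip(zx, zy)) / len(zx)

# For r > 0 the first principal component weights both variables equally,
# so each observation's factor score is a linear combination of the two.
factor_scores = [(a + b) / math.sqrt(2) for a, b in zip(zx, zy)]

eigenvalue_1 = 1 + r                 # variance carried by the new factor
share_explained = eigenvalue_1 / 2   # two standardized variables: total variance 2
print(f"r = {r:.3f}, first factor explains {share_explained:.1%} of the variance")
```

The stronger the correlation between the two variables, the closer the share explained by the single factor gets to 100%.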

8 Principal Component Analysis Graphically, the first principal component lies along the line of greatest variation and it is as close to all of the data as possible (red line).

9 Principal Component Analysis The second PCA axis must be completely uncorrelated with PCA axis 1, i.e. at right angles or "orthogonal" to it (green line). (Figure labels: PC1, PC2, orthogonal.)

10 Principal Component Analysis In a typical PCA, however, there are more than two variables, i.e. more than two dimensions. The second principal component will be both perpendicular to the first and along the line of next greatest variation. The third principal component will be along the line of the following greatest variation and perpendicular to the first two principal components. The same applies to all N dimensions under analysis. With several variables the computation is more complicated, but the basic principle of expressing two or more variables by a single factor remains the same.
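
With more than two variables the components are usually obtained numerically. The following is a minimal sketch, assuming NumPy is available and using randomly generated data in place of a real data set; it checks that the components come out mutually perpendicular and ordered by decreasing variation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data set: 200 observations of 4 correlated variables.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing + 0.3 * rng.normal(size=(200, 4))

R = np.corrcoef(X, rowvar=False)                # 4 x 4 correlation matrix
eigenvalues, eigenvectors = np.linalg.eigh(R)   # eigh returns ascending eigenvalues

# Reorder so the first column is the direction of greatest variation.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
components = eigenvectors[:, order]

# Perpendicular unit-length components give an identity Gram matrix.
gram = components.T @ components
print(np.round(gram, 6))
```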

11 Principal Component Analysis By multiplying the original data set by the principal components, the data are rotated so that the components form the new perpendicular axes, and objects lying exactly on one of these axes now have only one non-zero coordinate, i.e. are captured by one variable only. The most common use for PCA is in fact to reduce the dimensionality of the data while retaining as much information as possible.
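
The rotation described here amounts to a single matrix product. The sketch below assumes NumPy and invented data with one dominant common factor; the variable names (`scores`, `reduced`) are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 3 variables driven mostly by one common factor.
common = rng.normal(size=(150, 1))
X = common @ np.array([[1.0, 0.8, 0.6]]) + 0.2 * rng.normal(size=(150, 3))

# Standardize, then take the eigenvectors of the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
components = eigenvectors[:, order]

scores = Z @ components    # the data expressed on the new perpendicular axes
reduced = scores[:, :1]    # keep only the first axis: 3 variables -> 1

# A rotation moves no variance in or out: the total is unchanged.
print(Z.var(axis=0).sum(), scores.var(axis=0).sum())
print("share kept by the first axis:", scores.var(axis=0)[0] / scores.var(axis=0).sum())
```

Dropping columns of `scores` is where the dimensionality reduction happens; the rotation itself loses nothing.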

12 Principal Component Analysis PCA helps to simultaneously envision all variables selected to describe a statistical unit (e.g. a district, a household, etc.). The PCA process identifies the directions of maximum variability in the data and projects the data onto new orthogonal axes. PCA takes the cloud of data points and rotates it such that the maximum variability is visible.

13 Principal Component Analysis New factors, i.e. principal components, are created by rotating the data plotted on orthogonal axes. In doing so, PCA helps to determine whether there are hidden factors or components along which the data vary. It computes a compact and optimal description of the data set.

14 Principal Component Analysis Before rotating the data cloud, PCA standardizes the data by subtracting the mean and dividing by the standard deviation. Thus the centroid of the whole data set is at zero. By standardizing we give all variables the same variation, i.e. a standard deviation of 1. Standardization is necessary when the variables are measured in different units.
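
A short sketch of this standardization step, assuming NumPy; the household figures and their units are invented purely to show variables on very different scales.

```python
import numpy as np

# Hypothetical households: yearly income (currency units), household size
# (persons), distance to the nearest market (km).
X = np.array([
    [12000.0, 4.0, 2.5],
    [8500.0, 6.0, 10.0],
    [30000.0, 3.0, 1.0],
    [15000.0, 5.0, 7.5],
    [5000.0, 8.0, 15.0],
])

raw_sd = X.std(axis=0)                 # dominated by the income column
Z = (X - X.mean(axis=0)) / raw_sd      # subtract the mean, divide by the sd

print("raw standard deviations:", np.round(raw_sd, 2))
print("standardized means:     ", np.round(Z.mean(axis=0), 6))
print("standardized sds:       ", np.round(Z.std(axis=0), 6))
```

Without this step, the income column would dominate the first component simply because its numbers are larger.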

15 Principal Component Analysis Basically, a PCA transforms a set of more or less correlated variables into a set of uncorrelated variables which are ordered by decreasing variability. The uncorrelated variables are linear combinations of the original variables, and the last of these variables can be removed with minimum loss of information.
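
Both properties, uncorrelated new variables and decreasing variability, can be checked numerically. This is a sketch under the assumption that NumPy is available, with simulated data standing in for real indicators.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: 300 observations of 4 correlated variables.
base = rng.normal(size=(300, 2))
X = base @ rng.normal(size=(2, 4)) + 0.5 * rng.normal(size=(300, 4))

Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, components = eigenvalues[order], eigenvectors[:, order]

scores = Z @ components   # each column is a linear combination of the originals

score_corr = np.corrcoef(scores, rowvar=False)   # identity if uncorrelated
score_var = scores.var(axis=0)                   # should decrease left to right

print(np.round(score_corr, 6))
print(np.round(score_var, 4))
```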

16 Principal Component Analysis If we have more than three initial variables, how do we determine how many axes are worth interpreting? This is left to the analyst; however, the eigenvalues calculated by the PCA give us a big hint.

17 Principal Component Analysis Every axis has an eigenvalue associated with it, which is the variance extracted by that factor. The eigenvalues are ranked from highest to lowest, and their size reflects the amount of variation explained by each axis. When the components are computed from the correlation matrix, the sum of the eigenvalues equals the number of variables. Usually, the eigenvalues are expressed as a percentage of the total.
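
The relationship between eigenvalues, their sum, and the percentages can be illustrated as follows. The sketch assumes NumPy and uses simulated data with five variables; the eigenvalues come from the correlation matrix, which is why they sum to the number of variables.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: 250 observations of 5 variables, some of them correlated.
X = rng.normal(size=(250, 5))
X[:, 1] += 0.9 * X[:, 0]
X[:, 3] += 0.7 * X[:, 2]

R = np.corrcoef(X, rowvar=False)
eigenvalues = np.linalg.eigvalsh(R)[::-1]   # eigvalsh is ascending; reverse it

percent = 100 * eigenvalues / eigenvalues.sum()

print("eigenvalues:", np.round(eigenvalues, 3))
print("sum:        ", round(float(eigenvalues.sum()), 6))
print("percent:    ", np.round(percent, 1))
```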

18 Principal Component Analysis Example: PCA Axis 1: eigenvalue 63%; PCA Axis 2: eigenvalue 33%; PCA Axis 3: eigenvalue 4%. In this example the first PCA axis explains 63% (about 2/3) of the variation of the entire data set, and the second axis explains almost all the remaining variation. Axis 3 explains a trivial amount and can be dropped.
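
The arithmetic behind this example can be written out as a cumulative sum. The percentages are those given on the slide; the 90% cut-off is a hypothetical threshold chosen for illustration, since the decision is left to the analyst.

```python
from itertools import accumulate

percent_explained = [63, 33, 4]   # PCA axes 1, 2, 3 from the example

# Running total of the variation explained as axes are added.
cumulative = list(accumulate(percent_explained))

threshold = 90  # hypothetical cut-off, not prescribed by the source
n_keep = next(i + 1 for i, total in enumerate(cumulative) if total >= threshold)

print(cumulative)   # [63, 96, 100]
print(n_keep)       # 2
```

Under this assumed threshold the first two axes are retained, and the third is dropped at a cost of 4% of the variation.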

