Principal Component Analysis


Principal components analysis is similar to factor analysis in that it is a technique for examining the interrelationships among a set of variables. Both techniques differ from regression analysis in that there is no dependent variable to be explained by a set of independent variables. However, principal components analysis and factor analysis also differ from each other. In principal components analysis (PCA) the major objective is to select a number of components that explain as much of the total variance as possible; that is, one is simply trying to derive statistically a relatively small number of variables. The values of the principal components for a given individual are relatively simple to compute and interpret. The factors obtained in factor analysis, on the other hand, are selected mainly to explain the interrelationships among the original variables. Ideally, in exploratory factor analysis the number of factors expected is known in advance. The major emphasis is placed on obtaining easily understandable factors that convey the essential information contained in the original set of variables. Ordinary principal axis factor analysis should not be done if the number of variables exceeds the number of participants.

When is principal components analysis used? Principal components analysis is performed in order to simplify the description of a set of interrelated variables. Each principal component is a linear combination of the original variables. One measure of the amount of information conveyed by each principal component is its variance. For this reason the principal components are arranged in order of decreasing variance. Thus the most informative principal component is the first, and the least informative is the last (a variable with zero variance does not distinguish between the members of the population).
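The ordering of components by variance can be sketched numerically. The example below is a minimal illustration in Python with NumPy, not the procedure used in the slides. It assumes synthetic data (not the depression data set) and an analysis based on the correlation matrix; all variable names are hypothetical.

```python
import numpy as np

# Synthetic data, invented for illustration only (not the depression data set).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[1.0, 0.8, 0.6, 0.4]]) + rng.normal(size=(200, 4))

# Standardize each variable so the analysis is based on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

# Eigen-decomposition of the symmetric correlation matrix.
eigenvalues, eigenvectors = np.linalg.eigh(R)

# eigh returns eigenvalues in ascending order; reverse so that the first
# component is the one with the largest variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Each column of `scores` is one principal component: a linear combination
# of the standardized variables, with weights given by an eigenvector.
scores = Z @ eigenvectors

# The sample variance of each component equals the corresponding eigenvalue.
component_variances = scores.var(axis=0, ddof=1)
print("eigenvalues:        ", np.round(eigenvalues, 3))
print("component variances:", np.round(component_variances, 3))
```

Because the variables are standardized, the eigenvalues sum to the number of variables, so each eigenvalue divided by that number is the proportion of total variance carried by its component.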

Exploratory Factor Analysis vs. Principal Components Analysis

Exploratory factor analysis: The researcher assumes that there is a smaller set of unobserved constructs that underlie the measured variables. It is directed at understanding the relationships among variables by understanding the underlying constructs. It is used when there is a theory about how the variables fit together.

Principal components analysis: The researcher is trying to derive statistically (using variances) a relatively small number of variables that convey as much of the information in the measured variables as possible. It enables the researcher to use fewer variables to obtain the same information as would be gathered with more variables. It is used when the researcher is looking to use fewer variables to provide the same information.

Assumptions
Normality: important only to the extent that skewness or outliers affect the observed correlations, or if significance tests are performed (rare).
Independent sampling is required.
Variables should be linearly related to one another (in pairs).
Many of the variables should be correlated at a moderate level (test with Bartlett’s test of sphericity).

Analysis of depression data set. Consider the data set depression.sav. We select for this example the 20 items that make up the Center for Epidemiologic Studies Depression (CESD) scale (cesd is the sum of c1 through c20; 0 = lowest and 60 = highest possible level). Each item is a statement to which the response categories are ordinal. The answer “rarely or none of the time” (less than 1 day) is coded as 0, “some or a little of the time” (1–2 days) as 1, “occasionally or a moderate amount of the time” (3–4 days) as 2, and “most or all of the time” (5–7 days) as 3. Please note that these variables do not satisfy the assumption, often made in statistics, of a multivariate normal distribution. In fact, they cannot even be considered continuous variables. However, they are typical of what is found in real-life applications.
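The coding scheme and the range of the total score can be checked with a few lines of Python. This is only a sketch: the function name and the two respondents are hypothetical, and it assumes the total is the plain sum of the 20 coded items, as stated above.

```python
# Response categories and their numeric codes, as described in the text.
RESPONSE_CODES = {
    "rarely or none of the time": 0,
    "some or a little of the time": 1,
    "occasionally or a moderate amount of the time": 2,
    "most or all of the time": 3,
}


def cesd_total(responses):
    """Return the sum of the 20 coded items (c1 through c20)."""
    if len(responses) != 20:
        raise ValueError("expected 20 item responses")
    return sum(RESPONSE_CODES[r] for r in responses)


# Hypothetical respondents, invented for illustration only.
lowest = cesd_total(["rarely or none of the time"] * 20)
highest = cesd_total(["most or all of the time"] * 20)
print(lowest, highest)
```

With 20 items each coded from 0 to 3, the total necessarily lies between 0 and 60, which matches the range given for the scale.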

Go to Analyze, Data Reduction, Factor. Variables: variables 001 to 010. Descriptives: in the Statistics box, check Initial solution; in the Correlation Matrix box, check Coefficients, KMO and Bartlett’s test of sphericity, and Anti-image; Continue. Extraction: under Method select Principal components; select the Correlation matrix radio button; in the Extract box select the Eigenvalues over 1 radio button; display both the Unrotated factor solution and the Scree plot; Continue. Rotation: under Method select the Varimax radio button; in the Display box check Rotated solution; Continue. Options: in the Coefficient Display Format box, select Sorted by size and Suppress absolute values less than .3; Continue; OK.

The Bartlett test of sphericity tests whether the correlation matrix differs from an identity matrix, that is, whether the variables are correlated at all. Its significance here indicates that the variables are sufficiently correlated for principal component analysis to be appropriate. The Kaiser-Meyer-Olkin measure of sampling adequacy is greater than 0.5, which suggests that principal component analysis can be effective for data reduction. The measures of sampling adequacy for the individual variables are printed on the diagonal of the anti-image correlation matrix (see next slide). We can observe that most of the measures are well above the acceptable level of 0.5.
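For readers who want to see how these two diagnostics are computed, the sketch below uses the standard textbook formulas in Python with NumPy and SciPy. It is not SPSS's implementation. The function names are hypothetical, and the data are synthetic, so the printed values say nothing about the depression data set.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(R, n):
    """Bartlett's test that the p x p correlation matrix R is an identity matrix.

    n is the number of observations. Returns (statistic, df, p_value).
    """
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)  # log of det(R), computed stably
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return statistic, df, chi2.sf(statistic, df)


def kmo(R):
    """Kaiser-Meyer-Olkin measure: overall value and one value per variable."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)  # partial correlations between pairs
    np.fill_diagonal(partial, 0.0)
    r = R.copy()
    np.fill_diagonal(r, 0.0)
    r2, a2 = r**2, partial**2
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + a2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + a2.sum())
    return overall, per_variable


# Synthetic data, invented for illustration only.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[1.0, 0.8, 0.6, 0.4]]) + rng.normal(size=(200, 4))
R = np.corrcoef(X, rowvar=False)

statistic, df, p_value = bartlett_sphericity(R, n=X.shape[0])
overall_kmo, kmo_per_variable = kmo(R)
print(f"Bartlett chi-square = {statistic:.1f}, df = {df:.0f}, p = {p_value:.3g}")
print(f"Overall KMO = {overall_kmo:.3f}")
print("Per-variable KMO:", np.round(kmo_per_variable, 3))
```

The per-variable values play the role of the diagonal entries of the anti-image correlation matrix mentioned above.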

The table of Total Variance Explained displays the total variance explained in three stages. At the initial stage, it shows the components and their associated eigenvalues, the percentage of variance explained, and the cumulative percentages. On the basis of the eigenvalues, we would expect five components to be extracted, because five components have eigenvalues greater than 1. If five components were extracted, then 59% of the variance would be explained. The scree plot graphically displays the eigenvalues for each component.
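The initial part of such a table can be reproduced from the eigenvalues of the correlation matrix. The sketch below assumes synthetic data with two underlying factors and six variables, invented for illustration. The numbers it prints are unrelated to the five components and 59% reported for the depression data.

```python
import numpy as np

# Synthetic data with two underlying factors, invented for illustration only.
rng = np.random.default_rng(1)
n = 300
factors = rng.normal(size=(n, 2))
true_loadings = np.array([
    [0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
    [0.0, 0.9], [0.0, 0.8], [0.0, 0.7],
])
X = factors @ true_loadings.T + 0.5 * rng.normal(size=(n, 6))

# Eigenvalues of the correlation matrix, largest first.
R = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# Percentage of total variance per component and the running total.
percent = 100 * eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(percent)

# "Eigenvalues over 1" rule: retain components with eigenvalue greater than 1.
n_retained = int((eigenvalues > 1).sum())

print("Component  Eigenvalue  % of variance  Cumulative %")
for i, (ev, pct, cum) in enumerate(zip(eigenvalues, percent, cumulative), 1):
    print(f"{i:9d}  {ev:10.3f}  {pct:13.1f}  {cum:12.1f}")
print("Components with eigenvalue > 1:", n_retained)
```

Plotting `eigenvalues` against the component number gives the scree plot.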

It should be explained that the eigenvalues are estimated variances of the principal components and are therefore subject to large sampling variation. Arbitrary cutoff points should thus not be taken too seriously. Once the number of principal components is selected, the investigator should examine the coefficients defining each of them in order to assign an interpretation to the components. A high coefficient of a principal component on a given variable is an indication of high correlation between that variable and the principal component. Principal components are interpreted in the context of the variables with high coefficients.
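The link between coefficients and correlations can be verified directly. In the sketch below, which again assumes synthetic data and an analysis of the correlation matrix, each eigenvector is scaled by the square root of its eigenvalue. The result is compared with the correlations between the variables and the component scores.

```python
import numpy as np

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[1.0, 0.8, 0.6, 0.4]]) + rng.normal(size=(200, 4))
p = X.shape[1]

# Standardize and extract the principal components of the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
scores = Z @ eigenvectors

# Loadings: eigenvector coefficients scaled by the component's standard deviation.
loadings = eigenvectors * np.sqrt(eigenvalues)

# Correlations between each original variable (rows) and each component (columns).
full = np.corrcoef(np.hstack([Z, scores]), rowvar=False)
variable_component_corr = full[:p, p:]

print(np.round(loadings, 3))
print(np.allclose(loadings, variable_component_corr))
```

The two matrices agree, which is why a large coefficient on a variable can be read as a high correlation between that variable and the component.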

The component matrix is the matrix of loadings, or correlations, between the variables and the components. Pure variables have loadings of 0.3 or greater on only one component. Complex variables have loadings of 0.3 or greater on more than one component, and they make interpretation of the output difficult. Rotation may therefore be necessary. Varimax rotation, where the component axes are kept at right angles to each other, is the most frequently chosen. Ordinarily, rotation reduces the number of complex variables and improves interpretation. Component 1 comprises 7 items, component 2 comprises 6 items, component 3 comprises 4 items, component 4 consists of 2 items, and component 5 consists of 1 item. Some items have loadings greater than 0.3 on two or more components. These items must be interpreted with caution, because simple structure is not apparent.
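Varimax rotation and the count of complex variables can be sketched as follows. This is a commonly used SVD-based iteration for the raw varimax criterion, written in Python with NumPy. It does not apply the Kaiser normalization that SPSS uses by default, so it should not be expected to reproduce SPSS output exactly. The data are synthetic, the choice of two components is an assumption made for the example, and the function names are hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation; returns (rotated_loadings, rotation_matrix)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like matrix of the varimax criterion.
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix, so axes stay at right angles
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break  # no meaningful improvement
        criterion = new_criterion
    return loadings @ rotation, rotation


def count_complex(loadings, cutoff=0.3):
    """Number of variables with |loading| >= cutoff on more than one component."""
    return int(((np.abs(loadings) >= cutoff).sum(axis=1) > 1).sum())


# Synthetic data with two correlated factors, invented for illustration only.
rng = np.random.default_rng(2)
n = 400
shared = rng.normal(size=(n, 1))
factors = 0.6 * shared + 0.8 * rng.normal(size=(n, 2))
true_loadings = np.array([
    [0.9, 0.0], [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6],
])
X = factors @ true_loadings.T + 0.5 * rng.normal(size=(n, 7))

# Unrotated loadings of the first two components (two is an assumption here).
R = np.corrcoef(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1][:2]
loadings = eigenvectors[:, order] * np.sqrt(eigenvalues[order])

rotated, rotation = varimax(loadings)
print("Complex variables before rotation:", count_complex(loadings))
print("Complex variables after rotation: ", count_complex(rotated))
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, the communality of each variable (the sum of its squared loadings) is unchanged by the rotation. Only the way the variance is distributed across components changes.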

The depression data example illustrates a situation in which the results are not clear-cut. This conclusion may be reached by observing the scree plot, where we see that it is difficult to decide how many components to use. It is not possible to explain a very high proportion of the total variance with a small number of principal components. Also, the interpretation of the components is not straightforward. This is frequently the case in real-life situations.