Factor Analysis (v. PCA)

Factor Analysis (v. PCA) Peter Fox and Greg Hughes Data Analytics – ITWS-4600/ITWS-6600 Group 3 Module 9, March 6, 2017

Factor Analysis: Exploratory factor analysis (EFA) is a common technique in the quantitative sciences for explaining the shared variance among several measured variables in terms of a smaller set of latent (hidden, unobserved) variables. EFA is often used to consolidate survey data by revealing the groupings (factors) that underlie individual questions. A large number of observable variables can be aggregated into a model that represents an underlying concept, making the data easier to understand.

Examples: business confidence, morale, happiness and conservatism are variables that cannot be measured directly. Quality of life, for instance, might be "inferred" from variables such as wealth, employment, environment, physical and mental health, education, recreation and leisure time, and social belonging. Others? Tests, questionnaires, visual imagery… http://en.wikipedia.org/wiki/Latent_variable

Relations among factors: are the factors "correlated" (oblique) or "orthogonal" (uncorrelated)? E.g. how do wealth, employment, environment, physical and mental health, education, recreation and leisure time, and social belonging relate to one another?

Factor Analysis

PCA and FA? Common factor analysis (CFA) analyzes only the reliable common variance in the data, while PCA analyzes all of the variance. CFA assumes an underlying hypothetical process or construct; PCA does not. PCA tends to inflate loadings, especially in studies with a small number of variables and/or low estimated communalities. Thus, PCA is not appropriate for examining the structure of data.

FA vs. PCA conceptually: FA produces factors; PCA produces components. Factors cause variables; components are aggregates of the variables.

Conceptual FA and PCA

PCA and FA? If the purpose of a study is to explain correlations among variables and to examine the structure of the data, FA provides a more accurate result. If the purpose is to summarize data with a smaller number of variables, PCA is the choice. PCA can also be used as an initial step in FA because it provides information about the maximum number and nature of the factors. Scree plots (Friday). More on this later…

The Relationship between Variables: Multiple regression describes the relationship between several variables, expressing one variable as a function of several others so that we can predict it from the combination of the other variables. Factor analysis is also a tool for investigating the relationship between several variables: it investigates whether the pattern of correlations between a number of variables can be explained by underlying dimensions, known as 'factors'. From: Laura McAvinue, School of Psychology, Trinity College Dublin

Uses of Factor Analysis: Test/questionnaire construction. For example, you wish to design an anxiety questionnaire: create 50 items that you think measure anxiety, give the questionnaire to a large sample of people, calculate the correlations between the 50 items and run a factor analysis on the correlation matrix. If all 50 items are indeed measuring anxiety, all correlations will be high and there will be one underlying factor, 'anxiety'. Verification of test/questionnaire structure: for the Hospital Anxiety & Depression Scale, we expect two factors, 'anxiety' and 'depression'.

How does it work? Factor analysis analyses the pattern of correlations between the variables in the correlation matrix. Which variables tend to correlate highly together? If variables are highly correlated, they likely represent the same underlying dimension. Factor analysis pinpoints the clusters of high correlations between variables and assigns a factor to each cluster.

Correlation Matrix (Q1-Q6): 1 .987 .801 .765 -.003 -.088 -.051 .044 .213 .968 -.190 -.111 .102 .789 .864. Q1-Q3 correlate strongly with each other and hardly at all with Q4-Q6; Q4-Q6 correlate strongly with each other and hardly at all with Q1-Q3. Two factors!

Factor Analysis: two main things you want to know. How many factors underlie the correlations between the variables? What do these factors represent, i.e. which variables belong to which factors?

Steps of Factor Analysis
1. Suitability of the dataset
2. Choosing the method of extraction
3. Choosing the number of factors to extract
4. Interpreting the factor solution

1. Suitability of Dataset: selection of variables; sample characteristics; statistical considerations.

Selection of Variables: Are the variables meaningful? Factor analysis can be run on any dataset: 'garbage in, garbage out' (Cooper, 2002). Psychometrics is the field of measurement of psychological constructs, and good measurement is crucial in psychology. Indicator approach: measurement is often indirect. We can't measure 'depression' directly, so we infer it from an indicator such as a questionnaire. Based on some theoretical/conceptual framework, what are these variables measuring?

Selection of Variables, Example

How would you group these faces? University of Alberta

Sample Characteristics: Size: at least 100 participants. Participant : variable ratio: estimates vary, with a minimum of 5 : 1 and an ideal of 10 : 1. Characteristics: is the sample representative of the population of interest? Does it contain different subgroups?

Statistical Considerations: Assumptions of factor analysis regarding the data: continuous, normally distributed, linear relationships. These properties affect the correlations between variables. Independence of variables: variables should not be calculated from each other, e.g. Item 4 = Item 1 + Item 2 + Item 3.
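Why the independence rule matters can be seen in a minimal sketch, assuming made-up scores for three hypothetical items: when Item 4 is computed as Item 1 + Item 2 + Item 3, the correlation matrix becomes singular (its determinant is about 0), so anything that needs its inverse or logarithm, such as Bartlett's test or the KMO measure, breaks down. The data, helper names and the 1e-12 singularity tolerance are illustrative assumptions; only the standard library is used.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Full correlation matrix for a list of variables (one list of scores each)."""
    return [[pearson(a, b) for b in columns] for a in columns]

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det

# Hypothetical scores from 8 respondents on three separately written items
item1 = [1, 2, 3, 4, 5, 2, 3, 4]
item2 = [2, 1, 4, 3, 5, 5, 2, 1]
item3 = [3, 3, 1, 5, 2, 4, 4, 2]
# A "computed" item that breaks the independence rule: Item 4 = Item 1 + 2 + 3
item4 = [a + b + c for a, b, c in zip(item1, item2, item3)]

det_independent = determinant(correlation_matrix([item1, item2, item3]))
det_with_sum = determinant(correlation_matrix([item1, item2, item3, item4]))
print(f"det(R), items 1-3:          {det_independent:.3f}")
print(f"det(R), items 1-3 + item 4: {det_with_sum:.1e}")
```

The three independent items give a clearly non-zero determinant (about .90 here); adding the computed item drives it to zero, which is what statistical software typically reports as a singular or "not positive definite" matrix.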

Statistical Considerations: Are there enough significant correlations (> .3) between the variables to merit factor analysis? Bartlett Test of Sphericity: tests H0 that all correlations between the variables = 0 (i.e. that the correlation matrix is an identity matrix). If p < .05, reject H0 and conclude there are significant correlations between the variables, so factor analysis is possible.
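Bartlett's statistic can be computed directly from the correlation matrix R of p variables and n cases with the standard formula χ² = -(n - 1 - (2p + 5)/6) ln|R|, with df = p(p - 1)/2. Below is a minimal standard-library sketch; the 3 x 3 matrix and n = 100 are hypothetical. It returns only the statistic and degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(chi2, df)` gives the p-value.

```python
import math

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity for a p x p correlation matrix R from n cases.
    Returns (chi-square statistic, degrees of freedom)."""
    p = len(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * math.log(determinant(R))
    df = p * (p - 1) // 2
    return chi2, df

# Hypothetical correlation matrix (every r = .5) from n = 100 respondents
R = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
chi2, df = bartlett_sphericity(R, n=100)
print(f"chi2 = {chi2:.2f}, df = {df}")  # chi2 = 67.35, df = 3
```

With df = 3 the .05 critical value is about 7.81, so this hypothetical matrix easily rejects H0. An identity matrix gives ln|R| = 0 and therefore χ² = 0.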

Statistical Considerations: Are there enough significant correlations (> .3) between the variables to merit factor analysis? Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy: quantifies the degree of inter-correlation among the variables. Values range from 0 to 1, with 1 meaning that each variable is perfectly predicted by the others; the closer to 1 the better. If KMO > .6, conclude there are enough correlations in the matrix to merit factor analysis.
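The overall KMO compares the ordinary correlations with the partial correlations (each pair controlling for all the other variables): KMO = Σr²ᵢⱼ / (Σr²ᵢⱼ + Σq²ᵢⱼ) over i ≠ j, where qᵢⱼ = -aᵢⱼ / √(aᵢᵢaⱼⱼ) and A = R⁻¹. A minimal standard-library sketch follows, assuming the same hypothetical equicorrelated matrix as above; it computes only the overall measure, not the per-variable MSA values that software also reports.

```python
import math

def inverse(m):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy (off-diagonal terms only)."""
    A = inverse(R)
    p = len(R)
    r2 = q2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -A[i][j] / math.sqrt(A[i][i] * A[j][j])
                r2 += R[i][j] ** 2
                q2 += partial ** 2
    return r2 / (r2 + q2)

# Hypothetical correlation matrix (every r = .5)
R = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
print(f"KMO = {kmo(R):.3f}")  # KMO = 0.692, i.e. exactly 9/13
```

Here each partial correlation is 1/3, smaller than the raw .5, so the KMO of about .69 clears the > .6 rule of thumb from the slide.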

Statistical Considerations, Example: All variables are continuous, normally distributed, linearly related and independent. Enough correlations? Bartlett Test of Sphericity (χ2; df; p < .05); KMO.

2. Choosing the method of extraction: two methods, factor analysis and principal components analysis, which differ in how they analyse the variance in the correlation matrix.

Sources of variance in a variable: specific variance, the variance unique to the variable itself; error variance, the variance due to measurement error or some random, unknown source; and common variance, the variance that a variable shares with the other variables in the matrix. When searching for the factors underlying the relationships between a set of variables, we are interested in detecting and explaining the common variance.

Principal Components Analysis ignores the distinction between the different sources of variance. It analyses the total variance in the correlation matrix, assuming that the components derived can explain all of the variance. Result: any component extracted will include a certain amount of error and specific variance. Factor Analysis separates specific and error variance from common variance, and attempts to estimate the common variance and identify the factors underlying it. Which to choose? Opinions differ. Theoretically, factor analysis is more sophisticated, but its calculations are more complicated and can lead to impossible results (e.g. communalities greater than 1, known as Heywood cases). Often, both techniques yield similar solutions.
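One concrete way to see the difference: PCA analyses R with 1s on the diagonal (all of each variable's variance), whereas principal axis factoring commonly replaces the diagonal with initial communality estimates, typically squared multiple correlations SMCᵢ = 1 - 1/(R⁻¹)ᵢᵢ, so that only the common variance is analysed. The sketch below builds this "reduced" correlation matrix for a hypothetical 3 x 3 matrix; the function names are illustrative, and a full principal axis factoring would go on to iterate the communality estimates.

```python
def inverse(m):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def reduced_correlation_matrix(R):
    """Replace the unit diagonal of R with squared multiple correlations (SMCs)."""
    A = inverse(R)
    smc = [1 - 1 / A[i][i] for i in range(len(R))]
    reduced = [list(row) for row in R]
    for i, h2 in enumerate(smc):
        reduced[i][i] = h2
    return reduced, smc

# Hypothetical correlation matrix (every r = .5)
R = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
reduced, smc = reduced_correlation_matrix(R)
total_pca = sum(R[i][i] for i in range(len(R)))  # variance analysed by PCA
total_fa = sum(smc)                              # common variance analysed by FA
print(f"PCA analyses {total_pca:.2f} units of variance; FA starts from {total_fa:.2f}")
```

For this matrix each SMC is 1/3, so PCA tries to account for 3 units of variance while the factor model starts from only 1 unit of estimated common variance.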

3. Choosing the number of factors to extract: Statistical modelling. You can create many solutions using different numbers of factors, so this is an important decision. The aim is to determine the smallest number of factors that adequately explain the variance in the matrix. Too few factors: second-order factors. Too many factors: factors that explain little variance and may be meaningless.

Criteria for determining extraction: theory/past experience; latent root criterion; scree test; percentage of variance explained by the factors.

Latent Root Criterion (Kaiser-Guttman): An eigenvalue expresses the amount of variance in the matrix that is explained by a factor. Factors with eigenvalues > 1 are extracted. Limitations: the criterion is sensitive to the number of variables in the matrix; with more variables, eigenvalues are inflated, leading to overestimation of the number of underlying factors.
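The latent root criterion can be sketched end to end: compute the eigenvalues of the correlation matrix and count those above 1. The matrix here is hypothetical (two clusters of three variables, r = .6 within a cluster and r = .1 between clusters), mimicking the two-cluster pattern of the Q1-Q6 example rather than its actual values. To stay dependency-free, the eigenvalues come from a cyclic Jacobi rotation routine written for this illustration; with NumPy available, `numpy.linalg.eigvalsh` would do the same job.

```python
import math

def jacobi_eigenvalues(m, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [list(row) for row in m]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # columns p and q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def two_cluster_matrix(within=0.6, between=0.1, size=3):
    """Hypothetical correlation matrix with two blocks of `size` variables."""
    n = 2 * size
    return [[1.0 if i == j else (within if (i < size) == (j < size) else between)
             for j in range(n)] for i in range(n)]

R = two_cluster_matrix()
eigenvalues = jacobi_eigenvalues(R)
n_factors = sum(ev > 1 for ev in eigenvalues)
print([round(ev, 3) for ev in eigenvalues])  # [2.5, 1.9, 0.4, 0.4, 0.4, 0.4]
print("Kaiser criterion keeps", n_factors, "factors")
```

The eigenvalues sum to 6, the number of variables, and exactly two exceed 1, matching the two clusters. Plotting the same sorted eigenvalues against their rank gives the scree plot described next.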

Scree Test (Cattell, 1966): Scree plot: plot the eigenvalues of the factors; the test is based on their relative values. Cut-off point: the last component before the slope of the line becomes flat (before the scree).

Elbow in the graph: take the components above the elbow.

Percentage of Variance: the percentage of variance explained by the factors. Convention: the components should explain at least 60% of the variance in the matrix (Hair et al., 1995).

3. Choosing the number of factors to extract, example: three components had eigenvalues > 1, and together they explained 67.26% of the variance.
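Because every standardized variable contributes 1 unit of variance, a correlation matrix of 9 variables holds 9 units in total, and each component's share is its eigenvalue divided by 9. The sketch below uses the eigenvalues reported for the visual imagery example (3.36, 1.677, 1.018); the 60% check applies the convention quoted above.

```python
# Eigenvalues of the three retained components in the visual imagery example
eigenvalues = [3.36, 1.677, 1.018]
n_variables = 9  # total variance of a correlation matrix = number of variables

cumulative = 0.0
for k, ev in enumerate(eigenvalues, start=1):
    pct = 100 * ev / n_variables
    cumulative += pct
    print(f"Component {k}: {pct:.1f}% of variance (cumulative {cumulative:.1f}%)")

# Rule of thumb (Hair et al., 1995): retain enough components for at least 60%
meets_convention = cumulative >= 60
print("Meets the 60% convention:", meets_convention)
```

This prints 37.3%, 18.6% and 11.3% (cumulative 67.3%). The slight gap from the reported 67.26% presumably comes from the eigenvalues being rounded before they were reported.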

BFI data in psych (R)

4. Interpreting the Factor Solution: The factor matrix shows the loading of each variable on each of the factors you extracted. Loadings are the correlations between the variables and the factors, and they allow you to interpret the factors. The sign indicates whether the variable has a positive or negative correlation with the factor; the size of the loading indicates whether the variable makes a significant contribution to the factor (≥ .3).
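Applying the ≥ .3 rule is mechanical, so it can be sketched in a few lines. The loadings are the unrotated visual imagery loadings from the example; the function name and output format are illustrative choices. For each variable it lists the components on which the absolute loading reaches the cutoff, together with the sign.

```python
# Unrotated loadings on components 1-3 from the visual imagery example
LOADINGS = {
    "Vividness Qu":          (-.198, -.805,  .061),
    "Control Qu":            ( .173,  .751,  .306),
    "Preference Qu":         ( .353,  .577, -.549),
    "Generate Test":         (-.444,  .251,  .543),
    "Inspect Test":          (-.773,  .051, -.051),
    "Maintain Test":         ( .734, -.003,  .384),
    "Transform (P&P) Test":  ( .759, -.155,  .188),
    "Transform (Comp) Test": (-.792,  .179,  .304),
    "Visual STM Test":       ( .792, -.102,  .215),
}

def salient_loadings(loadings, cutoff=0.3):
    """Map each variable to (component number, sign) pairs where |loading| >= cutoff."""
    result = {}
    for var, row in loadings.items():
        result[var] = [(k, "+" if value > 0 else "-")
                       for k, value in enumerate(row, start=1) if abs(value) >= cutoff]
    return result

for var, hits in salient_loadings(LOADINGS).items():
    print(f"{var:22s} {hits}")
```

Several variables (e.g. Preference Qu, which passes the cutoff on all three components) load on more than one component in this unrotated solution. These cross-loadings are what rotation, discussed below, tries to reduce.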

Unrotated component loadings (visual imagery example)
Variables                 Component 1  Component 2  Component 3
Vividness Qu                    -.198        -.805         .061
Control Qu                       .173         .751         .306
Preference Qu                    .353         .577        -.549
Generate Test                   -.444         .251         .543
Inspect Test                    -.773         .051        -.051
Maintain Test                    .734        -.003         .384
Transform (P&P) Test             .759        -.155         .188
Transform (Comp) Test           -.792         .179         .304
Visual STM Test                  .792        -.102         .215
Component 1: visual imagery tests. Component 2: visual imagery questionnaires. Component 3: ?

Factor Matrix: use it to interpret the factors and to compute (a) the communality of each variable, i.e. the percentage of variance in that variable that can be explained by the factors, and (b) the eigenvalue of each factor, which tells us the percentage of variance in the correlation matrix that the factor explains.

Variables                 Component 1  Component 2  Component 3  Communality
Vividness Qu                    -.198        -.805         .061          69%
Control Qu                       .173         .751         .306          69%
Preference Qu                    .353         .577        -.549          76%
Generate Test                   -.444         .251         .543          55%
Inspect Test                    -.773         .051        -.051          60%
Maintain Test                    .734        -.003         .384          69%
Transform (P&P) Test             .759        -.155         .188          64%
Transform (Comp) Test           -.792         .179         .304          75%
Visual STM Test                  .792        -.102         .215          68%
Eigenvalue                       3.36        1.677        1.018
% Variance                      37.3%        18.6%        11.3%
Communality of Variable 1 (Vividness Qu): $h^2 = (-.198)^2 + (-.805)^2 + (.061)^2 = .69$, i.e. 69%.
Eigenvalue of Component 1: $\lambda_1 = (-.198)^2 + (.173)^2 + (.353)^2 + (-.444)^2 + (-.773)^2 + (.734)^2 + (.759)^2 + (-.792)^2 + (.792)^2 = 3.36$, and $3.36 / 9 = 37.3\%$ of the variance.
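Both quantities are sums of squared loadings: communalities sum across a variable's row, eigenvalues sum down a component's column. The sketch below reproduces them from the table's loadings (names are illustrative). With unrounded loadings the eigenvalues come out as about 3.358, 1.677 and 1.018.

```python
# Unrotated loadings on components 1-3 from the visual imagery example
LOADINGS = {
    "Vividness Qu":          (-.198, -.805,  .061),
    "Control Qu":            ( .173,  .751,  .306),
    "Preference Qu":         ( .353,  .577, -.549),
    "Generate Test":         (-.444,  .251,  .543),
    "Inspect Test":          (-.773,  .051, -.051),
    "Maintain Test":         ( .734, -.003,  .384),
    "Transform (P&P) Test":  ( .759, -.155,  .188),
    "Transform (Comp) Test": (-.792,  .179,  .304),
    "Visual STM Test":       ( .792, -.102,  .215),
}

def communalities(loadings):
    """Row sums of squared loadings: variance in each variable explained by the components."""
    return {var: sum(value * value for value in row) for var, row in loadings.items()}

def eigenvalues_from_loadings(loadings):
    """Column sums of squared loadings: variance explained by each component."""
    rows = list(loadings.values())
    return [sum(row[k] ** 2 for row in rows) for k in range(len(rows[0]))]

h2 = communalities(LOADINGS)
eig = eigenvalues_from_loadings(LOADINGS)
n_vars = len(LOADINGS)
for var, value in h2.items():
    print(f"{var:22s} h2 = {value:.2f}")
for k, ev in enumerate(eig, start=1):
    print(f"Component {k}: eigenvalue {ev:.3f} ({100 * ev / n_vars:.1f}% of variance)")
```

A useful check: the communalities and the eigenvalues are two ways of summing the same squared loadings, so their totals agree (about 6.05 here).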

Factor Matrix: Unrotated vs. Rotated Solution. Unrotated (initial) solution: can be difficult to interpret, because the factor axes are aligned arbitrarily with the variables. Rotated solution: easier to interpret, because it achieves simple structure, maximising the number of high and low loadings on each factor.

Factor Analysis through Geometry: Correlation matrices can be represented geometrically. Variables are represented by straight lines of equal length, all starting from the same point. Highly correlated variables are positioned close together; weakly correlated variables are further apart. Correlation = cosine of the angle between the lines.

V1 & V3: 90º angle, cosine = 0, no relationship. V1 & V2: 30º angle, cosine ≈ .87. V2 & V3: 60º angle, cosine = .5, r = .5. The smaller the angle, the bigger the cosine and the bigger the correlation.
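The angle-correlation rule is a one-line conversion in each direction. The sketch below checks the three angles in the example. The same function also gives a factor loading from the angle between a factor line and a variable line, as described next. The function names are illustrative.

```python
import math

def correlation_from_angle(degrees):
    """Correlation between two unit-length variable vectors separated by `degrees`."""
    return math.cos(math.radians(degrees))

def angle_from_correlation(r):
    """Angle in degrees between two variable vectors with correlation r."""
    return math.degrees(math.acos(r))

for pair, angle in [("V1 & V3", 90), ("V1 & V2", 30), ("V2 & V3", 60)]:
    print(f"{pair}: {angle} degrees -> r = {correlation_from_angle(angle):.3f}")
```

This prints r = 0.000, 0.866 and 0.500. Going the other way, a correlation of .5 corresponds to a 60º angle.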

Factor Analysis fits a factor to each cluster of variables, passing a factor line through each group of variables (in the figure, F2 passes through the cluster V4, V5, V6). Factor loading: the cosine of the angle between each factor and the variable.

Two Methods of Fitting Factors (figure: factors F1 and F2 fitted to variables V1-V6). Orthogonal solution: the factors are at right angles, i.e. uncorrelated. Oblique solution: the factors are not at right angles, i.e. correlated.

Two Step Process (figure: factors F1 and F2 fitted to variables V1-V6). First, the factors are fitted arbitrarily; then they are rotated to fit the clusters of variables better.

For example… (C1-C3 = Components 1-3)
Unrotated Solution
Variables                     C1      C2      C3
Vividness Qu               -.198   -.805    .061
Control Qu                  .173    .751    .306
Preference Qu               .353    .577   -.549
Generate Test              -.444    .251    .543
Inspect Test               -.773    .051   -.051
Maintain Test               .734   -.003    .384
Transform (P&P) Test        .759   -.155    .188
Transform (Comp) Test      -.792    .179    .304
Visual STM Test             .792   -.102    .215
Solution following Orthogonal Rotation
Variables                     C1      C2      C3
Vividness Qu               -.029   -.831    .008
Control Qu                  .174    .744    .323
Preference Qu              -.010    .679   -.547
Generate Test              -.197    .112    .709
Inspect Test               -.717   -.103    .279
Maintain Test               .819    .116    .043
Transform (P&P) Test        .779   -.013   -.166
Transform (Comp) Test      -.599    -.01    .626
Visual STM Test             .813    .045   -.147

Factor Rotation: changes the position of the factors so that the solution is easier to interpret. It achieves simple structure: a factor matrix in which variables have either high or low loadings on the factors, rather than lots of moderate loadings.
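Varimax is a widely used orthogonal rotation. It maximises the variance of the squared loadings within each factor, which pushes loadings towards high or low. Below is a minimal sketch of raw varimax (no Kaiser row normalisation), rotating one pair of factors at a time by the closed-form angle that maximises the criterion for that pair. Because the settings behind the slide's rotated table are not stated, the numbers from this sketch are not expected to match it exactly. What any orthogonal rotation must do is keep every communality unchanged while increasing the criterion.

```python
import math

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings (raw varimax criterion)."""
    p = len(L)
    total = 0.0
    for k in range(len(L[0])):
        sq = [row[k] ** 2 for row in L]
        total += sum(s * s for s in sq) / p - (sum(sq) / p) ** 2
    return total

def _rotate_pair(L, i, j):
    """Rotate factors i and j in place by the angle that maximises the pair's criterion."""
    p = len(L)
    A = B = C = D = 0.0
    for row in L:
        x, y = row[i], row[j]
        u, v = x * x - y * y, 2 * x * y
        A += u
        B += v
        C += u * u - v * v
        D += 2 * u * v
    phi = math.atan2(D - 2 * A * B / p, C - (A * A - B * B) / p) / 4
    c, s = math.cos(phi), math.sin(phi)
    for row in L:
        x, y = row[i], row[j]
        row[i], row[j] = x * c + y * s, -x * s + y * c
    return abs(phi)

def varimax(loadings, tol=1e-10, max_cycles=200):
    """Raw varimax by cycling through all pairs of factors until the angles vanish."""
    L = [list(row) for row in loadings]
    k = len(L[0])
    for _ in range(max_cycles):
        largest = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                largest = max(largest, _rotate_pair(L, i, j))
        if largest < tol:
            break
    return L

# Unrotated loadings from the visual imagery example
UNROTATED = [
    [-.198, -.805,  .061], [ .173,  .751,  .306], [ .353,  .577, -.549],
    [-.444,  .251,  .543], [-.773,  .051, -.051], [ .734, -.003,  .384],
    [ .759, -.155,  .188], [-.792,  .179,  .304], [ .792, -.102,  .215],
]
rotated = varimax(UNROTATED)
print(f"varimax criterion: {varimax_criterion(UNROTATED):.4f} -> {varimax_criterion(rotated):.4f}")
for before, after in zip(UNROTATED, rotated):
    h_before = sum(x * x for x in before)
    h_after = sum(x * x for x in after)
    print([f"{x:+.3f}" for x in after], f"h2 {h_before:.3f} -> {h_after:.3f}")
```

Rotation only redistributes each variable's communality across the factors, which is why the h2 column is unchanged. Software such as SPSS usually applies Kaiser normalisation by default, scaling each row by its communality before rotating, which can shift the results somewhat.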

Evaluating your Factor Solution: Is the solution interpretable? Should you re-run the analysis and extract more or fewer factors? What percentage of variance is explained by the factors: more than 60%? Are all variables represented by the factors? If the communality of a variable is very low, that variable is probably not related to the others; re-run the analysis excluding it.
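Two of these checks, total variance explained and low communalities, can be automated. The sketch applies them to the two-component ("second") solution in the example. The .3 communality cutoff is an assumed stand-in for "very low", which the slide does not quantify, and 60% is the convention cited earlier.

```python
# Loadings from the two-component ("second") solution in the example
SECOND_SOLUTION = {
    "Vividness Qu":          ( .013, -.829),
    "Control Qu":            (-.023,  .770),
    "Preference Qu":         ( .195,  .648),
    "Generate Test":         (-.493,  .130),
    "Inspect Test":          (-.760, -.146),
    "Maintain Test":         ( .711,  .183),
    "Transform (P&P) Test":  ( .773,  .042),
    "Transform (Comp) Test": (-.811, -.028),
    "Visual STM Test":       ( .792,  .103),
}

def evaluate_solution(loadings, min_communality=0.3, min_total_pct=60.0):
    """Return (% of total variance explained, meets threshold?, low-communality variables)."""
    h2 = {var: sum(value * value for value in row) for var, row in loadings.items()}
    total_pct = 100 * sum(h2.values()) / len(loadings)
    low = [var for var, value in h2.items() if value < min_communality]
    return total_pct, total_pct >= min_total_pct, low

total_pct, meets_60, low_vars = evaluate_solution(SECOND_SOLUTION)
print(f"{total_pct:.1f}% of variance explained; >= 60%? {meets_60}")
print("Low-communality variables:", low_vars)
```

For these loadings the two components account for about 55.9% of the variance. Generate Test (communality about .26) is the only variable below the assumed cutoff. The trade-off against the three-component solution (about 67%) is the kind of judgement this slide asks for.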

For example…
First Solution (three components; Component 3 = ?)
Variables                     C1      C2      C3
Vividness Qu               -.029   -.831    .008
Control Qu                  .174    .744    .323
Preference Qu              -.010    .679   -.547
Generate Test              -.197    .112    .709
Inspect Test               -.717   -.103    .279
Maintain Test               .819    .116    .043
Transform (P&P) Test        .779   -.013   -.166
Transform (Comp) Test      -.599    -.01    .626
Visual STM Test             .813    .045   -.147
Second Solution (two components)
Variables               Component 1  Component 2
Vividness Qu                   .013        -.829
Control Qu                    -.023         .770
Preference Qu                  .195         .648
Generate Test                 -.493         .130
Inspect Test                  -.760        -.146
Maintain Test                  .711         .183
Transform (P&P) Test           .773         .042
Transform (Comp) Test         -.811        -.028
Visual STM Test                .792         .103
C1 = Efficiency of objective visual imagery. C2 = Self-reported imagery efficacy.

Cumulative percent of variance explained. We are looking for an eigenvalue above 1.0.

Expensive, Exciting, Luxury, Distinctive, Not Conservative, Not Family, Not Basic, Appeals to Others, Attractive Looking, Trend Setting, Reliable, Latest Features, Trust

What shall these components be called?

EXCLUSIVE: Expensive, Exciting, Luxury, Distinctive, Not Conservative, Not Family, Not Basic. TRENDY: Appeals to Others, Attractive Looking, Trend Setting. RELIABLE: Reliable, Latest Features, Trust.

Calculate Component Scores
EXCLUSIVE = (Expensive + Exciting + Luxury + Distinctive - Conservative - Family - Basic) / 7
TRENDY = (Appeals to Others + Attractive Looking + Trend Setting) / 3
RELIABLE = (Reliable + Latest Features + Trust) / 3
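These are unit-weighted scores: each item counts equally, and the items that load negatively (Conservative, Family, Basic) are subtracted. Below is a minimal sketch of the three formulas applied to one respondent. The ratings, and the rating scale they imply, are hypothetical.

```python
def component_scores(r):
    """Unit-weighted component scores as defined above; negatively loading items are subtracted."""
    return {
        "EXCLUSIVE": (r["Expensive"] + r["Exciting"] + r["Luxury"] + r["Distinctive"]
                      - r["Conservative"] - r["Family"] - r["Basic"]) / 7,
        "TRENDY": (r["Appeals to Others"] + r["Attractive Looking"] + r["Trend Setting"]) / 3,
        "RELIABLE": (r["Reliable"] + r["Latest Features"] + r["Trust"]) / 3,
    }

# Hypothetical ratings from one respondent
ratings = {
    "Expensive": 6, "Exciting": 5, "Luxury": 6, "Distinctive": 5,
    "Conservative": 2, "Family": 3, "Basic": 2,
    "Appeals to Others": 5, "Attractive Looking": 6, "Trend Setting": 4,
    "Reliable": 4, "Latest Features": 5, "Trust": 6,
}
scores = component_scores(ratings)
print({name: round(value, 2) for name, value in scores.items()})
# {'EXCLUSIVE': 2.14, 'TRENDY': 5.0, 'RELIABLE': 5.0}
```

Subtracting items keeps the formula identical to the slide. Note that EXCLUSIVE therefore lives on a different range from the other two scores, so compare respondents on a score rather than comparing scores with each other.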

Not much differing on this dimension.

References (some…)
Cooper, C. (1998). Individual differences. London: Arnold.
Kline, P. (1994). An easy guide to factor analysis. London: Routledge.

Assignments: Assignment 6: term project, due at the end of week ~13. Assignment 7: after the break.