EXPLORATORY FACTOR ANALYSIS (EFA)


EXPLORATORY FACTOR ANALYSIS (EFA) Kalle Lyytinen & James Gaskin

Learning Objectives
Understand what the factor analysis technique is and how it is applied in research.
Discuss exploratory factor analysis (EFA).
Run EFA with SPSS and interpret the resulting output.
Briefly estimate reliability.
Briefly assess construct validity.

The Whole Works: Analyzing the Factor Structure of Multi-Item Data
[Flowchart of the full research process. Stages shown: theory; constructs; items linked to constructs; collect data; data cleaning filter; EFA (link items to constructs; label constructs); conduct CFA without and with common method bias (CMB); conduct multi-group CFA; goodness-of-fit and psychometric properties filter; modify the measurement model; build/run the structural model; structural model goodness-of-fit filter; test structural hypotheses; contribute to theory.]

Family Tree of SEM (source: PIRE)
[Diagram: a family tree of techniques, moving from questions about single relationships (Is the difference between samples on a variable significant? Is the correlation between different variables significant?) to analyses involving multiple samples, multiple variables, data over time, an overall model, and a measurement model.]

Scope of Factor Analysis Today
Factor analysis and principal component analysis
Carrying out the analyses in SPSS
Deciding on the number of factors
Rotating factors
Producing factor and component scores
Assumptions and sample size
Exploratory and confirmatory FA

Types of Measurement Models
Exploratory (EFA)
Confirmatory (CFA)
Multitrait-Multimethod (MTMM)
Hierarchical CFA

EFA vs. CFA
Exploratory factor analysis is concerned with how many factors are necessary to explain the relations among a set of indicators and with estimating the factor loadings. It is associated with theory development.
Confirmatory factor analysis is concerned with determining whether the number of factors, and the loadings of the indicators on them, conform to what is expected on the basis of pre-established theory: do the items load as predicted on the expected number of factors? The number of factors is hypothesized beforehand.

End-User Computing Satisfaction (EUCS)
EUCS is an instrument for measuring satisfaction with an information system.
Content: Does the system provide the precise information you need? Does the information content meet your needs? Does the system provide reports that seem to be just about exactly what you need? Does the system provide sufficient information?
Accuracy: Is the system accurate? Are you satisfied with the accuracy of the system?
Format: Do you think the output is presented in a useful format? Is the information clear?
Ease of use: Is the system user friendly? Is the system easy to use?
Timeliness: Do you get the information you need in time? Does the system provide up-to-date information?

Factor Analysis
Factor analysis is a method for identifying a structure (factors, or dimensions) that underlies the relations among a set of observed variables. It transforms the correlations among the observed variables into a smaller number of underlying factors that capture the essential information about the linear interrelationships among the original scores. In short, it is a statistical procedure that models the relationship between observed variables (measurements) and the underlying latent factors.

Factor Analysis
Factor analysis is a statistical technique for dealing with multiple variables and a fundamental component of structural equation modeling. It explores the interrelationships among variables to discover whether they can be grouped into a smaller set of underlying factors. Many variables are “reduced” (grouped) into a smaller number of factors, and the observed variables are taken to reflect the causal impact of these “latent” underlying factors.

Applications of Factor Analysis
Exploring data for patterns. A researcher is often unclear whether items or variables have a discernible pattern. Factor analysis can be done in an exploratory fashion to reveal patterns among the interrelationships of the items.
Data reduction. Factor analysis can reduce a large number of variables to a smaller, more manageable, and parsimonious set of factors that account for the underlying variance (causal impact) in the measured phenomenon. It can also create factor scores for each subject that represent these higher-order variables.
Confirming a hypothesized factor structure. Factor analysis can test whether a set of items designed to measure certain variables does, in fact, reveal the hypothesized factor structure (i.e., whether the underlying latent factor truly “causes” the variance in the observed variables, and how “certain” we can be about it). When a researcher wishes to validate a scale with a given or hypothesized factor structure, confirmatory factor analysis is used.
Theory testing. Factor analysis can test a priori hypotheses about the relations among a set of observed variables.

How would you group these Items?

Exploratory Factor Analysis
In EFA, the researcher explores the relationships among items to determine whether they can be grouped into a smaller number of underlying factors. In this analysis, all items are assumed to be related to all factors.
[Diagram: two latent factors (Factor 1 and Factor 2) with arrows pointing to four observed items, V1 to V4; each item has its own error term, ε.]

Factorial Solution
[Figure: a factor solution annotated to point out a factor loading, an item, and a cross-loading.]

Exploratory Factor Analysis
Measured variables, or indicators: these are the variables the researcher has observed or measured. In this example, they are the four items on the scale. They are drawn as rectangles or squares.

Exploratory Factor Analysis
Unmeasured, or latent, variables: these variables cannot be measured directly; the researcher has only indicators of them. They are often the more interesting variables but also the more difficult ones to measure (e.g., self-efficacy). In this example, the latent variables are the two factors. They are drawn as ellipses.

Exploratory Factor Analysis
Factor loadings measure the relationship between the items and the factors. In an orthogonal solution, they can be interpreted like correlation coefficients, ranging between -1.0 and +1.0. The closer the value is to 1.0 in absolute terms, the stronger the relationship between the factor and the item. Loadings can be either positive or negative.

Exploratory Factor Analysis
Factor loadings: note the direction of the arrows. The factors are thought to influence the indicators, not vice versa; each item is predicted by the factors.

Exploratory Factor Analysis
Errors in measurement: each indicator variable has some error in measurement, shown by the small circles labeled ε. The error is made up of influences that are unknown or not measured directly. These error terms are tied to the reliability of each indicator: the smaller an indicator's error variance, the more reliable it is.
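Loadings and error terms combine into a simple rule for the correlation matrix that the model predicts. The sketch below makes these assumptions: standardized variables, two uncorrelated factors, and hypothetical loadings chosen for illustration (they do not come from the slides). Each item's communality is the sum of its squared loadings, and its error (unique) variance is 1 minus that communality. The implied correlation matrix is then Λ Λᵀ + Ψ, where Ψ is the diagonal matrix of error variances.

```python
import numpy as np

# Hypothetical loadings for items V1..V4 on two uncorrelated factors
# (rows = items, columns = Factor 1, Factor 2); illustrative values only.
LOADINGS = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.6],
    [0.0, 0.5],
])


def implied_correlations(loadings):
    """Return (communalities, uniquenesses, implied R) for an orthogonal factor model."""
    communalities = (loadings ** 2).sum(axis=1)   # variance explained by the factors
    uniquenesses = 1.0 - communalities            # error variance of each standardized item
    # Common part (Lambda Lambda^T) plus a diagonal of error variances (Psi).
    implied = loadings @ loadings.T + np.diag(uniquenesses)
    return communalities, uniquenesses, implied


h2, psi, r_implied = implied_correlations(LOADINGS)
print(np.round(r_implied, 2))
```

Items that share a factor are predicted to correlate (V1 and V2: 0.8 × 0.7 = 0.56). Items on different uncorrelated factors are predicted not to correlate at all.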

Multi-Indicator Approach
A multiple-indicator approach reduces the overall effect of measurement error in any individual observed variable on the accuracy of the results. A distinction is made between observed variables (indicators) and underlying latent variables, or factors (constructs). Together, the observed variables and the latent variables make up the measurement model.

Conceptual Model
This model holds that two uncorrelated factors explain the relationships among six emotion variables.
[Diagram: the observed variables Awe, Joy, and Happiness are linked to the latent factor Positive Affect; the observed variables Guilt, Fear, and Sadness are linked to the latent factor Negative Affect.]

Measurement Model
Positive Affect (Factor 1): Joy, Awe, Happiness
Negative Affect: Fear, Guilt, Sadness
Each item is connected to its factor by a loading*.
*The loading is a data-driven parameter that estimates the relationship (correlation) between an observed item and a latent factor.

Assumptions of Factor Analysis
The data matrix must have a sufficient number of correlations. Variables must be interrelated in some way, since factor analysis seeks the underlying common dimensions among them; if the variables are not related, each variable will be its own factor. Rule of thumb: a substantial number of correlations greater than .30.
Metric variables are assumed, although dummy variables (coded 0, 1) may be used.
The factors, or unobserved variables, are assumed to be independent of one another (an assumption of orthogonal solutions; oblique rotation relaxes it).
All variables in a factor analysis must be measured on at least an ordinal scale. Nominal data are not appropriate for factor analysis.
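The rule of thumb about correlations above .30 can be checked directly on a correlation matrix. The sketch below is minimal and uses a hypothetical 4-variable matrix. The slide does not say how many correlations count as "a substantial number", so the function reports only the share of variable pairs above the threshold and leaves that judgment to the analyst.

```python
import numpy as np


def share_of_substantial_correlations(corr, threshold=0.30):
    """Fraction of distinct variable pairs whose |r| exceeds `threshold`."""
    corr = np.asarray(corr, dtype=float)
    upper = np.triu_indices_from(corr, k=1)  # each pair once, diagonal excluded
    off_diag = np.abs(corr[upper])
    return float((off_diag > threshold).mean())


# Hypothetical 4-variable correlation matrix for illustration.
R = np.array([
    [1.00, 0.45, 0.38, 0.10],
    [0.45, 1.00, 0.52, 0.05],
    [0.38, 0.52, 1.00, 0.12],
    [0.10, 0.05, 0.12, 1.00],
])
print(share_of_substantial_correlations(R))
```

Here half of the pairs exceed .30. The fourth variable correlates weakly with everything else, so it is the kind of variable that tends to end up as "its own factor".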

Quick Quips about Factor Analysis
How many cases? Rule of 10: 10 cases for every item. Rule of 100: the number of respondents should be the larger of (1) 5 times the number of variables or (2) 100.
How many variables do I need for FA? The more the better (at least 3).
Is normality of the data required? Nope (although maximum likelihood extraction does assume multivariate normality).
Is it necessary to standardize variables before FA? Nope (the analysis is usually run on the correlation matrix, which is already standardized).
Can you pool data from two samples in one FA? Yep, but you must show that they have the same factor structure.
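The two sample-size rules of thumb are simple arithmetic. This sketch applies them to a questionnaire with `n_items` items; the function name is hypothetical. The usage line takes the 12-item EUCS instrument as its example.

```python
def minimum_sample_sizes(n_items):
    """Minimum number of cases under the two rules of thumb on the slide."""
    if n_items < 1:
        raise ValueError("n_items must be positive")
    rule_of_10 = 10 * n_items              # 10 cases for every item
    rule_of_100 = max(5 * n_items, 100)    # larger of 5 x number of variables, or 100
    return {"rule_of_10": rule_of_10, "rule_of_100": rule_of_100}


print(minimum_sample_sizes(12))  # the 12 EUCS items
```

The rules can disagree (120 versus 100 cases for 12 items). A cautious reading is to plan for the larger figure.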

Tests for Basic Assumptions
Two statistics in the SPSS output let you check some of the basic assumptions: the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy and Bartlett's Test of Sphericity.
The KMO Measure of Sampling Adequacy indicates whether the variables can be grouped into a smaller set of underlying factors; that is, will the data factor well? KMO varies from 0 to 1 and should be .60 or higher to proceed (.50 can be used as a more lenient cut-off). High values (close to 1.0) generally indicate that a factor analysis may be useful with your data. If the value is less than .50, the results of the factor analysis probably won't be very useful.

Kaiser-Meyer-Olkin (KMO)
.90s: Marvelous
.80s: Meritorious
.70s: Middling
.60s: Mediocre
.50s: Miserable
Below .50: Unacceptable
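The slides do not give the KMO formula. The sketch below assumes the standard overall KMO: KMO = Σ r²ᵢⱼ / (Σ r²ᵢⱼ + Σ p²ᵢⱼ), summed over all off-diagonal pairs. Here pᵢⱼ are the partial correlations obtained from the inverse of the correlation matrix. It then attaches Kaiser's labels from the table above. The input is a hypothetical 3-variable matrix with every correlation equal to .5.

```python
import numpy as np

# Kaiser's verbal labels from the slide, checked from the top down.
KMO_LABELS = [
    (0.90, "Marvelous"),
    (0.80, "Meritorious"),
    (0.70, "Middling"),
    (0.60, "Mediocre"),
    (0.50, "Miserable"),
]


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.asarray(corr, dtype=float)
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)              # partial correlations (anti-image)
    off = ~np.eye(corr.shape[0], dtype=bool)     # mask selecting off-diagonal cells
    r2 = np.sum(corr[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return float(r2 / (r2 + p2))


def kmo_label(value):
    """Map a KMO value to Kaiser's verbal category."""
    for cutoff, label in KMO_LABELS:
        if value >= cutoff:
            return label
    return "Unacceptable"


R = np.full((3, 3), 0.5)
np.fill_diagonal(R, 1.0)
value = kmo(R)
print(round(value, 3), kmo_label(value))
```

KMO is high when the partial correlations are small relative to the raw correlations. That happens when the variables share common variance rather than only pairwise links.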

KMO Statistics: Interpreting the Output
In this example, the data support the use of factor analysis and suggest that the data may be grouped into a smaller set of underlying factors. What does Bartlett’s Test of Sphericity explore?

Correlation Matrix
Bartlett's Test of Sphericity tests the hypothesis that the correlation matrix is an identity matrix: ones on the diagonal and zeros off the diagonal. A significant result indicates that the matrix is not an identity matrix.

Bartlett’s Test of Sphericity
Bartlett’s Test of Sphericity compares your correlation matrix to an identity matrix, which has 1.0 on the principal diagonal and zeros for all other correlations. You therefore want the Bartlett value to be significant: a factor analysis is appropriate only if there are relationships among your variables. A problem with Bartlett’s test is that with large n, even small correlations tend to be statistically significant, so the test may not mean much.
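Bartlett's statistic has a standard closed form: χ² = −(n − 1 − (2p + 5)/6) · ln|R|, with p(p − 1)/2 degrees of freedom, where n is the number of cases and p the number of variables. The sketch below computes it with NumPy. The chi-square p-value helper is a hand-rolled assumption that avoids a SciPy dependency; `scipy.stats.chi2.sf` would serve the same purpose. The example matrix is hypothetical.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df, without SciPy.

    Uses closed forms of the regularized upper incomplete gamma Q(df/2, x/2).
    """
    if x <= 0:
        return 1.0
    s, half_x = df / 2.0, x / 2.0
    if df % 2 == 0:
        # Integer shape: Q(s, y) = exp(-y) * sum_{k < s} y^k / k!
        return float(sum(
            math.exp(-half_x + k * math.log(half_x) - math.lgamma(k + 1))
            for k in range(int(s))
        ))
    # Half-integer shape: Q(m + 1/2, y) = erfc(sqrt(y)) + sum_{k=1..m} y^(k-1/2) e^(-y) / Gamma(k+1/2)
    m = int(s - 0.5)
    tail = math.erfc(math.sqrt(half_x))
    tail += sum(
        math.exp(-half_x + (k - 0.5) * math.log(half_x) - math.lgamma(k + 0.5))
        for k in range(1, m + 1)
    )
    return float(tail)


def bartlett_sphericity(corr, n_obs):
    """Bartlett's test that the population correlation matrix is an identity matrix."""
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]
    _, logdet = np.linalg.slogdet(corr)          # log|R|, numerically stable
    statistic = -(n_obs - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return statistic, df, chi2_sf(statistic, df)


R = np.full((3, 3), 0.5)
np.fill_diagonal(R, 1.0)
stat, df, p_value = bartlett_sphericity(R, n_obs=100)
print(round(stat, 2), df, p_value)
```

The sample size n enters the statistic as a multiplier. This is why, as the slide warns, even small correlations reach significance once n is large.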

Two Extraction Methods
Principal Component Analysis (PCA): considers all of the available variance (common + unique) by placing 1s on the diagonal of the correlation matrix. It seeks a linear combination of variables that extracts the maximum variance, then repeats this step on the remaining variance. Use it when the concern is prediction and parsimony and you know that specific and error variance are small. It results in orthogonal (uncorrelated) components.
Principal Axis Factoring (PAF), or common factor analysis: considers only common variance by placing communality estimates on the diagonal of the correlation matrix. It seeks the least number of factors that can account for the common variance (correlation) of a set of variables. PAF analyzes only common factor variability, removing the unique or unexplained variability from the model. PAF is preferred in SEM because it accounts for covariation, whereas PCA accounts for total variance.

Methods of Factor Extraction
Principal axis factoring (PAF): the diagonal of the correlation matrix is replaced by estimates of the communalities, and the process iterates until the communalities change negligibly.
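The sketch below contrasts the two extraction methods with NumPy. PCA takes eigenvectors of R with 1s on the diagonal. PAF repeatedly replaces the diagonal with current communality estimates until they stop changing. Squared multiple correlations are used as starting communalities; this is a common choice, but an assumption here, since the slides do not name an initial estimate. The sketch does not handle Heywood cases, so it is not a production implementation. The test matrix is built from a hypothetical one-factor model with loadings .8, .7, .6, and .5.

```python
import numpy as np


def pca_loadings(corr, n_factors):
    """Principal components: analyze R with 1s on the diagonal (total variance)."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]   # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def paf_loadings(corr, n_factors, tol=1e-10, max_iter=1000):
    """Iterated principal axis factoring on the reduced correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    # Initial communalities: squared multiple correlations, 1 - 1/diag(R^-1).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    for _ in range(max_iter):
        reduced = corr.copy()
        np.fill_diagonal(reduced, h2)               # common variance only on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)
        order = np.argsort(eigvals)[::-1][:n_factors]
        loadings = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)
        if np.max(np.abs(new_h2 - h2)) < tol:       # negligible change in communalities
            return loadings, new_h2
        h2 = new_h2
    return loadings, h2


lam = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)
paf, h2 = paf_loadings(R, n_factors=1)
pca = pca_loadings(R, n_factors=1)
print(np.round(np.abs(paf[:, 0]), 3))   # close to the generating loadings
print(np.round(np.abs(pca[:, 0]), 3))   # PCA loadings come out larger here
```

Loading signs are arbitrary in both methods, so the comparison uses absolute values. The PCA loadings are inflated because the unique variance stays on the diagonal. This matches the later remark that loadings are "a little larger" with principal components.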

What is a Common Factor?
It is an abstraction, a hypothetical construct that affects at least two of our measured variables. We want to estimate the common factors that contribute to the variance in our variables. Is this an act of discovery or an act of invention?

What is a Unique Factor?
It is a factor that contributes to the variance in only one variable. There is one unique factor for each variable. The unique factors are unrelated to one another and to the common factors. We want to exclude these unique factors from our solution.

Comparison of Extraction Models: PCA vs. PAF
Factor loadings and eigenvalues are a little larger with principal components.
One can always obtain a solution with principal components.
There is often little practical difference.
FYI: other, less-used extraction methods include image, alpha, maximum likelihood (ML), unweighted least squares (ULS), and generalized least squares (GLS) factoring.

Principal Components Extraction
A communality is the proportion of an item's variance that it shares with the other items (the variance accounted for by the common factors). In PCA extraction, the initial communalities are set to 1.0, so all of the variability of each item is included in the analysis. Of course, some of that variability is explained and some is unexplained. With initial communalities of 1.0, PCA analyzes both the common factor variance and the unique or error variance.

Principal Components Extraction
Statisticians have argued that assuming all of the variability of the items, whether shared or unique, can be accounted for in the analysis is flawed, and that PCA should therefore not be used as an exploratory factor model. Some researchers recommend PAF as the appropriate extraction method for EFA. In PAF extraction, the amount of variability each item shares with all other items is estimated, and this value replaces the 1.0 on the diagonal of the correlation matrix. As a result, PAF analyzes only common factor variability, removing the unique or unexplained variability from the model.

Factor Rotation: Orthogonal
Varimax (most common): minimizes the number of variables with high loadings on each factor, pushing loadings toward high or low values; this makes it possible to identify each variable with a factor.
Quartimax: minimizes the number of factors needed to explain each variable. It tends to generate a general factor on which most variables load with medium to high values, which is not helpful for research.
Equimax: a combination of varimax and quartimax.
Q&A: Why use a rotation method? Rotation makes the factor loadings more clearly differentiated, which is necessary to facilitate interpretation.
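The widely used SVD-based varimax algorithm fits in a few lines of NumPy. This sketch is raw varimax without Kaiser normalization. SPSS reports "Varimax with Kaiser Normalization" by default, so its numbers can differ slightly. The input is a hypothetical clean two-factor structure that has been turned 30 degrees, which mimics an unrotated solution. The rotation should recover the clean structure, up to the sign and order of the factors.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=500, tol=1e-8):
    """Orthogonal varimax rotation (raw, without Kaiser normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        old = criterion
        rotated = loadings @ rotation
        # Target matrix whose nearest orthogonal matrix improves the varimax criterion.
        col_ss = np.diag(np.sum(rotated ** 2, axis=0))
        target = loadings.T @ (rotated ** 3 - (gamma / p) * rotated @ col_ss)
        u, s, vt = np.linalg.svd(target)
        rotation = u @ vt                            # nearest orthogonal matrix
        criterion = np.sum(s)
        if old != 0.0 and criterion / old < 1.0 + tol:
            break
    return loadings @ rotation, rotation


simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]])
theta = np.deg2rad(30)
turn = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
unrotated = simple @ turn          # what an unrotated solution might look like
rotated, T = varimax(unrotated)
print(np.round(unrotated, 2))
print(np.round(rotated, 2))
```

The rotation matrix is orthogonal, so each item's communality (row sum of squared loadings) is unchanged. Only the split of that variance across factors changes, which is why rotation helps interpretation without changing fit.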

Non-Orthogonal (Oblique) Rotation
The real issue is that you have no basis for knowing how many factors there are or what they are, much less whether they are correlated! Researchers assume variables are indicators of two or more factors, a measurement model which implies orthogonal rotation.
Direct oblimin (DO): factors are allowed to be correlated; diminished interpretability.
Promax: computationally faster than DO; used for large datasets.

Oblique Rotation
The variables are assessed for the unique relationship between each factor and the variables, removing relationships that are shared by multiple factors. The matrix of these unique relationships is called the pattern matrix, and it is treated like the loading matrix in an orthogonal rotation.
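An oblique solution also yields a structure matrix of simple item-factor correlations. It is linked to the pattern matrix P and the factor correlation matrix Φ by S = P Φ. The sketch below uses a hypothetical pattern matrix and a hypothetical factor correlation of .3. When Φ is the identity (uncorrelated factors), S equals P. That is why the pattern matrix can be read like the loading matrix of an orthogonal rotation.

```python
import numpy as np

# Hypothetical pattern matrix (unique item-factor relationships): 4 items, 2 factors.
pattern = np.array([
    [0.80, 0.00],
    [0.70, 0.00],
    [0.00, 0.60],
    [0.00, 0.50],
])
# Hypothetical correlation between the two oblique factors.
phi = np.array([
    [1.0, 0.3],
    [0.3, 1.0],
])

# Structure matrix: item-factor correlations, which mix in the factor correlation.
structure = pattern @ phi
print(np.round(structure, 2))
```

The first item has no unique relationship with the second factor (pattern loading 0). Even so, it correlates .24 with that factor through the factors' shared variance.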

Decisions to Be Made
Extraction: PCA vs. PAF
Rotation: orthogonal or oblique (non-orthogonal)

Procedures for Factor Analysis
Several statistical procedures exist for identifying the appropriate number of factors; these procedures are called "extraction methods." By default, SPSS uses PCA extraction. The principal components method is simpler and until fairly recently was considered the appropriate method for exploratory factor analysis. Statisticians now advocate a different extraction method because of a flaw in how principal components extraction works: it analyzes total variance rather than only common variance.

What else? How many factors do you extract?
One convention is to extract all factors with eigenvalues greater than 1 (e.g., in PCA).
Another is to extract all factors with non-negative eigenvalues.
Yet another is to look at the scree plot.
Base the number on theory.
Try multiple numbers and see which gives the best interpretation.

Eigenvalues greater than 1

Scree Plot: Three-Factor Solution

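Criteria for Retention of Factors
Eigenvalue greater than 1: a single standardized variable has a variance equal to 1, so a factor with an eigenvalue below 1 accounts for less variance than one original variable.
Plot of total variance (scree plot): the gradual trailing off of variance accounted for is called the scree.
Note the cumulative % of variance of the rotated factors.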
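The retention criteria can be computed from the eigenvalues of R with 1s on the diagonal (the PCA eigenvalues that SPSS shows in its "Total Variance Explained" table). Plotting those eigenvalues against their rank gives the scree plot. This sketch assumes a hypothetical matrix with two clusters of three items: r = .6 within each cluster and 0 between clusters.

```python
import numpy as np


def eigen_summary(corr):
    """Eigenvalues of R (descending), Kaiser count, and cumulative % of variance."""
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
    n_kaiser = int(np.sum(eigvals > 1.0))                        # eigenvalue-greater-than-1 rule
    cumulative_pct = 100.0 * np.cumsum(eigvals) / eigvals.sum()  # total = number of variables
    return eigvals, n_kaiser, cumulative_pct


def block(n, rho):
    """n x n correlation block with every off-diagonal entry equal to rho."""
    m = np.full((n, n), rho)
    np.fill_diagonal(m, 1.0)
    return m


# Hypothetical R: two clusters of three items, r = .6 within, 0 between.
R = np.zeros((6, 6))
R[:3, :3] = block(3, 0.6)
R[3:, 3:] = block(3, 0.6)
eigvals, n_kaiser, cum = eigen_summary(R)
print(np.round(eigvals, 2), n_kaiser, np.round(cum, 1))
```

Two eigenvalues of 2.2 stand out above four eigenvalues of 0.4, so the Kaiser rule retains two factors. Together they explain about 73% of the total variance, which clears the "> .60" guideline given later.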

Interpretation of the Rotated Matrix
Look for loadings of .40 or higher.
Name each factor based on the 3 or 4 variables with the highest loadings.
Do not expect a perfect conceptual fit of all variables.
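Reading a rotated matrix against the .40 rule is mechanical enough to sketch. The function below assigns each item to every factor on which its absolute loading reaches the cutoff and flags items that qualify on more than one factor (cross-loadings). The emotion items reuse names from the conceptual model above, but the loading values and the function name are hypothetical. Factor indices start at 0, so index 0 is Factor 1.

```python
import numpy as np


def assign_items(loadings, item_names, cutoff=0.40):
    """Map each item to the factors where |loading| >= cutoff; flag cross-loadings."""
    assignments, cross_loading = {}, []
    for name, row in zip(item_names, np.abs(np.asarray(loadings))):
        hits = [j for j, value in enumerate(row) if value >= cutoff]
        assignments[name] = hits          # an empty list means the item loads nowhere
        if len(hits) > 1:
            cross_loading.append(name)
    return assignments, cross_loading


items = ["Awe", "Joy", "Happiness", "Guilt", "Fear", "Sadness"]
rotated = np.array([
    [0.62, 0.05],
    [0.81, -0.10],
    [0.77, 0.02],
    [0.08, 0.70],
    [0.45, 0.58],    # hypothetical cross-loading item
    [-0.03, 0.74],
])
assignments, flagged = assign_items(rotated, items)
print(assignments)
print(flagged)
```

A flagged item is a candidate discriminant-validity problem, in the sense of "no cross-loadings" used later.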

Loading Size Based on Sample Size (from Hair et al. 2010, Table 3-2)
Significant factor loadings based on sample size:
Sample size   Sufficient factor loading
50            0.75
60            0.70
70            0.65
85            0.60
100           0.55
120           0.50
150           0.45
200           0.40
250           0.35
350           0.30
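The table can be turned into a lookup. For sample sizes between two rows, this sketch rounds down to the nearest tabled sample size, which keeps the stricter loading threshold. That choice is an assumption, not part of Hair et al.'s table. Below 50 cases, the function returns None.

```python
# (sample size, minimum loading) pairs transcribed from the table above (Hair et al. 2010).
HAIR_THRESHOLDS = [
    (50, 0.75), (60, 0.70), (70, 0.65), (85, 0.60), (100, 0.55),
    (120, 0.50), (150, 0.45), (200, 0.40), (250, 0.35), (350, 0.30),
]


def minimum_loading(n_obs):
    """Loading threshold for the largest tabled sample size not exceeding n_obs."""
    threshold = None
    for size, loading in HAIR_THRESHOLDS:
        if n_obs >= size:
            threshold = loading
    return threshold  # None when n_obs < 50 (below the table)


for n in (45, 100, 175, 400):
    print(n, minimum_loading(n))
```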

What else? How do you know when the factor structure is good?
When it makes sense and has a (relatively) simple and clean structure, with total variance explained above .60 (60%).
How do you interpret factors? Good question: that is where the true art of this comes in.

Why EFA?


Reflective versus Formative
Diet (reflective): R1. I eat healthy food. R2. I do not eat much junk food. R3. I have a balanced diet.
Health (formative): F1. I have a balanced diet. F2. I exercise regularly. F3. I get sufficient sleep each night.
[Diagram: Diet with reflective indicators R1 to R3 and their error terms e1 to e3; Health with formative indicators F1 to F3.]
EDM 643

Diet (Reflective) vs. Health (Formative)*
Reflective: the direction of causality is from construct to measure; the measures are expected to be correlated; the indicators are interchangeable.
Formative: the direction of causality is from measure to construct; there is no reason to expect the measures to be correlated; the indicators are not interchangeable.
*From Jarvis et al. 2003

Adequacy
Residuals: ≤ 5% (share of nonredundant residuals with absolute values greater than 0.05)
KMO: ≥ 0.8 is better
Communalities: ≥ 0.5 is better
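The residual and communality checks can be sketched together. This sketch reads "Residuals ≤ 5%" as follows: at most 5% of the nonredundant residuals (observed minus reproduced correlations, each pair counted once) should exceed 0.05 in absolute value. That reading follows the SPSS reproduced-correlation summary and is an assumption here. The sketch takes the loadings of an orthogonal solution as given. The example uses hypothetical one-factor loadings and one correlation that the model misses.

```python
import numpy as np


def adequacy_checks(corr, loadings, resid_cutoff=0.05):
    """Share of nonredundant residuals above cutoff, plus communalities."""
    corr = np.asarray(corr, dtype=float)
    loadings = np.asarray(loadings, dtype=float)
    reproduced = loadings @ loadings.T           # correlations implied by an orthogonal solution
    residuals = corr - reproduced
    upper = np.triu_indices_from(corr, k=1)      # nonredundant: each pair once
    share_large = float(np.mean(np.abs(residuals[upper]) > resid_cutoff))
    communalities = (loadings ** 2).sum(axis=1)
    return share_large, communalities


lam = np.array([[0.8], [0.7], [0.6], [0.5]])
R = lam @ lam.T
np.fill_diagonal(R, 1.0)
R[0, 3] = R[3, 0] = 0.50   # hypothetical correlation the one-factor model misses
share, h2 = adequacy_checks(R, lam)
print(share, np.round(h2, 2))
```

One of the six pairs (about 17%) has a large residual, well above the 5% guideline. Only the first item reaches the ≥ 0.5 communality level.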

Validity
Face validity: do the factors make sense?
Pattern matrix: convergent validity (high loadings) and discriminant validity (no cross-loadings)
Factor correlations: ≤ .7 is better

Reliability
Split the data and run two EFAs.
Compute Cronbach’s alpha (> .70) for each factor. SPSS: Scale > Reliability Analysis.
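Cronbach's alpha for a factor's items is α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜₒₜₐₗ). Here k is the number of items, σ²ᵢ the variance of item i, and σ²ₜₒₜₐₗ the variance of the summed scale. The sketch computes it with NumPy on a hypothetical set of responses from 5 people to the 3 items of a single factor. It is a sketch, not a replacement for SPSS's Reliability Analysis output.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents x k_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)          # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)      # sample variance of the summed scale
    return float(k / (k - 1) * (1.0 - item_vars.sum() / total_var))


# Hypothetical 5 respondents x 3 items on a 1-5 scale.
responses = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 1],
])
alpha = cronbach_alpha(responses)
print(round(alpha, 3), alpha > 0.70)
```

Alpha rises when items covary strongly, because the summed scale's variance then far exceeds the sum of the item variances. Perfectly redundant items give α = 1.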