
A set of techniques for data reduction




1 A set of techniques for data reduction
Factor Analysis: A set of techniques for data reduction
Rakesh Pandey, Professor of Psychology, B.H.U.

2 Factor Analysis
FA can be conceived of as a method for examining the interrelatedness of a set of variables in search of clusters or subsets of highly correlated variables.
Visual glimpse of Factor Analysis

3 15 balls of different colors

4

5

6 What is Factor Analysis (FA)?
Factor analysis is a set of analytic techniques that permits the reduction of a large number of interrelated variables to a smaller number of latent or hidden dimensions (factors) that can explain the maximum variance in the original set of variables.
Correlated variables are grouped together and separated from other variables with low or no correlation.
Variables are grouped into subsets in such a way that variables within a subset are highly correlated with one another, whereas variables in different subsets are relatively uncorrelated.
The latent variable underlying each subset or group of variables is referred to as a factor.
FA accomplishes this task by analysing the correlation matrix.

7 Correlation Matrix
Q1 Q2 Q3 Q4 Q5 Q6
1 .987 .801 .765 -.003 -.088 -.051 .044 .213 .968 -.190 -.111 0.102 .789 .864
Q1-3: palpitation, dry mouth, sweating
Q4-6: worry, apprehension, nervousness

8 Example output and basic concepts
Columns: Test; Factor loadings (I, II, III, IV); h square; s square; rji; eij
Test 1: .7 .0 .4 .3 .74 .12 .86 .14
Test 2: .6 .2 .40 .10 .50
Test 3: .8 .1 .5 .90 .05 .95
Test 4: ?
Eigen values
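As a check on the row for Test 1, and assuming the last four columns are the communality (h square), the specific variance (s square), the reliability and the error variance, the values are consistent with the usual partition of a test's variance. This is a worked reading of the table, not something stated on the slide:

```latex
\begin{aligned}
h_1^2 &= (.7)^2 + (.0)^2 + (.4)^2 + (.3)^2 = .49 + .00 + .16 + .09 = .74 \\
h_1^2 + s_1^2 &= .74 + .12 = .86 \quad \text{(reliable variance)} \\
1 - .86 &= .14 \quad \text{(error variance)}
\end{aligned}
```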

9 Terminology
Communality. Amount of variance a variable shares with all the other variables. This is the proportion of variance explained by the common factors.
Eigenvalue. Represents the total variance explained by each factor.
Percentage of variance. The percentage of the total variance attributed to each factor.
Factor loadings. Correlations between the variables and the factors.
Factor matrix. A factor matrix contains the factor loadings of all the variables on all the factors.
Factor scores. Factor scores are composite scores estimated for each respondent on the derived factors.
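The terms above can be illustrated with a small sketch. The loading matrix below is hypothetical (it is not taken from the slides), and the function names are chosen here for illustration only. For an unrotated orthogonal solution, a communality is the row sum of squared loadings and a factor's eigenvalue is the column sum of squared loadings.

```python
# Hypothetical factor matrix: rows are variables, columns are factors.
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
]

def communalities(matrix):
    """Row sums of squared loadings: variance each variable shares with the factors."""
    return [sum(x * x for x in row) for row in matrix]

def eigenvalues(matrix):
    """Column sums of squared loadings: variance explained by each factor."""
    n_factors = len(matrix[0])
    return [sum(row[j] ** 2 for row in matrix) for j in range(n_factors)]

def percent_variance(matrix):
    """Each factor's share of total variance (one unit per standardized variable)."""
    n_variables = len(matrix)
    return [100.0 * ev / n_variables for ev in eigenvalues(matrix)]

h2 = communalities(loadings)
ev = eigenvalues(loadings)
pct = percent_variance(loadings)
print(h2, ev, pct)
```

Factor scores are left out of the sketch because they need the respondent-level data, not only the factor matrix.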

10 Conducting Factor Analysis
Checking appropriateness of data matrix
  Composition of data matrix
    All variables measured on the same sample
    Remove outliers
    Handle missing data
  Sample size adequacy
    Comrey (1973) suggested that n=100 is poor; 200 is fair; 300 is good; 500 is very good; and 1000 is excellent.
    5-10 subjects per variable, up to 300 respondents
  Independence of measures (component-total, common items, etc.)
Construction of the correlation matrix & testing the appropriateness of the correlation matrix
Extracting factors (choosing a method; the most common are PCA or FA)
Determining number of factors
Rotation of factors
Interpretation of factors
Validation of factor structure
Suggested readings
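The sample-size guidance above can be turned into a quick check. This is a minimal sketch: it assumes the Comrey (1973) figures act as lower bounds of each band, which the slide does not say explicitly, and the function names are made up for illustration.

```python
def comrey_rating(n):
    """Label a sample size using the Comrey (1973) figures quoted above.

    Assumption: each quoted n is treated as the lower bound of its band.
    """
    bands = [(1000, "excellent"), (500, "very good"), (300, "good"),
             (200, "fair"), (100, "poor")]
    for cutoff, label in bands:
        if n >= cutoff:
            return label
    return "below the lowest quoted band"

def subjects_per_variable(n, n_variables):
    """Ratio of respondents to variables, to compare with the 5-10 rule of thumb."""
    return n / n_variables

print(comrey_rating(250), subjects_per_variable(250, 30))
```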

11 Appropriateness of the Correlation Matrix
If visual inspection reveals no substantial number of correlations greater than .30, then factor analysis is probably inappropriate.
No more than 10 to 15% of the correlations should be zero or near zero.
No variables should correlate 1.0 with each other. Remove one of each problematic pair, or use their sum if appropriate.
Significance of the matrix: Bartlett’s (1950) test of sphericity should be significant.
KMO measure of sampling adequacy (MSA): this index ranges from 0 to 1. KMO mediocre; good; .8-.9 great; .9 and above marvelous.
Anti-image correlation matrix: the diagonal values should be greater than .50, like KMO.
Multicollinearity: Determinant should be greater than
Very few residuals should be over .05.
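The Bartlett and KMO checks above can be sketched with NumPy using their standard textbook formulas. The 3x3 correlation matrix and n = 100 are hypothetical, and the function names are chosen here for illustration. The p-value for Bartlett's statistic would come from a chi-square distribution with the returned degrees of freedom, which is left out to keep the sketch dependent on NumPy only.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Chi-square statistic and degrees of freedom for Bartlett's test of sphericity."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)          # log of the determinant of R
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi2, df

def kmo(R):
    """Overall KMO and per-variable MSA from a correlation matrix."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)           # partial (anti-image) correlations
    off = ~np.eye(R.shape[0], dtype=bool)     # mask for off-diagonal cells
    r2 = np.where(off, R ** 2, 0.0)
    q2 = np.where(off, partial ** 2, 0.0)
    overall = r2.sum() / (r2.sum() + q2.sum())
    per_variable = r2.sum(axis=1) / (r2.sum(axis=1) + q2.sum(axis=1))
    return overall, per_variable

# Hypothetical correlation matrix: three variables, all correlated .5.
R_example = np.array([[1.0, 0.5, 0.5],
                      [0.5, 1.0, 0.5],
                      [0.5, 0.5, 1.0]])
chi2, df = bartlett_sphericity(R_example, n=100)
overall_kmo, msa = kmo(R_example)
determinant = np.linalg.det(R_example)
print(chi2, df, overall_kmo, msa, determinant)
```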

12 Methods of Factor Extraction
Two main approaches
  Differ in estimating communalities
Principal components
  Simplest computationally
  Assumes all variance is common variance (implausible) but gives similar results to more sophisticated methods
  SPSS default
Principal factor analysis
  Estimates communalities first

13 PCA & FA
Principal components analysis
  Analyses total variance
  A composite of the observed variables (component) as a summary of those variables
  Assumes no error in items
  Unity inserted on diagonal of matrix
  Precise mathematical solutions possible
Factor (or common factors) analysis
  Analyses shared or common variance
  Explains the relationship between observed variables in terms of latent variables or factors
  Assumes error in items
  SMC (squared multiple correlation) inserted on diagonal of matrix
  Precise math not possible, solved by iteration
Figure labels: SES, Education, Income; IQ, Reasoning, Memory
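The contrast above (unity versus SMC on the diagonal, direct versus iterative solution) can be sketched as follows. This is a simplified illustration, not the SPSS procedure: the correlation matrix is hypothetical, the function names are made up, and the principal-axis loop runs a fixed number of iterations instead of testing for convergence.

```python
import numpy as np

def top_loadings(matrix, n_factors):
    """Loadings from the n_factors largest eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(matrix)
    order = np.argsort(vals)[::-1][:n_factors]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

def principal_components(R, n_factors):
    """PCA: analyse R as it is, with unity on the diagonal."""
    return top_loadings(np.asarray(R, dtype=float), n_factors)

def principal_axis(R, n_factors, n_iter=50):
    """Principal factoring: start from SMCs on the diagonal, then iterate."""
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))    # initial communalities (SMC)
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)             # replace unity by communalities
        loadings = top_loadings(reduced, n_factors)
        h2 = np.sum(loadings ** 2, axis=1)        # updated communality estimates
    return loadings, h2

# Hypothetical correlation matrix: three variables, all correlated .5.
R_example = np.array([[1.0, 0.5, 0.5],
                      [0.5, 1.0, 0.5],
                      [0.5, 0.5, 1.0]])
pca_loadings = principal_components(R_example, 1)
paf_loadings, paf_h2 = principal_axis(R_example, 1)
print(np.sum(pca_loadings ** 2, axis=1), paf_h2)
```

In this toy case the component solution assigns each variable a larger communality than the common-factor solution, because the component also absorbs unique variance.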

14 How many Factors?
Initially unknown
Needs to be specified by the investigator on the basis of preliminary analysis
No 100% foolproof statistical test for the number of factors
Several methods
  Latent root method
  % Variance
  Scree plot
  Horn’s Parallel analysis
  Velicer’s MAP
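The first two methods in the list can be sketched with NumPy. The sketch assumes the latent root method means keeping factors whose eigenvalue exceeds 1; the 4x4 correlation matrix is hypothetical and the function names are illustrative. The sorted eigenvalues are also the values a scree plot would display.

```python
import numpy as np

def eigen_summary(R):
    """Eigenvalues (largest first), percent of variance and cumulative percent."""
    values = np.sort(np.linalg.eigvalsh(np.asarray(R, dtype=float)))[::-1]
    percent = 100.0 * values / values.sum()
    return values, percent, np.cumsum(percent)

def latent_root_count(R):
    """Number of factors with an eigenvalue greater than 1."""
    values, _, _ = eigen_summary(R)
    return int(np.sum(values > 1.0))

# Hypothetical correlation matrix with two clusters of two variables each.
R_example = np.array([[1.0, 0.8, 0.1, 0.1],
                      [0.8, 1.0, 0.1, 0.1],
                      [0.1, 0.1, 1.0, 0.7],
                      [0.1, 0.1, 0.7, 1.0]])
values, percent, cumulative = eigen_summary(R_example)
n_by_latent_root = latent_root_count(R_example)
print(values, cumulative, n_by_latent_root)
```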

15 Scree Plot Example

16 Parallel Analysis
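A minimal sketch of Horn's parallel analysis, assuming the common variant that compares observed eigenvalues with the 95th percentile of eigenvalues from random normal data of the same size. The data set is simulated here (one common factor behind six variables); the seed, the number of simulations and the function name are choices made for illustration.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Count leading eigenvalues that exceed those expected from random data."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    rng = np.random.default_rng(seed)
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    simulated = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.standard_normal((n, p))
        corr = np.corrcoef(noise, rowvar=False)
        simulated[i] = np.sort(np.linalg.eigvalsh(corr))[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)
    n_factors = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:        # stop at the first factor that random data can match
            break
        n_factors += 1
    return n_factors, observed, threshold

# Simulated example: 500 respondents, six variables driven by one common factor.
rng = np.random.default_rng(1)
factor = rng.standard_normal((500, 1))
data = 0.8 * factor + 0.6 * rng.standard_normal((500, 6))
n_factors, observed, threshold = parallel_analysis(data)
print(n_factors)
```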

17

18 Rotation
The initial solution is “un-rotated”
In the un-rotated solution
  Most items have large loadings on more than one factor
  Several items may have negative loadings
  Grouping of variables may not be obvious
Rotation of factors helps to address these problems
The purpose is to obtain simple structure & positive manifold

19 Simple structure .9 .8 .7

20 How rotation relates to “Simple Structure”
Factor rotations, which change the “viewing angle” of the factor space, have been the major approach to providing simple structure.
Structure is “simplified” if the factor vectors “spear” the variable clusters.
Figure: variables V1-V4 plotted against the unrotated axes (PC1, PC2) and the rotated axes (PC1’, PC2’), with loading tables for the unrotated and rotated solutions.

21 Major Types of Rotation
Remember: extracted factors are orthogonal (uncorrelated).
Orthogonal Rotation: resulting factors are uncorrelated; more parsimonious & efficient, but less “natural”.
Oblique Rotation: resulting factors are correlated; more “natural” & better “spearing”, but more complicated.
Figure: variables V1-V4 plotted with rotated axes PC1’ and PC2’. Orthogonal Rotation: angle between the axes is 90°. Oblique Rotation: angle is less than 90°.

22 Major Types of Orthogonal Rotation & their “tendencies”
Varimax: most commonly used and a common default
  “simplifies factors” by maximizing the variance of the loadings of the variables on a factor (minimizes the number of variables with high loadings)
  Maximizes column variance
Quartimax
  “simplifies variables” by maximizing the variance of the loadings of a variable across factors (minimizes the number of factors a variable loads on)
  Maximizes row variance
Equimax
  designed to “balance” the varimax and quartimax tendencies
  didn’t work very well: the two can’t be done simultaneously, so whichever is done first dominates the final structure
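The orthogonal rotations above belong to one family that can be sketched with a widely used SVD-based iteration. This is an illustrative implementation, not the routine of any particular package: `gamma=1.0` gives varimax and `gamma=0.0` gives quartimax, the unrotated loadings are hypothetical, and Kaiser normalization is omitted.

```python
import numpy as np

def orthomax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal rotation: gamma=1.0 is varimax, gamma=0.0 is quartimax."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    T = np.eye(k)                     # rotation matrix, starts as "no rotation"
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ T
        col_ss = np.diag(np.sum(rotated ** 2, axis=0))
        gradient = rotated ** 3 - (gamma / p) * rotated @ col_ss
        u, s, vt = np.linalg.svd(L.T @ gradient)
        T = u @ vt                    # closest orthogonal matrix to the update
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                     # no meaningful improvement: stop
        criterion = new_criterion
    return L @ T, T

# Hypothetical unrotated loadings for five variables on two factors.
unrotated = np.array([[0.75, 0.35],
                      [0.70, 0.45],
                      [0.65, 0.30],
                      [0.45, -0.60],
                      [0.40, -0.65]])
rotated, T = orthomax(unrotated, gamma=1.0)
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, each variable's communality is the same before and after rotation; only the distribution of loadings across factors changes.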

23 Major Types of Oblique Rotation & their “tendencies”
Promax
  computes the best orthogonal solution and then “relaxes” the orthogonality constraints to better “spear” variable clusters with factor vectors (gives simpler structure)
Direct Oblimin and others
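The promax idea above can be sketched as a least-squares fit to a "sharpened" target. The sketch assumes its input is already an orthogonally rotated solution (for example, varimax output); the loadings are hypothetical, the power of 4 is a common default rather than something given on the slide, and the function name is illustrative.

```python
import numpy as np

def promax(orthogonal_loadings, power=4):
    """Oblique rotation toward a target built by raising loadings to a power."""
    A = np.asarray(orthogonal_loadings, dtype=float)
    target = A * np.abs(A) ** (power - 1)       # keeps signs, shrinks small loadings
    U, *_ = np.linalg.lstsq(A, target, rcond=None)
    d = np.diag(np.linalg.inv(U.T @ U))
    U = U @ np.diag(np.sqrt(d))                 # rescale so factors have unit variance
    pattern = A @ U                             # oblique pattern loadings
    phi = np.linalg.inv(U.T @ U)                # correlations among the factors
    return pattern, phi

# Hypothetical orthogonally rotated loadings: six variables, two factors.
orthogonal = np.array([[0.80, 0.20],
                       [0.70, 0.30],
                       [0.75, 0.25],
                       [0.20, 0.80],
                       [0.30, 0.70],
                       [0.25, 0.75]])
pattern, phi = promax(orthogonal)
print(np.round(pattern, 2))
print(np.round(phi, 2))
```

The off-diagonal entries of `phi` show how far the factors were allowed to correlate once the orthogonality constraint was relaxed.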

24 Purpose or Application of FA
The main applications of factor analytic techniques are:
  to detect structure in the relationships between variables, that is, to classify variables and identify the latent constructs underlying them
  to reduce the number of variables and remove redundant, unclear and irrelevant variables
  test/scale construction & evaluation of the psychometric quality of a measure
  precursor to subsequent multivariate (MV) techniques
    Latent path modelling
    Dealing with multicollinearity
  improving reliability of aggregate or summated scales

25 Thank You

26 Demo for various applications of FA
Use the car_sales data of SPSS
For reduction (vehicle type through fuel efficiency)
For exploring structure (select Long distance last month through Wireless last month, and Multiple lines through Electronic billing)

