From PCA to Confirmatory FA (from using Stata to using Mx and other SEM software)
References: Chapter 8 of Hamilton; Chapter 10 of Lattin et al.
Data sets: College.txt, Govern.sav, Adoption.txt

Class 1
Principal Components
Exploratory Factor Model
Confirmatory Factor Model

Principal Components
Basic principles and the use of the method, with an example.
Chapter 8 of Hamilton, pp.

data=read.table("G:/Albert/COURSES/RMMSS/Schools1.txt", header=T) names(data) [1] "School" "SchoolT" "SAT" "Accept" "CostSt" "Top10" "PhD" [8] "Grad" attach(data) pairs(data[,3:8]) lCost=log(CostSt) cdata=cbind(data[,3:4], lCost, data[,6:8]) pairs(cdata)

Principal Components Analysis (PCA)
Y_j = a_j1 PC_1 + a_j2 PC_2 + E_j,   j = 1, 2, ..., p
The Y_j are manifest variables; E_j = a_j3 PC_3 + ... + a_jp PC_p; the PC_k are called principal components.
Let R_j^2 be the R^2 of the (linear) regression of Y_j on PC_1 and PC_2. In PCA, the a's are chosen so as to maximize sum_j R_j^2.
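A way to see this criterion at work (a minimal sketch, not from the slides, assuming the cdata frame built above): regress each standardized variable on the first two components; the summed R^2 equals the sum of the first two eigenvalues of the correlation matrix, and no other pair of directions does better.

pca <- princomp(cdata, cor = TRUE, scores = TRUE)
r2 <- sapply(names(cdata), function(v)
  summary(lm(scale(cdata[[v]]) ~ pca$scores[, 1] + pca$scores[, 2]))$r.squared)
sum(r2)                # summed R^2 over the six variables
sum(pca$sdev[1:2]^2)   # the same number: eigenvalue_1 + eigenvalue_2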

plot(lCost, PhD)
identify(lCost, PhD)
[1] 40
data[40,1]
[1] JohnsHopkins

> round(cor(cdata), 3)
        SAT  Accept  lCost  Top10  PhD  Grad
SAT
Accept
lCost
Top10
PhD
Grad
(correlation values not preserved in the transcript)
> plot(lCost, PhD)
> identify(lCost, PhD)
[1] 40
> data[40,1]
[1] JohnsHopkins
50 Levels: Amherst Barnard Bates Berkeley Bowdoin Brown BrynMawr ... Yale
> round(cor(cdata[-40,]), 3)
        SAT  Accept  lCost  Top10  PhD  Grad
(correlation values not preserved in the transcript)

use "G:\Albert\COURSES\RMMSS\school1.dta", clear. edit - preserve. summarize sat accept costst top10 phd grad Variable | Obs Mean Std. Dev. Min Max sat | accept | costst | top10 | phd | grad |

. gen lcost = log(costst)
. pca sat accept lcost top10 phd grad, factors(2)
(obs=50)
(principal components; 2 components retained)

Component  Eigenvalue  Difference  Proportion  Cumulative

Eigenvectors
    Variable |
         sat |
      accept |
       lcost |
       top10 |
         phd |
        grad |

. greigen
. score f1 f2
(based on unrotated principal components)

Scoring Coefficients
    Variable |
         sat |
      accept |
       lcost |
       top10 |
         phd |
        grad |

. summarize f1 f2
    Variable |  Obs  Mean  Std. Dev.  Min  Max
          f1 |
          f2 |
Normalized pc
(numeric values not preserved in the transcript)
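The eigenvalue and eigenvector tables above lost their numbers in the transcript. As a cross-check, the same quantities can be recomputed from first principles in R (a sketch, assuming the cdata frame from the R session above):

R <- cor(cdata)       # PCA of the standardized variables
e <- eigen(R)
e$values              # eigenvalues, as in the Stata table
e$vectors[, 1:2]      # eigenvectors of the two retained components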

. graph f2 f1, s([_n])

. cor sat accept lcost top10 phd grad f1 f2
(obs=50)
             |  sat  accept  lcost  top10  phd  grad  f1  f2
         sat |
      accept |
       lcost |
       top10 |
         phd |
        grad |
          f1 |
          f2 |
(correlation values not preserved in the transcript)

library(mva)
help('factanal')
help('princomp')
pca = princomp(cdata, cor=T, scores=T)
biplot(pca)
> summary(pca)
Importance of components:
                        Comp.1  Comp.2  Comp.3  Comp.4  Comp.5
Standard deviation
Proportion of Variance
Cumulative Proportion
(values not preserved in the transcript)
round(cov(pca$scores[,1:2]), 3)
        Comp.1  Comp.2
Comp.1
Comp.2

> data[,1]
 [1] Amherst       Swarthmore      Williams      Bowdoin      Wellesley
 [6] Pomona        Wesleyan        Middlebury    Smith        Davidson
[11] Vassar        Carleton        ClarMcKenna   Oberlin      WashingtonLee
[16] Grinnell      MountHolyoke    Colby         Hamilton     Bates
[21] Haverford     Colgate         BrynMawr      Occidental   Barnard
[26] Harvard       Stanford        Yale          Princeton    CalTech
[31] MIT           Duke            Dartmouth     Cornell      Columbia
[36] UofChicago    Brown           UPenn         Berkeley     JohnsHopkins
[41] Rice          UCLA            UVa.          Georgetown   UNC
[46] UMichican     CarnegieMellon  Northwestern  WashingtonU  UofRochester

DD = dist(pca$scores[,1:2], method="euclidean", diag=FALSE)
clust = hclust(DD, method="complete", members=NULL)
plot(clust, labels=data[,1], cex=.8, col="blue", main="clustering of education")

(Exploratory) Factor Analysis
Y_j = a_j1 F_1 + a_j2 F_2 + E_j,   j = 1, 2, ..., p
The unique factors E_j are uncorrelated across j.
The a's are chosen by the principal factor method, ML, ...
There is no unique solution (the model is not identified); rotation methods (e.g., Varimax) are applied to ease interpretation.
Chapter 8 of Hamilton, pp.
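A small R sketch (not from the slides) of this rotational indeterminacy: varimax-rotating the loadings changes the individual a's, yet reproduces exactly the same fitted correlation matrix L L' + diag(uniquenesses), so the data cannot distinguish the two solutions.

fa <- factanal(cdata, factors = 2, rotation = "none")
L  <- loadings(fa)                 # unrotated loadings
Lr <- varimax(L)$loadings          # varimax-rotated loadings
fit1 <- L  %*% t(L)  + diag(fa$uniquenesses)
fit2 <- Lr %*% t(Lr) + diag(fa$uniquenesses)
all.equal(fit1, fit2, check.attributes = FALSE)   # TRUE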

Exploratory Factor Analysis
. factor sat accept lcost top10 phd grad, factors(3) ipf
(obs=50)
(iterated principal factors; 3 factors retained)
Factor  Eigenvalue  Difference  Proportion  Cumulative

Factor Loadings
    Variable |    1     2     3    Uniqueness
         sat |
      accept |
       lcost |
       top10 |
         phd |
        grad |
(numeric values not preserved in the transcript)

Exploratory Factor Analysis
> fac = factanal(cdata, factors=2, scores="regression")
> fac
Call:
factanal(x = cdata, factors = 2, scores = "regression")

Uniquenesses:
   SAT Accept  lCost  Top10    PhD   Grad

Loadings:
        Factor1  Factor2
SAT
Accept
lCost
Top10
PhD
Grad

                Factor1  Factor2
SS loadings
Proportion Var
Cumulative Var
(numeric values not preserved in the transcript)

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is    on 4 degrees of freedom. The p-value is
> summary(fac)
             Length  Class     Mode
converged      1     -none-    logical
loadings      12     loadings  numeric
uniquenesses   6     -none-    numeric
correlation   36     -none-    numeric
criteria       3     -none-    numeric
factors        1     -none-    numeric
dof            1     -none-    numeric
method         1     -none-    character
scores       100     -none-    numeric
STATISTIC      1     -none-    numeric
PVAL           1     -none-    numeric
n.obs          1     -none-    numeric
call           4     -none-    call
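The 4 degrees of freedom follow from the standard EFA count: with p = 6 variables and k = 2 factors, df = [(p - k)^2 - (p + k)]/2 = [(6 - 2)^2 - 8]/2 = 4.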

> plot(fac$scores, type="n")
> text(fac$scores[,1], fac$scores[,2], 1:50, cex=.8)

(Confirmatory) Factor Analysis
Y_j = a_j1 F_1 + a_j2 F_2 + E_j,   j = 1, 2, ..., p
The unique factors E_j are uncorrelated across j.
Some of the a's are free; others are restricted a priori (to 0s, to 1s, or by equality constraints among them). Estimation is by ML, GLS, ...
The solution is unique (an identified model).
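One way to see why the restrictions buy identification and testability: with p observed variables the data supply p(p+1)/2 distinct variances and covariances, and the model chi-square has df = p(p+1)/2 minus the number of free parameters. For the 6-variable examples below, p(p+1)/2 = 21.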

Lattin and Roberts data on the adoption of new technologies, p. 366 of Lattin et al. See the data file Adoption.txt in RMMSS.

Analysis of Adoption data

data=read.table("E:/Albert/COURSES/RMMSS/Mx/ADOPTION.txt", header=T) names(data) [1] "ADOPt1" "ADOPt2" "VALUE1" "VALUE2" "VALUE3" "USAGE1" "USAGE2" "USAGE3" attach(data) round(cov(data, use="complete.obs"),2) ADOPt1 ADOPt2 VALUE1 VALUE2 VALUE3 USAGE1 USAGE2 USAGE3 ADOPt ADOPt VALUE VALUE VALUE USAGE USAGE USAGE dim(data) [1] 188 8

Adoption.dat
Data NInput=8 NObservations=188
CMatrix
Labels ADOPt1 ADOPt2 VALUE1 VALUE2 VALUE3 USAGE1 USAGE2 USAGE3

> data=read.table("E:/Albert/COURSES/RMMSS/Mx/ADOPTION.txt", header=T) > names(data) [1] "ADOPt1" "ADOPt2" "VALUE1" "VALUE2" "VALUE3" "USAGE1" "USAGE2" "USAGE3" attach(data) factanal(cbind(VALUE1, VALUE2,VALUE3,USAGE1, USAGE2,USAGE3), factors=2, rotation="varimax") Call: factanal(x = cbind(VALUE1, VALUE2, VALUE3, USAGE1, USAGE2, USAGE3), factors = 2) Uniquenesses: VALUE1 VALUE2 VALUE3 USAGE1 USAGE2 USAGE Loadings: Factor1 Factor2 VALUE VALUE VALUE USAGE USAGE USAGE Factor1 Factor2 SS loadings Proportion Var Cumulative Var Exploratory Factor Analysis, ML method Test of the hypothesis that 2 factors are sufficient. The chi square statistic is 1.82 on 4 degrees of freedom. The p-value is 0.768


One factor model for Value

Two factor model
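The Mx specifications behind these two path diagrams were not preserved in the transcript. As a rough stand-in, here is a minimal sketch of the same confirmatory models in R with the lavaan package (lavaan is an assumption here; the course itself used Mx and EQS):

library(lavaan)

# One-factor model for Value (three indicators)
m1 <- ' Value =~ VALUE1 + VALUE2 + VALUE3 '

# Two-factor model: Value and Usage; lavaan correlates the factors by default
m2 <- ' Value =~ VALUE1 + VALUE2 + VALUE3
        Usage =~ USAGE1 + USAGE2 + USAGE3 '

fit1 <- cfa(m1, data = data)   # 'data' as read from ADOPTION.txt above
fit2 <- cfa(m2, data = data)
summary(fit2, fit.measures = TRUE, standardized = TRUE)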

Factor Analysis
Charles Spearman, 1904
According to the two-factor theory of intelligence, the performance of any intellectual act requires some combination of "g", which is available to the same individual to the same degree for all intellectual acts, and of "specific factors" or "s", which are specific to that act and which vary in strength from one act to another. If one knows how a person performs on one task that is highly saturated with "g", one can safely predict a similar level of performance on another highly "g"-saturated task. Predictions of performance on tasks with high "s" factors are less accurate. Nevertheless, since "g" pervades all tasks, prediction will be significantly better than chance. Thus, the most important information to have about a person's intellectual ability is an estimate of their "g".

Spearman, 1904
Variables: CLASSIC = V1; FRENCH = V2; ENGLISH = V3; MATH = V4; DISCRIM = V5; MUSIC = V6
Correlation matrix (entries not preserved in the transcript)
cases = 23;

Single-Factor Model (path diagram: one common factor F1 loading on V1 through V6, each with a unique factor; asterisks mark free parameters)

EQS code for a factor model

NT analysis
RESIDUAL COVARIANCE MATRIX (S-SIGMA):
             CLASSIC  FRENCH  ENGLISH  MATH  DISCRIM  MUSIC
               V 1     V 2     V 3     V 4    V 5     V 6
CLASSIC  V 1
FRENCH   V 2
ENGLISH  V 3
MATH     V 4
DISCRIM  V 5
MUSIC    V 6
(residual values not preserved in the transcript)

CHI-SQUARE =        BASED ON 9 DEGREES OF FREEDOM
PROBABILITY VALUE FOR THE CHI-SQUARE STATISTIC IS
THE NORMAL THEORY RLS CHI-SQUARE FOR THIS ML SOLUTION IS
(statistic values not preserved in the transcript)
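The 9 degrees of freedom can be verified by counting: 6 variables give 6*7/2 = 21 distinct variances and covariances, while the single-factor model spends 12 free parameters (6 loadings and 6 unique variances), so df = 21 - 12 = 9.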

Loadings' estimates, s.e. and z-test statistics
CLASSIC  = V1 = .960*F1 + E1
FRENCH   = V2 = .866*F1 + E2
ENGLISH  = V3 = .807*F1 + E3
MATH     = V4 = .736*F1 + E4
DISCRIM  = V5 = .688*F1 + E5
MUSIC    = V6 = .653*F1 + E6
(standard errors and z-statistics not preserved in the transcript)

Estimates of unique-factor variances (estimate, with standard error in parentheses):
E1 - CLASSIC   .078*  (.064)
E2 - FRENCH    .251*  (.093)
E3 - ENGLISH   .349*  (.118)
E4 - MATH      .459*  (.148)
E5 - DISCRIM   .527*  (.167)
E6 - MUSIC     .574*  (.180)

STANDARDIZED SOLUTION:
CLASSIC  = V1 = .960*F1 + E1
FRENCH   = V2 = .866*F1 + E2
ENGLISH  = V3 = .807*F1 + E3
MATH     = V4 = .736*F1 + E4
DISCRIM  = V5 = .688*F1 + E5
MUSIC    = V6 = .653*F1 + E6
(E coefficients not preserved in the transcript)

Data of Lawley and Maxwell

/TITLE
 Lawley and Maxwell data
/SPECIFICATIONS
 CAS=220; VAR=6; ME=ML;
/LABEL
 v1 = Gaelic; v2 = English; v3 = Histo;
 v4 = aritm; v5 = Algebra; v6 = Geometry;

M0, single-factor model:
/EQUATIONS
 V1 = *F1 + E1;
 V2 = *F1 + E2;
 V3 = *F1 + E3;
 V4 = *F1 + E4;
 V5 = *F1 + E5;
 V6 = *F1 + E6;
/VARIANCES
 F1 = 1;
 E1 TO E6 = *;
/COVARIANCES
/MATRIX
/END

M1, two-factor model with correlated factors:
/EQUATIONS
 V1 = *F1 + E1;
 V2 = *F1 + E2;
 V3 = *F1 + E3;
 V4 = *F2 + E4;
 V5 = *F2 + E5;
 V6 = *F2 + E6;
/VARIANCES
 F1 = 1; F2 = 1;
 E1 TO E6 = *;
/COVARIANCES
 F1, F2 = *;

M1 estimates:
GAELIC   = V1 = .687*F1 + E1
ENGLISH  = V2 = .672*F1 + E2
HISTO    = V3 = .533*F1 + E3
ARITM    = V4 = .766*F2 + E4
ALGEBRA  = V5 = .768*F2 + E5
GEOMETRY = V6 = .616*F2 + E6

COVARIANCES AMONG INDEPENDENT VARIABLES
 F2, F1 = .597*  (s.e. .072)

M0, single factor model: CHI-SQUARE =       based on 9 df, P-value LESS THAN
M1, two factor model with correlated factors: CHI-SQUARE = 7.953, 8 df, P-value =
(some statistic values not preserved in the transcript)
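The degrees of freedom check out against the counting rule above: 21 moments, M0 spends 12 parameters (df = 21 - 12 = 9), and M1 spends 13 (6 loadings, 6 unique variances, 1 factor covariance; df = 21 - 13 = 8). Since M0 is nested in M1, the two can be compared with a chi-square difference test; a small R sketch (chisq_m0 is a placeholder, because the M0 statistic was lost from the transcript):

chisq_m0 <- NA        # fill in the M0 chi-square (9 df) from the EQS output
chisq_m1 <- 7.953     # M1 chi-square (8 df), from the slide
pchisq(chisq_m0 - chisq_m1, df = 9 - 8, lower.tail = FALSE)
# a small p-value means the second factor improves fit significantly
pchisq(7.953, df = 8, lower.tail = FALSE)   # M1 alone: p of about 0.44, a good fit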