Multivariate Data Exploration with Stata: Evaluation and Wish List
Stephen Soldz
Boston Graduate School of Psychoanalysis
Principal Components Analysis
Purpose: Data exploration and data reduction
Available in Stata:
- Base ado-file (pca)
- Built-in (factor, pcf)
- score produces component scores
Issues/Limitations:
- pca is just a wrapper for the (now undocumented) pc option of factor, which users cannot access or modify.
- The documentation on the difference between pca and factor, pcf (i.e., how the eigenvectors are scaled) is confusing.
- pca of a correlation or covariance matrix is not directly supported; one must first generate data with corr2data, which introduces error.
- rotate is not allowed after pca, apparently to “protect” the user; this seems patronizing and uncharacteristic of Stata.
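The computation behind PCA needs only a correlation (or covariance) matrix, not raw data. The sketch below, written in Python with NumPy (an assumption; the slides contain no code, and `pca_from_corr` is a hypothetical name), works directly from such a matrix. It also returns the two common scalings of the eigenvectors: unit-length eigenvectors, and loadings scaled by the square root of each eigenvalue. It does not claim which scaling any particular Stata command reports.

```python
import numpy as np

def pca_from_corr(R):
    """PCA computed directly from a correlation (or covariance) matrix R."""
    R = np.asarray(R, dtype=float)
    # eigh is for symmetric matrices; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # largest component first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Eigenvector signs are arbitrary; make the largest-magnitude entry positive
    k = eigvecs.shape[1]
    rows = np.argmax(np.abs(eigvecs), axis=0)
    eigvecs = eigvecs * np.sign(eigvecs[rows, np.arange(k)])
    # Scaling 1: unit-length eigenvectors (column sums of squares equal 1).
    # Scaling 2: loadings = eigenvector * sqrt(eigenvalue); for a correlation
    # matrix these are correlations between variables and components.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))
    return eigvals, eigvecs, loadings

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
eigvals, unit_vectors, loadings = pca_from_corr(R)
print(eigvals)
print(loadings)
```

With all components retained, `loadings @ loadings.T` reproduces `R` exactly, which is a quick way to confirm which scaling one is looking at.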
Exploratory Factor Analysis
Purpose: Data exploration and data reduction
Available in Stata:
- Built-in factor allows principal factors (with and without iteration of communalities) and maximum likelihood
- Built-in rotate allows varimax (with and without the Horst correction) and promax
Issues/Limitations:
- factor, pfi (principal factors with iterated communalities) does not let the user specify the number of iterations; this directly conflicts with Gorsuch's (1983) recommendation that communalities be iterated only 3 or 4 times.
- Because factor is built in, users cannot modify or build on it.
- rotate options are very limited (only varimax and promax) and users cannot modify them, though they could retrieve the eigenvectors (matrix get) and write their own.
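To illustrate what a user-controlled iteration count would look like, here is a minimal sketch of iterated principal-axis factoring in Python with NumPy. It is not Stata's implementation: the name `principal_factors`, the `n_iter` parameter, and the use of squared multiple correlations as starting communalities (a common choice) are all assumptions for illustration. Setting `n_iter=3` or `4` follows the Gorsuch (1983) recommendation cited above; `n_iter=0` gives non-iterated principal factors.

```python
import numpy as np

def principal_factors(R, n_factors, n_iter=3):
    """Hypothetical principal-axis factoring with a fixed number of iterations.

    The first pass extracts factors using squared multiple correlations as
    communalities; each of the n_iter further passes re-extracts using the
    communalities implied by the previous pass."""
    R = np.asarray(R, dtype=float)
    # Starting communalities: squared multiple correlations, 1 - 1/diag(R^-1)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter + 1):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)               # reduced correlation matrix
        eigvals, eigvecs = np.linalg.eigh(Rr)
        top = np.argsort(eigvals)[::-1][:n_factors]
        lam = np.clip(eigvals[top], 0, None)   # guard against negative eigenvalues
        loadings = eigvecs[:, top] * np.sqrt(lam)
        h2 = (loadings ** 2).sum(axis=1)       # communalities implied by loadings
    return loadings, h2

# One-factor example: stop after 3 iterations, as Gorsuch recommends
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)
loadings, communalities = principal_factors(R, n_factors=1, n_iter=3)
print(loadings.ravel(), communalities)
```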
Exploratory Factor Analysis, Continued
Issues/Limitations:
- rotate is not well documented, so it is unclear whether one could, e.g., rotate canonical correlations as suggested by Cliff & Krus (1976).
Correspondence Analysis
Purpose: Data exploration and reduction of categorical data
Available in Stata:
- User-written coranal (correspondence analysis)
- User-written mca (multiple correspondence analysis)
Issues/Limitations:
- Graphics are broken in Stata 8
- Questions have been raised on Statalist about whether mca produces correct output
- Few variations are implemented
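For reference, the core of simple correspondence analysis is a singular value decomposition of the standardized residuals of a two-way table. The Python/NumPy sketch below shows that computation under the usual textbook definitions. It is not how coranal or mca are implemented, and `correspondence_analysis` is a hypothetical name.

```python
import numpy as np

def correspondence_analysis(N):
    """Hypothetical simple CA of a two-way contingency table N.

    Returns principal inertias and principal row/column coordinates."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                     # correspondence matrix
    r = P.sum(axis=1)                   # row masses
    c = P.sum(axis=0)                   # column masses
    # Standardized residuals: D_r^(-1/2) (P - r c^T) D_c^(-1/2)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                   # principal inertias; sum = chi2 / n
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return inertia, row_coords, col_coords

table = np.array([[20, 10, 5],
                  [10, 25, 10],
                  [5, 10, 30]])
inertia, rows, cols = correspondence_analysis(table)
print(inertia)
print(rows[:, :2])
```

The total inertia equals the Pearson chi-square statistic divided by the table total, which ties the plot back to the usual test of independence.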
Optimal Scaling
Purpose: Data exploration, reduction, and transformation
Available in Stata: None (that I’m aware of)
Issues/Limitations:
Multidimensional Scaling
Purpose: Data exploration
Available in Stata: None (that I’m aware of)
Issues/Limitations:
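As an indication of how small the basic missing procedure is, the sketch below implements classical (Torgerson) metric MDS in Python with NumPy: double-center the squared distances and take the leading eigenvectors. It covers only the classical metric variant, not nonmetric MDS, and `classical_mds` is a hypothetical name.

```python
import numpy as np

def classical_mds(D, k=2):
    """Hypothetical classical (Torgerson) MDS of an n x n distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]    # k largest eigenvalues
    lam = np.clip(eigvals[order], 0, None)   # drop negatives (non-Euclidean D)
    return eigvecs[:, order] * np.sqrt(lam)

# Distances between five points in the plane; MDS recovers a configuration
# with the same distances (up to rotation, reflection, and translation)
X = np.array([[0, 0], [3, 0], [0, 4], [1, 1], [2, 3]], dtype=float)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, k=2)
print(np.round(Y, 3))
```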
Conclusion
- Stata is weak in “multivariate exploratory data analysis” procedures.
- Many existing procedures are either inflexible and not extensible, or user-contributed and not currently maintained.
- Stata lags behind SPSS, SAS, S-Plus, and R in this area.