Fun With Structural Equation Modelling in Psychological Research Jeremy Miles IBS, Derby University
Structural Equation Modelling Analysis of Moment Structures Covariance Structure Analysis Analysis of Linear Structural Relationships (LISREL) Covariance Structure Models Path Analysis
Normal Statistics Modelling process –What is the best model to describe a set of data –Mean, sd, median, correlation, factor structure, t-value Data → Model
SEM Modelling process –Could this model have led to the data that I have? Model → Data
Theory driven process –Theory is specified as a model Alternative theories can be tested –Specified as models Data Theory A Theory B
Ooohh, SEM Is Hard It was. Now it's not Jöreskog and Sörbom developed LISREL –Matrices: x y –Variables: X Y –Intercepts:
The Joy of Path Diagrams Variable Causal Arrow Correlational Arrow
Doing Normal Statistics Correlation (path diagram: x, y)
Doing Normal Statistics T-Test (path diagram: x, y)
Doing Normal Statistics One-way ANOVA (Dummy coding) (path diagram: x1, x2, x3, y)
Doing Normal Statistics Two-way ANOVA (Dummy coding) (path diagram: x1, x2, x1 * x2, y)
Doing Normal Statistics Regression (path diagram: x, x, x, y)
Doing Normal Statistics MANOVA (path diagram: x1, x2, y1, y2, y3)
Doing Normal Statistics ANCOVA (path diagram: x, y, z)
etc...
Identification Often thought of as being a very sticky issue Is a fairly sticky issue The extent to which we are able to estimate everything we want to estimate
x = 4 Unknown: x
x = 4 y = 7 Unknown: x, y
x + y= 4 x - y = 1 Unknown: x, y
x + y = 4 Unknown: x, y
Things We Know = Things We Want to Know: x = 4; x + y = 4, x - y = 2. Just identified. Can never be wrong. Normal statistics are just identified
Things We Know < Things We Want to Know: x + y = 7. Not identified. Can never be solved
Things We Know > Things We Want to Know: x + y = 4, x - y = 2, 2x - y = 3. Over-identified. Can be wrong. SEM models are over-identified
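The three cases above can be checked with a few lines of arithmetic. This is a minimal sketch using the equations from the slides; the helper names `solve_2x2` and `residual` are made up for illustration and are not part of any SEM program.

```python
def solve_2x2(eq1, eq2):
    """Solve two equations of the form a*x + b*y = c by Cramer's rule.

    Each equation is a tuple (a, b, c). Returns None when the two
    equations do not determine a unique (x, y).
    """
    a1, b1, c1 = eq1
    a2, b2, c2 = eq2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y


def residual(eq, x, y):
    """How far (x, y) is from satisfying a*x + b*y = c."""
    a, b, c = eq
    return a * x + b * y - c


# Just identified: two equations, two unknowns, exactly one answer.
just = solve_2x2((1, 1, 4), (1, -1, 2))      # x + y = 4, x - y = 2

# Not identified: x + y = 7 alone (repeating it adds no information).
under = solve_2x2((1, 1, 7), (1, 1, 7))

# Over-identified: a third equation, 2x - y = 3, that the solution
# of the first two may or may not satisfy.
misfit = residual((2, -1, 3), *just)

print(just)    # (3.0, 1.0)
print(under)   # None
print(misfit)  # 2.0, so the three equations cannot all hold
```

A non-zero misfit is what "can be wrong" means here: the extra piece of information contradicts the values implied by the rest.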
Identification We have information –(Correlations, means, variances) Normal statistics –Use all of the information to estimate the parameters of the model –Just identified All parameters estimated Model cannot be wrong
Over-identification SEM –Over-identified –The model can be wrong If a model is a theory –Enables the testing of theories
Parameter Identification x - 2 = y x + 2 = y Should be identified according to our previous rules –it's not though There is model identification –there is not parameter identification
Sampling Variation and χ² Equations and numbers –Easy to determine if it's correct Sample data may vary from the model –Even if the model is correct in the population Use the χ² test to measure difference between the data and the model –Some difference is OK –Too much difference is not OK
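A small simulation illustrates why some difference between data and model is expected. This is a sketch under assumed settings (a true correlation of zero, samples of 100, 200 replications, a fixed seed); none of these numbers come from the slides.

```python
import math
import random


def sample_correlation(n, rng):
    """Pearson correlation of n pairs drawn from two independent normals."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the run is repeatable
rs = [sample_correlation(100, rng) for _ in range(200)]

# The population correlation is exactly zero, yet the sample values scatter.
print(min(rs), max(rs))
print(sum(rs) / len(rs))
```

The model (a correlation of zero) is correct in the population, but no single sample reproduces it exactly; the χ² test asks whether the discrepancy is larger than this kind of scatter.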
Simple Over-identification x y Estimate 1 parameter - just-identified x y Estimate 0 parameters - over-identified
Example 1 r_ab = 0.3, N = 100 Estimate = 0.3, SE = 0.105, C.R. = The correlation is significantly different from 0 (path diagram: a, b)
Model Tests the hypothesis that the correlation in the population is equal to zero –The sample correlation will never be exactly zero, because of sampling variation –The χ² tells us if the difference from zero is significant (path diagram: a, b)
Example 2 Test the model Force the value to be zero –Input parameters = 1 –Parameters estimated = 0 The model is now over-identified and can therefore be wrong (path diagram: a, b)
The program gives a χ² statistic The significance of difference between the data and the model –Distributed with df = known parameters - input parameters χ² = 9.337, df = 1, p = So what? A correlation of 0.3 is significant?
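The χ² of 9.337 can be reproduced by hand. The sketch below assumes the program uses the maximum likelihood discrepancy function, F = ln|Σ| + tr(SΣ⁻¹) - ln|S| - p, multiplied by N - 1, and that both variables are standardised; the slides do not state the estimator, so treat this as one plausible reconstruction. The function names are illustrative.

```python
import math


def chi_square_two_variables(r_sample, r_model, n):
    """ML chi-square for a 2 x 2 correlation matrix.

    S has r_sample off the diagonal and Sigma has r_model; both have
    ones on the diagonal (an assumption of this sketch).
    """
    det_s = 1 - r_sample ** 2
    det_m = 1 - r_model ** 2
    # trace of S * inverse(Sigma) for 2 x 2 matrices with unit diagonals
    trace = (2 - 2 * r_sample * r_model) / det_m
    f_ml = math.log(det_m) + trace - math.log(det_s) - 2
    return (n - 1) * f_ml


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Example 2: sample correlation 0.3, N = 100, model correlation forced to 0.
chi2 = chi_square_two_variables(0.3, 0.0, 100)
print(round(chi2, 3))  # 9.337
print(p_value_df1(chi2))
```

With the model correlation fixed at zero the expression reduces to -(N - 1) ln(1 - r²). For a non-zero fixed value, a program that leaves the variances free may report a somewhat different figure from this standardised sketch.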
Hardly a Revelation No. We have tested a correlation for significance. Something which is much more easily done in other ways But –We have introduced a very flexible technique –Can be used in a range of other ways
Testing Other Than Zero Estimated parameters usually tested against zero –Reasonable? Model testing allows us to test against other values Example 3 (path diagram: a, b, correlation fixed at 0.15) χ² = 2.3, n.s.
Example 4: Comparing correlations 4 variables –mothers' sensitivity –mothers' parental bonding –fathers' sensitivity –fathers' parental bonding Does the correlation differ between mothers and fathers?
(Path diagram: MS, MPB, FPB, FS)
Example 4a –analyse with all parameters free –0 df, model is correct Example 4b –fix FS-FPB and MS-MPB to be equal. –See if that model can account for the data
(Path diagram: MS, MPB, FPB, FS, with the constrained correlations labelled dave) χ² = 1.82, df = 1, p = ; dave = 0.41 (s.e. 0.08)
Latent Variables The true power of SEM comes from latent variable modelling Variables in psychology are rarely (never?) measured directly –the effects of the variable are measured –Intelligence, self-esteem, depression –Reaction time, diagnostic skill
Measuring a Latent Variable Latent variables are drawn as ellipses –hypothesised causal relationship with measured variables Measured variable has two causes –latent variable –other stuff: random error (path diagram: Latent, Measured)
x = t + e Reliability is: the proportion of variance in x that is accounted for by t; its square root is the correlation between x and t (path diagram: Measured, True Score, Error)
Identification and Latent Variables 1 measured variable –not (even close to) identified 4 measured variables –6 known, 4 estimated model is identified
Need four measured variables to over-identify the model (three give a just-identified model) Need to identify the variance of the latent variable –fix to 1
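The tally of knowns and estimates can be written out as a small function. This sketch follows the counting used on the slides (correlations as the known values, one loading per measured variable as the estimates, latent variance fixed to 1); the function name is made up for illustration.

```python
def one_factor_counts(n_measured):
    """Knowns, estimates and df for a one-factor model.

    Knowns are the distinct correlations among the measured variables;
    estimates are the loadings, one per measured variable.
    """
    known = n_measured * (n_measured - 1) // 2
    estimated = n_measured
    return known, estimated, known - estimated


for n in (1, 2, 3, 4, 5):
    known, estimated, df = one_factor_counts(n)
    print(n, known, estimated, df)
# 1 0 1 -1   not identified
# 2 1 2 -1   not identified
# 3 3 3 0    just identified
# 4 6 4 2    over-identified
# 5 10 5 5   over-identified
```

A negative df means there is less information than parameters, zero means the model reproduces the data exactly and cannot be tested, and a positive df means the model can be wrong.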
Why oh why oh why? Why bother with all these tricky latent variables? 2 reasons –unidimensional scale construction –attenuation correction
Unidimensionality Correlation matrix χ² = 3.65, df = 2, p =
Attenuation Correction Why bother? –Gets accurate measure of correlation between true scores Why bother? –theories in psychology are ordinal –attenuation can only make relationships look weaker
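For a single pair of measures, the size of the attenuation can be illustrated with the classical Spearman correction, which divides the observed correlation by the square root of the product of the two reliabilities. This is the textbook formula rather than the latent variable approach the slides describe, and the reliabilities of 0.7 and 0.8 below are invented for the example.

```python
import math


def disattenuate(r_observed, reliability_x, reliability_y):
    """Spearman's correction for attenuation.

    Estimates the correlation between true scores from the observed
    correlation and the reliabilities of the two measures.
    """
    for rel in (reliability_x, reliability_y):
        if not 0 < rel <= 1:
            raise ValueError("reliability must be in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical values: observed r = 0.3, reliabilities 0.7 and 0.8.
corrected = disattenuate(0.3, 0.7, 0.8)
print(round(corrected, 3))  # 0.401
```

Because reliabilities cannot exceed 1, the corrected value is never smaller in magnitude than the observed one, which is the sense in which measurement error only weakens observed relationships.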
The Multivariate Case Much more complex and unpredictable (path diagram: x1, x2, y1, y2; paths a, b, c, d, e)
Some More Models Multiple Trait Multiple Method Models (MTMM) Temporal Stability Multiple Indicator Multiple Cause (MIMIC)
MTMM Multiple Trait –more than one measure Multiple Method –using more than one technique Variance in measured score comes from true score, random error variance, and systematic error variance, associated with the shared methods
What? Example 6 (From Wothke, 1996) –Three traits: Getting along with others (G), Dedication (D), Apply learning (L) –Three methods: Peer nomination (PN), Peer Checklist (PC), Supervisor ratings (SC)
Matrix g.pn d.pn l.pn g.pc d.pc l.pc g.sc d.sc l.sc
Analysis (path diagram: measured variables g.pn, l.pn, d.pn, g.pc, l.pc, d.pc, g.sc, l.sc, d.sc; method factors pn, pc, sc; trait factors g, l, d)
Temporal Stability Usually –sum the items –correlate them BUT –items may not be unidimensional –relationship will be attenuated due to measurement error –relationship will be inflated, due to correlated error
(Path diagram: latent L1 with X1.1, X2.1, X3.1, X4.1, X5.1; latent L2 with X1.2, X2.2, X3.2, X4.2, X5.2) Corrects for attenuation But - correlated errors may be a problem
Added correlated errors Example 7b (path diagram: latent L1 with X1.1, X2.1, X3.1, X4.1, X5.1; latent L2 with X1.2, X2.2, X3.2, X4.2, X5.2)
MIMIC Model Conventional wisdom in psychological measurement is that a latent variable is the cause of the measured variables Assumption is made (implicitly) in many types of measurement –Bollen and Lennox (1989) –not necessarily the case
Value of a Car Causes –type, size, age, rustiness –no reason they should, or should not, be correlated Effects –assessment of value by people who know
Level of Depression Questionnaire items –causes or effects? been feeling unhappy and depressed? been having restless and disturbed nights? found everything getting 'on top' of you? MIMIC
Example 8: MIMIC (path diagram: L1, LY1, LY2; c1, c2, c3; y1, y2, y3, y4, y5, y6, y7, y8)
Concluding remarks Given a taster –some may be too simple? Much more to say –no time to say it See further reading (Books and WWW)
Further Info SEMNET - list (messages) bama.ua.edu (leave) – the semnet FAQ
Books See web page
References See web page