A Method for Estimating the Correlations Between Observed and IRT Latent Variables or Between Pairs of IRT Latent Variables

Alan Nicewander, Pacific Metrics

Presented at a conference to honor Dr. Michael W. Browne of the Ohio State University, September 9-10, 2010
Using the factor-analytic version of item response theory (IRT) models:
– estimates of the correlations between the latent variables measured by test items are derived; and
– estimates of the correlations between the latent variables measured by test items and external, observed variables are derived.
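Brief Derivations of the Correlations

The normal ogive model for multiple-choice (MC), dichotomous items may be written as

$$P(u_i = 1 \mid \theta) = c_i + (1 - c_i)\int_{-\infty}^{a_i(\theta - b_i)} \phi(t)\,dt, \qquad (1)$$

where $u_i$ is the scored (0/1) response to item $i$, $\theta$ is the latent proficiency variable, $a_i$ is the item slope parameter, $b_i$ is the item location parameter, $c_i$ is a guessing parameter, and $\phi(t)$ is the standard normal density function.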
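A minimal sketch in Python (chosen only for illustration; the function name `p_correct` is hypothetical): because the integral of $\phi(t)$ up to $a_i(\theta - b_i)$ is the standard normal distribution function evaluated at that point, the three-parameter normal ogive can be computed with `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, sd 1); its cdf is the integral of phi(t).
_STD_NORMAL = NormalDist()


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the three-parameter normal ogive.

    P(u_i = 1 | theta) = c + (1 - c) * Phi(a * (theta - b)).
    """
    return c + (1.0 - c) * _STD_NORMAL.cdf(a * (theta - b))


# At theta == b the ogive term is Phi(0) = 0.5, so P = c + (1 - c) / 2.
print(p_correct(theta=0.0, a=1.2, b=0.0, c=0.2))  # approximately 0.6
```

Setting `c = 0` recovers the two-parameter normal ogive.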
Another useful version of this model is the so-called factor-analytic representation. Let $Y_i$ be a latent response variable that is a linear function of $\theta$ plus error,

$$Y_i = \lambda_i \theta + \varepsilon_i,$$

where $\lambda_i$ may be considered a factor loading and $\varepsilon_i$ is an error variable. It is further assumed that $Y_i$ and $\theta$ are normally distributed with zero means and unit variances, and that $\varepsilon_i$ is uncorrelated with $\theta$ (and with the errors of other items), so that $\mathrm{Var}(\varepsilon_i) = 1 - \lambda_i^2$.
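Let $\gamma_i$ be a response threshold, defined so that if $Y_i > \gamma_i$ the item is answered correctly; otherwise the examinee guesses and is correct with probability $c_i$. Then (1) may be rewritten as

$$P(u_i = 1 \mid \theta) = c_i + (1 - c_i)\,P(Y_i > \gamma_i \mid \theta) = c_i + (1 - c_i)\,\Phi\!\left(\frac{\lambda_i \theta - \gamma_i}{\sqrt{1 - \lambda_i^2}}\right),$$

where $\Phi$ is the standard normal distribution function. A graphical representation of this equation is given on the following slide. If $\lambda_i$ and $\gamma_i$ are rescaled as

$$a_i = \frac{\lambda_i}{\sqrt{1 - \lambda_i^2}}, \qquad b_i = \frac{\gamma_i}{\lambda_i},$$

then (1) becomes

$$P(u_i = 1 \mid \theta) = c_i + (1 - c_i)\,\Phi\big(a_i(\theta - b_i)\big).$$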
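The rescaling can be checked numerically. The sketch below (hypothetical helper names; it assumes $0 < \lambda_i < 1$ and normally distributed errors) computes $P(u_i = 1 \mid \theta)$ once from the threshold form, with $Y_i \mid \theta \sim N(\lambda_i\theta, 1 - \lambda_i^2)$, and once from the rescaled $a_i$ and $b_i$; the two should agree to rounding error. It also inverts the slope rescaling, $\lambda_i = a_i/\sqrt{1 + a_i^2}$, which is what lets estimated IRT slopes stand in for loadings.

```python
import math
from statistics import NormalDist
from typing import Tuple

_STD_NORMAL = NormalDist()


def fa_to_irt(lam: float, gamma: float) -> Tuple[float, float]:
    """Rescale a loading lam and threshold gamma into an IRT slope a and location b."""
    if not 0.0 < abs(lam) < 1.0:
        raise ValueError("the loading must satisfy 0 < |lam| < 1")
    a = lam / math.sqrt(1.0 - lam ** 2)
    b = gamma / lam
    return a, b


def loading_from_slope(a: float) -> float:
    """Invert a = lam / sqrt(1 - lam^2), giving lam = a / sqrt(1 + a^2)."""
    return a / math.sqrt(1.0 + a ** 2)


def p_correct_fa(theta: float, lam: float, gamma: float, c: float) -> float:
    """c + (1 - c) * P(Y > gamma | theta), where Y | theta ~ N(lam * theta, 1 - lam^2)."""
    z = (lam * theta - gamma) / math.sqrt(1.0 - lam ** 2)
    return c + (1.0 - c) * _STD_NORMAL.cdf(z)


def p_correct_irt(theta: float, a: float, b: float, c: float) -> float:
    """c + (1 - c) * Phi(a * (theta - b)), the IRT form of the same model."""
    return c + (1.0 - c) * _STD_NORMAL.cdf(a * (theta - b))


# Both parameterizations give the same probability at every theta.
a, b = fa_to_irt(lam=0.6, gamma=0.3)  # a = 0.75, b = 0.5 (up to rounding)
for theta in (-2.0, 0.0, 1.5):
    print(theta, p_correct_fa(theta, 0.6, 0.3, 0.2), p_correct_irt(theta, a, b, 0.2))
```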
[Figure: A graph showing $Y_i$, the latent response variable, mapped into (1, 0) using the response threshold $\gamma_i$]
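Estimating correlations between the latent variables measured by dichotomous items

Suppose one wants to determine the correlation between the latent variables $\theta_i$ and $\theta_j$ that underlie the observed item responses $u_i$ and $u_j$. Let $Y_i$ and $Y_j$ be the latent response variables for the two MC items:

$$Y_i = \lambda_i \theta_i + \varepsilon_i, \qquad Y_j = \lambda_j \theta_j + \varepsilon_j.$$

Because the errors are uncorrelated with each other and with the latent variables, and all variables have unit variance,

$$\rho(Y_i, Y_j) = \lambda_i \lambda_j\, \rho(\theta_i, \theta_j),$$

or

$$\rho(\theta_i, \theta_j) = \frac{\rho(Y_i, Y_j)}{\lambda_i \lambda_j}.$$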
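A quick Monte Carlo check of $\rho(Y_i, Y_j) = \lambda_i \lambda_j \rho(\theta_i, \theta_j)$, assuming normal latent variables and errors that are independent of each other and of the $\theta$s. The function names and parameter values are illustrative, not taken from the presentation.

```python
import math
import random
from statistics import fmean


def pearson(xs, ys):
    """Sample Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = math.fsum((x - mx) ** 2 for x in xs)
    syy = math.fsum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_corr_y(lam_i, lam_j, rho_theta, n=100_000, seed=1):
    """Draw (theta_i, theta_j) with correlation rho_theta, form Y = lam * theta + eps,
    and return the sample correlation of Y_i and Y_j."""
    rng = random.Random(seed)
    y_i, y_j = [], []
    for _ in range(n):
        t_i = rng.gauss(0.0, 1.0)
        t_j = rho_theta * t_i + math.sqrt(1.0 - rho_theta ** 2) * rng.gauss(0.0, 1.0)
        # Independent errors with variance 1 - lam^2 keep Var(Y) = 1.
        y_i.append(lam_i * t_i + math.sqrt(1.0 - lam_i ** 2) * rng.gauss(0.0, 1.0))
        y_j.append(lam_j * t_j + math.sqrt(1.0 - lam_j ** 2) * rng.gauss(0.0, 1.0))
    return pearson(y_i, y_j)


if __name__ == "__main__":
    r = simulate_corr_y(0.8, 0.7, 0.6)
    print(f"sample rho(Y_i, Y_j) = {r:.3f}; lam_i * lam_j * rho = {0.8 * 0.7 * 0.6:.3f}")
```

With $\lambda_i = .8$, $\lambda_j = .7$, and $\rho(\theta_i, \theta_j) = .6$, the sample correlation of the latent responses should land near $.8 \times .7 \times .6 = .336$.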
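The resulting equation does not seem useful, in that it involves two latent correlations, $\rho(\theta_i, \theta_j)$ and $\rho(Y_i, Y_j)$. However, from the definition of the tetrachoric correlation (the correlation between the normal latent variables assumed to underlie two dichotomies), it follows that

$$\rho(Y_i, Y_j) = \rho^*_{tet}(u_i, u_j),$$

where $\rho^*_{tet}(u_i, u_j)$ is the guessing-corrected tetrachoric correlation coefficient. Substituting, and using $\lambda_i = a_i / \sqrt{1 + a_i^2}$ (the inverse of the slope rescaling),

$$\rho(\theta_i, \theta_j) = \frac{\rho^*_{tet}(u_i, u_j)}{\lambda_i \lambda_j} = \rho^*_{tet}(u_i, u_j)\,\frac{\sqrt{(1 + a_i^2)(1 + a_j^2)}}{a_i a_j}.$$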
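Once a guessing-corrected tetrachoric correlation and the two normal-ogive slopes are in hand, the final step is plain arithmetic. The sketch below assumes `rho_tet_star` has already been computed elsewhere (the guessing correction itself is not shown) and that the slopes are on the normal-ogive metric; the function names are hypothetical.

```python
import math


def loading_from_slope(a: float) -> float:
    """Normal-ogive slope a -> loading lam = a / sqrt(1 + a^2)."""
    return a / math.sqrt(1.0 + a * a)


def latent_corr_from_tetrachoric(rho_tet_star: float, a_i: float, a_j: float) -> float:
    """rho(theta_i, theta_j) = rho*_tet(u_i, u_j) / (lam_i * lam_j).

    rho_tet_star is assumed to be an already guessing-corrected tetrachoric
    correlation; computing it is outside the scope of this sketch.
    """
    return rho_tet_star / (loading_from_slope(a_i) * loading_from_slope(a_j))


# Hypothetical inputs: a = 0.75 gives lam = 0.6, so 0.216 / 0.36 is about 0.6.
print(latent_corr_from_tetrachoric(0.216, 0.75, 0.75))
```

The result is deliberately not clipped to $[-1, 1]$: dividing by $\lambda_i \lambda_j < 1$ amplifies sampling error in the tetrachoric, so estimates above 1 are possible in small samples.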
Estimating the correlations between the latent variables for dichotomous items and external, observed variables

It is fairly easy to extend the logic above to derive a means of computing correlations between observed variables and IRT latent variables. First, define $Z_k$ as an observed variable scaled to have zero mean and unit variance. Then, assuming $Z_k$ is uncorrelated with the error $\varepsilon_i$, the correlation between $Z_k$ and the latent response variable $Y_i$ assumed to underlie an MC item $u_i$ is given by

$$\rho(Z_k, Y_i) = \lambda_i\, \rho(Z_k, \theta_i).$$
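Restating the previous equation in terms of $\rho(Z_k, \theta_i)$,

$$\rho(Z_k, \theta_i) = \frac{\rho(Z_k, Y_i)}{\lambda_i},$$

we once again have an equation with two latent correlation coefficients. However, following from the definition of a biserial correlation, we may substitute $\rho(Z_k, Y_i) = \rho^*_{bis}(Z_k, u_i)$ and obtain a solution involving only quantities that can be estimated from the observed data,

$$\rho(Z_k, \theta_i) = \frac{\rho^*_{bis}(Z_k, u_i)}{\lambda_i} = \rho^*_{bis}(Z_k, u_i)\,\frac{\sqrt{1 + a_i^2}}{a_i},$$

where $\rho^*_{bis}(Z_k, u_i)$ is the guessing-corrected biserial correlation between the observed variables $Z_k$ and $u_i$.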
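A simulation sketch of the biserial route. To keep it short it assumes no guessing ($c_i = 0$), so the ordinary biserial stands in for $\rho^*_{bis}$; it also assumes $Z_k$ and $\theta_i$ are bivariate normal and $Z_k$ is independent of the item error. The biserial uses the textbook formula $r_{bis} = \frac{M_1 - M_0}{s_Z}\cdot\frac{pq}{\phi(\tau)}$, with $\tau$ the normal threshold that leaves area $p$ above it. Names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

_STD_NORMAL = NormalDist()


def _pop_sd(values):
    """Population standard deviation (divisor n)."""
    m = fmean(values)
    return math.sqrt(fmean([(v - m) ** 2 for v in values]))


def biserial(z, u):
    """Textbook biserial correlation between a continuous z and a 0/1 variable u.

    r_bis = (M1 - M0) / s_z * p * q / phi(tau), where p is the proportion of 1s,
    q = 1 - p, and tau = Phi^{-1}(q) leaves normal area p above it.
    """
    ones = [zk for zk, uk in zip(z, u) if uk == 1]
    zeros = [zk for zk, uk in zip(z, u) if uk == 0]
    p = len(ones) / len(z)
    q = 1.0 - p
    tau = _STD_NORMAL.inv_cdf(q)
    return (fmean(ones) - fmean(zeros)) / _pop_sd(z) * p * q / _STD_NORMAL.pdf(tau)


def simulate(lam=0.7, gamma=0.3, rho_z_theta=0.5, n=100_000, seed=7):
    """Generate Z_k, theta, and a guessing-free item u; return (r_bis, r_bis / lam)."""
    rng = random.Random(seed)
    z, u = [], []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        # Z_k correlates rho_z_theta with theta and is independent of the item error.
        z.append(rho_z_theta * theta + math.sqrt(1.0 - rho_z_theta ** 2) * rng.gauss(0.0, 1.0))
        y = lam * theta + math.sqrt(1.0 - lam ** 2) * rng.gauss(0.0, 1.0)
        u.append(1 if y > gamma else 0)
    r_bis = biserial(z, u)
    return r_bis, r_bis / lam


if __name__ == "__main__":
    r_bis, r_z_theta = simulate()
    print(f"r_bis = {r_bis:.3f}, estimated rho(Z, theta) = {r_z_theta:.3f}")
```

Dividing the biserial (about .35 here, since $\rho(Z_k, Y_i) = .7 \times .5$) by $\lambda_i = .7$ should recover a value near the generating $\rho(Z_k, \theta_i) = .5$.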
Extending the latent-variable correlations to polytomous test items

To simplify exposition, only polytomous items having three categories are modeled. Generalization of the methods described below to items with more than three categories is straightforward.
Let $x_{ij}$ be the score for item $i$ scored in category $j$ ($j = 1, 2, \ldots, m$). Under commonly used scoring rules, a three-category item would be scored 0, 1, or 2. As was done above for MC items having binary scores, let $Y^*_i$ be the latent response variable underlying the polytomous item $x_{ij}$:

$$Y^*_i = \lambda_i \theta_i + \varepsilon_i,$$

where $\lambda_i$ is a factor loading.
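Let $\gamma_{i1}$ and $\gamma_{i2}$ be two response thresholds ($\gamma_{i1} < \gamma_{i2}$), defined so that

$$x_{ij} = \begin{cases} 0 & \text{if } Y^*_i \le \gamma_{i1}, \\ 1 & \text{if } \gamma_{i1} < Y^*_i \le \gamma_{i2}, \\ 2 & \text{if } Y^*_i > \gamma_{i2}. \end{cases}$$

The loading $\lambda_i$ and these two thresholds may be rescaled into IRT slope and location parameters, viz.

$$a_i = \frac{\lambda_i}{\sqrt{1 - \lambda_i^2}}, \qquad b_{i1} = \frac{\gamma_{i1}}{\lambda_i}, \qquad b_{i2} = \frac{\gamma_{i2}}{\lambda_i}.$$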
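The threshold definition and its rescaling can be sketched as follows, assuming normal errors and $\gamma_{i1} < \gamma_{i2}$: category probabilities are computed once from $Y^*_i$ and the two thresholds, and once from the normal-ogive graded-response form with $a_i$, $b_{i1}$, $b_{i2}$. Function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List, Tuple

_STD_NORMAL = NormalDist()


def category_probs_fa(theta: float, lam: float, g1: float, g2: float) -> List[float]:
    """P(x = 0, 1, 2 | theta) from Y* = lam * theta + eps with thresholds g1 < g2."""
    sd = math.sqrt(1.0 - lam ** 2)  # sd of Y* given theta
    above_1 = _STD_NORMAL.cdf((lam * theta - g1) / sd)  # P(Y* > g1 | theta)
    above_2 = _STD_NORMAL.cdf((lam * theta - g2) / sd)  # P(Y* > g2 | theta)
    return [1.0 - above_1, above_1 - above_2, above_2]


def fa_to_graded(lam: float, g1: float, g2: float) -> Tuple[float, float, float]:
    """Rescale (lam, g1, g2) into slope a and locations b1, b2."""
    return lam / math.sqrt(1.0 - lam ** 2), g1 / lam, g2 / lam


def category_probs_graded(theta: float, a: float, b1: float, b2: float) -> List[float]:
    """Normal-ogive graded-response probabilities for a three-category item."""
    p1 = _STD_NORMAL.cdf(a * (theta - b1))  # P(x >= 1 | theta)
    p2 = _STD_NORMAL.cdf(a * (theta - b2))  # P(x >= 2 | theta)
    return [1.0 - p1, p1 - p2, p2]


a, b1, b2 = fa_to_graded(lam=0.8, g1=-0.4, g2=0.6)
print(category_probs_fa(0.5, 0.8, -0.4, 0.6))
print(category_probs_graded(0.5, a, b1, b2))
```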
Fitting the previous model to data may be done with the normal ogive version of Samejima's (1969) graded response model, or with the more commonly used logistic approximation thereof. However, the correlations sought here do not depend on the location parameters, only on the estimates of the item slope parameters, $a_i$.
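Because only slopes matter, the practical question is which metric the fitted slopes are on. A sketch, assuming the common convention that the logistic approximation uses the scaling constant $D = 1.702$: slopes from a logistic fit that omitted $D$ are divided by 1.702 before being converted to loadings. The helper names are hypothetical; `max_gap` simply checks, on a grid, that $\Phi(x)$ and the logistic curve with $D = 1.702$ differ by less than .01.

```python
import math
from statistics import NormalDist

D = 1.702  # conventional constant that makes the logistic curve track the normal ogive
_STD_NORMAL = NormalDist()


def to_normal_ogive_slope(a: float, includes_d: bool) -> float:
    """Express a logistic slope on the normal-ogive metric.

    If the fitted model was 1 / (1 + exp(-D * a * (theta - b))), the slope is already
    on the normal-ogive metric; if it was 1 / (1 + exp(-a * (theta - b))), divide by D.
    """
    return a if includes_d else a / D


def loading_from_slope(a_normal: float) -> float:
    """lam = a / sqrt(1 + a^2); only the slope is needed, not the locations."""
    return a_normal / math.sqrt(1.0 + a_normal ** 2)


def max_gap(step: float = 0.001, limit: float = 6.0) -> float:
    """Largest |Phi(x) - logistic(D * x)| on a grid over [-limit, limit]."""
    n = int(limit / step)
    return max(
        abs(_STD_NORMAL.cdf(k * step) - 1.0 / (1.0 + math.exp(-D * k * step)))
        for k in range(-n, n + 1)
    )


print(loading_from_slope(to_normal_ogive_slope(1.2765, includes_d=False)))
print(max_gap())
```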
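First, using the previous logic, express the correlation between the two latent response variables that underlie two polytomous items, $x_{ij}$ and $x_{kj}$, and identify it with the polychoric correlation between the item scores:

$$\rho(Y^*_i, Y^*_k) = \lambda_i \lambda_k\, \rho(\theta_i, \theta_k) = \rho_{poly}(x_{ij}, x_{kj}).$$

Then solve for $\rho(\theta_i, \theta_k)$ and substitute $\lambda_i = a_i/\sqrt{1 + a_i^2}$ and $\lambda_k = a_k/\sqrt{1 + a_k^2}$:

$$\rho(\theta_i, \theta_k) = \frac{\rho_{poly}(x_{ij}, x_{kj})}{\lambda_i \lambda_k} = \rho_{poly}(x_{ij}, x_{kj})\,\frac{\sqrt{(1 + a_i^2)(1 + a_k^2)}}{a_i a_k}.$$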
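Applied to a set of items, the same correction turns a polychoric correlation matrix into an estimated latent correlation matrix. The sketch assumes the polychoric matrix and the normal-ogive slopes have already been estimated by other software; the function name and inputs are hypothetical.

```python
import math
from typing import List, Sequence


def latent_corr_matrix(polychoric: Sequence[Sequence[float]],
                       slopes: Sequence[float]) -> List[List[float]]:
    """rho(theta_i, theta_k) = rho_poly(x_i, x_k) / (lam_i * lam_k) for every item pair.

    `slopes` are normal-ogive slopes a_i; each is converted to lam_i = a_i / sqrt(1 + a_i^2).
    Off-diagonal results are not clipped to [-1, 1].
    """
    k = len(slopes)
    if len(polychoric) != k or any(len(row) != k for row in polychoric):
        raise ValueError("polychoric must be a k x k matrix matching len(slopes)")
    lams = [a / math.sqrt(1.0 + a * a) for a in slopes]
    return [
        [1.0 if i == j else polychoric[i][j] / (lams[i] * lams[j]) for j in range(k)]
        for i in range(k)
    ]


# Hypothetical inputs: three items with a = 0.75 (lam = 0.6) and polychorics of 0.216.
R = [[1.0, 0.216, 0.216], [0.216, 1.0, 0.216], [0.216, 0.216, 1.0]]
for row in latent_corr_matrix(R, [0.75, 0.75, 0.75]):
    print([round(v, 3) for v in row])
```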
Computing the correlation between an external variable and the latent variable measured by a polytomous item

From the earlier derivations, it follows directly that the correlation between an external variable $Z_k$ and the latent variable measured by a polytomous item $x_{ij}$ is

$$\rho(Z_k, \theta_i) = \frac{\rho_{polys}(Z_k, x_{ij})}{\lambda_i} = \rho_{polys}(Z_k, x_{ij})\,\frac{\sqrt{1 + a_i^2}}{a_i},$$

where $\rho_{polys}(Z_k, x_{ij})$ is the polyserial correlation between the external variable and the score on the polytomous item.
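The presentation does not say which polyserial estimator was used. As an assumption, the sketch below uses a simple two-step estimator for integer scores $0, \ldots, m-1$: writing $x = \sum_j 1(Y^* > \tau_j)$ and using $E[Z_k \cdot 1(Y^* > \tau)] = \rho(Z_k, Y^*)\,\phi(\tau)$ for standardized bivariate normal variables gives $\rho(Z_k, Y^*) = r(Z_k, x)\, s_x / \sum_j \phi(\tau_j)$, with the thresholds taken from the cumulative category proportions. Dividing by $\lambda_i$ then estimates $\rho(Z_k, \theta_i)$. Names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

_STD_NORMAL = NormalDist()


def _pop_sd(values):
    """Population standard deviation (divisor n)."""
    m = fmean(values)
    return math.sqrt(fmean([(v - m) ** 2 for v in values]))


def pearson(xs, ys):
    """Sample Pearson product-moment correlation."""
    mx, my = fmean(xs), fmean(ys)
    sxy = math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = math.fsum((x - mx) ** 2 for x in xs)
    syy = math.fsum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def two_step_polyserial(z, x):
    """Polyserial estimate for a list x of consecutive integer scores 0, ..., m - 1.

    tau_j = Phi^{-1}(P(x <= j)); rho(Z, Y*) = r(Z, x) * sd(x) / sum_j phi(tau_j).
    Every category must be observed at least once.
    """
    n = len(x)
    m = max(x) + 1
    taus, cum = [], 0
    for cat in range(m - 1):
        cum += x.count(cat)
        taus.append(_STD_NORMAL.inv_cdf(cum / n))
    return pearson(z, x) * _pop_sd(x) / math.fsum(_STD_NORMAL.pdf(t) for t in taus)


def simulate(lam=0.8, g1=-0.5, g2=0.7, rho_z_theta=0.4, n=100_000, seed=11):
    """Generate Z_k and a three-category item; return (polyserial, polyserial / lam)."""
    rng = random.Random(seed)
    z, x = [], []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        z.append(rho_z_theta * theta + math.sqrt(1.0 - rho_z_theta ** 2) * rng.gauss(0.0, 1.0))
        y_star = lam * theta + math.sqrt(1.0 - lam ** 2) * rng.gauss(0.0, 1.0)
        x.append(0 if y_star <= g1 else (1 if y_star <= g2 else 2))
    r_ps = two_step_polyserial(z, x)
    return r_ps, r_ps / lam


if __name__ == "__main__":
    r_ps, r_z_theta = simulate()
    print(f"polyserial = {r_ps:.3f}, estimated rho(Z, theta) = {r_z_theta:.3f}")
```

With $\lambda_i = .8$ and $\rho(Z_k, \theta_i) = .4$, the polyserial should be near .32 and the corrected value near .4.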
Some Numerical Examples

Computing the correlations between the latent variables measured by three polytomous items, each having three categories

Ten replications of 300 observations were simulated using the values of $a$, $b_1$, and $b_2$ given below in Table 1, with true values $\rho(\theta_1, \theta_2) = \rho(\theta_2, \theta_3) = .6$ and $\rho(\theta_1, \theta_3) = 1$.
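Table 1's item parameters are not reproduced in this text, so the data-generation step of such a study can only be sketched with hypothetical values. The sketch follows the stated design (ten replications of 300 observations; $\rho(\theta_1, \theta_2) = \rho(\theta_2, \theta_3) = .6$ and $\rho(\theta_1, \theta_3) = 1$, so $\theta_3 = \theta_1$), scores each item by thresholding $Y^*_i$ at $\gamma_{ij} = b_{ij}\lambda_i$, which is equivalent to the normal-ogive graded response model, and stops before estimation: the polychoric correlations and slopes would come from other software.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical (a, b1, b2) values on the normal-ogive metric; NOT the Table 1 values.
ITEMS: Sequence[Tuple[float, float, float]] = [(1.0, -0.5, 0.5), (1.5, -1.0, 0.3), (0.8, 0.0, 1.0)]


def draw_abilities(rng: random.Random) -> Tuple[float, float, float]:
    """theta_1 = theta_3 (correlation 1); theta_2 = .6 * theta_1 + .8 * e (correlation .6)."""
    g = rng.gauss(0.0, 1.0)
    theta_2 = 0.6 * g + 0.8 * rng.gauss(0.0, 1.0)  # .6^2 + .8^2 = 1 keeps unit variance
    return g, theta_2, g


def score_item(theta: float, a: float, b1: float, b2: float, rng: random.Random) -> int:
    """Score 0/1/2 by thresholding Y* = lam * theta + eps at gamma_j = b_j * lam."""
    lam = a / math.sqrt(1.0 + a * a)
    y_star = lam * theta + math.sqrt(1.0 - lam * lam) * rng.gauss(0.0, 1.0)
    if y_star <= b1 * lam:
        return 0
    return 1 if y_star <= b2 * lam else 2


def simulate(n_reps: int = 10, n_obs: int = 300, seed: int = 2010) -> List[List[List[int]]]:
    """Return n_reps data sets, each with n_obs rows of three item scores."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_reps):
        rows = []
        for _ in range(n_obs):
            thetas = draw_abilities(rng)
            rows.append([score_item(t, a, b1, b2, rng) for t, (a, b1, b2) in zip(thetas, ITEMS)])
        reps.append(rows)
    return reps


data = simulate()
print(len(data), len(data[0]), data[0][:3])
```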
[Figure: Summary of Regression Slopes of Five Latent Variables, $\theta_i$ ($\theta_1, \ldots, \theta_5$), on Six Observed Variables, $Z_k$ ($Z_1, \ldots, Z_6$)]
The coefficients developed in this inquiry have a very simple form for two basic reasons:
1. They make strong assumptions about the data.
2. The exotic correlation coefficients on which they are based (tetrachoric-polychoric and biserial-polyserial) do the "heavy lifting" mathematically, because of the complex calculations entailed in their computation.

It is also the case that the standard errors of these coefficients are rather large, and they almost certainly will require large samples for accuracy.