Exploring the Full-Information Bifactor Model in Vertical Scaling With Construct Shift
Ying Li and Robert W. Lissitz
Contents: Questions, Methods, Results
Vertical scaling assumptions: 1. Unidimensionality of tests at each grade level. 2. Test construct invariance across grades. If these do not hold (╳): bifactor model.
Questions: 1. Computational simplicity for estimation. 2. Ease of interpretation. 3. Vertical scaling problems across grades.
Purpose: (1) to propose and evaluate a bifactor model for IRT vertical scaling that can incorporate construct shift across grades while extracting a common scale; (2) to evaluate the robustness of the unidimensional IRT (UIRT) model in parameter recovery; and (3) to compare parameter estimates from the bifactor and UIRT models.
Method: Bifactor Model. Gibbons and Hedeker (1992) generalized the work of Holzinger and Swineford (1937) to derive a bifactor model for dichotomously scored item response data. $\theta_0$: general factor or ability; $\alpha_{i0}$: discrimination parameter of item $i$ for the general factor; $\theta_s$: group-specific factor or ability; $\alpha_{is}$: discrimination parameter of item $i$ for the group-specific factor; $d_i$: overall multidimensional item difficulty.
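The slide's own equation did not survive, so the following is only a minimal sketch of how the listed quantities could combine in an item response function. It assumes a logistic link and a slope-intercept form in which the difficulty term is added; a normal-ogive link or the opposite sign convention is equally possible. The function and argument names are hypothetical.

```python
import math

def bifactor_prob(theta_general, theta_specific, a_general, a_specific, d):
    """Probability of a correct response under an ASSUMED logistic bifactor form.

    theta_general  : ability on the general dimension
    theta_specific : ability on the group-specific dimension
    a_general      : item discrimination on the general dimension
    a_specific     : item discrimination on the group-specific dimension
    d              : item difficulty term, entered here as an additive intercept (assumption)
    """
    z = a_general * theta_general + a_specific * theta_specific + d
    return 1.0 / (1.0 + math.exp(-z))

# An examinee at the origin of both dimensions on an item with d = 0
p_average = bifactor_prob(0.0, 0.0, a_general=1.2, a_specific=1.7, d=0.0)
# Same item, examinee one unit higher on the general dimension
p_high_general = bifactor_prob(1.0, 0.0, a_general=1.2, a_specific=1.7, d=0.0)
```

Each item loads on the general dimension and on exactly one group-specific dimension, which is why the function takes only two abilities.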
Simulation: common-item design; bifactor model for generating data; concurrent calibration (stable results).
Simulated factors: sample size; number or percentage of common items (test length: 60); variance of grade-specific factor (degree of construct shift). 100 replications.
Data generation: Item discrimination parameters were set deliberately, cycling through 1.2, 1.4, 1.6, 1.8, 2.0, and 2.2 for the general dimension, and fixed at 1.7 for the grade-specific dimensions.
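The generation scheme above can be sketched for a single grade as follows. Only the general-dimension discriminations (1.2 to 2.2, cycled) and the grade-specific discrimination (1.7) come from the slides; the logistic link, the standard-normal draw for item difficulties, the zero-mean grade-specific ability, and all names are assumptions made for illustration. Common items shared across grades are not modeled here.

```python
import math
import random

GENERAL_SLOPES = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2]  # from the slide, cycled over items
SPECIFIC_SLOPE = 1.7                              # from the slide, fixed for every item

def simulate_grade(n_persons, n_items, specific_variance, rng, group_mean=0.0):
    """Simulate 0/1 responses for one grade under an assumed logistic bifactor model."""
    slopes = [GENERAL_SLOPES[i % len(GENERAL_SLOPES)] for i in range(n_items)]
    # Hypothetical difficulty distribution; the slides do not state one.
    intercepts = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    specific_sd = math.sqrt(specific_variance)
    responses = []
    for _ in range(n_persons):
        theta_general = rng.gauss(group_mean, 1.0)     # general ability
        theta_specific = rng.gauss(0.0, specific_sd)   # grade-specific ability
        row = []
        for a0, d in zip(slopes, intercepts):
            z = a0 * theta_general + SPECIFIC_SLOPE * theta_specific + d
            p = 1.0 / (1.0 + math.exp(-z))
            row.append(1 if rng.random() < p else 0)
        responses.append(row)
    return slopes, responses

rng = random.Random(2012)  # fixed seed so the sketch is reproducible
slopes, responses = simulate_grade(n_persons=200, n_items=60,
                                   specific_variance=0.3, rng=rng)
```

Raising `specific_variance` corresponds to a larger degree of construct shift, since more of each response is driven by the grade-specific dimension.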
Identification of Bifactor Model Estimation: For the general dimension, the variance of the general latent dimension was fixed to 1, and the discrimination parameters (loadings) were freely estimated. For the grade-specific dimensions (s = 1, 2, ..., k), the discrimination parameters (loadings) were fixed to the true parameter value of 1.7, so that the variances of the grade-specific dimensions could be freely estimated. The common items answered by multiple groups were constrained to have a single set of item parameters across groups in the multiple-group concurrent calibration.
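Why fixing the grade-specific loadings leaves the variances free can be seen from a short scaling argument. This is a sketch in notation assumed here rather than taken from the slides: $\alpha_{is}$ is the loading of item $i$ on grade-specific dimension $s$, $\sigma_s^2$ is the variance of that dimension (taken to have mean zero), and $z_s$ is its standardized version.

```latex
\[
\theta_s \sim N(0, \sigma_s^2), \qquad
\theta_s = \sigma_s z_s, \quad z_s \sim N(0, 1)
\quad\Longrightarrow\quad
\alpha_{is}\,\theta_s = (\alpha_{is}\,\sigma_s)\, z_s .
\]
```

Only the product $\alpha_{is}\sigma_s$ enters the model, so one of the two must be fixed. Fixing $\alpha_{is} = 1.7$ leaves $\sigma_s^2$ as the estimable quantity for each grade-specific dimension; the general dimension takes the opposite route, fixing its variance at 1 and freeing the loadings.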
Model Estimation: Multigroup concurrent calibration was implemented. The computer program IRTPRO (Cai, Thissen, & du Toit, 2011), using marginal maximum likelihood (MML) estimation with an EM algorithm, was used to estimate the models.
Evaluation criteria
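The formulas on this slide did not survive. The sketch below uses the conventional definitions of bias, standard error, and root mean square error across replications, which is an assumption about what the study computed; the function name and the example estimates are hypothetical.

```python
import math

def recovery_summary(estimates, true_value):
    """Return (bias, se, rmse) for one parameter across replications.

    Conventional definitions, assumed here:
      bias = mean estimate minus the true value
      se   = standard deviation of the estimates around their own mean
      rmse = root of the mean squared deviation from the true value
    """
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_value
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / n)
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, se, rmse

# Hypothetical estimates of a discrimination parameter whose true value is 1.2
bias, se, rmse = recovery_summary([1.25, 1.18, 1.31, 1.22], true_value=1.2)
```

With these definitions the three criteria are linked by RMSE² = bias² + SE², so RMSE reflects both systematic error and sampling variability.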
RMSE = root mean square error; SE = standard error; SS = sample size; CI = common item; VR = variance of grade-specific factor.
Results
Person parameter: Person parameter estimates of the general dimension were better recovered than those of the grade-specific dimensions when the degree of construct shift was small or moderate. Sample size: related to estimation accuracy.
Group parameter: overestimated.
UIRT. Discrimination: overestimated. Difficulty: well recovered. Construct shift: person and group mean.
ANOVA Effects for the Simulated Factors
Three-way tests of between-subject effects (ANOVA)
– bias, RMSE, and SE
Factors: sample size; degree of construct shift; percentage of common items
Bifactor model: small~moderate; no~small; small; bias in d
UIRT: small~moderate; large: & group mean ability; large bias in d & d & SE
Group mean ability, grade-specific variance parameter: large: SE
Comparison: UIRT overestimated the discrimination parameters; person and group mean parameters were less accurate.
Real data: Fall 2006 Michigan mathematics assessments, Grades 3, 4, and 5. 4000 randomly sampled examinees. Bifactor vs. UIRT.
Variance estimation
R=0.983
Discussion: Sample size: estimation accuracy and stability. Variance of grade-specific dimension: stability. Be cautious about construct shift. Polytomously scored or mixed item formats. Incorporating covariates. Longitudinal studies.
Do common items measure two group-specific abilities? The item discrimination parameters were fixed to the true value. Multidimensional IRT?