Investigating item difficulty change by item positions under the Rasch model
Luc Le & Van Nguyen
17th International Meeting of the Psychometric Society, Hong Kong, July 2011
Research Rationale
  IRT item parameter estimates can vary with item context, content, format, position, instruction, and sample size.
  Impact of different item positions in common-item equating:
    California Achievement Test (CAT; Yen, 1980)
    Graduate Record Examination (GRE; Kingston & Dorans, 1982)
    NAEP reading (Zwick, 1991)
    ACT math and reading (Pommerich & Harris, 2003)
    PISA science (Le, 2009)
Study Questions
  How does an item's difficulty change when its position in a test changes?
  Does gender affect this relationship?
  Does ability level affect this relationship?
Study Method
  Data: Graduate Skills Assessment (GSA) for Colombia in 2010
  78 multiple-choice items in three domains of generic skills: Problem Solving (PS), Critical Thinking (CT), and Interpersonal Understandings (IP)
  26 items in each domain (1 CT item was removed)
  8 test forms in a complete rotation design
  Each item appears in 6 different positions
  8000 Colombian university students (50% male, 50% female); test forms were randomly assigned
Study Method
  Analysis design:
  Step 1: Randomly select 1000 candidates from each test form.
  Step 2: Calibrate the items in each domain with a three-faceted Rasch model (test form adjustment).
  Step 3: For each item, examine the difference between the item difficulty estimates from each pair of forms in relation to the difference in the item's position.
  Step 4: (Gender effect) Repeat steps 1-3 for males and females separately.
  Step 5: (Ability level effect) Repeat steps 1-3 for lower and higher ability groups separately.
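Step 3 can be sketched as follows. This is a minimal illustration, not the presenters' code: the function name `pairwise_differences`, the data layout, and the example values are all hypothetical, and the sign convention (later position minus earlier position) is an assumption.

```python
from itertools import combinations

def pairwise_differences(estimates):
    """Compare one item's difficulty estimates across every pair of forms.

    estimates: dict mapping form id -> (position, difficulty_in_logits).
    Returns a list of (position_change, difficulty_difference) tuples,
    ordered so that position_change >= 0 (later position minus earlier).
    """
    pairs = []
    for (_, (pos_a, d_a)), (_, (pos_b, d_b)) in combinations(sorted(estimates.items()), 2):
        if pos_b < pos_a:
            # Swap so that "b" is always the later position.
            pos_a, d_a, pos_b, d_b = pos_b, d_b, pos_a, d_a
        pairs.append((pos_b - pos_a, d_b - d_a))
    return pairs

# Hypothetical item: 8 forms, 6 distinct positions, made-up difficulties.
example_item = {
    1: (11, -0.20), 2: (11, -0.18), 3: (29, -0.10), 4: (32, -0.05),
    5: (49, 0.05), 6: (51, 0.08), 7: (69, 0.22), 8: (69, 0.25),
}
pairs = pairwise_differences(example_item)
print(len(pairs))  # number of form pairs for this item
```

With 8 forms there are 28 form pairs per item, which is consistent with the totals of 728 pairs (26 items) and 700 pairs (25 items) reported in Table 2.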
Three-faceted Rasch model
  Response (score) of the examinee to the item: x = 0, 1
  Parameters:
    examinee ability
    difficulty of item i
    difficulty of item i in form j
    difficulty of test form j
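One plausible way to write a model of this kind is sketched below. The symbols $\theta_n$ (ability of examinee $n$), $\delta_i$ (difficulty of item $i$), $\tau_j$ (difficulty of form $j$), and $\delta_{ij}$ (difficulty of item $i$ in form $j$, treated here as an item-by-form term), as well as the additive form of the linear predictor, are assumptions made for illustration; the presenters' parameterisation may differ.

```latex
P(X_{nij} = x) =
  \frac{\exp\{x\,(\theta_n - \delta_i - \tau_j - \delta_{ij})\}}
       {1 + \exp(\theta_n - \delta_i - \tau_j - \delta_{ij})},
  \qquad x \in \{0, 1\}
```

Under this assumed form, the form term absorbs overall differences between test forms, so that the item-by-form term reflects how an item's difficulty shifts from one form to another.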
Results
Table 1. Test form difficulty PS CT IP Form Difficulty SE 1 0.008 0.010 0.041 0.005 2 -0.025 0.053 -0.011 3 -0.034 0.011 4 0.004 -0.046 0.017 5 0.014 0.023 -0.006 6 0.020 -0.014 7 0.027 -0.013 8 0.034 -0.084
Figure 1. Mean of item difficulty estimates by item position order – PS items (axis labels: 11, 29, 32, 49, 51, 69)
Figure 2. Mean of item difficulty estimates by item position order – CT items (axis labels: 11, 29, 32, 48, 51, 70)
Figure 3. Mean of item difficulty estimates by item position order – IP items (axis labels: 10, 27, 30, 47, 50, 68)
Table 2. Frequency of item position change PS CT IP Difference in positions Number of Pairs % 52 7.1 50 2 34 4.7 18 2.6 32 4.4 4 2.5 4.6 20 2.7 13 6 0.8 0.9 15 3 0.4 10 1.4 0.5 17 89 12.2 53 7.6 85 11.7 19 128 17.6 130 18.6 17.9 21 23 3.2 2.1 22 3.0 55 82 58 8.0 25 8 1.1 0.6 7 1.0 36 60 8.2 44 6.3 38 48 6.6 8.6 40 56 7.7 5.7 42 6.0 4.9 2.9 59 61 12 1.6 16 2.2 Total 728 100 700
Figure 4. Mean of item difficulty difference by item position change – PS items
Figure 5. Mean of item difficulty difference by item position change – CT items
Figure 6. Mean of item difficulty difference by item position change – IP items
Table 4. Substantial difference by 0.3 logits
            PS              CT              IP
          Pairs     %     Pairs     %     Pairs     %
  Easier      3    0.4       1    0.1       5    0.7
  Harder    173   23.8      86   12.3     109   15.0
  None      552   75.8     613   87.6     614   84.3
  Total     728  100.0     700  100.0     728  100.0
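The classification behind Table 4 can be sketched as below. The function names are hypothetical, and two details are assumptions: the difference is taken as later position minus earlier position, and a pair counts as substantial only when the absolute difference strictly exceeds 0.3 logits. The counts are the PS column of Table 4.

```python
def classify_shift(difficulty_difference, threshold=0.3):
    """Label one form pair by how the item's difficulty changed (in logits)."""
    if difficulty_difference > threshold:
        return "harder"
    if difficulty_difference < -threshold:
        return "easier"
    return "none"

def percentages(counts):
    """Convert category counts to percentages rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

# PS counts as reported in Table 4.
ps_counts = {"easier": 3, "harder": 173, "none": 552}
ps_percent = percentages(ps_counts)
print(ps_percent)
```

Applied to the PS counts, this reproduces the percentages in the table (0.4, 23.8, and 75.8).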
Figure 7. Correlation between difference of item difficulty estimates and item position change
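A correlation of this kind can be computed as sketched below. The slides do not say which coefficient was used, so Pearson's r is an assumption here; the function name and the data values are hypothetical and serve only to show the calculation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Made-up values: position change and difficulty difference (logits) per pair.
position_changes = [0, 2, 17, 19, 38, 58]
difficulty_diffs = [0.00, 0.02, 0.10, 0.15, 0.22, 0.40]
r = pearson_r(position_changes, difficulty_diffs)
print(round(r, 3))
```

Running the same calculation separately for males and females, or for lower and higher ability groups, would give the group comparisons described in steps 4 and 5.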
Summary
  Items tended to become more difficult when located toward the end of the test.
  The positive relationship between item difficulty difference and position change differed across item domains.
  The relationship was stronger for males than for females.
  The relationship differed between the lower and higher ability groups.
Application
  The findings call for caution in test linking designs and common-item equating.
  In horizontal equating: common items in different test forms should be placed in similar positions.
  In vertical equating: both item positions and differences in ability level should be considered.
  A simple solution: place common items at the beginning of the test.
Further study
  Which kinds of items (by item characteristics) are most vulnerable to changes in item position?
Thank you