Mohamed Dirir, Norma Sinclair, and Erin Strauts

Effects of Item Pool Characteristics on Ability Estimation in Computerized Adaptive Testing in K-12
Mohamed Dirir, Norma Sinclair, and Erin Strauts
Paper presented at the symposium "Operationalizing Multi-State, K-12 Computer Adaptive Testing Programs: Technical Challenges and Solutions"
NCSA, June 24, 2015, San Diego, CA

Purpose
Examine how the difficulty distribution of an item pool affects CAT administration
Examine the effects of item pool difficulty on the accuracy and precision of ability estimation
Inform the development of adequate item pools for CAT
Results from the 2014 SBAC field test guided the choice of study design and pool characteristics

Importance of the Study
CAT has arrived in K-12 large-scale assessment
Over 18 states have administered CAT to millions of students, most of whom were taking a CAT for the first time
These are high-stakes assessments that need stringent quality control in administration, validity, and reliability
Research on CAT in K-12 large-scale assessment has been spotty

Some Questions Addressed by the Study
How does an item bank with a large number of difficult items and few easy items impact the accuracy and precision of ability estimation?
How does an item bank with a large number of difficult items and few easy items impact the exposure of items in a Computer Adaptive Test?
How does an item bank containing items calibrated with varying sample sizes (small, medium, large) impact the accuracy and precision of ability estimation?

Design

Distribution of difficulty level
All with a ~ U(0.6, 1.8), n = 700

Condition               Distribution
Uniformly distributed   U(-3, 3)
Moderately difficult    5% easy U(-3, -1); 47% moderate U(-1, 1); 48% hard U(1, 3)
Mostly difficult        35% moderate U(-1, 1); 60% hard U(1, 3)
Extremely difficult     15% moderate U(-1, 1); 80% hard U(1, 3)

Distribution of # of responses

Condition            No. of responses
Uniform              1500
Unbalanced Counts    56% receive 1800; 20% receive 1500; 17% receive 1000; 7% receive 500
Equal No. of items   25% receive 1800; 25% receive 1500; 25% receive 1000; 25% receive 500
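
As a rough illustration of the design above, the pools and per-item response counts could be generated as sketched below. This is a minimal sketch, not the authors' code. It assumes that a is the discrimination parameter, that the 5% easy share listed for the moderately difficult pool also applies to the mostly and extremely difficult pools (35% + 60% and 15% + 80% each leave 5%), and that the listed shares act as sampling probabilities. The names POOL_SPECS, RESPONSE_SPECS, generate_pool, and assign_response_counts are hypothetical.

```python
import random

# (share, low, high) for each uniform component of the difficulty mixture.
# ASSUMPTION: the 5% easy component is applied to all three non-uniform pools.
EASY = (0.05, -3.0, -1.0)
POOL_SPECS = {
    "uniform": [(1.00, -3.0, 3.0)],
    "moderately_difficult": [EASY, (0.47, -1.0, 1.0), (0.48, 1.0, 3.0)],
    "mostly_difficult": [EASY, (0.35, -1.0, 1.0), (0.60, 1.0, 3.0)],
    "extremely_difficult": [EASY, (0.15, -1.0, 1.0), (0.80, 1.0, 3.0)],
}

# (share, number of responses) for each calibration sample-size condition.
RESPONSE_SPECS = {
    "uniform": [(1.00, 1500)],
    "unbalanced": [(0.56, 1800), (0.20, 1500), (0.17, 1000), (0.07, 500)],
    "equal": [(0.25, 1800), (0.25, 1500), (0.25, 1000), (0.25, 500)],
}


def generate_pool(kind, n_items=700, seed=None):
    """Draw n_items items with a ~ U(0.6, 1.8) and b from the pool's mixture."""
    rng = random.Random(seed)
    spec = POOL_SPECS[kind]
    weights = [share for share, _, _ in spec]
    pool = []
    for _ in range(n_items):
        # Pick a mixture component, then draw the difficulty within its range.
        _, low, high = rng.choices(spec, weights=weights, k=1)[0]
        pool.append({"a": rng.uniform(0.6, 1.8), "b": rng.uniform(low, high)})
    return pool


def assign_response_counts(n_items, kind, seed=None):
    """Assign each item the number of responses used to calibrate it."""
    rng = random.Random(seed)
    spec = RESPONSE_SPECS[kind]
    weights = [share for share, _ in spec]
    return [rng.choices(spec, weights=weights, k=1)[0][1] for _ in range(n_items)]
```

Because shares are sampled rather than fixed, realized counts per difficulty range vary from pool to pool, which is in line with the varying item counts reported per condition.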

Simulation Process
From the sets of generated items (700 per pool), tests were built using a Linear-on-the-Fly (LOFT) process
Test lengths of 25, 35, and 50 items were constructed, but only the 35-item results are presented in this paper
Examinee abilities were generated from N(0,1)
Items were drawn randomly from the pools
Each pool was calibrated with a selected sample, then theta and item estimates were saved
For each pool and sample combination, this process was replicated 100 times
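
The data-generation part of this process can be sketched as follows. The slides do not name the response model, so this sketch assumes a two-parameter logistic (2PL) model, which is consistent with items having both an a and a difficulty parameter. The function names p_correct and simulate_loft are hypothetical, and the calibration step itself is not shown.

```python
import math
import random


def p_correct(theta, a, b):
    """ASSUMED 2PL model: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_loft(pool, n_examinees, test_length=35, seed=None):
    """Simulate LOFT forms: each examinee gets a random draw of items.

    pool is a list of (a, b) tuples. Returns a list of
    (true_theta, {item_index: 0/1 response}) records.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)  # ability drawn from N(0, 1)
        form = rng.sample(range(len(pool)), test_length)  # random form, no repeats
        responses = {
            i: int(rng.random() < p_correct(theta, *pool[i])) for i in form
        }
        records.append((theta, responses))
    return records
```

Repeating this for each pool and calibration-sample combination, with a different seed per replication, would mirror the 100 replications described above.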

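Condition   Simulated Pool Difficulty Parameter Distribution   Calibration Sample Size
1           Extremely Difficult                                Unbalanced
2           Extremely Difficult                                Equal # of Items
3           Extremely Difficult                                Uniform
4           Mostly Difficult                                   Unbalanced
5           Mostly Difficult                                   Equal # of Items
6           Mostly Difficult                                   Uniform
7           Moderately Difficult                               Unbalanced
8           Moderately Difficult                               Equal # of Items
9           Moderately Difficult                               Uniform
10          Uniformly Difficult                                Unbalanced
11          Uniformly Difficult                                Equal # of Items
12          Uniformly Difficult                                Uniform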

Number of Items by Condition and Difficulty Range
Range of True Difficulty
Condition | -3 to -2 | -2 to -1 | -1 to 0 | 0 to 1 | 1 to 2 | 2 to 3 | Grand Total
1: 14 16 50 55 279 236 | 650
2: 17 275 225 | 635
3: 13 15 56 48 281 207 | 621
4: 18 121 124 204 183 | 663
5: 126 119 208 166 | 648
6: 11 21 200 161 | 638
7: 169 160 143 | 672
8: 12 163 130 | 654
9: 165 128 | 652
10: 97 118 120 113 114 99 | 661
11: 94 116 91 | 649
12: 86 110 122 88 | 641

Results - LOFT
Item difficulties were pushed out at the extremes
Difficult items got more difficult while easy items got easier
The mid-range difficulties were more stable across pools and calibration samples

Difference between True Difficulty and Estimated Difficulty
Range of True Difficulty
Condition | -3 to -2 | -2 to -1 | -1 to 0 | 0 to 1 | 1 to 2 | 2 to 3 | Overall
1: 0.045 (.21) (.09) 0.001 (.05) 0.002 (.04) 0.004 (.07) (.17) 0.007 (.10)
2: 0.047 (.22) 0.006 (.12) 0.003 (.05) (.05) (.09) (.28) (.13)
3: 0.034 (.22) (.08) 0.008 (.05) 0.003 (.07) 0.001 (.11) (.23) 0.003 (.13)
4: 0.040 (.16) 0.029 (.08) 0.005 (.04) (.04) (.07) (.17) 0.006 (.09)
5: (.17) 0.001 (.09) (.05) (.09) (.23) (.12)
6: 0.046 (.22) (.12) 0.001 (.07) (.06) (.11) (.27) (.14)
7: 0.065 (.18) 0.004 (.09) (.04) (.18) 0.004 (.10)
8: 0.041 (.24) (.09) 0.004 (.05) (.10) (.21)
9: 0.016 (.19) 0.030 (.13) (.06) (.06) (.11) (.27) (.14)
10: 0.005 (.18) 0.003 (.08) 0.001 (.04) (.08) (.19) (.10)
11: 0.032 (.25) 0.003 (.09) (.05) (.09) (.24) (.13)
12: 0.035 (.26) 0.011 (.12) 0.005 (.06) 0.002 (.06) 0.003 (.10) (.23) 0.005 (.14)
Overall: 0.033 (.21) 0.006 (.10) (.09) (.22) 0.001 (.12)
Note: Values listed are (true - estimated) difficulty averaged across items and banks. Standard deviation of bias averaged over banks is in parentheses.
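
The summary in the note can be illustrated for a single bank as sketched below. This is a simplified sketch under stated assumptions: it computes the mean and standard deviation of (true - estimated) difficulty across the items of one bank, whereas the table additionally averages over banks. The name bias_by_range is hypothetical.

```python
import statistics

# Bin edges matching the table's ranges of true difficulty.
EDGES = [-3, -2, -1, 0, 1, 2, 3]


def bias_by_range(true_b, est_b, edges=EDGES):
    """Mean and SD of (true - estimated) difficulty per true-difficulty range.

    Returns {(low, high): (mean_bias, sd_bias)} for non-empty ranges.
    Items whose true difficulty falls outside the edges are ignored.
    """
    bins = {(lo, hi): [] for lo, hi in zip(edges[:-1], edges[1:])}
    for t, e in zip(true_b, est_b):
        for (lo, hi), diffs in bins.items():
            # Half-open bins, except that the top edge is included in the last bin.
            if lo <= t < hi or (hi == edges[-1] and t == hi):
                diffs.append(t - e)
                break
    summary = {}
    for key, diffs in bins.items():
        if diffs:
            sd = statistics.stdev(diffs) if len(diffs) > 1 else 0.0
            summary[key] = (statistics.fmean(diffs), sd)
    return summary
```

With this sign convention, a positive value means the item was estimated as easier than its true difficulty, and a negative value means it was estimated as harder.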

Results – CAT: Bias and SEM
The average bias in theta was greater at the extremes
Low theta values were underestimated while high theta values were overestimated
All item pools were similar in bias at the high end, while the uniform pool resulted in less bias at the lower end of theta
SEM analyses showed a pattern similar to that of the bias

Bias and SEM of Theta By Pool Difficulty and Ability Range
Pool Difficulty Level | ABILITY RANGE: -1.8 & LOWER | -1.7 TO -0.6 | -0.5 TO 0.5 | 0.6 TO 1.6 | 1.7 & HIGHER
Bias in Theta
EXTREME: 0.021 0.006 -0.001 -0.006
MOSTLY: 0.030 0.004 0.000 -0.005
MODERATE: 0.019 0.009 -0.002 -0.000
UNIFORM (-3,3): 0.007 0.002
Standard Error of Measurement
0.256 0.194 0.175 0.170 0.181 0.258 0.180 0.169 0.251 0.184 0.168 0.167 0.179 0.191
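
A per-ability-range summary of this kind could be computed as sketched below. The slides do not state how bias or SEM were calculated, so this sketch makes two assumptions: bias is (true - estimated) theta, matching the sign convention used for item difficulty, and the root-mean-square error is used as a stand-in for an empirical standard error. True thetas are rounded to one decimal so that they fall into the ranges used in the table. The names ability_bin and bias_and_rmse are hypothetical.

```python
import math

# Ability ranges as labeled in the table; UPPER holds each range's upper bound.
LABELS = ["-1.8 & lower", "-1.7 to -0.6", "-0.5 to 0.5", "0.6 to 1.6", "1.7 & higher"]
UPPER = [-1.8, -0.6, 0.5, 1.6]


def ability_bin(theta):
    """Map a true theta (rounded to one decimal) to its ability-range label."""
    r = round(theta, 1)
    for label, upper in zip(LABELS, UPPER):
        if r <= upper:
            return label
    return LABELS[-1]


def bias_and_rmse(true_thetas, est_thetas):
    """Return {label: (mean of true - estimated, RMSE)} per ability range."""
    groups = {}
    for t, e in zip(true_thetas, est_thetas):
        groups.setdefault(ability_bin(t), []).append(t - e)
    return {
        label: (
            sum(d) / len(d),
            math.sqrt(sum(x * x for x in d) / len(d)),
        )
        for label, d in groups.items()
    }
```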

Bias in Theta Estimation

Standard Error of Theta Estimation Over Replications

Examinee Ability and the Last Item
The effectiveness of the last item in CAT was measured as the difference between the second-to-last ability estimate and the last item’s difficulty parameter
At the lower end of the ability scale, the difficult item pools performed poorly
The pool with uniformly distributed item difficulty resulted in small differences at low abilities
At high ability, all pools performed reasonably
The range in differences between theta and difficulty across ability range was for the uniform pool, and 1.3 to 1.45 for the other pools.

Effectiveness of the Last Item
Pool Type | ABILITY RANGE: -1.8 & LOWER | -1.7 TO -0.6 | -0.5 TO 0.5 | 0.6 TO 1.6 | 1.7 & HIGHER
Extremely Difficult: -1.411 -0.452 -0.028 -0.053 0.026
Mostly Difficult: -1.279 -0.358 0.004 0.003 0.030
Moderately Difficult: -1.423 -0.413 -0.010 0.011 0.028
Uniform (-3,3): -0.101 -0.014 0.001 0.091
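
The last-item measure can be illustrated with a small CAT simulation, sketched below. The slides do not describe the CAT algorithm, so everything here is an assumption made for illustration: a 2PL response model, maximum-information item selection, EAP ability estimation with an N(0,1) prior on a fixed grid, and a starting estimate of 0. The names run_cat and last_item_gap are hypothetical.

```python
import math
import random

GRID = [-4 + 0.1 * k for k in range(81)]  # quadrature points for EAP


def p_correct(theta, a, b):
    """ASSUMED 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def eap(responses):
    """EAP estimate with an N(0, 1) prior; responses is a list of (a, b, u)."""
    weights = []
    for t in GRID:
        logw = -0.5 * t * t  # log prior, up to a constant
        for a, b, u in responses:
            p = p_correct(t, a, b)
            logw += math.log(p if u else 1.0 - p)
        weights.append(math.exp(logw))
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def run_cat(pool, true_theta, test_length, rng):
    """Administer a CAT; return (interim estimates, administered (a, b) items)."""
    remaining = list(pool)
    responses, estimates, administered = [], [], []
    theta_hat = 0.0  # ASSUMED starting value
    for _ in range(test_length):
        # Select the unused item that is most informative at the current estimate.
        item = max(remaining, key=lambda ab: information(theta_hat, *ab))
        remaining.remove(item)
        u = int(rng.random() < p_correct(true_theta, *item))
        responses.append((item[0], item[1], u))
        administered.append(item)
        theta_hat = eap(responses)
        estimates.append(theta_hat)
    return estimates, administered


def last_item_gap(estimates, administered):
    """Second-to-last ability estimate minus the last item's difficulty."""
    return estimates[-2] - administered[-1][1]
```

A large negative gap for a low-ability examinee means the last item was much harder than the working ability estimate, which is what a pool with few easy items would be expected to produce once its easy items are used up.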

Progression through CAT
For low theta, non-uniform difficulty conditions were less likely to give an easier item following an incorrect response
Uniform difficulty conditions were more likely to give an easier item after an incorrect answer
Uniform difficulty pools were more consistent across ability groups than the non-uniform difficulty pools
The average range in chance for an easier item after incorrect answer were: for uniform pool and for extremely difficult pool

Pool Type | Ability Range: -1.8 & LOWER | -1.7 TO -0.6 | -0.5 TO 0.5 | 0.6 TO 1.6 | 1.7 & HIGHER
Likelihood of harder item after correct answer
Extremely Difficult: 0.784 0.734 0.715 0.821 0.867
Mostly Difficult: 0.771 0.738 0.770 0.836
Moderately Difficult: 0.775 0.752 0.795 0.822 0.813
Uniform (-3,3): 0.747 0.766 0.788
Likelihood of easier item after incorrect answer
0.612 0.673 0.721 0.812 0.855 0.607 0.694 0.780 0.819 0.614 0.696 0.804 0.758 0.776 0.781 0.759
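
The two likelihoods in this table can be computed from an examinee's administration record as sketched below. The exact definition used in the study is not given, so this sketch assumes a simple one: for each pair of consecutive items, check whether the next item's difficulty moved in the expected direction given the response. The name progression_rates is hypothetical.

```python
def progression_rates(administered):
    """Compute CAT progression rates from one examinee's item sequence.

    administered is a list of (difficulty, response) in administration order,
    with response 1 for correct and 0 for incorrect. Returns a tuple
    (P(harder next item | correct), P(easier next item | incorrect));
    an entry is None when there was no opportunity to observe it.
    """
    harder_hits = correct_count = 0
    easier_hits = incorrect_count = 0
    for (b_now, u_now), (b_next, _) in zip(administered, administered[1:]):
        if u_now == 1:
            correct_count += 1
            harder_hits += b_next > b_now
        else:
            incorrect_count += 1
            easier_hits += b_next < b_now
    harder_rate = harder_hits / correct_count if correct_count else None
    easier_rate = easier_hits / incorrect_count if incorrect_count else None
    return harder_rate, easier_rate
```

Pooling the counts across all examinees in an ability range, rather than averaging per-examinee rates, would be one way to arrive at a single value per table cell.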

Item Exposure
Conditions with few easy items had high exposure of easy items (10 to 50 percent of students saw the item), while about 1 percent of students saw each of the difficult items
For non-uniform difficulty conditions, the number of times an item was administered was correlated with difficulty, such that less difficult items were administered more often (r ~ .6)
Conditions with uniformly distributed difficulty banks exposed items in the middle of the difficulty distribution the most (5 to 20 percent of students saw the item)
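
Exposure rates and their correlation with difficulty can be computed as sketched below. This is a minimal sketch with hypothetical names (exposure_rates, pearson). It assumes exposure is the share of examinees who saw an item and uses a plain Pearson correlation.

```python
import math


def exposure_rates(administrations, n_items):
    """Share of examinees who saw each item.

    administrations is a list with one entry per examinee, each a list of
    the item indices administered to that examinee.
    """
    counts = [0] * n_items
    for items_seen in administrations:
        for i in set(items_seen):  # count each item once per examinee
            counts[i] += 1
    return [c / len(administrations) for c in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    var_y = sum((yi - my) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)
```

When easier items are administered more often, the correlation between difficulty and exposure computed this way is negative, so the reported r ~ .6 would correspond to its magnitude.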

Conclusion
Practitioners strive to build CAT item pools that are uniformly distributed in difficulty
The goal is to measure all ability levels with good, equitable precision
This paper highlighted what the lack of an ideal item pool, and hence of precision, could result in
This presentation has shown that pools with negatively skewed difficulty distributions may not provide good results for all students in CAT
