Ying (“Alison”) Cheng 1 John Behrens 2 Qi Diao 3 1 Lab of Educational and Psychological Measurement Department of Psychology, University of Notre Dame.

Maximum Information per Unit Time in Adaptive Testing
Ying (“Alison”) Cheng (1), John Behrens (2), Qi Diao (3)
(1) Lab of Educational and Psychological Measurement, Department of Psychology, University of Notre Dame
(2) Center for Digital Data, Analytics, & Adaptive Learning, Pearson
(3) CTB/McGraw-Hill

Test Efficiency
• Weiss (1982): CAT can achieve the same measurement precision as a linear test with half the number of items when the maximum information (MI) method is used for item selection
• Maximum information method (Lord, 1980): choose the item that yields the largest amount of information at the most recent ability estimate
• In effect, MI maximizes information per item
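As a rough illustration of the MI rule described above, the sketch below scores each unused item by its Fisher information at the current ability estimate and picks the largest. It assumes a 2PL model; the function names and the small `pool` of (a, b) pairs are hypothetical and not taken from the presentation.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_mi(theta_hat, pool, administered):
    """Index of the unused item with the largest information at theta_hat."""
    candidates = [j for j in range(len(pool)) if j not in administered]
    return max(candidates, key=lambda j: info_2pl(theta_hat, *pool[j]))

# Hypothetical pool of (a, b) pairs, for illustration only.
pool = [(0.8, -1.0), (1.5, 0.1), (1.2, 1.0), (2.0, 2.5)]
chosen = select_mi(0.0, pool, administered=set())
```

With this toy pool the rule favors the item that combines a high discrimination with a difficulty near the current estimate, which is why MI tends to lean on highly discriminating items.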

Test Efficiency
• All tests are timed
• Maximum information given a time limit: choose the item that yields the largest ratio of information to time required
• Maximum information per unit time (MIPUT) (Fan, Wang, Chang, & Douglas, 2013)
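The ratio criterion can be sketched as follows. This is a minimal illustration, not the exact index of Fan et al. (2013): it assumes a 2PL item and a lognormal response-time model in which ln T ~ N(beta - tau, 1/alpha^2), so the expected time is exp(beta - tau + 1/(2 alpha^2)). The parameter names (`beta` for time intensity, `alpha` for time discrimination, `tau` for examinee speed) and the toy pool are assumptions made for the example.

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def expected_time(tau, beta, alpha):
    """Mean of a lognormal response time with ln T ~ N(beta - tau, 1 / alpha**2)."""
    return math.exp(beta - tau + 1.0 / (2.0 * alpha ** 2))

def select_miput(theta_hat, tau_hat, pool, administered):
    """Index of the unused item with the largest information-to-time ratio."""
    def ratio(j):
        item = pool[j]
        info = info_2pl(theta_hat, item["a"], item["b"])
        return info / expected_time(tau_hat, item["beta"], item["alpha"])
    candidates = [j for j in range(len(pool)) if j not in administered]
    return max(candidates, key=ratio)

# Hypothetical pool: items 0 and 1 are equally informative, item 1 is faster,
# item 2 is the most informative but as slow as item 0.
pool = [
    {"a": 1.0, "b": 0.0, "beta": 4.5, "alpha": 2.0},
    {"a": 1.0, "b": 0.0, "beta": 3.5, "alpha": 2.0},
    {"a": 1.2, "b": 0.0, "beta": 4.5, "alpha": 2.0},
]
fastest_per_info = select_miput(0.0, 0.0, pool, set())
```

In this toy pool plain MI would pick item 2 (largest discrimination), whereas the ratio rule picks the quicker item 1, which illustrates how time-saving items can come to dominate selection.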

MI vs. MIPUT

Implementation of MIPUT

Performance of MIPUT
• Fan et al. (2013) showed that, compared with the MI method, the MIPUT method leads to: i) shorter testing time; ii) a small loss of measurement precision; iii) visibly worse item pool usage
• Fan et al. (2013) combined a-stratification (Chang & Ying, 1999) with the MIPUT method to balance item pool usage and found it effective

a-stratification
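For readers unfamiliar with a-stratification (Chang & Ying, 1999), the basic idea can be sketched as below: the pool is split into strata of ascending discrimination, early stages of the test draw from low-a strata, and within a stratum the item whose difficulty is closest to the current ability estimate is chosen. The helper names and the six-item pool are hypothetical, and the sketch ignores refinements such as blocking on difficulty.

```python
def build_strata(pool, n_strata):
    """Split item indices into n_strata groups of ascending discrimination."""
    order = sorted(range(len(pool)), key=lambda j: pool[j][0])
    n = len(order)
    return [order[k * n // n_strata:(k + 1) * n // n_strata]
            for k in range(n_strata)]

def select_a_stratified(theta_hat, pool, strata, stage, administered):
    """Within the stratum for this stage, pick the unused item with b closest to theta_hat."""
    candidates = [j for j in strata[stage] if j not in administered]
    return min(candidates, key=lambda j: abs(pool[j][1] - theta_hat))

# Hypothetical pool of (a, b) pairs, for illustration only.
pool = [(0.5, 0.0), (1.8, 0.1), (0.7, 1.0), (1.2, -0.5), (2.0, 0.0), (1.0, 0.3)]
strata = build_strata(pool, n_strata=3)
first_pick = select_a_stratified(0.8, pool, strata, stage=0, administered=set())
```

Because highly discriminating items are held back until the ability estimate has stabilized, the low-a items that MI rarely touches get used early in the test.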

Questions that Remain
Fan et al. (2013) simulated item pools in which:
• Item difficulty and time intensity are either correlated or uncorrelated
• Item discrimination and difficulty are uncorrelated
• Item discrimination and time intensity are uncorrelated
In reality:
• Item discrimination and difficulty are positively correlated (~.4-.6) (Chang, Qian, & Ying, 2001)
Q1: How about item discrimination and time intensity?

Follow-Up Questions
Q2: If item discrimination and time intensity are indeed related:
• Will MIPUT still lead to worse item pool usage than MI?
• If so, is that still due to highly discriminating items, or to highly time-saving items?
Q3: Under the 1PL model, where the item discrimination parameter is not a factor:
• Will MIPUT still lead to worse item pool usage than MI?
• If so, is that due to highly time-saving items?
• If so, how can we control item exposure?

Q1: Item Discrimination and Time Intensity
Calibration of a large item bank
• Online math testing data
• 595 items
• Over 2 million entries of testing data
• 3PL and 2PL models were fitted; the following analysis focuses on the 2PL
• Time intensity is measured by the log-transformed average time on each item
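The time-intensity measure described above (the log of the average time spent on each item) can be computed along these lines. This is only a sketch: the input layout (a dictionary mapping item identifiers to lists of observed response times) and the time unit are assumptions, since the presentation does not describe the data format.

```python
import math

def time_intensity(times_by_item):
    """Log of the mean observed response time for each item with at least one record."""
    return {
        item: math.log(sum(times) / len(times))
        for item, times in times_by_item.items()
        if times
    }

# Hypothetical response times (in seconds) for two items.
observed = {"item_001": [10.0, 30.0], "item_002": [60.0, 90.0, 120.0]}
intensity = time_intensity(observed)
```

Note that the log of the average time is not the same as the average of the log times; the sketch follows the wording on the slide.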

Q1: Item Discrimination and Time Intensity

| | 2PL_a | 2PL_b | 3PL_a | 3PL_b | 3PL_c | Time Intensity |
|---|---|---|---|---|---|---|
| 2PL_a | 1 | .111** | .702** | .009 | -.584** | .139 |
| 2PL_b | .111** | 1 | .387** | .935** | -.369** | .562 |
| 3PL_a | .387** | .702** | 1 | .350** | -.363** | .205 |
| 3PL_b | .935** | .009 | .350** | 1 | -.226** | .564 |
| 3PL_c | -.369** | -.584** | -.363** | -.226** | 1 | -.0425 |
| Time Intensity | .522** | .080 | .153** | .522** | -.355** | 1 |

Q2
So item discrimination and time intensity are indeed related. Then:
• Will MIPUT still lead to worse item pool usage than MI?
• If so, is that still due to highly discriminating items, or to highly time-saving items?

A Simplified Version of MIPUT

Simulation Details
CAT simulation
• Test length: 20 or 40
• First item randomly chosen from the pool
• 5,000 test takers ~ N(0,1)
• Ability update: EAP with a prior of N(0,1)
• No exposure control or content balancing unless specified otherwise
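The EAP ability update listed above can be approximated on a grid, as in the sketch below. It assumes a 2PL response model and an evenly spaced quadrature grid on [-4, 4]; the grid settings and function names are illustrative choices, not details reported in the presentation.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_estimate(responses, items, n_points=61, lo=-4.0, hi=4.0):
    """Posterior mean of theta on an evenly spaced grid with a N(0, 1) prior.

    responses: list of 0/1 scores; items: matching list of (a, b) pairs.
    """
    grid = [lo + (hi - lo) * k / (n_points - 1) for k in range(n_points)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) density
        for u, (a, b) in zip(responses, items):
            p = p_2pl(theta, a, b)
            w *= p if u == 1 else 1.0 - p   # likelihood of the observed response
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total

# One correct answer on a hypothetical item pulls the estimate above the prior mean.
theta_after_correct = eap_estimate([1], [(1.0, 0.0)])
```

Because the prior is proper, the estimate is defined even after a single response or an all-correct pattern, which is convenient early in an adaptive test.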

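Q2

| | 20-Item MI_2PL | 20-Item MIPUT_2PL | 40-Item MI_2PL | 40-Item MIPUT_2PL |
|---|---|---|---|---|
| Bias | .002 | .003 | .001 | .002 |
| MSE | .019 | .020 | .012 | .012 |
| | .991 | .990 | .994 | .994 |
| Chi-square | 95.29 | 96.41 | 81.00 | 84.10 |
| No exposure | 73.1% | 71.9% | 50.9% | 52.1% |
| Underexposed (<.02) | 77.8% | 78.2% | 57.0% | 58.2% |
| Overexposed (>.20) | 4.37% | 5.04% | 11.6% | 11.8% |
| Average time used (mins) | 38.596 | 34.434 | 79.346 | 70.112 |
| Min testing time | 17.857 | 16.374 | 37.039 | 34.383 |
| Max testing time | 84.035 | 79.969 | 79.346 | 146.773 |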
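The exposure rows in the table can be computed from simulation logs roughly as follows. The sketch assumes the chi-square index is the usual sum over items of (exposure rate - L/N)^2 / (L/N), with L the test length and N the pool size, as in Chang & Ying (1999); the presentation does not spell out its formula, so treat that as an assumption. The function name and input layout are hypothetical.

```python
def exposure_summary(administered, pool_size, test_length):
    """Summarize item pool usage.

    administered: one list of item indices per simulated examinee.
    """
    n_examinees = len(administered)
    counts = [0] * pool_size
    for items in administered:
        for j in items:
            counts[j] += 1
    rates = [c / n_examinees for c in counts]
    expected = test_length / pool_size  # exposure rate under perfectly even usage
    return {
        "chi_square": sum((r - expected) ** 2 / expected for r in rates),
        "no_exposure": sum(r == 0 for r in rates) / pool_size,
        "underexposed": sum(r < 0.02 for r in rates) / pool_size,
        "overexposed": sum(r > 0.20 for r in rates) / pool_size,
    }

# Toy case: two examinees both receive items 0 and 1 from a four-item pool.
summary = exposure_summary([[0, 1], [0, 1]], pool_size=4, test_length=2)
```

A chi-square of zero would mean every item is used equally often; larger values indicate more skewed pool usage.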

Findings
• On average, MIPUT leads to shorter tests than MI: about 4 minutes shorter (10%) for a 20-item test and about 9 minutes shorter (11%) for a 40-item test
• MIPUT leads to slightly worse exposure control
• When item discrimination and time intensity are positively related, the disadvantage of MIPUT in exposure control becomes less conspicuous
• MI and MIPUT differ negligibly in measurement precision
• Over-exposure is still largely attributable to highly discriminating items

Q3
Under the 1PL model, where the item discrimination parameter is not a factor:
• Will MIPUT still lead to worse item pool usage than MI?
• If so, is that due to highly time-saving items?
• If so, how can we control item exposure?

Test Length = 20

| | MI_1PL | MIPUT_1PL | MIPUTR5_1PL | MIPUTPR_1PL |
|---|---|---|---|---|
| Bias | -.001 | .001 | .007 | -.004 |
| MSE | .074 | .078 | .081 | .075 |
| | .963 | .962 | .960 | .963 |
| Chi-square | 23.31 | 152.67 | 25.59 | 90.21 |
| No exposure | 26.1% | 78.3% | 36.1% | 43.2% |
| Underexposed (<.02) | 59.2% | 82.5% | 56.0% | 75.8% |
| Overexposed (>.20) | 0 | 5.38% | 0 | 4.03% |
| Average time used (mins) | 38.909 | 17.594 | 26.643 | 20.023 |
| Min testing time | 17.909 | 11.547 | 16.079 | 12.388 |
| Max testing time | 94.407 | 53.557 | 73.127 | 63.044 |

Findings if Test Length = 20
MI vs. MIPUT
• Negligible difference in measurement precision
• MIPUT reduces testing time by 21 minutes for a 20-item test (a 55% reduction)
But
• MIPUT leads to much worse exposure control
• Highly time-saving items are favored
  o Correlation between exposure rate and time intensity under MI-1PL: -.240 (an artifact of the item bank)
  o Correlation between exposure rate and time intensity under MIPUT-1PL: -.398

Exposure Control
a-stratification is not going to work
Randomesque (Kingsbury & Zara, 1989)
• Randomly choose one of the n best items, e.g., n = 5
• MIPUT-R5
Progressive Restricted (Revuelta & Ponsoda, 1998)
• A weighted index, with the weight determined by the stage of the test
• Combines a random number and the time-adjusted item information
• Higher weight is given to the time-adjusted item information later in the test
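The two selection rules above can be sketched as follows, taking as input the time-adjusted information of each eligible item. The randomesque function follows the "one of the n best" description directly. The progressive function is only a simplified illustration of the weighting idea (a random component that fades as the test proceeds); it assumes a linear weight equal to position / test length and omits the exposure-rate restriction of the full progressive-restricted method. All names are hypothetical.

```python
import random

def select_randomesque(values, n_best=5, rng=random):
    """Pick at random among the n_best items with the largest values.

    values: dict mapping eligible item index -> time-adjusted information.
    """
    top = sorted(values, key=values.get, reverse=True)[:n_best]
    return rng.choice(top)

def select_progressive(values, position, test_length, rng=random):
    """Pick the item maximizing a weighted sum of a random number and its value.

    position: 0-based position of the item being selected, so the weight on
    the time-adjusted information grows from 0 toward 1 during the test.
    """
    s = position / test_length
    v_max = max(values.values())

    def index(j):
        # Random component drawn on the same scale as the information values.
        return (1.0 - s) * rng.uniform(0.0, v_max) + s * values[j]

    return max(values, key=index)

# Hypothetical time-adjusted information values for five eligible items.
values = {0: 0.90, 1: 0.10, 2: 0.80, 3: 0.70, 4: 0.05}
rng = random.Random(0)
pick = select_randomesque(values, n_best=2, rng=rng)
```

Passing an explicit `random.Random` instance keeps a simulation reproducible; with `n_best=1` the randomesque rule reduces to plain MIPUT.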

Test Length = 20

| | MI_1PL | MIPUT_1PL | MIPUTR5_1PL | MIPUTPR_1PL |
|---|---|---|---|---|
| Bias | -.001 | .001 | .007 | -.004 |
| MSE | .074 | .078 | .081 | .075 |
| | .963 | .962 | .960 | .963 |
| Chi-square | 23.31 | 152.67 | 25.59 | 90.21 |
| No exposure | 26.1% | 78.3% | 36.1% | 43.2% |
| Underexposed (<.02) | 59.2% | 82.5% | 56.0% | 75.8% |
| Overexposed (>.20) | 0 | 5.38% | 0 | 4.03% |
| Average time used (mins) | 38.909 | 17.594 | 26.643 | 20.023 |
| Min testing time | 17.909 | 11.547 | 16.079 | 12.388 |
| Max testing time | 94.407 | 53.557 | 73.127 | 63.044 |

Findings if Test Length = 20
MIPUT_R5
• Maintains measurement precision
• Much better exposure control
• Reduces testing time on average by 12 minutes (a reduction of more than 30%)
MIPUT_PR
• Maintains measurement precision
• Better exposure control than MIPUT, but still not as good as MIPUT_R5
• Reduces testing time on average by 18 minutes (a reduction of almost half)

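Test Length = 40

| | MI_1PL | MIPUT_1PL | MIPUTR5_1PL | MIPUTPR_1PL |
|---|---|---|---|---|
| Bias | -.003 | -.001 | .003 | .004 |
| MSE | .038 | .040 | .045 | .040 |
| Chi-square | 30.75 | 135.37 | 17.73 | 99.31 |
| No exposure | 3.4% | 60.7% | 7.7% | 15.6% |
| Underexposed (<.02) | 25.7% | 66.9% | 18.3% | 59.2% |
| Overexposed (>.20) | 3.9% | 13.4% | 0 | 12.3% |
| Average time used (mins) | 77.660 | 41.192 | 67.903 | 45.223 |
| Min testing time | 39.451 | 27.300 | 48.718 | 29.181 |
| Max testing time | 162.889 | 132.986 | 136.585 | 137.489 |

An unlabeled row between MSE and Chi-square retains only three of its four values (.981, .978, .981), so its column assignment cannot be recovered.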

Findings if Test Length = 40
• The same findings are replicated when the test length doubles
• MIPUT leads to much worse item pool usage because of its overreliance on time-saving items
MIPUT_R5
• Maintains measurement precision
• Much better exposure control
• Reduces testing time on average by 13%
MIPUT_PR
• Maintains measurement precision
• Better exposure control than MIPUT, but still not as good as MIPUT_R5
• Reduces testing time on average by 41%

Overall Summary
• MIPUT’s time-saving advantage is more conspicuous under the 1PL
• MIPUT leads to much worse item pool usage than MI and relies heavily on time-saving items
• MIPUT_R5 is a promising method: it maintains measurement precision, balances item pool usage, and still keeps the time-saving advantage

Future Directions
• Develop an exposure control method for MIPUT that parallels a-stratification: stratifying by time
• Investigate the performance of the simplified MIPUT and the original MIPUT when the assumptions of the log-normal model for response time are violated
• Conduct more data analysis to explore the relationship between time intensity and item parameters
• Control total testing time (van der Linden & Xiong, 2013)

Thank You!
CTB/McGraw-Hill 2014 R&D Grant
For questions or the paper, please visit irtnd.wikispaces.com
