Published by Tobias Higgins. Modified over 9 years ago.
1
Conceptual Issues in Response-Time Modeling Wim J. van der Linden CTB/McGraw-Hill
2
Outline Traditions of RT modeling RTs fixed or random? Item completion, responses, and RTs RT and speed Speed and ability
3
Outline Cont’d RT and item difficulty Dependences between responses and RTs Hierarchical model of responses and RTs Applications to testing problems
4
Traditions of RT Modeling Four different traditions –No model –Distinct models for RTs –Response models with RT parameters –RT models in mathematical psychology Alternative –Hierarchical model of responses and RTs Test design –Fixed tests –Adaptive tests –Test accommodations And many more
5
Fixed or Random RTs Some models treat RTs as fixed quantities: –Roskam (1987, 1997); Thurstone (1937) RTs treated as random in psychology Random responses but fixed RTs seem contradictory Conclusion 1: Just like responses, RTs on test items should be treated as realizations of random variables
6
Item Completion, Response, and RT Rasch (1960) models for misreadings and reading speed –Poisson-gamma framework –Same notation and terminology for parameters in both types of models “To which extent the two difficulty parameters … and the two ability parameters … run parallel is a question to be answered by empirical results, and at present we shall leave it open.” (Rasch, 1960, p. 42)
7
Item Completion, Response, and RT Cont’d Notion of equivalent scores of speed tests (Gulliksen, 1960; Woodbury, 1951, 1963): –Total time on a fixed number of items –Number of items correct in a fixed time interval Three types of variables required to describe test behavior: –$T_{ij}$: response time (person j and item i) –$U_{ij}$: response
8
Item Completion, Response, and RT Cont’d Three sets of variables (cont’d) –$D_{ij}$: item completion (design variable) $U_{ij}$ and $D_{ij}$ have different distributions –Same holds for their sums $N_U$: number-correct scores $N_D$: number of items completed Equivalence only when $\Pr\{U_{ij}=1 \mid D_{ij}=1\}=1$ for all items and persons
9
Item Completion, Response, and RT Cont’d Distinction between speed and power tests makes no sense; all tests are hybrids Conclusion 2: $T_{ij}$, $U_{ij}$, and $D_{ij}$ are random variables with different distributions. The same holds for their sums: total time ($T$), number correct ($N_U$), and number completed ($N_D$). Except for discreteness, $T$ and $N_D$ are inversely related. (We’ll assume $T$ and $N_U$ to be independent!)
10
RT and Speed Speed and time are not equivalent notions Generally, speed is a rate of change of some measure with respect to time
11
RT and Speed Cont’d For achievement testing, an appropriate notion of speed is cognitive speed Fundamental equation: $\tau_j^{*} = \beta_i^{*} / t_{ij}$, where $\beta_i^{*}$ is the amount of labor required (“time intensity”) by item i, $\tau_j^{*}$ is the speed of person j, and $t_{ij}$ is the response time of person j on item i
12
RT and Speed Cont’d Lognormal RT model (van der Linden, 2006) –Log transformation to remove skewness from RT distributions –Addition of random term
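The two steps named on this slide can be written out as a short derivation. This is a sketch in the notation of the lognormal model (van der Linden, 2006); the starred symbols for the untransformed speed and time intensity are notation assumed here, not taken from the transcript.

```latex
\begin{align*}
t_{ij} &= \frac{\beta_i^{*}}{\tau_j^{*}}
  && \text{(time = labor / speed)} \\
\ln t_{ij} &= \beta_i - \tau_j,
  \qquad \beta_i = \ln \beta_i^{*},\; \tau_j = \ln \tau_j^{*}
  && \text{(log transformation)} \\
\ln T_{ij} &= \beta_i - \tau_j + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim N\!\left(0, \alpha_i^{-2}\right)
  && \text{(random term added)}
\end{align*}
```

Here $\alpha_i$ plays the role of a discrimination parameter: the larger it is, the less the log RT scatters around $\beta_i - \tau_j$.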
13
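RT and Speed Cont’d Lognormal RT model: $f(t_{ij}; \tau_j, \alpha_i, \beta_i) = \frac{\alpha_i}{t_{ij}\sqrt{2\pi}} \exp\left\{-\tfrac{1}{2}\left[\alpha_i\left(\ln t_{ij} - (\beta_i - \tau_j)\right)\right]^2\right\}$, with $\tau_j$ the speed of person j and $\beta_i$ the time intensity of item i (both on the log scale), and $\alpha_i$ the discrimination of item i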
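A minimal simulation sketch of the lognormal RT model may help fix ideas. It assumes the usual parameterization in which the log RT is normal with mean β − τ and standard deviation 1/α; the function names and parameter values below are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_rt(tau, beta, alpha, rng):
    """Draw one RT from ln T = beta - tau + eps, eps ~ N(0, 1/alpha**2)."""
    return math.exp(rng.gauss(beta - tau, 1.0 / alpha))


def expected_rt(tau, beta, alpha):
    """Mean of a lognormal variable: exp(mu + sigma**2 / 2)."""
    return math.exp(beta - tau + 0.5 / alpha ** 2)


rng = random.Random(1)  # fixed seed so the run is reproducible
# Hypothetical values: tau = 0 (speed), beta = 4 (time intensity), alpha = 2.
rts = [sample_rt(0.0, 4.0, 2.0, rng) for _ in range(20000)]
mean_log_rt = sum(math.log(t) for t in rts) / len(rts)
mean_rt = sum(rts) / len(rts)
print(round(mean_log_rt, 2), round(mean_rt, 1), round(expected_rt(0.0, 4.0, 2.0), 1))
```

The mean log RT should sit near β − τ, while the mean RT on the original scale is larger than exp(β − τ) because of the skewness that the log transformation removes.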
14
RT and Speed Cont’d Conclusion 3: RT and speed are different concepts related through a fundamental equation. RT models with a speed parameter should also have an item parameter for their amount of cognitive labor (or time intensity)
15
Speed and Ability Speed-accuracy tradeoff in psychology is the same as a speed-ability tradeoff in achievement testing –Negative within-person correlation between τ and θ –Change of speed required for tradeoff to become manifest Traditional IRT view of a person’s ability is of θ as a scale point, not as a function θ=θ(τ) –Effective ability level
16
Speed and Ability Cont’d At group level, any correlation between ability and speed may occur Basic assumption: constancy of speed during the test –Constant speed implies constant ability (ceteris paribus) In practice, speed and ability always fluctuate somewhat, but fluctuations should be minor and unsystematic
17
Speed and Ability Cont’d Conclusion 4: Speed and ability are related through a distinct function θ=θ(τ) for each test taker. The function itself need not be incorporated into the response and RT models. But these models do require (fixed) parameters for the effective ability and speed of the test takers.
18
RT and Item Difficulty Descriptive research and speed-accuracy tradeoff suggest correlation between RT and item difficulty –Item difficulty parameter in RT model? –Counterexample Item parameters in response and RT models are for different item effects (on probability of correct response and time, respectively)
19
RT and Item Difficulty Cont’d Latent vs. manifest effect parameters –Danger of reification of latent effects Conclusion 5: RT models require item parameters for their time intensity but difficulty parameters belong in response models
20
Dependences between Responses and RTs Descriptive vs. experimental studies However, these studies necessarily involve data aggregation across items and/or persons –Spurious correlations due to hidden sources of covariation (item and person parameters) Marginal vs. conditional independence between responses (spurious correlation, Simpson’s paradox, etc.)
21
Dependences between Responses and RTs Cont’d Conclusion 6: Regular test behavior is characterized by three different types of conditional (or “local”) independence, namely between –responses on different items –RTs on different items –responses and RTs on the same item
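Taken together, the three types of conditional independence amount to a factorization of the joint distribution of one test taker's responses and RTs. The display below is a sketch in assumed notation: $\mathbf{u}_j$ and $\mathbf{t}_j$ are the response and RT vectors of person j on n items, and item parameters are suppressed.

```latex
f(\mathbf{u}_j, \mathbf{t}_j \mid \theta_j, \tau_j)
  = \prod_{i=1}^{n} p(u_{ij} \mid \theta_j)\, f(t_{ij} \mid \tau_j)
```

The product over items expresses independence between responses on different items and between RTs on different items; the split of each factor into a response part and an RT part expresses independence between the response and the RT on the same item.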
22
Dependences between Responses and RTs Cont’d For these conditional independencies to hold for an entire test, constant speed is a necessary condition Empirical results
23
Hierarchical Model of Responses and RTs Distinct models for responses and RTs for a fixed person and item –Regular IRT model –E.g., lognormal model for RTs –Models should have: parameters for effective ability and speed; parameters for item difficulty and time intensity; conditional independence
24
Hierarchical Model of Responses and RTs Cont’d Second-level models for dependences between –ability and speed across persons –difficulty and time intensity across items Multivariate normal distributions (possibly after parameter transformation)
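The two levels can be sketched in a few lines of simulation code. Everything specific here is an assumption made for illustration: a bivariate normal for (θ, τ) with correlation ρ, a 2PL response model standing in for the "regular IRT model", the lognormal model for RTs, and hypothetical function names and parameter values. The second-level distribution of the item parameters is left out for brevity.

```python
import math
import random


def sample_person(rho, rng, sd_theta=1.0, sd_tau=0.3):
    """Second level: draw (theta, tau) from a bivariate normal with correlation rho."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    theta = sd_theta * z1
    tau = sd_tau * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return theta, tau


def simulate_item(theta, tau, a, b, alpha, beta, rng):
    """First level: response and RT are drawn independently given theta and tau."""
    p_correct = 1.0 / (1.0 + math.exp(-a * (theta - b)))  # 2PL (assumed)
    u = 1 if rng.random() < p_correct else 0
    t = math.exp(rng.gauss(beta - tau, 1.0 / alpha))      # lognormal RT
    return u, t


def pearson(x, y):
    """Plain Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)
persons = [sample_person(0.5, rng) for _ in range(5000)]
r_theta_tau = pearson([p[0] for p in persons], [p[1] for p in persons])
u, t = simulate_item(*persons[0], a=1.2, b=0.0, alpha=2.0, beta=4.0, rng=rng)
print(round(r_theta_tau, 2), u, round(t, 1))
```

In this sketch the dependence between ability and speed lives only in the second-level distribution; for a fixed person, the response and the RT on an item are generated independently.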
25
Hierarchical Model of Responses and RTs Cont’d Bayesian treatment of modeling framework –Parameter estimation and model fit analysis with MCMC (Gibbs sampler) –Plug-and-play approach –Calibration of items with respect to RT parameters is straightforward –R package available upon request (Fox, Klein Entink, & van der Linden, 2007; Klein Entink, Fox, & van der Linden, 2009)
26
Applications to Testing Problems Test design Adaptive testing –Item selection –Differential speededness Detection of cheating –Item memorization and preknowledge –Collusion
27
Applications to Testing Problems Use of RTs as collateral information in parameter estimation Cognitive research on problem solving Etc.
28
No RT Model Descriptive studies in educational testing –Correlation between responses and RTs –Regression of RT on item and person attributes Word counts, IRT item parameters, etc. Number-correct scores; ability estimates Experimental studies in psychology –Manipulation of task or conditions
29
No RT Model Cont’d Experimental reaction-time research (cont’d) –Speed-accuracy tradeoff (Luce, 1986) –Plot of proportion of correct responses against RT
30
No RT Model Cont’d Problems –Spurious correlations between observed RTs –Speed-accuracy tradeoff is not a between-person phenomenon
31
Stroop Test Green
32
Stroop Test Cont’d Blue
33
Spurious Relations RTs of two arbitrary students on a quantitative reasoning test Subject 1: 22, 19, 40, 43, 27, 27, 45, 23, 14, … Subject 2: 26, 38, 101, 57, 37, 21, 116, 44, 10, …
34
Spurious Relations Cont’d RTs of two arbitrary students on a quantitative reasoning test Subject 1: 22, 19, 40, 43, 27, 27, 45, 23, 14, … Subject 2: 26, 38, 101, 57, 37, 21, 116, 44, 10, … r=.89
35
Spurious Relations Cont’d RTs of two arbitrary students on a quantitative reasoning test Subject 1: 22, 19, 40, 43, 27, 27, 45, 23, 14, … Subject 2: 26, 38, 101, 57, 37, 21, 116, 44, 10, … r=.89 Responses of same students Subject 1: 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, … Subject 2: 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, … r=.20
36
Spurious Relations Cont’d RTs of two arbitrary students on a quantitative reasoning test Subject 1: 22, 19, 40, 43, 27, 27, 45, 23, 14, … Subject 2: 26, 38, 101, 57, 37, 21, 116, 44, 10, … Responses of same students Subject 1: 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, … Subject 2: 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, … r=.21
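As a rough check on the RT correlation reported for these two students, the Pearson correlation can be computed on the nine RTs that are actually displayed. This is only a sketch: the series on the slides are truncated ("…"), so the result need not reproduce the slide's value exactly, and the variable names are hypothetical.

```python
import math

# Only the nine RTs shown on the slide; the full series is not available.
rt_subject_1 = [22, 19, 40, 43, 27, 27, 45, 23, 14]
rt_subject_2 = [26, 38, 101, 57, 37, 21, 116, 44, 10]


def pearson(x, y):
    """Plain Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_displayed = pearson(rt_subject_1, rt_subject_2)
print(round(r_displayed, 2))
```

For the displayed values the correlation comes out strongly positive, somewhat below the slide's r=.89, which was presumably computed on the full series. The two students worked independently, so a correlation of this size reflects the items they share (some items simply take everyone longer), not any relation between the students.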
37
Distinct Models for RTs Rasch’s (1960) models for reading speed Exponential models –Oosterloo (1975); Scheiblechner (1979) Gamma models –Maris (1993); Pieters & van der Ven (1982) Weibull models –Tatsuoka & Tatsuoka (1980)
38
Rasch’s Models Poisson distribution of number of reading errors a in a text of N words Gamma distribution of reading time for text of N words
39
Response Models with RT Parameters This type of model is mostly motivated by attempts to build the speed-accuracy tradeoff into the response model Response surface in Thurstone (1937) Logistic models –Roskam (1987; 1997); Verhelst, Verstralen & Janssen (1997)
40
Response Models with RT Parameters We also have RT models that incorporate response parameters –E.g., lognormal models by Gaviria (2005) and Thissen (1982)
41
Thurstone’s Response Surface
42
Roskam’s Model (1997) RT “Speed-accuracy tradeoff” Item difficulty Ability
43
Models for underlying psychological processes –Diffusion models –Models for sequential and parallel processing Experimental data –Standardized task –Assumption of exchangeable subjects No subject or item parameters Test design –Fixed tests –Adaptive tests –Test accommodations And many more RT Models in Mathematical Psychology
44
RT and Speed Item 1: 375 + 229 (time: 9 sec; speed: ?) Item 2: 375 + 229 + 58 + 39 (time: 12 sec; speed: ?)
45
Speed-Ability Tradeoff [Figure: within-person relation; axes Speed and Ability]
46
Speed-Ability Tradeoff Cont’d [Figure: axes Speed and Ability; curves labeled lower ability and higher ability]
47
[Figure: θ=θ(τ) plotted against speed τ, with effective speed and effective ability marked]
48
Speed-Ability Tradeoff Cont’d [Scatter plot: axes Speed and Ability]
49
Speed-Ability Tradeoff Cont’d [Scatter plot: axes Speed and Ability]
50
Speed-Ability Tradeoff Cont’d [Scatter plot: axes Speed and Ability]
51
RT and Item Difficulty Item 1: 375 + 229 Item 2: 375 + 229 + 58 + 39
52
[Diagram: Response, with Person (ability) and Item (difficulty); RT, with Person (speed) and Item (time intensity)]
53
[Diagram: Response, with Person (ability) and Item (difficulty); RT, with Person (speed) and Item (time intensity); Distribution of Item Parameters; Distribution of Person Parameters]
54
Test Design So far, issues of test speededness have been dealt with intuitively, with post hoc evaluation of time limits Alternatively, the time parameters of the items can be used to assemble a test to have a prespecified level of speededness –Example for LSAT
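One way to see how item time parameters bear on speededness is to compute the expected total time on a candidate test for test takers working at different speeds. The sketch below assumes lognormal RTs, whose mean is exp(β − τ + 1/(2α²)); the item values, the time limit, and the function name are hypothetical, and this is not the assembly procedure used in the LSAT example.

```python
import math


def expected_total_time(tau, items):
    """Sum of lognormal means exp(beta - tau + 1/(2*alpha**2)) over (alpha, beta) pairs."""
    return sum(math.exp(beta - tau + 0.5 / alpha ** 2) for alpha, beta in items)


# Hypothetical (alpha, beta) pairs for four items; times in seconds.
items = [(2.0, 3.8), (1.8, 4.1), (2.2, 4.4), (2.0, 3.5)]
time_limit = 300.0  # hypothetical limit for this short test

for tau in (-0.5, 0.0, 0.5):
    total = expected_total_time(tau, items)
    print(tau, round(total, 1), total <= time_limit)
```

A test assembled with these parameters in hand can be tuned so that the expected time stays within the limit for the range of speeds considered relevant, rather than checking the limit only after administration.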
55
New Test Equally Speeded as Reference Test [Figure: new test and reference test at τ=0]
56
Adaptive Testing Application 1: use responses and RTs during the test to select the next item –Posterior predictive density of responses on candidate item given previous responses and RTs –Example for LSAT (simulation) Application 2: select items to prevent speededness of test –Example for ASVAB
57
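Response and RTs in Adaptive Testing [Diagram: response $U_{ij}$ with item parameters $a_i$, $b_i$, $c_i$ and a person parameter; RT $T_{ij}$ with a person parameter and two item parameters; population and item-domain distributions at the second level]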
58
Mean Square Error in Ability Estimates [Figure: MSE plotted against θ; panels n=10 and n=20; curves for no RTs, ρ=.2, and ρ=.8]
59
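Time Used to Complete Test (Without Constraint) [Figure: axes Time and Speed; curves labeled =2 and =-2; time limit 39 min]
60
Time Used to Complete Test (With Constraint) [Figure: axes Time and Speed; curves labeled =-2 and =2; time limit 39 min]
61
Time Used to Complete Test (With Constraint) [Figure: axes Time and Speed; curves labeled =-2 and =2; time limit 34 min]
62
Time Used to Complete Test (With Constraint) [Figure: axes Time and Speed; curves labeled =-2 and =2; time limit 29 min]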
63
Detection of Cheating Item memorization and preknowledge –Check actual RTs on suspicious item against expectation based on (i) its time parameters and (ii) estimation of speed on other items –Bayesian residuals –Case Study for GMAT
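The check described on this slide can be sketched with a simple standardized residual on the log scale. Note the simplification: this is a plug-in residual that treats the speed estimate as known, not the Bayesian residual named on the slide. The function names, the cutoff, and the parameter values are all hypothetical.

```python
import math


def rt_residual(t, tau_hat, alpha, beta):
    """Standardized log-RT residual: alpha * (ln t - (beta - tau_hat))."""
    return alpha * (math.log(t) - (beta - tau_hat))


def is_suspiciously_fast(t, tau_hat, alpha, beta, cutoff=-3.0):
    """Flag RTs far below expectation; the cutoff is an arbitrary choice."""
    return rt_residual(t, tau_hat, alpha, beta) < cutoff


# Hypothetical item (alpha = 2, beta = 4) and speed estimate (tau_hat = 0),
# where tau_hat would come from the test taker's RTs on the other items.
fast = rt_residual(10.0, 0.0, 2.0, 4.0)
typical = rt_residual(math.exp(4.0), 0.0, 2.0, 4.0)
print(round(fast, 2), round(typical, 2), is_suspiciously_fast(10.0, 0.0, 2.0, 4.0))
```

Because the residual subtracts both the item's time intensity and the person's speed, a short RT on an easy-to-answer-quickly item or by a generally fast test taker is not flagged; only an RT that is short relative to both is.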
64
Detection of Cheating Cont’d Types of collusion –Sign language –Intra/internet –Wireless communication Collusion between test takers may manifest itself as correlation between their response times (RT)
65
Detection of Cheating Cont’d However, observed RTs always correlate because the time intensity of the items varies from one item to the next (see earlier example of spurious correlation)! Therefore, correlation between RTs of pairs of test takers should be analyzed under a model for their bivariate distribution
66
Detection of Cheating Cont’d Bivariate lognormal model for RTs by test takers j and k on item i
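A small simulation illustrates why the correlation has to be assessed under a model. The sketch assumes two test takers j and k who work independently under the lognormal RT model on the same items; all names and parameter values are hypothetical, and the test is made unrealistically long only to keep the sample correlations stable.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)
n_items = 2000
alpha = 2.0                      # common discrimination (assumed)
tau_j, tau_k = 0.2, -0.1         # speeds of the two test takers (assumed)
betas = [rng.gauss(4.0, 0.5) for _ in range(n_items)]  # time intensities

# Independent log RTs: no collusion is built into the data.
log_t_j = [rng.gauss(b - tau_j, 1.0 / alpha) for b in betas]
log_t_k = [rng.gauss(b - tau_k, 1.0 / alpha) for b in betas]
raw_r = pearson(log_t_j, log_t_k)

# Residuals after removing item time intensity and person speed.
res_j = [lt - (b - tau_j) for lt, b in zip(log_t_j, betas)]
res_k = [lt - (b - tau_k) for lt, b in zip(log_t_k, betas)]
resid_r = pearson(res_j, res_k)
print(round(raw_r, 2), round(resid_r, 2))
```

The raw log RTs correlate substantially even though the two test takers never interact, because both respond to the same time intensities. The residual correlation is near zero, so it is the residual correlation, not the raw one, that would carry evidence of collusion.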
67
Detection of Cheating Cont’d Example for test of quantitative reasoning
68
Case Study for GMAT Cont’d Example 1: RT patterns with 15 flagged items –Test taker spent most time on Items 1-18 and then rushed through 19-27 –No cheating but serious time management problem –Observe RT on Item 2, which is quite time intensive but the residual RT is barely aberrant!
69
RT Pattern with 15 Flagged Items
70
Case Study for GMAT Cont’d Example 2: Observed vs. residual RTs (no flagged items!) –This case illustrates the need for RT modeling and analysis of residual RTs –Observed RTs suggest the same time management problem as in the preceding example but the pattern almost disappears for the residual RTs
71
Observed vs. Residual RTs
72
Case Study for GMAT Cont’d Example 3: Suspicious item –Large negative residual (-4.66) for Item 14 –RT of 12.3 seconds (expected RT under the model was 88.9 seconds!) –Test taker had correct response but very low estimated ability relative to item difficulty –Four other test takers with same behavior on same item!
73
Suspicious Item