Functional Methods for Testing Data
The data Each person a = 1,…,N Each person a = 1,…,N responds to each item i = 1,…,n responds to each item i = 1,…,n and makes a binary response = u ai, where 0 indicates “wrong” and 1 indicates “right”. and makes a binary response = u ai, where 0 indicates “wrong” and 1 indicates “right”. We want to estimate P ai = the probability of person a getting item i right. We want to estimate P ai = the probability of person a getting item i right.
The model The response space is an n-dimensional unit hypercube. The data vectors {u a1,…,u an } are on the corners. The response space is an n-dimensional unit hypercube. The data vectors {u a1,…,u an } are on the corners. The vectors of correct response probabilities {P a1,…,P an } fall along a smooth curve in response space, the response manifold. The vectors of correct response probabilities {P a1,…,P an } fall along a smooth curve in response space, the response manifold. This manifold is, in principle, identifiable from the data, is therefore not a latent trait. This manifold is, in principle, identifiable from the data, is therefore not a latent trait.
Item response functions We can define a smooth charting function that maps each point on the response manifold to a corresponding real number θ. E.g.: arc length. We can define a smooth charting function that maps each point on the response manifold to a corresponding real number θ. E.g.: arc length. In this way we establish a metric defining positions on this manifold. In this way we establish a metric defining positions on this manifold. P i (θ) is the success probability for item i of all those at position θ. This is a smooth function of θ : The item response function for item i. P i (θ) is the success probability for item i of all those at position θ. This is a smooth function of θ : The item response function for item i.
The response manifold for 3 test items: 3, 4, and 29. The response manifold for 3 test items: 3, 4, and 29. The curve indicates the possible values of P ai = P i (θ). The curve indicates the possible values of P ai = P i (θ). The circles correspond to 11 fixed values of θ. The circles correspond to 11 fixed values of θ. Three items from an Intro Psych test Intro Psych test
What does “smooth” mean? If θ has a standard normal distribution, then experience indicates that usually: The function P i (θ) is monotonic. The function P i (θ) is monotonic. It has slopes near 0 for extreme θ values. It has slopes near 0 for extreme θ values. The lower asymptote is positive, and the upper asymptote is one. The lower asymptote is positive, and the upper asymptote is one. There is only one inflection point. There is only one inflection point.
The three-parameter logistic item response function is smooth
The challenges Let the item response functions take whatever shapes are supported by the data. Let the item response functions take whatever shapes are supported by the data. But control their smoothness in this sense. But control their smoothness in this sense. Constrain the function values to lie within [0,1]. Constrain the function values to lie within [0,1]. We want a smooth derivative, too, for the item information function. We want a smooth derivative, too, for the item information function.
The log-odds transformation deals with the [0,1] constraint We actually estimate W i (θ) = log [P i (θ) /Q i (θ)], where Q i (θ) = 1 – P i (θ). “Smooth” in terms of W i (θ) means linear behavior for extreme θ, with small slope on the left and larger positive slope on the right.
The log-odds transformation of a three-parameter logistic item response function
A three-dimensional model for a smooth log-odds function The function log(e θ +1) has the desired behavior at the extremes. The other two terms add vertical shift and tilt as required.
Comparing the 3D model and the 3PL log-odds functions
B-spline expansions for W(θ) But, in fact, we will actually estimate our log-odds functions by expanding W(θ) in terms of a set of K B-spline basis functions, whileexpanding W(θ) in terms of a set of K B-spline basis functions, while smoothing these expansions towards these simpler three-dimensional models.smoothing these expansions towards these simpler three-dimensional models.
Fitting the data We use maximum marginal likelihood estimation, using the EM algorithm to maximize where g(θ) is a prior density on θ, often taken to be the standard normal. Maximization is with respect to the nK coefficients defining the B-spline expansions of the log-odds functions.
What about smoothness? We have defined smoothness here in a less orthodox fashion; It isn’t defined only in terms of the second derivative. We have defined smoothness here in a less orthodox fashion; It isn’t defined only in terms of the second derivative. Instead, we define smooth in terms of the size of Instead, we define smooth in terms of the size of
How did you come up with this? If W(θ) conforms exactly to the three- dimensional smooth model, then LW(θ) = 0. If W(θ) conforms exactly to the three- dimensional smooth model, then LW(θ) = 0. In other words, if In other words, if then
Our strategy is to define a low- dimensional family of prototype functions that capture what we mean by “smooth.” Our strategy is to define a low- dimensional family of prototype functions that capture what we mean by “smooth.” Then we represent this family by a linear differential equation. Then we represent this family by a linear differential equation. This differential equation defines a measure of “roughness”, which we penalize. This differential equation defines a measure of “roughness”, which we penalize. The more we penalize this kind of roughness, the more we force the fitted functions to be smooth. The more we penalize this kind of roughness, the more we force the fitted functions to be smooth.
In general, if we begin with a linear model of dimension m, we can find a linear differential equation of order m such that all versions of this model will satisfy: In general, if we begin with a linear model of dimension m, we can find a linear differential equation of order m such that all versions of this model will satisfy: D m W(θ) = b 0 (θ) W(θ)+ b 1 (θ) DW(θ)+ … D m W(θ) = b 0 (θ) W(θ)+ b 1 (θ) DW(θ)+ … + b m-1 (θ) D m-1 W(θ) + b m-1 (θ) D m-1 W(θ) for some choice of coefficient functions b j (θ). for some choice of coefficient functions b j (θ). We change the equation to a roughness penalty by converting it to operator form: We change the equation to a roughness penalty by converting it to operator form: LW(θ) = b 0 (θ) W(θ)+ b 1 (θ) DW(θ)+…+ b m-1 (θ) D m-1 W (θ) + D m W(θ) = 0. + D m W(θ) = 0.
The roughness-penalty measures the departure ofW(θ) from this smooth model. measures the departure of W(θ) from this smooth model.
Roughness-penalized log marginal likelihood Consequently, we actually maximize Smoothing parameter λ controls the amount of smoothness in the W(θ)s; the larger it is, the smoothness in the W(θ) ‘s; the larger it is, the more these will look like the three-dimensional versions.
Some examples Here are three estimates of the item response functions for items 3, 4, 29, and 96 for an introductory psychology test Here are three estimates of the item response functions for items 3, 4, 29, and 96 for an introductory psychology test The test had 100 items, and was given to 379 students. The test had 100 items, and was given to 379 students. Each function W(θ) is defined by an expansion in terms of 13 B-spline basis functions. Each function W(θ) is defined by an expansion in terms of 13 B-spline basis functions.
λ=0
λ=0.01
λ=50
What does θ mean? We have fallen into the habit of calling θ a “latent trait score”. We have fallen into the habit of calling θ a “latent trait score”. Actually, it is the value of a function that is chosen more or less arbitrarily to map position along the response manifold. Actually, it is the value of a function that is chosen more or less arbitrarily to map position along the response manifold. The assumption of a standard normal distribution is pure convention. The assumption of a standard normal distribution is pure convention. We can choose otherwise. We can choose otherwise.
What charting functions would be more useful? Three choices of charting functions are especially interesting, and none are “latent” in any sense. Three choices of charting functions are especially interesting, and none are “latent” in any sense. Each leads to interesting diagnostic statistics and graphics. Each leads to interesting diagnostic statistics and graphics.
The arc length charting Arc length s measures the Euclidean distance traveled along the manifold from its origin at θ 0 to a given position θ: Arc length s measures the Euclidean distance traveled along the manifold from its origin at θ 0 to a given position θ:
Item discrimination in arc length metric One useful property of arc length is Each squared item discrimination is a proportion total test discrimination, and therefore has a familiar frame of reference.
Expected score charting Assuming that expected score is monotonically related to θ, (there aren’t too many items like 96), then Provides a metric that is familiar to users and easy for them to interpret. Expected score is already used extensively as a basis for assessing differential item functioning (DIF).
ACT Math test for males and females Three items from a 60 item math test. Three items from a 60 item math test. Around 2000 examinees. Around 2000 examinees. The male and female response manifolds differ. The male and female response manifolds differ.
Differential item functioning for an ACT Math test item
Total change charting The following total change in probability of success measure is closely related to arc length:
Some general lessons Fitting functional models to non-functional data is relatively straight-forward. Fitting functional models to non-functional data is relatively straight-forward. But we do need to transform constrained functions into unconstrained versions. But we do need to transform constrained functions into unconstrained versions. We can define smoothness or roughness in customized ways that capture the default or baseline behavior of our estimated functions. We can define smoothness or roughness in customized ways that capture the default or baseline behavior of our estimated functions.
“Latent trait models” aren’t really latent at all. “Latent trait models” aren’t really latent at all. They express the idea of a one-dimensional subspace for modeling the data. They express the idea of a one-dimensional subspace for modeling the data. Differential geometry gives us the appropriate mathematical tools. Differential geometry gives us the appropriate mathematical tools. There is room for creativity in choosing charting functions. There is room for creativity in choosing charting functions.
Looking ahead There is an intimate connection between designer roughness penalties and the estimation of differential equations from data. There is an intimate connection between designer roughness penalties and the estimation of differential equations from data. We will use discrete data to estimate a differential equation that describes the data. We will use discrete data to estimate a differential equation that describes the data.
References More technical details on fitting test data with functional models are in Rossi, N., Wang, X. and Ramsay, J. O. (2002) Nonparametric item response function estimates with the EM algorithm. Journal of Educational and Behavioral Statistics, 27,