1 Conceptual Issues in Observed-Score Equating Wim J. van der Linden CTB/McGraw-Hill
2 Outline Review of Lord (1980) Local equating Few examples Discussion
3 Review of Lord (1980) Notation –X: old test form with observed score X –Y: new test form with observed score Y –θ: common ability measured by X and Y –x=φ(y): equating transformation
4 Review of Lord (1980) Cont’d Case 1: Infallible measures –X and Y order any population identically –Equivalence of ranks establishes equating transformation
5 Review of Lord (1980) Cont’d Case 1: Infallible measures Cont’d –Q-Q curve –Issues related to discreteness, strict monotonicity, and sampling error will be ignored –Equating is population invariant –Equating error always equal to zero
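A minimal sketch of Case 1, assuming both forms were taken by the same group and using made-up scores. The function name `equate_by_rank` and the sample data are illustrative only; they are not from the slides. With infallible measures, matching ranks recovers the transformation exactly.

```python
import numpy as np

def equate_by_rank(y, y_sample, x_sample):
    """Map a Y score to the X score that holds the same rank (Q-Q matching)."""
    y_sorted = np.sort(np.asarray(y_sample))
    x_sorted = np.sort(np.asarray(x_sample))
    # Number of Y scores at or below y, i.e. the rank of y in the Y sample
    k = np.searchsorted(y_sorted, y, side="right")
    # Rescale the rank in case the two samples differ in size
    idx = np.ceil(k * x_sorted.size / y_sorted.size).astype(int) - 1
    return x_sorted[np.clip(idx, 0, x_sorted.size - 1)]

# Hypothetical infallible measures: X is a strictly increasing function of Y,
# so both forms order the test takers identically.
rng = np.random.default_rng(0)
y_scores = rng.permutation(np.arange(1, 11))
x_scores = y_scores ** 2

equated = equate_by_rank(y_scores, y_scores, x_scores)
```

In this setup every test taker's equated score equals the X score they actually obtained, which is the sense in which the equating error is zero.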
6 Review of Lord (1980) Cont’d Case 2: Fallible measures –For each test taker, observed scores are random variables –Realizations of X and Y do not order populations of test takers identically –Criterion of equity of equating for all θ
7 Review of Lord (1980) Cont’d Case 2: Fallible measures Cont’d –Lord’s theorem: Under realistic conditions, scores X and Y on two tests cannot be equated unless either (1) both scores are perfectly reliable or (2) the two tests are strictly parallel [in which case φ(y)=y]
8 Review of Lord (1980) Cont’d Case 2: Fallible measures Cont’d –Equating no longer population invariant
9 Review of Lord (1980) Cont’d Two approximate methods –IRT true-score equating –Use ξ=ξ(η) to equate Y to X
10 Review of Lord (1980) Cont’d Two approximate methods Cont’d –IRT observed-score equating, for a sample of test takers a=1,…,N
11 Review of Lord (1980) Cont’d Lord’s forgotten question: What is really needed is a criterion for evaluating such approximate procedures, so as to be able to choose from among them. If you can’t be fair (provide equity) to everyone, what is the next best thing? (p.207)
12 Local Equating New definition of equating error Equity=no equating error! Setting e 2 (y) equal to zero and solving for φ(y) gives
13 Local Equating Cont’d Because of monotonicity of x=φ(y), the result is the family of error-free (or true) equating transformations. Lord’s theorem is based on the implicit assumption of a single transformation
14 Local Equating Cont’d Theorem: For a population of test takers P for which X and Y measure the same θ, equating with the family of transformations φ*(y;θ) has the following properties: (i) equity for each p ∈ P (ii) symmetry in X and Y for each p ∈ P (iii) population invariance within P
15 Local Equating Cont’d Theorem defines population P –No sampling of test takers required –Includes future test takers Alternative definition of equating error:
16 Local Equating Cont’d Definition of bias, MSE, etc., in equating now straightforward Lord’s criterion for finding the “next best thing”
17 Local Equating Cont’d Alternative motivations of local equating –Thought experiment –History of standard error of measurement –Comparison with true-score equating IRT observed-score equating –Same score but different equated scores?
18 Local Equating Cont’d Alternative motivations Cont’d –One measurement instrument but different transformations?
19 Few Examples It may seem as if local equating replaces Lord’s set of impossible conditions for equating (perfect reliability; parallel tests) with another impossible condition (known ability). However, post hoc improvement of reliability or parallelism is impossible, but we can always approximate an unknown ability
20 Few Examples Cont’d Possible approximations –Estimating ability –Anchor scores as a proxy of ability –Y=y as a proxy of ability –Proxies based on collateral information
21 Discussion Criterion of equity involves a different equating transformation for each ability level. Traditional equating uses a “one-size-fits-all” transformation, which compromises between the transformations for the different ability levels. As a result, the equating is always (i) biased and (ii) population dependent
22 Discussion Cont’d Lord’s theorem on the impossibility or unnecessity of observed-score equating was too pessimistic because it assumed the use of a single equating transformation for a population of test takers
23 Equipercentile Method [Figure. Axes: Test Score, Cumulative Probability. Curves: F(x) for Test X, G(y) for Test Y. Label: p]
24 Thought Experiment [Figure. Panel: Test Y. Labels: y, p]
25 Thought Experiment Cont’d [Figure. Panels: Test Y, Test X. Labels: y, x, p]
26 Thought Experiment Cont’d [Figure. Panels: Test Y, Test X, Transformation Y → X with curve x=φ(y). Labels: y, x, p]
27 Thought Experiment Cont’d [Figure. Panels: Test Y, Test X, Transformation Y → X with curve x=φ(y). Labels: y, x, p, q]
28 Thought Experiment Cont’d [Figure. Panels: Test Y, Test X, Transformation Y → X with curve x=φ(y). Labels: y, x, p, q]
29 Thought Experiment Cont’d [Figure. Panels: Test Y, Test X, Transformations Y → X with curves x=φ(y). Labels: y, x, p, q]
30 Thought Experiment Cont’d [Figure. Panels: Test Y (Population 1), Test X (Population 2), Transformation Y → X with curve x=φ(y). Labels: y, x]
31 Thought Experiment Cont’d [Figure. Panels: Transformation Y → X, Transformations Y → X, with curves x=φ(y). Labels: y, p, q]
32 Standard Error of Measurement Classical test theory involves one SEM for an entire population of test takers Stronger models condition on ability measured; e.g., IRT
33 True-Score Equating True-score equating is a degenerate case of local equating
34 Different Equated Scores? Why should two test takers, p and q, with the same score of 23 out of 30 items correct on a new test form need different equated scores on the same old form? –Would this not even be unfair? –Fallible scores
35 Different Equated Scores? Cont’d [Figure. Panels: Observed-score distribution of p, Observed-score distribution of q]
36 Different Transformations? Example of measuring tape Number-correct scores are counts of responses, not fundamental measures Responses have person and item effects –Test equating requires “some type of control for differential examinee ability” (von Davier, Holland & Thayer, 2004, p. 2)
37 Different Transformations? Cont’d An effective way to disentangle item and person effects is through IRT modeling Observed-score equating is an attempt to do the same through a transformation of total scores –Only possible way is (i) to first condition on the abilities and (ii) then transform the score to adjust for the item effects
38 Estimating Ability Assumption: fitting response model Calculate family of true equating transformations (Lord-Wingersky’s recursive procedure) Use member of family at point estimate of θ Bias study for 40-item subtests of LSAT Application in adaptive testing
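The steps on this slide can be sketched as follows, under several assumptions that are not in the slides: a 2PL response model, made-up item parameters, and a discrete version of the transformation with no continuization. The function `local_equate` is a hypothetical name. The recursion in `lord_wingersky` is the standard Lord-Wingersky recursion for the number-correct distribution at a fixed θ.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response (an assumed model for this sketch)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def lord_wingersky(p):
    """Number-correct score distribution given per-item success probabilities."""
    dist = np.array([1.0])                 # P(score = 0) before any item
    for p_i in p:
        new = np.zeros(dist.size + 1)
        new[:-1] += dist * (1.0 - p_i)     # item answered incorrectly
        new[1:] += dist * p_i              # item answered correctly
        dist = new
    return dist

def local_equate(y, theta, items_y, items_x):
    """Smallest X score whose conditional CDF at theta reaches G(y | theta)."""
    g = np.cumsum(lord_wingersky(p_correct(theta, *items_y)))
    f = np.cumsum(lord_wingersky(p_correct(theta, *items_x)))
    # Small tolerance guards against floating-point ties between the two CDFs
    idx = np.searchsorted(f, g[y] - 1e-12, side="left")
    return min(int(idx), f.size - 1)

# Hypothetical 5-item forms: (discriminations a, difficulties b)
items_y = (np.array([1.0, 1.2, 0.8, 1.5, 1.0]), np.array([-1.0, -0.5, 0.0, 0.5, 1.0]))
items_x = (items_y[0], items_y[1] + 0.5)   # old form X assumed harder

# One member of the family per ability level; in practice theta would be
# replaced by a point estimate for the test taker.
curve_low = [local_equate(y, -2.0, items_y, items_x) for y in range(6)]
curve_high = [local_equate(y, 2.0, items_y, items_x) for y in range(6)]
```

A production version would continuize the two conditional distributions before inverting, so that equated scores are not restricted to integers.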
39 Bias Study [Figure. Axis: Bias. Panels: Traditional Equating, Local Equating at …]
40 Family of True Transformations for LSAT Subtest [Figure. Axes: x, y. Curve labels: θ=-2.0, θ=2.0]
41 Anchor Score as Proxy Current methods –Chain equating –Poststratification equating –Linear equating methods: Tucker, Levine, Braun-Holland, linear chain equating Use conditional distributions of X and Y given anchor score A=a
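A rough sketch of using the conditional distributions given A=a, assuming an anchor-test design in which one group took Y plus the anchor and another took X plus the anchor. The function name `equate_given_anchor` and the tiny data set are invented for illustration. Real applications would smooth the conditional distributions, since strata with few test takers are noisy.

```python
import numpy as np

def equate_given_anchor(y, a, y_scores, y_anchor, x_scores, x_anchor):
    """Equipercentile equating of y within the stratum of anchor score a."""
    y_cond = np.sort(y_scores[y_anchor == a])   # Y scores given A = a
    x_cond = np.sort(x_scores[x_anchor == a])   # X scores given A = a
    if y_cond.size == 0 or x_cond.size == 0:
        raise ValueError("no test takers with this anchor score")
    # Rank of y among Y scores in the stratum
    k = int(np.searchsorted(y_cond, y, side="right"))
    # Matching rank in the X stratum, using integer ceiling division
    idx = -((-k * x_cond.size) // y_cond.size) - 1
    return x_cond[min(max(idx, 0), x_cond.size - 1)]

# Made-up data: two anchor strata (a = 0 and a = 1) in each group
y_scores = np.array([1, 2, 3, 2, 3, 4])
y_anchor = np.array([0, 0, 0, 1, 1, 1])
x_scores = np.array([2, 4, 6, 3, 5, 7])
x_anchor = np.array([0, 0, 0, 1, 1, 1])

# The same score y = 3 is equated within each anchor stratum separately
low_anchor = equate_given_anchor(3, 0, y_scores, y_anchor, x_scores, x_anchor)
high_anchor = equate_given_anchor(3, 1, y_scores, y_anchor, x_scores, x_anchor)
```

In this toy data the two calls return different equated scores for the same y, which illustrates how the anchor score selects a member of the family of transformations.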
42 Anchor Score as Proxy Cont’d Empirical bias study for same LSAT subtests
43 Bias Study: Anchor-Test Design [Figure. Panels: Chain Equating, Poststratification Equating, Local Equating]
44 Y=y as Proxy of Ability Single-group design –Estimate distributions of X given Y=y directly from bivariate distribution of X and Y –Model-based estimate of Y given y
45 Y=y as Proxy of Ability Linear local equating Because μ_{Y|y}=y (classical test theory),
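One possible reading of this step, assuming the usual linear equating form applied to the conditional moments given Y=y. The symbols μ_{X|y}, σ_{X|y}, and σ_{Y|y} are notation assumed here; the slide's own formula did not survive extraction.

```latex
\varphi(y) = \mu_{X\mid y} + \frac{\sigma_{X\mid y}}{\sigma_{Y\mid y}}\,\bigl(y - \mu_{Y\mid y}\bigr),
\qquad \mu_{Y\mid y} = y
\;\Longrightarrow\;
\varphi(y) = \mu_{X\mid y}.
```

Under this reading, the linear local transformation would reduce to the conditional mean of X given Y=y.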
46 Collateral Information Any variables correlating substantially with θ –Earlier tests –Battery of subtests –Response times Alternative sources give different equatings; just find the “next best thing”