Test co-calibration and equating
Paul K. Crane, MD, MPH
General Internal Medicine, University of Washington
Outline
- Definitions and motivation
- Educational testing literature
- Concurrent administration designs
- Separate administration designs
- PARSCALE coding considerations
- Illustration with the CSI ‘D’ and CASI
- Coming attractions; comments
Definition
- Distinction between “equating” and “co-calibration”: we almost always mean “co-calibration”
- The general idea is to get all tests of a kind onto the same metric
- Error terms will likely differ, but the tests are trying to measure the same thing
5 things needed for “equating”
- The scales measure the same concept
- The scales have the same level of precision
- The procedure converting scale A to scale B is the inverse of the procedure converting scale B to scale A
- The distribution of converted scores should be identical for individuals at a given level
- The equating function should be population invariant
(Linn, 1993; Mislevy, 1992; Dorans, 2000)
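In symbols (a sketch using notation introduced here, not the cited authors’: let e_{B←A} denote the function converting scale A scores to the scale B metric):

```latex
% Symmetry (requirement 3): converting A to B and then back to A
% must return the original score
e_{A \leftarrow B}\bigl(e_{B \leftarrow A}(x)\bigr) = x
% Population invariance (requirement 5): the conversion estimated in
% population P must equal the conversion estimated in population Q
e^{P}_{B \leftarrow A}(x) = e^{Q}_{B \leftarrow A}(x) \quad \text{for all } x
```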
Motivation for co-calibration
- Many tests measure “the same thing”:
  - Cognitive screens: MMSE, 3MS, CASI, CSI ‘D’, Hasegawa, Blessed, …
  - Depression measures: PRIME-MD, CES-D, HAM-D, BDI, SCID, …
- The literature is only interpretable if one is familiar with the nuances of the particular test(s) used
- Studies that employ multiple measures (such as the CHS) face difficulty incorporating all of their data into their analyses
- In sum: co-calibration facilitates both interpretation and analysis
Educational literature
Distinct problems:
- Multiple levels of the same topic, e.g., 4th-grade math, 5th-grade math, etc. (“vertical” equating)
- Multiple forms of the same test, e.g., the dozens of forms of the SAT and GRE used to prevent cheating (“horizontal” equating)
- Making sure item difficulty stays constant from year to year (item drift analyses)
Strategies are the same
- Either have common items administered to different populations, or common people taking different tests
- Analyze one big dataset that contains all items and all people (designs and sketches below)
- Verify that the common elements (people or items) are behaving as expected
Concurrent administration
Common-population design: a single population answers every item on every test.

              Test 1   Test 2   Test 3
  Population    X        X        X
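A minimal Python sketch of this layout (simulated data; item names and test lengths are hypothetical): because one population takes all the tests, the responses are simply placed side by side in a single wide matrix.

```python
# Concurrent (common-population) design: one sample takes all three
# tests, so their item responses sit side by side in one wide matrix.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500  # one common population

# Simulated 0/1 item responses for three tests taken by everyone
test1 = pd.DataFrame(rng.integers(0, 2, (n, 10)),
                     columns=[f"t1_item{i}" for i in range(1, 11)])
test2 = pd.DataFrame(rng.integers(0, 2, (n, 8)),
                     columns=[f"t2_item{i}" for i in range(1, 9)])
test3 = pd.DataFrame(rng.integers(0, 2, (n, 12)),
                     columns=[f"t3_item{i}" for i in range(1, 13)])

# Concatenate columns: it is "as if there is a single longer test,"
# with no structurally missing data.
combined = pd.concat([test1, test2, test3], axis=1)
print(combined.shape)  # (500, 30)
```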
Separate administration
Anchor-test design (e.g., McHorney): every population answers the anchor items; each population’s unique items are structurally missing for the other populations.

           Anchor items   Pop. 1 items   Pop. 2 items   Pop. 3 items
  Pop. 1        X              X           (missing)      (missing)
  Pop. 2        X          (missing)           X          (missing)
  Pop. 3        X          (missing)      (missing)           X
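A minimal Python sketch of the same design (simulated data; names and sizes are hypothetical): stacking the three populations’ records and aligning on item names leaves the block-missing pattern shown above.

```python
# Anchor-test (separate administration) design: each population
# answers the anchor items plus its own unique items; everything
# else is structurally missing.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
anchor = [f"anchor{i}" for i in range(1, 6)]
unique = {p: [f"pop{p}_item{i}" for i in range(1, 9)] for p in (1, 2, 3)}

frames = []
for p in (1, 2, 3):
    cols = anchor + unique[p]  # the items this population actually saw
    frames.append(pd.DataFrame(rng.integers(0, 2, (300, len(cols))),
                               columns=cols))

# Stacking rows and aligning on item names leaves NaN (missing) in
# every cell for items a population never saw -- the block-missing
# pattern on the slide.
stacked = pd.concat(frames, axis=0, ignore_index=True)
print(stacked.shape)  # (900, 29): 5 anchor + 3 x 8 unique items
print(stacked.isna().mean().round(2).head())
```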
Item bank development
Three forms (A, B, C) are linked through pairwise overlapping item blocks:

                 Pop. 1   Pop. 2   Pop. 3
  A unique          X
  A ∩ B items       X        X
  B unique                   X
  B ∩ C items                X        X
  A ∩ C items       X                 X
  C unique                            X
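A minimal Python sketch of this linked design (hypothetical item names): aligning the three forms on shared item names recovers the overlap structure automatically.

```python
# Item-bank design: three forms share pairwise overlapping item
# subsets, and concatenating by item name links the populations.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Each form = its unique items plus the pairwise overlaps in the table
forms = {
    "A": ["a1", "a2", "ab1", "ab2", "ac1"],
    "B": ["b1", "b2", "ab1", "ab2", "bc1"],
    "C": ["c1", "c2", "bc1", "ac1"],
}

frames = [pd.DataFrame(rng.integers(0, 2, (200, len(items))), columns=items)
          for items in forms.values()]

# Overlap items (ab*, bc*, ac*) are observed in two populations each;
# unique items are observed in only one.
bank = pd.concat(frames, axis=0, ignore_index=True)
print(sorted(bank.columns))
print(bank.notna().sum())  # overlap items: 400 responses; unique: 200
```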
Comments
- Fairly simple; God is in the details!
- The afternoon workgroup will address the details
- Illustration to follow
PARSCALE code
- For concurrent administration, it is as if there is a single longer test
- For separate administration, the data file is basically a lot of missing data
- Once the data are in the correct format, PARSCALE does the rest (see the sketch below)
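PARSCALE-style programs read fixed-column response records; below is a minimal Python sketch of writing such a file (ours, not PARSCALE control syntax; the ‘9’ not-presented code and the 5-character ID width are assumptions).

```python
# Write the stacked data matrix (e.g., 'stacked' from the anchor-design
# sketch above) as a fixed-width file: one record per person, an ID
# field, then one column per item, with a designated code ('9' here,
# an assumption) for items a person never saw.
import pandas as pd

def write_fixed_width(df: pd.DataFrame, path: str, miss: str = "9") -> None:
    with open(path, "w") as f:
        for i, row in df.iterrows():  # assumes a simple integer index
            resp = "".join(miss if pd.isna(v) else str(int(v)) for v in row)
            f.write(f"{i:05d}{resp}\n")  # 5-char ID, then item columns

# Example usage (after building 'stacked' as in the earlier sketch):
# write_fixed_width(stacked, "cocal.dat")
```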
Illustration: CSI ‘D’ and CASI
[Figure slides: test information curves, standard error of measurement (SEM), and relative information for the co-calibrated CSI ‘D’ and CASI]
Coming attractions
- Optimizing screening tests from a pool of items (on Friday)
- Item banking and computer adaptive testing (the PROMIS initiative)
- Incorporation of DIF assessment (tomorrow)
- Comments and questions