Test co-calibration and equating

1 Test co-calibration and equating
Paul K. Crane, MD, MPH; General Internal Medicine, University of Washington

2 Outline
- Definitions and motivation
- Educational testing literature
- Concurrent administration designs
- Separate administration designs
- PARSCALE coding considerations
- Illustration with CSI ‘D’ and CASI
- Coming attractions; comments

3 Definition
Distinction between “equating” and “co-calibration”:
- We almost always mean “co-calibration.”
- The general idea is to put all tests of a kind on the same metric.
- Error terms will likely differ, but the tests are trying to measure the same thing.

4 Five things needed for “equating”
- The scales measure the same concept.
- The scales have the same level of precision.
- The procedure for converting scale A to scale B is the inverse of the procedure for converting B to A.
- The distribution of scores is identical for individuals at a given level.
- The equating function is population invariant.
(Linn, 1993; Mislevy, 1992; Dorans, 2000)
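The third condition (symmetry) can be illustrated with simple linear equating, which matches the means and standard deviations of the two score distributions. The sketch below is only an illustration of that one condition, not of the other four; the function name and the scores are hypothetical, and linear equating is swapped in here as the simplest example rather than as the method used in this talk.

```python
import statistics


def linear_equating(from_scores, to_scores):
    """Map a score on the 'from' form to the 'to' scale by matching the
    mean and standard deviation of the two score distributions."""
    mu_f, mu_t = statistics.mean(from_scores), statistics.mean(to_scores)
    sd_f, sd_t = statistics.pstdev(from_scores), statistics.pstdev(to_scores)
    return lambda score: mu_t + (sd_t / sd_f) * (score - mu_f)


# Hypothetical scores from one group that took both forms (illustration only).
form_a = [10, 12, 15, 18, 20, 25]
form_b = [30, 33, 37, 41, 44, 51]

a_to_b = linear_equating(form_a, form_b)
b_to_a = linear_equating(form_b, form_a)

# Symmetry check: converting A -> B -> A should return the starting score.
for score in form_a:
    print(score, round(a_to_b(score), 2), round(b_to_a(a_to_b(score)), 2))
```

A least-squares regression of B on A would fail this check unless the two forms were perfectly correlated, which is one reason equating is kept distinct from prediction.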

5 Motivation for co-calibration
- Many tests measure “the same thing”:
  - MMSE, 3MS, CASI, CSI ‘D’, Hasegawa, Blessed…
  - PRIME-MD, CESD, HAM-D, BDI, SCID…
- The literature is interpretable only if one is familiar with the nuances of the test(s) used.
- Studies that employ multiple measures (such as the CHS) have difficulty incorporating all their data into their analyses.
- In sum: co-calibration facilitates interpretation and analysis.

6 Educational literature
Distinct problems:
- Multiple levels of the same topic, e.g., 4th grade math, 5th grade math, etc. (“vertical” equating)
- Multiple forms of the same test, e.g., dozens of forms of the SAT or GRE to prevent cheating (“horizontal” equating)
- Making sure item difficulty stays constant from year to year (item drift analyses)

7 Strategies are the same
- Either have common items in different populations, or common people taking different tests.
- Analyze one large dataset that contains all items and all people.
- Verify that the common people or items are behaving as expected.
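One way to check that common items are “acting as expected” is to calibrate them separately in two populations, link the two sets of difficulties, and flag items that still disagree. The sketch below uses mean/sigma linking, a standard simple linking method swapped in here for illustration; the difficulty values, item names, and the 0.5 tolerance are all hypothetical.

```python
import statistics


def mean_sigma_constants(b_source, b_target):
    """Mean/sigma linking constants A, B such that A * b + B puts
    source-calibration difficulties on the target metric."""
    A = statistics.pstdev(b_target) / statistics.pstdev(b_source)
    B = statistics.mean(b_target) - A * statistics.mean(b_source)
    return A, B


def flag_common_items(names, b_source, b_target, tolerance=0.5):
    """Return the common items whose linked difficulty is more than
    `tolerance` away from the target calibration's estimate."""
    A, B = mean_sigma_constants(b_source, b_target)
    return [name for name, bs, bt in zip(names, b_source, b_target)
            if abs(A * bs + B - bt) > tolerance]


# Hypothetical difficulty estimates for six common items, calibrated
# separately in two populations (illustration only).
item_names = ["item1", "item2", "item3", "item4", "item5", "item6"]
b_pop1 = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0]
b_pop2 = [-1.2, -0.7, -0.2, 1.3, 0.8, 1.3]

print(mean_sigma_constants(b_pop1, b_pop2))
print(flag_common_items(item_names, b_pop1, b_pop2))
```

Because a drifting item also enters the computation of A and B, a common refinement is to drop flagged items and relink using the remaining ones.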

8 Concurrent administration
Common population design: one population takes Test 1, Test 2, and Test 3.
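In the common population design every person has responses to every test, so the data for each person can simply be concatenated. The sketch below assumes each test is stored as a hypothetical dictionary from person ID to a list of item responses; it is an illustration of the data layout, not a prescribed file format.

```python
def combine_concurrent(*tests):
    """Common-population design: the same people took every test, so each
    person's responses are concatenated into one long response vector.
    Each test is a dict mapping person_id -> list of item responses."""
    people = set(tests[0])
    for test in tests[1:]:
        if set(test) != people:
            raise ValueError("every test must cover the same people")
    return {pid: [r for test in tests for r in test[pid]] for pid in sorted(people)}


# Hypothetical responses: Test 1 has two items, Test 2 has three.
test1 = {"p1": [1, 0], "p2": [1, 1]}
test2 = {"p1": [2, 3, 1], "p2": [0, 1, 2]}
print(combine_concurrent(test1, test2))
```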

9 Separate administration
Anchor test design (e.g., McHorney): Populations 1, 2, and 3 all take the anchor items; each population also takes its own items, and its responses to the other populations' items are missing.
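In the anchor test design the combined dataset is one person-by-item table in which each population has data for the anchor items and its own items, and missing values elsewhere. A minimal sketch, assuming hypothetical in-memory records keyed by item name and using `None` for items a person was never given:

```python
def stack_anchor_design(anchor_items, populations):
    """Anchor-test design: stack every population into one person-by-item
    table. `populations` maps a population name to (unique_items, records),
    where each record is a dict item -> response. Items a person was never
    given are filled with None (missing)."""
    all_items = list(anchor_items)
    for unique_items, _ in populations.values():
        all_items.extend(unique_items)
    rows = [[record.get(item) for item in all_items]
            for _, records in populations.values()
            for record in records]
    return all_items, rows


# Hypothetical example: two anchor items plus one unique item per population.
anchor = ["a1", "a2"]
populations = {
    "pop1": (["p1_x"], [{"a1": 1, "a2": 0, "p1_x": 1}]),
    "pop2": (["p2_x"], [{"a1": 0, "a2": 1, "p2_x": 2}]),
    "pop3": (["p3_x"], [{"a1": 1, "a2": 1, "p3_x": 0}]),
}
item_names, rows = stack_anchor_design(anchor, populations)
print(item_names)
for row in rows:
    print(row)
```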

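10 Item bank development
Diagram: three overlapping item sets A, B, and C (labeled 1, 2, 3), with regions for items unique to A, shared by A and B (A ∩ B), unique to B, shared by B and C (B ∩ C), shared by A and C (A ∩ C), and unique to C.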
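For an item bank built from overlapping forms, one basic design check is that every form is connected to every other through chains of shared items; otherwise some forms cannot be placed on the common metric at all. This is a necessary, not sufficient, condition (how many common items are enough is a separate question). The helper below is hypothetical and uses a plain breadth-first search over forms.

```python
from collections import deque


def forms_are_linked(forms):
    """Return True if every form can be reached from every other through
    chains of shared items. `forms` maps a form name to a set of item IDs;
    two forms are adjacent when their item sets intersect."""
    names = list(forms)
    if not names:
        return True
    seen = {names[0]}
    queue = deque([names[0]])
    while queue:
        current = queue.popleft()
        for other in names:
            if other not in seen and forms[current] & forms[other]:
                seen.add(other)
                queue.append(other)
    return len(seen) == len(names)


# Hypothetical bank: A and B share items 3 and 4, B and C share 6, A and C share 1.
bank = {"A": {1, 2, 3, 4}, "B": {3, 4, 5, 6}, "C": {6, 7, 8, 1}}
print(forms_are_linked(bank))
bank["D"] = {9, 10}  # a form with no shared items breaks the linkage
print(forms_are_linked(bank))
```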

11 Comments
Fairly simple; God is in the details!
- The afternoon workgroup will address the details.
- Illustration to follow.

12 PARSCALE code
- For concurrent administration, it’s as if there were a single, longer test.
- For separate administration, there is basically a lot of missing data.
- Once the data are in the correct format, PARSCALE does the rest.
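Getting the data into the correct format mostly means writing one fixed-width record per person, with a placeholder for items that were not administered. The sketch below assumes a layout of an ID field followed by one column per item; the `"9"` not-presented code, the 8-character ID width, and the function name are hypothetical, so check the PARSCALE manual for how your command file declares the record format and not-presented responses.

```python
NOT_PRESENTED = "9"  # hypothetical placeholder; use the code your command file declares


def to_fixed_width(person_id, responses, id_width=8):
    """Build one fixed-width record: the person ID padded to id_width
    characters, then one character per item, with NOT_PRESENTED wherever
    the response is None (item never administered)."""
    if len(person_id) > id_width:
        raise ValueError(f"ID {person_id!r} is longer than {id_width} characters")
    chars = []
    for r in responses:
        text = NOT_PRESENTED if r is None else str(r)
        if len(text) != 1:
            raise ValueError(f"response {r!r} does not fit in one column")
        chars.append(text)
    return person_id.ljust(id_width) + "".join(chars)


# Hypothetical record with one item that was never administered.
print(repr(to_fixed_width("p001", [1, 0, None, 3])))
```

The placeholder must lie outside the range of valid response codes, or not-presented items will be scored as real responses.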

13 Illustration: CSI ‘D’ and CASI

14 Information curves

15 SEM (standard error of measurement)

16 Relative information

17 Coming attractions
- Optimizing screening tests from a pool of items (on Friday)
- Item banking and computer adaptive testing (PROMIS initiative)
- Incorporation of DIF assessment (tomorrow)
- Comments and questions


Download ppt "Test co-calibration and equating"

Similar presentations


Ads by Google