Test co-calibration and equating

Paul K. Crane, MD, MPH
General Internal Medicine, University of Washington

Outline
Definitions and motivation
Educational testing literature
Concurrent administration designs
Separate administration designs
PARSCALE coding considerations
Illustration with CSI ‘D’ and CASI
Coming attractions; comments

Definition
Distinction between “equating” and “co-calibration”
We almost always mean “co-calibration”
The general idea is to put all tests of a kind on the same metric
Measurement error will likely differ from test to test, but the tests are trying to measure the same thing

Five things needed for “equating”
1. The scales measure the same concept
2. The scales have the same level of precision
3. The procedure from scale A to scale B is the inverse of the procedure from scale B to scale A
4. The distribution of scores is identical for individuals at a given level of the trait
5. The equating function is population invariant
(Linn, 1993; Mislevy, 1992; Dorans, 2000)
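The third requirement (the A-to-B procedure is the inverse of the B-to-A procedure) can be illustrated with linear equating, which matches the means and standard deviations of the two scales. This is a minimal sketch assuming an equivalent-groups design; the score lists are made up for illustration and the function name linear_equating is hypothetical.

```python
import statistics

def linear_equating(mean_from, sd_from, mean_to, sd_to):
    """Return a function mapping scores on the 'from' scale to the 'to' scale
    by matching means and standard deviations (linear equating)."""
    slope = sd_to / sd_from

    def equate(score):
        return mean_to + slope * (score - mean_from)

    return equate

# Hypothetical scores on scales A and B from equivalent groups (illustrative only).
scores_a = [22, 25, 27, 28, 30]
scores_b = [80, 86, 90, 92, 96]
mean_a, sd_a = statistics.mean(scores_a), statistics.stdev(scores_a)
mean_b, sd_b = statistics.mean(scores_b), statistics.stdev(scores_b)

a_to_b = linear_equating(mean_a, sd_a, mean_b, sd_b)
b_to_a = linear_equating(mean_b, sd_b, mean_a, sd_a)

# Symmetry requirement: converting A -> B -> A returns the original score.
for x in scores_a:
    assert abs(b_to_a(a_to_b(x)) - x) < 1e-9
```

By contrast, a least-squares regression of B on A and a second regression of A on B do not invert each other unless the correlation is perfect, which is one reason regression prediction does not meet this requirement.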

Motivation for co-calibration
Many tests measure “the same thing”:
MMSE, 3MS, CASI, CSI ‘D’, Hasegawa, Blessed…
PRIME-MD, CES-D, HAM-D, BDI, SCID…
The literature is interpretable only if one is familiar with the nuances of the test(s) used
Studies that employ multiple measures (such as the CHS) have difficulty incorporating all their data into their analyses
In sum: co-calibration facilitates interpretation and analysis

Educational literature
Distinct problems:
Multiple levels of the same topic, e.g., 4th grade math, 5th grade math, etc. (“vertical” equating)
Multiple forms of the same test, e.g., dozens of forms of the SAT and GRE, to prevent cheating (“horizontal” equating)
Making sure item difficulty stays constant from year to year (item drift analyses)

Strategies are the same
Either common items are given to different populations, or common people take different tests
Analyze one big dataset that contains all items and all people
Verify that the common people or items are acting as expected
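One way to check that common items are "acting as expected" is to calibrate each population separately, put the anchor difficulties on one scale, and flag anchors that land far from their counterparts. The sketch below uses mean-sigma linking; the difficulty values and the 0.4 tolerance are hypothetical, and the function name mean_sigma_constants is illustrative rather than part of any package.

```python
import statistics

# Hypothetical difficulty (b) estimates for the same five anchor items,
# calibrated separately in two populations (illustrative values only).
b_pop1 = [-1.2, -0.5, 0.0, 0.6, 1.3]
b_pop2 = [-0.9, -0.2, 0.3, 1.6, 1.6]

def mean_sigma_constants(b_source, b_target):
    """Mean-sigma linking constants (A, B) such that A * b + B puts
    source-scale difficulties on the target scale."""
    A = statistics.stdev(b_target) / statistics.stdev(b_source)
    B = statistics.mean(b_target) - A * statistics.mean(b_source)
    return A, B

A, B = mean_sigma_constants(b_pop1, b_pop2)
linked = [A * b + B for b in b_pop1]

# Flag anchors whose linked difficulty departs from the target estimate by
# more than a hypothetical tolerance of 0.4 on the theta metric.
TOLERANCE = 0.4
flagged = [i for i, (l, t) in enumerate(zip(linked, b_pop2)) if abs(l - t) > TOLERANCE]
print(f"A={A:.3f}  B={B:.3f}  flagged anchors: {flagged}")
```

A drifting anchor also distorts the linking constants themselves (here it inflates the spread of the target difficulties), so in practice one would re-estimate the link without any flagged item and check again.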

Concurrent administration
Common population design: one population takes Test 1, Test 2, and Test 3

Separate administration
Anchor test design (e.g., McHorney)
         Anchor items   Pop. 1 items   Pop. 2 items   Pop. 3 items
Pop. 1   given          given          (missing)      (missing)
Pop. 2   given          (missing)      given          (missing)
Pop. 3   given          (missing)      (missing)      given
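In an anchor test design the combined dataset is one person-by-item matrix in which each population has responses for the anchor items and its own items, with everything else missing by design. A minimal sketch, assuming responses arrive as one dictionary per person; the item labels and the helper build_combined_matrix are hypothetical.

```python
# Hypothetical item labels for an anchor-test design: every population takes
# the anchor items plus its own unique items (labels are illustrative only).
ANCHOR = ["anc1", "anc2"]
UNIQUE = {
    "pop1": ["p1_a", "p1_b"],
    "pop2": ["p2_a"],
    "pop3": ["p3_a", "p3_b"],
}

def build_combined_matrix(responses_by_pop):
    """Stack every population into one person-by-item matrix.

    responses_by_pop maps a population name to a list of dicts
    {item_label: score}. Items a person has no score for are set to None."""
    all_items = ANCHOR + [item for pop in UNIQUE for item in UNIQUE[pop]]
    rows = []
    for pop, people in responses_by_pop.items():
        for person in people:
            rows.append([person.get(item) for item in all_items])
    return all_items, rows

responses = {
    "pop1": [{"anc1": 1, "anc2": 0, "p1_a": 1, "p1_b": 1}],
    "pop2": [{"anc1": 0, "anc2": 1, "p2_a": 0}],
    "pop3": [{"anc1": 1, "anc2": 1, "p3_a": 0, "p3_b": 1}],
}
items, matrix = build_combined_matrix(responses)
for row in matrix:
    print(row)
```

This sketch treats "never presented" and "presented but skipped" alike as None; a real analysis would usually keep the two distinct, since only the first is missing by design.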

Item bank development
Forms 1, 2, and 3 draw on item sets A, B, and C
Item blocks: A unique; shared by A and B; B unique; shared by B and C; shared by A and C; C unique
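When item sets are linked pairwise (A with B, B with C, and A with C), the A-to-C link can be obtained directly or by chaining A-to-B and B-to-C, and the two should roughly agree. A minimal sketch, assuming each link is a linear transformation of theta; the linking constants are made up and the Link class is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    """Linear scale transformation: theta_target = slope * theta_source + intercept."""
    slope: float
    intercept: float

    def apply(self, theta: float) -> float:
        return self.slope * theta + self.intercept

    def then(self, other: "Link") -> "Link":
        """Compose two links: apply self first, then other."""
        return Link(other.slope * self.slope, other.slope * self.intercept + other.intercept)

# Hypothetical linking constants estimated from the A/B, B/C, and A/C shared items.
a_to_b = Link(slope=1.10, intercept=0.20)
b_to_c = Link(slope=0.90, intercept=-0.10)
a_to_c_direct = Link(slope=0.98, intercept=0.09)

# Chained slope = 0.9 * 1.1 = 0.99; chained intercept = 0.9 * 0.2 - 0.1 = 0.08.
a_to_c_chained = a_to_b.then(b_to_c)

# Large gaps between the chained and direct links would signal inconsistent linking items.
for theta in (-2.0, 0.0, 2.0):
    gap = a_to_c_chained.apply(theta) - a_to_c_direct.apply(theta)
    print(f"theta={theta:+.1f}  chained minus direct = {gap:+.3f}")
```

Comparing the two routes across a range of theta, rather than comparing constants alone, shows where on the scale any disagreement matters.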

Comments
Fairly simple; God is in the details!
The afternoon workgroup will address the details
Illustration to follow

PARSCALE code
For concurrent administration, it’s as if there is a single longer test
For separate administration, there is basically a lot of missing data
Once the data are in the correct format, PARSCALE does the rest
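Getting the data "in the correct format" typically means writing one fixed-width record per person, with items that were never presented written as a designated code. The sketch below only builds such records; it does not reproduce PARSCALE command syntax. The 8-column ID field, one column per item, and the use of "9" as the not-presented code are assumptions that would have to match whatever the command file declares, and to_fixed_width is a hypothetical helper.

```python
def to_fixed_width(person_id, responses, id_width=8, missing_code="9"):
    """Format one person as a fixed-width record: an ID field followed by one
    column per item. Scores must be single digits; None (not presented) is
    written as missing_code (an assumed convention, not a PARSCALE default)."""
    if len(person_id) > id_width:
        raise ValueError(f"ID {person_id!r} is wider than {id_width} columns")
    cells = []
    for score in responses:
        if score is None:
            cells.append(missing_code)
            continue
        text = str(score)
        if len(text) != 1 or text == missing_code:
            raise ValueError(f"score {score!r} must be one digit other than {missing_code!r}")
        cells.append(text)
    return person_id.ljust(id_width) + "".join(cells)

# Concurrent administration: every item on the "single longer test" is answered.
print(to_fixed_width("P001", [1, 0, 2, 3, 1]))
# Separate administration: items from other forms were never presented.
print(to_fixed_width("P002", [1, 0, None, None, 2]))
```

Rejecting a real score equal to the missing code guards against silently turning valid responses into missing data.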

Illustration: CSI ‘D’ and CASI

Information curves

Relative information

Coming attractions
Optimizing screening tests from a pool of items (on Friday)
Item banking and computer adaptive testing (PROMIS initiative)
Incorporation of DIF assessment (tomorrow)
Comments and questions
