CRT Dependability: Consistency for criterion-referenced decisions
Challenges for CRT dependability
- Raw scores may not show much variation (skewed distributions)
- CRT decisions are based on acceptable performance rather than on relative position
- A measure of the dependability of the classification (i.e., master / non-master) is needed
Approaches using cut-scores
- Threshold loss agreement
  – In a test-retest situation, how consistently are students classified as master / non-master?
  – All misclassifications are considered equally serious
- Squared-error loss agreement
  – How consistent are the classifications, taking into account how far scores fall from the cut-point?
  – Misclassifying students far above or far below the cut-point is considered more serious than misclassifying students near it

Berk, R. A. (1984). Selecting the index of reliability. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. ). Baltimore, MD: The Johns Hopkins University Press.
Issues with cut-scores
- “The validity of the final classification decisions will depend as much upon the validity of the standard as upon the validity of the test content” (Shepard, 1984, p. 169)
- “Just because excellence can be distinguished from incompetence at the extremes does not mean excellence and incompetence can be unambiguously separated at the cut-off.” (p. 171)

Shepard, L. A. (1984). Setting performance standards. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. ). Baltimore, MD: The Johns Hopkins University Press.
Methods for determining cut-scores
- Method 1: expert judgments about how hypothetical students would perform on the test
- Method 2: test performance of actual students
Setting cut-scores (Brown, 1996, p. 257)
Institutional decisions (Brown, 1996, p. 260)
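Agreement coefficient ($p_o$) and kappa ($\kappa$)
- $p_o = \dfrac{A + D}{N}$, where A and D count students classified the same way on both administrations (master/master and non-master/non-master), and B and C count inconsistent classifications
- $p_o = \dfrac{77 + 21}{110} = .89$
- $p_{\text{chance}} = \dfrac{(A+B)(A+C) + (C+D)(B+D)}{N^2}$
- $\kappa = \dfrac{p_o - p_{\text{chance}}}{1 - p_{\text{chance}}} = \dfrac{.89 - .63}{1 - .63} = .70$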
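Both coefficients can be computed directly from the four cells of the 2 × 2 classification table. The sketch below is a minimal Python version, assuming A (top-left) and D (bottom-right) are the consistent cells and B and C the inconsistent ones. The function names are hypothetical, and so is the split of the 12 inconsistent classifications into B = 6 and C = 6; only A, D, and N come from the slide.

```python
def agreement_coefficient(a: int, b: int, c: int, d: int) -> float:
    """p_o: proportion of students classified the same way on both administrations."""
    n = a + b + c + d
    return (a + d) / n


def chance_agreement(a: int, b: int, c: int, d: int) -> float:
    """p_chance: agreement expected from the row and column totals alone."""
    n = a + b + c + d
    masters = (a + b) * (a + c)       # row total x column total for "master"
    non_masters = (c + d) * (b + d)   # row total x column total for "non-master"
    return (masters + non_masters) / n ** 2


def kappa(p_o: float, p_chance: float) -> float:
    """Cohen's kappa: agreement beyond chance, scaled by the maximum possible beyond chance."""
    return (p_o - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # A = 77 and D = 21 are from the slide; B = C = 6 is a hypothetical split
    # of the remaining 12 students so that N = 110.
    a, b, c, d = 77, 6, 6, 21
    p_o = agreement_coefficient(a, b, c, d)
    p_ch = chance_agreement(a, b, c, d)
    print(f"p_o = {p_o:.2f}, p_chance = {p_ch:.2f}, kappa = {kappa(p_o, p_ch):.2f}")
```

The sketch keeps unrounded intermediate values, so its kappa can differ in the second decimal from the hand calculation above, which uses the rounded .89 and .63.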
Short-cut methods for one administration
- Calculate an NRT reliability coefficient
  – Split-half, KR-20, or Cronbach alpha
- Convert the cut-score to a standardized score
  – $z = \dfrac{\text{cut-score} - .5 - \text{mean}}{SD}$
- Use Table 7.9 to estimate the agreement coefficient
- Use Table 7.10 to estimate kappa
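The first two short-cut steps can be sketched in Python. The sketch assumes a students × items matrix of scored responses. `cronbach_alpha` and `cut_score_z` are hypothetical helper names, and using population (N) variances and SD is an assumed convention. The table look-ups in the last two steps are not reproduced.

```python
from statistics import mean, pstdev, pvariance


def cronbach_alpha(scores: list[list[int]]) -> float:
    """Cronbach alpha; rows are students, columns are items.

    For items scored 0/1 this equals KR-20.
    """
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def cut_score_z(cut_score: float, mean_score: float, sd: float) -> float:
    """Standardize a raw cut-score: z = (cut-score - .5 - mean) / SD."""
    return (cut_score - 0.5 - mean_score) / sd


if __name__ == "__main__":
    # Hypothetical 0/1 responses: 4 students x 3 items.
    responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    totals = [sum(row) for row in responses]
    print(f"alpha (KR-20) = {cronbach_alpha(responses):.2f}")
    print(f"z for a cut-score of 2 = {cut_score_z(2, mean(totals), pstdev(totals)):.2f}")
```

The alpha and z values are the two inputs needed to enter the look-up tables.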
Estimate the dependability for the HELP Reading test
- Assume a cut-point of 60%. What is the raw score? 27
- z =
- Look at Table 9.1. What is the approximate value of the agreement coefficient?
- Look at Table 9.2. What is the approximate value of the kappa coefficient?
Squared-error loss agreement
- Sensitive to degrees of mastery / non-mastery
- Short-cut form of a generalizability study
- Classical test theory
  – $OS = TS + E$
- Generalizability theory
  – $OS = TS + (E_1 + E_2 + \cdots + E_k)$
- OS = observed score, TS = true score, E = error; generalizability theory splits error into multiple sources

Brennan, R. (1995). Handout from generalizability theory workshop.
Phi (lambda) dependability index, $\Phi(\lambda)$, computed from:
- Cut-point ($\lambda$), expressed as a proportion
- Number of items ($k$)
- Mean of proportion scores
- Standard deviation of proportion scores
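A minimal Python sketch of this index. It assumes the widely used short-cut form $\Phi(\lambda) = 1 - \dfrac{1}{k-1}\left[\dfrac{\bar{X}_p(1-\bar{X}_p) - S_p^2}{(\bar{X}_p - \lambda)^2 + S_p^2}\right]$, where $\bar{X}_p$ and $S_p$ are the mean and standard deviation of the proportion scores. The formula is supplied here as an assumption. The function name and the test statistics are hypothetical.

```python
def phi_lambda(k: int, mean_p: float, sd_p: float, cut: float) -> float:
    """Short-cut Phi(lambda) from proportion-score statistics.

    k      -- number of items
    mean_p -- mean of the students' proportion scores
    sd_p   -- standard deviation of the proportion scores
    cut    -- cut-point lambda, as a proportion (e.g., .60 for 60%)
    """
    error_term = (mean_p * (1 - mean_p) - sd_p ** 2) / (k - 1)
    return 1 - error_term / ((mean_p - cut) ** 2 + sd_p ** 2)


if __name__ == "__main__":
    # Hypothetical statistics: 40 items, mean proportion score .70, SD .15.
    for cut in (0.60, 0.70, 0.80):
        print(f"cut = {cut:.2f}: Phi(lambda) = {phi_lambda(40, 0.70, 0.15, cut):.2f}")
```

Because $(\bar{X}_p - \lambda)^2$ is smallest when the cut-point equals the mean, Φ(λ) is lowest there and rises as the cut-point moves away from the mean. With the hypothetical statistics above, Φ(λ) is about .79 at a cut-point of .70 and about .85 at .60 or .80.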
Domain score dependability
- Does not depend on a cut-point for calculation
- “estimates the stability of an individual’s score or proportion correct in the item domain, independent of any mastery standard” (Berk, 1984, p. 252)
- Assumes a well-defined domain of behaviors
Phi dependability index
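A minimal Python sketch of a cut-point-free Φ. It assumes the common short-cut form $\Phi = \dfrac{\frac{nS_p^2}{n-1}(KR\text{-}20)}{\frac{nS_p^2}{n-1}(KR\text{-}20) + \frac{\bar{X}_p(1-\bar{X}_p) - S_p^2}{k-1}}$, with $n$ students, $k$ items, and $\bar{X}_p$ and $S_p$ the mean and standard deviation of the proportion scores. The formula is supplied here as an assumption. The function name and the statistics used are hypothetical.

```python
def phi(n: int, k: int, mean_p: float, sd_p: float, kr20: float) -> float:
    """Short-cut domain-score dependability (Phi); no cut-point required.

    n      -- number of students
    k      -- number of items
    mean_p -- mean of the proportion scores
    sd_p   -- standard deviation of the proportion scores (assumed N in the denominator)
    kr20   -- KR-20 reliability estimate for the same test
    """
    # Estimated true-score variance on the proportion scale.
    person_var = (n * sd_p ** 2 / (n - 1)) * kr20
    # Estimated error variance for domain (absolute) decisions.
    error_var = (mean_p * (1 - mean_p) - sd_p ** 2) / (k - 1)
    return person_var / (person_var + error_var)


if __name__ == "__main__":
    # Hypothetical statistics: 30 students, 40 items, mean .70, SD .15, KR-20 .85.
    print(f"Phi = {phi(30, 40, 0.70, 0.15, 0.85):.2f}")
```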
Confidence intervals
- Analogous to the SEM for NRTs
- Interpreted as a proportion-correct score rather than a raw score
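A minimal Python sketch of the CRT confidence interval. It assumes the common short-cut form $CI = \sqrt{\dfrac{\bar{X}_p(1-\bar{X}_p) - S_p^2}{k-1}}$, with $\bar{X}_p$ and $S_p$ the mean and standard deviation of the proportion scores and $k$ the number of items. The formula, helper names, and statistics are assumptions for illustration.

```python
import math


def crt_confidence_interval(k: int, mean_p: float, sd_p: float) -> float:
    """Short-cut CRT confidence interval, on the proportion-correct scale."""
    return math.sqrt((mean_p * (1 - mean_p) - sd_p ** 2) / (k - 1))


def score_band(score_p: float, ci: float, width: float = 1.0) -> tuple[float, float]:
    """Band of +/- `width` CIs around a proportion score, clipped to [0, 1]."""
    return max(0.0, score_p - width * ci), min(1.0, score_p + width * ci)


if __name__ == "__main__":
    ci = crt_confidence_interval(40, 0.70, 0.15)  # hypothetical test statistics
    low, high = score_band(0.65, ci)
    print(f"CI = {ci:.3f}; a student at .65 falls in [{low:.2f}, {high:.2f}] (+/- 1 CI)")
```

With these hypothetical statistics the CI is about .069, so a student with a proportion score of .65 falls between about .58 and .72. By analogy with the SEM, a ±1 CI band is often read as roughly 68% confidence, assuming normally distributed errors.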
Reliability recap
- Longer tests are better than short tests
- Well-written items are better than poorly written items
- Items with high discrimination (ID for NRTs, B-index for CRTs) are better
- A test made up of similar items is better
- CRTs: a test that is related to the objectives is better
- NRTs: a test that is well-centered and spreads out students is better
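The B-index compares how often students who passed the test and students who failed it answered an item correctly. It is the item facility (IF) for the passing group minus the IF for the failing group. A minimal Python sketch, assuming 0/1 item scores and that "pass" means a total score at or above the cut-score; `b_index` and the data are hypothetical.

```python
def b_index(item: list[int], totals: list[float], cut: float) -> float:
    """B-index for one 0/1 item: IF(pass) - IF(fail)."""
    passed = [x for x, t in zip(item, totals) if t >= cut]
    failed = [x for x, t in zip(item, totals) if t < cut]
    if not passed or not failed:
        raise ValueError("need at least one student on each side of the cut-score")
    return sum(passed) / len(passed) - sum(failed) / len(failed)


if __name__ == "__main__":
    # Hypothetical data: one item, four students' total scores, cut-score of 2.
    print(b_index([1, 1, 0, 0], [3, 2, 1, 0], cut=2))
```

Values near 1 mean the item separates passing from failing students well. Values near 0 or below mean it does not.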