Reliability: The First Test of Validity
Reliability
The degree to which an assessment tool yields consistent evaluations across time, situations, and raters.
Is the instrument trustworthy: does it consistently measure what it says it measures?
Types of Reliability
- Interrater
- Stability
- Internal consistency
Model of Reliability
Obtained score = True score + Error
The model determines how much error is present in an obtained score.
Standard Error of Measurement
The standard deviation of the error around a true score:
SEM = SD √(1 − rxx)
where SD is the standard deviation of the test scores and rxx is the test's reliability coefficient.
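A minimal Python sketch of this formula; the SD of 15 and rxx of .91 are hypothetical example values.

```python
import math

def standard_error_of_measurement(sd: float, rxx: float) -> float:
    """SEM = SD * sqrt(1 - rxx): the SD of the error around a true score."""
    return sd * math.sqrt(1 - rxx)

# Hypothetical test: SD = 15, reliability rxx = .91
print(round(standard_error_of_measurement(15, 0.91), 2))  # 4.5
```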
True Scores
The mean score that would be obtained if the entire domain were tested.
Because an entire domain cannot be tested, the true score can only be estimated.
True Score Estimate
X′ = X̄ + rt1t2(X − X̄)
where X is the obtained score, X̄ is the group mean, and rt1t2 is the test-retest reliability coefficient.
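A sketch of the estimate in Python; the obtained score of 120, group mean of 100, and rxx of .91 are hypothetical.

```python
def estimated_true_score(x: float, mean: float, rxx: float) -> float:
    """X' = mean + rxx * (X - mean): regresses the obtained score toward the group mean."""
    return mean + rxx * (x - mean)

# Hypothetical: obtained score 120, group mean 100, rxx = .91
print(round(estimated_true_score(120, 100, 0.91), 1))  # 118.2
```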
Confidence Intervals for Estimated True Scores
C.I. = X′ ± (z)(SEM)
Common Confidence Intervals

% Confidence    z-score
68              1.00
90              1.645
95              1.96
99              2.58
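Combining the pieces above, a small sketch of the interval; the numbers carry over from the hypothetical examples (X′ = 118.2, SEM = 4.5, z = 1.96 for 95%).

```python
def confidence_interval(true_est: float, z: float, sem: float) -> tuple[float, float]:
    """C.I. = X' +/- z * SEM"""
    return true_est - z * sem, true_est + z * sem

low, high = confidence_interval(118.2, 1.96, 4.5)
print(round(low, 1), round(high, 1))  # 109.4 127.0
```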
Test-Retest Reliability
Measures consistency of scores across administrations over time.
Test the same group with the same test twice over a short period; one to two weeks is appropriate.
Equivalent Forms
Measures consistency across tests that cover the same content with varying questions.
Both forms are given to the same group at the same time and the scores are correlated (see the correlation sketch below).
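Both test-retest and equivalent-forms reliability reduce to correlating two sets of scores from the same group. A minimal Pearson correlation sketch in plain Python; the score lists are hypothetical.

```python
def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two score lists from the same group."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [85, 90, 78, 92, 88]  # hypothetical first administration
time2 = [83, 91, 80, 94, 86]  # same group, one to two weeks later
print(round(pearson_r(time1, time2), 2))  # 0.93
```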
Split-Half Reliability
Measures consistency across items, i.e., internal consistency.
Divide a single test in half and correlate the two halves.
Different approaches to splitting: even/odd items, 1st/2nd sections, random selection.
Other possible split-half measures include coefficient alpha (Cronbach's α) and KR-20.
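A sketch of these internal-consistency measures, assuming Python. One step worth making explicit: a raw split-half correlation describes a test of only half the length, so it is usually stepped up with the Spearman-Brown formula, 2r / (1 + r); and for items scored 0/1, coefficient alpha reduces to KR-20. The item data below are hypothetical.

```python
def spearman_brown(r_half: float) -> float:
    """Step a split-half correlation up to full-test length: 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(items: list[list[float]]) -> float:
    """Coefficient alpha: (k / (k - 1)) * (1 - sum of item variances / variance of totals).
    `items` holds one list of scores per item, each across the same examinees."""
    def var(xs: list[float]) -> float:
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per examinee
    return (k / (k - 1)) * (1 - sum(var(item) for item in items) / var(totals))

# Hypothetical 4-item test scored 0/1 for 5 examinees (alpha equals KR-20 here)
items = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
]
print(round(cronbach_alpha(items), 2))  # 0.79
print(round(spearman_brown(0.66), 2))   # a half-test r of .66 steps up to 0.80
```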
Interrater Reliability
Useful in measuring reliability across observers or scorers.
% Agreement = [number of agreements / (number of agreements + number of disagreements)] × 100
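A one-function sketch of the agreement formula; the counts are hypothetical.

```python
def percent_agreement(agreements: int, disagreements: int) -> float:
    """% Agreement = agreements / (agreements + disagreements) * 100"""
    return agreements / (agreements + disagreements) * 100

# Hypothetical: two observers agree on 45 of 50 scored items
print(percent_agreement(45, 5))  # 90.0
```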
Desirable Standards for Reliability
Test authors need to report rxx and validation data; validation data should also be present for subtests and subscales.
For group data used for administrative purposes, .70 to .90 is desirable; .60 is a minimum.
Desirable Standards for Reliability: Individual Data
For placement decisions, .90 is the minimum.
For screening decisions, .80 is recommended.