Chapter 3: Reliability
Classical Test Theory: Every observed score is a combination of true score and error: Obs. = T + E. Reliability = true score variance / observed score variance.
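The decomposition Obs. = T + E can be illustrated with a few lines of Python. This is a minimal sketch, assuming the standard classical test theory definition of reliability as true score variance divided by observed score variance, and assuming error is uncorrelated with true score (so the variances add). The function names and the example numbers are hypothetical.

```python
def observed_score(true_score, error):
    """Classical test theory: observed score = true score + error."""
    return true_score + error


def reliability_from_variances(true_variance, error_variance):
    """Reliability as the share of observed variance that is true variance.

    Assumes error is unsystematic and uncorrelated with true score, so that
    observed variance = true variance + error variance.
    """
    if true_variance < 0 or error_variance < 0:
        raise ValueError("variances cannot be negative")
    observed_variance = true_variance + error_variance
    if observed_variance == 0:
        raise ValueError("observed variance is zero; reliability is undefined")
    return true_variance / observed_variance


if __name__ == "__main__":
    # Hypothetical values: true variance 100, error variance 25.
    print(reliability_from_variances(100.0, 25.0))
```

With no error variance the ratio is 1, and it shrinks toward 0 as unsystematic error grows relative to true score variance.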
Reliability: Systematic versus unsystematic error. Reliability takes only unsystematic error into account.
Reliability & Correlation: Reliability is often based on the consistency between two sets of scores. Correlation is the statistical technique used to examine that consistency.
Positive Correlation
Negative Correlation
Pearson Product-Moment Correlation Coefficient: A correlation coefficient is a numerical indicator of the relationship between two sets of data. The Pearson product-moment correlation coefficient is the most common.
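To make the coefficient concrete, the sketch below computes a Pearson product-moment correlation from two paired lists of scores using the usual textbook formula (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name and the sample scores are hypothetical and only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each set of scores.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one set has no variance")
    return cross / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Hypothetical scores for five people on two administrations.
    first = [10, 12, 15, 18, 20]
    second = [11, 13, 14, 19, 21]
    print(round(pearson_r(first, second), 3))
```

Values near +1 indicate that people keep roughly the same rank order across the two sets of scores; values near -1 indicate the order is reversed.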
Coefficient of Determination: The percentage of shared variance between two sets of data, found by squaring the correlation coefficient (r²).
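As a small worked example, the coefficient of determination is the square of the correlation coefficient. The sketch below assumes r is already known; the function name and the value r = .70 are hypothetical.

```python
def coefficient_of_determination(r):
    """Proportion of variance shared by two sets of scores (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return r ** 2


if __name__ == "__main__":
    # Hypothetical correlation of .70: roughly 49% shared variance.
    shared = coefficient_of_determination(0.70)
    print(f"{shared:.0%}")
```

Because the value is squared, a negative correlation of the same size implies the same amount of shared variance.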
Types of Reliability: Test-retest; alternate/parallel forms; internal consistency measures.
Test-Retest: Correlate performance on the first administration with performance on the second. The resulting correlation is the coefficient of stability.
Alternate/Parallel Forms: Two forms of the instrument are administered to the same individuals.
Internal Consistency Measures: Split-half reliability (with the Spearman-Brown formula); Kuder-Richardson formulas (KR-20 and KR-21); coefficient alpha.
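Two of these measures can be sketched in Python. The first function applies the Spearman-Brown formula to a half-test correlation that is assumed to have been computed already; the second computes coefficient alpha from an examinee-by-item score table using the standard formula (k / (k - 1)) × (1 - sum of item variances / variance of total scores). With items scored 0/1, alpha gives the same value as KR-20. Function names and the sample data are hypothetical.

```python
import statistics


def spearman_brown(r_half, length_factor=2.0):
    """Estimate full-test reliability from a half-test correlation.

    length_factor=2 is the usual split-half correction (test twice as long).
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


def coefficient_alpha(item_scores):
    """Coefficient alpha for rows of item scores (one row per examinee)."""
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two examinees")
    # Variance of each item (column) across examinees.
    item_variances = [statistics.variance(col) for col in zip(*item_scores)]
    # Variance of the total score across examinees.
    total_variance = statistics.variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have no variance; alpha is undefined")
    return (n_items / (n_items - 1)) * (1.0 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical half-test correlation of .60.
    print(round(spearman_brown(0.60), 2))
    # Hypothetical 0/1 responses: four examinees, three items.
    responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(round(coefficient_alpha(responses), 2))
```

The Spearman-Brown step is needed because a split-half correlation describes a test only half as long as the real one, and shorter tests tend to be less reliable.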
Nontypical Situations: Typical methods for determining reliability may not be suitable for speed tests, criterion-referenced tests, or subjectively scored instruments (for which interrater reliability is examined).
Evaluating Reliability Coefficients: Examine the purpose for using the instrument. Be knowledgeable about the reliability coefficients of other instruments in that area. Examine the characteristics of particular clients against the reliability coefficients; coefficients may vary based on SES, age, culture/ethnicity, etc.
Standard Error of Measurement: Provides an estimate of the range of scores that would result if someone were to take the instrument repeatedly. Based on the premise that when an individual takes a test multiple times, the scores fall into a normal distribution.
SEM: Example. Sam’s SAT Verbal = 550; r = .91; s = 100. SEM = s√(1 − r) = 100√(1 − .91) = 30. About 68% of the time, Sam’s true score would fall between 520 and 580 (±1 SEM). About 95% of the time, Sam’s true score would fall between 490 and 610 (±2 SEM). About 99.7% of the time, Sam’s true score would fall between 460 and 640 (±3 SEM).
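The arithmetic behind Sam's score ranges can be checked with a short sketch. It assumes the usual formula SEM = s√(1 − r), which reproduces the 520 to 580 range given in the example, and it builds each range as the observed score plus or minus 1, 2, or 3 SEMs. The function names are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = s * sqrt(1 - r), with s the standard deviation and r the reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sem, n_sems):
    """Range of scores n_sems standard errors either side of the observed score."""
    return (observed - n_sems * sem, observed + n_sems * sem)


if __name__ == "__main__":
    # Values from the example: SAT Verbal = 550, r = .91, s = 100.
    sem = standard_error_of_measurement(100.0, 0.91)
    for n_sems in (1, 2, 3):
        low, high = score_band(550.0, sem, n_sems)
        print(f"+/- {n_sems} SEM: {low:.0f} to {high:.0f}")
```

A higher reliability coefficient gives a smaller SEM and therefore narrower ranges around the observed score.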
Determining Range of Scores Using SEM
Standard Error of Difference: A method to determine whether the difference between two scores is significant. Takes into account the SEM of both scores.
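One common way to combine the two SEMs is the formula SED = √(SEM₁² + SEM₂²); the sketch below assumes that formula and treats a difference as notable when it exceeds a chosen multiple of the SED (1.96 here, an assumed cutoff corresponding to roughly 95% confidence). The function names, the cutoff, and the example scores are hypothetical.

```python
import math


def standard_error_of_difference(sem_a, sem_b):
    """SED = sqrt(SEM_a^2 + SEM_b^2), combining the error in both scores."""
    return math.hypot(sem_a, sem_b)


def difference_exceeds_error(score_a, score_b, sem_a, sem_b, z=1.96):
    """True if the score difference is larger than z standard errors of difference."""
    sed = standard_error_of_difference(sem_a, sem_b)
    return abs(score_a - score_b) > z * sed


if __name__ == "__main__":
    # Hypothetical scores of 550 and 600 with SEMs of 30 and 40.
    print(standard_error_of_difference(30.0, 40.0))
    print(difference_exceeds_error(550.0, 600.0, 30.0, 40.0))
```

In this hypothetical case the 50-point gap is no larger than one SED, so it could easily reflect measurement error in the two scores rather than a real difference.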
Alternative Theoretical Model: Generalizability or Domain Sampling Theory. The focus is on estimating the extent to which specific sources of variation, under defined conditions, contribute to the score on the instrument.