Questions
What are the sources of error in measurement? How are these sources related to one another?
How are these sources related to the ideas of reliability and validity?
Which are easier to deal with: problems of reliability or problems of validity?
How is Cronbach's alpha related to other measures of reliability? Why is it the disciplinary standard?
What are the components of reliability, from the point of view of generalizability theory?
Generalizability Theory (Matt, 2006)
Reliability – formal definition
Total variance in scores decomposes into:
Systematic differences between subjects (true-score variance)
Unsystematic differences between subjects, due to measurement errors internal or external to the subjects (error variance)
Reliability = true-score variance / total variance
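A minimal simulation of this decomposition (hypothetical score scale chosen for illustration): observed scores are true scores plus unsystematic error, and reliability is the share of observed variance due to true-score differences between subjects.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # simulated subjects

# Systematic differences between subjects (true scores)
true = rng.normal(100, 15, n)
# Unsystematic measurement error (internal or external to subjects)
error = rng.normal(0, 5, n)
observed = true + error

# Reliability = true-score variance / total observed variance
reliability = true.var() / observed.var()
theoretical = 15**2 / (15**2 + 5**2)  # = 0.9
print(round(reliability, 3), round(theoretical, 3))
```

With a large simulated sample, the empirical ratio lands close to the theoretical value of 0.9.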
Example
You measured the mathematics aptitude of a child who is suspected to be dysnumeric. The child's standard score on the Wechsler aptitude test was 68, and the reliability of the test is 0.88. Could you report to the parents that the child appears to be dysnumeric?
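One way to reason about the example is through the standard error of measurement, SEM = SD × √(1 − reliability). Wechsler-type standard scores have SD = 15; the cutoff of 70 below is only an illustrative threshold, not one stated in the slides.

```python
import math

score, sd, reliability = 68, 15, 0.88  # Wechsler-type scale: SD = 15
sem = sd * math.sqrt(1 - reliability)  # standard error of measurement

# Approximate 95% confidence band around the observed score
lo = score - 1.96 * sem
hi = score + 1.96 * sem
print(f"SEM = {sem:.2f}, 95% CI = [{lo:.1f}, {hi:.1f}]")
```

The band is roughly 58–78, which straddles a cutoff such as 70, so even a reliability of 0.88 does not license a definitive report to the parents from a single score.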
How do we increase reliability?
Reliability is proportional to the true-score variance.
The larger the number of items, the larger the true-score variance and (under certain assumptions) the higher the reliability.
The lengthening factor is k = # items new / # items old.
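The relation between the lengthening factor k and reliability is usually quantified with the Spearman–Brown prophecy formula; a sketch (the reliability values are illustrative):

```python
def spearman_brown(r_old: float, k: float) -> float:
    """Predicted reliability after changing test length by factor k
    (k = # items new / # items old)."""
    return k * r_old / (1 + (k - 1) * r_old)

# Doubling a test whose reliability is 0.70
print(round(spearman_brown(0.70, 2), 3))  # -> 0.824
```

Note the diminishing returns: doubling again (k = 4) raises 0.70 only to about 0.90, and the formula assumes the added items are parallel to the originals.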
“Generalizability”
Mean sq (person) = estimate of the variance of all possible persons
Mean sq (item) = estimate of the variance of all possible items
Mean sq (error) = estimate of the unexplained variance

RELIABILITY = TRUE person-to-person variance / TOTAL person-to-person variance
            = [MS(person) − MS(error)] / MS(person)
Generalizability Theory
Data consist of person-by-item scores. Variability in the data comes from:
Variability across individuals
Variability across items
Variability across person × item combinations
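The decomposition above can be carried out by hand on a small person-by-item matrix. The 5 × 4 score matrix below is hypothetical; the final ratio is the [MS(person) − MS(error)] / MS(person) reliability, which for a two-way person × item layout equals Cronbach's alpha (Hoyt's ANOVA formulation).

```python
import numpy as np

# Hypothetical person-by-item score matrix (5 persons × 4 items)
X = np.array([
    [4, 5, 3, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 4, 3, 2],
    [1, 2, 1, 2],
], dtype=float)
n, k = X.shape

grand = X.mean()
person_means = X.mean(axis=1)
item_means = X.mean(axis=0)

# Two-way decomposition: person, item, and residual sums of squares
ss_person = k * ((person_means - grand) ** 2).sum()
ss_item = n * ((item_means - grand) ** 2).sum()
ss_total = ((X - grand) ** 2).sum()
ss_error = ss_total - ss_person - ss_item

ms_person = ss_person / (n - 1)
ms_error = ss_error / ((n - 1) * (k - 1))

# [MS(person) - MS(error)] / MS(person), i.e. Cronbach's alpha
alpha = (ms_person - ms_error) / ms_person
print(round(alpha, 3))  # -> 0.961
```

The same value falls out of the usual item-variance formula for alpha, which is one way to see why alpha is a special case of a generalizability coefficient with items as the single facet.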
SPSS output [ANOVA table not reproduced]
Generalizability with multiple “facets”
Persons who are assessed
Items used
Informants providing data
Occasions of measurement
Settings in which measures were taken
Validity
Validity
Systematic errors bear on validity; unsystematic errors bear on reliability.
[Figure: quadrant diagram contrasting measures that are reliable vs. not reliable and valid vs. not valid]
Types of Validity
Content validity
Criterion validity: concurrent validity, predictive validity
Construct validity: convergent validity, discriminant validity
Content validity
The test should elicit a range of responses that are representative of the entire domain of relevant attributes (e.g., skills, emotions, behaviors, attitudes, traits, symptoms).
Achievement tests: are all the objectives of instruction represented?
Other assessments: use guidance from theory and experts.
Criterion Validity The test results should be in agreement with other accepted measures of the same attribute.
Reasons for criterion validity estimates to be biased
Group differences (Aiken is unclear on this point): subgroups will show lower criterion validity than the aggregate because of the restriction-of-range problem.
Cross-validation: if a test is developed and validated on one group, cross-validation estimates will be lower than the original estimates of criterion validity.
Length of the test: the longer the test, the higher the variance, the higher the covariance with the criterion, and the higher the criterion validity.
Criterion contamination: when validation is not blind, criterion validity estimates will be biased upward.
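The restriction-of-range effect in the first point above can be demonstrated with a small simulation (the population correlation of 0.6 and selection at the test mean are illustrative choices): selecting a subgroup on the test shrinks the test's variance and, with it, the observed test–criterion correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Test and criterion correlate ~0.6 in the full population
test = rng.normal(0, 1, n)
criterion = 0.6 * test + np.sqrt(1 - 0.6**2) * rng.normal(0, 1, n)

r_full = np.corrcoef(test, criterion)[0, 1]

# Subgroup selected on the test (top half): range is restricted
keep = test > 0
r_restricted = np.corrcoef(test[keep], criterion[keep])[0, 1]

print(round(r_full, 2), round(r_restricted, 2))
```

The restricted correlation comes out noticeably lower (around 0.4 here versus 0.6 in the full sample), even though the underlying test–criterion relationship is unchanged.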
Construct validity
The extent to which an instrument measures the targeted construct. Especially relevant for personality tests, attitude tests, and assessments of psychological problems/disorders.
Establishing construct validity:
Expert judgment
Reliability analysis
Group differences should be in the expected direction
Convergent and discriminant validity (varying measures and methods)
Talk-through administration of the assessment
Validity and the social consequences of tests
The social consequences of a test may be undesirable either because of the test itself or because of the phenomenon that the test measures.