Measurement and Data Quality
Measurement
- The assignment of numbers to represent the amount of an attribute present in an object or person, using specific rules
- Advantages of measurement:
  - Removes guesswork
  - Provides precise information
  - Less vague than words
Levels of Measurement
There are four levels (classes) of measurement:
- Nominal: assigning numbers to classify characteristics into categories (e.g., gender, religion)
- Ordinal: ranking objects based on their relative standing on an attribute (e.g., "very dissatisfied," "somewhat dissatisfied," "somewhat satisfied," "very satisfied")
- Interval: objects ordered on a scale that has equal distances between points on the scale (e.g., the Fahrenheit temperature scale)
- Ratio: equal distances between score units, plus a rational, meaningful zero (e.g., the amount of money in your pocket right now)
A variable's level of measurement determines which mathematical operations can be performed in a statistical analysis.
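A minimal sketch of why the interval/ratio distinction matters for arithmetic. The temperatures and dollar amounts are invented for illustration:

```python
# Fahrenheit is interval level: its zero is arbitrary, so ratios of
# scores are not meaningful. Money is ratio level: zero means "none."
f_hot, f_cold = 80.0, 40.0
print(f_hot / f_cold)  # 2.0, but 80 deg F is NOT "twice as hot" as 40 deg F

# Converting to Celsius changes the ratio, exposing the arbitrary zero:
c_hot, c_cold = (f_hot - 32) * 5 / 9, (f_cold - 32) * 5 / 9
print(c_hot / c_cold)  # 6.0 -- the "ratio" depends on the scale chosen

money_a, money_b = 80.0, 40.0
print(money_a / money_b)  # 2.0, and $80 truly is twice $40 (ratio level)
```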
Errors of Measurement
Obtained score = True score ± Error
- Obtained score: an actual data value for a participant (e.g., an anxiety scale score)
- True score: the score that would be obtained with an infallible measure
- Error: the error of measurement, caused by factors that distort measurement
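A minimal simulation of this measurement model. The true score and the error spread are invented values, not from any real anxiety scale:

```python
import random

random.seed(42)
true_score = 50.0   # the score an infallible measure would yield (assumed)
error_sd = 4.0      # spread of random measurement error (assumed)

# Repeated administrations scatter around the true score:
obtained = [true_score + random.gauss(0, error_sd) for _ in range(5)]
print([round(score, 1) for score in obtained])
# Because the errors are random, averaging many obtained scores
# converges toward the true score as errors cancel out.
```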
Factors That Contribute to Errors of Measurement
- Situational contaminants
- Transitory personal factors (e.g., fatigue)
- Response-set biases
- Administration variations
- Item sampling
Question
Is the following statement true or false?
The true score is data obtained from the actual research study.
Answer
False. The true score is the score that would be obtained with an infallible measure. The obtained score is an actual value (datum) for a participant.
Psychometric Assessments
- A psychometric assessment is an evaluation of the quality of a measuring instrument.
- Key criteria in a psychometric assessment:
  - Reliability
  - Validity
Reliability
- The consistency and accuracy with which an instrument measures the target attribute
- Reliability assessments involve computing a reliability coefficient.
  - Reliability coefficients can range from .00 to 1.00.
  - Coefficients below .70 are considered unsatisfactory.
  - Coefficients of .80 or higher are desirable.
Three Aspects of Reliability Can Be Evaluated
- Stability
- Internal consistency
- Equivalence
Stability
- The extent to which scores are similar on two separate administrations of an instrument
- Evaluated by test–retest reliability, which requires participants to complete the same instrument on two occasions (see the sketch below)
- Appropriate for relatively enduring attributes (e.g., creativity)
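A minimal sketch of a test–retest estimate: correlate the scores from the two administrations. The scores are invented for illustration; `statistics.correlation` requires Python 3.10+:

```python
from statistics import correlation

time1 = [12, 18, 25, 30, 22, 15, 28, 20]  # first administration
time2 = [14, 17, 27, 29, 21, 16, 26, 22]  # same people, second administration

r = correlation(time1, time2)  # Pearson r serves as the stability estimate
print(f"test-retest reliability: {r:.2f}")  # a high r suggests a stable attribute
```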
Internal Consistency
- The extent to which all the items on an instrument measure the same unitary attribute
- Evaluated by administering the instrument on one occasion
- Appropriate for most multi-item instruments
- The most widely used approach to assessing reliability
- Assessed by computing coefficient alpha (Cronbach's alpha), as in the sketch below; alphas ≥ .80 are highly desirable.
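A minimal sketch of Cronbach's alpha for a k-item scale, using the standard formula α = (k / (k − 1)) × (1 − Σ item variances / variance of total scores). The item scores below are invented for illustration:

```python
from statistics import variance

# rows = respondents, columns = items (a 3-item scale, 5 respondents)
items = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 5],
    [1, 2, 1],
    [3, 3, 4],
]

k = len(items[0])
item_vars = [variance([row[i] for row in items]) for i in range(k)]
total_var = variance([sum(row) for row in items])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")  # ~0.94 for these invented data
```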
Question
When determining the reliability of a measurement tool, which value would indicate that the tool is most reliable?
a. 0.50
b. 0.70
c. 0.90
d. 1.10
Answer
c. 0.90
Reliability coefficients can range from .00 to 1.00, and coefficients of .80 or higher are desirable, so a coefficient of 0.90 indicates that the tool is very reliable. A coefficient greater than 1.00 would be an error.
Equivalence
- The degree of similarity between alternative forms of an instrument, or between multiple raters/observers using an instrument
- Most relevant for structured observations
- Assessed by comparing the agreement between observations or ratings of two or more observers (interobserver/interrater reliability), as in the sketch below
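A minimal sketch of interrater agreement for two observers making the same binary ratings (1 = behavior present, 0 = absent). The ratings are invented; Cohen's kappa, used here, is one common index that corrects simple percent agreement for chance (it is not named in the slides):

```python
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement from each rater's marginal proportions:
p_a1, p_b1 = sum(rater_a) / n, sum(rater_b) / n
expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)

kappa = (observed - expected) / (1 - expected)
print(f"percent agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```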
Reliability Principles
- Low reliability can undermine adequate testing of hypotheses.
- Reliability estimates vary depending on the procedure used to obtain them.
- Reliability is lower in homogeneous than in heterogeneous samples.
- Reliability is lower in shorter than in longer multi-item scales (see the sketch below).
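The last principle can be made concrete with the Spearman–Brown prophecy formula, the standard result behind it (not named in the slides): if a scale is lengthened by a factor of n with comparable items, the predicted reliability is r_new = n·r_old / (1 + (n − 1)·r_old). The starting reliability below is an assumed value:

```python
def spearman_brown(r_old: float, n: float) -> float:
    """Predicted reliability after changing scale length by factor n."""
    return (n * r_old) / (1 + (n - 1) * r_old)

r_10_items = 0.70  # assumed reliability of a 10-item scale
print(f"{spearman_brown(r_10_items, 2.0):.2f}")  # doubled to 20 items -> 0.82
print(f"{spearman_brown(r_10_items, 0.5):.2f}")  # halved to 5 items  -> 0.54
```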
Validity
- The degree to which an instrument measures what it is supposed to measure
- Four aspects of validity:
  - Face validity
  - Content validity
  - Criterion-related validity
  - Construct validity
Face Validity
- Refers to whether the instrument looks as though it is an appropriate measure of the construct
- Based on judgment; there are no objective criteria for assessment
Content Validity
- The degree to which an instrument has an adequate sample of items for the construct being measured
- Evaluated by expert evaluation, often via a quantitative measure, the content validity index (CVI); see the sketch below
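A minimal sketch of an item-level CVI (I-CVI): the proportion of experts rating an item as relevant (3 or 4 on a 4-point relevance scale). The ratings are invented, and the .78 benchmark is a commonly cited criterion rather than one stated in the slides:

```python
expert_ratings = {               # item -> relevance ratings from five experts
    "item_1": [4, 4, 3, 4, 3],
    "item_2": [4, 3, 2, 4, 3],
    "item_3": [2, 3, 3, 1, 3],
}

for item, ratings in expert_ratings.items():
    i_cvi = sum(r >= 3 for r in ratings) / len(ratings)
    print(f"{item}: I-CVI = {i_cvi:.2f}")
# item_1 (1.00) and item_2 (0.80) clear the often-cited 0.78 benchmark;
# item_3 (0.60) would be flagged for revision or deletion.
```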
Question
Is the following statement true or false?
Face validity of an instrument is based on judgment.
Answer
True. Face validity refers to whether the instrument looks like an appropriate measure of the construct. There are no objective criteria for assessment; it is based on judgment.
Criterion-Related Validity
- The degree to which the instrument is related to an external criterion
- A validity coefficient is calculated by analyzing the relationship between scores on the instrument and the criterion (see the sketch below).
- Two types:
  - Predictive validity: the instrument's ability to distinguish people whose performance differs on a future criterion
  - Concurrent validity: the instrument's ability to distinguish individuals who differ on a present criterion
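A minimal sketch of a validity coefficient, read here as the correlation between instrument scores and an external criterion (measured later for predictive validity, or at the same time for concurrent validity). Data are invented; `statistics.correlation` requires Python 3.10+:

```python
from statistics import correlation

scale_scores = [55, 62, 48, 70, 66, 51, 59, 73]  # scores on the instrument
criterion = [50, 60, 45, 72, 63, 49, 61, 70]     # external criterion measure

validity_coef = correlation(scale_scores, criterion)
print(f"validity coefficient: {validity_coef:.2f}")
```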
Construct Validity
Concerned with these questions:
- What is this instrument really measuring?
- Does it adequately measure the construct of interest?
Some Methods of Assessing Construct Validity
- Known-groups technique
- Testing relationships based on theoretical predictions
- Factor analysis
Criteria for Assessing Screening/Diagnostic Instruments
- Sensitivity: the instrument's ability to correctly identify a "case," i.e., to diagnose a condition
- Specificity: the instrument's ability to correctly identify noncases, that is, to screen out those without the condition
- Likelihood ratio: summarizes the relationship between sensitivity and specificity in a single number
  - LR+: the ratio of the true-positive rate to the false-positive rate
  - LR−: the ratio of the false-negative rate to the true-negative rate
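A minimal sketch computing these criteria from a 2×2 table of screening results against a gold standard. The counts are invented for illustration:

```python
tp, fn = 45, 5   # condition present: correctly flagged vs. missed
fp, tn = 10, 40  # condition absent: falsely flagged vs. correctly cleared

sensitivity = tp / (tp + fn)  # proportion of true cases detected
specificity = tn / (tn + fp)  # proportion of noncases screened out

lr_pos = sensitivity / (1 - specificity)  # LR+: true-positive rate over
                                          # false-positive rate
lr_neg = (1 - sensitivity) / specificity  # LR-: false-negative rate over
                                          # true-negative rate

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
print(f"LR+={lr_pos:.2f}, LR-={lr_neg:.2f}")  # 4.50 and 0.12 here
```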