Foundations of Evidence-Based Outcome Measurement
Basic Question: “What is the best way to measure this client’s problems so his or her progress can be monitored over time in a way that will result in the most favorable outcomes for this client?”
Measurement: Systematic process of assigning labels (usually numbers) to characteristics of people, objects, or events, using explicit and consistent rules so that, ideally, the labels accurately represent the characteristic measured.
Measurement Plan: Overall strategy used to measure a client’s outcomes, including the methods and instruments used, how to obtain the information, who can best provide the information, and when, where, and how often the information should be collected.
Measurement Method: Class of measurement procedures (e.g., standardized self-report scales).
Measurement Instrument: Specific measurement tool (e.g., a specific standardized self-report scale measuring depression).
Measurement Errors: Discrepancies between measured and actual (“true”) values of a variable. Caused by flaws in the measurement process (e.g., characteristics of clients or other respondents, measurement conditions, properties of measures).
Random Measurement Errors: Discrepancies between measured and actual (“true”) values of a variable that are equally likely to be higher or lower than the actual values because they are caused by chance fluctuations in measurement. Tend to cancel each other out and average to zero, but they increase the variability of measured values.
Systematic Measurement Errors: Discrepancies between measured and actual (“true”) values of a variable that are consistently more likely to be higher or lower than the actual values. Caused by flaws in the measurement process. Lead to over- or underestimates of the actual values of a variable. Also known as “bias” in measurement.
Systematic and Random Measurement Errors
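The contrast between the two kinds of error can be simulated. This is a minimal Python sketch with hypothetical values (the true score of 50, error spread of 5, and bias of 3 are all invented for illustration): random errors leave the average near the true score, while a systematic error shifts every measurement.

```python
import random
from statistics import mean

random.seed(0)

TRUE_SCORE = 50.0      # hypothetical "true" value being measured
N = 10_000             # number of repeated measurements

# Random errors: chance fluctuations, equally likely high or low
random_errors = [random.gauss(0, 5) for _ in range(N)]

# Systematic error (bias): a constant offset, e.g. a mis-calibrated instrument
BIAS = 3.0

random_only = [TRUE_SCORE + e for e in random_errors]
with_bias = [TRUE_SCORE + BIAS + e for e in random_errors]

# Random errors average out to ~0, so the mean stays near the true score;
# bias shifts every measurement, so the mean is off by about BIAS
print(round(mean(random_only), 1))
print(round(mean(with_bias), 1))
```

Note that both sets of scores have the same increased variability; only the bias moves the average away from the true value.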
Correlation: Originally “co-relation,” Sir Francis Galton’s idea. Born 16 Feb 1822 in Sparkbrook, England; died 17 Jan 1911 at Grayshott House, Haslemere, Surrey, England. Charles Darwin’s cousin. The revelation occurred about 1888 while he took cover during a rainstorm.
Correlation: Karl Pearson, Galton’s colleague, worked out the details about 1896. Born 27 March 1857 in London, England; died 27 April 1936 in London, England. Coined the term “standard deviation.”
Visualizing Correlations ndg/tiein/johnson/correlation.htm java/GCApplet/GCAppletFrame.html
Pearson Product-Moment Correlation (r): Indicates the direction and magnitude of a linear relationship. Ranges from -1 to +1; 0 indicates no linear relationship. r² indicates the proportion of variance in the DV accounted for by the IV.
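The definition above can be computed directly. This is a small Python sketch using entirely hypothetical data (weeks in treatment vs. symptom ratings); `pearson_r` is an illustrative helper, not part of any library.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores: weeks in treatment vs. depression symptom ratings
weeks = [2, 4, 6, 8, 10]
symptoms = [9, 8, 6, 4, 3]

r = pearson_r(weeks, symptoms)
print(round(r, 2))       # direction (negative) and magnitude (near -1)
print(round(r ** 2, 2))  # proportion of variance in the DV accounted for
```

Here the strong negative r says symptoms fall as treatment progresses, and r² says nearly all of the variance in the symptom scores is accounted for by time in treatment, in this made-up example.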
Fostering Challenges: Assesses foster parent applicants’ skills and abilities to manage some of the unique challenges of fostering. Vignettes ask applicants what they would do if faced with common dilemmas that foster parents often experience.
Reliability: General term for the consistency of measurements; unreliability is inconsistency caused by random measurement errors.
Methods for Determining Reliability: Test-retest; Alternate form (not discussed); Internal consistency; Interrater/interobserver.
Test-retest: Degree to which scores on a measure are consistent over time. Independently measure the same people, with the same measure, under the same circumstances.
Construct: Complex concept (e.g., intelligence, well-being, depression) inferred or derived from a set of interrelated attributes (e.g., behaviors, experiences, subjective states, attitudes) of people, objects, or events. Typically embedded in a theory. Often not directly observable but measured using multiple indicators.
Internal Consistency: Degree to which responses to a set of items on a standardized scale measure the same construct consistently. Independently measure the same people, with a single multiple-item measure, under the same circumstances.
Internal Consistency (cont’d): Coefficient alpha is the statistic typically used to quantify the internal consistency reliability of a standardized scale. Also known as “Cronbach’s alpha” and, when items are dichotomous, “KR-20.”
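Coefficient alpha can be computed from item variances and the variance of total scores. The sketch below uses the standard formula, alpha = (k / (k - 1)) (1 - sum of item variances / variance of totals), applied to an invented 3-item scale; the data and the `cronbach_alpha` helper are illustrative assumptions, not from any real scale.

```python
def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: one list of scores per item, all over the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

# Hypothetical 3-item scale answered by 5 respondents (one row per item)
items = [
    [2, 4, 3, 5, 1],
    [3, 4, 3, 4, 2],
    [2, 5, 4, 5, 1],
]
print(round(cronbach_alpha(items), 2))
```

When the items move together across respondents, as they do here, alpha approaches 1; items measuring different things drive it toward 0.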
Interrater/Interobserver: Degree of consistency in ratings or observations across raters, observers, or judges. Multiple observers independently observe the same people, using the same measure, under the same circumstances.
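One simple index of interobserver consistency is percent agreement: the share of observations on which two raters assign the same code. This is a minimal sketch with hypothetical categorical codes (more refined indices, such as Cohen’s kappa, also correct for chance agreement).

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of observations on which two raters agree exactly."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Two observers independently code the same 10 interactions (hypothetical)
rater_a = ["pos", "neg", "pos", "pos", "neu", "pos", "neg", "neu", "pos", "pos"]
rater_b = ["pos", "neg", "pos", "neu", "neu", "pos", "neg", "neu", "pos", "neg"]
print(percent_agreement(rater_a, rater_b))  # 0.8
```

The two observers agree on 8 of 10 codes, so agreement is 80%.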
Father (●) and Mother (○) Report of Time with Family
Adequate Reliability? .90+ excellent; .80–.89 good; .70–.79 acceptable; <.70 suspect.
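These rule-of-thumb labels are easy to encode. The helper below assumes the conventional cutoffs of .90, .80, and .70; the function name and example coefficients are invented for illustration.

```python
def rate_reliability(coef):
    """Label a reliability coefficient using conventional cutoffs."""
    if coef >= 0.90:
        return "excellent"
    if coef >= 0.80:
        return "good"
    if coef >= 0.70:
        return "acceptable"
    return "suspect"

print(rate_reliability(0.94))  # excellent
print(rate_reliability(0.72))  # acceptable
print(rate_reliability(0.55))  # suspect
```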
Measurement Validity: General term for the degree to which accumulated evidence and theory support the interpretations and uses of scores derived from a measure. More important, but more difficult to determine, than reliability.
Methods for Determining Validity: Face; Content; Criterion (Concurrent and Predictive); Construct (Convergent, Discriminant, and Sensitivity to Change).
Face: Degree to which a measure appears to measure a given construct or other variable in the opinion of clients, other respondents, and other users of the measure.
Content: Degree to which questions, behaviors, or other types of content represent a given construct comprehensively (e.g., the full range of relevant content is represented, and irrelevant content is not). (Diagram: relevant and irrelevant elements of the outcome as conceptualized vs. as operationalized.)
Criterion: Degree to which scores on a measure can predict performance or status on another measure that serves as a standard (i.e., the criterion, sometimes called a “gold standard”).
Concurrent-Criterion: Degree to which scores on a measure can predict a contemporaneous criterion. Concurrent validity evidence is usually collected when we want to replace an existing measure with a simpler, cheaper, or less invasive one.
Predictive-Criterion: Degree to which scores on a measure can predict a criterion measured at a future point in time. Predictive validity evidence is usually collected when we want to use results from a measure (e.g., ACT or SAT scores) to find out what might happen in the future (e.g., successful graduation from college) in order to take some course of action in the present (e.g., admit a student to college).
Construct: Degree to which scores on a measure can be interpreted as representing a given construct, as evidenced by theoretically predicted patterns of associations with: measures of related variables; measures of unrelated variables; group differences; changes over time.
Convergent: Degree to which scores derived from a measure of a construct are correlated in the predicted way with other measures of the same or related constructs or variables. (Example constructs: depression, anxiety, lost work days, negative self-appraisal, stressors.)
Discriminant: Degree to which scores derived from a measure of a construct are uncorrelated with, or otherwise distinct from, theoretically dissimilar or unrelated constructs or other variables. (Example constructs: anxiety, literacy, intelligence, height.)
Sensitivity to Change: Degree to which a measure detects genuine change in the variable measured.
Unifying Concept: Construct Validity (Face, Content, Criterion, Convergent, Discriminant, Sensitivity to change).
Relationship Between Reliability and Validity: Reliability is a prerequisite for validity; lower reliability leads to lower validity; validity implies reliability.
Decisions, Decisions… Who, where, when, and how often to collect outcome data.
Who: Client; practitioner; relevant others; independent evaluators.
Where and When: Developmental psychology is “the science of the strange behavior of children in strange situations with strange adults for the briefest possible periods of time” (Bronfenbrenner, 1977, p. 513). Representative sample: Subset of observations (e.g., people, situations, times) that has characteristics similar to those of the population from which the sample was selected.
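The idea of a representative sample can be sketched with a simple random draw. The population below is entirely hypothetical (1,000 clients, 30% rural); the point is that random selection tends to reproduce population characteristics in the sample.

```python
import random

random.seed(1)

# Hypothetical population of 1,000 clients, 30% of whom live in rural areas
population = ["rural"] * 300 + ["urban"] * 700

# A simple random sample gives every member an equal chance of selection,
# so sample characteristics should resemble population characteristics
sample = random.sample(population, 100)
print(sample.count("rural") / len(sample))  # should be near 0.30
```

A convenience sample (say, only clients who attend daytime sessions) offers no such guarantee, which is why where and when data are collected matters.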
Representative? (Diagram: three population and sample pairs.)
How Often: At regular, frequent, pre-designated intervals. Often enough to detect significant changes in the problem, but not so often that it becomes problematic.
Engage and Prepare Clients: Be certain the client understands and accepts the value and purpose of monitoring progress. Discuss confidentiality. Present measures with confidence. Don’t ask for information the client can’t provide.
Engage and Prepare Clients (cont’d): Be sure the client is prepared. Be careful how you respond to information. Use the information that is collected.
Practical and Contributes to Favorable Outcomes: Reliability and validity are necessary, but not sufficient, characteristics of outcome measures. Cost, benefit, efficiency, and acceptability are also important. A good outcome measure is both accurate and practical.