Advising on Test Validity: Comments on Denny Borsboom
Neil K. Aaronson, The Netherlands Cancer Institute
KNAW Colloquium on Advising on Research Methods
Amsterdam, March 29, 2007
“The way to capture an audience’s attention is with a demonstration where there is a possibility the speaker may die.”
Jearl Walker, Cleveland State University
“It usually takes more than three weeks to prepare a good impromptu speech.”
Mark Twain
Who am I?
- Health outcomes researcher
- Clinical oncology
- Develop questionnaires to assess patients’ illness and treatment experience from their own perspective
  - For use in observational and evaluative studies in clinical research and practice
What are we attempting to measure?
- Health outcomes
- Health status
- Quality of life
- Health-related quality of life
- Patient-reported outcomes (PROs)
State of affairs in defining QL
- “Quality of life is a vague and ethereal entity, something that many people talk about, but which nobody clearly knows what to do about.” (Campbell et al., 1976)
- “The idea has become a kind of umbrella under which are placed many different indexes dealing with whatever the user wants to focus on.” (Feinstein, 1987)
- “Quality of life is an ill-defined term… it means different things to different people, and takes on different meanings according to the area of application.” (Fayers & Machin, 2000)
Key dimensions of quality of life as defined by David Karnofsky (1949), the WHO (1949) and ASCO (1995)
- Physical: symptoms commonly caused by cancer and the toxicities of treatment
- Psychological: effects of cancer and its treatment on cognitive function and emotional state
- Social: effects of cancer and its treatment on interpersonal relationships, school, work and recreation
Attributes of QL definitions
- Non-specific versus health-related
- Health states (or status) versus personal evaluation of those states (e.g., expectations, discrepancies, satisfaction)
- Scope of concerns (e.g., spirituality or existential issues)
- Polarity of concerns (dysfunction and its resolution vs. positive well-being)
Does it matter?
- Yes, because the content of QL questionnaires reflects the underlying definition.
- It may be less important in clinical trials, where group comparisons will be internally valid regardless of the definition used.
- It is more important when comparing results across trials and in observational (e.g., prevalence) studies.
Examples of QL definitions
- “The difference between the hopes and expectations of the individual and the individual’s present experience.” (Calman, 1987)
- “The functional effect of an illness and its consequent therapy upon a patient, as perceived by the patient.” (Schipper et al., 1996)
Covinsky et al., Am J Med 1999; 106:
- Elderly patients rated their physical functioning, psychological distress and overall QL
- More than 40% of those who reported the worst physical functioning and/or the highest levels of psychological distress rated their QL as “good or excellent”
- Approximately 20% of those with the best physical functioning and lowest levels of distress rated their QL as “poor”
Generic HRQL instruments
- Sickness Impact Profile (SIP)
- Nottingham Health Profile (NHP)
- Spitzer QL Index
- COOP/WONCA Charts
- MOS 36-Item Short-Form Health Survey (SF-36)
- World Health Organization Quality of Life assessment (WHOQOL)
Cancer-specific QL questionnaires
- Functional Living Index – Cancer (FLIC)
- Cancer Rehabilitation Evaluation System (CARES)
- Rotterdam Symptom Checklist (RSCL)
- EORTC QLQ-C30
- Functional Assessment of Cancer Therapy – General (FACT-G)
Key psychometric attributes of HRQL instruments
- Measurement model
- Reliability
- Validity
- Responsiveness
- Interpretability
- Cultural adaptability
- Burden
Assessing validity of HRQL instruments: classical approaches (SAC/MOT 2001)
Content-related: evidence that the content domain of an instrument is appropriate relative to its intended use
- Use lay and expert (clinician) panel judgments
- Complete the questionnaire(s) yourself
Future perspective items
- SF-36: “I expect my health to get worse.”
- FACT-G: “I worry about dying.”
- CARES-SF: “I worry about whether the cancer will progress.”
- QLQ-C30: no corresponding item
Assessing validity of HRQL instruments: classical approaches (SAC/MOT 2001)
Construct-related: evidence that supports a proposed interpretation of scores based on theoretical implications associated with the constructs being measured
- Examine interscale correlations
- Examine patterns of scores for groups known to differ on relevant variables (disease stage, treatment status, response to treatment, etc.)
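The two construct-related checks on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure prescribed by SAC/MOT: it assumes scale scores have already been computed per patient, uses the Pearson correlation for interscale correlations, and uses Cohen's d (pooled standard deviation) as one common effect size for a known-groups comparison. The function names and the toy scores are hypothetical and invented purely for illustration.

```python
import math
import statistics
from typing import Sequence


def interscale_correlation(scale_a: Sequence[float], scale_b: Sequence[float]) -> float:
    """Pearson correlation between two scale scores from the same respondents."""
    if len(scale_a) != len(scale_b) or len(scale_a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 respondents")
    mean_a, mean_b = statistics.mean(scale_a), statistics.mean(scale_b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(scale_a, scale_b))
    ss_a = sum((x - mean_a) ** 2 for x in scale_a)
    ss_b = sum((y - mean_b) ** 2 for y in scale_b)
    return covariance / math.sqrt(ss_a * ss_b)


def cohens_d(group_1: Sequence[float], group_2: Sequence[float]) -> float:
    """Standardized mean difference (group_1 minus group_2) using the pooled SD."""
    n1, n2 = len(group_1), len(group_2)
    var1 = statistics.variance(group_1)  # sample variance, n - 1 denominator
    var2 = statistics.variance(group_2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(group_1) - statistics.mean(group_2)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical 0-100 physical functioning scores (made up for illustration).
    early_stage = [80.0, 75.0, 90.0, 85.0, 70.0]
    advanced_stage = [55.0, 60.0, 45.0, 65.0, 50.0]
    # Known-groups hypothesis: patients with earlier-stage disease score higher.
    print(f"Cohen's d: {cohens_d(early_stage, advanced_stage):.2f}")

    # Hypothetical fatigue scores for the same five early-stage patients.
    fatigue = [20.0, 30.0, 10.0, 15.0, 35.0]
    print(f"r(physical, fatigue): {interscale_correlation(early_stage, fatigue):.2f}")
```

A positive d supports the known-groups hypothesis only if its direction and size were specified in advance; the code computes the statistics but cannot supply the theory that says what pattern to expect.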
Questions for Denny and audience (1)
- Examining correlations between measures purported to assess the same concept does indeed tend to yield little useful information for instrument developers or end-users; the exercise is theoretically and empirically anemic.
- However, the “known groups” comparison approach is intuitively appealing and tends to be well understood and accepted by end-users.
- Is this latter approach equally “suspect”? That is, does it also fail to truly address the validity of a measure?
Questions for Denny and audience (2)
- Item response theory (IRT) approaches are quickly coming to dominate the field of HRQL instrument development (e.g., the NIH PROMIS initiative):
  - Generating large item banks for each domain of interest (e.g., depression, pain, fatigue), primarily based on the existing literature
  - Collecting large datasets to model item and scale information curves
  - Generating computer-adaptive versions of measures
- Will this approach really yield theoretically grounded and valid measures, or is it yet another example of “dustbowl empiricism”?
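To make the IRT vocabulary on this slide concrete, here is a minimal Python sketch assuming the two-parameter logistic (2PL) model for dichotomous items. This is a simplification: HRQL item banks such as those in PROMIS typically use polytomous models (e.g., the graded response model). Under 2PL, the probability of endorsing an item is P(θ) = 1 / (1 + exp(−a(θ − b))), item information is I(θ) = a²P(θ)(1 − P(θ)), scale information is the sum of item informations, and a simple computer-adaptive test administers the unused item with maximum information at the current estimate of θ. The item names and parameter values are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    """A dichotomous item under the 2PL model (hypothetical parameters)."""
    name: str
    a: float  # discrimination: how sharply the item separates trait levels
    b: float  # difficulty/location on the latent trait scale


def p_endorse(item: Item, theta: float) -> float:
    """Probability of endorsing the item at latent trait level theta."""
    return 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))


def item_information(item: Item, theta: float) -> float:
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_endorse(item, theta)
    return item.a ** 2 * p * (1.0 - p)


def scale_information(items: list[Item], theta: float) -> float:
    """Scale (test) information is the sum of the item informations."""
    return sum(item_information(item, theta) for item in items)


def next_item(bank: list[Item], administered: set[str], theta_hat: float) -> Optional[Item]:
    """Maximum-information selection: the unused item most informative at theta_hat."""
    candidates = [item for item in bank if item.name not in administered]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item_information(item, theta_hat))


if __name__ == "__main__":
    bank = [
        Item("fatigue_01", a=2.0, b=-1.0),
        Item("fatigue_02", a=1.5, b=0.0),
        Item("fatigue_03", a=2.5, b=1.0),
    ]
    # Trace the scale information curve over a coarse grid of theta values.
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"theta={theta:+.1f}  information={scale_information(bank, theta):.3f}")
    # Given a provisional estimate theta_hat = 0.9, choose the next item to administer.
    chosen = next_item(bank, administered=set(), theta_hat=0.9)
    print(chosen.name if chosen else "bank exhausted")
```

Note that every step here is driven by the fitted item parameters; nothing in the selection rule checks what the items mean, which is exactly the concern the slide raises about empirically assembled banks.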
Suggested reading
- Fayers P, Hays R (eds). Assessing Quality of Life in Clinical Trials: Methods and Practice. Oxford: Oxford University Press, 2005.
- Lipscomb J, Gotay CC, Snyder CF (eds). Outcomes Assessment in Cancer. Cambridge: Cambridge University Press, 2005.