Evaluating Health-Related Quality of Life Measures June 12, 2014 (1:00 – 2:00 PDT) Kaiser Methods Webinar Series Ron D. Hays, Ph.D.

Burden of Kidney Disease Scale
How true or false is each of the following statements for you?
1. My kidney disease interferes too much with my life.
2. Too much of my time is spent dealing with kidney disease.
3. I feel frustrated dealing with my kidney disease.
4. I feel like a burden on my family.
- Definitely true = 100
- Mostly true = 75
- Don’t know = 50
- Mostly false = 25
- Definitely false = 0
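
As a rough illustration of how the response scores on this slide could be turned into a scale score, the sketch below maps each answer to its 0-100 value and averages the items. Averaging across answered items is an assumption (the slide gives only the per-response scores), and the names `RESPONSE_SCORES` and `score_burden_scale` are hypothetical.

```python
# Per-response scores as listed on the slide.
RESPONSE_SCORES = {
    "definitely true": 100,
    "mostly true": 75,
    "don't know": 50,
    "mostly false": 25,
    "definitely false": 0,
}


def score_burden_scale(responses):
    """Average the 0-100 item scores; None marks a skipped item (assumption)."""
    scores = [RESPONSE_SCORES[r.lower()] for r in responses if r is not None]
    if not scores:
        raise ValueError("no answered items")
    return sum(scores) / len(scores)


# Hypothetical respondent answering the four items in order.
example = ["Mostly true", "Definitely true", "Mostly false", "Don't know"]
print(score_burden_scale(example))  # 62.5
```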

FDA PRO Process (released February 2006)

Aspects of Good Health-Related Quality of Life Measures
Aside from being practical:
1. Same people get same scores
2. Different people get different scores and differ in the way you expect
3. Measure is interpretable
4. Measure works the same way for different groups (age, gender, race/ethnicity)

Validity
Does scale represent what it is supposed to be measuring?
- Content validity: Does measure “appear” to reflect what it is intended to (expert judges or patient judgments)?
  – Do items operationalize concept?
  – Do items cover all aspects of concept?
  – Does scale name represent item content?
- Construct validity
  – Are the associations of the measure with other variables consistent with hypotheses?

Relative Validity Example
Table: Burden of Disease measures by Severity of Kidney Disease (columns: None, Mild, Severe, F-ratio, Relative Validity); cell values missing.
Sensitivity of measure to important (clinical) difference

Evaluating Construct Validity
Scale                             Age (years)
(Better) Physical Functioning     Medium (-)
Effect size (ES) = D/SD
D = score difference
SD = standard deviation
Small (0.20), medium (0.50), large (0.80)

Evaluating Construct Validity
Scale                             Age (years)
(Better) Physical Functioning     Medium (-), r ≈ 0.24
Cohen effect size rules of thumb (d = 0.20, 0.50, and 0.80): small r = 0.100; medium r = 0.243; large r = 0.371
$r = d / \sqrt{d^2 + 4} = 0.80 / \sqrt{0.80^2 + 4} = 0.80 / \sqrt{0.64 + 4} = 0.80 / \sqrt{4.64} = 0.80 / 2.154 = 0.371$

Evaluating Construct Validity
Columns: Age (years); Obese (yes = 1, no = 0); Kidney Disease (yes = 1, no = 0); In Nursing Home (yes = 1, no = 0)
(Better) Physical Functioning: Medium (-), Small (-), Large (-)
(More) Depressive Symptoms: ?, Small (+), ?
Cohen effect size rules of thumb (d = 0.20, 0.50, and 0.80): small r = 0.100; medium r = 0.243; large r = 0.371
$r = d / \sqrt{d^2 + 4} = 0.80 / \sqrt{0.80^2 + 4} = 0.80 / \sqrt{0.64 + 4} = 0.80 / \sqrt{4.64} = 0.80 / 2.154 = 0.371$
(r’s of 0.10, 0.30 and 0.50 are often cited as small, medium, and large.)
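
The d-to-r conversion on this slide can be checked with a few lines of code. This is a minimal sketch; the function name `d_to_r` is hypothetical, and the formula with the constant 4 applies when the two groups being compared are of equal size.

```python
import math


def d_to_r(d):
    """Convert Cohen's d to a correlation (equal group sizes assumed)."""
    return d / math.sqrt(d ** 2 + 4)


# Cohen's rule-of-thumb values for d.
for label, d in [("small", 0.20), ("medium", 0.50), ("large", 0.80)]:
    print(f"{label}: d = {d:.2f} -> r = {d_to_r(d):.3f}")
```

The three printed correlations can be compared against the small, medium, and large r values quoted on the slide.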

Questions?

Responsiveness to Change
- HRQOL measures should be responsive to interventions that change HRQOL
- Need external indicators of change (anchors)
  – Clinical measure
    “improved” group = 100% reduction in seizure frequency
    “unchanged” group = <50% change in seizure frequency
  – Retrospective self- or provider-report of change
    Much better, A little better, Same, A little worse, Much worse
- Anchor correlated with change on target measure at or higher

Responsiveness Index
Effect size (ES) = D/SD
D = raw score change in “changed” (improved) group
SD = baseline SD
- Small: 0.20 to 0.49
- Medium: 0.50 to 0.79
- Large: 0.80 or above

Responsiveness Indices
(1) Effect size (ES) = D/SD
(2) Standardized Response Mean (SRM) = D/SD†
(3) Guyatt responsiveness statistic (RS) = D/SD‡
D = raw score change in “changed” group; SD = baseline SD; SD† = SD of D; SD‡ = SD of D among “unchanged”
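
The three indices differ only in the denominator, which a short sketch makes concrete. The function name `responsiveness_indices` and the example scores are hypothetical, and using the sample standard deviation (n - 1 in the denominator) is an assumption, since the slide does not say which SD estimator is meant.

```python
from statistics import mean, stdev


def responsiveness_indices(baseline_changed, followup_changed, change_unchanged):
    """Return ES, SRM and Guyatt RS for a 'changed' group.

    baseline_changed / followup_changed: paired scores in the changed group.
    change_unchanged: change scores observed in the 'unchanged' group.
    """
    change = [post - pre for pre, post in zip(baseline_changed, followup_changed)]
    d = mean(change)  # D = mean raw score change in the changed group
    return {
        "ES": d / stdev(baseline_changed),   # baseline SD
        "SRM": d / stdev(change),            # SD of change
        "RS": d / stdev(change_unchanged),   # SD of change among unchanged
    }


# Made-up scores for illustration only.
indices = responsiveness_indices(
    baseline_changed=[40, 50, 60, 70],
    followup_changed=[50, 55, 75, 80],
    change_unchanged=[-5, 0, 5, 0],
)
print({name: round(value, 2) for name, value in indices.items()})
```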

Amount of Expected Change Varies
SF-36 physical function score mean = 87 (SD = 20)
Assume I have a score of 100 at baseline
- Hit by bike causes me to be
  – limited a lot in vigorous activities
  – limited a lot in climbing several flights of stairs
  – limited a little in moderate activities
  SF-36 physical functioning score drops to 75 (-1.25 SD)
- Hit by rock causes me to be
  – limited a little in vigorous activities
  SF-36 physical functioning score drops to 95 (-0.25 SD)

Partition Change on Anchor
- A lot better
- A little better
- No change
- A little worse
- A lot worse

Use Multiple Anchors
693 RA clinical trial participants evaluated at baseline and 6 weeks post-treatment.
Five anchors:
1. Self-report (global) by patient
2. Self-report (global) by physician
3. Self-report of pain
4. Joint swelling (clinical)
5. Joint tenderness (clinical)
Kosinski, M. et al. (2000). Determining minimally important changes in generic and disease-specific health-related quality of life questionnaires in clinical trials of rheumatoid arthritis. Arthritis and Rheumatism, 43.

Patient and Physician Global Reports
How are you (is the patient) doing, considering all the ways that RA affects you (him/her)?
- Very good (asymptomatic and no limitation of normal activities)
- Good (mild symptoms and no limitation of normal activities)
- Fair (moderate symptoms and limitation of normal activities)
- Poor (severe symptoms and inability to carry out most normal activities)
- Very poor (very severe symptoms that are intolerable and inability to carry out normal activities)
--> Improvement of 1 level over time

Global Pain, Joint Swelling and Tenderness
- 0 = no pain, 10 = severe pain
- Number of swollen and tender joints
-> 1-20% improvement over time

Effect Sizes for SF-36 PF Change Linked to Minimal Change in Anchors
Table: Physical Function effect sizes by anchor (columns: Self-R, Clin.-R, Pain, Swell, Tender, Mean); cell values missing.

Effect Sizes for SF-36 Changes Linked to Minimal Change in Anchors
Scale     Self-R
PF        .35
Role-P    .56
Pain      .83
GH        .20
EWB       .39
Role-E    .41
SF        .43
EF        .50
PCS       .49
MCS       .42

Effect Sizes (mean = 0.34) for SF-36 Changes Linked to Minimal Change in Anchors
Table: effect sizes by anchor (columns: Self-R, Clin.-R, Pain, Swell, Tender, Mean) for PF, Role-P, Pain, GH, EWB, Role-E, SF, EF, PCS, MCS; cell values missing.

Reliability
Degree to which the same score is obtained when the target or thing being measured (person, plant or whatever) hasn’t changed.
- Inter-rater (rater): need 2 or more raters of the thing being measured
- Internal consistency (items): need 2 or more items
- Test-retest (administrations): need 2 or more time points

Ratings of Performance of Six Kaiser Presentations by Two Raters
[1 = Poor; 2 = Fair; 3 = Good; 4 = Very good; 5 = Excellent]
1 = Karen Kaiser (Good, Very Good)
2 = Adam Ant (Very Good, Excellent)
3 = Rick Dees (Good, Good)
4 = Ron Hays (Fair, Poor)
5 = John Adams (Excellent, Very Good)
6 = Jane Error (Fair, Fair)
(Target = 6 presenters; assessed by 2 raters)

Reliability Formulas
Table: Intraclass Correlation and Reliability by Model (rows: One-way, Two-way mixed, Two-way random); formulas missing.
BMS = Between Ratee Mean Square
WMS = Within Mean Square
JMS = Item or Rater Mean Square
EMS = Ratee x Item (Rater) Mean Square
N = n of ratees
k = n of items or raters

Two-Way Random Effects (Reliability of Ratings of Presentations)
ANOVA table (columns: Source, df, SS, MS; rows: Presenters (BMS), Raters (JMS), Pres. x Raters (EMS), Total); cell values missing.
2-way R = 6 ( ) = (3.13)
ICC =

Responses of Presenters to Two Questions about Their Health
1 = Karen Kaiser (Good, Very Good)
2 = Adam Ant (Very Good, Excellent)
3 = Rick Dees (Good, Good)
4 = Ron Hays (Fair, Poor)
5 = John Adams (Excellent, Very Good)
6 = Jane Error (Fair, Fair)
(Target = 6 presenters; assessed by 2 items)

Two-Way Mixed Effects (Cronbach’s Alpha)
ANOVA table (columns: Source, df, SS, MS; rows: Presenters (BMS), Items (JMS), Pres. x Items (EMS), Total); cell values missing.
Alpha = = 2.93 =
ICC =
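
Since the formula cells and most ANOVA values did not survive in the transcript, the sketch below recomputes the mean squares from the six pairs of ratings listed on the slides and applies the commonly used ANOVA-based expressions for two-way random and two-way mixed models. Those expressions are supplied here as an assumption rather than copied from the deck, and the function name `mean_squares` is hypothetical. The between-ratee mean square it produces (about 3.13) agrees with the one value that is still legible.

```python
def mean_squares(ratings):
    """Two-way ANOVA mean squares for an N-ratee by k-rater (or item) table."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)               # between-ratee mean square
    jms = ss_cols / (k - 1)               # rater (or item) mean square
    ems = ss_err / ((n - 1) * (k - 1))    # ratee x rater (item) mean square
    return bms, jms, ems


# Ratings from the slides, coded 1 = Poor ... 5 = Excellent.
ratings = [(3, 4), (4, 5), (3, 3), (2, 1), (5, 4), (2, 2)]
n, k = len(ratings), len(ratings[0])
bms, jms, ems = mean_squares(ratings)

# Two-way random effects: reliability of the k-rater average, and single-rater ICC.
rel_random = n * (bms - ems) / (n * bms + jms - ems)
icc_random = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# Two-way mixed effects: reliability of the k-item sum is Cronbach's alpha.
alpha = (bms - ems) / bms
icc_mixed = (bms - ems) / (bms + (k - 1) * ems)

print(f"BMS={bms:.2f} JMS={jms:.2f} EMS={ems:.2f}")
print(f"two-way random: R={rel_random:.2f} ICC={icc_random:.2f}")
print(f"two-way mixed:  alpha={alpha:.2f} ICC={icc_mixed:.2f}")
```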

Reliability Minimum Standards
- 0.70 or above (for group comparisons)
- 0.90 or higher (for individual assessment)
  – $SEM = SD \times (1 - \text{reliability})^{1/2}$
  – 95% CI = true score $\pm\ 1.96 \times SEM$
  – If z-score = 0, then CI: -.62 to +.62 when reliability = 0.90
  – Width of CI is 1.24 z-score units
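
The interval on this slide can be reproduced numerically. A minimal sketch, with hypothetical function names `sem` and `ci95`, working in z-score units (SD = 1) as the slide does:

```python
import math


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def ci95(score, sd, reliability):
    """95% confidence interval around a score, using +/- 1.96 * SEM."""
    half_width = 1.96 * sem(sd, reliability)
    return score - half_width, score + half_width


low, high = ci95(0.0, 1.0, 0.90)
print(round(low, 2), round(high, 2))  # -0.62 0.62

# For comparison, the group-level standard of 0.70 gives a wider interval.
low_70, high_70 = ci95(0.0, 1.0, 0.70)
print(round(low_70, 2), round(high_70, 2))  # -1.07 1.07
```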

Guidelines for Interpreting Kappa
Fleiss (1981)
Conclusion        Kappa
Poor              < .40
Fair
Good
Excellent         > .74
Landis and Koch (1977)
Conclusion        Kappa
Poor              < 0.0
Slight
Fair
Moderate
Substantial
Almost perfect
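
A labeling rule of this kind is easy to express in code. In the sketch below, only the "below 0.0 is poor" boundary comes from what survives of the slide; the remaining cut points (0.20, 0.40, 0.60, 0.80) are the commonly cited Landis and Koch ranges and are filled in here as an assumption. The function name `landis_koch_label` is hypothetical.

```python
# Upper bounds (inclusive) for each label; assumed from the commonly cited
# Landis and Koch (1977) ranges, not recovered from the slide itself.
LANDIS_KOCH_BOUNDS = [
    (0.20, "Slight"),
    (0.40, "Fair"),
    (0.60, "Moderate"),
    (0.80, "Substantial"),
]


def landis_koch_label(kappa):
    """Return a verbal label for a kappa value."""
    if kappa < 0.0:
        return "Poor"
    for upper, label in LANDIS_KOCH_BOUNDS:
        if kappa <= upper:
            return label
    return "Almost perfect"


for value in (-0.10, 0.15, 0.35, 0.55, 0.75, 0.90):
    print(value, landis_koch_label(value))
```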

Questions?

Sufficient Unidimensionality
- One-Factor Categorical Confirmatory Factor Analytic Model (e.g., using Mplus)
  – Polychoric correlations; weighted least squares with adjustments for mean and variance
- Fit Indices
  – Comparative Fit Index, etc.

Local Independence
- After controlling for dominant factor(s), item pairs should not be associated.
  – Look for residual correlations > 0.20
- Local dependence often caused by asking the same question multiple times.
  – “I’m generally sad about my life.”
  – “My life is generally sad.”
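
The screening rule on this slide can be sketched as a scan over the residual correlation matrix left after fitting the factor model. The function name `flag_local_dependence`, the item labels, and the residual values are all hypothetical; the sketch applies the 0.20 threshold literally, and comparing absolute values instead would be a separate design choice.

```python
def flag_local_dependence(residuals, items, threshold=0.20):
    """Return item pairs whose residual correlation exceeds the threshold."""
    flagged = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):  # upper triangle only
            if residuals[i][j] > threshold:
                flagged.append((items[i], items[j], residuals[i][j]))
    return flagged


# Made-up residual correlations for three items.
items = ["sad_about_life", "life_is_sad", "tired"]
residuals = [
    [0.00, 0.35, 0.05],
    [0.35, 0.00, -0.02],
    [0.05, -0.02, 0.00],
]
flagged_pairs = flag_local_dependence(residuals, items)
print(flagged_pairs)  # [('sad_about_life', 'life_is_sad', 0.35)]
```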

Item-scale correlation matrix

Posttraumatic Growth Inventory
Indicate for each of the statements below the degree to which this change occurred in your life as a result of your crisis.
Appreciating each day
(0) I did not experience this change as a result of my crisis
(1) I experienced this change to a very small degree as a result of my crisis
(2) I experienced this change to a small degree as a result of my crisis
(3) I experienced this change to a moderate degree as a result of my crisis
(4) I experienced this change to a great degree as a result of my crisis
(5) I experienced this change to a very great degree as a result of my crisis

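Category Response Curves
Figure: category response curves for “Appreciating each day.” Horizontal axis runs from No Change to Great Change; one curve per response category (No change, Very small change, Small change, Moderate change, Great change, Very great change).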

Differential Item Functioning (DIF)
- Probability of choosing each response category should be the same for those who have the same estimated scale score, regardless of other characteristics
- Evaluation of DIF
  – Different subgroups
  – Mode differences

DIF (2-parameter model)
Figure: two panels, Location DIF and Slope DIF. Groups compared: Women, Men; AA, White. Items: “I cry when upset”; “I get sad for no reason”. Higher score = more depressive symptoms.

( ). Powerpoint file available for downloading at: