Evaluating Health-Related Quality of Life Measures Ron D. Hays, Ph.D. UCLA GIM & HSR February 9, 2015 (9:00-11:50 am) HPM 214, Los Angeles, CA.

Where are we now in HPM 214? http://hpm214.med.ucla.edu/
1. Introduction to Outcomes and Effectiveness
2. HRQOL Profile Measures
3. HRQOL Preference-Based Measures
4. Designing HRQOL Measures
5. Evaluating HRQOL Measures
6. PROMIS/IRT/Internet Panels
7. Responding to reviews
8. Course Review (Cognitive interview assignment due)
9. Final Exam (3/16/15)

The 2nd class assignment is to conduct and summarize 5 cognitive interviews with a self-administered HRQOL survey instrument. Your written summary should be no more than 3 pages long; longer summaries will not be accepted. You are required to conduct 5 (and no more than 5) cognitive interviews for every item in your selected instrument. If you have a long instrument, you can split it up so that each respondent does not have to be interviewed on every item, but 5 people need to be exposed to each item. See http://www.chime.ucla.edu/qualitativemethods.htm. The cognitive interview write-up is due at 9 am on 03/09/15.

Extra credit can be obtained by writing a 2-page review of a published HRQOL article. The article selected needs to be cleared with the instructor in advance.

Four Levels of Measurement Nominal (categorical) Ordinal (rank) Interval (numerical) Ratio (numerical)

Levels of Measurement and Their Properties
Level       Magnitude   Equal Interval   Absolute 0
Nominal     No          No               No
Ordinal     Yes         No               No
Interval    Yes         Yes              No
Ratio       Yes         Yes              Yes

Ordinal Scale In general, how would you rate your health? –Excellent –Very good –Good –Fair –Poor

Ordinal Scale In general, how would you rate your health? –100 = Excellent? –075 = Very good? [84] [76] –050 = Good? [61] [52] –025 = Fair? [26] –000 = Poor?

Interval Scales Fahrenheit and Centigrade temperature –T (°C) = (T (°F) - 32) × 5/9 40°C ≠ 2 times as hot as 20°C 104°F ≠ 2 times as hot as 68°F
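A minimal Python sketch of why ratio statements fail on interval scales: the same two temperatures give a ratio of 2 in Centigrade, about 1.53 in Fahrenheit, and about 1.07 in kelvin. The function names are illustrative and are not from the slides.

```python
def f_to_c(temp_f):
    """Convert Fahrenheit to Centigrade: C = (F - 32) * 5/9."""
    return (temp_f - 32) * 5 / 9


def c_to_k(temp_c):
    """Convert Centigrade to kelvin (a scale with an absolute zero)."""
    return temp_c + 273.15


warm_f, cool_f = 104.0, 68.0
warm_c, cool_c = f_to_c(warm_f), f_to_c(cool_f)

# The "ratio" depends on the arbitrary zero point of each interval scale.
print("Centigrade ratio:", warm_c / cool_c)
print("Fahrenheit ratio:", round(warm_f / cool_f, 2))
print("Kelvin ratio:", round(c_to_k(warm_c) / c_to_k(cool_c), 2))
```

Only the kelvin ratio is meaningful, because only that scale has a true zero.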

Ratio Scales Kelvin temperature scale (absolute 0) Days spent in hospital in last 30 days Age: a 4-year-old is twice as old as a 2-year-old. If you subtract 1 from both of their ages, 4 becomes 3 and 2 becomes 1, but the 4-year-old is still twice as old as the 2-year-old even though the recoded values are 3 and 1 (i.e., “0” no longer means zero years).

Measurement Range for HRQOL Measures: Nominal, Ordinal, Interval, Ratio

Levels of Measurement and Their Properties
            Item
Person      Magnitude   Equal Interval   Absolute 0   Total Score
Nominal     No          No               No           0
Ordinal     Yes         No               No           1
Interval    Yes         Yes              No           2
Ratio       Yes         Yes              Yes          3

Four Types of Data Collection Errors
Coverage Error: Does each person in target population have an equal chance of selection?
Sampling Error: Only some members of the target population are sampled.
Nonresponse Error: Do people in the sample who respond differ from those who do not?
Measurement Error: Inaccuracy in answers given to survey questions.

Characteristics of Good Measures Acceptability Variability Reliability Validity Interpretability

Indicators of Acceptability Response rate Administration time Missing data (item, scale)

Variability Responses fall in each response category Distribution approximates bell-shaped “normal” curve (68.3%, 95.4%, and 99.7% of scores within 1, 2, and 3 SDs of the mean)
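The normal-curve percentages can be checked with the error function from Python's standard library. This is a small sketch; `coverage_within` is an illustrative name. The share of a normal distribution within 1, 2, and 3 SDs of the mean comes out at roughly 68.3%, 95.4%, and 99.7%.

```python
import math


def coverage_within(z):
    """Proportion of a normal distribution within +/- z SDs of the mean."""
    return math.erf(z / math.sqrt(2))


for z in (1, 2, 3):
    print(f"within {z} SD: {100 * coverage_within(z):.1f}%")
```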

Reliability Reliability is the degree to which the same score is obtained for the thing being measured (person, plant, or whatever) when that thing hasn’t changed. –Ratio of signal to noise

Observed Score is: observed score = “true” score + systematic error + random error

Flavors of Reliability Inter-rater (rater) –Need 2 or more raters of the thing being measured Test-retest (administrations) –Need 2 or more time points Internal consistency (items) –Need 2 or more items

Reliability Minimum Standards
0.70 or above (for group comparisons)
0.90 or higher (for individual assessment)
$\mathrm{SEM} = \mathrm{SD} \times (1 - \text{reliability})^{1/2}$
95% CI = true score ± 1.96 × SEM
If z-score = 0, then CI: -0.62 to +0.62 when reliability = 0.90
Width of CI is 1.24 z-score units
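A short Python sketch of the SEM and confidence-interval arithmetic above, assuming scores are in z-score units (SD = 1). The function names are illustrative.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def ci95(score, sd, reliability):
    """95% confidence interval around a score: score +/- 1.96 * SEM."""
    half_width = 1.96 * sem(sd, reliability)
    return score - half_width, score + half_width


low, high = ci95(0.0, 1.0, 0.90)
print(round(low, 2), round(high, 2), "width:", round(high - low, 2))

# At the group-comparison standard (0.70) the interval is much wider.
low70, high70 = ci95(0.0, 1.0, 0.70)
print("width at 0.70:", round(high70 - low70, 2))
```

At reliability 0.90 the interval is about 1.24 z-score units wide. At 0.70 it is about 2.15 units, which is why the higher standard is recommended for individual assessment.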

Hypothetical Ratings of Performance of Six Students in HPM 214 by Two Raters Using Excellent to Poor Scale [1 = Poor; 2 = Fair; 3 = Good; 4 = Very good; 5 = Excellent] 1= Julian (Good, Very Good) 2= Narissa (Very Good, Excellent) 3= Alina (Good, Good) 4= Greg (Fair, Poor) 5= Linda (Excellent, Very Good) 6= Caroline (Fair, Fair) (Target = 6 students; assessed by 2 raters)

Kappa Coefficient of Agreement (Corrects for Chance)
$\kappa = \frac{\text{observed} - \text{chance}}{1 - \text{chance}}$
“Quality Index”

Cross-Tab of Ratings (rows = Rater 2, columns = Rater 1)
          P    F    G    VG   E    Total
P         0    1    0    0    0    1
F         0    1    0    0    0    1
G         0    0    1    0    0    1
VG        0    0    1    0    1    2
E         0    0    0    1    0    1
Total     0    2    2    1    1    6

Calculating KAPPA
$P_C = \frac{(0 \times 1) + (2 \times 1) + (2 \times 1) + (1 \times 2) + (1 \times 1)}{6 \times 6} = 0.19$
$P_{obs} = \frac{2}{6} = 0.33$
$\kappa = \frac{0.33 - 0.19}{1 - 0.19} = 0.17$
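The kappa calculation can be reproduced from the six pairs of ratings with a few lines of Python. This is a sketch: the 1-5 coding follows the Poor-to-Excellent scale in the ratings slide, and `cohen_kappa` is an illustrative name.

```python
from collections import Counter


def cohen_kappa(ratings_1, ratings_2):
    """Unweighted kappa: (observed - chance) / (1 - chance)."""
    n = len(ratings_1)
    # Observed agreement: share of targets given identical ratings.
    p_obs = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    # Chance agreement: products of the two raters' marginal counts.
    margin_1, margin_2 = Counter(ratings_1), Counter(ratings_2)
    categories = set(ratings_1) | set(ratings_2)
    p_chance = sum(margin_1[c] * margin_2[c] for c in categories) / n**2
    return (p_obs - p_chance) / (1 - p_chance)


# 1 = Poor, 2 = Fair, 3 = Good, 4 = Very good, 5 = Excellent
rater_1 = [3, 4, 3, 2, 5, 2]
rater_2 = [4, 5, 3, 1, 4, 2]
print(round(cohen_kappa(rater_1, rater_2), 2))
```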

Guidelines for Interpreting Kappa
Fleiss (1981)
Conclusion        Kappa
Poor              < .40
Fair              .40 - .59
Good              .60 - .74
Excellent         > .74
Landis and Koch (1977)
Conclusion        Kappa
Poor              < 0.0
Slight            .00 - .20
Fair              .21 - .40
Moderate          .41 - .60
Substantial       .61 - .80
Almost perfect    .81 - 1.00

Weighted Kappa (Linear and Quadratic)
Agreement weights: linear (quadratic in parentheses)
       P             F             G             VG            E
P      1             .75 (.937)    .50 (.750)    .25 (.437)    0
F      .75 (.937)    1             .75 (.937)    .50 (.750)    .25 (.437)
G      .50 (.750)    .75 (.937)    1             .75 (.937)    .50 (.750)
VG     .25 (.437)    .50 (.750)    .75 (.937)    1             .75 (.937)
E      0             .25 (.437)    .50 (.750)    .75 (.937)    1
$W_l = 1 - \frac{i}{k - 1}$
$W_q = 1 - \frac{i^2}{(k - 1)^2}$
i = number of categories ratings differ by
k = n of categories
Linear weighted kappa = 0.52; Quadratic weighted kappa = 0.77
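A Python sketch of weighted kappa for the six pairs of ratings, assuming categories are coded as consecutive integers 1-5 so that the category difference i is the absolute difference in codes. The `power` argument (1 = linear, 2 = quadratic) is an illustrative parameter, not something named in the slides.

```python
def weighted_kappa(ratings_1, ratings_2, n_categories, power):
    """Weighted kappa with weights 1 - (i / (k - 1)) ** power."""
    n = len(ratings_1)

    def weight(a, b):
        return 1 - (abs(a - b) / (n_categories - 1)) ** power

    # Observed weighted agreement over the actual rating pairs.
    p_obs = sum(weight(a, b) for a, b in zip(ratings_1, ratings_2)) / n
    # Chance weighted agreement: every rater-1 rating crossed with every
    # rater-2 rating, which is equivalent to using the marginal products.
    p_chance = sum(weight(a, b) for a in ratings_1 for b in ratings_2) / n**2
    return (p_obs - p_chance) / (1 - p_chance)


rater_1 = [3, 4, 3, 2, 5, 2]
rater_2 = [4, 5, 3, 1, 4, 2]
print("linear:", round(weighted_kappa(rater_1, rater_2, 5, power=1), 2))
print("quadratic:", round(weighted_kappa(rater_1, rater_2, 5, power=2), 2))
```

Every disagreement in these data is off by one category. Partial credit therefore raises agreement well above the unweighted kappa of 0.17.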

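Intraclass Correlation and Reliability
Model: One-way
  Intraclass correlation: $\frac{BMS - WMS}{BMS + (k - 1)WMS}$
  Reliability: $\frac{BMS - WMS}{BMS}$
Model: Two-way mixed
  Intraclass correlation: $\frac{BMS - EMS}{BMS + (k - 1)EMS}$
  Reliability: $\frac{BMS - EMS}{BMS}$
Model: Two-way random
  Intraclass correlation: $\frac{BMS - EMS}{BMS + (k - 1)EMS + k(JMS - EMS)/N}$
  Reliability: $\frac{N(BMS - EMS)}{N \cdot BMS + JMS - EMS}$
BMS = Between Ratee Mean Square
WMS = Within Mean Square
JMS = Item or Rater Mean Square
EMS = Ratee x Item (Rater) Mean Square
N = n of ratees
k = n of items or raters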

Two-Way Random Effects (Reliability of Performance Ratings)
Source                  df    SS       MS
Students (BMS)          5     15.67    3.13
Raters (JMS)            1     0.00     0.00
Stud. x Raters (EMS)    5     2.00     0.40
Total                   11    17.67
2-way R = $\frac{6(3.13 - 0.40)}{6(3.13) + 0.00 - 0.40} = 0.89$
ICC = 0.80
Data (id, rater and rating): 01 13, 01 24, 02 14, 02 25, 03 13, 03 23, 04 12, 04 21, 05 15, 05 24, 06 12, 06 22
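The ANOVA table and the two-way random effects estimates can be reproduced with the Python sketch below. The reliability formula is the one used in the worked example above. The ICC formula, (BMS - EMS) / (BMS + (k - 1)EMS + k(JMS - EMS)/N), is the standard single-rater form and is an assumption here, since the slide reports only the value 0.80. Function and variable names are illustrative.

```python
def two_way_mean_squares(table):
    """table[i][j] = rating of target i by rater (or item) j."""
    n, k = len(table), len(table[0])
    grand_total = sum(sum(row) for row in table)
    correction = grand_total**2 / (n * k)
    ss_total = sum(x**2 for row in table for x in row) - correction
    ss_targets = sum(sum(row) ** 2 for row in table) / k - correction
    ss_raters = (
        sum(sum(row[j] for row in table) ** 2 for j in range(k)) / n - correction
    )
    ss_error = ss_total - ss_targets - ss_raters
    bms = ss_targets / (n - 1)
    jms = ss_raters / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return bms, jms, ems


# Rows = six students, columns = two raters (1 = Poor ... 5 = Excellent).
ratings = [[3, 4], [4, 5], [3, 3], [2, 1], [5, 4], [2, 2]]
n, k = len(ratings), len(ratings[0])
bms, jms, ems = two_way_mean_squares(ratings)

reliability = n * (bms - ems) / (n * bms + jms - ems)
icc = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
print(round(bms, 2), round(jms, 2), round(ems, 2))
print(round(reliability, 2), round(icc, 2))
```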

Responses of Students to Two Questions about Their Health 1= Julian (Good, Very Good) 2= Narissa (Very Good, Excellent) 3= Alina (Good, Good) 4= Greg (Fair, Poor) 5= Linda (Excellent, Very Good) 6= Caroline (Fair, Fair) (Target = 6 students; assessed by 2 items)

Two-Way Mixed Effects (Cronbach’s Alpha)
Source                 df    SS       MS
Respondents (BMS)      5     15.67    3.13
Items (JMS)            1     0.00     0.00
Resp. x Items (EMS)    5     2.00     0.40
Total                  11    17.67
Alpha = $\frac{3.13 - 0.40}{3.13} = \frac{2.73}{3.13} = 0.87$
ICC = 0.77
Data (id, item 1 and item 2): 01 34, 02 45, 03 33, 04 21, 05 54, 06 22
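As a cross-check, the same alpha can be obtained from the more familiar variance form of Cronbach's alpha, k/(k - 1) × (1 - sum of item variances / variance of total scores). This formula is not shown on the slide. It is the standard textbook expression, and it gives the same value as (BMS - EMS)/BMS for these data. The sketch uses only the Python standard library.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, same respondents in each."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


item_1 = [3, 4, 3, 2, 5, 2]
item_2 = [4, 5, 3, 1, 4, 2]
alpha = cronbach_alpha([item_1, item_2])
print(round(alpha, 2))
```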

Satisfaction of 12 Family Members with 6 Students (2 per student) 1. Julian (fam1: Good, fam2: Very Good) 2. Narissa (fam3: Very Good, fam4: Excellent) 3. Alina (fam5: Good, fam6: Good) 4. Greg (fam7: Fair, fam8: Poor) 5. Linda (fam9: Excellent, fam10: Very Good) 6. Caroline (fam11: Fair, fam12: Fair) (Target = 6 students; assessed by 2 family members each)

One-Way ANOVA (Reliability of Ratings of Students)
Source               df    SS       MS
Respondents (BMS)    5     15.67    3.13
Within (WMS)         6     2.00     0.33
Total                11    17.67
1-way = $\frac{3.13 - 0.33}{3.13} = \frac{2.80}{3.13} = 0.89$
Data (id, rater and rating): 01 13, 01 24, 02 34, 02 45, 03 53, 03 63, 04 72, 04 81, 05 95, 05 04, 06 12, 06 22
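A Python sketch of the one-way computation: each student is rated by a different pair of family members, so rater effects cannot be separated from error and everything that is not between-student variance goes into the within mean square. Names are illustrative.

```python
def one_way_mean_squares(groups):
    """groups[i] = list of ratings received by target i (equal group sizes)."""
    n, k = len(groups), len(groups[0])
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total**2 / (n * k)
    ss_total = sum(x**2 for g in groups for x in g) - correction
    ss_between = sum(sum(g) ** 2 for g in groups) / k - correction
    ss_within = ss_total - ss_between
    bms = ss_between / (n - 1)
    wms = ss_within / (n * (k - 1))
    return bms, wms


# Two family-member ratings per student (1 = Poor ... 5 = Excellent).
family_ratings = [[3, 4], [4, 5], [3, 3], [2, 1], [5, 4], [2, 2]]
bms, wms = one_way_mean_squares(family_ratings)
one_way_reliability = (bms - wms) / bms
print(round(bms, 2), round(wms, 2), round(one_way_reliability, 2))
```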

Standardized Alpha for Different Numbers of Items and Average Inter-item Correlation
Number of       Average Inter-item Correlation (r)
Items (k)       .0      .2      .4      .6      .8      1.0
2               .000    .333    .571    .750    .889    1.000
4               .000    .500    .727    .857    .941    1.000
6               .000    .600    .800    .900    .960    1.000
8               .000    .667    .842    .923    .970    1.000
$\alpha_{st} = \frac{k \cdot r}{1 + (k - 1) \cdot r}$
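The table can be regenerated directly from the standardized alpha formula. This is a small Python sketch and `standardized_alpha` is an illustrative name.

```python
def standardized_alpha(k, r):
    """Standardized alpha for k items with average inter-item correlation r."""
    return k * r / (1 + (k - 1) * r)


correlations = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
for k in (2, 4, 6, 8):
    row = " ".join(f"{standardized_alpha(k, r):.3f}" for r in correlations)
    print(f"k={k}: {row}")
```

Reading across a row shows the effect of stronger inter-item correlations, and reading down a column shows the effect of adding items.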

Spearman-Brown Prophecy Formula
$\alpha_y = \frac{N \cdot \alpha_x}{1 + (N - 1) \cdot \alpha_x}$
N = how much longer scale y is than scale x

Example Spearman-Brown Calculations
Estimating the reliability of the MHI-18 from the MHI-32:
$\frac{(18/32)(0.98)}{1 + (18/32 - 1)(0.98)} = \frac{0.55125}{0.57125} = 0.96$
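The Spearman-Brown calculation in Python, as a sketch. `spearman_brown` and its argument names are illustrative, and `length_ratio` is N in the prophecy formula.

```python
def spearman_brown(reliability_x, length_ratio):
    """Predicted reliability of a scale length_ratio times as long as scale x."""
    return (length_ratio * reliability_x) / (1 + (length_ratio - 1) * reliability_x)


# Shortening the 32-item MHI (reliability 0.98) to 18 items.
mhi_18 = spearman_brown(0.98, 18 / 32)
print(round(mhi_18, 2))

# The same function handles lengthening, e.g. doubling a scale with 0.70.
print(round(spearman_brown(0.70, 2), 2))
```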

Number of Items and Reliability: Three Versions of the Mental Health Inventory (MHI)
Measure    Number of Items    Completion Time (min.)    Reliability
MHI-32     32                 5-8                       .98
MHI-18     18                 3-5                       .96
MHI-5      5                  1 or less                 .90
Data from McHorney et al. 1992

Multitrait Scaling Analysis Internal consistency reliability –Item convergence Item discrimination

Item-scale correlation matrix (two slides; the matrices themselves are not reproduced in this transcript)

Validity Does instrument measure what it is supposed to measure? A “validated” instrument is a holy grail

Reliability and Validity

Threats to Validity Socially Desirable Response Set Acquiescent Response Set

Listed below are a few statements about your relationships with others. How much is each statement TRUE or FALSE for you?
1. I am always courteous even to people who are disagreeable.
2. There have been occasions when I took advantage of someone.
3. I sometimes try to get even rather than forgive and forget.
4. I sometimes feel resentful when I don’t get my way.
5. No matter who I’m talking to, I’m always a good listener.
Definitely true; Mostly true; Don’t know; Mostly false; Definitely false

Two Types of Validity Content Validity –Includes face validity Construct Validity –Many synonyms

Content Validity Does the measure adequately represent the domain? –Do items operationalize concept? –Do items cover all aspects of concept? –Does scale name represent item content? Face validity is the extent to which a measure “appears” to reflect what it is intended to measure –E.g., as judged by experts or by patient focus groups

Construct Validity Do scores on a measure relate to other variables in ways consistent with hypotheses?

Evaluating Construct Validity
Hypothesized associations of each scale with Age, Obesity, ESRD, and Nursing Home Resident:
Physical Functioning: Medium (-). Small (-) Large (-)
Depressive Symptoms: ? Small (+) ? Medium (+)
Cohen effect size rules of thumb (d = 0.2, 0.5, and 0.8):
Small correlation = 0.100
Medium correlation = 0.243
Large correlation = 0.371
$r = \frac{d}{\sqrt{d^2 + 4}} = \frac{0.8}{\sqrt{0.8^2 + 4}} = \frac{0.8}{\sqrt{0.64 + 4}} = \frac{0.8}{\sqrt{4.64}} = \frac{0.8}{2.154} = 0.371$
(Beware: r’s of 0.10, 0.30, and 0.50 are often cited as small, medium, and large.)
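The conversion from Cohen's d to a correlation can be checked in Python. This sketch uses the slide's formula, r = d / sqrt(d^2 + 4), which applies when the two groups being compared are the same size. `d_to_r` is an illustrative name.

```python
import math


def d_to_r(d):
    """Convert Cohen's d to a correlation, assuming two equal-sized groups."""
    return d / math.sqrt(d**2 + 4)


for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    print(f"{label}: d = {d}, r = {d_to_r(d):.3f}")
```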

Relative Validity Analyses Form of "known groups" validity Relative sensitivity of measure to important clinical difference One-way between group ANOVA

Relative Validity Example
             Severity of Heart Disease
             None    Mild    Severe    F-ratio    Relative Validity
Scale #1     87      90      91        2          --
Scale #2     74      78      88        10         5
Scale #3     77      87      95        20         10
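A minimal Python sketch of the relative validity column, assuming it is each scale's F-ratio divided by the F-ratio of a reference scale (Scale #1 here). That reading reproduces the 5 and 10 in the table. The function name and dictionary layout are illustrative.

```python
def relative_validity(f_ratios, reference):
    """Divide every scale's F-ratio by the reference scale's F-ratio."""
    reference_f = f_ratios[reference]
    return {scale: f / reference_f for scale, f in f_ratios.items()}


f_ratios = {"Scale #1": 2, "Scale #2": 10, "Scale #3": 20}
rv = relative_validity(f_ratios, reference="Scale #1")
for scale, value in rv.items():
    print(scale, value)
```

The reference scale always gets a value of 1.0, which the table shows as "--".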

Responsiveness to Change HRQOL measures should be responsive to interventions that change HRQOL Need external indicators of change (Anchors)

Self-Report Indicator of Change Overall has there been any change in your asthma since the beginning of the study? Much improved; Moderately improved; Minimally improved No change Minimally worse; Moderately worse; Much worse

Clinical Indicator of Change “changed” group = seizure free (100% reduction in seizure frequency) “unchanged” group = <50% change in seizure frequency

Responsiveness Indices
(1) Effect size (ES) = D/SD
(2) Standardized Response Mean (SRM) = D/SD†
(3) Guyatt responsiveness statistic (RS) = D/SD‡
D = raw score change in “changed” group
SD = baseline SD
SD† = SD of D
SD‡ = SD of D among “unchanged”
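The three indices differ only in the denominator, as the Python sketch below shows. All scores are made-up illustrative numbers, not data from any study. "Baseline SD" is taken here to be the baseline SD of the changed group, which is an assumption.

```python
from statistics import mean, stdev


def responsiveness(changed_pre, changed_post, unchanged_pre, unchanged_post):
    """Return ES, SRM and Guyatt RS for paired pre/post scores."""
    change = [post - pre for pre, post in zip(changed_pre, changed_post)]
    stable = [post - pre for pre, post in zip(unchanged_pre, unchanged_post)]
    d = mean(change)  # D: mean raw score change in the "changed" group
    return {
        "ES": d / stdev(changed_pre),  # baseline SD (assumed: changed group)
        "SRM": d / stdev(change),      # SD of change in the changed group
        "RS": d / stdev(stable),       # SD of change among the unchanged
    }


# Hypothetical scores for five "changed" and five "unchanged" patients.
changed_pre = [50, 60, 40, 70, 55]
changed_post = [60, 65, 55, 75, 70]
unchanged_pre = [50, 60, 40, 70, 55]
unchanged_post = [52, 58, 43, 69, 53]

indices = responsiveness(changed_pre, changed_post, unchanged_pre, unchanged_post)
for name, value in indices.items():
    print(name, round(value, 2))
```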

Effect Size Benchmarks Small: 0.20 to 0.49 Moderate: 0.50 to 0.79 Large: 0.80 or above

Minimally Important Difference (MID) External anchors –Self-report –Provider report –Clinical measure –Intervention Anchor correlated with change on target measure at 0.371 or higher Anchor indicates “minimal” change

Change in Physical Function Baseline = 100 (U.S. males mean = 87, SD = 20) Hit by Bike causes me to be limited a lot in vigorous activities, limited a little in moderate activities, and limited a lot in climbing several flights of stairs. Physical functioning drops to 75 (-1.25 SD) Hit by Rock causes me to be limited a little in vigorous activities and physical functioning drops to 95 (- 0.25 SD)

Example with Multiple Anchors
693 RA clinical trial participants evaluated at baseline and 6 weeks post-treatment.
Five anchors: 1. patient global self-report; 2. physician global report; 3. pain self-report; 4. joint swelling; 5. joint tenderness
Kosinski, M. et al. (2000). Determining minimally important changes in generic and disease-specific health-related quality of life questionnaires in clinical trials of rheumatoid arthritis. Arthritis and Rheumatism, 43, 1478-1487.

Patient and Physician Global Reports
How are you (is the patient) doing, considering all the ways that RA affects you (him/her)?
Very good (asymptomatic and no limitation of normal activities)
Good (mild symptoms and no limitation of normal activities)
Fair (moderate symptoms and limitation of normal activities)
Poor (severe symptoms and inability to carry out most normal activities)
Very poor (very severe symptoms that are intolerable and inability to carry out normal activities)
--> Improvement of 1 level over time

Global Pain, Joint Swelling and Tenderness 0 = no pain, 10 = severe pain Number of swollen and tender joints -> 1-20% improvement over time

Effect Sizes (mean = 0.34) for SF-36 Changes Linked to Minimal Change in Anchors
Scale     Self-R    Clin.-R    Pain    Swell    Tender    Mean
PF        .35       .33        .34     .26      .32       .32
Role-P    .56       .52        .29     .35      .36       .42
Pain      .83       .70        .47     .69      .42       .62
GH        .20       .12        .09     .12      .04       .12
EWB       .39       .26        .25     .18      .05       .23
Role-E    .41       .28        .18     .38      .26       .30
SF        .43       .34        .28     .29      .38       .34
EF        .50       .47        .22     .22      .35       .35
PCS       .49       .48        .34     .29      .36       .39
MCS       .42       .27        .19     .27      .20       .27

Appendix: ANOVA Computations
A. Student’s SS: $(7^2 + 9^2 + 6^2 + 3^2 + 9^2 + 4^2)/2 - 38^2/12 = 15.67$
B. Rater/Item SS: $(19^2 + 19^2)/6 - 38^2/12 = 0.00$
C. Total SS: $(3^2 + 4^2 + 4^2 + 5^2 + 3^2 + 3^2 + 2^2 + 1^2 + 5^2 + 4^2 + 2^2 + 2^2) - 38^2/12 = 17.67$
Student x Item SS = C - (A + B) = 17.67 - (15.67 + 0.00) = 2.00

options ls=130 ps=52 nocenter;
options nofmterr;
* Long format: one record per student (id) and rater;
data one;
  input id 1-2 rater 4 rating 5;
  CARDS;
01 13
01 24
02 14
02 25
03 13
03 23
04 12
04 21
05 15
05 24
06 12
06 22
;
run;

* Frequencies and descriptive statistics;
proc freq;
  tables rater rating;
run;
proc means;
  var rater rating;
run;
* Two-way ANOVA: students, raters, and their interaction;
proc anova;
  class id rater;
  model rating=id rater id*rater;
run;

data one;
  input id 1-2 rater 4 rating 5;
  CARDS;
01 13
01 24
02 14
02 25
03 13
03 23
04 12
04 21
05 15
05 24
06 12
06 22
;
run;
%GRIP(indata=one,targetv=id,repeatv=rater,dv=rating,
      type=1,t1=test of GRIP macro,t2=);
GRIP macro is available at: http://gim.med.ucla.edu/FacultyPages/Hays/util.htm

* Wide format: one record per student with both ratings;
data one;
  input id 1-2 rater1 4 rater2 5;
  control=1;
  CARDS;
01 34
02 45
03 33
04 21
05 54
06 22
;
run;
* Dummy records covering every rating category;
DATA DUMMY;
  INPUT id 1-2 rater1 4 rater2 5;
  CARDS;
01 11
02 22
03 33
04 44
05 55
;
RUN;

DATA NEW;
  SET ONE DUMMY;
RUN;
* AGREE requests kappa statistics for the rater1 by rater2 table;
PROC FREQ;
  TABLES CONTROL*RATER1*RATER2 / NOCOL NOROW NOPERCENT AGREE;
RUN;
data one;
  set one;
run;
proc means;
  var rater1 rater2;
run;
* ALPHA requests coefficient alpha for the two ratings;
proc corr alpha;
  var rater1 rater2;
run;
