1
Evaluating Self-Report Data Using Psychometric Methods
Ron D. Hays, PhD (hays@rand.org)
February 6, 2008 (3:30-6:30pm)
HS 249F
2
Individual Change
Interest in:
– Knowing how many patients benefit from group intervention, or
– Tracking progress on individual patients
Sample:
– 54 patients
– Average age = 56; 84% white; 58% female
Method:
– Self-administered SF-36 version 2 at baseline and at end of therapy (about 6 weeks later).
3
Physical Functioning and Emotional Well-Being at Baseline for 54 Patients at UCLA Center for East-West Medicine (figure)
Hays et al. (2000), American Journal of Medicine
4
Change in SF-36 Scores Over Time (figure: effect sizes of 0.13, 0.35, 0.21, 0.53, 0.36, 0.11, 0.41, 0.24, 0.30 across the SF-36 scales)
5
t-test for within-group change
t = X_D / (SD_d / sqrt(n))
where X_D = mean of the difference scores and SD_d = standard deviation of the difference scores
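As a sketch of this calculation (the scores below are hypothetical, not the UCLA data), the hand formula and scipy's paired t-test agree:

```python
import numpy as np
from scipy import stats

# Hypothetical baseline and follow-up scores for the same five patients
baseline = np.array([45.0, 50.2, 38.7, 55.1, 42.3])
followup = np.array([48.1, 51.0, 44.2, 56.0, 47.5])

d = followup - baseline                              # difference scores
t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))     # X_D / (SD_d / sqrt(n))
print(t)

# Built-in paired t-test gives the same statistic plus a p-value
t_check, p = stats.ttest_rel(followup, baseline)
print(t_check, p)
```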
6
Significance of Group Change (T-scores)

Scale    Change   t-test   prob.
PF-10      1.7     2.38    .0208
RP-4       4.1     3.81    .0004
BP-2       3.6     2.59    .0125
GH-5       2.4     2.86    .0061
EN-4       5.1     4.33    .0001
SF-2       4.7     3.51    .0009
RE-3       1.5     0.96    .3400   <-
EWB-5      4.3     3.20    .0023
PCS        2.8     3.23    .0021
MCS        3.9     2.82    .0067
7
Reliable Change Index
RCI = (X_2 - X_1) / (SEM * sqrt(2))
SEM = SD_baseline * (1 - reliability)^(1/2)
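A minimal sketch of the RCI in Python (the function and the illustrative values are mine, not from the slides):

```python
import math

def reliable_change_index(x1, x2, sd_baseline, reliability):
    """RCI = (X_2 - X_1) / (SEM * sqrt(2)), with SEM = SD_baseline * sqrt(1 - reliability)."""
    sem = sd_baseline * math.sqrt(1.0 - reliability)
    return (x2 - x1) / (sem * math.sqrt(2.0))

# Hypothetical example: a 10-point T-score gain, SD = 10, reliability = 0.90
print(reliable_change_index(40.0, 50.0, 10.0, 0.90))   # about 2.24, beyond a +/-1.96 cutoff
```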
8
Amount of Change in Observed Score Needed for Significant Individual Change

Scale     RCI   Effect size
PF-10     8.4      0.67
RP-4      8.4      0.72
BP-2     10.4      1.01
GH-5     13.0      1.13
EN-4     12.8      1.33
SF-2     13.8      1.07
RE-3      9.7      0.71
EWB-5    13.4      1.26
PCS       7.1      0.62
MCS       9.7      0.73
9
Significant Change for 54 Cases

Scale   % Improving   % Declining   Difference
PF-10       13%            2%          +11%
RP-4        31%            2%          +29%
BP-2        22%            7%          +15%
GH-5         7%            0%          + 7%
EN-4         9%            2%          + 7%
SF-2        17%            4%          +13%
RE-3        15%            0%          +15%
EWB-5       19%            4%          +15%
PCS         24%            7%          +17%
MCS         22%           11%          +11%
10
Multiple Steps in Developing a Good Survey
Review literature
Expert input (patients and clinicians)
Define constructs you are interested in
Draft items (item generation)
Pretest
– Cognitive interviews
– Field and pilot testing
Revise and test again
Translate/harmonize across languages
11
What's a Good Measure?
Same person gets same score (reliability)
Different people get different scores (validity)
People get scores you expect (validity)
It is practical to use (feasibility)
12
Scales of Measurement and Their Properties

Type of Scale   Rank Order   Equal Interval   Absolute 0
Nominal
Ordinal             +
Interval            +              +
Ratio               +              +               +
13
Measurement Range for Health Outcome Measures (continuum: Nominal – Ordinal – Interval – Ratio)
14
Indicators of Acceptability
Unit non-response
Item non-response
Administration time
15
Variability
All scale levels are represented
Distribution approximates bell-shaped "normal"
16
Measurement Error
observed score = true score + systematic error (bias) + random error
17
Four Types of Data Collection Errors
Coverage Error – Does each person in the population have an equal chance of selection?
Sampling Error – Are only some members of the population sampled?
Nonresponse Error – Do people in the sample who respond differ from those who do not?
Measurement Error – Are inaccurate answers given to survey questions?
18
Flavors of Reliability
Test-retest (administrations)
Inter-rater (raters)
Internal consistency (items)
19
Test-Retest Reliability of MMPI Items 317 and 362 ("I am more sensitive than most other people."), r = 0.75

                       MMPI 362
                   True    False    Total
MMPI 317  True      169       15      184
          False      21       95      116
          Total     190      110      300
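The r of 0.75 can be reproduced from the 2 x 2 counts as a phi coefficient; a quick check in Python (the cell labels a-d are mine):

```python
import math

# Cells of the MMPI 317 (rows) by MMPI 362 (columns) table
a, b = 169, 15    # 317 True:  362 True, 362 False
c, d = 21, 95     # 317 False: 362 True, 362 False

phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(round(phi, 2))   # 0.75
```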
20
Kappa Coefficient of Agreement (Corrects for Chance)
kappa = (observed - chance) / (1 - chance)
21
Example of Computing Kappa
(5 x 5 table of ratings on categories 1-5 for 10 subjects by Rater A and Rater B; 9 of the 10 ratings fall on the diagonal. One rater's marginal totals are 1, 3, 2, 2, 2 and the other's are 2, 2, 2, 2, 2.)
22
Example of Computing Kappa (Continued)
P_chance = [(1 x 2) + (3 x 2) + (2 x 2) + (2 x 2) + (2 x 2)] / (10 x 10) = 0.20
P_obs = 9 / 10 = 0.90
Kappa = (0.90 - 0.20) / (1 - 0.20) = 0.87
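A sketch of the same computation in Python. The cell counts below are reconstructed from the marginals and the 9-of-10 agreement (the assignment of rows to Rater A is my assumption); the result matches the slide's 0.87 (0.875 unrounded):

```python
import numpy as np

def cohens_kappa(table):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_obs = np.trace(table) / n
    p_chance = (table.sum(axis=1) * table.sum(axis=0)).sum() / n ** 2
    return (p_obs - p_chance) / (1.0 - p_chance)

# Reconstructed 5 x 5 table: rows = Rater A (marginals 1, 3, 2, 2, 2),
# columns = Rater B (marginals 2, 2, 2, 2, 2), 9 of 10 ratings on the diagonal
table = [[1, 0, 0, 0, 0],
         [1, 2, 0, 0, 0],
         [0, 0, 2, 0, 0],
         [0, 0, 0, 2, 0],
         [0, 0, 0, 0, 2]]
print(cohens_kappa(table))   # 0.875
```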
23
Guidelines for Interpreting Kappa

Fleiss (1981)                  Landis and Koch (1977)
Conclusion   Kappa             Conclusion       Kappa
Poor         < .40             Poor             < 0.0
Fair         .40 - .59         Slight           .00 - .20
Good         .60 - .74         Fair             .21 - .40
Excellent    > .74             Moderate         .41 - .60
                               Substantial      .61 - .80
                               Almost perfect   .81 - 1.00
24
Intraclass Correlation and Reliability

Model            Reliability                             Intraclass Correlation
One-Way          (BMS - WMS) / BMS                       (BMS - WMS) / [BMS + (K-1)*WMS]
Two-Way Fixed    (BMS - EMS) / BMS                       (BMS - EMS) / [BMS + (K-1)*EMS]
Two-Way Random   N*(BMS - EMS) / (N*BMS + JMS - EMS)     (BMS - EMS) / [BMS + (K-1)*EMS + K*(JMS - EMS)/N]

(BMS = between-subjects MS, WMS = within-subjects MS, JMS = raters/judges MS, EMS = error MS; N = number of subjects, K = number of raters.)
25
Summary of Reliability of Plant Ratings

                            Baseline          Follow-up
                           R_TT   R_II       R_TT   R_II
One-Way ANOVA              0.97   0.95       0.97   0.94
Two-Way Random Effects     0.97   0.95       0.97   0.94
Two-Way Fixed Effects      0.98   0.96       0.98   0.97

Source             Label   Baseline MS
Plants             BMS     628.667
Within             WMS      17.700
Raters             JMS      57.800
Raters x Plants    EMS      13.244
26
Raw Data for Ratings of Height (1/16 inch) of Houseplants (A1, A2, etc.) by Two Raters (R1, R2)

Plant   Rater   Baseline Height   Follow-up Height   Experimental Condition
A1      R1           120               121                    1
        R2           118               120
A2      R1           084               085                    2
        R2           096               088
B1      R1           107               108                    2
        R2           105               104
B2      R1           094               100                    1
        R2           097               104
C1      R1           085               088                    2
        R2           091               096
27
Ratings of Height of Houseplants (Cont.)

Plant   Rater   Baseline Height   Follow-up Height   Experimental Condition
C2      R1           079               086                    1
        R2           078               092
D1      R1           070               076                    1
        R2           072               080
D2      R1           054               056                    2
        R2           056               060
E1      R1           085               101                    1
        R2           097               108
E2      R1           090               084                    2
        R2           092               096
28
Reliability of Baseline Houseplant Ratings
Baseline results for ratings of height of plants: 10 plants, 2 raters

Source              DF     SS       MS        F
Plants               9    5658     628.667   35.52
Within              10     177      17.700
  Raters             1      57.8     57.800
  Raters x Plants    9     119.2     13.244
Total               19    5835
29
Sources of Variance in Baseline Houseplant Height

Source             df     MS
Plants (N)          9    628.67   (BMS)
Within             10     17.70   (WMS)
Raters (K)          1     57.80   (JMS)
Raters x Plants     9     13.24   (EMS)
Total              19
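Plugging these mean squares into the ICC formulas from the earlier slide reproduces the baseline column of the reliability summary; a sketch in Python (variable names are mine):

```python
# Intraclass correlations from the baseline mean squares on the slide
BMS, WMS, JMS, EMS = 628.67, 17.70, 57.80, 13.24   # plants, within, raters, raters x plants
N, K = 10, 2                                        # plants, raters

one_way_rel = (BMS - WMS) / BMS
one_way_icc = (BMS - WMS) / (BMS + (K - 1) * WMS)
fixed_rel   = (BMS - EMS) / BMS
fixed_icc   = (BMS - EMS) / (BMS + (K - 1) * EMS)
random_rel  = N * (BMS - EMS) / (N * BMS + JMS - EMS)
random_icc  = (BMS - EMS) / (BMS + (K - 1) * EMS + K * (JMS - EMS) / N)

for label, value in [("one-way reliability", one_way_rel), ("one-way ICC", one_way_icc),
                     ("two-way fixed reliability", fixed_rel), ("two-way fixed ICC", fixed_icc),
                     ("two-way random reliability", random_rel), ("two-way random ICC", random_icc)]:
    print(f"{label}: {value:.2f}")
# Prints 0.97, 0.95, 0.98, 0.96, 0.97, 0.95 -- matching the baseline column of the summary table
```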
30
Cronbach's Alpha

Source                  df    SS     MS
Respondents (BMS)        4    11.6   2.9
Items (JMS)              1     0.1   0.1
Resp. x Items (EMS)      4     4.4   1.1
Total                    9    16.1

Alpha = (2.9 - 1.1) / 2.9 = 1.8 / 2.9 = 0.62
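A small sketch of both routes to alpha: the ANOVA form used on the slide, and the usual item-variance form applied to a made-up respondents-by-items matrix (the data are hypothetical):

```python
import numpy as np

# ANOVA form used on the slide: alpha = (BMS - EMS) / BMS
BMS, EMS = 2.9, 1.1
print((BMS - EMS) / BMS)   # 0.62

def cronbach_alpha(scores):
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    scores = np.asarray(scores, dtype=float)     # rows = respondents, columns = items
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical 5-respondent, 2-item example
print(cronbach_alpha([[3, 4], [2, 2], [4, 5], [1, 2], [3, 3]]))
```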
31
Alpha for Different Numbers of Items and Homogeneity

                         Average Inter-item Correlation (r)
Number of Items (k)     .0      .2      .4      .6      .8     1.0
2                      .000    .333    .572    .750    .889   1.000
4                      .000    .500    .727    .857    .941   1.000
6                      .000    .600    .800    .900    .960   1.000
8                      .000    .666    .842    .924    .970   1.000

Alpha_st = (k * r) / (1 + (k - 1) * r)
32
Spearman-Brown Prophecy Formula
alpha_y = (N * alpha_x) / (1 + (N - 1) * alpha_x)
N = how much longer scale y is than scale x
33
Example Spearman-Brown Calculations
MHI-18: (18/32 * 0.98) / (1 + (18/32 - 1) * 0.98) = 0.55125 / 0.57125 = 0.96
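The same calculation as a small helper (a sketch; the function name is mine):

```python
def spearman_brown(alpha_x, n):
    """Projected reliability when a scale is made n times as long (n < 1 shortens it)."""
    return (n * alpha_x) / (1 + (n - 1) * alpha_x)

# Slide example: project the 32-item MHI (alpha = 0.98) down to 18 items
print(round(spearman_brown(0.98, 18 / 32), 2))   # 0.96
```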
34
Number of Items and Reliability for Three Versions of the Mental Health Inventory (MHI)
35
Reliability Minimum Standards
0.70 or above (for group comparisons)
0.90 or higher (for individual assessment)
SEM = SD * (1 - reliability)^(1/2)
36
Reliability of a Composite Score
37
Hypothetical Multitrait/Multi-Item Correlation Matrix
38
Multitrait/Multi-Item Correlation Matrix for Patient Satisfaction Ratings

                        Technical   Interpersonal   Communication   Financial
Technical item 1          0.66*        0.63†            0.67†          0.28
Technical item 2          0.55*        0.54†            0.50†          0.25
Technical item 3          0.48*        0.41             0.44†          0.26
Technical item 4          0.59*        0.53             0.56†          0.26
Technical item 5          0.55*        0.60†            0.56†          0.16
Technical item 6          0.59*        0.58†            0.57†          0.23
Interpersonal item 1      0.58         0.68*            0.63†          0.24
Interpersonal item 2      0.59†        0.58*            0.61†          0.18
Interpersonal item 3      0.62†        0.65*            0.67†          0.19
Interpersonal item 4      0.53†        0.57*            0.60†          0.32
Interpersonal item 5      0.54         0.62*            0.58†          0.18
Interpersonal item 6      0.48†        0.48*            0.46†          0.24

Note – Standard error of correlation is 0.03. Technical = satisfaction with technical quality. Interpersonal = satisfaction with the interpersonal aspects. Communication = satisfaction with communication. Financial = satisfaction with financial arrangements.
* Item-scale correlations for hypothesized scales (corrected for item overlap).
† Correlation within two standard errors of the correlation of the item with its hypothesized scale.
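A sketch of how such a matrix can be computed: correlate each item with each scale total, dropping the item from its own scale total first (correction for overlap). The data below are random placeholders, not the patient satisfaction items:

```python
import numpy as np

def multitrait_matrix(items, scales):
    """Correlate each item with each scale total; for an item's own scale,
    drop the item from the total first (correction for overlap)."""
    rows = {}
    for j in range(items.shape[1]):
        rows[j] = {}
        for name, cols in scales.items():
            total = items[:, cols].sum(axis=1)
            if j in cols:
                total = total - items[:, j]       # corrected item-scale correlation
            rows[j][name] = round(np.corrcoef(items[:, j], total)[0, 1], 2)
    return rows

# Hypothetical data: 100 respondents x 4 items; items 0-1 form scale A, items 2-3 scale B
rng = np.random.default_rng(0)
items = rng.normal(size=(100, 4))
print(multitrait_matrix(items, {"A": [0, 1], "B": [2, 3]}))
```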
39
Construct Validity
Does the measure relate to other measures in ways consistent with hypotheses?
Responsiveness to change, including the minimally important difference
41
Construct Validity for Scales Measuring Physical Functioning

            Severity of Heart Disease
            None   Mild   Severe    F-ratio   Relative Validity
Scale #1     91     90      87          2            ---
Scale #2     88     78      74         10             5
Scale #3     95     87      77         20            10
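The F-ratios come from one-way ANOVAs across the severity groups, and relative validity is each scale's F divided by the reference scale's F (consistent with the table: 10/2 = 5 and 20/2 = 10). A sketch with made-up scores:

```python
import numpy as np
from scipy import stats

# Hypothetical physical functioning scores by severity of heart disease
none   = np.array([95, 92, 90, 96, 94], dtype=float)
mild   = np.array([88, 85, 87, 86, 84], dtype=float)
severe = np.array([78, 75, 80, 76, 74], dtype=float)

f_scale, p = stats.f_oneway(none, mild, severe)

# Relative validity = this scale's F divided by the F of a reference scale (e.g., Scale #1 above)
f_reference = 2.0
print(f_scale, f_scale / f_reference)
```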
42
Responsiveness to Change and Minimally Important Difference (MID)
HRQOL measures should be responsive to interventions that change HRQOL
Need external indicators of change (anchors)
– mean change in HRQOL scores among people who have changed ("minimal" change for MID)
43
Self-Report Indicator of Change
"Overall, has there been any change in your asthma since the beginning of the study?"
Much improved; Moderately improved; Minimally improved
No change
Much worse; Moderately worse; Minimally worse
44
Clinical Indicator of Change
– "changed" group = seizure free (100% reduction in seizure frequency)
– "unchanged" group = <50% change in seizure frequency
45
Responsiveness Indices
(1) Effect size (ES) = D / SD_baseline
(2) Standardized Response Mean (SRM) = D / SD_D
(3) Guyatt responsiveness statistic (RS) = D / SD_D(unchanged)
D = raw score change in the "changed" group; SD_baseline = baseline SD; SD_D = SD of D; SD_D(unchanged) = SD of D among the "unchanged"
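A small sketch of the three indices (the change scores below are hypothetical):

```python
import numpy as np

def responsiveness(change_changed, change_unchanged, sd_baseline):
    """Effect size, standardized response mean, and Guyatt responsiveness statistic."""
    d = np.mean(change_changed)                     # mean raw change in the "changed" group
    es  = d / sd_baseline                           # ES  = D / baseline SD
    srm = d / np.std(change_changed, ddof=1)        # SRM = D / SD of change
    rs  = d / np.std(change_unchanged, ddof=1)      # RS  = D / SD of change among the "unchanged"
    return es, srm, rs

# Hypothetical change scores for "changed" (e.g., seizure-free) and "unchanged" patients
print(responsiveness([8, 12, 10, 9, 11], [1, -2, 0, 3, -1], sd_baseline=10.0))
```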
46
Effect Size Benchmarks
Small: 0.20 - 0.49
Moderate: 0.50 - 0.79
Large: 0.80 or above
47
Treatment Impact on PCS
48
Treatment Impact on MCS
49
Item Response Theory (IRT)
50
Latent Trait and Item Responses
(Figure: a latent trait underlies the response to each item. Items 1 and 2 are dichotomous, with response probabilities P(X1=1) and P(X1=0), and P(X2=1) and P(X2=0); item 3 is polytomous, with probabilities P(X3=0), P(X3=1), and P(X3=2).)
51
Item Responses and Trait Levels
(Figure: items 1-3 and persons 1-3 located along the trait continuum.)
52
Item Characteristic Curves (1-Parameter Model)
(Figure: probability of a "Yes" response plotted against the latent trait.)
53
Item Characteristic Curves (2-Parameter Model)
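A sketch of the curves on these two slides: under the 2-parameter logistic model, the probability of a "Yes" depends on the trait level, an item location (b), and a slope (a); fixing a at a common value gives the 1-parameter model. The parameter values here are arbitrary illustrations:

```python
import numpy as np

def icc_2pl(theta, a, b):
    """P(yes | theta) under the 2-parameter logistic model; a common a gives the 1-parameter model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)                  # trait levels
print(icc_2pl(theta, a=1.0, b=0.0))            # 1PL-style item: location only
print(icc_2pl(theta, a=2.0, b=0.5))            # 2PL item: slope and location both vary
```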
54
Dichotomous Items Showing DIF (2-Parameter Model)
(Figure: item 1 shows DIF in location and item 2 shows DIF in slope, comparing Hispanics and Whites.)
55
H E A L T H