1 Measuring Agreement
2 Introduction
  Different types of agreement
  Diagnosis by different methods
    Do both methods give the same results?
    Disease absent or disease present
  Staging of carcinomas
    Will different methods lead to the same results?
    Will different raters lead to the same results?
  Measurements of blood pressure
    How consistent are measurements made
      Using different devices?
      With different observers?
      At different times?
3 Investigating agreement
  Need to consider
    Data type
      Categorical or continuous
    How are the data repeated?
      Measuring instrument(s), rater(s), time(s)
    The goal
      Are ratings consistent?
      Estimate the magnitude of differences between measurements
      Investigate factors that affect ratings
    Number of raters
4 Data type
  Categorical
    Binary
      Disease absent, disease present
    Nominal
      Hepatitis: viral A, B, C, D, E or autoimmune
    Ordinal
      Severity of disease: mild, moderate, severe
  Continuous
    Size of tumour
    Blood pressure
5 How are data repeated?
  Same person, same measuring instrument
    Different observers
      Inter-rater reliability
    Same observer at different times
      Intra-rater reliability
      Repeatability
  Internal consistency
    Do the items of a test measure the same attribute?
6 Measures of agreement
  Categorical
    Kappa
      Weighted
      Fleiss'
  Continuous
    Limits of agreement
    Coefficient of variation (CV)
    Intraclass correlation (ICC)
  Cronbach's α
    Internal consistency
7 Number of raters
  Two
  Three or more
8 Categorical data: two raters
  Kappa
    Magnitude is quoted on one of two scales
      ≥0.75 excellent, 0.40 to 0.75 fair to good, <0.40 poor
      0 to 0.20 slight, >0.20 to 0.40 fair, >0.40 to 0.60 moderate, >0.60 to 0.80 substantial, >0.80 almost perfect
  Degree of disagreement can be included
    Weighted kappa
      Values close together count less towards disagreement than those further apart
      Linear / quadratic weightings
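The unweighted and weighted forms of kappa can be sketched in a few lines of plain Python. This is a minimal illustration rather than a validated implementation: the function name `cohen_kappa`, its arguments and the example ratings are all hypothetical, and the weights assume the categories are listed in their natural order and that there are at least two of them.

```python
def cohen_kappa(rater_a, rater_b, categories, weighting=None):
    """Kappa for two raters; weighting is None, "linear" or "quadratic".

    Illustrative sketch: `categories` must list the possible scores in order.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions for each (rater A, rater B) pair of scores.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Unweighted: any mismatch counts fully.
        if weighting is None:
            return 0.0 if i == j else 1.0
        # Weighted: mismatches count more the further apart the scores are.
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed_dis = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    expected_dis = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    return 1 - observed_dis / expected_dis


# Made-up ratings on a 1 to 3 scale, for illustration only.
a = [1, 2, 3, 1, 2, 3]
b = [1, 2, 3, 2, 3, 3]
for w in (None, "linear", "quadratic"):
    print(w, round(cohen_kappa(a, b, [1, 2, 3], w), 3))
```

In this made-up data every disagreement is between adjacent scores, so the weighted values come out higher than the unweighted one, which is the same pattern as in Example 1.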
9 Categorical data: more than two raters
  Different tests for
    Binomial data
    Data with more than two categories
  Online calculators
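For three or more raters, one option is Fleiss' kappa, which is listed among the measures of agreement. The sketch below assumes that this is the statistic intended here and that every subject is rated by the same number of raters; the function name `fleiss_kappa` and the input layout (one row per subject, one column per category, each cell holding the number of raters who chose that category) are illustrative choices.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a subjects x categories table of rater counts.

    Illustrative sketch: every row must sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Overall proportion of all ratings that fall in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]

    # Agreement within each subject: proportion of rater pairs that agree.
    p_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_observed = sum(p_subject) / n_subjects
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up table: 4 subjects, 3 raters, binary outcome (absent, present).
table = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(round(fleiss_kappa(table), 3))
```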
10 Example 1
  Two raters
  Scores 1 to 5
  Unweighted kappa 0.79, 95% CI (0.62 to 0.96)
  Linear weighting 0.84, 95% CI (0.70 to 0.98)
  Quadratic weighting 0.90, 95% CI (0.77 to 1.00)
11 Example 2
  Binomial data
  Three raters
  Two ratings each
  Inter-rater agreement
  Intra-rater agreement
12 Example 2 ctd.
  Inter-rater agreement
    Kappa 1,2 = (P<0.001)
    Kappa 1,3 = (P=0.765)
    Kappa 2,3 = (P=0.696)
  Intra-rater agreement
    Kappa 1 = (P<0.001)
    Kappa 2 = (P<0.001)
    Kappa 3 = (P=1.000)
13 Continuous data
  Test for bias
  Check that differences are not related to magnitude
  Calculate mean and SD of differences
  Limits of agreement
  Coefficient of variation
  ICC
14 Test for bias
  Student's paired t (mean)
  Wilcoxon matched pairs (median)
  If there is bias, agreement cannot be investigated further
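The paired t statistic behind the test for bias can be sketched with the standard library alone. The function name `paired_t_statistic` and the data are hypothetical. The sketch stops at the statistic and its degrees of freedom; turning these into a P value needs the t distribution, for which a statistics package (for example `scipy.stats.ttest_rel`, or `scipy.stats.wilcoxon` for the matched-pairs alternative) would normally be used.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for paired measurements.

    Illustrative sketch: t is the mean difference divided by its standard error.
    """
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Made-up paired readings from two methods, for illustration only.
method_1 = [10, 12, 14]
method_2 = [9, 12, 12]
t, df = paired_t_statistic(method_1, method_2)
print(round(t, 3), df)
```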
15 Example 3: Test for bias
  Paired t test P=0.362
  No evidence of bias
16 Check differences unrelated to magnitude
  Clearly no relationship
17 Calculate mean and SD of differences
  Descriptive statistics for Difference: N, Mean, Std. Deviation
  Valid N (listwise) 17
  The Mean column gives the mean difference; the Std. Deviation column gives s
18 Limits of agreement
  Lower limit of agreement (LLA) = mean - 1.96 × s = -37.6
  Upper limit of agreement (ULA) = mean + 1.96 × s = 47.5
  95% of differences between a pair of measurements for an individual lie in (-37.6, 47.5)
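The limits of agreement follow directly from the mean and SD of the paired differences. The sketch below assumes the usual 95% multiplier of 1.96; the function name `limits_of_agreement`, its `z` parameter and the data are illustrative and are not the values from Example 3.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second, z=1.96):
    """Return (lower, upper) limits of agreement for paired measurements.

    Illustrative sketch: limits are mean difference -/+ z times the SD of differences.
    """
    diffs = [a - b for a, b in zip(first, second)]
    d_bar = mean(diffs)  # mean difference (the bias)
    s = stdev(diffs)     # SD of the differences
    return d_bar - z * s, d_bar + z * s


# Made-up paired readings, for illustration only.
lower, upper = limits_of_agreement([10, 12, 14], [9, 12, 12])
print(round(lower, 2), round(upper, 2))
```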
19 Coefficient of variation
  Measure of variability of differences
  Expressed as a proportion of the average measured value
  Suitable when error (the differences between pairs) increases with the measured values
    Other measures require this not to be the case
  CV = 100 × s ÷ mean of the measurements
  Example: 100 × s ÷ mean = 4.85%
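As a sketch of the calculation on this slide, the function below takes s to be the SD of the paired differences and divides it by the mean of all measurements from both sets. That reading of "mean of the measurements" is an assumption, as are the function name `coefficient_of_variation` and the data; other definitions of the CV exist.

```python
from statistics import mean, stdev


def coefficient_of_variation(first, second):
    """CV (%) = 100 x SD of paired differences / mean of all measurements.

    Illustrative sketch of the slide's formula, not the only CV definition.
    """
    diffs = [a - b for a, b in zip(first, second)]
    s = stdev(diffs)
    overall_mean = mean(list(first) + list(second))
    return 100 * s / overall_mean


# Made-up paired readings, for illustration only.
print(round(coefficient_of_variation([10, 12, 14], [9, 12, 12]), 2))
```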
20 Intraclass Correlation
  Continuous data
  Two or more sets of measurements
  Measure of correlation that adjusts for differences in scale
  Several models
    Absolute agreement or consistency
    Raters chosen randomly or same raters throughout
    Single or average measures
21 Intraclass Correlation
  ≥0.75 Excellent
  0.4 to 0.75 Fair to good
  <0.4 Poor
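To make the distinction between absolute agreement and consistency concrete, the sketch below computes two single-measure ICCs from a two-way layout (subjects in rows, raters in columns) using the standard ANOVA mean squares. The function name `icc_single` and the data are hypothetical, and only two of the several possible models are shown.

```python
def icc_single(ratings):
    """Return (absolute agreement, consistency) single-measure ICCs.

    Illustrative sketch: `ratings` has one row per subject, one column per rater.
    """
    n = len(ratings)       # subjects
    k = len(ratings[0])    # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Consistency ignores systematic differences between raters.
    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    # Absolute agreement penalises them through the rater mean square.
    absolute = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return absolute, consistency


# Made-up data: rater 2 always scores exactly 2 units higher than rater 1.
absolute, consistency = icc_single([[1, 3], [2, 4], [3, 5], [4, 6]])
print(round(absolute, 3), round(consistency, 3))
```

With a constant offset between the two raters, consistency is perfect while absolute agreement is well below 1, which is why the choice of model matters.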
22 Cronbach's α
  Internal consistency
  Total scores
    Several components
  α ≥0.8 good
  α ≥0.7 adequate
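Cronbach's α compares the variances of the individual components with the variance of the total score. A minimal sketch using the usual formula, α = k ÷ (k − 1) × (1 − sum of item variances ÷ variance of totals), is given below; the function name `cronbach_alpha` and the scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` has one row per respondent, one column per item.

    Illustrative sketch using sample variances throughout.
    """
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up questionnaire with two items and three respondents.
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))
```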
23 Investigating agreement
  Data type
    Categorical
      Kappa
    Continuous
      Limits of agreement
      Coefficient of variation
      Intraclass correlation
  How are the data repeated?
    Measuring instrument(s), rater(s), time(s)
  Number of raters
    Two
      Straightforward
    Three or more
      Help!