Presentation transcript: "Measuring Agreement"

1 Measuring Agreement

2 Introduction
Different types of agreement
 Diagnosis by different methods: do both methods give the same results? (Disease absent or disease present)
 Staging of carcinomas: will different methods lead to the same results? Will different raters lead to the same results?
 Measurements of blood pressure: how consistent are measurements made
  using different devices?
  with different observers?
  at different times?

3 Investigating agreement
Need to consider
 Data type: categorical or continuous
 How are the data repeated? Measuring instrument(s), rater(s), time(s)
 The goal: are ratings consistent? Estimate the magnitude of differences between measurements; investigate factors that affect ratings
 Number of raters

4 Data type
Categorical
 Binary: disease absent, disease present
 Nominal: hepatitis (viral A, B, C, D, E or autoimmune)
 Ordinal: severity of disease (mild, moderate, severe)
Continuous
 Size of tumour
 Blood pressure

5 How are data repeated?
Same person, same measuring instrument
 Different observers: inter-rater reliability
 Same observer at different times: intra-rater reliability (repeatability)
 Internal consistency: do the items of a test measure the same attribute?

6 Measures of agreement
Categorical
 Kappa: weighted, Fleiss'
Continuous
 Limits of agreement
 Coefficient of variation (CV)
 Intraclass correlation (ICC)
Cronbach's α
 Internal consistency

7 Number of raters
 Two
 Three or more

8 Categorical data: two raters
Kappa
 Magnitude quoted: ≥0.75 excellent; 0.40 to 0.75 fair to good; <0.40 poor
 Alternative scale: 0 to 0.20 slight; >0.20 to 0.40 fair; >0.40 to 0.60 moderate; >0.60 to 0.80 substantial; >0.80 almost perfect
Degree of disagreement can be included: weighted kappa
 Values close together do not count towards disagreement as much as those further apart
 Linear / quadratic weightings
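The unweighted and weighted kappa described above can be sketched in a few lines. This is a minimal illustration, not the calculator used in the examples; the function name and table layout are my own choices. It takes a square confusion table (rows: rater A's category, columns: rater B's) and treats unweighted kappa as the special case where every disagreement has weight 1.

```python
from itertools import product

def cohens_kappa(table, weights=None):
    """Cohen's kappa from a square confusion table (rows: rater A, cols: rater B).

    weights=None gives unweighted kappa; "linear" or "quadratic" give weighted
    kappa, where categories close together count less towards disagreement
    than those further apart.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):  # disagreement weight for cell (i, j)
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Observed vs chance-expected (weighted) disagreement
    obs = sum(w(i, j) * table[i][j] / n for i, j in product(range(k), repeat=2))
    exp = sum(w(i, j) * row_tot[i] * col_tot[j] / n ** 2
              for i, j in product(range(k), repeat=2))
    return 1.0 - obs / exp
```

Perfect agreement (all counts on the diagonal) gives kappa = 1; a table whose agreement is exactly what chance predicts gives kappa = 0.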

9 Categorical data: more than two raters
Different tests for
 Binomial data
 Data with more than two categories
Online calculators
 http://www.vassarstats.net/kappa.html
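For more than two raters, Fleiss' kappa (named on the measures slide above) is one standard choice. A sketch, assuming the usual input layout where counts[i][j] is the number of raters who assigned subject i to category j, with the same number of raters per subject; this is an illustration, not the online calculator's code:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for multiple raters.

    counts[i][j] = number of raters placing subject i in category j;
    every row must sum to the same number of raters n.
    """
    N = len(counts)              # subjects
    n = sum(counts[0])           # raters per subject
    m = len(counts[0])           # categories

    # Overall proportion of ratings in each category
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(m)]
    # Per-subject agreement: proportion of agreeing rater pairs
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    P_e = sum(p * p for p in p_j)  # chance agreement
    return (P_bar - P_e) / (1 - P_e)
```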

10 Example 1
Two raters, scores 1 to 5
 Unweighted kappa: 0.79, 95% CI (0.62 to 0.96)
 Linear weighting: 0.84, 95% CI (0.70 to 0.98)
 Quadratic weighting: 0.90, 95% CI (0.77 to 1.00)

11 Example 2
Binomial data
 Three raters, two ratings each
 Inter-rater agreement
 Intra-rater agreement

12 Example 2 ctd.
Inter-rater agreement
 Kappa 1,2 = 0.865 (P<0.001)
 Kappa 1,3 = 0.054 (P=0.765)
 Kappa 2,3 = -0.071 (P=0.696)
Intra-rater agreement
 Kappa 1 = 0.800 (P<0.001)
 Kappa 2 = 0.790 (P<0.001)
 Kappa 3 = 0.000 (P=1.000)

13 Continuous data
 Test for bias
 Check differences are not related to magnitude
 Calculate mean and SD of differences
 Limits of agreement
 Coefficient of variation
 ICC

14 Test for bias
 Student's paired t (mean)
 Wilcoxon matched pairs (median)
If there is bias, agreement cannot be investigated further
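The paired t statistic used here is just the mean difference divided by its standard error. A minimal sketch (function name and return shape are my own; in practice a statistics package would also return the P value):

```python
import math

def paired_t(x, y):
    """Paired t statistic for bias between two sets of measurements
    on the same subjects.  Returns (t, degrees of freedom).
    t near 0 suggests no systematic difference (no bias)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in d) / (n - 1))  # sample SD
    return mean / (s / math.sqrt(n)), n - 1
```

Differences that average exactly zero give t = 0, i.e. no evidence of bias.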

15 Example 3: Test for bias
Paired t test: P=0.362, so no evidence of bias

16 Check differences unrelated to magnitude
[Scatter plot of differences against magnitude omitted] Clearly no relationship

17 Calculate mean and SD of differences
            N     Mean     Std. Deviation
Difference  17    4.9412   21.72404
Valid N (listwise): 17
Here mean = 4.9412 and s = 21.72404.

18 Limits of agreement
Lower limit of agreement (LLA) = mean - 1.96×s = -37.6
Upper limit of agreement (ULA) = mean + 1.96×s = 47.5
95% of differences between a pair of measurements for an individual lie in (-37.6, 47.5)
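The limits above follow directly from the mean and SD on the previous slide. A one-function sketch, plugging in the slide's values (the function name is my own):

```python
def limits_of_agreement(mean_diff, sd_diff):
    """95% (Bland-Altman) limits of agreement from the mean and
    sample SD of the paired differences."""
    half_width = 1.96 * sd_diff
    return mean_diff - half_width, mean_diff + half_width

# Values from the slide's descriptives table: mean = 4.9412, s = 21.72404
lla, ula = limits_of_agreement(4.9412, 21.72404)
```

Rounded to one decimal place this reproduces the slide's (-37.6, 47.5).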

19 Coefficient of variation
Measure of variability of differences, expressed as a proportion of the average measured value
Suitable when error (the differences between pairs) increases with the measured values
 Other measures require this not to be the case
CV = 100 × s ÷ mean of the measurements = 100 × 21.72 ÷ 447.88 = 4.85%
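The CV arithmetic above as a one-liner, with the slide's numbers plugged in (the function name is my own):

```python
def coefficient_of_variation(sd_diff, mean_measurement):
    """CV: SD of the differences as a percentage of the average measured value."""
    return 100.0 * sd_diff / mean_measurement

# Slide's values: s = 21.72, mean of the measurements = 447.88
cv = coefficient_of_variation(21.72, 447.88)
```

This reproduces the slide's 4.85%.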

20 Intraclass correlation
Continuous data; two or more sets of measurements
Measure of correlation that adjusts for differences in scale
Several models
 Absolute agreement or consistency
 Raters chosen randomly or same raters throughout
 Single or average measures
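To make the model choices concrete, here is a sketch of just one of them: the one-way random-effects, absolute-agreement, single-measures ICC, often written ICC(1,1). It is computed from the between-subject and within-subject mean squares of a one-way ANOVA; other models use different mean-square decompositions. Function name and input layout are my own choices.

```python
def icc_oneway(ratings):
    """ICC(1,1): one-way random effects, absolute agreement, single measures.

    ratings: one row of k ratings per subject (n rows in total).
    ICC = (MSB - MSW) / (MSB + (k-1)*MSW)
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]

    # Between-subjects and within-subjects mean squares
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - means[i]) ** 2
              for i, row in enumerate(ratings) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Raters who agree exactly on every subject give MSW = 0 and hence ICC = 1.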

21 Intraclass correlation
 ≥0.75 excellent
 0.4 to 0.75 fair to good
 <0.4 poor

22 Cronbach's α
Internal consistency
 Total scores with several components
α ≥ 0.8 good; α ≥ 0.7 adequate
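Cronbach's α compares the variance of the component items with the variance of the total score: α = k/(k-1) × (1 - Σ item variances / variance of total). A minimal sketch under that standard formula (function name and input layout are my own):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency.

    items: one row of k item scores per subject.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
    """
    n, k = len(items), len(items[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[j] for row in items]) for j in range(k)]
    total_var = var([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Items that move in perfect lockstep across subjects give α = 1; α ≥ 0.8 would be read as good on the slide's scale.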

23 Investigating agreement
Data type
 Categorical: kappa
 Continuous: limits of agreement, coefficient of variation, intraclass correlation
How are the data repeated? Measuring instrument(s), rater(s), time(s)
Number of raters
 Two: straightforward
 Three or more: help!

