1
Measures of Agreement Dundee Epidemiology and Biostatistics Unit
Peter T. Donnan, PhD Professor of Epidemiology and Biostatistics
2
Objectives of session
Understand what quantitative agreement is
Measures of agreement
Calculate the Kappa statistic
Intraclass correlation
More than 2 raters
3
Quantitative Agreement
1) In science, measurement often differs between 'raters'
Two radiologists reading the same chest x-ray for signs of pneumoconiosis
Two laboratory scientists counting radioactively marked cells from liver tissue
2) The same rater often differs when measuring the same thing on a different occasion
4
Measures of Agreement
Without good agreement, results are difficult to interpret
Measurements are unreliable or inconsistent
Need measures of agreement
Mainly two types, depending on whether the data are 1) continuous or 2) categorical
5
Agreement NOT same as Correlation!
[Scatter plot of Rater 1 against Rater 2: Pearson correlation r = 0.775, but ICC = 0.02]
6
Cohen’s Kappa Statistic (κ)
Measure of agreement between raters for categorical measures
7
Two raters with binary measure
                          Rater 2
Rater 1                   Biomarker present    No
Biomarker present                15             5
No                                4            35
8
Cohen’s Kappa Statistic (κ)
Measures agreement between raters beyond that expected by chance:

$$\kappa = \frac{P_o - P_e}{1 - P_e}, \qquad P_o = \sum_i p_{ii}, \qquad P_e = \sum_i p_{i+}\,p_{+i}$$

where $p_{i+}$ and $p_{+i}$ represent the marginal probabilities and $i = 1, 2$ the score.
9
Two raters with binary measure
                          Rater 2
Rater 1                   Biomarker present    No    Marginal total
Biomarker present                15             5         20
No                                4            35         39
Marginal total                   19            40         59
10
Two raters with binary measure
Using the table above:

$$P_o = \sum_i p_{ii} = (15 + 35)/59 = 0.847$$

$$P_e = \sum_i p_{i+}\,p_{+i} = (20 \times 19 + 40 \times 39)/59^2 = 0.557$$
11
Two raters with binary measure
With $P_o = 0.847$ and $P_e = 0.557$:

$$\kappa = \frac{0.847 - 0.557}{1 - 0.557} = 0.65$$
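As a minimal sketch of this calculation in base R (one of the tools the deck mentions), using the 2 x 2 table above:

```r
# Cohen's kappa for the 2 x 2 biomarker table.
# Rows are Rater 1, columns are Rater 2.
tab <- matrix(c(15,  5,
                 4, 35), nrow = 2, byrow = TRUE)

n   <- sum(tab)                                 # 59 subjects
p_o <- sum(diag(tab)) / n                       # observed agreement: 0.847
p_e <- sum(rowSums(tab) * colSums(tab)) / n^2   # chance agreement: 0.557

kappa <- (p_o - p_e) / (1 - p_e)                # approx. 0.65
kappa
```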
12
No agreement on what is good agreement!
Kappa          Strength of agreement
0.00           Poor
0.01 – 0.20    Slight
0.21 – 0.40    Fair
0.41 – 0.60    Moderate
0.61 – 0.80    Good
0.81 – 1.00    Excellent
13
Two raters with a 4-category measure

                                    Pathologist 1
Pathologist 2                    Negative   Atypical squamous   Carcinoma   Squamous or
                                            hyperplasia                     Invasive Carcinoma
Negative                            22             2                0              0
Atypical squamous hyperplasia        5             7               14              0
Carcinoma                            0             0               36              1
Squamous or Invasive Carcinoma       0             0               17             10
14
For the pathologist table above, Kappa = … and Weighted Kappa = 0.649
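A base-R sketch of weighted kappa for a k x k table follows. The zero cells and the use of linear weights are assumptions (the slide does not state its weighting scheme), so the result need not reproduce 0.649 exactly:

```r
# Weighted kappa for a k x k table (linear weights assumed).
# Cell counts follow the reconstructed pathologist table above.
tab <- matrix(c(22,  2,  0,  0,
                 5,  7, 14,  0,
                 0,  0, 36,  1,
                 0,  0, 17, 10), nrow = 4, byrow = TRUE)

k <- nrow(tab)
w <- 1 - abs(outer(1:k, 1:k, "-")) / (k - 1)  # 1 on the diagonal, 0 at the corners
p <- tab / sum(tab)                           # observed cell proportions
e <- outer(rowSums(p), colSums(p))            # proportions expected by chance

kappa_w <- (sum(w * p) - sum(w * e)) / (1 - sum(w * e))
kappa_w
```

Swapping the weight matrix for quadratic weights, `1 - (abs(outer(1:k, 1:k, "-")) / (k - 1))^2`, gives the Fleiss-Cohen variant.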
15
Extensions of Cohen’s Kappa
More than 2 categories for the scale
Weighted Kappa gives greater weight to cells close to the diagonal
Kappa is available in SPSS (Crosstabs)
Weighted kappa in SAS or R
Limited to comparing two raters, but can be extended to more than two raters (not covered here)
16
Agreement for continuous data - Intraclass correlation (ICC)
Equivalent of Kappa for a continuous measure But based on the means and standard deviations of each set of measurements Approximately equivalent to Cohen’s Weighted Kappa
17
Intraclass correlation (ICC)
Consider the following two sets of SBP measurements

Rater 1    Rater 2
139        149
136        140
135        120
140        145
130        145
142        140
128        140
18
Intraclass correlation (ICC)
Rater 1    Rater 2
139        149
136        140
135        120
140        145
130        145
142        140
128        140

Mean = 142   Mean = 144
SD = 10      SD = 15

Pearson correlation r = 0.775
19
Agreement NOT same as Correlation!
[Scatter plot of Rater 1 against Rater 2: Pearson correlation r = 0.775, but ICC = 0.002]
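To see why correlation is not agreement, here is a base-R sketch on hypothetical data in which one rater reads systematically higher:

```r
# Perfect correlation, poor agreement (hypothetical readings).
r1 <- c(120, 130, 140, 150, 160)
r2 <- r1 + 20          # Rater 2 always reads 20 units higher

cor(r1, r2)            # Pearson r = 1: the raters are perfectly correlated
mean(r2 - r1)          # mean difference = 20: far from perfect agreement
```

Correlation rewards any linear relationship; agreement requires the readings themselves to coincide.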
20
Intraclass Correlation
Measures agreement between raters beyond that expected by chance
The simplest approach is to use analysis of variance
Obtain the ANOVA table (from SPSS)
Extract the mean square between subjects (MSB)
Extract the mean square within subjects (MSW)
21
Calculation of Intraclass Correlation
$$\mathrm{ICC} = \frac{MSB - MSW}{MSB + (k - 1)\,MSW}$$

where MSB and MSW come from the ANOVA table, k is the number of raters (here 2), and N = 15.
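A base-R sketch of this ANOVA route; the function name `icc_oneway` and the long-format layout are assumptions for illustration, not the deck's SPSS steps:

```r
# One-way ICC from ANOVA mean squares.
# 'ratings' is assumed to be a long-format data frame with columns
# subject (id) and score (the measurement), one row per rating.
icc_oneway <- function(ratings, k = 2) {
  fit <- aov(score ~ factor(subject), data = ratings)
  ms  <- summary(fit)[[1]][["Mean Sq"]]
  msb <- ms[1]                        # between-subjects mean square (MSB)
  msw <- ms[2]                        # within-subjects mean square  (MSW)
  (msb - msw) / (msb + (k - 1) * msw)
}

# Applied to the SBP pairs recoverable from the earlier slide:
ratings <- data.frame(subject = rep(1:7, each = 2),
                      score   = c(139,149, 136,140, 135,120, 140,145,
                                  130,145, 142,140, 128,140))
icc_oneway(ratings)   # about 0.09 for these recoverable pairs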
22
Calculation of Intraclass Correlation
23
Intraclass Correlation
An alternative method is to use SCALE > RELIABILITY ANALYSIS
Add the rater columns and select Intraclass Correlation from the Statistics options
27
Agreement Bland-Altman Method*
Based on estimating the difference between raters
If agreement is perfect, the mean difference = 0
A positive or negative mean difference suggests systematic bias
The 95% CI for the difference between raters shows the range of differences
* Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; i: 307-310.
28
Bland-Altman Method

Rater 1    Rater 2    Mean     Difference
139        149        144.0        10
136        140        138.0         4
135        120        127.5       -15
140        145        142.5         5
130        145        137.5        15
142        140        141.0        -2
128        140        134.0        12

Mean = 142   Mean = 144   Mean = 3.625
SD = 10      SD = 15      SD = 9.516
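A base-R sketch of these quantities; the vectors hold the pairs recoverable from the table above (the slide's own summary appears to include one further pair not captured in this transcript, so the printed bias of 3.625 is not reproduced exactly):

```r
# Bland-Altman summary statistics for two raters.
r1 <- c(139, 136, 135, 140, 130, 142, 128)
r2 <- c(149, 140, 120, 145, 145, 140, 140)

d <- r2 - r1
n <- length(d)

mean(d)                                   # systematic bias (slide reports 3.625)
sd(d)                                     # SD of differences (slide reports 9.516)
mean(d) + c(-1, 1) * qt(0.975, n - 1) * sd(d) / sqrt(n)  # 95% CI for the bias
mean(d) + c(-1.96, 1.96) * sd(d)          # 95% limits of agreement
```

With the slide's own values (mean 3.625, SD 9.516), the 95% CI for the bias works out to roughly -4 to 11, matching the interpretation below.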
29
Bland-Altman Method

[Bland-Altman plot: differences plotted against means, annotated with the 95% CI for the difference between raters and the line of perfect agreement at zero]
30
Bland-Altman method

From the table above: on average, Rater 2 measures SBP 3.6 mmHg higher than Rater 1
The 95% CI runs from -4 to 11, so it does include zero
However, the CI is mainly positive, so bias is suggested
31
Summary
Measurement is crucial to any science
Agreement and consistency should be assessed
Use Kappa and weighted kappa for categorical ratings
Use the ICC and the Bland-Altman approach for continuous data
32
Summary
Kappa is available in SPSS under Descriptives / Crosstabs
Weighted kappa is in SAS but not SPSS
Use ANOVA to estimate the intraclass correlation, or SCALE / Reliability
More sophisticated methods are available for comparing more than two raters
33
Useful references
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; i: 307-310.
Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992; 304: 1491-1494.
Dunn G. Design and Analysis of Reliability Studies. Edward Arnold / Oxford University Press, 1989.
34
Intraclass Correlation Practical
Read in Agreement.sav
Use Scale to estimate the ICC
Use a graph to construct the Bland-Altman plot, as in the sketch below
Add a horizontal line at mean difference = 0
Interpret the ICC and the plot
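For those working outside SPSS, a base-R sketch of the plot; `r1` and `r2` are stand-ins for the two rater columns in Agreement.sav (the file itself is not reproduced here), reusing the recoverable SBP pairs as placeholder data:

```r
# Bland-Altman plot with a reference line at zero.
r1 <- c(139, 136, 135, 140, 130, 142, 128)   # placeholder for rater 1 column
r2 <- c(149, 140, 120, 145, 145, 140, 140)   # placeholder for rater 2 column

avg <- (r1 + r2) / 2
d   <- r2 - r1

plot(avg, d,
     xlab = "Mean of the two raters",
     ylab = "Difference (Rater 2 - Rater 1)",
     main = "Bland-Altman plot")
abline(h = 0, lty = 2)                                  # perfect agreement
abline(h = mean(d))                                     # observed bias
abline(h = mean(d) + c(-1.96, 1.96) * sd(d), lty = 3)   # 95% limits of agreement
```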
35
Intraclass Correlation Practical
Intraclass Correlation Coefficient

                    Intraclass        95% Confidence Interval    F Test with True Value 0
                    Correlation(b)    Lower      Upper           Value   df1   df2   Sig
Single Measures     .105(a)
Average Measures    .190(c)

Two-way mixed-effects model where people effects are random and measures effects are fixed.
a. The estimator is the same, whether the interaction effect is present or not.
b. Type C intraclass correlation coefficients using a consistency definition; the between-measure variance is excluded from the denominator variance.
c. This estimate is computed assuming the interaction effect is absent, because it is not estimable otherwise.
36
Bland-Altman Plot