1 Measures of Agreement
Dundee Epidemiology and Biostatistics Unit
Peter T. Donnan, PhD, Professor of Epidemiology and Biostatistics

2 Objectives of session
Understand what quantitative agreement is
Measures of agreement
Calculate the Kappa statistic
Intraclass correlation
More than 2 raters

3 Quantitative Agreement
1) In science, measurement often differs between 'raters':
Two radiologists reading the same chest x-ray for signs of pneumoconiosis
Two laboratory scientists counting radioactively marked cells from liver tissue
2) Often the same rater differs when measuring the same thing on a different occasion

4 Measures of Agreement
Without good agreement, results are difficult to interpret
Measurements are unreliable or inconsistent
Need measures of agreement
Mainly two types, depending on whether the data are 1) continuous or 2) categorical

5 Agreement NOT same as Correlation!
[Scatter plot of Rater 2 against Rater 1: Pearson correlation r = 0.775, but ICC = 0.02]

6 Cohen’s Kappa Statistic (κ)
Measure of agreement between raters for categorical measures

7 Two raters with binary measure
                             Rater 1
                   Biomarker present    No
Rater 2
Biomarker present        15              5
No                        4             35

8 Cohen’s Kappa Statistic (κ)
Measures agreement between raters beyond that expected by chance:

$$\kappa = \frac{\sum_i \pi_{ii} - \sum_i \pi_{i+}\pi_{+i}}{1 - \sum_i \pi_{i+}\pi_{+i}}$$

where $\pi_{ii}$ is the probability that both raters assign score $i$, $\pi_{i+}$ and $\pi_{+i}$ represent the marginal probabilities, and $i = 1, 2$ indexes the score

9 Two raters with binary measure
                             Rater 1
                   Biomarker present    No    Marginal Total
Rater 2
Biomarker present        15              5         20
No                        4             35         39
Marginal Total           19             40         59

10 Two raters with binary measure
Using the table above:
$\sum_i \pi_{ii} = (15 + 35)/59 = 0.847$
$\sum_i \pi_{i+}\pi_{+i} = (20 \times 19 + 39 \times 40)/59^2 = 0.557$

11 Two raters with binary measure
With $\sum_i \pi_{ii} = 0.847$ and $\sum_i \pi_{i+}\pi_{+i} = 0.557$:
$$\kappa = \frac{0.847 - 0.557}{1 - 0.557} = 0.655$$
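As a check on the arithmetic, a minimal Python sketch (not part of the original slides) that reproduces the calculation from the 2×2 counts:

```python
# Cohen's kappa for the 2x2 biomarker table (rows = Rater 2, columns = Rater 1)
table = [[15, 5],
         [4, 35]]

n = sum(sum(row) for row in table)                    # 59 ratings in total
row_tot = [sum(row) for row in table]                 # Rater 2 marginals: 20, 39
col_tot = [sum(col) for col in zip(*table)]           # Rater 1 marginals: 19, 40

p_o = sum(table[i][i] for i in range(2)) / n          # observed agreement = 0.847
p_e = sum(row_tot[i] * col_tot[i] for i in range(2)) / n ** 2  # chance agreement = 0.557

kappa = (p_o - p_e) / (1 - p_e)
print(f"p_o = {p_o:.3f}, p_e = {p_e:.3f}, kappa = {kappa:.3f}")  # kappa = 0.655
```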

12 No agreement on what is good agreement!
Kappa          Strength of agreement
0.00           Poor
0.01 - 0.20    Slight
0.21 - 0.40    Fair
0.41 - 0.60    Moderate
0.61 - 0.80    Good
0.81 - 1.00    Excellent
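For use in code, the slide's scale can be written as a simple lookup; the thresholds below are exactly those in the table (one convention among several in the literature):

```python
# Verbal interpretation of kappa, using the thresholds from the slide above
def agreement_strength(kappa):
    if kappa <= 0.0:
        return "Poor"
    for upper, label in [(0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Good"), (1.00, "Excellent")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")

print(agreement_strength(0.655))   # 'Good' (the biomarker example above)
```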

13 Two raters with a 4-category measure
[4×4 cross-classification of ratings by Pathologist 1 (columns) against Pathologist 2 (rows); categories: Negative, Atypical squamous hyperplasia, Carcinoma, Squamous or Invasive Carcinoma. The cell layout was lost in extraction; counts shown: 22, 2, 5, 7, 14, 36, 1, 17, 10]

14 Two raters with a 4-category measure
Kappa = … and Weighted Kappa = 0.649
[same 4×4 cross-classification as the previous slide]

15 Extensions of Cohen’s Kappa
More than 2 categories for the scale
Weighted Kappa gives greater weight to cells close to the diagonal (see the sketch after this list)
Kappa available in SPSS (Crosstabs)
Weighted kappa in SAS or R
Limited to comparing two raters, but can be extended to more than two raters (not covered here)
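Weighted kappa is also only a few lines to compute directly. A sketch with linear agreement weights (1 on the diagonal); the 3×3 counts are hypothetical, since the pathology table above could not be recovered cell by cell:

```python
# Linearly weighted kappa: disagreements near the diagonal count as partial agreement
def weighted_kappa(table):
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # linear agreement weights: 1 on the diagonal, 0 at maximal disagreement
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    p_o = sum(w[i][j] * table[i][j] for i in range(k) for j in range(k)) / n
    p_e = sum(w[i][j] * row_tot[i] * col_tot[j]
              for i in range(k) for j in range(k)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# hypothetical 3x3 cross-classification of two raters, for illustration only
example = [[20, 5, 1],
           [4, 15, 3],
           [1, 2, 9]]
print(f"weighted kappa = {weighted_kappa(example):.3f}")
```

With weights of 0 everywhere off the diagonal this reduces to ordinary (unweighted) kappa.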

16 Agreement for continuous data - Intraclass correlation (ICC)
The equivalent of Kappa for a continuous measure
Based on the means and standard deviations of each set of measurements
Approximately equivalent to Cohen's Weighted Kappa

17 Intraclass correlation (ICC)
Consider the following two sets of SBP measurements (mmHg):

Rater 1   Rater 2
  139       149
  136       140
  135       120
  140       145
  130       145
  142       140
  128       140
   …         …

18 Intraclass correlation (ICC)
Rater 1   Rater 2
  139       149
  136       140
  135       120
  140       145
  130       145
  142       140
  128       140
   …         …

Mean = 142   Mean = 144
SD = 10      SD = 15

Pearson Correlation r = 0.775

19 Agreement NOT same as Correlation!
[Scatter plot of Rater 2 against Rater 1: Pearson correlation r = 0.775, but ICC = 0.002]

20 Intraclass Correlation
Measures agreement between raters beyond that expected by chance
The simplest approach is to use analysis of variance (ANOVA)
Obtain the ANOVA table (from SPSS)
Extract the between-subjects Mean Square (MSB)
Extract the within-subjects Mean Square (MSW)

21 Calculation of Intraclass Correlation
$$ICC = \frac{MSB - MSW}{MSB + (k - 1)\,MSW}$$

where MSB = …, MSW = … (from the ANOVA table), $k = 2$ raters, and N = 15
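The same calculation in Python, as a sketch; it uses only the seven SBP pairs recoverable from this transcript, so the MSB, MSW, and ICC printed here will not match the slide's own (uncaptured) values:

```python
# One-way ANOVA intraclass correlation for k = 2 raters per subject
pairs = [(139, 149), (136, 140), (135, 120), (140, 145),
         (130, 145), (142, 140), (128, 140)]    # one further pair was not captured
k, n = 2, len(pairs)

grand = sum(a + b for a, b in pairs) / (n * k)
pair_means = [(a + b) / k for a, b in pairs]

# between-subjects and within-subjects mean squares
msb = k * sum((m - grand) ** 2 for m in pair_means) / (n - 1)
msw = sum((a - m) ** 2 + (b - m) ** 2
          for (a, b), m in zip(pairs, pair_means)) / (n * (k - 1))

icc = (msb - msw) / (msb + (k - 1) * msw)
print(f"MSB = {msb:.1f}, MSW = {msw:.1f}, ICC = {icc:.2f}")   # prints a low ICC
```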

22 Calculation of Intraclass Correlation

23 Intraclass Correlation
An alternative method is to use SCALE / RELIABILITY ANALYSIS
Add the rater columns and select Intraclass Correlation from the Statistics options

24-26 [SPSS Reliability Analysis dialogue screenshots; not captured in the transcript]

27 Agreement: Bland-Altman Method*
Based on estimating the difference between raters
If perfect agreement, the mean difference = 0
A positive or negative mean difference suggests systematic bias
95% CI for the difference between raters shows the range of differences

* Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; i: 307-310.

28 Bland-Altman Method

Rater 1   Rater 2   Mean     Difference
  139       149     144          10
  136       140     138           4
  135       120     127.5       -15
  140       145     142.5         5
  130       145     137.5        15
  142       140     141          -2
  128       140     134          12
   …         …       …            …

Mean = 142   Mean = 144          Mean = 3.625
SD = 10      SD = 15             SD = 9.516

29 Bland-Altman Method
[Bland-Altman plot: the differences plotted with horizontal lines at the 95% CI for the difference between raters and at zero (perfect agreement)]

30 Bland-Altman method
(paired readings, means, and differences as in the table above)
On average rater 2 measures SBP 3.6 mmHg higher than rater 1
The 95% CI runs from -4 to 11, so it does include zero
However, the CI is mainly positive, so bias is suggested
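These summary statistics are easy to compute directly. A sketch, using 2 as an approximation to the exact t multiplier for the CI, and the seven pairs recoverable from the transcript (so the numbers differ slightly from the slide's):

```python
import statistics

def bland_altman(rater1, rater2):
    """Bias (mean difference), approximate 95% CI for the bias,
    and 95% limits of agreement for individual differences."""
    diffs = [b - a for a, b in zip(rater1, rater2)]      # rater 2 minus rater 1
    n = len(diffs)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    se = sd / n ** 0.5
    ci = (bias - 2 * se, bias + 2 * se)                  # ~95% CI for the bias
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)           # 95% limits of agreement
    return bias, ci, loa

rater1 = [139, 136, 135, 140, 130, 142, 128]
rater2 = [149, 140, 120, 145, 145, 140, 140]
bias, ci, loa = bland_altman(rater1, rater2)
print(f"bias = {bias:.2f} mmHg, 95% CI ({ci[0]:.1f}, {ci[1]:.1f}), "
      f"limits of agreement ({loa[0]:.1f}, {loa[1]:.1f})")
```

If the 95% CI for the bias excludes zero, a systematic difference between raters is indicated; the limits of agreement describe how far apart two single readings can plausibly be.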

31 Summary
Measurement is crucial to any science
Agreement and consistency should be assessed
Use Kappa and weighted kappa for categorical ratings
Use the ICC and the Bland-Altman approach for continuous data

32 Summary
Kappa is available in SPSS in Descriptives / Crosstabs
Weighted kappa is in SAS but not SPSS
Use ANOVA to estimate the intraclass correlation, or SCALE / Reliability
More sophisticated methods are available for comparisons of more than two raters

33 Useful references
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; i: 307-310.
Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992; 304: 1491-1494.
Dunn G. Design and Analysis of Reliability Studies. Edward Arnold / Oxford University Press, 1989.

34 Intraclass Correlation Practical
Read in Agreement.sav
Use Scale to estimate the ICC
Use a graph to construct the Bland-Altman plot
Add a line at mean difference = 0
Interpret the ICC and the plot

35 Intraclass Correlation Practical
Intraclass Correlation Coefficient (SPSS output; confidence intervals and F tests not captured in the transcript)

                     Intraclass Correlation (b)
Single Measures          .105 (a)
Average Measures         .190 (c)

Two-way mixed effects model where people effects are random and measures effects are fixed.
a. The estimator is the same, whether the interaction effect is present or not.
b. Type C intraclass correlation coefficients using a consistency definition - the between-measure variance is excluded from the denominator variance.
c. This estimate is computed assuming the interaction effect is absent, because it is not estimable otherwise.
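The two rows are connected: with k = 2 raters, the Average Measures ICC follows from the Single Measures ICC via the Spearman-Brown formula, as a one-line check confirms:

```python
# Spearman-Brown step-up from single-measures to average-measures ICC (k = 2 raters)
icc_single, k = 0.105, 2
icc_average = k * icc_single / (1 + (k - 1) * icc_single)
print(round(icc_average, 3))   # 0.19, matching the Average Measures row
```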

36 Bland-Altman Plot

