
1 Inter-rater Reliability of Clinical Ratings: A Brief Primer on Kappa Daniel H. Mathalon, Ph.D., M.D. Department of Psychiatry Yale University School of Medicine

2 Inter-rater Reliability of Clinical Interview Based Measures
Ratings of clinical severity for specific symptom domains (e.g., PANSS, BPRS, SAPS, SANS)
– Continuous scales
– Use intraclass correlations to assess inter-rater reliability.
Diagnostic Assessment
– Categorical Data / Nominal Scale Data
– How do we quantify reliability between diagnosticians?
– Percent Agreement, Chi-Square, Kappa

3 Two raters classify n cases into k mutually exclusive categories.

Rater 1 (rows) by Rater 2 (columns):

Category    1      2      ..     j      k      ∑_j n_ij
1           n_11   n_12                        n_1.
2           n_21   n_22                        n_2.
..
i                         n_ii   n_ij          n_i.
..
k
∑_i n_ij    n_.1   n_.2          n_.j          n_..

$n_{ij}$ = number of cases falling into cell (i, j) = frequency of joint event ij
$n_{..}$ = total number of cases
$p_{ij} = n_{ij} / n_{..}$ = proportion of cases falling into a particular cell

Reliability by Percentage Agreement $= \sum_i p_{ii} = \frac{1}{n} \sum_i n_{ii}$
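
The percentage agreement formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming the table is given as a square list of counts; the function name `percent_agreement` and the 2 x 2 counts are hypothetical and do not come from the slides.

```python
def percent_agreement(table):
    """Proportion of cases on the diagonal of a square k x k table of counts."""
    n_total = sum(sum(row) for row in table)               # n.. in the slide's notation
    n_agree = sum(table[i][i] for i in range(len(table)))  # sum of the n_ii
    return n_agree / n_total


# Hypothetical counts for illustration only (rows: Rater 1, columns: Rater 2).
example = [[40, 10],
           [5, 45]]
agreement = percent_agreement(example)
print(agreement)
```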

4 Percent Agreement Fails to Consider Agreement by Chance
Assume that two raters whose judgments are completely independent (i.e., not influenced by the true diagnostic status of the patient) each diagnose 90% of cases to have schizophrenia and 10% of cases to not have schizophrenia (i.e., Other).

Rater 1 (rows) by Rater 2 (columns):

         Schiz   Other   Total
Schiz    .81     .09     .90
Other    .09     .01     .10
Total    .90     .10     1.0

Expected agreement by chance for each category is obtained by multiplying the marginal probabilities together:
.90 x .90 = .81
.10 x .10 = .01
Proportion Agreement = .81 + .01 = .82
Can get Percentage Agreement of 82% strictly by chance.
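
The chance-agreement arithmetic on this slide can be reproduced as a short sketch. It assumes, as the slide does, that the two raters are independent, so each diagonal cell is the product of the two marginal probabilities; the function name `chance_agreement` is an illustrative choice.

```python
def chance_agreement(marginals_1, marginals_2):
    """Sum over categories of the product of the two raters' marginal proportions."""
    return sum(p * q for p, q in zip(marginals_1, marginals_2))


# Marginals from the slide: each rater calls 90% of cases schizophrenia, 10% Other.
rater_1 = [0.90, 0.10]
rater_2 = [0.90, 0.10]
p_chance = chance_agreement(rater_1, rater_2)
print(round(p_chance, 2))  # .90 x .90 + .10 x .10
```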

5 Chi-Square Test of Association as Proposed Solution
Can perform a Chi-Square Test of Association to test the null hypothesis that the two raters' judgments are independent. To reject independence, show that observed agreement departs from what would be expected by chance alone.

$\chi^2 = \sum_{\text{cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$

Problem: In the example below, we have a perfect association between the raters with zero agreement.

Rater 1 (rows) by Rater 2 (columns), n = 15:

         Sz    BP    Other   Total
Sz       0     5     0       5
BP       0     0     5       5
Other    5     0     0       5
Total    5     5     5       15

Chi-Square is a test of Association, not Agreement. It is sensitive to any departure from chance agreement, even when the dependency between the raters' judgments involves perfect non-agreement. So, we cannot use the Chi-Square Test to assess agreement between raters.
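
To make the problem concrete, the following sketch computes the chi-square statistic and the percent agreement for the 3 x 3 example on this slide. It uses plain Python rather than a statistics library, and the function name `chi_square` is an illustrative choice.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two raters.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Counts from the slide: the raters never agree, yet are perfectly associated.
table = [[0, 5, 0],
         [0, 0, 5],
         [5, 0, 0]]
chi2 = chi_square(table)
agreement = sum(table[i][i] for i in range(3)) / 15
print(chi2, agreement)
```

The statistic is large (with 4 degrees of freedom) even though the diagonal is empty, which is the slide's point: association is not agreement.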

6 Kappa Coefficient (Cohen, 1960)
High reliability requires that the frequencies along the diagonal should be > chance and off-diagonal frequencies should be < chance. Use marginal frequencies/probabilities to estimate chance agreement.

Rater 1 (rows) by Rater 2 (columns). Diagonal cells show the count, the observed proportion, the count expected by chance (in parentheses), and the proportion expected by chance.

              Sz                   BP                   Other              n_i.   p_i.
Sz            106  .53 (78) .39    10                   4                  120    .6
BP            22                   28  .14 (15) .075    10                 60     .3
Other         2                    12                   6  .03 (2) .01     20     .1
n_.j          130                  50                   20                 200
p_.j          .65                  .25                  .1                        1
p_i. x p_.i   .39                  .075                 .01

Proportion agreement observed: $p_o = \sum_i p_{ii} = \frac{1}{n} \sum_i n_{ii}$
Proportion agreement expected by chance: $p_c = \sum_i p_{i.} \, p_{.i}$
Kappa: $K = \frac{p_o - p_c}{1 - p_c}$

$p_o = .53 + .14 + .03 = .7$
$p_c = .39 + .075 + .01 = .475$
$K = \frac{.7 - .475}{1 - .475} = .429$

K = 1, perfect agreement
K = 0, chance agreement
K < 0, agreement worse than chance
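
The kappa calculation on this slide can be checked with a short sketch. It takes the 3 x 3 counts from the table above (rows assumed to be Rater 1, columns Rater 2) and follows the slide's definitions of p_o and p_c; the function name `cohens_kappa` is an illustrative choice.

```python
def cohens_kappa(table):
    """Return (p_o, p_c, kappa) for a square table of counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]                              # p_i.
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]   # p_.j
    p_o = sum(table[i][i] for i in range(k)) / n          # observed agreement
    p_c = sum(row_p[i] * col_p[i] for i in range(k))      # chance agreement
    return p_o, p_c, (p_o - p_c) / (1 - p_c)


# Counts from the slide's table (Sz, BP, Other).
counts = [[106, 10, 4],
          [22, 28, 10],
          [2, 12, 6]]
p_o, p_c, kappa = cohens_kappa(counts)
print(round(p_o, 3), round(p_c, 3), round(kappa, 3))
```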

7 Interpretations of Kappa
$K = P(\text{agreement} \mid \text{no agreement by chance})$
Kappa is the probability that judges will agree given no agreement by chance.

$1 - p_c = 1 - .475 = .525$ of cases are those where there is no agreement by chance.
$p_o - p_c = .7 - .475 = .225$ of cases are the non-chance cases where the observers agreed.

$K = \frac{p_o - p_c}{1 - p_c} = \frac{.7 - .475}{1 - .475} = .429$

Can test $H_0$ that Kappa = 0. With large samples, Kappa is approximately normally distributed, so its significance can be tested using the normal distribution.
Can construct confidence intervals for Kappa.
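
The slide does not give standard error formulas, so the sketch below is an assumption: it uses the simple large-sample approximations from Cohen (1960), one for the standard error under the null hypothesis (for the z test) and one for the confidence interval. Later work gives more accurate expressions, so treat the numbers as illustrative only. The values of p_o, p_c, and n = 200 are taken from the kappa example.

```python
import math

# Values from the worked kappa example (n = 200 cases).
p_o, p_c, n = 0.70, 0.475, 200
kappa = (p_o - p_c) / (1 - p_c)

# Assumed approximation: standard error of kappa under H0 (kappa = 0).
se_null = math.sqrt(p_c / (n * (1 - p_c)))
z = kappa / se_null  # compare against the standard normal distribution

# Assumed approximation: standard error used for a confidence interval.
se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_c) ** 2))
ci_95 = (kappa - 1.96 * se, kappa + 1.96 * se)

print(round(z, 2), tuple(round(x, 3) for x in ci_95))
```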

8 Weighted Kappa Coefficient
Can assign weights, $w_{ij}$, to classification errors according to their seriousness using ratio scale weights.

Rater 1 (rows) by Rater 2 (columns). Each cell shows: count / observed proportion / proportion expected by chance / weight.

                       Schizophrenia          Other Psychosis        Personality Disorder   n_i.   p_i.
Schizophrenia          106 / .53 / .39 / 0    10 / .05 / .15 / 1     4 / .02 / .06 / 6      120    .6
Other Psychosis        22 / .11 / .195 / 1    28 / .14 / .075 / 0    10 / .05 / .03 / 3     60     .3
Personality Disorder   2 / .01 / .065 / 6     12 / .06 / .025 / 3    6 / .03 / .01 / 0      20     .1
n_.j                   130                    50                     20                     200
p_.j                   .65                    .25                    .1                            1.0

$K_w = \frac{p_{o(w)} - p_{c(w)}}{1 - p_{c(w)}}$
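
The slide does not spell out how the weighted proportions are computed, so the following is a sketch under a stated assumption: the weights in the table (0 on the diagonal, up to 6 for the most serious error) are treated as disagreement weights, and weighted kappa is computed as 1 minus the ratio of weighted observed disagreement to weighted chance disagreement. This is algebraically the same as the slide's formula if agreement weights are defined as 1 - w/6. The function name `weighted_kappa` is an illustrative choice.

```python
def weighted_kappa(table, disagreement_weights):
    """Weighted kappa from counts and disagreement weights (0 = full agreement)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    observed = 0.0  # weighted observed disagreement
    chance = 0.0    # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = disagreement_weights[i][j]
            observed += w * table[i][j] / n
            chance += w * row_p[i] * col_p[j]
    return 1 - observed / chance


# Counts and weights read from the slide's table.
counts = [[106, 10, 4],
          [22, 28, 10],
          [2, 12, 6]]
weights = [[0, 1, 6],
           [1, 0, 3],
           [6, 3, 0]]
k_w = weighted_kappa(counts, weights)
print(round(k_w, 3))
```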

9 Kappa Rules of Thumb
K ≥ .75 is considered excellent agreement.
K ≤ .46 is considered poor agreement.

10 Weighted Kappa and the ICC
Weighted kappa is an intraclass correlation coefficient (except for a factor of 1/n) when the weights have the following property:

$w_{ij} = 1 - \frac{(i - j)^2}{(k - 1)^2}$
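
As a small illustration of the weight formula on this slide, the sketch below builds the k x k matrix of weights for ordered categories numbered 1 to k. The function name `quadratic_agreement_weights` is hypothetical; note that these are agreement weights (1 on the diagonal, 0 for the most distant categories).

```python
def quadratic_agreement_weights(k):
    """w_ij = 1 - (i - j)^2 / (k - 1)^2 for categories i, j = 1..k."""
    return [[1 - (i - j) ** 2 / (k - 1) ** 2 for j in range(1, k + 1)]
            for i in range(1, k + 1)]


weights = quadratic_agreement_weights(3)
for row in weights:
    print(row)
```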

11 Problems with Kappa
Affected by base rates of diagnoses.
– Can't easily compare across studies that have different base rates, either in the population or in the reliability study.
Chance agreement is a problem?
– When the null hypothesis of rater independence is not met (which is most of the time), the estimate of chance agreement is inaccurate and possibly inappropriate.
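
The base-rate dependence can be illustrated with a simple model that is not in the slides and is assumed here purely for demonstration: two raters each detect a diagnosis with the same sensitivity and specificity (0.9 by default) and err independently given the true status. Under that assumption, observed agreement stays the same while kappa changes with the base rate. The function name `kappa_for_base_rate` and all parameter values are hypothetical.

```python
def kappa_for_base_rate(base_rate, sensitivity=0.9, specificity=0.9):
    """Kappa for two raters who err independently given the true diagnosis."""
    # Probability that both raters say "present" or both say "absent".
    both_pos = (base_rate * sensitivity ** 2
                + (1 - base_rate) * (1 - specificity) ** 2)
    both_neg = (base_rate * (1 - sensitivity) ** 2
                + (1 - base_rate) * specificity ** 2)
    p_o = both_pos + both_neg
    # Each rater's marginal probability of saying "present".
    p_pos = base_rate * sensitivity + (1 - base_rate) * (1 - specificity)
    p_c = p_pos ** 2 + (1 - p_pos) ** 2
    return (p_o - p_c) / (1 - p_c)


kappa_common = kappa_for_base_rate(0.5)  # diagnosis present in half of cases
kappa_rare = kappa_for_base_rate(0.1)    # diagnosis present in 10% of cases
print(round(kappa_common, 3), round(kappa_rare, 3))
```

With identical rater accuracy, the rarer diagnosis yields a noticeably lower kappa, which is why kappas from studies with different base rates are hard to compare.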

