
1 Statistical methods for assessment of agreement Professor Dr. Bikas K Sinha Applied Statistics Division Indian Statistical Institute Kolkata, INDIA Organized by Department of Statistics, RU 17 April, 2012

2 Lecture Plan
Agreement for Categorical Data [Part I]: 09.00 – 10.15 hrs
Coffee Break: 10.15 – 10.30 hrs
Agreement for Continuous Data [Part II]: 10.30 – 11.45 hrs
Discussion: 11.45 – 12.00 hrs

3 Key References
Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational & Psychological Measurement, 20(1): 37–46. [Famous for Cohen’s Kappa]
Cohen, J. (1968). Weighted Kappa: Nominal Scale Agreement with Provision for Scaled Disagreement or Partial Credit. Psychological Bulletin, 70(4): 213–220.

4 References …contd.
Banerjee, M., Capozzoli, M., McSweeney, L. & Sinha, D. (1999). Beyond Kappa: A Review of Interrater Agreement Measures. Canadian Journal of Statistics, 27(1): 3–23.
Lin, L. I. (1989). A Concordance Correlation Coefficient to Evaluate Reproducibility. Biometrics, 45: 255–268.

5 References …contd.
Lin, L. I. (2000). Total Deviation Index for Measuring Individual Agreement: With Application in Lab Performance and Bioequivalence. Statistics in Medicine, 19: 255–270.
Lin, L. I., Hedayat, A. S., Sinha, Bikas & Yang, Min (2002). Statistical Methods in Assessing Agreement: Models, Issues, and Tools. Journal of the American Statistical Association, 97(457): 257–270.

6 Measurements: provided by experts / observers / raters. These could be two or more systems, assessors, chemists, psychologists, radiologists, clinicians, nurses, rating systems or raters, diagnoses or treatments, instruments or methods, processes, techniques or formulae…

7 Diverse Application Areas… Cross-checking of data for agreement; acceptability of a new or generic drug, of test instruments against standard instruments, or of a new method against a gold-standard method; statistical process control…

8 Nature of Agreement Problems… Assessment & recording of responses by two assessors. The raters examine each “unit” independently of one another and report separately: “+” for “Affected”, “-” for “OK” (discrete type). Summary statistics of the unit assessments:

Assessor I \ Assessor II     +      -
+                           40%     3%
-                            3%    54%

Q. What is the extent of agreement of the two assessors?

9 Nature of Data

Assessor I \ Assessor II     +      -
+                           93%     2%
-                            4%     1%

Assessor I \ Assessor II     +      -
+                            3%    40%
-                           44%    13%

Same question: extent of agreement / disagreement?

10 Cohen’s Kappa: Nominal Scales. Cohen (1960) proposed the Kappa statistic for measuring agreement when the responses are nominal.

11 Cohen’s Kappa: Rater I vs Rater II, 2 x 2 case. Categories Yes & No, with joint proportions p(i,j):
p(Y,Y) & p(N,N): agreement proportions
p(Y,N) & p(N,Y): disagreement proportions
Theta_0 = p(Y,Y) + p(N,N) = P[agreement]
Theta_e = p(Y,.) p(.,Y) + p(N,.) p(.,N) = P[chance agreement]
K = [Theta_0 - Theta_e] / [1 - Theta_e] : chance-corrected agreement index

12 Kappa Computation…

Rater I \ Rater II    Yes     No    Total
Yes                  0.40   0.03     0.43
No                   0.03   0.54     0.57
Total                0.43   0.57     1.00

Observed agreement: Theta_0 = p(Y,Y) + p(N,N) = 0.40 + 0.54 = 0.94 … 94%
Chance agreement: Theta_e = p(Y,.) p(.,Y) + p(N,.) p(.,N) = 0.43 x 0.43 + 0.57 x 0.57 = 0.5098 … 51%
K = [Theta_0 - Theta_e] / [1 - Theta_e] = 0.4302 / 0.4902 = 0.8776 … 87.76% chance-corrected agreement

13 Kappa Computations… Raters I vs II (cell proportions and resulting K):

0.40  0.03        0.93  0.02
0.03  0.54        0.04  0.01
K = 0.8776        K = 0.2208

0.03  0.40        0.02  0.93
0.54  0.03        0.01  0.04
K = -0.8439       K = -0.0184
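
The kappa arithmetic on the last two slides is easy to script. Below is a minimal Python sketch (my own, not part of the original talk) that computes Theta_0, Theta_e and K from a 2 x 2 table of joint proportions and reproduces the four values above; the function name cohens_kappa_2x2 is an illustrative choice.

```python
# Minimal sketch (not from the slides): Cohen's kappa for a 2 x 2 table of
# joint proportions p = [[p(Y,Y), p(Y,N)], [p(N,Y), p(N,N)]].
def cohens_kappa_2x2(p):
    theta_0 = p[0][0] + p[1][1]                      # observed agreement
    row = [p[0][0] + p[0][1], p[1][0] + p[1][1]]     # rater I marginals
    col = [p[0][0] + p[1][0], p[0][1] + p[1][1]]     # rater II marginals
    theta_e = row[0] * col[0] + row[1] * col[1]      # chance agreement
    return (theta_0 - theta_e) / (1 - theta_e)

tables = [
    [[0.40, 0.03], [0.03, 0.54]],   # K =  0.8776
    [[0.93, 0.02], [0.04, 0.01]],   # K =  0.2208
    [[0.03, 0.40], [0.54, 0.03]],   # K = -0.8439
    [[0.02, 0.93], [0.01, 0.04]],   # K = -0.0184
]
for t in tables:
    print(round(cohens_kappa_2x2(t), 4))
```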

14 Nature of Categorical Data: Illustrative Example. Study on Diabetic Retinopathy Screening. Problem: interpretation of single-field digital fundus images. Assessment of agreement WITHIN / ACROSS 4 expert groups: Retina Specialists / General Ophthalmologists / Photographers / Nurses, 3 raters from each group.

15 Description of Study Material: 400 diabetic patients selected randomly from a community hospital. One good single-field digital fundus image was taken from each patient with signed consent; the study was approved by the Ethical Committee on Research with Human Subjects. Raters were allowed to magnify / move the images but NOT to modify brightness / contrast.

16 THREE Major Features
#1. Diabetic Retinopathy Severity [6 options]: No Retinopathy / Mild / Moderate NPDR / Severe NPDR / PDR / Ungradable
#2. Macular Edema [3 options]: Presence / Absence / Ungradable
#3. Referrals to Ophthalmologists [3 options]: Referral / Non-Referral / Uncertain

17 Retina Specialists’ Ratings [DR]: RS1 (rows) vs RS2 (columns)

RS1 \ RS2      0     1     2     3     4     9   Total
0            247     2     2     1     0     0     252
1             12    18     7     1     0     0      38
2             22    10    40     8     0     1      81
3              0     0     3     2     2     0       7
4              0     0     0     1     9     0      10
9              5     0     1     0     0     6      12
Total        286    30    53    13    11     7     400

18 Retina Specialists’ Ratings [DR]: RS1 (rows) vs RS3 (columns)

RS1 \ RS3      0     1     2     3     4     9   Total
0            249     2     0     1     0     0     252
1             23     8     7     0     0     0      38
2             31     4    44     2     0     0      81
3              0     0     7     0     0     0       7
4              0     0     0     0    10     0      10
9              9     1     0     0     0     2      12
Total        312    15    58     3    10     2     400

19 Retina Specialists’ Ratings [DR]: RS2 (rows) vs RS3 (columns)

RS2 \ RS3      0     1     2     3     4     9   Total
0            274     5     6     1     0     0     286
1             16     5     8     1     0     0      30
2             15     2    35     0     0     1      53
3              2     2     7     1     1     0      13
4              0     0     2     0     9     0      11
9              5     1     0     0     0     1       7
Total        312    15    58     3    10     2     400

20 Retina Specialists’ Consensus Rating [DR]: RS1 (rows) vs Consensus Rating RSCR (columns)

RS1 \ RSCR     0     1     2     3     4     9   Total
0            252     0     0     0     0     0     252
1             17    19     2     0     0     0      38
2             15    19    43     2     1     1      81
3              0     0     2     4     1     0       7
4              0     0     0     0    10     0      10
9              8     0     0     0     0     4      12
Total        292    38    47     6    12     5     400

21 Retina Specialists’ Ratings [Macular Edema]: RS1 (rows) vs RS2 (columns)

RS1 \ RS2    Presence  Absence  Subtotal  Ungradable  Total
Presence          326       11       337           1    338
Absence            18       22        40           3     43
Subtotal          344       33       377          --     --
Ungradable          9        0        --          10     19
Total             353       33        --          14    400

22 Retina Specialists’ Ratings [ME]: RS1 (rows) vs RS3 (columns)

RS1 \ RS3    Presence  Absence  Subtotal  Ungradable  Total
Presence          322       13       335           3    338
Absence             8       32        40           3     43
Subtotal          330       45       375          --     --
Ungradable         12        0        --           7     19
Total             342       45        --          13    400

23 Retina Specialists’ Consensus Rating [ME]: RS1 (rows) vs Consensus Rating RSCR (columns)

RS1 \ RSCR   Presence  Absence  Subtotal  Ungradable  Total
Presence          335        2       337           1    338
Absence            10       33        43           0     43
Subtotal          345       35       380          --     --
Ungradable         10        0        --           9     19
Total             355       35        --          10    400

24 Photographers on Diabetic ME: Photographer 1 (rows) vs Photographer 2 (columns)

P1 \ P2      Presence  Absence  Subtotal  Ungradable  Total
Presence          209        5       214          51    265
Absence            65       41       106           4    110
Subtotal          274       46       320          --     --
Ungradable          2        2        --          21     25
Total             276       48        --          76    400

25 Photographers’ Consensus Rating on Diabetic Macular Edema: Photographer 1 (rows) vs Consensus Rating (columns)

P1 \ Consensus  Presence  Absence  Subtotal  Ungradable  Total
Presence             257        5       262           3    265
Absence               74       30       104           6    110
Subtotal             331       35       366          --     --
Ungradable            24        0        --           1     25
Total                355       35        --          10    400

26 Study of RS’s Agreement [ME]: 2 x 2 table, Cohen’s Kappa (K) coefficient. ‘Ungradable’ is ignored so as to work with a 2 x 2 table.

RS1 \ RS2    Presence  Absence  Subtotal
Presence          326       11       337
Absence            18       22        40
Subtotal          344       33       377

Observed agreement: Theta_0 = (326 + 22) / 377 = 0.9231
Chance agreement: Theta_e = (337/377)(344/377) + (40/377)(33/377) = 0.8250
K = [Theta_0 - Theta_e] / [1 - Theta_e] = 56% only! Net agreement, standardized.

27 Study of Photographers’ Agreement on Macular Edema: 2 x 2 table, Cohen’s Kappa (K) coefficient. ‘Ungradable’ is ignored so as to work with a 2 x 2 table.

P1 \ P2      Presence  Absence  Subtotal
Presence          209        5       214
Absence            65       41       106
Subtotal          274       46       320

Observed agreement: Theta_0 = (209 + 41) / 320 = 0.7813
Chance agreement: Theta_e = (214/320)(274/320) + (106/320)(46/320) = 0.6202
K = [Theta_0 - Theta_e] / [1 - Theta_e] = 42% only! Net agreement, standardized.
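
The two study kappas above can be checked directly from the raw counts. Here is a small sketch (my own helper, not from the talk) that works from a 2 x 2 count table instead of proportions:

```python
# Sketch (my own, not from the slides): Cohen's kappa from a 2 x 2 table of
# raw counts, with the 'Ungradable' category already excluded.
def kappa_from_counts(a, b, c, d):
    """Table layout: [[a, b], [c, d]] = [[P,P], [P,A]], [[A,P], [A,A]]."""
    n = a + b + c + d
    theta_0 = (a + d) / n
    theta_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (theta_0 - theta_e) / (1 - theta_e)

print(kappa_from_counts(326, 11, 18, 22))   # Retina specialists, ME: ~0.56
print(kappa_from_counts(209, 5, 65, 41))    # Photographers, ME:      ~0.42
```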

28 What About Multiple Ratings like Diabetic Retinopathy [DR]? Retina Specialist 1 (rows) vs Retina Specialist 2 (columns):

RS1 \ RS2      0     1     2     3     4     9   Total
0            247     2     2     1     0     0     252
1             12    18     7     1     0     0      38
2             22    10    40     8     0     1      81
3              0     0     3     2     2     0       7
4              0     0     0     1     9     0      10
9              5     0     1     0     0     6      12
Total        286    30    53    13    11     7     400

29 K Computation…
Observed agreement: Theta_0 = (247 + 18 + 40 + 2 + 9 + 6) / 400 = 322/400 = 0.8050
Chance agreement: Theta_e = (252/400)(286/400) + … + (12/400)(7/400) = 0.4860
K = [Theta_0 - Theta_e] / [1 - Theta_e] = 62%!
Note: 100% credit for a ‘hit’ and no credit for a ‘miss’.
Criticism: heavy penalty for ratings that miss only narrowly. Hence the concept of unweighted versus weighted Kappa.

30 Table of Weights for 6 x 6 Ratings [ratings 1 to 6]

        1      2      3      4      5      6
1       1  24/25  21/25  16/25   9/25      0
2   24/25      1  24/25  21/25  16/25   9/25
3   21/25  24/25      1  24/25  21/25  16/25
4   16/25  21/25  24/25      1  24/25  21/25
5    9/25  16/25  21/25  24/25      1  24/25
6       0   9/25  16/25  21/25  24/25      1

Formula: w_ij = 1 - (i - j)^2 / (6 - 1)^2

31 Formula for Weighted Kappa
Theta_0(w) = sum_i sum_j w_ij f_ij / n
Theta_e(w) = sum_i sum_j w_ij (f_i. / n)(f_.j / n)
The double sums run over ALL cells. For unweighted Kappa we take into account only the cell frequencies along the main diagonal, with 100% weight.

32 Computations for Weighted Kappa
Theta_0(w) = ……   Theta_e(w) = ……
Weighted Kappa = [Theta_0(w) - Theta_e(w)] / [1 - Theta_e(w)]
Unweighted Kappa = ……
K works for pairwise evaluation of raters’ agreement.
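
As a worked illustration of the blanks above, the sketch below (my own code, not from the slides) applies the quadratic weights w_ij = 1 - (i - j)^2 / (c - 1)^2 to the RS1 vs RS2 DR table of slides 17/28. The unweighted value reproduces the 62% found on slide 29; the weighted value comes out higher (roughly 0.75), illustrating the partial credit given to near misses.

```python
# Sketch (my own, not from the talk): weighted vs unweighted kappa for the
# 6 x 6 RS1 vs RS2 diabetic-retinopathy table.
f = [  # rows: RS1 codes 0,1,2,3,4,9; columns: RS2 codes 0,1,2,3,4,9
    [247,  2,  2, 1, 0, 0],
    [ 12, 18,  7, 1, 0, 0],
    [ 22, 10, 40, 8, 0, 1],
    [  0,  0,  3, 2, 2, 0],
    [  0,  0,  0, 1, 9, 0],
    [  5,  0,  1, 0, 0, 6],
]

def weighted_kappa(f, weight):
    c = len(f)
    n = sum(sum(row) for row in f)
    row_m = [sum(f[i]) / n for i in range(c)]                        # f_i. / n
    col_m = [sum(f[i][j] for i in range(c)) / n for j in range(c)]   # f_.j / n
    theta_0 = sum(weight(i, j) * f[i][j] / n
                  for i in range(c) for j in range(c))
    theta_e = sum(weight(i, j) * row_m[i] * col_m[j]
                  for i in range(c) for j in range(c))
    return (theta_0 - theta_e) / (1 - theta_e)

quad = lambda i, j: 1 - (i - j) ** 2 / (len(f) - 1) ** 2   # quadratic weights
ident = lambda i, j: 1.0 if i == j else 0.0                # unweighted kappa

print(round(weighted_kappa(f, ident), 2))   # ~0.62, matches slide 29
print(round(weighted_kappa(f, quad), 2))    # ~0.75, weighted kappa
```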

33 K-statistics for Pairs of Raters… Retina Specialists (unweighted Kappa)

Pair             DR     ME   Referral
1 vs 2         0.63   0.58     0.65
1 vs 3         0.55   0.64     0.65
2 vs 3         0.56   0.51     0.59
1 vs CGroup    0.67   0.65     0.66
2 vs CGroup    0.70   0.65     0.66
3 vs CGroup    0.71   0.73     0.72

34 K-statistics for Pairs of Raters… General Ophthalmologists

Pair             DR     ME   Referral
1 vs 2         0.35   0.17     0.23
1 vs 3         0.44   0.27     0.27
2 vs 3         0.33   0.19     0.27
1 vs CGroup    0.33   0.16     0.18
2 vs CGroup    0.58   0.50     0.51
3 vs CGroup    0.38   0.20     0.24

35 K-statistics for Pairs of Raters… Photographers

Pair             DR     ME   Referral
1 vs 2         0.33   0.35     0.23
1 vs 3         0.49   0.38     0.41
2 vs 3         0.34   0.45     0.32
1 vs CGroup    0.33   0.29     0.33
2 vs CGroup    0.26   0.29     0.20
3 vs CGroup    0.39   0.49     0.49

36 K-statistics for Pairs of Raters… Nurses

Pair             DR     ME   Referral
1 vs 2         0.28   0.15     0.20
1 vs 3         0.32    NA       NA
2 vs 3         0.23    NA       NA
1 vs CGroup    0.29   0.27     0.28
2 vs CGroup    0.19   0.15     0.17
3 vs CGroup    0.50    NA       NA

NA: Rater #3 did NOT rate ‘ungradable’.

37 K for Multiple Raters’ Agreement: judging the simultaneous agreement of multiple raters over multiple classification categories.
# Raters = n; # Subjects = k; # Mutually exclusive & exhaustive nominal categories = c
Example: Retina Specialists (n = 3), patients (k = 400) & DR (c = 6 codes)

38 Formula for Kappa
Set k_ij = # raters who assign the ith subject to the jth category.
P_j = sum_i k_ij / (nk) = proportion of all assignments made to the jth category.
Chance-corrected agreement for category j:
K_j = [sum_i k_ij^2 - knP_j{1 + (n-1)P_j}] / [kn(n-1)P_j(1 - P_j)]

39 Computation of Kappa: chance-corrected measure of overall agreement
K = [sum_j numerator of K_j] / [sum_j denominator of K_j]
Interpretation: an intraclass correlation.
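
A compact sketch of slides 38-39 (my own, using a small hypothetical data set, since the raw 400-patient assignment matrix is not reproduced in the slides): each row of counts gives, for one subject, how many of the n raters chose each category.

```python
# Sketch (hypothetical data; helper names are mine, not from the talk):
# multi-rater kappa per slides 38-39. counts[i][j] = number of the n raters
# who assigned subject i to category j; each row must sum to n.
def multi_rater_kappa(counts, n):
    k = len(counts)                    # number of subjects
    c = len(counts[0])                 # number of categories
    P = [sum(counts[i][j] for i in range(k)) / (n * k) for j in range(c)]
    num = den = 0.0
    for j in range(c):
        s2 = sum(counts[i][j] ** 2 for i in range(k))
        num += s2 - k * n * P[j] * (1 + (n - 1) * P[j])   # numerator of K_j
        den += k * n * (n - 1) * P[j] * (1 - P[j])        # denominator of K_j
    return num / den

# Hypothetical toy example: 5 subjects, 3 raters, 3 categories.
toy = [
    [3, 0, 0],
    [2, 1, 0],
    [0, 3, 0],
    [0, 1, 2],
    [0, 0, 3],
]
print(round(multi_rater_kappa(toy, n=3), 3))   # 0.6 for this toy table
```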

40 K-statistic for Multiple Raters…

Group                    DR     ME   Referral
Retina Specialists     0.58   0.58     0.63
Gen. Ophthalmologists  0.36   0.19     0.24
Photographers          0.37   0.38     0.30
Nurses                 0.26   0.20     0.20
All Raters             0.34   0.27     0.28

Apart from the Retina Specialists, the Photographers also show good agreement for DR & ME.

41 Conclusion based on the K-Study: Of all 400 cases, 44 warranted referral to ophthalmologists due to retinopathy severity, and 5 warranted referral due to uncertainty in diagnosis. A fourth Retina Specialist carried out a dilated fundus exam of these 44 patients, and substantial agreement [K = 0.68] was observed for DR severity. The exam confirmed referral in 38 of the 44 cases.

42 Discussion on the Study: The Retina Specialists, all in active clinical practice, are the most reliable for digital image interpretation. An individual rater’s background and experience play a role in digital image interpretation. There was an unusually high percentage of ungradable images among the nonphysician raters, although only 5 out of 400 images were declared ‘ungradable’ by consensus of the Retina Specialists’ group: a lack of confidence of the nonphysicians rather than true image ambiguity! For this study, other factors [blood pressure, blood sugar, cholesterol etc.] were not taken into account.

43 Cohen’s Kappa : Need for Further Theoretical Research COHEN’S KAPPA STATISTIC: A CRITICAL APPRAISAL AND SOME MODIFICATIONS BIKAS K. SINHA^1, PORNPIS YIMPRAYOON^2, AND MONTIP TIENSUWAN^2 ^1 : ISI, Kolkata ^2 : Mahidol Univ., Bangkok, Thailand CSA BULLETIN, 2007

44 CSA Bulletin (2007) Paper… ABSTRACT: In this paper we consider the problem of assessing agreement between two raters when the ratings are given independently on a 2-point nominal scale, and critically examine some features of Cohen’s Kappa statistic, widely and extensively used in this context. We point out some undesirable features of K and, in the process, propose three modified Kappa statistics. Properties and features of these statistics are explained with illustrative examples.

45 Further Theoretical Aspects of Kappa Statistics… Recent study on standardization of Kappa. Why standardization?
K = [Theta_0 - Theta_e] / [1 - Theta_e], with range -1 <= K <= 1
K = 1 iff 100% perfect rankings
K = 0 iff 100% chancy ranking
K = -1 iff 100% imperfect ranking BUT only with split-half marginals

46 Why Split Half? Example:

             Presence  Absence
Presence         --       30%
Absence         70%       --

K_C = -73% [and not -100%]

Only the split-half table

             Presence  Absence
Presence         --       50%
Absence         50%       --

provides K_C = -100%.

47 K-Modified…
K_C(M) = [Theta_0 - Theta_e] / (P_I[Y] P_I[N] + P_II[Y] P_II[N])
where Y is the ‘Presence’ category, N the ‘Absence’ category, and P_I, P_II are the marginal proportions of Raters I and II.
K_C(M) satisfies:
K_C(M) = 1 iff 100% perfect rankings, whatever the marginals
K_C(M) = 0 iff 100% chancy ranking, whatever the marginals
K_C(M) = -1 iff 100% imperfect ranking, whatever the marginals
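
A quick numerical check of the claim above (my own sketch, not from the slides): for the 30/70 table of slide 46, ordinary K_C is about -0.72, while K_C(M) reaches -1 exactly, just like the split-half case.

```python
# Sketch (my own, assuming the K_C(M) definition displayed above):
# compare Cohen's K_C with the modified K_C(M) on 2 x 2 proportion tables.
def kappas(p):
    """p = [[p(Y,Y), p(Y,N)], [p(N,Y), p(N,N)]] (joint proportions)."""
    pI_Y, pI_N = p[0][0] + p[0][1], p[1][0] + p[1][1]    # Rater I marginals
    pII_Y, pII_N = p[0][0] + p[1][0], p[0][1] + p[1][1]  # Rater II marginals
    theta_0 = p[0][0] + p[1][1]
    theta_e = pI_Y * pII_Y + pI_N * pII_N
    k_c = (theta_0 - theta_e) / (1 - theta_e)
    k_cm = (theta_0 - theta_e) / (pI_Y * pI_N + pII_Y * pII_N)
    return k_c, k_cm

print(kappas([[0.0, 0.30], [0.70, 0.0]]))   # (~ -0.72, -1.0)
print(kappas([[0.0, 0.50], [0.50, 0.0]]))   # (  -1.0 , -1.0)
```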

48 Other Formulae…
What if it is known that there is 80% observed agreement, i.e., Theta_0 = 80%? Is K_max = 1 and K_min = -1? Not really. So we need a standardization of K_C:
K_C(M2) = [K_C - K_C(min)] / [K_C(max) - K_C(min)]
where the maximum and minimum are evaluated under the stipulated value of the observed agreement.

49 Standardization yields…
K_C(M2) = [K_C + (1-Theta_0)/(1+Theta_0)] / [Theta_0^2 / {1+(1-Theta_0)^2} + (1-Theta_0)/(1+Theta_0)]
K_C(M3) = [K_C(M) + (1-Theta_0)/(1+Theta_0)] / [Theta_0/(2-Theta_0) + (1-Theta_0)/(1+Theta_0)]
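
For completeness, a direct transcription of the two displayed expressions (my own sketch; it simply evaluates the formulas as written above, given K_C, K_C(M) and Theta_0 as inputs).

```python
# Sketch: evaluate the standardized kappas exactly as displayed above.
# Inputs: k_c = Cohen's kappa, k_cm = K_C(M), theta_0 = observed agreement.
def k_c_m2(k_c, theta_0):
    shift = (1 - theta_0) / (1 + theta_0)
    scale = theta_0 ** 2 / (1 + (1 - theta_0) ** 2)
    return (k_c + shift) / (scale + shift)

def k_c_m3(k_cm, theta_0):
    shift = (1 - theta_0) / (1 + theta_0)
    scale = theta_0 / (2 - theta_0)
    return (k_cm + shift) / (scale + shift)

# Hypothetical inputs, purely to show the call signature.
print(round(k_c_m2(0.50, 0.80), 3), round(k_c_m3(0.50, 0.80), 3))
```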

50 Revisiting Cohen’s Kappa… 2 x 2 table, Cohen’s Kappa (K) coefficient

RS1 \ RS2    Presence  Absence  Subtotal
Presence          326       11       337
Absence            18       22        40
Subtotal          344       33       377

K_C = 56% [computed earlier]

51 Kappa Modified
K_C(M) = 56% [same as K_C]. Given Theta_0 = 92.30%:
K_C(M2) = (0.5600 + 0.0400) / (0.8469 + 0.0400) = 61%
K_C(M3) = (0.5600 + 0.0400) / (0.8570 + 0.0400) = 67%

52 Beyond Kappa… A Review of Inter-rater Agreement Measures. Banerjee et al., Canadian Journal of Statistics, 1999: 3–23. Modelling patterns of agreement: log-linear models, latent class models.

53 That’s it for now…… Thanks for your attention…. This is the End of Part I of my talk. Bikas Sinha UIC, Chicago April 29, 2011

