1
Statistical Assessment of Agreement
Bikas K Sinha [Retd. Professor] Indian Statistical Institute, Kolkata ******************* RUDS September 13-14, 2017
2
Quotes of the Day “I now tend to believe … so long … I was completely wrong.” “Ah! That’s good. You and I finally agree!” *************** “When two men of science disagree, they do not invoke the secular arm; they wait for further evidence to decide the issue, because, as men of science, they know that neither is infallible.”
3
Latest Book on Agreement
4
A Statistician’s Call…..
In God We Trust…. All Others : Must Bring Data…….
5
Today’s Talk..... Agreement for Categorical Data [55 minutes]
Discussion [5 minutes]
6
Agreement : Categorical Data
A Revealing Study was conducted in a Specialist EYE Hospital in Bangkok 600+ Diabetic Patients All : In-house & confined to Hospital Beds All under Treatment for Diabetic Retinopathy ...something to do with the eye ...needed regular monitoring..... Doctors in the study group ?
7
Rajavithi Hospital, Bangkok
Dr. Paisan Ruamviboonsuk MD, Dr. Khemawan Teerasuwanajak MD, Dr. Kanokwan Yuttitham MD Affiliations : Thai Screening for Diabetic Retinopathy Study Group, Department of Ophthalmology, Rajavithi Hospital, Bangkok, Thailand Statistician : Dr Montip Tiensuwan, Dept. of Mathematics, Mahidol University
8
Description of Study Material
400/600+ Diabetic Patients Selected randomly from the hospital One Good Single-Field Digital Fundus Image was taken from each patient with Signed Consent Approved by Ethical Committee on Research with Human Subjects Q. What did they do with the 400 images ? Purpose : Extract information on what ? Why ?
9
THREE Major Features #1. Diabetic Retinopathy Severity [6 options]
No Retinopathy / Mild / Moderate / Severe / Critical / Ungradable #2. Macular Edema [2 options] Presence / Absence #3. Referral to Ophthalmologists [2 options] Referrals / Non-Referrals
10
Who Extracted the Features ?
Retina Specialists, General Ophthalmologists, Photographers, Nurses All of them attached to the Hospital AND 3 from each Group !!! Altogether 12 ‘RATERS’ collected data on each of the 3 features…..from each of the 400 images…..loaded with data…..
11
Measurements : Provided by Experts / Observers / Raters
Rater....Generic Term Could be two or more systems, assessors, chemists, psychologists, radiologists, clinicians, nurses, rating systems or raters, diagnosis or treatments, instruments or methods, processes or techniques or formulae……
12
Retina Specialists’ Ratings [Macular Edema]
[Table: Presence/Absence counts of Macular Edema as rated by RS1, RS2 and RS3, with totals; counts not recovered. Remarks : Remarkable Agreement! Too good to be valid!]
Q. Is there any inside story, yet to be revealed ? Called upon a Statistician : Dr Montip Tiensuwan, PhD [Statistics] from Western Australia. Faculty, Mathematics & Statistics, Mahidol University, Bangkok. Had already studied the literature on Statistical Assessment of Agreement... Successfully collaborated with the Medical Doctors......
13
Bring out the Inside Story….
[Three pairwise 2 x 2 tables (yes/no): RS1 vs RS2, RS1 vs RS3, RS2 vs RS3, with totals; cell counts not recovered.]
Agreement…..not strong at all…..more than 25% disagreement upfront between any two raters
14
Cohen’s Kappa for 2x2 Rating
Rater I vs Rater II : 2 x 2 Case
Categories : Yes & No
(Y,Y) & (N,N) : Agreement Prop. along the main diagonal
(Y,N) & (N,Y) : Disagreement Prop. along the anti-diagonal
π0 = P(Y,Y) + P(N,N) = Prop. Agreement
Chancy Agreement [CA] ? πe = P(Y,.) P(.,Y) + P(N,.) P(.,N) = Prop. CA
κ = [π0 – πe] / [1 – πe] x 100 % : Kappa, the Chance-corrected Agreement Index
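A minimal sketch of this chance-corrected index in Python, assuming a generic 2 x 2 table of counts; the cell counts used at the bottom are hypothetical illustrations, not the study's data:

```python
# Cohen's kappa for a 2 x 2 table of two raters (Yes/No categories).
# The cell counts used below are hypothetical, not the study's data.

def cohen_kappa_2x2(n_yy, n_yn, n_ny, n_nn):
    """Return (p0, pe, kappa) for counts in cells (Y,Y), (Y,N), (N,Y), (N,N)."""
    n = n_yy + n_yn + n_ny + n_nn
    p0 = (n_yy + n_nn) / n                                 # observed agreement (main diagonal)
    row_y, row_n = (n_yy + n_yn) / n, (n_ny + n_nn) / n    # Rater I marginals
    col_y, col_n = (n_yy + n_ny) / n, (n_yn + n_nn) / n    # Rater II marginals
    pe = row_y * col_y + row_n * col_n                     # agreement expected by chance
    kappa = (p0 - pe) / (1 - pe)                           # chance-corrected index
    return p0, pe, kappa

print(cohen_kappa_2x2(40, 10, 5, 45))                      # illustrative counts only
```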
15
Study of Agreement [RS-ME]
2 x 2 Table : Cohen’s Kappa (κ) Coefficient
[Table: Retina Specialist 1 vs Retina Specialist 2, Presence/Absence of Macular Edema, with subtotals; cell counts not recovered.]
% agreement : 284 / 400 = 71% = π0 [Observed]
% Chancy Agreement : %Yes.%Yes + %No.%No = (330/400)(326/400) + (70/400)(74/400) = 0.825 x 0.815 + 0.175 x 0.185 = 70.48% = πe [expected by chance]
κ = [π0 – πe] / [1 – πe] = 1.8 % ....very poor agreement.....
Net Agreement ..... Standardized Agreement Index
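As a quick check of the slide's arithmetic, the marginal proportions quoted above (330/400, 326/400, 70/400, 74/400) together with the 71% observed agreement reproduce πe ≈ 70.48% and κ ≈ 1.8%; a short Python verification:

```python
# Verify the RS1 vs RS2 (Macular Edema) figures quoted on the slide.
p0 = 0.71                                       # observed agreement
pe = (330/400)*(326/400) + (70/400)*(74/400)    # chance agreement from the marginals
kappa = (p0 - pe) / (1 - pe)
print(pe, kappa)                                # approximately 0.7048 and 0.018, i.e. about 1.8%
```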
16
Marginal Agreement vs Overall Agreement ….
Up front : Case of Marginal Agreement
Should not be judged by Marginal Agreement alone
Must look into all the 400 images and verify agreement case by case to decide on the extent of overall agreement…..
Pairwise κ-Index for Macular Edema
RS1 vs RS2 ….. 1.8 %
RS1 vs RS3 ….. %
RS2 vs RS3 ….. 0 %
No or very poor overall agreement…..
17
Other Features….. #1. Diabetic Retinopathy Severity [6 options] No Retinopathy / Mild / Moderate / Severe / Critical / Ungradable A bit tricky options.... #2. Macular Edema [2 options]....done Presence / Absence #3. Referral to Ophthalmologists [2 options] Referrals / Non-Referrals #3 is similar to #2 : 2 x 2 Table [R vs NR]
18
Marginal Summary of Data : RS
[Table: classification of the 400 patients by Diabetic Retinopathy status (Nil / Mild / Moderate / Severe / Critical / Ungradable) as rated by RS1, RS2 and RS3, with totals; counts not recovered.]
Remark : Reasonably good agreement ….very good agreement between RS2 & RS3 indeed….. Inside story ? Chance-Corrected Kappa Index ?
19
Retina Specialists’ Ratings [DR]
[6 x 6 cross-table of DR severity codes: RS1 vs RS2, with row and column totals; counts not recovered.]
20
Retina Specialists’ Ratings [DR]
[6 x 6 cross-table of DR severity codes: RS1 vs RS3, with row and column totals; counts not recovered.]
21
Retina Specialists’ Ratings [DR]
[6 x 6 cross-table of DR severity codes: RS2 vs RS3, with row and column totals; counts not recovered.]
22
Retina Specialists’ Consensus Rating [DR]
[6 x 6 cross-table of DR severity codes: RS1 vs the Retina Specialists' Consensus Rating (RSCR), with row and column totals; counts not recovered.]
23
Understanding the 6x6 Table....
[6 x 6 cross-table of DR severity codes for Retina Specialist 1 vs Retina Specialist 2, with row and column totals; counts not recovered.]
24
κ-Computation……
% Agreement = (247+18+40+2+9+6)/400 = 322/400 = 0.805 = 80.5 % = π0
% Chancy Agreement = (252/400)(286/400) + …. + (12/400)(7/400) = …. % = πe
κ = [π0 – πe] / [1 – πe] = 62% !
Note : 100% Credit for ’Hit’ & No Credit for ’Miss’.
Criticism : Heavy Penalty for narrowly missed !
Concept of Weighted Kappa
25
Hit or Miss….. 100% credit for ’hit’ along the diagonal
[The same 6 x 6 cross-table (RS1 vs RS2), with the diagonal ’hit’ cells highlighted; counts not recovered.]
26
Table of Weights for 6x6 Ratings
Ratings i \ j [1 to 6] : weights w_ij
1 : 25/25 24/25 21/25 16/25 9/25 0/25
2 : 24/25 25/25 24/25 21/25 16/25 9/25
3 : 21/25 24/25 25/25 24/25 21/25 16/25
4 : 16/25 21/25 24/25 25/25 24/25 21/25
5 : 9/25 16/25 21/25 24/25 25/25 24/25
6 : 0/25 9/25 16/25 21/25 24/25 25/25
Formula : w_ij = 1 – [(i – j)^2 / (6 – 1)^2]
27
Formula for Weighted Kappa
π0(w) = ∑∑ w_ij f_ij / n
πe(w) = ∑∑ w_ij (f_i. /n)(f_.j /n)
These ∑∑ are over ALL cells, with f_ij as the frequency in the (i,j)th cell.
κ(w) = [π0(w) – πe(w)] / [1 – πe(w)]
For unweighted Kappa : we take into account only the cell frequencies along the main diagonal, with 100% weight.
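A minimal Python sketch of these formulas, assuming the quadratic weights w_ij = 1 – (i – j)^2 / (k – 1)^2 from the previous slide; the caller supplies a k x k table of counts (the study's 6 x 6 counts are not reproduced here):

```python
import numpy as np

def weighted_kappa(freq):
    """Weighted kappa for a square k x k table of counts `freq`,
    using quadratic weights w_ij = 1 - (i - j)^2 / (k - 1)^2."""
    f = np.asarray(freq, dtype=float)
    n = f.sum()
    k = f.shape[0]
    i, j = np.indices((k, k))
    w = 1.0 - (i - j) ** 2 / (k - 1) ** 2        # weight matrix from the previous slide
    row = f.sum(axis=1) / n                      # Rater I marginal proportions f_i./n
    col = f.sum(axis=0) / n                      # Rater II marginal proportions f_.j/n
    p0_w = (w * f / n).sum()                     # weighted observed agreement, pi_0(w)
    pe_w = (w * np.outer(row, col)).sum()        # weighted chance agreement, pi_e(w)
    return (p0_w - pe_w) / (1 - pe_w)
```

If the weight matrix is replaced by the identity (w_ii = 1, all off-diagonal weights 0), this reduces to the unweighted κ of the earlier slides.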
28
κ-statistics for Pairs of Raters
[Table of κ-coefficients by category (DR, ME, Referral) for the Retina Specialist pairs 1 vs 2, 1 vs 3 and 2 vs 3; values not recovered.] Interpretation : Usually 70 % or more...sign of satisfactory agreement.... Not a very exciting form of agreement here...
29
κ for Multiple Raters’ Agreement
How to judge agreement among Retina Specialists vs Ophthalmologists Retina Specialists vs Photographers Retina Specialists vs Nurses and so on..... Needed computational formulae for a single Index of Agreement for each Category of Raters....for Category-wise Comparisons... Research Papers and Books.....
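The talk does not spell out which single multi-rater index was used; one standard choice in the literature is Fleiss' kappa, sketched below in Python under that assumption (the toy data are hypothetical):

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa: `counts` is an N x k matrix where counts[i, j] is the
    number of raters placing subject i in category j (each row sums to m raters)."""
    c = np.asarray(counts, dtype=float)
    N, k = c.shape
    m = c[0].sum()                                      # raters per subject
    p_j = c.sum(axis=0) / (N * m)                       # overall category proportions
    P_i = (c * (c - 1)).sum(axis=1) / (m * (m - 1))     # per-subject observed agreement
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()           # mean observed vs chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 4 images, 3 raters, 2 categories (hypothetical data).
print(round(fleiss_kappa([[3, 0], [2, 1], [0, 3], [1, 2]]), 3))
```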
30
κ-statistic for Multiple Raters…
[Table of multi-rater κ values by category (DR, ME, Referral) for Retina Specialists, General Ophthalmologists, Photographers, Nurses and All Raters; values not recovered.] Except for the Retina Specialists, no other expert group shows good agreement on any feature
31
Conclusion based on the κ-Study
Of all 400 cases….. 44 warranted Referral to Ophthalmologists due to Retinopathy Severity 5 warranted Referral to Ophthalmologists due to uncertainty in diagnosis Fourth Retina Specialist carried out Dilated Fundus Exam of these 44 patients and substantial agreement [κ = 0.68] was noticed for DR severity…… Exam confirmed Referral of 38 / 44 cases.
32
Discussion on the Study
Retina Specialists : All in active clinical practice : Most reliable for digital image interpretation Individual Rater’s background and experience play roles in digital image interpretation Unusually high % of ungradable images among nonphysician raters, though only 5 out of 400 were declared as ’ungradable’ by consensus of the Retina Specialists’ Group. Lack of Confidence of Nonphysicians, rather than true image ambiguity ! For this study, other factors [blood pressure, blood sugar, cholesterol etc] not taken into account……
33
References….. Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational & Psychological Measurement, 20(1): 37 – 46. [Famous for Cohen’s Kappa] Cohen, J. (1968). Weighted Kappa : Nominal Scale Agreement with Provision for Scaled Disagreement or Partial Credit. Psychological Bulletin, 70(4):
34
References…. Lin, L. I. (2000). Total Deviation Index for Measuring Individual Agreement : With Application in Lab Performance and Bioequivalence. Statistics in Medicine, 19 : Lin, L. I., Hedayat, A. S., Sinha, Bikas & Yang, Min (2002). Statistical Methods in Assessing Agreement : Models, Issues, and Tools. Jour. Amer. Statist. Assoc. 97 (457)
35
References…. Banerjee, M., Capozzoli, M., McSweeney,
L. & Sinha, D. (1999). Beyond Kappa : A Review of Interrater Agreement Measures. Canadian Jour. of Statistics, 27(1) : Sinha, B.K., Tiensuwan, M. & Pharita (2007). Cohen’s Kappa Statistic : A Critical Appraisal and Some Modifications. Calcutta Statistical Association Bulletin, 58,
36
The End Thank you for your attention ! Bikas K Sinha Sept 13-14, 2017