On Some Statistical Aspects of Agreement Among Measurements


1 On Some Statistical Aspects of Agreement Among Measurements
Bikas Sinha [ISI, Kolkata]
Math. & Stat. Sciences, ASU [Tempe]
February 26, 2016

2 Quotes of the Day
“I now tend to believe … somehow … for so long … I was completely wrong.”
“Ah! That’s good. You and I finally agree!”
***************
“When two men of science disagree, they do not invoke the secular arm; they wait for further evidence to decide the issue, because, as men of science, they know that neither is infallible.”

3 Latest Book on Measuring Agreement

4 Book Chapters….
1. Introduction
1.1 Precision, Accuracy, and Agreement
1.2 Traditional Approaches for Continuous Data
1.3 Traditional Approaches for Categorical Data

5 Chapter 2
2. Continuous Data
2.1 Basic Model
2.2 Absolute Indices
2.2.1 Mean Squared Deviation
2.2.2 Total Deviation Index
2.2.3 Coverage Probability
2.3 Relative Indices
2.3.1 Intraclass Correlation Coefficient
2.3.2 Concordance Correlation Coefficient

6 Chapter 3
3. Categorical Data
3.1 Basic Approach When Target Values Are Random
3.1.1 Data Structure
3.1.2 Absolute Indices
3.1.3 Relative Indices: Kappa and Weighted Kappa

7 Seminar Plan
Agreement for Categorical Data [Part I]: 30 minutes
Agreement for Continuous Data [Part II]: 25 minutes
Discussion: … minutes

8 Key References: Part I
Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational & Psychological Measurement, 20(1): 37-46. [Famous for Cohen’s Kappa]
Cohen, J. (1968). Weighted Kappa: Nominal Scale Agreement with Provision for Scaled Disagreement or Partial Credit. Psychological Bulletin, 70(4):

9 References…. contd.
Banerjee, M., Capozzoli, M., McSweeney, L. & Sinha, D. (1999). Beyond Kappa: A Review of Interrater Agreement Measures. Canadian Journal of Statistics, 27(1): 3-23.

10 Measurements: Provided by Experts / Observers / Raters
Could be two or more systems, assessors, chemists, psychologists, radiologists, clinicians, nurses, rating systems or raters, diagnoses or treatments, instruments or methods, processes, techniques or formulae……
Rater…. the generic term

11 Agreement: Categorical Data
Illustrative Example: Study on Diabetic Retinopathy Screening
Problem: Interpretation of Single-Field Digital Fundus Images
Assessment of Agreement WITHIN / ACROSS 4 EXPERT GROUPS
Retina Specialists / General Ophthalmologists / Photographers / Nurses: 3 from each Group

12 Description of Study Material
400 Diabetic Patients, selected randomly from a community hospital in Bangkok
One good single-field digital fundus image taken from each patient, with signed consent
Approved by the Ethical Committee on Research with Human Subjects
Raters: allowed to magnify / move the images, NOT to modify brightness / contrast

13 THREE Major Features
#1. Diabetic Retinopathy Severity [6 options]: No Retinopathy / Mild NPDR / Moderate NPDR / Severe NPDR / PDR / Ungradable
#2. Macular Edema [3 options]: Presence / Absence / Ungradable
#3. Referral to Ophthalmologists [3 options]: Referrals / Non-Referrals / Uncertain

14 Retina Specialists’ Ratings [DR]
[Table: RS1 \ RS2 cross-classification over the DR codes, with row and column totals]

15 Retina Specialists’ Consensus Rating [DR]
[Table: RS1 \ RSCR (consensus rating) cross-classification over the DR codes, with row and column totals]

16 Retina Specialists’ Ratings [Macular Edema]
[Table: RS1 \ RS2 cross-classification over Presence / Absence (with subtotal) / Ungradable, with totals]

17 Retina Specialists’ Consensus Rating [ME]
[Table: RS1 \ RSCR cross-classification over Presence / Absence (with subtotal) / Ungradable, with totals]

18 Cohen’s Kappa for 2x2 Rating
Rater I vs Rater II: 2 x 2 case
Categories: Yes & No; π(i,j) = proportion in cell (i,j)
π(Y,Y) & π(N,N): agreement proportions; π(Y,N) & π(N,Y): disagreement proportions
π0 = π(Y,Y) + π(N,N) = P[agreement]
πe = π(Y,.) π(.,Y) + π(N,.) π(.,N) = P[chance agreement]
κ = [π0 - πe] / [1 - πe]: Chance-corrected Agreement Index
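A minimal sketch of this 2 x 2 computation in plain Python (the function name and the example proportions are mine, for illustration only, not values from the study):

```python
def cohen_kappa_2x2(p_yy, p_yn, p_ny, p_nn):
    """Chance-corrected agreement for a 2 x 2 table of proportions.

    p_xy = proportion of subjects rated x by rater I and y by rater II;
    the four proportions must sum to 1.
    """
    p0 = p_yy + p_nn                            # observed agreement
    row_y, row_n = p_yy + p_yn, p_ny + p_nn     # rater I marginals
    col_y, col_n = p_yy + p_ny, p_yn + p_nn     # rater II marginals
    pe = row_y * col_y + row_n * col_n          # agreement expected by chance
    return (p0 - pe) / (1.0 - pe)

# Hypothetical example: 45% (Y,Y), 10% (Y,N), 5% (N,Y), 40% (N,N)
print(cohen_kappa_2x2(0.45, 0.10, 0.05, 0.40))  # about 0.70
```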

19 Study of Agreement [RS-ME]
2 x 2 Table: Cohen’s Kappa (κ) Coefficient, Retina Specialist 1 vs Retina Specialist 2 (Presence / Absence, with subtotals)
IGNORED ’Ungradable’ to work with a 2 x 2 table
% agreement: π0 = (sum of the two diagonal counts) / 377
% chance agreement: πe = (337/377)(344/377) + (40/377)(33/377) ≈ 0.825
κ = [π0 - πe] / [1 - πe] = 56% only! Net Agreement, Standardized

20 What About Multiple Ratings like Diabetic Retinopathy [DR]?
[Table: Retina Specialist 1 \ Retina Specialist 2 cross-classification over the six DR codes, with totals]

21 κ-Computation……
% Agreement: π0 = (247+18+40+2+9+6)/400 = 322/400 = 0.805
% Chance Agreement: πe = (252/400)(286/400) + …. + (12/400)(7/400)
κ = [π0 - πe] / [1 - πe] = 62%!
Note: 100% credit for a ’Hit’ & no credit for a ’Miss’.
Criticism: heavy penalty for narrowly missed ratings! Hence the concept of Weighted Kappa.
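The same chance correction applies to any c x c table of counts; a small sketch in plain Python (the 3 x 3 table at the end is invented purely to show the input format, since the off-diagonal counts of the 6 x 6 DR table are not reproduced here):

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa from a c x c table of counts,
    table[i][j] = # subjects put in category i by rater I and j by rater II."""
    c = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(table[i]) for i in range(c)]
    col_tot = [sum(table[i][j] for i in range(c)) for j in range(c)]
    p0 = sum(table[i][i] for i in range(c)) / n                 # hits only
    pe = sum(row_tot[i] * col_tot[i] for i in range(c)) / n**2  # chance agreement
    return (p0 - pe) / (1.0 - pe)

# Made-up 3-category example
print(cohen_kappa([[50, 5, 2],
                   [4, 30, 6],
                   [1, 7, 20]]))
```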

22 Table of Weights for 6x6 Ratings
Ratings [1 to 6]:
1      24/25  21/25  16/25  9/25   0
24/25  1      24/25  21/25  16/25  9/25
21/25  24/25  1      24/25  21/25  16/25
16/25  21/25  24/25  1      24/25  21/25
9/25   16/25  21/25  24/25  1      24/25
0      9/25   16/25  21/25  24/25  1
Formula: w_ij = 1 - [(i - j)^2 / (6 - 1)^2]

23 Formula for Weighted Kappa
π0(w) = ∑∑ w_ij f_ij / n
πe(w) = ∑∑ w_ij (f_i. / n)(f_.j / n)
These ∑∑ run over ALL cells, with f_ij the frequency in the (i,j)th cell
κ_w = [π0(w) - πe(w)] / [1 - πe(w)]
For unweighted Kappa we take into account only the cell frequencies along the main diagonal, with 100% weight

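A sketch that builds the quadratic weight table of the previous slide and applies these two formulas (plain Python; the counts in the usage line are hypothetical, not study data):

```python
def quadratic_weights(c):
    """w_ij = 1 - (i - j)^2 / (c - 1)^2: weight 1 on the main diagonal,
    weight 0 in the two extreme corners (c = 6 reproduces the 6x6 table)."""
    return [[1.0 - (i - j) ** 2 / (c - 1) ** 2 for j in range(c)] for i in range(c)]

def weighted_kappa(f):
    """Weighted kappa from a c x c table of counts f[i][j]."""
    c = len(f)
    n = sum(sum(row) for row in f)
    w = quadratic_weights(c)
    fi = [sum(f[i]) for i in range(c)]                       # row margins
    fj = [sum(f[i][j] for i in range(c)) for j in range(c)]  # column margins
    p0w = sum(w[i][j] * f[i][j] for i in range(c) for j in range(c)) / n
    pew = sum(w[i][j] * (fi[i] / n) * (fj[j] / n) for i in range(c) for j in range(c))
    return (p0w - pew) / (1.0 - pew)

# Hypothetical 3-category table of counts
print(weighted_kappa([[40, 5, 1], [6, 30, 4], [0, 3, 25]]))
```
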
24 κ-statistics for Pairs of Raters
[Table: κ for DR, ME and Referral; rows: Retina Specialists 1 vs 2, 1 vs 3, 2 vs 3, 1 vs CGroup, 2 vs CGroup, 3 vs CGroup]

25 κ for Multiple Raters’ Agreement
Judgement on simultaneous agreement of multiple raters with multiple classification of attributes….....
# Raters = n
# Subjects = k
# Mutually Exclusive & Exhaustive Nominal Categories = c
Example.... Retina Specialists (n = 3), Patients (k = 400) & DR (c = 6 codes)

26 Formula for Kappa
Set k_ij = # raters who assign the ith subject to the jth category
P_J = ∑_i k_ij / (nk) = proportion of all assignments to the jth category
Chance-corrected agreement for category J:
κ_J = [∑_i k_ij^2 - knP_J {1 + (n-1)P_J}] / [kn(n-1) P_J (1 - P_J)]

27 Computation of Kappa
Chance-corrected measure of over-all agreement:
κ = ∑_J (Numerator of κ_J) / ∑_J (Denominator of κ_J)
Interpretation…. Intraclass correlation
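A sketch of these per-category and overall computations (plain Python; the small rating matrix at the end is invented only to show the input format: rows are subjects, columns are categories, each row sums to the number of raters n):

```python
def multi_rater_kappa(K, n):
    """K[i][j] = number of the n raters assigning subject i to category j.
    Returns (per-category kappas, overall kappa) following the slide's formulas.
    Assumes every category receives at least one assignment (else P_J = 0)."""
    k = len(K)       # number of subjects
    c = len(K[0])    # number of categories
    kappas, nums, dens = [], [], []
    for j in range(c):
        col = [K[i][j] for i in range(k)]
        P = sum(col) / (n * k)                          # share of assignments in category j
        num = sum(x * x for x in col) - k * n * P * (1 + (n - 1) * P)
        den = k * n * (n - 1) * P * (1 - P)
        kappas.append(num / den)
        nums.append(num)
        dens.append(den)
    return kappas, sum(nums) / sum(dens)

# Invented toy data: 5 subjects, 3 categories, n = 3 raters
K = [[3, 0, 0], [2, 1, 0], [0, 3, 0], [0, 1, 2], [0, 0, 3]]
print(multi_rater_kappa(K, n=3))
```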

28 κ-statistic for multiple raters…
[Table: κ for DR, ME and Referral; rows: Retina Specialists, General Ophthalmologists, Photographers, Nurses, All Raters]
Except for the Retina Specialists, no other expert group shows good agreement on any feature

29 Conclusion based on κ-Study
Of all 400 cases…..
44 warranted Referral to Ophthalmologists due to Retinopathy Severity
5 warranted Referral to Ophthalmologists due to uncertainty in diagnosis
A fourth Retina Specialist carried out a Dilated Fundus Exam of these 44 patients, and substantial agreement [κ = 0.68] was noticed for DR severity……
The exam confirmed Referral in 38 / 44 cases.

30 Discussion on the Study
Retina Specialists: all in active clinical practice; most reliable for digital image interpretation
Individual raters’ background and experience play roles in digital image interpretation
Unusually high % of ungradable images among nonphysician raters, though only 5 out of 400 were declared ’ungradable’ by consensus of the Retina Specialists’ group
This suggests lack of confidence of nonphysicians, rather than true image ambiguity!
For this study, other factors [blood pressure, blood sugar, cholesterol etc.] were not taken into account……

31 That’s it in Part I …… Part II : Continuous Data Set-up

32

33

34 Cohen’s Kappa : Need for Further Theoretical Research
COHEN’S KAPPA STATISTIC: A CRITICAL APPRAISAL AND SOME MODIFICATIONS Sinha et al (2007) Calcutta Statistical Association Bulletin, 58,

35 Further Theoretical Studies on Kappa-Statistics….
Recent study on Kappa: attaining the limits. Where’s the problem?
κ = [π0 - πe] / [1 - πe], Range: -1 ≤ κ ≤ 1
κ = 1 iff 100% Perfect Rankings
κ = 0 iff 100% Chancy Ranking
κ = -1 iff 100% Imperfect AND Split-Half [?]

36 Why Split Half? Example
π(Presence, Presence) = π(Absence, Absence) = 0; off-diagonal cells 30% and 70%
κ = - 73% [& not - 100%]
************************************
Only the Split-Half case, 50% / 50%, provides κ = - 1

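Both claims are easy to check numerically; a quick plain-Python verification (self-contained, repeating the 2 x 2 kappa formula):

```python
def kappa(p_yy, p_yn, p_ny, p_nn):
    # kappa = (p0 - pe) / (1 - pe) for a 2 x 2 table of proportions
    p0 = p_yy + p_nn
    pe = (p_yy + p_yn) * (p_yy + p_ny) + (p_ny + p_nn) * (p_yn + p_nn)
    return (p0 - pe) / (1 - pe)

print(kappa(0.0, 0.3, 0.7, 0.0))   # about -0.72, not -1
print(kappa(0.0, 0.5, 0.5, 0.0))   # exactly -1 only for the 50/50 split
```
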
37 Kappa Modification…
This modification originated from
κ_M = [π0 - πe] / [A - πe]
suggesting a value of ‘A’ to take care of the situations
π(Y,Y) = π(N,N) = 0 and π(Y,N) = α and π(N,Y) = 1 - α for all α, along with κ_M = -1.

38 Kappa Modification…. M = -2α(1-α) / [A – 2α(1-α) ] = -1 implies
The above implies M = -2α(1-α) / [A – 2α(1-α) ] = -1 implies A = 4α(1-α) It is seen that α has a dual interpretation [= (Y,.) = (.,N) and hence a choice is given by α = [(Y,.) + (.,N)]/2. Substituting for α in A and upon simplification, we end up with M1

39 Kappa-Modified….
κ_M1 = [π0 - πe] / [π(Y,.) π(N,.) + π(.,Y) π(.,N)]
κ_M1 satisfies:
κ_M1 = 1 iff 100% Perfect Rankings
κ_M1 = 0 iff 100% Chancy Ranking
κ_M1 = -1 iff 100% Imperfect Ranking
…whatever the distribution of frequencies across the categories, subject to perfect / imperfect ranking

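A minimal sketch comparing κ and κ_M1 on a 2 x 2 table of proportions (plain Python; the input values in the usage line are the split-half example of slide 36, everything else is my naming):

```python
def kappa_and_m1(p_yy, p_yn, p_ny, p_nn):
    """Returns (kappa, kappa_M1) for a 2 x 2 table of proportions."""
    p0 = p_yy + p_nn
    py_, pn_ = p_yy + p_yn, p_ny + p_nn   # rater I marginals pi(Y,.), pi(N,.)
    p_y, p_n = p_yy + p_ny, p_yn + p_nn   # rater II marginals pi(.,Y), pi(.,N)
    pe = py_ * p_y + pn_ * p_n
    k = (p0 - pe) / (1 - pe)
    k_m1 = (p0 - pe) / (py_ * pn_ + p_y * p_n)   # modified denominator
    return k, k_m1

# pi(Y,N) = 0.3, pi(N,Y) = 0.7, empty diagonal
print(kappa_and_m1(0.0, 0.3, 0.7, 0.0))   # kappa about -0.72, kappa_M1 = -1
```
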
40 Other Formulae…..
What if it is a priori known that there is 80% (observed) agreement between the two raters, i.e., π0 = 80%?
κ_Max = 1? κ_Min = -1? ... NOT really....
So we need standardization of κ as
κ_M2 = [κ - κ_Min] / [κ_Max - κ_Min]
where κ_Max & κ_Min are to be evaluated under the stipulated value of observed agreement

41 Standardization yields…
κ_M2 = [κ + (1 - π0)/(1 + π0)] / [π0^2 / {1 + (1 - π0)^2} + (1 - π0)/(1 + π0)]
κ_M3 = [κ_M1 + (1 - π0)/(1 + π0)] / [π0 / (2 - π0) + (1 - π0)/(1 + π0)]
Related inference procedures are studied.

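A sketch of these two standardized indices, written directly from the expressions above (plain Python; the exact κ_Min and κ_Max forms are my reading of this slide and should be checked against Sinha et al. 2007):

```python
def kappa_m2(kappa, p0):
    """Standardize kappa by its attainable range for fixed observed agreement p0,
    with kappa_min and kappa_max taken from the slide's expressions."""
    k_min = -(1 - p0) / (1 + p0)
    k_max = p0 ** 2 / (1 + (1 - p0) ** 2)
    return (kappa - k_min) / (k_max - k_min)

def kappa_m3(kappa_m1, p0):
    """Same standardization applied to kappa_M1, whose maximum is taken
    (per the slide) to be p0 / (2 - p0)."""
    k_min = -(1 - p0) / (1 + p0)
    k_max = p0 / (2 - p0)
    return (kappa_m1 - k_min) / (k_max - k_min)
```
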
42 Beyond Kappa….. A Review of Inter-rater Agreement Measures
Banerjee et al., Canadian Journal of Statistics, 1999; 3-23
Modelling Patterns of Agreement: Log-Linear Models, Latent Class Models

43 The End That’s it in Part I …… BKSinha

