On Some Statistical Aspects of Agreement Among Measurements

On Some Statistical Aspects of Agreement Among Measurements
Bikas Sinha [ISI, Kolkata]
Math. & Stat. Sciences, ASU [Tempe]
February 26, 2016

Quotes of the Day
“I now tend to believe … somehow … for so long … I was completely wrong.”
“Ah! That’s good. You and I finally agree!”
***************
“When two men of science disagree, they do not invoke the secular arm; they wait for further evidence to decide the issue, because, as men of science, they know that neither is infallible.”

Latest Book on Measuring Agreement

Book Chapters
Chapter 1. Introduction
1.1 Precision, Accuracy, and Agreement
1.2 Traditional Approaches for Continuous Data
1.3 Traditional Approaches for Categorical Data

Chapter 2. Continuous Data
2.1 Basic Model
2.2 Absolute Indices
2.2.1 Mean Squared Deviation
2.2.2 Total Deviation Index
2.2.3 Coverage Probability
2.3 Relative Indices
2.3.1 Intraclass Correlation Coefficient
2.3.2 Concordance Correlation Coefficient

Chapter 3. Categorical Data
3.1 Basic Approach When Target Values Are Random
3.1.1 Data Structure
3.1.2 Absolute Indices
3.1.3 Relative Indices: Kappa and Weighted Kappa

Seminar Plan
Agreement for Categorical Data [Part I] … 30 minutes
Agreement for Continuous Data [Part II] … 25 minutes
Discussion … 5 minutes

Key References: Part I
Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational & Psychological Measurement, 20(1): 37-46. [Famous for Cohen’s Kappa]
Cohen, J. (1968). Weighted Kappa: Nominal Scale Agreement with Provision for Scaled Disagreement or Partial Credit. Psychological Bulletin, 70(4): 213-220.

References … contd.
Banerjee, M., Capozzoli, M., McSweeney, L. & Sinha, D. (1999). Beyond Kappa: A Review of Interrater Agreement Measures. Canadian Journal of Statistics, 27(1): 3-23.

Measurements: provided by experts / observers / raters.
They could come from two or more systems, assessors, chemists, psychologists, radiologists, clinicians, nurses, rating systems, diagnoses or treatments, instruments or methods, processes, techniques, or formulae.
“Rater” is used as the generic term.

Agreement for Categorical Data: Illustrative Example
Study on Diabetic Retinopathy Screening
Problem: interpretation of single-field digital fundus images
Assessment of agreement WITHIN / ACROSS 4 expert groups:
Retina Specialists / General Ophthalmologists / Photographers / Nurses, 3 raters from each group

Description of Study Material
400 diabetic patients, selected randomly from a community hospital in Bangkok.
One good single-field digital fundus image taken from each patient, with signed consent.
Approved by the Ethical Committee on Research with Human Subjects.
Raters were allowed to magnify / move the images, but NOT to modify brightness / contrast.

THREE Major Features
#1. Diabetic Retinopathy Severity [6 options]: No Retinopathy / Mild NPDR / Moderate NPDR / Severe NPDR / PDR / Ungradable
#2. Macular Edema [3 options]: Presence / Absence / Ungradable
#3. Referral to Ophthalmologists [3 options]: Referral / Non-Referral / Uncertain

Retina Specialists’ Ratings [DR]
RS1 \ RS2      0     1     2     3     4     9   Total
0            247     2     2     1     0     0     252
1             12    18     7     1     0     0      38
2             22    10    40     8     0     1      81
3              0     0     3     2     2     0       7
4              0     0     0     1     9     0      10
9              5     0     1     0     0     6      12
Total        286    30    53    13    11     7     400

Retina Specialists’ Consensus Rating [DR]
RS1 \ RSCR     0     1     2     3     4     9   Total
0            252     0     0     0     0     0     252
1             17    19     2     0     0     0      38
2             15    19    43     2     1     1      81
3              0     0     2     4     1     0       7
4              0     0     0     0    10     0      10
9              8     0     0     0     0     4      12
Total        292    38    47     6    12     5     400

Retina Specialists’ Ratings [Macular Edema]
RS1 \ RS2     Presence   Absence   Subtotal   Ungradable   Total
Presence         326        11        337          1        338
Absence           18        22         40          3         43
Subtotal         344        33        377         --         --
Ungradable         9         0         --         10         19
Total            353        33         --         14        400

Retina Specialists’ Consensus Rating [ME]
RS1 \ RSCR    Presence   Absence   Subtotal   Ungradable   Total
Presence         335         2        337          1        338
Absence           10        33         43          0         43
Subtotal         345        35        380         --         --
Ungradable        10         0         --          9         19
Total            355        35         --         10        400

Cohen’s Kappa for 2x2 Rating
Rater I vs Rater II: 2 x 2 case, categories Yes & No, with cell proportions π(i, j).
π(Y,Y) & π(N,N): agreement proportions; π(Y,N) & π(N,Y): disagreement proportions.
π_0 = π(Y,Y) + π(N,N) = P[agreement]
π_e = π(Y,.) π(.,Y) + π(N,.) π(.,N) = P[chancy agreement]
κ = [π_0 − π_e] / [1 − π_e] … the chance-corrected agreement index.

Study of Agreement [RS-ME]
2 x 2 Table: Cohen’s Kappa (κ) Coefficient, Retina Specialist 1 vs Retina Specialist 2
             Presence   Absence   Subtotal
Presence        326        11        337
Absence          18        22         40
Subtotal        344        33        377
“Ungradable” is IGNORED to work with a 2 x 2 table.
% agreement: (326 + 22) / 377 = 0.9231 = π_0
% chancy agreement: %Yes · %Yes + %No · %No = (337/377)(344/377) + (40/377)(33/377) = 0.8250 = π_e
κ = [π_0 − π_e] / [1 − π_e] = 56% only! Net agreement, standardized.
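A minimal sketch of this computation in plain Python (no external libraries); the counts are the 2 x 2 RS-ME table above with “Ungradable” dropped, and the variable names are purely illustrative.

```python
# Cohen's kappa for the 2 x 2 Macular Edema table above (Ungradable ignored).
table = [[326, 11],   # RS1 Presence: RS2 Presence, RS2 Absence
         [18,  22]]   # RS1 Absence:  RS2 Presence, RS2 Absence

n = sum(sum(row) for row in table)                                     # 377
p_o = sum(table[i][i] for i in range(2)) / n                           # observed agreement
row_marg = [sum(r) / n for r in table]                                 # RS1 marginals
col_marg = [sum(table[i][j] for i in range(2)) / n for j in range(2)]  # RS2 marginals
p_e = sum(row_marg[i] * col_marg[i] for i in range(2))                 # chance agreement
kappa = (p_o - p_e) / (1 - p_e)
print(round(p_o, 4), round(p_e, 4), round(kappa, 2))                   # ~ 0.9231 0.8249 0.56
```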

What About Multiple Ratings like Diabetic Retinopathy [DR]?
RS1 \ RS2      0     1     2     3     4     9   Total
0            247     2     2     1     0     0     252
1             12    18     7     1     0     0      38
2             22    10    40     8     0     1      81
3              0     0     3     2     2     0       7
4              0     0     0     1     9     0      10
9              5     0     1     0     0     6      12
Total        286    30    53    13    11     7     400

κ-Computation
% agreement = (247 + 18 + 40 + 2 + 9 + 6)/400 = 322/400 = 0.8050 = π_0
% chance agreement = (252/400)(286/400) + … + (12/400)(7/400) = 0.4860 = π_e
κ = [π_0 − π_e] / [1 − π_e] = 62%!
Note: 100% credit for a “hit” & no credit for a “miss”.
Criticism: heavy penalty for a narrowly missed rating! Hence the concept of Weighted Kappa.

Table of Weights for 6x6 Ratings
Ratings [1 to 6]     1       2       3       4       5       6
1                    1     24/25   21/25   16/25    9/25     0
2                  24/25     1     24/25   21/25   16/25    9/25
3                  21/25   24/25     1     24/25   21/25   16/25
4                  16/25   21/25   24/25     1     24/25   21/25
5                   9/25   16/25   21/25   24/25     1     24/25
6                    0      9/25   16/25   21/25   24/25     1
Formula: w_ij = 1 − [(i − j)² / (6 − 1)²]

Formula for Weighted Kappa
π_0(w) = ∑∑ w_ij f_ij / n
π_e(w) = ∑∑ w_ij (f_i. / n)(f_.j / n)
These ∑∑ run over ALL cells, with f_ij the frequency in the (i, j)th cell.
For unweighted Kappa we take into account only the cell frequencies along the main diagonal, with 100% weight.
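A short sketch (plain Python, illustrative names) of unweighted versus quadratically weighted kappa, applied to the 6 x 6 RS1 vs RS2 DR table above with the weights w_ij = 1 − (i − j)²/(6 − 1)².

```python
# Unweighted vs. weighted kappa for the 6 x 6 DR table (codes 0,1,2,3,4,9).
f = [
    [247,  2,  2, 1, 0, 0],
    [ 12, 18,  7, 1, 0, 0],
    [ 22, 10, 40, 8, 0, 1],
    [  0,  0,  3, 2, 2, 0],
    [  0,  0,  0, 1, 9, 0],
    [  5,  0,  1, 0, 0, 6],
]
c = len(f)
n = sum(sum(row) for row in f)          # 400

def kappa(w):
    """Kappa with weight matrix w, using pi_0(w) and pi_e(w) as defined above."""
    row = [sum(f[i]) / n for i in range(c)]
    col = [sum(f[i][j] for i in range(c)) / n for j in range(c)]
    p_o = sum(w[i][j] * f[i][j] / n for i in range(c) for j in range(c))
    p_e = sum(w[i][j] * row[i] * col[j] for i in range(c) for j in range(c))
    return (p_o - p_e) / (1 - p_e)

identity  = [[1.0 if i == j else 0.0 for j in range(c)] for i in range(c)]
quadratic = [[1 - (i - j) ** 2 / (c - 1) ** 2 for j in range(c)] for i in range(c)]

print(round(kappa(identity), 2))     # unweighted: ~ 0.62, as computed above
print(round(kappa(quadratic), 2))    # weighted kappa with the table of weights above
```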

κ-statistics for Pairs of Raters
Retina Specialists     DR     ME    Referral
1 vs 2                0.63   0.58    0.65
1 vs 3                0.55   0.64    0.65
2 vs 3                0.56   0.51    0.59
1 vs CGroup           0.67   0.65    0.66
2 vs CGroup           0.70   0.65    0.66
3 vs CGroup           0.71   0.73    0.72

κ for Multiple Raters’ Agreement
Judgement on simultaneous agreement of multiple raters with multiple classifications of attributes.
# Raters = n
# Subjects = k
# Mutually exclusive & exhaustive nominal categories = c
Example: Retina Specialists (n = 3), Patients (k = 400) & DR (c = 6 codes)

Formula for Kappa
Set k_ij = # raters who assign the ith subject to the jth category.
P_j = ∑_i k_ij / (nk) = proportion of all assignments made to the jth category.
Chance-corrected agreement for category j:
κ_j = [∑_i k_ij² − knP_j {1 + (n − 1)P_j}] / [kn(n − 1)P_j (1 − P_j)]

Computation of Kappa
Chance-corrected measure of overall agreement:
κ = ∑_j (numerator of κ_j) / ∑_j (denominator of κ_j)
Interpretation … intraclass correlation.
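The per-subject assignment counts from the study are not reproduced on the slides, so the sketch below exercises the multi-rater formula on a small made-up data set (5 subjects, n = 3 raters, c = 3 categories); only the formula itself comes from the slides.

```python
# Chance-corrected overall agreement for n raters, k subjects, c categories.
# k_counts[i][j] = number of the n raters assigning subject i to category j.
def multi_rater_kappa(k_counts):
    k = len(k_counts)                  # subjects
    c = len(k_counts[0])               # categories
    n = sum(k_counts[0])               # raters per subject (assumed constant)
    P = [sum(k_counts[i][j] for i in range(k)) / (n * k) for j in range(c)]
    num = den = 0.0
    for j in range(c):
        s2 = sum(k_counts[i][j] ** 2 for i in range(k))
        num += s2 - k * n * P[j] * (1 + (n - 1) * P[j])   # numerator of kappa_j
        den += k * n * (n - 1) * P[j] * (1 - P[j])        # denominator of kappa_j
    return num / den

# Illustrative data only (NOT the study data): 5 subjects rated by 3 raters.
counts = [
    [3, 0, 0],
    [2, 1, 0],
    [0, 3, 0],
    [0, 1, 2],
    [0, 0, 3],
]
print(round(multi_rater_kappa(counts), 2))   # 0.6
```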

κ-statistic for Multiple Raters
Expert Group            DR     ME    Referral
Retina Specialists     0.58   0.58    0.63
Gen. Ophthalmo.        0.36   0.19    0.24
Photographers          0.37   0.38    0.30
Nurses                 0.26   0.20    0.20
All Raters             0.34   0.27    0.28
Except for the Retina Specialists, no other expert group shows good agreement on any feature.

Conclusion Based on the κ-Study
Of all 400 cases:
44 warranted referral to ophthalmologists due to retinopathy severity.
5 warranted referral to ophthalmologists due to uncertainty in diagnosis.
A fourth Retina Specialist carried out a dilated fundus exam of these 44 patients, and substantial agreement [κ = 0.68] was noticed for DR severity.
The exam confirmed referral of 38 / 44 cases.

Discussion on the Study
Retina Specialists, all in active clinical practice, are the most reliable raters for digital image interpretation.
An individual rater’s background and experience play roles in digital image interpretation.
There was an unusually high % of ungradable images among nonphysician raters, though only 5 out of 400 images were declared “ungradable” by consensus of the Retina Specialists’ group.
This suggests a lack of confidence of nonphysicians rather than true image ambiguity!
For this study, other factors [blood pressure, blood sugar, cholesterol, etc.] were not taken into account.

That’s it in Part I …… Part II : Continuous Data Set-up

Cohen’s Kappa: Need for Further Theoretical Research
“Cohen’s Kappa Statistic: A Critical Appraisal and Some Modifications”
Sinha et al. (2007), Calcutta Statistical Association Bulletin, 58, 151-169.

Further Theoretical Studies on Kappa-Statistics
Recent study on Kappa: attaining the limits. Where’s the problem?
κ = [π_0 − π_e] / [1 − π_e], Range: −1 ≤ κ ≤ 1
κ = 1 iff 100% perfect ranking
κ = 0 iff 100% chancy ranking
κ = −1 iff 100% imperfect ranking AND split-half [?]

Why Split Half? Example
With an empty diagonal and the disagreement split 30% / 70%:
              Presence   Absence
Presence        ----       30%
Absence          70%       ----
π_e = 2(0.3)(0.7) = 0.42, so κ ≈ −73% [& not −100%]
************************************
Only the split-half case
              Presence   Absence
Presence        ----       50%
Absence          50%       ----
provides κ = −1.
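A minimal numerical check of this point (plain Python); the two tables are the 30/70 and 50/50 splits just described.

```python
# With an empty diagonal, ordinary kappa reaches -1 only for the 50/50 split.
def kappa_2x2(p):                      # p = [[pi(Y,Y), pi(Y,N)], [pi(N,Y), pi(N,N)]]
    p_o = p[0][0] + p[1][1]
    p_e = (p[0][0] + p[0][1]) * (p[0][0] + p[1][0]) \
        + (p[1][0] + p[1][1]) * (p[0][1] + p[1][1])
    return (p_o - p_e) / (1 - p_e)

print(round(kappa_2x2([[0.0, 0.3], [0.7, 0.0]]), 2))   # -0.72 (the slide's ~ -73%), not -1
print(round(kappa_2x2([[0.0, 0.5], [0.5, 0.0]]), 2))   # -1.0
```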

Kappa Modification
This modification originated from κ_M = [π_0 − π_e] / [A − π_e], suggesting a value of A to take care of the situations π(Y,Y) = π(N,N) = 0, π(Y,N) = α and π(N,Y) = 1 − α, for all α, along with κ_M = −1.

Kappa Modification (contd.)
κ_M = −2α(1 − α) / [A − 2α(1 − α)] = −1 implies A = 4α(1 − α).
It is seen that α has a dual interpretation [α = π(Y,.) = π(.,N)], and hence a choice is given by α = [π(Y,.) + π(.,N)]/2.
Substituting for α in A and simplifying, we end up with κ_M1.

Kappa-Modified
κ_M1 = [π_0 − π_e] / [π(Y,.) π(N,.) + π(.,Y) π(.,N)]
κ_M1 satisfies:
κ_M1 = 1 iff 100% perfect ranking … whatever
κ_M1 = 0 iff 100% chancy ranking
κ_M1 = −1 iff 100% imperfect ranking … whatever
“whatever” … arbitrary distribution of frequencies across the categories, subject to perfect / imperfect ranking.
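A small sketch of κ_M1 as defined above (plain Python, illustrative function name), checking that it attains −1 for every α in the empty-diagonal case where ordinary kappa does not.

```python
# kappa_M1 = (pi_0 - pi_e) / [pi(Y,.)pi(N,.) + pi(.,Y)pi(.,N)]
def kappa_m1(p):                       # p = [[pi(Y,Y), pi(Y,N)], [pi(N,Y), pi(N,N)]]
    p_o = p[0][0] + p[1][1]
    row = [p[0][0] + p[0][1], p[1][0] + p[1][1]]   # pi(Y,.), pi(N,.)
    col = [p[0][0] + p[1][0], p[0][1] + p[1][1]]   # pi(.,Y), pi(.,N)
    p_e = row[0] * col[0] + row[1] * col[1]
    return (p_o - p_e) / (row[0] * row[1] + col[0] * col[1])

for alpha in (0.1, 0.3, 0.5, 0.9):
    print(alpha, round(kappa_m1([[0.0, alpha], [1 - alpha, 0.0]]), 4))   # -1.0 for every alpha
```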

Other Formulae
κ_Max = 1? κ_Min = −1? … NOT really…
What if it is a priori known that there is 80% (observed) agreement between the two raters, i.e., π_0 = 80%?
κ_Max = 1? κ_Min = −1? … NOT really…
So we need a standardization of κ as
κ_M2 = [κ − κ_Min] / [κ_Max − κ_Min],
where κ_Max and κ_Min are to be evaluated under the stipulated value of observed agreement.

Standardization yields
κ_M2 = [κ + (1 − π_0)/(1 + π_0)] / [{π_0² / [1 + (1 − π_0)²]} + {(1 − π_0)/(1 + π_0)}]
κ_M3 = [κ_M1 + (1 − π_0)/(1 + π_0)] / [{π_0 / (2 − π_0)} + {(1 − π_0)/(1 + π_0)}]
Related inference procedures are studied.
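Independent of the closed forms above, the standardization idea behind κ_M2 can be illustrated numerically: fix π_0, search the attainable range of κ over all 2 x 2 tables with that observed agreement, and rescale. The grid search below (plain Python) is illustrative only, not the Sinha et al. (2007) derivation, and the final table is made up.

```python
# Attainable range of kappa when observed agreement is fixed at pi_0 = 0.8,
# and the resulting standardized index (kappa - kappa_min)/(kappa_max - kappa_min).
def kappa_2x2(a, b, c, d):             # cell proportions (YY, YN, NY, NN), sum to 1
    p_o = a + d
    p_e = (a + b) * (a + c) + (c + d) * (b + d)
    return (p_o - p_e) / (1 - p_e)

pi_0 = 0.8
values = []
steps = 400
for i in range(steps + 1):             # a runs over [0, pi_0], with d = pi_0 - a
    a = pi_0 * i / steps
    for j in range(steps + 1):         # b runs over [0, 1 - pi_0], with c = 1 - pi_0 - b
        b = (1 - pi_0) * j / steps
        values.append(kappa_2x2(a, b, 1 - pi_0 - b, pi_0 - a))

k_max, k_min = max(values), min(values)
print(round(k_max, 3), round(k_min, 3))        # ~ 0.615 and ~ -0.111, not 1 and -1

k = kappa_2x2(0.70, 0.15, 0.05, 0.10)          # one made-up table with pi_0 = 0.8
print(round(k, 3), round((k - k_min) / (k_max - k_min), 3))   # ~ 0.385, standardized ~ 0.68
```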

Beyond Kappa: A Review of Inter-rater Agreement Measures
Banerjee et al., Canadian Journal of Statistics, 1999: 3-23.
Modelling patterns of agreement: log-linear models, latent class models.

The End
That’s it in Part I ……
BKSinha