Statistical Methods for Assessment of Agreement. Professor Dr. Bikas K Sinha, Applied Statistics Division, Indian Statistical Institute, Kolkata, INDIA. Organized by the Department of Statistics, RU. 17 April 2012.

Lecture Plan: 1. Agreement for Categorical Data [Part I]; 2. Coffee Break; 3. Agreement for Continuous Data [Part II]; 4. Discussion.

Key References
Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20(1): 37-46. [Famous for Cohen's Kappa]
Cohen, J. (1968). Weighted Kappa: Nominal Scale Agreement with Provision for Scaled Disagreement or Partial Credit. Psychological Bulletin, 70(4): 213-220.

References (contd.)
Banerjee, M., Capozzoli, M., McSweeney, L. & Sinha, D. (1999). Beyond Kappa: A Review of Interrater Agreement Measures. Canadian Journal of Statistics, 27(1): 3-23.
Lin, L. I. (1989). A Concordance Correlation Coefficient to Evaluate Reproducibility. Biometrics, 45.

References (contd.)
Lin, L. I. (2000). Total Deviation Index for Measuring Individual Agreement: With Application in Lab Performance and Bioequivalence. Statistics in Medicine, 19.
Lin, L. I., Hedayat, A. S., Sinha, Bikas & Yang, Min (2002). Statistical Methods in Assessing Agreement: Models, Issues, and Tools. Journal of the American Statistical Association, 97(457).

Measurements: provided by experts / observers / raters. These could be two or more systems, assessors, chemists, psychologists, radiologists, clinicians, nurses, rating systems or raters, diagnoses or treatments, instruments or methods, processes, techniques or formulae...

Diverse Application Areas: cross-checking of data for agreement; acceptability of a new or generic drug, of test instruments against standard instruments, or of a new method against a gold-standard method; statistical process control...

Nature of Agreement Problems: assessment and recording of responses, with two assessors for evaluation and recording. The raters examine each "unit" independently of one another and report separately: "+" for "Affected" or "-" for "OK" (discrete type).

Summary statistics: UNIT assessment table

Assessor I \ Assessor II      +      -
+                            40%     3%
-                             3%    54%

Q. What is the extent of agreement of the two assessors?

Nature of Data: two further examples of the same kind of table.

Assessor I \ Assessor II      +      -
+                            93%     2%
-                             4%     1%

Assessor I \ Assessor II      +      -
+                             3%    40%
-                            44%    13%

Same question: extent of agreement / disagreement?

Cohen's Kappa: Nominal Scales. Cohen (1960) proposed the Kappa statistic for measuring agreement when the responses are nominal.

Cohen's Kappa, Rater I vs Rater II: the 2 x 2 case. Categories Yes and No, with joint proportions pi(i, j).
pi(Y,Y) and pi(N,N): agreement proportions; pi(Y,N) and pi(N,Y): disagreement proportions.
Theta_0 = pi(Y,Y) + pi(N,N) = P[agreement]
Theta_e = pi(Y,.) pi(.,Y) + pi(N,.) pi(.,N) = P[chancy agreement]
K = [Theta_0 - Theta_e] / [1 - Theta_e], the chance-corrected agreement index.
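As a minimal sketch of this 2 x 2 computation (Python; the function name is an illustrative choice, not from the slides):

```python
def cohen_kappa_2x2(p_yy, p_yn, p_ny, p_nn):
    """Cohen's kappa for a 2x2 table of joint proportions pi(i, j),
    rows = Rater I (Y, N), columns = Rater II (Y, N)."""
    theta_0 = p_yy + p_nn                          # observed agreement
    p1_y, p1_n = p_yy + p_yn, p_ny + p_nn          # Rater I marginals pi(Y,.), pi(N,.)
    p2_y, p2_n = p_yy + p_ny, p_yn + p_nn          # Rater II marginals pi(.,Y), pi(.,N)
    theta_e = p1_y * p2_y + p1_n * p2_n            # chance agreement
    return (theta_0 - theta_e) / (1.0 - theta_e)   # chance-corrected index
```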

Kappa Computation:

Rater I \ Rater II     Yes      No     Total
Yes                   0.40    0.03     0.43
No                    0.03    0.54     0.57
Total                 0.43    0.57     1.00

Observed agreement: Theta_0 = pi(Y,Y) + pi(N,N) = 0.40 + 0.54 = 0.94 (94%).
Chance factor towards agreement: Theta_e = pi(Y,.) pi(.,Y) + pi(N,.) pi(.,N) = 0.43 x 0.43 + 0.57 x 0.57 = 0.5098 (about 51%).
K = [Theta_0 - Theta_e] / [1 - Theta_e] = 0.4302 / 0.4902 = 87.76%, the chance-corrected agreement.
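Reusing the sketch above on this table reproduces the quoted figures:

```python
# Joint proportions read off the table above: 0.40, 0.03 / 0.03, 0.54
k = cohen_kappa_2x2(0.40, 0.03, 0.03, 0.54)
# Theta_0 = 0.94, Theta_e = 0.43*0.43 + 0.57*0.57 = 0.5098,
# K = 0.4302 / 0.4902
print(round(k, 4))   # 0.8776
```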

Kappa Computations, Raters I vs II: the K value for each of the example tables above (see the sketch below).
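The K values themselves did not survive in this transcript; as a hedged illustration only, applying the same sketch to the two "Nature of Data" tables (assuming each table's proportions sum to 100%) gives roughly:

```python
# "Nature of Data" table 1: 93%, 2% / 4%, 1% -- high raw agreement, skewed marginals
print(round(cohen_kappa_2x2(0.93, 0.02, 0.04, 0.01), 2))   # ~ 0.22
# "Nature of Data" table 2: 3%, 40% / 44%, 13% -- mostly disagreement
print(round(cohen_kappa_2x2(0.03, 0.40, 0.44, 0.13), 2))   # ~ -0.69
```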

Nature of Categorical Data, Illustrative Example: a study on diabetic retinopathy screening. Problem: interpretation of single-field digital fundus images. Assessment of agreement within / across 4 expert groups: Retina Specialists / General Ophthalmologists / Photographers / Nurses, 3 raters from each group.

Description of Study Material: 400 diabetic patients selected randomly from a community hospital. One good single-field digital fundus image taken from each patient, with signed consent; approved by the Ethical Committee on Research with Human Subjects. Raters were allowed to magnify / move the images but NOT to modify brightness / contrast.

THREE Major Features. #1. Diabetic Retinopathy Severity [6 options]: No Retinopathy / Mild NPDR / Moderate NPDR / Severe NPDR / PDR / Ungradable. #2. Macular Edema [3 options]: Presence / Absence / Ungradable. #3. Referrals to Ophthalmologists [3 options]: Referral / Non-Referral / Uncertain.

Retina Specialists' Ratings [DR]: RS1 vs RS2. 6 x 6 cross-tabulation of counts by DR code, with row and column totals. [Counts not preserved in this transcript.]

Retina Specialists' Ratings [DR]: RS1 vs RS3. Same 6 x 6 cross-tabulation by DR code. [Counts not preserved in this transcript.]

Retina Specialists' Ratings [DR]: RS2 vs RS3. Same 6 x 6 cross-tabulation by DR code. [Counts not preserved in this transcript.]

Retina Specialists' Consensus Rating [DR]: RS1 vs RSCR (the consensus rating). Same 6 x 6 cross-tabulation by DR code. [Counts not preserved in this transcript.]

Retina Specialists' Ratings [Macular Edema]: RS1 vs RS2. Cross-tabulation of counts by ME code (Presence / Absence / Ungradable), with subtotals and totals. [Counts not preserved in this transcript.]

Retina Specialists' Ratings [ME]: RS1 vs RS3. Same cross-tabulation by ME code. [Counts not preserved in this transcript.]

Retina Specialists' Consensus Rating [ME]: RS1 vs RSCR. Same cross-tabulation by ME code. [Counts not preserved in this transcript.]

Photographers on Diabetic ME: Photographer 1 vs Photographer 2. Cross-tabulation of counts by ME code (Presence / Absence / Ungradable), with subtotals and totals. [Counts not preserved in this transcript.]

Photographers' Consensus Rating on Diabetic Macular Edema: Photographer 1 vs the Photographers' consensus rating. Same cross-tabulation by ME code. [Counts not preserved in this transcript.]

Study of RS's Agreement [ME]: 2 x 2 table, Cohen's Kappa (K) coefficient. Retina Specialist 1 vs Retina Specialist 2, Presence / Absence (with subtotals); 'Ungradable' cases were IGNORED so as to work with a 2 x 2 table of 377 images.
% agreement: Theta_0 = (number of agreeing cases) / 377, approximately 0.92.
% chancy agreement: Theta_e = %Yes x %Yes + %No x %No = (337/377)(344/377) + (40/377)(33/377) = 0.825.
K = [Theta_0 - Theta_e] / [1 - Theta_e] = 56% only! This is the net, standardized agreement.

Study of Photographers' Agreement on Macular Edema: 2 x 2 table, Cohen's Kappa (K) coefficient. Photographer 1 vs Photographer 2, Presence / Absence (with subtotals); 'Ungradable' cases were IGNORED so as to work with a 2 x 2 table of 320 images.
% agreement: Theta_0 = (number of agreeing cases) / 320, approximately 0.78.
% chancy agreement: Theta_e = %Yes x %Yes + %No x %No = (214/320)(274/320) + (106/320)(46/320) = 0.620.
K = [Theta_0 - Theta_e] / [1 - Theta_e] = 42% only! This is the net, standardized agreement.

What about multiple ratings, like Diabetic Retinopathy [DR]? Retina Specialist 1 vs Retina Specialist 2: 6 x 6 cross-tabulation of the 400 images by DR code. [Counts not preserved in this transcript.]

K Computation:
% agreement: Theta_0 = 322/400 = 0.805.
% chance agreement: Theta_e = (252/400)(286/400) + .... + (12/400)(7/400), approximately 0.49.
K = [Theta_0 - Theta_e] / [1 - Theta_e] = 62%!
Note: 100% credit for a 'hit' and no credit for a 'miss'. Criticism: heavy penalty for ratings that are only narrowly missed! This motivates the concept of unweighted versus weighted Kappa.

Table of Weights for 6 x 6 Ratings (ratings 1 to 6), using the formula w_ij = 1 - [(i - j)^2 / (6 - 1)^2]:

        1       2       3       4       5       6
1       1     24/25   21/25   16/25    9/25     0
2     24/25     1     24/25   21/25   16/25    9/25
3     21/25   24/25     1     24/25   21/25   16/25
4     16/25   21/25   24/25     1     24/25   21/25
5      9/25   16/25   21/25   24/25     1     24/25
6       0      9/25   16/25   21/25   24/25     1

Formula for Weighted Kappa:
Theta_0(w) = sum_i sum_j w_ij f_ij / n
Theta_e(w) = sum_i sum_j w_ij (f_i. / n)(f_.j / n)
where the double sums run over ALL cells of the table. For unweighted Kappa we take into account only the cell frequencies along the main diagonal, each with 100% weight.
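A minimal sketch of this weighted Kappa, defaulting to the quadratic weights w_ij = 1 - (i - j)^2 / (c - 1)^2 of the previous slide (names are illustrative):

```python
def weighted_kappa(f, weights=None):
    """Weighted kappa for a c x c table of counts f[i][j]
    (rows = Rater I, columns = Rater II)."""
    c = len(f)
    n = sum(sum(row) for row in f)
    if weights is None:
        # quadratic weights: w_ij = 1 - (i - j)^2 / (c - 1)^2
        weights = [[1.0 - (i - j) ** 2 / (c - 1) ** 2 for j in range(c)]
                   for i in range(c)]
    row_tot = [sum(f[i]) for i in range(c)]                        # f_i.
    col_tot = [sum(f[i][j] for i in range(c)) for j in range(c)]   # f_.j
    theta_0w = sum(weights[i][j] * f[i][j]
                   for i in range(c) for j in range(c)) / n
    theta_ew = sum(weights[i][j] * (row_tot[i] / n) * (col_tot[j] / n)
                   for i in range(c) for j in range(c))
    return (theta_0w - theta_ew) / (1.0 - theta_ew)
```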

Computations for Weighted Kappa:
Weighted Kappa = [Theta_0(w) - Theta_e(w)] / [1 - Theta_e(w)], compared with the unweighted Kappa for the same table. [Numerical values not preserved in this transcript.]
K works for pairwise evaluation of raters' agreement.

K-statistics for Pairs of Raters (unweighted Kappa), categories DR / ME / Referral. Retina Specialists: 1 vs 2, 1 vs 3, 2 vs 3, 1 vs CGroup, 2 vs CGroup, 3 vs CGroup (CGroup = consensus group rating). [K values not preserved in this transcript.]

K-statistics for Pairs of Raters, categories DR / ME / Referral. General Ophthalmologists: 1 vs 2, 1 vs 3, 2 vs 3, 1 vs CGroup, 2 vs CGroup, 3 vs CGroup. [K values not preserved in this transcript.]

K-statistics for Pairs of Raters, categories DR / ME / Referral. Photographers: 1 vs 2, 1 vs 3, 2 vs 3, 1 vs CGroup, 2 vs CGroup, 3 vs CGroup. [K values not preserved in this transcript.]

K-statistics for Pairs of Raters, categories DR / ME / Referral. Nurses: 1 vs 2, 1 vs 3, 2 vs 3, 1 vs CGroup, 2 vs CGroup, 3 vs CGroup; the ME and Referral entries involving Nurse 3 are NA, and K = 0.50 for 3 vs CGroup on DR. [Remaining K values not preserved in this transcript.] NA: Rater #3 did NOT rate 'ungradable'.

K for Multiple Raters' Agreement: judging the simultaneous agreement of multiple raters over a multiple classification of attributes. Number of raters = n; number of subjects = k; number of mutually exclusive and exhaustive nominal categories = c. Example: Retina Specialists (n = 3), patients (k = 400) and DR (c = 6 codes).

Formula for Kappa: let k_ij = number of raters who assign the i-th subject to the j-th category, and
P_j = sum_i k_ij / (n k) = proportion of all assignments made to the j-th category.
The chance-corrected agreement for category j is
K_j = [sum_i k_ij^2 - k n P_j {1 + (n - 1) P_j}] / [k n (n - 1) P_j (1 - P_j)].

Computation of Kappa: the chance-corrected measure of overall agreement is
K = [sum_j numerator of K_j] / [sum_j denominator of K_j].
Interpretation: an intraclass correlation.
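A minimal sketch of this multiple-rater (Fleiss-type) Kappa, taking the k x c table of counts k_ij defined above (names are illustrative):

```python
def multi_rater_kappa(k_table, n_raters):
    """Category-wise K_j and overall K for k subjects and c categories.
    k_table[i][j] = number of the n_raters assigning subject i to category j."""
    k = len(k_table)                 # number of subjects
    c = len(k_table[0])              # number of categories
    n = n_raters
    p = [sum(row[j] for row in k_table) / (n * k) for j in range(c)]   # P_j
    numer, denom = [], []
    for j in range(c):
        numer.append(sum(row[j] ** 2 for row in k_table)
                     - k * n * p[j] * (1 + (n - 1) * p[j]))
        denom.append(k * n * (n - 1) * p[j] * (1 - p[j]))
    kappa_j = [nu / de if de else float("nan") for nu, de in zip(numer, denom)]
    overall_k = sum(numer) / sum(denom)    # chance-corrected overall agreement
    return kappa_j, overall_k
```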

K-statistic for multiple raters, categories DR / ME / Referral: Retina Specialists, General Ophthalmologists, Photographers, Nurses, and All Raters. [K values not preserved in this transcript.] Other than the Retina Specialists, the Photographers also have good agreement for DR and ME.

Conclusion based on the K-Study: of all 400 cases, 44 warranted referral to ophthalmologists due to retinopathy severity and 5 warranted referral due to uncertainty in diagnosis. A fourth Retina Specialist carried out a dilated fundus exam of these 44 patients, and substantial agreement [K = 0.68] was noticed for DR severity; the exam confirmed referral for 38 of the 44 cases.

Discussion on the Study: the Retina Specialists are all in active clinical practice and are the most reliable for digital image interpretation; an individual rater's background and experience play roles in digital image interpretation. There was an unusually high percentage of ungradable images among the nonphysician raters, even though only 5 out of 400 images were declared 'ungradable' by consensus of the Retina Specialists' group, suggesting a lack of confidence among nonphysicians rather than true image ambiguity! For this study, other factors [blood pressure, blood sugar, cholesterol, etc.] were not taken into account.

Cohen's Kappa: Need for Further Theoretical Research. "Cohen's Kappa Statistic: A Critical Appraisal and Some Modifications", Bikas K. Sinha^1, Pornpis Yimprayoon^2 and Montip Tiensuwan^2 (^1: ISI, Kolkata; ^2: Mahidol University, Bangkok, Thailand). CSA Bulletin, 2007.

CSA Bulletin (2007) Paper, ABSTRACT: In this paper we consider the problem of assessing agreement between two raters when the ratings are given independently on a 2-point nominal scale, and critically examine some features of Cohen's Kappa statistic, widely and extensively used in this context. We point out some undesirable features of K and, in the process, propose three modified Kappa statistics. Properties and features of these statistics are explained with illustrative examples.

Further Theoretical Aspects of Kappa Statistics: a recent study on the standardization of Kappa. Why standardization?
K = [Theta_0 - Theta_e] / [1 - Theta_e], with range -1 <= K <= 1.
K = 1 iff the rankings are 100% perfect;
K = 0 iff the ranking is 100% chancy;
K = -1 iff the ranking is 100% imperfect BUT split-half.

Why split-half? Example: a 2 x 2 table (Presence / Absence) with no agreement at all, i.e. both diagonal cells empty, 70% of the cases in one off-diagonal cell and the rest in the other. This gives K_C = -73% [and not -100%]. Only the split-half pattern, with 50% in each off-diagonal cell, yields K_C = -100%.
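Numerically, reusing the earlier 2 x 2 sketch with an assumed 30% / 70% off-diagonal split for the first pattern (the exact split is an illustration, not taken from the slide):

```python
# No agreement at all, but an unbalanced 30% / 70% off-diagonal split:
print(round(cohen_kappa_2x2(0.0, 0.30, 0.70, 0.0), 2))   # ~ -0.72, not -1
# No agreement with a split-half (50% / 50%) pattern:
print(round(cohen_kappa_2x2(0.0, 0.50, 0.50, 0.0), 2))   # -1.0
```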

K-Modified:
K_C(M) = [Theta_0 - Theta_e] / [P_I(Y) P_I(N) + P_II(Y) P_II(N)],
where Y is the 'Presence' category, N is the 'Absence' category, and I and II denote Raters I and II (P_I, P_II are their marginal proportions).
K_C(M) satisfies: K = 1 iff the rankings are 100% perfect, K = 0 iff 100% chancy, and K = -1 iff 100% imperfect, whatever the marginals.

Other Formulae: what if it is known that there is 80% observed agreement, i.e., Theta_0 = 80%? Is K_max = 1? Is K_min = -1? Not really. So we standardize K_C as
K_C(M2) = [K_C - K_C(min)] / [K_C(max) - K_C(min)],
where the maximum and minimum are evaluated under the stipulated value of the observed agreement.

Standardization yields:
K_C(M2) = [K_C + (1 - Theta_0)/(1 + Theta_0)] / [Theta_0^2 / {1 + (1 - Theta_0)^2} + (1 - Theta_0)/(1 + Theta_0)]
K_C(M3) = [K_C(M) + (1 - Theta_0)/(1 + Theta_0)] / [Theta_0 / (2 - Theta_0) + (1 - Theta_0)/(1 + Theta_0)]
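A minimal sketch implementing K_C(M), K_C(M2) and K_C(M3) exactly as displayed above, for a 2 x 2 table of joint proportions; the function and argument names are illustrative and not from the cited paper.

```python
def modified_kappas(p_yy, p_yn, p_ny, p_nn):
    """K_C, K_C(M), K_C(M2), K_C(M3) for a 2x2 table of joint proportions,
    following the expressions displayed on these slides."""
    theta_0 = p_yy + p_nn
    p1_y, p1_n = p_yy + p_yn, p_ny + p_nn            # Rater I marginals
    p2_y, p2_n = p_yy + p_ny, p_yn + p_nn            # Rater II marginals
    theta_e = p1_y * p2_y + p1_n * p2_n
    k_c = (theta_0 - theta_e) / (1 - theta_e)
    k_cm = (theta_0 - theta_e) / (p1_y * p1_n + p2_y * p2_n)
    shift = (1 - theta_0) / (1 + theta_0)            # -K_C(min) at this Theta_0
    k_cm2 = (k_c + shift) / (theta_0 ** 2 / (1 + (1 - theta_0) ** 2) + shift)
    k_cm3 = (k_cm + shift) / (theta_0 / (2 - theta_0) + shift)
    return k_c, k_cm, k_cm2, k_cm3
```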

Revisiting Cohen's Kappa: the 2 x 2 table (Presence / Absence, with subtotals) of Retina Specialist 1 vs Retina Specialist 2 considered earlier. K_C = 56% [computed earlier].

Kappa, Modified: for the same table, given its observed agreement Theta_0,
K_C(M) = 56% [same as K_C], K_C(M2) = 61%, K_C(M3) = 67%.

Beyond Kappa: A Review of Inter-rater Agreement Measures, Banerjee et al., Canadian Journal of Statistics, 1999, 3-23. Modelling patterns of agreement: log-linear models; latent class models.

That’s it for now…… Thanks for your attention…. This is the End of Part I of my talk. Bikas Sinha UIC, Chicago April 29, 2011