A Unified Approach for Assessing Agreement
Lawrence Lin, Baxter Healthcare
A. S. Hedayat, University of Illinois at Chicago
Wenting Wu, Mayo Clinic



Outline
- Introduction
- Existing approaches
- A unified approach
- Simulation studies
- Examples

Introduction
Different situations for agreement:
- Two raters, each with a single reading
- More than two raters, each with a single reading
- More than two raters, each with multiple readings
  - Agreement within a rater
  - Agreement among raters based on means
  - Agreement among raters based on individual readings

Existing Approaches (1)
Agreement between two raters, each with a single reading:
- Categorical data: kappa and weighted kappa
- Continuous data: Concordance Correlation Coefficient (CCC), Intraclass Correlation Coefficient (ICC)
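For the two-rater, single-reading case, both classical indices can be computed directly from the data. The following is a minimal sketch, assuming paired readings held in two lists and a square table of counts. The function names `ccc` and `cohen_kappa` are illustrative, and the CCC here uses 1/n moments.

```python
from statistics import mean


def ccc(x, y):
    """Lin's concordance correlation coefficient for paired readings (1/n moments)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Penalises both lack of correlation and a shift between the two means.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def cohen_kappa(table):
    """Cohen's kappa from a square table of counts (rows: rater 1, columns: rater 2)."""
    n = sum(map(sum, table))
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row = [sum(r) / n for r in table]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    expected = sum(r * c for r, c in zip(row, col))  # chance agreement
    return (observed - expected) / (1 - expected)
```

A constant shift between raters lowers the CCC even when the correlation is perfect, which is the property that separates agreement from association.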

Existing Approaches (2)
Agreement among more than two raters, each with a single reading:
- Lin (1989): no inference
- Barnhart, Haber and Song (2001, 2002): GEE
- King and Chinchilli (2001, 2001): U-statistics
- Carrasco and Jover (2003): variance components

Existing Approaches (3)
Agreement among more than two raters, each with multiple readings:
- Barnhart (2005)
- Intra-rater, inter-rater (based on means), and total (based on individual observations) agreement
- GEE method to model the first and second moments

Unified Approach
- Agreement among k (k ≥ 2) raters, with each rater measuring each of the n subjects multiple (m) times
- Separates intra-rater agreement from inter-rater agreement
- Measures relative agreement, precision, accuracy, and absolute agreement via the Total Deviation Index (TDI) and Coverage Probability (CP)

Unified Approach - summary
- The GEE method is used to estimate all agreement indices and their inferences
- All agreement indices are expressed as functions of variance components
- Data: continuous/binary/ordinal
- Most currently popular methods become special cases of this approach

Unified Approach - model
Set up, with the following effects:
- subject effect
- subject-by-rater effect
- error effect
- rater effect
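The model equation itself is not preserved in the transcript. One model consistent with the four labelled effects is a two-way layout with interaction. The symbols below are assumptions made for illustration, not necessarily the authors' notation.

```latex
y_{ijl} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijl},
\qquad i = 1,\dots,n,\quad j = 1,\dots,k,\quad l = 1,\dots,m
```

Here y_{ijl} would be the l-th reading of rater j on subject i, with \alpha_i the subject effect, \beta_j the rater effect, \gamma_{ij} the subject-by-rater effect, and e_{ijl} the error effect.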

Unified Approach - targets
- Intra-rater agreement: overall, are the k raters consistent with themselves?
- Inter-rater agreement:
  - Inter-rater agreement (agreement based on means): overall, do the k raters agree with each other based on the average of the m readings?
  - Total agreement (agreement based on individual readings): overall, do the k raters agree with each other based on any individual one of the m readings?

Unified Approach – agreement(intra): across all k raters, how well does each rater reproduce his or her own readings?

Unified Approach – precision(intra) and MSD
- Precision (intra): for any rater j, the proportion of the variance that is attributable to the subjects (the same as the intra-rater agreement index)
- MSD (intra): examines the absolute agreement independently of the total data range

Unified Approach – TDI(intra): for each rater j, a proportion π of the observations are within TDIπ units of their replicated readings from the same rater. Φ(·) denotes the cumulative normal distribution and |·| the absolute value.

Unified Approach – CP(intra): for each rater j, the proportion CPδ of observations that are within δ units of their replicated readings from the same rater

Unified Approach – agreement(inter): across all k raters, how well do the raters reproduce each other's readings, based on the average of the multiple readings?

Unified Approach – precision(inter): for any two raters, the proportion of the variance that is attributable to the subjects, based on the average of the m readings

Unified Approach – accuracy(inter): how close the means of the different raters are to each other

Unified Approach – TDI(inter): across all k raters, a proportion π of the averaged readings are within TDIπ units of the replicated averaged readings from the other raters.

Unified Approach – CP(inter): for each rater j, the proportion CPδ of averaged readings that are within δ units of the replicated averaged readings from the other raters

Unified Approach – agreement(total): across all k raters, how well do the raters reproduce each other's readings, based on the individual readings?

Unified Approach – precision(total): for any two raters, the proportion of the variance that is attributable to the subjects, based on the individual readings

Unified Approach – accuracy(total): how close the means of the different raters are to each other
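The slide formulas for these indices did not survive in the transcript. The sketch below shows one way the intra, inter, and total indices could be written as functions of variance components, assuming a two-way model with subject, rater, subject-by-rater, and error variances. The class `VarComp`, the function `agreement_indices`, and the formulas themselves are assumptions for illustration and are not confirmed by the slides.

```python
from dataclasses import dataclass


@dataclass
class VarComp:
    """Hypothetical container for variance components (names are illustrative)."""
    subject: float      # between-subject variance
    rater: float        # spread of the rater means
    interaction: float  # subject-by-rater variance
    error: float        # replication (within-cell) variance


def agreement_indices(vc, m):
    """ASSUMED formulas: each index is a ratio of variance components."""
    # Variance of a rater's mean of m readings around the subject's true value.
    mean_noise = vc.interaction + vc.error / m
    single_noise = vc.interaction + vc.error
    return {
        "ccc_intra": (vc.subject + vc.interaction)
        / (vc.subject + vc.interaction + vc.error),
        "precision_inter": vc.subject / (vc.subject + mean_noise),
        "accuracy_inter": (vc.subject + mean_noise)
        / (vc.subject + mean_noise + vc.rater),
        "ccc_inter": vc.subject / (vc.subject + mean_noise + vc.rater),
        "precision_total": vc.subject / (vc.subject + single_noise),
        "accuracy_total": (vc.subject + single_noise)
        / (vc.subject + single_noise + vc.rater),
        "ccc_total": vc.subject / (vc.subject + single_noise + vc.rater),
    }
```

Under these assumed formulas each CCC factors into precision times accuracy, and the inter and total indices coincide when m = 1.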

Unified Approach – TDI(total): across all k raters, a proportion π of the readings are within TDIπ units of the replicated readings from the other raters.

Unified Approach – CP(total): for each rater j, the proportion CPδ of readings that are within δ units of the replicated readings from the other raters

Unified Approach
Φ⁻¹(·) is the inverse cumulative normal distribution; χ²(·) is a central chi-square distribution with df = 1.
Summary table columns: Statistics, INTRA, INTER, TOTAL, M=1
Summary table rows: Agreement, Precision, Accuracy (NA where it does not apply), MSD, TDIπ, CPδ
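The two absolute-agreement indices can be illustrated numerically. This is a minimal sketch, assuming the usual normal-theory approximations in which TDIπ is a normal quantile scaled by the square root of the MSD and CPδ is a chi-square (df = 1) probability at δ²/MSD. The function names `tdi` and `cp` are illustrative, and the approximation is only reasonable when the mean difference is small relative to the spread.

```python
from math import erf, sqrt
from statistics import NormalDist


def tdi(msd, pi):
    """Approximate boundary within which a proportion `pi` of differences fall."""
    return NormalDist().inv_cdf(1 - (1 - pi) / 2) * sqrt(msd)


def cp(msd, delta):
    """Approximate proportion of differences within +/- delta.

    The chi-square (df = 1) CDF at delta**2 / msd equals
    2 * Phi(delta / sqrt(msd)) - 1, which is erf(delta / sqrt(2 * msd)).
    """
    return erf(delta / sqrt(2 * msd))
```

The two functions are inverses of each other in this approximation: `cp(msd, tdi(msd, pi))` returns `pi` up to rounding.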

Estimation and Inference
- Estimate all means, variance components, and their variances and covariances by the GEE method
- Estimate all indices using the above estimates
- Estimate the variances of all indices using the above estimates and the delta method

Estimation and Inference (2): the covariance of two replications, each coming from a different rater

Estimation and Inference (3): the variance within each combination of (i, j), i.e., each cell. The corresponding variance component is thus the average of all the cell variances.

Estimation and Inference (4)
- the variance of a replication from rater j
- the covariance of two replications, both coming from rater j

Estimation and Inference (5) The GEE method estimates all indices by estimating the means and all variance components:

Estimation and Inference (6)

Estimation and Inference (7)

Estimation and Inference (8)
- The working variance-covariance structure, where "working" means that it is derived by assuming a normal distribution
- The derivative matrix of the expectation with respect to all the parameters

Estimation and Inference (9)
The GEE method provides:
- estimates of all means
- estimates of all variance components
- estimates of the variances of all variance components
- estimates of the covariances between any two variance components

Estimation and Inference (10) The delta method is used to estimate the variances of all indices
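The delta method step can be sketched generically: for an index g(θ) with estimated covariance matrix Σ of the parameter estimates, the variance is approximated by ∇g' Σ ∇g. The code below is an illustration only. The function name `delta_method_variance` is hypothetical, and it uses a numerical gradient rather than the analytic derivatives the authors would have used.

```python
def delta_method_variance(g, theta, cov, h=1e-6):
    """First-order delta-method variance of g(theta).

    g     : function of a list of parameter values
    theta : list of parameter estimates
    cov   : covariance matrix of the estimates (list of lists)
    h     : step for the central-difference gradient
    """
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grad.append((g(up) - g(down)) / (2 * h))
    p = len(theta)
    # Quadratic form grad' * cov * grad.
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(p) for j in range(p))


# Example: variance of a ratio of variance components a / (a + b).
ratio = lambda t: t[0] / (t[0] + t[1])
variance = delta_method_variance(ratio, [4.0, 1.0], [[0.5, 0.0], [0.0, 0.1]])
```

The numbers in the example are made up for illustration and do not come from the presentation.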

Estimation and Inference (12)

Estimation and Inference (13)

Estimation and Inference (14)

Estimation and Inference (15)

Estimation and Inference (16)

Estimation and Inference (17)

Estimation and Inference (18)
Transformations for variances:
- Z-transformation: CCC indices and precision indices
- Logit transformation: accuracy and CP indices
- Log transformation: TDI indices
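The three transformations can be sketched as confidence-limit calculations. This is a minimal illustration, assuming each limit is computed on the transformed scale with a delta-method standard error and then transformed back. The function names are hypothetical, and the use of one-sided limits is an assumption.

```python
from math import atanh, exp, log, tanh
from statistics import NormalDist


def lower_limit_z(est, se, level=0.95):
    """One-sided lower limit via the Z (inverse hyperbolic tangent) transformation."""
    z = NormalDist().inv_cdf(level)
    se_z = se / (1 - est ** 2)  # delta-method SE on the Z scale
    return tanh(atanh(est) - z * se_z)


def lower_limit_logit(est, se, level=0.95):
    """One-sided lower limit via the logit transformation (for proportions)."""
    z = NormalDist().inv_cdf(level)
    se_logit = se / (est * (1 - est))  # delta-method SE on the logit scale
    x = log(est / (1 - est)) - z * se_logit
    return 1 / (1 + exp(-x))


def upper_limit_log(est, se, level=0.95):
    """One-sided upper limit via the log transformation (for positive indices)."""
    z = NormalDist().inv_cdf(level)
    return exp(log(est) + z * se / est)  # delta-method SE on the log scale
```

Working on the transformed scale keeps each limit inside the index's natural range: below 1 for correlation-type indices, between 0 and 1 for proportions, and above 0 for TDI.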

Simulation Study
- Three types of data: binary/ordinal/normal
- Three cases for each type of data: k=2, m=1 / k=4, m=1 / k=2, m=3
- For each case: 1000 random samples with sample size n=20
- For binary and ordinal data: inferences obtained through transformation vs. no transformation
- For normal data: transformation

Simulation Study (2)
Conclusions:
- The algorithm works well for all three types of data, both in estimates and in inferences
- For binary and ordinal data: no need for transformation
- For normal data, Carrasco's method is superior to ours, but for categorical data, ours is superior. For ordinal data, Carrasco's method and ours are similar.

Example One
- Sigma method vs. HemoCue method in measuring the DCHLb level in patients' serum
- 299 samples: each sample collected twice by each method
- Range: mg/dL

Example One – HemoCue method
[Figure: HemoCue method, first readings vs. second readings]

Example One – Sigma method
[Figure: Sigma method, first readings vs. second readings]

Example One – HemoCue vs. Sigma
[Figure: HemoCue's averages vs. Sigma's averages]

Example One – analysis result (1)
Table columns: Statistics, Estimates, 95% CI*, Allowance
Table rows: ccc_inter, ccc_total, precision_intra, precision_inter, precision_total, accuracy_inter, accuracy_total

Example One – analysis result (2)
Table columns: Statistics, Estimates, 95% CI*, Allowance
Table rows: TDI intra(0.9), TDI inter(0.9), TDI total(0.9), CP intra(75), CP inter(150), CP intra(150)
*: for all CCC, precision, accuracy and CP indices, the 95% lower limits are reported. For all TDI indices, the 95% upper limits are reported.

Example Two
- Hemagglutinin Inhibition (HAI) assay for antibody to Influenza A (H3N2) in rabbit serum samples from two labs
- 64 rabbit serum samples: each measured twice by each lab
- Antibody level: negative/positive/highly positive

Example Two – Lab one
Rows: first reading; columns: second reading

                  Negative  Positive  Highly positive
Negative                 6         1                0
Positive                 0        49                0
Highly positive          0         0                8

Example Two – Lab two
Rows: first reading; columns: second reading

                  Negative  Positive  Highly positive
Negative                 2         0                0
Positive                 0        22                2
Highly positive          0         5               33

Example Two: lab one vs. lab two
Rows: lab one, first reading; columns: lab two, first reading

                  Negative  Positive  Highly positive
Negative                 2         5                0
Positive                 0        19               30
Highly positive          0         0                8

Example Two: lab one vs. lab two
Rows: lab one, second reading; columns: lab two, second reading

                  Negative  Positive  Highly positive
Negative                 2         4                0
Positive                 0        23               27
Highly positive          0         0                8

Example Two
Table columns: Statistics, Estimates, 95% CI*, Allowance
Table rows: ccc_inter, ccc_total, precision_intra, precision_inter, precision_total, accuracy_inter, accuracy_total

Conclusions (1)
When data are continuous and m goes to infinity:
- The agreement indices are the same as those proposed by Barnhart (2005), both in estimates and in inferences
- Improvements: precision indices, accuracy indices, TDIs and CP, variance components

Conclusions (2)
When m=1:
- The agreement index degenerates into the OCCC as proposed by King (2002) and Carrasco (2003) for continuous data
- Improvements:
  - For categorical data, King's method approximates kappa and weighted kappa, whereas our estimates (without transformation) are exactly the same as kappa and weighted kappa, both in estimate and in inference
  - Our estimates are superior to Carrasco's estimates when precision and accuracy are high
  - Covariate adjustment becomes available

Conclusions (3)
- When data are continuous, k=2 and m=1: the agreement index degenerates to the original CCC by Lin (1989)
- When data are binary, k=2 and m=1: the agreement index degenerates into kappa, both in estimate and in inference

Conclusions (4) When data are ordinal, k=2 and m=1: the agreement index degenerates into weighted kappa with a specific set of weights, both in estimate and in inference.
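The weight set on this slide did not survive in the transcript. The sketch below assumes quadratic (squared-distance) weights, for which weighted kappa on integer category scores is algebraically identical to the CCC computed with 1/n moments. The function names are illustrative, and the counts are the Lab one table from Example Two as read from the transcript.

```python
def weighted_kappa(table, weight):
    """Weighted kappa from a square table of counts and an agreement-weight function."""
    n = sum(map(sum, table))
    t = len(table)
    row = [sum(table[i]) / n for i in range(t)]
    col = [sum(table[i][j] for i in range(t)) / n for j in range(t)]
    observed = sum(weight(i, j) * table[i][j] / n for i in range(t) for j in range(t))
    expected = sum(weight(i, j) * row[i] * col[j] for i in range(t) for j in range(t))
    return (observed - expected) / (1 - expected)


def quadratic_weights(t):
    """ASSUMED weight set: 1 - (i - j)^2 / (t - 1)^2 for t ordered categories."""
    return lambda i, j: 1 - (i - j) ** 2 / (t - 1) ** 2


def ccc_from_table(table):
    """CCC on category scores 0..t-1, using 1/n moments taken from the table."""
    n = sum(map(sum, table))
    t = len(table)
    cells = [(i, j, table[i][j] / n) for i in range(t) for j in range(t)]
    mx = sum(i * p for i, j, p in cells)
    my = sum(j * p for i, j, p in cells)
    sxx = sum((i - mx) ** 2 * p for i, j, p in cells)
    syy = sum((j - my) ** 2 * p for i, j, p in cells)
    sxy = sum((i - mx) * (j - my) * p for i, j, p in cells)
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Lab one, first reading (rows) vs. second reading (columns).
lab_one = [[6, 1, 0], [0, 49, 0], [0, 0, 8]]
kappa_w = weighted_kappa(lab_one, quadratic_weights(3))
ccc_value = ccc_from_table(lab_one)
```

With these weights `kappa_w` and `ccc_value` agree up to floating-point rounding, which illustrates why a CCC-type index can reduce to weighted kappa for ordinal data.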

Conclusions (5)
Unified approach:
- Relative agreement indices: CCC with precision and accuracy – depend on the data range
- Absolute agreement indices: Total Deviation Index and Coverage Probability – rely on the normality assumption
- Link functions need more work
- Requires balanced data

References
Bartko, J. J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports 19.
Barnhart, H. X. and Williamson, J. M. (2001). Modeling concordance correlation via GEE to evaluate reproducibility. Biometrics 57.
Barnhart, H. X., Song, J. and Haber, M. J. (2005). Assessing intra, inter and total agreement with replicated readings. Statistics in Medicine 19.
Carrasco, J. L. and Jover, L. (2003). Estimating the generalized concordance correlation coefficient through variance components. Biometrics 59.
Fleiss, J., Cohen, J. and Everitt, B. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin 72.
King, T. S. and Chinchilli, V. M. (2001). A generalized concordance correlation coefficient for continuous and categorical data. Statistics in Medicine 20.
Lin, L. I. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics 45.
Lin, L. I., Hedayat, A. S., Sinha, B., and Yang, M. (2002). Statistical methods in assessing agreement: models, issues & tools. Journal of American Statistical Association 97(457).
Wu, W. (2006). A unified approach for assessing agreement. Ph.D. thesis, UIC.