Reading Report: A unified approach for assessing agreement for continuous and categorical data Yingdong Feng.



Introduction In Lin's paper, the authors propose a series of indices for assessing agreement, precision, and accuracy, and additionally propose the CP (coverage probability) and TDI (total deviation index) for normal data. All five indices are expressed as functions of variance components. Lin obtains the estimates and performs inference for all these functions of variance components through the GEE method. The model measures agreement among k raters, each rater having multiple (m) readings on each of the n subjects, for both continuous and categorical data.

Introduction The approach of this paper integrates the approaches of Barnhart et al. (2005) and Carrasco and Jover (2003). For example, Barnhart et al. (2005) proposed a series of indices (intra-rater CCC, inter-rater CCC, and total CCC) and estimated those indices and their inferences by the GEE method. The definitions of these three indices are listed below. This paper introduces a unified approach that can be used for continuous, binary, and ordinal data; it provides simulation results assessing the performance of the unified approach in section 4 and gives two examples illustrating its use in section 5.

Index: Definition
Intra-rater: agreement among multiple readings from the same rater
Inter-rater: agreement among different raters, based on averages of their multiple readings
Total-rater: agreement among different raters, based on individual readings

Method In this paper, the model used for measuring agreement is

y_ijl = μ + α_i + β_j + γ_ij + e_ijl,

where y_ijl stands for the lth reading on subject i given by rater j, with i = 1, 2, …, n, j = 1, 2, …, k, and l = 1, 2, …, m. Here μ is the overall mean, α_i is the random subject effect, β_j is the rater effect, γ_ij is the random rater-by-subject interaction effect, and e_ijl is the random error. The variance among all raters is denoted σ_β². Based on this model, they propose a series of indices to measure agreement, precision, and accuracy.
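As a concrete illustration, the model can be simulated directly. The variance components, rater effects, and seed below are hypothetical values chosen for this sketch, not the paper's settings.

```python
import numpy as np

# Minimal simulation sketch of y_ijl = mu + alpha_i + beta_j + gamma_ij + e_ijl.
# All numeric settings here are hypothetical, for illustration only.
rng = np.random.default_rng(0)

n, k, m = 20, 2, 3                    # subjects, raters, replications
mu = 10.0                             # overall mean
beta = np.array([0.0, 0.5])           # rater effects (hypothetical)

alpha = rng.normal(0.0, 2.0, size=n)          # random subject effects
gamma = rng.normal(0.0, 0.5, size=(n, k))     # subject-by-rater interaction
e = rng.normal(0.0, 1.0, size=(n, k, m))      # random error

# y[i, j, l]: the l-th reading on subject i by rater j
y = mu + alpha[:, None, None] + beta[None, :, None] + gamma[:, :, None] + e
print(y.shape)
```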


For total agreement, the authors give CCC total, precision total, accuracy total, MSD, TDI, and CP. Since total agreement is a measure of agreement based on any individual reading from each rater, these indices do not depend on the number of replications, unlike the inter-rater indices. For estimation and inference, the mean for each rater and all variance components must be estimated first; the paper proposes a system of equations for this.
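The paper estimates the rater means and variance components jointly through a GEE-type system of equations. As a rough stand-in, a classical ANOVA moment estimator for a balanced layout y[i, j, l] can be sketched as follows (this is not the paper's estimator):

```python
import numpy as np

# ANOVA-type moment estimators for the balanced model
# y_ijl = mu + alpha_i + beta_j + gamma_ij + e_ijl (requires m >= 2).
# A classical stand-in, not the GEE system used in the paper.
def variance_components(y):
    n, k, m = y.shape
    cell = y.mean(axis=2)            # ybar_ij.
    subj = cell.mean(axis=1)         # ybar_i..
    rater = cell.mean(axis=0)        # ybar_.j.
    grand = cell.mean()              # ybar_...

    # Mean squares for error, interaction, and subjects
    mse = ((y - cell[:, :, None]) ** 2).sum() / (n * k * (m - 1))
    ms_int = (m * ((cell - subj[:, None] - rater[None, :] + grand) ** 2).sum()
              / ((n - 1) * (k - 1)))
    ms_subj = k * m * ((subj - grand) ** 2).sum() / (n - 1)

    # Solve the expected-mean-square equations, truncating at zero
    var_e = mse
    var_gamma = max((ms_int - mse) / m, 0.0)
    var_alpha = max((ms_subj - ms_int) / (k * m), 0.0)
    return var_alpha, var_gamma, var_e

# Quick check on data simulated with known component SDs (1.5, 0.7, 1.0)
rng = np.random.default_rng(1)
y = (rng.normal(0.0, 1.5, 200)[:, None, None]
     + rng.normal(0.0, 0.7, (200, 2))[:, :, None]
     + rng.normal(0.0, 1.0, (200, 2, 3)))
print(variance_components(y))
```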

Method The delta method is then used to obtain the estimates and inferences for all indices. The following table shows which transformation is used for each group of indices.

Indices: Transformation method
CCC indices and precision indices: Z-transformation
Accuracy and CP indices: Logit transformation
TDIs: Natural log transformation
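A sketch of the back-transformation step: the confidence interval is formed on the transformed scale, where the delta-method standard error applies, and then mapped back, which keeps the interval inside the index's natural range. All numeric values below are hypothetical.

```python
import math

# Build a 95% CI on a transformed scale, then back-transform.
# The standard errors are assumed to be on the transformed scale,
# as the delta method provides; the inputs are hypothetical.
def ci_fisher_z(est, se, z=1.96):
    """For correlation-type indices (CCC, precision)."""
    t = math.atanh(est)
    return math.tanh(t - z * se), math.tanh(t + z * se)

def ci_logit(est, se, z=1.96):
    """For indices in (0, 1) (accuracy, CP)."""
    t = math.log(est / (1 - est))
    inv = lambda x: 1.0 / (1.0 + math.exp(-x))
    return inv(t - z * se), inv(t + z * se)

def ci_log(est, se, z=1.96):
    """For positive indices (TDI)."""
    t = math.log(est)
    return math.exp(t - z * se), math.exp(t + z * se)

print(ci_fisher_z(0.90, 0.10))   # interval stays inside (-1, 1)
```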

Simulation In section 3, the paper reports simulations on binary, ordinal, and normal data to evaluate the performance of the proposed indices and to compare them against other existing methods. The results are shown in tables 1 to 5. Tables 1 and 2 both report the binary-data simulation, but the binary data in table 1 were transformed using the methods in the table above; similarly, tables 3 and 4 both report the ordinal-data simulation, with table 3 using the transformation; table 5 gives the normal-data results with transformation. Each of the five tables covers three cases: case one is k=2 & m=1, case two is k=4 & m=1, and case three is k=2 & m=3. For each case, they generate 1000 random samples of size 20.

Simulation There are five columns in each table; the corresponding definition of each column is listed below.

Theoretical: theoretical value for this case
Mean: the mean of the 1000 estimated indices from the 1000 random samples
Std (Est): the standard deviation of the 1000 estimated indices from the 1000 random samples
Mean (Std): the mean of the 1000 estimated standard errors
Sig: proportion of estimates falling outside the 95% confidence interval
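These columns can be reproduced from simulation output roughly as follows; est and se stand in for the 1000 index estimates and their estimated standard errors (synthetic values here, not the paper's).

```python
import numpy as np

# Synthetic stand-ins for one simulation cell: 1000 estimates of an
# index with theoretical value theta, plus their estimated SEs.
rng = np.random.default_rng(2)
theta = 0.55
est = rng.normal(theta, 0.05, 1000)
se = np.full(1000, 0.05)

mean_est = est.mean()          # "Mean"
std_est = est.std(ddof=1)      # "Std (Est)"
mean_se = se.mean()            # "Mean (Std)"
# "Sig": proportion of samples whose 95% CI excludes the theoretical value
sig = (np.abs(est - theta) > 1.96 * se).mean()
print(mean_est, std_est, mean_se, sig)
```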

Simulation Why can they have a theoretical value? For example, for binary data with k=2 & m=1, they set the correlation equal to 0.6, the margin for the first variable to (0.3, 0.7), and the margin for the second variable to (0.5, 0.5). For binary data with k = 4 and m = 1, they set the mean vector μ = (0.55, 0.6, 0.65, 0.8) and ρ_12 = 0.75, ρ_13 = 0.7, ρ_14 = 0.5, ρ_23 = 0.8, ρ_24 = 0.6, and ρ_34 = 0.6. For binary data with k = 2 and m = 3, they set the mean vector μ = (0.7, 0.7, 0.7, 0.6, 0.6, 0.6); the correlation between any two of the first three variables is 0.8, the correlation between any two of the last three variables is also 0.8, and the correlation between any one of the first three variables and any one of the last three is 0.7. With these settings, the theoretical values can be calculated in advance.
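For the k = 2, m = 1 binary case, the theoretical value can be checked by hand: treating the stated margins as rater means of 0.7 and 0.5 (my reading of the setup) and applying Lin's standard CCC formula gives the target the simulation is compared against.

```python
import math

# Theoretical CCC for two Bernoulli raters with correlation 0.6,
# means 0.7 and 0.5 (my reading of the stated margins).
rho = 0.6
mu1, mu2 = 0.7, 0.5                          # P(Y = 1) for each rater
v1, v2 = mu1 * (1 - mu1), mu2 * (1 - mu2)    # Bernoulli variances
ccc = 2 * rho * math.sqrt(v1 * v2) / (v1 + v2 + (mu1 - mu2) ** 2)
print(round(ccc, 4))   # → 0.5499
```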

From the results of tables 1 and 2, all three cases perform very well: the numbers in the first and second columns are very close, and the numbers in the third and fourth columns are very close, which means the estimates are very close to the corresponding theoretical values and the means of the estimated standard errors are very close to the corresponding standard deviations of the estimates. We can conclude that the proposed indices fit binary data well.

Tables 3 and 4 show the results of the ordinal-data simulation; similarly, the correlations and margins are set in advance to obtain the theoretical values. For both tables the results resemble the binary case: the first column is close to the second, as is the third to the fourth. We can say these indices fit ordinal data well.

The last table in the simulation part gives the results for normal data with transformation. To obtain the theoretical values, the authors had to set the precision, accuracy, within-rater precision, and between-rater precision in advance. Notice that most of the means of the estimated standard errors are close to the corresponding standard deviations of the estimates, except for CP inter. Unlike the conclusion drawn by the authors, I would say Carrasco's method performs better than the CCC here when m=1. Notice also that in the case of k=2 & m=3, the inter-rater agreement calculated from Barnhart's method is a little larger than theirs; the reason is that Barnhart's method assumes m is infinite. Thus, from the simulation results we can conclude that the indices proposed in this paper work fairly well, in both estimation and inference, for binary, ordinal, and normal data.

Example The paper gives two examples to illustrate the use of the unified approach: the first concerns diaspirin crosslinked hemoglobin (DCLHb) and the second is an assay validation. In this reading report we discuss the results of the second example. The authors consider the Hemagglutinin Inhibition (HAI) assay for antibody to Influenza A (H3N2) in rabbit serum samples from two different labs. Serum samples from 64 rabbits are measured twice by each method, and the antibody level is classified as negative, positive, or highly positive.

Example In the paper, tables 7 to 10 show the frequency tables for within-lab and between-lab readings. Tables 9 and 10 show that lab two tends to report higher values than lab one, while tables 7 and 8 suggest that the within-lab agreement is good.

Example Since it is an imprecise assay, the authors allowed for looser agreement criteria: agreement was defined as a within-sample total deviation of no more than 50% of the total deviation if observations are from the same method, and of no more than 75% of the total deviation if observations are from different methods. Thus we get a least acceptable CCC intra of 0.75, and a corresponding least acceptable CCC inter.
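On my reading, the 0.75 threshold follows from the criterion: if agreement requires the within-sample deviation to be at most a fraction f of the total deviation, the least acceptable CCC is 1 − f². This relation is my inference from the stated criteria, not a formula quoted from the paper, but it reproduces the stated 0.75.

```python
# Least acceptable CCC implied by an allowed relative deviation f
# (my reading of the criterion, not a formula quoted from the paper).
def least_acceptable_ccc(f):
    return 1.0 - f ** 2

print(least_acceptable_ccc(0.50))   # intra-rater criterion (50%) -> 0.75
print(least_acceptable_ccc(0.75))   # inter-rater criterion (75%)
```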

Example Table 11 shows the results; the item "97.5% confidence limit" is the one-sided 97.5% lower confidence limit of the corresponding agreement statistic. Looking at the data in this table: for example, the estimate of precision intra means that, for observations from the same method, the within-sample deviation is about 34.1% of the total deviation; the estimate of CCC inter means that, for averaged observations from different methods, the within-sample deviation is about 79.2% of the total deviation.
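These percentage interpretations can be reproduced by inverting a precision-type index: if the index is ρ, the within-sample deviation as a fraction of the total deviation is sqrt(1 − ρ). This is my reading of how the interpretation works, and the value below is hypothetical, not an estimate from table 11.

```python
import math

# Within-sample deviation as a fraction of total deviation, given a
# precision/agreement index rho. The rho here is hypothetical.
def within_sample_fraction(rho):
    return math.sqrt(1.0 - rho)

print(round(within_sample_fraction(0.90), 3))
```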

Conclusion Measuring agreement between different methods or different raters has received a great deal of attention recently. In this paper the authors propose several indices, including the CCC, precision, accuracy, CP, and TDI, and use them to measure intra-rater, inter-rater, and total agreement among all raters. The simulations show that these indices fit binary, ordinal, and normal data fairly well. In the HAI-assay example, the authors also point out the limitations of these indices for measuring agreement between the two labs' readings and suggest that kappa or weighted kappa could be applied to obtain the agreement within each lab.

Further Research We could consider including link functions such as the log or logit in the GEE method to make the approach more robust across different types of data. Also, the variance-component functions in this paper are based on balanced data; to handle missing data, these functions may need to be modified or new ones developed.