Published by Sabina Harris. Modified over 9 years ago.
1
Statistical Methods for Multicenter Inter-rater Reliability Study
Jingyu Liu Allergan, Inc., USA July 12, 2006
2
Overview
Introduction
Multicenter inter-rater reliability
Statistical methods
References
Acknowledgement
3
Introduction
Can BOTOX® improve arm or hand function in a patient with post-stroke spasticity?
4
Function Assessments and Validation
Identify function items.
Identify the measurement scale.
Are the function assessments reliable, sensitive, and clinically meaningful?
What is the minimum important change (MIC) of the assessment?
Assessment (scale) validation: reliability and validity.
Inter-rater reliability: one of the key steps in validating a clinical outcome assessment scale.
5
Multicenter inter-rater reliability
Inter-rater reliability study: In a clinical setting, an inter-rater reliability study evaluates the agreement among different raters (or physicians) who apply the same assessment scale to each of the subjects (or patients) enrolled in the study.
Multicenter inter-rater reliability study: An inter-rater reliability study conducted in multiple clinical study centers (or sites).
Why conduct a multicenter inter-rater reliability study?
It is difficult to conduct a large inter-rater reliability study in a single clinical study center.
It is closer to an actual multicenter clinical study setting (preferred by some regulatory agencies).
6
Statistical Methods
This talk focuses on the following:
The intraclass correlation coefficient (ICC) based on the ANOVA method under a random effects model
A new approach to evaluating inter-rater reliability
Nonparametric methods
Other methods
7
Statistical Methods
Assessments at the i-th study center (i = 1, 2, …, a):
[Table: assessments arranged by subject and rater; the cell entries were not preserved.]
8
Statistical Methods
The ANOVA method evaluates inter-rater reliability through an intraclass correlation coefficient (ICC) under a random effects model [model equation not preserved].
Statistical inference on the ICC is “straightforward”.
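The ANOVA-based ICC can be illustrated with a minimal sketch. Because the deck's model equation did not survive, the code below assumes the simplest case, which may differ from the model used in the talk: a single center, a balanced design with no missing values, and a one-way random effects model. The function name `icc_oneway` is hypothetical.

```python
def icc_oneway(ratings):
    """ICC from a one-way random effects ANOVA (assumed model, single center).

    ratings: list of rows, one per subject; each row holds the k ratings
    that subject received. The design must be balanced with no missing values.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of ratings per subject
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-subject mean square (n - 1 degrees of freedom).
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)

    # Within-subject mean square (n * (k - 1) degrees of freedom).
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    # ANOVA estimator of the ICC under the one-way model.
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Three subjects, two raters; the second rater scores one point higher.
    print(icc_oneway([[1, 2], [2, 3], [3, 4]]))
```

In this toy data the raters order the subjects identically, yet the estimate is well below 1 because the constant one-point offset counts as within-subject disagreement.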
9
Statistical Methods
The ICC is the Pearson correlation coefficient of the assessments made by two raters on the same subject [equations not preserved].
The Pearson correlation coefficient measures the linear relationship between two variables. It does not necessarily measure “absolute” agreement. As noted by Lin [1] for the special case of two raters (or two assays), good agreement between the two raters requires that the plot of their assessments fall close to the 45° line through the origin.
The ICC is influenced by the subject variation. In an inter-rater reliability study, the subject variation itself is not of interest, and the subjects are usually not randomly selected.
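The gap between linear association and absolute agreement can be seen in a small numeric sketch. It compares the Pearson correlation with Lin's concordance correlation coefficient from reference [1], using made-up ratings in which the second rater always scores two points higher. The function names and the data are illustrative assumptions, not material from the talk.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (moments use divisor n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    # The squared mean difference penalizes a systematic shift between raters.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


if __name__ == "__main__":
    rater_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    rater_b = [score + 2.0 for score in rater_a]   # constant offset of 2
    print(pearson_r(rater_a, rater_b))   # perfect linear relationship
    print(lin_ccc(rater_a, rater_b))     # reduced by the offset
```

The points lie exactly on a straight line, so the Pearson correlation is 1, but the line is shifted away from the 45° line through the origin, so the concordance coefficient drops to 0.5.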
10
Statistical Methods
For an interval variable with domain [a, b], Liu [2] introduces a new agreement coefficient to evaluate inter-rater reliability [defining equations not preserved]. The definition involves the theoretical range of the variable (i.e., b - a), the measurement variance of the raters, and a pre-defined reliability scale parameter.
The coefficient is invariant under a linear transformation. With this property, we can compare the reliabilities of outcome assessment scales defined on different domains.
Most clinical outcome assessments are either interval variables or ordinal numerical variables, and the coefficient is applicable to both.
11
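Statistical Methods
[Equations and their component definitions not preserved.]
For a multicenter inter-rater reliability study, the coefficient can be estimated by [estimator and its component definitions not preserved].
When there are only two raters within each center, we have [equation not preserved].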
12
Statistical Methods
Statistical inferences such as hypothesis tests, confidence intervals, and power calculations are obtained from the results provided in Liu [2].
In the special case of two raters, the coefficient can be applied to evaluate test-retest reliability or intra-rater reliability.
In clinical research, we select [value not preserved] in calculating the coefficient.
The coefficient can be applied in more complicated situations such as unbalanced data and data with missing values.
13
Statistical Methods
Nonparametric approaches
When the distributional assumption or the model assumption is not appropriate, we can use a nonparametric approach to evaluate inter-rater reliability.
For a single-center study, we can directly use Kendall’s W.
For a multicenter study, Liu [3] introduced a new statistic, which can be considered an extension of Kendall’s W, to evaluate inter-rater reliability. Its distributional properties, hypothesis test, confidence interval, etc. are provided.
Other weighted approaches
Other statistical methods
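For the single-center case, Kendall's W can be sketched in a few lines. The code below uses the standard formula without a tie correction, so it assumes that no rater gives the same score to two subjects. It does not implement the multicenter extension of Liu [3], whose definition is not given in the deck. The function name `kendalls_w` is hypothetical.

```python
def kendalls_w(ratings):
    """Kendall's coefficient of concordance W, without tie correction.

    ratings: one list per rater, each holding that rater's scores for the
    same n subjects in the same order. Ties within a rater are rejected.
    """
    m = len(ratings)        # number of raters
    n = len(ratings[0])     # number of subjects

    rank_sums = [0] * n
    for scores in ratings:
        if len(set(scores)) != n:
            raise ValueError("tied scores are not supported in this sketch")
        # Rank this rater's scores from 1 (lowest) to n (highest).
        order = sorted(range(n), key=lambda j: scores[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank

    # Sum of squared deviations of the rank sums from their mean.
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)

    return 12 * s / (m ** 2 * (n ** 3 - n))


if __name__ == "__main__":
    # Three raters who order four subjects identically.
    print(kendalls_w([[1, 2, 3, 4], [10, 20, 30, 40], [2, 4, 6, 8]]))
```

W depends only on the ranks, so raters who use different numeric ranges but order the subjects identically still reach W = 1. This is why it suits cases where the distributional assumptions behind the ANOVA approach are in doubt.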
14
References
[1] Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics 1989: 45.
[2] Liu J. Measures of inter-rater reliability for interval data. (Submitted for publication.)
[3] Liu J. A nonparametric approach to evaluating multicenter inter-rater reliability. (Submitted for publication.)
[4] Shoukri M. Measures of Interobserver Agreement. Chapman & Hall/CRC, 2004.
[5] Fleiss JL. The Design and Analysis of Clinical Experiments. John Wiley & Sons.
[6] Schuck P. Assessing reproducibility for interval data in health-related quality of life questionnaires: Which coefficient should be used? Quality of Life Research 2004: 13.
[7] Searle SR. Linear Model. John Wiley & Sons.
[8] Kendall MG. A new measure of rank correlation. Biometrika :81-93.
[9] Mehta C, Patel N. Proc-StatXact 4. CYTEL Software Corporation, 1999.
[10] Friedman M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association.
15
Acknowledgement
Thanks to Allergan Biostatistics and BOTOX-Neurology for their support of this research. I also wish to thank the sponsors of the 2006 International Conference on Design of Experiments and Its Application for their support.