THE BLAND-ALTMAN LIMITS OF AGREEMENT: HOW OFTEN HAVE THEY BEEN MISAPPLIED? Introdução à Medicina, 23 May 2011, Class 13.

INTRODUCTION

Background
- Due to advances in technology, new methods of clinical measurement appear constantly, and they keep becoming more innovative. [1]
- In the 1980s, Bland and Altman noticed that the correlation coefficient was widely used to evaluate the agreement between two methods of clinical measurement.
- They realized it was not adequate for that purpose.
- So they created their own method: the Bland-Altman limits of agreement. [2]

1. Zietman A, Goitein M, Tepper JE. Technology evolution: is it survival of the fittest? Journal of Clinical Oncology: official journal of the American Society of Clinical Oncology, 2010 Sep 20; 28(27):
2. Bland JM, Altman DG. Statistical Methods For Assessing Agreement Between Two Methods of Clinical Measurement. Lancet, 1986; i:

Statistical methods for assessing agreement between two methods of clinical measurement (The Lancet, 1986)
- Objective of the method: assess the agreement between two methods of clinical measurement.
- Importance: if agreement is not achieved, there is a high risk of diagnostic mistakes, which may lead to severe consequences.

Stoker, Mark. Common Errors in Clinical Measurement. Anesthesia & Intensive Care Medicine, December 2008; volume 9, issue 12:

How do we apply the method?
[Diagram labels: Measurement, Instrument 1, Instrument 2, Correlation coefficient, Average, Difference]

Bland JM, Altman DG. Statistical Methods For Assessing Agreement Between Two Methods of Clinical Measurement. Lancet, 1986; i:

- Average of the differences = 0: no systematic error
- Average of the differences ≠ 0: systematic error
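The calculation behind these two cases can be sketched as follows. This is a minimal illustration, not the authors' own procedure: the function name `limits_of_agreement` and the readings are hypothetical, and the 1.96 multiplier is the usual choice for 95% limits when the differences are roughly normally distributed.

```python
import statistics

def limits_of_agreement(method_a, method_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b):
        raise ValueError("measurements must be paired")
    # Difference within each pair of readings.
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)   # average difference: systematic error
    sd = statistics.stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd

# Hypothetical readings of the same five patients on two instruments.
instrument_1 = [10.0, 12.0, 14.0, 16.0, 18.0]
instrument_2 = [9.0, 13.0, 13.0, 17.0, 17.0]
bias, lower, upper = limits_of_agreement(instrument_1, instrument_2)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A bias close to zero corresponds to the "no systematic error" case; the width of the interval reflects the random variation between the two instruments.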

If the limits of agreement are...
- Too wide:
  - there are random errors associated with the measuring instrument;
  - it is unacceptable for clinical use.
- Narrow, but the average of the differences is ≠ 0:
  - there is a systematic error;
  - the measuring device must be calibrated.

Judging whether the limits of agreement are too wide or adequate can be somewhat subjective. It is therefore important that the maximum acceptable limits of agreement are defined according to clinical needs.
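One way to make that judgment explicit is to compare the computed limits with a tolerance fixed on clinical grounds. The sketch below assumes a symmetric tolerance; the function name `clinically_acceptable`, the limits and the tolerance values are all hypothetical.

```python
def clinically_acceptable(lower, upper, max_allowed_difference):
    """True when both limits of agreement lie within +/- max_allowed_difference."""
    return -max_allowed_difference <= lower and upper <= max_allowed_difference

# Hypothetical limits of agreement checked against two hypothetical tolerances.
wide_tolerance = clinically_acceptable(-1.9, 2.3, max_allowed_difference=3.0)
narrow_tolerance = clinically_acceptable(-1.9, 2.3, max_allowed_difference=2.0)
print(wide_tolerance)    # True
print(narrow_tolerance)  # False
```

The same limits pass or fail depending on the tolerance, which is why the tolerance has to come from the clinical context rather than from the data.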

Assumptions
- The differences between the measured values must follow a normal distribution.
- The standard deviation must be constant, that is, there must be no relation between the averages and the differences.

Images: Bland JM and Altman DG. Applying the Right Statistics: Analyses of Measurement Studies. Ultrasound in Obstetrics and Gynecology, 2003; 22,

[Figure: example of a relation between the averages and the differences]

Images: Bland JM and Altman DG. Applying the Right Statistics: Analyses of Measurement Studies. Ultrasound in Obstetrics and Gynecology, 2003; 22,
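Both assumptions can be given a rough numerical first look before drawing the plots. The sketch below is an illustration under stated assumptions, not a substitute for inspecting the histogram and the averages-versus-differences plot: the helper names `pearson_r` and `histogram_counts` are hypothetical, the data are invented so that the difference grows with the magnitude, and a correlation coefficient is only one simple way of flagging a relation.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def histogram_counts(values, bins=5):
    """Count how many values fall into each of `bins` equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for value in values:
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    return counts

# Hypothetical paired readings in which the difference grows with the magnitude.
instrument_1 = [10.0, 20.0, 30.0, 40.0, 50.0]
instrument_2 = [9.0, 18.0, 27.0, 36.0, 45.0]
differences = [a - b for a, b in zip(instrument_1, instrument_2)]
averages = [(a + b) / 2 for a, b in zip(instrument_1, instrument_2)]

relation = pearson_r(averages, differences)
counts = histogram_counts(differences)
print(f"correlation between averages and differences: {relation:.2f}")
for count in counts:
    print("#" * count)  # crude text histogram of the differences
```

A correlation far from zero, as in this invented data set, suggests that the second assumption does not hold and that a single pair of limits would be misleading across the measurement range.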

The method had a great impact on the scientific community and, after being published in The Lancet, was cited more than [...] times.

BUT some of the citations and applications of this method may not have been made correctly! Bland and Altman themselves noticed that their limits of agreement were being misapplied, leading to false conclusions about the agreement between two instruments of clinical measurement. [5]

Ryan TP and Woodall WH. The Most Cited Statistical Papers. Journal of Applied Statistics, 2005; 32:
5. Bland JM and Altman DG. Applying the Right Statistics: Analyses of Measurement Studies. Ultrasound in Obstetrics and Gynecology, 2003; 22,

RESEARCH QUESTION AND AIMS

Research Question: “What is the percentage of articles in which the Bland-Altman method is applied correctly?”

Our secondary aims are to find out:
- at what level the method is misapplied;
- which assumption is the least often fulfilled;
- whether the percentage of articles applying the method incorrectly has varied over the years;
- whether the percentage of articles applying the method correctly varies according to whether it is used to obtain primary or secondary data;
- what percentage of articles fit into each of the document types defined by ISI;
- whether the impact factor of a journal influences the percentage of articles published in it that apply the method correctly.

METHODS

Sample
- 70 articles indexed by ISI that cite the article, published in The Lancet, in which Bland and Altman present their method.

Checklist
- Evaluates each article with regard to:
  - verification of the assumptions;
  - application of the method itself;
  - interpretation of the limits of agreement obtained.
- The checklist also gathers some relevant data about each article: the type of article, and the year and journal in which it was published.

Reproducibility of the checklist
- The same article (Article X) is evaluated by Student A and by Student B.
- The answers given by the two students are then compared.
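The comparison of the two students' answers can be summarized as a simple percentage of agreement per checklist question. This is a minimal sketch under that assumption; the function name `percent_agreement` and the answers are hypothetical.

```python
def percent_agreement(answers_a, answers_b):
    """Percentage of items on which two raters gave the same answer."""
    if len(answers_a) != len(answers_b):
        raise ValueError("both raters must answer the same items")
    matches = sum(a == b for a, b in zip(answers_a, answers_b))
    return 100.0 * matches / len(answers_a)

# Hypothetical answers of two students to one checklist question over five articles.
student_a = ["yes", "yes", "no", "yes", "no"]
student_b = ["yes", "yes", "no", "no", "no"]
agreement = percent_agreement(student_a, student_b)
print(agreement)  # 80.0
```

Raw agreement does not correct for agreement expected by chance; it is used here only because it is the simplest summary of the comparison described above.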

To analyze our results...
- We calculated the median of the impact factor and of the year of publication.
- We created two groups for each: ≤ median and > median.
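The grouping step can be sketched as below. The function name `split_at_median` and the impact factors are hypothetical; values equal to the median go into the lower group, matching the "≤ median" and "> median" groups above.

```python
import statistics

def split_at_median(values):
    """Return the median and the two groups: values <= median, values > median."""
    median = statistics.median(values)
    at_or_below = [v for v in values if v <= median]
    above = [v for v in values if v > median]
    return median, at_or_below, above

# Hypothetical impact factors of six journals.
impact_factors = [1.2, 2.5, 3.1, 4.8, 0.9, 2.0]
median, low_group, high_group = split_at_median(impact_factors)
print(median)      # 2.25
print(low_group)   # [1.2, 0.9, 2.0]
print(high_group)  # [2.5, 3.1, 4.8]
```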

How do we know the differences are significant?
- The tabulated data on the year of publication, the journal of publication and the type of data of each article were compared with the Chi-square test.

Perla RJ, Carifio J. Use of the Chi-square Test to Determine Significance of Cumulative Antibiogram Data. American Journal of Infectious Diseases, 2005; 1 (4):
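For a comparison between two groups on a yes/no checklist item, the test reduces to a 2x2 table. The sketch below computes the Pearson chi-square statistic without continuity correction and its p-value for one degree of freedom; the function name `chi_square_2x2` and the counts are hypothetical and are not the study's data.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows are the two impact-factor groups,
# columns are "fulfilled" and "not fulfilled" for one checklist item.
chi2, p_value = chi_square_2x2(10, 15, 12, 14)
print(f"chi2={chi2:.3f}, p={p_value:.3f}")
```

With cell counts this small, an exact test is often preferred; the chi-square version is shown because it is the test named above.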

EXPECTED RESULTS

Many articles will have misapplied the method.
- Main reasons:
  - lack of verification of the assumptions;
  - wrong verification of the assumptions. [5]
- Least fulfilled assumption: verifying whether the differences follow a normal distribution.
- Why? It requires the construction of a different graph (a histogram of the differences), while the other assumption can be verified from the averages vs. differences plot, which is often already used to display the limits of agreement.

5. Bland JM and Altman DG. Applying the Right Statistics: Analyses of Measurement Studies. Ultrasound in Obstetrics and Gynecology, 2003; 22,

The percentage of articles misapplying the method will vary over the years.
- Why? Researchers started to notice that the method was being misapplied.
- How? They realized that two methods of clinical measurement that had passed the Bland-Altman test of agreement were not actually agreeing very well.
- Example: the methods did not agree for values higher than the ones used in the test.

Bland JM and Altman DG. Applying the Right Statistics: Analyses of Measurement Studies. Ultrasound in Obstetrics and Gynecology, 2003; 22,

The impact factor of a journal should influence the percentage of misapplications of the method in the articles published there.
- Why? A higher impact factor implies higher quality, which implies more attention to scientific correctness.

RESULTS

Reproducibility of the checklist
- To ensure the correct analysis of the articles, two students analyzed the same article.
- Of the 5 articles analyzed by two different students, there was 100% agreement on all questions except the one asking whether the article had interpreted the outcome correctly according to clinical needs, on which there was disagreement for 1 article.
- The two students who disagreed re-evaluated the question and came to an agreement.

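What percentage of articles fit into each of the document types defined by ISI?
[Chart of document types, n = 18360. Categories: Articles, Reviews, Meeting abstracts, Reprints, Proceedings papers, Notes, Corrections/Additions, Correction, Letters. Counts legible in the transcript: Meeting abstracts - 70, Reprints - 2, Corrections/Additions - 2, Correction - 1.]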

The Sample
- 70 (articles and proceedings papers).
- Out of those 56, 5 were not applications of the Bland and Altman limits of agreement, while 51 were.

THE MAIN FINDINGS of our study with regard to our original research question and aims:

Table 1. n (%) of articles which fulfill each point of the checklist.

... if the impact factor of a journal influences the percentage of articles published in it that apply the method correctly

Table 2. Percentage of articles fulfilling each main point of the checklist, divided according to the impact factor of the journal where they were published. We used a Chi-square test to compare the percentages between the two levels of impact factor. LA: limits of agreement.

Result: p > 0.05.

... if, through the years, the percentage of articles applying the method incorrectly has varied

Table 3. Percentage of articles fulfilling each main point of the checklist, divided according to the year when they were published. We used a Chi-square test to compare the percentages between the two periods of publication. LA: limits of agreement. *: statistically significant.

Result: p < 0.05.

... if the percentage of articles applying the method correctly varies according to whether it is used to obtain primary or secondary data

Table 4. Percentage of articles fulfilling each main point of the checklist, divided according to the type of data obtained by using the limits of agreement. We used a Chi-square test to compare the percentages between the two types of data. LA: limits of agreement. *: statistically significant.

DISCUSSION

“What is the percentage of articles in which the Bland-Altman method is applied correctly?” (Table 1)
- The 7 articles in which the second assumption was verified correctly, a mere 14%, also correctly fulfilled the first one.
- So the errors in the articles that correctly verified the second assumption were only minor ones.

... if the impact factor of a journal influences the percentage of articles published in it that apply the method correctly (Table 2, p > 0.05)
- The articles published in journals with a lower impact factor appear to have a higher percentage of correct applications.
- The differences are, however, not statistically significant.

... if, through the years, the percentage of articles applying the method incorrectly has varied (Table 3, p < 0.05)
- In every category, the more recently published articles have a higher percentage of correct application of the method.
- Only one of the results is not statistically significant.
- Over time, authors have come to realize that the Bland-Altman method sometimes leads to incorrect findings when misapplied.
- This would lead the authors of more recent studies to be more careful when employing the method.

... if the percentage of articles applying the method correctly varies according to whether it is used to obtain primary or secondary data (Table 4)
- Articles that use the method to obtain primary data have a higher percentage of correct application than those that use it to obtain secondary data.
- Authors are more likely to pay attention to the correct use of a scientific method if it is their main method, or one of their main methods, for acquiring data.

Limitations of our work
- Relatively small sample
- Human error
- No other works to cross-reference with

Acknowledgements
- Prof. Dr. Cristina Santos
- Prof. Dr. Altamiro Rodrigues da Costa Pereira
- João Cláudio Antunes, MSc
- Class 4