27 May 2008
A meta-analysis is a review in which bias has been reduced by the systematic identification, appraisal, synthesis and statistical aggregation of all relevant studies on a specific topic, according to a predetermined and explicit method.
In 1987, a survey showed that only 24 out of 86 English-language meta-analyses reported all six areas considered important in a meta-analysis: Study Design; Combinability; Control of Bias; Statistical Analysis; Sensitivity Analysis; Application of Results.
In 1992 this survey was updated with 78 meta-analyses, and the researchers noted that methodology had definitely improved since their first survey. However, it still needed better: searches of the literature; quality evaluations of trials; synthesis of the results.
So, in 1999, several researchers created the Quality of Reporting of Meta-analyses (QUOROM) Statement to improve and standardise reporting. The QUOROM Statement, which includes a checklist and a trial flow diagram, describes the preferred way to present the different sections of a report of a meta-analysis. It is organised into 21 headings and subheadings.
The number of published meta-analyses has definitely increased over time. According to one study, after the QUOROM Statement the estimated mean quality score of the reports increased from 2.8 (95% CI: ) to 3.7 (95% CI: ), representing an estimated improvement of 0.96 (95% CI: ; p = , two-sided t-test). However, the QUOROM group itself admits that this checklist requires continuous research in order to improve the quality of meta-analyses.
But what is reproducibility? Why is it so important? Reproducibility is one of the main principles of the scientific method: it refers to the ability of a test or experiment to be accurately reproduced by someone else working independently.
The lack of reproducibility can have major consequences: a failure in reproducibility will most probably result in heterogeneity of results; at a clinical level, if a diagnostic test is not reproducible, there is the risk of a patient being wrongly diagnosed; non-reproducible items of a checklist decrease its credibility and, consequently, that of the meta-analyses that used it as a model.
The question we want to answer is whether the QUOROM checklist is a reproducible method for the evaluation of meta-analyses. Primary aim: evaluate the degree of reproducibility of the QUOROM checklist.
Secondary aims: specify which items of the QUOROM checklist are less reproducible; verify whether there are differences in reproducibility between the evaluation of meta-analyses from low impact factor journals and from high impact factor ones.
Our target population was meta-analyses. We had to select a considerable sample, so we decided on a total of 52. Our inclusion criteria were: the article being published in a medical journal; the article being published in a journal with impact factor ≤ 2 or ≥ 8; the article reporting a meta-analysis; the article being published in the last three years ( ); having access to the online full text.
First, we selected 40 journals using a stratified sampling method. From the journals of ISI Web of Knowledge that fit our criteria, we selected 20 low impact factor (IF) journals (0 < IF ≤ 2; 1234 journals in the stratum) and 20 high IF journals (IF ≥ 8; 82 journals in the stratum).
After this, we proceeded to the selection of the meta-analyses using a multi-stage sampling method. All of the journals' articles meeting the inclusion criteria described above were retrieved from each stratum: 48 meta-analyses from the low IF journals and 219 from the high IF journals. We repeated the whole journal-selection process until we had 26 meta-analyses in each pool (pool no. 1: low IF meta-analyses; pool no. 2: high IF meta-analyses).
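The two-stage draw described above can be sketched as follows. This is an illustrative simulation, not the actual selection: the journal and article names are placeholders, and the per-journal article counts are invented so that the pools roughly match the 48 and 219 meta-analyses the study found.

```python
import random

random.seed(42)  # illustrative; the study's draw was not seeded like this

# Hypothetical strata, sized like the ISI Web of Knowledge strata described.
low_if_journals = [f"low_if_journal_{i}" for i in range(1234)]
high_if_journals = [f"high_if_journal_{i}" for i in range(82)]

# Stage 1: 20 journals from each impact-factor stratum.
low_sample = random.sample(low_if_journals, 20)
high_sample = random.sample(high_if_journals, 20)

# Stage 2: pool the eligible meta-analyses found in the sampled journals
# (counts per journal simulated), then draw 26 per stratum.
low_pool = [f"{j}_ma_{k}" for j in low_sample for k in range(3)]
high_pool = [f"{j}_ma_{k}" for j in high_sample for k in range(11)]
pool_1 = random.sample(low_pool, 26)   # low IF meta-analyses
pool_2 = random.sample(high_pool, 26)  # high IF meta-analyses

# Final step: one mixed pool of 52, with the strata concealed.
articles = pool_1 + pool_2
random.shuffle(articles)
```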
The impact factor of the journal from which each meta-analysis came, the name of the journal, the authors and the year of publication were recorded in a database, which was kept secret until the evaluation with the checklist was concluded. It was used only at the end, to find out whether reproducibility and impact factor were related.
Finally, we mixed all the articles into a single pool of 52 meta-analyses (pool no. 3), concealing the stratum from which each one came.
Before analysing, we established some rules that helped us interpret each item of the checklist: if a certain item was present in the meta-analysis, but not in the place the checklist determines, we would not consider the item present; when an item had more than one point, we would only consider it present if the meta-analysis answered more than half of the points;
for item (e), we would give more importance to the point that ensures the replication of the methods; for item (o), the meta-analysis had to have a diagram describing the trial flow for the item to be considered present.
1st evaluation: 4 articles per student. The articles were then mixed again. 2nd evaluation: 4 articles per student, with no student analysing the same article twice. This way each student analysed 8 different articles. Evaluation consisted of attributing a number to each item: 1 to those covered in the meta-analysis; 0 to those that were not. This data was entered into SPSS.
Thus, our study can be classified as an observational, cross-sectional study, whose methods are characteristic of a survey, and whose purpose is to study reproducibility.
Our variables are: the current impact factor of the journals from which we randomly selected the articles; the year of publication of the articles; the impact factor of those journals in the year of publication; the classification of each item of the checklist (thirty-six categorical variables, each coded 1 or 0). These were the expected outcomes of our research.
From the classification of the items we derived other variables: summation of the present items by observer 1; summation of the present items by observer 2; average of the two summations; difference between the summations; absolute value of the difference between the summations; number of agreements between the two observers by article.
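The derived variables above can be sketched for one article. The 0/1 ratings below are illustrative, not the study's data; note how two observers can reach the same summation while disagreeing on individual items.

```python
# Two observers' 0/1 ratings of the 18 checklist items for one article
# (illustrative values).
obs1 = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0]
obs2 = [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0]

sum1 = sum(obs1)                                   # summation, observer 1
sum2 = sum(obs2)                                   # summation, observer 2
average = (sum1 + sum2) / 2                        # average of the summations
difference = sum1 - sum2                           # difference between summations
abs_difference = abs(difference)                   # its absolute value
agreements = sum(a == b for a, b in zip(obs1, obs2))  # item-level agreements
```

Here both summations equal 12 (difference 0), yet the observers agree on only 14 of the 18 items, which is exactly why the item-level agreement count is tracked separately from the summations.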
Global reproducibility. The comparison of each observer's summation was done using the intraclass correlation coefficient (ICC). Then we plotted the 95% limits of agreement of the "difference between the summations" in a scatterplot. For that, we had to check that this variable followed a normal distribution (using a histogram) and, if so, calculate its mean and standard deviation. We also compared the variables "absolute value of the difference between the summations" and "number of agreements between the two observers by article" in a scatterplot.
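A minimal sketch of these two analyses is below. The paired summations are illustrative, not the study's data, and the ICC variant actually computed in SPSS is not stated in the slides; ICC(1,1), the one-way random-effects form, is shown as one common choice.

```python
import statistics

# Illustrative pairs of checklist summations (observer 1, observer 2).
sums_obs1 = [12, 15, 10, 17, 13, 14, 11, 16]
sums_obs2 = [13, 16, 9, 18, 12, 15, 13, 17]

pairs = list(zip(sums_obs1, sums_obs2))
n, k = len(pairs), 2
grand_mean = statistics.mean(sums_obs1 + sums_obs2)
row_means = [statistics.mean(p) for p in pairs]

# One-way ANOVA mean squares, then ICC(1,1).
ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
ms_within = sum((x - m) ** 2 for p, m in zip(pairs, row_means) for x in p) / (n * (k - 1))
icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# 95% limits of agreement on the differences (mean ± 1.96 SD),
# valid under the normality check the text describes.
diffs = [a - b for a, b in pairs]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)
loa = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
```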
Agreement on each item of the checklist (reproducibility of each item). We made eighteen crosstabs to calculate: the proportion of agreement and its 95% confidence interval*; the positive proportion of agreement; the negative proportion of agreement; the kappa coefficient. * We used a normal approximation, except for those items whose confidence interval limit exceeded one, for which we used a binomial distribution.
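For one item's 2x2 crosstab, these quantities can be sketched as follows; the cell counts are illustrative, not the study's data.

```python
# 2x2 crosstab for one checklist item over the 52 articles:
# a = both observers said "present", d = both said "absent",
# b and c = the two kinds of disagreement (illustrative counts).
a, b, c, d = 30, 6, 4, 12
n = a + b + c + d

p_obs = (a + d) / n              # overall proportion of agreement
p_pos = 2 * a / (2 * a + b + c)  # positive proportion of agreement
p_neg = 2 * d / (2 * d + b + c)  # negative proportion of agreement

# Chance-expected agreement from the marginals, then Cohen's kappa.
p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
kappa = (p_obs - p_exp) / (1 - p_exp)
```

Note that kappa is undefined when one observer's ratings are constant (as happened for items (h) and (r) in the results), since the marginals then make 1 - p_exp degenerate.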
Relation between IF and reproducibility. For this analysis we did not use the current impact factor, but the one in the year of publication of the articles*. We made two scatterplots to see whether there was correlation between: the "difference between the summations" and the impact factor; the "number of agreements between the two observers by article" and the impact factor. * As the ISI Web of Knowledge database was not yet updated with the impact factors of 2007, for articles published in that year we used the impact factor of 2006.
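The correlation behind each scatterplot is Pearson's r; a minimal sketch with illustrative values (not the study's data) is:

```python
import math

# Impact factor in the year of publication vs. per-article number of
# agreements between the two observers (illustrative values).
impact_factor = [1.2, 0.8, 1.9, 9.5, 12.3, 8.4, 1.5, 10.1]
agreements = [14, 12, 15, 13, 14, 12, 13, 15]

# Pearson's correlation coefficient computed from first principles.
n = len(impact_factor)
mean_x = sum(impact_factor) / n
mean_y = sum(agreements) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(impact_factor, agreements))
sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in impact_factor))
sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in agreements))
r = cov / (sd_x * sd_y)
```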
We analysed 52 meta-analyses, whose scores had a mean of 13.97 and a standard deviation of 2.95. Global analysis of the QUOROM checklist: ICC = 0.729; 95% CI = [0.571; 0.835]. The ICC revealed that 72.9% of the total variance is explained by the variance between articles.
Histogram: differences between the summations. Limits of agreement: [-4.934; 4.434]; 95% of the cases were within this interval.
Comparison between number of agreements and absolute value of difference
Analysis of each item of the QUOROM checklist. The item that presented the highest proportion of agreement was (q): it was the only item on which the observers always agreed (100% PA). The item that presented the lowest proportion of agreement was (k); it also had the lowest kappa, i.e. only 5% of the agreement is not due to chance. Although items (h) and (r) have a high proportion of agreement, their negative proportion of agreement is zero, because the two observers never agreed on an absence (observer 1 considered these items present in all articles). With one of the variables constant, kappa was not applicable. Kappa and the proportion of agreement vary in approximately the same way, although some items present a considerable disparity, such as item (p). The positive proportion of agreement was higher than the negative, which means the observers agreed more on presences than on absences.
Correlation between impact factor and reproducibility: r = -0.002, p = 0.986; r = 0.108, p = 0.448. No correlation was found between these two variables and impact factor: in both scatterplots there was no preferential orientation of the points.
Global analysis of the QUOROM checklist. The ICC we obtained can be seen as a good one, but it has to be interpreted carefully: the ICC could be inflated by the considerably high variance (heterogeneity) of our results. The limits of agreement are considerably wide, which leads us to conclude that the QUOROM checklist's global reproducibility is weak. We also note that the mean of the "difference between the summations" is lower than zero.
This means that there was a systematic error during the study. Since difference = sum of the 1st evaluation - sum of the 2nd evaluation, the negative mean shows that the summation of the 2nd evaluation was generally higher than that of the 1st. This error may be related to the fact that, during the second analysis of the articles, the evaluators had greater confidence, ease and dexterity in applying the checklist, so they could find items in the meta-analyses that were not found at the first observation.
This also means that a pair of observers whose summations had the same value did not necessarily agree on the same items of the checklist; so the limits of agreement could be even wider. In the scatterplot we can see that some values of the difference fall below the line, which means that, despite being low, they do not correspond to full agreement.
Analysis of each item of the QUOROM checklist. Item (q): quantitative data synthesis in the Results section. We thought this would be one of the least reproducible items of the list because it includes many sub-items; however, it had the highest PA: it is an objective and explicit item, easy to identify.
Item (a), Title: almost total agreement; a simple item, easy to understand. Items (h) and (r), Introduction and Discussion respectively: almost total PA; essential in articles, so it is easy to agree about their presence.
Item (e): review methods in the Abstract section. Low PA; many sub-items; requires quantitative data synthesis in sufficient detail to permit replication.
Item (m): study characteristics in the Methods section. Low PA; not as clear as desirable ("participants' characteristics"); many sub-items ("how clinical heterogeneity was assessed"). We also think the observers may have been confused by the existence of two items with the same name, study characteristics: one in the Methods section and another in the Results section.
Item (k): validity assessment in the Methods section. Lowest PA; not an explicit item; the value of kappa is so low that its qualification seems to have been done by chance. The positive PA was always higher than the negative, which tells us that we were more sure when we said yes than when we said no. The existence of many sub-items led to doubts in qualifying items that presented only some sub-items and not all of them.
Correlation between impact factor and reproducibility. Contrary to what we expected, there was no correlation between impact factor and reproducibility. We thought that the analyses of articles from high impact factor journals would show more concordance between our two reviewers because, in our view, those articles had been submitted to a more rigorous revision and would therefore probably satisfy more items of the QUOROM checklist. However, this was not verified.
The QUOROM checklist is reasonably reproducible. However, some items should be re-evaluated, and we propose changes in order to achieve a better degree of reproducibility. No correlation was found between reproducibility and impact factor.
Ana Elisabete Costa, Ana Rita Miranda, Beatriz Carvalho, Isabel Bravo, João Moura, Mariana Pereira, Miguel Teles, Pedro Marcos, Sara Costa, Sara Leite, Sílvia Paredes, Tatiana Gomes, Valter Moreira. Professor Cristina Santos.