27 May 2008
A meta-analysis is a review in which bias has been reduced by the systematic identification, appraisal, synthesis and statistical aggregation of all relevant studies on a specific topic, according to a predetermined and explicit method.
In 1987, a survey showed that only 24 out of 86 English-language meta-analyses reported all six areas considered important in a meta-analysis: Study Design; Combinability; Control of Bias; Statistical Analysis; Sensitivity Analysis; Application of Results.
In 1992 this survey was updated with 78 meta-analyses, and the researchers noted that methodology had definitely improved since their first survey. However, it still needed better: searches of the literature; quality evaluations of trials; synthesis of the results.
So, in 1999, several researchers created the Quality of Reporting of Meta-analyses (QUOROM) Statement to improve and standardise reporting. The QUOROM Statement, which includes a checklist and a trial flow diagram, describes the preferred way to present the different sections of a report of a meta-analysis. It is organised into 21 headings and subheadings.
The number of published meta-analyses has definitely increased over time. According to one study, after the QUOROM Statement the estimated mean quality score of the reports increased from 2.8 (95% CI: ) to 3.7 (95% CI: ), representing an estimated improvement of 0.96 (95% CI: ; p = , two-sided t-test). However, the QUOROM group itself admits that this checklist requires continuous research in order to improve the quality of meta-analyses.
But what is reproducibility? Why is it so important? Reproducibility is one of the main principles of the scientific method: it refers to the ability of a test or experiment to be accurately reproduced by someone else working independently.
The lack of reproducibility can have major consequences: a failure in reproducibility will most probably result in heterogeneity of results; at a clinical level, if a diagnostic test is not reproducible, there is the risk of a patient being wrongly diagnosed; non-reproducible items of a checklist decrease its credibility and, consequently, that of the meta-analyses that used it as a model.
The question we want to answer is whether the QUOROM checklist is a reproducible method for the evaluation of meta-analyses. Primary aim: evaluate the degree of reproducibility of the QUOROM checklist.
Secondary aims: specify which items of the QUOROM checklist are less reproducible; verify whether there are differences in reproducibility between the evaluation of meta-analyses from low impact factor journals and from high impact factor ones.
Our target population was meta-analyses. We had to select a considerable sample, so we decided on a total of 52. Our inclusion criteria were: the article being published in a medical journal; the article being published in a journal with impact factor ≤ 2 or ≥ 8; the article reporting a meta-analysis; the article being published in the last three years ( ); having access to the online full text.
First, we selected 40 journals using a stratified sampling method. From the journals of ISI Web of Knowledge that fit our criteria, we selected 20 low impact factor (IF) journals (0 < IF ≤ 2; 1234 journals in the stratum) and 20 high IF journals (IF ≥ 8; 82 journals in the stratum).
After this, we proceeded to the selection of the meta-analyses using a multi-stage sampling method. All of the journals' articles meeting the inclusion criteria described above were retrieved from each stratum: 48 meta-analyses from the low IF journals and 219 from the high IF journals. We repeated the whole journal-selection process until we had 26 meta-analyses in each pool (pool no. 1: low IF meta-analyses; pool no. 2: high IF meta-analyses).
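The two-stage draw described above can be sketched as follows. This is an illustrative simulation, not the actual selection: the journal and article names are placeholders, and the per-journal article counts are invented so that the pools roughly match the 48 and 219 meta-analyses the study found.

```python
import random

random.seed(42)  # illustrative; the study's draw was not seeded like this

# Hypothetical strata, sized like the ISI Web of Knowledge strata described.
low_if_journals = [f"low_if_journal_{i}" for i in range(1234)]
high_if_journals = [f"high_if_journal_{i}" for i in range(82)]

# Stage 1: 20 journals from each impact-factor stratum.
low_sample = random.sample(low_if_journals, 20)
high_sample = random.sample(high_if_journals, 20)

# Stage 2: pool the eligible meta-analyses found in the sampled journals
# (counts per journal simulated), then draw 26 per stratum.
low_pool = [f"{j}_ma_{k}" for j in low_sample for k in range(3)]
high_pool = [f"{j}_ma_{k}" for j in high_sample for k in range(11)]
pool_1 = random.sample(low_pool, 26)   # low IF meta-analyses
pool_2 = random.sample(high_pool, 26)  # high IF meta-analyses

# Final step: one mixed pool of 52, with the strata concealed.
articles = pool_1 + pool_2
random.shuffle(articles)
```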
The impact factor of the journal from which each meta-analysis came, the name of the journal, the authors and the year of publication were recorded in a database, which was kept secret until the evaluation with the checklist was concluded. It was used only at the end, to find out whether reproducibility and impact factor were related.
Finally, we mixed all the articles into a single pool of 52 meta-analyses (pool no. 3), concealing the stratum from which each one came.
Before analysing, we established some rules that helped us interpret each item of the checklist: if a certain item was present in the meta-analysis, but not in the place the checklist determines, we would not consider the item present; when an item had more than one point, we would only consider it present if the meta-analysis answered more than half of the points;
for item (e), we would give more importance to the point that ensures the replication of the methods; for item (o), the meta-analysis had to have a diagram describing the trial flow for the item to be considered present.
1st evaluation: 4 articles per student. The articles were then mixed again. 2nd evaluation: 4 articles per student, with no student analysing the same article twice. This way each student analysed 8 different articles. Evaluation consisted of attributing a number to each item: 1 to those covered in the meta-analysis; 0 to those that were not. This data was entered into SPSS.
Thus, our study can be classified as an observational, cross-sectional study, whose methods are characteristic of a survey, and whose purpose is to study reproducibility.
Our variables are: the current impact factor of the journals from which we randomly selected the articles; the year of publication of the articles; the impact factor of those journals in the year of publication; the classification of each item of the checklist (thirty-six categorical variables, each coded 1 or 0). These were the expected outcomes of our research.
From the classification of the items we derived other variables: summation of the present items by observer 1; summation of the present items by observer 2; average of the two summations; difference between the summations; absolute value of the difference between the summations; number of agreements between the two observers by article.
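The derived variables above can be sketched for one article. The 0/1 ratings below are illustrative, not the study's data; note how two observers can reach the same summation while disagreeing on individual items.

```python
# Two observers' 0/1 ratings of the 18 checklist items for one article
# (illustrative values).
obs1 = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0]
obs2 = [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0]

sum1 = sum(obs1)                                   # summation, observer 1
sum2 = sum(obs2)                                   # summation, observer 2
average = (sum1 + sum2) / 2                        # average of the summations
difference = sum1 - sum2                           # difference between summations
abs_difference = abs(difference)                   # its absolute value
agreements = sum(a == b for a, b in zip(obs1, obs2))  # item-level agreements
```

Here both summations equal 12 (difference 0), yet the observers agree on only 14 of the 18 items, which is exactly why the item-level agreement count is tracked separately from the summations.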
Global reproducibility. The comparison of each observer's summation was done using the intraclass correlation coefficient (ICC). Then we plotted the 95% limits of agreement of the "difference between the summations" in a scatterplot. For that, we had to check that this variable followed a normal distribution (using a histogram) and, if so, calculate its mean and standard deviation. We also compared the variables "absolute value of the difference between the summations" and "number of agreements between the two observers by article" in a scatterplot.
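A minimal sketch of these two analyses is below. The paired summations are illustrative, not the study's data, and the ICC variant actually computed in SPSS is not stated in the slides; ICC(1,1), the one-way random-effects form, is shown as one common choice.

```python
import statistics

# Illustrative pairs of checklist summations (observer 1, observer 2).
sums_obs1 = [12, 15, 10, 17, 13, 14, 11, 16]
sums_obs2 = [13, 16, 9, 18, 12, 15, 13, 17]

pairs = list(zip(sums_obs1, sums_obs2))
n, k = len(pairs), 2
grand_mean = statistics.mean(sums_obs1 + sums_obs2)
row_means = [statistics.mean(p) for p in pairs]

# One-way ANOVA mean squares, then ICC(1,1).
ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
ms_within = sum((x - m) ** 2 for p, m in zip(pairs, row_means) for x in p) / (n * (k - 1))
icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# 95% limits of agreement on the differences (mean ± 1.96 SD),
# valid under the normality check the text describes.
diffs = [a - b for a, b in pairs]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)
loa = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
```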
Agreement on each item of the checklist (reproducibility of each item). We made eighteen crosstabs to calculate: the proportion of agreement and its 95% confidence interval*; the positive proportion of agreement; the negative proportion of agreement; the kappa coefficient. * We used a normal approximation, except for those items whose confidence interval limit exceeded one, for which we used a binomial distribution.
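For one item's 2x2 crosstab, these quantities can be sketched as follows; the cell counts are illustrative, not the study's data.

```python
# 2x2 crosstab for one checklist item over the 52 articles:
# a = both observers said "present", d = both said "absent",
# b and c = the two kinds of disagreement (illustrative counts).
a, b, c, d = 30, 6, 4, 12
n = a + b + c + d

p_obs = (a + d) / n              # overall proportion of agreement
p_pos = 2 * a / (2 * a + b + c)  # positive proportion of agreement
p_neg = 2 * d / (2 * d + b + c)  # negative proportion of agreement

# Chance-expected agreement from the marginals, then Cohen's kappa.
p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
kappa = (p_obs - p_exp) / (1 - p_exp)
```

Note that kappa is undefined when one observer's ratings are constant (as happened for items (h) and (r) in the results), since the marginals then make 1 - p_exp degenerate.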
Relation between IF and reproducibility. For this analysis we did not use the current impact factor, but the one in the year of publication of the articles*. We made two scatterplots to see whether there was correlation between: the "difference between the summations" and the impact factor; the "number of agreements between the two observers by article" and the impact factor. * As the ISI Web of Knowledge database was not yet updated with the impact factors of 2007, for articles published in that year we used the impact factor of 2006.
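The correlation behind each scatterplot is Pearson's r; a minimal sketch with illustrative values (not the study's data) is:

```python
import math

# Impact factor in the year of publication vs. per-article number of
# agreements between the two observers (illustrative values).
impact_factor = [1.2, 0.8, 1.9, 9.5, 12.3, 8.4, 1.5, 10.1]
agreements = [14, 12, 15, 13, 14, 12, 13, 15]

# Pearson's correlation coefficient computed from first principles.
n = len(impact_factor)
mean_x = sum(impact_factor) / n
mean_y = sum(agreements) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(impact_factor, agreements))
sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in impact_factor))
sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in agreements))
r = cov / (sd_x * sd_y)
```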
We analysed 52 meta-analyses, whose scores had a mean of 13.97 and a standard deviation of 2.95. Global analysis of the QUOROM checklist: ICC = 0.729; 95% CI = [0.571; 0.835]. The ICC revealed that 72.9% of the total variance is explained by the variance between articles.
Histogram: differences between the summations. Limits of agreement: [-4.934; 4.434]; 95% of the cases were within this interval.
Comparison between number of agreements and absolute value of difference
Analysis of each item of the QUOROM checklist. The item that presented the highest proportion of agreement was (q): it was the only item on which the observers always agreed (100% PA). The item that presented the lowest proportion of agreement was (k); it also had the lowest kappa, i.e. only 5% of the agreement is not due to chance. Although items (h) and (r) have a high proportion of agreement, their negative proportion of agreement is zero, because the two observers never agreed on an absence (observer 1 considered these items present in all articles). With one of the variables constant, kappa was not applicable. Kappa and the proportion of agreement vary in approximately the same way, although some items present a considerable disparity, such as item (p). The positive proportion of agreement was higher than the negative, which means the observers agreed more on presences than on absences.
Correlation between impact factor and reproducibility: r = -0.002, p = 0.986; r = 0.108, p = 0.448. No correlation was found between these two variables and impact factor: in both scatterplots there was no preferential orientation of the points.
Global analysis of the QUOROM checklist. The ICC we obtained can be seen as a good one, but it has to be interpreted carefully: the ICC could be inflated by the considerably high variance (heterogeneity) of our results. The limits of agreement are considerably wide, which leads us to conclude that the QUOROM checklist's global reproducibility is weak. We also note that the mean of the "difference between the summations" is lower than zero.
This means that there was a systematic error during the study. Since difference = sum of the 1st evaluation - sum of the 2nd evaluation, the negative mean shows that the summation of the 2nd evaluation was generally higher than that of the 1st. This error may be related to the fact that, during the second analysis of the articles, the evaluators had greater confidence, ease and dexterity in applying the checklist, so they could find items in the meta-analyses that were not found at the first observation.
This also means that a pair of observers whose summations had the same value did not necessarily agree on the same items of the checklist; so the limits of agreement could be even wider. In the scatterplot we can see that some values of the difference fall below the line, which means that, despite being low, they do not correspond to full agreement.
Analysis of each item of the QUOROM checklist. Item (q): quantitative data synthesis in the Results section. We thought this would be one of the least reproducible items of the list because it includes many sub-items; however, it had the highest PA: it is an objective and explicit item, easy to identify.
Item (a), Title: almost total agreement; a simple item, easy to understand. Items (h) and (r), Introduction and Discussion respectively: almost total PA; essential in articles, so it is easy to agree about their presence.
Item (e): review methods in the Abstract section. Low PA; many sub-items; requires quantitative data synthesis in sufficient detail to permit replication.
Item (m): study characteristics in the Methods section. Low PA; not as clear as desirable ("participants' characteristics"); many sub-items ("how clinical heterogeneity was assessed"). We also think the observers may have been confused by the existence of two items with the same name, study characteristics: one in the Methods section and another in the Results section.
Item (k): validity assessment in the Methods section. Lowest PA; not an explicit item; the value of kappa is so low that its qualification seems to have been done by chance. The positive PA was always higher than the negative, which tells us that we were more sure when we said yes than when we said no. The existence of many sub-items led to doubts in qualifying items that presented only some sub-items and not all of them.
Correlation between impact factor and reproducibility. Contrary to what we expected, there was no correlation between impact factor and reproducibility. We thought that the analyses of articles from high impact factor journals would show more concordance between our two reviewers because, in our view, those articles had been submitted to a more rigorous revision and would therefore probably satisfy more items of the QUOROM checklist. However, this was not verified.
The QUOROM checklist is reasonably reproducible. However, some items should be re-evaluated, and we propose changes in order to achieve a better degree of reproducibility. No correlation was found between reproducibility and impact factor.
Ana Elisabete Costa, Ana Rita Miranda, Beatriz Carvalho, Isabel Bravo, João Moura, Mariana Pereira, Miguel Teles, Pedro Marcos, Sara Costa, Sara Leite, Sílvia Paredes, Tatiana Gomes, Valter Moreira. Professor Cristina Santos.