1
Basics of Meta-analysis
Steff Lewis, Rob Scholten Cochrane Statistical Methods Group (Thanks to the many people who have worked on earlier versions of this presentation)
2
Introduction
3
Session plan
Introduction
Effect measures – what they mean
Exercise 1
Meta-analysis
Exercise 2
Heterogeneity
Exercise 3
Summary
4
Before we start… this workshop will discuss binary outcomes only
e.g. dead or alive, pain free or in pain, smoking or not smoking
Each participant is in one of two possible, mutually exclusive states.
There are other workshops for continuous data, etc.
5
Where to start
You need a pre-defined question:
“Does aspirin increase the chance of survival to 6 months after an acute stroke?”
“Does inhaling steam decrease the chance of a sinus infection in people who have a cold?”
6
Where to start
Collect data from all the trials and enter into RevMan.
For each trial you need:
The total number of patients in each treatment group
The number of patients who had the relevant outcome in each treatment group
7
Effect measures – what they mean
8
Which effect measure?
In RevMan you can choose:
Relative Risk (RR) = Risk Ratio
Odds Ratio (OR)
Risk Difference (RD) = Absolute Risk Reduction (ARR)
9
Risk
24 people skiing down a slope, and 6 fall
risk of a fall = 6 falls / 24 who could have fallen = 6/24 = ¼ = 0.25 = 25%
risk = number of events of interest / total number of observations
10
Odds
24 people skiing down a slope, and 6 fall
odds of a fall = 6 falls / 18 who did not fall = 6/18 = 1/3 = 0.33 (not usually expressed as a %)
odds = number of events of interest / number without the event
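The two definitions above can be checked with a few lines of Python. This is only an illustrative sketch: the function names `risk` and `odds` are made up for the example, and exact fractions are used so that ¼ and 1/3 show up as such.

```python
from fractions import Fraction

def risk(events, total):
    """Risk = events of interest / everyone who could have had the event."""
    return Fraction(events, total)

def odds(events, total):
    """Odds = events of interest / those who did not have the event."""
    return Fraction(events, total - events)

falls, skiers = 6, 24  # the skiing example from the slides
print(f"risk = {risk(falls, skiers)} = {float(risk(falls, skiers)):.2f}")
print(f"odds = {odds(falls, skiers)} = {float(odds(falls, skiers)):.2f}")
```

The only difference between the two is the denominator: everyone (24) for risk, only those without the event (18) for odds.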
11
Expressing it in words
Risk
the chances of falling were one in four, or 25%
Odds
the chances of falling were one third of the chances of not falling
one person fell for every three that didn’t fall
the chances of falling were 3 to 1 against
12
Do risks and odds differ much?
Control arm of trial by Blum: 130 people still dyspeptic out of 164
chance of still being dyspeptic: risk = 130/164 = 0.79; odds = 130/34 = 3.82
Tanzania trial, control arm: 4 cases in 63 women
chance of pregnancy-induced hypertension: risk = 4/63 = 0.063; odds = 4/59 = 0.068
eg1 - Moayyedi et al. BMJ 2000;321:659-64
eg2 - Knight M et al. Antiplatelet agents for preventing and treating pre-eclampsia (Cochrane Review). In: The Cochrane Library, Issue 3, Oxford: Update Software.
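Risk and odds are two ways of writing the same information, so each can be converted into the other. The sketch below (helper names are hypothetical) uses odds = risk / (1 − risk) and reruns the two control arms above, showing that the two measures are far apart when the event is common and nearly equal when it is rare.

```python
def odds_from_risk(risk):
    """Convert a risk (0 <= risk < 1) into the corresponding odds."""
    return risk / (1 - risk)

def risk_from_odds(odds):
    """Convert odds back into a risk."""
    return odds / (1 + odds)

# (label, events, total) for the two control arms quoted on the slide
arms = [("Blum control arm", 130, 164), ("Tanzania control arm", 4, 63)]

for label, events, total in arms:
    r = events / total
    o = odds_from_risk(r)
    print(f"{label}: risk = {r:.3f}, odds = {o:.3f}")
```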
13
Comparing groups – 2x2 tables
Blum et al   Still dyspeptic   Not still dyspeptic   Total
Treatment    119               45                    164
Control      130               34                    164
Total        249               79                    328
14
Risk ratio (relative risk)
Blum et al   Still dyspeptic   Not still dyspeptic   Total
Treat        119               45                    164
Control      130               34                    164
Total        249               79                    328
risk of event on treatment = 119/164
risk of event on control = 130/164
risk ratio = risk on treatment / risk on control = (119/164) / (130/164) = 0.726 / 0.793 = 0.92
Where risk ratio = 1, this implies no difference in effect
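As a quick check of the arithmetic above, here is a minimal sketch of the risk ratio calculation. The function and argument names are illustrative only.

```python
def risk_ratio(events_trt, total_trt, events_ctl, total_ctl):
    """Risk on treatment divided by risk on control."""
    risk_trt = events_trt / total_trt
    risk_ctl = events_ctl / total_ctl
    return risk_trt / risk_ctl

# Blum et al: 119/164 still dyspeptic on treatment, 130/164 on control
rr = risk_ratio(119, 164, 130, 164)
print(f"risk ratio = {rr:.2f}")
```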
15
Odds ratio
Blum et al   Still dyspeptic   Not still dyspeptic   Total
Treat        119               45                    164
Control      130               34                    164
Total        249               79                    328
odds of event on treatment = 119/45
odds of event on control = 130/34
odds ratio = odds on treatment / odds on control = (119/45) / (130/34) = 2.64 / 3.82 = 0.69
Where odds ratio = 1, this implies no difference in effect
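The odds ratio can be sketched the same way. The second function uses the equivalent "cross-product" form (a×d)/(b×c), which is not on the slide but gives the same number. The cell labels a, b, c, d are a common convention, assumed here for illustration.

```python
def odds_ratio(a, b, c, d):
    """a, b = events and non-events on treatment; c, d = the same on control."""
    return (a / b) / (c / d)

def odds_ratio_cross_product(a, b, c, d):
    """Algebraically identical form: (a*d) / (b*c)."""
    return (a * d) / (b * c)

# Blum et al: treatment 119 still dyspeptic, 45 not; control 130 and 34
or_value = odds_ratio(119, 45, 130, 34)
print(f"odds ratio = {or_value:.2f}")
```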
16
What is the difference between Peto OR and OR?
The Peto Odds Ratio is an approximation to the Odds Ratio that works particularly well with rare events
17
Expressing risk ratios and odds ratios
Risk ratio 0.92
the risk of still being dyspeptic on treatment was about 92% of the risk on control
treatment reduced the risk by about 8%
treatment reduced the risk to 92% of what it was
Odds ratio 0.69
treatment reduced the odds by about 30%
the odds of still being dyspeptic in treated patients were about two-thirds of what they were in controls
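The "reduced by about 8%" and "reduced by about 30%" statements come from subtracting the ratio from 1. A small sketch (the helper name is hypothetical):

```python
def percent_reduction(ratio):
    """Relative reduction implied by a risk ratio or odds ratio below 1."""
    return (1 - ratio) * 100

rr = (119 / 164) / (130 / 164)   # risk ratio from the Blum trial
or_value = (119 / 45) / (130 / 34)  # odds ratio from the Blum trial

print(f"risk reduced by about {percent_reduction(rr):.0f}%")
print(f"odds reduced by about {percent_reduction(or_value):.0f}%")
```

Note that these are relative reductions, which is why the same trial gives two different percentages depending on whether risks or odds are being compared.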
18
(Absolute) Risk difference
risk on treatment – risk on control
for Blum et al: 119/164 – 130/164 = 0.726 – 0.793 = –0.067
usually expressed as a %: –6.7%
treatment reduced the risk of being dyspeptic by about 7 percentage points
Where risk difference = 0, this implies no difference in effect
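For completeness, the risk difference can be sketched as a plain subtraction. Names are illustrative, and the result is in absolute terms (percentage points), unlike the relative measures above.

```python
def risk_difference(events_trt, total_trt, events_ctl, total_ctl):
    """Risk on treatment minus risk on control (an absolute difference)."""
    return events_trt / total_trt - events_ctl / total_ctl

rd = risk_difference(119, 164, 130, 164)
print(f"risk difference = {rd:.3f} ({rd * 100:.1f} percentage points)")
```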
19
What do we want from our summary statistic?
We would like these three properties:
Communication of effect: users must be able to use the result. Everyone knows what it means, and it is useful for communicating treatment effects in practice.
Consistency of effect: it would be ideal to have one number that we can take away and apply in a range of situations.
Mathematical properties: easy to manipulate mathematically.
20
Summary
                OR   RR   RD
Communication   -    +    ++
Consistency
Mathematics
Few people use odds because of communication; RR or RD are usually used.
Further info in “Dealing with dichotomous data” workshop.
21
Exercise 1
22
Meta-analysis
23
What is meta-analysis?
A way to calculate an average
Estimates an ‘average’ or ‘common’ effect
Improves the precision of an estimate by using all available data
24
What is a meta-analysis?
Optional part of a systematic review Systematic reviews Meta-analyses
25
When can we do a meta-analysis?
When more than one study has estimated an effect
When there are no differences in the study characteristics that are likely to substantially affect outcome
When the outcome has been measured in similar ways
When the data are available (take care with interpretation when only some data are available)
26
Averaging studies
Starting with the summary statistic for each study, how should we combine these?
One way of combining studies would be to calculate a simple average, which gives each study equal weight
This seems intuitively wrong
Some studies are more likely to give an answer closer to the ‘true’ effect than others
27
Weighting studies
In meta-analysis, studies are weighted so that those which give us more information get more weight:
More participants
More events
Lower variance
Weight is closely related to the width of the study confidence interval: wider confidence interval = less weight
Check if this makes sense to participant group
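One common way to turn "more information" into a number is inverse-variance weighting, sketched below for the log risk ratio. This is an assumption made for illustration: the slides do not say which weighting method RevMan used, and other methods (such as Mantel-Haenszel or Peto) give different percentages. The two studies are hypothetical, chosen to have the same size but different numbers of events.

```python
def log_rr_variance(events_trt, total_trt, events_ctl, total_ctl):
    """Approximate variance of the log risk ratio for one study."""
    return 1 / events_trt - 1 / total_trt + 1 / events_ctl - 1 / total_ctl

# Hypothetical studies: (events_trt, total_trt, events_ctl, total_ctl)
studies = {
    "Study A (few events)": (5, 100, 10, 100),
    "Study B (many events)": (30, 100, 40, 100),
}

# Weight = 1 / variance, then rescale so the weights add up to 100%
raw = {name: 1 / log_rr_variance(*cells) for name, cells in studies.items()}
total = sum(raw.values())
weights = {name: 100 * w / total for name, w in raw.items()}

for name, weight in weights.items():
    print(f"{name}: {weight:.1f}%")
```

Both studies have 200 participants, yet the one with more events gets most of the weight, because its estimate has the lower variance.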
28
For example
Study           Deaths on hypothermia   Deaths on control   Weight (%)
Clifton 1992    1/5                                         3.6
Clifton 1993    8/23                    8/22                21.5
Hirayama 1994   4/12                    5/10                11.3
Jiang 1996      6/23                    14/24               23.4
Marion 1997     9/39                    10/42               30.0
Meissner 1998   3/12                    3/13                9.7
Usually, it is Marion that participants identify. Point out that it is in fact Jiang that gets the most weight, and that this is because it has a higher event rate, even though Marion has a higher number of participants.
29
Displaying results graphically
RevMan produces forest plots. We’ll now go through a forest plot, looking at the various components.
30
there’s a label to tell you what the comparison is and what the outcome of interest is
31
At the bottom there’s a horizontal line. This is the scale measuring the treatment effect. Here the outcome is death, and towards the left the scale is less than one, meaning the treatment has made death less likely.
Take care to read what the labels say: things to the left do not always mean the treatment is better than the control.
Note that when using RevMan you can change the labels at the bottom of the graph, if need be.
32
The vertical line in the middle is where the treatment and control have the same effect: there is no difference between the two.
33
For each study there is an id.
The data for each trial are here, divided into the experimental and control groups.
This is the % weight given to this study in the pooled analysis.
34
The label above the graph tells you what statistic has been used.
Each study is given a blob, placed where the data measure the effect. The size of the blob is proportional to the % weight.
The horizontal line is called a confidence interval and is a measure of how we think the result of this study might vary with the play of chance. The wider the horizontal line is, the less confident we are of the observed effect.
The data shown in the graph are also given numerically.
Talk about the blob first, and then the CI, then show them the numerical data to the right.
35
The pooled analysis is given a diamond shape, where the widest bit in the middle is located at the calculated best guess (point estimate), and the horizontal width is the confidence interval.
Definition of a 95% confidence interval: if a trial were repeated many times and a 95% confidence interval calculated each time, about 95 out of every 100 of those intervals would contain the true effect.
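To make the confidence interval less abstract, here is a sketch of one standard way to compute a 95% interval for a single study's risk ratio: work on the log scale, add and subtract 1.96 standard errors, then transform back. The formula is a common large-sample approximation and is an assumption here, since the slides do not state how RevMan computes its intervals. The data are the Blum trial from earlier.

```python
import math

def risk_ratio_ci(events_trt, total_trt, events_ctl, total_ctl, z=1.96):
    """Risk ratio with an approximate 95% confidence interval."""
    rr = (events_trt / total_trt) / (events_ctl / total_ctl)
    # Standard error of log(RR), large-sample approximation
    se = math.sqrt(1 / events_trt - 1 / total_trt
                   + 1 / events_ctl - 1 / total_ctl)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

rr, lower, upper = risk_ratio_ci(119, 164, 130, 164)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

On a forest plot, `rr` is where the blob sits and `lower` to `upper` is the horizontal line through it.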
36
Could we just add the data from all the trials together?
One approach to combining trials would be to add all the treatment groups together, add all the control groups together, and compare the totals.
This is wrong for several reasons, and it can give the wrong answer.
37
An extreme example, where simple addition even makes the result seem to go in the wrong direction. Compare the two approaches:
If we just add up the columns we get 34.3% vs 32.5%, a RR of 1.06: a higher death rate in the steroids group
From a meta-analysis, we get RR = 0.96: a lower death rate in the steroids group
38
Problems with simple addition of studies
breaks the power of randomisation
imbalances within trials introduce bias
Totally lose the power of randomisation (that it provides comparable groups). Remember that this is the whole reason for choosing this design of study in the first place.
39
In effect we are comparing this experimental group directly with this control group. This is not a randomised comparison.
In effect we are directly comparing the Cooper treatment group with the Gaab control group (or any of them). It makes no sense to go to all the trouble of finding randomised studies only to ignore their design when we come to analyse them.
40
The Pitts trial contributes 17% (201/1194) of all the data to the experimental column, but 8% (74/925) to the control column.
Therefore it contributes more information to the average death rate in the experimental column than it does to the control column.
There is a high death rate in this trial, so the average death rate in the experimental column is pulled up towards the high level in the Pitts trial more than it is in the control column.
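The effect described here can be reproduced with made-up numbers. In the sketch below, two hypothetical trials each show a benefit of treatment, but the high-risk trial put more patients on treatment and the low-risk trial put more on control. Adding up the columns then reverses the direction of the result, while a stratified method (Mantel-Haenszel is used here as one example; it is not necessarily the method behind the slide's RR of 0.96) keeps each comparison within its own trial.

```python
# Hypothetical trials: (events_trt, total_trt, events_ctl, total_ctl)
trials = [
    (80, 200, 45, 100),  # high-risk trial, more patients on treatment
    (10, 100, 24, 200),  # low-risk trial, more patients on control
]

def per_trial_rr(trial):
    a, n1, c, n2 = trial
    return (a / n1) / (c / n2)

def naive_rr(trials):
    """Add up the columns across trials, then compare the totals."""
    a = sum(t[0] for t in trials)
    n1 = sum(t[1] for t in trials)
    c = sum(t[2] for t in trials)
    n2 = sum(t[3] for t in trials)
    return (a / n1) / (c / n2)

def mantel_haenszel_rr(trials):
    """Pooled risk ratio that respects the within-trial comparisons."""
    num = sum(a * n2 / (n1 + n2) for a, n1, c, n2 in trials)
    den = sum(c * n1 / (n1 + n2) for a, n1, c, n2 in trials)
    return num / den

for trial in trials:
    print(f"within-trial RR = {per_trial_rr(trial):.2f}")
print(f"simple addition RR = {naive_rr(trials):.2f}")
print(f"Mantel-Haenszel RR = {mantel_haenszel_rr(trials):.2f}")
```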
41
Interpretation - “Evidence of absence” vs “Absence of evidence”
If the confidence interval crosses the line of no effect, this does not mean that there is no difference between the treatments.
It means we have found no statistically significant difference in the effects of the two interventions.
42
In the example below, as more data are included, the overall odds ratio remains the same but the confidence interval narrows. It is not true that there is ‘no difference’ shown in the first rows of the plot; there just aren’t enough data to show a statistically significant result.
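The same point can be sketched numerically. The three hypothetical 2x2 tables below have identical event rates (20% vs 30%) and therefore the same odds ratio, but increasing numbers of patients. The interval uses the usual large-sample formula for the log odds ratio, which is an assumption made for this illustration.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI.
    a, b = events and non-events on treatment; c, d = the same on control."""
    or_value = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se)
    upper = math.exp(math.log(or_value) + z * se)
    return or_value, lower, upper

# Hypothetical data: 50, 200 and 800 patients per arm, same event rates
tables = [(10, 40, 15, 35), (40, 160, 60, 140), (160, 640, 240, 560)]
results = [odds_ratio_ci(*table) for table in tables]

for (a, b, c, d), (or_value, lower, upper) in zip(tables, results):
    print(f"n per arm = {a + b}: OR = {or_value:.2f}, "
          f"95% CI {lower:.2f} to {upper:.2f}")
```

With the smallest table the interval crosses 1, so the result is not statistically significant, even though the estimated effect is exactly the same as in the larger tables.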
43
Interpretation - Weighing up benefit and harm
When interpreting results, don’t just emphasise the positive results. A treatment might cure acne instantly, but kill one person in 10,000 (very important as acne is not life threatening).
44
Interpretation - Quality
Rubbish studies = unbelievable results
If all the trials in a meta-analysis were of very low quality, then you should be less certain of your conclusions.
Instead of “Treatment X cures depression”, try “There is some evidence that Treatment X cures depression, but the data should be interpreted with caution.”
45
Exercise 2
46
Heterogeneity
47
What is heterogeneity? Heterogeneity is variation between the studies’ results
48
Causes of heterogeneity
Differences between studies with respect to:
Patients: diagnosis, in- and exclusion criteria, etc.
Interventions: type, dose, duration, etc.
Outcomes: type, scale, cut-off points, duration of follow-up, etc.
Quality and methodology: randomised or not, allocation concealment, blinding, etc.

We can look at heterogeneity from two perspectives: the clinical and the statistical point of view. The clinical perspective concerns the qualitative differences between studies, which lead us to expect the studies to give slightly different results.

For example, the patients may be different. They may have different diagnoses: if the outcome is cancer, different studies may include different tumour types; if the outcome is cardiovascular events, some studies may include angina while others may exclude it. The inclusion and exclusion criteria of studies may differ. The patients may differ in age or sex. If we are interested in interventions for alcohol abuse, it may not make sense to lump together studies of middle-aged women and studies of young men who binge drink.

The interventions may differ in type, dose and duration. When considering dietary interventions for HT, would we put together diets to lose weight and diets to reduce intake of polysaturated fats? The interventions may differ in dose: we might expect different results for high and low doses, and for drugs given orally and intravenously. The interventions may differ in duration: drugs may be given by bolus or over a more prolonged period.

The outcomes measured may differ: in cancer some studies may measure mortality, some relapse. Is it valid to put these together? Studies of psychological outcomes such as depression may use different scales to measure depression. If depression is treated as a Yes/No dichotomous outcome, different studies may use different cut-points to define depression. The duration of follow-up may matter a lot: studies of cancer may have very different results if follow-up is one year, 5 years or 10 years.

Finally, studies may differ in their quality:
Some may be randomised, some may not be. Non-randomised studies often give a larger estimate of the effect of treatment; we say they tend to be biased.
Some studies may be designed so that neither the patient nor the person who gives the treatment can predict in advance which treatment group a patient will be assigned to. We call this allocation concealment. Studies which do not have allocation concealment have again been shown to be biased and to over-estimate the treatment effect.
Some studies may be designed so that neither the patient, nor the treatment provider, nor the person who assesses the outcome knows which treatment group the patient was in. We say that these studies are blinded. Studies which are not blind tend to over-estimate the treatment effect.

These are qualitative sources of heterogeneity which we can predict from what we know about the design of the studies.
49
How to deal with heterogeneity
1. Do not pool at all
2. Ignore heterogeneity: use fixed effect model
3. Allow for heterogeneity: use random effects model
4. Explore heterogeneity (“Dealing with heterogeneity” workshop)
What do we do if there is statistical heterogeneity? We have 4 choices:
- don’t pool the studies
- do pool them, but ignore the heterogeneity. The standard fixed effects model which we have just used does this.
- pool the studies, but allow for the heterogeneity between them using a random effects model
- investigate the heterogeneity and use special statistical methods to try to explain it.
Let’s look at these options in turn.
50
How to assess heterogeneity from a Revman forest plot
51
Statistical measures of heterogeneity
The Chi2 test measures the amount of variation in a set of trials, and tells us if it is more than would be expected by chance.
Small p values suggest that heterogeneity is present.
This test is not very good at detecting heterogeneity. Often a cut-off of p<0.10 is used, but lack of statistical significance does not mean there is no heterogeneity.
The most important thing to remember about this is how to apply it (covered already). However, here is some technical information about the chi squared test for heterogeneity.
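The statistic behind this test is usually Cochran's Q: a weighted sum of squared differences between each study's effect and the pooled effect. The sketch below assumes inverse-variance weights and uses three hypothetical studies (log-scale effects with their variances); it stops at Q and its degrees of freedom, and the p value would then be read from a chi-squared distribution with that many degrees of freedom.

```python
def cochran_q(effects, variances):
    """Cochran's Q and its degrees of freedom (number of studies - 1)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1

# Hypothetical log risk ratios and their variances for three studies
effects = [-0.5, -0.1, 0.3]
variances = [0.04, 0.05, 0.04]

q, df = cochran_q(effects, variances)
print(f"Q = {q:.2f} on {df} degrees of freedom")
```

When the studies agree closely, Q is near its degrees of freedom or below; a Q well above the degrees of freedom is what produces a small p value.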
52
Statistical measures of heterogeneity (2)
A new statistic, I2, is available in RevMan 4.2.
I2 is the proportion of variation that is due to heterogeneity rather than chance.
Large values of I2 suggest heterogeneity.
Roughly, I2 values of 25%, 50%, and 75% could be interpreted as indicating low, moderate, and high heterogeneity.
For more info see: Higgins JPT et al. Measuring inconsistency in meta-analyses. BMJ 2003;327:
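I2 is computed from the Chi2 heterogeneity statistic (Q) and its degrees of freedom, as described in the Higgins et al. paper cited above: I2 = (Q − df) / Q, expressed as a percentage and set to zero if negative. A minimal sketch, with a hypothetical function name and example values:

```python
def i_squared(q, df):
    """I2 as a percentage: the share of variation beyond what chance explains."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

# Hypothetical example: Q = 8 from three studies (df = 2)
print(f"I2 = {i_squared(8.0, 2):.0f}%")
```

Because Q is expected to be about equal to its degrees of freedom when there is no heterogeneity, anything in excess of df is attributed to real differences between studies.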
53
Philosophy behind fixed effect model:
there is one real value for the treatment effect
all trials estimate this one value
Problems with ignoring heterogeneity: confidence intervals too narrow
We can ignore the heterogeneity and use the standard fixed effects model which we have used above. The philosophy behind the fixed effects model is that there is one real value for the parameter of interest and all trials estimate this one and only real value. Differences between estimates of the treatment effect are assumed to be caused by variation within studies (= variation between patients = sampling variation).
Problems with ignoring heterogeneity are, firstly, that the confidence intervals on the pooled overall effect are too narrow, so we may believe a treatment has a significant effect when in fact it does not; and, secondly, that it is difficult to interpret the pooled estimate: it may not apply to all the different types of patients, interventions and outcomes in the various studies.
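Under the fixed effect philosophy, pooling is just a weighted average in which only within-study variance determines the weights. The sketch below assumes the inverse-variance version of the fixed effect model (RevMan offers others) and reuses three hypothetical log-scale study results.

```python
import math

def fixed_effect_pool(effects, variances):
    """Inverse-variance fixed effect estimate and its standard error."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, se

# Hypothetical log risk ratios and within-study variances
effects = [-0.5, -0.1, 0.3]
variances = [0.04, 0.05, 0.04]

pooled, se = fixed_effect_pool(effects, variances)
lower, upper = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled log RR = {pooled:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Notice that the standard error depends only on the study variances, not on how much the study results disagree with each other. That is why the interval can be too narrow when heterogeneity is present.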
54
Philosophy behind random effects model:
there are many possible real values for the treatment effect (depending on dose, duration, etc.)
each trial estimates its own real value
But if there is heterogeneity between studies, we can allow for it using a random effects model. This has a different philosophical basis. It assumes that there is a whole range of real values of the treatment effect, because there is a whole range of populations of different patients. Each trial estimates its own real value. The real values from all the different studies are not exactly the same, but they are closely related.
So there are two sources of variation:
variation within studies = between patients (as before)
variation between studies
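One widely used way to put the "two sources of variation" into practice is the DerSimonian-Laird method, sketched below. The slides do not name the estimator, so treat this choice as an assumption. The between-study variance (called `tau2` here) is estimated from Cochran's Q and then added to every study's own variance before weighting. The data are hypothetical.

```python
import math

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random effects estimate, its SE, and tau squared."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, never negative
    # New weights combine within-study and between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2

# Hypothetical log risk ratios and within-study variances
effects = [-0.5, -0.1, 0.3]
variances = [0.04, 0.05, 0.04]

pooled, se, tau2 = random_effects_pool(effects, variances)
fixed_se = math.sqrt(1 / sum(1 / v for v in variances))
print(f"tau2 = {tau2:.3f}")
print(f"random effects SE = {se:.3f} vs fixed effect SE = {fixed_se:.3f}")
```

Adding `tau2` to every variance makes the weights more equal across studies and makes the pooled standard error larger, which is where the wider confidence interval of the random effects model comes from.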
55
Interpretation of fixed and random effects results
If there is heterogeneity, Fixed Effects and Random Effects Models:
may give different pooled estimates (but they usually don’t)
have different interpretations:
RD = 0.3, Fixed Effects Model: the best estimate of the one and only real RD is 0.3
RD = 0.3, Random Effects Model: the best estimate of the mean of all possible real values of the RD is 0.3
The Random Effects Model gives a wider confidence interval, so it gives a more conservative estimate of the effects of the treatment.
In practice, people tend to interpret fixed and random effects the same way.
56
Exercise 3
57
Summary
58
Summary
Precisely define the question you want to answer
Choose an appropriate effect measure
Collect data from trials and do a meta-analysis if appropriate
Interpret the results carefully:
  Evidence of absence vs absence of evidence
  Benefit and harm
  Quality
  Heterogeneity
59
Other sources of help and advice
The Reviewer’s handbook
The distance learning material
The RevMan user guide
The Collaborative Review Group you are working with