Systematic Review Module 9: Quantitative Synthesis I Joseph Lau, MD Thomas Trikalinos, MD, PhD Tufts EPC
CER Process Overview 1
Learning Objectives Basic principles of combining data Common metrics for meta-analysis Basics of combining results across studies and effects of weights Meaning of heterogeneity Fixed effect and random effects model 2
Synonyms for Meta-analysis Quantitative overview Research (evidence) synthesis Research integration Pooling (less precise; suggests data from multiple sources are simply lumped together) Combining (preferred by some; carries the connotation of applying procedures to data) 3
Caveats of Meta-analysis Few will criticize you for doing a systematic review. But as soon as you combine data (or draw conclusions based on “similarly grouped” studies), you will likely get disagreements. Most meta-analyses are retrospective exercises, suffering from all the problems of being an observational design. We cannot make up missing information or fix badly collected, analyzed, or reported data. Therefore, care is needed when deciding on the type of data that should be included in a meta-analysis. 4
Why Is That? Apples and oranges (heterogeneity) Garbage in, garbage out (quality) Selection of outcomes (soft or hard) Selection of studies Publication bias Many assumptions are used in quantifying results (there are no “assumption-free” statistics) 5
Reasons for Meta-analysis Get an overall estimate of treatment effect Appreciate the degree of uncertainty Appreciate heterogeneity Forces you to think rigorously about the data 6
Types of Data that Could Be Combined in Meta-analysis of Summary Data Dichotomous (events, e.g., deaths) Measures (odds ratios, correlations) Continuous data (mmHg, pain scores, proportions) Survival curves Diagnostic test performance (sensitivity, specificity) “Effect size” 7
Basic Principles in Combining Data For each analysis, one study should contribute only one effect. The effect may be a single outcome or a composite of several independent outcomes. Effect being combined should be the same across studies or similar. Know your question. The question drives your systematic review, meta-analysis, and interpretation of the results. 8
Things to Know about the Data before Combining Biological plausibility Scale Fragility of small numbers 9
True Associations May Disappear When You Combine Data Inappropriately 10
Apparent Association May Be Seen When There is None 11
Same Changes in One Scale May Have Different Meaning in Another Both A–B and C–D involve a change of one absolute unit. A–B change (1 to 2) represents a 100% relative change. C–D change (7 to 8) represents only a 14% relative change. 12
Same Change in One Scale May Have Different Meaning in Another Scale
         Treatment                Control
Study    Events  Total  Rate      Events  Total  Rate      Relative Risk   Absolute Risk
A                       %                        %         0.5             10%
B                       %                        %         0.5             0.1%
13
Effect of Small Changes on the Estimate (small numbers give fragile estimates)
Baseline case        Effect of decrease of 1 event   Effect of increase of 1 event   Relative change of estimate
2/10        20%      1/10        10%                 3/10        30%                 ± 50%
20/100      20%      19/100      19%                 21/100      21%                 ± 5%
200/1,000   20%      199/1,000   19.9%               201/1,000   20.1%               ± 0.5%
14
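The fragility shown in this table is plain arithmetic and can be checked in a few lines. This is only an illustrative sketch; the function name `one_event_swing` is an assumption made for the example, not something from the module.

```python
def one_event_swing(events, total):
    """Return (baseline, one_fewer, one_more, relative_change) for a proportion."""
    baseline = events / total
    one_fewer = (events - 1) / total
    one_more = (events + 1) / total
    # One event shifts the estimate by 1/total, which is 1/events relative to baseline.
    relative_change = (one_more - baseline) / baseline
    return baseline, one_fewer, one_more, relative_change


# The three baseline cases from the slide: same 20% rate, different study sizes.
for events, total in [(2, 10), (20, 100), (200, 1000)]:
    baseline, one_fewer, one_more, rel = one_event_swing(events, total)
    print(f"{events}/{total}: {baseline:.1%} -> {one_fewer:.1%} or {one_more:.1%} "
          f"(relative change +/- {rel:.1%})")
```

The relative change equals 1 divided by the number of events, which is why the estimate stabilizes as the event count grows.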
Dichotomous Outcomes Binary outcomes: event or no event, yes or no. This is the most common type of outcome reported in clinical trials. Examples are dead or alive, stroke or no stroke, cure or failure. 2x2 tables are commonly used to report the results. Continuous variables are sometimes converted into dichotomous outcomes; for example, a threshold value may be used to report pain scores as improved or not improved. 15
Example of 2x2 Table: ISIS-2
                  Streptokinase   Placebo
Vascular deaths   791             1,029
Survive           7,801           7,566
TOTAL             8,592           8,595
Randomized trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of suspected acute myocardial infarction. Lancet 1988;ii:
Definitions of Treatment Effects from 2x2 Table $OR = (a \times d) / (b \times c)$ 17
Available Metrics for Combining Dichotomous Outcome Data Odds ratio (OR) Risk ratio (RR) Risk difference (RD) NNT (number needed to treat) can be derived (inverse of the combined risk difference) = 1/RD 18
Effect Size Dimensionless metric The basic idea is to express diverse types of related effects in standard deviation units so that they can be combined However, the availability and selection of reported effects may be biased, and different effects may vary in importance Frequently used in the education and social science literature Infrequently used in medicine because the results are difficult to interpret 19
Properties of Odds Ratio Desirable mathematical properties, unbiased estimator Symmetrical outcome meaning (the odds ratio for dying is the inverse [reciprocal] of the odds ratio for living): an odds ratio of 0.5 for dying corresponds to an odds ratio of 2.0 for living. Can approximate risk ratio at low event rates Not easy to interpret 20
Properties of Risk Difference Symmetrical meaning of outcome (5% more of dying is 5% less of living) Magnitude of effect directly interpretable NNT can be calculated and clinically useful Risk difference across studies more likely to be heterogeneous Combining heterogeneous RD in a meta-analysis may not be meaningful Unbiased estimator 21
Properties of Risk Ratio Easy to understand by clinicians Needs to be interpreted in view of baseline rate Asymmetric meaning for outcome (the risk ratio of dying is not the inverse of the risk ratio of living) Less desirable mathematical properties (not an unbiased estimator) Unstable variance (usually not a big problem) 22
The Complementary Outcome of Risk Ratio Is Not Symmetrical
            Dead   Alive   Total
Treatment   20     80      100
Control     40     60      100
OR (dead) = (20 x 60) / (40 x 80) = 0.375
OR (alive) = (80 x 40) / (20 x 60) = 2.67 (= 1 / 0.375)
RR (dead) = (20/100) / (40/100) = 0.5
RR (alive) = (80/100) / (60/100) = 1.33 (not 1 / 0.5 = 2.0)
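The symmetry of the odds ratio and the asymmetry of the risk ratio can be verified directly from the counts on this slide (20 of 100 dead on treatment, 40 of 100 dead on control). A minimal sketch; the helper names `odds_ratio` and `risk_ratio` are chosen for the example only.

```python
def odds_ratio(events_t, total_t, events_c, total_c):
    """Odds of the outcome in the treatment group divided by odds in the control group."""
    odds_t = events_t / (total_t - events_t)
    odds_c = events_c / (total_c - events_c)
    return odds_t / odds_c


def risk_ratio(events_t, total_t, events_c, total_c):
    """Risk of the outcome in the treatment group divided by risk in the control group."""
    return (events_t / total_t) / (events_c / total_c)


# Outcome counted as "dead", then the complementary outcome "alive".
or_dead = odds_ratio(20, 100, 40, 100)
or_alive = odds_ratio(80, 100, 60, 100)
rr_dead = risk_ratio(20, 100, 40, 100)
rr_alive = risk_ratio(80, 100, 60, 100)

print(f"OR dead = {or_dead:.3f}, OR alive = {or_alive:.3f}, product = {or_dead * or_alive:.3f}")
print(f"RR dead = {rr_dead:.3f}, RR alive = {rr_alive:.3f}, product = {rr_dead * rr_alive:.3f}")
```

The two odds ratios multiply to 1, so switching the outcome simply inverts the estimate; the two risk ratios do not.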
Calculation of Treatment Effects of ISIS-2 Data
                  Streptokinase   Placebo
Vascular deaths   791             1,029
Survive           7,801           7,566
TOTAL             8,592           8,595
TR (treatment group event rate) = 791 / 8592 = 0.092
CR (control group event rate) = 1029 / 8595 = 0.120
RR = 0.092 / 0.120 = 0.77
OR = (791 x 7566) / (1029 x 7801) = 0.75
RD = 0.092 − 0.120 = −0.028
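The ISIS-2 calculations can be reproduced with a small function. This is a sketch, not a reference implementation: the cell labels follow the layout implied by the OR calculation on this slide (a and b are events in the treatment and control groups, c and d are the corresponding non-events), and the function name `effect_measures` is an assumption for illustration.

```python
def effect_measures(a, b, c, d):
    """Effect measures from one 2x2 table.

    Assumed layout: a = treatment events, b = control events,
    c = treatment non-events, d = control non-events.
    """
    treatment_rate = a / (a + c)
    control_rate = b / (b + d)
    risk_difference = treatment_rate - control_rate
    return {
        "TR": treatment_rate,
        "CR": control_rate,
        "RR": treatment_rate / control_rate,
        "OR": (a * d) / (b * c),
        "RD": risk_difference,
        "NNT": 1 / abs(risk_difference),
    }


# ISIS-2: streptokinase vs. placebo, vascular deaths.
isis2 = effect_measures(a=791, b=1029, c=7801, d=7566)
for name, value in isis2.items():
    print(f"{name} = {value:.3f}")
```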
ISIS-2 Streptokinase vs. Placebo Vascular Death Estimate with the 95% CI
                        Estimate   95% CI
Risk ratio (RR)         0.77       0.70–0.84
Odds ratio (OR)         0.75       0.68–0.82
Risk difference (RD)    −0.028     −0.037 to −0.018
NNT (1/RD)              36         27–54
25
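Intervals of this kind can be approximated from the 2x2 counts with standard large-sample (Wald) formulas. The slides do not say which interval method was used, so the sketch below is an assumption: ratios are handled on the log scale and the NNT limits are taken as the inverse of the RD limits. The function name `wald_intervals` is hypothetical.

```python
import math


def wald_intervals(events_t, n_t, events_c, n_c, z=1.96):
    """Return {measure: (estimate, lower, upper)} using large-sample formulas."""
    p_t, p_c = events_t / n_t, events_c / n_c
    non_t, non_c = n_t - events_t, n_c - events_c

    # Risk ratio and odds ratio: interval built on the log scale, then exponentiated.
    log_rr = math.log(p_t / p_c)
    se_log_rr = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    log_or = math.log((events_t * non_c) / (events_c * non_t))
    se_log_or = math.sqrt(1 / events_t + 1 / non_t + 1 / events_c + 1 / non_c)

    # Risk difference: interval built on the natural scale.
    rd = p_t - p_c
    se_rd = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)

    return {
        "RR": tuple(math.exp(log_rr + k * z * se_log_rr) for k in (0, -1, 1)),
        "OR": tuple(math.exp(log_or + k * z * se_log_or) for k in (0, -1, 1)),
        "RD": tuple(rd + k * z * se_rd for k in (0, -1, 1)),
    }


ci = wald_intervals(791, 8592, 1029, 8595)
for name, (est, low, high) in ci.items():
    print(f"{name}: {est:.3f} ({low:.3f} to {high:.3f})")

# NNT limits as the inverse of the absolute RD limits.
nnt = tuple(1 / abs(x) for x in ci["RD"])
print(f"NNT: {nnt[0]:.0f} ({nnt[1]:.0f} to {nnt[2]:.0f})")
```

Taking the NNT limits as the inverse of the RD limits only makes sense when the RD interval excludes zero, as it does here.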
Beta-Blockers after Myocardial Infarction - Secondary Prevention
Columns: N, Study, Year, Experiment (Obs, Tot), Control (Obs, Tot), Odds Ratio, 95% CI (Low, High)
Studies: Reynolds, Wilhelmsson, Ahlmark, Multctr. Int, Baber, Rehnqvist, Norweg.Multr, Taylor, BHAT, Julian, Hansteen, Manger Cats, Rehnqvist, ASPS, EIS, LITRG, Herlitz
Simpson’s Paradox (Rothman, Modern Epidemiology) (I) Suppose a man enters a shop to buy a hat and finds a table of 30 hats, 10 black and 20 gray. He discovers that 9 of 10 black hats fit, but only 17 of the 20 gray hats fit. Thus, he notes that the proportion of black hats that fit is 90% compared with 85% of the gray hats. At another table in the same shop, he finds another 30 hats, 20 black and 10 gray. At this table, 3 (15%) of the black hats fit, but only 1 (10%) of the gray hats fits. 27
Simpson’s Paradox (II) Before he chooses a hat, the shop closes for the evening, so he returns on the following morning. Overnight, the clerk has piled all the hats on the same table: Now there are 30 hats of each color. The shopper remembers that yesterday the proportion of black hats that fit was greater at each of the two tables. Today he finds that, although all the same hats are displayed, when mixed together only 40% (12 of 30) of the black hats fit, whereas 60% (18 of 30) of the gray hats fit. 28
Simpson’s Paradox (III)
Table 1                  Fit   Not fit
Black                    9     1
Gray                     17    3
black (90%) > gray (85%)
Table 2                  Fit   Not fit
Black                    3     17
Gray                     1     9
black (15%) > gray (10%)
Pooling Tables 1 and 2   Fit   Not fit
Black                    12    18
Gray                     18    12
black (40%) < gray (60%)
29
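The reversal in the hat example can be reproduced by computing the proportions within each table and then again after lumping the counts together. A short sketch using only the counts given in the story; the variable names are illustrative.

```python
# (hats that fit, total hats) by color, for each table in the shop.
tables = {
    "Table 1": {"black": (9, 10), "gray": (17, 20)},
    "Table 2": {"black": (3, 20), "gray": (1, 10)},
}


def fit_rate(fit, total):
    return fit / total


# Within each table, black hats fit more often than gray hats.
for name, counts in tables.items():
    print(name, {color: f"{fit_rate(*ft):.0%}" for color, ft in counts.items()})

# Naive pooling: add up the raw counts across tables, ignoring which table they came from.
pooled = {
    color: (
        sum(t[color][0] for t in tables.values()),
        sum(t[color][1] for t in tables.values()),
    )
    for color in ("black", "gray")
}
print("Pooled", {color: f"{fit_rate(*ft):.0%}" for color, ft in pooled.items()})
```

The direction flips because the two tables contribute very different numbers of black and gray hats, which is the reason meta-analysis combines within-study comparisons rather than raw totals.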
What Is the “Average (Overall)” Treatment—Control Difference in DBP? 30
Simple Average [(−6.2) + (−7.7) + (−0.1)] / 3 = −4.7 mmHg 31
Weighted Average [(554 x −6.2) + (304 x −7.7) + (39 x −0.1)] / (554 + 304 + 39) = −6.4 mmHg 32
General Formula: Weighted Average Effect Size (d+) $d_+ = \sum_{i=1}^{k} w_i d_i \,/\, \sum_{i=1}^{k} w_i$ where: $d_i$ = effect size of the $i$th study, $w_i$ = weight of the $i$th study, $k$ = number of studies 33
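The general formula is a one-line computation. The sketch below applies it to the three DBP differences (−6.2, −7.7, −0.1 mmHg) with the study sizes 554, 304, and 39 used as weights in the preceding slides; the function name `weighted_average` is an assumption for the example.

```python
def weighted_average(effects, weights):
    """d+ = sum(w_i * d_i) / sum(w_i)."""
    if len(effects) != len(weights):
        raise ValueError("effects and weights must have the same length")
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)


dbp_differences = [-6.2, -7.7, -0.1]   # treatment minus control, mmHg
study_sizes = [554, 304, 39]           # used here as the weights

simple = weighted_average(dbp_differences, [1, 1, 1])   # equal weights = simple average
weighted = weighted_average(dbp_differences, study_sizes)

print(f"simple average:   {simple:.1f} mmHg")
print(f"weighted average: {weighted:.1f} mmHg")
```

The small study with almost no effect pulls the simple average toward zero but barely moves the weighted average.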
Calculation of Weights Generally the inverse of the variance of treatment effect (that captures both study size and precision) Different formula for odds ratio, risk ratio, risk difference Readily available in books and software 34
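As an illustration of inverse variance weighting, the sketch below pools log odds ratios from three trials. The trial counts are made up for the example (identical event rates, sizes differing tenfold), the cell layout (a, b = events in treatment and control; c, d = non-events) is an assumption, and the function names are hypothetical.

```python
import math


def log_or_with_variance(a, b, c, d):
    """Log odds ratio and its large-sample variance for one 2x2 table."""
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def fixed_effect_pool(tables):
    """Inverse variance weighted average of log ORs; returns pooled OR and weight shares."""
    estimates = [log_or_with_variance(*t) for t in tables]
    weights = [1 / variance for _, variance in estimates]
    total = sum(weights)
    pooled_log_or = sum(w * y for w, (y, _) in zip(weights, estimates)) / total
    shares = [w / total for w in weights]
    return math.exp(pooled_log_or), shares


# Hypothetical trials: 10% vs. 20% event rates with 100, 1,000, and 10,000 per arm.
trials = [(10, 20, 90, 80), (100, 200, 900, 800), (1000, 2000, 9000, 8000)]
pooled_or, shares = fixed_effect_pool(trials)

print(f"pooled OR = {pooled_or:.3f}")
print("share of total weight:", [f"{s:.1%}" for s in shares])
```

Because the variance shrinks in proportion to the counts, the largest trial carries roughly 90% of the weight even though all three trials report the same odds ratio.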
Heterogeneity (Diversity) Is it reasonable (are studies and effects sufficiently similar) to estimate an average effect? Types of heterogeneity: conceptual (clinical) heterogeneity; statistical heterogeneity 35
Conceptual (Clinical) Heterogeneity Are the studies of similar treatments, populations, settings, design, etc., such that an average effect would be clinically meaningful? 36
Endoscopic Hemostasis. An Effective Therapy for Bleeding Peptic Ulcers. Sacks HS, Chalmers TC, et al. JAMA : RCTs compared endoscopic hemostasis with standard therapy for bleeding peptic ulcer 5 different types of treatment (monopolar electrode, bipolar electrode, argon laser, neodymium-YAG laser, sclerosant injection) 4 different conditions (active bleeding, nonspurting blood vessel, no blood vessels seen, undesignated) 3 different outcomes (emergency surgery, overall mortality, recurrent bleeding) 37
Statistical Heterogeneity Is the observed variability of effects greater than that expected by chance alone? 38
A Container with a Fixed (Known) Number of White and Black Balls (fixed effect model) 39
Random Sampling from a Container with a Fixed Number of White and Black Balls (equal sample size) 40
Random Sampling from a Container with Fixed Number of White and Black Balls (different sample size) 41
Different Containers with Different Proportions of White and Black Balls (random effects model) 42
Random Sampling from Containers to Get an Overall Estimate of the Percentage of White (or Black) Balls 43
Statistical Models of Pooling 2x2 Tables Fixed Effect Model: weights studies by the inverse of the within-study (sampling) variance. Assumes a common treatment effect. The size of the study and the number of events are the main determinants of its weight. Random Effects Model: weights studies by the inverse of the sum of the within-study variance and the between-study variance. Allows for different treatment effects. Tends to be more conservative (gives a wider confidence interval) when heterogeneity is present. 44
Fixed Effect Example I ask all of you to measure the height of the flag pole outside of this building. There will be some variations in the reported values, but all of you are measuring the same flag pole. Discounting potential errors (biases) from using different measuring instruments, the variation is due to “random errors” around the truth. 45
Fixed Effects Model 46
Fixed Effects Model 47
Random Effects Example Suppose that I am interested in knowing the average height of the flag poles in a city so that I can compare with another city’s average flag pole heights. I ask all of you to randomly measure the height of flag poles around the city. There will be a lot more variations in the reported values because of measurements of different flag poles and different measurements of the same flag pole. The greater variation is due to “random errors” around the true height of each flag pole and the distribution of the heights of different flag poles. 49
Random Effects Model 50
Random Effects Model 51
Influenza Vaccine Efficacy from Observational Studies Gross et al. Ann Intern Med
Fixed Effect and Random Effects Models Fixed effect weight: $w_i = 1 / v_i$ Random effects weight: $w_i^* = 1 / (v_i + v^*)$ where: $v_i$ = within-study variance, $v^*$ = between-study variance 53
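The two weighting schemes can be compared side by side. The sketch below estimates the between-study variance with the DerSimonian and Laird moment estimator, which is the random effects method named in the summary slide; the effect sizes and variances are invented for illustration, and the function name is an assumption.

```python
def dersimonian_laird(effects, variances):
    """Return fixed and random effects estimates, tau^2, and both sets of weights."""
    k = len(effects)
    w = [1 / v for v in variances]                          # fixed effect weights 1/v_i
    fixed = sum(wi * di for wi, di in zip(w, effects)) / sum(w)

    # Moment estimate of the between-study variance (v* on the slide), truncated at zero.
    q = sum(wi * (di - fixed) ** 2 for wi, di in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    w_star = [1 / (v + tau2) for v in variances]            # random effects weights
    random = sum(wi * di for wi, di in zip(w_star, effects)) / sum(w_star)
    return {"fixed": fixed, "random": random, "tau2": tau2, "w": w, "w_star": w_star}


# Hypothetical log effect sizes and within-study variances (not from any real trial).
result = dersimonian_laird(effects=[-0.6, -0.1, 0.3], variances=[0.01, 0.04, 0.09])

se_fixed = (1 / sum(result["w"])) ** 0.5
se_random = (1 / sum(result["w_star"])) ** 0.5
print(f"fixed  = {result['fixed']:.3f} (SE {se_fixed:.3f})")
print(f"random = {result['random']:.3f} (SE {se_random:.3f}), tau^2 = {result['tau2']:.3f}")
```

Adding the same between-study variance to every study makes the weights more nearly equal and widens the confidence interval, which is the sense in which the random effects model is more conservative.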
Dealing with Heterogeneity 54
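Chi-Square Homogeneity Test $Q = \sum_{i=1}^{k} w_i (d_i - d_+)^2$, compared with a chi-square distribution with $k - 1$ degrees of freedom. NOTE (Mantel-Haenszel): $d_i = \ln(OR_i)$, $d_+ = \ln(OR_{MH})$, $w_i = 1/\mathrm{variance}(\ln OR_i)$, $\mathrm{variance}(\ln OR_i) = 1/a_i + 1/b_i + 1/c_i + 1/d_i$ 55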
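A homogeneity statistic of this form can be computed directly from the 2x2 tables. The sketch below is an illustration under stated assumptions: each table is given as (a, b, c, d) with a and b the events in the treatment and control groups and c and d the non-events, the pooled value is the Mantel-Haenszel odds ratio as in the note, and the example counts are made up.

```python
import math


def homogeneity_q(tables):
    """Return (Q, degrees of freedom) for a list of 2x2 tables (a, b, c, d)."""
    # Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n).
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    d_plus = math.log(numerator / denominator)

    q = 0.0
    for a, b, c, d in tables:
        d_i = math.log((a * d) / (b * c))            # log odds ratio of study i
        w_i = 1 / (1 / a + 1 / b + 1 / c + 1 / d)    # inverse variance of the log OR
        q += w_i * (d_i - d_plus) ** 2
    return q, len(tables) - 1


# Hypothetical tables: two studies with the same OR, then two with opposite effects.
q_same, df_same = homogeneity_q([(10, 20, 90, 80), (100, 200, 900, 800)])
q_diff, df_diff = homogeneity_q([(10, 20, 90, 80), (30, 15, 70, 85)])

print(f"similar studies:   Q = {q_same:.2f} on {df_same} df")
print(f"divergent studies: Q = {q_diff:.2f} on {df_diff} df")
```

With 1 degree of freedom the 5% critical value of the chi-square distribution is 3.84, so the second pair would be flagged as heterogeneous and the first would not.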
Summary: Basic Statistical Methods of Combining 2x2 Tables Metrics: odds ratio, risk ratio, risk difference Fixed Effect Model: Mantel-Haenszel, Peto, exact, inverse variance weighted Random Effects Model: DerSimonian & Laird 56
Summary: Statistical Models of Combining 2x2 Tables Most meta-analyses of clinical trials combine treatment effects (risk ratio, odds ratio, risk difference) across studies to produce a common estimate, using either a fixed effect or a random effects model. In practice, the results from the two models are often similar when there is little or no heterogeneity. When heterogeneity is present, the random effects model generally produces a more conservative result (smaller Z-score): a similar estimate with a wider confidence interval. However, in rare cases of extreme heterogeneity the random effects model may yield counterintuitive results. 57
Summary The decision to do a meta-analysis should be based on a well-formulated question and an appreciation of how the results will be used. The math is relatively simple and can be easily programmed using statistical or spreadsheet software. Many commercial and shareware meta-analysis programs are readily available. Decisions (e.g., fixed or random effects model, measure of effect) can be complex. Results may vary with the assumptions made. 58