Intro to Single-Paper Meta-Analyses
Courtney Soderberg
Statistical and Methodological Consultant, Center for Open Science
A Hypothetical Set of Studies
Study | t-statistic | p-value | N | Cohen's d
1 | 2.49 | .0138 | 158 | .3961
2 | 3.98 | .0001 | 158 | .6339
3 | .86 | .3887 | 158 | .1375
4 | 1.41 | .1611 | 158 | .2241
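The d column can be recovered from the t-statistics with d = t * sqrt(1/n1 + 1/n2). The sketch below is in Python, although the deck itself uses R. It assumes each N = 158 study was split into two equal groups of 79; that split is an assumption, because the slide gives only the total N. Small differences in the fourth decimal are expected, since the t-values are rounded to two decimals.

```python
import math

def cohens_d_from_t(t, n1, n2):
    """Cohen's d from an independent-samples t statistic: d = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1 / n1 + 1 / n2)

# Hypothetical group sizes: each N = 158 study assumed to be split 79/79.
t_values = {1: 2.49, 2: 3.98, 3: 0.86, 4: 1.41}
for study, t in t_values.items():
    print(f"Study {study}: d = {cohens_d_from_t(t, 79, 79):.4f}")
```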
'Imperfect' sets of studies are pretty likely
Sampling error: what's the probability that all 4 studies come out significant if each has 80% power? .8^4 ≈ 41%
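The .8^4 calculation is one term of a binomial distribution. Here is a small Python sketch, assuming the studies are independent and all have the same power. It gives the probability that exactly k of 4 studies reach significance.

```python
from math import comb

def prob_k_significant(k, n_studies, power):
    """Binomial probability that exactly k of n independent studies are significant."""
    return comb(n_studies, k) * power**k * (1 - power) ** (n_studies - k)

for k in range(5):
    print(k, round(prob_k_significant(k, 4, 0.80), 4))
```

With 80% power, the chance of at least one non-significant result among four studies is 1 - .4096 ≈ 59%. A "mixed" set is therefore more likely than a clean sweep.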
Sampling Distributions
[Figure: sampling distributions at 30% power and at 90% power]
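The contrast between 30% and 90% power can be mimicked with a quick simulation. This Python sketch uses a simplifying assumption: it treats each study as a two-sided z-test, not a t-test. The mean of the test statistic is chosen so that the upper rejection region has the target probability; the tiny lower-tail region is ignored. The function name and parameters are illustrative only.

```python
import random
from statistics import NormalDist

def significance_rate(power, n_sims=10_000, alpha=0.05, seed=1):
    """Share of simulated two-sided z-tests with p < alpha when the true effect yields `power`."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = .05
    shift = z_crit + z.inv_cdf(power)        # expected z statistic under the alternative
    rng = random.Random(seed)                # seeded so the run is reproducible
    hits = sum(abs(rng.gauss(shift, 1.0)) > z_crit for _ in range(n_sims))
    return hits / n_sims

for power in (0.30, 0.90):
    print(power, significance_rate(power))
```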
What is a researcher to do?
Hide the non-significant studies - NO!
Throw a bunch of covariates at them - NO!
Cry and drown their sorrows in wine - OK, but not needed
Pre-register a highly powered 5th study to 'decide' whether they have an effect or not
Combine your evidence - YAY!
Meta-Analyses Aren't Just for Psych Bull
We typically think of meta-analyses as huge undertakings, but most of the same techniques can be applied to small sets of studies, even just two studies.
Combining Evidence: Don't ignore the fact that data come from different studies
Simpson’s Paradox
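Simpson's paradox shows why pooling raw data while ignoring the study can mislead. The Python sketch below uses entirely hypothetical numbers. Within each study, the treatment group scores 2 points higher. However, the studies differ in baseline level and in how many people were assigned to each condition, so the naively pooled difference reverses sign.

```python
# Hypothetical cells: (study, group, n, mean outcome)
cells = [
    ("Study 1", "control", 90, 50.0),
    ("Study 1", "treatment", 10, 52.0),
    ("Study 2", "control", 10, 20.0),
    ("Study 2", "treatment", 90, 22.0),
]

def weighted_mean(rows):
    """n-weighted mean outcome over a list of cells."""
    total_n = sum(n for _, _, n, _ in rows)
    return sum(n * m for _, _, n, m in rows) / total_n

def treatment_minus_control(rows):
    treat = [r for r in rows if r[1] == "treatment"]
    ctrl = [r for r in rows if r[1] == "control"]
    return weighted_mean(treat) - weighted_mean(ctrl)

within = {s: treatment_minus_control([r for r in cells if r[0] == s])
          for s in ("Study 1", "Study 2")}
pooled = treatment_minus_control(cells)   # ignores which study each cell came from
print(within, pooled)
```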
Combining Evidence: Don't ignore the fact that data come from different studies
Individual Patient Data (IPD) meta-analysis: uses all of the raw data, with clustering by study/trial (a multilevel model)
Meta-analysis (aggregate-data meta-analysis): each study provides aggregate effect size estimates
Why might we combine evidence?
Get a more precise estimate of the effect size
Figure out whether the variability we're seeing is real or chance variability
Underpowered individual studies can gain power in the aggregate
Meta-Analysis 101
Calculates the average effect size and its confidence interval from a set of studies
The average is weighted so that more informative studies affect it more
Can also give information about heterogeneity of effect sizes
Meta-Analysis 101
Assumes the effect sizes are independent: one effect size per study
Apples-to-apples comparisons:
Study 1: 1 piece of chocolate vs. 5 pieces of chocolate -> happiness
Study 2: (1 vs. 5 pieces of chocolate) x (crappy vs. high-quality chocolate) -> happiness
What would we meta-analyze?
Two types: fixed-effect or random-effects
Fixed-Effect Meta-Analysis
Assumes that all studies share the same population effect size: all the variation we see from study to study is due purely to sampling error
The average is weighted by 1/variance of each effect size, so more precise effect sizes get more weight; generally this means larger studies get more weight
Tells you the average of these studies, but doesn't justify generalizing to studies outside your sample
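The inverse-variance weighting can be written in a few lines. This is a minimal Python sketch of a fixed-effect meta-analysis, which in the deck is done with metafor in R. It assumes you already have an effect size and its sampling variance for each study. The example inputs are hypothetical.

```python
import math

def fixed_effect(effects, variances, z_crit=1.96):
    """Fixed-effect meta-analysis: inverse-variance weighted mean, its SE, and a 95% CI."""
    weights = [1 / v for v in variances]   # more precise studies get more weight
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    se = math.sqrt(1 / total)              # SE of the weighted mean
    return mean, se, (mean - z_crit * se, mean + z_crit * se)

# Hypothetical inputs: two effect sizes with sampling variances .01 and .04
mean, se, ci = fixed_effect([0.2, 0.5], [0.01, 0.04])
print(round(mean, 3), round(se, 3), [round(x, 3) for x in ci])
```

The more precise study (variance .01) gets four times the weight of the other one, so the average (.26) sits much closer to .2 than to .5.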
Random-Effects Meta-Analysis
Allows for the possibility that you're drawing from heterogeneous population effect sizes: variability is due both to sampling error and to real differences in effect sizes
Gives you some measures/tests of that variability
Weighting is a bit more complicated, but the same general principle applies: each study gets weight 1/(SE^2 + tau^2), where tau^2 is the population (between-study) variance in effect sizes
Allows you to generalize to studies outside of your sample
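To make the 1/(SE^2 + tau^2) weights concrete, here is a Python sketch using the DerSimonian-Laird estimator of tau^2. I chose this estimator because it has a simple closed form. It is not necessarily what the deck uses: metafor's rma() defaults to REML, so its numbers will generally differ somewhat.

```python
import math

def random_effects_dl(effects, variances):
    """Random-effects meta-analysis with the DerSimonian-Laird estimate of tau^2."""
    k = len(effects)
    w = [1 / v for v in variances]                                   # fixed-effect weights
    sum_w = sum(w)
    fe_mean = sum(wi * y for wi, y in zip(w, effects)) / sum_w
    q = sum(wi * (y - fe_mean) ** 2 for wi, y in zip(w, effects))    # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)                               # truncated at zero
    w_star = [1 / (v + tau2) for v in variances]                     # 1/(SE^2 + tau^2)
    mean = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return mean, se, tau2, q

# Hypothetical: two very different effects with equal sampling variance
print(random_effects_dl([0.0, 1.0], [0.1, 0.1]))
```

When the studies agree more than sampling error would predict (Q below its degrees of freedom), tau^2 is set to 0 and the result reduces to the fixed-effect estimate.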
Which to choose?
Theoretical considerations
Power considerations
Outlier considerations
What happens if I choose incorrectly?
What tools are out there?
Various R packages
SPSS macro META
Some Shiny apps (e.g., https://blakemcshane.shinyapps.io/spmeta/)
We're mostly going to use R because it's free and you can save the script.
metafor package
A flexible package that can calculate effect sizes, run meta-analyses, and graph results, with good documentation!
https://www.rdocumentation.org/packages/metafor/versions/1.9-9
http://www.metafor-project.org/doku.php
It has many functions, which means many options to sift through, but the functions themselves are pretty easy to run.
Example 1 - Between-Subjects t-Tests
4 studies, each a between-subjects t-test
What we'll need: the means, standard deviations, and n per group for each study
Put these in a data frame with each study as its own row
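For readers who want to see what happens to those columns, here is a Python sketch. It computes a standardized mean difference (pooled-SD Cohen's d) and its usual large-sample sampling variance from one row of summary data. The numbers are hypothetical. This is uncorrected d; metafor's escalc(measure = "SMD") additionally applies the small-sample correction, so its values will be slightly smaller.

```python
import math

def smd_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d (pooled-SD standardizer) and its large-sample sampling variance."""
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pooled
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return d, var_d

# Hypothetical summary data: one dict per study, like one data-frame row per study
studies = [
    {"m1": 5.5, "sd1": 1.0, "n1": 40, "m2": 5.0, "sd2": 1.0, "n2": 40},
    {"m1": 6.0, "sd1": 2.0, "n1": 25, "m2": 5.0, "sd2": 2.0, "n2": 25},
]
for row in studies:
    d, v = smd_from_summary(**row)
    print(round(d, 3), round(v, 4))
```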
Output Notes
SMD is actually Hedges' g, not Cohen's d. Cohen's d slightly overestimates the population effect size, because the sample SD in its denominator tends to underestimate the population SD; Hedges' g corrects for this. The bias is larger in smaller samples.
Keep your small sample of studies in mind when interpreting heterogeneity information (Card, 2012, Applied Meta-Analysis for Social Science Research).
Keep the sample/N per study in mind when interpreting the overall results.
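The d-to-g correction multiplies d by a factor J slightly below 1. This Python sketch shows both the widely used approximation J = 1 - 3/(4*df - 1) and the exact gamma-function form, with df = n1 + n2 - 2 for two independent groups. Which form any particular package uses is not asserted here. The function names are illustrative.

```python
import math

def hedges_j_approx(df):
    """Common approximation to the small-sample correction factor: J = 1 - 3 / (4*df - 1)."""
    return 1 - 3 / (4 * df - 1)

def hedges_j_exact(df):
    """Exact correction factor via the gamma function (log scale avoids overflow for large df)."""
    return math.exp(math.lgamma(df / 2) - math.lgamma((df - 1) / 2)) / math.sqrt(df / 2)

def hedges_g(d, n1, n2):
    """Shrink Cohen's d toward zero to get Hedges' g for two independent groups."""
    return hedges_j_approx(n1 + n2 - 2) * d

for n in (10, 20, 79):
    print(n, round(hedges_g(0.5, n, n), 4))
```

As the loop shows, the correction shrinks d noticeably with 10 per group and hardly at all with 79 per group. This matches the note that the bias is larger in smaller samples.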
https://medium.com/towards-data-science/how-to-calculate-statistical-power-for-your-meta-analysis-e108ee586ae8
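The linked article is about statistical power for a meta-analysis. As a rough illustration, and not necessarily the article's exact method, here is a Python sketch of the normal-approximation power of the fixed-effect z-test. It assumes the true effect delta is common to all studies and that the sampling variances are known. The inputs are hypothetical.

```python
import math
from statistics import NormalDist

def fixed_effect_power(delta, variances, alpha=0.05):
    """Approximate two-sided power of the fixed-effect z-test when the true common effect is delta."""
    z = NormalDist()
    se = math.sqrt(1 / sum(1 / v for v in variances))   # SE of the pooled estimate
    z_crit = z.inv_cdf(1 - alpha / 2)
    lam = delta / se                                      # expected z statistic
    return (1 - z.cdf(z_crit - lam)) + z.cdf(-z_crit - lam)

# Hypothetical: true effect .3, each study's effect-size variance .04
print(round(fixed_effect_power(0.3, [0.04]), 3))       # one study, roughly .32
print(round(fixed_effect_power(0.3, [0.04] * 4), 3))   # four studies pooled, roughly .85
```

This is the "underpowered individual studies can gain power in the aggregate" point in numbers. Under a random-effects model, tau^2 would be added to each variance and power would be lower.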
Example 1 - Practice Try it with ttest_exp2.csv
SPSS Option: David Wilson's macros (http://mason.gmu.edu/~dwilsonb/ma.html)
INCLUDE 'U:\MEANES.SPS'.
MEANES ES = Hedges_g /W = Fixed_weight /MODEL = REML.
W needs to be the inverse variance of each effect size.
You will need to calculate the ES and variances yourself.
SPSS Output
Tau = population variability (standard deviation) in the effect sizes; tau^2 is the corresponding variance
Example 2 - 3 studies, all correlations
What we'll need: the correlation for each study and the N for each study
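Correlations are usually meta-analyzed on Fisher's z scale, where the sampling variance is approximately 1/(N - 3), and the result is then back-transformed to r. Here is a minimal fixed-effect sketch in Python; in metafor the analogous effect size is escalc(measure = "ZCOR"). The input values are hypothetical.

```python
import math

def meta_correlations(rs, ns, z_crit=1.96):
    """Fixed-effect meta-analysis of correlations on Fisher's z scale, back-transformed to r."""
    zs = [math.atanh(r) for r in rs]          # Fisher's z = atanh(r)
    ws = [n - 3 for n in ns]                  # inverse of Var(z) = 1 / (N - 3)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    se = 1 / math.sqrt(sum(ws))
    ci = (math.tanh(z_bar - z_crit * se), math.tanh(z_bar + z_crit * se))
    return math.tanh(z_bar), ci

# Hypothetical: three studies' correlations and sample sizes
r_bar, (lo, hi) = meta_correlations([0.25, 0.40, 0.10], [60, 120, 45])
print(round(r_bar, 3), round(lo, 3), round(hi, 3))
```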
More complicated designs... HERE BE DRAGONS!
Example 3 - 4 studies: 2 between-subjects and 2 within-subjects tests
Example 4 - Some things to think through:
Effect sizes need to be in the same metric
Within- and between-subjects ES typically use different SDs, so they are in different metrics
Which version makes the most theoretical sense? Raw score or change score? With or without the correlation? Which standard deviation makes the most sense?
Morris & DeShon (1997)
The top one is, basically, the Cohen's d we know and love.
It uses the pre-test SD: it's assumed you have an experimental/control setup, so the control (pre-test) scores should give the 'natural' SD.
Using the pooled SD for within-subjects designs had an unknown sampling variance, so it was problematic (at least when they wrote this article).
If you don't meet the homogeneity of variance assumption, you might want to use Bonett (2007); the math is uninviting, at best, but escalc will do it for you.
Why am I showing you this slide? Look at the last line: you need to know ρ (the population correlation between the repeated measures). If you don't have a good estimate of ρ, then turning independent-groups effect sizes *into* repeated-measures effect sizes is a bit tricky. You could do a sensitivity analysis, or just convert everything to the raw-score metric.
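To see how much ρ matters, here is a Python sketch of the usual conversion from the change-score metric to the raw-score metric, d_raw = d_change * sqrt(2(1 - ρ)), as given in Morris & DeShon (2002). The conversion assumes equal pre- and post-test SDs. The d value and the grid of ρ values are hypothetical; looping over several plausible ρ values is the sensitivity analysis mentioned above.

```python
import math

def change_to_raw_metric(d_rm, rho):
    """Convert a change-score (repeated-measures) d to the raw-score metric: d_rm * sqrt(2 * (1 - rho))."""
    return d_rm * math.sqrt(2 * (1 - rho))

# Sensitivity analysis over plausible pre-post correlations (values are hypothetical)
d_rm = 0.60
for rho in (0.3, 0.5, 0.7):
    print(rho, round(change_to_raw_metric(d_rm, rho), 3))
```

At ρ = .5 the two metrics coincide. Higher correlations make change-score effects look larger than their raw-score counterparts, so the converted value shrinks as ρ rises.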
Example 5 - Multiple Outcomes Per Study
7 studies, two continuous outcomes per study
Does watching GBBO (The Great British Bake Off) increase liking of desserts?
The outcomes come from the same subjects, so they are correlated, and we need to take this into account somehow
Usually this is a pain, but... WE HAVE THE CORRELATION!
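One simple way to use that correlation is to average the two outcomes within each study before meta-analyzing. The variance of the average then includes a covariance term. Here is a Python sketch with hypothetical numbers. It treats the correlation between the outcomes as the correlation between their effect-size estimates, which is an approximation. A multivariate model is an alternative that this sketch does not cover.

```python
import math

def combine_correlated_outcomes(y1, v1, y2, v2, r):
    """Mean of two effect sizes from the same subjects; the variance accounts for correlation r."""
    y_bar = (y1 + y2) / 2
    v_bar = (v1 + v2 + 2 * r * math.sqrt(v1 * v2)) / 4
    return y_bar, v_bar

# Hypothetical study: outcomes with d = .4 and .2, each variance .04, outcome correlation .5
print(combine_correlated_outcomes(0.4, 0.04, 0.2, 0.04, 0.5))
```

If you ignore the correlation (r = 0), the composite looks more precise than it really is whenever the outcomes are positively correlated.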
WARNING: GARBAGE IN, GARBAGE OUT
Warnings
Like regular meta-analyses, p-hacking/selective reporting of studies will mess up results, leading to invalid, inflated estimates
Best cases would be to:
Pre-register studies and then meta-analyze these
Pre-register a prospective meta-analysis
What to report?
Fixed or random effects
What effect size specification you used
How you dealt with dependencies
Effect size and CIs, and a measure of heterogeneity
Best case: post code and aggregate data
Resources
Goh, Hall, & Rosenthal (2016)
Morris & DeShon (2002): combining within- and between-subjects studies
http://www.metafor-project.org/doku.php/analyses: fantastic examples