Intro to Single Paper Meta-Analyses


Intro to Single Paper Meta-Analyses
Courtney Soderberg, Statistical and Methodological Consultant, Center for Open Science

A Hypothetical Set of Studies

Study   t      p       N     Cohen's d
1       2.49   .0138   158   .3961
2       3.98   .0001         .6339
3        .86   .3887         .1375
4       1.41   .1611         .2241

‘Imperfect’ sets of studies are pretty likely Sampling Error What’s the likelihood of getting 4 significant results if all have 80% power? .8^4 = 41%
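The slide's arithmetic generalizes: with independent studies, the number of significant results is binomial. A minimal sketch of the calculation (in Python here for illustration; the deck's own analyses use R):

```python
# Probability of k significant results out of 4 studies, each run at 80% power.
# Assumes the effect is real and the studies are independent, as on the slide.
from math import comb

power, n_studies = 0.80, 4

# All four significant: 0.8^4, i.e. only about 41% of the time
p_all = power ** n_studies
print(f"P(all 4 significant) = {p_all:.4f}")

# The full binomial distribution over the number of significant results
for k in range(n_studies + 1):
    p_k = comb(n_studies, k) * power**k * (1 - power)**(n_studies - k)
    print(f"P(exactly {k} significant) = {p_k:.4f}")
```

So even at 80% power, a "mixed" set of results like the hypothetical four studies above is more likely than a clean sweep.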

Sampling Distributions [Figure: sampling distributions at 30% power vs. 90% power]

What is a researcher to do? Hide the non-significant studies - NO! Throw a bunch of covariates at them - NO! Cry and drown their sorrows in wine - ok, but not needed Pre-register a highly powered 5th study to 'decide' whether they have an effect or not Combine your evidence - YAY!

Meta-Analyses aren’t just for Psych Bull Typically think of meta-analyses as huge undertakings Most of same techniques can be applied to small sets of studies Even just two studies

Combining Evidence Don’t ignore the fact that data come from different studies

Simpson’s Paradox

Combining Evidence Don’t ignore the fact that data come from different studies Individual Patient Data (IPD) meta-analysis Uses all raw data with clustering for study/trial Multilevel model Meta-analysis (aggregate data meta-analysis) Each study provides aggregate effect size estimates

Why might we combine evidence? Get a more precise estimate of the effect size Figure out if the variability we’re seeing is real or chance variability Underpowered individual studies can gain power in the aggregate

Meta-Analysis 101 Calculates the average effect size and its confidence interval from a set of studies Average of the studies is weighted so that more informative studies affect the average more Can also get information about heterogeneity of effect size

Meta-Analysis 101 Assumes the effect sizes are independent One effect size per study Apples-to-apples comparisons Study 1: 1 piece of chocolate vs. 5 pieces of chocolate -> happiness Study 2: (1 vs. 5 pieces choc) x (crappy vs. high quality) -> happiness What would we meta-analyze? Two types: Fixed-effect or Random-effects

Fixed Effects Meta-Analysis Assumes that all studies have the same population effect size All variation we see from study to study is due purely to sampling error Average weighted by 1/variance of each effect size More precise effect sizes get more weight Generally this means that larger studies get more weight Tells you the average of these studies Doesn’t justify generalizing to studies outside your sample
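The fixed-effect weighting on this slide is just a few lines of arithmetic. A minimal sketch (Python for illustration; the effect sizes and variances are made-up numbers, not the deck's data):

```python
# Fixed-effect meta-analysis: inverse-variance weighted mean and 95% CI.
# d and v are hypothetical per-study effect sizes and sampling variances.
from math import sqrt

d = [0.40, 0.63, 0.14, 0.22]
v = [0.026, 0.045, 0.031, 0.028]

w = [1 / vi for vi in v]              # weight = 1/variance: precise studies count more
mean_es = sum(wi * di for wi, di in zip(w, d)) / sum(w)
se = sqrt(1 / sum(w))                 # SE of the weighted mean
ci = (mean_es - 1.96 * se, mean_es + 1.96 * se)
print(f"Fixed-effect mean d = {mean_es:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note how the combined SE is smaller than any single study's SE, which is where the aggregate power gain comes from.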

Random Effects Meta-analysis Allows for the possibility that you’re drawing from heterogeneous population effect sizes Variability due to sampling error and real differences in effect sizes Gives you some measures/tests of variability Weighting is a bit more complicated, but same general principle applies 1/(SE^2 + tau^2) tau^2 is population variability in effect sizes Allows you to generalize to studies outside of your sample
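To make the 1/(SE² + τ²) weighting concrete, here is a sketch of one common τ² estimator, DerSimonian-Laird (the deck's R functions default to REML, which is iterative, so the numbers would differ slightly). Effect sizes and variances are again illustrative:

```python
# Random-effects weighting via the DerSimonian-Laird tau^2 estimator.
from math import sqrt

d = [0.40, 0.63, 0.14, 0.22]    # hypothetical effect sizes
v = [0.026, 0.045, 0.031, 0.028]

w = [1 / vi for vi in v]                               # fixed-effect weights
mean_fe = sum(wi * di for wi, di in zip(w, d)) / sum(w)

# Q: weighted sum of squared deviations from the fixed-effect mean
Q = sum(wi * (di - mean_fe) ** 2 for wi, di in zip(w, d))
df = len(d) - 1
C = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - df) / C)                          # truncated at zero

# Random-effects weights: 1 / (within-study variance + tau^2)
w_re = [1 / (vi + tau2) for vi in v]
mean_re = sum(wi * di for wi, di in zip(w_re, d)) / sum(w_re)
print(f"Q = {Q:.2f}, tau^2 = {tau2:.4f}, RE mean d = {mean_re:.3f}")
```

Adding τ² to every study's variance flattens the weights, so huge studies dominate the random-effects average less than the fixed-effect one.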

Which to choose? Theoretical considerations Power considerations Outlier considerations What happens if I choose incorrectly?

What tools are out there? Various R packages SPSS macro META Some shinyapps (e.g. https://blakemcshane.shinyapps.io/spmeta/) We’re mostly going to use R because it’s free and you can save the script

Metafor package Flexible package that can calculate effect sizes, run meta-analyses, and graph results With good documentation! https://www.rdocumentation.org/packages/metafor/versions/1.9-9 http://www.metafor-project.org/doku.php Full-featured, which means many options to sift through, but the functions themselves are pretty easy to run

Example 1 - Between Subjects T-tests 4 between-subjects t-tests What we’ll need: Means, Standard Deviations, and the n per group for each study Put this in a ‘data frame’ with each study as its own row
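From those inputs, the effect size and its sampling variance for each row follow standard formulas (this is what metafor's escalc() computes from the same columns). A sketch with made-up group summaries:

```python
# Cohen's d and its sampling variance from group means, SDs, and ns.
# The numbers below are hypothetical, not from ttest_exp2.csv.
from math import sqrt

m1, sd1, n1 = 5.2, 1.1, 79   # treatment group
m2, sd2, n2 = 4.8, 1.0, 79   # control group

# Pooled standard deviation across the two groups
sp = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
d = (m1 - m2) / sp

# Large-sample variance of d for a two-independent-groups design
var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
print(f"d = {d:.3f}, var(d) = {var_d:.4f}")
```

The 1/var_d values are exactly the fixed-effect weights from the earlier slide.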

Output Notes SMD is actually Hedges’ g, not Cohen’s d Cohen’s d is slightly biased upward in small samples (it tends to overestimate the population effect size); Hedges’ g is the correction for this Bias is larger in smaller samples Keep your small sample in mind when interpreting heterogeneity information Keep sample/N per study in mind when interpreting overall results Card (2012) Applied Meta-Analysis for Social Science Research
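The small-sample correction itself is tiny and simple. A sketch, using the N and d from study 1 of the hypothetical set and a made-up small study for contrast:

```python
# Hedges' g: small-sample bias correction applied to Cohen's d.
# The correction factor J approaches 1 as df grows, so it matters
# most for small studies.
def hedges_g(d, n1, n2):
    df = n1 + n2 - 2
    J = 1 - 3 / (4 * df - 1)   # Hedges' correction factor
    return J * d

print(hedges_g(0.396, 79, 79))   # N = 158, as in study 1: barely shrinks
print(hedges_g(0.396, 10, 10))   # N = 20 (hypothetical): noticeably smaller
```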

https://medium.com/towards-data-science/how-to-calculate-statistical-power-for-your-meta-analysis-e108ee586ae8

Example 1 - Practice Try it with ttest_exp2.csv

SPSS Option David Wilson’s macros: http://mason.gmu.edu/~dwilsonb/ma.html INCLUDE ‘U:\MEANES.SPS’. MEANES ES = Hedges_g /W = Fixed_weight /Model = REML. W needs to be the inverse variance weight You will need to calculate the ESs and variances yourself

SPSS Output Tau = population variability in ES

Example 2 3 studies, all correlations What we’ll need Correlation for each study N for each study
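Correlations are usually meta-analyzed on the Fisher's z scale, where the sampling variance depends only on N, and the pooled result is back-transformed to r at the end. A sketch with illustrative correlations (not the example data):

```python
# Meta-analyzing correlations via Fisher's r-to-z transform.
# var(z) = 1 / (N - 3), so only r and N are needed per study.
from math import atanh, tanh, sqrt

r = [0.30, 0.45, 0.22]   # hypothetical per-study correlations
n = [60, 120, 85]        # hypothetical per-study Ns

z = [atanh(ri) for ri in r]         # Fisher's z
v = [1 / (ni - 3) for ni in n]      # variance on the z scale
w = [1 / vi for vi in v]            # inverse-variance weights (= N - 3)

mean_z = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
se_z = sqrt(1 / sum(w))
# Back-transform the pooled estimate and its CI to the correlation scale
lo, hi = tanh(mean_z - 1.96 * se_z), tanh(mean_z + 1.96 * se_z)
print(f"pooled r = {tanh(mean_z):.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```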

More complicated designs... HERE BE DRAGONS!

Example 3 4 studies: 2 between-subjects tests, 2 within-subjects tests

Example 4 Some things to think through: Effect sizes need to be in the same metric Within- and between-subjects ESs typically use different SD measures, so they are in different metrics Which version makes most theoretical sense? Raw score or change score? With or without correlation? Which standard deviation makes the most sense? Morris & DeShon (2002)

The top row is, basically, the Cohen’s d we know and love. It uses pre-test scores because it’s assumed you have an experimental/control design, so the control condition should provide the ‘natural’ SD. Using a pooled SD for within-subjects designs involves unknown variances, so it’s problematic (or it was when the article was written). If you don’t meet the homogeneity of variance assumption, you might want to use Bonett (2007); the math is uninviting, at best, but escalc() will do it for you.

Why am I showing you this slide? Because look at the last line: you need to know ρ (the population correlation). If you don’t have a good estimate of ρ, then turning independent-groups effect sizes into the repeated-measures metric is a bit tricky. You could do a sensitivity analysis, or you could just convert everything to the raw-score metric.
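The conversion itself is one line once ρ is fixed, which is why the whole problem hinges on that estimate. A sketch following the Morris & DeShon (2002) logic, with both d and ρ as made-up inputs:

```python
# Converting a repeated-measures d (mean change / SD of change scores)
# into the independent-groups (raw-score) metric.
# SD_change = SD_raw * sqrt(2 * (1 - rho)), so the metrics differ by that factor.
from math import sqrt

d_rm = 0.50    # effect size in the change-score metric (hypothetical)
rho = 0.70     # pre-post correlation (hypothetical; the hard-to-know quantity)

d_ig = d_rm * sqrt(2 * (1 - rho))
print(f"d in independent-groups metric = {d_ig:.3f}")
```

A simple sensitivity analysis is just rerunning this over a plausible range of ρ values and checking whether the meta-analytic conclusion changes.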

Example 5 - Multiple Outcomes Per Study 7 studies, two continuous outcomes per study Does watching GBBO increase liking of desserts? Outcomes use the same subjects, so they are correlated Need to take this into account somehow Usually this is a pain but...WE HAVE THE CORRELATION!
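One standard way to use that correlation is to average the two outcomes into a single composite effect size per study, with a variance that accounts for their correlation (this follows the composite formula in Borenstein et al., 2009; the numbers here are illustrative):

```python
# Composite of two correlated effect sizes from the same subjects.
from math import sqrt

d1, v1 = 0.40, 0.030   # outcome 1: effect size and variance (hypothetical)
d2, v2 = 0.30, 0.028   # outcome 2 (hypothetical)
r = 0.60               # correlation between the two outcomes

d_mean = (d1 + d2) / 2
# Variance of the mean of two correlated estimates; the 2*r*sqrt(v1*v2)
# term is what ignoring the dependency would leave out
v_mean = 0.25 * (v1 + v2 + 2 * r * sqrt(v1 * v2))
print(f"composite d = {d_mean:.3f}, var = {v_mean:.4f}")
```

Note that v_mean is larger than it would be if the outcomes were independent: two correlated outcomes carry less information than two independent ones, and pretending otherwise would overstate the study's precision.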

WARNING GARBAGE IN, GARBAGE OUT

Warnings Like regular meta-analyses, p-hacking/selective reporting of studies will mess up your results Will lead to invalid, inflated estimates Best case would be to: Pre-register studies and then meta-analyze them Pre-register a prospective meta-analysis

What to report? Fixed or Random What effect size specification you used How you dealt with dependencies Effect size and CIs, measure of heterogeneity Best Case: Post Code and Aggregate Data

Resources Goh, Hall, & Rosenthal (2016) Morris & DeShon (2002) - combining within- and between-subjects effect sizes http://www.metafor-project.org/doku.php/analyses - fantastic examples