Wim Van den Noortgate
Katholieke Universiteit Leuven, Belgium
Belgian Campbell Group
Workshop systematic reviews
Leuven, June 4-6,
1. Modelling heterogeneity
2. Publication bias
Growing popularity of evidence-based thinking: decisions in practice and policy should be based on scientific research about the effects of these decisions/interventions.
But: conflicting results (failures to replicate), especially in the social sciences!
1. The role of chance
   - in measuring variables
   - in sampling study participants
2. Study results may be systematically biased due to
   - the way variables are measured
   - the way the study is set up
3. Studies differ from each other (e.g., in the kind of treatment, the duration of treatment, the dependent variable, the characteristics of the investigated population, …)
Population effect sizes are all equal.
Differences between observed effect sizes are due to chance only.
I² = percentage of variability in effect estimates due to heterogeneity rather than chance
Rough guidelines:
- 0% to 40%: might not be important
- 30% to 60%: may represent moderate heterogeneity
- 50% to 90%: may represent substantial heterogeneity
- 75% to 100%: considerable heterogeneity
Interpretation based on both I² and heterogeneity test!
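A minimal sketch of how Q and I² could be computed from study effect sizes and their sampling variances, assuming inverse-variance weights under the fixed effects model. The function name and the four data points are hypothetical and serve only as illustration.

```python
def heterogeneity(effects, variances):
    """Return (pooled estimate, Q, df, I2 in percent) using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * g for w, g in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate
    q = sum(w * (g - pooled) ** 2 for w, g in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of Q exceeding its expectation under homogeneity, truncated at 0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, df, i2


# Hypothetical effect sizes and sampling variances (not from the slides)
effects = [0.10, 0.30, 0.50, 0.70]
variances = [0.04, 0.04, 0.04, 0.04]
pooled, q, df, i2 = heterogeneity(effects, variances)
print(f"pooled = {pooled:.2f}, Q = {q:.2f}, df = {df}, I2 = {i2:.0f}%")
```

With these made-up numbers, Q is about 5 on 3 degrees of freedom, so I² is about 40%.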
(Raudenbush, S. W. (1984). Magnitude of teacher expectancy effects on pupil IQ as a function of the credibility of expectancy induction: A synthesis of findings from 18 experiments. Journal of Educational Psychology, 76, )
Columns: Study | Weeks prior contact | g_j
Studies: Rosenthal et al. (1974); Conn et al. (1968); Jose & Cody (1971); Pellegrini & Hicks (1972); Evans & Rosenthal (1969); Fielder et al. (1971); Claiborn (1969); Kester & Letchworth (1972); Maxwell (1970); Carter (1970); Flowers (1966); Keshock (1970); Henrickson (1970); Fine (1972); Greiger (1970); Rosenthal & Jacobson (1968); Fleming & Anttonen (1971); Ginsburg (1970)
Q = 35.83, df = 18, I² = 50%, p =
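The reported I² can be checked from Q and df alone, and the tail probability of Q follows from the chi-square distribution. The sketch below uses only the Python standard library; `chi2_sf` is a hypothetical helper (not a library function) that evaluates the chi-square upper tail through a series for the incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, half)
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= half / (a + n)
        total += term
    lower = total * math.exp(-half + a * math.log(half) - math.lgamma(a))
    return 1.0 - lower


q, df = 35.83, 18                 # values reported on the slide
i2 = max(0.0, (q - df) / q) * 100
p_value = chi2_sf(q, df)
print(f"I2 = {i2:.1f}%, p = {p_value:.4f}")
```

Under these inputs I² rounds to 50% and the p-value falls below .01, so the homogeneity hypothesis would be rejected at the usual 5% level.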
- Not always wise: make set of studies more homogeneous!
- Can help to say something about ‘fruit’
- Can help to make detailed conclusions: does the effect depend on the kind of fruit?
Population effect size possibly depends on study category.
Differences between observed effect sizes within the same category are due to chance only.
Total variability in observed ES’s = Variability within groups + Variability between groups
With k studies and J groups:
- Q_T: homogeneity test; under H0, Q_T ~ χ²_{k-1}
- Q_B: moderator test; under H0, Q_B ~ χ²_{J-1}
- Q_W: test for within-group homogeneity; under H0, Q_W ~ χ²_{k-J}
Q_total = Q_between + Q_within
χ²:
df: 18 = 3 + 15
p:
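The partition of Q into a between-groups and a within-groups part can be sketched as follows, assuming inverse-variance weights and a fixed effects model within each category. Function names, group labels and data are hypothetical.

```python
def pooled(effects, variances):
    """Return (weighted mean, sum of weights, Q) for one set of studies."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    mean = sum(w * g for w, g in zip(weights, effects)) / total_w
    q = sum(w * (g - mean) ** 2 for w, g in zip(weights, effects))
    return mean, total_w, q


def partition_q(effects, variances, groups):
    """Split Q_T into Q_B (between groups) and Q_W (within groups), each with its df."""
    grand_mean, _, q_total = pooled(effects, variances)
    q_within = q_between = 0.0
    labels = sorted(set(groups))
    for label in labels:
        idx = [i for i, g in enumerate(groups) if g == label]
        mean_j, w_j, q_j = pooled([effects[i] for i in idx],
                                  [variances[i] for i in idx])
        q_within += q_j                                  # homogeneity within group j
        q_between += w_j * (mean_j - grand_mean) ** 2    # group mean vs grand mean
    k, n_groups = len(effects), len(labels)
    return {"Q_T": (q_total, k - 1),
            "Q_B": (q_between, n_groups - 1),
            "Q_W": (q_within, k - n_groups)}


# Hypothetical data: four studies in two categories
effects = [0.10, 0.30, 0.50, 0.70]
variances = [0.04, 0.04, 0.04, 0.04]
groups = ["a", "a", "b", "b"]
parts = partition_q(effects, variances, groups)
print(parts)
```

Each Q statistic would then be compared with a chi-square distribution with the listed degrees of freedom.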
(Figure: mean ES, REM)
Population effect size possibly depends on a continuous study characteristic.
After taking into account this study characteristic, differences between observed effect sizes are due to chance only.
Initial effect is moderate (0.41, p < .001), but decreases with each additional week of prior contact (p < .001).
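A fixed effects meta-regression with one continuous moderator amounts to weighted least squares with inverse-variance weights. The sketch below uses the closed-form solution for an intercept and one slope; the function name and the data are hypothetical and are not the values from the example above.

```python
import math


def meta_regression(effects, variances, moderator):
    """Weighted least squares of effect size on one moderator (known sampling variances)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, moderator))
    swy = sum(wi * y for wi, y in zip(w, effects))
    swxx = sum(wi * x * x for wi, x in zip(w, moderator))
    swxy = sum(wi * x * y for wi, x, y in zip(w, moderator, effects))
    det = sw * swxx - swx ** 2
    slope = (sw * swxy - swx * swy) / det
    intercept = (swy - slope * swx) / sw
    # Standard errors from the inverse of X'WX (no residual scale is estimated)
    se_intercept = math.sqrt(swxx / det)
    se_slope = math.sqrt(sw / det)
    return intercept, slope, se_intercept, se_slope


# Hypothetical data: effect shrinks by 0.1 per week of prior contact
weeks = [0, 1, 2, 3]
effects = [0.40, 0.30, 0.20, 0.10]
variances = [0.04, 0.04, 0.04, 0.04]
intercept, slope, se_intercept, se_slope = meta_regression(effects, variances, weeks)
print(f"intercept = {intercept:.2f} ({se_intercept:.3f}), slope = {slope:.2f} ({se_slope:.3f})")
```

The intercept is the expected effect when the moderator is zero; the slope is the expected change in effect per unit of the moderator.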
Population effect size possibly varies randomly over studies.
Differences between observed effect sizes are due to
- chance
- ‘true’ differences
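One common way to estimate the between-study variance in a random effects model is the DerSimonian-Laird moment estimator, sketched below. This is an assumption for illustration: the slides do not say which estimator was used, and dedicated software often uses (restricted) maximum likelihood instead. Data and names are hypothetical.

```python
import math


def random_effects(effects, variances):
    """DerSimonian-Laird estimate of tau^2 and the random effects mean with its SE."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sw
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)          # between-study variance, truncated at 0
    # Random effects weights add tau^2 to each sampling variance
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mean, se, tau2


# Hypothetical effect sizes and sampling variances
effects = [0.10, 0.30, 0.50, 0.70]
variances = [0.04, 0.04, 0.04, 0.04]
mean, se, tau2 = random_effects(effects, variances)
print(f"mean = {mean:.2f}, SE = {se:.3f}, tau2 = {tau2:.4f}")
```

When tau² is estimated as zero, the random effects weights reduce to the fixed effects weights and both models give the same mean.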
Population effect size possibly depends on study category.
Differences between observed effect sizes within the same category are due to
- chance
- ‘true’ differences
Population effect size possibly depends on a continuous study characteristic.
After taking into account this study characteristic, differences between observed effect sizes are due to
- chance
- ‘true’ differences
Random effects model with moderators:
◦ The least restrictive model: allows moderator variables & random variation
◦ Also called a ‘Mixed effects model’
                        FEM    REM
Without moderator
Categorical moderator
Continuous moderator
1. Is there an overall effect?
2. How large is this effect?
3. Is the effect the same in all studies?
4. How large is the variation over studies?
5. Is this variation related to study characteristics?
6. Is there variation that remains unexplained?
7. What is the effect in the specific studies?
Parameter                 REM
Fixed
  Intercept               0.084 (0.052)
Between study variance    0.019 (0.023)
Parameter                 REM              MEM
Fixed
  Intercept               0.084 (0.052)    0.41 (0.087)
  Weeks                                    -0.16 (0.036)
Between study variance    0.019 (0.023)    0.00 (-)
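Reading the values in parentheses as standard errors, each fixed coefficient can be turned into a Wald z statistic and an approximate 95% confidence interval. The sketch assumes a normal reference distribution, which is an assumption of this example; software may use a t distribution instead. The helper name `wald` is hypothetical.

```python
import math


def wald(estimate, se):
    """Return (z, two-sided p-value, 95% CI) under a normal approximation."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))
    ci = (estimate - 1.96 * se, estimate + 1.96 * se)
    return z, p, ci


# Estimates and standard errors as reported in the table
reported = {
    "REM intercept": (0.084, 0.052),
    "MEM intercept": (0.41, 0.087),
    "MEM weeks": (-0.16, 0.036),
}
results = {name: wald(est, se) for name, (est, se) in reported.items()}
for name, (z, p, ci) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.4f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Under this approximation the overall REM intercept is not significantly different from zero at the 5% level, while the MEM intercept and the effect of weeks are.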
1. Models can include multiple moderators
2. REM assumes randomly sampled studies
3. REM requires enough studies
4. Association (over studies) ≠ causation! Be aware of potential confounding moderators (studies are not ‘RCT participants’!)
Dependencies between studies
◦ E.g., research group, country, …
Multiple effect sizes per study
◦ Several samples
◦ Same sample but, e.g., several indicator variables
Ignoring dependence? NO!
Avoiding dependence
◦ (Randomly choosing one ES for each study)
◦ Averaging ES’s within a study
◦ Performing separate meta-analyses for each kind of treatment or indicator
Modelling dependence
◦ Performing a multivariate meta-analysis, accounting for sampling covariance
◦ Performing a three-level analysis
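Averaging ES’s within a study can be sketched as below. The variance of the average depends on how strongly the effect sizes from one study are correlated; the correlation `r = 0.5` is purely an assumption of this example, as are the function name and the data.

```python
import math
from collections import defaultdict


def average_within_study(records, r=0.5):
    """Average effect sizes per study; r is an assumed correlation between ES's of one study."""
    grouped = defaultdict(list)
    for study, es, var in records:
        grouped[study].append((es, var))
    result = {}
    for study, rows in grouped.items():
        m = len(rows)
        mean_es = sum(es for es, _ in rows) / m
        var_sum = sum(var for _, var in rows)
        # Covariance terms between every ordered pair of distinct effect sizes
        cov_sum = sum(r * math.sqrt(vi * vj)
                      for i, (_, vi) in enumerate(rows)
                      for j, (_, vj) in enumerate(rows) if i != j)
        result[study] = (mean_es, (var_sum + cov_sum) / m ** 2)
    return result


# Hypothetical records: (study, effect size, sampling variance)
records = [("A", 0.2, 0.04), ("A", 0.4, 0.04), ("B", 0.1, 0.09)]
averaged = average_within_study(records)
print(averaged)
```

Each study then contributes a single effect size to the meta-analysis; a sensitivity analysis over plausible values of `r` is advisable when the correlation is unknown.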
(Egger, M., & Davey Smith, G. (1998). Meta-analysis. Bias in location and selection of studies. British Medical Journal, 316, )
Proportion of publication within 5 years after conference:
- 81% (of 233 trials) for significant results
- 68% (of 287 trials) for nonsignificant results
(Krzyzanowska, M. K., Pintilie, M., & Tannock, I. F. (2003). Factors associated with failure to publish large randomized trials presented at an oncology meeting. Journal of the American Medical Association, 290, ).
Thorough search for all relevant published and unpublished study results
a) Articles
b) Books
c) Conference papers
d) Dissertations
e) (Un)finished research reports
f) …
- outliers
  - detection using graphs (or tests)
  - conduct analysis with and without outliers
- calculation of effect sizes: several analyses
- publication bias: analysis with and without unpublished results
- design & quality: compare results from studies with a strong design or good quality with those of all studies
- researcher: literature search, effect size calculation, coding quality, …, done by two researchers
- …
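One simple way to carry out an analysis with and without a suspected outlier is a leave-one-out loop: the pooled estimate is recomputed with each study removed in turn. The sketch assumes a fixed effects model with inverse-variance weights; names and data are hypothetical.

```python
def pooled_estimate(effects, variances):
    """Inverse-variance weighted mean effect size."""
    weights = [1.0 / v for v in variances]
    return sum(w * g for w, g in zip(weights, effects)) / sum(weights)


def leave_one_out(effects, variances):
    """Pooled estimate with each study removed in turn."""
    results = []
    for i in range(len(effects)):
        rest_e = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        results.append(pooled_estimate(rest_e, rest_v))
    return results


# Hypothetical data: the last study is an apparent outlier
effects = [0.1, 0.2, 0.3, 1.4]
variances = [0.04, 0.04, 0.04, 0.04]
overall = pooled_estimate(effects, variances)
without_each = leave_one_out(effects, variances)
print(f"all studies: {overall:.2f}")
for i, est in enumerate(without_each):
    print(f"without study {i + 1}: {est:.2f}")
```

A large shift in the pooled estimate when one study is dropped signals that the conclusions depend heavily on that study.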
- Spreadsheets (e.g., MS Excel, …)
- Some general statistical software (note: often not possible to fix the sampling variance): SAS Proc Mixed, S-Plus, R metafor package, …
- Software for meta-analysis (note: often not MEM; often only one moderator!): CMA, RevMan, …
- Software for multilevel/mixed models: HLM, MLwiN, …
Cooper, H., Hedges, L. V., & Valentine, J. C. (Eds.) (2009). The handbook of research synthesis and meta-analysis. New York: The Russell Sage Foundation.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Van den Noortgate, W., & Onghena, P. (2005). Meta-analysis. In B. S. Everitt & D. C. Howell (Eds.), Encyclopedia of Statistics in Behavioral Science (Vol. 3, pp. ). Chichester, UK: John Wiley & Sons.
Site of David Wilson
Site of William Shadish: faculty.ucmerced.edu/wshadish/