Evaluation Research and Meta-analysis
Significance and effect sizes
What is the problem with just using p-levels to determine whether one variable has an effect on another?
Be careful with comparisons. Sample results:
For boys, r(87) = .31, p = .03
For girls, r(98) = .24, p = .14
How does sample size affect effect size? Significance?
Why are effect sizes important?
What is the difference between statistical, practical, and clinical significance?
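A minimal sketch of why p-levels alone can mislead: the same correlation gives very different p values as sample size changes. It assumes a two-tailed test based on the Fisher z normal approximation (not an exact t test); the function name `p_for_r` and the chosen r and n values are illustrative only.

```python
import math

def p_for_r(r, n):
    """Approximate two-tailed p for a correlation r based on n pairs.

    Assumption: Fisher z approximation, where atanh(r) has
    standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r) * math.sqrt(n - 3)      # Fisher z divided by its standard error
    return math.erfc(abs(z) / math.sqrt(2))   # two-tailed area under the normal curve

# Same effect size (r = .20), three different sample sizes
for n in (30, 100, 1000):
    print(n, round(p_for_r(0.20, n), 4))
```

The effect size is identical in all three rows; only the p value moves, which is one reason to report effect sizes alongside significance.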
What should you report?
2 group comparison: treatment vs. control on anxiety symptoms
3 group comparison: positive prime vs. negative prime vs. no prime on number of problems solved
2 continuous variables: relationship between neuroticism and goal directedness
3 continuous variables: anxiety as a function of self-esteem and authoritarian parenting
2 categorical variables: relationship between answers to 2 multiple choice questions
Narrative vs. quantitative reviews When was the first meta-analysis? When was the term first used? What are the advantages of quantitative reviews? What are problems with them?
Steps to meta-analysis
1. Define your variables/question 1 df contrasts What is a contrast?
2. Decide on inclusion criteria What factors do you want to consider here?
3. Collect studies systematically Where do you find studies? File drawer problem
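4. Check for publication bias
Rosenthal’s fail-safe N: number of unpublished null-result studies needed to bring the combined result to p = .05
N = (K / 2.706) × (K × (mean Z)² − 2.706)
mean Z = mean of the Z scores corresponding to each study’s p value
K = number of studies in the meta-analysis
2.706 = 1.645², the squared one-tailed critical Z at p = .05
Funnel plot (Egger’s test)
Rank correlation test for publication bias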
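A minimal sketch of Rosenthal’s fail-safe N, following the formula on this slide with the subtraction inside the parentheses. The function name `fail_safe_n` and the Z scores are made up for illustration; they are not from any real meta-analysis.

```python
def fail_safe_n(z_scores, z_crit_sq=2.706):
    """Rosenthal's fail-safe N.

    z_scores: one Z per study (the Z corresponding to that study's p value).
    z_crit_sq: squared one-tailed critical Z at p = .05 (1.645 ** 2 = 2.706).
    """
    k = len(z_scores)
    mean_z = sum(z_scores) / k
    return (k / z_crit_sq) * (k * mean_z ** 2 - z_crit_sq)

zs = [2.1, 1.7, 2.5, 0.9, 1.3]   # hypothetical Z scores, illustration only
print(round(fail_safe_n(zs), 1))
```

The same quantity can be written as (sum of Z)² / 2.706 − K, which is a handy check on hand calculations.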
Fig. 3. Funnel plots of 11 (subsets of) meta-analyses from 2011 and Greenwald, Poehlman, Uhlmann, and Banaji (2009). Marjan Bakker et al. Perspectives on Psychological Science 2012;7:543-554 Copyright © by Association for Psychological Science
What can you do if publication bias is a problem?
Trim and fill
Sensitivity analysis
Weight studies
PET-PEESE (Figure 1; Elk et al., 2015)
Bayesian approaches
P-curve analysis (Figure 1; Simmons & Simonsohn, 2017) www.p-curve.com (detour: power pose response)
5. Calculate effect sizes If there is more than 1 effect per study, what do you do? What does the sign mean on an effect size? What are small, medium, and large effects? How can you convert from one to another? r or d? http://www.soph.uab.edu/Statgenetics/People/MBeasley/Courses/EffectSizeConversion.pdf
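A minimal sketch of converting between r and d. It assumes two groups of equal size, which is what the simple conversion formulas require; the function names are illustrative.

```python
import math

def r_to_d(r):
    """Convert a correlation to a standardized mean difference (equal n assumed)."""
    return 2 * r / math.sqrt(1 - r ** 2)

def d_to_r(d):
    """Convert a standardized mean difference to a correlation (equal n assumed)."""
    return d / math.sqrt(d ** 2 + 4)

# Cohen's conventional small, medium, and large values of d, expressed as r
for d in (0.2, 0.5, 0.8):
    print(d, round(d_to_r(d), 3))
```

Note that the conventional benchmarks for d (.2, .5, .8) do not map exactly onto the conventional benchmarks for r (.1, .3, .5), so the choice of metric matters when labeling an effect as small, medium, or large.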
Families of effect sizes: d family
2 group comparisons (difference between the means)
Table 1; Lakens, 2013
Cohen’s d (with various subscripts)
Hedges’s g
Glass’s d or delta
Within vs. between-participants designs
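A minimal sketch of two members of the d family for a between-participants design: Cohen’s d using the pooled standard deviation, and Hedges’s g with the usual small-sample correction. The scores are invented for illustration and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def cohens_d(x, y):
    """Cohen's d for two independent groups, using the pooled SD."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)

def hedges_g(x, y):
    """Hedges's g: Cohen's d times the approximate small-sample correction."""
    n1, n2 = len(x), len(y)
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d(x, y) * correction

treatment = [4, 5, 6, 7, 8]   # hypothetical scores
control = [2, 3, 4, 5, 6]     # hypothetical scores
print(round(cohens_d(treatment, control), 3), round(hedges_g(treatment, control), 3))
```

With samples this small the correction is noticeable; with large samples d and g are nearly identical.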
Families of effect sizes: r family
Continuous or multi-group (proportion of variability)
Table 2; Lakens, 2013
η², ηp², ηG²
ω² and its parts
r, Fisher’s z, R², adjusted R²
Difference between the η² and R² families
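A minimal sketch of η² and ω² for a one-way between-participants ANOVA, computed from sums of squares. The data are invented, and the function name is hypothetical. In a one-way design η² and ηp² are the same number, so only η² is shown.

```python
from statistics import mean

def anova_effect_sizes(groups):
    """Return (eta squared, omega squared) for a one-way between-participants design."""
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    ss_total = ss_between + ss_within
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_within = ss_within / df_within
    eta_sq = ss_between / ss_total
    # omega squared removes some of the positive bias in eta squared
    omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)
    return eta_sq, omega_sq

groups = [[3, 4, 5], [5, 6, 7], [7, 8, 9]]   # hypothetical scores in 3 conditions
eta_sq, omega_sq = anova_effect_sizes(groups)
print(round(eta_sq, 3), round(omega_sq, 3))
```

ω² comes out smaller than η² because it corrects for the upward bias of η² in samples.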
Other effect sizes
Nonparametric effect sizes
Nonnormal data: convert z to r or d
Categorical data: Rho, Cramér’s V, Goodman-Kruskal’s lambda
How can you increase your effect sizes?
https://www.frontiersin.org/articles/10.3389/fpsyg.2013.00863/full
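A minimal sketch of two of the options on this slide: Cramér’s V from a contingency table, and the common r = Z / sqrt(N) conversion for a nonparametric test statistic. The table counts and function names are made up for illustration.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_sq = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            chi_sq += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))   # smaller of rows and columns
    return math.sqrt(chi_sq / (n * (k - 1)))

def z_to_r(z, n):
    """Convert a test's Z statistic to r, where n is the total number of observations."""
    return z / math.sqrt(n)

table = [[30, 10], [10, 30]]   # hypothetical counts for 2 multiple choice answers
print(cramers_v(table), z_to_r(2.0, 100))
```

For a 2 × 2 table, Cramér’s V equals the absolute value of the phi coefficient.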
CIs How can you calculate confidence intervals around your effect sizes? http://daniellakens.blogspot.com/2014/06/calculating-confidence-intervals-for.html https://thenewstatistics.com/itns/esci/ http://www.cem.org/effect-size-calculator
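A minimal sketch of one way to put a confidence interval around r: transform to Fisher’s z, build the interval there, and transform back. It assumes the large-sample normal approximation; the function name and the example values are illustrative. Intervals for d and η² need noncentral distributions, which the tools linked above handle.

```python
import math

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate CI for a correlation (default 95%) via the Fisher z transform."""
    z = math.atanh(r)             # Fisher z
    se = 1 / math.sqrt(n - 3)     # standard error of Fisher z
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper

lower, upper = r_confidence_interval(0.30, 103)   # hypothetical r and n
print(round(lower, 3), round(upper, 3))
```

The interval is not symmetric around r, because the back-transformation compresses values near ±1.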
Interpretation of effect sizes
Recommended for at least the most important findings
Benchmarks?
SD units
Practical or clinical significance, and comparison to the literature
PS (probability of superiority) or common language effect size
U
Binomial effect size display
Relative risk
Odds ratio
Risk difference
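A minimal sketch of several of these interpretation aids. The 2 × 2 counts are invented, the function names are hypothetical, and the common language effect size assumes two normally distributed groups with equal variances.

```python
import math

def risk_measures(a, b, c, d):
    """a, b = treatment improved / not improved; c, d = control improved / not improved."""
    risk_treatment = a / (a + b)
    risk_control = c / (c + d)
    return {
        "relative_risk": risk_treatment / risk_control,
        "odds_ratio": (a * d) / (b * c),
        "risk_difference": risk_treatment - risk_control,
    }

def besd(r):
    """Binomial effect size display: success rates for treatment and control."""
    return 0.5 + r / 2, 0.5 - r / 2

def common_language_es(d):
    """Probability that a random treatment score exceeds a random control score."""
    # Normal CDF evaluated at d / sqrt(2), written with erf
    return 0.5 * (1 + math.erf(d / 2))

print(risk_measures(30, 20, 15, 35))
print(besd(0.30))
print(round(common_language_es(0.5), 3))
```

The same underlying effect can look large or small depending on which of these displays is used, which is worth keeping in mind when comparing to the literature.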
6. Combine effect sizes When should you do fixed vs. random effects? Should you weight effect sizes, and if so, on what? How can you deal with dependent effect sizes? Hunter and Schmidt method vs. Hedges et al. method
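A minimal sketch of combining effect sizes under fixed and random effects. It assumes inverse-variance weights and the DerSimonian-Laird estimate of between-study variance, which is one common choice and not the Hunter and Schmidt procedure. The effect sizes, variances, and function names are invented for illustration.

```python
def fixed_effect(effects, variances):
    """Inverse-variance weighted mean effect size."""
    weights = [1 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

def dersimonian_laird_tau_sq(effects, variances):
    """Between-study variance (tau squared), DerSimonian-Laird estimator."""
    weights = [1 / v for v in variances]
    pooled = fixed_effect(effects, variances)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    return max(0.0, (q - (len(effects) - 1)) / c)

def random_effects(effects, variances):
    """Weighted mean after adding tau squared to each study's variance."""
    tau_sq = dersimonian_laird_tau_sq(effects, variances)
    return fixed_effect(effects, [v + tau_sq for v in variances])

effects = [0.1, 0.4, 0.9]         # hypothetical study effect sizes
variances = [0.01, 0.04, 0.09]    # hypothetical sampling variances
print(round(fixed_effect(effects, variances), 3),
      round(random_effects(effects, variances), 3))
```

Under random effects the weights become more equal, so the estimate moves away from the most precise study and toward the unweighted mean.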
7. Calculate confidence intervals Credibility intervals vs. confidence intervals
8. Look at heterogeneity of effect sizes
Chi-square test
I² (measure based on chi-square)
Cochran’s Q
Standard deviations of effect sizes
Stem and leaf plot
Box plot
Forest plot
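A minimal sketch of Cochran’s Q and I², assuming inverse-variance weights. Q is compared to a chi-square distribution with K − 1 degrees of freedom, and I² expresses the share of variability beyond what sampling error would predict. The inputs and function name are invented for illustration.

```python
def heterogeneity(effects, variances):
    """Return (Q, df, I squared as a percentage)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I squared is floored at zero when Q is smaller than its degrees of freedom
    i_sq = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i_sq

q, df, i_sq = heterogeneity([0.1, 0.4, 0.9], [0.01, 0.04, 0.09])   # hypothetical studies
print(round(q, 2), df, round(i_sq, 1))
```

A large I² is a signal to look for moderators rather than to report a single pooled estimate on its own.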
Forest plot
9. Look for moderators What are common moderators you might test? How do you compare moderators?
“little ‘m’ meta-analysis”
Comparing and combining effect sizes on a smaller level: when might you want to do this? How would you do it?
Average within-cell r’s with Fisher z transforms
To compare independent r’s: Z = (z1 − z2) / sqrt(1/(n1 − 3) + 1/(n2 − 3))
To combine independent r’s: mean z = (z1 + z2) / 2, then convert back to r
(z1 and z2 are the Fisher z transforms of r1 and r2)
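A minimal sketch of comparing and combining two independent correlations with Fisher z transforms. It reuses the r values and degrees of freedom from the boys and girls example earlier in the deck, assuming n = df + 2 (so n = 89 and n = 100); the function names are illustrative.

```python
import math

def compare_rs(r1, n1, r2, n2):
    """Z test for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher z transforms
    z = (z1 - z2) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = math.erfc(abs(z) / math.sqrt(2))             # two-tailed p
    return z, p

def combine_rs(r1, r2):
    """Average two correlations in Fisher z units, then convert back to r."""
    return math.tanh((math.atanh(r1) + math.atanh(r2)) / 2)

z, p = compare_rs(0.31, 89, 0.24, 100)
print(round(z, 2), round(p, 2), round(combine_rs(0.31, 0.24), 3))
```

The two correlations do not differ reliably from each other, which is the point of the caution about comparisons: one result passing a significance threshold and another missing it does not show that the two effects differ.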
Write-up
Inclusion criteria, search, which effect size
Which meta-analytic technique and why
Stem and leaf plots of effect sizes (and maybe moderators)
Forest plots
Stats on variability of effect sizes, estimate of population effect size, and confidence intervals
Publication bias analyses
Evaluation Research
Terms
Evolutionary epistemology
Evidence-based practice
Systems thinking
Dynamical systems approaches
Evaluation research
Issues with evaluation research What questions are asked? What methods are used? What unique issues emerge?
Types of evaluation
Formative: needs assessment, evaluability assessment, structured conceptualization, implementation evaluation, process evaluation
Summative: outcome evaluation, impact evaluation, cost-benefit analysis, secondary analysis, meta-analysis
Methods used for different questions
What is the scope of the problem?
How big is the problem?
How should we deliver the program?
How well did we deliver it?
What type of evaluation can we do?
Was the program effective?
What parts of the program work?
Should we continue the program?
Evidence-based medicine (Sackett et al.)
Convert problem into question
Find evidence
Evaluate validity, impact, applicability
Integrate patient experience and clinical judgment
Review evaluation
What does the book author mean by an “evaluation culture”? Is it a good thing?
Coming up
Quantitative article review due today
Next week: Presentations for proposal
Formal presentations: dress nice, stand up
No more than 12 minutes
I’ll take notes and send to you
Go through your FINAL plan for your study: background, method, expected results, and discussion
We might go a little over class time