Funded through the ESRC’s Researcher Development Initiative
Department of Education, University of Oxford
Sessions: Effect Size Calculation
The effect size makes meta-analysis possible. It is based on the “dependent variable” (i.e., the outcome) and standardizes findings across studies so that they can be directly compared.
Any standardized index can be an “effect size” (e.g., standardized mean difference, correlation coefficient, odds-ratio), but it must be comparable across studies (standardization), represent the magnitude and direction of the relationship, and be independent of sample size.
Different studies in the same meta-analysis can be based on different statistics, but each statistic has to be transformed to a standardized effect size that is comparable across studies.
ESRC RDI One Day Meta-analysis workshop (Marsh, O’Mara, Malmberg)
Sample size, significance and d effect size
Simulate ds on the homemade calculator (ES.xls):
Change the direction of effects
Change the Ns (equal or unequal?)
Change the SDs
Effect size as the proportion of the Treatment group doing better than the average Control group person
[Figure: overlapping Control and Treatment distributions, with 57%, 69% and 79% of T above the Control mean]
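The percentages in this display can be approximated from d. The sketch below assumes normally distributed scores with equal SDs in both groups, and it assumes (the slide does not say so) that the three panels correspond to d = 0.2, 0.5 and 0.8; the function name is illustrative.

```python
from statistics import NormalDist

def proportion_above_control_mean(d):
    """Share of the Treatment group scoring above the Control mean.

    Assumes both groups are normal with the same SD, so the share is
    simply the standard normal CDF evaluated at d.
    """
    return NormalDist().cdf(d)

for d in (0.2, 0.5, 0.8):
    share = proportion_above_control_mean(d)
    print(f"d = {d:.1f}: {share:.1%} of T above the C mean")
```

With d = 0 the two distributions coincide and exactly half of the Treatment group lies above the Control mean.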
Effect size as proportion of success in the Treatment versus Control group (Binomial Effect Size Display = BESD)
[Figure: Control versus Treatment success rates]
Success: 55% of T, 45% of C
Success: 62% of T, 38% of C
Success: 68% of T, 32% of C
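In a BESD the two success rates are 0.5 ± r/2, so the difference between them equals r. A minimal sketch, with the r values (.10, .24, .36) read off the success rates above rather than stated on the slide:

```python
def besd(r):
    """Return (treatment, control) success rates implied by a correlation r."""
    return 0.5 + r / 2, 0.5 - r / 2

for r in (0.10, 0.24, 0.36):
    treatment, control = besd(r)
    print(f"r = {r:.2f}: {treatment:.0%} of T, {control:.0%} of C")
```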
Why effect size?
Research has long focused on the significance level (safeguarding against Type I (α) error); today the focus is on practical and meaningful significance.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
A short history of the effect size (Huberty, 2002; see also Olejnik & Algina, 2000 for a review of effect sizes)
Power and effect size
Power: “finding what is out there”
Type II (β) error: “not finding what is out there”
Power (1 − β): the probability of rejecting a false H0 hypothesis
Power of .80 or .90 in primary research
Power and sought effect size at significance level α = .05 in primary research (prior to conducting the study)
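The relation between power, the sought effect size and sample size can be sketched with a normal approximation. This is an approximation only: it assumes two equal-sized independent groups and a two-sided test, and it replaces the noncentral t distribution with the normal, so the required n is slightly underestimated for small samples. Function names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power to detect d with n_per_group in each of two groups."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    noncentrality = d * math.sqrt(n_per_group / 2)
    return Z.cdf(noncentrality - z_crit)

def approx_n_per_group(d, power=0.80, alpha=0.05):
    """Approximate n per group needed to detect d with the requested power."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(d, approx_n_per_group(d, power=0.80), approx_n_per_group(d, power=0.90))
```

Smaller sought effects need much larger samples, because the required n grows with 1/d².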
How meaningful is a “small” effect size?
A small effect size changed the course of an RCT in 1987: placebo group participants were given aspirin instead (see Rosenthal, 1994, p. 242).
Within a single meta-analysis, you can include studies based on any combination of statistical analyses (e.g., t-tests, ANOVA, correlation, odds-ratio, chi-square, etc.). The “art” of meta-analysis is how to compute effect sizes based on non-standard designs and studies that do not supply complete data (see Lipsey&Wilson_AppB.pdf). Convert all effect sizes into a common metric based on the “natural” metric given research in the area, e.g., d, r, OR.
Standardized mean difference: group contrast research (treatment groups or naturally occurring groups); inherently continuous construct.
Correlation coefficient: association between inherently continuous constructs.
Odds-ratio: group contrast research (treatment or naturally occurring groups); inherently dichotomous construct.
Regression coefficients and other multivariate effects: require access to covariance-variance (correlation) matrices for each included study.
Calculating ds (1)
[Diagram: means and standard deviations, correlations, p-values, F-statistics, t-statistics and “other” test statistics all feed into d]
Almost all test statistics can be transformed into a standardized effect size “d”.
The standardized mean difference represents a standardized group contrast on an inherently continuous measure, uses the pooled standard deviation, and is commonly called “d”.
Various contrast effect sizes
Cohen’s d, Hedges’ g, Glass’s Δ
Calculating d (1) using Ms, SDs and ns
Remember to code the treatment effect in the positive direction!
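A minimal sketch of d from means, SDs and ns, assuming the usual pooled standard deviation weighted by n − 1. The numbers in the example are made up for illustration, and the function names are not taken from the ES calculator.

```python
import math

def pooled_sd(sd_t, sd_c, n_t, n_c):
    """Pooled standard deviation of the treatment and control groups."""
    numerator = (n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2
    return math.sqrt(numerator / (n_t + n_c - 2))

def cohens_d(m_t, m_c, sd_t, sd_c, n_t, n_c):
    """Treatment mean minus control mean, in pooled-SD units."""
    return (m_t - m_c) / pooled_sd(sd_t, sd_c, n_t, n_c)

# Hypothetical study: treatment M = 105, control M = 100, both SDs = 10
print(cohens_d(105, 100, 10, 10, 30, 30))
```

Subtracting control from treatment gives a positive d when the treatment group scores higher; if lower scores mean a better outcome, flip the sign so that positive still means the treatment helped.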
ES_calculator.xls
Calculating d (2) using ES calculator, using Ms, ns, and t-value
Calculating d (3) using ES calculator, using ns and t-value
Example: The treatment group scored higher than the control group at Time 2 (t[28] = 4.11; p < .001). From the sample description we learn that n1 = n2.
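The t-value example can be worked by hand with the standard conversion d = t × √((n1 + n2)/(n1 × n2)) for independent groups. With df = 28 and n1 = n2, each group has 15 participants. This is a sketch of that conversion, not the ES calculator's own code.

```python
import math

def d_from_t(t, n1, n2):
    """Standardized mean difference from an independent-samples t-value."""
    return t * math.sqrt((n1 + n2) / (n1 * n2))

# t[28] = 4.11 with equal groups: n1 = n2 = 15
print(round(d_from_t(4.11, 15, 15), 2))  # 1.5
```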
Calculating d (3) correcting for small sample bias
Hedges proposed a correction for small-sample bias (ns < 20).
It must be applied before analysis.
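A sketch of the correction, assuming the common approximation d × (1 − 3/(4N − 9)), where N is the total sample size n1 + n2; the function name is illustrative.

```python
def small_sample_correction(d, n_total):
    """Hedges' approximate correction for small-sample bias in d."""
    return d * (1 - 3 / (4 * n_total - 9))

# Hypothetical: d = 1.50 from two groups of 15 (N = 30)
print(round(small_sample_correction(1.50, 30), 3))
```

The correction always shrinks d toward zero, and the shrinkage becomes negligible as N grows.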
Calculating d (4) using ES calculator, using ns and F-value
Remember: in a two-group ANOVA, F = t².
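Because F = t² in a two-group ANOVA, d can be obtained by taking the square root of F and applying the t-to-d conversion. F carries no sign, so the direction has to be supplied by the coder. A sketch with hypothetical values and an illustrative `treatment_higher` flag:

```python
import math

def d_from_f(f, n1, n2, treatment_higher=True):
    """Standardized mean difference from a two-group (1 df) F-value."""
    t = math.sqrt(f)  # valid only when the F-test compares exactly two groups
    d = t * math.sqrt((n1 + n2) / (n1 * n2))
    return d if treatment_higher else -d

# Hypothetical study: F = 4.0, 50 participants per group
print(round(d_from_f(4.0, 50, 50), 2))
```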
Calculating d (5) using ES calculator, using p-value
“The mean-level comparison was not significant (p = .53)”
t-test table: df = (n1 + n2 − 2)
Sometimes authors only report, e.g., p < .01 (n = 22). If so, use a conservative approach to reading the t-test table.
NOTE: When p = n.s., some researchers code d = 0 in the database.
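Reading the t-test table can be mimicked in code: find the t-value whose two-tailed p equals the reported p, then convert t to d. The sketch below uses only the standard library (numerical integration of the t density plus bisection) and is an illustration, not the ES calculator's method. The conservative reading treats “p < .01” as p = .01, and the split of n = 22 into two groups of 11 is an assumption made for the example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on the area between 0 and |t|."""
    t = abs(t)
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 1 - 2 * area

def t_from_p(p, df):
    """Critical t for a two-tailed p, found by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > p:
            lo = mid  # p still too large, so t must be larger
        else:
            hi = mid
    return (lo + hi) / 2

def d_from_p(p, n1, n2):
    """d from a two-tailed p-value for an independent-groups comparison."""
    t = t_from_p(p, n1 + n2 - 2)
    return t * math.sqrt((n1 + n2) / (n1 * n2))

# "p < .01 (n = 22)" read conservatively as p = .01, assuming 11 per group
print(round(d_from_p(0.01, 11, 11), 2))
```

The resulting d is a lower bound in absolute size, since the true p may be well below the reported threshold; the sign still has to be coded from the reported direction.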
Example dataset so far (1) (ES_enter.sav):
Use all available tools for calculating the following 5 effect sizes:
ES 6: M_T = 21, M_C = 20, n_T = 60, n_C = 60, t = .55
ES 7: M_T = 103.5, M_C = 100, SD_T = 22.0, SD_C = 18.5, n_T = 45, n_C = 35
ES 8: n_T = 45, n_C = 40, p < .05
ES 9: n_T = 100, n_C = 120, F = 8.73
ES 10: n_T = 200, n_C = 160, t = 5.66
(see electronic document: “Correct ds for 5 effect sizes.doc”)
Example dataset so far (2) (ES_enter.sav):
Calculating d (11) using ES calculator, using number of successful outcomes per group
Calculating d (12) using ES calculator, using proportion of successes per group (53% vs. 48.5%)
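Several conversions from success rates to d exist (logit, probit and arcsine methods), and the source does not say which one the ES calculator applies. The sketch below uses the logit method, d = ln(OR) × √3/π, purely as one illustrative option, so its result may differ from the calculator's.

```python
import math

def d_from_proportions(p_t, p_c):
    """Logit-method d from treatment and control success proportions."""
    odds_t = p_t / (1 - p_t)
    odds_c = p_c / (1 - p_c)
    return math.log(odds_t / odds_c) * math.sqrt(3) / math.pi

def d_from_counts(success_t, n_t, success_c, n_c):
    """Same conversion, starting from numbers of successful outcomes."""
    return d_from_proportions(success_t / n_t, success_c / n_c)

# Proportions of successes from the example: 53% vs. 48.5%
print(round(d_from_proportions(0.53, 0.485), 3))
```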
Calculating d (13) using paired t-test (only one experimental group; “each person their own control”)
Don’t use the SD of the change score!
r = correlation between Time 1 and Time 2
Calculating d (14) using paired t-test (only one experimental group)
n (pairs) = 90, t-value = 6.5, r = .70
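The paired example can be worked through with the conversion given in Lipsey & Wilson (2001) for gain scores, d = t × √(2(1 − r)/n). This is a sketch under the assumption that this is the formula intended on the slide; it brings the Time 1 and Time 2 correlation into the calculation rather than relying on the SD of the change score.

```python
import math

def d_from_paired_t(t, n_pairs, r):
    """Standardized mean gain from a paired t-value.

    r is the correlation between Time 1 and Time 2 scores.
    """
    return t * math.sqrt(2 * (1 - r) / n_pairs)

# n (pairs) = 90, t = 6.5, r = .70
print(round(d_from_paired_t(6.5, 90, 0.70), 2))  # 0.53
```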
Calculating d (15)
“The 20 participants increased .84 z-scores between time 1 and time 2 (p < .01)”
ES = .84
Correct for small sample bias
Example dataset so far (3) (ES_enter.sav):
Method difference: mean contrast and gain scores
Summary of equations from Lipsey & Wilson (2001) (for more formulae see Lipsey & Wilson, Appendix B)
Weighting for mean-level differences
The effect sizes are weighted by the inverse of the variance to give more weight to effects based on larger sample sizes.
Variance for a mean-level comparison is calculated as v_i = (n1 + n2)/(n1 × n2) + d_i²/(2(n1 + n2)).
The standard error of each effect size is given by the square root of the sampling variance: SE_i = √v_i.
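A minimal sketch of the weighting for one study, assuming the Lipsey & Wilson (2001) variance for a standardized mean difference, v = (n1 + n2)/(n1 × n2) + d²/(2(n1 + n2)). The function names and example values are illustrative.

```python
import math

def variance_of_d(d, n1, n2):
    """Sampling variance of a standardized mean difference."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))

def se_of_d(d, n1, n2):
    """Standard error: square root of the sampling variance."""
    return math.sqrt(variance_of_d(d, n1, n2))

def inverse_variance_weight(d, n1, n2):
    """Weight given to the study when averaging effect sizes."""
    return 1 / variance_of_d(d, n1, n2)

# Hypothetical study: d = 0.5 with 50 participants per group
print(variance_of_d(0.5, 50, 50), inverse_variance_weight(0.5, 50, 50))
```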
Enter_w.xls
Weighting for gain scores
SE for gain scores: SE = √(2(1 − r)/n + d²/(2n))
Inverse variance for gain scores: w = 1/SE²
T1 and T2 scores are dependent, so the correlation between T1 and T2 must enter the equation (it is not always reported).
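A sketch of the gain-score weighting, assuming the Lipsey & Wilson (2001) variance for a standardized mean gain, v = 2(1 − r)/n + d²/(2n), where n is the number of pairs and r the Time 1 and Time 2 correlation. Function names are illustrative.

```python
import math

def variance_of_gain_d(d, n_pairs, r):
    """Sampling variance of a standardized mean gain."""
    return 2 * (1 - r) / n_pairs + d ** 2 / (2 * n_pairs)

def se_of_gain_d(d, n_pairs, r):
    return math.sqrt(variance_of_gain_d(d, n_pairs, r))

def gain_weight(d, n_pairs, r):
    """Inverse variance weight for a gain-score effect size."""
    return 1 / variance_of_gain_d(d, n_pairs, r)

# Hypothetical: d = 0.53 from 90 pairs with r = .70
print(se_of_gain_d(0.53, 90, 0.70), gain_weight(0.53, 90, 0.70))
```

A higher correlation between the two time points lowers the variance and so increases the weight.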
Enter_w.xls
Compute the weighted mean ES and s.e. of the ES in SPSS (var_ofES.sps) (1)
Compute the weighted mean ES and s.e. of the ES in SPSS (var_ofES.sps) (2)
Compute the weighted mean ES and s.e. of the ES
Weight each ES by the inverse of its squared s.e. (the inverse of the variance): w_i = 1/SE_i²
The average ES: Σ(w_i × ES_i) / Σw_i
Standard error of the average ES: √(1/Σw_i)
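A short sketch of the inverse-variance weighted mean and its standard error. It assumes a fixed-effect average and independent effect sizes; it is written in Python rather than as a translation of var_ofES.sps, and the example values are made up.

```python
import math

def weighted_mean_es(effect_sizes, variances):
    """Return (weighted mean ES, standard error of the mean ES)."""
    weights = [1 / v for v in variances]
    total_weight = sum(weights)
    mean_es = sum(w * es for w, es in zip(weights, effect_sizes)) / total_weight
    se_mean = math.sqrt(1 / total_weight)
    return mean_es, se_mean

# Two hypothetical studies: the more precise one (variance .01) dominates
print(weighted_mean_es([0.2, 0.6], [0.04, 0.01]))
```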
Enter_w.xls
Funnel plot for x = sample size, y = ES
Does the average ES converge toward the ES of the study with the largest n?
95% C.I. = ±1.96 × s.e.
99% C.I. = ±2.58 × s.e.
99.9% C.I. = ±3.29 × s.e.
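The three multipliers are standard normal quantiles, which can be checked directly. A small sketch, assuming a normal sampling distribution for the effect size; function names are illustrative.

```python
from statistics import NormalDist

def ci_multiplier(level):
    """Two-sided normal multiplier for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def confidence_interval(es, se, level=0.95):
    """Lower and upper confidence limits around an effect size."""
    half_width = ci_multiplier(level) * se
    return es - half_width, es + half_width

for level in (0.95, 0.99, 0.999):
    print(level, round(ci_multiplier(level), 2))
```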
Funnel plot including s.e. of ES
The ES in a smaller sample has a larger standard error (s.e.).
Population: N = ‘size’, μ = ‘mean’, δ = ‘effect size’
Sample: n = ‘size’, m = ‘mean’, d = ‘effect size’
The “likely” population parameter is the sample parameter ± uncertainty: standard errors (s.e.), confidence intervals (C.I.), interval estimates.
Calculating rs
[Diagram: means and standard deviations (d), χ², p-values, F-statistics, t-statistics and “other” test statistics all feed into r]
Almost all test statistics can be transformed into a standardized effect size “r”.
Correlations / relationships between variables
r_xy: Pearson’s product-moment coefficient (continuous × continuous)
r_pb: point-biserial correlation (dichotomous × continuous)
χ² (dichotomous × dichotomous)
r_s: Spearman’s rank-order coefficient (ordinal × ordinal)
And other coefficients, e.g., Odds-Ratio (OR), Cramer’s V, contingency coefficient C, tetrachoric and polychoric correlations, etc.
Bias when dichotomising continuous variables
X and Y are both “truly” continuous, but in the study one of them is dichotomised: with X continuous and Y split 50/50, r_pb is 80% of the value it would have had if Y had been kept continuous.
X and Y are both “truly” continuous, but both are dichotomised: the maximum value of φ if X is split 30/70 and Y is split 50/50 is .33.
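The 80% figure for a 50/50 split can be reproduced from the standard attenuation factor for dichotomising one of two bivariate-normal variables, φ(z)/√(p(1 − p)), where z is the cut point and p the proportion below it. The sketch assumes bivariate normality, covers only the one-variable case, and uses an illustrative function name.

```python
import math
from statistics import NormalDist

def dichotomisation_factor(p):
    """Ratio of r_pb to the underlying r when a normal variable is split p / (1 - p)."""
    z = NormalDist().inv_cdf(p)       # cut point on the standard normal scale
    density_at_cut = NormalDist().pdf(z)
    return density_at_cut / math.sqrt(p * (1 - p))

print(round(dichotomisation_factor(0.5), 2))  # 0.8
```

Splits further from 50/50 give a smaller factor, so more of the correlation is lost.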
Calculating rs from d (1)
r can be used in all situations where d can, but d cannot be used in all situations where r is appropriate.
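A sketch of the usual d-to-r conversion, r = d/√(d² + a), where a = 4 for equal group sizes and a = 1/(p(1 − p)) otherwise, with p the proportion of the sample in one group. The function name and the optional arguments are illustrative.

```python
import math

def r_from_d(d, n1=None, n2=None):
    """Convert a standardized mean difference to a correlation."""
    if n1 is None or n2 is None:
        a = 4.0                      # equal group sizes assumed
    else:
        p = n1 / (n1 + n2)
        a = 1 / (p * (1 - p))
    return d / math.sqrt(d ** 2 + a)

print(round(r_from_d(0.5), 3))
```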
Calculating r_pb (2)
If X and Y are inherently continuous, a mean contrast is a better option than r_pb.
Calculating r (3) from t-value
Appropriate for both independent- and dependent-samples t-test values.
Calculating r (4) from χ²-value
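Both conversions are short. The sketch assumes the standard formulas r = t/√(t² + df) and, for a χ² with 1 degree of freedom (a 2 × 2 table), r = √(χ²/N). χ² carries no sign, so the direction must be coded from the study. Example values are hypothetical.

```python
import math

def r_from_t(t, df):
    """Correlation from a t-value and its degrees of freedom."""
    return t / math.sqrt(t ** 2 + df)

def r_from_chi2(chi2, n_total):
    """Unsigned correlation from a 1-df chi-square and total sample size."""
    return math.sqrt(chi2 / n_total)

print(round(r_from_t(4.11, 28), 3))
print(round(r_from_chi2(4.0, 100), 3))
```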
Sources of error
Cf. Structural Equation Model (circle = latent/unobserved construct, rectangle = manifest/observed variable)
[Diagram: latent (unobserved) X and Y, correlated r_x*y*, underlie the manifest (observed) variables x and y, correlated r_xy; r_xx and r_yy label the observed variables]
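One common reading of this diagram is the classical correction for attenuation, in which r_xx and r_yy are the reliabilities of the two measures and r_x*y* = r_xy/√(r_xx × r_yy). The slide itself does not spell out the formula, so the sketch below is an assumption about what is intended.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the latent correlation from the observed one.

    r_xx and r_yy are taken to be the reliabilities of x and y.
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical: observed r = .40, both reliabilities = .80
print(round(correct_for_attenuation(0.40, 0.80, 0.80), 2))
```

With perfect reliabilities (both 1.0) the corrected and observed correlations are identical.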
Alternatively: transform rs into Fisher’s Zr-transformed rs, which are more normally distributed.
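A sketch of the transformation and its back-transformation, using the standard definitions Zr = 0.5 × ln((1 + r)/(1 − r)) with standard error 1/√(n − 3). Analyses are typically run on Zr and the results converted back to r for reporting.

```python
import math

def fisher_z(r):
    """Fisher's Zr transformation of a correlation."""
    return math.atanh(r)  # identical to 0.5 * ln((1 + r) / (1 - r))

def se_fisher_z(n):
    """Standard error of Zr for a sample of size n."""
    return 1 / math.sqrt(n - 3)

def inverse_fisher_z(z):
    """Convert Zr back to a correlation."""
    return math.tanh(z)

print(round(fisher_z(0.5), 3), round(se_fisher_z(103), 3))
```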
rr.xls
Calculating OR (chi2.sps)
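For a 2 × 2 table, the odds-ratio and the standard error of its logarithm follow from the four cell counts. The sketch below is in Python rather than a translation of chi2.sps, and the cell labels (a, b, c, d) and example counts are illustrative.

```python
import math

def odds_ratio(a, b, c, d):
    """OR for a 2 x 2 table.

    a = treatment successes, b = treatment failures,
    c = control successes,   d = control failures.
    """
    return (a * d) / (b * c)

def se_log_odds_ratio(a, b, c, d):
    """Standard error of ln(OR); analyses are usually run on ln(OR)."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Hypothetical table: 20/10 in treatment, 10/20 in control
print(odds_ratio(20, 10, 10, 20), round(se_log_odds_ratio(20, 10, 10, 20), 3))
```

An OR of 1 means equal odds in both groups; empty cells need special handling, which this sketch does not attempt.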
Pearson’s 5 studies escaping Enteric Fever (1904)
Each study is one line in the database.
Columns: effect size, duration, sample sizes, reliability of the instrument, variance of the effect size.
Organising effect sizes within study (1): “flat dataset”
Organising effect sizes within study (2): “hierarchical dataset” (effect sizes nested within study)
Organising effect sizes within study (3): “hierarchical dataset”, with one construct per DV per study
Organising effect sizes within study (4): “hierarchical dataset”, with one DV per study
NOTE: alternative to aggregating ESs within study: multilevel meta-analysis
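Moving from a flat dataset to one effect size per study can be sketched as a simple group-and-average step. The unweighted mean is only one possible way to aggregate and ignores the dependence between effect sizes from the same study; the study IDs and values are made up.

```python
from collections import defaultdict
from statistics import mean

def aggregate_within_study(rows):
    """Average effect sizes per study.

    rows: iterable of (study_id, effect_size) pairs from a flat dataset.
    """
    by_study = defaultdict(list)
    for study_id, effect_size in rows:
        by_study[study_id].append(effect_size)
    return {study_id: mean(values) for study_id, values in by_study.items()}

flat = [("A", 0.2), ("A", 0.4), ("B", 0.5)]
print(aggregate_within_study(flat))
```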
Exercise: effect size calculation (4 method/result extracts from journals)
Do boys have higher general (global) self-concept (self-worth) than girls?
Decide which effect size to use (d, r, OR).
Calculate the appropriate effect sizes.
Effect size literature
Cohen, J. (1969). Statistical Power Analysis for the Behavioral Sciences, 1st Edition, Lawrence Erlbaum Associates, Hillsdale (2nd Edition, 1988).
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Gwet, K. (2001). Handbook of interrater reliability: How to estimate the level of agreement between two or multiple raters. Gaithersburg: STATAXIS Publishing.
Huberty, C. J. (2002). A history of effect size indices. Educational and Psychological Measurement, 62.
McCartney, K., & Rosenthal, R. (2000). Effect size, practical importance, and social policy for children. Child Development, 71.
Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology, 25.