Effect Sizes for Continuous Variables
William R. Shadish
University of California, Merced
Indices for Treatment Outcome Studies
– Correlation coefficient (r) between treatment and outcome
– Standardized mean difference statistic (d)
Either can be transformed into the other, so we will work with d, since it is the more common. Other indices do exist but are rare in social science meta-analyses.
Estimating d
– d itself
– Algebraic equivalents to d
– Good approximations to d
– Methods that require the intraclass correlation (ICC)
– Methods that require the ICC and change scores
– Methods that underestimate the effect
Note: Italicized methods will be covered in this workshop.
Sample Data Set I: Two Independent Groups
                      Treatment   Comparison
Mean
Standard Deviation
Sample Size           10          10
The correlation between treatment and outcome is r = -.055.
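Calculating d
$d = \dfrac{\bar{X}_T - \bar{X}_C}{s_p}$, where $s_p = \sqrt{\dfrac{(n_T - 1)s_T^2 + (n_C - 1)s_C^2}{n_T + n_C - 2}}$ is the pooled within-group standard deviation.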
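A minimal Python sketch of the standard formula. The summary statistics are made up for illustration (Data Set I's means and SDs are not reproduced here), and the function names are illustrative rather than taken from any package.

```python
import math


def pooled_sd(sd_t: float, n_t: int, sd_c: float, n_c: int) -> float:
    """Pooled within-group standard deviation s_p."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return math.sqrt(pooled_var)


def standardized_mean_difference(mean_t: float, sd_t: float, n_t: int,
                                 mean_c: float, sd_c: float, n_c: int) -> float:
    """d = (treatment mean - comparison mean) / s_p."""
    return (mean_t - mean_c) / pooled_sd(sd_t, n_t, sd_c, n_c)


# Hypothetical summary statistics, for illustration only
d = standardized_mean_difference(10.0, 2.0, 10, 8.0, 2.0, 10)
print(d)  # 1.0
```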
Algebraic Equivalent: Between-groups t-test on raw posttest scores
$d = t\sqrt{\dfrac{1}{n_T} + \dfrac{1}{n_C}} = t\sqrt{\dfrac{n_T + n_C}{n_T n_C}}$
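A short sketch of the t-to-d conversion. It uses Data Set I's t magnitude of .2336 (reported below for the exact-p method) and gives it a negative sign because the treatment mean is the lower one; the function name is hypothetical.

```python
import math


def d_from_independent_t(t: float, n_t: int, n_c: int) -> float:
    """Convert an independent-groups t to d: d = t * sqrt(1/n_T + 1/n_C).

    t must carry the sign of (treatment mean - comparison mean).
    """
    return t * math.sqrt(1.0 / n_t + 1.0 / n_c)


# Data Set I: |t| = .2336 with n = 10 per group; negative because the
# treatment mean is below the comparison mean.
d = d_from_independent_t(-0.2336, 10, 10)
print(round(d, 3))  # -0.104
```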
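Algebraic Equivalent: t-test for two matched groups, sample sizes, and the correlation between groups
$d = t_c\sqrt{\dfrac{2(1 - r)}{n}}$, where $t_c$ is the matched-groups (correlated) t, r is the correlation between the matched scores, and n is the number of pairs.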
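A sketch of the matched-groups conversion, assuming equal variances in the two matched groups. The correlated t divides by the SD of the difference scores, $s\sqrt{2(1-r)}$, rather than by s itself, so using $t_c$ as if it came from independent groups would overstate d whenever r > 0. The input values are hypothetical.

```python
import math


def d_from_matched_t(t_c: float, r: float, n: int) -> float:
    """d from a correlated (matched-groups) t-test.

    t_c: matched-pairs t; r: correlation between the paired scores;
    n: number of pairs (the sample size per group).
    """
    return t_c * math.sqrt(2.0 * (1.0 - r) / n)


d = d_from_matched_t(t_c=3.0, r=0.6, n=20)  # hypothetical values
print(round(d, 3))  # 0.6
```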
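Algebraic Equivalent: Two-group between-groups F-statistic on raw posttest scores (Data Set I)
$d = \pm\sqrt{\dfrac{F(n_T + n_C)}{n_T n_C}}$, with the sign taken from the direction of the mean difference, since F is never negative.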
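A sketch of the F-to-d conversion. For two groups F = t², so Data Set I can be expressed as F = .2336² (about .0546); the `treatment_higher` flag is a hypothetical way of supplying the sign that F cannot carry.

```python
import math


def d_from_two_group_f(f: float, n_t: int, n_c: int, treatment_higher: bool) -> float:
    """Convert a two-group (1-df) F to d: |d| = sqrt(F (n_T + n_C) / (n_T n_C)).

    F is never negative, so the sign must be supplied from the means.
    """
    magnitude = math.sqrt(f * (n_t + n_c) / (n_t * n_c))
    return magnitude if treatment_higher else -magnitude


# Data Set I expressed as F = t**2; the treatment mean is the lower one
d = d_from_two_group_f(0.2336**2, 10, 10, treatment_higher=False)
print(round(d, 3))  # -0.104
```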
Algebraic Equivalent: Multifactor Between-Subjects ANOVA with Two Treatment Conditions
1. Sums of squares and degrees of freedom for all sources, and marginal means for the treatment conditions
2. Mean squares and degrees of freedom for all sources, and marginal means for the treatment conditions
3. Sums of squares and degrees of freedom for all sources, with cell means and cell sample sizes
4. Mean squares and degrees of freedom for all sources, with cell means and cell sample sizes
5. Cell means, cell sample sizes, the F-statistic for the treatment factor, and the degrees of freedom for the error term
6. F-statistics and degrees of freedom for all sources, and sample sizes for the treatment and comparison groups, where the treatment factor has only two levels
Example: Sums of Squares and Degrees of Freedom for all sources, and Marginal Means for Treatment Conditions: Data Set II
Cell means (cell sample sizes in parentheses):
            B1     B2     B3     Row Marginal
A1          (3)    (3)    (3)    (9)
A2          (3)    (3)    (3)    (9)
Column      (6)    (6)    (6)    (18)
ANOVA table columns: Sum of Squares, df, Mean Square, F, Probability. Sources: A, B, AB, Residual, Total.
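Example: Sums of Squares and Degrees of Freedom for all sources, and Marginal Means for Treatment Conditions
For a two-group, one-factor ANOVA: $s_p = \sqrt{\dfrac{SS_W}{df_W}}$ and $d = \dfrac{\bar{X}_{A_1} - \bar{X}_{A_2}}{s_p}$.
For a two-factor ANOVA, pool every source other than the treatment factor A back into the error term: $s_p = \sqrt{\dfrac{SS_B + SS_{AB} + SS_{Res}}{df_B + df_{AB} + df_{Res}}}$.
This gives the same d as would have been obtained had Factor B not existed (with equal n per cell).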
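A sketch of the pooling step, using hypothetical sums of squares for a 2 (A = treatment) × 3 (B) design with n = 3 per cell, the same layout as Data Set II (whose values are not reproduced here). Note that the pooled df, 2 + 2 + 12 = 16, equals N - 2, the within-groups df of a one-factor, two-group ANOVA on the same 18 cases.

```python
import math


def pooled_sd_from_factorial(ss_other: list, df_other: list) -> float:
    """Recover the one-factor pooled SD by returning every non-treatment
    source (other factors, interactions, residual) to the error term."""
    return math.sqrt(sum(ss_other) / sum(df_other))


# Hypothetical ANOVA results (treatment factor A excluded)
ss = {"B": 12.0, "AB": 6.0, "Residual": 30.0}
dfs = {"B": 2, "AB": 2, "Residual": 12}
s_p = pooled_sd_from_factorial(list(ss.values()), list(dfs.values()))

mean_a1, mean_a2 = 5.0, 4.0  # hypothetical marginal means for A1 and A2
d = (mean_a1 - mean_a2) / s_p
print(round(s_p, 3), round(d, 3))  # 1.732 0.577
```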
Algebraic Equivalent: Oneway two-group ANCOVA: Covariance error term, F for covariate, raw score means, and total sample size (Data Set III)
Group means:
            Time 1    Time 2    Change
Group 1
Group 2
ANCOVA Table, Time 2 as Outcome, Time 1 as Covariate
Columns: Source, Sum of Squares, df, Mean Square, F, Sig. Sources: Covariate, Groups, Error, Total.
Note: This table was computed using the unique sum of squares method as defined in SPSS for Windows Version 7.5.
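Algebraic Equivalent: Oneway two-group ANCOVA: Covariance error term, F for covariate, raw score means, and total sample size (Data Set III)
The covariate's unique sum of squares plus the error sum of squares equals the within-group sum of squares of the Time 2 scores, so $s_p = \sqrt{\dfrac{MS_E(df_E + F_{cov})}{df_E + 1}}$ with $df_E = N - 3$, and $d = (\bar{X}_{T,2} - \bar{X}_{C,2})/s_p$ using the raw (unadjusted) Time 2 means.
This is the same as would have been obtained had the standard method been applied to the Time 2 scores.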
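A sketch of recovering the unadjusted Time 2 pooled SD from the ANCOVA output, assuming one covariate, two groups, and a covariate F based on unique (Type III) sums of squares, as in the SPSS table above. The function name and the input values are hypothetical.

```python
import math


def pooled_sd_from_ancova(ms_error: float, f_cov: float, n_total: int) -> float:
    """Unadjusted posttest pooled SD from a one-way, two-group ANCOVA.

    SS_within(posttest) = SS_error + SS_covariate
                        = MS_error * (df_error + F_covariate)
    """
    df_error = n_total - 3                  # N - 2 groups - 1 covariate
    ss_within = ms_error * (df_error + f_cov)
    return math.sqrt(ss_within / (df_error + 1))  # df_error + 1 = N - 2


# Hypothetical ANCOVA results and raw (unadjusted) Time 2 means
s_p = pooled_sd_from_ancova(ms_error=2.0, f_cov=9.0, n_total=23)
d = (14.0 - 12.5) / s_p
print(round(s_p, 3), round(d, 3))
```

Using the raw rather than the covariate-adjusted means keeps d on the same scale as a d computed directly from the Time 2 scores.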
Algebraic Equivalent: Exact Probability and Sample Sizes
If the exact p value from a t-test or two-group F-test is reported:
– Use the sample sizes to get df, which in turn gives the exact t statistic.
– Then apply the t-test method shown previously.
From Data Set I:
– The exact probability for the t-test was p = .818.
– For df = 20 - 2 = 18, t = .2336.
– So d = -.104, the same as before. A p value carries no sign, so the sign of d comes from the direction of the means.
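A standard-library sketch of the p-to-t step: the two-tailed p of Student's t is computed by Simpson's-rule integration of the t density and then inverted by bisection. This numerical route is an assumption made to keep the example dependency-free; a statistics package's t quantile function would serve equally well. The search bracket [0, 50] is adequate for moderate df but is an arbitrary choice.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p for |t|: 1 - 2 * integral of the pdf from 0 to |t| (Simpson's rule)."""
    t = abs(t)
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * area * h / 3.0


def t_from_p(p: float, df: int) -> float:
    """Invert two_tailed_p by bisection; p shrinks as t grows."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > p:
            lo = mid  # p still too large, so t must be bigger
        else:
            hi = mid
    return (lo + hi) / 2


# Data Set I: exact two-tailed p = .818 with n = 10 per group
n_t, n_c = 10, 10
df = n_t + n_c - 2                        # 18
t = t_from_p(0.818, df)                   # about .2336
# p has no sign: the minus sign reflects the treatment mean being lower
d = -t * math.sqrt(1 / n_t + 1 / n_c)
print(round(t, 3), round(d, 3))
```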
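Algebraic Equivalent: r to d
To convert r to d without correcting for small-sample bias, using Data Set I (r = -.055, n = 10 per group):
$d = \dfrac{r}{\sqrt{1 - r^2}}\sqrt{\dfrac{df(n_T + n_C)}{n_T n_C}}$
This is the same d as originally obtained using the standard formula for d.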
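A sketch of the r-to-d conversion that goes through t: for two groups, $t = r\sqrt{df}/\sqrt{1 - r^2}$ exactly, and d then follows from the t-test formula. For comparison it also computes the common large-sample shortcut $d = 2r/\sqrt{1 - r^2}$, which assumes equal and large groups and so drifts away from the standard d at n = 10 per group. Function names are hypothetical.

```python
import math


def d_from_r(r: float, n_t: int, n_c: int) -> float:
    """Point-biserial r to d via t (no small-sample bias correction)."""
    df = n_t + n_c - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t * math.sqrt(1 / n_t + 1 / n_c)


def d_from_r_large_sample(r: float) -> float:
    """Large-sample shortcut d = 2r / sqrt(1 - r^2), assuming equal n."""
    return 2 * r / math.sqrt(1 - r * r)


d_exact = d_from_r(-0.055, 10, 10)         # close to -.104 (r is rounded)
d_shortcut = d_from_r_large_sample(-0.055)  # about -.110
print(round(d_exact, 4), round(d_shortcut, 4))
```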
Algebraic Equivalent: Raw Data
Sometimes raw data are tabled as percentages, for example:
– Treatment group N = 10: A = 20%, B = 20%, C = 30%, D = 20%, and F = 10%
– Comparison group N = 10: A = 10%, B = 20%, C = 20%, D = 30%, and F = 20%
Create raw scores by coding the categories, for example A = 4, B = 3, C = 2, D = 1, and F = 0:
– Treatment group is 4, 4, 3, 3, 2, 2, 2, 1, 1, 0
– Comparison group is 4, 3, 3, 2, 2, 1, 1, 1, 0, 0
Then d = .377.
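A sketch that rebuilds the raw scores from the tabled percentages and applies the standard formula. It reproduces d = .377 (treatment mean 2.2, comparison mean 1.7, pooled variance 31.7/18). The helper names are hypothetical.

```python
import math
import statistics

# Letter-grade coding used above: A = 4, B = 3, C = 2, D = 1, F = 0
CODING = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def scores_from_percentages(percentages: dict, n: int) -> list:
    """Rebuild individual scores from tabled percentages and group size."""
    scores = []
    for grade, pct in percentages.items():
        scores.extend([CODING[grade]] * round(pct * n / 100))
    return scores


def d_from_raw(treatment: list, comparison: list) -> float:
    """Standard d with the pooled within-group SD (sample variances)."""
    n_t, n_c = len(treatment), len(comparison)
    pooled_var = ((n_t - 1) * statistics.variance(treatment)
                  + (n_c - 1) * statistics.variance(comparison)) / (n_t + n_c - 2)
    return (statistics.mean(treatment) - statistics.mean(comparison)) / math.sqrt(pooled_var)


treatment = scores_from_percentages({"A": 20, "B": 20, "C": 30, "D": 20, "F": 10}, 10)
comparison = scores_from_percentages({"A": 10, "B": 20, "C": 20, "D": 30, "F": 20}, 10)
print(round(d_from_raw(treatment, comparison), 3))  # 0.377
```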
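Good Approximation: Three-group or higher between-groups oneway ANOVA on posttest scores: group means, sample sizes, and F-statistic
$MS_W = \dfrac{\sum_j n_j(\bar{X}_j - G)^2 / (k - 1)}{F}$, where G is the grand mean and k is the number of groups; then $d = (\bar{X}_T - \bar{X}_C)/\sqrt{MS_W}$.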
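Example: Data Set IV (posttest means for Groups 1 to 3, grand mean G, and F)
This is similar but not identical to the d obtained with the standard method comparing Groups 1 and 2. The difference is due to a different $s_p$, which here is pooled over all three groups rather than over Groups 1 and 2 alone.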
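A sketch of backing MS_W out of a reported omnibus F, using hypothetical three-group results (Data Set IV's values are not reproduced here). It uses the sample-size-weighted grand mean and compares Group 1 with Group 2, as in the example.

```python
import math


def ms_within_from_f(means: list, ns: list, f: float) -> float:
    """Recover MS_within from group means, group sizes, and the omnibus F.

    MS_between = sum(n_j * (M_j - G)**2) / (k - 1), with G the weighted grand
    mean; since F = MS_between / MS_within, MS_within = MS_between / F.
    """
    grand = sum(n * m for n, m in zip(ns, means)) / sum(ns)
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
    return ss_between / (len(means) - 1) / f


# Hypothetical three-group posttest results
means, ns, f = [12.0, 10.0, 9.0], [10, 10, 10], 4.0
s_p = math.sqrt(ms_within_from_f(means, ns, f))
d = (means[0] - means[1]) / s_p   # Group 1 vs. Group 2
print(round(d, 3))
```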
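Good Approximation: Three-group or higher between-groups oneway ANOVA on raw posttest scores: treatment and comparison group means and mean square error
$d = \dfrac{\bar{X}_T - \bar{X}_C}{\sqrt{MS_W}}$
For Data Set IV, $MS_W$ is read directly from the ANOVA table rather than recovered from F.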
Good Approximations: Two-Factor RM-ANOVA (groups × time)
– Between-groups mean square error, within-groups mean square error, posttest means, and sample sizes
– F-ratio for groups, F-ratio for time, cell means, and sample sizes
– F-ratio for groups, F-ratio for the group × time interaction, cell means, and sample sizes
Example: Data Set V
This data set is taken from Winer (1972, p. 525). It presents a two-factor model with factor A as a between-subjects factor having two levels, A1 and A2, and factor B as a within-subjects factor having four levels (columns B1 through B4). The raw data are tabled with columns B1 to B4 and rows for the subjects in A1 and A2.
Example: Data Set V RM-ANOVA
Here are the cell means (and sample sizes) for the same data, along with the marginals and the grand mean:
            B1     B2     B3     B4     Row Marginal
A1          (3)    (3)    (3)    (3)    (12)
A2          (3)    (3)    (3)    (3)    (12)
Column      (6)    (6)    (6)    (6)    (24) = Grand Mean
Repeated Measures ANOVA Table (columns: Source, Sum of Squares, df, Mean Square, F, Probability)
Tests of Within-Subjects Effects: B, AB, WS Error
Tests of Between-Subjects Effects: A, BS Error
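Between-groups mean square error, within-groups mean square error, posttest means, and sample sizes: Data Set V
Assuming Time 4 is the time point of interest (e.g., it is the posttest, or the follow-up), then, with b = 4 time points:
$s_p = \sqrt{\dfrac{MS_{BS\,error} + (b - 1)MS_{WS\,error}}{b}}$ and $d = \dfrac{\bar{X}_{A_1B_4} - \bar{X}_{A_2B_4}}{s_p}$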
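A sketch of the RM-ANOVA approximation with hypothetical mean squares and Time 4 means (Data Set V's values are not reproduced here). Because $SS_{BS\,error} + SS_{WS\,error}$ is the within-cell sum of squares over all cells, the result is the within-cell variance averaged over the b time points, not the Time 4 variance specifically; that is why this counts as an approximation.

```python
import math


def pooled_sd_split_plot(ms_bs_error: float, ms_ws_error: float, n_times: int) -> float:
    """Average within-cell SD in a groups x time design:
    sqrt((MS_BS_error + (b - 1) * MS_WS_error) / b)."""
    return math.sqrt((ms_bs_error + (n_times - 1) * ms_ws_error) / n_times)


# Hypothetical values: b = 4 time points, as in Data Set V
s_p = pooled_sd_split_plot(ms_bs_error=10.0, ms_ws_error=2.0, n_times=4)
mean_a1_t4, mean_a2_t4 = 15.0, 12.0   # hypothetical Time 4 cell means
d = (mean_a1_t4 - mean_a2_t4) / s_p
print(s_p, d)  # 2.0 1.5
```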
Methods that underestimate effect size I
Results reported only verbally as “significant”, or as p < .05, p < .01, etc., with sample size:
– Use the previous method to convert p to t, and then use t to compute d as before.
– In Data Set I, using p < .05 (two-tailed, df = 18), this method would yield t = 2.101, and thus |d| = .940.
– This underestimates d: the actual p is below .05, so the actual t is larger than the t obtained from p = .05.
– Be careful to distinguish one- and two-tailed tests.
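A standard-library sketch of the lower-bound calculation for a result reported only as p < .05, showing how much the tail assumption matters. The t-distribution helpers are repeated so the block runs on its own; they integrate the t density numerically (Simpson's rule) and invert it by bisection, which is an implementation choice rather than part of the method.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p for |t| via Simpson's rule on the density."""
    t = abs(t)
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * area * h / 3.0


def t_from_p(p: float, df: int) -> float:
    """Invert two_tailed_p by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n_t, n_c = 10, 10
df = n_t + n_c - 2
scale = math.sqrt(1 / n_t + 1 / n_c)

t_two = t_from_p(0.05, df)   # two-tailed .05 critical value, about 2.101
t_one = t_from_p(0.10, df)   # one-tailed .05 equals two-tailed .10, about 1.734
# Both are lower bounds on |d| when the true p is below .05
print(round(t_two * scale, 3), round(t_one * scale, 3))
```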
Methods that underestimate effect size II
Results reported only as nonsignificant:
– Omitting them from the meta-analysis results in an overestimate of the average d.
– A typical solution is to code them as d = 0 (which introduces a constant and so distorts the variance of the effect sizes), and then to do sensitivity analyses.
– More sophisticated solutions exist, such as maximum likelihood imputation.
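A minimal sensitivity-analysis sketch with hypothetical effect sizes: it brackets the average d between omitting the nonsignificant studies (biased upward) and coding them as d = 0 (biased downward). Unweighted means are used for brevity; an actual meta-analysis would weight each d, typically by its inverse variance.

```python
from statistics import mean

# Hypothetical study-level effect sizes; None marks a study reported only as nonsignificant
reported = [0.50, 0.30, None, 0.40, None]

mean_omit = mean(d for d in reported if d is not None)          # omit them
mean_zero = mean(0.0 if d is None else d for d in reported)     # code them as d = 0
print(round(mean_omit, 3), round(mean_zero, 3))
```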
Discussion
– Many more methods exist.
– The standard errors of all estimates other than d and its algebraic equivalents are typically unknown.
– Whether to use the approximations involves the same tradeoff as with results reported only as nonsignificant: missing effect sizes versus approximate results.
– When doing a meta-analysis, good practice is to code the effect size calculation method and then explore its effects on outcome.
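A minimal sketch of coding the calculation method and comparing effect sizes across methods, with hypothetical studies and category labels. The means are unweighted for simplicity; a formal moderator analysis would weight by inverse variance, which, as noted above, is often unavailable for the approximations.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical coded studies: (d, how d was calculated)
studies = [
    (0.40, "algebraic equivalent"),
    (0.30, "algebraic equivalent"),
    (0.60, "approximation"),
    (0.50, "approximation"),
    (0.00, "nonsignificant, coded 0"),
]

by_method = defaultdict(list)
for d, method in studies:
    by_method[method].append(d)

for method, ds in by_method.items():
    print(f"{method}: k = {len(ds)}, mean d = {mean(ds):.2f}")
```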
Computer Programs
– Lipsey and Wilson’s Excel macro (free at
– ES program (purchase at
– For more meta-analytic software, see Analysis%20Links.htm.