Effect Sizes
CAMPBELL COLLABORATION

Overview

- Overview of Effect Sizes
- Effect Sizes from the d Family
- Effect Sizes from the r Family
- Effect Sizes for Categorical Data
- Connections Between the Effect-Size Metrics

Effect sizes

Meta-analysis expresses the results of each study using a quantitative index of effect size (ES). ESs are measures of the strength or magnitude of a relationship of interest. ESs have the advantage of being comparable across all of the studies (i.e., they estimate the same thing) and therefore can be summarized across studies in the meta-analysis. ESs are also relatively independent of sample size.

Effect sizes

An effect size is a quantitative index that represents the results of a study. Effect sizes make study results comparable, so that results can be compared across studies, or results can be summarized across studies. Examples of effect-size indices include standardized mean differences (ds) and correlation coefficients (rs).

Effect sizes

A crucial conceptual distinction is between effect-size estimates, computed from studies (sample effect sizes), and effect-size parameters (population or true effect sizes). We want to make inferences about effect-size parameters using effect-size estimates.

Types of effect size

Most reviews use effect sizes from one of three families of effect sizes:
- the d family, including the standardized mean difference,
- the r family, including the correlation coefficient, and
- the odds ratio (OR) family, including proportions and other measures for categorical data.

Types of effect size

Test statistics (e.g., t statistics, F tests, and so on) are not ideal ESs because they depend on both effect size and sample size (n). That is,

Test Statistic = f(Effect Size, Sample Size).

Types of effect size

The significance level (a.k.a. the p value) is also not an ideal ES because it depends on the test statistic and n. Studies with the same effect size can get different p values simply because they differ in sample size, and studies with fundamentally different results can get the same p value because they differ in size. Thus, the p value is a misleading index of effect size.

The choice of effect size

A particular index is chosen to make results from different studies comparable to one another. The choice depends on the
- question of interest for the review,
- designs of the studies being reviewed,
- statistical analyses that have been reported, and
- measures of the outcome variable.

The choice of effect size

- When we have continuous data (means and standard deviations) for two groups, we typically compute a raw mean difference or a standardized mean difference (an effect size from the d family).
- When we have correlational data, we typically compute a correlation (from the r family).
- When we have binary data (the patient lived or died, the student passed or failed), we typically compute an odds ratio, a risk ratio, or a risk difference.

Features of most effect sizes

We introduce some notation for a common case, the treatment/control comparison. Let Ȳ_T be the mean posttest score in the treatment group, Ȳ_C be the mean posttest score in the control group, and S_pooled be the pooled within-groups standard deviation of the Y scores (i.e., the t-test SD). Then we may compute the standardized T - C difference using posttest means as

g_post = (Ȳ_T − Ȳ_C) / S_pooled.

Features of most effect sizes

Remember that all statistical estimators estimate some parameter. What parameter is being estimated by g_post? The answer is the population standardized mean difference, usually denoted by the Greek letter delta, in which the population means and the population SD appear in place of the sample values:

δ = (μ_T − μ_C) / σ.

Expected values of effect sizes

Some ES indices are biased in small samples, and it is common to correct for this small-sample bias. The posttest effect size g_post is biased, with expected value

E[g_post] = δ / c(m), where c(m) = 1 − 3/(4m − 1) and m = n_T + n_C − 2.

In general m is the df for the appropriate t test; here, the two-sample t test.

Expected values of effect sizes

So now we can correct for bias: d = c(m) g_post. The expected value of d is δ. The correlation is also biased (toward zero), and can be corrected via

r_u = r [1 + (1 − r²)/(2n − 2)].

Proportions are not biased, and do not need correction.

Variances of effect sizes

Effect-size indices also have variances that can be estimated using data from the individual study from which the ES is obtained. Below we provide the variances of many ES indices, noting that in all cases the variance is an inverse function of the study sample size. Thus smaller studies have larger variances, representing less precise information about the effect of interest. The ES variance is a key component of nearly all statistical analyses used in meta-analysis.

Statistical properties (Variances)

Often the variances of ES indices are also conditional on (i.e., are functions of) the parameter values. Consider the variance of d:

Var(d) = (n_T + n_C)/(n_T n_C) + δ²/[2(n_T + n_C)],

which is a function of δ. Below we introduce transformations that can be used with some ES indices to remove the parameter from the variance (i.e., to stabilize the variances).

Variance of the standardized mean difference

As d increases (becomes more unusual or extreme), the variance also increases; we are more uncertain about extreme effects. The variance also depends on the sample sizes: as the ns increase, the variance decreases. Large studies provide more precise data, so we are more certain about effects from large studies.

Statistical properties (Variances)

Variances of effect sizes are not typically equal across studies, even if stabilized, because most variances depend on sample sizes, and it is rare for studies in a set to have identical sample sizes. Thus, homoscedasticity assumptions are nearly always violated by meta-analytic data! This is why we do not use "typical" statistical procedures (like t tests and ANOVA) for most analyses in meta-analysis.

Quick Examples: Common Study Outcomes for Treatment-Control Meta-analyses

Common study outcomes for trt/ctrl meta-analysis

Treatment (T)/control (C) studies: Above we introduced the standardized T - C difference in posttest means,

g_post = (Ȳ_T − Ȳ_C) / S_pooled.

We can also compute T - C differences in other metrics and for other outcomes.

Common study outcomes for trt/ctrl meta-analysis: d family

We may also compute standardized T - C differences in:
- gain or difference score means for D = Y − X, standardized by the difference SD,
- gain or difference score means standardized by the posttest SD, and
- covariate-adjusted means.

Common study outcomes for trt/ctrl meta-analysis: Categorical outcomes

- differences between proportions,
- odds ratios for proportions,
- log odds ratios, and
- differences between arcsine-transformed proportions.

Less common study outcomes for trt/ctrl meta-analysis

- differences between transformed variances: 2 log(S_T) − 2 log(S_C), or equivalently 2 log(S_T/S_C), and
- probability values from various tests of Trt/Ctrl differences, such as the t test, the ANOVA F test, etc.

Other common study outcomes for meta-analysis: d family

Single-group studies:
- standardized posttest - pretest mean difference,
- covariate-adjusted means,
- proportions (e.g., post-treatment counts for outcome A), and
- arcsine-transformed proportions.

Other common study outcomes for meta-analysis

- odds ratios for single proportions,
- correlations (r),
- correlation matrices (r_1, ..., r_p(p−1)/2),
- variance ratios S_post/S_pre, or 2 log(S_post/S_pre), and
- "variance accounted for" measures (R², Eta², etc.).

Common study outcomes for meta-analysis

We next treat each of the three families of effect sizes in turn:
- Effect Sizes from the d Family,
- Effect Sizes from the r Family, and
- Effect Sizes for Categorical Data.

More Detail on Effect Sizes: The d Family

Standardized mean difference

The standardized mean difference may be appropriate when
- studies use different (continuous) outcome measures,
- study designs compare the mean outcomes in treatment and control groups, and
- analyses use ANOVA, t tests, and sometimes chi-squares (if the underlying outcome can be viewed as continuous).

Standardized mean difference: Definition

                 Population              Sample
Group            Treatment   Control     Treatment   Control
Means            μ_T         μ_C         Ȳ_T         Ȳ_C
SD               σ (common)              S_pooled
Effect size      δ = (μ_T − μ_C)/σ       g = (Ȳ_T − Ȳ_C)/S_pooled

Computing the standardized mean difference

The first steps in computing d effect sizes involve assessing what data are available and what's missing. You will look for:
- sample size and unit information,
- means and SDs or SEs for treatment and control groups,
- ANOVA tables,
- F or t tests in text, or
- tables of counts.

Sample sizes

Regardless of exactly what you compute, you will need the sample sizes (to correct for bias and to compute variances). Sample sizes can vary within studies, so check initial reports of n against
- the n for each test or outcome, or
- the df associated with each test.

Calculating effect-size estimates from research reports

A major issue is often computing the within-group standard deviation S_pooled. The standard deviation determines the "metric" for standardized mean differences. Different test statistics (e.g., t vs. multi-way ANOVA F) use different SD metrics. In general it is best to try to compute or convert to the metric of within-group (i.e., Treatment and Control) standard deviations.

Calculating effect sizes from means and SDs

Glass's or Cohen's effect size is defined as

g = (Ȳ_T − Ȳ_C) / S_pooled, with S²_pooled = [(n_T − 1) S²_T + (n_C − 1) S²_C] / (n_T + n_C − 2),

where n_T and n_C are the group sample sizes, and S²_T and S²_C are the group variances. Also recall that

d = g [1 − 3/(4m − 1)], where m = n_T + n_C − 2.

Variance of the standardized mean difference

Most notable for statistical work in meta-analysis is the fact that each of the study indices has a "known" variance. These variances are often conditional on the parameter values. For d the variance is

Var(d) = (n_T + n_C)/(n_T n_C) + δ²/[2(n_T + n_C)].

In practice the variance is computed by substituting d for δ.

Confidence interval for the standardized mean difference

The 95% confidence interval for d is

d ± 1.96 √Var(d).

Calculating effect sizes from means and SDs

Equal-n example:

Group      Mean   SD (Variance)   n
Treatment  980    50 (2500)       30
Control    1020   60 (3600)       30

Pooled standard deviation: S_pooled = √{[29(2500) + 29(3600)] / 58} = √3050 = 55.2

Effect size: g = (980 − 1020) / 55.2 = −0.72

Calculating effect sizes from means and SDs

Data:

Group      Mean   SD (Variance)   n
Treatment  980    50 (2500)       30
Control    1020   60 (3600)       30

Unbiased effect size: g = −0.72, so d = −0.72 [1 − 3/(4·58 − 1)] = −0.72 (0.987) = −0.71

95% CI: −0.71 ± 1.96 (0.27) = −0.71 ± 0.53, or −1.24 to −0.18

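These calculations are easy to script. Below is a minimal Python sketch (ours, not part of the original slides; the function name is our own) that pools the SDs, applies the small-sample bias correction, and builds the 95% CI. It reproduces the example above, up to rounding.

```python
import math

def d_and_ci(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Bias-corrected standardized mean difference (d) with SE and 95% CI."""
    # Pooled within-groups SD
    s_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                         / (n_t + n_c - 2))
    g = (mean_t - mean_c) / s_pooled
    m = n_t + n_c - 2                     # df for the two-sample t test
    d = g * (1 - 3 / (4 * m - 1))         # small-sample bias correction
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    se = math.sqrt(var_d)
    return d, se, (d - 1.96 * se, d + 1.96 * se)

# Equal-n example above: d = -0.71, SE = 0.27, CI about (-1.24, -0.19);
# small differences from the slides reflect rounding.
print(d_and_ci(980, 50, 30, 1020, 60, 30))
```
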
Calculating effect sizes: Practice

Compute the values of d, the SEs, and the 95% CIs for these two studies:

Study 1      Mean   SD   n
Treatment    12     4    12
Control      15     6    12

Study 2      Mean   SD   n
Treatment    6.5    4    60
Control      5      4    60

Answers are at the end of the section.

Calculating effect sizes from the independent-groups F test

If the study's design is a two-group (treatment-control) comparison and the ANOVA F statistic is reported, then

|g| = √[F (n_T + n_C) / (n_T n_C)].

You must determine the sign of g from other information in the study.

Calculating effect sizes from the independent-groups t test

When the study makes a two-group (treatment-control) comparison and the t statistic is reported, we can also compute d easily. Then

g = t √[(n_T + n_C) / (n_T n_C)].

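A short Python sketch of these two conversions (our helper names; the sign of a g recovered from F must be supplied from the study report):

```python
import math

def g_from_t(t, n_t, n_c):
    """g from an independent-groups t statistic."""
    return t * math.sqrt((n_t + n_c) / (n_t * n_c))

def g_from_f(f, n_t, n_c, sign=1):
    """g from a two-group ANOVA F (F = t squared); the sign is not
    recoverable from F itself, so pass sign=-1 when T < C."""
    return sign * math.sqrt(f * (n_t + n_c) / (n_t * n_c))

# Consistency check with the earlier equal-n example (implied t of -2.80):
print(g_from_t(-2.80, 30, 30))   # about -0.72
```
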
Calculating effect sizes from the two-way ANOVA

Exactly how we compute d for the two-way ANOVA depends on the information reported in the study. We consider two cases:
- the full ANOVA table is reported, and
- the cell means and SDs are reported.

Calculating effect sizes from the two-way ANOVA table

Suppose A is the treatment factor and B is the other factor in the design. We pool the B and AB sums of squares with the within-cell variation to get

MS_Within = (SS_B + SS_AB + SS_W) / (df_B + df_AB + df_W),

where MS_Within is the MSW for the one-way design with A as the only factor. Then d is computed as in the one-way case, using √MS_Within as the standard deviation:

g = (Ȳ_T − Ȳ_C) / √MS_Within.

Calculating effect sizes from the two-way ANOVA cell means and SDs

Suppose we have J subgroups within the treatment and control groups, with cell means Ȳ_ij, cell SDs S_ij, and sample sizes n_ij (i = 1 is the treatment group and i = 2 is the control group). We first compute the treatment and control group means:

Ȳ_i = Σ_j n_ij Ȳ_ij / n_i, where n_i = Σ_j n_ij.

Calculating effect sizes from the two-way ANOVA cell means and SDs

Then compute the standard deviation S_p via

S²_p = [SS_B + Σ_ij (n_ij − 1) S²_ij] / (n_1 + n_2 − 2),

where SS_B = Σ_ij n_ij (Ȳ_ij − Ȳ_i)² is the between-cells sum of squares within the treatment and control groups. Then calculate the effect size as

g = (Ȳ_1 − Ȳ_2) / S_p.

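As a sketch of this pooling in Python (ours; it assumes each cell is reported as a (mean, SD, n) triple):

```python
import math

def g_from_cells(cells_t, cells_c):
    """g when only subgroup cells are reported; each cell is (mean, sd, n).
    Folds the between-cell variation back into the pooled SD."""
    def combine(cells):
        n = sum(c[2] for c in cells)
        mean = sum(c[0] * c[2] for c in cells) / n
        return mean, n
    mean_t, n_t = combine(cells_t)
    mean_c, n_c = combine(cells_c)
    # Between-cells SS within each group, plus within-cell SS
    ss_b = (sum(c[2] * (c[0] - mean_t)**2 for c in cells_t)
            + sum(c[2] * (c[0] - mean_c)**2 for c in cells_c))
    ss_w = sum((c[2] - 1) * c[1]**2 for c in cells_t + cells_c)
    s_p = math.sqrt((ss_b + ss_w) / (n_t + n_c - 2))
    return (mean_t - mean_c) / s_p
```
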
Calculating effect sizes from the two-way ANOVA: Variants

There are, of course, variants of these two methods. For example, you might have the MSW but not the within-cell standard deviations (the S_ij). Then you could use df_W MSW in place of the sum of weighted S²_ij values in the last term of the numerator in the expression for S²_p on the previous slide.

Calculating effect sizes from the one-way ANCOVA

Suppose a study uses a one-way ANCOVA in which the factor is a treatment-control comparison. Can we use the ANCOVA F statistic to compute the effect size? NO! Or rather, if we do, we will not get a comparable effect-size measure. The error term used in the ANCOVA F test is not the same as the unadjusted within-group (treatment or control) variance, and is usually smaller than the one-way MSW.

Calculating effect sizes from the one-way ANCOVA

The F statistic is F = MSB/MSW, but here
- MSW is the covariate-adjusted squared SD within the treatment and control groups, and
- MSB reflects the covariate-adjusted mean difference between the treatment and control groups.
To get the SD needed for a comparable effect size, we must reconstruct the unadjusted SD within the treatment and control groups.

Calculating effect sizes from the one-way ANCOVA

The unadjusted SD is

S = √[MSW / (1 − r²)],

where r is the covariate-outcome correlation, so

g = (adjusted mean difference) / √[MSW / (1 − r²)].

Calculating effect sizes from the one-way ANCOVA

The procedure is equivalent to
- computing g using the ANCOVA F, as if it were from a one-way ANOVA (we will call this g_Uncorrected), and then
- "correcting" the g for covariate adjustment via

g = g_Uncorrected √(1 − r²).

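A sketch of that two-step computation in Python (our names; r is the covariate-outcome correlation, which must be found in or imputed for the study):

```python
import math

def g_from_ancova_f(f, n_t, n_c, r, sign=1):
    """g from a one-way ANCOVA F: treat F as if from a one-way ANOVA,
    then shrink by sqrt(1 - r^2) to return to the unadjusted-SD metric."""
    g_uncorrected = sign * math.sqrt(f * (n_t + n_c) / (n_t * n_c))
    return g_uncorrected * math.sqrt(1 - r**2)
```
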
Calculating effect sizes from the one-way ANCOVA

The effect size given previously uses the adjusted means in the numerator. However, the reviewer needs to decide whether unadjusted or covariate-adjusted mean differences are desired. In randomized experiments, they will not differ much. Unadjusted means may not be given in the research report, leading to a practical decision to calculate effects based on adjusted means.

Calculating effect sizes from the two-way ANCOVA

Calculating effect sizes from two-way ANCOVA designs poses a combination of the problems in two-way ANOVA designs and one-way ANCOVA designs. The procedure to compute g has two steps:
- compute the d statistic as for the two-way ANOVA, then
- correct the d value for covariate adjustment via d_corrected = d √(1 − r²).

Calculating effect sizes from tests on gain scores

Suppose a t statistic is given for a test of the difference between the gains of the T and C groups. Can we use this t statistic to get g? NO! Or rather, as before, this will give a g that is not fully comparable to the standard t-test g. The standard deviation in the t statistic for gains is the SD of gains, not of posttest scores. To compute a comparable effect size, we have to reconstruct the SD of the posttest scores.

Calculating effect sizes from tests on gain scores

The SD of the posttest scores is

S_Y = S_D / √[2(1 − r)],

where S_D is the SD of the gains and r is the pretest-posttest correlation; thus

g = (D̄_T − D̄_C) / S_Y = g_gain √[2(1 − r)].

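A Python sketch of this rescaling (our names; it relies on the equal pretest and posttest variances that give Var(D) = 2 Var(Y)(1 − r)):

```python
import math

def g_from_gain_t(t, n_t, n_c, r):
    """g in the posttest-SD metric from a t test on gain scores;
    r is the pretest-posttest correlation."""
    g_gain = t * math.sqrt((n_t + n_c) / (n_t * n_c))  # gain-SD metric
    return g_gain * math.sqrt(2 * (1 - r))             # rescale to S_Y
```
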
Calculating effect sizes from tests on gain scores

The effect size given previously also uses the difference between mean gains in the numerator. Thus, the reviewer needs to decide whether differences in mean posttest scores or in mean gains are desired. In randomized experiments, the two types of mean difference will not usually differ much from each other. Posttest means may not be given in the research report, leading to a practical decision to calculate effects based on differences in mean gains.

Auxiliary data for effect-size calculation

Our examples of calculating effect sizes from designs using ANCOVA and gain scores illustrate the fact that sometimes auxiliary information (such as the value of r) is needed to compute effect sizes. This information may be missing in many studies, or may even be missing from all studies.

Auxiliary data for effect-size calculation

That poses a choice for the reviewer:
- omit studies with missing r values, or
- impute r values in some way.
The reviewer's decision on imputation must be made explicit in the methods section of the meta-analysis report.

Calculating effect sizes: Answers to practice exercise

Compute the values of d, the SEs, and the 95% CIs for these two studies:

Study 1      Mean   SD   n
Treatment    12     4    12
Control      15     6    12

Study 2      Mean   SD   n
Treatment    6.5    4    60
Control      5      4    60

Calculating effect sizes: Answers to practice exercise

For study 1 the values of S_pooled and g are

S_pooled = √{[11(16) + 11(36)] / 22} = √26 = 5.10
g = (12 − 15) / 5.10 = −0.59

Calculating effect sizes: Answers to practice exercise

For study 1, d = −0.59 [1 − 3/(4·22 − 1)] = −0.59 (0.966) = −0.57. The values of SE_d and the 95% CI are

SE_d = √{24/144 + (−0.57)²/[2(24)]} = √0.173 = 0.42
95% CI: −0.57 ± 1.96 (0.42), or −1.39 to 0.25.

Calculating effect sizes: Answers to practice exercise

For study 2 the values of S_pooled and g are

S_pooled = √{[59(16) + 59(16)] / 118} = √16 = 4
g = (6.5 − 5) / 4 = 0.38

Calculating effect sizes: Answers to practice exercise

For study 2, d = 0.38 [1 − 3/(4·118 − 1)] = 0.38 (0.994) = 0.38. The values of SE_d and the 95% CI are

SE_d = √{120/3600 + (0.38)²/[2(120)]} = √0.034 = 0.18
95% CI: 0.38 ± 1.96 (0.18), or 0.02 to 0.74.

Calculating effect sizes: Answers to practice exercise

Even though the effect size for study 2 is smaller in absolute value than that for study 1, its SE is also smaller, and thus its 95% CI does not include 0.

95% CI for study 1: −1.39 to 0.25
95% CI for study 2: 0.02 to 0.74

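Both answers can be checked with the d_and_ci sketch given after the first worked example (tiny discrepancies with the slides are rounding):

```python
print(d_and_ci(12, 4, 12, 15, 6, 12))   # d ~ -0.57, SE ~ 0.42, CI ~ (-1.39, 0.25)
print(d_and_ci(6.5, 4, 60, 5, 4, 60))   # d ~  0.37, SE ~ 0.18, CI ~ ( 0.01, 0.73)
```
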
More Detail on Effect Sizes: The r Family

The r family

The correlation coefficient or r family effects may be appropriate when
- studies have a continuous outcome measure,
- study designs assess the relation between a quantitative predictor and the outcome (possibly controlling for covariates), or
- the analysis uses regression (or the general linear model).

Cohen's benchmarks

Jacob Cohen (1988) proposed general definitions for anticipating the size of effect-size estimates:

          d     r
Small    .20   .10
Medium   .50   .30
Large    .80   .50

More on Cohen

Cohen intended these to be "rules of thumb," and emphasized that they represent average effects from across the social sciences. He cautioned that in some areas smaller effects may be more typical, due to measurement error or the relative weakness of interventions. Each reviewer will need to make judgments about what is "typical" based on his or her expertise.

The r family

The most commonly used effect size in this family is the correlation coefficient r, which estimates the population correlation ρ. It also equals the standardized regression coefficient when there is only one predictor in a regression equation.

Sample: r     Population: ρ

The r family

When computing a correlation coefficient, scores are actually being standardized, which makes r itself standardized. Recall that a z score is

z = (X − X̄) / S_X.

To compute r we have

r = Σ (z_X z_Y) / n,

where n is the number of X-Y pairs.

Statistical properties (Variances)

The variance of the correlation depends on the sample size and the parameter value. We estimate the variance by using each study's correlation to estimate its parameter ρ. So for study i, we have

v_i = Var(r_i) = (1 − r_i²)² / (n_i − 1),

or we can use a consistent estimator of ρ (e.g., an average):

v_i = Var(r_i) = (1 − ρ̂²)² / (n_i − 1).

Statistical properties (Transformation)

Sometimes we transform effect sizes because that simplifies statistical analyses or makes our assumptions more justifiable. A common transformation of correlations is the Fisher z-transform of the correlation:

z = 0.5 ln[(1 + r) / (1 − r)].

Statistical properties (Transformation)

Consider a z-transformed correlation coefficient from a sample of size n. The z transform is a variance-stabilizing transformation, which means the variance of z does not depend on ρ, as the variance of r did. The variance of z is

Var(z) = 1 / (n − 3).

Correlation example

Study 1: r = .60, z = .693, n = 50 pairs.

Effect size: r = .60
SE of r: SE_r = (1 − r²)/√(n − 1) = (1 − .36)/√49 = .091
95% CI in the r metric: .60 ± 1.96 (.091), or .421 to .779.

Correlation example

Study 1: r = .60, z_r = .693, n = 50 pairs.

Effect size: z_r = .693
SE of z_r: SE_z = 1/√(n − 3) = 1/√47 = .146
95% CI in the z metric: .693 ± 1.96 (.146), or 0.406 to 0.980.

Correlation example

We must transform the CI back from the z metric to the r metric to get the 95% CI for ρ, using

r = (e^{2z} − 1) / (e^{2z} + 1).

This gives 0.406 to 0.980 in the z metric, and .385 to .753 in the r metric.

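The whole round trip, as a Python sketch (the function name is ours):

```python
import math

def r_ci_via_fisher_z(r, n):
    """95% CI for a correlation via the Fisher z transform."""
    z = 0.5 * math.log((1 + r) / (1 - r))    # variance-stabilizing transform
    se_z = 1 / math.sqrt(n - 3)
    lo_z, hi_z = z - 1.96 * se_z, z + 1.96 * se_z
    back = lambda v: (math.exp(2 * v) - 1) / (math.exp(2 * v) + 1)
    return back(lo_z), back(hi_z)

# Example above: r = .60, n = 50 -> about (.386, .753) in the r metric
print(r_ci_via_fisher_z(0.60, 50))
```
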
More Detail on Effect Sizes: Categorical Data

Effect sizes for categorical data

The effect sizes in the d and r families are mainly for studies with continuous outcomes. Three popular effect sizes for categorical data are introduced here:
- the odds ratio,
- the risk ratio (or rate ratio), and
- the risk difference (the difference between two probabilities).

Effect sizes for categorical data

Consider a study in which a treatment group (T) and a control group (C) are compared with respect to the frequency of a binary characteristic among the participants. In each group we count how many participants have the binary outcome of interest. We will refer to having the binary outcome as "being in the focal group" (e.g., passing a test, being cured of a disease, etc.).

Effect sizes for categorical data

Let π_T and π_C denote the population probabilities of being in the focal group within the two groups T and C; P_T and P_C denote the sample proportions.

                  Population                              Sample
Risk Difference   Δ = π_T − π_C                           RD = P_T − P_C
Risk Ratio        RR = π_T / π_C                          RR = P_T / P_C
Odds Ratio        ω = [π_T/(1 − π_T)] / [π_C/(1 − π_C)]   OR = [P_T/(1 − P_T)] / [P_C/(1 − P_C)]

Odds ratio

The odds ratio (OR) is the most widely used effect-size measure for dichotomous outcomes. The frequencies of the binary outcomes (n_11, n_12, n_21, and n_22 in a 2 × 2 table) are counted for both the treatment and control groups.

Odds ratio

The OR is calculated as

OR = (n_11 n_22) / (n_12 n_21).

An OR = 1 represents no effect, that is, no difference between treatment and control in the odds of being in the focal group. The lower bound of the OR is 0 (the control outcome exceeds the treatment outcome); its upper bound is infinity (the treatment outcome exceeds the control outcome).

Log odds ratio

The range of values of the OR is inconvenient for drawing inferences, and its distribution can be quite non-normal. The logarithm of the OR (LOR) is more nearly normal, and is calculated as

LOR = ln(OR).

The LOR makes interpretation more intuitive, and it is similar in some respects to d. A value of 0 represents no T vs. C difference, or no treatment effect. The LOR ranges from −∞ to ∞.

Log odds ratio

The standard error of the LOR (SE_LOR) is calculated as

SE_LOR = √(1/n_11 + 1/n_12 + 1/n_21 + 1/n_22).

An approximate 95% confidence interval for each LOR can be calculated as

LOR ± 1.96 SE_LOR.

Example: Pearson's hospital staff typhoid incidence data

In 1904, Karl Pearson reviewed evidence on the effects of a vaccine against typhoid (also called enteric fever). Pearson's review included 11 studies of mortality and immunity to typhoid among British soldiers. The treatment was an inoculation against typhoid, and the "cases" (those who became infected) were the focal group.

Example: Pearson's hospital staff typhoid incidence data

The hospital staff data:

                 Diseased   Immune   Total
Not inoculated      75        204     279
Inoculated          32        265     297

Example: Pearson's hospital staff typhoid incidence data

The focal group is the group of diseased staff members. The OR is calculated as

OR = (75 × 265) / (204 × 32) = 3.04.

The odds of becoming diseased are about 3 times greater for staff members who were not inoculated.

Example: Pearson's hospital staff typhoid incidence data

The OR can also be computed from the cell proportions. With P(diseased | not inoculated) = 75/279 = .269 and P(diseased | inoculated) = 32/297 = .108,

OR = (.269/.731) / (.108/.892) = .368 / .121 = 3.04.

Example: Pearson's hospital staff typhoid incidence data

The LOR in this example is LOR = ln(3.04) = 1.11. The SE of the LOR in this example is

SE_LOR = √(1/75 + 1/204 + 1/32 + 1/265) = √0.053 = 0.23.

Example: Pearson's hospital staff typhoid incidence data

The approximate 95% confidence interval for the LOR in this example is 1.11 ± 1.96 (0.23):

Upper bound LOR: 1.56
Lower bound LOR: 0.66

The CI does not include zero, suggesting a significant positive treatment effect for inoculation.

Example: Pearson's hospital staff typhoid incidence data

The LORs can be transformed back to the OR metric via the exponential function:

Upper bound OR = exp(1.56) = 4.76
Lower bound OR = exp(0.66) = 1.93

Here we can see the CI for the OR does not include 1, again indicating a significant treatment effect.

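The OR pipeline above, as a Python sketch (our function name; the counts are entered so that n_11 n_22 / (n_12 n_21) gives the odds of disease for the non-inoculated relative to the inoculated):

```python
import math

def or_lor_ci(n11, n12, n21, n22):
    """Odds ratio, 95% CI for the LOR, and that CI back-transformed to ORs."""
    odds_ratio = (n11 * n22) / (n12 * n21)
    lor = math.log(odds_ratio)
    se = math.sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)
    lo, hi = lor - 1.96 * se, lor + 1.96 * se
    return odds_ratio, (lo, hi), (math.exp(lo), math.exp(hi))

# Pearson's hospital staff data: OR = 3.04, LOR CI close to the slides'
# (0.66, 1.56), OR CI close to (1.93, 4.76)
print(or_lor_ci(75, 204, 32, 265))
```
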
Converting OR or LOR to d

If most of the effect sizes are in the d metric and just a few are expressed as ORs or LORs, the ORs or LORs can be converted to d so that all of the effect sizes can be pooled. The transformation developed by Cox (as cited in Sánchez-Meca, Marín-Martínez, & Chacón-Moscoso, 2003) works well. It is computed as

d = LOR / 1.65.

Relative risk

Relative risk (RR) is also used for dichotomous outcomes. It is computed as

RR = P_1 / P_2,

where P_1 and P_2 are the focal-group proportions for the treatment and control groups, respectively.

Relative risk

The relative risk ranges from 0 to infinity.
- An RR of 1 indicates that there is no difference in risk between the two groups.
- An RR larger than 1 indicates that the treatment group has higher risk (of being in the focal group) than the control group.
- An RR less than 1 indicates that the control group has higher risk than the treatment group.

Log relative risk

As was true for the LOR, the logarithm of the RR (the LRR) has better statistical properties. It is calculated as

LRR = ln(RR).

The range of the LRR is from −∞ to ∞, and as with the LOR, a value of 0 indicates no treatment effect.

Example: Pearson's hospital staff typhoid incidence data

Here the focal outcome is immunity:

RR = (265/297) / (204/279) = .892 / .731 = 1.22.

The probability of being immune if inoculated is 1.22 times the probability if not inoculated.

Log relative risk

The standard error of the LRR is calculated from the tabled counts as

SE_LRR = √(1/a − 1/n_1 + 1/b − 1/n_2),

where a and b are the focal-group counts and n_1 and n_2 are the group totals.

Example: Pearson's hospital staff typhoid incidence data

The standard error of the LRR for our example is

SE_LRR = √(1/265 − 1/297 + 1/204 − 1/279) = √0.0017 = 0.04.

Example: Pearson's hospital staff typhoid incidence data

The LRR is ln(1.22) = 0.20, so the 95% CI is 0.20 ± 1.96 (0.04):

Upper bound LRR: 0.28
Lower bound LRR: 0.12

These LRRs can be transformed back to the RR metric via the exponential function:

Upper bound RR = exp(0.28) = 1.32
Lower bound RR = exp(0.12) = 1.12

Converting RR to OR

The RR can also be converted to the OR via

OR = RR (1 − P_2) / (1 − P_1).

In Pearson's example, it is

OR = 1.22 × (1 − .731) / (1 − .892) = 1.22 × 2.49 = 3.04.

Risk difference (difference between two proportions)

The risk difference (RD = P_1 − P_2) between proportions is often considered the most intuitive effect size for categorical data. The standard error of the RD (SE_RD) is

SE_RD = √[P_1(1 − P_1)/n_1 + P_2(1 − P_2)/n_2].

Example: Pearson's hospital staff typhoid incidence data

For Pearson's data, 89.2% of those inoculated were immune (265/297) and 73.1% of those not inoculated were immune (204/279). The difference in immunity rates for those inoculated or not is

RD = .892 − .731 = .16.

The standard error of the difference is

SE_RD = √[(.892)(.108)/297 + (.731)(.269)/279] = √0.001 = .03.

Example: Pearson's hospital staff typhoid incidence data

In general a 95% CI for the RD is RD ± 1.96 SE_RD, here .16 ± 1.96 (.03):

Upper bound RD: .22
Lower bound RD: .10

Because the value 0 (meaning no difference in risk) is not included in the CI, we again conclude there is a treatment effect.

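A combined Python sketch for the RR and RD computations (our names; a and b are the focal-group counts out of the group totals n1 and n2):

```python
import math

def rr_and_rd(a, n1, b, n2):
    """Risk ratio (with SE of its log) and risk difference (with SE)."""
    p1, p2 = a / n1, b / n2
    rr = p1 / p2
    se_lrr = math.sqrt(1/a - 1/n1 + 1/b - 1/n2)
    rd = p1 - p2
    se_rd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return rr, se_lrr, rd, se_rd

# Pearson's data with "immune" as the focal outcome:
# RR = 1.22 (SE of LRR = 0.04), RD = 0.16 (SE = 0.03)
print(rr_and_rd(265, 297, 204, 279))
```
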
Connections Between the Effect-Size Metrics

Conversions among effect-size metrics

The effects d, r, and the odds ratio (OR) can all be converted from one metric to another. Sometimes it is convenient to convert effects for comparison purposes. A second reason may be that just a few studies present results that require computation of a particular effect size. For example, if most studies present results as means and SDs (and thus allow d to be calculated), but one reports the correlation of treatment with the outcome, one might want to convert the single r to a d.

Conversions between d and LOR

Converting d to the log odds ratio:

LOR = d π / √3

Converting the log odds ratio to d:

d = LOR √3 / π

Conversions of r and d

To convert r to d, we first compute the SE of the correlation using

SE_r = (1 − r²) / √(n − 1).

Then

d = 2r / √(1 − r²), with Var(d) = 4 Var(r) / (1 − r²)³.

Conversions of r and d

Converting d to r:

r = d / √(d² + A), where A = (n_1 + n_2)² / (n_1 n_2).

A is a "correction factor" for cases where the groups are not the same size. If the specific group sizes are not available, assume the groups are of equal size and use n_1 = n_2 = n, which yields A = 4.

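The conversions from the last few slides, collected in one Python sketch (our names; d_to_r defaults to the equal-n assumption A = 4):

```python
import math

def d_to_lor(d):   return d * math.pi / math.sqrt(3)
def lor_to_d(lor): return lor * math.sqrt(3) / math.pi
def r_to_d(r):     return 2 * r / math.sqrt(1 - r**2)

def d_to_r(d, n1=None, n2=None):
    a = 4.0 if n1 is None else (n1 + n2)**2 / (n1 * n2)
    return d / math.sqrt(d**2 + a)

# Round trip under equal n: r = .30 -> d = .63 -> back to r = .30
print(d_to_r(r_to_d(0.30)))
```
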
References

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cox, D. R. (1970). Analysis of binary data. New York: Chapman & Hall/CRC.
Pearson, K. (1904). Report on certain enteric fever inoculation statistics. British Medical Journal, 3, 1243-1246.
Sánchez-Meca, J., Marín-Martínez, F., & Chacón-Moscoso, S. (2003). Effect-size indices for dichotomized outcomes in meta-analysis. Psychological Methods, 8(4), 448-467.

C2 Training Materials Team

Thanks are due to the following institutions and individuals:
Funder: Norwegian Knowledge Centre for the Health Sciences
Materials contributors: Betsy Becker, Harris Cooper, Larry Hedges, Mark Lipsey, Therese Pigott, Hannah Rothstein, Will Shadish, Jeff Valentine, David Wilson
Materials developers: Ariel Aloe, Betsy Becker, Sunny Kim, Jeff Valentine, Meng-Jia Wu, Ying Zhang
Training co-coordinators: Betsy Becker and Therese Pigott