Funded through the ESRC’s Researcher Development Initiative Department of Education, University of Oxford Sessions 1.2-1.3: Effect Size Calculation.


• The effect size makes meta-analysis possible
• It is based on the “dependent variable” (i.e., the outcome)
• It standardizes findings across studies such that they can be directly compared
• Any standardized index can be an “effect size” (e.g., standardized mean difference, correlation coefficient, odds-ratio), but it must
  • be comparable across studies (standardization)
  • represent magnitude & direction of the relationship
  • be independent of sample size
• Different studies in the same meta-analysis can be based on different statistics, but each has to be transformed to a standardized effect size that is comparable across studies

Sample size, significance and d effect size (XLS)
ESRC RDI One Day Meta-analysis workshop (Marsh, O’Mara, Malmberg)

Simulate ds on homemade calculator (ES.xls)
• Change direction of effects
• Change Ns (equal or unequal?)
• Change SDs

Effect size as proportion in the Treatment group doing better than the average Control group person
[Figure: three panels of overlapping Control and Treatment distributions, with 57%, 69% and 79% of the Treatment group above the Control average]
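The percentages in the figure are consistent with reading d as a shift between two normal distributions with equal spread, in which case the share of the Treatment group above the Control mean is Φ(d). A minimal sketch under that assumption; the function name and the d values 0.2, 0.5 and 0.8 are illustrative and not taken from the slide:

```python
from statistics import NormalDist


def proportion_above_control_mean(d: float) -> float:
    """Share of the Treatment group scoring above the Control mean.

    Assumes both groups are normally distributed with equal SD,
    so the share equals the standard normal CDF evaluated at d.
    """
    return NormalDist().cdf(d)


# Illustrative d values (assumed, not from the slide)
for d in (0.2, 0.5, 0.8):
    share = proportion_above_control_mean(d)
    print(f"d = {d:.1f}: {share:.1%} of T above the Control mean")
```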

Effect size as proportion of success in the Treatment versus Control group (Binomial Effect Size Display = BESD)
[Figure: three panels comparing Control and Treatment]
• Success: 55% of T, 45% of C
• Success: 62% of T, 38% of C
• Success: 68% of T, 32% of C
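The BESD re-expresses a correlation r as two success rates, 50% + r/2 for Treatment and 50% − r/2 for Control. A minimal sketch of that conversion; the function name `besd` and the input r = .10 are illustrative assumptions:

```python
def besd(r: float) -> tuple[float, float]:
    """Binomial Effect Size Display for a correlation r.

    Returns (treatment_success_rate, control_success_rate).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return 0.5 + r / 2, 0.5 - r / 2


# Illustrative correlation (assumed value)
t_rate, c_rate = besd(0.10)
print(f"Success: {t_rate:.0%} of T, {c_rate:.0%} of C")
```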

Why effect size?
• Long focus on significance level (safeguarding against Type I (α) error); today the focus is on practical and meaningful significance.
• Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.

A short history of the effect size (Huberty, 2002; see also Olejnik & Algina, 2000, for a review of effect sizes)

Power and effect size
• Power: “Finding what is out there”
• Type II (β) error: “not finding what is out there”
• Power (1 − β): the probability of rejecting a false H0 hypothesis
• Power of .80 or .90 in primary research

Power, sought effect size, at significance level α = .05 in primary research (prior to conducting study)
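The link between sought effect size, α and power can be sketched with the normal approximation for a two-sided, two-group comparison of means, n per group ≈ 2(z₁₋α/₂ + z_power)² / d². This is an approximation chosen for illustration (exact t-based calculations give slightly larger n), and the function name is hypothetical:

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group needed to detect a standardized mean
    difference d with a two-sided test (normal approximation)."""
    if d == 0:
        raise ValueError("d must be non-zero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Illustrative "small", "medium" and "large" effect sizes
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group for power .80")
```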

How meaningful is a “small” effect size? (XLS)
• A small effect size changed the course of an RCT in 1987: placebo group participants were given aspirin instead (see Rosenthal, 1994, p. 242)

• Within the one meta-analysis, studies based on any combination of statistical analyses can be included (e.g., t-tests, ANOVA, correlation, odds-ratio, chi-square, etc.).
• The “art” of meta-analysis is how to compute effect sizes based on non-standard designs and studies that do not supply complete data (see Lipsey&Wilson_AppB.pdf).
• Convert all effect sizes into a common metric based on the “natural” metric given research in the area, e.g., d, r, OR.

• Standardized mean difference
  • Group contrast research
    • Treatment groups
    • Naturally occurring groups
  • Inherently continuous construct
• Correlation coefficient
  • Association between inherently continuous constructs
• Odds-ratio
  • Group contrast research
    • Treatment or naturally occurring groups
  • Inherently dichotomous construct
• Regression coefficients and other multivariate effects
  • Requires access to covariance-variance (correlation) matrices for each included study

Calculating ds (1)
[Diagram: means and standard deviations, correlations, p-values, F-statistics, t-statistics and “other” test statistics all feed into d]
Almost all test statistics can be transformed into a standardized effect size “d”

Calculating ds (1)
• Represents a standardized group contrast on an inherently continuous measure
• Uses the pooled standard deviation
• Commonly called “d”

Various contrast effect sizes
• Cohen’s d
• Hedges’ g
• Glass’s Δ

Calculating d (1) using Ms, SDs and ns
Remember to code treatment effect in positive direction!
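The calculation from means, SDs and ns can be sketched as the mean difference divided by the pooled standard deviation. This is the standard textbook form, not a copy of the workshop's spreadsheet; the function names and the input values are hypothetical:

```python
from math import sqrt


def pooled_sd(sd_t: float, sd_c: float, n_t: int, n_c: int) -> float:
    """Pooled standard deviation of the treatment and control groups."""
    return sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2))


def cohens_d(m_t: float, m_c: float, sd_t: float, sd_c: float,
             n_t: int, n_c: int) -> float:
    """Standardized mean difference; positive when treatment scores higher."""
    return (m_t - m_c) / pooled_sd(sd_t, sd_c, n_t, n_c)


# Hypothetical study: treatment M = 105, control M = 100, both SD = 10, n = 30 each
print(round(cohens_d(105, 100, 10, 10, 30, 30), 3))
```

Subtracting the control mean from the treatment mean keeps the treatment effect coded in the positive direction whenever the treatment group does better.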

ES_calculator.xls

Calculating d (2) using ES calculator, using Ms, ns, and t-value

Calculating d (3) using ES calculator, using ns, and t-value
• The treatment group scored higher than the control group at Time 2 (t[28] = 4.11; p < .001).
• From sample description we learn that n1 = n2
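For an independent-samples t-test, d can be recovered as t × √((n1 + n2) / (n1 × n2)). A minimal sketch using the example above, where df = 28 and equal groups imply n1 = n2 = 15; the function name is hypothetical and the spreadsheet itself may be laid out differently:

```python
from math import sqrt


def d_from_t(t: float, n1: int, n2: int) -> float:
    """Standardized mean difference from an independent-samples t-value."""
    return t * sqrt((n1 + n2) / (n1 * n2))


# t[28] = 4.11 with n1 = n2, so each group has 15 participants
d = d_from_t(4.11, 15, 15)
print(round(d, 2))
```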

Calculating d (3) correcting for small sample bias
• Hedges proposed a correction for small sample size bias (ns < 20)
• Must be applied before analysis
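A common form of the Hedges correction multiplies d by 1 − 3 / (4N − 9), where N is the total sample size. A minimal sketch, assuming this is the version intended here; the function name and example values are hypothetical:

```python
def hedges_correction(d: float, n1: int, n2: int) -> float:
    """Small-sample bias correction: d * (1 - 3 / (4N - 9)), N = n1 + n2."""
    n_total = n1 + n2
    if 4 * n_total - 9 <= 0:
        raise ValueError("total sample size too small for the correction")
    return d * (1 - 3 / (4 * n_total - 9))


# Hypothetical study: d = 1.50 from two groups of 15
print(round(hedges_correction(1.50, 15, 15), 3))
```

The correction always shrinks d toward zero, and the shrinkage becomes negligible as N grows.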

Calculating d (4) using ES calculator, using ns, and F-value
Remember: in a two-group ANOVA, F = t²
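Because F = t² in a two-group ANOVA, d can be sketched as √(F × (n1 + n2) / (n1 × n2)). F carries no sign, so the direction has to be taken from the reported means. The function name, the `treatment_higher` flag and the example values are illustrative assumptions:

```python
from math import sqrt


def d_from_f(f_value: float, n1: int, n2: int, treatment_higher: bool = True) -> float:
    """Standardized mean difference from a two-group (1 df) F-value.

    The sign is supplied by the caller because F is always non-negative.
    """
    if f_value < 0:
        raise ValueError("F cannot be negative")
    magnitude = sqrt(f_value * (n1 + n2) / (n1 * n2))
    return magnitude if treatment_higher else -magnitude


# Hypothetical: F = 4.11 ** 2 with two groups of 15 gives the same d as t = 4.11
print(round(d_from_f(4.11 ** 2, 15, 15), 2))
```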

Calculating d (5) using ES calculator, using p-value
“The mean-level comparison was not significant (p = .53)”

t-test table
df = (n1 + n2 − 2)
Sometimes authors only report, e.g., p < .01 (n = 22). If so, use a conservative approach to reading the t-test table.
NOTE: When p = n.s., some researchers code d = 0 in the database

Example dataset so far (1) (ES_enter.sav):

Use all available tools for calculating the following 5 effect sizes
• ES 6: M_T = 21, M_C = 20, n_T = 60, n_C = 60, t = .55
• ES 7: M_T = 103.5, M_C = 100, SD_T = 22.0, SD_C = 18.5, n_T = 45, n_C = 35
• ES 8: n_T = 45, n_C = 40, p < .05
• ES 9: n_T = 100, n_C = 120, F = 8.73
• ES 10: n_T = 200, n_C = 160, t = 5.66
(see electronic document: “Correct ds for 5 effect sizes.doc”)

Example dataset so far (2) (ES_enter.sav):

Calculating d (11) using ES calculator, using number of successful outcomes per group


Calculating d (12) using ES calculator, using proportion of successes per group (53% vs. 48.5%)
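The slide does not say which transformation the ES calculator applies to proportions. Two common approximations are the probit difference and the arcsine difference, sketched here with hypothetical function names:

```python
from math import asin, sqrt
from statistics import NormalDist


def d_probit(p_t: float, p_c: float) -> float:
    """Difference between the normal quantiles of the two success proportions."""
    z = NormalDist()
    return z.inv_cdf(p_t) - z.inv_cdf(p_c)


def d_arcsine(p_t: float, p_c: float) -> float:
    """Difference between the arcsine-transformed success proportions."""
    return 2 * asin(sqrt(p_t)) - 2 * asin(sqrt(p_c))


# Proportions from the slide: 53% vs. 48.5% successes
print(round(d_probit(0.53, 0.485), 3), round(d_arcsine(0.53, 0.485), 3))
```

The two approximations generally give different values, so whichever one is chosen should be applied consistently across studies.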

Calculating d (13) using paired t-test (only one experimental group; “each person their own control”)
Don’t use the SD of the change score!
r = correlation between Time 1 and Time 2

Calculating d (14) using paired t-test (only one experimental group)
• n (pairs) = 90, t-value = 6.5, r = .70
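One widely used conversion for a paired t-test is d = t × √(2(1 − r) / n), which brings in the Time 1 and Time 2 correlation so that d is not scaled by the SD of the change score. A minimal sketch, assuming this is the formula behind the slide; the function name is hypothetical:

```python
from math import sqrt


def d_from_paired_t(t: float, n_pairs: int, r: float) -> float:
    """Standardized mean gain from a paired t-value.

    r is the correlation between Time 1 and Time 2 scores.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return t * sqrt(2 * (1 - r) / n_pairs)


# Values from the slide: n (pairs) = 90, t = 6.5, r = .70
print(round(d_from_paired_t(6.5, 90, 0.70), 2))
```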

Calculating d (15)
• “The 20 participants increased .84 z-scores between time 1 and time 2 (p < .01)”
• ES = .84
• Correct for small sample bias

Example dataset so far (3) (ES_enter.sav):
Method difference: mean contrast and gain scores

Summary of equations from Lipsey & Wilson (2001) (for more formulae see Lipsey & Wilson, Appendix B)

Weighting for mean-level differences
• The effect sizes are weighted by the inverse of the variance to give more weight to effects based on larger sample sizes
• Variance for mean-level comparison is calculated as v_i = (n1 + n2) / (n1 × n2) + d_i² / (2 × (n1 + n2))
• The standard error of each effect size is given by the square root of the sampling variance: SE_i = √v_i
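The weighting for a mean-level comparison can be sketched with the usual large-sample variance of d, v = (n1 + n2) / (n1 × n2) + d² / (2(n1 + n2)). The function names and the example values are hypothetical:

```python
from math import sqrt


def variance_d(d: float, n1: int, n2: int) -> float:
    """Sampling variance of a standardized mean difference."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def se_d(d: float, n1: int, n2: int) -> float:
    """Standard error: square root of the sampling variance."""
    return sqrt(variance_d(d, n1, n2))


def weight_d(d: float, n1: int, n2: int) -> float:
    """Inverse-variance weight, larger for larger samples."""
    return 1 / variance_d(d, n1, n2)


# Hypothetical study: d = 0.5 with 30 participants per group
print(round(se_d(0.5, 30, 30), 3), round(weight_d(0.5, 30, 30), 2))
```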

Enter_w.xls

Weighting for gain scores
• SE for gain scores: SE = √(2(1 − r) / n + ES² / (2n))
• Inverse variance for gain scores: w = 2n / (4(1 − r) + ES²)
T1 and T2 scores are dependent, so the correlation between T1 and T2 (not always reported) has to enter the equation.
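For gain scores, the form given in Lipsey & Wilson (2001) is SE = √(2(1 − r) / n + ES² / (2n)), with weight w = 1 / SE² = 2n / (4(1 − r) + ES²). A minimal sketch under that assumption; the function names and example values are hypothetical:

```python
from math import sqrt


def se_gain(es: float, n_pairs: int, r: float) -> float:
    """Standard error of a standardized mean gain.

    r is the correlation between Time 1 and Time 2 scores.
    """
    return sqrt(2 * (1 - r) / n_pairs + es ** 2 / (2 * n_pairs))


def weight_gain(es: float, n_pairs: int, r: float) -> float:
    """Inverse-variance weight; algebraically equal to 1 / se_gain(...) ** 2."""
    return 2 * n_pairs / (4 * (1 - r) + es ** 2)


# Hypothetical gain study: ES = 0.53, 90 pairs, r = .70
print(round(se_gain(0.53, 90, 0.70), 3), round(weight_gain(0.53, 90, 0.70), 1))
```

When r is not reported, it has to be estimated or taken from comparable studies, and that choice should be recorded alongside the effect size.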

Enter_w.xls (XLS)

Compute the weighted mean ES and s.e. of the ES in SPSS (var_ofES.sps) (1)

Compute the weighted mean ES and s.e. of the ES in SPSS (var_ofES.sps) (2)

Compute the weighted mean ES and s.e. of the ES
• Weight each ES by the inverse of its squared s.e. (the inverse variance): w_i = 1 / SE_i²
• The average ES: mean ES = Σ(w_i × ES_i) / Σw_i
• Standard error of the ES: SE = √(1 / Σw_i)
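The three steps can be sketched as a fixed-effect inverse-variance average. This is an illustrative Python version, not a translation of var_ofES.sps; the function name and the data are made up:

```python
from math import sqrt


def weighted_mean_es(effect_sizes: list[float],
                     variances: list[float]) -> tuple[float, float]:
    """Inverse-variance weighted mean ES and its standard error."""
    if len(effect_sizes) != len(variances) or not effect_sizes:
        raise ValueError("need equally long, non-empty lists")
    weights = [1 / v for v in variances]
    total_weight = sum(weights)
    mean_es = sum(w * es for w, es in zip(weights, effect_sizes)) / total_weight
    se_mean = sqrt(1 / total_weight)
    return mean_es, se_mean


# Made-up effect sizes and sampling variances for three studies
mean_es, se_mean = weighted_mean_es([0.20, 0.40, 0.35], [0.01, 0.04, 0.02])
lower, upper = mean_es - 1.96 * se_mean, mean_es + 1.96 * se_mean
print(f"mean ES = {mean_es:.3f}, s.e. = {se_mean:.3f}, 95% C.I. = [{lower:.3f}, {upper:.3f}]")
```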

Enter_w.xls


Funnel plot for x = sample size, y = ES
• Does the average of the ESs converge toward the ES of the largest (n) study?
95% C.I. = ±1.96 × s.e.
99% C.I. = ±2.58 × s.e.
99.9% C.I. = ±3.29 × s.e.

Funnel plot including s.e. of ES
• An ES from a smaller sample has a larger standard error (s.e.)

Interval estimates
• Population: N = ‘size’, μ = ‘mean’, δ = ‘effect size’
• Sample: n = ‘size’, m = ‘mean’, d = ‘effect size’
The “likely” population parameter is the sample parameter ± uncertainty
• Standard errors (s.e.)
• Confidence intervals (C.I.)

Calculating rs
[Diagram: means and standard deviations (d), χ², p-values, F-statistics, t-statistics and “other” test statistics all feed into r]
Almost all test statistics can be transformed into a standardized effect size “r”

Correlations / relationships between variables
• r_xy: Pearson’s product moment coefficient (continuous × continuous)
• r_pb: point-biserial correlation (dichotomous × continuous)
• χ² (dichotomous × dichotomous)
• r_s: Spearman’s rank-order coefficient (ordinal × ordinal)
And others, e.g.,
• φ coefficient, Odds-Ratio (OR)
• Cramer’s V, Contingency coefficient C
• Tetrachoric and polychoric correlations
• … (etc.)

Bias when dichotomising continuous variables
• X and Y are both “truly” continuous, but in the study one of them is dichotomised: with X continuous and Y split 50/50, r_pb is 80% of the value it would have had if Y had been kept continuous
• X and Y are both “truly” continuous, but both are dichotomised: the maximum value of φ if X = 30/70 split and Y = 50/50 split is φ = .33

Calculating rs from d (1)
r can be used in all situations where d can, but d cannot be used in all situations where r is appropriate
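A standard conversion is r = d / √(d² + 1 / (p(1 − p))), where p is the proportion of the total sample in one group; with equal groups this reduces to r = d / √(d² + 4). A minimal sketch, with a hypothetical function name and illustrative values:

```python
from math import sqrt


def r_from_d(d: float, p: float = 0.5) -> float:
    """Convert a standardized mean difference to a correlation.

    p is the proportion of the total sample in one of the two groups
    (0.5 for equal group sizes).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return d / sqrt(d ** 2 + 1 / (p * (1 - p)))


# Illustrative d values with equal group sizes
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: r = {r_from_d(d):.3f}")
```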

Calculating r_pb (2)
• If X and Y are inherently continuous, mean-contrast is a better option than r_pb

Calculating r (3) from t-value
• Appropriate for both independent and dependent samples t-test values

Calculating r (4) from χ²-value
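Two standard conversions fit these slides: r = √(t² / (t² + df)) from a t-value, and r = √(χ² / N) from a 1-df χ² (which equals the φ coefficient). A minimal sketch; the function names are hypothetical, and the t-example reuses t[28] = 4.11 from earlier in the session:

```python
from math import copysign, sqrt


def r_from_t(t: float, df: int) -> float:
    """Correlation from a t-value; the sign follows the sign of t."""
    return copysign(sqrt(t ** 2 / (t ** 2 + df)), t)


def r_from_chi2(chi2: float, n_total: int) -> float:
    """Correlation (phi) from a chi-square with 1 degree of freedom.

    Chi-square is unsigned, so the direction must be read from the table.
    """
    if chi2 < 0:
        raise ValueError("chi-square cannot be negative")
    return sqrt(chi2 / n_total)


print(round(r_from_t(4.11, 28), 3))    # t[28] = 4.11 from the earlier example
print(round(r_from_chi2(4.0, 100), 3)) # made-up chi-square and N
```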

Sources of error
• Cf. Structural Equation Model (circle = latent/unobserved construct, rectangle = manifest/observed variable)
[Diagram: latent (unobserved) X and Y, correlated r_x*y*, are measured by manifest (observed) variables x and y with reliabilities r_xx and r_yy; the observed correlation is r_xy]

Alternatively: transform rs into Fisher’s Z_r-transformed rs, which are more normally distributed
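Fisher’s transformation is Z_r = 0.5 × ln((1 + r) / (1 − r)), with standard error 1 / √(n − 3). Averages computed on the Z_r scale are transformed back to r at the end. A minimal sketch with hypothetical function names and an illustrative example:

```python
from math import atanh, sqrt, tanh


def fisher_z(r: float) -> float:
    """Fisher's Z_r transform: 0.5 * ln((1 + r) / (1 - r))."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return atanh(r)


def se_fisher_z(n: int) -> float:
    """Standard error of Z_r; its inverse variance (the weight) is n - 3."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    return 1 / sqrt(n - 3)


def z_to_r(z: float) -> float:
    """Back-transform a Z_r value (e.g., a weighted mean) to r."""
    return tanh(z)


# Illustrative correlation from a sample of 50
z = fisher_z(0.50)
print(round(z, 3), round(se_fisher_z(50), 3), round(z_to_r(z), 3))
```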

rr.xls

Calculating OR (chi2.sps)

Pearson’s 5 studies escaping Enteric Fever (1904)


Each study is one line in the database
[Table columns: Effect size, Duration, Sample sizes, Reliability of the instrument, Variance of the effect size]

Organising effect sizes within study (1): “flat dataset”

Organising effect sizes within study (2): “hierarchical dataset” (effect sizes nested within study)

Organising effect sizes within study (3): “hierarchical dataset”, with one construct per DV per study

Organising effect sizes within study (4): “hierarchical dataset”, with one DV per study
NOTE: alternative to aggregating ESs within study: multilevel meta-analysis

Exercise: effect size calculation (4 method/result extracts from journals):
• Do boys have higher general (global) self-concept (self-worth) than girls?
• Decide which effect size to use (d, r, OR)
• Calculate appropriate effect sizes

Effect size literature
• Cohen, J. (1969). Statistical Power Analysis for the Behavioral Sciences (1st ed.). Hillsdale: Lawrence Erlbaum Associates (2nd ed., 1988).
• Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
• Gwet, K. (2001). Handbook of interrater reliability: How to estimate the level of agreement between two or multiple raters. Gaithersburg: STATAXIS Publishing.
• Huberty, C. J. (2002). A history of effect size indices. Educational and Psychological Measurement, 62, 227–240.
• McCartney, K., & Rosenthal, R. (2000). Effect size, practical importance, and social policy for children. Child Development, 71, 173–180.
• Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology, 25, 241–286.
