Effect Sizes for Continuous Variables William R. Shadish University of California, Merced.

Indices for Treatment Outcome Studies
– Correlation coefficient (r) between treatment and outcome
– Standardized mean difference statistic (d)
Either can be transformed into the other, so we will work with d, since it is the most common. Other indices exist but are rare in social science meta-analyses.

Estimating d
– d itself
– Algebraic equivalents to d
– Good approximations to d
– Methods that require intraclass correlation
– Methods that require ICC and change scores
– Methods that underestimate effect
Note: Italicized methods will be covered in this workshop.

Sample Data Set I: Two Independent Groups
                     Treatment   Comparison
Mean
Standard Deviation
Sample Size          10          10
Correlation between treatment and outcome is r = -.055

Calculating d
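
The formula for this slide is not in the transcript. As a hedged reminder, the usual definition of the standardized mean difference is given below; the subscripts T and C (treatment, comparison) and the symbol s_p for the pooled standard deviation are notation assumed here, not taken from the slide.

```latex
d = \frac{\bar{X}_T - \bar{X}_C}{s_p},
\qquad
s_p = \sqrt{\frac{(n_T - 1)\,s_T^2 + (n_C - 1)\,s_C^2}{n_T + n_C - 2}}
```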

Algebraic Equivalent: Between-groups t-test on raw posttest scores
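
A minimal sketch of this conversion, assuming the usual identity d = t·sqrt((n1 + n2)/(n1·n2)) for two independent groups. The function name `d_from_t` is illustrative; the inputs are the Data Set I values quoted in these slides (t = .2336, 10 cases per group). The t statistic recovered from a p value carries no sign, so the sign of d has to be taken from the direction of the group means.

```python
import math

def d_from_t(t: float, n1: int, n2: int) -> float:
    """Standardized mean difference from an independent-groups t statistic."""
    return t * math.sqrt((n1 + n2) / (n1 * n2))

# Data Set I: t = .2336 with 10 cases per group.
d = d_from_t(0.2336, 10, 10)
print(round(d, 3))  # magnitude only; the slides report d = -.104
```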

Algebraic Equivalent: t-test for two matched groups, sample sizes, correlation between groups
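
The formula for this slide is not in the transcript. A commonly used conversion for a matched-groups (dependent) t statistic is sketched below as an assumption about what the slide showed: n is the number of matched pairs and r is the correlation between the paired scores.

```latex
d = t_{\text{matched}} \sqrt{\frac{2\,(1 - r)}{n}}
```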

Algebraic Equivalent: Two-group between-groups F-statistic on raw posttest scores (Data Set I)
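
With two groups, F = t², so the t-test conversion carries over directly. The expression below is a sketch under that assumption (the slide's own formula is not in the transcript); the sign of d is taken from the direction of the group means.

```latex
d = \pm \sqrt{\frac{F\,(n_T + n_C)}{n_T\, n_C}}
```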

Algebraic Equivalent: Multifactor Between Subjects ANOVA with Two Treatment Conditions
1. Sums of Squares and Degrees of Freedom for all sources, and Marginal Means for Treatment Conditions
2. Mean Squares and Degrees of Freedom for all sources, and Marginal Means for Treatment Conditions
3. Sums of Squares and Degrees of Freedom for all sources, with Cell Means and Cell Sample Sizes
4. Mean Squares and Degrees of Freedom for all sources, with Cell Means and Cell Sample Sizes
5. Cell means, cell sample sizes, the F-statistic for the treatment factor, and the degrees of freedom for the error term
6. F-statistics and degrees of freedom for all sources, sample size for treatment and comparison groups, where treatment factor has only two levels

Example: Sums of Squares and Degrees of Freedom for all sources, and Marginal Means for Treatment Conditions: Data Set II

Raw data
        B1    B2    B3
A1
A2

Cell means (sample sizes)
                  B1     B2     B3     Row Marginal
A1                (3)    (3)    (3)    (9)
A2                (3)    (3)    (3)    (9)
Column Marginal   (6)    (6)    (6)    (18)

Source      Sum of Squares   df   Mean Square   F   Probability
A
B
AB
Residual
Total

Example: Sums of Squares and Degrees of Freedom for all sources, and Marginal Means for Treatment Conditions
For a two group one factor ANOVA:
For a two factor ANOVA:
Which is the same as would have been obtained had Factor B not existed (with equal n per cell).
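
The formulas for this slide are not in the transcript. The standard idea, sketched here as an assumption, is to rebuild the pooled standard deviation that a one-factor analysis would have used by adding the sums of squares and degrees of freedom of every source except the treatment factor back into the error term. The notation (A as the treatment factor, B as the other factor, W as the residual) is chosen for illustration.

```latex
% One-factor, two-group ANOVA: the within-groups mean square is the pooled variance
s_p = \sqrt{MS_W}

% Two-factor ANOVA: return B and AB to the error term
s_p = \sqrt{\frac{SS_B + SS_{AB} + SS_W}{df_B + df_{AB} + df_W}},
\qquad
d = \frac{\bar{X}_{A_1} - \bar{X}_{A_2}}{s_p}
```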

Algebraic Equivalent: Oneway two-group ANCOVA: Covariance error term, F for covariate, raw score means, and total sample size (Data Set III)
         Time 1   Time 2   Change   Time 2 Group Mean
Group
Group

ANCOVA Table, Time 2 as Outcome, Time 1 as Covariate
Source      Sum of Squares   df   Mean Square   F   Sig.
Covariate
Groups
Error
Total
Note: This table was computed using the unique sum of squares method as defined in SPSS for Windows Version 7.5.

Algebraic Equivalent: Oneway two-group ANCOVA: Covariance error term, F for covariate, raw score means, and total sample size (Data Set III)
Which is the same as would have been obtained had the standard method been applied to the Time 2 scores.

Algebraic Equivalent: Exact Probability and Sample Sizes
If the exact p value from a t-test or two-group F-test is reported:
– Use the sample size to get df, which in turn allows you to recover the exact t statistic
– Then apply the t-test method previously shown
From Data Set I, the exact probability for the t-test was p = .818.
– for df = 20 - 2 = 18, t = .2336
– so d = -.104, same as before
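
The steps above can be sketched in code. This is an illustration only: it uses nothing beyond the Python standard library, integrates the t density with Simpson's rule, and inverts the two-tailed p value by bisection. The function names `two_tailed_p`, `t_from_p`, and `d_from_p` are hypothetical, and a statistics package would normally supply the t quantile directly.

```python
import math

def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p value of Student's t, by Simpson's rule on [0, |t|]."""
    t = abs(t)
    if t == 0.0:
        return 1.0
    # Normalizing constant of the t density (math.gamma overflows for very large df).
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 1.0 - 2.0 * central_area

def t_from_p(p: float, df: int) -> float:
    """|t| whose two-tailed p value equals p, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > p:
            lo = mid  # p is still too large, so |t| must be bigger
        else:
            hi = mid
    return (lo + hi) / 2

def d_from_p(p: float, n1: int, n2: int) -> float:
    """Magnitude of d from an exact two-tailed p value and the group sizes."""
    t = t_from_p(p, n1 + n2 - 2)
    return t * math.sqrt((n1 + n2) / (n1 * n2))

# Data Set I: p = .818 with 10 cases per group (df = 18).
print(t_from_p(0.818, 18), d_from_p(0.818, 10, 10))
```

The result is a magnitude only, so the sign of d still comes from the direction of the means. Calling `t_from_p(0.05, 18)` gives the critical t that would be used when a result is reported only as p < .05.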

Algebraic Equivalent: r to d
To convert r to d uncorrected for small sample bias, using Data Set I:
Which is the same as originally obtained using the standard formula for d.
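
A minimal sketch of one way to do this conversion, assuming the route r → t → d: t = r·sqrt(df)/sqrt(1 − r²) with df = n1 + n2 − 2, followed by the t-test identity. The function name `d_from_r` is illustrative. With the rounded r = -.055 from Data Set I the result is about -.1045, which agrees with the slides' -.104 up to rounding of r.

```python
import math

def d_from_r(r: float, n1: int, n2: int) -> float:
    """d (uncorrected for small-sample bias) from a point-biserial r."""
    df = n1 + n2 - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)   # t implied by r
    return t * math.sqrt((n1 + n2) / (n1 * n2))    # t-to-d identity

# Data Set I: r = -.055 with 10 cases per group.
print(d_from_r(-0.055, 10, 10))
```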

Algebraic Equivalent: Raw Data
Sometimes raw data is tabled as, say,
– Treatment group N = 10: A = 20%, B = 20%, C = 30%, D = 20%, and F = 10%
– Comparison group N = 10: A = 10%, B = 20%, C = 20%, D = 30%, and F = 20%
Create raw data as, say, A = 4, B = 3, C = 2, D = 1, and F = 0
– treatment group is 4, 4, 3, 3, 2, 2, 2, 1, 1, 0
– comparison group is 4, 3, 3, 2, 2, 1, 1, 1, 0, 0
Then d = .377
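
The recoded scores above can be run through the standard formula directly. The sketch below assumes d is the mean difference divided by the pooled standard deviation (with n − 1 in each group's variance); the function name `cohens_d` is illustrative.

```python
import math
from statistics import mean, variance

def cohens_d(treatment, comparison):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(comparison)
    pooled_var = ((n1 - 1) * variance(treatment)
                  + (n2 - 1) * variance(comparison)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(comparison)) / math.sqrt(pooled_var)

# Grades recoded as A = 4, B = 3, C = 2, D = 1, F = 0.
treatment = [4, 4, 3, 3, 2, 2, 2, 1, 1, 0]
comparison = [4, 3, 3, 2, 2, 1, 1, 1, 0, 0]
print(round(cohens_d(treatment, comparison), 3))  # 0.377
```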

Good Approximation: Three-group or higher between-groups oneway ANOVA on posttest scores: group means, sample sizes, and F-statistic
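
A hedged sketch of how this approximation is usually carried out: rebuild the between-groups mean square from the group means and sample sizes, divide by F to recover the within-groups mean square, and use its square root as the standardizer. The function name `d_from_oneway_f` and the numbers in the example call are hypothetical, not Data Set IV.

```python
import math

def d_from_oneway_f(means, ns, f_stat, i=0, j=1):
    """Approximate d for groups i and j from a k-group oneway ANOVA."""
    k = len(means)
    grand = sum(n * m for n, m in zip(ns, means)) / sum(ns)   # grand mean G
    ms_between = sum(n * (m - grand) ** 2
                     for n, m in zip(ns, means)) / (k - 1)
    ms_within = ms_between / f_stat        # because F = MS_between / MS_within
    return (means[i] - means[j]) / math.sqrt(ms_within)

# Hypothetical three-group study: means, sample sizes, and reported F.
d = d_from_oneway_f([1.0, 0.0, 0.5], [10, 10, 10], 2.0)
print(round(d, 3))
```

Because the within-groups mean square pools all groups, the result can differ slightly from a d computed from the two compared groups alone.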

Example: Data Set IV
Posttest Group Mean .00 Group Group Group
G =
F =
This is similar but not identical to d = using the standard method comparing groups 1 and 2. The difference is due to a different pooled standard deviation, s_p.

Good Approximation: Three-group or higher between-groups oneway ANOVA on raw posttest scores: treatment and comparison group means and mean square error.
For Data Set IV:

Good Approximations: Two-Factor RM-ANOVA (groups x time)
– Between-groups mean square error, within-groups mean square error, posttest means, and sample sizes
– F-ratio for groups, F-ratio for time, cell means and sample sizes
– F-ratio for groups, F-ratio for group × time interaction, cell means and sample sizes

Example: Data Set V
This data set is taken from Winer (1972, p. 525). It presents a two-factor model with factor A as a between-subjects factor having two levels, A1 and A2, and factor B as a within-subjects factor having four levels (columns B1 through B4). The raw data are:
        B1    B2    B3    B4
A1
A2

Example: Data Set V RM-ANOVA
Here are the cell means (and sample sizes) for the same data, along with marginals and grand means.
                  B1     B2     B3     B4     Row Marginal
A1                (3)    (3)    (3)    (3)    (12)
A2                (3)    (3)    (3)    (3)    (12)
Col Marginal      (6)    (6)    (6)    (6)    (24) = Grand Mean

Repeated Measures ANOVA Table

Tests of Within-Subjects Effects
Source      Sum of Squares   df   Mean Square   F   Probability
B
AB
WS Error

Tests of Between-Subjects Effects
Source      Sum of Squares   df   Mean Square   F   Probability
A
BS Error

Between-groups mean square error, within-groups mean square error, posttest means, and sample sizes: Data Set V
Assuming Time 4 is the time point of interest (e.g., it is the posttest, or the followup), then:
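
The formula for this slide is not in the transcript. One common way to build the standardizer from a groups x time table, offered here as an assumption rather than as the slide's own expression, is to pool the between-subjects and within-subjects error terms into a single within-cell variance and divide the posttest mean difference by its square root. BS and WS denote the two error terms in the ANOVA table, and B4 is the Time 4 column.

```latex
s_p = \sqrt{\frac{df_{BS}\, MS_{BS} + df_{WS}\, MS_{WS}}{df_{BS} + df_{WS}}},
\qquad
d \approx \frac{\bar{X}_{A_1 B_4} - \bar{X}_{A_2 B_4}}{s_p}
```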

Methods that underestimate effect size I
Results reported verbally as “significant”, or as p < .05, p < .01, etc., with sample size
– Use the previous method to convert p to t, and then use t to compute d as before.
– In Data Set I using p < .05, this method would yield t = , yielding d =
– This underestimates d because t increases as p decreases, and the true p lies below the reported threshold of .05.
– Be careful to distinguish one-tailed from two-tailed tests.

Methods that underestimate effect size II Results reported only as nonsignificant. Omitting them from the meta-analysis results in an overestimate of average d. A typical solution is to code them as d = 0 (introduces a constant variance problem), but then do sensitivity analyses. More sophisticated solutions exist such as maximum likelihood imputation.

Discussion
– Many more methods exist.
– The standard errors for all but d and its algebraic equivalents are typically unknown.
– Whether to use the approximations involves the same tradeoff as with results reported only as nonsignificant (missing effect sizes vs. approximate results).
– When doing a meta-analysis, good practice is to code the effect size calculation method, and then explore its effects on the outcome.

Computer Programs
– Lipsey and Wilson’s Excel macro (free at
– ES program (purchase at
– For more meta-analytic software, see Analysis%20Links.htm.