Effect Size Estimation in Fixed Factors Between-Groups ANOVA


Contrast Review
Given a design with a single factor A with 3 or more levels (conditions):
The omnibus comparison concerns all levels (i.e., dfA ≥ 2). A focused comparison, or contrast, concerns just two levels (i.e., df = 1). The omnibus effect is often relatively uninteresting compared with specific contrasts (e.g., treatment 1 vs. placebo control). A large omnibus effect can also be misleading if it is due to a single discrepant mean that is not of substantive interest.

Comparing Groups
The traditional approach is to analyze the omnibus effect, followed by analysis of all possible pairwise contrasts (i.e., comparing each condition to every other condition). However, this approach is typically incorrect (Wilkinson & TFSI, 1999); for example, it is rare that all such contrasts are interesting. Also, using traditional methods for post hoc comparisons (e.g., Newman-Keuls) reduces power for every contrast, and power may already be low.

Contrast specification and tests
A contrast is a directional effect that corresponds to a particular facet of the omnibus effect. In a sample, a contrast is calculated as a weighted sum of the group means:
ψ̂ = a1M1 + a2M2 + … + ajMj = ∑ ajMj
where a1, a2, …, aj is the set of weights that specifies the contrast. As we have mentioned, contrast weights must sum to zero, and the weights for at least two different means must be nonzero. Means assigned a weight of zero are excluded from the contrast, and means with positive weights are compared with means given negative weights.

Contrast specification and tests
For effect size estimation with the d family, we generally want a standard set of contrast weights that will better allow comparison across studies. In a one-way design, the sum of the absolute values of the weights in a standard set equals two (i.e., ∑ |aj| = 2). E.g., with 4 groups, to compare groups 1 and 2 vs. groups 3 and 4, use weights of .5, .5, -.5, -.5. Mean difference scaling permits the interpretation of a contrast as the difference between the averages of two subsets of means, as the sketch below shows.
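
A minimal sketch of that example in base R, with hypothetical group means:

M <- c(10, 12, 15, 17)    # hypothetical group means for a 4-group design
a <- c(.5, .5, -.5, -.5)  # standard set: sum(abs(a)) == 2
psi_hat <- sum(a * M)     # (10 + 12)/2 - (15 + 17)/2 = -5

Because of the mean difference scaling, psi_hat is literally the difference between the two subset averages.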

Contrast specification and tests
An exception to the need for mean difference scaling is trends (polynomials) specified for a quantitative factor (e.g., drug dosage). There are default sets of weights that define trend components (e.g., linear, quadratic, etc.), and these are not typically based on mean difference scaling. This is not usually a problem, because effect size for trends is generally estimated with the r family (measures of association), and measures of association for contrasts of any kind generally correct for the scale of the contrast weights.

Orthogonal Contrasts
Two contrasts are orthogonal if they each reflect an independent aspect of the omnibus effect. For balanced designs, contrasts with weights aj and bj are orthogonal if ∑ ajbj = 0; for unbalanced designs, the weighted criterion ∑ (ajbj / nj) = 0 applies.

Orthogonal Contrasts
Recall that for a full set of orthogonal contrasts, the sums of squares for the contrasts add up to SSA, and their eta-squareds sum to the eta-squared for the omnibus effect. That is, the omnibus effect can be broken down into a − 1 independent directional effects; the maximum number of orthogonal contrasts is one less than the number of groups (dfA = a − 1). However, it is more important to analyze contrasts of substantive interest, even if they are not orthogonal.
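
A quick check in R, assuming a balanced design and hypothetical weight sets:

a1 <- c(.5, .5, -.5, -.5)  # groups 1 and 2 vs. groups 3 and 4
a2 <- c(1, -1, 0, 0)       # group 1 vs. group 2
sum(a1 * a2)               # 0, so the two contrasts are orthogonal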

Contrast specification and tests
The t-test for a contrast against the nil hypothesis (H0: ψ = 0) is
t = ψ̂ / sψ̂, where sψ̂ = √(MSW ∑ aj²/nj)
The F for the contrast is simply F = t².
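
A minimal sketch in base R with simulated, purely hypothetical data for three groups (these objects are reused in later sketches):

set.seed(1)
y <- c(rnorm(5, 10), rnorm(5, 12), rnorm(5, 15))  # hypothetical scores, 3 groups of n = 5
g <- factor(rep(1:3, each = 5))
a <- c(1, -.5, -.5)               # group 1 vs. the average of groups 2 and 3
M <- tapply(y, g, mean)
n <- tapply(y, g, length)
MSW <- anova(lm(y ~ g))["Residuals", "Mean Sq"]
psi_hat <- sum(a * M)
se_psi <- sqrt(MSW * sum(a^2 / n))
t_psi <- psi_hat / se_psi         # F for the contrast is t_psi^2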

Dependent Means
Test statistics for dependent mean contrasts usually have error terms based on only the two conditions compared; for example, t = ψ̂ / (sD/√n), where sD² is the variance of the contrast difference scores. This error term does not assume sphericity.
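
A small sketch with hypothetical paired data:

set.seed(2)
y1 <- rnorm(8, 10)
y2 <- y1 + rnorm(8, 1)                        # hypothetical scores for two dependent conditions
D <- y1 - y2                                  # contrast difference scores
t_dep <- mean(D) / (sd(D) / sqrt(length(D)))  # error term uses only these two conditions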

Confidence Intervals
Approximate confidence intervals for contrasts are generally fine. The general form of an individual confidence interval for ψ is
ψ̂ ± sψ̂ × t(α/2, dferror)
where dferror is specific to that contrast.
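
Continuing the three-group sketch from above:

df_error <- sum(n) - length(n)                     # N - a
psi_hat + c(-1, 1) * qt(.975, df_error) * se_psi   # individual 95% CI for psi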

Contrast specification and tests
There are also corrected confidence intervals for contrasts that adjust for multiple comparisons (i.e., inflated Type I error), known as simultaneous or joint confidence intervals. They are generally wider than individual confidence intervals because they are based on a more conservative critical value. Examples in R using the MBESS package1:

ci.c(means=c(2, 4, 9, 13), error.variance=1, c.weights=c(1, -1, -1, 1), n=c(3, 3, 3, 3), N=12, conf.level=.95)
ci.c(means=c(94, 91, 92, 83), error.variance=67.375, c.weights=c(1, -1, 0, 0), n=c(4, 6, 5, 5), N=20, conf.level=.95)

1. With equal cell sizes, as in the first example, one could have just specified n = 3 and left it at that.

Standardized contrasts
The general form for a standardized contrast, in terms of population parameters, is δ = ψ / σ*, where σ* is the standardizer (some population standard deviation).

Standardized contrasts
There are three general ways to estimate σ* (i.e., the standardizer) for contrasts between independent means:
1. Calculate d as Glass's Δ, i.e., use the standard deviation of the control/reference group.
2. Calculate d as Hedges's g, i.e., use the square root of the pooled within-conditions variance for just the two groups being compared.
3. Calculate d as an extension of g, where the standardizer is the square root of MSW based on all groups. This assumes the homogeneity of variance assumption has been met, and is generally recommended.
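
Option 3, continuing the three-group sketch:

d <- psi_hat / sqrt(MSW)   # contrast standardized by the square root of MSW from all groups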

Standardized contrasts
You can calculate a d from the t for a contrast when a paper does not report an effect size like it should. If an F for the contrast is reported instead, which is very common, simply take its square root to get the t. Recall that the weights should sum to 2 in absolute value.
CIs: once the d is calculated, one can easily obtain exact confidence intervals via the MBESS package in R, as you have done in lab.
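
Given the formulas above, d = t × √(∑ aj²/nj) when the standardizer is √MSW. A sketch with a hypothetical reported F:

F_rep <- 6.25                         # hypothetical F reported for the contrast
t_rep <- sqrt(F_rep)
d_rep <- t_rep * sqrt(sum(a^2 / n))   # a and n as in the three-group sketch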

Cohen's f
Cohen's f1 provides what can be interpreted as the average standardized mean difference across the groups in question. It has a direct relation to a measure of association: f = √(η² / (1 − η²)). As with Cohen's d, there are guidelines regarding Cohen's f: .10, .25, and .40 for small, moderate, and large effect sizes. These correspond to eta-squared values of .01, .06, and .14. Again, though, one should consult the relevant literature for effect size estimation.
1. You don't see f too often, but as an example, it's what the popular power analysis program G*Power uses.
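
The correspondence between the f and eta-squared guidelines is easy to verify in R:

eta2_bench <- c(.01, .06, .14)
round(sqrt(eta2_bench / (1 - eta2_bench)), 2)   # .10 .25 .40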

Measures of Association
A measure of association describes the amount of covariation between the independent and dependent variables. It is expressed in an unsquared or a squared metric: the former is a correlation (or a multiple correlation if there is more than one predictor); the latter is a variance-accounted-for effect size. A squared multiple correlation (R2) calculated in ANOVA is also called the correlation ratio or estimated eta-squared (η²).

Eta-squared
A measure of the degree to which variability among observations can be attributed to conditions: η² = SSeffect / SStotal. Example: η² = .50 means that 50% of the variability seen in the scores is due to the independent variable.
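
Computed from the three-group sketch (tab is reused in the omega-squared sketch later):

tab <- anova(lm(y ~ g))
eta2 <- tab["g", "Sum Sq"] / sum(tab[, "Sum Sq"])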

More than One Factor
It is fairly common practice to calculate η² (the correlation ratio) for the omnibus effect but to calculate the partial correlation ratio, partial η² = SSeffect / (SSeffect + SSerror), for each contrast, as we have noted before.
1. SPSS calls everything partial eta-squared in its output, but for a one-way design you'd report it as eta-squared, since no other factors' effects are available to partial out.

Problem
Eta-squared (since it is an R-squared) is an upwardly biased measure of association (just as R-squared was). As such, it is better used descriptively than inferentially.

Omega-squared
ω² is another effect size measure; it is less biased and is interpreted in the same way as eta-squared. It is our adjusted R2 for the ANOVA setting. So why do we not see omega-squared as much? People don't like small values, and stat packages don't provide it by default.

Omega-squared
Estimated as ω̂² = (SSA − dfA·MSW) / (SST + MSW). Put differently, ω̂² = dfA(F − 1) / (dfA(F − 1) + kn), where k is the number of groups and n is the per-group sample size, so that kn = N.

Omega-squared
ω² assumes a balanced design; eta² does not. When the design is unbalanced, perhaps stick with eta², or maybe use the harmonic mean of the group sizes in the kn part of the previous formula. Though the omega values are generally lower than the corresponding correlation ratios for the same data, the two converge as sample size increases. Note that the values can be negative; if so, interpret the value as though it were zero.
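
From the same ANOVA table as the eta-squared sketch (MSW as computed earlier):

SSA <- tab["g", "Sum Sq"]
SST <- sum(tab[, "Sum Sq"])
dfA <- tab["g", "Df"]
omega2 <- (SSA - dfA * MSW) / (SST + MSW)   # can be negative; treat as zero if so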

Comparing effect size measures
Consider our previous example with item difficulty and arousal and their effects on performance.

Comparing effect size measures 2 ω2 Partial 2 f B/t groups .67 .59 1.42 Difficulty .33 .32 .50 .71 Arousal .17 .14 .45 Interaction Slight differences due to rounding, f based on eta-squared. Given the balanced design, when looking at specific effects eta-squared serve as the more appropriate semi-partial correlation squared.

No p-values
As before, programs are available to calculate confidence intervals for an effect size measure. As an example using the MBESS package, for the overall effect the 95% CI on ω² runs from .20 to .69.
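
One way to obtain such an interval is MBESS's ci.pvaf, which uses the noncentral F distribution to give a confidence interval on the population proportion of variance accounted for; the input values below are hypothetical, not the ones behind the interval quoted above:

library(MBESS)
ci.pvaf(F.value=10.8, df.1=3, df.2=16, N=20, conf.level=.95)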

No p-values
Ask yourself, as we have before: if the null hypothesis were true, what would our effect size be (as a standardized mean difference or a proportion of variance accounted for)? Rather than doing traditional hypothesis testing, one can simply see whether the CI for the effect size contains zero (or, in the eta-squared case, comes very close to it). If not, reject H0. This approach is superior in that we retain the NHST-style decision, get a confidence interval reflecting the precision of our estimates, focus on effect size, and de-emphasize the p-value.