Handout Six: Sample Size, Effect Size, Power, and Assumptions of ANOVA
EPSE 592 Experimental Designs and Analysis in Educational Research
Instructor: Dr. Amery Wu
Where We Have Been with One-way ANOVA

We have learned how:
1. ANOVA partitions the score variation into the part attributable to the factor and the part attributable to sampling error.
2. We can use the F ratio to test whether the variation due to the factor (group differences) holds true in the population.
3. We can conduct multiple comparisons: we can plan our paired comparisons ahead of time or use a data-driven post hoc approach to detect the mean differences.

For these procedures to work appropriately, or to work to their best effect, some practical matters must be considered and checked.
Factors Influencing the Appropriateness & Effectiveness of ANOVA
- Sample Size
- Effect Size
- Power
- Data Assumptions
Statistical Power

The power of a statistical test is the probability that the test will reject the null hypothesis when the null hypothesis is false. If we define a "false null" as the "CASE" (what we would like to detect), power is how sensitive a statistical test is to detecting a true CASE, i.e., the true "positive" rate. It is also called sensitivity.

                    Retain (0)       Reject (1)
Null is True (0)    Specificity      Type I Error
Null is False (1)   Type II Error    Power (Sensitivity)
Factors Influencing Power

Power is, in general, a function of the size of the population parameter (the population effect size), the sample size, and the alpha level: Power (π) = f(Δ, N, α). Note: Δ (delta) denotes the population effect size.

Take the one-sample t test as an example: t = (M − μ0) / (s / √N). If the null is false (a CASE) and alpha is fixed (say, set to 0.05), the greater the test statistic t is, the more power we have to reject the null. Looking at the right side of the equation, we can see that the greater the numerator (the effect size Δ, i.e., the mean difference) is, the greater t (hence, the power) will be. Also, the smaller the denominator (the sampling error, i.e., the standard error) is, the greater t (hence, the power) will be. That is, the greater the sample size N is, the greater t (hence, the power) will be.
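The relationship Power = f(Δ, N, α) can be sketched numerically. The following is a minimal illustration (not part of the handout) that approximates the power of a two-sided one-sample test by treating the test statistic as normal, which is reasonable for moderately large N; the function name `approx_power` is our own.

```python
import math
from statistics import NormalDist

def approx_power(delta, n, alpha=0.05):
    """Approximate power of a two-sided one-sample test.

    delta: population effect size (mean difference in SD units).
    n: sample size. alpha: significance level.
    Treats the test statistic as normal, a reasonable approximation for
    moderately large n (an exact t-based calculation would require the
    noncentral t distribution, e.g., from scipy).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = delta * math.sqrt(n)        # expected test statistic under H1
    # Probability the statistic lands beyond either critical value:
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)
```

As the slide says, power rises with Δ and with N: for example, `approx_power(0.5, 30)` is smaller than `approx_power(0.5, 64)`, which in turn is smaller than `approx_power(0.8, 64)`.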
Issues with the Method of Hypothesis Testing: Sample Size

When the sample size is small, a true mean difference (M1 − M2) could go undetected, and the t test would fail to reject H0. Conversely, a trivial mean difference could turn out statistically significant when the sample size is large. To address this issue, researchers are recommended to report the magnitude of the difference using effect size measures, instead of simply the p value of the hypothesis test.
Observed Effect Size

Effect size is a measure of the magnitude of (how big) the effect (group difference) is. Effect size is a standardized measure because it transforms the magnitude of the difference from the raw score scale to a scale-free metric. Thus, differences found in studies of the same DV measured on different raw scales can all be compared because they are all on the same metric. A variety of effect size measures have been suggested. An effect size can be expressed either on the standard-deviation unit scale (e.g., Cohen's d) or on the squared, proportion-of-variance scale (e.g., eta squared). The population effect size is unknown. Most effect sizes we compute are "observed," meaning they are computed from the sample data.
Effect Size: Cohen's d

Cohen's d is the most commonly reported effect size, expressed on the standard-deviation unit scale: Cohen's d = (M1 − M2) / s_pooled, where s_pooled = √[(SD1² + SD2²) / 2] (when sample sizes are equal across groups). The SPSS drop-down menu does not provide this measure.

Lab Activity: Using the information in the table below, calculate the observed effect sizes for all paired comparisons.
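Since SPSS does not report Cohen's d, the calculation is easy to do by hand or in a few lines of code. A minimal sketch (our own illustration, not SPSS output), assuming equal group sizes:

```python
import math

def cohens_d(m1, m2, sd1, sd2):
    """Cohen's d for two groups with equal sample sizes.

    The pooled SD is the root mean square of the two group SDs,
    matching the equal-n formula on the slide."""
    s_pooled = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / s_pooled
```

For example, two groups with means 10 and 8 and both SDs equal to 2 give d = 1.0, a large effect by Cohen's guidelines.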
Effect Size: Eta Squared

Eta squared is expressed on the squared unit scale. It is calculated as SS_between groups / SS_total.

Lab Activity: Using the information in the ANOVA table below, compute the eta squared for the factor "dose".

SPSS provides eta squared for the factor as a whole (all levels combined) as well as for paired comparisons. See the slide "Computing Retrospective Power Using SPSS".
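The ratio SS_between / SS_total can also be computed directly from raw scores. A minimal sketch (our own illustration; in practice you would read SS_between and SS_total off the ANOVA table):

```python
def eta_squared(groups):
    """Eta squared from raw scores in k groups: SS_between / SS_total."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Total variation of every score around the grand mean:
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # Variation of group means around the grand mean, weighted by group size:
    ss_between = sum(
        len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total
```

Because SS_between can never exceed SS_total, eta squared always falls between 0 and 1, which is why it reads as a proportion of variance explained by the factor.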
Prospective vs. Retrospective Power

When the power of a statistical test is required or computed before the data are collected, it is referred to as prospective (a priori) power. When the power is computed after the data have been collected and a statistical decision has been made, it is referred to as retrospective (observed or post hoc) power (Zumbo & Hubley, 1998).

A prospective power function is typically used in the design phase, prior to data collection, to estimate the sample size sufficient to achieve a certain required level of power. Most granting and other funding agencies will not consider funding a proposal without evidence of prospective power and the sample size it requires.

Retrospective power is computed from the "given" data. A post hoc power analysis uses the obtained sample size and the sample effect size to determine what the power was in the study, assuming the effect size in the sample is equal to the effect size in the population. Journal editors, reviewers, and readers are also concerned with the retrospective power of the statistical tests reported when evaluating an existing study. Retrospective power is not the same as prospective power.
Lab Activity: Estimating Prospective Power & Sample Size

Dr. Karl L. Wuensch is studying the effect of the dose of a new drug on patients' depression. He plans to have three levels of treatment (control, 10 mg, & 20 mg) that will be randomly prescribed to 3 groups of patients. His hypothesis is that drug dose has a "medium" population effect on depression. He defines "medium" using Cohen's d guidelines as follows:

Small effect: from 0.2 to 0.3
Medium effect: around 0.5
Large effect: ≥ 0.8

He would like to know how many patients he has to recruit, before he collects data, in order to achieve a certain level of power.

Lab Activity: Go to the ANOVA power calculation site at http://www.math.yorku.ca/SCS/Online/power/ Note that this site uses Cohen's d as its effect size measure. Help Dr. Wuensch determine how many patients he should recruit.
Lab Activity: Computing Retrospective Power Using SPSS

Dr. Karl L. Wuensch ended up recruiting 20 patients for each group because of the constraints of time, budget, and the difficulty of recruiting volunteer patients. His data were compiled in the SPSS file called dose.sav.

Lab Activity: Use SPSS to compute the retrospective (observed or post hoc) power for Dr. Wuensch's study.
Lab Activity: Computing Retrospective Power, SPSS Output

[Dose=1]: comparing the first group (control) to the last group (20 mg)
[Dose=2]: comparing the second group (10 mg) to the last group (20 mg)

Question: Why does the first comparison [Dose=1] have a much higher observed power than the second comparison [Dose=2]?

Note that SPSS does not, by default, report Cohen's d. Instead, the effect size reported by SPSS is (partial) eta squared.
ANOVA Assumptions: One-way Between-Subjects Analysis

One-way between-subjects analysis makes three assumptions about the data. For this method to work well, the data should meet the assumptions "reasonably" well.
1. The observations (scores across the individuals) are independent of one another.
2. The data in each group follow the normal distribution.
3. The variances are equal between the groups (homogeneity).
Checking Assumption: Independent Observations

Observations (scores across the individuals) are independent of one another; the observed score of one individual is not influenced by that of another. This assumption should be checked by examining how the individual scores were collected. For example, if some of the scores are relatively more similar to others because of the time or location at which they were collected, then the assumption is violated. A typical example of violating independent observations occurs when individual students' academic achievement data are collected by sampling schools. Students' scores are influenced by the fact that they share the same teachers and/or principal and the same school climate; achievement may tend to be relatively higher or lower in one school than in another. If this assumption is violated, a random effects ANOVA should be used.
Checking Assumption: Normal Distribution

The data are sampled from a normally distributed population and hence should be fairly normal; the data in each group should follow the normal distribution. If this assumption is violated, use non-parametric test statistics. The assumption can be checked with skewness, histograms, boxplots, QQ plots, etc. Use the SPSS "Analyze" > "Descriptive Statistics" > "Explore" commands to obtain these plots.
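Outside SPSS, one quick numerical check mentioned above is skewness: values near 0 are consistent with a symmetric (and possibly normal) distribution, while values far from 0 signal departure from normality. A minimal sketch of the sample skewness calculation (our own illustration):

```python
def skewness(xs):
    """Sample skewness (third standardized moment).

    Roughly 0 for symmetric data; positive for a long right tail,
    negative for a long left tail."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # variance (biased)
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5
```

For example, a perfectly symmetric sample like [1, 2, 3, 4, 5] has skewness 0, while a sample with one extreme high score is positively skewed.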
Checking Assumption: Equal Variances (Homogeneity)

The group variances are equal. If this assumption is violated, the power is reduced, but the Type I error rate is still robust. One can check the assumption by eyeballing and comparing the SDs or variances across the groups. Alternatively, one can use Levene's homogeneity test; a non-significant result suggests there is no evidence that the variances differ. This test can be too sensitive when the sample size is large. Levene's test is available in SPSS.
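For intuition about what Levene's test does: its statistic is simply a one-way ANOVA F computed on the absolute deviations of each score from its own group's mean. A minimal sketch (our own illustration; SPSS computes the statistic and its p value for you):

```python
def levene_w(groups):
    """Levene's W statistic (center = mean).

    Equivalent to a one-way ANOVA F on the absolute deviations of
    scores from their group means. Compare W against an F(k-1, N-k)
    distribution for a p value (not computed here, since that needs
    the F distribution's CDF, e.g., from scipy)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each score from its own group mean:
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(x for zi in z for x in zi) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2
                  for zi, zm in zip(z, z_means)) / (k - 1)
    within = sum((x - zm) ** 2
                 for zi, zm in zip(z, z_means) for x in zi) / (n_total - k)
    return between / within
```

Groups whose spreads match give W near 0; the more the group variances differ, the larger W becomes.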
Checking Assumption: Equal Variances (Homogeneity)

When the equal variances assumption is violated, the power is reduced, but the Type I error rate is still robust. Thus, one can increase the sample size or use pairwise t tests that do not assume equal variances (e.g., Welch's t test).
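Welch's t test, one such test that drops the equal-variances assumption, replaces the pooled standard error with per-group standard errors and adjusts the degrees of freedom accordingly (the Welch–Satterthwaite approximation). A minimal sketch (our own illustration; names are ours):

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Satterthwaite degrees of freedom for
    comparing two group means without assuming equal variances.

    m, s, n: each group's mean, SD, and sample size."""
    se1, se2 = s1 ** 2 / n1, s2 ** 2 / n2   # squared standard errors
    t = (m1 - m2) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation to the degrees of freedom:
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df
```

When the two groups happen to have equal SDs and equal n, the df reduces to the usual n1 + n2 − 2; with unequal variances the df shrinks, making the test appropriately more conservative.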