Chapter 9 Power
Decisions: A null hypothesis significance test tells us the probability of obtaining our results when the null hypothesis is true: p(Results | H0 is true). If that probability is small, smaller than our significance level (α), we conclude that H0 is unlikely to be true and we reject it.
Errors in Hypothesis Testing: Sometimes we make the correct decision regarding H0. Sometimes we make mistakes when conducting hypothesis tests. –Remember: we are talking about probability theory –Less than a .05 chance doesn’t mean “no chance at all”
Type 1 Errors: The null hypothesis is correct (in reality), but we have rejected it in favor of the alternative hypothesis. The probability of making a Type 1 error is equal to α, the significance level we have selected. –α: the probability of rejecting a null hypothesis when it is true
Type 2 Errors: The null hypothesis is incorrect, but we have failed to reject it in favor of the alternative hypothesis. The probability of a Type 2 error is signified by β, and the “power” of a statistical test is 1 – β. –Power (1 – β): the probability of rejecting a null hypothesis when it is false
More on α and β
Relation between α and β: Although they are related, the relation is complex. –If α = .05, the probability of making a correct decision when the null hypothesis is true is 1 – α = .95. What if the null hypothesis is not true? –The probability of rejecting the null when it is not true is 1 – β
Relation between α and β: In general, we do not set β directly; it is an outcome of how the experiment is designed, and it can be estimated when we design the experiment properly. β is generally greater than α. One way to decrease β is to increase α, but we don’t want to do that. Why, you ask?
α and β reconsidered: Think of minimizing the chances of finding an innocent man guilty vs. finding a guilty man innocent. Likewise, we weigh reducing the likelihood of finding an effect when there isn’t one (making a Type 1 error: rejecting H0 when H0 is true) against reducing the likelihood of missing an effect when there is one (making a Type 2 error: not rejecting H0 when H0 is false).
Power? The probability of rejecting a false null hypothesis. The probability of making (one type of) correct decision. Addresses the Type 2 error: “Not finding any evidence of an effect when one is there”.
More (on) Power: While most people focus on Type 1 errors, you can no longer afford to be naïve about Type 2 errors. Thus, power analyses are becoming the norm in psychological statistics (or they should be).
Hypothesis testing & Power: [Figure: the sampling distribution of the sample mean when H0 is true (H0: μ = 0), centered on the μ specified in H0, with our sample mean M marked]
The area under this distribution at M or beyond is the probability of obtaining our sample mean (or less), given that the null hypothesis is true.
We reject the null that our sample came from the distribution specified by H0 because, if it were true, our sample mean would be highly improbable.
Improbable means “not likely”, not “impossible”, so the probability that we made an error and rejected H0 when it was true is this same area (OOPS!).
This area is our “p-value”, and as long as it is less than α, we reject H0.
As a reminder and a little “visual” help, α defines the critical value and the rejection region.
For any sample mean that falls within the rejection region (at or beyond the critical value(s)), we reject H0.
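The p-value and the critical value described on these slides can be computed directly from a normal sampling distribution. The following is a minimal sketch, not the deck's own procedure: only μ = 0 under H0 comes from the slides, while the standard error, the observed sample mean, and the choice of a lower-tailed test (to match "our sample mean (or less)") are assumptions made for illustration.

```python
from statistics import NormalDist

# Hypothetical numbers for illustration; only mu_0 = 0 comes from the slides.
mu_0 = 0.0          # mean specified by H0
sigma_m = 1.0       # assumed standard error of the mean
sample_mean = -2.0  # assumed observed sample mean M
alpha = 0.05        # significance level

# Sampling distribution of the sample mean when H0 is true.
null_dist = NormalDist(mu=mu_0, sigma=sigma_m)

# p-value: probability of a sample mean this low or lower if H0 is true.
p_value = null_dist.cdf(sample_mean)

# Critical value: the cut-off that leaves an area of alpha in the lower tail.
critical_value = null_dist.inv_cdf(alpha)

# Falling in the rejection region is the same decision as p_value <= alpha.
reject_h0 = sample_mean <= critical_value

print(f"p-value        = {p_value:.4f}")
print(f"critical value = {critical_value:.3f}")
print(f"reject H0?       {reject_h0}")
```

An upper-tailed test is the mirror image: use `1 - null_dist.cdf(sample_mean)` and `null_dist.inv_cdf(1 - alpha)`.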
Let’s say, though, that our sample mean really comes from a different distribution than the one specified by H0, one that’s consistent with HA. [Figure: a second distribution drawn alongside the H0 distribution, with the rejection region marked]
We assume that this second sampling distribution, consistent with HA, is normally distributed around our sample mean M.
If H0 is false, the probability of rejecting it is then the area under the second distribution that falls within the rejection region.
And, as we all know, the probability of rejecting a false H0 is POWER.
Under the HA distribution, the area inside the rejection region is POWER (1 – β) and the area outside it is β.
Under the H0 distribution, the area inside the rejection region is α and the area outside it is 1 – α.
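The four areas labelled in the figure (α, 1 – α, β, and power) can be computed from the two sampling distributions. This is a sketch under stated assumptions: the mean of the HA distribution (2.5), the common standard error (1.0), and the upper-tailed rejection region are hypothetical choices, not values from the slides.

```python
from statistics import NormalDist

# Hypothetical values; the slides only fix mu = 0 under H0.
mu_0, mu_1 = 0.0, 2.5  # means of the H0 and the assumed HA distributions
sigma_m = 1.0          # assumed standard error, the same for both distributions
alpha = 0.05           # upper-tailed test, chosen for concreteness

null_dist = NormalDist(mu_0, sigma_m)
alt_dist = NormalDist(mu_1, sigma_m)

# The rejection region starts at the critical value defined by alpha.
critical_value = null_dist.inv_cdf(1 - alpha)

# Areas under the H0 distribution.
type1_area = 1 - null_dist.cdf(critical_value)   # alpha: reject a true H0
correct_retain = null_dist.cdf(critical_value)   # 1 - alpha: retain a true H0

# Areas under the HA distribution.
beta = alt_dist.cdf(critical_value)              # fail to reject a false H0
power = 1 - beta                                 # reject a false H0

print(f"alpha = {type1_area:.3f}, 1 - alpha = {correct_retain:.3f}")
print(f"beta  = {beta:.3f}, power     = {power:.3f}")
```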
Factors that influence power [Figures: the H0 and HA sampling distributions with the rejection region and the power area marked, redrawn for each factor]
–α
–variability
–sample size
–effect size (the difference between the two distributions is increased)
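To see each factor move the power area, one can change a single input at a time and recompute. The function below is a minimal sketch for an upper-tailed one-sample z test; the function name and all baseline numbers are hypothetical and chosen only to illustrate the direction of each change.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail(effect, sigma, n, alpha):
    """Power of an upper-tailed one-sample z test (normal sampling distribution).

    effect: assumed true difference between the HA mean and the H0 mean
    sigma:  population standard deviation
    n:      sample size
    alpha:  significance level
    """
    sigma_m = sigma / sqrt(n)                 # standard error of the mean
    z_crit = NormalDist().inv_cdf(1 - alpha)  # critical value in z units
    delta = effect / sigma_m                  # shift of the HA distribution in standard errors
    return 1 - NormalDist().cdf(z_crit - delta)


# Hypothetical baseline, then one factor changed at a time.
baseline = dict(effect=5.0, sigma=10.0, n=25, alpha=0.05)
base_power = power_upper_tail(**baseline)

larger_alpha = power_upper_tail(**{**baseline, "alpha": 0.10})
less_variability = power_upper_tail(**{**baseline, "sigma": 8.0})
larger_sample = power_upper_tail(**{**baseline, "n": 100})
larger_effect = power_upper_tail(**{**baseline, "effect": 8.0})

for name, value in [
    ("baseline", base_power),
    ("larger alpha", larger_alpha),
    ("less variability", less_variability),
    ("larger sample", larger_sample),
    ("larger effect", larger_effect),
]:
    print(f"{name:>17}: power = {value:.3f}")
```

Each change raises power relative to the baseline, but only the first one does so at the cost of a higher Type 1 error rate.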
Factors that Influence Power: α, the significance level (the probability of making a Type 1 error)
Parametric Statistical Tests: Parametric statistical tests, those that test hypotheses about specific population parameters, are generally more powerful than the corresponding non-parametric tests. Therefore, parametric tests are preferred to non-parametric tests when possible.
Variability: Measure more accurately. Design a better experiment. Standardize procedures for acquiring data. Use a dependent-samples design.
Directional Alternative Hypothesis: A directional HA specifies which tail of the distribution is of interest (e.g., HA is specified as greater than or less than some value, rather than “different from” or ≠).
Increasing Sample Size (n): σM, the standard error of the mean, decreases with increases in sample size.
Increasing Sample Size: n = 25, σM = 2.0; n = 100, σM = 1.0; n = 400, σM = 0.5
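The three values on this slide are consistent with the usual formula for the standard error of the mean, σM = σ/√n, with a population standard deviation of σ = 10. That σ is inferred from the slide's numbers (2.0 × √25 = 10); it is not stated in the slides. A short check:

```python
from math import sqrt

# sigma = 10 is inferred from the slide's values, not stated in the deck.
sigma = 10.0


def standard_error(sigma, n):
    """Standard error of the mean for population SD sigma and sample size n."""
    return sigma / sqrt(n)


for n in (25, 100, 400):
    print(f"n = {n:>3}: standard error = {standard_error(sigma, n):.1f}")
```

Quadrupling the sample size halves the standard error, which narrows both sampling distributions and so increases power.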
Effect Size: Effect size is directly related to power.
Effect Size: Effect size is a measure of the magnitude of the effect of the intervention being studied. The effect is related to the magnitude of the difference between a hypothesized mean (what we might think it is given the intervention) and the population mean (μ).
Cohen’s d: .2 = small effect, .5 = moderate effect, .8 = large effect. For each statistical test, separate formulae are needed to determine d, but when you do this, results are directly comparable regardless of the test used.
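As one example of a test-specific formula, d for two independent groups is commonly computed as the difference between the group means divided by the pooled standard deviation. The sketch below uses that common definition; the slides do not say which formulae they have in mind, and the scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Made-up scores, for illustration only.
scores_a = [2, 4, 6, 8, 10]
scores_b = [1, 3, 5, 7, 9]

d = cohens_d(scores_a, scores_b)
print(f"d = {d:.3f}")
```

Because d is expressed in standard deviation units, it can be read against the .2 / .5 / .8 benchmarks whatever the original measurement scale was.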
Implications of Effect Size: A study was conducted by Dr. Johnson on productivity in the workplace. He compared Method A with Method B. Using an n = 80, Johnson found that A was better than B at p < .05 (he rejected the null that A and B were identical, and accepted the directional alternative that A was better).
Implications (cont.): Dr. Sockloff, who invented Method B, disputed these claims and repeated the study. Using an n = 20, Sockloff found no difference between A and B at p > .30 (he did not reject the null that A and B were equal).
How can this be? In both cases the effect size was determined to be .5 (the effectiveness of Method A was identical in both studies). However, Johnson could detect an effect because he had the POWER. Sockloff had very low power and did not detect an effect (he had a low probability of rejecting an incorrect null).
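The gap between the two studies can be illustrated with an approximate power calculation. The slides do not describe the design, so this sketch makes several assumptions: two independent groups, n taken as the size of each group, a one-tailed test at α = .05 (matching Johnson's directional alternative), and the normal approximation δ = d × √(n/2). The exact figures would change under a different design.

```python
from math import sqrt
from statistics import NormalDist


def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a one-tailed test comparing two independent groups.

    Uses the normal approximation delta = d * sqrt(n_per_group / 2).
    The design and the meaning of n are assumptions, not taken from the slides.
    """
    delta = d * sqrt(n_per_group / 2)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - delta)


# Same effect size in both studies, different sample sizes.
power_johnson = power_two_groups(0.5, 80)
power_sockloff = power_two_groups(0.5, 20)

print(f"Johnson  (n = 80): power = {power_johnson:.2f}")
print(f"Sockloff (n = 20): power = {power_sockloff:.2f}")
```

Under these assumptions Johnson's study has power above .90, while Sockloff's is below .50, so Sockloff was more likely than not to miss an effect of the same size.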
Power and Effect Size: A desirable level of power is .80 (Cohen, 1965). Thus, β = .20. And, by setting an effect size (the magnitude of the smallest discrepancy that, if it exists, we would be reasonably sure of detecting), we can find an appropriate n (sample size).
Method for Determining Sample Size (n): A priori, or before the study
–Directional or non-directional?
–Set the significance level, α
–What level of power do we want?
–Use table B to look up δ (“delta”)
–Determine the effect size d and use: n = (δ/d)²
Example of Power Analysis: α = .05; 1 – β = .80; look up in table B: δ = 2.5; d = .5 (moderate effect); n = (δ/d)² = (2.5/.5)² = 25. So, in order to detect a moderate effect (.5) with power of .80 and α of .05, we need 25 subjects in our study.
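The worked example can be reproduced in a few lines. Table B itself is not available here, so the sketch assumes that δ is the sum of the z value cutting off α and the z value corresponding to the desired power under the normal approximation; for a directional test at α = .05 and power .80 this gives about 2.49, close to the tabled 2.5. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def delta_for(alpha, power, directional=True):
    """Approximate delta from the normal distribution (assumed to match table B)."""
    tail_alpha = alpha if directional else alpha / 2
    z = NormalDist()
    return z.inv_cdf(1 - tail_alpha) + z.inv_cdf(power)


def required_n(delta, d):
    """Sample size from n = (delta / d) ** 2, rounded up to a whole subject."""
    return ceil((delta / d) ** 2)


# Using the tabled value from the example.
n_from_table = required_n(2.5, 0.5)

# Using the normal approximation instead of the table.
delta_approx = delta_for(alpha=0.05, power=0.80)
n_from_normal = required_n(delta_approx, 0.5)

print(f"table delta = 2.5   -> n = {n_from_table}")
print(f"approx delta = {delta_approx:.2f} -> n = {n_from_normal}")
```

Calling `delta_for(0.05, 0.80, directional=False)` gives a larger δ and therefore a larger required n, which is why the directional/non-directional choice comes first in the method.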
***Main Point*** (impress your Research Methods prof): Good experimental design always utilizes power and effect size analyses prior to conducting the study.
Inductive Leap: The result of a significance test, which determines the probability of obtaining a particular result assuming the null is true (the p level), can be expressed as a measure of effect size times a measure of the size of the sample: significance test = effect size × size of study. Therefore, p is influenced by both the size of the effect and the size of the study: a larger effect or a larger study gives a smaller p. Remember, if we want to reject the null, we want a small p (less than alpha).
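One way to make the "effect size × size of study" relation concrete is the one-sample z test, where the test statistic is z = d × √n and the p level follows from z. This is a sketch under that assumption (one-sample, one-tailed, normal sampling distribution); the slides do not name a specific test, and the effect size of .5 is reused from the earlier example.

```python
from math import sqrt
from statistics import NormalDist


def z_and_p(d, n):
    """Test statistic and one-tailed p level for a one-sample z test."""
    z = d * sqrt(n)              # effect size x size of study
    p = 1 - NormalDist().cdf(z)  # probability of a z this large or larger if H0 is true
    return z, p


# Same effect size, increasing study size.
for n in (4, 9, 16, 25):
    z, p = z_and_p(0.5, n)
    print(f"d = 0.5, n = {n:>2}: z = {z:.2f}, p = {p:.4f}")
```

With the effect size held fixed, p shrinks as n grows, which is why a non-significant result from a small study says little about whether an effect exists.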