1
Power Winnifred Louis 15 July 2009
2
Overview of Workshop
- Review of the concept of power
- Review of antecedents of power
- Review of power analyses and effect size calculations
- DL and discussion of write-up guide
- Intro to G*Power 3
- Examples of G*Power 3 usage
3
Power
Power analysis addresses a limitation of the null hypothesis testing approach and its concern with decision errors. Recall:
- Significant differences are defined with reference to a criterion: a controlled/acceptable rate for committing Type I errors, typically .05
- A Type I error is finding a significant difference in the sample when it doesn't actually exist in the population
- The Type I error rate is denoted α
- However, relatively little attention has been paid to the Type II error
- A Type II error is finding no significant difference in the sample when there is a difference in the population
- The Type II error rate is denoted β
4
Reality vs Statistical Decisions: retaining H0 when H0 is true is a hit (correct decision), with probability 1 - α
5
Reality vs Statistical Decisions: rejecting H0 when H0 is true is a "false alarm" (Type I error), with probability α
6
Reality vs Statistical Decisions: retaining H0 when H1 is true is a "miss" (Type II error), with probability β
7
Reality vs Statistical Decisions: rejecting H0 when H1 is true is a hit (correct decision), with probability 1 - β, i.e. POWER
8
Reality vs Statistical Decisions (full table)

Decision \ Reality    H0 true                            H1 true
Reject H0             "False alarm": α (Type I error)    Hit (correct decision): 1 - β = POWER
Retain H0             Hit (correct decision): 1 - α      "Miss": β (Type II error)
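The α cell of this table can be checked by simulation: when H0 is true, a two-tailed test at the .05 level should "false alarm" about 5% of the time. A minimal sketch, using an illustrative z-test with known σ (not part of the original slides):

```python
import random
import math

# Simulate the "false alarm" (Type I) rate: draw many pairs of samples
# from the SAME population (H0 true) and count how often a two-tailed
# z-test at alpha = .05 rejects H0 anyway.
random.seed(1)

def z_test_rejects(n=30, z_crit=1.96):
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)  # known sigma = 1
    return abs(z) > z_crit

trials = 4000
type1_rate = sum(z_test_rejects() for _ in range(trials)) / trials
# type1_rate hovers near .05, the nominal alpha
```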
9
power is:
- the probability of correctly rejecting a false null hypothesis
- the probability that the study will yield significant results if the research hypothesis is true
- the probability of correctly identifying a true alternative hypothesis
10
sampling distributions
- the distribution of a statistic that we would expect if we drew an infinite number of samples (of a given size) from the population
- sampling distributions have means and SDs
- we can have a sampling distribution for any statistic, but the most common is the sampling distribution of the mean
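The sampling distribution of the mean can be demonstrated by simulation; a sketch with made-up population values (mean 50, SD 10), checking that the SD of the sample means matches σ/√n:

```python
import random
import statistics
import math

# Draw many samples of size n from a population with sd = 10 and check
# that the SD of the sample means (the standard error) is close to the
# theoretical value sigma / sqrt(n).
random.seed(2)
sigma, n, draws = 10, 25, 3000
sample_means = [statistics.mean(random.gauss(50, sigma) for _ in range(n))
                for _ in range(draws)]
empirical_se = statistics.stdev(sample_means)
theoretical_se = sigma / math.sqrt(n)  # 10 / 5 = 2.0
```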
11
H0: μ1 = μ2; α/2 = .025 in each tail
Recall: estimating population means from sample means. Here the null hypothesis is true, so if our sample difference between means falls into the shaded areas, we reject the null hypothesis. But 5% of the time we will do so incorrectly (Type I error).
12
H0: μ1 = μ2; H1: μ1 ≠ μ2; α/2 = .025 in each tail. Here the null hypothesis is false (μ1 ≠ μ2).
13
H0: μ1 = μ2; H1: μ1 ≠ μ2; α/2 = .025. To the right of the critical line we reject the null hypothesis (to the left, we don't reject H0). POWER: 1 - β.
14
H0: μ1 = μ2; H1: μ1 ≠ μ2
- Correct decision: rejection of H0, probability 1 - β (POWER)
- Type I error (α)
- Type II error (β)
- Correct decision: acceptance of H0, probability 1 - α
15
factors that influence power
1. α level
- remember the α level defines the probability of making a Type I error
- the α level is typically .05, but it might change depending on how worried the experimenter is about Type I and Type II errors
- the bigger the α, the more powerful the test (but the greater the risk of erroneously saying there's an effect when there's not: a Type I error), e.g., using a one-tailed test
16
factors that influence power: α level. H0: μ1 = μ2; α/2 = .025 (Type I error).
17
factors that influence power: α level. H0: μ1 = μ2; H1: μ1 ≠ μ2; α/2 = .025. POWER region shown.
18
factors that influence power: α level. H0: μ1 = μ2; H1: μ1 ≠ μ2. With α = .05 (rather than α/2 = .025 per tail) the rejection region grows, increasing power.
19
factors that influence power
2. the size of the effect (d)
- the effect size is not something the experimenter can (usually) control: it represents how big the effect is in reality (the size of the relationship between the IV and the DV)
- it is independent of N (it is a population-level quantity)
- it stands to reason that with big effects you're going to have more power than with small, subtle effects
20
factors that influence power: d. H0: μ1 = μ2; H1: μ1 ≠ μ2; α/2 = .025.
21
factors that influence power: d (continued). H0: μ1 = μ2; H1: μ1 ≠ μ2; α/2 = .025.
22
factors that influence power
3. sample size (N)
- the bigger your sample size, the more power you have
- a large sample size allows small effects to emerge; big samples act as a magnifying glass that detects small effects
23
factors that influence power
3. sample size (N), continued
- you can see this when you look closely at the formulas
- the standard error of the mean (σ/√N) tells us how much, on average, we'd expect a sample mean to differ from the population mean just by chance
- the bigger the N, the smaller the standard error, and smaller standard errors mean bigger z scores
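The N-and-standard-error relationship in code (σ = 15 is an arbitrary illustrative value, not from the slides):

```python
import math

# The standard error of the mean, sigma / sqrt(N), shrinks as N grows,
# which is why larger samples give bigger test statistics and more power.
sigma = 15
se_values = {n: sigma / math.sqrt(n) for n in (25, 100, 400)}
# quadrupling N halves the standard error each time
```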
24
factors that influence power
4. smaller variance of scores in the population (σ²)
- small standard errors lead to more power; N is one thing that affects your standard error
- the other is the variance of the population (σ²): the smaller the variance (spread) of scores, the smaller your standard error is going to be
25
factors that influence power: N & σ². H0: μ1 = μ2; H1: μ1 ≠ μ2; α/2 = .025.
26
factors that influence power: N & σ² (continued). H0: μ1 = μ2; H1: μ1 ≠ μ2; α/2 = .025.
27
outcomes of interest
- power determination
- N determination
- α, effect size, N, and power are related
28
Effect sizes
Measures of group differences:
- Cohen's d (t-test)
- Cohen's f (ANOVA)
Measures of association:
- Partial eta-squared (ηp²)
- Eta-squared (η²)
- Omega-squared (ω²)
- R-squared (R²)
Cohen's classic 1988 text is in the library.
29
Measures of difference - d
- when there are only two groups, d is the standardised difference between the two groups
- to calculate an effect size (d) you calculate the difference you expect to find between the means and divide it by the expected standard deviation of the population
- conceptually, this tells us how many SDs apart we expect the populations (null and alternative) to be
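A minimal d calculator following this definition, using the pooled-SD version for two samples (all numbers illustrative):

```python
# Cohen's d for two independent groups: the expected mean difference
# divided by the pooled SD.
def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    # pooled SD weights each group's variance by its degrees of freedom
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / pooled_var**0.5

# illustrative numbers: means 5 apart, SD 10 in each group of n = 25
d = cohens_d(10, 5, 10, 10, 25, 25)  # 0.5, a medium effect
```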
30
Cohen's conventions for d: small = 0.2, medium = 0.5, large = 0.8
31
H0: μ1 = μ2; H1: μ1 ≠ μ2. Overlap of the distributions at small, medium, and large effect sizes.
32
Measures of association - Eta-Squared
- Eta-squared (η²) is the proportion of the total variance in the DV that is attributable to an effect
- Partial eta-squared (ηp²) is the proportion of the leftover variance in the DV (after all other IVs are accounted for) that is attributable to the effect
- partial η² is what SPSS gives you, but it is dodgy (it overestimates the effect)
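The two definitions side by side; the sums of squares here are hypothetical, chosen to show that the partial version comes out larger, as the slide warns:

```python
# Eta-squared vs partial eta-squared from sums of squares.
def eta_squared(ss_effect, ss_total):
    return ss_effect / ss_total

def partial_eta_squared(ss_effect, ss_error):
    return ss_effect / (ss_effect + ss_error)

# hypothetical ANOVA: SS_effect = 40, SS_error = 100, SS_total = 200
eta2 = eta_squared(40, 200)            # 0.20 of ALL the variance
p_eta2 = partial_eta_squared(40, 100)  # ~0.29 of the leftover variance
```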
33
Measures of association - Omega-squared
Omega-squared (ω²) is an estimate of the dependent-variable population variance accounted for by the independent variable. For a one-way between-groups design:

ω² = (SS_effect - df_effect × MS_error) / (SS_total + MS_error)

where p = number of levels of the treatment variable, F = the F value, and n = the number of participants per treatment level.
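The formula in code (the ANOVA table values are hypothetical):

```python
# Omega-squared from ANOVA output, following the formula above.
def omega_squared(ss_effect, df_effect, ms_error, ss_total):
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

# hypothetical one-way ANOVA table
w2 = omega_squared(ss_effect=100, df_effect=2, ms_error=5, ss_total=400)
# (100 - 2*5) / (400 + 5) = 90 / 405
```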
34
Measures of difference - f
Cohen's (1988) f for the one-way between-groups analysis of variance can be calculated from the proportion of variance accounted for (using ω², or η² instead):

f = √(ω² / (1 - ω²))

It is an averaged standardised difference between the 3 or more levels of the IV (even though the formula doesn't look like that). Conventions: small effect f = 0.10; medium effect f = 0.25; large effect f = 0.40.
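The conversion in code; the input value .059 is the proportion of variance conventionally equivalent to a medium f of .25:

```python
import math

# Cohen's f from omega-squared (or eta-squared): the standard
# conversion f = sqrt(prop_var / (1 - prop_var)).
def cohens_f(prop_var):
    return math.sqrt(prop_var / (1 - prop_var))

f = cohens_f(0.059)  # a medium effect, f close to .25
```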
35
Measures of association - R-Squared
- R² is the proportion of variance explained by the model
- it can be converted to the effect size f²: f² = R² / (1 - R²)
- conventions: small effect f² = 0.02; medium effect f² = 0.15; large effect f² = 0.35
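Both directions of the conversion (R² = .22 is used purely as an example value):

```python
# f-squared from R-squared, and back again, per the formulas above.
def f2_from_r2(r2):
    return r2 / (1 - r2)

def r2_from_f2(f2):
    return f2 / (1 + f2)

f2 = f2_from_r2(0.22)  # .22 / .78, roughly .28
```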
36
Summary of effect conventions: see the G*Power manual, http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/user_manual/user_manual_02.html#input_val
37
estimating effect size
- prior literature
- assessment of how great a difference is important (e.g., an effect on reading ability might only be worth the trouble if it increases reading by at least half a SD)
- special conventions
38
side issues…
- recall the logic of calculating estimates of effect size (i.e., the criticisms of significance testing): the tradition of significance testing is based upon an arbitrary rule leading to a yes/no decision
- power illustrates further some of the caveats of significance testing: with a high N you will have enough power to detect a very small effect, and if you cannot keep error variance low a large effect may still be non-significant
39
side issues…
- on the other hand, sometimes very small effects are important
- by employing strategies to increase power you have a better chance of detecting these small effects
40
power: common constraints
- cell size too small, because the sample is difficult to recruit or there is too little time/money
- small effects are often a focus of theoretical interest (especially in social/clinical/org research): the DV is subject to multiple influences, so each IV has a small impact, and "error" or residual variance is large because many IVs unmeasured in the experiment/survey are influencing the DV
- interactions are of interest, and interactions draw on smaller cell sizes (and thus lower power) than tests of main effects [cell means for an interaction are based on n observations, while main-effect means are based on n × the number of levels of the other factors collapsed across]
41
determining power
- sometimes, for practical reasons, it's useful to calculate the power of your experiment before conducting it
- if the power is very low, then there's no point in conducting the experiment
- basically, you want to make sure you have a reasonable shot at getting an effect (if one exists!), which is why grant reviewers want power analyses
42
Post hoc power calculations
- generally useless / difficult to interpret from the point of view of statistics
- but mandated within some fields
- examples of post hoc power write-ups are online at http://www.psy.uq.edu.au/~wlouis
43
G*POWER
G*Power is a FREE program that makes the calculations a lot easier: http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175-191.
G*Power computes:
- power values for given sample sizes, effect sizes, and alpha levels (post hoc power analyses)
- sample sizes for given effect sizes, alpha levels, and power values (a priori power analyses)
- and is suitable for most fundamental statistical methods
Note: some tests assume equal variance across groups, and G*Power assumes you are entering population SDs (which are likely to be estimated from samples).
44
Ok, let's do it: BS t-test
- two random samples of n = 25
- expected difference between means of 5: μ1 = 5, μ2 = 10
- σ = 10
- two-tailed test, α = .05
45
G*POWER
46
determining N
- with that expected effect size and n we get power ≈ .41: a probability of correctly rejecting the null hypothesis (if it is false) 41% of the time
- is this good enough? convention dictates that researchers should enter an experiment with no less than an 80% chance of detecting an effect (presuming it exists), i.e., power of at least .80
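That ~.41 figure can be roughly cross-checked in code. This sketch uses a normal approximation, not the exact noncentral-t calculation that G*Power performs, so it lands near but not exactly on G*Power's value:

```python
import math

# Approximate post hoc power for a two-group t-test, normal approximation.
def approx_power(d, n_per_group, z_crit=1.959964):
    # noncentrality on the z scale for a two-sample design
    ncp = d * math.sqrt(n_per_group / 2)
    # probability of exceeding the upper critical value (lower tail
    # contributes negligibly for positive effects)
    return 0.5 * (1 + math.erf((ncp - z_crit) / math.sqrt(2)))

power = approx_power(d=0.5, n_per_group=25)  # close to the ~.41 from G*Power
```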
47
Determining n
- calculate the effect size
- use power of .80 (the convention)
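A sketch of the a priori calculation via a normal approximation, n per group ≈ 2((z_α + z_β)/d)². This is not G*Power's exact noncentral-t method, which for d = .5 and power .80 gives 64 per group; the approximation lands one lower:

```python
import math

# A priori n per group for a two-group, two-tailed t-test (approximate).
def n_per_group(d, z_alpha=1.959964, z_beta=0.841621):
    # z_alpha: two-tailed .05 critical value; z_beta: z for power .80
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

n = n_per_group(d=0.5)  # 63 by this approximation; G*Power says 64
```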
48
WS t-test
- within-subjects designs are more powerful than between-subjects designs (they control for individual differences)
- the WS t-test is not very difficult in G*Power, but becomes trickier in ANOVA
- you need to know the correlation between timepoints (luckily SPSS's paired t-test output gives this)
- or you can use the mean and SD of the difference scores (also in the SPSS output)
49
Method 1: difference scores
50
dz = mean difference / SD of differences = .0167 / .0718 = .233
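The same arithmetic in code, with the slide's numbers, plus the alternative route to the SD of the differences from the two time-point SDs and their correlation r (the route mentioned on the previous slide):

```python
import math

# d_z for a paired (within-subjects) t-test: mean of the difference
# scores over their SD.
def dz(mean_diff, sd_diff):
    return mean_diff / sd_diff

# SD of difference scores built from the two time-point SDs and their
# correlation; higher r shrinks SD_diff and so raises d_z.
def sd_of_differences(sd1, sd2, r):
    return math.sqrt(sd1**2 + sd2**2 - 2 * r * sd1 * sd2)

d_z = dz(0.0167, 0.0718)  # the slide's numbers, about .233
```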
51
53
WS t-test
- I said before that WS designs are more powerful than the equivalent BS version
- let's test this by entering the same means and SDs into the Independent Samples t-test calculator in G*Power
54
55
Between subjects: power = .18. Within subjects: power = .07.
56
Extension to 1-way ANOVA…
- in PSYC3010 you used phi prime as the ANOVA equivalent of d, which is the same as Cohen's f; G*Power uses Cohen's f
- numerous methods:
  1) calculate ω² and then use the formula for f and enter it directly
  2) calculate ω² or η² and enter it into "Direct" under "Effect size from variances"
  3) use the means and "Effect size from means"
57
Calculating omega & f: worked example applying the formulas above (shown as a figure in the original slide)
59
Not sure if this works with SPSS's partial eta-squared; have had problems before, and omega-squared is more conservative anyway.
60
Alternatively, if you have means (note: this is a different data set):

Group         Mean DV score   n
Coffee        63.75           16
Energy Drink  64.69           16
Water         46.56           16

MS_error = 125.21, grand mean = 58.33, N = 48
Use the square root of MS_error as the "SD within each group" entry in G*Power.
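This is the "effect size from means" route; as a sketch, Cohen's f can be computed directly from these cell means as f = σ_means/σ_within, taking √MS_error as the within-group SD as the slide suggests:

```python
import math

# Cohen's f from cell means (equal cell sizes), using the slide's
# caffeine data set.
means = [63.75, 64.69, 46.56]
grand = sum(means) / len(means)  # equal n, so a simple average: ~58.33
sigma_m = math.sqrt(sum((m - grand) ** 2 for m in means) / len(means))
sigma_w = math.sqrt(125.21)  # sqrt(MS_error), about 11.19
f = sigma_m / sigma_w        # well past Cohen's "large" cutoff of .40
```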
61
62
how about 2-way factorial ANOVA?
Need to test 3 effects to estimate the power:
- main effect of IV 1
- main effect of IV 2
- interaction effect (usually less power than the main effects due to smaller n per cell)
See http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/reference/reference_manual_07.html
63
Within-subjects ANOVA
- you not only need to know the effect size but also the correlation across time/variables
- use a convention for estimating effect size (G*Power uses either lambda or Cohen's f)
- calculate f² using the number of levels, the effect convention, and the correlation (e.g., test-retest)
- calculate lambda (f² × N)
- use the Generic F test
64
Within-subjects example
- 3 levels over time (m), 64 participants (n)
- look for a small effect (f² = .01)
- test-retest correlation = .79 (p)
- adjusted f² = (m × f²) / (1 − p) = (3 × .01) / (1 − .79) = .143
- lambda = f² × n = .143 × 64 = 9.152
- df1 = m − 1 = 2
- df2 = n × (m − 1) = 128
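The slide's arithmetic in code (the slide's 9.152 reflects rounding the adjusted f² to .143 before multiplying):

```python
# Noncentrality parameter (lambda) for repeated-measures power via
# G*Power's generic F test, using the slide's numbers.
def rm_noncentrality(m, f2, p, n):
    # inflate the base f^2 by m / (1 - p) for the RM design
    f2_adj = (m * f2) / (1 - p)
    return f2_adj * n

lam = rm_noncentrality(m=3, f2=0.01, p=0.79, n=64)  # about 9.14
df1 = 3 - 1         # m - 1 = 2
df2 = 64 * (3 - 1)  # n * (m - 1) = 128
```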
65
Note: this route can't be run a priori. If you need to estimate sample size up front, play with the denominator df (which is based on N).
66
Within-subjects example, continued
Refer to Karl Wuensch's website for more details on repeated measures: http://core.ecu.edu/psyc/wuenschk/StatsLessons.htm
And the G*Power manuals online, e.g.: http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/user-guide-type_of_power_analysis
67
Regression analyses
- the effect size associated with R²: f² = R² / (1 − R²)
- for a semipartial (one predictor's unique contribution): f² = sr² / (1 − R²_full)
- conventions: f² = .02 (small); f² = .15 (medium); f² = .35 (large)
- convert back to variance accounted for: f² / (1 + f²)
68
R² example
- 3 predictor variables, R² for the full model = .22
- f² = .22 / (1 − .22) = .282
- N = 110
70
Change in R² (hierarchical multiple regression)
- 2 steps: 2 predictors in step 1, 3 more in step 2
- R² for the full model = .10; change in R² for step 2 = .04
- f² = R²_change / (1 − R²_full) = .04 / (1 − .10) = .0444
- N = 95; numerator df for step 2 = 3
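The same calculation in code, with the slide's numbers:

```python
# f-squared for the R-squared change at step 2 of a hierarchical
# regression: the increment over what the FULL model leaves unexplained.
def f2_change(r2_change, r2_full):
    return r2_change / (1 - r2_full)

f2 = f2_change(r2_change=0.04, r2_full=0.10)  # .04 / .90, about .0444
```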
72
Complex analyses
- G*Power is useful for basic analyses
- for complex analyses (e.g., SEM, MLM), researchers usually look to Monte Carlo studies
73
Additional Resources
http://www.danielsoper.com/statcalc/ : some other statistical calculators, including for power