
1 Power Winnifred Louis 15 July 2009

2 Overview of Workshop
- Review of the concept of power
- Review of antecedents of power
- Review of power analyses and effect size calculations
- DL and discussion of write-up guide
- Intro to G*Power3
- Examples of G*Power3 usage

3 Power
- Comes down to a "limitation" of the null hypothesis testing approach and a concern with decision errors
- Recall: significant differences are defined with reference to a criterion (a controlled/acceptable rate) for committing Type I errors, typically .05
- the Type I error: finding a significant difference in the sample when it actually doesn't exist in the population; its rate is denoted α
- However, relatively little attention has been paid to the Type II error: finding no significant difference in the sample when there is a difference in the population; its rate is denoted β

4 Reality vs Statistical Decisions
Retain H0 when H0 is true: hit (correct decision), probability 1 − α

5 Reality vs Statistical Decisions
Reject H0 when H0 is true: "false alarm", probability α (aka Type I error)

6 Reality vs Statistical Decisions
Retain H0 when H1 is true: "miss", probability β (aka Type II error)

7 Reality vs Statistical Decisions
Reject H0 when H1 is true: hit (correct decision), probability 1 − β = power

8 Reality vs Statistical Decisions (full table)

                Reality: H0                       Reality: H1
Reject H0       "false alarm" α (Type I error)    hit (correct) 1 − β = power
Retain H0       hit (correct) 1 − α               "miss" β (Type II error)

9 Power
- power is: the probability of correctly rejecting a false null hypothesis
- the probability that the study will yield significant results if the research hypothesis is true
- the probability of correctly identifying a true alternative hypothesis

10 Sampling distributions
- the distribution of a statistic that we would expect if we drew an infinite number of samples (of a given size) from the population
- sampling distributions have means and SDs
- you can have a sampling distribution for any statistic, but the most common is the sampling distribution of the mean
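The idea of a sampling distribution can be made concrete with a quick simulation (a sketch; the population mean and SD here are arbitrary, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(42)
pop_mean, pop_sd, n, reps = 100, 15, 25, 10_000

# Draw many samples of size n and record each sample's mean
sample_means = rng.normal(pop_mean, pop_sd, size=(reps, n)).mean(axis=1)

# The mean of the sample means approximates the population mean,
# and their SD approximates the standard error: sigma / sqrt(n) = 15 / 5 = 3
print(sample_means.mean())
print(sample_means.std(ddof=1))
```

The SD of this simulated distribution is the standard error the later slides rely on.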

11 Recall: estimating population means from sample means
H0: μ1 = μ2; α/2 = .025 in each tail
Here the null hypothesis is true, so if our sample difference between means falls into the shaded areas, we reject the null hypothesis. But 5% of the time we will do so incorrectly (Type I error).

12 H0: μ1 = μ2; H1: μ1 ≠ μ2; α/2 = .025 in each tail
Here the null hypothesis is false. [Figure: two overlapping sampling distributions, centred on μ1 and μ2]

13 H0: μ1 = μ2; H1: μ1 ≠ μ2; α/2 = .025 in each tail
To the right of this line we reject the null hypothesis. POWER: 1 − β. [Figure: "Reject H0" and "Don't reject H0" regions]

14 H0: μ1 = μ2 vs H1: μ1 ≠ μ2
- Reject H0 when H1 is true: correct decision, 1 − β (POWER)
- Reject H0 when H0 is true: Type I error (α)
- Retain H0 when H1 is true: Type II error (β)
- Retain H0 when H0 is true: correct decision, 1 − α

15 Factors that influence power: 1. α level
- remember the α level defines the probability of making a Type I error
- the α level is typically .05, but it might change depending on how worried the experimenter is about Type I and Type II errors
- the bigger the α, the more powerful the test (but the greater the risk of erroneously saying there's an effect when there's not... Type I error)
- e.g., use a one-tailed test

16 Factors that influence power: α level
H0: μ1 = μ2; α = .025 per tail (Type I error)

17 Factors that influence power: α level
H0: μ1 = μ2; H1: μ1 ≠ μ2; α = .025 per tail. [Figure highlights the POWER region]

18 Factors that influence power: α level
H0: μ1 = μ2; H1: μ1 ≠ μ2. [Figure: the same distributions with α = .05 (one-tailed) rather than .025 per tail; the larger rejection region increases power]

19 Factors that influence power: 2. the size of the effect (d)
- the effect size is not something the experimenter can (usually) control; it represents how big the effect is in reality (the size of the relationship between the IV and the DV)
- independent of N (it is a population-level quantity)
- it stands to reason that with big effects you're going to have more power than with small, subtle effects

20 Factors that influence power: d
H0: μ1 = μ2; H1: μ1 ≠ μ2; α = .025 per tail. [Figure: overlapping H0 and H1 sampling distributions for one value of d]

21 Factors that influence power: d
H0: μ1 = μ2; H1: μ1 ≠ μ2; α = .025 per tail. [Figure: the same distributions for a different value of d; a bigger d means less overlap and more power]

22 Factors that influence power: 3. sample size (N)
- the bigger your sample size, the more power you have
- a large sample size allows small effects to emerge, or... big samples can act as a magnifying glass that detects small effects

23 3. sample size (N)
- you can see this when you look closely at the formulas
- the standard error of the mean (σ/√N) tells us how much, on average, we'd expect a sample mean to differ from the population mean just by chance
- the bigger the N, the smaller the standard error, and... smaller standard errors = bigger z scores
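A minimal numeric illustration of this point, borrowing the σ = 10 and mean difference of 5 used in the t-test example later in the deck:

```python
import math

sigma, mu_diff = 10.0, 5.0  # population SD and raw mean difference (from slide 44)

for n in (25, 100, 400):
    se = sigma / math.sqrt(n)   # standard error shrinks as N grows
    z = mu_diff / se            # ...so the z score for the same raw difference grows
    print(f"N={n:4d}  SE={se:5.2f}  z={z:5.2f}")
```

Quadrupling N halves the standard error, which doubles z for the same raw effect.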

24 Factors that influence power: 4. smaller variance of scores in the population (σ²)
- small standard errors lead to more power; N is one thing that affects your standard error
- the other thing is the variance of the population (σ²)
- basically, the smaller the variance (spread) in scores, the smaller your standard error is going to be

25 Factors that influence power: N and σ²
H0: μ1 = μ2; H1: μ1 ≠ μ2; α = .025 per tail. [Figure: H0 and H1 sampling distributions for one value of N and σ²]

26 Factors that influence power: N and σ²
H0: μ1 = μ2; H1: μ1 ≠ μ2; α = .025 per tail. [Figure: with larger N or smaller σ² the sampling distributions narrow, increasing power]

27 Outcomes of interest
- power determination
- N determination
- α, effect size, N, and power are related

28 Effect sizes
- Measures of group differences: Cohen's d (t-test); Cohen's f (ANOVA)
- Measures of association: partial eta-squared (ηp²); eta-squared (η²); omega-squared (ω²); R-squared (R²)
- Cohen's classic 1988 text is in the library

29 Measures of difference: d
- when there are only two groups, d is the standardised difference between the two groups
- to calculate an effect size (d) you need to calculate the difference you expect to find between means and divide it by the expected standard deviation of the population
- conceptually, this tells us how many SDs apart we expect the populations (null and alternative) to be
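The calculation described above is a one-liner; a sketch, reusing the means and SD from the deck's later t-test example:

```python
def cohens_d(mean1, mean2, sd):
    """Standardised difference between two group means."""
    return (mean1 - mean2) / sd

# Slide 44's example: means of 10 and 5, population SD of 10
print(cohens_d(10, 5, 10))  # 0.5
```

A d of 0.5 means the two population means sit half a standard deviation apart.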

30 Cohen's conventions for d: small = 0.2, medium = 0.5, large = 0.8

31 H0: μ1 = μ2 vs H1: μ1 ≠ μ2: overlap of distributions. [Figure: small, medium, and large overlap between the null and alternative distributions]

32 Measures of association: eta-squared
- eta-squared is the proportion of the total variance in the DV that is attributed to an effect
- partial eta-squared is the proportion of the leftover variance in the DV (after all other IVs are accounted for) that is attributable to the effect
- this is what SPSS gives you, but it is dodgy (it overestimates the effect)

33 Measures of association: omega-squared
- omega-squared is an estimate of the dependent-variable population variability accounted for by the independent variable
- for a one-way between-groups design:
- ω² = (SS_effect − df_effect × MS_error) / (SS_total + MS_error)
- p = number of levels of the treatment variable, F = the F value, and n = the number of participants per treatment level
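The slide's formula as a small function (the SS and MS values below are made up for illustration, not from the deck):

```python
def omega_squared(ss_effect, df_effect, ms_error, ss_total):
    """Omega-squared for a one-way between-groups ANOVA."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

# Hypothetical one-way ANOVA output: SS_effect = 208, df = 2, MS_error = 12.5, SS_total = 800
print(omega_squared(ss_effect=208.0, df_effect=2, ms_error=12.5, ss_total=800.0))
```

Because it subtracts df × MS_error from the effect sum of squares, ω² comes out smaller (more conservative) than η² for the same data.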

34 Measures of difference: f
- Cohen's (1988) f for the one-way between-groups analysis of variance can be calculated from ω² (or you can use η² instead): f = √(ω² / (1 − ω²))
- it is an averaged standardised difference between the 3 or more levels of the IV (even though the formula doesn't look like that)
- small effect: f = 0.10; medium effect: f = 0.25; large effect: f = 0.40
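The standard conversion from a variance-accounted-for measure (ω² or η²) to Cohen's f is f = √(v / (1 − v)); a sketch showing how Cohen's f and f² conventions line up:

```python
import math

def f_from_variance(v):
    """Cohen's f from a variance-accounted-for measure (eta- or omega-squared)."""
    return math.sqrt(v / (1 - v))

print(f_from_variance(0.01))    # ~0.10, small
print(f_from_variance(0.0588))  # ~0.25, medium
print(f_from_variance(0.1379))  # ~0.40, large
```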

35 Measures of association: R-squared
- R² is the proportion of variance explained by the model; in general R² = SS_model / SS_total
- it can be converted to effect size f²: f² = R² / (1 − R²)
- small effect: f² = 0.02; medium effect: f² = 0.15; large effect: f² = 0.35
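The R²-to-f² conversion on this slide, and its inverse, as functions (the example R² of .13 is illustrative, not from the deck):

```python
def f2_from_r2(r2):
    """Cohen's f-squared from R-squared."""
    return r2 / (1 - r2)

def r2_from_f2(f2):
    """Back-convert: proportion of variance accounted for, from f-squared."""
    return f2 / (1 + f2)

print(f2_from_r2(0.13))    # ~0.15, a medium effect by the slide's conventions
print(r2_from_f2(0.35))    # ~0.26: a 'large' f2 corresponds to about 26% of variance
```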

36 Summary of effect conventions
- from G*Power: http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/user_manual/user_manual_02.html#input_val

37 Estimating effect
- prior literature
- assessment of how great a difference is important, e.g., an effect on reading ability is only worth the trouble if it increases reading by at least half a SD
- special conventions

38 Side issues...
- recall the logic of calculating estimates of effect size (i.e., criticisms of significance testing): the tradition of significance testing is based upon an arbitrary rule leading to a yes/no decision
- power illustrates further some of the caveats with significance testing: with a high N you will have enough power to detect a very small effect; if you cannot keep error variance low, a large effect may still be non-significant

39 Side issues...
- on the other hand... sometimes very small effects are important
- by employing strategies to increase power you have a better chance of detecting these small effects

40 Power: common constraints
- cell size too small, because the sample is difficult to recruit or there is too little time/money
- small effects are often a focus of theoretical interest (especially in social/clinical/org): the DV is subject to multiple influences, so each IV has a small impact; "error" or residual variance is large, because many IVs unmeasured in the experiment/survey are influencing the DV
- interactions are of interest, and interactions draw on smaller cell sizes (and thus lower power) than tests of main effects [cell means for an interaction are based on n observations, while main effects are based on n × the number of levels of the other factors collapsed across]

41 Determining power
- sometimes, for practical reasons, it's useful to try to calculate the power of your experiment before conducting it
- if the power is very low, then there's no point in conducting the experiment; basically, you want to make sure you have a reasonable shot at getting an effect (if one exists!)
- which is why grant reviewers want power analyses

42 Post hoc power calculations
- generally useless / difficult to interpret from the point of view of stats
- mandated within some fields
- examples of post hoc power write-ups online at http://www.psy.uq.edu.au/~wlouis

43 G*Power
- G*Power is a FREE program that can make the calculations a lot easier: http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
- Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175-191.
- G*Power computes: power values for given sample sizes, effect sizes, and alpha levels (post hoc power analyses); sample sizes for given effect sizes, alpha levels, and power values (a priori power analyses)
- suitable for most fundamental statistical methods
- note: some tests assume equal variance across groups and assume you are using population SDs (which are likely to be estimated from the sample)

44 OK, let's do it: BS t-test
- two random samples of n = 25
- expect a difference between means of 5
- two-tailed test, α = .05
- μ1 = 5, μ2 = 10, σ = 10

45 G*POWER

46 Determining N
- so, with that expected effect size and n we get power = ~.41
- we have a probability of correctly rejecting the null hypothesis (if false) 41% of the time
- is this good enough? convention dictates that researchers should enter an experiment with no less than an 80% chance of detecting an effect (presuming it exists), i.e., power of at least .80
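The ~.41 figure can be reproduced without G*Power from the noncentral t distribution (a sketch using scipy rather than the workshop's tool; same inputs as slide 44):

```python
import math
from scipy import stats

# Slide 44's example: n = 25 per group, mu1 = 5, mu2 = 10, sigma = 10
n, alpha = 25, 0.05
d = (10 - 5) / 10                       # Cohen's d = 0.5
df = 2 * n - 2
nc = d * math.sqrt(n / 2)               # noncentrality parameter for a BS t-test
t_crit = stats.t.ppf(1 - alpha / 2, df)

# Power = P(|t| > t_crit) under the noncentral t
power = (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)
print(round(power, 2))  # ~0.41, matching the slide
```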

47 Determine n
- calculate the effect size
- use power of .80 (convention)
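The a priori question (what n do I need?) can be sketched with statsmodels, a third-party library offered here as an alternative check on G*Power:

```python
import math
from statsmodels.stats.power import TTestIndPower

# Solve for n per group: d = 0.5, alpha = .05 two-tailed, power = .80
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(math.ceil(n_per_group))  # 64 per group
```

64 per group for a medium effect at 80% power is the classic benchmark from Cohen's tables.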

48 WS t-test
- within-subjects designs are more powerful than between-subjects (they control for individual differences)
- the WS t-test is not very difficult in G*Power, but becomes trickier in ANOVA
- need to know the correlation between timepoints (luckily the SPSS paired t-test gives this)
- or you can use the mean and SD of the difference scores (also in the SPSS output)

49 Method 1: difference scores. [Screen clipping of SPSS paired-samples output]

50 dz = mean difference / SD of the differences = .0167 / .0718 = .233
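The dz calculation from the SPSS output is just a ratio; a sketch using the slide's numbers:

```python
# dz for a paired (within-subjects) t-test: mean of the difference scores
# divided by their SD (values from the slide's SPSS paired-samples output)
mean_diff, sd_diff = 0.0167, 0.0718
dz = mean_diff / sd_diff
print(round(dz, 3))  # 0.233, matching the slide
```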

51 [Screen clipping of the G*Power calculation]

52

53 WS t-test
- I said before that WS designs are more powerful than the equivalent BS version
- let's test this by using the same means and SDs in the Independent Samples t-test calculator in G*Power

54 [Screen clipping of the G*Power calculation]

55 [Screen clipping] Between subjects: power = .18. Within subjects: power = .07.

56 Extension to one-way ANOVA...
- in PSYC3010 you used phi prime as the ANOVA equivalent of d, which is the same as Cohen's f
- G*Power uses Cohen's f
- numerous methods: 1) calculate ω² and then use the formula for f and enter it directly; 2) calculate ω² or η² and enter it into "Direct" under "Effect size from variances"; 3) use the means with "Effect size from means"

57 Calculating ω² and f: given the above analysis. [Worked calculation shown on slide]

58

59 Note: not sure if this works with SPSS partial eta-squared (have had problems before), and omega-squared is more conservative anyway

60 Alternatively
Alternatively, if you have means (note: this is a different data set):

Group          Mean DV score   n
Coffee         63.75           16
Energy Drink   64.69           16
Water          46.56           16

MS_error = 125.21; grand mean = 58.33; N = 48
Use the square root of MS_error as the SD within each group in G*Power.
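The "effect size from means" route can be sketched directly: Cohen's f is the SD of the group means around the grand mean, divided by the within-group SD (√MS_error). Using the slide's data:

```python
import math

means = {"Coffee": 63.75, "Energy Drink": 64.69, "Water": 46.56}
ms_error = 125.21  # within-group variance from the ANOVA table

grand_mean = sum(means.values()) / len(means)  # 58.33, as on the slide

# f = SD of the group means (around the grand mean) / within-group SD
sd_means = math.sqrt(sum((m - grand_mean) ** 2 for m in means.values()) / len(means))
f = sd_means / math.sqrt(ms_error)
print(round(grand_mean, 2), round(f, 2))
```

With equal cell sizes this matches what G*Power computes when you enter the three means and √MS_error as the common SD.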

61

62 How about two-way factorial ANOVA?
Need to test 3 effects to estimate the power:
- main effect of IV 1
- main effect of IV 2
- interaction effect (usually less power than the main effects due to smaller n in each cell)
See http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/reference/reference_manual_07.html

63 Within-subjects ANOVA
- you need to know not only the effect size but also the correlation across time/variables
- use a convention for estimating the effect size (G*Power uses either lambda or Cohen's f)
- calculate f using the number of levels, the effect convention, and the correlation (e.g., test-retest)
- calculate lambda (f × N)
- use the Generic F test

64 Within example
- 3 levels over time (m)
- 64 participants (n)
- look for a small effect (f² = .01, i.e., f = .10)
- test-retest correlation = .79 (ρ)
- adjusted f² = (m × f²) / (1 − ρ) = (3 × .01) / (1 − .79) = .143
- lambda = adjusted f² × n = .143 × 64 = 9.152
- DF1 = m − 1 = 2
- DF2 = n × (m − 1) = 128
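The slide's arithmetic can be reproduced with scipy's noncentral F (a sketch, not the workshop's tool). Note the assumptions: the slide's ".01" is read as f² (f = .10 is the small-effect convention), DF2 = 128 follows the slide, and the slide's λ of 9.152 reflects rounding the adjusted f² to .143 before multiplying:

```python
from scipy import stats

m, n = 3, 64             # levels over time, participants
f2, rho = 0.01, 0.79     # small effect (f = .10 -> f^2 = .01), test-retest r

lam = n * m * f2 / (1 - rho)      # noncentrality, ~9.14 (slide: 9.152 after rounding)
df1, df2 = m - 1, n * (m - 1)     # 2 and 128, following the slide

f_crit = stats.f.ppf(0.95, df1, df2)
power = 1 - stats.ncf.cdf(f_crit, df1, df2, lam)
print(round(lam, 2), round(power, 2))
```

This λ and the two DFs are exactly what get typed into G*Power's Generic F test.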

65 Note: can't do an a priori analysis this way. If you need to estimate upfront, play with the denominator DF (which is based on N).

66 Within example
- refer to Karl Wuensch's website for more details re: RM: http://core.ecu.edu/psyc/wuenschk/StatsLessons.htm
- and the G*Power manuals online, e.g.: http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/user-guide-type_of_power_analysis

67 Regression analyses
- effect size associated with R²: f² = R² / (1 − R²)
- for a semipartial effect: f² = sr² / (1 − R²_full)
- f² = .02 (small); f² = .15 (medium); f² = .35 (large)
- convert back to variance accounted for: f² / (1 + f²)

68 R²
- 3 predictor variables
- R² for full model = .22
- f² = .22 / (1 − .22) = .282
- N = 110
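This example's power can be sketched with scipy's noncentral F, using the λ = f² × N noncentrality that G*Power applies to R² tests (an assumption check, not the workshop's tool):

```python
from scipy import stats

# Slide 68's example: 3 predictors, full-model R^2 = .22, N = 110
k, r2, N = 3, 0.22, 110
f2 = r2 / (1 - r2)        # .282, as on the slide
lam = f2 * N              # noncentrality for the R^2 test
df1, df2 = k, N - k - 1   # 3 and 106

f_crit = stats.f.ppf(0.95, df1, df2)
power = 1 - stats.ncf.cdf(f_crit, df1, df2, lam)
print(round(f2, 3), round(power, 2))  # power is near 1 for this large an effect
```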

69

70 Change in R² (hierarchical multiple regression)
- 2 steps: 2 predictors in step 1, 3 in step 2
- R² for full model = .10; change in R² for step 2 = .04
- f² = R²_change / (1 − R²_full) = .04 / (1 − .10) = .0444
- N = 95; DF numerator for step 2 = 3
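The same noncentral-F sketch works for the R² change test. Assumptions flagged: step 2 adds 3 predictors to a 5-predictor full model (the slide's numerator df of 3 suggests this), and λ = f² × N as in G*Power:

```python
from scipy import stats

# Slide 70's hierarchical example: full R^2 = .10, R^2 change = .04, N = 95
r2_full, r2_change, N = 0.10, 0.04, 95
f2 = r2_change / (1 - r2_full)  # .0444, as on the slide

df1 = 3          # predictors added at step 2 (slide's numerator DF)
df2 = N - 5 - 1  # assuming 5 predictors in the full model
lam = f2 * N

f_crit = stats.f.ppf(0.95, df1, df2)
power = 1 - stats.ncf.cdf(f_crit, df1, df2, lam)
print(round(f2, 4), round(power, 2))
```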

71

72 Complex analyses
- G*Power is useful for basic analyses
- for complex analyses (e.g., SEM, MLM, etc.) researchers usually look to Monte Carlo studies

73 Additional resources
- http://www.danielsoper.com/statcalc/ has some other statistical calculators, including for power

