Effect sizes
For a small effect size (.01), the change in success rate is from 46% to 54%.
For a medium effect size (.06), the change in success rate is from 38% to 62%.
For a large effect size (.16), the change in success rate is from 30% to 70%.
Table: R-squared, r, and Cohen's D for large, medium, and small effects.
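The slide does not say how the success rates were derived. A minimal sketch, assuming they come from the binomial effect size display (success rates of 50% minus and plus r/2, with r taken as the square root of R-squared). The function name `besd` is made up for illustration. Under that assumption, .06 and .16 reproduce the 38%/62% and 30%/70% figures, while .01 gives 45%/55%, slightly wider than the 46%/54% quoted.

```python
import math

def besd(r_squared):
    """Return (lower, upper) success rates, assuming the binomial effect size display."""
    r = math.sqrt(r_squared)          # convert variance explained to a correlation
    return 0.5 - r / 2, 0.5 + r / 2   # rates are centered on 50%

for r2 in (0.01, 0.06, 0.16):
    low, high = besd(r2)
    print(f"R2 = {r2:.2f}: {low:.0%} to {high:.0%}")
```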
But what does .10 really mean?
Predictor | Outcome | R2 | r
Vietnam veteran status | Alcohol abuse | |
Testosterone | Juvenile delinquency | |
AZT | Death | |
Psychotherapy | Improvement | .10 | .32
Is psychotherapy effective? (after Shapiro & Shapiro, 1983)
Therapy target | Number of studies | Cohen's D | r | R2 (%)
Anxiety & depression | | | |
Phobias | | | |
Physical and habit problems | | | |
Social and sexual problems | | | |
Performance anxieties | | | |
Calculating Cohen's D
Effect size = the difference between the predicted mean and the mean of the known population, divided by the population standard deviation
(assumes that you know population and sample size)
(imagine one population receives treatment, the other does not)
$d = (\mu_1 - \mu_2) / \sigma$
$\mu_1$ = mean of population 1 (hypothesized mean for the population that is subjected to the experimental manipulation)
$\mu_2$ = mean of population 2 (which is also the mean of the comparison distribution)
$\sigma$ = standard deviation of population 2 (assumed to be the standard deviation of both populations)
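The definition can be sketched in a few lines. The function name and the numbers in the example (a treated mean of 108 against a comparison mean of 100, with a standard deviation of 16) are hypothetical and chosen only to illustrate the arithmetic.

```python
def cohens_d(mean_1, mean_2, sigma):
    """Cohen's D: difference between two population means in standard deviation units."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (mean_1 - mean_2) / sigma

# Hypothetical values: treated mean 108, comparison mean 100, SD 16.
print(cohens_d(108, 100, 16))
```

A result of 0.5 would mean the treated population sits half a standard deviation above the comparison population.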
One other way to think about D
- D = .20, overlap 85%: distribution of heights, 15 vs. 16 year old girls
- D = .50, overlap 67%: distribution of heights, 14 vs. 18 year old girls
- D = .80, overlap 53%: distribution of heights, 13 vs. 18 year old girls
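The overlap percentages are consistent with 1 minus Cohen's U1 (the share of the two distributions that does not overlap), assuming two normal distributions with equal standard deviations. That reading is an assumption, since the slide does not name the measure. The function name `overlap_from_d` is made up for illustration.

```python
from statistics import NormalDist

def overlap_from_d(d):
    """Overlap of two equal-SD normal distributions, computed as 1 - Cohen's U1."""
    u2 = NormalDist().cdf(abs(d) / 2)   # Cohen's U2
    u1 = (2 * u2 - 1) / u2              # Cohen's U1: proportion of non-overlap
    return 1 - u1

for d in (0.20, 0.50, 0.80):
    print(f"D = {d:.2f}: overlap {overlap_from_d(d):.0%}")
```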
Effect sizes are interchangeable
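A rough sketch of what interchangeable means in practice, using the standard conversions between r and Cohen's D for two groups of equal size. The equal-group assumption and the function names are mine, not the slide's. With these formulas D = .50 corresponds to an R-squared of about .06.

```python
import math

def r_to_d(r):
    """Convert a correlation to Cohen's D (assumes two equal-sized groups)."""
    return 2 * r / math.sqrt(1 - r ** 2)

def d_to_r(d):
    """Convert Cohen's D to a correlation (assumes two equal-sized groups)."""
    return d / math.sqrt(d ** 2 + 4)

for d in (0.20, 0.50, 0.80):
    r = d_to_r(d)
    print(f"D = {d:.2f}  r = {r:.2f}  R2 = {r ** 2:.2f}")
```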
Statistical significance vs. effect size
p < .05, r = .10
- For n = 100,000, p < .05
- For n = 10, p > .05
- A large sample is closer to the population, so there is less chance of sampling error
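To see the same r = .10 land on either side of p = .05, here is a sketch that uses the Fisher z approximation for testing a correlation against zero. The approximation is a stand-in for an exact t test and is crude at n = 10, but the conclusion is the same. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def p_value_for_r(r, n):
    """Approximate two-tailed p value for a correlation r from n cases (Fisher z)."""
    z = math.atanh(r) * math.sqrt(n - 3)   # Fisher-transformed r over its standard error
    return 2 * (1 - NormalDist().cdf(abs(z)))

for n in (10, 100_000):
    print(f"n = {n}: p = {p_value_for_r(0.10, n):.4f}")
```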
Brief digression
Research hypotheses and statistical hypotheses
Is psychoanalysis effective?
- Null?
- Alternate?
- Handout
Why test the null?
Statistical significance and decision levels. (Z scores, t values and F values) Sampling distributions for the null hypothesis:
One way to think about it…
Two ways to guess wrong
Truth for population | Do not reject null hypothesis | Reject null hypothesis
Null is true | Correct! | Type 1 error
Null is not true | Type 2 error | Correct!
Type 1 error: you conclude something is there when there is nothing.
Type 2 error: you conclude nothing is there when there is something.
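A small simulation can make the two error rates concrete. This is only a sketch under stated assumptions: a two-tailed z test of a mean against 0 with known standard deviation 1, 25 cases per study, and a hypothetical true mean of 0.5 when the null is false. None of these values come from the slides.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mean, n=25, trials=2000, alpha=0.05, seed=0):
    """Share of simulated studies that reject H0: mean = 0 (two-tailed z test, sigma = 1)."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5      # sample mean divided by its standard error
        if abs(z) > critical_z:
            rejections += 1
    return rejections / trials

# Null is true: every rejection is a type 1 error.
type_1_rate = rejection_rate(true_mean=0.0)
# Null is not true (hypothetical effect of 0.5): every non-rejection is a type 2 error.
type_2_rate = 1 - rejection_rate(true_mean=0.5)
print(f"type 1 rate: {type_1_rate:.3f}, type 2 rate: {type_2_rate:.3f}")
```

The type 1 rate should sit near the chosen alpha, while the type 2 rate depends on the effect size and the sample size.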
An example
 | Null hypothesis is false | Null hypothesis is true
Reject null hypothesis | Merit pay works and we know it. | We decided merit pay worked, but it doesn't.
Do not reject null hypothesis | We decided merit pay does not work, but it does. | Merit pay does not work and we know it.
An example
Imagine the following research looking at the effects, if any, of the drug AZT on HIV-positive patients. In other words, does a group of AIDS patients given AZT live longer than another group given a placebo? If we conduct the experiment correctly (everything is held constant or randomly distributed except for the independent variable) and we do find a difference between the two groups, there are only two reasonable explanations available to us:
From Dave Schultz:
 | Null hypothesis is false | Null hypothesis is true
Reject null hypothesis | |
Do not reject null hypothesis | |
Table: power (rows) by effect size (columns)
If you think that the effect is small (.01), medium (.06), or large (.15), and you want to find a statistically significant difference defined as p < .05, this table shows you how many participants you need for different levels of “sensitivity” or power.
Statistical power is how “sensitive” a study is at detecting various associations (magnification metaphor).
Table: power (rows) by effect size (columns)
If you think that the effect is small (.01), medium (.06), or large (.15), and you want to find a statistically significant difference defined as p < .01, this table shows you how many participants you need for different levels of “sensitivity” or power.
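One way to estimate how many participants are needed, sketched under assumptions that may not match the original tables: the effect sizes are read as R-squared for a single correlation, and the calculation uses the Fisher z approximation. The function name and defaults are hypothetical.

```python
import math
from statistics import NormalDist

def n_for_correlation(r, power=0.80, alpha=0.05, two_tailed=True):
    """Approximate number of participants needed to detect a correlation r (Fisher z)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / (2 if two_tailed else 1))  # critical value for the test
    z_power = z.inv_cdf(power)                                 # quantile for the desired power
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)

for r_squared in (0.01, 0.06, 0.15):
    r = math.sqrt(r_squared)
    for alpha in (0.05, 0.01):
        n = n_for_correlation(r, alpha=alpha)
        print(f"R2 = {r_squared:.2f}, p < {alpha}: about {n} participants for power .80")
```

Smaller effects and stricter significance levels both push the required sample size up.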
What determines power?
1. Number of subjects
2. Effect size
3. Alpha level
Power = probability that your experiment will reject the null hypothesis when your research hypothesis is true
How to increase power?
1. Widen the region of rejection (e.g., to p < .10)
2. Increase sample size
3. Increase treatment effects
4. Decrease within-group variability
Study feature | Practical way of raising power | Disadvantages
Predicted difference | Increase intensity of experimental procedures | May not be practical or may distort study's meaning
Standard deviation | Use a less diverse population | May not be available, decreases generalizability
Standard deviation | Use standardized, controlled circumstances of testing or more precise measurement | Not always practical
Sample size | Use a larger sample size | Not practical, can be costly
Significance level | Use a more lenient level of significance | Raises alpha, the probability of type 1 error
One-tailed vs. two-tailed test | Use a one-tailed test | May not be appropriate to logic of study
What is adequate power?
- .50 (most current research)
- .80 (recommended)
How do you know how much power you have? Guesswork.
Two ways to use power:
1. Post hoc, to establish what you could have found
2. Before the study, to determine how many participants you need
Outcome statistically significant | Sample size | Conclusion
Yes | Small | Important results
Yes | Large | Might or might not have practical importance
No | Small | Inconclusive
No | Large | Research hypothesis probably false
Statistical power (for p < .05)
Table: power for r = .10, r = .30, and r = .50 (columns), two-tailed and one-tailed tests (rows)
Power = 1 - probability of a type 2 error
Power = 1 - beta
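Power for the three correlations can be approximated as follows. This is a sketch using the Fisher z approximation. The sample size of 50 is hypothetical because the slide does not give one, and the one-tailed case assumes the effect is in the predicted direction.

```python
import math
from statistics import NormalDist

def power_for_r(r, n, alpha=0.05, two_tailed=True):
    """Approximate power to detect a true correlation r with n cases (Fisher z)."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / (2 if two_tailed else 1))
    shift = math.atanh(r) * math.sqrt(n - 3)   # expected test statistic under the alternative
    power = 1 - z.cdf(critical - shift)        # rejections in the predicted direction
    if two_tailed:
        power += z.cdf(-critical - shift)      # rejections in the opposite tail (tiny)
    return power

n = 50  # hypothetical sample size
for r in (0.10, 0.30, 0.50):
    two = power_for_r(r, n)
    one = power_for_r(r, n, two_tailed=False)
    print(f"r = {r:.2f}: two-tailed power {two:.2f}, one-tailed power {one:.2f}, "
          f"beta (two-tailed) {1 - two:.2f}")
```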