Effect Size and Power.


Effect Size and Power

Effect Size and Power
Two things mentioned previously:
P-values are heavily influenced by sample size (n).
Statistics Commandment #1: P-values are silent on the strength of the relationship between two variables.
Effect size is what tells you about this strength, and we will discuss it in more detail today.
If you haven't already, read Cohen's (1992) Power Primer. It's only five pages long, plainly worded, and the best article in statistics you'll ever read.

Effect Size and Power
P-values are influenced heavily by n.
So heavily influenced, in fact, that with enough people anything is significant.
Ex: Two samples, with N = 10 (five per group):
Group 1: 2, 4, 6, 8, 10 (mean = 6, s = 3.16)
Group 2: 3, 5, 7, 9, 11 (mean = 7, s = 3.16)
t = -.5, p = .63, so we would fail to reject H0.

Effect Size and Power
Take the same data, but repeat each observation 20 times (N = 200):
Group 1 mean still = 6, s still ≈ 3.16
Group 2 mean still = 7, s still ≈ 3.16
But now t = -2.46, p = .02, so we would reject H0.
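The two slides above can be reproduced in a few lines of Python. This is a sketch using only the standard library, with the pooled-variance t statistic computed by hand; the data values are the ones from the example, and the large-sample t comes out near the slide's -2.46 (small differences reflect rounding of s):

```python
from math import sqrt
from statistics import mean, stdev

def pooled_t(g1, g2):
    """Independent-samples t statistic with pooled variance."""
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * stdev(g1) ** 2 + (n2 - 1) * stdev(g2) ** 2) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

group1 = [2, 4, 6, 8, 10]   # mean = 6, s = 3.16
group2 = [3, 5, 7, 9, 11]   # mean = 7, s = 3.16

t_small = pooled_t(group1, group2)            # -0.5: not significant
t_large = pooled_t(group1 * 20, group2 * 20)  # about -2.5: significant at alpha = .05
```

Notice that the group means never change between the two calls; only n does, and that alone flips the verdict.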

Effect Size and Power
As I said before, with enough n, anything is significant.
Because p-values say nothing about the size of your effect, you can have two groups that are almost identical (as in our example) and yet statistically significantly different.
A p-value only tells you how unlikely your result would be if the null hypothesis were true; results from big samples are very stable, so with a big sample even a tiny difference becomes "unlikely under H0," just as we'd expect.

Effect Size and Power
Therefore, we need something to report in addition to p-values: a statistic that is less influenced by n and says something about the size of our IV's effect.
In the previous example we had a low p-value, but our IV had little effect, because the two groups (with and without it) had almost the same mean score.
Jacob Cohen to the rescue!
Cohen and others have been pointing out this flaw in relying exclusively on p-based statistics for decades, and psychologists and medical researchers are only beginning to catch on; most research still reports only p-values.

Effect Size and Power
Cohen (and others) championed the use of effect size statistics, which provide us with this information and are not influenced by sample size.
Effect size: the strength of the effect that our IV had on our DV.
There is no single formula for effect size; depending on your data, there are many different formulas and many different statistics (see the Cohen article). They all take the general form of the size of the effect divided by the variability in the data.

Effect Size and Power
Ex: The effect size estimate for the independent-samples t-test is Cohen's d:
d-hat = (M1 - M2) / s-pooled
This looks a lot like our formula for z, and is interpreted similarly.
d-hat = the number of standard deviations mean 1 falls from mean 2, just like z was interpreted as the number of standard deviations our score fell from the mean.
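Continuing the earlier sketch, d-hat for the example data can be computed like this (the standard pooled-SD version of Cohen's d; the function name is my own):

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(g1, g2):
    """d-hat = (M1 - M2) / pooled standard deviation."""
    n1, n2 = len(g1), len(g2)
    sp = sqrt(((n1 - 1) * stdev(g1) ** 2 + (n2 - 1) * stdev(g2) ** 2) / (n1 + n2 - 2))
    return (mean(g1) - mean(g2)) / sp

# Mean 1 sits about a third of a standard deviation below mean 2,
# and this number does not change when you multiply n by 20.
d = cohens_d([2, 4, 6, 8, 10], [3, 5, 7, 9, 11])  # about -0.32
```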

Effect Size and Power
Interpreting effect size: How do we know when our effect size is large?
1. Prior research: if previous research on an educational intervention for low-income kids only increased their grades by .5 standard deviations and yours does so by 1 s.d., you can say this is a large effect (about twice as large, to be exact).
2. Theoretical prediction: if we're developing a treatment for Borderline Personality Disorder, the theory behind this disorder says it is stable across time and therefore difficult to treat, so we may look for only a medium effect size before we declare success.

Effect Size and Power
Interpreting effect size: How do we know when our effect size is large?
3. Practical considerations: if our treatment has the potential to benefit a lot of people inexpensively, even if it only helps a little (i.e., a small effect), it may still be important.
E.g., the average effect size for using aspirin to treat heart disease is small, but since aspirin is inexpensive and easily implemented, and can therefore help many people (even if only a little), this is an important finding.
Fun fact: the GRE predicts GPA in graduate school in psychology at an effect size of only r = .15 (which is small), but it is still used because no better standardized tests are available.

Effect Size and Power
Interpreting effect size: How do we know when our effect size is large?
4. Tradition/convention: when your research is novel and exploratory (i.e., there is little prior research or theory to guide your expectations), you need an alternative to the methods above.
Cohen devised standard conventions for large, medium, and small effects for the various effect size statistics (see the Cohen article).
However, what is large for one effect size statistic is NOT necessarily large for another.
Ex: r = .5 corresponds to a large effect, but d = .5 corresponds only to a medium effect.
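Cohen's conventional benchmarks can be written down as a small lookup, which makes the r-versus-d mismatch concrete. The cutoffs below are Cohen's (1992) conventional values; the function names are my own:

```python
def label_d(d):
    """Cohen's conventions for d: .2 small, .5 medium, .8 large."""
    d = abs(d)
    return "large" if d >= 0.8 else "medium" if d >= 0.5 else "small" if d >= 0.2 else "negligible"

def label_r(r):
    """Cohen's conventions for r: .10 small, .30 medium, .50 large."""
    r = abs(r)
    return "large" if r >= 0.5 else "medium" if r >= 0.3 else "small" if r >= 0.1 else "negligible"

# The same number means different things for different statistics:
# label_d(0.5) is "medium", but label_r(0.5) is "large"
```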

Effect Size and Power
Take-home messages:
1. Interpreting effect size statistics requires detailed knowledge of your experiment. Without knowing how an effect size statistic was obtained, if someone asks, "Is r = .25 a large effect?", your answer should be, "It depends..."
2. When reporting effect size, you CANNOT simply say, "My effect size was .05, and so was large," because different effect size statistics have different conventions for what counts as small or large. Even David Barlow, a world-renowned expert on the treatment of anxiety disorders, made this mistake in his book The Clinical Handbook of Psychological Disorders.

Effect Size and Power
Just as with too large a sample anything is significant, with too small a sample nothing is significant.
This is the problem of Type II error (β): incorrectly failing to reject H0 when it is actually false (i.e., wrongly rejecting H1).
How, then, do we determine a sample size that is neither too large nor too small?

Effect Size and Power
We try to maximize power, which is the flip side of a Type II error:
Type II error (β) = incorrectly failing to reject H0 when it is false; Power (1 - β) = correctly rejecting H0 when it is false.
How do we maximize power?
1. Increase Type I error (α). This is problematic for obvious reasons: we don't want to trade one type of error for another if we can help it.

Effect Size and Power
How do we maximize power? (continued)
2. Increase effect size. We accomplish this by making our IV as potent as possible, or by choosing a weak control group. E.g., comparing our treatment to an alternative treatment will yield a smaller effect size than comparing it to no treatment.
3. Increase n or decrease s. Remember: in our statistical tests we divide by the standard error (s/√n). Decreasing s makes this number smaller, as does increasing n, and dividing by a smaller number gives a larger value of z or t, which increases the chance of rejecting H0.
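The arithmetic behind point 3 is easy to verify directly: the slide's s/√n shrinks as n grows. A toy illustration, using the s from our earlier example:

```python
from math import sqrt

def standard_error(s, n):
    """The denominator of z and t: s / sqrt(n)."""
    return s / sqrt(n)

# Same s, bigger n -> smaller standard error -> bigger t for the same mean difference
se_10  = standard_error(3.16, 10)   # about 1.0
se_200 = standard_error(3.16, 200)  # about 0.22
```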

Effect Size and Power
What is good power? Statistical convention says that power = .8 is a good value that balances the risks of Type I and Type II error.
Before we conduct our experiment, i.e. a priori, we need to do what is called a power analysis, which tells us what sample size will give us the power we need.
You can download a program called G*Power from the internet that does these calculations for you: you type in the kind of test you're doing (remember how tests can be more or less "powerful"), your alpha, the power you want, and the effect size you expect, and it gives you the sample size you need.
Other programs, such as Power and Precision, also do this, but G*Power is free. Find it at: http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/
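If you don't have G*Power handy, a Monte Carlo sketch shows what a power analysis is doing under the hood: simulate many experiments with a known true effect and count how often the test rejects H0. This is an approximation, not G*Power's exact method; it assumes normal data, a two-tailed independent-samples t test, and uses 1.98 as a rough large-df critical value for α = .05. The function names are my own:

```python
import random
from math import sqrt
from statistics import mean, stdev

def pooled_t(g1, g2):
    """Independent-samples t statistic with pooled variance."""
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * stdev(g1) ** 2 + (n2 - 1) * stdev(g2) ** 2) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

def simulated_power(d, n_per_group, sims=2000, crit=1.98, seed=1):
    """Fraction of simulated experiments that reject H0 when the true effect is d."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        g1 = [rng.gauss(0, 1) for _ in range(n_per_group)]
        g2 = [rng.gauss(d, 1) for _ in range(n_per_group)]
        if abs(pooled_t(g1, g2)) > crit:
            hits += 1
    return hits / sims

# Cohen's classic result: a medium effect (d = .5) needs about 64 people
# per group to reach power of roughly .8 at alpha = .05, two-tailed.
power = simulated_power(0.5, 64)
```

Running the same function with smaller n (or a smaller d) shows the power dropping, which is exactly the trade-off an a priori power analysis is meant to resolve.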

Effect Size and Power
You can also do these calculations by hand (see the textbook).
However, understanding the concepts of effect size and power is more important than knowing how to calculate them by hand, and since I don't want to overwhelm you, you won't be tested on these calculations (Sections 15.4 - 15.9 in the text).