
Effect Size and Power

► Two things mentioned previously:
  • P-values are heavily influenced by sample size (n)
  • Statistics Commandment #1: P-values are silent on the strength of the relationship between two variables
► Effect size is what tells you about this strength, and we will discuss it in more detail today
► Don't forget, if you haven't already, to read Cohen's (1992) "A Power Primer"
  • It's only five pages long, simply worded, and the best article in statistics you'll ever read

Effect Size and Power
► P-values are heavily influenced by n
  • So heavily influenced, in fact, that with enough people even a trivially small difference comes out statistically significant
  • Ex: Data with two samples, N = 10
    • Group 1 mean = 6, s = 3.16
    • Group 2 mean = 7, s = 3.16
    • t = -0.5, p = .63
    • We would fail to reject H₀

Effect Size and Power
► Take the same data, but multiply N by 20 (N = 200)
  • Group 1 mean still = 6, s still = 3.16
  • Group 2 mean still = 7, s still = 3.16
  • But now t = -2.46, p = .02
  • We would reject H₀
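A minimal sketch of this n-dependence, assuming the N = 10 sample splits 5 per group (the means and SDs come from the slides; `pooled_t` is a hypothetical helper implementing the standard pooled-variance t statistic):

```python
import math

def pooled_t(m1, m2, s1, s2, n1, n2):
    # Standard pooled-variance two-sample t statistic
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se

# Same means (6 vs. 7) and SDs (3.16); only the group sizes change
t_small = pooled_t(6, 7, 3.16, 3.16, 5, 5)      # N = 10 total
t_large = pooled_t(6, 7, 3.16, 3.16, 100, 100)  # N = 200 total
print(round(t_small, 2), round(t_large, 2))  # -0.5 and roughly -2.2
```

With the groups unchanged, inflating n alone pushes t past the critical value. (This approximation lands near -2.2 at N = 200; the slide's -2.46 presumably reflects slightly different rounding in the original data.)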

Effect Size and Power
► As I said before, with enough n, anything is significant
  • Because p-values say nothing about the size of your effect, two groups that are almost identical (as in our example) can still come out statistically significant
  • P-values speak to how unlikely your result would be if H₀ were true, not to how big the effect is – big samples give stable estimates, as we'd expect, so even tiny differences reach significance

Effect Size and Power
► Therefore, we need something to report in addition to p-values: a statistic less influenced by n that says something about the size of our IV's effect
  • In the previous example we had a low p-value, but our IV had little effect, because the two groups (with and without it) had almost the same mean score
► Jacob Cohen to the rescue!
  • Cohen and others have been pointing out this flaw in relying exclusively on p-based statistics for decades; psychology and medical research are only beginning to catch on, and most research still reports only p-values

Effect Size and Power
► Cohen (and others) championed the use of effect size statistics, which provide this information and are not influenced by sample size
  • Effect size: the strength of the effect that our IV had on our DV
► There is no one formula for effect size: depending on your data, there are many different formulas and many different statistics (see the Cohen article), but they all take the general form

    effect size = magnitude of the effect / variability of the data

Effect Size and Power  Ex. The effect size estimate for the Independent- Samples T-Test is:  This looks a lot like our formula for z, and is interpreted similarly  D-hat = the number of standard deviations mean 1 is from mean 2 – just like z was interpreted as the number of standard deviations our score fell from the mean

Effect Size and Power
► Interpreting effect size:
  • How do we know when our effect size is large?
    1. Prior research – if previous research on an educational intervention for low-income kids increased their grades by only .5 standard deviations, and yours does so by 1 s, you can say yours is a large effect (~twice as large, to be exact)
    2. Theoretical prediction – if we're developing a treatment for Borderline Personality Disorder, the theory behind the disorder says it is stable across time and therefore difficult to treat, so we may look for only a medium effect size before declaring success

Effect Size and Power
► Interpreting effect size:
  • How do we know when our effect size is large?
    3. Practical considerations – if our treatment has the potential to benefit a lot of people inexpensively, even a small effect may be important
      • E.g., the average effect size for using aspirin to treat heart disease is small, but since aspirin is inexpensive and easily implemented, it can help many people (even if only a little), so this is an important finding
      • Fun fact – the GRE predicts GPA in graduate school in psychology at an effect size of only r = .15 (which is small), but it is still used because no better standardized tests are available

Effect Size and Power
► Interpreting effect size:
  • How do we know when our effect size is large?
    4. Tradition/convention – when your research is novel and exploratory in nature (i.e., there is little prior research or theory to guide your expectations), we need an alternative to the methods above
      • Cohen devised standard conventions for small, medium, and large effects for the various effect size statistics (see the Cohen article)
      • However, what is large for one effect size statistic IS NOT NECESSARILY large for another
        • Ex: r = .5 corresponds to a large effect, but d = .5 corresponds only to a medium effect
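A small sketch of that last point, using Cohen's (1992) conventional cutoffs for d and r (the `label` helper and dictionary layout are my own illustration):

```python
# Cohen's (1992) conventions: the same number can rate differently
# depending on which effect size statistic it belongs to
CUTOFFS = {
    "d": {"large": 0.8, "medium": 0.5, "small": 0.2},
    "r": {"large": 0.5, "medium": 0.3, "small": 0.1},
}

def label(stat, value):
    # Return the largest conventional label the value reaches
    for name in ("large", "medium", "small"):
        if abs(value) >= CUTOFFS[stat][name]:
            return name
    return "below small"

print(label("r", 0.5), label("d", 0.5))  # large medium
```

The identical number .5 clears the "large" bar for r but only the "medium" bar for d, which is why an effect size must always be reported with its statistic's name attached.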

Effect Size and Power
► Take-home messages:
  1. Interpreting effect size statistics requires detailed knowledge about your experiment
    • Without any knowledge of how an effect size statistic was obtained, if someone asks "Is r = .25 a large effect?", your answer should be "It depends..."
  2. When reporting effect size, you CANNOT say "My effect size was .05, and so was large", because different effect size statistics have different conventions for small, medium, and large values
    • Even David Barlow, a world-renowned expert on the treatment of anxiety disorders, made this mistake in his book The Clinical Handbook of Psychological Disorders

Effect Size and Power
► Just as with too large a sample anything is significant, with too small a sample nothing is significant
  • This refers to the probability of a Type II Error (β): incorrectly failing to reject H₀ (equivalently, rejecting H₁) when H₀ is false
► How, then, do we determine a sample size that is neither too large nor too small?

Effect Size and Power  We try to maximize power (1 – β), which is the reverse of a Type II Error (β) ► Type II Error = incorrectly failing to reject H o ( when it is false) ; Power = correctly rejecting H o (when it is false) ► How do we maximize power?  1. Increase Type I Error (α) ► This is problematic for obvious reasons – we don’t want to decrease making one type of error for another if we can help it

Effect Size and Power
► How do we maximize power?
  2. Increase effect size
    • We accomplish this by making our IV as potent as possible, or by choosing a weak control group
      • E.g., comparing our treatment to an alternative treatment will yield a smaller effect size than comparing it to no treatment
  3. Increase n or decrease s
    • Remember: our statistical tests divide by the standard error (s/√n); decreasing s makes this number smaller, as does increasing n, and dividing by a smaller number gives a larger z or t, which increases the chance of rejecting H₀
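To see the n-and-s logic concretely (a minimal sketch; the helper name is mine):

```python
import math

def standard_error(s, n):
    # Standard error of the mean: s / sqrt(n)
    return s / math.sqrt(n)

# Quadrupling n halves the standard error, so z or t doubles for the
# same mean difference; halving s has the same doubling effect
ratio = standard_error(3.16, 25) / standard_error(3.16, 100)
print(round(ratio, 1))  # 2.0
```

Because n sits under a square root, gains diminish: each halving of the standard error costs four times as many subjects, which is one reason reducing s (e.g., via more reliable measures) is attractive.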

Effect Size and Power
► What is good power?
  • Statistical convention says that power = .8 is a good value for balancing Type I and Type II Error
    • Power = .80 → a 20% chance of making a Type II Error
  • Before we conduct our experiment, i.e., a priori, we need to do what is called a power analysis, which tells us what sample size will give us the power we need
    • You can download a program called G*Power from the internet that does these calculations for you
    • You type in the kind of test you're doing (remember that tests can be more or less "powerful"), your alpha, the power you want, and the effect size you expect, and it gives you the sample size you'd need
    • Other programs, such as Power and Precision, also do this, but G*Power is free
    • Find it at:
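G*Power does this exactly; as a rough sketch of what such a calculation involves, here is the standard normal-approximation formula for per-group n in a two-sided two-sample t-test, n ≈ 2·((z₁₋α/₂ + z_power)/d)² (the function name is mine):

```python
from statistics import NormalDist
import math

def n_per_group(d, alpha=0.05, power=0.80):
    # Normal approximation; exact t-based tools such as G*Power
    # give a slightly larger answer
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

print(n_per_group(0.5))  # 63 per group for a medium effect
```

Note how sensitive n is to the expected effect size: a large effect (d = .8) needs far fewer subjects per group than a medium one, which is why the a priori effect size estimate matters so much.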

Effect Size and Power
► You can also do the calculations by hand (see the textbook)
  • However, understanding the concepts of effect size and power is more important than knowing how to calculate them by hand, and since I don't want to overwhelm you, you won't be tested on these calculations (you can skip Secs )

Effect Size and Power
► What is good power?
  • Power analysis
    • Involves estimating a predicted effect size ahead of time
    • Prediction is based on the interpretation guidelines:
      • Prior research
      • Theory
      • Practical considerations
      • Convention

Effect Size and Power
► How does effect size add to the interpretation of study results over and above p-values?
  • Significant p-value, high effect size: the IV had a strong and reliable effect on the DV
  • Significant p-value, low effect size: the IV had a weak effect on the DV, inflated by large n
  • Nonsignificant p-value, high effect size: the IV had a strong effect on the DV, but n was too low to detect it – an effect of unknown reliability
  • Nonsignificant p-value, low effect size: the IV had a weak effect on the DV

Effect Size and Power
► Retrospective power
  • SPSS provides an estimate of power given the p-value and effect size obtained and the sample size used
  • It is tempting to interpret low retrospective power as an indication that too few subjects were used to detect the effect obtained
  • Recall, though, that this information is inferred directly from the p-value and effect size, which are themselves used to calculate retrospective power
    • Retrospective power estimates therefore add nothing to the interpretation of p-values and effect sizes