RIMI Workshop: Power Analysis Ronald D. Yockey

Goals of the Power Analysis Workshop
1. Understand what power is and why power analyses are important in conducting research.
2. Recognize the limits of Null Hypothesis Significance Testing (NHST) and how effect sizes complement NHST.
3. Understand the relationship between power, effect size, and sample size.
4. Use GPower to estimate the sample size (N) required to obtain a desired level of power (e.g., 80%) for a number of statistical procedures.
5. Provide an estimate of power for your grant proposals!

Null Hypothesis Significance Testing (NHST)

What is Power? Power – the probability of rejecting the null hypothesis (i.e., obtaining significance) when it is false - Ranges from 0 to 1 - When multiplied by 100%, power is expressed as a percentage
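In symbols (standard notation, where β is the probability of a Type II error, as discussed in the examples below):

$$\text{Power} = P(\text{reject } H_0 \mid H_0 \text{ is false}) = 1 - \beta$$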

Examples of Power Example #1 Power = .50 - 50% of the time the null hypothesis will be rejected (i.e., statistical significance will be obtained) - 50% of the time the null hypothesis will not be rejected (i.e., statistical significance will not be obtained) – a Type II error

Examples of Power (continued) Example #2 Power = .80 - 80% of the time the null hypothesis will be rejected (i.e., statistical significance will be obtained) - 20% of the time the null hypothesis will not be rejected (i.e., statistical significance will not be obtained) – a Type II error

Rationale for Power Analysis
- Considerable work is involved in a study: conceiving the idea, reviewing the literature, submitting the grant proposal, running participants, analyzing the data, and writing up the results.
- High power = high chance of obtaining significance (supporting the research hypothesis); low power = low chance of obtaining significance.
- Neglecting an a priori power analysis frequently results in low-power studies.
- Power analysis is crucial for increasing the probability of getting significant results!

Rationale for Power Analysis (continued) Low-power studies are very common, e.g., Power = .30 - only a 30% chance of achieving significance (rejecting H0). Is spending the time and effort to conduct the study (not to mention taxpayers' money) worth it when there is only a 3 in 10 chance of getting significance? Recommended power level – 70% to 80% (diminishing returns in the 90%+ range).

Factors That Influence Power
1. Alpha level (α = .05 or .01) - Larger α = greater power
2. One-tailed vs. two-tailed tests - One-tailed tests have greater power (for a constant α); two-tailed tests are much more common (a one-tailed test may require justification)
3. The size of the standard deviation (σ) - Smaller standard deviation = greater power (σ can be very difficult to manipulate)

Factors That Influence Power (continued)
4. Effect size – the size of the "treatment effect" in your study - Larger effect size = greater power
5. Sample size (N) - Larger N = greater power (the most commonly manipulated factor for increasing power)

Examples of Low Power Studies Very "realistic" low power study examples (for the independent samples t test): Example #1 (2-tailed, α = .05): n1 = 30, n2 = 30, small effect (i.e., a relatively small difference between the groups; characteristic of many studies in the social and behavioral sciences). Power = 12%!

Examples of Low Power Studies (continued) Example #2 (2-tailed, α = .05): n1 = 50, n2 = 50; small effect. Power = 17%! Example #3 (2-tailed, α = .05): n1 = 30, n2 = 30; medium effect. Power = 48%. All three studies suffer from insufficient power.
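The three power values above can be reproduced (approximately) outside of GPower. Below is a minimal sketch using statsmodels' TTestIndPower; the use of statsmodels (and Cohen's d = 0.2 / 0.5 for the "small" / "medium" effects) is an assumption of this write-up rather than the tool used in the workshop.

```python
# Minimal sketch: cross-checking the three low-power examples with
# statsmodels (the workshop itself uses GPower; results should agree closely).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

examples = [
    ("Example #1: small effect (d = 0.2), n = 30 per group", 0.2, 30),
    ("Example #2: small effect (d = 0.2), n = 50 per group", 0.2, 50),
    ("Example #3: medium effect (d = 0.5), n = 30 per group", 0.5, 30),
]

for label, d, n_per_group in examples:
    power = analysis.power(effect_size=d, nobs1=n_per_group,
                           alpha=0.05, alternative="two-sided")
    print(f"{label}: power ≈ {power:.2f}")
# Expected output is roughly 0.12, 0.17, and 0.47-0.48, matching the slides.
```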

Rationale for Power Analysis (continued) The prevalence of low power studies is one reason why funding agencies such as NIH and NIMH (among others) often require estimates of power with the submission of a grant proposal. And that’s why we’re here today!

Null Hypothesis Significance Testing (NHST)

NHST (Continued) If statistical significance is obtained (e.g., p < .05), then we can declare that the groups are different. While a "statistically significant" result with NHST tells us the groups are different, it says nothing about how different they are. Statistical significance means "beyond normal sampling error" or "reliable difference," but it does not necessarily mean "big difference" or "important."

NHST (Continued) - While NHST can be a very useful tool, it has frequently been misused, as far too many researchers have made the mistake of assuming statistical significance means “practical importance” - Due to this common misunderstanding, the American Psychological Association (APA) now strongly encourages that effect sizes be presented (alongside the results of significance tests), and many journals require the reporting of effect sizes for manuscript consideration.

What is an Effect Size? Effect size – indicates the size or degree of the effect of some treatment or phenomenon. Definitions of effect size provided by Cohen (1988, pp. 8-9): - "The degree to which the phenomenon is present in the population." - "The degree to which the null hypothesis is false."

NHST vs. Effect Size Cohen's second definition of effect size (repeated): - "The degree to which the null hypothesis is false." 1. NHST – if we reject the null, what do we conclude? The null is false – i.e., Experimental ≠ Control (NHST doesn't indicate how different the groups are, just that they're not equal). 2. Effect size – indicates how different the groups are.

NHST vs. Effect Size (continued) Basic Question of Significance Testing (NHST) – Is there an effect? - Yes or No Basic Question of Effect sizes – How big is the effect? - A question of degree

Effect Sizes in Power Analysis Effect sizes play a fundamental role in power analysis – To conduct a power analysis, the effect size must be estimated. (We’ll examine several effect size measures shortly.)

Effect Sizes in Power Analysis (continued) Different effect sizes are often used for different statistical procedures (t tests, ANOVA, Correlation, etc.)

Effect Sizes – Mean Differences Effect size of the difference between two means Example #1 – IQ scores: group 1 = 115, group 2 = 105 Effect size = mean group 1 – mean group 2 = 115 – 105 = 10 IQ points (notice the effect size indicates how different the groups are)

Effect Sizes – Mean Differences (continued) Example #2: Stress – breathing exercises vs. control; breathing exercises = 60, control = 67 (higher scores = greater stress). Effect size = 60 – 67 = –7; an effect size of 7 points (often the absolute value of an effect size is reported).

Effect Sizes – Mean Differences (continued) Problems with mean difference approach: 1. When different scales are used (with different M and SD) to measure the same construct, the results of different studies cannot be meaningfully compared (comparing apples and oranges). 2. Power analysis requires a standardized or “scale free” measure of effect size.

Standardized Measures of Effect Size
- t tests – Cohen's d
- ANOVA – η² (eta-squared) or R²
- Correlation – Pearson's r
- Multiple Regression – R²
- Chi-Square Test of Independence – Cramer's Phi

Cohen’s d Used for all t tests (one sample t, independent samples t, dependent samples t) A standardized or “scale free” measure of mean differences

Cohen’s d (continued)

Example: Examining the effect of a drug on pain levels - A pain questionnaire (on a scale where higher scores = greater pain) is administered to people suffering from back pain - Old drug mean = 25, new drug mean = 20 - Standard deviation = 10
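Applying the usual definition of Cohen's d (the mean difference divided by the standard deviation) to these numbers:

$$d = \frac{M_{\text{old drug}} - M_{\text{new drug}}}{SD} = \frac{25 - 20}{10} = 0.5$$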

Cohen's d (continued) d = .5 (interpret in terms of standard deviation differences, like z-scores). Those who took the new drug had pain levels that were .5 standard deviations lower than those who took the old drug.

Cohen's conventions for d

Magnitude   d     Interpretation
Small       .20   1/5 of a std. dev. difference
Medium      .50   1/2 of a std. dev. difference
Large       .80   8/10 of a std. dev. difference

Cohen's standards for small, medium, and large effect sizes for the independent samples t test, one sample t test, and the dependent samples t test.

Power Table – Independent t (abridged)

           Effect Size (d)
Power      .20 – Small    .50 – Medium   .80 – Large
.50        194 (388)      32 (64)        14 (28)
.60        246 (492)      41 (82)        17 (34)
.70        310 (620)      51 (102)       21 (42)
.80        394 (788)      64 (128)       26 (52)

Sample size required per group (with total N listed in parentheses) for a given level of power and effect size for the independent samples t test (α = .05, 2-tailed). Note: Assumes equal n per group.
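As a cross-check on the 80%-power row of this table, here is a minimal sketch that solves for the required n per group using statsmodels' TTestIndPower (an assumed stand-in for GPower in this write-up):

```python
# Minimal sketch: solving for n per group at 80% power (alpha = .05, 2-tailed)
# with statsmodels; GPower should give essentially the same numbers.
from math import ceil
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

for d, label in [(0.2, "small"), (0.5, "medium"), (0.8, "large")]:
    n_per_group = analysis.solve_power(effect_size=d, power=0.80,
                                       alpha=0.05, alternative="two-sided")
    print(f"{label} effect (d = {d}): n per group ≈ {ceil(n_per_group)}")
# Roughly 394, 64, and 26 per group, matching the table's 80%-power row.
```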

Cohen's conventions for Pearson's r

Magnitude   r
Small       .10
Medium      .30
Large       .50

Cohen's standards for small, medium, and large effect sizes for the Pearson r correlation coefficient.

Power Table – Pearson's r (abridged)

           Effect Size (r)
Power      .10 – Small    .30 – Medium   .50 – Large

Sample size (N) required for a given level of power and effect size for the Pearson r correlation coefficient (α = .05, 2-tailed).

Cohen's Conventions for Cramer's Phi/w (Chi-Square)

Magnitude   Phi, w
Small       .10
Medium      .30
Large       .50

Cohen's standards for small, medium, and large effect sizes for the chi-square test of independence. Note: Applies only to 2 x k tables, where k ≥ 2.

Power Table – Chi-Square Test of Independence (abridged)

           Effect Size (Phi, w)
Power      .10 – Small    .30 – Medium   .50 – Large

Sample size required for a given level of power and effect size for the chi-square test of independence (α = .05, df = 1, i.e., a 2 x 2 table).

Effect Size - ANOVA k = the number of groups, m_i = the mean of the i-th group, m = the grand (overall) mean, and σ = the average (or pooled) standard deviation.
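A sketch of the standard formula these symbols describe (Cohen's f for a one-way ANOVA):

$$f = \frac{\sigma_m}{\sigma}, \qquad \text{where } \sigma_m = \sqrt{\frac{\sum_{i=1}^{k}(m_i - m)^2}{k}}$$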

Effect Size - ANOVA
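The f and η² values paired in the conventions below are linked by a standard conversion:

$$\eta^2 = \frac{f^2}{1 + f^2} \qquad \Longleftrightarrow \qquad f = \sqrt{\frac{\eta^2}{1 - \eta^2}}$$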

Cohen's Conventions for ANOVA (f and η²)

Magnitude   f     η²
Small       .10   .01
Medium      .25   .06
Large       .40   .14

Cohen's standards for small, medium, and large effect sizes for the one-way between-subjects analysis of variance (ANOVA).

Power Table – ANOVA (abridged)

           Effect Size (f, η²)
Power      f = .10; η² = .01 (Small)   f = .25; η² = .06 (Medium)   f = .40; η² = .14 (Large)
.50        167 (501)                   28 (84)                      12 (36)
.60        209 (627)                   35 (105)                     14 (42)
.70        258 (774)                   43 (129)                     18 (54)
.80        323 (969)                   53 (159)                     22 (66)

Sample size required per group (with total N in parentheses) for a given level of power and effect size for the one-way between-subjects ANOVA (α = .05). The values provided are based on 3 groups; a larger N is required to achieve the same level of power as the number of groups increases.

Effect Size – Multiple Regression
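Similarly, the f² and R² values paired in the conventions below are linked by the standard conversion:

$$f^2 = \frac{R^2}{1 - R^2}$$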

Cohen's conventions for Multiple Regression (f² and R²)

Magnitude   f²    R²
Small       .02   .02
Medium      .15   .13
Large       .35   .26

Cohen's standards for small, medium, and large effect sizes for multiple regression.

Power Table – Multiple Regression (abridged)

           Effect Size (f², R²)
Power      f² = .02; R² = .02 (Small)   f² = .15; R² = .13 (Medium)   f² = .35; R² = .26 (Large)

Sample size (N) required for a given level of power and effect size for multiple regression (α = .05). The values provided are based on 3 predictors (IVs); a larger N is required to achieve the same level of power as the number of predictors increases.

Estimating Power using GPower GPower illustration…
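GPower itself is a point-and-click program, so no script appears in the slides. As a rough companion, here is a hedged sketch of the same kind of a priori sample-size calculation in Python with statsmodels; the 80% power, α = .05 setup and the medium effect sizes are assumptions for illustration.

```python
# Hedged sketch: a priori sample-size estimation in Python, as a cross-check
# on GPower output. Assumed setup: 80% power, alpha = .05, medium effects.
from math import ceil
from statsmodels.stats.power import FTestAnovaPower, GofChisquarePower

# One-way between-subjects ANOVA, 3 groups, medium effect (f = .25).
total_n = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                        power=0.80, k_groups=3)
print(f"ANOVA: total N ≈ {ceil(total_n)}")  # close to the table's 159 (~53 per group)

# Chi-square with df = 1 (a 2 x 2 table), medium effect (w = .30).
# GofChisquarePower uses the same noncentral chi-square calculation,
# with df = n_bins - 1.
n = GofChisquarePower().solve_power(effect_size=0.30, alpha=0.05,
                                    power=0.80, n_bins=2)
print(f"Chi-square: N ≈ {ceil(n)}")         # about 88
```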