The ability to find a difference when one really exists.

Statistical Power: The ability to find a difference when one really exists.

Statistical Power: The probability of rejecting a false null hypothesis (H0), that is, the probability of obtaining a value of t (or z) that is large enough to reject H0 when H0 is actually false. We always test the null hypothesis against an alternative (research) hypothesis. Usually the goal is to reject the null hypothesis in favor of the alternative.

Why is Power Important? As researchers, we put a lot of effort into designing and conducting our research. This effort may be wasted if we do not have sufficient power in our studies to find the effect of interest.

Type I versus Type II Error: A researcher can make two types of error when reporting the results of a statistical test. The table crosses the researcher's decision with the actual state of reality.

Researcher Decision | H0 is true | H0 is false
Reject H0 | Type I error (α) | Correct decision (1 – β)
Fail to reject H0 | Correct decision (1 – α) | Type II error (β)

Type I Error: The probability of a Type I error is determined by the alpha (α) level set by the researcher.

Type II Error: A Type II error results when the researcher concludes that there isn't a difference when there really is one. Its probability is β.

Statistical Power: Power is the ability of a test to detect a real effect. It is measured as a probability that equals 1 – β.

Power depends on… To discuss power we need to understand the variables that affect its size: the alpha level set by the researcher, the sample size (N), and the effect size (e.g., Cohen's d).

Power and Alpha (α): An increase in alpha, say from .05 to .10, artificially increases the power of a study, because it only loosens the criterion for significance. Increasing alpha reduces the risk of making a Type II error but increases the risk of a Type I error. In many cases a Type I error may be worse than a Type II error, e.g., replacing an effective chemotherapy drug with one that is, in reality, less effective.
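The trade-off between alpha and power can be sketched numerically. The snippet below is a rough illustration only: it uses a normal (z) approximation rather than the t-distribution, and the true difference (6) and standard error (2) are hypothetical values chosen for the demonstration. The function name `approx_power` is likewise an assumption, not something from the slides.

```python
from statistics import NormalDist


def approx_power(true_diff, se_diff, alpha):
    """Approximate two-tailed power using the standard normal distribution.

    Assumption: a z approximation is used in place of the t-distribution,
    so results are slightly optimistic for small samples.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-tailed critical z
    shift = true_diff / se_diff         # true difference in standard-error units
    # Probability of landing beyond either critical value when the
    # sampling distribution is centred on the true difference.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical inputs: true difference of 6 units, standard error of 2.
for alpha in (0.01, 0.05, 0.10):
    print(alpha, round(approx_power(6.0, 2.0, alpha), 3))
```

With no true difference (`true_diff = 0`) the function returns alpha itself, which is the Type I error rate.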

Power and Sample Size (N): Power increases as N increases. The more independent scores that are measured or collected, the more closely the sample mean is likely to approximate the true mean. Prior to a study, researchers rearrange the power calculation to determine how many scores (subjects, or N) are needed to achieve a certain level of power (usually 80%).
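Rearranging the power calculation for N can be sketched as follows. This is a minimal illustration that assumes a two-tailed, two-group comparison with equal group sizes and uses the common normal-approximation formula n = 2((z_{1-α/2} + z_{power}) / d)² per group. The function name `n_per_group` is hypothetical, and a t-based calculation would typically ask for one or two more subjects per group.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-group comparison.

    Assumption: normal approximation, equal group sizes, two-tailed test.
    effect_size_d is the standardized difference between means (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


# Smaller effects need many more subjects to reach the same power.
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```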

Power and Effect Size: Effect size is a measure of the difference between the means of two groups of data, for example, the difference in mean jump height between samples of volleyball and basketball players. As effect size increases, so does power. For example, if the difference in mean jump height were very large, it would be very likely that a t-test on the two samples would detect that true difference.

A Little More on Effect Size: While a p-value indicates the statistical significance of a test, the effect size indicates the "practical" significance. If the units of measurement are meaningful (e.g., jump height in cm), then the effect size can simply be portrayed as the difference between two means. If the units of measurement are not meaningful (e.g., a questionnaire on behaviour), then a standardized method of calculating effect size is useful.

Cohen's d: Cohen's d is a common effect size index. It describes the difference between two means in terms of the number of standard deviations. The pooled standard deviation (σpooled) is the square root of the weighted average of the variances from both samples.
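As a sketch of the calculation described above, the snippet below computes Cohen's d from summary statistics, weighting each sample variance by its degrees of freedom (n – 1). The function names and the example numbers (means of 50 and 55, standard deviations of 10, 20 subjects per group) are hypothetical and chosen only to illustrate the arithmetic.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Square root of the variance average weighted by degrees of freedom."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Difference between two means expressed in pooled standard deviations."""
    return (mean2 - mean1) / pooled_sd(sd1, n1, sd2, n2)


# Hypothetical questionnaire scores for two groups of 20 subjects each.
d = cohens_d(50.0, 10.0, 20, 55.0, 10.0, 20)
print(round(d, 2))
```

When the two standard deviations are equal, the pooled value is that same standard deviation, so d reduces to the mean difference divided by it.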

Hypothetical Example: To understand statistical power, the following slides work through a hypothetical example. Assume that we know the actual effect size, that is, the actual difference between the means.

Jump Height Example: Basketball vs. volleyball, who jumps higher? We have 16 athletes in each sample (N = 16 per group). We know the population means (± standard deviation) are: Basketball: mean jump height = 30 ± 5.7 in. Volleyball: mean jump height = 36 ± 5.7 in. Alpha = .05. Using the above information we can graphically demonstrate statistical power. Knowing there is a difference, how many times out of 100 tests would we be correct?

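Step 1. What's t-critical for the study? For what t score will we consider there to be a significant difference between basketball and volleyball players? We know N = 32 in total (df = 30), α = .05, and the test is 2-tailed. Use =tinv() in Excel: tinv(.05/2, 30) = 2.36, so t critical = 2.36. Note that Excel's tinv() already returns the two-tailed value, so tinv(.05/2, 30) corresponds to a two-tailed α of .025, whereas tinv(.05, 30) = 2.04 is the critical value for a two-tailed α of .05. The remaining steps use 2.36.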

Step 2. [Figure: t-distribution for 30 degrees of freedom, with rejection regions labelled α/2 = .025 beyond t critical = -2.36 and t critical = +2.36.] We know the mean jump height for basketball is 30 in. What would the volleyball mean need to be to get t = 2.36? Use the independent t-test equation. For both groups, N = 16 and the standard deviation is 5.7.
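The arithmetic for Step 2 can be sketched as below. Solving the independent t-test equation, t = (mean difference) / SEDiff, for the volleyball mean gives 30 + t critical × SEDiff. The helper names are hypothetical, and the standard error of the difference is computed from the slide's values (standard deviation 5.7, N = 16 per group), assuming equal variances.

```python
import math


def se_of_difference(sd, n_per_group):
    """Standard error of the difference between two independent means,
    assuming both groups share the same standard deviation and size."""
    return math.sqrt(sd ** 2 / n_per_group + sd ** 2 / n_per_group)


def critical_mean(null_mean, t_crit, se_diff):
    """Smallest sample mean that reaches t_crit against null_mean."""
    return null_mean + t_crit * se_diff


se_diff = se_of_difference(5.7, 16)          # roughly 2, as used on the slides
crit = critical_mean(30.0, 2.36, se_diff)    # 2.36 is the slide's t critical
print(round(se_diff, 2), round(crit, 1))
```

The result rounds to the 34.8 in. critical value used in the next step.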

Step 3. [Figure: t-distribution for 30 degrees of freedom with t critical = ±2.36 (α/2 = .025 in each tail), aligned with the distribution of values of the volleyball sample mean (X̄V) if H0 is true (μV = 30) and SEDiff is 2.] The critical value of X̄V is 34.8 in. On the actual distribution of X̄V, which has a mean of 36, what would be the t-value for 34.8 in.? Can you calculate the probability (area) of getting a mean ≤ 34.8 if the real mean is 36? That area is the probability of a Type II error.

Step 4. [Figure: t-distribution for 15 degrees of freedom, aligned with the distribution of values of X̄V if μV = 36 and SEMean is 1.43. The critical value of 34.8 in. falls at t = -0.87.] Use tdist(t, df, tails) in Excel: tdist(.87, 15, 1) = .20, so β = .20 and power (1 – β) = .80.
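The slide's Step 4 numbers can be reproduced with the sketch below. It is only an illustration of the arithmetic shown on the slide, not a rigorous power analysis. Python's standard library has no t-distribution, so the one-tailed area that Excel's tdist() returns is approximated here by integrating the t density with Simpson's rule. That substitution and the helper names are assumptions.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=1000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


se_diff = math.sqrt(5.7 ** 2 / 16 + 5.7 ** 2 / 16)  # standard error of the difference
crit_mean = 30.0 + 2.36 * se_diff                   # critical volleyball mean (about 34.8)
se_mean = 5.7 / math.sqrt(16)                       # about 1.43, as on the slide
t_value = (crit_mean - 36.0) / se_mean              # about -0.87
beta = t_upper_tail(abs(t_value), 15)               # by symmetry, P(T < t_value)
print(round(t_value, 2), round(beta, 2), round(1 - beta, 2))
```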

Step 4 (continued). [Figure: the distribution of values of X̄V if H0 is true (μV = 30, SEDiff = 2), with t critical = ±2.36 and α/2 = .025 in each tail on a t-distribution for 30 degrees of freedom, overlaid with the distribution of values of X̄V if μV = 36 (SEMean = 1.43). The critical value of 34.8 in. splits the second distribution into β = .20 below it and power (1 – β) = .80 above it.]
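The question posed in the jump height example, how often a test would detect the known difference, can also be checked by simulation. The sketch below repeatedly draws two samples of 16 from normal populations with the slide's means and standard deviation, runs a pooled-variance t-test on each pair, and counts the rejections. Two things here are assumptions rather than part of the slides: the populations are taken to be normal, and the critical value is 2.042, the standard table value for a two-tailed α of .05 with 30 degrees of freedom, not the 2.36 used in the steps above.

```python
import math
import random
import statistics


def pooled_t(sample_a, sample_b):
    """Independent-samples t statistic using a pooled variance estimate."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(sample_b) - statistics.mean(sample_a)) / se_diff


def rejection_rate(mean_a, mean_b, sd, n, t_crit, trials, seed=0):
    """Fraction of simulated studies whose |t| exceeds t_crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(mean_a, sd) for _ in range(n)]
        b = [rng.gauss(mean_b, sd) for _ in range(n)]
        if abs(pooled_t(a, b)) > t_crit:
            rejections += 1
    return rejections / trials


T_CRIT = 2.042  # assumption: two-tailed alpha = .05, df = 30 (table value)

# True difference of 6 in.: the rejection rate estimates power.
power = rejection_rate(30.0, 36.0, 5.7, 16, T_CRIT, trials=4000)
# No true difference: the rejection rate estimates the Type I error rate.
type_i = rejection_rate(30.0, 30.0, 5.7, 16, T_CRIT, trials=4000)
print(round(power, 2), round(type_i, 3))
```

The estimates vary with the seed and the number of trials, so they are best read as a rough cross-check on the hand calculation rather than as exact values.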

Power Calculator http://www.stat.ubc.ca/~rollin/stats/ssize/n2.html