Hypothesis Testing and Confidence Intervals (Part 2): Cohen’s d, Logic of Testing, and Confidence Intervals Lecture 9 Justin Kern April 9, 2018.

Measuring Effect Size: Cohen’s d

Simply finding whether a hypothesis test is significant or not tells us very little. It only tells us whether an effect exists in a population; it does not tell us how much of an effect there actually is! Definition: Effect size is a statistical measure of the magnitude of an effect (how far a sample statistic falls from a population parameter) in a population. This allows researchers to describe how far scores have shifted in the population, or the percent of variability in scores that can be explained by a given variable.

Measuring Effect Size: Cohen’s d

One common measure of effect size is Cohen’s d:

    d = (x̄ − μ) / σ

The direction of the shift is determined by the sign of d:
- d < 0 means that the scores are shifted to the left of μ.
- d > 0 means that the scores are shifted to the right of μ.

Larger absolute values of d mean that there is a larger effect. There are standard conventions for describing effect size, but take these with a grain of salt:
- Small effect: |d| < 0.2
- Medium effect: 0.2 < |d| < 0.8
- Large effect: |d| > 0.8
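The formula above is a one-line computation; the sketch below (plain Python, with the helper names `cohens_d` and `effect_label` chosen here for illustration) also maps |d| to the rough conventional labels, treating the d = 0.2 boundary as "small" to match the worked example that follows.

```python
def cohens_d(sample_mean, pop_mean, pop_sd):
    """Standardized distance between a sample mean and a known population mean."""
    return (sample_mean - pop_mean) / pop_sd

def effect_label(d, small=0.2, large=0.8):
    """Rough conventional size label for |d| (boundaries are fuzzy by convention)."""
    size = abs(d)
    if size <= small:
        return "small"
    if size < large:
        return "medium"
    return "large"

# Values from the training-program example: sample mean 82, mu = 80, sigma = 10
d = cohens_d(82, 80, 10)
print(d, effect_label(d))  # 0.2 small
```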

Example

A developmental psychologist claims that a training program he developed according to a theory should improve problem-solving ability. For a population of 7-year-olds, the mean score μ on a standard problem-solving test is known to be 80, with a standard deviation of 10. To test the training program, 26 7-year-olds are selected at random, and their mean score is found to be 82. Assume the population of scores is normally distributed, and that the standard deviation of scores after the training program is also 10. Can we conclude, at an α = .05 level of significance, that the program works?

- We hypothesize that the training improves problem solving. The null and alternative hypotheses are H0: μ = 80 vs. H1: μ > 80.
- The data are normally distributed, so X̄ ~ N(μ, σ²/n). A z-statistic can then be formed:

    z = (x̄ − μ0) / (σ/√n) = (82 − 80) / (10/√26) ≈ 1.0198

- Critical value method: α = .05 → z_α = 1.644854. Since z = 1.0198 < 1.644854 = z_α, we cannot reject H0.
- Substantively, this means that a mean score of 82 is reasonably likely to occur by chance, so we cannot say that the training program improved problem-solving skills in 7-year-olds.
- What is the effect size as measured by Cohen’s d?

    d = (x̄ − μ) / σ = (82 − 80) / 10 = 0.20

  This is a small effect size. (The same decision follows from the p-value method; the two approaches are two views of the same test.)
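This test can be checked with Python's standard library: `NormalDist().inv_cdf` gives the critical value and `cdf` gives the p-value, which makes the duality between the two decision methods concrete (both lead to the same conclusion).

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 82, 80, 10, 26, 0.05
z = (x_bar - mu0) / (sigma / sqrt(n))      # test statistic, about 1.0198

# Critical value method: reject H0 if z exceeds z_alpha
z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645
reject_by_critical = z > z_alpha           # False: cannot reject H0

# p-value method: reject H0 if the p-value falls below alpha
p_value = 1 - NormalDist().cdf(z)          # about 0.154
reject_by_p = p_value < alpha              # False: the same decision
```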

Basic Hypothesis Testing

Consider a very popular computer game played by millions of people all over the world. The average score of the game is known to be μ = 5000 and the standard deviation is known to be σ = 1000. Suppose the game developers have just created a one-week tutorial to help players increase their performance. In order to find out if it actually works, they administer the tutorial to a random sample of n = 100 players, whose average score is calculated to be x̄ = 5200 after one week. Does the tutorial actually work, or did those players happen to get an average score as high as 5200 just by chance?

- H0: μ = 5000 vs. H1: μ > 5000
- z = (x̄ − μ) / (σ/√n) = (5200 − 5000) / (1000/√100) = 2
- p-value = P(Z > z) = P(Z > 2) = 0.0228
- Interpretation of the p-value: given that the tutorial actually doesn’t work (H0), there is only a 2.28% chance that a random sample of 100 players gets an average score as high as 5200 (or higher).
- How low should the p-value be before we are convinced that the tutorial does work (reject H0 in favor of H1)? This choice is somewhat arbitrary, but common standards are α = 0.05 and α = 0.01. Note that α is the cutoff p-value: below it we reject H0, above it we fail to reject H0.
- If we use α = 0.05, then (p-value = 0.0228) < (α = 0.05), so we reject H0 and conclude that the tutorial works.
- If we use α = 0.01, then (p-value = 0.0228) > (α = 0.01), so we fail to reject H0 and cannot conclude that the tutorial works.
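The same computation in a minimal stdlib sketch, showing how the one p-value supports both decisions:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 5000, 1000, 100, 5200
z = (x_bar - mu0) / (sigma / sqrt(n))  # = 2.0
p_value = 1 - NormalDist().cdf(z)      # about 0.0228

print(p_value < 0.05)  # True: reject H0 at alpha = 0.05
print(p_value < 0.01)  # False: fail to reject H0 at alpha = 0.01
```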

Type I Error

- Let’s say that in reality H0 is true (i.e., H1 is false: the tutorial doesn’t work).
- Suppose we repeat the experiment over and over again (repeatedly draw random samples of 100 players and put them through the tutorial) and conduct a hypothesis test each time using α = 0.05.
- We would falsely reject H0 (mistakenly decide that the tutorial works) 5% of the time just due to chance.
- In other words, α is the probability of rejecting H0 when in reality it is true (incorrectly deciding that the tutorial works when it actually doesn’t).
- Type I error: rejecting H0 when, in fact, it is true; α = P(Type I error). Ultimately, α is the probability of a Type I error we are willing to live with.
- The critical sample mean corresponding to α: from z_α = (x̄_α − μ) / (σ/√n), we get x̄_α = z_α (σ/√n) + μ. With α = 0.05, z_0.05 = 1.645, so

    x̄_0.05 = 1.645 (1000/√100) + 5000 = 5164.5
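The rearrangement from z_α to x̄_α is just the standardization formula run backwards; a short check of the arithmetic:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 5000, 1000, 100, 0.05
se = sigma / sqrt(n)                       # standard error of the mean = 100
z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645

# Invert z_alpha = (x_crit - mu0) / se to put the cutoff on the score scale
x_crit = z_alpha * se + mu0                # about 5164.5

# Reject H0 whenever the observed sample mean exceeds x_crit
```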

Type II Error

- Now let’s say that in reality H0 is false (i.e., H1 is true: the tutorial does work).
- Under the (false) premise of H0, the population mean of players who take the tutorial is μ0 = 5000 (tutorial doesn’t work), and the standard deviation of the sampling distribution is σ/√n = 1000/√100 = 100.
- Suppose that, in fact, the tutorial actually increases a player’s score by 300 points on average.
- Under the (true) premise of H1, the population mean of players who take the tutorial is μ1 = 5300 (tutorial increases the mean score by 300), but assume the standard error stays the same at σ/√n = 100.

Type II Error

- Type II error: failing to reject H0 when, in fact, it is false (incorrectly deciding the tutorial doesn’t work when it actually does); β = P(Type II error).
- The critical sample mean under H0: x̄_α = z_α (σ/√n) + μ0 → x̄_0.05 = 1.645 (100) + 5000 = 5164.5.
- Standardizing the critical value against the true distribution under H1:

    z_β = (x̄_α − μ1) / (σ/√n) = (5164.5 − 5300) / 100 = −1.355

- β = P(Z < z_β) = P(Z < −1.355) = 0.0877

Power

- Power: the probability of rejecting H0 when, in fact, it is false (correctly deciding the tutorial works when it actually does).
- Power is the complement of β: Power = 1 − β = 1 − 0.0877 = 0.9123.
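Putting the last two slides together: β and power follow from the critical value computed under H0, evaluated against the true sampling distribution under H1. A sketch using the standard library:

```python
from math import sqrt
from statistics import NormalDist

mu0, mu1, sigma, n, alpha = 5000, 5300, 1000, 100, 0.05
se = sigma / sqrt(n)  # standard error = 100, assumed the same under H0 and H1

# Critical sample mean computed under H0
x_crit = NormalDist().inv_cdf(1 - alpha) * se + mu0  # about 5164.5

# Beta: probability, under H1, that the sample mean still falls below x_crit
beta = NormalDist(mu1, se).cdf(x_crit)  # about 0.0877

# Power: probability, under H1, of correctly rejecting H0
power = 1 - beta                        # about 0.9123
```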