Hypothesis Testing: p-value

Presentation transcript:

Hypothesis Testing: p-value 2/13/12 Randomization distribution p-value Statistical significance Section 4.2 Professor Kari Lock Morgan Duke University

To Do Project 1 Proposal (due Wednesday) Homework 4 (due Monday) NO LATE HOMEWORK ACCEPTED!

Exercise and Pulse In the actual experiment, the people who exercised for 5 seconds had an average pulse of 85.5. Those who did not exercise had an average pulse of 69.6. Is this sample difference larger than we would see, just by random chance, if exercising for 5 seconds did not increase pulse rate?

Exercise and Pulse p-value www.lock5stat.com/statkey/ If 5 seconds of exercise does not increase pulse rate, we would see a sample difference as extreme as 15.9 in only 0.002 of all such experiments.

Cocaine Addiction In a randomized experiment on treating cocaine addiction, 48 people were randomly assigned to take either Desipramine (a new drug) or Lithium (an existing drug), and then followed to see who relapsed. Question of interest: Is Desipramine better than Lithium at treating cocaine addiction?

Cocaine Addiction What is the statistic of interest? What are the hypotheses of this test?

1. Randomly assign units to treatment groups (Desipramine and Lithium). [Figure: the 48 participants are split at random into the two treatment groups.]

1. Randomly assign units to treatment groups. 2. Conduct experiment. 3. Observe relapse counts in each group (R = Relapse, N = No Relapse). Desipramine: 10 relapse, 14 no relapse. Lithium: 18 relapse, 6 no relapse.

Cocaine Addiction Two options: (1) H0 is true (the drugs cause the same proportion of relapses). (2) Ha is true (Desipramine causes a smaller proportion of relapses than Lithium). In situation (1), how would you explain the observed difference in the proportion of relapses? How can we see whether this is a plausible explanation?

Measuring Evidence against H0 To see if a statistic provides evidence against H0, we need to see what kind of sample statistics we would observe, just by random chance, if H0 were true

Randomization Test 1. Assume the null hypothesis is true. 2. Simulate new randomizations. 3. For each, calculate the statistic of interest. 4. Find the proportion of these simulated statistics that are as extreme as your observed statistic.
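
The four steps above can be sketched in Python for the cocaine addiction data (10 of 24 relapses on Desipramine, 18 of 24 on Lithium). This is a minimal illustration, not the StatKey implementation: the function name `randomization_test`, the number of simulations, and the seed are assumptions made for the example.

```python
import random

def randomization_test(relapse_a, n_a, relapse_b, n_b, n_sims=10_000, seed=0):
    """One-sided randomization test for a difference in proportions (a - b).

    Returns the observed difference and the proportion of simulated
    differences that are as low as the observed one.
    """
    rng = random.Random(seed)
    # Under H0 the treatment has no effect, so pool all outcomes:
    # 1 = relapse, 0 = no relapse.
    total_relapse = relapse_a + relapse_b
    pooled = [1] * total_relapse + [0] * (n_a + n_b - total_relapse)
    observed = relapse_a / n_a - relapse_b / n_b

    as_extreme = 0
    for _ in range(n_sims):
        rng.shuffle(pooled)             # simulate a new random assignment
        sim_a = sum(pooled[:n_a])       # relapses that land in group a
        sim_b = total_relapse - sim_a   # the rest land in group b
        sim_diff = sim_a / n_a - sim_b / n_b
        if sim_diff <= observed + 1e-12:  # small tolerance for float ties
            as_extreme += 1
    return observed, as_extreme / n_sims

# Desipramine: 10 of 24 relapsed; Lithium: 18 of 24 relapsed.
observed, p_value = randomization_test(10, 24, 18, 24)
print(f"observed difference = {observed:.2f}, estimated p-value = {p_value:.3f}")
```

Because the p-value is estimated by simulation, it varies slightly with the seed and the number of simulations; it should land near the 0.02 reported from StatKey.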

[Figure: observed outcomes, R = relapse, N = no relapse.] Desipramine: 10 relapse, 14 no relapse. Lithium: 18 relapse, 6 no relapse.

Simulate another randomization. [Figure: the observed outcomes are reshuffled between the two groups.] Desipramine: 16 relapse, 8 no relapse. Lithium: 12 relapse, 12 no relapse.

Simulate another randomization. [Figure: the observed outcomes are reshuffled again.] Desipramine: 17 relapse, 7 no relapse. Lithium: 11 relapse, 13 no relapse.

Cocaine Addiction www.lock5stat.com/statkey [Figure: randomization distribution with the observed statistic marked; the p-value is the proportion as extreme as the observed statistic.] The probability of getting a sample difference in proportions as low as -0.33 just by random chance, if the drugs really are equally effective, is 0.02.

p-value Based on a randomization distribution, the p-value is the proportion of statistics that are as extreme as the observed statistic. This is the area in the tail(s) beyond the observed statistic in the randomization distribution. Which tail(s) to include depends on the alternative hypothesis.

Alternative Hypothesis A one-sided alternative contains either > or <. A two-sided alternative contains ≠. The alternative hypothesis depends on the research question of interest. For a one-sided alternative, the p-value is the proportion in the tail specified by Ha. For a two-sided alternative, the p-value is twice the proportion in the smaller tail.
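
The tail rules on this slide can be written as a small helper. This is a sketch under stated assumptions: the function name `p_value_from_randomization`, the labels `"less"`, `"greater"`, and `"two-sided"`, and the toy list of simulated statistics are all hypothetical and do not come from the slides.

```python
def p_value_from_randomization(sim_stats, observed, alternative):
    """p-value from a list of simulated statistics generated under H0.

    alternative: "less" (Ha contains <), "greater" (Ha contains >),
    or "two-sided" (Ha contains a not-equal sign).
    """
    n = len(sim_stats)
    lower = sum(s <= observed for s in sim_stats) / n  # lower-tail proportion
    upper = sum(s >= observed for s in sim_stats) / n  # upper-tail proportion
    if alternative == "less":
        return lower
    if alternative == "greater":
        return upper
    if alternative == "two-sided":
        # Twice the smaller tail, capped at 1 so it remains a proportion.
        return min(1.0, 2 * min(lower, upper))
    raise ValueError(f"unknown alternative: {alternative!r}")

# Hypothetical toy randomization distribution (not data from the slides).
toy_stats = [-3, -2, -1, -1, 0, 0, 0, 1, 1, 2]
p_greater = p_value_from_randomization(toy_stats, 2, "greater")
p_less = p_value_from_randomization(toy_stats, 2, "less")
p_two_sided = p_value_from_randomization(toy_stats, 2, "two-sided")
print(p_greater, p_less, p_two_sided)
```

With the observed statistic in the upper tail, the "less" p-value is large. That is the situation in which the sample does not support the alternative and the test cannot give evidence against H0.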

Sleep or Caffeine for Memory? Students were given words to memorize, and then randomly assigned to either take a 90-minute nap or stay awake and take a caffeine pill. 2 ½ hours later, all students were tested on their recall ability. Is sleep or caffeine better for memory? How extreme would this be if H0 were true? Mednick, Cai, Kanady, and Drummond (2008). “Comparing the benefits of caffeine, naps and placebo on verbal, motor and perceptual memory,” Behavioural Brain Research, 193, 79-86.

Sleep or Caffeine for Memory? www.lock5stat.com/statkey p-value = 2 × 0.022 = 0.044. If sleep and caffeine are equally effective for memory, we would get a sample difference in means as extreme as 3 in about 0.044 of all experiments.

Infections in Childbirth The Centers for Disease Control (CDC) conducted a randomized trial in South Africa in which half of the women in labor were randomly assigned to be treated with a wipe containing chlorhexidine, and the other half with a sterile wipe (control). Source: Eriksen, Sweeten, Blanco (1997). “Chlorhexidine vs Sterile Vaginal Wash During Labor to Prevent Peripartum Infection,” American Journal of Obstetrics and Gynecology, 176:426-430.

Infections in Childbirth What can you conclude about the p-value? (a) p-value < 0.5 (b) p-value > 0.5 (c) Nothing

Alternative Hypothesis The p-value is the probability of getting results as extreme as your sample statistic, if the null hypothesis is true. “As extreme as” is defined in the direction of the alternative hypothesis (for two-sided alternatives, consider both tails). If your sample statistic does not support your alternative hypothesis, there is no point in going through the test!

p-value What can you conclude about the p-value? p-value = 0 Nothing

p-value What can you conclude about the p-value? p-value < 0.5 Nothing

Multiple Sclerosis and Sunlight It is believed that sunlight offers some protection against multiple sclerosis, but the reason is unknown. To find out, researchers randomly assigned mice to one of three treatments: control (nothing), vitamin D supplements, or UV light. All mice were injected with proteins known to induce a mouse form of MS, and the researchers observed which mice got MS. Seppa, Nathan. “Sunlight may cut MS risk by itself,” Science News, April 24, 2010, pg 9, reporting on a study appearing March 22, 2010 in the Proceedings of the National Academy of Sciences.

Multiple Sclerosis and Sunlight In testing whether UV light provides protection against MS in mice, what are the null and alternative hypotheses? pUV = proportion of mice exposed to UV light that get MS; pC = proportion of mice not exposed to UV light that get MS. (a) H0: pUV – pC > 0, Ha: pUV – pC = 0 (b) H0: pUV – pC < 0, Ha: pUV – pC = 0 (c) H0: pUV – pC = 0, Ha: pUV – pC > 0 (d) H0: pUV – pC = 0, Ha: pUV – pC < 0

Multiple Sclerosis and Sunlight In testing whether UV light provides protection against MS in mice, the experiment yielded a p-value of 0.002. What would you conclude? (a) H0 is probably not true ⇒ UV light does provide protection against MS (b) H0 is probably not true ⇒ UV light does not provide protection against MS (c) Ha is probably not true ⇒ UV light does provide protection against MS (d) Ha is probably not true ⇒ UV light does not provide protection against MS (e) Nothing

Multiple Sclerosis and Sunlight In testing whether Vitamin D provides protection against MS in mice, the experiment yielded a p-value of 0.47. What would you conclude? (a) H0 is probably not true ⇒ Vitamin D does provide protection against MS (b) H0 is probably not true ⇒ Vitamin D does not provide protection against MS (c) Ha is probably not true ⇒ Vitamin D does provide protection against MS (d) Ha is probably not true ⇒ Vitamin D does not provide protection against MS (e) Nothing

Strength of Evidence The p-value is the probability of getting results as extreme as those observed, if the null hypothesis is true. The p-value measures evidence against the null hypothesis.

The smaller the p-value, the stronger the evidence against H0.

Hypothesis Testing If the p-value is small enough, we reject the null hypothesis in favor of the alternative hypothesis. How small is small enough?

Statistical Significance The significance level, α, is the threshold below which the p-value is deemed small enough to reject the null hypothesis. If the p-value is less than α, the results are statistically significant, and we reject the null hypothesis in favor of the alternative.

Statistical Significance www.xkcd.com

Formal Decisions For a given significance level, α: p-value < α ⇒ Reject H0. p-value > α ⇒ Do not reject H0.
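
The formal decision rule can be sketched as a one-line comparison. The function name `formal_decision` is an assumption for illustration; the default α = 0.05 and the two p-values are the ones used elsewhere in these slides (0.002 for UV light, 0.47 for Vitamin D). A p-value exactly equal to α is treated as "do not reject", following the summary's "otherwise do not reject H0".

```python
def formal_decision(p_value, alpha=0.05):
    """Return the formal decision of a hypothesis test at level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Reject only when the p-value is strictly below the significance level.
    return "Reject H0" if p_value < alpha else "Do not reject H0"

# p-values reported in the multiple sclerosis slides.
decisions = {p: formal_decision(p) for p in (0.002, 0.47)}
for p, decision in decisions.items():
    print(f"p-value = {p}: {decision}")
```

Note that "do not reject H0" is not the same as accepting H0: a large p-value only means the data do not provide convincing evidence against the null hypothesis.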

Statistical Conclusions Strength of evidence against H0: Formal decision of hypothesis test, based on α = 0.05:

Formal Decisions A formal hypothesis test has only two possible conclusions: (1) The p-value is small: reject the null hypothesis in favor of the alternative. (2) The p-value is not small: do not reject the null hypothesis.

Elephant Example H0: X is an elephant. Ha: X is not an elephant. What would you conclude if you got the following data? (1) X has four legs. (2) X walks on two legs.

Summary A randomization distribution shows the distribution of statistics that would be observed if H0 were true. A p-value is the probability of getting a statistic as extreme as that observed, if H0 is true. The p-value measures the strength of evidence against the null hypothesis. Results are statistically significant if the p-value is less than the significance level, α. In making formal decisions, reject H0 if the p-value is less than α; otherwise do not reject H0.