Confidence Intervals and Hypothesis Tests

Slides:



Advertisements
Similar presentations
Introducing Hypothesis Tests
Advertisements

Hypothesis Testing, Synthesis
Hypothesis Testing: Intervals and Tests
Hypothesis Testing An introduction. Big picture Use a random sample to learn something about a larger population.
Inference Sampling distributions Hypothesis testing.
Last Time (Sampling &) Estimation Confidence Intervals Started Hypothesis Testing.
Hypothesis Testing I 2/8/12 More on bootstrapping Random chance
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: p-value STAT 250 Dr. Kari Lock Morgan SECTION 4.2 Randomization distribution p-value.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: Significance STAT 101 Dr. Kari Lock Morgan SECTION 4.3, 4.5 Significance level (4.3)
Chapter 16: Inference in Practice STAT Connecting Chapter 16 to our Current Knowledge of Statistics ▸ Chapter 14 equipped you with the basic tools.
STAT E100 Exam 2 Review.
EPIDEMIOLOGY AND BIOSTATISTICS DEPT Esimating Population Value with Hypothesis Testing.
Stat 301 – Day 28 Review. Last Time - Handout (a) Make sure you discuss shape, center, and spread, and cite graphical and numerical evidence, in context.
Today Today: Chapter 10 Sections from Chapter 10: Recommended Questions: 10.1, 10.2, 10-8, 10-10, 10.17,
Stat 350 Lab Session GSI: Yizao Wang Section 016 Mon 2pm30-4pm MH 444-D Section 043 Wed 2pm30-4pm MH 444-B.
Inferences About Means of Single Samples Chapter 10 Homework: 1-6.
Section 4.4 Creating Randomization Distributions.
Stat 217 – Day 15 Statistical Inference (Topics 17 and 18)
Chapter 6: Introduction to Formal Statistical Inference November 19, 2008.
CHAPTER 23 Inference for Means.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: p-value STAT 101 Dr. Kari Lock Morgan 9/25/12 SECTION 4.2 Randomization distribution.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: Hypotheses STAT 101 Dr. Kari Lock Morgan SECTION 4.1 Statistical test Null and alternative.
Synthesis and Review 3/26/12 Multiple Comparisons Review of Concepts Review of Methods - Prezi Essential Synthesis 3 Professor Kari Lock Morgan Duke University.
1 Faculty of Social Sciences Induction Block: Maths & Statistics Lecture 5: Generalisability of Social Research and the Role of Inference Dr Gwilym Pryce.
Unit 7b Statistical Inference - 2 Hypothesis Testing Using Data to Make Decisions FPP Chapters 27, 27, possibly 27 &/or 29 Z-tests for means Z-tests.
June 19, 2008Stat Lecture 12 - Testing 21 Introduction to Inference More on Hypothesis Tests Statistics Lecture 12.
Testing Hypotheses about a Population Proportion Lecture 29 Sections 9.1 – 9.3 Tue, Oct 23, 2007.
More Randomization Distributions, Connections
Essential Synthesis SECTION 4.4, 4.5, ES A, ES B
Statistics: Unlocking the Power of Data Lock 5 Synthesis STAT 250 Dr. Kari Lock Morgan SECTIONS 4.4, 4.5 Connecting bootstrapping and randomization (4.4)
Confidence Intervals Nancy D. Barker, M.S.. Statistical Inference.
Introduction to Statistical Inference Probability & Statistics April 2014.
Comparing Two Population Means
Hypothesis Testing for Proportions
6.1 - One Sample One Sample  Mean μ, Variance σ 2, Proportion π Two Samples Two Samples  Means, Variances, Proportions μ 1 vs. μ 2.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 10 Comparing Two Populations or Groups 10.2.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 8-3 Testing a Claim About a Proportion.
Agresti/Franklin Statistics, 1 of 122 Chapter 8 Statistical inference: Significance Tests About Hypotheses Learn …. To use an inferential method called.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: Cautions STAT 250 Dr. Kari Lock Morgan SECTION 4.3, 4.5 Type I and II errors (4.3) Statistical.
Lecture 16 Section 8.1 Objectives: Testing Statistical Hypotheses − Stating hypotheses statements − Type I and II errors − Conducting a hypothesis test.
1 Chapter 8 Hypothesis Testing 8.2 Basics of Hypothesis Testing 8.3 Testing about a Proportion p 8.4 Testing about a Mean µ (σ known) 8.5 Testing about.
1 Chapter 9 Hypothesis Testing. 2 Chapter Outline  Developing Null and Alternative Hypothesis  Type I and Type II Errors  Population Mean: Known 
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 10 Comparing Two Populations or Groups 10.1.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 10 Comparing Two Populations or Groups 10.1.
Statistical Inference for the Mean Objectives: (Chapter 9, DeCoursey) -To understand the terms: Null Hypothesis, Rejection Region, and Type I and II errors.
Section 3.3: The Story of Statistical Inference Section 4.1: Testing Where a Proportion Is.
Chapter 16: Inference in Practice STAT Connecting Chapter 16 to our Current Knowledge of Statistics ▸ Chapter 14 equipped you with the basic tools.
BPS - 3rd Ed. Chapter 141 Tests of significance: the basics.
Slide Slide 1 Section 8-4 Testing a Claim About a Mean:  Known.
Business Statistics, A First Course (4e) © 2006 Prentice-Hall, Inc. Chap 9-1 Chapter 9 Fundamentals of Hypothesis Testing: One-Sample Tests Business Statistics,
Testing Hypotheses about a Population Proportion Lecture 29 Sections 9.1 – 9.3 Fri, Nov 12, 2004.
BPS - 5th Ed. Chapter 151 Thinking about Inference.
Lecture PowerPoint Slides Basic Practice of Statistics 7 th Edition.
Testing Hypotheses about a Population Proportion Lecture 29 Sections 9.1 – 9.3 Wed, Nov 1, 2006.
Exercise - 1 A package-filling process at a Cement company fills bags of cement to an average weight of µ but µ changes from time to time. The standard.
Statistics: Unlocking the Power of Data Lock 5 Inference for Means STAT 250 Dr. Kari Lock Morgan Sections 6.4, 6.5, 6.6, 6.10, 6.11, 6.12, 6.13 t-distribution.
Testing Hypotheses about a Population Proportion Lecture 31 Sections 9.1 – 9.3 Wed, Mar 22, 2006.
Statistics: Unlocking the Power of Data Lock 5 Section 4.5 Confidence Intervals and Hypothesis Tests.
AP Statistics Chapter 11 Notes. Significance Test & Hypothesis Significance test: a formal procedure for comparing observed data with a hypothesis whose.
Synthesis and Review 2/20/12 Hypothesis Tests: the big picture Randomization distributions Connecting intervals and tests Review of major topics Open Q+A.
Learning Objectives After this section, you should be able to: The Practice of Statistics, 5 th Edition1 DESCRIBE the shape, center, and spread of the.
Chapter 12 Tests of Hypotheses Means 12.1 Tests of Hypotheses 12.2 Significance of Tests 12.3 Tests concerning Means 12.4 Tests concerning Means(unknown.
Statistical Inference for the Mean Objectives: (Chapter 8&9, DeCoursey) -To understand the terms variance and standard error of a sample mean, Null Hypothesis,
Hypothesis Tests Hypothesis Tests Large Sample 1- Proportion z-test.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 10 Comparing Two Populations or Groups 10.1.
A Closer Look at Testing
Section 4.5 Making Connections.
CHAPTER 10 Comparing Two Populations or Groups
Comparing Two Proportions
Presentation transcript:

Confidence Intervals and Hypothesis Tests Section 4.5 Confidence Intervals and Hypothesis Tests

Bootstrap and Randomization Distributions Bootstrap Distribution Randomization Distribution Estimates the distribution of sample statistics Estimates the distribution of sample statistics, if H0 true Centered around the observed sample statistic Centered around the null hypothesized value Simulate sampling from the population by resampling from the original sample Simulate samples assuming H0 were true Big difference: a randomization distribution assumes H0 is true, while a bootstrap distribution does not

Which Distribution? Let  be the average amount of sleep college students get per night. Data was collected on a sample of students, and for this sample hours. A bootstrap distribution is generated to create a confidence interval for , and a randomization distribution is generated to see if the data provide evidence that  > 7. Which distribution below is the bootstrap distribution? (a) is centered around the sample statistic, 6.7

Which Distribution? Intro stat students are surveyed, and we find that 152 out of 218 are female. Let p be the proportion of intro stat students at that university who are female. A bootstrap distribution is generated for a confidence interval for p, and a randomization distribution is generated to see if the data provide evidence that p > 1/2. Which distribution is the randomization distribution? (a) is centered around the null value, 1/2

Body Temperature We created a bootstrap distribution for average body temperature by resampling with replacement from the original sample (

Body Temperature We also created a randomization distribution to see if average body temperature differs from 98.6F by adding 0.34 to every value to make the null true, and then resampling with replacement from this modified sample:

Body Temperature These two distributions are identical (up to random variation from simulation to simulation) except for the center The bootstrap distribution is centered around the sample statistic, 98.26, while the randomization distribution is centered around the null hypothesized value, 98.6 The randomization distribution is equivalent to the bootstrap distribution, but shifted over

Body Temperature Bootstrap Distribution Randomization Distribution 98.26 98.6 Randomization Distribution H0:  = 98.6 Ha:  ≠ 98.6 Talk about the fact that the null hypothesized value is in the extremes of the bootstrap distribution, so the sample statistic is in the extremes of the randomization distribution

Body Temperature Bootstrap Distribution Randomization Distribution 98.26 98.4 Randomization Distribution H0:  = 98.4 Ha:  ≠ 98.4 Talk about the fact that the null hypothesized value is not in the extremes of the bootstrap distribution, so the sample statistic is not in the extremes of the randomization distribution

Intervals and Tests If a 95% CI contains the parameter in H0, then a two-tailed test should not reject H0 at a 5% significance level. If a 95% CI misses the parameter in H0, then a two-tailed test should reject H0 at a 5% significance level.

Intervals and Tests A confidence interval represents plausible values for the population parameter If the null hypothesized value IS NOT within the CI, it is not plausible and should be rejected If the null hypothesized value IS within the CI, it is plausible and should not be rejected

Body Temperatures Using bootstrapping, we found a 95% confidence interval for the mean body temperature to be (98.05, 98.47) This does not contain 98.6, so at α = 0.05 we would reject H0 for the hypotheses H0 :  = 98.6 Ha :  ≠ 98.6

Both Father and Mother “Does a child need both a father and a mother to grow up happily?” Let p be the proportion of adults aged 18-29 in 2010 who say yes. A 95% CI for p is (0.487, 0.573). Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, do we reject H0 or not reject H0? Do not reject H0 0.5 is within the CI, so is a plausible value for p. http://www.pewsocialtrends.org/2011/03/09/for-millennials-parenthood-trumps-marriage/#fn-7199-1

Both Father and Mother “Does a child need both a father and a mother to grow up happily?” Let p be the proportion of adults aged 18-29 in 1997 who say yes. A 95% CI for p is (0.533, 0.607). Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, we reject H0 or do not reject H0? Reject H0 0.5 is not within the CI, so is not a plausible value for p. http://www.pewsocialtrends.org/2011/03/09/for-millennials-parenthood-trumps-marriage/#fn-7199-1

Intervals and Tests Confidence intervals are most useful when you want to estimate population parameters Hypothesis tests and p-values are most useful when you want to test hypotheses about population parameters Confidence intervals give you a range of plausible values; p-values quantify the strength of evidence against the null hypothesis

Interval, Test, or Neither? Are the following questions best assessed using a confidence interval, a hypothesis test, or is statistical inference not relevant? Do a majority of adults riding a bicycle wear a helmet? On average, how much more do adults who played sports in high school exercise than adults who did not play sports in high school? On average, were the 23 players on the 2010 Canadian Olympic hockey team older than the 23 players on the 2010 US Olympic hockey team? test; interval; inference not relevant

Cautions About Significance With small sample sizes, even large differences or effects may not be significant With large sample sizes, even a very small difference or effect can be significant A statistically significant result is not always practically significant, especially with large sample sizes

Statistical vs Practical Significance Example: Suppose a weight loss program recruits 10,000 people for a randomized experiment. A difference in average weight loss of only 0.5 lbs could be found to be statistically significant Suppose the experiment lasted for a year. Is a loss of ½ a pound practically significant?

Diet and Sex of Baby Are certain foods in your diet associated with whether or not you conceive a boy or a girl? To study this, researchers asked women about their eating habits, including asking whether or not they ate 133 different foods regularly. For each of the 133 foods studied, a hypothesis test was conducted for a difference between mothers who conceived boys and girls in the proportion who consume each food. http://www.newscientist.com/article/dn13754-breakfast-cereals-boost-chances-of-conceiving-boys.html

Hypothesis Tests State the null and alternative hypotheses If there are NO differences (all null hypotheses are true), about how many significant differences would be found using α = 0.05? A significant difference was found for breakfast cereal (mothers of boys eat more), prompting the headline “Breakfast Cereal Boosts Chances of Conceiving Boys”. How might you explain this? pb: proportion of mothers who have boys that consume the food regularly pg: proportion of mothers who have girls that consume the food regularly H0: pb = pg Ha: pb ≠ pg 133  0.05 = 6.65 Random chance; several tests (about 6 or 7) are going to be significant, even if no differences exist

Multiple Testing When multiple hypothesis tests are conducted, the chance that at least one test incorrectly rejects a true null hypothesis increases with the number of tests. If the null hypotheses are all true, α of the tests will yield statistically significant results just by random chance.

Multiple Comparisons Consider a topic that is being investigated by research teams all over the world  Using α = 0.05, 5% of teams are going to find something significant, even if the null hypothesis is true

Multiple Comparisons Consider a research team/company doing many hypothesis tests Using α = 0.05, 5% of tests are going to be significant, even if the null hypotheses are all true

Multiple Comparisons This is a serious problem The most important thing is to be aware of this issue, and not to trust claims that are obviously one of many tests (unless they specifically mention an adjustment for multiple testing) There are ways to account for this (e.g. Bonferroni’s Correction), but these are beyond the scope of this class

Publication Bias Publication bias refers to the fact that usually only the significant results get published The one study that turns out significant gets published, and no one knows about all the insignificant results This combined with the problem of multiple comparisons, can yield very misleading results

Jelly Beans Cause Acne! http://xkcd.com/882/ Consider having your students act this out in class, each reading aloud a different part. it’s very fun! http://xkcd.com/882/

http://xkcd.com/882/

Summary If a null hypothesized value lies inside a 95% CI, a two-tailed test using α = 0.05 would not reject H0 If a null hypothesized value lies outside a 95% CI, a two-tailed test using α = 0.05 would reject H0 Statistical significance is not always the same as practical significance Using α = 0.05, 5% of all hypothesis tests will lead to rejecting the null, even if all the null hypotheses are true