Chapter 12 Hypothesis testing.

Is it plausible that the parameter of interest is zero? That it is larger than 5? That it is between -2 and 2?

Free throws. My friend claims he makes 70% of his free throws. A statement about a population’s distribution is called a hypothesis. In this case, the hypothesis is that each shot is distributed BER(0.7). The initial hypothesis is called the ‘null hypothesis.’ We assume it is true unless the data collected are inconsistent with it. I decide to watch him shoot 20 times. If he makes fewer than 11 shots, I won’t believe his claim. The set of outcomes for which I’ll reject his initial claim is called the ‘rejection region.’ Rejection region = {0, 1, 2, 3, …, 10}. He makes only 10 shots (50%).

Simple and composite hypotheses. Simple hypotheses completely specify the distribution, e.g., BER(0.70). Composite hypotheses only partially specify the distribution, e.g., N(μ, 1) with μ ≤ 0.

How to determine a good rejection region? Rejection regions are usually specified in terms of an MLE or a sufficient statistic. In the free throw example, we decided to reject his claim (the null hypothesis) if he made too few shots. More later on how to choose the exact cutoff.

Errors happen. Unfortunately, errors are possible. I concluded before that my friend was exaggerating and that he cannot actually make 70% of his free throws. But it is possible that he made only 10 simply by chance (bad luck). A person who makes 70% on average would make 10 or fewer out of 20 with probability 4.8%. So, although it isn’t too likely, it’s possible that his claim is true.
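The 4.8% figure can be checked directly. The sketch below uses only the Python standard library and sums the Binomial(20, 0.7) probabilities over the rejection region {0, …, 10}. The helper name binom_cdf is introduced here for illustration and does not come from the slides.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p0 = 20, 0.7

# Probability of landing in the rejection region {0, ..., 10} when p = 0.7,
# i.e. the probability of wrongly rejecting a true claim.
alpha = binom_cdf(10, n, p0)

# For comparison: the probability of making exactly 10 shots.
p_exactly_10 = comb(n, 10) * p0**10 * (1 - p0)**10

print(f"P(X <= 10 | p = 0.7) = {alpha:.4f}")
print(f"P(X  = 10 | p = 0.7) = {p_exactly_10:.4f}")
```

The first quantity is the one that matches the 4.8% on the slide; it is the type I error rate of the rule "reject if he makes fewer than 11".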

Types of errors

What type of error… H0: not pregnant.

What type of error… If a guilty man is found ‘not guilty?’ If an innocent man is found ‘guilty?’ If a guilty woman is found ‘guilty?’

What type of error… Is a pharmaceutical company interested in keeping small? H0: drug doesn’t work. What type of error does a drug company care about? Both. Type II error because they want to make money and Type I error because they are decent human beings.

Notation for probability of error and power. β = P(Type II error | H0 is false); β(θ) = P(Not rejecting H0 | true parameter = θ). α = P(Type I error | H0 is true); α = max_{θ ∈ Ω0} P(Rejecting H0 | true parameter = θ). π = P(Correctly rejecting H0 | H0 is false); π(θ) = P(Rejecting H0 | true parameter = θ).

Example: simple hypotheses. Xi ~ i.i.d. N(μ, 2), N = 15. H0: μ = 8; H1: μ = 12. Test statistic: sample mean. Find a rejection region such that α = 0.05.
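One way to work this example numerically is sketched below. It assumes that the second argument of N(μ, 2) is the variance, and that H0 is rejected for large values of the sample mean because the alternative mean is larger than the null mean. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma2, n = 2.0, 15                 # assumption: N(mu, 2) means variance 2
mu0, mu1, alpha = 8.0, 12.0, 0.05
se = sqrt(sigma2 / n)               # standard deviation of the sample mean

# Reject H0 when the sample mean exceeds c, where P(mean > c | mu = mu0) = alpha.
c = mu0 + NormalDist().inv_cdf(1 - alpha) * se

type1 = 1 - NormalDist(mu0, se).cdf(c)   # should reproduce alpha
beta = NormalDist(mu1, se).cdf(c)        # P(do not reject | mu = mu1)
power = 1 - beta

print(f"reject H0 if sample mean > {c:.4f}")
print(f"type I error = {type1:.4f}, beta = {beta:.3e}, power = {power:.6f}")
```

Because the two hypothesized means are many standard errors apart, β is essentially zero here.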

Identify rejection region, error probabilities, and power. Show where the type I error rate is.

Example: composite hypotheses. Xi ~ i.i.d. N(μ, 2), N = 15. H0: μ ≤ 8; H1: μ > 8. Test statistic: sample mean. Find a rejection region such that α = 0.05.

Example continued: power. Xi ~ i.i.d. N(μ, 2), N = 15. H0: μ ≤ 8; H1: μ > 8. Find the power of the test as a function of the true mean, μ.

Identify rejection region, error probabilities, and power. Show where the power is in the figure.

Example continued: β. Xi ~ i.i.d. N(μ, 2), N = 15. H0: μ ≤ 8; H1: μ > 8. Find β(μ). Don’t really do the math. It’s just 1 – power.
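The power and β functions for the composite example can be sketched as follows, again assuming that N(μ, 2) denotes variance 2 and that the test rejects for large sample means. The size is attained at the boundary μ = 8 of the null region, which is why the cutoff is computed there.

```python
from math import sqrt
from statistics import NormalDist

sigma2, n, alpha = 2.0, 15, 0.05    # assumption: N(mu, 2) means variance 2
mu_boundary = 8.0                   # boundary of H0: mu <= 8
se = sqrt(sigma2 / n)

# Cutoff chosen so that the rejection probability at mu = 8 equals alpha.
c = mu_boundary + NormalDist().inv_cdf(1 - alpha) * se

def power(mu):
    """pi(mu) = P(sample mean > c | true mean = mu)."""
    return 1 - NormalDist(mu, se).cdf(c)

def beta(mu):
    """beta(mu) = P(not rejecting H0 | true mean = mu) = 1 - pi(mu)."""
    return 1 - power(mu)

for mu in (7.5, 8.0, 8.5, 9.0, 9.5):
    print(f"mu = {mu:4.1f}  power = {power(mu):.4f}  beta = {beta(mu):.4f}")
```

For every μ inside the null region the rejection probability is at most α, so the maximum over Ω0 is reached at μ = 8.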

Example: two-sided alternative. Xi ~ i.i.d. N(μ, 2), N = … (do for arbitrary N). H0: μ = -8; H1: μ ≠ -8. Test statistic: sample mean. Find a rejection region such that α = 0.05. Find the power.
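For the two-sided case, a sketch with the sample size left as a parameter is given below. It assumes variance 2 and an equal-tailed rejection region (α/2 in each tail), which is the usual choice but is not stated on the slide. Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def rejection_bounds(n, mu0=-8.0, sigma2=2.0, alpha=0.05):
    """Return (lo, hi): reject H0 when the sample mean is < lo or > hi."""
    se = sqrt(sigma2 / n)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * se
    return mu0 - half_width, mu0 + half_width

def power(mu, n, mu0=-8.0, sigma2=2.0, alpha=0.05):
    """P(reject H0 | true mean = mu) for the equal-tailed test."""
    lo, hi = rejection_bounds(n, mu0, sigma2, alpha)
    dist = NormalDist(mu, sqrt(sigma2 / n))
    return dist.cdf(lo) + (1 - dist.cdf(hi))

for n in (5, 15, 50):
    lo, hi = rejection_bounds(n)
    print(f"N = {n:2d}: reject if mean < {lo:.3f} or > {hi:.3f}; "
          f"power at mu = -7.5 is {power(-7.5, n):.3f}")
```

The power equals α at μ = -8, is symmetric around -8, and grows with both the distance from -8 and the sample size.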

This is the power as a function of the true value of the parameter. I can’t remember what values of N and sigma I used for this figure, but the shape is the same either way.

This shows the power as a function of the sample size. I don’t know what value of sigma I used for the figure, but the basic shape is the same regardless.

Example: P-value. Xi ~ i.i.d. N(μ, 2), N = 15. H0: μ = 8; H1: μ > 8. Observed sample mean = 10. If the null hypothesis is true, what is the probability that you would get a sample mean of 10 or larger if you were to repeat the experiment?
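The question in this example is an upper-tail normal probability. A minimal sketch, assuming variance 2, is shown below; it uses math.erfc rather than 1 minus the CDF so that the very small tail probability is computed without cancellation error.

```python
from math import erfc, sqrt

sigma2, n = 2.0, 15        # assumption: N(mu, 2) means variance 2
mu0, xbar = 8.0, 10.0

# Standardize the observed sample mean under H0.
z = (xbar - mu0) / sqrt(sigma2 / n)

# One-sided p-value: P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
p_value = 0.5 * erfc(z / sqrt(2))

print(f"z = {z:.3f}, p-value = {p_value:.2e}")
```

An observed mean of 10 is more than five standard errors above 8, so the p-value is far below any conventional α.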

P-value Definition 1: The p-value is the maximum probability of observing something as extreme or more extreme than the observed outcome, assuming that the null hypothesis is true. [Equivalent] Definition 2: The p-value of a test is the smallest α at which the null hypothesis can be rejected. Report the p-value and people can decide for themselves if you have presented enough evidence to reject the null.

A most powerful test of size α (simple hypotheses). How to choose a testing procedure (test statistic and rejection region) to maximize power? A test T is the most powerful test of size α if π_T(θ0) = α; AND π_T(θ1) ≥ π_T*(θ1) for any other test T* with π_T*(θ0) = α.

Neyman-Pearson Lemma. The most powerful test of θ = θ0 against the alternative θ = θ1 is obtained by rejecting the null hypothesis if λ(x; θ0, θ1) := f(x; θ0)/f(x; θ1) < k, where k is chosen to obtain the desired type I error rate. Why does this make sense? Prove it (on the whiteboard). The very last step is tricky. It uses the partitioning of C and C* again.

Example 12.6.1. Xi ~ i.i.d. EXP(θ), N = 27. H0: θ = 5; H1: θ = 7. Determine the most powerful test of size α = 0.05.

Example 12.6.1, continued. Suppose the outcome of the sample mean is 7.5. Reject? Note: χ²_{0.95}(54) = 72.15. What is the p-value? The p-value is 1-pchisq(81,df=54) = 0.01.
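The numbers 81 and 0.01 can be reproduced with the sketch below. It assumes that EXP(θ) is parameterized by its mean θ, so that 2ΣXi/θ0 follows a χ²(2N) distribution under H0 and large values favor the alternative. To stay within the standard library, the chi-square tail is computed with the closed-form Poisson sum that holds for even degrees of freedom; the function name is illustrative.

```python
from math import exp

def chi2_sf_even_df(x, df):
    """P(chi-square(df) > x) for even df.

    Uses the identity P(chi2_{2k} > x) = sum_{j<k} exp(-x/2) (x/2)^j / j!.
    """
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2.0
    term = exp(-half)          # j = 0 term
    total = 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)  # move from the j-th to the (j+1)-th term
    return total

n, theta0, xbar = 27, 5.0, 7.5  # assumption: theta is the mean of EXP(theta)
stat = 2 * n * xbar / theta0    # 2 * sum(x_i) / theta0
p_value = chi2_sf_even_df(stat, 2 * n)

print(f"test statistic = {stat:.1f} on {2 * n} df, p-value = {p_value:.4f}")
print("reject H0 at alpha = 0.05" if stat > 72.15 else "do not reject H0")
```

The comparison against 72.15 uses the critical value quoted on the slide.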

Uniformly most powerful test? In the last example, the most powerful test was exactly the same whether θ1 = 7 or θ1 = 700. As long as θ1 > 5, the same test would be most powerful. Consider the following hypotheses: H0: θ = 5 H1: θ > 5 The previous test is said to be the ‘uniformly most powerful test of size α.’

Definition: uniformly most powerful. A test is said to be the ‘uniformly most powerful test of size α’ if it has size α; AND the alternative is composite; AND it is the most powerful test for all values of the parameter that are possible under the alternative.

A uniformly most powerful test will exist if the joint density function has a ‘monotone likelihood ratio’ in a statistic T; AND you have a one-sided alternative.

Definition: monotone likelihood ratio. The joint density function has a ‘monotone likelihood ratio’ in a statistic T if f(x; θ0)/f(x; θ1) depends on x only through T; AND f(x; θ0)/f(x; θ1) is a monotone function of T.

Example 12.7.3. Xi ~ i.i.d. EXP(1, η), N = 30. H0: η > 5; H1: η ≤ 5. Find a uniformly most powerful test of size α = 0.05 if you can.

Example 12.7.3, continued. Suppose x_{1:30} = 7.5. Note that the 95th percentile of an EXP(1) distribution is approximately 3. Conclusion at α = 0.05? P-value? P-value = 1-pexp(2.5) = 0.08.

Generalized likelihood ratio test. Desirable because it provides a test for a two-sided alternative. Reject H0 if λ = max_{θ ∈ Ω0} f(x; θ) / max_{θ ∈ Ω} f(x; θ) is small. Note that -2ln(λ) is approximately χ²(r) if r parameters are fixed under the null.

Example 12.8.1. Xi ~ i.i.d. N(μ, 2), N = 15. H0: μ = 8; H1: μ ≠ 8. Outcome of sample mean = 10. Use the generalized likelihood ratio test to find an approximate p-value.
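A sketch of this calculation follows, assuming variance 2. With a known variance, the likelihood ratio reduces to λ = exp(-N(x̄ - μ0)²/(2σ²)), so -2ln(λ) = N(x̄ - μ0)²/σ², which is compared with χ²(1) because one parameter is fixed under the null. The χ²(1) tail is computed through the identity P(χ²(1) > x) = P(|Z| > √x) = erfc(√(x/2)).

```python
from math import erfc, sqrt

sigma2, n = 2.0, 15        # assumption: N(mu, 2) means variance 2
mu0, xbar = 8.0, 10.0

# -2 ln(lambda) for a normal mean with known variance.
stat = n * (xbar - mu0) ** 2 / sigma2

# Tail probability of a chi-square with 1 degree of freedom.
p_value = erfc(sqrt(stat / 2))

print(f"-2 ln(lambda) = {stat:.1f}, p-value = {p_value:.2e}")
```

In this particular setting the statistic is the square of the usual z statistic, so the p-value coincides with the two-sided z-test p-value.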

Example. Xi ~ i.i.d. N(μ, σ²), N = 15. H0: μ = 8, σ² = 2. Identify the test statistic and rejection region for the generalized likelihood ratio test of size 5%.
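One possible answer is sketched below. Maximizing the normal likelihood over the full parameter space gives the MLEs x̄ and σ̂² = Σ(xi - x̄)²/N, and substituting into the ratio yields -2ln(λ) = Σ(xi - 8)²/2 - N - N·ln(σ̂²/2). Two parameters are fixed under the null, so the approximate rejection region is -2ln(λ) > χ²_{0.95}(2), and for 2 degrees of freedom that percentile is exactly -2ln(0.05). The sample in the code is hypothetical and serves only to exercise the function.

```python
from math import log

def glrt_statistic(xs, mu0=8.0, sigma2_0=2.0):
    """-2 ln(lambda) for H0: mu = mu0, sigma^2 = sigma2_0 with normal data."""
    n = len(xs)
    xbar = sum(xs) / n
    sigma2_hat = sum((x - xbar) ** 2 for x in xs) / n   # MLE uses divisor n
    ss0 = sum((x - mu0) ** 2 for x in xs)               # spread around mu0
    return ss0 / sigma2_0 - n - n * log(sigma2_hat / sigma2_0)

# 95th percentile of chi-square(2): solve exp(-x/2) = 0.05.
critical = -2 * log(0.05)

# Hypothetical sample of 15 observations, made up for illustration only.
sample = [7.1, 9.4, 8.3, 6.2, 10.1, 8.8, 7.7, 9.0,
          5.9, 8.4, 7.3, 9.9, 8.1, 6.8, 8.6]

stat = glrt_statistic(sample)
print(f"-2 ln(lambda) = {stat:.3f}, critical value = {critical:.3f}")
print("reject H0" if stat > critical else "do not reject H0")
```

The chi-square reference distribution is a large-sample approximation, so with N = 15 the actual size of this test is only approximately 5%.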