Chapter 12 Hypothesis testing.


1 Chapter 12 Hypothesis testing

2 Is it plausible…
that the parameter of interest is zero?
that the parameter of interest is larger than 5?
that the parameter of interest is between -2 and 2?

3 Free throws
My friend claims he makes 70% of his free throws. A statement about a population’s distribution is called a hypothesis. In this case, the hypothesis is that the distribution is BER(0.7).
The initial hypothesis is called the ‘null hypothesis.’ We assume it is true unless the data collected are not consistent with it.
I decide to watch him shoot 20 times. If he makes fewer than 11 shots, I won’t believe his claim. The set of outcomes for which I’ll reject his initial claim is called the ‘rejection region.’ Rejection region = {0, 1, 2, 3, …, 10}.
He makes only 10 shots (50%).

4 Simple and Composite Hypotheses
Simple hypotheses completely specify the distribution. E.g., BER(0.70).
Composite hypotheses only partially specify the distribution. E.g., N(μ, 1), with μ ≤ 0.

5 How to determine a good rejection region?
Rejection regions are usually specified in terms of an MLE or a sufficient statistic. In the free throw example, we decided to reject his claim (the null hypothesis) if he made too few shots. More on how to choose the exact number of shots later.

6 Errors happen
Unfortunately errors are possible. I concluded before that my friend was exaggerating and that he cannot actually make 70% of his free throws. But it is possible that he made only 10 simply by chance (bad luck). A person who makes 70% on average would make 10 or fewer out of 20 with probability 4.8%. So, although it isn’t too likely, it’s possible that his claim is true.
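The 4.8% figure can be checked directly from the binomial distribution. It is the probability of the whole rejection region {0, …, 10} under BER(0.7), not the probability of exactly 10 makes. A minimal sketch, assuming the 20 shots are independent (the helper name `binom_pmf` is only for illustration):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p0 = 20, 0.7  # 20 shots, claimed success probability 0.7

# Probability of exactly 10 makes if the claim is true.
p_exactly_10 = binom_pmf(10, n, p0)

# Probability of landing anywhere in the rejection region {0, ..., 10}.
p_at_most_10 = sum(binom_pmf(k, n, p0) for k in range(0, 11))

print(f"P(X = 10)  = {p_exactly_10:.4f}")
print(f"P(X <= 10) = {p_at_most_10:.4f}")
```

The second number is the one that matters for the test, because the decision rule rejects for any count in the region.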

7 Types of errors

8 What type of error… H0: not pregnant.

9 What type of error… If a guilty man is found ‘not guilty?’
If an innocent man is found ‘guilty?’ If a guilty woman is found ‘guilty?’

10 What type of error…
is a pharmaceutical company interested in keeping small? H0: drug doesn’t work.
What type of error does a drug company care about? Both. Type II error because they want to make money and Type I error because they are decent human beings.

11 Notations for probability of error and power
β = P(Type II error | H0 is false)
β(θ) = P(Not rejecting H0 | true parameter = θ)
α = P(Type I error | H0 is true)
α = max over θ ∈ Ω0 of P(Rejecting H0 | true parameter = θ)
π = P(Correctly rejecting H0 | H0 is false)
π(θ) = P(Rejecting H0 | true parameter = θ)
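To make the notation concrete, the free throw test (reject if 10 or fewer of 20 shots go in) can be evaluated at any true success probability. The sketch below is illustrative only: the alternative value 0.5 is an assumption chosen for the example, and `reject_prob` is a hypothetical helper name.

```python
from math import comb

def reject_prob(p, n=20, cutoff=10):
    """pi(p) = P(X <= cutoff) for X ~ Binomial(n, p): chance of rejecting H0."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(cutoff + 1))

# alpha: probability of rejecting when H0 (p = 0.7) is true.
alpha = reject_prob(0.7)

# Power and beta at an assumed alternative, p = 0.5.
power_at_half = reject_prob(0.5)
beta_at_half = 1 - power_at_half

print(f"alpha     = {alpha:.4f}")
print(f"pi(0.5)   = {power_at_half:.4f}")
print(f"beta(0.5) = {beta_at_half:.4f}")
```

The same function gives both quantities: evaluated at a parameter value in the null it is a type I error rate, and evaluated at a value in the alternative it is the power.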

12 Example: simple hypotheses
Xi ~ i.i.d. N(μ, 2), N = 15
H0: μ = 8
H1: μ = 12
Test statistic: sample mean
Find a rejection region such that α = 0.05.

13 Identify rejection region, error probabilities, and power.
Show where the type I error rate is.
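One way to work this example numerically is sketched below. It assumes that the 2 in N(μ, 2) is the variance (not the standard deviation) and that the test rejects for large sample means, since the alternative mean is larger than the null mean.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu1 = 8.0, 12.0   # null and alternative means
var, n = 2.0, 15       # assumed: 2 is the variance of each observation
alpha = 0.05

se = sqrt(var / n)     # standard deviation of the sample mean

# Reject H0 when the sample mean exceeds c.
c = mu0 + NormalDist().inv_cdf(1 - alpha) * se

# Type I error rate: P(sample mean > c) when mu = mu0.
type1 = 1 - NormalDist(mu0, se).cdf(c)

# Type II error rate and power when mu = mu1.
beta = NormalDist(mu1, se).cdf(c)
power = 1 - beta

print(f"reject if sample mean > {c:.4f}")
print(f"alpha = {type1:.4f}, beta = {beta:.3g}, power = {power:.6f}")
```

Under these assumptions the two hypothesized means are far apart relative to the standard error, so the power is very close to 1.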

14 Example: composite hypotheses
Xi ~ i.i.d. N(μ, 2), N = 15
H0: μ ≤ 8
H1: μ > 8
Test statistic: sample mean
Find a rejection region such that α = 0.05.

15 Example continued: power
Xi ~ i.i.d. N(μ, 2), N = 15
H0: μ ≤ 8
H1: μ > 8
Find the power of the test as a function of the true mean, μ.

16 Identify rejection region, error probabilities, and power.
Show where the power is in the figure.

17 Example continued: β
Xi ~ i.i.d. N(μ, 2), N = 15
H0: μ ≤ 8
H1: μ > 8
Find β(μ). Don’t really do the math. It’s just 1 – power.
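The power function and β(μ) for the one-sided composite test can be tabulated with a few lines. This is a sketch under the assumption that 2 is the variance and that the rejection region is "sample mean > c", with c chosen so the size at the boundary μ = 8 is 0.05.

```python
from math import sqrt
from statistics import NormalDist

mu0, var, n, alpha = 8.0, 2.0, 15, 0.05   # assumed: 2 is the variance
se = sqrt(var / n)
c = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # critical value for the sample mean

def power(mu):
    """pi(mu) = P(sample mean > c) when the true mean is mu."""
    return 1 - NormalDist(mu, se).cdf(c)

def beta(mu):
    """beta(mu) = P(not rejecting H0) = 1 - pi(mu)."""
    return 1 - power(mu)

for mu in (7.5, 8.0, 8.5, 9.0, 9.5):
    print(f"mu = {mu:4.1f}  power = {power(mu):.4f}  beta = {beta(mu):.4f}")
```

The rejection probability increases with μ, so over the null region μ ≤ 8 it is largest at μ = 8. That is why calibrating at the boundary controls the type I error rate for the whole composite null.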

18 Example: Two-sided alternative
Xi ~ i.i.d. N(μ, 2), N = … (do for arbitrary N)
H0: μ = -8
H1: μ ≠ -8
Test statistic: sample mean
Find a rejection region such that α = 0.05.
Find the power.
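For the two-sided case, a natural choice is to reject when the sample mean is too far from -8 in either direction, splitting α equally between the two tails. The sketch below takes that equal-tail region as an assumption, along with 2 being the variance; the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def critical_values(n, mu0=-8.0, var=2.0, alpha=0.05):
    """Equal-tail cutoffs: reject if sample mean < lo or sample mean > hi."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sqrt(var / n)
    return mu0 - half_width, mu0 + half_width

def power_two_sided(mu, n, mu0=-8.0, var=2.0, alpha=0.05):
    """P(reject H0) when the true mean is mu and the sample size is n."""
    lo, hi = critical_values(n, mu0, var, alpha)
    xbar_dist = NormalDist(mu, sqrt(var / n))
    return xbar_dist.cdf(lo) + (1 - xbar_dist.cdf(hi))

# Power as a function of the true mean (fixed n) ...
for mu in (-9.0, -8.5, -8.0, -7.5, -7.0):
    print(f"n = 15, mu = {mu:5.1f}: power = {power_two_sided(mu, 15):.4f}")

# ... and as a function of the sample size (fixed true mean).
for n in (5, 15, 30, 60):
    print(f"mu = -7.5, n = {n:2d}: power = {power_two_sided(-7.5, n):.4f}")
```

The first loop traces a curve that bottoms out at α when μ = -8 and rises symmetrically on both sides. The second shows power growing with the sample size for a fixed true mean.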

19 This is the power as a function of the true value of the parameter
I can’t remember what value of N and sigma I used for this figure, but the shape is the same either way.

20 This shows the power as a function of the sample size
I don’t know what value of sigma I used for the figure, but the basic shape is the same regardless.

21 Example: P-value
Xi ~ i.i.d. N(μ, 2), N = 15
H0: μ = 8
H1: μ > 8
Observed sample mean = 10.
If the null hypothesis is true, what is the probability that you would get a sample mean of 10 or larger if you were to repeat the experiment?
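The question on this slide is a normal tail probability. A minimal sketch, assuming 2 is the variance so that the sample mean has standard deviation sqrt(2/15) under the null; `math.erfc` is used instead of `1 - cdf` because the tail is very small.

```python
from math import erfc, sqrt

xbar, mu0 = 10.0, 8.0   # observed sample mean and null value
var, n = 2.0, 15        # assumed: 2 is the variance

# Standardized distance of the observed mean from the null value.
z = (xbar - mu0) / sqrt(var / n)

# Upper-tail probability P(Z >= z) for a standard normal Z.
p_value = 0.5 * erfc(z / sqrt(2))

print(f"z = {z:.3f}")
print(f"p-value = {p_value:.3g}")
```

Under these assumptions the observed mean is more than five standard errors above 8, so the p-value is tiny and the null would be rejected at any conventional α.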

22 P-value
Definition 1: The p-value is the maximum probability of observing something as extreme or more extreme than the observed outcome, assuming that the null hypothesis is true.
[Equivalent] Definition 2: The p-value of a test is the smallest α at which the null hypothesis can be rejected.
Report the p-value and people can decide for themselves if you have presented enough evidence to reject the null.

23 A most powerful test of size α (simple hypotheses)
How to choose a testing procedure (test statistic and rejection region) to maximize power?
A test T is the most powerful test of size α if
π_T(θ0) = α; AND
π_T(θ1) ≥ π_T*(θ1) for any other test T* with π_T*(θ0) = α.

24 Neyman-Pearson Lemma
The most powerful test of θ = θ0 against the alternative θ = θ1 is obtained by rejecting the null hypothesis if λ(x; θ0, θ1) := f(x; θ0)/f(x; θ1) < k, where k is chosen to obtain the desired type I error rate.
Why does this make sense? Prove it (on the whiteboard). The very last step is tricky. It uses the partitioning of C and C* again.

25 Example 12.6.1
Xi ~ i.i.d. EXP(θ), N = 27
H0: θ = 5
H1: θ = 7
Determine the most powerful test of size α = 0.05.
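A sketch of how the Neyman-Pearson lemma applies here, assuming EXP(θ) denotes the exponential distribution with mean θ, i.e. density f(x; θ) = (1/θ) e^{-x/θ} for x > 0:

```latex
\lambda(x;\theta_0,\theta_1)
  = \prod_{i=1}^{n} \frac{\theta_0^{-1} e^{-x_i/\theta_0}}{\theta_1^{-1} e^{-x_i/\theta_1}}
  = \left(\frac{\theta_1}{\theta_0}\right)^{n}
    \exp\!\left[-\left(\frac{1}{\theta_0}-\frac{1}{\theta_1}\right)\sum_{i=1}^{n} x_i\right].

% Since \theta_1 = 7 > \theta_0 = 5, the coefficient of \sum x_i is negative,
% so \lambda is decreasing in \sum x_i and
\lambda < k \iff \sum_{i=1}^{n} x_i > c .

% Under H_0, \; 2\sum X_i/\theta_0 \sim \chi^2(2n) = \chi^2(54), so the size-0.05 test is
\text{reject } H_0 \text{ if } \frac{2\sum_{i=1}^{27} x_i}{5} > \chi^2_{0.95}(54).
```

The rejection region depends on θ1 only through the fact that θ1 > θ0, which is what makes the same test most powerful against every alternative above 5.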

26 Example Continued
Suppose the outcome of the sample mean is 7.5. Reject?
Note: χ²_{0.95}(54) = 72.15
What is the p-value? The p-value is 1-pchisq(81,df=54) = 0.01
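The numbers on this slide can be reproduced without a statistics library, because the chi-square survival function has a closed form when the degrees of freedom are even. The sketch below assumes EXP(θ) has mean θ, so that the test statistic is 2·N·(sample mean)/θ0; `chi2_sf_even_df` is a hypothetical helper standing in for `1 - pchisq(x, df)`.

```python
from math import exp

def chi2_sf_even_df(x, df):
    """P(chi-square(df) > x) for even df, using the Poisson-sum identity
    P(chi2_{2k} > x) = sum_{j=0}^{k-1} exp(-x/2) (x/2)^j / j!."""
    if df % 2 != 0:
        raise ValueError("this closed form only holds for even df")
    half = x / 2.0
    term = 1.0     # (x/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return exp(-half) * total

n, theta0, xbar = 27, 5.0, 7.5

stat = 2 * n * xbar / theta0              # test statistic, chi-square(2n) under H0
critical = 72.15                          # 95th percentile of chi-square(54), from the slide
reject = stat > critical
p_value = chi2_sf_even_df(stat, 2 * n)
size_check = chi2_sf_even_df(critical, 2 * n)   # should be close to 0.05

print(f"statistic = {stat}, reject at 0.05: {reject}")
print(f"p-value = {p_value:.4f}")
print(f"P(chi2(54) > 72.15) = {size_check:.4f}")
```

Because 81 exceeds 72.15, the null is rejected at α = 0.05, which agrees with a p-value near 0.01.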

27 Uniformly most powerful test?
In the last example, the most powerful test was exactly the same whether θ1 = 7 or θ1 = …. As long as θ1 > 5, the same test would be most powerful.
Consider the following hypotheses: H0: θ = 5, H1: θ > 5.
The previous test is said to be the ‘uniformly most powerful test of size α.’

28 Definition: Uniformly most powerful
A test is said to be the ‘uniformly most powerful test of size α’ if
it has size α; AND
the alternative is composite; AND
it is the most powerful test for all values of the parameter that are possible under the alternative.

29 A uniformly most powerful test will exist if
the joint density function has a ‘monotone likelihood ratio’ in a statistic T; AND
you have a one-sided alternative.

30 Definition: monotone likelihood ratio
The joint density function has a ‘monotone likelihood ratio’ in a statistic T if
f(x; θ0)/f(x; θ1) depends on x only through T; AND
f(x; θ0)/f(x; θ1) is a monotone function of T.

31 Example 12.7.3
Xi ~ i.i.d. EXP(1, η), N = 30
H0: η > 5
H1: η ≤ 5
Find a uniformly most powerful test of size α = 0.05 if you can.

32 Example 12.7.3 - continued
Suppose x_{1:30} = 7.5
Note that the 95th percentile of an EXP(1) is approximately 3.
Conclusion at α = 0.05? P-value? P-value = 1-pexp(2.5) = 0.08.

33 Generalized likelihood ratio test
Desirable because it provides a test for a two-sided alternative.
Reject H0 if λ = [max over θ ∈ Ω0 of f(x; θ)] / [max over θ ∈ Ω of f(x; θ)] is small.
Note that -2ln(λ) is approximately χ²(r), if r parameters are fixed under the null.

34 Example 12.8.1
Xi ~ i.i.d. N(μ, 2), N = 15
H0: μ = 8
H1: μ ≠ 8
Observed sample mean = 10.
Use the generalized likelihood ratio test to find an approximate p-value.
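For a normal mean with known variance, the generalized likelihood ratio statistic simplifies to -2ln(λ) = N(x̄ - μ0)²/σ², which is compared to χ²(1) because one parameter is fixed under the null. The sketch below assumes 2 is the variance and uses the identity P(χ²(1) > t) = erfc(sqrt(t/2)) to avoid needing a chi-square routine.

```python
from math import erfc, sqrt

xbar, mu0 = 10.0, 8.0   # observed sample mean and null value
var, n = 2.0, 15        # assumed: 2 is the (known) variance

# -2 ln(lambda) for a normal mean with known variance.
glr_stat = n * (xbar - mu0) ** 2 / var

# Upper tail of chi-square(1): P(chi2_1 > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
p_value = erfc(sqrt(glr_stat / 2))

print(f"-2 ln(lambda) = {glr_stat}")
print(f"p-value = {p_value:.3g}")
```

In this particular setting the χ²(1) distribution is exact rather than approximate, because N(x̄ - μ0)²/σ² is the square of a standard normal variable under the null.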

35 Example
Xi ~ i.i.d. N(μ, σ²), N = 15
H0: μ = 8, σ² = 2
Identify the test statistic and rejection region for the generalized likelihood ratio test of size 5%.

