The Neyman-Pearson Lemma

Suppose that the data $x_1, \dots, x_n$ have joint density function $f(x_1, \dots, x_n; \theta)$, where $\theta$ is either $\theta_1$ or $\theta_2$. Let
$$g(x_1, \dots, x_n) = f(x_1, \dots, x_n; \theta_1) \quad\text{and}\quad h(x_1, \dots, x_n) = f(x_1, \dots, x_n; \theta_2).$$
We want to test $H_0: \theta = \theta_1$ ($g$ is the correct distribution) against $H_A: \theta = \theta_2$ ($h$ is the correct distribution).
The Neyman-Pearson Lemma states that the Uniformly Most Powerful (UMP) test of size $\alpha$ is to reject $H_0$ if
$$\frac{h(x_1, \dots, x_n)}{g(x_1, \dots, x_n)} \ge k$$
and accept $H_0$ if
$$\frac{h(x_1, \dots, x_n)}{g(x_1, \dots, x_n)} < k,$$
where $k$ is chosen so that the test is of size $\alpha$, i.e. $P(\text{reject } H_0 \mid H_0 \text{ true}) = \alpha$.
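To make the lemma concrete, the following Python sketch (a minimal illustration, not part of the original slides) applies the Neyman-Pearson test to normal data with known variance; the sample values, the hypothesized means, and the size are all assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical setup: X_i ~ N(theta, 1) with theta1 = 0 (H0) and theta2 = 1 (HA).
theta1, theta2 = 0.0, 1.0
rng = np.random.default_rng(0)
x = rng.normal(theta1, 1.0, size=20)      # simulated data (assumed, for illustration)

# Likelihood ratio h(x)/g(x) = f(x; theta2) / f(x; theta1), computed on the log scale.
log_ratio = norm.logpdf(x, theta2, 1).sum() - norm.logpdf(x, theta1, 1).sum()

# For this model the ratio is an increasing function of the sample mean, so a size-0.05
# test rejects when xbar exceeds theta1 + z_{0.05}/sqrt(n); k is the ratio at that point.
n = x.size
c = theta1 + norm.ppf(0.95) / np.sqrt(n)  # critical value for xbar
reject = x.mean() >= c
print(f"log LR = {log_ratio:.3f}, xbar = {x.mean():.3f}, critical value = {c:.3f}, reject H0: {reject}")
```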
[Figure: the two candidate densities $g(x)$ and $h(x)$ plotted against $x$.]
Definition: The hypothesis $H: \theta \in \omega$ is called simple if $\omega$ consists of only one point. If the set $\omega$ consists of more than one point, the hypothesis $H$ is called composite.

The Neyman-Pearson lemma finds the UMP (uniformly most powerful) test of size $\alpha$ when testing a simple Null Hypothesis ($H_0$) against a simple Alternative Hypothesis ($H_A$).
A technique for finding the UMP (uniformly most powerful) test of size $\alpha$ for testing a simple Null Hypothesis ($H_0$) against a composite Alternative Hypothesis ($H_A$):
1. Pick an arbitrary value of the parameter, $\theta_1$, from the values allowed when the Alternative Hypothesis ($H_A$) is true (this converts $H_A$ into a simple hypothesis).
2. Use the Neyman-Pearson lemma to find the UMP test of size $\alpha$ for testing the simple Null Hypothesis ($H_0$) against this simple Alternative Hypothesis.
3. If this test does not depend on the choice of $\theta_1$, then it is the uniformly most powerful test of size $\alpha$ for testing the simple Null Hypothesis ($H_0$) against the composite Alternative Hypothesis ($H_A$). A worked instance is sketched below.
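As an illustration of this three-step technique (a standard instance assumed here, not taken from the original slides), consider a normal sample with known variance:

```latex
% Assumed example: X_1,\dots,X_n \sim N(\mu,\sigma^2) with \sigma^2 known.
% Test H_0:\mu=\mu_0 against the composite H_A:\mu>\mu_0.
\text{Step 1: fix an arbitrary } \mu_1>\mu_0. \\
\text{Step 2: } \frac{h(x)}{g(x)}
  =\exp\!\Big(\tfrac{n(\mu_1-\mu_0)}{\sigma^2}\,\bar{x}-\tfrac{n(\mu_1^2-\mu_0^2)}{2\sigma^2}\Big)
  \text{ is increasing in } \bar{x}, \\
\text{so the size-}\alpha\text{ Neyman-Pearson test rejects when } \bar{x}\ge \mu_0+z_{\alpha}\,\sigma/\sqrt{n}. \\
\text{Step 3: this rejection region does not involve } \mu_1,
\text{ hence the same test is UMP for } H_A:\mu>\mu_0.
```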
Likelihood Ratio Tests

A general technique for developing tests using the likelihood function. This technique works for composite Null Hypotheses ($H_0$) and composite Alternative Hypotheses ($H_A$).
Likelihood Ratio Tests

Suppose that the data $x_1, \dots, x_n$ have joint density function $f(x_1, \dots, x_n; \theta_1, \dots, \theta_p)$, where $(\theta_1, \dots, \theta_p)$ are unknown parameters assumed to lie in $\Omega$ (a subset of $p$-dimensional space). Let $\omega$ denote a subset of $\Omega$. We want to test $H_0: (\theta_1, \dots, \theta_p) \in \omega$ against $H_A: (\theta_1, \dots, \theta_p) \notin \omega$.
The Likelihood Ratio Test of size $\alpha$ rejects $H_0: (\theta_1, \dots, \theta_p) \in \omega$ in favour of $H_A: (\theta_1, \dots, \theta_p) \notin \omega$ if
$$\lambda = \frac{\max_{\omega} L(\theta_1, \dots, \theta_p)}{\max_{\Omega} L(\theta_1, \dots, \theta_p)} \le k,$$
where $k$ is chosen so that the test is of size $\alpha$, and
$$L(\theta_1, \dots, \theta_p) = f(x_1, \dots, x_n; \theta_1, \dots, \theta_p)$$
is the likelihood function. Here $\max_{\omega} L$ denotes the maximum of $L(\theta_1, \dots, \theta_p)$ subject to the restriction $(\theta_1, \dots, \theta_p) \in \omega$, and $\max_{\Omega} L$ the unrestricted maximum over $\Omega$.
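The following Python sketch (an assumed illustration, not from the slides) computes the likelihood-ratio statistic numerically for a normal sample by maximizing the log-likelihood over the full parameter space $\Omega$ and over the restricted set $\omega = \{\mu = \mu_0\}$; the data and $\mu_0$ are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(0.3, 1.5, size=25)   # hypothetical sample
mu0 = 0.0                            # hypothetical H0 value

def neg_loglik(params, mu_fixed=None):
    """Negative normal log-likelihood; params = (mu, log_sigma) or (log_sigma,) if mu is fixed."""
    if mu_fixed is None:
        mu, log_sigma = params
    else:
        mu, log_sigma = mu_fixed, params[0]
    return -norm.logpdf(x, mu, np.exp(log_sigma)).sum()

# Unrestricted maximum over Omega and restricted maximum over omega = {mu = mu0}.
full = minimize(neg_loglik, x0=[x.mean(), np.log(x.std())])
restricted = minimize(neg_loglik, x0=[np.log(x.std())], args=(mu0,))

# Likelihood ratio lambda = max_omega L / max_Omega L (always <= 1); small values reject H0.
lam = np.exp(full.fun - restricted.fun)
print(f"lambda = {lam:.4f}")
```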
Example: Suppose that $x_1, \dots, x_n$ is a sample from the Normal distribution with mean $\mu$ (unknown) and variance $\sigma^2$ (unknown). Then $x_1, \dots, x_n$ have joint density function
$$f(x_1, \dots, x_n; \mu, \sigma^2) = \left(2\pi\sigma^2\right)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right).$$
Suppose that we want to test $H_0: \mu = \mu_0$ against $H_A: \mu \ne \mu_0$. The likelihood function is
$$L(\mu, \sigma^2) = f(x_1, \dots, x_n; \mu, \sigma^2).$$
Note: we want to test $H_0: \mu = \mu_0$ against $H_A: \mu \ne \mu_0$, with $L(\mu, \sigma^2) = f(x_1, \dots, x_n; \mu, \sigma^2)$. Here
$$\Omega = \{(\mu, \sigma^2): -\infty < \mu < \infty,\ \sigma^2 > 0\},$$
and if $H_0: \mu = \mu_0$ is true then
$$\omega = \{(\mu_0, \sigma^2): \sigma^2 > 0\}.$$
We have already shown that $L(\mu, \sigma^2)$ is at a maximum when
$$\hat{\mu} = \bar{x} \quad\text{and}\quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$
Thus
$$\max_{\Omega} L(\mu, \sigma^2) = L(\bar{x}, \hat{\sigma}^2) = \left(2\pi\hat{\sigma}^2\right)^{-n/2} e^{-n/2}.$$
Now consider maximizing $L(\mu_0, \sigma^2)$ over $\omega$, i.e. over $\sigma^2$ with $\mu$ fixed at $\mu_0$. This is equivalent to choosing $\sigma^2$ to maximize
$$\ln L(\mu_0, \sigma^2) = -\frac{n}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu_0)^2.$$
Setting the derivative with respect to $\sigma^2$ equal to zero,
$$\frac{\partial}{\partial\sigma^2}\ln L(\mu_0, \sigma^2) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu_0)^2 = 0,$$
gives
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_0)^2.$$
Thus $L(\mu_0, \sigma^2)$ is maximized subject to $(\mu, \sigma^2) \in \omega$ when
$$\hat{\sigma}_0^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_0)^2,$$
and hence
$$\max_{\omega} L(\mu, \sigma^2) = L(\mu_0, \hat{\sigma}_0^2) = \left(2\pi\hat{\sigma}_0^2\right)^{-n/2} e^{-n/2}.$$
The Likelihood Ratio Test of size $\alpha$ rejects $H_0: \mu = \mu_0$ in favour of $H_A: \mu \ne \mu_0$ if
$$\lambda = \frac{\max_{\omega} L}{\max_{\Omega} L} = \left(\frac{\hat{\sigma}^2}{\hat{\sigma}_0^2}\right)^{n/2} = \left(\frac{\sum_i (x_i - \bar{x})^2}{\sum_i (x_i - \mu_0)^2}\right)^{n/2} \le k,$$
i.e. if
$$\frac{\sum_i (x_i - \mu_0)^2}{\sum_i (x_i - \bar{x})^2} \ge k^{-2/n} = K'.$$
Now
$$\sum_{i=1}^{n}(x_i - \mu_0)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu_0)^2,$$
and
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$
Thus
$$\frac{\sum_i (x_i - \mu_0)^2}{\sum_i (x_i - \bar{x})^2} = 1 + \frac{n(\bar{x} - \mu_0)^2}{\sum_i (x_i - \bar{x})^2} = 1 + \frac{1}{n-1}\left(\frac{\bar{x} - \mu_0}{s/\sqrt{n}}\right)^{2} = 1 + \frac{t^2}{n-1},$$
where
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}.$$
The Likelihood Ratio Test of size $\alpha$ rejects $H_0: \mu = \mu_0$ in favour of $H_A: \mu \ne \mu_0$ if
$$\lambda \le k, \quad\text{equivalently}\quad |t| = \left|\frac{\bar{x} - \mu_0}{s/\sqrt{n}}\right| \ge K,$$
where $k$ (or $K$) is chosen so that the test is of size $\alpha$. The value that achieves this is
$$K = t_{\alpha/2}(n-1),$$
the upper $\alpha/2$ critical point of Student's $t$ distribution with $n-1$ degrees of freedom.
Conclusion: the Likelihood Ratio Test of size $\alpha$ for testing $H_0: \mu = \mu_0$ against $H_A: \mu \ne \mu_0$ is Student's $t$-test.
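This equivalence can be checked numerically. The Python sketch below (an assumed illustration with made-up data) computes the likelihood-ratio statistic $\lambda$ directly and shows that it is a decreasing function of $|t|$, so rejecting for small $\lambda$ is the same as rejecting for large $|t|$; scipy's one-sample t-test gives the same statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(0.4, 1.0, size=20)   # hypothetical sample
mu0 = 0.0                            # hypothetical H0 value
n = x.size

# Likelihood-ratio statistic: lambda = (sum (x - xbar)^2 / sum (x - mu0)^2)^(n/2)
lam = (np.sum((x - x.mean())**2) / np.sum((x - mu0)**2)) ** (n / 2)

# Student's t statistic and the identity lambda = (1 + t^2/(n-1))^(-n/2)
t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
lam_from_t = (1 + t**2 / (n - 1)) ** (-n / 2)

t_scipy, p_value = stats.ttest_1samp(x, mu0)
print(f"lambda = {lam:.4f} (= {lam_from_t:.4f} from t), t = {t:.3f} (scipy: {t_scipy:.3f}), p = {p_value:.4f}")
```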
Example: Suppose that $x_1, \dots, x_n$ is a sample from the Uniform distribution from $0$ to $\theta$ (unknown). Then $x_1, \dots, x_n$ have joint density function
$$f(x_1, \dots, x_n; \theta) = \frac{1}{\theta^n} \quad\text{for } 0 \le x_1, \dots, x_n \le \theta \ \text{(and 0 otherwise)}.$$
Suppose that we want to test $H_0: \theta = \theta_0$ against $H_A: \theta \ne \theta_0$. Note: the likelihood function is $L(\theta) = f(x_1, \dots, x_n; \theta)$.
We have already shown that $L(\theta)$ is at a maximum when $\hat{\theta} = x_{(n)} = \max(x_1, \dots, x_n)$. Thus
$$\max_{\Omega} L(\theta) = L(x_{(n)}) = \frac{1}{x_{(n)}^{\,n}}.$$
Also, since $\omega = \{\theta_0\}$, the maximum of $L(\theta)$ subject to $\theta \in \omega$ is simply
$$\max_{\omega} L(\theta) = L(\theta_0) = \begin{cases} \dfrac{1}{\theta_0^{\,n}} & \text{if } x_{(n)} \le \theta_0 \\[4pt] 0 & \text{if } x_{(n)} > \theta_0. \end{cases}$$
Hence
$$\lambda = \frac{\max_{\omega} L(\theta)}{\max_{\Omega} L(\theta)} = \begin{cases} \left(\dfrac{x_{(n)}}{\theta_0}\right)^{\!n} & \text{if } x_{(n)} \le \theta_0 \\[4pt] 0 & \text{if } x_{(n)} > \theta_0. \end{cases}$$
We will reject $H_0$ if $\lambda < k$.
Hence we will reject $H_0$ if
$$x_{(n)} > \theta_0 \quad\text{or}\quad \left(\frac{x_{(n)}}{\theta_0}\right)^{\!n} < k,$$
i.e. if
$$x_{(n)} > \theta_0 \quad\text{or}\quad x_{(n)} < \theta_0\, k^{1/n} = K.$$
Summarizing: we reject $H_0$ if
$$x_{(n)} > \theta_0 \quad\text{or}\quad x_{(n)} < K,$$
where $K$ (equivalently $k$) is chosen so that
$$P(\text{reject } H_0 \mid H_0 \text{ true}) = \alpha.$$
Again, to find $K$ we need to determine the sampling distribution of $x_{(n)} = \max(x_1, \dots, x_n)$ when $H_0$ is true.
The sampling distribution of $x_{(n)}$: when $H_0$ is true ($\theta = \theta_0$),
$$P\!\left(x_{(n)} \le u\right) = P(x_1 \le u, \dots, x_n \le u) = \left(\frac{u}{\theta_0}\right)^{\!n} \quad\text{for } 0 \le u \le \theta_0.$$
We want
$$P\!\left(x_{(n)} < K \mid H_0\right) = \left(\frac{K}{\theta_0}\right)^{\!n} = \alpha$$
(the event $x_{(n)} > \theta_0$ has probability 0 under $H_0$), thus
$$K = \theta_0\, \alpha^{1/n}.$$
Final summary: we reject $H_0$ if
$$x_{(n)} > \theta_0 \quad\text{or}\quad x_{(n)} < K = \theta_0\, \alpha^{1/n},$$
and accept $H_0$ otherwise.
Example: Suppose we have a sample of $n = 30$ from the Uniform distribution on $(0, \theta)$. We want to test $H_0: \theta = 10$ ($= \theta_0$) against $H_A: \theta \ne 10$. We are going to reject $H_0: \theta = 10$ if
$$x_{(n)} > 10 \quad\text{or}\quad x_{(n)} < K = 10\,\alpha^{1/30}.$$
For the observed sample neither condition holds, hence we accept $H_0: \theta = 10$.
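A quick way to reproduce this kind of check is sketched below in Python (the simulated data and the choice $\alpha = 0.05$ are assumptions for illustration; the slides' actual sample values are not shown here).

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n, alpha = 10.0, 30, 0.05      # H0 value; alpha assumed for illustration
x = rng.uniform(0, theta0, size=n)      # simulated sample (hypothetical)

x_max = x.max()
K = theta0 * alpha ** (1 / n)           # lower critical value: K = theta0 * alpha^(1/n)

# Reject H0 if the sample maximum exceeds theta0 or falls below K.
reject = (x_max > theta0) or (x_max < K)
print(f"x_max = {x_max:.3f}, K = {K:.3f}, reject H0: {reject}")
```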
Comparing Populations: Proportions and Means
Sums, Differences, Combinations of R.V.'s

A linear combination of random variables $X, Y, \dots$ is a combination of the form
$$L = aX + bY + \dots$$
where $a$, $b$, etc. are numbers, positive or negative. Most common: the sum $X + Y$ and the difference $X - Y$.
Means of Linear Combinations

If $L = aX + bY + \dots$, the mean of $L$ is
$$\mu_L = a\,\mu_X + b\,\mu_Y + \dots$$
Most common:
$$\mu_{X+Y} = \mu_X + \mu_Y, \qquad \mu_{X-Y} = \mu_X - \mu_Y.$$
Variances of Linear Combinations

If $X, Y, \dots$ are independent random variables and $L = aX + bY + \dots$, then
$$\sigma_L^2 = a^2\sigma_X^2 + b^2\sigma_Y^2 + \dots$$
Most common:
$$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2, \qquad \sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2.$$
Combining Independent Normal Random Variables

If $X, Y, \dots$ are independent normal random variables, then $L = aX + bY + \dots$ is normally distributed. In particular:
$X + Y$ is normal with mean $\mu_X + \mu_Y$ and standard deviation $\sqrt{\sigma_X^2 + \sigma_Y^2}$;
$X - Y$ is normal with mean $\mu_X - \mu_Y$ and standard deviation $\sqrt{\sigma_X^2 + \sigma_Y^2}$.
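These rules are easy to verify by simulation. The sketch below (an illustration with assumed parameter values, not part of the slides) simulates independent normals and checks the mean and variance of $X - Y$.

```python
import numpy as np

rng = np.random.default_rng(4)
mu_x, sigma_x = 5.0, 2.0    # assumed parameters for illustration
mu_y, sigma_y = 3.0, 1.5

X = rng.normal(mu_x, sigma_x, size=1_000_000)
Y = rng.normal(mu_y, sigma_y, size=1_000_000)
D = X - Y

# Theory: mean(X - Y) = mu_x - mu_y, var(X - Y) = sigma_x^2 + sigma_y^2.
print(f"sample mean {D.mean():.3f} vs theory {mu_x - mu_y:.3f}")
print(f"sample var  {D.var():.3f} vs theory {sigma_x**2 + sigma_y**2:.3f}")
```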
Comparing Proportions

Situation: we have two populations (1 and 2). Let $p_1$ denote the probability (proportion) of "success" in population 1, and $p_2$ the probability (proportion) of "success" in population 2. The objective is to compare the two population proportions.
We want to test either
$$H_0: p_1 = p_2 \ \text{against}\ H_A: p_1 \ne p_2 \quad\text{(two-sided)},$$
or
$$H_0: p_1 = p_2 \ \text{against}\ H_A: p_1 > p_2 \ \text{(or } H_A: p_1 < p_2\text{)} \quad\text{(one-sided)}.$$
$\hat{p}_i = \dfrac{x_i}{n_i}$ is an estimate of $p_i$ $(i = 1, 2)$. Recall: $\hat{p}_i$ has approximately a normal distribution with
$$\text{mean } p_i \quad\text{and standard deviation}\quad \sqrt{\frac{p_i(1 - p_i)}{n_i}},$$
where a sample of $n_1$ is selected from population 1 resulting in $x_1$ successes, and a sample of $n_2$ is selected from population 2 resulting in $x_2$ successes.
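The normal approximation to $\hat{p}$ can be seen quickly by simulation; the sketch below (assumed values of $p$ and $n$, for illustration only) compares the simulated standard deviation of $\hat{p}$ with $\sqrt{p(1-p)/n}$.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 0.11, 1000                             # assumed success probability and sample size
phat = rng.binomial(n, p, size=200_000) / n    # many simulated sample proportions

# Theory: mean p, standard deviation sqrt(p(1-p)/n).
print(f"simulated mean {phat.mean():.4f} vs theory {p:.4f}")
print(f"simulated sd   {phat.std():.5f} vs theory {np.sqrt(p*(1-p)/n):.5f}")
```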
We want to estimate and test $p_1 - p_2$. The statistic $\hat{p}_1 - \hat{p}_2$ is an estimate of $p_1 - p_2$ and has approximately a normal distribution with
$$\text{mean } p_1 - p_2 \quad\text{and standard deviation}\quad \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}.$$
The statistic
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$
has approximately a standard normal distribution.
If $H_0: p_1 = p_2\ (= p)$ is true, then $p_1 - p_2 = 0$. Hence
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
has approximately a standard normal distribution, where the common value $p$ is estimated by the pooled proportion $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$.
The test statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}.$$
The Alternative Hypothesis $H_A$ and the corresponding Critical Region:
$H_A: p_1 \ne p_2$: reject $H_0$ if $|z| \ge z_{\alpha/2}$
$H_A: p_1 > p_2$: reject $H_0$ if $z \ge z_{\alpha}$
$H_A: p_1 < p_2$: reject $H_0$ if $z \le -z_{\alpha}$
Example: In a national study to determine if there was an increase in mortality due to pipe smoking, a random sample of $n_1 = 1067$ male nonsmoking pensioners were observed for a five-year period. In addition, a sample of $n_2 = 402$ male pensioners who had smoked a pipe for more than six years were observed for the same five-year period. At the end of the five-year period, $x_1 = 117$ of the nonsmoking pensioners had died while $x_2 = 54$ of the pipe-smoking pensioners had died. Is the mortality rate for pipe smokers higher than that for non-smokers?
We want to test
$$H_0: p_2 = p_1 \ (\text{no increase in mortality}) \quad\text{against}\quad H_A: p_2 > p_1 \ (\text{an increase in mortality}).$$
The test statistic:
$$z = \frac{\hat{p}_2 - \hat{p}_1}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}.$$
Note:
$$\hat{p}_1 = \frac{117}{1067} \approx 0.1097, \qquad \hat{p}_2 = \frac{54}{402} \approx 0.1343, \qquad \hat{p} = \frac{117 + 54}{1067 + 402} = \frac{171}{1469} \approx 0.1164.$$
The test statistic evaluates to
$$z = \frac{0.1343 - 0.1097}{\sqrt{0.1164(1 - 0.1164)\left(\dfrac{1}{1067} + \dfrac{1}{402}\right)}} \approx 1.31.$$
We reject $H_0$ if $z \ge z_{0.05} = 1.645$. Since $z \approx 1.31 < 1.645$, this is not true, hence we accept $H_0$. Conclusion: there is not a significant ($\alpha = 0.05$) increase in the mortality rate due to pipe smoking.
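The arithmetic for this example can be reproduced in Python; the sketch below uses the counts given in the slides and a one-sided test at $\alpha = 0.05$ (the helper function name is our own, not from any particular library).

```python
import numpy as np
from scipy.stats import norm

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p2_hat - p1_hat) / se

# Counts from the pipe-smoking study: nonsmokers (1) and pipe smokers (2).
z = two_proportion_z(x1=117, n1=1067, x2=54, n2=402)
z_crit = norm.ppf(0.95)          # one-sided critical value at alpha = 0.05
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {z >= z_crit}")
# Expected: z is about 1.31, below 1.645, so H0 is not rejected.
```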
Estimating a difference in proportions using confidence intervals

Situation: we have two populations (1 and 2). Let $p_1$ denote the probability (proportion) of "success" in population 1, and $p_2$ the probability (proportion) of "success" in population 2. The objective is to estimate the difference in the two population proportions, $\delta = p_1 - p_2$.
The $100P\% = 100(1 - \alpha)\%$ confidence interval for $\delta = p_1 - p_2$:
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}.$$
Example: estimating the increase in the mortality rate for pipe smokers over that for non-smokers, $\delta = p_2 - p_1$.
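A confidence interval for this increase can be computed from the study counts as sketched below in Python (the 95% level is an assumption for illustration; the slides do not state which level is used here).

```python
import numpy as np
from scipy.stats import norm

# Counts from the pipe-smoking study: nonsmokers (1) and pipe smokers (2).
x1, n1, x2, n2 = 117, 1067, 54, 402
p1_hat, p2_hat = x1 / n1, x2 / n2

# Unpooled standard error for the difference p2 - p1.
se = np.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

alpha = 0.05                              # assumed confidence level 95%
z = norm.ppf(1 - alpha / 2)
delta_hat = p2_hat - p1_hat
lower, upper = delta_hat - z * se, delta_hat + z * se
print(f"estimate {delta_hat:.4f}, 95% CI ({lower:.4f}, {upper:.4f})")
# The interval contains 0, consistent with the non-significant test result above.
```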