Inference for the mean vector
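Univariate Inference Let $x_1, x_2, \dots, x_n$ denote a sample of $n$ from the normal distribution with mean $\mu$ and variance $\sigma^2$. Suppose we want to test $H_0: \mu = \mu_0$ vs $H_A: \mu \neq \mu_0$. The appropriate test is the t test. The test statistic: $t = \frac{\sqrt{n}(\bar{x} - \mu_0)}{s}$, where $s$ is the sample standard deviation. Reject $H_0$ if $|t| > t_{\alpha/2}$ with $n - 1$ d.f.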
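Here is a minimal sketch of the univariate one-sample t test in plain Python, using only the standard library. The function names are illustrative. The Python standard library has no t quantile function, so the caller supplies the critical value $t_{\alpha/2}$.

```python
import math
import statistics


def one_sample_t(xs, mu0):
    """Return t = sqrt(n) * (xbar - mu0) / s for a univariate sample xs."""
    n = len(xs)
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation, divisor n - 1
    return math.sqrt(n) * (xbar - mu0) / s


def reject_h0(t, t_crit):
    """Two-sided rule: reject H0 when |t| > t_crit, where t_crit = t_{alpha/2} with n - 1 d.f."""
    return abs(t) > t_crit


# Illustrative data only.
t = one_sample_t([1, 2, 3, 4, 5], mu0=2)
print(round(t, 4))          # sqrt(2), about 1.4142
print(reject_h0(t, 2.776))  # 2.776 is the tabled t_{0.025} with 4 d.f.; prints False
```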
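The multivariate Test Let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ denote a sample of $n$ from the $p$-variate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$. Suppose we want to test $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ vs $H_A: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$.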
Roy’s Union-Intersection Principle This is a general procedure for developing a multivariate test from the corresponding univariate test. Convert the multivariate problem to a univariate problem by considering an arbitrary linear combination $u = \mathbf{a}'\mathbf{x}$ of the observation vector.
Perform the univariate test for the arbitrary linear combination $\mathbf{a}'\mathbf{x}$ of the observation vector. Repeat this for all possible choices of $\mathbf{a}$. Reject the multivariate hypothesis if $H_0$ is rejected for any one of the choices of $\mathbf{a}$. Accept the multivariate hypothesis if $H_0$ is accepted for all of the choices of $\mathbf{a}$. Set the type I error rate for the individual tests so that the type I error rate for the multivariate test is $\alpha$.
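Application of Roy’s principle to the following situation Let $\mathbf{x}_1, \dots, \mathbf{x}_n$ denote a sample of $n$ from the $p$-variate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$. Suppose we want to test $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ vs $H_A: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$. Let $u_i = \mathbf{a}'\mathbf{x}_i$ for $i = 1, \dots, n$. Then $u_1, \dots, u_n$ is a sample of $n$ from the normal distribution with mean $\mathbf{a}'\boldsymbol{\mu}$ and variance $\mathbf{a}'\Sigma\mathbf{a}$.
To test $H_0: \mathbf{a}'\boldsymbol{\mu} = \mathbf{a}'\boldsymbol{\mu}_0$ vs $H_A: \mathbf{a}'\boldsymbol{\mu} \neq \mathbf{a}'\boldsymbol{\mu}_0$ we would use the test statistic: $t_{\mathbf{a}} = \frac{\sqrt{n}(\bar{u} - \mathbf{a}'\boldsymbol{\mu}_0)}{s_u}$
where $\bar{u} = \mathbf{a}'\bar{\mathbf{x}}$ and $s_u^2 = \mathbf{a}'S\mathbf{a}$, with $\bar{\mathbf{x}}$ the sample mean vector and $S$ the sample covariance matrix.
Thus $t_{\mathbf{a}} = \frac{\sqrt{n}\,\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)}{\sqrt{\mathbf{a}'S\mathbf{a}}}$. We will reject $H_0: \mathbf{a}'\boldsymbol{\mu} = \mathbf{a}'\boldsymbol{\mu}_0$ if $t_{\mathbf{a}}^2 = \frac{n[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)]^2}{\mathbf{a}'S\mathbf{a}} > t_{\alpha/2}^2$.
Using Roy’s Union-Intersection principle, with a common critical value $K$ chosen so that the type I error rate of the multivariate test is $\alpha$: We will reject $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ if $t_{\mathbf{a}}^2 > K$ for at least one $\mathbf{a}$. We accept $H_0$ if $t_{\mathbf{a}}^2 \le K$ for all $\mathbf{a}$.
i.e. We reject $H_0$ if $\max_{\mathbf{a}} t_{\mathbf{a}}^2 > K$. We accept $H_0$ if $\max_{\mathbf{a}} t_{\mathbf{a}}^2 \le K$.
Consider the problem of finding: $\max_{\mathbf{a}} t_{\mathbf{a}}^2 = \max_{\mathbf{a}} \frac{n(\mathbf{a}'\mathbf{d})^2}{\mathbf{a}'S\mathbf{a}}$ where $\mathbf{d} = \bar{\mathbf{x}} - \boldsymbol{\mu}_0$.
By the Cauchy-Schwarz inequality, $(\mathbf{a}'\mathbf{d})^2 = [(S^{1/2}\mathbf{a})'(S^{-1/2}\mathbf{d})]^2 \le (\mathbf{a}'S\mathbf{a})(\mathbf{d}'S^{-1}\mathbf{d})$, with equality when $\mathbf{a}$ is proportional to $S^{-1}\mathbf{d}$. Thus $\max_{\mathbf{a}} t_{\mathbf{a}}^2 = n\,\mathbf{d}'S^{-1}\mathbf{d}$.
Thus Roy’s Union-Intersection principle states: We reject $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ if $T^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0) > K$. We accept $H_0$ if $T^2 \le K$. $T^2$ is called Hotelling’s $T^2$ statistic.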
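The maximization step can be checked numerically. The sketch below is plain Python with a small hand-written Gaussian-elimination solver. The helper names and the values of $\mathbf{d}$ and $S$ are illustrative assumptions. It computes $T^2 = n\,\mathbf{d}'S^{-1}\mathbf{d}$, confirms that $\mathbf{a}^* = S^{-1}\mathbf{d}$ attains it, and confirms that randomly drawn directions $\mathbf{a}$ never exceed it.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A small and nonsingular)."""
    p = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * p
    for r in range(p - 1, -1, -1):
        x[r] = (M[r][p] - sum(M[r][c] * x[c] for c in range(r + 1, p))) / M[r][r]
    return x


def t_squared_a(a, d, S, n):
    """t_a^2 = n (a'd)^2 / (a'Sa), the squared t statistic for u = a'x."""
    ad = sum(ai * di for ai, di in zip(a, d))
    aSa = sum(a[j] * S[j][k] * a[k] for j in range(len(a)) for k in range(len(a)))
    return n * ad ** 2 / aSa


def hotelling_from_d(d, S, n):
    """Return (T^2, a_star): T^2 = n d'S^{-1}d, attained at a_star = S^{-1}d."""
    a_star = solve(S, d)
    return n * sum(di * wi for di, wi in zip(d, a_star)), a_star


def max_over_random_directions(d, S, n, trials=2000, seed=0):
    """Largest t_a^2 over randomly drawn directions a; by Cauchy-Schwarz it cannot exceed T^2."""
    rng = random.Random(seed)
    return max(t_squared_a([rng.gauss(0, 1) for _ in d], d, S, n) for _ in range(trials))


d = [1.0, 0.0]                # d = xbar - mu0 (illustrative values)
S = [[2.0, 1.0], [1.0, 2.0]]  # sample covariance (illustrative values)
T2, a_star = hotelling_from_d(d, S, n=10)
print(T2, t_squared_a(a_star, d, S, 10))                 # both about 6.6667
print(max_over_random_directions(d, S, 10) <= T2 + 1e-9)  # True
```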
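Choosing the critical value for Hotelling’s $T^2$ statistic We reject $H_0$ if $T^2 > K$. To determine $K$, we need to find the sampling distribution of $T^2$ when $H_0$ is true. It turns out that if $H_0$ is true then $F = \frac{n-p}{p(n-1)}T^2$ has an F distribution with $\nu_1 = p$ and $\nu_2 = n - p$ degrees of freedom.
Thus Hotelling’s $T^2$ test: We reject $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ if $F = \frac{n-p}{p(n-1)}T^2 > F_\alpha(p, n-p)$, or equivalently if $T^2 > K = \frac{p(n-1)}{n-p}F_\alpha(p, n-p)$.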
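Here is a minimal plain-Python sketch of the one-sample Hotelling $T^2$ test: it computes $T^2$ and its $F$ transform from raw data. The function names and the data are illustrative. The caller supplies the percentage point $F_\alpha(p, n-p)$, either from an F table or, assuming SciPy is available, from `scipy.stats.f.ppf(1 - alpha, p, n - p)`.

```python
def mean_vector(X):
    """Sample mean vector of the rows of X (a list of p-vectors)."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def sample_cov(X):
    """Sample covariance matrix S with divisor n - 1."""
    n, p = len(X), len(X[0])
    xbar = mean_vector(X)
    return [[sum((x[j] - xbar[j]) * (x[k] - xbar[k]) for x in X) / (n - 1)
             for k in range(p)] for j in range(p)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    p = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * p
    for r in range(p - 1, -1, -1):
        x[r] = (M[r][p] - sum(M[r][c] * x[c] for c in range(r + 1, p))) / M[r][r]
    return x


def hotelling_one_sample(X, mu0):
    """Return (T^2, F) for H0: mu = mu0; F = (n - p) / (p (n - 1)) * T^2 is F(p, n - p) under H0."""
    n, p = len(X), len(X[0])
    d = [xb - m for xb, m in zip(mean_vector(X), mu0)]
    T2 = n * sum(di * wi for di, wi in zip(d, solve(sample_cov(X), d)))
    return T2, (n - p) / (p * (n - 1)) * T2


def t2_critical(F_crit, n, p):
    """K = p (n - 1) / (n - p) * F_crit, where F_crit = F_alpha(p, n - p) comes from the caller."""
    return p * (n - 1) / (n - p) * F_crit


X = [[0, 0], [2, 0], [0, 2], [2, 2]]  # illustrative data, n = 4, p = 2
T2, F = hotelling_one_sample(X, [0, 0])
print(T2, F)  # about 6.0 and 2.0
```

With $p = 1$ the function reduces to the squared univariate t statistic, and $F(1, n-1) = t^2$.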
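Another derivation of Hotelling’s $T^2$ statistic Another method of developing statistical tests is the likelihood ratio method. Suppose that the data vector, $\mathbf{x}$, has joint density $f(\mathbf{x}; \boldsymbol{\theta})$. Suppose that the parameter vector, $\boldsymbol{\theta}$, belongs to the set $\Omega$. Let $\omega$ denote a subset of $\Omega$. Finally we want to test $H_0: \boldsymbol{\theta} \in \omega$ vs $H_A: \boldsymbol{\theta} \notin \omega$.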
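The likelihood ratio test rejects $H_0$ if $\lambda = \frac{\max_{\boldsymbol{\theta} \in \omega} L(\boldsymbol{\theta})}{\max_{\boldsymbol{\theta} \in \Omega} L(\boldsymbol{\theta})} < \lambda_\alpha$, where $L(\boldsymbol{\theta}) = f(\mathbf{x}; \boldsymbol{\theta})$ is the likelihood function.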
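The situation Let $\mathbf{x}_1, \dots, \mathbf{x}_n$ denote a sample of $n$ from the $p$-variate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$. Suppose we want to test $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ vs $H_A: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$.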
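The likelihood function is: $L(\boldsymbol{\mu}, \Sigma) = \prod_{i=1}^n \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}e^{-\frac{1}{2}(\mathbf{x}_i - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}_i - \boldsymbol{\mu})} = (2\pi)^{-np/2}|\Sigma|^{-n/2}\exp\left\{-\frac{1}{2}\sum_{i=1}^n (\mathbf{x}_i - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}_i - \boldsymbol{\mu})\right\}$ and the log-likelihood function is: $l(\boldsymbol{\mu}, \Sigma) = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^n (\mathbf{x}_i - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}_i - \boldsymbol{\mu})$
The maximum likelihood estimators of $\boldsymbol{\mu}$ and $\Sigma$ are $\hat{\boldsymbol{\mu}} = \bar{\mathbf{x}}$ and $\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})' = \frac{n-1}{n}S$
The maximum likelihood estimators of $\boldsymbol{\mu}$ and $\Sigma$ when $H_0$ is true are: $\hat{\boldsymbol{\mu}}_0 = \boldsymbol{\mu}_0$ and $\hat{\Sigma}_0 = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i - \boldsymbol{\mu}_0)(\mathbf{x}_i - \boldsymbol{\mu}_0)'$
To evaluate the likelihood function at these estimators, note that $\sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})'\hat{\Sigma}^{-1}(\mathbf{x}_i - \bar{\mathbf{x}}) = \operatorname{tr}\left(\hat{\Sigma}^{-1}\sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'\right) = \operatorname{tr}(\hat{\Sigma}^{-1}n\hat{\Sigma}) = np$
Thus $\max_{\Omega} L = L(\bar{\mathbf{x}}, \hat{\Sigma}) = (2\pi)^{-np/2}|\hat{\Sigma}|^{-n/2}e^{-np/2}$. Similarly, since the same trace argument gives $np$ for $\hat{\Sigma}_0$, $\max_{\omega} L = L(\boldsymbol{\mu}_0, \hat{\Sigma}_0) = (2\pi)^{-np/2}|\hat{\Sigma}_0|^{-n/2}e^{-np/2}$
and $\lambda = \frac{\max_{\omega} L}{\max_{\Omega} L} = \left(\frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|}\right)^{n/2}$
Note: Let $A = \sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})' = (n-1)S$
and $\mathbf{d} = \bar{\mathbf{x}} - \boldsymbol{\mu}_0$. Now $n\hat{\Sigma} = A$ and $n\hat{\Sigma}_0 = \sum_{i=1}^n (\mathbf{x}_i - \boldsymbol{\mu}_0)(\mathbf{x}_i - \boldsymbol{\mu}_0)' = A + n\mathbf{d}\mathbf{d}'$
Also $|A + n\mathbf{d}\mathbf{d}'| = |A|(1 + n\mathbf{d}'A^{-1}\mathbf{d})$
Thus $\lambda^{2/n} = \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|} = \frac{|A|}{|A + n\mathbf{d}\mathbf{d}'|} = \frac{1}{1 + n\mathbf{d}'A^{-1}\mathbf{d}}$
Thus using $A^{-1} = \frac{1}{n-1}S^{-1}$: $n\mathbf{d}'A^{-1}\mathbf{d} = \frac{n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)}{n-1} = \frac{T^2}{n-1}$, so $\lambda^{2/n} = \frac{1}{1 + T^2/(n-1)}$
Then $\lambda$ is a decreasing function of $T^2$. Thus rejecting $H_0$ when $\lambda < \lambda_\alpha$ is the same as rejecting $H_0$ when $T^2 > (n-1)(\lambda_\alpha^{-2/n} - 1)$. This is the same as Hotelling’s $T^2$ test if $(n-1)(\lambda_\alpha^{-2/n} - 1) = \frac{p(n-1)}{n-p}F_\alpha(p, n-p)$.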
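The identity $\lambda^{2/n} = 1/(1 + T^2/(n-1))$ can be checked numerically. The plain-Python sketch below evaluates $\lambda^{2/n}$ as a ratio of determinants of the two maximum likelihood covariance estimates and compares it with the $T^2$ form. The helper names and data are illustrative, and determinants are computed by hand-written Gaussian elimination rather than a library call.

```python
def mean_vector(X):
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def mle_cov(X, center):
    """(1/n) sum_i (x_i - c)(x_i - c)': Sigma_hat for c = xbar, Sigma_hat_0 for c = mu0."""
    n, p = len(X), len(X[0])
    return [[sum((x[j] - center[j]) * (x[k] - center[k]) for x in X) / n
             for k in range(p)] for j in range(p)]


def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [[float(v) for v in row] for row in A]
    p, sign = len(M), 1.0
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        if M[piv][col] == 0.0:
            return 0.0
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            sign = -sign
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= f * M[col][c]
    result = sign
    for i in range(p):
        result *= M[i][i]
    return result


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    p = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * p
    for r in range(p - 1, -1, -1):
        x[r] = (M[r][p] - sum(M[r][c] * x[c] for c in range(r + 1, p))) / M[r][r]
    return x


def lr_power(X, mu0):
    """lambda^{2/n} = |Sigma_hat| / |Sigma_hat_0|."""
    return det(mle_cov(X, mean_vector(X))) / det(mle_cov(X, mu0))


def hotelling_t2(X, mu0):
    """T^2 = n d'S^{-1}d, using S = n / (n - 1) * Sigma_hat."""
    n = len(X)
    xbar = mean_vector(X)
    S = [[v * n / (n - 1) for v in row] for row in mle_cov(X, xbar)]
    d = [xb - m for xb, m in zip(xbar, mu0)]
    return n * sum(di * wi for di, wi in zip(d, solve(S, d)))


X = [[0, 0], [2, 0], [0, 2], [2, 2]]  # illustrative data
n, T2 = len(X), hotelling_t2(X, [0, 0])
print(lr_power(X, [0, 0]), 1 / (1 + T2 / (n - 1)))  # both about 0.3333
```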
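Example For $n = 10$ students we measure scores on a Math proficiency test ($x_1$), a Science proficiency test ($x_2$), an English proficiency test ($x_3$) and a French proficiency test ($x_4$). The average score for each of the tests in previous years was 60. Has this changed? That is, we test $H_0: \boldsymbol{\mu} = (60, 60, 60, 60)'$ vs $H_A: \boldsymbol{\mu} \neq (60, 60, 60, 60)'$.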
The data
Summary Statistics
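Simultaneous Inference for means Recall (Using Roy’s Union-Intersection Principle): we reject $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ if $T^2 = \max_{\mathbf{a}} \frac{n[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)]^2}{\mathbf{a}'S\mathbf{a}} > K$, where $K$ is chosen so that $P[T^2 \le K] = 1 - \alpha$ when $H_0$ is true.
Now this holds for whatever the true mean vector $\boldsymbol{\mu}$ is, with $\boldsymbol{\mu}_0$ replaced by $\boldsymbol{\mu}$: $P\left[\max_{\mathbf{a}} \frac{n[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu})]^2}{\mathbf{a}'S\mathbf{a}} \le K\right] = 1 - \alpha$.
Thus $P\left[|\mathbf{a}'\bar{\mathbf{x}} - \mathbf{a}'\boldsymbol{\mu}| \le \sqrt{K}\sqrt{\frac{\mathbf{a}'S\mathbf{a}}{n}} \text{ for all } \mathbf{a}\right] = 1 - \alpha$ and the set of intervals $\mathbf{a}'\bar{\mathbf{x}} \pm \sqrt{K}\sqrt{\frac{\mathbf{a}'S\mathbf{a}}{n}}$ forms a set of $(1 - \alpha)100\%$ simultaneous confidence intervals for $\mathbf{a}'\boldsymbol{\mu}$ (all $\mathbf{a}$).
Recall $K = \frac{p(n-1)}{n-p}F_\alpha(p, n-p)$. Thus the set of $(1 - \alpha)100\%$ simultaneous confidence intervals for $\mathbf{a}'\boldsymbol{\mu}$ is $\mathbf{a}'\bar{\mathbf{x}} \pm \sqrt{\frac{p(n-1)}{n-p}F_\alpha(p, n-p)}\sqrt{\frac{\mathbf{a}'S\mathbf{a}}{n}}$.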
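Here is a minimal plain-Python sketch of one simultaneous interval $\mathbf{a}'\bar{\mathbf{x}} \pm \sqrt{K}\sqrt{\mathbf{a}'S\mathbf{a}/n}$. The function names and data are illustrative. `F_crit` stands for $F_\alpha(p, n-p)$ and must come from a table or a statistics package. The value 1.0 used below is a placeholder chosen to make the arithmetic easy to check, not a real percentage point.

```python
import math


def mean_vector(X):
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def sample_cov(X):
    """Sample covariance matrix S with divisor n - 1."""
    n, p = len(X), len(X[0])
    xbar = mean_vector(X)
    return [[sum((x[j] - xbar[j]) * (x[k] - xbar[k]) for x in X) / (n - 1)
             for k in range(p)] for j in range(p)]


def simultaneous_ci(X, a, F_crit):
    """Interval a'xbar +/- sqrt(K) * sqrt(a'Sa / n), with K = p (n - 1) / (n - p) * F_crit."""
    n, p = len(X), len(X[0])
    xbar, S = mean_vector(X), sample_cov(X)
    center = sum(ai * xi for ai, xi in zip(a, xbar))
    aSa = sum(a[j] * S[j][k] * a[k] for j in range(p) for k in range(p))
    K = p * (n - 1) / (n - p) * F_crit
    half = math.sqrt(K * aSa / n)
    return center - half, center + half


X = [[0, 0], [2, 0], [0, 2], [2, 2]]             # illustrative data
print(simultaneous_ci(X, [1, 0], F_crit=1.0))    # about (0.0, 2.0); F_crit is a placeholder
```

Because the same $K$ is used for every $\mathbf{a}$, the caller can evaluate any number of linear combinations (single means, differences, contrasts) at a joint coverage of at least $1 - \alpha$.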
The two sample problem
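Univariate Inference Let $x_1, x_2, \dots, x_n$ denote a sample of $n$ from the normal distribution with mean $\mu_x$ and variance $\sigma^2$. Let $y_1, y_2, \dots, y_m$ denote a sample of $m$ from the normal distribution with mean $\mu_y$ and variance $\sigma^2$. Suppose we want to test $H_0: \mu_x = \mu_y$ vs $H_A: \mu_x \neq \mu_y$.
The appropriate test is the t test. The test statistic: $t = \frac{\bar{x} - \bar{y}}{s_{\text{pooled}}\sqrt{\frac{1}{n} + \frac{1}{m}}}$, where $s_{\text{pooled}}^2 = \frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}$. Reject $H_0$ if $|t| > t_{\alpha/2}$ with d.f. $= n + m - 2$.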
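Here is a minimal standard-library sketch of the pooled two-sample t statistic. The function name and data are illustrative. As in the one-sample case, the caller compares $|t|$ with a tabled $t_{\alpha/2}$ for the returned degrees of freedom.

```python
import math
import statistics


def pooled_t(xs, ys):
    """Return (t, df): t = (xbar - ybar) / (s_pooled * sqrt(1/n + 1/m)), df = n + m - 2."""
    n, m = len(xs), len(ys)
    # statistics.variance uses divisor n - 1 (sample variance)
    sp2 = ((n - 1) * statistics.variance(xs) + (m - 1) * statistics.variance(ys)) / (n + m - 2)
    t = (statistics.mean(xs) - statistics.mean(ys)) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2


t, df = pooled_t([1, 2, 3], [4, 5, 6])  # illustrative data
print(round(t, 4), df)  # about -3.6742 with 4 d.f.
```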
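The multivariate Test Let $\mathbf{x}_1, \dots, \mathbf{x}_n$ denote a sample of $n$ from the $p$-variate normal distribution with mean vector $\boldsymbol{\mu}_x$ and covariance matrix $\Sigma$. Let $\mathbf{y}_1, \dots, \mathbf{y}_m$ denote a sample of $m$ from the $p$-variate normal distribution with mean vector $\boldsymbol{\mu}_y$ and covariance matrix $\Sigma$. Suppose we want to test $H_0: \boldsymbol{\mu}_x = \boldsymbol{\mu}_y$ vs $H_A: \boldsymbol{\mu}_x \neq \boldsymbol{\mu}_y$.
Hotelling’s $T^2$ statistic for the two sample problem: $T^2 = \frac{nm}{n+m}(\bar{\mathbf{x}} - \bar{\mathbf{y}})'S_{\text{pooled}}^{-1}(\bar{\mathbf{x}} - \bar{\mathbf{y}})$, where $S_{\text{pooled}} = \frac{(n-1)S_x + (m-1)S_y}{n+m-2}$. If $H_0$ is true then $F = \frac{n+m-p-1}{p(n+m-2)}T^2$ has an F distribution with $\nu_1 = p$ and $\nu_2 = n + m - p - 1$.
Thus Hotelling’s $T^2$ test: We reject $H_0: \boldsymbol{\mu}_x = \boldsymbol{\mu}_y$ if $F > F_\alpha(p, n+m-p-1)$, or equivalently if $T^2 > \frac{p(n+m-2)}{n+m-p-1}F_\alpha(p, n+m-p-1)$.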
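Here is a minimal plain-Python sketch of the two-sample Hotelling $T^2$ statistic and its $F$ transform. The helper names and data are illustrative, and the linear solve is a hand-written Gaussian elimination. With $p = 1$ it reduces to the square of the pooled t statistic.

```python
def mean_vector(X):
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def pooled_cov(X, Y):
    """S_pooled = [(n - 1) S_x + (m - 1) S_y] / (n + m - 2)."""
    n, m, p = len(X), len(Y), len(X[0])
    xbar, ybar = mean_vector(X), mean_vector(Y)

    def cross(Z, zbar, j, k):
        # sum of cross-products of deviations, i.e. (sample size - 1) * covariance entry
        return sum((z[j] - zbar[j]) * (z[k] - zbar[k]) for z in Z)

    return [[(cross(X, xbar, j, k) + cross(Y, ybar, j, k)) / (n + m - 2)
             for k in range(p)] for j in range(p)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    p = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * p
    for r in range(p - 1, -1, -1):
        x[r] = (M[r][p] - sum(M[r][c] * x[c] for c in range(r + 1, p))) / M[r][r]
    return x


def hotelling_two_sample(X, Y):
    """Return (T^2, F): T^2 = nm/(n+m) d'S_pooled^{-1}d with d = xbar - ybar,
    F = (n + m - p - 1) / (p (n + m - 2)) * T^2, which is F(p, n + m - p - 1) under H0."""
    n, m, p = len(X), len(Y), len(X[0])
    d = [a - b for a, b in zip(mean_vector(X), mean_vector(Y))]
    w = solve(pooled_cov(X, Y), d)
    T2 = n * m / (n + m) * sum(di * wi for di, wi in zip(d, w))
    return T2, (n + m - p - 1) / (p * (n + m - 2)) * T2


X = [[0, 0], [2, 0], [0, 2], [2, 2]]  # illustrative sample from population A
Y = [[1, 1], [3, 1], [1, 3], [3, 3]]  # illustrative sample from population B
print(hotelling_two_sample(X, Y))     # about (3.0, 1.25)
```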
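Simultaneous inference for the two-sample problem Hotelling’s $T^2$ statistic can be shown to be derivable by Roy’s Union-Intersection principle: $T^2 = \max_{\mathbf{a}} t_{\mathbf{a}}^2$, where $t_{\mathbf{a}} = \frac{\mathbf{a}'(\bar{\mathbf{x}} - \bar{\mathbf{y}})}{\sqrt{\left(\frac{1}{n} + \frac{1}{m}\right)\mathbf{a}'S_{\text{pooled}}\mathbf{a}}}$ is the two-sample t statistic for the linear combination $\mathbf{a}'\mathbf{x}$ ($S_{\text{pooled}}$ is the pooled sample covariance matrix).
Thus, replacing $\bar{\mathbf{x}} - \bar{\mathbf{y}}$ by $\bar{\mathbf{x}} - \bar{\mathbf{y}} - (\boldsymbol{\mu}_x - \boldsymbol{\mu}_y)$, whose distribution does not depend on the true mean vectors: $P\left[\max_{\mathbf{a}} \frac{[\mathbf{a}'(\bar{\mathbf{x}} - \bar{\mathbf{y}}) - \mathbf{a}'(\boldsymbol{\mu}_x - \boldsymbol{\mu}_y)]^2}{\left(\frac{1}{n} + \frac{1}{m}\right)\mathbf{a}'S_{\text{pooled}}\mathbf{a}} \le K\right] = 1 - \alpha$, where $K = \frac{p(n+m-2)}{n+m-p-1}F_\alpha(p, n+m-p-1)$.
Thus $P\left[|\mathbf{a}'(\bar{\mathbf{x}} - \bar{\mathbf{y}}) - \mathbf{a}'(\boldsymbol{\mu}_x - \boldsymbol{\mu}_y)| \le \sqrt{K}\sqrt{\left(\frac{1}{n} + \frac{1}{m}\right)\mathbf{a}'S_{\text{pooled}}\mathbf{a}} \text{ for all } \mathbf{a}\right] = 1 - \alpha$.
Hence, with probability $1 - \alpha$, $\mathbf{a}'(\boldsymbol{\mu}_x - \boldsymbol{\mu}_y)$ lies in $\mathbf{a}'(\bar{\mathbf{x}} - \bar{\mathbf{y}}) \pm \sqrt{K}\sqrt{\left(\frac{1}{n} + \frac{1}{m}\right)\mathbf{a}'S_{\text{pooled}}\mathbf{a}}$ simultaneously for all $\mathbf{a}$.
Thus the intervals $\mathbf{a}'(\bar{\mathbf{x}} - \bar{\mathbf{y}}) \pm \sqrt{\frac{p(n+m-2)}{n+m-p-1}F_\alpha(p, n+m-p-1)}\sqrt{\left(\frac{1}{n} + \frac{1}{m}\right)\mathbf{a}'S_{\text{pooled}}\mathbf{a}}$ form $1 - \alpha$ simultaneous confidence intervals for $\mathbf{a}'(\boldsymbol{\mu}_x - \boldsymbol{\mu}_y)$.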
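Here is a minimal plain-Python sketch of one two-sample simultaneous interval for $\mathbf{a}'(\boldsymbol{\mu}_x - \boldsymbol{\mu}_y)$. The helper names and data are illustrative. `F_crit` stands for $F_\alpha(p, n+m-p-1)$ and is supplied by the caller. The value 0.625 below is a placeholder chosen so that the half-width works out to 1, not a real percentage point.

```python
import math


def mean_vector(X):
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def pooled_cov(X, Y):
    """S_pooled = [(n - 1) S_x + (m - 1) S_y] / (n + m - 2)."""
    n, m, p = len(X), len(Y), len(X[0])
    xbar, ybar = mean_vector(X), mean_vector(Y)

    def cross(Z, zbar, j, k):
        return sum((z[j] - zbar[j]) * (z[k] - zbar[k]) for z in Z)

    return [[(cross(X, xbar, j, k) + cross(Y, ybar, j, k)) / (n + m - 2)
             for k in range(p)] for j in range(p)]


def two_sample_simultaneous_ci(X, Y, a, F_crit):
    """a'(xbar - ybar) +/- sqrt(K) * sqrt((1/n + 1/m) a'S_pooled a),
    with K = p (n + m - 2) / (n + m - p - 1) * F_crit."""
    n, m, p = len(X), len(Y), len(X[0])
    d = [xb - yb for xb, yb in zip(mean_vector(X), mean_vector(Y))]
    Sp = pooled_cov(X, Y)
    center = sum(ai * di for ai, di in zip(a, d))
    aSa = sum(a[j] * Sp[j][k] * a[k] for j in range(p) for k in range(p))
    K = p * (n + m - 2) / (n + m - p - 1) * F_crit
    half = math.sqrt(K * (1 / n + 1 / m) * aSa)
    return center - half, center + half


X = [[0, 0], [2, 0], [0, 2], [2, 2]]  # illustrative data
Y = [[1, 1], [3, 1], [1, 3], [3, 3]]
print(two_sample_simultaneous_ci(X, Y, [1, 0], F_crit=0.625))  # about (-2.0, 0.0)
```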
A graphical explanation: Hotelling’s $T^2$ test
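Hotelling’s $T^2$ statistic for the two sample problem, computed from $\bar{\mathbf{x}} - \bar{\mathbf{y}}$ and the pooled sample covariance matrix $S_{\text{pooled}}$,
is the test statistic for testing: $H_0: \boldsymbol{\mu}_x = \boldsymbol{\mu}_y$ vs $H_A: \boldsymbol{\mu}_x \neq \boldsymbol{\mu}_y$.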
[Figures: Popn A and Popn B plotted in the ($X_1$, $X_2$) plane, illustrating in turn Hotelling’s $T^2$ test, the univariate test for $X_1$, the univariate test for $X_2$, and the univariate test for $a_1X_1 + a_2X_2$.]
A graphical explanation: Mahalanobis distance
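Euclidean distance: $d(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})'(\mathbf{x} - \mathbf{y})}$
Mahalanobis distance: $d_S(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})'S^{-1}(\mathbf{x} - \mathbf{y})}$, $S$ a covariance matrix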
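A small numerical illustration of the difference between the two distances, in plain Python. The covariance matrix and points are illustrative assumptions. With strong positive correlation, a displacement along the direction of correlation has the larger Euclidean distance but the smaller Mahalanobis distance, which is the situation the Case I / Case II comparison describes.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    p = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * p
    for r in range(p - 1, -1, -1):
        x[r] = (M[r][p] - sum(M[r][c] * x[c] for c in range(r + 1, p))) / M[r][r]
    return x


def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def mahalanobis(x, y, S):
    """sqrt((x - y)' S^{-1} (x - y)) for a covariance matrix S."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return math.sqrt(sum(di * wi for di, wi in zip(diff, solve(S, diff))))


S = [[1.0, 0.9], [0.9, 1.0]]  # strong positive correlation (illustrative)
origin = [0.0, 0.0]
along = [1.0, 1.0]            # displaced along the direction of correlation
across = [0.5, -0.5]          # displaced against it
print(euclidean(origin, along) > euclidean(origin, across))            # True
print(mahalanobis(origin, along, S) < mahalanobis(origin, across, S))  # True
```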
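Hotelling’s $T^2$ statistic for the two sample problem is a multiple of a squared Mahalanobis distance: $T^2 = \frac{nm}{n+m}D^2$, where $D^2 = (\bar{\mathbf{x}} - \bar{\mathbf{y}})'S_{\text{pooled}}^{-1}(\bar{\mathbf{x}} - \bar{\mathbf{y}})$ is the squared Mahalanobis distance between the sample mean vectors, computed with the pooled sample covariance matrix $S = S_{\text{pooled}}$.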
[Figures: Case I and Case II, each showing Popn A and Popn B in the ($X_1$, $X_2$) plane.]
In Case I the Mahalanobis distance between the mean vectors is larger than in Case II, even though the Euclidean distance is smaller. In Case I there is more separation between the two bivariate normal distributions.