Inference for the mean vector


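Univariate Inference. Let $x_1, x_2, \dots, x_n$ denote a sample of n from the normal distribution with mean $\mu$ and variance $\sigma^2$. Suppose we want to test $H_0: \mu = \mu_0$ vs $H_A: \mu \ne \mu_0$. The appropriate test is the t test. The test statistic: $t = \sqrt{n}\,(\bar{x} - \mu_0)/s$, where $\bar{x}$ and $s$ are the sample mean and sample standard deviation. Reject $H_0$ if $|t| > t_{\alpha/2}$, d.f. = n - 1.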
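
A minimal sketch of the univariate t test, using only the Python standard library. The function name `one_sample_t` and the five scores are hypothetical, invented for illustration; the critical value 2.776 is the tabulated $t_{0.025}$ for 4 d.f.

```python
import math
import statistics

def one_sample_t(xs, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(xs)
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)              # sample standard deviation (divisor n - 1)
    t = math.sqrt(n) * (xbar - mu0) / s
    return t, n - 1

# Hypothetical scores, invented for illustration only.
scores = [62.0, 58.0, 65.0, 61.0, 59.0]
t_stat, df = one_sample_t(scores, mu0=60.0)

# Two-sided test at alpha = 0.05: tabulated t_{0.025} with 4 d.f. is 2.776.
reject = abs(t_stat) > 2.776
```

Here $\bar{x} = 61$ and $s^2 = 7.5$, so $t = \sqrt{5}/\sqrt{7.5} \approx 0.82$ and $H_0$ is not rejected.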

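The multivariate Test. Let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ denote a sample of n from the p-variate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$. Suppose we want to test $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ vs $H_A: \boldsymbol{\mu} \ne \boldsymbol{\mu}_0$.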

Roy’s Union-Intersection Principle. This is a general procedure for developing a multivariate test from the corresponding univariate test. Convert the multivariate problem to a univariate problem by considering an arbitrary linear combination $u = \mathbf{a}'\mathbf{x}$ of the observation vector $\mathbf{x}$.

Perform the test for the arbitrary linear combination of the observation vector. Repeat this for all possible choices of $\mathbf{a}$. Reject the multivariate hypothesis if $H_0$ is rejected for any one of the choices of $\mathbf{a}$. Accept the multivariate hypothesis if $H_0$ is accepted for all of the choices of $\mathbf{a}$. Set the type I error rate for the individual tests so that the type I error rate for the multivariate test is $\alpha$.

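Application of Roy’s principle to the following situation. Let $\mathbf{x}_1, \dots, \mathbf{x}_n$ denote a sample of n from the p-variate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$. Suppose we want to test $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ vs $H_A: \boldsymbol{\mu} \ne \boldsymbol{\mu}_0$. Let $u_i = \mathbf{a}'\mathbf{x}_i$. Then $u_1, \dots, u_n$ is a sample of n from the normal distribution with mean $\mathbf{a}'\boldsymbol{\mu}$ and variance $\mathbf{a}'\Sigma\mathbf{a}$.

To test $H_0^{\mathbf{a}}: \mathbf{a}'\boldsymbol{\mu} = \mathbf{a}'\boldsymbol{\mu}_0$ vs $H_A^{\mathbf{a}}: \mathbf{a}'\boldsymbol{\mu} \ne \mathbf{a}'\boldsymbol{\mu}_0$ we would use the test statistic: $t_{\mathbf{a}} = \sqrt{n}\,(\bar{u} - \mathbf{a}'\boldsymbol{\mu}_0)/s_u$

and $\bar{u} = \mathbf{a}'\bar{\mathbf{x}}$, $s_u^2 = \mathbf{a}'S\mathbf{a}$, where $S = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'$ is the sample covariance matrix.

Thus $t_{\mathbf{a}} = \sqrt{n}\,\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)/\sqrt{\mathbf{a}'S\mathbf{a}}$. We will reject $H_0^{\mathbf{a}}$ if $t_{\mathbf{a}}^2 = n[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)]^2/(\mathbf{a}'S\mathbf{a}) > c$, where $c$ is the critical value chosen for the individual tests.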

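Using Roy’s Union-Intersection principle: We will reject $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ if $t_{\mathbf{a}}^2 = n[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)]^2/(\mathbf{a}'S\mathbf{a}) > c$ for at least one $\mathbf{a}$. We accept $H_0$ if $t_{\mathbf{a}}^2 \le c$ for all $\mathbf{a}$.

i.e. We reject $H_0$ if $\max_{\mathbf{a}} t_{\mathbf{a}}^2 > c$. We accept $H_0$ if $\max_{\mathbf{a}} t_{\mathbf{a}}^2 \le c$.

Consider the problem of finding: $\max_{\mathbf{a}} t_{\mathbf{a}}^2 = \max_{\mathbf{a}} n(\mathbf{a}'\mathbf{d})^2/(\mathbf{a}'S\mathbf{a})$, where $\mathbf{d} = \bar{\mathbf{x}} - \boldsymbol{\mu}_0$.

By the Cauchy-Schwarz inequality, $(\mathbf{a}'\mathbf{d})^2 = [(S^{1/2}\mathbf{a})'(S^{-1/2}\mathbf{d})]^2 \le (\mathbf{a}'S\mathbf{a})(\mathbf{d}'S^{-1}\mathbf{d})$, with equality when $\mathbf{a}$ is proportional to $S^{-1}\mathbf{d}$; thus $\max_{\mathbf{a}} t_{\mathbf{a}}^2 = n\,\mathbf{d}'S^{-1}\mathbf{d}$.

Thus Roy’s Union-Intersection principle states: We reject $H_0$ if $T^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0) > c$. We accept $H_0$ if $T^2 \le c$. $T^2$ is called Hotelling’s $T^2$ statistic.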
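
The maximisation behind the Union-Intersection principle can be checked numerically. The sketch below assumes NumPy; the helper names `hotelling_t2` and `t_squared_along` and the synthetic data are illustrative, not part of the slides. It compares the squared univariate t statistic along many random directions $\mathbf{a}$ with $T^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)$.

```python
import numpy as np

def hotelling_t2(X, mu0):
    """One-sample Hotelling T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    d = X.mean(axis=0) - np.asarray(mu0, dtype=float)
    S = np.cov(X, rowvar=False)                # sample covariance, divisor n - 1
    return float(n * d @ np.linalg.solve(S, d))

def t_squared_along(X, mu0, a):
    """Squared univariate t statistic for the linear combination u_i = a' x_i."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    d = X.mean(axis=0) - np.asarray(mu0, dtype=float)
    S = np.cov(X, rowvar=False)
    return float(n * (a @ d) ** 2 / (a @ S @ a))

rng = np.random.default_rng(0)
X = rng.normal(loc=60.0, scale=10.0, size=(10, 4))   # synthetic data, n = 10, p = 4
mu0 = np.full(4, 60.0)

T2 = hotelling_t2(X, mu0)

# The maximising direction is proportional to S^{-1} (xbar - mu0).
a_star = np.linalg.solve(np.cov(X, rowvar=False), X.mean(axis=0) - mu0)

# No random direction should give a squared t statistic above T^2.
random_max = max(t_squared_along(X, mu0, rng.normal(size=4)) for _ in range(1000))
```

In this sketch `random_max` stays below `T2`, while `t_squared_along(X, mu0, a_star)` reproduces `T2` up to rounding.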

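Choosing the critical value for Hotelling’s $T^2$ statistic. We reject $H_0$ when $T^2$ exceeds a critical value; to choose it, we need to find the sampling distribution of $T^2$ when $H_0$ is true. It turns out that if $H_0$ is true then $F = \frac{n-p}{p(n-1)}T^2$ has an F distribution with $\nu_1 = p$ and $\nu_2 = n - p$.

Thus Hotelling’s $T^2$ test: We reject $H_0$ if $\frac{n-p}{p(n-1)}T^2 > F_\alpha(p, n-p)$, or equivalently if $T^2 > T_\alpha^2 = \frac{p(n-1)}{n-p}F_\alpha(p, n-p)$.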
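
A sketch of the complete one-sample test, assuming NumPy and SciPy are available. The function name `hotelling_one_sample_test` and the synthetic data are hypothetical; `scipy.stats.f.ppf` supplies the upper critical value $F_\alpha(p, n-p)$ and `scipy.stats.f.sf` the p-value.

```python
import numpy as np
from scipy import stats

def hotelling_one_sample_test(X, mu0, alpha=0.05):
    """Hotelling's one-sample T^2 test of H0: mu = mu0."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    d = X.mean(axis=0) - np.asarray(mu0, dtype=float)
    S = np.cov(X, rowvar=False)                  # sample covariance, divisor n - 1
    T2 = float(n * d @ np.linalg.solve(S, d))
    F = (n - p) / (p * (n - 1)) * T2             # F(p, n - p) under H0
    F_crit = float(stats.f.ppf(1 - alpha, p, n - p))
    p_value = float(stats.f.sf(F, p, n - p))
    return {"T2": T2, "F": F, "F_crit": F_crit,
            "p_value": p_value, "reject": bool(F > F_crit)}

rng = np.random.default_rng(0)
X = rng.normal(loc=60.0, scale=10.0, size=(10, 4))   # synthetic data, n = 10, p = 4
result = hotelling_one_sample_test(X, mu0=np.full(4, 60.0))
```

Comparing `F` with `F_crit` and comparing `p_value` with `alpha` are the same decision rule; both are returned for convenience.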

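Another derivation of Hotelling’s $T^2$ statistic. Another method of developing statistical tests is the likelihood ratio method. Suppose that the data vector, $\mathbf{x}$, has joint density $f(\mathbf{x} \mid \boldsymbol{\theta})$. Suppose that the parameter vector, $\boldsymbol{\theta}$, belongs to the set $\Omega$. Let $\omega$ denote a subset of $\Omega$. Finally we want to test $H_0: \boldsymbol{\theta} \in \omega$ vs $H_A: \boldsymbol{\theta} \notin \omega$.

The likelihood ratio test rejects $H_0$ if $\lambda = \dfrac{\max_{\boldsymbol{\theta} \in \omega} L(\boldsymbol{\theta})}{\max_{\boldsymbol{\theta} \in \Omega} L(\boldsymbol{\theta})} < \lambda_\alpha$, where $L(\boldsymbol{\theta}) = f(\mathbf{x} \mid \boldsymbol{\theta})$ is the likelihood function.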

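The situation. Let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ denote a sample of n from the p-variate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$. Suppose we want to test $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ vs $H_A: \boldsymbol{\mu} \ne \boldsymbol{\mu}_0$.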

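The likelihood function is:
$$L(\boldsymbol{\mu}, \Sigma) = (2\pi)^{-np/2}\,|\Sigma|^{-n/2}\exp\left\{-\tfrac{1}{2}\sum_{i=1}^{n}(\mathbf{x}_i - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}_i - \boldsymbol{\mu})\right\}$$
and the log-likelihood function is:
$$l(\boldsymbol{\mu}, \Sigma) = -\tfrac{np}{2}\ln(2\pi) - \tfrac{n}{2}\ln|\Sigma| - \tfrac{1}{2}\sum_{i=1}^{n}(\mathbf{x}_i - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}_i - \boldsymbol{\mu})$$

The maximum likelihood estimators of $\boldsymbol{\mu}$ and $\Sigma$ are $\hat{\boldsymbol{\mu}} = \bar{\mathbf{x}}$ and $\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})' = \frac{n-1}{n}S$.

The maximum likelihood estimators of $\boldsymbol{\mu}$ and $\Sigma$ when $H_0$ is true are: $\boldsymbol{\mu}_0$ and $\hat{\Sigma}_0 = \frac{1}{n}\sum_{i=1}^{n}(\mathbf{x}_i - \boldsymbol{\mu}_0)(\mathbf{x}_i - \boldsymbol{\mu}_0)'$.

The likelihood function is $L(\boldsymbol{\mu}, \Sigma)$ as given above; now $\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{x}})'\hat{\Sigma}^{-1}(\mathbf{x}_i - \bar{\mathbf{x}}) = \mathrm{tr}\left[\hat{\Sigma}^{-1}\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'\right] = \mathrm{tr}\left[\hat{\Sigma}^{-1}\,n\hat{\Sigma}\right] = np$.

Thus $L(\hat{\boldsymbol{\mu}}, \hat{\Sigma}) = (2\pi)^{-np/2}|\hat{\Sigma}|^{-n/2}e^{-np/2}$; similarly $L(\boldsymbol{\mu}_0, \hat{\Sigma}_0) = (2\pi)^{-np/2}|\hat{\Sigma}_0|^{-n/2}e^{-np/2}$

and $\lambda = \dfrac{L(\boldsymbol{\mu}_0, \hat{\Sigma}_0)}{L(\hat{\boldsymbol{\mu}}, \hat{\Sigma})} = \dfrac{|\hat{\Sigma}|^{n/2}}{|\hat{\Sigma}_0|^{n/2}}$, so that $\lambda^{2/n} = |\hat{\Sigma}|/|\hat{\Sigma}_0|$.

Note: Let $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$; then $|A| = |A_{22}|\,|A_{11} - A_{12}A_{22}^{-1}A_{21}| = |A_{11}|\,|A_{22} - A_{21}A_{11}^{-1}A_{12}|$.

Let $\mathbf{d} = \bar{\mathbf{x}} - \boldsymbol{\mu}_0$ and $A = \begin{pmatrix} n\hat{\Sigma} & \sqrt{n}\,\mathbf{d} \\ -\sqrt{n}\,\mathbf{d}' & 1 \end{pmatrix}$. Now $|A| = |n\hat{\Sigma} + n\mathbf{d}\mathbf{d}'|$ and $|A| = |n\hat{\Sigma}|\,(1 + \mathbf{d}'\hat{\Sigma}^{-1}\mathbf{d})$.

Also $n\hat{\Sigma}_0 = \sum_{i=1}^{n}(\mathbf{x}_i - \boldsymbol{\mu}_0)(\mathbf{x}_i - \boldsymbol{\mu}_0)' = \sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})' + n\mathbf{d}\mathbf{d}' = n\hat{\Sigma} + n\mathbf{d}\mathbf{d}'$.

Thus $|\hat{\Sigma}_0| = |\hat{\Sigma}|\,(1 + \mathbf{d}'\hat{\Sigma}^{-1}\mathbf{d})$.

Thus using $\hat{\Sigma} = \frac{n-1}{n}S$: $\mathbf{d}'\hat{\Sigma}^{-1}\mathbf{d} = \frac{n}{n-1}\mathbf{d}'S^{-1}\mathbf{d} = \frac{T^2}{n-1}$, where $T^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)$.

Then $\lambda^{2/n} = \dfrac{1}{1 + T^2/(n-1)}$. Thus to reject $H_0$ if $\lambda < \lambda_\alpha$ is to reject $H_0$ when $T^2$ is large. This is the same as Hotelling’s $T^2$ test if $\lambda_\alpha^{2/n} = \dfrac{1}{1 + T_\alpha^2/(n-1)}$ with $T_\alpha^2 = \frac{p(n-1)}{n-p}F_\alpha(p, n-p)$.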
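
The link between the likelihood ratio and Hotelling’s statistic, $\lambda^{2/n} = |\hat{\Sigma}|/|\hat{\Sigma}_0| = 1/(1 + T^2/(n-1))$, can be verified numerically. This is a sketch assuming NumPy; the variable names and the synthetic sample are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 12, 3
X = rng.normal(size=(n, p))                       # synthetic sample
mu0 = np.zeros(p)                                 # hypothesised mean vector

xbar = X.mean(axis=0)
Sigma_hat = (X - xbar).T @ (X - xbar) / n         # unrestricted ML estimator
Sigma_hat0 = (X - mu0).T @ (X - mu0) / n          # ML estimator under H0

S = np.cov(X, rowvar=False)                       # sample covariance, divisor n - 1
T2 = n * (xbar - mu0) @ np.linalg.solve(S, xbar - mu0)

# lambda^(2/n) is the ratio of generalised variances.
lam_2_over_n = np.linalg.det(Sigma_hat) / np.linalg.det(Sigma_hat0)
```

Since $\lambda^{2/n}$ is a decreasing function of $T^2$, small values of $\lambda$ correspond to large values of $T^2$.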

Example. For n = 10 students we measure scores on a Math proficiency test ($x_1$), a Science proficiency test ($x_2$), an English proficiency test ($x_3$) and a French proficiency test ($x_4$). The average score for each of the tests in previous years was 60. Has this changed?

The data

Summary Statistics

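Simultaneous Inference for means. Recall (using Roy’s Union-Intersection Principle) $T^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu})'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}) = \max_{\mathbf{a}} \dfrac{n[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu})]^2}{\mathbf{a}'S\mathbf{a}}$, where $\boldsymbol{\mu}$ is the true mean vector.

Now $P[T^2 \le T_\alpha^2] = 1 - \alpha$, i.e. $P\left[\dfrac{n[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu})]^2}{\mathbf{a}'S\mathbf{a}} \le T_\alpha^2 \text{ for all } \mathbf{a}\right] = 1 - \alpha$.

Thus $P\left[\mathbf{a}'\bar{\mathbf{x}} - T_\alpha\sqrt{\mathbf{a}'S\mathbf{a}/n} \le \mathbf{a}'\boldsymbol{\mu} \le \mathbf{a}'\bar{\mathbf{x}} + T_\alpha\sqrt{\mathbf{a}'S\mathbf{a}/n} \text{ for all } \mathbf{a}\right] = 1 - \alpha$ and the set of intervals $\mathbf{a}'\bar{\mathbf{x}} \pm T_\alpha\sqrt{\mathbf{a}'S\mathbf{a}/n}$ form a set of $(1 - \alpha)100\%$ simultaneous confidence intervals for $\mathbf{a}'\boldsymbol{\mu}$.

Recall $T_\alpha^2 = \frac{p(n-1)}{n-p}F_\alpha(p, n-p)$. Thus the set of $(1 - \alpha)100\%$ simultaneous confidence intervals for $\mathbf{a}'\boldsymbol{\mu}$ is $\mathbf{a}'\bar{\mathbf{x}} \pm \sqrt{\frac{p(n-1)}{n-p}F_\alpha(p, n-p)}\,\sqrt{\mathbf{a}'S\mathbf{a}/n}$.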
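
The simultaneous intervals can be computed for any collection of vectors $\mathbf{a}$. The sketch below assumes NumPy and SciPy; `simultaneous_intervals` is a hypothetical helper and the data are synthetic. Each row of `A` is one vector $\mathbf{a}$; taking `A` as the identity gives intervals for the individual means.

```python
import numpy as np
from scipy import stats

def simultaneous_intervals(X, A, alpha=0.05):
    """T^2-based simultaneous confidence intervals for a' mu, one per row a of A."""
    X = np.asarray(X, dtype=float)
    A = np.atleast_2d(np.asarray(A, dtype=float))
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)                       # sample covariance, divisor n - 1
    # T_alpha^2 = p (n - 1) / (n - p) * F_alpha(p, n - p)
    T_alpha = np.sqrt(p * (n - 1) / (n - p) * stats.f.ppf(1 - alpha, p, n - p))
    centre = A @ xbar
    quad = np.einsum("ij,jk,ik->i", A, S, A)          # a' S a for each row a
    half_width = T_alpha * np.sqrt(quad / n)
    return centre - half_width, centre + half_width

rng = np.random.default_rng(0)
X = rng.normal(loc=60.0, scale=10.0, size=(10, 4))    # synthetic data, n = 10, p = 4
lower, upper = simultaneous_intervals(X, np.eye(4))   # intervals for each mean
```

Because the coverage statement holds for all $\mathbf{a}$ at once, these intervals are wider than the ordinary one-at-a-time t intervals.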

The two sample problem

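Univariate Inference. Let $x_1, x_2, \dots, x_n$ denote a sample of n from the normal distribution with mean $\mu_x$ and variance $\sigma^2$. Let $y_1, y_2, \dots, y_m$ denote a sample of m from the normal distribution with mean $\mu_y$ and variance $\sigma^2$. Suppose we want to test $H_0: \mu_x = \mu_y$ vs $H_A: \mu_x \ne \mu_y$.

The appropriate test is the t test. The test statistic: $t = \dfrac{\bar{x} - \bar{y}}{s_{pooled}\sqrt{\frac{1}{n} + \frac{1}{m}}}$, where $s_{pooled}^2 = \dfrac{(n-1)s_x^2 + (m-1)s_y^2}{n + m - 2}$. Reject $H_0$ if $|t| > t_{\alpha/2}$, d.f. = n + m - 2.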
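
A standard-library sketch of the pooled two-sample t statistic. The function name `pooled_two_sample_t` and the two small samples are hypothetical, chosen so that the arithmetic can be followed by hand.

```python
import math
import statistics

def pooled_two_sample_t(xs, ys):
    """Return the pooled-variance two-sample t statistic and its degrees of freedom."""
    n, m = len(xs), len(ys)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n - 1) * statistics.variance(xs)
           + (m - 1) * statistics.variance(ys)) / (n + m - 2)
    t = (statistics.mean(xs) - statistics.mean(ys)) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2

# Hypothetical samples, invented for illustration only.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 4.0, 5.0]
t_stat, df = pooled_two_sample_t(xs, ys)
```

Here $s_x^2 = 1$, $s_y^2 = 5/3$ and $s_{pooled}^2 = 1.4$, giving $t \approx -1.66$ on 5 d.f.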

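The multivariate Test. Let $\mathbf{x}_1, \dots, \mathbf{x}_n$ denote a sample of n from the p-variate normal distribution with mean vector $\boldsymbol{\mu}_x$ and covariance matrix $\Sigma$. Let $\mathbf{y}_1, \dots, \mathbf{y}_m$ denote a sample of m from the p-variate normal distribution with mean vector $\boldsymbol{\mu}_y$ and covariance matrix $\Sigma$. Suppose we want to test $H_0: \boldsymbol{\mu}_x = \boldsymbol{\mu}_y$ vs $H_A: \boldsymbol{\mu}_x \ne \boldsymbol{\mu}_y$.

Hotelling’s $T^2$ statistic for the two sample problem: $T^2 = \dfrac{nm}{n+m}(\bar{\mathbf{x}} - \bar{\mathbf{y}})'S_{pooled}^{-1}(\bar{\mathbf{x}} - \bar{\mathbf{y}})$, where $S_{pooled} = \dfrac{(n-1)S_x + (m-1)S_y}{n + m - 2}$. If $H_0$ is true then $F = \dfrac{n + m - p - 1}{p(n + m - 2)}T^2$ has an F distribution with $\nu_1 = p$ and $\nu_2 = n + m - p - 1$.

Thus Hotelling’s $T^2$ test: We reject $H_0$ if $\dfrac{n + m - p - 1}{p(n + m - 2)}T^2 > F_\alpha(p, n + m - p - 1)$.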
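
A sketch of the two-sample test under the equal-covariance assumption stated above, assuming NumPy and SciPy. The function name `hotelling_two_sample_test` and the synthetic samples are illustrative, not taken from the slides.

```python
import numpy as np
from scipy import stats

def hotelling_two_sample_test(X, Y, alpha=0.05):
    """Two-sample Hotelling T^2 test of H0: mu_x = mu_y (common covariance)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, p = X.shape
    m = Y.shape[0]
    d = X.mean(axis=0) - Y.mean(axis=0)
    # Pooled covariance matrix with n + m - 2 degrees of freedom.
    S_pooled = ((n - 1) * np.cov(X, rowvar=False)
                + (m - 1) * np.cov(Y, rowvar=False)) / (n + m - 2)
    T2 = float(n * m / (n + m) * d @ np.linalg.solve(S_pooled, d))
    df2 = n + m - p - 1
    F = df2 / (p * (n + m - 2)) * T2              # F(p, n + m - p - 1) under H0
    F_crit = float(stats.f.ppf(1 - alpha, p, df2))
    return T2, F, F_crit, bool(F > F_crit)

rng = np.random.default_rng(2)
X = rng.normal(loc=0.0, size=(15, 3))             # synthetic sample from population A
Y = rng.normal(loc=0.5, size=(20, 3))             # synthetic sample from population B
T2, F, F_crit, reject = hotelling_two_sample_test(X, Y)
```

The two samples may have different sizes; only the number of variables p must match.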

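Simultaneous inference for the two-sample problem. Hotelling’s $T^2$ statistic can be shown to be derived from Roy’s Union-Intersection principle: with $S_{pooled} = \frac{(n-1)S_x + (m-1)S_y}{n + m - 2}$ and $\boldsymbol{\delta} = \boldsymbol{\mu}_x - \boldsymbol{\mu}_y$, $T^2 = \dfrac{nm}{n+m}(\bar{\mathbf{x}} - \bar{\mathbf{y}} - \boldsymbol{\delta})'S_{pooled}^{-1}(\bar{\mathbf{x}} - \bar{\mathbf{y}} - \boldsymbol{\delta}) = \max_{\mathbf{a}} \dfrac{[\mathbf{a}'(\bar{\mathbf{x}} - \bar{\mathbf{y}} - \boldsymbol{\delta})]^2}{\left(\frac{1}{n} + \frac{1}{m}\right)\mathbf{a}'S_{pooled}\mathbf{a}}$

Thus $P[T^2 \le T_\alpha^2] = 1 - \alpha$, where $T_\alpha^2 = \dfrac{p(n + m - 2)}{n + m - p - 1}F_\alpha(p, n + m - p - 1)$.

Thus $P\left[\dfrac{[\mathbf{a}'(\bar{\mathbf{x}} - \bar{\mathbf{y}} - \boldsymbol{\delta})]^2}{\left(\frac{1}{n} + \frac{1}{m}\right)\mathbf{a}'S_{pooled}\mathbf{a}} \le T_\alpha^2 \text{ for all } \mathbf{a}\right] = 1 - \alpha$.

Thus $P\left[|\mathbf{a}'(\bar{\mathbf{x}} - \bar{\mathbf{y}}) - \mathbf{a}'\boldsymbol{\delta}| \le T_\alpha\sqrt{\left(\tfrac{1}{n} + \tfrac{1}{m}\right)\mathbf{a}'S_{pooled}\mathbf{a}} \text{ for all } \mathbf{a}\right] = 1 - \alpha$. Hence $\mathbf{a}'\boldsymbol{\delta}$ lies within $T_\alpha\sqrt{\left(\frac{1}{n} + \frac{1}{m}\right)\mathbf{a}'S_{pooled}\mathbf{a}}$ of $\mathbf{a}'(\bar{\mathbf{x}} - \bar{\mathbf{y}})$ for all $\mathbf{a}$ simultaneously, with probability $1 - \alpha$.

Thus $\mathbf{a}'(\bar{\mathbf{x}} - \bar{\mathbf{y}}) \pm T_\alpha\sqrt{\left(\frac{1}{n} + \frac{1}{m}\right)\mathbf{a}'S_{pooled}\mathbf{a}}$ form $1 - \alpha$ simultaneous confidence intervals for $\mathbf{a}'(\boldsymbol{\mu}_x - \boldsymbol{\mu}_y)$.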

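A graphical explanation: Hotelling’s $T^2$ test

Hotelling’s $T^2$ statistic for the two sample problem, $T^2 = \dfrac{nm}{n+m}(\bar{\mathbf{x}} - \bar{\mathbf{y}})'S_{pooled}^{-1}(\bar{\mathbf{x}} - \bar{\mathbf{y}})$,

is the test statistic for testing: $H_0: \boldsymbol{\mu}_x = \boldsymbol{\mu}_y$ vs $H_A: \boldsymbol{\mu}_x \ne \boldsymbol{\mu}_y$.

[Figure: Hotelling’s $T^2$ test. Popn A and Popn B plotted on axes X1 and X2.]

[Figure: Univariate test for X1. Popn A and Popn B plotted on axes X1 and X2.]

[Figure: Univariate test for X2. Popn A and Popn B plotted on axes X1 and X2.]

[Figure: Univariate test for $a_1X_1 + a_2X_2$. Popn A and Popn B plotted on axes X1 and X2.]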

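A graphical explanation: Mahalanobis distance

Euclidean distance: $d(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})'(\mathbf{x} - \mathbf{y})}$

Mahalanobis distance ($\Sigma$, a covariance matrix): $d_\Sigma(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})'\Sigma^{-1}(\mathbf{x} - \mathbf{y})}$

Hotelling’s $T^2$ statistic for the two sample problem: $T^2 = \dfrac{nm}{n+m}(\bar{\mathbf{x}} - \bar{\mathbf{y}})'S_{pooled}^{-1}(\bar{\mathbf{x}} - \bar{\mathbf{y}})$, i.e. $\frac{nm}{n+m}$ times the squared Mahalanobis distance between the two sample mean vectors, measured with the pooled covariance matrix $S_{pooled}$.

[Figure: Case I. Popn A and Popn B plotted on axes X1 and X2.]

[Figure: Case II. Popn A and Popn B plotted on axes X1 and X2.]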

In Case I the Mahalanobis distance between the mean vectors is larger than in Case II, even though the Euclidean distance is smaller. In Case I there is more separation between the two bivariate normal distributions.
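
The contrast between the two cases can be reproduced with a small numerical sketch, assuming NumPy. The covariance matrix and mean vectors below are hypothetical values chosen to mimic the situation described; they are not read off the figures.

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance sqrt((x - y)'(x - y))."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ d))

def mahalanobis(x, y, Sigma):
    """Mahalanobis distance sqrt((x - y)' Sigma^{-1} (x - y))."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(Sigma, d)))

# Hypothetical common covariance: X1 and X2 strongly positively correlated.
Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
mu_a = np.array([0.0, 0.0])

mu_b_case1 = np.array([1.0, -1.0])   # means differ across the direction of correlation
mu_b_case2 = np.array([2.0, 2.0])    # means differ along the direction of correlation

e1, e2 = euclidean(mu_a, mu_b_case1), euclidean(mu_a, mu_b_case2)
m1 = mahalanobis(mu_a, mu_b_case1, Sigma)
m2 = mahalanobis(mu_a, mu_b_case2, Sigma)
```

With these values the Euclidean distances are about 1.41 (Case I) and 2.83 (Case II), while the Mahalanobis distances are about 4.47 (Case I) and 2.05 (Case II).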