Inference for the mean vector

Univariate Inference. Let $x_1, x_2, \ldots, x_n$ denote a sample of n from the normal distribution with mean $\mu$ and variance $\sigma^2$. Suppose we want to test $H_0: \mu = \mu_0$ vs $H_A: \mu \ne \mu_0$. The appropriate test is the t test, with test statistic

$$t = \frac{\bar x - \mu_0}{s/\sqrt{n}}.$$

Reject $H_0$ if $|t| > t_{\alpha/2}$ (with $n - 1$ degrees of freedom).
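A minimal sketch of this test in Python with NumPy/SciPy; the scores, $\mu_0$, and $\alpha$ are invented for illustration, not data from the slides:

```python
import numpy as np
from scipy import stats

# Hypothetical scores; x, mu0, and alpha are illustrative values.
x = np.array([62.0, 58.0, 71.0, 65.0, 59.0, 66.0, 63.0, 60.0])
mu0, alpha = 60.0, 0.05

n = len(x)
t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))  # test statistic
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)        # two-sided critical value, n-1 d.f.

print(f"t = {t:.3f}, reject H0: {abs(t) > t_crit}")  # same decision as stats.ttest_1samp(x, mu0)
```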

The multivariate test. Let $\vec x_1, \vec x_2, \ldots, \vec x_n$ denote a sample of n from the p-variate normal distribution with mean vector $\vec\mu$ and covariance matrix $\Sigma$. Suppose we want to test $H_0: \vec\mu = \vec\mu_0$ vs $H_A: \vec\mu \ne \vec\mu_0$.

Example. For n = 10 students we measure scores on
– Math proficiency test ($x_1$),
– Science proficiency test ($x_2$),
– English proficiency test ($x_3$), and
– French proficiency test ($x_4$).
The average score for each of the tests in previous years was 60. Has this changed?

The data

Summary Statistics: the mean vector $\bar{\vec x} = \frac{1}{n}\sum_{i=1}^{n}\vec x_i$ and the sample covariance matrix $S = \frac{1}{n-1}\sum_{i=1}^{n}(\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})'$.

Roy's Union-Intersection Principle. This is a general procedure for developing a multivariate test from the corresponding univariate test.

1. Convert the multivariate problem to a univariate problem by considering an arbitrary linear combination $u = \vec a'\vec x$ of the observation vector.
2. Perform the test for the arbitrary linear combination $\vec a'\vec x$ of the observation vector.
3. Repeat this for all possible choices of $\vec a$.
4. Reject the multivariate hypothesis if $H_0$ is rejected for any one of the choices for $\vec a$.
5. Accept the multivariate hypothesis if $H_0$ is accepted for all of the choices for $\vec a$.
6. Set the type I error rate for the individual tests so that the type I error rate for the multivariate test is $\alpha$.

Application of Roy's principle to the following situation: Let $\vec x_1, \vec x_2, \ldots, \vec x_n$ denote a sample of n from the p-variate normal distribution with mean vector $\vec\mu$ and covariance matrix $\Sigma$, and suppose we want to test $H_0: \vec\mu = \vec\mu_0$ vs $H_A: \vec\mu \ne \vec\mu_0$. For a fixed vector $\vec a$, let $u_i = \vec a'\vec x_i$. Then $u_1, \ldots, u_n$ is a sample of n from the normal distribution with mean $\vec a'\vec\mu$ and variance $\vec a'\Sigma\vec a$.

To test $H_0^{(\vec a)}: \vec a'\vec\mu = \vec a'\vec\mu_0$ vs $H_A^{(\vec a)}: \vec a'\vec\mu \ne \vec a'\vec\mu_0$ we would use the test statistic

$$t_{\vec a} = \frac{\bar u - \vec a'\vec\mu_0}{s_u/\sqrt n} = \frac{\sqrt n\,\vec a'(\bar{\vec x} - \vec\mu_0)}{\sqrt{\vec a' S\vec a}},$$

where $\bar u = \vec a'\bar{\vec x}$ and $s_u^2 = \vec a' S\vec a$. Thus we will reject $H_0^{(\vec a)}$ if

$$t_{\vec a}^2 = \frac{n\,[\vec a'(\bar{\vec x} - \vec\mu_0)]^2}{\vec a' S\vec a} > K.$$

Using Roy's Union-Intersection principle: we accept $H_0: \vec\mu = \vec\mu_0$ if $t_{\vec a}^2 \le K$ for every choice of $\vec a$, and we reject $H_0$ if $t_{\vec a}^2 > K$ for at least one $\vec a$, i.e. we reject if $\max_{\vec a} t_{\vec a}^2 > K$ and accept if $\max_{\vec a} t_{\vec a}^2 \le K$.

Consider the problem of finding $\max_{\vec a} t_{\vec a}^2$, where $t_{\vec a}^2 = n\,[\vec a'(\bar{\vec x} - \vec\mu_0)]^2 / (\vec a' S\vec a)$. By a Cauchy-Schwarz argument the maximum is attained at $\vec a = S^{-1}(\bar{\vec x} - \vec\mu_0)$, thus

$$\max_{\vec a} t_{\vec a}^2 = n\,(\bar{\vec x} - \vec\mu_0)' S^{-1} (\bar{\vec x} - \vec\mu_0) = T^2.$$

Thus Roy's Union-Intersection principle states: we reject $H_0$ if $T^2 > K$ and accept otherwise. The quantity $T^2 = n(\bar{\vec x} - \vec\mu_0)'S^{-1}(\bar{\vec x} - \vec\mu_0)$ is called Hotelling's $T^2$ statistic.
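This maximization can be sanity-checked numerically. A minimal sketch with simulated data (all values invented), confirming that no direction $\vec a$ yields a larger $t_{\vec a}^2$ than the closed-form $T^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: n = 10 observations on p = 4 variables.
n, p = 10, 4
X = rng.normal(60, 5, size=(n, p))
mu0 = np.full(p, 60.0)

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)        # sample covariance matrix
d = xbar - mu0

# Hotelling's T^2: the maximum of t_a^2 over all directions a,
# attained in closed form at a = S^{-1} d.
T2 = n * d @ np.linalg.solve(S, d)

# Check: t_a^2 for many random directions never exceeds T^2.
t2_max = max(n * (a @ d) ** 2 / (a @ S @ a)
             for a in rng.normal(size=(5000, p)))
print(T2, t2_max)  # t2_max approaches T2 from below
```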

Choosing the critical value for Hotelling's $T^2$ statistic, we need to find the sampling distribution of $T^2$ when $H_0$ is true. It turns out that if $H_0$ is true, then

$$F = \frac{n-p}{(n-1)p}\,T^2$$

has an F distribution with $\nu_1 = p$ and $\nu_2 = n - p$.

Thus Hotelling's $T^2$ test: we reject $H_0: \vec\mu = \vec\mu_0$ if $F = \frac{n-p}{(n-1)p}T^2 > F_\alpha(p, n-p)$, or equivalently if $T^2 > \frac{(n-1)p}{n-p}F_\alpha(p, n-p)$.
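For reference, a compact sketch of the full one-sample test; the function name and interface are mine, with observations as the rows of X:

```python
import numpy as np
from scipy import stats

def hotelling_one_sample(X, mu0):
    """One-sample Hotelling T^2 test of H0: mu = mu0."""
    n, p = X.shape
    d = X.mean(axis=0) - mu0
    S = np.cov(X, rowvar=False)
    T2 = n * d @ np.linalg.solve(S, d)
    F = (n - p) / ((n - 1) * p) * T2      # ~ F(p, n-p) under H0
    p_value = stats.f.sf(F, p, n - p)
    return T2, F, p_value
```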

Another derivation of Hotelling's $T^2$ statistic. Another method of developing statistical tests is the likelihood ratio method. Suppose that the data vector $\vec x$ has joint density $f(\vec x; \vec\theta)$. Suppose that the parameter vector $\vec\theta$ belongs to the set $\Omega$, and let $\omega$ denote a subset of $\Omega$. Finally, we want to test $H_0: \vec\theta \in \omega$ vs $H_A: \vec\theta \notin \omega$.

The likelihood ratio test rejects $H_0$ if

$$\lambda = \frac{\max_{\vec\theta \in \omega} L(\vec\theta)}{\max_{\vec\theta \in \Omega} L(\vec\theta)} \le \lambda_\alpha,$$

where $L(\vec\theta) = f(\vec x; \vec\theta)$ is the likelihood function.

The situation. Let $\vec x_1, \ldots, \vec x_n$ denote a sample of n from the p-variate normal distribution with mean vector $\vec\mu$ and covariance matrix $\Sigma$. Suppose we want to test $H_0: \vec\mu = \vec\mu_0$ vs $H_A: \vec\mu \ne \vec\mu_0$.

The likelihood function is

$$L(\vec\mu, \Sigma) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left\{-\tfrac12(\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu)\right\},$$

and the log-likelihood function is

$$l(\vec\mu, \Sigma) = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac12\sum_{i=1}^{n}(\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu).$$

The maximum likelihood estimators of $\vec\mu$ and $\Sigma$ are

$$\hat{\vec\mu} = \bar{\vec x} \quad\text{and}\quad \hat\Sigma = \frac{1}{n}\sum_{i=1}^{n}(\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})'.$$

The maximum likelihood estimators of $\vec\mu$ and $\Sigma$ when $H_0$ is true are

$$\hat{\vec\mu}_0 = \vec\mu_0 \quad\text{and}\quad \hat\Sigma_0 = \frac{1}{n}\sum_{i=1}^{n}(\vec x_i - \vec\mu_0)(\vec x_i - \vec\mu_0)'.$$

Now evaluate the likelihood at these estimators. Substituting $\hat{\vec\mu}$ and $\hat\Sigma$ gives

$$\max_{\Omega} L = L(\hat{\vec\mu}, \hat\Sigma) = (2\pi)^{-np/2}\,|\hat\Sigma|^{-n/2}\,e^{-np/2},$$

and similarly, under $H_0$,

$$\max_{\omega} L = L(\vec\mu_0, \hat\Sigma_0) = (2\pi)^{-np/2}\,|\hat\Sigma_0|^{-n/2}\,e^{-np/2}.$$

Thus

$$\lambda = \frac{L(\vec\mu_0, \hat\Sigma_0)}{L(\hat{\vec\mu}, \hat\Sigma)} = \left(\frac{|\hat\Sigma|}{|\hat\Sigma_0|}\right)^{n/2}.$$

Note:

$$\hat\Sigma_0 = \frac{1}{n}\sum_{i=1}^{n}(\vec x_i - \vec\mu_0)(\vec x_i - \vec\mu_0)' = \hat\Sigma + (\bar{\vec x} - \vec\mu_0)(\bar{\vec x} - \vec\mu_0)'.$$

Also, for any nonsingular matrix $A$ and vector $\vec u$, $|A + \vec u\vec u'| = |A|\,(1 + \vec u'A^{-1}\vec u)$. Thus

$$\lambda^{2/n} = \frac{|\hat\Sigma|}{|\hat\Sigma_0|} = \frac{1}{1 + (\bar{\vec x} - \vec\mu_0)'\hat\Sigma^{-1}(\bar{\vec x} - \vec\mu_0)}.$$

Using $\hat\Sigma = \frac{n-1}{n}S$, we get $(\bar{\vec x} - \vec\mu_0)'\hat\Sigma^{-1}(\bar{\vec x} - \vec\mu_0) = \frac{T^2}{n-1}$, where $T^2 = n(\bar{\vec x} - \vec\mu_0)'S^{-1}(\bar{\vec x} - \vec\mu_0)$.

Then

$$\lambda^{2/n} = \frac{1}{1 + T^2/(n-1)}.$$

Thus to reject $H_0$ if $\lambda < \lambda_\alpha$ is the same as Hotelling's $T^2$ test, which rejects if $T^2 > K$: since $\lambda$ is a decreasing function of $T^2$, the two tests agree when $K = (n-1)\left(\lambda_\alpha^{-2/n} - 1\right)$.
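A quick numerical check of the identity $\lambda^{2/n} = 1/(1 + T^2/(n-1))$ on simulated data (the sample is invented, not the slide data):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 4
X = rng.normal(60, 5, size=(n, p))
mu0 = np.full(p, 60.0)

xbar = X.mean(axis=0)
Sig_hat = (X - xbar).T @ (X - xbar) / n   # unrestricted MLE of Sigma
Sig_0 = (X - mu0).T @ (X - mu0) / n       # MLE of Sigma under H0

lam_2n = np.linalg.det(Sig_hat) / np.linalg.det(Sig_0)   # lambda^(2/n)

S = np.cov(X, rowvar=False)
d = xbar - mu0
T2 = n * d @ np.linalg.solve(S, d)

print(lam_2n, 1 / (1 + T2 / (n - 1)))     # the two values agree
```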

Example. For n = 10 students we measure scores on
– Math proficiency test ($x_1$),
– Science proficiency test ($x_2$),
– English proficiency test ($x_3$), and
– French proficiency test ($x_4$).
The average score for each of the tests in previous years was 60. Has this changed?

The data

Summary Statistics


The two sample problem

Univariate Inference. Let $x_1, x_2, \ldots, x_n$ denote a sample of n from the normal distribution with mean $\mu_x$ and variance $\sigma^2$. Let $y_1, y_2, \ldots, y_m$ denote a sample of m from the normal distribution with mean $\mu_y$ and variance $\sigma^2$. Suppose we want to test $H_0: \mu_x = \mu_y$ vs $H_A: \mu_x \ne \mu_y$.

The appropriate test is the t test, with test statistic

$$t = \frac{\bar x - \bar y}{s_p\sqrt{\frac1n + \frac1m}}, \qquad s_p^2 = \frac{(n-1)s_x^2 + (m-1)s_y^2}{n + m - 2}.$$

Reject $H_0$ if $|t| > t_{\alpha/2}$, d.f. = n + m - 2.
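A minimal sketch of this pooled two-sample t test (the response values are invented):

```python
import numpy as np
from scipy import stats

# Hypothetical responses under two treatments.
x = np.array([12.1, 11.4, 13.0, 12.7, 11.9])
y = np.array([10.8, 11.2, 10.5, 11.6])
n, m = len(x), len(y)

sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
t = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n + 1 / m))
t_crit = stats.t.ppf(0.975, df=n + m - 2)
print(abs(t) > t_crit)   # same decision as stats.ttest_ind(x, y)
```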

The multivariate test. Let $\vec x_1, \ldots, \vec x_n$ denote a sample of n from the p-variate normal distribution with mean vector $\vec\mu_x$ and covariance matrix $\Sigma$, and let $\vec y_1, \ldots, \vec y_m$ denote a sample of m from the p-variate normal distribution with mean vector $\vec\mu_y$ and the same covariance matrix $\Sigma$. Suppose we want to test $H_0: \vec\mu_x = \vec\mu_y$ vs $H_A: \vec\mu_x \ne \vec\mu_y$.

Hotelling's $T^2$ statistic for the two-sample problem:

$$T^2 = \frac{nm}{n+m}\,(\bar{\vec x} - \bar{\vec y})'\,S_p^{-1}\,(\bar{\vec x} - \bar{\vec y}),$$

where $S_p$ is the pooled sample covariance matrix. If $H_0$ is true, then

$$F = \frac{n+m-p-1}{(n+m-2)p}\,T^2$$

has an F distribution with $\nu_1 = p$ and $\nu_2 = n + m - p - 1$.

Thus Hotelling's $T^2$ test: we reject $H_0: \vec\mu_x = \vec\mu_y$ if $F > F_\alpha(p, n+m-p-1)$.
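A sketch of the two-sample test as a function; again the name and interface are mine, with observations as rows:

```python
import numpy as np
from scipy import stats

def hotelling_two_sample(X, Y):
    """Two-sample Hotelling T^2 test of H0: mu_x = mu_y (common Sigma)."""
    n, p = X.shape
    m, _ = Y.shape
    d = X.mean(axis=0) - Y.mean(axis=0)
    Sp = ((n - 1) * np.cov(X, rowvar=False)
          + (m - 1) * np.cov(Y, rowvar=False)) / (n + m - 2)  # pooled covariance
    T2 = (n * m) / (n + m) * d @ np.linalg.solve(Sp, d)
    F = (n + m - p - 1) / ((n + m - 2) * p) * T2   # ~ F(p, n+m-p-1) under H0
    return T2, F, stats.f.sf(F, p, n + m - p - 1)
```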

Example 2. Annual financial data are collected for firms approximately 2 years prior to bankruptcy and for financially sound firms at about the same point in time. The data on the four variables x₁ = CF/TD = (cash flow)/(total debt), x₂ = NI/TA = (net income)/(total assets), x₃ = CA/CL = (current assets)/(current liabilities), and x₄ = CA/NS = (current assets)/(net sales) are given in the following table.

The data are given in the following table:

Hotelling's $T^2$ test: a graphical explanation

Hotelling's $T^2$ statistic for the two-sample problem,

$$T^2 = \frac{nm}{n+m}\,(\bar{\vec x} - \bar{\vec y})'\,S_p^{-1}\,(\bar{\vec x} - \bar{\vec y}),$$

is the test statistic for testing $H_0: \vec\mu_x = \vec\mu_y$ vs $H_A: \vec\mu_x \ne \vec\mu_y$.

[Figure: samples from populations A and B in the $(x_1, x_2)$ plane, Hotelling's $T^2$ test]

[Figure: the same samples, univariate test for $x_1$]

[Figure: the same samples, univariate test for $x_2$]

[Figure: the same samples, univariate test for $a_1 x_1 + a_2 x_2$]

Mahalanobis distance: a graphical explanation

Euclidean distance:

$$d_E(\vec x, \vec y) = \sqrt{(\vec x - \vec y)'(\vec x - \vec y)}.$$

Mahalanobis distance: , a covariance matrix

Hotelling's $T^2$ statistic for the two-sample problem is, up to the factor $\frac{nm}{n+m}$, the squared Mahalanobis distance between the sample mean vectors: $T^2 = \frac{nm}{n+m}\,d_{S_p}^2(\bar{\vec x}, \bar{\vec y})$.
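A small sketch of the Mahalanobis distance, with an invented covariance matrix chosen to show how correlation changes the distance (compare the two calls):

```python
import numpy as np

def mahalanobis(x, y, Sigma):
    """Mahalanobis distance between x and y under covariance Sigma."""
    d = np.asarray(x) - np.asarray(y)
    return np.sqrt(d @ np.linalg.solve(Sigma, d))

# A correlated covariance shrinks distance along the main axis of spread.
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
print(mahalanobis([0, 0], [1, 1], Sigma))    # ~1.05: displacement along the correlation
print(mahalanobis([0, 0], [1, -1], Sigma))   # ~3.16: displacement against the correlation
```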

[Figure: populations A and B in the $(x_1, x_2)$ plane, Case I]

[Figure: populations A and B in the $(x_1, x_2)$ plane, Case II]

In Case I the Mahalanobis distance between the mean vectors is larger than in Case II, even though the Euclidean distance is smaller. In Case I there is more separation between the two bivariate normal distributions.

Discrimination and Classification

Discrimination. Situation: We have two or more populations $\pi_1$, $\pi_2$, etc. (possibly p-variate normal). The populations are known (or we have data from each population). We have data for a new case (population unknown) and we want to identify the population to which the new case belongs.

Examples

The Basic Problem. Suppose that the data from a new case $x_1, \ldots, x_p$ has joint density function either $\pi_1$: $f(x_1, \ldots, x_p)$ or $\pi_2$: $g(x_1, \ldots, x_p)$. We want to make the decision D₁: classify the case in $\pi_1$ (f is the correct distribution) or D₂: classify the case in $\pi_2$ (g is the correct distribution).

The Two Types of Errors:

1. Misclassifying the case in $\pi_1$ when it actually lies in $\pi_2$. Let P[1|2] = P[D₁ | $\pi_2$] = the probability of this type of error.
2. Misclassifying the case in $\pi_2$ when it actually lies in $\pi_1$. Let P[2|1] = P[D₂ | $\pi_1$] = the probability of this type of error.

This is similar to the Type I and Type II errors in hypothesis testing.

A discrimination scheme is defined by splitting p-dimensional space into two regions:

1. C₁ = the region where we make the decision D₁ (the decision to classify the case in $\pi_1$).
2. C₂ = the region where we make the decision D₂ (the decision to classify the case in $\pi_2$).

There can be several approaches to determining the regions C₁ and C₂, all concerned with taking into account the probabilities of misclassification P[2|1] and P[1|2].

1. Set up the regions C₁ and C₂ so that one of the probabilities of misclassification, P[2|1] say, is at some low acceptable value $\alpha$, and accept the resulting level of the other probability of misclassification, P[1|2] = $\beta$.

2. Set up the regions C₁ and C₂ so that the total probability of misclassification,

P[Misclassification] = P[1] P[2|1] + P[2] P[1|2],

is minimized, where P[1] = P[the case belongs to $\pi_1$] and P[2] = P[the case belongs to $\pi_2$].

3. Set up the regions C₁ and C₂ so that the total expected cost of misclassification,

E[Cost of Misclassification] = c₂|₁ P[1] P[2|1] + c₁|₂ P[2] P[1|2],

is minimized, where P[1] and P[2] are as above, c₂|₁ = the cost of misclassifying the case in $\pi_2$ when it belongs to $\pi_1$, and c₁|₂ = the cost of misclassifying the case in $\pi_1$ when it belongs to $\pi_2$.
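Minimizing this expected cost assigns $\vec x$ to $C_1$ exactly when $c_{2|1}\,P[1]\,f(\vec x) \ge c_{1|2}\,P[2]\,g(\vec x)$, i.e. when the likelihood ratio $f/g$ exceeds $c_{1|2}P[2]/(c_{2|1}P[1])$. A univariate sketch; the densities, priors, and costs are all assumed for illustration:

```python
from scipy import stats

# Hypothetical univariate setup: pi_1 ~ N(0,1), pi_2 ~ N(2,1).
f = stats.norm(0, 1).pdf   # density under pi_1
g = stats.norm(2, 1).pdf   # density under pi_2
p1, p2 = 0.7, 0.3          # assumed prior probabilities P[1], P[2]
c21, c12 = 1.0, 5.0        # assumed costs c_{2|1}, c_{1|2}

def classify(x):
    # Minimum expected cost rule: decide D1 when f(x)/g(x) >= (c12*p2)/(c21*p1).
    return 1 if f(x) / g(x) >= (c12 * p2) / (c21 * p1) else 2

print([classify(x) for x in (-1.0, 0.8, 1.5, 3.0)])
```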

4. Set up the regions C₁ and C₂ so that the two types of error are equal: P[2|1] = P[1|2].

Computer security example: $\pi_1$: valid users; $\pi_2$: imposters.

P[2|1] = P[identifying a valid user as an imposter]; P[1|2] = P[identifying an imposter as a valid user].

P[1] = P[valid user]; P[2] = P[imposter].

c₂|₁ = the cost of identifying the user as an imposter when the user is a valid user; c₁|₂ = the cost of identifying the user as a valid user when the user is an imposter.

This problem can be viewed as a hypothesis testing problem: H₀: $\pi_1$ is the correct population vs Hₐ: $\pi_2$ is the correct population, with P[2|1] = $\alpha$, P[1|2] = $\beta$, and Power = 1 − $\beta$.

The Neyman-Pearson Lemma. Suppose that the data $x_1, \ldots, x_n$ has joint density function $f(x_1, \ldots, x_n; \theta)$ where $\theta$ is either $\theta_1$ or $\theta_2$. Let $g(x_1, \ldots, x_n) = f(x_1, \ldots, x_n; \theta_1)$ and $h(x_1, \ldots, x_n) = f(x_1, \ldots, x_n; \theta_2)$. We want to test H₀: $\theta = \theta_1$ (g is the correct distribution) against Hₐ: $\theta = \theta_2$ (h is the correct distribution).

The Neyman-Pearson lemma states that the uniformly most powerful (UMP) test of size $\alpha$ is to reject H₀ if

$$\frac{h(x_1, \ldots, x_n)}{g(x_1, \ldots, x_n)} \ge k_\alpha$$

and accept H₀ otherwise, where $k_\alpha$ is chosen so that the test is of size $\alpha$.
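As a concrete instance, when g is N(0,1) and h is N(1,1) the ratio $h/g = e^{x - 1/2}$ is increasing in x, so the UMP test rejects for large x; a sketch with these assumed distributions:

```python
import numpy as np
from scipy import stats

# H0: x ~ N(0,1) (g) vs HA: x ~ N(1,1) (h); single observation, alpha = 0.05.
# h(x)/g(x) = exp(x - 1/2) is increasing in x, so thresholding the ratio
# is the same as thresholding x; size alpha fixes the cutoff under g.
alpha = 0.05
x_crit = stats.norm.ppf(1 - alpha)    # reject H0 if x >= 1.645
k_alpha = np.exp(x_crit - 0.5)        # equivalent threshold on h/g

x = 2.1                               # hypothetical observed value
ratio = np.exp(x - 0.5)
print(ratio >= k_alpha)               # True: reject H0
```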

Proof: Let $C^* = \{(x_1, \ldots, x_n) : h(x_1, \ldots, x_n) \ge k_\alpha\, g(x_1, \ldots, x_n)\}$ denote the critical region of the likelihood ratio test, and let $C$ be the critical region of any other test of size $\alpha$, so that

$$\int_{C} g = \int_{C^*} g = \alpha.$$

We want to show that the power of $C^*$ is at least that of $C$, i.e. $\int_{C^*} h \ge \int_{C} h$.

Subtracting the common quantity $\int_{C \cap C^*} g$ from both size conditions gives

$$\int_{C^* \setminus C} g = \int_{C \setminus C^*} g.$$

On $C^* \setminus C \subseteq C^*$ we have $h \ge k_\alpha g$, hence $\int_{C^* \setminus C} h \ge k_\alpha \int_{C^* \setminus C} g$, and on $C \setminus C^* \subseteq \overline{C^*}$ we have $h < k_\alpha g$, hence $\int_{C \setminus C^*} h \le k_\alpha \int_{C \setminus C^*} g$.

Thus

$$\int_{C^* \setminus C} h \ge k_\alpha \int_{C^* \setminus C} g = k_\alpha \int_{C \setminus C^*} g \ge \int_{C \setminus C^*} h,$$

and when we add the common quantity $\int_{C \cap C^*} h$ to both sides we obtain $\int_{C^*} h \ge \int_{C} h$. Q.E.D.

Fisher's Linear Discriminant Function. Suppose that $\vec x = (x_1, \ldots, x_p)'$ is data from a p-variate normal distribution with mean vector either $\vec\mu_1$ (population $\pi_1$) or $\vec\mu_2$ (population $\pi_2$). The covariance matrix $\Sigma$ is the same for both populations $\pi_1$ and $\pi_2$.

The Neyman-Pearson lemma states that we should classify into populations $\pi_1$ and $\pi_2$ using the likelihood ratio: that is, make the decision D₁: population is $\pi_1$ if

$$\lambda = \frac{f_1(\vec x)}{f_2(\vec x)} = \frac{\exp\left\{-\tfrac12(\vec x - \vec\mu_1)'\Sigma^{-1}(\vec x - \vec\mu_1)\right\}}{\exp\left\{-\tfrac12(\vec x - \vec\mu_2)'\Sigma^{-1}(\vec x - \vec\mu_2)\right\}} \ge k.$$

Or, taking logarithms,

$$-\tfrac12(\vec x - \vec\mu_1)'\Sigma^{-1}(\vec x - \vec\mu_1) + \tfrac12(\vec x - \vec\mu_2)'\Sigma^{-1}(\vec x - \vec\mu_2) \ge \ln k,$$

and expanding the quadratic forms, the terms in $\vec x'\Sigma^{-1}\vec x$ cancel. Finally we make the decision D₁: population is $\pi_1$ if

$$\ell(\vec x) = (\vec\mu_1 - \vec\mu_2)'\Sigma^{-1}\vec x \ge K, \quad\text{where}\quad K = \ln k + \tfrac12(\vec\mu_1 - \vec\mu_2)'\Sigma^{-1}(\vec\mu_1 + \vec\mu_2).$$

The function $\ell(\vec x) = (\vec\mu_1 - \vec\mu_2)'\Sigma^{-1}\vec x$ is called Fisher's linear discriminant function.

In the case where the populations are unknown but estimated from data, Fisher's linear discriminant function replaces the parameters by their sample estimates: $\hat\ell(\vec x) = (\bar{\vec x}_1 - \bar{\vec x}_2)' S_p^{-1}\vec x$, where $S_p$ is the pooled sample covariance matrix.
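A sketch of the estimated rule with the midpoint cutoff $K = \hat\ell\big((\bar{\vec x}_1 + \bar{\vec x}_2)/2\big)$, which corresponds to equal priors and costs (k = 1); the helper name is mine, and random draws stand in for real data:

```python
import numpy as np

def fisher_discriminant(X1, X2):
    """Estimate Fisher's linear discriminant rule from samples of the two populations."""
    n1, n2 = len(X1), len(X2)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sp = ((n1 - 1) * np.cov(X1, rowvar=False)
          + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    a = np.linalg.solve(Sp, m1 - m2)          # discriminant coefficients
    K = a @ (m1 + m2) / 2                     # midpoint cutoff (equal priors/costs)
    return lambda x: 1 if a @ x >= K else 2   # classify a new case

# Hypothetical use: random draws stand in for the bankruptcy data.
rng = np.random.default_rng(2)
clf = fisher_discriminant(rng.normal(0, 1, (20, 4)), rng.normal(1, 1, (25, 4)))
print(clf(np.zeros(4)))
```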

Example 2. Annual financial data are collected for firms approximately 2 years prior to bankruptcy and for financially sound firms at about the same point in time. The data on the four variables x₁ = CF/TD = (cash flow)/(total debt), x₂ = NI/TA = (net income)/(total assets), x₃ = CA/CL = (current assets)/(current liabilities), and x₄ = CA/NS = (current assets)/(net sales) are given in the following table.

The data are given in the following table: