
Brief Review Probability and Statistics

Probability distributions Continuous distributions

Defn (Density function) Let x denote a continuous random variable; then f(x) is called the density function of x if: 1) f(x) ≥ 0, 2) ∫_{-∞}^{∞} f(x) dx = 1, and 3) P[a ≤ x ≤ b] = ∫_a^b f(x) dx.

Defn (Joint density function) Let x = (x1, x2, x3,..., xn) denote a vector of continuous random variables; then f(x) = f(x1, x2, x3,..., xn) is called the joint density function of x = (x1, x2, x3,..., xn) if: 1) f(x) ≥ 0, 2) ∫···∫ f(x) dx1···dxn = 1 (integrating over all of n-dimensional space), and 3) P[x ∈ A] = ∫···∫_A f(x) dx1···dxn.

Note:

Defn (Marginal density function) The marginal density of x1 = (x1, x2, x3,..., xp) (p < n) is defined by: f1(x1) = ∫···∫ f(x1, x2) dx2 = ∫···∫ f(x1,..., xn) dx_{p+1}···dx_n, where x2 = (x_{p+1}, x_{p+2},..., xn). The marginal density of x2 = (x_{p+1}, x_{p+2},..., xn) is defined by: f2(x2) = ∫···∫ f(x1, x2) dx1 = ∫···∫ f(x1,..., xn) dx1···dxp, where x1 = (x1, x2, x3,..., xp).

Defn (Conditional density function) The conditional density of x1 given x2 (defined in the previous slide) (p < n) is defined by: f1|2(x1|x2) = f(x1, x2)/f2(x2). The conditional density of x2 given x1 is defined by: f2|1(x2|x1) = f(x1, x2)/f1(x1).

Marginal densities describe how the subvector xi behaves ignoring xj. Conditional densities describe how the subvector xi behaves when the subvector xj is held fixed.

Defn (Independence) The two sub-vectors (x1 and x2) are called independent if: f(x) = f(x1, x2) = f1(x1)f2(x2) = the product of the marginals, or equivalently if the conditional density of xi given xj equals the marginal density of xi: fi|j(xi|xj) = fi(xi).

Example (p-variate Normal) The random vector x (p × 1) is said to have the p-variate Normal distribution with mean vector μ (p × 1) and covariance matrix Σ (p × p) (written x ~ Np(μ, Σ)) if: f(x) = (2π)^(-p/2) |Σ|^(-1/2) exp{ -(1/2)(x - μ)'Σ^(-1)(x - μ) }.
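As a quick numerical check of this formula, the sketch below (assuming NumPy and SciPy are available; the particular μ and Σ are illustrative choices, not from the slides) evaluates the p-variate Normal density directly from the expression above and compares it with scipy.stats.multivariate_normal.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative mean vector and covariance matrix (p = 2)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

def mvn_density(x, mu, Sigma):
    """Evaluate (2*pi)^(-p/2) |Sigma|^(-1/2) exp{-1/2 (x-mu)' Sigma^{-1} (x-mu)} directly."""
    p = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)        # (x - mu)' Sigma^{-1} (x - mu)
    norm_const = (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm_const * np.exp(-0.5 * quad)

x = np.array([0.5, -1.5])
print(mvn_density(x, mu, Sigma))                       # density from the formula
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # same value from SciPy
```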

Example (bivariate Normal) The random vector x = (x1, x2)' is said to have the bivariate Normal distribution with mean vector μ = (μ1, μ2)' and covariance matrix Σ = [ σ1², ρσ1σ2 ; ρσ1σ2, σ2² ].

Theorem (Transformations) Let x = (x1, x2, x3,..., xn) denote a vector of continuous random variables with joint density function f(x1, x2, x3,..., xn) = f(x). Let y1 = φ1(x1, x2, x3,..., xn), y2 = φ2(x1, x2, x3,..., xn),..., yn = φn(x1, x2, x3,..., xn) define a 1-1 transformation of x into y.

Then the joint density of y is g(y) given by: g(y) = f(x)|J|, where J = det[∂xi/∂yj] is the Jacobian of the transformation (with x expressed as a function of y).

Corollary (Linear Transformations) Let x = (x1, x2, x3,..., xn) denote a vector of continuous random variables with joint density function f(x1, x2, x3,..., xn) = f(x). Let y1 = a11x1 + a12x2 + a13x3 + ... + a1nxn, y2 = a21x1 + a22x2 + a23x3 + ... + a2nxn,..., yn = an1x1 + an2x2 + an3x3 + ... + annxn define a 1-1 transformation of x into y.

Then the joint density of y is g(y) given by: g(y) = f(A^(-1)y)/|det(A)|, where A = (aij) is the matrix of coefficients.

Corollary (Linear Transformations for Normal Random variables) Let x = (x1, x2, x3,..., xn) denote a vector of continuous random variables having an n-variate Normal distribution with mean vector μ and covariance matrix Σ, i.e. x ~ Nn(μ, Σ). Let y1 = a11x1 + a12x2 + a13x3 + ... + a1nxn, y2 = a21x1 + a22x2 + a23x3 + ... + a2nxn,..., yn = an1x1 + an2x2 + an3x3 + ... + annxn define a 1-1 transformation of x into y. Then y = (y1, y2, y3,..., yn) = Ax ~ Nn(Aμ, AΣA').
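A small simulation makes the corollary concrete (a sketch assuming NumPy; the matrices A, μ and Σ below are arbitrary illustrations): if x ~ Nn(μ, Σ) and y = Ax, the sample mean and sample covariance of simulated y's should be close to Aμ and AΣA'.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, 2.0, -1.0])           # mean vector of x
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])        # covariance matrix of x
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [2.0, 0.0, 1.0]])            # 1-1 linear transformation y = A x

x = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows are draws of x
y = x @ A.T                                            # corresponding draws of y

print("A mu       :", A @ mu)
print("mean of y  :", y.mean(axis=0))                  # close to A mu
print("A Sigma A' :\n", A @ Sigma @ A.T)
print("cov of y   :\n", np.cov(y, rowvar=False))       # close to A Sigma A'
```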

Defn (Expectation) Let x = (x1, x2, x3,..., xn) denote a vector of continuous random variables with joint density function f(x) = f(x1, x2, x3,..., xn). Let U = h(x) = h(x1, x2, x3,..., xn). Then E[U] = E[h(x)] = ∫···∫ h(x) f(x) dx1···dxn.

Defn (Conditional Expectation) Let x = (x1, x2, x3,..., xn) = (x1, x2) denote a vector of continuous random variables with joint density function f(x) = f(x1, x2, x3,..., xn) = f(x1, x2). Let U = h(x1) = h(x1, x2, x3,..., xp). Then the conditional expectation of U given x2 is E[U|x2] = ∫···∫ h(x1) f1|2(x1|x2) dx1···dxp.

Defn (Variance) Let x = (x1, x2, x3,..., xn) denote a vector of continuous random variables with joint density function f(x) = f(x1, x2, x3,..., xn). Let U = h(x) = h(x1, x2, x3,..., xn). Then Var[U] = E[(U - E[U])²] = E[U²] - (E[U])².

Defn (Conditional Variance) Let x = (x1, x2, x3,..., xn) = (x1, x2) denote a vector of continuous random variables with joint density function f(x) = f(x1, x2, x3,..., xn) = f(x1, x2). Let U = h(x1) = h(x1, x2, x3,..., xp). Then the conditional variance of U given x2 is Var[U|x2] = E[(U - E[U|x2])²|x2].

Defn (Covariance, Correlation) Let x = (x1, x2, x3,..., xn) denote a vector of continuous random variables with joint density function f(x) = f(x1, x2, x3,..., xn). Let U = h(x) = h(x1, x2, x3,..., xn) and V = g(x) = g(x1, x2, x3,..., xn). Then the covariance of U and V is Cov[U, V] = E[(U - E[U])(V - E[V])] = E[UV] - E[U]E[V], and the correlation of U and V is ρUV = Cov[U, V]/(√Var[U] √Var[V]).

Properties of Expectation, Variance, Covariance and Correlation

1. E[a1x1 + a2x2 + a3x3 + ... + anxn] = a1E[x1] + a2E[x2] + a3E[x3] + ... + anE[xn], or E[a'x] = a'E[x].

2. E[UV] = E[h(x1)g(x2)] = E[U]E[V] = E[h(x1)]E[g(x2)] if x1 and x2 are independent.

3. Var[a1x1 + a2x2 + a3x3 + ... + anxn] = Σi ai²Var[xi] + 2 Σi<j aiajCov[xi, xj], or Var[a'x] = a'Σa (with Σ the covariance matrix of x).

4. Cov[a1x1 + a2x2 + ... + anxn, b1x1 + b2x2 + ... + bnxn] = Σi Σj aibjCov[xi, xj], or Cov[a'x, b'x] = a'Σb.

5. 6.

Statistical Inference Making decisions from data

There are two main areas of Statistical Inference: Estimation, deciding on the value of a parameter (point estimation; confidence interval / confidence region estimation), and Hypothesis Testing, deciding if a statement (hypothesis) about a parameter is True or False.

The general statistical model Most data fits this situation

Defn (The Classical Statistical Model) The data vector x = (x1, x2, x3,..., xn). The model: let f(x|θ) = f(x1, x2,..., xn|θ1, θ2,..., θp) denote the joint density of the data vector x = (x1, x2, x3,..., xn) of observations, where the unknown parameter vector θ ∈ Ω (a subset of p-dimensional space).

An Example The data vector x = (x1, x2, x3,..., xn) is a sample from the normal distribution with mean μ and variance σ². The model: then f(x|μ, σ²) = f(x1, x2,..., xn|μ, σ²), the joint density of x = (x1, x2, x3,..., xn), takes on the form: f(x|μ, σ²) = (2πσ²)^(-n/2) exp{ -(1/(2σ²)) Σi (xi - μ)² }, where the unknown parameter vector θ = (μ, σ²) ∈ Ω = {(x, y) | -∞ < x < ∞, 0 ≤ y < ∞}.

Defn (Sufficient Statistics) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then S = (S1(x), S2(x), S3(x),..., Sk(x)) is called a set of sufficient statistics for the parameter vector θ if the conditional distribution of x given S = (S1(x), S2(x), S3(x),..., Sk(x)) is not functionally dependent on the parameter vector θ. A set of sufficient statistics contains all of the information concerning the unknown parameter vector.

A Simple Example illustrating Sufficiency Suppose that we observe a Success-Failure experiment n = 3 times. Let θ denote the probability of Success. Suppose that the data collected are x1, x2, x3, where xi takes on the value 1 if the i-th trial is a Success and 0 if the i-th trial is a Failure.

The following table gives the possible values of (x1, x2, x3); the sufficient statistic here is S = x1 + x2 + x3, the number of Successes. The data can be generated in two equivalent ways: 1. Generating (x1, x2, x3) directly from f(x1, x2, x3|θ), or 2. Generating S from g(S|θ) and then generating (x1, x2, x3) from f(x1, x2, x3|S). Since the second step does not involve θ, no additional information will be obtained by knowing (x1, x2, x3) once S is determined.
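The two-step construction can be checked numerically. The sketch below (assuming only the Python standard library; the value θ = 0.3 is an arbitrary illustration) tabulates f(x1, x2, x3|θ) = θ^S(1 - θ)^(3-S), g(S|θ), and the conditional probability f(x|S) = f(x|θ)/g(S|θ) for the eight possible outcomes; the last column does not involve θ.

```python
from itertools import product
from math import comb

theta = 0.3  # illustrative value of the Success probability

print(" x1 x2 x3   S   f(x|theta)   g(S|theta)   f(x|S)")
for x in product([0, 1], repeat=3):
    S = sum(x)                                              # sufficient statistic: number of Successes
    f_x = theta ** S * (1 - theta) ** (3 - S)               # probability of this outcome
    g_S = comb(3, S) * theta ** S * (1 - theta) ** (3 - S)  # distribution of S
    cond = f_x / g_S                                        # conditional prob. of x given S: 1/C(3,S), free of theta
    print(f"  {x[0]}  {x[1]}  {x[2]}   {S}   {f_x:10.4f}   {g_S:10.4f}   {cond:.4f}")
```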

The Sufficiency Principle Any decision regarding the parameter θ should be based on a set of Sufficient statistics S1(x), S2(x),..., Sk(x) and not otherwise on the value of x.

A useful approach in developing a statistical procedure: 1. Find sufficient statistics. 2. Develop estimators, tests of hypotheses, etc. using only these statistics.

Defn (Minimal Sufficient Statistics) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then S = (S1(x), S2(x), S3(x),..., Sk(x)) is a set of Minimal Sufficient statistics for the parameter vector θ if S = (S1(x), S2(x), S3(x),..., Sk(x)) is a set of Sufficient statistics and can be calculated from any other set of Sufficient statistics.

Theorem (The Factorization Criterion) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then S = (S1(x), S2(x), S3(x),..., Sk(x)) is a set of Sufficient statistics for the parameter vector θ if f(x|θ) = h(x)g(S, θ) = h(x)g(S1(x), S2(x), S3(x),..., Sk(x), θ). This is useful for finding Sufficient statistics, i.e. if you can factor out the θ-dependence with a set of statistics, then these statistics form a set of Sufficient statistics.

Defn (Completeness) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then S = (S1(x), S2(x), S3(x),..., Sk(x)) is a set of Complete Sufficient statistics for the parameter vector θ if S = (S1(x), S2(x), S3(x),..., Sk(x)) is a set of Sufficient statistics and whenever E[φ(S1(x), S2(x), S3(x),..., Sk(x))] = 0, then P[φ(S1(x), S2(x), S3(x),..., Sk(x)) = 0] = 1.

Defn (The Exponential Family) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then f(x|θ) is said to be a member of the exponential family of distributions if: f(x|θ) = h(x)g(θ) exp{ p1(θ)S1(x) + p2(θ)S2(x) + ... + pk(θ)Sk(x) } for ai < xi < bi (and 0 otherwise), θ ∈ Ω, where

1) -∞ < ai < bi < ∞ are not dependent on θ. 2) Ω contains a nondegenerate k-dimensional rectangle. 3) g(θ), ai, bi and pi(θ) are not dependent on x. 4) h(x), ai, bi and Si(x) are not dependent on θ.

If in addition: 5) the Si(x) are functionally independent for i = 1, 2,..., k; 6) ∂Si(x)/∂xj exists and is continuous for all i = 1, 2,..., k and j = 1, 2,..., n; 7) pi(θ) is a continuous function of θ for all i = 1, 2,..., k; and 8) R = {[p1(θ), p2(θ),..., pk(θ)] | θ ∈ Ω} contains a nondegenerate k-dimensional rectangle; then the set of statistics S1(x), S2(x),..., Sk(x) forms a Minimal Complete set of Sufficient statistics.

Defn (The Likelihood function) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then for a given value of the observation vector x, the Likelihood function, Lx(θ), is defined by: Lx(θ) = f(x|θ) with θ ∈ Ω. The log-likelihood function lx(θ) is defined by: lx(θ) = ln Lx(θ) = ln f(x|θ) with θ ∈ Ω.

The Likelihood Principle Any decision regarding the parameter θ should be based on the likelihood function Lx(θ) and not otherwise on the value of x. If two data sets result in the same likelihood function, the decision regarding θ should be the same.

Some statisticians find it useful to plot the likelihood function Lx(θ) given the value of x. It summarizes the information contained in x regarding the parameter vector θ.

An Example The data vector x = (x1, x2, x3,..., xn) is a sample from the normal distribution with mean μ and variance σ². The joint distribution of x: then f(x|μ, σ²) = f(x1, x2,..., xn|μ, σ²), the joint density of x = (x1, x2, x3,..., xn), takes on the form: f(x|μ, σ²) = (2πσ²)^(-n/2) exp{ -(1/(2σ²)) Σi (xi - μ)² }, where the unknown parameter vector θ = (μ, σ²) ∈ Ω = {(x, y) | -∞ < x < ∞, 0 ≤ y < ∞}.

The Likelihood function Assume the data vector x = (x1, x2, x3,..., xn) is known. Then the Likelihood function is L(μ, σ²) = f(x|μ, σ²) = f(x1, x2,..., xn|μ, σ²),

or L(μ, σ²) = (2πσ²)^(-n/2) exp{ -(1/(2σ²)) Σi (xi - μ)² },

hence the log-likelihood is l(μ, σ²) = ln L(μ, σ²) = -(n/2) ln(2πσ²) - (1/(2σ²)) Σi (xi - μ)². Now consider the following data (n = 10):

[The data table and the plots of the likelihood L(μ, σ) and log-likelihood l(μ, σ) for the n = 10 sample are not reproduced here.]

Now consider the following data (n = 100):

[The data table and the plots of the likelihood L(μ, σ) and log-likelihood l(μ, σ) for the n = 100 sample are not reproduced here.]
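Since the original data and plots are not reproduced here, the sketch below (assuming NumPy and Matplotlib; the data are simulated, with true μ = 5 and σ = 2 as arbitrary illustrations) draws contour plots of the log-likelihood l(μ, σ) over a grid for samples of size n = 10 and n = 100, showing how the likelihood concentrates as n grows.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

def log_likelihood(mu, sigma, x):
    """Normal log-likelihood l(mu, sigma) = -(n/2) ln(2 pi sigma^2) - sum (x_i - mu)^2 / (2 sigma^2)."""
    n = len(x)
    return -0.5 * n * np.log(2 * np.pi * sigma ** 2) - np.sum((x - mu) ** 2) / (2 * sigma ** 2)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, n in zip(axes, (10, 100)):
    x = rng.normal(loc=5.0, scale=2.0, size=n)        # simulated data (true mu = 5, sigma = 2)
    mus = np.linspace(3.0, 7.0, 120)
    sigmas = np.linspace(0.8, 4.0, 120)
    ll = np.array([[log_likelihood(m, s, x) for m in mus] for s in sigmas])
    ax.contour(mus, sigmas, ll, levels=30)
    ax.set(title=f"log-likelihood, n = {n}", xlabel="mu", ylabel="sigma")
plt.tight_layout()
plt.show()
```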

The Sufficiency Principle Any decision regarding the parameter θ should be based on a set of Sufficient statistics S1(x), S2(x),..., Sk(x) and not otherwise on the value of x. If two data sets result in the same values for the set of Sufficient statistics, the decision regarding θ should be the same.

Theorem (Birnbaum - Equivalency of the Likelihood Principle and Sufficiency Principle) Lx1(θ) ∝ Lx2(θ) if and only if S1(x1) = S1(x2),..., and Sk(x1) = Sk(x2).

The following table gives the possible values of (x1, x2, x3) and the corresponding Likelihood function, Lx(θ) = θ^S(1 - θ)^(3-S), where S = x1 + x2 + x3. [The table itself is not reproduced here.]

Estimation Theory Point Estimation

Defn (Estimator) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then an estimator of the parameter τ(θ) = τ(θ1, θ2,..., θp) is any function T(x) = T(x1, x2, x3,..., xn) of the observation vector.

Defn (Mean Square Error) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let T(x) be an estimator of the parameter τ(θ). Then the Mean Square Error of T(x) is defined to be: MSEθ[T(x)] = E[(T(x) - τ(θ))²].

Defn (Uniformly Better) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let T(x) and T*(x) be estimators of the parameter τ(θ). Then T(x) is said to be uniformly better than T*(x) if: MSEθ[T(x)] ≤ MSEθ[T*(x)] for all θ ∈ Ω.

Defn (Unbiased) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let T(x) be an estimator of the parameter τ(θ). Then T(x) is said to be an unbiased estimator of the parameter τ(θ) if: E[T(x)] = τ(θ) for all θ ∈ Ω.

Theorem (Cramer-Rao Lower Bound) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Suppose that the usual regularity conditions hold: i) ∂ ln f(x|θ)/∂θi exists for all x and for all θ ∈ Ω, and ii)-iv) differentiation with respect to θ and integration over x can be interchanged for ∫ f(x|θ) dx and for the expectation of any estimator, with the resulting information matrix finite and nonsingular.

Let M denote the p × p matrix with ij-th element Mij = E[(∂ ln f(x|θ)/∂θi)(∂ ln f(x|θ)/∂θj)] (the Fisher information matrix). Then V = M^(-1) is the lower bound for the covariance matrix of unbiased estimators of θ. That is, var(c'θ*) = c'var(θ*)c ≥ c'M^(-1)c = c'Vc, where θ* = (θ1*, θ2*,..., θp*) is a vector of unbiased estimators of θ.
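For a normal sample with known σ², the matrix M reduces to the scalar n/σ², so the Cramer-Rao lower bound for unbiased estimators of μ is σ²/n. The sketch below (assuming NumPy; the true μ, σ and n are illustrative values) checks by simulation that the sample mean, which is unbiased, attains this bound.

```python
import numpy as np

rng = np.random.default_rng(2)

mu, sigma, n = 10.0, 3.0, 25          # illustrative parameter values and sample size
n_reps = 100_000

# Fisher information for mu in a N(mu, sigma^2) sample of size n (sigma known):
#   M = E[(d/dmu ln f(x|mu))^2] = n / sigma^2, so the CR bound is 1/M = sigma^2 / n.
cr_bound = sigma ** 2 / n

# Simulate many samples and look at the variance of the sample mean (an unbiased estimator of mu).
samples = rng.normal(mu, sigma, size=(n_reps, n))
xbars = samples.mean(axis=1)

print("Cramer-Rao lower bound :", cr_bound)
print("Var of sample mean     :", xbars.var())   # close to the bound: x-bar attains it here
```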

Defn (Uniformly Minimum Variance Unbiased Estimator) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then T*(x) is said to be the UMVU (Uniformly Minimum Variance Unbiased) estimator of τ(θ) if: 1) E[T*(x)] = τ(θ) for all θ ∈ Ω. 2) Var[T*(x)] ≤ Var[T(x)] for all θ ∈ Ω whenever E[T(x)] = τ(θ).

Theorem (Rao-Blackwell) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let S1(x), S2(x),..., Sk(x) denote a set of sufficient statistics. Let T(x) be any unbiased estimator of τ(θ). Then T*[S1(x), S2(x),..., Sk(x)] = E[T(x)|S1(x), S2(x),..., Sk(x)] is an unbiased estimator of τ(θ) such that: Var[T*(S1(x), S2(x),..., Sk(x))] ≤ Var[T(x)] for all θ ∈ Ω.

Theorem (Lehmann-Scheffé) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let S1(x), S2(x),..., Sk(x) denote a set of complete sufficient statistics. Let T*[S1(x), S2(x),..., Sk(x)] be an unbiased estimator of τ(θ). Then: T*(S1(x), S2(x),..., Sk(x)) is the UMVU estimator of τ(θ).

Defn (Consistency) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let Tn(x) be an estimator of τ(θ). Then Tn(x) is called a consistent estimator of τ(θ) if for any ε > 0: lim(n→∞) P[|Tn(x) - τ(θ)| > ε] = 0.

Defn (M.S.E. Consistency) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let Tn(x) be an estimator of τ(θ). Then Tn(x) is called an M.S.E. consistent estimator of τ(θ) if: lim(n→∞) E[(Tn(x) - τ(θ))²] = 0.

Methods for Finding Estimators 1.The Method of Moments 2.Maximum Likelihood Estimation

Method of Moments Let x1, …, xn denote a sample from the density function f(x; θ1, …, θp) = f(x; θ). The k-th moment of the distribution being sampled is defined to be: μk = μk(θ1, …, θp) = E[x^k] = ∫ x^k f(x; θ) dx.

The k-th sample moment is defined to be: mk = (1/n) Σi xi^k. To find the method of moments estimators of θ1, …, θp we set up the equations: μk(θ1, …, θp) = mk for k = 1, 2, …, p,

and solve these p equations for θ1, …, θp. The solutions are called the method of moments estimators.
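As a worked illustration (a sketch assuming NumPy/SciPy; the Gamma parameterization and the simulated data are my own choices, not from the slides): for a Gamma(α, β) distribution the first two moments are E[x] = αβ and E[x²] = α(α + 1)β², so matching them to the sample moments m1 and m2 and solving gives explicit method-of-moments estimators.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

alpha_true, beta_true = 4.0, 1.5                     # illustrative Gamma(shape, scale) parameters
x = stats.gamma.rvs(a=alpha_true, scale=beta_true, size=5_000, random_state=rng)

# Sample moments m_k = (1/n) sum x_i^k
m1 = np.mean(x)
m2 = np.mean(x ** 2)

# Population moments: mu_1 = alpha*beta, mu_2 = alpha*(alpha + 1)*beta^2.
# Setting mu_1 = m1 and mu_2 = m2 and solving gives:
beta_mom = (m2 - m1 ** 2) / m1        # = sample variance / sample mean
alpha_mom = m1 ** 2 / (m2 - m1 ** 2)

print("method-of-moments estimates:", alpha_mom, beta_mom)
print("true values                :", alpha_true, beta_true)
```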

The Method of Maximum Likelihood Suppose that the data x1, …, xn has joint density function f(x1, …, xn; θ1, …, θp), where θ = (θ1, …, θp) are unknown parameters assumed to lie in Ω (a subset of p-dimensional space). We want to estimate the parameters θ1, …, θp.

Definition: Maximum Likelihood Estimation Suppose that the data x1, …, xn has joint density function f(x1, …, xn; θ1, …, θp). Then the Likelihood function is defined to be L(θ) = L(θ1, …, θp) = f(x1, …, xn; θ1, …, θp). The Maximum Likelihood estimators of the parameters θ1, …, θp are the values that maximize L(θ) = L(θ1, …, θp).

That is, the Maximum Likelihood estimators of the parameters θ1, …, θp are the values θ1*, …, θp* such that L(θ1*, …, θp*) ≥ L(θ1, …, θp) for all (θ1, …, θp) ∈ Ω. Note: maximizing L(θ) is equivalent to maximizing the log-likelihood function l(θ) = ln L(θ).
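A minimal sketch of this recipe (assuming NumPy and SciPy; the data are simulated for illustration): the normal log-likelihood from the earlier slides is maximized numerically with scipy.optimize.minimize, and the result agrees with the closed-form MLEs, namely the sample mean and (1/n) Σ(xi - x̄)².

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
x = rng.normal(loc=2.0, scale=1.5, size=200)           # illustrative data

def neg_log_likelihood(params, x):
    """Negative normal log-likelihood; minimizing it maximizes l(mu, sigma)."""
    mu, log_sigma = params                              # optimize log(sigma) to keep sigma > 0
    sigma = np.exp(log_sigma)
    n = len(x)
    ll = -0.5 * n * np.log(2 * np.pi * sigma ** 2) - np.sum((x - mu) ** 2) / (2 * sigma ** 2)
    return -ll

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(x,))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

print("numeric MLE :", mu_hat, sigma_hat)
print("closed form :", x.mean(), np.sqrt(np.mean((x - x.mean()) ** 2)))
```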

Application The General Linear Model

Consider the random variable Y with 1. E[Y] = g(U1, U2,..., Uk) = β1φ1(U1, U2,..., Uk) + β2φ2(U1, U2,..., Uk) + ... + βpφp(U1, U2,..., Uk) = Σj βjφj(U1, U2,..., Uk), and 2. var(Y) = σ², where β1, β2,..., βp are unknown parameters and φ1, φ2,..., φp are known functions of the nonrandom variables U1, U2,..., Uk. Assume further that Y is normally distributed.

Thus the density of Y is: f(Y|β1, β2,..., βp, σ²) = f(Y|β, σ²) = (2πσ²)^(-1/2) exp{ -(1/(2σ²)) [Y - Σj βjφj(U1, U2,..., Uk)]² }.

Now suppose that n independent observations of Y, (y1, y2,..., yn), are made corresponding to n sets of values of (U1, U2,..., Uk): (u11, u12,..., u1k), (u21, u22,..., u2k),..., (un1, un2,..., unk). Let xij = φj(ui1, ui2,..., uik), j = 1, 2,..., p; i = 1, 2,..., n. Then the joint density of y = (y1, y2,..., yn) is: f(y1, y2,..., yn|β1, β2,..., βp, σ²) = f(y|β, σ²) = (2πσ²)^(-n/2) exp{ -(1/(2σ²)) Σi (yi - Σj βjxij)² } = (2πσ²)^(-n/2) exp{ -(1/(2σ²)) (y - Xβ)'(y - Xβ) }, where X = (xij).

Thus f(y|β, σ²) is a member of the exponential family of distributions and S = (y'y, X'y) is a Minimal Complete set of Sufficient Statistics.
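A brief sketch (assuming NumPy; the design and parameter values are simulated illustrations, not from the slides) of how these sufficient statistics drive estimation in this model: the maximum likelihood / least squares estimator beta-hat = (X'X)^(-1) X'y and the residual sum of squares y'y - beta-hat'X'y depend on the data only through y'y and X'y (X'X is a known constant matrix).

```python
import numpy as np

rng = np.random.default_rng(5)

n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # design matrix x_ij = phi_j(u_i)
beta_true = np.array([1.0, -2.0, 0.5])
sigma = 1.0
y = X @ beta_true + rng.normal(scale=sigma, size=n)

# Sufficient statistics for (beta, sigma^2): y'y and X'y (X'X is known).
yty = y @ y
Xty = X.T @ y
XtX = X.T @ X

beta_hat = np.linalg.solve(XtX, Xty)    # beta-hat = (X'X)^{-1} X'y
rss = yty - beta_hat @ Xty              # residual sum of squares from the same statistics
sigma2_hat = rss / n                    # ML estimator of sigma^2

print("beta_hat   :", beta_hat)
print("sigma2_hat :", sigma2_hat)
```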

Hypothesis Testing

Defn (Test of size α) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let ω be any subset of Ω. Consider testing the Null Hypothesis H0: θ ∈ ω against the alternative hypothesis H1: θ ∉ ω.

Let A denote the acceptance region for the test (all values x = (x1, x2, x3,..., xn) such that the decision to accept H0 is made) and let C denote the critical region for the test (all values x = (x1, x2, x3,..., xn) such that the decision to reject H0 is made). Then the test is said to be of size α if: maxθ∈ω P[x ∈ C|θ] = α.

Defn (Power) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Consider testing the Null Hypothesis H0: θ ∈ ω against the alternative hypothesis H1: θ ∉ ω, where ω is any subset of Ω. Then the Power of the test for θ ∉ ω is defined to be: Power(θ) = P[x ∈ C|θ].

Defn (Uniformly Most Powerful (UMP) test of size α) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Consider testing the Null Hypothesis H0: θ ∈ ω against the alternative hypothesis H1: θ ∉ ω, where ω is any subset of Ω. Let C denote the critical region for the test. Then the test is called the UMP test of size α if: maxθ∈ω P[x ∈ C|θ] = α,

and for any other critical region C* such that: maxθ∈ω P[x ∈ C*|θ] ≤ α, then: P[x ∈ C|θ] ≥ P[x ∈ C*|θ] for all θ ∉ ω.

Theorem (Neyman-Pearson Lemma) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω = {θ0, θ1}. Consider testing the Null Hypothesis H0: θ = θ0 against the alternative hypothesis H1: θ = θ1. Then the UMP test of size α has critical region: C = { x : f(x|θ1)/f(x|θ0) ≥ K }, where K is chosen so that P[x ∈ C|θ0] = α.
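A concrete sketch of the lemma (assuming NumPy/SciPy; the simple-vs-simple normal setup with known σ is my own illustration): testing H0: μ = μ0 against H1: μ = μ1 > μ0 for a normal sample, the likelihood ratio f(x|μ1)/f(x|μ0) is an increasing function of the sample mean, so the UMP size-α critical region is "x̄ ≥ c" with c chosen so that P[x̄ ≥ c | μ0] = α.

```python
import numpy as np
from scipy import stats

mu0, mu1, sigma, n, alpha = 0.0, 1.0, 2.0, 25, 0.05   # illustrative values

# Likelihood ratio f(x|mu1)/f(x|mu0) = exp{ n(mu1 - mu0)(xbar - (mu0 + mu1)/2) / sigma^2 }
# is increasing in xbar, so "LR >= K" is equivalent to "xbar >= c".
c = mu0 + stats.norm.ppf(1 - alpha) * sigma / np.sqrt(n)   # size-alpha cutoff under H0
print("critical region: reject H0 when xbar >=", c)

# Check the size and the power by simulation.
rng = np.random.default_rng(6)
xbar_H0 = rng.normal(mu0, sigma, size=(100_000, n)).mean(axis=1)
xbar_H1 = rng.normal(mu1, sigma, size=(100_000, n)).mean(axis=1)
print("estimated size :", np.mean(xbar_H0 >= c))    # close to alpha
print("estimated power:", np.mean(xbar_H1 >= c))
```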

Defn (Likelihood Ratio Test of size α) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Consider testing the Null Hypothesis H0: θ ∈ ω against the alternative hypothesis H1: θ ∉ ω, where ω is any subset of Ω. Then the Likelihood Ratio (LR) test of size α has critical region: C = { x : λ(x) = [maxθ∈ω Lx(θ)] / [maxθ∈Ω Lx(θ)] ≤ K }, where K is chosen so that maxθ∈ω P[x ∈ C|θ] = α.

Theorem (Asymptotic distribution of the Likelihood Ratio test criterion) Let x = (x1, x2, x3,..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Consider testing the Null Hypothesis H0: θ ∈ ω against the alternative hypothesis H1: θ ∉ ω, where ω is any subset of Ω. Then, under proper regularity conditions on f(x|θ), U = -2 ln λ(x) possesses an asymptotic Chi-square distribution under H0, with degrees of freedom equal to the difference between the number of independent parameters in Ω and ω.
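A short sketch of this theorem (assuming NumPy/SciPy; testing H0: μ = μ0 for a normal mean with known σ is my own illustrative example): here ω = {μ0} and Ω is the whole real line, so the number of free parameters differs by 1, and U = -2 ln λ(x) = n(x̄ - μ0)²/σ² should behave like a Chi-square variable with 1 degree of freedom under H0.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

mu0, sigma, n = 0.0, 1.0, 30        # illustrative: H0 true, sigma known
n_reps = 50_000

x = rng.normal(mu0, sigma, size=(n_reps, n))
xbar = x.mean(axis=1)

# -2 ln lambda(x) for H0: mu = mu0 vs H1: mu unrestricted (sigma known)
U = n * (xbar - mu0) ** 2 / sigma ** 2

# Compare the simulated distribution of U with the chi-square(1) reference:
print("P[U > chi2_{1,0.95}] =", np.mean(U > stats.chi2.ppf(0.95, df=1)))   # about 0.05
# p-value for a single data set:
print("example p-value:", stats.chi2.sf(U[0], df=1))
```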