CHAPTER 8 More About Estimation
8.1 Bayesian Estimation

In this chapter we introduce further concepts related to estimation, and we begin by considering Bayesian estimates, which are also based upon sufficient statistics if the latter exist. We shall now describe the Bayesian approach to the problem of estimation. This approach takes into account any prior knowledge of the experiment that the statistician has, and it is one application of a principle of statistical inference that may be called Bayesian statistics.

Consider a random variable $X$ that has a distribution of probability that depends upon the symbol $\theta$, where $\theta$ is an element of a well-defined set $\Omega$.
random variable $\Theta$: has a distribution of probability over the set $\Omega$.
$x$: a possible value of the random variable $X$.
$\theta$: a possible value of the random variable $\Theta$.
The distribution of $X$ depends upon $\theta$, an experimental value of the random variable $\Theta$. We shall denote the p.d.f. of $\Theta$ by $h(\theta)$, and we take $h(\theta) = 0$ when $\theta$ is not an element of $\Omega$. Moreover, we now denote the p.d.f. of $X$ by $f(x|\theta)$, since we think of it as a conditional p.d.f. of $X$, given $\Theta = \theta$. Say $X_1, X_2, \ldots, X_n$ is a random sample from this conditional distribution of $X$. Thus we can write the joint conditional p.d.f. of $X_1, X_2, \ldots, X_n$, given $\Theta = \theta$, as
$$f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta).$$
Thus the joint p.d.f. of $X_1, X_2, \ldots, X_n$ and $\Theta$ is
$$g(x_1, x_2, \ldots, x_n, \theta) = f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta)\, h(\theta).$$
If $\Theta$ is a random variable of the continuous type, the joint marginal p.d.f. of $X_1, X_2, \ldots, X_n$ is given by
$$k_1(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{\infty} g(x_1, x_2, \ldots, x_n, \theta)\, d\theta.$$
If $\Theta$ is a random variable of the discrete type, integration would be replaced by summation. In either case the conditional p.d.f. of $\Theta$, given $X_1 = x_1, \ldots, X_n = x_n$, is
$$k(\theta | x_1, x_2, \ldots, x_n) = \frac{g(x_1, x_2, \ldots, x_n, \theta)}{k_1(x_1, x_2, \ldots, x_n)} = \frac{f(x_1|\theta) \cdots f(x_n|\theta)\, h(\theta)}{k_1(x_1, x_2, \ldots, x_n)}.$$
This relationship is another form of Bayes' formula.
Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with mean $\theta$, where $\theta$ is the observed value of a random variable $\Theta$ having a gamma distribution with known parameters $\alpha$ and $\beta$. Thus
$$g(x_1, \ldots, x_n, \theta) = \frac{\theta^{x_1} e^{-\theta}}{x_1!} \cdots \frac{\theta^{x_n} e^{-\theta}}{x_n!} \cdot \frac{\theta^{\alpha - 1} e^{-\theta/\beta}}{\Gamma(\alpha)\beta^{\alpha}},$$
provided that $x_i = 0, 1, 2, 3, \ldots$, $i = 1, 2, \ldots, n$, and $0 < \theta < \infty$, and is equal to zero elsewhere. Then
$$k_1(x_1, \ldots, x_n) = \int_0^{\infty} \frac{\theta^{\sum x_i + \alpha - 1}\, e^{-\theta(n + 1/\beta)}}{x_1! \cdots x_n!\, \Gamma(\alpha)\, \beta^{\alpha}}\, d\theta.$$
Finally, the conditional p.d.f. of $\Theta$, given $X_1 = x_1, \ldots, X_n = x_n$, is
$$k(\theta | x_1, \ldots, x_n) = \frac{g(x_1, \ldots, x_n, \theta)}{k_1(x_1, \ldots, x_n)} = \frac{\theta^{\sum x_i + \alpha - 1}\, e^{-\theta(n + 1/\beta)}}{\Gamma\!\left(\sum x_i + \alpha\right)\left[\beta/(n\beta + 1)\right]^{\sum x_i + \alpha}},$$
provided that $0 < \theta < \infty$, and is equal to zero elsewhere. This conditional p.d.f. is one of the gamma type with parameters $\alpha^{*} = \sum_{i=1}^{n} x_i + \alpha$ and $\beta^{*} = \beta/(n\beta + 1)$.
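As an illustration, here is a minimal Python sketch of this conjugate gamma-Poisson update; the true mean, the prior parameters, and the sample size are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

# Sketch of the gamma-Poisson conjugate update from Example 1.
# True mean theta and prior gamma parameters alpha, beta (scale) are arbitrary.
rng = np.random.default_rng(0)
theta_true, alpha, beta = 2.0, 3.0, 1.0
n = 25
x = rng.poisson(theta_true, size=n)          # random sample X1, ..., Xn

# Posterior is gamma with alpha* = sum(x) + alpha and beta* = beta / (n*beta + 1).
alpha_star = x.sum() + alpha
beta_star = beta / (n * beta + 1)
posterior = stats.gamma(a=alpha_star, scale=beta_star)

print("posterior parameters:", alpha_star, beta_star)
print("posterior mean:", posterior.mean())   # equals alpha_star * beta_star
```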
Bayesian statisticians frequently write that the posterior p.d.f. $k(\theta | x_1, \ldots, x_n)$ is proportional to $g(x_1, \ldots, x_n, \theta)$; that is,
$$k(\theta | x_1, \ldots, x_n) \propto f(x_1|\theta) \cdots f(x_n|\theta)\, h(\theta).$$
In Example 1, the Bayesian statistician would simply write
$$k(\theta | x_1, \ldots, x_n) \propto \theta^{x_1 + \cdots + x_n} e^{-n\theta}\, \theta^{\alpha - 1} e^{-\theta/\beta}$$
or, equivalently,
$$k(\theta | x_1, \ldots, x_n) \propto \theta^{\sum x_i + \alpha - 1}\, e^{-\theta(n + 1/\beta)}, \qquad 0 < \theta < \infty,$$
and is equal to zero elsewhere.
In Bayesian statistics, the p.d.f. $h(\theta)$ is called the prior p.d.f. of $\Theta$, and the conditional p.d.f. $k(\theta | x_1, \ldots, x_n)$ is called the posterior p.d.f. of $\Theta$. Suppose that we want a point estimate of $\theta$. This really amounts to selecting a decision function $w$, so that $w(y)$ is a predicted value of $\theta$ when the computed value $y$ of a statistic $Y = u(X_1, \ldots, X_n)$ (often a sufficient statistic) and the posterior p.d.f. $k(\theta|y)$ are known.
$W$: an experimental value of any random variable;
$E(W)$: the mean of the distribution of $W$;
$\mathcal{L}[\theta, w(y)]$: the loss function.
A Bayes' solution is a decision function $w$ that minimizes the conditional expectation of the loss,
$$E\{\mathcal{L}[\Theta, w(y)] \mid Y = y\} = \int_{-\infty}^{\infty} \mathcal{L}[\theta, w(y)]\, k(\theta|y)\, d\theta.$$
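With the squared-error loss function $\mathcal{L}[\theta, w(y)] = [\theta - w(y)]^2$, the Bayes solution is the mean of the posterior distribution. The following Python sketch checks this numerically for a gamma posterior like that of Example 1; the posterior parameter values are arbitrary illustrations.

```python
import numpy as np
from scipy import stats, optimize

# Sketch: under squared-error loss, the Bayes solution w(y) minimizes the
# posterior expected loss; for a gamma posterior the minimizer should equal
# the posterior mean.  Posterior parameters below are arbitrary.
alpha_star, beta_star = 50.0, 1.0 / 26.0
posterior = stats.gamma(a=alpha_star, scale=beta_star)

theta = np.linspace(0.01, 6.0, 5000)
dtheta = theta[1] - theta[0]
k = posterior.pdf(theta)                       # posterior density k(theta | y)

def expected_loss(w):
    # E{ [Theta - w(y)]^2 | Y = y }, approximated by a Riemann sum
    return np.sum((theta - w) ** 2 * k) * dtheta

w_hat = optimize.minimize_scalar(expected_loss, bounds=(0.0, 6.0), method="bounded").x
print("numerical Bayes estimate:", w_hat)
print("posterior mean          :", posterior.mean())   # alpha_star * beta_star
```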
If an interval estimate of $\theta$ is desired, we can find two functions $u(x_1, \ldots, x_n)$ and $v(x_1, \ldots, x_n)$ so that the conditional probability
$$P[u(x_1, \ldots, x_n) < \Theta < v(x_1, \ldots, x_n) \mid X_1 = x_1, \ldots, X_n = x_n]$$
is large, say 0.95.
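For instance, with a gamma posterior as in Example 1, an equal-tailed interval with posterior probability 0.95 can be read from the posterior quantiles; the sketch below uses arbitrary posterior parameter values.

```python
from scipy import stats

# Sketch: equal-tailed 95% Bayesian interval estimate for Theta from the
# quantiles of a gamma posterior (Example 1).  Parameter values are arbitrary.
alpha_star, beta_star = 50.0, 1.0 / 26.0
posterior = stats.gamma(a=alpha_star, scale=beta_star)

u, v = posterior.ppf(0.025), posterior.ppf(0.975)
print(f"P({u:.3f} < Theta < {v:.3f} | data) = 0.95")
```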
8.2 Fisher Information and the Rao-Cramér Inequality

Let $X$ be a random variable with p.d.f. $f(x; \theta)$, $\theta \in \Omega$, where the parameter space $\Omega$ is an interval. We consider only special cases, sometimes called regular cases, of probability density functions, as we wish to differentiate under an integral sign. We have that
$$\int_{-\infty}^{\infty} f(x; \theta)\, dx = 1$$
and, by taking the derivative with respect to $\theta$,
$$\int_{-\infty}^{\infty} \frac{\partial f(x; \theta)}{\partial \theta}\, dx = 0. \tag{1}$$
The latter expression can be rewritten as
$$\int_{-\infty}^{\infty} \frac{\partial f(x; \theta)/\partial \theta}{f(x; \theta)}\, f(x; \theta)\, dx = 0$$
or, equivalently,
$$\int_{-\infty}^{\infty} \frac{\partial \ln f(x; \theta)}{\partial \theta}\, f(x; \theta)\, dx = 0.$$
If we differentiate again, it follows that
$$\int_{-\infty}^{\infty} \frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2}\, f(x; \theta)\, dx + \int_{-\infty}^{\infty} \frac{\partial \ln f(x; \theta)}{\partial \theta}\, \frac{\partial f(x; \theta)}{\partial \theta}\, dx = 0. \tag{2}$$
We rewrite the second term of the left-hand member of this equation as
$$\int_{-\infty}^{\infty} \frac{\partial \ln f(x; \theta)}{\partial \theta}\, \frac{\partial f(x; \theta)/\partial \theta}{f(x; \theta)}\, f(x; \theta)\, dx = \int_{-\infty}^{\infty} \left[\frac{\partial \ln f(x; \theta)}{\partial \theta}\right]^2 f(x; \theta)\, dx.$$
This is called Fisher information and is denoted by $I(\theta)$. That is,
$$I(\theta) = \int_{-\infty}^{\infty} \left[\frac{\partial \ln f(x; \theta)}{\partial \theta}\right]^2 f(x; \theta)\, dx = E\!\left\{\left[\frac{\partial \ln f(X; \theta)}{\partial \theta}\right]^2\right\};$$
but, from Equation (2), we see that $I(\theta)$ can be computed from
$$I(\theta) = -\int_{-\infty}^{\infty} \frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2}\, f(x; \theta)\, dx = -E\!\left[\frac{\partial^2 \ln f(X; \theta)}{\partial \theta^2}\right].$$
Sometimes one expression is easier to compute than the other, but often we prefer the second expression.
Example 1. Let $X$ be binomial $b(1, \theta)$. Thus
$$\ln f(x; \theta) = x \ln \theta + (1 - x) \ln(1 - \theta),$$
$$\frac{\partial \ln f(x; \theta)}{\partial \theta} = \frac{x}{\theta} - \frac{1 - x}{1 - \theta},$$
and
$$\frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2} = -\frac{x}{\theta^2} - \frac{1 - x}{(1 - \theta)^2}.$$
Clearly,
$$I(\theta) = -E\!\left[-\frac{X}{\theta^2} - \frac{1 - X}{(1 - \theta)^2}\right] = \frac{\theta}{\theta^2} + \frac{1 - \theta}{(1 - \theta)^2} = \frac{1}{\theta} + \frac{1}{1 - \theta} = \frac{1}{\theta(1 - \theta)},$$
which is larger for $\theta$ values close to zero or 1.
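The following short sympy sketch verifies this calculation symbolically, checking that both expressions for the Fisher information agree for the $b(1, \theta)$ p.d.f.

```python
import sympy as sp

# Sketch: verify symbolically that both expressions for Fisher information
# give 1/[theta(1 - theta)] for the Bernoulli, i.e. binomial b(1, theta), p.d.f.
x, theta = sp.symbols('x theta', positive=True)
log_f = x * sp.log(theta) + (1 - x) * sp.log(1 - theta)

score = sp.diff(log_f, theta)
second = sp.diff(log_f, theta, 2)

# Expectation over x in {0, 1}: E[g(X)] = g(1)*theta + g(0)*(1 - theta)
def expect(expr):
    return sp.simplify(expr.subs(x, 1) * theta + expr.subs(x, 0) * (1 - theta))

I_from_score = expect(score**2)        # E[(d log f / d theta)^2]
I_from_second = -expect(second)        # -E[d^2 log f / d theta^2]
print(I_from_score, I_from_second)     # both equal 1/[theta*(1 - theta)], possibly in an equivalent printed form
```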
The likelihood function is
$$L(\theta) = L(\theta; x_1, x_2, \ldots, x_n) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta).$$
The Rao-Cramér inequality: if $Y = u(X_1, X_2, \ldots, X_n)$ is an unbiased estimator of $\theta$, then
$$\operatorname{Var}(Y) \ge \frac{1}{n I(\theta)}.$$
Definition 1. Let $Y$ be an unbiased estimator of a parameter $\theta$ in such a case of point estimation. The statistic $Y$ is called an efficient estimator of $\theta$ if and only if the variance of $Y$ attains the Rao-Cramér lower bound.

Definition 2. In cases in which we can differentiate with respect to a parameter under an integral or summation symbol, the ratio of the Rao-Cramér lower bound to the actual variance of any unbiased estimator of a parameter is called the efficiency of that statistic.

Example 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a Poisson distribution that has the mean $\theta > 0$.
It is known that $\bar{X}$ is an m.l.e. of $\theta$; we shall show that it is also an efficient estimator of $\theta$. We have
$$\frac{\partial \ln f(x; \theta)}{\partial \theta} = \frac{\partial}{\partial \theta}\left(x \ln \theta - \theta - \ln x!\right) = \frac{x}{\theta} - 1 = \frac{x - \theta}{\theta}.$$
Accordingly,
$$E\!\left[\left(\frac{\partial \ln f(X; \theta)}{\partial \theta}\right)^2\right] = \frac{E[(X - \theta)^2]}{\theta^2} = \frac{\sigma^2}{\theta^2} = \frac{\theta}{\theta^2} = \frac{1}{\theta}.$$
The Rao-Cramér lower bound in this case is $1/[n(1/\theta)] = \theta/n$. But $\theta/n$ is the variance of $\bar{X}$. Hence $\bar{X}$ is an efficient estimator of $\theta$.
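A quick Monte Carlo sketch of this conclusion (with arbitrary choices $\theta = 3$ and $n = 20$): the simulated variance of $\bar{X}$ is close to the Rao-Cramér bound $\theta/n$.

```python
import numpy as np

# Sketch: Monte Carlo check that the variance of the sample mean of a
# Poisson(theta) sample matches the Rao-Cramer lower bound theta/n.
rng = np.random.default_rng(1)
theta, n, reps = 3.0, 20, 200_000

xbars = rng.poisson(theta, size=(reps, n)).mean(axis=1)
print("simulated Var(Xbar)     :", xbars.var())
print("Rao-Cramer bound theta/n:", theta / n)
```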
8.3 Limiting Distributions of Maximum Likelihood Estimators

We can differentiate under the integral sign, so that
$$\frac{\partial \ln f(X; \theta)}{\partial \theta}$$
has mean zero and variance $I(\theta)$. In addition, we want to be able to find the maximum likelihood estimator $\hat{\theta}$ by solving
$$\frac{\partial \ln L(\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \ln f(X_i; \theta)}{\partial \theta} = 0.$$
Expanding $\partial \ln L(\hat{\theta})/\partial \theta = 0$ about the true value $\theta$, this equation can be rewritten as
$$\sqrt{n I(\theta)}\,(\hat{\theta} - \theta) \approx \frac{\dfrac{1}{\sqrt{n I(\theta)}}\, \dfrac{\partial \ln L(\theta)}{\partial \theta}}{-\dfrac{1}{n I(\theta)}\, \dfrac{\partial^2 \ln L(\theta)}{\partial \theta^2}}. \tag{1}$$
Since $Z = \partial \ln L(\theta)/\partial \theta$ is the sum of the i.i.d. random variables $\partial \ln f(X_i; \theta)/\partial \theta$, $i = 1, 2, \ldots, n$, each with mean zero and variance $I(\theta)$, the numerator of the right-hand member of Equation (1), namely $Z/\sqrt{n I(\theta)}$,
is limiting $N(0,1)$ by the central limit theorem, while the denominator of Equation (1) converges in probability to 1. Hence $\sqrt{n I(\theta)}\,(\hat{\theta} - \theta)$ is limiting $N(0,1)$; that is, the maximum likelihood estimator $\hat{\theta}$ has an approximate normal distribution with mean $\theta$ and variance $1/[n I(\theta)]$.

Example. Suppose that the random sample arises from a distribution with p.d.f.
$$f(x; \theta) = \theta x^{\theta - 1}, \qquad 0 < x < 1, \quad \theta \in \Omega = \{\theta : 0 < \theta < \infty\},$$
zero elsewhere. We have
$$\ln f(x; \theta) = \ln \theta + (\theta - 1) \ln x, \qquad \frac{\partial \ln f(x; \theta)}{\partial \theta} = \frac{1}{\theta} + \ln x,$$
and
$$\frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2} = -\frac{1}{\theta^2}.$$
Since $I(\theta) = -E(-1/\theta^2) = 1/\theta^2$, the lower bound of the variance of every unbiased estimator of $\theta$ is $\theta^2/n$. Moreover, the maximum likelihood estimator $\hat{\theta} = -n\Big/\sum_{i=1}^{n} \ln X_i$ has an approximate normal distribution with mean $\theta$ and variance $\theta^2/n$. Thus, in a limiting sense, $\hat{\theta}$ is the unbiased minimum variance estimator of $\theta$; that is, $\hat{\theta}$ is asymptotically efficient.
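This limiting result can be illustrated by simulation; the sketch below draws repeated samples from $f(x; \theta) = \theta x^{\theta - 1}$ by inverse-c.d.f. sampling and checks that $\hat{\theta} = -n/\sum \ln X_i$ has mean near $\theta$ and variance near $\theta^2/n$ (the choices $\theta = 2$ and $n = 100$ are arbitrary).

```python
import numpy as np

# Sketch: limiting normal approximation for the m.l.e. in the example
# f(x; theta) = theta * x^(theta - 1), 0 < x < 1, where theta_hat = -n / sum(ln X_i).
rng = np.random.default_rng(2)
theta, n, reps = 2.0, 100, 100_000

u = rng.random((reps, n))
x = u ** (1.0 / theta)                     # inverse-c.d.f. sampling, since F(x) = x^theta
theta_hat = -n / np.log(x).sum(axis=1)

print("mean of theta_hat    :", theta_hat.mean())   # close to theta
print("variance of theta_hat:", theta_hat.var())    # close to theta^2 / n
print("theta^2 / n          :", theta**2 / n)
```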
8.4 Robust M-Estimation

We have found the m.l.e. of the center $\theta$ of the Cauchy distribution with p.d.f.
$$f(x; \theta) = \frac{1}{\pi[1 + (x - \theta)^2]}, \qquad -\infty < x < \infty,$$
where $-\infty < \theta < \infty$. The logarithm of the likelihood function of a random sample $X_1, X_2, \ldots, X_n$ from this distribution is
$$\ln L(\theta) = -n \ln \pi - \sum_{i=1}^{n} \ln[1 + (x_i - \theta)^2].$$
The equation
$$\frac{d \ln L(\theta)}{d\theta} = \sum_{i=1}^{n} \frac{2(x_i - \theta)}{1 + (x_i - \theta)^2} = 0$$
can be solved by some iterative process. We use a weight-function formulation: suppose, in general, that the underlying p.d.f. is $f(x - \theta)$, where $\theta$ is the center and $f(x)$ is symmetric about zero, and define
$$\rho(x) = -\ln f(x),$$
so that maximizing $L(\theta)$ is equivalent to minimizing $\sum_{i=1}^{n} \rho(x_i - \theta)$. We have
$$\psi(x) = \rho'(x) = -\frac{f'(x)}{f(x)}$$
and the maximizing equation
$$\sum_{i=1}^{n} \psi(x_i - \theta) = 0.$$
In addition, we define a weight function as
$$w(x) = \frac{\psi(x)}{x},$$
which equals $2/(1 + x^2)$ in the Cauchy case.

Definition 1. An estimator that is fairly good (small variance, say) for a wide variety of distributions (not necessarily the best for any one of them) is called a robust estimator.

Definition 2. Estimators associated with the solution of the equation
$$\sum_{i=1}^{n} \psi(x_i - \theta) = 0$$
are frequently called robust M-estimators (denoted by $\hat{\theta}$) because they can be thought of as maximum likelihood estimators. Huber's $\psi$ function is
$$\psi(x) = \begin{cases} x, & |x| \le k, \\ k\,\operatorname{sign}(x), & |x| > k, \end{cases}$$
with weight $w(x) = \psi(x)/x = 1$ provided that $|x| \le k$, and $w(x) = k/|x|$ provided that $|x| > k$. With Huber's $\psi$ function, another problem arises: if we double each $x_i$, estimators such as $\bar{X}$ and the median also double.
This is not at all true with the solution $\hat{\theta}$ of the equation $\sum_{i=1}^{n} \psi(x_i - \theta) = 0$, where the $\psi$ function is that of Huber. One way of correcting this is to divide by a robust estimate $d$ of scale and to solve
$$\sum_{i=1}^{n} \psi\!\left(\frac{x_i - \theta}{d}\right) = 0. \tag{1}$$
A popular $d$ to use is
$$d = \frac{\operatorname{median}|x_i - \operatorname{median}(x_i)|}{0.6745}.$$
The scheme of selecting $d$ also provides us with a clue for selecting $k$.
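A brief sketch of this scale problem and its correction, with hypothetical helper names m_solve and mad_scale, the arbitrary choice $k = 1.5$, and a small made-up data set: doubling the observations doubles the sample mean, fails to double the uncorrected M-estimate, but does double the estimate from Equation (1).

```python
import numpy as np
from scipy import optimize

# Sketch: m_solve solves sum(psi((x_i - theta)/d)) = 0 with Huber's psi by root
# finding; d = 1 corresponds to the uncorrected equation.
def psi(x, k=1.5):
    return np.clip(x, -k, k)                        # Huber's psi function

def m_solve(x, d=1.0, k=1.5):
    g = lambda theta: psi((x - theta) / d, k).sum() # decreasing in theta
    return optimize.brentq(g, x.min() - 1.0, x.max() + 1.0)

def mad_scale(x):
    return np.median(np.abs(x - np.median(x))) / 0.6745

x = np.array([0.0, 0.9, 1.0, 1.1, 5.0])

# Uncorrected equation: doubling the data does NOT double the estimate.
print(m_solve(2 * x), "vs", 2 * m_solve(x))
# Equation (1) with d = MAD/0.6745: doubling the data does double the estimate.
print(m_solve(2 * x, d=mad_scale(2 * x)), "vs", 2 * m_solve(x, d=mad_scale(x)))
```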
We want the inequality $|x_i - \theta|/d \le k$ to be satisfied by most of the observations when the underlying distribution is about normal, because then $d$ serves as an estimate of the standard deviation $\sigma$ (the divisor 0.6745 makes the median absolute deviation a consistent estimate of $\sigma$ in the normal case). If all the values $x_1, x_2, \ldots, x_n$ satisfy this inequality, then Equation (1) becomes
$$\sum_{i=1}^{n} \frac{x_i - \theta}{d} = 0.$$
This has the solution $\hat{\theta} = \bar{x}$, which of course is most desirable with normal distributions.
To solve Equation (1) we use Newton's method. Let $\hat{\theta}_0$ be a first estimate of $\theta$, such as $\hat{\theta}_0 = \operatorname{median}(x_i)$. Approximating the left-hand member of Equation (1) by the first two terms of its Taylor's expansion about $\hat{\theta}_0$ and setting the result equal to zero yields
$$\hat{\theta}_1 = \hat{\theta}_0 + d\, \frac{\sum_{i=1}^{n} \psi[(x_i - \hat{\theta}_0)/d]}{\sum_{i=1}^{n} \psi'[(x_i - \hat{\theta}_0)/d]}$$
(the one-step M-estimate of $\theta$). If we use $\hat{\theta}_1$ in place of $\hat{\theta}_0$, we obtain $\hat{\theta}_2$, the two-step M-estimate of $\theta$.
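A Python sketch of these one- and two-step M-estimates with Huber's $\psi$; the helper names, the tuning constant $k = 1.5$, and the small data set containing one outlier are arbitrary illustrative choices.

```python
import numpy as np

# Sketch of the one- and two-step Huber M-estimates described above.
def psi(x, k=1.5):
    return np.clip(x, -k, k)                      # Huber's psi function

def psi_prime(x, k=1.5):
    return (np.abs(x) <= k).astype(float)         # derivative: 1 inside, 0 outside

def m_estimate(x, steps=2, k=1.5):
    d = np.median(np.abs(x - np.median(x))) / 0.6745   # robust scale estimate
    theta = np.median(x)                                # first estimate theta_0
    for _ in range(steps):                              # Newton steps (one-step, two-step, ...)
        z = (x - theta) / d
        theta = theta + d * psi(z, k).sum() / psi_prime(z, k).sum()
    return theta

x = np.array([2.1, 1.9, 2.4, 2.2, 1.8, 2.0, 2.3, 9.0])  # one gross outlier
print("sample mean:", x.mean())
print("one-step M :", m_estimate(x, steps=1))
print("two-step M :", m_estimate(x, steps=2))
```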
Suppose now that the scale parameter $d$ is known. Two terms of Taylor's expansion of $\sum_{i=1}^{n} \psi[(x_i - \hat{\theta})/d]$ about $\theta_0$, the true value of the parameter, provide the approximation
$$\sum_{i=1}^{n} \psi\!\left(\frac{x_i - \theta_0}{d}\right) + (\hat{\theta} - \theta_0)\sum_{i=1}^{n} \psi'\!\left(\frac{x_i - \theta_0}{d}\right)\left(-\frac{1}{d}\right) \approx 0.$$
This can be rewritten as
$$\sqrt{n}\,(\hat{\theta} - \theta_0) \approx \frac{\dfrac{1}{\sqrt{n}}\displaystyle\sum_{i=1}^{n} \psi\!\left(\dfrac{x_i - \theta_0}{d}\right)}{\dfrac{1}{nd}\displaystyle\sum_{i=1}^{n} \psi'\!\left(\dfrac{x_i - \theta_0}{d}\right)}. \tag{2}$$
We have considered distributions that are symmetric about $\theta_0$, so that, with the odd function $\psi$,
$$E\!\left[\psi\!\left(\frac{X - \theta_0}{d}\right)\right] = 0.$$
Clearly, by the central limit theorem, the numerator of the right-hand member of Equation (2) has an approximate normal distribution with mean zero and variance $E\{\psi^2[(X - \theta_0)/d]\}$, while the denominator converges in probability to $(1/d)\,E\{\psi'[(X - \theta_0)/d]\}$. Thus Equation (2) can be rewritten as
$$\sqrt{n}\,(\hat{\theta} - \theta_0) \approx \frac{d\,\dfrac{1}{\sqrt{n}}\displaystyle\sum_{i=1}^{n} \psi\!\left(\dfrac{x_i - \theta_0}{d}\right)}{E\left\{\psi'\!\left[(X - \theta_0)/d\right]\right\}}, \tag{3}$$
so that $\hat{\theta}$ has an approximate normal distribution with mean $\theta_0$ and variance
$$\frac{d^{2}\, E\{\psi^{2}[(X - \theta_0)/d]\}}{n\,\big(E\{\psi'[(X - \theta_0)/d]\}\big)^{2}}.$$
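To illustrate the robustness idea of Definition 1, the following Monte Carlo sketch compares the simulated variance of a two-step Huber M-estimator with that of the sample mean under a contaminated normal model; the contamination model, $k = 1.5$, the known scale $d = 1$, and the sample sizes are arbitrary illustrative choices.

```python
import numpy as np

# Sketch: under a contaminated normal distribution, the two-step Huber
# M-estimator (k = 1.5, known scale d = 1 for simplicity) has a smaller
# variance than the sample mean.
rng = np.random.default_rng(3)
n, reps, k, d = 50, 20_000, 1.5, 1.0

def psi(x):
    return np.clip(x, -k, k)

def psi_prime(x):
    return (np.abs(x) <= k).astype(float)

def two_step(x):
    theta = np.median(x)
    for _ in range(2):
        z = (x - theta) / d
        theta = theta + d * psi(z).sum() / psi_prime(z).sum()
    return theta

# 90% N(0, 1) observations, 10% N(0, 9) contaminating observations.
contaminate = rng.random((reps, n)) < 0.10
x = rng.normal(0.0, 1.0, (reps, n)) * np.where(contaminate, 3.0, 1.0)

m_est = np.apply_along_axis(two_step, 1, x)
print("n * variance of sample mean:", n * x.mean(axis=1).var())
print("n * variance of M-estimator:", n * m_est.var())
```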