Copyright © Cengage Learning. All rights reserved.
6 Point Estimation
6.2 Methods of Point Estimation
We now introduce two "constructive" methods for obtaining point estimators: the method of moments and the method of maximum likelihood. By constructive we mean that the general definition of each type of estimator suggests explicitly how to obtain the estimator in any specific problem.

Although maximum likelihood estimators are generally preferable to moment estimators because of certain efficiency properties, they often require significantly more computation than do moment estimators. It is sometimes the case that these methods yield unbiased estimators.
The Method of Moments
The basic idea of this method is to equate certain sample characteristics, such as the mean, to the corresponding population expected values. Then solving these equations for unknown parameter values yields the estimators.
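Definition
Let $X_1, \ldots, X_n$ be a random sample from a pmf or pdf $f(x)$. For $k = 1, 2, 3, \ldots$, the kth population moment, or kth moment of the distribution $f(x)$, is $E(X^k)$. The kth sample moment is $(1/n)\sum_{i=1}^n X_i^k$.

Thus the first population moment is $E(X) = \mu$, and the first sample moment is $\sum X_i/n = \bar{X}$. The second population and sample moments are $E(X^2)$ and $\sum X_i^2/n$, respectively. The population moments will be functions of any unknown parameters $\theta_1, \theta_2, \ldots$.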
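Definition
Let $X_1, \ldots, X_n$ be a random sample from a distribution with pmf or pdf $f(x; \theta_1, \ldots, \theta_m)$, where $\theta_1, \ldots, \theta_m$ are parameters whose values are unknown. Then the moment estimators $\hat\theta_1, \ldots, \hat\theta_m$ are obtained by equating the first m sample moments to the corresponding first m population moments and solving for $\theta_1, \ldots, \theta_m$.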
If, for example, m = 2, $E(X)$ and $E(X^2)$ will be functions of $\theta_1$ and $\theta_2$. Setting $E(X) = (1/n)\sum X_i\ (= \bar{X})$ and $E(X^2) = (1/n)\sum X_i^2$ gives two equations in $\theta_1$ and $\theta_2$. The solution then defines the estimators.
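As a minimal sketch of the two-equation case (m = 2), assume, purely for illustration, a normal population, for which $E(X) = \mu$ and $E(X^2) = \sigma^2 + \mu^2$. Equating these to the first two sample moments gives $\hat\mu = \bar{X}$ and $\hat\sigma^2 = (1/n)\sum X_i^2 - \bar{X}^2$. The function names and the data are hypothetical.

```python
def sample_moment(xs, k):
    """kth sample moment: (1/n) * sum of x_i**k."""
    return sum(x ** k for x in xs) / len(xs)


def normal_moment_estimates(xs):
    """Method-of-moments estimates (mu_hat, sigma2_hat), assuming a normal population.

    Solves E(X) = m1 and E(X^2) = sigma^2 + mu^2 = m2 for mu and sigma^2.
    """
    m1 = sample_moment(xs, 1)
    m2 = sample_moment(xs, 2)
    return m1, m2 - m1 ** 2


data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical observations
mu_hat, sigma2_hat = normal_moment_estimates(data)
print(mu_hat, sigma2_hat)  # 5.0 4.0
```

Note that $\hat\sigma^2$ here divides by n rather than n - 1, so it is not the usual sample variance.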
Example 6.12 Let $X_1, X_2, \ldots, X_n$ represent a random sample of service times of n customers at a certain facility, where the underlying distribution is assumed exponential with parameter $\lambda$. Since there is only one parameter to be estimated, the estimator is obtained by equating $E(X)$ to $\bar{X}$. Since $E(X) = 1/\lambda$ for an exponential distribution, this gives $1/\lambda = \bar{X}$ or $\lambda = 1/\bar{X}$. The moment estimator of $\lambda$ is then $\hat\lambda = 1/\bar{X}$.
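The moment estimator $\hat\lambda = 1/\bar{X}$ is a one-line computation. The sketch below applies it to a small set of service times; both the function name and the data are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def exponential_moment_estimate(times):
    """Moment estimate of the exponential rate: lambda_hat = 1 / x_bar."""
    x_bar = sum(times) / len(times)
    return 1.0 / x_bar


service_times = [1.5, 0.5, 2.0, 1.0, 0.5, 0.5]  # hypothetical service times (minutes)
print(exponential_moment_estimate(service_times))  # 1.0
```

Here $\bar{x} = 6/6 = 1.0$ minute, so the estimated rate is 1.0 customer per minute.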
Maximum Likelihood Estimation
The method of maximum likelihood was first introduced by R. A. Fisher, a geneticist and statistician, in the 1920s. Most statisticians recommend this method, at least when the sample size is large, since the resulting estimators have certain desirable efficiency properties.
Example 6.15 The best protection against hacking into an online account is to use a password that has at least 8 characters consisting of upper- and lowercase letters, numerals, and special characters. [Note: The Jan. 2012 issue of Consumer Reports reported that only 25% of individuals surveyed used a strong password.] Suppose that 10 individuals who have email accounts with a certain provider are selected, and it is found that the first, third, and tenth individuals have such strong protection, whereas the others do not.
Let p = P(strong protection), i.e., p is the proportion of all such account holders having strong protection. Define (Bernoulli) random variables $X_1, X_2, \ldots, X_{10}$ by
$$X_i = \begin{cases} 1 & \text{if the } i\text{th individual has strong protection} \\ 0 & \text{if the } i\text{th individual does not have strong protection} \end{cases}$$
Then for the obtained sample, $X_1 = X_3 = X_{10} = 1$ and the other seven $X_i$'s are all zero. The probability mass function of any particular $X_i$ is $p^{x_i}(1-p)^{1-x_i}$, which becomes p if $x_i = 1$ and $1 - p$ when $x_i = 0$. Now suppose that the conditions of various passwords are independent of one another.
This implies that the $X_i$'s are independent, so their joint probability mass function is the product of the individual pmf's. Thus the joint pmf evaluated at the observed $X_i$'s is
$$f(x_1, \ldots, x_{10}; p) = p(1-p)p \cdots p = p^3(1-p)^7 \qquad (6.4)$$
Suppose that p = .25. Then the probability of observing the sample that we actually obtained is $(.25)^3(.75)^7 = .002086$. If instead p = .50, then this probability is $(.50)^3(.50)^7 = .000977$. For what value of p is the obtained sample most likely to have occurred? That is, what value of p makes the joint pmf (6.4) as large as it can be?
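The two probabilities quoted above can be reproduced by multiplying the individual Bernoulli pmf's, as in (6.4). The helper name joint_pmf is hypothetical; the 0/1 sample encodes the example's observation that individuals 1, 3, and 10 have strong passwords.

```python
def joint_pmf(xs, p):
    """Joint pmf of independent Bernoulli(p) observations: product of p^x (1-p)^(1-x)."""
    prob = 1.0
    for x in xs:
        prob *= p ** x * (1 - p) ** (1 - x)
    return prob


# Example 6.15: individuals 1, 3, and 10 have strong passwords (coded 1).
sample = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
print(round(joint_pmf(sample, 0.25), 6))  # 0.002086
print(round(joint_pmf(sample, 0.50), 6))  # 0.000977
```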
Figure 6.6(a) shows a graph of the likelihood (6.4) as a function of p. It appears that the graph reaches its peak above p = .3, the proportion of strong passwords in the sample.
[Figure 6.6(a): Graph of the likelihood (joint pmf) (6.4) from Example 6.15]
Figure 6.6(b) shows a graph of the natural logarithm of (6.4); since ln[g(u)] is a strictly increasing function of g(u), finding u to maximize the function g(u) is the same as finding u to maximize ln[g(u)].
[Figure 6.6(b): Graph of the natural logarithm of the likelihood]
We can verify our visual impression by using calculus to find the value of p that maximizes (6.4). Working with the natural log of the joint pmf is often easier than working with the joint pmf itself, since the joint pmf is typically a product, so its logarithm will be a sum. Here
$$\ln[f(x_1, \ldots, x_{10}; p)] = \ln[p^3(1-p)^7] = 3\ln(p) + 7\ln(1-p) \qquad (6.5)$$
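Before doing the calculus, the visual impression can be checked numerically: the sketch below evaluates the log-likelihood (6.5) on a grid of p values and reports where it is largest. The grid spacing of 0.01 and the function name log_likelihood are illustrative choices, not part of the text.

```python
import math


def log_likelihood(p, successes=3, failures=7):
    """Log-likelihood (6.5): 3 ln(p) + 7 ln(1 - p) for Example 6.15."""
    return successes * math.log(p) + failures * math.log(1 - p)


# Grid p = 0.01, 0.02, ..., 0.99; the endpoints are excluded because ln(0) is undefined.
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=log_likelihood)
print(best_p)  # 0.3
```

A grid search only locates the maximum to within the grid spacing; the calculus argument pins it down exactly.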
Thus
$$\frac{d}{dp}\ln[f(x_1, \ldots, x_{10}; p)] = \frac{d}{dp}[3\ln(p) + 7\ln(1-p)] = \frac{3}{p} + \frac{7}{1-p}\cdot(-1) = \frac{3}{p} - \frac{7}{1-p}$$
[the $(-1)$ comes from the chain rule in calculus].
Equating this derivative to 0 and solving for p gives 3(1 – p) = 7p, from which 3 = 10p and so p = 3/10 = .30 as conjectured. That is, our point estimate is $\hat{p} = .30$. It is called the maximum likelihood estimate because it is the parameter value that maximizes the likelihood (joint pmf) of the observed sample. In general, the second derivative should be examined to make sure a maximum has been obtained, but here this is obvious from Figure 6.6.
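The second-derivative check mentioned above can be done directly. Differentiating $3/p - 7/(1-p)$ once more gives $-3/p^2 - 7/(1-p)^2$, which is negative for every p in (0, 1), so the critical point is a maximum. The sketch evaluates both derivatives at $\hat{p} = .30$; the function names are hypothetical.

```python
def score(p):
    """First derivative of (6.5): 3/p - 7/(1 - p)."""
    return 3 / p - 7 / (1 - p)


def second_derivative(p):
    """Second derivative of (6.5): -3/p**2 - 7/(1 - p)**2, negative for all 0 < p < 1."""
    return -3 / p ** 2 - 7 / (1 - p) ** 2


p_hat = 3 / 10
# First derivative is (numerically) zero and the second derivative is negative.
print(abs(score(p_hat)) < 1e-12, second_derivative(p_hat) < 0)  # True True
```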
Suppose that rather than being told the condition of every password, we had only been informed that three of the ten were strong. Then we would have the observed value of a binomial random variable X = the number with strong passwords. The pmf of X is
$$b(x; 10, p) = \binom{10}{x} p^x (1-p)^{10-x}, \qquad x = 0, 1, \ldots, 10$$
For x = 3, this becomes $\binom{10}{3} p^3 (1-p)^7$. The binomial coefficient $\binom{10}{3}$ is irrelevant to the maximization, so again $\hat{p} = .30$.
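To see concretely why the binomial coefficient does not matter, the sketch compares the binomial likelihood with the product of Bernoulli pmf's (6.4). The two differ only by the constant factor $\binom{10}{3} = 120$, so they peak at the same p. The function names and the 0.01 grid are illustrative assumptions.

```python
import math


def binomial_likelihood(p, n=10, x=3):
    """Binomial pmf C(n, x) p^x (1-p)^(n-x), viewed as a function of p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def bernoulli_likelihood(p, n=10, x=3):
    """Joint pmf (6.4) of the individual Bernoulli outcomes: p^x (1-p)^(n-x)."""
    return p ** x * (1 - p) ** (n - x)


grid = [i / 100 for i in range(1, 100)]
print(max(grid, key=binomial_likelihood), max(grid, key=bernoulli_likelihood))  # 0.3 0.3
print(math.comb(10, 3))  # 120
```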
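Definition
Let $X_1, X_2, \ldots, X_n$ have joint pmf or pdf $f(x_1, x_2, \ldots, x_n; \theta_1, \ldots, \theta_m)$, where the parameters $\theta_1, \ldots, \theta_m$ have unknown values. When $x_1, \ldots, x_n$ are the observed sample values and f is regarded as a function of $\theta_1, \ldots, \theta_m$, it is called the likelihood function. The maximum likelihood estimates (mle's) $\hat\theta_1, \ldots, \hat\theta_m$ are those values of the $\theta_i$'s that maximize the likelihood function, so that
$$f(x_1, \ldots, x_n; \hat\theta_1, \ldots, \hat\theta_m) \geq f(x_1, \ldots, x_n; \theta_1, \ldots, \theta_m) \quad \text{for all } \theta_1, \ldots, \theta_m$$
When the $X_i$'s are substituted in place of the $x_i$'s, the maximum likelihood estimators result.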
The likelihood function tells us how likely the observed sample is as a function of the possible parameter values. Maximizing the likelihood gives the parameter values for which the observed sample is most likely to have been generated, that is, the parameter values that "agree most closely" with the observed data.
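Example 6.16 Suppose $X_1, X_2, \ldots, X_n$ is a random sample from an exponential distribution with parameter $\lambda$. Because of independence, the likelihood function is a product of the individual pdf's:
$$f(x_1, \ldots, x_n; \lambda) = (\lambda e^{-\lambda x_1}) \cdots (\lambda e^{-\lambda x_n}) = \lambda^n e^{-\lambda \sum x_i}$$
The natural logarithm of the likelihood function is
$$\ln[f(x_1, \ldots, x_n; \lambda)] = n\ln(\lambda) - \lambda\sum x_i$$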
Equating $(d/d\lambda)[\ln(\text{likelihood})]$ to zero results in $n/\lambda - \sum x_i = 0$, or $\lambda = n/\sum x_i = 1/\bar{x}$. Thus the mle is $\hat\lambda = 1/\bar{X}$; it is identical to the method of moments estimator [but it is not an unbiased estimator, since $E(1/\bar{X}) \neq 1/E(\bar{X})$].
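A quick numerical sanity check of this result: compute $\hat\lambda = n/\sum x_i$ in closed form, then confirm that the log-likelihood $n\ln(\lambda) - \lambda\sum x_i$ is larger there than at nearby values on both sides. The sample and the function name are hypothetical.

```python
import math


def exp_log_likelihood(lam, xs):
    """ln f(x1, ..., xn; lambda) = n ln(lambda) - lambda * sum(x_i)."""
    return len(xs) * math.log(lam) - lam * sum(xs)


xs = [0.8, 1.2, 2.5, 0.4, 1.1]  # hypothetical sample
lam_hat = len(xs) / sum(xs)      # closed-form mle: n / sum(x_i) = 1 / x_bar
# The log-likelihood at lam_hat should exceed its value at nearby lambdas on both sides.
print(all(exp_log_likelihood(lam_hat, xs) > exp_log_likelihood(lam_hat * f, xs)
          for f in (0.9, 0.99, 1.01, 1.1)))  # True
```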
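Example 6.17 Let $X_1, \ldots, X_n$ be a random sample from a normal distribution. The likelihood function is
$$f(x_1, \ldots, x_n; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x_1-\mu)^2/(2\sigma^2)} \cdots \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x_n-\mu)^2/(2\sigma^2)} = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-\sum (x_i-\mu)^2/(2\sigma^2)}$$
so
$$\ln[f(x_1, \ldots, x_n; \mu, \sigma^2)] = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum (x_i-\mu)^2$$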
To find the maximizing values of $\mu$ and $\sigma^2$, we must take the partial derivatives of ln(f) with respect to $\mu$ and $\sigma^2$, equate them to zero, and solve the resulting two equations. Omitting the details, the resulting mle's are
$$\hat\mu = \bar{X} \qquad \hat\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{n}$$
The mle of $\sigma^2$ is not the unbiased estimator, so two different principles of estimation (unbiasedness and maximum likelihood) yield two different estimators.
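For readers who want the omitted step, one way to carry it out is sketched below, treating $\sigma^2$ (not $\sigma$) as the second parameter. Setting the $\mu$-derivative to zero gives $\hat\mu = \bar{x}$; substituting that into the $\sigma^2$-equation gives $\hat\sigma^2$.

```latex
\ln f = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2

\frac{\partial \ln f}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i-\mu) = 0
\;\Longrightarrow\; \hat\mu = \bar{x}

\frac{\partial \ln f}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (x_i-\mu)^2 = 0
\;\Longrightarrow\; \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i-\bar{x})^2
```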
Estimating Functions of Parameters
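Once the mle $\hat\theta$ for a parameter $\theta$ is available, the mle for any function of $\theta$, such as $1/\theta$ or $\sqrt{\theta}$, is easily obtained.
Proposition (the invariance principle)
Let $\hat\theta_1, \ldots, \hat\theta_m$ be the mle's of the parameters $\theta_1, \ldots, \theta_m$. Then the mle of any function $h(\theta_1, \ldots, \theta_m)$ of these parameters is the function $h(\hat\theta_1, \ldots, \hat\theta_m)$ of the mle's.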
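Example 6.20 (Example 6.17 continued) In the normal case, the mle's of $\mu$ and $\sigma^2$ are $\hat\mu = \bar{X}$ and $\hat\sigma^2 = \sum (X_i - \bar{X})^2/n$. To obtain the mle of the function $h(\mu, \sigma^2) = \sqrt{\sigma^2} = \sigma$, substitute the mle's into the function:
$$\hat\sigma = \sqrt{\hat\sigma^2} = \left[\frac{1}{n}\sum (X_i - \bar{X})^2\right]^{1/2}$$
The mle of $\sigma$ is not the sample standard deviation S, though they are close unless n is quite small.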
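The difference between $\hat\sigma$ and S is only the divisor (n versus n - 1), so $\hat\sigma = S\sqrt{(n-1)/n}$. The sketch computes both from a hypothetical sample and cross-checks them against Python's standard `statistics` module, where `pstdev` divides by n and `stdev` divides by n - 1.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical sample
n = len(data)
x_bar = sum(data) / n
ss = sum((x - x_bar) ** 2 for x in data)  # sum of squared deviations

sigma_hat = math.sqrt(ss / n)  # mle of sigma: plug sigma2_hat into h(sigma^2) = sqrt(sigma^2)
s = math.sqrt(ss / (n - 1))    # sample standard deviation S (divisor n - 1)

print(sigma_hat, s)
print(math.isclose(sigma_hat, statistics.pstdev(data)), math.isclose(s, statistics.stdev(data)))
```

With n = 6 the ratio $\hat\sigma/S = \sqrt{5/6} \approx 0.91$; as n grows the ratio approaches 1.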
Large Sample Behavior of the MLE
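Although the principle of maximum likelihood estimation has considerable intuitive appeal, the following proposition provides additional rationale for the use of mle's.
Proposition
Under very general conditions on the joint distribution of the sample, when the sample size n is large, the maximum likelihood estimator of any parameter $\theta$ is approximately unbiased [$E(\hat\theta) \approx \theta$] and has variance that is either as small as or nearly as small as can be achieved by any estimator. Stated another way, the mle $\hat\theta$ is approximately the minimum variance unbiased estimator (MVUE) of $\theta$.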
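The "approximately unbiased for large n" part of the proposition can be illustrated, though not proved, by simulation. The sketch assumes, for illustration only, exponential data with $\lambda = 1$ and averages the mle $\hat\lambda = 1/\bar{X}$ from Example 6.16 over many simulated samples of size n = 5 and n = 200. For this estimator $E(\hat\lambda) = n\lambda/(n-1)$ exactly, so the bias is noticeable at n = 5 and nearly gone at n = 200. The seed, sample sizes, repetition count, and function name are arbitrary choices.

```python
import random


def mean_exponential_mle(n, lam, reps, rng):
    """Average of lambda_hat = 1 / x_bar over `reps` simulated exponential samples of size n."""
    total = 0.0
    for _ in range(reps):
        xs = [rng.expovariate(lam) for _ in range(n)]
        total += n / sum(xs)
    return total / reps


rng = random.Random(2024)  # fixed seed so the run is reproducible
lam = 1.0
small = mean_exponential_mle(5, lam, 5000, rng)
large = mean_exponential_mle(200, lam, 5000, rng)
# Exact means are n*lam/(n-1): 1.25 for n = 5 and about 1.005 for n = 200.
print(small, large)
```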
Because of this result and the fact that calculus-based techniques can usually be used to derive the mle's (though often numerical methods, such as Newton's method, are necessary), maximum likelihood estimation is the most widely used estimation technique among statisticians. Many of the estimators used in the remainder of the book are mle's. Obtaining an mle, however, does require that the underlying distribution be specified.
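When the likelihood equation has no closed-form solution, Newton's method solves $\ell'(\theta) = 0$ iteratively via $\theta \leftarrow \theta - \ell'(\theta)/\ell''(\theta)$, where $\ell$ is the log-likelihood. As a hedged sketch, the code applies it to Example 6.15, where the answer .30 is already known, so the iteration can be checked. The helper name newton_mle, the tolerance, and the starting value 0.5 are assumptions; a poor starting value can push the iterate outside (0, 1).

```python
def newton_mle(score, score_deriv, start, tol=1e-12, max_iter=50):
    """Solve score(theta) = 0 by Newton's method: theta <- theta - score/score'."""
    theta = start
    for _ in range(max_iter):
        step = score(theta) / score_deriv(theta)
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton's method did not converge")


def score(p):
    """Derivative of the Example 6.15 log-likelihood (6.5): 3/p - 7/(1 - p)."""
    return 3 / p - 7 / (1 - p)


def score_deriv(p):
    """Second derivative of (6.5): -3/p**2 - 7/(1 - p)**2."""
    return -3 / p ** 2 - 7 / (1 - p) ** 2


p_hat = newton_mle(score, score_deriv, start=0.5)
print(p_hat)
```

In realistic problems the score and its derivative come from a likelihood whose maximizer is not known in advance; this example only shows the mechanics.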