Christopher Dougherty EC220 - Introduction to econometrics (chapter 10) Slideshow: introduction to maximum likelihood estimation

Presentation transcript:

Christopher Dougherty EC220 - Introduction to econometrics (chapter 10) Slideshow: introduction to maximum likelihood estimation. Original citation: Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 10). [Teaching Resource] © 2012 The Author. This version available at: Available in LSE Learning Resources Online: May 2012. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms.

1 INTRODUCTION TO MAXIMUM LIKELIHOOD ESTIMATION This sequence introduces the principle of maximum likelihood estimation and illustrates it with some simple examples.

2 Suppose that you have a normally distributed random variable X with unknown population mean μ and standard deviation σ, and that you have a sample of two observations, 4 and 6. For the time being, we will assume that σ is equal to 1.

3 Suppose initially you consider the hypothesis μ = 3.5. Under this hypothesis the probability density at 4 would be 0.3521 and that at 6 would be 0.0175 (to four decimal places).

4 The joint probability density, shown in the bottom chart, is the product of these, 0.0062.

5 Next consider the hypothesis μ = 4.0. Under this hypothesis the probability densities associated with the two observations are 0.3989 and 0.0540, and the joint probability density is 0.0215.

6 Under the hypothesis μ = 4.5, the probability densities are 0.3521 and 0.1295, and the joint probability density is 0.0456.

7 Under the hypothesis μ = 5.0, the probability densities are both 0.2420 and the joint probability density is 0.0585.

8 Under the hypothesis μ = 5.5, the probability densities are 0.1295 and 0.3521, and the joint probability density is 0.0456.

9 The complete joint density function for all values of μ has now been plotted in the lower diagram. We see that it peaks at μ = 5.
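The graphical search over hypotheses in slides 3 to 9 can be reproduced numerically. The following is a minimal sketch, assuming σ = 1 and the sample 4 and 6 as in the slides; the function names `normal_pdf` and `joint_density` and the grid spacing of 0.01 are illustrative choices, not part of the original slideshow.

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Density at x of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def joint_density(mu, sample=(4.0, 6.0), sigma=1.0):
    """Product of the individual densities of the sample under the hypothesis mu."""
    result = 1.0
    for x in sample:
        result *= normal_pdf(x, mu, sigma)
    return result

# The five hypotheses considered in the slides.
for mu in (3.5, 4.0, 4.5, 5.0, 5.5):
    print(f"mu={mu:.1f}  p(4)={normal_pdf(4.0, mu):.4f}  "
          f"p(6)={normal_pdf(6.0, mu):.4f}  joint={joint_density(mu):.4f}")

# A coarse grid search from 3.00 to 7.00 (an assumed range) for the peak.
grid = [k / 100 for k in range(300, 701)]
mu_hat = max(grid, key=joint_density)
print("grid maximum at mu =", mu_hat)
```

A grid search only locates the peak to the resolution of the grid; the calculus in the later slides establishes the result exactly.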

10 Now we will look at the mathematics of the example. If X is normally distributed with mean μ and standard deviation σ, its density function is as shown.

11 For the time being, we are assuming σ is equal to 1, so the density function simplifies to the second expression.

12 Hence we obtain the probability densities for the observations where X = 4 and X = 6.

13 The joint probability density for the two observations in the sample is just the product of their individual densities.
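The expressions referred to in slides 10 to 13 did not survive in this transcript. A reconstruction consistent with the text, using f for the density (the symbol is an assumption), would be:

```latex
\begin{align*}
f(X) &= \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}
  && \text{(normal density)} \\
f(X) &= \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X-\mu)^{2}}
  && \text{(with } \sigma = 1\text{)} \\
f(4) &= \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-\mu)^{2}}, \qquad
f(6) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-\mu)^{2}} \\
\text{joint density} &= \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-\mu)^{2}}\right)
  \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-\mu)^{2}}\right)
\end{align*}
```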

14 In maximum likelihood estimation we choose as our estimate of μ the value that gives us the greatest joint density for the observations in our sample. This value is associated with the greatest probability, or maximum likelihood, of obtaining the observations in the sample.

15 In the graphical treatment we saw that this occurs when μ is equal to 5. We will prove this must be the case mathematically.

16 To do this, we treat the sample values X = 4 and X = 6 as given and we use the calculus to determine the value of μ that maximizes the expression.

17 When it is regarded in this way, the expression is called the likelihood function for μ, given the sample observations 4 and 6. This is the meaning of L(μ | 4,6).

18 To maximize the expression, we could differentiate with respect to μ and set the result equal to 0. This would be a little laborious. Fortunately, we can simplify the problem with a trick.

19 log L is a monotonically increasing function of L (meaning that log L increases if L increases and decreases if L decreases).

20 It follows that the value of μ which maximizes log L is the same as the one that maximizes L. As it so happens, it is easier to maximize log L with respect to μ than it is to maximize L.

21 The logarithm of the product of the density functions can be decomposed as the sum of their logarithms.

22 Using the product rule a second time, we can decompose each term as shown.

23 Now one of the basic rules for manipulating logarithms allows us to rewrite the second term as shown.

24 log e is equal to 1, another basic logarithm result. (Remember, as always, we are using natural logarithms, that is, logarithms to base e.)

25 Hence the second term reduces to a simple quadratic in X. And so does the fourth.
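The log-likelihood steps described in slides 21 to 25 were shown as equations that are missing from this transcript. A reconstruction that follows the text step by step (a sketch, with σ = 1 and the observations 4 and 6) is:

```latex
\begin{align*}
\log L(\mu \mid 4,6)
&= \log\!\left[\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-\mu)^{2}}\right)
   \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-\mu)^{2}}\right)\right] \\
&= \log\!\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-\mu)^{2}}\right)
 + \log\!\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-\mu)^{2}}\right) \\
&= \log\frac{1}{\sqrt{2\pi}} + \log e^{-\frac{1}{2}(4-\mu)^{2}}
 + \log\frac{1}{\sqrt{2\pi}} + \log e^{-\frac{1}{2}(6-\mu)^{2}} \\
&= \log\frac{1}{\sqrt{2\pi}} - \tfrac{1}{2}(4-\mu)^{2}\log e
 + \log\frac{1}{\sqrt{2\pi}} - \tfrac{1}{2}(6-\mu)^{2}\log e \\
&= 2\log\frac{1}{\sqrt{2\pi}} - \tfrac{1}{2}(4-\mu)^{2} - \tfrac{1}{2}(6-\mu)^{2}
\end{align*}
```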

26 We will now choose μ so as to maximize this expression.

27 Quadratic terms of the type in the expression can be expanded as shown.

28 Thus we obtain the differential of the quadratic term.

29 Applying this result, we obtain the differential of log L with respect to μ. (The first term in the expression for log L disappears completely since it is not a function of μ.)

30 Thus from the first order condition we confirm that 5 is the value of μ that maximizes the log-likelihood function, and hence the likelihood function.

31 Note that a caret mark has been placed over μ, because we are now talking about an estimate of μ, not its true value.

32 Note also that the second differential of log L with respect to μ is -2. Since this is negative, we have found a maximum, not a minimum.
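The differentiation in slides 27 to 32 can be written out as follows. This is a reconstruction from the text; the generic constant a, standing for an observed value, is an assumed notation.

```latex
\begin{align*}
-\tfrac{1}{2}(a-\mu)^{2} &= -\tfrac{1}{2}a^{2} + a\mu - \tfrac{1}{2}\mu^{2} \\
\frac{d}{d\mu}\left[-\tfrac{1}{2}(a-\mu)^{2}\right] &= a - \mu \\
\frac{d\log L}{d\mu} &= (4-\mu) + (6-\mu) \\
\frac{d\log L}{d\mu} = 0 \;&\Rightarrow\; \hat{\mu} = 5 \\
\frac{d^{2}\log L}{d\mu^{2}} &= -2
\end{align*}
```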

33 We will generalize this result to a sample of n observations X_1, ..., X_n. The probability density for X_i is given by the first line.

34 The joint density function for a sample of n observations is the product of their individual densities.

35 Now treating the sample values as fixed, we can re-interpret the joint density function as the likelihood function for μ, given this sample. We will find the value of μ that maximizes it.

36 We will do this indirectly, as before, by maximizing log L with respect to μ. The logarithm decomposes as shown.

37 We differentiate log L with respect to μ.

38 The first order condition for a maximum is that the differential be equal to zero.

39 Thus we have demonstrated that the maximum likelihood estimator of μ is the sample mean. The second differential, -n, is negative, confirming that we have maximized log L.
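The n-observation derivation in slides 33 to 39 referred to equations that are not in this transcript. A reconstruction consistent with the two-observation case (still with σ = 1) is:

```latex
\begin{align*}
f(X_i) &= \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_i-\mu)^{2}} \\
L(\mu \mid X_1,\dots,X_n) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_i-\mu)^{2}} \\
\log L &= n\log\frac{1}{\sqrt{2\pi}} - \tfrac{1}{2}\sum_{i=1}^{n}(X_i-\mu)^{2} \\
\frac{d\log L}{d\mu} &= \sum_{i=1}^{n}(X_i-\mu) = \sum_{i=1}^{n}X_i - n\mu \\
\frac{d\log L}{d\mu} = 0 \;&\Rightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}X_i = \bar{X} \\
\frac{d^{2}\log L}{d\mu^{2}} &= -n
\end{align*}
```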

40 So far we have assumed that σ, the standard deviation of the distribution of X, is equal to 1. We will now relax this assumption and find the maximum likelihood estimator of it.

41 We will illustrate the process graphically with the two-observation example, keeping μ fixed at 5. We will start with σ equal to 2.

42 With σ equal to 2, the probability density is 0.1760 for both X = 4 and X = 6, and the joint density is 0.0310.

43 Now try σ equal to 1. The individual densities are 0.2420 and so the joint density, 0.0585, has increased.

44 Now try putting σ equal to 0.5. The individual densities have fallen and the joint density is only 0.0117.

45 The joint density has now been plotted as a function of σ in the lower diagram. You can see that in this example it is greatest for σ equal to 1.
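The search over σ in slides 41 to 45 can be checked numerically in the same way as the search over μ. This is a minimal sketch, assuming μ fixed at 5 and the sample 4 and 6; the function name `joint_density` and the grid from 0.10 to 4.00 are illustrative assumptions.

```python
import math

def joint_density(mu, sigma, sample=(4.0, 6.0)):
    """Product of normal densities of the sample for given mu and sigma."""
    result = 1.0
    for x in sample:
        result *= (math.exp(-0.5 * ((x - mu) / sigma) ** 2)
                   / (sigma * math.sqrt(2.0 * math.pi)))
    return result

# The three values of sigma tried in the slides, with mu held at 5.
for sigma in (2.0, 1.0, 0.5):
    print(f"sigma={sigma:.1f}  joint={joint_density(5.0, sigma):.4f}")

# Coarse grid search over sigma (assumed range 0.10 to 4.00, step 0.01).
sigma_grid = [k / 100 for k in range(10, 401)]
sigma_hat = max(sigma_grid, key=lambda s: joint_density(5.0, s))
print("grid maximum at sigma =", sigma_hat)
```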

46 We will now look at this mathematically, starting with the probability density function for X given μ and σ.

47 The joint density function for the sample of n observations is given by the second line.

48 As before, we can re-interpret this function as the likelihood function for μ and σ, given the sample of observations.

49 We will find the values of μ and σ that maximize this function. We will do this indirectly by maximizing log L.

50 We can decompose the logarithm as shown. To maximize it, we will set the partial derivatives with respect to μ and σ equal to zero.

51 When differentiating with respect to μ, the first two terms disappear. We have already seen how to differentiate the other terms.

52 Setting the first differential equal to 0, the maximum likelihood estimate of μ is the sample mean, as before.
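The equations behind slides 46 to 52 are missing from this transcript. A reconstruction consistent with the text, in which the first two terms of log L do not involve μ, is:

```latex
\begin{align*}
f(X_i) &= \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_i-\mu}{\sigma}\right)^{2}} \\
L(\mu,\sigma \mid X_1,\dots,X_n) &= \prod_{i=1}^{n}
  \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_i-\mu}{\sigma}\right)^{2}} \\
\log L &= n\log\frac{1}{\sigma} + n\log\frac{1}{\sqrt{2\pi}}
  - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(X_i-\mu)^{2} \\
\frac{\partial\log L}{\partial\mu} &= \frac{1}{\sigma^{2}}\sum_{i=1}^{n}(X_i-\mu)
  = \frac{1}{\sigma^{2}}\left(\sum_{i=1}^{n}X_i - n\mu\right) \\
\frac{\partial\log L}{\partial\mu} = 0 \;&\Rightarrow\; \hat{\mu} = \bar{X}
\end{align*}
```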

53 Next, we take the partial differential of the log-likelihood function with respect to σ.

54 Before doing so, it is convenient to rewrite the equation.

55 The derivative of log σ with respect to σ is 1/σ. The derivative of σ⁻² is -2σ⁻³.

56 Setting the first derivative of log L to zero gives us a condition that must be satisfied by the maximum likelihood estimator.

57 We have already demonstrated that the maximum likelihood estimator of μ is the sample mean.

58 Hence the maximum likelihood estimator of the population variance is the mean square deviation of X.
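The steps in slides 53 to 58 can be written out as follows. This is a reconstruction from the text, starting from the rewritten form of log L mentioned in slide 54:

```latex
\begin{align*}
\log L &= -n\log\sigma + n\log\frac{1}{\sqrt{2\pi}}
  - \tfrac{1}{2}\,\sigma^{-2}\sum_{i=1}^{n}(X_i-\mu)^{2} \\
\frac{\partial\log L}{\partial\sigma} &= -\frac{n}{\sigma}
  + \sigma^{-3}\sum_{i=1}^{n}(X_i-\mu)^{2} \\
\frac{\partial\log L}{\partial\sigma} = 0 \;&\Rightarrow\;
  \hat{\sigma}^{2} = \frac{1}{n}\sum_{i=1}^{n}(X_i-\hat{\mu})^{2}
  = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^{2}
\end{align*}
```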

59 Note that it is biased. The unbiased estimator is obtained by dividing by n - 1, not n.
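The difference between the two divisors can be seen on the two-observation sample used earlier. The sketch below relies on Python's standard `statistics` module, where `pvariance` divides by n and `variance` divides by n - 1; the variable names are illustrative.

```python
import statistics

sample = [4.0, 6.0]

# Maximum likelihood estimator: mean square deviation about the sample mean (divides by n).
mle_variance = statistics.pvariance(sample)

# Unbiased estimator: divides by n - 1 instead.
unbiased_variance = statistics.variance(sample)

print("ML estimate of variance:      ", mle_variance)
print("unbiased estimate of variance:", unbiased_variance)
```

For this sample the maximum likelihood estimate of the variance is 1, which agrees with the graphical result that the joint density is greatest at σ equal to 1, while the unbiased estimate is 2.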

60 However, it can be shown that the maximum likelihood estimator is asymptotically efficient, in the sense of having a smaller mean square error than the unbiased estimator in large samples.

Copyright Christopher Dougherty. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 10.6 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre.

Individuals studying econometrics on their own and who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics or the University of London International Programmes distance learning course 20 Elements of Econometrics.