Lecture 13 (Greene Ch 16) Maximum Likelihood Estimation (MLE)

Basic idea
Maximum likelihood estimation (MLE) is a method for finding the density function that is most likely to have generated the observed data. MLE therefore requires you to make a distributional assumption first. This handout builds intuition for MLE through examples.

Example 1
Let me explain the basic idea of MLE using the small data set below. Assume that the variable X follows a normal distribution. Recall that the density function of the normal distribution with mean μ and variance σ² is

f(x) = (1/(σ√(2π))) exp(−(x − μ)² / (2σ²))

The data consist of five observations; the X values are the five points plotted on the horizontal line in the figures that follow:

Id   X
1    1
2    4
3    5
4    6
5    9

The data are plotted on the horizontal line in the figure: the points 1, 4, 5, 6, and 9, with two candidate normal density curves, A and B, drawn above them. Now ask yourself the following question: which distribution, A or B, is more likely to have generated the data?

The answer is A, because the data are clustered around the center of distribution A but not around the center of distribution B. This illustrates that, by looking at the data, we can identify the distribution that is most likely to have generated them. Next, I explain exactly how this is done in practice.
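To make the comparison concrete, here is a minimal numeric check of this idea in Python. The data values 1, 4, 5, 6, 9 are the points plotted above; the particular means and standard deviations chosen for the two candidate distributions are illustrative assumptions, not taken from the slides.

from scipy.stats import norm

x = [1, 4, 5, 6, 9]                    # the five data points on the horizontal line

# Candidate A is centered on the data; candidate B sits far to the right.
like_A = norm.pdf(x, loc=5, scale=2).prod()
like_B = norm.pdf(x, loc=12, scale=2).prod()

print(like_A, like_B)                  # like_A is far larger, so A is "more likely"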

Illustration of the estimation procedure. MLE starts by computing the likelihood contribution of each observation. The likelihood contribution is the height of the density function evaluated at the observation's data value. We use Li to denote the likelihood contribution of the i-th observation, so Li = f(xi; μ, σ).

Graphical illustration of the likelihood contribution. In the figure, density curve A is drawn above the data points 1, 4, 5, 6, and 9. The likelihood contribution of the first observation is the height of the curve directly above its data value:

L1 = f(x1; μ, σ) = (1/(σ√(2π))) exp(−(x1 − μ)² / (2σ²))

Then you multiply the likelihood contributions of all the observations. The result is called the likelihood function, denoted L:

L = L1 × L2 × … × Ln

That is, you multiply from i = 1 through n; in our example, n = 5.

In our example, the likelihood function is

L(μ, σ) = f(x1; μ, σ) × f(x2; μ, σ) × … × f(x5; μ, σ), where f is the normal density above.

I write L(μ, σ) to emphasize that the likelihood function depends on these parameters.

Then you find the values of μ and σ that maximize the likelihood function. The values obtained this way are called the maximum likelihood estimators of μ and σ. Most MLE problems cannot be solved 'by hand'; instead, you use an iterative numerical procedure on a computer.
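As a concrete illustration, here is a minimal sketch of Example 1 in Python with SciPy, assuming the five data values 1, 4, 5, 6, 9 shown above. It maximizes the log likelihood by minimizing its negative with an iterative optimizer, which is essentially what the packages discussed next do internally.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.array([1.0, 4.0, 5.0, 6.0, 9.0])

def neg_log_likelihood(theta):
    mu, log_sigma = theta                   # optimize log(sigma) so that sigma stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0], method="BFGS")
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)                    # close to the sample mean and the (biased) sample sd

For the normal model the answer is also available in closed form (the sample mean and the maximum likelihood standard deviation), which makes this a useful check that the optimizer is working.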

Fortunately, there are many optimization packages that can do this. A common one among economists is GQOPT, which runs in FORTRAN, so using it requires writing a FORTRAN program. Even more fortunately, many of the models that require MLE (such as probit and logit) can be estimated automatically in Stata. Still, you need to understand the basic idea of MLE in order to understand what Stata is doing.

Example 2
Example 1 was the simplest case. We are usually interested in estimating a model such as y = β0 + β1x + u. Such a model can also be estimated by MLE.

Suppose you have the data below and you are interested in estimating the model

y = β0 + β1x + u

Assume that u follows the normal distribution with mean 0 and variance σ². The data consist of five observations, with the (Y, X) pairs labeled in the figure on the next slide:

Id   Y    X
1    2    1
2    15   9
3    6    4
4    7    5
5    9    6

You can write the model as u = y − (β0 + β1x). This means that y − (β0 + β1x) follows the normal distribution with mean 0 and variance σ². The likelihood contribution of each observation is the height of this density function evaluated at yi − (β0 + β1xi).

For example, the likelihood contribution of the 2nd observation (Y = 15, X = 9) is the height of the N(0, σ²) density evaluated at its data point, 15 − β0 − 9β1:

L2 = (1/(σ√(2π))) exp(−(15 − β0 − 9β1)² / (2σ²))

The five observations correspond to the values 2 − β0 − β1, 15 − β0 − 9β1, 6 − β0 − 4β1, 7 − β0 − 5β1, and 9 − β0 − 6β1 marked in the figure.

Then the likelihood function is given by

L(β0, β1, σ) = product over i = 1, …, 5 of (1/(σ√(2π))) exp(−(yi − β0 − β1xi)² / (2σ²))

The likelihood function is a function of β0, β1, and σ.

You choose the values of β0, β1, and σ that maximize the likelihood function. These are the maximum likelihood estimators of β0, β1, and σ. Again, the maximization can be done with GQOPT or any other software that has an optimization routine (such as Matlab).
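Here is a minimal sketch of Example 2 in Python, assuming the five (Y, X) pairs reconstructed in the table above. The slides mention GQOPT or Matlab; SciPy's optimizer plays the same role here.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

y = np.array([2.0, 15.0, 6.0, 7.0, 9.0])
x = np.array([1.0, 9.0, 4.0, 5.0, 6.0])

def neg_log_likelihood(theta):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)
    resid = y - (b0 + b1 * x)               # u_i = y_i - (b0 + b1 * x_i)
    return -np.sum(norm.logpdf(resid, loc=0.0, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 1.0, 0.0], method="BFGS")
b0_hat, b1_hat, sigma_hat = result.x[0], result.x[1], np.exp(result.x[2])
print(b0_hat, b1_hat, sigma_hat)

With normal errors, the MLE of β0 and β1 coincides with the OLS estimates, so the result can be checked against a standard regression routine.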

Example 3
Consider the following model: y* = β0 + β1x + u. Sometimes we only observe whether y* ≥ 0 or not.

The data contain a variable Y that is either 0 or 1:
If Y = 1, it means that y* ≥ 0.
If Y = 0, it means that y* < 0.
The data again consist of five observations with columns Id, Y, and X; the X values are 1, 9, 4, 5, and 6, as labeled in the figures that follow.

Then, what is the likelihood contribution of each observation? In this case we only know whether y* ≥ 0 or y* < 0; we do not observe the exact value of y*. In such a case, we use the probability that y* ≥ 0 (or that y* < 0) as the likelihood contribution. Now assume that u follows the standard normal distribution (the normal distribution with mean 0 and variance 1).

Take the 2nd observation as an example. Since Y = 0 for this observation, we know that y* < 0, that is, u < −β0 − β1x2. With x2 = 9, the likelihood contribution is

L2 = Pr(y* < 0) = Pr(u < −β0 − 9β1) = Φ(−β0 − 9β1)

where Φ(·) is the standard normal cdf. (The corresponding expressions for the five observations, marked in the figure, are −β0 − β1, −β0 − 9β1, −β0 − 4β1, −β0 − 5β1, and −β0 − 6β1.)

Now take the 3rd observation as an example. Since Y = 1 for this observation, we know that y* ≥ 0, that is, u ≥ −β0 − β1x3. With x3 = 4, the likelihood contribution is

L3 = Pr(y* ≥ 0) = Pr(u ≥ −β0 − 4β1) = 1 − Φ(−β0 − 4β1) = Φ(β0 + 4β1)

Thus, the likelihood function has a more complicated form: each observation with Y = 1 contributes Φ(β0 + β1xi), each observation with Y = 0 contributes Φ(−β0 − β1xi), and the likelihood function L(β0, β1) is the product of these five terms.

You choose the values of β0 and β1 that maximize the likelihood function. These are the maximum likelihood estimators of β0 and β1. This model, with standard normal errors and a binary outcome, is the probit model.
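A minimal sketch of Example 3 (the probit likelihood) in Python. The x values follow the figure labels above; the 0/1 outcomes are hypothetical except that observation 2 has Y = 0 and observation 3 has Y = 1, as in the slides.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.array([1.0, 9.0, 4.0, 5.0, 6.0])
y = np.array([1, 0, 1, 0, 1])               # hypothetical binary outcomes (y2 = 0, y3 = 1)

def neg_log_likelihood(beta):
    b0, b1 = beta
    xb = b0 + b1 * x
    # Pr(Y=1) = Phi(b0 + b1*x);  Pr(Y=0) = Phi(-(b0 + b1*x))
    return -np.sum(y * norm.logcdf(xb) + (1 - y) * norm.logcdf(-xb))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0], method="BFGS")
print(result.x)                              # MLE of (b0, b1)

In practice, Stata's probit command (or the Probit class in Python's statsmodels) performs this same maximization automatically.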

Procedure of MLE
1. Compute the likelihood contribution of each observation, Li, for i = 1, …, n.
2. Multiply all the likelihood contributions together to form the likelihood function L.
3. Maximize L by choosing the values of the parameters. The parameter values that maximize L are the maximum likelihood estimators.

The log likelihood function
It is usually easier to maximize the natural log of the likelihood function than the likelihood function itself. Because the log is a monotonically increasing function, maximizing log L gives the same estimates as maximizing L, and the log turns the product into a sum:

log L = log L1 + log L2 + … + log Ln
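Beyond algebraic convenience, there is a numerical reason for taking logs: the product of many small densities underflows to zero on a computer, while the sum of their logs is well behaved. A small illustration with simulated data (the sample size and parameter values here are arbitrary assumptions):

import numpy as np
from scipy.stats import norm

x = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=2000)

likelihood     = np.prod(norm.pdf(x, loc=5.0, scale=2.0))       # underflows to 0.0
log_likelihood = np.sum(norm.logpdf(x, loc=5.0, scale=2.0))     # a finite number

print(likelihood, log_likelihood)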

The standard errors in MLE
This is usually an advanced topic. However, it is useful to know how standard errors are computed in MLE, since we use them for t-tests.

The score vector is the first derivative of the log likelihood function with respect to the parameters. Let θ be the column vector of parameters; in Example 2, θ = (β0, β1, σ)′. Then the score vector q is

q = ∂ log L / ∂θ

and the score contribution of the i-th observation is qi = ∂ log Li / ∂θ.

Then the standard errors of the parameters are given by the square roots of the diagonal elements of the estimated variance matrix of the MLE. One standard choice is the inverse of the sum of the outer products of the score contributions, evaluated at the maximum likelihood estimates (the BHHH, or outer-product-of-gradients, estimator):

Var(θ̂) = [ Σ over i = 1, …, n of qi qi′ ]⁻¹

The inverse of the negative Hessian of the log likelihood is another commonly used estimator.
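A minimal sketch of this recipe for Example 1, assuming the data values 1, 4, 5, 6, 9: form each observation's score contribution, sum the outer products, invert, and take square roots of the diagonal.

import numpy as np

x = np.array([1.0, 4.0, 5.0, 6.0, 9.0])

# MLE for the normal model (available in closed form in this simple case)
mu_hat = x.mean()
sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))

# Score contribution of observation i: q_i = d log f(x_i; mu, sigma) / d(mu, sigma)
q = np.column_stack([
    (x - mu_hat) / sigma_hat**2,                              # derivative w.r.t. mu
    -1.0 / sigma_hat + (x - mu_hat) ** 2 / sigma_hat**3,      # derivative w.r.t. sigma
])

cov = np.linalg.inv(q.T @ q)          # [ sum_i q_i q_i' ]^{-1}
std_errors = np.sqrt(np.diag(cov))
print(std_errors)                     # standard errors of (mu_hat, sigma_hat)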