The Maximum Likelihood Method

The Maximum Likelihood Method (Taylor: "Principle of maximum likelihood")

- Suppose we are trying to measure the true value of some quantity, $x_T$.
  - We make $n$ repeated measurements of this quantity: $\{x_1, x_2, \dots, x_n\}$.
  - The standard way to estimate $x_T$ from our measurements is to calculate the mean value, $\mu_x = \frac{1}{n}\sum_{i=1}^{n} x_i$, and set $x_T = \mu_x$.
  DOES THIS PROCEDURE MAKE SENSE??? The maximum likelihood method (MLM) answers this question and provides a general method for estimating parameters of interest from data.
- Statement of the Maximum Likelihood Method
  - Assume we have made $n$ measurements of $x$: $\{x_1, x_2, \dots, x_n\}$.
  - Assume we know the probability distribution function that describes $x$: $f(x, \alpha)$.
  - Assume we want to determine the parameter $\alpha$ (e.g. we want to determine the mean of a Gaussian).
  MLM: pick the $\alpha$ that maximizes the probability of getting the measurements (the $x_i$'s) that we got!
- How do we use the MLM?
  - The probability of measuring $x_1$ is $f(x_1, \alpha)\,dx$; the probability of measuring $x_2$ is $f(x_2, \alpha)\,dx$; ...; the probability of measuring $x_n$ is $f(x_n, \alpha)\,dx$.
  - If the measurements are independent, the probability of getting the measurements we got is the likelihood function:
    $L(\alpha) = f(x_1, \alpha)\, f(x_2, \alpha) \cdots f(x_n, \alpha) = \prod_{i=1}^{n} f(x_i, \alpha)$
    We can drop the $(dx)^n$ term since it is only a proportionality constant.
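To make the recipe concrete, here is a minimal Python sketch of the likelihood construction (the measurement values and the resolution σ below are made up for the demonstration); it simply scans a grid of candidate parameter values rather than solving anything analytically:

```python
import numpy as np

# Made-up repeated measurements of one quantity, with an assumed known resolution
x = np.array([9.8, 10.2, 10.1, 9.9, 10.4])
sigma = 0.3

def gaussian_pdf(x, mu, sigma):
    """f(x; mu): Gaussian pdf with fixed width sigma."""
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def log_likelihood(mu):
    # ln L(mu) = sum_i ln f(x_i; mu) -- the product over measurements becomes a sum
    return np.sum(np.log(gaussian_pdf(x, mu, sigma)))

# Scan candidate values of mu and keep the one that maximizes ln L
mus = np.linspace(9.0, 11.0, 2001)
best = mus[np.argmax([log_likelihood(m) for m in mus])]
print(best, x.mean())  # the grid maximum coincides with the sample mean
```

The grid maximum lands on the sample mean, anticipating the result derived analytically on the next slides.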

- We want to pick the $\alpha$ that maximizes $L$:
  $\left.\frac{\partial L}{\partial \alpha}\right|_{\alpha=\alpha^*} = 0$
  - It is often easier to maximize $\ln L$ instead. Both $L$ and $\ln L$ are maximum at the same location, and $\ln L$ converts the product into a summation. The new maximization condition is:
    $\left.\frac{\partial \ln L}{\partial \alpha}\right|_{\alpha=\alpha^*} = \sum_{i=1}^{n} \left.\frac{\partial \ln f(x_i, \alpha)}{\partial \alpha}\right|_{\alpha=\alpha^*} = 0$
  - $\alpha$ could be an array of parameters (e.g. slope and intercept) or just a single variable.
  - The equations that determine $\alpha$ range from simple linear equations to coupled non-linear equations.
- Example:
  - Let $f(x, \alpha)$ be given by a Gaussian distribution function, and let $\alpha = \mu$ be the mean of the Gaussian. We want to use our data + MLM to find the best estimate of $\mu$ from our set of $n$ measurements $\{x_1, x_2, \dots, x_n\}$.
  - Let's assume that $\sigma$ is the same for each measurement.
  - The likelihood function for this problem is a product of Gaussian pdfs:
    $L(\mu) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$
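The maximization can also be done numerically before carrying it out by hand. A small sketch, assuming made-up measurements with a common known σ, that minimizes −ln L with scipy's scalar optimizer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Made-up measurements with a common, known sigma
x = np.array([4.9, 5.3, 5.1, 4.7, 5.2, 5.0])
sigma = 0.2

def neg_log_likelihood(mu):
    # -ln L for the Gaussian likelihood above (constant term kept for clarity)
    return (np.sum((x - mu)**2 / (2 * sigma**2))
            + len(x) * np.log(sigma * np.sqrt(2 * np.pi)))

res = minimize_scalar(neg_log_likelihood)
print(res.x, x.mean())  # the numerical maximizer agrees with the sample mean
```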

We want to find the $\mu$ that maximizes the log likelihood function:
$\ln L = n \ln\frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$
(We can factor out $\sigma$ since it is a constant; don't forget the factor of $n$ on the first term.)
$\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0 \;\Rightarrow\; \mu = \frac{1}{n}\sum_{i=1}^{n} x_i$
The average! The MLM says that calculating the average is the best thing we can do in this situation.

- If the $\sigma$'s are different for each data point, then $\mu$ is just the weighted average:
  $\mu = \frac{\sum_{i=1}^{n} x_i/\sigma_i^2}{\sum_{i=1}^{n} 1/\sigma_i^2}$
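A quick numerical cross-check of the weighted-average result (the data values and per-point uncertainties below are invented for the illustration):

```python
import numpy as np

# Made-up measurements, each with its own uncertainty sigma_i
x     = np.array([10.1, 9.7, 10.4, 9.9])
sigma = np.array([0.1,  0.3, 0.5,  0.2])

# Closed-form ML estimate when the sigmas differ: the weighted average
w = 1.0 / sigma**2
mu_wavg = np.sum(w * x) / np.sum(w)

# Cross-check by brute-force maximization of ln L on a fine grid
mus = np.linspace(9.0, 11.0, 20001)
lnL = np.array([-np.sum((x - m)**2 / (2 * sigma**2)) for m in mus])
print(mu_wavg, mus[np.argmax(lnL)])  # agree up to the grid spacing
```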

Example: Poisson Distribution
- Let $f(x, \alpha)$ be given by a Poisson distribution, and let $\alpha = \mu$ be the mean of the Poisson. We want the best estimate of $\mu$ from our set of $n$ measurements $\{x_1, x_2, \dots, x_n\}$.
- The likelihood function for this problem is:
  $L(\mu) = \prod_{i=1}^{n} \frac{e^{-\mu}\mu^{x_i}}{x_i!}$
- Find the $\mu$ that maximizes the log likelihood function:
  $\ln L = -n\mu + \ln\mu \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \ln(x_i!)$
  $\frac{\partial \ln L}{\partial \mu} = -n + \frac{1}{\mu}\sum_{i=1}^{n} x_i = 0 \;\Rightarrow\; \mu = \frac{1}{n}\sum_{i=1}^{n} x_i$
  Again the average!

Some general properties of the Maximum Likelihood Method
- Good news: for large data samples (large $n$) the likelihood function $L$ approaches a Gaussian distribution.
- Good news: maximum likelihood estimates are usually consistent. For large $n$ the estimates converge to the true value of the parameters we wish to determine.
- Good news: maximum likelihood estimates are asymptotically unbiased (they can be biased for small samples, but the bias vanishes as $n$ grows).
- Good news: the maximum likelihood estimate is efficient: asymptotically, the estimate has the smallest variance.
- Good news: the maximum likelihood estimate is sufficient: it uses all the information in the observations (the $x_i$'s).
- Good news: the solution from the MLM is unique.
- Bad news: we must know the correct probability distribution for the problem at hand!
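A sketch of the Poisson case with simulated counts (the true mean, sample size, and seed are arbitrary choices for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu = 4.2
counts = rng.poisson(true_mu, size=1000)   # simulated counting experiment

# ML estimate derived above: the sample average
mu_hat = counts.mean()

# Cross-check: grid search over the Poisson log likelihood
# ln L(mu) = -n*mu + ln(mu)*sum(x_i) - sum(ln x_i!)  (last term is mu-independent)
mus = np.linspace(3.0, 6.0, 3001)
lnL = -len(counts) * mus + np.log(mus) * counts.sum()
print(mu_hat, mus[np.argmax(lnL)])  # both close to true_mu for large n
```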

Maximum Likelihood Fit of Data to a Function (essentially Taylor Chapter 8)
- Suppose we have a set of $n$ measurements $(x_1, y_1 \pm \sigma_1), (x_2, y_2 \pm \sigma_2), \dots, (x_n, y_n \pm \sigma_n)$.
  - Assume each measurement error $\sigma_i$ is the standard deviation of a Gaussian pdf.
  - Assume that for each measured value $y$, there's an $x$ which is known exactly.
  - Suppose we know the functional relationship between the y's and the x's, $y = f(x; \alpha, \beta, \dots)$, where $\alpha, \beta, \dots$ are parameters that we are trying to determine from our data. The MLM gives us a method to determine them.
- Example: fitting data points to a straight line, $f(x; \alpha, \beta) = \alpha + \beta x$. We want to determine the slope ($\beta$) and intercept ($\alpha$).
  - Find $\alpha$ and $\beta$ by maximizing the likelihood function $L$:
    $L = \prod_{i=1}^{n} \frac{1}{\sigma_i\sqrt{2\pi}}\, e^{-\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma_i^2}}$
    Setting $\partial \ln L/\partial \alpha = 0$ and $\partial \ln L/\partial \beta = 0$ yields two linear equations with two unknowns (see the sketch below).
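A minimal numerical version of this fit, with invented data points and uncertainties. Maximizing ln L is equivalent to minimizing $\chi^2 = \sum_i (y_i - \alpha - \beta x_i)^2/\sigma_i^2$, which a general-purpose optimizer handles directly (this also works when the $\sigma_i$ differ):

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data points (x_i, y_i) with individual uncertainties sigma_i
x     = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y     = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
sigma = np.array([0.2, 0.2, 0.3, 0.3, 0.4])

def neg_log_likelihood(params):
    a, b = params                 # intercept and slope
    resid = y - (a + b * x)
    # maximizing ln L <=> minimizing chi^2 = sum(resid^2 / sigma^2)
    return 0.5 * np.sum(resid**2 / sigma**2)

res = minimize(neg_log_likelihood, x0=[0.0, 1.0])
print(res.x)  # best-fit (intercept, slope)
```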

For the sake of simplicity assume that all $\sigma$'s are the same ($\sigma_1 = \sigma_2 = \cdots = \sigma$). Maximizing $\ln L$ then amounts to minimizing $\sum_i (y_i - \alpha - \beta x_i)^2$, and we get two equations that are linear in the unknowns $\alpha$, $\beta$:
$\sum_{i=1}^{n} y_i = n\alpha + \beta \sum_{i=1}^{n} x_i$
$\sum_{i=1}^{n} x_i y_i = \alpha \sum_{i=1}^{n} x_i + \beta \sum_{i=1}^{n} x_i^2$
In matrix form:
$\begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}$
Solving for the unknowns gives:
$\beta = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad \alpha = \frac{\sum y_i - \beta \sum x_i}{n}$
These equations correspond to Taylor's Eqs. 8.10-8.12.
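The same 2x2 linear system can be solved directly. A sketch, reusing the invented data from the previous example (equal-sigma case, so the sigmas drop out):

```python
import numpy as np

# Same made-up data as above; with equal sigmas the normal equations are a 2x2 system
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

A = np.array([[n,       x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])

alpha, beta = np.linalg.solve(A, rhs)   # intercept, slope
print(alpha, beta)
# np.polyfit(x, y, 1) returns (slope, intercept) and should agree
```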

EXAMPLE: A trolley moves along a track at constant speed. Suppose the following measurements of time vs. distance were made. From the data, find the best value for the speed ($v$) of the trolley.

  Time t (seconds):   1.0   2.0   3.0   4.0   5.0   6.0
  Distance d (mm):     11    19    33    40    49    61

- Our model of the motion of the trolley is $d = d_0 + v t$. The variables in this example are time ($t$) and distance ($d$) instead of $x$ and $y$.
- We want to find $v$, the slope ($\beta$) of the straight line describing the motion of the trolley.
- We need to evaluate the sums listed in the formulas above (reproduced in code below):
  $n = 6,\quad \sum t_i = 21.0,\quad \sum d_i = 213,\quad \sum t_i^2 = 91.0,\quad \sum t_i d_i = 919$
  $v = \frac{n\sum t_i d_i - \sum t_i \sum d_i}{n\sum t_i^2 - (\sum t_i)^2} = \frac{6(919) - (21)(213)}{6(91) - (21)^2} = \frac{1041}{105} \approx 9.9\ \text{mm/s}$  (best estimate of the speed)
  $d_0 = \frac{\sum d_i - v\sum t_i}{n} \approx 0.8\ \text{mm}$  (best estimate of the starting point)
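The arithmetic can be reproduced in a few lines, using exactly the trolley data from the table:

```python
import numpy as np

# The trolley data from the table above
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])     # seconds
d = np.array([11, 19, 33, 40, 49, 61], float)    # mm
n = len(t)

# Sums entering the equal-sigma straight-line formulas
St, Sd, Stt, Std = t.sum(), d.sum(), (t**2).sum(), (t * d).sum()

v  = (n * Std - St * Sd) / (n * Stt - St**2)   # slope = speed
d0 = (Sd - v * St) / n                         # intercept = starting point
print(v, d0)   # ~9.9 mm/s and ~0.8 mm
```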

The MLM fit to the data for $d = d_0 + v t$.
[Plot: the measured distances vs. time, with the fitted straight line $d = d_0 + v t$ drawn through them.]
The line in the plot "best" represents our data, yet not all the data points are "on" the line. The line minimizes the sum of squares of the deviations $\Delta_i$ between the line and the data:
$\Delta_i = \text{data} - \text{prediction} = d_i - (d_0 + v t_i) \;\rightarrow\; \text{minimize } \sum_{i=1}^{n} \Delta_i^2$, which is the same as the MLM!
We often call this technique "Least Squares" (LSQ). LSQ is more general than the MLM in that it can be applied even when the measurement errors are not Gaussian, but in that case it is not always justified.
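As a closing check, an off-the-shelf least-squares fit reproduces the MLM answer, and nudging the slope away from the optimum can only increase $\sum \Delta_i^2$ (a sketch using the trolley data; the perturbation sizes are arbitrary):

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
d = np.array([11, 19, 33, 40, 49, 61], float)

v, d0 = np.polyfit(t, d, 1)      # least-squares line: slope, intercept
resid = d - (d0 + v * t)         # deviations Delta_i of the data from the line

print(v, d0)                 # same answer as the MLM fit above
print(np.sum(resid**2))      # the quantity the fit minimizes

# Perturbing the slope (with the intercept held fixed) increases sum(Delta^2)
for dv in (-0.5, 0.5):
    worse = d - (d0 + (v + dv) * t)
    print(np.sum(worse**2))  # larger than the minimum above
```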