Fitting Models to Data – II: The Basics of Maximum Likelihood Estimation (Fish 458, Lecture 9)
The Principle of ML Estimation. We wish to select values for the parameters so that the probability that the model generated (is responsible for) the data is as high as possible. Put another way: if we have two candidate sets of parameters and the probability that one generated the data is ten times that of the other, we would naturally prefer the former. OK, so how do we define this probability?
The Likelihood Function. What we need to compute is the likelihood function. If we have a discrete set of hypotheses (a set of parameter vectors $\theta_i$), then the likelihood of hypothesis $i$ is the probability of the data given that hypothesis: $L(\theta_i) = P(\text{data} \mid \theta_i)$.
A First Example. We observe $Y=6$ and know that the observation process is based on the equation $Y = \mu + \varepsilon$, $\varepsilon \sim N(0,\sigma^2)$. Given $Y=6$, the likelihood function is normal: $L(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(6-\mu)^2}{2\sigma^2}\right)$.
A First Example – II. (Figure: likelihood curves $L(\mu)$ for observations $Y=6$ and $Y=4$.) Note: the likelihood is a function of the parameter $\mu$ and not of the data; we are given the data.
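As a minimal sketch of the idea (assuming $\sigma = 1$, which the slides do not state), the likelihood curve for this example can be evaluated directly over candidate values of $\mu$ with the observation held fixed:

```python
import numpy as np

def normal_likelihood(mu, y, sigma=1.0):
    """Likelihood of the parameter mu given one observation y ~ N(mu, sigma^2)."""
    return np.exp(-((y - mu) ** 2) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

y_obs = 6.0
for mu in [4.0, 5.0, 6.0, 7.0]:
    print(f"mu = {mu}: L(mu) = {normal_likelihood(mu, y_obs):.4f}")
```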
Multiple Data Sources. If we have multiple data sources (e.g. CPUE and survey data for Cape Hake), we can establish a likelihood for each data source. The likelihood for the two data sources combined is the product of the likelihoods for each data source: $L = L_1 \times L_2$. Note: we often work with the logarithm of the likelihood function, i.e. $\ln L = \ln L_1 + \ln L_2$.
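A short illustration under assumed values (the observations, means, and standard deviations below are hypothetical): because independent likelihoods multiply, their logarithms add.

```python
import numpy as np
from scipy import stats

# Hypothetical observations from two independent sources (e.g. CPUE and a
# survey index), both assumed normally distributed around the same mean.
cpue_obs = np.array([5.8, 6.1, 6.4])
survey_obs = np.array([6.3, 5.9])

def log_lik(mu, sigma_cpue=0.5, sigma_survey=0.8):
    # Independence of the sources means the joint log-likelihood
    # is the sum of the per-source log-likelihoods.
    return (stats.norm.logpdf(cpue_obs, loc=mu, scale=sigma_cpue).sum()
            + stats.norm.logpdf(survey_obs, loc=mu, scale=sigma_survey).sum())

print(log_lik(6.0))
```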
Likelihood Estimation: 1. Identify the questions. 2. Identify the data sources. 3. Select alternative models. 4. Select appropriate likelihood functions for each data source. 5. Find the values for the parameters that maximize the likelihood function (hence Maximum Likelihood Estimation).
Finding the Maximum Likelihood Estimates. The best estimate is $\hat\mu = 6$, because this value of $\mu$ leads to the maximum likelihood.
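The same estimate can be found numerically by minimizing the negative log-likelihood (again assuming $\sigma = 1$):

```python
import numpy as np
from scipy.optimize import minimize_scalar

y_obs, sigma = 6.0, 1.0  # sigma is an assumption; the slides do not state it

def neg_log_lik(mu):
    # Negative log-likelihood of one normal observation.
    return 0.5 * np.log(2 * np.pi * sigma**2) + (y_obs - mu) ** 2 / (2 * sigma**2)

result = minimize_scalar(neg_log_lik)
print(result.x)  # ~6.0: the MLE coincides with the observation
```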
Therefore… We need to know which probability density functions to use for which data types. The probability distributions encountered most commonly are: 1. Normal / multivariate normal; 2. t; 3. Log-normal; 4. Poisson; 5. Negative binomial; 6. Beta; 7. Binomial / multinomial. You need to know when to use each distribution and its functional form (up to any normalizing constants).
The Normal and t-distributions. The density functions for the normal and t-distributions are: $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ and $f(x) = \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)\sqrt{\pi k}\,\sigma}\left(1 + \frac{(x-\mu)^2}{k\,\sigma^2}\right)^{-\frac{k+1}{2}}$, where $\mu$ is the mean, $\sigma$ is the standard deviation (the scale parameter for the t), and $k$ is the degrees of freedom. We use these distributions when the data are the sum of terms. The t-distribution allows account to be taken of small sample sizes ($n < 30$).
The Normal and t-distributions. (Figure: density curves for the normal and t-distributions.)
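Both densities are available in scipy.stats; a quick comparison (with illustrative values for $\mu$, $\sigma$ and $k$) shows the heavier tails of the t for small $k$:

```python
from scipy import stats

mu, sigma, k = 0.0, 1.0, 5  # illustrative choices
for x in (0.0, 2.0, 4.0):
    # Location-scale t with k degrees of freedom vs. the normal.
    print(x, stats.norm.pdf(x, loc=mu, scale=sigma),
          stats.t.pdf(x, df=k, loc=mu, scale=sigma))
```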
Key Point with Normal Likelihood. Let us say we wish to fit the model $y_i = f(x_i; \theta) + \varepsilon_i$ assuming normally distributed errors, i.e. $\varepsilon_i \sim N(0,\sigma^2)$. The likelihood function is therefore $L = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y_i - f(x_i;\theta))^2}{2\sigma^2}\right)$. Taking logarithms and multiplying by -1 gives $-\ln L = \frac{n}{2}\ln(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_i \left(y_i - f(x_i;\theta)\right)^2$. This implies that if you assume normally-distributed errors, the answers will be identical to those from least squares.
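A sketch demonstrating the equivalence on made-up straight-line data (all values below are illustrative): minimizing the negative log-likelihood with fixed $\sigma$ gives the same parameter estimates as least squares.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data for a straight-line model y = a + b*x + normal error.
rng = np.random.default_rng(1)
x = np.arange(10.0)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=x.size)

def neg_log_lik(theta, sigma=1.0):
    a, b = theta
    resid = y - (a + b * x)
    # Up to an additive constant, -lnL is the residual sum of
    # squares divided by 2*sigma^2.
    return np.sum(resid**2) / (2 * sigma**2)

mle = minimize(neg_log_lik, x0=[0.0, 0.0]).x
lsq = np.polyfit(x, y, 1)[::-1]  # least squares (intercept, slope)
print(mle, lsq)  # the two estimates agree
```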
Time for an Example! We wish to fit the dynamic Schaefer model to the bowhead census data. $q$ is assumed to be 1 here because the surveys provide absolute indices of abundance. We have information on the trend in abundance (an increase of 3.2% per annum, SD 0.76%, based on 8 data points). We have an estimate of abundance for 1993 of 7800 (SD 564).
How to Deal with this Example! The model: $B_{t+1} = B_t + r B_t\!\left(1 - \frac{B_t}{K}\right) - C_t$. The likelihood function is the product of a normal likelihood (for the abundance estimate) and a t-likelihood (for the trend). Ignoring constants independent of the model parameters, we take logs, multiply by minus one and minimize to find the estimates for $K$ and $r$, as in the sketch below. Note that we can ignore any constants – why? The t-distribution is chosen for the slope – why?
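A sketch of how such a fit could be coded. The catch series, year range, starting depletion, and the degrees of freedom for the t are all assumptions here (the slides do not give them); only the 1993 estimate (7800, SD 564) and the trend (3.2% per annum, SD 0.76%, 8 points) come from the example.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

n_years = 16                     # hypothetical: 1978-1993
catches = np.zeros(n_years - 1)  # hypothetical: negligible recent removals

def project(K, r):
    """Dynamic Schaefer projection; the starting depletion is an assumption."""
    B = np.empty(n_years)
    B[0] = 0.1 * K
    for t in range(n_years - 1):
        B[t + 1] = B[t] + r * B[t] * (1 - B[t] / K) - catches[t]
    return B

def neg_log_lik(params):
    K, r = params
    B = project(K, r)
    if np.any(B <= 0.0):
        return 1e10  # penalize infeasible trajectories
    # Normal likelihood for the 1993 abundance estimate.
    nll = -stats.norm.logpdf(7800.0, loc=B[-1], scale=564.0)
    # t likelihood for the trend; df = 6 assumes a log-linear
    # regression slope from 8 data points (8 - 2 = 6).
    model_slope = np.log(B[-1] / B[-8]) / 7.0
    nll += -stats.t.logpdf(0.032, df=6, loc=model_slope, scale=0.0076)
    return nll

fit = minimize(neg_log_lik, x0=[80000.0, 0.04], method="Nelder-Mead")
K_hat, r_hat = fit.x
print(K_hat, r_hat)
```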
The Outcome. The fitted model gives $B_{1993} = 7710$ and a slope of 2.95%.
The Lognormal Distribution. The density function: $f(x) = \frac{1}{x\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\ln x - \ln \mu)^2}{2\sigma^2}\right)$, where $\mu$ is the median (not the mean) and $\sigma$ is the standard deviation of the logarithm of $x$ (approximately the coefficient of variation of $x$). The lognormal distribution is used extensively in fisheries assessments because $x$ is always larger than zero – this is true for most data sources (CPUE, survey indices, estimates of death rates, etc.).
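A quick check with illustrative values that scipy's parameterization matches the density above: scipy.stats.lognorm takes s = $\sigma$ and scale = the median.

```python
import numpy as np
from scipy import stats

median, sigma = 100.0, 0.3  # illustrative; sigma = SD of log(x), roughly the CV
x = 120.0

# scipy's lognormal with s=sigma and scale=median...
print(stats.lognorm.pdf(x, s=sigma, scale=median))
# ...equals the density evaluated by hand.
print(np.exp(-(np.log(x) - np.log(median)) ** 2 / (2 * sigma**2))
      / (x * sigma * np.sqrt(2 * np.pi)))
```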
The Multivariate Normal – I. The density function: $f(\vec{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\exp\!\left(-\frac{1}{2}(\vec{x}-\vec{\mu})^T \Sigma^{-1} (\vec{x}-\vec{\mu})\right)$, where $\vec{\mu}$ is the vector of means, $\Sigma$ is the variance-covariance matrix, and $d$ is the length of the vector $\vec{x}$. This isn't nearly as bad as it looks.
The Multivariate Normal – II. We use the multivariate normal when the data points are correlated (e.g. surveys with common correction factors), as is the case for the bowhead survey estimates.
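A short sketch with hypothetical numbers: two survey estimates whose errors share a common correction factor, hence a positive covariance term in $\Sigma$.

```python
import numpy as np
from scipy import stats

# Hypothetical: two survey estimates with correlated errors
# (correlation 0.5 from a shared correction factor).
mu = np.array([7800.0, 8200.0])
Sigma = np.array([[564.0**2, 0.5 * 564.0 * 600.0],
                  [0.5 * 564.0 * 600.0, 600.0**2]])

x = np.array([7500.0, 8000.0])
print(stats.multivariate_normal.logpdf(x, mean=mu, cov=Sigma))
```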
Readings: Hilborn and Mangel (1997), Chapter 7; Haddon (2001), Chapter 4.