458 Fitting models to data – II (The Basics of Maximum Likelihood Estimation) Fish 458, Lecture 9.


458 The Principle of ML Estimation We wish to select the values for the parameters so that the probability that the model generated (is responsible for) the data is as high as possible. Put another way: if we have two candidate sets of parameters and the probability that one generated the data is ten times that of the other, we would naturally prefer the former. OK, so how do we define this probability?

458 The Likelihood Function What we need to compute is the likelihood function, i.e. the probability of the observed data treated as a function of the parameters: $L(\Theta \mid \text{data}) = P(\text{data} \mid \Theta)$. If we have a discrete set of hypotheses / set of parameter vectors $\Theta_1, \Theta_2, \ldots$, then the likelihood of hypothesis $i$ is $L_i = P(\text{data} \mid \Theta_i)$.

458 A First Example We observe Y=6 and know that the observation process is based on the equation $Y = \mu + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2)$ and $\sigma$ is known. Given Y=6, the likelihood function is normal: $L(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(6-\mu)^2}{2\sigma^2}\right)$.

458 A First Example - II [Figure: the likelihood curves $L(\mu)$ for observations Y=6 and Y=4; each curve peaks at the observed value.] Note: the likelihood is a function of the parameter and not the data; we are given the data.
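A minimal numerical sketch of this example (not from the lecture), assuming $\sigma = 1$ since the slides do not state its value; any known $\sigma$ gives the same maximizer:

```python
import numpy as np
from scipy.stats import norm

# Evaluate the normal likelihood of the single observation Y = 6
# over a grid of candidate values of the parameter mu (sigma = 1 assumed).
Y = 6.0
mu_grid = np.linspace(0.0, 12.0, 241)
likelihood = norm.pdf(Y, loc=mu_grid, scale=1.0)

# The likelihood peaks where mu equals the observation.
print(mu_grid[np.argmax(likelihood)])   # -> 6.0
```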

458 Multiple Data Sources If we have multiple data sources (CPUE and survey data for Cape Hake), we can establish a likelihood for each data source. The likelihood for the two data sources combined is the product of the likelihoods for each data source: $L = L_1 \times L_2$. Note: We often work with the logarithm of the likelihood function, i.e. $\ln L = \ln L_1 + \ln L_2$.
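A short illustration (with made-up likelihood values, not the hake data) of why working in log space is equivalent to taking the product:

```python
import numpy as np
from scipy.stats import norm

# Two independent data sources: the joint likelihood is a product,
# so the joint log-likelihood is a sum (values here are illustrative).
L1 = norm.pdf(6.0, loc=5.0, scale=1.0)   # likelihood from source 1
L2 = norm.pdf(4.5, loc=5.0, scale=0.5)   # likelihood from source 2
print(L1 * L2)
print(np.exp(np.log(L1) + np.log(L2)))   # identical to the product
```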

458 Likelihood Estimation
1. Identify the questions.
2. Identify the data sources.
3. Select alternative models.
4. Select appropriate likelihood functions for each data source.
5. Find the values for the parameters that maximize the likelihood function (hence Maximum Likelihood Estimation).

458 Finding the Maximum Likelihood Estimates The best estimate is 6, because this value of $\mu$ leads to the maximum likelihood.
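In practice the maximum is usually found numerically. A minimal sketch (again assuming $\sigma = 1$) using a general-purpose minimizer on the negative log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Minimize the negative log-likelihood of the single observation Y = 6.
Y = 6.0

def neg_log_lik(params):
    mu = params[0]
    return -norm.logpdf(Y, loc=mu, scale=1.0)

result = minimize(neg_log_lik, x0=[1.0])
print(result.x)   # -> approximately [6.0]
```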

458 Therefore… We need to know which probability density functions to use for which data types. The probability distributions encountered most commonly are:
1. Normal / multivariate normal
2. t
3. Log-normal
4. Poisson
5. Negative binomial
6. Beta
7. Binomial / multinomial
You need to know when to use each distribution and its functional form (up to any normalizing constants).

458 The Normal and t-distributions The density functions for the normal and t-distributions are: $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ (normal) and $f(x) = \frac{\Gamma\!\left(\tfrac{k+1}{2}\right)}{\Gamma\!\left(\tfrac{k}{2}\right)\sqrt{k\pi}\,\sigma}\left(1 + \frac{(x-\mu)^2}{k\sigma^2}\right)^{-(k+1)/2}$ (t). $\mu$ is the mean; $\sigma$ is the standard deviation (the scale parameter for the t); k is the degrees of freedom. We use these distributions when the data are the sum of terms. The t-distribution allows account to be taken of small sample sizes (n < 30).

458 The Normal and t-distributions [Figure: plots of the normal and t density functions.]
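A quick way to compare the two densities (a scipy sketch, not from the lecture):

```python
import numpy as np
from scipy.stats import norm, t

# The t density has heavier tails than the normal and approaches
# the normal as the degrees of freedom k grow.
x = np.linspace(-4.0, 4.0, 9)
print(norm.pdf(x))
print(t.pdf(x, df=3))     # noticeably heavier tails
print(t.pdf(x, df=100))   # nearly indistinguishable from the normal
```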

458 Key Point with Normal Likelihood Let us say we wish to fit the model $Y_i = f(x_i; \Theta) + \varepsilon_i$ assuming normally distributed errors, i.e. $\varepsilon_i \sim N(0, \sigma^2)$. The likelihood function is therefore: $L = \prod_{i} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(Y_i - f(x_i;\Theta))^2}{2\sigma^2}\right)$. Taking logarithms and multiplying by -1 gives: $-\ln L = \frac{n}{2}\ln(2\pi) + n\ln\sigma + \frac{1}{2\sigma^2}\sum_i \left(Y_i - f(x_i;\Theta)\right)^2$. This implies that if you assume normally-distributed errors, the answers will be identical to those from least squares.
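A sketch demonstrating this equivalence on a simple linear model (the data values below are made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# With normal errors, minimizing the negative log-likelihood gives the
# same parameter estimates as minimizing the sum of squared residuals.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def neg_log_lik(params):
    a, b = params
    return -np.sum(norm.logpdf(y, loc=a + b * x, scale=1.0))

def sum_of_squares(params):
    a, b = params
    return np.sum((y - (a + b * x)) ** 2)

print(minimize(neg_log_lik, x0=[0.0, 1.0]).x)
print(minimize(sum_of_squares, x0=[0.0, 1.0]).x)   # same estimates
```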

458 Time for an Example! We wish to fit the Dynamic Schaefer model to the bowhead census data. q is assumed to be 1 here because the surveys provide absolute indices of abundance. We have information on the trend in abundance (an increase of 3.2% per annum (SD 0.76%) based on 8 data points). We have an estimate of abundance for 1993 of 7800 (SD 564).

458 How to Deal with this Example! The model: $B_{t+1} = B_t + r B_t (1 - B_t/K) - C_t$. The likelihood function is the product of a normal likelihood (for the abundance estimate) and a t-likelihood (for the trend). Ignoring constants independent of the model parameters: $L(K, r) \propto \exp\!\left(-\frac{(7800 - B_{1993})^2}{2 \times 564^2}\right)\left(1 + \frac{(0.032 - s)^2}{k\,\sigma_s^2}\right)^{-(k+1)/2}$, where $B_{1993}$ and $s$ are the model-predicted 1993 abundance and trend, $\sigma_s = 0.0076$, and k is the degrees of freedom for the trend. We take logs, multiply by minus one, and minimize to find the estimates for K and r. Note that we can ignore any constants – why? The t-distribution is chosen for the slope – why?
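A runnable sketch of this fitting procedure is given below. It is not the lecture's actual calculation: the catch series, the assumption that the stock starts at K in 1978, and the 6 degrees of freedom for the trend are all placeholders, since the slides do not provide them.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, t

# Placeholder inputs: the slides do not list the bowhead catch series,
# so a hypothetical constant annual removal is used for illustration.
years = np.arange(1978, 1994)
catches = np.full(len(years), 25.0)

def project(K, r):
    """Dynamic Schaefer model: B[t+1] = B[t] + r*B[t]*(1 - B[t]/K) - C[t]."""
    B = np.empty(len(years))
    B[0] = K                    # assume the stock starts at carrying capacity
    for i in range(len(years) - 1):
        B[i + 1] = max(B[i] + r * B[i] * (1.0 - B[i] / K) - catches[i], 1.0)
    return B

def neg_log_lik(params):
    K, r = params
    B = project(K, r)
    slope = np.polyfit(years, np.log(B), 1)[0]   # model trend in log abundance
    # Normal likelihood for the 1993 estimate (7800, SD 564) and a
    # t likelihood (6 df as a placeholder) for the trend (3.2%, SD 0.76%).
    return -(norm.logpdf(7800.0, loc=B[-1], scale=564.0)
             + t.logpdf((0.032 - slope) / 0.0076, df=6))

fit = minimize(neg_log_lik, x0=[9000.0, 0.05], method="Nelder-Mead")
print(fit.x)   # maximum likelihood estimates of K and r
```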

458 The Outcome [Figure: the fitted model trajectory plotted against the bowhead data.] The fit gives $B_{1993} = 7710$ and a slope of 2.95% per annum.

458 The Lognormal distribution The density function: $f(x) = \frac{1}{x\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\ln x - \ln\mu)^2}{2\sigma^2}\right)$. $\mu$ is the median (not the mean); $\sigma$ is the standard deviation of the logarithm (approximately the coefficient of variation of x). The lognormal distribution is used extensively in fisheries assessments because x is always larger than zero – this is true for most data sources (CPUE, survey indices, estimates of death rates, etc.).
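A small sketch of this parameterization in scipy (where the scale argument equals the median $\mu$); the numbers are illustrative:

```python
from scipy.stats import lognorm

# Lognormal with median mu = 10 and log-scale SD sigma = 0.3:
# in scipy, s is sigma and scale is the median.
dist = lognorm(s=0.3, scale=10.0)
print(dist.median())   # -> 10.0
print(dist.mean())     # > 10: the lognormal mean exceeds the median
```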

458 The Multivariate Normal-I The density function: $f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$. $\boldsymbol{\mu}$ is the vector of means; $\Sigma$ is the variance-covariance matrix; d is the length of the vector. This isn’t nearly as bad as it looks.

458 The Multivariate Normal-II We use the multivariate normal when the data points are correlated (e.g. surveys with common correction factors). For example, the bowhead survey estimates share correction factors, so their variance-covariance matrix has non-zero off-diagonal entries.
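A sketch of evaluating such a likelihood with scipy; the means, SDs, and correlation below are made-up illustrations, not the bowhead values from the lecture:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Log-likelihood of two correlated survey estimates under a
# multivariate normal with correlated observation errors.
mu = np.array([7000.0, 7500.0])
Sigma = np.array([[564.0**2, 0.5 * 564.0 * 600.0],
                  [0.5 * 564.0 * 600.0, 600.0**2]])   # correlation 0.5
obs = np.array([7200.0, 7400.0])
print(multivariate_normal(mean=mu, cov=Sigma).logpdf(obs))
```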

458 Readings Hilborn and Mangel (1997), Chapter 7; Haddon (2001), Chapter 4.