Nonlinear Regression
Ecole Nationale Vétérinaire de Toulouse
Didier Concordet
ECVPT Workshop, April 2011
Can be downloaded at


2 An example

3 Questions
What does nonlinear mean?
– What is nonlinear kinetics?
– What is a nonlinear statistical model?
For a given model, how to fit the data?
Is this model relevant?

4 What does nonlinear mean?
Definition: an operator P is linear if:
– for all objects x, y on which it operates, P(x + y) = P(x) + P(y)
– for all numbers $\lambda$ and all objects x, $P(\lambda x) = \lambda P(x)$
When an operator is not linear, it is nonlinear.

5 Examples
Among the operators below, which ones are nonlinear?
– $P(t) = a\,t$
– $P(t) = a$
– $P(t) = a + b\,t$
– $P(t) = a\,t + b\,t^2$
– $P(a, b) = a\,t + b\,t^2$
– $P(A, \lambda) = A\,e^{-\lambda t}$
– $P(A) = A\,e^{-0.1\,t}$
– $P(t) = A\,e^{-\lambda t}$
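As an illustration (not part of the original slides), the two linearity conditions can be checked numerically; the helper is_linear, the random test points, and the tolerance are all illustrative choices.

```python
import numpy as np

def is_linear(P, n_args=1, trials=100, tol=1e-9, rng=np.random.default_rng(0)):
    """Numerically test P(x+y) = P(x)+P(y) and P(lam*x) = lam*P(x)."""
    for _ in range(trials):
        x, y = rng.normal(size=n_args), rng.normal(size=n_args)
        lam = rng.normal()
        if not np.allclose(P(*(x + y)), P(*x) + P(*y), atol=tol):
            return False
        if not np.allclose(P(*(lam * x)), lam * P(*x), atol=tol):
            return False
    return True

a, b, A, t = 2.0, 3.0, 1.5, 0.7
print(is_linear(lambda t: a * t))                         # True: linear in t
print(is_linear(lambda t: a + b * t))                     # False: affine, not linear
print(is_linear(lambda a, b: a * t + b * t**2, 2))        # True: linear in (a, b)
print(is_linear(lambda A, lam: A * np.exp(-lam * t), 2))  # False: nonlinear in (A, lam)
```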

6 What is nonlinear kinetics?
For a given dose D, let C(t, D) be the concentration at time t. The kinetics is linear when the operator $P : D \mapsto C(\cdot, D)$ is linear. When P(D) is not linear, the kinetics is nonlinear.

7 What is nonlinear kinetics? Examples:
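The slide's example equations did not survive extraction; the pair below is a plausible illustration of the intended contrast (dose-proportional elimination versus saturable Michaelis-Menten elimination), not the slide's actual formulas.

```latex
% Linear kinetics: concentration is proportional to the dose D
C(t, D) = \frac{D}{V}\, e^{-k t}
\quad\Rightarrow\quad C(t, \lambda D) = \lambda\, C(t, D)

% Nonlinear kinetics: saturable (Michaelis--Menten) elimination
\frac{dC}{dt} = -\frac{V_{\max}\, C}{K_m + C}, \qquad C(0) = \frac{D}{V}
\quad\Rightarrow\quad C(t, \lambda D) \neq \lambda\, C(t, D)
```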

8 What is a nonlinear statistical model?
A statistical model relates an observation (the dependent variable) to parameters, covariates (the independent variables), and an error (residual) through a function:
$Y_i = f(x_i, \theta) + \varepsilon_i$
where $Y_i$ is the observation, $x_i$ the covariates, $\theta$ the parameters and $\varepsilon_i$ the error.

9 What is a nonlinear statistical model?
A statistical model is linear when the operator $P : \theta \mapsto f(x, \theta)$ is linear in the parameters $\theta$. When this operator is not linear, the model is nonlinear.

10 What is a nonlinear statistical model?
Example: Y = concentration, t = time. A model such as $Y = a + b\,t + \varepsilon$ is linear: it is linear in the parameters a and b.

11 Examples
Among the statistical models below, which ones are nonlinear?
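The candidate models were lost in extraction; the list below is a representative set of the kind these slides use elsewhere (assumed, not the originals), with linearity judged in the parameters.

```latex
Y_i = a + b\,t_i + \varepsilon_i                        % linear in (a, b)
Y_i = a\,t_i + b\,t_i^2 + \varepsilon_i                 % linear in (a, b)
Y_i = A\,e^{-0.1\,t_i} + \varepsilon_i                  % linear in A
Y_i = A\,e^{-\alpha t_i} + \varepsilon_i                % nonlinear in (A, \alpha)
Y_i = \frac{V_{\max}\, x_i}{K_m + x_i} + \varepsilon_i  % nonlinear in (V_{\max}, K_m)
```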

12 Questions
What does nonlinear mean?
– What is nonlinear kinetics?
– What is a nonlinear statistical model?
For a given model, how to fit the data?
Is this model relevant?

13 How to fit the data?
Proceed in three main steps:
1. Write a (statistical) model
2. Choose a criterion
3. Minimize the criterion

14 Write a (statistical) model
Find a function of the covariate(s) that describes the mean variation of the dependent variable (mean model).
Find a function of the covariate(s) that describes the dispersion of the dependent variable about the mean (variance model).

15 Example
$Y_i = f(t_i, \theta) + \varepsilon_i$, where $\varepsilon_i$ is assumed Gaussian with a constant variance $\sigma^2$: a homoscedastic model.

16 How to choose the criterion to optimize?
Homoscedasticity: Ordinary Least Squares (OLS). Under normality, OLS is equivalent to maximum likelihood.
Heteroscedasticity: Weighted Least Squares (WLS) or Extended Least Squares (ELS).

17 Homoscedastic models
Define the Ordinary Least-Squares criterion:
$SS(\theta) = \sum_{i=1}^{n} \left( Y_i - f(x_i, \theta) \right)^2$
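A minimal sketch of this criterion in code, assuming a monoexponential mean model $f(t, \theta) = A e^{-\alpha t}$ and synthetic data (both assumptions, not from the slides); scipy's least_squares minimizes the sum of squared residuals.

```python
import numpy as np
from scipy.optimize import least_squares

def f(t, A, alpha):
    """Mean model: monoexponential decay (assumed for illustration)."""
    return A * np.exp(-alpha * t)

rng = np.random.default_rng(1)
t = np.linspace(0.25, 12, 20)
y = f(t, 10.0, 0.4) + rng.normal(scale=0.3, size=t.size)  # homoscedastic errors

# OLS: least_squares minimizes SS(theta) = sum of squared residuals
fit = least_squares(lambda theta: y - f(t, *theta), x0=[5.0, 1.0])
print("estimates:", fit.x)                 # approx [10, 0.4]
print("SS at the minimum:", 2 * fit.cost)  # scipy's cost is SS/2
```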

18 Heteroscedastic models: the Weighted Least-Squares criterion
Define:
$SS_W(\theta) = \sum_{i=1}^{n} w_i \left( Y_i - f(x_i, \theta) \right)^2$

19 How to choose the weights?
When the model is heteroscedastic (i.e. $\mathrm{Var}(\varepsilon_i) = \sigma_i^2$ is not constant with i), it is possible to rewrite it as $\sigma_i^2 = \sigma^2 g^2(x_i)$, where $\sigma^2$ does not depend on i. The weights are chosen as $w_i = 1 / g^2(x_i)$.

20 Example
$Y_i = f(t_i, \theta) + \varepsilon_i$ with $\mathrm{Var}(\varepsilon_i) = \sigma^2 f^2(t_i, \theta)$ (a constant coefficient of variation). The model can be rewritten with $g(t_i) = f(t_i, \theta)$, so that the rescaled errors have constant variance $\sigma^2$. The weights are chosen as $w_i = 1 / f^2(t_i, \theta)$.
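A sketch of weighted least squares under the constant-CV variance model above (the variance model, mean model, and data are assumptions for illustration); the weights rescale the residuals so they are approximately homoscedastic.

```python
import numpy as np
from scipy.optimize import least_squares

def f(t, A, alpha):
    return A * np.exp(-alpha * t)  # assumed mean model

rng = np.random.default_rng(2)
t = np.linspace(0.25, 12, 20)
mu = f(t, 10.0, 0.4)
y = mu * (1 + rng.normal(scale=0.1, size=t.size))  # constant CV: sd proportional to mean

# WLS residuals: (y - f) / g, here g = f, i.e. weights w_i = 1/f^2
def wls_residuals(theta):
    pred = f(t, *theta)
    return (y - pred) / pred

fit = least_squares(wls_residuals, x0=[5.0, 1.0])
print("WLS estimates:", fit.x)
```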

21 Extended (Weighted) Least Squares
Define:
$ELS(\theta) = \sum_{i=1}^{n} \left[ \frac{\left( Y_i - f(x_i, \theta) \right)^2}{g^2(x_i, \theta)} + \ln g^2(x_i, \theta) \right]$
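A sketch of the ELS objective, assuming the constant-CV variance model $g = \sigma f$ from the previous example (an assumption); because the variance parameters enter the criterion, a general-purpose minimizer is used instead of least_squares.

```python
import numpy as np
from scipy.optimize import minimize

def f(t, A, alpha):
    return A * np.exp(-alpha * t)  # assumed mean model

rng = np.random.default_rng(3)
t = np.linspace(0.25, 12, 20)
y = f(t, 10.0, 0.4) * (1 + rng.normal(scale=0.1, size=t.size))

def els(params):
    A, alpha, log_sigma = params
    pred = f(t, A, alpha)
    var = (np.exp(log_sigma) * pred) ** 2  # g^2: constant-CV variance model
    return np.sum((y - pred) ** 2 / var + np.log(var))

fit = minimize(els, x0=[5.0, 1.0, np.log(0.2)], method="Nelder-Mead")
print("ELS estimates (A, alpha, sigma):", fit.x[0], fit.x[1], np.exp(fit.x[2]))
```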

22 Summary

23 Properties of the criterion
– It converges
– It leads to consistent (asymptotically unbiased) estimates
– It leads to efficient estimates
– It can have several minima

24 It converges
When the sample size increases, the criterion concentrates about a value of the parameter. Example: consider the homoscedastic model $Y_i = f(t_i, \theta) + \varepsilon_i$; the criterion to use is the least-squares criterion $SS(\theta)$.

25 It converges
[Figure: the criterion for a small sample size vs. a large sample size.]

26 It leads to consistent estimates
The criterion concentrates about the true value.

27 It leads to efficient estimates
For a fixed n, the variance of a consistent estimator is always greater than a limit (the Cramér-Rao lower bound): the "precision" of a consistent estimator is bounded. An estimator is efficient when its variance equals this lower bound.

28 Geometric interpretation
[Figure: isocontours of the criterion; the ellipsoid is a confidence region of the parameter.]

29 It leads to efficient estimates
For a given large n, there is no criterion giving consistent estimates more "convex" than −2 ln(likelihood).
[Figure: the −2 ln(likelihood) criterion.]

30 It has several minima
[Figure: a criterion with several local minima.]

31 Minimize the criterion
Suppose that the criterion to optimize has been chosen. We are looking for the value of $\theta$, denoted $\hat{\theta}$, which achieves the minimum of the criterion. We need an algorithm to minimize such a criterion.

32 Example
Consider the homoscedastic model $Y_i = f(t_i, \theta) + \varepsilon_i$. We are looking for the value of $\theta$, denoted $\hat{\theta}$, which achieves the minimum $SS(\hat{\theta})$ of the criterion $SS(\theta)$.

33 Isocontours

34 Different families of algorithms
– Zero-order algorithms: computation of the criterion only
– First-order algorithms: computation of the first derivative of the criterion
– Second-order algorithms: computation of the second derivative of the criterion

35 Zero-order algorithms
– Simplex algorithm
– Grid search and Monte-Carlo methods
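A sketch of two zero-order approaches on the least-squares criterion from the earlier example (model and data assumed as before): a random Monte-Carlo search, then scipy's Nelder-Mead simplex; both only ever evaluate the criterion itself.

```python
import numpy as np
from scipy.optimize import minimize

def f(t, A, alpha):
    return A * np.exp(-alpha * t)  # assumed mean model

rng = np.random.default_rng(4)
t = np.linspace(0.25, 12, 20)
y = f(t, 10.0, 0.4) + rng.normal(scale=0.3, size=t.size)

def ss(theta):
    return np.sum((y - f(t, *theta)) ** 2)

# Monte-Carlo search: evaluate the criterion at random points, keep the best
candidates = rng.uniform([0, 0], [20, 2], size=(5000, 2))
best = min(candidates, key=ss)
print("Monte-Carlo best:", best)

# Simplex (Nelder-Mead): derivative-free local refinement from the best point
fit = minimize(ss, x0=best, method="Nelder-Mead")
print("Simplex estimate:", fit.x)
```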

36 Simplex algorithm

37 Monte-Carlo algorithm

38 First-order algorithms
– Line search algorithm
– Conjugate gradient

39 First-order algorithms
The derivatives of the criterion vanish at its optima. Suppose that there is only one parameter $\theta$ to estimate: the criterion (e.g. SS) depends only on $\theta$. How to find the value(s) of $\theta$ where the derivative of the criterion vanishes?
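A sketch of this idea for a one-parameter criterion (model and data assumed as before, with A fixed so only $\alpha$ is estimated): bracket a sign change of the numerical derivative and find its root with scipy's brentq.

```python
import numpy as np
from scipy.optimize import brentq

def f(t, alpha, A=10.0):
    return A * np.exp(-alpha * t)  # assumed model; A fixed for a 1-D criterion

rng = np.random.default_rng(5)
t = np.linspace(0.25, 12, 20)
y = f(t, 0.4) + rng.normal(scale=0.3, size=t.size)

def ss(alpha):
    return np.sum((y - f(t, alpha)) ** 2)

def dss(alpha, h=1e-6):
    """Numerical first derivative of the criterion."""
    return (ss(alpha + h) - ss(alpha - h)) / (2 * h)

# The optimum is where the derivative vanishes
alpha_hat = brentq(dss, 0.1, 1.0)  # bracket chosen so dss changes sign
print("alpha_hat:", alpha_hat)
```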

40 Line search algorithm
[Figure: derivative of the criterion as a function of $\theta$, with the successive iterates $\theta_1$, $\theta_2$.]

41 Second-order algorithms
– Gauss-Newton
– Marquardt (Levenberg-Marquardt)

42 Second-order algorithms
The derivatives of the criterion vanish at its optima. When the criterion is (locally) convex, there is a path to reach the minimum: the steepest-descent direction.
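A compact Gauss-Newton sketch for the least-squares criterion (model, data, and Jacobian are assumptions consistent with the earlier examples); each step solves the normal equations built from the Jacobian of the residuals.

```python
import numpy as np

def f(t, A, alpha):
    return A * np.exp(-alpha * t)  # assumed mean model

def jac(t, A, alpha):
    """Jacobian of the residuals r = y - f with respect to (A, alpha)."""
    e = np.exp(-alpha * t)
    return np.column_stack([-e, A * t * e])  # d r/dA, d r/d alpha

rng = np.random.default_rng(6)
t = np.linspace(0.25, 12, 20)
y = f(t, 10.0, 0.4) + rng.normal(scale=0.3, size=t.size)

theta = np.array([5.0, 1.0])
for _ in range(20):                            # Gauss-Newton iterations
    r = y - f(t, *theta)
    J = jac(t, *theta)
    step = np.linalg.solve(J.T @ J, -J.T @ r)  # normal equations
    theta = theta + step
    if np.linalg.norm(step) < 1e-10:
        break
print("Gauss-Newton estimate:", theta)
```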

43 Gauss-Newton (one dimension)
[Figure: derivative of the criterion vs. $\theta$; the criterion is convex.]

44 Gauss-Newton (one dimension)
[Figure: derivative of the criterion vs. $\theta$; the criterion is not convex, with the successive iterates $\theta_1$, $\theta_2$.]

45 Gauss-Newton

46 Marquardt
Allows one to deal with the case where the criterion is not convex: when the second derivative is < 0 (the first derivative decreases), it is replaced by a positive value.
[Figure: derivative of the criterion vs. $\theta$, with the successive iterates $\theta_1$, $\theta_2$, $\theta_3$.]
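In practice Marquardt's idea is implemented as the Levenberg-Marquardt method; a usage sketch with scipy (model and data assumed as before):

```python
import numpy as np
from scipy.optimize import least_squares

def f(t, A, alpha):
    return A * np.exp(-alpha * t)  # assumed mean model

rng = np.random.default_rng(7)
t = np.linspace(0.25, 12, 20)
y = f(t, 10.0, 0.4) + rng.normal(scale=0.3, size=t.size)

# method="lm" selects Levenberg-Marquardt (MINPACK), which regularizes the
# Gauss-Newton step when the local quadratic model is not convex enough
fit = least_squares(lambda th: y - f(t, *th), x0=[5.0, 1.0], method="lm")
print("Levenberg-Marquardt estimate:", fit.x)
```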

47 Summary

48 Questions
What does nonlinear mean?
– What is nonlinear kinetics?
– What is a nonlinear statistical model?
For a given model, how to fit the data?
Is this model relevant?

49 Is this model relevant?
Graphical inspection of the residuals:
– mean model (f)
– variance model (g)
Inspection of numerical results:
– variance-correlation matrix of the estimator
– Akaike criterion (AIC)

50 Graphical inspection of the residuals
For the model $Y_i = f(x_i, \theta) + g(x_i, \theta)\,\varepsilon_i$, calculate the weighted residuals $e_i = (Y_i - f(x_i, \hat{\theta})) / g(x_i, \hat{\theta})$ and plot $e_i$ vs. the fitted values $f(x_i, \hat{\theta})$.
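A sketch of this diagnostic (the fit, the mean model, and the variance function g are assumptions carried over from the earlier examples):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import least_squares

def f(t, A, alpha):
    return A * np.exp(-alpha * t)  # assumed mean model

rng = np.random.default_rng(8)
t = np.linspace(0.25, 12, 20)
y = f(t, 10.0, 0.4) * (1 + rng.normal(scale=0.1, size=t.size))

fit = least_squares(lambda th: (y - f(t, *th)) / f(t, *th), x0=[5.0, 1.0])
fitted = f(t, *fit.x)
weighted_residuals = (y - fitted) / fitted  # g = f under the constant-CV model

plt.scatter(fitted, weighted_residuals)
plt.axhline(0.0, linestyle="--")
plt.xlabel("fitted values")
plt.ylabel("weighted residuals")
plt.show()  # no structure and constant spread suggest the model is adequate
```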

51 Check the mean model
Scatterplot of the weighted residuals vs. the fitted values:
– no structure in the residuals: OK
– structure in the residuals: change the mean model (the f function)
[Figure: the two residual patterns around the zero line.]

52 Check the variance model: homoscedasticity
Scatterplot of the weighted residuals vs. the fitted values:
– homoscedasticity: OK
– no structure in the residuals but heteroscedasticity: change the variance model (the g function)
[Figure: the two residual patterns around the zero line.]

53 Example
Homoscedastic model; criterion: OLS.

54 Example
Structure in the residuals: change the mean model. New model: still homoscedastic.

55 Example
Heteroscedasticity: change the variance model. New model: needs WLS.

56 Example
No structure in the weighted residuals and homoscedasticity: OK.

57 Inspection of numerical results
Correlation matrix of the estimator. Strong correlations between estimators suggest that:
– the model is over-parametrized, or
– the parametrization is not good, or
– the model is not identifiable

58 The model is over-parametrized
Change the mean and/or variance model (f and/or g). Example: the appropriate model is simpler than the one you fitted (e.g. one exponential term where you fitted two). Perform a test or check the AIC.

59 The parametrization is not good
Change the parametrization of your model. Example: if the parametrization you fitted behaves badly, try another one (e.g. estimating ln A instead of A). Two useful indices: the parametric curvature and the intrinsic curvature.

60 The model is not identifiable
The model has too many parameters compared to the number of data: there are many solutions to the optimization. Look at the eigenvalues $\lambda_1 \ge \dots \ge \lambda_P$ of the correlation matrix: if $\lambda_1$ is too large and/or $\lambda_P$ too small, simplify the model.
[Figure: criterion for a non-identifiable model.]

61 The Akaike criterion (AIC)
The AIC allows one to select a model among several models in "competition". It is nothing else but a penalized log-likelihood: it favours the model that is the more likely, with a penalty that prevents over-fitting, so that extra parameters must improve the fit enough to be worthwhile.
$AIC = n \ln(SS/n) + 2p$
where n = sample size, SS = (weighted or ordinary) sum of squares, p = number of parameters that have been estimated. The model with the smaller AIC is the best among the compared models.
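A sketch comparing two nested mean models with this least-squares form of the AIC (the models and data are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(9)
t = np.linspace(0.25, 12, 30)
y = 10.0 * np.exp(-0.4 * t) + rng.normal(scale=0.3, size=t.size)

def aic(ss, n, p):
    """Least-squares form of AIC: n ln(SS/n) + 2p."""
    return n * np.log(ss / n) + 2 * p

def fit_ss(model, x0):
    fit = least_squares(lambda th: y - model(t, th), x0=x0)
    return 2 * fit.cost  # scipy's cost is SS/2

ss1 = fit_ss(lambda t, th: th[0] * np.exp(-th[1] * t), [5.0, 1.0])            # 1 exponential
ss2 = fit_ss(lambda t, th: th[0] * np.exp(-th[1] * t)
                         + th[2] * np.exp(-th[3] * t), [5.0, 1.0, 1.0, 0.1])  # 2 exponentials

n = t.size
print("AIC, 1-exp model:", aic(ss1, n, 2))
print("AIC, 2-exp model:", aic(ss2, n, 4))  # extra parameters must earn their keep
```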

62 Example
[Table: iteration number vs. loss (criterion value) during the fit.]

63 Example
[Output: curvature diagnostics; R = …; the curvature is essentially intrinsic.]

64 About the ellipsoid
– It is linked to the convexity of the criterion
– It is linked to the variance of the estimator
The convexity of the criterion is linked to the variance of the estimator.

65 Different degrees of convexity
[Figure panels: flat criterion (weakly convex); convex criterion; locally convex; convex in some directions (locally convex).]

66 How to measure convexity?
One parameter: compute the second derivative of the criterion. Several parameters: compute the Hessian matrix, the matrix of partial second derivatives $H_{jk} = \partial^2 SS / \partial\theta_j \partial\theta_k$. When the second derivative (one parameter) or the Hessian (several parameters) is positive (definite), the criterion is convex at the point where it is evaluated.
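A sketch of this check: a numerical Hessian of the least-squares criterion at the fitted point, whose eigenvalues diagnose (local) convexity (model, data, and the fitted point are assumed as before).

```python
import numpy as np

def f(t, A, alpha):
    return A * np.exp(-alpha * t)  # assumed mean model

rng = np.random.default_rng(10)
t = np.linspace(0.25, 12, 20)
y = f(t, 10.0, 0.4) + rng.normal(scale=0.3, size=t.size)

def ss(theta):
    return np.sum((y - f(t, *theta)) ** 2)

def hessian(fun, theta, h=1e-4):
    """Central-difference Hessian of a scalar function."""
    p = len(theta)
    H = np.zeros((p, p))
    for j in range(p):
        for k in range(p):
            e_j, e_k = np.eye(p)[j] * h, np.eye(p)[k] * h
            H[j, k] = (fun(theta + e_j + e_k) - fun(theta + e_j - e_k)
                       - fun(theta - e_j + e_k) + fun(theta - e_j - e_k)) / (4 * h**2)
    return H

theta_hat = np.array([10.0, 0.4])  # assume the fit from earlier
lam = np.linalg.eigvalsh(hessian(ss, theta_hat))
print("eigenvalues:", lam)         # all > 0: locally convex criterion
```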

67 How to measure convexity?
It is possible to find a linear transformation of the parameters such that the Hessian matrix becomes diagonal: $\mathrm{diag}(\lambda_1, \dots, \lambda_P)$, where $\lambda_1, \dots, \lambda_P$ are the eigenvalues of the Hessian matrix. When $\lambda_i > 0$ for all i, the criterion is convex.

68 How to measure convexity?
When $\lambda_i > 0$ only for some i, the criterion is convex in some directions only (locally convex). When the $\lambda_i$ are small (but > 0), the criterion is flat.

69 The variance-covariance matrix
The variance-covariance matrix of the estimator (denoted V) is proportional to the inverse of the Hessian of the criterion. It is possible to find a linear transformation of the parameters such that V is diagonal: $\mathrm{diag}(\mu_1, \dots, \mu_P)$.

70 The variance-covariance matrix
$\mu_1, \dots, \mu_P$ are the eigenvalues of the variance-covariance matrix V.

71 The correlation matrix
The correlation matrix of the estimator (denoted C) is obtained from V by $C_{jk} = V_{jk} / \sqrt{V_{jj}\, V_{kk}}$.
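A sketch of these two numerical summaries, using scipy's Jacobian output for an approximate covariance (the error-variance estimate and the model are assumptions):

```python
import numpy as np
from scipy.optimize import least_squares

def f(t, A, alpha):
    return A * np.exp(-alpha * t)  # assumed mean model

rng = np.random.default_rng(11)
t = np.linspace(0.25, 12, 20)
y = f(t, 10.0, 0.4) + rng.normal(scale=0.3, size=t.size)

fit = least_squares(lambda th: y - f(t, *th), x0=[5.0, 1.0])
J = fit.jac
n, p = t.size, 2
sigma2 = 2 * fit.cost / (n - p)      # residual variance estimate
V = sigma2 * np.linalg.inv(J.T @ J)  # approximate covariance of the estimator

d = np.sqrt(np.diag(V))
C = V / np.outer(d, d)               # correlation: C_jk = V_jk / sqrt(V_jj V_kk)
print("correlation matrix:\n", C)    # off-diagonal near +/-1 signals trouble
```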

72 Geometric interpretation
[Figure: isocontours of the criterion with r = 0; the axes of the confidence ellipsoid are parallel to the parameter axes.]