Nonlinear Regression. Ecole Nationale Vétérinaire de Toulouse. Didier Concordet. ECVPT Workshop, April 2011.
2 An example
3 Questions What does nonlinear mean? – What is a nonlinear kinetics? – What is a nonlinear statistical model? For a given model, how do we fit the data? Is this model relevant?
4 What does nonlinear mean? Definition: an operator P is linear if, for all objects x, y on which it operates, $P(x+y) = P(x) + P(y)$, and for all numbers $\lambda$ and all objects x, $P(\lambda x) = \lambda P(x)$. When an operator is not linear, it is nonlinear.
5 Examples Among the operators below, which ones are nonlinear? $P(t) = a\,t$; $P(t) = a$; $P(t) = a + b\,t$; $P(t) = a\,t + b\,t^2$; $P(a,b) = a\,t + b\,t^2$; $P(A,\lambda) = A \exp(-\lambda t)$; $P(A) = A \exp(-0.1\,t)$; $P(t) = A \exp(-\lambda t)$.
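As a worked check of the definition (added here for illustration, using two of the operators above): the operator acting on the parameters (a, b) satisfies additivity, while the operator acting on (A, λ) does not.

```latex
% P(a,b) = a t + b t^2, seen as an operator on (a, b), is linear:
P(a_1 + a_2,\; b_1 + b_2) = (a_1 + a_2)\,t + (b_1 + b_2)\,t^2
  = P(a_1, b_1) + P(a_2, b_2) \quad\Rightarrow\quad \text{linear.}
% P(A,\lambda) = A \exp(-\lambda t), seen as an operator on (A, \lambda), is not:
P(A_1 + A_2,\; \lambda_1 + \lambda_2) = (A_1 + A_2)\, e^{-(\lambda_1 + \lambda_2) t}
  \neq P(A_1, \lambda_1) + P(A_2, \lambda_2) \quad\Rightarrow\quad \text{nonlinear.}
```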
6 What is a nonlinear kinetics? For a given dose D, let $C(t, D)$ be the concentration at time t. The kinetics is linear when the operator $P(D) = C(\cdot, D)$, mapping the dose to the concentration curve, is linear. When P(D) is not linear, the kinetics is nonlinear.
7 What is a nonlinear kinetics? Examples:
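Two standard illustrations (stand-ins, not necessarily the original slide's formulas): dose proportionality gives a linear kinetics, while saturable (Michaelis-Menten) elimination gives a nonlinear one.

```latex
% Linear kinetics: superposition holds, concentration is proportional to dose
C(t, D_1 + D_2) = C(t, D_1) + C(t, D_2), \qquad \text{e.g. } C(t, D) = \frac{D}{V}\, e^{-k t}.
% Nonlinear kinetics: Michaelis-Menten elimination, concentration not proportional to dose
\frac{dC}{dt} = -\frac{V_{\max}\, C}{K_m + C}.
```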
8 What is a nonlinear statistical model? A statistical model writes the observation as $Y_i = f(x_i, \theta) + \varepsilon_i$, where $Y_i$ is the observation (dependent variable), $\theta$ the parameters, $x_i$ the covariates (independent variables), $\varepsilon_i$ the error (residual), and f the model function.
9 What is a nonlinear statistical model? A statistical model is linear when the operator $P(\theta) = f(\cdot, \theta)$, acting on the parameters $\theta$, is linear. When $P(\theta)$ is not linear, the model is nonlinear.
10 What is a nonlinear statistical model? Example: Y = concentration, t = time. A model such as $Y_i = a + b\,t_i + \varepsilon_i$ is linear, since it is linear in the parameters a and b.
11 Examples Among the statistical models below, which ones are nonlinear?
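The models on this slide did not survive extraction; the following are standard stand-ins consistent with the rest of the deck. A model is linear when it is linear in its parameters, even if it is curved in time.

```latex
% Linear statistical models (linear in the parameters a, b):
Y_i = a + b\,t_i + \varepsilon_i, \qquad Y_i = a\,t_i + b\,t_i^2 + \varepsilon_i.
% Nonlinear statistical model (nonlinear in the parameter \lambda):
Y_i = A\, e^{-\lambda t_i} + \varepsilon_i.
```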
12 Questions What does nonlinear mean? – What is a nonlinear kinetics? – What is a nonlinear statistical model? For a given model, how do we fit the data? Is this model relevant?
13 How to fit the data? Proceed in three main steps: write a (statistical) model; choose a criterion; minimize the criterion.
14 Write a (statistical) model. Find a function of the covariate(s) that describes the mean variation of the dependent variable (mean model). Find a function of the covariate(s) that describes the dispersion of the dependent variable about its mean (variance model).
15 Example: $Y_i = f(x_i, \theta) + \varepsilon_i$, where $\varepsilon_i$ is assumed Gaussian with a constant variance $\sigma^2$: a homoscedastic model.
16 How to choose the criterion to optimize? Homoscedasticity: Ordinary Least Squares (OLS); under normality, OLS is equivalent to maximum likelihood. Heteroscedasticity: Weighted Least Squares (WLS) or Extended Least Squares (ELS).
17 Homoscedastic models. Define the Ordinary Least-Squares criterion $SS(\theta) = \sum_{i=1}^{n} (Y_i - f(x_i, \theta))^2$.
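A minimal numerical sketch of this criterion (data, model, and starting values are illustrative, not from the slides): a monoexponential mean model fitted by minimizing SS with scipy's Nelder-Mead simplex, the zero-order method discussed later.

```python
import numpy as np
from scipy.optimize import minimize

t = np.array([0.5, 1.0, 2.0, 4.0, 8.0])   # sampling times (made up)
y = np.array([8.2, 6.6, 4.5, 2.1, 0.5])   # observed concentrations (made up)

def f(t, theta):
    """Mean model: monoexponential decay f(t, theta) = A exp(-lam * t)."""
    A, lam = theta
    return A * np.exp(-lam * t)

def ols(theta):
    """Ordinary least-squares criterion SS(theta)."""
    return np.sum((y - f(t, theta)) ** 2)

fit = minimize(ols, x0=[10.0, 0.5], method="Nelder-Mead")
print(fit.x)   # theta_hat: the value minimizing SS
```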
18 Heteroscedastic models: define the Weighted Least-Squares criterion $SS_W(\theta) = \sum_{i=1}^{n} w_i\, (Y_i - f(x_i, \theta))^2$, where the $w_i$ are weights.
19 How to choose the weights? When the model is heteroscedastic (i.e. $\sigma_i^2 = \mathrm{Var}(\varepsilon_i)$ is not constant with i), it is possible to rewrite it as $\sigma_i = \sigma\, g_i$, where $\sigma$ does not depend on i. The weights are chosen as $w_i = 1/g_i^2$.
20 Example (a typical case): $Y_i = f(x_i, \theta) + \varepsilon_i$ with $\sigma_i = \sigma\, f(x_i, \theta)$, i.e. the standard deviation is proportional to the mean. The model can be rewritten with $g_i = f(x_i, \theta)$, and the weights are chosen as $w_i = 1/f(x_i, \theta)^2$.
21 Extended (Weighted) Least Squares. Define the criterion $ELS(\theta, \sigma) = \sum_{i=1}^{n} \left[ \frac{(Y_i - f(x_i,\theta))^2}{g^2(x_i,\theta,\sigma)} + \ln g^2(x_i,\theta,\sigma) \right]$, i.e. $-2$ times the Gaussian log-likelihood up to a constant; it estimates the mean and variance parameters jointly.
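A sketch of the ELS criterion under the common assumption (my choice here, not stated on the slide) that the standard deviation is proportional to the mean, $g_i = \sigma f(t_i, \theta)$; the criterion balances goodness of fit against the size of the modelled variance.

```python
import numpy as np
from scipy.optimize import minimize

t = np.array([0.5, 1.0, 2.0, 4.0, 8.0])   # illustrative data
y = np.array([8.2, 6.6, 4.5, 2.1, 0.5])

def f(t, theta):
    A, lam = theta
    return A * np.exp(-lam * t)            # mean model

def els(params):
    A, lam, log_sigma = params
    mu = f(t, (A, lam))
    var = (np.exp(log_sigma) * mu) ** 2    # assumed variance model: sd proportional to mean
    # -2 log-likelihood up to a constant: fit term + penalty for large variances
    return np.sum(np.log(var) + (y - mu) ** 2 / var)

fit = minimize(els, x0=[10.0, 0.5, np.log(0.1)], method="Nelder-Mead")
print(fit.x)   # estimates of A, lam and log(sigma)
```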
22 Summary
23 Properties of the criterion: it converges; it leads to consistent (asymptotically unbiased) estimates; it leads to efficient estimates; it may have several minima.
24 It converges: when the sample size increases, the criterion concentrates about a value of the parameter. Example: for the homoscedastic model $Y_i = f(x_i, \theta) + \sigma \varepsilon_i$, the criterion to use is the least-squares criterion $SS(\theta)$.
25 It converges (figure: the criterion for a small sample size vs a large sample size).
26 It leads to consistent estimates: the criterion concentrates about the true value of the parameter.
27 It leads to efficient estimates. For a fixed n, the variance of a consistent estimator is always greater than a limit (the Cramér-Rao lower bound); in other words, the "precision" of a consistent estimator is bounded. An estimator is efficient when its variance equals this lower bound.
28 Geometric interpretation (figure: isocontours of the criterion). This ellipsoid is a confidence region for the parameter.
29 It leads to efficient estimates: for a given large n, there is no criterion giving consistent estimates more "convex" than the $-2\ln(\text{likelihood})$ criterion.
30 It may have several minima (figure: a criterion with several local minima).
31 Minimize the criterion. Suppose the criterion to optimize has been chosen. We are looking for the value of $\theta$, denoted $\hat\theta$, which achieves the minimum of the criterion. We need an algorithm to minimize such a criterion.
32 Example: consider the homoscedastic model $Y_i = f(x_i, \theta) + \sigma \varepsilon_i$. We are looking for the value of $\theta$, denoted $\hat\theta$, which achieves the minimum of the criterion $SS(\theta)$.
33 Isocontours
34 Different families of algorithms. Zero-order algorithms: computation of the criterion only. First-order algorithms: computation of the first derivative of the criterion. Second-order algorithms: computation of the second derivative of the criterion.
35 Zero-order algorithms: simplex algorithm; grid search and Monte-Carlo methods.
36 Simplex algorithm
37 Monte-Carlo algorithm
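A minimal sketch of a zero-order Monte-Carlo search (illustrative, not the slide's figure): draw random parameter values within bounds and keep the one with the smallest criterion; only evaluations of the criterion itself are needed.

```python
import numpy as np

def monte_carlo_min(criterion, lower, upper, n_draws=10_000, seed=0):
    """Zero-order search: evaluate the criterion at random points, keep the best."""
    rng = np.random.default_rng(seed)
    draws = rng.uniform(lower, upper, size=(n_draws, len(lower)))
    values = np.array([criterion(theta) for theta in draws])
    return draws[values.argmin()]          # best point found

# usage with the OLS criterion sketched earlier (hypothetical bounds):
# theta0 = monte_carlo_min(ols, lower=[0.0, 0.0], upper=[20.0, 2.0])
```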
38 First-order algorithms: line search algorithm; conjugate gradient.
39 First-order algorithms. The derivative of the criterion vanishes at its optima. Suppose there is only one parameter $\theta$ to estimate; the criterion (e.g. SS) then depends only on $\theta$. How to find the value(s) of $\theta$ where its derivative vanishes?
40 Line search algorithm (figure: the derivative of the criterion, with successive iterates 1 and 2 bracketing its zero).
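A sketch of the idea in one dimension (my construction, with the derivative approximated by finite differences): bracket the zero of the criterion's derivative and shrink the interval by bisection.

```python
def bisect_derivative(crit, lo, hi, tol=1e-8, h=1e-6):
    """One-parameter line search: bisect on the sign of crit'(theta)."""
    d = lambda x: (crit(x + h) - crit(x - h)) / (2 * h)   # finite-difference derivative
    assert d(lo) * d(hi) <= 0, "derivative must change sign on [lo, hi]"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d(lo) * d(mid) <= 0:    # the sign change (optimum) lies in [lo, mid]
            hi = mid
        else:                      # otherwise it lies in [mid, hi]
            lo = mid
    return 0.5 * (lo + hi)
```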
41 Second-order algorithms: Gauss-Newton; Marquardt (Levenberg-Marquardt).
42 Second-order algorithms. The derivative of the criterion vanishes at its optima. When the criterion is (locally) convex, there is a path to reach the minimum: the steepest-descent direction.
43 Gauss-Newton, one dimension (figure: derivative of the criterion; the criterion is convex).
44 Gauss-Newton, one dimension (figure: derivative of the criterion; the criterion is not convex; iterates 1, 2).
45 Gauss-Newton
46 Marquardt (figure: derivative of the criterion). Marquardt's modification deals with the case where the criterion is not convex: when the second derivative is < 0 (the first derivative decreases), it is replaced by a positive value.
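A compact sketch of one Gauss-Newton step with Marquardt damping for the monoexponential OLS problem (illustrative model and names): the damping term keeps the approximate Hessian positive definite, as the slide describes.

```python
import numpy as np

def lm_step(theta, t, y, damping=1e-3):
    """One Gauss-Newton step with Marquardt damping for f(t) = A exp(-lam t)."""
    A, lam = theta
    mu = A * np.exp(-lam * t)
    r = y - mu                                # residuals
    # Jacobian of the mean model with respect to (A, lam)
    J = np.column_stack([np.exp(-lam * t), -A * t * np.exp(-lam * t)])
    H = J.T @ J + damping * np.eye(2)         # damped Hessian approximation (always > 0)
    return theta + np.linalg.solve(H, J.T @ r)

# usage: iterate lm_step from a starting value until the step becomes negligible
```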
47 Summary
48 Questions What does nonlinear mean? – What is a nonlinear kinetics? – What is a nonlinear statistical model? For a given model, how do we fit the data? Is this model relevant?
49 Is this model relevant? Graphical inspection of the residuals: – mean model (f) – variance model (g). Inspection of numerical results: – variance-covariance and correlation matrix of the estimator – Akaike criterion (AIC).
50 Graphical inspection of the residuals. For the model $Y_i = f(x_i,\theta) + g(x_i,\theta,\sigma)\,\varepsilon_i$, calculate the weighted residuals $e_i = (Y_i - f(x_i,\hat\theta)) / g(x_i,\hat\theta,\hat\sigma)$ and draw $e_i$ vs the fitted values $f(x_i,\hat\theta)$.
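A sketch of this diagnostic plot, reusing t, y, f and fit from the OLS sketch above (illustrative names; weights are 1 because that model was homoscedastic):

```python
import matplotlib.pyplot as plt

mu_hat = f(t, fit.x)              # fitted values f(x_i, theta_hat)
e = y - mu_hat                    # weighted residuals (weights = 1: homoscedastic)
plt.scatter(mu_hat, e)
plt.axhline(0.0, linestyle="--")  # residuals should scatter without structure around 0
plt.xlabel("fitted values")
plt.ylabel("weighted residuals")
plt.show()
```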
51 Check the mean model: scatterplot of weighted residuals vs fitted values. No structure in the residuals: OK. Structure in the residuals: change the mean model (the f function).
52 Check the variance model (homoscedasticity): scatterplot of weighted residuals vs fitted values. Homoscedasticity: OK. No structure in the residuals but heteroscedasticity: change the variance model (the g function).
53 Example: homoscedastic model; criterion: OLS.
54 Example: structure in the residuals, so change the mean model. New model: still homoscedastic.
55 Example: heteroscedasticity, so change the variance model. New model: needs WLS.
56 Example: no structure in the weighted residuals and homoscedasticity: OK.
57 Inspection of numerical results: correlation matrix of the estimator. Strong correlations between estimators suggest that: the model is over-parametrized; the parametrization is not good; or the model is not identifiable.
58 The model is over-parametrized: change the mean and/or variance model (f and/or g). Example (a typical case): the appropriate model is $f(t) = A e^{-\lambda t}$ but you fitted $f(t) = A e^{-\lambda t} + B e^{-\mu t}$. Perform a test or check the AIC.
59 The parametrization is not good: change the parametrization of your model. Example (a typical case): in an exponential model, estimate $\ln \lambda$ rather than $\lambda$. Two useful indices: the parameter-effects (parametric) curvature and the intrinsic curvature.
60 The model is not identifiable: it has too many parameters compared to the number of data points, so the optimisation has many solutions. Look at the eigenvalues $\lambda_1 \ge \dots \ge \lambda_p$ of the correlation matrix: if the ratio $\lambda_1/\lambda_p$ is too large and/or $\lambda_p$ is too small, simplify the model.
61 The Akaike criterion (AIC) allows one to select a model among several models in "competition". It is nothing other than a penalized log-likelihood: it chooses the model that is the most likely, with a penalty on the number of parameters to guard against overfitting. For a least-squares fit, $AIC = n \ln(SS/n) + 2p$, where n = sample size, SS = (weighted or ordinary) sum of squares, and p = number of parameters that have been estimated. The model with the smallest AIC is the best among the compared models.
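A sketch of the least-squares form of the AIC given above, comparing two nested fits (the numbers are made up for illustration):

```python
import numpy as np

def aic(n, ss, p):
    """AIC = n ln(SS/n) + 2p for a least-squares fit with Gaussian errors."""
    return n * np.log(ss / n) + 2 * p

# e.g. monoexponential (p = 2) vs biexponential (p = 4) fitted to n = 12 points:
print(aic(12, ss=3.1, p=2), aic(12, ss=2.9, p=4))  # the smaller AIC wins
```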
62 Example (software output: iteration history of the loss).
63 Example (software output): the curvature is essentially intrinsic.
64 About the ellipsoid: it is linked to the convexity of the criterion and to the variance of the estimator; the convexity of the criterion is thus linked to the variance of the estimator.
65 Different degrees of convexity (figure panels: flat criterion / weakly convex; convex criterion; locally convex; convex in some directions only).
66 How to measure convexity? One parameter: when the second derivative is positive, the criterion is convex at the point where it is evaluated. Several parameters: calculate the Hessian matrix, the matrix of second partial derivatives $H_{jk} = \partial^2 \text{crit} / \partial\theta_j \partial\theta_k$; the criterion is convex where H is positive definite.
67 How to measure convexity? It is possible to find a linear transformation of the parameters such that the Hessian matrix becomes diagonal, $\mathrm{diag}(\lambda_1, \dots, \lambda_p)$, where $\lambda_1 \ge \dots \ge \lambda_p$ are the eigenvalues of the Hessian. When $\lambda_j > 0$ for all j, the criterion is convex.
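A numeric sketch of this diagonalization (the Hessian values are made up for illustration):

```python
import numpy as np

H = np.array([[2.0, 0.8],
              [0.8, 0.5]])        # illustrative Hessian at the minimum
lam, U = np.linalg.eigh(H)        # eigenvalues lam, orthogonal transformation U
print(lam)                        # all eigenvalues > 0 -> the criterion is convex here
```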
68 How to measure convexity? When $\lambda_j > 0$ for some j only, the criterion is convex in the corresponding directions (locally convex). When the $\lambda_j$ are low (but > 0), the criterion is flat.
69 The variance-covariance matrix of the estimator (denoted V) is proportional to the inverse of the Hessian of the criterion. It is possible to find a linear transformation of the parameters such that V becomes diagonal, $\mathrm{diag}(\gamma_1, \dots, \gamma_p)$.
70 Here $\gamma_1 \ge \dots \ge \gamma_p$ are the eigenvalues of the variance-covariance matrix V.
71 The correlation matrix of the estimator (denoted C) is obtained from V by $C_{jk} = V_{jk} / \sqrt{V_{jj} V_{kk}}$.
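A sketch of this recipe and of the eigenvalue diagnostic from slide 60 (the covariance values are made up for illustration):

```python
import numpy as np

V = np.array([[0.040, 0.018],
              [0.018, 0.010]])         # illustrative variance-covariance matrix
sd = np.sqrt(np.diag(V))
C = V / np.outer(sd, sd)               # correlation matrix: C_jk = V_jk / (sd_j sd_k)
print(C)                               # off-diagonal near 1 -> strongly correlated estimators

eig = np.linalg.eigvalsh(C)            # eigenvalue spread signals identifiability problems
print(eig.max() / eig.min())           # very large ratio -> consider simplifying the model
```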
72 Geometric interpretation (figure: isocontours of the criterion). When the correlation r = 0, the axes of the ellipsoid are parallel to the parameter axes.